Refactoring lexer to treat all characters as UTF-8 #2309

@tamaroning

Description

Related to #2287

The lexer currently handles ASCII and non-ASCII characters differently, even though Rust accepts UTF-8 source.
The main problem in our lexer is the mixed use of skip_codepoint_input() (which skips 1–4 bytes) and skip_input(int n) (which skips one byte at a time) — and likewise peek_codepoint_input() and peek_input(int n). This mix makes it difficult to support Unicode in the lexer cleanly, e.g. for identifiers and whitespace.

To deal with this problem, we need
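For illustration, the byte-vs-codepoint mismatch described above can be avoided by a cursor that always peeks and skips whole UTF-8 codepoints, so callers never reason about byte counts. This is a minimal sketch with hypothetical names (Utf8Cursor, peek, skip) that are not taken from the gccrs codebase:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical sketch: a cursor that always advances by whole UTF-8
// codepoints, so byte-level and codepoint-level skips are never mixed.
struct Utf8Cursor {
  std::string src;
  size_t pos = 0;

  // Length in bytes of the UTF-8 sequence, from its lead byte.
  static size_t seq_len(unsigned char lead) {
    if (lead < 0x80) return 1;        // 0xxxxxxx: ASCII
    if ((lead >> 5) == 0x6) return 2; // 110xxxxx
    if ((lead >> 4) == 0xE) return 3; // 1110xxxx
    return 4;                         // 11110xxx
  }

  // Decode the codepoint at `pos` without advancing.
  uint32_t peek() const {
    unsigned char lead = static_cast<unsigned char>(src[pos]);
    size_t len = seq_len(lead);
    if (len == 1)
      return lead;
    // Mask off the length-prefix bits of the lead byte,
    // then fold in 6 payload bits from each continuation byte.
    uint32_t cp = lead & (0x7F >> len);
    for (size_t i = 1; i < len; ++i)
      cp = (cp << 6) | (static_cast<unsigned char>(src[pos + i]) & 0x3F);
    return cp;
  }

  // Advance past exactly one codepoint (1-4 bytes).
  void skip() { pos += seq_len(static_cast<unsigned char>(src[pos])); }
};
```

With a single cursor like this, lexer routines such as identifier or whitespace scanning would call peek()/skip() uniformly, regardless of how many bytes each character occupies. (Error handling for malformed UTF-8 is omitted here; a real lexer would have to validate continuation bytes.)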
