Related to #2287
The lexer currently handles ASCII and non-ASCII characters differently, though Rust accepts UTF-8 source.
The main problem in our lexer is the mixed use of `skip_codepoint_input()` (which skips 1 to 4 bytes) and `skip_input(int n)` (which skips one byte at a time), and likewise of `peek_codepoint_input()` and `peek_input(int n)`. This makes it difficult to support Unicode in the lexer in a simple way, for example in identifiers and whitespace.
To deal with this problem, we need:
- to modify `peek_input(int n)` and `skip_input(int n)` to return and skip a UTF-8 character,
- to replace all uses of `peek_codepoint_input()` and `skip_codepoint_input()` with `peek_input` and `skip_input` respectively,
- to remove `get_codepoint_input_length()` and the `current_char32` field in Lexer,
- to check that the input source is valid UTF-8.