Refactoring lexer to treat all characters as UTF-8

Related to https://github.com/Rust-GCC/gccrs/issues/2287

The lexer currently handles ASCII and non-ASCII characters differently, even though Rust source is UTF-8.
The main problem is that our lexer mixes `skip_codepoint_input()` (which skips 1 to 4 bytes) with `skip_input(int n)` (which skips single bytes), and likewise `peek_codepoint_input()` with `peek_input(int n)`. This makes it difficult to support Unicode, such as Unicode identifiers and whitespace, in a simple way.
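
To illustrate why byte-wise and codepoint-wise skipping do not mix well, here is a minimal sketch. The helpers `utf8_sequence_length` and `count_codepoints` are hypothetical names for illustration only and are not part of gccrs. The sketch shows that one character can occupy 1 to 4 bytes, so a one-byte skip can land in the middle of a character.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of gccrs: number of bytes in the UTF-8
// sequence that starts with `lead`, or 0 if `lead` cannot start a sequence.
std::size_t utf8_sequence_length(unsigned char lead)
{
  if (lead < 0x80)
    return 1; // 0xxxxxxx: ASCII
  if (lead >= 0xC2 && lead <= 0xDF)
    return 2; // 110xxxxx (0xC0 and 0xC1 would only encode overlong forms)
  if (lead >= 0xE0 && lead <= 0xEF)
    return 3; // 1110xxxx
  if (lead >= 0xF0 && lead <= 0xF4)
    return 4; // 11110xxx, limited to U+10FFFF
  return 0;   // continuation byte or invalid lead byte
}

// Counts characters by stepping over one whole sequence at a time.
// An invalid lead byte is counted as one character and skipped as one byte.
std::size_t count_codepoints(const std::string &s)
{
  std::size_t count = 0;
  for (std::size_t i = 0; i < s.size();)
    {
      std::size_t len = utf8_sequence_length(static_cast<unsigned char>(s[i]));
      i += (len == 0 ? 1 : len);
      ++count;
    }
  return count;
}
```

For the input `aéb` (bytes `61 C3 A9 62`), the string is 4 bytes long but holds 3 characters, so skipping one byte after `a` would leave the lexer pointing at the continuation byte `A9`.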

To deal with this problem, we need:
- [x] to modify `peek_input(int n)` and `skip_input(int n)` to return and skip a whole UTF-8 character, respectively
  - https://github.com/Rust-GCC/gccrs/pull/2307
- [x] to replace all uses of `peek_codepoint_input()` and `skip_codepoint_input()` with `peek_input` and `skip_input`, respectively
  - https://github.com/Rust-GCC/gccrs/pull/2347
- [x] to remove `get_codepoint_input_length()` and the `current_char32` field in Lexer
  - https://github.com/Rust-GCC/gccrs/pull/2347
- [x] to check that the input source is valid UTF-8
  - https://github.com/Rust-GCC/gccrs/pull/2307
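
The general idea behind these steps can be sketched as follows. This is not the code from the linked pull requests: `decode_utf8`, `CodepointSource`, `kEndOfInput`, and the exact `peek`/`skip` semantics are assumptions made for illustration. The sketch validates and decodes the input once, so that peeking and skipping afterwards always work on whole characters.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sentinel returned when peeking past the end of the input.
const char32_t kEndOfInput = 0xFFFFFFFF;

// Hypothetical helper: decodes `bytes` as UTF-8 into `out`.
// Returns false on invalid input (bad lead or continuation byte, truncated
// sequence, overlong encoding, surrogate, or value above U+10FFFF).
// On failure, `out` holds the characters decoded before the error.
bool decode_utf8(const std::string &bytes, std::vector<char32_t> &out)
{
  static const char32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
  out.clear();
  for (std::size_t i = 0; i < bytes.size();)
    {
      unsigned char b0 = static_cast<unsigned char>(bytes[i]);
      std::size_t len;
      char32_t cp;
      if (b0 < 0x80) { len = 1; cp = b0; }
      else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
      else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
      else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
      else return false; // stray continuation byte or invalid lead byte
      if (i + len > bytes.size())
        return false; // truncated sequence
      for (std::size_t k = 1; k < len; ++k)
        {
          unsigned char b = static_cast<unsigned char>(bytes[i + k]);
          if ((b & 0xC0) != 0x80)
            return false; // not a continuation byte
          cp = (cp << 6) | (b & 0x3F);
        }
      if (len > 1 && cp < min_for_len[len])
        return false; // overlong encoding
      if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return false; // out of range or surrogate
      out.push_back(cp);
      i += len;
    }
  return true;
}

// Hypothetical input source in which every position is one character.
class CodepointSource
{
public:
  explicit CodepointSource(std::vector<char32_t> cps)
    : chars(std::move(cps)), pos(0) {}

  // Returns the character `n` positions ahead without consuming it.
  char32_t peek(std::size_t n = 0) const
  {
    return pos + n < chars.size() ? chars[pos + n] : kEndOfInput;
  }

  // Consumes `n` characters (assumed semantics), stopping at end of input.
  void skip(std::size_t n = 1)
  {
    pos = (n > chars.size() - pos) ? chars.size() : pos + n;
  }

private:
  std::vector<char32_t> chars;
  std::size_t pos;
};
```

With a source like this, a single pair of peek and skip operations is enough: the caller never needs to know how many bytes a character occupied, and invalid input is rejected before lexing starts.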



Refactoring lexer to treat all characters as UTF-8 #2309
