Fix and clarify CR LF normalization and CR in string literals #1944

Merged
merged 1 commit into from
Jul 25, 2025

LukasKalbertodt
Member

This was slightly incorrect before. Relevant commits changing this:
- fa56fdb
- 27e1ec9

The normalization is not applied repeatedly, so a CR LF pair can still exist afterwards. Further, given that the normalization happens before lexing, the part "other than as part of such a string continuation escape" is not useful: either there was a CR LF in the raw input, which has already been transformed (so the lexical grammar never sees the CR), or a CR LF pair survived the normalization, in which case the CR is disallowed anyway.

Here are two test programs showing this behavior:

    printf 'fn main() { "a\r\r\n\nb"; }' | rustc -

Results in:

error: bare CR not allowed in string, use `\r` instead
 --> <anon>:1:15
  |
1 | fn main() { "a␍
  |               ^
  |
help: escape the character
  |
1 | fn main() { "a\r
  |               ++

And:

    printf 'fn main() { "a\\\r\r\n\nb"; }' | rustc -

Results in:

error: unknown character escape: `\r`
 --> <anon>:1:16
  |
1 | fn main() { "a\␍
  |                ^ unknown character escape
  |
  = help: this is an isolated carriage return; consider checking your editor and version control settings
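
To illustrate why a CR can survive, here is a minimal sketch of a single-pass CR LF normalization. It is only an assumption about the shape of such a pass, not rustc's actual implementation, and the function name `normalize_crlf_once` is made up for this example.

```rust
/// Hypothetical helper (not rustc's code): replaces each CR LF pair with a
/// single LF in one left-to-right pass. The output is never rescanned, so
/// the input CR CR LF becomes CR LF rather than LF.
fn normalize_crlf_once(src: &str) -> String {
    let mut out = String::with_capacity(src.len());
    let mut chars = src.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\r' && chars.peek() == Some(&'\n') {
            // Consume the LF of the pair and emit a single LF.
            chars.next();
            out.push('\n');
        } else {
            // Any other character, including a lone CR, is kept as-is.
            out.push(c);
        }
    }
    out
}
```

Under this sketch, the string contents from the first test program, `a` CR CR LF LF `b`, normalize to `a` CR LF LF `b`: the first CR is not followed by LF when it is examined, so it is kept, and the lexer later sees it as a bare CR.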

rustbot added the S-waiting-on-review label (Status: The marked PR is awaiting review from a maintainer) Jul 25, 2025
Contributor

ehuss left a comment

Thanks!

ehuss added this pull request to the merge queue Jul 25, 2025
Merged via the queue into rust-lang:master with commit 5b3ca00 Jul 25, 2025
5 checks passed
rustbot removed the S-waiting-on-review label (Status: The marked PR is awaiting review from a maintainer) Jul 25, 2025
LukasKalbertodt deleted the cr-lf-fixes branch July 25, 2025 16:03
LukasKalbertodt added a commit to LukasKalbertodt/litrs that referenced this pull request Jul 25, 2025
The specification now says that CR LF normalization is part of the
pre-processing prior to tokenization. Since it's not part of the lexical
grammar, this commit removes it from the parsing code as well. CR
characters are now disallowed entirely.

This is technically a breaking change, but unlikely to be noticed by
any real world input. In proc macros, all input has been normalized by
the Rust compiler anyway, so only very weird input (CR CR LF) would
have been accepted previously, but not after this commit.

See:
- rust-lang/reference#1944
- rust-lang/reference@fa56fdb
- rust-lang/reference@27e1ec9
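
As a rough illustration of what "disallowed entirely" could mean for a literal parser that receives already-normalized input, here is a minimal sketch. It is not taken from litrs; the `BareCr` type and the `reject_cr` function are hypothetical names for this example.

```rust
/// Hypothetical error type: byte offset of the first CR found in the input.
#[derive(Debug, PartialEq)]
struct BareCr {
    offset: usize,
}

/// Hypothetical check (not litrs's code): since CR LF normalization is
/// assumed to have happened before tokenization, any CR still present in
/// the literal's source text is rejected, whether or not an LF follows it.
fn reject_cr(literal_src: &str) -> Result<(), BareCr> {
    match literal_src.find('\r') {
        Some(offset) => Err(BareCr { offset }),
        None => Ok(()),
    }
}
```

With this check, a literal whose source still contains CR LF (for example because the raw input had CR CR LF) is rejected at the CR, while a literal containing only LF line breaks passes.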
tgross35 added a commit to tgross35/rust that referenced this pull request Aug 7, 2025
Update books

## rust-lang/book

5 commits in b2d1a0821e12a676b496d61891b8e3d374a8e832..3e9dc46aa563ca0c53ec826c41b05f10c5915925
2025-08-02 01:33:29 UTC to 2025-07-14 21:23:38 UTC

- Appendix B and Appendix D from tech review (rust-lang/book#4466)
- Chapter 21 from tech review (rust-lang/book#4464)
- Chapter 20 from tech review (rust-lang/book#4460)
- Chapter 19 from tech review (rust-lang/book#4446)
- Chapter 18 from tech review (rust-lang/book#4445)

## rust-lang/reference

12 commits in 1f45bd41fa6c17b7c048ed6bfe5f168c4311206a..1be151c051a082b542548c62cafbcb055fa8944f
2025-08-05 19:51:40 UTC to 2025-07-14 19:49:01 UTC

- Fix build output directory in README (rust-lang/reference#1950)
- Update `link_name` to use the attribute template (rust-lang/reference#1896)
- Update `no_link` to use the attribute template (rust-lang/reference#1898)
- Update `proc_macro_derive` to use the attribute template (rust-lang/reference#1888)
- Update `automatically_derived` to use the attribute template (rust-lang/reference#1884)
- Update `derive` to use the attribute template (rust-lang/reference#1883)
- Fix and clarify CR LF normalization and CR in string literals (rust-lang/reference#1944)
- glossary.md: tweak description of "dispatch" (rust-lang/reference#1938)
- add missing id, r[asm.operand-type.supported-operands.const] (rust-lang/reference#1939)
- &str and &[u8] have the same layout (rust-lang/reference#1848)
- Rename and rewrite the "question mark operator" (rust-lang/reference#1931)
- Change "allocated object" to "allocation". (rust-lang/reference#1930)

## rust-lang/rust-by-example

3 commits in e386be5f44af711854207c11fdd61bb576270b04..bd1279cdc9865bfff605e741fb76a0b2f07314a7
2025-08-04 13:41:04 UTC to 2025-08-02 15:41:59 UTC

- Improve the activity instructions in `print_display` (rust-lang/rust-by-example#1948)
- Minor fixes (whitespace, typo, i32->u32) (rust-lang/rust-by-example#1947)
- Document drawbacks of alternatives to match binding (rust-lang/rust-by-example#1946)
Zalathar added a commit to Zalathar/rust that referenced this pull request Aug 7, 2025
Update books

rust-timer added a commit to rust-lang/rust that referenced this pull request Aug 7, 2025
Rollup merge of #145026 - rustbot:docs-update, r=ehuss

Update books

github-actions bot pushed a commit to rust-lang/miri that referenced this pull request Aug 8, 2025
Update books

3 participants