Move WTF-8 code from std into core and alloc #145335
base: master
rustbot has assigned @Mark-Simulacrum.
de0269e to 6b931cf
6b931cf to 2e9c755
Hmm, that appears to be a genuine rustdoc bug: if the module is marked as …
2e9c755 to c7e23c2
c7e23c2 to fc792fd
fc792fd to 86b0104
d0674a4 to 3fc7d04
#[test]
fn wtf8_to_ascii_lowercase() {
    let lowercase = Wtf8::from_str("").to_ascii_lowercase();
    assert_eq!(lowercase.bytes, b"");

    let lowercase = Wtf8::from_str("GrEeN gRaPeS! 🍇").to_ascii_lowercase();
    assert_eq!(lowercase.bytes, b"green grapes! \xf0\x9f\x8d\x87");

    let lowercase = unsafe { Wtf8::from_bytes_unchecked(b"\xED\xA0\x80").to_ascii_lowercase() };
    assert_eq!(lowercase.bytes, b"\xED\xA0\x80");
    assert!(!lowercase.is_known_utf8);
}

#[test]
fn wtf8_to_ascii_uppercase() {
    let uppercase = Wtf8::from_str("").to_ascii_uppercase();
    assert_eq!(uppercase.bytes, b"");

    let uppercase = Wtf8::from_str("GrEeN gRaPeS! 🍇").to_ascii_uppercase();
    assert_eq!(uppercase.bytes, b"GREEN GRAPES! \xf0\x9f\x8d\x87");

    let uppercase = unsafe { Wtf8::from_bytes_unchecked(b"\xED\xA0\x80").to_ascii_uppercase() };
    assert_eq!(uppercase.bytes, b"\xED\xA0\x80");
    assert!(!uppercase.is_known_utf8);
}
I decided to get rid of these tests because they didn't add much on top of the `make_ascii_*case` tests. Also, the `#[cfg(not(test))]` on the incoherent `impl Wtf8` block would have required me to make separate standalone functions for these, to ensure that we're testing the correct version of `Wtf8Buf`. It just felt easier to delete them, since the tests aren't adding a whole lot.
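The following is a minimal sketch of why the copying `to_ascii_*case` tests add little. It assumes the copying conversion is just a clone followed by the in-place `make_ascii_*case` call. Under that assumption, the in-place tests already exercise the interesting inputs.

`Wtf8BufSketch` is a hypothetical stand-in with only the `bytes` and `is_known_utf8` fields the deleted tests inspect. It is not the standard library's `Wtf8Buf`, and the real conversion may be implemented differently. ASCII case mapping only rewrites bytes below 0x80, so the multi-byte grape emoji and the encoded surrogate `ED A0 80` pass through unchanged.

```rust
/// Hypothetical owned WTF-8 buffer, reduced to the two fields the
/// deleted tests look at. Not the real `Wtf8Buf`.
#[derive(Clone, Debug, PartialEq)]
pub struct Wtf8BufSketch {
    pub bytes: Vec<u8>,
    pub is_known_utf8: bool,
}

impl Wtf8BufSketch {
    /// In-place conversion: `<[u8]>::make_ascii_lowercase` only touches
    /// bytes in b'A'..=b'Z', so multi-byte sequences (including encoded
    /// surrogates) are untouched and `is_known_utf8` stays valid.
    pub fn make_ascii_lowercase(&mut self) {
        self.bytes.make_ascii_lowercase();
    }

    /// Copying conversion written in terms of the in-place one
    /// (an assumption of this sketch, not necessarily how std does it).
    pub fn to_ascii_lowercase(&self) -> Wtf8BufSketch {
        let mut copy = self.clone();
        copy.make_ascii_lowercase();
        copy
    }
}
```

With this layout, a test of `make_ascii_lowercase` covers everything `to_ascii_lowercase` does apart from the clone.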
@@ -0,0 +1 @@
// All `wtf8` tests live in library/alloctests/tests/wtf8.rs
The `str` module has its own version of this, so I decided to make one here too.
@@ -181,6 +88,7 @@ impl fmt::Display for Wtf8Buf {
    }
}

#[cfg_attr(test, allow(dead_code))]
Ideally, all methods would be tested, but my goal was to move the code, not improve its test suite. The code originally had `allow(dead_code)` on the entire module, so, strictly speaking, this is an improvement.
        [.., 0xED, b2 @ 0xA0..=0xAF, b3] => Some(decode_surrogate(b2, b3)),
        _ => None,
    }
#[cfg(not(test))]
All of these are due to the unfortunate way that `alloctests` works when testing private internals. We have to include a copy of the module in the tests crate. As a result, we end up with two versions of these methods implemented on the same `Wtf8` type, each returning a `Wtf8Buf` from a different crate. This is why I decided to leave standalone methods that are emitted in all cases and just wrap them outside of tests.
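The match arm quoted above detects a lead surrogate (U+D800 to U+DBFF) encoded at the end of a WTF-8 byte string. The sketch below pairs that check with the layout the comment describes: a standalone function compiled in every configuration, plus a thin method wrapper gated by `#[cfg(not(test))]`.

`decode_surrogate` and `final_lead_surrogate` are reconstructed here from the quoted arm for illustration. `Wtf8Sketch` is a hypothetical stand-in type. None of this is copied from the PR.

```rust
/// Decodes the surrogate code unit encoded by the 3-byte sequence
/// `ED b2 b3` (generalized UTF-8). 0xD800 is the 0xD000 contributed by
/// the 0xED lead byte plus the 0x800 bit that every surrogate's second
/// byte (0xA0..=0xBF) carries; the low 6 bits of b2 and b3 fill in the rest.
fn decode_surrogate(second_byte: u8, third_byte: u8) -> u16 {
    0xD800 | ((second_byte as u16 & 0x3F) << 6) | (third_byte as u16 & 0x3F)
}

/// Standalone function, compiled in every configuration (including tests).
/// Returns the lead surrogate if `bytes` ends with one.
pub fn final_lead_surrogate(bytes: &[u8]) -> Option<u16> {
    match *bytes {
        // Second byte 0xA0..=0xAF means a lead surrogate (D800..=DBFF).
        [.., 0xED, b2 @ 0xA0..=0xAF, b3] => Some(decode_surrogate(b2, b3)),
        _ => None,
    }
}

/// Hypothetical stand-in for the borrowed WTF-8 type.
pub struct Wtf8Sketch<'a> {
    pub bytes: &'a [u8],
}

// Only compiled outside `cfg(test)`. In a test build this impl disappears,
// so tests call the free function above directly.
#[cfg(not(test))]
impl Wtf8Sketch<'_> {
    pub fn final_lead_surrogate(&self) -> Option<u16> {
        final_lead_surrogate(self.bytes)
    }
}
```

Keeping the logic in the free function means a second copy of the module (as in `alloctests`) can still test it, even though the method wrapper is compiled out there.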
@@ -1046,21 +572,19 @@ impl Iterator for EncodeWide<'_> {
    }
}

impl fmt::Debug for EncodeWide<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("EncodeWide").finish_non_exhaustive()
Unlike `Wtf8CodePoints`, it's not entirely clear how you'd reconstruct the original string from `EncodeWide` and include the unpaired surrogate, so I decided not to bother for now. Someone can write a better debug implementation later.
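For context on what `EncodeWide` has to produce, here is a hypothetical iterator that turns code points, possibly including unpaired surrogates, into UTF-16 code units. The sketch assumes the input is already decoded into `u32` code points no greater than 0x10FFFF, whereas the real `EncodeWide` works from WTF-8 bytes. `EncodeWideSketch` is an illustrative name, not a std type.

Its `Debug` impl uses `finish_non_exhaustive` like the quoted code, so it prints only the type name followed by `{ .. }`.

```rust
use std::fmt;

/// Hypothetical iterator: code points (lone surrogates allowed) to UTF-16 units.
pub struct EncodeWideSketch<'a> {
    code_points: std::slice::Iter<'a, u32>,
    // Trail surrogate still owed after emitting the lead half of a pair.
    pending_trail: Option<u16>,
}

impl<'a> EncodeWideSketch<'a> {
    pub fn new(code_points: &'a [u32]) -> Self {
        EncodeWideSketch { code_points: code_points.iter(), pending_trail: None }
    }
}

impl Iterator for EncodeWideSketch<'_> {
    type Item = u16;

    fn next(&mut self) -> Option<u16> {
        if let Some(trail) = self.pending_trail.take() {
            return Some(trail);
        }
        let cp = *self.code_points.next()?;
        if cp < 0x1_0000 {
            // BMP code points, including unpaired surrogates, are one unit.
            Some(cp as u16)
        } else {
            // Supplementary code points split into a lead/trail surrogate pair.
            let offset = cp - 0x1_0000;
            self.pending_trail = Some(0xDC00 | (offset & 0x3FF) as u16);
            Some(0xD800 | (offset >> 10) as u16)
        }
    }
}

impl fmt::Debug for EncodeWideSketch<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Same approach as the quoted impl: no fields, just `{ .. }`.
        f.debug_struct("EncodeWideSketch").finish_non_exhaustive()
    }
}
```

For example, `[0x61, 0xD800, 0x1F347]` encodes to `[0x61, 0xD800, 0xD83C, 0xDF47]`. The lone surrogate passes through as a single unit, and U+1F347 becomes a surrogate pair.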
3fc7d04 to 556a689
556a689 to 486f9ed
This is basically a small portion of #129411 with a smaller scope. It does *not* affect any public APIs; this code is still internal to the standard library. It just moves the WTF-8 code into `core` and `alloc` so it can be accessed by `no_std` crates like `backtrace`.

Like we do with ordinary strings, the tests are still located entirely in `alloc`, rather than splitting them into `core` and `alloc`.
.Reviewer note: for ease of review, this is split into three commits:
You can review commits 1 and 3 to verify these claims, but commit 2 contains the majority of the changes you should care about.
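The `core`/`alloc` split the description refers to mirrors `str` (in `core`) and `String` (in `alloc`). A borrowed view needs no allocator, while an owned buffer needs `alloc`.

The sketch below uses hypothetical `Wtf8View` and `Wtf8Owned` types, not the internal `Wtf8`/`Wtf8Buf`. It refers only to `core` and `alloc` paths, so the same items could in principle live in a `#![no_std]` crate that links `alloc`. It performs no WTF-8 validation.

```rust
// Only `core` and `alloc` paths are used below; nothing depends on `std`.
extern crate alloc;

use alloc::vec::Vec;

/// Hypothetical borrowed WTF-8 view (the part that could live in `core`).
pub struct Wtf8View<'a> {
    bytes: &'a [u8],
}

impl<'a> Wtf8View<'a> {
    /// Wraps bytes without checking that they are well-formed WTF-8.
    pub fn new(bytes: &'a [u8]) -> Self {
        Wtf8View { bytes }
    }

    /// Returns `Some` only if the bytes are valid UTF-8, i.e. contain no
    /// encoded surrogates (which `from_utf8` rejects).
    pub fn as_str(&self) -> Option<&'a str> {
        core::str::from_utf8(self.bytes).ok()
    }
}

/// Hypothetical owned WTF-8 buffer (the part that needs `alloc`).
pub struct Wtf8Owned {
    bytes: Vec<u8>,
}

impl Wtf8Owned {
    pub fn from_str(s: &str) -> Self {
        Wtf8Owned { bytes: s.as_bytes().to_vec() }
    }

    /// Borrows the buffer as a view, like `String` derefs to `str`.
    pub fn as_view(&self) -> Wtf8View<'_> {
        Wtf8View { bytes: &self.bytes }
    }
}
```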