Skip to content

Implement Debug for EncodeWide #140153

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

thaliaarchi
Copy link
Contributor

@thaliaarchi thaliaarchi commented Apr 22, 2025

Since std::os::windows::ffi::EncodeWide is reexported from std::sys_common::wtf8::EncodeWide, which has #![allow(missing_debug_implementations)] in the parent module, it does not implement Debug.

Implement it like core::str::Chars. Format the WTF-16 code units as hex, zero-padded to 4 wide, since it would not be semantically correct to render them like char.

This becomes insta-stable.

r? libs-api

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Apr 22, 2025
@bjorn3
Copy link
Member

bjorn3 commented Apr 23, 2025

We can't format it like char, because \u escape sequences for surrogate halves are invalid syntax in Rust.

Even if rust doesn't accept \u for surrogate halves, we can still format them as \u. Debug isn't guaranteed to be valid rust source code anyway.

@thaliaarchi
Copy link
Contributor Author

thaliaarchi commented Apr 23, 2025

Here are the options, as I see them. I think formatting them as chars when not surrogates is the most readable for most texts. Rust style seems to lean towards uppercase hex.

  1. EncodeWide([97, 233, 32, 55357, 55357, 56489]): decimal
  2. EncodeWide([0x61, 0xE9, 0x20, 0xD83D, 0xD83D, 0xDCA9]): hex, zero-pad min 2, upper
  3. EncodeWide([0x61, 0xe9, 0x20, 0xd83d, 0xd83d, 0xdca9]): hex, zero-pad min 2, lower
  4. EncodeWide([0x0061, 0x00E9, 0x0020, 0xD83D, 0xD83D, 0xDCA9]): hex, zero-pad 4, upper
  5. EncodeWide([0x0061, 0x00e9, 0x0020, 0xd83d, 0xd83d, 0xdca9]): hex, zero-pad 4, lower
  6. EncodeWide(['a', 'é', ' ', '\u{D83D}', '\u{D83D}', '\u{DCA9}']): pseudo-char, escaped surrogates, upper
  7. EncodeWide(['a', 'é', ' ', '\u{d83d}', '\u{d83d}', '\u{dca9}']): pseudo-char, escaped surrogates, lower

@tgross35 tgross35 added needs-fcp This change is insta-stable, or significant enough to need a team FCP to proceed. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. and removed T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Apr 25, 2025
@thaliaarchi
Copy link
Contributor Author

thaliaarchi commented Jun 26, 2025

I've updated formatting to be 4-wide zero-padded hex (style 4)

For another opinion, @ChrisDenton wrote:

I think EncodeWide, semantically, produces a string like a str but with a different encoding. That said, EncodeWide itself is an iterator more akin to a Bytes iterator of &str (except u16 instead of bytes). But this is all made more complicated by the fact that std doesn't have a WideStr type so people typically collect into a Vec<u16>, which doesn't know it's a wide string. Whereas the iterator does know. So I am sympathetic to the char like representation but for debug I do think we really have to go with a hex repr. And my gut preference is for the 4 zero pad, which is more suggestive of unicode without introducing illegal rust syntax. But I reserve the right to change my mind on that 😆

I think the char-like repr is a non-starter in any case because it looks like rust code but may be illegal (i.e. for surrogates).

(in favor of style 4)

Since `std::os::windows::ffi::EncodeWide` is reexported from
`std::sys_common::wtf8::EncodeWide`, which has
`#![allow(missing_debug_implementations)]` in the parent module, it does
not implement `Debug`.
@thaliaarchi
Copy link
Contributor Author

What's needed next to move this along?

@ChrisDenton ChrisDenton added the I-libs-api-nominated Nominated for discussion during a libs-api team meeting. label Jun 27, 2025
@ChrisDenton
Copy link
Member

I've nominated for libs-api. I don't know if it needs a full libs API discussion but I believe at least someone from the team will need to sign off on it.

@joshtriplett
Copy link
Member

Since trait impls are insta-stable, if we want to do this it would require a libs-api FCP. Nominating seems like the right next step.

@ChrisDenton
Copy link
Member

Ah, right.

To clarify my stream of consciousness that was quoted earlier, I don't think we should use char syntax because char very explicitly can't encode surrogates.

I do think it's nice to use hex here as that's what Unicode uses. Zero padded hex makes it look even more like Unicode's U+xxxx format but is valid rust.

@Amanieu
Copy link
Member

Amanieu commented Jul 1, 2025

We discussed this in the @rust-lang/libs-api meeting. We definitely agree that EncodeWide needs a Debug impl, although we don't have stable guarantees on the exact output of a Debug impl. We're happy with option 4 that you have chosen.

@rfcbot merge

@rfcbot
Copy link
Collaborator

rfcbot commented Jul 1, 2025

Team member @Amanieu has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@Amanieu Amanieu removed the I-libs-api-nominated Nominated for discussion during a libs-api team meeting. label Jul 1, 2025
@rfcbot rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. and removed proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. labels Jul 1, 2025
@rfcbot
Copy link
Collaborator

rfcbot commented Jul 1, 2025

🔔 This is now entering its final comment period, as per the review above. 🔔

@ChrisDenton
Copy link
Member

although we don't have stable guarantees on the exact output of a Debug impl

To expand on that, we are free to change the implementation in the future because of the lack of guarantee’s on the output of Debug. So if someone can make a good argument for one of the other implementations, then that it's always possible to switch. There did not seem to be a strong preference amongst libs-api members but they did agree on the desirability of having a Debug implementation, whatever it is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. needs-fcp This change is insta-stable, or significant enough to need a team FCP to proceed. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants