
webassembly/library: Decode string to print as utf-8. #10064

Closed
wants to merge 1 commit into from

Conversation

Anton-2
Contributor

@Anton-2 Anton-2 commented Nov 23, 2022

The current code loops over the bytes and triggers one event for each.
We can use UTF8ToString to decode the buffer directly and send the whole string in a single event.
Bonus: this allows UTF-8 encoded strings to be used in MicroPython and displayed correctly.
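To illustrate the difference, here is a rough Python analogy (the port's actual code is JavaScript, and this sketch assumes the per-byte loop turns each byte into a character on its own). The helper names `per_byte_chars` and `decoded_once` are made up for this example.

```python
def per_byte_chars(buf: bytes) -> list:
    """Map every byte to its own character, one 'event' per byte."""
    return [chr(b) for b in buf]


def decoded_once(buf: bytes) -> str:
    """Decode the whole buffer as UTF-8 in a single step."""
    return buf.decode("utf-8")


data = "⭐".encode("utf-8")  # three bytes: e2 ad 90

# Per-byte handling yields three unrelated Latin-1 characters.
print(per_byte_chars(data))

# Decoding the complete buffer yields the intended single character.
print(decoded_once(data))
```

The per-byte path can only ever display characters whose UTF-8 encoding is a single byte, i.e. plain ASCII.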

@Anton-2 Anton-2 changed the title webassembly/library : decode string to print as utf-8 webassembly/library: Decode string to print as utf-8. Nov 23, 2022
@Anton-2 Anton-2 force-pushed the master branch 2 times, most recently from f56a6e8 to b496166 Compare November 23, 2022 21:04
@jimmo
Member

jimmo commented Nov 24, 2022

Thanks @Anton-2 -- this is the same issue as reported here: https://twitter.com/stuartcw/status/1594905739029536768

We were talking about this exact code a couple of days ago on Discord -- https://discordapp.com/channels/574275045187125269/1022482520845070407/1044491730776502314

I'm not sure the right answer is for the print event to carry a decoded string (which is what this PR does). Instead, I think it should just copy the data into a Uint8Array and pass that on... i.e. I don't think this layer should be Unicode-aware; it's just transporting bytes. It's possible for the Python program to generate UTF-8 output that straddles two different calls to mp_js_write. If something wants to reassemble this into decoded strings, that should happen at a different point (for example, xterm.js will do this).
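As a sketch of what "reassemble at a different point" could look like, the following uses Python's standard incremental UTF-8 decoder as a stand-in for whatever consumer sits downstream. The function names `decode_each` and `decode_chunks` are hypothetical, and the chunk boundaries are chosen by hand to split one character across two writes.

```python
import codecs


def decode_each(chunks):
    """Naive approach: decode every chunk on its own (invalid bytes are replaced)."""
    return [chunk.decode("utf-8", errors="replace") for chunk in chunks]


def decode_chunks(chunks):
    """Stateful approach: carry incomplete sequences over to the next chunk."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    out = [decoder.decode(chunk) for chunk in chunks]
    # final=True flushes the decoder and would raise on a dangling partial sequence.
    out.append(decoder.decode(b"", final=True))
    return out


# One character (e2 ad 90) split across two writes.
chunks = [b"\xe2\xad", b"\x90"]

print(decode_each(chunks))    # replacement characters, the star is lost
print(decode_chunks(chunks))  # empty string first, then the complete character
```

A transport layer that only forwards bytes leaves this choice to the consumer, which is the one place that sees the whole stream.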

@Anton-2
Contributor Author

Anton-2 commented Nov 24, 2022

OK, but what should we do for the Node.js part? We are writing directly to process.stdout; will it accept a Uint8Array?

And how can I trigger the case of "It's possible for the Python program to generate utf-8 output that will straddle two different calls to mp_js_write" to test it?

I've got a version working as you described, tested successfully with MicroPyScript, but I don't know how to test it on Node.js.

@jimmo
Member

jimmo commented Nov 25, 2022

> We are writing directly to process.stdout, will this accept an UInt8Array ?

I believe so -- https://nodejs.org/api/stream.html#writablewritechunk-encoding-callback

> And how can I trigger the case of "It's possible for the Python program to generate utf-8 output that will straddle two different calls to mp_js_write" to test it ?

It's not currently possible on the webassembly port (because sys.stdout isn't available, although that's easy to enable at compile time).

On a microcontroller target, you can write the utf-8 encoded bytes in chunks... e.g.

>>> _,_ = sys.stdout.write(b'\xe2\xad'), sys.stdout.write(b'\x90')
⭐

Although I notice that in CPython this isn't possible (sys.stdout won't let you write bytes, only str).

However, this works in CPython:

>>> _,_ = os.write(sys.stdout.fileno(), b'\xe2\xad'), os.write(sys.stdout.fileno(), b'\x90')
⭐

If I strace that process, I can see that two separate writes were done to the terminal:

write(1, "\342\255", 2)                 = 2
write(1, "\220", 1)                     = 1

You can flush, sleep, etc. between the two writes and the terminal will still reassemble the character correctly.
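For a repeatable version of this experiment, a minimal sketch might look like the following. It uses a pipe in place of the terminal, so the reassembly is done by an explicit decode on the reading side rather than by a terminal emulator. `write_in_chunks` is a hypothetical helper, not part of any library.

```python
import os
import time


def write_in_chunks(fd, data, split_at, delay=0.0):
    """Write data to fd as two separate os.write calls; return both byte counts."""
    first = os.write(fd, data[:split_at])
    time.sleep(delay)  # optional pause between the two writes
    second = os.write(fd, data[split_at:])
    return first, second


# A pipe stands in for the terminal so the bytes can be read back.
read_fd, write_fd = os.pipe()
counts = write_in_chunks(write_fd, "⭐".encode("utf-8"), 2, delay=0.1)
os.close(write_fd)

received = os.read(read_fd, 16)
os.close(read_fd)

print(counts)                    # bytes written by each call
print(received.decode("utf-8"))  # the reader sees one complete character
```

Passing `sys.stdout.fileno()` as `fd` instead of the pipe should reproduce the two separate writes shown in the strace output.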

@Anton-2
Contributor Author

Anton-2 commented Nov 25, 2022

Let's go with this, new PR in #10079.

@Anton-2 Anton-2 closed this Nov 25, 2022
RetiredWizard pushed a commit to RetiredWizard/micropython that referenced this pull request Feb 15, 2025
…on-main

Translations update from Hosted Weblate