Skip to content

Implement StringConcat in the interpreter. #6403

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Mar 15, 2024

Conversation

rluble
Copy link
Contributor

@rluble rluble commented Mar 14, 2024

No description provided.

Copy link
Member

@kripken kripken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@gkdn
Copy link
Contributor

gkdn commented Mar 15, 2024

Don't we need to pay special care around concat of two invalid UTF-16 strings? (e.g. with unmatched surrogate pairs)
Maybe we could add some tests around it.

@kripken
Copy link
Member

kripken commented Mar 15, 2024

Don't we need to pay special care around concat of two invalid UTF-16 strings?

I am actually not sure what the spec says about that. It's worth looking into, though it could be left as a TODO from my perspective.

@rluble
Copy link
Contributor Author

rluble commented Mar 15, 2024

Don't we need to pay special care around concat of two invalid UTF-16 strings?

I am actually not sure what the spec says about that. It's worth looking into, though it could be left as a TODO from my perspective.

So I couldn't find detailed specification for the behavior of concat other than the documentations for String.concat which basically says that the concatted string corresponds to the concatenation of the char representation; seems quite vague. Tested the behavior and in fact it is just the concatenation of the WTF-16 chars, so you can get a valid UTF-16 by concatenating invalid UTF-16 parts. This is both in JS and Java.

Do the Literals in the binaryen representation correspond to WTF-16 chars or to UTF-8 bytes?

@kripken
Copy link
Member

kripken commented Mar 15, 2024

Do the Literals in the binaryen representation correspond to WTF-16 chars or to UTF-8 bytes?

I believe the intention was that they correspond to bytes, though that was the plan when we intended to support more than JS strings (as the stringref proposal did). If all we need to support is WTF-16 for JS then perhaps we can simplify that.

Tested the behavior and in fact it is just the concatenation of the WTF-16 chars, so you can get a valid UTF-16 by concatenating invalid UTF-16 parts. This is both in JS and Java.

Thanks, that makes sense. That makes me think that maybe a good way to get coverage here is to find the JS tests for String.concat and port them to wasm. Hopefully they are in a convenient form in test262. In any case, let's leave that for later; for now all the recent PRs at least handle the simple ascii cases properly.

@kripken
Copy link
Member

kripken commented Mar 15, 2024

(CI / lint seems to find some errors here which need to be fixed before landing - whitespace?)

@rluble
Copy link
Contributor Author

rluble commented Mar 15, 2024

Still trying to get used to the workflow. Do you use the git clang-format to format the sources? no parameters? I don't think my last push fixed the formatting.

@kripken
Copy link
Member

kripken commented Mar 15, 2024

Doing git clang-format origin/main should work. You can also apply the diff from CI though, or fix it up manually.

@kripken
Copy link
Member

kripken commented Mar 15, 2024

To do it manually, copy the diff from CI which is

diff --git a/src/wasm-interpreter.h b/src/wasm-interpreter.h
index 99733bc64..13ac19d92 [10](https://github.com/WebAssembly/binaryen/actions/runs/8290790765/job/22689511066?pr=6403#step:6:11)0644
--- a/src/wasm-interpreter.h
+++ b/src/wasm-interpreter.h
@@ -1919,7 +1919,7 @@ public:
     }
     return Literal(int32_t(data->values.size()));
   }
-  Flow visitStringConcat(StringConcat* curr) { 
+  Flow visitStringConcat(StringConcat* curr) {
     NOTE_ENTER("StringConcat");
     Flow flow = visit(curr->left);
     if (flow.breaking()) {
@@ -[19](https://github.com/WebAssembly/binaryen/actions/runs/8290790765/job/22689511066?pr=6403#step:6:20)86,7 +1986,7 @@ public:
 
     return Literal(int32_t(refData->values.size()));
   }
- 
+
   Flow visitStringEq(StringEq* curr) {
     NOTE_ENTER("StringEq");
     Flow flow = visit(curr->left);

into a local file, say a.diff, and then do git apply a.diff (from the root dir of the repo).

@kripken kripken enabled auto-merge (squash) March 15, 2024 03:30
@kripken kripken merged commit c166ca0 into WebAssembly:main Mar 15, 2024
@gkdn gkdn mentioned this pull request Aug 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants