Monomorphize all the things #6760

kripken · 2024-07-17T20:11:34Z

Previously call operands were monomorphized (considered as part of the
call context, so we can create a specialized function with those operands
fixed) if they were constant or had a different type than the function
parameter's type. This generalizes that to pull in pretty much all the code
we possibly can, including nested code. For example:

(call $foo
  (struct.new $struct
    (i32.const 10)
    (local.get $x)
    (local.get $y)
  )
)

This can turn into

(call $foo_mono
  (local.get $x)
  (local.get $y)
)

The struct.new and even one of the struct.new's children are moved into the
called function, replacing the original ref argument with two other ones. If the
original called function was this:

(func $foo (param $ref (ref ..))
  ..
)

then the monomorphized function looks like this:

(func $foo_mono (param $x i32) (param $y i32)
  (local $ref (ref ..))
  (local.set $ref
    (struct.new $struct
      (i32.const 10)
      (local.get $x)
      (local.get $y)
    )
  )
  ..
)

The struct.new and its constant child appear here, and the non-constant
children are read from the new parameters.
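
As a rough analogy in plain C++ (not Binaryen code, and not how the pass is implemented), the transformation above resembles specializing a function so that it builds its own argument. The names `Struct`, `foo`, and `fooMono` are stand-ins for `$struct`, `$foo`, and `$foo_mono`, and the body of `foo` is invented purely for illustration:

```cpp
#include <cassert>

// Stand-in for $struct: three i32 fields.
struct Struct {
  int a;
  int x;
  int y;
};

// Stand-in for $foo: receives an already-built struct.
// The body is made up for this sketch.
int foo(const Struct& ref) { return ref.a + ref.x * ref.y; }

// Stand-in for $foo_mono: the construction of the struct (including the
// constant 10) now lives inside the function, and only the values that
// were local.gets in the caller remain as parameters.
int fooMono(int x, int y) {
  Struct ref{10, x, y};
  return ref.a + ref.x * ref.y;
}

// Usage: both forms compute the same value for the same x and y.
inline void demoMono() {
  assert(foo(Struct{10, 3, 4}) == fooMono(3, 4));
}
```

Once the construction is inside `fooMono`, an optimizer can see that the first field is always 10 and that the struct never leaves the function, which is the kind of opportunity the PR is after.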

To do this, generalize the code that creates the call context to accept
everything except code that is impossible to copy (like a local.get) or that
would be tricky and likely not worthwhile (like another call or a tuple). Also
check for effect interactions, as this code motion does some reordering.
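
To see why the effect check matters, here is a small illustrative sketch in plain C++ (not Binaryen code). It assumes a call whose first operand reads memory and whose second operand writes it; all names are hypothetical. Moving the read into the called function makes it execute after the write, which changes the result, so a move like this has to be rejected:

```cpp
#include <cassert>

// An operand with a side effect: writes to memory and yields 0.
int storeFive(int& memory) {
  memory = 5;
  return 0;
}

int callee(int a, int b) { return a + b; }

// Hypothetical specialized callee in which the read of memory was moved
// inside, so it now happens after all remaining operands were evaluated.
int calleeMono(int& memory, int b) {
  int a = memory;
  return a + b;
}

// Original order: read first, then write, then call.
int originalCall(int& memory) {
  int a = memory;
  int b = storeFive(memory);
  return callee(a, b);
}

// After the (invalid) move: write first, then the read inside the callee.
int reorderedCall(int& memory) {
  int b = storeFive(memory);
  return calleeMono(memory, b);
}

// Usage: starting from memory == 1, the two versions disagree.
inline void demoReordering() {
  int m1 = 1;
  int m2 = 1;
  assert(originalCall(m1) == 1);
  assert(reorderedCall(m2) == 5);
}
```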

For this to work, we need to adjust how we compute the costs we
compare when deciding what to monomorphize. Previously we just
compared the called function to the monomorphized called function,
which was good enough when the call context only contained consts,
but now it can contain arbitrarily nested code. The proper comparison
is between these two:

Old function + call context
New monomorphized function

Including the call context makes this a fair comparison. In the example
above, the struct.new and the i32.const are part of the call context,
and so they end up in the monomorphized function. If we didn't also count
them on the old function's side, we'd decide not to optimize anything with
a large context.
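
The comparison can be sketched as follows. This is a simplified model, not the pass's actual code: the `Costs` struct, its field names, and the plain "strictly cheaper" threshold are assumptions made for illustration, and the real pass measures costs on the IR and may apply a different threshold.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical cost summary, in arbitrary units.
struct Costs {
  std::size_t oldFunction;   // the original called function
  std::size_t callContext;   // the code moved out of the caller
  std::size_t monoFunction;  // the specialized function after optimization
};

// Fair comparison: old function + call context vs. new function.
inline bool worthIt(const Costs& c) {
  return c.monoFunction < c.oldFunction + c.callContext;
}

// The earlier comparison, which ignores the context on the old side.
inline bool worthItIgnoringContext(const Costs& c) {
  return c.monoFunction < c.oldFunction;
}

// Usage: with a sizable context, ignoring it rejects a case that the fair
// comparison accepts.
inline void demoCosts() {
  Costs c{100, 20, 110};
  assert(worthIt(c));
  assert(!worthItIgnoringContext(c));
}
```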

The new functionality is tested in a new file. A few parts of existing
tests needed changes so that they don't become pointless after this
improvement, namely by replacing things that we now optimize with things
that we don't, like replacing an i32.eqz with a local.get. There are also a
handful of test outcomes that change in CAREFUL mode due to the
new cost analysis.

This is the last major work here. After this I think all that is left is
heuristics on what size functions to work on, etc.

This reverts commit 8da6c96.

tlively

Comments on code so far. Will look at tests next.

tlively · 2024-07-17T21:14:05Z

src/passes/Monomorphize.cpp

+    // Go in reverse post-order as explained earlier, noting what cannot be
+    // moved into the context, and while accumulating the effects that are not
+    // moving.
+    std::unordered_set<Expression*> unMovable;


Suggested change

-    std::unordered_set<Expression*> unMovable;
+    std::unordered_set<Expression*> unmovable;

No need to capitalize the M, I think.

Hmm, good point. Looking into this, both unmovable and immovable are words, and apparently immovable fits physical objects better, so I switched to that.

tlively · 2024-07-17T21:15:34Z

src/passes/Monomorphize.cpp

+      // that because if a parent is unmovable then we can't move the children
+      // into the context (if we did, they would execute after the parent, but
+      // it needs their values).
+      auto currUnMovable = unMovable.count(curr) > 0;


Just as short and slightly clearer.

Suggested change

-      auto currUnMovable = unMovable.count(curr) > 0;
+      bool currUnMovable = unMovable.count(curr) > 0;

tlively · 2024-07-17T21:24:45Z

src/passes/Monomorphize.cpp

+      if (currUnMovable) {
+        // Regardless of whether this was marked unmovable because of the
+        // parent, or because we just found it cannot be moved, accumulate the
+        // effects, and also mark its immediate children (so that we do the same
+        // when we get to them).
+        nonMovingEffects.visit(curr);
+        for (auto* child : ChildIterator(curr)) {
+          unMovable.insert(child);
+        }
+      }
+    }


Do we get much benefit from moving partial argument expressions into the function? It seems that leaving some of the leaf expressions as local.gets of arguments will usually inhibit any further optimization of the full argument expression in the callee beyond what could have been done in the caller.

At least sometimes we do. In the example in the top comment, we move the struct.new into the called function, where it might not escape and therefore be optimized out entirely. We don't know that value, but we do still see the big picture better when we move code into the target.

Another example: if a call sends (i32.eqz (local.get $anything)) then we can at least move the i32.eqz, and then the optimizer can see that the result has at most 1 bit set.

In general, there is more opportunity to optimize the more we move, I think, but you might be right that opportunities decrease with unknown values as leaves. When I look into tuning I might add a flag to control this (though I hope not to need many flags).

Makes sense, thanks!

tlively · 2024-07-17T21:36:24Z

src/passes/Monomorphize.cpp

+    if (Properties::isControlFlowStructure(curr)) {
+      // We can in principle move entire control flow structures with their
+      // children, but for simplicity stop when we see one rather than look
+      // inside to see if we could transfer all its contents.
+      return false;
+    }


But we already scan all its children, so the code would already handle moving entire control flow structures correctly, right?

The problem would be that in general we can move a parent without moving the children. (i32.eqz (local.get $anything)) can be partially optimized, with the eqz moved but not the local.get. But

(block $b (br $b) )

will break if we move the block but not the br. So if we want to move the block we'd need to look ahead (really behind since we traverse in reverse) to see that all the children work out. That might be worth doing but I'm not sure, and it adds complexity.

I see, and trying to partially move an if-else would also cause problems now that I think about it more.
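
The marking logic discussed in this thread can be sketched with a toy expression tree. This is a simplified model rather than the code in Monomorphize.cpp: the `Expr` type and the `movableByItself` flag are hypothetical, and effect tracking is left out. It shows the two rules from the discussion: an immovable parent forces its children to stay in the caller, while a movable parent may still have immovable children.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Toy expression node. movableByItself is false for things like a
// local.get, a call, or (for simplicity) any control flow structure.
struct Expr {
  std::string name;
  bool movableByItself;
  std::vector<Expr*> children;
};

// Collect nodes in post-order (children before their parent).
inline void postOrder(Expr* curr, std::vector<Expr*>& out) {
  for (auto* child : curr->children) {
    postOrder(child, out);
  }
  out.push_back(curr);
}

// Walk in reverse post-order, so each parent is seen before its children.
inline std::unordered_set<Expr*> findImmovable(Expr* root) {
  std::vector<Expr*> order;
  postOrder(root, order);
  std::unordered_set<Expr*> immovable;
  for (auto it = order.rbegin(); it != order.rend(); ++it) {
    Expr* curr = *it;
    bool currImmovable = immovable.count(curr) > 0;
    if (!currImmovable && !curr->movableByItself) {
      immovable.insert(curr);
      currImmovable = true;
    }
    if (currImmovable) {
      // The children must execute before the parent, so they stay too.
      for (auto* child : curr->children) {
        immovable.insert(child);
      }
    }
  }
  return immovable;
}

// Usage: (i32.eqz (local.get)) moves the eqz but keeps the local.get.
inline void demoImmovable() {
  Expr get{"local.get", false, {}};
  Expr eqz{"i32.eqz", true, {&get}};
  auto result = findImmovable(&eqz);
  assert(result.count(&eqz) == 0);
  assert(result.count(&get) == 1);
}
```

Treating a block as immovable in this model also keeps a nested br in the caller, which avoids the broken partial move described above.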

tlively

Tests look good!

kripken added 30 commits July 15, 2024 12:49

start

33b1861

test

d847cf0

test

bbc569d

test

b4d4451

prep

f651bd1

write

4adb0e2

write

68a86da

write

6847c97

write

8da6c96

Revert "write"

29ef15c

This reverts commit 8da6c96.

work

4a12032

fix

c5919de

work

8bf9457

work

eef0022

work

76e9355

work

4335682

work

145f503

work

ee49a74

fix

ef15ad2

[NFC] Clarify and standardize order in flexibleCopy

0f499e9

Merge branch 'copy-order' into mono-all-the-things

1a7f449

fix

65907f8

simpl

e62123c

test

e7316de

Merge remote-tracking branch 'origin/main' into mono-all-the-things

aa157d1

problem!

190a63e

hmm

aa97148

possible fix

599eca5

work

72cea41

work

c2deb77

kripken added 17 commits July 16, 2024 16:03

work

cc80a37

work

a0b754c

work

a79e2ef

rework

0f1aba6

fix

47088bf

work

8a5c211

work

59944e4

work

90f03d0

simpl

3a9ce15

format

9f4505c

Merge remote-tracking branch 'origin/main' into mono-all-the-things

4e96c9a

update after merge

c43f8cf

work

c4fe24b

work

0369457

work

f1441fc

simpl

eab406b

test

e7ceffa

kripken requested a review from tlively July 17, 2024 20:11

Update another test

25bc6d8

tlively reviewed Jul 17, 2024

kripken added 3 commits July 17, 2024 15:10

unMovable => immovable

70fab3d

feedback: auto => bool

fc4072c

extra comments after review discussion

298ca5b

tlively approved these changes Jul 18, 2024

test/lit/passes/monomorphize-consts.wast (outdated, resolved)

Update test/lit/passes/monomorphize-consts.wast

040831f

Co-authored-by: Thomas Lively <[email protected]>

kripken merged commit b91966f into WebAssembly:main Jul 18, 2024
13 checks passed

kripken deleted the mono-all-the-things branch July 18, 2024 18:21

gkdn mentioned this pull request Aug 31, 2024

stringconsts gkdn/binaryen#1

Closed

kripken mentioned this pull request Jan 29, 2025

Remove stale TODO in Monomorphize.cpp [NFC] #7249

Merged

kripken added a commit that referenced this pull request Jan 29, 2025

Remove stale TODO in Monomorphize.cpp [NFC] (#7249)

d0321bc

This was done in #6760 ("monomorphize all the things").
