Skip to content

Monomorphize all the things #6760

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 52 commits into from
Jul 18, 2024
Merged

Conversation

kripken
Copy link
Member

@kripken kripken commented Jul 17, 2024

Previously call operands were monomorphized (considered as part of the
call context, so we can create a specialized function with those operands
fixed) if they were constant or had a different type than the function
parameter's type. This generalizes that to pull in pretty much all the code
we possibly can, including nested code. For example:

(call $foo
  (struct.new $struct
    (i32.const 10)
    (local.get $x)
    (local.get $y)
  )
)

This can turn into

(call $foo_mono
  (local.get $x)
  (local.get $y)
)

The struct.new and even one of the struct.new's children is moved into the
called function, replacing the original ref argument with two other ones. If the
original called function was this:

(func $foo (param $ref (ref ..))
  ..
)

then the monomorphized function then looks like this:

(func $foo_mono (param $x i32) (param $y i32)
  (local $ref (ref ..))
  (local.set $ref
    (struct.new $struct
      (i32.const 10)
      (local.get $x)
      (local.get $y)
    )
  )
  ..
)

The struct.new and its constant child appear here, and we read the
parameters.

To do this, generalize the code that creates the call context to accept
everything that is impossible to copy (like a local.get) or that would be
tricky and likely unworthwhile (like another call or a tuple). Also check
for effect interactions, as this code motion does some reordering.

For this to work, we need to adjust how we compute the costs we
compare when deciding what to monomorphize. Before we just
compared the called function to the monomorphized called function,
which was good enough when the call context only contained consts,
but now it can contain arbitrarily nested code. The proper comparison
is between these two:

  • Old function + call context
  • New monomorphized function

Including the call context makes this a fair comparison. In the example
above, the struct.new and the i32.const are part of the call context,
and so they are in the monomorphized function, so if we didn't count
them in other function we'd decide not to optimize anything with a large
context.

The new functionality is tested in a new file. A few parts of existing
tests needed changes to not become pointless after this improvement,
namely by replacing stuff that we now optimize with things that we
don't like replacing an i32.eqz with a local.get. There are also a
handful of test outcomes that change in CAREFUL mode due to the
new cost analysis.

This is the last major work here. After this I think all that is left is
heuristics on what size functions to work etc.

@kripken kripken requested a review from tlively July 17, 2024 20:11
Copy link
Member

@tlively tlively left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments on code so far. Will look at tests next.

// Go in reverse post-order as explained earlier, noting what cannot be
// moved into the context, and while accumulating the effects that are not
// moving.
std::unordered_set<Expression*> unMovable;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::unordered_set<Expression*> unMovable;
std::unordered_set<Expression*> unmovable;

No need to capitalize the M, I think.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, good point. Looking into this, both unmovable and immovable are words, and apparently immovable fits physical objects better, so I switched to that.

// that because if a parent is unmovable then we can't move the children
// into the context (if we did, they would execute after the parent, but
// it needs their values).
auto currUnMovable = unMovable.count(curr) > 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just as short and slightly clearer.

Suggested change
auto currUnMovable = unMovable.count(curr) > 0;
bool currUnMovable = unMovable.count(curr) > 0;

Comment on lines 323 to 333
if (currUnMovable) {
// Regardless of whether this was marked unmovable because of the
// parent, or because we just found it cannot be moved, accumulate the
// effects, and also mark its immediate children (so that we do the same
// when we get to them).
nonMovingEffects.visit(curr);
for (auto* child : ChildIterator(curr)) {
unMovable.insert(child);
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we get much benefit from moving partial argument expressions into the function? It seems that leaving some of the leaf expressions as local.gets of arguments will usually inhibit any further optimization of the full argument expression in the callee beyond what could have been done in the caller.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least sometimes we do. In the example in the top comment, we move the struct.new into the called function, where it might not escape and therefore be optimized out entirely. We don't know that value, but we do still see the big picture better when we move code into the target.

Another example: if a call sends (i32.eqz (local.get $anything)) then we can at least move the i32.eqz, and then the optimizer can see that that has at most 1 bit set.

In general, there is more opportunity to optimize the more we move, I think, but you might be right that opportunities decrease with unknown values as leaves. When I look into tuning I might add a flag to control this (though I hope not to need many flags).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, thanks!

Comment on lines 404 to 409
if (Properties::isControlFlowStructure(curr)) {
// We can in principle move entire control flow structures with their
// children, but for simplicity stop when we see one rather than look
// inside to see if we could transfer all its contents.
return false;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we already scan all its children, so the code would already handle moving entire control flow structures correctly, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem would be that in general we can move a parent without moving the children. (i32.eqz (local.get $anything)) can be partially optimized, with the eqz moved but not the local.get. But

(block $b
  (br $b)
)

will break if we move the block but not the br. So if we want to move the block we'd need to look ahead (really behind since we traverse in reverse) to see that all the children work out. That might be worth doing but I'm not sure, and it adds complexity.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, and trying to partially move an if-else would also cause problems now that I think about it more.

Copy link
Member

@tlively tlively left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests look good!

@kripken kripken merged commit b91966f into WebAssembly:main Jul 18, 2024
13 checks passed
@kripken kripken deleted the mono-all-the-things branch July 18, 2024 18:21
@gkdn gkdn mentioned this pull request Aug 31, 2024
kripken added a commit that referenced this pull request Jan 29, 2025
This was done in #6760 ("monomorphize all the things").
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants