Monomorphize all the things #6760
This reverts commit 8da6c96.
Comments on code so far. Will look at tests next.
src/passes/Monomorphize.cpp
Outdated
// Go in reverse post-order as explained earlier, noting what cannot be
// moved into the context, and while accumulating the effects that are not
// moving.
std::unordered_set<Expression*> unMovable;
Suggested change:
- std::unordered_set<Expression*> unMovable;
+ std::unordered_set<Expression*> unmovable;
No need to capitalize the M, I think.
Hmm, good point. Looking into this, both `unmovable` and `immovable` are words, and apparently `immovable` fits physical objects better, so I switched to that.
src/passes/Monomorphize.cpp
Outdated
// that because if a parent is unmovable then we can't move the children
// into the context (if we did, they would execute after the parent, but
// it needs their values).
auto currUnMovable = unMovable.count(curr) > 0;
Just as short and slightly clearer.
Suggested change:
- auto currUnMovable = unMovable.count(curr) > 0;
+ bool currUnMovable = unMovable.count(curr) > 0;
src/passes/Monomorphize.cpp
Outdated
  if (currUnMovable) {
    // Regardless of whether this was marked unmovable because of the
    // parent, or because we just found it cannot be moved, accumulate the
    // effects, and also mark its immediate children (so that we do the same
    // when we get to them).
    nonMovingEffects.visit(curr);
    for (auto* child : ChildIterator(curr)) {
      unMovable.insert(child);
    }
  }
}
Do we get much benefit from moving partial argument expressions into the function? It seems that leaving some of the leaf expressions as `local.get`s of arguments will usually inhibit any further optimization of the full argument expression in the callee beyond what could have been done in the caller.
At least sometimes we do. In the example in the top comment, we move the `struct.new` into the called function, where it might not escape and therefore be optimized out entirely. We don't know that value, but we do still see the big picture better when we move code into the target.
Another example: if a call sends `(i32.eqz (local.get $anything))` then we can at least move the `i32.eqz`, and then the optimizer can see that the result has at most 1 bit set.
In general, there is more opportunity to optimize the more we move, I think, but you might be right that opportunities decrease with unknown values as leaves. When I look into tuning I might add a flag to control this (though I hope not to need many flags).
Makes sense, thanks!
src/passes/Monomorphize.cpp
Outdated
if (Properties::isControlFlowStructure(curr)) {
  // We can in principle move entire control flow structures with their
  // children, but for simplicity stop when we see one rather than look
  // inside to see if we could transfer all its contents.
  return false;
}
But we already scan all its children, so the code would already handle moving entire control flow structures correctly, right?
The problem would be that in general we can move a parent without moving the children. `(i32.eqz (local.get $anything))` can be partially optimized, with the `eqz` moved but not the `local.get`. But
(block $b
  (br $b)
)
will break if we move the `block` but not the `br`. So if we want to move the `block` we'd need to look ahead (really behind, since we traverse in reverse) to see that all the children work out. That might be worth doing but I'm not sure, and it adds complexity.
I see, and trying to partially move an if-else would also cause problems now that I think about it more.
Tests look good!
Co-authored-by: Thomas Lively <[email protected]>
This was done in #6760 ("monomorphize all the things").
Previously call operands were monomorphized (considered as part of the
call context, so we can create a specialized function with those operands
fixed) if they were constant or had a different type than the function
parameter's type. This generalizes that to pull in pretty much all the code
we possibly can, including nested code. For example:
This can turn into
The `struct.new` and even one of the `struct.new`'s children are moved into the
called function, replacing the original ref argument with two other ones. If the
original called function was this:
then the monomorphized function looks like this:
The `struct.new` and its constant child appear here, and we read the
parameters.
To do this, generalize the code that creates the call context to accept
everything except what is impossible to copy (like a `local.get`) or what would
be tricky and likely not worthwhile (like another call or a tuple). Also check
for effect interactions, as this code motion does some reordering.
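The marking step can be pictured with a small standalone model. The sketch below is not Binaryen code: `Node`, `copyable`, and `findImmovable` are hypothetical names, and the effect-interaction check is left out entirely. It only shows how walking in reverse post-order (parents before children) lets an immovable parent pin its children in the caller.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical toy expression node; this is NOT Binaryen's Expression class.
struct Node {
  bool copyable;                // false for things like a local.get or a call
  std::vector<Node*> children;  // operands of this node
};

// Collect nodes in post-order: every child appears before its parent.
void postOrder(Node* node, std::vector<Node*>& out) {
  for (auto* child : node->children) {
    postOrder(child, out);
  }
  out.push_back(node);
}

// Return the nodes that must stay in the caller. Everything else in the
// operand tree could, in this simplified model, move into the call context.
std::unordered_set<Node*> findImmovable(Node* root) {
  assert(root && "operand tree must not be empty");
  std::vector<Node*> order;
  postOrder(root, order);

  std::unordered_set<Node*> immovable;
  // Reverse post-order visits each parent before any of its descendants.
  for (auto it = order.rbegin(); it != order.rend(); ++it) {
    Node* curr = *it;
    bool currImmovable = immovable.count(curr) > 0;
    if (!currImmovable && !curr->copyable) {
      // We just found that this node cannot be moved.
      immovable.insert(curr);
      currImmovable = true;
    }
    if (currImmovable) {
      // Children of an immovable parent must stay too: if they moved, they
      // would execute after the parent, but the parent needs their values.
      for (auto* child : curr->children) {
        immovable.insert(child);
      }
    }
  }
  return immovable;
}
```

In this model, a `struct.new` with one constant child and one `local.get` child leaves only the `local.get` in the caller, while a call with a constant operand keeps both the call and its operand in the caller.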
For this to work, we need to adjust how we compute the costs we
compare when deciding what to monomorphize. Previously we just
compared the called function to the monomorphized called function,
which was good enough when the call context only contained consts,
but now it can contain arbitrarily nested code. The proper comparison
is between these two: the called function plus the call context, and the
monomorphized function.
Including the call context makes this a fair comparison. In the example
above, the `struct.new` and the `i32.const` are part of the call context,
and so they are in the monomorphized function, so if we didn't count
them on the other side of the comparison as well, we'd decide not to
optimize anything with a large context.
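A rough numeric sketch of why the context has to be counted on both sides follows. It assumes the two sides are "called function plus call context" and "monomorphized function", and that a strictly lower cost is what counts as a win. `Cost` and both function names are hypothetical, and the real pass may apply different thresholds.

```cpp
// Hypothetical cost unit for illustration; not Binaryen's cost type.
using Cost = unsigned;

// Old-style comparison: looks only at the two function bodies. Code that was
// moved in from the call site inflates the monomorphized side unfairly.
bool worthItIgnoringContext(Cost calledFunction, Cost monomorphized) {
  return monomorphized < calledFunction;
}

// Comparison that includes the call context: the work the caller used to do
// for its operands is charged to the "before" side as well.
bool worthItWithContext(Cost calledFunction,
                        Cost callContext,
                        Cost monomorphized) {
  return monomorphized < calledFunction + callContext;
}
```

With a called function of cost 10, a call context of cost 5, and a monomorphized function of cost 12, the first comparison rejects the specialization (12 is not below 10) while the second accepts it (12 is below 15).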
The new functionality is tested in a new file. A few parts of existing
tests needed changes to not become pointless after this improvement,
namely by replacing stuff that we now optimize with things that we
don't, like replacing an `i32.eqz` with a `local.get`. There are also a
handful of test outcomes that change in CAREFUL mode due to the
new cost analysis.
This is the last major work here. After this I think all that is left is
heuristics on what size functions to work on, etc.