[aarch64] add sbgemm inner product op #1768
Conversation
Hi @jondea, @nSircombe, @cfRod, I would appreciate it if any of you could review this PR. Thank you!
Hi @snadampal, thanks for the patch! I'm just looking into this now. So that I can test it, what benchdnn run lines are you using, and in particular which ones do you want to enable with this patch?
Hi @jondea, I have run the oneDNN inner product tests in both fastmath and normal mode to make sure there are no regressions. The main addition in this PR is to allow
Sorry for the slow reply. It looks great in principle, and I'm glad it's such a simple change to enable this! My only concern is removing the check that the memory format is any. I thought oneDNN expects you to change the memory format only if it is set to any? For reference, on the PyTorch side, are you planning to pass any or a specific format?
Hi @jondea, thanks for the review. I know ACL expects. From the PyTorch side, I will pass format::any for the non-compile path, and the reordered format (whatever ACL returns as the dst format for the reordered/packed weights) if the graph was compiled and uses pre-packed weights. So, how about we check for either? I'm suggesting the above blocked formats because this is what I see during the ACL reorders for non-fastmath fp32 weights:
If you wanted a quick workaround, you could pass in format any from PyTorch when you re-init the primitive, even though you know it's going to be that format. Nevertheless, this change does make sense in the long term. Instead of hard-coding it in oneDNN, I think it makes sense to check that the format passed into init matches the one returned from ACL. That would be more robust to future changes in ACL.
(force-pushed from 600bb98 to 9745998)
Hi @jondea, given this is the right direction, I extended the supported formats to. I didn't want to work around it in ideep as that would be too hacky a solution; since there won't be any on-the-fly reorders in the compiled case, we would need to hack the weight md format in multiple places, during create as well as exec, which I don't think is a good option.
I'm trying to add a few tests for ip in sbgemm mode. Could someone please point me to documentation on how to add benchdnn unit tests for new configurations? Thank you!
I think we need a bit more logic inside. To be concrete, as it stands, I think this will fail benchdnn if you run with a
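For reference, benchdnn run lines for the inner product driver look roughly like the following. This is a usage sketch, not a line taken from the thread: the `Ab8a` weight tag, the problem descriptor, and the use of `ONEDNN_DEFAULT_FPMATH_MODE` here are assumptions, and exact flag syntax can differ between oneDNN versions.

```shell
# Exercise the inner product driver with both the implementation-chosen
# ("any") and an explicit blocked weight tag, under bf16 fastmath mode.
# The problem descriptor mb32ic512oc1024 is an arbitrary example shape.
ONEDNN_DEFAULT_FPMATH_MODE=BF16 ./benchdnn --ip --dir=FWD_B \
    --wtag=any,Ab8a mb32ic512oc1024
```

A run with an explicit blocked `--wtag` is the case the reviewer is flagging: the primitive must either accept that exact layout or return unimplemented so benchdnn can skip it.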
(force-pushed from 052924f to d0389c8)
Hi @jondea, I addressed your feedback about the error checks for the blocked format and also added unit tests. Please note that for supporting the unit test with
Thanks for the changes! I just have two minor comments.
if ((weights_format_kind_received == format_kind::blocked)
        && !(dnnl_memory_desc_equal(
                &weights_md_received, &weights_md_))) {
    return status::unimplemented;
(force-pushed from 0eee71e to d66926a)
Looks good to me, thank you for the contribution!
Looks like the CI failures are not related to this PR; I have seen CI failing for other PRs too, with the same errors.
Hi @snadampal, thanks for letting us know. We are aware of this issue, and a fix has been submitted to the internal repository.
@igorsafo, thanks for confirming. I guess CI will be triggered for this PR once the internal issue is fixed, right?
Once the fix has landed, we will need to rebase this branch on top of the latest
@snadampal, the fixes landed in the
With weights pre-packing enabled in torch.compile(), the weights arrive already reordered and in oneDNN format, so this allows format_kind::blocked as one of the supported formats for the ACL inner product primitive.
(force-pushed from d66926a to ffdbe04)
Hi @igorsafo, done, I have rebased the PR.
Hi @igorsafo, please let me know if anything else needs to be covered for this PR to get merged. Thank you.
Hi @igorsafo, we planned this PR for oneDNN 3.4. It was tagged for v3.4 but is not yet merged; it has been approved and CI is passing, so could you please prioritize it? Thank you!
Hi @snadampal, sorry for the wait; the changes were promoted into main and backported into rls-v3.4.
Description
Added sbgemm inner product op to enable PyTorch torch.compile() and bf16 fastmath kernels to work together on aarch64.
Fixes # (github issue)
Checklist
General
Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit? Same pass rate as in the main branch.
Performance improvements
New features
Bug fixes
RFC PR