Add optimized BGEMM for NEOVERSEN2 target #5399
This re-uses the existing NEOVERSEN2 8x4 `sbgemm` kernel to implement `bgemm`.
I think there is a missing include here: it does not build in the weekly openblas-libs tests because |
Maybe it is a toolchain question, or you are using additional code-checking options? I have only limited options for testing the most recent Neoverse CPUs: our Cirun job uses an Ubuntu Jammy image that appears to be stuck at gcc11, and the most modern hardware in the GCC Compile Farm is an N1. The code in question still appears to compile on my Pixel8 phone with gcc-15, though.
This fails compilation on CI for macos-arm64. When I run it locally on a macbook M1, I do not see compilation of the
and the similar command on linux-arm64 fails tests, since it does not actually have bfloat
Right, we should probably use a |
Ahh, the difference is that the CI run specifies Lines 425 to 431 in d23680b
So for me a minimal reproducer for the build failure is this. Maybe the CI here does not care about the undefined function warning.
Probably the Xcode 15.4 toolchain?
Looks like `arm_neon.h` should be included, while `arm_bf16.h` would be included automatically from either this or `arm_sve.h` if needed. #5396 had already fixed this in the N1/V1 kernels, but not here.
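To illustrate the include pattern being discussed, here is a minimal sketch. It is not the kernel code from this PR. The helper names `bf16_bits_to_float`, `sum4` and `self_check` are hypothetical, and the guard macros are an assumption about how one might keep the file buildable on non-Arm hosts. The point is only that `arm_neon.h` is included explicitly before any NEON intrinsic is used, so the compiler never sees an undeclared function.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Include the NEON header explicitly rather than relying on another
 * header to pull it in. Per the comment above, arm_bf16.h is expected
 * to come in through arm_neon.h (or arm_sve.h) when it is needed. */
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: widen raw bfloat16 bits to float.
 * bfloat16 is the upper 16 bits of an IEEE-754 binary32 value. */
static float bf16_bits_to_float(uint16_t bits) {
    uint32_t wide = (uint32_t)bits << 16;
    float out;
    memcpy(&out, &wide, sizeof out);
    return out;
}

/* Hypothetical helper: sum four floats, using NEON on AArch64 and a
 * plain scalar loop everywhere else. */
static float sum4(const float *x) {
#if defined(__ARM_NEON) && defined(__aarch64__)
    float32x4_t v = vld1q_f32(x);   /* declared in arm_neon.h */
    return vaddvq_f32(v);           /* horizontal add, AArch64 only */
#else
    float acc = 0.0f;
    for (int i = 0; i < 4; i++) acc += x[i];
    return acc;
#endif
}

/* Small self-check using values that are exactly representable. */
static void self_check(void) {
    const float x[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    assert(bf16_bits_to_float(0x3F80) == 1.0f);
    assert(sum4(x) == 10.0f);
}
```

Without the explicit include, a call such as `vld1q_f32` would be an implicitly declared function, which some toolchains treat as a warning and others as a hard error. That would be consistent with the build passing on one CI job and failing on another.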