Skip to content

Add optimized BGEMM for NEOVERSEN2 target #5399

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 25, 2025

Conversation

Mousius
Copy link
Contributor

@Mousius Mousius commented Jul 24, 2025

This re-uses the existing NEOVERSEN2 8x4 sbgemm kernel to implement bgemm.

This re-uses the existing NEOVERSEN2 8x4 `sbgemm` kernel to implement `bgemm`.
@martin-frbg martin-frbg added this to the 0.3.31 milestone Jul 24, 2025
@martin-frbg martin-frbg merged commit c9204f7 into OpenMathLib:develop Jul 25, 2025
87 checks passed
@mattip
Copy link
Contributor

mattip commented Aug 3, 2025

I think there is a missing include here: it does not build in the weekly openblas-libs tests because vcvtah_f32_bf16 is not declared. It seems including both #include <arm_bf16.h> and #include <arm_neon.h> is needed. The failed run is here. Is there something different about the openblas-libs CI that is missing those includes?

@martin-frbg
Copy link
Collaborator

maybe it is a toolchain question, or you are using additional code checking options ? I have only limited options for testing the most recent Neoverse cpus - our Cirun job uses an Ubuntu Jammy image that appears to be stuck at gcc11, and the most modern hardware in the GCC Compile Farm is a N1. The code in question still appears to compile on my Pixel8 phone with gcc-15 though

@Mousius
Copy link
Contributor Author

Mousius commented Aug 3, 2025

@mattip potentially a bigger issue is that this is building BGEMM/SBGEMM and GEMV variants by default thanks to #5396 - which likely increases the binary size for no gain to numpy.

@mattip
Copy link
Contributor

mattip commented Aug 3, 2025

maybe it is a toolchain question, or you are using additional code checking options?

This fails compilation on CI for macos-arm64. When I run it locally on a macbook M1, I do not see compilation of the sbgemm_kernel_8x4_neoversen2.c kernel.

CFLAGS=' -ftrapping-math -mmacos-version-min=11.0 -fvisibility=protected -Wno-uninitialized'
make BUFFERSIZE=20 DYNAMIC_ARCH=1 QUIET_MAKE=1 USE_OPENMP=0 NUM_THREADS=64 BINARY=64 INTERFACE64=1 SYMBOLSUFFIX=64_ LIBNAMESUFFIX=64_ OBJCONV=/Users/runner/work/openblas-libs/openblas-libs/objconv/objconv SYMBOLPREFIX=scipy_ LIBNAMEPREFIX=scipy_ FIXED_LIBNAME=1 TARGET=VORTEX

and the similar command on linux-arm64 fails tests, since it does not actually have bfloat

CFLAGS=' -fvisibility=protected -Wno-uninitialized'
2025-08-03T02:05:19.0182311Z + make BUFFERSIZE=20 DYNAMIC_ARCH=1 QUIET_MAKE=1 USE_OPENMP=0 NUM_THREADS=64 BINARY=64 OBJCONV=/io/objconv/objconv SYMBOLPREFIX=scipy_ LIBNAMEPREFIX=scipy_ FIXED_LIBNAME=1 TARGET=ARMV8

building BGEMM/SBGEMM and GEMV variants by default

Right, we should probably use a DYNAMIC_LIST to not use the bfloat kernels on linux-arm64

@mattip
Copy link
Contributor

mattip commented Aug 3, 2025

Ahh, the difference is that the CI run specifies MACOSX_DEPLOYMENT_TARGET="11.0", which then allows SVE and SME here

OpenBLAS/Makefile.system

Lines 425 to 431 in d23680b

ifeq ($(OSNAME), Darwin)
ifndef MACOSX_DEPLOYMENT_TARGET
ifeq ($(ARCH), arm64)
export MACOSX_DEPLOYMENT_TARGET=11.0
export NO_SVE = 1
export NO_SME = 1
else

So for me a minimal reproducer for the build failure is this. Maybe the CI here does not care about the undefined function warning.

export MACOSX_DEPLOYMENT_TARGET="11.0"
export CFLAG=-Werror
make TARGET=NEOVERSEN2

@martin-frbg
Copy link
Collaborator

probably the xcode 15.4 toolchain ?
and dynamic_list won't change anything about bfloat16 support, you need to build with BUILD_BFLOAT16=0

@martin-frbg
Copy link
Collaborator

Looks like arm_neon.h should be included, while arm_bf16.h would be included automatically from either this or arm_sve.h if needed - #5396 had already fixed this in the N1/V1 kernels, but not here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants