fommil/multiblas

multiblas

Optimal runtime selection of CPU / GPU BLAS (with an obligatory reference to the 5th Element).

Need

BLAS (1979) is a set of basic linear algebra operations such as vector dot product and matrix multiplication. BLAS is used extensively in high-performance computing, with machine-optimised implementations available from CPU vendors (e.g. Intel, AMD, Apple) and from open source projects (e.g. ATLAS and OpenBLAS).

More recently, BLAS has been implemented for GPUs by NVIDIA and AMD (the latter using the OpenCL standard). Such implementations result in dramatic speedups for larger arrays, but are much slower than CPU implementations for small to medium sized arrays (fewer than ~100,000 entries).

Clearly, BLAS calls are best served by machine-optimised CPU implementations for small to medium arrays, and by GPU implementations for larger arrays.
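The selection rule described above can be sketched in a few lines of C. This is only an illustration: `backend_t`, `choose_backend` and `GPU_THRESHOLD_ENTRIES` are hypothetical names, and the threshold is simply the rough ~100,000 figure quoted above, whereas a real crossover point would have to be measured per machine and per routine.

```c
#include <stddef.h>

/* Hypothetical backend identifiers, for illustration only. */
typedef enum { BACKEND_CPU, BACKEND_GPU } backend_t;

/* Assumed crossover point, based on the rough ~100,000 entries figure.
 * A tuned value would differ per machine and per BLAS routine. */
#define GPU_THRESHOLD_ENTRIES ((size_t)100000)

/* Choose a backend from the number of entries in the largest operand. */
backend_t choose_backend(size_t entries) {
    return entries < GPU_THRESHOLD_ENTRIES ? BACKEND_CPU : BACKEND_GPU;
}
```

For a vector operation such as DDOT, `entries` would be the vector length; for DGEMM, one possible choice is the entry count of the largest matrix involved.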

Approach

MultiBLAS is a proof-of-concept that delegates between two implementations of BLAS using dlopen/dlsym on UNIX systems and LoadLibrary/GetProcAddress on Windows systems.
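As a rough sketch of how such a lookup might work, the C fragment below opens a shared library at runtime and resolves a symbol from it, using dlopen/dlsym on UNIX and LoadLibrary/GetProcAddress on Windows. The helper names `load_symbol` and `load_ddot` are hypothetical and not taken from the MultiBLAS source; the `ddot_fn` type follows the standard CBLAS signature of `cblas_ddot`.

```c
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
#else
#include <dlfcn.h>
#endif

/* Function pointer type matching the standard CBLAS cblas_ddot signature. */
typedef double (*ddot_fn)(const int n, const double *x, const int incx,
                          const double *y, const int incy);

/* Hypothetical helper: open the library at `path` and look up `symbol`.
 * Returns NULL if the library or the symbol cannot be found. The library
 * handle is intentionally never closed, so the pointer stays valid for
 * the life of the process. */
void *load_symbol(const char *path, const char *symbol) {
#ifdef _WIN32
    HMODULE lib = LoadLibraryA(path);
    if (lib == NULL) return NULL;
    return (void *)GetProcAddress(lib, symbol);
#else
    void *lib = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (lib == NULL) return NULL;
    return dlsym(lib, symbol);
#endif
}

/* Hypothetical convenience wrapper: resolve cblas_ddot from one backend. */
ddot_fn load_ddot(const char *path) {
    return (ddot_fn)load_symbol(path, "cblas_ddot");
}
```

A delegate could call `load_ddot` once per backend at startup and keep both pointers. On some Linux systems the program may need to be linked with `-ldl`.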

The first milestone of the project is to implement delegation and machine-specific performance tuning, and to demonstrate performance results for DDOT and DGEMM on OS X, Linux and Windows.

It is very important that the BLAS and CBLAS APIs are preserved. Decades of middleware have been built on the BLAS API, and it has been exposed to other languages, e.g. by netlib-java.
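To illustrate what preserving the API could look like, the sketch below exports a function with the standard `cblas_ddot` name and signature and forwards each call to one of two backends, so existing callers need no changes. Everything other than the `cblas_ddot` signature is an assumption made for illustration: `reference_ddot`, `small_ddot`, `large_ddot` and `ddot_crossover` are hypothetical, and a plain loop stands in for both backends where a real delegate would install pointers resolved from the CPU and GPU libraries.

```c
#include <stddef.h>

typedef double (*ddot_fn)(const int n, const double *x, const int incx,
                          const double *y, const int incy);

/* Plain reference loop used as a stand-in backend.
 * For brevity it assumes positive strides. */
static double reference_ddot(const int n, const double *x, const int incx,
                             const double *y, const int incy) {
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        sum += x[(size_t)i * (size_t)incx] * y[(size_t)i * (size_t)incy];
    }
    return sum;
}

/* Hypothetical backend slots; a real delegate would fill these with
 * pointers obtained from the CPU and GPU libraries at startup. */
static ddot_fn small_ddot = reference_ddot;
static ddot_fn large_ddot = reference_ddot;

/* Assumed crossover length; in practice tuned per machine. */
static int ddot_crossover = 100000;

/* Same name and signature as standard CBLAS, so callers are unchanged. */
double cblas_ddot(const int n, const double *x, const int incx,
                  const double *y, const int incy) {
    ddot_fn impl = (n < ddot_crossover) ? small_ddot : large_ddot;
    return impl(n, x, incx, y, incy);
}
```

Because the delegate is itself a CBLAS implementation, middleware that links against CBLAS could in principle link against it instead without source changes.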

The second milestone (requires funding) will cover the full BLAS API.

Future milestones will deal with issues such as dynamic load balancing, minimising startup time from cold, auto-batching and beyond.

Donations

Please consider supporting this open source project with a donation:

Donate via PayPal

Contributing

Contributors are encouraged to fork this repository and issue pull requests. Contributors implicitly agree to assign an unrestricted licence to Sam Halliday, but retain the copyright of their code (this means we both have the freedom to update the licence for those contributions).
