This repository was archived by the owner on Dec 22, 2021. It is now read-only.
There is currently no efficient way to express loading a vector of narrow elements and extending it to a vector of wider elements, e.g. loading 4 uint16_t values and extending them to a 4 x uint32_t vector. To simulate such an operation with the current API, we'd need to load the values as a 64-bit scalar (potentially spilling to two registers on 32-bit architectures), transfer it to a SIMD register (expensive!), and then use shuffles to move the elements into their proper places. With the native SIMD ISA, it can be implemented more efficiently: