Skip to content
This repository was archived by the owner on Dec 22, 2021. It is now read-only.
This repository was archived by the owner on Dec 22, 2021. It is now read-only.

Load with extension operation #23

@Maratyszcza

Description

@Maratyszcza

There is no efficient way to represent loading of narrow-type vector with extension to wide-type vector, e.g. Load 4 uint16_t values and extend to 4 x uint32_t vector. To simulate such operation with the current API, we'd need to load values as a 64-bit scalar (potentially spilling to two registers on 32-bit architectures), transfer to SIMD register (expensive!), and then use shuffles to get it into proper places. With the native SIMD ISA, it can be implemented more efficiently:

  • PMOVZXWD xmm, [mem] on x86 with SSE4.1
  • MOVQ xmm, [mem] + PXOR xmm0, xmm0 + PUNPCKLWD xmm, xmm0 on SSE2
  • VLD1.16 {dX}, [rAddr] + VMOVL.U16 qX, dX on ARMv7+NEON
  • LD1 {Vx.4H}, xAddr + UXTL Vx.4S, Vx.4H on ARM64

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions