Sub-optimal codegen for f32 tuples #32045

@bsteinb

Similar to the behavior observed in #32031, tuples of f32 and f64 seem to be passed to functions in general-purpose registers (GPRs).

The f32 tuple takes an especially large hit, since the two f32 values are passed inside a single 64-bit GPR and have to be extracted and compressed via shift and OR instructions. Even with inlining turned on, this does not go away.

The f64 tuple is not as bad as the f32 tuple. Without inlining, it does some moves to and from the SIMD registers; with inlining turned on, the tuple is kept in a SIMD register and the loop is vectorized and unrolled.
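
For illustration, here is a minimal sketch of the kind of code that exercises this calling pattern. It is not the original playpen example; the function names (`add_f32`, `add_f64`, `sum_f32`, `sum_f64`) are hypothetical, and the assembly produced will depend on the compiler version and target.

```rust
// Hypothetical reproduction sketch; not the original playpen code.
// #[inline(never)] keeps the calls as real calls, so the way the
// tuple arguments are passed stays visible in the emitted assembly.

#[inline(never)]
fn add_f32(a: (f32, f32), b: (f32, f32)) -> (f32, f32) {
    (a.0 + b.0, a.1 + b.1)
}

#[inline(never)]
fn add_f64(a: (f64, f64), b: (f64, f64)) -> (f64, f64) {
    (a.0 + b.0, a.1 + b.1)
}

// Accumulate a slice of f32 pairs through the non-inlined helper.
fn sum_f32(xs: &[(f32, f32)]) -> (f32, f32) {
    let mut acc = (0.0f32, 0.0f32);
    for &x in xs {
        acc = add_f32(acc, x);
    }
    acc
}

// Same loop for f64 pairs, for comparison.
fn sum_f64(xs: &[(f64, f64)]) -> (f64, f64) {
    let mut acc = (0.0f64, 0.0f64);
    for &x in xs {
        acc = add_f64(acc, x);
    }
    acc
}

fn main() {
    let xs32 = vec![(1.0f32, 2.0f32); 4];
    let xs64 = vec![(1.0f64, 2.0f64); 4];
    println!("{:?}", sum_f32(&xs32));
    println!("{:?}", sum_f64(&xs64));
}
```

Compiling with `rustc -O --emit asm` and comparing the bodies of the two `add_*` functions should show how each tuple type is passed. Removing the `#[inline(never)]` attributes gives the inlined variant for comparison.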

EDIT: Forgot to link to the code example on playpen.

Labels: A-codegen (Area: Code generation)
