Sub-optimal codegen for f32 tuples

Similar to the behavior observed in #32031, tuples of `f32` and `f64` seem to be passed to functions in general-purpose registers (GPRs).

The `f32` tuple takes an especially large hit, since both `f32` values are passed inside a single 64-bit GPR and have to be extracted and packed back together with `shift` and `or` instructions. Even with inlining turned on, this overhead does not go away.
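To make the shift/or cost concrete, here is a small sketch of what holding an `(f32, f32)` in one 64-bit integer register amounts to at the bit level. It assumes the first tuple field lands in the low 32 bits (as on a little-endian target such as x86-64); `pack` and `unpack` are hypothetical helpers written for illustration, not compiler or library functions.

```rust
/// Hypothetical helper: packs an (f32, f32) into one u64, the way a single
/// 64-bit GPR would hold it. Assumes the first field goes in the low 32 bits.
fn pack(pair: (f32, f32)) -> u64 {
    let lo = pair.0.to_bits() as u64;         // first field: low half
    let hi = (pair.1.to_bits() as u64) << 32; // second field: shifted into the high half
    hi | lo                                   // combine the halves with `or`
}

/// Inverse of `pack`: each field is recovered with a shift and a truncating cast.
fn unpack(bits: u64) -> (f32, f32) {
    let a = f32::from_bits(bits as u32);         // low 32 bits
    let b = f32::from_bits((bits >> 32) as u32); // high 32 bits, shifted down
    (a, b)
}

fn main() {
    let pair = (1.5f32, -2.0f32);
    let packed = pack(pair);
    println!("{:#018x}", packed); // prints 0xc00000003fc00000
    assert_eq!(unpack(packed), pair);
}
```

Every call boundary that passes the tuple this way pays for a round trip like `pack`/`unpack` on top of moving the bits into a floating-point register for the actual arithmetic.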

The `f64` tuple fares better than the `f32` tuple. Without inlining, the generated code performs some moves to and from the SIMD registers; with inlining turned on, the tuple is kept in a SIMD register and the loop is vectorized and unrolled.
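The original reproduction is behind the playpen link and is not reproduced here. As a stand-in, a program of roughly the following shape exercises the same pattern: tuple arguments crossing a non-inlined call boundary inside a loop, with an inlinable variant for comparison. All function names are hypothetical, and the assembly you get depends on the `rustc` version and target, so treat this as a sketch for inspection (for example with `rustc -O --emit asm`), not as a reproduction of the exact output described above.

```rust
use std::hint::black_box;

// Kept out of line so the tuple-passing ABI is visible in the assembly.
#[inline(never)]
fn add_f32(a: (f32, f32), b: (f32, f32)) -> (f32, f32) {
    (a.0 + b.0, a.1 + b.1)
}

// Same operation on f64 pairs, also kept out of line.
#[inline(never)]
fn add_f64(a: (f64, f64), b: (f64, f64)) -> (f64, f64) {
    (a.0 + b.0, a.1 + b.1)
}

// Inlinable variant: the optimizer may fold this into the caller's loop.
#[inline]
fn add_f32_inline(a: (f32, f32), b: (f32, f32)) -> (f32, f32) {
    (a.0 + b.0, a.1 + b.1)
}

fn sum_f32(xs: &[(f32, f32)]) -> (f32, f32) {
    xs.iter().fold((0.0, 0.0), |acc, &x| add_f32(acc, x))
}

fn sum_f64(xs: &[(f64, f64)]) -> (f64, f64) {
    xs.iter().fold((0.0, 0.0), |acc, &x| add_f64(acc, x))
}

fn sum_f32_inline(xs: &[(f32, f32)]) -> (f32, f32) {
    xs.iter().fold((0.0, 0.0), |acc, &x| add_f32_inline(acc, x))
}

fn main() {
    // black_box keeps the inputs opaque so the loops are not constant-folded.
    let xs32: Vec<(f32, f32)> = black_box((0..1000).map(|i| (i as f32, 1.0)).collect());
    let xs64: Vec<(f64, f64)> = black_box((0..1000).map(|i| (i as f64, 1.0)).collect());

    println!("{:?}", sum_f32(&xs32));
    println!("{:?}", sum_f64(&xs64));
    println!("{:?}", sum_f32_inline(&xs32));
}
```

Comparing the out-of-line `add_f32` body against `add_f64`, and the `sum_f32` loop against `sum_f32_inline`, isolates the two effects the report describes: how each tuple type crosses a call, and whether inlining removes the overhead.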

**EDIT:** Forgot to link to the [code example on playpen](http://is.gd/2LW7JO).


Sub-optimal codegen for f32 tuples #32045

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Sub-optimal codegen for f32 tuples #32045

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions