Inefficient horizontal boolean reductions for ARMv7 in Thumb2 mode #215

@hsivonen

Steps to reproduce

  1. Clone https://github.com/hsivonen/encoding_rs
  2. Check out the simd branch
  3. Edit src/mem.rs to change copy_ascii_to_ascii to inline(never)
  4. Compile in release mode
  5. objdump -d the result
  6. Check out the packed_simd branch
  7. Edit src/mem.rs to change copy_ascii_to_ascii to inline(never)
  8. Compile in release mode
  9. objdump -d the result

Actual results

With the old simd crate, boolean reductions on ARMv7 use vpmax.u8 twice to fold the vector onto itself and then use vmov.32 once to move the folded 32 bits to a general-purpose register. packed_simd instead uses vmov.32 four times and then ORs the four words together on the ALU.

simd:

   6c17e:	f921 0a0f 	vld1.8	{d0-d1}, [r1]
   6c182:	ef89 2050 	vshr.s8	q1, q0, #7
   6c186:	ff02 2a03 	vpmax.u8	d2, d2, d3
   6c18a:	ff02 2a00 	vpmax.u8	d2, d2, d0
   6c18e:	ee12 1b10 	vmov.32	r1, d2[0]
   6c192:	2900      	cmp	r1, #0

packed_simd:

   56334:	f961 0a0f 	vld1.8	{d16-d17}, [r1]
   56338:	efc9 2070 	vshr.s8	q9, q8, #7
   5633c:	ee33 1b90 	vmov.32	r1, d19[1]
   56340:	ee32 4b90 	vmov.32	r4, d18[1]
   56344:	ee13 5b90 	vmov.32	r5, d19[0]
   56348:	ee12 6b90 	vmov.32	r6, d18[0]
   5634c:	4321      	orrs	r1, r4
   5634e:	ea46 0405 	orr.w	r4, r6, r5
   56352:	4321      	orrs	r1, r4
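
For readers less familiar with NEON, here is a rough scalar model in Rust of what the two sequences above compute. It is only a sketch: the names sign_mask, vpmax_u8, any_fold and any_extract are made up for illustration and are not part of simd or packed_simd, and the model assumes little-endian lane order. Both functions answer the same question ("is any lane non-zero?"); they differ only in how many values have to cross from the NEON register file to the ALU.

```rust
/// Model of `vshr.s8 qX, qY, #7`: an arithmetic shift by 7 turns each
/// byte into 0x00 (high bit clear) or 0xFF (high bit set).
fn sign_mask(v: [u8; 16]) -> [u8; 16] {
    v.map(|b| ((b as i8) >> 7) as u8)
}

/// Model of `vpmax.u8 dD, dN, dM`: the low four result lanes are the
/// maxima of adjacent pairs in `n`, the high four those of `m`.
fn vpmax_u8(n: [u8; 8], m: [u8; 8]) -> [u8; 8] {
    let mut out = [0u8; 8];
    for i in 0..4 {
        out[i] = n[2 * i].max(n[2 * i + 1]);
        out[i + 4] = m[2 * i].max(m[2 * i + 1]);
    }
    out
}

/// simd-style reduction: two pairwise folds (16 -> 8 -> 4 lanes),
/// then a single 32-bit move and a compare against zero.
fn any_fold(q: [u8; 16]) -> bool {
    let mut lo = [0u8; 8];
    let mut hi = [0u8; 8];
    lo.copy_from_slice(&q[..8]);
    hi.copy_from_slice(&q[8..]);
    let d = vpmax_u8(lo, hi);
    // The second operand does not matter: only the low four lanes are read.
    let d = vpmax_u8(d, [0u8; 8]);
    u32::from_le_bytes([d[0], d[1], d[2], d[3]]) != 0
}

/// packed_simd-style reduction: four 32-bit moves, then three ORs.
fn any_extract(q: [u8; 16]) -> bool {
    let word = |i: usize| {
        u32::from_le_bytes([q[4 * i], q[4 * i + 1], q[4 * i + 2], q[4 * i + 3]])
    };
    ((word(3) | word(1)) | (word(0) | word(2))) != 0
}
```

For an all-ASCII input both any_fold(sign_mask(v)) and any_extract(sign_mask(v)) return false, and setting the high bit of any single byte makes both return true, so the shorter vpmax.u8 sequence loses nothing.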

Expected results

Expected packed_simd to implement horizontal boolean reductions on ARMv7 the same way as simd.
