Version: rustc 1.89.0-nightly (be19eda 2025-06-22)
Link to the Godbolt snippet.
https://godbolt.org/z/q3j8d567v
I tried this code:
// Generates properly optimized code.
pub fn iter_any(v: [u32; 8], x: u32) -> bool {
    v.iter().any(|v| *v == x)
}
// Generates branching code.
pub fn contains(v: [u32; 8], x: u32) -> bool {
    v.contains(&x)
}
I expected both functions to compile to the same code, or contains to be better optimized since it is the more specialized method.
Instead, this happened: contains seems to generate worse code than iter().any.
Interestingly, if I copy the code directly from core:
pub fn core_impl(v: [u32; 8], x: u32) -> bool {
    // Make our LANE_COUNT 4x the normal lane count (aiming for 128 bit vectors).
    // The compiler will nicely unroll it.
    const LANE_COUNT: usize = 4 * (128 / (size_of::<u32>() * 8));
    // SIMD
    let mut chunks = v.chunks_exact(LANE_COUNT);
    for chunk in &mut chunks {
        if chunk.iter().fold(false, |acc, y| acc | (*y == x)) {
            return true;
        }
    }
    // Scalar remainder
    return chunks.remainder().iter().any(|y| *y == x);
}
This again generates the better code: https://godbolt.org/z/389GW9c99.
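As a sanity check, here is a minimal sketch showing that the three variants return the same results, so the difference reported above should be one of code generation only. The helpers lane_count_u32 and all_agree are hypothetical names added for illustration and are not part of core. The sketch also checks the LANE_COUNT arithmetic: for u32 it works out to 4 * (128 / 32) = 16, which is larger than the 8-element array, so in the copied version chunks_exact yields no full chunk and every element goes through the scalar remainder. Whether that is related to the codegen difference is not something this sketch establishes.

```rust
use std::mem::size_of;

// The three variants from the report, reproduced so the sketch is self-contained.
pub fn iter_any(v: [u32; 8], x: u32) -> bool {
    v.iter().any(|v| *v == x)
}

pub fn contains(v: [u32; 8], x: u32) -> bool {
    v.contains(&x)
}

pub fn core_impl(v: [u32; 8], x: u32) -> bool {
    const LANE_COUNT: usize = 4 * (128 / (size_of::<u32>() * 8));
    let mut chunks = v.chunks_exact(LANE_COUNT);
    for chunk in &mut chunks {
        if chunk.iter().fold(false, |acc, y| acc | (*y == x)) {
            return true;
        }
    }
    chunks.remainder().iter().any(|y| *y == x)
}

// Hypothetical helper: the same LANE_COUNT formula, exposed for inspection.
pub const fn lane_count_u32() -> usize {
    4 * (128 / (size_of::<u32>() * 8))
}

// Hypothetical helper: true if all three variants return the same answer.
pub fn all_agree(v: [u32; 8], x: u32) -> bool {
    let expected = iter_any(v, x);
    contains(v, x) == expected && core_impl(v, x) == expected
}

fn main() {
    let v = [1, 2, 3, 4, 5, 6, 7, 8];
    // Values 1..=8 are present; 0 and 9 are absent.
    for x in 0..10 {
        assert!(all_agree(v, x));
    }

    // With 16 lanes and only 8 elements there is no full chunk,
    // so the whole array ends up in the remainder.
    assert_eq!(lane_count_u32(), 16);
    let mut chunks = v.chunks_exact(lane_count_u32());
    assert!(chunks.next().is_none());
    assert_eq!(chunks.remainder().len(), 8);

    println!("all three variants agree");
}
```

This only checks behavior; comparing the emitted assembly still requires the Godbolt links above.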