Open
Labels
C-optimization: Category: An issue highlighting optimization opportunities or PRs implementing such
I-slow: Issue: Problems and improvements with respect to performance of generated code.
P-low: Low priority
T-libs: Relevant to the library team, which will review and decide on the PR/issue.
regression-from-stable-to-stable: Performance or correctness regression from one stable version to another.
Description
I've finally gotten around to doing some proper benchmarking of Rust versions for my crate:
http://chimper.org/rawloader-rustc-benchmarks/
As the graph on that page shows, performance generally improves over time, but there are some very negative outliers. Most (maybe all) of them seem to be very simple loops that decode packed formats. Since Rust 1.25 those loops have been 30-40% slower. I've extracted a minimal test case that shows the issue:
fn decode_12le(buf: &[u8], width: usize, height: usize) -> Vec<u16> {
    let mut out: Vec<u16> = vec![0; width*height];

    for (row, line) in out.chunks_mut(width).enumerate() {
        let inb = &buf[(row*width*12/8)..];

        for (o, i) in line.chunks_mut(2).zip(inb.chunks(3)) {
            let g1: u16 = i[0] as u16;
            let g2: u16 = i[1] as u16;
            let g3: u16 = i[2] as u16;

            o[0] = ((g2 & 0x0f) << 8) | g1;
            o[1] = (g3 << 4) | (g2 >> 4);
        }
    }

    out
}

fn main() {
    let width = 5000;
    let height = 4000;

    let buffer: Vec<u8> = vec![0; width*height*12/8];
    for _ in 0..100 {
        decode_12le(&buffer, width, height);
    }
}
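For anyone not familiar with the packing, here is a small sketch of what the inner loop body does to a single 3-byte group. The helper name `unpack_12le_pair` and the sample bytes are made up for illustration and are not part of the crate; the bit operations are copied from `decode_12le` above.

```rust
/// Unpack one 3-byte group into two 12-bit values, mirroring the inner loop
/// body of `decode_12le`. `unpack_12le_pair` is a hypothetical helper used
/// only for illustration.
fn unpack_12le_pair(i: [u8; 3]) -> [u16; 2] {
    let g1 = i[0] as u16;
    let g2 = i[1] as u16;
    let g3 = i[2] as u16;

    [
        // First value: low 8 bits from byte 0, high 4 bits from the low nibble of byte 1.
        ((g2 & 0x0f) << 8) | g1,
        // Second value: low 4 bits from the high nibble of byte 1, high 8 bits from byte 2.
        (g3 << 4) | (g2 >> 4),
    ]
}

fn main() {
    // Arbitrary sample bytes chosen so each nibble is easy to follow.
    let out = unpack_12le_pair([0x21, 0x43, 0x65]);
    assert_eq!(out, [0x321, 0x654]);
    println!("{:#05x} {:#05x}", out[0], out[1]);
}
```

So each iteration reads 3 bytes and writes 2 output values, with no data dependency between iterations.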
Here's a test run on my machine:
$ rustc +1.24.0 -C opt-level=3 bench_decode.rs
$ time ./bench_decode
real 0m4.817s
user 0m3.581s
sys 0m1.236s
$ rustc +1.25.0 -C opt-level=3 bench_decode.rs
$ time ./bench_decode
real 0m6.263s
user 0m5.067s
sys 0m1.196s
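To put a number on the difference, this is a quick sketch of the relative slowdown computed from the `time` output above. The `slowdown` function is just a hypothetical helper for the arithmetic; the timings are the ones from this single run on my machine, so treat the percentages as approximate.

```rust
/// Relative slowdown of `after_secs` versus `before_secs`, as a fraction
/// (0.30 means 30% slower). Hypothetical helper, for illustration only.
fn slowdown(before_secs: f64, after_secs: f64) -> f64 {
    (after_secs - before_secs) / before_secs
}

fn main() {
    // Timings copied from the runs above: 1.24.0 first, 1.25.0 second.
    let real = slowdown(4.817, 6.263);
    let user = slowdown(3.581, 5.067);

    println!("real: {:.1}% slower", real * 100.0);
    println!("user: {:.1}% slower", user * 100.0);
}
```

That works out to roughly 30% on wall-clock time and roughly 41% on user time. The sys time is nearly the same in both runs (1.236s vs 1.196s), so the regression shows up in user time.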