flt2dec: replace for loop by iter_mut #144205

Open: wants to merge 1 commit into base: master

Conversation

hkBst
Member

@hkBst hkBst commented Jul 20, 2025

Perf is explored in #144118, which initially showed small losses but later also showed significant gains. Both are real, but given how small the losses are, this seems like a good change.

@rustbot
Collaborator

rustbot commented Jul 20, 2025

r? @workingjubilee

rustbot has assigned @workingjubilee.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Jul 20, 2025
@hkBst hkBst changed the title flt2dec: fix some clippy lints flt2dec: change a for loop by iter_mut Jul 20, 2025
@hkBst hkBst changed the title flt2dec: change a for loop by iter_mut flt2dec: replace a for loop by iter_mut Jul 20, 2025
@hkBst hkBst changed the title flt2dec: replace a for loop by iter_mut flt2dec: replace for loop by iter_mut Jul 20, 2025
@workingjubilee
Member

Performance related so isolating it for the usual reasons, though it's unclear how much it matters.

@bors r+ rollup=never

@bors
Collaborator

bors commented Jul 20, 2025

📌 Commit 67b272a has been approved by workingjubilee

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 20, 2025
- for j in i + 1..d.len() {
-     d[j] = b'0';
- }
+ d.iter_mut().skip(i + 1).for_each(|c| *c = b'0');
Contributor

I don’t think this is clearly better or worse than before. But how about d[i+1..].fill(b'0')?
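
For reference, the three spellings under discussion can be put side by side. This is a minimal sketch: the function names `zero_tail_for`, `zero_tail_iter`, and `zero_tail_fill` are hypothetical and not part of flt2dec, and it assumes `i < d.len()`, which holds when `i` is a position found in `d`.

```rust
/// Original form: index-based loop over the tail.
pub fn zero_tail_for(d: &mut [u8], i: usize) {
    for j in i + 1..d.len() {
        d[j] = b'0';
    }
}

/// Form proposed in this PR: iterator with `skip`.
pub fn zero_tail_iter(d: &mut [u8], i: usize) {
    d.iter_mut().skip(i + 1).for_each(|c| *c = b'0');
}

/// Form suggested in review: `fill` on the tail subslice.
pub fn zero_tail_fill(d: &mut [u8], i: usize) {
    // Panics if i + 1 > d.len(); the two variants above do nothing in that case.
    d[i + 1..].fill(b'0');
}
```

All three leave `d[..=i]` untouched and set every later byte to `b'0'`. They only differ in behavior when `i + 1` exceeds the length, where the slicing in the `fill` version panics.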

Member Author

That would be even better. I added it as an option to my bench crate, and performance on eu dev machines looks like this:
For aarch64:

test bench_round_up_fill ... bench:     974,458.62 ns/iter (+/- 17,236.78)
test bench_round_up_for  ... bench:   1,055,622.70 ns/iter (+/- 2,128.75)
test bench_round_up_iter ... bench:     855,721.80 ns/iter (+/- 2,159.27)

For x86_64:

test bench_round_up_fill ... bench:     730,473.60 ns/iter (+/- 393.90)
test bench_round_up_for  ... bench:     730,497.57 ns/iter (+/- 894.02)
test bench_round_up_iter ... bench:     740,172.60 ns/iter (+/- 954.45)

Member Author

On aarch64 the (+/-) is always strangely large.

Contributor

@hanna-kruppe hanna-kruppe Jul 20, 2025

I suspect your benchmarks are reading tea leaves, not nailing down meaningful differences, because at least the iter_mut().skip() version should also be readily identifiable as equivalent to memset. Consider checking whether one variant calls memset and the other doesn’t. If memset vs. an inline loop makes a real difference, then any changes here will be extremely fragile, as they depend on loop idiom recognition working or not working for this loop (which may also differ between an isolated benchmark of this one function and a benchmark of the whole flt2dec rabbit hole).

Contributor

Oh. I looked at the codegen in your godbolt link and both variants you tried already get turned into memsets. The reason why they benchmark differently is that your benchmark runs on a 100k character buffer filled with '9's so the cost is dominated by the initial "scan backwards for first non-9" part instead. I don't know why the phrasing of the memset makes a difference for codegen in that part of the function, but now I really don't trust the numbers nor do I have faith that this will at all reproduce in the context of the flt2dec routines. The actual buffer size is orders of magnitude smaller, effects on other arms of this match aren't benchmarked at all, and if it's weird spooky action at a distance that affects the codegen for the rposition loop, then inlining it into the callers may have similarly unpredictable effects.
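
To make the two phases concrete, here is a rough sketch of a `round_up`-style function with a backward scan followed by a tail fill. It is reconstructed from the description in this thread (a match on an `rposition` result), not copied from flt2dec; the name `round_up_sketch`, the signature, and the handling of the other arms are assumptions.

```rust
/// Rounds up a buffer of ASCII decimal digits in place.
/// Returns an extra digit when rounding changes the length (assumed contract).
pub fn round_up_sketch(d: &mut [u8]) -> Option<u8> {
    // Phase 1: scan backwards for the last digit that is not '9'.
    // On a long run of '9's this scan is the expensive part.
    match d.iter().rposition(|&c| c != b'9') {
        Some(i) => {
            // Phase 2: bump that digit and zero everything after it.
            d[i] += 1;
            d[i + 1..].fill(b'0');
            None
        }
        None if !d.is_empty() => {
            // All nines: 99..9 becomes 10..0 plus one extra trailing digit.
            d[0] = b'1';
            d[1..].fill(b'0');
            Some(b'0')
        }
        // Empty buffer: nothing to scan, report a lone '1' (assumption).
        None => Some(b'1'),
    }
}
```

With a buffer of 100k nines, the cost of the fill is small next to the scan, so a benchmark on such input mostly measures the `rposition` loop rather than the line this PR changes.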

Contributor

All of these concerns could be avoided by benchmarking a full trip through the formatting machinery. I appreciate that it's difficult to find an input that hits the path you're interested in and makes that piece of the code hot enough to get a measurable signal. But the "easier" alternative of reducing to the smallest possible benchmark and putting it under a microscope can waste your time in other ways!
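
A full-trip measurement could look roughly like the sketch below, which times `format!` on an `f64` rather than calling the rounding helper directly. The helper names `format_fixed` and `time_full_trip` are hypothetical, and whether any particular input actually reaches the rounding path of interest is an assumption that would need to be checked separately.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Formats `x` with a fixed number of fractional digits via the public API.
pub fn format_fixed(x: f64, precision: usize) -> String {
    format!("{:.*}", precision, x)
}

/// Times `iters` full formatting calls. `black_box` discourages the
/// optimizer from hoisting or discarding the work (best effort only).
pub fn time_full_trip(x: f64, precision: usize, iters: u32) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        black_box(format_fixed(black_box(x), black_box(precision)));
    }
    start.elapsed()
}
```

Dividing the result of something like `time_full_trip(9.999, 2, 1_000_000)` by the iteration count gives a rough per-call figure. It includes the `String` allocation, so it is noisier than an isolated benchmark but closer to what callers of the formatting code see.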

Member Author

If I reduce the buffer length to 10, then I get:
For aarch64:

test bench_round_up_fill ... bench:          14.86 ns/iter (+/- 0.01)
test bench_round_up_for  ... bench:          16.07 ns/iter (+/- 0.01)
test bench_round_up_iter ... bench:          16.07 ns/iter (+/- 0.06)

For x86_64:

test bench_round_up_fill ... bench:          10.47 ns/iter (+/- 0.01)
test bench_round_up_for  ... bench:          12.85 ns/iter (+/- 0.02)
test bench_round_up_iter ... bench:           9.84 ns/iter (+/- 0.07)

Contributor

I still don't know if those numbers are at all meaningful, but at least they no longer give a reason to not use the fill version for the readability win, I guess?

Member Author

Fill it is!

@workingjubilee
Member

@bors r-

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Jul 20, 2025
@workingjubilee
Member

assuming this doesn't magically fail tidy

@bors r+

@bors
Collaborator

bors commented Jul 20, 2025

📌 Commit f147716 has been approved by workingjubilee

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jul 20, 2025
bors added a commit that referenced this pull request Jul 21, 2025
flt2dec: replace for loop by iter_mut

Perf is explored in #144118, which initially showed small losses, but then also showed significant gains. Both are real, but given the smallness of the losses, this seems a good change.
@bors
Collaborator

bors commented Jul 21, 2025

⌛ Testing commit f147716 with merge 42266c2...

@jieyouxu
Member

I think queue is very borked

@jieyouxu
Member

@bors retry r- (manual status refresh, maybe github outage yesterday?)

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Jul 22, 2025
@jieyouxu
Member

@bors r=workingjubilee

@bors
Collaborator

bors commented Jul 22, 2025

📌 Commit f147716 has been approved by workingjubilee

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jul 22, 2025
bors added a commit that referenced this pull request Jul 22, 2025
flt2dec: replace for loop by iter_mut

Perf is explored in #144118, which initially showed small losses, but then also showed significant gains. Both are real, but given the smallness of the losses, this seems a good change.
@bors
Collaborator

bors commented Jul 22, 2025

⌛ Testing commit f147716 with merge c0b282f...

Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.
6 participants