Skip to content

[clang] Improve getFileIDLocal binary search. #146510

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 1, 2025
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
41 changes: 16 additions & 25 deletions clang/lib/Basic/SourceManager.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -812,6 +812,8 @@ FileID SourceManager::getFileIDSlow(SourceLocation::UIntTy SLocOffset) const {
/// loaded one.
FileID SourceManager::getFileIDLocal(SourceLocation::UIntTy SLocOffset) const {
assert(SLocOffset < NextLocalOffset && "Bad function choice");
assert(SLocOffset >= LocalSLocEntryTable[0].getOffset() &&
"Invalid SLocOffset");

// After the first and second level caches, I see two common sorts of
// behavior: 1) a lot of searched FileID's are "near" the cached file
Expand Down Expand Up @@ -855,35 +857,24 @@ FileID SourceManager::getFileIDLocal(SourceLocation::UIntTy SLocOffset) const {
break;
}

NumProbes = 0;
while (true) {
unsigned MiddleIndex = (GreaterIndex-LessIndex)/2+LessIndex;
SourceLocation::UIntTy MidOffset =
getLocalSLocEntry(MiddleIndex).getOffset();

++NumProbes;

// If the offset of the midpoint is too large, chop the high side of the
// range to the midpoint.
if (MidOffset > SLocOffset) {
GreaterIndex = MiddleIndex;
continue;
}
while (LessIndex < GreaterIndex) {
++NumBinaryProbes;

// If the middle index contains the value, succeed and return.
if (MiddleIndex + 1 == LocalSLocEntryTable.size() ||
SLocOffset < getLocalSLocEntry(MiddleIndex + 1).getOffset()) {
FileID Res = FileID::get(MiddleIndex);
unsigned MiddleIndex = LessIndex + (GreaterIndex - LessIndex) / 2;

// Remember it. We have good locality across FileID lookups.
LastFileIDLookup = Res;
NumBinaryProbes += NumProbes;
return Res;
}
SourceLocation::UIntTy MidOffset =
LocalSLocEntryTable[MiddleIndex].getOffset();

// Otherwise, move the low-side up to the middle index.
LessIndex = MiddleIndex;
if (MidOffset <= SLocOffset)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why equal? Shouldn't "we found this!" mean we exit the loop? And does the '+1' here cause us to miss it if the offset is exactly 'midoffset'?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an upper_bound binary search, it finds the first element which is greater than SLocOffset. We will -1 for the final result.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, hrmph.... I guess I see here. I guess that the 'we just managed to be searching for the very beginning of this file' is rare enough to not be worth a branch.

Though, is this safe (the +1/-1 dance)with 'GreaterIndex'? Despite its name, GreaterIndex is initialized to the 'size' (so not really an index... sigh). Though I guess the linear search ensures we've already checked the end....

I think I've convinced myself this is right enough. Thanks.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, initializing GreaterIndex to size can be a bit confusing, but that initial value is never actually used.

GreaterIndex always points to an entry whose offset is strictly greater than SLocOffset. The search space is defined as [LessIndex, GreaterIndex), so we know by design that GreaterIndex is never a valid result.

LessIndex = MiddleIndex + 1;
else
GreaterIndex = MiddleIndex;
}

// At this point, LessIndex is the index of the *first element greater than*
// SLocOffset. The element we are actually looking for is the one immediately
// before it.
return LastFileIDLookup = FileID::get(LessIndex - 1);
}

/// Return the FileID for a SourceLocation with a high offset.
Expand Down
Loading