Skip to content

[X86] Align f128 and i128 to 16 bytes when passing on x86-32 #138092

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jul 17, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions llvm/docs/ReleaseNotes.md
Original file line number Diff line number Diff line change
Expand Up @@ -233,6 +233,8 @@ Changes to the X86 Backend
--------------------------

* `fp128` will now use `*f128` libcalls on 32-bit GNU targets as well.
* On x86-32, `fp128` and `i128` are now passed with the expected 16-byte stack
alignment.

Changes to the OCaml bindings
-----------------------------
Expand Down
32 changes: 32 additions & 0 deletions llvm/lib/Target/X86/X86CallingConv.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -374,5 +374,37 @@ static bool CC_X86_64_I128(unsigned &ValNo, MVT &ValVT, MVT &LocVT,
return true;
}

/// Special handling for i128 and fp128: on x86-32, i128 and fp128 get legalized
/// as four i32s, but fp128 must be passed on the stack with 16-byte alignment.
/// Technically only fp128 has a specified ABI, but it makes sense to handle
/// i128 the same until we hear differently.
static bool CC_X86_32_I128_FP128(unsigned &ValNo, MVT &ValVT, MVT &LocVT,
CCValAssign::LocInfo &LocInfo,
ISD::ArgFlagsTy &ArgFlags, CCState &State) {
assert(ValVT == MVT::i32 && "Should have i32 parts");
SmallVectorImpl<CCValAssign> &PendingMembers = State.getPendingLocs();
PendingMembers.push_back(
CCValAssign::getPending(ValNo, ValVT, LocVT, LocInfo));

if (!ArgFlags.isInConsecutiveRegsLast())
return true;

unsigned NumRegs = PendingMembers.size();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this is an assertion-only variable. In non-assertion builds, this gives unused variable warnings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like somebody got this already fe1941967267e472

assert(NumRegs == 4 && "Should have two parts");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert(NumRegs == 4 && "Should have two parts");
assert(NumRegs == 4 && "Should have four parts");

Minor nit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix in #149386


int64_t Offset = State.AllocateStack(16, Align(16));
PendingMembers[0].convertToMem(Offset);
PendingMembers[1].convertToMem(Offset + 4);
PendingMembers[2].convertToMem(Offset + 8);
PendingMembers[3].convertToMem(Offset + 12);

State.addLoc(PendingMembers[0]);
State.addLoc(PendingMembers[1]);
State.addLoc(PendingMembers[2]);
State.addLoc(PendingMembers[3]);
PendingMembers.clear();
return true;
}

// Provides entry points of CC_X86 and RetCC_X86.
#include "X86GenCallingConv.inc"
5 changes: 5 additions & 0 deletions llvm/lib/Target/X86/X86CallingConv.td
Original file line number Diff line number Diff line change
Expand Up @@ -859,6 +859,11 @@ def CC_X86_32_C : CallingConv<[
// The 'nest' parameter, if any, is passed in ECX.
CCIfNest<CCAssignToReg<[ECX]>>,

// i128 and fp128 need to be passed on the stack with a higher alignment than
// their legal types. Handle this with a custom function.
CCIfType<[i32],
CCIfConsecutiveRegs<CCCustom<"CC_X86_32_I128_FP128">>>,

// On swifttailcc pass swiftself in ECX.
CCIfCC<"CallingConv::SwiftTail",
CCIfSwiftSelf<CCIfType<[i32], CCAssignToReg<[ECX]>>>>,
Expand Down
15 changes: 12 additions & 3 deletions llvm/lib/Target/X86/X86ISelLoweringCall.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -237,9 +237,18 @@ EVT X86TargetLowering::getSetCCResultType(const DataLayout &DL,
bool X86TargetLowering::functionArgumentNeedsConsecutiveRegisters(
Type *Ty, CallingConv::ID CallConv, bool isVarArg,
const DataLayout &DL) const {
// i128 split into i64 needs to be allocated to two consecutive registers,
// or spilled to the stack as a whole.
return Ty->isIntegerTy(128);
// On x86-64 i128 is split into two i64s and needs to be allocated to two
// consecutive registers, or spilled to the stack as a whole. On x86-32 i128
// is split to four i32s and never actually passed in registers, but we use
// the consecutive register mark to match it in TableGen.
if (Ty->isIntegerTy(128))
return true;

// On x86-32, fp128 acts the same as i128.
if (Subtarget.is32Bit() && Ty->isFP128Ty())
return true;

return false;
Comment on lines 237 to +251
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this is the right approach; functionArgumentNeedsConsecutiveRegisters doesn't really seem like the correct thing because the type will never be passed in regesters, but I can't come up with another way to "mark" a register set to indicate that it needs custom lowering. Is there a better way to do this?

Cc @nikic since I think you rewrote the x86-64 custom lowering a couple of times.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably also match vector types somehow because _m64, __m128, __m256, and __m512 are specified to have an alignment of 8, 16, 32, and 64 bytes, respectively.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, unfortunately the CC lowering is pretty limited in the information it can access, and backends have a lot of different hacks to work around that. I'd like to improve that, but this looks like an acceptable hack for now.

}

/// Helper for getByValTypeAlignment to determine
Expand Down
Loading
Loading