[clang] "modular_format" attribute for functions using format strings #147431

Draft
wants to merge 5 commits into
base: users/mysterymath/modular-printf/ir
from

Conversation

mysterymath
Contributor

@mysterymath mysterymath commented Jul 8, 2025

This provides a C language modular_format attribute. It combines with information from the existing format attribute to set the new IR modular-format attribute.

The purpose of these attributes is to enable "modular printf". A statically linked libc can provide a modular variant of printf that only weakly references its implementation routines. Regular printf would strongly reference those routines. The compiler would transform calls with constant format strings into calls to the modular printf, and would emit strong references to aspect symbols so that the aspects those calls need are brought into the link.
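As a rough illustration of the weak-versus-strong distinction described above, here is a minimal sketch in C. It assumes GCC or Clang targeting an ELF platform, and every `demo_` name is hypothetical; none of them come from this PR or from any libc.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical implementation routine for floating-point conversions.
 * The weak attribute (a GCC/Clang extension) means this declaration alone
 * does not force the linker to pull a definition out of a static archive. */
extern int demo_format_double(char *buf, size_t size, double value)
    __attribute__((weak));

/* Modular entry point: it only weakly references the float routine, so the
 * routine is present only if something else referenced it strongly. */
int demo_modular_print_double(char *buf, size_t size, double value) {
    if (demo_format_double) /* null when no definition was linked in */
        return demo_format_double(buf, size, value);
    return snprintf(buf, size, "(float support not linked)");
}
```

In this sketch, a regular entry point would call `demo_format_double` directly. That is a strong reference, and it would force the definition into the link.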

See issue #146159 for context.

@mysterymath mysterymath requested a review from AaronBallman July 8, 2025 00:02
@mysterymath
Contributor Author

mysterymath commented Jul 8, 2025

Sending this out as a draft to obtain some early feedback about the direction of the implementation of the Modular Printf RFC.

Prev PR: #147429
Next PR: #147426

Collaborator

@AaronBallman AaronBallman left a comment

One thing that would help me is if the PR came with tests so we could see examples of its usage. (The docs could use examples as well.) I'm having a bit of a hard time understanding the attribute and its effects.

Collaborator

@erichkeane erichkeane left a comment

No real comments here. I still don't really understand what this is for, which tells me we probably need some additional work on the commit message and documentation.

@@ -2569,6 +2569,18 @@ void CodeGenModule::ConstructAttributeList(StringRef Name,

    if (TargetDecl->hasAttr<ArmLocallyStreamingAttr>())
      FuncAttrs.addAttribute("aarch64_pstate_sm_body");

    if (auto *ModularFormat = TargetDecl->getAttr<ModularFormatAttr>()) {
      // TODO: Error checking
Collaborator

This is a heck of a TODO :) Though, I'd expect us to do diagnostics during our normal checking of the format string, so we shouldn't really require anything here.

Contributor Author

Hah, fair; this is very much a Draft PR. My intent was to get this in front of a bunch of eyes sooner rather than later, as this PR set touches every layer from the compiler through to libc (skipping the linker).

@mysterymath
Contributor Author

> One thing that would help me is if the PR came with tests so we could see examples of its usage. (The docs could use examples as well.) I'm having a bit of a hard time understanding the attribute and its effects.

Very fair; I was relying a lot on the tracking issue and RFC discussion for context. I've added some meat to the PR description, and I've added a brief example to the attribute docs. I'll add a test once I get the chance.

Comment on lines +9453 to +9454
``printf(var, 42)`` would be untouched. A call to ``printf("%d", 42)`` would
become a call to ``__modular_printf`` with the same arguments, as would
Collaborator

So will any call to printf with a constant format specifier string be rewritten to call __modular_printf?

Also, who is responsible for writing these attributes? Are they only in the libc implementation, or can a user write one of these themselves on their own declarations? I'm asking because I wonder about compatibility; e.g., the call dispatches to __modular_printf but that doesn't know about some particular extension being used in the format specifier and so the code appears to misbehave.

Contributor Author

> So will any call to printf with a constant format specifier string be rewritten to call __modular_printf?

That's correct.

> Also, who is responsible for writing these attributes? Are they only in the libc implementation, or can a user write one of these themselves on their own declarations? I'm asking because I wonder about compatibility; e.g., the call dispatches to __modular_printf but that doesn't know about some particular extension being used in the format specifier and so the code appears to misbehave.

Users could use these for their own implementations, in particular to allow functions that e.g. wrap vsnprintf to do logging etc. As for compatibility, if the compiler understands aspect names that the implementation doesn't, there's no issue, as the compiler will not spontaneously emit them if not requested. If an implementation requests a verdict on an implementation aspect unknown to the compiler, the compiler will conservatively report that the aspect is required. The modular_format attribute provided by the code and the aspect references emitted by the compiler thus form a sort of two-phase handshake between the code and compiler.
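The conservative rule described above could be sketched roughly as follows. This is only an illustration in C, not the PR's implementation: the aspect name `"float"` and both function names are assumptions made for the example, and the format-string scan is deliberately simplified.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if fmt contains a floating-point conversion
 * (%f, %e, %g, %a, or their uppercase forms). Simplified scan. */
static bool demo_uses_float_conversion(const char *fmt) {
    for (const char *p = fmt; *p; ++p) {
        if (*p != '%')
            continue;
        ++p;
        if (*p == '%')
            continue; /* "%%" is a literal percent sign */
        /* Skip flags, width, precision, and length modifiers. */
        while (*p && strchr("-+ #0123456789.*hlLjzt", *p))
            ++p;
        if (*p == '\0')
            break; /* truncated conversion at end of string */
        if (strchr("fFeEgGaA", *p))
            return true;
    }
    return false;
}

/* Hypothetical verdict function: is this aspect needed for this format? */
bool demo_aspect_required(const char *aspect, const char *fmt) {
    if (strcmp(aspect, "float") == 0)
        return demo_uses_float_conversion(fmt);
    /* Aspect unknown to this "compiler": conservatively report it required. */
    return true;
}
```

With this sketch, `demo_aspect_required("float", "%d")` is false, while any aspect name the function does not recognize is reported as required regardless of the format string.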

Collaborator

> > So will any call to printf with a constant format specifier string be rewritten to call __modular_printf?
>
> That's correct.

Good to know, thanks!

> > Also, who is responsible for writing these attributes? Are they only in the libc implementation, or can a user write one of these themselves on their own declarations? I'm asking because I wonder about compatibility; e.g., the call dispatches to __modular_printf but that doesn't know about some particular extension being used in the format specifier and so the code appears to misbehave.
>
> Users could use these for their own implementations, in particular to allow functions that e.g. wrap vsnprintf to do logging etc. As for compatibility, if the compiler understands aspect names that the implementation doesn't, there's no issue, as the compiler will not spontaneously emit them if not requested. If an implementation requests a verdict on an implementation aspect unknown to the compiler, the compiler will conservatively report that the aspect is required. The modular_format attribute provided by the code and the aspect references emitted by the compiler thus form a sort of two-phase handshake between the code and compiler.

My concern is more about dispatching in ways the user may not anticipate and getting observably different behavior. e.g., the user calls printf("%I64d", 0LL) and they were getting the MSVC CRT printf call which supported that modifier but now calls __modular_printf which doesn't know about the modifier. What happens in that kind of situation?

Contributor Author

@mysterymath mysterymath Jul 22, 2025

> My concern is more about dispatching in ways the user may not anticipate and getting observably different behavior. e.g., the user calls printf("%I64d", 0LL) and they were getting the MSVC CRT printf call which supported that modifier but now calls __modular_printf which doesn't know about the modifier. What happens in that kind of situation?

Ah, if I understand what you're getting at, that can't happen: it's explicitly out of scope for the feature.

The modular_format attribute exists to tell a compiler that is compiling calls to a function that the function's implementation can be split up by redirecting calls and emitting relocations against various symbols. A header file is the only plausible mechanism to tell the compiler this, and that means that the header would need to be provided by and intrinsically tied to a specific version of the implementation. Otherwise, it would be impossible to determine what aspects the implementation requires to be emitted to function correctly.

Accordingly, this feature would primarily be useful for cases where libc is statically linked in and paired with its own headers. (llvm-libc, various embedded libcs, etc.) I suppose it's technically possible to break out printf implementation parts into a family of individual dynamic libraries, but even then, any libc header set that required that the libc implementation be dynamically replaceable would not be able to include modular_format.

So, for implementations that use this feature, printf and __modular_printf would always be designed together. To avoid ever introducing two full printf implementations into the link, printf would be a thin wrapper around __modular_printf that also requests every possible aspect of the implementation. This would mean that the two could never diverge.
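A minimal sketch of that wrapper arrangement, in C, might look like the following. All `demo_` names are hypothetical, and `vsnprintf` merely stands in for a real modular core so that the example stays short.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical aspect routine; a real libc would have one per optional
 * feature of the formatter. */
int demo_float_aspect_linked(void) { return 1; }

/* Stand-in for the modular core (assumption: vsnprintf does the work here). */
int demo_modular_vsnprintf(char *buf, size_t size, const char *fmt,
                           va_list args) {
    return vsnprintf(buf, size, fmt, args);
}

/* The "regular" entry point: a thin wrapper that strongly references every
 * aspect by taking its address, then forwards to the same core. */
int demo_snprintf(char *buf, size_t size, const char *fmt, ...) {
    int (*volatile aspect)(void) = demo_float_aspect_linked;
    (void)aspect;

    va_list args;
    va_start(args, fmt);
    int written = demo_modular_vsnprintf(buf, size, fmt, args);
    va_end(args);
    return written;
}
```

Because both entry points share one core in this sketch, there is only a single formatting implementation in the link, which is why their behavior cannot drift apart.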

As an aside, this is my first time landing an RFC across so many components of LLVM. I wasn't sure how much detail to include in each change; my intuition was to try to provide links to the RFC instead. I don't want the above reasoning to get buried, and it gives me pause that it wasn't readily accessible. But I'm also not entirely sure where it should live going forward. Advice would be appreciated.

@mysterymath mysterymath force-pushed the users/mysterymath/modular-printf/ir branch from 271a63f to c2e511c Compare July 21, 2025 22:24
@mysterymath mysterymath force-pushed the users/mysterymath/modular-printf/clang branch from b358c38 to 7730cb0 Compare July 21, 2025 22:24
This provides a C language version of the new IR modular-format
attribute. This, in concert with the format attribute, allows a library
function to declare that a modular version of its implementation is
available.

See issue #146159 for context.
@mysterymath mysterymath force-pushed the users/mysterymath/modular-printf/clang branch from 7730cb0 to 0ed3487 Compare July 22, 2025 20:40
3 participants