[RFC] Provide custom stacks functionality #1251

Open
wants to merge 3 commits into
base: main

FloThinksPi
Member

@FloThinksPi FloThinksPi commented Jul 16, 2025

Click here for a more reviewable/readable version.

Related RFC-0040

@beyhan beyhan requested review from a team, rkoster, beyhan, Gerg, stephanme and cweibel and removed request for a team July 16, 2025 15:32
@beyhan beyhan added toc rfc CFF community RFC labels Jul 16, 2025

@cweibel cweibel left a comment

I understand that the forced migration to cflinuxfs4 was hard (and it sounds like some folks have not made that jump yet). Still, unless the desired stack is being maintained in some way, I would be cautious about allowing that stack to be the default.

I do like the idea of being able to natively support alternate stacks (in our case creating a "hardened" cflinuxfs4 stack) but for every additional stack provided to customers we need to make sure smoke/acceptance tests still pass.

provided one or a remote one by checking if the stack is an exact match
in the stacks table (it already does this to check validity of the
manifest/request) and, if it's not an exact match, try to evaluate it as
a remote container image reference. If it does not match the container URL schema, produce an error message.
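
The lookup order described in this excerpt could be sketched roughly as follows. This is only an illustration under stated assumptions: `SYSTEM_STACKS`, `IMAGE_REF`, `ResolvedStack`, and `resolve_stack` are hypothetical names, the regular expression is a simplified stand-in for whatever container URL schema the RFC ends up specifying, and the real check would query the Cloud Controller stacks table rather than an in-memory set.

```python
import re
from dataclasses import dataclass

# Hypothetical stand-in for the rows of the stacks table.
SYSTEM_STACKS = {"cflinuxfs3", "cflinuxfs4"}

# Simplified image reference: host[:port]/path[:tag][@sha256:digest].
# Assumption: the real schema in the RFC may be stricter or looser.
IMAGE_REF = re.compile(
    r"^[a-z0-9.-]+(?::\d+)?/"        # registry host, optional port
    r"[a-z0-9._/-]+"                 # repository path
    r"(?::[\w.-]+)?"                 # optional tag
    r"(?:@sha256:[0-9a-f]{64})?$"    # optional digest
)


@dataclass(frozen=True)
class ResolvedStack:
    kind: str   # "system" or "custom"
    value: str  # stack name or image reference


def resolve_stack(name: str) -> ResolvedStack:
    """Exact match in the stacks table first, then image reference, else error."""
    if name in SYSTEM_STACKS:
        return ResolvedStack("system", name)
    if IMAGE_REF.match(name):
        return ResolvedStack("custom", name)
    raise ValueError(
        f"stack '{name}' is neither a known stack nor a valid image reference"
    )
```

For example, `resolve_stack("cflinuxfs4")` resolves to a system stack, `resolve_stack("registry.example.com/stacks/cflinuxfs3:1.0")` is treated as a custom stack, and `resolve_stack("cflinuxfs5")` raises an error because it is neither.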

To make our compliance folks happy, it would be nice to generate a checksum or similar so that the stack image used is one that has already been scanned and approved by the operators (instead of blindly relying on a URL whose content could have changes/updates/injections that would be hard to spot).

Member Author

This is already part of the Docker features. CF has a way to reference images: https://docs.cloudfoundry.org/devguide/deploy-apps/push-docker.html
The tag can already be a digest today; it's a bit hidden in the docs because it's called version there.
This RFC just uses what's already there, including this feature.
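
To illustrate the compliance check being discussed, here is a minimal sketch of how a reference could be tested for digest pinning and compared against a set of digests an operator has already scanned. The function names and the idea of an operator-maintained approved set are assumptions for illustration only; neither is part of the RFC or of the CF API.

```python
import re

# A reference is considered pinned when it ends in an @sha256:<64 hex> digest.
DIGEST_SUFFIX = re.compile(r"@(sha256:[0-9a-f]{64})$")


def is_digest_pinned(image_ref: str) -> bool:
    """Return True if the image reference ends with a sha256 digest."""
    return DIGEST_SUFFIX.search(image_ref) is not None


def is_approved(image_ref: str, approved_digests: set) -> bool:
    """Hypothetical operator check: pinned AND the digest was scanned before."""
    match = DIGEST_SUFFIX.search(image_ref)
    if match is None:
        return False  # mutable tags such as ':latest' are rejected
    return match.group(1) in approved_digests
```

A reference with a mutable tag such as `registry.example.com/stacks/cflinuxfs3:latest` fails both checks, while a digest-pinned reference passes `is_approved` only if its digest is in the approved set.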

@FloThinksPi
Member Author

> I do like the idea of being able to natively support alternate stacks (in our case creating a "hardened" cflinuxfs4 stack) but for every additional stack provided to customers we need to make sure smoke/acceptance tests still pass

We explicitly do not have to ensure that! We only have to do it for system stacks that are shipped as part of cf-deployment. This is similar to how we handle buildpacks: we only test the buildpacks we ship in cf-deployment. If a customer uses a custom buildpack (this feature has existed for years) and thereby takes ownership of the buildpack they use (https://docs.cloudfoundry.org/buildpacks/custom.html), then it is already their obligation to make sure it is compatible with the system stack.
With custom stacks, as written in the RFC, we also require the use of a custom buildpack. It's not possible to use a custom stack with a system buildpack. Thus the app developer can take over full ownership of this stack, if they require that for whatever reason, similar to how they could partially with custom buildpacks. It is optional for a CF user to opt in, and under no circumstances is the CF community obliged to test a custom buildpack or a custom stack.

Member

@beyhan beyhan left a comment

Please follow the RFC draft creation process and change the name of the file to rfc-draft-provide-custom-stacks-functionality.md because our automation generates and assigns the RFC number when it is accepted and merged.

@beyhan beyhan moved this from Inbox to In Progress in CF Community Jul 22, 2025
@beyhan beyhan self-requested a review July 23, 2025 09:03
@Gerg
Member

Gerg commented Jul 29, 2025

Observation: This RFC introduces a platform dependency on an external container registry, if you want to use this feature. While a registry dependency has existed for Docker/CNB lifecycles, this would be a first for regular buildpacks. This may be an adoption barrier, since these buildpack apps wouldn't require a registry until their stack is removed.

As a thought exercise, I could imagine an alternate implementation where the app stack is a tar file that is uploaded to the CAPI blobstore.

@FloThinksPi
Member Author

> Observation: This RFC introduces a platform dependency on an external container registry, if you want to use this feature. While a registry dependency has existed for Docker/CNB lifecycles, this would be a first for regular buildpacks. This may be an adoption barrier, since these buildpack apps wouldn't require a registry until their stack is removed.
>
> As a thought exercise, I could imagine an alternate implementation where the app stack is a tar file that is uploaded to the CAPI blobstore.

True, I covered this in https://github.com/cloudfoundry/community/pull/1251/files#diff-b9b4cb8a848bbbf5ae034e92f8810d910e77b55cdb30882d8c111fe7f19db8bdR358-R364

Since fixing the availability issues for the Docker lifecycle is another big topic, the idea was to propose another RFC for that specifically :)
As long as this is not fixed, we could add to this RFC that this feature flag (which defaults to off) is experimental for this reason.

@beyhan
Member

beyhan commented Jul 30, 2025

> > Observation: This RFC introduces a platform dependency on an external container registry, if you want to use this feature. While a registry dependency has existed for Docker/CNB lifecycles, this would be a first for regular buildpacks. This may be an adoption barrier, since these buildpack apps wouldn't require a registry until their stack is removed.
> >
> > As a thought exercise, I could imagine an alternate implementation where the app stack is a tar file that is uploaded to the CAPI blobstore.
>
> True, I covered this in https://github.com/cloudfoundry/community/pull/1251/files#diff-b9b4cb8a848bbbf5ae034e92f8810d910e77b55cdb30882d8c111fe7f19db8bdR358-R364
>
> Since fixing the availability issues for the Docker lifecycle is another big topic, the idea was to propose another RFC for that specifically :) As long as this is not fixed, we could add to this RFC that this feature flag (which defaults to off) is experimental for this reason.

To my understanding, @Gerg's concern isn't about whether registries are reliable, but about whether we should introduce that external dependency at all. Introducing a registry can impact adoption. One additional use case for this could be an air-gapped environment, where teams would be forced to maintain a private registry if they want to use this feature. That extra infrastructure brings operational overhead and complexity, whereas a solution that relies solely on the Cloud Foundry components already in place works out of the box and behaves predictably in both connected and disconnected environments.

Member

@beyhan beyhan Aug 4, 2025

Please also change the folder of this file to rfc-draft-provide-custom-stacks-functionality because it is not yet clear which number will be assigned to this RFC and it will be renamed when the number is generated after approval and merge.

@Gerg
Member

Gerg commented Aug 5, 2025

> Observation: This RFC introduces a platform dependency on an external container registry, if you want to use this
>
> ...
>
> To my understanding, @Gerg's concern isn't about whether registries are reliable, but about whether we should introduce that external dependency at all. Introducing a registry can impact adoption. One additional use case for this could be an air-gapped environment, where teams would be forced to maintain a private registry if they want to use this feature. That extra infrastructure brings operational overhead and complexity, whereas a solution that relies solely on the Cloud Foundry components already in place works out of the box and behaves predictably in both connected and disconnected environments.

As an example scenario:

I'm a Cloud Foundry operator, and I have a number of apps running on cflinuxfs3 in my environment. I want to move over to cflinuxfs4 only, for security reasons and to keep up-to-date with the latest CF releases. I only support traditional buildpack apps on my environment (no Docker, no CNB).

Currently, my only option is to force-update them to cflinuxfs4, using something like Stack Auditor (assuming I can't get the app devs to do it), which isn't guaranteed to work.

This RFC gives me another option, but only if I have access to a container registry in my environment. I can't use public registries (e.g. Docker Hub) for security/compliance reasons. So, I'd now have to deploy/operate my own private registry in order to use this feature (and only for this feature), which is a significant barrier to entry.


It could be that most/all CF operators would already have a container registry, either for CF Docker apps, or for other platforms (e.g. Kubernetes). Maybe the above scenario is too much of an edge case to worry about in 2025.

Alternatively, this could be evidence that CF should start including a container registry as part of the "batteries included" experience (similar to how we include the WebDAV blobstore). Though, in this particular case, I'm not sure it buys us much over just storing the stacks in the existing blobstore.

@Gerg
Member

Gerg commented Aug 5, 2025

If the primary use case is for stack migration, I could imagine a UX where the stack is automatically persisted for the app, behind the scenes. Something like:

$ cf freeze-app-stack my-app

That command would take a snapshot of the stack currently used by the app and copy it to the CC blobstore. In the future, the app will use the "frozen" app stack, until the app is updated to use a regular stack.

This makes the stack migration use case more seamless, but it doesn't support use cases like app developers running custom stacks (which could be a good thing 🤔).

@rkoster
Contributor

rkoster commented Aug 6, 2025

> If the primary use case is for stack migration, I could imagine a UX where the stack is automatically persisted for the app, behind the scenes. Something like:
>
> $ cf freeze-app-stack my-app
>
> That command would take a snapshot of the stack currently used by the app and copy it to the CC blobstore. In the future, the app will use the "frozen" app stack, until the app is updated to use a regular stack.
>
> This makes the stack migration use case more seamless, but it doesn't support use cases like app developers running custom stacks (which could be a good thing 🤔).

I like the idea from a UX point of view, but maybe it should go even further and just freeze the whole app, meaning droplet + stack. Just freezing the stack won't help much if system buildpacks have removed support for that stack.

Basically you are taking an existing app and making an OCI image out of the stack + droplet, but storing it in the blobstore instead of an OCI registry.

@beyhan
Member

beyhan commented Aug 6, 2025

> > If the primary use case is for stack migration, I could imagine a UX where the stack is automatically persisted for the app, behind the scenes. Something like:
> >
> > $ cf freeze-app-stack my-app
> >
> > That command would take a snapshot of the stack currently used by the app and copy it to the CC blobstore. In the future, the app will use the "frozen" app stack, until the app is updated to use a regular stack.
> > This makes the stack migration use case more seamless, but it doesn't support use cases like app developers running custom stacks (which could be a good thing 🤔).
>
> I like the idea from a UX point of view, but maybe it should go even further and just freeze the whole app, meaning droplet + stack. Just freezing the stack won't help much if system buildpacks have removed support for that stack.
>
> Basically you are taking an existing app and making an OCI image out of the stack + droplet, but storing it in the blobstore instead of an OCI registry.

I have concerns about freezing the app at this stage, as it would prevent any updates until the migration to the next technology stack is complete. In my experience, teams typically need the flexibility to continue updating and maintaining their applications throughout the migration process, rather than having a hard freeze in place.

@stephanme
Member

In a standard cf-deployment, you have the last 5 droplets as history. This history is kept if staging fails because of missing system buildpacks and/or because the stack was disabled (see #1220, disabled = apps continue to run but can't be staged anymore).

If a user has ignored the deprecation timeline and all announcements (happens only too often) but still "needs the flexibility to continue updating and maintaining their applications throughout the migration process", the user needs to do something with the app before the app can be staged again:

  • configure a custom stack
  • configure a custom buildpack if the buildpacks for the old stack have already been removed from the system buildpacks

The main use case that I see for custom stacks is to provide a rather quick solution for a user escalation. I don't think that this has to be effortless for the user, nor does it have to provide the same nice experience as CF usually provides for buildpack apps. And I don't see this as a permanent solution for apps to use old stacks, just as a workaround.

That said, I think a registry-based solution for custom stacks is good enough for the stack migration use case. Maintaining the custom stack in the blobstore (e.g. via cf freeze-app-stack my-app or a dedicated custom stack upload) could be the next iteration and address more use cases than stack migration.

@beyhan
Member

beyhan commented Aug 13, 2025

> In a standard cf-deployment, you have the last 5 droplets as history. This history is kept if staging fails because of missing system buildpacks and/or because the stack was disabled (see #1220, disabled = apps continue to run but can't be staged anymore).
>
> If a user has ignored the deprecation timeline and all announcements (happens only too often) but still "needs the flexibility to continue updating and maintaining their applications throughout the migration process", the user needs to do something with the app before the app can be staged again:
>
>   • configure a custom stack
>   • configure a custom buildpack if the buildpacks for the old stack have already been removed from the system buildpacks
>
> The main use case that I see for custom stacks is to provide a rather quick solution for a user escalation. I don't think that this has to be effortless for the user, nor does it have to provide the same nice experience as CF usually provides for buildpack apps. And I don't see this as a permanent solution for apps to use old stacks, just as a workaround.
>
> That said, I think a registry-based solution for custom stacks is good enough for the stack migration use case. Maintaining the custom stack in the blobstore (e.g. via cf freeze-app-stack my-app or a dedicated custom stack upload) could be the next iteration and address more use cases than stack migration.

I think narrowing the scope of this RFC to the migration use case and hiding the feature behind a feature flag leaves enough options to evolve the feature in the future, or to simply not use it in its current state.


##### CF API Changes

First of all, the CF API SHOULD add a new feature flag, similar to the `diego_docker` feature flag that enables the use of Docker lifecycle container images. This flag SHOULD be called `diego_custom_stacks` and be disabled by default in the CF API.
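
As a rough illustration of how such a flag could gate requests, the sketch below combines the proposed `diego_custom_stacks` flag with the rule stated earlier in this thread that a custom stack can only be used together with a custom buildpack. `PushRequest` and `validate_push` are hypothetical names, and the plain dictionary stands in for however the Cloud Controller actually stores feature flags.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PushRequest:
    uses_custom_stack: bool      # stack was given as an image reference
    uses_custom_buildpack: bool  # buildpack is not a system buildpack


def validate_push(request: PushRequest, feature_flags: Dict[str, bool]) -> List[str]:
    """Return a list of validation errors; an empty list means the push is allowed."""
    errors: List[str] = []
    if not request.uses_custom_stack:
        return errors  # system stacks are unaffected by the flag
    # The flag is treated as disabled unless explicitly enabled.
    if not feature_flags.get("diego_custom_stacks", False):
        errors.append("custom stacks are disabled (feature flag diego_custom_stacks is off)")
    if not request.uses_custom_buildpack:
        errors.append("a custom stack requires a custom buildpack")
    return errors
```

With the flag off, any request that names a custom stack is rejected, while requests using system stacks are unaffected regardless of the flag.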

Labels
rfc CFF community RFC toc
Projects
Status: In Progress
6 participants