Skip to content

KEP 1645: clarify the impact of conflict condition #5463

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

zhiying-lin
Copy link

  • One-line PR description: clarify the impact of conflict condition
  • Other comments:
    The impact of a conflicting service on existing imported services isn’t explicit in the KEP; it’s possible to understand the conflict resolution process as implying that ServiceImports are constructed based on the resolution order, but it would be useful to make that explicit in the KEP

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory labels Jul 30, 2025
@k8s-ci-robot k8s-ci-robot requested review from JeremyOT and skitt July 30, 2025 09:56
@k8s-ci-robot k8s-ci-robot added the sig/multicluster Categorizes an issue or PR as relevant to SIG Multicluster. label Jul 30, 2025
@k8s-ci-robot
Copy link
Contributor

Welcome @zhiying-lin!

It looks like this is your first PR to kubernetes/enhancements 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/enhancements has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jul 30, 2025
@k8s-ci-robot
Copy link
Contributor

Hi @zhiying-lin. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Jul 30, 2025
@skitt
Copy link
Member

skitt commented Jul 30, 2025

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 30, 2025
@michaelawyu
Copy link

Makes sense to me ✨

@skitt
Copy link
Member

skitt commented Jul 30, 2025

/cc @tpantelis @MrFreezeex

@@ -991,6 +991,8 @@ not simply be merged, a `ServiceExportConflict` condition will be set on all
The conflict will be resolved by assigning precedence based on each
`ServiceExport`'s `creationTimestamp`, from oldest to newest.**

**Note:** When a `ServiceExport`'s conflict condition changes from `True` to `False` due to this resolution policy, runtime traffic remains unaffected. The oldest cluster will win the conflict and continue to be referenced in the `ServiceImport`, maintaining service continuity.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify the last statement, specifically what you mean by "continue to be referenced in the ServiceImport"?

Also, there's a few scenarios whereby the conflict condition can transition from True to False:

  1. A conflicting cluster's service was updated so that it no longer conflicts
  2. A conflicting cluster's service was unexported and there's no longer any conflict
  3. The oldest cluster's service properties were updated and there's no longer any conflict
  4. The oldest cluster's service was unexported and there's no longer any conflict

I wouldn't expect the first 3 to adversely affect the derived service. 4) would entail determining the next oldest exported service as the winner and may entail a change to the derived service's properties which possibly could affect runtime traffic? So should an implementation avoid applying a change that might be disruptive, eg changing the headlessness of the derived service (K8s doesn't allow this)? I think this is what you're implying - if so, perhaps we should explicitly state it.

Copy link
Member

@mikemorris mikemorris Jul 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhiying-lin was the intent describing the flip from False to True? Conflict is one of those strange negative-polarity conditions where True is the state where something is bad (but agreed in spirit with the clarification of older state remaining effective to avoid data plane disruption)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my bad, my original intent is to describe the flip from from FALSE to TRUE. Updated the PR to reflect that.

@tpantelis good question about case 4. Ideally the implementation should avoid applying the disruptive change, but there is no place to present such information to the user, "serviceImport remains unchanged because of xxxx"

I'm wondering how you handle today.

Copy link
Contributor

@tpantelis tpantelis Jul 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The updates look good.

I'm wondering how you handle today.

We don't apply disruptive changes. The only property (currently) that really fits this criteria is headlessness (ie Type). In fact, in Submariner, we don't even export a subsequent cluster's service if the Type doesn't match that of the first exporting cluster (we set a condition indicating such).

Another interesting scenario is if the first observed exporting cluster isn't actually the oldest. Eg, the ServiceExport for cluster1 is observed and the service exported. Some time later, cluster2 is exported but it's timestamp is actually older for some reason, in which case, cluster2 should technically take precedence and possibly change the derived service's property. However, if the Type differs and Type is treated as immutable and cluster2 is rejected then is that technically non-compliance with the MCS spec (in Submariner, we treat oldest as first observed in this case)? But I think the statements you added wrt maintaining service continuity should cover this case as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Cilium we are changing the type of the ServiceImport with whatever type has the oldest ServiceExport and it will disrupt traffic at least for this very specific situation.

Both sides seems to have some kind of cons as if you stick with one type you would have to tell user to delete the ServiceImport manually I guess? Also until very recently we had no ServiceImport conditions to warn that it was still stuck with a wrong type somehow 🤔

Copy link
Member

@MrFreezeex MrFreezeex Jul 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we were to standardize Headless-ness change to be like Submariner is doing right now I would probably expect some sentence specifically talking about that in the KEP I think though.

Also any other change to other fields could be disruptive in some form (although probably less impactful than Headless-ness). So I am not sure if the current phrasing in this PR helps that much 🤔

EDIT: ah woops I mis-interpreted the PR text (confused True meaning with False) it's fine as it uses "may" for things that could change some fields.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, there's no perfect solution here. using "may" is to leave some rooms for different implementations.

@tpantelis
Copy link
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 31, 2025
@skitt
Copy link
Member

skitt commented Jul 31, 2025

@zhiying-lin could you please squash your commits?

@k8s-ci-robot
Copy link
Contributor

New changes are detected. LGTM label has been removed.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 1, 2025
@zhiying-lin
Copy link
Author

@zhiying-lin could you please squash your commits?
done

Copy link
Contributor

@ryanzhang-oss ryanzhang-oss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an important clarification of the behvaior of the system when we have conflicts.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: MrFreezeex, ryanzhang-oss, zhiying-lin
Once this PR has been reviewed and has the lgtm label, please assign jeremyot for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory ok-to-test Indicates a non-member PR verified by an org member that is safe to test. sig/multicluster Categorizes an issue or PR as relevant to SIG Multicluster. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants