-
Notifications
You must be signed in to change notification settings - Fork 1.6k
KEP 1645: clarify the impact of conflict condition #5463
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Welcome @zhiying-lin! |
Hi @zhiying-lin. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/ok-to-test |
Makes sense to me ✨ |
@@ -991,6 +991,8 @@ not simply be merged, a `ServiceExportConflict` condition will be set on all | |||
The conflict will be resolved by assigning precedence based on each | |||
`ServiceExport`'s `creationTimestamp`, from oldest to newest.** | |||
|
|||
**Note:** When a `ServiceExport`'s conflict condition changes from `True` to `False` due to this resolution policy, runtime traffic remains unaffected. The oldest cluster will win the conflict and continue to be referenced in the `ServiceImport`, maintaining service continuity. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you clarify the last statement, specifically what you mean by "continue to be referenced in the ServiceImport"?
Also, there's a few scenarios whereby the conflict condition can transition from True
to False
:
- A conflicting cluster's service was updated so that it no longer conflicts
- A conflicting cluster's service was unexported and there's no longer any conflict
- The oldest cluster's service properties were updated and there's no longer any conflict
- The oldest cluster's service was unexported and there's no longer any conflict
I wouldn't expect the first 3 to adversely affect the derived service. 4) would entail determining the next oldest exported service as the winner and may entail a change to the derived service's properties which possibly could affect runtime traffic? So should an implementation avoid applying a change that might be disruptive, eg changing the headlessness of the derived service (K8s doesn't allow this)? I think this is what you're implying - if so, perhaps we should explicitly state it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zhiying-lin was the intent describing the flip from False
to True
? Conflict
is one of those strange negative-polarity conditions where True
is the state where something is bad (but agreed in spirit with the clarification of older state remaining effective to avoid data plane disruption)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my bad, my original intent is to describe the flip from from FALSE
to TRUE
. Updated the PR to reflect that.
@tpantelis good question about case 4. Ideally the implementation should avoid applying the disruptive change, but there is no place to present such information to the user, "serviceImport remains unchanged because of xxxx"
I'm wondering how you handle today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The updates look good.
I'm wondering how you handle today.
We don't apply disruptive changes. The only property (currently) that really fits this criteria is headlessness (ie Type
). In fact, in Submariner, we don't even export a subsequent cluster's service if the Type
doesn't match that of the first exporting cluster (we set a condition indicating such).
Another interesting scenario is if the first observed exporting cluster isn't actually the oldest. Eg, the ServiceExport
for cluster1 is observed and the service exported. Some time later, cluster2 is exported but it's timestamp is actually older for some reason, in which case, cluster2 should technically take precedence and possibly change the derived service's property. However, if the Type
differs and Type
is treated as immutable and cluster2 is rejected then is that technically non-compliance with the MCS spec (in Submariner, we treat oldest as first observed in this case)? But I think the statements you added wrt maintaining service continuity should cover this case as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Cilium we are changing the type of the ServiceImport with whatever type has the oldest ServiceExport and it will disrupt traffic at least for this very specific situation.
Both sides seems to have some kind of cons as if you stick with one type you would have to tell user to delete the ServiceImport manually I guess? Also until very recently we had no ServiceImport conditions to warn that it was still stuck with a wrong type somehow 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we were to standardize Headless-ness change to be like Submariner is doing right now I would probably expect some sentence specifically talking about that in the KEP I think though.
Also any other change to other fields could be disruptive in some form (although probably less impactful than Headless-ness). So I am not sure if the current phrasing in this PR helps that much 🤔
EDIT: ah woops I mis-interpreted the PR text (confused True
meaning with False
) it's fine as it uses "may" for things that could change some fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, there's no perfect solution here. using "may" is to leave some rooms for different implementations.
/lgtm |
@zhiying-lin could you please squash your commits? |
21b4371
to
f4a0f06
Compare
New changes are detected. LGTM label has been removed. |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an important clarification of the behvaior of the system when we have conflicts.
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: MrFreezeex, ryanzhang-oss, zhiying-lin The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
The impact of a conflicting service on existing imported services isn’t explicit in the KEP; it’s possible to understand the conflict resolution process as implying that ServiceImports are constructed based on the resolution order, but it would be useful to make that explicit in the KEP