Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add conformance tests for EndpointSlices #79

Merged
merged 1 commit into from
Oct 31, 2024

Conversation

tpantelis
Copy link
Contributor

@tpantelis tpantelis commented Oct 2, 2024

Look for an MCS EndpointSlice using the criteria whereby the endpointslice.kubernetes.io/managed-by label is not equal to endpointslice-controller.k8s.io or any MCS label is present. If not found then fail the test as non-conformant. Otherwise verify the required MCS labels are correctly set.

The tests are labeled with "EndpointSlice" so, if EndpointSlices aren't supported by the implementation, the user can use the "EndpointSlice" label to filter out the tests.

Fixes #70

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Oct 2, 2024
@k8s-ci-robot
Copy link
Contributor

Hi @tpantelis. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 2, 2024
@tpantelis
Copy link
Contributor Author

/cc @skitt

@tpantelis
Copy link
Contributor Author

/cc @MrFreezeex When you get a chance, could you test this with Cilium (assuming Cilium uses EndpointSlices)? I've tested it with Submariner but Submariner doesn't add the ServiceImport ownership (currently, I'll be adding that at some point) so I haven't tested the ServiceImport ownership check passing.

@MrFreezeex
Copy link
Member

MrFreezeex commented Oct 3, 2024

/cc @MrFreezeex When you get a chance, could you test this with Cilium (assuming Cilium uses EndpointSlices)? I've tested it with Submariner but Submariner doesn't add the ServiceImport ownership (currently, I'll be adding that at some point) so I haven't tested the ServiceImport ownership check passing.

Actually we don't do that too, our EndpointSlice have an OwnerReference to a Service which have an OwnerReference to the actual ServiceImport. We are doing it this way because while the EndpointSlice synchronization was done with MCS-API in mind it is actually tied to the annotation system of global Services (aka the way we are doing multicluster networking without MCS-API) so that even non MCS-API users can use this feature (as we are the CNI + replacing kube-proxy before this EndpointSlice synchronization feature we were keeping the remote endpoints internally and not exposing it anywhere on the Kubernetes API).

I mentioned that two sig meeting ago as while reading the associated issue to this I thought the OwnerReference was explicitly stated as something mandatory in the KEP and people that were on the call were all agreeing that the KEP shoudn't mandate to have an OwnerReference to a ServiceImport as long as there is any cleanup mechanism in place and that the conformance tests may check that EndpointSlices are cleaned up correctly but not necessarily in what way it does this. But after that call when I wanted to PR the KEP, I realized that it wasn't technically stated explicitly in the KEP actually x).

@tpantelis
Copy link
Contributor Author

I mentioned that two sig meeting ago as while reading the associated issue to this I thought the OwnerReference was explicitly stated as something mandatory in the KEP and people that were on the call were all agreeing that the KEP shoudn't mandate to have an OwnerReference to a ServiceImport as long as there is any cleanup mechanism in place and that the conformance tests may check that EndpointSlices are cleaned up correctly but not necessarily in what way it does this.

OK - I can remove the OwnerReference check. That would help Submariner compliance too since we don't use it for cleanup either 😄 I'll expand the test to un-export and ensure the EPS is cleaned up.

But after that call when I wanted to PR the KEP, I realized that it wasn't technically stated explicitly in the KEP actually x).

I think it is in this section at the end of the second paragraph: "EndpointSlices will have an owner reference to their associated ServiceImport.". I can push a PR to remove that line.

@tpantelis
Copy link
Contributor Author

I can push a PR to remove that line.

kubernetes/enhancements#4900

@tpantelis
Copy link
Contributor Author

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 8, 2024
@tpantelis
Copy link
Contributor Author

/test pull-mcs-api-verify

@tpantelis
Copy link
Contributor Author

/cc @skitt @MrFreezeex

eps := &list.Items[i]

if slices.IndexFunc(eps.OwnerReferences, func(r metav1.OwnerReference) bool {
return r.Kind == "Service"
Copy link
Member

@MrFreezeex MrFreezeex Oct 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As our owner reference for those endpointslices in Cilium is a Service too this wouldn't work for us. IIUC this check is to ensure that we don't end picking up regular/non multi cluster endpointslice so could this be replaced by checking the label discoveryv1.LabelManagedBy (like it's already checked above actually) perhaps 🙏?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That leads to a Catch-22 — one of the purposes of the test is to verify that EndpointSlices created by the MCS controller have the appropriate “managed by” label 😉. Given the leniency of the spec though I suspect we won’t be able to come up with a guaranteed test. Perhaps checking that the “managed by” label isn’t endpointslice-controller.k8s.io is good enough: if an MCS controller uses that label value, it’s asking for trouble anyway, so it’s fair game to ignore such EndpointSlices IMO; and we can specify that the conformance suite shouldn’t be run with non-default EndpointSlice controllers. Another option would be to allow the user to specify the “managed-by” value to check for.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going by the original statement in the spec, "EndpointSlices will have an owner reference to their associated ServiceImport", but that's been since removed. I didn't check for the endpointslice.kubernetes.io/managed-by label here in order to verify this statement in the spec, "These EndpointSlice objects will be marked as managed by the clusterset service controller, so that the endpoint slice controller doesn’t delete them". But I guess checking that the endpointslice.kubernetes.io/managed-by label is not equal to endpointslice-controller.k8s.io is the only reliable way to distinguish a K8s EndpointSlice from an MCS one.

Copy link
Contributor Author

@tpantelis tpantelis Oct 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@skitt Our comments crossed. It sounds like we concluded the same thing re: checking endpointslice.kubernetes.io/managed-by here.

Another option is to fail the test as non-conformant if we don't find an EndpointSlice not managed by endpointslice-controller.k8s.io. If the implementation doesn't support MCS EndpointSlices then the user can filter out the test via the EndpointSlice Ginkgo label. But the test is also labeled as Optional so, by default, it would effectively be treated as required which could be viewed as inconsistent although it's just a categorization for the report.

@skitt @MrFreezeex WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds about right that was also very similar to what I had in mind too, thanks 🙏!

Copy link
Contributor Author

@tpantelis tpantelis Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK so the MCS labels get propagated to the local EPS by the K8s EPS controller via the service labels. In this case, Cilium is still compliant b/c the EPS isn't actually created/managed by an MCS controller although your kubernetes/enhancements#4930 PR further clarifies it in the spec.

So there's really no way to tell if the EPS was created by an MCS controller or the K8s controller, hence cannot verify the endpointslice.kubernetes.io/managed-by label. We could add a flag to try to guide the test as to who manages the EPS but Cilium muddies that further b/c some are MCS-managed and some not. So I think the only thing we can search for is the presence of at least one MCS label.

BTW, with #81 we could verify the value of the multicluster.kubernetes.io/source-cluster label although I'm not entirely sure if the cluster name extracted from the kubeconfig is guaranteed to match what the MCS implementation views as the cluster name (@skitt ).

Copy link
Member

@MrFreezeex MrFreezeex Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK so the MCS labels get propagated to the local EPS by the K8s EPS controller via the service labels. In this case, Cilium is still compliant b/c the EPS isn't actually created/managed by an MCS controller although your kubernetes/enhancements#4930 PR further clarifies it in the spec.

So there's really no way to tell if the EPS was created by an MCS controller or the K8s controller, hence cannot verify the endpointslice.kubernetes.io/managed-by label. We could add a flag to try to guide the test as to who manages the EPS but Cilium muddies that further b/c some are MCS-managed and some not. So I think the only thing we can search for is the presence of at least one MCS label.

Yeah I am thinking that the managed-by check could be an opt-out check somehow. IMO this should be fine as is, Cilium is probably the only one doing that and to integrate this test to Cilium correctly I would also have to change a bit the test because endpointslice syncing is opt-in in Cilium so I would probably need to add a way to hook into the Service creation or some kind of argument . This means that I could also add something to optionally ignore the managed-by check at the ~ same time.

BTW, with #81 we could verify the value of the multicluster.kubernetes.io/source-cluster label although I'm not entirely sure if the cluster name extracted from the kubeconfig is guaranteed to match what the MCS implementation views as the cluster name (@skitt ).

What's in kubeconfig is kind of free form so I would guess that you could provide a kubeconfig that is different on many implementations x). Although the conformance test could probably just assume that the user provides something that actually match if it's clearly stated somewhere 🤷‍♂️.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this should be fine as is

I assume you mean the way it is now in my latest commit even tho it would fail on Cilium. Of course, you could always filter out this test since it's optional.

endpointslice syncing is opt-in in Cilium

We have a similar situation with Submariner re: ServiceImport VIP. By default we don't set this so the test fails. However we recently added an option to set the VIP so when running the conformance tests we set the option. However this causes the DNS test to fail as we rely on some external component to handle that. But the DNS test is optional so I can filter it out 😄 I assume you could do similar, ie filter out the EPS test according to your opt-in setting.

add something to optionally ignore the managed-by check at the ~ same time.

I could do that with this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, with #81 we could verify the value of the multicluster.kubernetes.io/source-cluster label although I'm not entirely sure if the cluster name extracted from the kubeconfig is guaranteed to match what the MCS implementation views as the cluster name (@skitt ).

What's in kubeconfig is kind of free form so I would guess that you could provide a kubeconfig that is different on many implementations x). Although the conformance test could probably just assume that the user provides something that actually match if it's clearly stated somewhere 🤷‍♂️.

Yup, the cluster name in the kubeconfig is only relevant on the client, it doesn’t have to be related to the cluster id used by the MCS controller.

Copy link
Contributor Author

@tpantelis tpantelis Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a skip-verify-eps-managed-by CLI flag defaulting to false. if true, it still checks that the endpointslice.kubernetes.io/managed-by label is present but does not check that value.

@tpantelis tpantelis force-pushed the eps_conformance branch 2 times, most recently from b05a597 to dd51285 Compare October 25, 2024 12:41
Look for an MCS EndpointSlice using the criteria whereby the
"endpointslice.kubernetes.io/managed-by" label is not equal to
endpointslice-controller.k8s.io" or any MCS label is present.
If not found then fail the test as non-conformant. Otherwise
verify the required MCS labels are correctly set.

The tests are labeled with "EndpointSlice" so, if EndpointSlices
aren't supported by the implmentation, the user can use the
"EndpointSlice" label to filter out the tests.

Fixes kubernetes-sigs#70

Signed-off-by: Tom Pantelis <[email protected]>
Copy link
Member

@MrFreezeex MrFreezeex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for bearing with me that looks great 🙇
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 25, 2024
@tpantelis
Copy link
Contributor Author

Thanks a lot for bearing with me that looks great 🙇 /lgtm

@skitt ? 🙏

@skitt
Copy link
Member

skitt commented Oct 31, 2024

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: MrFreezeex, skitt, tpantelis

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 31, 2024
@k8s-ci-robot k8s-ci-robot merged commit 3e8e407 into kubernetes-sigs:master Oct 31, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Conformance test: EndpointSlices
4 participants