
Fix- When removing all services from a cluster deployment the services remain deployed #1141

Merged
merged 4 commits into k0rdent:main on Mar 7, 2025

Conversation

kylewuolle
Contributor

@kylewuolle kylewuolle commented Feb 28, 2025

This simple fix resolves an issue where Sveltos doesn't remove services when the services list is empty or missing in the cluster deployment after services were previously deployed. The fix is to remove the Sveltos Profile, which triggers the addon controller to undeploy all the Helm charts associated with that Profile.
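As a rough sketch of the fix described above (the `Service`, `Profile`, `fakeClient`, and `reconcileServices` names are illustrative stand-ins, not the actual k0rdent/kcm code):

```go
package main

import "fmt"

// Illustrative stand-ins for the real CRD and client types; these names
// are hypothetical and not the actual k0rdent/kcm API.
type Service struct{ Name string }

type Profile struct{ Name string }

// fakeClient simulates create/delete calls against the API server.
type fakeClient struct{ profiles map[string]Profile }

func (c *fakeClient) Delete(name string)       { delete(c.profiles, name) }
func (c *fakeClient) CreateOrUpdate(p Profile) { c.profiles[p.Name] = p }

// reconcileServices sketches the proposed fix: when the services list is
// empty, delete the Sveltos Profile so the addon controller undeploys
// every Helm chart associated with it.
func reconcileServices(c *fakeClient, profileName string, services []Service) {
	if len(services) == 0 {
		c.Delete(profileName) // triggers undeploy of all charts
		return
	}
	c.CreateOrUpdate(Profile{Name: profileName}) // normal create/update path
}

func main() {
	c := &fakeClient{profiles: map[string]Profile{"my-cluster": {Name: "my-cluster"}}}
	reconcileServices(c, "my-cluster", nil)
	fmt.Println(len(c.profiles)) // 0: profile removed, charts get undeployed
}
```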

Testing performed:

  1. Create cluster deployment with nginx/kyverno installed
  2. Remove nginx and apply the cluster deployment yaml
  3. nginx is uninstalled as expected
  4. Remove kyverno and apply the yaml file again; kyverno is now uninstalled. Prior to this fix, kyverno would have remained.

Closes #1132

Contributor

@zerospiel zerospiel left a comment


I do not find the suggested approach sustainable, nor does it fit the controller pattern: the Profile is created unconditionally several LOC above and then almost immediately removed. I'd suggest adding logic to determine whether a Profile instance should be created at all, and/or to update the instance according to the services given in the .spec.

@zerospiel zerospiel requested a review from wahabmk March 3, 2025 11:21
@wahabmk
Contributor

wahabmk commented Mar 3, 2025

This needs to be corrected for the MultiClusterService as well. Both of them call the sveltos.GetHelmChartOpts function, so that's probably a reasonable place for the fix. BTW, looking at the following loop within this function:

for _, svc := range services {

I am trying to understand why we are seeing this bug. The way I understand it: if len(services) == 0, then the opts returned by this function will also be empty. So when opts is passed to the ReconcileClusterProfile function, the resulting ClusterProfile spec should also have .spec.HelmCharts=[]. When this spec is finally reconciled, shouldn't that have the effect of updating the ClusterProfile object to have 0 Helm charts?

Why is that not happening?
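The emptying behavior being described can be sketched in plain Go (the types and the function body here are illustrative; only the quoted loop shape comes from the real sveltos.GetHelmChartOpts):

```go
package main

import "fmt"

// Illustrative types; the real sveltos.GetHelmChartOpts works on the
// project's own Service and HelmChartOpts types.
type Service struct{ Chart, Version string }
type HelmChartOpts struct{ Chart, Version string }

// getHelmChartOpts mirrors the quoted loop: with an empty services slice
// the loop body never runs, so the returned opts slice is empty.
func getHelmChartOpts(services []Service) []HelmChartOpts {
	opts := make([]HelmChartOpts, 0, len(services))
	for _, svc := range services {
		opts = append(opts, HelmChartOpts{Chart: svc.Chart, Version: svc.Version})
	}
	return opts
}

func main() {
	// An empty input should therefore reconcile the ClusterProfile
	// down to .spec.helmCharts = [].
	fmt.Println(len(getHelmChartOpts(nil))) // 0
}
```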

@kylewuolle
Contributor Author

This needs to be corrected for the MultiClusterService as well. Both of them call the sveltos.GetHelmChartOpts function, so that's probably a reasonable place for the fix. BTW, looking at the following loop within this function:

for _, svc := range services {

I am trying to understand why we are seeing this bug. The way I understand it: if len(services) == 0, then the opts returned by this function will also be empty. So when opts is passed to the ReconcileClusterProfile function, the resulting ClusterProfile spec should also have .spec.HelmCharts=[]. When this spec is finally reconciled, shouldn't that have the effect of updating the ClusterProfile object to have 0 Helm charts?

Why is that not happening?

It appears that Sveltos doesn't handle this situation correctly (or maybe it does, depending on how you look at it). They basically seem to assume that if you want to remove the last Helm chart, you delete the Profile.

@kylewuolle
Contributor Author

I do not find the suggested approach sustainable, nor does it fit the controller pattern: the Profile is created unconditionally several LOC above and then almost immediately removed. I'd suggest adding logic to determine whether a Profile instance should be created at all, and/or to update the instance according to the services given in the .spec.

It actually creates or updates if you look into the ReconcileProfile function.

@kylewuolle kylewuolle requested a review from zerospiel March 3, 2025 21:38
@zerospiel
Contributor

I do not find the suggested approach sustainable, nor does it fit the controller pattern: the Profile is created unconditionally several LOC above and then almost immediately removed. I'd suggest adding logic to determine whether a Profile instance should be created at all, and/or to update the instance according to the services given in the .spec.

It actually creates or updates if you look into the ReconcileProfile function.

I understand that; my point was to add logic to skip creation if the Profile should not be created at all, and to update the conditions accordingly. The current approach creates the Profile, immediately removes it, and repeats that cycle on every reconcile. I discourage such API calls, especially when you consider not a single managed object but dozens or hundreds.

And please address what @wahabmk mentioned above:

This needs to be corrected for the MultiClusterService as well.

Contributor

@zerospiel zerospiel left a comment


Requesting changes, details are here

@wahabmk
Contributor

wahabmk commented Mar 4, 2025

Circling back to my comment: I remember testing some time back that the Profile/ClusterProfile object gets created with 0 HelmCharts when there are 0 services, which is why I made the following comment:

// NOTE: The Profile/ClusterProfile object will be updated with
// no helm charts if len(mc.Spec.Services) == 0. This will result
// in the helm charts being uninstalled on matching clusters if
// Profile/ClusterProfile originally had len(m.Spec.Services) > 0.

The "Making services list empty" section of #362 (where it was first written) verifies that this is the case. I wonder why this is not happening anymore, or are we missing something? Sveltos as of v0.46.1 still allows creation of Profile/ClusterProfile with 0 HelmCharts:

~ kubectl get profiles.config.projectsveltos.io kyverno -o yaml -n kcm-system 
apiVersion: config.projectsveltos.io/v1beta1
kind: Profile
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"config.projectsveltos.io/v1beta1","kind":"Profile","metadata":{"annotations":{},"name":"kyverno","namespace":"kcm-system"},"spec":{"clusterSelector":{"matchLabels":{"env":"prod"}}}}
  creationTimestamp: "2025-03-04T12:35:32Z"
  finalizers:
  - profilefinalizer.projectsveltos.io
  generation: 2
  name: kyverno
  namespace: kcm-system
  resourceVersion: "8571"
  uid: 60ad175d-81f6-4d7a-ae9e-9bbfc553e1eb
spec:
  clusterSelector:
    matchLabels:
      env: prod
  continueOnConflict: false
  continueOnError: false
  reloader: false
  stopMatchingBehavior: WithdrawPolicies
  syncMode: Continuous
  tier: 100

@zerospiel
Contributor

zerospiel commented Mar 4, 2025

@wahabmk, there is probably just an error in calculating the statuses, and it's that simple because I was relying purely on the clusterdeployment's status field. At the moment, I can clearly see from the code that, given no services, nothing updates the status, so it goes stale.

So, here is the following remark in the issue:

Extra: please validate whether a service removal also leads to its cleanup on the corresponding cluster (e.g., with at least 2 services, removing one entry from the list should lead to that service's removal).

@wahabmk
Contributor

wahabmk commented Mar 4, 2025

@wahabmk, there is probably just an error in calculating the statuses, and it's that simple because I was relying purely on the clusterdeployment's status field. At the moment, I can clearly see from the code that, given no services, nothing updates the status, so it goes stale.

So, here is the following remark in the issue:

Extra: please validate whether a service removal also leads to its cleanup on the corresponding cluster (e.g., with at least 2 services, removing one entry from the list should lead to that service's removal).

Yes, that makes sense. So it's probably not a problem of services not being removed, but that the ClusterDeployment and MultiClusterService objects still show a stale status, whereas they should show an empty status for the Helm charts. If so, it is likely due to this loop in updateServicesStatus().
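A minimal sketch of the stale-status shape being described (this updateServicesStatus is a hypothetical simplification, not the project's actual function):

```go
package main

import "fmt"

type ServiceStatus struct{ Name, State string }

// Buggy shape: the status is only written inside a loop over the current
// services, so with len(services) == 0 the loop body never runs and the
// previously reported entries survive untouched.
func updateServicesStatus(prev []ServiceStatus, services []string) []ServiceStatus {
	for _, name := range services {
		prev = append(prev, ServiceStatus{Name: name, State: "Deployed"})
	}
	return prev
}

func main() {
	stale := updateServicesStatus([]ServiceStatus{{Name: "kyverno", State: "Deployed"}}, nil)
	fmt.Println(len(stale)) // 1 — kyverno is still reported after being removed
}
```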

@zerospiel
Contributor

zerospiel commented Mar 4, 2025

Exactly! I'm glad we've localized the problem: ultimately it's an error in calculating the status. There are several approaches to handling the conditions; the default one is to reset the conditions at the start of the reconcile loop and then recalculate them. @kylewuolle, could you please make the corresponding amendments regarding the comments above?

And I assume that should be done for both the MultiClusterService and the ClusterDeployment.
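The reset-then-recalculate pattern suggested above might look like this (a hedged sketch under assumed types, not the project's actual code):

```go
package main

import "fmt"

type ServiceStatus struct{ Name, State string }

// recalcServicesStatus rebuilds the status from scratch on every
// reconcile instead of appending to the previous value, so an empty
// services list naturally yields an empty status.
func recalcServicesStatus(services []string) []ServiceStatus {
	out := make([]ServiceStatus, 0, len(services)) // reset first
	for _, name := range services {
		out = append(out, ServiceStatus{Name: name, State: "Deployed"})
	}
	return out
}

func main() {
	fmt.Println(len(recalcServicesStatus(nil))) // 0 — no stale entries remain
}
```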

@kylewuolle
Contributor Author

OK, so it turns out I was wrong! If you remove the Helm charts from the Profile, it does uninstall them properly, at least in the latest version of Sveltos. I've changed the PR to be just about ensuring the status is updated properly when all the Helm charts are removed.

@kylewuolle kylewuolle requested a review from zerospiel March 5, 2025 22:35
zerospiel
zerospiel previously approved these changes Mar 6, 2025
Contributor

@zerospiel zerospiel left a comment


LGTM, please fix the linter

…e sveltos Profile which will trigger the addon controller to remove the installed services.
…ame logic for multiservice and also added check for service length so the status is updated properly when services are removed completely
@zerospiel zerospiel merged commit fb186aa into k0rdent:main Mar 7, 2025
6 checks passed
Successfully merging this pull request may close these issues.

Complete Services removal does not lead to its actual removal
3 participants