[0.9] [SURE-8550] drift detection is generating secrets without cleaning #2515

Closed
kkaempf opened this issue Jun 13, 2024 · 3 comments

@kkaempf
Collaborator

kkaempf commented Jun 13, 2024

SURE-8550

Issue description:

When Self Healing (drift detection) is enabled, Fleet generates a new secret every time drift is detected, to the point where it might exhaust Rancher.
Fleet version: 0.9.4

Business impact:

For the customer, Rancher went down because too many secrets were being cached.

Troubleshooting steps:

Disabling self healing cleans up the secrets.

Repro steps:

  • create a git repo with a simple deployment (e.g. https://github.com/rbreddy/bundledependency/tree/main/hello-world)
  • enable self healing
  • scale the Deployment resource up (e.g. from 1 to 2 replicas), and observe how it's automatically reverted.
  • multiple secrets (type helm.sh/release.v1) will be created in the namespace for the deployment
    • This can also be checked using helm history commands in the target namespace and specifying the Helm release name.
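
The growth described in the repro steps can be counted with a small helper. This is a minimal sketch, assuming release secrets follow the `sh.helm.release.v1.<release>.v<N>` naming that `kubectl get secrets` reports; the function name `count_release_secrets` is hypothetical and not part of kubectl, Helm, or Fleet.

```shell
# count_release_secrets RELEASE
# Reads `kubectl get secrets` table output on stdin and prints how many
# distinct Helm release secrets (type helm.sh/release.v1) belong to RELEASE.
# The function name is illustrative only.
count_release_secrets() {
  awk -v prefix="sh.helm.release.v1.$1.v" '
    # Match on type and name prefix; count each name once, because
    # watch output (kubectl get -w) can repeat the same secret.
    $2 == "helm.sh/release.v1" && index($1, prefix) == 1 && !seen[$1]++ { n++ }
    END { print n + 0 }
  '
}
```

Possible usage, with the namespace and release name as placeholders: `kubectl get secrets -n <namespace> | count_release_secrets <release>`. A number that keeps rising after each drift correction matches the behavior reported here.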

Workaround:

Is a workaround available and implemented? Yes
What is the workaround: disable self healing (disabling self healing also removes all the secrets)

Actual behavior:

Multiple secrets are created for a single "correction", and old ones are preserved.

Expected behavior:

Only 1 secret is created per "correction", and the total number of stored Helm release revisions stays at a maximum of 2.

Files, logs, traces:

Additional notes:

helm history  test-fastweb-hello-world -n hello
REVISION	UPDATED                 	STATUS    	CHART                   	APP VERSION	DESCRIPTION
164     	Wed Jun 12 15:06:08 2024	superseded	nginx-rancherhello-0.0.1	0.0.0      	Upgrade complete
165     	Wed Jun 12 15:06:09 2024	superseded	nginx-rancherhello-0.0.1	0.0.0      	Upgrade complete
166     	Wed Jun 12 15:06:24 2024	superseded	nginx-rancherhello-0.0.1	0.0.0      	Rollback to 165
167     	Wed Jun 12 15:06:31 2024	deployed  	nginx-rancherhello-0.0.1	0.0.0      	Rollback to 166
@kkaempf kkaempf added this to Fleet Jun 13, 2024
@github-project-automation github-project-automation bot moved this to 🆕 New in Fleet Jun 13, 2024
@kkaempf kkaempf moved this from 🆕 New to 🏗 In progress in Fleet Jun 13, 2024
@aruiz14 aruiz14 added this to the v2.8-Next1 milestone Jun 14, 2024
@aruiz14 aruiz14 changed the title [SURE-8550] drift detection is generating secrets without cleaning [0.9] [SURE-8550] drift detection is generating secrets without cleaning Jun 14, 2024
@aruiz14 aruiz14 moved this from 🏗 In progress to 👀 In review in Fleet Jun 14, 2024
@rancher rancher deleted a comment from rancherbot Jun 14, 2024
@aruiz14
Contributor

aruiz14 commented Jun 14, 2024

/forwardport v2.9.0

@weyfonk
Contributor

weyfonk commented Jun 17, 2024

Additional QA

Problem

Correcting drift on Fleet-deployed resources would create a new Helm release revision, stored in a new sh.helm.release.v1.<name>.v<revision> secret, every time. This led to an ever-expanding set of stored secrets and Helm history items, which could cause performance issues.

Solution

Helm Rollback operations, used internally by Fleet to correct drift, now obey Fleet's global limit on Helm history, restricting the number of kept history items to 2.
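
To illustrate what a history limit of 2 means for stored revisions, here is a simplified sketch of the selection step. It is not Helm's or Fleet's actual implementation, and Helm's own pruning may apply additional rules; the function name `revisions_to_prune` is hypothetical.

```shell
# revisions_to_prune MAX
# Reads one revision number per line on stdin and prints the revisions
# that fall outside the newest MAX, i.e. the ones a history limit of MAX
# would drop in this simplified model.
revisions_to_prune() {
  sort -n | awk -v max="$1" '
    { rev[NR] = $1 }
    # After the numeric sort, the first NR - max entries are the oldest.
    END { for (i = 1; i <= NR - max; i++) print rev[i] }
  '
}
```

For example, feeding revisions 1 through 11 with a limit of 2 would list revisions 1 through 9 as prunable, leaving 10 and 11.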

Testing

(See repro steps above)

  1. Create a GitRepo with drift correction enabled, either via the above example, or as follows:

     kind: GitRepo
     apiVersion: fleet.cattle.io/v1alpha1
     metadata:
       name: test-drift-secrets
     spec:
       repo: https://github.com/rancher/fleet-test-data
       paths:
       - simple-chart
       correctDrift:
         enabled: true
         force: true

  2. Edit the deployment. In this simple-chart example, this could consist of editing the ConfigMap created from this GitRepo.

  3. Check that even after Fleet restores the deployment to its specified state (undoing manual changes), Helm history for the corresponding release still contains only 2 elements.
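
The final check can be scripted. This is a minimal sketch, assuming the default table output of `helm history` (one header line, then one row per revision); the function name `history_within_limit` is hypothetical.

```shell
# history_within_limit MAX
# Reads `helm history` table output on stdin (header line first) and
# succeeds only if the number of revision rows is at most MAX.
history_within_limit() {
  awk -v max="$1" '
    # Skip the header line and ignore blank lines.
    NR > 1 && NF > 0 { rows++ }
    # Exit status 1 when the limit is exceeded, 0 otherwise.
    END { exit (rows + 0 > max + 0) }
  '
}
```

Possible usage, with placeholders for the namespace and release name: `helm history -n <namespace> <release> | history_within_limit 2 && echo "history OK"`.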

@sbulage
Contributor

sbulage commented Jul 15, 2024

System Information   Before Upgrade   After Upgrade
Rancher Version      2.8.5            2.8.6
Fleet Version        0.9.5            0.9.5 --> 0.9.6-rc.4

Steps performed:

  1. Created a GitRepo with correctDrift enabled.
  2. Waited for the Grafana application to be installed.
  3. Scaled the deployment from 1 to 2 replicas.
  4. Saw that correctDrift restored it back to 1.
  5. Repeated step 3 at least 5 times.
  6. Every time, the replica count was restored to 1 as expected.
  7. Saw the number of secrets increase every time the deployment was changed.
  8. Later upgraded Rancher from 2.8.5 to 2.8.6-alpha3.
  9. Waited for the upgrade to finish.
  10. Again changed the replica count from 1 to 2.
  11. Verified that the secrets count was lowered.
  12. Also checked the helm history command, which showed only 2 entries.
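
The repeated scaling in steps 3 to 5 could be automated roughly as follows. This is a sketch under assumptions: the deployment and namespace are passed in as arguments, and `KUBECTL` and `DRIFT_WAIT_SECONDS` are hypothetical knobs introduced only for this example.

```shell
# trigger_drift NAMESPACE DEPLOYMENT [ROUNDS]
# Scales the deployment to 2 replicas ROUNDS times (default 5), pausing
# between rounds so drift correction has a chance to revert each change.
# KUBECTL and DRIFT_WAIT_SECONDS are hypothetical overrides for this sketch.
trigger_drift() {
  ns=$1 deploy=$2 rounds=${3:-5}
  i=1
  while [ "$i" -le "$rounds" ]; do
    ${KUBECTL:-kubectl} scale "deployment/$deploy" -n "$ns" --replicas=2
    sleep "${DRIFT_WAIT_SECONDS:-10}"
    i=$((i + 1))
  done
}
```

Possible usage: `trigger_drift <namespace> <deployment> 5`, followed by listing the secrets in that namespace. The default wait of 10 seconds is a guess and may need adjusting to how quickly drift is corrected in a given setup.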

Outputs:

Secrets Before Upgrade
satya@opensuse15:~> kubectl get secrets -n myapp-grafana -w
NAME                            TYPE                 DATA   AGE
grafana                         Opaque               3      2m56s
sh.helm.release.v1.grafana.v1   helm.sh/release.v1   1      2m56s
sh.helm.release.v1.grafana.v2   helm.sh/release.v1   1      2m55s
sh.helm.release.v1.grafana.v2   helm.sh/release.v1   1      3m11s
sh.helm.release.v1.grafana.v3   helm.sh/release.v1   1      14s
sh.helm.release.v1.grafana.v4   helm.sh/release.v1   1      4s
sh.helm.release.v1.grafana.v5   helm.sh/release.v1   1      4s
sh.helm.release.v1.grafana.v6   helm.sh/release.v1   1      3s
sh.helm.release.v1.grafana.v7   helm.sh/release.v1   1      3s
sh.helm.release.v1.grafana.v8   helm.sh/release.v1   1      7s
sh.helm.release.v1.grafana.v9    helm.sh/release.v1   1      5s
sh.helm.release.v1.grafana.v10   helm.sh/release.v1   1      1s
Secrets After Upgrade
satya@opensuse15:~> kubectl get secrets -n myapp-grafana 
NAME                             TYPE                 DATA   AGE
grafana                          Opaque               3      23m
sh.helm.release.v1.grafana.v10   helm.sh/release.v1   1      20m
sh.helm.release.v1.grafana.v11   helm.sh/release.v1   1      15s
Helm history before upgrade
satya@opensuse15:~> helm history -n myapp-grafana grafana 
REVISION	UPDATED                 	STATUS    	CHART        	APP VERSION	DESCRIPTION     
1       	Mon Jul 15 08:22:44 2024	superseded	grafana-8.3.4	11.1.0     	Install complete
2       	Mon Jul 15 08:22:44 2024	superseded	grafana-8.3.4	11.1.0     	Upgrade complete
3       	Mon Jul 15 08:25:56 2024	superseded	grafana-8.3.4	11.1.0     	Rollback to 2   
4       	Mon Jul 15 08:26:09 2024	superseded	grafana-8.3.4	11.1.0     	Rollback to 3   
5       	Mon Jul 15 08:26:13 2024	superseded	grafana-8.3.4	11.1.0     	Rollback to 4   
6       	Mon Jul 15 08:26:17 2024	superseded	grafana-8.3.4	11.1.0     	Rollback to 5   
7       	Mon Jul 15 08:26:20 2024	superseded	grafana-8.3.4	11.1.0     	Rollback to 6   
8       	Mon Jul 15 08:26:22 2024	superseded	grafana-8.3.4	11.1.0     	Rollback to 7   
9       	Mon Jul 15 08:26:30 2024	superseded	grafana-8.3.4	11.1.0     	Rollback to 8   
10      	Mon Jul 15 08:26:34 2024	deployed  	grafana-8.3.4	11.1.0     	Rollback to 9   
Helm history after upgrade
satya@opensuse15:~> helm history -n myapp-grafana grafana 
REVISION	UPDATED                 	STATUS    	CHART        	APP VERSION	DESCRIPTION   
10      	Mon Jul 15 08:26:34 2024	superseded	grafana-8.3.4	11.1.0     	Rollback to 9 
11      	Mon Jul 15 08:46:21 2024	deployed  	grafana-8.3.4	11.1.0     	Rollback to 10

@sbulage sbulage closed this as completed Jul 15, 2024
@github-project-automation github-project-automation bot moved this from Needs QA review to ✅ Done in Fleet Jul 15, 2024
@kkaempf kkaempf added the JIRA Must shout label Aug 21, 2024