[SURE-8309] fleet-agent in rke2 cluster repeatedly deploying managed-system-agent bundle #2856

Closed
kkaempf opened this issue Sep 16, 2024 · 6 comments

@kkaempf
Collaborator

kkaempf commented Sep 16, 2024

SURE-8309

Issue description:

If you check the fleet-agent logs in an rke2 cluster provisioned by Rancher, you will see the agent repeatedly logging that it is deploying the managed-system-agent bundle.

time="2024-04-24T22:42:00Z" level=info msg="Deploying bundle cluster-fleet-default-quickstart-aws-custom-c1aeed1484bc/quickstart-aws-custom-managed-system-agent"
time="2024-04-24T22:42:01Z" level=info msg="Deploying bundle cluster-fleet-default-quickstart-aws-custom-c1aeed1484bc/quickstart-aws-custom-managed-system-agent"
time="2024-04-24T22:42:04Z" level=info msg="Deploying bundle cluster-fleet-default-quickstart-aws-custom-c1aeed1484bc/quickstart-aws-custom-managed-system-agent"
...
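
To quantify the repetition rather than eyeball it, the excerpt above can be run through a small counter. This is only a sketch: it assumes the agent log lines are in the `key=value` format shown above, and `parse_logfmt` and `count_messages` are hypothetical helper names, not part of Fleet.

```python
import shlex
from collections import Counter


def parse_logfmt(line):
    """Parse one key=value log line; values may be double-quoted."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens that are not key=value pairs
            fields[key] = value
    return fields


def count_messages(lines):
    """Count how often each distinct msg value appears."""
    counts = Counter()
    for line in lines:
        msg = parse_logfmt(line).get("msg")
        if msg is not None:
            counts[msg] += 1
    return counts


# Lines copied from the excerpt above, including the elision marker.
bundle = ("cluster-fleet-default-quickstart-aws-custom-c1aeed1484bc/"
          "quickstart-aws-custom-managed-system-agent")
sample = [
    f'time="2024-04-24T22:42:00Z" level=info msg="Deploying bundle {bundle}"',
    f'time="2024-04-24T22:42:01Z" level=info msg="Deploying bundle {bundle}"',
    f'time="2024-04-24T22:42:04Z" level=info msg="Deploying bundle {bundle}"',
    "...",
]

for message, n in count_messages(sample).items():
    print(n, message)
```

Any message with a count well above 1 for the same bundle is a candidate for the behavior described here.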

Business impact:

It seems like a nuisance more than anything. The cluster is healthy.

Troubleshooting steps:

I think the log message is emitted in error. When I check the last update time in the status of the manifest for the bundle, it shows no deployment since the original one.

Repro steps:

  • The customer is using Rancher 2.7.10. I also reproduced this in Rancher 2.8.3 with v1.28.8+rke2r1
  • Create a new RKE2 cluster via Rancher
  • Explore the cluster and view the logs for the fleet-agent pod
@kkaempf kkaempf added this to the v2.9.3 milestone Sep 16, 2024
@kkaempf kkaempf added this to Fleet Sep 16, 2024
@github-project-automation github-project-automation bot moved this to 🆕 New in Fleet Sep 16, 2024
@kkaempf kkaempf added the JIRA Must shout label Sep 24, 2024
@manno
Member

manno commented Sep 30, 2024

Related to #2869

@manno
Member

manno commented Sep 30, 2024

This should be fixed by #2917

@manno manno moved this from 🆕 New to Needs QA review in Fleet Sep 30, 2024
@kkaempf
Collaborator Author

kkaempf commented Oct 1, 2024

@manno #2917 got merged into main while the issue here is scheduled for 2.9.3 🤔 /cc @weyfonk

@sbulage sbulage self-assigned this Oct 3, 2024
@weyfonk weyfonk moved this from Needs QA review to 📋 Backlog in Fleet Oct 8, 2024
@weyfonk weyfonk moved this from 📋 Backlog to Needs QA review in Fleet Oct 9, 2024
@weyfonk weyfonk modified the milestones: v2.9.3, v2.10.0 Oct 9, 2024
@weyfonk
Contributor

weyfonk commented Oct 9, 2024

> @manno #2917 got merged into main while the issue here is scheduled for 2.9.3 🤔 /cc @weyfonk

Good catch. Updated the milestone to v2.10.0 on this issue and created #2944 as a backport to v2.9.3.

@weyfonk
Contributor

weyfonk commented Oct 10, 2024

Additional QA

Problem

Adding an RKE2 cluster to a Fleet setup would lead to multiple log messages about the Fleet agent bundle being deployed, although that deployment would actually happen only once.

Solution

Moved the log message to ensure that it is only produced when a deployment actually happened, not when it was skipped due to the bundle and release already being deployed.
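
The idea behind the fix can be sketched as follows. This is an illustration of the pattern only, not Fleet's actual code (Fleet is written in Go); the `Deployer` class, its `deploy` method and the `applied` map are hypothetical names.

```python
import logging

logger = logging.getLogger("bundledeployment")


class Deployer:
    """Hypothetical stand-in for the agent's deploy step."""

    def __init__(self):
        # bundle name -> deployment ID that is currently applied
        self.applied = {}

    def deploy(self, name, deployment_id):
        """Deploy a bundle; return True only if a deployment happened."""
        if self.applied.get(name) == deployment_id:
            # Bundle and release are already deployed: skip, and do not log.
            return False
        self.applied[name] = deployment_id
        # Log after the deployment, so the message reflects real work.
        logger.info("Deployed bundle %s", name)
        return True
```

With the log call placed after the early return, repeated reconciles of an unchanged bundle produce no further log entries.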

Testing

Engineering Testing

Manual Testing

Observed a single Deployed bundle log for a Fleet agent bundle, vs multiple identical logs before the fix.
Tests were run against a local k3s cluster, as this issue is expected to be independent of the underlying K8s distribution.

Automated Testing

N/A (tests do not cover logging)

QA Testing Considerations

We should validate that this fix works on RKE2 as well (single Deployed bundle log for Fleet agent deployment).

⚠️ The log message appears as Deploying bundle in the issue description, because it involves an older Fleet version. From Fleet 0.10 onwards, that message is Deployed bundle, as it is produced right after a deployment, not before starting it.

Regressions Considerations

N/A

@sbulage
Contributor

sbulage commented Oct 30, 2024

System Information

Before Upgrade

Rancher Version | Fleet Version
Prime:2.9.3     | Fleet: 0.10.4

After Upgrade

Rancher Version | Fleet Version
v2.10.0-alpha5  | fleet:v0.11.0-beta.3

The checks before and after the upgrade were performed to make sure nothing is broken.


Before Upgrade

  • Deploy the latest released Rancher Prime version (I installed prime/2.9.3)
  • Import a downstream cluster. I used k3d and k3s
  • Observe the logs of the fleet-agent installed in the downstream cluster.

In the fleet-agent pod logs, the Deployed bundle message is repeated.

Note: I filtered with the string Deployed bundle to see log repetition behavior.
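
The filtering can also be scripted. The sketch below assumes JSON-formatted agent logs with the `msg` and `BundleDeployment` fields seen in the entries in this comment; `count_deployed` is a hypothetical helper, and the sample entries are abbreviated stand-ins, not real log output.

```python
import json
from collections import Counter


def count_deployed(lines, message="Deployed bundle"):
    """Count matching log entries per (namespace, name) of the BundleDeployment."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip anything that is not a JSON log entry
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or malformed entries
        if entry.get("msg") == message:
            bd = entry.get("BundleDeployment", {})
            counts[(bd.get("namespace"), bd.get("name"))] += 1
    return counts


# Abbreviated, made-up entries that keep only the fields used above.
sample = [
    '{"level":"info","msg":"Deployed bundle",'
    '"BundleDeployment":{"name":"example-bundle","namespace":"example-ns"}}',
    '{"level":"info","msg":"Deployed bundle",'
    '"BundleDeployment":{"name":"example-bundle","namespace":"example-ns"}}',
    '{"level":"info","msg":"Some other message"}',
]

for (namespace, name), n in count_deployed(sample).items():
    print(f"{namespace}/{name}: {n}")
```

A count above 1 for the same BundleDeployment without a change in deployment ID would indicate the repetition; a count of 1 matches the expected behavior after the fix.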

Logs:

Fleet agent logs before upgrade
{"level":"info","ts":"2024-10-30T13:27:39Z","logger":"bundledeployment.DeployBundle","msg":"Deployed bundle","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"fleet-agent-imported-0","namespace":"cluster-fleet-default-imported-0-f1158df57e01"},"namespace":"cluster-fleet-default-imported-0-f1158df57e01","name":"fleet-agent-imported-0","reconcileID":"4b512da4-0bc1-4b3e-9c10-440c88878380","deploymentID":"s-4f421ffd2d21effac6a8620b53a15eed5f18eeb0289d4ac2c0c2790e7e46a:8eaf3c183fe289136e7a2da3ceafa5c435e8f098e386324237baeb8811cf8b21","appliedDeploymentID":"","release":"cattle-fleet-system/fleet-agent-imported-0:1","DeploymentID":"s-4f421ffd2d21effac6a8620b53a15eed5f18eeb0289d4ac2c0c2790e7e46a:8eaf3c183fe289136e7a2da3ceafa5c435e8f098e386324237baeb8811cf8b21"}

After Upgrade

  • Upgrade the same cluster to v2.10.0-alpha5, which installs Fleet version 0.11.0-beta.3
  • Observe the fleet-agent pod logs on the imported clusters.

In the fleet-agent pod logs, there is only a single entry for Deployed bundle.

Note: I filtered with the string Deployed bundle to see log repetition behavior.


Logs:

Fleet agent logs After upgrade
{"level":"info","ts":"2024-10-30T15:52:57Z","logger":"bundledeployment.DeployBundle","msg":"Deployed bundle","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"fleet-agent-local","namespace":"cluster-fleet-local-local-1a3d67d0a899"},"namespace":"cluster-fleet-local-local-1a3d67d0a899","name":"fleet-agent-local","reconcileID":"f00fc4f1-a2d1-4e9c-bd7f-011e20a2f36c","deploymentID":"s-ca0f84354c57b23a7bb0d88e3947dfbb6c16397c40c78e94a6589e3d974c5:32eb9f72e71f28d6a3ee815ddf77e461aa78ec0b6870afe4765c423d31bf6557","appliedDeploymentID":"s-caf1a1a3ddc0a15edb34c257394e886c52ffbabd2060c478472d9b7f9cf08:32eb9f72e71f28d6a3ee815ddf77e461aa78ec0b6870afe4765c423d31bf6557","release":"cattle-fleet-local-system/fleet-agent-local:3","DeploymentID":"s-ca0f84354c57b23a7bb0d88e3947dfbb6c16397c40c78e94a6589e3d974c5:32eb9f72e71f28d6a3ee815ddf77e461aa78ec0b6870afe4765c423d31bf6557"}

@sbulage sbulage closed this as completed Oct 30, 2024
@github-project-automation github-project-automation bot moved this from Needs QA review to ✅ Done in Fleet Oct 30, 2024