Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[synthetic-monitoring-agent] fix deployment not starting on update/auto-scaling #2994

Closed
wants to merge 3 commits into from

Conversation

Iridias
Copy link

@Iridias Iridias commented Feb 27, 2024

The implementation of the agent does not seem to allow for concurrency.
Thus scaling the deployment - either on helm-upgrade or auto-scaling - will result in the new PODs to never become ready!

So any helm-upgrade will run into a timeout and then abort.

In the logs you'll find the following messages (if debug is enabled):

{"level":"info","program":"synthetic-monitoring-agent","subsystem":"updater","error":"registering probe with synthetic-monitoring-api, response: probe already exists","was_connected":false,"connection_state":"READY","time":1708611775289,"caller":"github.com/grafana/synthetic-monitoring-agent/internal/checks/checks.go:259","message":"broke out of loop"}
{"level":"warn","program":"synthetic-monitoring-agent","subsystem":"updater","error":"registering probe with synthetic-monitoring-api, response: probe already exists","connection_state":"READY","time":1708611775289,"caller":"github.com/grafana/synthetic-monitoring-agent/internal/checks/checks.go:309","message":"handling check changes"}

With emphasis on: response: probe already exists

To fix that, I changed the Deployment to a StatefulSet, as k8s ensures, that the old POD is killed/deleted before spawning the new one.
I also removed all the autoscaling-resources, as they're not useful anyway.

And of course, I also successfully tested the changes on one of our clusters.

@CLAassistant
Copy link

CLAassistant commented Feb 27, 2024

CLA assistant check
All committers have signed the CLA.

Signed-off-by: Iridias <[email protected]>
@Iridias Iridias requested review from unguiculus, Whyeasy and a team as code owners March 14, 2024 19:24
@zanhsieh
Copy link
Collaborator

@Iridias
Can you split your PR one PR per chart? Otherwise the CI won't be able to merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants