[synthetic-monitoring-agent] fix deployment not starting on update/auto-scaling #2994

Iridias · 2024-02-27T11:37:57Z

The agent's implementation does not seem to allow for concurrency.
Scaling the deployment, whether through a helm-upgrade or auto-scaling, therefore results in new pods that never become ready!

So any helm-upgrade will run into a timeout and then abort.

In the logs you'll find the following messages (if debug is enabled):

{"level":"info","program":"synthetic-monitoring-agent","subsystem":"updater","error":"registering probe with synthetic-monitoring-api, response: probe already exists","was_connected":false,"connection_state":"READY","time":1708611775289,"caller":"github.com/grafana/synthetic-monitoring-agent/internal/checks/checks.go:259","message":"broke out of loop"}
{"level":"warn","program":"synthetic-monitoring-agent","subsystem":"updater","error":"registering probe with synthetic-monitoring-api, response: probe already exists","connection_state":"READY","time":1708611775289,"caller":"github.com/grafana/synthetic-monitoring-agent/internal/checks/checks.go:309","message":"handling check changes"}

With emphasis on: response: probe already exists

To fix that, I changed the Deployment to a StatefulSet, because k8s then ensures that the old pod is deleted before the new one is spawned.
I also removed all the autoscaling resources, as they're not useful while the agent cannot run concurrently.
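
For readers who don't have the diff at hand, the change is roughly of the following shape. This is only a minimal sketch, not the chart's actual template: the resource name, labels, `serviceName`, and image reference are assumptions for illustration, and the agent's required arguments and secrets are left out.

```yaml
# Hypothetical sketch of the workload after the change; all names and the
# image reference are assumptions, not copied from the chart.
apiVersion: apps/v1
kind: StatefulSet                 # was: Deployment
metadata:
  name: synthetic-monitoring-agent
spec:
  serviceName: synthetic-monitoring-agent   # governing Service name (assumed)
  replicas: 1                     # a second pod would hit "probe already exists"
  updateStrategy:
    type: RollingUpdate           # StatefulSet default: delete the pod, then recreate it
  selector:
    matchLabels:
      app.kubernetes.io/name: synthetic-monitoring-agent
  template:
    metadata:
      labels:
        app.kubernetes.io/name: synthetic-monitoring-agent
    spec:
      containers:
        - name: agent
          image: grafana/synthetic-monitoring-agent:latest   # assumed image reference
```

With a single replica, an update terminates the old pod before the replacement starts, so two agents should never try to register the same probe at once.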

And of course, I also successfully tested the changes on one of our clusters.

CLAassistant · 2024-02-27T11:38:03Z

All committers have signed the CLA.

Signed-off-by: Iridias <[email protected]>

zanhsieh · 2024-03-31T22:42:38Z

@Iridias
Can you split your PR into one PR per chart? Otherwise the CI won't be able to merge it.

Iridias requested review from torstenwalter and zanhsieh as code owners February 27, 2024 11:37

Iridias force-pushed the main branch from 2b5d409 to d948295 Compare February 27, 2024 11:48

Iridias added 2 commits March 14, 2024 20:20

synthetic-monitoring-agent: change Deployment to StatefulSet

e304345

Signed-off-by: Iridias <[email protected]>

removed autoscaling-resources / increased chart-version

a59c56f

Signed-off-by: Iridias <[email protected]>

Iridias force-pushed the main branch from 462d66c to a59c56f Compare March 14, 2024 19:20

updated helm-docs

0cb4b09

Signed-off-by: Iridias <[email protected]>

Iridias requested review from unguiculus, Whyeasy and a team as code owners March 14, 2024 19:24

zanhsieh closed this Mar 31, 2024

Iridias mentioned this pull request Apr 10, 2024

[synthetic-monitoring-agent] fix deployment not starting on update/auto-scaling #3070

Open
