ACM-11757: refactor status update management to avoid flapping status #1526
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: philipgough, thibaultmg
Signed-off-by: Thibault Mange <[email protected]>
New changes are detected. LGTM label has been removed.
Quality Gate failed. See analysis details on SonarCloud.
@thibaultmg: The following test failed, say
This PR changes how the addon status reporting is managed, based on this specification.
It first implements a status package that enables atomic updates of each component's status by adding two specific condition types to the addon conditions list: `MetricsCollector` and `UwlMetricsCollector`. This package is used by the endpoint operator to set the `UpdateSuccessful` or `UpdateFailed` reasons (plus some others), and by the metrics collector to set the `ForwardSuccessful` or `ForwardFailed` reasons for each collector.

To avoid the flapping behavior described in the linked issue, it forbids some transitions, such as `UpdateFailed` -> `ForwardXxx`. The idea is that, to report a consistent status, the collector must first be in the state required by the endpoint operator.

These specific condition types are then aggregated by the status controller into the condition types expected by ACM (`Available`, `Degraded` and `Progressing`).

One edge case that is not handled here is when the collector pod stops after having set the reason to `ForwardSuccessful`. In that case, we get an `Available` status even though the collector is not working. We could add more logic, requiring the collector to regularly refresh its condition timestamp and having the controller monitor it. Let's keep it simple while nobody complains about it 😄
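To make the transition rule and the aggregation easier to follow, here is a minimal Go sketch. Everything in it is hypothetical: the `Reason`, `Component`, `CanTransition` and `Aggregate` names, the assumption that a `ForwardXxx` reason is accepted only after `UpdateSuccessful` (or another `ForwardXxx`), and the mapping to `Available`/`Degraded`/`Progressing` are illustrations, not the actual code of the status package in this PR. It also leaves out the Kubernetes condition types and the atomic API update.

```go
package main

import "fmt"

// Reason is a hypothetical stand-in for the condition reasons named in this PR.
type Reason string

const (
	UpdateSuccessful  Reason = "UpdateSuccessful"  // set by the endpoint operator
	UpdateFailed      Reason = "UpdateFailed"      // set by the endpoint operator
	ForwardSuccessful Reason = "ForwardSuccessful" // set by the metrics collector
	ForwardFailed     Reason = "ForwardFailed"     // set by the metrics collector
)

// isForward reports whether r is one of the collector's ForwardXxx reasons.
func isForward(r Reason) bool {
	return r == ForwardSuccessful || r == ForwardFailed
}

// CanTransition reports whether a condition may move from `from` to `to`.
// Assumed rule: a ForwardXxx reason is accepted only once the endpoint operator
// has set UpdateSuccessful (or the collector is already forwarding), so
// UpdateFailed -> ForwardXxx is rejected. UpdateXxx reasons are always accepted.
func CanTransition(from, to Reason) bool {
	if isForward(to) {
		return from == UpdateSuccessful || isForward(from)
	}
	return true
}

// Component holds the current reason of one per-component condition type,
// e.g. MetricsCollector or UwlMetricsCollector.
type Component struct {
	Type   string
	Reason Reason
}

// Set applies the new reason only if the transition is allowed and reports
// whether it was applied.
func (c *Component) Set(to Reason) bool {
	if !CanTransition(c.Reason, to) {
		return false
	}
	c.Reason = to
	return true
}

// Aggregate folds per-component reasons into one ACM-style condition type.
// Assumed mapping: any failure => Degraded; every component forwarding
// successfully => Available; anything else => Progressing.
func Aggregate(reasons ...Reason) string {
	allForwarding := len(reasons) > 0
	for _, r := range reasons {
		if r == UpdateFailed || r == ForwardFailed {
			return "Degraded"
		}
		if r != ForwardSuccessful {
			allForwarding = false
		}
	}
	if allForwarding {
		return "Available"
	}
	return "Progressing"
}

func main() {
	mc := &Component{Type: "MetricsCollector", Reason: UpdateFailed}
	fmt.Println(mc.Set(ForwardSuccessful)) // false: UpdateFailed -> ForwardXxx is rejected
	fmt.Println(mc.Set(UpdateSuccessful))  // true
	fmt.Println(mc.Set(ForwardSuccessful)) // true

	uwl := &Component{Type: "UwlMetricsCollector", Reason: UpdateSuccessful}
	fmt.Println(Aggregate(mc.Reason, uwl.Reason)) // Progressing
}
```

In this sketch, a disallowed write is simply dropped rather than applied, so a late `ForwardSuccessful` from the collector cannot overwrite an `UpdateFailed` set by the endpoint operator; the collector has to wait until the operator reports `UpdateSuccessful` again.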