Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ACM-11757: refactor status update management to avoid flapping status #1526

Merged
merged 13 commits into from
Aug 21, 2024

Conversation

thibaultmg
Copy link
Contributor

@thibaultmg thibaultmg commented Jul 8, 2024

This PR changes how the addon status reporting is managed, based on this specification.

It first implements a status package that enables atomic update of each component's status by adding two specific condition types in the addon conditions list: MetricsCollector and UwlMetricsCollector.
This package is used by the endpoint operator to set the UpdateSuccessful or UpdateFailed reasons (plus some others), and by the metrics collector to set the ForwardSuccessful or ForwardFailed reasons for each collector.

In order to avoid the flapping behavior described in the linked issue, it forbids some transitions such as UpdateFailed -> ForwardXxx. The idea is that to report a consistent status, the collector must first be in the state required by the endpoint operator.

These specific condition types are then aggregated by the status controller into the expected condition types by ACM (Available, Degraded and Progressing).

One edge case that is not handled here is if the collector pod stops after having set the reason to ForwardSuccessful. In such a case, we get an Available status while it is not working. We could add more logic, requiring the collector to do regular updates on its condition timestamp and the controller would monitor this. Let's keep it simple while nobody complains about it 😄

Copy link

openshift-ci bot commented Jul 8, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@thibaultmg thibaultmg force-pushed the refactor_status_update branch from 516969b to d3fa29e Compare July 24, 2024 12:20
@thibaultmg thibaultmg force-pushed the refactor_status_update branch from d3fa29e to f9ac6b3 Compare July 24, 2024 12:21
@thibaultmg thibaultmg marked this pull request as ready for review July 24, 2024 12:21
@thibaultmg thibaultmg changed the title add status update ACM-11757: refactor status update management to avoid flapping status Jul 24, 2024
Copy link
Contributor

@philipgough philipgough left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Copy link

openshift-ci bot commented Aug 21, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: philipgough, thibaultmg

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [philipgough,thibaultmg]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Signed-off-by: Thibault Mange <[email protected]>
Signed-off-by: Thibault Mange <[email protected]>
Signed-off-by: Thibault Mange <[email protected]>
Signed-off-by: Thibault Mange <[email protected]>
Signed-off-by: Thibault Mange <[email protected]>
Signed-off-by: Thibault Mange <[email protected]>
Signed-off-by: Thibault Mange <[email protected]>
Signed-off-by: Thibault Mange <[email protected]>
Signed-off-by: Thibault Mange <[email protected]>
Signed-off-by: Thibault Mange <[email protected]>
Signed-off-by: Thibault Mange <[email protected]>
Signed-off-by: Thibault Mange <[email protected]>
@thibaultmg thibaultmg force-pushed the refactor_status_update branch from 75ccf36 to 9645628 Compare August 21, 2024 13:38
@openshift-ci openshift-ci bot removed the lgtm label Aug 21, 2024
Copy link

openshift-ci bot commented Aug 21, 2024

New changes are detected. LGTM label has been removed.

Copy link

Quality Gate Failed Quality Gate failed

Failed conditions
B Security Rating on New Code (required ≥ A)

See analysis details on SonarCloud

Catch issues before they fail your Quality Gate with our IDE extension SonarLint

Copy link

openshift-ci bot commented Aug 21, 2024

@thibaultmg: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/sonarcloud 9645628 link false /test sonarcloud

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@thibaultmg thibaultmg enabled auto-merge (squash) August 21, 2024 14:05
@thibaultmg thibaultmg merged commit c088a5a into stolostron:main Aug 21, 2024
13 of 17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants