-
Notifications
You must be signed in to change notification settings - Fork 69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ACM-12251: New addon status design document #1472
ACM-12251: New addon status design document #1472
Conversation
Skipping CI for Draft Pull Request. |
I'm missing an analogy here to help visualize what's the objective of this new design. Overall, I think we need a solution similar to the Node status conditions. I think this design might be inspired by them, but I'm not sure. Node's in k8s can have a bunch of conditions with different types:
All these conditions are totally independent and more than one of them can be present at the same time. Note that these conditions follow the conventions that "abnormal is true". Example: if the "normal" state is to not have disk pressure, we know that the default value has to be Considering we must have those conditions and can't have super custom ones (we don't have decision power on the status of the addons), I think the proposal is good. When aggregating the spokes addon status into the hub's, will we increase a custom "message" that will allow a reader to quickly identify the spoke(s) with issues? A small note: I don't see a way for the Endpoint Metrics Operator to transition into "Available". That stats only has outgoing transitions (update successful/failure, disabled metrics), but no incoming transitions. |
No, I did not know about it, but I guess it's good news.
Yes, this is the spirit of conditions. They can be treated as simple observations. But in our case, IIUC, we are reporting to the hub wether we are in one of the following states:
I haven't described this part, will do some additions.
This is the idea. I didn't want to go too deep into details so that this document remains easy to maintain. But I'll add a word about it.
The operator only installs and updates the collectors. It can't state wether it is functioning properly, only the collector can make this "observation". Given our conversation, I wonder wether we can reduce the condition types to only the 3 standard ones, plus a unique one by collector. Because to me, |
@thibaultmg I'm aware of the specific statuses that the addons should have according to their documentation, that's why I wrote I'm not saying we need extra status conditions like the Node object has. They would definitely be useful though, as they can indicate the problem directly in the addon resource without requiring a deeper investigation. Paired with what I mentioned about the message of the status aggregating the names of the spokes with issues, this could easily tell anyone |
On a separate note, we have to be careful with the state and their interaction with upgrades. For instance, we could end up with an upgrade blocked because fo the status of the monitoring addons. |
I should have addressed most comments with the latest update.
@douglascamata I agree, let me know if you think of a scenario where this can happen. |
Quality Gate passedIssues Measures |
Signed-off-by: Thibault Mange <[email protected]>
Signed-off-by: Thibault Mange <[email protected]>
Signed-off-by: Thibault Mange <[email protected]>
Signed-off-by: Thibault Mange <[email protected]>
Signed-off-by: Thibault Mange <[email protected]>
d4a60f2
to
a628376
Compare
Quality Gate passedIssues Measures |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: subbarao-meduri, thibaultmg The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
The addon status update is currently causing issues such as:
This document aims to specify the expected behavior. It is a target for achieving a consistent management of the spoke addon status being reported to the hub.
It does not describe the current implementation.
Main differences are: