🌱 enable leader election of klusterlet-agent on single node managed clusters #727
Conversation
Force-pushed from 971f439 to 4f59822
Codecov Report
Attention: Patch coverage is

Additional details and impacted files

@@ Coverage Diff @@
##             main     #727      +/-   ##
==========================================
+ Coverage   63.48%   63.51%   +0.02%
==========================================
  Files         185      185
  Lines       17827    17838      +11
==========================================
+ Hits        11317    11329      +12
+ Misses       5578     5577       -1
  Partials      932      932
/approve
pkg/registration/spoke/spokeagent.go (Outdated)

@@ -359,7 +359,10 @@ func (o *SpokeAgentConfig) RunSpokeAgentWithSpokeInformers(ctx context.Context,
	if err := wait.PollUntilContextCancel(bootstrapCtx, 1*time.Second, true, o.internalHubConfigValidFunc); err != nil {
		// TODO need run the bootstrap CSR forever to re-establish the client-cert if it is ever lost.
		stopBootstrap()
		return fmt.Errorf("failed to wait for hub client config for managed cluster to be ready: %w", err)
		// DO NOT return error if context is canceled, allows the leader to release the leadership.
looks good in general.
Just a question: should we document in which cases internalHubConfigValidFunc returns an error? Returning an error means the agent exits with an error and a new leader election will start.
yes, more comments added.
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: haoqing0110, qiujian16

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing
…sters Signed-off-by: Qing Hao <[email protected]>
Force-pushed from 4f59822 to c0f542f
/lgtm
Merged ed367fd into open-cluster-management-io:main
Summary
Leader election of the klusterlet-agent was disabled on single node managed clusters to speed up the restart procedure (open-cluster-management-io/registration-operator#193). However, this allows two pods with different configurations, such as images or bootstrap config, to run at the same time. Unexpected behavior can occur in this situation: for example, when the old and new agents both exist, they overwrite hub-kubeconfig-secret with different values and constantly trigger each other to create CSRs.
To fix these issues, this PR enables leader election of the klusterlet-agent on single node managed clusters.
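The failure mode above comes down to mutual exclusion. The sketch below is a minimal in-memory simulation of the lease semantics that leader election provides (the real implementation uses the k8s.io/client-go leaderelection package against a Kubernetes Lease object; the `lease` type and agent names here are purely illustrative): only the current holder may act, so an old and a new agent pod can never both write hub-kubeconfig-secret.

```go
package main

import (
	"fmt"
	"sync"
)

// lease is a minimal in-memory stand-in for the Kubernetes Lease object
// used by client-go leader election. Only the holder may perform work.
type lease struct {
	mu     sync.Mutex
	holder string
}

// tryAcquire grants the lease if it is free or already held by id.
func (l *lease) tryAcquire(id string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.holder == "" || l.holder == id {
		l.holder = id
		return true
	}
	return false
}

// release frees the lease so a successor can take over.
func (l *lease) release(id string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.holder == id {
		l.holder = ""
	}
}

func main() {
	var l lease

	// The old agent pod starts first and becomes leader.
	fmt.Println("old-agent leads:", l.tryAcquire("old-agent"))

	// A new agent pod (e.g. after an image update) cannot act yet, so it
	// cannot overwrite hub-kubeconfig-secret while the old pod still runs.
	fmt.Println("new-agent leads:", l.tryAcquire("new-agent"))

	// The old agent shuts down and releases leadership; the new agent
	// takes over. This hand-off is what the context-cancel fix enables.
	l.release("old-agent")
	fmt.Println("new-agent leads after release:", l.tryAcquire("new-agent"))
}
```

This is why the non-error return on context cancellation matters: a terminating leader must release the lease promptly so its successor is not blocked until the lease expires.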
Related issue(s)
#695