🌱 enable leader election of klusterlet-agent on single node managed clusters #727

haoqing0110
Member

Summary

Leader election for the klusterlet-agent is disabled on single-node managed clusters to speed up the restart procedure (open-cluster-management-io/registration-operator#193). However, this can leave two pods with different configurations, such as different images or bootstrap configs, running at the same time, which leads to unexpected behavior. For example, when old and new agents coexist, they overwrite hub-kubeconfig-secret with different values and keep triggering each other to create CSRs.

To fix those issues, it is necessary to enable the leader election of the klusterlet-agent on the single node managed cluster.
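
To illustrate why leader election resolves the conflict described above, here is a minimal, stdlib-only sketch. It is not the klusterlet-agent implementation: the `lease`, `secret`, and `reconcile` names are hypothetical, the lease is an in-memory stand-in for a Kubernetes Lease, and the secret stands in for hub-kubeconfig-secret. The point is only that an agent which does not hold the lease never touches the secret, so old and new agents cannot overwrite each other.

```go
package main

import (
	"fmt"
	"sync"
)

// lease is a hypothetical in-memory stand-in for a Kubernetes Lease object.
type lease struct {
	mu     sync.Mutex
	holder string
}

// tryAcquire makes id the holder if the lease is free or already held by id.
func (l *lease) tryAcquire(id string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.holder == "" || l.holder == id {
		l.holder = id
		return true
	}
	return false
}

// release frees the lease if id currently holds it.
func (l *lease) release(id string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.holder == id {
		l.holder = ""
	}
}

// secret is a hypothetical stand-in for hub-kubeconfig-secret.
type secret struct {
	value  string
	writes int
}

// reconcile writes the agent's desired value only while the agent is the leader.
// It reports whether the agent was allowed to act.
func reconcile(l *lease, s *secret, id, desired string) bool {
	if !l.tryAcquire(id) {
		return false // not the leader: leave the secret alone
	}
	if s.value != desired {
		s.value = desired
		s.writes++
	}
	return true
}

func main() {
	l, s := &lease{}, &secret{}

	// Old and new agents alternate reconcile loops, as they might during a rollout.
	for i := 0; i < 3; i++ {
		reconcile(l, s, "old-agent", "config-v1")
		reconcile(l, s, "new-agent", "config-v2")
	}
	fmt.Println(s.value, s.writes)

	// Once the old agent releases the lease, the new agent takes over.
	l.release("old-agent")
	reconcile(l, s, "new-agent", "config-v2")
	fmt.Println(s.value, s.writes)
}
```

Without the `tryAcquire` check, the loop would flip the secret between the two values on every iteration, which is the constant-CSR behavior the summary describes.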

Related issue(s)

#695

codecov bot commented Nov 29, 2024

Codecov Report

Attention: Patch coverage is 62.50000% with 6 lines in your changes missing coverage. Please review.

Project coverage is 63.51%. Comparing base (93db6de) to head (c0f542f).
Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/registration/spoke/spokeagent.go | 76.92% | 2 Missing and 1 partial ⚠️ |
| pkg/singleton/spoke/agent.go | 0.00% | 3 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #727      +/-   ##
==========================================
+ Coverage   63.48%   63.51%   +0.02%     
==========================================
  Files         185      185              
  Lines       17827    17838      +11     
==========================================
+ Hits        11317    11329      +12     
+ Misses       5578     5577       -1     
  Partials      932      932              
Flag Coverage Δ
unit 63.51% <62.50%> (+0.02%) ⬆️

Member

@qiujian16 qiujian16 left a comment

/approve

@@ -359,7 +359,10 @@ func (o *SpokeAgentConfig) RunSpokeAgentWithSpokeInformers(ctx context.Context,
if err := wait.PollUntilContextCancel(bootstrapCtx, 1*time.Second, true, o.internalHubConfigValidFunc); err != nil {
// TODO need run the bootstrap CSR forever to re-establish the client-cert if it is ever lost.
stopBootstrap()
return fmt.Errorf("failed to wait for hub client config for managed cluster to be ready: %w", err)
// DO NOT return error if context is canceled, allows the leader to release the leadership.
Member

looks good in general.

Just a question: should we document the cases in which internalHubConfigValidFunc returns an error, meaning the agent exits with an error and a new leader election starts?

Member Author

yes, more comments added.

Contributor

openshift-ci bot commented Nov 29, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: haoqing0110, qiujian16


@haoqing0110 haoqing0110 force-pushed the br_agent-leader-election branch from 4f59822 to c0f542f Compare November 29, 2024 07:34
@qiujian16
Member

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Nov 29, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit ed367fd into open-cluster-management-io:main Nov 29, 2024
15 checks passed
@haoqing0110 haoqing0110 deleted the br_agent-leader-election branch December 4, 2024 07:26