Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[server] Directed leadership transfer CLI and API #17383

Merged
merged 26 commits into from
Oct 4, 2023
Merged

Conversation

angrycub
Copy link
Contributor

@angrycub angrycub commented Jun 1, 2023

This PR adds an API endpoint and a CLI command to allow cluster operators to move leadership from the current leader to a specified server node. This can reduce leadership churn during rolling updates by allowing the operator to migrate leadership to the first upgraded node.

Commits

  • Add directed leadership transfer func
  • Add leadership transfer RPC endpoint
  • Add ACL tests for leadership-transfer endpoint
  • Add HTTP API route and implementation
  • Add to Go API client
  • Implement CLI command
  • Add documentation

Fixes: #7376

Copy link
Member

@tgross tgross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome @angrycub. I've left a few comments on how we might be able to refine this.

nomad/operator_endpoint.go Outdated Show resolved Hide resolved
nomad/operator_endpoint.go Outdated Show resolved Hide resolved
api/operator.go Outdated Show resolved Hide resolved
command/operator_raft_leadership_transfer.go Outdated Show resolved Hide resolved
website/content/api-docs/operator/raft.mdx Show resolved Hide resolved
nomad/operator_endpoint.go Show resolved Hide resolved
nomad/operator_endpoint.go Outdated Show resolved Hide resolved
@MattJustMatt
Copy link
Contributor

Thank you for this! Was looking for this command and excited to see this PR.

Copy link
Member

@tgross tgross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking really good @angrycub! I think just a few more tweaks and we can land this.

command/operator_raft_leadership_transfer.go Outdated Show resolved Hide resolved
nomad/operator_endpoint.go Outdated Show resolved Hide resolved
nomad/leader.go Outdated Show resolved Hide resolved
nomad/operator_endpoint_test.go Show resolved Hide resolved
@vercel
Copy link

vercel bot commented Oct 3, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
nomad-storybook-and-ui ✅ Ready (Inspect) Visit Preview 💬 Add feedback Oct 4, 2023 1:41pm
1 Ignored Deployment
Name Status Preview Comments Updated (UTC)
nomad ⬜️ Ignored (Inspect) Visit Preview Oct 4, 2023 1:41pm

Copy link
Member

@tgross tgross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! :shipit:

nomad/operator_endpoint.go Outdated Show resolved Hide resolved
nomad/operator_endpoint.go Outdated Show resolved Hide resolved
Comment on lines +150 to +151
<Tabs>
<Tab heading="Nomad CLI">
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leave this for now, but adding a brand new way to demonstrate API examples is the sort of thing that really doesn't belong in a feature PR. We end up with a bunch of one-off decisions that way that aren't applied equally across the API docs. In the future, if you want to do this sort of thing, make a separate PR for it so that we can discuss it independently of critical feature work.

Comment on lines +159 to +190
for i := 0; i < retryCount; i++ {
err := s.raft.LeadershipTransferToServer(to.ID, to.Address).Error()
if err == nil {
s.logger.Info("successfully transferred leadership")
return nil
}

// "cannot transfer leadership to itself"
// Handled at top of function, but reapplied here to prevent retrying if
// it occurs while we are retrying
if err.Error() == "cannot transfer leadership to itself" {
s.logger.Debug("leadership transfer to current leader is a no-op")
return nil
}

// ErrRaftShutdown: Don't retry if raft is shut down.
if err == raft.ErrRaftShutdown {
return err
}

// ErrUnsupportedProtocol: Don't retry if the Raft version doesn't
// support leadership transfer since this will never succeed.
if err == raft.ErrUnsupportedProtocol {
return fmt.Errorf("leadership transfer not supported with Raft version lower than 3")
}

// ErrEnqueueTimeout: This seems to be the valid time to retry.
s.logger.Error("failed to transfer leadership attempt, will retry",
"attempt", i,
"retry_limit", retryCount,
"error", err,
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is real nice. 👍

@angrycub angrycub merged commit 8a93ff3 into main Oct 4, 2023
23 checks passed
@angrycub angrycub deleted the f-leader-transfer branch October 4, 2023 16:20
@tgross tgross added this to the 1.7.0 milestone Oct 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

API/command to initiate graceful change in leadership
3 participants