*: introduce judge_split_prevote #420
base: master
During split prevote, a campaign will fail: every node believes it will collect enough votes, so once they actually start campaigning, no one votes for anyone else and the campaign has to fail. `judge_split_prevote` solves the problem by adding an extra constraint to split prevote: only vote for nodes that have greater IDs. It's easy to conclude that this works for peer counts not greater than 5. With 7 nodes, the votes can still split again, but it should be enough for most cases. Because the constraint is only added for split prevote, even a failure won't lead to a worse result. Signed-off-by: Jay Lee <[email protected]>
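A minimal Rust sketch of the rule described above, modelling only prevote granting. It assumes log up-to-dateness and term checks have already passed, and it reads "only added for split prevote" as "only applied when the voter is itself a pre-candidate". `Role`, `grant_prevote` and `prevote_winners` are hypothetical names for illustration, not the raft-rs API.

```rust
/// Hypothetical role of a voter; only the two roles that matter here.
#[derive(Clone, Copy, PartialEq, Debug)]
pub enum Role {
    Follower,
    PreCandidate,
}

/// Hypothetical check: does `voter_id` grant a prevote to `candidate_id`?
/// Log up-to-dateness and term checks are assumed to have passed already.
pub fn grant_prevote(voter_id: u64, voter_role: Role, candidate_id: u64, judge_split_prevote: bool) -> bool {
    if voter_id == candidate_id {
        // A pre-candidate always counts its own prevote.
        return true;
    }
    if judge_split_prevote && voter_role == Role::PreCandidate {
        // The voter is campaigning too, i.e. the prevote is split:
        // only grant to a candidate with a greater ID.
        return candidate_id > voter_id;
    }
    // Otherwise a prevote for a future term is granted as usual.
    true
}

/// Returns the pre-candidates among `live` that collect a quorum of prevotes.
/// `cluster_size` counts all voters, including the crashed leader.
pub fn prevote_winners(live: &[(u64, Role)], cluster_size: usize, judge_split_prevote: bool) -> Vec<u64> {
    let quorum = cluster_size / 2 + 1;
    let mut winners = Vec::new();
    for &(candidate, role) in live {
        if role != Role::PreCandidate {
            continue;
        }
        let grants = live
            .iter()
            .filter(|&&(voter, voter_role)| grant_prevote(voter, voter_role, candidate, judge_split_prevote))
            .count();
        if grants >= quorum {
            winners.push(candidate);
        }
    }
    winners
}
```

With the old leader (node 1) down in a 3-voter group and nodes 2 and 3 pre-campaigning at once, both win the prevote without the constraint, so both start a real campaign and split; with the constraint, only node 3 wins.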
I have tested it with 10k regions at configuration sizes of both 3 and 5; in both cases the elections finish within two election timeouts when one TiKV is down.
// judge split vote can break symmetry of campaign, but as
// it only happens during split vote, the impact should not
// be significant.
How can we tell whether a campaign will end up in a split vote?
This could make it impossible for raft nodes with lower IDs to become the leader, even when we want to transfer leadership to one of them.
If the configuration size is odd and the leader is down, a split vote is likely to happen when two nodes are in the PreCandidate state. `judge_split_prevote` only applies to prevote, and transferring leadership skips prevote, so neither affects the other.
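A sketch of why leader transfer is unaffected, following the flow described in the comment above: a transfer campaign sends real vote requests directly, while a normal election with pre-vote enabled starts with prevote requests. `CampaignKind`, `MsgKind`, `campaign_kind`, `first_request` and `constraint_consulted` are hypothetical names chosen for illustration; the exact raft-rs code paths are not reproduced here.

```rust
/// Hypothetical campaign kinds.
#[derive(Clone, Copy, PartialEq, Debug)]
pub enum CampaignKind {
    PreElection,
    Election,
    Transfer,
}

/// Hypothetical request kinds.
#[derive(Clone, Copy, PartialEq, Debug)]
pub enum MsgKind {
    RequestPreVote,
    RequestVote,
}

/// Hypothetical: pick the campaign kind. A leader-transfer campaign is chosen
/// first, so it skips prevote even when prevote is enabled.
pub fn campaign_kind(transfer_leader: bool, pre_vote: bool) -> CampaignKind {
    if transfer_leader {
        CampaignKind::Transfer
    } else if pre_vote {
        CampaignKind::PreElection
    } else {
        CampaignKind::Election
    }
}

/// Hypothetical: the request a node broadcasts when it starts that campaign.
pub fn first_request(kind: CampaignKind) -> MsgKind {
    match kind {
        CampaignKind::PreElection => MsgKind::RequestPreVote,
        // A real election and a leader-transfer campaign both go straight to votes.
        CampaignKind::Election | CampaignKind::Transfer => MsgKind::RequestVote,
    }
}

/// The split-prevote constraint is only consulted for prevote requests.
pub fn constraint_consulted(judge_split_prevote: bool, msg: MsgKind) -> bool {
    judge_split_prevote && msg == MsgKind::RequestPreVote
}
```

Under this model a transfer to a lower-ID node never reaches the ID comparison, because the target never sends a prevote request in the first place.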
judge_split_prevote only works on prevote
Apart from leader transfer, a node needs to pass the pre-campaign before starting the actual campaign, so `judge_split_prevote` will affect the whole election process (pre-vote should be considered enabled, since that is when `judge_split_prevote` takes effect).
so `judge_split_prevote` will impact the whole election process...
Indeed, it depends on whether some nodes are running slowly. But in my tests, when one node is down, the leader counts on each node don't differ much after the elections finish (I removed the balance leader scheduler before shutting down a node). And even if it leads to more leaders on some nodes, that should not be a problem, since PD will eventually rebalance them.
To avoid split prevote, maybe we can record the prevote like the real vote (without needing to persist it) and reject incoming vote requests in the same term, as a real campaign does.
Recording votes won't solve split votes. If pre-campaign works like the actual campaign and split votes can happen in the actual campaign, then they can also happen in pre-campaign.
We can't prevent split votes completely, but with randomized election timeouts and vote recording, split votes should happen rarely.
How can recording votes reduce the probability of a split vote? It just makes the split happen at an earlier stage, moving it from the actual vote to the prevote. The strategy here solves splits completely for a configuration size of 3 and makes them very unlikely for a configuration size of 5.
After recording a prevote at a future term, a node should not start pre-campaigning or prevote for other nodes (at the same future term), since it has already prevoted at that future term.
A common split-vote situation with 3 voters is that the leader is down and the two followers start campaigning at the same time. How does the strategy you describe make the campaign succeed in one round of election?
It can't; forget about it. I had misunderstood the problem earlier.
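A sketch of the prevote-recording idea from this thread, illustrating the reply: with 3 voters, the leader down, and both followers pre-campaigning at once, each records a prevote for itself first and then rejects the other, so neither reaches a quorum. `PrevoteRecord` and `simultaneous_precampaign` are hypothetical, and the model assumes every live node pre-campaigns at the same instant.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory record of prevotes a node has cast, per term
/// (not persisted, as suggested in the thread).
#[derive(Default)]
pub struct PrevoteRecord {
    voted: HashMap<u64, u64>, // term -> candidate that received the prevote
}

impl PrevoteRecord {
    /// Grants if nothing was recorded for `term` yet, or the same candidate asks again.
    pub fn request(&mut self, term: u64, candidate: u64) -> bool {
        let recorded = self.voted.entry(term).or_insert(candidate);
        *recorded == candidate
    }
}

/// Every node in `live` pre-campaigns for `term` at the same instant: it first
/// records a prevote for itself, then answers everyone's requests.
/// Returns the candidates that reach a quorum of `cluster_size`.
pub fn simultaneous_precampaign(live: &[u64], cluster_size: usize, term: u64) -> Vec<u64> {
    let quorum = cluster_size / 2 + 1;
    let mut records: HashMap<u64, PrevoteRecord> = HashMap::new();
    for &id in live {
        records.entry(id).or_default().request(term, id);
    }
    let mut winners = Vec::new();
    for &candidate in live {
        let mut grants = 0;
        for &voter in live {
            if records.get_mut(&voter).unwrap().request(term, candidate) {
                grants += 1;
            }
        }
        if grants >= quorum {
            winners.push(candidate);
        }
    }
    winners
}
```

In this model, `simultaneous_precampaign(&[2, 3], 3, 5)` returns no winners: the split has only moved from the real vote to the prevote.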
// it only happens during split vote, the impact should not
// be significant.
!self.judge_split_prevote
    || self.state != StateRole::PreCandidate
Could you add some comments about leader transfer?
// When judge_split_prevote is enabled, reject explicitly so the candidate exits
// PreCandidate early and can vote for other peers later.
if self.judge_split_prevote
    && m.get_msg_type() == MessageType::MsgRequestPreVote
{
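The explicit rejection matters because of how prevotes are tallied. A sketch assuming an etcd-style tally (won once grants reach a quorum, lost once grants plus still-missing responses can no longer reach one), and assuming a pre-candidate whose result is lost steps down to follower. `VoteResult` and `tally` here are hypothetical stand-ins, not the raft-rs types.

```rust
/// Hypothetical stand-in for a vote tally result.
#[derive(Debug, PartialEq)]
pub enum VoteResult {
    Pending,
    Won,
    Lost,
}

/// etcd-style tally (assumed): won once grants reach a quorum, lost once the
/// grants plus all still-missing responses can no longer reach one.
pub fn tally(voters: usize, granted: usize, rejected: usize) -> VoteResult {
    let quorum = voters / 2 + 1;
    let missing = voters.saturating_sub(granted + rejected);
    if granted >= quorum {
        VoteResult::Won
    } else if granted + missing < quorum {
        // A pre-candidate that loses is assumed to step down to follower,
        // after which it can grant prevotes to other peers.
        VoteResult::Lost
    } else {
        VoteResult::Pending
    }
}
```

Take 5 voters with the old leader down and nodes 2 to 5 all pre-campaigning. Node 2 holds only its own prevote, and nodes 3, 4 and 5 refuse it. If they reject explicitly, `tally(5, 1, 3)` is `Lost`, so node 2 can step down and vote for others right away; if they silently ignored the request, `tally(5, 1, 0)` would stay `Pending` until the election timeout.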
It seems `Follower` and `PreCandidate` don't differ here: both of them can (pre)vote for other peers.
A `Follower` ignores all PreVoteResponse messages. For example, suppose A and B split the pre-votes, and C can vote for both A and B. If B is chosen and A doesn't step down to follower, A will still start a campaign when it receives C's prevote.
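A sketch of the role check described above. `Role`, `Node` and `on_prevote_granted` are hypothetical names; the point is only that a granted prevote counts while the node is still a pre-candidate, so in the A/B/C example, A campaigns on C's grant unless it has already stepped down.

```rust
/// Hypothetical roles.
#[derive(Clone, Copy, PartialEq, Debug)]
pub enum Role {
    Follower,
    PreCandidate,
    Candidate,
}

/// Hypothetical node state: just what the example needs.
pub struct Node {
    pub role: Role,
    pub granted: usize, // prevotes granted so far, including its own
    pub voters: usize,
}

impl Node {
    /// Handles one granted prevote response.
    pub fn on_prevote_granted(&mut self) {
        if self.role != Role::PreCandidate {
            // A follower (or a node already campaigning) ignores prevote responses.
            return;
        }
        self.granted += 1;
        if self.granted >= self.voters / 2 + 1 {
            // Quorum of prevotes reached: start the real campaign.
            self.role = Role::Candidate;
        }
    }
}
```

With 3 voters, A starts with one granted prevote (its own); C's grant makes two, a quorum, and A becomes a candidate. The same grant arriving after A has stepped down to follower changes nothing.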
PTAL
During split prevote, a campaign will fail: all nodes think they will collect enough votes, so once they actually start campaigning, no one votes for anyone else and the campaign has to fail.
`judge_split_prevote` solves the problem by adding an extra constraint to split prevote: only vote for nodes that have greater IDs. It's easy to conclude that it works for fewer than 5 peers. With 5 or more nodes, the votes can still split again, but it should be enough for most cases. Because the constraint is only added for split prevote, even a failure won't lead to a worse result.
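A brute-force sketch of the size claim, under a simplified model: one node (the old leader) is down; every other node is either a follower that grants any prevote for the future term, or a pre-candidate that, with the constraint, grants only to itself and to greater IDs; all requests are delivered at once. `winners_with_one_down` and `can_split_again` are hypothetical helpers. The sketch enumerates every set of at least two simultaneous pre-candidates and reports whether more than one can still win the prevote.

```rust
/// Hypothetical model: node `down` is the crashed leader; every other node is
/// a follower (grants any prevote for the future term) unless it is listed in
/// `precandidates`, in which case it grants only to itself and greater IDs.
fn winners_with_one_down(cluster_size: u64, down: u64, precandidates: &[u64]) -> Vec<u64> {
    let quorum = (cluster_size / 2 + 1) as usize;
    let live: Vec<u64> = (1..=cluster_size).filter(|&id| id != down).collect();
    let mut winners = Vec::new();
    for &cand in precandidates {
        let grants = live
            .iter()
            .filter(|&&voter| voter == cand || !precandidates.contains(&voter) || cand > voter)
            .count();
        if grants >= quorum {
            winners.push(cand);
        }
    }
    winners
}

/// Returns true if some set of two or more simultaneous pre-candidates still
/// lets more than one of them win the prevote (so the real vote can split).
pub fn can_split_again(cluster_size: u64, down: u64) -> bool {
    let live: Vec<u64> = (1..=cluster_size).filter(|&id| id != down).collect();
    // Each bit of `mask` marks whether the corresponding live node pre-campaigns.
    (0u32..(1u32 << live.len())).any(|mask| {
        let precandidates: Vec<u64> = live
            .iter()
            .enumerate()
            .filter(|&(i, _)| (mask & (1u32 << i)) != 0)
            .map(|(_, &id)| id)
            .collect();
        precandidates.len() >= 2
            && winners_with_one_down(cluster_size, down, &precandidates).len() >= 2
    })
}
```

Under this model, with node 1 down, 3 and 4 voters never produce two prevote winners, while 5 voters can: with nodes 4 and 5 pre-campaigning and nodes 2 and 3 as followers, both 4 and 5 collect a quorum of 3.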