
Disconnecting 1 server (not leader) out of 3 causes the remaining 2 servers to lose the leadership #1782

Closed
arie-stratoscale opened this issue Mar 2, 2016 · 12 comments
Labels
type/bug Feature does not function as expected

Comments

@arie-stratoscale

We have a test that disconnects a network interface on a Consul server every 30 seconds for 30 seconds. I believe the expected behavior is for the 2 remaining Consul servers to elect and keep a leader between them, but instead they periodically lose leadership when the third server disconnects.
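Roughly, the flapping loop looks like the sketch below (the real test harness is different; the interface name and the use of `ip link` here are assumptions, and it needs root):

```go
// flap.go: minimal sketch of a network-flap loop like the one described above.
package main

import (
	"log"
	"os/exec"
	"time"
)

// setLink brings the interface up or down via iproute2 (assumed to be installed).
func setLink(iface, state string) {
	if out, err := exec.Command("ip", "link", "set", iface, state).CombinedOutput(); err != nil {
		log.Printf("ip link set %s %s: %v (%s)", iface, state, err, out)
	}
}

func main() {
	const iface = "eth0" // hypothetical interface carrying the Consul server traffic
	for {
		setLink(iface, "down")
		time.Sleep(30 * time.Second) // disconnected for 30 seconds
		setLink(iface, "up")
		time.Sleep(30 * time.Second) // connected for 30 seconds
	}
}
```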

Logs from the leader node:
2016/03/02 04:21:11 [INFO] consul: cluster leadership lost
2016/03/02 04:21:11 [INFO] raft: pipelining replication to peer 1.124.245.220:8300
2016/03/02 04:21:11 [INFO] raft: aborting pipeline replication to peer 1.124.245.220:8300
2016/03/02 04:21:11 [ERR] http: Request GET /v1/kv/cluster/nodestates?recurse, error: No cluster leader from=127.0.0.1:50835
2016/03/02 04:21:12 [ERR] raft: Failed to heartbeat to 1.124.239.48:8300: dial tcp 1.124.239.48:8300: i/o timeout
2016/03/02 04:21:12 [WARN] raft: Rejecting vote request from 1.124.239.48:8300 since our last index is greater (5064, 5010)
2016/03/02 04:21:12 [ERR] http: Request PUT /v1/kv/cluster/rabbit/keepalive, error: No cluster leader from=127.0.0.1:50539
2016/03/02 04:21:12 [ERR] http: Request GET /v1/kv/cluster/nodestates?recurse, error: No cluster leader from=127.0.0.1:50835
2016/03/02 04:21:12 [WARN] raft: Heartbeat timeout reached, starting election
2016/03/02 04:21:12 [INFO] raft: Node at 1.124.192.47:8300 [Candidate] entering Candidate state
2016/03/02 04:21:12 [INFO] raft: Election won. Tally: 2
2016/03/02 04:21:12 [INFO] raft: Node at 1.124.192.47:8300 [Leader] entering Leader state
2016/03/02 04:21:12 [INFO] consul: cluster leadership acquired
2016/03/02 04:21:12 [INFO] consul: New leader elected: stratonode0
2016/03/02 04:21:12 [INFO] raft: pipelining replication to peer 1.124.245.220:8300
2016/03/02 04:21:12 [WARN] raft: AppendEntries to 1.124.239.48:8300 rejected, sending older logs (next: 5011)
2016/03/02 04:21:12 [INFO] raft: pipelining replication to peer 1.124.239.48:8300
2016/03/02 04:21:23 [INFO] agent: Synced check 'service:influxdb'
2016/03/02 04:21:31 [INFO] serf: EventMemberJoin: stratonode12080 1.124.239.48
2016/03/02 04:21:31 [INFO] consul: adding LAN server stratonode12080 (Addr: 1.124.239.48:8300) (DC: stratocluster-14594902935824492167__1_2_1_0_04b35d92)
2016/03/02 04:21:31 [INFO] consul: member 'stratonode12080' joined, marking health alive
2016/03/02 04:21:43 [WARN] raft: Failed to contact 1.124.239.48:8300 in 500.155252ms
2016/03/02 04:21:43 [WARN] raft: Failed to contact 1.124.239.48:8300 in 978.369848ms
2016/03/02 04:21:44 [WARN] raft: Failed to contact 1.124.239.48:8300 in 1.476780564s
2016/03/02 04:21:46 [INFO] memberlist: Suspect stratonode12080 has failed, no acks received
2016/03/02 04:21:48 [INFO] memberlist: Suspect stratonode12080 has failed, no acks received
2016/03/02 04:21:48 [INFO] serf: EventMemberFailed: stratonode12080 1.124.239.48
2016/03/02 04:21:48 [INFO] consul: removing LAN server stratonode12080 (Addr: 1.124.239.48:8300) (DC: stratocluster-14594902935824492167__1_2_1_0_04b35d92)
2016/03/02 04:21:48 [INFO] consul: member 'stratonode12080' failed, marking health critical
2016/03/02 04:21:52 [ERR] raft: Failed to heartbeat to 1.124.239.48:8300: read tcp 1.124.192.47:53238->1.124.239.48:8300: i/o timeout
2016/03/02 04:21:52 [ERR] raft: Failed to pipeline AppendEntries to 1.124.239.48:8300: write tcp 1.124.192.47:53208->1.124.239.48:8300: use of closed network connection
2016/03/02 04:21:52 [INFO] raft: aborting pipeline replication to peer 1.124.239.48:8300
2016/03/02 04:21:53 [WARN] raft: Failed to contact 1.124.239.48:8300 in 500.120413ms
2016/03/02 04:21:53 [WARN] raft: Failed to contact 1.124.239.48:8300 in 991.282529ms
2016/03/02 04:21:54 [WARN] raft: Failed to contact 1.124.239.48:8300 in 1.483350039s
2016/03/02 04:22:02 [ERR] raft: Failed to AppendEntries to 1.124.239.48:8300: dial tcp 1.124.239.48:8300: i/o timeout
2016/03/02 04:22:02 [ERR] raft: Failed to heartbeat to 1.124.239.48:8300: dial tcp 1.124.239.48:8300: i/o timeout
2016/03/02 04:22:11 [ERR] yamux: keepalive failed: connection write timeout
2016/03/02 04:22:11 [ERR] yamux: Failed to write body: write tcp 1.124.192.47:8300->1.124.239.48:57512: use of closed network connection
2016/03/02 04:22:11 [ERR] consul.rpc: multiplex conn accept failed: keepalive timeout from=1.124.239.48:57512
2016/03/02 04:22:12 [ERR] raft: Failed to AppendEntries to 1.124.239.48:8300: dial tcp 1.124.239.48:8300: i/o timeout
2016/03/02 04:22:13 [ERR] raft: Failed to heartbeat to 1.124.239.48:8300: dial tcp 1.124.239.48:8300: i/o timeout
2016/03/02 04:22:14 [WARN] raft: Rejecting vote request from 1.124.239.48:8300 since we have a leader: 1.124.192.47:8300
2016/03/02 04:22:14 [ERR] raft: peer 1.124.239.48:8300 has newer term, stopping replication
2016/03/02 04:22:14 [INFO] raft: Node at 1.124.192.47:8300 [Follower] entering Follower state
2016/03/02 04:22:14 [INFO] raft: aborting pipeline replication to peer 1.124.245.220:8300
2016/03/02 04:22:14 [INFO] consul: cluster leadership lost

Logs from the failing node:
2016/03/02 04:21:11 [WARN] dns: Query results too stale, re-requesting
2016/03/02 04:21:11 [ERR] dns: rpc error: No cluster leader
2016/03/02 04:21:11 [WARN] dns: Query results too stale, re-requesting
2016/03/02 04:21:11 [ERR] dns: rpc error: No cluster leader
2016/03/02 04:21:11 [WARN] dns: Query results too stale, re-requesting
2016/03/02 04:21:11 [ERR] dns: rpc error: No cluster leader
2016/03/02 04:21:11 [WARN] dns: Query results too stale, re-requesting
2016/03/02 04:21:11 [ERR] dns: rpc error: No cluster leader
2016/03/02 04:21:11 [WARN] raft: Election timeout reached, restarting election
2016/03/02 04:21:11 [INFO] raft: Node at 1.124.239.48:8300 [Candidate] entering Candidate state
2016/03/02 04:21:11 [ERR] http: Request GET /v1/kv/consulServers, error: No cluster leader from=127.0.0.1:58941
2016/03/02 04:21:33 [ERR] raft-net: Failed to flush response: write tcp 1.124.239.48:8300->1.124.192.47:64849: write: connection reset by peer
2016/03/02 04:21:34 [INFO] agent: Synced check 'service:nrad'
2016/03/02 04:21:44 [WARN] raft: Heartbeat timeout reached, starting election
2016/03/02 04:21:44 [INFO] raft: Node at 1.124.239.48:8300 [Candidate] entering Candidate state
2016/03/02 04:21:44 [ERR] http: Request GET /v1/kv/cluster/nodestates?recurse, error: No cluster leader from=127.0.0.1:60377
2016/03/02 04:21:44 [ERR] http: Request GET /v1/kv/cluster/nodestates?recurse, error: No cluster leader from=127.0.0.1:60378
2016/03/02 04:21:44 [INFO] memberlist: Suspect stratonode11300 has failed, no acks received
2016/03/02 04:21:45 [WARN] raft: Election timeout reached, restarting election
2016/03/02 04:22:13 [WARN] raft: Election timeout reached, restarting election
2016/03/02 04:22:13 [INFO] raft: Node at 1.124.239.48:8300 [Candidate] entering Candidate state
2016/03/02 04:22:14 [ERR] raft: Failed to make RequestVote RPC to 1.124.192.47:8300: dial tcp 1.124.192.47:8300: i/o timeout
2016/03/02 04:22:14 [ERR] raft: Failed to make RequestVote RPC to 1.124.245.220:8300: dial tcp 1.124.245.220:8300: i/o timeout
2016/03/02 04:22:14 [ERR] raft: Failed to make RequestVote RPC to 1.124.192.47:8300: dial tcp 1.124.192.47:8300: i/o timeout
2016/03/02 04:22:14 [ERR] raft: Failed to make RequestVote RPC to 1.124.245.220:8300: dial tcp 1.124.245.220:8300: i/o timeout
2016/03/02 04:22:15 [INFO] raft: Node at 1.124.239.48:8300 [Follower] entering Follower state
2016/03/02 04:22:15 [WARN] raft: Failed to get previous log: 5155 log not found (last: 5105)

Third server:
2016/03/02 04:21:13 [INFO] consul: New leader elected: stratonode0
2016/03/02 04:21:31 [INFO] serf: EventMemberJoin: stratonode12080 1.124.239.48
2016/03/02 04:21:31 [INFO] consul: adding LAN server stratonode12080 (Addr: 1.124.239.48:8300) (DC: stratocluster-14594902935824492167__1_2_1_0_04b35d92)
2016/03/02 04:21:46 [INFO] memberlist: Suspect stratonode12080 has failed, no acks received
2016/03/02 04:21:48 [INFO] memberlist: Suspect stratonode12080 has failed, no acks received
2016/03/02 04:21:48 [INFO] memberlist: Marking stratonode12080 as failed, suspect timeout reached
2016/03/02 04:21:48 [INFO] serf: EventMemberFailed: stratonode12080 1.124.239.48
2016/03/02 04:21:48 [INFO] consul: removing LAN server stratonode12080 (Addr: 1.124.239.48:8300) (DC: stratocluster-14594902935824492167__1_2_1_0_04b35d92)
2016/03/02 04:22:14 [ERR] http: Request GET /v1/kv/cluster/rabbit/keepalive, error: rpc error: No cluster leader from=127.0.0.1:64410
2016/03/02 04:22:14 [ERR] http: Request GET /v1/kv/cluster/nodestates?recurse, error: rpc error: No cluster leader from=127.0.0.1:64484
2016/03/02 04:22:14 [ERR] http: Request GET /v1/kv/cluster/nodestates?recurse, error: rpc error: No cluster leader from=127.0.0.1:64484
2016/03/02 04:22:15 [ERR] agent: coordinate update error: rpc error: No cluster leader
2016/03/02 04:22:15 [ERR] http: Request GET /v1/kv/cluster/nodes/givenstates/stratonode13788.node.strato, error: rpc error: No cluster leader from=127.0.0.1:64412
2016/03/02 04:22:15 [ERR] http: Request GET /v1/kv/cluster/rabbit/keepalive, error: rpc error: No cluster leader from=127.0.0.1:64410
2016/03/02 04:22:15 [ERR] http: Request GET /v1/kv/cluster/nodestates?recurse, error: rpc error: No cluster leader from=127.0.0.1:64484
2016/03/02 04:22:15 [ERR] http: Request GET /v1/kv/cluster/nodestates?recurse, error: rpc error: No cluster leader from=127.0.0.1:64484
2016/03/02 04:22:15 [ERR] http: Request GET /v1/kv/cluster/nodestates?recurse, error: rpc error: No cluster leader from=127.0.0.1:64485
2016/03/02 04:22:15 [INFO] consul: New leader elected: stratonode0

@arie-stratoscale
Author

Is it possible this happens because the disconnected node restarts elections many times, each time voting for itself and incrementing its term (https://github.com/hashicorp/raft/blob/master/raft.go#L1612), eventually reaching a higher term than the 2 nodes that stayed connected?
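For illustration only, the candidate behavior I mean looks roughly like this (a simplified sketch, not the actual hashicorp/raft code):

```go
package raftsketch

// Simplified sketch: on every election timeout a Raft candidate increments its
// term and votes for itself. While partitioned, its RequestVote RPCs go
// nowhere, so the loop repeats and its term keeps growing past the term held
// by the two connected servers.

type node struct {
	id          string
	currentTerm uint64
	votedFor    string
}

// requestVote stands in for the real RPC; while the node is partitioned these
// calls simply time out and no votes are granted.
func requestVote(peer string, term uint64, candidate string) (granted bool) {
	return false
}

func (n *node) runCandidate(electionTimeouts <-chan struct{}, peers []string) {
	for range electionTimeouts {
		n.currentTerm++   // term grows on every failed election round
		n.votedFor = n.id // vote for ourselves
		for _, peer := range peers {
			requestVote(peer, n.currentTerm, n.id)
		}
	}
}
```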

@sitano

sitano commented Mar 3, 2016

We ran into the same thing with a big cluster (3 servers, a lot of clients).

@sitano

sitano commented Mar 3, 2016

@arie-stratoscale any idea how the term increments from continuous self re-election are related to the i/o timeouts?

@bogdanov1609
Contributor

We are also facing this problem.
Every time we disable 1 of the 3 servers, a leader re-election occurs.
When we put it back into the cluster, re-elections occur again, and during all of this the cluster is unavailable for writes.

consul -v
Consul v0.6.3
Consul Protocol: 3 (Understands back to: 1)

@arie-stratoscale
Author

I tried building the latest Consul (with the latest raft) and tested again. It seems the stuck election has been fixed, but re-elections are still triggered when a server that left joins again. Again, I suspect this is because the disconnected server keeps trying to run elections, voting for itself and incrementing its term (while the other servers' terms remain the same).

@sitano

sitano commented Mar 3, 2016

@arie-stratoscale yes, but it only increments its local term while trying to elect itself, so that alone should not affect whole-cluster convergence. Why it does not see the other servers' leadership (or anything else) is the question... it looks like some race condition; something prevents it from becoming a follower of someone, or a viable candidate.

@arie-stratoscale
Author

Once the disconnected server reconnects, it sends its new term to the current leader. The current leader, seeing a term newer than its own, steps down and a new election starts. Why this election gets stuck in version 0.6.3 I don't know, but compiling the latest Consul and raft master seems to fix the stuck election (still, a disconnected server reconnecting causes an election).
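That is just the standard Raft term rule; roughly (again a sketch, not the hashicorp/raft source):

```go
package raftsketch

// Sketch of the term rule that forces the step-down: any message carrying a
// term higher than our own makes us adopt that term and fall back to follower,
// which is exactly what the reconnecting server's inflated term triggers on
// the current leader.

type role int

const (
	follower role = iota
	candidate
	leader
)

type server struct {
	currentTerm uint64
	role        role
}

func (s *server) observeTerm(remoteTerm uint64) {
	if remoteTerm > s.currentTerm {
		s.currentTerm = remoteTerm // adopt the newer term
		s.role = follower          // a leader or candidate steps down; a new election follows
	}
}
```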

@sitano

sitano commented Mar 3, 2016

It is similar to goraft/raft#161 or hashicorp/raft#69. What I see in my logs is that one server was constantly sending AppendEntries (thinking it was the leader), and the second constantly tried to elect itself. Both ended up retrying forever.

@slackpad added the type/bug (Feature does not function as expected) label Mar 11, 2016
@systeminsightsbuild

We seem to be seeing the same issue with 0.6.3.

@ghost

ghost commented May 5, 2016

raft-net: Failed to flush response: write tcp 10.0.12.250:8300->10.0.18.20:37991: write: broken pipe

consul -v
Consul v0.6.4
Consul Protocol: 3 (Understands back to: 1)

Consul lost the leader information:

curl -v http://127.0.0.1:8500/v1/status/leader
*   Trying 127.0.0.1...
* Connected to 127.0.0.1 (127.0.0.1) port 8500 (#0)
> GET /v1/status/leader HTTP/1.1
> Host: 127.0.0.1:8500
> User-Agent: curl/7.48.0
> Accept: */*
> 
< HTTP/1.1 200 OK
< Content-Type: application/json
< Date: Thu, 05 May 2016 07:54:19 GMT
< Content-Length: 2
< 
* Connection #0 to host 127.0.0.1 left intact
""⏎   

and on the leader side:


5月 05 15:54:24  consul[4282]: 2016/05/05 15:54:24 [WARN] raft: Rejecting vote request from 10.0.12.250:8300 since we have a leader: 10.0.18.20:8300
5月 05 15:54:25  consul[4282]: 2016/05/05 15:54:25 [WARN] raft: Rejecting vote request from 10.0.12.250:8300 since we have a leader: 10.0.18.20:8300
5月 05 15:54:27 consul[4282]: 2016/05/05 15:54:27 [WARN] raft: Rejecting vote request from 10.0.12.250:8300 since we have a leader: 10.0.18.20:8300

And more error logs:

05 15:57:31 w consul[31655]:     2016/05/05 15:57:31 [INFO] agent: (LAN) joined: 1 Err: <nil>
5月 05 15:57:31 w consul[31655]:     2016/05/05 15:57:31 [ERR] agent: failed to sync remote state: No cluster leader
5月 05 15:57:31 w consul[31655]:     2016/05/05 15:57:31 [WARN] raft: Failed to get previous log: 738 log not found (last: 725)
5月 05 15:57:31 w consul[31655]:     2016/05/05 15:57:31 [INFO] raft: Removed ourself, transitioning to follower
[the "Removed ourself, transitioning to follower" line above repeats 23 times in the log]

@vervas

vervas commented Sep 16, 2016

Could it be that there is an improvement on this with the 0.7.0 release?

@slackpad
Contributor

Consul 0.7.0 added a new check during bootstrap-expect which helps prevent new servers from starting spurious leader elections (servers will confirm via RPC that all the peers aren't part of an existing cluster). We've also done a lot of work on management of peer changes.
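In rough pseudocode, the idea is something like the sketch below (hypothetical helper names, not Consul's actual implementation): before bootstrapping, a new server asks each known peer whether it already belongs to a cluster, and only proceeds if none do.

```go
package bootstrapsketch

// Rough sketch of the bootstrap-expect safety check described above
// (hypothetical helper names, not Consul's real code).

// peerHasExistingCluster stands in for the RPC that asks a peer whether it is
// already part of a bootstrapped cluster.
func peerHasExistingCluster(peer string) bool { return false }

// safeToBootstrap returns true only if none of the known peers report an
// existing cluster, so a fresh server cannot kick off a spurious election
// against a cluster that is already running.
func safeToBootstrap(peers []string) bool {
	for _, peer := range peers {
		if peerHasExistingCluster(peer) {
			return false // join the existing cluster instead of bootstrapping
		}
	}
	return true
}
```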

Closing this one out - please open a new issue if you are still having trouble.
