Single Consul server stop and start causes election even with established quorum #1674
Comments
Hi @glenwong - this should definitely not cause a leader election. This log line from your leader when the other node joins is very strange:
Is it possible that node is being started in bootstrap mode, maybe leading to a split-brain situation?
Hi @slackpad. My consul info gives back this for two of the machines:
And this for the third:
And the config.json I'm using for the servers is:
I've tried taking out the "bootstrap_expect" option as well as "skip_leave_on_interrupt", but I still reproduce the issue.
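For context - the actual config.json contents aren't reproduced above - a minimal three-server configuration using the options discussed here would look roughly like this (the values are illustrative assumptions, not taken from this report):

```json
{
  "server": true,
  "bootstrap_expect": 3,
  "data_dir": "/var/consul",
  "skip_leave_on_interrupt": true
}
```

With bootstrap_expect set, the servers wait until that many servers have joined before electing a leader; only one cluster should ever be bootstrapped this way, otherwise a split brain is possible.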
Also, to add more info: in order to repro this I have to leave the killed process down until I see this in the other Consul servers' logs:
If I kill the process for a shorter time and bring it back up, I don't observe the re-election.
Hmm - your configuration looks good. Just one sanity check: is it possible there's any sharing of state through /var/consul being mapped to the same place in the Vagrant boxes? I want to make sure there's nothing weird like possible corruption along those lines. If not, I'll need to dig in and try to see why it has a newer term.
Never mind on that last comment - I can reproduce this. Taking a look.
I think I know what's going on with this one. When you fail a server for several seconds, you run up the heartbeat/AppendEntries failure counter down in the Raft layer. Those failures cause the replication attempts to back off, up to a limit of about 50 seconds: https://github.com/hashicorp/raft/blob/master/replication.go#L311. That backoff isolates the newly-rejoined server from heartbeat messages, so it doesn't hear from the leader and starts an election, which is the right thing for it to do. It looks like the Raft implementation handles all of this ok and settles back into a stable state, but it would be good to see if we can make this better. I think this isn't a big problem in practice because servers typically restart in under a second, or they die and are replaced later.
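To make the mechanism concrete, here is a rough sketch of a capped exponential backoff of the kind described above. It is illustrative only - the constants and function shape are assumptions, not the actual hashicorp/raft code linked above:

```go
package main

import (
	"fmt"
	"time"
)

// replicationBackoff returns how long the leader waits before retrying
// AppendEntries/heartbeats after `failures` consecutive failures.
// The base and cap values are hypothetical, chosen only to mirror the
// "backs off up to ~50 seconds" behavior described in this thread.
func replicationBackoff(failures int, base, max time.Duration) time.Duration {
	wait := base
	for i := 1; i < failures; i++ {
		wait *= 2
		if wait >= max {
			return max
		}
	}
	return wait
}

func main() {
	base := 10 * time.Millisecond
	max := 50 * time.Second
	for f := 1; f <= 14; f++ {
		fmt.Printf("failures=%2d -> wait %v\n", f, replicationBackoff(f, base, max))
	}
}
```

The point is that by the time the failed server rejoins, the leader's next attempt to contact it can already be scheduled tens of seconds out, which is longer than the follower's election timeout, so the follower calls an election.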
Thanks for the explanation. My concern is that while Raft is settling down there is no cluster leader, so for Consul clients the service is basically out for a second or two. That isn't a huge deal, but it just doesn't seem like a non-leader dying and rejoining should ever cause any outage while a stable quorum exists. I'm also a bit worried about scenarios where a machine in the cluster is intermittent. For comparison, I tested this scenario with etcd and ZooKeeper in an identical setup and didn't experience any outage.
@glenwong agreed - we should be able to fix this and avoid this situation. In practice it doesn't seem to come up very often - even an intermittent machine would have to hit a sweet spot in the backoff timing. You can set options like https://www.consul.io/docs/agent/options.html#allow_stale to allow other servers to service read requests when a leader isn't available (that one is for DNS; there are similar controls for the HTTP endpoints). This is helpful for riding out leader elections in general, and it allows reads to scale better across all your servers.
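For DNS, that option lives under the agent's dns_config block. A minimal sketch of what enabling it might look like (illustrative values - check the linked options page for the exact names and defaults in your version):

```json
{
  "dns_config": {
    "allow_stale": true,
    "max_stale": "10s"
  }
}
```

Here max_stale bounds how stale an answer a non-leader server is allowed to return.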
@slackpad You mention that there are similar controls for permitting stale responses via the HTTP endpoints. Sorry for the stupid question, but which settings are those? I see the DNS-related stale configuration settings, but nothing explicitly about HTTP responses. Thanks
Hi @mrwilby, those are controlled per-request rather than configured on the agent. Please take a look at https://www.consul.io/docs/agent/http.html, specifically the "Consistency Modes" section; the stale mode is the most relaxed.
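For the HTTP API, the stale mode is selected per request with a query parameter. A hypothetical example (the endpoint and address are just placeholders):

```sh
# "?stale" lets any server answer from its own copy of the state,
# so reads keep working even while there is no leader, at the cost
# of possibly slightly out-of-date results.
curl 'http://127.0.0.1:8500/v1/catalog/nodes?stale'
```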
We did some additional work in #1782 (comment) to address this. |
Original issue description:
I have a test case with 3 Vagrant boxes running Consul in server mode, and I noticed that if I kill one of the Consul processes and then start it again, it causes an election even though there is already an established leader. This causes a momentary HTTP API access outage for clients. I've attached the logs I see below; the machines are called proxy001, proxy002, and proxy003.
Leader logs:
Restarted Server logs:
From reading the docs, I was under the impression that with a cluster of 3, a single failure and subsequent rejoin of a non-leader node shouldn't cause a leader election. Is that not the case?