etcd leader to follower heartbeat delay #16362
Replies: 5 comments 2 replies
-
It looks like a network issue between the leader and the follower etcd node. Is it a transient or a persistent issue? If you have set up a dashboard based on etcd/server/etcdserver/api/rafthttp/metrics.go (lines 154 to 158 in 4a5e9d1), please check it: the metric value is published every 30s and is a smoothed RTT measured by a prober that runs every 4s.
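For reference, the metric defined around those lines is the peer round-trip-time histogram. A rough sketch of its definition (paraphrased for illustration, not copied verbatim from the pinned commit; the exact bucket layout lives in the source file):

```go
package rafthttp

import "github.com/prometheus/client_golang/prometheus"

// Histogram of the round-trip time to each peer, labeled by the peer being
// probed ("To"). Bucket values here are illustrative.
var rttSec = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{
		Namespace: "etcd",
		Subsystem: "network",
		Name:      "peer_round_trip_time_seconds",
		Help:      "Round-Trip-Time histogram between peers",
		Buckets:   prometheus.ExponentialBuckets(0.0001, 2, 14),
	},
	[]string{"To"},
)
```

On a dashboard this surfaces as etcd_network_peer_round_trip_time_seconds, which is the first thing to check when a leader-to-follower delay is suspected.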
raft does use batching and pipelining to improve throughput, but that typically does not introduce delays on the order of seconds.
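As an illustration of the general sender-side batching pattern (a simplified sketch, not etcd's actual rafthttp code): a writer goroutine blocks for one message, drains anything else already queued, and sends the whole batch in a single flush.

```go
package main

import "fmt"

type message struct{ index uint64 }

// sendLoop reads messages from msgc and flushes them in batches: one blocking
// receive, then a non-blocking drain of whatever else has already queued up.
func sendLoop(msgc chan message, flush func([]message)) {
	for m := range msgc {
		batch := []message{m}
	drain:
		for {
			select {
			case next, ok := <-msgc:
				if !ok {
					break drain
				}
				batch = append(batch, next)
			default:
				break drain
			}
		}
		flush(batch) // one write carries the whole batch
	}
}

func main() {
	msgc := make(chan message, 8)
	for i := uint64(1); i <= 5; i++ {
		msgc <- message{index: i}
	}
	close(msgc)
	sendLoop(msgc, func(b []message) { fmt.Printf("flushed %d message(s)\n", len(b)) })
}
```

Batching like this amortizes write overhead, but on its own it should add at most milliseconds, not the multi-second gaps reported here.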
-
@chaochn47 - On longevity the issue is seen very frequently, which leads to apiserver liveness-probe failures and cluster instability. You mentioned that raft uses batching and pipelining to improve throughput - can you point me to the code where this happens?
-
batching: pipeline:
-
Could you please provide minimal steps to reproduce the issue without using longevity? If not, we can at least conclude that the issue is highly relevant to the longevity setup rather than an etcd issue.
-
Thank you for your question. This support issue will be moved to our Discussion Forums.
-
Bug report criteria
What happened?
We have been observing instability of etcd on longevity clusters (configuration provided below). We raised an issue on etcd and, per the suggestions, moved to v3.5.6; however, the issue is still observed and no improvement is seen.
Assuming it was a disk I/O delay issue, we moved etcd's /data/dir to tmpfs. Disk-related metrics such as WAL sync duration dropped drastically; however, the delay in etcd queries is still seen (as shown below):
{"level":"info","ts":"2023-07-28T04:51:51.896Z","caller":"traceutil/trace.go:171","msg":"trace[704844056] linearizableReadLoop","detail":"{readStateIndex:49273142; appliedIndex:49273142; }","duration":"4.307763075s","start":"2023-07-28T04:51:47.588Z","end":"2023-07-28T04:51:51.895Z","steps":["trace[704844056] 'read index received' (duration: 4.307752129s)","trace[704844056] 'applied index is now lower than readState.Index' (duration: 7.524µs)"],"step_count":2}
As per the above trace log, it took about 4s to receive the read index. We observed that the delay happened between the leader and the follower when sending a heartbeat.
Following are the heartbeats sent from the leader, logged in stream.go:
{MsgHeartbeat 1516085617212468964 14130861031026517270 112 0 0 [{0 0 EntryNormal [0 0 0 0 20 109 125 163]}] 49273142 {[] {{[] [] [] [] false} 0 0}} false 0 [229 22 137 154 210 120 206 104] 0} 2023-07-28 04:51:47.58917773 +0000 UTC m=+276.202110931
{MsgHeartbeat 1516085617212468964 14130861031026517270 112 0 0 [{0 0 EntryNormal [0 0 0 0 35 135 217 147]}] 49273142 {[] {{[] [] [] [] false} 0 0}} false 0 [229 22 137 154 210 120 206 104] 0} 2023-07-28 04:51:48.075661076 +0000 UTC m=+276.688594264
{MsgHeartbeat 1516085617212468964 14130861031026517270 112 0 0 [{0 0 EntryNormal [0 0 0 0 22 87 43 45]}] 49273142 {[] {{[] [] [] [] false} 0 0}} false 0 [229 22 137 154 210 120 206 104] 0} 2023-07-28 04:51:48.574961786 +0000 UTC m=+277.187895003
{MsgHeartbeat 1516085617212468964 14130861031026517270 112 0 0 [{0 0 EntryNormal [0 0 0 0 59 99 37 90]}] 49273142 {[] {{[] [] [] [] false} 0 0}} false 0 [229 22 137 154 210 120 206 104] 0} 2023-07-28 04:51:49.075885782 +0000 UTC m=+277.688818970
{MsgHeartbeat 1516085617212468964 14130861031026517270 112 0 0 [{0 0 EntryNormal [0 0 0 0 4 81 55 220]}] 49273142 {[] {{[] [] [] [] false} 0 0}} false 0 [229 22 137 154 210 120 206 104] 0} 2023-07-28 04:51:49.575413385 +0000 UTC m=+278.188346602
{MsgHeartbeat 1516085617212468964 14130861031026517270 112 0 0 [{0 0 EntryNormal [0 0 0 0 18 218 225 9]}] 49273142 {[] {{[] [] [] [] false} 0 0}} false 0 [229 22 137 154 210 120 206 104] 0} 2023-07-28 04:51:50.075246414 +0000 UTC m=+278.688179606
{MsgHeartbeat 1516085617212468964 14130861031026517270 112 0 0 [{0 0 EntryNormal [0 0 0 0 38 224 107 187]}] 49273142 {[] {{[] [] [] [] false} 0 0}} false 0 [229 22 137 154 210 120 206 104] 0} 2023-07-28 04:51:50.575974035 +0000 UTC m=+279.188907390
{MsgHeartbeat 1516085617212468964 14130861031026517270 112 0 0 [{0 0 EntryNormal [0 0 0 0 21 197 69 80]}] 49273142 {[] {{[] [] [] [] false} 0 0}} false 0 [229 22 137 154 210 120 206 104] 0} 2023-07-28 04:51:51.075443734 +0000 UTC m=+279.688376922
{MsgHeartbeat 1516085617212468964 14130861031026517270 112 0 0 [{0 0 EntryNormal [0 0 0 0 26 24 166 188]}] 49273142 {[] {{[] [] [] [] false} 0 0}} false 0 [229 22 137 154 210 120 206 104] 0} 2023-07-28 04:51:51.575599189 +0000 UTC m=+280.188532443
Following are the heartbeats received at the follower, logged in stream.go:
{MsgHeartbeat 1516085617212468964 14130861031026517270 112 0 0 [{0 0 EntryNormal [0 0 0 0 20 109 125 163]}] 49273142 {[] {{[] [] [] [] false} 0 0}} false 0 [229 22 137 154 210 120 206 104] 0} 2023-07-28 04:51:51.894079712 +0000 UTC m=+277.045465036
{MsgHeartbeat 1516085617212468964 14130861031026517270 112 0 0 [{0 0 EntryNormal [0 0 0 0 35 135 217 147]}] 49273142 {[] {{[] [] [] [] false} 0 0}} false 0 [229 22 137 154 210 120 206 104] 0} 2023-07-28 04:51:51.894542369 +0000 UTC m=+277.045927717
{MsgHeartbeat 1516085617212468964 14130861031026517270 112 0 0 [{0 0 EntryNormal [0 0 0 0 22 87 43 45]}] 49273142 {[] {{[] [] [] [] false} 0 0}} false 0 [229 22 137 154 210 120 206 104] 0} 2023-07-28 04:51:51.894678212 +0000 UTC m=+277.046063568
{MsgHeartbeat 1516085617212468964 14130861031026517270 112 0 0 [{0 0 EntryNormal [0 0 0 0 59 99 37 90]}] 49273142 {[] {{[] [] [] [] false} 0 0}} false 0 [229 22 137 154 210 120 206 104] 0} 2023-07-28 04:51:51.89520582 +0000 UTC m=+277.046591168
{MsgHeartbeat 1516085617212468964 14130861031026517270 112 0 0 [{0 0 EntryNormal [0 0 0 0 4 81 55 220]}] 49273142 {[] {{[] [] [] [] false} 0 0}} false 0 [229 22 137 154 210 120 206 104] 0} 2023-07-28 04:51:51.895351185 +0000 UTC m=+277.046736554
{MsgHeartbeat 1516085617212468964 14130861031026517270 112 0 0 [{0 0 EntryNormal [0 0 0 0 18 218 225 9]}] 49273142 {[] {{[] [] [] [] false} 0 0}} false 0 [229 22 137 154 210 120 206 104] 0} 2023-07-28 04:51:51.89575468 +0000 UTC m=+277.047140031
{MsgHeartbeat 1516085617212468964 14130861031026517270 112 0 0 [{0 0 EntryNormal [0 0 0 0 38 224 107 187]}] 49273142 {[] {{[] [] [] [] false} 0 0}} false 0 [229 22 137 154 210 120 206 104] 0} 2023-07-28 04:51:51.895936191 +0000 UTC m=+277.047321549
{MsgHeartbeat 1516085617212468964 14130861031026517270 112 0 0 [{0 0 EntryNormal [0 0 0 0 21 197 69 80]}] 49273142 {[] {{[] [] [] [] false} 0 0}} false 0 [229 22 137 154 210 120 206 104] 0} 2023-07-28 04:51:51.8963489 +0000 UTC m=+277.047734258
{MsgHeartbeat 1516085617212468964 14130861031026517270 112 0 0 [{0 0 EntryNormal [0 0 0 0 26 24 166 188]}] 49273142 {[] {{[] [] [] [] false} 0 0}} false 0 [229 22 137 154 210 120 206 104] 0} 2023-07-28 04:51:51.896495372 +0000 UTC m=+277.047880726
Comparing the timestamps, there is a gap of around 4s between the first heartbeat sent from the leader (04:51:47.589) and the same heartbeat received at the follower (04:51:51.894). Also, the leader sends the heartbeats at roughly 500ms intervals, whereas at the follower they all appear to arrive at almost the same timestamp, as if a queue/batch were released at once.
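As a quick sanity check (illustrative only), subtracting the two wall-clock timestamps from the logs above reproduces the roughly 4.3s gap, consistent with the 4.307s 'read index received' step in the trace:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Fractional seconds are parsed even though the layout omits them.
	const layout = "2006-01-02 15:04:05 -0700 MST"
	sent, err := time.Parse(layout, "2023-07-28 04:51:47.58917773 +0000 UTC")
	if err != nil {
		panic(err)
	}
	received, err := time.Parse(layout, "2023-07-28 04:51:51.894079712 +0000 UTC")
	if err != nil {
		panic(err)
	}
	// Prints roughly 4.3s: the delay between the leader sending the first
	// heartbeat and the follower receiving it.
	fmt.Println(received.Sub(sent))
}
```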
Is there any batching in etcd that releases multiple messages at once, or is there any other explanation for this delay?
What did you expect to happen?
The etcd follower should receive heartbeat messages from the leader without noticeable delay.
How can we reproduce it (as minimally and precisely as possible)?
It is reproduced on every longevity setup of
Anything else we need to know?
No response
Etcd version (please run commands below)
Etcd configuration (command line flags or environment variables)
Setup Configuration
Cluster Size - 3
Master Nodes - 3
Member Nodes - 0
Kubernetes version - v1.22.17
H/W of each node
80 CPUs - Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz
512 GB RAM
1 Gig Network Adapter - This interface is only used for etcd
4 TB SSD Storage
Etcd Configuration
Version - 3.5.6
Service/Pod - Etcd running as systemd service
/data/dir - mounted on tmpfs
Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)
Relevant log output
No response