Sentinel cluster settings 1 node network iface down, Probability unable to query the master node, MasterAddr error: context deadline exceeded #3172
Comments
output:
In #3334, which handles the failure scenario where a faulty node in the Sentinel cluster causes a context timeout, one case may have been missed. When the last select case handles the err, a correct master address may already be available at that point, but this is not checked.
or
@kwenzh I will review this once again today. The last time I reviewed it, it looked like the done channel is closed once all nodes are done, and only then is the wait group done. Anyway, the first approach makes sense and is the more robust one. Feel free to open a PR; just include a comment explaining why we have this check in the error case.
Yes, that's correct. The done channel is closed only after all nodes have completed, which includes waiting for the faulty sentinel node's goroutine to exit. In this situation, the select will receive from err first.
#3349 is merged.
Issue tracker is used for reporting bugs and discussing new features. Please use
stackoverflow for supporting issues.
In a 3-node cluster (3 sentinels + 3 redis-servers) with nodes named A, B, and C, take node C's network card offline, e.g.:
ifconfig eth0 down
Then the client reconnects to Redis Sentinel to find the master address using the func NewFailoverClient.
Expected Behavior
Current Behavior
The client returns context deadline exceeded when it tries to connect to the sentinel on node C; the err is returned at https://github.com/redis/go-redis/blob/master/sentinel.go#L559. Although A and B are working normally, the context has already hit its deadline by then. Because the faulty node C was placed first when the sentinel addresses were randomized, C exhausts the context's time, so the attempts against A and B time out immediately.
Possible Solution
Steps to Reproduce
ifconfig eth0 down
NewFailoverClient
context deadline exceeded
Context (Environment)
Detailed Description
I think the point is,