Fix retries and failover #320
Pushed this to our staging and it seems to work great with authentication failures / stepdowns etc. (and SSL enabled)!
This reverts commit edd9eed.
Looks good to me. @arthurnn What do you think?
+1
Found one more issue, if you have a replicaset and you want to re-sync a node (because of disk usage) and the node is in
Steps taken:
It will keep on retrying to authenticate on this node, causing constant failures.
+1 this seems to fix the issue too: #268
+1 this works for me. Anybody using it in production?
@jperichon, we've been using it successfully in production for 3+ months. We added a couple of patches on top of it to fix up things it missed. Haven't seen any problems with the included commits though--they've been great.
Pull request that fixes the failover and retry mechanism.

Changes in detail:

- Moved the `with_retry` method to `Cluster` -- it belongs there, as it operates on the cluster.
- `Node#flush` was using `ensure_connected`, which involves failover. However, processing of database messages after executing operations (and raising errors based on them) was outside of the `ensure_connected` block, so the failover mechanism wasn't exercised in most of the cases it was meant for.
- `Reconfigure` failover mechanism -- it was raising new exceptions but not retrying -- it should be good enough to just retry.
- Moved error recognition from the `Errors` class to the `Reply` class, so it is in one place.

The outcome of those changes is that you can kill / restart mongo replica-set nodes in whatever order and as often as you like. You can even stop all of them for a couple of seconds (driven by `retry_count` and `retry_interval`) and the application will be able to recover without losing any operations or throwing errors.
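
For readers unfamiliar with the pattern, here is a minimal sketch of how a cluster-level retry driven by `retry_count` and `retry_interval` might behave. It is not the code from this pull request: `SketchCluster`, `ConnectionFailure`, and `refresh` are hypothetical names made up for illustration, and the real driver's classes and signatures may differ. It also shows why error detection has to happen inside the retried block, which is the point made about `Node#flush` above.

```ruby
# Hypothetical stand-in for a driver-level connection error (assumption,
# not the driver's real exception class).
class ConnectionFailure < StandardError; end

# Hypothetical cluster object that owns the retry logic.
class SketchCluster
  attr_reader :refreshes

  def initialize(retry_count: 3, retry_interval: 0.01)
    @retry_count = retry_count       # how many times to retry after a failure
    @retry_interval = retry_interval # seconds to wait between attempts
    @refreshes = 0
  end

  # Runs the block, retrying up to retry_count times on ConnectionFailure.
  # Between attempts it sleeps retry_interval seconds and refreshes the
  # (simulated) node list.
  def with_retry
    attempts_left = @retry_count
    begin
      yield
    rescue ConnectionFailure
      raise if attempts_left <= 0 # out of retries: re-raise the last error
      attempts_left -= 1
      sleep(@retry_interval)
      refresh
      retry
    end
  end

  private

  # Placeholder for rediscovering the primary; here it only counts calls.
  def refresh
    @refreshes += 1
  end
end

# Simulated operation: fails twice (as if the primary stepped down),
# then succeeds on the third attempt.
calls = 0
cluster = SketchCluster.new(retry_count: 3, retry_interval: 0.01)
result = cluster.with_retry do
  calls += 1
  # The error is raised inside the block. An error raised only after the
  # block has returned would never reach the rescue clause in with_retry.
  raise ConnectionFailure, "not master" if calls < 3
  :ok
end
```

In this sketch the total time the application can ride out an outage is roughly `retry_count * retry_interval`, which matches the "couple of seconds" behaviour described above when both settings are tuned accordingly.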