This repository has been archived by the owner on Jan 15, 2024. It is now read-only.

Failover authentication fixes #311

Closed
wants to merge 2 commits into from

Conversation

thijsc
Contributor

@thijsc thijsc commented Aug 30, 2014

This pull request fixes an issue we've been experiencing in production: a replica set failover can trigger authentication failures. These failures were handled with the Ignore strategy, which doesn't mark the node as down. Moped then keeps trying to authenticate with this node even though it is down, which stalls the entire system.

This pull request switches the strategy for AuthenticationFailure to Retry and also marks a node as down when an authentication failure occurs in the read operation. This behavior is similar to what was in place before 6f211ac. These changes (mostly) fix the issues we had during failover.
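The difference between the two strategies can be sketched with a toy model. This is not Moped's code: `ToyNode`, `IgnoreStrategy`, `RetryStrategy`, `AuthFailure`, and `run_with` are hypothetical names, and the sketch only assumes the behavior described above (Ignore re-raises without touching the node; Retry tries once more and marks the node as down if that also fails).

```ruby
# Toy model of the two strategies. All names here are hypothetical
# and do not come from Moped itself.
class AuthFailure < StandardError; end

class ToyNode
  attr_reader :attempts

  def initialize
    @down = false
    @attempts = 0
  end

  def down!
    @down = true
  end

  def down?
    @down
  end

  # Simulates a node that rejects every authentication attempt,
  # as might happen while a replica set is failing over.
  def authenticate
    @attempts += 1
    raise AuthFailure, "authentication failed"
  end
end

# "Ignore": re-raise the error and leave the node state untouched.
module IgnoreStrategy
  def self.execute(error, _node)
    raise error
  end
end

# "Retry": run the operation once more; if it fails again,
# mark the node as down before re-raising.
module RetryStrategy
  def self.execute(_error, node)
    yield
  rescue StandardError => e
    node.down!
    raise e
  end
end

# Runs the operation and hands any failure to the chosen strategy.
def run_with(strategy, node)
  node.authenticate
rescue AuthFailure => e
  strategy.execute(e, node) { node.authenticate }
end
```

With the Ignore-style handler the caller still sees the error, but the node is never flagged, so nothing stops the next operation from picking the same node again. With the Retry-style handler the second failure flags the node, which lets a caller route around it.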

thijsc and others added 2 commits August 30, 2014 11:13
When the replica set is reconfigured or a node is in startup mode this
can cause authentication errors. We now use the Retry failover strategy
in this case so we try again and mark the node as down if the operation
fails again.
@coveralls

Coverage Status

Coverage increased (+0.0%) when pulling 92d18b1 on appsignal:failover_auth_fix into 2d92a6b on mongoid:master.

@thijsc thijsc changed the title from "Failover auth fix" to "Failover authentication fixes" Aug 30, 2014
@dawid-sklodowski

I don't think your pull request will fix the issue, because the failover mechanism is never executed for authentication errors:

Node#flush sends operations to the database inside an ensure_connected block, which handles failover. However, the messages returned by the database are processed outside the ensure_connected block ( https://github.com/mongoid/moped/blob/master/lib/moped/node.rb#L594 ), so exceptions raised there are not handled by the failover mechanism. I believe the authentication/authorization error (which can occur after restarting replica set nodes) is returned in that message.
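A minimal sketch of the scoping problem described above, using a toy class rather than Moped's real `Node`. `ToyConnection`, `ReplyError`, `send_operations`, `process_reply`, and both `flush_*` methods are hypothetical; only the idea of an `ensure_connected`-style wrapper that handles failures raised inside its block is taken from the comment. The second method is one possible way to bring reply processing under the wrapper, not necessarily what #320 does.

```ruby
# Toy illustration only. All names are hypothetical and this is
# not Moped's implementation.
class ReplyError < StandardError; end

class ToyConnection
  attr_reader :failover_calls

  def initialize
    @failover_calls = 0
  end

  # Stand-in for a wrapper that applies failover handling to
  # errors raised inside the block it is given.
  def ensure_connected
    yield
  rescue ReplyError
    @failover_calls += 1
    :failover_handled
  end

  # Sending succeeds; the problem only shows up in the reply.
  def send_operations
    :raw_reply
  end

  # Simulates an "unauthorized" error found while reading the reply.
  def process_reply(_reply)
    raise ReplyError, "unauthorized"
  end

  # Reply processing happens after the guarded block has returned,
  # so the wrapper never sees the error.
  def flush_unprotected
    reply = ensure_connected { send_operations }
    process_reply(reply)
  end

  # Reply processing happens inside the guarded block,
  # so the wrapper can react to the error.
  def flush_protected
    ensure_connected do
      reply = send_operations
      process_reply(reply)
    end
  end
end
```

In this sketch `flush_unprotected` lets the error escape to the caller without any failover handling, while `flush_protected` routes the same error through the wrapper.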

I've opened a pull-request, which I believe fixes failover/retries issues here: #320

@matsimitsu

Yeah, we were noticing the same thing. I'll ask @thijsc to close this.
By the way, have you seen #315?

@dawid-sklodowski

Looking at it

@thijsc
Contributor Author

thijsc commented Sep 22, 2014

Closing this one since #315 and #320 seem more promising.

@thijsc thijsc closed this Sep 22, 2014