After stepping down the primary node, queries failed with error 16550: "not authorized for query on test.users" #247

tanpeter · 2013-12-24T23:47:36Z

I'm using moped 1.5.1, mongodb 2.4.8

Steps to reproduce:

Have a replica set of 3 nodes
A script that does one insert, and one query every couple seconds in the loop (using eventual consistency)
While the script is running, on mongo shell, do rs0:PRIMARY> rs.stepDown(). As the result, one of the secondary nodes becomes primary and the ex-primary becomes a secondary.
Before the replica set information gets refreshed, insert operations still go the ex-primary and get dropped (no new doc created).
After the replica set information refreshes, new docs are now created.
However the query operation starts failing with this error:

#<Moped::Errors::QueryFailure: The operation: #<Moped::Protocol::Query
  @length=61
  @request_id=98
  @response_to=0
  @op_code=2004
  @flags=[:slave_ok]
  @full_collection_name="test.users"
  @skip=0
  @limit=-1
  @selector={:_id=>"52ba15d345f540523400007a"}
  @fields=nil>
failed with error 16550: "not authorized for query on test.users"

On the ex-primary, now-secondary mongodb logs, similar error:

[conn327] assertion 16550 not authorized for query on test.users ns:test.users query:{ _id: ObjectId('52ba0b3c45f5406fd9000086') }

The error only happens on the ex-primary, the other secondary node is fine.
The error won't go away until either I restart the script, or I restart the ex-primary node.

It seems to me that moped continues to use the connection to the ex-primary node for querying purpose now but mongodb requires reauth (or even reconnect, that's what mongo shell does).

Thanks in advance for looking into the issue.

-Peter

The text was updated successfully, but these errors were encountered:

tanpeter · 2013-12-24T23:50:36Z

Here is my test script used in the steps above.

require 'moped'
require 'pp'

#Moped.logger = Logger.new($stdout)

session = Moped::Session.new(%w(
  node1, node2, node3
), refresh_interval: 60)
session.use :test
session.login("test", "password")

1000.to_i.times {|x|
  begin
    id = Moped::BSON::ObjectId.new
    session.with(safe: false)[:users].insert({_id: id, created_at: Time.now })
    sleep 1
    pp x
    pp session[:users].find(_id: id).one
    sleep 1
  rescue => e
    pp e
  end
}

tanpeter · 2014-01-08T00:33:52Z

In moped v1.5.1, node#query method, it tried to re-auth when getting 16550 unauthorized error (you can see the comment says exactly like the scenario I reported). The problem is the "database" variable could be either String or Symbol; when it is a Symbol, that causes the auth.has_key?(database) lookup fails (line 372) so the re-auth never happens.

    def query(database, collection, selector, options = {})
      operation = Protocol::Query.new(database, collection, selector, options)

      process(operation) do |reply|
        if reply.query_failed?
          if reply.unauthorized? && auth.has_key?(database)
            # If we got here, most likely this is the case of Moped
            # authenticating successfully against the node originally, but the
            # node has been reset or gone down and come back up. The most
            # common case here is a rs.stepDown() which will reinitialize the
            # connection. In this case we need to requthenticate and try again,
            # otherwise we'll just raise the error to the user.
            login(database, *auth[database])
            reply = query(database, collection, selector, options)
          else
            raise Errors::QueryFailure.new(operation, reply.documents.first)
          end
        end
        reply
      end
    end

Converting the database variable to a String fixed the issue. However it appears that in 2.0.0-beta, the error/failover handling and auth have major refactoring and it looks like this bug doesn't happen anymore (I didn't do thorough testing with 2.0.0).

To workaround this issue, just use string as a database name (e.g. session.use 'test' not session.use :test) and you won't run into this issue.

arthurnn · 2014-01-08T01:12:00Z

Can you tried this against moped master? is the error still happening?

tanpeter · 2014-01-09T00:53:11Z

I ran against the latest master branch; in short, yes it is still happening. But the sequence of events has changed so I'll just state what I observed below:

It seems like even with safe: false option (which should be default), an insert operation incurs a read, probably the getlasterror command (in v1.5.1 with safe: false, inserts never complaint about unauthorized errors, now they do). Here is a reply from an insert operation:

#<Moped::Protocol::Reply
  @length=177
  @request_id=7658
  @response_to=11
  @op_code=1
  @flags=[:await_capable]
  @cursor_id=0
  @offset=0
  @count=1
  @documents=[{"n"=>0, "lastOp"=>#<BSON::Timestamp:0x007fa26bb324b8 @seconds=1389227990, @increment=1>, "connectionId"=>212, "writtenTo"=>[{"_id"=>0, "host"=>"ip-10-231-128-103:27017"}], "wtime"=>0, "err"=>nil, "ok"=>1.0}]>

Using the same test script in my second post, all the read operations went to the primary. I tried "session.with(consistency: :eventual)" and it didn't help.
Because of 2, when I did rs.stepDown(), reads went to the new primary and they were fine. However I did two more rs.stepDown (I have total 3 nodes), it made moped to send read operations to the node which had been stepped down before, and 16550 unauthorized errors happened:

durran · 2014-01-09T07:33:44Z

See readme for master changes - safe: false and consistency: :eventual are no longer valid options and would be noops. https://github.com/mongoid/moped/blob/master/CHANGELOG.md#api-changes-backwards-incompatible

tanpeter · 2014-01-09T19:46:01Z

Thanks for the explanation, I should have read the CHANGELOG.

Now with read: :secondary option, the same error occurs whenever the read hits the ex-primary node

#<Moped::Protocol::Reply
  @length=100
  @request_id=176592
  @response_to=85
  @op_code=1
  @flags=[:query_failure]
  @cursor_id=0
  @offset=0
  @count=1
  @documents=[{"$err"=>"not authorized for query on test.users", "code"=>16550}]>
#<Moped::Errors::QueryFailure: The operation: #<Moped::Protocol::Query
  @length=61
  @request_id=85
  @response_to=0
  @op_code=2004
  @flags=[:slave_ok]
  @full_collection_name="test.users"
  @skip=0
  @limit=-1
  @selector={:_id=>BSON::ObjectId('52cde95e737061416a110000')}
  @fields=nil>
failed with error 16550: "not authorized for query on test.users"

tanpeter · 2014-01-09T19:53:17Z

Also the workaround for moped 1.5.1 (ie. use session.use 'test' instead of session.use :test) doesn't work for 2.0

chrisckchang · 2014-07-29T06:34:41Z

@ptan-iswifter are you still seeing this issue with 2.0.0? I just tried testing on 1.5.1 and 2.0.0 and it looks like this is still an issue... :/

chrisckchang · 2014-07-30T17:49:37Z

was able to resolve using refresh_interval setting.

duromano · 2014-08-20T17:28:52Z

@chrischang12 how exactly did you solve this with the refresh_interval setting?

chrisckchang · 2014-08-20T17:32:51Z

@duromano refresh_interval helps discover the status of nodes quicker. by default mongoid/moped caches information about a node for 300 seconds/5 minutes. if there are any changes to a node's state within these 5 minutes, that means that the driver won't know about it. lowering the refresh_interval makes the driver check the status of nodes more frequently.

see source:
https://github.com/mongoid/moped/blob/master/lib/moped/cluster.rb#L96

duromano · 2014-08-20T17:56:55Z

@chrischang12 hmmm I see.. It makes sense! Thanks!
But I'm not sure if it will solve my problem, cause once this "Moped authorization error" pops up, it never comes back to normal (even after 5, 30, 60.. minutes), unless I restart my application.
What drives me crazy is the fact that the application runs normal for days, and suddenly I get this error.. :/

chrisckchang · 2014-08-20T18:01:06Z

@duromano i'm not too familiar with how moped creates/manages connections but one potential workaround could be simply closing the connection that throws the error after X number of times and creating a new one. i'm assuming here that you're only getting that error when the driver reuses existing connections, as opposed to newly created connections.

zarqman · 2014-09-04T22:40:08Z

We just got caught by this in production. Seems both rs.reconfig() and rs.stepDown() can trigger it.

A review of the mongodb logs suggests that moped may be losing the authentication credentials as a db.auth() call is not being logged (but is logged when the app first starts, so we know auth is in use and configured correctly). We do see a new connection to the mongodb server at the time of the error, just not the expected db.auth() right after.

If this diagnosis is true, that's the real source of the problem, and not the error & trace below, which are just the natural result of the auth error.

Current error, as of moped 2.0.0:

Moped::Errors::QueryFailure: The operation: #<Moped::Protocol::Query
  @length=202
  @request_id=6
  @response_to=0
  @op_code=2004
  @flags=[:slave_ok]
  @full_collection_name="xxxx.yyyy"
  @skip=0
  @limit=-1
  @selector={"$query"=>{"domains"=>"zzzz"}, "$orderby"=>{:_id=>1}}
  @fields=nil>
failed with error 16550: "not authorized for query on xxxx.yyyy"

Backtrace (apologies for formatting):

moped/operation/read.rb ? in block in execute at line 50
moped/node.rb ? in [] at line 594
moped/node.rb ? in block (2 levels) in flush at line 594
moped/node.rb ? in map at line 593
moped/node.rb ? in block in flush at line 593
moped/node.rb ? in block in logging at line 617
active_support/notifications.rb ? in block in instrument at line 159
active_support/notifications/instrumenter.rb ? in instrument at line 20
active_support/notifications.rb ? in instrument at line 159
moped/instrumentable.rb ? in instrument at line 31
moped/node.rb ? in logging at line 616
moped/node.rb ? in flush at line 587
moped/node.rb ? in process at line 391
moped/operation/read.rb ? in execute at line 48
moped/node.rb ? in read at line 648
moped/node.rb ? in query at line 411
moped/query.rb ? in block in first at line 128
moped/cluster.rb ? in block in with_primary at line 243
moped/node.rb ? in block in ensure_primary at line 204
moped/executable.rb ? in execute at line 25
moped/node.rb ? in ensure_primary at line 203
moped/cluster.rb ? in with_primary at line 242
moped/read_preference/primary_preferred.rb ? in block in with_node at line 47
moped/read_preference/selectable.rb ? in call at line 65
moped/read_preference/selectable.rb ? in with_retry at line 65
moped/read_preference/primary_preferred.rb ? in with_node at line 45
moped/query.rb ? in first at line 127
mongoid/query_cache.rb ? in block in first_with_cache at line 187
mongoid/query_cache.rb ? in with_cache at line 135
mongoid/query_cache.rb ? in first_with_cache at line 186
mongoid/contextual/mongo.rb ? in block (2 levels) in first at line 199
mongoid/contextual/mongo.rb ? in with_sorting at line 535
mongoid/contextual/mongo.rb ? in block in first at line 198
mongoid/contextual/mongo.rb ? in try_cache at line 447
mongoid/contextual/mongo.rb ? in first at line 197
mongoid/contextual.rb ? in first at line 20
mongoid/findable.rb ? in find_by at line 107
our_model.rb ? in smart_find at line 141 (application)
...

wandenberg · 2014-09-05T16:34:30Z

Hi,

I was working on these problems and created a fix here.
The is the same fix to 1.5.2 is here.

nadilsons · 2014-09-06T03:49:34Z

Great @wandenberg !!!
Works fine here! Tks!

wandenberg · 2014-09-18T14:52:44Z

This is good.
I did just a small change, move the code to (1.5.0|2.0.0)-fix_retries_operations branches
and created the pull request for 2.0.0 and 1.5.2

arthurnn · 2014-10-16T18:03:10Z

closing this as #324 was merged and it should solve this issue

ghost assigned durran Jan 8, 2014

iabdulin mentioned this issue Jan 20, 2014

Random "error 16550: not authorized for query on db_name.collection_name" #252

Closed

zarqman mentioned this issue Oct 4, 2014

Fix 16550 "not authorized", TypeError, and others #324

Merged

arthurnn closed this as completed Oct 16, 2014

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

After stepping down the primary node, queries failed with error 16550: "not authorized for query on test.users" #247

After stepping down the primary node, queries failed with error 16550: "not authorized for query on test.users" #247

tanpeter commented Dec 24, 2013

tanpeter commented Dec 24, 2013

tanpeter commented Jan 8, 2014

arthurnn commented Jan 8, 2014

tanpeter commented Jan 9, 2014

durran commented Jan 9, 2014

tanpeter commented Jan 9, 2014

tanpeter commented Jan 9, 2014

chrisckchang commented Jul 29, 2014

chrisckchang commented Jul 30, 2014

duromano commented Aug 20, 2014

chrisckchang commented Aug 20, 2014

duromano commented Aug 20, 2014

chrisckchang commented Aug 20, 2014

zarqman commented Sep 4, 2014

wandenberg commented Sep 5, 2014

nadilsons commented Sep 6, 2014

wandenberg commented Sep 18, 2014

arthurnn commented Oct 16, 2014

After stepping down the primary node, queries failed with error 16550: "not authorized for query on test.users" #247

After stepping down the primary node, queries failed with error 16550: "not authorized for query on test.users" #247

Comments

tanpeter commented Dec 24, 2013

tanpeter commented Dec 24, 2013

tanpeter commented Jan 8, 2014

arthurnn commented Jan 8, 2014

tanpeter commented Jan 9, 2014

durran commented Jan 9, 2014

tanpeter commented Jan 9, 2014

tanpeter commented Jan 9, 2014

chrisckchang commented Jul 29, 2014

chrisckchang commented Jul 30, 2014

duromano commented Aug 20, 2014

chrisckchang commented Aug 20, 2014

duromano commented Aug 20, 2014

chrisckchang commented Aug 20, 2014

zarqman commented Sep 4, 2014

wandenberg commented Sep 5, 2014

nadilsons commented Sep 6, 2014

wandenberg commented Sep 18, 2014

arthurnn commented Oct 16, 2014