This repository has been archived by the owner on May 25, 2018. It is now read-only.

Deadlock/Lock when executing too many concurrent similar jobs #3

Open
felipeclopes opened this issue Jun 24, 2014 · 6 comments

Comments

@felipeclopes

My application has a high volume of messages, and my jobs follow this pattern:

ConvertMyMessageWorker.as_promise(message['id']).then do
    PersistToDBWorker.perform_async(message['id'])
end

So when the system starts queuing messages, it can end up with n ConvertMyMessageWorker jobs enqueued, where n is the total Sidekiq concurrency.

When that happens, the next jobs to be enqueued try to get a connection from the Redis pool, and we get an exception similar to the following:

2014-06-24T04:22:48Z 26045 TID-oxt24qbl0 WARN: Waited 5 sec
2014-06-24T04:22:48Z 26045 TID-oxt24qbl0 WARN: /Users/felipeclopes/.rvm/gems/ruby-2.1.2/gems/connection_pool-2.0.0/lib/connection_pool/timed_stack.rb:42:in `block (2 levels) in pop'
/Users/felipeclopes/.rvm/gems/ruby-2.1.2/gems/connection_pool-2.0.0/lib/connection_pool/timed_stack.rb:34:in `loop'
/Users/felipeclopes/.rvm/gems/ruby-2.1.2/gems/connection_pool-2.0.0/lib/connection_pool/timed_stack.rb:34:in `block in pop'
/Users/felipeclopes/.rvm/gems/ruby-2.1.2/gems/connection_pool-2.0.0/lib/connection_pool/timed_stack.rb:33:in `synchronize'
/Users/felipeclopes/.rvm/gems/ruby-2.1.2/gems/connection_pool-2.0.0/lib/connection_pool/timed_stack.rb:33:in `pop'
/Users/felipeclopes/.rvm/gems/ruby-2.1.2/gems/connection_pool-2.0.0/lib/connection_pool.rb:69:in `checkout'
/Users/felipeclopes/.rvm/gems/ruby-2.1.2/gems/connection_pool-2.0.0/lib/connection_pool.rb:56:in `with'
/Users/felipeclopes/projects/sidekiq-promise/lib/sidekiq/promise/middleware.rb:24:in `publish_message'
/Users/felipeclopes/projects/sidekiq-promise/lib/sidekiq/promise/middleware.rb:20:in `job_errored'
/Users/felipeclopes/projects/sidekiq-promise/lib/sidekiq/promise/server_middleware.rb:15:in `rescue in call'
/Users/felipeclopes/projects/sidekiq-promise/lib/sidekiq/promise/server_middleware.rb:6:in `call'
/Users/felipeclopes/.rvm/gems/ruby-2.1.2/gems/sidekiq-3.1.4/lib/sidekiq/middleware/chain.rb:124:in `block in invoke'
/Users/felipeclopes/.rvm/gems/ruby-2.1.2/gems/sidekiq-3.1.4/lib/sidekiq/middleware/server/active_record.rb:6:in `call'
/Users/felipeclopes/.rvm/gems/ruby-2.1.2/gems/sidekiq-3.1.4/lib/sidekiq/middleware/chain.rb:124:in `block in invoke'
/Users/felipeclopes/.rvm/gems/ruby-2.1.2/gems/sidekiq-3.1.4/lib/sidekiq/middleware/server/retry_jobs.rb:62:in `call'
/Users/felipeclopes/.rvm/gems/ruby-2.1.2/gems/sidekiq-3.1.4/lib/sidekiq/middleware/chain.rb:124:in `block in invoke'
/Users/felipeclopes/.rvm/gems/ruby-2.1.2/gems/sidekiq-3.1.4/lib/sidekiq/middleware/server/logging.rb:11:in `block in call'
/Users/felipeclopes/.rvm/gems/ruby-2.1.2/gems/sidekiq-3.1.4/lib/sidekiq/logging.rb:22:in `with_context'
/Users/felipeclopes/.rvm/gems/ruby-2.1.2/gems/sidekiq-3.1.4/lib/sidekiq/middleware/server/logging.rb:7:in `call'
/Users/felipeclopes/.rvm/gems/ruby-2.1.2/gems/sidekiq-3.1.4/lib/sidekiq/middleware/chain.rb:124:in `block in invoke'
/Users/felipeclopes/.rvm/gems/ruby-2.1.2/gems/sidekiq-3.1.4/lib/sidekiq/middleware/chain.rb:127:in `call'
/Users/felipeclopes/.rvm/gems/ruby-2.1.2/gems/sidekiq-3.1.4/lib/sidekiq/middleware/chain.rb:127:in `invoke'
/Users/felipeclopes/.rvm/gems/ruby-2.1.2/gems/sidekiq-3.1.4/lib/sidekiq/processor.rb:51:in `block in process'
/Users/felipeclopes/.rvm/gems/ruby-2.1.2/gems/sidekiq-3.1.4/lib/sidekiq/processor.rb:94:in `stats'
/Users/felipeclopes/.rvm/gems/ruby-2.1.2/gems/sidekiq-3.1.4/lib/sidekiq/processor.rb:50:in `process'
/Users/felipeclopes/.rvm/gems/ruby-2.1.2/gems/celluloid-0.15.2/lib/celluloid/calls.rb:25:in `public_send'
/Users/felipeclopes/.rvm/gems/ruby-2.1.2/gems/celluloid-0.15.2/lib/celluloid/calls.rb:25:in `dispatch'
/Users/felipeclopes/.rvm/gems/ruby-2.1.2/gems/celluloid-0.15.2/lib/celluloid/calls.rb:122:in `dispatch'
/Users/felipeclopes/.rvm/gems/ruby-2.1.2/gems/celluloid-0.15.2/lib/celluloid/actor.rb:322:in `block in handle_message'
/Users/felipeclopes/.rvm/gems/ruby-2.1.2/gems/celluloid-0.15.2/lib/celluloid/actor.rb:416:in `block in task'
/Users/felipeclopes/.rvm/gems/ruby-2.1.2/gems/celluloid-0.15.2/lib/celluloid/tasks.rb:55:in `block in initialize'
/Users/felipeclopes/.rvm/gems/ruby-2.1.2/gems/celluloid-0.15.2/lib/celluloid/tasks/task_fiber.rb:13:in `block in create'

Do you have any ideas for a workaround or a fix for this issue?

Regards,

@jimsynz
Owner

jimsynz commented Jun 24, 2014

Can you estimate how large n is? The problem is that every time you call a worker with as_promise, we check a Redis connection out of the pool and don't return it until the promise resolves or rejects. See https://github.com/jamesotron/sidekiq-promise/blob/master/lib/sidekiq/promise/worker.rb for details.

Since it's using Redis pub/sub and just waiting on messages, maybe I can rework it to use a single subscriber connection, which should stop it from exhausting the connection pool. In the meantime, can you increase the connection pool size?
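
For reference, bumping the pool size looks roughly like this (a minimal sketch; the URL and numbers are placeholders, and you want headroom beyond your Sidekiq concurrency, since each outstanding promise also holds a connection):

Sidekiq.configure_server do |config|
  # :size controls how many Redis connections Sidekiq keeps in its server-side
  # pool; set it comfortably above your concurrency so promise subscribers
  # don't starve regular job processing.
  config.redis = { url: 'redis://localhost:6379/0', size: 30 }
end

Sidekiq.configure_client do |config|
  # The client (enqueuing) side usually needs far fewer connections.
  config.redis = { url: 'redis://localhost:6379/0', size: 5 }
end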

@felipeclopes
Author

I think there are two problems with the approach of waiting on subscriptions. The first is the connection pool, which is easily fixed by refactoring to use a single connection as you explained, but there is another issue as well.

Just for the sake of context: my application is consuming the Twitter stream, so it has a huge volume of messages arriving as an effectively infinite stream.

The main problem lies in the inability to process a stream (a continuous process) whose volume is higher than what Sidekiq is able to consume. In this case, with the first code I posted:

ConvertMyMessageWorker.as_promise(message['id']).then do
    PersistToDBWorker.perform_async(message['id'])
end

we may end up with every Sidekiq worker busy running ConvertMyMessageWorker, each waiting for a free worker to run PersistToDBWorker so that the original job can finish. In that case, no matter how big your pool is, all your workers are waiting on something that can never complete.

Am I correct in my assumption?

Again, sorry for bothering you; I'm just very interested in using this gem, which makes background processing so clean and adds a lot of possibilities when using Sidekiq.

@jimsynz
Owner

jimsynz commented Jun 24, 2014

I'm not so sure that this is the case. Since PersistToDBWorker is called with perform_async and not as_promise, the ConvertMyMessageWorker promise will resolve as soon as the then block returns (i.e. as soon as the job is queued). If your inner job were also running with as_promise, then that could cause the issue you describe. Could it simply be that you don't have enough worker resources to deal with the workload at hand?
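
To make that concrete, here's the difference as I understand it (only as_promise, then and perform_async are from the gem; the comments are just annotations):

# Pattern A: what you posted. The promise resolves as soon as the then block
# returns, i.e. as soon as PersistToDBWorker has been queued, so only one
# pooled Redis connection is held per in-flight ConvertMyMessageWorker promise.
ConvertMyMessageWorker.as_promise(message['id']).then do
  PersistToDBWorker.perform_async(message['id'])   # fire-and-forget enqueue
end

# Pattern B: the shape that could cause what you describe. Each chained
# as_promise checks out another pooled connection and keeps it until the inner
# job resolves or rejects, so a backlog of outer jobs can exhaust the pool.
ConvertMyMessageWorker.as_promise(message['id']).then do
  PersistToDBWorker.as_promise(message['id'])      # holds another connection
end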

@jimsynz
Owner

jimsynz commented Jul 17, 2014

I haven't forgotten about this. I've been trying to come up with a nice way to share Redis connections, but haven't had a lot of luck due to Celluloid being interesting. In the meantime I have the simplest possible fix that I can do. Stand by for action.
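
For the curious, the general shape I'm aiming for is a single dedicated subscriber connection that dispatches results to whoever is waiting. A rough sketch only (class, channel names and payload format here are made up for illustration and are not the gem's code):

require 'redis'
require 'json'

# One dedicated (non-pooled) Redis connection subscribes to a channel pattern
# and hands results to registered callbacks, so each as_promise call no longer
# needs to hold its own pooled connection while it waits.
class PromiseDispatcher
  def initialize(redis_url)
    @redis    = Redis.new(url: redis_url) # dedicated connection, never pooled
    @handlers = {}
    @mutex    = Mutex.new
  end

  # Register a callback to run when a result is published for job_id.
  def register(job_id, &block)
    @mutex.synchronize { @handlers[job_id] = block }
  end

  # Blocks the calling thread; run it in its own background thread, e.g.
  # Thread.new { dispatcher.run }
  def run
    @redis.psubscribe('sidekiq:promise:*') do |on|
      on.pmessage do |_pattern, channel, payload|
        job_id  = channel.split(':').last
        handler = @mutex.synchronize { @handlers.delete(job_id) }
        handler.call(JSON.parse(payload)) if handler
      end
    end
  end
end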

@jimsynz
Owner

jimsynz commented Jul 28, 2015

Okay, so I was forced to rework this, and it should be fixed as of 5151445. Can you test it, if you're still so inclined?

@felipeclopes
Author

Sorry, I had to abandon this gem due to the open issue. My project was critical and I had to move on with another solution.

That said, I still think this is the best approach for tackling Sidekiq job dependencies, so I'll give it another try and let you know.
