ActiveRecord::ConnectionTimeoutError "all pooled connections were in use" after enabling concurrency #469

MattFenelon · 2024-03-19T10:36:09Z

I recently enabled concurrency on a system and it resulted in a lot of ActiveRecord::ConnectionTimeoutError errors.

I believe it's because ActiveRecord has a 1 thread = 1 connection policy and the database connection pool for this system is limited to the number of puma threads. As soon as the system receives enough requests to use up the connection pool, the concurrently side-loaded resource threads are unable to get a connection from the pool, and vice versa.

But I'm surprised no one has come across this issue before so perhaps I've misunderstood the problem.

One solution might be to increase the system's database connection pool size. This would allow Graphiti's threads to have enough connections to not exhaust the thread pool. The difficulty would be in sizing the db pool. Given each request can sideload x resources, the total number of database connections required at any given time would be x * number of concurrent requests.

Ideally though I think it would make sense to allow the thread pool size to be configured. Concurrent::Promise, used to side-loaded resources, has by default a thread pool equal to java's int max.

What do you think?

vendor/bundle/ruby/3.2.0/gems/activerecord-7.1.3/lib/active_record/connection_adapters/abstract/connection_pool/queue.rb:128:in `block in wait_poll': could not obtain a connection from the pool within 5.000 seconds (waited 5.005 seconds); all pooled connections were in use (ActiveRecord::ConnectionTimeoutError)
    from vendor/bundle/ruby/3.2.0/gems/activerecord-7.1.3/lib/active_record/connection_adapters/abstract/connection_pool/queue.rb:117:in `loop'
    from vendor/bundle/ruby/3.2.0/gems/activerecord-7.1.3/lib/active_record/connection_adapters/abstract/connection_pool/queue.rb:117:in `wait_poll'
    from vendor/bundle/ruby/3.2.0/gems/activerecord-7.1.3/lib/active_record/connection_adapters/abstract/connection_pool/queue.rb:78:in `internal_poll'
    from vendor/bundle/ruby/3.2.0/gems/activerecord-7.1.3/lib/active_record/connection_adapters/abstract/connection_pool/queue.rb:204:in `internal_poll'
    from vendor/bundle/ruby/3.2.0/gems/activerecord-7.1.3/lib/active_record/connection_adapters/abstract/connection_pool/queue.rb:73:in `block in poll'
    from vendor/ruby-3.2.2/lib/ruby/3.2.0/monitor.rb:202:in `synchronize'
    from vendor/ruby-3.2.2/lib/ruby/3.2.0/monitor.rb:202:in `mon_synchronize'
    from vendor/bundle/ruby/3.2.0/gems/activerecord-7.1.3/lib/active_record/connection_adapters/abstract/connection_pool/queue.rb:82:in `synchronize'
    from vendor/bundle/ruby/3.2.0/gems/activerecord-7.1.3/lib/active_record/connection_adapters/abstract/connection_pool/queue.rb:73:in `poll'
    from vendor/bundle/ruby/3.2.0/gems/activerecord-7.1.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:663:in `acquire_connection'
    from vendor/bundle/ruby/3.2.0/gems/activerecord-7.1.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:353:in `checkout'
    from vendor/bundle/ruby/3.2.0/gems/activerecord-7.1.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:182:in `connection'
    from vendor/bundle/ruby/3.2.0/gems/activerecord-7.1.3/lib/active_record/connection_adapters/abstract/connection_handler.rb:246:in `retrieve_connection'
    from vendor/bundle/ruby/3.2.0/gems/activerecord-7.1.3/lib/active_record/connection_handling.rb:287:in `retrieve_connection'
    from vendor/bundle/ruby/3.2.0/gems/activerecord-7.1.3/lib/active_record/connection_handling.rb:254:in `connection'
    from vendor/bundle/ruby/3.2.0/gems/activerecord-7.1.3/lib/active_record/querying.rb:62:in `_query_by_sql'
    from vendor/bundle/ruby/3.2.0/gems/activerecord-7.1.3/lib/active_record/relation.rb:998:in `block in exec_main_query'
    from vendor/bundle/ruby/3.2.0/gems/activerecord-7.1.3/lib/active_record/relation.rb:1018:in `skip_query_cache_if_necessary'
    from vendor/bundle/ruby/3.2.0/gems/activerecord-7.1.3/lib/active_record/relation.rb:984:in `exec_main_query'
    from vendor/bundle/ruby/3.2.0/gems/activerecord-7.1.3/lib/active_record/relation.rb:962:in `block in exec_queries'
    from vendor/bundle/ruby/3.2.0/gems/activerecord-7.1.3/lib/active_record/relation.rb:1018:in `skip_query_cache_if_necessary'
    from vendor/bundle/ruby/3.2.0/gems/activerecord-7.1.3/lib/active_record/relation.rb:956:in `exec_queries'
    from vendor/bundle/ruby/3.2.0/gems/activerecord-7.1.3/lib/active_record/relation.rb:742:in `load'
    from vendor/bundle/ruby/3.2.0/gems/activerecord-7.1.3/lib/active_record/relation.rb:264:in `records'
    from vendor/bundle/ruby/3.2.0/gems/activerecord-7.1.3/lib/active_record/relation.rb:259:in `to_ary'
    from lib/graphiti_adapters/an_adapter.rb:19:in `resolve'
    from vendor/bundle/ruby/3.2.0/gems/graphiti-1.3.9/lib/graphiti/resource.rb:117:in `resolve'
    from vendor/bundle/ruby/3.2.0/gems/graphiti-1.3.9/lib/graphiti/scope.rb:22:in `block in resolve'
    from vendor/bundle/ruby/3.2.0/gems/graphiti-1.3.9/lib/graphiti/scope.rb:81:in `block in broadcast_data'
    from vendor/bundle/ruby/3.2.0/gems/graphiti-1.3.9/lib/graphiti.rb:65:in `block in broadcast'
    from vendor/bundle/ruby/3.2.0/gems/activesupport-7.1.3/lib/active_support/notifications.rb:206:in `block in instrument'
    from vendor/bundle/ruby/3.2.0/gems/activesupport-7.1.3/lib/active_support/notifications/instrumenter.rb:58:in `instrument'
    from vendor/bundle/ruby/3.2.0/gems/activesupport-7.1.3/lib/active_support/notifications.rb:206:in `instrument'
    from vendor/bundle/ruby/3.2.0/gems/graphiti-1.3.9/lib/graphiti.rb:64:in `broadcast'
    from vendor/bundle/ruby/3.2.0/gems/graphiti-1.3.9/lib/graphiti/scope.rb:80:in `broadcast_data'
    from vendor/bundle/ruby/3.2.0/gems/graphiti-1.3.9/lib/graphiti/scope.rb:20:in `resolve'
    from vendor/bundle/ruby/3.2.0/gems/graphiti-1.3.9/lib/graphiti/resource_proxy.rb:68:in `data'
    from vendor/bundle/ruby/3.2.0/gems/graphiti-1.3.9/lib/graphiti/sideload.rb:231:in `load'
    from vendor/bundle/ruby/3.2.0/gems/graphiti-1.3.9/lib/graphiti/sideload.rb:299:in `resolve'
    from vendor/bundle/ruby/3.2.0/gems/graphiti-1.3.9/lib/graphiti/scope.rb:47:in `block (2 levels) in resolve_sideloads'
    from vendor/bundle/ruby/3.2.0/gems/concurrent-ruby-1.2.3/lib/concurrent-ruby/concurrent/executor/safe_task_executor.rb:24:in `block in execute'
    from vendor/bundle/ruby/3.2.0/gems/concurrent-ruby-1.2.3/lib/concurrent-ruby/concurrent/synchronization/mutex_lockable_object.rb:48:in `block in synchronize'
    from vendor/bundle/ruby/3.2.0/gems/concurrent-ruby-1.2.3/lib/concurrent-ruby/concurrent/synchronization/mutex_lockable_object.rb:48:in `synchronize'
    from vendor/bundle/ruby/3.2.0/gems/concurrent-ruby-1.2.3/lib/concurrent-ruby/concurrent/synchronization/mutex_lockable_object.rb:48:in `synchronize'
    from vendor/bundle/ruby/3.2.0/gems/concurrent-ruby-1.2.3/lib/concurrent-ruby/concurrent/executor/safe_task_executor.rb:22:in `execute'
    from vendor/bundle/ruby/3.2.0/gems/concurrent-ruby-1.2.3/lib/concurrent-ruby/concurrent/promise.rb:564:in `block in realize'
    from vendor/bundle/ruby/3.2.0/gems/concurrent-ruby-1.2.3/lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:359:in `run_task'
    from vendor/bundle/ruby/3.2.0/gems/concurrent-ruby-1.2.3/lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:350:in `block (3 levels) in create_worker'
    from vendor/bundle/ruby/3.2.0/gems/concurrent-ruby-1.2.3/lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:341:in `loop'
    from vendor/bundle/ruby/3.2.0/gems/concurrent-ruby-1.2.3/lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:341:in `block (2 levels) in create_worker'
    from vendor/bundle/ruby/3.2.0/gems/concurrent-ruby-1.2.3/lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:340:in `catch'
    from vendor/bundle/ruby/3.2.0/gems/concurrent-ruby-1.2.3/lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:340:in `block in create_worker'

The text was updated successfully, but these errors were encountered:

github-actions · 2024-03-20T17:49:48Z

🎉 This issue has been resolved in version 1.6.0 🎉

The release is available on:

v1.6.0
GitHub release

Your semantic-release bot 📦🚀

jkeen · 2024-03-26T22:39:15Z

@MattFenelon moving our convo from #470 into here.

In short I'm still having a ton of trouble with web server lockups after this change and it's hard to figure out what exactly is going on… but it seems like without this change everything is a-ok. My DB pool values and the sideload thread max values seem to be reasonable, and yet sometimes things will just hang causing the webserver to not respond to requests.

I think we need to roll this thing back and investigate further on an unreleased branch. The part that worries me most is that there's no error message or any sort of clue that this is the issue, meaning there could be people out there with apps breaking without a clear cut cause. Also for me in order to get things running properly again it has required a server restart, which is a real bummer.

I'm going to put out a new patch release without these changes, and then maybe we can work together to try and resolve what's going on by working off a branch?

…n what's causing thread pool hangs. refs #469 revert: "thread pool scope and mutex need to be global across all instances of Scope for it to be a global thread pool (#471)" revert: "add thread pool and concurrency_max_threads configuration option (#470)" This reverts commit 51fb51c. This reverts commit 697d761.

## [1.6.3](v1.6.2...v1.6.3) (2024-03-26) ### Bug Fixes * Remove thread pool executor logic until we get a better handle on what's causing thread pool hangs. refs [#469](#469) ([7941b6f](7941b6f)), closes [#471](#471) [#470](#470)

MattFenelon · 2024-03-27T10:50:27Z

Sorry to hear that @jkeen. I see you've reverted, that makes sense until we can clear this up.

In terms of diagnosing:

What're your puma thread and workers set to?
Did you use the default concurrency_max_threads setting?
What's pool set to in database.yml?
Does your database have a connection limit? If so, what is it?
How many dynos/containers/vms do you have running?

…handle on what's causing thread pool hangs. refs graphiti-api#469" This reverts commit 7941b6f.

This option allows to limit the maximum number of resources that can be sideloaded concurrently. With a properly configured connection pool, this ensures that the activerecord's connection pool is not exhausted by the sideloading process. The thread pool configuration is based on ActiveRecord's global_thread_pool_async_query_executor. This was previously attempted but there were reports of deadlocks. This code differs from the original by using Concurrency::Delay assigned to a constant instead of a regular Mutex. The Delay+constant is how concurrent-ruby sets up their global thread pools so it's more likely to be correct. Closes graphiti-api#469 See: https://github.com/rails/rails/blob/94faed4dd4c915829afc2698e055a0265d9b5773/activerecord/lib/active_record.rb#L279-L287 See: graphiti-api#470

MattFenelon · 2024-03-27T12:36:28Z

The fact that it's locking up completely makes me think it's a deadlock related to the use of the Mutex. I've gone with a different approach in #472 but I haven't been able to test those changes yet. Do you want to give it a try? I can test them in about a week or so.

…n what's causing thread pool hangs. refs graphiti-api#469 revert: "thread pool scope and mutex need to be global across all instances of Scope for it to be a global thread pool (graphiti-api#471)" revert: "add thread pool and concurrency_max_threads configuration option (graphiti-api#470)" This reverts commit 51fb51c. This reverts commit 697d761.

## [1.6.3](graphiti-api/graphiti@v1.6.2...v1.6.3) (2024-03-26) ### Bug Fixes * Remove thread pool executor logic until we get a better handle on what's causing thread pool hangs. refs [graphiti-api#469](graphiti-api#469) ([7941b6f](graphiti-api@7941b6f)), closes [graphiti-api#471](graphiti-api#471) [graphiti-api#470](graphiti-api#470)

jkeen · 2024-03-27T22:32:09Z

For the staging site that kept running into issues yesterday:

DB_POOL = 30
RAILS_MAX_THREADS = 10
GRAPHITI_CONCURRENCY_MAX_THREADS = 3
WEB_CONCURRENCY = 2

It's running on a VPS and when inspecting the process list it didn't seem like any processes had CPU spiked, or anything else that might usually jump out at me as a problem. I wasn't sure what I should be looking for in spotting any hung threads from the command line, even. Any pointers there?

I can give #472 a try this week sometime. Appreciate the follow up

MattFenelon · 2024-04-12T10:48:38Z

I think I've found the issue. It only seems to happen when side-loading children of child resources. Does that tally up with the side loading you're doing, e.g. include=resource1.resource2(.resource3)+? I don't think this happens when the includes are only 1 level deep, e.g. include=resource1,resource2,(resource3)+.

For the sake of an example, let's imagine a thread pool of 1 and an unbounded queue:

The parent resource creates Promise 1 to load child-resource :authors
Waits for Promise 1 to finish
Thread 1 is leased to run Promise 1. There are no more threads available in the pool.
Thread 1 - Promise 1 - the authors resource creates Promise 2 to load child-resource :books
Thread 1 - Promise 1 - waits for Promise 2 to finish
Promise 2 can't run because the thread pool is exhausted. Thread 1 can't be released because Promise 1 is waiting on Promise 2. This is a deadlock.

Though it's technically a deadlock the graphiti code uses sleep to wait for the promises so it manifests itself as a hang. In my PR I replaced the sleep with Concurrent::Promise.zip(*promises).value!, which connects the threads together, allowing ruby to detect the deadlock and raise a fatal exception No live threads left. Deadlock?.

I think it can be fixed by making the child resource sideload logic non-blocking. That would remove the need for the child resources to wait, which would then free up the threads to be used for other promises in the queue. But it probably needs some kind of queue to allow the parent resource to know when all child resources are loaded.

The parent resource creates Promise 1 to load child-resource :authors
Waits for all promises in the queue to finish
Thread 1 is leased to run Promise 1. There are no more threads available in the pool.
Thread 1 - Promise 1 - the authors resource creates Promise 2 to load child-resource :books
Thread 1 - Promise 1 finishes. Thread 1 is added back to the thread pool.
Thread 1 is leased to run Promise 2.
Thread 1 - Promise 2 - the books resource is loaded. There are no other resources to load.
Thread 1 - Promise 2 finishes.
The parent resource stops waiting as all promises in the queue have finished
The parent resource returns. All resources have been loaded at this point.

A few ifs there but I'm going to try this approach and see how I get on.

I've added a successfully failing test case that shows the deadlock: e4de93e

jkeen · 2024-04-12T14:24:29Z

And just for the sake of clarity—how is that scenario you outlined working now in the released version of graphiti?

I guess what I'm trying to understand is if the strategy you're describing works, does it yield in performance gains, or does it yield a more safe approach to what is currently happening by default?

MattFenelon · 2024-04-17T13:33:25Z

Sure I can do that. There's no (real) limit to the threads that the released version of graphiti can use. Instead the problem is that the unbounded threads eventually can exhaust the available database connection pool as ActiveRecord has a 1 thread = 1 connection policy.

For this failure example, we have a puma thread pool of 1 and a database connection pool of 1:

Thread 1 - The parent resource is accessed.
Thread 1 - The parent resource loads data from ActiveRecord. ActiveRecord leases the thread a connection from the connection pool. The database connection pool is now empty
Thread 1 - The parent resource creates Thread 2 to load the :authors resource
Thread 1 - waits on thread 2.
Thread 2 - Attempts to load author data from ActiveRecord. AR attempts to lease a database connection to the thread but can't because thread 1 has the only available connection.
Thread 2 - raises ActiveRecord::ConnectionTimeoutError "all pooled connections were in use".

In the real world this is less likely to happen because there have to be enough web threads running at the same time to exhaust the database pool.

For example, consider we have 5 puma threads and a database connection pool of 5:

Thread 1 - The parent resource is accessed.
Thread 1 - The parent resource loads data from ActiveRecord. ActiveRecord leases the thread a connection from the connection pool. There are 4 connections left in the pool
Thread 1 - The parent resource creates Thread 2 to load the :authors resource
Thread 1 - waits on thread 2.
Thread 2 - Attempts to load author data from ActiveRecord. ActiveRecord leases the thread a connection from the connection pool. There are 3 connections left in the pool
Thread 3 - a new request comes in. ActiveRecord leases the thread a connection from the connection pool. There are 2 connections left in the pool
Thread 2 - finishes. The connection is made available to the pool again. There are 3 connections left in the pool
Thread 1 - finishes waiting. The connection is made available to the pool again. There are 4 connections left in the pool

And so on. If someone has limited traffic, limited sideloads, or a large database connection pool, they may never see this issue.

A naive way to solve it is to just increase the database connection pool but the AR error will raise its head eventually with enough traffic and sideloads.

jkeen · 2024-06-07T20:27:22Z

@MattFenelon I was adjusting values for db pool, max threads, and web_concurrency today when I ran square into this exact problem! I never ran into it before because of how my database.yml is setup where RAILS_MAX_THREADS wasn't being used for the pool value, where instead I had a separate DB_POOL var defined (and set at 40). Others might have a similar setup, thereby never running into it

MattFenelon · 2024-09-27T16:09:11Z

@jkeen Good to know!

I agree it's very much based on each app's configuration and resource lay out. With low sideloads and/or a large database connection pool, it's unlikely to ever be a problem.

That said, the default for new Rails' apps is to use RAILS_MAX_THREADS for the database connection pool. An app with that configuration will run into the problem sooner or later.

https://github.com/rails/rails/blame/13d5f87208921cc5139cd53b280c42f23c334826/railties/lib/rails/generators/rails/app/templates/config/databases/postgresql.yml.tt

MattFenelon mentioned this issue Mar 19, 2024

Add thread pool and concurrency_max_threads configuration option #470

Merged

jkeen closed this as completed in #470 Mar 20, 2024

github-actions bot added the released label Mar 20, 2024

jkeen mentioned this issue Mar 26, 2024

Fix thread pool scope #471

Merged

jkeen reopened this Mar 26, 2024

MattFenelon added a commit to MattFenelon/graphiti that referenced this issue Mar 27, 2024

Revert "fix: Remove thread pool executor logic until we get a better …

32a4cff

…handle on what's causing thread pool hangs. refs graphiti-api#469" This reverts commit 7941b6f.

MattFenelon linked a pull request Mar 27, 2024 that will close this issue

Add thread pool with promises to limit concurrent sideloading #472

Open

7 tasks

jkeen added bug and removed released labels Mar 27, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ActiveRecord::ConnectionTimeoutError "all pooled connections were in use" after enabling concurrency #469

ActiveRecord::ConnectionTimeoutError "all pooled connections were in use" after enabling concurrency #469

MattFenelon commented Mar 19, 2024

github-actions bot commented Mar 20, 2024

jkeen commented Mar 26, 2024

MattFenelon commented Mar 27, 2024

MattFenelon commented Mar 27, 2024

jkeen commented Mar 27, 2024

MattFenelon commented Apr 12, 2024

jkeen commented Apr 12, 2024

MattFenelon commented Apr 17, 2024

jkeen commented Jun 7, 2024 •

edited

Loading

MattFenelon commented Sep 27, 2024 •

edited

Loading

ActiveRecord::ConnectionTimeoutError "all pooled connections were in use" after enabling concurrency #469

ActiveRecord::ConnectionTimeoutError "all pooled connections were in use" after enabling concurrency #469

Comments

MattFenelon commented Mar 19, 2024

github-actions bot commented Mar 20, 2024

jkeen commented Mar 26, 2024

MattFenelon commented Mar 27, 2024

MattFenelon commented Mar 27, 2024

jkeen commented Mar 27, 2024

MattFenelon commented Apr 12, 2024

jkeen commented Apr 12, 2024

MattFenelon commented Apr 17, 2024

jkeen commented Jun 7, 2024 • edited Loading

MattFenelon commented Sep 27, 2024 • edited Loading

jkeen commented Jun 7, 2024 •

edited

Loading

MattFenelon commented Sep 27, 2024 •

edited

Loading