
Add an option to check for liveness on checkout #47

Open
rdp opened this issue Apr 5, 2017 · 8 comments

rdp commented Apr 5, 2017

Feature request: AFAICT there is no concept of "is this connection still alive" in the connection pool system.

Adding something like that might be nice. The easiest way I can think of (off the top of my head) to implement this would be to perform some trivial check at checkout time, like a "select 1" or whatnot, before returning a connection, or something similar.
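
A minimal sketch of what that check could look like from application code today, using only the public crystal-db API; the driver, URI, and the with_live_connection helper name are illustrative, not part of crystal-db:

    require "db"
    require "pg" # any crystal-db driver works; pg is only an example

    db = DB.open("postgres://user@localhost/app")

    # Illustrative helper (not crystal-db API): run a trivial round trip on the
    # checked-out connection so a dead socket fails fast instead of surfacing
    # mid-request.
    def with_live_connection(db : DB::Database)
      db.using_connection do |conn|
        conn.scalar("SELECT 1") # raises (e.g. DB::ConnectionLost) if the socket is dead
        yield conn
      end
    end

    with_live_connection(db) do |conn|
      puts conn.scalar("SELECT 42")
    end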

Thanks!


rdp commented Apr 12, 2017

It seems that if I bring a database down and then back up (to test this), the pool still recovers after about 5 tries (which is of course 5 too many), so possibly there already is some logic like this in there, which is good news. Another option (though possibly more complex) might be to re-check connections "on checkin" as well as "periodically while idle", FWIW :)
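
A rough sketch of the "periodically while idle" idea; idle_connections is a hypothetical stand-in for whatever idle list the pool keeps, so this is purely illustrative:

    # Hypothetical periodic liveness sweep; idle_connections is not a real
    # crystal-db API, just a placeholder for the pool's idle connections.
    spawn do
      loop do
        sleep 30.seconds
        idle_connections.each do |conn|
          begin
            conn.scalar("SELECT 1") # trivial round trip
          rescue
            conn.close # discard dead connections instead of handing them out later
          end
        end
      end
    end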


stakach commented Feb 13, 2021

Did you end up finding a solution for this, @rdp?

We're working with GCP's hosted Postgres-compatible database, and new connections take upwards of 2 seconds to establish, so we really need to rely on the pool.
We have the pool configured with max_idle_pool_size == max_pool_size == 50, but if nothing is going on for a while the connections eventually time out and broken sockets are returned, causing 500 errors in our web app (Caused by: (DB::ConnectionLost)).

Another issue is that the pool doesn't recover from a DB outage. We work around that with health checks, but it's not optimal.
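
For reference, a sketch of the pool configuration described above, passed through the connection URI (host, user, and database are placeholders):

    require "db"
    require "pg"

    # Illustrative only: the pool sizing from the comment above, using the
    # documented crystal-db pool settings in the connection URI.
    url = "postgres://app_user@db-host/app_db?initial_pool_size=50&max_pool_size=50&max_idle_pool_size=50"

    DB.open(url) do |db|
      db.scalar("SELECT 1") # sanity check that a pooled connection works
    end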


rdp commented Feb 16, 2021 via email

bcardiff (Member) commented:

The DB pool should recover from a DB outage within some limits. See https://crystal-lang.org/reference/database/connection_pool.html

My idea would be to have something like Go's ConnMaxLifetime, but I would like to hear what would be better.

  1. To ensure that initial_pool_size young connections are always available in the pool.
  2. To ensure that at least initial_pool_size young connections are always available in the pool while honoring max_idle_pool_size and max_pool_size.
  3. Upon connection checkout, recreate it if the max lifetime was exceeded.

Having a connection check upon checkout could be combined easily with 3, but 1 and 2 could also be extended to incorporate a specific check defined per driver.
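
A minimal sketch of option 3 as a tiny standalone pool; max_lifetime, created_at, and build are hypothetical names, and none of this is current crystal-db API:

    require "db"

    # Illustrative only: each pooled connection remembers when it was created,
    # and checkout discards anything older than max_lifetime.
    record PooledConnection, conn : DB::Connection, created_at : Time do
      def expired?(max_lifetime : Time::Span) : Bool
        Time.utc - created_at > max_lifetime
      end
    end

    class LifetimePool
      def initialize(@max_lifetime : Time::Span, @build : -> DB::Connection)
        @idle = [] of PooledConnection
      end

      # Option 3: upon checkout, recreate the connection if its lifetime was exceeded.
      def checkout : PooledConnection
        while pooled = @idle.shift?
          return pooled unless pooled.expired?(@max_lifetime)
          pooled.conn.close # too old: discard and keep looking
        end
        PooledConnection.new(@build.call, Time.utc) # nothing young enough was idle
      end

      def release(pooled : PooledConnection)
        @idle << pooled # keep the original created_at so age keeps accruing
      end
    end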


stakach commented Feb 16, 2021

Keeping initial_pool_size worth of connections young (not exceeding the max lifetime) makes sense to me, though it might be disruptive to recycle them all at once unless a new connection is spun up before an old one is removed.

I think point 3 might be problematic, at least with GCP, as the new connection would take time to be ready.
It might make sense to refresh the connection on return to the pool instead - i.e. don't return it, and create a new connection in its place.
With points 1 and 2 making sure young connections are generally available, it is probably less of an issue to occasionally get an older connection, provided the user configures the max_life_time timeout with a bit of a buffer.

It might be better to call it target_conn_life_time to hint at this.
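
A small sketch of that refresh-on-check-in idea, reusing the hypothetical LifetimePool names from the sketch above (again not crystal-db API, and without any locking):

    # Illustrative continuation of the LifetimePool sketch: instead of returning
    # an old connection to the idle list, close it and build a replacement in
    # the background so checkout latency is unaffected.
    class LifetimePool
      def release(pooled : PooledConnection)
        if pooled.expired?(@max_lifetime)
          pooled.conn.close
          spawn { @idle << PooledConnection.new(@build.call, Time.utc) }
        else
          @idle << pooled
        end
      end
    end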


stakach commented Sep 3, 2021

We were recently doing some testing with https://github.com/anykeyh/clear and can confirm that it doesn't recover from a DB disconnection.

The reason for this is that clear handles the DB::ConnectionLost exception; however, even if the connection is closed? it is still returned to the pool... which seems like an oversight.
I think closed connections should be discarded regardless of whether ConnectionLost is raised.

Looking at this line: https://github.com/crystal-lang/crystal-db/blob/master/src/db/database.cr#L126
We could do something like

    def using_connection
      connection = self.checkout
      begin
        yield connection
      ensure
        # surface the dead connection rather than silently returning it to the pool
        raise ConnectionLost.new(connection) if connection.closed?
        connection.release
      end
    end

or possibly something like connection.release unless connection.closed? would work. I'm not entirely across the internals here.
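
For comparison, that second alternative sketched in the same style (also just illustrative):

    def using_connection
      connection = self.checkout
      begin
        yield connection
      ensure
        # simply never return a dead connection to the pool; no extra raise
        connection.release unless connection.closed?
      end
    end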


robcole commented Apr 26, 2022

@stakach I’ve been running into frequent (5-10 daily) ConnectionLost errors in a new Lucky app, and I’m curious if the work you did in a25f336 resolved this?

I’m testing a patch right now similar to what you quoted above, but from my reading of the code it seems like the call to .release should handle all of this regardless. I’m a bit stumped. 🤔


stakach commented Apr 26, 2022

@robcole It should have resolved this; however, it's up to the DB implementation to implement retries. For the postgres shard I think you do this via the connection string, e.g. retry_attempts=4:
postgres://user@/test?initial_pool_size=5&retry_attempts=4

That way, if a disconnection does occur, your app recovers seamlessly.
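
In application code that amounts to something like the following (connection string copied from above, with the usual placeholders):

    require "db"
    require "pg"

    # retry_attempts rides along in the connection URI; per the crystal-db
    # connection pool docs, the pool retries re-establishing lost connections
    # up to that many times before giving up.
    DB.open("postgres://user@/test?initial_pool_size=5&retry_attempts=4") do |db|
      db.scalar("SELECT 1")
    end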
