Error: "Remaining connection slots are reserved for non-replication superuser connections" #117
Traceback from within the Docker container on Materials Cloud (traceback not reproduced here).
@giovannipizzi or @sphuber, would you know what the issue is here? Does it have to do with the SQLAlchemy sessions setup in AiiDA, or could it be something different?
This is a message from PostgreSQL saying that it has too many open connections. The default limit is configured to something like 100; exceed that number and new connections are simply refused. Note that this limit applies across all databases in the cluster: if a PostgreSQL installation hosts a hundred databases and each database has a single open connection, you also hit the limit. Since this is a server, this could simply be a scaling/configuration problem, or there may be a bug in our code where connections are not correctly cleaned up. I suspect it is the former and the database just gets too many concurrent connection requests, so some of them fail. How many databases are you running on this server, and are they getting a lot of (simultaneous) requests?
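A minimal way to check this (a hedged sketch, assuming psycopg2 and a role allowed to read `pg_stat_activity`; the host and credentials below are placeholders) is to compare the configured limit with the number of connections currently open across the whole cluster:

```python
import psycopg2

# Hypothetical DSN: replace with the actual host/credentials of the cluster.
conn = psycopg2.connect("host=db-server dbname=postgres user=postgres")
with conn, conn.cursor() as cur:
    # The server-wide limit that the error message refers to.
    cur.execute("SHOW max_connections;")
    max_connections = int(cur.fetchone()[0])

    # Connections currently open across *all* databases in the cluster.
    cur.execute("SELECT count(*) FROM pg_stat_activity;")
    open_connections = cur.fetchone()[0]

print(f"{open_connections} of {max_connections} connection slots in use")
conn.close()
```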
This is connected to a PostgreSQL database cluster on a separate DB server, which hosts all of Materials Cloud's production databases. When connecting to another database, i.e., through another AiiDA-OPTIMADE server that uses a PostgreSQL database on the same separate DB server, queries run without issue. Each AiiDA-OPTIMADE server runs within Docker, and the external PostgreSQL databases are reached through a closed Docker network (as far as I understand; perhaps @ltalirz could clarify this?).
On another note, it seems the error is (temporarily) no longer being thrown.
There are a number of separate processes using that database (AiiDA REST APIs, OPTIMADE APIs, discover sections), but certainly fewer than 100 overall.
These are the active connections from the development server on Materials Cloud, obtained with the command I linked above (output not reproduced here).
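The linked command itself does not appear in this thread; a typical equivalent (a sketch, assuming psycopg2 and placeholder credentials) groups the entries of `pg_stat_activity` by database, application and state, which also makes idle connections visible:

```python
import psycopg2

# Hypothetical DSN; adjust to the actual cluster.
conn = psycopg2.connect("host=db-server dbname=postgres user=postgres")
with conn, conn.cursor() as cur:
    cur.execute(
        """
        SELECT datname, application_name, state, count(*)
        FROM pg_stat_activity
        GROUP BY datname, application_name, state
        ORDER BY count(*) DESC;
        """
    )
    for datname, application, state, count in cur.fetchall():
        print(f"{datname or '-':25} {application or '-':25} {state or '-':10} {count}")
conn.close()
```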
**Idle connections**

As one can see from the output above, a large fraction of the open connections are idle. There is a nice writeup on this topic here and, in essence, the answer is: connections are a scarce resource and idle worker processes should not hold open connections. We should check the behavior of the REST API, OPTIMADE API and discover sections in this regard and adapt them, if necessary.

**Accounting**
Even though the WSGI daemon for the REST API, the OPTIMADE API container and the discover-section containers each run just a single process (I checked), I suspect that each thread will use its own connection. Perhaps someone can confirm (if threads were sharing a DB connection, some logic would need to be in place for them not to interfere with each other; I guess this is not the case). The WSGI daemons, for example, are currently configured to use up to 20 threads [1]; however, I guess only threads that are actually "used" at least once will acquire an open connection.

[1] All these threads plus a few more are created in advance, as one can see from the listing (not reproduced here).
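To illustrate the per-thread behavior suspected above, here is a minimal sketch (assuming SQLAlchemy with its default QueuePool; the connection URL is a placeholder): when several threads hold a connection at the same time, each one is served by a different PostgreSQL backend.

```python
import threading
from sqlalchemy import create_engine, text

# Hypothetical connection URL for one of the profiles.
engine = create_engine("postgresql://aiida:secret@db-server/aiida_profile")

N_THREADS = 5
barrier = threading.Barrier(N_THREADS)

def worker():
    with engine.connect() as connection:
        # Each thread reports the PostgreSQL backend PID serving it.
        pid = connection.execute(text("SELECT pg_backend_pid();")).scalar()
        print(f"{threading.current_thread().name}: backend PID {pid}")
        barrier.wait()  # hold the connection until every thread has one

threads = [threading.Thread(target=worker) for _ in range(N_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```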
Excellent detective work, @ltalirz, thank you! I think we have quite a large issue on our hands here, at least in the context of Materials Cloud and future similar uses of AiiDA databases. I will continue the discussion in aiidateam/aiida-core#4374, since it seems the solution depends foremost on AiiDA core and on what policy we decide should prevail concerning this issue. If we decide not to address it in AiiDA, it then becomes a matter of implementing a workaround or solution for Materials Cloud.
Since you guys are mostly just using AiiDA to query the database and are not running the daemon, the only connections should go through the query builder. Even for Django, this goes through SQLAlchemy, and it is SQLAlchemy that manages a connection pool. The idea is that we then don't have to close connections (closing and reopening every time has overhead and may not be the most efficient) and simply reuse open ones when needed. I think this may ultimately just be a configuration problem. Of course, if you start serving too many applications from a single server and PostgreSQL cluster, at some point you run out of resources. If you think the current number of projects should be perfectly manageable with the default of 100 connections, then we can always configure the connection pool of SQLAlchemy. @CasperWA added a commit not too long ago (the commit to test the correct handling of the session for the REST API) that allows configuring the parameters of the SQLAlchemy connection pool. I don't think it is fully plumbed through such that you can configure this dynamically per AiiDA profile, but if necessary you could add this. Otherwise you can try a temporary hardcoded solution and limit the number of connections in a pool. The default number of connections per pool seems to be 5.
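As a hedged illustration of that last suggestion (these are standard SQLAlchemy QueuePool parameters, not a documented AiiDA setting; the connection URL is a placeholder), hardcoding the pool limits would look roughly like this:

```python
from sqlalchemy import create_engine

engine = create_engine(
    "postgresql://aiida:secret@db-server/aiida_profile",  # placeholder URL
    pool_size=5,      # connections kept open in the pool (SQLAlchemy default)
    max_overflow=2,   # extra connections allowed under load, closed afterwards
    pool_timeout=30,  # seconds to wait for a free connection before erroring
)
```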
The following error was found on the scdm / autowannier server hosted on Materials Cloud:
This means the server raised a Python exception like:
OperationalError: FATAL: remaining connection slots are reserved for non-replication superuser connections