Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OperationalError: the connection is closed #1180

Open
paulzakin opened this issue Sep 4, 2024 · 9 comments
Open

OperationalError: the connection is closed #1180

paulzakin opened this issue Sep 4, 2024 · 9 comments

Comments

@paulzakin
Copy link
Contributor

Hey @ewjoachim and @medihack an interesting error we got from 2 AM to 10 AM this morning. OperationalError: the connection is closed. Here is the stack trace (below). I think it might be related to #1134. We have been using the pool feature in Django 5.1 for a couple of weeks now (and python manage.py procrastinate worker) but this is the first time this real happended. What happended, as far as I can tell, is that the worker kept reusing a dead connection for some reason. So every X seconds that a task was run, this error would be thrown. But I think it is even more mysterious then that, because some of the tasks succeeded, indictating that one of the worker processes had a good connection and the other had a bad one (we run two processes on the same server). The solution was simply kill those two workers and start two new ones, and that solved the problem.

So, is there any way that when that error is thrown, Procrastinate can "get" a new connection from the pool rather than simply throwing an error to avoid this problem? I was under the impression that Procrastinate manages the pool for the worker, not Django, so this seems doable to me?

OperationalError: the connection is closed
  File "django/db/backends/base/base.py", line 298, in _cursor
    return self._prepare_cursor(self.create_cursor(name))
  File "django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "django/db/backends/postgresql/base.py", line 433, in create_cursor
    cursor = self.connection.cursor()
  File "psycopg/connection.py", line 213, in cursor
    self._check_connection_ok()
  File "psycopg/_connection_base.py", line 524, in _check_connection_ok
    raise e.OperationalError("the connection is closed")
OperationalError: the connection is closed
  File "procrastinate/worker.py", line 281, in run_job
    task_result = await await_func(*job_args, **job.task_kwargs)
  File "server/blueprint.py", line 80, in wrapper
    raise error
  File "server/blueprint.py", line 53, in wrapper
    result = await (
  File "concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "line_profiler/line_profiler.py", line 150, in wrapper
    result = func(*args, **kwds)
  File "server/tasks/event_records.py", line 18, in recurring_event_record_processing
    EventStreamService.process_records()
  File "server/services/aws/event_stream.py", line 60, in process_records
    for event_record in event_records:
  File "django/db/models/query.py", line 400, in __iter__
    self._fetch_all()
  File "django/db/models/query.py", line 1928, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
  File "django/db/models/query.py", line 91, in __iter__
    results = compiler.execute_sql(
  File "django/db/models/sql/compiler.py", line 1572, in execute_sql
    cursor = self.connection.cursor()
  File "django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "django/db/backends/base/base.py", line 320, in cursor
    return self._cursor()
  File "django/db/backends/base/base.py", line 297, in _cursor
    with self.wrap_database_errors:
  File "django/db/utils.py", line 91, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "django/db/backends/base/base.py", line 298, in _cursor
    return self._prepare_cursor(self.create_cursor(name))
  File "django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "django/db/backends/postgresql/base.py", line 433, in create_cursor
    cursor = self.connection.cursor()
  File "psycopg/connection.py", line 213, in cursor
    self._check_connection_ok()
  File "psycopg/_connection_base.py", line 524, in _check_connection_ok
    raise e.OperationalError("the connection is closed")
@medihack
Copy link
Member

medihack commented Sep 4, 2024

If I understand it correctly, the Procrastinate worker uses its own connection pool independent of the Django connection pool (it just fetches the connection parameters from Django). But I wonder if this error originates from the worker or from within the task and thereby a Django connection (from your traceback: File "django/db/models/query.py", line 400, in __iter__). Are those very long-running tasks?

@paulzakin
Copy link
Contributor Author

Very possibly! But now I’m confused. Are you saying the task itself is executed using a Django connection, rather than a procrastinate one? But not all the tasks are long running and this has only happened now over the past couple of weeks so it is odd for sure. What would happen is every minute a short task would run (10 seconds or less) and if used a dead connection effectively. Maybe there is way within Procrastinate to force the connection to release? The stack trace bubbles it from Django to Procrastinate

@medihack
Copy link
Member

medihack commented Sep 5, 2024

Very possibly! But now I’m confused. Are you saying the task itself is executed using a Django connection, rather than a procrastinate one?

Yes, I think so. Procrastinate extracts the connection parameters from a Django connection, but then uses its own connection pool for everything that is worker related. However, when you access the Django ORM in a task, it uses the Django connections (or that pool if it is set up in Django 5.1).
As a sync task is run in its own thread, the Django connections won't be automatically closed (there is no request-response cycle like in a Django view where it is automatically closed). So, from my understanding (and when reading some of the many issues on the web), one has to close them manually. That's why we call django.db.close_old_connections() in our tasks before an expected long-running calculation so that when the connections are re-opened when needed again. As stated in #1134 CONN_MAX_AGE may work, too. Procrastinate can't do anything about that as it has no information on when those Django connections are needed and when they are not.

@medihack
Copy link
Member

medihack commented Sep 5, 2024

@ewjoachim When Procrastinate fetches the connection parameters from Django it creates a new (Django) connection. I wonder if we should explicitly close it after we got the parameters. I don't think it has anything to do with the reported issue, but it could still be problematic as this connection may also time out.

@ewjoachim
Copy link
Member

This might be a solid lead.

@medihack
Copy link
Member

medihack commented Sep 5, 2024

I dug a bit deeper, but I was wrong. When the connection parameters are fetched from Django, no connection is opened. Contrary to the name create_connection, it just creates a database wrapper that fetches the database parameters from the Django settings (that's why its named wrapper in Procrastinate 😂).
@paulzakin I still recommend manually closing those Django connections. Please report back if that issue nevertheless occurs.

@paulzakin
Copy link
Contributor Author

paulzakin commented Sep 5, 2024

Huh! Ok. Weird. The pool in Django I thought would do that. It's a new feature though - maybe it is a bit buggy on there end. Would there be a good place within Procrastinate to close those connections you guys think?

Or should I try to intercept them somewhere else

@medihack
Copy link
Member

medihack commented Sep 5, 2024

Huh! Ok. Weird. The pool in Django I thought would do that. It's a new feature though - maybe it is a bit buggy on there end. Would there be a good place within Procrastinate to close those connections you guys think?

Or should I try to intercept them somewhere else

I can't say much about the new pool feature, but it looks like the errors we had to deal with (without using the pool setting). Those entirely disappeared when we manually closed the connections at specific points in our task. But I also saw some code somewhere where they caught the error, closed the connection, and retried again.
I also wonder how the Django pool feature is related to CONN_MAX_AGE and CONN_HEATH_CHECKS (see "Connection management". The main problem is that Django database connections are built around the request-response cycle, which is not present inside a background task.
When I find time, I can play a bit with long-running tasks that access the Django ORM (with and without the pool option). Do you use the default pool ("pool": True) or pass in you own pool?

@paulzakin
Copy link
Contributor Author

paulzakin commented Sep 5, 2024

No rush, and thanks a bunch! We use the default pool: True - nothing fancy. We can do the close function at the end of each long running task, which makes sense - but I do feel this is something other Procrastinate users will stumble into for sure. Probably worth having some logic in Procrastinate that when raise e.OperationalError("the connection is closed") by Procrastinate, it closes the Django connection as well? That might be what we end up implementing if it happens a bunch more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants