Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run sync task in its own subprocess #1161

Open
medihack opened this issue Aug 15, 2024 · 5 comments
Open

Run sync task in its own subprocess #1161

medihack opened this issue Aug 15, 2024 · 5 comments
Labels
Issue appropriate for: People up for a challenge 🤯 This issue probably will be challenging to tackle Issue contains: Exploration & Design decisions 🧩 We don't know how this will be implemented yet Issue contains: Some Python 🐍 This issue involves writing some Python code Issue contains: Some Windows 🪟 This issue will require skills and/or testing capabilities under Windows Issue type: Feature ⭐️ Add a new feature that didn't exist before Issue type: Refactor 🔄 Issues involves rewriting some code in a better way PR type: breaking 💥 Contains breaking changes

Comments

@medihack
Copy link
Member

Currently, a synchronous task runs in its own thread (since v2.13.0 / PR #1160). As discussed in #1156, we should evaluate whether we want to run a synchronous task in its own subprocess.

Advantages:

  • Good for CPU-intensive work (for I/O intensive work, we already have async tasks)
  • Shutdown and timeout handling is easier as subprocesses can be killed
  • Tasks are more isolated (no threading issues)

Disadvantages:

  • Complex implementation
    • Communication between job and worker (signals)
    • Calling the job_manager (database context, connector, ....)
  • Problematic Windows support (no forking of processes on Windows)
@medihack medihack added Issue type: Feature ⭐️ Add a new feature that didn't exist before Issue contains: Exploration & Design decisions 🧩 We don't know how this will be implemented yet Issue contains: Some Python 🐍 This issue involves writing some Python code Issue type: Refactor 🔄 Issues involves rewriting some code in a better way Issue appropriate for: People up for a challenge 🤯 This issue probably will be challenging to tackle PR type: breaking 💥 Contains breaking changes Issue contains: Some Windows 🪟 This issue will require skills and/or testing capabilities under Windows labels Aug 15, 2024
@medihack medihack added this to the Version 3.0 milestone Aug 15, 2024
@ewjoachim
Copy link
Member

I have to admit I have limited multiprocessing experience (it just happens to never have been on my radar).

From what I know, Pipes and Queues might be what Python gives us to communicate between processes. Object put in queues are pickled. pipes let us transfer text payloads.

Another complex point (but linked to the JobManager point) is the psycopg pool: does multiprocesing imply that each process will open its own pool ? That might be a little overkill, though I don't know how it will play. Especially: we might not need a connection except if we use task.defer from within the task. We could hack something to use a special connector in the task process that sends postgres queries in the pipe, to be handled by the parent.

@medihack
Copy link
Member Author

From what I know, Pipes and Queues might be what Python gives us to communicate between processes. Object put in queues are pickled. pipes let us transfer text payloads.

Yes, and events (for something like the abort request).

Another complex point (but linked to the JobManager point) is the psycopg pool: does multiprocesing imply that each process will open its own pool ? That might be a little overkill, though I don't know how it will play. Especially: we might not need a connection except if we use task.defer from within the task. We could hack something to use a special connector in the task process that sends postgres queries in the pipe, to be handled by the parent.

Yes, if the database is queried directly, each process would use its own connection pool. I find it difficult to judge whether this could really be a problem in a real-life application or just a theoretical problem. Having a special connector and doing something like RPC between the processes sounds like a cool idea. Unfortunately, the same problem exists for database connections besides Procrastinate, when, for example, users do Django model queries. Those connections would take place in the subprocess anyway.

@ewjoachim
Copy link
Member

I'm a bit wary of changing the model just like this.
I wonder if we should maybe add options (we don't have to pick them all):

  • Legacy runner that runs async tasks & sync tasks via async to sync (breaking change could be that you need to set your worker to legacy explicitly to get the previous behaviour, or legacy could be the default when nothing is specified with a DeprecationWarning)
  • Subprocess runner

As usual the annoying part is to try and guess how it's going to be like for folks who use just async, just sync, a mix of both, with or without Django etc.

@medihack
Copy link
Member Author

medihack commented Sep 6, 2024

Or, make it configurable, as Huey does with worker types. We could have a sync_type option on the worker (or the task itself).
But there is so much stuff already in the v3 release that we should postpone this feature for a later release. We can add it as an experimental feature with a minor release (and keep the default sync tasks threaded), and when we are sure it's stable, we can still switch to subprocesses as the default.

@ewjoachim
Copy link
Member

We can add it as an experimental feature

Yes that was my point: I'm comfortable with adding it, I'm comfortable with it having better support over some features (such as aborting) than the standard way but I'm not (yet) comfortable with removing the way we do things now.

@medihack medihack removed this from the Version 3.0 milestone Oct 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Issue appropriate for: People up for a challenge 🤯 This issue probably will be challenging to tackle Issue contains: Exploration & Design decisions 🧩 We don't know how this will be implemented yet Issue contains: Some Python 🐍 This issue involves writing some Python code Issue contains: Some Windows 🪟 This issue will require skills and/or testing capabilities under Windows Issue type: Feature ⭐️ Add a new feature that didn't exist before Issue type: Refactor 🔄 Issues involves rewriting some code in a better way PR type: breaking 💥 Contains breaking changes
Projects
None yet
Development

No branches or pull requests

2 participants