Refactor celery tasks orchestration #3603
Yes! Before the sequence of tasks in …

Ok, I see. I don't know if it is possible to have a kind of lock to avoid the race conditions in the first place, while still having …

I thought of maybe using `select_for_update` to lock the row. However, it needs to run in a transaction, and the …
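For context, a minimal sketch of that row-locking idea with Django's `select_for_update`. The import path and field name follow the names used elsewhere in this discussion, but the snippet is an illustration, not the actual Open Forms code:

```python
# Sketch of locking the submission row to avoid racing writers; this is
# an illustration, not the actual Open Forms implementation.
from django.db import transaction

from openforms.submissions.models import Submission  # assumed import path


def set_retry_flag(submission_id: int) -> None:
    # select_for_update() only takes effect inside a transaction: the row
    # stays locked until the atomic block ends, so concurrent tasks
    # serialize on it instead of racing each other.
    with transaction.atomic():
        submission = Submission.objects.select_for_update().get(id=submission_id)
        submission.needs_on_completion_retry = True
        submission.save(update_fields=["needs_on_completion_retry"])
```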
Chonky PR gets a chonky review 😬
Codecov Report

```diff
@@            Coverage Diff             @@
##           master    #3603      +/-   ##
==========================================
+ Coverage   95.96%   96.03%   +0.06%
==========================================
  Files         683      689       +6
  Lines       21919    22133     +214
  Branches     2533     2556      +23
==========================================
+ Hits        21035    21255     +220
+ Misses        611      605       -6
  Partials      273      273
```

☔ View full report in Codecov by Sentry.
I think I'm seeing some confusion/indecision about the `on_post_submission_event` chain.

Some of the code (with the finaliser) aims to let the chain run entirely and, at the end, reconcile the state & prepare for a potential next run, by figuring out whether some things errored and whether they should set the `needs_on_completion_retry` flag. This looks like a (commendable) attempt at having only a single place that sets this flag.

However, that conflicts with the idea that some tasks in the chain are expected to abort entirely on (unexpected) exceptions, and the expectation is task-dependent:

- appointment registration: task failure leads to chain abortion and feedback to the end-user. A re-submit of the submission completion is expected.
- pre-registration: failure is not fatal; a temporary reference is generated and execution continues. No retry flag appears to be set (shouldn't it be set, though?)
- report generation: should not fail, but if it does -> fatal
- registration: failure sets the retry flag and is fatal if we're inside the retry variant
- payment status update: failure sets the flag; failure is fatal when retrying, otherwise not
- finalise completion: no failures expected here

Looking at this, I see a couple of challenges still left to solve that impact how elegant our code can be:

- `is_retrying` is derived from the `submission` instance state based on the value of the flag. So if task 1 sets that flag and its failure is not fatal, then for task 2 after it, it will seem as if we're in the retry flow while we could still very much be inside the initial run (!). This essentially requires us to postpone setting the flag until the very end, but that makes it harder to figure out why we need to set the flag. It also requires our tasks not to crash the entire chain, because otherwise we risk consistency bugs.
- tasks themselves know their implementation and requirements, but don't know in which context they are running (am I in a chain? am I running standalone?) and therefore cannot perform certain side effects reliably
- ideally we have a single celery workflow (chain) to run for the various events that happen, so that we, the developers, can follow the code. We also want locality of behaviour so that we don't have to jump around/context switch too much to figure out what is happening where
- we want it to be clear where the `needs_retry` flag is being set

A couple of thoughts I have right now that may improve this orchestration (see the sketch after this list):

- rather than deriving `is_retrying` from the submission instance, can we not make it an explicit task argument instead? If we only have a single chain to invoke with a particular event, we can map the event to the `is_retrying` boolean and pass it as an explicit task argument
- the above bullet makes it possible to set the flag when the problem occurs (locality of code) without affecting downstream tasks in the chain (since they have been told upfront whether it's retry behaviour or not)
- we could even translate the `is_retrying` flag into an `is_failure_fatal` argument for the task, so that we either `return` or `raise` on failures. I think we would then also not need the custom exception types anymore.
- the finaliser `set_retry_flag` could be changed into `maybe_reset_retry_flag`, so that combined with the explicit `is_retrying` argument, we know whether we should break the retry flow, because the finaliser will only be called when the entire chain has succeeded
Now, when a submission is completed, when a cosign is done, when a payment is completed, or when a retry flow is triggered (either because the registration failed or because a payment status update failed), the same task chain is called. This keeps a maintainable overview of where tasks come from.
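As an illustration (hypothetical task names and event strings, not the actual implementation), that single entry point could look roughly like this:

```python
# Hedged sketch of one entry point mapping every event onto the same
# celery chain; all task names here are hypothetical stand-ins.
from celery import chain, shared_task


@shared_task
def pre_registration(submission_id: int) -> None: ...

@shared_task
def generate_report(submission_id: int) -> None: ...

@shared_task
def register_submission(submission_id: int, is_failure_fatal: bool = False) -> None: ...

@shared_task
def update_payment_status(submission_id: int, is_failure_fatal: bool = False) -> None: ...

@shared_task
def maybe_reset_retry_flag(submission_id: int, is_retrying: bool = False) -> None: ...


def on_post_submission_event(submission_id: int, event: str) -> None:
    # completion, cosign, payment and retry all funnel through here, so the
    # chain composition can be read from a single place
    is_retrying = event == "retry"
    workflow = chain(
        pre_registration.si(submission_id),
        generate_report.si(submission_id),
        register_submission.si(submission_id, is_failure_fatal=is_retrying),
        update_payment_status.si(submission_id, is_failure_fatal=is_retrying),
        maybe_reset_retry_flag.si(submission_id, is_retrying=is_retrying),
    )
    workflow.delay()
```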
Fixes #3005 (partially)