Refactor celery tasks orchestration #3603
Yes! Before the sequence of tasks in …

Ok, I see. I don't know if it is possible to have a kind of lock to avoid the race conditions in the first place, while still having …

I thought of maybe using `select_for_update` to lock the row. However, it needs to run in a transaction, and the …
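For context, a minimal sketch of that row-locking idea with Django's `select_for_update`. The import path and field name follow the names used elsewhere in this discussion, but the snippet is an illustration, not the actual Open Forms code:

```python
# Sketch of locking the submission row to avoid racing writers; this is
# an illustration, not the actual Open Forms implementation.
from django.db import transaction

from openforms.submissions.models import Submission  # assumed import path


def set_retry_flag(submission_id: int) -> None:
    # select_for_update() only takes effect inside a transaction: the row
    # stays locked until the atomic block ends, so concurrent tasks
    # serialize on it instead of racing each other.
    with transaction.atomic():
        submission = Submission.objects.select_for_update().get(id=submission_id)
        submission.needs_on_completion_retry = True
        submission.save(update_fields=["needs_on_completion_retry"])
```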
Chonky PR gets a chonky review 😬
Codecov Report

```diff
@@            Coverage Diff             @@
##           master    #3603      +/-   ##
==========================================
+ Coverage   95.96%   96.03%   +0.06%
==========================================
  Files         683      689       +6
  Lines       21919    22133     +214
  Branches     2533     2556      +23
==========================================
+ Hits        21035    21255     +220
+ Misses        611      605       -6
  Partials      273      273
```

☔ View full report in Codecov by Sentry.
I think I'm seeing some confusion/indecision about the `on_post_submission_event` chain.

Some of the code (with the finaliser) aims to let the chain run entirely and, at the end, reconcile the state & prepare for a potential next run, by figuring out whether some things errored and whether they should set the `needs_on_completion_retry` flag. This looks like a (commendable) attempt at having only a single place that sets this flag.

However, that conflicts with the idea that some tasks in the chain are expected to abort entirely on (unexpected) exceptions, and the expectation is task-dependent:

- appointment registration: task failure leads to chain abortion and feedback to the end-user. A re-submit of the submission completion is expected.
- pre-registration: failure is not fatal; a temporary reference is generated and execution continues. No retry flag appears to be set (shouldn't it be set, though?)
- report generation: should not fail, but if it does -> fatal
- registration: failure sets the retry flag and is fatal if we're inside the retry variant
- payment status update: failure sets the flag; failure is fatal when retrying, otherwise not
- finalise completion: no failures expected here

Looking at this, I see a couple of challenges still left to solve that impact how elegant our code can be:

- `is_retrying` is derived from the `submission` instance state based on the value of the flag. So if task 1 sets that flag and its failure is not fatal, then for task 2 after it, it will seem as if we're in the retry flow while we could still very much be inside the initial run (!). This essentially requires us to postpone setting the flag until the very end, but that makes it harder to figure out why we need to set the flag. It also requires our tasks not to crash the entire chain, because otherwise we risk consistency bugs.
- tasks themselves know their implementation and requirements, but don't know in which context they are running (am I in a chain? am I running standalone?) and therefore cannot perform certain side effects reliably
- ideally we have a single celery workflow (chain) to run for the various events that happen, so that we, the developers, can follow the code. We also want locality of behaviour so that we don't have to jump around/context switch too much to figure out what is happening where
- we want it to be clear where the `needs_retry` flag is being set

A couple of thoughts I have right now that may improve this orchestration (see the sketch after this list):

- rather than deriving `is_retrying` from the submission instance, can we not make it an explicit task argument instead? If we only have a single chain to invoke with a particular event, we can map the event to the `is_retrying` boolean and pass it as an explicit task argument
- the above bullet makes it possible to set the flag when the problem occurs (locality of code) without affecting downstream tasks in the chain (since they have been told upfront whether it's retry behaviour or not)
- we could even translate the `is_retrying` flag into an `is_failure_fatal` argument for the task, so that we either `return` or `raise` on failures. I think we would then also not need the custom exception types anymore.
- the finaliser `set_retry_flag` could be changed into `maybe_reset_retry_flag`, so that combined with the explicit `is_retrying` argument, we know whether we should break the retry flow, because the finaliser will only be called when the entire chain has succeeded
Now, when a submission is completed, when a cosign is done, when a payment is completed, or when a retry flow is triggered (either because the registration failed or because a payment status update failed), the same task chain is called. This keeps a maintainable overview of where tasks come from.
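As an illustration (hypothetical task names and event strings, not the actual implementation), that single entry point could look roughly like this:

```python
# Hedged sketch of one entry point mapping every event onto the same
# celery chain; all task names here are hypothetical stand-ins.
from celery import chain, shared_task


@shared_task
def pre_registration(submission_id: int) -> None: ...

@shared_task
def generate_report(submission_id: int) -> None: ...

@shared_task
def register_submission(submission_id: int, is_failure_fatal: bool = False) -> None: ...

@shared_task
def update_payment_status(submission_id: int, is_failure_fatal: bool = False) -> None: ...

@shared_task
def maybe_reset_retry_flag(submission_id: int, is_retrying: bool = False) -> None: ...


def on_post_submission_event(submission_id: int, event: str) -> None:
    # completion, cosign, payment and retry all funnel through here, so the
    # chain composition can be read from a single place
    is_retrying = event == "retry"
    workflow = chain(
        pre_registration.si(submission_id),
        generate_report.si(submission_id),
        register_submission.si(submission_id, is_failure_fatal=is_retrying),
        update_payment_status.si(submission_id, is_failure_fatal=is_retrying),
        maybe_reset_retry_flag.si(submission_id, is_retrying=is_retrying),
    )
    workflow.delay()
```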
Fixes #3005 (partially)