Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor celery tasks orchestration #3603

Merged
merged 3 commits into from
Dec 11, 2023

Conversation

SilviaAmAm
Copy link
Contributor

Fixes #3005 (partially)

@SilviaAmAm SilviaAmAm marked this pull request as draft November 13, 2023 16:03
@SilviaAmAm SilviaAmAm force-pushed the feature/3005-task-orchestration-refactor branch 3 times, most recently from 520d00c to bf37fc9 Compare November 15, 2023 16:39
@SilviaAmAm SilviaAmAm force-pushed the feature/3005-task-orchestration-refactor branch 3 times, most recently from 670f5de to 55d3d3c Compare November 17, 2023 14:41
@SilviaAmAm SilviaAmAm marked this pull request as ready for review November 17, 2023 15:25
src/openforms/payments/views.py Outdated Show resolved Hide resolved
src/openforms/submissions/tasks/__init__.py Outdated Show resolved Hide resolved
@SilviaAmAm
Copy link
Contributor Author

@Viicos

Was this the reason on_completion_task_ids was moved to a separate model?

Yes! Before the sequence of tasks in on_completion was only called when the submission was completed (and so we could save the tasks ids on the submission field). But now, the on_post_submission_event is called in multiple places, so if it triggered twice at the same time, then you risk that one call overwrites the task ids set by the other call. This is why I decided to go for a separate model.

@Viicos
Copy link
Contributor

Viicos commented Nov 20, 2023

so if it triggered twice at the same time, then you risk that one call overwrites the task ids set by the other call.

Ok I see, I don't know if it is possible to have a kind of lock to avoid the race conditions in the first place, while still having on_completion_task_ids as a field 🤔

@SilviaAmAm
Copy link
Contributor Author

SilviaAmAm commented Nov 20, 2023

I thought of maybe using the select_for_update to lock the row. However, it needs to run in a transaction and the on_post_submission_event runs when the transaction is committed so it is no longer in the transaction :S

Copy link
Member

@sergei-maertens sergei-maertens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Chonky PR gets a chonky review 😬

src/openforms/emails/confirmation_emails.py Show resolved Hide resolved
src/openforms/payments/tasks.py Outdated Show resolved Hide resolved
src/openforms/payments/tests/test_views.py Outdated Show resolved Hide resolved
src/openforms/payments/views.py Outdated Show resolved Hide resolved
src/openforms/submissions/tests/test_submission_status.py Outdated Show resolved Hide resolved
Copy link

codecov bot commented Nov 27, 2023

Codecov Report

Attention: 4 lines in your changes are missing coverage. Please review.

Comparison is base (8002396) 95.96% compared to head (10cf117) 96.03%.
Report is 39 commits behind head on master.

Files Patch % Lines
src/openforms/payments/tasks.py 93.33% 0 Missing and 1 partial ⚠️
src/openforms/submissions/api/viewsets.py 66.66% 0 Missing and 1 partial ⚠️
...rms/submissions/models/post_completion_metadata.py 93.75% 1 Missing ⚠️
src/openforms/submissions/status.py 90.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3603      +/-   ##
==========================================
+ Coverage   95.96%   96.03%   +0.06%     
==========================================
  Files         683      689       +6     
  Lines       21919    22133     +214     
  Branches     2533     2556      +23     
==========================================
+ Hits        21035    21255     +220     
+ Misses        611      605       -6     
  Partials      273      273              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@SilviaAmAm SilviaAmAm force-pushed the feature/3005-task-orchestration-refactor branch from 31db0cd to 4447669 Compare November 27, 2023 13:37
src/openforms/submissions/models/submission.py Outdated Show resolved Hide resolved
src/openforms/submissions/tasks/__init__.py Outdated Show resolved Hide resolved
src/openforms/submissions/tests/test_admin.py Outdated Show resolved Hide resolved
@SilviaAmAm SilviaAmAm force-pushed the feature/3005-task-orchestration-refactor branch from 4447669 to b411b2f Compare November 30, 2023 15:18
Copy link
Member

@sergei-maertens sergei-maertens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm seeing some confusion/indecision about the on_post_submission_event chain.

Some of the code (with the finaliser) aims to let the chain run entirely and add the end reconcile the state & prepare for a potential next run, by figuring out if some things errored and whether they should set the needs_on_completion_retry flag. This looks like a (commendable) attempt at only having a single place that sets this flag.

However, that conflicts with the idea that some tasks in the chain are expected to abort entirely on (unexpected) exceptions, and the expectation is task-dependant:

  • appointment registration: task failure leads to chain abortion and feedback back to the end-user. A re-submit of the submission completion is expected.
  • pre-registration: failure is not fatal, a temporary reference is generated and execution continues. No retry flag appears to be set (shouldn't it be set though?)
  • report generation - should not fail, but if it does -> fatal
  • registration: failure sets retry flag and is fatal if we're inside the retry variant
  • payment status update: failure sets flag, failure is fatal when retrying otherwise not
  • finalise completion: no failures expected here

Looking at this, I see a couple of challenges still left to solve that impact how elegant our code can be:

  • is_retrying is derived from the submission instance state based on the value of the flag. So if task 1 sets that flag and failure is not fatal, for task 2 after it, it will seems as if we're in the retry flow while we could still be very much inside the initial run (!). This essentially requires us to postpone setting the flag until the very end, but that makes it harder to figure out why we need to set the flag. It also requires our tasks to not crash the entire chain because otherwise we risk consistency bugs.
  • tasks themselves know their implementation and requirements, but don't know in which context they are running (am I in a chain? am I running standalone?) and therefore cannot perform certain side effects reliably
  • ideally we have a single celery workflow (chain) to run for the various events that happen so that we, the developers, can follow the code. We also want locality of behaviour so that we don't have to jump around/context switch too much to figure out what is happening where
  • we want it to be clear where the needs_retry flag is being set

A couple thoughts I have right now that may improve this orchestration:

  • rather than deriving is_retrying from the submission instance, can we not make this an explicit task argument instead? If we only have a single chain to invoke with a particular event, we can map the event to the is_retrying boolean and pass it as an explicit task argument
  • the above bullet makes it possible to set the flag when the problem occurs (locality of code) without affecting downstream tasks in the chain (since they have been passed in upfront whether it's retry behaviour or not)
  • we could even translate the is_retrying flag into a is_failure_fatal argument for the task so that we either return or raise on failures. I think we would then also not need the custom exception types anymore.
  • the finaliser set_retry_flag could be changed into maybe_reset_retry_flag, so that combined with the is_retrying explicit argument, we know whether we should break the retry flow because the finalizer will only be called when the entire chain has succeeded

src/openforms/registrations/error_handler.py Outdated Show resolved Hide resolved
src/openforms/registrations/error_handler.py Outdated Show resolved Hide resolved
src/openforms/registrations/error_handler.py Outdated Show resolved Hide resolved
src/openforms/registrations/error_handler.py Outdated Show resolved Hide resolved
src/openforms/registrations/error_handler.py Outdated Show resolved Hide resolved
src/openforms/registrations/tasks.py Outdated Show resolved Hide resolved
src/openforms/submissions/tests/renderer/test_renderer.py Outdated Show resolved Hide resolved
@SilviaAmAm SilviaAmAm force-pushed the feature/3005-task-orchestration-refactor branch from acdf14d to a512635 Compare December 6, 2023 10:04
src/openforms/registrations/tasks.py Outdated Show resolved Hide resolved
src/openforms/registrations/tasks.py Show resolved Hide resolved
src/openforms/registrations/tasks.py Outdated Show resolved Hide resolved
src/openforms/registrations/tests/test_pre_registration.py Outdated Show resolved Hide resolved
Now when a submission is completed, when a cosign is done, when a payment is completed or a retry flow is triggered (either because the registration failed or a payment status update failed), the same task chain is called. This is to try and keep a maintainable overview of where tasks come from.
@SilviaAmAm SilviaAmAm force-pushed the feature/3005-task-orchestration-refactor branch from 6bca176 to 10cf117 Compare December 11, 2023 16:19
@sergei-maertens sergei-maertens merged commit e5caf66 into master Dec 11, 2023
26 checks passed
@sergei-maertens sergei-maertens deleted the feature/3005-task-orchestration-refactor branch December 11, 2023 16:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

As form designer I want to require payment before the form is submitted/registered
3 participants