feat: Use built-in retry features of Celery when retrying Credentials grading tasks #34074

justinhynes · 2024-01-18T18:50:30Z

Description

This PR updates the send_grade_to_credentials task to use built-in features of Celery to handle task retries. We believe the current exponential backoff and retry mechanism is too aggressive and can cause issues overloading Credentials when many grade updates occur at once (because of a big notify_credentials run or a regrade kicked off by a change to a grading policy in a course).

justinhynes · 2024-01-18T19:05:20Z

openedx/core/djangoapps/credentials/tasks/v1/tasks.py

+    max_retries=10,
+    retry_backoff=30,
+    retry_backoff_max=600,
+    retry_jitter=True,


Total number that each individual task will be attempted stays the same (11 -- initial attempt + 10 retries).

This change updates the time period before an initial retry. Before it was 1s, now it will be "some number" between 1s - 30s (because we are also going to use Celery's jittering functionality to stagger retried tasks).

The backoff will double each time to a maximum of 600s (10 minutes). The original implementation went about 30m, but this felt really long. I'm hoping that if we space out retries a bit more, if Credentials doesn't get overloaded then we can process the tasks quicker.

While I believe retry_jitter is enabled by default, I added it as an argument for verbosity.

deborahgu · 2024-01-18T20:16:55Z

openedx/core/djangoapps/credentials/tasks/v1/tasks.py

@@ -43,12 +42,16 @@
    CertificateStatuses.downloadable,
 ]

-# Maximum number of retries before giving up. For reference, 11 retries with exponential backoff yields a maximum
-# waiting time of 2047 seconds (about 30 minutes). Setting this to None could yield unwanted behavior: infinite retries.
-MAX_RETRIES = 11


[nonblocking] is there a particular reason why you took this out of being the constant?

I just figured we didn't need it anymore, with the number of retries being part of the task declaration. It was only used in the retry logic for this task too, and I didn't think it needed to be a constant anymore.

deborahgu

I left a question and a non-blocking comment -- I don't see a reason to stop using the constant for max retries (just called in the decorator and the test in the same way), but I think it's fine either way.

deborahgu · 2024-01-18T20:24:11Z

openedx/core/djangoapps/credentials/tasks/v1/tasks.py

+        'verified': verified,
+        'lms_last_updated_at': grade_last_updated
+    }
+    logger.info(f"Running task `send_grade_to_credentials` for username {username} with data: {data}")


[question] Is this considered safe information for our logs? it's not exactly PII – it's just a username and grades, nothing personally identifiable – but I wanted to make sure we consider that a safe update.

to be clear, I'm not saying it's not. I just assume we have standards about what you can send the logs and what you can't, & I don't know them.

This is one of those "stuck between a rock and a hard place" logging decisions for me. I'm trying to value what would be valuable for debugging, balanced against what potential PII are we logging. Over the years, we've tried really hard NOT to log username (and log LMS User Id instead as the identifier in logs)... but Credentials mostly cares about username.

I thought the value of having some insight as to what we're sending to Credentials is more valuable than keeping the username out of our logs. AFAIK it's a strong preference, not a hard requirement to exclude info like username (which I do think is considered PII).

yeah, that makes sense. I was more worried about sending the username with the grade, but that's not really PII by any definition, even if it is something the user would expect to have extra privacy applied to.

openedx/core/djangoapps/credentials/tasks/v1/tasks.py

… grading tasks

edx-pipeline-bot · 2024-01-22T19:57:06Z

2U Release Notice: This PR has been deployed to the edX staging environment in preparation for a release to production.

edx-pipeline-bot · 2024-01-22T20:17:54Z

2U Release Notice: This PR has been deployed to the edX production environment.

edx-pipeline-bot · 2024-01-22T20:17:54Z

2U Release Notice: This PR has been deployed to the edX production environment.

justinhynes commented Jan 18, 2024

View reviewed changes

justinhynes marked this pull request as ready for review January 18, 2024 20:09

deborahgu reviewed Jan 18, 2024

View reviewed changes

deborahgu approved these changes Jan 18, 2024

View reviewed changes

justinhynes force-pushed the jhynes/APER-3121 branch from 7cb44c5 to f5c3e8e Compare January 22, 2024 18:09

feat: Use built-in retry features of Celery when retrying Credentials…

ee380b7

… grading tasks

justinhynes force-pushed the jhynes/APER-3121 branch from f5c3e8e to ee380b7 Compare January 22, 2024 18:09

justinhynes merged commit 18775cb into master Jan 22, 2024
63 checks passed

justinhynes deleted the jhynes/APER-3121 branch January 22, 2024 18:37

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Use built-in retry features of Celery when retrying Credentials grading tasks #34074

feat: Use built-in retry features of Celery when retrying Credentials grading tasks #34074

justinhynes commented Jan 18, 2024

justinhynes Jan 18, 2024

deborahgu Jan 18, 2024

justinhynes Jan 22, 2024

deborahgu left a comment

deborahgu Jan 18, 2024

justinhynes Jan 22, 2024

deborahgu Jan 22, 2024

edx-pipeline-bot commented Jan 22, 2024

edx-pipeline-bot commented Jan 22, 2024

edx-pipeline-bot commented Jan 22, 2024

feat: Use built-in retry features of Celery when retrying Credentials grading tasks #34074

feat: Use built-in retry features of Celery when retrying Credentials grading tasks #34074

Conversation

justinhynes commented Jan 18, 2024

Description

justinhynes Jan 18, 2024

Choose a reason for hiding this comment

deborahgu Jan 18, 2024

Choose a reason for hiding this comment

justinhynes Jan 22, 2024

Choose a reason for hiding this comment

deborahgu left a comment

Choose a reason for hiding this comment

deborahgu Jan 18, 2024

Choose a reason for hiding this comment

justinhynes Jan 22, 2024

Choose a reason for hiding this comment

deborahgu Jan 22, 2024

Choose a reason for hiding this comment

edx-pipeline-bot commented Jan 22, 2024

edx-pipeline-bot commented Jan 22, 2024

edx-pipeline-bot commented Jan 22, 2024