Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Use built-in retry features of Celery when retrying Credentials grading tasks #34074

Merged
merged 1 commit into from
Jan 22, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
62 changes: 30 additions & 32 deletions openedx/core/djangoapps/credentials/tasks/v1/tasks.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,6 @@
from urllib.parse import urljoin

from celery import shared_task
from celery.exceptions import MaxRetriesExceededError
from celery.utils.log import get_task_logger
from celery_utils.logged_task import LoggedTask
from django.conf import settings
Expand Down Expand Up @@ -43,12 +42,16 @@
CertificateStatuses.downloadable,
]

# Maximum number of retries before giving up. For reference, 11 retries with exponential backoff yields a maximum
# waiting time of 2047 seconds (about 30 minutes). Setting this to None could yield unwanted behavior: infinite retries.
MAX_RETRIES = 11
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nonblocking] is there a particular reason why you took this out of being the constant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just figured we didn't need it anymore, with the number of retries being part of the task declaration. It was only used in the retry logic for this task too, and I didn't think it needed to be a constant anymore.



@shared_task(bind=True, ignore_result=True)
@shared_task(
bind=True,
ignore_result=True,
autoretry_for=(Exception,),
max_retries=10,
retry_backoff=30,
retry_backoff_max=600,
retry_jitter=True,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Total number that each individual task will be attempted stays the same (11 -- initial attempt + 10 retries).

This change updates the time period before an initial retry. Before it was 1s, now it will be "some number" between 1s - 30s (because we are also going to use Celery's jittering functionality to stagger retried tasks).

The backoff will double each time to a maximum of 600s (10 minutes). The original implementation went about 30m, but this felt really long. I'm hoping that if we space out retries a bit more, if Credentials doesn't get overloaded then we can process the tasks quicker.

While I believe retry_jitter is enabled by default, I added it as an argument for verbosity.

)
@set_code_owner_attribute
def send_grade_to_credentials(
self,
Expand All @@ -62,6 +65,10 @@ def send_grade_to_credentials(
"""
Celery task to notify the Credentials IDA of an "interesting" grade change via an API call.

If an exception occurs when trying to send data to the Credentials IDA, we will retry the task a maximum number of
11 times (initial attempt + 10 retries). We are relying on built-in functionality of Celery to add a randomized
jitter to the retries so that the tasks don't retry exactly at the same time.

Args:
username (string): The username of the learner we are currently processing
course_run_key (string): String identifier of the course run associated with the grade update
Expand All @@ -70,35 +77,26 @@ def send_grade_to_credentials(
percent_grade (float): Number representing the learner's grade in this course run
grade_last_updated (string): String describing the last time this grade was modified in the LMS
"""
logger.info(f"Running task send_grade_to_credentials for username {username} and course {course_run_key}")
data = {
'username': username,
'course_run': course_run_key,
'letter_grade': letter_grade,
'percent_grade': percent_grade,
'verified': verified,
'lms_last_updated_at': grade_last_updated
}
logger.info(f"Running task `send_grade_to_credentials` for username {username} with data: {data}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[question] Is this considered safe information for our logs? it's not exactly PII – it's just a username and grades, nothing personally identifiable – but I wanted to make sure we consider that a safe update.

to be clear, I'm not saying it's not. I just assume we have standards about what you can send the logs and what you can't, & I don't know them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of those "stuck between a rock and a hard place" logging decisions for me. I'm trying to value what would be valuable for debugging, balanced against what potential PII are we logging. Over the years, we've tried really hard NOT to log username (and log LMS User Id instead as the identifier in logs)... but Credentials mostly cares about username.

I thought the value of having some insight as to what we're sending to Credentials is more valuable than keeping the username out of our logs. AFAIK it's a strong preference, not a hard requirement to exclude info like username (which I do think is considered PII).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, that makes sense. I was more worried about sending the username with the grade, but that's not really PII by any definition, even if it is something the user would expect to have extra privacy applied to. :shipit:


countdown = 2 ** self.request.retries
course_key = CourseKey.from_string(course_run_key)
credentials_client = get_credentials_api_client(User.objects.get(username=settings.CREDENTIALS_SERVICE_USERNAME))
api_url = urljoin(f"{get_credentials_api_base_url(org=course_key.org)}/", "grades/")

try:
justinhynes marked this conversation as resolved.
Show resolved Hide resolved
credentials_client = get_credentials_api_client(
User.objects.get(username=settings.CREDENTIALS_SERVICE_USERNAME)
)
api_url = urljoin(f"{get_credentials_api_base_url(org=course_key.org)}/", "grades/")
response = credentials_client.post(
api_url,
data={
'username': username,
'course_run': course_run_key,
'letter_grade': letter_grade,
'percent_grade': percent_grade,
'verified': verified,
'lms_last_updated_at': grade_last_updated
}
)
response.raise_for_status()
logger.info(f"Sent grade for course {course_run_key} for user {username}")
except Exception: # lint-amnesty, pylint: disable=W0703
grade_str = f'(percent: {percent_grade} letter: {letter_grade})'
error_msg = f'Failed to send grade {grade_str} for course {course_run_key} for user {username}.'
logger.exception(error_msg)
exception = MaxRetriesExceededError(f"Failed to send grade to credentials. Reason: {error_msg}")
raise self.retry(exc=exception, countdown=countdown, max_retries=MAX_RETRIES) # pylint: disable=raise-missing-from
response = credentials_client.post(
api_url,
data=data,
)
response.raise_for_status()
logger.info(f"Sent grade for user {username} in course {course_run_key} to Credentials")


@shared_task(base=LoggedTask, ignore_result=True)
Expand Down
4 changes: 2 additions & 2 deletions openedx/core/djangoapps/credentials/tests/test_tasks.py
Original file line number Diff line number Diff line change
Expand Up @@ -82,14 +82,14 @@ def test_happy_path(self, mock_get_api_client):

def test_retry(self, mock_get_api_client):
"""
Test that we retry when an exception occurs.
Test that we retry the appropriate number of times when an exception occurs.
"""
mock_get_api_client.side_effect = boom

task = tasks.send_grade_to_credentials.delay('user', 'course-v1:org+course+run', True, 'A', 1.0, None)

pytest.raises(Exception, task.get)
assert mock_get_api_client.call_count == (tasks.MAX_RETRIES + 1)
assert mock_get_api_client.call_count == 11


@ddt.ddt
Expand Down
Loading