Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: Reduce database calls when generating problem responses report #33940

Merged
merged 2 commits into from
Jan 16, 2024

Conversation

ahmed-zubair-1998
Copy link
Contributor

Description

During the process of generating report for problem responses, there are two places where N + 1 query problem exist. In both cases, StudentModule objects are fetched and looped over where student.username field for each object is accessed. This result in a separate database call to get the username for each student response.

This problem is fixed by creating a join to fetch the related table in the original query using select_related. In a test conducted on report having 5000 StudentModule objects, the number of queries for the request reduced from 8363 to 29. The total time taken for the task reduced from 23764 ms to 7394 ms.


This PR makes no change in functionality. Only performance improvement is made by reducing the number of database queries.

To generate the problem responses report /courses/{course_id}/instructor/api/get_problem_responses api is called. During this process, all student responses for a particular block are fetched in a DB call. Then the responses are looped over and one field from a related table is fetched in each iteration. This process happens in two places in the same function (1 and 2). This results in large number of database queries (N + 1 query problem) which could be reduced by making a join in original query. As the number of student responses increase, the performance will be impacted more due to large number of database queries.

To test the performance improvements, I carried out some tests on my laptop with tutor based dev setup. I created a test course having a poll and a survey xblock in a course. I then enrolled a user to go through the course, answering the poll and survey. I then enrolled 2499 new users and programatically copied the response from the first user for all these new user. In total there were 2500 users in the course and 5000 StudentModule rows. Then I generated a report for the course using original code and then with suggested changes. The results are added below.

Original Code

SQL Queries:
Original-SQL_queries

Time Taken from Task Progress Log:
Original-Task_progress

Suggested Code

SQL Queries:
Suggested-SQL_queries

Time Taken from Task Progress Log:
Suggested-Task_progress

Results

The number of queries for the request reduced from 8363 to 29 (99.7%). The total time taken for the task reduced from 23764 ms to 7394 ms (68.8%). Below screenshot of Grafana showing container stats for CPU and memory usage also highlight the performance improvement.
Docker-container-stats-Grafana

Testing instructions

  1. Author a new course from studio. Only add the default Survey and Poll xblock units
  2. Enroll a user into the course and manually go through the course, responding to the survey and the poll
  3. Enroll N number of test users into this course. create_random_users management command can be used for this purpose
  4. Open up a shell in LMS to copy over the StudentModule entries from the original user to the newly created test users for this course. Use the following code for this:
course_key = <Insert your course key here>
c_id = CourseOverview.objects.get(id=course_key).id
orig_user = StudentModule.objects.filter(course_id=c_id).earliest('id').student_id
base_submissions = StudentModule.objects.filter(student_id=orig_user)
new_enrollments = CourseEnrollment.objects.filter(course=c).exclude(user=orig_user)

new_submissions = []
for enrollment in new_enrollments:
    for submission in base_submissions:
        new_submission = deepcopy(submission)
        new_submission.pk = None
        new_submission.student = enrollment.user
        new_submissions.append(new_submission)
StudentModule.objects.bulk_create(new_submissions)
  1. Go to the course as admin > Instructor > Data download > create a report of problem responses
  2. Use Django debug toolbar and docker logs to see performance difference

Deadline

None

During the process of generatinng report for problem responses, there are two places where N + 1 query problem exist. In both cases, `StudentModule` objects are fetched and looped over where `student.username` field for each object is accessed. This result in a seperate database call to get the username for each student response.
This problem is fixed by creating a join to fetch the related table in the original query using `select_related`. In a test conducted on report having 5000 `StudentModule` objects, the number of queries for the request reduced from 8363 to 29. The total time taken for the task reduced from 23764 ms to 7394 ms.
@openedx-webhooks openedx-webhooks added the open-source-contribution PR author is not from Axim or 2U label Dec 16, 2023
@openedx-webhooks
Copy link

openedx-webhooks commented Dec 16, 2023

Thanks for the pull request, @ahmed-zubair-1998! Please note that it may take us up to several weeks or months to complete a review and merge your PR.

Feel free to add as much of the following information to the ticket as you can:

  • supporting documentation
  • Open edX discussion forum threads
  • timeline information ("this must be merged by XX date", and why that is)
  • partner information ("this is a course on edx.org")
  • any other information that can help Product understand the context for the PR

All technical communication about the code itself will be done via the GitHub pull request interface. As a reminder, our process documentation is here.

Please let us know once your PR is ready for our review and all tests are green.

@mphilbrick211
Copy link

Hi @ahmed-zubair-1998! Thanks so much for this contribution. Please let me know if you have any questions regarding completing a CLA form.

@mphilbrick211 mphilbrick211 added the needs test run Author's first PR to this repository, awaiting test authorization from Axim label Dec 18, 2023
@ahmed-zubair-1998
Copy link
Contributor Author

Hi @ahmed-zubair-1998! Thanks so much for this contribution. Please let me know if you have any questions regarding completing a CLA form.

Hi @mphilbrick211 , I have signed the form today and just got couple of confirmation emails that the signing is complete

@ahmed-zubair-1998
Copy link
Contributor Author

Adding a comment to re-run the CLA check as per instructions. It has been more than 1 business day since I signed the contract.

@e0d e0d removed the needs test run Author's first PR to this repository, awaiting test authorization from Axim label Dec 20, 2023
@mphilbrick211
Copy link

Adding a comment to re-run the CLA check as per instructions. It has been more than 1 business day since I signed the contract.

Hi @ahmed-zubair-1998! Your tests and CLA are all set. Please let me know if this is ready for review.

@ahmed-zubair-1998
Copy link
Contributor Author

Hi @ahmed-zubair-1998! Your tests and CLA are all set. Please let me know if this is ready for review.

Hi @mphilbrick211 , this is ready for review

@rgraber rgraber requested a review from a team January 11, 2024 15:59
@ormsbee ormsbee merged commit 73a446d into openedx:master Jan 16, 2024
64 checks passed
@openedx-webhooks
Copy link

@ahmed-zubair-1998 🎉 Your pull request was merged! Please take a moment to answer a two question survey so we can improve your experience in the future.

@edx-pipeline-bot
Copy link
Contributor

2U Release Notice: This PR has been deployed to the edX staging environment in preparation for a release to production.

@edx-pipeline-bot
Copy link
Contributor

2U Release Notice: This PR has been deployed to the edX production environment.

@ahmed-zubair-1998 ahmed-zubair-1998 deleted the problem-responses-report branch January 17, 2024 07:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
open-source-contribution PR author is not from Axim or 2U
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

6 participants