-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: Reduce database calls when generating problem responses report #33940
perf: Reduce database calls when generating problem responses report #33940
Conversation
During the process of generatinng report for problem responses, there are two places where N + 1 query problem exist. In both cases, `StudentModule` objects are fetched and looped over where `student.username` field for each object is accessed. This result in a seperate database call to get the username for each student response. This problem is fixed by creating a join to fetch the related table in the original query using `select_related`. In a test conducted on report having 5000 `StudentModule` objects, the number of queries for the request reduced from 8363 to 29. The total time taken for the task reduced from 23764 ms to 7394 ms.
Thanks for the pull request, @ahmed-zubair-1998! Please note that it may take us up to several weeks or months to complete a review and merge your PR. Feel free to add as much of the following information to the ticket as you can:
All technical communication about the code itself will be done via the GitHub pull request interface. As a reminder, our process documentation is here. Please let us know once your PR is ready for our review and all tests are green. |
Hi @ahmed-zubair-1998! Thanks so much for this contribution. Please let me know if you have any questions regarding completing a CLA form. |
Hi @mphilbrick211 , I have signed the form today and just got couple of confirmation emails that the signing is complete |
Adding a comment to re-run the CLA check as per instructions. It has been more than 1 business day since I signed the contract. |
Hi @ahmed-zubair-1998! Your tests and CLA are all set. Please let me know if this is ready for review. |
Hi @mphilbrick211 , this is ready for review |
@ahmed-zubair-1998 🎉 Your pull request was merged! Please take a moment to answer a two question survey so we can improve your experience in the future. |
2U Release Notice: This PR has been deployed to the edX staging environment in preparation for a release to production. |
2U Release Notice: This PR has been deployed to the edX production environment. |
Description
During the process of generating report for problem responses, there are two places where N + 1 query problem exist. In both cases,
StudentModule
objects are fetched and looped over wherestudent.username
field for each object is accessed. This result in a separate database call to get the username for each student response.This problem is fixed by creating a join to fetch the related table in the original query using
select_related
. In a test conducted on report having 5000StudentModule
objects, the number of queries for the request reduced from 8363 to 29. The total time taken for the task reduced from 23764 ms to 7394 ms.This PR makes no change in functionality. Only performance improvement is made by reducing the number of database queries.
To generate the problem responses report
/courses/{course_id}/instructor/api/get_problem_responses
api is called. During this process, all student responses for a particular block are fetched in a DB call. Then the responses are looped over and one field from a related table is fetched in each iteration. This process happens in two places in the same function (1 and 2). This results in large number of database queries (N + 1 query problem) which could be reduced by making a join in original query. As the number of student responses increase, the performance will be impacted more due to large number of database queries.To test the performance improvements, I carried out some tests on my laptop with tutor based dev setup. I created a test course having a poll and a survey xblock in a course. I then enrolled a user to go through the course, answering the poll and survey. I then enrolled 2499 new users and programatically copied the response from the first user for all these new user. In total there were 2500 users in the course and 5000
StudentModule
rows. Then I generated a report for the course using original code and then with suggested changes. The results are added below.Original Code
SQL Queries:
Time Taken from Task Progress Log:
Suggested Code
SQL Queries:
Time Taken from Task Progress Log:
Results
The number of queries for the request reduced from 8363 to 29 (99.7%). The total time taken for the task reduced from 23764 ms to 7394 ms (68.8%). Below screenshot of Grafana showing container stats for CPU and memory usage also highlight the performance improvement.
Testing instructions
Survey
andPoll
xblock unitscreate_random_users
management command can be used for this purposeStudentModule
entries from the original user to the newly created test users for this course. Use the following code for this:Deadline
None