Caching and UPDATED-sorting for the Github Data Sync Algorithm #94

GODrums · 2024-09-18T00:27:36Z

Motivation

The current data fetching algorithm does not collect all the required data and has massive performance issues. This PR aims to fix both. With this PR, the table should be on par with the data from the original Python script.

Description

This PR includes multiple improvements:

GitHub PR data is now sorted by UPDATED instead of CREATED, allowing us to get all PRs that were worked on
Instead of always querying the DB for fetched entities, we cache them for improved performance (see performance test below)
GET /leaderboard now filters the data by timeframe before calculating the response
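
To illustrate why sorting by UPDATED matters, here is a minimal sketch rather than the code from this PR. It assumes an in-memory list standing in for the GitHub API response; `UpdatedSortSketch`, `PullRequest`, and `fetchUpdatedSince` are hypothetical names. Walking the PRs from most recently updated to least recently updated lets the loop stop at the cutoff while still picking up PRs that were created long ago but worked on recently.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: names and types are illustrative, not taken from the PR.
final class UpdatedSortSketch {

    // Minimal stand-in for a fetched pull request.
    record PullRequest(int number, Instant createdAt, Instant updatedAt) {}

    // Returns every PR updated at or after the cutoff.
    // The list is ordered by updatedAt descending, so the loop can stop
    // at the first PR that is older than the cutoff.
    static List<PullRequest> fetchUpdatedSince(List<PullRequest> remote, Instant cutoff) {
        List<PullRequest> sorted = new ArrayList<>(remote);
        sorted.sort(Comparator.comparing(PullRequest::updatedAt).reversed());

        List<PullRequest> result = new ArrayList<>();
        for (PullRequest pr : sorted) {
            if (pr.updatedAt().isBefore(cutoff)) {
                break; // everything after this point was updated even earlier
            }
            result.add(pr);
        }
        return result;
    }
}
```

With CREATED ordering, the same early exit would skip a PR that was opened before the cutoff but reviewed after it, which matches the missing data described in the motivation.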

Performance Test

| Number of PRs | `develop` branch | PR (no caching) | PR (caching) |
|---|---|---|---|
| 8 | 90 s | - | - |
| 53 | impossible | 5 min | 3 min |

In my test runs, the new fetching process takes about 3 minutes for 53 PRs, compared to the original 1 minute for 8 PRs and 5 minutes for 53 PRs without caching.
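
The caching idea from the description can be pictured with a small sketch. This is an assumption about the general approach, not the PR's actual implementation: `EntityCacheSketch` is a hypothetical class, and the `loader` function stands in for whatever repository lookup hits the database.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of an in-memory cache in front of a database lookup.
final class EntityCacheSketch<K, V> {

    private final Map<K, V> cache = new HashMap<>();
    private final Function<K, V> loader; // stands in for a DB query
    private int loaderCalls = 0;

    EntityCacheSketch(Function<K, V> loader) {
        this.loader = loader;
    }

    // Returns the cached entity if present, otherwise loads it once and remembers it.
    V get(K key) {
        V cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        loaderCalls++;
        V loaded = loader.apply(key);
        if (loaded != null) {
            cache.put(key, loaded); // missing entities are not cached
        }
        return loaded;
    }

    // Number of times the underlying lookup was actually executed.
    int loaderCalls() {
        return loaderCalls;
    }
}
```

In a sync run, the same user or repository is referenced by many PRs and reviews, so each repeated reference is served from the map instead of triggering another query.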

Screenshots (if applicable)

Checklist

General

PR title is clear and descriptive
PR description explains the purpose and changes
Code follows project coding standards
Self-review of the code has been done
Changes have been tested locally

Server (if applicable)

Code is performant and follows best practices
No security vulnerabilities introduced
Proper error handling has been implemented
Added tests for new functionality
Changes have been tested in different environments (if applicable)

FelixTJDietrich

Code looks good, except I don't know what happens with dismissed reviews. The only thing missing is the fix in the story so the check passes.

FelixTJDietrich · 2024-09-18T12:18:36Z

...application-server/src/main/java/de/tum/in/www1/hephaestus/leaderboard/LeaderboardEntry.java

-        private int comments;
+        private PullRequestReviewDTO[] changesRequested;
+        private PullRequestReviewDTO[] approvals;
+        private PullRequestReviewDTO[] comments;


What happens with the dismissed reviews?

Interesting point! They are counted as comments for now. This should be fixed in #97.

In the future we might want to do some sort of state replacement similar to the original Python script.
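
As a rough illustration of the behavior described here, the sketch below groups reviews into the three buckets from the `LeaderboardEntry` diff, with dismissed reviews falling into the comments bucket. The enum values mirror GitHub's review states, but `ReviewBucketsSketch`, `Review`, `Bucket`, and `group` are hypothetical names and the real service may classify reviews differently.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: classification of reviews into leaderboard buckets.
final class ReviewBucketsSketch {

    enum ReviewState { APPROVED, CHANGES_REQUESTED, COMMENTED, DISMISSED }

    enum Bucket { APPROVALS, CHANGES_REQUESTED, COMMENTS }

    record Review(long id, ReviewState state) {}

    // Assumption: anything that is neither an approval nor a change request,
    // including DISMISSED, is treated as a comment.
    static Bucket bucketFor(ReviewState state) {
        switch (state) {
            case APPROVED:
                return Bucket.APPROVALS;
            case CHANGES_REQUESTED:
                return Bucket.CHANGES_REQUESTED;
            default:
                return Bucket.COMMENTS;
        }
    }

    // Groups reviews by bucket; every bucket is present, even when empty.
    static Map<Bucket, List<Review>> group(List<Review> reviews) {
        Map<Bucket, List<Review>> buckets = new EnumMap<>(Bucket.class);
        for (Bucket bucket : Bucket.values()) {
            buckets.put(bucket, new ArrayList<>());
        }
        for (Review review : reviews) {
            buckets.get(bucketFor(review.state())).add(review);
        }
        return buckets;
    }
}
```

A state replacement step like the one mentioned above would change `bucketFor` so that a dismissed review maps back to the state it had before dismissal, which requires keeping that earlier state around.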

FelixTJDietrich · 2024-09-18T12:25:22Z

...plication-server/src/main/java/de/tum/in/www1/hephaestus/leaderboard/LeaderboardService.java

-                    changesApproved.get(), comments);
+                    changesRequestedSet.toArray(new PullRequestReviewDTO[changesRequestedSet.size()]),
+                    approvedSet.toArray(new PullRequestReviewDTO[approvedSet.size()]),
+                    commentSet.toArray(new PullRequestReviewDTO[commentSet.size()]));


I think everything is fine for now. In the future we might want to also store the score for each activity in the database, so that we can calculate the rank and aggregated score more easily.

For example, a pull request review comes in via webhooks, we estimate the effort it took to do this review, and store it in the database. As it is now, I think the score is not stable, since the PR might get more changes and reviews might get dismissed.
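
One way to picture this suggestion is sketched below. It assumes the score is estimated once when the review arrives and is then stored with the activity; how the effort is estimated is left out. `StoredScoreSketch`, `ReviewActivity`, `RankedEntry`, and `rank` are hypothetical names, and the list stands in for rows read from the database.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: ranking from scores that were stored per activity.
final class StoredScoreSketch {

    // One stored activity; the score is fixed at ingestion time.
    record ReviewActivity(String reviewer, int pullRequestNumber, int score) {}

    record RankedEntry(int rank, String reviewer, int totalScore) {}

    // Sums the stored scores per reviewer and ranks by total, highest first.
    // Ties keep first-seen order and still receive consecutive ranks.
    static List<RankedEntry> rank(List<ReviewActivity> activities) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (ReviewActivity activity : activities) {
            totals.merge(activity.reviewer(), activity.score(), Integer::sum);
        }

        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(totals.entrySet());
        sorted.sort(Map.Entry.<String, Integer>comparingByValue().reversed());

        List<RankedEntry> ranked = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            Map.Entry<String, Integer> entry = sorted.get(i);
            ranked.add(new RankedEntry(i + 1, entry.getKey(), entry.getValue()));
        }
        return ranked;
    }
}
```

Because the stored score no longer depends on the current state of the PR, later changes or dismissed reviews would not shift the leaderboard retroactively.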

Yes, this is very true.
We definitely have to improve a few elements for correct and performant long-running execution.

Update Github storing algorithm

c5b1e00

GODrums added the enhancement, application-server, and priority:critical labels Sep 18, 2024

GODrums added this to the Gamification Leaderboard MVP milestone Sep 18, 2024

GODrums requested review from FelixTJDietrich and iam-flo September 18, 2024 00:27

GODrums self-assigned this Sep 18, 2024

github-actions bot added the bug and size:L labels Sep 18, 2024

GODrums added 3 commits September 18, 2024 02:47

Improve loop performance

6da0d83

Add PR Number

2707cc7

Respond with reviews instead of counter

f8d3792

github-actions bot added the client label Sep 18, 2024

FelixTJDietrich previously approved these changes Sep 18, 2024

fix leaderboard story

ee4d935

FelixTJDietrich dismissed their stale review via ee4d935 September 18, 2024 12:32

FelixTJDietrich merged commit ca9e577 into develop Sep 18, 2024
5 checks passed

FelixTJDietrich deleted the fix/github-data-fetching-updated-sorting branch September 18, 2024 12:34
