Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Caching and UPDATED-sorting for the Github Data Sync Algorithm #94

Merged
merged 5 commits into from
Sep 18, 2024

Conversation

GODrums
Copy link
Collaborator

@GODrums GODrums commented Sep 18, 2024

Motivation

The current data fetching algorithm didn't collect all the required data and had massive performance issues. This PR aims to fix both of these issues. With this PR, the table should be on-par with the data from the original Python script.

Description

This PR includes multiple improvements:

  • Github PR data is now sorted after UPDATED instead of CREATED, allowing us to get all PR that were worked on
  • Instead of always querying for fetched entities in the DB, we implement caching for improved performance (see performance test below)
  • GET /leaderboard now filters the timeframe for the data before calculating the response

Performance Test

#PRs develop-branch PR (no caching) PR (caching)
8 90s - -
53 impossible 5 min 3 min

In my test runs the new fetching process now takes about 3 minutes (53 PRs), in comparison to the original 1 minute (8 PRs) and 5 minutes without caching (53 PRs).

Screenshots (if applicable)

Checklist

General

  • PR title is clear and descriptive
  • PR description explains the purpose and changes
  • Code follows project coding standards
  • Self-review of the code has been done
  • Changes have been tested locally

Server (if applicable)

  • Code is performant and follows best practices
  • No security vulnerabilities introduced
  • Proper error handling has been implemented
  • Added tests for new functionality
  • Changes have been tested in different environments (if applicable)

@GODrums GODrums added enhancement New feature or request application-server priority:critical Urgent tasks needing immediate resolution. labels Sep 18, 2024
@GODrums GODrums added this to the Gamification Leaderboard MVP milestone Sep 18, 2024
@GODrums GODrums self-assigned this Sep 18, 2024
@github-actions github-actions bot added bug Something isn't working size:L This PR changes 100-499 lines, ignoring generated files. labels Sep 18, 2024
Copy link
Collaborator

@FelixTJDietrich FelixTJDietrich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good, except idk what happens with dismissed reviews. Only thing that is missing is the fix in the story so the check runs through

private int comments;
private PullRequestReviewDTO[] changesRequested;
private PullRequestReviewDTO[] approvals;
private PullRequestReviewDTO[] comments;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens with the dismissed reviews?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting point! They were counted as comments for now. This should be fixed in: #97.

In the future we might want to do some sort of state replacement similar to the original Python script.

changesApproved.get(), comments);
changesRequestedSet.toArray(new PullRequestReviewDTO[changesRequestedSet.size()]),
approvedSet.toArray(new PullRequestReviewDTO[approvedSet.size()]),
commentSet.toArray(new PullRequestReviewDTO[commentSet.size()]));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think everything is fine for now, in the future we might want store the score for each activity also in the database and then we can just calculate rank and aggregated score more easily.

For example, a pull request review comes in via webhooks, we estimate the effort it took to do this review and store it in the database. I think as it is now the score is not stable since the PR might get more changes and reviews get dismissed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is very true.
We for sure have to improve a few elements for a correct and performant long-running execution.

@FelixTJDietrich FelixTJDietrich merged commit ca9e577 into develop Sep 18, 2024
5 checks passed
@FelixTJDietrich FelixTJDietrich deleted the fix/github-data-fetching-updated-sorting branch September 18, 2024 12:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
application-server bug Something isn't working client enhancement New feature or request priority:critical Urgent tasks needing immediate resolution. size:L This PR changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants