MODSOURMAN-1195 Save job execution progress in batches #908

Merged
okolawole-ebsco merged 11 commits into master from MODSOURMAN-1195
Jun 12, 2024

@okolawole-ebsco okolawole-ebsco commented May 28, 2024

Purpose

Save job execution progress in batches to improve performance and scalability.

Approach

Instead of each record attempting to update the job execution progress itself, a message is sent to an address on the Vert.x event bus. Once the message has been placed on the event bus, processing of the record continues.
On the consumer side, messages are grouped into batches by tenant ID and job execution ID. Each group of messages is then reduced into a single job execution progress, which is saved to the database, and the overall job execution status is updated.
In local testing, a data import job of 10,000 records updated job execution progress 10,000 times. Total execution time for all 10,000 database calls was 53 seconds (mean execution time per call was 5.3 milliseconds). After this optimization, there were only 352 database calls with a total execution time of 17 milliseconds.
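
The grouping and reduction step described above can be illustrated with a minimal plain-Java sketch. It is not the code from this PR: it leaves out the Vert.x event bus and the database entirely, and the names `ProgressMessage`, `BatchKey`, `ProgressDelta`, and `reduceBatch` are hypothetical, chosen only for illustration. It assumes each message carries a per-record count of succeeded and failed records.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ProgressBatchSketch {

    // Hypothetical message shape: one message per processed record.
    record ProgressMessage(String tenantId, String jobExecutionId, int succeeded, int failed) {}

    // Hypothetical grouping key: tenant ID plus job execution ID.
    record BatchKey(String tenantId, String jobExecutionId) {}

    // Hypothetical reduced result: the summed counts for one key.
    record ProgressDelta(int succeeded, int failed) {}

    // Groups a batch of messages by (tenant ID, job execution ID) and sums
    // the counts, so that each key needs only one database update.
    static Map<BatchKey, ProgressDelta> reduceBatch(List<ProgressMessage> batch) {
        Map<BatchKey, ProgressDelta> reduced = new LinkedHashMap<>();
        for (ProgressMessage m : batch) {
            BatchKey key = new BatchKey(m.tenantId(), m.jobExecutionId());
            reduced.merge(
                key,
                new ProgressDelta(m.succeeded(), m.failed()),
                (a, b) -> new ProgressDelta(a.succeeded() + b.succeeded(),
                                            a.failed() + b.failed()));
        }
        return reduced;
    }

    public static void main(String[] args) {
        List<ProgressMessage> batch = List.of(
            new ProgressMessage("tenantA", "job-1", 1, 0),
            new ProgressMessage("tenantA", "job-1", 0, 1),
            new ProgressMessage("tenantB", "job-2", 1, 0),
            new ProgressMessage("tenantA", "job-1", 1, 0));

        // Four messages collapse into two updates, one per key.
        reduceBatch(batch).forEach((key, delta) ->
            System.out.println(key + " -> " + delta));
    }
}
```

In this sketch, four per-record messages result in two reduced updates, which mirrors how 10,000 per-record updates can shrink to a few hundred database calls.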

Advantages

  • Less database overhead due to fewer commits and less contention for the same database row.
  • Less database connection pool thrashing, since a connection is no longer required to update job execution progress for each record.
  • Faster processing of records, since processing no longer waits for the job execution progress update before continuing.

Disadvantages

  • If the update of the reduced job execution progress fails for any reason, the records responsible for the update will already have been discarded. This is due to the asynchronous nature of this optimization.
  • Very small data import jobs with only a few records can see their total duration extended by less than 1000 milliseconds.

@@ -103,67 +90,5 @@ private Future<JobExecution> updateJobStatusToError(String jobExecutionId, Okapi
.withErrorStatus(StatusDto.ErrorStatus.FILE_PROCESSING_ERROR), params);
}

private Future<Boolean> updateJobExecutionIfAllRecordsProcessed(String jobExecutionId, JobExecutionProgress progress, OkapiConnectionParams params) {
These lines were moved to the JobExecutionProgressVerticle file.

@okolawole-ebsco okolawole-ebsco marked this pull request as ready for review May 29, 2024 23:21
@okolawole-ebsco okolawole-ebsco requested a review from a team May 29, 2024 23:21

@okolawole-ebsco okolawole-ebsco merged commit 92ee1c2 into master Jun 12, 2024
6 checks passed
@okolawole-ebsco okolawole-ebsco deleted the MODSOURMAN-1195 branch June 12, 2024 14:12