feat: job scheduler - run jobs continuously [DHIS2-16004] #15466
Conversation
Kudos, SonarCloud Quality Gate passed!
Codecov Report
@@ Coverage Diff @@
## master #15466 +/- ##
============================================
+ Coverage 66.10% 66.12% +0.01%
- Complexity 31115 31139 +24
============================================
Files 3484 3486 +2
Lines 129448 129547 +99
Branches 15095 15107 +12
============================================
+ Hits 85577 85659 +82
- Misses 36792 36807 +15
- Partials 7079 7081 +2
Flags with carried forward coverage won't be shown.
... and 21 files with indirect coverage changes. Continue to review the full report in Codecov by Sentry.
Nice. A test to cover this new behaviour would be good to have in the e2e suite. Checking that the allowed (continuous) job type finishes within 20 seconds?? Otherwise we'll never know (from a test) if this functionality breaks after it passes manual testing.
Yes, if multiple small jobs of the continuous type all finish within the 20 seconds, that would confirm it. I will have a look if I can create an e2e test :)
I have not forgotten about adding a test, but there are currently many other topics I am working on that block further development and that I need to prioritize, so I merged this now, as the added feature is also a precondition for that work.
Summary
Adds a feature to the scheduler that allows jobs to execute faster than one per type per loop cycle.
For fast jobs, such as small imports that take only 1-2 seconds, it is important to run more than one per 20-second loop cycle.
Otherwise jobs might be added faster, or in higher quantity, than the scheduler can execute them, since it is otherwise limited to one per type per 20 seconds. That is 3 per minute, 180 per hour, 4320 per day, which for imports might neither be enough nor fast enough.
Usage
For future job types to use this, their enum constant needs to be added to the new https://github.com/dhis2/dhis2-core/pull/15466/files#diff-3ad347fffaf9c8a6f46a781e97b1a5d9770a5fecba98121543414e69428baa48R232 method, roughly as sketched below.
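A minimal sketch of what that can look like, assuming the linked method lives on the JobType enum and is called isUsingContinuousExecution (both the name and location are assumptions here; the actual ones are in the linked diff):

```java
// Hypothetical sketch only; see the linked diff for the real enum and method.
public enum JobType {
  METADATA_IMPORT,
  TRACKER_IMPORT_JOB,
  ANALYTICS_TABLE;

  /**
   * Job types returning true may execute more than once per
   * 20-second scheduler loop cycle.
   */
  public boolean isUsingContinuousExecution() {
    // add the enum constant of a new "continuous" job type to this check
    return this == METADATA_IMPORT || this == TRACKER_IMPORT_JOB;
  }
}
```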
Implementation Notes
To allow importing faster, a job type can be hard-coded as being continuous. For job types of that nature, the loop pushes all jobs (by ID) that are potentially ready onto a per-type queue. If no queue exists yet, a new one is created for the type and a worker is spawned that works the queue until all jobs in it are processed. The main loop keeps potentially adding to the queue in every loop cycle in the background, while the worker polls and executes the jobs as fast as possible, still ensuring that only one job per type runs at a time.
The queue only remembers each job's ID and reloads the job configuration when it is time to execute it, because the job configuration may have changed in the database since the job was added to the queue.
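A rough sketch of this loop-side bookkeeping, reusing the JobType enum from the sketch above (the class name, map layout and method names are illustrative assumptions, not the actual dhis2-core code):

```java
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;

// Illustrative sketch: one queue of job IDs per continuous job type.
class ContinuousJobQueues {

  private final Map<JobType, Queue<String>> queuesByType = new ConcurrentHashMap<>();
  private final ExecutorService workers;

  ContinuousJobQueues(ExecutorService workers) {
    this.workers = workers;
  }

  /** Called once per 20-second loop cycle for each continuous job type. */
  void enqueueReadyJobs(JobType type, List<String> readyJobIds) {
    Queue<String> queue = queuesByType.computeIfAbsent(type, t -> new ConcurrentLinkedQueue<>());
    // an empty queue is taken to mean no worker is currently working this type
    boolean spawnWorker = queue.isEmpty();
    // only the job IDs are remembered; configurations are reloaded at execution time
    queue.addAll(readyJobIds);
    if (spawnWorker && !readyJobIds.isEmpty()) {
      workers.submit(() -> workQueue(type, queue));
    }
  }

  private void workQueue(JobType type, Queue<String> queue) {
    // worker side, see the sketch further below
  }
}
```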
A worker working the queue always assumes the queue was created for it, so when all entries in the queue have been processed, the worker clears the queue before it ends, so that another worker can be spawned in the future when a new job of that job type is created.
There is one unlikely race: the loop cycle checks whether the queue is empty to decide if a worker should be spawned, and the queue may be empty only because the current worker is just processing the last entry. If a new entry has been created since, a second worker is spawned for the same queue. Two workers will then poll from the same queue to empty it, and both will try to remove it when done. This is not an issue: they will poll different jobs, and the database still limits execution to one job of a type at a time. That means a job might try to run while the other worker's job has just started; it then fails to start and is re-added to the queue in the next loop cycle as a job that is ready to run.
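The matching worker side of the same sketch, continuing the ContinuousJobQueues class above (JobConfiguration stands in for the real job configuration class; loadConfiguration, tryStart and execute are hypothetical helpers, not the actual service API):

```java
// Worker side of the sketch above; runs until the per-type queue is drained.
private void workQueue(JobType type, Queue<String> queue) {
  String jobId;
  while ((jobId = queue.poll()) != null) {
    // reload the configuration; it may have changed since the ID was queued
    JobConfiguration config = loadConfiguration(jobId);
    if (config == null) {
      continue; // the job was deleted in the meantime
    }
    // the database limits execution to one running job per type; if another
    // worker's job of this type just started, this start attempt fails and the
    // loop re-adds the job in a later cycle once it is ready again
    if (tryStart(config)) {
      execute(config);
    }
  }
  // queue drained: remove it so a future loop cycle may spawn a new worker
  queuesByType.remove(type, queue);
}

// hypothetical stand-ins for the real scheduling and database services
private JobConfiguration loadConfiguration(String jobId) { /* DB lookup by ID */ return null; }
private boolean tryStart(JobConfiguration config) { /* atomic "start" check in the DB */ return false; }
private void execute(JobConfiguration config) { /* run the job to completion */ }
```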
The goal of this implementation is that, while there is in-memory state, the design is robust against getting stuck or out of sync.
In the worst case a job might be delayed and run one loop cycle (20 seconds) later than it could have, but it will eventually be picked up and run if it is ready.
Manual Testing
Check the lastExecuted time for the created jobs (jobType=METADATA_IMPORT); they should all be started quickly one after another and not be about 20 seconds apart, e.g. /api/jobConfigurations/gist?filter=jobType:eq:METADATA_IMPORT