
[24.2] Fix various job concurrency limit issues #19824

Merged — 6 commits merged into galaxyproject:release_24.2 from fix_limit_bypass on Mar 24, 2025

Conversation

@mvdbeek (Member) commented on Mar 17, 2025

I've added an additional check in `job_wrapper.enqueue` that only updates jobs below the limit. This should be multi-process / multi-thread safe.
The queries are essentially the same queries that are done in `JobHandler.__check_user_jobs`, `JobHandler.__check_destination_jobs`, etc., but now it's all in a single update statement.

I suppose performance might be a concern; however, we still run through the (cached) checks before we decide to queue the job, so I think the cost is likely minimal. By integrating the limit check into the query, I think it should become very unlikely that jobs can bypass limits in a multi-handler scenario.

c088f9c fixes a bug where a resubmitted job would cause the cached `user_job_count_per_destination` / `user_job_count` values to start at 0.
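
To make the guarded-update idea above concrete, here is a minimal sketch (my own illustration, not the actual diff in this PR) of a limit check folded into a single UPDATE using SQLAlchemy 2.0 style. The `Job` class here is a stripped-down stand-in for Galaxy's real model, and the function name `enqueue_if_below_limit` and the `user_job_limit` argument are assumptions for illustration; in Galaxy the limit would come from the job configuration.

```python
from sqlalchemy import String, func, select, update
from sqlalchemy.orm import DeclarativeBase, Mapped, Session, aliased, mapped_column


class Base(DeclarativeBase):
    pass


class Job(Base):
    # Illustrative stand-in for Galaxy's Job model; only the columns
    # needed for the limit check are shown.
    __tablename__ = "job"
    id: Mapped[int] = mapped_column(primary_key=True)
    user_id: Mapped[int] = mapped_column(index=True)
    state: Mapped[str] = mapped_column(String(32), index=True)


def enqueue_if_below_limit(session: Session, job_id: int, user_id: int, user_job_limit: int) -> bool:
    """Move a job from 'new' to 'queued' only while the user is below the limit."""
    # Count the user's jobs that are already queued or running. An alias keeps
    # the counting subquery from being auto-correlated with the UPDATE target.
    other = aliased(Job)
    active_jobs = (
        select(func.count(other.id))
        .where(other.user_id == user_id, other.state.in_(["queued", "running"]))
        .scalar_subquery()
    )
    # The limit check and the state transition happen in one UPDATE statement,
    # so two handlers racing on the same user cannot both push it past the limit.
    result = session.execute(
        update(Job)
        .where(Job.id == job_id, Job.state == "new", active_jobs < user_job_limit)
        .values(state="queued")
    )
    session.commit()
    # False means the update was blocked by the limit or by a concurrent state change.
    return result.rowcount == 1
```

A per-destination check could work the same way, with an additional destination filter on the counting subquery.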

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.
  • Instructions for manual testing are as follows:
    1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

@mvdbeek requested a review from natefoo on March 17, 2025 at 16:54
@mvdbeek changed the title from "[24.2] Guard state update with limit queries" to "[24.2] Fix various job concurrency limit issues" on Mar 18, 2025
@mvdbeek marked this pull request as ready for review on March 18, 2025 at 13:25
The github-actions bot added this to the 25.0 milestone on Mar 18, 2025
@mvdbeek (Member, Author) commented on Mar 18, 2025

Whoa, all tests ran and are green! That's been a while.

@mvdbeek requested a review from a team on March 18, 2025 at 14:54
@mvdbeek (Member, Author) commented on Mar 24, 2025

This is on main now and job loop times seem unaffected, which is good. Let's merge this?

@natefoo merged commit ecc4b47 into galaxyproject:release_24.2 on Mar 24, 2025
57 checks passed
@nsoranzo deleted the fix_limit_bypass branch on March 24, 2025 at 23:23
@galaxyproject deleted a comment from the github-actions bot on Mar 25, 2025