
[24.2] Fix various job concurrency limit issues #19824

Merged — 6 commits merged into galaxyproject:release_24.2 from fix_limit_bypass on Mar 24, 2025

Conversation

@mvdbeek (Member) commented on Mar 17, 2025

I've added an additional check in `job_wrapper.enqueue` that only updates jobs below the limit. This should be multi-process / multi-thread safe.
The queries are essentially the same queries that are done in `JobHandler.__check_user_jobs`, `JobHandler.__check_destination_jobs`, etc., but now it's all in a single update statement.

I suppose performance might be a concern; however, we still run through the (cached) checks before we decide to queue the job, so I think the cost is likely minimal. By integrating the limit check into the query, I think it should become very unlikely that jobs can bypass limits in a multi-handler scenario.

c088f9c fixes a bug where a resubmitted job would cause the cached `user_job_count_per_destination` / `user_job_count` values to start at 0.
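
To make the guarded-update idea above concrete, here is a minimal sketch (my own illustration, not the actual diff in this PR) of a limit check folded into a single UPDATE using SQLAlchemy 2.0 style. The `Job` class here is a stripped-down stand-in for Galaxy's real model, and the function name `enqueue_if_below_limit` and the `user_job_limit` argument are assumptions for illustration; in Galaxy the limit would come from the job configuration.

```python
from sqlalchemy import String, func, select, update
from sqlalchemy.orm import DeclarativeBase, Mapped, Session, aliased, mapped_column


class Base(DeclarativeBase):
    pass


class Job(Base):
    # Illustrative stand-in for Galaxy's Job model; only the columns
    # needed for the limit check are shown.
    __tablename__ = "job"
    id: Mapped[int] = mapped_column(primary_key=True)
    user_id: Mapped[int] = mapped_column(index=True)
    state: Mapped[str] = mapped_column(String(32), index=True)


def enqueue_if_below_limit(session: Session, job_id: int, user_id: int, user_job_limit: int) -> bool:
    """Move a job from 'new' to 'queued' only while the user is below the limit."""
    # Count the user's jobs that are already queued or running. An alias keeps
    # the counting subquery from being auto-correlated with the UPDATE target.
    other = aliased(Job)
    active_jobs = (
        select(func.count(other.id))
        .where(other.user_id == user_id, other.state.in_(["queued", "running"]))
        .scalar_subquery()
    )
    # The limit check and the state transition happen in one UPDATE statement,
    # so two handlers racing on the same user cannot both push it past the limit.
    result = session.execute(
        update(Job)
        .where(Job.id == job_id, Job.state == "new", active_jobs < user_job_limit)
        .values(state="queued")
    )
    session.commit()
    # False means the update was blocked by the limit or by a concurrent state change.
    return result.rowcount == 1
```

A per-destination check could work the same way, with an additional destination filter on the counting subquery.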

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.
  • Instructions for manual testing are as follows:
    1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

@mvdbeek requested a review from natefoo on March 17, 2025 at 16:54
@mvdbeek changed the title from "[24.2] Guard state update with limit queries" to "[24.2] Fix various job concurrency limit issues" on Mar 18, 2025
@mvdbeek marked this pull request as ready for review on March 18, 2025 at 13:25
The github-actions bot added this to the 25.0 milestone on Mar 18, 2025
@mvdbeek (Member, Author) commented on Mar 18, 2025

Whoa, all tests ran and are green! That's been a while.

@mvdbeek requested a review from a team on March 18, 2025 at 14:54
@mvdbeek (Member, Author) commented on Mar 24, 2025

This is on main now and job loop times seem unaffected, which is good. Let's merge this?

@natefoo merged commit ecc4b47 into galaxyproject:release_24.2 on Mar 24, 2025
57 checks passed
@nsoranzo deleted the fix_limit_bypass branch on March 24, 2025 at 23:23
@galaxyproject deleted a comment from the github-actions bot on Mar 25, 2025