[ci] fix linux runners running out of disk space (fixed #6635) #6636
Merged
jameslamb changed the title from "WIP: [ci] fix linux runners running out of disk space" to "[ci] fix linux runners running out of disk space (fixed #6635)" on Sep 2, 2024
jameslamb requested review from guolinke, shiyu1994, jmoralez, borchero and StrikerRUS as code owners on September 2, 2024 05:30
borchero reviewed on Sep 2, 2024
borchero approved these changes on Sep 2, 2024
Thanks very much for the review @borchero!
fixes #6635
It seems that CI jobs on the self-hosted linux runner pool in Azure DevOps are failing because those runners don't have enough free disk space.
I found that around 55% of the disk space on those runners is occupied by no-longer-used container images (#6635 (comment)). This proposes introducing one new CI job that deletes images that are more than 30 days old and not currently in use by any containers.
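For readers who want to try the same kind of cleanup by hand, here is a minimal sketch of the rule described above (remove images older than 30 days that no container uses). It assumes the runners use the Docker CLI and that `docker image prune` is an acceptable way to do it; the helper name `build_prune_command` and its parameters are hypothetical, and the command the PR actually runs may differ.

```python
from __future__ import annotations

import shlex


def build_prune_command(max_age_days: int = 30, docker_bin: str = "docker") -> list[str]:
    """Build a `docker image prune` command line (hypothetical helper).

    --all     : consider every image not used by a container, not only dangling ones
    --force   : skip the interactive confirmation prompt (needed in CI)
    --filter  : `until=<duration>` restricts pruning to images created
                before that long ago; Docker accepts durations in hours
    """
    if max_age_days < 0:
        raise ValueError("max_age_days must be non-negative")
    hours = max_age_days * 24
    return [
        docker_bin, "image", "prune",
        "--all",
        "--force",
        "--filter", f"until={hours}h",
    ]


if __name__ == "__main__":
    # Only prints the command; pass the list to subprocess.run(...) to execute it.
    print(shlex.join(build_prune_command()))
```

Printing the command instead of running it makes it easy to review before anything is deleted on a shared runner.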
Notes for Reviewers
Why not run this cleanup on every CI job?
It'd add around 30-45 seconds to every run of every Azure DevOps linux CI job.
I don't think that's necessary... running this cleanup once per CI run (on one randomly-assigned runner in the self-hosted pool) should hopefully be enough to prevent the disk space from filling up this way again.
I'm not sure how many total runners there are in the pool introduced in #6407 ... @shiyu1994 could you tell us? That would help with understanding how many runs might be required to clean up every node at least once.
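As a rough way to reason about that question, here is a small sketch that estimates how many CI runs it takes before every runner has been cleaned at least once. It assumes each run's cleanup job lands on a runner chosen uniformly and independently at random (the classic "coupon collector" setting); real Azure DevOps scheduling may not behave that way, and the pool size is an unknown input here, not a figure from this PR. The function names are hypothetical.

```python
from fractions import Fraction


def expected_runs_to_clean_all(num_runners: int) -> float:
    """Expected number of runs until every runner has been picked at least once.

    Under uniform random assignment this is n * (1 + 1/2 + ... + 1/n).
    """
    if num_runners < 1:
        raise ValueError("num_runners must be at least 1")
    harmonic = sum(Fraction(1, k) for k in range(1, num_runners + 1))
    return float(num_runners * harmonic)


def prob_runner_still_uncleaned(num_runners: int, runs: int) -> float:
    """Probability that one particular runner has not been picked after `runs` runs."""
    if num_runners < 1 or runs < 0:
        raise ValueError("num_runners must be >= 1 and runs must be >= 0")
    return (1 - 1 / num_runners) ** runs


if __name__ == "__main__":
    # Pool sizes below are illustrative only.
    for n in (2, 4, 8):
        print(n, round(expected_runs_to_clean_all(n), 2))
```

Under these assumptions the expected number of runs grows only slightly faster than the pool size, so a cleanup on one runner per CI run should reach the whole pool within a modest number of runs.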
How to test this
On any Azure DevOps run, check the warnings tab (example build link).
Over the next few days, we should see the number of such warnings decrease... and eventually see 0 warnings related to disk usage on the linux runners.
Also check the output of the Maintenance job (example build link).
The end of those runs should show only 2 container images on the host... the one used on the Linux jobs and the one used on the Linux_latest jobs.
Over the next few days, we should see log messages like "Total reclaimed space: 14.12GB". Those should eventually stop showing up, as all the runners are cleaned up.
So is this job temporary?
No, I'm proposing this as a permanent addition to LightGBM's CI.
That way, we'll automatically be protected against disk-space issues in situations like the following:
ubuntu-latest (via a new Ubuntu version or just security patches pushed to the official Ubuntu images)