Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ci] fix linkchecker job #6757

Merged
merged 5 commits into from
Dec 16, 2024
Merged

[ci] fix linkchecker job #6757

merged 5 commits into from
Dec 16, 2024

Conversation

jameslamb
Copy link
Collaborator

@jameslamb jameslamb commented Dec 15, 2024

Fixes #6756

The relative path we pass to its config file needs to be updated (I think I broke this in #6497).

Notes for Reviewers

How I tested this

Manually triggered the link check workflow from this branch:

Copy link
Collaborator

@StrikerRUS StrikerRUS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It works now, thanks a lot for quick fix!

As for the

Warning    [http-rate-limited] Rate limited (Retry-After: None)
Result     Valid: 429 Too Many Requests

errors, could you please try decreasing the maxrequestspersecond for one host to 0.1?

maxrequestspersecond=1

Refer to

Values less than 1 and at least 0.001 can be used.

It's now possible to use values less than 1.

@jameslamb
Copy link
Collaborator Author

Just pushed 5265786 with that change. I also put a floor on linkchecker pointing at the latest version... to be sure we always get a version that respects values less than 1 and to make installation a little faster.

Thanks for the references and the suggestion! That saved me some effort figuring out how to get around those errors 😊

new test build: https://github.com/microsoft/LightGBM/actions/runs/12335867947

@StrikerRUS
Copy link
Collaborator

StrikerRUS commented Dec 15, 2024

new test build

In case of successful run, we can try to increase threads up to default 10, I guess.

I hope that linkchecker ensures that requests to one host are done from a single thread.
linkchecker/linkchecker#193

@jameslamb
Copy link
Collaborator Author

Oh interesting, yeah hopefully that tool divides the hosts across threads instead of just dividing links. Pushed 4061211 setting threads=10, let's see how it goes: https://github.com/microsoft/LightGBM/actions/runs/12336243590

@jameslamb
Copy link
Collaborator Author

It's still taking a very long time with nthreads=10... maybe even slower than with nthreads=1: https://github.com/microsoft/LightGBM/actions/runs/12336243590/job/34428554660

I guess because the GitHub Actions Linux runners only have 4 vCPUS: https://docs.github.com/en/actions/using-github-hosted-runners/using-github-hosted-runners/about-github-hosted-runners#standard-github-hosted-runners-for-public-repositories

nthreads=10 might be oversubscribing them (just like we see with LightGBM and recommend against:

#' @param num_threads Number of parallel threads to use. For best speed, this should be set to the number of
#' physical cores in the CPU - in a typical x86-64 machine, this corresponds to half the
#' number of maximum threads.

Trying a build limiting to 4 threads: https://github.com/microsoft/LightGBM/actions/runs/12336463177

@StrikerRUS
Copy link
Collaborator

Heh, still 48 minutes with 4 threads...

Seems that maxrequestspersecond has higher priority over threads and they don't play well with each other 😞

Anyway, we have successful cron job with less than 1 hour execution time that is run once a day. I believe it's totally fine and we shouldn't spend more time for the speedup!

@jameslamb jameslamb changed the title WIP: [ci] fix linkchecker job [ci] fix linkchecker job Dec 16, 2024
@jameslamb jameslamb marked this pull request as ready for review December 16, 2024 01:19
@jameslamb
Copy link
Collaborator Author

we have successful cron job with less than 1 hour execution time that is run once a day. I believe it's totally fine and we shouldn't spend more time for the speedup!

Yep I agree! I'll merge this.

@jameslamb jameslamb merged commit 8eb3c3c into master Dec 16, 2024
49 checks passed
@jameslamb jameslamb deleted the ci/fix-linkchecker branch December 16, 2024 01:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ci] Link check job is broken
2 participants