Improve link checking #742

Merged
merged 2 commits into from
Oct 30, 2023

@zacharykeeping (Member) commented Oct 26, 2023

This implements a few changes to the link scanning code to improve accuracy and reduce false positives:

  1. Switches to GET by default. We can confirm that this doesn't download much additional data, since we don't read the response bodies. To be safe, this also adds code to cancel and close each connection once we've recorded the response status.
  2. Updates request headers to more closely match those of a web browser, so results are more accurate.
  3. Updates the timeouts in the HTTP client to prevent connections from ending too quickly.
  4. Cuts down on the number of "0 - No Response" errors, as many of these appear to be connections simply closing rather than actual errors. The scanner now checks specifically for DNS errors (i.e. when a host no longer exists) and records these as Host Errors, while filtering out other errors.
  5. Ignores 429 Too Many Requests responses: we can't get accurate results for them, so we cannot definitively count those links as broken. We'll have to investigate ways around this going forward (timed retries, etc.).
  6. Updates Go to the latest version.

#674 #732

@zacharykeeping marked this pull request as ready for review October 26, 2023 22:28

@tombui99 (Contributor) left a comment

LGTM

@tombui99 merged commit 6aba695 into staging Oct 30, 2023
1 check passed
@tombui99 deleted the link-scan-improvements branch November 9, 2023 04:00