Hi team

Pain

As per our conversation with @BrookJeynes earlier, our current approach of using either HEAD or GET requests to check for broken links is becoming complicated and hard to manage. Since we recently updated the code to count all non-successful status code responses as broken links, the total broken link counts on our sites have blown out massively, with many links reported inaccurately because HEAD requests were used. Currently we work around this by adding these links to our Unscannable Links list, but with so many more URLs to deal with now, this solution does not scale well.
A possible solution I discussed is to keep using HEAD requests by default, remove the Unscannable Links list from the equation, and only fall back to a GET request when the HEAD request fails. This seems like a decent middle ground between running GET on every request (and therefore retrieving much more data per request) and maintaining a list of problematic domains. Brook also thinks this is a good way to tackle the issue, so we should discuss this option and check that it resolves the problem without having too much of a negative impact on our scans.
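To make the proposal concrete, here is a rough sketch of what a HEAD-first check with a GET fallback could look like. This is a minimal sketch only; checkLink and the exact status handling are illustrative, not the scanner's current code:

```ts
// Illustrative sketch of the proposed flow: try HEAD first, and only fall
// back to GET when the HEAD request errors or returns a non-successful status.
async function checkLink(
  url: string
): Promise<{ url: string; status: number; broken: boolean }> {
  let response: Response | undefined;

  try {
    // Cheap first pass: HEAD returns the status and headers with no body.
    response = await fetch(url, { method: "HEAD", redirect: "follow" });
  } catch {
    // Network-level failure on HEAD; give the GET fallback below a chance.
  }

  // Some servers reject or mishandle HEAD (e.g. 405 Method Not Allowed),
  // so retry with GET instead of consulting an Unscannable Links list.
  if (!response || !response.ok) {
    try {
      response = await fetch(url, { method: "GET", redirect: "follow" });
    } catch {
      return { url, status: 0, broken: true };
    }
  }

  // Only the status is inspected; the response body is never read.
  return { url, status: response.status, broken: !response.ok };
}
```

With this shape, the extra GET is only issued for links that HEAD already failed on, so the additional traffic scales with the number of failures rather than with the total number of links scanned.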
Tasks

- Investigate this option
- Implement the solution if it seems to solve our issue
Thanks!
After testing, we noted that since our GET requests do not read the response body, we're not actually fetching any more data than we would with a HEAD request. With that change, we're now getting more accurate results and cutting down on the number of false positives.
Figure: The number of broken links has dropped dramatically with more accurate results
Cc: @tombui99 @william-liebenberg @BrookJeynes
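For reference, the reason the GET fallback doesn't cost extra bandwidth is that the status can be read without consuming the body. A minimal sketch, assuming a WHATWG-style fetch is available; the getStatusOnly helper is hypothetical and not part of the scanner:

```ts
// Hypothetical helper: issue a GET but keep only the status code.
// Cancelling the body stream signals that the payload will never be read,
// so the runtime does not need to buffer or fully download it.
async function getStatusOnly(url: string): Promise<number> {
  const response = await fetch(url, { method: "GET", redirect: "follow" });
  await response.body?.cancel();
  return response.status;
}
```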