muffet generates 403 on pixabay.com #189
Comments
Seems like it is by default.
Same results with `curl -v --header 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0' https://pixabay.com/`. I'm using:
has the same problem. Both sites, cyberciti and pixabay, are protected by Cloudflare. If you check the output, you can see that Cloudflare tries to verify the “browser”:
<div class="cf-section cf-wrapper">
<div class="cf-columns two">
<div class="cf-column">
<h2 data-translate="why_captcha_headline">Why do I have to complete a CAPTCHA?</h2>
<p data-translate="why_captcha_detail">Completing the CAPTCHA proves you are a human and gives you temporary access to the web property.</p>
</div>
This post about
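To tell a Cloudflare challenge apart from an ordinary 403, a link checker could look for the markup quoted above in the response body. The following is only a minimal sketch of that idea, not something muffet does: the function name `looks_like_cloudflare_challenge` is hypothetical, and the two markers are taken from the HTML excerpt in this comment, so they may not match every challenge page Cloudflare serves.

```python
# Markers copied from the challenge page excerpt quoted in this thread.
# Assumption: other Cloudflare challenge pages may use different markup.
CHALLENGE_MARKERS = ("cf-wrapper", "why_captcha_headline")


def looks_like_cloudflare_challenge(status, body):
    """Heuristic check: does a response resemble the CAPTCHA page shown above?

    status -- HTTP status code of the response
    body   -- response body as text
    """
    # The thread only reports challenges delivered with status 403.
    if status != 403:
        return False
    # Any one of the known markers in the body is treated as a match.
    return any(marker in body for marker in CHALLENGE_MARKERS)
```

A checker that used a test like this could report such links as "blocked by bot protection" instead of "broken", since the target page most likely exists.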
This pull request adds a new optional argument `--status-codes` that makes the accepted HTTP response status codes configurable, and solves #189 and #291. I use muffet to check all links on https://tinylog.org/. However, some websites (e.g. https://stackoverflow.com, https://www.baeldung.com, and https://mkyong.com/) respond to muffet with status code 403 instead of 200. Therefore, I would like to accept 403 as a valid HTTP response status code.
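The idea of a configurable set of accepted status codes can be illustrated roughly as follows. This is a sketch under assumptions, not the code from the pull request: the spec syntax (comma-separated codes, with `start..end` ranges that exclude `end`) and the names `parse_accepted_codes` and `is_accepted` are invented here for illustration, and the actual format accepted by `--status-codes` is not shown in this thread.

```python
def parse_accepted_codes(spec):
    """Parse a spec such as "200..300,403" into a set of status codes.

    Assumed syntax (hypothetical): comma-separated entries, where each
    entry is either a single code or a half-open range "start..end".
    """
    accepted = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if ".." in part:
            start, end = part.split("..", 1)
            # Half-open range: "200..300" covers 200 through 299.
            accepted.update(range(int(start), int(end)))
        else:
            accepted.add(int(part))
    return accepted


def is_accepted(status, accepted):
    """Return True if a response status counts as a valid link."""
    return status in accepted
```

With a spec like `"200..300,403"`, a 403 from a site that blocks automated clients would be treated as a working link, while a 404 would still be reported.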
My site links to https://pixabay.com/, and if I check it with muffet it leads to a 403. I tried to set a custom header but I still get a 403.
I don't want to scrape pixabay, but I would like to check all external links on my site. Currently my workaround is to exclude the site.
Same problem for pexels.com. It looks like it is not really a problem of muffet, as `wget` also produces a 403, but maybe muffet can do something about it.