Add support for warn versus error or ignore #292

Open
spkane opened this issue Mar 9, 2023 · 8 comments
Labels
enhancement New feature or request

Comments


spkane commented Mar 9, 2023

It would be nice to be able to categorize HTTP status codes and URL patterns that should be reported but should not trigger an error.

That way you can report on less critical errors that you still might want to fix, or at least be aware of.

At the moment, a URL can only be ignored or cause a failure, so you may be forced to ignore a pattern, which also blinds you to any real failures that crop up later for that URL.

Author

spkane commented Mar 9, 2023

Ideally, this would build on the feature in #291, but it could be done initially with just the pattern arguments.

Author

spkane commented Mar 9, 2023

To build on this a bit, it would be really flexible if I could set this globally or per pattern. For www.unix.com, for example, that would let me ignore, or only warn on, the 403 it always responds with, while still erroring on a 404 for that particular site. LinkedIn reports 999 on public profiles, for whatever reason, so that is another useful example.

--exclude {name=www.unix.com/man-page/linux, ignore=403, warn=308}
--exclude {name=linkedin.com/in/, ignore=999 }

*** INFO: [2023-03-09 17:48:22] Start checking: "https://example.com"
https://example.com/journal/unix-programming/
	403 (following redirect https://www.unix.com/man-page/linux/5/init/)	http://www.unix.com/man-page/linux/5/init/
*** ERROR: [2023-03-09 17:48:47] Something went wrong - see the errors above...
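
To make the requested semantics concrete, here is a minimal bash sketch of per-pattern classification. It is not muffet code and not an existing muffet option: the `RULES` table, its column layout, and the `classify` function are all hypothetical, and it uses plain substring matching rather than the regular expressions that `--exclude` accepts. It assumes it is only called for responses the checker would otherwise report as failures.

```shell
#!/bin/bash
# Hypothetical rule table: "URL-substring ignore-codes warn-codes".
# Codes are comma-separated; "-" means none. The first matching rule wins.
RULES=(
    "www.unix.com/man-page/linux 403 308"
    "linkedin.com/in/ 999 -"
)

# classify URL STATUS -> prints "ignore", "warn", or "error".
classify() {
    local url=$1 status=$2 rule pattern ignore warn
    for rule in "${RULES[@]}"; do
        read -r pattern ignore warn <<< "$rule"
        # Skip rules whose pattern does not occur in the URL.
        [[ $url == *"$pattern"* ]] || continue
        [[ ",$ignore," == *",$status,"* ]] && { echo ignore; return; }
        [[ ",$warn," == *",$status,"* ]] && { echo warn; return; }
        break   # a rule matched the URL but not the status
    done
    # Anything not explicitly downgraded stays an error.
    echo error
}

classify "http://www.unix.com/man-page/linux/5/init/" 403   # ignore
classify "http://www.unix.com/man-page/linux/5/init/" 404   # error
```

Defaulting to `error` keeps the behavior described above: a 404 on www.unix.com would still fail the run even though its 403 is ignored.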

Owner

raviqqe commented Mar 16, 2023

What kinds of status codes do you want to mark as warnings? For example, do you want to reduce 308 redirects for SEO?

raviqqe moved this to Todo in Muffet Mar 16, 2023
raviqqe added this to Muffet Mar 16, 2023

Author

spkane commented Mar 17, 2023

As an example, one might want to know about a redirect so that it can eventually be fixed, without it actually throwing an error and therefore breaking a deployment of a website change.
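
One way to picture this "report but do not break the deployment" behavior is a summary step that counts severities and fails only on real errors. The sketch below is an illustration under assumptions, not how muffet works today: the `summarize` function and the severity names `error`, `warn`, and `ignore` are hypothetical.

```shell
#!/bin/bash
# summarize SEVERITY... -> prints the counts and returns non-zero
# only when at least one "error" is present. Warnings never fail the run.
summarize() {
    local sev errors=0 warnings=0
    for sev in "$@"; do
        case $sev in
            error) errors=$((errors + 1)) ;;
            warn)  warnings=$((warnings + 1)) ;;
        esac
    done
    echo "errors=$errors warnings=$warnings"
    (( errors == 0 ))
}

# A run with only warnings still succeeds, so the deployment continues.
summarize warn ignore warn && echo "deploy can proceed"
```

A redirect classified as `warn` would then show up in the report without changing the exit status that CI looks at.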

Owner

raviqqe commented Mar 19, 2023

What is the size of your website? For example, how many pages and links does it have roughly?

Author

spkane commented Mar 31, 2023

It is not huge, but we do have a lot of long technical blog articles that tend to link out to other sites, and those links and the sites' general behavior are more likely to change or become invalid over time.

Author

spkane commented Mar 31, 2023

I could see value in being able to pass this information in via a config file when there are a lot of rules, in addition to simply supplying a few options on the command line when the rules are very simple.
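
As a rough illustration of the config-file idea, the sketch below loads rules from a plain-text file into a bash array. The file format (one `PATTERN IGNORE-CODES WARN-CODES` rule per line, `#` for comments) and the `load_rules` function are assumptions made for this example; muffet does not define such a file. Because `#` starts a comment, patterns containing a fragment would need a different comment syntax.

```shell
#!/bin/bash
# load_rules FILE -> fills the global RULES array, one entry per rule line.
load_rules() {
    local file=$1 line
    RULES=()
    # The "|| [[ -n $line ]]" part keeps a last line without a trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        line=${line%%#*}                              # strip comments
        [[ -z ${line//[[:space:]]/} ]] && continue    # skip blank lines
        RULES+=("$line")
    done < "$file"
}

# Example: write a small hypothetical rules file and load it.
rules_file=$(mktemp)
printf '%s\n' \
    '# site-specific rules' \
    'www.unix.com/man-page/linux 403 308' \
    '' \
    'linkedin.com/in/ 999 -' > "$rules_file"
load_rules "$rules_file"
echo "${#RULES[@]} rules loaded"
rm -f "$rules_file"
```

A few simple rules could still be given on the command line, with the file used only when the list grows long.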


Sieboldianus commented Jun 2, 2023

I want to bump this issue/idea. I have a Hugo site with about 4500 links that I check via GitLab CI. Basically, every time I add a new blog post, the CI tests break and I need to update my exclude list. Currently, the script looks like the one below, where ... stands for many more --exclude lines.

#!/bin/bash

LOCAL_HOST="http://localhost:1313/links/"
MAX_WAIT_TIME=60 # number of 0.5 s polls, i.e. about 30 sec
# Options are kept in an array so that no eval is needed.
OPTIONS=(
    --exclude 'reddit.com'
    --exclude 'anaconda.org'
    --exclude 'arxiv.org'
    --exclude 'docker.com'
    --exclude 'stackoverflow.com'
    --exclude 'linuxize.com'
    --exclude 'cyberciti.biz'
    --exclude 'gitlab.yourgitlab.com'
    --exclude 'openai.com'
    --exclude '^*.webm$'
    # ...
    --ignore-fragments
    --max-response-body-size 100000000
    --junit
)

# Poll every 0.5 s until the local server answers with 200.
for i in $(seq 1 "${MAX_WAIT_TIME}"); do
    sleep 0.5
    IS_SERVER_RUNNING=$(curl -LI "${LOCAL_HOST}" -o /dev/null -w '%{http_code}' -s)
    if [[ "${IS_SERVER_RUNNING}" == "200" ]]; then
        # Write the JUnit report and pass muffet's result on to CI.
        muffet "${OPTIONS[@]}" "${LOCAL_HOST}" > rspec.xml && exit 0 || exit 1
    fi
done

echo "error: time out $((MAX_WAIT_TIME / 2)) sec" && exit 1

Projects
Status: Todo
Development

No branches or pull requests

3 participants