Last response on a --max-redirect limit is not written to WARC #487

Open
JustAnotherArchivist opened this issue Aug 21, 2024 · 0 comments

When using --max-redirect, wpull permits at most that number of redirects before erroring out. However, the redirect response that reaches the limit and therefore triggers the error is never written to the WARC output. For common redirect loops, this is not a major issue since there will usually be multiple identical redirect responses, but that is not always the case. It also means that it is impossible to capture only a redirect with wpull without following it: --max-redirect 0 correctly raises an error on the first 3xx response, but the WARC contains only the request. (For example, this makes it impossible to work around #425 by first fetching the redirects and then running the redirect targets in a separate process.) See also #390 for a similar bug where a syntactically fine but semantically problematic redirect response isn't written to WARC.
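To check whether a given WARC is affected, counting records by WARC-Type is usually enough: an affected file has a request record for the URL but no matching response record. Below is a minimal sketch of such a check, not part of wpull. The names `count_warc_record_types` and `make_record` are hypothetical helpers introduced here, and the parser assumes an uncompressed, well-formed WARC with a Content-Length header on every record.

```python
import io
from collections import Counter


def count_warc_record_types(stream):
    """Count records by WARC-Type in an uncompressed WARC byte stream.

    Hypothetical helper for illustration; assumes well-formed records.
    """
    counts = Counter()
    while True:
        line = stream.readline()
        if not line:
            break  # end of stream
        if not line.strip():
            continue  # blank separator lines between records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected WARC version line, got {line!r}")
        # Read the record's named header fields up to the empty line.
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break
            name, _, value = line.partition(b":")
            headers[name.strip().lower()] = value.strip()
        counts[headers.get(b"warc-type", b"unknown").decode("ascii")] += 1
        # Skip the record block so payload bytes are never parsed as headers.
        stream.read(int(headers[b"content-length"]))
    return counts


def make_record(warc_type, block):
    """Build a bare-bones WARC record for the demo (not a complete record)."""
    return (
        b"WARC/1.0\r\n"
        b"WARC-Type: " + warc_type + b"\r\n"
        b"Content-Length: " + str(len(block)).encode("ascii") + b"\r\n"
        b"\r\n" + block + b"\r\n\r\n"
    )


if __name__ == "__main__":
    # Simulates the situation described above: a request but no response.
    sample = make_record(b"warcinfo", b"software: example\r\n") + make_record(
        b"request", b"GET / HTTP/1.1\r\nHost: example.org\r\n\r\n"
    )
    counts = count_warc_record_types(io.BytesIO(sample))
    print(dict(counts))
    print("response records:", counts["response"])
```

For a real file, pass a binary file object instead of the `io.BytesIO` sample. For a `.warc.gz`, opening it with `gzip.open(path, "rb")` should work with the same function, since Python's gzip module reads concatenated members as one stream.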
