Feature: ability to only save pages that haven't been archived yet #30
Just chiming in to say that I think this would be really slow.
To do this myself with my own text files, I currently have to do the following:
It takes around half an hour (or more) to process each of those files by hand. With multiple files this gets tedious quickly: merging them and then splitting them apart again adds two more steps to an already long process, and I wouldn't even know how to separate them once they have been processed. If the script could do this, it would not only make this workaround obsolete but would also be quicker, since it wouldn't need to handle every URL at once.
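A minimal sketch of checking each file on its own, so nothing has to be merged and split apart again afterwards. It assumes one URL per line (blank lines and `#` comment lines skipped) and takes the archive check as a caller-supplied function. `parse_url_list`, `load_files`, `pending_per_file` and the `is_archived` predicate are hypothetical names for illustration, not part of the project. A shared cache means a URL that appears in several files is only looked up once.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, List


def parse_url_list(text: str) -> List[str]:
    """One URL per line; skip blank lines and '#' comment lines."""
    urls = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            urls.append(line)
    return urls


def load_files(paths: Iterable[Path]) -> Dict[str, List[str]]:
    """Read each text file separately, keyed by its path."""
    return {str(p): parse_url_list(Path(p).read_text(encoding="utf-8")) for p in paths}


def pending_per_file(url_lists: Dict[str, List[str]],
                     is_archived: Callable[[str], bool]) -> Dict[str, List[str]]:
    """For each file, keep only the URLs that is_archived() reports as missing.

    Results stay grouped by file, so there is no merge/split step, and the
    shared cache ensures each distinct URL is checked at most once.
    """
    cache: Dict[str, bool] = {}  # hypothetical in-memory cache: URL -> archived?
    pending: Dict[str, List[str]] = {}
    for name, urls in url_lists.items():
        keep = []
        for url in urls:
            if url not in cache:
                cache[url] = is_archived(url)
            if not cache[url]:
                keep.append(url)
        pending[name] = keep
    return pending
```

`is_archived` could, for example, wrap a Wayback Machine lookup; each list in the result can then be written back next to the file it came from.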
This should already be possible.
When the message "The capture will start in ~ seconds because we are doing a lot of captures of ~ ~ right now" appears, it seems that the page ends up being archived twice.
There should be an option to check the Wayback Machine for whether a page has already been archived. For example, if you have a bunch of text files and only want to send requests for URLs that have no archived page yet (i.e. the first archive of a page), this setting would help.
Another thing to consider when adding this is how far back the check should look. Maybe the option could take a timestamp as its argument?
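A minimal sketch of the proposed check, assuming the public Wayback Machine Availability API (`https://archive.org/wayback/available?url=...`), which returns JSON containing an `archived_snapshots.closest` entry when a capture exists. The optional `max_age` stands in for the timestamp idea: a page is only re-captured if its latest capture is older than that. All function names here are hypothetical, not the project's API. The availability endpoint can occasionally report no snapshot for a page that does have one, so a miss is best read as "probably not archived".

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional

# Wayback Machine Availability API (public, no key required).
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_query(url: str) -> str:
    """Build the availability lookup URL for one page (no timestamp -> most recent capture)."""
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode({"url": url})


def latest_capture(payload: dict) -> Optional[datetime]:
    """Return the capture time reported in the API response, or None if there is none."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    # Wayback timestamps are 14-digit UTC strings: YYYYMMDDhhmmss.
    return datetime.strptime(closest["timestamp"], "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def needs_capture(payload: dict, max_age: Optional[timedelta] = None,
                  now: Optional[datetime] = None) -> bool:
    """True if the page has no capture, or (when max_age is set) its capture is too old."""
    captured = latest_capture(payload)
    if captured is None:
        return True
    if max_age is None:
        return False  # any existing capture is good enough
    now = now or datetime.now(timezone.utc)
    return now - captured > max_age


def fetch_availability(url: str, timeout: float = 30.0) -> dict:
    """Query the availability API over the network and decode the JSON body."""
    with urllib.request.urlopen(build_query(url), timeout=timeout) as resp:
        return json.load(resp)


def urls_to_capture(urls: Iterable[str], max_age: Optional[timedelta] = None) -> List[str]:
    """Keep only the URLs that should still be sent for a new capture."""
    return [u for u in urls if needs_capture(fetch_availability(u), max_age)]
```

With `max_age=None` this gives the "first archive only" behaviour; passing, say, `timedelta(days=365)` would also re-capture pages whose newest snapshot is more than a year old.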