Is it safe to stop and then resume the fetching process with Ctrl+C?
Can that lead to data corruption or inconsistency?
Does `fetch_links.py` properly save data before exiting?
Yes, it is safe to quit and restart, although this is not well tested. I would personally wait until the script outputs "got %s links, wrote %s and %s comments", which happens every 10 links/posts. Waiting for that message avoids quitting while `write_links()` is running.
Since data is written every 10 links/posts, the most data you can lose by quitting is 10 links.
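For context, the worst case is just an unflushed in-memory batch. A minimal sketch of the batching pattern described above; the helper names and file layout here are my assumptions, not `fetch_links.py`'s actual code:

```python
import json

BATCH_SIZE = 10  # matches the "every 10 links/posts" write interval

def write_links(batch, path="links.jsonl"):
    # Stand-in for the script's write step: append one JSON record per link.
    with open(path, "a", encoding="utf-8") as f:
        for link in batch:
            f.write(json.dumps(link) + "\n")

def fetch_all(link_ids):
    batch = []
    for link_id in link_ids:
        batch.append({"id": link_id})  # pretend this is a fetched link
        if len(batch) == BATCH_SIZE:
            write_links(batch)  # data only reaches disk here
            batch.clear()
    if batch:
        write_links(batch)  # flush the final partial batch

fetch_all(range(25))
```

In this shape, killing the process between `write_links()` calls discards at most one in-memory batch, and since the file is only ever appended to at batch boundaries, nothing already written gets corrupted.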
Looking at this in more detail: it is safe to re-run a data fetch, but it's not smart at all. It will re-download all of the data and simply refuse to write it to disk since it's already there. (edit: it will skip fetching comments if the link is already written to file)
So for now, for sanity/efficiency, you have to check the last date you downloaded data for and start your next run on that same date; see the sketch below.
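Until a resume flag exists, something like this could find the restart date. A rough sketch, assuming the output lands in per-date files named `YYYY-MM-DD.*`; the script's real on-disk layout may differ:

```python
import os
import re

def last_fetched_date(data_dir):
    # Scan the output directory for date-prefixed filenames and return
    # the most recent one, or None if nothing has been fetched yet.
    date_re = re.compile(r"\d{4}-\d{2}-\d{2}")
    dates = sorted(
        m.group(0)
        for name in os.listdir(data_dir)
        if (m := date_re.match(name))
    )
    return dates[-1] if dates else None

# Restart on this same date, since its data may be incomplete.
print(last_fetched_date("data"))
```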
I think a 'resume' flag could be added that skips ahead by date based on what's already on disk. Maybe we can improve Ctrl+C handling as well (sketch below). We can leave this one open.
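On the Ctrl+C side, one improvement would be to trap SIGINT and stop only at a batch boundary instead of mid-write. A sketch of that idea, not what the script currently does:

```python
import signal
import time

stop_requested = False

def request_stop(signum, frame):
    # Ctrl+C just sets a flag; the loop below exits at a safe point.
    global stop_requested
    stop_requested = True
    print("Ctrl+C received, finishing the current batch before exiting...")

signal.signal(signal.SIGINT, request_stop)

def fetch_batches():
    # Stand-in for the fetch loop: yields batches of 10 fake links.
    for start in range(0, 100, 10):
        time.sleep(1)  # simulate network time per batch
        yield list(range(start, start + 10))

for batch in fetch_batches():
    # write_links(batch) would run here; once it returns, the batch is safe.
    if stop_requested:
        print("stopped cleanly at a batch boundary")
        break
```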