Index of local repository may not reflect its data holdings #104
Comments
ROVER's notion of data holdings is in the index database. If a user manually removes data files from a repository without updating the index, I do not think that is an issue for ROVER to solve. There is a command to (re)index data files (i.e. …
I forgot about the index command. Interestingly, if the data holdings are indexed after data is manually removed, running …

We could put an indexing step into …

This patch brings us a step closer to creating the delete function outlined in issue #21, and it also prevents ROVER from presenting false information to the user. Adding this audit step would keep ROVER's index and data holdings aligned no matter what happens to the data. It gives the program a way to check automatically for errors, whether they are caused by a human or by the computer.
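To make the audit idea concrete, here is a minimal sketch in Python. It is not ROVER's code: `audit_index`, its arguments, and the assumption that the index can be reduced to a plain list of file paths are all hypothetical, chosen only to show the comparison between what the index claims and what is on disk.

```python
from pathlib import Path


def audit_index(indexed_paths, data_root):
    """Compare indexed file paths with the files actually on disk.

    Hypothetical helper, not part of ROVER. Returns (stale, unindexed):
    paths the index lists that no longer exist, and files on disk that
    the index does not know about.
    """
    root = Path(data_root)
    indexed = {Path(p) for p in indexed_paths}
    # Walk the whole repository once and collect every regular file.
    on_disk = {p for p in root.rglob("*") if p.is_file()}
    stale = sorted(indexed - on_disk)
    unindexed = sorted(on_disk - indexed)
    return stale, unindexed
```

A non-empty `stale` list corresponds to the situation in this issue: the index still advertises files that were removed by hand.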
Yes, it appears that the reindexing smartly handles the manually removed data, but see more below.

The reason this is not done automatically is that it can be a huge operation, checking terabytes of data. If a user manually modifies the data files "under" ROVER, they should not expect ROVER to automagically figure it out; it is perfectly reasonable for the user to issue the …

With that said, I did find an apparent bug in the (re)indexing. Repeatable test case below.

Download 15 days of data and manually remove the first 10 days:
Now list the index, which shows all 15 days:
Now (re)index and list the index again:
Empty!?!! Oops, something is wrong. Do the exact same steps, (re)index and list the index again:
Now it is back!? This is what I would have expected after the first …
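The expectation behind this test case is that a (re)index is a reconciliation that converges in one pass: after files are removed, a single run should produce the final index, and a second run should change nothing. The sketch below illustrates that property with a toy in-memory index. It is an assumption-laden illustration, not ROVER's implementation: `reindex` and the `path -> mtime` dictionary are hypothetical, and using only file metadata (no reading of the data itself) is one possible way to keep the cost down on large repositories.

```python
import os


def reindex(index, data_root):
    """Reconcile `index` (dict: path -> mtime) with files under data_root.

    Hypothetical helper, not part of ROVER. Mutates `index` in place and
    returns the number of entries added, updated, or removed.
    """
    # Collect the current state of the repository from file metadata only.
    seen = {}
    for dirpath, _dirnames, filenames in os.walk(data_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            seen[path] = os.stat(path).st_mtime

    changes = 0
    # Drop entries whose files have disappeared.
    for path in list(index):
        if path not in seen:
            del index[path]
            changes += 1
    # Add new files and refresh entries whose modification time changed.
    for path, mtime in seen.items():
        if index.get(path) != mtime:
            index[path] = mtime
            changes += 1
    return changes
```

With this shape, the second call after a manual deletion returns 0, which is the behavior the empty-then-restored listing above fails to show.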
After downloading the request
IU ANMO * LHZ 2012-01-01T00:00:00 2012-02-01T00:00:00
and then manually deleting multiple data directories
rm -r ./data/IU/2012/01*
rover list-summary
returns:
and
rover list-index
returns (only location code 00 is included, for readability):
which does not reflect the local repo's data holdings. Furthermore, there is no rover command that allows the user to reindex the data holdings.
When
rover retrieve request.txt
is run on this use case, all of the requested data is collected. We should expect either none of the data to be collected (if the local repo's stale index is compared to the availability service) or only the missing data to be collected (if the data repo is re-indexed first and then compared to the availability service). The latter is the correct behavior.
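The "only the missing data" outcome amounts to subtracting what the up-to-date index holds from what was requested. Below is a minimal Python sketch of that subtraction at day granularity. `missing_days` and the day-level granularity are assumptions made for illustration; ROVER's actual comparison against the availability service may work on finer time spans.

```python
from datetime import date, timedelta


def missing_days(start, end, indexed_days):
    """Return the days in [start, end) that are absent from indexed_days.

    Hypothetical helper, not part of ROVER.
    """
    have = set(indexed_days)
    missing = []
    day = start
    while day < end:
        if day not in have:
            missing.append(day)
        day += timedelta(days=1)
    return missing
```

For a request spanning 2012-01-01 to 2012-02-01, if a fresh index held every day except a ten-day block that had been deleted by hand, the function would return just those ten days, and only they would need to be retrieved.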