`download_and_extract` failure #34
I am not in favor of the idea of automatically creating the directory if it does not exist, as typos can be quite common, and a typo in the root path would then silently create a new, unintended directory. For the second case, the current implementation always downloads, extracts and overwrites the data. We could add an `overwrite` argument to control this. In some cases, we might interrupt the downloading or the extraction process, so we might need to handle this as well. For example, if the existing archive size is incorrect, then we should remove it and redownload it. Also, if some of the files are already extracted, we could skip them, which can be done by modifying `datasets.utils.extract_archives`.
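For the interrupted-download case, here is a minimal sketch of what such a check could look like. The helper name, signature, and `size` parameter are hypothetical illustrations, not MusPy's actual API:

```python
import os
import urllib.request


def download_url(url, path, size=None, overwrite=False):
    """Download ``url`` to ``path``, redoing only the work that is needed.

    ``size`` is the expected archive size in bytes (hypothetical parameter):
    an existing file with the wrong size is treated as a partial download.
    """
    if os.path.isfile(path) and not overwrite:
        if size is None or os.path.getsize(path) == size:
            return path  # existing archive looks complete; skip the download
        os.remove(path)  # size mismatch, likely interrupted; redownload
    urllib.request.urlretrieve(url, path)
    return path
```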
And the issue I was having with my dataset is that the archive contained some read-only files, so creating the dataset a second time resulted in a permission error when it tried to overwrite them.
If the extraction process is interrupted, the `.muspy.success` file will not have been written, so the next run will simply redo the extraction.
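A sketch of how the marker could make interrupted extractions safe to retry; the function is hypothetical and assumes a tar archive:

```python
import os
import tarfile


def extract_archive(path, root):
    """Extract ``path`` into ``root``, writing ``.muspy.success`` at the end.

    Because the marker is written only after extraction completes, an
    interrupted run leaves no marker and is simply redone next time.
    """
    marker = os.path.join(root, ".muspy.success")
    if os.path.isfile(marker):
        return  # a previous extraction completed successfully
    with tarfile.open(path) as archive:
        archive.extractall(root)
    open(marker, "w").close()  # mark the extraction as complete
```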
Yup. Sounds good.
That's right. Totally forgot that. So what's not done yet is to have a boolean argument `overwrite`.
- Add argument `overwrite` to several functions and methods
- Add argument `verbose` to several functions and methods
- Support sha256 hash check in `datasets.utils.download_url`
- Support xz files in `datasets.utils.extract_archives`
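For the sha256 check and xz support, rough illustrations using only the standard library (the actual implementations in `datasets.utils` may differ):

```python
import hashlib
import tarfile


def check_sha256(path, expected):
    """Return True if the file's sha256 hex digest matches ``expected``."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected


def extract_xz(path, root):
    """Extract an xz-compressed tarball; tarfile supports "r:xz" natively."""
    with tarfile.open(path, "r:xz") as archive:
        archive.extractall(root)
```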
Checking the existence of the `.muspy.success` file should be enough to decide whether to skip extraction.
As for the way a dataset gets created: basically, there should be a way (preferably the default one) to create a dataset that downloads and extracts the data only when it is not already there, without redoing work that has already been done.
And I think that checking that an archive is correctly extracted can be an example of an unnecessary long-running task. Especially if the dataset is already converted to the MusPy format and you end up using the converted version, which you also do not check.
`RemoteDataset(download_and_extract=True)` fails if:

1. the root directory does not exist, or
2. the data has already been downloaded and extracted.

In the first case, we can simply create the directory (it's only going to be used for this one dataset anyway). In the second case, we can skip extraction if the `.muspy.success` file exists. Or (to make sure the data is not corrupt) we could just check the files that exist before trying to overwrite them.