If you have the capacity, it would be nice to check the extracted file structure for the existing cases too.
It should be sufficient to just check if the respective directory has the expected set of files.
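A check like this could be sketched as follows. This is only an illustration: the helper names `list_relative_files` and `has_expected_files` are hypothetical and not part of the project.

```python
from pathlib import Path


def list_relative_files(directory):
    """Return the set of file paths below `directory`, relative to it, in POSIX form."""
    root = Path(directory)
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def has_expected_files(directory, expected):
    """Return True if `directory` contains exactly the expected relative file paths."""
    return list_relative_files(directory) == set(expected)
```

Comparing sets of relative paths keeps the check independent of where the archive was extracted and of the order in which the file system lists entries.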
Description of the problem
Some datasets have a top-level directory in the archive that has just a single node in it.
This is unnecessary and creates much longer file paths than needed.
Description of a solution
Instead, it would be nice to remove top-level directories by default if they only have a single node.
Example:
After extracting the archive archive.zip into the directory archive, we get the following file structure:
This feature would then instead create the following structure:
But in the case of an archive whose top level has more than a single node, the structure would have to be left untouched.
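One possible reading of this behavior, sketched as a post-processing step on an already extracted directory. The function name `remove_single_top_level` is hypothetical, and the sketch assumes "a single node" means that the extraction directory contains exactly one entry and that entry is a directory. Only one level is removed.

```python
import shutil
import tempfile
from pathlib import Path


def remove_single_top_level(directory):
    """Hoist the contents of a lone top-level directory into `directory`.

    Returns True if the layout was changed, False if it was left untouched.
    """
    root = Path(directory)
    nodes = list(root.iterdir())
    # Leave the structure untouched unless there is exactly one node
    # and that node is a directory.
    if len(nodes) != 1 or not nodes[0].is_dir():
        return False
    # Move the lone directory aside first, so that a child with the
    # same name as its parent cannot collide during the hoist.
    staging = Path(tempfile.mkdtemp(dir=str(root.parent)))
    inner = Path(shutil.move(str(nodes[0]), str(staging / nodes[0].name)))
    for child in list(inner.iterdir()):
        shutil.move(str(child), str(root / child.name))
    shutil.rmtree(staging)
    return True
```

Calling it on a directory with two or more entries, or with a single file, is a no-op, which matches the requirement that such structures stay untouched.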
Minimum acceptance criteria
Dataset.extract() and utils.archives.extract() take a new parameter (remove_top_level: bool = True)
Tests in tests/utils/archives_test.py
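To make the proposed parameter concrete, here is a minimal standalone sketch for zip archives. It is not the project's `utils.archives.extract()`, whose real signature is not shown here. Only the parameter name `remove_top_level` and its default come from the criteria above; everything else is an assumption. It decides from the member names whether to strip the first path component, skips empty directory entries, and assumes a trusted archive (member paths are not sanitized).

```python
import zipfile
from pathlib import Path, PurePosixPath


def extract(archive, directory, remove_top_level=True):
    """Extract a zip archive, optionally dropping a lone top-level directory."""
    target = Path(directory)
    with zipfile.ZipFile(archive) as zf:
        members = [m for m in zf.infolist() if not m.is_dir()]
        tops = {PurePosixPath(m.filename).parts[0] for m in members}
        # Strip only if every file lives below the same top-level directory.
        strip = (
            remove_top_level
            and len(tops) == 1
            and all(len(PurePosixPath(m.filename).parts) > 1 for m in members)
        )
        for m in members:
            parts = PurePosixPath(m.filename).parts
            dest = target.joinpath(*(parts[1:] if strip else parts))
            dest.parent.mkdir(parents=True, exist_ok=True)
            dest.write_bytes(zf.read(m))
```

A test for this could build small archives with `zipfile.ZipFile(..., "w")` and `writestr()`, extract them with and without `remove_top_level`, and compare the resulting directory against the expected set of files.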