The library collects utilities to download data used for research and teaching. In the future we might add some very basic tools for preprocessing.
Make sure you have at least Python 3.10 installed.
Then you can install the package via
$ pip install git+ssh://[email protected]:es-ude/data.git
We recommend installing the uv package manager. Afterwards you can set up a project and add the package with
$ uv init --python 3.12 my-project
$ cd my-project
$ uv python install python3.12
$ uv add git+ssh://[email protected]:es-ude/data.git
$ uv sync
$ source .venv/bin/activate
You can download a dataset like so
from iesude.data import MitBihAtrialFibrillationDataSet as AFDataSet
d = AFDataSet.download("my_data_dir")
This will download the data from our public sciebo share into a temporary directory and extract the contents into a folder called my_data_dir.
If you want to download your data again you need to delete that directory.
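A minimal sketch of forcing a fresh download, assuming the existing target directory is what makes the library skip the download. The helper name redownload is hypothetical and not part of iesude.data; it accepts any callable that takes the target directory.

```python
import shutil
from pathlib import Path
from typing import Callable


def redownload(download: Callable[[str], object], data_dir: str) -> object:
    """Remove data_dir if it exists, then call download(data_dir) again.

    Hypothetical helper, not part of iesude.data.
    """
    path = Path(data_dir)
    if path.is_dir():
        # Drop the stale copy so an existing directory no longer blocks the download.
        shutil.rmtree(path)
    return download(data_dir)
```

With the example above this would be called as redownload(AFDataSet.download, "my_data_dir"). Note that everything inside the directory is deleted, including files you added yourself.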
You need write access to our sciebo share.
Upload your dataset, preferably as a zip file, since support for compressed tar archives is not implemented yet.
Assuming you stored your data under "myproject/dataset01.zip", you can define your new dataset like so
from iesude.data import DataSet, Zip as ZipArchive
MyNewDataSet = DataSet(file_path="myproject/dataset01.zip", file_type=ZipArchive)
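If your data is still a plain folder, you need to pack it into a zip file before uploading. The following is a small sketch using only the Python standard library; the function name zip_dataset and the paths are made up for illustration, and uploading the resulting file to the sciebo share remains a manual step.

```python
import shutil
from pathlib import Path


def zip_dataset(source_dir: str, archive_stem: str) -> Path:
    """Pack the contents of source_dir into <archive_stem>.zip.

    Hypothetical helper, not part of iesude.data.
    """
    source = Path(source_dir)
    if not source.is_dir():
        raise NotADirectoryError(f"{source} is not a directory")
    # make_archive appends ".zip" itself and returns the final file name.
    # Paths inside the archive are stored relative to source_dir.
    archive = shutil.make_archive(archive_stem, "zip", root_dir=source)
    return Path(archive)
```

For example, zip_dataset("dataset01", "upload/dataset01") would produce upload/dataset01.zip. Choose an archive location outside of the source folder, so the archive does not end up inside itself.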
- automatic download from UDE IES sciebo share
- automatic archive extraction into a given folder
- supported file types are
  - .zip archive
  - uncompressed .tar archive
  - plain files (download a file directly to your folder without extraction)

Not implemented yet:
- tar.gz
- tar.xz
- override share endpoint in user config
- upload data sets
- automatically put descriptions/readmes for uploaded datasets in github repo
- autogenerate classes when uploading a data set
- upload checksums for datasets
- use checksums instead of directory presence to decide whether or not to download datasets
- download progress bar
- logging
To contribute, clone the repository and install uv (link in the install section). Additionally, install pre-commit, e.g. like so
$ uv tool install pre-commit
Alternatively, you can use a devenv environment for a reproducible, declarative and easy-to-use setup. It will take care of
- installing uv and calling it to install Python and the dev dependencies
- most importantly, installing and setting up pre-commit
Follow the 2 steps from the devenv getting started guide.
Additionally, we recommend you use direnv to activate devenv automatically upon entering the project in a shell (also supported by several IDE/Editor plugins).