UKB_API

This library provides convenient access to the UKB dataset for users of the PSC and Openmind7 clusters by leveraging the Datalad package. Datalad lets its users store large datasets in a distributed manner and facilitates collaboration. The repo currently has modules dedicated to the three types of data stored in the UKB dataset: scalar, genetic, and bulk. The library also serves as a directory of all the unique categories and field IDs that are part of the UKB dataset.

Datalad is a tool for version-controlling large datasets in a manner quite similar to Git, but limited to a single network or HPC cluster. This library helps the end user apply the principles of Datalad to access data stored on a different cluster, even with limited knowledge of how Datalad works.

Additionally, this repository houses the notebook files related to this project, which showcase exploratory data analysis on multiple categories. The modules are currently designed to give access to the necessary data in at most three to four lines of code.

Documentation

[Documentation for this repo can be found here]

Installation

pip install ukb-api==0.370
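To confirm that the pinned release is the one actually present in the environment, the version can be read back from the package metadata. The following is a minimal sketch that uses only the Python standard library; the helper name `installed_version` is hypothetical and is not part of ukb-api.

```python
from importlib import metadata


def installed_version(distribution_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


# "ukb-api" is the distribution name used in the pip command above.
version = installed_version("ukb-api")
if version is None:
    print("ukb-api is not installed in this environment")
else:
    print(f"ukb-api version: {version}")
```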

Import relevant modules and initialize objects

from UKBRepo.UKBRepo import module_scalar_data_handler as scalar_module

scalar_handler_object=scalar_module.scalar_data_handler()

Pick the main category to which your data type belongs (e.g. T1_Images, Freesurfer, Diet, Smoking)

scalar_handler_object.display_all_ukb_categories()

Fetch the relevant field IDs for that category

bulk_handler_object.get_field_ids_for_category(Category_Name)

Retrieve the list of subjects who have those particular field IDs

bulk_handler_object.get_subject_list_field_ids(Field_Id_List)

Retrieve the relevant data for the subjects

bulk_handler_object.get_data_bulk(Field_Id,subject_id)
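The three calls above can be chained into one helper. The sketch below is illustrative only: `FakeBulkHandler` is a hypothetical stand-in with invented toy data that mimics the method names shown in this README, so the example runs without cluster access. It assumes that the field and subject lookups return lists and that `get_data_bulk` takes one field ID and one subject ID at a time; the real library may behave differently.

```python
class FakeBulkHandler:
    """Hypothetical stand-in that mimics the method names used above."""

    def __init__(self):
        # Invented toy data: category -> field IDs, field ID -> subject IDs.
        self._fields = {"T1_Images": ["field_a"]}
        self._subjects = {"field_a": ["subj_1", "subj_2"]}

    def get_field_ids_for_category(self, category_name):
        return list(self._fields.get(category_name, []))

    def get_subject_list_field_ids(self, field_id_list):
        subjects = set()
        for field_id in field_id_list:
            subjects.update(self._subjects.get(field_id, []))
        return sorted(subjects)

    def get_data_bulk(self, field_id, subject_id):
        # This README describes bulk retrieval as returning a storage path,
        # so the stand-in returns a made-up path string.
        return f"/tmp/ukb/{subject_id}/{field_id}"


def collect_bulk_paths(handler, category_name):
    """Chain the three calls and map (field_id, subject_id) to each result."""
    field_ids = handler.get_field_ids_for_category(category_name)
    subject_ids = handler.get_subject_list_field_ids(field_ids)
    return {
        (field_id, subject_id): handler.get_data_bulk(field_id, subject_id)
        for field_id in field_ids
        for subject_id in subject_ids
    }


paths = collect_bulk_paths(FakeBulkHandler(), "T1_Images")
for key, path in paths.items():
    print(key, path)
```

Because `collect_bulk_paths` only relies on the three method names, a real handler object could in principle be passed in place of the stand-in, provided its return types match the assumptions stated above.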

After the data retrieval function runs, the output the user receives depends on the type of data being requested. For scalar data, the user receives the actual data in the output; for bulk data, the output is the path where the fetched bulk data files have been stored.
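Since a bulk request yields a path rather than the data itself, a small helper can turn that path into a list of files to load. This is a sketch under assumptions: the helper name `list_bulk_files` is hypothetical, and whether the returned path is a single file or a directory is not stated here, so both cases are handled. A temporary directory stands in for a real fetch result.

```python
from pathlib import Path
import tempfile


def list_bulk_files(returned_path):
    """Return the files found at a path produced by a bulk data request."""
    path = Path(returned_path)
    if path.is_file():
        return [path]
    if path.is_dir():
        # Walk the directory tree and keep regular files only.
        return sorted(p for p in path.rglob("*") if p.is_file())
    # Nothing exists at the path (for example, the fetch has not run yet).
    return []


# Demonstration with a throwaway directory in place of a real fetch result.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "T1.nii.gz").write_bytes(b"")
    (Path(tmp) / "nested").mkdir()
    (Path(tmp) / "nested" / "notes.txt").write_text("demo")
    demo_files = [p.relative_to(tmp).as_posix() for p in list_bulk_files(tmp)]

print(demo_files)
```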

Requirements

Datalad
Git-annex
Pandas
Numpy
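Before using the library, it can help to check that each requirement is available. The sketch below uses only the standard library; the function name `check_requirements` is hypothetical. It assumes the Python packages are importable under the names `datalad`, `pandas`, and `numpy`, and that git-annex is installed as an executable named `git-annex` on the PATH.

```python
import importlib.util
import shutil


def check_requirements():
    """Report which of the listed requirements can be found locally."""
    return {
        # Python packages are located by import name without importing them.
        "datalad": importlib.util.find_spec("datalad") is not None,
        "pandas": importlib.util.find_spec("pandas") is not None,
        "numpy": importlib.util.find_spec("numpy") is not None,
        # git-annex is an executable, so it is looked up on PATH instead.
        "git-annex": shutil.which("git-annex") is not None,
    }


for name, available in check_requirements().items():
    print(f"{name}: {'found' if available else 'missing'}")
```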

References

http://handbook.datalad.org/en/latest/
