Repository for the project Opioid Use Disorder Risk Modelling through Mobility and Genetic Feature Integration
Overview of our integrative approach for estimating disease risk.
The mobility trace and genetic data are preprocessed, then augmented to balance the genetic and mobility trace sample sizes. The augmented data is merged using a disease co-occurrence parameter (
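The balancing and merging step described above can be illustrated with a minimal sketch. This is not the repository's implementation. It assumes the smaller genetic set is bootstrap-resampled (with replacement) up to the mobility-trace sample size. It also interprets the co-occurrence parameter as the probability that a mobility-trace sample is paired with a genetic sample of the same disease status; otherwise the genetic partner is drawn at random. The function names balance_by_resampling and merge_with_cooccurrence are hypothetical.

```python
import numpy as np


def balance_by_resampling(X, y, n_target, rng):
    """Bootstrap-resample rows of (X, y) with replacement up to n_target rows."""
    idx = rng.choice(len(X), size=n_target, replace=True)
    return X[idx], y[idx]


def merge_with_cooccurrence(X_mt, y_mt, X_gen, y_gen, cooccurrence, rng):
    """Pair every mobility-trace row with one genetic row (hypothetical scheme).

    With probability `cooccurrence` the genetic partner is drawn from samples
    with the same disease label; otherwise it is drawn from all genetic samples.
    The hybrid label is taken from the mobility-trace record.
    """
    all_idx = np.arange(len(y_gen))
    gen_idx = np.empty(len(y_mt), dtype=int)
    for i, label in enumerate(y_mt):
        pool = all_idx
        if rng.random() < cooccurrence:
            same = np.flatnonzero(y_gen == label)
            if same.size > 0:  # fall back to all samples if no label match exists
                pool = same
        gen_idx[i] = rng.choice(pool)
    X_hybrid = np.hstack([X_mt, X_gen[gen_idx]])
    return X_hybrid, y_mt.copy(), y_gen[gen_idx]


# Toy example: 100 mobility-trace samples, 40 genetic samples.
rng = np.random.default_rng(0)
X_mt = rng.normal(size=(100, 3))
y_mt = rng.integers(0, 2, size=100)
X_gen = rng.integers(0, 3, size=(40, 5))  # genotypes as allele counts 0/1/2
y_gen = np.array([0, 1] * 20)

X_gen_bal, y_gen_bal = balance_by_resampling(X_gen, y_gen, len(X_mt), rng)
X, y, y_gen_paired = merge_with_cooccurrence(X_mt, y_mt, X_gen_bal, y_gen_bal, 1.0, rng)
print(X.shape)  # (100, 8)
```

With a co-occurrence of 1.0, every pair shares the same disease label. With 0.0, genetic partners are chosen independently of the mobility-trace label.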
- The preprocessed mobility and genetic data can be found under ./data/preprocessed_raw/
- The extracted genetic variant data can be found under ./data/data_variants/
To set up the project locally, generate the hybrid datasets, perform feature and model selection, train models on the datasets, and evaluate model performance, follow the instructions below.
The following Python version is required before proceeding with the steps below:
- Python version: Python >= 3.8
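To make a script fail early on an unsupported interpreter, you can place a check like the following at the top of an entry point. This is an illustrative sketch. MIN_PYTHON and python_ok are hypothetical names and are not part of the repository.

```python
import sys

MIN_PYTHON = (3, 8)  # minimum version stated in this README


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if (major, minor) of version_info is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if not python_ok():
    sys.exit(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required, found {sys.version.split()[0]}")
```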
Setup steps for running the simulation and modelling pipeline:
- Clone the OUD-Risk-Prediction repository
git clone https://github.com/bayesomicslab/OUD-Risk-Prediction.git
- Change into the repository directory
cd OUD-Risk-Prediction
- Install dependencies
pip install -r requirements.txt
- Run the commands described in the "Usage" section.
- Create synthetic hybrid datasets (mobility trace + genetic)
python create_synthetic_datasets.py \
    --comorbidity=CO_OCCURRENCE_LEVEL \
    --rr_geno=GENOTYPE_RISK_RATIO \
    --rr_mt=MOBILITY_TRACE_RISK_RATIO \
    --n_sets=NUM_SETS_PER_CO_OCCURRENCE_RR_CONFIG \
    --out
- Feature selection on genotype data
python feature_selection.py \
    --filename=FILENAME_FOR_MERGED_DATA \
    --out
- Model selection
python model_selection.py \
    --merged_file=FILENAME_FOR_MERGED_DATA \
    --var_features=FILENAME_FOR_SELECTED_VARIANTS \
    --out
- Training and evaluation of models
python select_train_test_f1.py \
    --dataset=CO_OCCURRENCE_LEVEL \
    --rr_geno=GENOTYPE_RISK_RATIO \
    --rr_mt=MOBILITY_TRACE_RISK_RATIO \
    --hyperparams=HYPERPARAMS_FILEPATH \
    --out
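The --rr_geno and --rr_mt flags of create_synthetic_datasets.py take risk ratios. The sketch below shows what a risk ratio controls. It draws binary labels so that the disease probability among samples carrying a risk feature is risk_ratio times the baseline probability. The function simulate_labels and the baseline prevalence of 0.1 are illustrative assumptions, and the script may inject risk differently.

```python
import numpy as np


def simulate_labels(exposed, baseline_risk, risk_ratio, rng):
    """Draw 0/1 labels with P(case | exposed) = risk_ratio * P(case | unexposed).

    The exposed-group probability is capped at 1, so very large risk ratios
    yield a smaller realised ratio than requested.
    """
    p_exposed = min(1.0, risk_ratio * baseline_risk)
    p = np.where(exposed, p_exposed, baseline_risk)
    return (rng.random(len(exposed)) < p).astype(int)


rng = np.random.default_rng(42)
exposed = rng.random(200_000) < 0.5  # e.g. carries a risk variant
labels = simulate_labels(exposed, baseline_risk=0.1, risk_ratio=2.0, rng=rng)
observed_rr = labels[exposed].mean() / labels[~exposed].mean()
print(f"observed risk ratio: {observed_rr:.2f}")
```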
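This README does not state which criterion feature_selection.py uses. To illustrate variant selection on genotype data, the sketch below substitutes a univariate chi-squared filter from scikit-learn (SelectKBest with chi2). It runs on a toy genotype matrix coded as minor-allele counts. The value k=10 and the simulated data are arbitrary choices.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

rng = np.random.default_rng(0)
n_samples, n_variants = 500, 50
y = rng.integers(0, 2, size=n_samples)

# Genotypes coded as minor-allele counts (0, 1, 2); chi2 requires non-negative features.
G = rng.integers(0, 3, size=(n_samples, n_variants))
G[:, 0] = np.where(y == 1, 2, 0)  # make variant 0 strongly label-associated

selector = SelectKBest(score_func=chi2, k=10).fit(G, y)
selected = np.flatnonzero(selector.get_support())
print("selected variant indices:", selected)
```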
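The search space of model_selection.py is also not documented here. One generic approach is to choose hyperparameters by cross-validated F1 with scikit-learn's GridSearchCV and serialise the best setting so a later training step can read it. The random-forest grid, the toy data and the JSON layout below are hypothetical.

```python
import json

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))
y = (X[:, 0] + 0.5 * rng.normal(size=300) > 0).astype(int)

param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, scoring="f1", cv=5)
search.fit(X, y)

# Serialise the best setting (max_depth=None becomes JSON null).
hyperparams_json = json.dumps(search.best_params_)
print(hyperparams_json, round(search.best_score_, 3))
```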
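The F1-based evaluation in select_train_test_f1.py can be approximated as follows: load hyperparameters, fit on a stratified training split, then compute F1 on held-out data. The inline JSON stands in for HYPERPARAMS_FILEPATH, whose real format this README does not document. The random-forest model is also an assumption.

```python
import json

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Hypothetical hyperparameter file contents.
hyperparams = json.loads('{"n_estimators": 100, "max_depth": 5}')

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 8))
y = (X[:, 0] - X[:, 1] + 0.5 * rng.normal(size=400) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
model = RandomForestClassifier(random_state=0, **hyperparams).fit(X_train, y_train)
test_f1 = f1_score(y_test, model.predict(X_test))  # binary F1 for the positive class
print(f"test F1: {test_f1:.3f}")
```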
- Derek Aguiar, Ph.D. (PI)
- Sybille M. Légitime
- Bing Wang, Ph.D.
- Dipak Dey, Ph.D.
- Kaustubh Prabhu
- Devin J. McConnell
D.A. and S.L. were supported in part by the University of Connecticut’s Institute for Collaboration on Health, Intervention, and Policy (awarded to D.A.); B.W. was supported in part by the National Science Foundation grant IIS-1407205 (awarded to B.W.).