OUD-Risk-Prediction

Repository for the project Opioid Use Disorder Risk Modelling through Mobility and Genetic Feature Integration

About The Project

Approach diagram

Overview of our integrative approach for estimating disease risk.
The mobility trace and genetic data are preprocessed, then augmented to balance the genetic and mobility trace sample sizes. The augmented data is merged using a disease co-occurrence parameter ($C$), genetic relative risk ($G$), and mobility relative risk ($M$). In the modelling step, features and models are selected, and classifiers are trained to estimate OUD risk.
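For orientation, and assuming $G$ and $M$ follow the standard epidemiological definition of relative risk (this README does not spell out the exact parameterization used), each is a ratio of disease probabilities between an exposed and an unexposed group:

```latex
$$
\mathrm{RR} = \frac{P(\text{OUD} \mid \text{exposed})}{P(\text{OUD} \mid \text{unexposed})}
$$
```

Under that reading, a value of 1 means the exposure carries no added risk, and larger values of $G$ or $M$ correspond to a stronger association with OUD.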

Data Sources

  • The preprocessed mobility and genetic data are located in ./data/preprocessed_raw/
  • The extracted genetic variant data are located in ./data/data_variants/

Getting Started

To set up the project locally, generate the hybrid datasets, perform feature and model selection, train on the datasets, and evaluate model performance, follow the instructions below.

Prerequisites

The steps below require the following Python version:

  • Python >= 3.8
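A quick way to confirm the interpreter meets this requirement before installing dependencies is sketched below. The helper name `meets_minimum` is hypothetical and not part of this repository.

```python
import sys

# Minimum version stated in the Prerequisites section of this README.
MIN_PYTHON = (3, 8)


def meets_minimum(version=None, minimum=MIN_PYTHON):
    """Return True if `version` is at least `minimum`.

    `version` defaults to the running interpreter's version; only the
    (major, minor) components are compared.
    """
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= tuple(minimum)
```

Calling `meets_minimum()` with no arguments checks the interpreter that runs the script; passing a tuple such as `(3, 7)` checks an arbitrary version.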

Installation

Setup steps for running the simulation and modelling mechanisms.

  1. Clone the OUD-Risk-Prediction repository
    git clone https://github.com/bayesomicslab/OUD-Risk-Prediction.git
  2. Install dependencies
    pip install -r requirements.txt
  3. Run the commands described in the "Usage" section.

Usage

  1. Create synthetic hybrid datasets (mobility trace + genetic)
    python create_synthetic_datasets.py \
    --comorbidity=CO_OCCURRENCE_LEVEL \
    --rr_geno=GENOTYPE_RISK_RATIO \
    --rr_mt=MOBILITY_TRACE_RISK_RATIO \
    --n_sets=NUM_SETS_PER_CO_OCCURRENCE_RR_CONFIG \
    --out
  2. Feature selection on genotype data
    python feature_selection.py \
    --filename=FILENAME_FOR_MERGED_DATA \
    --out
  3. Model selection
    python model_selection.py \
    --merged_file=FILENAME_FOR_MERGED_DATA \
    --var_features=FILENAME_FOR_SELECTED_VARIANTS \
    --out
  4. Training and evaluation of models
    python select_train_test_f1.py \
    --dataset=CO_OCCURRENCE_LEVEL \
    --rr_geno=GENOTYPE_RISK_RATIO \
    --rr_mt=MOBILITY_TRACE_RISK_RATIO \
    --hyperparams=HYPERPARAMS_FILEPATH \
    --out
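When step 1 has to be repeated over several co-occurrence and risk-ratio settings, the command lines can be generated programmatically. The sketch below only builds the argument lists and does not run them. The function name `build_commands` and the numeric values in the example are hypothetical. The flag names are copied from the usage above, and `--out` is left bare because this README does not show what follows it.

```python
import shlex
from itertools import product


def build_commands(comorbidities, rr_genos, rr_mts, n_sets):
    """Return one argv list per (comorbidity, rr_geno, rr_mt) combination."""
    commands = []
    for c, g, m in product(comorbidities, rr_genos, rr_mts):
        commands.append([
            "python", "create_synthetic_datasets.py",
            f"--comorbidity={c}",
            f"--rr_geno={g}",
            f"--rr_mt={m}",
            f"--n_sets={n_sets}",
            "--out",  # shown without a value in the README; kept as-is
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical grid values, chosen only for illustration.
    for argv in build_commands([0.1, 0.5], [1.5, 2.0], [1.5], n_sets=10):
        print(shlex.join(argv))
```

Each returned list can be passed to `subprocess.run` once the argument expected after `--out` is known. Building lists rather than shell strings avoids quoting problems with file paths.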


Authors

  • Derek Aguiar, Ph.D. (PI)
  • Sybille M. Légitime
  • Bing Wang, Ph.D.
  • Dipak Dey, Ph.D.
  • Kaustubh Prabhu
  • Devin J. McConnell

Acknowledgments

D.A. and S.L. were supported in part by the University of Connecticut’s Institute for Collaboration on Health, Intervention, and Policy (awarded to D.A.); B.W. was supported in part by the National Science Foundation grant IIS-1407205 (awarded to B.W.).
