Running Machine Learning experiments

This repository contains machine learning experiments that examine how ranking and filtering laboratory result features by discriminative score affects the interpretability and performance of logistic regression and gradient boosting models. The experiments use a severe asthma mortality prediction case study.
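
As a rough illustration of the "rank and filter" idea, the sketch below keeps the k highest-scoring features from a score table. The function name `select_top_features`, the feature names, and the scores are all hypothetical and are not taken from the study; how the discriminative scores themselves are computed is not shown here.

```python
def select_top_features(scores, k):
    """Return the k feature names with the highest score, best first."""
    if k < 0:
        raise ValueError("k must be non-negative")
    # Sort by descending score; break ties by name so the order is deterministic.
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return ranked[:k]


# Made-up scores for illustration only; these are not values from the study.
example_scores = {"lab_a": 0.71, "lab_b": 0.55, "lab_c": 0.83, "lab_d": 0.62}
print(select_top_features(example_scores, 2))  # ['lab_c', 'lab_a']
```

Passing a smaller k mirrors the "top 32", "top 16", ... settings used in the experiments further down.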

Requirements

Python 3.6
Pandas
scikit-learn (sklearn)

Getting Patient Mortality Labels and Discriminative Scores

Three preprocessed data files, MedsConnections.csv, ProcedureConnection.csv and died.csv, can be downloaded from here.
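
Before starting the pipeline it can help to confirm that the three files are in place. The following is a minimal sketch using only the standard library; the helper name `missing_files` is hypothetical, and the directory you pass in is an assumption, since this README does not state where the downloaded files should be stored.

```python
from pathlib import Path

# File names as listed in this README.
EXPECTED_FILES = ["MedsConnections.csv", "ProcedureConnection.csv", "died.csv"]


def missing_files(directory):
    """Return the expected file names that are not present in `directory`."""
    base = Path(directory)
    return [name for name in EXPECTED_FILES if not (base / name).is_file()]
```

For example, `missing_files("data/input")` returns an empty list when all three files are found in that (assumed) location.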

Steps to run the experiments:

Copy the MIMIC III .csv files described here into the directory data/input.
Carry out steps 2-5 of the MIMIC III benchmark scripts to generate episode data, using the version of the scripts in this repository. (Note: the MIMIC III benchmark scripts are adapted, with edits, from https://github.com/YerevaNN/mimic3-benchmarks)

       cd mimic3benchmarks
       export PYTHONPATH=$PYTHONPATH:[PATH TO DIRECTORY]
       python scripts/extract_subjects.py ../data/input ../data/all_patients
       python scripts/validate_events.py ../data/all_patients
       python scripts/extract_episodes_from_subjects.py ../data/all_patients
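
If you prefer to drive these steps from Python, the sketch below runs the same three commands in order. It assumes that it is started from the repository root and that PYTHONPATH is already exported as shown above; the helper name `run_steps` and its `dry_run` switch are hypothetical. By default it only prints what it would run.

```python
import shlex
import subprocess

# The three commands listed above, unchanged.
BENCHMARK_STEPS = [
    "python scripts/extract_subjects.py ../data/input ../data/all_patients",
    "python scripts/validate_events.py ../data/all_patients",
    "python scripts/extract_episodes_from_subjects.py ../data/all_patients",
]


def run_steps(steps, cwd="mimic3benchmarks", dry_run=True):
    """Run each command in order inside `cwd`; with dry_run=True only print them.

    check=True makes subprocess.run raise CalledProcessError on a non-zero
    exit code, so later steps are not attempted after a failure.
    """
    executed = []
    for step in steps:
        args = shlex.split(step)
        executed.append(args)
        if dry_run:
            print("would run:", step)
        else:
            subprocess.run(args, cwd=cwd, check=True)
    return executed


run_steps(BENCHMARK_STEPS)  # dry run: prints the commands without executing them
```

Call `run_steps(BENCHMARK_STEPS, dry_run=False)` to actually execute the steps.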

Run the following script to extract asthma patients:

       cd code
       python extract_asthma.py

Run the following scripts in the code directory to summarize hourly data for the first 48 hours in the ICU:

       # extracts just lab values
       python extract_hourly.py
       
       # extracts labs+demographics
       python extract_hourly_intervention.py --data-set=1
       
       # extracts labs+demographics+interventions
       python extract_hourly_intervention.py --data-set=2
       
       # extracts labs+demographics+interventions+triples
       python extract_hourly_intervention.py --data-set=3
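
The four extraction commands follow a simple pattern: the labs-only case uses its own script, and the other three differ only in the `--data-set` value. A small sketch of that pattern, with a hypothetical helper name `extraction_command`:

```python
def extraction_command(data_set):
    """Return the hourly-extraction command for a feature-set code (0-3)."""
    if data_set == 0:
        # Labs only: handled by a separate script that takes no parameter.
        return "python extract_hourly.py"
    if data_set in (1, 2, 3):
        return f"python extract_hourly_intervention.py --data-set={data_set}"
    raise ValueError(f"unknown data set: {data_set}")


for code in range(4):
    print(extraction_command(code))
```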

Create the train/test splits in pickle format by running the following scripts. The --data-set values correspond to these feature sets:

0: labs
1: labs+demographics
2: labs+demographics+interventions
3: labs+demographics+interventions+triples

       python create_train_test_new_weight_pickle.py
       python create_train_test_pickle.py
       python create_train_test_new_weight_pickle.py --data-set=1
       python create_train_test_pickle.py --data-set=1
       python create_train_test_new_weight_pickle.py --data-set=2
       python create_train_test_pickle.py --data-set=2
       python create_train_test_new_weight_pickle.py --data-set=3
       python create_train_test_pickle.py --data-set=3
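
The eight commands above are the two split scripts applied to each of the four feature sets. The sketch below regenerates them in the same order; `FEATURE_SETS`, `SPLIT_SCRIPTS`, and `split_commands` are hypothetical names introduced for illustration, and leaving out the flag for feature set 0 simply mirrors the commands as listed.

```python
# Feature-set codes as listed in this README.
FEATURE_SETS = {
    0: "labs",
    1: "labs+demographics",
    2: "labs+demographics+interventions",
    3: "labs+demographics+interventions+triples",
}
SPLIT_SCRIPTS = ["create_train_test_new_weight_pickle.py", "create_train_test_pickle.py"]


def split_commands():
    """Return the train/test split commands for every feature set."""
    commands = []
    for code in sorted(FEATURE_SETS):
        # The listed commands omit --data-set for the labs-only case.
        flag = "" if code == 0 else f" --data-set={code}"
        for script in SPLIT_SCRIPTS:
            commands.append(f"python {script}{flag}")
    return commands


for command in split_commands():
    print(command)
```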

Run the following Python scripts to run the machine learning modelling experiments (each takes 12-20 hours to run):

Logistic Regression:

# logistic regression, 11 clinically relevant labs:
python logistic_regression.py --top-fc=0 > ../data/output/logistic_regression0.out

# logistic regression, 42 lab features:
python logistic_regression.py > ../data/output/logistic_regression1.out

# logistic regression, top 32 lab features:
python logistic_regression.py --top-fc=32 > ../data/output/logistic_regression2.out

# logistic regression, top 16 lab features:
python logistic_regression.py --top-fc=16 > ../data/output/logistic_regression3.out

# logistic regression, top 8 lab features:
python logistic_regression.py --top-fc=8 > ../data/output/logistic_regression4.out
   
# logistic regression, top 4 lab features:
python logistic_regression.py --top-fc=4 > ../data/output/logistic_regression5.out
   
# logistic regression, top 2 lab features:
python logistic_regression.py --top-fc=2 > ../data/output/logistic_regression6.out
   
# logistic regression, top 1 lab feature:
python logistic_regression.py --top-fc=1 > ../data/output/logistic_regression7.out
   
# logistic regression, 11 clinically relevant labs, demographics:
python logistic_regression.py --top-fc=0 --data-set=1 > ../data/output/logistic_regression0_d.out

# logistic regression, 42 labs, demographics:
python logistic_regression.py --data-set=1 > ../data/output/logistic_regression1_d.out

# logistic regression, top 32 labs, demographics:
python logistic_regression.py --top-fc=32 --data-set=1 > ../data/output/logistic_regression2_d.out

# logistic regression, top 16 labs, demographics:
python logistic_regression.py --top-fc=16 --data-set=1 > ../data/output/logistic_regression3_d.out

# logistic regression, 11 clinically relevant labs, demographics+interventions:
python logistic_regression.py --top-fc=0 --data-set=2 > ../data/output/logistic_regression0_i.out
   
# logistic regression, 42 labs, demographics+interventions:
python logistic_regression.py --data-set=2 > ../data/output/logistic_regression1_i.out

# logistic regression, top 32 labs, demographics+interventions:
python logistic_regression.py --top-fc=32 --data-set=2 > ../data/output/logistic_regression2_i.out
   
# logistic regression, top 16 labs, demographics+interventions:
python logistic_regression.py --top-fc=16 --data-set=2 > ../data/output/logistic_regression3_i.out

# logistic regression, 11 clinically relevant labs, demographics+interventions+triples:
python logistic_regression.py --top-fc=0 --data-set=3 > ../data/output/logistic_regression0_t.out

# logistic regression, 42 labs, demographics+interventions+triples:
python logistic_regression.py --data-set=3 > ../data/output/logistic_regression1_t.out
   
# logistic regression, top 32 labs, demographics+interventions+triples:
python logistic_regression.py --top-fc=32 --data-set=3 > ../data/output/logistic_regression2_t.out

# logistic regression, top 16 labs, demographics+interventions+triples:
python logistic_regression.py --top-fc=16 --data-set=3 > ../data/output/logistic_regression3_t.out

Gradient Boosting:

# gradient boosting, 11 clinically relevant labs:
python gradient_boosting.py --top-fc=0 > ../data/output/gradient_boosting0.out

# gradient boosting, 42 labs:
python gradient_boosting.py > ../data/output/gradient_boosting1.out
   
# gradient boosting, top 32 labs:
python gradient_boosting.py --top-fc=32 > ../data/output/gradient_boosting2.out
   
# gradient boosting, top 16 labs:
python gradient_boosting.py --top-fc=16 > ../data/output/gradient_boosting3.out

# gradient boosting, top 8 labs:
python gradient_boosting.py --top-fc=8 > ../data/output/gradient_boosting4.out
   
# gradient boosting, top 4 labs:
python gradient_boosting.py --top-fc=4 > ../data/output/gradient_boosting5.out

# gradient boosting, top 2 labs:
python gradient_boosting.py --top-fc=2 > ../data/output/gradient_boosting6.out

# gradient boosting, top 1 lab:
python gradient_boosting.py --top-fc=1 > ../data/output/gradient_boosting7.out
   
# gradient boosting, 11 clinically relevant labs+demographics:
python gradient_boosting.py --top-fc=0 --data-set=1 > ../data/output/gradient_boosting0_d.out
   
# gradient boosting, 42 labs+demographics:
python gradient_boosting.py --data-set=1 > ../data/output/gradient_boosting1_d.out
   
# gradient boosting, top 32 labs+demographics:
python gradient_boosting.py --top-fc=32 --data-set=1 > ../data/output/gradient_boosting2_d.out
   
# gradient boosting, top 16 labs+demographics:
python gradient_boosting.py --top-fc=16 --data-set=1 > ../data/output/gradient_boosting3_d.out
   
# gradient boosting, 11 clinically relevant labs+demographics+interventions:
python gradient_boosting.py --top-fc=0 --data-set=2 > ../data/output/gradient_boosting0_i.out
   
# gradient boosting, 42 labs+demographics+interventions:
python gradient_boosting.py --data-set=2 > ../data/output/gradient_boosting1_i.out

# gradient boosting, top 32 labs+demographics+interventions:
python gradient_boosting.py --top-fc=32 --data-set=2 > ../data/output/gradient_boosting2_i.out
   
# gradient boosting, top 16 labs+demographics+interventions:
python gradient_boosting.py --top-fc=16 --data-set=2 > ../data/output/gradient_boosting3_i.out

# gradient boosting, 11 clinically relevant labs+demographics+interventions+triples:
python gradient_boosting.py --top-fc=0 --data-set=3 > ../data/output/gradient_boosting0_t.out
 
# gradient boosting, 42 labs+demographics+interventions+triples:
python gradient_boosting.py --data-set=3 > ../data/output/gradient_boosting1_t.out

# gradient boosting, top 32 labs+demographics+interventions+triples:
python gradient_boosting.py --top-fc=32 --data-set=3 > ../data/output/gradient_boosting2_t.out

# gradient boosting, top 16 labs+demographics+interventions+triples:
python gradient_boosting.py --top-fc=16 --data-set=3 > ../data/output/gradient_boosting3_t.out
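
The logistic regression and gradient boosting runs share one naming scheme: the number in the output file name encodes the `--top-fc` setting, and the suffix (`_d`, `_i`, `_t`) encodes the `--data-set` value. The sketch below rebuilds the 20 command lines per model from that scheme. The names `TOP_FC`, `SUFFIX`, and `experiment_commands` are hypothetical; the commands themselves are the ones listed above.

```python
# Position in this list is the number used in the output file name.
# None means "no --top-fc flag", i.e. all 42 lab features.
TOP_FC = [0, None, 32, 16, 8, 4, 2, 1]
# Output file suffix for each --data-set value.
SUFFIX = {0: "", 1: "_d", 2: "_i", 3: "_t"}


def experiment_commands(model):
    """Return the command lines for 'logistic_regression' or 'gradient_boosting'."""
    commands = []
    for data_set in sorted(SUFFIX):
        # Settings below the top 16 features are listed only for the labs-only data set.
        settings = TOP_FC if data_set == 0 else TOP_FC[:4]
        for index, top_fc in enumerate(settings):
            parts = ["python", f"{model}.py"]
            if top_fc is not None:
                parts.append(f"--top-fc={top_fc}")
            if data_set != 0:
                parts.append(f"--data-set={data_set}")
            output = f"../data/output/{model}{index}{SUFFIX[data_set]}.out"
            commands.append(" ".join(parts) + " > " + output)
    return commands


for command in experiment_commands("logistic_regression"):
    print(command)
```

Since each run is reported to take 12-20 hours, printing the list first and launching runs selectively is likely more practical than executing all 40 in sequence.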

License and Citation

By committing your code to the MIMICIII-ML-experiments code repository you agree to release the code under the MIT License attached to the repository.

Roe KD, Jawa V, Zhang X, Chute CG, Epstein JA, Matelsky J, Shpitser I, Taylor CO. Feature engineering with clinical expert knowledge: A case study assessment of machine learning model complexity and performance. PLoS ONE. 2020 Apr 23;15(4):e0231300.
