IMPACT

This repository provides the dataset, models, and code for the paper "Interpretable machine learning prediction of all-cause mortality". Please read our preprint at the following link: https://www.medrxiv.org/content/10.1101/2021.01.20.21250135v2

Data

NHANES

Please find the NHANES data in ./data/NHANES/NHANES.csv. Here are the mortality labels:

mortstat: mortality status (0=Assumed alive, 1=Assumed deceased)
permth_int: person months from the date of interview to the date of death or the end of the mortality period
x_year_label (x=1,2,3,4,5): the label for x-year mortality prediction

Further descriptions of the features can be found in ./data/NHANES/NHANES_feature_list.csv
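As a quick sanity check on the labels described above, the following minimal sketch counts the values found in each x-year label column. It assumes the columns are literally named 1_year_label through 5_year_label, following the x_year_label pattern; the helper name count_label_values is hypothetical and not part of this repository. It uses only the standard library, so it makes no assumption about how label values are encoded and simply counts the raw cell contents.

```python
import csv
from collections import Counter

# Assumed column names, derived from the x_year_label pattern (x = 1..5).
LABEL_COLUMNS = [f"{x}_year_label" for x in range(1, 6)]


def count_label_values(csv_path, columns=LABEL_COLUMNS):
    """Return {column: Counter of raw cell values} for the label columns present in the file."""
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        # Only keep the label columns that actually exist in the header.
        present = [c for c in columns if c in (reader.fieldnames or [])]
        counts = {c: Counter() for c in present}
        for row in reader:
            for c in present:
                counts[c][row[c]] += 1
    return counts


# Example call (path taken from the text above):
# counts = count_label_values("./data/NHANES/NHANES.csv")
# for column, values in counts.items():
#     print(column, dict(values))
```

Empty cells, if any, show up under the empty-string key, which makes it easy to see how many rows have no label for a given horizon.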

UK Biobank

Information on the features shared by NHANES and UK Biobank can be found in ./data/UKB/overlapped_NHANES_UKB.csv

The UK Biobank data cannot be shared publicly by the authors because of information governance restrictions around health data. The data can, however, be downloaded following a project approval process by the UK Biobank. Researchers wishing to access the data can apply directly to the UK Biobank (https://www.ukbiobank.ac.uk/enable-your-research/apply-for-access). The process involves registering on the access management system, submitting a research study protocol, and paying a fee directly to the UK Biobank. UK Biobank is an open access research resource for researchers and accepts applications with no restriction.

Models

All of the mortality prediction models are available in model/NHANES_xxx/. Please find the input feature list in ./model/model_features.csv. We also share the explicands and SHAP values:

model.pickle.dat: the trained xgboost model
fore_data.csv: the explicands when calculating the SHAP values
shap_values.npy: the SHAP values
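The three files listed above can be loaded together along the following lines. This is a sketch under assumptions: that all three files sit in the same model directory, that fore_data.csv reads cleanly with the pandas defaults, and that numpy, pandas, and xgboost are installed. The helper names missing_artifacts and load_artifacts are hypothetical and not part of this repository.

```python
from pathlib import Path

# File names as listed above.
ARTIFACT_NAMES = ("model.pickle.dat", "fore_data.csv", "shap_values.npy")


def missing_artifacts(model_dir):
    """Return the artifact file names that are absent from model_dir."""
    model_dir = Path(model_dir)
    return [name for name in ARTIFACT_NAMES if not (model_dir / name).is_file()]


def load_artifacts(model_dir):
    """Load the model, the explicands, and the SHAP values from model_dir."""
    # Imported here so that missing_artifacts() works without the scientific stack.
    import pickle

    import numpy as np
    import pandas as pd

    model_dir = Path(model_dir)
    with open(model_dir / "model.pickle.dat", "rb") as handle:
        model = pickle.load(handle)  # unpickling requires xgboost to be importable
    explicands = pd.read_csv(model_dir / "fore_data.csv")  # assumes default CSV layout
    shap_values = np.load(model_dir / "shap_values.npy")
    return model, explicands, shap_values


# Example call, with a placeholder for one of the model directories:
# model_dir = "model/NHANES_xxx"
# if not missing_artifacts(model_dir):
#     model, explicands, shap_values = load_artifacts(model_dir)
```

Pickle files execute code on load, so only unpickle model files obtained from a source you trust.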

The models for the IMPACT-20 mortality risk scores are available in ./model/IMPACT-20/. Please find the input feature list in ./model/IMPACT-20/IMPACT-20_features.csv

The input feature list can also be obtained from a trained model:

    import pickle

    # model_path points to one of the model.pickle.dat files
    with open(model_path, 'rb') as f:
        model = pickle.load(f)
    print(model.get_booster().feature_names)
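Before running predictions on your own data, it can help to compare your column names against the model's feature list. The sketch below does this with plain lists; check_feature_alignment is a hypothetical helper, and the feature names in the usage comment are placeholders, not real entries from the feature list.

```python
def check_feature_alignment(input_columns, feature_names):
    """Compare input column names against the feature names a model expects."""
    input_columns = list(input_columns)
    feature_names = list(feature_names)
    input_set = set(input_columns)
    expected_set = set(feature_names)
    return {
        # Features the model expects but the input lacks.
        "missing": [f for f in feature_names if f not in input_set],
        # Input columns the model does not know about.
        "unexpected": [c for c in input_columns if c not in expected_set],
        # True only when names and order match exactly.
        "same_order": input_columns == feature_names,
    }


# Example with placeholder names:
# report = check_feature_alignment(["feature_b", "feature_a"], ["feature_a", "feature_b"])
# report["same_order"] is False here, so the columns should be reordered,
# e.g. with pandas: df = df[feature_names]
```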

Code

Model training

shap_NHANES_classification.py: code for mortality prediction model training and SHAP values calculation
mortality_risk_scores_feature_elimination.ipynb: code for mortality risk scores training and feature elimination
supervised_distance_feature_elimination.ipynb: code for supervised distance calculation and supervised distance-based feature elimination approach

Visualization

visualization.ipynb processes results from the IMPACT framework and generates the figures presented in the paper:

SHAP summary plot
SHAP values plot
SHAP main effect plot: please calculate the SHAP interaction values using shap_NHANES_classification.py before generating the SHAP main effect plot
SHAP interaction plot: please calculate the SHAP interaction values using shap_NHANES_classification.py before generating the SHAP interaction plot
SHAP individualized plot
Partial dependence plot for reference interval
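For readers unfamiliar with how main effects relate to interaction values: in the shap library, tree-based interaction values form an array of shape (n_samples, n_features, n_features), where the diagonal entries hold each feature's main effect and each row sums to that feature's SHAP value. The sketch below illustrates this relationship with numpy. It assumes the interaction values produced by shap_NHANES_classification.py follow that layout; split_interaction_values is a hypothetical helper, not a function from this repository.

```python
import numpy as np


def split_interaction_values(interaction_values):
    """Split an (n_samples, n_features, n_features) array into main effects and SHAP values."""
    interaction_values = np.asarray(interaction_values)
    # Diagonal entries [i, j, j] are the main effect of feature j for sample i.
    main_effects = np.diagonal(interaction_values, axis1=1, axis2=2)
    # Summing a feature's row over all partners recovers its SHAP value.
    shap_values = interaction_values.sum(axis=2)
    return main_effects, shap_values


# Toy example: one sample, two features.
toy = np.array([[[1.0, 0.5],
                 [0.5, 2.0]]])
main_effects, shap_values = split_interaction_values(toy)
# main_effects has shape (1, 2) and shap_values has shape (1, 2)
```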

Dependencies

This software was originally designed with Python 3.6.13. Standard Python packages used: numpy (1.20.3), pandas (1.3.2), scikit-learn (0.22.2 or 0.21.3), shap (0.39.0), matplotlib (3.4.2).

The ./model/IMPACT-20/IMPACT_5_year_top20.pickle.dat and ./model/IMPACT-20/IMPACT_5_year_Demo_Lab_top20.pickle.dat must be loaded using scikit-learn (0.21.3).
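Because the pickled models are sensitive to library versions, a small check of the installed packages against the versions listed above can save debugging time. The following is a minimal sketch; version_mismatches and installed_versions are hypothetical helpers. Note that importlib.metadata is only in the standard library from Python 3.8 onward, so in an older environment the installed versions would need to be collected another way, for example from each package's __version__ attribute.

```python
# Versions quoted in this section; scikit-learn lists two accepted versions.
PINNED = {
    "numpy": {"1.20.3"},
    "pandas": {"1.3.2"},
    "scikit-learn": {"0.22.2", "0.21.3"},
    "shap": {"0.39.0"},
    "matplotlib": {"3.4.2"},
}


def version_mismatches(installed, pinned=PINNED):
    """Return {package: (installed_version, accepted_versions)} for every mismatch.

    installed maps package name -> version string, or None when not installed.
    """
    problems = {}
    for package, accepted in pinned.items():
        found = installed.get(package)
        if found not in accepted:
            problems[package] = (found, sorted(accepted))
    return problems


def installed_versions(packages):
    """Look up installed versions by distribution name (Python 3.8+ only)."""
    from importlib import metadata

    versions = {}
    for package in packages:
        try:
            versions[package] = metadata.version(package)
        except metadata.PackageNotFoundError:
            versions[package] = None
    return versions


# Example call:
# print(version_mismatches(installed_versions(PINNED)))
```

To check an environment intended for the IMPACT-20 pickles, pass a copy of PINNED in which scikit-learn is restricted to {"0.21.3"}.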

References

If you find IMPACT useful for your work, please consider citing our preprint:

@article{qiu2022interpretable,
  title={Interpretable machine learning prediction of all-cause mortality},
  author={Qiu, Wei and Chen, Hugh and Dincer, Ayse Berceste and Lundberg, Scott and Kaeberlein, Matt and Lee, Su-In},
  journal={medRxiv},
  pages={2021--01},
  year={2022},
  publisher={Cold Spring Harbor Laboratory Press}
}
