Survival prediction for patients diagnosed with ARDS receiving ECMO treatment
Thesis project for the MSc in Biostatistics at the University of Glasgow, 2019.
I will include a write-up similar to the final thesis as an HTML R Markdown document.
ARDSdata.csv - I have no information on who collected this data, or where and how it was obtained.
The mice package implements a method to deal with missing data. The package creates multiple imputations (replacement values) for multivariate missing data. The method is based on Fully Conditional Specification, where each incomplete variable is imputed by a separate model. The MICE algorithm can impute mixes of continuous, binary, unordered categorical and ordered categorical data. In addition, MICE can impute continuous two-level data, and maintain consistency between imputations by means of passive imputation. Many diagnostic plots are implemented to inspect the quality of the imputations.
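A minimal usage sketch of the impute / fit / pool workflow, assuming the mice package is installed (nhanes is a small example dataset that ships with mice):

```r
library(mice)

# nhanes ships with mice and contains missing values in bmi, hyp and chl
imp <- mice(nhanes, m = 5, method = "pmm", seed = 123, printFlag = FALSE)

fit <- with(imp, lm(chl ~ age + bmi))  # fit the model on each imputed set
summary(pool(fit))                     # pool the m fits via Rubin's rules
```

`complete(imp, 1)` returns the first imputed dataset if a single filled-in copy is needed.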
- Van Buuren, S. (2018). Flexible Imputation of Missing Data. Second Edition. Chapman & Hall/CRC, Boca Raton, FL.
- Ad hoc methods and the MICE algorithm
- Convergence and pooling
- Inspecting how the observed data and missingness are related
- Passive imputation and post-processing
- Imputing multilevel data
- Sensitivity analysis with mice
- Generate missing values with ampute
- Imputing Missing Data with R; MICE package
- How do I perform Multiple Imputation using Predictive Mean Matching in R?
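The predictive-mean-matching idea behind the link above can be sketched by hand in base R: for each missing value, the donors are the observed rows whose model-predicted values are closest, and one donor's real value is copied over. (`pmm_impute` is a hypothetical helper on toy data, not the mice implementation.)

```r
set.seed(42)
x <- runif(100)
y <- 2 * x + rnorm(100, sd = 0.2)
y[sample(100, 20)] <- NA              # introduce missingness

obs  <- !is.na(y)
fit  <- lm(y ~ x, subset = obs)       # model fit on complete cases only
pred <- predict(fit, newdata = data.frame(x = x))

pmm_impute <- function(i, k = 5) {
  d      <- abs(pred[obs] - pred[i])  # distance in predicted space
  donors <- which(obs)[order(d)[1:k]] # the k closest observed rows
  y[sample(donors, 1)]                # draw one donor's real observed value
}
y_imp <- y
y_imp[!obs] <- sapply(which(!obs), pmm_impute)
```

Because imputed values are always drawn from observed donors, PMM never produces impossible values (e.g., negative ages).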
- Tibshirani, R. (1996). Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society, Series B, 58(1), 267-288.
- Hechenbichler, K. and Schliep, K.P. (2004). Weighted k-Nearest-Neighbor Techniques and Ordinal Classification. Discussion Paper 399, SFB 386, Ludwig-Maximilians University Munich (http://www.stat.uni-muenchen.de/sfb386/papers/dsp/paper399.ps)
- Hechenbichler, K. (2005). Ensemble-Techniken und ordinale Klassifikation [Ensemble techniques and ordinal classification]. PhD thesis.
- Samworth, R.J. (2012). Optimal weighted nearest neighbour classifiers. Annals of Statistics, 40, 2733-2763. (available from http://www.statslab.cam.ac.uk/~rjs57/Research.html)
Ran into rank-deficiency problems when training QDA on the imputed datasets.
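QDA estimates a separate covariance matrix per class, so it fails with rank-deficiency errors whenever a class has fewer observations than predictors, or the predictors are collinear within a class. A quick pre-check before calling MASS::qda (`qda_ok` is a hypothetical helper name):

```r
# Returns TRUE when every class has more rows than columns and the
# centred class submatrix has full column rank (no collinearity).
qda_ok <- function(X, g) {
  X <- as.matrix(X)
  all(sapply(split(seq_len(nrow(X)), g), function(idx) {
    Xi <- scale(X[idx, , drop = FALSE], scale = FALSE)  # centre per class
    nrow(Xi) > ncol(Xi) && qr(Xi)$rank == ncol(Xi)      # full column rank?
  }))
}
```

When the check fails, dropping near-constant or collinear columns first (e.g., with caret::findLinearCombos) is one way to make QDA trainable.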
- Tune ML Algorithms in R
- Accuracy Metrics in caret
- Accuracy & Cohen's Kappa - Kappa good for imbalanced datasets
- ROC & AUC - Good for binary outcome
- RMSE & R^2 - Good for continuous outcome
- Logarithmic Loss - Good for multiclass outcomes; a linear translation of the log-likelihood for a binary outcome model (Bernoulli trials).
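The metrics above can be made concrete with small base-R implementations (caret computes all of these itself; this is only an illustrative sketch, and the function names are my own):

```r
accuracy <- function(truth, pred) mean(truth == pred)

cohens_kappa <- function(truth, pred) {
  tab <- table(truth, pred)
  po  <- sum(diag(tab)) / sum(tab)                      # observed agreement
  pe  <- sum(rowSums(tab) * colSums(tab)) / sum(tab)^2  # chance agreement
  (po - pe) / (1 - pe)                                  # agreement above chance
}

# Binary log loss; probabilities are clipped to avoid log(0)
log_loss <- function(truth01, prob, eps = 1e-15) {
  p <- pmin(pmax(prob, eps), 1 - eps)
  -mean(truth01 * log(p) + (1 - truth01) * log(1 - p))
}
```

Kappa rescales accuracy by the agreement expected from the class frequencies alone, which is why it is more informative than raw accuracy on imbalanced data.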
- Save trained models
- Try out caret "recipes" (http://topepo.github.io/caret/using-recipes-with-train.html)
- ROC plots
- Performance Tables
- Switch to "caret" package
- Add Parallel Processing
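caret parallelizes train() automatically once a foreach backend is registered (typically doParallel::registerDoParallel(cl)). The base `parallel` package that ships with R illustrates the same worker-cluster idea without extra dependencies:

```r
library(parallel)

cl  <- makeCluster(2)                         # spin up 2 worker processes
res <- parSapply(cl, 1:4, function(i) i^2)    # work is split across workers
stopCluster(cl)                               # always release the workers
```

With caret, the same makeCluster/stopCluster pair wraps the train() call, and each resample/tuning combination is fitted on a separate worker.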
- Multiple Imputation and Ensemble Learning for Classification with Incomplete Data
- Method: Build multiple datasets using Multiple Imputation, then use an ensemble method to combine the classification results.
- Results: "...using the diversity of imputed datasets in an ensemble method leads to a more effective classifier."
- Notes: Flow charts showing imputation / training / ensembling
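The impute / train / ensemble flow can be sketched in a few lines of base R. This is a toy illustration: a crude stochastic mean-plus-noise draw stands in for a real multiple-imputation method such as mice, and glm stands in for whatever classifier is ensembled.

```r
set.seed(1)
n <- 200
x <- rnorm(n)
y <- as.integer(x + rnorm(n, sd = 0.5) > 0)
x[sample(n, 40)] <- NA                        # introduce missingness

m <- 5
models <- lapply(1:m, function(j) {
  xi <- x
  # toy stochastic imputation: one independent draw per imputed copy
  xi[is.na(xi)] <- mean(x, na.rm = TRUE) +
    rnorm(sum(is.na(x)), sd = sd(x, na.rm = TRUE))
  glm(y ~ xi, family = binomial)              # one classifier per copy
})

# majority vote across the m fitted classifiers at a new point
x0    <- data.frame(xi = 1.5)
votes <- sapply(models, function(f) predict(f, newdata = x0, type = "response") > 0.5)
final <- as.integer(mean(votes) > 0.5)
```

The diversity the paper exploits comes from the between-imputation variability: each copy differs only in the imputed cells, so disagreement among the m classifiers reflects imputation uncertainty.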
- Methods to Combine Multiple Imputations
- Method: Ensemble method / stacked dataset with dummy variables for imputed values
- Results: No Sources
- Notes: Original paper on missing data - MCAR, MAR, MNAR
- Classification Uncertainty of Multiple Imputed Data
- Method: Random Forest imputation with trees = 10
- Notes: Discusses methods for making classifications on imputed data.
- Keywords: White paper, Uncertainty measures, Discussion of Rubin's Rules
- Handling missing values in kernel methods with application to microbiology data
- Method:
- Concatenate the multiple imputed data sets and optimize an SVM classifier on the resulting set; this accounts not only for the variability of the parameter estimates but also for the variability of the training observations with respect to the imputed values (the IMI algorithm). In the first algorithm, the training data set was imputed m times, merged into a single large data set, and then used to train a classifier (an SVM in this case). The test data set was then concatenated with the stacked training data set, imputed m times, and separated back out from the training samples for prediction. Each of the m now-complete test data sets was used for prediction with the classifier trained in the previous step. Therefore, for each sample in the test data set, m predictions were produced and a majority vote formed the final prediction.
- A more standard procedure involves fitting separate SVMs to each imputed data set and pooling (i.e., averaging) the performance of the different SVMs. In the second algorithm, the training data was again imputed m times and a classifier was trained on each of the complete data sets. The test data set was then concatenated with each of the m imputed training data sets, imputed once (i.e., m = 1), and used for prediction. Again, m predictions were produced and a majority vote formed the final prediction.
- The IMI algorithm was found to perform better in general.
- Notes: Two general principles should be kept in mind for performing MI at test time:
- Imputation of test data must be done at test time; that is, it is not possible to impute all the data together (training and test).
- When imputing the missing values in test data, it is not possible to use the class (target) variable for the imputation (only the predictors can be used).
- Keywords: Kendall coefficient, Proportion of Useable Cases
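The stacked (IMI-style) training step described above can be sketched in base R. As before, a toy mean-plus-noise draw stands in for a real imputation method, and glm stands in for the SVM used in the paper:

```r
set.seed(2)
n <- 200
x <- rnorm(n)
y <- as.integer(x + rnorm(n, sd = 0.5) > 0)
x[sample(n, 40)] <- NA                        # introduce missingness

m <- 5
# impute the training data m times and row-bind the copies into one set
stacked <- do.call(rbind, lapply(1:m, function(j) {
  xi <- x
  xi[is.na(xi)] <- mean(x, na.rm = TRUE) +
    rnorm(sum(is.na(x)), sd = sd(x, na.rm = TRUE))  # toy stochastic draw
  data.frame(y = y, xi = xi)
}))

fit <- glm(y ~ xi, family = binomial, data = stacked)  # one model on m*n rows
```

At test time, each incomplete test row would itself be imputed m times (without using the class label), scored m times with `fit`, and combined by majority vote, as in the paper's IMI algorithm.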
- Multiple Imputation for Nonresponse in Surveys
- Notes: Rubin's Rules (pooling estimates)
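For a scalar estimate, Rubin's rules combine the m per-imputation estimates q and their variances u as: pooled estimate q̄ = mean(q), within-imputation variance W = mean(u), between-imputation variance B = var(q), and total variance T = W + (1 + 1/m)B. A small sketch (`pool_rubin` is my own helper name; mice's pool() does this, plus the degrees-of-freedom correction, for you):

```r
pool_rubin <- function(q, u) {
  m    <- length(q)
  qbar <- mean(q)                # pooled point estimate
  W    <- mean(u)                # average within-imputation variance
  B    <- var(q)                 # between-imputation variance
  Tvar <- W + (1 + 1/m) * B      # total variance of the pooled estimate
  c(estimate = qbar, variance = Tvar)
}

pool_rubin(q = c(1.0, 1.2, 0.9), u = c(0.04, 0.05, 0.04))
```

The (1 + 1/m) factor inflates the between-imputation component to account for using a finite number of imputations.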
- Khan, S.S., Ahmad, A. and Mihailidis, A. (2018). Bootstrapping and Multiple Imputation Ensemble Approaches for Missing Data.
- Notes: Good literature review on Multiple Imputation and ensembling results.
- Keywords: Ensemble, Multiple Imputation, Bagging, Flow Chart, Classifier Fusion Techniques, Expectation Maximization
- Impact of imputation of missing values on classification error for discrete data
- Notes: Comparison of imputation methods. Good write up on missing data and imputation methods.
- Barnard, J. and Rubin, D.B. (1999). Small sample degrees of freedom with multiple imputation. Biometrika, 86, 948-955.
- Rubin, D.B. (1987). Multiple Imputation for Nonresponse in Surveys. New York: John Wiley and Sons.
- van Buuren, S. and Groothuis-Oudshoorn, K. (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3), 1-67. https://www.jstatsoft.org/v45/i03/
- Notes: MICE package