Diabetes Dataset - Detailed Analysis

This repository contains a detailed analysis of the Pima Indians Diabetes Database from Kaggle. Both predictive and descriptive analyses were performed, using various algorithms and information about diabetes found in papers online. The document will be updated frequently as new algorithms or ideas are implemented; it can therefore be viewed as a proof of principle.

Content

diabetes.csv contains:
- 8 medical predictor variables: pregnancies, glucose, blood pressure, skin thickness, insulin, BMI, diabetes pedigree function and age
- One target variable: outcome
- Data from 768 female patients
*.ipynb files are Jupyter notebooks that document the research
utils.py contains all functions used for the analysis
environment.yml is used to create a conda environment
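
To check that a local copy of diabetes.csv matches the description above, something like the following sketch could be used. It is not part of utils.py. The function name `summarize` is hypothetical, and the column names are assumed to be those of the header in the Kaggle release of the file; adjust them if your copy differs.

```python
import csv
from collections import Counter

# Assumed header of the Kaggle CSV: 8 predictors plus the target column.
PREDICTORS = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age",
]
TARGET = "Outcome"


def summarize(path):
    """Check the header, then return (row_count, outcome_counts)."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []
        missing = [name for name in PREDICTORS + [TARGET] if name not in header]
        if missing:
            raise ValueError(f"missing columns: {missing}")
        rows = list(reader)
    # Outcome values are kept as the raw strings read from the file.
    outcomes = Counter(row[TARGET] for row in rows)
    return len(rows), dict(outcomes)
```

Calling `summarize("diabetes.csv")` from the root directory of the project should report 768 rows if the file matches the description above.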

Jupyter notebooks

Report: main analysis and discussion

To see the notebooks, run jupyter notebook from the root directory of the project.

Acknowledgements

Special thanks to the Takeda Data Challenge, which took place in June 2018; it inspired me to work on this dataset extensively and helped me greatly in finding my strengths and weaknesses.
