Diabetes Dataset - Detailed Analysis

This repository contains a detailed analysis of the Pima Indians Diabetes Database, which is available on Kaggle. Both predictive and descriptive analyses were performed, using various algorithms and information about diabetes from papers found online. The document will be updated frequently to incorporate new algorithms and ideas, so it is best viewed as a proof of principle.

Content

  • diabetes.csv contains
    • 8 medical predictor factors: pregnancies, glucose, blood pressure, skin thickness, insulin, BMI, diabetes pedigree function and age
    • One target variable: outcome
    • Data from 768 female patients
  • *.ipynb files are Jupyter notebooks that document the research
  • utils.py contains all functions used for analysis
  • environment.yml is used to create a conda environment
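
As a quick sanity check of the layout described above, the following minimal sketch reads the CSV with the standard library and counts the values of the target variable. The column names are an assumption based on the Kaggle release of the dataset, and read_rows and outcome_counts are hypothetical helpers written for this illustration; they are not taken from utils.py.

```python
import csv
from typing import Dict, Iterable, List

# Assumed header names (Kaggle release of the dataset); adjust them
# if your copy of diabetes.csv labels the columns differently.
PREDICTORS = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age",
]
TARGET = "Outcome"


def read_rows(lines: Iterable[str]) -> List[Dict[str, float]]:
    """Parse CSV lines (an open file works) into dicts of numeric values."""
    reader = csv.DictReader(lines)
    columns = PREDICTORS + [TARGET]
    # Fail early if any expected column is absent from the header.
    missing = [c for c in columns if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return [{c: float(row[c]) for c in columns} for row in reader]


def outcome_counts(rows: List[Dict[str, float]]) -> Dict[int, int]:
    """Count how many rows carry each value of the target variable."""
    counts: Dict[int, int] = {}
    for row in rows:
        label = int(row[TARGET])
        counts[label] = counts.get(label, 0) + 1
    return counts
```

To try it from the root directory of the project, open the file with open("diabetes.csv", newline="") and pass the handle to read_rows; if the file matches the description above, the returned list should have 768 entries.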

Jupyter notebooks

  • Report: main analysis and discussion

To see the notebooks, run jupyter notebook from the root directory of the project.

Acknowledgements

Special thanks to the Takeda Data Challenge, which took place in June 2018; it inspired me to work on this dataset extensively and helped me greatly in finding my strengths and weaknesses.