The dataset consists of 70 000
records of patients data: 11
features and a target variable.
Input features could be considered as belonging to the following 3
types:
- Objective: factual information;
- Examination: results of medical examination;
- Subjective: information given by the patient.
Features:
Feature Description | Feature Type | Feature Name | Data Type |
---|---|---|---|
Age |
Objective Feature | age |
int (days) |
Height |
Objective Feature | height |
int (cm) |
Weight |
Objective Feature | weight |
float (kg) |
Gender |
Objective Feature | gender |
Categorical code |
Systolic blood pressure |
Examination Feature | ap_hi |
int |
Diastolic blood pressure |
Examination Feature | ap_lo |
int |
Cholesterol |
Examination Feature | cholesterol |
1 : normal , 2 : above normal , 3 : well above normal |
Glucose |
Examination Feature | gluc |
1 : normal , 2 : above normal , 3 : well above normal |
Smoking |
Subjective Feature | smoke |
binary |
Alcohol intake |
Subjective Feature | alco |
binary |
Physical activity |
Subjective Feature | active |
binary |
Presence or absence of cardiovascular disease |
Target Variable | cardio |
binary |
All of the dataset values were collected at the moment of medical examination.
The goal of the project is to analyze Cardiovascular Disease dataset to find which factors are related to the heart diseases.
numpy
, pandas
, matplotlib
, seaborn
, sklearn
, lightgbm
, catboost
, pytorch