Medical Insurance Cost Prediction

PROJECT OVERVIEW

Health insurance costs have risen dramatically over the past decade in response to the rising cost of health care services, and they are determined by a multitude of factors. This project looks at the cost of healthcare for a sample of the population, given age, sex, BMI, number of children, smoking habits, and region.

The purpose of this project is to determine the contributing factors and to predict health insurance cost by performing exploratory data analysis and predictive modeling on the Health Insurance dataset. The project uses NumPy, pandas, scikit-learn, and data visualization libraries.

Overview:
• Sought insight from the dataset through Exploratory Data Analysis
• Performed data processing, feature engineering, and feature transformation to prepare the data for modeling
• Built models to predict insurance cost from the features
• Evaluated the models using performance metrics: RMSE, R2, MAE, training accuracy, and testing accuracy

DATA DESCRIPTION

age: age of the primary beneficiary
sex: gender of the insurance contractor (female, male)
bmi: body mass index, an objective index of whether body weight is relatively high or low for a given height, computed as weight divided by height squared (kg / m^2); ideally 18.5 to 24.9
children: number of children covered by health insurance (number of dependents)
smoker: smoking status
region: the beneficiary's residential area in the US (northeast, southeast, southwest, northwest)
charges: individual medical costs billed by health insurance
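
The bmi definition can be made concrete with a short sketch. The function names bmi and in_ideal_range are hypothetical and exist only for this illustration. The dataset already ships a precomputed bmi column, so nothing like this is needed to use the data.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Return body mass index in kg/m^2 (weight divided by height squared)."""
    if height_m <= 0:
        raise ValueError("height_m must be positive")
    return weight_kg / height_m ** 2


def in_ideal_range(value: float) -> bool:
    """True when a BMI falls in the 18.5 to 24.9 range quoted above."""
    return 18.5 <= value <= 24.9


# Made-up example person: 70 kg and 1.75 m tall.
example = bmi(70.0, 1.75)
print(round(example, 1), in_ideal_range(example))  # 22.9 True
```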

Data source : https://www.kaggle.com/mirichoi0218/insurance

EXPLORATORY DATA ANALYSIS (EDA)

• The sex and region features are almost evenly balanced, while most people are non-smokers and most are obese

• People who smoke and have a BMI above 30 tend to have higher medical costs

• Older people who smoke have higher charges

• People who smoke and are obese have the highest average charges of any group
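
A comparison like the last one can be sketched with a pandas group-by. The rows below are made-up values for illustration, not records from the dataset, and the obese flag is an assumed helper column using the BMI above 30 cut-off mentioned in the findings.

```python
import pandas as pd

# Hypothetical rows for illustration only; not taken from the dataset.
sample = pd.DataFrame({
    "bmi": [22.0, 31.5, 27.0, 35.0, 24.0, 33.0],
    "smoker": ["no", "no", "yes", "yes", "no", "yes"],
    "charges": [3000.0, 5000.0, 20000.0, 42000.0, 4000.0, 38000.0],
})

# Flag obesity with the BMI > 30 cut-off (assumed helper column).
sample["obese"] = sample["bmi"] > 30

# Average charges for each smoker/obese combination, highest first.
avg_charges = (
    sample.groupby(["smoker", "obese"])["charges"]
    .mean()
    .sort_values(ascending=False)
)
print(avg_charges)
```

On the real data, the same group-by would be run on the loaded DataFrame instead of the toy sample.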

INSIGHTS

The insights drawn by performing Exploratory Data Analysis (EDA) are:

Most people are non-smokers, and most are obese.
The sex and region features are almost evenly balanced.
People who smoke and have a higher BMI have higher medical charges.
Older people who smoke have higher charges.
Obese people who smoke have higher charges.

DATA PROCESSING

Check for missing values - there are none
Check for duplicate values - there is 1 duplicate, which is removed
Feature engineering - create a new column, weight_status, based on the BMI score
Feature transformation:
A) Encoding the sex, region, and weight_status attributes
B) Ordinal encoding of the smoker attribute
Modeling:
A) Separating the target from the features
B) Splitting the data into train and test sets
C) Modeling with the Linear Regression, Random Forest, Decision Tree, Ridge, and Lasso algorithms
D) Finding the best algorithm
E) Tuning hyperparameters
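
The cleaning, feature engineering, and encoding steps above can be sketched in pandas as follows. This is a minimal sketch under stated assumptions, not the notebook's actual code: the rows are made up, the weight_status bins follow the common underweight/normal/overweight/obese cut-offs (the README does not state which bins were used), and one-hot encoding is assumed for sex, region, and weight_status.

```python
import pandas as pd

# Hypothetical rows for illustration only; the last row duplicates the first.
df = pd.DataFrame({
    "age": [19, 45, 33, 60, 19],
    "sex": ["female", "male", "male", "female", "female"],
    "bmi": [17.9, 24.0, 28.5, 36.2, 17.9],
    "children": [0, 2, 1, 0, 0],
    "smoker": ["yes", "no", "no", "yes", "yes"],
    "region": ["southwest", "southeast", "northwest", "northeast", "southwest"],
    "charges": [16000.0, 8000.0, 5000.0, 47000.0, 16000.0],
})

# Check for missing and duplicate values, then drop the duplicates.
n_missing = int(df.isna().sum().sum())
n_duplicates = int(df.duplicated().sum())
df = df.drop_duplicates().reset_index(drop=True)

# Feature engineering: weight_status from BMI (bin edges are an assumption).
bins = [0, 18.5, 25, 30, float("inf")]
labels = ["underweight", "normal", "overweight", "obese"]
df["weight_status"] = pd.cut(df["bmi"], bins=bins, labels=labels, right=False)

# Ordinal encoding of smoker; one-hot encoding (assumed) for the rest.
df["smoker"] = df["smoker"].map({"no": 0, "yes": 1})
encoded = pd.get_dummies(df, columns=["sex", "region", "weight_status"])

# Separate the target from the features.
X = encoded.drop(columns="charges")
y = encoded["charges"]
print(n_missing, n_duplicates, X.shape)
```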

MODEL EVALUATION

Score            LinearRegression   DecisionTree   RandomForest   Ridge
R2               0.77               0.78           0.78           0.86
Train Accuracy   0.74               1.0            0.97           0.74
MAE              4305.20            2798.83        2608.55        4311.10
Test Accuracy    0.77               0.78           0.86           0.77
RMSE             6209.88            6067.50        4841.88        6238.13
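
Metrics of this kind can be computed with scikit-learn roughly as sketched below. The data here is synthetic and generated only for the example, so the numbers it prints will not match the table. The sketch also assumes that "Train Accuracy" and "Test Accuracy" mean the R2 returned by a regressor's score method, which the README does not confirm.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the insurance features (illustration only).
rng = np.random.default_rng(0)
n = 300
age = rng.integers(18, 65, size=n)
bmi = rng.normal(30, 6, size=n)
smoker = rng.integers(0, 2, size=n)
charges = 250 * age + 300 * bmi + 20000 * smoker + rng.normal(0, 3000, size=n)

X = np.column_stack([age, bmi, smoker])
X_train, X_test, y_train, y_test = train_test_split(
    X, charges, test_size=0.2, random_state=42
)

models = {
    "LinearRegression": LinearRegression(),
    "RandomForest": RandomForestRegressor(n_estimators=100, random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "train_r2": model.score(X_train, y_train),  # assumed "Train Accuracy"
        "test_r2": r2_score(y_test, pred),          # assumed "Test Accuracy"
        "mae": mean_absolute_error(y_test, pred),
        "rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
    }

for name, scores in results.items():
    print(name, {k: round(v, 2) for k, v in scores.items()})
```

Comparing train_r2 with test_r2 for each model shows how much of the training fit carries over to unseen data.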

CONCLUSION

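Based on the predictive modeling, Linear Regression is selected as the best model, with an MAE of 4305.20, an RMSE of 6209.88, and an R2 of 0.77.

The selection rests on the training and testing accuracy: Linear Regression scores 0.74 on the training data and 0.77 on the test data, so it shows no sign of overfitting. Decision Tree and Random Forest report lower MAE and RMSE, but their training accuracy (1.0 and 0.97) is well above their testing accuracy (0.78 and 0.86).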
