PolicyRenewals (work in progress)

Data preparation, exploratory data analysis (EDA), and modeling on an imbalanced dataset from Kaggle: https://www.kaggle.com/arashnic/imbalanced-data-practice?select=aug_train.csv

In this exercise, I:
- Explore deep learning and random forests with the h2o ML package
- Apply explainable AI (XAI) methods with the DALEX package
- Work with the SMOTE and ROSE methods for upsampling, and with PCA
- Further improve my data wrangling skills with dplyr
- Perform automatic feature engineering
- Fit, evaluate, and compare caret models
- Create more advanced ggplot2 graphics for EDA
- Play around with tidyquant (Excel-like functions, e.g. pivot tables)

Despite the repository name, we build a model to predict whether last year's health insurance policyholders will also be interested in the vehicle insurance offered by the same company [Kaggle].

Comments: Overall, this exercise would be more interesting if more information on the variables were available, e.g. on the regions or the sales channels. That would allow well-informed and justified feature engineering. The same applies to the visual presentation of the data.