Imported the census data into R. Set the census benchmark at 50K and performed binary classification against it. Used Lasso regression, the Boruta package, and random forests to predict the missing values (almost 8% of the data). Used SMOTE to reduce the class imbalance. Built random forest, naive Bayes, and decision tree models for the prediction. The best accuracy was around 81%, with an AUC of around 0.84. Used R Shiny to better visualize the analysis.
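
A minimal sketch of the classification and evaluation steps, not the project's actual code. It assumes the 50K benchmark refers to income, and it uses synthetic data with invented column names (age, hours_per_week, education_years, income) in place of the real census file. Only the decision tree is shown, via rpart; the random forest and naive Bayes models, the imputation, and SMOTE are left out. AUC is computed with a small hypothetical helper, auc_rank, based on the rank-sum formulation.

```r
# Hypothetical sketch: synthetic data stands in for the census data.
library(rpart)  # decision trees; a recommended package shipped with R

set.seed(42)
n <- 1000

# Invented predictor columns, for illustration only
census <- data.frame(
  age             = sample(18:80, n, replace = TRUE),
  hours_per_week  = sample(10:70, n, replace = TRUE),
  education_years = sample(8:20,  n, replace = TRUE)
)

# Synthetic income loosely tied to the predictors (assumption, not real data)
census$income <- 1500 * census$education_years +
  400 * census$hours_per_week +
  200 * census$age +
  rnorm(n, sd = 10000)

# Binary target: is income above the 50K benchmark?
census$over_50k <- factor(ifelse(census$income > 50000, "yes", "no"),
                          levels = c("no", "yes"))
census$income <- NULL  # drop raw income so it cannot leak into the model

# 70/30 train/test split
train_idx <- sample(seq_len(n), size = round(0.7 * n))
train <- census[train_idx, ]
test  <- census[-train_idx, ]

# Fit a classification tree on all remaining predictors
fit <- rpart(over_50k ~ ., data = train, method = "class")

# Predicted classes and predicted probability of the "yes" class
pred_class <- predict(fit, newdata = test, type = "class")
pred_prob  <- predict(fit, newdata = test, type = "prob")[, "yes"]

# Accuracy: share of test rows classified correctly
accuracy <- mean(pred_class == test$over_50k)

# AUC via the rank-sum (Mann-Whitney) formulation; average ranks handle ties
auc_rank <- function(scores, labels, positive = "yes") {
  pos   <- labels == positive
  n_pos <- sum(pos)
  n_neg <- sum(!pos)
  r     <- rank(scores)
  (sum(r[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}
auc <- auc_rank(pred_prob, test$over_50k)

cat("Accuracy:", round(accuracy, 3), " AUC:", round(auc, 3), "\n")
```

The numbers this prints describe the synthetic data only and are not comparable to the 81% accuracy and 0.84 AUC reported above. Swapping rpart for another model would leave the accuracy and AUC steps unchanged, as long as the model returns class probabilities.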