Based on the Coursera Loan Prediction Challenge, this project covers the development of multiple classification models that predict whether a borrower will ultimately default on their loan. Using Python libraries including pandas, matplotlib, seaborn, and sklearn, the project proposes a random forest model with accuracy and F1 scores above 0.93 as the optimal model. Complete with a discussion of the surrounding ethical issues, the report offers a robust account of the benefits and challenges of basing loan decisions on ultimately impersonal data.
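As a rough illustration of the workflow summarized above, the following minimal sketch trains a random forest classifier and reports accuracy and F1. It is not the code from the report: the data is a synthetic stand-in generated with `make_classification`, and the hyperparameters, the train/test split, and the weighted F1 averaging are all assumptions chosen for the example. The scores it prints have no relation to the figures quoted above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): NOT the Coursera/Kaggle loan dataset.
# The class weights mimic an imbalanced "default" vs "no default" target.
X, y = make_classification(
    n_samples=2000,
    n_features=10,
    n_informative=5,
    weights=[0.8, 0.2],
    random_state=42,
)

# Hold out 25% of the rows for evaluation, preserving the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Hyperparameters are illustrative, not the ones tuned in the report.
model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
# Weighted averaging is an assumption; the report may average differently.
f1 = f1_score(y_test, y_pred, average="weighted")

print(f"Accuracy: {accuracy:.3f}")
print(f"F1 (weighted): {f1:.3f}")
```

On imbalanced data such as loan defaults, accuracy alone can look high even when the minority class is poorly predicted, which is one reason to report F1 alongside it.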
- Version 1: Released on September 30th, 2023, it covered the development of multiple models, ultimately suggesting the use of either an XGBoost or Random Forest model.
- Version 2: Released on November 14th, 2023, it adds a heatmap exploring feature correlation, a discussion of feature importance for the XGBoost and random forest models, and an ethical concerns section. Other improvements and small bug fixes were also included.
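
The two analyses added in Version 2 can be sketched roughly as follows. This is a hedged example, not the notebook's code: the data is synthetic, the column names (`feature_0`, `feature_1`, ...) are hypothetical placeholders, and only the random forest importances are shown. The correlation matrix is computed with pandas; the commented-out seaborn call shows one way it could be drawn as a heatmap.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data with hypothetical column names (assumption).
X, y = make_classification(
    n_samples=1000, n_features=6, n_informative=3, random_state=0
)
columns = [f"feature_{i}" for i in range(X.shape[1])]
df = pd.DataFrame(X, columns=columns)

# Pairwise Pearson correlation between features.
corr = df.corr()
# To render it as a heatmap (requires seaborn and matplotlib):
# import seaborn as sns
# sns.heatmap(corr, annot=True, cmap="coolwarm")

# Impurity-based feature importances from a fitted random forest.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(df, y)
importances = pd.Series(
    model.feature_importances_, index=columns
).sort_values(ascending=False)

print(corr.round(2))
print(importances.round(3))
```

Impurity-based importances sum to 1 across features and can favor high-cardinality features, so they are best read as a rough ranking rather than a precise measure.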
The final report--Loan_Default_Full_Report.ipynb--is currently marked as complete and no further work is scheduled. That said, I will fix any bugs and errors should I find them.
I make no claim to the data used here. The data originates from Coursera and was sourced from Kaggle.
Please feel free to reference or make use of any of the code available in this repository. If you use any of the non-code parts, however, I request that you provide credit (ideally a link to this page).