This project analyzes loan application data to predict potential loan defaulters. By leveraging machine learning models and statistical analysis, we provide insights to help banks and financial institutions make informed loan decisions.
Loan defaulter prediction is a binary classification problem. Borrowers are classified as defaulters or non-defaulters based on their financial history, loan details, and personal information.
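As a rough illustration of what evaluating a binary classifier of this kind involves, the sketch below cross-tabulates predicted against actual labels and derives a few common metrics. The label coding (1 = defaulter, 0 = non-defaulter) and the example vectors are assumptions made up for illustration, not values from the project data.

```r
# Hypothetical labels: 1 = defaulter, 0 = non-defaulter (this coding is an assumption).
actual    <- c(1, 0, 0, 1, 0, 1, 0, 0)
predicted <- c(1, 0, 1, 0, 0, 1, 0, 0)

# Cross-tabulate the true labels against the predictions.
confusion <- table(
  actual    = factor(actual, levels = c(0, 1)),
  predicted = factor(predicted, levels = c(0, 1))
)

true_pos  <- confusion["1", "1"]
accuracy  <- sum(diag(confusion)) / sum(confusion)
recall    <- true_pos / sum(confusion["1", ])  # share of actual defaulters that were caught
precision <- true_pos / sum(confusion[, "1"])  # share of flagged applicants who actually default

print(confusion)
```

For a lender, recall on the defaulter class is often worth tracking alongside accuracy, since a missed defaulter and a wrongly rejected applicant carry different costs.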
✅ Identify probable defaulters.
✅ Provide suggestions for informed decision-making.
✅ Compare features to identify loan applicants most in need.
- Banks & Financial Institutions – To improve loan approval strategies.
- Students – To gauge their chances when seeking education loans.
- Home & Vehicle Loan Borrowers – To understand loan approval probabilities.
- Non-Profit Organizations – To target financial aid effectively.
Two datasets were used for this analysis:
- current_app.csv – Contains details of current loan applications, including repayment status.
- previous_app.csv – Holds information on past loan applications (approved, rejected, canceled).
Both files exceeded GitHub's upload limits, so they are stored as compressed .zip archives in the data folder.
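Once the archives are extracted, the two files can be read into R along these lines. This is a minimal sketch: the helper name `load_applications` is hypothetical, and the default `data` directory is assumed from the extraction step in this README.

```r
# Read both application files from a data directory.
# load_applications is a hypothetical helper, not part of the repository.
load_applications <- function(data_dir = "data") {
  paths <- file.path(data_dir, c("current_app.csv", "previous_app.csv"))
  missing <- paths[!file.exists(paths)]
  if (length(missing) > 0) {
    stop("Missing file(s): ", paste(missing, collapse = ", "),
         ". Unzip the archives in the data folder first.")
  }
  list(
    current  = read.csv(paths[1], stringsAsFactors = FALSE),
    previous = read.csv(paths[2], stringsAsFactors = FALSE)
  )
}

# Only runs when the extracted files are actually present.
if (all(file.exists(file.path("data", c("current_app.csv", "previous_app.csv"))))) {
  apps <- load_applications("data")
  str(apps$current, list.len = 5)
}
```

For large files, `data.table::fread()` from the data.table package listed in the dependencies is usually faster than `read.csv()`.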
EDA was performed to:
✅ Discover patterns.
✅ Spot anomalies.
✅ Test hypotheses.
✅ Visualize key features.
📌 Loan Type: More people opt for cash loans than revolving loans.
📌 Marital Status: Married individuals are the most frequent loan applicants.
📌 Property Ownership: Homeowners are more likely to take loans, possibly because their property can serve as collateral.
📌 Gender-Based Defaulting: Single men default more often than single women.
📌 Education Impact: People with academic degrees take higher loan amounts.
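Group-level comparisons like the ones above can be computed with a simple aggregation. The sketch below uses a tiny made-up data frame; the column names (`gender`, `marital_status`, `defaulted`) and the values are assumptions for illustration and do not reproduce the project's results.

```r
# Made-up toy data; column names and values are illustrative only.
toy <- data.frame(
  gender         = c("M", "M", "F", "F", "M", "F", "M", "F"),
  marital_status = c("Single", "Single", "Single", "Single",
                     "Married", "Married", "Married", "Married"),
  defaulted      = c(1, 0, 0, 0, 0, 0, 1, 0)
)

# How many applicants fall into each marital status.
print(table(toy$marital_status))

# Default rate per gender / marital-status group (mean of a 0/1 flag).
rates <- aggregate(defaulted ~ gender + marital_status, data = toy, FUN = mean)
names(rates)[3] <- "default_rate"
print(rates)
```

The same pattern extends to other categorical features, such as loan type or education level, by changing the grouping columns in the formula.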
Three models were implemented and compared for predicting loan defaults:
- Decision Tree: Models decision paths to classify defaulters.
- Naive Bayes: Uses probability-based classification.
- K-Nearest Neighbors (KNN): Classifies based on nearest data points.
Feature transformations were applied before model fitting to improve accuracy.
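The comparison described above might look roughly like the following. This is a sketch under stated assumptions, not the project's actual script: the data is synthetic, the column names are invented, and the choice of the `rpart`, `class`, and `e1071` packages is an assumption (this README does not name the modeling packages). Scaling before KNN stands in for the feature transformations mentioned above.

```r
library(rpart)  # decision trees; a recommended package shipped with standard R
library(class)  # k-nearest neighbours; a recommended package shipped with standard R

set.seed(42)
n <- 200
# Synthetic data: the columns and the default rule are illustrative assumptions.
income      <- runif(n, 20, 120)
loan_amount <- runif(n, 5, 60)
default     <- factor(
  ifelse(loan_amount / income + rnorm(n, sd = 0.1) > 0.5, "yes", "no"),
  levels = c("no", "yes")
)
loans <- data.frame(income, loan_amount, default)

# 70/30 train/test split.
train_idx <- sample(n, size = round(0.7 * n))
train <- loans[train_idx, ]
test  <- loans[-train_idx, ]

# Decision tree.
tree_fit  <- rpart(default ~ income + loan_amount, data = train, method = "class")
tree_pred <- predict(tree_fit, newdata = test, type = "class")

# KNN: scale features using training-set statistics only.
train_x <- scale(train[, c("income", "loan_amount")])
test_x  <- scale(test[, c("income", "loan_amount")],
                 center = attr(train_x, "scaled:center"),
                 scale  = attr(train_x, "scaled:scale"))
knn_pred <- knn(train = train_x, test = test_x, cl = train$default, k = 5)

accuracy <- c(
  decision_tree = mean(tree_pred == test$default),
  knn           = mean(knn_pred == test$default)
)

# Naive Bayes, only if the optional e1071 package is installed.
if (requireNamespace("e1071", quietly = TRUE)) {
  nb_fit  <- e1071::naiveBayes(default ~ income + loan_amount, data = train)
  nb_pred <- predict(nb_fit, newdata = test)
  accuracy["naive_bayes"] <- mean(nb_pred == test$default)
}

print(round(accuracy, 3))
```

Distance-based methods such as KNN are sensitive to feature scale, which is why the test set is standardized with the training set's center and spread rather than its own.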
Before using the data, extract the files inside the data folder:
cd data
unzip current_app.csv.zip
unzip previous_app.csv.zip
- Clone the Repository
git clone https://github.com/khushijain03/credit_analysis.git
cd credit_analysis
- Install Dependencies (R Libraries)
install.packages(c("dplyr", "ggplot2", "tidyverse", "data.table"))
- Run the Analysis
source("src/analysis_script.R")
Contributions are welcome! If you have suggestions for improvement, feel free to open an issue or a pull request.