GitHub - khushijain03/credit_analysis: Files associated with Credit_Analysis Project

Credit_analysis

Overview

This project analyzes loan application data to predict potential loan defaulters. It combines statistical analysis with machine learning models to provide insights that help banks and financial institutions make informed lending decisions.

Problem Definition

Loan defaulter prediction is a binary classification problem. Borrowers are classified as defaulters or non-defaulters based on their financial history, loan details, and personal information.
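In R, this framing usually means encoding the repayment outcome as a two-level factor and checking how balanced the two classes are before fitting anything. The sketch below uses a small made-up table; the column names `applicant_id` and `TARGET` (1 = defaulter) are assumptions, since this README does not name the repayment-status column in `current_app.csv`.

```r
# Hypothetical toy data: TARGET is an assumed column name (1 = defaulter, 0 = non-defaulter)
apps <- data.frame(
  applicant_id = 1:8,
  TARGET       = c(0, 0, 1, 0, 0, 1, 0, 0)
)

# Encode the target as a two-level factor so classifiers treat it as a class label
apps$TARGET <- factor(apps$TARGET, levels = c(0, 1),
                      labels = c("non_defaulter", "defaulter"))

# Share of each class; a strong imbalance means accuracy alone can be misleading
class_share <- prop.table(table(apps$TARGET))
print(class_share)
```

If defaulters turn out to be a small minority in the real data, per-class metrics are more informative than overall accuracy when comparing models.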

The goal is to:

✅ Identify probable defaulters.
✅ Provide suggestions for informed decision-making.
✅ Compare applicant features to identify the applicants most in need of a loan.

Target Audience

Banks & Financial Institutions – To improve loan approval strategies.
Students – Seeking education loans.
Home & Vehicle Loan Borrowers – To understand loan approval probabilities.
Non-Profit Organizations – To target financial aid effectively.

Dataset

Two datasets were used for this analysis:

current_app.csv – Contains details of current loan applications, including repayment status.
previous_app.csv – Holds information on past loan applications (approved, rejected, canceled).

Because these files exceed GitHub's upload size limit, they are stored in compressed (zip) form.
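The README does not say how the two tables are combined. One common approach, sketched here as an assumption, is to summarize each applicant's previous applications into a few counts and left-join them onto the current applications, so every current applicant keeps exactly one row. The column names `applicant_id`, `TARGET`, and `contract_status` and the toy rows are hypothetical.

```r
# Hypothetical toy versions of the two tables; column names are assumptions
current <- data.frame(
  applicant_id = c(101, 102, 103),
  TARGET       = c(0, 1, 0)
)
previous <- data.frame(
  applicant_id    = c(101, 101, 102, 102, 102),
  contract_status = c("Approved", "Refused", "Refused", "Refused", "Canceled")
)

# One row per applicant: number of past applications and how many were refused
prev_summary <- aggregate(
  x   = data.frame(n_prev    = 1,
                   n_refused = as.integer(previous$contract_status == "Refused")),
  by  = list(applicant_id = previous$applicant_id),
  FUN = sum
)

# Left join keeps every current applicant, even those with no history
model_data <- merge(current, prev_summary, by = "applicant_id", all.x = TRUE)

# No previous applications: replace the NA counts from the join with zero
model_data$n_prev[is.na(model_data$n_prev)]       <- 0
model_data$n_refused[is.na(model_data$n_refused)] <- 0
print(model_data)
```

Aggregating before joining avoids duplicating current applications when an applicant has several past applications.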

Exploratory Data Analysis (EDA)

EDA was performed to:
✅ Discover patterns.
✅ Spot anomalies.
✅ Test hypotheses.
✅ Visualize key features.

Key Insights:

📌 Loan Type: More people opt for cash loans than revolving loans.
📌 Marital Status: Married individuals are the most frequent loan applicants.
📌 Property Ownership: Homeowners are more likely to take loans, possibly because they can offer property as collateral.
📌 Gender-Based Defaulting: Single men default more often than single women.
📌 Education Impact: Applicants with academic degrees tend to take out larger loans.
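Group comparisons such as the gender and marital-status insight above come down to the default rate per group, i.e. the mean of a 0/1 target within each group. A minimal base-R sketch on made-up rows; the column names `gender`, `family_status`, and `TARGET` are assumptions, and the numbers are illustrative, not the project's results.

```r
# Hypothetical toy data; column names and values are illustrative only
apps <- data.frame(
  gender        = c("M", "M", "M", "F", "F", "F", "M", "F"),
  family_status = c("Single", "Single", "Married", "Single",
                    "Single", "Married", "Married", "Married"),
  TARGET        = c(1, 0, 0, 0, 0, 0, 1, 0)
)

# Default rate (mean of the 0/1 target) for every gender x family-status group
default_rate <- aggregate(TARGET ~ gender + family_status, data = apps, FUN = mean)
names(default_rate)[names(default_rate) == "TARGET"] <- "default_rate"
print(default_rate)
```

Reporting the group sizes alongside the rates (for example with `FUN = length`) helps separate real differences from noise in small groups.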

Machine Learning Models Used

Implemented and compared multiple models for predicting loan defaults:

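Decision Tree: Splits applicants on feature thresholds, building decision paths that classify defaulters.
Naive Bayes: Applies Bayes' theorem, assuming features are conditionally independent given the class.
K-Nearest Neighbors (KNN): Classifies an applicant by majority vote among its nearest data points in feature space.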
Feature transformations were applied before model fitting to enhance accuracy.
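The README names the three models and mentions feature transformations without giving specifics, so the following is only a sketch of how such a comparison can be set up, not the project's code. It runs on simulated data, assumes z-score standardization as the transformation, uses `class::knn` and `rpart::rpart` (from recommended packages shipped with standard R installations), and implements a Gaussian Naive Bayes by hand in base R rather than relying on a particular package. The names `income`, `credit`, and `status` are hypothetical.

```r
# Simulated stand-in data (hypothetical columns), not current_app.csv
set.seed(42)
n <- 200
loans <- data.frame(
  income = c(rnorm(n / 2, mean = 60, sd = 10), rnorm(n / 2, mean = 35, sd = 10)),
  credit = c(rnorm(n / 2, mean = 200, sd = 50), rnorm(n / 2, mean = 400, sd = 50)),
  status = factor(rep(c("non_defaulter", "defaulter"), each = n / 2))
)

# 70/30 train/test split
train_idx <- sample(n, size = 0.7 * n)
train <- loans[train_idx, ]
test  <- loans[-train_idx, ]

# Assumed transformation: z-scores computed from training statistics only,
# so KNN distances are not dominated by the feature with the largest scale
features <- c("income", "credit")
mu      <- colMeans(train[features])
sigma   <- sapply(train[features], sd)
train_x <- scale(train[features], center = mu, scale = sigma)
test_x  <- scale(test[features],  center = mu, scale = sigma)

# K-Nearest Neighbors with k = 5
knn_pred <- class::knn(train = train_x, test = test_x, cl = train$status, k = 5)

# Decision tree; trees are insensitive to feature scaling, so raw features are used
tree_fit  <- rpart::rpart(status ~ income + credit, data = train, method = "class")
tree_pred <- predict(tree_fit, newdata = test, type = "class")

# Gaussian Naive Bayes in base R: class priors plus per-class mean/sd per feature
nb_fit <- function(x, y) {
  classes <- levels(y)
  list(
    classes = classes,
    prior   = as.numeric(table(y)) / length(y),
    mean    = t(sapply(classes, function(k) colMeans(x[y == k, , drop = FALSE]))),
    sd      = t(sapply(classes, function(k) apply(x[y == k, , drop = FALSE], 2, sd)))
  )
}
nb_predict <- function(model, x) {
  # Log-posterior up to a constant: log prior + sum of per-feature log-densities
  scores <- sapply(seq_along(model$classes), function(j) {
    ll <- rep(log(model$prior[j]), nrow(x))
    for (f in seq_len(ncol(x))) {
      ll <- ll + dnorm(x[, f], mean = model$mean[j, f], sd = model$sd[j, f], log = TRUE)
    }
    ll
  })
  scores <- matrix(scores, nrow = nrow(x))
  factor(model$classes[max.col(scores, ties.method = "first")], levels = model$classes)
}
nb_pred <- nb_predict(nb_fit(train_x, train$status), test_x)

# Compare the models on held-out accuracy
accuracy <- function(pred, truth) mean(as.character(pred) == as.character(truth))
results <- c(
  knn           = accuracy(knn_pred,  test$status),
  decision_tree = accuracy(tree_pred, test$status),
  naive_bayes   = accuracy(nb_pred,   test$status)
)
print(round(results, 3))
```

The simulated classes are well separated, so all three models score highly here; on real loan data the comparison is only meaningful with the same split and transformations applied to every model.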

How to Run the Project

Clone the Repository

git clone https://github.com/khushijain03/credit_analysis.git
cd credit_analysis

Extract the Data

In the current repository layout, the datasets are stored as zip archives in the top-level folder. Extract them before running the analysis:

unzip current_app.csv.zip
unzip previous_app.csv.zip

Install Dependencies (R Libraries)

install.packages(c("dplyr", "ggplot2", "tidyverse", "data.table"))
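If some of these libraries are already installed, a small guard avoids reinstalling them on every setup. A minimal base-R sketch; the helper names `missing_packages` and `install_missing` are hypothetical.

```r
# Packages listed above; tidyverse depends on dplyr and ggplot2, so they install with it
required_pkgs <- c("dplyr", "ggplot2", "tidyverse", "data.table")

# Return the packages from pkgs that are not installed in any library on .libPaths()
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Install only what is missing, so re-running the setup step is cheap
install_missing <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) install.packages(to_install)
  invisible(to_install)
}

# Usage: install_missing(required_pkgs)
```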

Run the Analysis

Start R in the repository folder (for example by opening Data.Rproj in RStudio) and run the analysis script:

source("DA.R")

Contributing

I welcome contributions! If you find any improvements, feel free to open an issue or a pull request.
