Credit Risk Analysis Mini Project

Overview

This project analyzes credit risk using a dataset of customer attributes. A logistic regression classifier predicts whether a person is a good or bad credit risk based on their characteristics.

Results and Conclusion

The logistic regression model achieved approximately 78% accuracy on the test set, and the confusion matrix shows how that performance breaks down across the good and bad credit risk classes. Banks can use a model like this to screen applicants from readily available customer information and flag high-risk customers early, saving time and resources before conducting more detailed evaluations. The project provides a foundational approach to credit risk analysis; future improvements could include experimenting with more advanced models and feature engineering techniques.
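
As a minimal sketch of how the two reported metrics can be computed with scikit-learn, the example below uses made-up toy labels, not the project's predictions. The 1 = good, 2 = bad coding is an assumption taken from the Data Description.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Toy labels for illustration only (assumed coding: 1 = good, 2 = bad credit risk)
y_true = [1, 1, 1, 2, 2, 1, 2, 1]
y_pred = [1, 1, 2, 2, 1, 1, 2, 1]

# Fraction of predictions that match the true label
accuracy = accuracy_score(y_true, y_pred)

# Rows are actual classes, columns are predicted classes, in the order [1, 2]
matrix = confusion_matrix(y_true, y_pred, labels=[1, 2])

print(f"accuracy = {accuracy:.2f}")
print(matrix)
```

Accuracy alone can hide how the errors are split between the two classes, which is why a confusion matrix is usually read alongside it.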

Data Description

Two datasets are utilized in this project:

credit_s.csv: Contains customer attributes but lacks a target variable for credit risk.
- Features include:
  - Age (numeric)
  - Sex (text): male, female
  - Job (numeric): 0 - unskilled and non-resident, 1 - unskilled and resident, 2 - skilled, 3 - highly skilled
  - Housing (text): own, rent, or free
  - Saving (text): little, moderate, quite rich, rich
  - Checking (text): little, moderate, rich
  - Credit_amount (numeric, in DM)
  - Duration (numeric, in months)
  - Purpose (text): various categories
credit_g.csv: Contains customer attributes along with the target variable for credit risk.
- Key columns after renaming:
  - status: Credit risk status (1 for good, 2 for bad)
  - Duration: Duration of the credit
  - Credit_amount: Amount of credit
  - Credit_risk: Target variable
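
To make the column layout of credit_s.csv concrete, here is a small sketch that reads a few rows with pandas and decodes the Job codes listed above. The sample rows, the Purpose values, and the `Job_label` column are hypothetical and only illustrate the schema; the real data comes from credit_s.csv.

```python
import io

import pandas as pd

# Hypothetical sample rows in the documented column order (not the real data)
sample_csv = io.StringIO(
    "Age,Sex,Job,Housing,Saving,Checking,Credit_amount,Duration,Purpose\n"
    "67,male,2,own,,little,1169,6,radio/TV\n"
    "22,female,2,own,little,moderate,5951,48,radio/TV\n"
    "49,male,1,own,little,,2096,12,education\n"
)
df = pd.read_csv(sample_csv)  # empty fields are read as missing values (NaN)

# Job codes as described in the feature list
JOB_LABELS = {
    0: "unskilled and non-resident",
    1: "unskilled and resident",
    2: "skilled",
    3: "highly skilled",
}
# Job_label is a hypothetical helper column added for readability
df["Job_label"] = df["Job"].map(JOB_LABELS)

print(df.dtypes)
print(df[["Job", "Job_label"]])
```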

Data Visualization

The following visualizations are created to understand the dataset better:

Pie chart for Credit_risk
Histogram for Age
Boxplot comparing Age distributions by Credit_risk
Bar plot comparing Credit_risk across different Sex groups
Scatterplot showing the relationship between Age and Credit_amount colored by Credit_risk
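
A rough sketch of three of these plots using pandas and matplotlib is shown below. The data frame holds hypothetical toy rows, and the figure layout and output handling are assumptions; the notebook itself may use seaborn or plotly for the same charts.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy rows standing in for credit_g.csv
df = pd.DataFrame({
    "Age": [22, 35, 47, 29, 53, 41],
    "Credit_amount": [1200, 5400, 2300, 800, 7600, 3100],
    "Credit_risk": [1, 1, 2, 2, 1, 2],
})

fig, axes = plt.subplots(1, 3, figsize=(12, 4))

# Pie chart: share of each Credit_risk class
df["Credit_risk"].value_counts().plot.pie(ax=axes[0], autopct="%1.0f%%")
axes[0].set_ylabel("")
axes[0].set_title("Credit_risk")

# Histogram of Age
axes[1].hist(df["Age"], bins=5)
axes[1].set_title("Age")

# Scatterplot of Age against Credit_amount, one color per Credit_risk class
for risk, group in df.groupby("Credit_risk"):
    axes[2].scatter(group["Age"], group["Credit_amount"], label=f"Credit_risk={risk}")
axes[2].set_xlabel("Age")
axes[2].set_ylabel("Credit_amount")
axes[2].legend()

fig.tight_layout()
```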

Data Management

Summary statistics for key features are calculated.
Missing values in categorical columns are handled by encoding them as a new category, "Unknown".
Categorical variables are converted to dummy variables.
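
The three steps above can be sketched with pandas as follows. The toy rows are hypothetical, and the choice of which columns count as categorical is an assumption based on the Data Description.

```python
import pandas as pd

# Hypothetical toy rows; None marks a missing value
df = pd.DataFrame({
    "Age": [67, 22, 49],
    "Housing": ["own", "rent", "free"],
    "Saving": [None, "little", "rich"],
    "Checking": ["little", "moderate", None],
})

# Summary statistics for the numeric columns
print(df.describe())

# Treat missing categorical values as their own category
categorical = ["Housing", "Saving", "Checking"]
df[categorical] = df[categorical].fillna("Unknown")

# One dummy (indicator) column per category level
encoded = pd.get_dummies(df, columns=categorical)
print(encoded.columns.tolist())
```

Keeping "Unknown" as a category lets the model use the fact that a value was missing instead of dropping those rows.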

Model Training

A logistic regression model is trained using the processed dataset:

The data is split into training and testing sets, reserving 20% for testing.

X_train, X_test, y_train, y_test = train_test_split(
    features_standardized, target, test_size=0.2, random_state=99
)

Five-fold cross-validation is used to select the L2 regularization strength (C) from 20 candidate values.

logistic_regression = LogisticRegressionCV(
    cv=5, solver='lbfgs', multi_class='multinomial', penalty='l2',
    Cs=20, random_state=100, n_jobs=-1
)
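
For a self-contained view of the whole training step, the sketch below runs standardization, the 80/20 split, and cross-validated logistic regression end to end. It uses synthetic data as a stand-in for the processed dataset (an assumption, so the accuracy it prints says nothing about the project's 78%). It also omits `multi_class`, which recent scikit-learn releases deprecate, and sets `max_iter` as an extra safeguard for convergence.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the processed dataset (assumption: not the real data)
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 4))
# Labels use the 1 = good, 2 = bad coding and depend on the first feature plus noise
target = np.where(features[:, 0] + 0.5 * rng.normal(size=200) > 0, 2, 1)

# Standardize features to zero mean and unit variance
features_standardized = StandardScaler().fit_transform(features)

# Reserve 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    features_standardized, target, test_size=0.2, random_state=99
)

# Five-fold cross-validation over 20 candidate values of C (L2 penalty is the default)
model = LogisticRegressionCV(
    cv=5, Cs=20, solver="lbfgs", random_state=100, max_iter=1000
)
model.fit(X_train, y_train)

test_accuracy = model.score(X_test, y_test)
print(f"test accuracy: {test_accuracy:.2f}")
```

The sketch standardizes before splitting because the snippet above passes `features_standardized` to `train_test_split`. Fitting the scaler on the training split only would avoid leaking test-set statistics into training.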

Installation

To run this project, you will need Python and the following libraries:

pandas
numpy
matplotlib
seaborn
plotly
scikit-learn

Usage

Clone this repository to your local machine.
Update the file paths in the code to point to your datasets (credit_s.csv and credit_g.csv).
Run the notebook (Credit-Risk-Analysis-Mini-Project.ipynb) to execute the analysis.
