A predictive analysis project that applies three supervised learning models and one unsupervised model to a credit score dataset to assess loan risk. The dataset includes details such as ID, gender, and income; I added a custom "status" column as the target variable for learning purposes. Accuracy is low, and the project is intended for learning only.

amarrr11/Predicting_accuracy-CreditCardDataset

Predictive Analysis on Credit Score Dataset

This project applies various machine learning techniques to a credit score dataset to predict loan risk levels for customers. By exploring both supervised and unsupervised learning models, the project demonstrates and compares the effectiveness of multiple algorithms, providing a hands-on learning experience.


Project Overview

The primary goal of this project was to build, train, and test predictive models on a Kaggle credit score dataset. The dataset includes customer attributes such as ID, gender, income, and education as predictor variables. Since a target variable was not provided, I created a new column, STATUS, with random binary values (0 and 1) representing the "low risk" and "high risk" categories. Because these labels are random and unrelated to the predictors, no model can be expected to do much better than chance, so the reported accuracies say nothing about real-world loan risk. The setup still allows practical model testing and comparison.
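The effect of a randomly assigned target can be illustrated with a small sketch. This is not the project's code (the project is written in R); it is a minimal Python illustration, and the `income` field, the toy records, the threshold rule, and the `seed` parameter are all assumptions made up for the example.

```python
import random

def add_random_status(rows, seed=42):
    """Return copies of rows with a random binary STATUS (0 = low risk, 1 = high risk)."""
    rng = random.Random(seed)
    return [{**row, "STATUS": rng.randint(0, 1)} for row in rows]

# Hypothetical toy records standing in for the Kaggle data.
customers = [{"income": 50_000 + 1_000 * i} for i in range(1000)]
labeled = add_random_status(customers)

# A hypothetical rule: flag the higher-income half of the customers as class 1.
predictions = [1 if row["income"] >= 550_000 else 0 for row in labeled]

# STATUS was drawn independently of income, so the rule should score near 0.5.
hits = sum(pred == row["STATUS"] for pred, row in zip(predictions, labeled))
accuracy = hits / len(labeled)
print(f"accuracy of an income rule against random labels: {accuracy:.2f}")
```

Any other rule built on the predictors would behave the same way, which is why the accuracies in this project are best read as a check that the pipeline runs rather than as a measure of model quality.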

Steps in the Project

  1. Data Preprocessing and Cleaning

    • Loaded the dataset and removed columns with minimal impact on prediction, like ID, CODE_GENDER, and FLAG_EMAIL.
    • Converted DAYS_BIRTH and DAYS_EMPLOYED to years and took absolute values to ensure positivity.
    • Transformed categorical columns (e.g., FLAG_OWN_CAR, NAME_INCOME_TYPE) into factors for compatibility.
  2. Data Normalization and Encoding

    • Applied normalization to numeric columns for consistent scaling.
    • Encoded categorical columns numerically, converting factor levels to integers to work with all models.
  3. Train-Test Split

    • Split data into 70% training and 30% testing sets, so that accuracy is estimated on rows the models did not see during training.
  4. Modeling

    • K-Nearest Neighbors (KNN): Used KNN with k=10 as a baseline model to predict loan risk, assessing accuracy by comparing predictions against actual labels.
    • Naive Bayes: Implemented Naive Bayes (using the e1071 library), a probabilistic approach that assumes independence among predictors.
    • Decision Tree: Trained a decision tree classifier to predict loan risk, visualized with rpart.plot to observe decision paths.
    • K-Means Clustering (Unsupervised): Applied K-means clustering with two clusters to explore patterns in the data, visualizing cluster centers and groupings.
  5. Accuracy Comparisons

    • Calculated and compared accuracies for KNN, Naive Bayes, and Decision Tree models.
    • Created a bar chart with ggplot2 to compare model accuracies visually.
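The steps above can be sketched end to end on toy data. The project itself is written in R (class, e1071, rpart); the following is only a minimal Python illustration of the same ideas. It assumes min-max scaling for the normalization step and a divisor of 365 for the day-to-year conversion, neither of which is confirmed by the project description, and the toy columns are invented for the example.

```python
import math
import random

def days_to_years(days):
    """Convert a (typically negative) day count to positive years (assumes 365 days/year)."""
    return abs(days) / 365

def min_max_normalize(column):
    """Rescale a list of numbers to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def train_test_split(rows, train_fraction=0.7, seed=1):
    """Shuffle a copy of the rows and split it into training and test lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def knn_predict(train, features, k=10):
    """Majority vote among the k training rows nearest to `features` (ties go to class 0)."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], features))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > len(nearest) else 0

def accuracy(train, test, k=10):
    """Fraction of test rows whose predicted label matches the actual label."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)

# Hypothetical toy columns; the real data comes from the Kaggle CSV.
rng = random.Random(0)
days_birth = [-rng.randint(7000, 25000) for _ in range(200)]
incomes = [rng.uniform(20_000, 200_000) for _ in range(200)]
status = [rng.randint(0, 1) for _ in range(200)]  # random target, as in the project

ages = min_max_normalize([days_to_years(d) for d in days_birth])
scaled_income = min_max_normalize(incomes)
rows = [((a, inc), s) for a, inc, s in zip(ages, scaled_income, status)]

train, test = train_test_split(rows)
knn_accuracy = accuracy(train, test, k=10)
print(len(train), len(test))
print(f"KNN accuracy: {knn_accuracy:.2f}")
```

Scaling before KNN matters because the model is distance based: without it, a column measured in tens of thousands (income) would dominate a column measured in tens (age in years).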

Visualizations

  • Decision Tree: Visualized to display classification rules.
  • K-Means Clustering Plot: Displays data points grouped by clusters, showing natural groupings.
  • Accuracy Comparison Chart: Highlights accuracy for each model to identify the best-performing algorithm.
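The grouping shown in the K-means plot can be sketched as follows. The project uses R's kmeans; this is a minimal Python version of Lloyd's algorithm for illustration only. The two toy blobs, the `iterations` cap, and the `seed` parameter are assumptions made for the example.

```python
import math
import random

def kmeans(points, k=2, iterations=20, seed=0):
    """Lloyd's algorithm on a list of coordinate tuples; returns (centers, assignments)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct data points
    assignments = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the centers are stable
        assignments = new_assignments
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centers, assignments

# Two hypothetical, well-separated groups of already-normalized points.
blob_a = [(0.10 + 0.01 * i, 0.10 + 0.01 * j) for i in range(5) for j in range(5)]
blob_b = [(0.80 + 0.01 * i, 0.80 + 0.01 * j) for i in range(5) for j in range(5)]
points = blob_a + blob_b

centers, assignments = kmeans(points, k=2)
print(centers)
```

Because K-means never sees the STATUS column, its clusters reflect structure in the predictors only and should not be expected to line up with the random risk labels.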

Libraries and Tools Used

  • Data Manipulation: dplyr, plus the base R functions lapply and as.factor
  • Modeling: class (KNN), e1071 (Naive Bayes), rpart (Decision Tree), and the kmeans function from the stats package (K-means clustering)
  • Visualization: ggplot2 (bar chart), rpart.plot (decision tree visualization)

Conclusion

This project demonstrates a comprehensive approach to predictive analysis using multiple machine learning models. Although the dataset lacks a genuine target variable, the random target column enables an educational exploration of various classification techniques.
