Identify risky loan applicants, or the drivers that lead to risky loan applications, so that such loans can be reduced, thereby cutting down the amount of credit loss. Identifying such applicants using EDA is the aim of this case study. By doing this exercise we will understand the basics of univariate and bivariate analysis on the data.
- [General Info]
- [Technologies Used]
- [Conclusions]
- [Acknowledgements]
- A finance company provides loans to various users, but many are unable to repay, resulting in financial loss.
- The objective of this study is to identify the variables that contribute to loans being charged off or defaulted.
- For this analysis, we initially used a filtered subset of the data to identify trends within the charged-off loans.
- After that, we included the entire data set for all loan statuses (Charge Off, Current, and Fully Paid).
- Removing the rows with Current loan status did not change any of the results.
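The filtering steps above can be sketched with pandas as follows. This is a minimal illustration on made-up rows, not the study's actual code or data: the column names `loan_status` and `loan_amnt` and the status label `"Charged Off"` are assumptions about how the data set is laid out.

```python
import pandas as pd

# Made-up rows for illustration only; column names and labels are assumptions.
loans = pd.DataFrame({
    "loan_status": ["Charged Off", "Fully Paid", "Current", "Fully Paid", "Charged Off"],
    "loan_amnt": [25000, 8000, 12000, 5000, 30000],
})

# Step 1: filtered view containing only the charged-off loans.
charged_off = loans[loans["loan_status"] == "Charged Off"]

# Step 2: the full data versus the data with "Current" rows removed.
without_current = loans[loans["loan_status"] != "Current"]

# Per-status summaries for the remaining statuses are the same either way,
# because dropping one status does not touch the rows of the others.
median_all = loans.groupby("loan_status")["loan_amnt"].median()
median_without_current = without_current.groupby("loan_status")["loan_amnt"].median()

print(median_all)
print(median_without_current)
```

Comparing the two printed summaries is one simple way to check whether dropping a status changes any per-status result.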
• Candidates with high loan amounts are more likely to charge off
• Grades E, F and G have a higher tendency to charge off
• Grades E, F and G have higher interest rates than the other grades
• Grades E, F and G have higher debt-to-income ratios
• Candidates with high income are less likely to default
• Candidates with high interest rates are more likely to default
• Debt-to-income ratio (DTI) is better for fully paid candidates
• Candidates with shorter employment length (experience) are more likely to not repay the loan
• Charge-offs are higher for high installments
• Customers who rent or who are not verified are more likely to default and are hence high-risk customers
• Candidates most often want loans for home and small business purposes, and they take them for shorter terms
• Installments for credit card, debt consolidation, small_business and house loans are higher than for other purposes
• Candidates who have taken loans for small business, as well as for debt consolidation, are more likely to default
• The highest percentage of defaulters is in the range below 10k, with rented or mortgaged homes
• The most charge-offs are from California, which suggests better checks are needed in CA, as well as in FL and NY
• DTI is negatively correlated with annual income and loan amount; recoveries is only very weakly correlated with most variables and does not give much insight
• Interest rate vs loan amount shows a higher median loan amount for higher interest rates
-- Hence loan grade, home ownership, interest rate, purpose, address state and employee income are some of the variables that identify high-risk customers
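Findings such as the charge-off tendency per grade come from bivariate analysis: grouping by one variable and summarising another. The sketch below shows one way to compute this with pandas. The rows are invented purely for illustration and do not reproduce the study's numbers; the column names (`grade`, `loan_status`, `int_rate`) and the helper `charge_off_rate_by` are hypothetical.

```python
import pandas as pd

# Invented rows for illustration only; not the study's data or results.
loans = pd.DataFrame({
    "grade": ["A", "A", "B", "E", "E", "G", "G", "B"],
    "loan_status": ["Fully Paid", "Fully Paid", "Fully Paid", "Charged Off",
                    "Fully Paid", "Charged Off", "Charged Off", "Charged Off"],
    "int_rate": [6.5, 7.1, 10.2, 18.9, 17.5, 22.3, 21.0, 11.0],
})

# Boolean flag: True where the loan was charged off.
loans["charged_off"] = loans["loan_status"].eq("Charged Off")


def charge_off_rate_by(df, column):
    """Share of charged-off loans per category of `column`, highest first."""
    return df.groupby(column)["charged_off"].mean().sort_values(ascending=False)


# Categorical vs. outcome: charge-off rate per grade.
rate_by_grade = charge_off_rate_by(loans, "grade")

# Categorical vs. numeric: median interest rate per grade.
median_int_rate_by_grade = loans.groupby("grade")["int_rate"].median()

print(rate_by_grade)
print(median_int_rate_by_grade)
```

The same helper can be pointed at any other categorical column (for example home ownership or purpose, if present in the data) to compare charge-off rates across its categories.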
- Anaconda, Python
- numpy - version 1.24.3
- pandas - version 2.1.1
- seaborn - version 0.12.2
- matplotlib - version 3.7.1
- This project was done jointly by Syed Z Abbas and Naz Akbar.
- To tackle a few problems and to understand seaborn and matplotlib, the following resources were used: [https://seaborn.pydata.org/api.html, https://matplotlib.org/stable/tutorials/index.html, upGrad tutorials]
Created by [@zuhair30] - feel free to contact me, including by email at [email protected]
Supported by Shagufta Naaz Shaikh - reachable at [email protected]
https://github.com/zuhair30/Lending_Club_Case_Study_SYEDandNAAZ
If the above link does not work, or the PPT or Python file is not rendered because of GitHub slowness or a size issue, please use the URL below, which downloads the above repo as a zip file to your local machine.
https://github.com/zuhair30/Lending_Club_Case_Study_SYEDandNAAZ/zipball/master