- Pratteek Shaurya and Sai siva subramanian
This project involves analyzing and exploring a loan dataset to extract meaningful insights and prepare the data for further modeling or visualization. The data contains a variety of borrower-related, loan-related, and credit history information.
The dataset used in this project is loan.csv. Some of the key fields include:
- loan_amnt: The amount of loan applied for.
- int_rate: Interest rate for the loan.
- emp_length: Length of employment in years.
- grade: Loan grade assigned.
- loan_status: The current status of the loan.
The data dictionary provides detailed information about each field.
- Clean the data by handling missing values, duplicates, and inconsistent formats.
- Perform exploratory data analysis (EDA) to identify trends and correlations.
- Prepare a refined dataset for further use in predictive modeling or business decision-making.
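
The cleaning step could look roughly like the sketch below. It is a minimal illustration, not the project's actual notebook code: the function name `clean_loans` is hypothetical, the sample rows are made up, and the raw formats (`int_rate` stored as text like "10.65%", `emp_length` as text like "10+ years" or "< 1 year") are assumptions about how loan.csv encodes these fields.

```python
import pandas as pd


def clean_loans(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy: duplicates removed, rate and tenure numeric."""
    out = df.drop_duplicates().copy()

    # Assumed raw format: interest rate as text such as "10.65%".
    out["int_rate"] = pd.to_numeric(
        out["int_rate"].astype(str).str.rstrip("%").str.strip(),
        errors="coerce",
    )

    # Assumed raw format: "< 1 year", "3 years", "10+ years".
    emp = out["emp_length"].astype(str)
    years = pd.to_numeric(emp.str.extract(r"(\d+)")[0], errors="coerce")
    # Treat "< 1 year" as 0 full years of employment.
    out["emp_length"] = years.mask(emp.str.startswith("<"), 0)

    # Drop rows missing the fields the later analysis depends on.
    out = out.dropna(subset=["loan_amnt", "loan_status"])
    return out.reset_index(drop=True)


# Made-up sample rows, only to exercise the function.
raw = pd.DataFrame({
    "loan_amnt": [5000, 5000, 12000, None],
    "int_rate": ["10.65%", "10.65%", " 15.27%", "7.9%"],
    "emp_length": ["10+ years", "10+ years", "< 1 year", None],
    "grade": ["B", "B", "C", "A"],
    "loan_status": ["Fully Paid", "Fully Paid", "Charged Off", "Current"],
})
clean = clean_loans(raw)
print(clean)
```

With the real file, `raw` would instead come from `pd.read_csv("loan.csv")`. Which columns to treat as required in `dropna` is a judgment call that depends on the downstream model or report.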
Key observations from the analysis:
- The majority of loans are graded as B and C.
- Most borrowers have employment lengths between 1 and 10 years.
- A significant portion of loans have a status of "Fully Paid."
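
Observations like these can be checked with simple frequency tables. The sketch below shows one way to do that with pandas; the helper name `share_by_category` is hypothetical and the rows are toy data, so the printed shares do not reproduce the project's actual figures.

```python
import pandas as pd


def share_by_category(df: pd.DataFrame, column: str) -> pd.Series:
    """Fraction of rows in each category of `column`, largest first."""
    return df[column].value_counts(normalize=True)


# Toy rows for illustration only; not taken from loan.csv.
loans = pd.DataFrame({
    "grade": ["B", "C", "B", "A", "C", "B"],
    "loan_status": [
        "Fully Paid", "Fully Paid", "Charged Off",
        "Fully Paid", "Current", "Fully Paid",
    ],
})

grade_share = share_by_category(loans, "grade")
status_share = share_by_category(loans, "loan_status")
print(grade_share)
print(status_share)
```

The same tables can be passed to Matplotlib or Seaborn bar plots to visualize the distributions.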
- Python: Core programming language.
- Pandas: For data manipulation and cleaning.
- Matplotlib & Seaborn: For data visualization.
- Jupyter Notebook: For documenting and running the analysis interactively.