This project is an end-to-end machine learning system for predicting bank customer churn based on a set of key features such as age, tenure, active membership, and credit score. It uses a pipeline for data ingestion, validation, transformation, model training, and evaluation. The model is deployed via a Flask web application that allows for real-time predictions.
- Project Overview
- Features
- Technologies Used
- Installation
- Running the Application
- Usage
- Model Evaluation
This system predicts whether a customer will churn based on various features like:
- Age
- Tenure
- IsActiveMember status
- Gender
- Geography
- CreditScore
- NumOfProducts
The dataset is highly imbalanced, so SMOTE (Synthetic Minority Over-sampling Technique) was applied to handle the imbalance. Multiple machine learning algorithms were tested, and Gradient Boosting Classifier was found to be the best-performing model based on accuracy, precision, recall, and F1-score.
The project is structured as a modular pipeline and includes data ingestion, validation, feature engineering, model training, and evaluation.
- Data Ingestion: Handles reading and loading data for training and testing.
- Data Validation: Ensures data quality and consistency.
- Data Transformation: Preprocessing and feature engineering using techniques like scaling, encoding, etc.
- Model Training: A range of models are tested, and Gradient Boosting Classifier is used for final predictions.
- Model Evaluation: Metrics like accuracy, precision, recall, and F1-score are calculated.
- Flask Web App: Provides two URLs: one for the default route and another for making predictions.
- Python (for building the pipeline and machine learning models)
- Flask (for deploying the model as a web app)
- MLflow (for tracking model metrics and artifacts)
- Pandas (for data manipulation)
- Scikit-learn (for model building and evaluation)
- SMOTE (for handling imbalanced data)
- Conda (for environment management)
-
Clone the repository:
git clone https://github.com/Debopam-Pritam2014/customer-churn.git cd customer-churn
-
Create and activate the Conda environment:
conda create --name churn_prediction python=3.10 conda activate churn_prediction
-
Install the required dependencies:
pip install -r requirements.txt
-
Run the Flask app:
python app.py
-
The app will start on
http://127.0.0.1:5000/
.
- Default route (
/
): For testing the app. - Prediction route (
/predict
): Accepts input features for making predictions.
Once the application is running, you can access the following:
- Default route:
http://127.0.0.1:5000/
- Prediction route:
http://127.0.0.1:5000/predict
You can use the /predict
route to input features like age
, tenure
, creditscore
, etc., and get back whether the customer is predicted to churn or not.
The model was evaluated using various metrics:
- Accuracy : 86%
- Precision : 66%
- Recall : 62%
- F1-score : 60%
The Gradient Boosting Classifier was found to be the best-performing model for predicting churn.