- Project Overview
- Objectives
- Dataset Overview
- Methodology
- Insights and Recommendations
- Results
- Conclusions
- Acknowledgments
The Ola Driver Churn Analysis project examines driver retention on the Ola platform, a leading ride-hailing service in India. Frequent driver churn disrupts operational consistency and drives up costs related to recruiting, training, and onboarding. This project focuses on identifying drivers likely to churn and delivering actionable insights to improve Ola’s retention strategy.
The project’s main objectives include:
- Predicting which drivers are at risk of leaving.
- Profiling drivers to identify demographic, performance, or behavior patterns linked to churn.
- Enabling data-driven retention by identifying priority drivers for Ola’s retention initiatives, helping to stabilize the driver base and reduce operational costs.
The dataset contains demographic, tenure, and performance-related details for Ola’s driver-partners, forming the basis for understanding patterns and training models to identify drivers at risk.
- Driver_ID: Unique identifier for each driver.
- Age, Gender, and City: Demographic details, essential for driver segmentation.
- Income: Monthly income level of drivers.
- Joining Date: Date of joining Ola.
- Last Working Date: Date of last engagement with Ola (null values indicate active drivers).
- Quarterly Rating: Driver performance rating (scale of 1-5).
- Total Business Value: Monthly revenue contribution, adjusted for cancellations or refunds.
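Because a null `Last Working Date` marks an active driver, the churn target can be derived directly from that column. A minimal pandas sketch, assuming one row per driver and hypothetical column names (`Joining_Date`, `Last_Working_Date`, `Churn`) that may differ from the raw file; if the raw data holds several records per driver, they would need to be aggregated per `Driver_ID` first:

```python
import pandas as pd

# Hypothetical rows; column names are assumptions modeled on the fields listed above.
drivers = pd.DataFrame({
    "Driver_ID": [1, 2, 3, 4],
    "Joining_Date": ["2019-01-15", "2019-06-01", "2020-02-10", "2020-07-20"],
    "Last_Working_Date": ["2019-11-30", None, "2020-05-31", None],
})

def add_churn_label(df: pd.DataFrame) -> pd.DataFrame:
    """Churn = 1 when a Last_Working_Date exists (driver left), 0 when it is null (active)."""
    out = df.copy()
    out["Churn"] = out["Last_Working_Date"].notna().astype(int)
    return out

labeled = add_churn_label(drivers)
print(labeled[["Driver_ID", "Churn"]])  # Churn: 1, 0, 1, 0
```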
The analysis approach follows a structured methodology to handle data, extract insights, and develop predictive models effectively.
Objective: Clean and prepare data to optimize model performance and accuracy.
- Missing Value Handling: Handled missing values in `Last Working Date` (a null marks an active driver rather than missing data) and imputed missing `Income` values.
- Encoding Categorical Variables: Transformed the `Gender` and `City` fields into numerical representations.
- Feature Engineering: Created derived features such as `Service Duration` and `Income-to-Business Ratio`.
- Normalization: Scaled continuous variables like `Income` and `Total Business Value` so that features measured on large scales do not dominate the model.
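The preprocessing steps above can be sketched with pandas and scikit-learn. This is an illustration under assumptions, not the project's code: the column names, the median imputation strategy, one-hot encoding, the `SNAPSHOT_DATE` used to measure tenure for active drivers, and the definition of `Income_to_Business_Ratio` (income divided by business value) are all hypothetical choices.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Assumption: tenure of still-active drivers is measured up to this snapshot date.
SNAPSHOT_DATE = pd.Timestamp("2020-12-31")

def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add the derived Service_Duration (days) and Income_to_Business_Ratio columns."""
    out = df.copy()
    joined = pd.to_datetime(out["Joining_Date"])
    # A null Last_Working_Date means the driver is active, so tenure runs to the snapshot.
    ended = pd.to_datetime(out["Last_Working_Date"]).fillna(SNAPSHOT_DATE)
    out["Service_Duration"] = (ended - joined).dt.days
    # Hypothetical ratio definition; zero business value becomes NaN and is imputed later.
    out["Income_to_Business_Ratio"] = out["Income"] / out["Total_Business_Value"].replace(0, np.nan)
    return out

numeric = ["Income", "Total_Business_Value", "Service_Duration", "Income_to_Business_Ratio"]
categorical = ["Gender", "City"]

preprocess = ColumnTransformer([
    # Continuous features: fill gaps with the median, then standardize.
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    # Categorical features: one-hot encode; categories unseen during fit are ignored.
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

# Hypothetical raw rows with one missing Income value.
raw = pd.DataFrame({
    "Gender": ["M", "F", "M", "F"],
    "City": ["C1", "C2", "C1", "C3"],
    "Income": [30000, np.nan, 45000, 52000],
    "Total_Business_Value": [200000, 150000, 0, 400000],
    "Joining_Date": ["2019-01-15", "2019-06-01", "2020-02-10", "2020-07-20"],
    "Last_Working_Date": ["2019-11-30", None, "2020-05-31", None],
})
X = preprocess.fit_transform(engineer_features(raw))
print(X.shape)  # (4, 9): 4 scaled numeric columns + 2 gender + 3 city indicator columns
```

Keeping these steps in a `ColumnTransformer` means that, when it is placed inside a model pipeline, the imputation medians and scaling statistics are learned from the training split only, which avoids leaking test data into preprocessing.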
Objective: Reveal trends, patterns, and key relationships in the data.
- Income vs. Churn: Scatter plot illustrating income levels among churned and retained drivers to highlight the impact of income on churn.
- City-wise Churn Distribution: Churn rates visualized across cities to examine region-specific trends.
- Quarterly Ratings Distribution: Distribution of driver ratings to analyze the impact of performance on churn.
- Correlation Heatmap: Feature correlation matrix to assess relationships and potential multicollinearity.
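The plots above rest on a few simple aggregates. A sketch of those computations with pandas, using a small hypothetical labeled frame (`Churn` = 1 for churned drivers; column names are assumptions). Plotting, for example with Matplotlib, is left out:

```python
import pandas as pd

# Hypothetical labeled data: one row per driver, Churn = 1 if the driver left.
df = pd.DataFrame({
    "City": ["C1", "C1", "C2", "C2", "C2", "C3"],
    "Income": [28000, 61000, 33000, 35000, 70000, 48000],
    "Quarterly_Rating": [1, 3, 1, 2, 4, 2],
    "Churn": [1, 0, 1, 1, 0, 0],
})

# City-wise churn rate: the mean of a 0/1 label is the fraction of churned drivers.
city_churn = df.groupby("City")["Churn"].mean().sort_values(ascending=False)

# Rating distribution split by churn status (rows: rating, columns: churn label).
rating_dist = pd.crosstab(df["Quarterly_Rating"], df["Churn"])

# Pearson correlation matrix for numeric columns, i.e. the values a heatmap would display.
corr = df[["Income", "Quarterly_Rating", "Churn"]].corr()

print(city_churn, rating_dist, corr.round(2), sep="\n\n")
```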
- Logistic Regression: Baseline classifier for interpretability.
- Random Forest Classifier: Ensemble of decision trees that captures non-linear patterns and provides feature importance scores.
- XGBoost: Gradient-boosted trees, effective for complex relationships and imbalanced data.
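A sketch of how the three models might be trained and compared on train and test ROC-AUC. It uses synthetic data from `make_classification` in place of the preprocessed driver features, and the hyperparameters are illustrative assumptions, not the project's settings. XGBoost's `XGBClassifier` follows the scikit-learn estimator interface, so it fits into the same loop; the sketch skips it if the `xgboost` package is not installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

try:  # XGBoost is optional in this sketch.
    from xgboost import XGBClassifier
except ImportError:
    XGBClassifier = None

# Synthetic stand-in for the preprocessed driver features (about 30% positive class).
X, y = make_classification(n_samples=2000, n_features=10, n_informative=5,
                           weights=[0.7, 0.3], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
}
if XGBClassifier is not None:
    models["XGBoost"] = XGBClassifier(n_estimators=200, max_depth=4,
                                      learning_rate=0.1, eval_metric="logloss")

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Comparing train and test ROC-AUC, as in the results table, exposes overfitting.
    train_auc = roc_auc_score(y_train, model.predict_proba(X_train)[:, 1])
    test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    results[name] = (train_auc, test_auc)
    print(f"{name}: train ROC-AUC={train_auc:.3f}, test ROC-AUC={test_auc:.3f}")
```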
- Accuracy: General prediction correctness.
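- Precision & Recall: Recall measures the share of actual churners the model identifies; precision measures the share of predicted churners who actually churn, so higher precision means fewer false positives.
- F1 Score: Harmonic mean of precision and recall, balancing the two.
- ROC-AUC Score: Evaluates the model's ability to rank churners above non-churners across all classification thresholds.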
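To make the metric definitions concrete, here is a dependency-free Python sketch. The ROC-AUC function uses the rank interpretation (the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner); in practice, `sklearn.metrics` provides `precision_score`, `recall_score`, `f1_score`, and `roc_auc_score`. The toy labels and scores are hypothetical.

```python
from typing import Sequence, Tuple

def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int]:
    """Count true positives, false positives, and false negatives (1 = churned)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, fn

def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    tp, fp, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of flagged drivers who churned
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of churners that were flagged
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability a random churner outscores a random non-churner; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
scores = [0.9, 0.4, 0.2, 0.6, 0.8, 0.1]
print(precision_recall_f1(y_true, y_pred))  # each value is 2/3 here
print(roc_auc(y_true, scores))              # 8/9, about 0.889
```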
Feature importance analysis via Random Forest and XGBoost highlighted factors most relevant to churn prediction.
- Income: Lower-income drivers are more likely to churn.
- Service Duration: Shorter tenure correlates with churn.
- Quarterly Rating: Lower ratings increase churn likelihood.
- City: Regional differences affect churn rates.
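A sketch of a Random Forest feature-importance ranking. The data is synthetic and generated so that lower income, shorter tenure, and lower ratings raise churn probability (an assumption that mirrors the findings above, not the real dataset), so the printed ranking only illustrates the mechanics. XGBoost's scikit-learn wrapper exposes the same `feature_importances_` attribute.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 1000

# Synthetic driver features (hypothetical ranges, not the real data distribution).
X = pd.DataFrame({
    "Income": rng.normal(50_000, 15_000, n),
    "Service_Duration": rng.integers(30, 1500, n),
    "Quarterly_Rating": rng.integers(1, 6, n),
})
# Assumed churn mechanism: lower income, shorter tenure, and lower ratings raise the log-odds.
logit = (-0.0002 * (X["Income"] - 50_000)
         - 0.007 * (X["Service_Duration"] - 765)
         - 0.8 * (X["Quarterly_Rating"] - 3)).to_numpy()
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

model = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)
# Impurity-based importances are normalized to sum to 1; sort them to rank features.
importances = pd.Series(model.feature_importances_, index=X.columns).sort_values(ascending=False)
print(importances.round(3))
```

Impurity-based importances can favor high-cardinality features, so `sklearn.inspection.permutation_importance` on a held-out split is a common cross-check.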
- Income and Churn Correlation: Lower-income drivers are more prone to churn.
- City-Specific Patterns: Certain cities show higher churn rates, possibly due to competition or operational challenges.
- Performance-Linked Churn: Lower-rated drivers are likelier to churn, potentially due to reduced ride assignments or customer preference.
- Feature Correlations: `Service Duration` and `Quarterly Rating` are positively correlated, suggesting that longer-serving drivers tend to receive better ratings.
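One way to examine the income finding beyond a scatter plot is to bin drivers into income quartiles and compare churn rates per band. A sketch with hypothetical data and column names:

```python
import pandas as pd

# Hypothetical labeled data: 1 = churned, 0 = active.
df = pd.DataFrame({
    "Income": [22000, 25000, 31000, 36000, 42000, 47000, 55000, 60000],
    "Churn":  [1, 1, 1, 0, 1, 0, 0, 0],
})

# Split drivers into four equal-sized income bands, then take the churn rate per band.
df["Income_Band"] = pd.qcut(df["Income"], q=4,
                            labels=["Q1 (lowest)", "Q2", "Q3", "Q4 (highest)"])
churn_by_band = df.groupby("Income_Band", observed=True)["Churn"].mean()
print(churn_by_band)  # Q1 (lowest) 1.0, Q2 0.5, Q3 0.5, Q4 (highest) 0.0
```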
- Incentivize Low-Income Drivers: Provide incentives to improve earnings and reduce churn.
- City-Specific Retention Strategies: Implement regional retention strategies in high-churn cities.
- Performance Improvement Programs: Offer training for lower-rated drivers to boost performance and satisfaction.
- Tenure-Based Rewards: Introduce rewards based on service duration to promote long-term retention.
- Driver Feedback: Regular feedback can reveal underlying churn factors for proactive improvements.
Model | Precision | Recall | F1 Score | Train ROC-AUC | Test ROC-AUC |
---|---|---|---|---|---|
Logistic Regression | 85.53% | 81.35% | 83.39% | 86.76% | 86.59% |
Random Forest | 88.39% | 83.79% | 86.03% | 91.41% | 88.48% |
XGBoost | 84.06% | 88.69% | 86.31% | 91.66% | 89.38% |
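- Logistic Regression: Solid baseline with nearly identical train and test ROC-AUC (86.76% vs. 86.59%), indicating little overfitting.
- Random Forest: Highest precision (88.39%), with an F1 score (86.03%) just below XGBoost's and a moderate train-test ROC-AUC gap (91.41% vs. 88.48%).
- XGBoost: Highest test ROC-AUC (89.38%), recall (88.69%), and F1 score (86.31%), making it the best fit for flagging drivers at risk of churning.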
- Confusion Matrix: Reflects true positives (290) and false positives (55).
- ROC Curve: An AUC score of 0.89 indicates strong separation between churned and retained drivers.
- Precision-Recall Curve: Shows high precision and recall for effective churn prediction.
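As a consistency check, and assuming the confusion matrix refers to the XGBoost model (its AUC of 0.89 matches the table), the reported counts reproduce the table's XGBoost precision. Combined with the reported recall, they imply roughly 37 false negatives (an inferred figure, not one reported above) and an F1 score matching the table:

$$
\text{Precision} = \frac{TP}{TP + FP} = \frac{290}{290 + 55} = \frac{290}{345} \approx 0.8406
$$

$$
FN = \frac{TP}{\text{Recall}} - TP = \frac{290}{0.8869} - 290 \approx 37,
\qquad
F_1 = \frac{2PR}{P + R} = \frac{2(0.8406)(0.8869)}{0.8406 + 0.8869} \approx 0.8631
$$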
- Reliable Churn Prediction: Model effectively identifies at-risk drivers.
- Insight-Driven Retention: Churn profiles help Ola focus on at-risk demographics and regions.
- Operational Improvements: Proactive retention reduces onboarding costs and supports consistent service.
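To turn predictions into a retention priority list, drivers can be ranked by predicted churn probability. A minimal sketch; the `top_at_risk` helper, the 0.5 threshold, and `k` are hypothetical choices, and in practice the probabilities would come from a fitted model's `predict_proba`:

```python
import pandas as pd

def top_at_risk(driver_ids, churn_probs, k=3, threshold=0.5):
    """Return up to k drivers whose churn probability is at or above threshold, highest first."""
    scored = pd.DataFrame({"Driver_ID": driver_ids, "Churn_Probability": churn_probs})
    at_risk = scored[scored["Churn_Probability"] >= threshold]
    return at_risk.nlargest(k, "Churn_Probability").reset_index(drop=True)

# Hypothetical model outputs, e.g. model.predict_proba(X)[:, 1] from a fitted classifier.
priority = top_at_risk([101, 102, 103, 104, 105], [0.91, 0.12, 0.67, 0.55, 0.78], k=3)
print(priority)  # Driver_IDs 101, 105, 103
```

Capping the list at `k` reflects a fixed retention budget, while the threshold keeps low-risk drivers off the list even when fewer than `k` qualify.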
Thanks to Ola for data access and project support. Special gratitude to Pandas, Matplotlib, Scikit-learn, and XGBoost contributors for their invaluable tools.