This project analyzes ride-sharing app data to predict high-value locations.
The findings of this EDA can be found in eda_notebook and are also available as an nbviewer notebook.
If you want to experiment with the data or the report, install and set up the environment:
- Clone the repository.
- In the terminal, run:
poetry install
- Done
I recommend using pyenv to set up the correct Python version. Requirements:
- Python >=3.8.1,<3.10.0
- Poetry
- Other requirements listed in the pyproject.toml file
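The Python constraint above is a half-open range: 3.8.1 and any later 3.8.x or 3.9.x release qualify, while 3.10.0 and newer do not. A tiny hypothetical check (the `python_version_supported` helper is not part of the project) makes this explicit:

```python
import sys
from typing import Tuple


def python_version_supported(version: Tuple[int, int, int]) -> bool:
    """True if version satisfies the constraint >=3.8.1,<3.10.0."""
    # Tuples compare element by element, which matches version ordering here
    return (3, 8, 1) <= version < (3, 10, 0)


print(python_version_supported(tuple(sys.version_info[:3])))
```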
The data used in this project is a .csv file with the following columns:
- start_time: the time when the order was made
- start_lat: latitude of the location where the order was made
- start_lng: longitude of the location where the order was made
- end_lat: latitude of the destination point
- end_lng: longitude of the destination point
- ride_value: the monetary value of the ride
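A minimal sketch of reading a file with these columns, using only the Python standard library. The `Ride` class, the `read_rides` helper and the sample rows are hypothetical, and the sketch assumes `start_time` is stored in a format that `datetime.fromisoformat` accepts; the notebook itself may load the data differently (for example with pandas).

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List


@dataclass
class Ride:
    """One row of the rides file, using the column names listed above."""

    start_time: datetime
    start_lat: float
    start_lng: float
    end_lat: float
    end_lng: float
    ride_value: float


def read_rides(lines: Iterable[str]) -> List[Ride]:
    """Parse CSV text (with a header row) into Ride objects.

    Assumes start_time is parseable by datetime.fromisoformat,
    e.g. "2022-03-01 07:15:00"; adapt this if the real file differs.
    """
    rides = []
    for row in csv.DictReader(lines):
        rides.append(
            Ride(
                start_time=datetime.fromisoformat(row["start_time"]),
                start_lat=float(row["start_lat"]),
                start_lng=float(row["start_lng"]),
                end_lat=float(row["end_lat"]),
                end_lng=float(row["end_lng"]),
                ride_value=float(row["ride_value"]),
            )
        )
    return rides


# Made-up sample rows; with a real file, pass open(path, newline="") instead
SAMPLE = """start_time,start_lat,start_lng,end_lat,end_lng,ride_value
2022-03-01 07:15:00,50.001,10.002,50.020,10.040,1.52
2022-03-01 07:40:00,50.004,10.006,49.990,9.980,0.98
"""
rides = read_rides(io.StringIO(SAMPLE))
print(len(rides), rides[0].start_time)
```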
The supply of ride-sharing services depends on how long it takes drivers to reach customers. We want to attract drivers towards the areas of highest ride value. The purpose of this EDA is to determine whether it is possible to predict areas of high ride value using only the available data.
The data is aggregated into location clusters so that demand can be predicted per location. The ETNA library is used for the forecasting tasks in this work.
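The aggregation step can be illustrated with a small sketch that sums `ride_value` per location cluster and per hour. To stay within the standard library it swaps H3 for a plain square lat/lng grid (the hypothetical `grid_cell` helper), and the hourly frequency and 0.01° cell size are assumptions. The output records use `timestamp`, `segment` and `target` keys, mirroring the long layout that ETNA's `TSDataset.to_dataset` converts from.

```python
import math
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

# (start_time, start_lat, start_lng, ride_value)
Row = Tuple[datetime, float, float, float]


def grid_cell(lat: float, lng: float, cell_deg: float = 0.01) -> str:
    """Hypothetical stand-in for an H3 index: snap a point to a square lat/lng grid."""
    return f"{math.floor(lat / cell_deg)}_{math.floor(lng / cell_deg)}"


def aggregate(rows: Iterable[Row], cell_deg: float = 0.01) -> List[Dict[str, object]]:
    """Sum ride_value per (hour, cell) as long-format timestamp/segment/target records."""
    totals: Dict[Tuple[datetime, str], float] = defaultdict(float)
    for start_time, lat, lng, value in rows:
        hour = start_time.replace(minute=0, second=0, microsecond=0)  # hourly bucket
        totals[(hour, grid_cell(lat, lng, cell_deg))] += value
    return [
        {"timestamp": ts, "segment": seg, "target": total}
        for (ts, seg), total in sorted(totals.items())
    ]


# Made-up sample rides
rows = [
    (datetime(2022, 3, 1, 7, 15), 50.001, 10.002, 1.5),
    (datetime(2022, 3, 1, 7, 40), 50.004, 10.006, 0.5),
    (datetime(2022, 3, 1, 8, 5), 50.021, 10.002, 2.0),
]
for record in aggregate(rows):
    print(record)
```

Hours in which a cell has no rides are simply missing from this output; since ETNA works with regular-frequency series, a real pipeline would probably need to fill those gaps (for example with 0) before building a dataset.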
Two different approaches to clustering and forecasting are used. The most promising one uses:
- Uber H3 for clustering
- a CatBoost model for forecasting
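A tabular model such as CatBoost does not consume a time series directly; one common way to feed it is to use lagged values of the target as features. The sketch below shows that idea for a single cluster with the standard library only. The `make_lag_features` helper, the lag choice and the sample series are hypothetical; in ETNA a similar effect is usually obtained with `LagTransform` combined with one of its CatBoost models, but the project's exact feature set is not described here.

```python
from typing import List, Sequence, Tuple


def make_lag_features(
    series: Sequence[float], lags: Sequence[int]
) -> Tuple[List[List[float]], List[float]]:
    """Turn one segment's target series into (features, target) pairs.

    Row t uses series[t - lag] for each lag as features and series[t] as the target;
    the first max(lags) points are skipped because their lags are undefined.
    """
    max_lag = max(lags)
    X: List[List[float]] = []
    y: List[float] = []
    for t in range(max_lag, len(series)):
        X.append([series[t - lag] for lag in lags])
        y.append(series[t])
    return X, y


# Made-up hourly ride_value totals for one cluster
series = [2.0, 3.0, 5.0, 4.0, 6.0]
X, y = make_lag_features(series, lags=[1, 2])
print(X, y)  # [[3.0, 2.0], [5.0, 3.0], [4.0, 5.0]] [5.0, 4.0, 6.0]
```

The resulting `X` and `y` could then be passed to a gradient-boosting regressor, for example `catboost.CatBoostRegressor().fit(X, y)`.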
Possible improvements:
- Rearrange the regions manually
- Collect more data on small regions
- Try an ensemble of models
- Use the end_lat and end_lng columns
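As one hedged example of using the destination columns, the great-circle (haversine) distance between the start and end points could serve as an extra feature, computed per row from start_lat, start_lng, end_lat and end_lng. The `haversine_km` helper and the 6371 km mean Earth radius are assumptions of this sketch, not part of the project.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in km between two (lat, lng) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lng2 - lng1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of latitude along a meridian is roughly 111.2 km
print(round(haversine_km(0.0, 0.0, 1.0, 0.0), 1))
```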