The city of Chicago is home to nearly 3 million people, and it is currently the third most populous city in the US. Furthermore, its Cook County is the second most populous county in the country. Owing to this massive population, there are a range of transport options in the city. One of these is the city's Divvy Bike-sharing system, complete with hundreds of stations and thousands of bikes & scooters. It is currently operated by the ride-sharing company Lyft, and has been in existence for 9 years. With this many trips taking place every day for this long, this makes Divvy's historical trip data an attractive source of time-series data (at least for me :D), especially because the data is updated monthly.
How can we predict the number of trips that will start and end at various stations in the city each hour?
- Being able to anticipate spikes in activity will enable Divvy to allocate bikes and scooters more efficiently over time.
- This capabability could help the management to plan any possible changes in the scale of their services in a given area.
- Having models that predict customer activity in this way can provide a sense of confidence in managements understanding customer behaviour.
Build a complete end-to-end machine learning system that culminates in a simple frontend which provides the desired predictions in an interactive manner.
- ingests the available recent monthly usage data
- runs preprocessing procedures to produce time series data
- transforms the time series data into training data
- trains models (with selected architectures) to predict hourly arrivals and departures
- implements optional hyperparameter tuning during training
- logs the best model to CometML's model registry
- Provides code that allows for interaction with the Hopsorks Feature Store API.
- Backfills the Hopsworks feature store with time series data and predictions
- Delivers these predictions through a simple Streamlit frontend.
- Github actions are used to backfill the feature store with new predictions every hour.
A containerised version of the app is available here.
-
Clone the repository:
$ git clone https://github.com/maadabrandon/Hourly-Divvy-Trip-Predictor
-
Install Poetry
$ curl -sSL https://install.python-poetry.org | python3 -
-
Enter the project directory and run:
$ poetry install
-
Register free accounts on Hopsworks and CometML. Then copy your project names(for both platforms), API keys(again for both platforms), Comet workspace name, and email address into a .env file.
-
Backfill the Hopsworks feature groups with historical data:
$ make backfill-features
-
Run the training pipeline:
$ make train-all
-
Backfill the Hopsworks feature groups with predictions:
$ make backfill-predictions
-
View the frontend:
$ make frontend