This repository contains code for the Walmart Sales Forecasting project. The project forecasts weekly sales for 45 Walmart stores located in different regions, using historical sales data, holiday events, and store information. It provides a framework for running and evaluating multiple models in a Spark environment, along with code to build a Spark Docker image and run the Spark container locally if needed.
- Clone the repository:

  ```bash
  git clone https://github.com/selewaut/forecast_forge.git
  cd forecast_forge
  ```
- Create and activate a virtual environment:

  ```bash
  python3 -m venv .venv
  source .venv/bin/activate
  ```
- Install the required dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Install the `forecast_forge` package:

  ```bash
  pip install -e .
  ```
- Install OpenJDK on the local machine (Spark requires a JVM):

  ```bash
  sudo apt-get update
  sudo apt-get install openjdk-8-jdk
  ```
The data is stored in the `data/` directory, in the following files:

- `train.csv`: historical sales data for 45 Walmart stores
- `test.csv`: test data for forecasting
- `features.csv`: additional data related to the stores and regional activity
- `stores.csv`: store information
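As a minimal sketch of how these files relate, the example below joins tiny made-up rows with pandas. The column names follow the Kaggle schema (`Store`, `Dept`, `Date`, `Weekly_Sales`, etc.); verify them against the downloaded CSVs.

```python
import pandas as pd

# Illustrative rows only -- not real data from the competition files.
train = pd.DataFrame({"Store": [1, 1], "Dept": [1, 1],
                      "Date": ["2010-02-05", "2010-02-12"],
                      "Weekly_Sales": [24924.50, 46039.49]})
stores = pd.DataFrame({"Store": [1], "Type": ["A"], "Size": [151315]})
features = pd.DataFrame({"Store": [1, 1],
                         "Date": ["2010-02-05", "2010-02-12"],
                         "Temperature": [42.31, 38.51],
                         "IsHoliday": [False, True]})

# train joins stores on Store, and features on (Store, Date).
df = train.merge(stores, on="Store").merge(features, on=["Store", "Date"])
```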
The data originally comes from the following Kaggle competition: https://www.kaggle.com/c/walmart-recruiting-store-sales-forecasting/data. To download it you need a Kaggle account and a Kaggle API key:
- Install the kaggle package:

  ```bash
  pip install kaggle
  ```

- Generate an API key from your Kaggle account and save it in `~/.kaggle/kaggle.json`.
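With the CLI installed and the key in place, the dataset can then be fetched. The competition slug below is taken from the URL above, and `-p data/` targets the `data/` directory described earlier:

```shell
# Requires a valid ~/.kaggle/kaggle.json; downloads the archive into data/.
kaggle competitions download -c walmart-recruiting-store-sales-forecasting -p data/
unzip -o data/walmart-recruiting-store-sales-forecasting.zip -d data/
```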
- Move to the directory containing the Spark Dockerfile:

  ```bash
  cd spark-setup
  ```
- Start the container:

  ```bash
  make run
  ```

  This will start a Spark container with the code mounted inside it. The container keeps running in the background.
- Run the `univariate_weekly.py` script to train and test univariate weekly sales forecasting models:

  ```bash
  spark-submit --master local[*] src/forecast_forge/univariate_weekly.py
  ```
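To illustrate what a univariate weekly forecaster looks like, here is a seasonal-naive baseline that repeats the value observed 52 weeks earlier. This is a hedged sketch for orientation only, not the implementation in `univariate_weekly.py`:

```python
import pandas as pd

def seasonal_naive(history: pd.Series, horizon: int, season: int = 52) -> pd.Series:
    """Forecast each future week with the value observed `season` weeks earlier."""
    last_season = history.iloc[-season:]
    reps = -(-horizon // season)  # ceiling division to cover long horizons
    values = pd.concat([last_season] * reps).iloc[:horizon].to_numpy()
    return pd.Series(values)

# Tiny example: 104 weeks of synthetic history, forecast 4 weeks ahead.
history = pd.Series(range(104))
fc = seasonal_naive(history, horizon=4)
```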
Results are saved in Parquet format under the `evaluation_output` path.