S-Mart Sales Predictor

John Clos, Uchenna Nwagbara, Sharon Colson, Jack Cohen

https://s-martpredictor.herokuapp.com/

December 2, 2021

Background

For S-Mart and other retailers, setting the right price for a given product is critical for maximizing sales and profits. If the price is too low, the profit per unit is too small to sustain. If the price is too high, too few units sell and product is wasted sitting on the shelves. To maximize profits, the business needs to find the optimum price.

Using S-Mart's sales data, our team has trained a machine learning model that predicts the units sold and calculates the estimated revenue for a given scenario. S-Mart stores can use this sales prediction tool to ensure that they are properly stocked and staffed for the coming week's sales.

Important note: This page is for learning purposes only and is not meant to be reflective of any actual business. The data is not actual data. However, if you are interested in our work please see our about page for contact information.

Webpage: https://s-martpredictor.herokuapp.com/

Exploratory Data Analysis

File: eda.ipynb

The first step is to acquire a general understanding of the trends and patterns by exploring the dataset. Plotting the data in various ways with the tools listed below led to the following observations.

Tools

Jupyter Notebook
Python Pandas
Python Matplotlib
Seaborn

Observations

Empirical Cumulative Distribution Function (eCDF)
- Although a store sold more than 2500 units in its best week, weekly units sold did not exceed 500 about 80% of the time.
- Although the highest weekly sales exceeded 25K dollars, over 90% of the data had weekly sales below 5K dollars.
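The percentages above are read directly off an eCDF, which gives the fraction of observations at or below a value. A minimal sketch in plain Python follows; the function name `ecdf` and the sample numbers are made up for illustration and are not taken from eda.ipynb or the S-Mart data.

```python
from bisect import bisect_right


def ecdf(values):
    """Return a function F where F(x) is the fraction of observations <= x."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("ecdf needs at least one observation")

    def fraction_at_or_below(x):
        # bisect_right counts how many sorted observations are <= x.
        return bisect_right(ordered, x) / n

    return fraction_at_or_below


# Made-up weekly unit counts, for illustration only.
weekly_units = [120, 80, 450, 300, 2600, 95, 510, 40, 220, 130]
F = ecdf(weekly_units)
share_at_most_500 = F(500)  # fraction of weeks with at most 500 units sold
```

Plotting the sorted values against their cumulative fractions gives the eCDF curve itself.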
Price Is Not Dependent On Holiday
Units Sold IS Dependent On Holiday
- Product 2 is the cheapest product among all the three products, and it sells the most.
- Product 3 is the most expensive product among all the three products.
- Additionally, product price did not change during holidays. A product was either on promotion or it was not, and promotion is independent of holiday status.
Units Sold: Holiday vs Non-Holiday per Store
Product Sold: Holiday vs Non-Holiday
- Holidays do not seem to have a positive impact on the business. For most stores, weekly units sold during holidays are about the same as on normal days, while store 10 actually saw a decrease during the holidays.
- Weekly units sold for product 1 increased slightly during the holidays, while product 2 and product 3 decreased.
Product Units Sold Based on Price and Holiday
- Every product has more than one price, both on holidays and on normal days. The assumption is that one is the regular price and the other is the promotional price.
- The price gap for product 3 is large: it was slashed by almost 50% during promotions.
- Product 3 made the most sales during non-holidays.
Product In Each Store vs Units Sold and Price
- All 9 stores carry all 3 products, and they all seem to run similar discount promotions. However, product 3 sells the most units during promotions at store 10.
Yearly Seasonality per Store
- Every store shows some seasonality. Store 10 has the most obvious seasonal pattern.
Seasonality per Product
- Every product shows some seasonality. Product 2 has two peak seasons per year and product 3 has one.
Seasonality per Product per Store in Units Sold
- In general, product 2 sells more units per week than the other products in every store.
- Once in a while, product 3 outsells product 2 at store 10.
Holiday and Price Effect On Sales
- The lower the price, the more weekly units were sold.
- Whether or not a week is a holiday week appears unrelated to units sold.
Units Sold vs Promotion per Store
- Every store sells more during promotions, without exception.
Units Sold per Product While On or Off Promotion
- Every product sells more during promotions, product 2 and product 3 in particular.
Distribution of Price and Promotion
- All the stores have a similar price promotion pattern, yet store 10 sells the most during promotions.
Price Change While On and Off Promotion and the Change in Sales
- Every product has a regular price and a promotional price. Product 3 has the largest discount and sells the most during promotions.
Observation Summary:
- Store 10 has the highest average weekly sales among all 9 stores and also the most total weekly units sold.
- Store 5 has the lowest average weekly sales.
- The data covers 143 weeks, beginning 2/5/2010 and ending 10/26/2012, for 9 stores and 3 products (143 weeks x 3 products = 429 records per store).
- The data is evenly distributed, with no gaps or staggered start and stop dates.
- The best-selling and busiest store is Store 10, and the least busy store is Store 5.
- In terms of units sold, product 2 is the best-selling product throughout the year.
- Stores do not necessarily run product promotions during holidays.
- Product 2 is the cheapest product, and Product 3 is the most expensive.
- Most stores show some seasonality, with two peak seasons per year.
- Product 1 sells a little more in February than in other months, Product 2 sells the most around April and July, and Product 3 sells the most from around July to September.
- Each product has a regular price and a promotional price. The gap between the two is not significant for Product 1 and Product 2, but Product 3's promotional price can be slashed to 50% of its original price. Although every store makes this kind of price cut for Product 3, Store 10 made the highest sales during the price cut.
- Selling more during promotions than on normal days is not unusual. At Store 10, promotions made Product 3 the best-selling product from around July to September.
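Comparisons such as "on promotion vs off promotion" come down to grouping the sales records and averaging units sold. The sketch below shows one way to do that with pandas. The column names (`Store`, `Product`, `Promotion`, `Weekly_Units_Sold`) and all values are assumptions made up for illustration; they are not the actual S-Mart schema or data.

```python
import pandas as pd

# Made-up records; column names and values are hypothetical.
sales = pd.DataFrame({
    "Store": [1, 1, 1, 1, 10, 10, 10, 10],
    "Product": [2, 2, 3, 3, 2, 2, 3, 3],
    "Promotion": ["off", "on", "off", "on", "off", "on", "off", "on"],
    "Weekly_Units_Sold": [200, 260, 50, 120, 400, 520, 90, 600],
})

# Mean weekly units per product, split into "off" and "on" promotion columns.
mean_units = (
    sales.groupby(["Product", "Promotion"])["Weekly_Units_Sold"]
    .mean()
    .unstack("Promotion")
)

# Ratio of promotional to regular mean units sold, per product.
promo_lift = mean_units["on"] / mean_units["off"]
```

Swapping `Product` for `Store`, or `Promotion` for a holiday flag, gives the per-store and holiday comparisons in the same way.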

Model Exploration

File: model.ipynb

Tools

Scikit-learn
Cross Validation

After completing our EDA, we chose regression to predict the numerical value Weekly Units Sold. We first encoded the data using one-hot encoding, then split it into training and test sets and scaled it. We evaluated 14 different models before settling on the Gradient Boosting Regressor, which gave the highest score. Gradient boosting builds an ensemble of shallow decision trees (weak learners), each fitted to correct the errors of the ones before it.
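The encode, split, scale, and compare workflow described above can be sketched as follows. This is an illustration under stated assumptions, not the contents of model.ipynb: the data is synthetic, the column names are hypothetical, and only two candidate models are compared rather than 14.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; column names are assumptions, not the real schema.
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "Store": rng.choice(["1", "5", "10"], size=n),
    "Product": rng.choice(["1", "2", "3"], size=n),
    "Price": rng.uniform(5, 20, size=n),
    "Is_Holiday": rng.integers(0, 2, size=n),
})
df["Weekly_Units_Sold"] = 600 - 25 * df["Price"] + rng.normal(0, 20, size=n)

# One-hot encode the categorical columns.
X = pd.get_dummies(
    df.drop(columns="Weekly_Units_Sold"),
    columns=["Store", "Product"],
    dtype=float,
)
y = df["Weekly_Units_Sold"]

# Hold out a test set, then fit the scaler on the training data only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Compare candidate regressors by mean cross-validated R2 on the training set.
candidates = {
    "LinearRegression": LinearRegression(),
    "GradientBoostingRegressor": GradientBoostingRegressor(random_state=42),
}
cv_scores = {
    name: cross_val_score(model, X_train_scaled, y_train, cv=5, scoring="r2").mean()
    for name, model in candidates.items()
}
```

Fitting the scaler on the training split only keeps information from the test set out of the preprocessing step.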

Model Hyper Tuning

File: hypertuning.ipynb

Tools

Scikit-learn
GradientBoostingRegressor
GridSearchCV

After exploring potential models, we began hypertuning the parameters incrementally to increase the model's score.

The following tools and methods were used for this process.

Method

Create Gradient Boosting Regression
- Split the data into training and test sets
- Scale the data
- Calculate the cross validation score across multiple testing sets
- Classifications use Accuracy and F1 Score
- Regressions use R2 Score and Mean Absolute Error (MAE)
- Create a model using Gradient Boosting Regression
Feature Importance on the Model
- Create a plot of the features
- Generate a cross validation score
Hypertuning the Model
- Set the parameters
- Tune the model using GridSearchCV
- Generate predictions
- Generate r-squared and validate
Final Hypertuned Model
- Create a model using the optimized values
- Split the data into training and test sets
- Scale the data
- Calculate the cross validation score across multiple testing sets
- Classifications use Accuracy and F1 Score
- Regressions use R2 Score and Mean Absolute Error (MAE)
- Attach the predictions to the scaled test set: X_test_scaled['Weekly_Units_Sold'] = pred
Feature Importance on the Hypertuned Model
- Plot the features for the optimized model
Export Model
- Save the model and scaler for deployment
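The tuning and export steps above can be sketched as follows. This is a minimal illustration, not the contents of hypertuning.ipynb: the data is synthetic, the parameter grid and file names are assumptions, and the files are written to a temporary directory.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic regression data standing in for the encoded sales features.
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(200, 3))
y = 50 + 8 * X[:, 0] - 3 * X[:, 1] + rng.normal(0, 1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Hypothetical grid; the values actually searched are not given in this README.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [2, 3],
    "learning_rate": [0.05, 0.1],
}
search = GridSearchCV(
    GradientBoostingRegressor(random_state=42), param_grid, cv=3, scoring="r2"
)
search.fit(X_train_scaled, y_train)

# Evaluate the best estimator on the held-out test set.
best_model = search.best_estimator_
pred = best_model.predict(X_test_scaled)
r2 = r2_score(y_test, pred)
mae = mean_absolute_error(y_test, pred)

# Relative importance of each input feature in the tuned model.
importances = best_model.feature_importances_

# Save the model and scaler together so deployment applies the same scaling.
out_dir = tempfile.mkdtemp()
model_path = os.path.join(out_dir, "model.joblib")    # hypothetical file name
scaler_path = os.path.join(out_dir, "scaler.joblib")  # hypothetical file name
joblib.dump(best_model, model_path)
joblib.dump(scaler, scaler_path)

reloaded_model = joblib.load(model_path)
```

Exporting the scaler alongside the model matters because the deployed application has to transform user inputs exactly as the training data was transformed.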

Model Deployment

File: app.py

Webpage: https://s-martpredictor.herokuapp.com/

The prediction tool was deployed on a Heroku webpage through a multi-route Flask application. The following tools and methods were used.

Tools

Heroku
Flask
Pickle / Joblib
Pandas
SQLite
HTML/CSS/JavaScript
Bootstrap
Font-Awesome
D3
Jinja

Method

Import the model and scaler
Create the webpage's application routes
- Home Page
- Predictor
- Data Page
- EDA Page
- Model Page
- About Page
- Error Handler Page
Create POST method route
- Fetches information from webpage inputs
- Formats inputs into dataframe
- Feeds dataframe into scaler and model
- Returns output back to the webpage
Use JavaScript D3 for event handling
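The POST route described above can be sketched as follows. This is an illustration under stated assumptions, not the code in app.py: the route path, form field names, and feature columns are hypothetical, a tiny stand-in model is trained in memory instead of loading the saved model and scaler, and the response is JSON rather than a rendered page.

```python
import pandas as pd
from flask import Flask, jsonify, request
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.preprocessing import StandardScaler

FEATURES = ["Price", "Is_Holiday"]  # hypothetical feature columns

# Stand-in for the saved model and scaler, fitted on made-up numbers.
train = pd.DataFrame({
    "Price": [5, 6, 7, 8, 9, 10, 11, 12],
    "Is_Holiday": [0, 1, 0, 1, 0, 1, 0, 1],
})
units = [500, 480, 430, 400, 350, 330, 280, 260]
scaler = StandardScaler().fit(train[FEATURES])
model = GradientBoostingRegressor(random_state=0).fit(
    scaler.transform(train[FEATURES]), units
)

app = Flask(__name__)


@app.route("/predict", methods=["POST"])
def predict():
    # Fetch information from the webpage inputs.
    form = request.form
    try:
        row = {
            "Price": float(form["price"]),
            "Is_Holiday": int(form["is_holiday"]),
        }
    except (KeyError, ValueError):
        return jsonify(error="price and is_holiday must be numeric"), 400

    # Format the inputs into a dataframe and feed it to the scaler and model.
    frame = pd.DataFrame([row], columns=FEATURES)
    units_sold = float(model.predict(scaler.transform(frame))[0])

    # Return the output; estimated revenue is price times predicted units.
    return jsonify(
        units_sold=units_sold,
        estimated_revenue=units_sold * row["Price"],
    )
```

On the page itself, a D3 event handler would submit the form values to a route like this one and display the returned numbers.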

Contacts:

Sharon Colson:

Jack Cohen:

Uchenna Nwagbara:

John Clos:
