John Clos, Uchenna Nwagbara, Sharon Colson, Jack Cohen
https://s-martpredictor.herokuapp.com/
December 2, 2021
For S-Mart and other retailers, setting the right price for a given product is critical for maximizing sales and profits. If the price is too low, the profit per unit will be too small to sustain the business. If the price is too high, too few units will sell and product will be wasted sitting on the shelves. To maximize profits, the business needs to find the optimum price.
Using S-Mart's sales data, our team has trained a machine learning model that predicts the units sold and calculates the estimated revenue for a given scenario. S-Mart stores can use this sales prediction tool to ensure that they are properly stocked and staffed for the coming week's sales.
Important note: This page is for learning purposes only and is not meant to be reflective of any actual business. The data is not actual data. However, if you are interested in our work please see our about page for contact information.
Webpage: https://s-martpredictor.herokuapp.com/
File: eda.ipynb
The first step is to acquire a general understanding of the trends and patterns by exploring the dataset. Plotting the data in various ways with the tools listed below led to the following observations.
- Jupyter Notebook
- Python Pandas
- Python Matplotlib
- Seaborn
- Although in the best week, a store sold more than 2500 units, about 80% of the time, weekly units sold did not exceed 500.
- Although the highest weekly sales exceeded 25K dollars, over 90% of the data had weekly sales less than 5K dollars.
- Product 2 is the cheapest of the three products, and it sells the most.
- Product 3 is the most expensive of the three products.
- Additionally, product price did not change during holidays. A product was either on promotion or it was not; promotion is independent of holiday status.
- Holidays do not seem to have a positive impact on the business. For most stores, weekly units sold during holidays are about the same as on normal days, while store 10 actually saw a decrease during the holidays.
- Weekly units sold for product 1 had a slight increase during the holidays, while product 2 and product 3 had a decrease during the holidays.
- Every product has more than one price, both during holidays and on normal days. The assumption is that one is the regular price and the other is the promotional price.
- The price gap for product 3 is huge: it was slashed to almost 50% off during promotions.
- Product 3 made the most sales during non-holidays.
- All of these 9 stores carry these 3 products. They all seem to have similar kinds of discount promotions. However, product 3 sells the most units during promotions at store 10.
- Every store shows some seasonality; store 10 has the most obvious seasonal pattern.
- Every product shows some seasonality; product 2 has two peak seasons per year and product 3 has one.
- In general, product 2 sells more units per week than the other products in every store.
- Once in a while, product 3 outsells product 2 at store 10.
- The cheaper the price, the more weekly units were sold.
- Whether or not a week is a holiday has no apparent effect on units sold.
- Every store sells more during promotions, without exception.
- Every product sells more during the promotions, in particular, product 2 and product 3.
- All the stores have a similar price promotion pattern, but for some reason store 10 sells the most during promotions.
- Every product has the regular price and promotional price. Product 3 has the highest discount and sells the most during the promotions.
- Store 10 has the highest average weekly sales among all 9 stores, and it also has the most total weekly units sold.
- Store 5 has the lowest average weekly sales.
- The data covers 143 weeks, beginning 2/5/2010 and ending 10/26/2012, for each of 9 stores and 3 products (143 weeks × 3 products = 429 weekly records per store).
- The data is evenly distributed. No gaps or staggered start and stop dates.
- The best-selling and busiest store is Store 10, and the least busy store is Store 5.
- In terms of number of units sold, the most selling product is product 2 throughout the year.
- Stores do not necessarily run product promotions during holidays.
- Product 2 seems to be the cheapest product, and Product 3 is the most expensive product.
- Most stores have some kind of seasonality and they have two peak seasons per year.
- Product 1 sells a little more in February than the other months, Product 2 sells the most around April and July, and Product 3 sells the most around July to September.
- Each product has a regular price and a promotional price. There is no significant gap between the regular and promotional prices for Product 1 and Product 2; however, Product 3's promotional price can be slashed to 50% of its original price. Although every store makes this kind of price cut for Product 3, store 10 is the one that made the highest sales during the price cut.
- It is not unusual to sell more during a promotion than on normal days. Store 10 has made Product 3 the best-selling product around July to September.
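The kind of summaries behind the observations above can be sketched with a few pandas group-bys. This is only an illustration on synthetic data, not the contents of eda.ipynb: every column name other than Weekly_Units_Sold (Store, Product, Is_Holiday, Price) is an assumption, and the numbers are generated so that units fall as price rises.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the sales data. Every column name except
# Weekly_Units_Sold is an assumption made for this sketch.
rng = np.random.default_rng(0)
n = 600
df = pd.DataFrame({
    "Store": rng.choice([1, 5, 10], size=n),
    "Product": rng.choice([1, 2, 3], size=n),
    "Is_Holiday": rng.choice([0, 1], size=n, p=[0.9, 0.1]),
    "Price": rng.choice([5.0, 7.5, 10.0], size=n),
})
# Make units fall as price rises so the example has a visible trend.
noise = rng.normal(0, 40, size=n)
df["Weekly_Units_Sold"] = (800 - 60 * df["Price"] + noise).clip(lower=0).round()

# Share of store/product weeks at or below 500 units sold.
share_below_500 = (df["Weekly_Units_Sold"] <= 500).mean()

# Average weekly units by store, by product, and by holiday status.
units_by_store = df.groupby("Store")["Weekly_Units_Sold"].mean()
units_by_product = df.groupby("Product")["Weekly_Units_Sold"].mean()
units_by_holiday = df.groupby("Is_Holiday")["Weekly_Units_Sold"].mean()

# Correlation between price and units sold (negative when cheaper sells more).
price_corr = df["Price"].corr(df["Weekly_Units_Sold"])

print(f"share <= 500 units: {share_below_500:.2f}")
print(units_by_store, units_by_product, units_by_holiday, sep="\n")
print(f"price/units correlation: {price_corr:.2f}")
```

On the real dataset, the same group-bys could feed Matplotlib or Seaborn plots in place of the print calls.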
File: model.ipynb
- Scikit-learn
- Cross Validation
After completing our EDA, we chose regression to predict the numerical value Weekly Units Sold. We first encoded the data using one-hot encoding, then split it into training and testing sets and scaled it. We tested 14 different models on the data before settling on the Gradient Boosting Regressor, which gave the highest score. Gradient boosting combines many weak learners, typically shallow decision trees, into a single stronger model.
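A minimal sketch of that workflow, assuming synthetic data and hypothetical feature columns (Store, Product, Is_Holiday, Price), might look like the following. StandardScaler and the three candidate models are stand-ins chosen for illustration; the source does not say which scaler was used or name the 14 models that were compared.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data; all feature column names are assumptions for this sketch.
rng = np.random.default_rng(42)
n = 300
df = pd.DataFrame({
    "Store": rng.choice([1, 5, 10], size=n),
    "Product": rng.choice([1, 2, 3], size=n),
    "Is_Holiday": rng.integers(0, 2, size=n),
    "Price": rng.uniform(4.0, 12.0, size=n),
})
df["Weekly_Units_Sold"] = 900 - 60 * df["Price"] + rng.normal(0, 30, size=n)

# One-hot encode the categorical columns.
X = pd.get_dummies(
    df.drop(columns="Weekly_Units_Sold"),
    columns=["Store", "Product"],
    dtype=float,
)
y = df["Weekly_Units_Sold"]

# Split, then fit the scaler on the training set only.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Compare a few candidate regressors by mean cross-validated R2.
candidates = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=42),
    "gradient_boosting": GradientBoostingRegressor(random_state=42),
}
scores = {
    name: cross_val_score(model, X_train_scaled, y_train, cv=5, scoring="r2").mean()
    for name, model in candidates.items()
}
best_name = max(scores, key=scores.get)
print(scores, best_name)
```

Which model wins on this toy data says nothing about the real dataset; the point is only the encode, split, scale, and compare sequence.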
File: hypertuning.ipynb
- Scikit-learn
- GradientBoostingRegressor
- GridSearchCV
After exploring potential models, we tuned the hyperparameters incrementally to increase the model's score.
The following tools and methods were used for this process.
- Train, Test, Split the data
- Scale the data
- Calculating Cross Validation Score across multiple testing sets
- Classifications use Accuracy and F1 Score
- Regressions use R2 Score and Mean Absolute Error (MAE)
- Create a model using Gradient Boosting Regression
- Create a plot of the features
- Generate a cross validation score
- Set the parameters
- Tune the model using GridSearchCV
- Generate predictions
- Generate r-squared and validate
- Create a model using the optimized values
- Train, Test, Split the data
- Scale the data
- Calculating Cross Validation Score across multiple testing sets
- Classifications use Accuracy and F1 Score
- Regressions use R2 Score and Mean Absolute Error (MAE)
- X_test_scaled['Weekly_Units_Sold'] = pred
- Plot the features for the optimized model
- Save the model and scaler for deployment
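The tuning and export steps listed above could be sketched roughly as follows. The data is synthetic, and the parameter grid, the StandardScaler, and the file names model.joblib and scaler.joblib are all assumptions made for illustration rather than the values used in hypertuning.ipynb.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic regression data standing in for the encoded sales features.
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(300, 4))
y = 500 - 40 * X[:, 0] + 15 * X[:, 1] + rng.normal(0, 10, size=300)

# Train/test split, then scale using statistics from the training set only.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Hypothetical grid; the values actually searched are not given in the source.
param_grid = {
    "n_estimators": [50, 100],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
}
search = GridSearchCV(
    GradientBoostingRegressor(random_state=1),
    param_grid,
    cv=3,
    scoring="r2",
)
search.fit(X_train_scaled, y_train)

# Evaluate the refit best estimator on the held-out test set.
best_model = search.best_estimator_
pred = best_model.predict(X_test_scaled)
r2 = r2_score(y_test, pred)
mae = mean_absolute_error(y_test, pred)
print(search.best_params_, round(r2, 3), round(mae, 2))

# Save the model and scaler for deployment (file names are assumptions).
out_dir = tempfile.mkdtemp()
model_path = os.path.join(out_dir, "model.joblib")
scaler_path = os.path.join(out_dir, "scaler.joblib")
joblib.dump(best_model, model_path)
joblib.dump(scaler, scaler_path)

# Reload to confirm the saved model reproduces the same predictions.
reloaded = joblib.load(model_path)
```

Saving the scaler alongside the model matters because the deployed app must apply the same scaling to user inputs that was applied to the training data.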
File: app.py
Webpage: https://s-martpredictor.herokuapp.com/
The prediction tool was deployed on a Heroku webpage through a multi-route Flask application. The following tools and methods were used.
- Heroku
- Flask
- Pickle / Joblib
- Pandas
- SQLite
- HTML/CSS/JavaScript
- Bootstrap
- Font-Awesome
- D3
- Jinja
- Import the model and scaler
- Create the webpage's application routes
  - Home Page
  - Predictor
  - Data Page
  - EDA Page
  - Model Page
  - About Page
  - Error Handler Page
- Create POST method route
  - Fetches information from webpage inputs
  - Formats inputs into dataframe
  - Feeds dataframe into scaler and model
  - Returns output back to the webpage
- Use JavaScript D3 for event handling
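As a rough illustration of the POST route described above, the sketch below fits a tiny stand-in model and scaler inline so that it runs on its own; a deployed app would load the saved objects instead. The route path /predict, the JSON field names price and is_holiday, the two feature columns, the JSON response, and the revenue formula (predicted units times price) are all assumptions, not details taken from app.py.

```python
import pandas as pd
from flask import Flask, jsonify, request
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.preprocessing import StandardScaler

FEATURES = ["Price", "Is_Holiday"]  # hypothetical feature columns

# Stand-ins for the saved model and scaler, fit on toy data so the sketch
# is self-contained. A real app would load them, e.g. with joblib.load.
train = pd.DataFrame({
    "Price": [5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
    "Is_Holiday": [0, 1, 0, 0, 1, 0],
})
units = [500, 450, 380, 300, 260, 200]
scaler = StandardScaler().fit(train[FEATURES])
model = GradientBoostingRegressor(n_estimators=20, random_state=0)
model.fit(scaler.transform(train[FEATURES]), units)

app = Flask(__name__)


@app.route("/predict", methods=["POST"])
def predict():
    # Fetch the inputs sent by the page (JSON here; an HTML form also works).
    payload = request.get_json(force=True)
    price = float(payload["price"])
    is_holiday = int(payload["is_holiday"])

    # Format the inputs as a one-row dataframe in the training column order.
    row = pd.DataFrame([{"Price": price, "Is_Holiday": is_holiday}], columns=FEATURES)

    # Feed the dataframe through the scaler and then the model.
    units_sold = float(model.predict(scaler.transform(row))[0])

    # Assumed revenue estimate: predicted units times the entered price.
    revenue = units_sold * price
    return jsonify({
        "units_sold": round(units_sold, 1),
        "estimated_revenue": round(revenue, 2),
    })
```

The route can be exercised without a browser through Flask's test client, for example `app.test_client().post("/predict", json={"price": 7.0, "is_holiday": 0})`.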
Contacts:
Sharon Colson:
Jack Cohen:
Uchenna Nwagbara:
John Clos: