The objective of this project is to delve into Walmart Sales data, aiming to discern the top-performing branches and products, analyze the sales trends across various product categories, and understand customer behavior. The primary goal is to investigate opportunities for enhancing and optimizing sales strategies. The dataset utilized for this analysis was sourced from the Kaggle Walmart Sales Forecasting Competition.
The primary objective of this project is to acquire insights into Walmart's sales data, aiming to comprehend the various factors influencing the sales performance across its branches.
The dataset was obtained from the Kaggle Walmart Sales Forecasting Competition. This dataset contains sales transactions from a three different branches of Walmart, respectively located in Mandalay, Yangon and Naypyitaw. The data contains 17 columns and 1000 rows:
Column | Description | Data Type |
---|---|---|
invoice_id | Invoice of the sales made | VARCHAR(30) |
branch | Branch at which sales were made | VARCHAR(5) |
city | The location of the branch | VARCHAR(30) |
customer_type | The type of the customer | VARCHAR(30) |
gender | Gender of the customer making purchase | VARCHAR(10) |
product_line | Product line of the product solf | VARCHAR(100) |
unit_price | The price of each product | DECIMAL(10, 2) |
quantity | The amount of the product sold | INT |
VAT | The amount of tax on the purchase | FLOAT(6, 4) |
total | The total cost of the purchase | DECIMAL(10, 2) |
date | The date on which the purchase was made | DATE |
time | The time at which the purchase was made | TIMESTAMP |
payment_method | The total amount paid | DECIMAL(10, 2) |
cogs | Cost Of Goods sold | DECIMAL(10, 2) |
gross_margin_percentage | Gross margin percentage | FLOAT(11, 9) |
gross_income | Gross Income | DECIMAL(10, 2) |
rating | Rating | FLOAT(2, 1) |
- Product Analysis
Perform an in-depth analysis of the data to gain insights into various product lines, identifying top-performing ones and areas for enhancement.
- Sales Analysis
The objective of this analysis is to discern sales trends for products. The findings will enable us to evaluate the efficacy of each sales strategy employed by the business and determine necessary adjustments to enhance sales.
- Customer Analysis
This analysis seeks to unveil distinct customer segments, purchasing patterns, and the profitability associated with each segment.
- Data Wrangling: The initial stage involves examining the data to identify and handle NULL and missing values using appropriate replacement methods.
- Build a database
- Create table and insert the data.
- Select columns with null values in them. There are no null values in our database as in creating the tables, we set NOT NULL for each field, hence null values are filtered out.
- Feature Engineering: This step entails creating new columns derived from existing ones to provide additional insights.
A. Introduce a column named time_of_day to categorize sales into Morning, Afternoon, and Evening periods, facilitating analysis of peak sales times.
B. Incorporate a column named day_name to record the day of the week for each transaction (Mon, Tue, Wed, Thur, Fri), aiding in identifying the busiest days for each branch.
C. Integrate a column named month_name to indicate the month of each transaction (Jan, Feb, Mar), assisting in determining the most profitable months of the year.
- Exploratory Data Analysis (EDA): This phase involves investigating the data to address the project's objectives and questions.
- How many distinct cities are represented in the dataset?
- Which city corresponds to each branch?
- How many distinct product lines are included in the dataset?
- What is the predominant payment method?
- Which product line sells the most units?
- What is the total revenue generated each month?
- Which month recorded the highest Cost of Goods Sold (COGS)?
- Which product line generated the highest revenue?
- Which city generated the highest revenue?
- Which product line incurred the highest VAT (Value Added Tax)?
- Retrieve each product line and append a column indicating "Good" or "Bad" based on whether its sales exceed the average.
- Which branch exceeded the average number of products sold?
- What is the most prevalent product line based on gender?
- What is the average rating for each product line?
- Count of sales transactions per time of day, categorized by weekday.
- Which customer type contributes the most to revenue?
- Which city has the highest percentage of tax or VAT (Value Added Tax)?
- Which customer type contributes the most to VAT payments?
- How many distinct customer types are represented in the dataset?
- How many distinct payment methods are used by customers?
- What is the most common customer type?
- Which customer type makes the highest number of purchases?
- What is the predominant gender among customers?
- How is gender distributed across branches?
- During which time of day do customers provide the highest ratings?
- During which time of day, per branch, do customers provide the highest ratings?
- Which day of the week receives the highest average ratings?
- Which day of the week, per branch, receives the highest average ratings?
COGS = unitPrice * quantity
𝑉𝐴𝑇 = 5% * 𝐶𝑂𝐺𝑆
VAT is added to the COGS and this is what is billed to the customer.
Total (Gross Sales) = COGS + VAT
Gross Profit = Total(Gross Sales) − COGS
Gross Margin is gross profit expressed in percentage of the total(gross profit/revenue)
Gross Margin= Total Revenue / Gross Income
Example with the first row in our DB:
Data given:
- Unite Price = $45.79
- Quantity = $7
COGS = Unit Price * Quantity = 45.79 * 7 = $320.53
VAT = 5% * COGS = 5% * 320.53
Total = VAT + COGS = 16.03 + 320.53 = 336.56
Gross Margin Percentage = Total Revenue / Gross Income = 336.5565 / 16.0265 = 0.047619 ≈ 4.76 % (Approx)