Market Basket Analysis, Apriori Algorithm and Association

A Market Basket Analysis project


Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Association Rules
  5. RFM Analysis
  6. Scenario Analysis - Bundle recommendations
  7. Roadmap
  8. Contributing
  9. License
  10. Contact
  11. Acknowledgments

Craft Tea Fox - Craft Matcha made better

Product Screen Shot

This analysis is a practical implementation of the Apriori Algorithm via Python.

Primer on Apriori Algorithm & Association Rules

Apriori is a data mining algorithm used for mining frequent itemsets and relevant association rules. It is designed to operate on a database that contains transactions, such as the items bought by a customer in a store.

An itemset is considered frequent if it meets a user-specified support threshold. For example, if the support threshold is set to 0.5 (50%), a frequent itemset is a set of items that are purchased together in at least 50% of all transactions.

Association rules are a set of rules derived from a database that help determine relationships among variables in a large transactional database.

For example, let I = {i(1), i(2), ..., i(m)} be a set of m attributes called items, and T = {t(1), t(2), ..., t(n)} be the set of transactions. Every transaction t(i) in T has a unique transaction ID and contains a subset of the items in I.

Association rules are usually written as i(j) -> i(k). This means that there is a strong relationship between the purchase of item i(j) and item i(k): transactions that contain i(j) tend to also contain i(k).

In the above example, i(j) is the antecedent and i(k) is the consequent.

Please note that both antecedents and consequents can have multiple items. For example, {Diaper,Gum} -> {Beer, Chips} is also valid.

Since multiple rules are possible even from a very small database, we use constraints on various measures of interest in order to select the most relevant ones. The most important measures are discussed below:

    1. Support: The support of an itemset X, supp(X), is the proportion of transactions in the database in which X appears. It signifies the popularity of an itemset.

    supp(X) = (Number of transactions in which X appears) / (Total number of transactions)

We can identify itemsets whose support is at or above a chosen threshold (the minimum support) as significant itemsets.
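The support formula can be sketched in a few lines of plain Python. This is only an illustration: the four transactions and the `support` helper are made up for the example and are not taken from the project's data or code.

```python
# Toy transactions (hypothetical data, for illustration only).
transactions = [
    {"matcha latte", "hojicha latte"},
    {"matcha latte", "whisk set"},
    {"matcha latte", "hojicha latte", "starter kit"},
    {"starter kit", "whisk set"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


min_support = 0.5  # user-specified threshold
pair = {"matcha latte", "hojicha latte"}

print(support(pair, transactions))                 # 0.5
print(support(pair, transactions) >= min_support)  # True
```

Here the pair appears in 2 of 4 transactions, so it just meets a 50% minimum support.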

    2. Confidence: The confidence of a rule signifies the likelihood of itemset Y being purchased when itemset X is purchased.

    conf(X -> Y) = supp(X U Y) / supp(X)

If conf(X -> Y) is 75%, then Y also appears in 75% of the transactions that contain X. It is the conditional probability P(Y|X): the probability of finding itemset Y in a transaction given that the transaction already contains itemset X.

    3. Lift: Lift measures the likelihood of itemset Y being purchased when itemset X is purchased, while taking into account the popularity of Y.

    lift(X -> Y) = supp(X U Y) / (supp(X) * supp(Y))

If lift is greater than 1, itemset Y is more likely to be bought when itemset X is bought than its overall popularity would suggest. A lift less than 1 implies that Y is less likely to be bought when X is bought, and a lift of exactly 1 means the two itemsets are independent.
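As a worked example of the confidence and lift formulas above, the sketch below evaluates one rule on eight made-up transactions. The data and the helper names (`support`, `confidence`, `lift`) are assumptions chosen so that the numbers come out round; they are not part of the project.

```python
# Eight hypothetical transactions: X and Y together 3 times,
# X alone once, Y alone once, and an unrelated item 3 times.
transactions = (
    [{"matcha latte", "hojicha latte"}] * 3
    + [{"matcha latte"}, {"hojicha latte"}]
    + [{"whisk set"}] * 3
)


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for basket in transactions if itemset <= basket) / len(transactions)


def confidence(x, y, transactions):
    """conf(X -> Y) = supp(X U Y) / supp(X)."""
    return support(set(x) | set(y), transactions) / support(x, transactions)


def lift(x, y, transactions):
    """lift(X -> Y) = supp(X U Y) / (supp(X) * supp(Y))."""
    joint = support(set(x) | set(y), transactions)
    return joint / (support(x, transactions) * support(y, transactions))


x, y = {"matcha latte"}, {"hojicha latte"}
print(confidence(x, y, transactions))  # 0.75
print(lift(x, y, transactions))        # 1.5
```

With supp(X) = supp(Y) = 0.5 and supp(X U Y) = 0.375, three out of four baskets containing X also contain Y, and the pair co-occurs 1.5 times as often as independence would predict.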

(back to top)

Built With

Major frameworks/libraries used to bootstrap project.

  • Python
  • GitHub
  • Bootstrap

(back to top)

Getting Started

Instructions on setting up your project locally. To get a local copy up and running follow these simple example steps.

Prerequisites

  • pip
    pip install -r requirements

Installation

Installing and setting up your app.

  1. Run a Jupyter notebook on SageMaker at https://bcg-rise-bda.awsapps.com/start#/
  2. Clone the repo
    git clone https://github.com/JohnTan38/Best-README.git
  3. Install packages
    pip install mlxtend
  4. Import libraries
    from mlxtend.frequent_patterns import apriori, association_rules
    from mlxtend.preprocessing import TransactionEncoder

(back to top)

Association Rules & RFM Analysis (Recency, Frequency, Monetary)

Data preprocessing and transformation - TransactionEncoder class from the mlxtend library

  1. To find the unique items, flatten the dataframe and convert it into a set. The conversion removes any duplicate items.
  2. Fit the encoder object on the list of transactions and convert the result to a dataframe.
  3. For every item in a transaction, record True if it was purchased and False otherwise (equivalent to 1 and 0).

  import pandas as pd
  from mlxtend.preprocessing import TransactionEncoder

  # fit on the list of transactions and encode each one as a row of True/False values
  encoder = TransactionEncoder()
  transactions = encoder.fit(matcha_list).transform(matcha_list)

  # convert the encoded array to a dataframe with one column per unique item
  df = pd.DataFrame(transactions, columns=encoder.columns_)
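To make the three steps concrete, here is a plain-Python sketch of the same one-hot encoding, without mlxtend or pandas. The contents of `matcha_list` below are a hypothetical stand-in (the real list is not shown in this README), and the sketch only mirrors the shape of the encoder's output: sorted unique items as columns and one boolean row per transaction.

```python
# Hypothetical stand-in for matcha_list: one inner list per transaction.
matcha_list = [
    ["Matcha Latte", "Hojicha Latte"],
    ["Matcha Starter Kit"],
    ["Hojicha Latte", "Matcha Starter Kit", "Matcha Latte"],
]

# Step 1: flatten and deduplicate to get the unique items
# (sorted so the column order is stable).
columns = sorted({item for basket in matcha_list for item in basket})

# Steps 2-3: one row per transaction, True where the item was purchased.
encoded = [[item in basket for item in columns] for basket in matcha_list]

print(columns)
for row in encoded:
    print(row)
```

Each row can then be read against `columns`, for example the second transaction is True only in the "Matcha Starter Kit" column.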

Market Basket Analysis

Market Basket Analysis is a data mining tool used by retailers to increase sales by better understanding customer purchasing patterns. Purchase history and items bought together are analyzed to reveal product groupings, as well as products that are likely to be purchased together.

Association Rules

Association analysis looks for relationships in large datasets. These relationships can take two forms: frequent itemsets or association rules. Frequent itemsets are collections of items that frequently occur together. Association rules suggest that a strong relationship exists between two items or itemsets.
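The level-wise search that Apriori uses to find frequent itemsets can be sketched as follows. This is a simplified teaching version, not the mlxtend implementation the project relies on; the function name `apriori_sketch` and the toy baskets are assumptions made for the example.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def keep_frequent(candidates):
        # Count each candidate against every basket and keep the frequent ones.
        kept = {}
        for candidate in candidates:
            s = sum(1 for b in baskets if candidate <= b) / n
            if s >= min_support:
                kept[candidate] = s
        return kept

    # Level 1: single items.
    current = keep_frequent({frozenset([i]) for b in baskets for i in b})
    frequent = {}
    k = 1
    while current:
        frequent.update(current)
        k += 1
        # Join: merge frequent (k-1)-itemsets into k-itemset candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = keep_frequent(candidates)
    return frequent


# Hypothetical baskets for illustration.
toy = [
    {"matcha", "hojicha"},
    {"matcha", "hojicha", "whisk"},
    {"matcha", "hojicha"},
    {"whisk"},
]
result = apriori_sketch(toy, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The prune step is what gives the algorithm its name: an itemset can only be frequent if all of its subsets are, so candidates with an infrequent subset are dropped before any counting.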

Frequently bought together

Matcha and Hojicha


> Matcha latte and Hojicha latte pair with a high level of support and lift. Lift > 1 indicates that the antecedent and consequent are bought together more often than would be expected if their purchases were independent.

Association Rule - Awakening Matcha Whisk set & Matcha Starter kit

Awakening Matcha and Starter


> Awakening Matcha Whisk set and Matcha Starter kit bundle with a high level of support and lift.

Association Rule - Min Support 3% and Lift > 2

Association rule Support and Lift

Closely associated products with a minimum support of 3% and lift greater than 2. Customers who add an item to their cart could have closely associated items suggested to them before checkout. Different combinations and thresholds of support and lift return different association rules.
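The effect of the two thresholds can be illustrated with a small, dependency-free sketch that builds single-item rules and keeps only those with support of at least 3% and lift above 2. The `pair_rules` helper and the ten baskets are hypothetical; the project itself uses mlxtend for this step, and the numbers below are not its results.

```python
from collections import Counter
from itertools import permutations


def pair_rules(transactions, min_support=0.03, min_lift=2.0):
    """Single-item rules x -> y that pass both thresholds, strongest first."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in set(t))
    # Ordered pairs, so both x -> y and y -> x are considered.
    pair_counts = Counter(
        pair for t in transactions for pair in permutations(sorted(set(t)), 2)
    )
    rules = []
    for (x, y), n_xy in pair_counts.items():
        support = n_xy / n
        # Same as supp(X U Y) / (supp(X) * supp(Y)), computed from raw counts.
        lift = n_xy * n / (item_counts[x] * item_counts[y])
        if support >= min_support and lift > min_lift:
            rules.append({
                "antecedent": x,
                "consequent": y,
                "support": support,
                "confidence": n_xy / item_counts[x],
                "lift": lift,
            })
    return sorted(rules, key=lambda r: (-r["lift"], r["antecedent"]))


# Ten hypothetical baskets.
baskets = (
    [{"whisk set", "starter kit"}] * 2
    + [{"matcha latte", "hojicha latte"}] * 5
    + [{"matcha latte"}] * 3
)
rules = pair_rules(baskets, min_support=0.03, min_lift=2.0)
for r in rules:
    print(f'{r["antecedent"]} -> {r["consequent"]}: '
          f'support={r["support"]}, lift={r["lift"]}')
```

In this toy data the latte pair is far more popular but has a lift of only 1.25, so it is filtered out, while the rarer whisk set and starter kit pair passes with a lift of 5. Raising or lowering either threshold changes which rules survive.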

(back to top)

RFM Analysis

Customers' recency, frequency and monetary (transaction value) measures are analyzed, and K-Means clustering is used to group customers into distinct segments.
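A minimal sketch of how the three raw RFM values could be derived from order records is shown below. The record layout, the snapshot date and the `rfm_table` helper are all assumptions for illustration; the clustering step itself is not shown.

```python
from datetime import date

# Hypothetical order records: (customer_id, order_date, order_value).
orders = [
    ("C1", date(2021, 12, 20), 120.0),
    ("C1", date(2021, 11, 5), 80.0),
    ("C2", date(2021, 6, 1), 35.0),
    ("C3", date(2021, 12, 30), 60.0),
    ("C3", date(2021, 12, 1), 40.0),
    ("C3", date(2021, 10, 15), 50.0),
]
snapshot = date(2021, 12, 31)  # assumed analysis date


def rfm_table(orders, snapshot):
    """Per customer: days since last order, number of orders, total spend."""
    table = {}
    for customer, day, value in orders:
        row = table.setdefault(
            customer, {"last": day, "frequency": 0, "monetary": 0.0}
        )
        row["last"] = max(row["last"], day)
        row["frequency"] += 1
        row["monetary"] += value
    return {
        customer: {
            "recency": (snapshot - row["last"]).days,
            "frequency": row["frequency"],
            "monetary": row["monetary"],
        }
        for customer, row in table.items()
    }


rfm = rfm_table(orders, snapshot)
for customer, values in rfm.items():
    print(customer, values)
```

A table like this, one row per customer with three numeric columns, is the kind of input a clustering step such as K-Means would take, usually after scaling the columns.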

Customer segmentation

Customer segmentation is fine-tuned with detailed analysis, and RFM segments are identified. For example, top customers in RFM segment '144', who buy frequently and with high ticket values, could be offered a bundle of 'Awakening Matcha Whisk set' with 'Ceremonial Uji Matcha Powder'.
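Segment codes such as '144' can be produced by scoring each RFM value from 1 to 4 and concatenating the three digits. The README does not state its scoring convention, so the sketch below assumes each digit is simply the quartile of the raw value (so R = 1 means the fewest days since the last purchase). The helper names and the customer figures are hypothetical.

```python
def quartile_scores(values):
    """Map each key to a 1-4 score by rank: lowest quarter -> 1, highest -> 4.

    Simplification: ties are broken by input order rather than sharing a score.
    """
    ranked = sorted(values, key=values.get)
    n = len(ranked)
    return {key: 1 + (4 * position) // n for position, key in enumerate(ranked)}


def rfm_segments(rfm):
    """Concatenate the R, F and M quartile scores into a code like '144'."""
    r = quartile_scores({c: v["recency"] for c, v in rfm.items()})
    f = quartile_scores({c: v["frequency"] for c, v in rfm.items()})
    m = quartile_scores({c: v["monetary"] for c, v in rfm.items()})
    return {c: f"{r[c]}{f[c]}{m[c]}" for c in rfm}


# Hypothetical raw RFM values (recency in days).
rfm = {
    "C1": {"recency": 3, "frequency": 12, "monetary": 900.0},
    "C2": {"recency": 40, "frequency": 5, "monetary": 300.0},
    "C3": {"recency": 200, "frequency": 1, "monetary": 40.0},
    "C4": {"recency": 90, "frequency": 8, "monetary": 450.0},
}
segments = rfm_segments(rfm)
print(segments)
```

Under this assumed convention, customer C1 (most recent, most frequent, highest spend) lands in segment '144'.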

RFM segment

Association Rule + RFM - Opportunities for targeted cross-selling

Association and RFM

Customers' RFM segments and closely associated products provide opportunities for targeted cross-selling. Customers in RFM segment '444' who bought 'Awakening Matcha Whisk Set' could have 'Matcha Starter Kit' recommended.
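One way to combine the two analyses is a small lookup that only recommends a rule's consequent to customers in a targeted segment. The sketch below is hypothetical: the `recommend` function, the rule list and the set of target segments are illustrative assumptions built from the example in the paragraph above, not the project's actual logic.

```python
def recommend(segment, basket, rules, target_segments):
    """Suggest consequents of matching rules for customers in a target segment."""
    if segment not in target_segments:
        return []
    return [
        rule["consequent"]
        for rule in rules
        if rule["antecedent"] in basket and rule["consequent"] not in basket
    ]


# Hypothetical rule and targeting set, based on the example above.
rules = [
    {"antecedent": "Awakening Matcha Whisk Set", "consequent": "Matcha Starter Kit"},
]
target_segments = {"444"}

print(recommend("444", {"Awakening Matcha Whisk Set"}, rules, target_segments))
print(recommend("111", {"Awakening Matcha Whisk Set"}, rules, target_segments))
```

The first call returns the starter kit as a suggestion; the second returns nothing because segment '111' is not targeted. Items already in the basket are never suggested again.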

Sales Trends

Matcha and Hojicha sales: consistent all year except for the last quarter of 2021.

Matcha Starter Kit sales: Matcha Starter Kit enjoys high support and lift. A sales campaign could smooth out the sales trend during the 2nd and 3rd quarters, and a successful campaign would increase gross profit.

(back to top)

Scenario Analysis - Bundle recommendations

Cross Sell High Value_2
Potential uplift of 35% gross sales of Awakening Matcha Whisk Set.

Cross Sell High Value_1
Potential uplift of 52% gross sales of Ceremonial Uji Matcha Powder.

Cross Sell Medium Value_1
Potential uplift of 18% gross sales of Barista Uji Matcha Powder.

Pros and Cons of Apriori Algorithm

Pros:
  • Easy to understand
  • Suitable for large itemsets

Cons:
  • Computationally expensive if there are many association rules
  • Calculating support is expensive because the algorithm scans the entire dataset

For more examples, please refer to the Documentation

(back to top)

Roadmap

  • Data collection - customers' demographic profile
  • Search Engine Optimization (SEO) & click-through rates (CTR)
  • Google Analytics 360 - data-driven attribution
  • Fine-tune threshold values for Support and Lift
  • Multi-language Support
    • Chinese
    • Bahasa Indonesia

(back to top)

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

Contributors ✨

This project follows the all-contributors specification. Contributions of any kind welcome!

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

(back to top)

License

Distributed under the MIT License. See LICENSE.txt for more information.

(back to top)

Contact

Project Link: https://github.com/JohnTan38/Best-README

(back to top)

Acknowledgments

(back to top)