Merge pull request #1 from lichili233/analysis

Add Supplementary Analysis
mercari · Jul 15, 2024 · 361f264
2 parents 59ef4bb + e311b49
Showing 8 changed files with 245 additions and 0 deletions.
diff --git a/analysis/dataset_comparison.md b/analysis/dataset_comparison.md
@@ -0,0 +1,107 @@
+## Comparing MerRec to Other Large Scale E-Commerce Datasets
+
+|         Dataset         |  Market Type  |  Users  |  Items  | Interactions |                              Interaction Type                              | Categories / Leveled? | Brand | Price | Color | Size  | Timestamp | Item Tokens | SKU/UPC | Covered Year | Sessions |
+| :---------------------: | :-----------: | :-----: | :-----: | :----------: | :------------------------------------------------------------------------: | :-------------------: | :---: | :---: | :---: | :---: | :-------: | :---------: | :-----: | :----------: | :------: |
+|      MerRec (2024)      | C2C (General) |  5.56M  | 83.07M  |    1.27B     | Click, Like, Add-to-cart, Make Offer, Initiate Purchase, Complete Purchase |      3399 / Yes       |  Yes  |  Yes  |  Yes  |  Yes  |    Yes    |   18.86B    |   No    | Half of 2023 | 227.16M  |
+|      Amazon (2023)      | B2C (General) | 54.51M  | 48.19M  |   571.54M    |                               Rating, Review                               |        33 / No        |  Yes  |  Yes  |  Yes  |  Yes  |    Yes    |   30.78B    |   Yes   |  1996-2023   |    No    |
+|      Tmall (2016)       | B2C (General) | 645.37K |  2.35M  |    44.52M    |                              Click, Purchase                               |        72 / No        |  No   |  No   |  No   |  No   |    Yes    |     No      |   Yes   | Half of 2015 | 200.28K  |
+|    Amazon-M2 (2023)     | B2C (General) |   No    |  1.41M  |    16.79M    |                                     No                                     |        No / No        |  Yes  |  No   |  Yes  |  Yes  |    No     |     Yes     |   Yes   |      ?       |  3.96M   |
+|    DIGINETICA (2016)    |       ?       | 232.93K | 184.04K |     3.3M     |                       View, Click, Purchase, Search                        |       1217 / No       |  No   |  Yes  |  No   |  No   |    Yes    |   941.64K   |    ?    |      ?       | 573.93K  |
+|    YOOCHOOSE (2015)     |       ?       |   No    | 52.73K  |    34.15M    |                                  Purchase                                  |       348 / No        |  No   |  Yes  |  No   |  No   |    Yes    |     No      |    ?    |      ?       |  9.24M   |
+|   Retailrocket (2022)   |       ?       |  1.40M  | 417.05K |    2.75M     |                        View, Add-to-cart, Purchase                         |      1669 / Yes       |  No   |  No   |  No   |  No   |    Yes    |   51.29M    |    ?    |      ?       |    No    |
+|     Ali-CCP (2018)      | C2C (General) |  400K   |  4.3M   |    87.41M    |                           View, Click, Purchase                            |         ? / ?         |  Yes  |  No   |  No   |  No   |    No     |     No      |    ?    |      ?       |    No    |
+| Alibaba-iFashion (2019) | C2C (Fashion) |  3.56M  |  4.46M  |   191.39M    |                                   Click                                    |        75 / No        |  No   |  No   |  No   |  No   |    No     |    7.7M     |    ?    |      ?       |    No    |
+
+## Notes
+
+- To the best of our knowledge, only MerRec and Retailrocket [6] provide item snapshots.
+- Some columns (e.g., condition, shipment paying party) are omitted to keep the table from becoming too large.
+- In practice, the token counts for DIGINETICA [4] and Retailrocket [6] should be higher than the numbers shown here, because those numbers count unique snapshots only rather than all (possibly redundant) snapshots. The MerRec estimate counts all snapshot instances.
+- The Amazon (2023) [1] review dataset is not a conventional action-event-based recommendation dataset and cannot be used interchangeably with MerRec; the two datasets generally do not support the same machine learning tasks.
+- Retailrocket [6] hashed most of its item metadata field names, so it is unclear whether it contains some of the metadata compared here.
+- For Ali-CCP [7], we interpreted "impression" as "view" and "conversion" as "purchase".
+- For Alibaba-iFashion [8], the provided tokens are mostly Chinese characters. If tokens are instead counted as individual characters (Chinese characters plus the occasional English letters and digits) rather than by splitting on whitespace, the token count would be 164.18M.
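The two token definitions in the Alibaba-iFashion note can be contrasted with a small sketch. It is illustrative only: the function names, the sample string, and the character classes (CJK Unified Ideographs plus ASCII letters and digits) are assumptions, since the exact rules behind the 7.7M and 164.18M figures are not stated here.

```python
import re

# Assumption: one "character token" is a CJK ideograph, an ASCII letter, or a digit.
# The precise character classes used for the reported counts are not given.
CHAR_TOKEN = re.compile(r"[\u4e00-\u9fff]|[A-Za-z0-9]")


def count_whitespace_tokens(text: str) -> int:
    """Count tokens by splitting on runs of whitespace."""
    return len(text.split())


def count_character_tokens(text: str) -> int:
    """Count every CJK ideograph, ASCII letter, or digit as one token."""
    return len(CHAR_TOKEN.findall(text))


# Hypothetical item title, not taken from the dataset.
title = "夏季 连衣裙 2019 new"
print(count_whitespace_tokens(title))  # 4
print(count_character_tokens(title))   # 12
```

Whitespace splitting undercounts text that is written without spaces between words, which may explain why the character-level count is so much larger for this dataset.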
+
+## References
+
+Note: All datasets below are cited in their most recent available version. References are provided as BibTeX entries where available, to remain format-agnostic.
+
+[1] Amazon (2023): https://amazon-reviews-2023.github.io/
+```tex
+@article{hou2024bridging,
+  title={Bridging Language and Items for Retrieval and Recommendation},
+  author={Hou, Yupeng and Li, Jiacheng and He, Zhankui and Yan, An and Chen, Xiusi and McAuley, Julian},
+  journal={arXiv preprint arXiv:2403.03952},
+  year={2024}
+}
+```
+
+[2] Tmall (2016): https://tianchi.aliyun.com/dataset/53
+```tex
+@misc{tmall2016,
+    title={IJCAI-16 Brick-and-Mortar Store Recommendation Dataset},
+    url={https://tianchi.aliyun.com/dataset/dataDetail?dataId=53},
+    author={Tianchi},
+    year={2018}
+}
+```
+
+[3] Amazon-M2 (2023): https://kddcup23.github.io/
+```tex
+@inproceedings{jin2023amazonm,
+  title={Amazon-M2: A Multilingual Multi-locale Shopping Session Dataset for Recommendation and Text Generation},
+  author={Wei Jin and Haitao Mao and Zheng Li and Haoming Jiang and Chen Luo and Hongzhi Wen and Haoyu Han and Hanqing Lu and Zhengyang Wang and Ruirui Li and Zhen Li and Monica Xiao Cheng and Rahul Goutam and Haiyang Zhang and Karthik Subbian and Suhang Wang and Yizhou Sun and Jiliang Tang and Bing Yin and Xianfeng Tang},
+  booktitle={Thirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
+  year={2023},
+  url={https://openreview.net/forum?id=uXBO47JcJT}
+}
+```
+
+[4] DIGINETICA (2016):
+- https://competitions.codalab.org/competitions/11161#learn_the_details-data2
+
+[5] YOOCHOOSE (2015):
+- https://recsys.acm.org/recsys15/challenge/
+
+[6] Retailrocket (2022): https://www.kaggle.com/datasets/retailrocket/ecommerce-dataset
+```tex
+@misc{zykov2022retailrocket,
+	title={Retailrocket recommender system dataset},
+	url={https://www.kaggle.com/dsv/4471234},
+	DOI={10.34740/KAGGLE/DSV/4471234},
+	publisher={Kaggle},
+	author={Roman Zykov and Noskov Artem and Anokhin Alexander},
+	year={2022}
+}
+```
+
+[7] Ali-CCP (2018): https://tianchi.aliyun.com/dataset/408
+```tex
+@inproceedings{ma2018esmm,
+  title={Entire space multi-task model: An effective approach for estimating post-click conversion rate},
+  author={Ma, Xiao and Zhao, Liqin and Huang, Guan and Wang, Zhi and Hu, Zelin and Zhu, Xiaoqiang and Gai, Kun},
+  booktitle={The 41st International ACM SIGIR Conference on Research \& Development in Information Retrieval},
+  pages={1137--1140},
+  year={2018}
+}
+```
+
+[8] Alibaba-iFashion (2019): https://github.com/wenyuer/POG?tab=readme-ov-file
+```tex
+@inproceedings{chen2019pog,
+  author = {Chen, Wen and Huang, Pipei and Xu, Jiaming and Guo, Xin and Guo, Cheng and Sun, Fei and Li, Chao and Pfadler, Andreas and Zhao, Huan and Zhao, Binqiang},
+  title = {POG: Personalized Outfit Generation for Fashion Recommendation at Alibaba iFashion},
+  year = {2019},
+  isbn = {9781450362016},
+  publisher = {Association for Computing Machinery},
+  address = {New York, NY, USA},
+  url = {https://doi.org/10.1145/3292500.3330652},
+  doi = {10.1145/3292500.3330652},
+  booktitle = {Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining},
+  pages = {2662--2670},
+  numpages = {9},
+  keywords = {transformer, self-attention, fashion outfit recommendation, fashion outfit generation, deep learning},
+  location = {Anchorage, AK, USA},
+  series = {KDD '19}
+}
+```
diff --git a/analysis/dataset_structure.pdf b/analysis/dataset_structure.pdf
diff --git a/analysis/dataset_structure.png b/analysis/dataset_structure.png
diff --git a/analysis/iphone_prices.pdf b/analysis/iphone_prices.pdf
diff --git a/analysis/iphone_prices.png b/analysis/iphone_prices.png
diff --git a/analysis/iphone_prices_compact.pdf b/analysis/iphone_prices_compact.pdf
diff --git a/analysis/mtl_metrics.md b/analysis/mtl_metrics.md
@@ -0,0 +1,96 @@
+# Multi-task learning (MTL) for Recommendation
+
+Below, we add four stronger baselines: AITM, PLE, OMoE, and STEM.
+
+|      Model       | View AUC | Like AUC | Log Loss (View + Like) | Train+Val Time (Hrs) | VRAM (GB) |
+| :--------------: | :------: | :------: | :--------------------: | :------------------: | :-------: |
+| MMOE (Only view) |  0.709   |   N/A    |         0.395          |         16.3         |   3.93    |
+| MMOE (Only like) |   N/A    |  0.709   |         0.354          |         16.6         |   3.93    |
+|       MMOE       |  0.712   |  0.713   |         0.744          |        22.61         |   3.94    |
+|       ESMM       |  0.713   |  0.715   |         0.744          |         23.7         |   3.94    |
+|       AITM       |  0.773   |  0.736   |         0.684          |         2.85         |   3.26    |
+|       PLE        |  0.773   |  0.736   |         0.686          |         4.38         |   3.39    |
+|       OMoE       |  0.773   |  0.737   |         0.684          |         3.3          |   4.17    |
+|       STEM       |  0.772   |  0.736   |         0.683          |         9.68         |   11.27   |
+
+## Notes
+
+- For AITM [2], PLE [3], OMoE [4], and STEM [1], the AUC values differ more in the digits truncated from the table, but the margins are small and we observe no significant difference.
+- As indicated in the original paper, no model in this MTL experiment uses the text features, as this is the commonly adopted evaluation setup in the literature.
+- These four models were selected based on recent studies showing that they perform at a state-of-the-art (SoTA) level.
+- The implementation will be publicly released in the coming weeks; some cleanup and reorganization is needed to accommodate these four models.
+- The competitive ME-MMOE and ME-PLE baselines introduced in the STEM [1] paper are omitted here because some of their design details and implementation were not published.
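For readers who want to reproduce the metric columns in the table above, the following is a minimal sketch of per-task AUC and log loss in plain Python. It is not the evaluation code used for these results. The function names and sample data are hypothetical, and treating the "Log Loss (View + Like)" column as an unweighted sum of the two per-task losses is an assumption that the source does not confirm.

```python
import math


def binary_log_loss(labels, probs, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a win. This pairwise form is O(n^2) and only meant
    for small illustrative inputs.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predictions for the two tasks.
view_labels, view_probs = [1, 0, 1, 0], [0.9, 0.2, 0.6, 0.7]
like_labels, like_probs = [0, 0, 1, 1], [0.1, 0.4, 0.8, 0.3]

view_auc = roc_auc(view_labels, view_probs)
like_auc = roc_auc(like_labels, like_probs)
# Assumption: the combined loss is the plain sum of the per-task losses.
combined_loss = (binary_log_loss(view_labels, view_probs)
                 + binary_log_loss(like_labels, like_probs))
print(view_auc, like_auc, combined_loss)
```

At the scale of the full dataset, a sort-based AUC such as the one provided by an established metrics library would be the practical choice.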
+
+## References
+
+References here are presented as BibTeX entries to remain format-agnostic.
+
+[1] STEM (AAAI 2024): https://ojs.aaai.org/index.php/AAAI/article/view/28749/
+```tex
+@article{su2024stem,
+  author = {Liangcai Su and Junwei Pan and Ximei Wang and Xi Xiao and Shijie Quan and Xihua Chen and Jie Jiang},
+  title  = {STEM: Unleashing the Power of Embeddings for Multi-task Recommendation},
+  journal = {Proceedings of the 38-th AAAI Conference on Artificial Intelligence (AAAI 2024)},
+  year    = {2024},
+}
+```
+
+[2] AITM (KDD 2021): https://dl.acm.org/doi/10.1145/3447548.3467071
+```tex
+@inproceedings{xi2021aitm,
+    author = {Xi, Dongbo and Chen, Zhen and Yan, Peng and Zhang, Yinger and Zhu, Yongchun and Zhuang, Fuzhen and Chen, Yu},
+    title = {Modeling the Sequential Dependence among Audience Multi-step Conversions with Multi-task Learning in Targeted Display Advertising},
+    year = {2021},
+    isbn = {9781450383325},
+    publisher = {Association for Computing Machinery},
+    address = {New York, NY, USA},
+    url = {https://doi.org/10.1145/3447548.3467071},
+    doi = {10.1145/3447548.3467071},
+    booktitle = {Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery \& Data Mining},
+    pages = {3745--3755},
+    numpages = {11},
+    keywords = {targeted display advertising, sequential dependence, multi-task learning, multi-step conversions},
+    location = {Virtual Event, Singapore},
+    series = {KDD '21}
+}
+```
+
+[3] PLE (RecSys 2020): https://dl.acm.org/doi/10.1145/3383313.3412236
+```tex
+@inproceedings{tang2020ple,
+    author = {Tang, Hongyan and Liu, Junning and Zhao, Ming and Gong, Xudong},
+    title = {Progressive Layered Extraction (PLE): A Novel Multi-Task Learning (MTL) Model for Personalized Recommendations},
+    year = {2020},
+    isbn = {9781450375832},
+    publisher = {Association for Computing Machinery},
+    address = {New York, NY, USA},
+    url = {https://doi.org/10.1145/3383313.3412236},
+    doi = {10.1145/3383313.3412236},
+    booktitle = {Proceedings of the 14th ACM Conference on Recommender Systems},
+    pages = {269--278},
+    numpages = {10},
+    keywords = {Multi-task Learning, Recommender System, Seesaw Phenomenon},
+    location = {Virtual Event, Brazil},
+    series = {RecSys '20}
+}
+```
+
+[4] OMoE (KDD 2018): https://dl.acm.org/doi/10.1145/3219819.3220007
+```tex
+@inproceedings{10.1145/3219819.3220007,
+author = {Ma, Jiaqi and Zhao, Zhe and Yi, Xinyang and Chen, Jilin and Hong, Lichan and Chi, Ed H.},
+title = {Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts},
+year = {2018},
+isbn = {9781450355520},
+publisher = {Association for Computing Machinery},
+address = {New York, NY, USA},
+url = {https://doi.org/10.1145/3219819.3220007},
+doi = {10.1145/3219819.3220007},
+booktitle = {Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining},
+pages = {1930--1939},
+numpages = {10},
+keywords = {mixture of experts, multi-task learning, neural network, recommendation system},
+location = {London, United Kingdom},
+series = {KDD '18}
+}
+```
diff --git a/analysis/sbr_mercatran.md b/analysis/sbr_mercatran.md
@@ -0,0 +1,42 @@
+# Session-based Recommendation (SBR): MercaTran
+
+| Input Features | Pred Type  | Step  | nDCG@5  | nDCG@20  | Recall@5  | Recall@20  | Tr./Val Time (Hrs) | VRAM (GB) |
+| -------------- | :--------: | :---: | :-----: | :------: | :-------: | :--------: | :----------------: | :-------: |
+|                |    Item    |   1   | 0.0407  |  0.0615  |  0.0789   |   0.1750   |        23.2        |    8.4    |
+|                |            |   2   | 0.0395  |  0.0598  |  0.0768   |   0.1702   |                    |           |
+|                |            |   3   | 0.0380  |  0.0579  |  0.0743   |   0.1664   |                    |           |
+| Item Title     |            |   4   | 0.0371  |  0.0567  |  0.0719   |   0.1627   |                    |           |
+| +              |  Category  |   1   |   N/A   |   N/A    |  0.4411   |   0.5746   |                    |           |
+| Brand Text     |            |   2   |   N/A   |   N/A    |  0.4427   |   0.5742   |                    |           |
+| +              |            |   3   |   N/A   |   N/A    |  0.4409   |   0.5740   |                    |           |
+| Category Text  |            |   4   |   N/A   |   N/A    |  0.4398   |   0.5730   |                    |           |
+|                |   Brand    |   1   |   N/A   |   N/A    |  0.4952   |   0.6085   |                    |           |
+|                |            |   2   |   N/A   |   N/A    |  0.4945   |   0.6072   |                    |           |
+|                |            |   3   |   N/A   |   N/A    |  0.4935   |   0.6061   |                    |           |
+|                |            |   4   |   N/A   |   N/A    |  0.4922   |   0.6052   |                    |           |
+| Input Features | Pred Type  | Step  | nDCG@5  | nDCG@20  | Recall@5  | Recall@20  | Tr./Val Time (Hrs) | VRAM (GB) |
+|                |    Item    |   1   | 0.0410  |  0.0611  |  0.0802   |   0.1735   |        23.2        |    8.4    |
+|                |            |   2   | 0.0402  |  0.0603  |  0.0780   |   0.1710   |                    |           |
+|                |            |   3   | 0.0402  |  0.0598  |  0.0786   |   0.1697   |                    |           |
+|                |            |   4   | 0.0399  |  0.0594  |  0.0771   |   0.1679   |                    |           |
+| Item Title     |  Category  |   1   |   N/A   |   N/A    |  0.4524   |   0.5988   |                    |           |
+|                |            |   2   |   N/A   |   N/A    |  0.4511   |   0.5981   |                    |           |
+|                |            |   3   |   N/A   |   N/A    |  0.4503   |   0.5974   |                    |           |
+|                |            |   4   |   N/A   |   N/A    |  0.4493   |   0.5975   |                    |           |
+|                |   Brand    |   1   |   N/A   |   N/A    |  0.5024   |   0.6256   |                    |           |
+|                |            |   2   |   N/A   |   N/A    |  0.5031   |   0.6268   |                    |           |
+|                |            |   3   |   N/A   |   N/A    |  0.5038   |   0.6271   |                    |           |
+|                |            |   4   |   N/A   |   N/A    |  0.5037   |   0.6264   |                    |           |
+| Input Features | Pred Type  | Step  | nDCG@5  | nDCG@20  | Recall@5  | Recall@20  | Tr./Val Time (Hrs) | VRAM (GB) |
+|                |    Item    |   1   | 0.0111  |  0.0196  |  0.0217   |   0.0609   |        23.2        |    8.4    |
+|                |            |   2   | 0.0111  |  0.0194  |  0.0215   |   0.0593   |                    |           |
+|                |            |   3   | 0.0113  |  0.0193  |  0.0217   |   0.0583   |                    |           |
+| Brand Text     |            |   4   | 0.0110  |  0.0189  |  0.0211   |   0.0575   |                    |           |
+| +              |  Category  |   1   |   N/A   |   N/A    |  0.2769   |   0.3604   |                    |           |
+| Category Text  |            |   2   |   N/A   |   N/A    |  0.2810   |   0.3626   |                    |           |
+|                |            |   3   |   N/A   |   N/A    |  0.2818   |   0.3636   |                    |           |
+|                |            |   4   |   N/A   |   N/A    |  0.2806   |   0.3638   |                    |           |
+|                |   Brand    |   1   |   N/A   |   N/A    |  0.3020   |   0.3485   |                    |           |
+|                |            |   2   |   N/A   |   N/A    |  0.3099   |   0.3587   |                    |           |
+|                |            |   3   |   N/A   |   N/A    |  0.3125   |   0.3643   |                    |           |
+|                |            |   4   |   N/A   |   N/A    |  0.3146   |   0.3680   |                    |           |
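The nDCG@k and Recall@k columns above can be illustrated with a short sketch. It assumes exactly one ground-truth target (item, category, or brand) per prediction step, which the source does not state, and all function names and sample IDs are hypothetical. Under that assumption the ideal DCG is 1, so nDCG@k reduces to 1 / log2(rank + 1) when the target appears in the top k.

```python
import math


def rank_of_target(ranked_ids, target_id):
    """Return the 1-based position of target_id, or None if it is absent."""
    try:
        return ranked_ids.index(target_id) + 1
    except ValueError:
        return None


def recall_at_k(ranked_ids, target_id, k):
    """1.0 if the single target is within the top k predictions, else 0.0."""
    return 1.0 if rank_of_target(ranked_ids[:k], target_id) is not None else 0.0


def ndcg_at_k(ranked_ids, target_id, k):
    """nDCG@k for a single relevant target: 1 / log2(rank + 1), or 0.0 on a miss."""
    rank = rank_of_target(ranked_ids[:k], target_id)
    return 0.0 if rank is None else 1.0 / math.log2(rank + 1)


# Hypothetical ranked predictions for one session step.
predictions = ["a", "b", "c", "d"]
print(recall_at_k(predictions, "c", 5))  # 1.0
print(ndcg_at_k(predictions, "c", 5))    # 0.5
print(recall_at_k(predictions, "c", 2))  # 0.0
```

Averaging these per-step values over all evaluated sessions would give dataset-level figures comparable in form to the table entries.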