
[RMP] Provide to our customers best practices guidance for training Retrieval, Ranking and Multi-Stage RecSys models #553

Closed

gabrielspmoreira opened this issue Aug 19, 2022 · 5 comments

gabrielspmoreira (Member) commented Aug 19, 2022

Problem:

The Merlin platform provides tools for building multi-stage recommender systems (which we are going to present in our tutorial and demo at the RecSys '22 conference).
In particular, the retrieval and ranking stages are implemented in the Merlin Models library and tied together at inference time with the Merlin Systems library (see the sketch after this paragraph).
Although we provide our customers with implementations of retrieval and ranking models, plus a few example notebooks with toy datasets to get them started, it may be hard for them to reach reasonable accuracy and performance on their own datasets without extensive experimentation. In addition, customers are likely to hit issues in their experiments related to the flexibility of our API and to model accuracy/performance on real datasets.
This scenario might reduce customers' interest and engagement if they conclude that Merlin has not been thoroughly validated and refined on real datasets and is not mature enough for their purposes.
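
As a rough illustration of how the two stages fit together in Merlin Models, here is a minimal sketch using the library's synthetic data generator; the concrete layer sizes and the "click" target follow the public examples and are illustrative, not tuned recommendations:

```python
import merlin.models.tf as mm
from merlin.datasets.synthetic import generate_data
from merlin.schema import Tags

# Synthetic e-commerce-like data shipped with Merlin Models for demos/tests.
train, valid = generate_data("e-commerce", 100_000, set_sizes=(0.8, 0.2))

# Retrieval stage: a Two-Tower model trained with in-batch negative sampling.
# Target columns are removed, since retrieval here is trained on implicit positives.
retrieval_schema = train.schema.remove_by_tag(Tags.TARGET)
retrieval = mm.TwoTowerModel(retrieval_schema, query_tower=mm.MLPBlock([128, 64]))
retrieval.compile(optimizer="adam", run_eagerly=False)
retrieval.fit(train, batch_size=1024, epochs=1)

# Ranking stage: a DLRM model predicting a binary target (here, "click").
ranking = mm.DLRMModel(
    train.schema,
    embedding_dim=64,
    bottom_block=mm.MLPBlock([128, 64]),
    top_block=mm.MLPBlock([128, 64, 32]),
    prediction_tasks=mm.BinaryClassificationTask("click"),
)
ranking.compile(optimizer="adam", run_eagerly=False)
ranking.fit(train, batch_size=1024, epochs=1)
```

At inference time, Merlin Systems chains the two: the retrieval model narrows the catalog down to a candidate set, which the ranking model then re-scores.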

Goal:

The goal of this work is to improve the customer experience when they start experimenting with Merlin on their own datasets.
We want to leverage NVIDIA internal research and computational resources to perform comprehensive experimentation with our retrieval and ranking models across a diverse set of public datasets, so that we can:

  • Learn and share with our customers best practices on how to train two-stage RecSys models.
  • Test our implementation's scalability, performance, and model accuracy with real datasets, finding and fixing potential issues before our customers do.
  • Refine our API to provide the flexibility a Data Scientist / ML Engineer would reasonably expect from an ML framework.

Constraints:

  • The deliverables of this work, documenting best practices for our customers (and for our internal solution architects) in our documentation and examples, will include:
    • Merlin Model Zoo Leaderboard - A comparison of the accuracy and performance of different models on different public datasets for the retrieval and ranking tasks.
    • Hypertuning guidance - Identification of the most important hyperparameters to tune in such models, plus guidance on the search space as a function of dataset size (# of samples and features); see the first sketch after this list.
    • Advanced examples - Experimentation scripts showcasing advanced usage of our API to set the important hyperparameters.
  • CI integration tests on performance and accuracy regressions - Usage of the experimentation scripts in our CI, so that we can detect training performance or accuracy regressions; see the second sketch after this list.
  • API improvements - It is common during the experimentation process to find bugs or flexibility limitations in our API for some use cases or datasets. This work involves identifying these issues and fixing the critical ones.
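
To make the hypertuning deliverable concrete, the sketch below shows the kind of artifact we have in mind: a search space whose ranges scale with dataset size. The helper name and the specific ranges are assumptions for illustration, not validated guidance:

```python
# Illustrative only: the function and its ranges are hypothetical placeholders
# for the guidance this RMP would produce, not a Merlin API.
def suggest_search_space(num_examples: int, num_features: int) -> dict:
    """Return hyperparameter ranges scaled by dataset size."""
    large = num_examples > 10_000_000
    # More features generally call for wider first MLP layers.
    first_layer = 256 if num_features > 100 else 128
    return {
        "embedding_dim": [32, 64, 128] if large else [8, 16, 32],
        "mlp_layers": [[first_layer, 64], [first_layer * 2, first_layer, 64]],
        "learning_rate": (1e-4, 1e-2),  # sampled log-uniform
        "dropout": (0.0, 0.3),
        # Larger datasets usually tolerate (and benefit from) bigger batches.
        "batch_size": [8_192, 16_384] if large else [1_024, 4_096],
    }

space = suggest_search_space(num_examples=50_000_000, num_features=30)
```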
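
Similarly, the CI regression tests could take the shape below: a small pytest that trains a model on synthetic data and asserts a metric floor. The metric key and the threshold are placeholders that would be calibrated per dataset:

```python
# test_retrieval_regression.py - a sketch of an accuracy-regression CI test.
import merlin.models.tf as mm
from merlin.datasets.synthetic import generate_data
from merlin.schema import Tags

def test_two_tower_recall_does_not_regress():
    train, valid = generate_data("e-commerce", 10_000, set_sizes=(0.8, 0.2))
    schema = train.schema.remove_by_tag(Tags.TARGET)
    model = mm.TwoTowerModel(schema, query_tower=mm.MLPBlock([64, 32]))
    model.compile(optimizer="adam", metrics=[mm.RecallAt(10)])
    model.fit(train, batch_size=1024, epochs=1)
    metrics = model.evaluate(valid, batch_size=1024, return_dict=True)
    # Placeholder floor; a real test would pin a calibrated per-dataset value.
    assert metrics["recall_at_10"] >= 0.01
```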

Starting Point:

Our research team has already done, or is currently doing, experimentation work for some RecSys use cases:

  • Session-based recommendation – For the research in which the Transformers4Rec library was designed and experimented with, we provided, in the online appendix of our RecSys paper:
    • The model zoo leaderboard, comparing model accuracy and performance
    • Some hypertuning guidance (the search space, but not a ranking of the most important hparams)
    • An advanced example of the API
    A minimal Transformers4Rec sketch is included after this list.
  • Retrieval models – The experimentation process in our research on retrieval models allowed us to identify a number of API, performance, and accuracy issues, all of them either fixed or in progress. The preliminary experiment results are here (internal to NVIDIA), comparing retrieval models such as MF (both Implicit and Merlin Models), Two-Tower, and YouTubeDNN across different datasets. We are going to refine the experiments after the ongoing refactoring of some related Merlin Models building blocks. Based on the retrieval experiment scripts, we now have integration tests of retrieval models to monitor accuracy.
    The effort in the retrieval research is tightly related to the scope of this RMP.
  • Ranking models – We currently have a research project focused on Single-Task Learning (STL) and Multi-Task Learning (MTL) for ranking models (e.g., predicting the likelihood of a user clicking, liking, sharing, and/or purchasing an item), a typical use case for large companies using two-stage recommendation with rich user feedback (as in the Tenrec dataset provided by Tencent and the Twitter dataset provided for the RecSys Challenge).
    We have been refining/fixing and experimenting with a number of ranking models (e.g., DCN-v2, DLRM, DeepFM, Wide&Deep) and MTL ranking models (e.g., MMoE, PLE) on the AliCCP and Tenrec public datasets; an MTL sketch follows this list. That research work is directly related to this RMP.
  • Two-stage RecSys model training – For the hack week, we have started a research project to investigate best practices for training two-stage RecSys pipelines (proposal here - internal only), composed of retrieval and ranking models. In particular, we are interested in understanding how to train the ranking model so that it improves the ranking of the items provided by a retrieval model (e.g., through additional features, real vs. sampled negatives, or transfer learning).
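
For reference, the session-based work mentioned above builds models along these lines (adapted from the Transformers4Rec examples; the schema line is a placeholder for a preprocessed, sessionized dataset):

```python
from transformers4rec import torch as tr

schema = ...  # Merlin schema of a preprocessed, sessionized dataset (placeholder)
max_sequence_length, d_model = 20, 64

# Sequential tabular input with masked language modeling for next-item prediction.
input_module = tr.TabularSequenceFeatures.from_schema(
    schema,
    max_sequence_length=max_sequence_length,
    continuous_projection=d_model,
    aggregation="concat",
    masking="mlm",
)
transformer_config = tr.XLNetConfig.build(
    d_model=d_model, n_head=4, n_layer=2, total_seq_length=max_sequence_length
)
model = transformer_config.to_torch_model(
    input_module, tr.NextItemPredictionTask(weight_tying=True)
)
```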
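
And to illustrate the MTL ranking direction, here is a minimal MMoE in plain Keras (deliberately framework-agnostic rather than the Merlin Models block API): each task mixes a shared pool of experts through its own softmax gate:

```python
import tensorflow as tf

def build_mmoe(num_features: int, num_experts: int = 4, expert_units: int = 64):
    """Minimal MMoE sketch with two binary tasks ("click", "purchase")."""
    inputs = tf.keras.Input(shape=(num_features,))
    # Shared experts, stacked to shape (batch, num_experts, expert_units).
    experts = tf.stack(
        [tf.keras.layers.Dense(expert_units, activation="relu")(inputs)
         for _ in range(num_experts)],
        axis=1,
    )
    outputs = {}
    for task in ("click", "purchase"):
        # Per-task gate: a softmax over experts, conditioned on the input.
        gate = tf.keras.layers.Dense(num_experts, activation="softmax")(inputs)
        mixed = tf.einsum("be,beu->bu", gate, experts)
        tower = tf.keras.layers.Dense(32, activation="relu")(mixed)
        outputs[task] = tf.keras.layers.Dense(1, activation="sigmoid", name=task)(tower)
    return tf.keras.Model(inputs, outputs)

model = build_mmoe(num_features=30)
model.compile(optimizer="adam", loss="binary_crossentropy")
```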

Tasks

Ranking Models

AliCCP dataset experiments

Tenrec dataset experiments

Testing

Documentation

Retrieval Models

Finish the refactoring of the YouTubeDNN retrieval model
- [x] NVIDIA-Merlin/models#540 - Provide flexibility to define the targets of the YouTubeDNN sequential input (e.g., next-item prediction vs. last-item prediction)
- [x] NVIDIA-Merlin/models#622 - Make YouTubeDNN a retrieval model that exports a Top-K recommender model for evaluation and inference (see the sketch below)
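
For context on what the exported Top-K recommender does at evaluation/inference time, here is a framework-agnostic sketch (names and shapes are illustrative): score every candidate item embedding against the query (user/session) embedding and keep the k best:

```python
import numpy as np

def top_k_items(query_emb: np.ndarray, item_embs: np.ndarray, k: int = 10):
    """Return indices of the k highest-scoring items, best first."""
    scores = item_embs @ query_emb            # (num_items,)
    top_k = np.argpartition(-scores, k)[:k]   # unordered top-k candidates
    return top_k[np.argsort(-scores[top_k])]  # sorted by descending score

item_embs = np.random.rand(1_000, 64).astype(np.float32)
query_emb = np.random.rand(64).astype(np.float32)
print(top_k_items(query_emb, item_embs, k=10))
```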

Datasets preprocessing

Hypertuning

Documentation

Testing

Two-stage recommendation

Investigate/research how to better train two-stage RecSys pipelines

EvenOldridge (Member) commented:
@gabrielspmoreira @viswa-nvidia to update checkboxes and convert to issues across the releases.

gabrielspmoreira (Member, Author) commented:
FYI, I updated the checkboxes and created the tasks a few days ago.

gabrielspmoreira (Member, Author) commented:
Created some slides with an overview of the proposal to discuss in the next grooming meeting

gabrielspmoreira (Member, Author) commented Nov 15, 2022

I have written a new RMP, #732, as a rewrite of this one with a more customer-centric and pragmatic approach: a quick-start pipeline for customers. I am keeping this RMP unchanged since it was already prioritized, but I propose closing it and taking #732 as the replacement. The new one also reduces the scope of the dataset experiments, as we are initially going to stick to a single dataset (TenRec) as an example of how to use the quick-start template pipeline.

gabrielspmoreira (Member, Author) commented:
@viswa-nvidia @EvenOldridge I am closing this ticket, as I rewrote it with a customer-centric approach in #732, which is now assigned to 23.01.
