
[RMP] Provide to our customers best practices guidance for training Retrieval, Ranking and Multi-Stage RecSys models #553

Closed

gabrielspmoreira opened this issue Aug 19, 2022 · 5 comments

gabrielspmoreira (Member) commented Aug 19, 2022

Problem:

The Merlin platform provides tools for building multi-stage recommender systems (which we are going to present in our tutorial and demo at the RecSys '22 conference).
In particular, the retrieval and ranking stages are implemented in the Merlin Models library and tied together at inference time with the Merlin Systems library (see the sketch after this paragraph).
Although we provide our customers with implementations of retrieval and ranking models, plus a few example notebooks with toy datasets to get them started, it may be hard for them to reach reasonable accuracy and performance on their own datasets without extensive experimentation. In addition, customers are likely to hit issues in their experiments related to the flexibility of our API and to model accuracy/performance on real datasets.
This scenario might reduce customers' interest and engagement if they conclude that Merlin has not been thoroughly validated and refined on real datasets and is not mature enough for their purposes.
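
As a rough illustration of how the two stages fit together in Merlin Models, here is a minimal sketch using the library's synthetic data generator; the concrete layer sizes and the "click" target follow the public examples and are illustrative, not tuned recommendations:

```python
import merlin.models.tf as mm
from merlin.datasets.synthetic import generate_data
from merlin.schema import Tags

# Synthetic e-commerce-like data shipped with Merlin Models for demos/tests.
train, valid = generate_data("e-commerce", 100_000, set_sizes=(0.8, 0.2))

# Retrieval stage: a Two-Tower model trained with in-batch negative sampling.
# Target columns are removed, since retrieval here is trained on implicit positives.
retrieval_schema = train.schema.remove_by_tag(Tags.TARGET)
retrieval = mm.TwoTowerModel(retrieval_schema, query_tower=mm.MLPBlock([128, 64]))
retrieval.compile(optimizer="adam", run_eagerly=False)
retrieval.fit(train, batch_size=1024, epochs=1)

# Ranking stage: a DLRM model predicting a binary target (here, "click").
ranking = mm.DLRMModel(
    train.schema,
    embedding_dim=64,
    bottom_block=mm.MLPBlock([128, 64]),
    top_block=mm.MLPBlock([128, 64, 32]),
    prediction_tasks=mm.BinaryClassificationTask("click"),
)
ranking.compile(optimizer="adam", run_eagerly=False)
ranking.fit(train, batch_size=1024, epochs=1)
```

At inference time, Merlin Systems chains the two: the retrieval model narrows the catalog down to a candidate set, which the ranking model then re-scores.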

Goal:

The goal of this work is to improve the customer experience when they start experimenting with Merlin on their own datasets.
We want to leverage NVIDIA internal research and computational resources to perform comprehensive experimentation with our retrieval and ranking models across a diverse set of public datasets, so that we can:

  • Learn and share with our customers best practices on how to train two-stage RecSys models.
  • Test our implementation's scalability, performance, and model accuracy with real datasets, finding and fixing potential issues before our customers do.
  • Refine our API to provide the flexibility a Data Scientist / ML Engineer would reasonably expect from an ML framework.

Constraints:

  • The deliverables of this work, documenting best practices for our customers (and for our internal solution architects) in our documentation and examples, will include:
    • Merlin Model Zoo Leaderboard - A comparison of the accuracy and performance of different models on different public datasets for the retrieval and ranking tasks.
    • Hypertuning guidance - Identification of the most important hyperparameters to tune in such models, plus guidance on the search space as a function of dataset size (# of samples and features); see the first sketch after this list.
    • Advanced examples - Experimentation scripts showcasing advanced usage of our API to set the important hyperparameters.
  • CI integration tests on performance and accuracy regressions - Usage of the experimentation scripts in our CI, so that we can detect training performance or accuracy regressions; see the second sketch after this list.
  • API improvements - It is common during the experimentation process to find bugs or flexibility limitations in our API for some use cases or datasets. This work involves identifying these issues and fixing the critical ones.
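
To make the hypertuning deliverable concrete, the sketch below shows the kind of artifact we have in mind: a search space whose ranges scale with dataset size. The helper name and the specific ranges are assumptions for illustration, not validated guidance:

```python
# Illustrative only: the function and its ranges are hypothetical placeholders
# for the guidance this RMP would produce, not a Merlin API.
def suggest_search_space(num_examples: int, num_features: int) -> dict:
    """Return hyperparameter ranges scaled by dataset size."""
    large = num_examples > 10_000_000
    # More features generally call for wider first MLP layers.
    first_layer = 256 if num_features > 100 else 128
    return {
        "embedding_dim": [32, 64, 128] if large else [8, 16, 32],
        "mlp_layers": [[first_layer, 64], [first_layer * 2, first_layer, 64]],
        "learning_rate": (1e-4, 1e-2),  # sampled log-uniform
        "dropout": (0.0, 0.3),
        # Larger datasets usually tolerate (and benefit from) bigger batches.
        "batch_size": [8_192, 16_384] if large else [1_024, 4_096],
    }

space = suggest_search_space(num_examples=50_000_000, num_features=30)
```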
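
Similarly, the CI regression tests could take the shape below: a small pytest that trains a model on synthetic data and asserts a metric floor. The metric key and the threshold are placeholders that would be calibrated per dataset:

```python
# test_retrieval_regression.py - a sketch of an accuracy-regression CI test.
import merlin.models.tf as mm
from merlin.datasets.synthetic import generate_data
from merlin.schema import Tags

def test_two_tower_recall_does_not_regress():
    train, valid = generate_data("e-commerce", 10_000, set_sizes=(0.8, 0.2))
    schema = train.schema.remove_by_tag(Tags.TARGET)
    model = mm.TwoTowerModel(schema, query_tower=mm.MLPBlock([64, 32]))
    model.compile(optimizer="adam", metrics=[mm.RecallAt(10)])
    model.fit(train, batch_size=1024, epochs=1)
    metrics = model.evaluate(valid, batch_size=1024, return_dict=True)
    # Placeholder floor; a real test would pin a calibrated per-dataset value.
    assert metrics["recall_at_10"] >= 0.01
```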

Starting Point:

Our research team has already done, or is currently doing, experimentation work for some RecSys use cases:

  • Session-based recommendation – For the research in which the Transformers4Rec library was designed and experimented with, we provided, in the online appendix of our RecSys paper:
    • The model zoo leaderboard, comparing model accuracy and performance
    • Some hypertuning guidance (the search space, but not a ranking of the most important hparams)
    • An advanced example of the API
    A minimal Transformers4Rec sketch is included after this list.
  • Retrieval models – The experimentation process in our research on retrieval models allowed us to identify a number of API, performance, and accuracy issues, all of them either fixed or in progress. The preliminary experiment results are here (internal to NVIDIA), comparing retrieval models such as MF (both Implicit and Merlin Models), Two-Tower, and YouTubeDNN across different datasets. We are going to refine the experiments after the ongoing refactoring of some related Merlin Models building blocks. Based on the retrieval experiment scripts, we now have integration tests of retrieval models to monitor accuracy.
    The effort in the retrieval research is tightly related to the scope of this RMP.
  • Ranking models – We currently have a research project focused on Single-Task Learning (STL) and Multi-Task Learning (MTL) for ranking models (e.g., predicting the likelihood of a user clicking, liking, sharing, and/or purchasing an item), a typical use case for large companies using two-stage recommendation with rich user feedback (as in the Tenrec dataset provided by Tencent and the Twitter dataset provided for the RecSys Challenge).
    We have been refining/fixing and experimenting with a number of ranking models (e.g., DCN-v2, DLRM, DeepFM, Wide&Deep) and MTL ranking models (e.g., MMoE, PLE) on the AliCCP and Tenrec public datasets; an MTL sketch follows this list. That research work is directly related to this RMP.
  • Two-stage RecSys model training – For the hack week, we have started a research project to investigate best practices for training two-stage RecSys pipelines (proposal here - internal only), composed of retrieval and ranking models. In particular, we are interested in understanding how to train the ranking model so that it improves the ranking of the items provided by a retrieval model (e.g., through additional features, real vs. sampled negatives, or transfer learning).
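
For reference, the session-based work mentioned above builds models along these lines (adapted from the Transformers4Rec examples; the schema line is a placeholder for a preprocessed, sessionized dataset):

```python
from transformers4rec import torch as tr

schema = ...  # Merlin schema of a preprocessed, sessionized dataset (placeholder)
max_sequence_length, d_model = 20, 64

# Sequential tabular input with masked language modeling for next-item prediction.
input_module = tr.TabularSequenceFeatures.from_schema(
    schema,
    max_sequence_length=max_sequence_length,
    continuous_projection=d_model,
    aggregation="concat",
    masking="mlm",
)
transformer_config = tr.XLNetConfig.build(
    d_model=d_model, n_head=4, n_layer=2, total_seq_length=max_sequence_length
)
model = transformer_config.to_torch_model(
    input_module, tr.NextItemPredictionTask(weight_tying=True)
)
```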
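
And to illustrate the MTL ranking direction, here is a minimal MMoE in plain Keras (deliberately framework-agnostic rather than the Merlin Models block API): each task mixes a shared pool of experts through its own softmax gate:

```python
import tensorflow as tf

def build_mmoe(num_features: int, num_experts: int = 4, expert_units: int = 64):
    """Minimal MMoE sketch with two binary tasks ("click", "purchase")."""
    inputs = tf.keras.Input(shape=(num_features,))
    # Shared experts, stacked to shape (batch, num_experts, expert_units).
    experts = tf.stack(
        [tf.keras.layers.Dense(expert_units, activation="relu")(inputs)
         for _ in range(num_experts)],
        axis=1,
    )
    outputs = {}
    for task in ("click", "purchase"):
        # Per-task gate: a softmax over experts, conditioned on the input.
        gate = tf.keras.layers.Dense(num_experts, activation="softmax")(inputs)
        mixed = tf.einsum("be,beu->bu", gate, experts)
        tower = tf.keras.layers.Dense(32, activation="relu")(mixed)
        outputs[task] = tf.keras.layers.Dense(1, activation="sigmoid", name=task)(tower)
    return tf.keras.Model(inputs, outputs)

model = build_mmoe(num_features=30)
model.compile(optimizer="adam", loss="binary_crossentropy")
```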

Tasks

Ranking Models

AliCCP dataset experiments

Tenrec dataset experiments

Testing

Documentation

Retrieval Models

Finish the refactoring of the YouTubeDNN retrieval model
- [x] NVIDIA-Merlin/models#540 - Provide flexibility to define the targets of the YouTubeDNN sequential input (e.g., next-item prediction vs. last-item prediction)
- [x] NVIDIA-Merlin/models#622 - Make YouTubeDNN a retrieval model that exports a Top-K recommender model for evaluation and inference (see the sketch below)
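
For context on what the exported Top-K recommender does at evaluation/inference time, here is a framework-agnostic sketch (names and shapes are illustrative): score every candidate item embedding against the query (user/session) embedding and keep the k best:

```python
import numpy as np

def top_k_items(query_emb: np.ndarray, item_embs: np.ndarray, k: int = 10):
    """Return indices of the k highest-scoring items, best first."""
    scores = item_embs @ query_emb            # (num_items,)
    top_k = np.argpartition(-scores, k)[:k]   # unordered top-k candidates
    return top_k[np.argsort(-scores[top_k])]  # sorted by descending score

item_embs = np.random.rand(1_000, 64).astype(np.float32)
query_emb = np.random.rand(64).astype(np.float32)
print(top_k_items(query_emb, item_embs, k=10))
```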

Datasets preprocessing

Hypertuning

Documentation

Testing

Two-stage recommendation

Investigate/research how to better train two-stage RecSys pipelines

EvenOldridge (Member) commented:
@gabrielspmoreira @viswa-nvidia to update checkboxes and convert to issues across the releases.

gabrielspmoreira (Member, Author) commented:
FYI, I updated the checkboxes and created the tasks a few days ago.

gabrielspmoreira (Member, Author) commented:
Created some slides with an overview of the proposal to discuss in the next grooming meeting

gabrielspmoreira (Member, Author) commented Nov 15, 2022

I have written a new RMP, #732, as a rewrite of this one with a more customer-centric and pragmatic approach: a quick-start pipeline for customers. I am keeping this RMP unchanged since it was already prioritized, but I propose closing it and taking #732 as the replacement. The new one also reduces the scope of the dataset experiments, as we are initially going to stick to a single dataset (TenRec) as an example of how to use the quick-start template pipeline.

gabrielspmoreira (Member, Author) commented:
@viswa-nvidia @EvenOldridge I am closing this ticket, as I rewrote it with a customer-centric approach in #732, which is now assigned to 23.01.
