[RMP] Add PyTorch backend in Merlin Models #893

marcromeyn · 2023-04-03T11:47:54Z

Problem:

We are currently in a situation where some customers use Merlin Models and others use T4Rec (Transformers4Rec) to train models. The APIs of these two tools have diverged quite dramatically, and some features (like extracting embeddings out of models) are only supported in Merlin Models. Both tools need work before their APIs are easy to use.

On the Merlin Models side, we are in an in-between state where (because of time pressure) there are a bunch of V1 & V2 classes side by side. We would like to migrate all our users to the V2 classes (while removing V2 from the name) & deprecate the old classes.

On the T4Rec side, we would like to keep using this project for session-based models in PyTorch because of the traction we've got. The idea would be to break out the core model-building parts (block-API) in favor of the PyTorch backend of Merlin Models. This roadmap-level ticket focuses on this new PyTorch backend; integration into T4Rec is left for later. The first major deliverable of this backend is the creation of retrieval models, because we typically frame session-based models as retrieval models.

Goal:

Reach feature parity & rough API parity between the TF & PyTorch backends in Merlin Models. This roadmap ticket covers PyTorch; a future roadmap ticket will focus on TF.

New Functionality

Models
- PyTorch: New backend, built from the ground up based on the TF implementation. Port all the retrieval examples.

Constraints:

We focus on just retrieval-models. Ranking-models will be tackled in a future roadmap ticket.
Migrating T4Rec to the new Block-API is future work and will be captured in another roadmap-level ticket.

Starting Point:

To plan the work properly, a dev-branch was created to answer various design questions around creating retrieval models in PyTorch. This has led to a rough MVP that contains all the major pieces. It has also given us a better idea of how to break things down to turn the MVP into a fully fleshed-out product.

We are planning to have people work in parallel on 4 different major parts: inputs, outputs, models & masking.

Implement base-classes of block-API in PyTorch

People: @marcromeyn

Currently the block-API in T4Rec uses a design similar to Keras to allow for modules that lazily initialize their variables. We would like to deprecate this in favor of a native way to achieve the same thing, which was launched recently.
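The issue does not name the native mechanism it refers to. As a minimal sketch, assuming it means PyTorch's built-in lazy modules (such as `torch.nn.LazyLinear`), a block that infers its input width on the first forward pass could look like this. `LazyMLPBlock` is a hypothetical name used only for illustration and is not part of Merlin Models or T4Rec.

```python
import torch
from torch import nn


class LazyMLPBlock(nn.Module):
    """Hypothetical block: the input width is inferred on the first forward pass."""

    def __init__(self, hidden_dim: int, out_dim: int):
        super().__init__()
        # nn.LazyLinear defers creating its weight until it sees an input.
        self.hidden = nn.LazyLinear(hidden_dim)
        self.out = nn.Linear(hidden_dim, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.out(torch.relu(self.hidden(x)))


block = LazyMLPBlock(hidden_dim=16, out_dim=4)
batch = torch.randn(8, 10)  # 10 input features, never declared up front
output = block(batch)       # the first call materializes the lazy weight
```

After the first call, `block.hidden` holds a regular weight whose input dimension matches the data, so no custom build step is needed for this case.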

Masking

People: @sararb, @gabrielspmoreira & @marcromeyn

This work depends on answering the design question of how to handle ragged tensors.

Tasks: TODO
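To make the ragged-tensor question concrete, here is a small sketch of two common representations of variable-length sessions in PyTorch: flat values plus row offsets, and a padded dense tensor with a boolean mask. This illustrates the trade-off only; it is not the design chosen in the ticket, and the padding id `0` is an assumption.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Three sessions of different lengths (item ids); 0 is assumed to be reserved for padding.
sessions = [
    torch.tensor([5, 3, 9]),
    torch.tensor([7]),
    torch.tensor([2, 8]),
]

# Ragged form: one flat tensor of values plus the offset where each row starts.
values = torch.cat(sessions)
lengths = torch.tensor([len(s) for s in sessions])
offsets = torch.cat([torch.zeros(1, dtype=torch.long), lengths.cumsum(0)])

# Dense form: pad every row to the longest session and keep a boolean mask
# that is True on real positions and False on padding.
padded = pad_sequence(sessions, batch_first=True, padding_value=0)
mask = torch.arange(padded.shape[1]).unsqueeze(0) < lengths.unsqueeze(1)
```

The ragged form avoids wasted memory on padding, while the dense form plugs directly into standard layers, with the mask telling masking schemes which positions to ignore.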

Input-blocks

People: @marcromeyn

PyTorch

Starting point: MVP

Implement Continuous & Embeddings
Implement TabularInputBlock
Implement Encoder
Add support for sequential-features in input-blocks
Do performance testing of holding multiple features in a single embedding-table
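The last task above, holding multiple features in a single embedding table, can be sketched as follows. This is one possible layout under stated assumptions: each feature gets a contiguous row range in a shared `nn.Embedding`, and ids are shifted by a per-feature start offset. `SharedEmbeddingTable`, the feature names, and the cardinalities are hypothetical.

```python
import torch
from torch import nn


class SharedEmbeddingTable(nn.Module):
    """Hypothetical sketch: several categorical features share one embedding table."""

    def __init__(self, cardinalities: dict, dim: int):
        super().__init__()
        self.names = list(cardinalities)
        sizes = torch.tensor([cardinalities[n] for n in self.names])
        # Start row of each feature inside the shared table.
        starts = torch.cat([torch.zeros(1, dtype=torch.long), sizes.cumsum(0)[:-1]])
        self.register_buffer("starts", starts)
        self.table = nn.Embedding(int(sizes.sum()), dim)

    def forward(self, inputs: dict) -> torch.Tensor:
        # Stack per-feature ids to (batch, n_features), then shift to table-global ids.
        ids = torch.stack([inputs[n] for n in self.names], dim=1) + self.starts
        return self.table(ids)  # (batch, n_features, dim)


# Hypothetical feature cardinalities.
model = SharedEmbeddingTable({"item_id": 100, "category": 20, "brand": 50}, dim=8)
features = {
    "item_id": torch.tensor([3, 99]),
    "category": torch.tensor([0, 19]),
    "brand": torch.tensor([49, 1]),
}
embeddings = model(features)
```

A single lookup replaces one lookup per feature, which is the kind of difference the performance test would need to measure; this layout requires all features to share the same embedding dimension.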

Output-blocks

People: @edknv & @marcromeyn

(Add BinaryOutput models#1099)
(Add RegressionOutput models#1115)
Port CategoricalOutput
Port ContrastiveOutput + negative samplers
Port TopKOutput
Port OutputBlock (for multi-task learning)
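The issue does not describe how `ContrastiveOutput` and its negative samplers work internally. As a rough illustration of the general idea, a widely used formulation is a contrastive loss with in-batch negatives: each query's positive is the item in the same row, and every other item in the batch acts as a negative. The function name and the `temperature` parameter below are assumptions, not the Merlin Models API.

```python
import torch
import torch.nn.functional as F


def in_batch_contrastive_loss(query_emb, item_emb, temperature=1.0):
    """Hypothetical helper: softmax cross-entropy over in-batch negatives."""
    # (batch, batch) similarity matrix; entry [i, j] scores query i against item j.
    logits = query_emb @ item_emb.T / temperature
    # The positive for query i is item i; all other columns are negatives.
    targets = torch.arange(query_emb.shape[0], device=query_emb.device)
    return F.cross_entropy(logits, targets)


# Orthogonal one-hot embeddings make the behaviour easy to see.
queries = torch.eye(4)
loss_matched = in_batch_contrastive_loss(queries, queries, temperature=0.05)
loss_shifted = in_batch_contrastive_loss(queries, queries.roll(1, dims=0), temperature=0.05)
```

When each query lines up with its own item the loss is close to zero; when the items are shifted so that every query matches a negative instead, the loss is large.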

Models

People: @edknv & @marcromeyn

Starting point: MVP

One of the leading questions in the initial experimentation phase was whether we can leverage PyTorch Lightning for a high-level training API (similar to how we use Keras on the TF side). We are confident that PyTorch Lightning is the right path forward.

Implement Model class (using PyTorch Lightning)
Create custom Trainer that can handle multi-GPU with data-loader
Implement RetrievalModel class
Port MatrixFactorizationModel, TwoTowerModel & YoutubeDNNRetrievalModel
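For readers unfamiliar with the model family being ported, below is a minimal two-tower retrieval sketch in plain PyTorch. It is not the Merlin Models `TwoTowerModel` API: `TwoTowerSketch` and its `top_k` method are hypothetical, the towers are deliberately tiny, and the PyTorch Lightning training wrapper planned above is left out to keep the example dependency-free.

```python
import torch
from torch import nn


class TwoTowerSketch(nn.Module):
    """Hypothetical two-tower retrieval model; not the Merlin Models API."""

    def __init__(self, n_users: int, n_items: int, dim: int = 16):
        super().__init__()
        self.query_tower = nn.Sequential(nn.Embedding(n_users, dim), nn.Linear(dim, dim))
        self.item_tower = nn.Sequential(nn.Embedding(n_items, dim), nn.Linear(dim, dim))

    def forward(self, user_ids: torch.Tensor, item_ids: torch.Tensor) -> torch.Tensor:
        # Dot product between matching rows of the two tower outputs.
        return (self.query_tower(user_ids) * self.item_tower(item_ids)).sum(-1)

    @torch.no_grad()
    def top_k(self, user_ids: torch.Tensor, k: int):
        # Score every user against the full item catalogue and keep the k best.
        all_items = torch.arange(self.item_tower[0].num_embeddings)
        scores = self.query_tower(user_ids) @ self.item_tower(all_items).T
        return torch.topk(scores, k=k, dim=-1)


model = TwoTowerSketch(n_users=10, n_items=50)
users = torch.tensor([0, 3, 7])
pair_scores = model(users, torch.tensor([1, 2, 3]))
top_scores, top_items = model.top_k(users, k=5)
```

Because the two towers only interact through a dot product, item embeddings can be precomputed once and searched at serving time, which is why this family is framed as retrieval.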

Documentation

Create a migration guide from Transformers4Rec to Merlin Models session-based PyTorch API

viswa-nvidia · 2023-05-16T17:04:34Z

@marcromeyn, please create the tasks for PyT and create the tickets so that we can assign them.

EvenOldridge · 2023-06-27T16:57:57Z

@marcromeyn @gabrielspmoreira can you work to split this up into: Ranking, Retrieval and Session-based?

marcromeyn · 2023-07-03T16:38:36Z

Ranking ticket is here: #1044

marcromeyn added the roadmap label Apr 3, 2023

marcromeyn assigned EvenOldridge Apr 3, 2023

EvenOldridge added this to the Merlin 23.05 milestone Apr 26, 2023

EvenOldridge assigned gabrielspmoreira, marcromeyn, sararb, edknv and oliverholworthy Apr 27, 2023

EvenOldridge mentioned this issue Apr 27, 2023

[RMP] Merlin Models PyTorch API #534

Closed

4 tasks

EvenOldridge modified the milestones: Merlin 23.05, Merlin 23.06 Apr 27, 2023

marcromeyn changed the title ~~[RMP] Unify and clean up block-API in TensorFlow & PyTorch~~ [RMP] Add PyTorch backend in Merlin Models May 30, 2023

viswa-nvidia modified the milestones: Merlin 23.06, Merlin 23.07 Jun 6, 2023

oliverholworthy modified the milestones: Merlin 23.07, Merlin 23.09 Jul 3, 2023

marcromeyn commented Apr 3, 2023 • edited by oliverholworthy
