[RMP] Improve the speed of training retrieval models with Merlin Models #259

Closed
3 tasks done
karlhigley opened this issue May 3, 2022 · 3 comments

Comments

@karlhigley
Contributor

karlhigley commented May 3, 2022

Problem:

Merlin Models needs to differentiate itself from other RecSys libraries, and GPU performance should be one of the areas where it does so. If our libraries don't follow best practices and deliver fast, measurable performance on the GPU, potential customers have no reason to use them.

Goal:

  • Provide performant retrieval models in production
  • Follow the GPU optimization best practices established by our colleagues

Constraints:

  • Merlin Models is built on top of TensorFlow

Possible Optimizations

Retrieval models

@karlhigley karlhigley added the epic label May 3, 2022
@karlhigley karlhigley added this to the Merlin 22.05 milestone May 3, 2022
@viswa-nvidia viswa-nvidia changed the title [RMP] Performance optimization of model training and serving [RMP] Performance optimization of model training May 25, 2022
@karlhigley karlhigley changed the title [RMP] Performance optimization of model training [RMP] Improve the speed of training models with Merlin Models May 25, 2022
@rnyak rnyak removed this from the Merlin 22.05 milestone Jun 22, 2022
@EvenOldridge
Member

@gabrielspmoreira @marcromeyn Where are we at with the perf regressions we were seeing? Can we close this?

@viswa-nvidia

Check whether bug NVIDIA-Merlin/models#339 is done; if it is, this issue should be closed. This is an ongoing effort. The profiling portion will be spun off as a separate RMP ticket.

@gabrielspmoreira gabrielspmoreira changed the title [RMP] Improve the speed of training models with Merlin Models [RMP] Improve the speed of training retrieval models with Merlin Models Oct 26, 2022
@gabrielspmoreira gabrielspmoreira added this to the Merlin 22.11 milestone Oct 26, 2022
@gabrielspmoreira
Member

gabrielspmoreira commented Oct 26, 2022

@EvenOldridge @viswa-nvidia As we discussed in the Grooming meeting today, I tested the pending runtime issue in retrieval model training (NVIDIA-Merlin/models#339) against the current implementation (both V1 and V2), and it no longer occurs, so that bug has been closed.
I also moved the profiling task for retrieval model pipelines out of this RMP into a new RMP, #709, which also covers ranking model pipelines, since profiling tasks will require external support (Valerie).
So I am closing this RMP, as it already delivers value through the completed perf improvements on retrieval models.


7 participants