✨[Feature] Weight specific engine caching #3146

Open
narendasan opened this issue Sep 4, 2024 · 0 comments · May be fixed by #3167

Comments

narendasan (Collaborator) commented Sep 4, 2024

Is your feature request related to a problem? Please describe.

Engine caching is currently weight-agnostic: a cached engine must remain refittable so it can be reused with different weights, and that comes at the cost of lower-performance engines.

Describe the solution you'd like

If we know that the weights will be identical, we can cache higher-performance engines. The caching system would need to distinguish between these two kinds of cached engines (weight-agnostic and weight-specific) and select the right one based on user settings.

TensorRT has a builder flag called kREFIT_IDENTICAL for this workflow.
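
To make the proposal concrete, here is a minimal sketch of how a cache could keep the two kinds of engines apart. It is not the Torch-TensorRT implementation: `TwoTierEngineCache`, `hash_bytes`, and the `assume_identical_weights` setting are hypothetical names chosen for illustration, and engines are stood in for by plain bytes. The idea is that a weight-agnostic entry is keyed by the graph hash alone, while a weight-specific entry (for example, one built with kREFIT_IDENTICAL) is keyed by the graph hash plus a hash of the weights.

```python
import hashlib
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical names throughout; this is not the Torch-TensorRT API.
# Key layout: (graph hash, weight hash), with "" meaning weight-agnostic.
CacheKey = Tuple[str, str]


def hash_bytes(chunks: Iterable[bytes]) -> str:
    """Return a SHA-256 hex digest over a sequence of byte chunks."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


class TwoTierEngineCache:
    """In-memory cache holding weight-agnostic and weight-specific engines."""

    def __init__(self) -> None:
        self._store: Dict[CacheKey, bytes] = {}

    def save(
        self, graph_hash: str, engine: bytes, weight_hash: Optional[str] = None
    ) -> None:
        # Without a weight hash the entry is weight-agnostic (refittable).
        self._store[(graph_hash, weight_hash or "")] = engine

    def load(
        self, graph_hash: str, weight_hash: str, assume_identical_weights: bool
    ) -> Optional[bytes]:
        # Prefer the weight-specific engine only when the user opted in
        # and the weights match exactly.
        if assume_identical_weights:
            hit = self._store.get((graph_hash, weight_hash))
            if hit is not None:
                return hit
        # Otherwise fall back to the weight-agnostic engine, if any.
        return self._store.get((graph_hash, ""))
```

Falling back to the weight-agnostic entry on a weight-specific miss is one possible design choice; a caller that receives the fallback engine would still need to refit it with the current weights.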

Describe alternatives you've considered

Additional context

@narendasan narendasan added the feature request New feature or request label Sep 4, 2024
@zewenli98 zewenli98 linked a pull request Sep 19, 2024 that will close this issue
7 tasks
2 participants