feat: Runtime output buffer optimization #3276
base: main
Conversation
(Force-pushed from d0ef3cd to 377248e, then from 0a98180 to 4a5f0d1.)
I think this PR doesn't have to do with fake tensors, as the output shape was inferred from the TRT function.
```diff
@@ -263,19 +284,15 @@ std::vector<at::Tensor> execute_engine(std::vector<at::Tensor> inputs, c10::intr
       output_profiler_guard =
           std::make_unique<torch::autograd::profiler::RecordProfile>(compiled_engine->output_profile_path);
     }
+    if ((false == compiled_engine->use_pre_allocated_outputs) || shape_changed) {
```
`!compiled_engine->use_pre_allocated_outputs` ?
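The reviewer's suggestion is a pure readability change: the Yoda-style comparison and the plain negation are logically identical. An exhaustive check (sketched in Python for brevity) over both boolean inputs confirms the equivalence:

```python
# Every combination of the two flags yields the same branch decision
# for the original condition and the reviewer's suggested form.
for use_pre_allocated_outputs in (False, True):
    for shape_changed in (False, True):
        yoda = (False == use_pre_allocated_outputs) or shape_changed
        plain = (not use_pre_allocated_outputs) or shape_changed
        assert yoda == plain
```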
```cpp
  return false;
}

std::vector<at::Tensor> create_output_tensors(c10::intrusive_ptr<TRTEngine> compiled_engine) {
```
Can we functionalize input allocation/creation in `execute_engine`, similar to this? (I posted a similar comment in your wrapper module PR.)
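A minimal sketch of what that functionalization could look like, with plain Python lists standing in for tensors. All names here are hypothetical illustrations of the pattern, not the PR's actual API:

```python
def _numel(shape):
    """Number of elements for a given shape tuple."""
    n = 1
    for dim in shape:
        n *= dim
    return n

def create_input_buffers(shapes):
    """Allocate one zeroed buffer per input shape, mirroring the
    create_output_tensors pattern the reviewer points at."""
    return [[0.0] * _numel(shape) for shape in shapes]
```

Pulling allocation into a named helper also makes it reusable for the pre-allocation path, since the same function can be called one iteration ahead.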
Create a context manager to enable this across subgraphs
Description
Latency hiding by creating the output tensor for the next inference ahead of time, so allocation overlaps engine execution.
Fixes #3275
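The double-buffering idea behind the description can be sketched as follows: allocate the next iteration's output while the current iteration is (conceptually) executing. This is a pure-Python stand-in with illustrative names only; in the real runtime the allocation would overlap asynchronous engine execution on the CUDA stream:

```python
def run_with_latency_hiding(num_steps, allocate, execute):
    """Allocate the next iteration's output while the current one 'runs'.

    `allocate` returns a fresh output buffer; `execute(buf, step)` fills it.
    """
    results = []
    next_output = allocate()      # first buffer, created before the loop
    for step in range(num_steps):
        current = next_output     # buffer prepared on the previous iteration
        next_output = allocate()  # this cost is hidden behind execution
        execute(current, step)    # stand-in for enqueueing the engine
        results.append(current)
    return results
```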
Type of change
Please delete options that are not relevant and/or add your own.
Checklist: