✨[Feature] Performance optimization of PyTorch + TRT subgraphs #3277

keehyuna · 2024-11-04T13:26:45Z

Is your feature request related to a problem? Please describe.

If unsupported ops cause graph breaks, overhead is observed between the TRT and PyTorch modules.

Describe the solution you'd like

Capture and replay the entire set of subgraphs with CUDA Graphs in a wrapper runtime module.
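A minimal sketch of the idea, not the proposed implementation: the class name `WholeGraphRunner` is hypothetical, and a plain `nn.Sequential` would stand in for a graph that mixes TRT engines and PyTorch fallback segments. It uses only the public `torch.cuda.CUDAGraph` / `torch.cuda.graph` API and assumes static input shapes between captures; on a machine without CUDA it falls back to eager execution.

```python
import torch
import torch.nn as nn


class WholeGraphRunner(nn.Module):
    """Hypothetical wrapper that records one CUDA graph over every submodule."""

    def __init__(self, module: nn.Module):
        super().__init__()
        self.module = module
        self.graph = None
        self.static_input = None
        self.static_output = None

    def _capture(self, x: torch.Tensor) -> None:
        # CUDA graphs replay fixed memory addresses, so keep static buffers.
        self.static_input = x.clone()

        # Warm up on a side stream so lazy initialization is not captured.
        side = torch.cuda.Stream()
        side.wait_stream(torch.cuda.current_stream())
        with torch.cuda.stream(side), torch.no_grad():
            for _ in range(3):
                self.module(self.static_input)
        torch.cuda.current_stream().wait_stream(side)

        # Record all kernels of the wrapped module into a single graph.
        self.graph = torch.cuda.CUDAGraph()
        with torch.cuda.graph(self.graph), torch.no_grad():
            self.static_output = self.module(self.static_input)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if not x.is_cuda:
            # No CUDA graph support on CPU: run eagerly.
            with torch.no_grad():
                return self.module(x)
        if self.graph is None or x.shape != self.static_input.shape:
            # First call or a new input shape: (re)capture.
            self._capture(x)
        self.static_input.copy_(x)
        self.graph.replay()  # one launch for the whole wrapped module
        return self.static_output.clone()
```

Wrapping the outermost module means a single `replay()` call covers every segment, whereas per-module graphs would still need one launch per segment plus the Python glue between them.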

Describe alternatives you've considered

CUDA Graphs can be applied to the PyTorch and TRT modules individually, but that is not ideal for reducing CPU overhead.

Additional context

keehyuna added the feature request New feature or request label Nov 4, 2024

keehyuna assigned narendasan Nov 4, 2024

This was referenced Nov 4, 2024

📖 [Story] Optimize the launch overhead of TRT engine and pytorch kernels #3274

Open

Wrapper module around TRT + pytorch subgraphs #3270

Open

keehyuna assigned keehyuna and unassigned narendasan Nov 4, 2024
