# Torch-TensorRT v1.3.0
PyTorch 1.13, CUDA 11.7, TensorRT 8.5, Support for Dynamic Batch for Partially Compiled Modules, Engine Profiling, Experimental Unified Runtime for FX and TorchScript Frontends
Torch-TensorRT 1.3.0 targets PyTorch 1.13, CUDA 11.7, cuDNN 8.5 and TensorRT 8.5. This release focuses on adding support for dynamic batch sizes for partially compiled modules using the TorchScript frontend (this is also supported with the FX frontend). It also introduces a new execution profiling utility to understand the execution of specific engine sub-blocks, which can be used in conjunction with PyTorch profiling tools to understand the performance of your model post-compilation. Finally, this release introduces a new experimental unified runtime shared by both the TorchScript and FX frontends. This allows you to start using the FX frontend to generate `torch.jit.trace`-able compiled modules.

## Dynamic Batch Sizes for Partially Compiled Modules via the TorchScript Frontend
A long-standing limitation of the partitioning system in the TorchScript frontend is the lack of support for dynamic shapes. In this release we address a major subset of these use cases with support for dynamic batch sizes for modules that will be partially compiled. Usage is the same as the fully compiled workflow: using the `torch_tensorrt.Input` class, you may define the range of shapes that an input may take during runtime. This is represented as a set of three shape sizes: `min`, `max` and `opt`. `min` and `max` define the dynamic range of the input tensor. `opt` informs TensorRT what size to optimize for, provided there are multiple valid kernels available. TensorRT will select kernels that are valid for the full range of input shapes but most efficient at the `opt` size. In this release, partially compiled module inputs can vary in shape only for the highest-order dimension.
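For example, a minimal sketch (the concrete shape values here are illustrative, assuming an NCHW image input) in which only the batch dimension varies:

```python
import torch_tensorrt

# Only the highest-order (batch) dimension varies; TensorRT will tune
# kernels for the `opt` size while remaining valid from `min` to `max`.
batch_dynamic_input = torch_tensorrt.Input(
    min_shape=(1, 3, 224, 224),
    opt_shape=(8, 3, 224, 224),
    max_shape=(32, 3, 224, 224),
)
```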
is a valid shape range; however, a range such as the following, which varies a lower-order dimension:
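Again a sketch with illustrative values, this time varying the spatial dimensions rather than the batch dimension:

```python
# Varying a dimension other than the highest-order (batch) dimension
# is not supported for partially compiled modules in this release.
spatial_dynamic_input = torch_tensorrt.Input(
    min_shape=(1, 3, 128, 128),
    opt_shape=(1, 3, 256, 256),
    max_shape=(1, 3, 512, 512),
)
```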
is still not supported.
## Engine Profiling [Experimental]
This release introduces a number of profiling tools to measure the performance of TensorRT sub-blocks in compiled modules. This can be used in conjunction with PyTorch profiling tools to get a picture of the performance of your model. Profiling for any particular sub-block can be enabled by the `enable_profiling()` method of any `__torch__.classes.tensorrt.Engine` attribute, or of any `torch_tensorrt.TRTModuleNext`. The profiler will dump trace files by default in `/tmp`, though this path can be customized by either setting the `profile_path_prefix` of `__torch__.classes.tensorrt.Engine` or as an argument to `torch_tensorrt.TRTModuleNext.enable_profiling(profiling_results_dir="")`.
Traces can be visualized using the Perfetto tool (https://perfetto.dev). Engine layer information can also be accessed using `get_layer_info`, which returns a JSON string with the layers / fusions that the engine contains.
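As a sketch of the workflow (assuming `trt_mod` is a compiled module containing `TRTModuleNext` submodules, e.g. from the FX frontend with `use_experimental_fx_rt=True`; the output directory and input shape are illustrative):

```python
import torch
import torch_tensorrt

# Enable profiling on every TensorRT engine submodule in the compiled model.
for name, submod in trt_mod.named_modules():
    if isinstance(submod, torch_tensorrt.TRTModuleNext):
        submod.enable_profiling(profiling_results_dir="/tmp/trt_traces")
        # JSON description of the layers / fusions inside this engine
        print(name, submod.get_layer_info())

# Traces are emitted as the engines execute; open them with https://perfetto.dev
trt_mod(torch.randn(8, 3, 224, 224).cuda())
```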
## Unified Runtime for FX and TorchScript Frontends [Experimental]
In previous versions of Torch-TensorRT, the FX and TorchScript frontends were mostly separate and each had its own distinct benefits and limitations. Torch-TensorRT 1.3.0 introduces a new unified runtime to support both FX and TorchScript, meaning that you can choose the compilation workflow that makes the most sense for your particular use case, be it pure-Python conversion via FX or C++ TorchScript compilation. Both frontends use the same primitives to construct their compiled graphs, be they fully or only partially compiled.
### Basic Usage
The TorchScript frontend uses the new runtime by default. No additional workflow changes are necessary.
For the FX frontend, the new runtime can be chosen by setting `use_experimental_fx_rt=True` as part of your compile settings, via either `torch_tensorrt.compile(my_mod, ir="fx", use_experimental_fx_rt=True, explicit_batch_dimension=True)` or `torch_tensorrt.fx.compile(my_mod, use_experimental_fx_rt=True, explicit_batch_dimension=True)`.
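A minimal end-to-end sketch (the model, input shape, and precision settings are illustrative, not from the original notes):

```python
import torch
import torch_tensorrt

# Any FX-traceable module works; this toy model is just for illustration.
class MyModel(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x) + 1.0

my_mod = MyModel().eval().cuda()

# Compile through the FX frontend, opting in to the unified runtime.
trt_mod = torch_tensorrt.compile(
    my_mod,
    ir="fx",
    inputs=[torch.randn(8, 3, 224, 224).cuda()],
    enabled_precisions={torch.float},
    use_experimental_fx_rt=True,
    explicit_batch_dimension=True,
)

print(trt_mod(torch.randn(8, 3, 224, 224).cuda()).shape)
```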
### TRTModuleNext
The FX frontend will return a `torch.nn.Module` containing `torch_tensorrt.TRTModuleNext` submodules instead of `torch_tensorrt.fx.TRTModule`s. The features of these modules are nearly identical, but with a few key improvements:

- `TRTModuleNext` profiling dumps a trace visualizable with Perfetto (see above for more details).
- `TRTModuleNext` modules are `torch.jit.trace`-able, meaning you can save FX-compiled modules as TorchScript for Python-less / C++ deployment scenarios; see the sketch after this list. Traced compiled modules have the same deployment instructions as compiled modules produced by the TorchScript frontend.
- `TRTModuleNext` supports the same serialization workflows `TRTModule` supports as well (state_dict / extra_state, `torch.save` / `torch.load`).
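A sketch of tracing and saving an FX-compiled module (assuming `trt_mod` from the Basic Usage snippet above; the file name is illustrative):

```python
import torch

# Tracing a module whose submodules are TRTModuleNext produces a
# TorchScript program that can be deployed without Python.
example_input = torch.randn(8, 3, 224, 224).cuda()
traced = torch.jit.trace(trt_mod, example_input)
torch.jit.save(traced, "trt_model.ts")

# Reload like any TorchScript module (also loadable from C++ via torch::jit::load).
reloaded = torch.jit.load("trt_model.ts")
```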
### Examples

#### Using TRTModuleNext as an arbitrary TensorRT engine holder
Using TorchScript, you have long been able to embed an arbitrary TensorRT engine from any source in a TorchScript module using `torch_tensorrt.ts.embed_engine_in_new_module`. Now you can do this at the `torch.nn.Module` level by directly using `TRTModuleNext` and access all the benefits enumerated above.
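A sketch of wrapping a pre-built engine (the constructor arguments shown reflect the v1.3.0 `TRTModuleNext` API as best understood; the engine path and binding names are illustrative):

```python
import torch
from torch_tensorrt import TRTModuleNext, Device

# Load a serialized TensorRT engine produced by any source
# (trtexec, the TensorRT Python API, etc.).
with open("my_engine.trt", "rb") as f:
    serialized_engine = bytearray(f.read())

engine_holder = TRTModuleNext(
    serialized_engine=serialized_engine,
    name="my_engine",
    input_binding_names=["input_0"],    # must match the engine's bindings
    output_binding_names=["output_0"],
    target_device=Device(gpu_id=0),
)

out = engine_holder(torch.randn(1, 3, 224, 224).cuda())
```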
The intention is for `torch_tensorrt.TRTModuleNext` to replace `torch_tensorrt.fx.TRTModule` as the default TensorRT module implementation in a future release. Feedback on this class or how it is used, the runtime in general, or associated features (profiler, engine inspector) is welcome.

## What's Changed
- Fix bug: correct the output shape of `aten::index.Tensor` by @ruoqianguo in #1314
- fix: `torch.std` and `torch.var` support multi-dimensional reductions by @gs-olive in #1395
- fix: `aten::split` behavior with negative indexing by @gs-olive in #1403
- fix: Ensure proper type inheritance in `aten::masked_fill` by @gs-olive in #1430
- chore: Lint `noxfile.py` by @gs-olive in #1443
- fix: Device casting issues with certain `aten` operators by @gs-olive in #1416
- fix: Error with `aten::div` when using truncation with Int32 tensor inputs by @gs-olive in #1442

## New Contributors
Full Changelog: v1.1.0...v1.3.0