# Torch-TensorRT v1.3.0
PyTorch 1.13, CUDA 11.7, TensorRT 8.5, Support for Dynamic Batch for Partially Compiled Modules, Engine Profiling, Experimental Unified Runtime for FX and TorchScript Frontends
Torch-TensorRT 1.3.0 targets PyTorch 1.13, CUDA 11.7, cuDNN 8.5 and TensorRT 8.5. This release focuses on adding support for dynamic batch sizes for partially compiled modules using the TorchScript frontend (this is also supported with the FX frontend). It also introduces a new execution profiling utility to understand the execution of specific engine sub-blocks, which can be used in conjunction with PyTorch profiling tools to understand the performance of your model post-compilation. Finally, this release introduces a new experimental unified runtime shared by both the TorchScript and FX frontends. This allows you to start using the FX frontend to generate `torch.jit.trace`-able compiled modules.

## Dynamic Batch Sizes for Partially Compiled Modules via the TorchScript Frontend
A long-standing limitation of the partitioning system in the TorchScript frontend is the lack of support for dynamic shapes. In this release we address a major subset of these use cases with support for dynamic batch sizes for modules that will be partially compiled. Usage is the same as the fully compiled workflow: using the `torch_tensorrt.Input` class, you may define the range of shapes that an input may take during runtime. This is represented as a set of three shape sizes: `min`, `max` and `opt`. `min` and `max` define the dynamic range of the input tensor. `opt` informs TensorRT what size to optimize for, provided there are multiple valid kernels available. TensorRT will select kernels that are valid for the full range of input shapes but most efficient at the `opt` size. In this release, partially compiled module inputs can vary in shape only for the highest-order dimension.
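For example, a minimal sketch (the concrete shape values here are illustrative, assuming an NCHW image input) in which only the batch dimension varies:

```python
import torch_tensorrt

# Only the highest-order (batch) dimension varies; TensorRT will tune
# kernels for the `opt` size while remaining valid from `min` to `max`.
batch_dynamic_input = torch_tensorrt.Input(
    min_shape=(1, 3, 224, 224),
    opt_shape=(8, 3, 224, 224),
    max_shape=(32, 3, 224, 224),
)
```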
is a valid shape range; however, a range such as the following, which varies a lower-order dimension:
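Again a sketch with illustrative values, this time varying the spatial dimensions rather than the batch dimension:

```python
# Varying a dimension other than the highest-order (batch) dimension
# is not supported for partially compiled modules in this release.
spatial_dynamic_input = torch_tensorrt.Input(
    min_shape=(1, 3, 128, 128),
    opt_shape=(1, 3, 256, 256),
    max_shape=(1, 3, 512, 512),
)
```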
is still not supported.
## Engine Profiling [Experimental]
This release introduces a number of profiling tools to measure the performance of TensorRT sub-blocks in compiled modules. This can be used in conjunction with PyTorch profiling tools to get a picture of the performance of your model. Profiling for any particular sub-block can be enabled by the `enable_profiling()` method of any `__torch__.classes.tensorrt.Engine` attribute, or of any `torch_tensorrt.TRTModuleNext`. The profiler will dump trace files by default in `/tmp`, though this path can be customized by either setting the `profile_path_prefix` of `__torch__.classes.tensorrt.Engine` or as an argument to `torch_tensorrt.TRTModuleNext.enable_profiling(profiling_results_dir="")`.
Traces can be visualized using the Perfetto tool (https://perfetto.dev). Engine layer information can also be accessed using `get_layer_info`, which returns a JSON string with the layers / fusions that the engine contains.
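As a sketch of the workflow (assuming `trt_mod` is a compiled module containing `TRTModuleNext` submodules, e.g. from the FX frontend with `use_experimental_fx_rt=True`; the output directory and input shape are illustrative):

```python
import torch
import torch_tensorrt

# Enable profiling on every TensorRT engine submodule in the compiled model.
for name, submod in trt_mod.named_modules():
    if isinstance(submod, torch_tensorrt.TRTModuleNext):
        submod.enable_profiling(profiling_results_dir="/tmp/trt_traces")
        # JSON description of the layers / fusions inside this engine
        print(name, submod.get_layer_info())

# Traces are emitted as the engines execute; open them with https://perfetto.dev
trt_mod(torch.randn(8, 3, 224, 224).cuda())
```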
## Unified Runtime for FX and TorchScript Frontends [Experimental]
In previous versions of Torch-TensorRT, the FX and TorchScript frontends were mostly separate and each had its own distinct benefits and limitations. Torch-TensorRT 1.3.0 introduces a new unified runtime to support both FX and TorchScript, meaning that you can choose the compilation workflow that makes the most sense for your particular use case, be it pure-Python conversion via FX or C++ TorchScript compilation. Both frontends use the same primitives to construct their compiled graphs, be they fully or only partially compiled.
### Basic Usage
The TorchScript frontend uses the new runtime by default. No additional workflow changes are necessary.
For the FX frontend, the new runtime can be chosen by setting `use_experimental_fx_rt=True` as part of your compile settings, via either `torch_tensorrt.compile(my_mod, ir="fx", use_experimental_fx_rt=True, explicit_batch_dimension=True)` or `torch_tensorrt.fx.compile(my_mod, use_experimental_fx_rt=True, explicit_batch_dimension=True)`.
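A minimal end-to-end sketch (the model, input shape, and precision settings are illustrative, not from the original notes):

```python
import torch
import torch_tensorrt

# Any FX-traceable module works; this toy model is just for illustration.
class MyModel(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x) + 1.0

my_mod = MyModel().eval().cuda()

# Compile through the FX frontend, opting in to the unified runtime.
trt_mod = torch_tensorrt.compile(
    my_mod,
    ir="fx",
    inputs=[torch.randn(8, 3, 224, 224).cuda()],
    enabled_precisions={torch.float},
    use_experimental_fx_rt=True,
    explicit_batch_dimension=True,
)

print(trt_mod(torch.randn(8, 3, 224, 224).cuda()).shape)
```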
### TRTModuleNext
The FX frontend will return a `torch.nn.Module` containing `torch_tensorrt.TRTModuleNext` submodules instead of `torch_tensorrt.fx.TRTModule`s. The features of these modules are nearly identical, but with a few key improvements:

- `TRTModuleNext` profiling dumps a trace visualizable with Perfetto (see above for more details).
- `TRTModuleNext` modules are `torch.jit.trace`-able, meaning you can save FX-compiled modules as TorchScript for Python-less / C++ deployment scenarios; see the sketch after this list. Traced compiled modules have the same deployment instructions as compiled modules produced by the TorchScript frontend.
- `TRTModuleNext` supports the same serialization workflows `TRTModule` supports as well (state_dict / extra_state, `torch.save` / `torch.load`).
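A sketch of tracing and saving an FX-compiled module (assuming `trt_mod` from the Basic Usage snippet above; the file name is illustrative):

```python
import torch

# Tracing a module whose submodules are TRTModuleNext produces a
# TorchScript program that can be deployed without Python.
example_input = torch.randn(8, 3, 224, 224).cuda()
traced = torch.jit.trace(trt_mod, example_input)
torch.jit.save(traced, "trt_model.ts")

# Reload like any TorchScript module (also loadable from C++ via torch::jit::load).
reloaded = torch.jit.load("trt_model.ts")
```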
### Examples

#### Using TRTModuleNext as an arbitrary TensorRT engine holder
Using TorchScript, you have long been able to embed an arbitrary TensorRT engine from any source in a TorchScript module using `torch_tensorrt.ts.embed_engine_in_new_module`. Now you can do this at the `torch.nn.Module` level by directly using `TRTModuleNext` and access all the benefits enumerated above.
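A sketch of wrapping a pre-built engine (the constructor arguments shown reflect the v1.3.0 `TRTModuleNext` API as best understood; the engine path and binding names are illustrative):

```python
import torch
from torch_tensorrt import TRTModuleNext, Device

# Load a serialized TensorRT engine produced by any source
# (trtexec, the TensorRT Python API, etc.).
with open("my_engine.trt", "rb") as f:
    serialized_engine = bytearray(f.read())

engine_holder = TRTModuleNext(
    serialized_engine=serialized_engine,
    name="my_engine",
    input_binding_names=["input_0"],    # must match the engine's bindings
    output_binding_names=["output_0"],
    target_device=Device(gpu_id=0),
)

out = engine_holder(torch.randn(1, 3, 224, 224).cuda())
```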
The intention is for `torch_tensorrt.TRTModuleNext` to replace `torch_tensorrt.fx.TRTModule` as the default TensorRT module implementation in a future release. Feedback on this class or how it is used, the runtime in general, or associated features (profiler, engine inspector) is welcome.

## What's Changed
- Fix bug: correct the output shape of `aten::index.Tensor` by @ruoqianguo in #1314
- fix: `torch.std` and `torch.var` support multi-dimensional reductions by @gs-olive in #1395
- fix: `aten::split` behavior with negative indexing by @gs-olive in #1403
- fix: Ensure proper type inheritance in `aten::masked_fill` by @gs-olive in #1430
- chore: Lint `noxfile.py` by @gs-olive in #1443
- fix: Device casting issues with certain `aten` operators by @gs-olive in #1416
- fix: Error with `aten::div` when using truncation with Int32 tensor inputs by @gs-olive in #1442

## New Contributors
Full Changelog: v1.1.0...v1.3.0