Torch-TensorRT
==============

In-framework compilation of PyTorch inference code for NVIDIA GPUs
------------------------------------------------------------------

Torch-TensorRT is an inference compiler for PyTorch that targets NVIDIA GPUs through NVIDIA's TensorRT Deep Learning Optimizer and Runtime. It supports both just-in-time (JIT) compilation through the ``torch.compile`` interface and ahead-of-time (AOT) workflows. Torch-TensorRT integrates into the PyTorch ecosystem and supports hybrid execution of optimized TensorRT code alongside standard PyTorch code.
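
As a rough illustration of the two workflows described above, the sketch below wraps both entry points in one helper. The helper name ``compile_for_inference`` and its ``workflow`` argument are hypothetical and exist only for this example. It assumes a CUDA-capable GPU with ``torch`` and ``torch_tensorrt`` installed, and it uses ``torch.compile`` with ``backend="tensorrt"`` for the JIT path and ``torch_tensorrt.compile`` with ``ir="dynamo"`` for the AOT path; see the Dynamo Frontend pages for the authoritative options.

```python
def compile_for_inference(model, example_inputs, workflow="jit"):
    """Hypothetical helper: compile `model` with Torch-TensorRT.

    workflow="jit" -> torch.compile with the TensorRT backend
    workflow="aot" -> torch_tensorrt.compile ahead of time
    """
    if workflow not in ("jit", "aot"):
        raise ValueError(f"workflow must be 'jit' or 'aot', got {workflow!r}")

    # Deferred imports: the GPU stack is only needed once a model is
    # actually compiled, so this file can be loaded without it.
    import torch
    import torch_tensorrt  # importing registers the "tensorrt" backend

    model = model.eval().cuda()

    if workflow == "jit":
        # JIT: returns immediately; TensorRT engines are built lazily the
        # first time the returned module is called with real inputs.
        return torch.compile(model, backend="tensorrt")

    # AOT: engines are built now, using the example inputs for shapes.
    return torch_tensorrt.compile(model, ir="dynamo", inputs=list(example_inputs))
```

On a machine with a supported GPU, ``compile_for_inference(my_model, [example_tensor], workflow="aot")`` would return a compiled module that is called like the original one. In either workflow, operations that are not converted to TensorRT keep running in PyTorch, which is the hybrid execution mentioned above.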

More Information / System Architecture:

* Torch-TensorRT 2.0

Getting Started
---------------

* :ref:`installation`

.. toctree::
   :caption: Getting Started
   :maxdepth: 1
   :hidden:

   getting_started/installation
   getting_started/jetpack
   getting_started/quick_start

User Guide
----------

* :ref:`torch_tensorrt_explained`
* :ref:`dynamic_shapes`
* :ref:`ptq`
* :ref:`saving_models`
* :ref:`runtime`
* :ref:`using_dla`
* :ref:`mixed_precision`

.. toctree::
   :caption: User Guide
   :maxdepth: 1
   :hidden:

   user_guide/torch_tensorrt_explained
   user_guide/dynamic_shapes
   user_guide/saving_models
   user_guide/runtime
   user_guide/using_dla
   user_guide/mixed_precision

Tutorials
---------

* :ref:`torch_compile_advanced_usage`
* :ref:`vgg16_ptq`
* :ref:`engine_caching_example`
* :ref:`engine_caching_bert_example`
* :ref:`refit_engine_example`
* :ref:`serving_torch_tensorrt_with_triton`
* :ref:`torch_export_cudagraphs`
* :ref:`converter_overloading`
* :ref:`custom_kernel_plugins`
* :ref:`mutable_torchtrt_module_example`
* :ref:`weight_streaming_example`
* :ref:`pre_allocated_output_example`
* :ref:`tensor_parallel_llama3`

.. toctree::
   :caption: Tutorials
   :maxdepth: 1
   :hidden:

   tutorials/_rendered_examples/dynamo/torch_compile_advanced_usage
   tutorials/_rendered_examples/dynamo/vgg16_ptq
   tutorials/_rendered_examples/dynamo/engine_caching_example
   tutorials/_rendered_examples/dynamo/engine_caching_bert_example
   tutorials/_rendered_examples/dynamo/refit_engine_example
   tutorials/serving_torch_tensorrt_with_triton
   tutorials/_rendered_examples/dynamo/torch_export_cudagraphs
   tutorials/_rendered_examples/dynamo/converter_overloading
   tutorials/_rendered_examples/dynamo/custom_kernel_plugins
   tutorials/_rendered_examples/dynamo/auto_generate_converters
   tutorials/_rendered_examples/dynamo/mutable_torchtrt_module_example
   tutorials/_rendered_examples/dynamo/weight_streaming_example
   tutorials/_rendered_examples/dynamo/pre_allocated_output_example
   tutorials/_rendered_examples/distributed_inference/tensor_parallel_llama3

Dynamo Frontend
---------------

* :ref:`torch_compile`
* :ref:`dynamo_export`

.. toctree::
   :caption: Dynamo Frontend
   :maxdepth: 1
   :hidden:

   dynamo/torch_compile
   dynamo/dynamo_export

TorchScript Frontend
--------------------

* :ref:`creating_a_ts_mod`
* :ref:`getting_started_with_python_api`
* :ref:`getting_started_cpp`
* :ref:`use_from_pytorch`

.. toctree::
   :caption: TorchScript Frontend
   :maxdepth: 1
   :hidden:

   ts/creating_torchscript_module_in_python
   ts/getting_started_with_python_api
   ts/getting_started_with_cpp_api
   ts/use_from_pytorch
   ts/ptq

FX Frontend
-----------

* :ref:`getting_started_with_fx`

.. toctree::
   :caption: FX Frontend
   :maxdepth: 1
   :hidden:

   fx/getting_started_with_fx_path

Model Zoo
---------

* :ref:`torch_compile_resnet`
* :ref:`torch_compile_transformer`
* :ref:`torch_compile_stable_diffusion`
* :ref:`torch_compile_gpt2`
* :ref:`torch_export_gpt2`
* :ref:`torch_export_llama2`
* :ref:`torch_export_sam2`
* :ref:`notebooks`

.. toctree::
   :caption: Model Zoo
   :maxdepth: 3
   :hidden:

   tutorials/_rendered_examples/dynamo/torch_compile_resnet_example
   tutorials/_rendered_examples/dynamo/torch_compile_transformers_example
   tutorials/_rendered_examples/dynamo/torch_compile_stable_diffusion
   tutorials/_rendered_examples/distributed_inference/data_parallel_gpt2
   tutorials/_rendered_examples/distributed_inference/data_parallel_stable_diffusion
   tutorials/_rendered_examples/dynamo/torch_compile_gpt2
   tutorials/_rendered_examples/dynamo/torch_export_gpt2
   tutorials/_rendered_examples/dynamo/torch_export_llama2
   tutorials/_rendered_examples/dynamo/torch_export_sam2
   tutorials/notebooks

Python API Documentation
------------------------

* :ref:`torch_tensorrt_py`
* :ref:`torch_tensorrt_dynamo_py`
* :ref:`torch_tensorrt_logging_py`
* :ref:`torch_tensorrt_fx_py`
* :ref:`torch_tensorrt_ts_py`
* :ref:`torch_tensorrt_ptq_py`

.. toctree::
   :caption: Python API Documentation
   :maxdepth: 0
   :hidden:

   py_api/torch_tensorrt
   py_api/dynamo
   py_api/logging
   py_api/fx
   py_api/ts
   py_api/ptq

C++ API Documentation
---------------------

* :ref:`namespace_torch_tensorrt`
* :ref:`namespace_torch_tensorrt__logging`
* :ref:`namespace_torch_tensorrt__ptq`
* :ref:`namespace_torch_tensorrt__torchscript`

.. toctree::
   :caption: C++ API Documentation
   :maxdepth: 1
   :hidden:

   _cpp_api/torch_tensort_cpp
   _cpp_api/namespace_torch_tensorrt
   _cpp_api/namespace_torch_tensorrt__logging
   _cpp_api/namespace_torch_tensorrt__torchscript
   _cpp_api/namespace_torch_tensorrt__ptq

CLI Documentation
-----------------

* :ref:`torchtrtc`

.. toctree::
   :caption: CLI Documentation
   :maxdepth: 0
   :hidden:

   cli/torchtrtc

Contributor Documentation
-------------------------

* :ref:`system_overview`
* :ref:`dynamo_converters`
* :ref:`writing_dynamo_aten_lowering_passes`
* :ref:`ts_converters`
* :ref:`useful_links`

.. toctree::
   :caption: Contributor Documentation
   :maxdepth: 1
   :hidden:

   contributors/system_overview
   contributors/dynamo_converters
   contributors/writing_dynamo_aten_lowering_passes
   contributors/ts_converters
   contributors/useful_links

Indices
-------

* :ref:`supported_ops`
* :ref:`genindex`
* :ref:`search`

.. toctree::
   :caption: Indices
   :maxdepth: 1
   :hidden:

   indices/supported_ops

Legacy Further Information (TorchScript)
----------------------------------------

* Introductory Blog Post
* GTC 2020 Talk
* GTC 2020 Fall Talk
* GTC 2021 Talk
* GTC 2021 Fall Talk
* PyTorch Ecosystem Day 2021
* PyTorch Developer Conference 2021
* PyTorch Developer Conference 2022
