Torch-TensorRT is an inference compiler for PyTorch, targeting NVIDIA GPUs via NVIDIA's TensorRT Deep Learning Optimizer and Runtime.
It supports both just-in-time (JIT) compilation workflows via the torch.compile
interface as well as ahead-of-time (AOT) workflows.
Torch-TensorRT integrates seamlessly into the PyTorch ecosystem, supporting hybrid execution of optimized TensorRT code alongside standard PyTorch code.

More Information / System Architecture:

.. toctree::
   :caption: Getting Started
   :maxdepth: 1
   :hidden:

   getting_started/installation
   getting_started/jetpack
   getting_started/quick_start

- :ref:`torch_tensorrt_explained`
- :ref:`dynamic_shapes`
- :ref:`ptq`
- :ref:`saving_models`
- :ref:`runtime`
- :ref:`using_dla`
- :ref:`mixed_precision`

.. toctree::
   :caption: User Guide
   :maxdepth: 1
   :hidden:

   user_guide/torch_tensorrt_explained
   user_guide/dynamic_shapes
   user_guide/saving_models
   user_guide/runtime
   user_guide/using_dla
   user_guide/mixed_precision

- :ref:`torch_compile_advanced_usage`
- :ref:`vgg16_ptq`
- :ref:`engine_caching_example`
- :ref:`engine_caching_bert_example`
- :ref:`refit_engine_example`
- :ref:`serving_torch_tensorrt_with_triton`
- :ref:`torch_export_cudagraphs`
- :ref:`converter_overloading`
- :ref:`custom_kernel_plugins`
- :ref:`mutable_torchtrt_module_example`
- :ref:`weight_streaming_example`
- :ref:`pre_allocated_output_example`
- :ref:`tensor_parallel_llama3`

.. toctree::
   :caption: Tutorials
   :maxdepth: 1
   :hidden:

   tutorials/_rendered_examples/dynamo/torch_compile_advanced_usage
   tutorials/_rendered_examples/dynamo/vgg16_ptq
   tutorials/_rendered_examples/dynamo/engine_caching_example
   tutorials/_rendered_examples/dynamo/engine_caching_bert_example
   tutorials/_rendered_examples/dynamo/refit_engine_example
   tutorials/serving_torch_tensorrt_with_triton
   tutorials/_rendered_examples/dynamo/torch_export_cudagraphs
   tutorials/_rendered_examples/dynamo/converter_overloading
   tutorials/_rendered_examples/dynamo/custom_kernel_plugins
   tutorials/_rendered_examples/dynamo/auto_generate_converters
   tutorials/_rendered_examples/dynamo/mutable_torchtrt_module_example
   tutorials/_rendered_examples/dynamo/weight_streaming_example
   tutorials/_rendered_examples/dynamo/pre_allocated_output_example
   tutorials/_rendered_examples/distributed_inference/tensor_parallel_llama3

.. toctree::
   :caption: Dynamo Frontend
   :maxdepth: 1
   :hidden:

   dynamo/torch_compile
   dynamo/dynamo_export

- :ref:`creating_a_ts_mod`
- :ref:`getting_started_with_python_api`
- :ref:`getting_started_cpp`
- :ref:`use_from_pytorch`

.. toctree::
   :caption: TorchScript Frontend
   :maxdepth: 1
   :hidden:

   ts/creating_torchscript_module_in_python
   ts/getting_started_with_python_api
   ts/getting_started_with_cpp_api
   ts/use_from_pytorch
   ts/ptq

.. toctree::
   :caption: FX Frontend
   :maxdepth: 1
   :hidden:

   fx/getting_started_with_fx_path

- :ref:`torch_compile_resnet`
- :ref:`torch_compile_transformer`
- :ref:`torch_compile_stable_diffusion`
- :ref:`torch_compile_gpt2`
- :ref:`torch_export_gpt2`
- :ref:`torch_export_llama2`
- :ref:`torch_export_sam2`
- :ref:`notebooks`

.. toctree::
   :caption: Model Zoo
   :maxdepth: 3
   :hidden:

   tutorials/_rendered_examples/dynamo/torch_compile_resnet_example
   tutorials/_rendered_examples/dynamo/torch_compile_transformers_example
   tutorials/_rendered_examples/dynamo/torch_compile_stable_diffusion
   tutorials/_rendered_examples/distributed_inference/data_parallel_gpt2
   tutorials/_rendered_examples/distributed_inference/data_parallel_stable_diffusion
   tutorials/_rendered_examples/dynamo/torch_compile_gpt2
   tutorials/_rendered_examples/dynamo/torch_export_gpt2
   tutorials/_rendered_examples/dynamo/torch_export_llama2
   tutorials/_rendered_examples/dynamo/torch_export_sam2
   tutorials/notebooks

- :ref:`torch_tensorrt_py`
- :ref:`torch_tensorrt_dynamo_py`
- :ref:`torch_tensorrt_logging_py`
- :ref:`torch_tensorrt_fx_py`
- :ref:`torch_tensorrt_ts_py`
- :ref:`torch_tensorrt_ptq_py`

.. toctree::
   :caption: Python API Documentation
   :maxdepth: 0
   :hidden:

   py_api/torch_tensorrt
   py_api/dynamo
   py_api/logging
   py_api/fx
   py_api/ts
   py_api/ptq

- :ref:`namespace_torch_tensorrt`
- :ref:`namespace_torch_tensorrt__logging`
- :ref:`namespace_torch_tensorrt__ptq`
- :ref:`namespace_torch_tensorrt__torchscript`

.. toctree::
   :caption: C++ API Documentation
   :maxdepth: 1
   :hidden:

   _cpp_api/torch_tensort_cpp
   _cpp_api/namespace_torch_tensorrt
   _cpp_api/namespace_torch_tensorrt__logging
   _cpp_api/namespace_torch_tensorrt__torchscript
   _cpp_api/namespace_torch_tensorrt__ptq

.. toctree::
   :caption: CLI Documentation
   :maxdepth: 0
   :hidden:

   cli/torchtrtc

- :ref:`system_overview`
- :ref:`dynamo_converters`
- :ref:`writing_dynamo_aten_lowering_passes`
- :ref:`ts_converters`
- :ref:`useful_links`

.. toctree::
   :caption: Contributor Documentation
   :maxdepth: 1
   :hidden:

   contributors/system_overview
   contributors/dynamo_converters
   contributors/writing_dynamo_aten_lowering_passes
   contributors/ts_converters
   contributors/useful_links

.. toctree::
   :caption: Indices
   :maxdepth: 1
   :hidden:

   indices/supported_ops