Where are we with full fx + dynamo testing in CI? Should pass 1:1 w/ TS
A Framework for Performance Benchmarking
Updated Version of RFC #1169
TL;DR
An updated view on performance benchmarking, model functionality testing, and overall evaluation of Torch-TRT across models, compilation methods, and inputs.
Goal(s)
The primary goal of this document and discussion is to outline the upgrades and advances made since RFC #1169 and to suggest next steps for improving model coverage, inference performance, and FX/TS functionality.
Updates Since #1169
TorchBench
Integration with the existing PyTorch benchmarking framework has many positives, as well as a few drawbacks. The main positives are ease of maintenance, a wide net of models with streamlined installation procedures, and a polished CLI. A few drawbacks include the need to test custom configurations of our compilation, including dynamic batch and varying input batch size, segment length (for language models), and other customizations.
Still, integration with this tool could come in many forms. For example, one option is to author a PR adding Torch-TRT testing to the TorchBench CLI and models, while also keeping a small-form, single-model performance script in the Torch-TRT repository for testing more granular performance configurations. It is worth noting that there is already some existing functionality for Torch-TRT built into the TorchBench suite.
Use Cases
Assessing performance and functionality across many models, for example, those provided in Torch's benchmarking suite: https://github.com/pytorch/benchmark/tree/main/torchbenchmark/models
Defining evaluation stages of models in Torch-TRT (a scoring sketch follows the list below):
1. Unsuccessful Compilation
2. Successful Compilation, Unsuccessful Inference
3. Successful Compilation, Successful Inference
4. Inference faster than PyTorch
5. Inference faster than PyTorch --> ONNX --> TensorRT
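To make automated scoring concrete, the following is a minimal sketch of how these five stages could be encoded; the `EvaluationStage` enum and `score_model` helper are hypothetical names, not existing Torch-TRT or TorchBench APIs.

```python
from enum import IntEnum
from typing import Optional


class EvaluationStage(IntEnum):
    """Hypothetical scoring scale mirroring the five stages listed above."""
    COMPILATION_FAILED = 1
    INFERENCE_FAILED = 2
    INFERENCE_SUCCEEDED = 3
    FASTER_THAN_PYTORCH = 4
    FASTER_THAN_ONNX_TRT = 5


def score_model(
    compiled: bool,
    inference_ok: bool,
    trt_latency_ms: Optional[float],
    torch_latency_ms: Optional[float],
    onnx_trt_latency_ms: Optional[float],
) -> EvaluationStage:
    """Map the raw outcomes of one benchmark run onto an evaluation stage."""
    if not compiled:
        return EvaluationStage.COMPILATION_FAILED
    if not inference_ok or trt_latency_ms is None:
        return EvaluationStage.INFERENCE_FAILED
    stage = EvaluationStage.INFERENCE_SUCCEEDED
    if torch_latency_ms is not None and trt_latency_ms < torch_latency_ms:
        stage = EvaluationStage.FASTER_THAN_PYTORCH
    if onnx_trt_latency_ms is not None and trt_latency_ms < onnx_trt_latency_ms:
        stage = EvaluationStage.FASTER_THAN_ONNX_TRT
    return stage
```

Encoding the stages as an ordered enum makes it easy to aggregate results later (e.g., taking the maximum stage reached per model).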
Proposed APIs / UX
Bash scripts for evaluating Torch-TRT across all models in the Torch benchmarking suite, or some user-specified subset, with a data-aggregation mechanism to collect and score models automatically during the run. Furthermore, the bash scripts should handle versioning issues and verify that the installed dependencies are mutually compatible, to avoid crashes (a sketch of such a check follows this list).
Functionality added to TorchBench to include benchmarking of Torch-TRT models across both TorchScript and FX.
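As one concrete piece of the version handling mentioned above, here is a minimal sketch of a pre-flight dependency check the scripts could run before launching a benchmark; the expected version prefixes are placeholders rather than actual compatibility requirements.

```python
import sys

import torch
import torch_tensorrt

try:
    import tensorrt as trt
except ImportError:
    sys.exit("TensorRT Python bindings not found; aborting benchmark run.")

# Placeholder compatibility matrix; the real values would be maintained
# alongside the benchmarking scripts and updated per release.
EXPECTED_PREFIXES = {"torch": "2.0", "torch_tensorrt": "1.4", "tensorrt": "8.6"}

installed = {
    "torch": torch.__version__,
    "torch_tensorrt": torch_tensorrt.__version__,
    "tensorrt": trt.__version__,
}

for package, prefix in EXPECTED_PREFIXES.items():
    if not installed[package].startswith(prefix):
        sys.exit(
            f"Incompatible {package} version {installed[package]}; "
            f"expected {prefix}.x"
        )

print("Dependency versions verified:", installed)
```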
Limitations
The benchmarking additions, as scoped, will not include functionality for benchmarking a custom model of the user's choosing, but will instead focus on a set of key popular models to determine overall performance and coverage over a large class of model types. The TorchBench repository includes documentation on how to incorporate new models for benchmarking.
Internal Implementation
Design
New Python scripts are needed which interface with Torch-TRT, Torch, TensorRT, and the Torch benchmark models. For each model, batch size, and input shape, the script will compile the model using each of the desired compilation methods (e.g., TorchScript, FX, and Dynamo).
Functionality for these compilation methods already exists, except for Dynamo, and would only need to be refactored for full coverage; see TensorRT/tools/perf/perf_run.py, lines 309 to 328 at commit 2ef6c3a.
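As a rough illustration (not the existing perf_run.py code), a compile-and-time step for a single model and frontend might look like the sketch below, which uses the public torch_tensorrt.compile API and its ir switch; the warm-up count, iteration count, and CUDA-only timing are illustrative choices.

```python
import time

import torch
import torch_tensorrt


def compile_and_time(model, example_inputs, ir, iters=100):
    """Compile `model` with the requested frontend ("ts" or "fx") and return
    the median latency in milliseconds, or None if compilation fails."""
    try:
        trt_model = torch_tensorrt.compile(
            model,
            ir=ir,
            inputs=list(example_inputs),
            enabled_precisions={torch.float},
        )
    except Exception as err:
        print(f"[{ir}] compilation failed: {err}")
        return None

    timings = []
    with torch.no_grad():
        for _ in range(10):  # warm-up iterations
            trt_model(*example_inputs)
        torch.cuda.synchronize()
        for _ in range(iters):
            start = time.perf_counter()
            trt_model(*example_inputs)
            torch.cuda.synchronize()  # assumes a CUDA device
            timings.append((time.perf_counter() - start) * 1000.0)

    timings.sort()
    return timings[len(timings) // 2]
```

A driver loop would call this once per frontend, e.g. for ir in ("ts", "fx"), and record the outcome alongside the PyTorch baseline.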
Then, the script should aggregate statistics about each model run, including which evaluation stage is achieved by Torch-TRT, and coalesce these into an easy-to-use data structure such as a Pandas DataFrame.
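A brief sketch of that aggregation step; the column names and the example records are placeholders for illustration.

```python
import pandas as pd

# Each record would come from one (model, frontend, batch size) benchmark run;
# the values shown here are placeholders.
records = [
    {"model": "resnet50", "ir": "ts", "batch_size": 8,
     "stage": 4, "median_latency_ms": 3.1},
    {"model": "bert_base", "ir": "fx", "batch_size": 8,
     "stage": 3, "median_latency_ms": 7.9},
]

df = pd.DataFrame.from_records(records)

# Highest evaluation stage reached per model, plus the raw results persisted
# for comparison across commits and hardware.
print(df.groupby("model")["stage"].max())
df.to_csv("torchtrt_benchmark_results.csv", index=False)
```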
Implementation Phases
Prototype - S
MVP (1.5.0) - M
Extension Phase 1 - S