Questions about the two execution policies of the TensorRT backend: DEVICE_BLOCKING and BLOCKING #7831
Labels: module: backends, performance, question, TensorRT
I noticed that the TensorRT backend has two execution policies, DEVICE_BLOCKING and BLOCKING, and I have two questions about them.
Q1:
I measured performance with the perf_analyzer tool. The results were:

| Execution policy | Throughput (inf/s) | Server latency (ms) |
|---|---|---|
| DEVICE_BLOCKING | 935 | 24 |
| BLOCKING | 430 | 42 |

In this test DEVICE_BLOCKING delivers about 2.2x the throughput of BLOCKING with lower server latency. In what situations is BLOCKING mode the appropriate choice?
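To compare the two policies under identical conditions, a small driver script can start the server with each policy in turn, wait until it reports ready, and run the same perf_analyzer command against it. This is a minimal sketch, assuming bash, curl, and the binary paths, ports and model name from this issue. The TRITON_BIN, PA_BIN, MODEL and CONCURRENCY overrides are hypothetical, and CONCURRENCY=1 is only a placeholder because the concurrency behind the Q1 numbers is not stated. The readiness check polls Triton's standard `/v2/health/ready` HTTP endpoint.

```shell
#!/usr/bin/env bash
# Sketch: A/B the TensorRT backend execution policies with identical settings.
# Usage: ./ab_policy.sh DEVICE_BLOCKING BLOCKING
# Assumptions: paths, ports and model name are taken from this issue;
# TRITON_BIN, PA_BIN, MODEL and CONCURRENCY are hypothetical overrides, and
# CONCURRENCY=1 is a placeholder (the Q1 concurrency is not stated).
set -euo pipefail

TRITON_BIN="${TRITON_BIN:-./tritonserver}"
PA_BIN="${PA_BIN:-./perf_analyzer}"
MODEL="${MODEL:-resnext101_sparse_int8}"
CONCURRENCY="${CONCURRENCY:-1}"

# Print one server argument per line for the given execution policy.
server_args() {
  printf '%s\n' \
    --model-repository=../docs/examples/model_repository/ \
    --grpc-port=8011 --http-port=8010 --metrics-port=8012 \
    --pinned-memory-pool-byte-size=6442450944 \
    --cuda-memory-pool-byte-size 0:2147483648 \
    --disable-auto-complete-config \
    "--backend-config=tensorrt,execution-policy=$1"
}

# Poll the HTTP readiness endpoint once per second for up to ~120 s.
wait_ready() {
  local tries=0
  until curl -sf http://localhost:8010/v2/health/ready >/dev/null; do
    tries=$((tries + 1))
    [ "$tries" -lt 120 ] || return 1
    sleep 1
  done
}

# Start a server with one policy, measure it, then shut the server down.
run_policy() {
  local -a args
  local pid
  mapfile -t args < <(server_args "$1")
  "$TRITON_BIN" "${args[@]}" &
  pid=$!
  if wait_ready; then
    "$PA_BIN" -m "$MODEL" --service-kind=triton -i http -u localhost:8010 \
      -b 1 --concurrency-range "$CONCURRENCY" || true
  fi
  kill "$pid" 2>/dev/null || true
  wait "$pid" || true
}

# Run every policy named on the command line; does nothing without arguments.
for policy in "$@"; do
  echo "== execution-policy=${policy} =="
  run_policy "$policy"
done
```

Running `./ab_policy.sh DEVICE_BLOCKING BLOCKING` (the file name is arbitrary) prints one perf_analyzer report per policy. Restarting the server between runs means neither measurement benefits from state left behind by the other.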
Q2:
When two different models execute simultaneously in DEVICE_BLOCKING mode, can their execution be merged into a single TensorRT thread?
For example:
Model repository: resnext101_sparse_int8 and resnext101_sparse_2_int8
Triton server:
./tritonserver --model-repository=../docs/examples/model_repository/ --grpc-port=8011 --http-port=8010 --metrics-port=8012 --pinned-memory-pool-byte-size=6442450944 --cuda-memory-pool-byte-size 0:2147483648 --disable-auto-complete-config --backend-config=tensorrt,execution-policy=DEVICE_BLOCKING
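The memory-pool sizes in this command are raw byte counts: `--pinned-memory-pool-byte-size=6442450944` is 6 GiB of pinned host memory, and `--cuda-memory-pool-byte-size 0:2147483648` is 2 GiB of CUDA memory on GPU 0 (the value has the form `<GPU id>:<bytes>`). A tiny helper, shown here as a sketch, makes the conversion explicit:

```shell
#!/usr/bin/env bash
# Convert GiB to bytes for Triton's memory-pool flags (1 GiB = 1024^3 bytes).
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

echo "pinned pool : $(gib_to_bytes 6) bytes"   # prints 6442450944 (6 GiB)
echo "CUDA pool 0 : $(gib_to_bytes 2) bytes"   # prints 2147483648 (2 GiB)
```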
Clients (both run at the same time):
./perf_analyzer -m resnext101_sparse_2_int8 --service-kind=triton -i http -u localhost:8010 -b 1 --concurrency-range 1
./perf_analyzer -m resnext101_sparse_int8 --service-kind=triton -i http -u localhost:8010 -b 1 --concurrency-range
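Starting the two clients from separate terminals only approximates simultaneous load. A minimal sketch that launches one perf_analyzer per model in the background and waits for both is shown here; it assumes bash, and PA_BIN, CONCURRENCY and LOG_DIR are hypothetical knobs. CONCURRENCY defaults to 1 purely as an assumption, because the second command above ends at `--concurrency-range` without a value.

```shell
#!/usr/bin/env bash
# Sketch: run one perf_analyzer client per model concurrently.
# Usage: ./run_pair.sh resnext101_sparse_2_int8 resnext101_sparse_int8
# PA_BIN, CONCURRENCY and LOG_DIR are hypothetical overrides; CONCURRENCY=1
# is an assumption (the issue's second command has no value for it).
PA_BIN="${PA_BIN:-./perf_analyzer}"
CONCURRENCY="${CONCURRENCY:-1}"
LOG_DIR="${LOG_DIR:-.}"

# Same flags as the commands in this issue, parameterized by model name.
run_client() {
  "$PA_BIN" -m "$1" --service-kind=triton -i http -u localhost:8010 \
    -b 1 --concurrency-range "$CONCURRENCY"
}

# Launch every client in the background, then wait for all of them.
# Returns non-zero if any client failed.
run_concurrently() {
  local pids=() model pid status=0
  for model in "$@"; do
    run_client "$model" > "${LOG_DIR}/pa_${model}.log" 2>&1 &
    pids+=("$!")
  done
  for pid in "${pids[@]}"; do
    wait "$pid" || status=1
  done
  return "$status"
}

if [ "$#" -gt 0 ]; then
  run_concurrently "$@"
fi
```

Each client's report goes to its own `pa_<model>.log`, so the two outputs do not interleave on the terminal.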
While both clients are running, I observe two TensorRT threads:
![image](https://private-user-images.githubusercontent.com/179904892/389402069-cd19b9a6-d4c6-4998-9b3d-5a4fe2aa6ced.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk4NTAwNDAsIm5iZiI6MTczOTg0OTc0MCwicGF0aCI6Ii8xNzk5MDQ4OTIvMzg5NDAyMDY5LWNkMTliOWE2LWQ0YzYtNDk5OC05YjNkLTVhNGZlMmFhNmNlZC5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwMjE4JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDIxOFQwMzM1NDBaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1jODM5MWU5ODM1NWZkOTcyYzlhZDk3MGIyNjEzZWMxODRlMzhkNDkxYjQzNDAxMDhjOGZkZWEyMmJiM2UxOTJmJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.n8CfInlxcuut6Jwb8yAISaEP3CrPcCjrg4IvHidyj8A)
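One way to count the threads without a profiler is to read each thread's name from Linux's `/proc/<pid>/task/<tid>/comm` file. This sketch assumes Linux and bash. The names Triton and the TensorRT backend give their threads are not stated in this issue, so the idea is to list everything first and then filter with `count_threads` using whatever pattern the listing reveals.

```shell
#!/usr/bin/env bash
# Sketch: list the threads of a running process by name (Linux /proc only).
# Each /proc/<pid>/task/<tid>/comm holds that thread's name (max 15 chars).
list_threads() {
  local pid="$1" task
  for task in /proc/"$pid"/task/*; do
    # Print "<tid> <name>"; a thread may exit mid-scan, so ignore read errors.
    printf '%s %s\n' "${task##*/}" "$(cat "$task/comm" 2>/dev/null)"
  done
}

# Count threads whose name matches a pattern (case-insensitive grep).
count_threads() {
  list_threads "$1" | grep -ci -- "$2"
}

if [ "$#" -ge 1 ]; then
  list_threads "$1"
fi
```

For example, `./threads.sh "$(pidof tritonserver)"` prints one `<tid> <name>` pair per thread, which can be compared with one model loaded versus two.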
Information
Hardware: NVIDIA Jetson AGX Orin, JetPack 6.1
Triton server version: 2.51.0