
Questions about the two execution policies of the TensorRT backend, DEVICE_BLOCKING mode and BLOCKING mode #7831

Open
Will-Chou-5722 opened this issue Nov 25, 2024 · 1 comment
Labels: module: backends, performance, question, TensorRT

Comments

@Will-Chou-5722

Will-Chou-5722 commented Nov 25, 2024

I noticed that the TensorRT backend has two execution policies: DEVICE_BLOCKING mode and BLOCKING mode.
I have two questions that need your help.

Q1:
I tested the performance using the perf_analyzer tool; the results were as follows:

  • DEVICE_BLOCKING

    throughput: 935 inf/s, server latency: 24 ms

  • BLOCKING

    throughput: 430 inf/s, server latency: 42 ms

DEVICE_BLOCKING mode performs better than BLOCKING mode. In what situations is it appropriate to use BLOCKING mode?
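For reference, the two measurements above can be reproduced by launching the server once per policy and switching the `--backend-config` flag. This is a sketch, not the exact commands used: the model name, repository path, and port mirror the reproduction steps in Q2 and are assumptions, and the fixed `sleep` is a crude stand-in for a readiness check.

```shell
# Sketch: measure each TensorRT execution policy with perf_analyzer.
# Model name, repository path, and port are assumptions.
for policy in DEVICE_BLOCKING BLOCKING; do
  ./tritonserver \
      --model-repository=../docs/examples/model_repository/ \
      --http-port=8010 \
      --backend-config=tensorrt,execution-policy="${policy}" &
  SERVER_PID=$!
  sleep 10   # crude wait for model load; poll /v2/health/ready in practice
  ./perf_analyzer -m resnext101_sparse_int8 -i http -u localhost:8010 \
      -b 1 --concurrency-range 1
  kill "${SERVER_PID}"; wait "${SERVER_PID}" 2>/dev/null
done
```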

Q2:
Can two different models be merged into one TensorRT thread when executing two different models simultaneously in DEVICE_BLOCKING mode?

For example:

  • model-repository
    resnext101_sparse_int8
    resnext101_sparse_2_int8

  • Triton Server
    ./tritonserver --model-repository=../docs/examples/model_repository/ --grpc-port=8011 --http-port=8010 --metrics-port=8012 --pinned-memory-pool-byte-size=6442450944 --cuda-memory-pool-byte-size 0:2147483648 --disable-auto-complete-config --backend-config=tensorrt,execution-policy=DEVICE_BLOCKING

  • Client (concurrent execution)
    ./perf_analyzer -m resnext101_sparse_2_int8 --service-kind=triton -i http -u localhost:8010 -b 1 --concurrency-range 1
    ./perf_analyzer -m resnext101_sparse_int8 --service-kind=triton -i http -u localhost:8010 -b 1 --concurrency-range

  • Observing the execution results, there are two TensorRT threads:
    [screenshot: two TensorRT threads]
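One way to verify the thread count while both models are executing (an assumption about how the screenshot was produced; thread names are implementation details and may vary across Triton versions) is to list the threads of the running server process:

```shell
# Sketch: list all threads of the running tritonserver process.
# Requires a running server; thread naming is version-dependent.
ps -T -p "$(pgrep -x tritonserver)" -o spid,comm
```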

Information
Hardware: NVIDIA Jetson AGX Orin, JetPack 6.1
Triton Server version: 2.51.0

@rmccorm4 added the performance, module: backends, and TensorRT labels on Jan 22, 2025
@rmccorm4
Contributor

CC @tanmayv25

@rmccorm4 added the question label on Jan 23, 2025