Triton server receives Signal (11) when tracing is enabled with no sampling (or a small sampling rate) #7795
Comments
Hi @nicomeg-pr, thanks for raising this.
Here is the complete warning message; sorry, it was truncated:
All the warnings are the same.
Hi @nicomeg-pr, could you please try increasing the batch exporter settings? More docs for the batch exporter can be found here: https://github.com/triton-inference-server/server/blob/main/docs/user_guide/trace.md#opentelemetry-trace-apis-settings
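(For concreteness, the batch span processor settings from that trace.md section can be raised on the command line. The exact setting and values suggested above were truncated, so the names and numbers below are assumptions, not the maintainer's actual suggestion:)

```bash
# Hypothetical values; see the OpenTelemetry trace APIs settings in trace.md.
tritonserver --model-repository=/models \
  --trace-config mode=opentelemetry \
  --trace-config opentelemetry,bsp_max_queue_size=4096 \
  --trace-config opentelemetry,bsp_schedule_delay=1000 \
  --trace-config opentelemetry,bsp_max_export_batch_size=1024
```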
Hello, to answer your questions: for gRPC I use the Triton gRPC client, and for HTTP I use simple HTTP requests, roughly as sketched below.
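(For illustration, a minimal sketch of both client paths against the `identity_fp32` example model. The input name `INPUT0`, the shape, and the server addresses are assumptions, not details from the report:)

```python
import numpy as np
import requests
import tritonclient.grpc as grpcclient

# gRPC path: Triton's Python gRPC client (default gRPC port 8001).
client = grpcclient.InferenceServerClient(url="localhost:8001")
data = np.random.rand(1, 16).astype(np.float32)
inp = grpcclient.InferInput("INPUT0", list(data.shape), "FP32")
inp.set_data_from_numpy(data)
result = client.infer(model_name="identity_fp32", inputs=[inp])
print(result.as_numpy("OUTPUT0"))

# HTTP path: a plain KServe v2 inference request (default HTTP port 8000).
payload = {
    "inputs": [{
        "name": "INPUT0",
        "shape": [1, 16],
        "datatype": "FP32",
        "data": data.flatten().tolist(),
    }]
}
resp = requests.post("http://localhost:8000/v2/models/identity_fp32/infer",
                     json=payload)
print(resp.json())
```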
@oandreeva-nv I tested with
Description
When starting Triton Server with tracing enabled and a generic model (e.g., `identity_model_fp32` from the Python backend example), the server crashes with signal 11 after handling a few thousand requests at a relatively high QPS (> 100). The issue appears to be driven primarily by the QPS rather than by the total number of requests sent to the server: the higher the QPS, the sooner the signal 11 crash occurs.
I get the following error message:
I receive a lot of warnings before the signal (11):
[Warning] File: /tmp/tritonbuild/tritonserver/build/_deps/repo-third-party-build/opentelemetry-cpp/src/opentelemetry-cpp/sdk/src/trace/b
I tested with several backends and models (`torchscript`, `python`, `onnx`) and observed the same behavior across all of them (on T4 and A100 GPUs).

The issue appears to be related to the `--trace-config` sampling rate parameter. When the rate is set to 100 or higher, everything works fine. However, when it is set between 1 and 100, the server receives signal (11) and restarts, as in the launch sketch below.
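(A sketch of a launch matching the crashing configuration, with the sampling rate set to 1, i.e. trace every request. The model repository path and collector URL are placeholders, not values from the report:)

```bash
tritonserver --model-repository=/models \
  --trace-config mode=opentelemetry \
  --trace-config rate=1 \
  --trace-config level=TIMESTAMPS \
  --trace-config opentelemetry,url=http://otel-collector:4318/v1/traces
```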
Triton Information
I use Triton version 24.09.
I used the standard container: `nvcr.io/nvidia/tritonserver:24.09-py3`.
To Reproduce
Use a sample model from the repo, e.g. `identity_fp32`.
Deploy it with the following helm chart deployment:
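Then send traffic at a high QPS. As one option (a sketch; the model name, protocol, endpoint, and request rate here are assumptions, not the reporter's actual client), perf_analyzer can drive a fixed request rate:

```bash
# Drive ~200 requests/s over gRPC against the deployed model.
perf_analyzer -m identity_fp32 -i grpc -u localhost:8001 \
  --request-rate-range=200 --measurement-interval=10000
```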
Expected behavior
The server should keep serving requests without crashing. Instead, after a few thousand requests at a high QPS, the server receives a signal (11) and restarts.