
Triton server receives Signal (11) when tracing is enabled with no sampling (or a small sampling rate) #7795

Open

nicomeg-pr opened this issue Nov 14, 2024 · 5 comments

Labels: crash (Related to server crashes, segfaults, etc.)

@nicomeg-pr

Description

When starting Triton Server with tracing and with a generic model (e.g., identity_model_fp32 from the Python backend example), the server crashes with signal 11 after handling a few thousand requests at a relatively high QPS (> 100).

The issue appears to be primarily influenced by the QPS rather than the total number of requests sent to the server—the higher the QPS, the sooner the signal 11 crash occurs.

I get the following error message:

Signal (11) received.
14# 0x00007F78C0DED850 in /lib/x86_64-linux-gnu/libc.so.6
13# 0x00007F78C0D5BAC3 in /lib/x86_64-linux-gnu/libc.so.6
12# 0x00007F78C0FCC253 in /lib/x86_64-linux-gnu/libstdc++.so.6
11# 0x00005A51911D67F2 in tritonserver
10# 0x00005A5191336143 in tritonserver
9# 0x00005A51911E9411 in tritonserver
8# 0x00005A51911E7B7D in tritonserver
7# 0x00005A5191855163 in tritonserver
6# 0x00005A519121F25C in tritonserver
5# 0x00005A5191856623 in tritonserver
4# 0x00005A519121E5B0 in tritonserver
3# 0x00005A5191858D2A in tritonserver
2# 0x00005A5191858B84 in tritonserver
1# 0x00007F78C0D09520 in /lib/x86_64-linux-gnu/libc.so.6
0# 0x00005A519117D52D in tritonserver

I receive a lot of warnings before the signal (11):
[Warning] File: /tmp/tritonbuild/tritonserver/build/_deps/repo-third-party-build/opentelemetry-cpp/src/opentelemetry-cpp/sdk/src/trace/b

I tested with several backends and models (TorchScript, Python, ONNX) and observed the same behavior across all of them (on T4 and A100 GPUs).

The issue appears to be related to the --trace-config sampling rate parameter. When the rate is set to 100 or higher, everything works fine. However, when it's set between 1 and 100, the server receives Signal (11) and restarts.
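
For reference, a condensed, standalone form of the server command that reproduces this, versus the rate that does not (the model repository path and collector URL are placeholders; the full deployment is under "To Reproduce" below):

# Crashes after a few thousand requests at > 100 QPS:
tritonserver --model-repository=/path/to/repo \
  --trace-config mode=opentelemetry \
  --trace-config opentelemetry,url=http://<otlp-collector>:4318/v1/traces \
  --trace-config rate=1 \
  --trace-config level=TIMESTAMPS

# The same command with --trace-config rate=100 (or higher) runs without the signal (11).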

Triton Information

I use Triton version 24.09.

I used the standard container: nvcr.io/nvidia/tritonserver:24.09-py3

To Reproduce

Use a sample model from the repo, e.g. identity_fp32.

Deploy it with the following Helm chart deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}
  namespace: {{ .Release.Namespace }}
spec:
  replicas: 1
  minReadySeconds: 30
  selector:
    matchLabels:
      app: {{ .Release.Name }}
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
  template:
    metadata:
      labels:
        app: {{ .Release.Name }}
      annotations:
        ad.datadoghq.com/{{ .Release.Name }}.checks: |
          {
            "openmetrics": {
              "init_config": {},
              "instances": [
                {
                  "openmetrics_endpoint": "http://%%host%%:8002/metrics",
                  "namespace": "dev.backend.tritonserver",
                  "metrics": ["nv_.*"],
                  "tags":["env:dev"]
                }
              ]
            }
          }
    spec:
      serviceAccountName: {{ .Values.serviceAccountName }}
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      containers:
        - name: {{ .Release.Name }}
          image: nvcr.io/nvidia/tritonserver:24.09-py3
          imagePullPolicy: Always
          command:
            - tritonserver
            - --model-repository=gs://path/to/repo
            - --trace-config
            - mode=opentelemetry
            - --trace-config
            - opentelemetry,url=http://datadog-agent-agent.datadog.svc:4318/v1/traces
            - --trace-config
            - opentelemetry,bsp_max_export_batch_size=1
            - --trace-config
            - opentelemetry,resource=service.name=backend.tritonserver
            - --trace-config
            - opentelemetry,resource=deployment.environment=dev
            - --trace-config
            - rate=1
            - --trace-config
            - level=TIMESTAMPS
            - --log-warning=1
            - --log-error=1
          ports:
            - name: http
              containerPort: 8000
            - name: grpc
              containerPort: 8001
            - name: metrics
              containerPort: 8002
          livenessProbe:
            initialDelaySeconds: 60
            failureThreshold: 3
            periodSeconds: 10
            httpGet:
              path: /v2/health/live
              port: http
          readinessProbe:
            initialDelaySeconds: 60
            periodSeconds: 5
            failureThreshold: 3
            httpGet:
              path: /v2/health/ready
              port: http
          startupProbe:
            periodSeconds: 10
            failureThreshold: 30
            httpGet:
              path: /v2/health/ready
              port: http
          resources:
            limits:
              nvidia.com/gpu: 1

Expected behavior

The server should keep serving requests with tracing enabled and not crash. Instead, after a few thousand requests at a high QPS, it receives a signal (11) and restarts.

@nicomeg-pr nicomeg-pr changed the title Triton server receives Signal (11) when tracing is enable with no sampling (or a small sampling rate) Triton server receives Signal (11) when tracing is enabled with no sampling (or a small sampling rate) Nov 14, 2024
@rmccorm4 (Contributor) commented Nov 15, 2024

Hi @nicomeg-pr, thanks for raising this.

I receive a lot of warning before signal (11):
[Warning] File: /tmp/tritonbuild/tritonserver/build/_deps/repo-third-party-build/opentelemetry-cpp/src/opentelemetry-cpp/sdk/src/trace/b

  1. Was this warning cut off? Is there more to it?
  2. Can you reproduce this using the triton tracing mode instead of opentelemetry mode? (See the example flags sketched below this list.)
  3. Can you reproduce with both HTTP and GRPC clients, or only one?
  4. Can you share the client script to send the request load?
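
For question 2, a minimal sketch of the flags to try Triton's native tracing mode instead of OpenTelemetry (the trace file path and log frequency are just example values):

tritonserver --model-repository=gs://path/to/repo \
  --trace-config mode=triton \
  --trace-config triton,file=/tmp/trace.json \
  --trace-config triton,log-frequency=50 \
  --trace-config rate=1 \
  --trace-config level=TIMESTAMPS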

CC @indrajit96 @oandreeva-nv

@rmccorm4 rmccorm4 self-assigned this Nov 15, 2024
@rmccorm4 rmccorm4 added the crash Related to server crashes, segfaults, etc. label Nov 15, 2024
@nicomeg-pr (Author) commented
Here is the complete warning message; sorry, it was truncated:

[Warning] File: /tmp/tritonbuild/tritonserver/build/_deps/repo-third-party-build/opentelemetry-cpp/src/opentelemetry-cpp/sdk/src/trace/batch_span_processor.cc:55 BatchSpanProcessor queue is full - dropping span.

All the warnings are the same.

@oandreeva-nv (Contributor) commented Nov 18, 2024

Hi @nicomeg-pr, could you please try increasing bsp_max_queue_size; by default it's 2048. I would also recommend setting bsp_max_export_batch_size to a number greater than 1, which will potentially speed up your system as well.

More docs for the batch exporter can be found here: https://github.com/triton-inference-server/server/blob/main/docs/user_guide/trace.md#opentelemetry-trace-apis-settings
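
For example, in the deployment above, this suggestion could be applied by adjusting the --trace-config arguments in the container command roughly as follows (the bsp_* values are illustrative, not tuned recommendations):

            - --trace-config
            - mode=opentelemetry
            - --trace-config
            - opentelemetry,url=http://datadog-agent-agent.datadog.svc:4318/v1/traces
            # illustrative values; the default bsp_max_queue_size is 2048
            - --trace-config
            - opentelemetry,bsp_max_queue_size=8192
            - --trace-config
            - opentelemetry,bsp_max_export_batch_size=512
            - --trace-config
            - rate=1
            - --trace-config
            - level=TIMESTAMPS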

@nicomeg-pr (Author) commented Nov 20, 2024

Hello,

To answer your questions:
2. I didn't test with Triton traces, as I cannot use them with Datadog.
3. I can reproduce the error with both GRPC and HTTP (with the same warnings).
4. Here is the code of the two clients I used for GRPC and HTTP:

For GRPC I use the Triton GRPC client:

import random

import numpy as np
import tritonclient.grpc as grpcclient
from tritonclient.grpc.auth import BasicAuth

# server_url is defined elsewhere in my setup
basic_auth_config = BasicAuth("login", "password")
with grpcclient.InferenceServerClient(server_url) as triton_client:
    triton_client.register_plugin(basic_auth_config)
    input_data = np.array([random.random() for i in range(50)], dtype=np.float32).reshape(1, -1)
    model_input = grpcclient.InferInput(
        name="INPUT0",
        datatype="FP32",
        shape=input_data.shape,
    )
    model_input.set_data_from_numpy(input_data)

    infer_result = triton_client.infer("identity_fp32", [model_input])
    result = infer_result.as_numpy("OUTPUT0")[0]

For HTTP I use a simple HTTP request:


import random

import httpx
import numpy as np

# server_url and BASIC_AUTH_HEADER are defined elsewhere in my setup
input_data = np.array([random.random() for i in range(50)], dtype=np.float32).reshape(1, -1)
payload = {
    "inputs": [
        {
            "name": "INPUT0",  # input tensor name, matching the GRPC example above
            "shape": list(input_data.shape),
            "datatype": "FP32",
            "data": input_data.tolist(),
        }
    ]
}
try:
    with httpx.Client() as client:
        response = client.post(
            f"{server_url}/v2/models/identity_fp32/infer", json=payload, headers=BASIC_AUTH_HEADER
        )
        if response.status_code == 200:
            result = response.json()
        else:
            result = None
except httpx.HTTPError as e:
    ...

@nicomeg-pr (Author) commented

@oandreeva-nv I tested with bsp_max_queue_size and rate=1 and I still got a signal (11); I will try with 2048.
