
Triton server receives Signal (11) when tracing is enabled with no sampling (or a small sampling rate) #7795

Open

nicomeg-pr opened this issue Nov 14, 2024 · 5 comments

Labels: crash (Related to server crashes, segfaults, etc.)

@nicomeg-pr

Description

When starting Triton Server with tracing and with a generic model (e.g., identity_model_fp32 from the Python backend example), the server crashes with signal 11 after handling a few thousand requests at a relatively high QPS (> 100).

The issue appears to be primarily influenced by the QPS rather than the total number of requests sent to the server—the higher the QPS, the sooner the signal 11 crash occurs.

I get the following error message:

Signal (11) received.
14# 0x00007F78C0DED850 in /lib/x86_64-linux-gnu/libc.so.6
13# 0x00007F78C0D5BAC3 in /lib/x86_64-linux-gnu/libc.so.6
12# 0x00007F78C0FCC253 in /lib/x86_64-linux-gnu/libstdc++.so.6
11# 0x00005A51911D67F2 in tritonserver
10# 0x00005A5191336143 in tritonserver
9# 0x00005A51911E9411 in tritonserver
8# 0x00005A51911E7B7D in tritonserver
7# 0x00005A5191855163 in tritonserver
6# 0x00005A519121F25C in tritonserver
5# 0x00005A5191856623 in tritonserver
4# 0x00005A519121E5B0 in tritonserver
3# 0x00005A5191858D2A in tritonserver
2# 0x00005A5191858B84 in tritonserver
1# 0x00007F78C0D09520 in /lib/x86_64-linux-gnu/libc.so.6
0# 0x00005A519117D52D in tritonserver

I receive a lot of warnings before the signal (11):
[Warning] File: /tmp/tritonbuild/tritonserver/build/_deps/repo-third-party-build/opentelemetry-cpp/src/opentelemetry-cpp/sdk/src/trace/b

I tested with several backends and models (TorchScript, Python, ONNX) and observed the same behavior across all of them (on T4 and A100 GPUs).

The issue appears to be related to the --trace-config sampling rate parameter. When the rate is set to 100 or higher, everything works fine. However, when it's set between 1 and 100, the server receives Signal (11) and restarts.
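
For reference, a condensed, standalone form of the server command that reproduces this, versus the rate that does not (the model repository path and collector URL are placeholders; the full deployment is under "To Reproduce" below):

# Crashes after a few thousand requests at > 100 QPS:
tritonserver --model-repository=/path/to/repo \
  --trace-config mode=opentelemetry \
  --trace-config opentelemetry,url=http://<otlp-collector>:4318/v1/traces \
  --trace-config rate=1 \
  --trace-config level=TIMESTAMPS

# The same command with --trace-config rate=100 (or higher) runs without the signal (11).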

Triton Information

I use Triton version 24.09.

I used the standard container: nvcr.io/nvidia/tritonserver:24.09-py3

To Reproduce

Use a sample model from the repo, e.g. identity_fp32.

Deploy it with the following Helm chart deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}
  namespace: {{ .Release.Namespace }}
spec:
  replicas: 1
  minReadySeconds: 30
  selector:
    matchLabels:
      app: {{ .Release.Name }}
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
  template:
    metadata:
      labels:
        app: {{ .Release.Name }}
      annotations:
        ad.datadoghq.com/{{ .Release.Name }}.checks: |
          {
            "openmetrics": {
              "init_config": {},
              "instances": [
                {
                  "openmetrics_endpoint": "http://%%host%%:8002/metrics",
                  "namespace": "dev.backend.tritonserver",
                  "metrics": ["nv_.*"],
                  "tags":["env:dev"]
                }
              ]
            }
          }
    spec:
      serviceAccountName: {{ .Values.serviceAccountName }}
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      containers:
        - name: {{ .Release.Name }}
          image: nvcr.io/nvidia/tritonserver:24.09-py3
          imagePullPolicy: Always
          command:
            - tritonserver
            - --model-repository=gs://path/to/repo
            - --trace-config
            - mode=opentelemetry
            - --trace-config
            - opentelemetry,url=http://datadog-agent-agent.datadog.svc:4318/v1/traces
            - --trace-config
            - opentelemetry,bsp_max_export_batch_size=1
            - --trace-config
            - opentelemetry,resource=service.name=backend.tritonserver
            - --trace-config
            - opentelemetry,resource=deployment.environment=dev
            - --trace-config
            - rate=1
            - --trace-config
            - level=TIMESTAMPS
            - --log-warning=1
            - --log-error=1
          ports:
            - name: http
              containerPort: 8000
            - name: grpc
              containerPort: 8001
            - name: metrics
              containerPort: 8002
          livenessProbe:
            initialDelaySeconds: 60
            failureThreshold: 3
            periodSeconds: 10
            httpGet:
              path: /v2/health/live
              port: http
          readinessProbe:
            initialDelaySeconds: 60
            periodSeconds: 5
            failureThreshold: 3
            httpGet:
              path: /v2/health/ready
              port: http
          startupProbe:
            periodSeconds: 10
            failureThreshold: 30
            httpGet:
              path: /v2/health/ready
              port: http
          resources:
            limits:
              nvidia.com/gpu: 1

Expected behavior

The server should keep serving requests with tracing enabled and not crash. Instead, after a few thousand requests at a high QPS, it receives a signal (11) and restarts.

@nicomeg-pr nicomeg-pr changed the title Triton server receives Signal (11) when tracing is enable with no sampling (or a small sampling rate) Triton server receives Signal (11) when tracing is enabled with no sampling (or a small sampling rate) Nov 14, 2024
@rmccorm4 (Contributor) commented Nov 15, 2024

Hi @nicomeg-pr, thanks for raising this.

I receive a lot of warning before signal (11):
[Warning] File: /tmp/tritonbuild/tritonserver/build/_deps/repo-third-party-build/opentelemetry-cpp/src/opentelemetry-cpp/sdk/src/trace/b

  1. Was this warning cut off? Is there more to it?
  2. Can you reproduce this using the triton tracing mode instead of opentelemetry mode? (See the example flags sketched below this list.)
  3. Can you reproduce with both HTTP and GRPC clients, or only one?
  4. Can you share the client script to send the request load?
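
For question 2, a minimal sketch of the flags to try Triton's native tracing mode instead of OpenTelemetry (the trace file path and log frequency are just example values):

tritonserver --model-repository=gs://path/to/repo \
  --trace-config mode=triton \
  --trace-config triton,file=/tmp/trace.json \
  --trace-config triton,log-frequency=50 \
  --trace-config rate=1 \
  --trace-config level=TIMESTAMPS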

CC @indrajit96 @oandreeva-nv

@rmccorm4 rmccorm4 self-assigned this Nov 15, 2024
@rmccorm4 rmccorm4 added the crash Related to server crashes, segfaults, etc. label Nov 15, 2024
@nicomeg-pr (Author) commented
Here is the complete warning message; sorry, it was truncated:

[Warning] File: /tmp/tritonbuild/tritonserver/build/_deps/repo-third-party-build/opentelemetry-cpp/src/opentelemetry-cpp/sdk/src/trace/batch_span_processor.cc:55 BatchSpanProcessor queue is full - dropping span.

All the warnings are the same.

@oandreeva-nv (Contributor) commented Nov 18, 2024

Hi @nicomeg-pr, could you please try increasing bsp_max_queue_size; by default it's 2048. I would also recommend setting bsp_max_export_batch_size to a number greater than 1, which will potentially speed up your system as well.

More docs for the batch exporter can be found here: https://github.com/triton-inference-server/server/blob/main/docs/user_guide/trace.md#opentelemetry-trace-apis-settings
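
For example, in the deployment above, this suggestion could be applied by adjusting the --trace-config arguments in the container command roughly as follows (the bsp_* values are illustrative, not tuned recommendations):

            - --trace-config
            - mode=opentelemetry
            - --trace-config
            - opentelemetry,url=http://datadog-agent-agent.datadog.svc:4318/v1/traces
            # illustrative values; the default bsp_max_queue_size is 2048
            - --trace-config
            - opentelemetry,bsp_max_queue_size=8192
            - --trace-config
            - opentelemetry,bsp_max_export_batch_size=512
            - --trace-config
            - rate=1
            - --trace-config
            - level=TIMESTAMPS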

@nicomeg-pr (Author) commented Nov 20, 2024

Hello,

To answer your questions:
2. I didn't test with Triton traces, as I cannot use them with Datadog.
3. I can reproduce the error with both GRPC and HTTP (with the same warnings).
4. Here is the code of the two clients I used for GRPC and HTTP:

For GRPC I use the Triton GRPC client:

import random

import numpy as np
import tritonclient.grpc as grpcclient
from tritonclient.grpc.auth import BasicAuth

# server_url is defined elsewhere in my setup
basic_auth_config = BasicAuth("login", "password")
with grpcclient.InferenceServerClient(server_url) as triton_client:
    triton_client.register_plugin(basic_auth_config)
    input_data = np.array([random.random() for i in range(50)], dtype=np.float32).reshape(1, -1)
    model_input = grpcclient.InferInput(
        name="INPUT0",
        datatype="FP32",
        shape=input_data.shape,
    )
    model_input.set_data_from_numpy(input_data)

    infer_result = triton_client.infer("identity_fp32", [model_input])
    result = infer_result.as_numpy("OUTPUT0")[0]

For HTTP I use a simple HTTP request:


import random

import httpx
import numpy as np

# server_url and BASIC_AUTH_HEADER are defined elsewhere in my setup
input_data = np.array([random.random() for i in range(50)], dtype=np.float32).reshape(1, -1)
payload = {
    "inputs": [
        {
            "name": "INPUT0",  # input tensor name, matching the GRPC example above
            "shape": list(input_data.shape),
            "datatype": "FP32",
            "data": input_data.tolist(),
        }
    ]
}
try:
    with httpx.Client() as client:
        response = client.post(
            f"{server_url}/v2/models/identity_fp32/infer", json=payload, headers=BASIC_AUTH_HEADER
        )
        if response.status_code == 200:
            result = response.json()
        else:
            result = None
except httpx.HTTPError as e:
    ...

@nicomeg-pr (Author) commented

@oandreeva-nv I tested with bsp_max_queue_size and rate=1 and I still got a signal (11); I will try with 2048.
