Issues: triton-inference-server/server

How to free multiple gpu memory
#7825 opened Nov 22, 2024 by 1120475708
Error about driver version compatibility  [module: platforms, question]
#7798 opened Nov 15, 2024 by GLW1215
ensemble multi-GPU  [module: server, question]
#7794 opened Nov 14, 2024 by xiazi-yu
Triton x vLLM backend GPU selection issue  [module: backends]
#7786 opened Nov 13, 2024 by Tedyang2003
Unpredictability in Sequence batching  [performance]
#7776 opened Nov 8, 2024 by arun-oai
Do I need to warm up the model again after reloading it?  [question]
#7762 opened Nov 4, 2024 by soulseen
Build AMD64 Triton from ARM64 machine generate ARM64 architecture executable file  [build, module: platforms]
#7745 opened Oct 26, 2024 by ti1uan
Handle raw binary request in python
#7741 opened Oct 24, 2024 by remiruzn
SeamlessM4T on triton
#7740 opened Oct 24, 2024 by Interwebart
Expensive & Volatile Triton Server latency performance A possible performance tune-up
#7739 opened Oct 24, 2024 by jadhosn