
[Build] Issues with Multithreading in the New Versions of onnxruntime-directml #22867

Open
lianshiye0 opened this issue Nov 18, 2024 · 1 comment
Labels
build build issues; typically submitted using template ep:DML issues related to the DirectML execution provider

Comments

@lianshiye0

Describe the issue

Issue Description:

In versions 1.17.0 and earlier of onnxruntime-directml, when using an AMD GPU and the onnxruntime.InferenceSession() method to load an ONNX model onto the GPU, a model session is created. If the program utilizes multithreading, multiple threads may compete for the model session, leading to deadlocks and crashes. Implementing a queue mechanism to avoid resource contention resolves the issue in these versions.

However, from version 1.18.0 onwards, despite using various mechanisms such as queueing, locks, and thread semaphores to limit resource contention in a multithreaded environment, these solutions have no effect. The problem persists, resulting in deadlocks and crashes.

Steps to Reproduce:

Use an AMD GPU.

Load an ONNX model using onnxruntime.InferenceSession() in a multithreaded program.

Observe deadlocks and crashes due to multiple threads competing for the model session.

Implement queueing, locks, and thread semaphores to manage resource contention.

Observe that these mechanisms do not resolve the issue in versions 1.18.0 and later.
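The locking pattern described in the steps above can be sketched as follows. This is a minimal, hedged illustration of serializing `session.run()` calls with a `threading.Lock` (the workaround that reportedly helped on 1.17.0 and earlier); `DummySession` is a hypothetical stand-in for `onnxruntime.InferenceSession` so the pattern is runnable without a GPU or the onnxruntime package.

```python
import threading

class DummySession:
    """Hypothetical stand-in for onnxruntime.InferenceSession."""
    def run(self, output_names, input_feed):
        return [sum(input_feed["x"])]

session = DummySession()
run_lock = threading.Lock()
results = []

def worker(x):
    # Serialize inference: only one thread may call run() at a time.
    with run_lock:
        results.append(session.run(None, {"x": x})[0])

threads = [threading.Thread(target=worker, args=([i, i],)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

In a real program, `DummySession` would be replaced by the actual `InferenceSession` loaded with the DML provider; the point is only that every `run()` call goes through the same lock.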

Expected Behavior: Multithreading mechanisms should effectively manage resource contention, preventing deadlocks and crashes.

Actual Behavior: Resource contention management mechanisms are ineffective in versions 1.18.0 and later, resulting in persistent deadlocks and crashes.

Environment:

ONNX Runtime DirectML Versions: 1.17.0 and earlier (issue resolved with queueing), 1.18.0 and later (issue persists)

Hardware: AMD GPU

Operating System: Windows 10 or Windows 11

Request for Assistance: Given my observations, there seems to be a resource contention issue, but I am not entirely certain of the underlying cause. Could you provide guidance or solutions for resolving this issue in the newer versions of onnxruntime-directml?

Urgency

No response

Target platform

Windows 10 or Windows 11

Build script

session = onnxruntime.InferenceSession(onnx_model_path, providers=['DmlExecutionProvider', 'CPUExecutionProvider'])
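If concurrent `run()` calls on one shared session are the trigger, a common alternative pattern is to give each thread its own session via `threading.local()`. This is a hedged sketch, not a confirmed fix for this issue; `make_session` is a hypothetical factory that in real code would call `onnxruntime.InferenceSession(onnx_model_path, providers=['DmlExecutionProvider', 'CPUExecutionProvider'])`.

```python
import threading

tls = threading.local()

def make_session():
    # Placeholder: a real implementation would construct an
    # onnxruntime.InferenceSession here (one per thread).
    return object()

def get_session():
    # Lazily create one session per thread and cache it in
    # thread-local storage; repeated calls in the same thread
    # reuse the same session object.
    if not hasattr(tls, "session"):
        tls.session = make_session()
    return tls.session
```

Note this trades memory (one model copy per thread) for the isolation of never sharing a session across threads.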

Error / output

The program deadlocks and crashes without generating any error messages or logs.

Visual Studio Version

No response

GCC / Compiler Version

No response

@lianshiye0 lianshiye0 added the build build issues; typically submitted using template label Nov 18, 2024
@github-actions github-actions bot added the ep:DML issues related to the DirectML execution provider label Nov 18, 2024
@lianshiye0
Author

session = onnxruntime.InferenceSession(onnx_model_path, providers=['DmlExecutionProvider', 'CPUExecutionProvider'])
"After loading the model onto the GPU, the crash occurs when calling session.run()."
