
Ensure the correct GPU device is used in RunSince when it's invoked by a new thread #24192


Merged
chilo-ms merged 15 commits into main from chi/set_device_id on Apr 2, 2025

Conversation

@chilo-ms (Contributor) commented Mar 26, 2025

Running a CUDA kernel on the wrong GPU device fails with the CUDA error `invalid resource handle`.

Both the CUDA EP and the TRT EP have this issue when `ExecutionMode::ORT_PARALLEL` is enabled.

Repro code:

```python
import threading

import onnxruntime as ort

# Two provider configurations: sessions alternate between GPU 0 and GPU 1.
provider = [
    [
        ('TensorrtExecutionProvider', {
            'device_id': 0,
        }),
    ],
    [
        ('TensorrtExecutionProvider', {
            'device_id': 1,
        }),
    ],
]


class ThreadObj:
    def __init__(self, model_path: str, iterations: int, idx: int):
        ...  # elided: sets self.input and self.iterations
        sess_opt = ort.SessionOptions()
        sess_opt.execution_mode = ort.ExecutionMode.ORT_PARALLEL
        self.inference_session = ort.InferenceSession(model_path, sess_opt, provider[idx % 2])

    def warmup(self):
        self.inference_session.run(None, self.input)

    def run(self, thread_times, threads_complete):
        for _ in range(self.iterations):
            self.inference_session.run(None, self.input)


def thread_target(obj, thread_times, threads_complete):
    obj.run(thread_times, threads_complete)


...

iterations = 500
num_threads = 13
t_obj_list = []
thread_list = []

for tidx in range(num_threads):
    obj = ThreadObj(model_path, iterations, tidx)
    t_obj_list.append(obj)
    obj.warmup()

for t_obj in t_obj_list:
    thread = threading.Thread(target=thread_target, daemon=True,
                              args=(t_obj, thread_times, threads_complete))
    thread.start()
    thread_list.append(thread)

...
```

The root cause: when an inference session is initialized, it can be bound to a device with id > 0, but at inference time RunSince can be invoked by a newly spawned thread, and new threads default to device 0, so the kernel runs on the wrong GPU and hits the error above.
This PR provides a general fix for both the CUDA EP and the TRT EP: call cudaSetDevice in RunSince.
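
A minimal sketch of the idea, not the actual patch (`EnsureDeviceForThisThread` is a hypothetical helper); it hinges on the CUDA runtime's current device being per-thread state:

```cpp
// Sketch only: cudaSetDevice/cudaGetDevice manage *per-thread* state, so a
// worker thread spawned under ORT_PARALLEL starts on device 0 regardless of
// the device the session was created on. EnsureDeviceForThisThread is a
// hypothetical helper mirroring what RunSince would do at entry.
#include <cuda_runtime.h>
#include <cstdio>

void EnsureDeviceForThisThread(int ep_device_id) {
  int current = -1;
  cudaGetDevice(&current);  // current device for *this* thread (defaults to 0)
  if (current != ep_device_id) {
    cudaError_t err = cudaSetDevice(ep_device_id);  // rebind this thread
    if (err != cudaSuccess) {
      std::fprintf(stderr, "cudaSetDevice(%d) failed: %s\n",
                   ep_device_id, cudaGetErrorString(err));
    }
  }
  // Subsequent kernel launches and stream/handle usage on this thread now
  // target ep_device_id, so handles created on that device stay valid.
}
```

Because `cudaSetDevice` only affects the calling thread, pinning the device at the top of the run path is cheap and does not disturb other threads or sessions.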

@chilo-ms chilo-ms requested review from tianleiwu and jywu-msft March 26, 2025 19:26
@jywu-msft (Member) commented:

Can a test be added for CUDA EP/TRT EP to stress this? (or existing test enhanced)

@chilo-ms (Contributor, Author) replied:

> Can a test be added for CUDA EP/TRT EP to stress this? (or existing test enhanced)

I thought about adding a test, but it needs multiple GPUs. Checking whether our CI has multiple GPUs or not.

@chilo-ms (Contributor, Author) replied:

> Can a test be added for CUDA EP/TRT EP to stress this? (or existing test enhanced)

The Linux and Windows GPU CIs have only one GPU, which is not suitable for testing the device_id > 0 scenario. But the newly added concurrent test can still exercise multiple concurrent CUDA EP and TRT EP runs using multithreading, as in the sketch below.
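
For illustration only, a minimal sketch of that kind of concurrent stress run (assuming a CUDA-enabled build; the model path, tensor shape, and I/O names are placeholders, and this is not the PR's actual test code):

```cpp
// Sketch only, not the PR's test: hammer one CUDA-EP session from several
// threads under ORT_PARALLEL. Model path, shape, and I/O names are placeholders.
#include <onnxruntime_cxx_api.h>
#include <array>
#include <thread>
#include <vector>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "concurrent_stress");
  Ort::SessionOptions so;
  so.SetExecutionMode(ExecutionMode::ORT_PARALLEL);  // run may hop to worker threads
  OrtCUDAProviderOptions cuda_options{};             // device_id 0 on single-GPU CI
  so.AppendExecutionProvider_CUDA(cuda_options);
  Ort::Session session(env, ORT_TSTR("model.onnx"), so);  // placeholder model

  auto run_loop = [&session]() {
    std::array<float, 4> data{1.f, 2.f, 3.f, 4.f};
    std::array<int64_t, 2> shape{1, 4};
    auto mem = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
    Ort::Value input = Ort::Value::CreateTensor<float>(
        mem, data.data(), data.size(), shape.data(), shape.size());
    const char* in_names[] = {"input"};    // placeholder input name
    const char* out_names[] = {"output"};  // placeholder output name
    for (int i = 0; i < 500; ++i) {
      auto outputs = session.Run(Ort::RunOptions{nullptr},
                                 in_names, &input, 1, out_names, 1);
    }
  };

  std::vector<std::thread> threads;
  for (int t = 0; t < 8; ++t) threads.emplace_back(run_loop);
  for (auto& t : threads) t.join();
  return 0;
}
```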

@chilo-ms chilo-ms closed this Apr 1, 2025
@chilo-ms chilo-ms reopened this Apr 1, 2025
@chilo-ms (Contributor, Author) commented Apr 2, 2025:

/azp run Big Models, Linux CPU Minimal Build E2E CI Pipeline, Linux QNN CI Pipeline, ONNX Runtime Web CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline, Windows x64 QNN CI Pipeline

Azure Pipelines successfully started running 7 pipeline(s).

@chilo-ms chilo-ms merged commit b379390 into main Apr 2, 2025
76 checks passed
@chilo-ms chilo-ms deleted the chi/set_device_id branch April 2, 2025 17:23
quic-zhaoxul pushed a commit to CodeLinaro/onnxruntime that referenced this pull request on Apr 17, 2025:
… thread (microsoft#24192)