
Ensure the correct GPU device is used in RunSince when it's invoked by a new thread #24192


Merged
chilo-ms merged 15 commits into main from chi/set_device_id on Apr 2, 2025

Conversation

@chilo-ms (Contributor) commented Mar 26, 2025

Running a CUDA kernel on the wrong GPU device fails with the CUDA error `invalid resource handle`.

Both the CUDA EP and the TRT EP have this issue when `ExecutionMode::ORT_PARALLEL` is enabled.

Repro code:

```python
import threading

import onnxruntime as ort

# Two provider configurations: sessions alternate between GPU 0 and GPU 1.
provider = [
    [
        ('TensorrtExecutionProvider', {
            'device_id': 0,
        }),
    ],
    [
        ('TensorrtExecutionProvider', {
            'device_id': 1,
        }),
    ],
]


class ThreadObj:
    def __init__(self, model_path: str, iterations: int, idx: int):
        ...  # elided: sets self.input and self.iterations
        sess_opt = ort.SessionOptions()
        sess_opt.execution_mode = ort.ExecutionMode.ORT_PARALLEL
        self.inference_session = ort.InferenceSession(model_path, sess_opt, provider[idx % 2])

    def warmup(self):
        self.inference_session.run(None, self.input)

    def run(self, thread_times, threads_complete):
        for _ in range(self.iterations):
            self.inference_session.run(None, self.input)


def thread_target(obj, thread_times, threads_complete):
    obj.run(thread_times, threads_complete)


...

iterations = 500
num_threads = 13
t_obj_list = []
thread_list = []

for tidx in range(num_threads):
    obj = ThreadObj(model_path, iterations, tidx)
    t_obj_list.append(obj)
    obj.warmup()

for t_obj in t_obj_list:
    thread = threading.Thread(target=thread_target, daemon=True,
                              args=(t_obj, thread_times, threads_complete))
    thread.start()
    thread_list.append(thread)

...
```

The root cause: when an inference session is initialized, it can be bound to a device with id > 0, but at inference time RunSince can be invoked by a newly spawned thread, and new threads default to device 0, so the kernel runs on the wrong GPU and hits the error above.
This PR provides a general fix for both the CUDA EP and the TRT EP: call cudaSetDevice in RunSince.
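
A minimal sketch of the idea, not the actual patch (`EnsureDeviceForThisThread` is a hypothetical helper); it hinges on the CUDA runtime's current device being per-thread state:

```cpp
// Sketch only: cudaSetDevice/cudaGetDevice manage *per-thread* state, so a
// worker thread spawned under ORT_PARALLEL starts on device 0 regardless of
// the device the session was created on. EnsureDeviceForThisThread is a
// hypothetical helper mirroring what RunSince would do at entry.
#include <cuda_runtime.h>
#include <cstdio>

void EnsureDeviceForThisThread(int ep_device_id) {
  int current = -1;
  cudaGetDevice(&current);  // current device for *this* thread (defaults to 0)
  if (current != ep_device_id) {
    cudaError_t err = cudaSetDevice(ep_device_id);  // rebind this thread
    if (err != cudaSuccess) {
      std::fprintf(stderr, "cudaSetDevice(%d) failed: %s\n",
                   ep_device_id, cudaGetErrorString(err));
    }
  }
  // Subsequent kernel launches and stream/handle usage on this thread now
  // target ep_device_id, so handles created on that device stay valid.
}
```

Because `cudaSetDevice` only affects the calling thread, pinning the device at the top of the run path is cheap and does not disturb other threads or sessions.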

@chilo-ms chilo-ms requested review from tianleiwu and jywu-msft March 26, 2025 19:26
@jywu-msft (Member) commented:

Can a test be added for CUDA EP/TRT EP to stress this? (or existing test enhanced)

@chilo-ms (Contributor, Author) replied:

> Can a test be added for CUDA EP/TRT EP to stress this? (or existing test enhanced)

I thought about adding a test, but it needs multiple GPUs. Checking whether our CI has multiple GPUs or not.

@chilo-ms (Contributor, Author) replied:

> Can a test be added for CUDA EP/TRT EP to stress this? (or existing test enhanced)

The Linux and Windows GPU CIs have only one GPU, which is not suitable for testing the device_id > 0 scenario. But the newly added concurrent test can still exercise multiple concurrent CUDA EP and TRT EP runs using multithreading, as in the sketch below.
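
For illustration only, a minimal sketch of that kind of concurrent stress run (assuming a CUDA-enabled build; the model path, tensor shape, and I/O names are placeholders, and this is not the PR's actual test code):

```cpp
// Sketch only, not the PR's test: hammer one CUDA-EP session from several
// threads under ORT_PARALLEL. Model path, shape, and I/O names are placeholders.
#include <onnxruntime_cxx_api.h>
#include <array>
#include <thread>
#include <vector>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "concurrent_stress");
  Ort::SessionOptions so;
  so.SetExecutionMode(ExecutionMode::ORT_PARALLEL);  // run may hop to worker threads
  OrtCUDAProviderOptions cuda_options{};             // device_id 0 on single-GPU CI
  so.AppendExecutionProvider_CUDA(cuda_options);
  Ort::Session session(env, ORT_TSTR("model.onnx"), so);  // placeholder model

  auto run_loop = [&session]() {
    std::array<float, 4> data{1.f, 2.f, 3.f, 4.f};
    std::array<int64_t, 2> shape{1, 4};
    auto mem = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
    Ort::Value input = Ort::Value::CreateTensor<float>(
        mem, data.data(), data.size(), shape.data(), shape.size());
    const char* in_names[] = {"input"};    // placeholder input name
    const char* out_names[] = {"output"};  // placeholder output name
    for (int i = 0; i < 500; ++i) {
      auto outputs = session.Run(Ort::RunOptions{nullptr},
                                 in_names, &input, 1, out_names, 1);
    }
  };

  std::vector<std::thread> threads;
  for (int t = 0; t < 8; ++t) threads.emplace_back(run_loop);
  for (auto& t : threads) t.join();
  return 0;
}
```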

@chilo-ms chilo-ms closed this Apr 1, 2025
@chilo-ms chilo-ms reopened this Apr 1, 2025
@chilo-ms (Contributor, Author) commented Apr 2, 2025:

/azp run Big Models, Linux CPU Minimal Build E2E CI Pipeline, Linux QNN CI Pipeline, ONNX Runtime Web CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline, Windows x64 QNN CI Pipeline

Azure Pipelines successfully started running 7 pipeline(s).

@chilo-ms chilo-ms merged commit b379390 into main Apr 2, 2025
76 checks passed
@chilo-ms chilo-ms deleted the chi/set_device_id branch April 2, 2025 17:23
quic-zhaoxul pushed a commit to CodeLinaro/onnxruntime that referenced this pull request on Apr 17, 2025:
… thread (microsoft#24192)