Ensure the correct GPU device is used in RunSince when it's invoked by a new thread #24192
Conversation
Can a test be added for the CUDA EP/TRT EP to stress this? (or an existing test enhanced)
I thought about adding a test, but it needs multiple GPUs.
The Linux and Windows GPU CI pipelines have only one GPU, which is not suitable for testing the device id > 0 scenario.
/azp run Big Models, Linux CPU Minimal Build E2E CI Pipeline, Linux QNN CI Pipeline, ONNX Runtime Web CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline, Windows x64 QNN CI Pipeline
Azure Pipelines successfully started running 7 pipeline(s).
Running a CUDA kernel on the wrong GPU device ends up with the CUDA error: `invalid resource handle.`
Both the CUDA EP and the TRT EP have this issue when ExecutionMode::ORT_PARALLEL is enabled.
Repro code:
````python
import threading

import onnxruntime as ort

provider = [
    [
        ('TensorrtExecutionProvider', {
            'device_id': 0,
        }),
    ],
    [
        ('TensorrtExecutionProvider', {
            'device_id': 1,
        }),
    ]
]

class ThreadObj():
    def __init__(self, model_path: str, iterations: int, idx: int):
        ...  # (elided) prepare self.input for this model
        sess_opt = ort.SessionOptions()
        sess_opt.execution_mode = ort.ExecutionMode.ORT_PARALLEL
        # Sessions alternate between device 0 and device 1.
        self.inference_session = ort.InferenceSession(model_path, sess_opt, provider[idx % 2])

    def warmup(self):
        self.inference_session.run(None, self.input)

    def run(self, thread_times, threads_complete):
        for iter in range(self.iterations):
            self.inference_session.run(None, self.input)

def thread_target(obj, thread_times, threads_complete):
    obj.run(thread_times, threads_complete)

...

iterations = 500
num_threads = 13
t_obj_list = []
thread_list = []

for tidx in range(num_threads):
    obj = ThreadObj(model_path, iterations, tidx)
    t_obj_list.append(obj)
    obj.warmup()

for t_obj in t_obj_list:
    thread = threading.Thread(target=thread_target, daemon=True, args=(t_obj, thread_times, threads_complete,))
    thread.start()
    thread_list.append(thread)

...
````
The reason is that when the inference session is initialized, it can be bound to a device with id > 0, whereas RunSince can be invoked by a new thread during inference, and new threads default to device 0, so we hit the error of running on the incorrect GPU device.
This PR provides a general fix for both the CUDA EP and the TRT EP: call cudaSetDevice in RunSince.
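To see why this fails, note that the CUDA runtime tracks the current device per host thread, and a thread that never called cudaSetDevice operates on device 0. The standalone sketch below (illustrative only, not code from this PR; it assumes a machine with at least two GPUs) reproduces the thread-local default and shows the shape of the fix, i.e. re-selecting the EP's device at the start of per-thread execution such as RunSince:

````cpp
// Illustrative sketch: the CUDA runtime's "current device" is per host
// thread and defaults to 0, so handles created on device 1 become invalid
// resource handles when used from a fresh thread that never selected it.
#include <cstdio>
#include <thread>
#include <cuda_runtime.h>

int main() {
  // Main thread: bind to device 1, as a session created with device_id=1 would.
  cudaSetDevice(1);

  std::thread worker([] {
    int dev = -1;
    cudaGetDevice(&dev);  // A new thread starts on device 0, not device 1.
    std::printf("worker thread initially on device %d\n", dev);

    // The fix, generically: select the EP's device before touching any of
    // its streams or handles from this thread (what the PR does in RunSince).
    cudaSetDevice(1);
    cudaGetDevice(&dev);
    std::printf("worker thread now on device %d\n", dev);
  });
  worker.join();
  return 0;
}
````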