Add infrastructure for auto EP selection #24430

Merged
skottmckay merged 36 commits into main from skottmckay/AutoSelectEpInfrastructure_PR
Apr 20, 2025

skottmckay
Contributor

Description

Add infrastructure to enable automatic execution provider (EP) selection.

Adds device discovery for CPU/GPU/NPU on Windows.
Currently supports internal EPs (CPU/DML/WebGPU) and provider bridge EPs (CUDA).
The infrastructure will be used with plugin EPs next.

The selection policy implementation will be added next. In the interim, a temporary function with manually specified selection lets unit tests cover the end-to-end flow.
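
To make the flow described above concrete, here is a minimal, self-contained sketch of discovery followed by manual selection. It is not the onnxruntime implementation: `HardwareDevice`, `EpInfo`, `EpDevice`, `MatchEpsToDevices`, and `SelectByEpName` are hypothetical names invented for illustration, and the mapping from EPs to device types is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; these are NOT the onnxruntime types.
enum class DeviceType { CPU, GPU, NPU };

// A device found by (assumed) platform discovery.
struct HardwareDevice {
  DeviceType type;
  std::string vendor;
};

// An EP together with the device types it claims to support (assumed mapping).
struct EpInfo {
  std::string name;
  std::vector<DeviceType> supported;
};

// Pairs an EP name with one discovered device it could run on.
struct EpDevice {
  std::string ep_name;
  HardwareDevice device;
};

// Build every (EP, device) pair where the EP supports the device type.
std::vector<EpDevice> MatchEpsToDevices(const std::vector<EpInfo>& eps,
                                        const std::vector<HardwareDevice>& devices) {
  std::vector<EpDevice> result;
  for (const auto& ep : eps) {
    for (const auto& dev : devices) {
      const bool supported =
          std::find(ep.supported.begin(), ep.supported.end(), dev.type) != ep.supported.end();
      if (supported) {
        result.push_back({ep.name, dev});
      }
    }
  }
  return result;
}

// Manual selection: keep only the pairs for the EP the caller names.
std::vector<EpDevice> SelectByEpName(const std::vector<EpDevice>& candidates,
                                     const std::string& ep_name) {
  assert(!ep_name.empty());  // an empty name can never match a registered EP
  std::vector<EpDevice> selected;
  std::copy_if(candidates.begin(), candidates.end(), std::back_inserter(selected),
               [&](const EpDevice& c) { return c.ep_name == ep_name; });
  return selected;
}
```

A unit test in this style would register a few EPs, run the matching step, then pick one EP by name and check that the expected devices come back. A real selection policy would replace `SelectByEpName` with logic that ranks the candidates instead of requiring the caller to name one.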

Motivation and Context

Contributor

@github-actions github-actions bot left a comment

You can commit the suggested changes from lintrunner.

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@jywu-msft
Member

Bunch of build failures. Missing some dxcore header?

Disable EP registration/auto selection testing on non-Windows platforms. There's no device discovery so it can't be used.
@jywu-msft jywu-msft requested a review from RyanUnderhill April 17, 2025 22:41
skottmckay and others added 4 commits April 18, 2025 12:07

skottmckay and others added 3 commits April 19, 2025 07:17

Add ORT_API_CALL fixes

Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com>
…cies a lot simpler, and in the future we need to provide that to an EP author so it is reasonable for it to be implemented in onnxruntime_session.

Address PR comments.

Add onnxruntime_session to the CUDA EP dependencies.
skottmckay and others added 11 commits April 19, 2025 09:59

Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com>
- move provider bridge support to the EP side
- change SessionOptionsAppendExecutionProvider_V2 to take OrtEpDevices as input
- better device discovery (pending final implementation)
Avoid double lock for Provider Get when it calls Load.
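
As an illustration of the "avoid double lock" item above, one common way to structure this is to split the loading logic into a helper that assumes the lock is already held. The sketch below assumes a non-recursive `std::mutex`; the class `ProviderLibrary` and its members are hypothetical and do not reflect the actual onnxruntime code.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical provider-library wrapper; all names are illustrative only.
class ProviderLibrary {
 public:
  // Public entry point: takes the lock, then defers to the unlocked helper.
  void Load() {
    std::lock_guard<std::mutex> lock(mutex_);
    LoadLocked();
  }

  // Get() needs the library to be loaded. It calls LoadLocked() rather than
  // Load(), so the mutex is acquired exactly once on this path. Calling Load()
  // here would lock a non-recursive mutex twice on the same thread.
  int Get() {
    std::lock_guard<std::mutex> lock(mutex_);
    LoadLocked();
    assert(loaded_);  // postcondition of LoadLocked()
    return provider_;
  }

  // Number of times the underlying load actually ran.
  int LoadCount() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return load_count_;
  }

 private:
  // Precondition: mutex_ is held by the caller.
  void LoadLocked() {
    if (loaded_) return;
    provider_ = 42;  // stand-in for resolving the provider from the library
    loaded_ = true;
    ++load_count_;
  }

  mutable std::mutex mutex_;
  bool loaded_ = false;
  int provider_ = 0;
  int load_count_ = 0;
};
```

With this split, `Get()` and `Load()` can be called in any order and the load body runs once. A recursive mutex would also avoid the deadlock, but the locked-helper split keeps the locking order easier to reason about.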
skottmckay and others added 4 commits April 20, 2025 07:28

Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com>
Disable setting device_id for DML. Need to investigate why we get an invalid bus number in device discovery. The device discovery code is temporary and most likely has an issue.
@skottmckay skottmckay merged commit 6df6206 into main Apr 20, 2025
87 of 89 checks passed
@skottmckay skottmckay deleted the skottmckay/AutoSelectEpInfrastructure_PR branch April 20, 2025 06:14
@skottmckay
Contributor Author

Will follow up on any comments in a separate PR.

ashrit-ms pushed a commit that referenced this pull request Apr 24, 2025
adrianlizarraga added a commit that referenced this pull request Apr 25, 2025
### Description
Fixes a segfault that occurs when an EP library is re-loaded in the same
process.


### Motivation and Context
A recent [PR](#24430) updated the Environment to unload all EP libraries on
destruction of `OrtEnv`. We forgot to update the state to mark the EP library
as unloaded, which caused a segfault when the EP library was re-loaded.
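
The class of bug described above can be sketched in isolation. The example below is a simplified model, not the onnxruntime code: `EpLibrary` is a hypothetical class, and a heap-allocated `int` stands in for a real library handle. The key point is that `Unload()` must reset the bookkeeping as well as release the handle. Otherwise a later `Load()` would see `loaded_ == true` and hand back a dangling handle.

```cpp
#include <cassert>

// Hypothetical model of a dynamically loaded EP library.
class EpLibrary {
 public:
  EpLibrary() = default;
  EpLibrary(const EpLibrary&) = delete;
  EpLibrary& operator=(const EpLibrary&) = delete;
  ~EpLibrary() { Unload(); }

  // Loads the library on first use and returns the (stand-in) handle.
  int* Load() {
    if (!loaded_) {
      handle_ = new int(++generation_);  // stand-in for opening the library
      loaded_ = true;
    }
    assert(handle_ != nullptr);
    return handle_;
  }

  // Releases the handle and marks the library as unloaded.
  void Unload() {
    if (!loaded_) return;
    delete handle_;    // stand-in for closing the library
    handle_ = nullptr;
    loaded_ = false;   // without this reset, the next Load() returns a stale handle
  }

  bool IsLoaded() const { return loaded_; }

 private:
  int* handle_ = nullptr;
  bool loaded_ = false;
  int generation_ = 0;  // counts how many times the library was opened
};
```

In this model, a load, unload, load sequence yields a fresh handle the second time because the state was reset in between.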
intbf pushed a commit to intbf/onnxruntime that referenced this pull request Apr 25, 2025
Signed-off-by: bfilipek <bartlomiej.filipek@intel.com>
vraspar pushed a commit that referenced this pull request Apr 28, 2025
7 participants