Commit d6de955

authored Apr 19, 2025
[Docs] EPcontext error handling (#24471)
### Description
Add a section to the EPContext design doc, "Error Handling During EP Context Binary Loading", to provide users with a standard to follow.
1 parent c2d9787 commit d6de955

2 files changed

+23
-13
lines changed


‎docs/execution-providers/EP-Context-Design.md

+21-12
@@ -199,7 +199,7 @@ ONNX Runtime EPs that support loading models with `EPContext` nodes should follo
199199
- The EP should only partition the `EPContext` nodes where the `source` attribute matches the key required by the EP.
200200
- The EP loads the cached context from the matched `EPContext` nodes.
201201

202-
- **Handling External Context Binaries (embed_mode = 0)**
202+
- **Handling External Context Binaries (embed_mode = 0)**
203203
When the `EPContext` cache model is generated with `embed_mode = 0`, the context binary is stored as a separate file alongside the ONNX model in the same folder.
204204
- ONNX Runtime retrieves the relative path of the context binary file from the `ep_cache_context` attribute of the `EPContext` node.
205205
- **For models loaded from a file path:**
@@ -208,10 +208,19 @@ ONNX Runtime EPs that support loading models with `EPContext` nodes should follo
208208
- Since the EP cannot derive the model's folder path, the user must specify the session option `ep.context_file_path`.
209209
- The EP uses `ep.context_file_path` to determine the folder path and combines it with the relative path to construct the full path to the context binary file.
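
The path construction described above can be sketched as follows. This is a minimal illustration, not ONNX Runtime code: `ResolveContextBinaryPath` is a hypothetical helper, and it assumes the folder is simply the parent directory of either the model file path or the value of `ep.context_file_path`.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper (not part of the ONNX Runtime API).
// model_or_ctx_file_path: the model file path, or the value of the
//   `ep.context_file_path` session option when the model comes from a memory buffer.
// ep_cache_context: the relative path stored in the EPContext node attribute.
fs::path ResolveContextBinaryPath(const fs::path& model_or_ctx_file_path,
                                  const std::string& ep_cache_context) {
  // Assumption: the folder is the parent directory of the given file path.
  // It is empty when only a bare file name was supplied.
  const fs::path folder = model_or_ctx_file_path.parent_path();
  // Join folder and relative path, then clean up "." and ".." components.
  return (folder / ep_cache_context).lexically_normal();
}
```

For instance, under these assumptions `ResolveContextBinaryPath("models/model_ctx.onnx", "model_qnn.bin")` yields `models/model_qnn.bin`.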
210210

211-
- **Support for Multiple Primary `EPContext` Nodes (`main_context = 1`)**
211+
- **Support for Multiple Primary `EPContext` Nodes (`main_context = 1`)**
212212
- The EP should support multiple primary `EPContext` nodes without any limitations.
213213
- The EP must be capable of loading all EP context binary buffers/files specified in the `ep_cache_context` attributes of the `EPContext` nodes, deserializing them, managing the `ep_graphs`, and selecting the appropriate one for execution.
214214

215+
- **Error Handling During EP Context Binary Loading**
216+
217+
The EP or its backend SDK should be capable of detecting common failure scenarios (including but not limited to the following). In such cases, the EP should return a status with the `INVALID_GRAPH` status code:
218+
219+
- Detect mismatches between the driver version and the version required by the EP context binary; return an error if they are incompatible.
220+
- Detect mismatches between the runtime SDK version and the version used to generate the EP context binary; return an error if they are incompatible.
221+
- Return an error if loading the EP context binary fails for any reason.
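
One way an EP might structure these checks is sketched below. Everything here is a stand-in for illustration: `Status`, `StatusCode`, `ContextBinaryInfo`, `RuntimeInfo`, and `ValidateAndLoad` are hypothetical names, and the compatibility rules (driver at least as new as required, SDK versions equal) are assumptions. A real EP would use ONNX Runtime's own status type and whatever version rules its backend SDK defines.

```cpp
#include <string>
#include <utility>

// Stand-in types for illustration only; not the ONNX Runtime status API.
enum class StatusCode { OK, INVALID_GRAPH };

struct Status {
  StatusCode code = StatusCode::OK;
  std::string message;
  bool IsOK() const { return code == StatusCode::OK; }
};

// Hypothetical metadata read from the EP context binary.
struct ContextBinaryInfo {
  int required_driver_version;  // minimum driver version the binary needs
  int sdk_version;              // SDK version used to generate the binary
};

// Hypothetical description of the environment the EP is running in.
struct RuntimeInfo {
  int driver_version;
  int sdk_version;
};

Status InvalidGraph(std::string msg) {
  return {StatusCode::INVALID_GRAPH, std::move(msg)};
}

// Every failure path maps to INVALID_GRAPH, as the guideline requires.
Status ValidateAndLoad(const ContextBinaryInfo& bin, const RuntimeInfo& rt,
                       bool backend_load_succeeded) {
  // Assumed rule: the installed driver must be at least the required version.
  if (rt.driver_version < bin.required_driver_version)
    return InvalidGraph("Driver version is older than the EP context binary requires.");
  // Assumed rule: the runtime SDK must match the generating SDK exactly.
  if (rt.sdk_version != bin.sdk_version)
    return InvalidGraph("Runtime SDK version differs from the one used to generate the binary.");
  // Any other load failure reported by the backend.
  if (!backend_load_succeeded)
    return InvalidGraph("Failed to load the EP context binary.");
  return {};
}
```

Returning a single, consistent status code lets callers distinguish context-binary problems from other session creation failures without parsing error messages.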
222+
223+
215224
<p align="center"><img width="60%" src="../../images/EP_context_nodes_with_different_eps.png" alt="EP Context nodes with different EPs"/></p>
216225

217226
### Usage Scenario Code Examples
@@ -253,13 +262,13 @@ Creating a session from a memory buffer of the model causes the session to lose
253262
session1.run(...);
254263
```
255264

256-
# EPContext with Weight Sharing
265+
## EPContext with Weight Sharing
257266

258-
## Weight Sharing in Onnx Domain
267+
### Weight Sharing in Onnx Domain
259268
In ONNX, weight sharing refers to multiple ONNX models with external weights pointing to the same external weight file. These models use the same tensor names, allowing them to reference the same tensor data.
260269
<p align="center"><img width="50%" src="../../images/Onnx_weight_sharing.png" alt="Weight sharing across Onnx models"/></p>
261270

262-
## Weight Sharing in EP Domain with EPContext
271+
### Weight Sharing in EP Domain with EPContext
263272
EP weight sharing is enabled using a pre-generated EP context binary/blob.
264273
To do this, users must **generate the context binary offline** (Ahead Of Time).
265274
- Some EPs require specific platforms, such as **Linux x86_64** and/or **Windows x86_64**. Please refer to the specific EP page for details.
@@ -272,15 +281,15 @@ The EP or backend SDK should be capable of converting and compiling the graph as
272281
- When new graphs are compiled into the EP context, they should reuse existing weights if they are recognized as identical.
273282
For example, in `[model_name]_[ep].bin`, `tensor1_1` from `ep_graph1` and `tensor2_1` from `ep_graph2` are identical and both point to the same data offset, `tensor_data1`.
274283

275-
## EPContext Model Generation with Weight Sharing Workflow
284+
### EPContext Model Generation with Weight Sharing Workflow
276285
<p align="center"><img width="90%" src="../../images/EP_weight_sharing_workflow.png" alt="Weight sharing workflow"/></p>
277286

278287
Each ONNX Runtime session is associated with an ONNX model. Models that share weights are grouped into a model group, while ONNX Runtime sessions with common properties are organized into a session group. ONNX Runtime introduces two session options: `ep.share_ep_contexts` and `ep.stop_share_ep_contexts` to facilitate session grouping.
279288
- All ONNX Runtime sessions within the session group should have `ep.share_ep_contexts` enabled.
280289
- The final ONNX Runtime session uses `ep.stop_share_ep_contexts` to indicate that it is the last session in the group.
281290
Note: A single ONNX model may contain multiple `EPContext` nodes, depending on the graph partitioning result. However, for simplicity, each model is shown with only one `EPContext` node here.
282291

283-
## Implementation Guidelines for EPContext Model Generation with Weight Sharing
292+
### Implementation Guidelines for EPContext Model Generation with Weight Sharing
284293
- Shared Workspace Creation:
285294
<br/> The first session creates a shared workspace (e.g., EP Singleton) to share resources with other sessions.
286295
- EP Context Binary File Naming:
@@ -298,7 +307,7 @@ Note: A single ONNX model may contain multiple `EPContext` nodes, depending on t
298307
<br/> For N source models that share weights, a total of N+1 files should be generated.
299308
<br/> The generated files are `model1_ctx.onnx`, `...`, `modeln_ctx.onnx`, `[model1_name]_[ep].bin`.
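
The N+1 file count can be illustrated with a short sketch. `ExpectedGeneratedFiles` is a hypothetical helper that applies the `[name]_ctx.onnx` and `[model1_name]_[ep].bin` patterns literally; the exact file names an EP produces may differ, so treat the output as an illustration of the pattern only.

```cpp
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: lists the files expected for N source models that share weights.
// Returns N "*_ctx.onnx" names followed by one shared context binary name.
std::vector<std::string> ExpectedGeneratedFiles(const std::vector<std::string>& source_models,
                                                const std::string& ep_name) {
  std::vector<std::string> files;
  if (source_models.empty()) return files;
  // One EPContext model per source model.
  for (const auto& model : source_models) {
    files.push_back(fs::path(model).stem().string() + "_ctx.onnx");
  }
  // A single context binary, named after the first model in the group.
  files.push_back(fs::path(source_models.front()).stem().string() + "_" + ep_name + ".bin");
  return files;
}
```

Under these assumptions, two source models `./model1.onnx` and `./model2.onnx` with `ep_name` set to `qnn` give three entries: `model1_ctx.onnx`, `model2_ctx.onnx`, and `model1_qnn.bin`.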
300309

301-
### User Code Example
310+
#### User Code Example
302311
```
303312
Ort::SessionOptions so;
304313
@@ -321,21 +330,21 @@ Note: A single ONNX model may contain multiple `EPContext` nodes, depending on t
321330
Ort::Session session2(env, "model2.onnx", so);
322331
```
323332

324-
### General Tool for EPContext Model Generation with Weight Sharing
333+
#### General Tool for EPContext Model Generation with Weight Sharing
325334
OnnxRuntime provides the [ep_weight_sharing_ctx_gen](https://github.com/microsoft/onnxruntime/tree/main/onnxruntime/test/ep_weight_sharing_ctx_gen) tool to automate the weight-sharing workflow. This tool handles the entire process. This tool is specifically designed for **weight sharing** scenarios, streamlining the `EPContext` model generation process.
326335
Example command line:
327336
```
328337
./ep_weight_sharing_ctx_gen -e qnn -i "soc_model|60 htp_graph_finalization_optimization_mode|3" ./model1.onnx,./model2.onnx
329338
```
330339
It creates two Onnx models (`model1_ctx.onnx`, `model2_ctx.onnx`) and one QNN context binary file (`[model1_name]_[ep].bin`).
331340

332-
## Inference Sessions from EPContext Models with Weight Sharing
341+
### Inference Sessions from EPContext Models with Weight Sharing
333342
To use the dumped EPContext models with weight sharing enabled, ONNX Runtime inference sessions must have **resource sharing** activated. This is done by setting the session option:
334343
```
335344
ep.share_ep_contexts = 1
336345
```
337346

338-
### Implementation Guidelines for Inferencing from EPContext Models with Weight Sharing
347+
#### Implementation Guidelines for Inferencing from EPContext Models with Weight Sharing
339348
- Create the first OnnxRuntime inference session
340349
- Set session option: `ep.share_ep_contexts=1`.
341350
- Load the `model1_ctx.onnx` model.
@@ -355,7 +364,7 @@ To use the dumped EPContext models with weight sharing enabled, ONNX Runtime inf
355364
- To avoid issues during concurrent execution, it is recommended to **destroy the sessions in reverse order** (i.e., destroy the second session before the first session).
356365
- This ensures proper resource management and prevents potential conflicts with shared resources.
357366

358-
### User Code Example
367+
#### User Code Example
359368
```
360369
Ort::SessionOptions so;
361370
// enable ep.share_ep_contexts

‎docs/execution-providers/QNN-ExecutionProvider.md

+2-1
@@ -149,9 +149,10 @@ Alternatively to setting profiling_level at compile time, profiling can be enabl
149149
|'1'|Enable the QNN HTP shared memory allocator. Requires libcdsprpc.so/dll to be available. [Code example](https://github.com/microsoft/onnxruntime/blob/544bdd60730270f49f6a5baafdff54065f626776/onnxruntime/test/shared_lib/test_inference.cc#L2262-L2354)|
150150

151151
### Run Options
152+
152153
|`"qnn.lora_config"`|Description|
153154
|---|---|
154-
|Config path|LoRAv2 config file path. The format of the config will be mentioned in the <b>LoraV2 support</b>.|
155+
|Config path|LoRAv2 config file path. The format of the config will be mentioned in the **LoraV2 support**.|
155156

156157
## Supported ONNX operators
157158
