When the `EPContext` cache model is generated with `embed_mode = 0`, the context binary is stored as a separate file alongside the ONNX model in the same folder.
- ONNX Runtime retrieves the relative path of the context binary file from the `ep_cache_context` attribute of the `EPContext` node.
- **For models loaded from a file path:**
@@ -208,10 +208,19 @@ ONNX Runtime EPs that support loading models with `EPContext` nodes should follo
- Since the EP cannot derive the model's folder path, the user must specify the session option `ep.context_file_path`.
- The EP uses `ep.context_file_path` to determine the folder path and combines it with the relative path to construct the full path to the context binary file (see the sketch below).
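
A minimal usage sketch of this memory-buffer scenario with the ONNX Runtime C++ API (the file name and the `ReadFileBytes` helper are placeholders invented here, and EP registration is omitted):

```
// Sketch: the application loads the EPContext model bytes itself, so the
// session cannot derive the model's folder path on its own.
std::vector<char> model_data = ReadFileBytes("model_ctx.onnx");  // hypothetical helper

Ort::Env env;
Ort::SessionOptions so;
// Tell the EP where the context model lives so the relative path stored in
// the ep_cache_context attribute can be resolved against that folder.
so.AddConfigEntry("ep.context_file_path", "./model_ctx.onnx");
// so.AppendExecutionProvider(...);  // EP-specific registration omitted

Ort::Session session(env, model_data.data(), model_data.size(), so);
```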
- **Support for Multiple Primary `EPContext` Nodes (`main_context = 1`)**
- The EP should support multiple primary `EPContext` nodes without any limitations.
- The EP must be capable of loading all EP context binary buffers/files specified in the `ep_cache_context` attributes of the `EPContext` nodes, deserializing them, managing the `ep_graphs`, and selecting the appropriate one for execution.
- **Error Handling During EP Context Binary Loading**
  <br/> The EP or its backend SDK should be capable of detecting common failure scenarios, including but not limited to the following. In such cases, the EP should return a status with the `INVALID_GRAPH` status code (see the sketch after this list):
- Detect mismatches between the driver version and the version required by the EP context binary; return an error if they are incompatible.
- Detect mismatches between the runtime SDK version and the version used to generate the EP context binary; return an error if they are incompatible.
- Return an error if loading the EP context binary fails for any reason.
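
For an EP implemented inside the ONNX Runtime tree, such a check could be surfaced with the `ORT_MAKE_STATUS` helper. The following is only a sketch; the function name and version parameters are invented for illustration and are not part of any real EP interface:

```
#include "core/common/common.h"  // ORT_MAKE_STATUS, onnxruntime::common::Status

// Illustrative helper an EP might call while deserializing an EP context
// binary; the version fields are placeholders.
onnxruntime::common::Status CheckContextBinaryVersions(uint64_t binary_sdk_version,
                                                       uint64_t runtime_sdk_version) {
  if (binary_sdk_version != runtime_sdk_version) {
    // The SDK used to generate the binary does not match the runtime SDK:
    // report the failure with the INVALID_GRAPH status code.
    return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_GRAPH,
                           "EP context binary was generated with SDK version ",
                           binary_sdk_version, " but the runtime SDK version is ",
                           runtime_sdk_version);
  }
  return onnxruntime::common::Status::OK();
}
```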
<p align="center"><img width="60%" src="../../images/EP_context_nodes_with_different_eps.png" alt="EP Context nodes with different EPs"/></p>

### Usage Scenario Code Examples
@@ -253,13 +262,13 @@ Creating a session from a memory buffer of the model causes the session to lose
session1.run(...);
```
## EPContext with Weight Sharing
### Weight Sharing in Onnx Domain
In ONNX, weight sharing refers to multiple ONNX models with external weights pointing to the same external weight file. These models use the same tensor names, allowing them to reference the same tensor data.

<p align="center"><img width="50%" src="../../images/Onnx_weight_sharing.png" alt="Weight sharing across Onnx models"/></p>

### Weight Sharing in EP Domain with EPContext
EP weight sharing is enabled using a pre-generated EP context binary/blob.
To do this, users must **generate the context binary offline** (Ahead Of Time).
- Some EPs require specific platforms, such as **Linux x86_64** and/or **Windows x86_64**. Please refer to the specific EP page for details.
@@ -272,15 +281,15 @@ The EP or backend SDK should be capable of converting and compiling the graph as
- When new graphs are compiled into the EP context, they should reuse existing weights if they are recognized as identical.
For example, in `[model_name]_[ep].bin`, `tensor1_1` from `ep_graph1` and `tensor2_1` from `ep_graph2` are identical and both point to the same data offset, `tensor_data1`.
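
How identical weights are detected is EP- and SDK-specific. As a purely illustrative sketch (all names and types below are invented), a serializer might key each weight by a content hash so that a tensor repeated across graphs resolves to the offset written for its first occurrence:

```
#include <cstdint>
#include <map>
#include <vector>

// Illustrative weight table: maps a content hash to the offset where that
// content was first written into the context binary blob.
struct WeightTable {
  std::map<uint64_t, size_t> offset_by_hash;
  std::vector<uint8_t> blob;  // serialized weight data for [model_name]_[ep].bin

  // Returns the offset for `data`, reusing an existing entry when the same
  // content was already stored. A real implementation would also compare the
  // bytes to guard against hash collisions.
  size_t AddWeight(const std::vector<uint8_t>& data, uint64_t content_hash) {
    auto it = offset_by_hash.find(content_hash);
    if (it != offset_by_hash.end()) return it->second;  // identical weight: share it
    size_t offset = blob.size();
    blob.insert(blob.end(), data.begin(), data.end());
    offset_by_hash.emplace(content_hash, offset);
    return offset;
  }
};
```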
### EPContext Model Generation with Weight Sharing Workflow
Each ONNX Runtime session is associated with an ONNX model. Models that share weights are grouped into a model group, while ONNX Runtime sessions with common properties are organized into a session group. ONNX Runtime introduces two session options: `ep.share_ep_contexts` and `ep.stop_share_ep_contexts` to facilitate session grouping.
- All ONNX Runtime sessions within the session group should have `ep.share_ep_contexts` enabled.
- The final ONNX Runtime session uses `ep.stop_share_ep_contexts` to indicate that it is the last session in the group.

Note: A single ONNX model may contain multiple `EPContext` nodes, depending on the graph partitioning result. However, for simplicity, each model is shown with only one `EPContext` node here.
### Implementation Guidelines for EPContext Model Generation with Weight Sharing
- Shared Workspace Creation:
<br/> The first session creates a shared workspace (e.g., EP Singleton) to share resources with other sessions (see the sketch after this list).
- EP Context Binary File Naming:
@@ -298,7 +307,7 @@ Note: A single ONNX model may contain multiple `EPContext` nodes, depending on t
<br/> For N source models that share weights, a total of N+1 files should be generated.
<br/> The generated files are `model1_ctx.onnx`, `...`, `modeln_ctx.onnx`, `[model1_name]_[ep].bin`.
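
The shared workspace itself is EP-internal and not an ONNX Runtime API. One common pattern, shown here only as an assumption-laden sketch, is a process-wide singleton that the first session in the group fills and the remaining sessions query:

```
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

// Illustrative EP-side singleton: holds resources (e.g., compiled graphs or a
// shared weight table) created by the first session that has
// ep.share_ep_contexts enabled, for reuse by later sessions in the group.
class SharedWorkspace {
 public:
  static SharedWorkspace& Instance() {
    static SharedWorkspace instance;  // created on first use, shared process-wide
    return instance;
  }

  void AddResource(const std::string& key, std::shared_ptr<void> resource) {
    std::lock_guard<std::mutex> lock(mutex_);
    resources_[key] = std::move(resource);
  }

  std::shared_ptr<void> GetResource(const std::string& key) const {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = resources_.find(key);
    return it == resources_.end() ? nullptr : it->second;
  }

 private:
  SharedWorkspace() = default;
  mutable std::mutex mutex_;
  std::unordered_map<std::string, std::shared_ptr<void>> resources_;
};
```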
#### User Code Example
```
Ort::SessionOptions so;
@@ -321,21 +330,21 @@ Note: A single ONNX model may contain multiple `EPContext` nodes, depending on t
Ort::Session session2(env, "model2.onnx", so);
```
#### General Tool for EPContext Model Generation with Weight Sharing
ONNX Runtime provides the [ep_weight_sharing_ctx_gen](https://github.com/microsoft/onnxruntime/tree/main/onnxruntime/test/ep_weight_sharing_ctx_gen) tool to automate the weight-sharing workflow. It handles the entire process and is specifically designed for **weight sharing** scenarios, streamlining `EPContext` model generation.
It creates two Onnx models (`model1_ctx.onnx`, `model2_ctx.onnx`) and one QNN context binary file (`[model1_name]_[ep].bin`).
### Inference Sessions from EPContext Models with Weight Sharing
To use the dumped EPContext models with weight sharing enabled, ONNX Runtime inference sessions must have **resource sharing** activated. This is done by setting the session option:
```
ep.share_ep_contexts = 1
```
#### Implementation Guidelines for Inferencing from EPContext Models with Weight Sharing
- Create the first OnnxRuntime inference session
- Set session option: `ep.share_ep_contexts=1`.
- Load the `model1_ctx.onnx` model.
@@ -355,7 +364,7 @@ To use the dumped EPContext models with weight sharing enabled, ONNX Runtime inf
- To avoid issues during concurrent execution, it is recommended to **destroy the sessions in reverse order** (i.e., destroy the second session before the first session).
- This ensures proper resource management and prevents potential conflicts with shared resources (see the sketch below).
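
A short usage sketch of these guidelines (model names are taken from the weight-sharing example above, and EP registration is omitted), relying on C++ scoping so the second session is destroyed before the first:

```
Ort::Env env;
Ort::SessionOptions so;
// Enable resource sharing for every session in the group.
so.AddConfigEntry("ep.share_ep_contexts", "1");
// so.AppendExecutionProvider(...);  // EP-specific registration omitted
{
  Ort::Session session1(env, "model1_ctx.onnx", so);  // first session
  Ort::Session session2(env, "model2_ctx.onnx", so);  // second session
  // session1.Run(...); session2.Run(...);
}  // session2 is destroyed before session1 (reverse order of construction)
```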
**docs/execution-providers/QNN-ExecutionProvider.md** (+2 −1)
@@ -149,9 +149,10 @@ Alternatively to setting profiling_level at compile time, profiling can be enabl
|'1'|Enable the QNN HTP shared memory allocator. Requires libcdsprpc.so/dll to be available. [Code example](https://github.com/microsoft/onnxruntime/blob/544bdd60730270f49f6a5baafdff54065f626776/onnxruntime/test/shared_lib/test_inference.cc#L2262-L2354)|
### Run Options
|`"qnn.lora_config"`|Description|
|---|---|
|Config path|LoRAv2 config file path. The config file format is described in the **LoraV2 support** section.|
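
A hedged usage sketch (the config path below is a placeholder): the run option is set on `Ort::RunOptions` for the `Run()` call that should pick up the LoRAv2 config.

```
// Sketch: apply a LoRAv2 config for a particular Run() call via run options.
Ort::RunOptions run_options;
run_options.AddConfigEntry("qnn.lora_config", "path/to/lora_v2_config");  // placeholder path
// session.Run(run_options, input_names, input_values, input_count,
//             output_names, output_count);
```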