Conversation

@egalli egalli commented Dec 11, 2024

### Description

If it would improve performance, this patch moves input CPU tensors to ml-tensor before sending them to the ONNX Runtime WebNN EP.

### Motivation and Context

We are currently performing two extra copies of input tensors located on the CPU when using the WebNN EP (JS -(copy)-> wasm heap -(copy)-> JS -> WebNN API). This patch removes these extra copies.
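
For illustration, here is a minimal sketch of the upload path this enables, written in TypeScript against the WebNN MLTensor proposal as it stood at the time (the exact `createTensor` descriptor shape was still evolving, WebNN type declarations are assumed to be available, and the helper below is hypothetical rather than code from this patch):

```ts
// Hypothetical sketch: stage a CPU-resident input directly into an MLTensor,
// so the WebNN EP can bind it without the JS -> wasm heap -> JS round trip.
async function uploadInputToMlTensor(
  context: MLContext, // from navigator.ml.createContext()
  data: Float32Array, // input data living in JS memory
  shape: number[],
): Promise<MLTensor> {
  // Allocate a tensor owned by the WebNN context.
  const mlTensor = await context.createTensor({
    dataType: 'float32',
    shape,
    writable: true, // we will write into it from script
  });
  // One copy, straight from the typed array into the MLTensor.
  context.writeTensor(mlTensor, data);
  return mlTensor;
}
```

Once the input is already an MLTensor, the EP can bind it for `MLContext.dispatch()` directly; callers who manage MLTensors themselves can use onnxruntime-web's `Tensor.fromMLTensor()` factory, while this patch covers plain CPU tensors automatically.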

@fdwr fdwr left a comment

Yulong or Guenther can review this one effectively (as it impacts TypeScript interfaces). I'll start the CIs, though.

fdwr commented Dec 11, 2024

/azp run ONNX Runtime Web CI Pipeline,Windows GPU CI Pipeline,Linux Android Emulator QNN CI Pipeline

fdwr commented Dec 11, 2024

/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline

fdwr commented Dec 11, 2024

/azp run Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline

fdwr commented Dec 11, 2024

/azp run Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,Windows x64 QNN CI Pipeline,Big Models

@azure-pipelines

Azure Pipelines successfully started running 2 pipeline(s).

@azure-pipelines

Azure Pipelines successfully started running 3 pipeline(s).

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

@azure-pipelines

Azure Pipelines successfully started running 9 pipeline(s).

@Honry Honry left a comment

👍

@Honry Honry left a comment

LGTM % a nit.

Honry commented Dec 12, 2024

@fs-eire, @guschmue, please take another look, thanks!

@guschmue guschmue added the ep:WebNN WebNN execution provider label Dec 16, 2024
@guschmue

/azp run ONNX Runtime Web CI Pipeline,Windows GPU CI Pipeline,Linux Android Emulator QNN CI Pipeline

@guschmue

/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline

@guschmue

/azp run Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,Windows x64 QNN CI Pipeline,Big Models

@azure-pipelines

Azure Pipelines successfully started running 2 pipeline(s).

@guschmue

/azp run Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 3 pipeline(s).

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

@azure-pipelines

Azure Pipelines successfully started running 9 pipeline(s).

@fdwr fdwr requested a review from fs-eire December 18, 2024 22:13
fdwr commented Jan 9, 2025

/azp run Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,Windows x64 QNN CI Pipeline,Big Models

@azure-pipelines

Azure Pipelines successfully started running 2 pipeline(s).

@azure-pipelines

Azure Pipelines successfully started running 3 pipeline(s).

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

@azure-pipelines

Azure Pipelines successfully started running 9 pipeline(s).

egalli commented Jan 21, 2025

@fdwr any updates on this PR?

fdwr commented Jan 22, 2025

> @fdwr any updates on this PR?

@fs-eire or @guschmue, can you review the Typescript interface JSEP changes?

guschmue commented Feb 6, 2025

/azp run Win_TRT_Minimal_CUDA_Test_CI

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

guschmue commented Feb 6, 2025

/azp run ONNX Runtime Web CI Pipeline,Windows GPU CI Pipeline,Linux Android Emulator QNN CI Pipeline

guschmue commented Feb 6, 2025

/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 2 pipeline(s).

guschmue commented Feb 6, 2025

/azp run Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,Windows x64 QNN CI Pipeline,Big Models

guschmue commented Feb 6, 2025

/azp run Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

@azure-pipelines

Azure Pipelines successfully started running 9 pipeline(s).

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

@guschmue guschmue merged commit 74c778e into microsoft:main Feb 11, 2025
83 checks passed
ashrit-ms pushed a commit that referenced this pull request Feb 11, 2025
### Description
If it would improve performance, this patch moves input CPU tensors to
ml-tensor before sending them to the ONNX Runtime WebNN EP.

### Motivation and Context
We are currently performing two extra copies of input tensors located on
the CPU when using the WebNN EP (JS -(copy)-> wasm heap -(copy)-> JS ->
WebNN API). This patch removes these extra copies.
ashrit-ms added a commit that referenced this pull request Feb 11, 2025
### Description
This PR is to update the win-ort-main branch to the tip of the main
branch as of 2025-02-11.

### PR List
74c778e [WebNN EP] Automatically move input CPU tensors to ml-tensor
(#23073)
3775057 use correct total length to fix static kv_cache performance
(#23615)
3901e96 remove --use_vcpkg flag for Python-CUDA-Packaging-Pipeline
(#23631)
c610df5 Add python_requires to package metadata (#23604)
2d27d68 [QNN EP] Add QNN EP to ARM64X build targets (#23635)
e666503 [webgpu] no longer need pass-in gpu adapter for custom
context (#23593)
af679a0 Fix logic for selecting alternate name for blob (#23617)
e206950 [ARM CPU] Add fp16 mlas kernels for exp, tanh, softmax,
logsoftmax, softcap (#23597)
9ba5619 Update pybind and json to the latest (#23589)
c54736c Migrate iOS release pipeline to 1 ES (#23606)
3981326 Increase timeout for Windows TensorRT CI (#23625)
0274b7b fix on trtCudaVersion (#23616)
740e9ab update run CI script (#23621)
5ef1832 [WebGPU] Support PIX Capture for WebGPU EP (#23192)
0114551 Fix for C4267 warning (#23610)
002916a Validate the context_file_path before EP compile graphs
(#23611)
0887e36 [webgpu] Use pushErrorScope()/popErrorScope() once for an
inference run (#23438)
65008cb Auto-generated baselines by 1ES Pipeline Templates (#23603)
09e5724 [CUDA] Fix beam search of num_beams > 32 (#23599)
82840f6 Implement Flash Attention 2 for webgpu EP (#23576)
a6ea57b OpenVINO EP Weights Sharing Feature (#23553)
2c2ff4a [CUDA] Fix BeamSearchTest.DummyT5WithSequenceInputIds test
failure in Windows (#23596)
d981b15 [webgpu/js] Optimize resize webgpu op & fix precision issues
(#23591)
328a13c Enable VCPKG in more pipelines (#23590)
6728d60 [TensorRT EP] support TensorRT 10.8-GA (#23592)
d1fb58b Quantization tool: Allow user to override calibrator's
session EP (#23559)
649ced4 Enable user loading model with external data from memory
buffer (#23557)
544bdd6 Fix ConvTranspose for certain attribute combinations (#23488)
8f6ddf3 Delete extra cgmanifest entries and files (#23583)
5f6a315 Enable VCPKG in CI build (#23426)
e1e3f62 Bump lintrunner from 0.12.5 to 0.12.7 (#23326)
cd8775f Fix Node JS Samples (#23581)
6b4f9c4 [WebGPU EP] Batch Norm Implementation (#23525)
1fce51b Fix all instances of 4244 and 4267 warnings in OV EP code
(#23567)
c29ca1c Update QNN default version to 2.31 (#23573)
2fc75a4 [mobile] Add Android BrowserStack test project back (#23551)
9e18b6a [CUDA] Update nvcc flags (#23572)
b47e1e6 [QNN EP] Make offloading graph input/output quantization (to
CPU) the default (#23368)
75a9b40 [ROCm] Update CI to use rocm 6.3.2 (#23577)
26ff2b6 Bump ruff from 0.9.3 to 0.9.4 (#23563)
b2560a7 Update react-native to 0.72 (#23509)
faee912 [js] update JavaScript API to support QNN EP options (#23486)
816e8cb [EP Perf] Update env to ubuntu 22.04 (#23570)
cddc271 Use Eigen in Round implementation (#23571)
e8b0bdb Shape inference: ReduceMean dispatcher, quant_pre_process:
skip_symbolic_shape bugfix (#23558)
267b493 delete the supported domain version upper bounds (#23237)
bb7f961 remove log spam from cpuinfo (#23548)
169917b Use latest vcpkg commit in configuration, sync manifest with
deps.txt (#23554)
a9d4d08 Add of ReduceMax Gradient (#23501)
6bbf1bd [js/web] upgrade version of flatbuffers (#23545)
271c509 DP4AMatMul perf refinements (#23539)
cb69c59 Add fusions for SigLIP and Conformer-Encoder (#23528)
61fae9b Remove "--enable_pybind" from webgpu pipeline (#23550)
0bb4ea6 Update BiasGelu fusion and related ops (#23518)
4dde74a Add more details to BrowserStack script failure (#23520)
ead9d5c Set ANDROID_USE_LEGACY_TOOLCHAIN_FILE to false (#23544)
7e24088 Enable dlpack by default (#23110)
dc2f7a9 Add overload of `TryParseStringWithClassicLocale()` that uses
`std::from_chars()` (#23541)
5407c69 Fix the issue that the new generated EP context model not
able to find external data (#23537)
fbae88f [js/web] use the recommended workaround for Vite (#23531)
d5338da Fix tensor external data info length parsing issue. (#23526)
e3e4173 [ROCm EP] Fix transpose helper for gfx gridsize constraints
(#23527)
80bc1d2 Enable Ep context with external data for CPU nodes (#23498)
bf023ab [js/web] allow import .mjs/.wasm file (#23487)
655a23f [onnxruntime/build] Add new flag enable_generic_interface to
build primary EPs by default (#23342)
a770a8d Update RN to 0.71.19 (#23381)
1cf0ebd Delete Prefast workflow until the build failure is fixed
(#23510)
d2c5e24 Add of GlobalMaxPool Gradient (#23502)
ded8730 Remove thrust::unary_function (#23506)
8db97a6 [webgpu] Bump version of Dawn to b9b4a370 (#23494)
fdde2e2 Fix for gcc 13.3.1: Avoid creating a copy (#23500)
96ec1dd Bump ruff from 0.9.2 to 0.9.3 (#23496)
42f0c00 Adds the new System.Numerics.Tensors as an input/output type
when using dotnet 8.0 and up. (#23261)
97c2bbe Fix shape infer of onnx GroupNorm (#23477)
1fc9c48 Enable coremltools for Linux build (#23481)
13348c5 [ARM CPU] hgemm optimized for gqa (#23107)
c89a798 Enable opti on Microsoft.ML.OnnxRuntime with RelWithDebInfo
config (#23463)
d00ae32 Revert "[Mobile] Add BrowserStack Android MAUI Test (#23383)"
(#23474)
8b1d3b3 Align AvgPool ceil_mode on last value to torch (#16752)
06fc73b [TRT EP Perf Tool] Add annotations import to python script to
support annotations on Python 3.8 (#23466)

### Motivation and Context
This update includes the change to add QNN EP to ARM64X build targets.

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Adrian Lizarraga <[email protected]>
Co-authored-by: Ti-Tai Wang <[email protected]>
Co-authored-by: Caroline Zhu <[email protected]>
Co-authored-by: Grégoire <[email protected]>
Co-authored-by: Jing Fang <[email protected]>
Co-authored-by: Changming Sun <[email protected]>
Co-authored-by: Yateng Hong <[email protected]>
Co-authored-by: Michael Sharp <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Malik Shahzad Muzaffar <[email protected]>
Co-authored-by: Yulong Wang <[email protected]>
Co-authored-by: Dmitri Smirnov <[email protected]>
Co-authored-by: Corentin Maravat <[email protected]>
Co-authored-by: Jian Chen <[email protected]>
Co-authored-by: Karim Vadsariya <[email protected]>
Co-authored-by: Lei Cao <[email protected]>
Co-authored-by: Karim Vadsariya <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Hector Li <[email protected]>
Co-authored-by: Ted Themistokleous <[email protected]>
Co-authored-by: Ted Themistokleous <[email protected]>
Co-authored-by: Edward Chen <[email protected]>
Co-authored-by: Takeshi Watanabe <[email protected]>
Co-authored-by: Xavier Dupré <[email protected]>
Co-authored-by: Justin Chu <[email protected]>
Co-authored-by: Tianlei Wu <[email protected]>
Co-authored-by: kunal-vaishnavi <[email protected]>
Co-authored-by: Sushanth Rajasankar <[email protected]>
Co-authored-by: PARK DongHa <[email protected]>
Co-authored-by: George Wu <[email protected]>
Co-authored-by: Xinpeng Dou <[email protected]>
Co-authored-by: Jambay Kinley <[email protected]>
Co-authored-by: Yifan Li <[email protected]>
Co-authored-by: Gavin Kinsey <[email protected]>
Co-authored-by: Prathik Rao <[email protected]>
Co-authored-by: Jon Campbell <[email protected]>
Co-authored-by: Satya Kumar Jandhyala <[email protected]>
Co-authored-by: Joshua Lochner <[email protected]>
Co-authored-by: Ankit Maheshkar <[email protected]>
Co-authored-by: jatinwadhwa921 <[email protected]>
Co-authored-by: jatinwadhwa921 <[email protected]>
Co-authored-by: saurabh <[email protected]>
Co-authored-by: TejalKhade28 <[email protected]>
Co-authored-by: sfatimar <[email protected]>
Co-authored-by: Javier E. Martinez <[email protected]>
Co-authored-by: Preetha Veeramalai <[email protected]>
Co-authored-by: Eric Crawford <[email protected]>
Co-authored-by: microsoft-github-policy-service[bot] <77245923+microsoft-github-policy-service[bot]@users.noreply.github.com>
Co-authored-by: Jie Chen <[email protected]>
Co-authored-by: shaoboyan091 <[email protected]>
Co-authored-by: David Hotham <[email protected]>
Co-authored-by: Guenther Schmuelling <[email protected]>
Co-authored-by: Enrico Galli <[email protected]>
guschmue pushed a commit that referenced this pull request Mar 6, 2025
### Description
If it would improve performance, this patch moves input CPU tensors to
ml-tensor before sending them to the ONNX Runtime WebNN EP.

### Motivation and Context
We are currently performing two extra copies of input tensors located on
the CPU when using the WebNN EP (JS -(copy)-> wasm heap -(copy)-> JS ->
WebNN API). This patch removes these extra copies.
ashrit-ms pushed a commit that referenced this pull request Mar 17, 2025
### Description
If it would improve performance, this patch moves input CPU tensors to
ml-tensor before sending them to the ONNX Runtime WebNN EP.

### Motivation and Context
We are currently performing two extra copies of input tensors located on
the CPU when using the WebNN EP (JS -(copy)-> wasm heap -(copy)-> JS ->
WebNN API). This patch removes these extra copies.
fs-eire pushed a commit that referenced this pull request Apr 16, 2025
### Description
If it would improve performance, this patch moves outputs to
MLTensor-backed Tensors.

### Motivation and Context
We are currently performing an extra copy of output tensors located on
the CPU when using the WebNN EP (MLTensor -(copy)-> wasm heap -(copy)->
JS). This patch removes this copy by moving the readback to JS instead
of wasm. As an extra benefit, we can also start the readbacks and wait
for them in parallel.

This change is similar to #23073
ashrit-ms pushed a commit that referenced this pull request Apr 24, 2025
### Description
If it would improve performance, this patch moves outputs to
MLTensor-backed Tensors.

### Motivation and Context
We are currently performing an extra copy of output tensors located on
the CPU when using the WebNN EP (MLTensor -(copy)-> wasm heap -(copy)->
JS). This patch removes this copy by moving the readback to JS instead
of wasm. As an extra benefit, we can also start the readbacks and wait
for them in parallel.

This change is similar to #23073
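
To make the parallel readback concrete, here is a short sketch under the same assumptions as the earlier one (the WebNN proposal's `readTensor()` returning a `Promise<ArrayBuffer>`; the helper itself is hypothetical):

```ts
// Hypothetical sketch: start every output readback first, then await them
// together, instead of copying each output serially through the wasm heap.
async function readOutputsInParallel(
  context: MLContext,
  outputs: Record<string, MLTensor>,
): Promise<Record<string, ArrayBuffer>> {
  const names = Object.keys(outputs);
  // All readbacks are in flight before we wait on any of them.
  const buffers = await Promise.all(
    names.map((name) => context.readTensor(outputs[name])),
  );
  return Object.fromEntries(names.map((name, i) => [name, buffers[i]]));
}
```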