[js/web] Add Wasm Relaxed SIMD support to wasm backend #22794


Merged
merged 10 commits into from
Mar 31, 2025

Conversation

jing-bao
Contributor

Description

Add Wasm Relaxed SIMD support.
Use integer dot product instructions for QGemmU8X8.

  1. Build with `--enable_wasm_relaxed_simd`
  2. Set `env.wasm.relaxedSimd` to enable it at runtime
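For reference, a minimal usage sketch with onnxruntime-web (the model path is a placeholder; `env.wasm.relaxedSimd` is the flag this PR adds):

```js
import * as ort from 'onnxruntime-web';

// Opt in to the Relaxed SIMD build before any session is created.
ort.env.wasm.relaxedSimd = true;

// Hypothetical model path; quantized (QGemmU8X8-based) models benefit most.
const session = await ort.InferenceSession.create('./model.onnx');
```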

Motivation and Context

#22533

@jing-bao jing-bao requested a review from a team as a code owner November 11, 2024 06:04
@jing-bao
Contributor Author

Hi, @fs-eire , would you please take a look?
I haven't modified the CI pipeline files yet; I have only passed the unit tests and `npm test` locally. If that's acceptable, I can add them in a new PR.
Also, clang-format would modify many mlas files besides my added cpp file, so I'm not sure whether I should run it. Maybe you can give me some hints on the file formatting.

@guschmue
Contributor

Sorry, it's in my queue but I have not gotten to it yet.

@tarekziade

I am trying this patch; I built a small demo to compare relaxed vs non-relaxed SIMD in Firefox Nightly.

https://github.com/tarekziade/onnx-relaxed-simd

So far I am not seeing any difference; digging...

@tarekziade

So I forced HasUSDot to true in the patch to try to see the difference -- assuming that magic function maybe only works on x86 -- and it's definitely changing things. The summarizer model now produces gibberish and is 20% slower.

Is this patch for x86 only?

@yurydelendik

yurydelendik commented Nov 26, 2024

If this patch is trying to exploit the vpdpbusd lowering for the x86 platform, then it is a bad approach and this patch shall be rejected. Use of wasm_i32x4_relaxed_dot_i8x16_i7x16_add shall respect the i7x16 argument restriction and assume a nondeterministic/invalid result if arguments are out of the i7 range.

Please confirm if this is the case.

@jing-bao
Contributor Author

So I forced HasUSDot to true in the patch to try to see the difference -- assuming that magic function maybe only works on x86 -- and it's definitely changing things. The summarizer model now produces gibberish and is 20% slower.

Is this patch for x86 only?

Yes this patch is currently for x86 with AVX-VNNI only. The check (maybe a similar HasSDOT) and the proper kernel for arm machines is not included here.

@jing-bao
Contributor Author

If this patch is trying to exploit the vpdpbusd lowering for the x86 platform, then it is a bad approach and this patch shall be rejected. Use of wasm_i32x4_relaxed_dot_i8x16_i7x16_add shall respect the i7x16 argument restriction and assume a nondeterministic/invalid result if arguments are out of the i7 range.

Please confirm if this is the case.

Yes... The patch uses the HasUSDot check to restrict when this kernel is used, more strictly than the Wasm Relaxed SIMD spec requires: it guards against both the overflow nondeterminism and the signed/unsigned nondeterminism. So I believe there is no practical correctness issue here.

My local experiment shows ~1.15x performance for the SAM model, and this is the best way I can think of to make use of native AI hardware capabilities in the world of Wasm. XNNPACK also uses wasm_i32x4_relaxed_dot_i8x16_i7x16_add in a similar way, but I'd be happy to improve the code if there is a more elegant approach.
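To illustrate the restriction being discussed, here is a plain-JS model of the semantics (an illustration I wrote for this thread, not the MLAS code): the relaxed dot instruction lets an engine read each byte of the second operand as either signed or unsigned, and the two readings only agree when the byte fits in 7 bits.

```javascript
// i32x4.relaxed_dot_i8x16_i7x16_add_s allows an engine to read each byte of
// the second operand as either signed or unsigned. The two readings agree
// only when the byte fits in 7 bits (0..127) -- the "i7x16" restriction.
const asSigned = (b) => (b & 0x80 ? b - 256 : b);

// One lane-pair product; `treatBSigned` picks the engine's interpretation.
function dotPair(a, b, treatBSigned) {
  return asSigned(a) * (treatBSigned ? asSigned(b) : b);
}

console.log(dotPair(3, 100, true) === dotPair(3, 100, false)); // true: 100 fits in i7
console.log(dotPair(3, 200, true) === dotPair(3, 200, false)); // false: bit 7 set, readings diverge
```

A kernel that guards its inputs so the second operand stays in range (as the HasUSDot-gated path aims to) never observes this divergence.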

@yurydelendik

cc @dtig and @ppenzin as they might be interested in the discussion

@tarekziade

Thanks @jing-bao for the feedback.

I tried on my AMD Ryzen Threadripper PRO (x86_64) and I get an "unrecognized opcode fd113" error for my summarizer demo. Maybe the way I compiled it was not right?

I used those flags to compile

--config Release --build_wasm --skip_tests \
    --disable_wasm_exception_catching --disable_rtti \
    --enable_wasm_threads --enable_wasm_simd --use_jsep \
    --enable_wasm_relaxed_simd

and pushed the builds to https://github.com/tarekziade/onnx-relaxed-simd/tree/main

Do you have your example available somewhere so I can try it? Maybe the model I used triggers specific opcodes.

@jing-bao
Contributor Author

unrecognized opcode fd113

Hi @tarekziade, the build flags seem OK. This error suggests that the Relaxed SIMD opcode i32x4.relaxed_dot_i8x16_i7x16_add_s is not recognized. Which browser did you use for the test? Maybe some browser version supports only part of the Relaxed SIMD opcodes.
Could you please also help test the latest Chrome?
If my guess is right, I can update the isRelaxedSimdSupported function to test more precisely and fix this.
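For what it's worth, a sketch of such an exact-opcode probe (my own hand-encoded minimal module, not ORT's actual isRelaxedSimdSupported): WebAssembly.validate only returns true if the engine recognizes i32x4.relaxed_dot_i8x16_i7x16_add_s -- the 0xfd 0x113 pair behind the "fd113" error.

```javascript
// Probe for the exact opcode by validating a tiny module whose body uses it.
function supportsRelaxedDot() {
  return WebAssembly.validate(new Uint8Array([
    0x00, 0x61, 0x73, 0x6d, // "\0asm" magic
    0x01, 0x00, 0x00, 0x00, // version 1
    0x01, 0x05, 0x01, 0x60, 0x00, 0x01, 0x7b, // type section: () -> v128
    0x03, 0x02, 0x01, 0x00, // function section: one func of type 0
    0x0a, 0x13, 0x01, 0x11, 0x00, // code section: one 17-byte body, no locals
    0x41, 0x00, 0xfd, 0x0f, // i32.const 0; i8x16.splat
    0x41, 0x00, 0xfd, 0x0f, // i32.const 0; i8x16.splat
    0x41, 0x00, 0xfd, 0x11, // i32.const 0; i32x4.splat
    0xfd, 0x93, 0x02,       // i32x4.relaxed_dot_i8x16_i7x16_add_s (0x113 in LEB128)
    0x0b,                   // end
  ]));
}

console.log('relaxed dot supported:', supportsRelaxedDot());
```

A browser that implements only part of the proposal would pass a generic relaxed-SIMD check but return false here.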

@tarekziade

unrecognized opcode fd113

Hi @tarekziade, the build flags seem OK. This error suggests that the Relaxed SIMD opcode i32x4.relaxed_dot_i8x16_i7x16_add_s is not recognized. Which browser did you use for the test? Maybe some browser version supports only part of the Relaxed SIMD opcodes. Could you please also help test the latest Chrome? If my guess is right, I can update the isRelaxedSimdSupported function to test more precisely and fix this.

This is in Firefox Nightly 134. I see the opcode defined in our code base: https://searchfox.org/mozilla-central/source/third_party/rust/wast/src/core/expr.rs#1189

@yurydelendik do you know if we have a partial implementation there?

@tarekziade

Could you please help also test the latest Chrome?

For sure

@jing-bao
Contributor Author

jing-bao commented Dec 3, 2024

Hi @fs-eire and @tarekziade, I updated the code to add an enable_wasm_simd + enable_wasm_relaxed_simd build check and to check for the exact dot instruction in isRelaxedSimdSupported. Please take a look again. Thanks a lot!

@tarekziade

Thanks, looking this week

@guschmue
Contributor

/azp run ONNX Runtime Web CI Pipeline,Windows GPU CI Pipeline,Linux Android Emulator QNN CI Pipeline

@guschmue
Contributor

/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline

@guschmue
Contributor

/azp run Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,Windows x64 QNN CI Pipeline,Big Models


Azure Pipelines successfully started running 2 pipeline(s).

@guschmue
Contributor

/azp run Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI


Azure Pipelines successfully started running 4 pipeline(s).



Azure Pipelines successfully started running 9 pipeline(s).

@jing-bao
Contributor Author

Hi, just a gentle reminder to check if there are any follow-ups needed from my side. Thanks!

@jing-bao
Contributor Author

Hi @fs-eire , any follow-ups needed from my side? Thanks!


Azure Pipelines successfully started running 9 pipeline(s).


@fs-eire
Contributor

fs-eire commented Mar 13, 2025

Closing and re-opening the PR to trigger the GitHub builds.

@fs-eire fs-eire closed this Mar 13, 2025
@fs-eire fs-eire reopened this Mar 13, 2025
@fs-eire
Contributor

fs-eire commented Mar 13, 2025

/azp run Windows GPU WebGPU CI Pipeline, Linux ROCm CI Pipeline, Windows ARM64 QNN CI Pipeline


Azure Pipelines successfully started running 2 pipeline(s).

@jing-bao
Contributor Author

Approved for all the code except the MLAS changes.

Thanks @fs-eire! Do I need other reviewers for MLAS changes?

@jywu-msft
Member

@liqunfu @fajin-corp can you review the proposed changes to mlas files?

@jing-bao
Contributor Author

A gentle reminder, @liqunfu @fajin-corp. Thanks!

@jywu-msft jywu-msft requested review from fajin-corp and liqunfu March 21, 2025 03:45
@fs-eire
Contributor

fs-eire commented Mar 27, 2025

Close and re-open to trigger the pipeline

@fs-eire fs-eire closed this Mar 27, 2025
@fs-eire fs-eire reopened this Mar 27, 2025
@fs-eire
Contributor

fs-eire commented Mar 28, 2025

/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,ONNX Runtime Web CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline

@fs-eire
Contributor

fs-eire commented Mar 28, 2025

/azp run Linux QNN CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,Big Models,Linux Android Emulator QNN CI Pipeline,Android CI Pipeline,iOS CI Pipeline,ONNX Runtime React Native CI Pipeline,Linux DNNL CI Pipeline,Linux MIGraphX CI Pipeline,Linux ROCm CI Pipeline


Azure Pipelines successfully started running 7 pipeline(s).


@fs-eire fs-eire left a comment (Contributor)

I consider this PR good to merge:

  • All critical pipelines, including the old Web CI and the new (GitHub Actions based) Web CI, are passing. There are failing CIs, but those are the ones under migration, and their old counterparts pass.
  • The MLAS part is mostly new code, and modifications to existing code are guarded by macros, so it should be safe for existing users.

Thanks for the contribution, @jing-bao

@fs-eire fs-eire merged commit ba2999c into microsoft:main Mar 31, 2025
81 of 91 checks passed
fs-eire added a commit that referenced this pull request Apr 8, 2025

### Description

This PR revised the flag `ort.env.wasm.simd` to enhance its usage so that more use scenarios are covered.
- Allow setting it to `false` explicitly to disable the SIMD check; resolves #24292 (@Eldow)
- Allow setting it to `'relaxed'` to enable the Relaxed SIMD check; Relaxed SIMD was first introduced in #22794 (@jing-bao)
- Behavior is unchanged when the flag is unset (i.e. `undefined`) or set to `true`
- A warning message is emitted when the flag is set to an unknown value, and it is reset to `false` in that case
quic-zhaoxul pushed a commit to CodeLinaro/onnxruntime that referenced this pull request Apr 17, 2025
quic-zhaoxul pushed a commit to CodeLinaro/onnxruntime that referenced this pull request Apr 17, 2025

7 participants