[js/web] Add Wasm Relaxed SIMD support to wasm backend #22794


Merged
merged 10 commits into from
Mar 31, 2025

Conversation

jing-bao
Contributor

Description

Add Wasm Relaxed SIMD support.
Use integer dot product instructions for QGemmU8X8.

  1. Build with `--enable_wasm_relaxed_simd`
  2. Set `env.wasm.relaxedSimd` to enable it at runtime
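For reference, a minimal usage sketch with onnxruntime-web (the model path is a placeholder; `env.wasm.relaxedSimd` is the flag this PR adds):

```js
import * as ort from 'onnxruntime-web';

// Opt in to the Relaxed SIMD build before any session is created.
ort.env.wasm.relaxedSimd = true;

// Hypothetical model path; quantized (QGemmU8X8-based) models benefit most.
const session = await ort.InferenceSession.create('./model.onnx');
```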

Motivation and Context

#22533

@jing-bao jing-bao requested a review from a team as a code owner November 11, 2024 06:04
@jing-bao
Contributor Author

Hi, @fs-eire , would you please take a look?
I haven't modified the CI pipeline files yet; I have only passed the unit tests and `npm test` locally. If that's acceptable, I can add them in a new PR.
Also, clang-format would modify many mlas files besides my added cpp file, so I'm not sure whether I should run it. Maybe you can give me some hints on the file formatting.

@guschmue
Contributor

Sorry, it's in my queue but I have not gotten to it yet.

@tarekziade

I am trying this patch; I built a small demo to compare relaxed vs non-relaxed SIMD in Firefox Nightly.

https://github.com/tarekziade/onnx-relaxed-simd

So far I am not seeing any difference; digging...

@tarekziade

So I forced HasUSDot to true in the patch to try to see the difference -- assuming that magic function maybe only works on x86 -- and it's definitely changing things. The summarizer model now produces gibberish and is 20% slower.

Is this patch for x86 only?

@yurydelendik

yurydelendik commented Nov 26, 2024

If this patch is trying to exploit the vpdpbusd lowering for the x86 platform, then it is a bad approach and this patch shall be rejected. Use of wasm_i32x4_relaxed_dot_i8x16_i7x16_add shall respect the i7x16 argument restriction and assume a nondeterministic/invalid result if arguments are out of the i7 range.

Please confirm if this is the case.

@jing-bao
Contributor Author

So I forced HasUSDot to true in the patch to try to see the difference -- assuming that magic function maybe only works on x86 -- and it's definitely changing things. The summarizer model now produces gibberish and is 20% slower.

Is this patch for x86 only?

Yes this patch is currently for x86 with AVX-VNNI only. The check (maybe a similar HasSDOT) and the proper kernel for arm machines is not included here.

@jing-bao
Contributor Author

If this patch is trying to exploit the vpdpbusd lowering for the x86 platform, then it is a bad approach and this patch shall be rejected. Use of wasm_i32x4_relaxed_dot_i8x16_i7x16_add shall respect the i7x16 argument restriction and assume a nondeterministic/invalid result if arguments are out of the i7 range.

Please confirm if this is the case.

Yes... The patch uses the HasUSDot check to restrict when this kernel is used, more strictly than the Wasm Relaxed SIMD spec requires: it guards against both the overflow nondeterminism and the signed/unsigned nondeterminism. So I believe there is no practical correctness issue here.

My local experiment shows ~1.15x performance for the SAM model, and this is the best way I can think of to make use of native AI hardware capabilities in the world of Wasm. XNNPACK also uses wasm_i32x4_relaxed_dot_i8x16_i7x16_add in a similar way, but I'd be happy to improve the code if there is a more elegant approach.
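To illustrate the restriction being discussed, here is a plain-JS model of the semantics (an illustration I wrote for this thread, not the MLAS code): the relaxed dot instruction lets an engine read each byte of the second operand as either signed or unsigned, and the two readings only agree when the byte fits in 7 bits.

```javascript
// i32x4.relaxed_dot_i8x16_i7x16_add_s allows an engine to read each byte of
// the second operand as either signed or unsigned. The two readings agree
// only when the byte fits in 7 bits (0..127) -- the "i7x16" restriction.
const asSigned = (b) => (b & 0x80 ? b - 256 : b);

// One lane-pair product; `treatBSigned` picks the engine's interpretation.
function dotPair(a, b, treatBSigned) {
  return asSigned(a) * (treatBSigned ? asSigned(b) : b);
}

console.log(dotPair(3, 100, true) === dotPair(3, 100, false)); // true: 100 fits in i7
console.log(dotPair(3, 200, true) === dotPair(3, 200, false)); // false: bit 7 set, readings diverge
```

A kernel that guards its inputs so the second operand stays in range (as the HasUSDot-gated path aims to) never observes this divergence.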

@yurydelendik

cc @dtig and @ppenzin as they might be interested in the discussion

@tarekziade

Thanks @jing-bao for the feedback.

I tried on my AMD Ryzen Threadripper PRO (x86_64) and I get an "unrecognized opcode fd113" error for my summarizer demo. Maybe the way I compiled it was not right?

I used those flags to compile

--config Release --build_wasm --skip_tests \
    --disable_wasm_exception_catching --disable_rtti \
    --enable_wasm_threads --enable_wasm_simd --use_jsep \
    --enable_wasm_relaxed_simd

and pushed the builds to https://github.com/tarekziade/onnx-relaxed-simd/tree/main

Do you have your example available somewhere so I can try it? Maybe the model I used triggers specific opcodes.

@jing-bao
Contributor Author

unrecognized opcode fd113

Hi @tarekziade, the build flags seem OK. This error suggests that the Relaxed SIMD opcode i32x4.relaxed_dot_i8x16_i7x16_add_s is not recognized. Which browser did you use for the test? Maybe some browser version supports only part of the Relaxed SIMD opcodes.
Could you please also help test the latest Chrome?
If my guess is right, I can update the isRelaxedSimdSupported function to test more precisely and fix this.
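For what it's worth, a sketch of such an exact-opcode probe (my own hand-encoded minimal module, not ORT's actual isRelaxedSimdSupported): WebAssembly.validate only returns true if the engine recognizes i32x4.relaxed_dot_i8x16_i7x16_add_s -- the 0xfd 0x113 pair behind the "fd113" error.

```javascript
// Probe for the exact opcode by validating a tiny module whose body uses it.
function supportsRelaxedDot() {
  return WebAssembly.validate(new Uint8Array([
    0x00, 0x61, 0x73, 0x6d, // "\0asm" magic
    0x01, 0x00, 0x00, 0x00, // version 1
    0x01, 0x05, 0x01, 0x60, 0x00, 0x01, 0x7b, // type section: () -> v128
    0x03, 0x02, 0x01, 0x00, // function section: one func of type 0
    0x0a, 0x13, 0x01, 0x11, 0x00, // code section: one 17-byte body, no locals
    0x41, 0x00, 0xfd, 0x0f, // i32.const 0; i8x16.splat
    0x41, 0x00, 0xfd, 0x0f, // i32.const 0; i8x16.splat
    0x41, 0x00, 0xfd, 0x11, // i32.const 0; i32x4.splat
    0xfd, 0x93, 0x02,       // i32x4.relaxed_dot_i8x16_i7x16_add_s (0x113 in LEB128)
    0x0b,                   // end
  ]));
}

console.log('relaxed dot supported:', supportsRelaxedDot());
```

A browser that implements only part of the proposal would pass a generic relaxed-SIMD check but return false here.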

@tarekziade

unrecognized opcode fd113

Hi @tarekziade, the build flags seem OK. This error suggests that the Relaxed SIMD opcode i32x4.relaxed_dot_i8x16_i7x16_add_s is not recognized. Which browser did you use for the test? Maybe some browser version supports only part of the Relaxed SIMD opcodes. Could you please also help test the latest Chrome? If my guess is right, I can update the isRelaxedSimdSupported function to test more precisely and fix this.

This is in Firefox Nightly 134. I see the opcode defined in our code base: https://searchfox.org/mozilla-central/source/third_party/rust/wast/src/core/expr.rs#1189

@yurydelendik do you know if we have a partial implementation there?

@tarekziade

Could you please help also test the latest Chrome?

For sure

@jing-bao
Contributor Author

jing-bao commented Dec 3, 2024

Hi @fs-eire and @tarekziade, I updated the code to add an enable_wasm_simd + enable_wasm_relaxed_simd build check and to check for the exact dot instruction in isRelaxedSimdSupported. Please take a look again. Thanks a lot!

@tarekziade

Thanks, looking this week

@guschmue
Contributor

/azp run ONNX Runtime Web CI Pipeline,Windows GPU CI Pipeline,Linux Android Emulator QNN CI Pipeline

@guschmue
Contributor

/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline

@guschmue
Contributor

/azp run Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,Windows x64 QNN CI Pipeline,Big Models


Azure Pipelines successfully started running 2 pipeline(s).

@guschmue
Contributor

/azp run Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI


Azure Pipelines successfully started running 4 pipeline(s).



Azure Pipelines successfully started running 9 pipeline(s).

@jing-bao
Contributor Author

Hi, just a gentle reminder to check if there are any follow-ups needed from my side. Thanks!

@jing-bao
Contributor Author

Hi @fs-eire , any follow-ups needed from my side? Thanks!


Azure Pipelines successfully started running 9 pipeline(s).


@fs-eire
Contributor

fs-eire commented Mar 13, 2025

Closing and re-opening the PR to trigger the GitHub builds.

@fs-eire fs-eire closed this Mar 13, 2025
@fs-eire fs-eire reopened this Mar 13, 2025
@fs-eire
Contributor

fs-eire commented Mar 13, 2025

/azp run Windows GPU WebGPU CI Pipeline, Linux ROCm CI Pipeline, Windows ARM64 QNN CI Pipeline


Azure Pipelines successfully started running 2 pipeline(s).

@jing-bao
Contributor Author

Approved for all the code except the MLAS changes.

Thanks @fs-eire! Do I need other reviewers for MLAS changes?

@jywu-msft
Member

@liqunfu @fajin-corp can you review the proposed changes to mlas files?

@jing-bao
Contributor Author

A gentle reminder, @liqunfu @fajin-corp. Thanks!

@jywu-msft jywu-msft requested review from fajin-corp and liqunfu March 21, 2025 03:45
@fs-eire
Contributor

fs-eire commented Mar 27, 2025

Close and re-open to trigger the pipeline

@fs-eire fs-eire closed this Mar 27, 2025
@fs-eire fs-eire reopened this Mar 27, 2025
@fs-eire
Contributor

fs-eire commented Mar 28, 2025

/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,ONNX Runtime Web CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline

@fs-eire
Contributor

fs-eire commented Mar 28, 2025

/azp run Linux QNN CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,Big Models,Linux Android Emulator QNN CI Pipeline,Android CI Pipeline,iOS CI Pipeline,ONNX Runtime React Native CI Pipeline,Linux DNNL CI Pipeline,Linux MIGraphX CI Pipeline,Linux ROCm CI Pipeline


Azure Pipelines successfully started running 7 pipeline(s).


@fs-eire fs-eire left a comment (Contributor)

I consider this PR good to merge:

  • All critical pipelines, including the old Web CI and the new (GitHub Actions based) Web CI, are passing. There are failing CIs, but those are the ones under migration, and their old counterparts pass.
  • The MLAS part is mostly new code, and modifications to existing code are guarded by macros, so it should be safe for existing users.

Thanks for the contribution, @jing-bao

@fs-eire fs-eire merged commit ba2999c into microsoft:main Mar 31, 2025
81 of 91 checks passed
fs-eire added a commit that referenced this pull request Apr 8, 2025

### Description

This PR revised the flag `ort.env.wasm.simd` to enhance its usage so that more use scenarios are covered.
- Allow setting it to `false` explicitly to disable the SIMD check; resolves #24292 (@Eldow)
- Allow setting it to `'relaxed'` to enable the Relaxed SIMD check; Relaxed SIMD was first introduced in #22794 (@jing-bao)
- Behavior is unchanged when the flag is unset (i.e. `undefined`) or set to `true`
- A warning message is emitted when the flag is set to an unknown value, and it is reset to `false` in that case
quic-zhaoxul pushed a commit to CodeLinaro/onnxruntime that referenced this pull request Apr 17, 2025
quic-zhaoxul pushed a commit to CodeLinaro/onnxruntime that referenced this pull request Apr 17, 2025

7 participants