[Feature Request] Add Fusion Transformer for WebNN EP Decomposed GQA Node #24454

Yuhengwe1 · 2025-04-17T06:17:51Z

Describe the feature request

We propose adding a fusion transformer to reconstruct Group Query Attention (GQA) nodes that get decomposed during WebNN EP graph processing.

Describe scenario use case

In #23416, the WebNN EP breaks one GQA node down into smaller primitive ops to meet WebNN API constraints, which prevents optimal hardware utilization. Here is the decomposed WebNN subgraph:

Introducing a fusion transformer would reduce the operator count and enable hardware-accelerated attention kernels for models produced by the WebNN EP.
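To illustrate the general idea of such a fusion pass, here is a minimal, self-contained sketch in Python. It is not the ONNX Runtime transformer API and not the actual decomposed GQA pattern: the `Node` class, the `fuse_chain` helper, the `FusedAttention` op name, and the three-op `MatMul -> Softmax -> MatMul` chain are all hypothetical simplifications chosen for illustration. A real transformer would need to match the full subgraph emitted by the WebNN EP.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Node:
    """Toy graph node (hypothetical; not an ONNX Runtime type)."""
    op_type: str
    inputs: List[str]
    outputs: List[str]


# Hypothetical, heavily simplified chain standing in for the decomposed subgraph.
PATTERN = ["MatMul", "Softmax", "MatMul"]


def fuse_chain(nodes: List[Node], pattern=PATTERN, fused_op="FusedAttention") -> List[Node]:
    """Replace each linear chain matching `pattern` with a single fused node.

    Assumes `nodes` is topologically sorted and that intermediate tensors
    are not graph outputs (a real pass would have to check this).
    """
    # Map each tensor name to the nodes that consume it.
    consumers: Dict[str, List[Node]] = {}
    for n in nodes:
        for name in n.inputs:
            consumers.setdefault(name, []).append(n)

    removed = set()   # ids of nodes absorbed into a fused node
    fused = {}        # id of last chain node -> replacement fused node
    for start in nodes:
        if id(start) in removed or start.op_type != pattern[0]:
            continue
        chain = [start]
        for op in pattern[1:]:
            users = consumers.get(chain[-1].outputs[0], [])
            # Only fuse when the intermediate tensor has exactly one consumer.
            if len(users) != 1 or users[0].op_type != op or id(users[0]) in removed:
                break
            chain.append(users[0])
        if len(chain) != len(pattern):
            continue
        internal = {c.outputs[0] for c in chain[:-1]}
        external_inputs = [i for c in chain for i in c.inputs if i not in internal]
        removed.update(id(c) for c in chain)
        fused[id(chain[-1])] = Node(fused_op, external_inputs, chain[-1].outputs)

    # Emit the fused node at the position of the chain's last node.
    result = []
    for n in nodes:
        if id(n) in fused:
            result.append(fused[id(n)])
        elif id(n) not in removed:
            result.append(n)
    return result


graph = [
    Node("Transpose", ["k"], ["kT"]),
    Node("MatMul", ["q", "kT"], ["scores"]),
    Node("Softmax", ["scores"], ["probs"]),
    Node("MatMul", ["probs", "v"], ["out"]),
]
fused_graph = fuse_chain(graph)
print([n.op_type for n in fused_graph])
```

In this toy graph the three chained ops collapse into one node that takes `q`, `kT`, and `v` and produces `out`, while the unrelated `Transpose` is left untouched. The single-consumer check is the key design choice: fusing a chain whose intermediate tensor is used elsewhere would drop a value that other nodes still need.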

/cc @Honry @huningxin

huningxin · 2025-04-17T06:25:28Z

/cc @fdwr @guschmue @RafaelCintron

Yuhengwe1 added the feature request label Apr 17, 2025

github-actions bot added the ep:WebNN label Apr 17, 2025

Yuhengwe1 changed the title ~~[Feature Request] [WebNN EP] Add Fusion Transformer for WebNN EP Decomposed GQA Node~~ [Feature Request] Add Fusion Transformer for WebNN EP Decomposed GQA Node Apr 17, 2025
