[Feature Request] Add Fusion Transformer for WebNN EP Decomposed GQA Node #24454

Open
Yuhengwe1 opened this issue Apr 17, 2025 · 1 comment
Labels: ep:WebNN (WebNN execution provider), feature request (request for unsupported feature or enhancement)

Yuhengwe1 commented Apr 17, 2025

Describe the feature request

We propose adding a fusion transformer to reconstruct Group Query Attention (GQA) nodes that get decomposed during WebNN EP graph processing.

Describe scenario use case

In #23416, the WebNN EP breaks a single GQA node down into smaller primitive ops to meet WebNN API constraints, which prevents optimal hardware utilization. Here is the decomposed WebNN subgraph:

[Image: decomposed WebNN subgraph]

Introducing a fusion transformer would reduce the operator count and enable hardware-accelerated attention kernels for WebNN-produced models.
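
To make concrete what "decomposed into primitive ops" means, here is a minimal NumPy sketch of grouped-query attention written only with reshape, transpose, expand, matmul, and softmax steps. It is an illustration under stated assumptions, not the actual subgraph emitted in #23416: the function name `decomposed_gqa`, the input layout, and the simple causal mask are hypothetical, and the KV cache, rotary embeddings, and sequence-length inputs are left out.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; masked (-inf) entries become 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def decomposed_gqa(q, k, v, num_heads, kv_num_heads):
    """Hypothetical sketch: GQA expressed as a chain of primitive ops.

    Assumed layout: q is (batch, seq, num_heads * head_dim),
    k and v are (batch, seq, kv_num_heads * head_dim).
    """
    batch, seq, hidden = q.shape
    head_dim = hidden // num_heads
    group = num_heads // kv_num_heads  # query heads sharing one KV head

    # reshape + transpose to (batch, heads, seq, head_dim)
    q = q.reshape(batch, seq, num_heads, head_dim).transpose(0, 2, 1, 3)
    k = k.reshape(batch, seq, kv_num_heads, head_dim).transpose(0, 2, 1, 3)
    v = v.reshape(batch, seq, kv_num_heads, head_dim).transpose(0, 2, 1, 3)

    # expand: repeat each KV head so it lines up with its query group
    k = np.repeat(k, group, axis=1)
    v = np.repeat(v, group, axis=1)

    # matmul + scale
    scores = np.matmul(q, k.transpose(0, 1, 3, 2)) / np.sqrt(head_dim)

    # causal mask: position i may not attend to positions j > i
    future = np.triu(np.ones((seq, seq), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)

    # softmax + matmul, then transpose + reshape back to the input layout
    out = np.matmul(softmax(scores), v)
    return out.transpose(0, 2, 1, 3).reshape(batch, seq, hidden)
```

Each commented step corresponds to at least one separate graph node in this sketch; a fusion pass would pattern-match a chain like this and replace it with a single attention node that a backend can map to a dedicated kernel.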

/cc @Honry @huningxin

Yuhengwe1 added the "feature request" (request for unsupported feature or enhancement) label Apr 17, 2025
github-actions bot added the "ep:WebNN" (WebNN execution provider) label Apr 17, 2025
Yuhengwe1 changed the title from "[Feature Request] [WebNN EP] Add Fusion Transformer for WebNN EP Decomposed GQA Node" to "[Feature Request] Add Fusion Transformer for WebNN EP Decomposed GQA Node" Apr 17, 2025
huningxin (Contributor) commented

/cc @fdwr @guschmue @RafaelCintron
