Open
Description
🚀 The feature, motivation and pitch
We are working on fusing all small ops in DeepSeek-R1 (DSR1) and other popular models.
There are open work streams on a couple of these:
- RMSNorm + BlockFP8
- RoPE + KV Insert
- All Reduce + RMSNorm
Another possible fusion is the MoE finalize reduction. Here is an op in FlashInfer:
-
This would fuse applying the top-k weights to the expert outputs with the reduction (sum) over those outputs, covering both the shared and routed experts in MoE layers.
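To make the proposal concrete, here is a rough, dependency-free sketch of the semantics I have in mind. It is not the FlashInfer op or vLLM code: the function names, argument layout (`expert_out[token][k][hidden]`, `topk_weights[token][k]`, `shared_out[token][hidden]`), and the choice to add the shared expert output inside the same pass are all assumptions for illustration, and it ignores token permutation and dtype handling.

```python
def moe_finalize_unfused(expert_out, topk_weights, shared_out):
    """Hypothetical reference: three separate passes with intermediates."""
    # Pass 1: scale each routed expert output by its top-k routing weight.
    scaled = [
        [[w * x for x in vec] for w, vec in zip(ws, vecs)]
        for ws, vecs in zip(topk_weights, expert_out)
    ]
    # Pass 2: reduce (sum) over the top-k axis for every token.
    reduced = [[sum(col) for col in zip(*vecs)] for vecs in scaled]
    # Pass 3: add the shared expert output.
    return [
        [r + s for r, s in zip(rv, sv)]
        for rv, sv in zip(reduced, shared_out)
    ]


def moe_finalize_fused(expert_out, topk_weights, shared_out):
    """Hypothetical fused version: one pass per token, no intermediates."""
    out = []
    for vecs, ws, sv in zip(expert_out, topk_weights, shared_out):
        row = list(sv)  # start the accumulator from the shared expert output
        for w, vec in zip(ws, vecs):
            for h, x in enumerate(vec):
                row[h] += w * x  # weight application and reduction together
        out.append(row)
    return out


if __name__ == "__main__":
    # 2 tokens, top-2 routing, hidden size 2 (toy values).
    expert_out = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 1.0], [2.0, 2.0]]]
    topk_weights = [[0.5, 0.5], [0.25, 0.75]]
    shared_out = [[1.0, 1.0], [0.0, 0.0]]
    print(moe_finalize_fused(expert_out, topk_weights, shared_out))
```

The unfused version materializes the scaled and reduced intermediates, while the fused version accumulates directly into the output row. That saved memory traffic is the motivation for doing this in a single kernel.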
-
Here's an example trace
Alternatives
none
Additional context
https://vllm-dev.slack.com/archives/C08NFPURQ1F/p1762802402502609
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
BoyuanFeng