[Feature][Kernels]: Integrate FlashInfer MoE Fused Finalize #28423

@robertgshaw2-redhat

Description

🚀 The feature, motivation and pitch

We are working on fusing all of the small ops in DeepSeek-R1 (DSR1) and other popular models.

There are open work streams on several of these:

  • RMSNorm + BlockFP8
  • RoPE + KV Insert
  • All Reduce + RMSNorm

Another candidate is fusing the MoE finalize reduction. Here is an op in FlashInfer:

Image
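For readers unfamiliar with the term, here is a rough reference sketch of what a MoE "finalize" step typically computes: each token's top-k expert outputs are gathered back from the permuted (expert-sorted) layout, scaled by the router weights, and summed. The function and argument names below (`moe_finalize_reference`, `expanded_to_permuted`) are hypothetical and chosen for illustration; this is not FlashInfer's API, and a fused kernel would do this on GPU tensors in a single pass rather than in Python loops.

```python
def moe_finalize_reference(expert_out, expanded_to_permuted, topk_weights):
    """Unfused reference for the MoE finalize reduction (illustrative only).

    expert_out:           [num_tokens * top_k][hidden] rows in permuted order
    expanded_to_permuted: [num_tokens][top_k] row index into expert_out
    topk_weights:         [num_tokens][top_k] router weights per token
    """
    num_tokens = len(topk_weights)
    hidden = len(expert_out[0])
    out = [[0.0] * hidden for _ in range(num_tokens)]
    for t, weights in enumerate(topk_weights):
        for k, w in enumerate(weights):
            # Gather the row produced by the k-th selected expert for token t.
            row = expert_out[expanded_to_permuted[t][k]]
            # Weighted accumulation into the token's output vector.
            for h in range(hidden):
                out[t][h] += w * row[h]
    return out


# Tiny example: 2 tokens, top_k = 2, hidden = 2.
expert_out = [
    [1.0, 2.0],
    [10.0, 20.0],
    [100.0, 200.0],
    [1000.0, 2000.0],
]
expanded_to_permuted = [[2, 0], [1, 3]]
topk_weights = [[0.5, 0.5], [0.25, 0.75]]

result = moe_finalize_reference(expert_out, expanded_to_permuted, topk_weights)
print(result)
```

In an unfused implementation the gather, the scaling, and the sum are often separate small ops, which is why this step is a candidate for fusion.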

cc @ProExpertProg

Alternatives

None

Additional context

https://vllm-dev.slack.com/archives/C08NFPURQ1F/p1762802402502609

Projects

Status

Backlog
