Fix cuda memory access violation in GQA FlashAttention #24447

RyanUnderhill · 2025-04-16T19:45:32Z

Description

The zeros_ memory buffer was left uninitialized, but it must be initialized to zero.

Motivation and Context

A memory allocator change in GenAI started triggering crashes in FlashAttention, and the crashes were eventually tracked down to this uninitialized buffer. The allocator change itself was innocent. I'm not sure how this didn't fail previously, or, if it did, why we weren't getting reports about it.
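One plausible reason a bug like this stays hidden is that an allocator may happen to return memory that is already zero, until a change in allocation pattern starts handing back recycled blocks. The following is a minimal host-side sketch of that effect, not the actual onnxruntime code: `RecyclingArena`, `IsAllZero`, and `ZerosBufferIsValid` are hypothetical names made up for illustration, and plain `std::memset` stands in for the device-side clear (on a CUDA buffer the equivalent call would be something like `cudaMemset`).

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical arena that hands out the same block on every request without
// clearing it, standing in for an allocator that recycles memory.
class RecyclingArena {
 public:
  explicit RecyclingArena(std::size_t count) : storage_(count, 0) {}

  // Returns the block as-is: it holds whatever the previous user wrote.
  int* Allocate() { return storage_.data(); }
  std::size_t Count() const { return storage_.size(); }

 private:
  std::vector<int> storage_;
};

// True only if every element is zero, which is what a "zeros" buffer promises.
bool IsAllZero(const int* data, std::size_t count) {
  for (std::size_t i = 0; i < count; ++i) {
    if (data[i] != 0) return false;
  }
  return true;
}

// Simulates an earlier user writing into the block, then a new user requesting
// a zeros buffer from the same arena. Returns whether that buffer is all zero.
bool ZerosBufferIsValid(bool initialize) {
  RecyclingArena arena(8);

  // Earlier user leaves stale, non-zero data behind.
  int* previous = arena.Allocate();
  for (std::size_t i = 0; i < arena.Count(); ++i) {
    previous[i] = 1234 + static_cast<int>(i);
  }

  // New user receives the recycled block.
  int* zeros = arena.Allocate();
  if (initialize) {
    // The fix: never rely on the allocator to return zeroed memory.
    std::memset(zeros, 0, arena.Count() * sizeof(int));
  }
  return IsAllZero(zeros, arena.Count());
}
```

In this sketch `ZerosBufferIsValid(false)` reports a buffer that is not all zero, while `ZerosBufferIsValid(true)` reports a valid one. A kernel that treats the stale values as zeros could then compute bad offsets or lengths, which would be consistent with the memory access violation described above.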

onnxruntime/contrib_ops/cuda/bert/group_query_attention.cc

Co-authored-by: Ryan Hill <{ID}+{username}@users.noreply.github.com>

zeros_ buffer was uninitialized so wasn't always zeros. This would lead to a runtime crash later on.

9d4c6bb

RyanUnderhill requested a review from aciddelgado April 16, 2025 20:31

aciddelgado approved these changes Apr 16, 2025

baijumeswani approved these changes Apr 16, 2025

baijumeswani reviewed Apr 16, 2025

onnxruntime/contrib_ops/cuda/bert/group_query_attention.cc

tianleiwu changed the title ~~Fix cuda memory access violation in FlashAttention~~ Fix cuda memory access violation in GQA FlashAttention Apr 16, 2025

RyanUnderhill mentioned this pull request Apr 16, 2025

Changes how the device OrtAllocators work, use a global OrtSession instead microsoft/onnxruntime-genai#1378

Merged

RyanUnderhill merged commit 99f2b80 into main Apr 17, 2025
84 of 89 checks passed

RyanUnderhill deleted the ryanunderhill/flashattention_crash_fix branch April 17, 2025 00:36
