Fix cuda memory access violation in GQA FlashAttention #24447

Merged
1 commit merged into main from ryanunderhill/flashattention_crash_fix on Apr 17, 2025

Conversation

RyanUnderhill (Member)

Description

The zeros_ memory buffer was uninitialized, but it must be initialized to zero.
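
This is the classic CUDA failure mode: cudaMalloc does not zero memory, so any buffer a kernel assumes is zero-filled must be cleared explicitly. Below is a minimal sketch of that pattern using plain CUDA runtime calls; the buffer name and size are placeholders, not the actual onnxruntime allocation path.

```cpp
#include <cuda_runtime.h>

int main() {
  // Hypothetical stand-in for the GQA "zeros_" buffer: an int32 array
  // the FlashAttention kernel expects to contain all zeros.
  const int batch_size = 8;
  int* zeros = nullptr;
  cudaMalloc(&zeros, batch_size * sizeof(int));  // returns UNinitialized memory

  // The fix: clear the buffer explicitly before any kernel reads it.
  // Without this, the kernel consumes garbage values, which can become
  // out-of-bounds indices and trigger a memory access violation.
  cudaMemset(zeros, 0, batch_size * sizeof(int));

  // ... launch the attention kernel with `zeros` as an input ...

  cudaFree(zeros);
  return 0;
}
```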

Motivation and Context

A memory allocator change in GenAI started triggering crashes in FlashAttention, and the uninitialized buffer was eventually tracked down as the cause; the allocator change itself was innocent. I'm not sure how this didn't fail previously, or, if it did, why we weren't getting reports about it.
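
One plausible reason the bug could stay latent: freshly mapped GPU pages often start out zeroed, so an uninitialized buffer can read as zero by luck until the allocator starts recycling blocks that hold stale data. The sketch below demonstrates that freed device memory is not scrubbed; whether the second allocation actually reuses the block is allocator-dependent and not guaranteed, so this illustrates the mechanism rather than the GenAI allocator's exact behavior.

```cpp
#include <cuda_runtime.h>
#include <cstdio>

int main() {
  unsigned int pattern = 0xDEADBEEF, readback = 0;

  // Write a recognizable pattern into a device block, then free it.
  unsigned int* p = nullptr;
  cudaMalloc(&p, sizeof(unsigned int));
  cudaMemcpy(p, &pattern, sizeof(unsigned int), cudaMemcpyHostToDevice);
  cudaFree(p);

  // A second same-sized allocation may (not must) reuse the same block.
  unsigned int* q = nullptr;
  cudaMalloc(&q, sizeof(unsigned int));
  cudaMemcpy(&readback, q, sizeof(unsigned int), cudaMemcpyDeviceToHost);
  std::printf("uninitialized read: 0x%X\n", readback);  // may print 0xDEADBEEF

  cudaFree(q);
  return 0;
}
```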

tianleiwu changed the title from "Fix cuda memory access violation in FlashAttention" to "Fix cuda memory access violation in GQA FlashAttention" on Apr 16, 2025
RyanUnderhill merged commit 99f2b80 into main on Apr 17, 2025
84 of 89 checks passed
RyanUnderhill deleted the ryanunderhill/flashattention_crash_fix branch on April 17, 2025 at 00:36
ashrit-ms pushed a commit that referenced this pull request Apr 24, 2025