forked from iree-org/iree
Commit
[Codegen] Bubble up Transpose attention V and try fuse with others before attention (iree-org#19250)

The Flash Attention transpose_V variant is significantly faster than the non-transpose_V variant, because many matmul intrinsics are mmtb by default. Hence, doing FA transpose_V allows better, more contiguous reads from shared memory to registers, which improves attention performance quite a bit.

This PR exposes the attention_transposeV form by generating a linalg.transpose on V during bubbling up of transposes, so that the graph has an opportunity to fuse the transpose of V into its producer. I have also confirmed that if no producer is found, the transpose does fuse back into the attentionOp. Hence, in the worst case, we get the same performance as before this PR.

Additionally, we modify elementwise op fusion to try fusing the transpose with other ops before letting it get fused back into attention.

---------

Signed-off-by: Stanley Winata <[email protected]>
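As a rough illustration of why the transpose_V form is numerically equivalent, here is a minimal pure-Python sketch. It is not IREE code: all function names are hypothetical, and it assumes "mmtb" means a matmul whose second operand is stored transposed, so that every output element is a dot product of two rows.

```python
import math

def transpose(m):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Plain product: a is (m x k), b is (k x n). Reads b column-wise."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def matmul_transposed_b(a, bt):
    """Assumed 'mmtb' form: bt holds B transposed (n x k), so each output
    element is a dot product of one row of a with one row of bt."""
    return [[sum(x * y for x, y in zip(row_a, row_bt)) for row_bt in bt]
            for row_a in a]

def softmax_rows(s):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in s:
        mx = max(row)
        exps = [math.exp(x - mx) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def scaled_scores(q, k):
    """softmax(Q @ K^T / sqrt(d)); K is stored as (n x d), already row-major."""
    scale = 1.0 / math.sqrt(len(q[0]))
    return softmax_rows([[x * scale for x in row]
                         for row in matmul_transposed_b(q, k)])

def attention(q, k, v):
    """Reference form: second matmul reads V (n x dv) column-wise."""
    return matmul(scaled_scores(q, k), v)

def attention_transpose_v(q, k, vt):
    """transpose_V form: V arrives pre-transposed (dv x n), so the second
    matmul also uses the row-by-row 'mmtb' access pattern."""
    return matmul_transposed_b(scaled_scores(q, k), vt)

# Small hypothetical inputs: 2 queries, 3 keys/values, head dim 2, value dim 2.
Q = [[1.0, 0.0], [0.5, 0.5]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

reference = attention(Q, K, V)
transposed = attention_transpose_v(Q, K, transpose(V))
```

In this sketch the transpose is a separate step applied to V before attention, which mirrors the idea in the commit message: once the transpose is a standalone op, it can be absorbed by whatever produces V instead of being paid for inside attention.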
1 parent 5708d42, commit 41115bb
Showing 8 changed files with 314 additions and 17 deletions.