Support for transformer / attention #589
pommedeterresautee started this conversation in General

Hi, I have seen a now-closed PR in this repo adding support for the attention mechanism with CUTLASS, as well as a discussion on this subject in the flash-attention repo.
Do you plan to offer composable templates covering the key parts of the transformer computation graph in CUTLASS (in a more performant way than PyTorch, for instance)? Or do you see this library as targeting lower-level computation, leaving it up to the end user to build this kind of high-level layer?
Obviously some pieces already exist, such as batched and/or grouped GEMM. But it is not clear whether grouped GEMM can be combined with back-to-back (B2B) GEMM fusion to limit the global-memory bottleneck, whether different epilogues can easily be applied within a grouped GEMM to perform several projections in parallel (to limit the number of kernels launched), whether a residual connection can be modeled performantly with CUTLASS, how hard it would be to add layernorm, and so on.
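For readers unfamiliar with the epilogue composition the question refers to, here is a minimal sketch using the public CUTLASS 2.x device-level API; it is an illustration, not an answer from the maintainers. A single GEMM is instantiated with an epilogue functor that fuses D = ReLU(alpha * A @ B + beta * C) into one kernel, so the intermediate product never round-trips through global memory. The Sm80 target, the tile shapes, the vector width, and the helper function `run_fused_gemm_relu` are illustrative assumptions, not tuned or official values.

```cpp
#include "cutlass/gemm/device/gemm.h"
#include "cutlass/epilogue/thread/linear_combination_relu.h"

using Element = cutlass::half_t;
using Layout  = cutlass::layout::RowMajor;

// Epilogue functor applied to each accumulator tile before it is written out.
// Swapping this type is how a different fused activation/projection is chosen.
using EpilogueOp = cutlass::epilogue::thread::LinearCombinationRelu<
    Element,   // output element type
    8,         // elements per vectorized access (assumption: 128 bits / 16 bits)
    float,     // accumulator type
    float>;    // compute type for alpha / beta

using Gemm = cutlass::gemm::device::Gemm<
    Element, Layout,                         // A
    Element, Layout,                         // B
    Element, Layout,                         // C and D
    float,                                   // accumulate in fp32
    cutlass::arch::OpClassTensorOp,          // use Tensor Cores
    cutlass::arch::Sm80,                     // target architecture (assumption)
    cutlass::gemm::GemmShape<128, 128, 32>,  // threadblock tile (assumption)
    cutlass::gemm::GemmShape<64, 64, 32>,    // warp tile (assumption)
    cutlass::gemm::GemmShape<16, 8, 16>,     // Sm80 fp16 tensor-op shape
    EpilogueOp>;

// Hypothetical helper showing the launch pattern; pointers are assumed to be
// valid device allocations with leading dimensions lda/ldb/ldc/ldd.
cutlass::Status run_fused_gemm_relu(int M, int N, int K,
                                    Element const* A, int lda,
                                    Element const* B, int ldb,
                                    Element const* C, int ldc,
                                    Element* D, int ldd,
                                    float alpha, float beta) {
  Gemm gemm_op;
  typename Gemm::Arguments args({M, N, K},
                                {A, lda}, {B, ldb},
                                {C, ldc}, {D, ldd},
                                {alpha, beta});
  return gemm_op(args);  // configures and launches the fused kernel
}
```

The grouped-GEMM device API (cutlass::gemm::device::GemmGrouped) accepts an epilogue functor in the same template slot, though as a single type shared across all groups; whether per-group epilogues and composition with B2B GEMM fusion are feasible is exactly what the question above is probing.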
Replies: 1 comment

Yes.
Sorry, no English; maybe Google Translate can help.