As the title says, has anyone tried replacing the multi-head attention in a typical transformer with the self-attention described in this library?
My thought was that I could essentially concatenate multiple self-attention outputs to replicate multi-head attention, per the attached image from the PyTorch website.
I'm relatively new to transformers as a whole, so hopefully this question makes sense.
For reference, given the interest in a previous post, I've been exploring the Performer's effectiveness with DETR (https://github.com/facebookresearch/detr).
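To make what I mean concrete, here's a rough sketch in plain PyTorch of the "concat the heads" idea from that figure. It uses ordinary softmax attention just to illustrate the wiring, not this library's attention kernel, and the module names and shapes here are my own assumptions, not the library's API:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SingleHeadSelfAttention(nn.Module):
    """One attention head with its own Q/K/V projections from the model dim."""
    def __init__(self, dim, head_dim):
        super().__init__()
        self.to_q = nn.Linear(dim, head_dim, bias=False)
        self.to_k = nn.Linear(dim, head_dim, bias=False)
        self.to_v = nn.Linear(dim, head_dim, bias=False)

    def forward(self, x):
        q, k, v = self.to_q(x), self.to_k(x), self.to_v(x)
        # standard scaled dot-product attention (placeholder for the library's version)
        attn = F.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        return attn @ v  # (batch, seq, head_dim)

class ConcatMultiHead(nn.Module):
    """Multi-head attention assembled as Concat(head_1, ..., head_h) @ W_O,
    matching the diagram from the PyTorch docs / "Attention Is All You Need"."""
    def __init__(self, dim, heads):
        super().__init__()
        assert dim % heads == 0
        head_dim = dim // heads
        self.heads = nn.ModuleList(
            [SingleHeadSelfAttention(dim, head_dim) for _ in range(heads)]
        )
        self.to_out = nn.Linear(dim, dim)  # output projection W_O

    def forward(self, x):
        out = torch.cat([h(x) for h in self.heads], dim=-1)  # (batch, seq, dim)
        return self.to_out(out)

x = torch.randn(2, 16, 256)                 # (batch, seq_len, model_dim)
y = ConcatMultiHead(dim=256, heads=8)(x)
print(y.shape)                              # torch.Size([2, 16, 256])
```

If the library's SelfAttention module already takes a heads argument, I assume the split/concat happens internally and a single instance could stand in for nn.MultiheadAttention directly, modulo the query/key/value calling convention DETR expects, but I'd appreciate confirmation.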
thanks!