As the title says, has anyone tried replacing the multi-head attention in a typical transformer with the self-attention described in this library?
My thought was that I could essentially concatenate multiple self-attention outputs to replicate multi-head attention, per the attached image from the PyTorch website.
I'm relatively new to transformers as a whole, so hopefully this question makes sense.
For reference, given the interest in a previous post, I've been exploring the Performer's effectiveness with DETR (https://github.com/facebookresearch/detr).
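To make what I mean concrete, here's a rough sketch in plain PyTorch of the "concat the heads" idea from that figure. It uses ordinary softmax attention just to illustrate the wiring, not this library's attention kernel, and the module names and shapes here are my own assumptions, not the library's API:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SingleHeadSelfAttention(nn.Module):
    """One attention head with its own Q/K/V projections from the model dim."""
    def __init__(self, dim, head_dim):
        super().__init__()
        self.to_q = nn.Linear(dim, head_dim, bias=False)
        self.to_k = nn.Linear(dim, head_dim, bias=False)
        self.to_v = nn.Linear(dim, head_dim, bias=False)

    def forward(self, x):
        q, k, v = self.to_q(x), self.to_k(x), self.to_v(x)
        # standard scaled dot-product attention (placeholder for the library's version)
        attn = F.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        return attn @ v  # (batch, seq, head_dim)

class ConcatMultiHead(nn.Module):
    """Multi-head attention assembled as Concat(head_1, ..., head_h) @ W_O,
    matching the diagram from the PyTorch docs / "Attention Is All You Need"."""
    def __init__(self, dim, heads):
        super().__init__()
        assert dim % heads == 0
        head_dim = dim // heads
        self.heads = nn.ModuleList(
            [SingleHeadSelfAttention(dim, head_dim) for _ in range(heads)]
        )
        self.to_out = nn.Linear(dim, dim)  # output projection W_O

    def forward(self, x):
        out = torch.cat([h(x) for h in self.heads], dim=-1)  # (batch, seq, dim)
        return self.to_out(out)

x = torch.randn(2, 16, 256)                 # (batch, seq_len, model_dim)
y = ConcatMultiHead(dim=256, heads=8)(x)
print(y.shape)                              # torch.Size([2, 16, 256])
```

If the library's SelfAttention module already takes a heads argument, I assume the split/concat happens internally and a single instance could stand in for nn.MultiheadAttention directly, modulo the query/key/value calling convention DETR expects, but I'd appreciate confirmation.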
thanks!