Implementation of Attention module in Transformer #12

LeCongThuong · 2021-11-14T02:36:42Z

Thank you for sharing your work; it has been a great help to me.
I have a question about the Attention module of the Transformer in your code. Am I wrong in thinking that the Attention module should have a dropout layer after the softmax function (link)? For example, link and link both use a dropout layer in their Attention modules.

isCopyman · 2024-07-29T02:44:55Z

What you mentioned is indeed a common practice.
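For reference, here is a minimal sketch of where that dropout layer usually sits, assuming PyTorch. The class name `ScaledDotProductAttention`, its arguments, and the tensor shapes are hypothetical and are not taken from this repository's code; the sketch only illustrates applying dropout to the attention weights right after the softmax.

```python
import math

import torch
import torch.nn as nn


class ScaledDotProductAttention(nn.Module):
    """Illustrative only: a hypothetical module, not this repository's implementation."""

    def __init__(self, dropout: float = 0.1):
        super().__init__()
        # Dropout applied to the attention weights (assumed rate of 0.1).
        self.dropout = nn.Dropout(dropout)

    def forward(self, q, k, v, mask=None):
        d_k = q.size(-1)
        # Scaled dot-product scores: (..., seq_q, seq_k)
        scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(d_k)
        if mask is not None:
            # Positions where mask == 0 are excluded from attention.
            scores = scores.masked_fill(mask == 0, float("-inf"))
        weights = torch.softmax(scores, dim=-1)
        # The step discussed in this issue: dropout right after the softmax.
        weights = self.dropout(weights)
        return torch.matmul(weights, v), weights


torch.manual_seed(0)
attn = ScaledDotProductAttention(dropout=0.1)
q = k = v = torch.randn(2, 4, 5, 8)  # (batch, heads, seq_len, head_dim)

attn.eval()  # dropout is a no-op in eval mode
out, weights = attn(q, k, v)
```

In training mode, `nn.Dropout` zeroes some attention weights and rescales the rest by `1 / (1 - p)`, so the rows generally no longer sum to 1; in eval mode the weights pass through unchanged.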
