Thank you for sharing the code.
According to the paper (Appendix A, 2nd paragraph), dropout is not used for attention.
At line 205, the residual and the attention result are concatenated, but I think they should be added elementwise and then passed through a layer norm (Figure 8 of the ANP paper). I wonder if there is a reason for this modification. A minimal sketch of the pattern I mean is below.
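For reference, here is a minimal sketch of the add-then-normalize step as I read Figure 8, assuming a PyTorch implementation; the class and names (`ResidualAttentionBlock`, `d_model`, etc.) are illustrative and not taken from this repo:

```python
import torch
import torch.nn as nn

class ResidualAttentionBlock(nn.Module):
    """Cross-attention followed by the add-then-normalize step
    shown in Figure 8 of the ANP paper (illustrative sketch)."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        # No dropout on attention, per Appendix A of the paper.
        self.attn = nn.MultiheadAttention(d_model, n_heads, dropout=0.0,
                                          batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, query, key, value):
        result, _ = self.attn(query, key, value)
        # Elementwise add of the residual (the query) and the attention
        # result, then layer norm -- not a concatenation of the two.
        return self.norm(query + result)


# Usage: query/key/value share the model dimension.
q = torch.randn(2, 5, 64)   # (batch, targets, d_model)
k = torch.randn(2, 7, 64)   # (batch, contexts, d_model)
out = ResidualAttentionBlock(d_model=64, n_heads=8)(q, k, k)
print(out.shape)            # torch.Size([2, 5, 64])
```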
Thanks,
Deep Pandey