Q: The paper says LayerNorm(x + sublayer(x)), so why are we doing x + LayerNorm(sublayer(x))? I understand why we use sublayer(LayerNorm(x)) instead of LayerNorm(sublayer(x)): according to certain papers, it helps training. But my main question is: why is the "x + ..." part not included in the LayerNorm?
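To make the three arrangements named in the question concrete, here is a minimal pure-Python sketch. It assumes a simplified `layer_norm` without learned gain or bias, and `toy_sublayer` is a hypothetical stand-in for attention or the feed-forward layer; none of these names come from the repository or the paper.

```python
import math

def layer_norm(v, eps=1e-6):
    """Normalize a vector to zero mean and unit variance (no learned gain/bias)."""
    mean = sum(v) / len(v)
    var = sum((a - mean) ** 2 for a in v) / len(v)
    return [(a - mean) / math.sqrt(var + eps) for a in v]

def toy_sublayer(v):
    """Hypothetical stand-in for attention or feed-forward: halves each element."""
    return [0.5 * a for a in v]

def add(u, v):
    """Element-wise sum of two equal-length vectors."""
    return [a + b for a, b in zip(u, v)]

def post_ln(x, sublayer):
    # The paper's form: LayerNorm(x + sublayer(x)); the residual sum is normalized.
    return layer_norm(add(x, sublayer(x)))

def pre_ln(x, sublayer):
    # x + sublayer(LayerNorm(x)); the identity path bypasses normalization.
    return add(x, sublayer(layer_norm(x)))

def norm_on_branch(x, sublayer):
    # x + LayerNorm(sublayer(x)), the form as written in the question.
    return add(x, layer_norm(sublayer(x)))

x = [1.0, 2.0, 4.0, 9.0]
variants = [("post_ln", post_ln), ("pre_ln", pre_ln), ("norm_on_branch", norm_on_branch)]
for name, block in variants:
    y = block(x, toy_sublayer)
    print(name, [round(a, 3) for a in y], "mean =", round(sum(y) / len(y), 3))
```

In this toy setup, only `post_ln` produces a zero-mean output. The other two add the unnormalized `x` back in, so the output keeps the mean of `x`; that is the "x + ..." outside LayerNorm that the question is about.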
From the Vaswani et al. paper, as highlighted:
[Image: Screenshot from 2024-02-16 21-39-52]