Why does the Encoder have a norm layer on its final output? #32

Open
SeekPoint opened this issue Dec 1, 2024 · 1 comment

Comments

@SeekPoint

class Encoder(nn.Module):

    def __init__(self, features: int, layers: nn.ModuleList) -> None:
        super().__init__()
        self.layers = layers
        self.norm = LayerNormalization(features)

    def forward(self, x, mask):
        for layer in self.layers:
            x = layer(x, mask)
        # one extra normalization applied to the output of the whole stack
        return self.norm(x)
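For context, a final norm like this is what you would expect if the residual connections inside each layer follow a pre-norm pattern, i.e. each sublayer normalizes its input before the addition, so the output of the last block is itself never normalized. Below is a minimal sketch of such a pre-norm wrapper under that assumption; PreNormResidualConnection is an illustrative name and may differ in detail from the repository's actual ResidualConnection.

    import torch.nn as nn

    class PreNormResidualConnection(nn.Module):
        # Illustrative pre-norm wrapper (hypothetical name, not the repo's class):
        # normalize the input, run the sublayer, then add the residual.
        # LayerNormalization is assumed to be the repository's layer-norm module.

        def __init__(self, features: int, dropout: float) -> None:
            super().__init__()
            self.norm = LayerNormalization(features)
            self.dropout = nn.Dropout(dropout)

        def forward(self, x, sublayer):
            # x + Dropout(Sublayer(LayerNorm(x))): the block's own output is
            # left unnormalized, hence one closing LayerNormalization at the
            # top of the Encoder stack.
            return x + self.dropout(sublayer(self.norm(x)))

Under that reading, the Encoder's self.norm(x) is the single normalization applied to the output of the whole stack rather than a per-layer one.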

PT-10 commented Jan 26, 2025

@hkproj I think the implementation deviates from the architecture proposed in the paper. The paper applies normalization after each sublayer: the output of the multi-head attention is added to the residual and normalized, that result is passed as input to the feed-forward network, and the feed-forward output is added to its residual and normalized again. Here's pseudo-code of what it should look like:

class EncoderBlock(nn.Module):

    def __init__(self, features: int, self_attention_block: MultiHeadAttentionBlock, feed_forward_block: FeedForwardBlock, dropout: float) -> None:
        super().__init__()
        # one LayerNormalization per sublayer, so the two "Add & Norm" steps
        # do not share learnable parameters
        self.norms = nn.ModuleList([LayerNormalization(features) for _ in range(2)])
        self.self_attention_block = self_attention_block
        self.feed_forward_block = feed_forward_block
        self.residual_connections = nn.ModuleList([ResidualConnection(features, dropout) for _ in range(2)])

    def forward(self, x, src_mask):
        # self-attention sublayer -> add residual -> normalize
        x = self.residual_connections[0](x, lambda x: self.self_attention_block(x, x, x, src_mask))
        x = self.norms[0](x)
        # feed-forward sublayer -> add residual -> normalize
        x = self.residual_connections[1](x, self.feed_forward_block)
        return self.norms[1](x)
    
class Encoder(nn.Module):

    def __init__(self, features: int, layers: nn.ModuleList) -> None:
        super().__init__()
        self.layers = layers

    def forward(self, x, mask):
        for layer in self.layers:
            x = layer(x, mask)
        return x
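One assumption baked into this pseudo-code is that ResidualConnection performs only the addition (plus dropout), leaving the normalization to the block itself. A minimal sketch of such a helper under that assumption; AddOnlyResidualConnection is an illustrative name, not the repository's actual class.

    import torch.nn as nn

    class AddOnlyResidualConnection(nn.Module):
        # Illustrative helper (hypothetical, not the repo's ResidualConnection):
        # it only adds the sublayer output back to its input, so the enclosing
        # EncoderBlock can apply LayerNormalization afterwards, matching the
        # paper's post-norm "Add & Norm" ordering.

        def __init__(self, features: int, dropout: float) -> None:
            super().__init__()
            # features is unused here; kept only to mirror the
            # ResidualConnection(features, dropout) call in the pseudo-code
            self.dropout = nn.Dropout(dropout)

        def forward(self, x, sublayer):
            # x + Dropout(Sublayer(x)); the norm is applied by the caller
            return x + self.dropout(sublayer(x))

With that helper, each EncoderBlock ends in a normalization of its own, which is why the Encoder above no longer needs a final norm on the stack output.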
