@hkproj I think the implementation deviates from the architecture proposed in the paper. The paper applies normalization after each sublayer (post-norm): the output of the multi-head attention is added to the residual connection and then normalized, the result is passed to the feed-forward network, and the feed-forward output is again added to the residual connection and normalized. Here is pseudo code of what it should look like:
```python
import torch.nn as nn

# MultiHeadAttentionBlock, FeedForwardBlock, LayerNormalization and
# ResidualConnection refer to the classes already defined in the repo.

class EncoderBlock(nn.Module):

    def __init__(self, features: int, self_attention_block: MultiHeadAttentionBlock, feed_forward_block: FeedForwardBlock, dropout: float) -> None:
        super().__init__()
        self.norm = LayerNormalization(features)
        self.self_attention_block = self_attention_block
        self.feed_forward_block = feed_forward_block
        self.residual_connections = nn.ModuleList([ResidualConnection(features, dropout) for _ in range(2)])

    def forward(self, x, src_mask):
        # Self-attention sublayer: residual add, then normalize (post-norm).
        x = self.residual_connections[0](x, lambda x: self.self_attention_block(x, x, x, src_mask))
        x = self.norm(x)
        # Feed-forward sublayer: residual add, then normalize again.
        x = self.residual_connections[1](x, self.feed_forward_block)
        return self.norm(x)


class Encoder(nn.Module):

    def __init__(self, features: int, layers: nn.ModuleList) -> None:
        super().__init__()
        self.layers = layers

    def forward(self, x, mask):
        # Each block already ends with a normalization, so no extra norm is applied here.
        for layer in self.layers:
            x = layer(x, mask)
        return x
```
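For the pseudo code above to match the paper's LayerNorm(x + Sublayer(x)) ordering, the residual wrapper must not normalize internally, since the norm is applied explicitly after each call. Below is a minimal sketch of such a wrapper under that assumption; the name `PostNormResidualConnection` and its signature are illustrative, not taken from the repo:

```python
import torch.nn as nn

class PostNormResidualConnection(nn.Module):
    """Hypothetical residual wrapper: adds the (dropout-regularized) sublayer
    output to its input; the caller applies LayerNorm afterwards, as in the
    pseudo code above."""

    def __init__(self, features: int, dropout: float) -> None:
        super().__init__()
        # `features` is kept only for signature compatibility with
        # ResidualConnection(features, dropout); no parameters are needed here.
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, sublayer):
        # x + Dropout(Sublayer(x)); normalization happens outside, after this call.
        return x + self.dropout(sublayer(x))
```

With a wrapper like this, each `EncoderBlock` computes LayerNorm(x + Sublayer(x)) for both the attention and the feed-forward sublayers, which is the post-norm ordering described in the paper.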