Revert "Sanitization / refactoring"
shaoxiongji authored Sep 26, 2023
1 parent 22fea93 commit d858d67
Showing 152 changed files with 5,605 additions and 2,309 deletions.
2 changes: 1 addition & 1 deletion build_vocab.py
@@ -1,5 +1,5 @@
#!/usr/bin/env python
-from mammoth.bin.build_vocab import main
+from onmt.bin.build_vocab import main


if __name__ == "__main__":
2 changes: 1 addition & 1 deletion docs/source/CONTRIBUTING.md
@@ -5,7 +5,7 @@ OpenNMT-py is a community developed project and we love developer contributions.
## Guidelines
Before sending a PR, please do this checklist first:

-- Please run `mammoth/tests/pull_request_chk.sh` and fix any errors. When adding new functionality, also add tests to this script. Included checks:
+- Please run `onmt/tests/pull_request_chk.sh` and fix any errors. When adding new functionality, also add tests to this script. Included checks:
1. flake8 check for coding style;
2. unittest;
3. continuous integration tests listed in `.travis.yml`.
37 changes: 37 additions & 0 deletions docs/source/FAQ.md
@@ -0,0 +1,37 @@
# Questions

## What is the intuition behind the fixed-length memory bank?
Specifically, for `lin`, the intuition behind the structured attention is to replace pooling over the hidden representations with multi-hop attentive representations of fixed length. What is the benefit of transforming source sequence representations into a fixed-length memory bank?

It pushes the model to be more language-agnostic: sentence length tends to be language-dependent. For example, French tends to produce longer sentences than English.
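
As an illustration, here is a minimal sketch of how structured attention can pool a variable-length sequence of encoder states into a fixed-size memory bank. All names and dimensions are hypothetical, not the repository's actual code:

```python
import torch

# Hypothetical dimensions: seq_len varies per sentence, num_hops is fixed.
seq_len, d_h, num_hops = 37, 512, 10
H = torch.randn(seq_len, d_h)    # encoder states, one row per token
W = torch.randn(d_h, num_hops)   # one learned attention column per hop
A = torch.softmax(H @ W, dim=0)  # (seq_len, num_hops), normalized over tokens
M = A.T @ H                      # memory bank: always (num_hops, d_h)
```

Whatever the source length, `M` always has `num_hops` rows, which is what makes the pooled representation language-agnostic in size.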

Does the attention in the attention bridge act as an enhancement of the encoder? Will the attention bridge bring any benefits to the decoders?
1. If we view the attention bridge as part of the encoder, will the overall model be a partially shared encoder (separate lower layers and a shared attention bridge) + separate decoders?

2. If the shared attention is viewed as part of the encoder for many2one translation and as part of the decoder for one2many translation, does the shared attention module encode some language-independent information to enhance encoding or decoding?

## Models are saved with an encoder, a decoder, and a generator. What is the generator?
The generator contains a Linear layer plus an activation (softmax or sparsesoftmax).

### Why do we need to save the "generator" separately?
It seems unnecessary to separate out the generator, since activation functions do not contain trainable parameters.
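
As a minimal sketch (with hypothetical `hidden_size` and `vocab_size`; not the repository's actual module), the generator might look like:

```python
import torch.nn as nn

hidden_size, vocab_size = 512, 32000  # hypothetical sizes

# The Linear projection holds trainable weights; the softmax itself does not.
generator = nn.Sequential(
    nn.Linear(hidden_size, vocab_size),
    nn.Softmax(dim=-1),
)
```

Saving it separately still records the Linear layer's weights, even though the activation is parameter-free.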


## What is the difference between `intermediate_output` and `encoder_output`? [🔗](./onmt/attention_bridge.py#L91)

`intermediate_output` is the intermediate output of the stacked n-layered attention bridge. `encoder_output` is literally the output of the encoder, which is reused across the n layers of `PerceiverAttentionBridgeLayer`.

In `PerceiverAttentionBridgeLayer`, the encoder output is projected to a fixed length via the `latent_array`. But why?

Within `PerceiverAttentionBridgeLayer`, `intermediate_output` and `encoder_output` are used as follows:

```python
# encoder_output is shaped (seq_len, batch, features)
S, B, F = encoder_output.shape
if intermediate_output is not None:
    # later layers: query with the previous layer's output
    cross_query = intermediate_output
else:
    # first layer: query with the learned fixed-length latent array
    cross_query = self.latent_array.unsqueeze(0).expand(B, -1, -1)
# switch to batch-first layout for the cross-attention call
encoder_output = encoder_output.transpose(0, 1)
```
11 changes: 6 additions & 5 deletions docs/source/attention_bridges.md
@@ -1,7 +1,7 @@

# Attention Bridge

-The embeddings are generated through the self-attention mechanism ([Attention Bridge](./mammoth/modules/attention_bridge.py)) of the encoder and establish a connection with language-specific decoders that focus their attention on these embeddings. This is why they are referred to as 'bridges'. This architectural element serves to link the encoded information with the decoding process, enhancing the flow of information between different stages of language processing.
+The embeddings are generated through the self-attention mechanism ([Attention Bridge](./onmt/attention_bridge.py)) of the encoder and establish a connection with language-specific decoders that focus their attention on these embeddings. This is why they are referred to as 'bridges'. This architectural element serves to link the encoded information with the decoding process, enhancing the flow of information between different stages of language processing.

There are five types of attention mechanism implemented:

@@ -61,7 +61,7 @@ The `PerceiverAttentionBridgeLayer` involves a multi-headed dot product self-att

3. **Linear Layer**: After normalization, the data is fed into a linear layer. This linear transformation can be seen as a learned projection of the attention-weighted data into a new space.

-4. **ReLU Activation**: The output of the linear layer undergoes the Rectified Linear Unit (ReLU) activation function.
+4. **ReLU Activation**: The output of the linear layer undergoes the Rectified Linear Unit (ReLU) activation function.

5. **Linear Layer (Second)**: Another linear layer is applied to the ReLU-activated output.

@@ -72,11 +72,11 @@ The `PerceiverAttentionBridgeLayer` involves a multi-headed dot product self-att
The process described involves dot product self-attention. The steps are as follows:

1. **Input Transformation**: Given an input matrix $\mathbf{H} \in \mathbb{R}^{d_h \times n}$, two sets of learned weight matrices are used to transform the input. These weight matrices are $\mathbf{W}_1 \in \mathbb{R}^{d_h \times d_a}$ and $\mathbf{W}_2 \in \mathbb{R}^{d_h \times d_a}$. The multiplication of $\mathbf{H}$ with $\mathbf{W}_1$ and $\mathbf{W}_2$ produces matrices $\mathbf{V}$ and $\mathbf{K}$, respectively:

- $\mathbf{V} = \mathbf{H} \mathbf{W}_1$
- $\mathbf{K} = \mathbf{H} \mathbf{W}_2$

-2. **Attention Calculation**: The core attention calculation involves three matrices: $\mathbf{Q} \in \mathbb{R}^{d_h \times n}$, $\mathbf{K}$ (calculated previously), and $\mathbf{V}$ (calculated previously). The dot product of $\mathbf{Q}$ and $\mathbf{K}^\top$ is divided by the square root of the dimensionality of the input features ($\sqrt{d_h}$).
+2. **Attention Calculation**: The core attention calculation involves three matrices: $\mathbf{Q} \in \mathbb{R}^{d_h \times n}$, $\mathbf{K}$ (calculated previously), and $\mathbf{V}$ (calculated previously). The dot product of $\mathbf{Q}$ and $\mathbf{K}^\top$ is divided by the square root of the dimensionality of the input features ($\sqrt{d_h}$).
The final attended output is calculated by multiplying the attention weights with the $\mathbf{V}$ matrix: $\mathbf{H}^\prime = \operatorname{Softmax}(\frac{\mathbf{Q}\mathbf{K}^\top}{\sqrt{d_h}})\mathbf{V}$
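
The same calculation in a few lines of PyTorch, written row-major ($n$ rows of dimension $d_h$) with random stand-ins for the learned matrices; this is an illustration, not the repository's implementation:

```python
import torch

n, d_h, d_a = 20, 512, 64   # hypothetical sizes
H = torch.randn(n, d_h)     # input representations
W1 = torch.randn(d_h, d_a)  # learned weights producing V
W2 = torch.randn(d_h, d_a)  # learned weights producing K
V = H @ W1                  # (n, d_a)
K = H @ W2                  # (n, d_a)
Q = torch.randn(n, d_a)     # query matrix, a stand-in here

attn = torch.softmax(Q @ K.T / d_h ** 0.5, dim=-1)  # scaled dot product
H_prime = attn @ V          # attended output, (n, d_a)
```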


Expand All @@ -86,4 +86,5 @@ The TransformerEncoderLayer employs multi-headed dot product self-attention (by

## FeedForwardAttentionBridgeLayer

-The `FeedForwardAttentionBridgeLayer` module applies a sequence of linear transformations and `ReLU` activations to the input data, followed by an attention bridge normalization, enhancing the connectivity between different parts of the model.
+The `FeedForwardAttentionBridgeLayer` module applies a sequence of linear transformations and `ReLU` activations to the input data, followed by an attention bridge normalization, enhancing the connectivity between different parts of the model.

10 changes: 5 additions & 5 deletions docs/source/index.rst
@@ -38,8 +38,8 @@ Contents
:caption: API
:maxdepth: 2

-   mammoth.rst
-   mammoth.modules.rst
-   mammoth.translation.rst
-   mammoth.translate.translation_server.rst
-   mammoth.inputters.rst
+   onmt.rst
+   onmt.modules.rst
+   onmt.translation.rst
+   onmt.translate.translation_server.rst
+   onmt.inputters.rst
20 changes: 0 additions & 20 deletions docs/source/mammoth.inputters.rst

This file was deleted.

109 changes: 0 additions & 109 deletions docs/source/mammoth.modules.rst

This file was deleted.

32 changes: 0 additions & 32 deletions docs/source/mammoth.rst

This file was deleted.

21 changes: 0 additions & 21 deletions docs/source/mammoth.translate.translation_server.rst

This file was deleted.

39 changes: 0 additions & 39 deletions docs/source/mammoth.translation.rst

This file was deleted.

20 changes: 20 additions & 0 deletions docs/source/onmt.inputters.rst
@@ -0,0 +1,20 @@
Data Loaders
=================

Data Readers
-------------

.. autoexception:: onmt.inputters.datareader_base.MissingDependencyException

.. autoclass:: onmt.inputters.DataReaderBase
:members:

.. autoclass:: onmt.inputters.TextDataReader
:members:


Dataset
--------

.. autoclass:: onmt.inputters.Dataset
:members: