Merge pull request #11 from Helsinki-NLP/fix/sanitizing

Sanitization / refactoring

shaoxiongji authored Sep 26, 2023
2 parents 470d466 + 1177ba8 commit 22fea93
Showing 152 changed files with 2,309 additions and 5,605 deletions.
2 changes: 1 addition & 1 deletion build_vocab.py
@@ -1,5 +1,5 @@
#!/usr/bin/env python
-from onmt.bin.build_vocab import main
+from mammoth.bin.build_vocab import main


if __name__ == "__main__":
2 changes: 1 addition & 1 deletion docs/source/CONTRIBUTING.md
@@ -5,7 +5,7 @@ OpenNMT-py is a community developed project and we love developer contributions.
## Guidelines
Before sending a PR, please do this checklist first:

-- Please run `onmt/tests/pull_request_chk.sh` and fix any errors. When adding new functionality, also add tests to this script. Included checks:
+- Please run `mammoth/tests/pull_request_chk.sh` and fix any errors. When adding new functionality, also add tests to this script. Included checks:
1. flake8 check for coding style;
2. unittest;
3. continuous integration tests listed in `.travis.yml`.
37 changes: 0 additions & 37 deletions docs/source/FAQ.md

This file was deleted.

11 changes: 5 additions & 6 deletions docs/source/attention_bridges.md
@@ -1,7 +1,7 @@

# Attention Bridge

-The embeddings are generated through the self-attention mechanism ([Attention Bridge](./onmt/attention_bridge.py)) of the encoder and establish a connection with language-specific decoders that focus their attention on these embeddings. This is why they are referred to as 'bridges'. This architectural element serves to link the encoded information with the decoding process, enhancing the flow of information between different stages of language processing.
+The embeddings are generated through the self-attention mechanism ([Attention Bridge](./mammoth/modules/attention_bridge.py)) of the encoder and establish a connection with language-specific decoders that focus their attention on these embeddings. This is why they are referred to as 'bridges'. This architectural element serves to link the encoded information with the decoding process, enhancing the flow of information between different stages of language processing.

There are five types of attention mechanism implemented:

@@ -61,7 +61,7 @@ The `PerceiverAttentionBridgeLayer` involves a multi-headed dot product self-att

3. **Linear Layer**: After normalization, the data is fed into a linear layer. This linear transformation can be seen as a learned projection of the attention-weighted data into a new space.

4. **ReLU Activation**: The output of the linear layer undergoes the Rectified Linear Unit (ReLU) activation function.

5. **Linear Layer (Second)**: Another linear layer is applied to the ReLU-activated output.
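
As a reading aid, here is a minimal PyTorch sketch of steps 3–5 (the feed-forward part of the layer). The names and sizes are hypothetical, chosen only for illustration; this is not the MAMMOTH implementation:

```python
import torch
import torch.nn as nn

d_model, d_ff = 512, 2048  # hypothetical dimensions

# Steps 3-5: linear projection -> ReLU -> second linear projection.
feed_forward = nn.Sequential(
    nn.Linear(d_model, d_ff),  # 3. learned projection into a new space
    nn.ReLU(),                 # 4. ReLU activation
    nn.Linear(d_ff, d_model),  # 5. second linear layer
)

x = torch.randn(10, d_model)   # e.g. 10 normalized attention outputs
out = feed_forward(x)          # shape: (10, d_model)
```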

@@ -72,11 +72,11 @@ The `PerceiverAttentionBridgeLayer` involves a multi-headed dot product self-att
The process described involves dot product self-attention. The steps are as follows:

1. **Input Transformation**: Given an input matrix $\mathbf{H} \in \mathbb{R}^{n \times d_h}$, two sets of learned weight matrices are used to transform the input. These weight matrices are $\mathbf{W}_1 \in \mathbb{R}^{d_h \times d_a}$ and $\mathbf{W}_2 \in \mathbb{R}^{d_h \times d_a}$. Multiplying $\mathbf{H}$ by $\mathbf{W}_1$ and $\mathbf{W}_2$ produces the matrices $\mathbf{V}$ and $\mathbf{K}$, respectively:

- $\mathbf{V} = \mathbf{H} \mathbf{W}_1$
- $\mathbf{K} = \mathbf{H} \mathbf{W}_2$

2. **Attention Calculation**: The core attention calculation involves three matrices: $\mathbf{Q} \in \mathbb{R}^{n \times d_a}$ and the previously computed $\mathbf{K}$ and $\mathbf{V}$. The dot product of $\mathbf{Q}$ and $\mathbf{K}^\top$ is divided by the square root of the dimensionality of the input features ($\sqrt{d_h}$).
The final attended output is calculated by multiplying the attention weights with the $\mathbf{V}$ matrix: $\mathbf{H}^\prime = \operatorname{Softmax}(\frac{\mathbf{Q}\mathbf{K}^\top}{\sqrt{d_h}})\mathbf{V}$
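
A minimal PyTorch sketch of this attention calculation, with hypothetical shapes (an illustration under the definitions above, not the MAMMOTH code):

```python
import torch
import torch.nn.functional as F

# Hypothetical sizes: n positions, hidden size d_h, attention size d_a.
n, d_h, d_a = 10, 512, 64

H = torch.randn(n, d_h)     # input matrix H
W1 = torch.randn(d_h, d_a)  # learned weights producing V
W2 = torch.randn(d_h, d_a)  # learned weights producing K
Q = torch.randn(n, d_a)     # query matrix

V = H @ W1                  # V = H W_1
K = H @ W2                  # K = H W_2

# H' = softmax(Q K^T / sqrt(d_h)) V
attn_weights = F.softmax(Q @ K.T / d_h ** 0.5, dim=-1)
H_prime = attn_weights @ V  # attended output, shape (n, d_a)
```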


@@ -86,5 +86,4 @@ The TransformerEncoderLayer employs multi-headed dot product self-attention (by

## FeedForwardAttentionBridgeLayer

The `FeedForwardAttentionBridgeLayer` module applies a sequence of linear transformations and `ReLU` activations to the input data, followed by an attention bridge normalization, enhancing the connectivity between different parts of the model.
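
For intuition, a minimal PyTorch sketch of such a layer; the class name, the number of linear layers, and the use of `LayerNorm` as the normalization are assumptions for illustration, not the MAMMOTH code:

```python
import torch.nn as nn

class FeedForwardBridgeSketch(nn.Module):
    """Hypothetical sketch: linear/ReLU stack followed by a normalization."""

    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
            nn.ReLU(),
        )
        # Stand-in for the attention bridge normalization.
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        return self.norm(self.ff(x))
```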
10 changes: 5 additions & 5 deletions docs/source/index.rst
@@ -38,8 +38,8 @@ Contents
:caption: API
:maxdepth: 2

-onmt.rst
-onmt.modules.rst
-onmt.translation.rst
-onmt.translate.translation_server.rst
-onmt.inputters.rst
+mammoth.rst
+mammoth.modules.rst
+mammoth.translation.rst
+mammoth.translate.translation_server.rst
+mammoth.inputters.rst
20 changes: 20 additions & 0 deletions docs/source/mammoth.inputters.rst
@@ -0,0 +1,20 @@
Data Loaders
=================

Data Readers
-------------

.. autoexception:: mammoth.inputters.datareader_base.MissingDependencyException

.. autoclass:: mammoth.inputters.DataReaderBase
:members:

.. autoclass:: mammoth.inputters.TextDataReader
:members:


Dataset
--------

.. autoclass:: mammoth.inputters.Dataset
:members:
109 changes: 109 additions & 0 deletions docs/source/mammoth.modules.rst
@@ -0,0 +1,109 @@
Modules
=============

Core Modules
------------

.. autoclass:: mammoth.modules.Embeddings
:members:


Encoders
---------

.. autoclass:: mammoth.encoders.EncoderBase
:members:

.. autoclass:: mammoth.encoders.MeanEncoder
:members:

.. autoclass:: mammoth.encoders.RNNEncoder
:members:


Decoders
---------


.. autoclass:: mammoth.decoders.DecoderBase
:members:

.. autoclass:: mammoth.decoders.decoder.RNNDecoderBase
:members:

.. autoclass:: mammoth.decoders.StdRNNDecoder
:members:

.. autoclass:: mammoth.decoders.InputFeedRNNDecoder
:members:

Attention
----------

.. autoclass:: mammoth.modules.AverageAttention
:members:

.. autoclass:: mammoth.modules.GlobalAttention
:members:



Architecture: Transformer
----------------------------

.. autoclass:: mammoth.modules.PositionalEncoding
:members:

.. autoclass:: mammoth.modules.position_ffn.PositionwiseFeedForward
:members:

.. autoclass:: mammoth.encoders.TransformerEncoder
:members:

.. autoclass:: mammoth.decoders.TransformerDecoder
:members:

.. autoclass:: mammoth.modules.MultiHeadedAttention
:members:
:undoc-members:


Architecture: Conv2Conv
----------------------------

(These methods are from a user contribution
and have not been thoroughly tested.)


.. autoclass:: mammoth.encoders.CNNEncoder
:members:


.. autoclass:: mammoth.decoders.CNNDecoder
:members:

.. autoclass:: mammoth.modules.ConvMultiStepAttention
:members:

.. autoclass:: mammoth.modules.WeightNormConv2d
:members:

Architecture: SRU
----------------------------

.. autoclass:: mammoth.models.sru.SRU
:members:


Copy Attention
--------------

.. autoclass:: mammoth.modules.CopyGenerator
:members:


Structured Attention
-------------------------------------------

.. autoclass:: mammoth.modules.structured_attention.MatrixTree
:members:
32 changes: 32 additions & 0 deletions docs/source/mammoth.rst
@@ -0,0 +1,32 @@
Framework
=================

Model
-----

.. autoclass:: mammoth.models.NMTModel
:members:

Trainer
-------

.. autoclass:: mammoth.Trainer
:members:


.. autoclass:: mammoth.utils.Statistics
:members:

Loss
----


.. autoclass:: mammoth.utils.loss.LossComputeBase
:members:


Optimizer
---------

.. autoclass:: mammoth.utils.Optimizer
:members:
21 changes: 21 additions & 0 deletions docs/source/mammoth.translate.translation_server.rst
@@ -0,0 +1,21 @@
Server
======


Models
-------------

.. autoclass:: mammoth.translate.translation_server.ServerModel
:members:


Core Server
------------

.. autoexception:: mammoth.translate.translation_server.ServerModelError

.. autoclass:: mammoth.translate.translation_server.Timer
:members:

.. autoclass:: mammoth.translate.translation_server.TranslationServer
:members:
39 changes: 39 additions & 0 deletions docs/source/mammoth.translation.rst
@@ -0,0 +1,39 @@
Translation
==================

Translations
-------------

.. autoclass:: mammoth.translate.Translation
:members:

Translator Class
-----------------

.. autoclass:: mammoth.translate.Translator
:members:

.. autoclass:: mammoth.translate.TranslationBuilder
:members:


Decoding Strategies
--------------------
.. autoclass:: mammoth.translate.DecodeStrategy
:members:

.. autoclass:: mammoth.translate.BeamSearch
:members:

.. autofunction:: mammoth.translate.greedy_search.sample_with_temperature

.. autoclass:: mammoth.translate.GreedySearch
:members:

Scoring
--------
.. autoclass:: mammoth.translate.penalties.PenaltyBuilder
:members:

.. autoclass:: mammoth.translate.GNMTGlobalScorer
:members:
20 changes: 0 additions & 20 deletions docs/source/onmt.inputters.rst

This file was deleted.
