Revert "Sanitization / refactoring"
shaoxiongji authored Sep 26, 2023
1 parent 22fea93 commit d858d67
Showing 152 changed files with 5,605 additions and 2,309 deletions.
2 changes: 1 addition & 1 deletion build_vocab.py
@@ -1,5 +1,5 @@
#!/usr/bin/env python
-from mammoth.bin.build_vocab import main
+from onmt.bin.build_vocab import main


if __name__ == "__main__":
2 changes: 1 addition & 1 deletion docs/source/CONTRIBUTING.md
@@ -5,7 +5,7 @@ OpenNMT-py is a community developed project and we love developer contributions.
## Guidelines
Before sending a PR, please do this checklist first:

-- Please run `mammoth/tests/pull_request_chk.sh` and fix any errors. When adding new functionality, also add tests to this script. Included checks:
+- Please run `onmt/tests/pull_request_chk.sh` and fix any errors. When adding new functionality, also add tests to this script. Included checks:
1. flake8 check for coding style;
2. unittest;
3. continuous integration tests listed in `.travis.yml`.
37 changes: 37 additions & 0 deletions docs/source/FAQ.md
@@ -0,0 +1,37 @@
# Questions

## What is the intuition behind the fixed-length memory bank?
Specifically, for `lin`, the intuition behind the structured attention is to replace pooling over the hidden representations with multi-hop attentive representations of fixed length. What is the benefit of transforming source sequence representations into a fixed-length memory bank?

It pushes the model to be more language-agnostic: sentence length tends to be language-dependent. For example, French tends to produce longer sentences than English.
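
As an illustration, here is a minimal sketch of how structured attention can pool a variable-length sequence of encoder states into a fixed-size memory bank. All names and dimensions are hypothetical, not the repository's actual code:

```python
import torch

# Hypothetical dimensions: seq_len varies per sentence, num_hops is fixed.
seq_len, d_h, num_hops = 37, 512, 10
H = torch.randn(seq_len, d_h)    # encoder states, one row per token
W = torch.randn(d_h, num_hops)   # one learned attention column per hop
A = torch.softmax(H @ W, dim=0)  # (seq_len, num_hops), normalized over tokens
M = A.T @ H                      # memory bank: always (num_hops, d_h)
```

Whatever the source length, `M` always has `num_hops` rows, which is what makes the pooled representation language-agnostic in size.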

Does the attention in the attention bridge act as an enhancement of the encoder? Will the attention bridge bring any benefits to the decoders?
1. If we view the attention bridge as part of the encoder, will the overall model be a partially shared encoder (separate lower layers and a shared attention bridge) + separate decoders?

2. If the shared attention is viewed as part of the encoder for many2one translation and as part of the decoder for one2many translation, does the shared attention module encode some language-independent information to enhance encoding or decoding?

## Models are saved with an encoder, a decoder, and a generator. What is the generator?
The generator contains a Linear layer plus an activation (softmax or sparsesoftmax).

### Why do we need to save the "generator" separately?
It seems unnecessary to separate out the generator, since activation functions do not contain trainable parameters.
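
As a minimal sketch (with hypothetical `hidden_size` and `vocab_size`; not the repository's actual module), the generator might look like:

```python
import torch.nn as nn

hidden_size, vocab_size = 512, 32000  # hypothetical sizes

# The Linear projection holds trainable weights; the softmax itself does not.
generator = nn.Sequential(
    nn.Linear(hidden_size, vocab_size),
    nn.Softmax(dim=-1),
)
```

Saving it separately still records the Linear layer's weights, even though the activation is parameter-free.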


## What is the difference between `intermediate_output` and `encoder_output`? [🔗](./onmt/attention_bridge.py#L91)

`intermediate_output` is the intermediate output of the stacked n-layered attention bridge. `encoder_output` is literally the output of the encoder, which is reused across the n layers of `PerceiverAttentionBridgeLayer`.

In `PerceiverAttentionBridgeLayer`, the encoder output is projected to a fixed length via the `latent_array`. But why?

Within `PerceiverAttentionBridgeLayer`, `intermediate_output` and `encoder_output` are used as follows:

```python
# encoder_output is shaped (seq_len, batch, features)
S, B, F = encoder_output.shape
if intermediate_output is not None:
    # later layers: query with the previous layer's output
    cross_query = intermediate_output
else:
    # first layer: query with the learned fixed-length latent array
    cross_query = self.latent_array.unsqueeze(0).expand(B, -1, -1)
# switch to batch-first layout for the cross-attention call
encoder_output = encoder_output.transpose(0, 1)
```
11 changes: 6 additions & 5 deletions docs/source/attention_bridges.md
@@ -1,7 +1,7 @@

# Attention Bridge

-The embeddings are generated through the self-attention mechanism ([Attention Bridge](./mammoth/modules/attention_bridge.py)) of the encoder and establish a connection with language-specific decoders that focus their attention on these embeddings. This is why they are referred to as 'bridges'. This architectural element serves to link the encoded information with the decoding process, enhancing the flow of information between different stages of language processing.
+The embeddings are generated through the self-attention mechanism ([Attention Bridge](./onmt/attention_bridge.py)) of the encoder and establish a connection with language-specific decoders that focus their attention on these embeddings. This is why they are referred to as 'bridges'. This architectural element serves to link the encoded information with the decoding process, enhancing the flow of information between different stages of language processing.

There are five types of attention mechanism implemented:

@@ -61,7 +61,7 @@ The `PerceiverAttentionBridgeLayer` involves a multi-headed dot product self-att

3. **Linear Layer**: After normalization, the data is fed into a linear layer. This linear transformation can be seen as a learned projection of the attention-weighted data into a new space.

-4. **ReLU Activation**: The output of the linear layer undergoes the Rectified Linear Unit (ReLU) activation function.
+4. **ReLU Activation**: The output of the linear layer undergoes the Rectified Linear Unit (ReLU) activation function.

5. **Linear Layer (Second)**: Another linear layer is applied to the ReLU-activated output.

@@ -72,11 +72,11 @@ The `PerceiverAttentionBridgeLayer` involves a multi-headed dot product self-att
The process described involves dot product self-attention. The steps are as follows:

1. **Input Transformation**: Given an input matrix $\mathbf{H} \in \mathbb{R}^{d_h \times n}$, two sets of learned weight matrices are used to transform the input. These weight matrices are $\mathbf{W}_1 \in \mathbb{R}^{d_h \times d_a}$ and $\mathbf{W}_2 \in \mathbb{R}^{d_h \times d_a}$. The multiplication of $\mathbf{H}$ with $\mathbf{W}_1$ and $\mathbf{W}_2$ produces matrices $\mathbf{V}$ and $\mathbf{K}$, respectively:

- $\mathbf{V} = \mathbf{H} \mathbf{W}_1$
- $\mathbf{K} = \mathbf{H} \mathbf{W}_2$

-2. **Attention Calculation**: The core attention calculation involves three matrices: $\mathbf{Q} \in \mathbb{R}^{d_h \times n}$, $\mathbf{K}$ (calculated previously), and $\mathbf{V}$ (calculated previously). The dot product of $\mathbf{Q}$ and $\mathbf{K}^\top$ is divided by the square root of the dimensionality of the input features ($\sqrt{d_h}$).
+2. **Attention Calculation**: The core attention calculation involves three matrices: $\mathbf{Q} \in \mathbb{R}^{d_h \times n}$, $\mathbf{K}$ (calculated previously), and $\mathbf{V}$ (calculated previously). The dot product of $\mathbf{Q}$ and $\mathbf{K}^\top$ is divided by the square root of the dimensionality of the input features ($\sqrt{d_h}$).
The final attended output is calculated by multiplying the attention weights with the $\mathbf{V}$ matrix: $\mathbf{H}^\prime = \operatorname{Softmax}(\frac{\mathbf{Q}\mathbf{K}^\top}{\sqrt{d_h}})\mathbf{V}$
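
The same calculation in a few lines of PyTorch, written row-major ($n$ rows of dimension $d_h$) with random stand-ins for the learned matrices; this is an illustration, not the repository's implementation:

```python
import torch

n, d_h, d_a = 20, 512, 64   # hypothetical sizes
H = torch.randn(n, d_h)     # input representations
W1 = torch.randn(d_h, d_a)  # learned weights producing V
W2 = torch.randn(d_h, d_a)  # learned weights producing K
V = H @ W1                  # (n, d_a)
K = H @ W2                  # (n, d_a)
Q = torch.randn(n, d_a)     # query matrix, a stand-in here

attn = torch.softmax(Q @ K.T / d_h ** 0.5, dim=-1)  # scaled dot product
H_prime = attn @ V          # attended output, (n, d_a)
```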


Expand All @@ -86,4 +86,5 @@ The TransformerEncoderLayer employs multi-headed dot product self-attention (by

## FeedForwardAttentionBridgeLayer

-The `FeedForwardAttentionBridgeLayer` module applies a sequence of linear transformations and `ReLU` activations to the input data, followed by an attention bridge normalization, enhancing the connectivity between different parts of the model.
+The `FeedForwardAttentionBridgeLayer` module applies a sequence of linear transformations and `ReLU` activations to the input data, followed by an attention bridge normalization, enhancing the connectivity between different parts of the model.

10 changes: 5 additions & 5 deletions docs/source/index.rst
@@ -38,8 +38,8 @@ Contents
:caption: API
:maxdepth: 2

-   mammoth.rst
-   mammoth.modules.rst
-   mammoth.translation.rst
-   mammoth.translate.translation_server.rst
-   mammoth.inputters.rst
+   onmt.rst
+   onmt.modules.rst
+   onmt.translation.rst
+   onmt.translate.translation_server.rst
+   onmt.inputters.rst
20 changes: 0 additions & 20 deletions docs/source/mammoth.inputters.rst

This file was deleted.

109 changes: 0 additions & 109 deletions docs/source/mammoth.modules.rst

This file was deleted.

32 changes: 0 additions & 32 deletions docs/source/mammoth.rst

This file was deleted.

21 changes: 0 additions & 21 deletions docs/source/mammoth.translate.translation_server.rst

This file was deleted.

39 changes: 0 additions & 39 deletions docs/source/mammoth.translation.rst

This file was deleted.

20 changes: 20 additions & 0 deletions docs/source/onmt.inputters.rst
@@ -0,0 +1,20 @@
Data Loaders
=================

Data Readers
-------------

.. autoexception:: onmt.inputters.datareader_base.MissingDependencyException

.. autoclass:: onmt.inputters.DataReaderBase
:members:

.. autoclass:: onmt.inputters.TextDataReader
:members:


Dataset
--------

.. autoclass:: onmt.inputters.Dataset
:members: