Commit
docs fixing
Mickus Timothee committed Sep 19, 2023
1 parent e2d77ac commit d783503
Showing 19 changed files with 302 additions and 302 deletions.
2 changes: 1 addition & 1 deletion docs/source/CONTRIBUTING.md
@@ -5,7 +5,7 @@ OpenNMT-py is a community developed project and we love developer contributions.
## Guidelines
Before sending a PR, please do this checklist first:

- Please run `onmt/tests/pull_request_chk.sh` and fix any errors. When adding new functionality, also add tests to this script. Included checks:
- Please run `mammoth/tests/pull_request_chk.sh` and fix any errors. When adding new functionality, also add tests to this script. Included checks:
1. flake8 check for coding style;
2. unittest;
3. continuous integration tests listed in `.travis.yml`.
22 changes: 11 additions & 11 deletions docs/source/FAQ.md
@@ -236,7 +236,7 @@ Note: all the details about every flag and options for each transform can be fou

Transform name: `filtertoolong`

Class: `onmt.transforms.misc.FilterTooLongTransform`
Class: `mammoth.transforms.misc.FilterTooLongTransform`

The following options can be added to the configuration:
- `src_seq_length`: maximum source sequence length;
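
For illustration, these options might be wired into a YAML configuration as in the sketch below. The corpus name, file paths, and length limits are placeholders, and `tgt_seq_length` (the target-side counterpart, collapsed in this excerpt) is assumed:

```yaml
# Illustrative sketch only — names, paths, and limits are placeholders.
data:
    corpus_1:
        path_src: data/src-train.txt
        path_tgt: data/tgt-train.txt
        transforms: [filtertoolong]

# Drop examples whose source (or, assumed, target) side exceeds these lengths.
src_seq_length: 192
tgt_seq_length: 192
```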
@@ -246,7 +246,7 @@ The following options can be added to the configuration:

Transform name: `prefix`

Class: `onmt.transforms.misc.PrefixTransform`
Class: `mammoth.transforms.misc.PrefixTransform`

For each dataset that the `prefix` transform is applied to, you can set the additional `src_prefix` and `tgt_prefix` parameters in its data configuration:
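
For illustration, such a data configuration might look like the following sketch; the corpus name, paths, and prefix strings are placeholders:

```yaml
# Illustrative sketch only — corpus name, paths, and prefixes are placeholders.
data:
    corpus_1:
        path_src: data/src-train.txt
        path_tgt: data/tgt-train.txt
        transforms: [prefix]
        src_prefix: "__src_tag__"
        tgt_prefix: "__tgt_tag__"
```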

@@ -276,7 +276,7 @@ Common options for the tokenization transforms are the following:

Transform name: `onmt_tokenize`

Class: `onmt.transforms.tokenize.ONMTTokenizerTransform`
Class: `mammoth.transforms.tokenize.ONMTTokenizerTransform`

Additional options are available:
- `src_subword_type`: type of subword model for source side (from `["none", "sentencepiece", "bpe"]`);
@@ -288,15 +288,15 @@ Additional options are available:

Transform name: `sentencepiece`

Class: `onmt.transforms.tokenize.SentencePieceTransform`
Class: `mammoth.transforms.tokenize.SentencePieceTransform`

The `src_subword_model` and `tgt_subword_model` should be valid sentencepiece models.
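
A minimal sketch of the corresponding configuration keys (the model paths are placeholders):

```yaml
# Illustrative sketch — replace the paths with real SentencePiece models.
transforms: [sentencepiece]
src_subword_model: path/to/src_spm.model
tgt_subword_model: path/to/tgt_spm.model
```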

#### BPE ([subword-nmt](https://github.com/rsennrich/subword-nmt))

Transform name: `bpe`

Class: `onmt.transforms.tokenize.BPETransform`
Class: `mammoth.transforms.tokenize.BPETransform`

The `src_subword_model` and `tgt_subword_model` should be valid BPE models.

@@ -321,7 +321,7 @@ These different types of noise can be controlled with the following options:

Transform name: `switchout`

Class: `onmt.transforms.sampling.SwitchOutTransform`
Class: `mammoth.transforms.sampling.SwitchOutTransform`

Options:

@@ -331,7 +331,7 @@ Options:

Transform name: `tokendrop`

Class: `onmt.transforms.sampling.TokenDropTransform`
Class: `mammoth.transforms.sampling.TokenDropTransform`

Options:

@@ -341,7 +341,7 @@ Options:

Transform name: `tokenmask`

Class: `onmt.transforms.sampling.TokenMaskTransform`
Class: `mammoth.transforms.sampling.TokenMaskTransform`

Options:
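
The option lists for these sampling transforms are collapsed in this excerpt. As an assumption carried over from upstream OpenNMT-py, each transform exposes a sampling temperature, which might be configured as:

```yaml
# Assumed option names (inherited from OpenNMT-py) — verify against mammoth/opts.py.
transforms: [switchout, tokendrop, tokenmask]
switchout_temperature: 1.0
tokendrop_temperature: 1.0
tokenmask_temperature: 1.0
```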

@@ -427,7 +427,7 @@ The `example` argument of `apply` is a `dict` of the form:
}
```

This is defined in `onmt.inputters.corpus.ParallelCorpus.load`. This class is not easily extendable for now, but extending it can be considered for future development. For instance, we could create some `CustomParallelCorpus` class that would handle other kinds of input.
This is defined in `mammoth.inputters.corpus.ParallelCorpus.load`. This class is not easily extendable for now, but extending it can be considered for future development. For instance, we could create some `CustomParallelCorpus` class that would handle other kinds of input.


## Can I get word alignments while translating?
@@ -649,7 +649,7 @@ A server configuration file (`./available_models/conf.json`) is required. It con
### II. How to start the server without Docker?
---
##### 0. Get the code
The translation server has been merged into the onmt-py `master` branch.
The translation server has been merged into the mammoth-py `master` branch.
Keep in line with `master` for the latest fixes and improvements.
##### 1. Install `flask`
```bash
pip install flask
```

@@ -699,7 +699,7 @@
```
RUN pip install --no-cache-dir -r requirements.txt
COPY server.py ./
COPY tools ./tools
COPY available_models ./available_models
COPY onmt ./onmt
COPY mammoth ./mammoth
CMD ["python", "./server.py"]
```
2 changes: 1 addition & 1 deletion docs/source/examples/LanguageModelGeneration.md
@@ -106,7 +106,7 @@ Options contained in the loaded model will trigger language modeling specific in
head data/wikitext-103-raw/wiki.valid.bpe | cut -d" " -f-15 > data/wikitext-103-raw/lm_input.txt
```

To proceed with LM inference, sampling methods such as top-k sampling or nucleus sampling are usually applied. Details and options about inference methods can be found in [`onmt/opts.py`](https://github.com/OpenNMT/OpenNMT-py/tree/master/onmt/opts.py).
To proceed with LM inference, sampling methods such as top-k sampling or nucleus sampling are usually applied. Details and options about inference methods can be found in [`mammoth/opts.py`](https://github.com/OpenNMT/OpenNMT-py/tree/master/mammoth/opts.py).

The following command will provide inference with nucleus sampling of p=0.9 and return the 3 sequences with the lowest perplexity out of the 10 generated sequences:
```bash
64 changes: 32 additions & 32 deletions docs/source/examples/Library.ipynb
@@ -11,7 +11,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The example notebook (available [here](https://github.com/OpenNMT/OpenNMT-py/blob/master/docs/source/examples/Library.ipynb)) should be able to run as a standalone execution, provided `onmt` is in the path (installed via `pip` for instance).\n",
"The example notebook (available [here](https://github.com/OpenNMT/OpenNMT-py/blob/master/docs/source/examples/Library.ipynb)) should be able to run as a standalone execution, provided `mammoth` is in the path (installed via `pip` for instance).\n",
"\n",
"Some parts may not be 100% 'library-friendly' but it's mostly workable."
]
@@ -42,12 +42,12 @@
"metadata": {},
"outputs": [],
"source": [
"import onmt\n",
"from onmt.inputters.inputter import _load_vocab, _build_fields_vocab, get_fields, IterOnDevice\n",
"from onmt.inputters.corpus import ParallelCorpus\n",
"from onmt.inputters.dynamic_iterator import DynamicDatasetIter\n",
"from onmt.translate import GNMTGlobalScorer, Translator, TranslationBuilder\n",
"from onmt.utils.misc import set_random_seed"
"import mammoth\n",
"from mammoth.inputters.inputter import _load_vocab, _build_fields_vocab, get_fields, IterOnDevice\n",
"from mammoth.inputters.corpus import ParallelCorpus\n",
"from mammoth.inputters.dynamic_iterator import DynamicDatasetIter\n",
"from mammoth.translate import GNMTGlobalScorer, Translator, TranslationBuilder\n",
"from mammoth.utils.misc import set_random_seed"
]
},
{
@@ -75,7 +75,7 @@
],
"source": [
"# enable logging\n",
"from onmt.utils.logging import init_logger, logger\n",
"from mammoth.utils.logging import init_logger, logger\n",
"init_logger()"
]
},
@@ -214,7 +214,7 @@
"metadata": {},
"outputs": [],
"source": [
"from onmt.utils.parse import ArgumentParser\n",
"from mammoth.utils.parse import ArgumentParser\n",
"parser = ArgumentParser(description='build_vocab.py')"
]
},
@@ -224,7 +224,7 @@
"metadata": {},
"outputs": [],
"source": [
"from onmt.opts import dynamic_prepare_opts\n",
"from mammoth.opts import dynamic_prepare_opts\n",
"dynamic_prepare_opts(parser, build_vocab_only=True)"
]
},
@@ -279,7 +279,7 @@
}
],
"source": [
"from onmt.bin.build_vocab import build_vocab_main\n",
"from mammoth.bin.build_vocab import build_vocab_main\n",
"build_vocab_main(opts)"
]
},
@@ -382,8 +382,8 @@
{
"data": {
"text/plain": [
"{'src': <onmt.inputters.text_dataset.TextMultiField at 0x7fca93802c50>,\n",
" 'tgt': <onmt.inputters.text_dataset.TextMultiField at 0x7fca93802f60>,\n",
"{'src': <mammoth.inputters.text_dataset.TextMultiField at 0x7fca93802c50>,\n",
" 'tgt': <mammoth.inputters.text_dataset.TextMultiField at 0x7fca93802f60>,\n",
" 'indices': <torchtext.data.field.Field at 0x7fca93802940>}"
]
},
@@ -478,29 +478,29 @@
"rnn_size = 500\n",
"# Specify the core model.\n",
"\n",
"encoder_embeddings = onmt.modules.Embeddings(emb_size, len(src_vocab),\n",
"encoder_embeddings = mammoth.modules.Embeddings(emb_size, len(src_vocab),\n",
" word_padding_idx=src_padding)\n",
"\n",
"encoder = onmt.encoders.RNNEncoder(hidden_size=rnn_size, num_layers=1,\n",
"encoder = mammoth.encoders.RNNEncoder(hidden_size=rnn_size, num_layers=1,\n",
" rnn_type=\"LSTM\", bidirectional=True,\n",
" embeddings=encoder_embeddings)\n",
"\n",
"decoder_embeddings = onmt.modules.Embeddings(emb_size, len(tgt_vocab),\n",
"decoder_embeddings = mammoth.modules.Embeddings(emb_size, len(tgt_vocab),\n",
" word_padding_idx=tgt_padding)\n",
"decoder = onmt.decoders.decoder.InputFeedRNNDecoder(\n",
"decoder = mammoth.decoders.decoder.InputFeedRNNDecoder(\n",
" hidden_size=rnn_size, num_layers=1, bidirectional_encoder=True, \n",
" rnn_type=\"LSTM\", embeddings=decoder_embeddings)\n",
"\n",
"device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n",
"model = onmt.models.model.NMTModel(encoder, decoder)\n",
"model = mammoth.models.model.NMTModel(encoder, decoder)\n",
"model.to(device)\n",
"\n",
"# Specify the tgt word generator and loss computation module\n",
"model.generator = nn.Sequential(\n",
" nn.Linear(rnn_size, len(tgt_vocab)),\n",
" nn.LogSoftmax(dim=-1)).to(device)\n",
"\n",
"loss = onmt.utils.loss.NMTLossCompute(\n",
"loss = mammoth.utils.loss.NMTLossCompute(\n",
" criterion=nn.NLLLoss(ignore_index=tgt_padding, reduction=\"sum\"),\n",
" generator=model.generator)"
]
@@ -520,7 +520,7 @@
"source": [
"lr = 1\n",
"torch_optimizer = torch.optim.SGD(model.parameters(), lr=lr)\n",
"optim = onmt.utils.optimizers.Optimizer(\n",
"optim = mammoth.utils.optimizers.Optimizer(\n",
" torch_optimizer, learning_rate=lr, max_grad_norm=2)"
]
},
@@ -681,7 +681,7 @@
{
"data": {
"text/plain": [
"<onmt.utils.statistics.Statistics at 0x7fca934e8e80>"
"<mammoth.utils.statistics.Statistics at 0x7fca934e8e80>"
]
},
"execution_count": 28,
@@ -690,10 +690,10 @@
}
],
"source": [
"report_manager = onmt.utils.ReportMgr(\n",
"report_manager = mammoth.utils.ReportMgr(\n",
" report_every=50, start_time=None, tensorboard_writer=None)\n",
"\n",
"trainer = onmt.Trainer(model=model,\n",
"trainer = mammoth.Trainer(model=model,\n",
" train_loss=loss,\n",
" valid_loss=loss,\n",
" optim=optim,\n",
@@ -726,9 +726,9 @@
"metadata": {},
"outputs": [],
"source": [
"src_data = {\"reader\": onmt.inputters.str2reader[\"text\"](), \"data\": src_val}\n",
"tgt_data = {\"reader\": onmt.inputters.str2reader[\"text\"](), \"data\": tgt_val}\n",
"_readers, _data = onmt.inputters.Dataset.config(\n",
"src_data = {\"reader\": mammoth.inputters.str2reader[\"text\"](), \"data\": src_val}\n",
"tgt_data = {\"reader\": mammoth.inputters.str2reader[\"text\"](), \"data\": tgt_val}\n",
"_readers, _data = mammoth.inputters.Dataset.config(\n",
" [('src', src_data), ('tgt', tgt_data)])"
]
},
@@ -738,9 +738,9 @@
"metadata": {},
"outputs": [],
"source": [
"dataset = onmt.inputters.Dataset(\n",
"dataset = mammoth.inputters.Dataset(\n",
" vocab_fields, readers=_readers, data=_data,\n",
" sort_key=onmt.inputters.str2sortkey[\"text\"])"
" sort_key=mammoth.inputters.str2sortkey[\"text\"])"
]
},
{
@@ -749,7 +749,7 @@
"metadata": {},
"outputs": [],
"source": [
"data_iter = onmt.inputters.OrderedIterator(\n",
"data_iter = mammoth.inputters.OrderedIterator(\n",
" dataset=dataset,\n",
" device=\"cuda\",\n",
" batch_size=10,\n",
@@ -766,8 +766,8 @@
"metadata": {},
"outputs": [],
"source": [
"src_reader = onmt.inputters.str2reader[\"text\"]\n",
"tgt_reader = onmt.inputters.str2reader[\"text\"]\n",
"src_reader = mammoth.inputters.str2reader[\"text\"]\n",
"tgt_reader = mammoth.inputters.str2reader[\"text\"]\n",
"scorer = GNMTGlobalScorer(alpha=0.7, \n",
" beta=0., \n",
" length_penalty=\"avg\", \n",
@@ -779,7 +779,7 @@
" tgt_reader=tgt_reader, \n",
" global_scorer=scorer,\n",
" gpu=gpu)\n",
"builder = onmt.translate.TranslationBuilder(data=dataset, \n",
"builder = mammoth.translate.TranslationBuilder(data=dataset, \n",
" fields=vocab_fields)"
]
},
