
update docs #59

Merged: 24 commits into main, Mar 12, 2024

Conversation

shaoxiongji (Collaborator):

  • add scripts for translation
  • enhance tutorial

@TimotheeMickus (Collaborator) left a comment:

Minor changes suggested, but they're only suggestions.

Resolved review threads on:
  • docs/source/examples/sharing_schemes.md (×2)
  • docs/source/quickstart.md (×2)
@TimotheeMickus (Collaborator) left a comment:


Feels like we should have a tutorial page about distributed training.

This page should:

  1. mention that MAMMOTH was written with distributed modular training on SLURM clusters in mind
  2. explain world_size (total number of devices) / gpu_ranks (devices visible on the node) in the configs and node_gpu (device on which a task is run) in the task configs (see the config sketch after this list)
  3. provide a wrapper, and explain
    • why it's necessary (i.e., so that we can set variables specific to a SLURM node)
    • and how it works (arguments declared inside wrapper.sh are not evaluated until the wrapper is run on a node, so they are node-specific; arguments declared outside of the wrapper are evaluated globally and first, so they are shared by all nodes).
  4. provide examples that work in multi-GPU and multi-node settings.
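
A rough sketch of what such a page could show, assuming a two-node setup; only the key names world_size, gpu_ranks, and node_gpu come from the points above, the node_gpu string format and all values are illustrative:

```yaml
# Hypothetical excerpt from my_config.yaml: 2 nodes with 2 GPUs each, 4 devices in total
world_size: 4        # total number of devices across all nodes
gpu_ranks: [0, 1]    # devices visible on the node

tasks:
  train_en-de:
    node_gpu: "0:0"  # run this task on node 0, GPU 0
  train_en-fr:
    node_gpu: "1:1"  # run this task on node 1, GPU 1
```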

We might also want a page about how to define a task, and how this links to the vocab, encoder_layers, decoder_layers, and so on:

  1. mention that the config has a dict tasks, where keys are unique task identifiers and values are structured task definitions that define what your model does (we have the (i), (ii), (iii) criteria from the demo paper for Swahili to Catalan that one could include here); see the sketch after this list
  2. explain the sharing groups and how their shapes are defined globally by encoder_layers and decoder_layers
  3. explain how the src_tgt task key links a task to the source and target vocabs defined globally, and mention how to do explicit vocab sharing. We probably want to stress that embeddings are defined per vocab at this stage
  4. explain how the physical compute device is decided with node_gpu and link back to the distributed training page
  5. provide other keys, in particular path_valid_{src,tgt} to define validation loops, and transforms
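
A rough sketch of what such a task definition page could show; only tasks, src_tgt, node_gpu, path_valid_{src,tgt}, encoder_layers/decoder_layers, and transforms are named above, so the remaining key names (src_vocab, enc_sharing_group, path_src, the transform names, ...) are assumptions for illustration:

```yaml
# Hypothetical my_config.yaml excerpt; key names not mentioned in the review are illustrative
src_vocab:
  en: vocabs/en.vocab      # embeddings are defined per vocab at this stage
tgt_vocab:
  de: vocabs/de.vocab

encoder_layers: [6]        # defines the shapes of the encoder sharing groups globally
decoder_layers: [6]        # same for the decoder

tasks:
  train_en-de:                     # unique task identifier
    src_tgt: en-de                 # links the task to the en and de vocabs above
    enc_sharing_group: [en]        # which encoder parameters this task shares
    dec_sharing_group: [de]
    node_gpu: "0:0"                # physical device, see the distributed training page
    path_src: data/train.en
    path_tgt: data/train.de
    path_valid_src: data/valid.en  # enables a validation loop for this task
    path_valid_tgt: data/valid.de
    transforms: [sentencepiece, filtertoolong]
```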

We probably also want to change the 101, quickstart, and sharing scheme pages such that:

  • the example training only expects 1 GPU (currently it shouldn't fail if you have only 1 GPU, but it will silently ignore some of the tasks); see the sketch after this list
  • the wrapper is not presented in the sharing schemes page
  • there are some pointers to the task definition page
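
For the single-GPU example, something along these lines could work; it reuses the hypothetical keys from the sketches above, so all names and values are illustrative:

```yaml
# Hypothetical single-GPU setup: every task is pinned to node 0, GPU 0,
# so no task is silently dropped when only one device is available
world_size: 1
gpu_ranks: [0]

tasks:
  train_en-de:
    node_gpu: "0:0"
  train_en-fr:
    node_gpu: "0:0"
```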

Resolved review threads on:
  • docs/source/quickstart.md (×4)

```bash
python -u "$@" --node_rank $SLURM_NODEID -u ${PATH_TO_MAMMOTH}/train.py \
```
Collaborator comment on this line:

The provided command seems to be a mix between the wrapper's internal call and a default wrapper-free call. You probably want one of the two following:

Either run with a wrapper so it can handle multiple nodes, using the wrapper provided in sharing_schemes.md:

```bash
srun wrapper.sh \
    ${PATH_TO_MAMMOTH}/train.py \
    -config my_config.yaml \
    -master_port 9974 \
    -master_ip ${SLURM_NODENAME}
    # and maybe -tensorboard -tensorboard_log_dir -save_model
```

or the default python call:

```bash
python3 -u \
    ${PATH_TO_MAMMOTH}/train.py \
    -config my_config.yaml
    # and maybe -tensorboard -tensorboard_log_dir -save_model
```

In the latter case -master_port 9974 and -master_ip ${SLURM_NODENAME} should no longer be required.

shaoxiongji (Collaborator, Author):

Mostly fixed. The quickstart removes wrapper.sh (I guess for simplicity). I will leave distributed training for later, maybe as a tutorial on a new page.

@TimotheeMickus (Collaborator) commented Mar 12, 2024:

@stefanik12 does the call to train.py still work if people install it from pip?

@TimotheeMickus merged commit 669e2f4 into main on Mar 12, 2024. 2 checks passed.