adding more info to docs

Helsinki-NLP · May 15, 2024 · 3ffee12
1 parent d359e76

Showing 3 changed files with 63 additions and 7 deletions.
diff --git a/docs/index.rst b/docs/index.rst
@@ -4,13 +4,13 @@ OpusDistillery
 Welcome to OpusDistillery's documentation!
 
 OpusDistillery is an end-to-end pipeline to perform systematic multilingual distillation of MT models.
-It is built on top of the `Firefox Translations Training pipeline <https://github.com/mozilla/firefox-translations-training>`,
-originally developed within the `Bergamot project<https://browser.mt>`, for training efficient NMT models that can run locally in a web browser.
+It is built on top of the `Firefox Translations Training pipeline <https://github.com/mozilla/firefox-translations-training>`_,
+originally developed within the `Bergamot project <https://browser.mt>`_, for training efficient NMT models that can run locally in a web browser.
 
 The pipeline is capable of training a translation model for any language pair(s) end to end.
 Translation quality depends on the chosen datasets, data cleaning procedures and hyperparameters. Some settings, especially low resource languages might require extra tuning.
 
-We use `Marian<https://marian-nmt.github.io/>`, the fast neural machine translation engine .
+We use `Marian <https://marian-nmt.github.io/>`_, the fast neural machine translation engine.
 
 New features:
 
@@ -22,4 +22,5 @@ New features:
    :caption: Get started
    :maxdepth: 1
 
-   installation.md
+   installation.md
+   usage.md
diff --git a/docs/installation.md b/docs/installation.md
@@ -1,4 +1,6 @@
-# Getting started on CSC's puhti and mahti
+# Installation
+
+## Getting started on CSC's puhti and mahti
 1. Clone the repository.
 2. Download the Ftt.sif container to the repository root.
 3. Create a virtual Python environment for Snakemake (e.g. in the parent dir of the repository):
@@ -11,9 +13,9 @@
 7. If the data directory is not located in the parent directory of the repository, edit _profiles/slurm-puhti/config.yaml_ or _profiles/slurm-mahti/config.yaml_ and change the bindings in the singularity-args section to point to your data directory, and also enter the _data_ directory path as the _root_ value of the _config_ section.
 8. Edit profiles/slurm-puhti/config.cluster.yaml to change the CSC account to one you have access to. 
 9. Load cuda modules: module load gcc/9.4.0 cuda cudnn
-10. Run pipeline: `make run-hpc PROFILE="slurm-puhti"` or `make run PROFILE="slurm-mahti"`
+10. Run the pipeline: `make run-hpc PROFILE="slurm-puhti"` or `make run PROFILE="slurm-mahti"`. For more information, see [Basic usage](usage.md).
 
-# Getting started on CSC's lumi
+## Getting started on CSC's lumi
 1. Clone the repository.
 2. Download the Ftt.sif container to the repository root.
 3. Create a virtual Python environment for Snakemake (e.g. in the parent dir of the repository):

diff --git a/docs/usage.md b/docs/usage.md
@@ -0,0 +1,53 @@
+# Basic usage
+
+## Running
+
+Load all the necessary modules as explained in [Installation](installation.md).
+
+Do a dry run first to check that everything was installed correctly:
+
+```
+make dry-run
+```
+
+To run the pipeline:
+```
+make run
+```
+
+To test the whole pipeline end to end (this should run relatively quickly and does not train anything useful):
+
+```
+make test
+```
+You can also run a specific profile or config by overriding variables from the Makefile:
+```
+make run PROFILE=slurm-moz CONFIG=configs/config.test.yml
+```
+
+### Specific target
+
+By default, all Snakemake rules are executed. To run the pipeline up to a specific rule, use:
+```
+make run TARGET=<non-wildcard-rule-or-path>
+```
+For example, to collect the corpus first:
+```
+make run TARGET=merge_corpus
+```
+
+You can also use the full file path, for example:
+```
+make run TARGET=/models/ru-en/bicleaner/teacher-base0/model.npz.best-ce-mean-words.npz
+```
+### Rerunning
+
+To rerun one or more specific steps, delete the result files that the corresponding Snakemake rules produce as output.
+Snakemake might complain about a missing file and suggest running it with the `--cleanup-metadata` flag. In this case, run:
+```
+make clean-meta TARGET=<missing-file-name>
+```
+and then run the pipeline as usual:
+```
+make run
+```