some small corrections in README template
jorgtied committed Oct 8, 2023
1 parent c6a2fa7 commit 1d215d6
Showing 2 changed files with 34 additions and 12 deletions.
23 changes: 17 additions & 6 deletions README.md
@@ -8,9 +8,10 @@ This is a challenge set for machine translation that contains 33G translation un
* [Training](data/README.md), [development](https://object.pouta.csc.fi/Tatoeba-Challenge-devtest/dev.tar) and [test data](https://object.pouta.csc.fi/Tatoeba-Challenge-devtest/test.tar)
* [Baseline models](results/tatoeba-models-all.md) and [results](results/tatoeba-results-all.md) ([training procedures](https://github.com/Helsinki-NLP/OPUS-MT-train/blob/master/doc/TatoebaChallenge.md))
* [Ideal for multilingual models and transfer learning](results/tatoeba-results-langgroup.md)
+* New: [The OPUS-MT leaderboard](https://opus.nlpl.eu/dashboard/)
* New: [The status of available NMT models on a map](https://opus.nlpl.eu/NMT-map/Tatoeba/all/src2trg/) (for release v2020-07-28)

-[![NMT map](images/NMT-map-small.png)](https://opus.nlpl.eu/NMT-map/Tatoeba/all/src2trg/)
+[![NMT map](images/NMT-map-small.png)](https://opus.nlpl.eu/NMT-map/Tatoeba-all/src2trg/)


## Tasks
@@ -22,14 +23,15 @@ This is a challenge set for machine translation that contains 33G translation un

## Downloads

-* [All test data](https://object.pouta.csc.fi/Tatoeba-Challenge-devtest/test.tar) ([individual files](https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/data/release/test))
-* [All development data](https://object.pouta.csc.fi/Tatoeba-Challenge-devtest/dev.tar) ([individual files](https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/data/release/dev))
+* [All test data](https://object.pouta.csc.fi/Tatoeba-Challenge-devtest/test.tar) ([individual files](data/release/test))
+* [All development data](https://object.pouta.csc.fi/Tatoeba-Challenge-devtest/dev.tar) ([individual files](data/release/dev))
* [Bilingual training data](data/README-v2023-09-26.md), language-pair specific downloads
* [Extra bilingual training data](data/subsets/NoTestData-v2023-09-26.md), language-pair specific downloads
* [Monolingual data sets](data/MonolingualData.md), [with document boundaries](data/Wiki.md), [de-duplicated and shuffled](data/Wiki.md)
-* [Incrementally updated development and test data](https://object.pouta.csc.fi/Tatoeba-Challenge-devtest/devtest.tar), ([here for individual language pairs](https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/data/devtest))
+* [Incrementally updated development and test data](https://object.pouta.csc.fi/Tatoeba-Challenge-devtest/devtest.tar), ([here for individual language pairs](data/devtest))
* [Release history](data/Releases.md)
* NEW: [Automatically translated monolingual data](data/Backtranslations.md)
+* NEW: [Pre-trained sentence piece models](https://github.com/Helsinki-NLP/OPUS-MT-train/blob/master/tatoeba/SentencePieceModels.md)

The latest release also includes some parallel data sets in the same language in order to test paraphrase models. Note, however, that the support for paraphrasing is really limited in our data sets.

@@ -80,7 +82,7 @@ Files with the extension `.src` refer to sentences in the source language (`deu`

Other notes about the compilation of the data sets can be found in [Development.md](doc/Development.md) and the complete lists of language pairs is in [data/README.md](data/README.md).

-New releases are planned in the future and will be announced here. Development and test data will be updated regularly but the original test sets will stay in the release. Updates of the test data will be available through this [devtest release](https://object.pouta.csc.fi/Tatoeba-Challenge-devtest/devtest.tar) and will not include any examples available in development data. Those data sets are also available from this git repository in the sub directory [data/devtest/](https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/data/devtest).
+New releases are planned in the future and will be announced here. Development and test data will be updated regularly but the original test sets will stay in the release. Updates of the test data will be available through this [devtest release](https://object.pouta.csc.fi/Tatoeba-Challenge-devtest/devtest.tar) and will not include any examples available in development data. Those data sets are also available from this git repository in the sub directory [data/devtest/](data/devtest).


## The translation challenge
@@ -124,6 +126,15 @@ Challenge subset results (v2023-09-26):
* results for the [higher resource language pairs](results/tatoeba-results-v2023-09-26-subset-higher.md)
* results for the [highest resource language pairs](results/tatoeba-results-v2023-09-26-subset-highest.md)

+Challenge subset results (v2021-08-07):
+
+* results for the [zero-shot language pairs](results/tatoeba-results-v2021-08-07-subset-zero.md)
+* results for the [lowest resource language pairs](results/tatoeba-results-v2021-08-07-subset-lowest.md)
+* results for the [lower resource language pairs](results/tatoeba-results-v2021-08-07-subset-lower.md)
+* results for the [medium resource language pairs](results/tatoeba-results-v2021-08-07-subset-medium.md)
+* results for the [higher resource language pairs](results/tatoeba-results-v2021-08-07-subset-higher.md)
+* results for the [highest resource language pairs](results/tatoeba-results-v2021-08-07-subset-highest.md)
+
Challenge subset results (v2020-07-28):

* results for the [zero-shot language pairs](results/tatoeba-results-v2020-07-28-subset-zero.md)
@@ -133,7 +144,7 @@ Challenge subset results (v2020-07-28):
* results for the [higher resource language pairs](results/tatoeba-results-v2020-07-28-subset-higher.md)
* results for the [highest resource language pairs](results/tatoeba-results-v2020-07-28-subset-highest.md)

-We publish (reasonable) models to be re-used and deployed through [OPUS-MT](https://github.com/Helsinki-NLP/Opus-MT) and linked from the [model subdir in this github](https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/models). This includes multilingual models that cover several languages in source and target to enable transfer learning across languages.
+We publish (reasonable) models to be re-used and deployed through [OPUS-MT](https://github.com/Helsinki-NLP/Opus-MT) and linked from the [model subdir in this github](models). This includes multilingual models that cover several languages in source and target to enable transfer learning across languages.



23 changes: 17 additions & 6 deletions README.template
@@ -8,9 +8,10 @@ This is a challenge set for machine translation that contains %%TRAIN_SIZE%% tra
* [Training](data/README.md), [development](https://object.pouta.csc.fi/Tatoeba-Challenge-devtest/dev.tar) and [test data](https://object.pouta.csc.fi/Tatoeba-Challenge-devtest/test.tar)
* [Baseline models](results/tatoeba-models-all.md) and [results](results/tatoeba-results-all.md) ([training procedures](https://github.com/Helsinki-NLP/OPUS-MT-train/blob/master/doc/TatoebaChallenge.md))
* [Ideal for multilingual models and transfer learning](results/tatoeba-results-langgroup.md)
+* New: [The OPUS-MT leaderboard](https://opus.nlpl.eu/dashboard/)
* New: [The status of available NMT models on a map](https://opus.nlpl.eu/NMT-map/Tatoeba/all/src2trg/) (for release v2020-07-28)

-[![NMT map](images/NMT-map-small.png)](https://opus.nlpl.eu/NMT-map/Tatoeba/all/src2trg/)
+[![NMT map](images/NMT-map-small.png)](https://opus.nlpl.eu/NMT-map/Tatoeba-all/src2trg/)


## Tasks
@@ -22,14 +23,15 @@ This is a challenge set for machine translation that contains %%TRAIN_SIZE%% tra

## Downloads

-* [All test data](https://object.pouta.csc.fi/Tatoeba-Challenge-devtest/test.tar) ([individual files](https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/data/release/test))
-* [All development data](https://object.pouta.csc.fi/Tatoeba-Challenge-devtest/dev.tar) ([individual files](https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/data/release/dev))
+* [All test data](https://object.pouta.csc.fi/Tatoeba-Challenge-devtest/test.tar) ([individual files](data/release/test))
+* [All development data](https://object.pouta.csc.fi/Tatoeba-Challenge-devtest/dev.tar) ([individual files](data/release/dev))
* [Bilingual training data](data/README-%%TRAINSET_RELEASE%%.md), language-pair specific downloads
* [Extra bilingual training data](data/subsets/NoTestData-%%EXTRATRAINSET_RELEASE%%.md), language-pair specific downloads
* [Monolingual data sets](data/MonolingualData.md), [with document boundaries](data/Wiki.md), [de-duplicated and shuffled](data/Wiki.md)
-* [Incrementally updated development and test data](https://object.pouta.csc.fi/Tatoeba-Challenge-devtest/devtest.tar), ([here for individual language pairs](https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/data/devtest))
+* [Incrementally updated development and test data](https://object.pouta.csc.fi/Tatoeba-Challenge-devtest/devtest.tar), ([here for individual language pairs](data/devtest))
* [Release history](data/Releases.md)
* NEW: [Automatically translated monolingual data](data/Backtranslations.md)
+* NEW: [Pre-trained sentence piece models](https://github.com/Helsinki-NLP/OPUS-MT-train/blob/master/tatoeba/SentencePieceModels.md)

The latest release also includes some parallel data sets in the same language in order to test paraphrase models. Note, however, that the support for paraphrasing is really limited in our data sets.

@@ -80,7 +82,7 @@ Files with the extension `.src` refer to sentences in the source language (`deu`

Other notes about the compilation of the data sets can be found in [Development.md](doc/Development.md) and the complete lists of language pairs is in [data/README.md](data/README.md).

-New releases are planned in the future and will be announced here. Development and test data will be updated regularly but the original test sets will stay in the release. Updates of the test data will be available through this [devtest release](https://object.pouta.csc.fi/Tatoeba-Challenge-devtest/devtest.tar) and will not include any examples available in development data. Those data sets are also available from this git repository in the sub directory [data/devtest/](https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/data/devtest).
+New releases are planned in the future and will be announced here. Development and test data will be updated regularly but the original test sets will stay in the release. Updates of the test data will be available through this [devtest release](https://object.pouta.csc.fi/Tatoeba-Challenge-devtest/devtest.tar) and will not include any examples available in development data. Those data sets are also available from this git repository in the sub directory [data/devtest/](data/devtest).


## The translation challenge
@@ -124,6 +126,15 @@ Challenge subset results (%%RELEASE%%):
* results for the [higher resource language pairs](results/tatoeba-results-%%RELEASE%%-subset-higher.md)
* results for the [highest resource language pairs](results/tatoeba-results-%%RELEASE%%-subset-highest.md)

+Challenge subset results (v2021-08-07):
+
+* results for the [zero-shot language pairs](results/tatoeba-results-v2021-08-07-subset-zero.md)
+* results for the [lowest resource language pairs](results/tatoeba-results-v2021-08-07-subset-lowest.md)
+* results for the [lower resource language pairs](results/tatoeba-results-v2021-08-07-subset-lower.md)
+* results for the [medium resource language pairs](results/tatoeba-results-v2021-08-07-subset-medium.md)
+* results for the [higher resource language pairs](results/tatoeba-results-v2021-08-07-subset-higher.md)
+* results for the [highest resource language pairs](results/tatoeba-results-v2021-08-07-subset-highest.md)
+
Challenge subset results (v2020-07-28):

* results for the [zero-shot language pairs](results/tatoeba-results-v2020-07-28-subset-zero.md)
@@ -133,7 +144,7 @@ Challenge subset results (v2020-07-28):
* results for the [higher resource language pairs](results/tatoeba-results-v2020-07-28-subset-higher.md)
* results for the [highest resource language pairs](results/tatoeba-results-v2020-07-28-subset-highest.md)

-We publish (reasonable) models to be re-used and deployed through [OPUS-MT](https://github.com/Helsinki-NLP/Opus-MT) and linked from the [model subdir in this github](https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/models). This includes multilingual models that cover several languages in source and target to enable transfer learning across languages.
+We publish (reasonable) models to be re-used and deployed through [OPUS-MT](https://github.com/Helsinki-NLP/Opus-MT) and linked from the [model subdir in this github](models). This includes multilingual models that cover several languages in source and target to enable transfer learning across languages.
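
The two files in this commit differ only in their placeholders: where `README.template` has `%%TRAIN_SIZE%%`, `%%RELEASE%%`, `%%TRAINSET_RELEASE%%` and `%%EXTRATRAINSET_RELEASE%%`, `README.md` has `33G` and `v2023-09-26`. The commit does not show how the README is generated, so the following is only a minimal sketch of one way such a template could be filled in, assuming plain string substitution. The function name `render_template` and the `values` mapping are hypothetical and not part of the repository.

```python
import re

# Matches placeholders of the form %%NAME%% as they appear in README.template.
PLACEHOLDER = re.compile(r"%%([A-Z_]+)%%")


def render_template(text: str, values: dict[str, str]) -> str:
    """Replace each %%NAME%% with values[NAME]; unknown names are left untouched."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        return values.get(name, match.group(0))

    return PLACEHOLDER.sub(substitute, text)


# Values as they appear in the README.md side of this commit.
values = {
    "TRAIN_SIZE": "33G",
    "RELEASE": "v2023-09-26",
    "TRAINSET_RELEASE": "v2023-09-26",
    "EXTRATRAINSET_RELEASE": "v2023-09-26",
}

line = "* [Bilingual training data](data/README-%%TRAINSET_RELEASE%%.md), language-pair specific downloads"
print(render_template(line, values))
```

Leaving unknown placeholders in place, rather than raising an error, makes a missing value easy to spot in the rendered file.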


