Commit
link to pre-trained SPMs added
jorgtied committed Nov 4, 2022
1 parent ea1ca55 commit cd814d8
Showing 281 changed files with 5,913 additions and 313 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -31,6 +31,7 @@ This is a challenge set for machine translation that contains 29G translation un
* [Incrementally updated development and test data](https://object.pouta.csc.fi/Tatoeba-Challenge-devtest/devtest.tar), ([here for individual language pairs](https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/data/devtest))
* [Release history](data/Releases.md)
* NEW: [Automatically translated monolingual data](data/Backtranslations.md)
* NEW: [Pre-trained sentence piece models](https://github.com/Helsinki-NLP/OPUS-MT-train/blob/master/tatoeba/SentencePieceModels.md)

The latest release also includes some parallel data sets in the same language in order to test paraphrase models. Note, however, that the support for paraphrasing is really limited in our data sets.

10 changes: 10 additions & 0 deletions models/aav-fiu/opus-2021-02-19.scores.txt
@@ -3,7 +3,9 @@ eng-est flores101-devtest 0.520 17.8 https://object.pouta.csc.fi/Tatoeba-MT-mode
eng-est flores200-dev 0.51202 17.6 https://object.pouta.csc.fi/Tatoeba-MT-models/aav-fiu/opus-2021-02-19.zip 997 18917
eng-est flores200-devtest 0.51881 17.5 https://object.pouta.csc.fi/Tatoeba-MT-models/aav-fiu/opus-2021-02-19.zip 1012 19788
eng-est newsdev2018 0.48906 16.7 https://object.pouta.csc.fi/Tatoeba-MT-models/aav-fiu/opus-2021-02-19.zip 2000 34492
eng-est newsdev2018 0.490 16.7 https://object.pouta.csc.fi/Tatoeba-MT-models/aav-fiu/opus-2021-02-19.zip 2000 34492
eng-est newstest2018 0.50050 17.7 https://object.pouta.csc.fi/Tatoeba-MT-models/aav-fiu/opus-2021-02-19.zip 2000 36269
eng-est newstest2018 0.502 17.9 https://object.pouta.csc.fi/Tatoeba-MT-models/aav-fiu/opus-2021-02-19.zip 2000 36269
eng-est tatoeba-test-v2020-07-28 0.662 46.5 https://object.pouta.csc.fi/Tatoeba-MT-models/aav-fiu/opus-2021-02-19.zip 1359 7992
eng-est tatoeba-test-v2021-03-30 0.662 46.5 https://object.pouta.csc.fi/Tatoeba-MT-models/aav-fiu/opus-2021-02-19.zip 1359 7992
eng-est tatoeba-test-v2021-08-07 0.662 46.5 https://object.pouta.csc.fi/Tatoeba-MT-models/aav-fiu/opus-2021-02-19.zip 1359 7992
@@ -12,13 +14,21 @@ eng-fin flores101-devtest 0.509 16.6 https://object.pouta.csc.fi/Tatoeba-MT-mode
eng-fin flores200-dev 0.50009 16.1 https://object.pouta.csc.fi/Tatoeba-MT-models/aav-fiu/opus-2021-02-19.zip 997 17938
eng-fin flores200-devtest 0.50851 16.5 https://object.pouta.csc.fi/Tatoeba-MT-models/aav-fiu/opus-2021-02-19.zip 1012 18781
eng-fin newsdev2015 0.49054 15.2 https://object.pouta.csc.fi/Tatoeba-MT-models/aav-fiu/opus-2021-02-19.zip 1500 23091
eng-fin newsdev2015 0.493 15.3 https://object.pouta.csc.fi/Tatoeba-MT-models/aav-fiu/opus-2021-02-19.zip 1500 23091
eng-fin newstest2015 0.50055 16.6 https://object.pouta.csc.fi/Tatoeba-MT-models/aav-fiu/opus-2021-02-19.zip 1370 19735
eng-fin newstest2015 0.503 16.8 https://object.pouta.csc.fi/Tatoeba-MT-models/aav-fiu/opus-2021-02-19.zip 1370 19735
eng-fin newstest2016 0.51274 18.0 https://object.pouta.csc.fi/Tatoeba-MT-models/aav-fiu/opus-2021-02-19.zip 3000 47678
eng-fin newstest2016 0.514 18.2 https://object.pouta.csc.fi/Tatoeba-MT-models/aav-fiu/opus-2021-02-19.zip 3000 47678
eng-fin newstest2017 0.53201 19.9 https://object.pouta.csc.fi/Tatoeba-MT-models/aav-fiu/opus-2021-02-19.zip 3002 45269
eng-fin newstest2017 0.535 20.1 https://object.pouta.csc.fi/Tatoeba-MT-models/aav-fiu/opus-2021-02-19.zip 3002 45269
eng-fin newstest2018 0.47266 12.9 https://object.pouta.csc.fi/Tatoeba-MT-models/aav-fiu/opus-2021-02-19.zip 3000 44836
eng-fin newstest2018 0.473 13.0 https://object.pouta.csc.fi/Tatoeba-MT-models/aav-fiu/opus-2021-02-19.zip 3000 44836
eng-fin newstest2019 0.49226 18.1 https://object.pouta.csc.fi/Tatoeba-MT-models/aav-fiu/opus-2021-02-19.zip 1997 38369
eng-fin newstest2019 0.494 18.4 https://object.pouta.csc.fi/Tatoeba-MT-models/aav-fiu/opus-2021-02-19.zip 1997 38369
eng-fin newstestB2016 0.48463 14.8 https://object.pouta.csc.fi/Tatoeba-MT-models/aav-fiu/opus-2021-02-19.zip 3000 45766
eng-fin newstestB2016 0.487 14.9 https://object.pouta.csc.fi/Tatoeba-MT-models/aav-fiu/opus-2021-02-19.zip 3000 45766
eng-fin newstestB2017 0.49889 16.8 https://object.pouta.csc.fi/Tatoeba-MT-models/aav-fiu/opus-2021-02-19.zip 3002 45506
eng-fin newstestB2017 0.501 17.0 https://object.pouta.csc.fi/Tatoeba-MT-models/aav-fiu/opus-2021-02-19.zip 3002 45506
eng-fin tatoeba-test-v2020-07-28 0.577 32.7 https://object.pouta.csc.fi/Tatoeba-MT-models/aav-fiu/opus-2021-02-19.zip 10000 60517
eng-fin tatoeba-test-v2021-03-30 0.577 32.5 https://object.pouta.csc.fi/Tatoeba-MT-models/aav-fiu/opus-2021-02-19.zip 10186 61736
eng-fin tatoeba-test-v2021-08-07 0.576 32.3 https://object.pouta.csc.fi/Tatoeba-MT-models/aav-fiu/opus-2021-02-19.zip 10690 65122
1 change: 1 addition & 0 deletions models/aav-myn/opus-2021-02-18.scores.txt
@@ -9,6 +9,7 @@ khm-eng flores101-devtest 0.314 7.1 https://object.pouta.csc.fi/Tatoeba-MT-model
khm-eng flores200-dev 0.34395 8.9 https://object.pouta.csc.fi/Tatoeba-MT-models/aav-myn/opus-2021-02-18.zip 997 23555
khm-eng flores200-devtest 0.31803 7.4 https://object.pouta.csc.fi/Tatoeba-MT-models/aav-myn/opus-2021-02-18.zip 1012 24721
khm-eng newstest2020 0.242 3.7 https://object.pouta.csc.fi/Tatoeba-MT-models/aav-myn/opus-2021-02-18.zip 2320 44960
khm-eng newstest2020 0.24549 3.8 https://object.pouta.csc.fi/Tatoeba-MT-models/aav-myn/opus-2021-02-18.zip 2320 44960
khm-eng tatoeba-test-v2020-07-28 0.236 7.0 https://object.pouta.csc.fi/Tatoeba-MT-models/aav-myn/opus-2021-02-18.zip 752 4394
khm-eng tatoeba-test-v2021-03-30 0.236 7.0 https://object.pouta.csc.fi/Tatoeba-MT-models/aav-myn/opus-2021-02-18.zip 754 4412
khm-eng tatoeba-test-v2021-08-07 0.237 7.0 https://object.pouta.csc.fi/Tatoeba-MT-models/aav-myn/opus-2021-02-18.zip 726 4288
98 changes: 98 additions & 0 deletions models/afa-afa/opus-2021-02-23.bleu-scores.txt
@@ -1,52 +1,150 @@
apc-amh flores200-dev 1.3
apc-amh flores200-devtest 1.5
apc-ara flores200-dev 14.6
apc-ara flores200-devtest 14.5
apc-arz flores200-dev 5.0
apc-arz flores200-devtest 5.2
apc-eng flores200-dev 17.8
apc-eng flores200-devtest 17.7
apc-heb flores200-dev 5.7
apc-heb flores200-devtest 6.0
apc-kab flores200-dev 0.3
apc-kab flores200-devtest 0.3
apc-mlt flores200-dev 7.2
apc-mlt flores200-devtest 6.9
apc-som flores200-dev 2.0
apc-som flores200-devtest 2.1
apc-tir flores200-dev 1.6
apc-tir flores200-devtest 1.7
ara-amh flores101-dev 1.8
ara-amh flores101-devtest 1.8
ara-amh flores200-dev 1.9
ara-amh flores200-devtest 1.8
ara-arz flores200-dev 9.1
ara-arz flores200-devtest 9.7
ara-eng flores101-dev 24.4
ara-eng flores101-devtest 24.1
ara-eng flores200-dev 24.4
ara-eng flores200-devtest 24.3
ara-eng tatoeba-test-v2020-07-28 37.5
ara-eng tatoeba-test-v2021-03-30 37.3
ara-eng tatoeba-test-v2021-08-07 37.3
ara-eng tico19-test 24.1
ara-heb flores101-dev 8.2
ara-heb flores101-devtest 8.8
ara-heb flores200-dev 8.2
ara-heb flores200-devtest 8.7
ara-heb tatoeba-test-v2020-07-28 34.7
ara-heb tatoeba-test-v2021-03-30 34.7
ara-heb tatoeba-test-v2021-08-07 34.7
ara-kab flores200-dev 0.4
ara-kab flores200-devtest 0.2
ara-mlt flores101-dev 11.4
ara-mlt flores101-devtest 10.5
ara-mlt flores200-dev 11.3
ara-mlt flores200-devtest 10.5
ara-som flores101-dev 2.6
ara-som flores101-devtest 2.2
ara-som flores200-dev 2.6
ara-som flores200-devtest 2.3
ara-tir flores200-dev 2.1
ara-tir flores200-devtest 2.0
arq-eng tatoeba-test-v2020-07-28 6.7
arq-eng tatoeba-test-v2021-03-30 6.8
arq-eng tatoeba-test-v2021-08-07 6.8
arz-amh flores200-dev 1.0
arz-amh flores200-devtest 1.2
arz-ara flores200-dev 26.2
arz-ara flores200-devtest 25.5
arz-eng flores200-dev 14.3
arz-eng flores200-devtest 13.9
arz-heb flores200-dev 5.3
arz-heb flores200-devtest 5.2
arz-kab flores200-dev 0.3
arz-kab flores200-devtest 0.2
arz-mlt flores200-dev 6.1
arz-mlt flores200-devtest 5.8
arz-som flores200-dev 1.8
arz-som flores200-devtest 1.8
arz-tir flores200-dev 1.3
arz-tir flores200-devtest 1.4
heb-amh flores101-dev 1.9
heb-amh flores101-devtest 1.6
heb-amh flores200-dev 1.9
heb-amh flores200-devtest 1.5
heb-ara flores101-dev 6.8
heb-ara flores101-devtest 6.6
heb-ara flores200-dev 6.8
heb-ara flores200-devtest 6.3
heb-ara tatoeba-test-v2020-07-28 17.8
heb-ara tatoeba-test-v2021-03-30 17.8
heb-ara tatoeba-test-v2021-08-07 17.8
heb-arz flores200-dev 2.9
heb-arz flores200-devtest 2.2
heb-eng flores101-dev 24.9
heb-eng flores101-devtest 24.0
heb-eng flores200-dev 24.7
heb-eng flores200-devtest 24.1
heb-eng tatoeba-test-v2020-07-28 40.7
heb-eng tatoeba-test-v2021-03-30 40.9
heb-eng tatoeba-test-v2021-08-07 41.0
heb-kab flores200-dev 0.4
heb-kab flores200-devtest 0.3
heb-mlt flores101-dev 8.9
heb-mlt flores101-devtest 8.6
heb-mlt flores200-dev 8.7
heb-mlt flores200-devtest 8.6
heb-som flores101-dev 2.4
heb-som flores101-devtest 2.3
heb-som flores200-dev 2.4
heb-som flores200-devtest 2.4
heb-tir flores200-dev 2.2
heb-tir flores200-devtest 1.8
kab-amh flores200-dev 0.3
kab-amh flores200-devtest 0.3
kab-ara flores200-dev 0.5
kab-ara flores200-devtest 0.4
kab-arz flores200-dev 0.2
kab-arz flores200-devtest 0.2
kab-eng flores200-dev 2.4
kab-eng flores200-devtest 2.1
kab-eng tatoeba-test-v2020-07-28 4.7
kab-eng tatoeba-test-v2021-03-30 4.6
kab-eng tatoeba-test-v2021-08-07 4.5
kab-heb flores200-dev 0.1
kab-heb flores200-devtest 0.3
kab-mlt flores200-dev 1.0
kab-mlt flores200-devtest 1.1
kab-som flores200-dev 0.6
kab-som flores200-devtest 0.6
kab-tir flores200-dev 0.2
kab-tir flores200-devtest 0.4
mlt-amh flores101-dev 2.7
mlt-amh flores101-devtest 3.0
mlt-amh flores200-dev 2.8
mlt-amh flores200-devtest 3.2
mlt-ara flores101-dev 9.9
mlt-ara flores101-devtest 9.9
mlt-ara flores200-dev 9.9
mlt-ara flores200-devtest 10.2
mlt-arz flores200-dev 4.5
mlt-arz flores200-devtest 4.1
mlt-eng flores101-dev 41.6
mlt-eng flores101-devtest 40.7
mlt-eng flores200-dev 41.5
mlt-eng flores200-devtest 40.6
mlt-eng tatoeba-test-v2020-07-28 45.9
mlt-eng tatoeba-test-v2021-03-30 45.9
mlt-eng tatoeba-test-v2021-08-07 45.9
mlt-heb flores101-dev 10.5
mlt-heb flores101-devtest 11.3
mlt-heb flores200-dev 10.0
mlt-heb flores200-devtest 10.9
mlt-kab flores200-dev 0.3
mlt-kab flores200-devtest 0.4
mlt-som flores101-dev 4.3
mlt-som flores101-devtest 3.9
mlt-som flores200-dev 4.3
mlt-som flores200-devtest 3.9
mlt-tir flores200-dev 2.8
mlt-tir flores200-devtest 2.5
98 changes: 98 additions & 0 deletions models/afa-afa/opus-2021-02-23.chrf-scores.txt
@@ -1,52 +1,150 @@
apc-amh flores200-dev 0.16842
apc-amh flores200-devtest 0.17815
apc-ara flores200-dev 0.49671
apc-ara flores200-devtest 0.50228
apc-arz flores200-dev 0.32542
apc-arz flores200-devtest 0.33272
apc-eng flores200-dev 0.46734
apc-eng flores200-devtest 0.46472
apc-heb flores200-dev 0.33032
apc-heb flores200-devtest 0.33572
apc-kab flores200-dev 0.18805
apc-kab flores200-devtest 0.18732
apc-mlt flores200-dev 0.40466
apc-mlt flores200-devtest 0.40696
apc-som flores200-dev 0.29369
apc-som flores200-devtest 0.28787
apc-tir flores200-dev 0.15826
apc-tir flores200-devtest 0.15468
ara-amh flores101-dev 0.192
ara-amh flores101-devtest 0.203
ara-amh flores200-dev 0.19192
ara-amh flores200-devtest 0.20256
ara-arz flores200-dev 0.40009
ara-arz flores200-devtest 0.40321
ara-eng flores101-dev 0.529
ara-eng flores101-devtest 0.528
ara-eng flores200-dev 0.52938
ara-eng flores200-devtest 0.52869
ara-eng tatoeba-test-v2020-07-28 0.555
ara-eng tatoeba-test-v2021-03-30 0.553
ara-eng tatoeba-test-v2021-08-07 0.553
ara-eng tico19-test 0.529
ara-heb flores101-dev 0.367
ara-heb flores101-devtest 0.375
ara-heb flores200-dev 0.36926
ara-heb flores200-devtest 0.37582
ara-heb tatoeba-test-v2020-07-28 0.549
ara-heb tatoeba-test-v2021-03-30 0.549
ara-heb tatoeba-test-v2021-08-07 0.549
ara-kab flores200-dev 0.19135
ara-kab flores200-devtest 0.19049
ara-mlt flores101-dev 0.458
ara-mlt flores101-devtest 0.453
ara-mlt flores200-dev 0.45823
ara-mlt flores200-devtest 0.45340
ara-som flores101-dev 0.317
ara-som flores101-devtest 0.310
ara-som flores200-dev 0.31688
ara-som flores200-devtest 0.31095
ara-tir flores200-dev 0.17771
ara-tir flores200-devtest 0.17455
arq-eng tatoeba-test-v2020-07-28 0.229
arq-eng tatoeba-test-v2021-03-30 0.229
arq-eng tatoeba-test-v2021-08-07 0.229
arz-amh flores200-dev 0.16295
arz-amh flores200-devtest 0.16996
arz-ara flores200-dev 0.56797
arz-ara flores200-devtest 0.56696
arz-eng flores200-dev 0.43968
arz-eng flores200-devtest 0.43523
arz-heb flores200-dev 0.32238
arz-heb flores200-devtest 0.32537
arz-kab flores200-dev 0.18921
arz-kab flores200-devtest 0.19019
arz-mlt flores200-dev 0.39411
arz-mlt flores200-devtest 0.38801
arz-som flores200-dev 0.29068
arz-som flores200-devtest 0.29162
arz-tir flores200-dev 0.15364
arz-tir flores200-devtest 0.15195
heb-amh flores101-dev 0.185
heb-amh flores101-devtest 0.189
heb-amh flores200-dev 0.18495
heb-amh flores200-devtest 0.18928
heb-ara flores101-dev 0.341
heb-ara flores101-devtest 0.348
heb-ara flores200-dev 0.34398
heb-ara flores200-devtest 0.34749
heb-ara tatoeba-test-v2020-07-28 0.479
heb-ara tatoeba-test-v2021-03-30 0.479
heb-ara tatoeba-test-v2021-08-07 0.479
heb-arz flores200-dev 0.25312
heb-arz flores200-devtest 0.25147
heb-eng flores101-dev 0.509
heb-eng flores101-devtest 0.501
heb-eng flores200-dev 0.50870
heb-eng flores200-devtest 0.50376
heb-eng tatoeba-test-v2020-07-28 0.576
heb-eng tatoeba-test-v2021-03-30 0.578
heb-eng tatoeba-test-v2021-08-07 0.578
heb-kab flores200-dev 0.19160
heb-kab flores200-devtest 0.19149
heb-mlt flores101-dev 0.430
heb-mlt flores101-devtest 0.425
heb-mlt flores200-dev 0.43016
heb-mlt flores200-devtest 0.42541
heb-som flores101-dev 0.301
heb-som flores101-devtest 0.294
heb-som flores200-dev 0.30178
heb-som flores200-devtest 0.29568
heb-tir flores200-dev 0.16616
heb-tir flores200-devtest 0.16111
kab-amh flores200-dev 8.060
kab-amh flores200-devtest 8.109
kab-ara flores200-dev 0.15285
kab-ara flores200-devtest 0.15128
kab-arz flores200-dev 7.051
kab-arz flores200-devtest 7.065
kab-eng flores200-dev 0.22666
kab-eng flores200-devtest 0.22332
kab-eng tatoeba-test-v2020-07-28 0.220
kab-eng tatoeba-test-v2021-03-30 0.221
kab-eng tatoeba-test-v2021-08-07 0.221
kab-heb flores200-dev 0.14961
kab-heb flores200-devtest 0.14683
kab-mlt flores200-dev 0.16212
kab-mlt flores200-devtest 0.15918
kab-som flores200-dev 0.18271
kab-som flores200-devtest 0.18689
kab-tir flores200-dev 7.499
kab-tir flores200-devtest 7.286
mlt-amh flores101-dev 0.221
mlt-amh flores101-devtest 0.229
mlt-amh flores200-dev 0.22155
mlt-amh flores200-devtest 0.22732
mlt-ara flores101-dev 0.385
mlt-ara flores101-devtest 0.383
mlt-ara flores200-dev 0.38561
mlt-ara flores200-devtest 0.38540
mlt-arz flores200-dev 0.31654
mlt-arz flores200-devtest 0.31726
mlt-eng flores101-dev 0.674
mlt-eng flores101-devtest 0.669
mlt-eng flores200-dev 0.67395
mlt-eng flores200-devtest 0.66904
mlt-eng tatoeba-test-v2020-07-28 0.627
mlt-eng tatoeba-test-v2021-03-30 0.627
mlt-eng tatoeba-test-v2021-08-07 0.627
mlt-heb flores101-dev 0.396
mlt-heb flores101-devtest 0.398
mlt-heb flores200-dev 0.39812
mlt-heb flores200-devtest 0.39796
mlt-kab flores200-dev 0.19394
mlt-kab flores200-devtest 0.19508
mlt-som flores101-dev 0.329
mlt-som flores101-devtest 0.321
mlt-som flores200-dev 0.32954
mlt-som flores200-devtest 0.32173
mlt-tir flores200-dev 0.19216
mlt-tir flores200-devtest 0.18913
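
Most chrF values in the file above fall between 0 and 1, but a few rows (for example `kab-amh` and `kab-arz` on flores200) carry values above 7. The commit does not say whether these reflect a different scale or a formatting problem, so they are left as published. The sketch below only shows one way a reader might flag such rows for review. It assumes each line is whitespace-separated as `langpair testset score`; the function names `parse_score_line` and `out_of_range` are hypothetical and not part of the repository.

```python
def parse_score_line(line):
    """Split a 'langpair testset score' line into typed fields.

    Assumes exactly three whitespace-separated columns, as in the
    chrf-scores hunk above (an assumption, not a documented format).
    """
    langpair, testset, score = line.split()
    return langpair, testset, float(score)


def out_of_range(lines, low=0.0, high=1.0):
    """Return parsed rows whose score falls outside [low, high]."""
    rows = [parse_score_line(l) for l in lines if l.strip()]
    return [r for r in rows if not (low <= r[2] <= high)]


# A few rows copied from the chrF hunk above.
sample = [
    "kab-amh flores200-dev 8.060",
    "kab-ara flores200-dev 0.15285",
    "kab-arz flores200-dev 7.051",
    "kab-eng flores200-dev 0.22666",
]

flagged = out_of_range(sample)
for langpair, testset, score in flagged:
    print(f"{langpair}\t{testset}\t{score}")
```

Flagging rather than rescaling keeps the check neutral: it makes no claim about what the unusual values should have been.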