
eng-afa

opus-2020-07-06.zip

  • dataset: opus
  • model: transformer
  • source language(s): eng
  • target language(s): acm afb amh apc ara arq ary arz hau_Latn heb kab mlt rif_Latn shy_Latn som tir
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required in the form >>id<< (id = a valid target language ID)
  • download: opus-2020-07-06.zip
  • test set translations: opus-2020-07-06.test.txt
  • test set scores: opus-2020-07-06.eval.txt
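Because the model is multilingual on the target side, each source sentence must be prefixed with the >>id<< token listed above. A minimal sketch of that preprocessing step, with hub usage shown only as hedged comments (the Hugging Face model name `Helsinki-NLP/opus-mt-en-afa` is an assumption, not stated in this card):

```python
# Valid target-language IDs as listed in this model card.
VALID_TARGETS = {
    "acm", "afb", "amh", "apc", "ara", "arq", "ary", "arz",
    "hau_Latn", "heb", "kab", "mlt", "rif_Latn", "shy_Latn", "som", "tir",
}

def with_lang_token(sentence: str, target: str) -> str:
    """Prefix a source sentence with the sentence-initial >>id<< token."""
    if target not in VALID_TARGETS:
        raise ValueError(f"unknown target language ID: {target}")
    return f">>{target}<< {sentence}"

# Hypothetical usage via Hugging Face transformers (model name is an assumption):
# from transformers import MarianMTModel, MarianTokenizer
# tok = MarianTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-afa")
# model = MarianMTModel.from_pretrained("Helsinki-NLP/opus-mt-en-afa")
# batch = tok([with_lang_token("Hello world", "heb")], return_tensors="pt")
# print(tok.batch_decode(model.generate(**batch), skip_special_tokens=True))

print(with_lang_token("Hello world", "heb"))  # -> >>heb<< Hello world
```

Passing an ID outside the card's target list raises an error early, rather than producing an untagged input the model would translate into an arbitrary target language.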

Benchmarks

testset BLEU chr-F
Tatoeba-test.eng-amh.eng.amh 9.6 0.502
Tatoeba-test.eng-ara.eng.ara 11.5 0.402
Tatoeba-test.eng-hau.eng.hau 10.1 0.450
Tatoeba-test.eng-heb.eng.heb 31.3 0.542
Tatoeba-test.eng-kab.eng.kab 1.2 0.179
Tatoeba-test.eng-mlt.eng.mlt 15.0 0.525
Tatoeba-test.eng.multi 13.8 0.364
Tatoeba-test.eng-rif.eng.rif 1.6 0.072
Tatoeba-test.eng-shy.eng.shy 0.8 0.066
Tatoeba-test.eng-som.eng.som 0.0 0.294
Tatoeba-test.eng-tir.eng.tir 2.4 0.233

opus-2020-07-26.zip

  • dataset: opus
  • model: transformer
  • source language(s): eng
  • target language(s): acm afb amh apc ara arq ary arz hau_Latn heb kab mlt rif_Latn shy_Latn som tir
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required in the form >>id<< (id = a valid target language ID)
  • download: opus-2020-07-26.zip
  • test set translations: opus-2020-07-26.test.txt
  • test set scores: opus-2020-07-26.eval.txt

Benchmarks

testset BLEU chr-F
Tatoeba-test.eng-amh.eng.amh 10.6 0.513
Tatoeba-test.eng-ara.eng.ara 11.2 0.397
Tatoeba-test.eng-hau.eng.hau 8.2 0.429
Tatoeba-test.eng-heb.eng.heb 31.3 0.541
Tatoeba-test.eng-kab.eng.kab 1.2 0.175
Tatoeba-test.eng-mlt.eng.mlt 17.0 0.532
Tatoeba-test.eng.multi 13.7 0.363
Tatoeba-test.eng-rif.eng.rif 1.5 0.109
Tatoeba-test.eng-shy.eng.shy 0.7 0.093
Tatoeba-test.eng-som.eng.som 16.0 0.272
Tatoeba-test.eng-tir.eng.tir 2.6 0.238

opus2m-2020-08-01.zip

  • dataset: opus2m
  • model: transformer
  • source language(s): eng
  • target language(s): acm afb amh apc ara arq ary arz hau_Latn heb kab mlt rif_Latn shy_Latn som tir
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required in the form >>id<< (id = a valid target language ID)
  • download: opus2m-2020-08-01.zip
  • test set translations: opus2m-2020-08-01.test.txt
  • test set scores: opus2m-2020-08-01.eval.txt

Benchmarks

testset BLEU chr-F
Tatoeba-test.eng-amh.eng.amh 11.6 0.504
Tatoeba-test.eng-ara.eng.ara 12.0 0.404
Tatoeba-test.eng-hau.eng.hau 10.2 0.429
Tatoeba-test.eng-heb.eng.heb 32.3 0.551
Tatoeba-test.eng-kab.eng.kab 1.6 0.191
Tatoeba-test.eng-mlt.eng.mlt 17.7 0.551
Tatoeba-test.eng.multi 14.4 0.375
Tatoeba-test.eng-rif.eng.rif 1.7 0.103
Tatoeba-test.eng-shy.eng.shy 0.8 0.090
Tatoeba-test.eng-som.eng.som 16.0 0.429
Tatoeba-test.eng-tir.eng.tir 2.7 0.238
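The three releases can be compared on the multilingual Tatoeba test set (Tatoeba-test.eng.multi); a small sketch using only the BLEU and chr-F figures from the benchmark tables above:

```python
# (BLEU, chr-F) on Tatoeba-test.eng.multi, copied from the tables above.
releases = {
    "opus-2020-07-06": (13.8, 0.364),
    "opus-2020-07-26": (13.7, 0.363),
    "opus2m-2020-08-01": (14.4, 0.375),
}

# Pick the release with the highest multilingual BLEU score.
best = max(releases, key=lambda name: releases[name][0])
print(best)  # -> opus2m-2020-08-01
```

On these aggregate figures the opus2m release leads on both BLEU and chr-F, though per-language scores (e.g. eng-hau, eng-som) shift noticeably between releases.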