
# eng-zlw

## opus-2020-06-28.zip

* dataset: opus
* model: transformer
* source language(s): eng
* target language(s): ces csb_Latn pol
* pre-processing: normalization + SentencePiece (spm32k,spm32k)
* a sentence-initial language token is required in the form >>id<< (id = a valid target language ID)
* download: opus-2020-06-28.zip
* test set translations: opus-2020-06-28.test.txt
* test set scores: opus-2020-06-28.eval.txt
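
The list above notes that every input sentence must start with a target-language token. A minimal sketch of a helper that builds such inputs, using this model's three target labels (the helper name is illustrative, not part of the release):

```python
# Build a model input with the required sentence-initial target-language token.
# TARGET_LABELS matches this model's target language(s); other releases differ.
TARGET_LABELS = {"ces", "csb_Latn", "pol"}

def add_target_token(sentence: str, lang_id: str) -> str:
    """Prefix `sentence` with the >>id<< token the model expects."""
    if lang_id not in TARGET_LABELS:
        raise ValueError(f"unknown target language id: {lang_id!r}")
    return f">>{lang_id}<< {sentence}"

print(add_target_token("How are you?", "pol"))  # >>pol<< How are you?
```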

### Benchmarks

| testset | BLEU | chr-F |
|---------|-----:|------:|
| newssyscomb2009-engces.eng.ces | 19.6 | 0.478 |
| news-test2008-engces.eng.ces | 16.9 | 0.453 |
| newstest2009-engces.eng.ces | 17.8 | 0.468 |
| newstest2010-engces.eng.ces | 18.1 | 0.472 |
| newstest2011-engces.eng.ces | 19.4 | 0.474 |
| newstest2012-engces.eng.ces | 17.4 | 0.454 |
| newstest2013-engces.eng.ces | 20.5 | 0.480 |
| newstest2015-encs-engces.eng.ces | 20.3 | 0.485 |
| newstest2016-encs-engces.eng.ces | 22.9 | 0.505 |
| newstest2017-encs-engces.eng.ces | 18.4 | 0.464 |
| newstest2018-encs-engces.eng.ces | 18.0 | 0.466 |
| newstest2019-encs-engces.eng.ces | 19.4 | 0.474 |
| Tatoeba-test.eng-ces.eng.ces | 41.8 | 0.615 |
| Tatoeba-test.eng-csb.eng.csb | 1.4 | 0.190 |
| Tatoeba-test.eng.multi | 41.3 | 0.619 |
| Tatoeba-test.eng-pol.eng.pol | 40.6 | 0.623 |
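
The chr-F column is a character n-gram F-score. A rough, self-contained sketch of the idea, assuming the common chrF defaults (n = 1..6, beta = 2, whitespace ignored); the scores above come from the release's own evaluation scripts, not from this code:

```python
from collections import Counter

def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams, ignoring whitespace as chrF typically does."""
    text = text.replace(" ", "")
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Average n-gram precision/recall over n = 1..max_n, combined as F(beta)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue
        overlap = sum((hyp & ref).values())
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    return (1 + beta**2) * p * r / (beta**2 * p + r) if p + r > 0 else 0.0

print(round(chrf("Dobrý den", "Dobrý den"), 3))  # 1.0
```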

## opus-2020-07-27.zip

* dataset: opus
* model: transformer
* source language(s): eng
* target language(s): ces csb_Latn dsb hsb pol
* pre-processing: normalization + SentencePiece (spm32k,spm32k)
* a sentence-initial language token is required in the form >>id<< (id = a valid target language ID)
* download: opus-2020-07-27.zip
* test set translations: opus-2020-07-27.test.txt
* test set scores: opus-2020-07-27.eval.txt

### Benchmarks

| testset | BLEU | chr-F |
|---------|-----:|------:|
| newssyscomb2009-engces.eng.ces | 19.8 | 0.480 |
| news-test2008-engces.eng.ces | 17.1 | 0.453 |
| newstest2009-engces.eng.ces | 17.9 | 0.470 |
| newstest2010-engces.eng.ces | 18.3 | 0.474 |
| newstest2011-engces.eng.ces | 19.1 | 0.474 |
| newstest2012-engces.eng.ces | 17.4 | 0.452 |
| newstest2013-engces.eng.ces | 20.1 | 0.478 |
| newstest2015-encs-engces.eng.ces | 19.8 | 0.485 |
| newstest2016-encs-engces.eng.ces | 22.8 | 0.504 |
| newstest2017-encs-engces.eng.ces | 18.6 | 0.465 |
| newstest2018-encs-engces.eng.ces | 18.1 | 0.467 |
| newstest2019-encs-engces.eng.ces | 19.3 | 0.472 |
| Tatoeba-test.eng-ces.eng.ces | 41.5 | 0.614 |
| Tatoeba-test.eng-csb.eng.csb | 3.1 | 0.207 |
| Tatoeba-test.eng-dsb.eng.dsb | 1.8 | 0.157 |
| Tatoeba-test.eng-hsb.eng.hsb | 4.6 | 0.186 |
| Tatoeba-test.eng.multi | 40.9 | 0.616 |
| Tatoeba-test.eng-pol.eng.pol | 40.8 | 0.623 |

## opus2m-2020-08-02.zip

* dataset: opus2m
* model: transformer
* source language(s): eng
* target language(s): ces csb_Latn dsb hsb pol
* pre-processing: normalization + SentencePiece (spm32k,spm32k)
* a sentence-initial language token is required in the form >>id<< (id = a valid target language ID)
* download: opus2m-2020-08-02.zip
* test set translations: opus2m-2020-08-02.test.txt
* test set scores: opus2m-2020-08-02.eval.txt

### Benchmarks

| testset | BLEU | chr-F |
|---------|-----:|------:|
| newssyscomb2009-engces.eng.ces | 20.6 | 0.488 |
| news-test2008-engces.eng.ces | 18.3 | 0.466 |
| newstest2009-engces.eng.ces | 19.8 | 0.483 |
| newstest2010-engces.eng.ces | 19.8 | 0.486 |
| newstest2011-engces.eng.ces | 20.6 | 0.489 |
| newstest2012-engces.eng.ces | 18.6 | 0.464 |
| newstest2013-engces.eng.ces | 22.3 | 0.495 |
| newstest2015-encs-engces.eng.ces | 21.7 | 0.502 |
| newstest2016-encs-engces.eng.ces | 24.5 | 0.521 |
| newstest2017-encs-engces.eng.ces | 20.1 | 0.480 |
| newstest2018-encs-engces.eng.ces | 19.9 | 0.483 |
| newstest2019-encs-engces.eng.ces | 21.2 | 0.490 |
| Tatoeba-test.eng-ces.eng.ces | 43.7 | 0.632 |
| Tatoeba-test.eng-csb.eng.csb | 1.2 | 0.188 |
| Tatoeba-test.eng-dsb.eng.dsb | 1.5 | 0.167 |
| Tatoeba-test.eng-hsb.eng.hsb | 5.7 | 0.199 |
| Tatoeba-test.eng.multi | 42.8 | 0.632 |
| Tatoeba-test.eng-pol.eng.pol | 43.2 | 0.641 |

## opus1m+bt-2021-04-10.zip

* dataset: opus1m+bt
* model: transformer-align
* source language(s): eng
* target language(s): ces csb dsb hsb pol
* pre-processing: normalization + SentencePiece (spm32k,spm32k)
* a sentence-initial language token is required in the form >>id<< (id = a valid target language ID)
* valid language labels: >>ces<< >>csb<< >>csb_Latn<< >>czk<< >>dsb<< >>hsb<< >>pol<< >>pox<< >>slk<< >>szl<<
* download: opus1m+bt-2021-04-10.zip
* test set translations: opus1m+bt-2021-04-10.test.txt
* test set scores: opus1m+bt-2021-04-10.eval.txt

### Benchmarks

| testset | BLEU | chr-F | #sent | #words | BP |
|---------|-----:|------:|------:|-------:|---:|
| newssyscomb2009.eng-ces | 19.7 | 0.476 | 502 | 10032 | 0.976 |
| news-test2008.eng-ces | 16.4 | 0.448 | 2051 | 42484 | 0.978 |
| newstest2009.eng-ces | 17.3 | 0.462 | 2525 | 55533 | 0.981 |
| newstest2010.eng-ces | 17.6 | 0.466 | 2489 | 52958 | 0.979 |
| newstest2011.eng-ces | 19.0 | 0.472 | 3003 | 65653 | 0.950 |
| newstest2012.eng-ces | 16.8 | 0.446 | 3003 | 65456 | 0.934 |
| newstest2013.eng-ces | 20.0 | 0.475 | 3000 | 57250 | 0.955 |
| newstest2015-encs.eng-ces | 19.6 | 0.481 | 2656 | 45931 | 1.000 |
| newstest2016-encs.eng-ces | 22.1 | 0.498 | 2999 | 57013 | 0.985 |
| newstest2017-encs.eng-ces | 18.0 | 0.460 | 3005 | 54461 | 0.970 |
| newstest2018-encs.eng-ces | 17.7 | 0.462 | 2983 | 54772 | 0.992 |
| newstest2019-encs.eng-ces | 18.7 | 0.469 | 1997 | 43373 | 0.971 |
| Tatoeba-test.eng-ces | 39.9 | 0.601 | 10000 | 65287 | 0.983 |
| Tatoeba-test.eng-csb | 6.0 | 0.208 | 27 | 243 | 0.811 |
| Tatoeba-test.eng-dsb | 22.5 | 0.394 | 34 | 184 | 1.000 |
| Tatoeba-test.eng-hsb | 30.6 | 0.458 | 40 | 207 | 1.000 |
| Tatoeba-test.eng-multi | 39.4 | 0.606 | 10000 | 65263 | 0.970 |
| Tatoeba-test.eng-pol | 40.0 | 0.618 | 10000 | 64899 | 0.959 |
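
This table adds sentence counts (#sent), reference word counts (#words), and BLEU's brevity penalty (BP). As a quick reference, BP is 1.0 when the system output is at least as long as the reference and exp(1 - ref_len/hyp_len) otherwise; a small sketch with illustrative lengths (not taken from the evaluation):

```python
import math

def brevity_penalty(hyp_len: int, ref_len: int) -> float:
    """BLEU brevity penalty: penalizes system output shorter than the reference."""
    if hyp_len >= ref_len:
        return 1.0
    return math.exp(1.0 - ref_len / hyp_len)

print(round(brevity_penalty(90, 100), 3))  # 0.895
print(brevity_penalty(100, 100))           # 1.0
```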

## opus4m+btTCv20210807-2021-09-30.zip

* dataset: opus4m+btTCv20210807
* model: transformer
* source language(s): eng
* target language(s): ces csb dsb hsb pol slk szl
* pre-processing: normalization + SentencePiece (spm32k,spm32k)
* a sentence-initial language token is required in the form >>id<< (id = a valid target language ID)
* valid language labels: >>ces<< >>csb<< >>csb_Latn<< >>czk<< >>dsb<< >>hsb<< >>pol<< >>pox<< >>slk<< >>szl<<
* download: opus4m+btTCv20210807-2021-09-30.zip
* test set translations: opus4m+btTCv20210807-2021-09-30.test.txt
* test set scores: opus4m+btTCv20210807-2021-09-30.eval.txt

### Benchmarks

| testset | BLEU | chr-F | #sent | #words | BP |
|---------|-----:|------:|------:|-------:|---:|
| newssyscomb2009.eng-ces | 20.6 | 0.489 | 502 | 10032 | 0.980 |
| news-test2008.eng-ces | 18.1 | 0.464 | 2051 | 42484 | 0.983 |
| newstest2009.eng-ces | 18.9 | 0.478 | 2525 | 55533 | 0.982 |
| newstest2010.eng-ces | 19.6 | 0.486 | 2489 | 52958 | 0.986 |
| newstest2011.eng-ces | 20.8 | 0.489 | 3003 | 65653 | 0.956 |
| newstest2012.eng-ces | 18.2 | 0.462 | 3003 | 65456 | 0.935 |
| newstest2013.eng-ces | 22.0 | 0.494 | 3000 | 57250 | 0.961 |
| newstest2015-encs.eng-ces | 21.0 | 0.495 | 2656 | 45931 | 1.000 |
| newstest2016-encs.eng-ces | 24.5 | 0.518 | 2999 | 57013 | 0.991 |
| newstest2017-encs.eng-ces | 19.6 | 0.476 | 3005 | 54461 | 0.976 |
| newstest2018-encs.eng-ces | 19.4 | 0.480 | 2983 | 54772 | 1.000 |
| newstest2019-encs.eng-ces | 20.8 | 0.488 | 1997 | 43373 | 0.980 |
| Tatoeba-test-v2021-08-07.eng-multi | 39.1 | 0.602 | 10000 | 65766 | 0.987 |
| Tatoeba-test-v2021-08-07.multi-multi | 39.1 | 0.602 | 10000 | 65766 | 0.987 |