eng-gmw

OPUS-MT translation models from English (eng) to West Germanic languages (gmw).

opus-2020-06-28.zip

  • dataset: opus
  • model: transformer
  • source language(s): eng
  • target language(s): afr ang_Latn deu enm_Latn frr fry gos gsw ksh ltz nds nld pdc sco stq swg yid
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token of the form >>id<< is required, where id is a valid target language ID (see the usage sketch after this list)
  • download: opus-2020-06-28.zip
  • test set translations: opus-2020-06-28.test.txt
  • test set scores: opus-2020-06-28.eval.txt
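
A minimal usage sketch for this multilingual model, assuming the Hugging Face port is available under the checkpoint name Helsinki-NLP/opus-mt-en-gmw (the name is an assumption; substitute the release you actually use). It shows the required >>id<< target-language token being prepended to the English source; normalization and SentencePiece segmentation are handled by the tokenizer.

```python
# Minimal sketch, not official usage: the checkpoint name below is an
# assumption for the Hugging Face port of this eng-gmw model.
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-en-gmw"  # assumed checkpoint name
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# The >>id<< prefix selects the target language (here Dutch and German).
sources = [
    ">>nld<< The weather is nice today.",
    ">>deu<< The weather is nice today.",
]

batch = tokenizer(sources, return_tensors="pt", padding=True)
generated = model.generate(**batch)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```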

Benchmarks

testset BLEU chr-F
Tatoeba-test.eng-afr.eng.afr 56.0 0.739
Tatoeba-test.eng-ang.eng.ang 5.6 0.146
Tatoeba-test.eng-deu.eng.deu 39.2 0.605
Tatoeba-test.eng-enm.eng.enm 1.8 0.249
Tatoeba-test.eng-frr.eng.frr 8.4 0.250
Tatoeba-test.eng-fry.eng.fry 16.8 0.412
Tatoeba-test.eng-gos.eng.gos 2.6 0.258
Tatoeba-test.eng-gsw.eng.gsw 1.5 0.220
Tatoeba-test.eng-ksh.eng.ksh 1.4 0.225
Tatoeba-test.eng-ltz.eng.ltz 18.0 0.341
Tatoeba-test.eng.multi 39.5 0.593
Tatoeba-test.eng-nds.eng.nds 18.7 0.434
Tatoeba-test.eng-nld.eng.nld 52.4 0.694
Tatoeba-test.eng-pdc.eng.pdc 10.0 0.291
Tatoeba-test.eng-sco.eng.sco 31.6 0.519
Tatoeba-test.eng-stq.eng.stq 9.5 0.407
Tatoeba-test.eng-swg.eng.swg 7.7 0.276
Tatoeba-test.eng-yid.eng.yid 6.6 0.291
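
The BLEU and chr-F figures above are reported per test set; a rough sketch of how comparable scores could be recomputed with sacrebleu is shown below. The exact tooling and options behind the official numbers are not specified here, and the file names are placeholders for hypotheses and references extracted from the released .test.txt file.

```python
# Rough sketch: recompute BLEU and chrF with sacrebleu, assuming hypotheses
# and references have been extracted into plain-text files, one sentence per
# line (file names are placeholders).
import sacrebleu

with open("hyp.afr.txt", encoding="utf-8") as f:
    hypotheses = [line.strip() for line in f]
with open("ref.afr.txt", encoding="utf-8") as f:
    references = [line.strip() for line in f]

bleu = sacrebleu.corpus_bleu(hypotheses, [references])
chrf = sacrebleu.corpus_chrf(hypotheses, [references])
print(f"BLEU  {bleu.score:.1f}")
# recent sacrebleu versions report chrF on a 0-100 scale;
# the tables here use a 0-1 scale, hence the division.
print(f"chr-F {chrf.score / 100:.3f}")
```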

opus-2020-07-26.zip

  • dataset: opus
  • model: transformer
  • source language(s): eng
  • target language(s): afr ang_Latn deu enm_Latn frr fry gos gsw ksh ltz nds nld pdc sco stq swg yid
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token of the form >>id<< is required, where id is a valid target language ID
  • download: opus-2020-07-26.zip
  • test set translations: opus-2020-07-26.test.txt
  • test set scores: opus-2020-07-26.eval.txt

Benchmarks

testset BLEU chr-F
newssyscomb2009-engdeu.eng.deu 20.2 0.512
news-test2008-engdeu.eng.deu 20.5 0.504
newstest2009-engdeu.eng.deu 19.9 0.508
newstest2010-engdeu.eng.deu 22.1 0.520
newstest2011-engdeu.eng.deu 20.0 0.503
newstest2012-engdeu.eng.deu 20.2 0.500
newstest2013-engdeu.eng.deu 23.8 0.527
newstest2015-ende-engdeu.eng.deu 27.0 0.560
newstest2016-ende-engdeu.eng.deu 31.9 0.595
newstest2017-ende-engdeu.eng.deu 25.9 0.551
newstest2018-ende-engdeu.eng.deu 37.8 0.635
newstest2019-ende-engdeu.eng.deu 33.8 0.602
Tatoeba-test.eng-afr.eng.afr 55.0 0.732
Tatoeba-test.eng-ang.eng.ang 6.3 0.161
Tatoeba-test.eng-deu.eng.deu 39.3 0.605
Tatoeba-test.eng-enm.eng.enm 2.0 0.241
Tatoeba-test.eng-frr.eng.frr 8.4 0.249
Tatoeba-test.eng-fry.eng.fry 16.5 0.405
Tatoeba-test.eng-gos.eng.gos 2.5 0.291
Tatoeba-test.eng-gsw.eng.gsw 1.4 0.179
Tatoeba-test.eng-ksh.eng.ksh 1.4 0.184
Tatoeba-test.eng-ltz.eng.ltz 17.1 0.336
Tatoeba-test.eng.multi 40.6 0.599
Tatoeba-test.eng-nds.eng.nds 19.1 0.431
Tatoeba-test.eng-nld.eng.nld 52.3 0.692
Tatoeba-test.eng-pdc.eng.pdc 5.1 0.244
Tatoeba-test.eng-sco.eng.sco 27.6 0.481
Tatoeba-test.eng-stq.eng.stq 5.6 0.384
Tatoeba-test.eng-swg.eng.swg 5.9 0.234
Tatoeba-test.eng-yid.eng.yid 7.1 0.290

opus2m-2020-08-01.zip

  • dataset: opus2m
  • model: transformer
  • source language(s): eng
  • target language(s): afr ang_Latn deu enm_Latn frr fry gos gsw ksh ltz nds nld pdc sco stq swg yid
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token of the form >>id<< is required, where id is a valid target language ID
  • download: opus2m-2020-08-01.zip
  • test set translations: opus2m-2020-08-01.test.txt
  • test set scores: opus2m-2020-08-01.eval.txt

Benchmarks

testset BLEU chr-F
newssyscomb2009-engdeu.eng.deu 21.4 0.518
news-test2008-engdeu.eng.deu 21.0 0.510
newstest2009-engdeu.eng.deu 20.4 0.513
newstest2010-engdeu.eng.deu 22.9 0.528
newstest2011-engdeu.eng.deu 20.5 0.508
newstest2012-engdeu.eng.deu 21.0 0.507
newstest2013-engdeu.eng.deu 24.7 0.533
newstest2015-ende-engdeu.eng.deu 28.2 0.568
newstest2016-ende-engdeu.eng.deu 33.3 0.605
newstest2017-ende-engdeu.eng.deu 26.5 0.559
newstest2018-ende-engdeu.eng.deu 39.9 0.649
newstest2019-ende-engdeu.eng.deu 35.9 0.616
Tatoeba-test.eng-afr.eng.afr 55.7 0.740
Tatoeba-test.eng-ang.eng.ang 6.5 0.164
Tatoeba-test.eng-deu.eng.deu 40.4 0.614
Tatoeba-test.eng-enm.eng.enm 2.3 0.254
Tatoeba-test.eng-frr.eng.frr 8.4 0.248
Tatoeba-test.eng-fry.eng.fry 17.9 0.424
Tatoeba-test.eng-gos.eng.gos 2.2 0.309
Tatoeba-test.eng-gsw.eng.gsw 1.6 0.186
Tatoeba-test.eng-ksh.eng.ksh 1.5 0.189
Tatoeba-test.eng-ltz.eng.ltz 20.2 0.383
Tatoeba-test.eng.multi 41.6 0.609
Tatoeba-test.eng-nds.eng.nds 18.9 0.437
Tatoeba-test.eng-nld.eng.nld 53.1 0.699
Tatoeba-test.eng-pdc.eng.pdc 7.7 0.262
Tatoeba-test.eng-sco.eng.sco 37.7 0.557
Tatoeba-test.eng-stq.eng.stq 5.9 0.380
Tatoeba-test.eng-swg.eng.swg 6.2 0.236
Tatoeba-test.eng-yid.eng.yid 6.8 0.296

opus1m+bt-2021-04-10.zip

  • dataset: opus1m+bt
  • model: transformer-align
  • source language(s): eng
  • target language(s): afr ang bar deu enm frr fry gos gsw hrx jam ksh ltz nds nld pdc sco stq swg tpi yid
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token of the form >>id<< is required, where id is one of the valid target language labels listed below (see the sketch after this list)
  • valid language labels: >>act<< >>afr<< >>afs<< >>aig<< >>ang<< >>ang_Latn<< >>bah<< >>bar<< >>bis<< >>bjs<< >>brc<< >>bzj<< >>bzk<< >>cim<< >>dcr<< >>deu<< >>djk<< >>drt<< >>dum<< >>eng<< >>enm<< >>enm_Latn<< >>fpe<< >>frk<< >>frr<< >>fry<< >>gcl<< >>gct<< >>geh<< >>gmh<< >>gml<< >>goh<< >>gos<< >>gpe<< >>gsw<< >>gul<< >>gyn<< >>hrx<< >>hrx_Latn<< >>hwc<< >>icr<< >>jam<< >>jvd<< >>kri<< >>ksh<< >>kww<< >>lim<< >>lng<< >>ltz<< >>mhn<< >>nds<< >>nld<< >>odt<< >>ofs<< >>oor<< >>osx<< >>pcm<< >>pdc<< >>pdt<< >>pey<< >>pfl<< >>pih<< >>pis<< >>qlm<< >>rop<< >>sco<< >>sdz<< >>skw<< >>sli<< >>srm<< >>srn<< >>stl<< >>stq<< >>svc<< >>swg<< >>sxu<< >>tch<< >>tcs<< >>tgh<< >>tpi<< >>trf<< >>twd<< >>uln<< >>vel<< >>vic<< >>vls<< >>vmf<< >>wae<< >>wep<< >>wes<< >>wym<< >>ydd<< >>yec<< >>yid<< >>yih<< >>zea<<
  • download: opus1m+bt-2021-04-10.zip
  • test set translations: opus1m+bt-2021-04-10.test.txt
  • test set scores: opus1m+bt-2021-04-10.eval.txt
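
A small sketch of one way to prepend the >>id<< prefix programmatically for this release. The label set shown is only an illustrative subset of the valid labels listed above, and the helper function is hypothetical, not part of the released package.

```python
# Hypothetical helper (not part of the release): prepend a target-language
# label from the valid label list to each English source sentence.
VALID_LABELS = {"afr", "deu", "ltz", "nds", "nld", "sco", "tpi", "yid"}  # illustrative subset

def with_target_label(sentences, target):
    if target not in VALID_LABELS:
        raise ValueError(f"unknown target language label: {target}")
    return [f">>{target}<< {s}" for s in sentences]

print(with_target_label(["Good morning!"], "ltz"))
# ['>>ltz<< Good morning!']
```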

Benchmarks

testset BLEU chr-F #sent #words BP
newssyscomb2009.eng-deu 19.6 0.509 502 11271 0.987
news-test2008.eng-deu 19.6 0.496 2051 47427 0.991
newstest2009.eng-deu 19.3 0.504 2525 62816 0.994
newstest2010.eng-deu 21.0 0.512 2489 61511 0.957
newstest2011.eng-deu 19.0 0.496 3003 72981 0.981
newstest2012.eng-deu 19.4 0.494 3003 72886 0.959
newstest2013.eng-deu 22.8 0.520 3000 63737 0.973
newstest2014-deen.eng-deu 22.7 0.539 3003 62964 1.000
newstest2015-ende.eng-deu 25.9 0.551 2169 44260 1.000
newstest2016-ende.eng-deu 30.8 0.588 2999 62670 0.987
newstest2017-ende.eng-deu 24.7 0.542 3004 61291 1.000
newstest2018-ende.eng-deu 35.8 0.623 2998 64276 0.999
newstest2019-ende.eng-deu 31.8 0.588 1997 48969 0.970
Tatoeba-test.eng-afr 54.9 0.731 1374 10314 0.977
Tatoeba-test.eng-ang 6.2 0.144 189 1967 0.986
Tatoeba-test.eng-bar 1.1 0.185 93 807 0.890
Tatoeba-test.eng-deu 36.4 0.585 10000 83334 0.991
Tatoeba-test.eng-enm 1.0 0.223 49 299 1.000
Tatoeba-test.eng-frr 8.4 0.059 2 9 0.882
Tatoeba-test.eng-fry 24.4 0.491 205 1529 0.978
Tatoeba-test.eng-gos 0.6 0.187 1152 5513 1.000
Tatoeba-test.eng-gsw 1.7 0.265 205 984 1.000
Tatoeba-test.eng-hrx 5.3 0.285 221 1297 1.000
Tatoeba-test.eng-jam 2.6 0.139 35 148 1.000
Tatoeba-test.eng-ksh 1.8 0.219 26 208 1.000
Tatoeba-test.eng-ltz 37.9 0.565 283 1733 1.000
Tatoeba-test.eng-multi 38.4 0.583 10000 74720 0.993
Tatoeba-test.eng-nds 21.4 0.454 2500 18258 0.983
Tatoeba-test.eng-nld 49.9 0.677 10000 71423 0.977
Tatoeba-test.eng-pdc 25.5 0.463 53 386 0.976
Tatoeba-test.eng-sco 41.0 0.602 27 214 1.000
Tatoeba-test.eng-stq 22.6 0.419 5 32 1.000
Tatoeba-test.eng-swg 8.5 0.296 33 259 1.000
Tatoeba-test.eng-tpi 26.8 0.495 49 257 1.000
Tatoeba-test.eng-yid 7.0 0.309 1168 8087 0.978
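
In this table, #sent and #words give the size of each test set in sentences and words, and BP is BLEU's brevity penalty. A short sketch of the standard brevity-penalty definition, which penalizes system output that is shorter than the reference:

```python
import math

def brevity_penalty(hyp_len: int, ref_len: int) -> float:
    """Standard BLEU brevity penalty: 1.0 if the hypothesis is at least as
    long as the reference, otherwise exp(1 - ref_len / hyp_len)."""
    if hyp_len >= ref_len:
        return 1.0
    return math.exp(1.0 - ref_len / hyp_len)

# e.g. output about 3% shorter than the reference:
print(round(brevity_penalty(970, 1000), 3))  # 0.970
```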