eng-inc

opus-2020-06-28.zip

  • dataset: opus
  • model: transformer
  • source language(s): eng
  • target language(s): asm awa ben bho gom guj hif_Latn hin mai mar npi ori pan_Guru pnb rom sin snd_Arab urd
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial target-language token is required, in the form >>id<< (id = a valid target language ID)
  • download: opus-2020-06-28.zip
  • test set translations: opus-2020-06-28.test.txt
  • test set scores: opus-2020-06-28.eval.txt
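
The sentence-initial token requirement above can be sketched as a small helper. This is illustrative code of our own, not part of the released model; the `VALID_TARGETS` set is transcribed from the target-language list above.

```python
# Illustrative helper (not part of the OPUS-MT release): prepend the
# required >>id<< target-language token to an English source sentence.

# Transcribed from the "target language(s)" list above.
VALID_TARGETS = {
    "asm", "awa", "ben", "bho", "gom", "guj", "hif_Latn", "hin", "mai",
    "mar", "npi", "ori", "pan_Guru", "pnb", "rom", "sin", "snd_Arab", "urd",
}

def add_lang_token(sentence: str, target: str) -> str:
    """Prefix a sentence with the sentence-initial >>id<< token."""
    if target not in VALID_TARGETS:
        raise ValueError(f"unknown target language ID: {target!r}")
    return f">>{target}<< {sentence}"

print(add_lang_token("How are you?", "hin"))  # >>hin<< How are you?
```

The token must come before any other text, since the model reads it as the first input symbol to select the target language.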

Benchmarks

testset BLEU chr-F
Tatoeba-test.eng-asm.eng.asm 3.0 0.245
Tatoeba-test.eng-awa.eng.awa 0.4 0.098
Tatoeba-test.eng-ben.eng.ben 16.5 0.481
Tatoeba-test.eng-bho.eng.bho 0.8 0.110
Tatoeba-test.eng-guj.eng.guj 19.9 0.393
Tatoeba-test.eng-hif.eng.hif 0.5 0.022
Tatoeba-test.eng-hin.eng.hin 17.4 0.463
Tatoeba-test.eng-kok.eng.kok 8.1 0.006
Tatoeba-test.eng-lah.eng.lah 0.2 0.001
Tatoeba-test.eng-mai.eng.mai 7.6 0.374
Tatoeba-test.eng-mar.eng.mar 20.4 0.464
Tatoeba-test.eng.multi 17.0 0.442
Tatoeba-test.eng-nep.eng.nep 1.0 0.102
Tatoeba-test.eng-ori.eng.ori 2.2 0.198
Tatoeba-test.eng-pan.eng.pan 8.4 0.343
Tatoeba-test.eng-rom.eng.rom 0.3 0.185
Tatoeba-test.eng-sin.eng.sin 9.5 0.368
Tatoeba-test.eng-snd.eng.snd 6.8 0.343
Tatoeba-test.eng-urd.eng.urd 12.5 0.414

opus-2020-07-06.zip

  • dataset: opus
  • model: transformer
  • source language(s): eng
  • target language(s): asm awa ben bho gom guj hif_Latn hin mai mar npi ori pan_Guru pnb rom san_Deva sin snd_Arab urd
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial target-language token is required, in the form >>id<< (id = a valid target language ID)
  • download: opus-2020-07-06.zip
  • test set translations: opus-2020-07-06.test.txt
  • test set scores: opus-2020-07-06.eval.txt

Benchmarks

testset BLEU chr-F
Tatoeba-test.eng-asm.eng.asm 3.6 0.277
Tatoeba-test.eng-awa.eng.awa 0.4 0.144
Tatoeba-test.eng-ben.eng.ben 15.9 0.466
Tatoeba-test.eng-bho.eng.bho 0.6 0.152
Tatoeba-test.eng-guj.eng.guj 20.9 0.380
Tatoeba-test.eng-hif.eng.hif 0.6 0.032
Tatoeba-test.eng-hin.eng.hin 17.2 0.461
Tatoeba-test.eng-kok.eng.kok 3.3 0.022
Tatoeba-test.eng-lah.eng.lah 0.3 0.007
Tatoeba-test.eng-mai.eng.mai 8.9 0.392
Tatoeba-test.eng-mar.eng.mar 20.1 0.463
Tatoeba-test.eng.multi 16.8 0.439
Tatoeba-test.eng-nep.eng.nep 0.6 0.058
Tatoeba-test.eng-ori.eng.ori 2.2 0.187
Tatoeba-test.eng-pan.eng.pan 9.6 0.351
Tatoeba-test.eng-rom.eng.rom 0.4 0.188
Tatoeba-test.eng-san.eng.san 1.5 0.111
Tatoeba-test.eng-sin.eng.sin 9.1 0.370
Tatoeba-test.eng-snd.eng.snd 1.9 0.235
Tatoeba-test.eng-urd.eng.urd 12.7 0.412

opus-2020-07-26.zip

  • dataset: opus
  • model: transformer
  • source language(s): eng
  • target language(s): asm awa ben bho gom guj hif_Latn hin mai mar npi ori pan_Guru pnb rom san_Deva sin snd_Arab urd
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial target-language token is required, in the form >>id<< (id = a valid target language ID)
  • download: opus-2020-07-26.zip
  • test set translations: opus-2020-07-26.test.txt
  • test set scores: opus-2020-07-26.eval.txt

Benchmarks

testset BLEU chr-F
newsdev2014-enghin.eng.hin 7.5 0.337
newsdev2019-engu-engguj.eng.guj 6.3 0.282
newstest2014-hien-enghin.eng.hin 11.0 0.358
newstest2019-engu-engguj.eng.guj 7.1 0.291
Tatoeba-test.eng-asm.eng.asm 3.7 0.260
Tatoeba-test.eng-awa.eng.awa 0.4 0.144
Tatoeba-test.eng-ben.eng.ben 16.0 0.466
Tatoeba-test.eng-bho.eng.bho 0.6 0.143
Tatoeba-test.eng-guj.eng.guj 20.2 0.375
Tatoeba-test.eng-hif.eng.hif 0.5 0.040
Tatoeba-test.eng-hin.eng.hin 17.3 0.462
Tatoeba-test.eng-kok.eng.kok 3.3 0.044
Tatoeba-test.eng-lah.eng.lah 0.2 0.005
Tatoeba-test.eng-mai.eng.mai 9.3 0.385
Tatoeba-test.eng-mar.eng.mar 19.9 0.461
Tatoeba-test.eng.multi 16.6 0.436
Tatoeba-test.eng-nep.eng.nep 0.7 0.067
Tatoeba-test.eng-ori.eng.ori 2.2 0.196
Tatoeba-test.eng-pan.eng.pan 7.0 0.342
Tatoeba-test.eng-rom.eng.rom 0.4 0.187
Tatoeba-test.eng-san.eng.san 1.7 0.109
Tatoeba-test.eng-sin.eng.sin 9.1 0.365
Tatoeba-test.eng-snd.eng.snd 5.6 0.343
Tatoeba-test.eng-urd.eng.urd 12.9 0.411

opus2m-2020-08-01.zip

  • dataset: opus2m
  • model: transformer
  • source language(s): eng
  • target language(s): asm awa ben bho gom guj hif_Latn hin mai mar npi ori pan_Guru pnb rom san_Deva sin snd_Arab urd
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial target-language token is required, in the form >>id<< (id = a valid target language ID)
  • download: opus2m-2020-08-01.zip
  • test set translations: opus2m-2020-08-01.test.txt
  • test set scores: opus2m-2020-08-01.eval.txt

Benchmarks

testset BLEU chr-F
newsdev2014-enghin.eng.hin 8.2 0.342
newsdev2019-engu-engguj.eng.guj 6.5 0.293
newstest2014-hien-enghin.eng.hin 11.4 0.364
newstest2019-engu-engguj.eng.guj 7.2 0.296
Tatoeba-test.eng-asm.eng.asm 2.7 0.277
Tatoeba-test.eng-awa.eng.awa 0.5 0.132
Tatoeba-test.eng-ben.eng.ben 16.7 0.470
Tatoeba-test.eng-bho.eng.bho 4.3 0.227
Tatoeba-test.eng-guj.eng.guj 17.5 0.373
Tatoeba-test.eng-hif.eng.hif 0.6 0.028
Tatoeba-test.eng-hin.eng.hin 17.7 0.469
Tatoeba-test.eng-kok.eng.kok 1.7 0.000
Tatoeba-test.eng-lah.eng.lah 0.3 0.028
Tatoeba-test.eng-mai.eng.mai 15.6 0.429
Tatoeba-test.eng-mar.eng.mar 21.3 0.477
Tatoeba-test.eng.multi 17.3 0.448
Tatoeba-test.eng-nep.eng.nep 0.8 0.081
Tatoeba-test.eng-ori.eng.ori 2.2 0.208
Tatoeba-test.eng-pan.eng.pan 8.0 0.347
Tatoeba-test.eng-rom.eng.rom 0.4 0.197
Tatoeba-test.eng-san.eng.san 0.5 0.108
Tatoeba-test.eng-sin.eng.sin 9.1 0.364
Tatoeba-test.eng-snd.eng.snd 4.4 0.284
Tatoeba-test.eng-urd.eng.urd 13.3 0.423

opus1m+bt-2021-04-13.zip

  • dataset: opus1m+bt
  • model: transformer-align
  • source language(s): eng
  • target language(s): asm awa ben bho dty gbm gom guj hif hin mai mar nep npi ori pan pnb rmn rmy rom san sin snd urd
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial target-language token is required, in the form >>id<< (id = a valid target language ID from the label list below)
  • valid language labels: >>aee<< >>aeq<< >>anp<< >>anr<< >>asm<< >>awa<< >>bdv<< >>ben<< >>ben_Cyrl<< >>ben_Deva<< >>ben_Gujr<< >>bfb<< >>bfy<< >>bfz<< >>bgc<< >>bgd<< >>bge<< >>bgq<< >>bgw<< >>bha<< >>bhb<< >>bhd<< >>bhe<< >>bhi<< >>bho<< >>bht<< >>bhu<< >>bjj<< >>bkk<< >>bmj<< >>bns<< >>bpx<< >>bpy<< >>bra<< >>btv<< >>ccp<< >>cdh<< >>cdi<< >>cdj<< >>cih<< >>clh<< >>ctg<< >>dcc<< >>dgo<< >>dhd<< >>dhn<< >>dho<< >>div<< >>dmk<< >>dml<< >>doi<< >>dry<< >>dty<< >>dub<< >>duh<< >>dwz<< >>emx<< >>gas<< >>gbk<< >>gbl<< >>gbm<< >>gda<< >>gdx<< >>ggg<< >>ghr<< >>gig<< >>gjk<< >>gju<< >>glh<< >>gom<< >>gra<< >>guj<< >>gwc<< >>gwf<< >>gwt<< >>haj<< >>hca<< >>hif<< >>hif_Latn<< >>hii<< >>hin<< >>hlb<< >>hnd<< >>hne<< >>hno<< >>hns<< >>hoj<< >>jat<< >>jdg<< >>jml<< >>jnd<< >>jns<< >>kas<< >>kbu<< >>keq<< >>key<< >>kfr<< >>kfs<< >>kft<< >>kfu<< >>kfv<< >>kfx<< >>kfy<< >>khn<< >>khw<< >>kjo<< >>kls<< >>knn<< >>kok<< >>kra<< >>ksy<< >>kvx<< >>kxp<< >>kyw<< >>lah<< >>lbm<< >>lhl<< >>lmn<< >>lss<< >>luv<< >>mag<< >>mai<< >>mar<< >>mby<< >>mjl<< >>mjz<< >>mkb<< >>mke<< >>mki<< >>mtr<< >>mup<< >>mve<< >>mvy<< >>mwr<< >>nag<< >>nep<< >>nhh<< >>nli<< >>nlx<< >>noe<< >>noi<< >>npi<< >>odk<< >>omr<< >>ori<< >>ort<< >>ory<< >>pan<< >>pan_Guru<< >>paq<< >>pcl<< >>pgg<< >>phd<< >>phl<< >>phr<< >>pli<< >>plk<< >>plp<< >>pmh<< >>pmu<< >>pnb<< >>pnb_Guru<< >>psh<< >>psi<< >>psu<< >>pwr<< >>qpp<< >>raj<< >>rei<< >>rhg<< >>rjs<< >>rkt<< >>rmc<< >>rmf<< >>rmi<< >>rml<< >>rmn<< >>rmo<< >>rmq<< >>rmt<< >>rmw<< >>rmy<< >>rom<< >>rtw<< >>rwr<< >>san<< >>san_Deva<< >>saz<< >>sbn<< >>sck<< >>scl<< >>sdg<< >>sdr<< >>shd<< >>sin<< >>sjp<< >>skr<< >>smm<< >>smv<< >>snd<< >>snd_Arab<< >>soi<< >>spv<< >>srx<< >>ssi<< >>sts<< >>swv<< >>syl<< >>tdb<< >>the<< >>thl<< >>thq<< >>thr<< >>tkb<< >>tkt<< >>tnv<< >>tra<< >>trw<< >>urd<< >>ush<< >>vaa<< >>vah<< >>vas<< >>vav<< >>ved<< >>vgr<< >>wbr<< >>wry<< >>wsv<< >>wtm<< >>xhe<< >>xka<< >>xnr<<
  • download: opus1m+bt-2021-04-13.zip
  • test set translations: opus1m+bt-2021-04-13.test.txt
  • test set scores: opus1m+bt-2021-04-13.eval.txt
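
Several languages in the valid-label list appear both bare and script-qualified (e.g. >>ben<<, >>ben_Cyrl<<, >>ben_Deva<<). A small sketch, using our own helper name and an excerpt of the label list above, shows one way to look up all valid tokens for a bare ISO-639-3 code:

```python
# Our own helper (not part of OPUS-MT): list the valid target labels for a
# bare ISO-639-3 code, with the bare form (if present) sorted first.

# Excerpt of the valid-label list above.
VALID_LABELS = ["ben", "ben_Cyrl", "ben_Deva", "hin", "pan", "pan_Guru",
                "snd", "snd_Arab", "urd"]

def candidates(iso3: str, labels=VALID_LABELS):
    """Return every label matching the code, bare form first, rest sorted."""
    matches = (l for l in labels if l == iso3 or l.startswith(iso3 + "_"))
    return sorted(matches, key=lambda l: (l != iso3, l))

print(candidates("ben"))  # ['ben', 'ben_Cyrl', 'ben_Deva']
print(candidates("pan"))  # ['pan', 'pan_Guru']
```

Whether the bare or script-qualified token gives better output for a given language is not stated in the model card; the benchmark rows below only cover a subset of labels.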

Benchmarks

testset BLEU chr-F #sent #words BP (brevity penalty)
newsdev2014.eng-hin 8.4 0.363 520 9538 1.000
newsdev2019-engu.eng-guj 7.6 0.312 1998 39137 0.810
newstest2014-hien.eng-hin 12.0 0.384 2507 60878 1.000
newstest2019-engu.eng-guj 7.9 0.320 998 21927 0.806
Tatoeba-test.eng-asm 3.5 0.256 117 569 1.000
Tatoeba-test.eng-awa 0.4 0.084 279 1148 1.000
Tatoeba-test.eng-ben 9.9 0.446 2500 11654 1.000
Tatoeba-test.eng-bho 2.0 0.246 42 244 1.000
Tatoeba-test.eng-gbm 0.3 0.075 39 153 1.000
Tatoeba-test.eng-guj 20.8 0.418 154 824 1.000
Tatoeba-test.eng-hif 0.7 0.038 36 231 1.000
Tatoeba-test.eng-hin 17.0 0.466 5000 32904 1.000
Tatoeba-test.eng-kok 8.1 0.005 1 6 1.000
Tatoeba-test.eng-lah 0.2 0.018 32 182 1.000
Tatoeba-test.eng-mai 7.8 0.304 8 19 1.000
Tatoeba-test.eng-mar 22.1 0.504 10000 58667 0.985
Tatoeba-test.eng-multi 16.3 0.451 10000 59570 1.000
Tatoeba-test.eng-nep 0.7 0.104 115 413 1.000
Tatoeba-test.eng-ori 0.3 0.003 33 205 1.000
Tatoeba-test.eng-pan 6.2 0.312 87 603 1.000
Tatoeba-test.eng-rom 2.9 0.253 671 4974 1.000
Tatoeba-test.eng-san 1.0 0.107 144 389 1.000
Tatoeba-test.eng-sin 8.1 0.350 45 234 1.000
Tatoeba-test.eng-snd 6.5 0.334 4 18 1.000
Tatoeba-test.eng-urd 10.5 0.390 1663 12154 1.000
tico19-test.eng-ben 6.7 0.376 2100 51751 1.000
tico19-test.eng-hin 18.1 0.432 2100 62738 1.000
tico19-test.eng-mar 7.5 0.364 2100 50881 0.844
tico19-test.eng-nep 7.6 0.407 2100 48706 1.000
tico19-test.eng-urd 8.4 0.329 2100 65363 0.943
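
The benchmark rows above follow a fixed column layout (testset, BLEU, chr-F, #sent, #words, BP), so they can be parsed programmatically. The sketch below is our own code, not part of the release; the two sample rows are copied from the table above.

```python
# Our own sketch: parse whitespace-separated benchmark rows into records
# and pick the test set with the highest BLEU.

from typing import NamedTuple

class Score(NamedTuple):
    testset: str
    bleu: float
    chrf: float
    sents: int
    words: int
    bp: float  # brevity penalty

def parse_row(line: str) -> Score:
    """Split one benchmark row into typed fields."""
    name, bleu, chrf, sents, words, bp = line.split()
    return Score(name, float(bleu), float(chrf), int(sents), int(words), float(bp))

rows = [
    "Tatoeba-test.eng-mar 22.1 0.504 10000 58667 0.985",
    "Tatoeba-test.eng-hin 17.0 0.466 5000 32904 1.000",
]
best = max(map(parse_row, rows), key=lambda r: r.bleu)
print(best.testset, best.bleu)  # Tatoeba-test.eng-mar 22.1
```

Note that tiny test sets (e.g. eng-kok with 1 sentence, eng-snd with 4) make the corresponding BLEU and chr-F figures unreliable, so any such comparison should weight by #sent.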