eng-iir

opus-2020-07-14.zip

  • dataset: opus
  • model: transformer
  • source language(s): eng
  • target language(s): asm awa ben bho gom guj hif_Latn hin jdt_Cyrl kur_Arab kur_Latn mai mar npi ori oss pan_Guru pes pes_Latn pes_Thaa pnb pus rom san_Deva sin snd_Arab tgk_Cyrl tly_Latn urd zza
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token of the form >>id<< is required (id = valid target language ID); see the preparation sketch after this list
  • download: opus-2020-07-14.zip
  • test set translations: opus-2020-07-14.test.txt
  • test set scores: opus-2020-07-14.eval.txt
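
The pre-processing and language-token bullets above translate into a short pipeline: encode the English input with the 32k SentencePiece model from the package and prepend the >>id<< token. The sketch below is a minimal illustration, not the official recipe: the file name source.spm and the exact point at which the token is prepended are assumptions, so check the preprocess script shipped inside opus-2020-07-14.zip.

```python
# Minimal input-preparation sketch (assumptions: the source-side SentencePiece model in the
# zip is named "source.spm", and the >>id<< token is prepended to the encoded line).
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="source.spm")  # spm32k source model (assumed name)

def prepare(sentence: str, target_lang: str) -> str:
    pieces = sp.encode(sentence, out_type=str)       # subword pieces of the source sentence
    return f">>{target_lang}<< " + " ".join(pieces)  # sentence-initial target-language token

print(prepare("How are you today?", "hin"))  # 'hin' must be a valid target language ID
```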

Benchmarks

| testset | BLEU | chr-F |
|---------|------|-------|
| Tatoeba-test.eng-asm.eng.asm | 1.7 | 0.181 |
| Tatoeba-test.eng-awa.eng.awa | 0.2 | 0.041 |
| Tatoeba-test.eng-ben.eng.ben | 14.6 | 0.440 |
| Tatoeba-test.eng-bho.eng.bho | 0.4 | 0.101 |
| Tatoeba-test.eng-fas.eng.fas | 2.9 | 0.216 |
| Tatoeba-test.eng-guj.eng.guj | 14.8 | 0.346 |
| Tatoeba-test.eng-hif.eng.hif | 1.1 | 0.090 |
| Tatoeba-test.eng-hin.eng.hin | 16.1 | 0.445 |
| Tatoeba-test.eng-jdt.eng.jdt | 8.0 | 0.016 |
| Tatoeba-test.eng-kok.eng.kok | 4.1 | 0.006 |
| Tatoeba-test.eng-kur.eng.kur | 3.8 | 0.118 |
| Tatoeba-test.eng-lah.eng.lah | 0.4 | 0.033 |
| Tatoeba-test.eng-mai.eng.mai | 10.9 | 0.398 |
| Tatoeba-test.eng-mar.eng.mar | 18.6 | 0.445 |
| Tatoeba-test.eng.multi | 12.7 | 0.374 |
| Tatoeba-test.eng-nep.eng.nep | 0.7 | 0.028 |
| Tatoeba-test.eng-ori.eng.ori | 1.4 | 0.185 |
| Tatoeba-test.eng-oss.eng.oss | 2.1 | 0.203 |
| Tatoeba-test.eng-pan.eng.pan | 5.3 | 0.322 |
| Tatoeba-test.eng-pus.eng.pus | 0.4 | 0.109 |
| Tatoeba-test.eng-rom.eng.rom | 0.9 | 0.213 |
| Tatoeba-test.eng-san.eng.san | 0.9 | 0.093 |
| Tatoeba-test.eng-sin.eng.sin | 10.8 | 0.370 |
| Tatoeba-test.eng-snd.eng.snd | 2.4 | 0.251 |
| Tatoeba-test.eng-tgk.eng.tgk | 6.5 | 0.328 |
| Tatoeba-test.eng-tly.eng.tly | 0.6 | 0.018 |
| Tatoeba-test.eng-urd.eng.urd | 10.9 | 0.387 |
| Tatoeba-test.eng-zza.eng.zza | 0.6 | 0.033 |

opus-2020-07-19.zip

  • dataset: opus
  • model: transformer
  • source language(s): eng
  • target language(s): asm awa ben bho gom guj hif_Latn hin jdt_Cyrl kur_Arab kur_Latn mai mar npi ori oss pan_Guru pes pes_Latn pes_Thaa pnb pus rom san_Deva sin snd_Arab tgk_Cyrl tly_Latn urd zza
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token of the form >>id<< is required (id = valid target language ID)
  • download: opus-2020-07-19.zip (see the decoding sketch after this list)
  • test set translations: opus-2020-07-19.test.txt
  • test set scores: opus-2020-07-19.eval.txt
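
The packaged checkpoints are Marian NMT models, so a file of prepared (SentencePiece-encoded, >>id<<-prefixed) sentences can be translated with marian-decoder. This is a hedged sketch only: the config name decoder.yml and the file layout of the extracted opus-2020-07-19.zip are assumptions.

```python
# Hedged decoding sketch: feed prepared sentences to the Marian decoder via stdin/stdout.
# "decoder.yml" is an assumed name for the decoder config inside the extracted zip.
import subprocess

with open("input.prepared.txt", "r", encoding="utf-8") as src, \
     open("output.pieces.txt", "w", encoding="utf-8") as out:
    subprocess.run(
        ["marian-decoder", "-c", "decoder.yml"],  # requires Marian NMT installed on PATH
        stdin=src, stdout=out, check=True,
    )
# The output is still subword pieces; decode it with the target-side SentencePiece model.
```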

Benchmarks

| testset | BLEU | chr-F |
|---------|------|-------|
| Tatoeba-test.eng-asm.eng.asm | 1.7 | 0.194 |
| Tatoeba-test.eng-awa.eng.awa | 0.2 | 0.031 |
| Tatoeba-test.eng-ben.eng.ben | 14.9 | 0.447 |
| Tatoeba-test.eng-bho.eng.bho | 0.4 | 0.110 |
| Tatoeba-test.eng-fas.eng.fas | 3.3 | 0.219 |
| Tatoeba-test.eng-guj.eng.guj | 17.3 | 0.366 |
| Tatoeba-test.eng-hif.eng.hif | 1.0 | 0.079 |
| Tatoeba-test.eng-hin.eng.hin | 16.5 | 0.451 |
| Tatoeba-test.eng-jdt.eng.jdt | 9.5 | 0.136 |
| Tatoeba-test.eng-kok.eng.kok | 8.1 | 0.040 |
| Tatoeba-test.eng-kur.eng.kur | 3.1 | 0.123 |
| Tatoeba-test.eng-lah.eng.lah | 0.9 | 0.036 |
| Tatoeba-test.eng-mai.eng.mai | 9.8 | 0.374 |
| Tatoeba-test.eng-mar.eng.mar | 19.3 | 0.456 |
| Tatoeba-test.eng.multi | 13.1 | 0.380 |
| Tatoeba-test.eng-nep.eng.nep | 0.9 | 0.037 |
| Tatoeba-test.eng-ori.eng.ori | 1.3 | 0.190 |
| Tatoeba-test.eng-oss.eng.oss | 2.2 | 0.194 |
| Tatoeba-test.eng-pan.eng.pan | 8.5 | 0.337 |
| Tatoeba-test.eng-pus.eng.pus | 1.0 | 0.123 |
| Tatoeba-test.eng-rom.eng.rom | 1.3 | 0.221 |
| Tatoeba-test.eng-san.eng.san | 1.0 | 0.106 |
| Tatoeba-test.eng-sin.eng.sin | 10.8 | 0.382 |
| Tatoeba-test.eng-snd.eng.snd | 2.8 | 0.205 |
| Tatoeba-test.eng-tgk.eng.tgk | 6.9 | 0.324 |
| Tatoeba-test.eng-tly.eng.tly | 0.6 | 0.024 |
| Tatoeba-test.eng-urd.eng.urd | 11.8 | 0.396 |
| Tatoeba-test.eng-zza.eng.zza | 0.5 | 0.033 |

opus-2020-07-26.zip

  • dataset: opus
  • model: transformer
  • source language(s): eng
  • target language(s): asm awa ben bho gom guj hif_Latn hin jdt_Cyrl kur_Arab kur_Latn mai mar npi ori oss pan_Guru pes pes_Latn pes_Thaa pnb pus rom san_Deva sin snd_Arab tgk_Cyrl tly_Latn urd zza
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token of the form >>id<< is required (id = valid target language ID)
  • download: opus-2020-07-26.zip
  • test set translations: opus-2020-07-26.test.txt
  • test set scores: opus-2020-07-26.eval.txt (a re-scoring sketch follows this list)
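
The released test set translations and scores make the benchmark figures below reproducible. The sketch assumes the hypothesis and reference lines for one language pair have already been split out of opus-2020-07-26.test.txt into one-sentence-per-line files; the exact numbers may differ slightly from the table depending on normalization settings.

```python
# Rough re-scoring sketch with sacrebleu; the two file names are placeholders.
import sacrebleu

hyps = open("hyp.eng-hin.txt", encoding="utf-8").read().splitlines()
refs = open("ref.eng-hin.txt", encoding="utf-8").read().splitlines()

bleu = sacrebleu.corpus_bleu(hyps, [refs])   # corpus-level BLEU
chrf = sacrebleu.corpus_chrf(hyps, [refs])   # corpus-level chrF
print(f"BLEU {bleu.score:.1f}  chr-F {chrf.score / 100:.3f}")  # chr-F is listed as 0-1 below
```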

Benchmarks

| testset | BLEU | chr-F |
|---------|------|-------|
| newsdev2014-enghin.eng.hin | 6.8 | 0.318 |
| newsdev2019-engu-engguj.eng.guj | 5.7 | 0.274 |
| newstest2014-hien-enghin.eng.hin | 9.9 | 0.344 |
| newstest2019-engu-engguj.eng.guj | 6.5 | 0.284 |
| Tatoeba-test.eng-asm.eng.asm | 2.3 | 0.219 |
| Tatoeba-test.eng-awa.eng.awa | 0.3 | 0.026 |
| Tatoeba-test.eng-ben.eng.ben | 15.3 | 0.454 |
| Tatoeba-test.eng-bho.eng.bho | 0.3 | 0.078 |
| Tatoeba-test.eng-fas.eng.fas | 3.5 | 0.222 |
| Tatoeba-test.eng-guj.eng.guj | 17.7 | 0.367 |
| Tatoeba-test.eng-hif.eng.hif | 1.1 | 0.078 |
| Tatoeba-test.eng-hin.eng.hin | 16.7 | 0.455 |
| Tatoeba-test.eng-jdt.eng.jdt | 0.8 | 0.000 |
| Tatoeba-test.eng-kok.eng.kok | 6.6 | 0.006 |
| Tatoeba-test.eng-kur.eng.kur | 2.6 | 0.113 |
| Tatoeba-test.eng-lah.eng.lah | 0.7 | 0.092 |
| Tatoeba-test.eng-mai.eng.mai | 9.8 | 0.371 |
| Tatoeba-test.eng-mar.eng.mar | 19.8 | 0.462 |
| Tatoeba-test.eng.multi | 13.3 | 0.384 |
| Tatoeba-test.eng-nep.eng.nep | 0.4 | 0.013 |
| Tatoeba-test.eng-ori.eng.ori | 1.4 | 0.209 |
| Tatoeba-test.eng-oss.eng.oss | 2.3 | 0.179 |
| Tatoeba-test.eng-pan.eng.pan | 6.9 | 0.329 |
| Tatoeba-test.eng-pus.eng.pus | 1.5 | 0.122 |
| Tatoeba-test.eng-rom.eng.rom | 1.8 | 0.224 |
| Tatoeba-test.eng-san.eng.san | 1.5 | 0.108 |
| Tatoeba-test.eng-sin.eng.sin | 9.9 | 0.378 |
| Tatoeba-test.eng-snd.eng.snd | 4.5 | 0.337 |
| Tatoeba-test.eng-tgk.eng.tgk | 7.1 | 0.337 |
| Tatoeba-test.eng-tly.eng.tly | 0.4 | 0.015 |
| Tatoeba-test.eng-urd.eng.urd | 11.9 | 0.398 |
| Tatoeba-test.eng-zza.eng.zza | 0.4 | 0.026 |

opus2m-2020-08-01.zip

  • dataset: opus2m
  • model: transformer
  • source language(s): eng
  • target language(s): asm awa ben bho gom guj hif_Latn hin jdt_Cyrl kur_Arab kur_Latn mai mar npi ori oss pan_Guru pes pes_Latn pes_Thaa pnb pus rom san_Deva sin snd_Arab tgk_Cyrl tly_Latn urd zza
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token of the form >>id<< is required (id = valid target language ID)
  • download: opus2m-2020-08-01.zip (see the transformers usage sketch after this list)
  • test set translations: opus2m-2020-08-01.test.txt
  • test set scores: opus2m-2020-08-01.eval.txt
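
If this opus2m checkpoint has been converted to a Hugging Face transformers MarianMT model, it can also be used without a local Marian installation. The Hub id Helsinki-NLP/opus-mt-en-iir below is an assumption, not something this README confirms; with the converted model the >>id<< token is simply prepended to the raw source text.

```python
# Hedged usage sketch via Hugging Face transformers ("Helsinki-NLP/opus-mt-en-iir" is an
# assumed Hub id for the converted checkpoint).
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-en-iir"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

text = ">>hin<< How are you today?"          # sentence-initial target-language token
batch = tokenizer([text], return_tensors="pt")
generated = model.generate(**batch)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```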

Benchmarks

| testset | BLEU | chr-F |
|---------|------|-------|
| newsdev2014-enghin.eng.hin | 6.7 | 0.326 |
| newsdev2019-engu-engguj.eng.guj | 6.0 | 0.283 |
| newstest2014-hien-enghin.eng.hin | 10.4 | 0.353 |
| newstest2019-engu-engguj.eng.guj | 6.6 | 0.282 |
| Tatoeba-test.eng-asm.eng.asm | 2.7 | 0.249 |
| Tatoeba-test.eng-awa.eng.awa | 0.4 | 0.122 |
| Tatoeba-test.eng-ben.eng.ben | 15.3 | 0.459 |
| Tatoeba-test.eng-bho.eng.bho | 3.7 | 0.161 |
| Tatoeba-test.eng-fas.eng.fas | 3.4 | 0.227 |
| Tatoeba-test.eng-guj.eng.guj | 18.5 | 0.365 |
| Tatoeba-test.eng-hif.eng.hif | 1.0 | 0.064 |
| Tatoeba-test.eng-hin.eng.hin | 17.0 | 0.461 |
| Tatoeba-test.eng-jdt.eng.jdt | 3.9 | 0.122 |
| Tatoeba-test.eng-kok.eng.kok | 5.5 | 0.059 |
| Tatoeba-test.eng-kur.eng.kur | 4.0 | 0.125 |
| Tatoeba-test.eng-lah.eng.lah | 0.3 | 0.008 |
| Tatoeba-test.eng-mai.eng.mai | 9.3 | 0.445 |
| Tatoeba-test.eng-mar.eng.mar | 20.7 | 0.473 |
| Tatoeba-test.eng.multi | 13.7 | 0.392 |
| Tatoeba-test.eng-nep.eng.nep | 0.6 | 0.060 |
| Tatoeba-test.eng-ori.eng.ori | 2.4 | 0.193 |
| Tatoeba-test.eng-oss.eng.oss | 2.1 | 0.174 |
| Tatoeba-test.eng-pan.eng.pan | 9.7 | 0.355 |
| Tatoeba-test.eng-pus.eng.pus | 1.0 | 0.126 |
| Tatoeba-test.eng-rom.eng.rom | 1.3 | 0.230 |
| Tatoeba-test.eng-san.eng.san | 1.3 | 0.101 |
| Tatoeba-test.eng-sin.eng.sin | 11.7 | 0.384 |
| Tatoeba-test.eng-snd.eng.snd | 2.8 | 0.180 |
| Tatoeba-test.eng-tgk.eng.tgk | 8.1 | 0.353 |
| Tatoeba-test.eng-tly.eng.tly | 0.5 | 0.015 |
| Tatoeba-test.eng-urd.eng.urd | 12.3 | 0.409 |
| Tatoeba-test.eng-zza.eng.zza | 0.5 | 0.025 |