dataset: opus
model: transformer
source language(s): eng
target language(s): asm awa ben bho gom guj hif_Latn hin mai mar npi ori pan_Guru pnb rom sin snd_Arab urd
pre-processing: normalization + SentencePiece (spm32k,spm32k)
a sentence-initial language token is required in the form of >>id<< (id = valid target language ID)
download: opus-2020-06-28.zip
test set translations: opus-2020-06-28.test.txt
test set scores: opus-2020-06-28.eval.txt
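The sentence-initial language token described above can be illustrated with a minimal sketch. The helper name `add_language_token` is hypothetical and not part of any released tooling, and the single space between the token and the sentence is an assumption; the source only states that the token must come first.

```python
def add_language_token(sentence: str, lang_id: str) -> str:
    """Prepend a >>id<< target-language token to a source sentence.

    Hypothetical helper: the space separator is an assumption,
    the source only requires the token to be sentence-initial.
    """
    lang_id = lang_id.strip()
    if not lang_id:
        raise ValueError("a target language ID is required")
    return f">>{lang_id}<< {sentence.strip()}"


# Request Hindi output for an English input sentence.
tagged = add_language_token("How are you?", "hin")
print(tagged)  # >>hin<< How are you?
```

The ID passed in should be one of the target language IDs listed for the release being used, since the set of targets differs between releases.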
| testset | BLEU | chr-F |
|---|---|---|
| Tatoeba-test.eng-asm.eng.asm | 3.0 | 0.245 |
| Tatoeba-test.eng-awa.eng.awa | 0.4 | 0.098 |
| Tatoeba-test.eng-ben.eng.ben | 16.5 | 0.481 |
| Tatoeba-test.eng-bho.eng.bho | 0.8 | 0.110 |
| Tatoeba-test.eng-guj.eng.guj | 19.9 | 0.393 |
| Tatoeba-test.eng-hif.eng.hif | 0.5 | 0.022 |
| Tatoeba-test.eng-hin.eng.hin | 17.4 | 0.463 |
| Tatoeba-test.eng-kok.eng.kok | 8.1 | 0.006 |
| Tatoeba-test.eng-lah.eng.lah | 0.2 | 0.001 |
| Tatoeba-test.eng-mai.eng.mai | 7.6 | 0.374 |
| Tatoeba-test.eng-mar.eng.mar | 20.4 | 0.464 |
| Tatoeba-test.eng.multi | 17.0 | 0.442 |
| Tatoeba-test.eng-nep.eng.nep | 1.0 | 0.102 |
| Tatoeba-test.eng-ori.eng.ori | 2.2 | 0.198 |
| Tatoeba-test.eng-pan.eng.pan | 8.4 | 0.343 |
| Tatoeba-test.eng-rom.eng.rom | 0.3 | 0.185 |
| Tatoeba-test.eng-sin.eng.sin | 9.5 | 0.368 |
| Tatoeba-test.eng-snd.eng.snd | 6.8 | 0.343 |
| Tatoeba-test.eng-urd.eng.urd | 12.5 | 0.414 |
dataset: opus
model: transformer
source language(s): eng
target language(s): asm awa ben bho gom guj hif_Latn hin mai mar npi ori pan_Guru pnb rom san_Deva sin snd_Arab urd
pre-processing: normalization + SentencePiece (spm32k,spm32k)
a sentence-initial language token is required in the form of >>id<< (id = valid target language ID)
download: opus-2020-07-06.zip
test set translations: opus-2020-07-06.test.txt
test set scores: opus-2020-07-06.eval.txt
| testset | BLEU | chr-F |
|---|---|---|
| Tatoeba-test.eng-asm.eng.asm | 3.6 | 0.277 |
| Tatoeba-test.eng-awa.eng.awa | 0.4 | 0.144 |
| Tatoeba-test.eng-ben.eng.ben | 15.9 | 0.466 |
| Tatoeba-test.eng-bho.eng.bho | 0.6 | 0.152 |
| Tatoeba-test.eng-guj.eng.guj | 20.9 | 0.380 |
| Tatoeba-test.eng-hif.eng.hif | 0.6 | 0.032 |
| Tatoeba-test.eng-hin.eng.hin | 17.2 | 0.461 |
| Tatoeba-test.eng-kok.eng.kok | 3.3 | 0.022 |
| Tatoeba-test.eng-lah.eng.lah | 0.3 | 0.007 |
| Tatoeba-test.eng-mai.eng.mai | 8.9 | 0.392 |
| Tatoeba-test.eng-mar.eng.mar | 20.1 | 0.463 |
| Tatoeba-test.eng.multi | 16.8 | 0.439 |
| Tatoeba-test.eng-nep.eng.nep | 0.6 | 0.058 |
| Tatoeba-test.eng-ori.eng.ori | 2.2 | 0.187 |
| Tatoeba-test.eng-pan.eng.pan | 9.6 | 0.351 |
| Tatoeba-test.eng-rom.eng.rom | 0.4 | 0.188 |
| Tatoeba-test.eng-san.eng.san | 1.5 | 0.111 |
| Tatoeba-test.eng-sin.eng.sin | 9.1 | 0.370 |
| Tatoeba-test.eng-snd.eng.snd | 1.9 | 0.235 |
| Tatoeba-test.eng-urd.eng.urd | 12.7 | 0.412 |
dataset: opus
model: transformer
source language(s): eng
target language(s): asm awa ben bho gom guj hif_Latn hin mai mar npi ori pan_Guru pnb rom san_Deva sin snd_Arab urd
pre-processing: normalization + SentencePiece (spm32k,spm32k)
a sentence-initial language token is required in the form of >>id<< (id = valid target language ID)
download: opus-2020-07-26.zip
test set translations: opus-2020-07-26.test.txt
test set scores: opus-2020-07-26.eval.txt
| testset | BLEU | chr-F |
|---|---|---|
| newsdev2014-enghin.eng.hin | 7.5 | 0.337 |
| newsdev2019-engu-engguj.eng.guj | 6.3 | 0.282 |
| newstest2014-hien-enghin.eng.hin | 11.0 | 0.358 |
| newstest2019-engu-engguj.eng.guj | 7.1 | 0.291 |
| Tatoeba-test.eng-asm.eng.asm | 3.7 | 0.260 |
| Tatoeba-test.eng-awa.eng.awa | 0.4 | 0.144 |
| Tatoeba-test.eng-ben.eng.ben | 16.0 | 0.466 |
| Tatoeba-test.eng-bho.eng.bho | 0.6 | 0.143 |
| Tatoeba-test.eng-guj.eng.guj | 20.2 | 0.375 |
| Tatoeba-test.eng-hif.eng.hif | 0.5 | 0.040 |
| Tatoeba-test.eng-hin.eng.hin | 17.3 | 0.462 |
| Tatoeba-test.eng-kok.eng.kok | 3.3 | 0.044 |
| Tatoeba-test.eng-lah.eng.lah | 0.2 | 0.005 |
| Tatoeba-test.eng-mai.eng.mai | 9.3 | 0.385 |
| Tatoeba-test.eng-mar.eng.mar | 19.9 | 0.461 |
| Tatoeba-test.eng.multi | 16.6 | 0.436 |
| Tatoeba-test.eng-nep.eng.nep | 0.7 | 0.067 |
| Tatoeba-test.eng-ori.eng.ori | 2.2 | 0.196 |
| Tatoeba-test.eng-pan.eng.pan | 7.0 | 0.342 |
| Tatoeba-test.eng-rom.eng.rom | 0.4 | 0.187 |
| Tatoeba-test.eng-san.eng.san | 1.7 | 0.109 |
| Tatoeba-test.eng-sin.eng.sin | 9.1 | 0.365 |
| Tatoeba-test.eng-snd.eng.snd | 5.6 | 0.343 |
| Tatoeba-test.eng-urd.eng.urd | 12.9 | 0.411 |
dataset: opus2m
model: transformer
source language(s): eng
target language(s): asm awa ben bho gom guj hif_Latn hin mai mar npi ori pan_Guru pnb rom san_Deva sin snd_Arab urd
pre-processing: normalization + SentencePiece (spm32k,spm32k)
a sentence-initial language token is required in the form of >>id<< (id = valid target language ID)
download: opus2m-2020-08-01.zip
test set translations: opus2m-2020-08-01.test.txt
test set scores: opus2m-2020-08-01.eval.txt
| testset | BLEU | chr-F |
|---|---|---|
| newsdev2014-enghin.eng.hin | 8.2 | 0.342 |
| newsdev2019-engu-engguj.eng.guj | 6.5 | 0.293 |
| newstest2014-hien-enghin.eng.hin | 11.4 | 0.364 |
| newstest2019-engu-engguj.eng.guj | 7.2 | 0.296 |
| Tatoeba-test.eng-asm.eng.asm | 2.7 | 0.277 |
| Tatoeba-test.eng-awa.eng.awa | 0.5 | 0.132 |
| Tatoeba-test.eng-ben.eng.ben | 16.7 | 0.470 |
| Tatoeba-test.eng-bho.eng.bho | 4.3 | 0.227 |
| Tatoeba-test.eng-guj.eng.guj | 17.5 | 0.373 |
| Tatoeba-test.eng-hif.eng.hif | 0.6 | 0.028 |
| Tatoeba-test.eng-hin.eng.hin | 17.7 | 0.469 |
| Tatoeba-test.eng-kok.eng.kok | 1.7 | 0.000 |
| Tatoeba-test.eng-lah.eng.lah | 0.3 | 0.028 |
| Tatoeba-test.eng-mai.eng.mai | 15.6 | 0.429 |
| Tatoeba-test.eng-mar.eng.mar | 21.3 | 0.477 |
| Tatoeba-test.eng.multi | 17.3 | 0.448 |
| Tatoeba-test.eng-nep.eng.nep | 0.8 | 0.081 |
| Tatoeba-test.eng-ori.eng.ori | 2.2 | 0.208 |
| Tatoeba-test.eng-pan.eng.pan | 8.0 | 0.347 |
| Tatoeba-test.eng-rom.eng.rom | 0.4 | 0.197 |
| Tatoeba-test.eng-san.eng.san | 0.5 | 0.108 |
| Tatoeba-test.eng-sin.eng.sin | 9.1 | 0.364 |
| Tatoeba-test.eng-snd.eng.snd | 4.4 | 0.284 |
| Tatoeba-test.eng-urd.eng.urd | 13.3 | 0.423 |
dataset: opus1m+bt
model: transformer-align
source language(s): eng
target language(s): asm awa ben bho dty gbm gom guj hif hin mai mar nep npi ori pan pnb rmn rmy rom san sin snd urd
pre-processing: normalization + SentencePiece (spm32k,spm32k)
a sentence-initial language token is required in the form of >>id<< (id = valid target language ID)
valid language labels: >>aee<< >>aeq<< >>anp<< >>anr<< >>asm<< >>awa<< >>bdv<< >>ben<< >>ben_Cyrl<< >>ben_Deva<< >>ben_Gujr<< >>bfb<< >>bfy<< >>bfz<< >>bgc<< >>bgd<< >>bge<< >>bgq<< >>bgw<< >>bha<< >>bhb<< >>bhd<< >>bhe<< >>bhi<< >>bho<< >>bht<< >>bhu<< >>bjj<< >>bkk<< >>bmj<< >>bns<< >>bpx<< >>bpy<< >>bra<< >>btv<< >>ccp<< >>cdh<< >>cdi<< >>cdj<< >>cih<< >>clh<< >>ctg<< >>dcc<< >>dgo<< >>dhd<< >>dhn<< >>dho<< >>div<< >>dmk<< >>dml<< >>doi<< >>dry<< >>dty<< >>dub<< >>duh<< >>dwz<< >>emx<< >>gas<< >>gbk<< >>gbl<< >>gbm<< >>gda<< >>gdx<< >>ggg<< >>ghr<< >>gig<< >>gjk<< >>gju<< >>glh<< >>gom<< >>gra<< >>guj<< >>gwc<< >>gwf<< >>gwt<< >>haj<< >>hca<< >>hif<< >>hif_Latn<< >>hii<< >>hin<< >>hlb<< >>hnd<< >>hne<< >>hno<< >>hns<< >>hoj<< >>jat<< >>jdg<< >>jml<< >>jnd<< >>jns<< >>kas<< >>kbu<< >>keq<< >>key<< >>kfr<< >>kfs<< >>kft<< >>kfu<< >>kfv<< >>kfx<< >>kfy<< >>khn<< >>khw<< >>kjo<< >>kls<< >>knn<< >>kok<< >>kra<< >>ksy<< >>kvx<< >>kxp<< >>kyw<< >>lah<< >>lbm<< >>lhl<< >>lmn<< >>lss<< >>luv<< >>mag<< >>mai<< >>mar<< >>mby<< >>mjl<< >>mjz<< >>mkb<< >>mke<< >>mki<< >>mtr<< >>mup<< >>mve<< >>mvy<< >>mwr<< >>nag<< >>nep<< >>nhh<< >>nli<< >>nlx<< >>noe<< >>noi<< >>npi<< >>odk<< >>omr<< >>ori<< >>ort<< >>ory<< >>pan<< >>pan_Guru<< >>paq<< >>pcl<< >>pgg<< >>phd<< >>phl<< >>phr<< >>pli<< >>plk<< >>plp<< >>pmh<< >>pmu<< >>pnb<< >>pnb_Guru<< >>psh<< >>psi<< >>psu<< >>pwr<< >>qpp<< >>raj<< >>rei<< >>rhg<< >>rjs<< >>rkt<< >>rmc<< >>rmf<< >>rmi<< >>rml<< >>rmn<< >>rmo<< >>rmq<< >>rmt<< >>rmw<< >>rmy<< >>rom<< >>rtw<< >>rwr<< >>san<< >>san_Deva<< >>saz<< >>sbn<< >>sck<< >>scl<< >>sdg<< >>sdr<< >>shd<< >>sin<< >>sjp<< >>skr<< >>smm<< >>smv<< >>snd<< >>snd_Arab<< >>soi<< >>spv<< >>srx<< >>ssi<< >>sts<< >>swv<< >>syl<< >>tdb<< >>the<< >>thl<< >>thq<< >>thr<< >>tkb<< >>tkt<< >>tnv<< >>tra<< >>trw<< >>urd<< >>ush<< >>vaa<< >>vah<< >>vas<< >>vav<< >>ved<< >>vgr<< >>wbr<< >>wry<< >>wsv<< >>wtm<< >>xhe<< >>xka<< >>xnr<<
download: opus1m+bt-2021-04-13.zip
test set translations: opus1m+bt-2021-04-13.test.txt
test set scores: opus1m+bt-2021-04-13.eval.txt
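Since this release lists its valid language labels explicitly, a requested target ID can be checked against that list before it is used as a token. The sketch below is one way to do that; the function names `parse_labels` and `is_valid_label` are hypothetical, and `sample` is a shortened excerpt of the label line above, not the full list.

```python
import re

# Matches one >>id<< label and captures the ID between the markers.
LABEL_PATTERN = re.compile(r">>([^<>\s]+)<<")


def parse_labels(label_line: str) -> set:
    """Extract the set of language IDs from a line of >>id<< labels."""
    return set(LABEL_PATTERN.findall(label_line))


def is_valid_label(lang_id: str, valid_labels: set) -> bool:
    """Return True if lang_id appears in the parsed label set."""
    return lang_id in valid_labels


# Shortened excerpt of the release's label line (illustrative only).
sample = ">>asm<< >>ben<< >>hif_Latn<< >>hin<< >>pan_Guru<< >>urd<<"
valid = parse_labels(sample)

print(is_valid_label("hin", valid))  # True
print(is_valid_label("deu", valid))  # False
```

A label appearing in the list does not by itself imply usable output quality for that language; the benchmark scores are the better guide there.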
| testset | BLEU | chr-F | #sent | #words | BP |
|---|---|---|---|---|---|
| newsdev2014.eng-hin | 8.4 | 0.363 | 520 | 9538 | 1.000 |
| newsdev2019-engu.eng-guj | 7.6 | 0.312 | 1998 | 39137 | 0.810 |
| newstest2014-hien.eng-hin | 12.0 | 0.384 | 2507 | 60878 | 1.000 |
| newstest2019-engu.eng-guj | 7.9 | 0.320 | 998 | 21927 | 0.806 |
| Tatoeba-test.eng-asm | 3.5 | 0.256 | 117 | 569 | 1.000 |
| Tatoeba-test.eng-awa | 0.4 | 0.084 | 279 | 1148 | 1.000 |
| Tatoeba-test.eng-ben | 9.9 | 0.446 | 2500 | 11654 | 1.000 |
| Tatoeba-test.eng-bho | 2.0 | 0.246 | 42 | 244 | 1.000 |
| Tatoeba-test.eng-gbm | 0.3 | 0.075 | 39 | 153 | 1.000 |
| Tatoeba-test.eng-guj | 20.8 | 0.418 | 154 | 824 | 1.000 |
| Tatoeba-test.eng-hif | 0.7 | 0.038 | 36 | 231 | 1.000 |
| Tatoeba-test.eng-hin | 17.0 | 0.466 | 5000 | 32904 | 1.000 |
| Tatoeba-test.eng-kok | 8.1 | 0.005 | 1 | 6 | 1.000 |
| Tatoeba-test.eng-lah | 0.2 | 0.018 | 32 | 182 | 1.000 |
| Tatoeba-test.eng-mai | 7.8 | 0.304 | 8 | 19 | 1.000 |
| Tatoeba-test.eng-mar | 22.1 | 0.504 | 10000 | 58667 | 0.985 |
| Tatoeba-test.eng-multi | 16.3 | 0.451 | 10000 | 59570 | 1.000 |
| Tatoeba-test.eng-nep | 0.7 | 0.104 | 115 | 413 | 1.000 |
| Tatoeba-test.eng-ori | 0.3 | 0.003 | 33 | 205 | 1.000 |
| Tatoeba-test.eng-pan | 6.2 | 0.312 | 87 | 603 | 1.000 |
| Tatoeba-test.eng-rom | 2.9 | 0.253 | 671 | 4974 | 1.000 |
| Tatoeba-test.eng-san | 1.0 | 0.107 | 144 | 389 | 1.000 |
| Tatoeba-test.eng-sin | 8.1 | 0.350 | 45 | 234 | 1.000 |
| Tatoeba-test.eng-snd | 6.5 | 0.334 | 4 | 18 | 1.000 |
| Tatoeba-test.eng-urd | 10.5 | 0.390 | 1663 | 12154 | 1.000 |
| tico19-test.eng-ben | 6.7 | 0.376 | 2100 | 51751 | 1.000 |
| tico19-test.eng-hin | 18.1 | 0.432 | 2100 | 62738 | 1.000 |
| tico19-test.eng-mar | 7.5 | 0.364 | 2100 | 50881 | 0.844 |
| tico19-test.eng-nep | 7.6 | 0.407 | 2100 | 48706 | 1.000 |
| tico19-test.eng-urd | 8.4 | 0.329 | 2100 | 65363 | 0.943 |
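The BP column in the scores above is not defined in this document. Assuming it denotes the standard BLEU brevity penalty, the sketch below shows how that quantity is computed from the total hypothesis and reference lengths. The function name and the example lengths are illustrative and are not taken from the evaluation files.

```python
import math


def brevity_penalty(hyp_len: int, ref_len: int) -> float:
    """Standard BLEU brevity penalty.

    Returns 1.0 when the system output is at least as long as the
    reference, and exp(1 - ref_len / hyp_len) when it is shorter.
    """
    if hyp_len <= 0:
        return 0.0  # empty output: no n-gram can match
    if hyp_len >= ref_len:
        return 1.0
    return math.exp(1.0 - ref_len / hyp_len)


# Illustrative lengths only (assumed, not from the test sets).
print(round(brevity_penalty(100, 100), 3))  # 1.0
print(round(brevity_penalty(80, 100), 3))   # 0.779
```

Under this reading, a BP below 1.000 would indicate that the system output for that test set was shorter overall than the reference translations.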