dataset: opus
model: transformer
source language(s): eng
target language(s): aze_Latn bak chv crh crh_Latn kaz_Cyrl kaz_Latn kir_Cyrl kjh kum ota_Arab ota_Latn sah tat tat_Arab tat_Latn tuk tuk_Latn tur tyv uig_Arab uig_Cyrl uzb_Cyrl uzb_Latn
pre-processing: normalization + SentencePiece (spm32k,spm32k)
a sentence-initial language token is required in the form of >>id<< (id = valid target language ID)
download: opus-2020-06-28.zip
test set translations: opus-2020-06-28.test.txt
test set scores: opus-2020-06-28.eval.txt
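
The language-token requirement can be illustrated with a small helper that prefixes each source sentence with `>>id<<`. This is a minimal sketch, assuming plain-text input and a single space between the token and the sentence; the function name `add_target_token` and the `TARGET_IDS` set are hypothetical and not part of the released model package.

```python
# Target language IDs copied from the "target language(s)" line above.
TARGET_IDS = frozenset(
    "aze_Latn bak chv crh crh_Latn kaz_Cyrl kaz_Latn kir_Cyrl kjh kum "
    "ota_Arab ota_Latn sah tat tat_Arab tat_Latn tuk tuk_Latn tur tyv "
    "uig_Arab uig_Cyrl uzb_Cyrl uzb_Latn".split()
)


def add_target_token(sentence: str, target_id: str) -> str:
    """Prefix a source sentence with the >>id<< token (hypothetical helper)."""
    if target_id not in TARGET_IDS:
        raise ValueError(f"unknown target language ID: {target_id!r}")
    # Token at the very start; the single separating space is an assumption.
    return f">>{target_id}<< {sentence.strip()}"


if __name__ == "__main__":
    for tid in ("tur", "kaz_Cyrl", "uzb_Latn"):
        print(add_target_token("How are you?", tid))
```

Rejecting unknown IDs up front avoids silently sending a token the model was never trained on.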

| testset | BLEU | chr-F |
|---------|------|-------|
| Tatoeba-test.eng-aze.eng.aze | 26.4 | 0.563 |
| Tatoeba-test.eng-bak.eng.bak | 4.6 | 0.254 |
| Tatoeba-test.eng-chv.eng.chv | 3.8 | 0.271 |
| Tatoeba-test.eng-crh.eng.crh | 9.5 | 0.327 |
| Tatoeba-test.eng-kaz.eng.kaz | 10.8 | 0.350 |
| Tatoeba-test.eng-kir.eng.kir | 25.8 | 0.483 |
| Tatoeba-test.eng-kjh.eng.kjh | 1.9 | 0.034 |
| Tatoeba-test.eng-kum.eng.kum | 3.2 | 0.051 |
| Tatoeba-test.eng.multi | 18.5 | 0.443 |
| Tatoeba-test.eng-ota.eng.ota | 0.5 | 0.061 |
| Tatoeba-test.eng-sah.eng.sah | 0.8 | 0.026 |
| Tatoeba-test.eng-tat.eng.tat | 9.4 | 0.292 |
| Tatoeba-test.eng-tuk.eng.tuk | 5.2 | 0.311 |
| Tatoeba-test.eng-tur.eng.tur | 32.2 | 0.605 |
| Tatoeba-test.eng-tyv.eng.tyv | 7.6 | 0.185 |
| Tatoeba-test.eng-uig.eng.uig | 0.1 | 0.147 |
| Tatoeba-test.eng-uzb.eng.uzb | 2.2 | 0.253 |

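The chr-F column reports the character n-gram F-score (chrF) on a 0 to 1 scale. To make the metric concrete, here is a simplified, sentence-level sketch assuming the common defaults of character n-grams up to length 6 and beta = 2 (recall weighted more than precision). It is not the scoring code used to produce these tables: reference implementations such as sacreBLEU aggregate statistics over the whole test set and handle short strings differently.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    # chrF ignores whitespace, so spaces are removed before extracting n-grams.
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF on a 0-1 scale (illustrative sketch)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # this sketch skips orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    # Average precision and recall over the n-gram orders, then combine.
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * p * r / (b2 * p + r)
```

Because beta = 2, a hypothesis that is too short (high precision, low recall) scores lower than one that is too long by the same amount.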
dataset: opus
model: transformer
source language(s): eng
target language(s): aze_Latn bak chv crh crh_Latn kaz_Cyrl kaz_Latn kir_Cyrl kjh kum ota_Arab ota_Latn sah tat tat_Arab tat_Latn tuk tuk_Latn tur tyv uig_Arab uig_Cyrl uzb_Cyrl uzb_Latn
pre-processing: normalization + SentencePiece (spm32k,spm32k)
a sentence-initial language token is required in the form of >>id<< (id = valid target language ID)
download: opus-2020-07-14.zip
test set translations: opus-2020-07-14.test.txt
test set scores: opus-2020-07-14.eval.txt

| testset | BLEU | chr-F |
|---------|------|-------|
| Tatoeba-test.eng-aze.eng.aze | 25.7 | 0.560 |
| Tatoeba-test.eng-bak.eng.bak | 5.2 | 0.267 |
| Tatoeba-test.eng-chv.eng.chv | 3.7 | 0.264 |
| Tatoeba-test.eng-crh.eng.crh | 7.4 | 0.301 |
| Tatoeba-test.eng-kaz.eng.kaz | 11.4 | 0.353 |
| Tatoeba-test.eng-kir.eng.kir | 25.4 | 0.496 |
| Tatoeba-test.eng-kjh.eng.kjh | 1.3 | 0.035 |
| Tatoeba-test.eng-kum.eng.kum | 2.2 | 0.046 |
| Tatoeba-test.eng.multi | 18.0 | 0.436 |
| Tatoeba-test.eng-ota.eng.ota | 0.2 | 0.059 |
| Tatoeba-test.eng-sah.eng.sah | 0.5 | 0.021 |
| Tatoeba-test.eng-tat.eng.tat | 9.7 | 0.304 |
| Tatoeba-test.eng-tuk.eng.tuk | 5.6 | 0.305 |
| Tatoeba-test.eng-tur.eng.tur | 32.1 | 0.602 |
| Tatoeba-test.eng-tyv.eng.tyv | 4.8 | 0.224 |
| Tatoeba-test.eng-uig.eng.uig | 0.1 | 0.150 |
| Tatoeba-test.eng-uzb.eng.uzb | 3.3 | 0.264 |

dataset: opus
model: transformer
source language(s): eng
target language(s): aze_Latn bak chv crh crh_Latn kaz_Cyrl kaz_Latn kir_Cyrl kjh kum ota_Arab ota_Latn sah tat tat_Arab tat_Latn tuk tuk_Latn tur tyv uig_Arab uig_Cyrl uzb_Cyrl uzb_Latn
pre-processing: normalization + SentencePiece (spm32k,spm32k)
a sentence-initial language token is required in the form of >>id<< (id = valid target language ID)
download: opus-2020-07-20.zip
test set translations: opus-2020-07-20.test.txt
test set scores: opus-2020-07-20.eval.txt

| testset | BLEU | chr-F |
|---------|------|-------|
| Tatoeba-test.eng-aze.eng.aze | 26.4 | 0.569 |
| Tatoeba-test.eng-bak.eng.bak | 7.1 | 0.309 |
| Tatoeba-test.eng-chv.eng.chv | 2.6 | 0.267 |
| Tatoeba-test.eng-crh.eng.crh | 13.9 | 0.330 |
| Tatoeba-test.eng-kaz.eng.kaz | 12.2 | 0.362 |
| Tatoeba-test.eng-kir.eng.kir | 24.5 | 0.486 |
| Tatoeba-test.eng-kjh.eng.kjh | 2.1 | 0.042 |
| Tatoeba-test.eng-kum.eng.kum | 2.6 | 0.080 |
| Tatoeba-test.eng.multi | 18.6 | 0.445 |
| Tatoeba-test.eng-ota.eng.ota | 0.4 | 0.059 |
| Tatoeba-test.eng-sah.eng.sah | 0.6 | 0.035 |
| Tatoeba-test.eng-tat.eng.tat | 9.6 | 0.309 |
| Tatoeba-test.eng-tuk.eng.tuk | 5.3 | 0.311 |
| Tatoeba-test.eng-tur.eng.tur | 32.9 | 0.611 |
| Tatoeba-test.eng-tyv.eng.tyv | 3.4 | 0.232 |
| Tatoeba-test.eng-uig.eng.uig | 0.1 | 0.154 |
| Tatoeba-test.eng-uzb.eng.uzb | 3.1 | 0.267 |

dataset: opus
model: transformer
source language(s): eng
target language(s): aze_Latn bak chv crh crh_Latn kaz_Cyrl kaz_Latn kir_Cyrl kjh kum ota_Arab ota_Latn sah tat tat_Arab tat_Latn tuk tuk_Latn tur tyv uig_Arab uig_Cyrl uzb_Cyrl uzb_Latn
pre-processing: normalization + SentencePiece (spm32k,spm32k)
a sentence-initial language token is required in the form of >>id<< (id = valid target language ID)
download: opus-2020-07-27.zip
test set translations: opus-2020-07-27.test.txt
test set scores: opus-2020-07-27.eval.txt

| testset | BLEU | chr-F |
|---------|------|-------|
| newsdev2016-entr-engtur.eng.tur | 9.5 | 0.423 |
| newstest2016-entr-engtur.eng.tur | 8.0 | 0.397 |
| newstest2017-entr-engtur.eng.tur | 7.8 | 0.394 |
| newstest2018-entr-engtur.eng.tur | 8.2 | 0.396 |
| Tatoeba-test.eng-aze.eng.aze | 26.0 | 0.568 |
| Tatoeba-test.eng-bak.eng.bak | 9.2 | 0.320 |
| Tatoeba-test.eng-chv.eng.chv | 3.9 | 0.266 |
| Tatoeba-test.eng-crh.eng.crh | 7.6 | 0.347 |
| Tatoeba-test.eng-kaz.eng.kaz | 10.4 | 0.352 |
| Tatoeba-test.eng-kir.eng.kir | 26.9 | 0.508 |
| Tatoeba-test.eng-kjh.eng.kjh | 2.0 | 0.052 |
| Tatoeba-test.eng-kum.eng.kum | 2.7 | 0.073 |
| Tatoeba-test.eng.multi | 18.8 | 0.447 |
| Tatoeba-test.eng-ota.eng.ota | 0.4 | 0.064 |
| Tatoeba-test.eng-sah.eng.sah | 0.7 | 0.028 |
| Tatoeba-test.eng-tat.eng.tat | 9.6 | 0.309 |
| Tatoeba-test.eng-tuk.eng.tuk | 5.5 | 0.309 |
| Tatoeba-test.eng-tur.eng.tur | 33.4 | 0.617 |
| Tatoeba-test.eng-tyv.eng.tyv | 3.6 | 0.125 |
| Tatoeba-test.eng-uig.eng.uig | 0.1 | 0.152 |
| Tatoeba-test.eng-uzb.eng.uzb | 3.3 | 0.268 |

dataset: opus2m
model: transformer
source language(s): eng
target language(s): aze_Latn bak chv crh crh_Latn kaz_Cyrl kaz_Latn kir_Cyrl kjh kum ota_Arab ota_Latn sah tat tat_Arab tat_Latn tuk tuk_Latn tur tyv uig_Arab uig_Cyrl uzb_Cyrl uzb_Latn
pre-processing: normalization + SentencePiece (spm32k,spm32k)
a sentence-initial language token is required in the form of >>id<< (id = valid target language ID)
download: opus2m-2020-08-01.zip
test set translations: opus2m-2020-08-01.test.txt
test set scores: opus2m-2020-08-01.eval.txt

| testset | BLEU | chr-F |
|---------|------|-------|
| newsdev2016-entr-engtur.eng.tur | 10.1 | 0.437 |
| newstest2016-entr-engtur.eng.tur | 9.2 | 0.410 |
| newstest2017-entr-engtur.eng.tur | 9.0 | 0.410 |
| newstest2018-entr-engtur.eng.tur | 9.2 | 0.413 |
| Tatoeba-test.eng-aze.eng.aze | 26.8 | 0.577 |
| Tatoeba-test.eng-bak.eng.bak | 7.6 | 0.308 |
| Tatoeba-test.eng-chv.eng.chv | 4.3 | 0.270 |
| Tatoeba-test.eng-crh.eng.crh | 8.1 | 0.330 |
| Tatoeba-test.eng-kaz.eng.kaz | 11.1 | 0.359 |
| Tatoeba-test.eng-kir.eng.kir | 28.6 | 0.524 |
| Tatoeba-test.eng-kjh.eng.kjh | 1.0 | 0.041 |
| Tatoeba-test.eng-kum.eng.kum | 2.2 | 0.075 |
| Tatoeba-test.eng.multi | 19.9 | 0.455 |
| Tatoeba-test.eng-ota.eng.ota | 0.5 | 0.065 |
| Tatoeba-test.eng-sah.eng.sah | 0.7 | 0.030 |
| Tatoeba-test.eng-tat.eng.tat | 9.7 | 0.316 |
| Tatoeba-test.eng-tuk.eng.tuk | 5.9 | 0.317 |
| Tatoeba-test.eng-tur.eng.tur | 34.6 | 0.623 |
| Tatoeba-test.eng-tyv.eng.tyv | 5.4 | 0.210 |
| Tatoeba-test.eng-uig.eng.uig | 0.1 | 0.155 |
| Tatoeba-test.eng-uzb.eng.uzb | 3.4 | 0.275 |

dataset: opus1m+bt
model: transformer-align
source language(s): eng
target language(s): aze bak chv crh kaz kir kjh kum nog ota sah tat tuk tur tyv uig uzb
pre-processing: normalization + SentencePiece (spm32k,spm32k)
a sentence-initial language token is required in the form of >>id<< (id = valid target language ID)
valid language labels: >>aib<< >>alt<< >>atv<< >>azb<< >>aze<< >>aze_Arab<< >>aze_Latn<< >>azj<< >>bak<< >>bgx<< >>chg<< >>chv<< >>cjs<< >>clw<< >>crh<< >>crh_Latn<< >>dlg<< >>gag<< >>ili<< >>jct<< >>kaa<< >>kaz<< >>kaz_Cyrl<< >>kaz_Latn<< >>kdr<< >>kim<< >>kir<< >>kir_Cyrl<< >>kjh<< >>klj<< >>kmz<< >>krc<< >>kum<< >>nog<< >>ota<< >>ota_Arab<< >>ota_Latn<< >>otk<< >>oui<< >>qwm<< >>qxq<< >>sah<< >>slq<< >>slr<< >>sty<< >>tat<< >>tat_Arab<< >>tat_Latn<< >>tuk<< >>tuk_Cyrl<< >>tuk_Latn<< >>tur<< >>tyv<< >>uig<< >>uig_Arab<< >>uig_Cyrl<< >>uig_Latn<< >>uum<< >>uzb<< >>uzb_Cyrl<< >>uzb_Latn<< >>uzn<< >>uzs<< >>xbo<< >>xpc<< >>ybe<<
download: opus1m+bt-2021-04-10.zip
test set translations: opus1m+bt-2021-04-10.test.txt
test set scores: opus1m+bt-2021-04-10.eval.txt
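
The valid labels above mix bare three-letter language codes (e.g. >>kaz<<) with script-qualified variants that append a four-letter ISO 15924 script code (e.g. >>kaz_Cyrl<<, >>kaz_Latn<<). The sketch below splits a label into its two parts and collects all labels for one language. The regular expression is inferred from the listed labels only, and `parse_label`, `Label`, and `labels_for_language` are hypothetical names, not part of the model package.

```python
import re
from typing import List, NamedTuple, Optional

# Pattern inferred from the labels listed above: >>xxx<< or >>xxx_Scri<<.
LABEL_RE = re.compile(r"^>>([a-z]{3})(?:_([A-Z][a-z]{3}))?<<$")


class Label(NamedTuple):
    language: str
    script: Optional[str]  # None when the label has no script suffix


def parse_label(token: str) -> Label:
    """Split a >>id<< token into language code and optional script code."""
    match = LABEL_RE.match(token)
    if match is None:
        raise ValueError(f"not a language label: {token!r}")
    return Label(match.group(1), match.group(2))


def labels_for_language(valid_labels: List[str], language: str) -> List[str]:
    """Return every valid label for one language, bare and script-qualified."""
    return [t for t in valid_labels if parse_label(t).language == language]


if __name__ == "__main__":
    subset = ">>kaz<< >>kaz_Cyrl<< >>kaz_Latn<< >>kir<< >>tur<<".split()
    print(parse_label(">>uig_Arab<<"))
    print(labels_for_language(subset, "kaz"))
```

For a language with several scripts, the script-qualified label is the one to use when a specific writing system is wanted in the output.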

| testset | BLEU | chr-F | #sent | #words | BP |
|---------|------|-------|-------|--------|----|
| newsdev2016-entr.eng-tur | 9.3 | 0.418 | 1001 | 16127 | 0.874 |
| newstest2016-entr.eng-tur | 8.5 | 0.397 | 3000 | 50782 | 0.844 |
| newstest2017-entr.eng-tur | 8.9 | 0.397 | 3007 | 51977 | 0.838 |
| newstest2018-entr.eng-tur | 8.5 | 0.396 | 3000 | 53731 | 0.823 |
| Tatoeba-test.eng-aze | 25.5 | 0.561 | 2659 | 12984 | 1.000 |
| Tatoeba-test.eng-bak | 15.0 | 0.441 | 39 | 179 | 0.977 |
| Tatoeba-test.eng-chv | 4.4 | 0.274 | 333 | 1715 | 1.000 |
| Tatoeba-test.eng-crh | 15.6 | 0.365 | 22 | 105 | 0.857 |
| Tatoeba-test.eng-crh_Latn | 16.5 | 0.382 | 21 | 100 | 0.838 |
| Tatoeba-test.eng-kaz | 12.1 | 0.391 | 397 | 2133 | 0.911 |
| Tatoeba-test.eng-kaz_Cyrl | 12.3 | 0.398 | 390 | 2093 | 0.916 |
| Tatoeba-test.eng-kaz_Latn | 2.4 | 0.052 | 7 | 40 | 0.549 |
| Tatoeba-test.eng-kir | 24.5 | 0.490 | 118 | 548 | 1.000 |
| Tatoeba-test.eng-kjh | 1.3 | 0.015 | 17 | 65 | 1.000 |
| Tatoeba-test.eng-kum | 4.2 | 0.076 | 8 | 33 | 1.000 |
| Tatoeba-test.eng-multi | 18.5 | 0.447 | 10000 | 57483 | 1.000 |
| Tatoeba-test.eng-nog | 0.7 | 0.036 | 83 | 336 | 1.000 |
| Tatoeba-test.eng-ota | 0.6 | 0.073 | 678 | 3724 | 1.000 |
| Tatoeba-test.eng-ota_Arab | 0.4 | 0.009 | 366 | 1993 | 1.000 |
| Tatoeba-test.eng-ota_Latn | 1.0 | 0.135 | 312 | 1731 | 1.000 |
| Tatoeba-test.eng-sah | 1.8 | 0.118 | 39 | 173 | 0.922 |
| Tatoeba-test.eng-tat | 9.9 | 0.321 | 1451 | 8875 | 1.000 |
| Tatoeba-test.eng-tat_Arab | 20.0 | 0.046 | 4 | 16 | 1.000 |
| Tatoeba-test.eng-tat_Latn | 0.8 | 0.121 | 180 | 1500 | 0.884 |
| Tatoeba-test.eng-tuk | 8.4 | 0.364 | 2500 | 15474 | 1.000 |
| Tatoeba-test.eng-tuk_Latn | 8.4 | 0.364 | 2499 | 15473 | 1.000 |
| Tatoeba-test.eng-tur | 31.9 | 0.603 | 10000 | 60466 | 0.900 |
| Tatoeba-test.eng-tyv | 19.6 | 0.302 | 5 | 24 | 0.662 |
| Tatoeba-test.eng-uig | 0.3 | 0.164 | 3024 | 15719 | 1.000 |
| Tatoeba-test.eng-uig_Arab | 0.3 | 0.164 | 3021 | 15702 | 1.000 |
| Tatoeba-test.eng-uig_Cyrl | 3.8 | 0.175 | 3 | 17 | 1.000 |
| Tatoeba-test.eng-uzb | 4.6 | 0.304 | 457 | 2010 | 1.000 |
| Tatoeba-test.eng-uzb_Cyrl | 0.6 | 0.165 | 157 | 761 | 1.000 |
| Tatoeba-test.eng-uzb_Latn | 12.3 | 0.410 | 300 | 1249 | 1.000 |

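The BP column is BLEU's brevity penalty: it is 1.000 when the system output is at least as long as the reference and drops below 1 when the output is shorter, as on the newstest sets and on Tatoeba-test.eng-kaz_Latn (0.549). A minimal sketch of the standard formula BP = exp(1 - r/c) for total hypothesis length c and reference length r (in tokens), plus its inverse, which recovers the output-to-reference length ratio from a reported BP below 1. The function names are illustrative, and token counts in the real scoring depend on the tokenizer used.

```python
import math


def brevity_penalty(hyp_len: int, ref_len: int) -> float:
    """BLEU brevity penalty for total hypothesis/reference lengths in tokens."""
    if hyp_len <= 0:
        return 0.0  # empty output: maximal penalty
    if hyp_len >= ref_len:
        return 1.0  # no penalty when output is at least as long as the reference
    return math.exp(1.0 - ref_len / hyp_len)


def length_ratio_from_bp(bp: float) -> float:
    """Invert BP = exp(1 - r/c) to get the hypothesis/reference ratio c/r.

    Only defined for 0 < bp < 1; a BP of 1.000 just means c/r >= 1.
    """
    if not 0.0 < bp < 1.0:
        raise ValueError("ratio is only recoverable for 0 < BP < 1")
    return 1.0 / (1.0 - math.log(bp))


if __name__ == "__main__":
    for name, bp in [("newstest2018-entr.eng-tur", 0.823),
                     ("Tatoeba-test.eng-kaz_Latn", 0.549)]:
        print(f"{name}: output/reference length ~ {length_ratio_from_bp(bp):.2f}")
```

Applied to the reported values, the newstest2018 output is roughly 0.84 times the reference length and the eng-kaz_Latn output roughly 0.63 times, which helps explain the low BLEU on those sets.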