# eng-trk

## opus-2020-06-28.zip

* dataset: opus
* model: transformer
* source language(s): eng
* target language(s): aze_Latn bak chv crh crh_Latn kaz_Cyrl kaz_Latn kir_Cyrl kjh kum ota_Arab ota_Latn sah tat tat_Arab tat_Latn tuk tuk_Latn tur tyv uig_Arab uig_Cyrl uzb_Cyrl uzb_Latn
* pre-processing: normalization + SentencePiece (spm32k,spm32k)
* a sentence-initial language token is required in the form of >>id<< (id = valid target language ID); see the usage sketch after this list
* download: opus-2020-06-28.zip
* test set translations: opus-2020-06-28.test.txt
* test set scores: opus-2020-06-28.eval.txt
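
The `>>id<<` token is the only usage detail that differs from a plain Marian model. A minimal sketch of what that looks like in practice, assuming this model family is also published on the Hugging Face hub as `Helsinki-NLP/opus-mt-en-trk` (that name is an assumption, not part of this release):

```python
# Minimal usage sketch; "Helsinki-NLP/opus-mt-en-trk" is an assumed Hugging
# Face conversion of this model family, not a file from the zip above.
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-en-trk"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# The sentence-initial >>id<< token selects the target language (here Turkish).
batch = tokenizer([">>tur<< How are you today?"], return_tensors="pt", padding=True)
translated = model.generate(**batch)
print(tokenizer.batch_decode(translated, skip_special_tokens=True))
```

Swapping `>>tur<<` for any other target ID from the list above redirects the same model to that language.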

### Benchmarks

| testset | BLEU | chr-F |
|---------|------|-------|
| Tatoeba-test.eng-aze.eng.aze | 26.4 | 0.563 |
| Tatoeba-test.eng-bak.eng.bak | 4.6 | 0.254 |
| Tatoeba-test.eng-chv.eng.chv | 3.8 | 0.271 |
| Tatoeba-test.eng-crh.eng.crh | 9.5 | 0.327 |
| Tatoeba-test.eng-kaz.eng.kaz | 10.8 | 0.350 |
| Tatoeba-test.eng-kir.eng.kir | 25.8 | 0.483 |
| Tatoeba-test.eng-kjh.eng.kjh | 1.9 | 0.034 |
| Tatoeba-test.eng-kum.eng.kum | 3.2 | 0.051 |
| Tatoeba-test.eng.multi | 18.5 | 0.443 |
| Tatoeba-test.eng-ota.eng.ota | 0.5 | 0.061 |
| Tatoeba-test.eng-sah.eng.sah | 0.8 | 0.026 |
| Tatoeba-test.eng-tat.eng.tat | 9.4 | 0.292 |
| Tatoeba-test.eng-tuk.eng.tuk | 5.2 | 0.311 |
| Tatoeba-test.eng-tur.eng.tur | 32.2 | 0.605 |
| Tatoeba-test.eng-tyv.eng.tyv | 7.6 | 0.185 |
| Tatoeba-test.eng-uig.eng.uig | 0.1 | 0.147 |
| Tatoeba-test.eng-uzb.eng.uzb | 2.2 | 0.253 |
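
Scores like these (as published in the `.eval.txt` files) can be recomputed with sacrebleu. A hedged sketch; the file names below are illustrative stand-ins for the system output and references, and recent sacrebleu releases report chrF on a 0-100 scale while this card uses 0-1:

```python
# Recomputing BLEU / chr-F for one language pair; file names are hypothetical.
import sacrebleu

with open("eng-tur.hyp", encoding="utf-8") as f:   # system translations
    hyps = [line.strip() for line in f]
with open("eng-tur.ref", encoding="utf-8") as f:   # reference translations
    refs = [line.strip() for line in f]

bleu = sacrebleu.corpus_bleu(hyps, [refs])
chrf = sacrebleu.corpus_chrf(hyps, [refs])
print(f"BLEU  = {bleu.score:.1f}")
print(f"chr-F = {chrf.score:.3f}")  # divide by 100 to match the 0-1 scale used here
```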

## opus-2020-07-14.zip

* dataset: opus
* model: transformer
* source language(s): eng
* target language(s): aze_Latn bak chv crh crh_Latn kaz_Cyrl kaz_Latn kir_Cyrl kjh kum ota_Arab ota_Latn sah tat tat_Arab tat_Latn tuk tuk_Latn tur tyv uig_Arab uig_Cyrl uzb_Cyrl uzb_Latn
* pre-processing: normalization + SentencePiece (spm32k,spm32k); see the SentencePiece sketch after this list
* a sentence-initial language token is required in the form of >>id<< (id = valid target language ID)
* download: opus-2020-07-14.zip
* test set translations: opus-2020-07-14.test.txt
* test set scores: opus-2020-07-14.eval.txt
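
The pre-processing bullet above names the pipeline; here is a minimal sketch of the SentencePiece step ("spm32k" = a 32k-piece model per side), assuming the source-side model inside the zip is named `source.spm` (an assumption based on common OPUS-MT packaging, not verified against this zip):

```python
# Encoding input with the 32k SentencePiece model; the file name
# "source.spm" is assumed, not confirmed for this release.
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="source.spm")
pieces = sp.encode("How are you today?", out_type=str)
print(pieces)              # subword pieces as fed to the transformer
print(sp.decode(pieces))   # round-trips back to the normalized input
```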

### Benchmarks

| testset | BLEU | chr-F |
|---------|------|-------|
| Tatoeba-test.eng-aze.eng.aze | 25.7 | 0.560 |
| Tatoeba-test.eng-bak.eng.bak | 5.2 | 0.267 |
| Tatoeba-test.eng-chv.eng.chv | 3.7 | 0.264 |
| Tatoeba-test.eng-crh.eng.crh | 7.4 | 0.301 |
| Tatoeba-test.eng-kaz.eng.kaz | 11.4 | 0.353 |
| Tatoeba-test.eng-kir.eng.kir | 25.4 | 0.496 |
| Tatoeba-test.eng-kjh.eng.kjh | 1.3 | 0.035 |
| Tatoeba-test.eng-kum.eng.kum | 2.2 | 0.046 |
| Tatoeba-test.eng.multi | 18.0 | 0.436 |
| Tatoeba-test.eng-ota.eng.ota | 0.2 | 0.059 |
| Tatoeba-test.eng-sah.eng.sah | 0.5 | 0.021 |
| Tatoeba-test.eng-tat.eng.tat | 9.7 | 0.304 |
| Tatoeba-test.eng-tuk.eng.tuk | 5.6 | 0.305 |
| Tatoeba-test.eng-tur.eng.tur | 32.1 | 0.602 |
| Tatoeba-test.eng-tyv.eng.tyv | 4.8 | 0.224 |
| Tatoeba-test.eng-uig.eng.uig | 0.1 | 0.150 |
| Tatoeba-test.eng-uzb.eng.uzb | 3.3 | 0.264 |

## opus-2020-07-20.zip

* dataset: opus
* model: transformer
* source language(s): eng
* target language(s): aze_Latn bak chv crh crh_Latn kaz_Cyrl kaz_Latn kir_Cyrl kjh kum ota_Arab ota_Latn sah tat tat_Arab tat_Latn tuk tuk_Latn tur tyv uig_Arab uig_Cyrl uzb_Cyrl uzb_Latn
* pre-processing: normalization + SentencePiece (spm32k,spm32k)
* a sentence-initial language token is required in the form of >>id<< (id = valid target language ID)
* download: opus-2020-07-20.zip
* test set translations: opus-2020-07-20.test.txt
* test set scores: opus-2020-07-20.eval.txt

### Benchmarks

| testset | BLEU | chr-F |
|---------|------|-------|
| Tatoeba-test.eng-aze.eng.aze | 26.4 | 0.569 |
| Tatoeba-test.eng-bak.eng.bak | 7.1 | 0.309 |
| Tatoeba-test.eng-chv.eng.chv | 2.6 | 0.267 |
| Tatoeba-test.eng-crh.eng.crh | 13.9 | 0.330 |
| Tatoeba-test.eng-kaz.eng.kaz | 12.2 | 0.362 |
| Tatoeba-test.eng-kir.eng.kir | 24.5 | 0.486 |
| Tatoeba-test.eng-kjh.eng.kjh | 2.1 | 0.042 |
| Tatoeba-test.eng-kum.eng.kum | 2.6 | 0.080 |
| Tatoeba-test.eng.multi | 18.6 | 0.445 |
| Tatoeba-test.eng-ota.eng.ota | 0.4 | 0.059 |
| Tatoeba-test.eng-sah.eng.sah | 0.6 | 0.035 |
| Tatoeba-test.eng-tat.eng.tat | 9.6 | 0.309 |
| Tatoeba-test.eng-tuk.eng.tuk | 5.3 | 0.311 |
| Tatoeba-test.eng-tur.eng.tur | 32.9 | 0.611 |
| Tatoeba-test.eng-tyv.eng.tyv | 3.4 | 0.232 |
| Tatoeba-test.eng-uig.eng.uig | 0.1 | 0.154 |
| Tatoeba-test.eng-uzb.eng.uzb | 3.1 | 0.267 |

## opus-2020-07-27.zip

* dataset: opus
* model: transformer
* source language(s): eng
* target language(s): aze_Latn bak chv crh crh_Latn kaz_Cyrl kaz_Latn kir_Cyrl kjh kum ota_Arab ota_Latn sah tat tat_Arab tat_Latn tuk tuk_Latn tur tyv uig_Arab uig_Cyrl uzb_Cyrl uzb_Latn
* pre-processing: normalization + SentencePiece (spm32k,spm32k)
* a sentence-initial language token is required in the form of >>id<< (id = valid target language ID)
* download: opus-2020-07-27.zip
* test set translations: opus-2020-07-27.test.txt
* test set scores: opus-2020-07-27.eval.txt

### Benchmarks

| testset | BLEU | chr-F |
|---------|------|-------|
| newsdev2016-entr-engtur.eng.tur | 9.5 | 0.423 |
| newstest2016-entr-engtur.eng.tur | 8.0 | 0.397 |
| newstest2017-entr-engtur.eng.tur | 7.8 | 0.394 |
| newstest2018-entr-engtur.eng.tur | 8.2 | 0.396 |
| Tatoeba-test.eng-aze.eng.aze | 26.0 | 0.568 |
| Tatoeba-test.eng-bak.eng.bak | 9.2 | 0.320 |
| Tatoeba-test.eng-chv.eng.chv | 3.9 | 0.266 |
| Tatoeba-test.eng-crh.eng.crh | 7.6 | 0.347 |
| Tatoeba-test.eng-kaz.eng.kaz | 10.4 | 0.352 |
| Tatoeba-test.eng-kir.eng.kir | 26.9 | 0.508 |
| Tatoeba-test.eng-kjh.eng.kjh | 2.0 | 0.052 |
| Tatoeba-test.eng-kum.eng.kum | 2.7 | 0.073 |
| Tatoeba-test.eng.multi | 18.8 | 0.447 |
| Tatoeba-test.eng-ota.eng.ota | 0.4 | 0.064 |
| Tatoeba-test.eng-sah.eng.sah | 0.7 | 0.028 |
| Tatoeba-test.eng-tat.eng.tat | 9.6 | 0.309 |
| Tatoeba-test.eng-tuk.eng.tuk | 5.5 | 0.309 |
| Tatoeba-test.eng-tur.eng.tur | 33.4 | 0.617 |
| Tatoeba-test.eng-tyv.eng.tyv | 3.6 | 0.125 |
| Tatoeba-test.eng-uig.eng.uig | 0.1 | 0.152 |
| Tatoeba-test.eng-uzb.eng.uzb | 3.3 | 0.268 |

## opus2m-2020-08-01.zip

* dataset: opus2m
* model: transformer
* source language(s): eng
* target language(s): aze_Latn bak chv crh crh_Latn kaz_Cyrl kaz_Latn kir_Cyrl kjh kum ota_Arab ota_Latn sah tat tat_Arab tat_Latn tuk tuk_Latn tur tyv uig_Arab uig_Cyrl uzb_Cyrl uzb_Latn
* pre-processing: normalization + SentencePiece (spm32k,spm32k)
* a sentence-initial language token is required in the form of >>id<< (id = valid target language ID)
* download: opus2m-2020-08-01.zip
* test set translations: opus2m-2020-08-01.test.txt
* test set scores: opus2m-2020-08-01.eval.txt

### Benchmarks

| testset | BLEU | chr-F |
|---------|------|-------|
| newsdev2016-entr-engtur.eng.tur | 10.1 | 0.437 |
| newstest2016-entr-engtur.eng.tur | 9.2 | 0.410 |
| newstest2017-entr-engtur.eng.tur | 9.0 | 0.410 |
| newstest2018-entr-engtur.eng.tur | 9.2 | 0.413 |
| Tatoeba-test.eng-aze.eng.aze | 26.8 | 0.577 |
| Tatoeba-test.eng-bak.eng.bak | 7.6 | 0.308 |
| Tatoeba-test.eng-chv.eng.chv | 4.3 | 0.270 |
| Tatoeba-test.eng-crh.eng.crh | 8.1 | 0.330 |
| Tatoeba-test.eng-kaz.eng.kaz | 11.1 | 0.359 |
| Tatoeba-test.eng-kir.eng.kir | 28.6 | 0.524 |
| Tatoeba-test.eng-kjh.eng.kjh | 1.0 | 0.041 |
| Tatoeba-test.eng-kum.eng.kum | 2.2 | 0.075 |
| Tatoeba-test.eng.multi | 19.9 | 0.455 |
| Tatoeba-test.eng-ota.eng.ota | 0.5 | 0.065 |
| Tatoeba-test.eng-sah.eng.sah | 0.7 | 0.030 |
| Tatoeba-test.eng-tat.eng.tat | 9.7 | 0.316 |
| Tatoeba-test.eng-tuk.eng.tuk | 5.9 | 0.317 |
| Tatoeba-test.eng-tur.eng.tur | 34.6 | 0.623 |
| Tatoeba-test.eng-tyv.eng.tyv | 5.4 | 0.210 |
| Tatoeba-test.eng-uig.eng.uig | 0.1 | 0.155 |
| Tatoeba-test.eng-uzb.eng.uzb | 3.4 | 0.275 |

## opus1m+bt-2021-04-10.zip

* dataset: opus1m+bt
* model: transformer-align
* source language(s): eng
* target language(s): aze bak chv crh kaz kir kjh kum nog ota sah tat tuk tur tyv uig uzb
* pre-processing: normalization + SentencePiece (spm32k,spm32k)
* a sentence-initial language token is required in the form of >>id<< (id = valid target language ID)
* valid language labels: >>aib<< >>alt<< >>atv<< >>azb<< >>aze<< >>aze_Arab<< >>aze_Latn<< >>azj<< >>bak<< >>bgx<< >>chg<< >>chv<< >>cjs<< >>clw<< >>crh<< >>crh_Latn<< >>dlg<< >>gag<< >>ili<< >>jct<< >>kaa<< >>kaz<< >>kaz_Cyrl<< >>kaz_Latn<< >>kdr<< >>kim<< >>kir<< >>kir_Cyrl<< >>kjh<< >>klj<< >>kmz<< >>krc<< >>kum<< >>nog<< >>ota<< >>ota_Arab<< >>ota_Latn<< >>otk<< >>oui<< >>qwm<< >>qxq<< >>sah<< >>slq<< >>slr<< >>sty<< >>tat<< >>tat_Arab<< >>tat_Latn<< >>tuk<< >>tuk_Cyrl<< >>tuk_Latn<< >>tur<< >>tyv<< >>uig<< >>uig_Arab<< >>uig_Cyrl<< >>uig_Latn<< >>uum<< >>uzb<< >>uzb_Cyrl<< >>uzb_Latn<< >>uzn<< >>uzs<< >>xbo<< >>xpc<< >>ybe<< (see the mixed-target sketch after this list)
* download: opus1m+bt-2021-04-10.zip
* test set translations: opus1m+bt-2021-04-10.test.txt
* test set scores: opus1m+bt-2021-04-10.eval.txt
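
Because this release accepts the long label list above, a single batch can mix target languages and scripts. A sketch under the same assumption as the first example (a Hugging Face conversion named `Helsinki-NLP/opus-mt-en-trk`, which may not correspond to this exact 2021 checkpoint):

```python
# Mixed-target batch; the model name is an assumed conversion, possibly of an
# earlier checkpoint than this 2021 release.
from transformers import MarianMTModel, MarianTokenizer

name = "Helsinki-NLP/opus-mt-en-trk"
tok = MarianTokenizer.from_pretrained(name)
mt = MarianMTModel.from_pretrained(name)

src = [
    ">>tur<< Good morning!",
    ">>kaz<< Good morning!",
    ">>uzb_Latn<< Good morning!",  # script-specific label from the list above
]
batch = tok(src, return_tensors="pt", padding=True)
print(tok.batch_decode(mt.generate(**batch), skip_special_tokens=True))
```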

### Benchmarks

(#sent and #words give the test set size in sentences and words; BP is the BLEU brevity penalty.)

| testset | BLEU | chr-F | #sent | #words | BP |
|---------|------|-------|-------|--------|----|
| newsdev2016-entr.eng-tur | 9.3 | 0.418 | 1001 | 16127 | 0.874 |
| newstest2016-entr.eng-tur | 8.5 | 0.397 | 3000 | 50782 | 0.844 |
| newstest2017-entr.eng-tur | 8.9 | 0.397 | 3007 | 51977 | 0.838 |
| newstest2018-entr.eng-tur | 8.5 | 0.396 | 3000 | 53731 | 0.823 |
| Tatoeba-test.eng-aze | 25.5 | 0.561 | 2659 | 12984 | 1.000 |
| Tatoeba-test.eng-bak | 15.0 | 0.441 | 39 | 179 | 0.977 |
| Tatoeba-test.eng-chv | 4.4 | 0.274 | 333 | 1715 | 1.000 |
| Tatoeba-test.eng-crh | 15.6 | 0.365 | 22 | 105 | 0.857 |
| Tatoeba-test.eng-crh_Latn | 16.5 | 0.382 | 21 | 100 | 0.838 |
| Tatoeba-test.eng-kaz | 12.1 | 0.391 | 397 | 2133 | 0.911 |
| Tatoeba-test.eng-kaz_Cyrl | 12.3 | 0.398 | 390 | 2093 | 0.916 |
| Tatoeba-test.eng-kaz_Latn | 2.4 | 0.052 | 7 | 40 | 0.549 |
| Tatoeba-test.eng-kir | 24.5 | 0.490 | 118 | 548 | 1.000 |
| Tatoeba-test.eng-kjh | 1.3 | 0.015 | 17 | 65 | 1.000 |
| Tatoeba-test.eng-kum | 4.2 | 0.076 | 8 | 33 | 1.000 |
| Tatoeba-test.eng-multi | 18.5 | 0.447 | 10000 | 57483 | 1.000 |
| Tatoeba-test.eng-nog | 0.7 | 0.036 | 83 | 336 | 1.000 |
| Tatoeba-test.eng-ota | 0.6 | 0.073 | 678 | 3724 | 1.000 |
| Tatoeba-test.eng-ota_Arab | 0.4 | 0.009 | 366 | 1993 | 1.000 |
| Tatoeba-test.eng-ota_Latn | 1.0 | 0.135 | 312 | 1731 | 1.000 |
| Tatoeba-test.eng-sah | 1.8 | 0.118 | 39 | 173 | 0.922 |
| Tatoeba-test.eng-tat | 9.9 | 0.321 | 1451 | 8875 | 1.000 |
| Tatoeba-test.eng-tat_Arab | 20.0 | 0.046 | 4 | 16 | 1.000 |
| Tatoeba-test.eng-tat_Latn | 0.8 | 0.121 | 180 | 1500 | 0.884 |
| Tatoeba-test.eng-tuk | 8.4 | 0.364 | 2500 | 15474 | 1.000 |
| Tatoeba-test.eng-tuk_Latn | 8.4 | 0.364 | 2499 | 15473 | 1.000 |
| Tatoeba-test.eng-tur | 31.9 | 0.603 | 10000 | 60466 | 0.900 |
| Tatoeba-test.eng-tyv | 19.6 | 0.302 | 5 | 24 | 0.662 |
| Tatoeba-test.eng-uig | 0.3 | 0.164 | 3024 | 15719 | 1.000 |
| Tatoeba-test.eng-uig_Arab | 0.3 | 0.164 | 3021 | 15702 | 1.000 |
| Tatoeba-test.eng-uig_Cyrl | 3.8 | 0.175 | 3 | 17 | 1.000 |
| Tatoeba-test.eng-uzb | 4.6 | 0.304 | 457 | 2010 | 1.000 |
| Tatoeba-test.eng-uzb_Cyrl | 0.6 | 0.165 | 157 | 761 | 1.000 |
| Tatoeba-test.eng-uzb_Latn | 12.3 | 0.410 | 300 | 1249 | 1.000 |