eng-bnt

opus-2020-06-28.zip

  • dataset: opus
  • model: transformer
  • source language(s): eng
  • target language(s): kdx kin lin lug nya run sna swh toi_Latn tso umb xho zul
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required in the form >>id<< (id = a valid target language ID)
  • download: opus-2020-06-28.zip
  • test set translations: opus-2020-06-28.test.txt
  • test set scores: opus-2020-06-28.eval.txt
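The language-token requirement above can be illustrated with a minimal sketch. The helper name `add_target_token` is hypothetical, and the single space between the token and the sentence is an assumption based on common OPUS-MT usage; this page does not specify the separator.

```python
def add_target_token(sentence: str, lang_id: str) -> str:
    """Prefix a source sentence with the >>id<< target-language token.

    Assumption: one space separates the token from the sentence text.
    """
    return f">>{lang_id}<< {sentence.strip()}"


# English source sentences tagged for two of the listed target languages.
tagged = [
    add_target_token("How are you?", "zul"),
    add_target_token("Good morning.", "swh"),
]
for line in tagged:
    print(line)
```

The tagged strings are what would be passed on to normalization and SentencePiece segmentation before translation.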

Benchmarks

| testset | BLEU | chr-F |
|---|---|---|
| Tatoeba-test.eng-kdx.eng.kdx | 2.8 | 0.266 |
| Tatoeba-test.eng-kin.eng.kin | 5.1 | 0.522 |
| Tatoeba-test.eng-lin.eng.lin | 1.0 | 0.267 |
| Tatoeba-test.eng-lug.eng.lug | 15.2 | 0.576 |
| Tatoeba-test.eng.multi | 13.6 | 0.490 |
| Tatoeba-test.eng-nya.eng.nya | 16.2 | 0.600 |
| Tatoeba-test.eng-run.eng.run | 12.6 | 0.476 |
| Tatoeba-test.eng-sna.eng.sna | 24.9 | 0.633 |
| Tatoeba-test.eng-swa.eng.swa | 1.5 | 0.149 |
| Tatoeba-test.eng-toi.eng.toi | 8.3 | 0.210 |
| Tatoeba-test.eng-tso.eng.tso | 41.3 | 0.698 |
| Tatoeba-test.eng-umb.eng.umb | 4.6 | 0.349 |
| Tatoeba-test.eng-xho.eng.xho | 29.1 | 0.619 |
| Tatoeba-test.eng-zul.eng.zul | 30.4 | 0.749 |

opus-2020-07-06.zip

  • dataset: opus
  • model: transformer
  • source language(s): eng
  • target language(s): kin lin lug nya run sna swh toi_Latn tso umb xho zul
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required in the form >>id<< (id = a valid target language ID)
  • download: opus-2020-07-06.zip
  • test set translations: opus-2020-07-06.test.txt
  • test set scores: opus-2020-07-06.eval.txt

Benchmarks

| testset | BLEU | chr-F |
|---|---|---|
| Tatoeba-test.eng-kin.eng.kin | 8.1 | 0.596 |
| Tatoeba-test.eng-lin.eng.lin | 1.3 | 0.276 |
| Tatoeba-test.eng-lug.eng.lug | 4.8 | 0.503 |
| Tatoeba-test.eng.multi | 13.7 | 0.491 |
| Tatoeba-test.eng-nya.eng.nya | 20.1 | 0.623 |
| Tatoeba-test.eng-run.eng.run | 13.0 | 0.478 |
| Tatoeba-test.eng-sna.eng.sna | 29.6 | 0.618 |
| Tatoeba-test.eng-swa.eng.swa | 1.3 | 0.156 |
| Tatoeba-test.eng-toi.eng.toi | 14.1 | 0.290 |
| Tatoeba-test.eng-tso.eng.tso | 32.9 | 0.607 |
| Tatoeba-test.eng-umb.eng.umb | 4.0 | 0.330 |
| Tatoeba-test.eng-xho.eng.xho | 23.4 | 0.600 |
| Tatoeba-test.eng-zul.eng.zul | 33.0 | 0.725 |

opus-2020-07-14.zip

  • dataset: opus
  • model: transformer
  • source language(s): eng
  • target language(s): kin lin lug nya run sna swh toi_Latn tso umb xho zul
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required in the form >>id<< (id = a valid target language ID)
  • download: opus-2020-07-14.zip
  • test set translations: opus-2020-07-14.test.txt
  • test set scores: opus-2020-07-14.eval.txt

Benchmarks

| testset | BLEU | chr-F |
|---|---|---|
| Tatoeba-test.eng-kin.eng.kin | 10.2 | 0.540 |
| Tatoeba-test.eng-lin.eng.lin | 1.1 | 0.275 |
| Tatoeba-test.eng-lug.eng.lug | 5.1 | 0.433 |
| Tatoeba-test.eng.multi | 12.0 | 0.444 |
| Tatoeba-test.eng-nya.eng.nya | 25.7 | 0.621 |
| Tatoeba-test.eng-run.eng.run | 13.2 | 0.487 |
| Tatoeba-test.eng-sna.eng.sna | 32.3 | 0.652 |
| Tatoeba-test.eng-toi.eng.toi | 10.7 | 0.255 |
| Tatoeba-test.eng-tso.eng.tso | 41.3 | 0.698 |
| Tatoeba-test.eng-umb.eng.umb | 4.4 | 0.329 |
| Tatoeba-test.eng-xho.eng.xho | 24.9 | 0.613 |
| Tatoeba-test.eng-zul.eng.zul | 35.7 | 0.753 |

opus-2020-07-26.zip

  • dataset: opus
  • model: transformer
  • source language(s): eng
  • target language(s): kin lin lug nya run sna swh toi_Latn tso umb xho zul
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required in the form >>id<< (id = a valid target language ID)
  • download: opus-2020-07-26.zip
  • test set translations: opus-2020-07-26.test.txt
  • test set scores: opus-2020-07-26.eval.txt

Benchmarks

| testset | BLEU | chr-F |
|---|---|---|
| Tatoeba-test.eng-kin.eng.kin | 12.5 | 0.519 |
| Tatoeba-test.eng-lin.eng.lin | 1.1 | 0.277 |
| Tatoeba-test.eng-lug.eng.lug | 4.8 | 0.415 |
| Tatoeba-test.eng.multi | 12.1 | 0.449 |
| Tatoeba-test.eng-nya.eng.nya | 22.1 | 0.616 |
| Tatoeba-test.eng-run.eng.run | 13.2 | 0.492 |
| Tatoeba-test.eng-sna.eng.sna | 32.1 | 0.669 |
| Tatoeba-test.eng-swa.eng.swa | 1.7 | 0.180 |
| Tatoeba-test.eng-toi.eng.toi | 10.7 | 0.266 |
| Tatoeba-test.eng-tso.eng.tso | 26.9 | 0.631 |
| Tatoeba-test.eng-umb.eng.umb | 5.2 | 0.295 |
| Tatoeba-test.eng-xho.eng.xho | 22.6 | 0.615 |
| Tatoeba-test.eng-zul.eng.zul | 41.1 | 0.769 |

opus1m+bt-2021-04-13.zip

  • dataset: opus1m+bt
  • model: transformer-align
  • source language(s): eng
  • target language(s): kin lin lug nya run sna swa swc swh toi tso umb xho zul
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required in the form >>id<< (id = a valid target language ID)
  • valid language labels: >>abb<< >>agh<< >>akw<< >>asa<< >>auh<< >>axk<< >>baf<< >>bag<< >>bas<< >>bbg<< >>bbi<< >>bbm<< >>bcp<< >>bdp<< >>bdu<< >>beb<< >>bem<< >>beq<< >>bez<< >>bhy<< >>bip<< >>biw<< >>biz<< >>bja<< >>bkf<< >>bkh<< >>bkj<< >>bkp<< >>bkt<< >>bkw<< >>bli<< >>blv<< >>bmb<< >>bmg<< >>bml<< >>bmw<< >>bng<< >>bni<< >>bnm<< >>bnx<< >>boh<< >>bok<< >>bou<< >>boy<< >>bpj<< >>bqm<< >>bqu<< >>bqz<< >>brf<< >>bri<< >>brl<< >>bsi<< >>bss<< >>btb<< >>btc<< >>buf<< >>bui<< >>bum<< >>buu<< >>buw<< >>bvb<< >>bvg<< >>bvx<< >>bwc<< >>bwg<< >>bwl<< >>bws<< >>bwt<< >>bww<< >>bwz<< >>bxc<< >>bxg<< >>bxk<< >>bxp<< >>byi<< >>bzm<< >>bzo<< >>cce<< >>ccl<< >>cgg<< >>chw<< >>cjk<< >>coh<< >>cuh<< >>cwa<< >>cwb<< >>cwe<< >>dav<< >>dde<< >>dez<< >>dhm<< >>dhs<< >>dig<< >>dii<< >>diu<< >>diz<< >>dma<< >>dmx<< >>dne<< >>doe<< >>dov<< >>dua<< >>dug<< >>dzn<< >>ebo<< >>ebu<< >>ekm<< >>eko<< >>eto<< >>ewo<< >>fan<< >>fip<< >>flr<< >>fwe<< >>gev<< >>gey<< >>gmx<< >>gog<< >>guz<< >>gwe<< >>gwr<< >>gyi<< >>han<< >>haq<< >>hav<< >>hay<< >>hba<< >>heh<< >>hem<< >>her<< >>hij<< >>hka<< >>hke<< >>hol<< >>hom<< >>hoo<< >>hum<< >>ida<< >>ifm<< >>ikz<< >>ilb<< >>isn<< >>iyx<< >>jgb<< >>jit<< >>jmc<< >>job<< >>kam<< >>kbj<< >>kbs<< >>kck<< >>kcu<< >>kcv<< >>kcw<< >>kcz<< >>kdc<< >>kde<< >>kdg<< >>kdn<< >>keb<< >>ked<< >>khu<< >>khx<< >>khy<< >>kik<< >>kin<< >>kiv<< >>kiz<< >>kki<< >>kkj<< >>kkq<< >>kkw<< >>kmb<< >>kme<< >>kmw<< >>kng<< >>kny<< >>koh<< >>kon<< >>koo<< >>koq<< >>kqn<< >>ksb<< >>ksf<< >>ksv<< >>ktf<< >>ktu<< >>kty<< >>kua<< >>kuj<< >>kwc<< >>kwm<< >>kwn<< >>kws<< >>kwu<< >>kwy<< >>kxx<< >>kya<< >>kzn<< >>kzo<< >>kzy<< >>lag<< >>lai<< >>lam<< >>lch<< >>ldi<< >>lea<< >>leb<< >>leh<< >>lej<< >>lel<< >>lem<< >>leo<< >>lfa<< >>lgm<< >>lgz<< >>lie<< >>lik<< >>lin<< >>liz<< >>lkb<< >>lke<< >>lko<< >>lks<< >>llb<< >>lli<< >>lnb<< >>lol<< >>lon<< >>loo<< >>loq<< >>loz<< >>lri<< >>lrm<< >>lse<< >>lsm<< >>lto<< >>lts<< >>lua<< >>lub<< >>lue<< >>lug<< >>luj<< >>lum<< >>lun<< >>lup<< 
>>luy<< >>lwa<< >>lwg<< >>lyn<< >>mbm<< >>mbo<< >>mck<< >>mcp<< >>mcx<< >>mdn<< >>mdp<< >>mdq<< >>mdt<< >>mdu<< >>mdw<< >>mer<< >>mfu<< >>mgg<< >>mgh<< >>mgq<< >>mgr<< >>mgs<< >>mgv<< >>mgw<< >>mgy<< >>mgz<< >>mhb<< >>mhm<< >>mho<< >>mhw<< >>mjh<< >>mkk<< >>mkw<< >>mlb<< >>mlk<< >>mmu<< >>mmz<< >>mny<< >>mow<< >>mpa<< >>mvw<< >>mwe<< >>mwn<< >>mws<< >>mwz<< >>mxc<< >>mxg<< >>mxo<< >>myc<< >>mye<< >>myx<< >>mzd<< >>nba<< >>nbd<< >>nbl<< >>nda<< >>ndc<< >>nde<< >>ndg<< >>ndh<< >>ndj<< >>ndk<< >>ndl<< >>ndn<< >>ndo<< >>ndq<< >>ndw<< >>ngc<< >>ngd<< >>ngl<< >>ngo<< >>ngp<< >>ngq<< >>ngy<< >>ngz<< >>nih<< >>nim<< >>nix<< >>njx<< >>njy<< >>nka<< >>nkc<< >>nkn<< >>nkt<< >>nkv<< >>nkw<< >>nle<< >>nlj<< >>nlo<< >>nmd<< >>nmg<< >>nmq<< >>nnb<< >>nne<< >>nnq<< >>noq<< >>now<< >>nql<< >>nra<< >>nse<< >>nso<< >>nsx<< >>nte<< >>ntk<< >>nto<< >>nui<< >>nuj<< >>nvo<< >>nxd<< >>nxi<< >>nxo<< >>nya<< >>nyc<< >>nyd<< >>nye<< >>nyf<< >>nyg<< >>nyj<< >>nyk<< >>nym<< >>nyn<< >>nyo<< >>nyr<< >>nyu<< >>nyy<< >>nzb<< >>nzd<< >>old<< >>olu<< >>oml<< >>ozm<< >>pae<< >>pbr<< >>pem<< >>phm<< >>pic<< >>piw<< >>pkb<< >>pmm<< >>pof<< >>poy<< >>puu<< >>rag<< >>reg<< >>rim<< >>rnd<< >>rng<< >>rnw<< >>rof<< >>rub<< >>ruc<< >>ruf<< >>run<< >>rwk<< >>rwm<< >>sak<< >>sbk<< >>sbm<< >>sbp<< >>sbs<< >>sbw<< >>sby<< >>sdj<< >>seg<< >>seh<< >>sgm<< >>shc<< >>shq<< >>shr<< >>sie<< >>skt<< >>slx<< >>smd<< >>smx<< >>sna<< >>sng<< >>snq<< >>soc<< >>sod<< >>soe<< >>soo<< >>sop<< >>sot<< >>sox<< >>soz<< >>ssc<< >>ssw<< >>sub<< >>suj<< >>suk<< >>suw<< >>swa<< >>swb<< >>swc<< >>swh<< >>swj<< >>swk<< >>sxb<< >>sxe<< >>syi<< >>syx<< >>szg<< >>szv<< >>tap<< >>tbt<< >>tck<< >>teg<< >>tek<< >>tga<< >>thk<< >>tii<< >>tke<< >>tlj<< >>tll<< >>tmv<< >>tny<< >>tog<< >>toh<< >>toi<< >>toi_Latn<< >>tsa<< >>tsc<< >>tsn<< >>tso<< >>tsv<< >>ttf<< >>ttj<< >>ttl<< >>tum<< >>tvs<< >>tvu<< >>twl<< >>two<< >>twx<< >>tyi<< >>tyx<< >>ukh<< >>umb<< >>vau<< >>ven<< >>vid<< >>vif<< >>vin<< >>vmk<< >>vmr<< >>vmw<< >>vum<< >>vun<< >>wbh<< 
>>wbi<< >>wdd<< >>wlc<< >>wmw<< >>wni<< >>won<< >>wum<< >>wun<< >>xdo<< >>xho<< >>xku<< >>xkv<< >>xma<< >>xmc<< >>xog<< >>xsq<< >>yaf<< >>yao<< >>yas<< >>yat<< >>yav<< >>yel<< >>yey<< >>yko<< >>ymk<< >>yns<< >>yom<< >>zaj<< >>zak<< >>zdj<< >>zga<< >>zin<< >>zmb<< >>zmf<< >>zmn<< >>zmp<< >>zmq<< >>zms<< >>zmw<< >>zmx<< >>zul<<
  • download: opus1m+bt-2021-04-13.zip
  • test set translations: opus1m+bt-2021-04-13.test.txt
  • test set scores: opus1m+bt-2021-04-13.eval.txt

Benchmarks

| testset | BLEU | chr-F | #sent | #words | BP |
|---|---|---|---|---|---|
| Tatoeba-test.eng-kin | 9.1 | 0.525 | 17 | 80 | 1.000 |
| Tatoeba-test.eng-lin | 2.2 | 0.304 | 28 | 188 | 1.000 |
| Tatoeba-test.eng-lug | 21.4 | 0.652 | 2 | 8 | 1.000 |
| Tatoeba-test.eng-multi | 16.0 | 0.484 | 2810 | 11879 | 1.000 |
| Tatoeba-test.eng-nya | 26.5 | 0.621 | 22 | 93 | 1.000 |
| Tatoeba-test.eng-run | 13.7 | 0.488 | 1703 | 6708 | 1.000 |
| Tatoeba-test.eng-sna | 35.2 | 0.653 | 41 | 141 | 1.000 |
| Tatoeba-test.eng-swa | 5.7 | 0.244 | 386 | 1881 | 1.000 |
| Tatoeba-test.eng-toi | 5.3 | 0.285 | 2 | 9 | 1.000 |
| Tatoeba-test.eng-tso | 24.9 | 0.616 | 3 | 14 | 1.000 |
| Tatoeba-test.eng-umb | 4.9 | 0.274 | 32 | 117 | 1.000 |
| Tatoeba-test.eng-xho | 26.6 | 0.615 | 152 | 651 | 1.000 |
| Tatoeba-test.eng-zul | 41.8 | 0.760 | 34 | 97 | 1.000 |
| tico19-test.eng-kin | 7.4 | 0.310 | 2100 | 55149 | 0.888 |
| tico19-test.eng-lin | 14.0 | 0.440 | 2100 | 61228 | 1.000 |
| tico19-test.eng-lug | 14.3 | 0.461 | 2100 | 52919 | 0.955 |
| tico19-test.eng-swa | 24.7 | 0.548 | 2100 | 58862 | 0.958 |
| tico19-test.eng-zul | 14.0 | 0.554 | 2100 | 44122 | 1.000 |
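
The BP column is not defined on this page. Assuming it is BLEU's standard brevity penalty, a value below 1.000 means the system output was shorter in total than the reference. The sketch below shows that standard definition only; the function name and the example lengths are illustrative and are not taken from the evaluation files.

```python
import math


def brevity_penalty(hyp_len: int, ref_len: int) -> float:
    """Standard BLEU brevity penalty for corpus-level token counts.

    Returns 1.0 when the hypothesis is at least as long as the reference,
    otherwise exp(1 - ref_len / hyp_len).
    """
    if hyp_len <= 0:
        return 0.0
    if hyp_len >= ref_len:
        return 1.0
    return math.exp(1.0 - ref_len / hyp_len)


# Illustrative lengths only: a hypothesis 10% shorter than its reference.
print(round(brevity_penalty(90, 100), 3))
```

Under this definition, output that is longer than the reference is not penalized by BP; excess length lowers BLEU through n-gram precision instead.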