eng-zls

opus-2020-06-28.zip

  • dataset: opus
  • model: transformer
  • source language(s): eng
  • target language(s): bul bul_Latn mkd slv
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required in the form >>id<<, where id is a valid target language ID
  • download: opus-2020-06-28.zip
  • test set translations: opus-2020-06-28.test.txt
  • test set scores: opus-2020-06-28.eval.txt
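
The language-token requirement above can be sketched in a few lines of Python. This is only an illustration: the helper name `add_language_token` is hypothetical, and the single space between the token and the sentence is an assumption, since the list above only states that the token must come first.

```python
def add_language_token(sentence: str, target_id: str) -> str:
    """Prefix a source sentence with the >>id<< target-language token.

    Assumption: one space separates the token from the sentence text.
    """
    return f">>{target_id}<< {sentence.strip()}"


# One English sentence routed to three target languages of this release.
for target_id in ("bul", "mkd", "slv"):
    print(add_language_token("How are you?", target_id))
```

The tagged string is what would be passed on to pre-processing and translation; the model decides the output language from the token alone.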

Benchmarks

testset BLEU chr-F
Tatoeba-test.eng-bul.eng.bul 46.5 0.648
Tatoeba-test.eng-mkd.eng.mkd 44.1 0.635
Tatoeba-test.eng.multi 41.8 0.612
Tatoeba-test.eng-slv.eng.slv 17.9 0.353

opus-2020-07-06.zip

  • dataset: opus
  • model: transformer
  • source language(s): eng
  • target language(s): bos_Latn bul bul_Latn hrv mkd slv srp_Cyrl srp_Latn
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required in the form >>id<<, where id is a valid target language ID
  • download: opus-2020-07-06.zip
  • test set translations: opus-2020-07-06.test.txt
  • test set scores: opus-2020-07-06.eval.txt

Benchmarks

testset BLEU chr-F
Tatoeba-test.eng-bul.eng.bul 46.2 0.646
Tatoeba-test.eng-hbs.eng.hbs 0.8 0.051
Tatoeba-test.eng-mkd.eng.mkd 43.5 0.629
Tatoeba-test.eng.multi 42.4 0.612
Tatoeba-test.eng-slv.eng.slv 17.6 0.348

opus-2020-07-27.zip

  • dataset: opus
  • model: transformer
  • source language(s): eng
  • target language(s): bos_Latn bul bul_Latn hrv mkd slv srp_Cyrl srp_Latn
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required in the form >>id<<, where id is a valid target language ID
  • download: opus-2020-07-27.zip
  • test set translations: opus-2020-07-27.test.txt
  • test set scores: opus-2020-07-27.eval.txt

Benchmarks

testset BLEU chr-F
Tatoeba-test.eng-bul.eng.bul 46.3 0.648
Tatoeba-test.eng-hbs.eng.hbs 40.3 0.613
Tatoeba-test.eng-mkd.eng.mkd 44.4 0.636
Tatoeba-test.eng.multi 41.9 0.615
Tatoeba-test.eng-slv.eng.slv 18.2 0.351

opus2m-2020-08-02.zip

  • dataset: opus2m
  • model: transformer
  • source language(s): eng
  • target language(s): bos_Latn bul bul_Latn hrv mkd slv srp_Cyrl srp_Latn
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required in the form >>id<<, where id is a valid target language ID
  • download: opus2m-2020-08-02.zip
  • test set translations: opus2m-2020-08-02.test.txt
  • test set scores: opus2m-2020-08-02.eval.txt

Benchmarks

testset BLEU chr-F
Tatoeba-test.eng-bul.eng.bul 47.6 0.657
Tatoeba-test.eng-hbs.eng.hbs 40.7 0.619
Tatoeba-test.eng-mkd.eng.mkd 45.2 0.642
Tatoeba-test.eng.multi 42.7 0.622
Tatoeba-test.eng-slv.eng.slv 17.9 0.351

opus1m+bt-2021-04-10.zip

  • dataset: opus1m+bt
  • model: transformer-align
  • source language(s): eng
  • target language(s): bos bul cnr hbs hrv mkd slv srp
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required in the form >>id<<, where id is a valid target language ID
  • valid language labels: >>bos<< >>bos_Cyrl<< >>bos_Latn<< >>bul<< >>chu<< >>cnr<< >>cnr_Latn<< >>hbs<< >>hbs_Cyrl<< >>hrv<< >>kjv<< >>mkd<< >>slv<< >>srp<< >>srp_Cyrl<< >>srp_Latn<< >>svm<<
  • download: opus1m+bt-2021-04-10.zip
  • test set translations: opus1m+bt-2021-04-10.test.txt
  • test set scores: opus1m+bt-2021-04-10.eval.txt

Benchmarks

testset BLEU chr-F #sent #words BP
Tatoeba-test.eng-bos_Latn 49.5 0.679 300 1645 0.970
Tatoeba-test.eng-bul 44.1 0.630 10000 69473 0.959
Tatoeba-test.eng-hbs 38.1 0.597 10000 63826 0.972
Tatoeba-test.eng-hrv 46.5 0.662 1468 9332 0.987
Tatoeba-test.eng-mkd 42.6 0.621 10000 61951 0.975
Tatoeba-test.eng-multi 33.6 0.498 10000 64724 0.900
Tatoeba-test.eng-slv 18.0 0.350 2007 11909 0.999
Tatoeba-test.eng-srp_Cyrl 39.7 0.599 1577 9131 1.000
Tatoeba-test.eng-srp_Latn 35.5 0.579 6655 43718 0.962
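
Score tables in this layout are whitespace-separated, so they are easy to load for comparison. The following is a minimal sketch that assumes the first three columns are always testset, BLEU, and chr-F, as in the tables here. `Score` and `parse_scores` are hypothetical names, and the sample rows are copied from the table above.

```python
from typing import List, NamedTuple


class Score(NamedTuple):
    testset: str
    bleu: float
    chrf: float


def parse_scores(text: str) -> List[Score]:
    """Parse whitespace-separated benchmark rows, skipping the header line."""
    rows = []
    for line in text.strip().splitlines():
        fields = line.split()
        if not fields or fields[0] == "testset":
            continue
        # Extra columns (#sent, #words, BP) are ignored if present.
        rows.append(Score(fields[0], float(fields[1]), float(fields[2])))
    return rows


TABLE = """\
testset BLEU chr-F #sent #words BP
Tatoeba-test.eng-bul 44.1 0.630 10000 69473 0.959
Tatoeba-test.eng-hbs 38.1 0.597 10000 63826 0.972
Tatoeba-test.eng-slv 18.0 0.350 2007 11909 0.999
"""

scores = parse_scores(TABLE)
weakest = min(scores, key=lambda s: s.bleu)
print(weakest.testset, weakest.bleu)
```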

opus4m+btTCv20210807-2021-09-30.zip

  • dataset: opus4m+btTCv20210807
  • model: transformer
  • source language(s): eng
  • target language(s): bos bul chu cnr hbs hrv mkd slv srp
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required in the form >>id<<, where id is a valid target language ID
  • valid language labels: >>bos<< >>bos_Cyrl<< >>bos_Latn<< >>bul<< >>chu<< >>cnr<< >>cnr_Latn<< >>hbs<< >>hbs_Cyrl<< >>hrv<< >>kjv<< >>mkd<< >>slv<< >>srp<< >>srp_Cyrl<< >>srp_Latn<< >>svm<<
  • download: opus4m+btTCv20210807-2021-09-30.zip
  • test set translations: opus4m+btTCv20210807-2021-09-30.test.txt
  • test set scores: opus4m+btTCv20210807-2021-09-30.eval.txt

Benchmarks

testset BLEU chr-F #sent #words BP
Tatoeba-test-v2021-08-07.eng-multi 32.2 0.493 10000 63924 0.911
Tatoeba-test-v2021-08-07.multi-multi 32.2 0.493 10000 63924 0.911