- dataset: opus
- model: transformer
- source language(s): bul bul_Latn mkd slv
- target language(s): eng
- pre-processing: normalization + SentencePiece (spm32k,spm32k)
- download: opus-2020-06-28.zip
- test set translations: opus-2020-06-28.test.txt
- test set scores: opus-2020-06-28.eval.txt
| testset | BLEU | chr-F |
|---------|------|-------|
| Tatoeba-test.bul-eng.bul.eng | 54.7 | 0.693 |
| Tatoeba-test.mkd-eng.mkd.eng | 54.0 | 0.676 |
| Tatoeba-test.multi.eng | 52.8 | 0.671 |
| Tatoeba-test.slv-eng.slv.eng | 25.3 | 0.410 |
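Every release in this card applies the same pre-processing: Unicode normalization followed by a 32k SentencePiece model on each side (the `spm32k,spm32k` tag above). The sketch below shows the source-side segmentation step with the `sentencepiece` Python package; the file name `source.spm` is an assumption about the layout of the unpacked download, not something stated in this card.

```python
import sentencepiece as spm

# Source-side subword segmentation for the "spm32k,spm32k" pre-processing.
# "source.spm" is assumed to be the source-side SentencePiece model shipped
# inside the downloaded opus-*.zip; point this at wherever you unpacked it.
sp = spm.SentencePieceProcessor(model_file="source.spm")

sentence = "Това е само пример."  # Bulgarian: "This is only an example."
pieces = sp.encode(sentence, out_type=str)
print(" ".join(pieces))  # space-separated pieces are what the Marian decoder reads
```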
- dataset: opus
- model: transformer
- source language(s): bul bul_Latn mkd slv
- target language(s): eng
- pre-processing: normalization + SentencePiece (spm32k,spm32k)
- download: opus-2020-07-04.zip
- test set translations: opus-2020-07-04.test.txt
- test set scores: opus-2020-07-04.eval.txt
| testset | BLEU | chr-F |
|---------|------|-------|
| Tatoeba-test.bul-eng.bul.eng | 53.9 | 0.686 |
| Tatoeba-test.mkd-eng.mkd.eng | 52.5 | 0.662 |
| Tatoeba-test.multi.eng | 50.5 | 0.648 |
| Tatoeba-test.slv-eng.slv.eng | 21.8 | 0.374 |
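The BLEU and chr-F columns in these tables can be recomputed from the released test-set translations. The sketch below uses `sacrebleu` and assumes the hypotheses and references have already been pulled out of the `*.test.txt` file into plain one-sentence-per-line text files; that split is an assumption about how you prepare the data, not a documented format.

```python
import sacrebleu

# hyp.txt: system output, ref.txt: reference translations,
# both assumed to hold one detokenized sentence per line.
with open("hyp.txt", encoding="utf-8") as f:
    hyps = [line.rstrip("\n") for line in f]
with open("ref.txt", encoding="utf-8") as f:
    refs = [line.rstrip("\n") for line in f]

bleu = sacrebleu.corpus_bleu(hyps, [refs])   # the BLEU column
chrf = sacrebleu.corpus_chrf(hyps, [refs])   # the chr-F column
print(f"BLEU  = {bleu.score:.1f}")
# depending on the sacrebleu version, chrF is reported on a 0-1 or 0-100
# scale; the tables in this card use the 0-1 scale
print(f"chr-F = {chrf.score:.3f}")
```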
- dataset: opus
- model: transformer
- source language(s): bos_Latn bul bul_Latn hrv mkd slv srp_Cyrl srp_Latn
- target language(s): eng
- pre-processing: normalization + SentencePiece (spm32k,spm32k)
- download: opus-2020-07-27.zip
- test set translations: opus-2020-07-27.test.txt
- test set scores: opus-2020-07-27.eval.txt
| testset | BLEU | chr-F |
|---------|------|-------|
| Tatoeba-test.bul-eng.bul.eng | 53.9 | 0.686 |
| Tatoeba-test.hbs-eng.hbs.eng | 54.8 | 0.693 |
| Tatoeba-test.mkd-eng.mkd.eng | 53.4 | 0.672 |
| Tatoeba-test.multi.eng | 52.5 | 0.668 |
| Tatoeba-test.slv-eng.slv.eng | 24.9 | 0.405 |
- dataset: opus2m
- model: transformer
- source language(s): bos_Latn bul bul_Latn hrv mkd slv srp_Cyrl srp_Latn
- target language(s): eng
- pre-processing: normalization + SentencePiece (spm32k,spm32k)
- download: opus2m-2020-08-01.zip
- test set translations: opus2m-2020-08-01.test.txt
- test set scores: opus2m-2020-08-01.eval.txt
| testset | BLEU | chr-F |
|---------|------|-------|
| Tatoeba-test.bul-eng.bul.eng | 54.9 | 0.693 |
| Tatoeba-test.hbs-eng.hbs.eng | 55.7 | 0.700 |
| Tatoeba-test.mkd-eng.mkd.eng | 54.6 | 0.681 |
| Tatoeba-test.multi.eng | 53.6 | 0.676 |
| Tatoeba-test.slv-eng.slv.eng | 25.6 | 0.407 |
- dataset: opus4m
- model: transformer
- source language(s): bos_Latn bul bul_Latn hrv mkd slv srp_Cyrl srp_Latn
- target language(s): eng
- pre-processing: normalization + SentencePiece (spm32k,spm32k)
- download: opus4m-2020-08-12.zip
- test set translations: opus4m-2020-08-12.test.txt
- test set scores: opus4m-2020-08-12.eval.txt
| testset | BLEU | chr-F |
|---------|------|-------|
| Tatoeba-test.bul-eng.bul.eng | 55.4 | 0.697 |
| Tatoeba-test.hbs-eng.hbs.eng | 55.6 | 0.701 |
| Tatoeba-test.mkd-eng.mkd.eng | 54.7 | 0.682 |
| Tatoeba-test.multi.eng | 53.8 | 0.677 |
| Tatoeba-test.slv-eng.slv.eng | 25.0 | 0.408 |
- dataset: opus1m+bt
- model: transformer-align
- source language(s): bos bul cnr hbs hrv mkd slv srp
- target language(s): eng
- pre-processing: normalization + SentencePiece (spm32k,spm32k)
- download: opus1m+bt-2021-05-02.zip
- test set translations: opus1m+bt-2021-05-02.test.txt
- test set scores: opus1m+bt-2021-05-02.eval.txt
| testset | BLEU | chr-F | #sent | #words | BP |
|---------|------|-------|-------|--------|----|
| Tatoeba-test.bos_Latn-eng | 59.9 | 0.757 | 300 | 1818 | 0.954 |
| Tatoeba-test.bul-eng | 54.2 | 0.689 | 10000 | 71861 | 0.983 |
| Tatoeba-test.hbs-eng | 54.5 | 0.692 | 10000 | 68833 | 0.974 |
| Tatoeba-test.hrv-eng | 54.2 | 0.702 | 1468 | 10556 | 0.965 |
| Tatoeba-test.mkd-eng | 53.8 | 0.675 | 10000 | 65601 | 0.987 |
| Tatoeba-test.multi-eng | 53.0 | 0.675 | 10000 | 68639 | 0.980 |
| Tatoeba-test.slv-eng | 25.1 | 0.402 | 2007 | 13702 | 0.994 |
| Tatoeba-test.srp_Cyrl-eng | 51.5 | 0.669 | 1577 | 10162 | 0.961 |
| Tatoeba-test.srp_Latn-eng | 55.0 | 0.692 | 6655 | 46297 | 0.980 |
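From this release onward the tables also report the size of each test set (#sent, #words) and BLEU's brevity penalty (BP). BP follows the standard BLEU definition: 1 when the system output is at least as long as the reference, and exp(1 - r/c) when it is shorter (r = reference length, c = candidate length). The short sketch below just illustrates that formula with made-up lengths.

```python
import math

# BLEU's brevity penalty: 1.0 when the system output is at least as long as
# the reference, exp(1 - r/c) when it is shorter (r = reference length,
# c = candidate length, both counted in tokens).
def brevity_penalty(candidate_len: int, reference_len: int) -> float:
    if candidate_len >= reference_len:
        return 1.0
    return math.exp(1.0 - reference_len / candidate_len)

# Made-up lengths: output roughly 4.5% shorter than the reference gives a
# penalty close to the 0.954 reported for Tatoeba-test.bos_Latn-eng above.
print(round(brevity_penalty(candidate_len=955, reference_len=1000), 3))  # 0.954
```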
- dataset: opus4m+btTCv20210807
- model: transformer
- source language(s): bos bul chu cnr hbs hrv mkd slv srp
- target language(s): eng
- pre-processing: normalization + SentencePiece (spm32k,spm32k)
- a sentence-initial language token is required in the form of `>>id<<` (id = valid target language ID); see the token example after the results table below
- valid language labels:
- download: opus4m+btTCv20210807-2021-10-01.zip
- test set translations: opus4m+btTCv20210807-2021-10-01.test.txt
- test set scores: opus4m+btTCv20210807-2021-10-01.eval.txt
| testset | BLEU | chr-F | #sent | #words | BP |
|---------|------|-------|-------|--------|----|
| Tatoeba-test-v2021-08-07.multi-eng | 52.8 | 0.669 | 10000 | 69450 | 0.981 |
| Tatoeba-test-v2021-08-07.multi-multi | 52.8 | 0.669 | 10000 | 69450 | 0.981 |
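For the opus4m+btTCv20210807 release the input must start with the `>>id<<` target-language token noted above. English is the only target language listed, so the label is presumably `>>eng<<`; the helper below is a made-up illustration of where the token goes, not part of the released tooling.

```python
def add_target_token(sentence: str, target_id: str = "eng") -> str:
    """Prepend the sentence-initial >>id<< token the model expects.

    target_id must be a valid target-language ID; eng is the only target
    listed for this model, so >>eng<< is the assumed label.
    """
    return f">>{target_id}<< {sentence}"

print(add_target_token("Ова е само пример."))  # ">>eng<< Ова е само пример."
```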
- dataset: opusTCv20210807+bt
- model: transformer-big
- source language(s): bos_Cyrl bos_Latn bul chu_Latn cnr cnr_Latn hbs hbs_Cyrl hrv mkd slv srp_Cyrl srp_Latn
- target language(s): eng
- raw source language(s): bos bul chu cnr hbs hrv mkd slv srp
- raw target language(s): eng
- pre-processing: normalization + SentencePiece (spm32k,spm32k)
- download: opusTCv20210807+bt_transformer-big_2022-03-17.zip
- test set translations: opusTCv20210807+bt_transformer-big_2022-03-17.test.txt
- test set scores: opusTCv20210807+bt_transformer-big_2022-03-17.eval.txt
| testset | BLEU | chr-F | #sent | #words | BP |
|---------|------|-------|-------|--------|----|
| Tatoeba-test-v2021-08-07.bos_Latn-eng | 66.5 | 0.79335 | 301 | 1824 | 0.976 |
| Tatoeba-test-v2021-08-07.bul-eng | 59.2 | 0.72623 | 10000 | 71861 | 0.979 |
| Tatoeba-test-v2021-08-07.hbs-eng | 57.2 | 0.71720 | 10017 | 68927 | 1.000 |
| Tatoeba-test-v2021-08-07.hrv-eng | 59.2 | 0.74027 | 1480 | 10620 | 0.994 |
| Tatoeba-test-v2021-08-07.mkd-eng | 57.2 | 0.69973 | 10010 | 65664 | 0.984 |
| Tatoeba-test-v2021-08-07.multi-eng | 55.6 | 0.69058 | 10000 | 68722 | 1.000 |
| Tatoeba-test-v2021-08-07.slv-eng | 23.5 | 0.39503 | 2495 | 16940 | 1.000 |
| Tatoeba-test-v2021-08-07.srp_Cyrl-eng | 46.9 | 0.67556 | 1580 | 10180 | 1.000 |
| Tatoeba-test-v2021-08-07.srp_Latn-eng | 58.4 | 0.71807 | 6656 | 46303 | 0.994 |
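The opusTCv20210807+bt transformer-big release is the kind of checkpoint that is typically also republished on the Hugging Face hub and driven through the Marian classes in `transformers`. The sketch below assumes such a converted checkpoint exists; the model ID `Helsinki-NLP/opus-mt-tc-big-zls-en` is a guess based on the language group covered here and should be replaced with the actual ID of this release.

```python
from transformers import MarianMTModel, MarianTokenizer

# Assumed Hugging Face model ID for this transformer-big release; replace it
# with the actual converted checkpoint if it is published under another name.
model_id = "Helsinki-NLP/opus-mt-tc-big-zls-en"

tokenizer = MarianTokenizer.from_pretrained(model_id)
model = MarianMTModel.from_pretrained(model_id)

src = ["Ова е само пример.", "To je samo primer."]  # Macedonian, Slovene
batch = tokenizer(src, return_tensors="pt", padding=True)
outputs = model.generate(**batch)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```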