
Evaluations of NLLB-1.3B Distilled on FLoRes200 are incorrect, and may be duplicates of FLoRes101. #1


shauncassini commented Nov 2, 2023

Hello. First off, thanks for conducting such extensive evaluations on all of these models; I am finding them very useful for checking my own results. However, when looking into your evaluation files, I noticed the following:

nllb-200-distilled-1.3B/flores101-devtest.eng-deu.eval:

chrF2++|nrefs:1|case:mixed|eff:yes|nc:6|nw:2|space:no|version:2.3.1 = 0.59321
BLEU|nrefs:1|case:mixed|eff:no|tok:flores200|smooth:exp|version:2.3.1 = 41.4 68.6/51.0/40.2/32.2 (BP = 0.897 ratio = 0.902 hyp_len = 35747 ref_len = 39633)
BLEU|nrefs:1|case:mixed|eff:no|tok:13a|smooth:exp|version:2.3.1 = 35.2 67.6/43.8/31.1/22.7 (BP = 0.926 ratio = 0.929 hyp_len = 23307 ref_len = 25094)
COMET+default = 0.5955
chrF2|nrefs:1|case:mixed|eff:yes|nc:6|nw:0|space:no|version:2.3.1 = 0.61757

is exactly the same as

nllb-200-distilled-1.3B/flores200-devtest.eng-deu.eval:

chrF2++|nrefs:1|case:mixed|eff:yes|nc:6|nw:2|space:no|version:2.3.1 = 0.59321
BLEU|nrefs:1|case:mixed|eff:no|tok:flores200|smooth:exp|version:2.3.1 = 41.4 68.6/51.0/40.2/32.2 (BP = 0.897 ratio = 0.902 hyp_len = 35747 ref_len = 39633)
BLEU|nrefs:1|case:mixed|eff:no|tok:13a|smooth:exp|version:2.3.1 = 35.2 67.6/43.8/31.1/22.7 (BP = 0.926 ratio = 0.929 hyp_len = 23307 ref_len = 25094)
COMET+default = 0.5955
chrF2|nrefs:1|case:mixed|eff:yes|nc:6|nw:0|space:no|version:2.3.1 = 0.61757

Furthermore, when I run sacrebleu on the model output files for FLoRes200, I get different results. It seems likely that these eval files are duplicates; perhaps the flores101 output was evaluated twice?
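For what it's worth, here is a minimal sketch of how I would re-score one of the FLoRes200 output files with sacrebleu's Python API, using the same settings as the signatures above (tok:flores200 for BLEU, nc:6 / nw:2 for chrF2++). The file paths and variable names are placeholders, not the actual file names in this repository:

```python
# Re-scoring sketch, assuming sacrebleu 2.3.x and placeholder file names.
# Metric settings follow the signatures in the eval files above:
#   BLEU with tok:flores200, chrF2++ with nc:6 / nw:2.
import sacrebleu

with open("flores200-devtest.eng-deu.output", encoding="utf-8") as f:
    hypotheses = [line.rstrip("\n") for line in f]
with open("flores200-devtest.eng-deu.ref", encoding="utf-8") as f:
    references = [line.rstrip("\n") for line in f]

# BLEU with the flores200 SPM tokenizer
# (the flores tokenizers require the sentencepiece package to be installed).
bleu = sacrebleu.corpus_bleu(hypotheses, [references], tokenize="flores200")

# chrF2++: character n-gram order 6, word n-gram order 2.
chrf = sacrebleu.corpus_chrf(hypotheses, [references], word_order=2)

print(bleu)  # prints the formatted BLEU score line
print(chrf)  # prints the formatted chrF2++ score line
```

Running this separately on the flores101-devtest and flores200-devtest outputs should make it clear whether the same output file was scored twice.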
