
Reproducibility information benchmarks #4

Open
BramVanroy opened this issue Feb 24, 2024 · 2 comments
BramVanroy commented Feb 24, 2024

Hello

I've been looking to do a large-scale comparison of (m)any MT models involving Dutch (XX->NL, NL->XX) on all the test sets that I can find. The OPUS leaderboard is a great starting point for me. As a first step, I would like to reproduce the scores in the OPUS leaderboard. For reproducibility's sake, it would therefore be useful to have an overview of some meta information on the benchmarks:

  • metric parameters used (e.g. n-gram size for BLEU, model checkpoint for COMET, etc.)
  • generation parameters for the models (number of beams, sampling settings such as top-k/top-p/temperature)
  • framework versions (CUDA/ROCm version, torch version, transformers version, etc.) - basically a pip freeze
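For the last bullet, a minimal stdlib-only sketch of what such an environment report could look like (this is not part of the leaderboard tooling; the package list and function name are just illustrative):

```python
import json
import platform
import sys
from importlib import metadata


def environment_report(packages):
    """Collect Python/platform info plus versions of the given packages.

    Packages that are not installed are recorded as None rather than
    raising, so the report can be generated on any machine.
    """
    report = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    for pkg in packages:
        try:
            report[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            report[pkg] = None
    return report


if __name__ == "__main__":
    print(json.dumps(environment_report(["torch", "transformers", "sacrebleu"]), indent=2))
```

Storing such a report next to each score file would make it easy to diff environments between runs.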

If you can share any info about this, I'd be grateful!

Bram

jorgtied (Member) commented Mar 5, 2024

We should make this more transparent. Part of the answer is hidden in the scripts we use for evaluation. Have a look at the makefile targets in https://github.com/Helsinki-NLP/OPUS-MT-leaderboard-recipes/blob/master/eval.mk

For Hugging Face models we mainly use this script: https://github.com/Helsinki-NLP/External-MT-leaderboard/blob/master/models/huggingface/translate.py, which is called by https://github.com/Helsinki-NLP/External-MT-leaderboard/blob/master/models/huggingface/Makefile

Alternatively, there are also scripts here: https://github.com/Helsinki-NLP/External-MT-leaderboard/tree/master/models/huggingface-accelerate

For NLLB and M2M100 from Facebook, look at https://github.com/Helsinki-NLP/External-MT-leaderboard/blob/master/models/huggingface/facebook/Makefile

For COMET: this is a moving target, with all kinds of models coming out and also slight changes in the implementation over time.

For newer OPUS-MT models there are also logfiles like this one: https://opus.nlpl.eu/dashboard/logfile.php?model1=unknown&model2=unknown&test=newstest2018&scoreslang=all&model=Tatoeba-MT-models%2Ffin-eng%2FopusTCv20210807%2Bnopar%2Bft95-sepvoc_transformer-tiny11-align_2023-07-03&src=fin&trg=eng&pkg=opusmt

This does not tell you everything you want to know, but it gives at least some partial information and pointers.

jorgtied (Member) commented Mar 8, 2024

I forgot to mention that the sacreBLEU signatures are also available from the repo, for example: https://github.com/Helsinki-NLP/OPUS-MT-leaderboard/blob/master/models/Tatoeba-MT-models/afr-deu/opus-2021-02-18/ntrex128.afr-deu.eval
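sacreBLEU signatures are pipe-separated `key:value` strings (e.g. recording the tokenizer and smoothing method used). A tiny hypothetical helper to turn one into a dict, so signatures from different runs can be compared field by field (the example signature string below is illustrative, not copied from the linked file):

```python
def parse_signature(sig):
    """Parse a sacreBLEU-style signature like
    'nrefs:1|case:mixed|eff:no|tok:13a|smooth:exp|version:2.0.0'
    into a {field: value} dict."""
    return dict(field.split(":", 1) for field in sig.split("|"))


sig = "nrefs:1|case:mixed|eff:no|tok:13a|smooth:exp|version:2.0.0"
print(parse_signature(sig)["tok"])  # 13a
```

Checking that the `tok`, `smooth`, and `version` fields match between two evaluations is usually the first step in tracking down a BLEU score mismatch.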
