
Reproducibility information benchmarks #4

Open
BramVanroy opened this issue Feb 24, 2024 · 2 comments
BramVanroy commented Feb 24, 2024

Hello

I've been looking to do a large-scale comparison of (m)any MT models involving Dutch (XX->NL, NL->XX) on all the test sets that I can find. The OPUS leaderboard is a great starting point for me. As a first step, I would like to reproduce the scores in the OPUS leaderboard. For reproducibility's sake, it would therefore be useful to have an overview of some meta information on the benchmarks:

  • metric parameters used (e.g. n-gram size for BLEU, model checkpoint for COMET, etc.)
  • generation parameters for the models (number of beams, sampling settings such as top-k/top-p/temperature)
  • framework versions (CUDA/ROCm version, torch version, transformers version, etc.) - basically a pip freeze
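For the last bullet, a minimal stdlib-only sketch of what such an environment report could look like (this is not part of the leaderboard tooling; the package list and function name are just illustrative):

```python
import json
import platform
import sys
from importlib import metadata


def environment_report(packages):
    """Collect Python/platform info plus versions of the given packages.

    Packages that are not installed are recorded as None rather than
    raising, so the report can be generated on any machine.
    """
    report = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    for pkg in packages:
        try:
            report[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            report[pkg] = None
    return report


if __name__ == "__main__":
    print(json.dumps(environment_report(["torch", "transformers", "sacrebleu"]), indent=2))
```

Storing such a report next to each score file would make it easy to diff environments between runs.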

If you can share any info about this, I'd be grateful!

Bram

jorgtied (Member) commented Mar 5, 2024

We should make this more transparent. Part of the answer is hidden in the scripts we use for evaluation. Have a look at the makefile targets in https://github.com/Helsinki-NLP/OPUS-MT-leaderboard-recipes/blob/master/eval.mk

For Hugging Face models we mainly use this script: https://github.com/Helsinki-NLP/External-MT-leaderboard/blob/master/models/huggingface/translate.py, which is called by https://github.com/Helsinki-NLP/External-MT-leaderboard/blob/master/models/huggingface/Makefile

Alternatively, there are also scripts here: https://github.com/Helsinki-NLP/External-MT-leaderboard/tree/master/models/huggingface-accelerate

For NLLB and M2M100 from Facebook, look at https://github.com/Helsinki-NLP/External-MT-leaderboard/blob/master/models/huggingface/facebook/Makefile

For COMET: this is a moving target, with all kinds of models coming out and also slight changes in the implementation over time.

For newer OPUS-MT models there are also logfiles like this one: https://opus.nlpl.eu/dashboard/logfile.php?model1=unknown&model2=unknown&test=newstest2018&scoreslang=all&model=Tatoeba-MT-models%2Ffin-eng%2FopusTCv20210807%2Bnopar%2Bft95-sepvoc_transformer-tiny11-align_2023-07-03&src=fin&trg=eng&pkg=opusmt

This does not tell you everything you want to know, but it gives at least some partial information and pointers.

jorgtied (Member) commented Mar 8, 2024

I forgot to mention that the sacreBLEU signatures are also available from the repo, for example: https://github.com/Helsinki-NLP/OPUS-MT-leaderboard/blob/master/models/Tatoeba-MT-models/afr-deu/opus-2021-02-18/ntrex128.afr-deu.eval
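sacreBLEU signatures are pipe-separated `key:value` strings (e.g. recording the tokenizer and smoothing method used). A tiny hypothetical helper to turn one into a dict, so signatures from different runs can be compared field by field (the example signature string below is illustrative, not copied from the linked file):

```python
def parse_signature(sig):
    """Parse a sacreBLEU-style signature like
    'nrefs:1|case:mixed|eff:no|tok:13a|smooth:exp|version:2.0.0'
    into a {field: value} dict."""
    return dict(field.split(":", 1) for field in sig.split("|"))


sig = "nrefs:1|case:mixed|eff:no|tok:13a|smooth:exp|version:2.0.0"
print(parse_signature(sig)["tok"])  # 13a
```

Checking that the `tok`, `smooth`, and `version` fields match between two evaluations is usually the first step in tracking down a BLEU score mismatch.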
