I am using the leaderboard to decide which model to choose for which language pair.
I find it a very good basis, as one obtains an average over a whole set of benchmarks and can, to some extent, judge how stably a model performs.
Going through the example outputs in detail, however, I realized that the multi30k_task2_test_2016 dataset mostly contains pairs in which source and reference are almost unrelated, for example:
SOURCE: The man with pierced ears is wearing glasses and an orange hat.
REFERENCE: Der Mann trägt eine orange Wollmütze.
Here the pierced ears and the glasses are not present in the reference.
Or even worse:
SOURCE: Two men sitting on the roof of a house while another one stands on a ladder.
REFERENCE: Dachdecker bei der Arbeit.
Here the reference would be translated as "roofers at work".
This is similar for the other examples in the dataset.
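A rough way to check this across a whole test set is to compare the token counts of source and reference. The following is only a minimal sketch: the helper names `length_ratio` and `flag_loose_pairs` and the cutoff of 0.6 are hypothetical, and a low ratio is merely a hint that a reference is a loose description, not proof. The two pairs are the examples quoted above.

```python
def length_ratio(source: str, reference: str) -> float:
    """Whitespace-token count of the reference divided by that of the source."""
    src_tokens = source.split()
    ref_tokens = reference.split()
    if not src_tokens:
        raise ValueError("source must contain at least one token")
    return len(ref_tokens) / len(src_tokens)


# Hypothetical cutoff: references much shorter than their source are suspicious.
THRESHOLD = 0.6


def flag_loose_pairs(pairs, threshold=THRESHOLD):
    """Return the (source, reference) pairs whose length ratio is below the cutoff."""
    return [(src, ref) for src, ref in pairs if length_ratio(src, ref) < threshold]


pairs = [
    ("The man with pierced ears is wearing glasses and an orange hat.",
     "Der Mann trägt eine orange Wollmütze."),
    ("Two men sitting on the roof of a house while another one stands on a ladder.",
     "Dachdecker bei der Arbeit."),
]

for src, ref in flag_loose_pairs(pairs):
    print(f"{length_ratio(src, ref):.2f}  {src}  ->  {ref}")
```

Both example pairs fall below the assumed cutoff, because the reference has half or fewer of the tokens in its source.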
I do not know whether this dataset has any other valid use case, but I don't find it useful for judging machine translation quality.
Could you remove it from the leaderboard?
schniewmatz changed the title from "Excluding multi30k_task2_test_2016 dataset from leaderboard fro eng-deu/eng-deu language pair" to "Excluding multi30k_task2_test_2016 dataset from leaderboard fro eng-deu/deu-eng language pair" on Aug 24, 2023