Low inter-annotator agreement? #2

Open
kocmitom opened this issue Oct 21, 2021 · 0 comments

Comments

@kocmitom
Hello,

I have been analyzing your results, and maybe I missed something important, but when you take into account only the sentences that do NOT change [1], you get the following graph:

image

In other words, HT scores better even on sentences where nothing was changed. This can also be visualized in the following way: if you take only the scores for sentences that didn't change and compare how the ranking changes between BTS and ATS, you get this distribution:

image

This shows that the ranking of MT vs. HT flips, in one direction or the other, for almost half of the sentences even though the sentences themselves didn't change (only 389 sentences for MT_Y and 440 sentences for MT_Z keep a consistent ranking). This illustrates low inter-annotator agreement, so I don't think the claims in the paper can be concluded from these results. Or am I missing something?
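To make the check above concrete, here is a minimal sketch of how I count ranking flips on unchanged sentences. It is not the code from the repository: the row layout and the field names (`changed`, `ht_bts`, `mt_bts`, `ht_ats`, `mt_ats`) are hypothetical, and the numbers in `rows` are toy values, not the paper's data.

```python
from collections import Counter


def ranking(ht_score, mt_score):
    """Return which side scored higher for one sentence: 'HT', 'MT', or 'tie'."""
    if ht_score > mt_score:
        return "HT"
    if mt_score > ht_score:
        return "MT"
    return "tie"


def ranking_transitions(rows):
    """Count (ranking in BTS, ranking in ATS) pairs over unchanged sentences only."""
    counts = Counter()
    for row in rows:
        if row["changed"]:
            continue  # keep only sentences that did NOT change
        before = ranking(row["ht_bts"], row["mt_bts"])
        after = ranking(row["ht_ats"], row["mt_ats"])
        counts[(before, after)] += 1
    return counts


def consistency_rate(counts):
    """Fraction of unchanged sentences whose MT vs. HT ranking stayed the same."""
    total = sum(counts.values())
    if total == 0:
        return float("nan")
    consistent = sum(n for (before, after), n in counts.items() if before == after)
    return consistent / total


# Toy data (hypothetical field names and scores, for illustration only).
rows = [
    {"changed": False, "ht_bts": 80, "mt_bts": 70, "ht_ats": 85, "mt_ats": 75},
    {"changed": False, "ht_bts": 60, "mt_bts": 70, "ht_ats": 75, "mt_ats": 65},
    {"changed": False, "ht_bts": 90, "mt_bts": 50, "ht_ats": 40, "mt_ats": 60},
    {"changed": True,  "ht_bts": 55, "mt_bts": 65, "ht_ats": 70, "mt_ats": 60},
    {"changed": False, "ht_bts": 70, "mt_bts": 70, "ht_ats": 70, "mt_ats": 70},
]

counts = ranking_transitions(rows)
print(counts)
print(consistency_rate(counts))
```

If the annotators agreed well, nearly all of the mass should sit on the pairs where the ranking is the same in BTS and ATS, because the input sentence is identical in both rounds.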

[1] Obtained by changing two lines at https://github.com/ahrii-kim/suboptimal_test_set/blob/master/evaluation/score.py#L117 to "F"
