
Entity in verbalized answer different than seed entity of the question #2

Open
montellasebastien opened this issue Nov 28, 2021 · 0 comments


Hi,

I went through your dataset and found several discrepancies between the entity in the question and the entity in the verbalized answer.
For example, the 4th and 24th examples in the training set:

{"answer":"male","answer_sentence":"sterjo is a male.","question":"Which sex does Joseph Louis Watkins, Jr. belong to ?","question_entity_label":"Joseph Louis Watkins, Jr.","question_id":41188,"question_relation":"P21"}
{"answer":"male","answer_sentence":"peter muller is a male.","question":"What is the sex of William Bailey ?","question_entity_label":"William Bailey","question_id":12117,"question_relation":"P21"}

A first count based on 10 frequently repeated names suggests that this affects about 5% of the training and test data.
Could you correct these errors? They directly impact the BLEU and METEOR scores of the NLG model you trained, since the same errors also appear in the gold references of the test file.
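
For reference, here is a minimal sketch of how such mismatches could be flagged. It assumes the data is stored as one JSON object per line, as in the two examples above, and that a record is consistent when `question_entity_label` appears in `answer_sentence` after lowercasing. The function names and the third sample record are hypothetical, not part of the repository, and a plain substring test will miss aliases or partial names, so the result is only a rough estimate.

```python
import json


def entity_in_answer(record):
    """True if the question entity label occurs in the answer sentence.

    The comparison is case-insensitive because the answer sentences in the
    examples above are lowercased (an assumption about the whole dataset).
    """
    label = record["question_entity_label"].lower()
    sentence = record["answer_sentence"].lower()
    return label in sentence


def find_mismatches(lines):
    """Return the question_id of every record whose answer sentence
    does not mention the question entity."""
    mismatches = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if not entity_in_answer(record):
            mismatches.append(record["question_id"])
    return mismatches


# The first two records are the examples quoted in this issue.
# The third is a hypothetical consistent record, added for contrast.
SAMPLE = [
    '{"answer":"male","answer_sentence":"sterjo is a male.","question":"Which sex does Joseph Louis Watkins, Jr. belong to ?","question_entity_label":"Joseph Louis Watkins, Jr.","question_id":41188,"question_relation":"P21"}',
    '{"answer":"male","answer_sentence":"peter muller is a male.","question":"What is the sex of William Bailey ?","question_entity_label":"William Bailey","question_id":12117,"question_relation":"P21"}',
    '{"answer":"male","answer_sentence":"william bailey is a male.","question":"What is the sex of William Bailey ?","question_entity_label":"William Bailey","question_id":1,"question_relation":"P21"}',
]

if __name__ == "__main__":
    print(find_mismatches(SAMPLE))
```

Running the same check over the training and test files, with each file opened and passed line by line to `find_mismatches`, would give the list of affected `question_id` values.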
