Reproducing Table 5: Sentence Infilling - CIDEr / BLEU-4 metrics #59

yair-schiff · 2023-03-29T02:13:41Z

Hi @XiangLi1999,

Thank you for open sourcing this work!

I am trying to reproduce the results from Table 5 (the infilling experiment). Specifically, where do the CIDEr and BLEU-4 scores come from, and how are they computed? I don't see those metrics reported on the aNLG leaderboard.

Any guidance you can provide here will be much appreciated.

Thanks!

XiangLi1999 · 2023-03-30T02:59:55Z

Hi Yair,

Thanks for reaching out!

We compute these two scores because they are also reported in https://arxiv.org/pdf/2202.11705.pdf (which is our primary baseline for comparison).

We compute them with the evaluation scripts released along with the E2E benchmark: https://github.com/tuetschek/e2e-metrics

Best,
Lisa
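
For anyone following the same route, here is a minimal sketch of preparing inputs for the e2e-metrics scripts. It is not the authors' evaluation code. It assumes the plain-text layout described in the e2e-metrics README as I understand it: one system output per line, and a reference file in which the references for each instance sit on consecutive lines, with a blank line between instances. Please check the README before relying on this. The function name `write_e2e_metrics_inputs` and the file names are hypothetical.

```python
from pathlib import Path


def write_e2e_metrics_inputs(references, hypotheses, ref_path, hyp_path):
    """Write reference and system-output files in a plain-text layout.

    references: list of lists of strings (one or more references per instance)
    hypotheses: list of strings (one generated infill per instance)

    Assumed layout (verify against the e2e-metrics README):
      - hypothesis file: one output per line
      - reference file: one reference per line, blank line between instances
    """
    if len(references) != len(hypotheses):
        raise ValueError("need exactly one hypothesis per reference group")
    if any(len(group) == 0 for group in references):
        raise ValueError("every instance needs at least one reference")

    def one_line(text):
        # Collapse internal newlines and runs of whitespace so each
        # sentence occupies exactly one line in the output file.
        return " ".join(text.split())

    ref_blocks = ["\n".join(one_line(r) for r in group) for group in references]
    Path(ref_path).write_text("\n\n".join(ref_blocks) + "\n", encoding="utf-8")
    Path(hyp_path).write_text(
        "\n".join(one_line(h) for h in hypotheses) + "\n", encoding="utf-8"
    )
```

With the two files written (say `refs.txt` and `hyps.txt`), the repository's `measure_scores.py` can be run on them from a checkout of e2e-metrics, for example `./measure_scores.py refs.txt hyps.txt`. To my knowledge it reports BLEU, NIST, METEOR, ROUGE-L, and CIDEr. Whether the BLEU figure and the preprocessing match Table 5 exactly is an assumption to confirm with the authors.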
