Yale-LILY/FactualEval-protocols

Investigating Crowdsourcing Protocols for Evaluating the Factual Consistency of Summaries

This repository contains the code, data, and templates for the crowdsourcing protocols described in the paper Investigating Crowdsourcing Protocols for Evaluating the Factual Consistency of Summaries.

Scripts

calculate.ipynb: computes the score distribution, Krippendorff's alpha reliability, and split-half reliability (SHR).
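
For readers who want a feel for these three quantities before opening the notebook, here is a minimal standard-library sketch. It is not the code from calculate.ipynb. The function names and the input layout (one list of numeric ratings per summary) are assumptions made for illustration. The split-half formulation used here (randomly split each item's ratings into two halves, average each half, and correlate the halves across items) is one common variant and may differ from the notebook's.

```python
import random
from collections import Counter
from itertools import permutations


def score_distribution(ratings_per_item):
    """Return the fraction of all ratings that fall on each score value."""
    counts = Counter(score for ratings in ratings_per_item for score in ratings)
    total = sum(counts.values())
    return {score: count / total for score, count in sorted(counts.items())}


def krippendorff_alpha_interval(ratings_per_item):
    """Krippendorff's alpha using the interval (squared-difference) metric."""
    # Only items rated at least twice contribute pairable values.
    units = [u for u in ratings_per_item if len(u) >= 2]
    values = [v for u in units for v in u]
    n = len(values)
    if n < 2:
        raise ValueError("need at least one item with two or more ratings")
    # Observed disagreement: squared differences within each item.
    d_o = sum(
        sum((a - b) ** 2 for a, b in permutations(u, 2)) / (len(u) - 1)
        for u in units
    ) / n
    # Expected disagreement: squared differences across all pooled ratings.
    d_e = sum((a - b) ** 2 for a, b in permutations(values, 2)) / (n * (n - 1))
    if d_e == 0:
        raise ValueError("all ratings are identical; alpha is undefined")
    return 1.0 - d_o / d_e


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (var_x * var_y) ** 0.5


def split_half_reliability(ratings_per_item, n_splits=100, seed=0):
    """Average correlation between two random halves of each item's ratings.

    Assumes every item has at least two ratings.
    """
    rng = random.Random(seed)
    correlations = []
    for _ in range(n_splits):
        first, second = [], []
        for ratings in ratings_per_item:
            shuffled = list(ratings)
            rng.shuffle(shuffled)
            half = len(shuffled) // 2
            first.append(sum(shuffled[:half]) / half)
            second.append(sum(shuffled[half:]) / (len(shuffled) - half))
        correlations.append(pearson(first, second))
    return sum(correlations) / len(correlations)


# Toy data (made up): three summaries, each rated by three annotators.
toy = [[1, 1, 2], [3, 3, 3], [5, 4, 5]]
print(score_distribution(toy))
print(krippendorff_alpha_interval(toy))
print(split_half_reliability(toy))
```

The pairwise loops are quadratic in the number of ratings, which is fine for a sketch but slow on large annotation sets. A coincidence-matrix implementation would scale better.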

Data

We release our evaluation templates and annotations to promote future work on factual consistency evaluation. Annotations are provided for the CNN&DM data and for the XSUM data, together with the templates.

Model

The code for BART, ProphetNet, PEGASUS, and BERTSUM is based on Fairseq(-py). Our pretrained models are provided for the CNN&DM data and for the XSUM data.

Citation

If you use our code in your research, please cite our work:

@inproceedings{tang2022investigating,
   title={Investigating Crowdsourcing Protocols for Evaluating the Factual Consistency of Summaries},
   author={Tang, Xiangru and Fabbri, Alexander R and Mao, Ziming and Adams, Griffin and Wang, Borui and Li, Haoran and Mehdad, Yashar and Radev, Dragomir},
   booktitle={North American Chapter of the Association for Computational Linguistics (NAACL)},
   year={2022}
}
