MuCPAD: A Multi-Domain Chinese Predicate-Argument Dataset

SUDA-LA/MuCPAD

MuCPAD

MuCPAD is a multi-source Chinese predicate-argument dataset, introduced in the paper "A Multi-Domain Chinese Predicate-Argument Dataset" presented at the NAACL conference.

MuCPAD consists of 30,897 sentences and 92,051 predicates covering 6 different domains. The dataset has three important features: 1) It is based on a frame-free annotation methodology, which avoids writing complex frames for new predicates. 2) Omitted core arguments are explicitly annotated to recover a more complete semantic structure, since content omission is ubiquitous in multi-domain Chinese texts. 3) We compiled 53-page annotation guidelines and adopted strict double annotation to improve data quality.

The file "examples" contains 120 examples from MuCPAD, covering 6 different domains. The PDF file is our annotation guideline for MuCPAD.

If you need MuCPAD, please contact our first author. We can directly provide the source-domain data annotated on Chinese SemBank and the target-domain data. For the source-domain data annotated on CoNLL-2009 texts, we will first check whether you hold a CoNLL-2009 licence before providing the annotated data.
