MuCPAD is a multi-source Chinese predicate-argument dataset, introduced in the paper "A Multi-Domain Chinese Predicate-Argument Dataset" presented at the NAACL conference.
MuCPAD consists of 30,897 sentences and 92,051 predicates covering 6 different domains. The dataset has three important features: 1) It is based on a frame-free annotation methodology, so we avoid writing complex frames for new predicates. 2) We explicitly annotate omitted core arguments to recover a more complete semantic structure, since content omission is ubiquitous in multi-domain Chinese texts. 3) We compiled 53-page annotation guidelines and adopted strict double annotation to improve data quality.
The file "examples" contains 120 examples from MuCPAD covering all 6 domains, and the PDF file is our annotation guideline for MuCPAD.
If you need MuCPAD, please contact our first author. We can directly provide the target-domain data and the source-domain data annotated on Chinese SemBank. For the source-domain data annotated on CoNLL-2009 texts, we will first check that you hold a CoNLL-2009 licence before providing the annotated data.