Annotation-Inspired Implicit Discourse Relation Classification with Auxiliary Discourse Connective Generation
Code for the ACL 2023 paper "Annotation-Inspired Implicit Discourse Relation Classification with Auxiliary Discourse Connective Generation"
If you have any questions, please contact: [email protected]
Our working environment is Python 3.8. Before you run the code, please make sure all required packages are installed. You can install them by executing
sh requirements.sh
Then download roberta-base from here and put it under the folder "data/pretrained_models/roberta-base".
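If you prefer the command line, one way to fetch roberta-base is to clone it from the Hugging Face Hub (this assumes git and git-lfs are installed; it is not the only way to obtain the model):

```shell
# Clone the roberta-base weights into the folder the code expects.
git lfs install
git clone https://huggingface.co/roberta-base data/pretrained_models/roberta-base
```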
For PDTB 2.0
- copy the raw corpus (the folder containing the .pdtb files) under the folder "data/dataset/pdtb2/raw". The raw corpus consists of the section folders 00, 01, 02, ..., 24.
- do preprocessing via
python3 preprocessing.py
(you may need to activate some code in the main function of preprocessing.py).
For PDTB 3.0
- copy the raw corpus under the folders "data/dataset/pdtb3/raw/gold" and "data/dataset/pdtb3/raw/data", where the former holds the label files and the latter the text files. Both raw/gold and raw/data consist of the section folders 00, 01, 02, ..., 24.
- do preprocessing via
python3 preprocessing.py
(you may need to activate some code in the main function of preprocessing.py).
For PCC
- Download the raw corpus from here and unzip it.
- Go into the unzipped directory and run
python3 connectives_xml2tsv.py
. This generates a file called "pcc_discourse_relations_all.tsv".
- Put "pcc_discourse_relations_all.tsv" under the folder "data/dataset/pcc/raw".
- Do preprocessing via
python3 preprocessing.py
(you may need to activate some code in the main function of preprocessing.py).
For PDTB 2.0, you can run each script directly. For instance, run
sh run_joint.sh
to reproduce the results of our method.
For PDTB 3.0, set the dataset parameter in the script to "pdtb3". Note that, in order to reproduce our results, you also need to set sample_k to 200. For more details, please refer to the paper.
For PCC, set the dataset to "pcc", sample_k to 10, and conn_threshold to 5.
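Concretely, the per-dataset settings can be summarized as the following sketch (the exact variable or flag names inside the run scripts are assumptions; check the script itself before editing):

```shell
# Hypothetical excerpt of a run script's configuration section.
dataset=pcc        # "pdtb2" (default), "pdtb3", or "pcc"
sample_k=10        # 200 for PDTB 3.0, 10 for PCC
conn_threshold=5   # 5 for PCC
```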
You can cite our paper as:
@inproceedings{liu-strube-2023-annotation,
title = "Annotation-Inspired Implicit Discourse Relation Classification with Auxiliary Discourse Connective Generation",
author = "Liu, Wei and
Strube, Michael",
booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = jul,
year = "2023",
address = "Toronto, Canada",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.acl-long.874",
pages = "15696--15712",
}