In this repo, we implement our research on the applicability of large language models (LLMs) to automatic term extraction (ATE) using three forms of prompting: (1) sequence-labeling responses; (2) text-generative responses; and (3) an approach filling the gap between both types. We conduct experiments on the ACTER corpora, covering three languages and four domains. Check out our paper, accepted at the TSD 2024 conference: here
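As a purely illustrative sketch, the three prompting styles could look like the following; the actual prompts live in `prompts/prompt_classifier.py` and may be worded differently:

```python
# Purely illustrative examples of the three prompting styles studied here;
# the actual prompts in prompts/prompt_classifier.py may differ in wording.
sentence = "The turbine converts wind energy into electricity."

# (1) Sequence-labeling response: ask the model to tag tokens with BIO labels.
prompt_seq = (
    "Label each token of the sentence with B (term begins), I (inside a term), "
    f"or O (not part of a term):\n{sentence}"
)

# (2) Text-generative response: ask the model to list the extracted terms.
prompt_gen = f"List all domain-specific terms found in this sentence:\n{sentence}"

# (3) Filling the gap of both types: e.g. completing a statement about a
# candidate span (a hypothetical rendering of the fill-the-gap idea).
candidate = "wind energy"
prompt_gap = f'In the sentence "{sentence}", the span "{candidate}" is ___ a term.'
```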
Please install all the necessary libraries listed in requirements.txt using this command:

```
pip install -r requirements.txt
```
The experiments were conducted on ACTER datasets:
| ACTER dataset | |
|---|---|
| Languages | English, French, and Dutch |
| Domains | Corruption, Wind energy, Equitation, Heart failure |
Download the ACTER dataset here and save it into the `ACTER` folder.
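If you want to inspect the gold annotations directly: the unique annotation lists in ACTER are tab-separated files with the term in the first column. A minimal sketch for reading one, assuming that layout (the file path below is illustrative; adjust it to where you saved ACTER):

```python
# Minimal sketch for reading an ACTER gold-term list, assuming the
# tab-separated "term<TAB>label" layout of the unique annotation lists.
from pathlib import Path

def load_gold_terms(tsv_path: str) -> set[str]:
    """Return the set of annotated terms from one ACTER annotation file."""
    terms = set()
    for line in Path(tsv_path).read_text(encoding="utf-8").splitlines():
        if line.strip():
            terms.add(line.split("\t")[0].lower())  # first column holds the term
    return terms

# Illustrative path; the actual layout inside ACTER/ may differ.
gold = load_gold_terms("ACTER/en/corp/annotations/corp_en_terms.tsv")
```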
Please refer to the work from ate-2022 for the implementation of the sequence-labeling baseline.
Run the following command to generate the templates:

```
cd template_ate/
python gen_template.py
```
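As background, template-based term extraction scores each candidate span against filled-in templates, in the spirit of template-based NER. A minimal sketch of what generated templates could look like; the templates actually produced by `gen_template.py` may use different wording:

```python
# Hypothetical sketch of template generation for TemplateATE; the templates
# actually produced by gen_template.py may be worded differently.
def build_templates(candidate: str) -> tuple[str, str]:
    """Return one positive and one negative template for a candidate span."""
    return (f"{candidate} is a term entity.",
            f"{candidate} is not a term entity.")

# Pair every n-gram of a sentence (here up to length 3, illustrative) with
# both templates; a model is then scored on which filled template it prefers.
tokens = "heart failure is a chronic condition".split()
pairs = [
    build_templates(" ".join(tokens[i:i + n]))
    for n in range(1, 4)
    for i in range(len(tokens) - n + 1)
]
```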
Run the following command to train all the models:

```
cd template_ate/
chmod +x run.sh
./run.sh
```
Add your API key to prompts/prompt_classifier.py and run the following command:

```
cd prompts/
python prompt_classifier.py [--data_path] [--lang] [--ver] [--formats] [--output_path]
```
where:
- `--data_path` is the path to the data directory;
- `--lang` is the language of the corpus;
- `--ver` is the version of the corpus (ANN or NES);
- `--formats` is the designed prompt format;
- `--output_path` is the path to the output CSV file.
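For example, to prompt over the English ANN corpus with the first format (the flag values shown are illustrative):

```
python prompt_classifier.py --data_path ../ACTER --lang en --ver ANN --formats 1 --output_path gpt_en_ann_f1.csv
```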
Run the following commands to run all the models:

```
cd prompts/
chmod +x run_prompt.sh
./run_prompt.sh
```
For evaluation, run the following command:

```
cd prompts/
python evaluate.py [--data_path] [--lang] [--ver]
```
where:
- `--data_path` is the path to the data directory;
- `--lang` is the language of the corpus;
- `--ver` is the version of the corpus (ANN or NES).
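For example, to evaluate the English ANN outputs (illustrative flag values):

```
python evaluate.py --data_path ../ACTER --lang en --ver ANN
```

ATE evaluation on ACTER is conventionally a set comparison of extracted terms against the gold term list. A minimal sketch of that scoring, assuming `evaluate.py` follows the usual exact-match convention:

```python
# Minimal sketch of set-based exact-match scoring, the usual ATE convention;
# the actual logic in evaluate.py may differ in normalization details.
def score(predicted: set[str], gold: set[str]) -> tuple[float, float, float]:
    tp = len(predicted & gold)                       # exact-match true positives
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1
```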
Run the following commands to run all the evaluations:

```
cd prompts/
chmod +x run_eval.sh
./run_eval.sh
```
Log in via huggingface-cli with your Hugging Face account token using this command:

```
huggingface-cli login
```
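Alternatively, you can authenticate programmatically with the huggingface_hub library (the token below is a placeholder):

```python
from huggingface_hub import login

# Non-interactive alternative to `huggingface-cli login`;
# replace the placeholder with your own access token.
login(token="hf_XXXXXXXXXXXXXXXXXXXX")
```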
Then run the following command to run the model:

```
cd prompts/
python llama2.py [--lang] [--ver] [--formats] [--output_path]
```
where:
- `--lang` is the language of the corpus;
- `--ver` is the version of the corpus (ANN or NES);
- `--formats` is the designed prompt format (1, 2, or 3);
- `--output_path` is the path to the output CSV file.
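For example (illustrative flag values):

```
python llama2.py --lang en --ver ANN --formats 2 --output_path llama2_en_ann_f2.csv
```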
Run the following commands to run all the models:

```
cd prompts/
chmod +x run_llama.sh
./run_llama.sh
```
| Settings | English Precision | English Recall | English F1-score | French Precision | French Recall | French F1-score | Dutch Precision | Dutch Recall | Dutch F1-score |
|---|---|---|---|---|---|---|---|---|---|
| **BIO classifier** | | | | | | | | | |
| TRAIN: Wind, Equi - VAL: Corp | 58.6 | 40.7 | 48.0 | 68.8 | 34.2 | 45.7 | 73.5 | 54.1 | 62.3 |
| TRAIN: Corp, Equi - VAL: Wind | 58.5 | 49.5 | 53.6 | 70.7 | 41.0 | 51.9 | 73.3 | 59.7 | 65.8 |
| TRAIN: Corp, Wind - VAL: Equi | 58.1 | 48.1 | 52.6 | 70.5 | 44.4 | 54.5 | 70.3 | 62.2 | 66.0 |
| **TemplateATE** | | | | | | | | | |
| TRAIN: Wind, Equi - VAL: Corp | 30.5 | 24.8 | 27.4 | 40.4 | 26.1 | 31.7 | 32.2 | 45.6 | 37.8 |
| TRAIN: Corp, Equi - VAL: Wind | 24.4 | 21.3 | 22.8 | 31.7 | 26.6 | 28.9 | 29.6 | 37.4 | 33.0 |
| TRAIN: Corp, Wind - VAL: Equi | 32.5 | 29.2 | 30.7 | 26.9 | 37.0 | 31.2 | 32.7 | 43.9 | 37.4 |
| **GPT-ATE** | | | | | | | | | |
| In-domain Few-shot format #1 | 10.8 | 14.4 | 12.3 | 11.3 | 11.6 | 11.4 | 18.3 | 14.1 | 15.9 |
| In-domain Few-shot format #2 | 26.6 | 67.6 | 38.2 | 28.5 | 67.0 | 40.0 | 36.8 | 79.6 | 50.3 |
| In-domain Few-shot format #3 | 39.6 | 48.3 | 43.5 | 45.5 | 50.8 | 48.0 | 61.1 | 56.6 | 58.8 |
| Settings | English Precision | English Recall | English F1-score | French Precision | French Recall | French F1-score | Dutch Precision | Dutch Recall | Dutch F1-score |
|---|---|---|---|---|---|---|---|---|---|
| **BIO classifier** | | | | | | | | | |
| TRAIN: Wind, Equi - VAL: Corp | 63.0 | 45.0 | 52.5 | 69.4 | 40.4 | 51.1 | 72.9 | 58.8 | 65.1 |
| TRAIN: Corp, Equi - VAL: Wind | 63.9 | 50.3 | 56.3 | 72.0 | 47.2 | 57.0 | 75.9 | 58.6 | 66.1 |
| TRAIN: Corp, Wind - VAL: Equi | 62.1 | 52.1 | 56.7 | 72.4 | 48.5 | 58.1 | 73.3 | 61.5 | 66.9 |
| **TemplateATE** | | | | | | | | | |
| TRAIN: Wind, Equi - VAL: Corp | 30.4 | 31.5 | 31.0 | 36.4 | 39.3 | 37.8 | 30.4 | 45.2 | 36.4 |
| TRAIN: Corp, Equi - VAL: Wind | 27.1 | 29.6 | 28.3 | 31.1 | 24.2 | 27.2 | 41.1 | 37.8 | 39.4 |
| TRAIN: Corp, Wind - VAL: Equi | 34.7 | 32.5 | 33.6 | 40.7 | 33.0 | 36.5 | 32.2 | 47.3 | 38.3 |
| **GPT-ATE** | | | | | | | | | |
| In-domain Few-shot format #1 | 10.3 | 13.1 | 11.5 | 10.8 | 12.0 | 11.4 | 14.8 | 13.2 | 14.0 |
| In-domain Few-shot format #2 | 29.2 | 69.2 | 41.1 | 27.9 | 66.8 | 39.4 | 39.8 | 78.5 | 52.8 |
| In-domain Few-shot format #3 | 39.8 | 53.1 | 45.5 | 44.7 | 54.4 | 49.1 | 63.6 | 60.6 | 62.1 |
- License: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
- Reference: Please cite the following paper if you use this method in your research:
```bibtex
@inproceedings{tran2024prompting,
  title={Is Prompting What Term Extraction Needs?},
  author={Tran, Hanh Thi Hong and González-Gallardo, Carlos-Emiliano and Delauney, Julien and Moreno, Jose and Doucet, Antoine and Pollak, Senja},
  booktitle={27th International Conference on Text, Speech and Dialogue (TSD 2024)},
  year={2024},
  note={Accepted}
}
```