This is a baseline system for the Feedback Comment Generation (FCG) shared task at GenChal 2022. The system is an encoder-decoder with a copy mechanism based on a pointer-generator network. The implementation is based on fairseq.
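The copy mechanism can be illustrated with the standard pointer-generator formulation. A generation probability p_gen weights the decoder's vocabulary distribution, and the remaining 1 - p_gen is spread over source tokens according to the attention weights. The following is a minimal sketch of that mixing step, not code taken from this repository. It assumes a vocabulary shared between source and target (no extended out-of-vocabulary list), and the function name is illustrative.

```python
def pointer_generator_distribution(p_gen, vocab_probs, attention, source_ids):
    """Combine generation and copy distributions into one output distribution.

    p_gen:       probability of generating from the vocabulary (0..1)
    vocab_probs: decoder softmax over the (assumed shared) vocabulary
    attention:   attention weights over source positions (sum to 1)
    source_ids:  vocabulary id of the token at each source position
    """
    # Generation part: scale the vocabulary distribution by p_gen.
    final = [p_gen * p for p in vocab_probs]
    # Copy part: each source position adds its attention mass to its token id,
    # so a token that appears several times in the source accumulates mass.
    for weight, token_id in zip(attention, source_ids):
        final[token_id] += (1.0 - p_gen) * weight
    return final


# Toy example: 3-word vocabulary, 2 source tokens (ids 2 and 0).
probs = pointer_generator_distribution(0.7, [0.5, 0.3, 0.2], [0.6, 0.4], [2, 0])
print(probs)  # sums to 1; approximately [0.47, 0.21, 0.32]
```

Because both parts are proper distributions and their weights sum to 1, the mixture is also a proper distribution. This lets the model copy words such as the prepositions in the learner's sentence directly into the comment.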
- Python 3.7+
- Install the required libraries:
$ pip install -r requirements.txt
Download the FCG dataset from https://fcg.sharedtask.org/ and unzip it into data/train_dev.
Convert the data to a fairseq-compatible format using the following shell script:
$ src/preprocess_train_dev.sh -i data/train_dev -o data/fcg-bin
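The verbose evaluation output shown further below marks the span a comment refers to with [[ ]], as in [[in to]]. This README does not describe exactly what preprocess_train_dev.sh does. As a rough illustration only, the sketch below wraps a span given as character offsets. The helper name mark_span and the "start:end" offset format (end exclusive) are assumptions, not details taken from the script.

```python
def mark_span(sentence: str, offset: str) -> str:
    """Wrap the character span 'start:end' (end exclusive) in [[ ]].

    Hypothetical helper: assumes offsets are 'start:end' character indices
    into the sentence.
    """
    start, end = (int(x) for x in offset.split(":"))
    if not 0 <= start <= end <= len(sentence):
        raise ValueError(f"offset {offset!r} out of range")
    return sentence[:start] + "[[" + sentence[start:end] + "]]" + sentence[end:]


print(mark_span("So restaurants divide the area in to two sections .", "31:36"))
# So restaurants divide the area [[in to]] two sections .
```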
Train a model based on a pointer-generator network:
$ fairseq-train \
data/fcg-bin --task fcg --arch fcg \
--optimizer adam --lr 0.001 \
--max-tokens 1024 --max-epoch 50 \
--eval-bleu --best-checkpoint-metric bleu \
--maximize-best-checkpoint-metric \
--user-dir src \
--save-dir results/fcg_baseline
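The --max-tokens 1024 option limits each batch by token count rather than by a fixed number of sentences. The sketch below is a simplified version of the idea. It sorts examples by length, then fills each batch while the padded size (number of sentences times the longest length in the batch) stays within the budget. fairseq's real batching has additional details, and the function name here is illustrative.

```python
def batch_by_max_tokens(lengths, max_tokens):
    """Group example indices so that len(batch) * longest length <= max_tokens.

    Simplified illustration of token-based batching; in this sketch a single
    example longer than max_tokens still gets its own batch.
    """
    # Sorting by length keeps similar lengths together and reduces padding.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batches, batch, batch_max = [], [], 0
    for i in order:
        new_max = max(batch_max, lengths[i])
        # Padded size if example i joined the current batch.
        if batch and new_max * (len(batch) + 1) > max_tokens:
            batches.append(batch)
            batch, new_max = [], lengths[i]
        batch.append(i)
        batch_max = new_max
    if batch:
        batches.append(batch)
    return batches


print(batch_by_max_tokens([3, 5, 2, 4], max_tokens=10))  # [[2, 0], [3, 1]]
```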
First, decode the development set with fairseq-generate, writing one tokenized comment per line:
$ fairseq-generate \
data/fcg-bin --task fcg \
--path results/fcg_baseline/checkpoint_best.pt \
--batch-size 128 --beam 5 \
--user-dir src --gen-subset valid | grep '^H' | LC_ALL=C sort -V | cut -f3 > results/fcg_baseline/DEV.prep_feedback_comment.out
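The grep/sort/cut pipeline keeps fairseq-generate's hypothesis lines and restores the original sample order. These lines have the form H-<sample id>, score, and hypothesis, separated by tabs. The pipeline then keeps only the hypothesis text. A Python equivalent might look like the following sketch (the function name is illustrative):

```python
import re

# fairseq-generate hypothesis lines look like: H-<id>\t<score>\t<text>
HYPO_LINE = re.compile(r"^H-(\d+)\t[^\t]*\t(.*)$")


def extract_hypotheses(lines):
    """Return hypothesis texts ordered by sample id (like grep | sort -V | cut -f3)."""
    hypos = []
    for line in lines:
        match = HYPO_LINE.match(line.rstrip("\n"))
        if match:
            hypos.append((int(match.group(1)), match.group(2)))
    # Numeric order by sample id, so H-2 comes before H-10.
    hypos.sort(key=lambda pair: pair[0])
    return [text for _, text in hypos]
```

For example, the function could be applied to sys.stdin with the fairseq-generate output piped in, printing each returned line.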
Then, convert the output to the submission format:
$ python src/postprocess.py -i data/train_dev/DEV.prep_feedback_comment.public.tsv -s results/fcg_baseline/DEV.prep_feedback_comment.out -m data/train_dev/spm.model -o results/fcg_baseline/DEV.prep_feedback_comment.out.tsv
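The -m data/train_dev/spm.model argument suggests that the comments are segmented with SentencePiece, so postprocessing has to undo that segmentation. Below is a minimal sketch of SentencePiece-style detokenization. It assumes each output line is a space-separated sequence of pieces in which U+2581 ("▁") marks the start of a word. The actual script may instead use the sentencepiece library's own decoding, and the function name is illustrative.

```python
def detokenize_pieces(line: str) -> str:
    """Turn space-separated SentencePiece pieces back into plain text."""
    # Concatenate pieces, then turn each word-boundary marker into a space.
    text = "".join(line.split())
    return text.replace("\u2581", " ").strip()


print(detokenize_pieces("\u2581Choose \u2581a \u2581< pre position >"))
# Choose a <preposition>
```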
Compute precision, recall, and F1 score based on sentence-level BLEU using SacreBLEU. The system output file must be detokenized:
$ python src/evaluate_bleu.py -i results/fcg_baseline/DEV.prep_feedback_comment.out.tsv -r data/train_dev/DEV.prep_feedback_comment.public.tsv
BLEU precision: 46.341357634752534
BLEU recall: 46.341357634752534
BLEU F1: 46.341357634752534
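The sketch below shows one plausible way to aggregate sentence-level BLEU into precision, recall, and F1. Precision averages BLEU over generated comments (excluding <NO_COMMENT>), recall averages it over all references, and F1 is their harmonic mean. This definition is an assumption for illustration; src/evaluate_bleu.py is the authority on the exact computation. Under this definition, precision, recall, and F1 coincide whenever every input receives a generated comment. That is consistent with the identical values printed above.

```python
NO_COMMENT = "<NO_COMMENT>"


def bleu_precision_recall_f1(system_comments, sentence_bleus):
    """Aggregate per-sentence BLEU into precision, recall, and F1 (assumed definition).

    system_comments[i] is the system output for reference i and
    sentence_bleus[i] is its sentence BLEU (0-100).
    """
    num_references = len(system_comments)
    # Only real comments count as generated; <NO_COMMENT> contributes nothing.
    generated = [bleu for comment, bleu in zip(system_comments, sentence_bleus)
                 if comment != NO_COMMENT]
    total = sum(generated)
    precision = total / len(generated) if generated else 0.0
    recall = total / num_references if num_references else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Every input gets a comment, so precision == recall == F1.
print(bleu_precision_recall_f1(["comment A", "comment B"], [20.0, 40.0]))  # (30.0, 30.0, 30.0)
# One input left uncommented: precision stays high, recall drops.
print(bleu_precision_recall_f1(["comment A", NO_COMMENT], [60.0, 0.0]))  # (60.0, 30.0, 40.0)
```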
$ python src/evaluate_bleu.py -i results/fcg_baseline/DEV.prep_feedback_comment.out.tsv -r data/train_dev/DEV.prep_feedback_comment.public.tsv -v
...
Input: So restaurants divide the area [[in to]] two sections .
System: Choose a <preposition> that indicates the beneficiary of an action instead of the <preposition> <<of>>.
Reference: It seems to be a careless mistake, but use the <preposition> that expresses the <prepositions> <<in>> and <<to>> in one word.
BLEU: 14.889568593912923
System length: 170
Reference length: 170
BLEU precision: 46.341357634752534
BLEU recall: 46.341357634752534
BLEU F1: 46.341357634752534
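The following simplified sentence-level BLEU gives a feel for what the per-sentence BLEU value in the verbose output measures. It combines clipped n-gram precisions for n = 1 to 4, add-one smoothing for n > 1, and a brevity penalty. It uses whitespace tokenization and a different smoothing from SacreBLEU, so its scores will not match src/evaluate_bleu.py. It is an illustration only, and the function names are illustrative.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_sentence_bleu(hypothesis, reference, max_n=4):
    """Simplified sentence BLEU on a 0-100 scale (not SacreBLEU-compatible)."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp or not ref:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_ngrams = ngram_counts(hyp, n)
        ref_ngrams = ngram_counts(ref, n)
        # Clipped matches: an n-gram counts at most as often as it occurs in the reference.
        matches = sum(min(count, ref_ngrams[g]) for g, count in hyp_ngrams.items())
        total = max(len(hyp) - n + 1, 0)
        if n == 1:
            if matches == 0:
                return 0.0  # no word overlap at all
            precision = matches / total
        else:
            precision = (matches + 1) / (total + 1)  # add-one smoothing for n > 1
        log_precision_sum += math.log(precision)
    # Brevity penalty punishes hypotheses shorter than the reference.
    brevity_penalty = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return 100.0 * brevity_penalty * math.exp(log_precision_sum / max_n)


print(simple_sentence_bleu("use the preposition", "use the preposition into"))
# about 71.7 (only the brevity penalty applies)
```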
Summarize the final results based on manual evaluation. The input is an xlsx file containing the manual evaluation results:
$ python src/evaluate.py -i results/manual_evaluation/DEV.prep_feedback_comment.public.xlsx
-------
Basic stats
-------
Input file: results/manual_evaluation/DEV.prep_feedback_comment.public.xlsx
Num of reference feedback comments: 215
Num of generated feedback comments: 212
Num of generated <NO_COMMENT>: 0
-------
Manual evaluation
-------
Precision: 0.428 (92 / 215)
Recall: 0.428 (92 / 215)
F1: 0.428
-------
BLEU
-------
Precision: 0.463
Recall: 0.463
F1: 0.463
-------