Baseline system for the feedback comment generation@GenChal2022

This is a baseline system for the feedback comment generation@GenChal2022. The system is an encoder-decoder with a copy mechanism based on a pointer-generator network. The implementation is based on fairseq.
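As a rough illustration of the copy mechanism, a pointer-generator network mixes the decoder's vocabulary distribution with the attention weights over the source tokens, weighted by a generation probability. The sketch below is not taken from this repository: the function name `pointer_generator_mix` and the toy numbers are hypothetical and only show the mixing step for a single decoding position.

```python
from collections import defaultdict


def pointer_generator_mix(p_gen, vocab_dist, attention, source_tokens):
    """Mix a generation distribution with a copy distribution.

    p_gen:         probability of generating from the vocabulary (0..1)
    vocab_dist:    dict mapping token -> probability from the decoder softmax
    attention:     attention weights over source positions (sums to 1)
    source_tokens: source tokens aligned with `attention`
    """
    final = defaultdict(float)
    # Generation part: scale the vocabulary distribution by p_gen.
    for token, prob in vocab_dist.items():
        final[token] += p_gen * prob
    # Copy part: add attention mass to whichever token sits at each source position.
    for token, weight in zip(source_tokens, attention):
        final[token] += (1.0 - p_gen) * weight
    return dict(final)


# Toy example (hypothetical numbers): "into" is not in the vocabulary
# distribution, but it can still receive probability by being copied.
dist = pointer_generator_mix(
    p_gen=0.5,
    vocab_dist={"use": 0.5, "the": 0.5},
    attention=[0.75, 0.25],
    source_tokens=["into", "the"],
)
print(dist)
```

The point of the mixture is that tokens appearing only in the learner's sentence can still be produced in the feedback comment.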

Requirements

Python 3.7+
Install the required libraries:

$ pip install -r requirements.txt

Dataset

Download the FCG dataset from https://fcg.sharedtask.org/ and unzip it to data/train_dev.

Usage

Preprocess

Convert the dataset to a fairseq-compatible format using the following shell script.

$ src/preprocess_train_dev.sh -i data/train_dev -o data/fcg-bin

Train

Train a model based on a pointer-generator network.

$ fairseq-train \
    data/fcg-bin --task fcg --arch fcg \
    --optimizer adam --lr 0.001 \
    --max-tokens 1024 --max-epoch 50 \
    --eval-bleu --best-checkpoint-metric bleu \
    --maximize-best-checkpoint-metric \
    --user-dir src \
    --save-dir results/fcg_baseline

Generate

First, run fairseq-generate and write one tokenized comment per line.

$ fairseq-generate \
    data/fcg-bin --task fcg \
    --path results/fcg_baseline/checkpoint_best.pt \
    --batch-size 128 --beam 5 \
    --user-dir src --gen-subset valid | grep '^H' | LC_ALL=C sort -V | cut -f3 > results/fcg_baseline/DEV.prep_feedback_comment.out
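The shell pipeline above keeps the hypothesis lines of the fairseq-generate output (`H-<id>`, score, text, separated by tabs), orders them by sample id, and keeps the third field. For readers who prefer to do this step in Python, the following is a minimal sketch of the same filtering; the function name `extract_hypotheses` and the sample lines are illustrative and not part of this repository.

```python
def extract_hypotheses(lines):
    """Return hypothesis texts ordered by sample id.

    Mirrors `grep '^H' | sort -V | cut -f3` for lines of the form
    "H-<id>\t<score>\t<text>".
    """
    hyps = []
    for line in lines:
        if not line.startswith("H-"):
            continue  # skip source (S-), target (T-), and other lines
        fields = line.rstrip("\n").split("\t")
        sample_id = int(fields[0][2:])
        text = fields[2] if len(fields) > 2 else ""
        hyps.append((sample_id, text))
    # Numeric sort so that H-2 comes before H-10.
    return [text for _, text in sorted(hyps, key=lambda pair: pair[0])]


# Made-up sample of generate output, deliberately out of order.
sample = [
    "S-10\tsource ten\n",
    "H-10\t-0.3\tcomment ten\n",
    "H-2\t-0.1\tcomment two\n",
    "D-2\t-0.1\tcomment two\n",
    "H-0\t-0.2\tcomment zero\n",
]
ordered = extract_hypotheses(sample)
print("\n".join(ordered))
```

Sorting by the numeric id matters because fairseq-generate batches sentences by length, so the hypotheses do not come out in dataset order.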

Then, convert the output to the submission format.

$ python src/postprocess.py -i data/train_dev/DEV.prep_feedback_comment.public.tsv -s results/fcg_baseline/DEV.prep_feedback_comment.out -m data/train_dev/spm.model -o results/fcg_baseline/DEV.prep_feedback_comment.out.tsv
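The `-m data/train_dev/spm.model` argument suggests that the postprocessing step undoes SentencePiece tokenization. What src/postprocess.py does internally is not shown here, so the following is only a sketch of that detokenization idea, assuming space-separated pieces in which the marker "▁" (U+2581) denotes a word boundary. The function name `detokenize_pieces` and the example line are hypothetical.

```python
WORD_BOUNDARY = "\u2581"  # the SentencePiece word-boundary marker "▁"


def detokenize_pieces(line):
    """Join space-separated SentencePiece pieces back into plain text."""
    pieces = line.split()
    # Pieces are concatenated directly; each boundary marker becomes a space.
    return "".join(pieces).replace(WORD_BOUNDARY, " ").strip()


# Hypothetical tokenized comment, as it might appear one per line.
tokenized = "\u2581Use \u2581the \u2581pre position \u2581<< in to >> ."
detokenized = detokenize_pieces(tokenized)
print(detokenized)
```

In practice, decoding with the sentencepiece library and the trained model file is the safer route, since it also handles any normalization the model was trained with.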

Evaluate

Automatic evaluation based on BLEU

Compute precision, recall, and F1 score based on sentence-level BLEU using SacreBLEU. The system output file must be detokenized.

$ python src/evaluate_bleu.py -i results/fcg_baseline/DEV.prep_feedback_comment.out.tsv -r data/train_dev/DEV.prep_feedback_comment.public.tsv
BLEU precision: 46.341357634752534
BLEU recall: 46.341357634752534
BLEU F1: 46.341357634752534

$ python src/evaluate_bleu.py -i results/fcg_baseline/DEV.prep_feedback_comment.out.tsv -r data/train_dev/DEV.prep_feedback_comment.public.tsv -v
...
Input: So restaurants divide the area [[in to]] two sections .
System: Choose a <preposition> that indicates the beneficiary of an action instead of the <preposition> <<of>>.
Reference: It seems to be a careless mistake, but use the <preposition> that expresses the <prepositions> <<in>> and <<to>> in one word.
BLEU: 14.889568593912923

System length: 170
Reference length: 170

BLEU precision: 46.341357634752534
BLEU recall: 46.341357634752534
BLEU F1: 46.341357634752534
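The exact definitions used by src/evaluate_bleu.py are not spelled out above. The sketch below shows one plausible way to turn sentence-level BLEU scores into precision, recall, and F1, assuming that precision averages over the generated comments (excluding `<NO_COMMENT>`) and recall averages over all reference comments. Under that assumption the three values coincide when the system and reference lengths are equal, as in the output above. The function name `bleu_prf` and the toy scores are hypothetical.

```python
NO_COMMENT = "<NO_COMMENT>"


def bleu_prf(sentence_scores, system_outputs, num_references):
    """Aggregate sentence BLEU scores into precision, recall, and F1.

    sentence_scores[i] is the sentence BLEU (0-100) of system_outputs[i].
    Assumption: <NO_COMMENT> outputs count toward recall only.
    """
    generated = [
        score
        for score, output in zip(sentence_scores, system_outputs)
        if output != NO_COMMENT
    ]
    total = sum(generated)
    precision = total / len(generated) if generated else 0.0
    recall = total / num_references if num_references else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy scores (not real results): one of four outputs is <NO_COMMENT>.
precision, recall, f1 = bleu_prf(
    sentence_scores=[50.0, 0.0, 100.0, 25.0],
    system_outputs=["comment a", NO_COMMENT, "comment b", "comment c"],
    num_references=4,
)
print(f"BLEU precision: {precision}")
print(f"BLEU recall: {recall}")
print(f"BLEU F1: {f1}")
```

The per-sentence scores themselves would come from SacreBLEU; they are passed in as plain numbers here so that the aggregation step can be read on its own.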

Summarize results of manual evaluation

Summarize the final results based on the manual evaluation. The input is an xlsx file containing the manual evaluation results.

$ python src/evaluate.py -i results/manual_evaluation/DEV.prep_feedback_comment.public.xlsx
-------
Basic stats
-------
Input file: results/manual_evaluation/DEV.prep_feedback_comment.public.xlsx
Num of reference feedback comments: 215
Num of generated feedback comments: 212
Num of generated <NO_COMMENT>: 0
-------
Manual evaluation
-------
Precision: 0.428 (92 / 215)
Recall: 0.428 (92 / 215)
F1: 0.428
-------
BLEU
-------
Precision: 0.463
Recall: 0.463
F1: 0.463
-------
