- We have uploaded an `evaluation.sh` script for your convenience to reproduce the predictions and evaluation shown in Table 4. Please update your local repo and copy the `e2e-metrics` folder into the same directory as this script. The script runs prediction for the Transformer baseline, PAG, CNE_Enc, and CNE_Dec models and evaluates the generated text automatically.
- Please run the following commands to get the predictions and scores:
```
$ git clone https://github.com/tuetschek/e2e-metrics
$ ./evaluation.sh
```
- Notice: the Transformer baseline model came from a quite early experiment, so we had deleted its weights. The baseline was retrained today, so its evaluation results differ slightly from Table 4. There is also a typo in Table 4 of the report: the actual ROUGE-L score for the best CNE-Dec experiment is 0.6850, not 0.6500.
The code for our course project is based on a modified version of https://github.com/UKPLab/e2e-nlg-challenge-2017
Updates:
- We found a critical error in the original code that could make training incorrect (`components/model/modules/attention/attn_bahd.py#68`). We have fixed it in this project.
- Two more data pre-processing strategies are implemented ("A General Model for Neural Text Generation from Structured Data" and "End-to-End Content and Plan Selection for Data-to-Text Generation").
- Transformer model
- Sentence control strategies:
  - Predict and Generate (PAG)
  - Controllable NOS Embedding (CNE)
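To make the CNE idea concrete: a number-of-sentences (NOS) embedding is injected into the encoder (CNE_Enc) or decoder (CNE_Dec) input. Below is a minimal, framework-free sketch of the encoder-side variant; the table, dimensions, and function names are invented for illustration and do not match the actual implementation in `components/`:

```python
import random

random.seed(0)
EMB_DIM = 4   # toy embedding size (illustrative only)
MAX_NOS = 6   # assumed upper bound on the number of sentences

# Hypothetical NOS embedding table: one (normally learned) vector per
# sentence count; here it is just randomly initialized.
nos_table = [[random.uniform(-0.1, 0.1) for _ in range(EMB_DIM)]
             for _ in range(MAX_NOS + 1)]

def add_nos_embedding(token_embs, nos):
    """Add the NOS vector element-wise to every token embedding (CNE_Enc-style)."""
    nos_vec = nos_table[nos]
    return [[t + n for t, n in zip(tok, nos_vec)] for tok in token_embs]

# Two dummy token embeddings, conditioned on generating 2 sentences.
tokens = [[0.5] * EMB_DIM, [1.0] * EMB_DIM]
out = add_nos_embedding(tokens, nos=2)
```

In the real model the same trick works on the decoder side (CNE_Dec) by adding the NOS vector to the decoder input embeddings instead.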
- Official website: http://www.macs.hw.ac.uk/InteractionLab/E2E/
- Evaluation protocol: automatic metrics for system development
We provide basic scripts and their utilities in this repository, along with some output files:

- `run_experiment.py`: main script to run (please freeze this script, since experiment configurations are declared in the YAML files).
- `config/train_XXXX.yaml` and `config/predict_XXXX.yaml`: configuration files to use with the script above.
- `components/`: data pre-processing (HAV, HIT, UKP), models (MLP, GRU, Transformer, ...), trainer, evaluator, and necessary utils.
- 64-bit Linux versions
- Python 3 and dependencies:
- PyTorch v1.2.0
- Progressbar2 v3.18.1
- Install Python3 dependencies:

```
$ conda install pytorch torchvision cuda80 -c soumith
$ conda install progressbar2
```
- Python2 dependencies are needed only to run the official evaluation scripts.
- Step 1

For your convenience, we have set up multiple configuration files in `config/`:

- `train_transformer.YAML` and `predict_transformer.YAML` run experiments using the standard Transformer model. Please freeze `nos_option` to 0.
- `train_transformer_CNE_Enc.YAML` and `predict_transformer_CNE_Enc.YAML` run experiments using the Transformer model with the CNE NOS embedding in the encoder input. Please freeze `nos_option` and `nos_position` to maintain the CNE_Enc structure.
- `train_transformer_CNE_Dec.YAML` and `predict_transformer_CNE_Dec.YAML` run experiments using the Transformer model with the CNE NOS embedding in the decoder input. Please freeze `nos_option` and `nos_position` to maintain the CNE_Dec structure.
- `train_transformer_PAG.YAML` and `predict_transformer_PAG.YAML` run experiments using the Transformer model with PAG NOS. Please freeze `nos_option` to maintain the PAG structure (`nos_position` is not used in this method).
- `train_XXXX.YAML` and `predict_XXXX.YAML` files must be used in pairs.
- `train_MLP.YAML` and `predict_MLP.YAML` run experiments using the standard MLP model.
- `train_gru.YAML` and `predict_gru.YAML` run experiments using the standard GRU model.
- `train_tfenc.YAML` and `predict_tf.YAML` run experiments using the Transformer-as-encoder model.
- `data_module` can be chosen from `e2e_data_hav`, `e2e_data_hit`, and `e2e_data_MLP`; do not forget to modify the corresponding parameters in the configuration files, following the instructions within them.
- Constraint: NOS-related options are specifically designed for the Transformer model.
- For other model configuration parameters, including `embedding size`, `hidden size`, `input size`, etc., the script will execute correctly as long as your settings conform to the instructions within the configuration file.
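The pairing and freezing rules above can be sketched as a small validation helper. This is purely illustrative: only `nos_option` and `nos_position` are real key names from the configs; the `model` key and all values are assumptions made for this sketch:

```python
def check_pair(train_cfg, predict_cfg):
    """A train_XXXX/predict_XXXX pair must agree on the NOS-related fields."""
    for key in ("nos_option", "nos_position"):
        if train_cfg.get(key) != predict_cfg.get(key):
            raise ValueError(f"train/predict mismatch on {key!r}")
    # The standard Transformer configs must keep nos_option frozen at 0.
    if train_cfg.get("model") == "transformer" and train_cfg.get("nos_option") != 0:
        raise ValueError("standard transformer requires nos_option == 0")

check_pair({"model": "transformer", "nos_option": 0},
           {"model": "transformer", "nos_option": 0})  # passes silently
```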
- Step 2

Please first download the e2e-metrics toolkit (see the `git clone` command above). We will use the `measure_scores.py` script to evaluate the quality of the generated sentences.
- Adjust data paths and choose a configuration file, or use your own YAML (`train_transformer.yaml` as a running example).
- Run the following command:

```
$ python run_experiment.py config/train_transformer.yaml
```
- After the experiment, a folder will be created in the directory specified by the `experiments_dir` field of the `train_transformer.yaml` file. This folder should contain the following files:
- model weights and development set predictions for each training epoch (weights.epochX, predictions.epochX)
- a csv file with scores and train/dev losses for each epoch (scores.csv)
- configuration dictionary in json format (config.json)
- pdf files with learning curves (optional)
- experiment log (log.txt)
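Since `scores.csv` stores per-epoch scores and losses, you can pick the best checkpoint programmatically. Here is a standard-library sketch; the column names below are assumptions, so adapt them to the actual header of your `scores.csv`:

```python
import csv
import io

# Hypothetical scores.csv content; the real file's columns may differ.
SAMPLE = """epoch,train_loss,dev_loss,bleu
1,3.2,2.9,0.55
2,2.1,2.4,0.61
3,1.5,2.6,0.58
"""

def best_epoch(csv_text, metric="bleu"):
    """Return the epoch whose row has the highest value of `metric`."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return int(max(rows, key=lambda r: float(r[metric]))["epoch"])

print(best_epoch(SAMPLE))  # → 2 (highest BLEU in the sample above)
```

The returned epoch number tells you which `weights.epochX` file to load for prediction.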
- If you use a model for prediction (set "predict" as the value of the `mode` field in the config file and specify the model path in `model_fn`), use the corresponding `predict_transformer.yaml`, or, if you use a self-defined configuration, set the configuration parameters to be the same as in your training configuration. The predictions made by the loaded model will be stored in:
* $model_fn.devset.predictions.txt
* $model_fn.testset.predictions.txt
Evaluate the generated text using the following command:
```
$ cd e2e-metrics
$ python measure_scores.py <ref> <generated>
$ #Eg.
$ python measure_scores.py e2e-dataset/testset_w_refs.csv.multi-ref exp/e2e_model_transformer_seed1-emb256-hid512-drop0.05-bs128-lr0.0002_2020-Dec-17_23.53.25/weights.epoch1.testset.predictions.txt
```
This script will calculate BLEU, NIST, CIDEr, ROUGE-L, and METEOR.
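For a rough sanity check of generated text without the Python 2 toolkit, sentence-level ROUGE-L can be approximated in a few lines (the official `measure_scores.py` implementation may differ in tokenization and aggregation, so use it only for quick debugging):

```python
def lcs_len(a, b):
    """Length of the longest common subsequence of sequences a and b."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[len(a)][len(b)]

def rouge_l_f(reference, candidate):
    """Sentence-level ROUGE-L F1 over whitespace tokens."""
    ref, cand = reference.split(), candidate.split()
    lcs = lcs_len(ref, cand)
    if lcs == 0:
        return 0.0
    p, r = lcs / len(cand), lcs / len(ref)
    return 2 * p * r / (p + r)

print(rouge_l_f("the cat sat on the mat", "the cat on mat"))  # ≈ 0.8
```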