Skip to content

Colorful project for Stanford's CS224U Natural Language Understanding class

Notifications You must be signed in to change notification settings

bnewm0609/cs224u-project

Repository files navigation

Communication-based Evaluation for Natural Language Generation

Project Description

Currently many NLG models are evaluated using n-gram overlap metrics like BLEU and ROUGE, but these metrics do not capture semantics let alone speaker intentions. People use language to communicate, and if we want NLG models to effectively communicate with people, we should evaluate them based on how well they communicate. We illustrate how this communication-based evaluation would work using the color reference game scenario from Monroe et al., 2017. We collected color reference game captions of various qualities and investigated how well models that use the captions to play the reference game can distinguish between dffierent quality captions compared to n-gram overlap metrics.

We have released the data we collected. It can be found in data/csv/clean_data.csv. The code to recreate the plots in the paper using the data and pretrained models can be found in this jupyter notebook.

Folder and File Descriptions

caption_featurizers.py contains code to process captions with an appropriate tokenizer into a format expected by the models. color_featurizers.py is a similar featurizer for the color inputs.

evaluation.py contains performance metric code for all models.

example_experiments.py contains examples of experiments that can be run with models such as the Literal Listener.

experiment.py contains code for model evaluation and the feature handler class that interfaces between the Monroe data, feature functions, and the models.

baseline_listener_samples, literal_listener_samples, and imaginative_listener_samples contain the ten sampled model parameters with optimal hyperparameters from the Baseline, Literal, and Imaginative Listener models, respectively.

data contains all the data used in the project, including the Monroe data and the synthetic data.

model contains all other model parameters for the models experimented with over the course of the project.

notebooks contains Jupyter notebooks for the experiments and scripts used to explore data, generate models, run models, sample models, score models, and other tasks.

About

Colorful project for Stanford's CS224U Natural Language Understanding class

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 3

  •  
  •  
  •