When building a model for a specific release channel of thingpedia-common-devices, you should first generate a dataset, then train the model, and then evaluate the model.
You can generate a dataset locally with:
```
make -j release=$release datadir
```
(e.g. `make -j release=main datadir`)
You can override a number of generation hyperparameters on the make command line. Look for "hyperparameters" in the Makefile for the full list.
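For example, you could shrink the generated dataset by lowering the pruning size and the maximum number of turns (these are the same variables used by the small-dataset example below; check the Makefile for their defaults):

```
# a sketch: override generation hyperparameters on the make command line
make -j release=main target_pruning_size=100 max_turns=3 datadir
```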
Generating a standard-sized dataset takes about 6 hours on a machine with at least 8 cores and at least 60GB of RAM. A smaller dataset, suitable for local testing, can be generated with:
```
make release=$release subdatasets=1 target_pruning_size=25 max_turns=2 debug_level=2 datadir
```
This only takes a few minutes.
If you have genie-k8s configured, you can also generate a full-sized dataset remotely with:
```
make syncup
cd ../genie-k8s
./generate-dataset.sh --experiment $release --dataset $dataset
```
`$dataset` is a short arbitrary name that will be used to refer to your dataset.
Don't forget to complete `config.mk` before using `make syncup`.
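For reference, the relevant part of `config.mk` might look like the sketch below. The variable names shown here are assumptions for illustration only — the actual keys are documented in the comments of `config.mk` itself:

```
# config.mk (hypothetical keys -- see the comments in config.mk for the real ones)
owner = your-username
s3_bucket = s3://your-genie-bucket
```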
At this point, you must use genie-k8s to train a model:
```
./train.sh --experiment $release --dataset $dataset --model $model --task almond_dialogue_nlu -- $flags
```
Notice the `--` separating the model name from the hyperparameter flags. Look at the tracking spreadsheet for the current set of flags to use for the best model in each release.
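As a concrete sketch, an invocation might look like the following. The dataset name, model name, and the genienlp flag after `--` are placeholders, not recommended settings — take the real flags from the spreadsheet:

```
# hypothetical example invocation
./train.sh --experiment main --dataset my-dataset-v1 --model myuser/1 \
  --task almond_dialogue_nlu -- --train_iterations 80000
```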
You can evaluate a single model with:
```
make release=$release model=$model eval_set={dev|test} evaluate
```
For the dev set, the model is evaluated on the data contained in `eval/$release/dev/annotated.txt` (multi-device dialogues) and in `$release/*/eval/dev/annotated.txt` (single-device dialogues).
If genie-k8s is configured correctly, the model will be downloaded automatically. If not, the model must be downloaded manually and placed in `eval/$release/models/$model`.
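For example, if your trained models live in an S3 bucket (the bucket path below is hypothetical — use wherever your training run stored its output), the manual step might look like:

```
# place the trained model where the evaluation expects it
mkdir -p eval/main/models/myuser/1
aws s3 sync s3://my-bucket/models/myuser/1/ eval/main/models/myuser/1/
```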
The output of the command is a CSV line with:
- evaluation set
- number of evaluation dialogues
- number of evaluation turns
- % of completely correct dialogues (exact match, slot only)
- % of correct first turns (exact match, slot only)
- turn-by-turn accuracy (exact match, slot only)
- accuracy up to first error (exact match, slot only)
- average turn at which the first error occurs (exact match, slot only)
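For illustration only, such a line might look like this — the numbers here are invented to show the column layout, not real results:

```
dev,120,450,65.0,82.5,74.3,70.1,3.2
```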
You can also keep multiple models to evaluate by setting `$release_eval_{train|dev}_models` in `config.mk`. For example:
```
universe_eval_dev_models += gcampax/1
```
If you do that, you can use `make evaluate-all` to evaluate all models at once.