- additional experimental conditions:
- discourse markers / discourse + laughter
- freezing the utterance encoder
- in-domain pre-training for BERT
- GloVe aggregation for utterances (e.g. mean pooling; see the sketch after this list)
- BiLSTM
- CNN / average pool?
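
A minimal sketch of the GloVe aggregation idea, assuming a pre-loaded `glove` dict mapping tokens to vectors; the function name and zero-vector fallback are placeholders, not the actual implementation:

```python
import numpy as np

def encode_utterance(tokens, glove, dim=300):
    """Average-pool GloVe vectors to get a fixed-size utterance embedding.

    Tokens missing from the GloVe vocabulary are skipped; an all-OOV
    utterance falls back to a zero vector.
    """
    vecs = [glove[t] for t in tokens if t in glove]
    if not vecs:
        return np.zeros(dim)
    return np.mean(vecs, axis=0)
```
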
- methodological improvements:
- use the customized BERT vocab / WordPiece tokenization for the baseline models as well as for BERT
- additional corpora:
- improve reporting and analysis:
- macro F1 / macro precision? See Guillou et al., 2016 (thanks, Sharid!); a metrics sketch (including the majority-class baseline) follows this list
- majority class baseline
- time to train / number of parameters / task-trained parameters
- laughter impact
- story for laughter in each of the DAs
- which pairs of DAs laughter is most helpful at distinguishing (i.e. where in the confusion matrix is the biggest decrease from NL -> L?)
- counts for DAs (total, with/wo laughs)
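
A quick sketch of the extra reporting, using scikit-learn; the label lists below are placeholders for the DAR model's actual outputs:

```python
from collections import Counter

from sklearn.metrics import f1_score, precision_score

# Placeholder label lists; in practice these come from the trained DAR model.
y_train = ["sd", "sd", "b", "qy", "sd"]
y_true = ["sd", "b", "qy", "sd"]
y_pred = ["sd", "sd", "qy", "b"]

# Macro-averaged scores weight every DA class equally, so rare DAs count as much as frequent ones.
macro_f1 = f1_score(y_true, y_pred, average="macro")
macro_precision = precision_score(y_true, y_pred, average="macro")

# Majority-class baseline: always predict the most frequent DA from the training set.
majority_label, _ = Counter(y_train).most_common(1)[0]
baseline_f1 = f1_score(y_true, [majority_label] * len(y_true), average="macro")
print(macro_f1, macro_precision, baseline_f1)
```
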
- not super exciting but maybe we should try:
- DAR model hyperparameter tuning (hidden_size, n_layers, dropout, use_lstm); a tuning-loop sketch follows this list
- play with learning rate
- use the BERT Adam optimiser (implements a warm-up)
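
A hedged sketch of what the tuning loop plus warm-up optimiser could look like; the search space, the GRU/LSTM stand-in for the DAR model, and the pytorch_pretrained_bert import are assumptions rather than the actual training code:

```python
from itertools import product

import torch
from pytorch_pretrained_bert.optimization import BertAdam  # assumed library/version

# Hypothetical search space for the DAR model hyperparameters.
grid = {
    "hidden_size": [128, 256],
    "n_layers": [1, 2],
    "dropout": [0.1, 0.3],
    "use_lstm": [True, False],
    "lr": [1e-4, 1e-5],
}

for hidden_size, n_layers, dropout, use_lstm, lr in product(*grid.values()):
    # Stand-in for the real DAR model; only the recurrent layer is shown.
    rnn_cls = torch.nn.LSTM if use_lstm else torch.nn.GRU
    model = rnn_cls(input_size=768, hidden_size=hidden_size,
                    num_layers=n_layers,
                    dropout=dropout if n_layers > 1 else 0.0,
                    batch_first=True)
    # BertAdam applies a linear warm-up over the first fraction (here 10%) of t_total steps.
    optimizer = BertAdam(model.parameters(), lr=lr, warmup=0.1, t_total=1000)
    # ... train, evaluate on the dev set, keep the best configuration ...
```
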
- probably future work:
- probing tasks of the hidden layer
- predict dialogue end (or turns to end)
- predict turn change
- dialogue model pre-training
- instead of training the dialogue model to predict DAs directly, predict the encoder representation of the next utterance (unsupervised); see the rough sketch after this list
- test/probe by guessing DAs (or other discourse properties) with an additional linear layer
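
A rough sketch of the unsupervised dialogue-model pre-training idea plus the linear probe; module choices, dimensions, and the tag-set size are assumptions:

```python
import torch
import torch.nn as nn

ENC_DIM, HID_DIM, N_DA_TAGS = 768, 256, 42  # placeholder sizes

# Dialogue model: consumes a sequence of utterance encodings and predicts
# the encoder representation of the *next* utterance (no DA labels needed).
dialogue_model = nn.GRU(ENC_DIM, HID_DIM, batch_first=True)
next_utt_head = nn.Linear(HID_DIM, ENC_DIM)

def pretraining_loss(utt_encodings):
    """utt_encodings: (batch, n_utts, ENC_DIM) from the utterance encoder."""
    hidden, _ = dialogue_model(utt_encodings[:, :-1])  # predict from the prefix
    predicted = next_utt_head(hidden)                  # (batch, n_utts - 1, ENC_DIM)
    target = utt_encodings[:, 1:]                      # the actual next encodings
    return nn.functional.mse_loss(predicted, target)

# Probing: freeze the dialogue model and train only a linear layer to guess DAs
# (or other discourse properties) from its hidden states.
probe = nn.Linear(HID_DIM, N_DA_TAGS)
```
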
| # | encoder model | pre-training | additional pre-training | task corpus | fine-tune encoder |
|---|---|---|---|---|---|
| 1 | LSTM | GloVe | No | AMI-DA/AMI-DA-NL/SWDA/SWDA-NL | Yes |
| 2 | BERT | Yes | No | AMI-DA/AMI-DA-NL/SWDA/SWDA-NL | Yes |
| 3 | BERT | Yes | No | AMI-DA/SWDA | Yes |
| 4 | BERT | No | No | AMI-DA/SWDA | Yes |
| 5 | BERT | Yes | AMI/SWBD (token/utterance/both) | | |
| 6 | BERT | Yes | No | AMI-DA/SWDA | No |
| 7 | BERT | Yes | Ubuntu | | |
- baseline LSTM model (1) vs. pre-trained BERT (2)
- pre-trained BERT (2) vs. randomly-initialized BERT (4) vs. BERT w/ additional pre-training (5)
- Analysis:
- Compare performance on SWDA & AMI-DA of model w/ highest val. accuracy after 20 epochs
- Does randomly-initialized BERT catch up to pre-trained BERT in performance? (If so, maybe catastrophic forgetting is happening.)
- BERT fine-tuned on SWDA/AMI-DA (2.1/2.3) vs. BERT fine-tuned on SWDA-NL/AMI-DA-NL (2.2/2.4)
- LSTM trained on SWDA/AMI-DA (1.1/1.3) vs. LSTM trained on SWDA-NL/AMI-DA-NL (1.2/1.4)
- Analysis:
- Compare the performance of models trained with and without laughter. If the laughter-trained models do better, they must be making use of the laughter.
- Compare the increase (assuming there is one) in performance for LSTM vs. BERT
- DA-specific performance. Which DAs does laughter help disambiguate? (See the confusion-matrix sketch after this list.)
- Performance difference on utterances with/following/preceding laughter
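
A sketch of how the NL -> L confusion-matrix comparison could be computed with scikit-learn; the label lists are placeholders for the real predictions from the no-laughter and laughter runs:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Placeholder predictions; in practice these come from the NL- and L-trained models.
y_true    = ["sd", "b", "qy", "sd", "ba"]
y_pred_nl = ["sd", "sd", "qy", "ba", "ba"]
y_pred_l  = ["sd", "b", "qy", "sd", "ba"]

labels = sorted(set(y_true))  # fixed DA label order for both matrices
cm_nl = confusion_matrix(y_true, y_pred_nl, labels=labels, normalize="true")
cm_l  = confusion_matrix(y_true, y_pred_l,  labels=labels, normalize="true")

# Off-diagonal cells that shrink from NL -> L are DA pairs that laughter helps distinguish.
delta = cm_nl - cm_l
np.fill_diagonal(delta, 0)
i, j = np.unravel_index(np.argmax(delta), delta.shape)
print(f"Biggest drop in confusion: true {labels[i]} predicted as {labels[j]}")
```
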
- BERT fine-tuned on SWDA
- Analysis: Compare the model trained with laughter on the test set with/without laughter. Is this better/worse than comparing with the model that was also trained in the no-laughter condition? Not sure...
- BERT with additional pre-training (masked token/next utterance/both) and frozen during fine-tuning (5) vs. BERT with no additional pre-training (frozen/not frozen) (2, 6); a masked-token pre-training sketch follows the analysis below
- Analysis
- Compare performance of 5 and 2
- How long does it take 6 to catch up to 5? (if at all)
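
A hedged sketch of the masked-token additional pre-training step in (5), using the pytorch_pretrained_bert BertForMaskedLM interface (the library/version, masking policy, and example utterance are simplified assumptions, not the actual pre-training script):

```python
import random

import torch
from pytorch_pretrained_bert import BertForMaskedLM, BertTokenizer  # assumed library/version

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.train()

def masked_lm_batch(utterance):
    """Mask ~15% of tokens in one utterance and return (input_ids, labels)."""
    tokens = ["[CLS]"] + tokenizer.tokenize(utterance) + ["[SEP]"]
    ids = tokenizer.convert_tokens_to_ids(tokens)
    labels = [-1] * len(ids)  # -1 positions are ignored by the MLM loss (assumed for this version)
    for i in range(1, len(ids) - 1):
        if random.random() < 0.15:
            labels[i] = ids[i]
            ids[i] = tokenizer.vocab["[MASK]"]
    return torch.tensor([ids]), torch.tensor([labels])

# One illustrative step on an AMI/SWBD-style utterance.
input_ids, labels = masked_lm_batch("okay so shall we move on to the next agenda item")
loss = model(input_ids, masked_lm_labels=labels)
loss.backward()
```
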
- BERT pre-trained on one domain (AMI/SWBD) and fine-tuned on the other (AMI-DA/SWDA)
- BERT pre-trained on a big dialogue corpus like Ubuntu
- Analysis
- Compare performance of 5.1-3 vs. 5.4-6 on AMI, and vice versa on SWDA.
- Compare performance of 7 to 5 and 2. (Is Ubuntu pre-training as good as in-genre pre-training? Is it better than no dialogue pre-training at all?)