GU-CLASP/DistributionalDiscourse

To Do

  • additional experimental conditions:
    • discourse markers / discourse + laughter
    • freezing the utterance encoder (see the optimiser sketch after this list)
    • in-domain pre-training for BERT
    • GloVe aggregation for utterances (see the encoder sketch after this list)
      • BiLSTM
      • CNN / average pool?
  • methodological improvements:
    • use customized BERT vocab/word-piece tokenization for baseline models as well as BERT
  • additional corpora:
    • AMI -- has dialogue acts
    • SBCSAE -- normalize laughter & use for pre-training
  • improve reporting and analysis (see the metrics sketch after this list):
    • macro F1 / macro precision? See: Guillou et al., 2016 (thanks, Sharid!)
    • majority class baseline
    • time to train / number of parameters / task-trained parameters
    • laughter impact
      • story for laughter in each of the DAs
      • which pairs of DAs does laughter help distinguish most (i.e. where in the confusion matrix is the biggest decrease from NL -> L?)
      • counts for DAs (total, with/wo laughs)
  • not super exciting but maybe we should try:
    • DAR model hyperparameter tuning (hidden_size, n_layers, dropout, use_lstm)
    • play with learning rate
    • use the BERT Adam optimiser (implements a warm-up; see the optimiser sketch after this list)
  • probably future work:
    • probing tasks of the hidden layer
      • predict dialogue end (or turns to end)
      • predict turn change
    • dialogue model pre-training
      • instead of training the dialogue model to predict DAs directly, predict the encoder representation of the next utterance (unsupervised; see the sketch after this list)
      • test/probe by guessing DAs (or other discourse properties) with an additional linear layer
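
A minimal sketch of the GloVe average-pool aggregation option, assuming the vectors are loaded into an nn.Embedding; the module name and interface are illustrative rather than the repo's actual encoder, and a BiLSTM or CNN variant would only change the forward pass:

```python
import torch
import torch.nn as nn

class GloveAvgEncoder(nn.Module):
    """Average-pool GloVe embeddings into one fixed-size utterance vector."""

    def __init__(self, glove_weights, freeze=True):
        super().__init__()
        # glove_weights: FloatTensor (vocab_size, emb_dim) of pre-trained GloVe vectors
        self.embedding = nn.Embedding.from_pretrained(glove_weights, freeze=freeze)

    def forward(self, utt_ids, pad_id=0):
        # utt_ids: LongTensor (batch, max_utt_len), padded with pad_id
        mask = (utt_ids != pad_id).unsqueeze(-1).float()   # (batch, len, 1)
        embs = self.embedding(utt_ids) * mask              # zero out padding positions
        # mean over the non-pad tokens of each utterance
        return embs.sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)
```

Swapping the mean for a BiLSTM (concatenating the final hidden states) or a 1-D CNN with average pooling keeps the same interface: the DAR model only needs one fixed-size vector per utterance.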
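
For the reporting items above, a scikit-learn-based sketch of macro precision/recall/F1, a majority-class baseline, and the NL -> L confusion-matrix comparison; `gold`, `preds_l`, `preds_nl`, and `da_labels` are placeholders for the repo's own prediction dumps:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

def report(gold, preds_l, preds_nl, da_labels):
    # gold: list of gold DA tags; preds_l / preds_nl: predictions from the
    # models trained with and without laughter; da_labels: the DA tag set.
    p, r, f1, _ = precision_recall_fscore_support(gold, preds_l, average="macro")
    print(f"macro P={p:.3f} R={r:.3f} F1={f1:.3f}")

    # majority-class baseline for reference
    majority = max(set(gold), key=gold.count)
    print("majority-class acc:", sum(g == majority for g in gold) / len(gold))

    # Which confusion shrinks most when laughter is available (NL -> L)?
    cm_l = confusion_matrix(gold, preds_l, labels=da_labels)
    cm_nl = confusion_matrix(gold, preds_nl, labels=da_labels)
    diff = cm_nl - cm_l
    np.fill_diagonal(diff, 0)                      # ignore correct predictions
    i, j = np.unravel_index(diff.argmax(), diff.shape)
    print(f"biggest NL->L decrease: {da_labels[i]} confused with {da_labels[j]} "
          f"({diff[i, j]} fewer errors)")
```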
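
A sketch of freezing the utterance encoder and switching to the BERT Adam optimiser, assuming the pytorch-pretrained-bert package (which provides BertAdam with linear warm-up); `encoder` and `dar_model` stand in for the repo's modules, and a different BERT port would need its own warm-up schedule:

```python
from pytorch_pretrained_bert.optimization import BertAdam

def configure_optimizer(encoder, dar_model, lr, n_train_steps, freeze_encoder=False):
    if freeze_encoder:
        for p in encoder.parameters():
            p.requires_grad = False   # encoder weights stay fixed during fine-tuning

    params = [p for module in (encoder, dar_model)
              for p in module.parameters() if p.requires_grad]
    # warmup=0.1: linear warm-up over the first 10% of updates, then linear decay
    return BertAdam(params, lr=lr, warmup=0.1, t_total=n_train_steps)
```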
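
A sketch of the unsupervised dialogue-model pre-training idea: train the dialogue model's hidden state to predict the encoder representation of the next utterance, then probe with a linear layer. The loss choice (MSE) and all names are illustrative, not a settled design:

```python
import torch.nn as nn

class NextUttPredictionLoss(nn.Module):
    """Train the dialogue model to predict the next utterance's encoding."""

    def __init__(self, hidden_size, encoding_size):
        super().__init__()
        self.project = nn.Linear(hidden_size, encoding_size)
        self.loss_fn = nn.MSELoss()

    def forward(self, dialogue_hidden, utt_encodings):
        # dialogue_hidden: (batch, n_utts, hidden_size), DAR hidden state after each utterance
        # utt_encodings:   (batch, n_utts, encoding_size), utterance encoder outputs
        pred = self.project(dialogue_hidden[:, :-1])   # predict from utterances 1..n-1
        target = utt_encodings[:, 1:].detach()         # targets: encodings of utterances 2..n
        return self.loss_fn(pred, target)
```

To probe, freeze the pre-trained dialogue model and fit a single nn.Linear(hidden_size, n_da_tags) on its hidden states to guess DAs (or other discourse properties).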

Experiments

Model zoo

# | encoder model | pre-training | additional pre-training         | task corpus                   | fine-tune encoder
1 | LSTM          | GloVe        | No                              | AMI-DA/AMI-DA-NL/SWDA/SWDA-NL | Yes
2 | BERT          | Yes          | No                              | AMI-DA/AMI-DA-NL/SWDA/SWDA-NL | Yes
3 | BERT          | Yes          | No                              | AMI-DA/SWDA                   | Yes
4 | BERT          | No           | No                              | AMI-DA/SWDA                   | Yes
5 | BERT          | Yes          | AMI/SWBD (token/utterance/both) |                               |
6 | BERT          | Yes          | No                              | AMI-DA/SWDA                   | No
7 | BERT          | Yes          | Ubuntu                          |                               |

Questions to answer

Is pre-trained BERT useful for dialogue at all?

  • baseline LSTM model (1) vs. pre-trained BERT (2)
  • pre-trained BERT (2) vs. randomly-initialized BERT (4) vs. BERT w/ additional pre-training (5)
  • Analysis:
    • Compare performance on SWDA & AMI-DA of model w/ highest val. accuracy after 20 epochs
    • Does randomly-initialized BERT catch up to pre-trained BERT in performance? (If so, catastrophic forgetting may be happening.)

Does the model make use of dialogue-specific features?

  • BERT fine-tuned on SWDA/AMI-DA (2.1/2.3) vs. BERT fine-tuned on SWDA-NL/AMI-DA-NL (2.2/2.4)

  • LSTM trained on SWDA/AMI-DA (1.1/1.3) vs. LSTM trained on SWDA-NL/AMI-DA-NL (1.2/1.4)

  • Analysis:

    • Compare the performance of models trained with and without laughter. If the laughter-trained models do better, they must be making use of the laughter.
    • Compare increase (assuming there is one) in performance for LSTM vs. BERT
    • DA-specific performance. Which DAs does laughter help disambiguate?
    • Performance difference on utterances with/following/preceding laughter
  • BERT fine-tuned on SWDA

  • Analysis: Compare the model trained with laughter, evaluated on the test set with vs. without laughter. Is this better/worse than comparing with the model that was also trained in the no-laughter condition? Not sure...

Does additional in-domain pre-training help?

  • BERT with additional pre-training (masked token/next utterance/both) and frozen during fine-tuning (5) vs. BERT with no additional pre-training (frozen/not-frozen) (2,6)
  • Analysis
    • Compare performance of 5 and 2
    • How long does it take 6 to catch up to 5? (if at all)

(bonus) How well do dialogue-tuned models transfer to other dialogue settings?

  • BERT pre-trained on one domain (AMI/SWBD) and fine-tuned on the other (AMI-DA/SWDA)
  • BERT pre-trained on a big dialogue corpus like Ubuntu
  • Analysis
    • Compare performance of 5.1-3 vs. 5.4-6 on AMI and vice versa on SWDA.
    • Compare performance of 7 to 5 and 2. (Is Ubuntu pre-training as good as in-genre pre-training? Is it better than no dialogue pre-training at all?)
