multilayer support, dataset improvements and more #38
base: master
Conversation
Awesome work as always! I'm currently running w/
Results after
The val ppl increased after each epoch (started from 121). Eval:
I'm not sure if it's the eval code that is broken or the model. I've had similar issues too when I switched to SeqLSTM (in the seqlstm branch). Will try re-training w/ a single layer.
Hi, I think the problem is the small dataset you are using, only 50k examples.
Even if it overfits the data, don't you find it suspect that the eval always returns the same exact output? On master, when evaluating, I get a different output for every input even w/ small datasets. But w/ this one change I got similar behaviour (always the same output). So I'm suspecting it's SeqLSTM. I'm re-running the training w/ the full dataset and 15k vocab. I'll post results as soon as I have a couple of epochs done.
You are right, it seems like even when the model should memorize the dataset, it still gives the same response every time.
I am also getting the same responses when training with the following params:

`th train.lua --cuda --hiddenSize 1000 --numLayers 2 --dataset 0 --batchSize 5`

Ran one more experiment with only one layer (50 epochs) and am getting the same response :(

`th train.lua --cuda --hiddenSize 1000 --numLayers 1 --dataset 0 --batchSize 5`
Using these settings (takes less than an hour to start seeing results). It does, however, seem like it takes more time to establish communication between the encoder and the decoder, and the model works mostly as a language model in the first epochs.
Hi @macournoyer, @vikram-gupta, I added a commit that turns off SeqLSTM by default (using LSTM instead) and allows switching it back on with the --seqLstm flag.
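For context, a toggle like that could look roughly like the sketch below; the function and option names are hypothetical and not necessarily what the commit actually does:

```lua
-- Sketch only: switch between nn.SeqLSTM and a Sequencer-wrapped LSTM via a flag.
require 'rnn'

local cmd = torch.CmdLine()
cmd:option('--seqLstm', false, 'use nn.SeqLSTM instead of a Sequencer-wrapped LSTM')
local options = cmd:parse(arg or {})

-- Hypothetical helper; the real model-building code in the PR may differ.
local function buildRecurrentLayer(inputSize, hiddenSize)
  if options.seqLstm then
    -- nn.SeqLSTM processes a whole seqLen x batchSize x inputSize tensor at once
    return nn.SeqLSTM(inputSize, hiddenSize)
  else
    -- nn.FastLSTM steps one timestep at a time; nn.Sequencer feeds it a table of steps
    return nn.Sequencer(nn.FastLSTM(inputSize, hiddenSize))
  end
end

print(buildRecurrentLayer(100, 100))
```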
@chenb67 thx for the fix and the paper! Will check it out. I'm re-training w/ this and will see.
Thanks @chenb67. I trained the models with the following params. Note that I used the --seqLstm flag because the code was crashing during evaluation, as we are converting the input to a table.

`th train.lua --batchSize 64 --hiddenSize 1000 --cuda --numLayers 1 --vocabSize 10000 --dropout 0 --weightDecay 0 --earlyStopOnTrain --dataset 100000 --seqLstm`

The results have improved, but we still have something more to do before they are as good as @macournoyer reported initially. It's surprising that even after nullifying almost all of the changes, the results are still not the same as before. @macournoyer any clues?

you> how are you?

After 50 epochs, these were the stats - the error on training kept going down with each epoch.
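For what it's worth, that crash is consistent with the input-format difference between the two code paths: nn.SeqLSTM expects a single seqLen x batchSize x inputSize tensor, whereas a Sequencer-wrapped LSTM expects a Lua table with one tensor per timestep. A minimal illustration of the difference (module choices and sizes are illustrative, not taken from this PR):

```lua
-- Illustration only (not the PR's eval code): the two paths expect different input formats.
require 'rnn'

local seqLen, batchSize, size = 5, 2, 4

-- nn.SeqLSTM takes one dense tensor: seqLen x batchSize x inputSize
local seqLstm = nn.SeqLSTM(size, size)
local tensorInput = torch.randn(seqLen, batchSize, size)
print(seqLstm:forward(tensorInput):size())   -- seqLen x batchSize x size tensor

-- A Sequencer-wrapped LSTM takes a Lua table with one batchSize x inputSize tensor per step
local stepLstm = nn.Sequencer(nn.FastLSTM(size, size))
local tableInput = nn.SplitTable(1):forward(tensorInput)
print(#stepLstm:forward(tableInput))         -- one output entry per timestep
```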
Something definitely happened in this branch or recently on master that decreased the (subjective) quality of the responses. It might be in the recent changes I pushed on master; I'm looking into it...
Hi,
The PR is pretty big, and when I rebased I encountered some conflicts.
I decided to comment out the LR decay code since Adam is supposed to handle it, so it's left to your consideration when merging.
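For reference, dropping a manual decay schedule in favour of Adam amounts to letting optim.adam keep its own per-parameter state; the snippet below is a self-contained sketch with a tiny stand-in model and an assumed learning rate, not the PR's actual training loop:

```lua
-- Sketch: Adam adapts per-parameter step sizes, so no explicit LR decay is applied.
require 'nn'
require 'optim'

-- Tiny stand-in model and data, just to make the loop runnable.
local model = nn.Linear(10, 1)
local criterion = nn.MSECriterion()
local params, gradParams = model:getParameters()
local input, target = torch.randn(8, 10), torch.randn(8, 1)

local adamConfig = { learningRate = 1e-3 }  -- assumed value, not from the PR
local adamState = {}

-- Closure returning loss and gradients for the current batch.
local function feval(p)
  if p ~= params then params:copy(p) end
  gradParams:zero()
  local output = model:forward(input)
  local loss = criterion:forward(output, target)
  model:backward(input, criterion:backward(output, target))
  return loss, gradParams
end

for epoch = 1, 5 do
  optim.adam(feval, params, adamConfig, adamState)
  -- note: no "adamConfig.learningRate = adamConfig.learningRate * decay" step here
end
```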
Some of the new features:
I also fixed some major bugs in the perplexity calculation (with the help of @vikram-gupta), as well as some bugs with memory efficiency.
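For anyone comparing numbers: perplexity is the exponential of the average per-token negative log-likelihood, and a typical source of bugs is averaging over the wrong count (e.g. sequences or padding tokens instead of real tokens). A minimal sketch, with names and values purely illustrative:

```lua
-- Sketch: perplexity = exp(total NLL / number of real tokens).
-- nllPerToken is assumed to hold one negative log-likelihood per non-padding token.
local function perplexity(nllPerToken)
  local total = 0
  for _, nll in ipairs(nllPerToken) do
    total = total + nll
  end
  -- divide by the token count, not the sequence count, before exponentiating
  return math.exp(total / #nllPerToken)
end

print(perplexity({2.3, 1.7, 4.0, 0.9}))  -- example values only
```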