multilayer support, dataset improvements and more #38

Open · wants to merge 43 commits into master

Conversation

@chenb67 (Contributor) commented Jul 3, 2016

Hi,

The PR is pretty big, and I hit some conflicts when I rebased.
I decided to comment out the LR-decay code since Adam is supposed to handle it, so take that into consideration when merging.

Some of the new features:

  • multi-layer LSTM (a rough sketch is included below)
  • enhancements to dataset handling: set the vocab size, shuffle before every epoch, hold out a validation set, load from CSV
  • switch to SeqLSTM, which allows roughly doubling the network size (I'm training 4 layers of 1024 units on each side with a 10k vocab on a single 4 GB GPU)
  • dropout
  • L2 regularization / weight decay
  • early stopping on validation or training loss

I also fixed some major bugs in the perplexity calculation (with the help of @vikram-gupta) and some memory-efficiency bugs.
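
For reference, here is a minimal sketch (assuming the Element-Research rnn package; the layer count, sizes and dropout rate are illustrative, not this PR's actual code) of what stacking SeqLSTM layers with dropout roughly looks like:

    -- minimal sketch: stack SeqLSTM layers with dropout in between;
    -- assumes the word embeddings already project to hiddenSize
    require 'rnn'

    local hiddenSize, numLayers, dropout = 512, 4, 0.2

    local encoder = nn.Sequential()
    for i = 1, numLayers do
      -- SeqLSTM consumes a whole seqlen x batch x hiddenSize tensor at once
      encoder:add(nn.SeqLSTM(hiddenSize, hiddenSize))
      if dropout > 0 then
        encoder:add(nn.Dropout(dropout))  -- dropout between stacked layers
      end
    end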

chenb67 added 30 commits June 15, 2016 10:14
@macournoyer (Owner)

Awesome work as always! I'm currently running w/ th train.lua --cuda --hiddenSize 1000 --numLayers 2 --dataset 30000 --vocabSize 10000 and will update with results.

@macournoyer (Owner) commented Jul 4, 2016

Results after th train.lua --cuda --hiddenSize 1000 --numLayers 2 --dataset 30000 --vocabSize 10000 for 50 epochs:

Epoch stats:
  Errors: min= 1.298625054104
          max= 3.8777817894112
       median= 2.3050528590272
         mean= 2.3216041151624
          std= 0.32521225826956
          ppl= 10.192010358467
     val loss= 5.8125658944054
      val ppl= 334.47625630575

The val ppl increased after each epoch (it started at 121).

Eval:

you> hi
neuralconvo> I'm not sure you're not going to be a little.
you> what's your name?
neuralconvo> I'm not sure you're not going to be a little.
you> how old are you?
neuralconvo> I'm not sure you're not going to be a little.

I'm not sure if it's the eval code that is broken or the model. I've had similar issues when I switched to SeqLSTM (in the seqlstm branch).

Will try re-training w/ a single layer.

@chenb67 (Contributor, Author) commented Jul 5, 2016

Hi, I think the problem is the small dataset you are using, only 50k examples.
Try the full set; that way I get down to a validation ppl of 30.
The answers will tend to be generic when early stopping on the validation set; you can instead try to overfit the training data, like before, with the flag --earlyStopOnTrain (see the sketch below).
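
For clarity, here is an illustrative early-stopping loop (the helper functions and the patience value are made-up stand-ins, not this PR's actual code) showing what switching the monitored loss between training and validation amounts to:

    -- illustrative sketch of early stopping; trainOneEpoch/evaluateValidation
    -- are stubs standing in for the real training and validation passes
    local function trainOneEpoch() return math.random() end      -- stub: mean training loss
    local function evaluateValidation() return math.random() end -- stub: validation loss

    local maxEpochs, patience = 50, 5
    local earlyStopOnTrain = false  -- which loss to monitor
    local bestLoss, badEpochs = math.huge, 0

    for epoch = 1, maxEpochs do
      local trainErr = trainOneEpoch()
      local validLoss = evaluateValidation()
      local monitored = earlyStopOnTrain and trainErr or validLoss

      if monitored < bestLoss then
        bestLoss, badEpochs = monitored, 0       -- improved: reset the counter
      else
        badEpochs = badEpochs + 1
        if badEpochs >= patience then break end  -- stop after `patience` bad epochs
      end
    end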

@macournoyer (Owner)

Even if it overfits the data, don't you find it suspicious that the eval always returns the exact same output?

On master, when evaluating, I get a different output for every input, even w/ small datasets. But w/ this change I got similar behaviour (always the same output). So I suspect it's SeqLSTM.

I'm re-running the training w/ the full dataset and a 15k vocab. I'll post results as soon as I've got a couple of epochs done.

@chenb67 (Contributor, Author) commented Jul 6, 2016

You are right; it seems that even when the model should memorize the dataset, it still gives the same response every time.
I'll investigate further and update you soon.

@vikram-gupta commented Jul 7, 2016

I am also getting the same responses when training with the following params:

th train.lua --cuda --hiddenSize 1000 --numLayers 2 --dataset 0 --batchSize 5

Ran one more experiment with only one layer (50 epochs) and am getting the same response :(

th train.lua --cuda --hiddenSize 1000 --numLayers 1 --dataset 0 --batchSize 5

@chenb67 (Contributor, Author) commented Jul 7, 2016

Using these settings (it takes less than an hour to start seeing results):
th train.lua --batchSize 128 --hiddenSize 512 --cuda --numLayers 1 --vocabSize 10000 --dropout 0 --weightDecay 0 --earlyStopOnTrain --dataset 100000
I managed to overfit a model that responds differently to different inputs.

It does, however, seem to take more time for the encoder and the decoder to establish communication, and the model works mostly as a language model in the first epochs.

@chenb67 (Contributor, Author) commented Jul 12, 2016

Hi @macournoyer, @vikram-gupta, I added a commit that turns off SeqLSTM by default (it uses LSTM instead) and allows switching it back on with the flag --seqLstm.
My experiments show similar results using LSTM/SeqLSTM with the same number of units.
I think the lack of variety in the answers comes from the regularization we introduced (dropout + weight decay). Also, some papers acknowledge this issue with these kinds of models; see http://arxiv.org/abs/1510.03055
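
As a rough sketch (assuming the Element-Research rnn package; buildRecurrentLayer is a hypothetical helper name, not this PR's actual code), the --seqLstm toggle roughly amounts to choosing between two constructions of the recurrent layer, something like:

    -- hedged sketch of switching between the two implementations
    require 'rnn'

    local function buildRecurrentLayer(inputSize, hiddenSize, useSeqLSTM)
      if useSeqLSTM then
        -- SeqLSTM processes a full seqlen x batch x size tensor in one call,
        -- which is faster and more memory efficient
        return nn.SeqLSTM(inputSize, hiddenSize)
      else
        -- step-wise LSTM wrapped in a Sequencer, which iterates over a
        -- table of time steps
        return nn.Sequencer(nn.LSTM(inputSize, hiddenSize))
      end
    end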

@macournoyer (Owner)

@chenb67 thx for the fix and the paper! Will check it out.

I'm re-training w/ this and will see.

@vikram-gupta commented Jul 13, 2016

Thanks @chenb67

I trained the models with the following params. Note that I used the --seqLstm flag because the code was crashing during evaluation, as we are converting the input to a table.

th train.lua --batchSize 64 --hiddenSize 1000 --cuda --numLayers 1 --vocabSize 10000 --dropout 0 --weightDecay 0 --earlyStopOnTrain --dataset 100000 --seqLstm

The results have improved, but we still have something more to do before they are as good as what @macournoyer reported initially. It's surprising that even after nullifying almost all of the changes, the results are still not the same as before. @macournoyer any clues?

you> how are you?
neuralconvo> Oh, you met him...
you> where are you?
neuralconvo> In your place?
you> what is your name?
neuralconvo> You're talking about the precogs...
you> how old are you?
neuralconvo> You're talking about the precogs...
you> where do you live?
neuralconvo> I'm just an eye out.
you> are you intelligent?
neuralconvo> Yes, sir.
you> are you a bot?
neuralconvo> But don't you remember?
you> are you hungry?
neuralconvo> Oh, you met him...
you> hello
neuralconvo> You're talking about the precogs...

After 50 epochs, these were the stats:

  Errors: min= 0.17394069818649
          max= 0.61486148644254
       median= 0.37594955411701
         mean= 0.37832337311441
          std= 0.07127508704293
          ppl= 1.4598349379268
     val loss= 7.2912249430419
      val ppl= 1467.3670378147

The training error kept going down with each epoch.

@macournoyer (Owner) commented Jul 13, 2016

Something definitely happened in this branch or recently on master that decreased the (subjective) quality of the responses in eval.th.

It might be in the recent changes I pushed on master; I'm looking into it...
