multilayer support, dataset improvements and more #38
base: master
Conversation
Awesome work as always! I'm currently running w/
Results after
The val ppl increased after each epoch (started from 121). Eval:
I'm not sure if it's the eval code that is broken or the model. I've had similar issues too when I switched to SeqLSTM (in the seqlstm branch). Will try re-training w/ a single layer.
Hi, I think the problem is the small dataset you are using, only 50k examples.
Even if it overfits the data, don't you find it suspect that the eval always returns the same exact output? On master, when evaluating, I get a different output for every input even w/ small datasets. But w/ this one change I got similar behaviour (always the same output). So I'm suspecting it's SeqLSTM. I'm re-running the training w/ the full dataset and 15k vocab. I'll post results as soon as I have a couple of epochs done.
You are right, it seems like even when the model should memorize the dataset, it still gives the same response every time.
I am also getting the same responses when training with the following params:

`th train.lua --cuda --hiddenSize 1000 --numLayers 2 --dataset 0 --batchSize 5`

Ran one more experiment with only one layer (50 epochs) and am getting the same response :(

`th train.lua --cuda --hiddenSize 1000 --numLayers 1 --dataset 0 --batchSize 5`
Using these settings (takes less than an hour to start seeing results). It does, however, seem like it takes more time to establish communication between the encoder and the decoder, and the model works mostly as a language model in the first epochs.
Hi @macournoyer, @vikram-gupta, I added a commit that turns off SeqLSTM by default (using LSTM instead) and allows switching it back on with the --seqLstm flag.
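For context, a toggle like that could look roughly like the sketch below; the function and option names are hypothetical and not necessarily what the commit actually does:

```lua
-- Sketch only: switch between nn.SeqLSTM and a Sequencer-wrapped LSTM via a flag.
require 'rnn'

local cmd = torch.CmdLine()
cmd:option('--seqLstm', false, 'use nn.SeqLSTM instead of a Sequencer-wrapped LSTM')
local options = cmd:parse(arg or {})

-- Hypothetical helper; the real model-building code in the PR may differ.
local function buildRecurrentLayer(inputSize, hiddenSize)
  if options.seqLstm then
    -- nn.SeqLSTM processes a whole seqLen x batchSize x inputSize tensor at once
    return nn.SeqLSTM(inputSize, hiddenSize)
  else
    -- nn.FastLSTM steps one timestep at a time; nn.Sequencer feeds it a table of steps
    return nn.Sequencer(nn.FastLSTM(inputSize, hiddenSize))
  end
end

print(buildRecurrentLayer(100, 100))
```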
@chenb67 thx for the fix and the paper! Will check it out. I'm re-training w/ this and will see.
Thanks @chenb67. I trained the models with the following params. Note that I used the --seqLstm flag because the code was crashing during evaluation, as we are converting the input to a table.

`th train.lua --batchSize 64 --hiddenSize 1000 --cuda --numLayers 1 --vocabSize 10000 --dropout 0 --weightDecay 0 --earlyStopOnTrain --dataset 100000 --seqLstm`

The results have improved, but we still have something more to do before they are as good as @macournoyer reported initially. It's surprising that even after nullifying almost all of the changes, the results are still not the same as before. @macournoyer any clues?

you> how are you?

After 50 epochs, these were the stats - the error on training kept going down with each epoch.
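For what it's worth, that crash is consistent with the input-format difference between the two code paths: nn.SeqLSTM expects a single seqLen x batchSize x inputSize tensor, whereas a Sequencer-wrapped LSTM expects a Lua table with one tensor per timestep. A minimal illustration of the difference (module choices and sizes are illustrative, not taken from this PR):

```lua
-- Illustration only (not the PR's eval code): the two paths expect different input formats.
require 'rnn'

local seqLen, batchSize, size = 5, 2, 4

-- nn.SeqLSTM takes one dense tensor: seqLen x batchSize x inputSize
local seqLstm = nn.SeqLSTM(size, size)
local tensorInput = torch.randn(seqLen, batchSize, size)
print(seqLstm:forward(tensorInput):size())   -- seqLen x batchSize x size tensor

-- A Sequencer-wrapped LSTM takes a Lua table with one batchSize x inputSize tensor per step
local stepLstm = nn.Sequencer(nn.FastLSTM(size, size))
local tableInput = nn.SplitTable(1):forward(tensorInput)
print(#stepLstm:forward(tableInput))         -- one output entry per timestep
```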
Something definitely happened in this branch or recently on master that decreased the (subjective) quality of the responses. It might be in the recent changes I pushed on master; I'm looking into it...
Hi,
The PR is pretty big, and when I rebased I encountered some conflicts.
I decided to comment out the LR decay code since Adam is supposed to handle it, so it's left to your consideration when merging.
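For reference, dropping a manual decay schedule in favour of Adam amounts to letting optim.adam keep its own per-parameter state; the snippet below is a self-contained sketch with a tiny stand-in model and an assumed learning rate, not the PR's actual training loop:

```lua
-- Sketch: Adam adapts per-parameter step sizes, so no explicit LR decay is applied.
require 'nn'
require 'optim'

-- Tiny stand-in model and data, just to make the loop runnable.
local model = nn.Linear(10, 1)
local criterion = nn.MSECriterion()
local params, gradParams = model:getParameters()
local input, target = torch.randn(8, 10), torch.randn(8, 1)

local adamConfig = { learningRate = 1e-3 }  -- assumed value, not from the PR
local adamState = {}

-- Closure returning loss and gradients for the current batch.
local function feval(p)
  if p ~= params then params:copy(p) end
  gradParams:zero()
  local output = model:forward(input)
  local loss = criterion:forward(output, target)
  model:backward(input, criterion:backward(output, target))
  return loss, gradParams
end

for epoch = 1, 5 do
  optim.adam(feval, params, adamConfig, adamState)
  -- note: no "adamConfig.learningRate = adamConfig.learningRate * decay" step here
end
```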
Some of the new features:
I also fixed some major bugs in the perplexity calculation (with the help of @vikram-gupta), as well as some bugs with memory efficiency.
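For anyone comparing numbers: perplexity is the exponential of the average per-token negative log-likelihood, and a typical source of bugs is averaging over the wrong count (e.g. sequences or padding tokens instead of real tokens). A minimal sketch, with names and values purely illustrative:

```lua
-- Sketch: perplexity = exp(total NLL / number of real tokens).
-- nllPerToken is assumed to hold one negative log-likelihood per non-padding token.
local function perplexity(nllPerToken)
  local total = 0
  for _, nll in ipairs(nllPerToken) do
    total = total + nll
  end
  -- divide by the token count, not the sequence count, before exponentiating
  return math.exp(total / #nllPerToken)
end

print(perplexity({2.3, 1.7, 4.0, 0.9}))  -- example values only
```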