Problem running language model example code in "dp" package with cuda option #162
Comments
Can you try running the script after updating dp, rnn, dpnn, nn, cunn?
Thanks for the comment. I tried updating the packages above and ran the script again, but I still get the same problem...
@kkjh0723 Yeah, you can update with luarocks install [package], or with cd [package] followed by luarocks make rocks/[package.rockspec]. I just tried running that exact command:
I get no NaNs.
@nicholas-leonard thanks again!! Then is it because of a GPU difference, or the driver? Do you have any clue?
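One way to check whether a GPU or driver difference could be in play is to print the device properties from cutorch on each machine and compare them (a minimal sketch, assuming cutorch is installed; driver and runtime versions can be compared separately with nvidia-smi):

require 'cutorch'
-- Print the properties (name, compute capability, total memory, ...)
-- of the device cutorch is currently using, for comparison across machines.
local devid = cutorch.getDevice()
print(cutorch.getDeviceProperties(devid))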
Hi all,
I recently started studying Torch7 and found some example code in the 'dp' package.
At first I tried running languagemodel.lua on the CPU, and it worked. I tried it both without options and with some options (softmaxtree, progress, ...).
Each epoch it prints something like this:
==> epoch # 1 for optimizer :
[============================= 1000000/1000000 ==================>] ETA: 0ms | Step: 0ms
==> example speed = 2685.9343208491 examples/s
[============================= 100000/100000 ====================>] ETA: 0ms | Step: 0ms
jinhyung:1442886446:1:optimizer:loss avgErr 0.027386303028322
jinhyung:1442886446:1:optimizer:perplexity perplexity = 1107.1536726439
jinhyung:1442886446:1:validator:perplexity perplexity = 901.22609355108
jinhyung:1442886446:1:tester:perplexity perplexity = 905.12637877775
Then I ran it with the cuda option. When I run it without the softmaxtree option, it does not run at all.
So I tried it with the softmaxtree option (th languagemodel.lua --progress --cuda --softmaxtree). At first it ran and seemed to work, but the values of loss and perplexity were all 'nan', like below:
{
accUpdate : false
batchNorm : false
batchSize : 256
contextSize : 5
cuda : true
dropout : false
forestGaterSize : {}
hiddenSize : {200}
inputEmbeddingSize : 100
learningRate : 0.1
maxEpoch : 400
maxOutNorm : 2
maxTries : 30
momentum : 0
outputEmbeddingSize : 100
progress : true
schedule : {[250]=0.01,[350]=0.001}
silent : false
small : false
softmaxforest : false
softmaxtree : true
tiny : false
trainEpochSize : 1000000
trainOnly : false
useDevice : 1
validEpochSize : 100000
}
Input to first hidden layer has 500 neurons.
Model :
nn.Sequential {
  [input -> (1) -> (2) -> output]
  (1): nn.ParallelTable {
    input
      |`-> (1): nn.Sequential {
      |      [input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> output]
      |      (1): nn.Dictionary
      |      (2): nn.Collapse
      |      (3): nn.Linear(500 -> 200)
      |      (4): nn.Tanh
      |      (5): nn.Linear(200 -> 100)
      |      (6): nn.Tanh
      |    }
       `-> (2): nn.Convert
       ... -> output
  }
  (2): nn.SoftMaxTree
}
FileLogger: log will be written to /home/jinhyung/save/jinhyung:1442844325:1/log
==> epoch # 1 for optimizer :
[=================== 1000000/1000000 =========>] ETA: 0ms | Step: 0ms
==> example speed = 6141.5748786068 examples/s
[=================== 100000/100000 ===========>] ETA: 0ms | Step: 0ms
jinhyung:1442844325:1:optimizer:loss avgErr nan
jinhyung:1442844325:1:optimizer:perplexity perplexity = nan
jinhyung:1442844325:1:validator:perplexity perplexity = nan
jinhyung:1442844325:1:tester:perplexity perplexity = nan
==> epoch # 2 for optimizer :
[=================== 1000000/1000000 =========>] ETA: 0ms | Step: 0ms
==> example speed = 5911.7152838998 examples/s
[=================== 100000/100000 ===========>] ETA: 0ms | Step: 0ms
jinhyung:1442844325:1:optimizer:loss avgErr nan
jinhyung:1442844325:1:optimizer:perplexity perplexity = nan
jinhyung:1442844325:1:validator:perplexity perplexity = nan
jinhyung:1442844325:1:tester:perplexity perplexity = nan
==> epoch # 3 for optimizer :
[=================== 1000000/1000000 =========>] ETA: 0ms | Step: 0ms
==> example speed = 5907.3460937005 examples/s
[=================== 100000/100000 ===========>] ETA: 0ms | Step: 0ms
jinhyung:1442844325:1:optimizer:loss avgErr nan
jinhyung:1442844325:1:optimizer:perplexity perplexity = nan
jinhyung:1442844325:1:validator:perplexity perplexity = nan
jinhyung:1442844325:1:tester:perplexity perplexity = nan
I'm not sure what's wrong. Is it a CUDA problem, or do I need to install some other packages?
Please advise me on how to solve this problem.
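In case it helps to narrow this down, a minimal check along the following lines could show where the NaNs first appear on the GPU (just a sketch, assuming nn and cunn are installed; hasNaN and the tiny stand-in model are hypothetical, not part of dp or the example script):

require 'nn'
require 'cunn'

-- Hypothetical helper: NaN is the only value not equal to itself,
-- so counting self-inequalities detects NaNs in any tensor.
local function hasNaN(tensor)
   return tensor:ne(tensor):sum() > 0
end

-- Tiny stand-in model with the same Linear/Tanh shapes as in the printout above.
local mlp = nn.Sequential()
   :add(nn.Linear(500, 200))
   :add(nn.Tanh())
   :add(nn.Linear(200, 100))
   :add(nn.Tanh())
   :cuda()

local input = torch.CudaTensor(256, 500):uniform(-1, 1)
local output = mlp:forward(input)
mlp:zeroGradParameters()
mlp:backward(input, torch.CudaTensor(output:size()):fill(1))

local params, gradParams = mlp:getParameters()
print('NaN in output?     ', hasNaN(output))
print('NaN in params?     ', hasNaN(params))
print('NaN in gradParams? ', hasNaN(gradParams))

Running checks like this after each forward/backward step of the real model would show whether the NaNs come from the inputs, the parameters, or the gradients.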