When I run my model in half precision (fp16), the loss function returns NaN. Everything works fine in normal single precision (fp32), so I don't think it is a problem with the learning parameters. The loss is NaN right from the beginning of training.
I am using the SpatialCrossEntropyCriterion, and I also explicitly skip converting the MaxPooling and BatchNormalization layers to cudnn, since those don't work otherwise.
Relevant code:
criterion = cudnn.SpatialCrossEntropyCriterion(classWeights):cudaHalf()

model = createNet()
model = model:cudaHalf()

-- cudnn ignores pooling layers due to compatibility problems with Unpooling
cudnn.convert(model, cudnn, function(module)
   return torch.type(module):find("SpatialMaxPooling") ~= nil          -- compatibility problems
       or torch.type(module):find("SpatialBatchNormalization") ~= nil  -- apparently no cudaHalf implementation
end)

...

-- during training this returns NaN right from the beginning, or sometimes at the second iteration
loss = criterion:forward(outputGpu, labels)
I am wondering whether the cause is a missing CudaHalf implementation for the BatchNormalization module?
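To check that, a small probe along these lines could tell whether a plain nn.SpatialBatchNormalization even accepts CudaHalfTensor input (this is just a sketch; it assumes cutorch/cunn were built with half support, and the feature count and input shape are arbitrary):

require 'cunn'

-- Hypothetical probe: try a forward pass of SpatialBatchNormalization on a half tensor
local bn = nn.SpatialBatchNormalization(16):cudaHalf()
local ok, err = pcall(function()
   return bn:forward(torch.CudaHalfTensor(1, 16, 8, 8):fill(1))
end)
print(ok, err)  -- false plus an error message would point to a missing CudaHalf kernel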
Okay, I figured out that the NaNs were due to the Adam optimisation. The default epsilon of 1e-8 is too small and gets rounded to zero in fp16, as pointed out here. Setting it to 1e-4 fixes the NaN problem, but now the optimisation no longer decreases the loss. Is there a way to solve this while keeping the same learning rate?
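For reference, a minimal sketch of the optimiser call with the larger epsilon (feval, params and the learning rate value are placeholders from my training loop, not part of the original code):

require 'optim'

-- params/gradParams come from model:getParameters(); feval is the usual closure
-- returning the loss and the gradients for the current mini-batch.
local adamConfig = {
   learningRate = 1e-3,  -- unchanged learning rate
   epsilon = 1e-4,       -- the default 1e-8 underflows to zero in fp16 and produces NaNs
}
local adamState = {}

optim.adam(feval, params, adamConfig, adamState)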