Cannot start training on an Intel CPU #19

Open
MadMaxNice opened this issue Jun 21, 2022 · 1 comment

MadMaxNice commented Jun 21, 2022

Hello!
I am looking into training a model on a server with an Intel processor.
On a server with an AMD processor and backend = numpy, everything works without problems and training starts.
On the Intel server with backend = numpy, the following error appears:
`Epoch 1 of 100
Traceback (most recent call last):
File "/home/admin/mmn_env/lib/python3.6/site-packages/PuzzleLib/Containers/Sequential.py", line 189, in updateData
mod(data)
File "/home/admin/mmn_env/lib/python3.6/site-packages/PuzzleLib/Modules/Module.py", line 132, in __call__
self.updateData(data)
File "/home/admin/mmn_env/lib/python3.6/site-packages/PuzzleLib/Modules/BatchNorm1D.py", line 18, in updateData
super().updateData(data)
File "/home/admin/mmn_env/lib/python3.6/site-packages/PuzzleLib/Modules/BatchNormND.py", line 56, in updateData
data, self.scale, self.bias, self.mean, self.var, self.epsilon, factor, False
File "/home/admin/mmn_env/lib/python3.6/site-packages/PuzzleLib/Backend/Dnn.py", line 363, in wrapBatchNormNd
outdata = NumpyDnn.batchNorm2d(data, scale, bias, mean, var, epsilon, test, out=out)
File "/home/admin/mmn_env/lib/python3.6/site-packages/PuzzleLib/CPU/Wrappers/NumpyDnn.py", line 119, in batchNorm2d
assert test
AssertionError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "Train.py", line 156, in <module>
main()
File "Train.py", line 148, in main
args.saveFolder, '{}_{}'.format(args.checkpointName, epoch))
File "Train.py", line 60, in train
out = model(gpuInput)
File "/home/admin/mmn_env/lib/python3.6/site-packages/PuzzleLib/Modules/Module.py", line 132, in __call__
self.updateData(data)
File "/home/admin/mmn_env/lib/python3.6/site-packages/PuzzleLib/Containers/Sequential.py", line 195, in updateData
self.handleError(mod, e)
File "/home/admin/mmn_env/lib/python3.6/site-packages/PuzzleLib/Containers/Container.py", line 240, in handleError
raise ContainerError("%s:\nModule (%s) error:\n%s%s" % (self, mod, type(e), msg))
PuzzleLib.Containers.Container.ContainerError: Container Sequential (name: w2l):
Module (Module BatchNorm1D (name: conv1d_0_bn)) error:
<class 'AssertionError'>
`
Could you please advise what might be causing this and how to work around it? Perhaps there is already a ready-made config or repository for Intel with numpy.
I managed to install DNNL, but an error appears there as well :)
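
One reading of the traceback above: the failure is a bare `assert test` inside `NumpyDnn.batchNorm2d`, and `BatchNormND.updateData` passes `False` as the last argument, so the numpy path appears to handle batch normalization only in inference mode. The sketch below illustrates that pattern in plain Python. The function name and parameters are hypothetical stand-ins, not PuzzleLib code.

```python
import math

def batch_norm_inference_only(row, scale, bias, mean, var, epsilon=1e-5, test=True):
    # Hypothetical stand-in, not PuzzleLib code. The guard mirrors the bare
    # `assert test` from the traceback: only inference mode is handled.
    assert test, "training-mode batch norm is not implemented in this sketch"
    # Normalize each feature with the stored statistics, then scale and shift.
    return [
        s * (x - m) / math.sqrt(v + epsilon) + b
        for x, s, b, m, v in zip(row, scale, bias, mean, var)
    ]

params = dict(scale=[1.0, 1.0], bias=[0.0, 0.0], mean=[0.0, 0.0], var=[1.0, 1.0])

# An inference-style call succeeds.
out = batch_norm_inference_only([1.0, 2.0], test=True, **params)

# A training-style call (test=False) trips the assertion, as in the traceback.
try:
    batch_norm_inference_only([1.0, 2.0], test=False, **params)
    training_failed = False
except AssertionError:
    training_failed = True
```

Whether this matches the actual implementation would need to be checked against the installed `NumpyDnn.py`.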

Error that occurs with backend = intel:
Epoch 1 of 100
Traceback (most recent call last):
File "Train.py", line 156, in <module>
main()
File "Train.py", line 148, in main
args.saveFolder, '{}_{}'.format(args.checkpointName, epoch))
File "Train.py", line 65, in train
error, grad = ctc([out, outlen], [targets, targetSizes])
File "/home/admin/mmn_env/train/PuzzleLib/Cost/Cost.py", line 65, in __call__
self.grad = self.calcGrad(pred, target)
File "/home/admin/mmn_env/train/PuzzleLib/Cost/CTC.py", line 29, in calcGrad
_, grad = ctcLoss(data, datalen, labels, lengths, self.blank, error=self.devErr, normalized=self.normalized)
TypeError: 'NoneType' object is not callable

I will also note that I installed the Intel-optimized builds of numpy and scipy, without success.
Thank you

@MadMaxNice

@sxdxfan, please take a look :)
