Check failed: error == cudaSuccess (77 vs. 0) an illegal memory access was encountered #147

Open
Shikherneo2 opened this issue May 27, 2019 · 2 comments


@Shikherneo2

Hi

I have been facing this issue for a while now, where the training suddenly stops with the error "Check failed: error == cudaSuccess (77 vs. 0) an illegal memory access was encountered".
Weirdly, it runs fine until about 150K iterations, after which this error pops up. I tried reducing the batch size, but it didn't help. It does not seem to be an out-of-memory issue, as I checked GPU memory usage when the error occurred. Thank you for any help.

I0524 19:24:23.882612 13813 solver.cpp:486] Iteration 150000, lr = 0.01
F0524 19:24:39.746389 13813 math_functions.cu:81] Check failed: error == cudaSuccess (77 vs. 0) an illegal memory access was encountered
*** Check failure stack trace: ***
@ 0x7efd2b3690cd google::LogMessage::Fail()
@ 0x7efd2b36af33 google::LogMessage::SendToLog()
@ 0x7efd2b368c28 google::LogMessage::Flush()
@ 0x7efd2b36b999 google::LogMessageFatal::~LogMessageFatal()
@ 0x7efd2b81f8ba caffe::caffe_gpu_memcpy()
@ 0x7efd2b7a1ac0 caffe::SyncedMemory::gpu_data()
@ 0x7efd2b675562 caffe::Blob<>::gpu_data()
@ 0x7efd2b6ad189 caffe::BaseConvolutionLayer<>::forward_gpu_bias()
@ 0x7efd2b7e18d8 caffe::ConvolutionLayer<>::Forward_gpu()
@ 0x7efd2b785b9a caffe::Net<>::ForwardFromTo()
@ 0x7efd2b785cc7 caffe::Net<>::ForwardPrefilled()
@ 0x7efd2b79f556 caffe::Solver<>::Step()
@ 0x7efd2b79fea2 caffe::Solver<>::Solve()
@ 0x55ee0359557c train()
@ 0x55ee03592487 main
@ 0x7efd2a7c7b97 __libc_start_main
@ 0x55ee03592c2a _start
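
For what it's worth, the trace blames caffe_gpu_memcpy inside the convolution bias forward, but with asynchronous launches the kernel that actually faulted may have run earlier. A minimal sketch of re-running the same job under cuda-memcheck with synchronous launches (the caffe binary path and solver file below are placeholders, not the actual ones from this setup):

```python
# Minimal sketch: re-run the same training job under cuda-memcheck with
# synchronous kernel launches, so the illegal access is attributed to the
# kernel that actually faults rather than to the next caffe_gpu_memcpy.
import os
import subprocess

env = dict(os.environ, CUDA_LAUNCH_BLOCKING="1")  # make kernel launches synchronous
cmd = [
    "cuda-memcheck",                 # ships with the CUDA toolkit
    "./build/tools/caffe", "train",  # placeholder: path to your caffe binary
    "--solver=solver.prototxt",      # placeholder: your solver file
]
subprocess.run(cmd, env=env, check=True)
```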

@JimHeo

JimHeo commented Jul 16, 2019

Did you solve it?
I got the same issue when running compute_bn_statistics.py with the iteration-40000 model on the PASCAL VOC dataset.

@slimway

slimway commented Jun 14, 2020

I've been facing the same issue when changing the number of classes and the class_weights. The weird part is that when the weight for label 0 is set to zero, it works perfectly fine.
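
A hedged guess about why the weight of label 0 matters: if any annotation pixel holds a value at or above the number of classes, the loss layer ends up indexing past the class-weight array on the GPU, which shows up as exactly this kind of illegal access. A minimal sketch for scanning the label maps, assuming grayscale PNG annotations; the directory, class count, and ignore label below are placeholders:

```python
# Minimal sketch: flag annotation pixels whose value is >= NUM_CLASSES.
# Such labels index past the class-weight array in the loss layer on the GPU
# and can trigger exactly this kind of illegal memory access.
import glob
import numpy as np
from PIL import Image

LABEL_DIR = "path/to/label/pngs"  # placeholder: directory of label maps
NUM_CLASSES = 12                  # placeholder: num_output of the final layer
IGNORE_LABEL = 255                # placeholder: ignore_label, if you use one

for path in sorted(glob.glob(LABEL_DIR + "/*.png")):
    labels = np.array(Image.open(path))
    bad = np.unique(labels[(labels >= NUM_CLASSES) & (labels != IGNORE_LABEL)])
    if bad.size:
        print(path, "has out-of-range label values:", bad.tolist())
```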
