RuntimeError: CUDA Error: out of memory #27

Open
pharouknucleus opened this issue Sep 21, 2018 · 11 comments

Comments

@pharouknucleus

Please help me resolve this issue

@saadmanrafat

Try a smaller batch size.

@pharouknucleus
Author

My batch size is 64, the input size is 256, and the output size is 242. By how much should I reduce it?

@saadmanrafat

saadmanrafat commented Oct 7, 2018

Try a batch size of 8, 16, or 32 and see if it works.
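The suggestion above can be sketched as a small helper that tries candidate batch sizes from largest to smallest and returns the first one that does not run out of memory. This is only a sketch: `first_batch_size_that_fits` and `run_one_batch` are hypothetical names, not part of CheXNet or PyTorch, and `run_one_batch` is assumed to run a single forward pass at the given batch size and raise `RuntimeError` on a CUDA out-of-memory error, as in this issue.

```python
def first_batch_size_that_fits(run_one_batch, candidates=(32, 16, 8)):
    """Return the first candidate batch size for which run_one_batch succeeds.

    run_one_batch is a hypothetical callable supplied by the caller. It takes
    a batch size, runs one forward pass at that size, and is assumed to raise
    RuntimeError with "out of memory" in its message when the GPU is full.
    Returns None if every candidate runs out of memory.
    """
    for batch_size in candidates:
        try:
            run_one_batch(batch_size)
            return batch_size
        except RuntimeError as err:
            # Only swallow out-of-memory errors; re-raise anything else.
            if "out of memory" not in str(err).lower():
                raise
    return None
```

In a real script, `run_one_batch` would build a batch of the requested size and call the model on it. Calling `torch.cuda.empty_cache()` between attempts may help release cached memory before the next, smaller try.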

@pharouknucleus
Author

It is still showing me this error:
Traceback (most recent call last):
  File "C:\Users\Nasir Isa\Documents\1Research\algortihm\CheXNet-master\CheXNet-master\m3.py", line 149, in <module>
    main()
  File "C:\Users\Nasir Isa\Documents\1Research\algortihm\CheXNet-master\CheXNet-master\m3.py", line 95, in main
    output = model(input_var)
  File "C:\Users\Nasir Isa\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\Nasir Isa\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\nn\parallel\data_parallel.py", line 121, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "C:\Users\Nasir Isa\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\Nasir Isa\Documents\1Research\algortihm\CheXNet-master\CheXNet-master\m3.py", line 144, in forward
    x = self.densenet121(x)
  File "C:\Users\Nasir Isa\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\Nasir Isa\AppData\Local\Programs\Python\Python36\lib\site-packages\torchvision\models\densenet.py", line 220, in forward
    features = self.features(x)
  File "C:\Users\Nasir Isa\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\Nasir Isa\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\nn\modules\container.py", line 91, in forward
    input = module(input)
  File "C:\Users\Nasir Isa\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\Nasir Isa\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\nn\modules\conv.py", line 301, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDA error: out of memory

@robhyb19

@omrfrkmfy Were you ever able to figure out a solution to the problem? I'm dealing with the same issue

@pharouknucleus
Author

pharouknucleus commented May 20, 2019 via email

@Viswanath660

With 4 worker cores of an NVIDIA P100, I had to use a batch size of 12. But the AUROC is 49%, maybe due to the small batch size.

@cherrymj

Maybe you can try this idea
https://blog.csdn.net/xijuezhu8128/article/details/86594478

@LiJiaqi96

I encountered the same issue and solved it by disabling gradient computation in addition to calling model.eval():
with torch.no_grad():
    for i, (data, label) in enumerate(test_loader):
        ...
(Remember to indent the loop under the with statement.)
This stops the model from saving intermediate results, so the temporary memory is freed after each batch.
Hope it helps.
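A minimal, CPU-only sketch of the pattern described above. The tiny `model` and the list standing in for `test_loader` are hypothetical placeholders for the DenseNet-121 model and the real test loader, chosen only so the snippet runs on its own. Note that `model.eval()` changes the behavior of layers such as dropout and batch normalization but does not by itself turn off gradient tracking, which is why `torch.no_grad()` is needed as well.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the real model and test loader.
model = nn.Sequential(nn.Conv2d(3, 4, kernel_size=3, padding=1), nn.ReLU())
test_loader = [(torch.randn(2, 3, 8, 8), torch.zeros(2)) for _ in range(3)]

model.eval()  # inference behavior for dropout / batch norm layers

outputs = []
with torch.no_grad():  # no autograd graph is recorded inside this block
    for i, (data, label) in enumerate(test_loader):
        output = model(data)
        outputs.append(output)

# Outputs produced under no_grad are not attached to an autograd graph.
print(all(not o.requires_grad for o in outputs))
```

With a GPU model, the same structure applies after moving `model` and each `data` batch to the device.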

@Candyeeee

> With 4 worker cores of an NVIDIA P100, I had to use a batch size of 12. But the AUROC is 49%, maybe due to the small batch size.

I am dealing with the same issue, and when I run it multiple times I get different results. Did you solve it or find out why?

icekang added a commit to icekang/CheXNet that referenced this issue Nov 15, 2020
# Problem
As described in this [issue](arnoweng#27), I also forked this repo and tried running it in my Colab project. The same problem arose: `RuntimeError: CUDA Error: out of memory`.

# Solution
As far as I know, the problem happened because some sections did not require `grad` yet still tracked it anyway. Thus, [`with no_grad()`](https://pytorch.org/docs/stable/generated/torch.no_grad.html) should be used there.
@icekang

icekang commented Nov 15, 2020

Hello, I know this is very late, and it seems the owner has not maintained the code for years. But if any of you hit this problem and end up on this issue, try my solution: https://github.com/arnoweng/CheXNet/pull/39. I just started learning PyTorch today and I'm not a PyTorch pro, so it is possible that my changes introduce logic flaws. If that is the case, please tell me :)
