label-free predictions: CUDA out of memory in prediction #90
Hi,

I am trying to make label-free predictions with a model I previously trained with the fnet notebook. However, I am getting the following message:

RuntimeError: CUDA out of memory. Tried to allocate 4.75 GiB (GPU 0; 15.90 GiB total capacity; 11.32 GiB already allocated; 1.63 GiB free; 13.15 GiB reserved in total by PyTorch)

This happens even if I restart the runtime. Is there anything I can do about that?

Thank you very much!

Comments
Hi,

I'd try reducing the size of the image that you're trying to predict on by tiling it into smaller images.

Cheers,
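For illustration, a minimal sketch of what such tiling could look like, assuming a (z, y, x) stack held in a NumPy array; the tile size and the random stand-in data are placeholders, not part of the notebook:

```python
# Sketch: split a large 3D stack (z, y, x) into smaller XY tiles before
# prediction. Random data stands in for a real image; with real data you
# would load the stack, e.g. with tifffile.
import numpy as np

stack = np.random.rand(32, 924, 624).astype(np.float32)  # (z, y, x)
tile = 256  # XY tile edge length; pick a size the GPU can handle

tiles = []
for i in range(0, stack.shape[1], tile):
    for j in range(0, stack.shape[2], tile):
        tiles.append(stack[:, i:i + tile, j:j + tile])

print(len(tiles), tiles[0].shape)
# Each tile can then be saved and predicted on individually, and the
# predictions stitched back together afterwards.
```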
Thank you @krentzd. PS: section 6.2 seems to say that the number of z-slices for prediction should match the training images — is that right?
No, the number of z-slices for prediction shouldn't depend on the number of z-slices used for training.
Thanks @krentzd,

I tried multiple times and no matter what, I am getting the CUDA out of memory error. My training was with 36 MB images (924x624 px, 32 slices) and it worked fine. For my prediction, even a 4-slice 200 x 200 px image fails.

Thanks a lot
Hi,

Could you check which GPU you're assigned when you do the prediction? And do you run the entire notebook, or do you skip sections in between before loading your pre-trained model in Section 6?

Cheers,
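For what it's worth, one way to check the assigned GPU from a notebook cell; a sketch using PyTorch (in Colab, running !nvidia-smi in a cell gives the same information):

```python
# Report which GPU the runtime was assigned, using PyTorch.
import torch

if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. "Tesla K80" or "Tesla T4"
    props = torch.cuda.get_device_properties(0)
    print(f"{props.total_memory / 1024**3:.1f} GiB total memory")
else:
    print("No GPU assigned")
```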
Thank you @krentzd,

The first time it happened, it was immediately after training (sections 1, 2, 3, 4.1, 5.1, 6.1). Yesterday and today I have been running sections 1.1, 1.2, 2 and 6.1. The full error message I get in 6.1 is the CUDA out of memory RuntimeError quoted above.

Cheers
Hi,

Hope this helps.
Hi @lucpaul,

Thank you for your feedback. The issue I am having is with the prediction in 6.1, not the training; that is where I am getting the CUDA out of memory error. Are you saying that because I used too many images for the model creation I am having trouble making predictions?

Thanks a lot
Oh, I see, I misunderstood. Apologies. Sorry again for the misunderstanding earlier.
No problem, thanks for your help.
The model.p file is 266 MB.
Actually, I can't run this cell right now as I am getting an error.

I saw no issues with the RAM or disk space before I got the error. I noticed that I never purged the pytorch_fnet folder from previous model-training attempts (section 6.4). Maybe I should purge it and start again from scratch. I will probably do that and report back tomorrow or the day after tomorrow. Thanks a lot!
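A purge like that can be done from a notebook cell; a minimal sketch, assuming the folder was cloned to the default Colab location /content/pytorch_fnet (the actual path may differ):

```python
# Delete a stale pytorch_fnet folder left over from earlier training runs.
# The path is an assumption about the Colab working directory.
import shutil

shutil.rmtree("/content/pytorch_fnet", ignore_errors=True)
```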
Hi @lucpaul,
The model.p file is now 280 MB.
I have no problem making the plot of training errors in 5.1, but I can't run 5.2, as it tries to find the predicted images in the QC folder and there are none in there (unless I misunderstand and I need to predict them first, then go back and do the QC?).
They are the same.
Yes, I did everything again from scratch, and this happened both straight after training and with a new runtime (I did install the fnet dependencies).
Everything seemed normal.
The last thing I see being executed is the scipy requirement check. The whole output I get for cell 6.1 is:

`Requirement already up-to-date: scipy==1.2.0 in /usr/local/lib/python3.7/dist-packages (1.2.0)`
Hello,

I apologize for not coming back sooner. I have been working on a proper update to the fnet notebook, which hopefully can be released soon, given some of the comments here and some other things I noticed. Interestingly, I encountered this error too, and it appears to already exist in the original code; see, for example: AllenCellModeling/pytorch_fnet#153.
Hi again,

I have been looking at this error now and can reproduce it by using larger images (1024x1024x32 in my case) on a K80 GPU, so it has to do with the size of the image being loaded into the network. However, I am not sure it can be fixed easily within the notebook: it comes down to memory allocation on the GPU and how CUDA hands memory to PyTorch. It seems odd that such an error would occur during prediction and not training, but I have not found a solution yet, and I am not sure I have the capacity to pursue one either. I have tried clearing out the GPU cache using torch.cuda.empty_cache(), without success.

The only fix I could find was reducing the dimensions of the individual images I wanted to predict. So I would suggest you reduce your image dimensions, for example by making smaller patches, if you want to use the notebook on your data. Otherwise, you could search for the CUDA out of memory issue in the prediction context and see if you find a solution, maybe along the lines I tried above.

After checking the issue now, I don't believe it is related to our implementation of label-free prediction in this project; it is related either to the source code or is a PyTorch-specific problem, so I will close this issue here. Feel free to reopen it if you find a more satisfactory solution.
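For reference, a minimal sketch of the two measures mentioned above (cache clearing and smaller patches) on a generic PyTorch model; the Conv3d stand-in and the patch shape are placeholders, not the actual fnet model:

```python
# Sketch: clear the CUDA cache and predict on a small patch.
# The Conv3d model is a hypothetical stand-in for the trained fnet model.
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Conv3d(1, 1, kernel_size=3, padding=1).to(device).eval()

if torch.cuda.is_available():
    torch.cuda.empty_cache()  # release cached, unused GPU memory

# Smaller patches lower the peak memory needed at prediction time.
patch = torch.zeros(1, 1, 4, 200, 200, device=device)  # (batch, ch, z, y, x)

with torch.no_grad():  # no autograd buffers during inference
    prediction = model(patch)

print(prediction.shape)
```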