48G A40 BatchSize=1 OutOfMemory(OOM) #2

Closed

AInkCode opened this issue Jan 4, 2024 · 3 comments

AInkCode commented Jan 4, 2024

Dear @FT-ZHOU-ZZZ,

I hope this message finds you well. I am reaching out to request additional details regarding the GPU memory specifications for running your code. Despite utilizing a 48GB A40 GPU with a batch size of 1, I am encountering an Out Of Memory (OOM) error.

Could you kindly provide further insights or recommendations on how to manage or mitigate this memory issue? Any information on the expected memory consumption or prerequisites for the GPU would be greatly appreciated.

Thank you for your time and assistance.

Best regards,
A Passionate Medical Imaging Student from Peking University

@FT-ZHOU-ZZZ (Owner)
Thanks for your interest in my work.
All experiments were conducted on a single NVIDIA RTX A6000 (48 GB).
In our experience, a single RTX 3090 is enough for TCGA-BLCA, TCGA-BRCA, and TCGA-LUAD.
However, for TCGA-GBMLGG and TCGA-UCEC, some patients have multiple WSIs (especially in TCGA-GBMLGG), which leads to OOM.
In that case, you can randomly sample a fixed number of patches for these patients to reduce the computational requirements; this will not significantly affect the overall performance.
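A minimal sketch of that subsampling, applied to a patient's bag of patch features before it enters the model. The helper name, the `max_patches` cap, and the `[N, D]` bag layout are illustrative assumptions, not part of CMTA:

```python
import torch

def subsample_patches(features: torch.Tensor, max_patches: int = 20000,
                      seed: int = 0) -> torch.Tensor:
    """Keep at most `max_patches` randomly chosen patch embeddings from one
    patient's feature bag of shape [N, D]. N can be very large when a
    patient has multiple WSIs (e.g., in TCGA-GBMLGG)."""
    n = features.shape[0]
    if n <= max_patches:
        return features
    g = torch.Generator().manual_seed(seed)  # fixed seed for reproducibility
    idx = torch.randperm(n, generator=g)[:max_patches]
    return features[idx]
```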

rongyua commented Apr 21, 2024

File "main.py", line 118, in
results = main(args)
File "main.py", line 94, in main
model, train_loader, val_loader, criterion, optimizer, scheduler
File "/home/u2023170674/CMTA-main/models/cmta/engine.py", line 48, in learning
self.train(train_loader, model, criterion, optimizer)
File "/home/u2023170674/CMTA-main/models/cmta/engine.py", line 87, in train
x_omic3=data_omic3, x_omic4=data_omic4, x_omic5=data_omic5, x_omic6=data_omic6)
File "/home/u2023170674/.conda/envs/CMTA/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/u2023170674/CMTA-main/models/cmta/network.py", line 183, in forward
pathomics_features) # cls token + patch tokens
File "/home/u2023170674/.conda/envs/CMTA/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/u2023170674/CMTA-main/models/cmta/network.py", line 76, in forward
h = self.layer2(h) # [B, N, 512]
File "/home/u2023170674/.conda/envs/CMTA/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/u2023170674/CMTA-main/models/cmta/network.py", line 28, in forward
x = x + self.attn(self.norm(x))
File "/home/u2023170674/.conda/envs/CMTA/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/u2023170674/CMTA-main/models/cmta/util.py", line 264, in forward
out = (attn1 @ attn2_inv) @ (attn3 @ v)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 3.10 GiB (GPU 0; 22.25 GiB total capacity; 40.47 GiB already allocated; 22.25 GiB free; 40.48 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

There seems to be an OOM problem when computing the cross-attention.
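The failing line matches a Nyström-style attention approximation. Assuming the usual Nyströmformer shapes (an assumption, not confirmed by the repository): `attn1` is `[.., N, M]`, `attn2_inv` is `[.., M, M]`, `attn3` is `[.., M, N]`, and `v` is `[.., N, D]`, with N the (huge) patch-token count and M a small landmark count. Since matrix multiplication is associative, one possible mitigation, separate from the author's official fix below, is to regroup the product so that no second `[.., N, M]`-sized intermediate is ever materialized:

```python
import torch

def nystrom_out(attn1: torch.Tensor, attn2_inv: torch.Tensor,
                attn3: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # Original grouping allocates an extra [.., N, M] tensor for
    # (attn1 @ attn2_inv) before the final product:
    #   out = (attn1 @ attn2_inv) @ (attn3 @ v)
    # Regrouped (mathematically identical), every intermediate
    # is only [.., M, D], which is small when M << N:
    return attn1 @ (attn2_inv @ (attn3 @ v))
```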

FT-ZHOU-ZZZ (Owner) commented Apr 21, 2024

I have updated the repository to address the OOM issue; please refer to the README for more details.
