48G A40 BatchSize=1 OutOfMemory(OOM) #2

Closed

AInkCode opened this issue Jan 4, 2024 · 3 comments

AInkCode commented Jan 4, 2024

Dear @FT-ZHOU-ZZZ,

I hope this message finds you well. I am reaching out to request additional details regarding the GPU memory specifications for running your code. Despite utilizing a 48GB A40 GPU with a batch size of 1, I am encountering an Out Of Memory (OOM) error.

Could you kindly provide further insights or recommendations on how to manage or mitigate this memory issue? Any information on the expected memory consumption or prerequisites for the GPU would be greatly appreciated.

Thank you for your time and assistance.

Best regards,
A Passionate Medical Imaging Student from Peking University

@FT-ZHOU-ZZZ (Owner)
Thanks for your interest in my work.
All experiments were conducted on a single NVIDIA RTX A6000 (48 GB).
In our experience, a single RTX 3090 is enough for TCGA-BLCA, TCGA-BRCA, and TCGA-LUAD.
However, for TCGA-GBMLGG and TCGA-UCEC, some patients have multiple WSIs (especially in TCGA-GBMLGG), which leads to OOM.
In that case, you can randomly sample a fixed number of patches for these patients to reduce the computational requirements; this will not significantly affect the overall performance.
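A minimal sketch of that subsampling, applied to a patient's bag of patch features before it enters the model. The helper name, the `max_patches` cap, and the `[N, D]` bag layout are illustrative assumptions, not part of CMTA:

```python
import torch

def subsample_patches(features: torch.Tensor, max_patches: int = 20000,
                      seed: int = 0) -> torch.Tensor:
    """Keep at most `max_patches` randomly chosen patch embeddings from one
    patient's feature bag of shape [N, D]. N can be very large when a
    patient has multiple WSIs (e.g., in TCGA-GBMLGG)."""
    n = features.shape[0]
    if n <= max_patches:
        return features
    g = torch.Generator().manual_seed(seed)  # fixed seed for reproducibility
    idx = torch.randperm(n, generator=g)[:max_patches]
    return features[idx]
```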

rongyua commented Apr 21, 2024

File "main.py", line 118, in
results = main(args)
File "main.py", line 94, in main
model, train_loader, val_loader, criterion, optimizer, scheduler
File "/home/u2023170674/CMTA-main/models/cmta/engine.py", line 48, in learning
self.train(train_loader, model, criterion, optimizer)
File "/home/u2023170674/CMTA-main/models/cmta/engine.py", line 87, in train
x_omic3=data_omic3, x_omic4=data_omic4, x_omic5=data_omic5, x_omic6=data_omic6)
File "/home/u2023170674/.conda/envs/CMTA/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/u2023170674/CMTA-main/models/cmta/network.py", line 183, in forward
pathomics_features) # cls token + patch tokens
File "/home/u2023170674/.conda/envs/CMTA/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/u2023170674/CMTA-main/models/cmta/network.py", line 76, in forward
h = self.layer2(h) # [B, N, 512]
File "/home/u2023170674/.conda/envs/CMTA/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/u2023170674/CMTA-main/models/cmta/network.py", line 28, in forward
x = x + self.attn(self.norm(x))
File "/home/u2023170674/.conda/envs/CMTA/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/u2023170674/CMTA-main/models/cmta/util.py", line 264, in forward
out = (attn1 @ attn2_inv) @ (attn3 @ v)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 3.10 GiB (GPU 0; 22.25 GiB total capacity; 40.47 GiB already allocated; 22.25 GiB free; 40.48 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

There seems to be an OOM problem when computing the cross-attention.
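The failing line matches a Nyström-style attention approximation. Assuming the usual Nyströmformer shapes (an assumption, not confirmed by the repository): `attn1` is `[.., N, M]`, `attn2_inv` is `[.., M, M]`, `attn3` is `[.., M, N]`, and `v` is `[.., N, D]`, with N the (huge) patch-token count and M a small landmark count. Since matrix multiplication is associative, one possible mitigation, separate from the author's official fix below, is to regroup the product so that no second `[.., N, M]`-sized intermediate is ever materialized:

```python
import torch

def nystrom_out(attn1: torch.Tensor, attn2_inv: torch.Tensor,
                attn3: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # Original grouping allocates an extra [.., N, M] tensor for
    # (attn1 @ attn2_inv) before the final product:
    #   out = (attn1 @ attn2_inv) @ (attn3 @ v)
    # Regrouped (mathematically identical), every intermediate
    # is only [.., M, D], which is small when M << N:
    return attn1 @ (attn2_inv @ (attn3 @ v))
```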

FT-ZHOU-ZZZ (Owner) commented Apr 21, 2024

I have updated the repository to address the OOM issue; please refer to the README for more details.
