GPU memory keeps increasing with every batch and finally leads to CUDA being out of memory #9

Lin-ZN · 2023-05-17T03:13:43Z

Hi Otaheri,

Thank you for your good work!

Have you run into the problem that, when training GNet, GPU memory keeps increasing with every batch until CUDA runs out of memory? The first few iterations show a significant increase in GPU memory, and after that each batch adds approximately 2 or 4 MB. As far as I know, unless we intentionally keep variables that hold gradients, memory release and allocation should balance out after the first iteration of training a neural network, so GPU memory should not keep growing. Could you help me solve this problem? Thank you very much!
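For anyone trying to confirm the pattern described above (a large jump during the first iterations, then a small constant increase per batch), here is a minimal sketch of a per-iteration memory check. It is not part of GNet; `track`, `steady_growth`, `warmup`, and `tolerance` are hypothetical names introduced for illustration. The memory probe is passed in as a callable so the sketch runs without a GPU. With PyTorch, `torch.cuda.memory_allocated` (which returns bytes) could serve as the probe.

```python
from typing import Callable, List, Optional


def track(step: Callable[[], None], probe: Callable[[], int], iterations: int) -> List[int]:
    """Run `step` repeatedly and record the value of `probe` after each call.

    `probe` is any zero-argument callable returning current memory use in bytes.
    Assumption: with PyTorch this could be `torch.cuda.memory_allocated`.
    """
    samples = []
    for _ in range(iterations):
        step()
        samples.append(probe())
    return samples


def steady_growth(samples: List[int], warmup: int = 3, tolerance: int = 0) -> Optional[float]:
    """Return the mean per-iteration growth (bytes) after the warm-up iterations
    if memory rose by more than `tolerance` on every later iteration, else None."""
    tail = samples[warmup:]
    if len(tail) < 2:
        return None  # not enough post-warm-up samples to judge
    deltas = [later - earlier for earlier, later in zip(tail, tail[1:])]
    if all(delta > tolerance for delta in deltas):
        return sum(deltas) / len(deltas)
    return None


if __name__ == "__main__":
    # Simulated leak: a list that keeps a reference to something on every step.
    held = []
    samples = track(lambda: held.append(bytearray(1024)), lambda: 1024 * len(held), 10)
    print(steady_growth(samples))
```

A non-`None` result means memory grew on every iteration after warm-up, which matches the behaviour reported here. A `None` result means usage levelled off or fluctuated.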

siddharthKatageri · 2024-03-03T01:42:18Z

@Lin-ZN I noticed the same. Were you able to find the cause for this?
@otaheri

Lin-ZN · 2024-03-05T09:01:25Z

@siddharthKatageri Hello, this issue may be caused by a mismatch between PyTorch and its dependencies in the environment. I resolved the problem by adjusting the environment configuration. I hope this information helps you as well.

z050209 · 2024-04-05T03:32:00Z

@Lin-ZN Hi, I am facing the same issue. Would you please share your Python version, PyTorch version, etc.? Thank you!

otaheri · 2024-04-22T17:08:08Z

Hi @siddharthKatageri @z050209 @Lin-ZN, did any of you find a solution for this? Unfortunately, I have not been able to reproduce it so far. Could you provide more details about your system and config so that we can try to reproduce the problem and find a solution? If you have already found a solution, it would be great if you could share it. Thanks.
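One way to gather the system details requested above is a short script that reports the Python version, platform, and installed package versions. This is a sketch using only the standard library. `environment_report` is a hypothetical helper, and the default package list (`torch`, `numpy`) is an assumption, since the thread does not say which dependencies were involved in the mismatch.

```python
import platform
from importlib import metadata
from typing import Dict, Iterable


def environment_report(packages: Iterable[str] = ("torch", "numpy")) -> Dict[str, str]:
    """Collect interpreter, platform, and package versions for a bug report.

    The default package names are illustrative; pass whichever packages
    your environment actually depends on.
    """
    report = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Pasting the printed lines into a comment, together with the training config, should make it easier to compare a working environment against a failing one.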
