
RuntimeError: CUDA out of memory #8

Open · 2000lf opened this issue Nov 29, 2024 · 4 comments

@2000lf commented Nov 29, 2024

Can I remove the attention layers for high-resolution images?

@explainingai-code (Owner) commented Nov 29, 2024

Yes, you can try removing the attention to see if that gets rid of the error. What's the image size you are working with?

Adding a few other things that will reduce the compute requirement in case you are working with the default config (see the config sketch after this list):

1. Reduce the batch size (the config uses 64 as of now).
2. Keep attention only in the midblock and remove the Downblock attention ([here](https://github.com/explainingai-code/DDPM-Pytorch/blob/main/models/unet_base.py#L98-L105)) and the Upblock attention ([here](https://github.com/explainingai-code/DDPM-Pytorch/blob/main/models/unet_base.py#L275-L281)).
3. By default, downsampling is disabled on the last downblock (since MNIST images are small anyway), so change [this](https://github.com/explainingai-code/DDPM-Pytorch/blob/main/config/default.yaml#L14) value to `[True, True, True]`.
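For reference, a minimal sketch of what the config-side changes (items 1 and 3) might look like. The key names below are assumptions for illustration, so match them against the actual `config/default.yaml` before editing:

```yaml
# Hypothetical key names -- check config/default.yaml for the real ones.
train_params:
  batch_size: 8                    # down from the default 64; go as low as 1 for very large images
model_params:
  down_sample: [True, True, True]  # default disables downsampling on the last downblock
```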

@2000lf (Author) commented Nov 29, 2024

Thank you for your advice. I am using images of size 900×1600, and they consume a huge amount of memory once they hit the attention layers.
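For a sense of scale: a self-attention map over flattened spatial tokens grows quadratically with resolution. A rough back-of-the-envelope sketch, assuming attention is applied after 8× downsampling, fp32, a single head, and batch size 1 (all illustrative assumptions):

```python
# Rough estimate of the attention-map memory for a 900x1600 input.
# Assumes self-attention over flattened spatial tokens after 8x downsampling,
# fp32 (4 bytes), one head, batch size 1 -- all illustrative assumptions.
h, w = 900 // 8, 1600 // 8       # feature-map size: 112 x 200
tokens = h * w                   # 22,400 spatial tokens
attn_entries = tokens ** 2       # ~5.0e8 entries in the N x N attention map
bytes_fp32 = attn_entries * 4
print(f"{tokens=}, attention map ~= {bytes_fp32 / 1024**3:.1f} GiB")
# -> roughly 1.9 GiB for ONE attention map, before heads, batch size,
#    other activations, and the backward pass.
```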

@explainingai-code (Owner) commented Nov 29, 2024

Got it. Yeah, try with a batch size of 1 first.
If it works, then you can train with gradient accumulation.
But if that also fails, then you would have to either remove the attention layers or train with smaller images.
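Gradient accumulation keeps the per-step memory of batch size 1 while optimizing as if the batch were larger. A minimal sketch; the model, optimizer, and loader below are placeholders, not the repo's actual training loop:

```python
import torch
from torch import nn

# Placeholder model/data -- stand-ins for the repo's UNet and dataloader.
model = nn.Linear(16, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loader = [torch.randn(1, 16) for _ in range(32)]  # micro-batches of size 1

accum_steps = 8  # effective batch size = 1 * accum_steps

optimizer.zero_grad()
for step, x in enumerate(loader):
    loss = model(x).pow(2).mean() / accum_steps  # scale so accumulated grads average
    loss.backward()                              # gradients add up across iterations
    if (step + 1) % accum_steps == 0:
        optimizer.step()                         # one update per accum_steps micro-batches
        optimizer.zero_grad()
```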

@2000lf (Author) commented Nov 29, 2024

Thank you, I will try as you suggested.
