
When I finished training, the CUDA memory is still occupied; how do I free the memory? #205

Open
alpttex19 opened this issue Oct 30, 2024 · 1 comment

Comments

@alpttex19

This is for bugs only

Did you already ask in the discord?

Yes/No

You verified that this is a bug and not a feature request or question by asking in the discord?

Yes/No

Describe the bug

@maflx

maflx commented Oct 30, 2024

I have the same problem. The cleanup() call doesn't free the memory:

job.cleanup()

I have added:

import gc
import torch

del job.process               # drop the references the job holds to CUDA tensors
del job
gc.collect()                  # collect reference cycles so the tensors are actually freed
torch.cuda.empty_cache()      # return cached blocks to the CUDA driver

This reduces memory usage, but I still have some memory leak.
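
A hedged diagnostic sketch (just standard PyTorch calls, not part of this toolkit's API): comparing allocated vs. reserved memory after cleanup tells you whether the remaining usage is live tensors that something still references, or just blocks cached by PyTorch's allocator.

import torch

def report_cuda_memory(device: int = 0) -> None:
    # Live tensor memory vs. memory merely cached by PyTorch's allocator.
    allocated = torch.cuda.memory_allocated(device)
    reserved = torch.cuda.memory_reserved(device)
    print(f"cuda:{device} allocated={allocated / 2**20:.1f} MiB, "
          f"reserved={reserved / 2**20:.1f} MiB")

report_cuda_memory(0)
# If `allocated` stays high after del/gc.collect(), some object still references
# a CUDA tensor; torch.cuda.empty_cache() only releases the reserved (cached) part.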

Another memory problem: when I try to run two trainings in parallel on two GPUs, the one using cuda:1 always allocates some memory on cuda:0 when executing this line:

self.scaler.step(self.optimizer)

I suspect the problem is with the bitsandbytes optimizer, although I'm not sure.
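
One hedged workaround sketch (an assumption, not a confirmed fix for the bitsandbytes behavior): hide GPU 0 from the cuda:1 process entirely with CUDA_VISIBLE_DEVICES, set before torch/bitsandbytes are imported, so nothing in that process can initialize a context on cuda:0.

import os

# Assumption: this process should only ever touch physical GPU 1.
# The variable must be set before CUDA is initialized (i.e. before importing torch).
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import torch  # after the variable is set, this process only sees one device

device = torch.device("cuda:0")  # cuda:0 here maps to physical GPU 1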
