
When I finished training, the CUDA memory is still occupied; how do I free the memory? #205

Open
alpttex19 opened this issue Oct 30, 2024 · 1 comment

Comments

@alpttex19

This is for bugs only

Did you already ask in the discord?

Yes/No

You verified that this is a bug and not a feature request or question by asking in the discord?

Yes/No

Describe the bug

@maflx

maflx commented Oct 30, 2024

I have the same problem. The cleanup() call doesn't free the memory:

job.cleanup()

I have added:

import gc
import torch

del job.process               # drop the references the job holds to CUDA tensors
del job
gc.collect()                  # collect reference cycles so the tensors are actually freed
torch.cuda.empty_cache()      # return cached blocks to the CUDA driver

This reduces memory usage, but I still have some memory leak.
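
A hedged diagnostic sketch (just standard PyTorch calls, not part of this toolkit's API): comparing allocated vs. reserved memory after cleanup tells you whether the remaining usage is live tensors that something still references, or just blocks cached by PyTorch's allocator.

import torch

def report_cuda_memory(device: int = 0) -> None:
    # Live tensor memory vs. memory merely cached by PyTorch's allocator.
    allocated = torch.cuda.memory_allocated(device)
    reserved = torch.cuda.memory_reserved(device)
    print(f"cuda:{device} allocated={allocated / 2**20:.1f} MiB, "
          f"reserved={reserved / 2**20:.1f} MiB")

report_cuda_memory(0)
# If `allocated` stays high after del/gc.collect(), some object still references
# a CUDA tensor; torch.cuda.empty_cache() only releases the reserved (cached) part.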

Another memory problem: when I try to run two trainings in parallel on two GPUs, the one using cuda:1 always allocates some memory on cuda:0 when executing this line:

self.scaler.step(self.optimizer)

I suspect the problem is with the bitsandbytes optimizer, although I'm not sure.
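
One hedged workaround sketch (an assumption, not a confirmed fix for the bitsandbytes behavior): hide GPU 0 from the cuda:1 process entirely with CUDA_VISIBLE_DEVICES, set before torch/bitsandbytes are imported, so nothing in that process can initialize a context on cuda:0.

import os

# Assumption: this process should only ever touch physical GPU 1.
# The variable must be set before CUDA is initialized (i.e. before importing torch).
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import torch  # after the variable is set, this process only sees one device

device = torch.device("cuda:0")  # cuda:0 here maps to physical GPU 1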
