
Request for Optimization, fixing issues with CUDA devices #23

Open · LumiWasTaken opened this issue Dec 11, 2022 · 13 comments

@LumiWasTaken commented Dec 11, 2022

Hey there!

I really love the project and the idea behind it.

Sadly, I lack the info to run it properly.

On my device (a 3060), running on the GPU it very quickly hits an OOM error, maxing out my 12 GB of VRAM when merging two 2 GB models on CUDA.

It's unclear whether the script can handle float16/float32 mixes, or whether the error "dot function not implemented for 'half'" is a user/environment issue.
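For reference, a common workaround for "not implemented for 'Half'" errors is to upcast around the failing op. This is only a minimal sketch of that pattern, not the repo's actual code; the tensor names are illustrative.

```python
# Minimal sketch: some torch ops aren't implemented for float16 on some
# backends (notably CPU), so do the math in float32 and cast back.
import torch

a = torch.randn(4, 4, dtype=torch.float16)
b = torch.randn(4, 4, dtype=torch.float16)

# On CPU, ops like matmul can raise "... not implemented for 'Half'",
# so upcast first and downcast the result:
result = (a.float() @ b.float()).half()
```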

Could you also fix issues like
`<class 'KeyError'> 'model_ema.decay'`
for some models that are based on NovelAI or are unpruned?
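A hedged sketch of guarding against that KeyError, assuming the merge code indexes the checkpoint's state dict directly. The path `"model.ckpt"` and the key handling shown here are illustrative, not the repo's actual fix.

```python
import torch

ckpt = torch.load("model.ckpt", map_location="cpu")
state = ckpt.get("state_dict", ckpt)  # some checkpoints nest under "state_dict"

# NovelAI-based / unpruned checkpoints may lack or rename EMA entries,
# so read them defensively instead of state["model_ema.decay"]:
ema_decay = state.get("model_ema.decay", None)

# Or drop EMA keys entirely before merging:
state = {k: v for k, v in state.items() if not k.startswith("model_ema.")}
```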

I'd also like more info about your current environment.

I have desperately tried to get it working on an RTX 5000, but despite all efforts, every attempt to run it on a GPU hits an OOM error.

Also, feature requests:

- Saving the model after every x iterations, so that I can compare results; I have found that after a certain iteration count the results get worse than expected.
- Renaming the default output name "merge.ckpt" to something like "model_a_name_without_ext--model_b_name_without_ext--alpha--xxxiter.ckpt" (sketched below).
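For illustration, here's a rough sketch of both requests combined: saving the partial merge every N iterations under a descriptive name. All names here (`merged_name`, `maybe_save`, `save_every`) are hypothetical, not part of SD_rebasin_merge.py.

```python
import os
import torch

def merged_name(model_a: str, model_b: str, alpha: float, iters: int) -> str:
    # Build "a--b--alpha--Niter.ckpt" from the input filenames (extensions stripped).
    a = os.path.splitext(os.path.basename(model_a))[0]
    b = os.path.splitext(os.path.basename(model_b))[0]
    return f"{a}--{b}--{alpha}--{iters}iter.ckpt"

def maybe_save(merged_state: dict, model_a: str, model_b: str,
               alpha: float, iteration: int, save_every: int = 10) -> None:
    # Checkpoint the in-progress merge every `save_every` iterations.
    if iteration % save_every == 0:
        torch.save({"state_dict": merged_state},
                   merged_name(model_a, model_b, alpha, iteration))

# merged_name("fileA.ckpt", "fileB.ckpt", 0.5, 10) -> "fileA--fileB--0.5--10iter.ckpt"
```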

@LumiWasTaken (Author)

Also, when running in GPU/CUDA mode, it's common to hit this issue:

```
<class 'RuntimeError'> Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)
```
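A minimal sketch of where that mismatch typically comes from, assuming the script calls `torch.index_select` with a permutation index built on the CPU. The names (`weight`, `perm`) are illustrative, not taken from the actual code.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
weight = torch.randn(8, 8, device=device)
perm = torch.randperm(weight.shape[0])  # created on CPU by default

# This raises "Expected all tensors to be on the same device" when
# weight lives on cuda:0 and perm on the CPU:
# permuted = torch.index_select(weight, 0, perm)

# Fix: move the index tensor to the weight's device first.
permuted = torch.index_select(weight, 0, perm.to(weight.device))
```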

@brucethemoose commented Dec 12, 2022

Yeah, IDK what the memory requirements are, but it maxes out my 16 GB of RAM and eats tons of swap.

And I also noticed some models don't work as an "A" input (with the error you described), but will work as a "B" input.

@LumiWasTaken (Author)

> Yeah, IDK what the memory requirements are, but it maxes out my 16 GB of RAM and eats tons of swap.
>
> And I also noticed some models don't work as an "A" input (with the error you described), but will work as a "B" input.

I'm talking about VRAM.

@brucethemoose
> I'm talking about VRAM.

Yeah, but my theory is that if RAM usage is that high, setting the device to GPU will probably require a similar amount of memory.

@LumiWasTaken (Author)

> Yeah, but my theory is that if RAM usage is that high, setting the device to GPU will probably require a similar amount of memory.

That sounds fair, so using 16 GB of RAM as an equivalent is okay.

But I ran it on a GPU with 24 GB of VRAM, and for testing on an A100 40 GB, and it maxed that out again and ran into an error... so there is that issue.

@ogkalu2 (Owner) commented Dec 12, 2022

Hi. Sorry to hear that. Even I'm unsure of the exact requirements at this point.
Can you try running this commit and see if it works? I think this one was slower but used fewer resources:
93b0e95

@LumiWasTaken (Author)

> Hi. Sorry to hear that. Even I'm unsure of the exact requirements at this point. Can you try running this commit and see if it works? I think this one was slower but used fewer resources: 93b0e95

In this case, I run into the issue again:

```
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)
```

```
SD_rebasin_merge.py --model_a "fileA.ckpt" --model_b "fileB.ckpt" --device cuda
```

@ogkalu2 (Owner) commented Dec 12, 2022

Make your device CPU. It'll still run on CUDA for the parts that it can. If that's what you've been doing, you should also try it with the latest commit.
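For context, a minimal sketch of the mixed-device pattern described here, assuming the script keeps the full state dicts in CPU RAM and only moves individual tensors to the GPU for the heavy math. `correlate` and its names are illustrative, not taken from SD_rebasin_merge.py.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

def correlate(w_a: torch.Tensor, w_b: torch.Tensor) -> torch.Tensor:
    # Move just these two tensors to the GPU for the matmul...
    sim = w_a.to(device).flatten(1) @ w_b.to(device).flatten(1).T
    # ...then bring the (small) result back so VRAM isn't held across layers.
    return sim.cpu()

w_a = torch.randn(16, 3, 3, 3)  # CPU-resident weights
w_b = torch.randn(16, 3, 3, 3)
print(correlate(w_a, w_b).shape)  # torch.Size([16, 16])
```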

@LumiWasTaken (Author)

> Make your device CPU. It'll still run on CUDA for the parts that it can. If that's what you've been doing, you should also try it with the latest commit.

Well, I have seen 0% GPU utilization and 100% CPU.

[screenshot]

@brucethemoose commented Dec 12, 2022

Merges are reasonably fast on CPU; that's not really an issue IMO since they are so infrequent.

But being locked to torch 1.11 because of the CPU requirement kinda is an issue 🤔.

@LumiWasTaken (Author)

> Merges are reasonably fast on CPU; that's not really an issue IMO.
>
> Being locked to torch 1.11 because of the CPU requirement kinda is though 🤔.

It's not really fast for me... and especially when I want to do a larger batch of model merges via a separate script, it's a bit meh.

@brucethemoose commented Dec 12, 2022

> It's not really fast for me... and especially when I want to do a larger batch of model merges via a separate script, it's a bit meh.

Yeah, but even a mega merge script is still gonna take less than 5 minutes.

In the ML world, that's basically free :P

@brucethemoose

(For reference, a merge finishes in about 30 seconds on my 8C 4900HS running Linux.)
