
Request for Optimization, fixing issues with CUDA devices #23

Open · LumiWasTaken opened this issue Dec 11, 2022 · 13 comments

@LumiWasTaken commented Dec 11, 2022

Hey there!

I really love the project and the idea behind it.

Sadly, I lack the info to run it properly.

On my device (a 3060), running on the GPU it very quickly hits an OOM error, maxing out my 12 GB of VRAM when merging two 2 GB models on CUDA.

It's unclear whether the script can handle float16/float32 mixes, or whether the error "dot function not implemented for 'half'" is a user/environment issue.
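For reference, a common workaround for "not implemented for 'Half'" errors is to upcast around the failing op. This is only a minimal sketch of that pattern, not the repo's actual code; the tensor names are illustrative.

```python
# Minimal sketch: some torch ops aren't implemented for float16 on some
# backends (notably CPU), so do the math in float32 and cast back.
import torch

a = torch.randn(4, 4, dtype=torch.float16)
b = torch.randn(4, 4, dtype=torch.float16)

# On CPU, ops like matmul can raise "... not implemented for 'Half'",
# so upcast first and downcast the result:
result = (a.float() @ b.float()).half()
```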

Could you also fix issues like
`<class 'KeyError'> 'model_ema.decay'`
for some models that are based on NovelAI or are unpruned?
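A hedged sketch of guarding against that KeyError, assuming the merge code indexes the checkpoint's state dict directly. The path `"model.ckpt"` and the key handling shown here are illustrative, not the repo's actual fix.

```python
import torch

ckpt = torch.load("model.ckpt", map_location="cpu")
state = ckpt.get("state_dict", ckpt)  # some checkpoints nest under "state_dict"

# NovelAI-based / unpruned checkpoints may lack or rename EMA entries,
# so read them defensively instead of state["model_ema.decay"]:
ema_decay = state.get("model_ema.decay", None)

# Or drop EMA keys entirely before merging:
state = {k: v for k, v in state.items() if not k.startswith("model_ema.")}
```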

I'd also like more info about your current environment.

I have desperately tried to get it working on an RTX 5000, but despite all efforts, every attempt to run it on a GPU hits an OOM error.

Also, feature requests:

- Saving the model after every x iterations, so that I can compare results; I have found that after a certain iteration count the results get worse than expected.
- Renaming the default output name "merge.ckpt" to something like "model_a_name_without_ext--model_b_name_without_ext--alpha--xxxiter.ckpt" (sketched below).
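For illustration, here's a rough sketch of both requests combined: saving the partial merge every N iterations under a descriptive name. All names here (`merged_name`, `maybe_save`, `save_every`) are hypothetical, not part of SD_rebasin_merge.py.

```python
import os
import torch

def merged_name(model_a: str, model_b: str, alpha: float, iters: int) -> str:
    # Build "a--b--alpha--Niter.ckpt" from the input filenames (extensions stripped).
    a = os.path.splitext(os.path.basename(model_a))[0]
    b = os.path.splitext(os.path.basename(model_b))[0]
    return f"{a}--{b}--{alpha}--{iters}iter.ckpt"

def maybe_save(merged_state: dict, model_a: str, model_b: str,
               alpha: float, iteration: int, save_every: int = 10) -> None:
    # Checkpoint the in-progress merge every `save_every` iterations.
    if iteration % save_every == 0:
        torch.save({"state_dict": merged_state},
                   merged_name(model_a, model_b, alpha, iteration))

# merged_name("fileA.ckpt", "fileB.ckpt", 0.5, 10) -> "fileA--fileB--0.5--10iter.ckpt"
```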

@LumiWasTaken (Author)

Also, when running in GPU/CUDA mode, it's common to hit this issue:

```
<class 'RuntimeError'> Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)
```
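A minimal sketch of where that mismatch typically comes from, assuming the script calls `torch.index_select` with a permutation index built on the CPU. The names (`weight`, `perm`) are illustrative, not taken from the actual code.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
weight = torch.randn(8, 8, device=device)
perm = torch.randperm(weight.shape[0])  # created on CPU by default

# This raises "Expected all tensors to be on the same device" when
# weight lives on cuda:0 and perm on the CPU:
# permuted = torch.index_select(weight, 0, perm)

# Fix: move the index tensor to the weight's device first.
permuted = torch.index_select(weight, 0, perm.to(weight.device))
```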

@brucethemoose commented Dec 12, 2022

Yeah, IDK what the memory requirements are, but it maxes out my 16 GB of RAM and eats tons of swap.

And I also noticed some models don't work as an "A" input (with the error you described), but will work as a "B" input.

@LumiWasTaken (Author)

> Yeah, IDK what the memory requirements are, but it maxes out my 16 GB of RAM and eats tons of swap.
>
> And I also noticed some models don't work as an "A" input (with the error you described), but will work as a "B" input.

I'm talking about VRAM.

@brucethemoose
> I'm talking about VRAM.

Yeah, but my theory is that if RAM usage is that high, setting the device to GPU will probably require a similar amount of memory.

@LumiWasTaken (Author)

> Yeah, but my theory is that if RAM usage is that high, setting the device to GPU will probably require a similar amount of memory.

That sounds fair, so using 16 GB of RAM as an equivalent is okay.

But I ran it on a GPU with 24 GB of VRAM, and for testing on an A100 40 GB, and it maxed that out again and ran into an error... so there is that issue.

@ogkalu2 (Owner) commented Dec 12, 2022

Hi. Sorry to hear that. Even I'm unsure of the exact requirements at this point.
Can you try running this commit and see if it works? I think this one was slower but used fewer resources:
93b0e95

@LumiWasTaken (Author)

> Hi. Sorry to hear that. Even I'm unsure of the exact requirements at this point. Can you try running this commit and see if it works? I think this one was slower but used fewer resources: 93b0e95

In this case, I run into the issue again:

```
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)
```

```
SD_rebasin_merge.py --model_a "fileA.ckpt" --model_b "fileB.ckpt" --device cuda
```

@ogkalu2 (Owner) commented Dec 12, 2022

Make your device CPU. It'll still run on CUDA for the parts that it can. If that's what you've been doing, you should also try it with the latest commit.
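For context, a minimal sketch of the mixed-device pattern described here, assuming the script keeps the full state dicts in CPU RAM and only moves individual tensors to the GPU for the heavy math. `correlate` and its names are illustrative, not taken from SD_rebasin_merge.py.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

def correlate(w_a: torch.Tensor, w_b: torch.Tensor) -> torch.Tensor:
    # Move just these two tensors to the GPU for the matmul...
    sim = w_a.to(device).flatten(1) @ w_b.to(device).flatten(1).T
    # ...then bring the (small) result back so VRAM isn't held across layers.
    return sim.cpu()

w_a = torch.randn(16, 3, 3, 3)  # CPU-resident weights
w_b = torch.randn(16, 3, 3, 3)
print(correlate(w_a, w_b).shape)  # torch.Size([16, 16])
```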

@LumiWasTaken (Author)

> Make your device CPU. It'll still run on CUDA for the parts that it can. If that's what you've been doing, you should also try it with the latest commit.

Well, I have seen 0% GPU utilization and 100% CPU.

[screenshot]

@brucethemoose commented Dec 12, 2022

Merges are reasonably fast on CPU; that's not really an issue IMO since they are so infrequent.

But being locked to torch 1.11 because of the CPU requirement kinda is an issue 🤔.

@LumiWasTaken (Author)

> Merges are reasonably fast on CPU; that's not really an issue IMO.
>
> Being locked to torch 1.11 because of the CPU requirement kinda is though 🤔.

It's not really fast for me... and especially when I want to do a larger batch of model merges via a separate script, it's a bit meh.

@brucethemoose commented Dec 12, 2022

> It's not really fast for me... and especially when I want to do a larger batch of model merges via a separate script, it's a bit meh.

Yeah, but even a mega merge script is still gonna take less than 5 minutes.

In the ML world, that's basically free :P

@brucethemoose

(For reference, a merge finishes in about 30 seconds on my 8C 4900HS running Linux.)
