Request for Optimization, fixing issues with CUDA devices #23
Comments
Or when running in GPU / CUDA mode it's common to hit this issue: <class 'RuntimeError'> Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)
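For reference, that RuntimeError usually just means one operand of an op was left on the CPU while the other sits on the GPU. A minimal sketch of the fix (the tensor names here are made up for illustration, not from the project's code):

```python
import torch

# Hypothetical operands illustrating the mismatch: by default both start
# on the CPU; if only one is moved to CUDA, index_select raises the error.
weights = torch.randn(4, 8)
index = torch.tensor([0, 2])

# Pick one device and move *both* operands to it before the op.
device = "cuda" if torch.cuda.is_available() else "cpu"
weights = weights.to(device)
index = index.to(device)

picked = torch.index_select(weights, 0, index)
print(picked.shape)  # torch.Size([2, 8])
```

The same pattern applies when loading checkpoints: passing `map_location="cpu"` to `torch.load` keeps everything on one device until you explicitly move it.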
Yeah, IDK what the memory requirements are, but it maxes out my 16GB of RAM and eats tons of swap. And I also noticed some models don't work as an "A" input (with the error you described), but will work as a "B" input.
Talking about VRAM
Yeah, but my theory is that if RAM usage is that high, setting the device to GPU will probably require a similar amount of memory.
That sounds fair, so using 16GB of RAM as an equivalent is okay. But I ran it on a GPU with 24GB VRAM, and for testing on an A100 40GB, and it maxed that out again and ran into an error... so there is that issue
Hi. Sorry to hear that. Even I'm unsure of the exact requirements at this point.
In this case I run into the issue again:
Make your device CPU. It'll still run on CUDA for the parts that it can. If that's what you've been doing, you should also try it on the latest commit.
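A sketch of what that hybrid approach could look like, assuming the merge is a simple weighted interpolation over state dicts (the function name and `alpha` parameter are my own for illustration, not the project's actual API):

```python
import torch

def merge_state_dicts(sd_a, sd_b, alpha=0.5, device="cpu"):
    """Weighted merge of two state dicts. Each tensor pair is moved to
    `device` only while it is being merged, then the result is moved back
    to CPU, so peak GPU memory stays near one tensor's size instead of
    the whole model's."""
    merged = {}
    for key, tensor_a in sd_a.items():
        a = tensor_a.to(device)
        b = sd_b[key].to(device)
        merged[key] = ((1 - alpha) * a + alpha * b).cpu()
    return merged

# Toy state dicts standing in for real checkpoints.
sd_a = {"w": torch.zeros(2, 2)}
sd_b = {"w": torch.ones(2, 2)}
out = merge_state_dicts(sd_a, sd_b, alpha=0.5)
print(out["w"])  # every entry is 0.5
```

With `device="cuda"` the arithmetic runs on the GPU tensor by tensor, which is the usual way to keep VRAM usage bounded during a merge.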
Merges are reasonably fast on CPU, that's not really an issue IMO since they are so infrequent. But being locked to torch 1.11 because of the CPU requirement kinda is an issue 🤔.
It's not really free for me... and especially when I wanna do a larger batch of model merges via a separate script, it's a bit meh
Yeah, but even a mega merge script is still gonna take less than 5 minutes. In the ML world, that's basically free :P
(For reference, a merge finishes in about 30 seconds on my 8C 4900HS running Linux)
Hey there!
I really love the project and the idea behind it.
Sadly I lack the info to run it properly.
On my device (RTX 3060), running in GPU / CUDA mode very quickly hits an OOM issue, maxing out my 12GB of VRAM when merging two 2GB models.
It's unclear whether the script can handle float16 / float32 mixes, or whether the error "dot function not implemented for 'Half'" is a user / environment issue.
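If that 'Half' error comes from CPU ops that lack fp16 kernels, upcasting the checkpoint to float32 before merging is one common workaround; a minimal sketch (the helper name is hypothetical, not from the project):

```python
import torch

def to_float32(state_dict):
    """Upcast fp16 tensors to fp32 so CPU ops without Half kernels
    (e.g. dot on older torch versions) don't raise NotImplementedError.
    Non-fp16 entries are passed through unchanged."""
    return {k: v.float() if v.dtype == torch.float16 else v
            for k, v in state_dict.items()}

# Toy fp16 checkpoint fragment.
sd = {"w": torch.ones(3, dtype=torch.float16)}
sd32 = to_float32(sd)
print(sd32["w"].dtype)  # torch.float32
```

The merged result can be cast back with `.half()` before saving if a smaller file is wanted; the trade-off is roughly doubled RAM during the merge.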
Fix issues like
<class 'KeyError'> 'model_ema.decay'
for some models that are based on NovelAI or are unpruned?
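One way such EMA bookkeeping keys could be skipped is to merge only the keys both checkpoints share, filtering out `model_ema.*` entries. This is a sketch with a made-up helper, not the project's actual code:

```python
def mergeable_keys(sd_a, sd_b, skip_prefixes=("model_ema.",)):
    """Return the keys present in *both* state dicts, excluding EMA
    bookkeeping entries such as 'model_ema.decay' that NovelAI-based
    or unpruned checkpoints carry; merging only these avoids KeyError."""
    common = sd_a.keys() & sd_b.keys()
    return sorted(k for k in common if not k.startswith(skip_prefixes))

# Toy checkpoints: A carries an EMA entry, B carries an extra key.
sd_a = {"w": 1, "model_ema.decay": 0.99}
sd_b = {"w": 2, "extra": 3}
print(mergeable_keys(sd_a, sd_b))  # ['w']
```

Iterating over this filtered key set instead of `sd_a.keys()` directly also covers the case where a model only works as a "B" input.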
I'd like to have more info about your current environment.
I have desperately tried to get it working on an RTX 5000, but despite all efforts, every attempt to run it on the GPU hits an OOM issue.
Also, feature requests for:
Saving the model every x iterations, so I can compare intermediate results. I have found that after a certain iteration count the results get worse than expected.
Renaming the default output name "merge.ckpt" to something like "model_a_name_without_ext--model_b_name_without_ext--alpha--xxxiter.ckpt"
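A sketch of how both requests could look, with hypothetical helper names (the interval of 50 is an arbitrary example, not a suggestion from the project):

```python
import os

def checkpoint_name(path_a, path_b, alpha, iteration):
    """Build 'model_a--model_b--alpha--NNNiter.ckpt' from the input
    checkpoint paths, dropping directories and extensions."""
    a = os.path.splitext(os.path.basename(path_a))[0]
    b = os.path.splitext(os.path.basename(path_b))[0]
    return f"{a}--{b}--{alpha}--{iteration}iter.ckpt"

def should_save(iteration, every=50):
    """True on every `every`-th iteration, so intermediate merges can be
    written out and compared against later ones."""
    return iteration > 0 and iteration % every == 0

name = checkpoint_name("models/anime.ckpt", "models/photo.ckpt", 0.5, 300)
print(name)  # anime--photo--0.5--300iter.ckpt
```

Embedding the inputs and alpha in the filename also makes batch-merge scripts safe, since successive runs no longer overwrite a single "merge.ckpt".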