Won't get the GPU to get utilized on MacBook with M3 Max and 128 GB RAM. #946
Comments
I had the same issue on a MacBook Air M2 24GB. Just remove these two lines in requirements.txt:
And add this one:
Performance should be a lot better.
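The exact pinned lines aren't reproduced in this thread, but based on the later comment that upgrading onnxruntime to 1.19.2 restored GPU performance, the edited requirements.txt entry might look something like this (the version pin is an assumption taken from that comment, not a confirmed fix):

```
onnxruntime==1.19.2
```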
@BrZHub
> I upgraded the onnxruntime to 1.19.2 and now it does about 20 frames per second

Can you explain how you upgraded the runtime?
@C0untFloyd Any chance you could provide some guidance here? I'm happy to do some testing and add to the wiki; I've got lots of time on my hands.
My requirements.txt file looks like this:
It changed onnx and onnxruntime. On the settings page I set the provider to "coreml". If I run this test clip and swap all faces without adding any additional filters, it runs at an average of 11.5 FPS:
clip.trim.mp4
After looking at this further and at CPU/GPU usage, I'm not actually sure it's using CoreML, but there is no chart to see whether it is using the NPU...
Many thanks. What do you have your number of execution threads set to in settings? I'm not sure if that refers to the CPU or the GPU. I also wondered if we could make use of https://pypi.org/project/onnxruntime-coreml/ somehow. See also https://onnxruntime.ai/docs/execution-providers/CoreML-ExecutionProvider.html My Python is pretty rusty, but I'm happy to collaborate with someone on this.
Have done a bit of digging, and the following is placed in a number of the files which load the models:
My guess is that no use is being made of the GPU, or at least of the Metal layer. I don't have a deep enough understanding of how CoreML works to know how it all fits together.
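The snippet referenced above isn't reproduced in this thread, but a common pattern in code like this is to build an onnxruntime provider list and silently fall back to `CPUExecutionProvider`. Here is a minimal sketch of that selection logic; the provider strings are real onnxruntime names, but `choose_providers` is a hypothetical helper, not actual roop code:

```python
# Hypothetical sketch: pick preferred execution providers when available,
# always keeping the CPU provider as the final fallback.
def choose_providers(available, preferred=("CoreMLExecutionProvider",)):
    """Return preferred providers that are actually available,
    always ending with the CPU fallback."""
    chosen = [p for p in preferred if p in available]
    return chosen + ["CPUExecutionProvider"]

# On a stock CPU-only build, only the fallback remains:
print(choose_providers(["CPUExecutionProvider"]))
# → ['CPUExecutionProvider']

# On an Apple-silicon build with CoreML support, CoreML comes first:
print(choose_providers(["CoreMLExecutionProvider", "CPUExecutionProvider"]))
# → ['CoreMLExecutionProvider', 'CPUExecutionProvider']
```

In real code this list would be passed to `onnxruntime.InferenceSession(model_path, providers=...)`; if a hard-coded fallback to CPU overrides it somewhere, CoreML never gets a chance to run.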
Sorry I'm currently very short on time and I don't own a Mac.
You could comment out every line where this is done and see if it makes a difference. I sadly don't remember why there is this fallback to CPU. If it works, this could easily be made into a config setting.
Thanks. I'll create a fork and let you know if I get it working.
Thanks all for your comments and ideas.
I can do tests with your fork on my MacBook if it helps.
It appears that onnxruntime simply doesn't support devices other than CUDA, such as MPS. I tested removing the MPS-to-CPU replacement code on a Mac M4, with the results shown below.
It seems that unless the onnxruntime issue is resolved, Mac devices won't be able to use CoreML acceleration for roop.
That's not actually the case. There is a CoreML execution provider; it's just that the code as it stands doesn't really make use of it. Newer versions of onnxruntime also directly support Apple silicon, but the packages in this repo are pinned to earlier versions. I've been experimenting with all this, converting some of the models to CoreML and forcing CoreML where I can in the existing code. I've seen slight improvements in frame rate, but nothing spectacular yet. In addition, there are allegedly speed gains to be made in the cv2 code by using UMat instead of Mat. I'll be trying all this out on an ad-hoc basis, so don't hold your breath, but I'll report back if I make significant progress. In the meantime I'm using a GPU cloud instance with an NVIDIA card.
Alright, I confirm: changing the onnx and onnxruntime versions does enable CoreML, and FPS hits 10 to 15 on an M2 Pro. However, this seems to work only on the first run. The second video reverts to CPU and a crawling 0.7 to 2 FPS max. Restarting the app enables CoreML again. Strange.
Describe the bug
I can't get the GPU to be utilized on my MacBook.
Other apps, such as LLM runners, can use up to 70 GB of RAM for the graphics processor.
To Reproduce
Steps to reproduce the behavior:
I've enabled CoreML, Max. Number of Threads = 18, GFPGAN and the other processors.
Same problem with Max. Number of Threads = 3, GFPGAN and the other processors.
Same problem with Max. Number of Threads = 8, GFPGAN and the other processors.
My configuration is:
MacBook Pro 16" 2023
M3 Max
128 GB RAM
Python 3.11
The rate is quite low, around 1 to 2 seconds per frame, and it mostly hangs, not progressing for 3 to 5 seconds, then recalculating at 1 to 2 seconds per frame.
Details
What OS are you using?
Are you using a GPU?
Which version of roop unleashed are you using?
4.3.1
Screenshots
If applicable, add screenshots to help explain your problem.