
Can't get the GPU utilized on MacBook with M3 Max and 128 GB RAM #946

Open

Gabbelgu opened this issue Oct 8, 2024 · 13 comments

Gabbelgu commented Oct 8, 2024

Describe the bug
I can't get the GPU to be utilized on my MacBook.
Other apps, such as local LLMs, can use up to 70 GB of RAM for the graphics processor.

To Reproduce
Steps to reproduce the behavior:
I've enabled CoreML, set Max. Number of Threads = 18, and enabled GFPGAN and the other processors.
Same problem with Max. Number of Threads = 3, GFPGAN and the other processors.
Same problem with Max. Number of Threads = 8, GFPGAN and the other processors.

My configuration is:

MacBook Pro 16" 2023
M3 Max
128 GB RAM
Python 3.11
The rate is quite low, around 1-2 s per frame, and it frequently stalls, making no progress for 3-5 s before resuming at 1-2 s per frame.

Details
What OS are you using?

  • [ ] Linux
  • [ ] Linux in WSL
  • [ ] Windows
  • [x] Mac

Are you using a GPU?

  • [ ] No. CPU FTW
  • [ ] NVIDIA
  • [ ] AMD
  • [ ] Intel
  • [x] Mac

Which version of roop unleashed are you using?
4.3.1

Screenshots
If applicable, add screenshots to help explain your problem.


BrZHub commented Oct 10, 2024

I had the same issue on a MacBook Air M2 24GB.
It took about 2 seconds per frame.
I upgraded onnxruntime to 1.19.2 and now it does about 20 frames per second.

Just remove these two lines in requirements.txt:

onnxruntime==1.17.1; sys_platform == 'darwin' and platform_machine != 'arm64'
onnxruntime-silicon==1.16.3; sys_platform == 'darwin' and platform_machine == 'arm64'

And add this one:

onnxruntime==1.19.2; sys_platform == 'darwin'

And performance should be a lot better
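
If you want to confirm the upgrade took, a quick generic onnxruntime check (not from this repo) is to ask the installed wheel which execution providers it actually ships:

import onnxruntime as ort

print(ort.__version__)                # should report 1.19.2
print(ort.get_available_providers())  # 'CoreMLExecutionProvider' should be in this list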


Gabbelgu commented Oct 10, 2024

Thank you. I tried removing the two lines and adding the one line in requirements.txt, but it is not working for me.

(Screenshots attached: Bildschirmfoto 2024-10-10 um 23 50 58, Bildschirmfoto 2024-10-10 um 23 42 12)


codecowboy commented Oct 29, 2024

> I upgraded the onnxruntime to 1.19.2 and now it does about 20 frames per second

@BrZHub Can you explain how you upgraded the runtime? python -m pip install onnxruntime==1.19.2? I'm on an M1 Pro with 16GB which is also doing about 2 frames per second. It also seems like platform_machine == 'arm64' would be fairly important?

@codecowboy

@C0untFloyd Any chance you could provide some guidance here? Am happy to do some testing and add to the wiki - have got lots of time on my hands


BrZHub commented Oct 30, 2024

> Can you explain how you upgraded the runtime? python -m pip install onnxruntime==1.19.2? I'm on an M1 Pro with 16GB which is also doing about 2 frames per second. It also seems like platform_machine == 'arm64' would be fairly important?

My requirements.txt file looks like this:

--extra-index-url https://download.pytorch.org/whl/cu118

numpy==1.26.4
gradio==4.44.0
fastapi<0.113.0
opencv-python-headless==4.9.0.80
onnx==1.17.0
insightface==0.7.3
albucore==0.0.16
psutil==5.9.6
torch==2.1.2+cu118; sys_platform != 'darwin'
torch==2.1.2; sys_platform == 'darwin'
torchvision==0.16.2+cu118; sys_platform != 'darwin'
torchvision==0.16.2; sys_platform == 'darwin'
onnxruntime==1.19.2; sys_platform == 'darwin'
onnxruntime-gpu==1.17.1; sys_platform != 'darwin'
tqdm==4.66.4
ftfy
regex
pyvirtualcam

This changed onnx and onnxruntime.
The dependencies listed in this file are installed when you start runMacOS.sh, so it probably overrides anything you install manually using "pip install".

On the settings page I set the provider to "coreml"
(screenshot of the settings page attached)

If I run this test clip and swap all faces without adding any additional filters, it runs at an average of 11.5 FPS:

Processing clip.trim_12-39-03.mp4 took 55.71 secs, 11.52 frames/s

(attached: clip.trim.mp4)

After looking at this further and at CPU/GPU usage, I'm not actually sure it's using CoreML, but there is no chart to see if it is using the NPU...
Upgrading the ONNX libraries did, however, increase performance by 5x on my machine (15" MacBook Air M2).
So there might be more gains to be made.
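
For what it's worth, one way to check whether a given model actually ends up on CoreML (a generic onnxruntime sketch, not roop code; "model.onnx" is a placeholder path) is to inspect the providers the session resolves:

import onnxruntime as ort

session = ort.InferenceSession(
    "model.onnx",  # placeholder for any of the models roop loads
    providers=["CoreMLExecutionProvider", "CPUExecutionProvider"],
)
# CoreML should be listed first if the session accepted it
print(session.get_providers())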


codecowboy commented Oct 30, 2024

Many thanks. What do you have your number of execution threads set to in settings? I'm not sure if that refers to the CPU or the GPU.
I've now tried editing requirements.txt as per yours but don't see a performance increase.

I also wondered if we could make use of https://pypi.org/project/onnxruntime-coreml/ somehow.

See also https://onnxruntime.ai/docs/execution-providers/CoreML-ExecutionProvider.html

My Python is pretty rusty, but I'm happy to collaborate with someone on this.
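
One thing worth trying from the linked EP docs (untested here; "model.onnx" is a placeholder): raise the session log level so onnxruntime reports how much of the graph the CoreML EP actually claims.

import onnxruntime as ort

opts = ort.SessionOptions()
opts.log_severity_level = 1  # INFO; 0 = VERBOSE for even more detail

session = ort.InferenceSession(
    "model.onnx",
    sess_options=opts,
    providers=["CoreMLExecutionProvider", "CPUExecutionProvider"],
)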

@codecowboy

I've done a bit of digging; the following appears in a number of files that load the models:


# replace Mac mps with cpu for the moment
self.devicename = self.plugin_options["devicename"].replace('mps', 'cpu')

My guess is that no use is being made of the GPU, or at least of the Metal layer. I don't have a deep enough understanding of how CoreML works to know how it all fits together.

@C0untFloyd (Owner)

> Any chance you could provide some guidance here? Am happy to do some testing and add to the wiki - have got lots of time on my hands

Sorry, I'm currently very short on time and I don't own a Mac.

self.devicename = self.plugin_options["devicename"].replace('mps', 'cpu')

You could comment out every line where this is done and see if it makes a difference. I sadly don't remember why there is this fallback to CPU. If it works, this could easily be made into a config setting, e.g. as sketched below.
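
For reference, a rough sketch of what that setting could look like inside the same method (untested; "force_cpu_on_mac" is a hypothetical config key, not an existing roop option):

# Gate the mps -> cpu fallback behind a config flag instead of hardcoding it.
devicename = self.plugin_options["devicename"]
if self.plugin_options.get("force_cpu_on_mac", True):
    # current behaviour: fall back to CPU on Apple Silicon
    devicename = devicename.replace('mps', 'cpu')
self.devicename = devicename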

@codecowboy

Thanks. I’ll create a fork and let you know if I get it working.

@Gabbelgu (Author)

Thanks all for your comments and ideas.

> Thanks. I’ll create a fork and let you know if I get it working.

I can do tests with your fork on my MacBook if it helps.


tookdes commented Nov 26, 2024

> You could comment out every line where this is done, see if it makes a difference. I sadly don't remember why there is this fallback to cpu.

#269

It appears to be because onnxruntime simply does not support device types other than CPU and CUDA, such as MPS. I tested removing the MPS-to-CPU replacement code on a Mac M4, with the results shown below.

onnxruntime_inference_collection.py", line 32, in get_ort_device_type
    raise Exception("Unsupported device type: " + device_type)
Exception: Unsupported device type: mps

It seems that unless the onnxruntime issue is resolved, Mac devices won't be able to use CoreML acceleration for roop.


codecowboy commented Nov 26, 2024

That's not actually the case. There is a CoreML execution provider; it's just that the code as it stands doesn't really make use of it. Newer versions of onnxruntime also directly support Apple Silicon, but the packages in this repo are pinned to earlier versions.
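
To illustrate the distinction (my own sketch, not roop code; "model.onnx" is a placeholder): the "mps" string only fails when passed to onnxruntime as a device name, whereas CoreML acceleration is requested through the providers list, with inputs staying on the CPU.

import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "model.onnx",
    providers=["CoreMLExecutionProvider", "CPUExecutionProvider"],
)
inp = session.get_inputs()[0]
# Dynamic dimensions show up as strings/None; substitute 1 just to build a dummy input.
shape = [d if isinstance(d, int) else 1 for d in inp.shape]
# Feed a CPU-side numpy array; the CoreML EP handles any device transfer itself.
outputs = session.run(None, {inp.name: np.zeros(shape, dtype=np.float32)})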

I've been experimenting with all this: converting some of the models to CoreML and forcing CoreML where I can in the existing code. I've seen slight improvements in frame rate but nothing spectacular yet. In addition, there are allegedly speed gains to be made in the cv2 code by using UMat instead of Mat (rough sketch below). I'll be trying all this out on an ad hoc basis, so don't hold your breath, but I'll report back if I make significant progress. In the meantime I'm using a GPU cloud instance with an NVIDIA card.
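
For anyone curious, the UMat idea looks roughly like this (an illustration of OpenCV's transparent OpenCL path, with a placeholder input; any gains on Apple Silicon are unverified):

import cv2

frame = cv2.imread("frame.png")   # placeholder input image
uframe = cv2.UMat(frame)          # upload to a UMat; ops may then run via OpenCL
gray = cv2.cvtColor(uframe, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (5, 5), 0)
result = blurred.get()            # download back to a regular numpy array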

@rdastartupguy

> After looking at this further and looking at CPU/GPU usage, I'm not actually sure it's using CoreML, but there is no chart to see if it is using the NPU... But upgrading the ONNX libraries did increase the performance by 5x on my machine. So there might be more gains to make.

Alright, I can confirm: changing the onnx and onnxruntime versions does enable CoreML, and FPS hits 10 to 15 on an M2 Pro. However, this seems to work only on the first run. The second video reverts to CPU and crawls at 0.7 to 2 fps at most. Restarting the app enables CoreML again. Strange.
