Same here. I'm running a 1B model in Xcode on an iPhone 16 Pro Max simulator, and the generation speed is 0.07 tokens/s. I'm not sure whether this is actually the expected generation speed or whether I'm missing something (e.g., GPU acceleration). I have "Prefer Discrete GPU" selected in the simulator's GPU settings.
These are my specs:
MacBook Pro / M2 / 16 GB / Sonoma 14.7
I load the Llama 2 model as in the example successfully, but the speed of text generation is really slow.
[1] I'm not sure it uses MPS to accelerate generation. How can I confirm that? (See the sketch below.)
[2] Is there a smaller LLM than 7B?
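For [1], assuming the example loads Llama 2 through PyTorch (the mention of `mps` suggests the PyTorch Metal backend), here is a minimal sketch for checking whether MPS is available and in use; `model` is a placeholder for whatever model object the example returns:

```python
import torch

# Check that the MPS (Metal Performance Shaders) backend is compiled into
# this PyTorch build and usable on the current macOS/hardware.
print("MPS built:    ", torch.backends.mps.is_built())
print("MPS available:", torch.backends.mps.is_available())

if torch.backends.mps.is_available():
    device = torch.device("mps")
    # A tiny matmul on the GPU; if this runs without error, MPS works.
    x = torch.randn(4, 4, device=device)
    print((x @ x).device)  # -> mps:0

# For a loaded model, its parameters report where inference will run:
#   next(model.parameters()).device  -> device(type='mps') or device(type='cpu')
# A model left on 'cpu' would explain slow generation; it can be moved
# explicitly with model.to('mps').
```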
Here is my env