Use of all the CPU cores on Raspberry Pi 5 with llamafile #337
Replies: 4 comments
- Maybe @jart can help here?
- Normally it uses all cores by default; no configuration is needed. (At least it does on x86.)
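If it doesn't pick up all cores automatically, you can pin the thread count explicitly: llamafile inherits llama.cpp's `-t`/`--threads` flag. A minimal sketch, where `model.llamafile` is a placeholder name for whatever model you downloaded:

```shell
# Pi 5 has 4 Cortex-A76 cores; nproc reports the count at runtime.
# -t sets the number of CPU threads used for inference.
# "model.llamafile" is a placeholder for your actual model file.
./model.llamafile -t "$(nproc)" -p "Why is the sky blue?"
```

You can also pass a smaller number than `nproc` if you want to keep a core free for other work on the Pi.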
- You can speed up prompt processing by using F16 weights on the RPI5, which is fast. For faster token generation speeds, your best bet at the moment would probably be
- Do we have a survey mechanism, perhaps? E.g. what if we had a binary that runs through different benchmarks, then writes a single sentence telling the user which mode they should use? Perhaps even better if it had an optional "do you want to upload your spec?" step feeding a leaderboard. In the README you could tell users to download the benchmark first, run it through, and then pick a model from Hugging Face that matches their machine's capabilities.
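The flow above could be sketched as a small script: time a short run at each thread count and report the fastest. This is a hypothetical sketch only; the actual llamafile invocation is stubbed out with `sleep` so the harness itself runs anywhere, and you would swap in a real model call to measure for real.

```shell
#!/bin/sh
# Hypothetical benchmark-then-recommend sketch (not a real llamafile tool).
best_t=1
best_ms=999999999
for t in 1 2 4; do                 # thread counts to try (Pi 5 has 4 cores)
  start=$(date +%s%N)
  sleep 0.1                        # stub for: ./model.llamafile -t "$t" -n 16 -p "test"
  end=$(date +%s%N)
  ms=$(( (end - start) / 1000000 ))
  if [ "$ms" -lt "$best_ms" ]; then
    best_ms=$ms
    best_t=$t
  fi
done
echo "Fastest run used -t $best_t (${best_ms} ms)"
```

A real version would also vary quantization (F16 vs. Q4, say) and print the model-size recommendation in the same sentence.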
- Hi,
To use all the CPU cores on the Raspberry Pi 5 to speed up inference, what configuration is needed with llamafile?