Using whisperfile vs whisper.cpp (or server) for Linux speech input #551
Replies: 2 comments 1 reply
Thanks Justine!
After sending the speech recording, however, the whisperfile server bombs out with a segmentation fault. I looked briefly at the source code of server.cpp in this repo and compared it against the upstream file, and I do not see substantial differences. Yet in server mode, and only with --gpu auto, whisperfile with the newly compiled ggml-cuda.so seems to misbehave.
Hello All,
A big fan of llamafile (thanks, Justine), I like the addition of whisperfile (thanks, CJ Pais) to the latest release. As the author of BlahST (Blah Speech to Text), a lean, low-resource tool for entering text from speech into any Linux window using whisper.cpp, I have just added whisperfile support to BlahST. This should make things easier for users who would otherwise have to compile whisper.cpp's main or server binaries on their Linux machines in order to use BlahST.
A few benchmarks: on an AMD64 machine (znver4) with 12 GB VRAM (CUDA 12.5), using 8 CPU threads, I am getting about 90x realtime (90xRT: 7.6 s of 16-bit audio returned as text in 84 ms) transcription of microphone speech with the whisper.cpp server. That is the fastest setup, partly because the model (base.en) is preloaded in VRAM and the GPU is ready.
It is about 23xRT with whisperfile.tiny.en and about 4xRT when using whisperfile.small. This compares favorably with whisper.cpp main (tiny.en), which transcribes at 21xRT.
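For anyone reproducing these numbers, the realtime factor above is simply the clip duration divided by the processing time. A minimal sketch of that arithmetic (the 7.6 s / 84 ms figures are taken from the benchmark above; the variable names are my own):

```shell
# Compute the realtime factor (xRT) of a transcription run:
# audio_s  = clip length in seconds, proc_ms = processing time in milliseconds.
audio_s=7.6
proc_ms=84
awk -v a="$audio_s" -v p="$proc_ms" 'BEGIN { printf "%.0fxRT\n", a / (p / 1000) }'
# prints "90xRT"
```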
The whisperfile runs are without the --gpu auto flag, since for the small models and short speech clips in this use case the GPU startup overhead dominates and there is no speedup. I have not yet tried setting up whisperfile as a server; that should be closer to the whisper.cpp server in performance. I will post here when I have a comparison of the servers with larger speech clips. Thanks again for this heavy-hitting, portable-executable concept and all the clever optimizations of the linear algebra code in ggml!