<!-- https://mkdocstrings.github.io/python/usage/configuration/general/ -->
::: faster_whisper_server.config.Config
    options:
      show_bases: true
      show_if_no_docstring: true
      show_labels: false
      separate_signature: true
      show_signature_annotations: true
      signature_crossrefs: true
      summary: false
      source: true
      members_order: source
      filters:
        - "!model_config"
        - "!chat_completion_*"
        - "!speech_*"
        - "!transcription_*"

::: faster_whisper_server.config.WhisperConfig

<!-- TODO: nested model `whisper` -->
<!-- TODO: Insert new lines for multi-line docstrings -->
## Docker Compose (Recommended)

TODO: just reference the existing compose file in the repo

=== "CUDA"

    ```yaml
    # https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
    services:
      faster-whisper-server:
        image: fedirz/faster-whisper-server:latest-cuda
        container_name: faster-whisper-server
        restart: unless-stopped
        ports:
          - 8000:8000
        volumes:
          - hugging_face_cache:/root/.cache/huggingface
        deploy:
          resources:
            reservations:
              devices:
                - capabilities: ["gpu"]
    volumes:
      hugging_face_cache:
    ```

=== "CUDA (with CDI feature enabled)"

    ```yaml
    # https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
    services:
      faster-whisper-server:
        image: fedirz/faster-whisper-server:latest-cuda
        container_name: faster-whisper-server
        restart: unless-stopped
        ports:
          - 8000:8000
        volumes:
          - hugging_face_cache:/root/.cache/huggingface
        deploy:
          resources:
            reservations:
              # https://docs.docker.com/reference/cli/dockerd/#enable-cdi-devices
              # https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html
              devices:
                - driver: cdi
                  device_ids:
                    - nvidia.com/gpu=all
    volumes:
      hugging_face_cache:
    ```

=== "CPU"

    ```yaml
    services:
      faster-whisper-server:
        image: fedirz/faster-whisper-server:latest-cpu
        container_name: faster-whisper-server
        restart: unless-stopped
        ports:
          - 8000:8000
        volumes:
          - hugging_face_cache:/root/.cache/huggingface
    volumes:
      hugging_face_cache:
    ```
## Docker

=== "CUDA"

    ```bash
    docker run --rm --detach --publish 8000:8000 --name faster-whisper-server --volume hugging_face_cache:/root/.cache/huggingface --gpus=all fedirz/faster-whisper-server:latest-cuda
    ```

=== "CUDA (with CDI feature enabled)"

    ```bash
    docker run --rm --detach --publish 8000:8000 --name faster-whisper-server --volume hugging_face_cache:/root/.cache/huggingface --device=nvidia.com/gpu=all fedirz/faster-whisper-server:latest-cuda
    ```

=== "CPU"

    ```bash
    docker run --rm --detach --publish 8000:8000 --name faster-whisper-server --volume hugging_face_cache:/root/.cache/huggingface fedirz/faster-whisper-server:latest-cpu
    ```
## Kubernetes

!!! warning

    This section was written several months ago and may be outdated.

Please refer to this [blog post](https://substratus.ai/blog/deploying-faster-whisper-on-k8s).
## Python (requires Python 3.12+)

```bash
git clone https://github.com/fedirz/faster-whisper-server.git
cd faster-whisper-server
uv venv
source .venv/bin/activate
uv sync --all-extras
uvicorn --factory --host 0.0.0.0 faster_whisper_server.main:create_app
```
!!! warning

    Under development. I don't recommend relying on these docs as a reference yet.

# Faster Whisper Server

`faster-whisper-server` is an OpenAI API-compatible transcription server that uses [faster-whisper](https://github.com/SYSTRAN/faster-whisper) as its backend.

Features:

- GPU and CPU support.
- Easily deployable using Docker.
- **Configurable through environment variables (see [config.py](./src/faster_whisper_server/config.py))**.
- OpenAI API compatible.
- Streaming support: the transcription is sent via [SSE](https://en.wikipedia.org/wiki/Server-sent_events) as the audio is transcribed, so you don't need to wait for the whole file to be processed before receiving results.
- Live transcription support: audio is sent via WebSocket as it's generated.
- Dynamic model loading/offloading: just specify which model you want to use in the request, and it will be loaded automatically and then unloaded after a period of inactivity.
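Consuming the SSE stream mentioned above mostly comes down to extracting `data:` lines. A minimal, dependency-free sketch, run here on canned lines since real use requires a live server; the exact payload format of each event is server-specific and not shown:

```python
def sse_data_events(lines):
    """Yield the payload of each `data:` line in an SSE stream."""
    for line in lines:
        if line.startswith("data: "):
            yield line[len("data: "):]


# Canned example; a real client would iterate over the lines of a
# streaming HTTP response instead.
canned = ["data: Hello", "", "data: world", ""]
print(list(sse_data_events(canned)))  # ['Hello', 'world']
```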
Please create an issue if you find a bug, have a question, or have a feature suggestion.

## OpenAI API Compatibility ++

See the [OpenAI API reference](https://platform.openai.com/docs/api-reference/audio) for more information.
- Audio file transcription via the `POST /v1/audio/transcriptions` endpoint.
    - Unlike OpenAI's API, `faster-whisper-server` also supports streaming transcriptions (and translations). This is useful when you want to process large audio files and would rather receive the transcription in chunks as they are processed, instead of waiting for the whole file to be transcribed. It works similarly to streaming chat messages from LLMs.
- Audio file translation via the `POST /v1/audio/translations` endpoint.
- Live audio transcription via the `WS /v1/audio/transcriptions` endpoint.
    - The LocalAgreement2 ([paper](https://aclanthology.org/2023.ijcnlp-demo.3.pdf) | [original implementation](https://github.com/ufal/whisper_streaming)) algorithm is used for live transcription.
    - Only single-channel, 16000 Hz, raw, 16-bit little-endian audio is supported.
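The core idea of LocalAgreement-2 is to commit only the prefix on which the two most recent hypotheses agree, which keeps live output stable as new audio revises the tail. A simplified illustration of that rule, not the project's actual implementation:

```python
def agreed_prefix(prev_words, curr_words):
    """Longest common prefix of two consecutive hypotheses; in
    LocalAgreement-2 only this agreed prefix is committed as output."""
    committed = []
    for a, b in zip(prev_words, curr_words):
        if a != b:
            break
        committed.append(a)
    return committed


print(agreed_prefix("the quick brown".split(), "the quick crown".split()))
# ['the', 'quick']
```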
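The required audio format for the live endpoint (mono, 16000 Hz, raw 16-bit little-endian PCM) can be produced with the standard library alone. A small sketch generating a test tone in that format; the WebSocket send itself is omitted:

```python
import math
import struct

SAMPLE_RATE = 16000  # Hz, as required by the live transcription endpoint


def sine_pcm(duration_s: float = 0.1, freq_hz: float = 440.0) -> bytes:
    """Mono 16-bit little-endian PCM at 16 kHz ("<h" = little-endian int16)."""
    n = int(SAMPLE_RATE * duration_s)
    return b"".join(
        struct.pack("<h", int(32767 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)))
        for i in range(n)
    )


chunk = sine_pcm()
print(len(chunk))  # 3200 bytes: 1600 samples * 2 bytes each
```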
TODO: add a note about gradio ui
TODO: add a note about hf space
TODO: break this down into: transcription/translation, streaming transcription/translation, live transcription, audio generation, model listing
TODO: add video demos for all
TODO: add a note about OPENAI_API_KEY

## Curl

```bash
curl http://localhost:8000/v1/audio/transcriptions -F "file=@audio.wav"
```
## Python

=== "httpx"

    ```python
    import httpx

    with open('audio.wav', 'rb') as f:
        files = {'file': ('audio.wav', f)}
        response = httpx.post('http://localhost:8000/v1/audio/transcriptions', files=files)

    print(response.text)
    ```
## OpenAI SDKs

=== "Python"

    ```python
    from openai import OpenAI

    client = OpenAI(base_url='http://localhost:8000/v1', api_key='cant-be-empty')

    with open('audio.wav', 'rb') as f:
        transcript = client.audio.transcriptions.create(
            model='Systran/faster-whisper-small', file=f
        )

    print(transcript.text)
    ```
=== "CLI" | ||
|
||
```bash | ||
export OPENAI_BASE_URL=http://localhost:8000/v1/ | ||
export OPENAI_API_KEY="cant-be-empty" | ||
openai api audio.transcriptions.create -m Systran/faster-whisper-small -f audio.wav --response-format text | ||
``` | ||
|
||
=== "Other" | ||
|
||
See [OpenAI libraries](https://platform.openai.com/docs/libraries) and [OpenAI speech-to-text usage](https://platform.openai.com/docs/guides/speech-to-text). | ||
|
||
## Open WebUI

### Using the UI

1. Go to the [Admin Settings](http://localhost:8080/admin/settings) page
2. Click on the "Audio" tab
3. Update the settings
    - Speech-to-Text Engine: OpenAI
    - API Base URL: http://faster-whisper-server:8000/v1
    - API Key: does-not-matter-what-you-put-but-should-not-be-empty
    - Model: Systran/faster-distil-whisper-large-v3
4. Click "Save"
### Using environment variables (Docker Compose)

!!! warning

    This doesn't seem to work when you've previously used the UI to set the STT engine.

```yaml
# NOTE: Some parts of the file are omitted for brevity.
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ...
    environment:
      ...
      # Environment variables are documented at https://docs.openwebui.com/getting-started/env-configuration#speech-to-text
      AUDIO_STT_ENGINE: "openai"
      AUDIO_STT_OPENAI_API_BASE_URL: "http://faster-whisper-server:8000/v1"
      AUDIO_STT_OPENAI_API_KEY: "does-not-matter-what-you-put-but-should-not-be-empty"
      AUDIO_STT_MODEL: "Systran/faster-distil-whisper-large-v3"
  faster-whisper-server:
    image: fedirz/faster-whisper-server:latest-cuda
    ...
```