Merge pull request #104 from makaveli10/tensorrt_backend
Tensorrt backend
makaveli10 authored Jan 24, 2024
2 parents 0942dc2 + 5cd59b1 commit 3498787
Showing 13 changed files with 1,460 additions and 175 deletions.
135 changes: 79 additions & 56 deletions README.md
@@ -8,69 +8,89 @@ Unlike traditional speech recognition systems that rely on continuous audio stre
## Installation
- Install PyAudio and ffmpeg
```bash
bash scripts/setup.sh
```

- Install whisper-live from pip
```bash
pip install whisper-live
```

### Setting up NVIDIA/TensorRT-LLM for TensorRT backend
- Please follow [TensorRT_whisper readme](https://github.com/collabora/WhisperLive/blob/main/TensorRT_whisper.md) for setup of [NVIDIA/TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM) and for building Whisper-TensorRT engine.

## Getting Started
- Run the server
The server supports two backends: `faster_whisper` and `tensorrt`. If running the `tensorrt` backend, follow the [TensorRT_whisper readme](https://github.com/collabora/WhisperLive/blob/main/TensorRT_whisper.md).
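The server can also be started programmatically. A minimal sketch based on the `run_server.py` changes in this PR (the keyword arguments mirror the new CLI flags):
```python
from whisper_live.server import TranscriptionServer

server = TranscriptionServer()
server.run(
    "0.0.0.0",
    port=9090,
    backend="faster_whisper",  # or "tensorrt", which also needs whisper_tensorrt_path
)
```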

### Running the Server
- [Faster Whisper](https://github.com/SYSTRAN/faster-whisper) backend
```bash
python3 run_server.py --port 9090 \
--backend faster_whisper

# running with custom model
python3 run_server.py --port 9090 \
--backend faster_whisper \
-fw "/path/to/custom/faster/whisper/model"
```

- TensorRT backend. Currently, we recommend using only the Docker setup for TensorRT. Follow the [TensorRT_whisper readme](https://github.com/collabora/WhisperLive/blob/main/TensorRT_whisper.md) for setup, and make sure to build your TensorRT engines before running the server with the TensorRT backend.
```bash
# Run English only model
python3 run_server.py -p 9090 \
-b tensorrt \
-trt /home/TensorRT-LLM/examples/whisper/whisper_small_en

# Run Multilingual model
python3 run_server.py -p 9090 \
-b tensorrt \
-trt /home/TensorRT-LLM/examples/whisper/whisper_small \
-m
```


### Running the Client
- To transcribe an audio file:
```python
from whisper_live.client import TranscriptionClient
client = TranscriptionClient(
    "localhost",
    9090,
    is_multilingual=False,
    lang="en",
    translate=False,
    model_size="small"
)

client("tests/jfk.wav")
```
This command transcribes the specified audio file (`tests/jfk.wav`) using the Whisper model. It connects to the server running on localhost at port 9090. The multilingual option enables transcription in multiple languages, and the `lang` option specifies the language, in this case English (`"en"`). Set `translate=True` to translate from the source language to English, or `False` to transcribe in the source language.
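The same constructor also covers file translation. A short sketch using the options above (the French audio path is hypothetical):
```python
from whisper_live.client import TranscriptionClient

# translate French speech to English text
client = TranscriptionClient(
    "localhost",
    9090,
    is_multilingual=True,
    lang="fr",
    translate=True,
    model_size="small"
)

client("path/to/french_audio.wav")  # hypothetical file
```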


- To transcribe from microphone:
```python
from whisper_live.client import TranscriptionClient
client = TranscriptionClient(
    "localhost",
    9090,
    is_multilingual=True,
    lang="hi",
    translate=True,
    model_size="small"
)
client()
```
This command captures audio from the microphone and sends it to the server for transcription. It enables the multilingual feature, with `hi` as the selected language and the task set to translation. We use Whisper `small` by default, but this can be changed to any other size based on the requirements and the hardware running the server.

- To transcribe from an HLS stream:
```python
from whisper_live.client import TranscriptionClient
client = TranscriptionClient("localhost", 9090, is_multilingual=True, lang="en", translate=False)
client(hls_url="http://as-hls-ww-live.akamaized.net/pool_904/live/ww/bbc_1xtra/bbc_1xtra.isml/bbc_1xtra-audio%3d96000.norewind.m3u8")
```
This command streams audio from an HLS stream to the server for transcription. It uses the same options as the previous commands, enabling the multilingual feature and specifying the selected language and task.

## Transcribe audio from browser
- Run the server with your desired backend as shown [here](https://github.com/collabora/WhisperLive?tab=readme-ov-file#running-the-server)

### Chrome Extension
- Refer to [Audio-Transcription-Chrome](https://github.com/collabora/whisper-live/tree/main/Audio-Transcription-Chrome#readme) to use Chrome extension.
@@ -80,21 +100,24 @@ This would start the websocket server on port ```9090```.

## Whisper Live Server in Docker
- GPU
  - Faster-Whisper
```bash
docker build . -t whisper-live -f docker/Dockerfile.gpu
docker run -it --gpus all -p 9090:9090 whisper-live:latest
```

  - TensorRT. Follow the [TensorRT_whisper readme](https://github.com/collabora/WhisperLive/blob/main/TensorRT_whisper.md) to set up Docker and use the TensorRT backend. We provide a pre-built docker image with TensorRT-LLM built and ready to use.

- CPU
```bash
docker build . -t whisper-live -f docker/Dockerfile.cpu
docker run -it -p 9090:9090 whisper-live:latest
```
**Note**: By default we use the "small" model size. To build a docker image for a different model size, change the size in server.py and then build the docker image.

## Future Work
- [ ] Add translation to other languages on top of transcription.
- [ ] TensorRT backend for Whisper.
- [x] TensorRT backend for Whisper.

## Contact

66 changes: 66 additions & 0 deletions TensorRT_whisper.md
@@ -0,0 +1,66 @@
# Whisper-TensorRT
We have only tested the TensorRT backend in Docker, so we recommend Docker for a smooth TensorRT backend setup.
**Note**: We use [our fork of TensorRT-LLM](https://github.com/makaveli10/TensorRT-LLM) for the setup.

## Installation
- Install [docker](https://docs.docker.com/engine/install/)
- Install [nvidia-container-toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html)

- Clone this repo.
```bash
git clone https://github.com/collabora/WhisperLive.git
cd WhisperLive
```

- Pull the TensorRT-LLM docker image which we prebuilt for the WhisperLive TensorRT backend.
```bash
docker pull ghcr.io/collabora/whisperbot-base:latest
```

- Next, run the docker image and mount the WhisperLive repo to the container's `/home` directory.
```bash
docker run -it --gpus all --shm-size=8g \
--ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
-v /path/to/WhisperLive:/home/WhisperLive \
ghcr.io/collabora/whisperbot-base:latest
```

- Make sure to test the installation.
```bash
# export ENV=${ENV:-/etc/shinit_v2}
# source $ENV
python -c "import torch; import tensorrt; import tensorrt_llm"
```
**NOTE**: Uncomment and update library paths if imports fail.

## Whisper TensorRT Engine
- We build a `small.en` (English-only) or `small` (multilingual) Whisper TensorRT engine. The script logs the path of the directory containing the engine; that model path is needed to run the server.
```bash
# convert small.en
bash scripts/build_whisper_tensorrt.sh /root/TensorRT-LLM-examples small.en

# convert small multilingual model
bash scripts/build_whisper_tensorrt.sh /root/TensorRT-LLM-examples small
```

## Run WhisperLive Server with TensorRT Backend
```bash
cd /home/WhisperLive

# Install requirements
pip install -r requirements/server.txt

# Required to create the mel spectrogram
wget --directory-prefix=assets https://raw.githubusercontent.com/openai/whisper/main/whisper/assets/mel_filters.npz

# Run English only model
python3 run_server.py --port 9090 \
--backend tensorrt \
--trt_model_path "path/to/whisper_trt/from/build/step"

# Run Multilingual model
python3 run_server.py --port 9090 \
--backend tensorrt \
--trt_model_path "path/to/whisper_trt/from/build/step" \
--trt_multilingual
```
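As a sanity check, the downloaded filter bank can be inspected with NumPy. A minimal sketch, assuming the `mel_80` key used by the openai/whisper assets:
```python
import numpy as np

# the archive stores precomputed mel filter banks keyed by mel-bin count
with np.load("assets/mel_filters.npz") as f:
    mel_80 = f["mel_80"]

# expect (80, 201): 80 mel bins x (n_fft // 2 + 1) frequency bins for n_fft = 400
print(mel_80.shape)
```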
2 changes: 1 addition & 1 deletion docker/Dockerfile.cpu
@@ -33,7 +33,7 @@ RUN apt install python3-pip -y
RUN mkdir /app
WORKDIR /app

COPY scripts/setup.sh /app
COPY requirements/ /app

RUN bash setup.sh
2 changes: 1 addition & 1 deletion docker/Dockerfile.gpu
@@ -33,7 +33,7 @@ RUN apt install python3-pip -y
RUN mkdir /app
WORKDIR /app

COPY scripts/setup.sh /app
COPY requirements/ /app

RUN apt update --fix-missing
8 changes: 3 additions & 5 deletions requirements/server.txt
@@ -1,7 +1,5 @@
PyAudio
faster-whisper==0.10.0
torch
websockets
onnxruntime==1.16.0
numba
33 changes: 29 additions & 4 deletions run_server.py
@@ -2,12 +2,37 @@
from whisper_live.server import TranscriptionServer

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('--port', '-p',
                        type=int,
                        default=9090,
                        help="Websocket port to run the server on.")
    parser.add_argument('--backend', '-b',
                        type=str,
                        default='faster_whisper',
                        help='Backends from ["tensorrt", "faster_whisper"]')
    parser.add_argument('--faster_whisper_custom_model_path', '-fw',
                        type=str, default=None,
                        help="Custom Faster Whisper Model")
    parser.add_argument('--trt_model_path', '-trt',
                        type=str,
                        default=None,
                        help='Whisper TensorRT model path')
    parser.add_argument('--trt_multilingual', '-m',
                        action="store_true",
                        help='Boolean only for TensorRT model. True if multilingual.')
    args = parser.parse_args()

    if args.backend == "tensorrt":
        if args.trt_model_path is None:
            raise ValueError("Please provide a valid TensorRT model path")

    server = TranscriptionServer()
    server.run(
        "0.0.0.0",
        port=args.port,
        backend=args.backend,
        faster_whisper_custom_model_path=args.faster_whisper_custom_model_path,
        whisper_tensorrt_path=args.trt_model_path,
        trt_multilingual=args.trt_multilingual
    )
77 changes: 77 additions & 0 deletions scripts/build_whisper_tensorrt.sh
@@ -0,0 +1,77 @@
#!/bin/bash

download_and_build_model() {
local model_name="$1"
local model_url=""

case "$model_name" in
"tiny.en")
model_url="https://openaipublic.azureedge.net/main/whisper/models/d3dd57d32accea0b295c96e26691aa14d8822fac7d9d27d5dc00b4ca2826dd03/tiny.en.pt"
;;
"tiny")
model_url="https://openaipublic.azureedge.net/main/whisper/models/65147644a518d12f04e32d6f3b26facc3f8dd46e5390956a9424a650c0ce22b9/tiny.pt"
;;
"base.en")
model_url="https://openaipublic.azureedge.net/main/whisper/models/25a8566e1d0c1e2231d1c762132cd20e0f96a85d16145c3a00adf5d1ac670ead/base.en.pt"
;;
"base")
model_url="https://openaipublic.azureedge.net/main/whisper/models/ed3a0b6b1c0edf879ad9b11b1af5a0e6ab5db9205f891f668f8b0e6c6326e34e/base.pt"
;;
"small.en")
model_url="https://openaipublic.azureedge.net/main/whisper/models/f953ad0fd29cacd07d5a9eda5624af0f6bcf2258be67c92b79389873d91e0872/small.en.pt"
;;
"small")
model_url="https://openaipublic.azureedge.net/main/whisper/models/9ecf779972d90ba49c06d968637d720dd632c55bbf19d441fb42bf17a411e794/small.pt"
;;
"medium.en")
model_url="https://openaipublic.azureedge.net/main/whisper/models/d7440d1dc186f76616474e0ff0b3b6b879abc9d1a4926b7adfa41db2d497ab4f/medium.en.pt"
;;
"medium")
model_url="https://openaipublic.azureedge.net/main/whisper/models/345ae4da62f9b3d59415adc60127b97c714f32e89e936602e85993674d08dcb1/medium.pt"
;;
"large-v1")
model_url="https://openaipublic.azureedge.net/main/whisper/models/e4b87e7e0bf463eb8e6956e646f1e277e901512310def2c24bf0e11bd3c28e9a/large-v1.pt"
;;
"large-v2")
model_url="https://openaipublic.azureedge.net/main/whisper/models/81f7c96c852ee8fc832187b0132e569d6c3065a3252ed18e56effd0b6a73e524/large-v2.pt"
;;
"large-v3" | "large")
model_url="https://openaipublic.azureedge.net/main/whisper/models/e5b1a55b89c1367dacf97e3e19bfd829a01529dbfdeefa8caeb59b3f1b81dadb/large-v3.pt"
;;
*)
echo "Invalid model name: $model_name"
exit 1
;;
esac

echo "Downloading $model_name..."
# wget --directory-prefix=assets "$model_url"
# echo "Download completed: ${model_name}.pt"
if [ ! -f "assets/${model_name}.pt" ]; then
wget --directory-prefix=assets "$model_url"
echo "Download completed: ${model_name}.pt"
else
echo "${model_name}.pt already exists in assets directory."
fi

local output_dir="whisper_${model_name//./_}"
echo "Running build script for $model_name with output directory $output_dir"
python3 build.py --output_dir "$output_dir" --use_gpt_attention_plugin --use_gemm_plugin --use_bert_attention_plugin --model_name "$model_name"
echo "Whisper $model_name TensorRT engine built."
echo "========================================="
echo "Model is located at: $(pwd)/$output_dir"
}

if [ "$#" -lt 1 ]; then
echo "Usage: $0 <path-to-tensorrt-examples-dir> [model-name]"
exit 1
fi

tensorrt_examples_dir="$1"
model_name="${2:-small.en}"

cd "$tensorrt_examples_dir/whisper"
pip install --no-deps -r requirements.txt

download_and_build_model "$model_name"
File renamed without changes.
2 changes: 1 addition & 1 deletion whisper_live/client.py
@@ -72,7 +72,7 @@ def __init__(
lang (str, optional): The selected language for transcription when multilingual is disabled. Default is None.
translate (bool, optional): Specifies if the task is translation. Default is False.
"""
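# frames per buffer read; 4096 samples is ~0.26 s at 16 kHz (presumably chosen to send fewer, larger websocket messages)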
self.chunk = 4096
self.format = pyaudio.paInt16
self.channels = 1
self.rate = 16000