Merge pull request #104 from makaveli10/tensorrt_backend
Tensorrt backend
makaveli10 authored Jan 24, 2024
2 parents 0942dc2 + 5cd59b1 commit 3498787
Showing 13 changed files with 1,460 additions and 175 deletions.
135 changes: 79 additions & 56 deletions README.md
@@ -8,69 +8,89 @@ Unlike traditional speech recognition systems that rely on continuous audio stre
## Installation
- Install PyAudio and ffmpeg
```bash
bash scripts/setup.sh
```

- Install whisper-live from pip
```bash
pip install whisper-live
```

### Setting up NVIDIA/TensorRT-LLM for TensorRT backend
- Please follow [TensorRT_whisper readme](https://github.com/collabora/WhisperLive/blob/main/TensorRT_whisper.md) for setup of [NVIDIA/TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM) and for building Whisper-TensorRT engine.

## Getting Started
- Run the server
The server supports two backends: `faster_whisper` and `tensorrt`. If running the `tensorrt` backend, follow the [TensorRT_whisper readme](https://github.com/collabora/WhisperLive/blob/main/TensorRT_whisper.md).
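The server can also be started programmatically. A minimal sketch based on the `run_server.py` changes in this PR (the keyword arguments mirror the new CLI flags):
```python
from whisper_live.server import TranscriptionServer

server = TranscriptionServer()
server.run(
    "0.0.0.0",
    port=9090,
    backend="faster_whisper",  # or "tensorrt", which also needs whisper_tensorrt_path
)
```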

### Running the Server
- [Faster Whisper](https://github.com/SYSTRAN/faster-whisper) backend
```bash
python3 run_server.py --port 9090 \
--backend faster_whisper

# running with custom model
python3 run_server.py --port 9090 \
--backend faster_whisper \
-fw "/path/to/custom/faster/whisper/model"
```

- TensorRT backend. Currently, we recommend using only the Docker setup for TensorRT. Follow the [TensorRT_whisper readme](https://github.com/collabora/WhisperLive/blob/main/TensorRT_whisper.md) for setup, and make sure to build your TensorRT engines before running the server with the TensorRT backend.
```bash
# Run English only model
python3 run_server.py -p 9090 \
-b tensorrt \
-trt /home/TensorRT-LLM/examples/whisper/whisper_small_en

# Run Multilingual model
python3 run_server.py -p 9090 \
-b tensorrt \
-trt /home/TensorRT-LLM/examples/whisper/whisper_small \
-m
```


### Running the Client
- To transcribe an audio file:
```python
from whisper_live.client import TranscriptionClient
client = TranscriptionClient(
    "localhost",
    9090,
    is_multilingual=False,
    lang="en",
    translate=False,
    model_size="small"
)

client("tests/jfk.wav")
```
This command transcribes the specified audio file (`tests/jfk.wav`) using the Whisper model. It connects to the server running on localhost at port 9090. The multilingual option enables transcription in multiple languages, and the `lang` option specifies the language, in this case English (`"en"`). Set `translate=True` to translate from the source language to English, or `False` to transcribe in the source language.
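The same constructor also covers file translation. A short sketch using the options above (the French audio path is hypothetical):
```python
from whisper_live.client import TranscriptionClient

# translate French speech to English text
client = TranscriptionClient(
    "localhost",
    9090,
    is_multilingual=True,
    lang="fr",
    translate=True,
    model_size="small"
)

client("path/to/french_audio.wav")  # hypothetical file
```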


- To transcribe from microphone:
```python
from whisper_live.client import TranscriptionClient
client = TranscriptionClient(
    "localhost",
    9090,
    is_multilingual=True,
    lang="hi",
    translate=True,
    model_size="small"
)
client()
```
This command captures audio from the microphone and sends it to the server for transcription. It enables the multilingual feature, with `hi` as the selected language and the task set to translation. We use Whisper `small` by default, but this can be changed to any other size based on the requirements and the hardware running the server.

- To transcribe from an HLS stream:
```python
from whisper_live.client import TranscriptionClient
client = TranscriptionClient("localhost", 9090, is_multilingual=True, lang="en", translate=False)
client(hls_url="http://as-hls-ww-live.akamaized.net/pool_904/live/ww/bbc_1xtra/bbc_1xtra.isml/bbc_1xtra-audio%3d96000.norewind.m3u8")
```
This command streams audio from an HLS stream to the server for transcription. It uses the same options as the previous commands, enabling the multilingual feature and specifying the selected language and task.

## Transcribe audio from browser
- Run the server with your desired backend as shown [here](https://github.com/collabora/WhisperLive?tab=readme-ov-file#running-the-server)

### Chrome Extension
- Refer to [Audio-Transcription-Chrome](https://github.com/collabora/whisper-live/tree/main/Audio-Transcription-Chrome#readme) to use Chrome extension.
@@ -80,21 +100,24 @@ This would start the websocket server on port ```9090```.

## Whisper Live Server in Docker
- GPU
  - Faster-Whisper
```bash
docker build . -t whisper-live -f docker/Dockerfile.gpu
docker run -it --gpus all -p 9090:9090 whisper-live:latest
```

  - TensorRT. Follow the [TensorRT_whisper readme](https://github.com/collabora/WhisperLive/blob/main/TensorRT_whisper.md) to set up Docker and use the TensorRT backend. We provide a pre-built docker image with TensorRT-LLM built and ready to use.

- CPU
```bash
docker build . -t whisper-live -f docker/Dockerfile.cpu
docker run -it -p 9090:9090 whisper-live:latest
```
**Note**: By default we use the "small" model size. To build a docker image for a different model size, change the size in server.py and then build the docker image.

## Future Work
- [ ] Add translation to other languages on top of transcription.
- [ ] TensorRT backend for Whisper.
- [x] TensorRT backend for Whisper.

## Contact

66 changes: 66 additions & 0 deletions TensorRT_whisper.md
@@ -0,0 +1,66 @@
# Whisper-TensorRT
We have only tested the TensorRT backend in Docker, so we recommend Docker for a smooth TensorRT backend setup.
**Note**: We use [our fork of TensorRT-LLM](https://github.com/makaveli10/TensorRT-LLM) for the setup.

## Installation
- Install [docker](https://docs.docker.com/engine/install/)
- Install [nvidia-container-toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html)

- Clone this repo.
```bash
git clone https://github.com/collabora/WhisperLive.git
cd WhisperLive
```

- Pull the TensorRT-LLM docker image which we prebuilt for the WhisperLive TensorRT backend.
```bash
docker pull ghcr.io/collabora/whisperbot-base:latest
```

- Next, run the docker image and mount the WhisperLive repo to the container's `/home` directory.
```bash
docker run -it --gpus all --shm-size=8g \
--ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
-v /path/to/WhisperLive:/home/WhisperLive \
ghcr.io/collabora/whisperbot-base:latest
```

- Make sure to test the installation.
```bash
# export ENV=${ENV:-/etc/shinit_v2}
# source $ENV
python -c "import torch; import tensorrt; import tensorrt_llm"
```
**NOTE**: Uncomment and update library paths if imports fail.

## Whisper TensorRT Engine
- We build a `small.en` (English-only) or `small` (multilingual) Whisper TensorRT engine. The script logs the path of the directory containing the engine; that model path is needed to run the server.
```bash
# convert small.en
bash scripts/build_whisper_tensorrt.sh /root/TensorRT-LLM-examples small.en

# convert small multilingual model
bash scripts/build_whisper_tensorrt.sh /root/TensorRT-LLM-examples small
```

## Run WhisperLive Server with TensorRT Backend
```bash
cd /home/WhisperLive

# Install requirements
pip install -r requirements/server.txt

# Required to create the mel spectrogram
wget --directory-prefix=assets https://raw.githubusercontent.com/openai/whisper/main/whisper/assets/mel_filters.npz

# Run English only model
python3 run_server.py --port 9090 \
--backend tensorrt \
--trt_model_path "path/to/whisper_trt/from/build/step"

# Run Multilingual model
python3 run_server.py --port 9090 \
--backend tensorrt \
--trt_model_path "path/to/whisper_trt/from/build/step" \
--trt_multilingual
```
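As a sanity check, the downloaded filter bank can be inspected with NumPy. A minimal sketch, assuming the `mel_80` key used by the openai/whisper assets:
```python
import numpy as np

# the archive stores precomputed mel filter banks keyed by mel-bin count
with np.load("assets/mel_filters.npz") as f:
    mel_80 = f["mel_80"]

# expect (80, 201): 80 mel bins x (n_fft // 2 + 1) frequency bins for n_fft = 400
print(mel_80.shape)
```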
2 changes: 1 addition & 1 deletion docker/Dockerfile.cpu
@@ -33,7 +33,7 @@ RUN apt install python3-pip -y
RUN mkdir /app
WORKDIR /app

COPY scripts/setup.sh /app
COPY requirements/ /app

RUN bash setup.sh
2 changes: 1 addition & 1 deletion docker/Dockerfile.gpu
@@ -33,7 +33,7 @@ RUN apt install python3-pip -y
RUN mkdir /app
WORKDIR /app

COPY scripts/setup.sh /app
COPY requirements/ /app

RUN apt update --fix-missing
8 changes: 3 additions & 5 deletions requirements/server.txt
@@ -1,7 +1,5 @@
PyAudio
faster-whisper==0.10.0
torch
websockets
onnxruntime==1.16.0
numba
33 changes: 29 additions & 4 deletions run_server.py
@@ -2,12 +2,37 @@
from whisper_live.server import TranscriptionServer

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('--port', '-p',
                        type=int,
                        default=9090,
                        help="Websocket port to run the server on.")
    parser.add_argument('--backend', '-b',
                        type=str,
                        default='faster_whisper',
                        help='Backends from ["tensorrt", "faster_whisper"]')
    parser.add_argument('--faster_whisper_custom_model_path', '-fw',
                        type=str, default=None,
                        help="Custom Faster Whisper Model")
    parser.add_argument('--trt_model_path', '-trt',
                        type=str,
                        default=None,
                        help='Whisper TensorRT model path')
    parser.add_argument('--trt_multilingual', '-m',
                        action="store_true",
                        help='Boolean only for TensorRT model. True if multilingual.')
    args = parser.parse_args()

    if args.backend == "tensorrt":
        if args.trt_model_path is None:
            raise ValueError("Please provide a valid TensorRT model path")

    server = TranscriptionServer()
    server.run(
        "0.0.0.0",
        port=args.port,
        backend=args.backend,
        faster_whisper_custom_model_path=args.faster_whisper_custom_model_path,
        whisper_tensorrt_path=args.trt_model_path,
        trt_multilingual=args.trt_multilingual
    )
77 changes: 77 additions & 0 deletions scripts/build_whisper_tensorrt.sh
@@ -0,0 +1,77 @@
#!/bin/bash

download_and_build_model() {
local model_name="$1"
local model_url=""

case "$model_name" in
"tiny.en")
model_url="https://openaipublic.azureedge.net/main/whisper/models/d3dd57d32accea0b295c96e26691aa14d8822fac7d9d27d5dc00b4ca2826dd03/tiny.en.pt"
;;
"tiny")
model_url="https://openaipublic.azureedge.net/main/whisper/models/65147644a518d12f04e32d6f3b26facc3f8dd46e5390956a9424a650c0ce22b9/tiny.pt"
;;
"base.en")
model_url="https://openaipublic.azureedge.net/main/whisper/models/25a8566e1d0c1e2231d1c762132cd20e0f96a85d16145c3a00adf5d1ac670ead/base.en.pt"
;;
"base")
model_url="https://openaipublic.azureedge.net/main/whisper/models/ed3a0b6b1c0edf879ad9b11b1af5a0e6ab5db9205f891f668f8b0e6c6326e34e/base.pt"
;;
"small.en")
model_url="https://openaipublic.azureedge.net/main/whisper/models/f953ad0fd29cacd07d5a9eda5624af0f6bcf2258be67c92b79389873d91e0872/small.en.pt"
;;
"small")
model_url="https://openaipublic.azureedge.net/main/whisper/models/9ecf779972d90ba49c06d968637d720dd632c55bbf19d441fb42bf17a411e794/small.pt"
;;
"medium.en")
model_url="https://openaipublic.azureedge.net/main/whisper/models/d7440d1dc186f76616474e0ff0b3b6b879abc9d1a4926b7adfa41db2d497ab4f/medium.en.pt"
;;
"medium")
model_url="https://openaipublic.azureedge.net/main/whisper/models/345ae4da62f9b3d59415adc60127b97c714f32e89e936602e85993674d08dcb1/medium.pt"
;;
"large-v1")
model_url="https://openaipublic.azureedge.net/main/whisper/models/e4b87e7e0bf463eb8e6956e646f1e277e901512310def2c24bf0e11bd3c28e9a/large-v1.pt"
;;
"large-v2")
model_url="https://openaipublic.azureedge.net/main/whisper/models/81f7c96c852ee8fc832187b0132e569d6c3065a3252ed18e56effd0b6a73e524/large-v2.pt"
;;
"large-v3" | "large")
model_url="https://openaipublic.azureedge.net/main/whisper/models/e5b1a55b89c1367dacf97e3e19bfd829a01529dbfdeefa8caeb59b3f1b81dadb/large-v3.pt"
;;
*)
echo "Invalid model name: $model_name"
exit 1
;;
esac

echo "Downloading $model_name..."
# wget --directory-prefix=assets "$model_url"
# echo "Download completed: ${model_name}.pt"
if [ ! -f "assets/${model_name}.pt" ]; then
wget --directory-prefix=assets "$model_url"
echo "Download completed: ${model_name}.pt"
else
echo "${model_name}.pt already exists in assets directory."
fi

local output_dir="whisper_${model_name//./_}"
echo "Running build script for $model_name with output directory $output_dir"
python3 build.py --output_dir "$output_dir" --use_gpt_attention_plugin --use_gemm_plugin --use_bert_attention_plugin --model_name "$model_name"
echo "Whisper $model_name TensorRT engine built."
echo "========================================="
echo "Model is located at: $(pwd)/$output_dir"
}

if [ "$#" -lt 1 ]; then
echo "Usage: $0 <path-to-tensorrt-examples-dir> [model-name]"
exit 1
fi

tensorrt_examples_dir="$1"
model_name="${2:-small.en}"

cd "$tensorrt_examples_dir/whisper"
pip install --no-deps -r requirements.txt

download_and_build_model "$model_name"
File renamed without changes.
2 changes: 1 addition & 1 deletion whisper_live/client.py
@@ -72,7 +72,7 @@ def __init__(
lang (str, optional): The selected language for transcription when multilingual is disabled. Default is None.
translate (bool, optional): Specifies if the task is translation. Default is False.
"""
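# frames per buffer read; 4096 samples is ~0.26 s at 16 kHz (presumably chosen to send fewer, larger websocket messages)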
self.chunk = 4096
self.format = pyaudio.paInt16
self.channels = 1
self.rate = 16000