Tensorrt backend #104

Merged
merged 30 commits on Jan 24, 2024
Changes from 29 commits
Commits
30 commits
fd86340
add: tensorrt backend
makaveli10 Jan 9, 2024
0f9e93d
add: tensorrt backend to server
makaveli10 Jan 9, 2024
244ca9e
remove duplicate code
makaveli10 Jan 10, 2024
6dff4fb
add tensorrt_llm installation script
makaveli10 Jan 10, 2024
ddb1e09
move setup.sh to scripts
makaveli10 Jan 10, 2024
a26f990
update readme to new setup.sh path
makaveli10 Jan 10, 2024
71a062b
update dockerfiles
makaveli10 Jan 10, 2024
647c576
update with multilingual option
makaveli10 Jan 11, 2024
3c202bf
update README; add TensorRT doc
makaveli10 Jan 11, 2024
f06b9bc
remove torch req
makaveli10 Jan 11, 2024
389bb5a
add docker setup for tensorrt-llm; update readme
makaveli10 Jan 12, 2024
4cf9d95
merge main
makaveli10 Jan 12, 2024
67232ff
install whl
makaveli10 Jan 12, 2024
735d6c7
merge with main
makaveli10 Jan 19, 2024
7a9dc6d
add tensorrt readme
makaveli10 Jan 19, 2024
6f1d13f
update requirements
makaveli10 Jan 19, 2024
75001ae
updatetensorrt-llm dockerfile
makaveli10 Jan 19, 2024
867ff52
add tensorrt installation & whisper conversion script
makaveli10 Jan 19, 2024
f25ff17
increase chunk size from 64ms to 256ms
makaveli10 Jan 19, 2024
b955e63
update READM
makaveli10 Jan 19, 2024
e3084b3
update tensorrt docker & readme
makaveli10 Jan 19, 2024
1e2faa3
fix: tensorrt llm idocker setup & docs
makaveli10 Jan 19, 2024
986823d
Merge remote-tracking branch 'upstream/main' into tensorrt_backend
makaveli10 Jan 19, 2024
44a2e20
remove trt-llm dockerfile
makaveli10 Jan 22, 2024
969a5aa
remove trt_llm install script
makaveli10 Jan 22, 2024
634dae8
add numba to req(trt-llm)
makaveli10 Jan 22, 2024
3bf5b47
fix: server; remove debug stats
makaveli10 Jan 22, 2024
8e26422
update tensorrt readme
makaveli10 Jan 22, 2024
bd54329
update readme
makaveli10 Jan 22, 2024
5cd59b1
Update README.md
makaveli10 Jan 24, 2024
135 changes: 79 additions & 56 deletions README.md
@@ -8,69 +8,89 @@ Unlike traditional speech recognition systems that rely on continuous audio stre
## Installation
- Install PyAudio and ffmpeg
```bash
bash setup.sh
bash scripts/setup.sh
```

- Install whisper-live from pip
```bash
pip install whisper-live
```

### Setting up NVIDIA/TensorRT-LLM for TensorRT backend
- Please follow [TensorRT_whisper readme](https://github.com/collabora/WhisperLive/blob/main/TensorRT_whisper.md) for setup of [NVIDIA/TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM) and for building Whisper-TensorRT engine.

## Getting Started
- Run the server
The server supports two backends, `faster_whisper` and `tensorrt`. If running the `tensorrt` backend, follow the [TensorRT_whisper readme](https://github.com/collabora/WhisperLive/blob/main/TensorRT_whisper.md).

### Running the Server
- [Faster Whisper](https://github.com/SYSTRAN/faster-whisper) backend
```bash
python3 run_server.py --port 9090 \
--backend faster_whisper

# running with custom model
python3 run_server.py --port 9090 \
--backend faster_whisper \
-fw "/path/to/custom/faster/whisper/model"
```

- TensorRT backend. Currently, we only recommend the docker setup for TensorRT. Follow the [TensorRT_whisper readme](https://github.com/collabora/WhisperLive/blob/main/TensorRT_whisper.md). Make sure to build your TensorRT engines before running the server with the TensorRT backend.
```bash
# Run English only model
python3 run_server.py -p 9090 \
-b tensorrt \
-trt /home/TensorRT-LLM/examples/whisper/whisper_small_en

# Run Multilingual model
python3 run_server.py -p 9090 \
-b tensorrt \
-trt /home/TensorRT-LLM/examples/whisper/whisper_small \
-m
```


### Running the Client
- To transcribe an audio file:
```python
from whisper_live.server import TranscriptionServer
server = TranscriptionServer()
server.run("0.0.0.0", 9090)
from whisper_live.client import TranscriptionClient
client = TranscriptionClient(
"localhost",
9090,
is_multilingual=False,
lang="en",
translate=False,
model_size="small"
)

client("tests/jfk.wav")
```
This command transcribes the specified audio file (`tests/jfk.wav`) using the Whisper model. It connects to the server running on localhost at port 9090. It can also enable the multilingual feature, allowing transcription in multiple languages. The language option specifies the target language for transcription, in this case English (`"en"`). Set the translate option to `True` to translate from the source language to English, or to `False` to transcribe in the source language.

- On the client side
- To transcribe an audio file:
```python
from whisper_live.client import TranscriptionClient
client = TranscriptionClient(
"localhost",
9090,
is_multilingual=False,
lang="en",
translate=False,
model_size="small"
)

client("tests/jfk.wav")
```
This command transcribes the specified audio file (`tests/jfk.wav`) using the Whisper model. It connects to the server running on localhost at port 9090. It can also enable the multilingual feature, allowing transcription in multiple languages. The language option specifies the target language for transcription, in this case English (`"en"`). Set the translate option to `True` to translate from the source language to English, or to `False` to transcribe in the source language.

- To transcribe from microphone:
```python
from whisper_live.client import TranscriptionClient
client = TranscriptionClient(
"localhost",
9090,
is_multilingual=True,
lang="hi",
translate=True,
model_size="small"
)
client()
```
This command captures audio from the microphone and sends it to the server for transcription. It uses the multilingual option with `hi` as the selected language, enabling the multilingual feature and specifying the target language and task. We use Whisper `small` by default, but it can be changed to any other model size depending on the requirements and the hardware running the server.

- To transcribe from an HLS stream:
```python
client = TranscriptionClient(host, port, is_multilingual=True, lang="en", translate=False)
client(hls_url="http://as-hls-ww-live.akamaized.net/pool_904/live/ww/bbc_1xtra/bbc_1xtra.isml/bbc_1xtra-audio%3d96000.norewind.m3u8")
```
This command streams audio into the server from an HLS stream. It uses the same options as the previous command, enabling the multilingual feature and specifying the target language and task.
- To transcribe from microphone:
```python
from whisper_live.client import TranscriptionClient
client = TranscriptionClient(
"localhost",
9090,
is_multilingual=True,
lang="hi",
translate=True,
model_size="small"
)
client()
```
This command captures audio from the microphone and sends it to the server for transcription. It uses the multilingual option with `hi` as the selected language, enabling the multilingual feature and specifying the target language and task. We use Whisper `small` by default, but it can be changed to any other model size depending on the requirements and the hardware running the server.

## Transcribe audio from browser
- Run the server
- To transcribe from an HLS stream:
```python
from whisper_live.server import TranscriptionServer
server = TranscriptionServer()
server.run("0.0.0.0", 9090)
from whisper_live.client import TranscriptionClient
client = TranscriptionClient(host, port, is_multilingual=True, lang="en", translate=False)
client(hls_url="http://as-hls-ww-live.akamaized.net/pool_904/live/ww/bbc_1xtra/bbc_1xtra.isml/bbc_1xtra-audio%3d96000.norewind.m3u8")
```
This would start the websocket server on port ```9090```.
This command streams audio into the server from an HLS stream. It uses the same options as the previous command, enabling the multilingual feature and specifying the target language and task.

## Transcribe audio from browser
- Run the server with your desired backend as shown [here](https://github.com/collabora/WhisperLive?tab=readme-ov-file#running-the-server)

### Chrome Extension
- Refer to [Audio-Transcription-Chrome](https://github.com/collabora/whisper-live/tree/main/Audio-Transcription-Chrome#readme) to use Chrome extension.
@@ -80,21 +100,24 @@ This would start the websocket server on port ```9090```.

## Whisper Live Server in Docker
- GPU
```bash
docker build . -t whisper-live -f docker/Dockerfile.gpu
docker run -it --gpus all -p 9090:9090 whisper-live:latest
```
- Faster-Whisper
```bash
docker build . -t whisper-live -f docker/Dockerfile.gpu
docker run -it --gpus all -p 9090:9090 whisper-live:latest
```

- TensorRT. Follow the [TensorRT_whisper readme](https://github.com/collabora/WhisperLive/blob/main/TensorRT_whisper.md) to set up docker and use the TensorRT backend. We provide a pre-built docker image with TensorRT-LLM built and ready to use; the commands below give a minimal sketch.
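The image name and most flags below come from the TensorRT_whisper readme; the `-p 9090:9090` port mapping and the mount path are illustrative additions, so adjust them for your setup.
```bash
# Pull the pre-built TensorRT-LLM image and start a container with GPU access
# and the websocket port published (minimal sketch).
docker pull ghcr.io/collabora/whisperbot-base:latest
docker run -it --gpus all -p 9090:9090 --shm-size=8g \
    --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
    -v /path/to/WhisperLive:/home/WhisperLive \
    ghcr.io/collabora/whisperbot-base:latest
```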

- CPU
```bash
docker build . -t whisper-live -f docker/Dockerfile.cpu
docker run -it -p 9090:9090 whisper-live:latest
docker build . -t whisper-live -f docker/Dockerfile.cpu
docker run -it -p 9090:9090 whisper-live:latest
```
**Note**: By default we use the "small" model size. To build a docker image for a different model size, change the size in server.py and then rebuild the docker image.

## Future Work
- [ ] Add translation to other languages on top of transcription.
- [ ] TensorRT backend for Whisper.
- [x] TensorRT backend for Whisper.

## Contact

66 changes: 66 additions & 0 deletions TensorRT_whisper.md
@@ -0,0 +1,66 @@
# Whisper-TensorRT
We have only tested the TensorRT backend in docker, so we recommend docker for a smooth TensorRT backend setup.
**Note**: We use [our fork of TensorRT-LLM](https://github.com/makaveli10/TensorRT-LLM) to set up TensorRT.

## Installation
- Install [docker](https://docs.docker.com/engine/install/)
- Install [nvidia-container-toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html)

- Clone this repo.
```bash
git clone https://github.com/collabora/WhisperLive.git
cd WhisperLive
```

- Pull the TensorRT-LLM docker image which we prebuilt for the WhisperLive TensorRT backend.
```bash
docker pull ghcr.io/collabora/whisperbot-base:latest
```

- Next, run the docker image and mount the WhisperLive repo to the container's `/home` directory.
```bash
docker run -it --gpus all --shm-size=8g \
--ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
-v /path/to/WhisperLive:/home/WhisperLive \
ghcr.io/collabora/whisperbot-base:latest
```

- Make sure to test the installation.
```bash
# export ENV=${ENV:-/etc/shinit_v2}
# source $ENV
python -c "import torch; import tensorrt; import tensorrt_llm"
```
**NOTE**: Uncomment and update library paths if imports fail.

## Whisper TensorRT Engine
- We provide a script to build the `small.en` (English-only) and `small` (multilingual) TensorRT engines. The script logs the path of the directory containing the Whisper TensorRT engine; this model path is needed to run the server.
```bash
# convert small.en
bash scripts/build_whisper_tensorrt.sh /root/TensorRT-LLM-examples small.en

# convert small multilingual model
bash scripts/build_whisper_tensorrt.sh /root/TensorRT-LLM-examples small
```

## Run WhisperLive Server with TensorRT Backend
```bash
cd /home/WhisperLive

# Install requirements
pip install -r requirements/server.txt

# Required to create the mel spectrogram
wget --directory-prefix=assets https://raw.githubusercontent.com/openai/whisper/main/whisper/assets/mel_filters.npz

# Run English only model
python3 run_server.py --port 9090 \
--backend tensorrt \
--trt_model_path "path/to/whisper_trt/from/build/step"

# Run Multilingual model
python3 run_server.py --port 9090 \
--backend tensorrt \
--trt_model_path "path/to/whisper_trt/from/build/step" \
--trt_multilingual
```
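
Once the server is running, a quick end-to-end check is to transcribe the sample audio bundled with the repo using the Python client. This is a minimal sketch, assuming the client-side dependencies from the `requirements/` directory are installed in the same environment and the server from the step above is listening on port 9090.
```bash
# Smoke test: run from /home/WhisperLive (inside the container, or on the host
# if port 9090 is published) and transcribe the bundled sample file.
python3 -c "
from whisper_live.client import TranscriptionClient
client = TranscriptionClient('localhost', 9090, is_multilingual=False, lang='en', translate=False)
client('tests/jfk.wav')
"
```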
2 changes: 1 addition & 1 deletion docker/Dockerfile.cpu
@@ -33,7 +33,7 @@ RUN apt install python3-pip -y
RUN mkdir /app
WORKDIR /app

COPY setup.sh /app
COPY scripts/setup.sh /app
COPY requirements/ /app

RUN bash setup.sh
2 changes: 1 addition & 1 deletion docker/Dockerfile.gpu
@@ -33,7 +33,7 @@ RUN apt install python3-pip -y
RUN mkdir /app
WORKDIR /app

COPY setup.sh /app
COPY scripts/setup.sh /app
COPY requirements/ /app

RUN apt update --fix-missing
8 changes: 3 additions & 5 deletions requirements/server.txt
@@ -1,7 +1,5 @@
PyAudio
faster-whisper==0.10.0
--extra-index-url https://download.pytorch.org/whl/cu111
torch==1.10.1
torchaudio==0.10.1
torch
websockets
onnxruntime==1.16.0
onnxruntime==1.16.0
numba
33 changes: 29 additions & 4 deletions run_server.py
@@ -2,12 +2,37 @@
from whisper_live.server import TranscriptionServer

if __name__ == "__main__":
    server = TranscriptionServer()
    parser = argparse.ArgumentParser()
    parser.add_argument('--model_path', type=str, default=None, help="Custom Faster Whisper Model")
    parser.add_argument('--port', '-p',
                        type=int,
                        default=9090,
                        help="Websocket port to run the server on.")
    parser.add_argument('--backend', '-b',
                        type=str,
                        default='faster_whisper',
                        help='Backends from ["tensorrt", "faster_whisper"]')
    parser.add_argument('--faster_whisper_custom_model_path', '-fw',
                        type=str, default=None,
                        help="Custom Faster Whisper Model")
    parser.add_argument('--trt_model_path', '-trt',
                        type=str,
                        default=None,
                        help='Whisper TensorRT model path')
    parser.add_argument('--trt_multilingual', '-m',
                        action="store_true",
                        help='Boolean only for TensorRT model. True if multilingual.')
    args = parser.parse_args()

    if args.backend == "tensorrt":
        if args.trt_model_path is None:
            raise ValueError("Please Provide a valid tensorrt model path")

    server = TranscriptionServer()
    server.run(
        "0.0.0.0",
        9090,
        custom_model_path=args.model_path
        port=args.port,
        backend=args.backend,
        faster_whisper_custom_model_path=args.faster_whisper_custom_model_path,
        whisper_tensorrt_path=args.trt_model_path,
        trt_multilingual=args.trt_multilingual
    )
77 changes: 77 additions & 0 deletions scripts/build_whisper_tensorrt.sh
@@ -0,0 +1,77 @@
#!/bin/bash

download_and_build_model() {
local model_name="$1"
local model_url=""

case "$model_name" in
"tiny.en")
model_url="https://openaipublic.azureedge.net/main/whisper/models/d3dd57d32accea0b295c96e26691aa14d8822fac7d9d27d5dc00b4ca2826dd03/tiny.en.pt"
;;
"tiny")
model_url="https://openaipublic.azureedge.net/main/whisper/models/65147644a518d12f04e32d6f3b26facc3f8dd46e5390956a9424a650c0ce22b9/tiny.pt"
;;
"base.en")
model_url="https://openaipublic.azureedge.net/main/whisper/models/25a8566e1d0c1e2231d1c762132cd20e0f96a85d16145c3a00adf5d1ac670ead/base.en.pt"
;;
"base")
model_url="https://openaipublic.azureedge.net/main/whisper/models/ed3a0b6b1c0edf879ad9b11b1af5a0e6ab5db9205f891f668f8b0e6c6326e34e/base.pt"
;;
"small.en")
model_url="https://openaipublic.azureedge.net/main/whisper/models/f953ad0fd29cacd07d5a9eda5624af0f6bcf2258be67c92b79389873d91e0872/small.en.pt"
;;
"small")
model_url="https://openaipublic.azureedge.net/main/whisper/models/9ecf779972d90ba49c06d968637d720dd632c55bbf19d441fb42bf17a411e794/small.pt"
;;
"medium.en")
model_url="https://openaipublic.azureedge.net/main/whisper/models/d7440d1dc186f76616474e0ff0b3b6b879abc9d1a4926b7adfa41db2d497ab4f/medium.en.pt"
;;
"medium")
model_url="https://openaipublic.azureedge.net/main/whisper/models/345ae4da62f9b3d59415adc60127b97c714f32e89e936602e85993674d08dcb1/medium.pt"
;;
"large-v1")
model_url="https://openaipublic.azureedge.net/main/whisper/models/e4b87e7e0bf463eb8e6956e646f1e277e901512310def2c24bf0e11bd3c28e9a/large-v1.pt"
;;
"large-v2")
model_url="https://openaipublic.azureedge.net/main/whisper/models/81f7c96c852ee8fc832187b0132e569d6c3065a3252ed18e56effd0b6a73e524/large-v2.pt"
;;
"large-v3" | "large")
model_url="https://openaipublic.azureedge.net/main/whisper/models/e5b1a55b89c1367dacf97e3e19bfd829a01529dbfdeefa8caeb59b3f1b81dadb/large-v3.pt"
;;
*)
echo "Invalid model name: $model_name"
exit 1
;;
esac

echo "Downloading $model_name..."
if [ ! -f "assets/${model_name}.pt" ]; then
wget --directory-prefix=assets "$model_url"
echo "Download completed: ${model_name}.pt"
else
echo "${model_name}.pt already exists in assets directory."
fi

local output_dir="whisper_${model_name//./_}"
echo "$output_dir"
echo "Running build script for $model_name with output directory $output_dir"
python3 build.py --output_dir "$output_dir" --use_gpt_attention_plugin --use_gemm_plugin --use_bert_attention_plugin --model_name "$model_name"
echo "Whisper $model_name TensorRT engine built."
echo "========================================="
echo "Model is located at: $(pwd)/$output_dir"
}

if [ "$#" -lt 1 ]; then
echo "Usage: $0 <path-to-tensorrt-examples-dir> [model-name]"
exit 1
fi

tensorrt_examples_dir="$1"
model_name="${2:-small.en}"

cd "$tensorrt_examples_dir/whisper" || exit 1
pip install --no-deps -r requirements.txt

download_and_build_model "$model_name"
File renamed without changes.
2 changes: 1 addition & 1 deletion whisper_live/client.py
@@ -72,7 +72,7 @@ def __init__(
lang (str, optional): The selected language for transcription when multilingual is disabled. Default is None.
translate (bool, optional): Specifies if the task is translation. Default is False.
"""
self.chunk = 1024
self.chunk = 4096
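# 4096 samples at 16 kHz is ~256 ms of audio per chunk (was 1024 samples, ~64 ms)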
self.format = pyaudio.paInt16
self.channels = 1
self.rate = 16000