P.S.
This project is a test task. Its goal is to show how the problem can be approached; quality and results are a lower priority.
- Triton Inference Server with ASR model (container)
- FastAPI service, which processes the incoming audio, converts it to tensors, sends it to inference, and generates the response text (container); a sketch of this call appears after the list
- Telegram Bot: records a voice message and receives its transcript
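For orientation, here is a minimal sketch of how the FastAPI service might call Triton via the tritonclient package. The model name asr_model, the tensor names AUDIO_SIGNAL and TRANSCRIPT, and the /transcribe route are illustrative assumptions; the real values live in fast_api_module.

    # Hypothetical sketch; names are assumed, not taken from the project.
    import numpy as np
    import tritonclient.http as httpclient
    from fastapi import FastAPI, UploadFile

    app = FastAPI()
    client = httpclient.InferenceServerClient(url="localhost:8000")

    @app.post("/transcribe")
    async def transcribe(file: UploadFile):
        raw = await file.read()
        # Placeholder: a real service decodes OGG/WAV bytes into a float32 waveform.
        waveform = np.frombuffer(raw, dtype=np.float32).reshape(1, -1)

        inp = httpclient.InferInput("AUDIO_SIGNAL", list(waveform.shape), "FP32")
        inp.set_data_from_numpy(waveform)
        out = httpclient.InferRequestedOutput("TRANSCRIPT")

        result = client.infer(model_name="asr_model", inputs=[inp], outputs=[out])
        return {"text": result.as_numpy("TRANSCRIPT").tolist()}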
You can start the Triton Inference Server with the command:
docker run --rm --detach -p 8000:8000 -p 8001:8001 -p 8002:8002 -v "$PWD"/model_repository:/models nvcr.io/nvidia/tritonserver:23.07-py3 tritonserver --model-repository=/models
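Triton expects the mounted model_repository to follow its standard layout; the model name asr_model below is an assumption:

    model_repository/
        asr_model/
            config.pbtxt
            1/
                model.onnx    # or model.pt / model.plan, depending on the backend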
Build the FastAPI image using the Dockerfile inside the project:
docker build -t fastapi_container .
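The Dockerfile in the repository is authoritative; for reference, a minimal Dockerfile for a FastAPI service usually looks roughly like this (the requirements file name, module path, and port are assumptions):

    FROM python:3.10-slim
    WORKDIR /app
    COPY requirements.txt .
    # --no-cache-dir keeps the pip cache out of the image (see the improvements list below)
    RUN pip install --no-cache-dir -r requirements.txt
    COPY . .
    # "fast_api_module.app:app" and port 5000 are assumed values
    CMD ["uvicorn", "fast_api_module.app:app", "--host", "0.0.0.0", "--port", "5000"]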
To run the containers, follow these steps:
- Launch the FastAPI container:
docker run --name fastapi_tg --rm --detach --network host fastapi_container:latest
- Launch the Triton container:
docker run --name triton --rm --detach --network host -v "$PWD"/model_repository:/models nvcr.io/nvidia/tritonserver:23.07-py3 tritonserver --model-repository=/models
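Once the Triton container is up, you can confirm it is ready to serve with Triton's standard health endpoint:

    curl -v localhost:8000/v2/health/ready

An HTTP 200 response means the server is up and its models are loaded.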
You need to create a config.py file in the telegram_bot folder. An example config is in the same folder: telegram_bot/config_example.py
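The fields are defined by the example file; a typical config for such a bot looks roughly like this (both names below are hypothetical, copy the real ones from config_example.py):

    # Hypothetical field names; mirror telegram_bot/config_example.py
    TOKEN = "<your bot token from @BotFather>"
    FASTAPI_URL = "http://localhost:5000"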
- Install the required packages:
pip install -r req_for_tg.txt
- Then, from the project root, run:
python main.py
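For context, a voice-to-transcript handler in such a bot could look as follows. This is an aiogram 2.x style sketch; the framework choice, the TOKEN field, and the /transcribe endpoint are all assumptions rather than the project's actual code:

    # Hypothetical sketch, not the project's actual main.py.
    import aiohttp
    from aiogram import Bot, Dispatcher, executor, types

    from config import TOKEN  # assumed field name in telegram_bot/config.py

    bot = Bot(token=TOKEN)
    dp = Dispatcher(bot)

    @dp.message_handler(content_types=types.ContentType.VOICE)
    async def handle_voice(message: types.Message):
        # Download the voice message into memory (avoids writing it to disk).
        voice = await bot.download_file_by_id(message.voice.file_id)
        async with aiohttp.ClientSession() as session:
            async with session.post("http://localhost:5000/transcribe",
                                    data={"file": voice}) as resp:
                text = (await resp.json()).get("text", "")
        await message.reply(text or "Could not transcribe the audio.")

    if __name__ == "__main__":
        executor.start_polling(dp)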
To quickly start the service, run the bash script
sh start.sh
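The actual script in the repository is authoritative; presumably it chains the steps above, roughly:

    #!/bin/sh
    # Sketch of what start.sh likely does, based on the steps in this README.
    docker run --name triton --rm --detach --network host \
        -v "$PWD"/model_repository:/models \
        nvcr.io/nvidia/tritonserver:23.07-py3 tritonserver --model-repository=/models
    docker build -t fastapi_container .
    docker run --name fastapi_tg --rm --detach --network host fastapi_container:latest
    pip install -r req_for_tg.txt
    python main.py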
Possible improvements:
- Add logging in each module (a minimal sketch appears after this list).
- Do not write audio files to disk at the Telegram and FastAPI stages; they are currently saved only for debugging.
- Write tests/unit tests.
- Reduce the size of the Docker images for the services (for example, drop the pip cache).
- There may be cleaner ways to extract the necessary modules from the ASR model (file: fast_api_module/utils/ASR_modules.py, function: get_modules).
- Trim the requirements for FastAPI.
- Consider gunicorn instead of uvicorn.
- Move the CTCDecoder that extracts the text into a separate server.
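For the logging item above, a minimal sketch of what each module could set up (logger name and message are illustrative):

    import logging

    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(name)s %(levelname)s %(message)s",
    )
    logger = logging.getLogger("fast_api_module")

    # Example usage inside a request handler:
    logger.info("received audio payload of %d bytes", 12345)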