P.S.
This project is a test task. Its goal is to show how the problem can be approached; quality and results are a lower priority.
- Triton Inference Server with ASR model (container)
- FastAPI service, which processes the incoming audio, converts it to tensors, sends it to inference, and generates the response text (container); a sketch of this call appears after the list
- Telegram Bot: records a voice message and receives its transcript
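For orientation, here is a minimal sketch of how the FastAPI service might call Triton via the tritonclient package. The model name asr_model, the tensor names AUDIO_SIGNAL and TRANSCRIPT, and the /transcribe route are illustrative assumptions; the real values live in fast_api_module.

    # Hypothetical sketch; names are assumed, not taken from the project.
    import numpy as np
    import tritonclient.http as httpclient
    from fastapi import FastAPI, UploadFile

    app = FastAPI()
    client = httpclient.InferenceServerClient(url="localhost:8000")

    @app.post("/transcribe")
    async def transcribe(file: UploadFile):
        raw = await file.read()
        # Placeholder: a real service decodes OGG/WAV bytes into a float32 waveform.
        waveform = np.frombuffer(raw, dtype=np.float32).reshape(1, -1)

        inp = httpclient.InferInput("AUDIO_SIGNAL", list(waveform.shape), "FP32")
        inp.set_data_from_numpy(waveform)
        out = httpclient.InferRequestedOutput("TRANSCRIPT")

        result = client.infer(model_name="asr_model", inputs=[inp], outputs=[out])
        return {"text": result.as_numpy("TRANSCRIPT").tolist()}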
You can start the Triton Inference Server with the command:
docker run --rm --detach -p 8000:8000 -p 8001:8001 -p 8002:8002 -v "$PWD"/model_repository:/models nvcr.io/nvidia/tritonserver:23.07-py3 tritonserver --model-repository=/models
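Triton expects the mounted model_repository to follow its standard layout; the model name asr_model below is an assumption:

    model_repository/
        asr_model/
            config.pbtxt
            1/
                model.onnx    # or model.pt / model.plan, depending on the backend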
Build the FastAPI image using the Dockerfile inside the project:
docker build -t fastapi_container .
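The Dockerfile in the repository is authoritative; for reference, a minimal Dockerfile for a FastAPI service usually looks roughly like this (the requirements file name, module path, and port are assumptions):

    FROM python:3.10-slim
    WORKDIR /app
    COPY requirements.txt .
    # --no-cache-dir keeps the pip cache out of the image (see the improvements list below)
    RUN pip install --no-cache-dir -r requirements.txt
    COPY . .
    # "fast_api_module.app:app" and port 5000 are assumed values
    CMD ["uvicorn", "fast_api_module.app:app", "--host", "0.0.0.0", "--port", "5000"]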
To run the containers, follow these steps:
- Launch the FastAPI container:
docker run --name fastapi_tg --rm --detach --network host fastapi_container:latest
- Launch the Triton container:
docker run --name triton --rm --detach --network host -v "$PWD"/model_repository:/models nvcr.io/nvidia/tritonserver:23.07-py3 tritonserver --model-repository=/models
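Once the Triton container is up, you can confirm it is ready to serve with Triton's standard health endpoint:

    curl -v localhost:8000/v2/health/ready

An HTTP 200 response means the server is up and its models are loaded.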
You need to create a config.py file in the telegram_bot folder. An example config is in the same folder: telegram_bot/config_example.py
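The fields are defined by the example file; a typical config for such a bot looks roughly like this (both names below are hypothetical, copy the real ones from config_example.py):

    # Hypothetical field names; mirror telegram_bot/config_example.py
    TOKEN = "<your bot token from @BotFather>"
    FASTAPI_URL = "http://localhost:5000"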
- Install the required packages:
pip install -r req_for_tg.txt
- Then, from the project root, run:
python main.py
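For context, a voice-to-transcript handler in such a bot could look as follows. This is an aiogram 2.x style sketch; the framework choice, the TOKEN field, and the /transcribe endpoint are all assumptions rather than the project's actual code:

    # Hypothetical sketch, not the project's actual main.py.
    import aiohttp
    from aiogram import Bot, Dispatcher, executor, types

    from config import TOKEN  # assumed field name in telegram_bot/config.py

    bot = Bot(token=TOKEN)
    dp = Dispatcher(bot)

    @dp.message_handler(content_types=types.ContentType.VOICE)
    async def handle_voice(message: types.Message):
        # Download the voice message into memory (avoids writing it to disk).
        voice = await bot.download_file_by_id(message.voice.file_id)
        async with aiohttp.ClientSession() as session:
            async with session.post("http://localhost:5000/transcribe",
                                    data={"file": voice}) as resp:
                text = (await resp.json()).get("text", "")
        await message.reply(text or "Could not transcribe the audio.")

    if __name__ == "__main__":
        executor.start_polling(dp)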
To quickly start the service, run the bash script
sh start.sh
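The actual script in the repository is authoritative; presumably it chains the steps above, roughly:

    #!/bin/sh
    # Sketch of what start.sh likely does, based on the steps in this README.
    docker run --name triton --rm --detach --network host \
        -v "$PWD"/model_repository:/models \
        nvcr.io/nvidia/tritonserver:23.07-py3 tritonserver --model-repository=/models
    docker build -t fastapi_container .
    docker run --name fastapi_tg --rm --detach --network host fastapi_container:latest
    pip install -r req_for_tg.txt
    python main.py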
Possible improvements:
- Add logging in each module (a minimal sketch appears after this list).
- Do not write audio files to disk at the Telegram and FastAPI stages; they are currently saved only for debugging.
- Write tests/unit tests.
- Reduce the size of the Docker images for the services (for example, drop the pip cache).
- There may be cleaner ways to extract the necessary modules from the ASR model (file: fast_api_module/utils/ASR_modules.py, function: get_modules).
- Trim the requirements for FastAPI.
- Consider gunicorn instead of uvicorn.
- Move the CTCDecoder that extracts the text into a separate server.
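For the logging item above, a minimal sketch of what each module could set up (logger name and message are illustrative):

    import logging

    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(name)s %(levelname)s %(message)s",
    )
    logger = logging.getLogger("fast_api_module")

    # Example usage inside a request handler:
    logger.info("received audio payload of %d bytes", 12345)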