This repository contains code for a Flask server that is containerized and deployed to Vertex AI on GCP.
The Flask server provides access to the following Sunbird AI models:
- ASR (speech to text) for Luganda.
- Translation (local languages to English and English to local languages).
- TTS (coming soon to the API)
The process of deployment is as follows:
- The models are pulled from HuggingFace. See `asr_inference` and `translate_inference`.
- The Flask app exposes 2 endpoints, `isalive` and `predict`, as required by Vertex AI. The `predict` endpoint receives a list of inference requests, passes them to the model, and returns the results.
- A docker container is built from this Flask app and is pushed to the Google Container Registry (GCR).
- On Vertex AI, a "model" is created from this container and then deployed to a Vertex endpoint.
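The two required routes can be sketched roughly like this (the route names follow the Vertex AI custom-container convention described above; `run_model` is a hypothetical stand-in for the actual ASR/translation inference):

```python
from flask import Flask, request, jsonify

app = Flask(__name__)


def run_model(instance):
    # Placeholder for the real model call (e.g. ASR or translation
    # inference loaded from HuggingFace); here it just echoes the input.
    return {"echo": instance}


@app.route("/isalive")
def isalive():
    # Vertex AI polls this health route to decide the container is ready.
    return "", 200


@app.route("/predict", methods=["POST"])
def predict():
    # Vertex AI sends {"instances": [...]} and expects {"predictions": [...]}.
    instances = request.get_json(force=True).get("instances", [])
    predictions = [run_model(instance) for instance in instances]
    return jsonify({"predictions": predictions})
```

Vertex AI batches requests into the `instances` list, so the handler loops over it and returns one prediction per instance.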
NOTE: Check out this article for a detailed tutorial on this process.
The resulting endpoint is then used in the main Sunbird AI API.
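A rough sketch of the build-and-deploy steps with the `gcloud` CLI (the image name, region, display names, and the `PROJECT_ID`/`ENDPOINT_ID`/`MODEL_ID` placeholders are all assumptions, not values from this repo):

```shell
# Build the container and push it to the Google Container Registry (GCR).
gcloud builds submit --tag gcr.io/PROJECT_ID/sunbird-inference

# Create a Vertex AI "model" from the container, pointing Vertex AI
# at the health and predict routes exposed by the Flask app.
gcloud ai models upload \
  --region=us-central1 \
  --display-name=sunbird-inference \
  --container-image-uri=gcr.io/PROJECT_ID/sunbird-inference \
  --container-health-route=/isalive \
  --container-predict-route=/predict

# Create an endpoint and deploy the model to it.
gcloud ai endpoints create --region=us-central1 --display-name=sunbird-endpoint
gcloud ai endpoints deploy-model ENDPOINT_ID \
  --region=us-central1 \
  --model=MODEL_ID \
  --display-name=sunbird-deployment
```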
- Add TTS
- Handle long audio files.
- Use a smaller base container: the current one (`huggingface/transformers-pytorch-gpu`) is pretty heavy and maybe unnecessary. This would enable us to end up with a smaller artifact which takes up less memory.
- Automate the deployment process for both the API and this inference service (using Github Actions or Terraform... or both?)
- Come up with an end-to-end workflow from data ingestion to deployment (what tools are required for this?).