Sunbird API Inference Service

This repository contains the code for a Flask server that is containerized and deployed to Vertex AI on GCP.

The Flask server provides access to the following Sunbird AI models:

  • Speech recognition (ASR)
  • Translation

The process of deployment is as follows:

  • The models are pulled from HuggingFace. See asr_inference and translate_inference.
  • The Flask app exposes two endpoints, isalive and predict, as required by Vertex AI (see the sketch after this list). The predict endpoint receives a list of inference requests, passes them to the model, and returns the results.
  • A Docker container is built from this Flask app and pushed to Google Container Registry (GCR).
  • On Vertex AI, a "model" is created from this container and then deployed to a Vertex endpoint (sketched with the Python SDK after the note below).
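As a rough illustration of the endpoint contract, here is a minimal Flask sketch with the two required routes. The `run_inference` helper is a hypothetical placeholder, not the actual Sunbird model code:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def run_inference(instance):
    # Placeholder: the real service dispatches to the ASR or translation model.
    return {"echo": instance}


@app.route("/isalive")
def isalive():
    # Vertex AI polls this health route to check that the container is up.
    return "", 200


@app.route("/predict", methods=["POST"])
def predict():
    # Vertex AI wraps inference requests in an "instances" list.
    instances = request.get_json()["instances"]
    predictions = [run_inference(instance) for instance in instances]
    # Responses must be returned under a "predictions" key.
    return jsonify({"predictions": predictions})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```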

NOTE: Check out this article for a detailed tutorial on this process.
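For the last two steps, the model can be created and deployed from the Cloud Console or, as sketched below, with the Vertex AI Python SDK. The project, image URI, and machine specs are placeholder values:

```python
from google.cloud import aiplatform

# Placeholder project/region/image values; substitute real ones.
aiplatform.init(project="my-gcp-project", location="us-central1")

# Create a Vertex AI "model" from the container image pushed to GCR.
model = aiplatform.Model.upload(
    display_name="sunbird-inference",
    serving_container_image_uri="gcr.io/my-gcp-project/api-inference-server:latest",
    serving_container_predict_route="/predict",
    serving_container_health_route="/isalive",
    serving_container_ports=[8080],
)

# Deploy the model to an endpoint backed by a GPU machine.
endpoint = model.deploy(
    machine_type="n1-standard-8",
    accelerator_type="NVIDIA_TESLA_T4",
    accelerator_count=1,
)
```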

The resulting endpoint is then used in the main Sunbird AI API.
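Calling the deployed endpoint from the API then looks roughly like this; the endpoint resource name and the request payload shape are illustrative, not the actual Sunbird schema:

```python
from google.cloud import aiplatform

# Placeholder resource name; use the ID of the deployed Vertex endpoint.
endpoint = aiplatform.Endpoint(
    "projects/my-gcp-project/locations/us-central1/endpoints/1234567890"
)

# Illustrative instance payload for a translation request.
response = endpoint.predict(
    instances=[{"source_language": "English", "target_language": "Luganda", "text": "Hello"}]
)
print(response.predictions)
```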

TODOs

  • Add TTS
  • Handle long audio files.
  • Use a smaller base image: the current container (huggingface/transformers-pytorch-gpu) is pretty heavy and may be unnecessary. A slimmer base would give a smaller artifact that takes up less memory.
  • Automate the deployment process for both the API and this inference service (using GitHub Actions or Terraform...or both?)
  • Come up with an end-to-end workflow from data ingestion to deployment (what tools are required for this?).
