- A model serving platform for seamless demonstration of models from multiple ML projects.
- Nvidia Triton Inference Server - GPU Inference
  - For more details, see the Nvidia Triton documentation.
- FastAPI - API serving (see the endpoint sketch after this list)
- MongoDB - Advanced Data/Log Management
- A model serving platform that can be utilized in any project.
- Enables ML engineers without MLOps knowledge to deploy ML models immediately.
- Based on Docker, enabling easy deployment of models from TensorFlow, PyTorch, ONNX, etc.
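
A minimal sketch of how the FastAPI layer might expose a prediction endpoint. The route name, request schema, and field names are illustrative assumptions (not the platform's actual API); the tensor shapes mirror the Triton config shown below.

```python
from typing import List

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Hypothetical request schema; field names and shapes are assumptions
# that mirror the Triton config.pbtxt further below.
class PredictRequest(BaseModel):
    wine_data: List[List[float]]           # shape [batch, 77]
    climate_data: List[List[List[float]]]  # shape [batch, 7, 22]

@app.post("/predict")
def predict(request: PredictRequest):
    # The platform would forward these tensors to Triton here
    # (see the tritonclient example later in this document).
    return {"received_batches": len(request.wine_data)}
```

The model itself is described to Triton through a `config.pbtxt` like the one below.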
name: "wine_prediction_model"
platform: "pytorch_libtorch"
input [
{
name: "wine_data__0"
data_type: TYPE_FP32
dims: [ -1, 77 ]
},
{
name: "climate_data__1"
data_type: TYPE_FP32
dims: [ -1, 7, 22 ]
}
]
output [
{
name: "output__0"
data_type: TYPE_FP32
dims: [ 1 ]
}
]
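
Triton loads models from a model repository. A layout like the following is what the `docker run` command below assumes: the `neuroflow_models` path comes from the volume mount, and the numbered version directory plus the `model.pt` filename follow Triton's conventions for `pytorch_libtorch` models.

```
neuroflow_models/
└── wine_prediction_model/
    ├── config.pbtxt
    └── 1/
        └── model.pt
```

With the repository in place, the server can be pulled and started as follows.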
```
docker pull nvcr.io/nvidia/tritonserver:21.10-py3

docker run --gpus all --rm \
  -p8000:8000 -p8001:8001 -p8002:8002 \
  -v /home/ubuntu/playground/neuroflow_models:/models \
  nvcr.io/nvidia/tritonserver:21.10-py3 \
  tritonserver --model-repository=/models
```
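
Once the server is up, a quick way to check inference over HTTP (port 8000) is the `tritonclient` package. The random inputs below are placeholders; the names, dtypes, and shapes mirror the `config.pbtxt` above.

```python
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Build inputs matching the names, dtypes, and dims from config.pbtxt
# (batch size 1 here; the values are random placeholders).
wine = httpclient.InferInput("wine_data__0", [1, 77], "FP32")
wine.set_data_from_numpy(np.random.rand(1, 77).astype(np.float32))

climate = httpclient.InferInput("climate_data__1", [1, 7, 22], "FP32")
climate.set_data_from_numpy(np.random.rand(1, 7, 22).astype(np.float32))

response = client.infer(
    model_name="wine_prediction_model",
    inputs=[wine, climate],
    outputs=[httpclient.InferRequestedOutput("output__0")],
)
print(response.as_numpy("output__0"))
```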
- Export the model using `torch.jit.trace` (see the sketch after this list).
- Adhere to the `NAME__INDEX` naming convention for input and output tensors.
- Be careful with input/output dimensions when writing the `.pbtxt` configs.
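
A sketch of the export step. The model class here is a hypothetical stand-in (the real architecture lives in the ML project), but the example input shapes and the save path match the `config.pbtxt` and model repository layout shown above.

```python
import os
import torch

# Hypothetical stand-in model; the real architecture comes from the ML project.
class WinePredictionModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.wine_fc = torch.nn.Linear(77, 32)
        self.climate_fc = torch.nn.Linear(7 * 22, 32)
        self.head = torch.nn.Linear(64, 1)

    def forward(self, wine_data, climate_data):
        w = torch.relu(self.wine_fc(wine_data))
        c = torch.relu(self.climate_fc(climate_data.flatten(1)))
        return self.head(torch.cat([w, c], dim=1))

model = WinePredictionModel().eval()

# Example inputs must match the dims declared in config.pbtxt.
example_wine = torch.randn(1, 77)
example_climate = torch.randn(1, 7, 22)

traced = torch.jit.trace(model, (example_wine, example_climate))

# Triton expects <model_repository>/<model_name>/<version>/model.pt.
os.makedirs("neuroflow_models/wine_prediction_model/1", exist_ok=True)
traced.save("neuroflow_models/wine_prediction_model/1/model.pt")
```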