Quick Start Guide

To quickly get started using ModelMesh Serving, here is a brief guide.

Prerequisites

  • A Kubernetes cluster (v1.16+) with cluster administrative privileges
  • kubectl and kustomize (v4.0.0+); a quick way to check both is sketched below
  • At least 4 vCPU and 8 GB memory. For more details, please see here.
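
Before installing, you can confirm the client tools are present and recent enough; a minimal check, assuming both binaries are on your PATH:

# print the kubectl client version
kubectl version --client

# kustomize should report v4.0.0 or newer
kustomize version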

1. Install ModelMesh Serving

Clone the repository

git clone git@github.com:kserve/modelmesh-serving.git
cd modelmesh-serving

Run install script

kubectl create namespace modelmesh-serving
./scripts/install.sh --namespace modelmesh-serving --quickstart

This will install ModelMesh Serving in the modelmesh-serving namespace, along with etcd and MinIO instances. When the script finishes, you should see a Successfully installed ModelMesh Serving! message.
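
If you prefer to wait on the installation non-interactively, one option is to wait for the controller Deployment to become available; the Deployment name modelmesh-controller below is an assumption inferred from the controller pod name shown in the next step:

# wait up to 2 minutes for the ModelMesh controller Deployment to report Available
kubectl wait --for=condition=Available deployment/modelmesh-controller \
  -n modelmesh-serving --timeout=120s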

Verify installation

Check that the pods are running:

kubectl get pods

NAME                                        READY   STATUS    RESTARTS   AGE
pod/etcd                                    1/1     Running   0          5m
pod/minio                                   1/1     Running   0          5m
pod/modelmesh-controller-547bfb64dc-mrgrq   1/1     Running   0          5m

Check that the ServingRuntimes are available:

kubectl get servingruntimes

NAME           DISABLED   MODELTYPE    CONTAINERS   AGE
mlserver-0.x              sklearn      mlserver     5m
triton-2.x                tensorflow   triton       5m

ServingRuntimes are automatically provisioned based on the framework of the model deployed. Two ServingRuntimes are included with ModelMesh Serving by default. The current mappings for these are:

ServingRuntime   Supported Frameworks
triton-2.x       tensorflow, pytorch, onnx, tensorrt
mlserver-0.x     sklearn, xgboost, lightgbm
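
To see exactly what a runtime advertises, you can inspect its spec directly; for example, for the MLServer runtime listed above:

# dump the full ServingRuntime spec, including the model types it supports
kubectl get servingruntime mlserver-0.x -o yaml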

To see more detailed instructions and information, click here.

2. Deploy a model

With ModelMesh Serving now installed, try deploying a model using the Predictor CRD.

Note: ModelMesh Serving also supports deployment using KFServing's TrainedModel interface. Please refer to these instructions for information on using TrainedModels instead.

Here, we deploy an SKLearn MNIST model which is served from the local MinIO container:

kubectl apply -f - <<EOF
apiVersion: serving.kserve.io/v1alpha1
kind: Predictor
metadata:
  name: example-mnist-predictor
spec:
  modelType:
    name: sklearn
  path: sklearn/mnist-svm.joblib
  storage:
    s3:
      secretKey: localMinIO
EOF

After applying this predictor, you should see it in the Loading state:

kubectl get predictors

NAME                      TYPE      AVAILABLE   ACTIVEMODEL   TARGETMODEL   TRANSITION   AGE
example-mnist-predictor   sklearn   false       Loading                     UpToDate     7s
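
Rather than polling by hand, you can watch the Predictor's status change as the model loads:

# stream status updates until interrupted with Ctrl+C
kubectl get predictors -w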

Eventually, you should see the ServingRuntime pods that will hold the SKLearn model become Running.

kubectl get pods

...
modelmesh-serving-mlserver-0.x-7db675f677-twrwd   3/3     Running   0          2m
modelmesh-serving-mlserver-0.x-7db675f677-xvd8q   3/3     Running   0          2m

Then, checking on the Predictor again, you should see that it is now available:

kubectl get predictors

NAME                      TYPE      AVAILABLE   ACTIVEMODEL   TARGETMODEL   TRANSITION   AGE
example-mnist-predictor   sklearn   true        Loaded                      UpToDate     2m

To see more detailed instructions and information, click here.

3. Perform a gRPC inference request

Now that a model is loaded and available, you can then perform inference. Currently, only gRPC inference requests are supported. By default, ModelMesh Serving uses a headless Service since a normal Service has issues load balancing gRPC requests. See more info here.
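
If you want to confirm that the Service is headless, you can check its cluster IP; a headless Service reports None:

# a headless Service has no cluster IP assigned
kubectl get service modelmesh-serving -n modelmesh-serving -o jsonpath='{.spec.clusterIP}'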

To test out inference requests, you can port-forward the headless service in a separate terminal window:

kubectl port-forward --address 0.0.0.0 service/modelmesh-serving 8033 -n modelmesh-serving

Then a gRPC client generated from the KFServing grpc_predict_v2.proto file can be used with localhost:8033. A ready-to-use Python example of this can be found here.

Alternatively, you can test inference with grpcurl. This can easily be installed with brew install grpcurl if on macOS.

With grpcurl, a request can be sent to the SKLearn MNIST model like the following. Make sure that the MODEL_NAME variable below is set to the name of your Predictor/TrainedModel.

MODEL_NAME=example-mnist-predictor
grpcurl \
  -plaintext \
  -proto fvt/proto/kfs_inference_v2.proto \
  -d '{ "model_name": "'"${MODEL_NAME}"'", "inputs": [{ "name": "predict", "shape": [1, 64], "datatype": "FP32", "contents": { "fp32_contents": [0.0, 0.0, 1.0, 11.0, 14.0, 15.0, 3.0, 0.0, 0.0, 1.0, 13.0, 16.0, 12.0, 16.0, 8.0, 0.0, 0.0, 8.0, 16.0, 4.0, 6.0, 16.0, 5.0, 0.0, 0.0, 5.0, 15.0, 11.0, 13.0, 14.0, 0.0, 0.0, 0.0, 0.0, 2.0, 12.0, 16.0, 13.0, 0.0, 0.0, 0.0, 0.0, 0.0, 13.0, 16.0, 16.0, 6.0, 0.0, 0.0, 0.0, 0.0, 16.0, 16.0, 16.0, 7.0, 0.0, 0.0, 0.0, 0.0, 11.0, 13.0, 12.0, 1.0, 0.0] }}]}' \
  localhost:8033 \
  inference.GRPCInferenceService.ModelInfer

This should give you output like the following:

{
  "modelName": "example-mnist-predictor__ksp-7702c1b55a",
  "outputs": [
    {
      "name": "predict",
      "datatype": "FP32",
      "shape": [
        "1"
      ],
      "contents": {
        "fp32Contents": [
          8
        ]
      }
    }
  ]
}
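
If the service or method name is ever in doubt, grpcurl can also list and describe what the proto file defines, using the same proto file as above rather than server reflection:

# list the services defined in the proto file
grpcurl -proto fvt/proto/kfs_inference_v2.proto list

# show the methods of the inference service, including ModelInfer
grpcurl -proto fvt/proto/kfs_inference_v2.proto describe inference.GRPCInferenceService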

To see more detailed instructions and information, click here.

4. (Optional) Deleting your ModelMesh Serving installation

To delete all ModelMesh Serving resources that were installed, run the following from the root of the project:

./scripts/delete.sh --namespace modelmesh-serving
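
If the modelmesh-serving namespace was created only for this quick start, you can optionally remove it as well; note that this deletes everything remaining in the namespace:

kubectl delete namespace modelmesh-serving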