This folder contains a demonstration of Kubeflow capabilities, suitable for presentation to public audiences.
This demo highlights the use of pipelines and hyperparameter tuning on a GKE cluster with node autoprovisioning (NAP). A simple pipeline requests GPU resources, which triggers node pool creation. The demo includes the following steps:
Follow the instructions in `demo_setup/README.md` to set up your environment and install Kubeflow with pipelines on an autoprovisioning GKE cluster.
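Once the cluster is up, you can optionally confirm from the command line that autoprovisioning is enabled. This is a quick sanity check, not part of the demo script; the `--format` path below follows the GKE cluster API and may vary across gcloud versions:

```
# Print whether node autoprovisioning (NAP) is enabled for the cluster.
gcloud container clusters describe ${CLUSTER} \
  --format="value(autoscaling.enableNodeAutoprovisioning)"
```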
View the installed components in the GCP Console.

- In the **Kubernetes Engine** section, you will see a new cluster `${CLUSTER}` with 3 `n1-standard-1` nodes.
- Under **Workloads**, you will see all the default Kubeflow and pipeline components.
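If you prefer a terminal to the Console, the same components can be listed with kubectl. This assumes Kubeflow was installed into the `kubeflow` namespace, the default for this setup:

```
# List the Kubeflow and pipeline components from the CLI.
kubectl get pods -n kubeflow
```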
Source the environment file and activate the conda environment for pipelines:

```
source kubeflow-demo-simple-pipeline.env
source activate kfp
```
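To verify that the `kfp` environment is active and the pipelines SDK is importable (an optional sanity check; `kfp.__version__` is assumed to be exposed by the installed SDK):

```
# Confirm the KFP SDK is available in the active environment.
python -c "import kfp; print(kfp.__version__)"
```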
Show the file `gpu-example-pipeline.py` as an example of a simple pipeline. Note where it requests GPU resources; this request is what triggers autoprovisioning later in the demo. Compile it to create a `.tar.gz` file:

```
./gpu-example-pipeline.py
```
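To see where the GPU request lives and what compilation produced (optional; the exact GPU-request call in the source may differ from the pattern grepped for below):

```
# Locate the GPU request in the pipeline source, e.g. a set_gpu_limit()
# call or an nvidia.com/gpu resource limit (exact spelling may differ).
grep -n -i gpu gpu-example-pipeline.py

# List the contents of the compiled package; it holds the
# Argo workflow definition generated by the KFP compiler.
tar -tzf gpu-example-pipeline.py.tar.gz
```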
View the pipelines UI locally by forwarding a port to the ml-pipeline-ui pod:
PIPELINES_POD=$(kubectl get po -l app=ml-pipeline-ui | \
grep ml-pipeline-ui | \
head -n 1 | \
cut -d " " -f 1 )
kubectl port-forward ${PIPELINES_POD} 8080:3000
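An equivalent one-liner, if you prefer jsonpath to grep/cut (assumes at least one pod matches the `app=ml-pipeline-ui` label):

```
# Grab the first matching pod name directly via jsonpath.
PIPELINES_POD=$(kubectl get po -l app=ml-pipeline-ui \
  -o jsonpath='{.items[0].metadata.name}')
kubectl port-forward ${PIPELINES_POD} 8080:3000
```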
In the browser, navigate to `localhost:8080` and create a new pipeline by uploading `gpu-example-pipeline.py.tar.gz`. Select the pipeline and click **Create experiment**. Use all suggested defaults.
View the effects of autoprovisioning by observing the number of nodes increase.
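One convenient way to watch this from a terminal:

```
# Stream node changes as autoprovisioning adds capacity for the GPU request.
kubectl get nodes -w
```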
Select **Experiments** from the left-hand side, then **Runs**. Click on the experiment run to view the graph and watch it execute.
View the container logs for the training step and take note of the low accuracy (~0.113).
To determine which parameters result in higher accuracy, use Katib to execute a Study, which defines a search space for training with a range of different parameters.
Create a Study by applying an example file to the cluster:

```
kubectl apply -f gpu-example-katib.yaml
```
This creates a `StudyJob` object. To view it:

```
kubectl get studyjob
kubectl describe studyjobs gpu-example
```
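To keep an eye on trial progress without re-typing the command (optional; uses the standard `watch` utility):

```
# Refresh the StudyJob status every 10 seconds as trials run.
watch -n 10 kubectl describe studyjobs gpu-example
```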
To view the Katib UI, connect to the modeldb-frontend pod:

```
KATIB_POD=$(kubectl get po -l app=modeldb,component=frontend | \
  grep modeldb-frontend | \
  head -n 1 | \
  cut -d " " -f 1)
kubectl port-forward ${KATIB_POD} 8081:3000
```
In the browser, navigate to `localhost:8081/katib` and click on the **gpu-example** project. In the **Explore Visualizations** section, select **Optimizer** in the **Group By** dropdown, then click **Compare**.
While you're waiting, watch for autoprovisioning to occur: trial pods that request GPUs will sit in `Pending` status until new nodes come online.
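For example:

```
# Show pods that are waiting for a schedulable (GPU) node.
kubectl get pods --field-selector=status.phase=Pending
```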
View the creation of a new GPU node pool:

```
gcloud container node-pools list --cluster ${CLUSTER}
```
View the creation of new nodes:

```
kubectl get nodes
```
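To confirm the new nodes actually expose GPUs to the scheduler (assumes the NVIDIA device plugin registers the standard `nvidia.com/gpu` resource name):

```
# Show each node's allocatable GPU count; the dots in the resource
# name are backslash-escaped for kubectl custom-columns.
kubectl get nodes -o custom-columns="NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"
```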
In the Katib UI, interact with the various graphs to determine which combination of parameters results in the highest accuracy. Grouping by optimizer type is one way to find consistently higher accuracies. Gather a set of parameters to use in a new run of the pipeline.
In the pipelines UI, clone the previous experiment run and update the arguments to match the parameters of one of the higher-accuracy runs from the Katib UI. Execute the pipeline and watch for the resulting accuracy, which should be closer to 0.98.
Approximately 5 minutes after the last run completes, check the cluster nodes to verify that GPU nodes have disappeared.
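The commands used earlier work for this scale-down check as well; the delay reflects the autoscaler waiting for the nodes to sit idle before removing them:

```
# Once idle, the autoprovisioned GPU node pool and its nodes are removed.
gcloud container node-pools list --cluster ${CLUSTER}
kubectl get nodes
```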