diff --git a/mnist/mnist_gcp.ipynb b/mnist/mnist_gcp.ipynb index 37316f988..178bf7e87 100644 --- a/mnist/mnist_gcp.ipynb +++ b/mnist/mnist_gcp.ipynb @@ -4,17 +4,18 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# MNIST E2E on Kubeflow on GKE\n", + "# MNIST end to end on Kubeflow on GKE\n", "\n", "This example guides you through:\n", " \n", - " 1. Taking an example TensorFlow model and modifying it to support distributed training\n", - " 1. Serving the resulting model using TFServing\n", - " 1. Deploying and using a web-app that uses the model\n", + " 1. Taking an example TensorFlow model and modifying it to support distributed training.\n", + " 1. Serving the resulting model using TFServing.\n", + " 1. Deploying and using a web app that sends prediction requests to the model.\n", " \n", "## Requirements\n", "\n", - " * You must be running Kubeflow 1.0 on GKE with IAP\n", + " * You must be running Kubeflow 1.0 on Kubernetes Engine (GKE) with Cloud Identity-Aware Proxy (Cloud IAP).\n", + " * Run this notebook within your Kubeflow cluster. See the guide to [setting up your Kubeflow notebooks](https://www.kubeflow.org/docs/notebooks/setup/).\n", " " ] }, @@ -22,31 +23,31 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Prepare model\n", + "### Prepare model\n", "\n", - "There is a delta between existing distributed mnist examples and what's needed to run well as a TFJob.\n", + "There is a delta between existing distributed MNIST examples and what's needed to run well as a TFJob.\n", "\n", - "Basically, we must:\n", + "Basically, you must:\n", "\n", - "1. Add options in order to make the model configurable.\n", - "1. Use `tf.estimator.train_and_evaluate` to enable model exporting and serving.\n", - "1. 
Define serving signatures for model serving.\n", + "* Add options in order to make the model configurable.\n", + "* Use `tf.estimator.train_and_evaluate` to enable model exporting and serving.\n", + "* Define serving signatures for model serving.\n", "\n", - "The resulting model is [model.py](model.py)." + "This tutorial provides a Python program that's already prepared for you: [model.py](model.py)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Verify we have a GCP account\n", + "### Verify that you have a Google Cloud Platform (GCP) account\n", "\n", - "* The cell below checks that this notebook was spawned with credentials to access GCP\n" + "The cell below checks that this notebook was spawned with credentials to access GCP.\n" ] }, { "cell_type": "code", - "execution_count": 4, + "execution_count": 1, "metadata": {}, "outputs": [], "source": [ @@ -62,14 +63,14 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Install Required Libraries\n", + "## Install the required libraries\n", "\n", - "Import the libraries required to train this model." + "Run the next cell to import the libraries required to train this model." ] }, { "cell_type": "code", - "execution_count": 10, + "execution_count": 2, "metadata": {}, "outputs": [ { @@ -77,7 +78,10 @@ "output_type": "stream", "text": [ "pip installing requirements.txt\n", + "Cloning the tf-operator repo\n", "Checkout kubeflow/tf-operator @9238906\n", + "Adding /home/jovyan/.local/lib/python3.6/site-packages to python path\n", + "Adding /home/jovyan/git_tf-operator/sdk/python to python path\n", "Configure docker credentials\n" ] } @@ -88,15 +92,22 @@ "notebook_setup.notebook_setup()" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Wait for the message `Configure docker credentials` before moving on to the next cell." 
+ ] + }, { "cell_type": "code", - "execution_count": 11, + "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "import k8s_util\n", - "# Force a reload of kubeflow; since kubeflow is a multi namespace module\n", - "# it looks like doing this in notebook_setup may not be sufficient\n", + "# Force a reload of Kubeflow. Since Kubeflow is a multi namespace module,\n", + "# doing the reload in notebook_setup may not be sufficient.\n", "import kubeflow\n", "reload(kubeflow)\n", "from kubernetes import client as k8s_client\n", @@ -110,25 +121,25 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Configure The Docker Registry For Kubeflow Fairing\n", + "## Configure a Docker registry for Kubeflow Fairing\n", "\n", - "* In order to build docker images from your notebook we need a docker registry where the images will be stored\n", - "* Below you set some variables specifying a [GCR container registry](https://cloud.google.com/container-registry/docs/)\n", - "* Kubeflow Fairing provides a utility function to guess the name of your GCP project" + "* In order to build Docker images from your notebook, you need a Docker registry to store the images.\n", + "* Below you set some variables specifying a [Container Registry](https://cloud.google.com/container-registry/docs/).\n", + "* Kubeflow Fairing provides a utility function to guess the name of your GCP project." 
] }, { "cell_type": "code", - "execution_count": 12, + "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ - "Running in project jlewi-dev\n", - "Running in namespace kubeflow-jlewi\n", - "Using docker registry gcr.io/jlewi-dev/fairing-job\n" + "Running in project kubeflow-writers\n", + "Running in namespace kubeflow-sarahmaddox\n", + "Using Docker registry gcr.io/kubeflow-writers/fairing-job\n" ] } ], @@ -141,31 +152,31 @@ "from kubeflow.fairing.deployers import job\n", "from kubeflow.fairing.preprocessors import base as base_preprocessor\n", "\n", - "# Setting up google container repositories (GCR) for storing output containers\n", - "# You can use any docker container registry istead of GCR\n", + "# Setting up Google Container Registry (GCR) for storing output containers.\n", + "# You can use any Docker container registry instead of GCR.\n", "GCP_PROJECT = fairing.cloud.gcp.guess_project_name()\n", "DOCKER_REGISTRY = 'gcr.io/{}/fairing-job'.format(GCP_PROJECT)\n", "namespace = fairing_utils.get_current_k8s_namespace()\n", "\n", "logging.info(f\"Running in project {GCP_PROJECT}\")\n", "logging.info(f\"Running in namespace {namespace}\")\n", - "logging.info(f\"Using docker registry {DOCKER_REGISTRY}\")\n" + "logging.info(f\"Using Docker registry {DOCKER_REGISTRY}\")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Use Kubeflow fairing to build the docker image\n", + "## Use Kubeflow Fairing to build the Docker image\n", "\n", - "* You will use kubeflow fairing's kaniko builder to build a docker image that includes all your dependencies\n", - " * You use kaniko because you want to be able to run `pip` to install dependencies\n", - " * Kaniko gives you the flexibility to build images from Dockerfiles" + "This notebook uses Kubeflow Fairing's kaniko builder to build a Docker image that includes all your dependencies.\n", + " * You use kaniko because you want to be able to run `pip` to install 
dependencies.\n", + " * Kaniko gives you the flexibility to build images from Dockerfiles." ] }, { "cell_type": "code", - "execution_count": 13, + "execution_count": 5, "metadata": {}, "outputs": [], "source": [ @@ -177,7 +188,7 @@ }, { "cell_type": "code", - "execution_count": 14, + "execution_count": 6, "metadata": {}, "outputs": [ { @@ -186,7 +197,7 @@ "set()" ] }, - "execution_count": 14, + "execution_count": 6, "metadata": {}, "output_type": "execute_result" } @@ -211,9 +222,16 @@ "preprocessor.preprocess()" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Run the next cell and wait until you see a message like `Built image gcr.io//fairing-job/mnist:<1234567>`." + ] + }, { "cell_type": "code", - "execution_count": 15, + "execution_count": 7, "metadata": {}, "outputs": [ { @@ -221,11 +239,11 @@ "output_type": "stream", "text": [ "Building image using cluster builder.\n", - "Creating docker context: /tmp/fairing_context_n8ikop1c\n", + "Creating docker context: /tmp/fairing_context_ohm2nlbv\n", "Dockerfile already exists in Fairing context, skipping...\n", - "Waiting for fairing-builder-nv9dh-2kwz9 to start...\n", - "Waiting for fairing-builder-nv9dh-2kwz9 to start...\n", - "Waiting for fairing-builder-nv9dh-2kwz9 to start...\n", + "Waiting for fairing-builder-9vw9w-ndbhd to start...\n", + "Waiting for fairing-builder-9vw9w-ndbhd to start...\n", + "Waiting for fairing-builder-9vw9w-ndbhd to start...\n", "Pod started running True\n" ] }, @@ -233,40 +251,42 @@ "name": "stdout", "output_type": "stream", "text": [ - "ERROR: logging before flag.Parse: E0212 21:28:24.488770 1 metadata.go:241] Failed to unmarshal scopes: invalid character 'h' looking for beginning of value\n", - "\u001b[36mINFO\u001b[0m[0002] Resolved base name tensorflow/tensorflow:1.15.2-py3 to tensorflow/tensorflow:1.15.2-py3\n", - "\u001b[36mINFO\u001b[0m[0002] Resolved base name tensorflow/tensorflow:1.15.2-py3 to tensorflow/tensorflow:1.15.2-py3\n", - 
"\u001b[36mINFO\u001b[0m[0002] Downloading base image tensorflow/tensorflow:1.15.2-py3\n", - "ERROR: logging before flag.Parse: E0212 21:28:24.983416 1 metadata.go:142] while reading 'google-dockercfg' metadata: http status code: 404 while fetching url http://metadata.google.internal./computeMetadata/v1/instance/attributes/google-dockercfg\n", - "ERROR: logging before flag.Parse: E0212 21:28:24.989996 1 metadata.go:159] while reading 'google-dockercfg-url' metadata: http status code: 404 while fetching url http://metadata.google.internal./computeMetadata/v1/instance/attributes/google-dockercfg-url\n", - "\u001b[36mINFO\u001b[0m[0002] Error while retrieving image from cache: getting file info: stat /cache/sha256:28b5f547969d70f825909c8fe06675ffc2959afe6079aeae754afa312f6417b9: no such file or directory\n", - "\u001b[36mINFO\u001b[0m[0002] Downloading base image tensorflow/tensorflow:1.15.2-py3\n", - "\u001b[36mINFO\u001b[0m[0003] Built cross stage deps: map[]\n", - "\u001b[36mINFO\u001b[0m[0003] Downloading base image tensorflow/tensorflow:1.15.2-py3\n", - "\u001b[36mINFO\u001b[0m[0003] Error while retrieving image from cache: getting file info: stat /cache/sha256:28b5f547969d70f825909c8fe06675ffc2959afe6079aeae754afa312f6417b9: no such file or directory\n", - "\u001b[36mINFO\u001b[0m[0003] Downloading base image tensorflow/tensorflow:1.15.2-py3\n", - "\u001b[36mINFO\u001b[0m[0003] Using files from context: [/kaniko/buildcontext/model.py]\n", - "\u001b[36mINFO\u001b[0m[0003] Checking for cached layer gcr.io/jlewi-dev/fairing-job/mnist/cache:6802122184979734f01a549e1224c5f46a277db894d4b3e749e41ad1ca522bdf...\n", - "\u001b[36mINFO\u001b[0m[0004] Using caching version of cmd: RUN chmod +x /opt/model.py\n", - "\u001b[36mINFO\u001b[0m[0004] Skipping unpacking as no commands require it.\n", - "\u001b[36mINFO\u001b[0m[0004] Taking snapshot of full filesystem...\n", - "\u001b[36mINFO\u001b[0m[0004] Using files from context: [/kaniko/buildcontext/model.py]\n", - 
"\u001b[36mINFO\u001b[0m[0004] ADD model.py /opt/model.py\n", - "\u001b[36mINFO\u001b[0m[0004] Taking snapshot of files...\n", - "\u001b[36mINFO\u001b[0m[0004] RUN chmod +x /opt/model.py\n", - "\u001b[36mINFO\u001b[0m[0004] Found cached layer, extracting to filesystem\n", - "\u001b[36mINFO\u001b[0m[0004] Taking snapshot of files...\n", - "\u001b[36mINFO\u001b[0m[0004] ENTRYPOINT [\"/usr/bin/python\"]\n", - "\u001b[36mINFO\u001b[0m[0004] No files changed in this command, skipping snapshotting.\n", - "\u001b[36mINFO\u001b[0m[0004] CMD [\"/opt/model.py\"]\n", - "\u001b[36mINFO\u001b[0m[0004] No files changed in this command, skipping snapshotting.\n" + "ERROR: logging before flag.Parse: E0226 02:34:42.750776 1 metadata.go:241] Failed to unmarshal scopes: invalid character 'h' looking for beginning of value\n", + "\u001b[36mINFO\u001b[0m[0004] Resolved base name tensorflow/tensorflow:1.15.2-py3 to tensorflow/tensorflow:1.15.2-py3\n", + "\u001b[36mINFO\u001b[0m[0004] Resolved base name tensorflow/tensorflow:1.15.2-py3 to tensorflow/tensorflow:1.15.2-py3\n", + "\u001b[36mINFO\u001b[0m[0004] Downloading base image tensorflow/tensorflow:1.15.2-py3\n", + "ERROR: logging before flag.Parse: E0226 02:34:44.230593 1 metadata.go:142] while reading 'google-dockercfg' metadata: http status code: 404 while fetching url http://metadata.google.internal./computeMetadata/v1/instance/attributes/google-dockercfg\n", + "ERROR: logging before flag.Parse: E0226 02:34:44.233477 1 metadata.go:159] while reading 'google-dockercfg-url' metadata: http status code: 404 while fetching url http://metadata.google.internal./computeMetadata/v1/instance/attributes/google-dockercfg-url\n", + "\u001b[36mINFO\u001b[0m[0004] Error while retrieving image from cache: getting file info: stat /cache/sha256:28b5f547969d70f825909c8fe06675ffc2959afe6079aeae754afa312f6417b9: no such file or directory\n", + "\u001b[36mINFO\u001b[0m[0004] Downloading base image tensorflow/tensorflow:1.15.2-py3\n", + 
"\u001b[36mINFO\u001b[0m[0005] Built cross stage deps: map[]\n", + "\u001b[36mINFO\u001b[0m[0005] Downloading base image tensorflow/tensorflow:1.15.2-py3\n", + "\u001b[36mINFO\u001b[0m[0005] Error while retrieving image from cache: getting file info: stat /cache/sha256:28b5f547969d70f825909c8fe06675ffc2959afe6079aeae754afa312f6417b9: no such file or directory\n", + "\u001b[36mINFO\u001b[0m[0005] Downloading base image tensorflow/tensorflow:1.15.2-py3\n", + "\u001b[36mINFO\u001b[0m[0005] Using files from context: [/kaniko/buildcontext/model.py]\n", + "\u001b[36mINFO\u001b[0m[0005] Checking for cached layer gcr.io/kubeflow-writers/fairing-job/mnist/cache:6802122184979734f01a549e1224c5f46a277db894d4b3e749e41ad1ca522bdf...\n", + "\u001b[36mINFO\u001b[0m[0006] No cached layer found for cmd RUN chmod +x /opt/model.py\n", + "\u001b[36mINFO\u001b[0m[0006] Unpacking rootfs as cmd RUN chmod +x /opt/model.py requires it.\n", + "\u001b[36mINFO\u001b[0m[0029] Taking snapshot of full filesystem...\n", + "\u001b[36mINFO\u001b[0m[0042] Using files from context: [/kaniko/buildcontext/model.py]\n", + "\u001b[36mINFO\u001b[0m[0042] ADD model.py /opt/model.py\n", + "\u001b[36mINFO\u001b[0m[0042] Taking snapshot of files...\n", + "\u001b[36mINFO\u001b[0m[0042] RUN chmod +x /opt/model.py\n", + "\u001b[36mINFO\u001b[0m[0042] cmd: /bin/sh\n", + "\u001b[36mINFO\u001b[0m[0042] args: [-c chmod +x /opt/model.py]\n", + "\u001b[36mINFO\u001b[0m[0042] Taking snapshot of full filesystem...\n", + "\u001b[36mINFO\u001b[0m[0045] ENTRYPOINT [\"/usr/bin/python\"]\n", + "\u001b[36mINFO\u001b[0m[0045] Pushing layer gcr.io/kubeflow-writers/fairing-job/mnist/cache:6802122184979734f01a549e1224c5f46a277db894d4b3e749e41ad1ca522bdf to cache now\n", + "\u001b[36mINFO\u001b[0m[0045] No files changed in this command, skipping snapshotting.\n", + "\u001b[36mINFO\u001b[0m[0045] CMD [\"/opt/model.py\"]\n", + "\u001b[36mINFO\u001b[0m[0045] No files changed in this command, skipping snapshotting.\n" ] }, { "name": 
"stderr", "output_type": "stream", "text": [ - "Built image gcr.io/jlewi-dev/fairing-job/mnist:24327351\n" + "Built image gcr.io/kubeflow-writers/fairing-job/mnist:8310D75B\n" ] } ], @@ -288,22 +308,23 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Create a GCS Bucket\n", + "## Create a Cloud Storage bucket\n", + "\n", + "Run the next cell to create a Google Cloud Storage (GCS) bucket to store your models and other results.\n", "\n", - "* Create a GCS bucket to store our models and other results.\n", - "* Since we are running in python we use the python client libraries but you could also use the `gsutil` command line" + "Since this notebook is running in Python, the cell uses the GCS Python client libraries, but you can use the `gsutil` command line instead." ] }, { "cell_type": "code", - "execution_count": 16, + "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ - "Bucket jlewi-dev-mnist already exists\n" + "Creating bucket kubeflow-writers-mnist\n" ] } ], @@ -327,12 +348,12 @@ "source": [ "## Distributed training\n", "\n", - "* We will train the model by using TFJob to run a distributed training job" + "To train the model, this example uses [TFJob](https://www.kubeflow.org/docs/components/training/tftraining/) to run a distributed training job. 
Run the next cell to set up the YAML specification for the job:" ] }, { "cell_type": "code", - "execution_count": 17, + "execution_count": 9, "metadata": {}, "outputs": [], "source": [ @@ -424,16 +445,17 @@ "source": [ "### Create the training job\n", "\n", - "* You could write the spec to a YAML file and then do `kubectl apply -f {FILE}`\n", - "* Since you are running in jupyter you will use the TFJob client\n", - "* You will run the TFJob in a namespace created by a Kubeflow profile\n", - " * The namespace will be the same namespace you are running the notebook in\n", - " * Creating a profile ensures the namespace is provisioned with service accounts and other resources needed for Kubeflow" + "To submit the training job, you could write the spec to a YAML file and then run `kubectl apply -f {FILE}`.\n", + "\n", + "However, because you are running in a Jupyter notebook, you use the TFJob client instead. \n", + "* You run the TFJob in a namespace created by a Kubeflow profile.\n", + "* The namespace is the same as the namespace where you are running the notebook.\n", + "* Creating a profile ensures that the namespace is provisioned with service accounts and other resources needed for Kubeflow." 
] }, { "cell_type": "code", - "execution_count": 18, + "execution_count": 10, "metadata": {}, "outputs": [], "source": [ @@ -442,14 +464,14 @@ }, { "cell_type": "code", - "execution_count": 21, + "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ - "TFJob kubeflow-jlewi.mnist-train-2c73 succeeded\n" + "Created job kubeflow-sarahmaddox.mnist-train-289e\n" ] } ], @@ -464,126 +486,125 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Check the job\n", + "### Check the job using kubectl\n", "\n", - "* Above you used the python SDK for TFJob to check the status\n", - "* You can also use kubectl get the status of your job\n", - "* The job conditions will tell you whether the job is running, succeeded or failed" + "Above you used the Python SDK for TFJob to check the status. You can also use kubectl to get the status of your job. \n", + "The job conditions tell you whether the job is running, has succeeded, or has failed." ] }, { "cell_type": "code", - "execution_count": 22, + "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "apiVersion: kubeflow.org/v1\n", - "kind: TFJob\n", - "metadata:\n", - " creationTimestamp: \"2020-02-12T21:28:31Z\"\n", - " generation: 1\n", - " name: mnist-train-2c73\n", - " namespace: kubeflow-jlewi\n", - " resourceVersion: \"1730369\"\n", - " selfLink: /apis/kubeflow.org/v1/namespaces/kubeflow-jlewi/tfjobs/mnist-train-2c73\n", - " uid: 9e27854c-4dde-11ea-9830-42010a8e016f\n", - "spec:\n", - " tfReplicaSpecs:\n", - " Chief:\n", - " replicas: 1\n", - " template:\n", - " metadata:\n", - " annotations:\n", - " sidecar.istio.io/inject: \"false\"\n", - " spec:\n", - " containers:\n", - " - command:\n", - " - python\n", - " - /opt/model.py\n", - " - --tf-model-dir=gs://jlewi-dev-mnist/mnist\n", - " - --tf-export-dir=gs://jlewi-dev-mnist/mnist/export\n", - " - --tf-train-steps=200\n", - " - --tf-batch-size=100\n", - " - 
--tf-learning-rate=0.01\n", - " image: gcr.io/jlewi-dev/fairing-job/mnist:24327351\n", - " name: tensorflow\n", - " workingDir: /opt\n", - " restartPolicy: OnFailure\n", - " serviceAccount: default-editor\n", - " Ps:\n", - " replicas: 1\n", - " template:\n", - " metadata:\n", - " annotations:\n", - " sidecar.istio.io/inject: \"false\"\n", - " spec:\n", - " containers:\n", - " - command:\n", - " - python\n", - " - /opt/model.py\n", - " - --tf-model-dir=gs://jlewi-dev-mnist/mnist\n", - " - --tf-export-dir=gs://jlewi-dev-mnist/mnist/export\n", - " - --tf-train-steps=200\n", - " - --tf-batch-size=100\n", - " - --tf-learning-rate=0.01\n", - " image: gcr.io/jlewi-dev/fairing-job/mnist:24327351\n", - " name: tensorflow\n", - " workingDir: /opt\n", - " restartPolicy: OnFailure\n", - " serviceAccount: default-editor\n", - " Worker:\n", - " replicas: 1\n", - " template:\n", - " metadata:\n", - " annotations:\n", - " sidecar.istio.io/inject: \"false\"\n", - " spec:\n", - " containers:\n", - " - command:\n", - " - python\n", - " - /opt/model.py\n", - " - --tf-model-dir=gs://jlewi-dev-mnist/mnist\n", - " - --tf-export-dir=gs://jlewi-dev-mnist/mnist/export\n", - " - --tf-train-steps=200\n", - " - --tf-batch-size=100\n", - " - --tf-learning-rate=0.01\n", - " image: gcr.io/jlewi-dev/fairing-job/mnist:24327351\n", - " name: tensorflow\n", - " workingDir: /opt\n", - " restartPolicy: OnFailure\n", - " serviceAccount: default-editor\n", - "status:\n", - " completionTime: \"2020-02-12T21:28:53Z\"\n", - " conditions:\n", - " - lastTransitionTime: \"2020-02-12T21:28:31Z\"\n", - " lastUpdateTime: \"2020-02-12T21:28:31Z\"\n", - " message: TFJob mnist-train-2c73 is created.\n", - " reason: TFJobCreated\n", - " status: \"True\"\n", - " type: Created\n", - " - lastTransitionTime: \"2020-02-12T21:28:34Z\"\n", - " lastUpdateTime: \"2020-02-12T21:28:34Z\"\n", - " message: TFJob mnist-train-2c73 is running.\n", - " reason: TFJobRunning\n", - " status: \"False\"\n", - " type: Running\n", - " - 
lastTransitionTime: \"2020-02-12T21:28:53Z\"\n", - " lastUpdateTime: \"2020-02-12T21:28:53Z\"\n", - " message: TFJob mnist-train-2c73 successfully completed.\n", - " reason: TFJobSucceeded\n", - " status: \"True\"\n", - " type: Succeeded\n", - " replicaStatuses:\n", - " Chief:\n", - " succeeded: 1\n", - " PS:\n", - " succeeded: 1\n", - " Worker:\n", - " succeeded: 1\n", - " startTime: \"2020-02-12T21:28:32Z\"\n" + "apiVersion: kubeflow.org/v1\r\n", + "kind: TFJob\r\n", + "metadata:\r\n", + " creationTimestamp: \"2020-02-26T02:58:32Z\"\r\n", + " generation: 1\r\n", + " name: mnist-train-289e\r\n", + " namespace: kubeflow-sarahmaddox\r\n", + " resourceVersion: \"770252\"\r\n", + " selfLink: /apis/kubeflow.org/v1/namespaces/kubeflow-sarahmaddox/tfjobs/mnist-train-289e\r\n", + " uid: dfa23ecf-5843-11ea-9ddf-42010a80013f\r\n", + "spec:\r\n", + " tfReplicaSpecs:\r\n", + " Chief:\r\n", + " replicas: 1\r\n", + " template:\r\n", + " metadata:\r\n", + " annotations:\r\n", + " sidecar.istio.io/inject: \"false\"\r\n", + " spec:\r\n", + " containers:\r\n", + " - command:\r\n", + " - python\r\n", + " - /opt/model.py\r\n", + " - --tf-model-dir=gs://kubeflow-writers-mnist/mnist\r\n", + " - --tf-export-dir=gs://kubeflow-writers-mnist/mnist/export\r\n", + " - --tf-train-steps=200\r\n", + " - --tf-batch-size=100\r\n", + " - --tf-learning-rate=0.01\r\n", + " image: gcr.io/kubeflow-writers/fairing-job/mnist:8310D75B\r\n", + " name: tensorflow\r\n", + " workingDir: /opt\r\n", + " restartPolicy: OnFailure\r\n", + " serviceAccount: default-editor\r\n", + " Ps:\r\n", + " replicas: 1\r\n", + " template:\r\n", + " metadata:\r\n", + " annotations:\r\n", + " sidecar.istio.io/inject: \"false\"\r\n", + " spec:\r\n", + " containers:\r\n", + " - command:\r\n", + " - python\r\n", + " - /opt/model.py\r\n", + " - --tf-model-dir=gs://kubeflow-writers-mnist/mnist\r\n", + " - --tf-export-dir=gs://kubeflow-writers-mnist/mnist/export\r\n", + " - --tf-train-steps=200\r\n", + " - --tf-batch-size=100\r\n", + 
" - --tf-learning-rate=0.01\r\n", + " image: gcr.io/kubeflow-writers/fairing-job/mnist:8310D75B\r\n", + " name: tensorflow\r\n", + " workingDir: /opt\r\n", + " restartPolicy: OnFailure\r\n", + " serviceAccount: default-editor\r\n", + " Worker:\r\n", + " replicas: 1\r\n", + " template:\r\n", + " metadata:\r\n", + " annotations:\r\n", + " sidecar.istio.io/inject: \"false\"\r\n", + " spec:\r\n", + " containers:\r\n", + " - command:\r\n", + " - python\r\n", + " - /opt/model.py\r\n", + " - --tf-model-dir=gs://kubeflow-writers-mnist/mnist\r\n", + " - --tf-export-dir=gs://kubeflow-writers-mnist/mnist/export\r\n", + " - --tf-train-steps=200\r\n", + " - --tf-batch-size=100\r\n", + " - --tf-learning-rate=0.01\r\n", + " image: gcr.io/kubeflow-writers/fairing-job/mnist:8310D75B\r\n", + " name: tensorflow\r\n", + " workingDir: /opt\r\n", + " restartPolicy: OnFailure\r\n", + " serviceAccount: default-editor\r\n", + "status:\r\n", + " completionTime: \"2020-02-26T02:59:58Z\"\r\n", + " conditions:\r\n", + " - lastTransitionTime: \"2020-02-26T02:58:32Z\"\r\n", + " lastUpdateTime: \"2020-02-26T02:58:32Z\"\r\n", + " message: TFJob mnist-train-289e is created.\r\n", + " reason: TFJobCreated\r\n", + " status: \"True\"\r\n", + " type: Created\r\n", + " - lastTransitionTime: \"2020-02-26T02:58:35Z\"\r\n", + " lastUpdateTime: \"2020-02-26T02:58:35Z\"\r\n", + " message: TFJob mnist-train-289e is running.\r\n", + " reason: TFJobRunning\r\n", + " status: \"False\"\r\n", + " type: Running\r\n", + " - lastTransitionTime: \"2020-02-26T02:59:58Z\"\r\n", + " lastUpdateTime: \"2020-02-26T02:59:58Z\"\r\n", + " message: TFJob mnist-train-289e successfully completed.\r\n", + " reason: TFJobSucceeded\r\n", + " status: \"True\"\r\n", + " type: Succeeded\r\n", + " replicaStatuses:\r\n", + " Chief:\r\n", + " succeeded: 1\r\n", + " PS:\r\n", + " succeeded: 1\r\n", + " Worker:\r\n", + " succeeded: 1\r\n", + " startTime: \"2020-02-26T02:58:32Z\"\r\n" ] } ], @@ -595,30 +616,29 @@ "cell_type": "markdown", 
"metadata": {}, "source": [ - "## Get The Logs\n", + "### Get the training logs\n", "\n", - "* There are two ways to get the logs for the training job\n", + "* There are two ways to get the logs for the training job:\n", "\n", - " 1. Using kubectl to fetch the pod logs\n", - " * These logs are ephemeral; they will be unavailable when the pod is garbage collected to free up resources\n", - " 1. Using stackdriver\n", + " * Using kubectl to fetch the pod logs. These logs are ephemeral; they will be unavailable when the pod is garbage collected to free up resources.\n", + " * Using Stackdriver.\n", " \n", - " * Kubernetes logs are automatically available in stackdriver\n", - " * You can use labels to locate logs for a specific pod\n", - " * In the cell below you use labels for the training job name and process type to locate the logs for a specific pod\n", + " * Kubernetes logs are automatically available in Stackdriver.\n", + " * You can use labels to locate the logs for a specific pod.\n", + " * In the cell below, you use labels for the training job name and process type to locate the logs for a specific pod.\n", " \n", - "* Run the cell below to get a link to stackdriver for your logs" + "* Run the cell below to get a link to Stackdriver for your logs:" ] }, { "cell_type": "code", - "execution_count": 23, + "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ - "Link to: chief logs" + "Link to: chief logs" ], "text/plain": [ "" @@ -630,7 +650,7 @@ { "data": { "text/html": [ - "Link to: worker logs" + "Link to: worker logs" ], "text/plain": [ "" @@ -642,7 +662,7 @@ { "data": { "text/html": [ - "Link to: ps logs" + "Link to: ps logs" ], "text/plain": [ "" @@ -679,13 +699,14 @@ "source": [ "## Deploy TensorBoard\n", "\n", - "* You will create a Kubernetes Deployment to run TensorBoard\n", - "* TensorBoard will be accessible behind the Kubeflow IAP endpoint" + "The next step is to create a Kubernetes deployment to run TensorBoard.\n", + "\n", + 
"TensorBoard will be accessible behind the Kubeflow IAP endpoint." ] }, { "cell_type": "code", - "execution_count": 24, + "execution_count": 14, "metadata": {}, "outputs": [], "source": [ @@ -764,20 +785,17 @@ }, { "cell_type": "code", - "execution_count": 25, + "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ - "/home/jovyan/git_kubeflow-examples/mnist/k8s_util.py:55: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.\n", + "/home/jovyan/examples/mnist/k8s_util.py:55: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.\n", " spec = yaml.load(spec)\n", - "Deleted Deployment kubeflow-jlewi.mnist-tensorboard\n", - "Created Deployment kubeflow-jlewi.mnist-tensorboard\n", - "Deleted Service kubeflow-jlewi.mnist-tensorboard\n", - "Created Service kubeflow-jlewi.mnist-tensorboard\n", - "Deleted VirtualService kubeflow-jlewi.mnist-tensorboard\n", + "Created Deployment kubeflow-sarahmaddox.mnist-tensorboard\n", + "Created Service kubeflow-sarahmaddox.mnist-tensorboard\n", "Created VirtualService mnist-tensorboard.mnist-tensorboard\n" ] }, @@ -788,7 +806,7 @@ " 'kind': 'Deployment',\n", " 'metadata': {'annotations': None,\n", " 'cluster_name': None,\n", - " 'creation_timestamp': datetime.datetime(2020, 2, 12, 21, 30, 38, tzinfo=tzlocal()),\n", + " 'creation_timestamp': datetime.datetime(2020, 2, 26, 3, 20, 4, tzinfo=tzlocal()),\n", " 'deletion_grace_period_seconds': None,\n", " 'deletion_timestamp': None,\n", " 'finalizers': None,\n", @@ -798,11 +816,11 @@ " 'labels': {'app': 'mnist-tensorboard'},\n", " 'managed_fields': None,\n", " 'name': 'mnist-tensorboard',\n", - " 'namespace': 'kubeflow-jlewi',\n", + " 'namespace': 'kubeflow-sarahmaddox',\n", " 'owner_references': None,\n", - " 
'resource_version': '1731593',\n", - " 'self_link': '/apis/apps/v1/namespaces/kubeflow-jlewi/deployments/mnist-tensorboard',\n", - " 'uid': 'e9750d8b-4dde-11ea-9830-42010a8e016f'},\n", + " 'resource_version': '782392',\n", + " 'self_link': '/apis/apps/v1/namespaces/kubeflow-sarahmaddox/deployments/mnist-tensorboard',\n", + " 'uid': 'e1d50153-5846-11ea-9ddf-42010a80013f'},\n", " 'spec': {'min_ready_seconds': None,\n", " 'paused': None,\n", " 'progress_deadline_seconds': 600,\n", @@ -836,7 +854,7 @@ " 'automount_service_account_token': None,\n", " 'containers': [{'args': None,\n", " 'command': ['/usr/local/bin/tensorboard',\n", - " '--logdir=gs://jlewi-dev-mnist/mnist',\n", + " '--logdir=gs://kubeflow-writers-mnist/mnist',\n", " '--port=80'],\n", " 'env': None,\n", " 'env_from': None,\n", @@ -905,7 +923,7 @@ " 'kind': 'Service',\n", " 'metadata': {'annotations': None,\n", " 'cluster_name': None,\n", - " 'creation_timestamp': datetime.datetime(2020, 2, 12, 21, 30, 38, tzinfo=tzlocal()),\n", + " 'creation_timestamp': datetime.datetime(2020, 2, 26, 3, 20, 4, tzinfo=tzlocal()),\n", " 'deletion_grace_period_seconds': None,\n", " 'deletion_timestamp': None,\n", " 'finalizers': None,\n", @@ -915,12 +933,12 @@ " 'labels': {'app': 'mnist-tensorboard'},\n", " 'managed_fields': None,\n", " 'name': 'mnist-tensorboard',\n", - " 'namespace': 'kubeflow-jlewi',\n", + " 'namespace': 'kubeflow-sarahmaddox',\n", " 'owner_references': None,\n", - " 'resource_version': '1731608',\n", - " 'self_link': '/api/v1/namespaces/kubeflow-jlewi/services/mnist-tensorboard',\n", - " 'uid': 'e98fa09f-4dde-11ea-9830-42010a8e016f'},\n", - " 'spec': {'cluster_ip': '10.55.245.113',\n", + " 'resource_version': '782395',\n", + " 'self_link': '/api/v1/namespaces/kubeflow-sarahmaddox/services/mnist-tensorboard',\n", + " 'uid': 'e1d7b041-5846-11ea-9ddf-42010a80013f'},\n", + " 'spec': {'cluster_ip': '10.35.253.170',\n", " 'external_i_ps': None,\n", " 'external_name': None,\n", " 'external_traffic_policy': 
None,\n", @@ -939,23 +957,23 @@ " 'type': 'ClusterIP'},\n", " 'status': {'load_balancer': {'ingress': None}}}, {'apiVersion': 'networking.istio.io/v1alpha3',\n", " 'kind': 'VirtualService',\n", - " 'metadata': {'creationTimestamp': '2020-02-12T21:30:38Z',\n", + " 'metadata': {'creationTimestamp': '2020-02-26T03:20:04Z',\n", " 'generation': 1,\n", " 'name': 'mnist-tensorboard',\n", - " 'namespace': 'kubeflow-jlewi',\n", - " 'resourceVersion': '1731612',\n", - " 'selfLink': '/apis/networking.istio.io/v1alpha3/namespaces/kubeflow-jlewi/virtualservices/mnist-tensorboard',\n", - " 'uid': 'e99c4909-4dde-11ea-9830-42010a8e016f'},\n", + " 'namespace': 'kubeflow-sarahmaddox',\n", + " 'resourceVersion': '782396',\n", + " 'selfLink': '/apis/networking.istio.io/v1alpha3/namespaces/kubeflow-sarahmaddox/virtualservices/mnist-tensorboard',\n", + " 'uid': 'e1daadfe-5846-11ea-9ddf-42010a80013f'},\n", " 'spec': {'gateways': ['kubeflow/kubeflow-gateway'],\n", " 'hosts': ['*'],\n", - " 'http': [{'match': [{'uri': {'prefix': '/mnist/kubeflow-jlewi/tensorboard/'}}],\n", + " 'http': [{'match': [{'uri': {'prefix': '/mnist/kubeflow-sarahmaddox/tensorboard/'}}],\n", " 'rewrite': {'uri': '/'},\n", - " 'route': [{'destination': {'host': 'mnist-tensorboard.kubeflow-jlewi.svc.cluster.local',\n", + " 'route': [{'destination': {'host': 'mnist-tensorboard.kubeflow-sarahmaddox.svc.cluster.local',\n", " 'port': {'number': 80}}}],\n", " 'timeout': '300s'}]}}]" ] }, - "execution_count": 25, + "execution_count": 15, "metadata": {}, "output_type": "execute_result" } @@ -968,18 +986,51 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Access The TensorBoard UI" + "## Set a variable defining your endpoint\n", + "\n", + "Set `endpoint` to `https://your-domain` (with no slash at the end). Your domain typically has the following pattern: `.endpoints..cloud.goog`. You can see your domain in the URL that you're using to access this notebook." 
] }, { "cell_type": "code", - "execution_count": 26, + "execution_count": 36, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "endpoint set to https://sarahmaddox-kfw-v100rc4.endpoints.kubeflow-writers.cloud.goog\n" + ] + } + ], + "source": [ + "endpoint = None\n", + "\n", + "if endpoint:\n", + " logging.info(f\"endpoint set to {endpoint}\")\n", + "else:\n", + " logging.info(\"Warning: You must set `endpoint` in order to print out the URLs where you can access your web apps.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Access the TensorBoard UI\n", + "\n", + "Run the cell below to find the endpoint for the TensorBoard UI." + ] + }, + { + "cell_type": "code", + "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/html": [ - "TensorBoard UI is at https://kf-v1-0210.endpoints.jlewi-dev.cloud.goog/mnist/kubeflow-jlewi/tensorboard/" + "TensorBoard UI is at https://sarahmaddox-kfw-v100rc4.endpoints.kubeflow-writers.cloud.goog/mnist/kubeflow-sarahmaddox/tensorboard/" ], "text/plain": [ "" @@ -990,7 +1041,6 @@ } ], "source": [ - "endpoint = k8s_util.get_iap_endpoint() \n", "if endpoint: \n", " vs = yaml.safe_load(tb_virtual_service)\n", " path= vs[\"spec\"][\"http\"][0][\"match\"][0][\"uri\"][\"prefix\"]\n", @@ -1002,21 +1052,29 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Wait For the Training Job to finish" + "## Wait for the training job to finish" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "* You can use the TFJob client to wait for it to finish." 
+ "You can use the TFJob client to wait for the job to finish:" ] }, { "cell_type": "code", - "execution_count": 27, + "execution_count": 18, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "TFJob kubeflow-sarahmaddox.mnist-train-289e succeeded\n" + ] + } + ], "source": [ "tf_job = tf_job_client.wait_for_condition(train_name, expected_condition=[\"Succeeded\", \"Failed\"], namespace=namespace)\n", "\n", @@ -1037,24 +1095,25 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "* Deploy the model using tensorflow serving\n", - "* We need to create\n", - " 1. A Kubernetes Deployment\n", - " 1. A Kubernetes service\n", - " 1. (Optional) Create a configmap containing the prometheus monitoring config" + "Now you can deploy the model using [TensorFlow Serving](https://www.kubeflow.org/docs/components/serving/tfserving_new/).\n", + "\n", + "You need to create the following:\n", + "* A Kubernetes deployment.\n", + "* A Kubernetes service.\n", + "* (Optional) A configmap containing the Prometheus monitoring configuration." 
] }, { "cell_type": "code", - "execution_count": 28, + "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "deploy_name = \"mnist-model\"\n", "model_base_path = export_path\n", "\n", - "# The web ui defaults to mnist-service so if you change it you will\n", - "# need to change it in the UI as well to send predictions to the mode\n", + "# The web UI defaults to mnist-service so if you change the name, you must\n", + "# change it in the UI as well.\n", "model_service = \"mnist-service\"\n", "\n", "deploy_spec = f\"\"\"apiVersion: apps/v1\n", @@ -1162,19 +1221,16 @@ }, { "cell_type": "code", - "execution_count": 29, + "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ - "Deleted Deployment kubeflow-jlewi.mnist-model\n", - "Created Deployment kubeflow-jlewi.mnist-model\n", - "Deleted Service kubeflow-jlewi.mnist-service\n", - "Created Service kubeflow-jlewi.mnist-service\n", - "Deleted ConfigMap kubeflow-jlewi.mnist-model\n", - "Created ConfigMap kubeflow-jlewi.mnist-model\n" + "Created Deployment kubeflow-sarahmaddox.mnist-model\n", + "Created Service kubeflow-sarahmaddox.mnist-service\n", + "Created ConfigMap kubeflow-sarahmaddox.mnist-model\n" ] }, { @@ -1184,7 +1240,7 @@ " 'kind': 'Deployment',\n", " 'metadata': {'annotations': None,\n", " 'cluster_name': None,\n", - " 'creation_timestamp': datetime.datetime(2020, 2, 12, 21, 30, 38, tzinfo=tzlocal()),\n", + " 'creation_timestamp': datetime.datetime(2020, 2, 26, 3, 30, 28, tzinfo=tzlocal()),\n", " 'deletion_grace_period_seconds': None,\n", " 'deletion_timestamp': None,\n", " 'finalizers': None,\n", @@ -1194,11 +1250,11 @@ " 'labels': {'app': 'mnist'},\n", " 'managed_fields': None,\n", " 'name': 'mnist-model',\n", - " 'namespace': 'kubeflow-jlewi',\n", + " 'namespace': 'kubeflow-sarahmaddox',\n", " 'owner_references': None,\n", - " 'resource_version': '1731617',\n", - " 'self_link': 
'/apis/apps/v1/namespaces/kubeflow-jlewi/deployments/mnist-model',\n", - " 'uid': 'e9add65c-4dde-11ea-9830-42010a8e016f'},\n", + " 'resource_version': '788910',\n", + " 'self_link': '/apis/apps/v1/namespaces/kubeflow-sarahmaddox/deployments/mnist-model',\n", + " 'uid': '5555d458-5848-11ea-9ddf-42010a80013f'},\n", " 'spec': {'min_ready_seconds': None,\n", " 'paused': None,\n", " 'progress_deadline_seconds': 600,\n", @@ -1233,11 +1289,11 @@ " 'containers': [{'args': ['--port=9000',\n", " '--rest_api_port=8500',\n", " '--model_name=mnist',\n", - " '--model_base_path=gs://jlewi-dev-mnist/mnist/export',\n", + " '--model_base_path=gs://kubeflow-writers-mnist/mnist/export',\n", " '--monitoring_config_file=/var/config/monitoring_config.txt'],\n", " 'command': ['/usr/bin/tensorflow_model_server'],\n", " 'env': [{'name': 'modelBasePath',\n", - " 'value': 'gs://jlewi-dev-mnist/mnist/export',\n", + " 'value': 'gs://kubeflow-writers-mnist/mnist/export',\n", " 'value_from': None}],\n", " 'env_from': None,\n", " 'image': 'tensorflow/serving:1.15.0',\n", @@ -1358,7 +1414,7 @@ " 'prometheus.io/port': '8500',\n", " 'prometheus.io/scrape': 'true'},\n", " 'cluster_name': None,\n", - " 'creation_timestamp': datetime.datetime(2020, 2, 12, 21, 30, 38, tzinfo=tzlocal()),\n", + " 'creation_timestamp': datetime.datetime(2020, 2, 26, 3, 30, 28, tzinfo=tzlocal()),\n", " 'deletion_grace_period_seconds': None,\n", " 'deletion_timestamp': None,\n", " 'finalizers': None,\n", @@ -1368,12 +1424,12 @@ " 'labels': {'app': 'mnist-model'},\n", " 'managed_fields': None,\n", " 'name': 'mnist-service',\n", - " 'namespace': 'kubeflow-jlewi',\n", + " 'namespace': 'kubeflow-sarahmaddox',\n", " 'owner_references': None,\n", - " 'resource_version': '1731639',\n", - " 'self_link': '/api/v1/namespaces/kubeflow-jlewi/services/mnist-service',\n", - " 'uid': 'e9dcfd8c-4dde-11ea-9830-42010a8e016f'},\n", - " 'spec': {'cluster_ip': '10.55.250.62',\n", + " 'resource_version': '788913',\n", + " 'self_link': 
'/api/v1/namespaces/kubeflow-sarahmaddox/services/mnist-service',\n", + " 'uid': '555d8fc0-5848-11ea-9ddf-42010a80013f'},\n", + " 'spec': {'cluster_ip': '10.35.254.103',\n", " 'external_i_ps': None,\n", " 'external_name': None,\n", " 'external_traffic_policy': None,\n", @@ -1404,7 +1460,7 @@ " 'kind': 'ConfigMap',\n", " 'metadata': {'annotations': None,\n", " 'cluster_name': None,\n", - " 'creation_timestamp': datetime.datetime(2020, 2, 12, 21, 30, 39, tzinfo=tzlocal()),\n", + " 'creation_timestamp': datetime.datetime(2020, 2, 26, 3, 30, 28, tzinfo=tzlocal()),\n", " 'deletion_grace_period_seconds': None,\n", " 'deletion_timestamp': None,\n", " 'finalizers': None,\n", @@ -1414,14 +1470,14 @@ " 'labels': None,\n", " 'managed_fields': None,\n", " 'name': 'mnist-model',\n", - " 'namespace': 'kubeflow-jlewi',\n", + " 'namespace': 'kubeflow-sarahmaddox',\n", " 'owner_references': None,\n", - " 'resource_version': '1731646',\n", - " 'self_link': '/api/v1/namespaces/kubeflow-jlewi/configmaps/mnist-model',\n", - " 'uid': 'e9eeb2f4-4dde-11ea-9830-42010a8e016f'}}]" + " 'resource_version': '788914',\n", + " 'self_link': '/api/v1/namespaces/kubeflow-sarahmaddox/configmaps/mnist-model',\n", + " 'uid': '5560bb37-5848-11ea-9ddf-42010a80013f'}}]" ] }, - "execution_count": 29, + "execution_count": 20, "metadata": {}, "output_type": "execute_result" } @@ -1434,15 +1490,16 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Deploy the mnist UI\n", + "## Deploy the UI for the MNIST web app\n", + "\n", + "Deploy the UI to visualize the MNIST prediction results.\n", "\n", - "* We will now deploy the UI to visual the mnist results\n", - "* Note: This is using a prebuilt and public docker image for the UI" + "This example uses a prebuilt and public Docker image for the UI." 
] }, { "cell_type": "code", - "execution_count": 30, + "execution_count": 21, "metadata": {}, "outputs": [], "source": [ @@ -1515,18 +1572,15 @@ }, { "cell_type": "code", - "execution_count": 31, + "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ - "Deleted Deployment kubeflow-jlewi.mnist-ui\n", - "Created Deployment kubeflow-jlewi.mnist-ui\n", - "Deleted Service kubeflow-jlewi.mnist-ui\n", - "Created Service kubeflow-jlewi.mnist-ui\n", - "Deleted VirtualService kubeflow-jlewi.mnist-ui\n", + "Created Deployment kubeflow-sarahmaddox.mnist-ui\n", + "Created Service kubeflow-sarahmaddox.mnist-ui\n", "Created VirtualService mnist-ui.mnist-ui\n" ] }, @@ -1537,7 +1591,7 @@ " 'kind': 'Deployment',\n", " 'metadata': {'annotations': None,\n", " 'cluster_name': None,\n", - " 'creation_timestamp': datetime.datetime(2020, 2, 12, 21, 30, 39, tzinfo=tzlocal()),\n", + " 'creation_timestamp': datetime.datetime(2020, 2, 26, 3, 32, 29, tzinfo=tzlocal()),\n", " 'deletion_grace_period_seconds': None,\n", " 'deletion_timestamp': None,\n", " 'finalizers': None,\n", @@ -1547,11 +1601,11 @@ " 'labels': None,\n", " 'managed_fields': None,\n", " 'name': 'mnist-ui',\n", - " 'namespace': 'kubeflow-jlewi',\n", + " 'namespace': 'kubeflow-sarahmaddox',\n", " 'owner_references': None,\n", - " 'resource_version': '1731648',\n", - " 'self_link': '/apis/apps/v1/namespaces/kubeflow-jlewi/deployments/mnist-ui',\n", - " 'uid': 'e9f77ba8-4dde-11ea-9830-42010a8e016f'},\n", + " 'resource_version': '790203',\n", + " 'self_link': '/apis/apps/v1/namespaces/kubeflow-sarahmaddox/deployments/mnist-ui',\n", + " 'uid': '9d846bf6-5848-11ea-9ddf-42010a80013f'},\n", " 'spec': {'min_ready_seconds': None,\n", " 'paused': None,\n", " 'progress_deadline_seconds': 600,\n", @@ -1651,7 +1705,7 @@ " 'kind': 'Service',\n", " 'metadata': {'annotations': None,\n", " 'cluster_name': None,\n", - " 'creation_timestamp': datetime.datetime(2020, 2, 12, 21, 30, 39, 
tzinfo=tzlocal()),\n", + " 'creation_timestamp': datetime.datetime(2020, 2, 26, 3, 32, 29, tzinfo=tzlocal()),\n", " 'deletion_grace_period_seconds': None,\n", " 'deletion_timestamp': None,\n", " 'finalizers': None,\n", @@ -1661,12 +1715,12 @@ " 'labels': None,\n", " 'managed_fields': None,\n", " 'name': 'mnist-ui',\n", - " 'namespace': 'kubeflow-jlewi',\n", + " 'namespace': 'kubeflow-sarahmaddox',\n", " 'owner_references': None,\n", - " 'resource_version': '1731664',\n", - " 'self_link': '/api/v1/namespaces/kubeflow-jlewi/services/mnist-ui',\n", - " 'uid': 'ea12ef25-4dde-11ea-9830-42010a8e016f'},\n", - " 'spec': {'cluster_ip': '10.55.250.134',\n", + " 'resource_version': '790209',\n", + " 'self_link': '/api/v1/namespaces/kubeflow-sarahmaddox/services/mnist-ui',\n", + " 'uid': '9d8a67e4-5848-11ea-9ddf-42010a80013f'},\n", + " 'spec': {'cluster_ip': '10.35.244.4',\n", " 'external_i_ps': None,\n", " 'external_name': None,\n", " 'external_traffic_policy': None,\n", @@ -1685,23 +1739,23 @@ " 'type': 'ClusterIP'},\n", " 'status': {'load_balancer': {'ingress': None}}}, {'apiVersion': 'networking.istio.io/v1alpha3',\n", " 'kind': 'VirtualService',\n", - " 'metadata': {'creationTimestamp': '2020-02-12T21:30:39Z',\n", + " 'metadata': {'creationTimestamp': '2020-02-26T03:32:29Z',\n", " 'generation': 1,\n", " 'name': 'mnist-ui',\n", - " 'namespace': 'kubeflow-jlewi',\n", - " 'resourceVersion': '1731676',\n", - " 'selfLink': '/apis/networking.istio.io/v1alpha3/namespaces/kubeflow-jlewi/virtualservices/mnist-ui',\n", - " 'uid': 'ea2ac046-4dde-11ea-9830-42010a8e016f'},\n", + " 'namespace': 'kubeflow-sarahmaddox',\n", + " 'resourceVersion': '790211',\n", + " 'selfLink': '/apis/networking.istio.io/v1alpha3/namespaces/kubeflow-sarahmaddox/virtualservices/mnist-ui',\n", + " 'uid': '9d921512-5848-11ea-9ddf-42010a80013f'},\n", " 'spec': {'gateways': ['kubeflow/kubeflow-gateway'],\n", " 'hosts': ['*'],\n", - " 'http': [{'match': [{'uri': {'prefix': '/mnist/kubeflow-jlewi/ui/'}}],\n", + " 
'http': [{'match': [{'uri': {'prefix': '/mnist/kubeflow-sarahmaddox/ui/'}}],\n", " 'rewrite': {'uri': '/'},\n", - " 'route': [{'destination': {'host': 'mnist-ui.kubeflow-jlewi.svc.cluster.local',\n", + " 'route': [{'destination': {'host': 'mnist-ui.kubeflow-sarahmaddox.svc.cluster.local',\n", " 'port': {'number': 80}}}],\n", " 'timeout': '300s'}]}}]" ] }, - "execution_count": 31, + "execution_count": 22, "metadata": {}, "output_type": "execute_result" } @@ -1715,39 +1769,26 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Access the web UI\n", + "## Access the MNIST web UI\n", "\n", - "* A reverse proxy route is automatically added to the Kubeflow IAP endpoint\n", - "* The endpoint will be\n", + "A reverse proxy route is automatically added to the Kubeflow IAP endpoint. The MNIST endpoint is:\n", "\n", " ```\n", - " http:/${KUBEflOW_ENDPOINT}/mnist/${NAMESPACE}/ui/ \n", - " ```kubeflow-jlewi\n", - "* You can get the KUBEFLOW_ENDPOINT\n", - "\n", - " ```\n", - " KUBEfLOW_ENDPOINT=`kubectl -n istio-system get ingress envoy-ingress -o jsonpath=\"{.spec.rules[0].host}\"`\n", + " https://${KUBEFLOW_ENDPOINT}/mnist/${NAMESPACE}/ui/ \n", " ```\n", " \n", - " * You must run this command with sufficient RBAC permissions to get the ingress.\n", - " \n", - "* If you have sufficient privileges you can run the cell below to get the endpoint if you don't have sufficient priveleges you can \n", - " grant appropriate permissions by running the command\n", - " \n", - " ```\n", - " kubectl create --namespace=istio-system rolebinding --clusterrole=kubeflow-view --serviceaccount=${NAMESPACE}:default-editor ${NAMESPACE}-istio-view\n", - " ```" + "where `NAMESPACE` is the namespace where you're running the Jupyter notebook." 
] }, { "cell_type": "code", - "execution_count": 32, + "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/html": [ - "mnist UI is at https://kf-v1-0210.endpoints.jlewi-dev.cloud.goog/mnist/kubeflow-jlewi/ui/" + "mnist UI is at https://sarahmaddox-kfw-v100rc4.endpoints.kubeflow-writers.cloud.goog/mnist/kubeflow-sarahmaddox/ui/" ], "text/plain": [ "" @@ -1758,13 +1799,37 @@ } ], "source": [ - "endpoint = k8s_util.get_iap_endpoint() \n", "if endpoint: \n", " vs = yaml.safe_load(ui_virtual_service)\n", " path= vs[\"spec\"][\"http\"][0][\"match\"][0][\"uri\"][\"prefix\"]\n", " ui_endpoint = endpoint + path\n", " display(HTML(f\"mnist UI is at {ui_endpoint}\"))\n" ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Open the MNIST UI in your browser. You should see an image of a hand-written digit from 0 to 9. This is a random image sent to the model for classification. Below the image is a set of bar graphs, one for each classification label from 0 to 9, as output by the model. Each bar represents the probability that the image matches the respective label. \n", + "\n", + "Click the **test random image** button to send the model a new image." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Next steps\n", + "\n", + "Visit the [Kubeflow docs](https://www.kubeflow.org/docs/gke/) for more information about running Kubeflow on GCP." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] } ], "metadata": {