GSoC '20: Samples for Notebook to Kubeflow Deployment using TensorFlow 2.0 Keras (kubeflow#816)

* Add text classification example in TF 2.0
* Add neural machine translation example in TF 2.0
* Add README for tensorflow_2 directory
* Update README.md to make it more verbose and formal
* Update directory name
1 parent d918d6c · commit 31a4d5e
Showing 30 changed files with 133,191 additions and 2 deletions.
# Samples for Notebook to Kubeflow Deployment using TensorFlow 2.0 Keras

This directory hosts Kubeflow use-case samples built around TensorFlow 2.0 Keras training code, demonstrating the 'Customer User Journey' (CUJ) in the process. The samples hosted here are described below; both demonstrate Kubeflow functionality on NLP tasks. These samples assume you have a Kubeflow instance running. For more information on how to set up Kubeflow, please follow this getting-started [tutorial](https://www.kubeflow.org/docs/started/getting-started/). They also assume Jupyter notebooks are integrated with your Kubeflow instance; for an overview of Jupyter notebooks in Kubeflow and instructions on setting them up, please go through this [documentation](https://www.kubeflow.org/docs/notebooks/).
### Text Classification

This [TensorFlow tutorial](https://www.tensorflow.org/tutorials/keras/text_classification_with_hub) explains how to classify IMDB movie reviews with TensorFlow. We take the code from that tutorial, adapt it to Kubeflow, and add Kubeflow code to demonstrate how Kubeflow can leverage containerization and cloud technologies to efficiently manage machine learning workflows that take advantage of multiple compute nodes. This sample contains the following files; a description of each is given next to its name.
1. `text_classification_with_rnn.py` - This is the core training code upon which all subsequent examples showing Kubeflow functionality are based. Please go through it first to understand the machine learning task the subsequent notebooks manage.
2. `distributed_text_classification_with_rnn.py` - To truly take advantage of multiple compute nodes, the training code has to be modified to support distributed training. This file contains the code above, modified with TensorFlow's [distributed training](https://www.tensorflow.org/guide/distributed_training) strategy; a minimal sketch of the idea follows this list.
3. `Dockerfile` - This is the Dockerfile used to build a Docker image of the training code. Some Kubeflow functionality requires that a Docker image of the training code be built and hosted in a container registry. This Docker 101 [tutorial](https://www.docker.com/101-tutorial) is a good place to get hands-on training with Docker, and for complete beginners to containerization, this [introduction](https://opensource.com/resources/what-docker) is a good starting point.
4. `fairing-with-python-sdk.ipynb` - Fairing is a Kubeflow component that lets you run model training tasks remotely. This Jupyter notebook deploys a model training task to the cloud using Kubeflow Fairing. Fairing does not require you to build a Docker image of the training code first, so the training code resides in the notebook itself; a hedged sketch of the workflow follows this list. To learn more about Kubeflow Fairing, please visit its [official documentation](https://www.kubeflow.org/docs/fairing/fairing-overview/).
5. `katib-with-python-sdk.ipynb` - [Katib](https://www.kubeflow.org/docs/components/hyperparameter-tuning/hyperparameter/) is a Kubeflow component that runs hyperparameter tuning experiments and reports the best set of hyperparameters based on a provided metric. This Jupyter notebook launches Katib experiments using the Katib [Python SDK](https://github.com/kubeflow/katib/tree/master/sdk/python); a sketch follows this list. Katib requires you to build a Docker image of your training code and host it in a container registry. For this sample, we used Google [Cloud Build](https://cloud.google.com/cloud-build/docs) to build the required image of the training code (along with the training data) and hosted it on gcr.io.
6. `tfjob-with-python-sdk.ipynb` - [TFJobs](https://www.kubeflow.org/docs/components/training/tftraining/) are used to run distributed training jobs on Kubernetes. With multiple workers, a TFJob truly leverages your code's support for distributed training. This Jupyter notebook demonstrates how to use TFJob; the Docker image built from the distributed version of our core training code is used here.
7. `tekton-pipeline-with-python-sdk.ipynb` - [Kubeflow Pipelines](https://www.kubeflow.org/docs/pipelines/overview/pipelines-overview/) is a platform for building, managing, and deploying end-to-end machine learning workflows. This Jupyter notebook bundles Katib hyperparameter tuning and TFJob distributed training into one Kubeflow pipeline. The pipeline uses [Tekton](https://cloud.google.com/tekton) as its backend; Tekton is a Kubernetes-native framework for creating efficient [continuous integration and delivery](https://opensource.com/article/18/8/what-cicd) (CI/CD) systems.
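The distributed variant of the training code (item 2) follows the pattern sketched below. This is a minimal illustration of TensorFlow's multi-worker strategy, not the exact contents of `distributed_text_classification_with_rnn.py`; the model shape and the stand-in data are assumptions made for brevity.

```python
import numpy as np
import tensorflow as tf

# When launched by a TFJob, each worker receives a TF_CONFIG environment
# variable describing its peers; the strategy reads it to set up
# collective communication. Without TF_CONFIG it runs single-worker.
strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()

with strategy.scope():
    # Variables created inside the scope are mirrored across workers.
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(10000, 64),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(
        loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
        optimizer=tf.keras.optimizers.Adam(1e-4),
        metrics=["accuracy"],
    )

# Stand-in data; the real script feeds the IMDB reviews dataset.
x = np.random.randint(0, 10000, size=(64, 100))
y = np.random.randint(0, 2, size=(64, 1)).astype("float32")
train_dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(8)

model.fit(train_dataset, epochs=2)
```

Because the strategy discovers workers from the environment, the same script scales from one process to many without code changes.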
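Item 4's Fairing workflow typically looks like the hedged sketch below; the base image, registry, and backend are placeholder assumptions, not values taken from this sample.

```python
from kubeflow.fairing import TrainJob
from kubeflow.fairing.backends import KubeflowGKEBackend

def train():
    # The notebook's training function goes here; Fairing packages it,
    # builds an image on its own, and runs it remotely as a Kubernetes job.
    print("training...")

job = TrainJob(
    train,
    base_docker_image="tensorflow/tensorflow:2.0.0-py3",   # assumption
    docker_registry="gcr.io/<your-project>/fairing-job",   # placeholder
    backend=KubeflowGKEBackend(),
)
job.submit()
```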
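Item 5's Katib experiment can be sketched as follows, assuming the `v1alpha3` API that the Katib Python SDK exposed at the time; the image name, namespace, metric, and hyperparameter range are illustrative placeholders.

```python
from kubernetes.client import V1ObjectMeta
from kubeflow.katib import (KatibClient, V1alpha3AlgorithmSpec,
                            V1alpha3Experiment, V1alpha3ExperimentSpec,
                            V1alpha3FeasibleSpace, V1alpha3GoTemplate,
                            V1alpha3ObjectiveSpec, V1alpha3ParameterSpec,
                            V1alpha3TrialTemplate)

# Trial template: each trial runs the training image as a Kubernetes Job,
# with the sampled hyperparameters appended as command-line flags.
raw_template = """
apiVersion: batch/v1
kind: Job
metadata:
  name: {{.Trial}}
  namespace: {{.NameSpace}}
spec:
  template:
    spec:
      containers:
      - name: {{.Trial}}
        image: gcr.io/<your-project>/text-classification:latest
        command:
        - "python3"
        - "/app/text_classification_with_rnn.py"
        {{- with .HyperParameters}}
        {{- range .}}
        - "{{.Name}}={{.Value}}"
        {{- end}}
        {{- end}}
      restartPolicy: Never
"""

experiment = V1alpha3Experiment(
    api_version="kubeflow.org/v1alpha3",
    kind="Experiment",
    metadata=V1ObjectMeta(name="text-classification-tpe", namespace="anonymous"),
    spec=V1alpha3ExperimentSpec(
        max_trial_count=6,
        parallel_trial_count=2,
        objective=V1alpha3ObjectiveSpec(
            type="maximize", goal=0.90, objective_metric_name="accuracy"),
        algorithm=V1alpha3AlgorithmSpec(algorithm_name="tpe"),
        parameters=[V1alpha3ParameterSpec(
            name="--learning_rate",
            parameter_type="double",
            feasible_space=V1alpha3FeasibleSpace(min="0.0001", max="0.01"))],
        trial_template=V1alpha3TrialTemplate(
            go_template=V1alpha3GoTemplate(raw_template=raw_template)),
    ),
)

KatibClient().create_experiment(experiment, namespace="anonymous")
```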
### Neural Machine Translation

This other [TensorFlow tutorial](https://www.tensorflow.org/tutorials/text/nmt_with_attention) explains how to translate Spanish text to English using TensorFlow. As in the previous sample, we take the code from the tutorial, adapt it to Kubeflow, and add Kubeflow code to demonstrate how Kubeflow can leverage containerization and cloud technologies to efficiently manage machine learning workflows that take advantage of multiple compute nodes. This sample contains the following files; a description of each is given next to its name.
1. `nmt_with_attention.py` - This is the core training code upon which all subsequent examples showing Kubeflow functionality are based. Please go through it first to understand the machine learning task the subsequent notebooks manage.
2. `distributed_nmt_with_attention.py` - To truly take advantage of multiple compute nodes, the training code has to be modified to support distributed training. This file contains the code above, modified with TensorFlow's [distributed training](https://www.tensorflow.org/guide/distributed_training) strategy.
3. `Dockerfile` - This is the Dockerfile used to build a Docker image of the training code. Some Kubeflow functionality requires that a Docker image of the training code be built and hosted in a container registry. This Docker 101 [tutorial](https://www.docker.com/101-tutorial) is a good place to get hands-on training with Docker, and for complete beginners to containerization, this [introduction](https://opensource.com/resources/what-docker) is a good starting point.
4. `fairing-with-python-sdk.ipynb` - Fairing is a Kubeflow component that lets you run model training tasks remotely. This Jupyter notebook deploys a model training task to the cloud using Kubeflow Fairing. As noted above, Fairing does not require you to build an image yourself, but you do have to expose a class for your ML model. In this notebook, we import the `NeuralMachineTranslation` class defined in `nmt_with_attention.py` and pass it to Fairing, which builds an image on its own. To learn more about Kubeflow Fairing, please visit its [official documentation](https://www.kubeflow.org/docs/fairing/fairing-overview/).
5. `katib-with-python-sdk.ipynb` - [Katib](https://www.kubeflow.org/docs/components/hyperparameter-tuning/hyperparameter/) is a Kubeflow component that runs hyperparameter tuning experiments and reports the best set of hyperparameters based on a provided metric. This Jupyter notebook launches Katib experiments using the Katib [Python SDK](https://github.com/kubeflow/katib/tree/master/sdk/python). Katib requires you to build a Docker image of your training code and host it in a container registry. For this sample, we used Google [Cloud Build](https://cloud.google.com/cloud-build/docs) to build the required image of the training code (along with the training data) and hosted it on gcr.io. We used the Tree-structured Parzen Estimator (TPE) hyperparameter optimization algorithm in this example.
6. `tfjob-with-python-sdk.ipynb` - [TFJobs](https://www.kubeflow.org/docs/components/training/tftraining/) are used to run distributed training jobs on Kubernetes. With multiple workers, a TFJob truly leverages your code's support for distributed training. This Jupyter notebook demonstrates how to use TFJob; the Docker image built from the distributed version of our core training code is used here. A hedged sketch of submitting a TFJob with the Python SDK appears after the Dockerfile below.
7. `tekton-pipeline-with-python-sdk.ipynb` - [Kubeflow Pipelines](https://www.kubeflow.org/docs/pipelines/overview/pipelines-overview/) is a platform for building, managing, and deploying end-to-end machine learning workflows. This Jupyter notebook bundles Katib hyperparameter tuning and TFJob distributed training into one Kubeflow pipeline. The pipeline uses [Tekton](https://cloud.google.com/tekton) as its backend; Tekton is a Kubernetes-native framework for creating efficient [continuous integration and delivery](https://opensource.com/article/18/8/what-cicd) (CI/CD) systems. A sketch of compiling such a pipeline for Tekton follows this list.
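As a rough illustration of item 7, the sketch below compiles a two-step KFP pipeline with the `kfp-tekton` compiler. The step images are placeholders, and the actual notebook wires Katib and TFJob through proper launcher components rather than bare `ContainerOp`s.

```python
import kfp.dsl as dsl
from kfp_tekton.compiler import TektonCompiler

@dsl.pipeline(
    name="katib-tfjob-pipeline",
    description="Hyperparameter tuning followed by distributed training.")
def katib_tfjob_pipeline():
    # Step 1: run the Katib experiment (placeholder image).
    katib_step = dsl.ContainerOp(
        name="katib-experiment",
        image="gcr.io/<your-project>/katib-launcher:latest")
    # Step 2: run the TFJob, only after tuning finishes.
    tfjob_step = dsl.ContainerOp(
        name="tfjob-training",
        image="gcr.io/<your-project>/tfjob-launcher:latest")
    tfjob_step.after(katib_step)

# Emits a Tekton PipelineRun manifest instead of the default Argo workflow.
TektonCompiler().compile(katib_tfjob_pipeline, "pipeline.yaml")
```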
9 changes: 9 additions & 0 deletions
tensorflow_cuj/neural_machine_translation/Dockerfile
# Start from the official Python 3.7 base image.
FROM python:3.7

# Copy the training code (and bundled data) into the image.
COPY . /app/

WORKDIR /app/

# Install the pinned Python dependencies.
RUN pip3 install --no-cache-dir -r /app/requirements.txt

# Launch the distributed NMT training script when the container starts.
ENTRYPOINT ["python3", "/app/distributed_nmt_with_attention.py"]
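The image built from this Dockerfile is the kind of artifact the TFJob notebook points at. Below is a hedged sketch of submitting a two-worker TFJob with the Python SDK of the time (`kubeflow-tfjob`); the image name and namespace are placeholders.

```python
from kubernetes.client import (V1Container, V1ObjectMeta, V1PodSpec,
                               V1PodTemplateSpec)
from kubeflow.tfjob import TFJobClient, V1ReplicaSpec, V1TFJob, V1TFJobSpec

# Each worker runs the image built from the Dockerfile above; its
# ENTRYPOINT launches the distributed training script.
container = V1Container(
    name="tensorflow",
    image="gcr.io/<your-project>/distributed-nmt:latest")  # placeholder

worker = V1ReplicaSpec(
    replicas=2,
    restart_policy="Never",
    template=V1PodTemplateSpec(spec=V1PodSpec(containers=[container])))

tfjob = V1TFJob(
    api_version="kubeflow.org/v1",
    kind="TFJob",
    metadata=V1ObjectMeta(name="distributed-nmt", namespace="anonymous"),
    spec=V1TFJobSpec(tf_replica_specs={"Worker": worker}))

TFJobClient().create(tfjob, namespace="anonymous")
```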
60 changes: 60 additions & 0 deletions
tensorflow_cuj/neural_machine_translation/dataset/spa-eng/_about.txt
** Info **

Check for newest version here:
http://www.manythings.org/anki/
Date of this file:
2018-05-13

This data is from the sentences_detailed.csv file from tatoeba.org.
http://tatoeba.org/files/downloads/sentences_detailed.csv


** Terms of Use **

See the terms of use.
These files have been released under the same license as the
source.

http://tatoeba.org/eng/terms_of_use
http://creativecommons.org/licenses/by/2.0

Attribution: www.manythings.org/anki and tatoeba.org


** Warnings **

The data from the Tatoeba Project contains errors.

To lower the number of errors you are likely to see, only
sentences by native speakers and proofread sentences have
been included.

For the non-English language, I made these (possibly wrong)
assumptions.
Assumption 1: Sentences written by native speakers can be
trusted.
Assumption 2: Contributors to the Tatoeba Project are honest
about what their native language is.

For English, I used the sentences that I have proofread
and thought were OK.
Of course, I may have missed a few errors.


** Downloading Anki **

See http://ankisrs.net/


** Importing into Anki **

Information is at http://ankisrs.net/docs/manual.html#importing

Of particular interest may be about "duplicates" at http://ankisrs.net/docs/manual.html#duplicates-and-updating.
You can choose:
1. not to allow duplicates (alternate translations) as cards.
2. allow duplicates (alternate translations) as cards.