added named entity recognition example (kubeflow#590)
* added named entity recognition example

kubeflow/website#853

* added previous and next steps

* changed all absolute links to relative links

* changed headline for better understanding

* moved dataset description section to top

* fixed style

* added missing Jupyter notebook

* changed headline

* added link to documentation

* fixed meaning of images and components

* adapted documentation to https://www.kubeflow.org/docs/about/style-guide/#address-the-audience-directly

* added link to ai platform models

* make it clear these are optional extensions

* changed summary and goals

* added kubeflow version

* fixed s/an/a/; also checked the rest of the documentation

* added #!/bin/sh

* added environment variables for build scripts and adapted documentation

* changed PROJECT to PROJECT_ID

* added link to Kaggle dataset and removed the no-longer-required copy script (the dataset has a direct public location in gs://); adapted Jupyter notebook input data path

* added hint to make clear no further steps are required

* fixed s/Run/RUN/

* grammar fix

* optimized text

* added prev link to index

* removed model description due to lack of information

* added significance and congrats =)

* added example

* guided the user's attention to specific screens/metrics/graphs

* explanation of pieces

* updated main readme

* updated parts

* fixed typo

* adapted dataset path

* made scripts executable

chmod +x

* Update step-1-setup.md

swapped sections and added env variables to the gsutil command

* added information regarding public access

* fixed lint error

* fixed lint issues

* fixed lint issues

* figured Kubeflow examples are using 2 rather than 4 spaces (due to TensorFlow standards)

* lint fixes

* reverted changes

* removed unused import

* removed object inheritance

* fixed lint issues

* added kwargs to ignored-argument-names (due to best practice in Google custom prediction routines)

* fix lint issues

* set pylintrc back to default and removed unused argument
SaschaHeyer authored and k8s-ci-robot committed Sep 18, 2019
1 parent 78a79e7 commit 1ff3cf5
Showing 41 changed files with 1,458 additions and 0 deletions.
11 changes: 11 additions & 0 deletions README.md
@@ -11,6 +11,17 @@ This repository is home to the following types of examples and demos:

## End-to-end

### [Named Entity Recognition](./named_entity_recognition)
Author: [Sascha Heyer](https://github.com/saschaheyer)

This example covers the following concepts:
1. Build reusable pipeline components
1. Run Kubeflow Pipelines with Jupyter notebooks
1. Train a Named Entity Recognition model on a Kubernetes cluster
1. Deploy a Keras model to AI Platform
1. Use Kubeflow metrics
1. Use Kubeflow visualizations

### [GitHub issue summarization](./github_issue_summarization)
Author: [Hamel Husain](https://github.com/hamelsmu)

108 changes: 108 additions & 0 deletions named_entity_recognition/.gitignore
@@ -0,0 +1,108 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/

# custom
custom_prediction_routine.egg-info
custom_prediction_routine*
33 changes: 33 additions & 0 deletions named_entity_recognition/README.md
@@ -0,0 +1,33 @@
# Named Entity Recognition with Kubeflow and Keras

In this walkthrough, you will learn how to use Kubeflow to build reusable components, train your model on a Kubernetes cluster, and deploy it to AI Platform.
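
The sketch below is a rough orientation only, not this example's actual notebook code: each reusable component is described by a `component.yaml` specification, loaded into a task factory with the Kubeflow Pipelines SDK, and composed into a pipeline that you compile and run. The bucket name, dataset path, and output templates are placeholders.

```python
import kfp
from kfp import dsl
from kfp.components import load_component_from_url

# Placeholder bucket; copy_specification.sh publishes the component
# specifications there with public read access.
BUCKET = "your-bucket-name"

preprocess_op = load_component_from_url(
    "https://storage.googleapis.com/%s/components/preprocess/component.yaml" % BUCKET)
# The train and deploy components are loaded and wired the same way.


@dsl.pipeline(name="named-entity-recognition", description="NER walkthrough sketch")
def ner_pipeline(input_1_uri="gs://your-bucket-name/data/ner.csv"):
    # Keyword names are derived from the component's input names
    # ("Input 1 URI" -> input_1_uri); all paths here are placeholders.
    preprocess_op(
        input_1_uri=input_1_uri,
        output_x_uri_template="gs://your-bucket-name/preprocess/x/data",
        output_y_uri_template="gs://your-bucket-name/preprocess/y/data",
        output_preprocessing_state_uri_template="gs://your-bucket-name/preprocess/model",
    )


if __name__ == "__main__":
    kfp.compiler.Compiler().compile(ner_pipeline, "ner_pipeline.zip")
```

The compiled archive can be uploaded through the Kubeflow Pipelines UI or submitted from a notebook with `kfp.Client()`.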

## Goals

* Demonstrate how to build reusable pipeline components
* Demonstrate how to use Keras-only models
* Demonstrate how to train a Named Entity Recognition model on a Kubernetes cluster
* Demonstrate how to deploy a Keras model to AI Platform
* Demonstrate how to use a custom prediction routine
* Demonstrate how to use Kubeflow metrics (see the sketch after this list)
* Demonstrate how to use Kubeflow visualizations
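
As a hedged illustration of the metrics goal (the example's training component may do this differently): a Kubeflow Pipelines step exports metrics by writing `/mlpipeline-metrics.json` inside its container, and the pipelines UI then displays those values for the run. Visualizations work the same way via `/mlpipeline-ui-metadata.json`.

```python
import json

# Metric values are placeholders; the structure follows the Kubeflow
# Pipelines metrics convention (name, numberValue, format).
metrics = {
    "metrics": [
        {"name": "accuracy", "numberValue": 0.93, "format": "PERCENTAGE"},
        {"name": "loss", "numberValue": 0.21, "format": "RAW"},
    ]
}

with open("/mlpipeline-metrics.json", "w") as f:
    json.dump(metrics, f)
```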

## What is Named Entity Recognition?
Named Entity Recognition is a word classification problem that extracts pieces of data called entities from text.
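
For example, with IOB (inside-outside-beginning) tagging — the format the preprocess component works with — every word gets a label, and consecutive `B-`/`I-` labels form one entity. The sentence and entity types below are illustrative:

```python
# Illustrative tokens and IOB labels (not taken from the dataset).
tokens = ["Kubeflow", "was", "presented", "in", "San", "Francisco", "."]
labels = ["B-org",    "O",   "O",         "O",  "B-geo", "I-geo",    "O"]

# Collapse the word-level labels back into entities.
entities = []
current = None
for token, label in zip(tokens, labels):
    if label.startswith("B-"):
        current = (label[2:], [token])
        entities.append(current)
    elif label.startswith("I-") and current is not None:
        current[1].append(token)
    else:
        current = None

print([(entity_type, " ".join(words)) for entity_type, words in entities])
# [('org', 'Kubeflow'), ('geo', 'San Francisco')]
```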

![solution](documentation/files/solution.png)

### Steps

1. [Setup Kubeflow and clone repository](documentation/step-1-setup.md)
1. [Build the pipeline components](documentation/step-2-build-components.md)
1. [Upload the dataset](documentation/step-3-upload-dataset.md)
1. [Custom prediction routine](documentation/step-4-custom-prediction-routine.md)
1. [Run the pipeline](documentation/step-5-run-pipeline.md)
1. [Monitor the training](documentation/step-6-monitor-training.md)
1. [Predict](documentation/step-7-predictions.md)





10 changes: 10 additions & 0 deletions named_entity_recognition/components/build_components.sh
@@ -0,0 +1,10 @@
#!/bin/sh
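# Run this script from the components/ directory; each build_image.sh script
# below reads PROJECT_ID from the environment to tag and push its image.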

echo "\nBuild and push preprocess component"
./preprocess/build_image.sh

echo "\nBuild and push train component"
./train/build_image.sh

echo "\nBuild and push deploy component"
./deploy/build_image.sh
13 changes: 13 additions & 0 deletions named_entity_recognition/components/copy_specification.sh
@@ -0,0 +1,13 @@
#!/bin/sh

BUCKET="your-bucket-name"

echo "\nCopy component specifications to Google Cloud Storage"
gsutil cp preprocess/component.yaml gs://${BUCKET}/components/preprocess/component.yaml
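# grant public read access so each specification can be loaded directly by URL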
gsutil acl ch -u AllUsers:R gs://${BUCKET}/components/preprocess/component.yaml

gsutil cp train/component.yaml gs://${BUCKET}/components/train/component.yaml
gsutil acl ch -u AllUsers:R gs://${BUCKET}/components/train/component.yaml

gsutil cp deploy/component.yaml gs://${BUCKET}/components/deploy/component.yaml
gsutil acl ch -u AllUsers:R gs://${BUCKET}/components/deploy/component.yaml
4 changes: 4 additions & 0 deletions named_entity_recognition/components/deploy/Dockerfile
@@ -0,0 +1,4 @@
FROM google/cloud-sdk:latest
ADD ./src /pipelines/component/src
RUN chmod 755 /pipelines/component/src/deploy.sh
ENTRYPOINT ["/pipelines/component/src/deploy.sh"]
11 changes: 11 additions & 0 deletions named_entity_recognition/components/deploy/build_image.sh
@@ -0,0 +1,11 @@
#!/bin/sh
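# PROJECT_ID must be set in the environment before running this script;
# the image is pushed to gcr.io/$PROJECT_ID/kubeflow/ner/deploy.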

image_name=gcr.io/$PROJECT_ID/kubeflow/ner/deploy
image_tag=latest

full_image_name=${image_name}:${image_tag}

cd "$(dirname "$0")"

docker build -t "${full_image_name}" .
docker push "$full_image_name"
44 changes: 44 additions & 0 deletions named_entity_recognition/components/deploy/component.yaml
@@ -0,0 +1,44 @@
name: deploy
description: Deploy the model with a custom prediction routine
inputs:
- name: Model path
  type: GCSPath
  description: 'Path of the GCS directory containing the exported TensorFlow model.'
- name: Model name
  type: String
  description: 'The name of the model, whether it already exists or gets created by this step'
- name: Model region
  type: String
  description: 'The region where the model is going to be deployed'
- name: Model version
  type: String
  description: 'The version of the model'
- name: Model runtime version
  type: String
  description: 'The runtime version of the model'
- name: Model prediction class
  type: String
  description: 'The prediction class of the model'
- name: Model python version
  type: String
  description: 'The python version of the model'
- name: Model package uris
  type: String
  description: 'The package URIs of the model'
outputs:
implementation:
  container:
    image: gcr.io/<PROJECT-ID>/kubeflow/ner/deploy:latest
    command: [
      sh, /pipelines/component/src/deploy.sh
    ]
    args: [
      --model-path, {inputValue: Model path},
      --model-name, {inputValue: Model name},
      --model-region, {inputValue: Model region},
      --model-version, {inputValue: Model version},
      --model-runtime-version, {inputValue: Model runtime version},
      --model-prediction-class, {inputValue: Model prediction class},
      --model-python-version, {inputValue: Model python version},
      --model-package-uris, {inputValue: Model package uris},
    ]
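
A hedged sketch of how this specification could be consumed in a pipeline (every value below is a placeholder): the Kubeflow Pipelines SDK turns the component into a task factory whose keyword arguments are derived from the input names above, and each `inputValue` placeholder becomes the matching `--model-*` argument passed to `deploy.sh`.

```python
from kfp import dsl
from kfp.components import load_component_from_file

# Path is illustrative; the spec could equally be loaded from its public GCS URL.
deploy_op = load_component_from_file("components/deploy/component.yaml")


@dsl.pipeline(name="deploy-sketch")
def deploy_sketch_pipeline():
    # Keyword names come from the input names in the specification
    # ("Model path" -> model_path); every value below is a placeholder.
    deploy_op(
        model_path="gs://your-bucket-name/model/export",
        model_name="ner",
        model_region="us-central1",
        model_version="v1",
        model_runtime_version="1.13",
        model_prediction_class="model_prediction.CustomModelPrediction",
        model_python_version="3.5",
        model_package_uris="gs://your-bucket-name/routine/custom_prediction_routine-0.1.tar.gz",
    )
```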
88 changes: 88 additions & 0 deletions named_entity_recognition/components/deploy/src/deploy.sh
@@ -0,0 +1,88 @@
#!/bin/sh

# loop through all named parameters passed in by the component specification
while [ "$1" != "" ]; do
  case $1 in
    "--model-path")
      shift
      MODEL_PATH="$1"
      shift
      ;;
    "--model-name")
      shift
      MODEL_NAME="$1"
      shift
      ;;
    "--model-region")
      shift
      MODEL_REGION="$1"
      shift
      ;;
    "--model-version")
      shift
      MODEL_VERSION="$1"
      shift
      ;;
    "--model-runtime-version")
      shift
      RUNTIME_VERSION="$1"
      shift
      ;;
    "--model-prediction-class")
      shift
      MODEL_PREDICTION_CLASS="$1"
      shift
      ;;
    "--model-python-version")
      shift
      MODEL_PYTHON_VERSION="$1"
      shift
      ;;
    "--model-package-uris")
      shift
      MODEL_PACKAGE_URIS="$1"
      shift
      ;;
    *)
      ;;
  esac
done

# echo inputs
echo MODEL_PATH = "${MODEL_PATH}"
echo MODEL_NAME = "${MODEL_NAME}"
echo MODEL_REGION = "${MODEL_REGION}"
echo MODEL_VERSION = "${MODEL_VERSION}"
echo RUNTIME_VERSION = "${RUNTIME_VERSION}"
echo MODEL_PREDICTION_CLASS = "${MODEL_PREDICTION_CLASS}"
echo MODEL_PYTHON_VERSION = "${MODEL_PYTHON_VERSION}"
echo MODEL_PACKAGE_URIS = "${MODEL_PACKAGE_URIS}"


# create model
modelname=$(gcloud ai-platform models list | grep -w "$MODEL_NAME")
echo "$modelname"
if [ -z "$modelname" ]; then
  echo "Creating model $MODEL_NAME in region $MODEL_REGION"
  gcloud ai-platform models create ${MODEL_NAME} \
    --regions ${MODEL_REGION}
else
  echo "Model $MODEL_NAME already exists"
fi

# create version with custom prediction routine (beta)
echo "Creating version $MODEL_VERSION from $MODEL_PATH"
gcloud beta ai-platform versions create ${MODEL_VERSION} \
  --model ${MODEL_NAME} \
  --origin ${MODEL_PATH} \
  --python-version ${MODEL_PYTHON_VERSION} \
  --runtime-version ${RUNTIME_VERSION} \
  --package-uris ${MODEL_PACKAGE_URIS} \
  --prediction-class ${MODEL_PREDICTION_CLASS}
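
A note on what `--prediction-class` points to: with AI Platform custom prediction routines (beta), the class named there is packaged in the `--package-uris` archive and must provide a `from_path` classmethod and a `predict` method. The sketch below only illustrates that interface; the class, file names, and preprocessing calls are placeholders, not this example's actual routine.

```python
import os
import pickle

from tensorflow import keras


class NerPredictor:
    """Illustrative predictor; the example's real class and file names differ."""

    def __init__(self, model, preprocessor):
        self._model = model
        self._preprocessor = preprocessor

    def predict(self, instances, **kwargs):
        # Transform raw text instances with the saved preprocessing state,
        # then return JSON-serializable predictions.
        features = self._preprocessor.transform(instances)
        return self._model.predict(features).tolist()

    @classmethod
    def from_path(cls, model_dir):
        # model_dir is the local copy of the directory given by --origin.
        model = keras.models.load_model(os.path.join(model_dir, "model.h5"))
        with open(os.path.join(model_dir, "preprocess_state.pkl"), "rb") as f:
            preprocessor = pickle.load(f)
        return cls(model, preprocessor)
```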
4 changes: 4 additions & 0 deletions named_entity_recognition/components/preprocess/Dockerfile
@@ -0,0 +1,4 @@
ARG BASE_IMAGE_TAG=1.12.0-py3
FROM tensorflow/tensorflow:$BASE_IMAGE_TAG
RUN python3 -m pip install keras
COPY ./src /pipelines/component/src
12 changes: 12 additions & 0 deletions named_entity_recognition/components/preprocess/build_image.sh
@@ -0,0 +1,12 @@
#!/bin/sh
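# PROJECT_ID must be set in the environment before running this script;
# the image is pushed to gcr.io/$PROJECT_ID/kubeflow/ner/preprocess.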

image_name=gcr.io/$PROJECT_ID/kubeflow/ner/preprocess
image_tag=latest

full_image_name=${image_name}:${image_tag}
base_image_tag=1.12.0-py3

cd "$(dirname "$0")"

docker build --build-arg BASE_IMAGE_TAG=${base_image_tag} -t "${full_image_name}" .
docker push "$full_image_name"
34 changes: 34 additions & 0 deletions named_entity_recognition/components/preprocess/component.yaml
@@ -0,0 +1,34 @@
name: preprocess
description: Performs the IOB preprocessing.
inputs:
- {name: Input 1 URI, type: GCSPath}
- {name: Output x URI template, type: GCSPath}
- {name: Output y URI template, type: GCSPath}
- {name: Output preprocessing state URI template, type: GCSPath}
outputs:
- name: Output x URI
  type: GCSPath
- name: Output y URI
  type: String
- name: Output tags
  type: String
- name: Output words
  type: String
- name: Output preprocessing state URI
  type: String
implementation:
  container:
    image: gcr.io/<PROJECT-ID>/kubeflow/ner/preprocess:latest
    command: [
      python3, /pipelines/component/src/component.py,
      --input1-path, {inputValue: Input 1 URI},
      --output-y-path, {inputValue: Output y URI template},
      --output-x-path, {inputValue: Output x URI template},
      --output-preprocessing-state-path, {inputValue: Output preprocessing state URI template},

      --output-y-path-file, {outputPath: Output y URI},
      --output-x-path-file, {outputPath: Output x URI},
      --output-preprocessing-state-path-file, {outputPath: Output preprocessing state URI},
      --output-tags, {outputPath: Output tags},
      --output-words, {outputPath: Output words},
    ]
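
A hedged note on the `outputPath` placeholders above (this is not the example's actual `component.py`): Kubeflow Pipelines replaces each `outputPath` placeholder with a local file path, and the component is expected to write the corresponding output value to that file so the pipeline can pass it to downstream steps. Roughly:

```python
import argparse
import os


def write_output(path, value):
    # KFP supplies a local file path for each outputPath placeholder; the
    # component writes the output's value there for downstream steps.
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    with open(path, "w") as f:
        f.write(value)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--output-x-path", required=True)
    parser.add_argument("--output-x-path-file", required=True)
    # The remaining arguments from the specification are parsed the same way.
    args, _ = parser.parse_known_args()

    # ... the IOB preprocessing would run here and upload its result
    #     to the GCS location given by args.output_x_path ...

    write_output(args.output_x_path_file, args.output_x_path)
```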