added named entity recognition example (kubeflow#590)
* added named entity recognition example

kubeflow/website#853

* added previous and next steps

* changed all absolute links to relative links

* changed headline for better understanding

* moved dataset description section to top

* fixed style

* added missing Jupyter notebook

* changed headline

* added link to documentation

* fixed meaning of images and components

* adapted documentation to https://www.kubeflow.org/docs/about/style-guide/#address-the-audience-directly

* added link to ai platform models

* make it clear these are optional extensions

* changed summary and goals

* added kubeflow version

* fixed s/an/a/; also checked the rest of the documentation

* added #!/bin/sh

* added environment variables for build scripts and adapted documentation

* changed PROJECT to PROJECT_ID

* added link to Kaggle dataset and removed the no-longer-required copy script (the dataset has a direct public location in gs://); adapted Jupyter notebook input data path

* added hint to make clear no further steps are required

* fixed s/Run/RUN/

* grammar fix

* optimized text

* added prev link to index

* removed model description due to lack of information

* added significance and congrats =)

* added example

* guided the user's attention to specific screens/metrics/graphs

* explanation of pieces

* updated main readme

* updated parts

* fixed typo

* adapted dataset path

* made scripts executable

chmod +x

* Update step-1-setup.md

swapped sections and added env variables to the gsutil command

* added information regarding public access

* fixed lint error

* fixed lint issues

* fixed lint issues

* figured Kubeflow examples are using 2 rather than 4 spaces (due to TensorFlow standards)

* lint fixes

* reverted changes

* removed unused import

* removed object inheritance

* fixed lint issues

* added kwargs to ignored-argument-names (due to best practice in Google custom prediction routines)

* fix lint issues

* set pylintrc back to default and removed unused argument
SaschaHeyer authored and k8s-ci-robot committed Sep 18, 2019
1 parent 78a79e7 commit 1ff3cf5
Showing 41 changed files with 1,458 additions and 0 deletions.
11 changes: 11 additions & 0 deletions README.md
@@ -11,6 +11,17 @@ This repository is home to the following types of examples and demos:

## End-to-end

### [Named Entity Recognition](./named_entity_recognition)
Author: [Sascha Heyer](https://github.com/saschaheyer)

This example covers the following concepts:
1. Build reusable pipeline components
1. Run Kubeflow Pipelines with Jupyter notebooks
1. Train a Named Entity Recognition model on a Kubernetes cluster
1. Deploy a Keras model to AI Platform
1. Use Kubeflow metrics
1. Use Kubeflow visualizations

### [GitHub issue summarization](./github_issue_summarization)
Author: [Hamel Husain](https://github.com/hamelsmu)

108 changes: 108 additions & 0 deletions named_entity_recognition/.gitignore
@@ -0,0 +1,108 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/

# custom
custom_prediction_routine.egg-info
custom_prediction_routine*
33 changes: 33 additions & 0 deletions named_entity_recognition/README.md
@@ -0,0 +1,33 @@
# Named Entity Recognition with Kubeflow and Keras

In this walkthrough, you will learn how to use Kubeflow to build reusable components, train your model on a Kubernetes cluster, and deploy it to AI Platform.
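
The sketch below is a rough orientation only, not this example's actual notebook code: each reusable component is described by a `component.yaml` specification, loaded into a task factory with the Kubeflow Pipelines SDK, and composed into a pipeline that you compile and run. The bucket name, dataset path, and output templates are placeholders.

```python
import kfp
from kfp import dsl
from kfp.components import load_component_from_url

# Placeholder bucket; copy_specification.sh publishes the component
# specifications there with public read access.
BUCKET = "your-bucket-name"

preprocess_op = load_component_from_url(
    "https://storage.googleapis.com/%s/components/preprocess/component.yaml" % BUCKET)
# The train and deploy components are loaded and wired the same way.


@dsl.pipeline(name="named-entity-recognition", description="NER walkthrough sketch")
def ner_pipeline(input_1_uri="gs://your-bucket-name/data/ner.csv"):
    # Keyword names are derived from the component's input names
    # ("Input 1 URI" -> input_1_uri); all paths here are placeholders.
    preprocess_op(
        input_1_uri=input_1_uri,
        output_x_uri_template="gs://your-bucket-name/preprocess/x/data",
        output_y_uri_template="gs://your-bucket-name/preprocess/y/data",
        output_preprocessing_state_uri_template="gs://your-bucket-name/preprocess/model",
    )


if __name__ == "__main__":
    kfp.compiler.Compiler().compile(ner_pipeline, "ner_pipeline.zip")
```

The compiled archive can be uploaded through the Kubeflow Pipelines UI or submitted from a notebook with `kfp.Client()`.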

## Goals

* Demonstrate how to build reusable pipeline components
* Demonstrate how to use Keras-only models
* Demonstrate how to train a Named Entity Recognition model on a Kubernetes cluster
* Demonstrate how to deploy a Keras model to AI Platform
* Demonstrate how to use a custom prediction routine
* Demonstrate how to use Kubeflow metrics (see the sketch after this list)
* Demonstrate how to use Kubeflow visualizations
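
As a hedged illustration of the metrics goal (the example's training component may do this differently): a Kubeflow Pipelines step exports metrics by writing `/mlpipeline-metrics.json` inside its container, and the pipelines UI then displays those values for the run. Visualizations work the same way via `/mlpipeline-ui-metadata.json`.

```python
import json

# Metric values are placeholders; the structure follows the Kubeflow
# Pipelines metrics convention (name, numberValue, format).
metrics = {
    "metrics": [
        {"name": "accuracy", "numberValue": 0.93, "format": "PERCENTAGE"},
        {"name": "loss", "numberValue": 0.21, "format": "RAW"},
    ]
}

with open("/mlpipeline-metrics.json", "w") as f:
    json.dump(metrics, f)
```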

## What is Named Entity Recognition?
Named Entity Recognition is a word classification problem that extracts pieces of data called entities from text.
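
For example, with IOB (inside-outside-beginning) tagging — the format the preprocess component works with — every word gets a label, and consecutive `B-`/`I-` labels form one entity. The sentence and entity types below are illustrative:

```python
# Illustrative tokens and IOB labels (not taken from the dataset).
tokens = ["Kubeflow", "was", "presented", "in", "San", "Francisco", "."]
labels = ["B-org",    "O",   "O",         "O",  "B-geo", "I-geo",    "O"]

# Collapse the word-level labels back into entities.
entities = []
current = None
for token, label in zip(tokens, labels):
    if label.startswith("B-"):
        current = (label[2:], [token])
        entities.append(current)
    elif label.startswith("I-") and current is not None:
        current[1].append(token)
    else:
        current = None

print([(entity_type, " ".join(words)) for entity_type, words in entities])
# [('org', 'Kubeflow'), ('geo', 'San Francisco')]
```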

![solution](documentation/files/solution.png)

### Steps

1. [Setup Kubeflow and clone repository](documentation/step-1-setup.md)
1. [Build the pipeline components](documentation/step-2-build-components.md)
1. [Upload the dataset](documentation/step-3-upload-dataset.md)
1. [Custom prediction routine](documentation/step-4-custom-prediction-routine.md)
1. [Run the pipeline](documentation/step-5-run-pipeline.md)
1. [Monitor the training](documentation/step-6-monitor-training.md)
1. [Predict](documentation/step-7-predictions.md)





10 changes: 10 additions & 0 deletions named_entity_recognition/components/build_components.sh
@@ -0,0 +1,10 @@
#!/bin/sh
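# Run this script from the components/ directory; each build_image.sh script
# below reads PROJECT_ID from the environment to tag and push its image.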

echo "\nBuild and push preprocess component"
./preprocess/build_image.sh

echo "\nBuild and push train component"
./train/build_image.sh

echo "\nBuild and push deploy component"
./deploy/build_image.sh
13 changes: 13 additions & 0 deletions named_entity_recognition/components/copy_specification.sh
@@ -0,0 +1,13 @@
#!/bin/sh

BUCKET="your-bucket-name"

echo "\nCopy component specifications to Google Cloud Storage"
gsutil cp preprocess/component.yaml gs://${BUCKET}/components/preprocess/component.yaml
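# grant public read access so each specification can be loaded directly by URL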
gsutil acl ch -u AllUsers:R gs://${BUCKET}/components/preprocess/component.yaml

gsutil cp train/component.yaml gs://${BUCKET}/components/train/component.yaml
gsutil acl ch -u AllUsers:R gs://${BUCKET}/components/train/component.yaml

gsutil cp deploy/component.yaml gs://${BUCKET}/components/deploy/component.yaml
gsutil acl ch -u AllUsers:R gs://${BUCKET}/components/deploy/component.yaml
4 changes: 4 additions & 0 deletions named_entity_recognition/components/deploy/Dockerfile
@@ -0,0 +1,4 @@
FROM google/cloud-sdk:latest
ADD ./src /pipelines/component/src
RUN chmod 755 /pipelines/component/src/deploy.sh
ENTRYPOINT ["/pipelines/component/src/deploy.sh"]
11 changes: 11 additions & 0 deletions named_entity_recognition/components/deploy/build_image.sh
@@ -0,0 +1,11 @@
#!/bin/sh
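# PROJECT_ID must be set in the environment before running this script;
# the image is pushed to gcr.io/$PROJECT_ID/kubeflow/ner/deploy.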

image_name=gcr.io/$PROJECT_ID/kubeflow/ner/deploy
image_tag=latest

full_image_name=${image_name}:${image_tag}

cd "$(dirname "$0")"

docker build -t "${full_image_name}" .
docker push "$full_image_name"
44 changes: 44 additions & 0 deletions named_entity_recognition/components/deploy/component.yaml
@@ -0,0 +1,44 @@
name: deploy
description: Deploy the model with a custom prediction routine
inputs:
- name: Model path
  type: GCSPath
  description: 'Path of the GCS directory containing the exported TensorFlow model.'
- name: Model name
  type: String
  description: 'The name of the model, whether it already exists or gets created by this step'
- name: Model region
  type: String
  description: 'The region where the model is going to be deployed'
- name: Model version
  type: String
  description: 'The version of the model'
- name: Model runtime version
  type: String
  description: 'The runtime version of the model'
- name: Model prediction class
  type: String
  description: 'The prediction class of the model'
- name: Model python version
  type: String
  description: 'The python version of the model'
- name: Model package uris
  type: String
  description: 'The package URIs of the model'
outputs:
implementation:
  container:
    image: gcr.io/<PROJECT-ID>/kubeflow/ner/deploy:latest
    command: [
      sh, /pipelines/component/src/deploy.sh
    ]
    args: [
      --model-path, {inputValue: Model path},
      --model-name, {inputValue: Model name},
      --model-region, {inputValue: Model region},
      --model-version, {inputValue: Model version},
      --model-runtime-version, {inputValue: Model runtime version},
      --model-prediction-class, {inputValue: Model prediction class},
      --model-python-version, {inputValue: Model python version},
      --model-package-uris, {inputValue: Model package uris},
    ]
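
A hedged sketch of how this specification could be consumed in a pipeline (every value below is a placeholder): the Kubeflow Pipelines SDK turns the component into a task factory whose keyword arguments are derived from the input names above, and each `inputValue` placeholder becomes the matching `--model-*` argument passed to `deploy.sh`.

```python
from kfp import dsl
from kfp.components import load_component_from_file

# Path is illustrative; the spec could equally be loaded from its public GCS URL.
deploy_op = load_component_from_file("components/deploy/component.yaml")


@dsl.pipeline(name="deploy-sketch")
def deploy_sketch_pipeline():
    # Keyword names come from the input names in the specification
    # ("Model path" -> model_path); every value below is a placeholder.
    deploy_op(
        model_path="gs://your-bucket-name/model/export",
        model_name="ner",
        model_region="us-central1",
        model_version="v1",
        model_runtime_version="1.13",
        model_prediction_class="model_prediction.CustomModelPrediction",
        model_python_version="3.5",
        model_package_uris="gs://your-bucket-name/routine/custom_prediction_routine-0.1.tar.gz",
    )
```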
88 changes: 88 additions & 0 deletions named_entity_recognition/components/deploy/src/deploy.sh
@@ -0,0 +1,88 @@
#!/bin/sh

# loop through all named parameters passed in by the component specification
while [ "$1" != "" ]; do
  case $1 in
    "--model-path")
      shift
      MODEL_PATH="$1"
      shift
      ;;
    "--model-name")
      shift
      MODEL_NAME="$1"
      shift
      ;;
    "--model-region")
      shift
      MODEL_REGION="$1"
      shift
      ;;
    "--model-version")
      shift
      MODEL_VERSION="$1"
      shift
      ;;
    "--model-runtime-version")
      shift
      RUNTIME_VERSION="$1"
      shift
      ;;
    "--model-prediction-class")
      shift
      MODEL_PREDICTION_CLASS="$1"
      shift
      ;;
    "--model-python-version")
      shift
      MODEL_PYTHON_VERSION="$1"
      shift
      ;;
    "--model-package-uris")
      shift
      MODEL_PACKAGE_URIS="$1"
      shift
      ;;
    *)
      ;;
  esac
done

# echo inputs
echo MODEL_PATH = "${MODEL_PATH}"
echo MODEL_NAME = "${MODEL_NAME}"
echo MODEL_REGION = "${MODEL_REGION}"
echo MODEL_VERSION = "${MODEL_VERSION}"
echo RUNTIME_VERSION = "${RUNTIME_VERSION}"
echo MODEL_PREDICTION_CLASS = "${MODEL_PREDICTION_CLASS}"
echo MODEL_PYTHON_VERSION = "${MODEL_PYTHON_VERSION}"
echo MODEL_PACKAGE_URIS = "${MODEL_PACKAGE_URIS}"


# create model
modelname=$(gcloud ai-platform models list | grep -w "$MODEL_NAME")
echo "$modelname"
if [ -z "$modelname" ]; then
  echo "Creating model $MODEL_NAME in region $MODEL_REGION"
  gcloud ai-platform models create ${MODEL_NAME} \
    --regions ${MODEL_REGION}
else
  echo "Model $MODEL_NAME already exists"
fi

# create version with custom prediction routine (beta)
echo "Creating version $MODEL_VERSION from $MODEL_PATH"
gcloud beta ai-platform versions create ${MODEL_VERSION} \
  --model ${MODEL_NAME} \
  --origin ${MODEL_PATH} \
  --python-version ${MODEL_PYTHON_VERSION} \
  --runtime-version ${RUNTIME_VERSION} \
  --package-uris ${MODEL_PACKAGE_URIS} \
  --prediction-class ${MODEL_PREDICTION_CLASS}
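
A note on what `--prediction-class` points to: with AI Platform custom prediction routines (beta), the class named there is packaged in the `--package-uris` archive and must provide a `from_path` classmethod and a `predict` method. The sketch below only illustrates that interface; the class, file names, and preprocessing calls are placeholders, not this example's actual routine.

```python
import os
import pickle

from tensorflow import keras


class NerPredictor:
    """Illustrative predictor; the example's real class and file names differ."""

    def __init__(self, model, preprocessor):
        self._model = model
        self._preprocessor = preprocessor

    def predict(self, instances, **kwargs):
        # Transform raw text instances with the saved preprocessing state,
        # then return JSON-serializable predictions.
        features = self._preprocessor.transform(instances)
        return self._model.predict(features).tolist()

    @classmethod
    def from_path(cls, model_dir):
        # model_dir is the local copy of the directory given by --origin.
        model = keras.models.load_model(os.path.join(model_dir, "model.h5"))
        with open(os.path.join(model_dir, "preprocess_state.pkl"), "rb") as f:
            preprocessor = pickle.load(f)
        return cls(model, preprocessor)
```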
4 changes: 4 additions & 0 deletions named_entity_recognition/components/preprocess/Dockerfile
@@ -0,0 +1,4 @@
ARG BASE_IMAGE_TAG=1.12.0-py3
FROM tensorflow/tensorflow:$BASE_IMAGE_TAG
RUN python3 -m pip install keras
COPY ./src /pipelines/component/src
12 changes: 12 additions & 0 deletions named_entity_recognition/components/preprocess/build_image.sh
@@ -0,0 +1,12 @@
#!/bin/sh
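# PROJECT_ID must be set in the environment before running this script;
# the image is pushed to gcr.io/$PROJECT_ID/kubeflow/ner/preprocess.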

image_name=gcr.io/$PROJECT_ID/kubeflow/ner/preprocess
image_tag=latest

full_image_name=${image_name}:${image_tag}
base_image_tag=1.12.0-py3

cd "$(dirname "$0")"

docker build --build-arg BASE_IMAGE_TAG=${base_image_tag} -t "${full_image_name}" .
docker push "$full_image_name"
34 changes: 34 additions & 0 deletions named_entity_recognition/components/preprocess/component.yaml
@@ -0,0 +1,34 @@
name: preprocess
description: Performs the IOB preprocessing.
inputs:
- {name: Input 1 URI, type: GCSPath}
- {name: Output x URI template, type: GCSPath}
- {name: Output y URI template, type: GCSPath}
- {name: Output preprocessing state URI template, type: GCSPath}
outputs:
- name: Output x URI
  type: GCSPath
- name: Output y URI
  type: String
- name: Output tags
  type: String
- name: Output words
  type: String
- name: Output preprocessing state URI
  type: String
implementation:
  container:
    image: gcr.io/<PROJECT-ID>/kubeflow/ner/preprocess:latest
    command: [
      python3, /pipelines/component/src/component.py,
      --input1-path, {inputValue: Input 1 URI},
      --output-y-path, {inputValue: Output y URI template},
      --output-x-path, {inputValue: Output x URI template},
      --output-preprocessing-state-path, {inputValue: Output preprocessing state URI template},

      --output-y-path-file, {outputPath: Output y URI},
      --output-x-path-file, {outputPath: Output x URI},
      --output-preprocessing-state-path-file, {outputPath: Output preprocessing state URI},
      --output-tags, {outputPath: Output tags},
      --output-words, {outputPath: Output words},
    ]
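
A hedged note on the `outputPath` placeholders above (this is not the example's actual `component.py`): Kubeflow Pipelines replaces each `outputPath` placeholder with a local file path, and the component is expected to write the corresponding output value to that file so the pipeline can pass it to downstream steps. Roughly:

```python
import argparse
import os


def write_output(path, value):
    # KFP supplies a local file path for each outputPath placeholder; the
    # component writes the output's value there for downstream steps.
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    with open(path, "w") as f:
        f.write(value)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--output-x-path", required=True)
    parser.add_argument("--output-x-path-file", required=True)
    # The remaining arguments from the specification are parsed the same way.
    args, _ = parser.parse_known_args()

    # ... the IOB preprocessing would run here and upload its result
    #     to the GCS location given by args.output_x_path ...

    write_output(args.output_x_path_file, args.output_x_path)
```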