forked from kubeflow/examples
Commit
added named entity recognition example (kubeflow#590)
* added named entity recognition example kubeflow/website#853
* added previous and next steps
* changed all absolute links to relative links
* changed headline for better understanding
* moved dataset description section to top
* fixed style
* added missing Jupyter notebook
* changed headline
* added link to documentation
* fixed meaning of images and components
* adapted documentation to https://www.kubeflow.org/docs/about/style-guide/#address-the-audience-directly
* added link to ai platform models
* make it clear these are optional extensions
* changed summary and goals
* added kubeflow version
* fixed s/an/a/ also checked the rest of the documentation
* added #!/bin/sh
* added environment variables for build scripts and adapted documentation
* changed PROJECT TO PROJECT_ID
* added link to kaggle dataset and removed not required copy script (due to direct public location in gs://). Adapted Jupyter notebook input data path
* added hint to make clear no further steps are required
* fixed s/Run/RUN/
* grammar fix
* optimized text
* added prev link to index
* removed model description due to lack of information
* added significance and congrats =)
* added example
* guided the user's attention to specific screens/metrics/graphs
* explanation of pieces
* updated main readme
* updated parts
* fixed typo
* adapted dataset path
* made scripts executable chmod +x
* Update step-1-setup.md swapped sections and added env variables to gsutil command
* added information regarding public access
* fixed lint error
* fixed lint issues
* fixed lint issues
* figured kubeflow examples are using 2 rather than 4 spaces (due to tensorflow standards)
* lint fixes
* reverted changes
* removed unused import
* removed object inherit
* fixed lint issues
* added kwargs to ignored-argument-name (due to best practice in Google custom prediction routine)
* fix lint issues
* set pylintrc back to default and removed unused argument
1 parent 78a79e7, commit 1ff3cf5
Showing 41 changed files with 1,458 additions and 0 deletions.
@@ -0,0 +1,108 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/

# custom
custom_prediction_routine.egg-info
custom_prediction_routine*
@@ -0,0 +1,33 @@
# Named Entity Recognition with Kubeflow and Keras

In this walkthrough, you will learn how to use Kubeflow to build reusable components to train your model on a Kubernetes cluster and deploy it to AI Platform.

## Goals

* Demonstrate how to build reusable pipeline components
* Demonstrate how to use Keras-only models
* Demonstrate how to train a Named Entity Recognition model on a Kubernetes cluster
* Demonstrate how to deploy a Keras model to AI Platform
* Demonstrate how to use a custom prediction routine
* Demonstrate how to use Kubeflow metrics
* Demonstrate how to use Kubeflow visualizations

## What is Named Entity Recognition
Named Entity Recognition is a word classification problem that extracts data, called entities, from text.

![solution](documentation/files/solution.png)

### Steps

1. [Set up Kubeflow and clone repository](documentation/step-1-setup.md)
1. [Build the pipeline components](documentation/step-2-build-components.md)
1. [Upload the dataset](documentation/step-3-upload-dataset.md)
1. [Custom prediction routine](documentation/step-4-custom-prediction-routine.md)
1. [Run the pipeline](documentation/step-5-run-pipeline.md)
1. [Monitor the training](documentation/step-6-monitor-training.md)
1. [Predict](documentation/step-7-predictions.md)
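To make the word-classification framing in the README concrete, here is a minimal sketch that groups per-word tags into entities. It assumes IOB-style tags (`B-`, `I-`, `O`), which the preprocess component's description hints at; the sentence, the tag names, and the `extract_entities` helper are hypothetical and are not taken from the example's code.

```python
from typing import List, Optional, Tuple


def extract_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, Optional[str]]]:
    """Group IOB-tagged tokens into (entity_text, entity_type) pairs."""
    entities = []
    current_words: List[str] = []
    current_type: Optional[str] = None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity begins; flush any entity that is still open.
            if current_words:
                entities.append((" ".join(current_words), current_type))
            current_words, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_words and tag[2:] == current_type:
            # Continuation of the currently open entity.
            current_words.append(token)
        else:
            # "O" (or an inconsistent "I-" tag) closes the open entity.
            if current_words:
                entities.append((" ".join(current_words), current_type))
            current_words, current_type = [], None

    if current_words:
        entities.append((" ".join(current_words), current_type))
    return entities


# Hypothetical sentence with one tag per word.
tokens = ["Angela", "Merkel", "visited", "Paris", "yesterday"]
tags = ["B-per", "I-per", "O", "B-geo", "O"]
print(extract_entities(tokens, tags))
```

Each word receives exactly one label, which is why the task can be treated as classification per word; the entities are recovered afterwards by merging adjacent labels.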
@@ -0,0 +1,10 @@
#!/bin/sh

# printf is used instead of echo because "\n" handling in echo is not portable across sh implementations
printf '\nBuild and push preprocess component\n'
./preprocess/build_image.sh

printf '\nBuild and push train component\n'
./train/build_image.sh

printf '\nBuild and push deploy component\n'
./deploy/build_image.sh
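The script above runs the three build scripts one after another and, without `set -e`, keeps going even if one of them fails. As a rough sketch of the same sequence with an early stop on failure, the following Python helper could be used; `run_builds`, `run_script`, and the injectable `runner` parameter are hypothetical names introduced only for illustration.

```python
import subprocess
from typing import Callable, List, Sequence

# Build scripts in the order used by the shell script above.
BUILD_SCRIPTS = [
    "./preprocess/build_image.sh",
    "./train/build_image.sh",
    "./deploy/build_image.sh",
]


def run_script(script: str) -> int:
    """Run one build script and return its exit code."""
    return subprocess.run([script], check=False).returncode


def run_builds(scripts: Sequence[str],
               runner: Callable[[str], int] = run_script) -> List[str]:
    """Run scripts in order and stop at the first non-zero exit code.

    Returns the scripts that completed successfully.
    """
    completed = []
    for script in scripts:
        print(f"Build and push: {script}")
        if runner(script) != 0:
            print(f"Build failed: {script}")
            break
        completed.append(script)
    return completed
```

Calling `run_builds(BUILD_SCRIPTS)` from the components directory would execute the real scripts; passing a different `runner` makes the ordering logic easy to try without Docker.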
@@ -0,0 +1,13 @@
#!/bin/sh

BUCKET="your-bucket-name"

printf '\nCopy component specifications to Google Cloud Storage\n'
gsutil cp preprocess/component.yaml "gs://${BUCKET}/components/preprocess/component.yaml"
gsutil acl ch -u AllUsers:R "gs://${BUCKET}/components/preprocess/component.yaml"

gsutil cp train/component.yaml "gs://${BUCKET}/components/train/component.yaml"
gsutil acl ch -u AllUsers:R "gs://${BUCKET}/components/train/component.yaml"

gsutil cp deploy/component.yaml "gs://${BUCKET}/components/deploy/component.yaml"
gsutil acl ch -u AllUsers:R "gs://${BUCKET}/components/deploy/component.yaml"
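Because the script grants read access to `AllUsers`, each uploaded specification becomes reachable over HTTPS. The sketch below derives those URLs, assuming the usual `https://storage.googleapis.com/<bucket>/<object>` form for publicly readable objects; `component_spec_url` and `all_spec_urls` are hypothetical helpers, and the bucket name is the placeholder from the script.

```python
from typing import Dict

COMPONENTS = ("preprocess", "train", "deploy")


def component_spec_url(bucket: str, component: str) -> str:
    """Return the HTTPS URL of a publicly readable component.yaml in GCS."""
    if not bucket:
        raise ValueError("bucket must not be empty")
    return (
        f"https://storage.googleapis.com/{bucket}"
        f"/components/{component}/component.yaml"
    )


def all_spec_urls(bucket: str) -> Dict[str, str]:
    """Map each component name to the URL of its specification."""
    return {name: component_spec_url(bucket, name) for name in COMPONENTS}


for name, url in all_spec_urls("your-bucket-name").items():
    print(f"{name}: {url}")
```

A pipeline definition could then load each specification from its URL, for example with `kfp.components.load_component_from_url` from the Kubeflow Pipelines SDK.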
@@ -0,0 +1,4 @@
FROM google/cloud-sdk:latest
COPY ./src /pipelines/component/src
RUN chmod 755 /pipelines/component/src/deploy.sh
ENTRYPOINT ["/pipelines/component/src/deploy.sh"]
@@ -0,0 +1,11 @@
#!/bin/sh

image_name="gcr.io/${PROJECT_ID}/kubeflow/ner/deploy"
image_tag=latest

full_image_name="${image_name}:${image_tag}"

cd "$(dirname "$0")" || exit 1

docker build -t "${full_image_name}" .
docker push "${full_image_name}"
@@ -0,0 +1,44 @@
name: deploy
description: Deploy the model with custom prediction routine
inputs:
  - name: Model path
    type: GCSPath
    description: 'Path of GCS directory containing exported TensorFlow model.'
  - name: Model name
    type: String
    description: 'The name of the model, whether it already exists or gets created'
  - name: Model region
    type: String
    description: 'The region where the model is going to be deployed'
  - name: Model version
    type: String
    description: 'The version of the model'
  - name: Model runtime version
    type: String
    description: 'The runtime version of the model'
  - name: Model prediction class
    type: String
    description: 'The prediction class of the model'
  - name: Model python version
    type: String
    description: 'The python version of the model'
  - name: Model package uris
    type: String
    description: 'The package URIs of the model'
outputs:
implementation:
  container:
    image: gcr.io/<PROJECT-ID>/kubeflow/ner/deploy:latest
    command: [
      sh, /pipelines/component/src/deploy.sh
    ]
    args: [
      --model-path, {inputValue: Model path},
      --model-name, {inputValue: Model name},
      --model-region, {inputValue: Model region},
      --model-version, {inputValue: Model version},
      --model-runtime-version, {inputValue: Model runtime version},
      --model-prediction-class, {inputValue: Model prediction class},
      --model-python-version, {inputValue: Model python version},
      --model-package-uris, {inputValue: Model package uris},
    ]
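The `args` list in this specification mixes literal flags with `{inputValue: ...}` placeholders, which are replaced by the argument values when the component runs. The following is a simplified illustration of that substitution, not the Kubeflow Pipelines implementation; `resolve_args` and the sample values are hypothetical.

```python
from typing import Dict, List, Union

# An argument is either a literal string or a placeholder mapping.
Arg = Union[str, Dict[str, str]]


def resolve_args(args: List[Arg], inputs: Dict[str, str]) -> List[str]:
    """Replace {inputValue: <name>} placeholders with the supplied values."""
    resolved = []
    for arg in args:
        if isinstance(arg, dict):
            # Only the inputValue placeholder is handled in this sketch.
            name = arg["inputValue"]
            if name not in inputs:
                raise KeyError(f"missing value for input '{name}'")
            resolved.append(inputs[name])
        else:
            resolved.append(arg)
    return resolved


# Hypothetical subset of the args list and hypothetical input values.
args = [
    "--model-name", {"inputValue": "Model name"},
    "--model-region", {"inputValue": "Model region"},
]
inputs = {"Model name": "named_entity_recognition", "Model region": "us-central1"}
print(resolve_args(args, inputs))
```

The resolved list is what the container's `deploy.sh` receives as positional parameters, which is why every flag in `args` needs a matching case in the script's argument loop.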
@@ -0,0 +1,88 @@
# loop through all parameters
while [ "$1" != "" ]; do
  case $1 in
    "--model-path")
      shift
      MODEL_PATH="$1"
      echo
      shift
      ;;
    "--model-name")
      shift
      MODEL_NAME="$1"
      echo
      shift
      ;;
    "--model-region")
      shift
      MODEL_REGION="$1"
      echo
      shift
      ;;
    "--model-version")
      shift
      MODEL_VERSION="$1"
      echo
      shift
      ;;
    "--model-runtime-version")
      shift
      RUNTIME_VERSION="$1"
      echo
      shift
      ;;
    "--model-prediction-class")
      shift
      MODEL_PREDICTION_CLASS="$1"
      echo
      shift
      ;;
    "--model-python-version")
      shift
      MODEL_PYTHON_VERSION="$1"
      echo
      shift
      ;;
    "--model-package-uris")
      shift
      MODEL_PACKAGE_URIS="$1"
      echo
      shift
      ;;
    *) shift ;;  # skip unknown parameters so the loop cannot spin forever
  esac
done

# echo inputs
echo MODEL_PATH = "${MODEL_PATH}"
echo MODEL = "${MODEL_EXPORT_PATH}"
echo MODEL_NAME = "${MODEL_NAME}"
echo MODEL_REGION = "${MODEL_REGION}"
echo MODEL_VERSION = "${MODEL_VERSION}"
echo RUNTIME_VERSION = "${RUNTIME_VERSION}"
echo MODEL_PREDICTION_CLASS = "${MODEL_PREDICTION_CLASS}"
echo MODEL_PYTHON_VERSION = "${MODEL_PYTHON_VERSION}"
echo MODEL_PACKAGE_URIS = "${MODEL_PACKAGE_URIS}"


# create model
modelname=$(gcloud ai-platform models list | grep -w "$MODEL_NAME")
echo "$modelname"
if [ -z "$modelname" ]; then
  echo "Creating model $MODEL_NAME in region $MODEL_REGION"

  gcloud ai-platform models create "${MODEL_NAME}" \
    --regions "${MODEL_REGION}"
else
  echo "Model $MODEL_NAME already exists"
fi

# create version with custom prediction routine (beta)
echo "Creating version $MODEL_VERSION from $MODEL_PATH"
gcloud beta ai-platform versions create "${MODEL_VERSION}" \
  --model "${MODEL_NAME}" \
  --origin "${MODEL_PATH}" \
  --python-version "${MODEL_PYTHON_VERSION}" \
  --runtime-version "${RUNTIME_VERSION}" \
  --package-uris "${MODEL_PACKAGE_URIS}" \
  --prediction-class "${MODEL_PREDICTION_CLASS}"
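The last command in the script passes every parsed value to `gcloud beta ai-platform versions create`. As a sketch of the same call assembled as an argument list (which sidesteps shell word splitting and makes missing values explicit), consider the helper below. The flags are the ones used in the script; `VersionConfig`, `build_version_create_command`, and all sample values are hypothetical, and the code only builds the command without running it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VersionConfig:
    model_name: str
    model_version: str
    model_path: str
    python_version: str
    runtime_version: str
    package_uris: str
    prediction_class: str


def build_version_create_command(cfg: VersionConfig) -> List[str]:
    """Build the argv for creating a version with a custom prediction routine."""
    missing = [name for name, value in vars(cfg).items() if not value]
    if missing:
        raise ValueError(f"missing values: {', '.join(missing)}")
    return [
        "gcloud", "beta", "ai-platform", "versions", "create", cfg.model_version,
        "--model", cfg.model_name,
        "--origin", cfg.model_path,
        "--python-version", cfg.python_version,
        "--runtime-version", cfg.runtime_version,
        "--package-uris", cfg.package_uris,
        "--prediction-class", cfg.prediction_class,
    ]


# Hypothetical values, for illustration only.
example = VersionConfig(
    model_name="named_entity_recognition",
    model_version="v1",
    model_path="gs://your-bucket-name/model",
    python_version="3.5",
    runtime_version="1.13",
    package_uris="gs://your-bucket-name/routine/package.tar.gz",
    prediction_class="model_prediction.CustomModelPrediction",
)
print(" ".join(build_version_create_command(example)))
```

Validating the values up front surfaces an unset parameter as a clear error instead of a malformed `gcloud` invocation.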
@@ -0,0 +1,4 @@
ARG BASE_IMAGE_TAG=1.12.0-py3
FROM tensorflow/tensorflow:$BASE_IMAGE_TAG
RUN python3 -m pip install keras
COPY ./src /pipelines/component/src
named_entity_recognition/components/preprocess/build_image.sh (12 changes: 12 additions & 0 deletions)
@@ -0,0 +1,12 @@
#!/bin/sh

image_name="gcr.io/${PROJECT_ID}/kubeflow/ner/preprocess"
image_tag=latest

full_image_name="${image_name}:${image_tag}"
base_image_tag=1.12.0-py3

cd "$(dirname "$0")" || exit 1

docker build --build-arg BASE_IMAGE_TAG="${base_image_tag}" -t "${full_image_name}" .
docker push "${full_image_name}"
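The build scripts compose the image name from the `PROJECT_ID` environment variable; if it is unset, the shell expands it to an empty string and the name becomes `gcr.io//kubeflow/ner/preprocess:latest`. The sketch below composes the same name with an explicit check. `full_image_name` is a hypothetical helper and `my-project` is a placeholder project ID.

```python
import os
from typing import Mapping


def full_image_name(component: str, tag: str = "latest",
                    env: Mapping[str, str] = os.environ) -> str:
    """Compose the image name used by the build scripts, failing on a missing PROJECT_ID."""
    project_id = env.get("PROJECT_ID", "")
    if not project_id:
        raise RuntimeError("PROJECT_ID is not set")
    return f"gcr.io/{project_id}/kubeflow/ner/{component}:{tag}"


print(full_image_name("preprocess", env={"PROJECT_ID": "my-project"}))
```

The same check could be done in the shell scripts themselves; the point is only that the image name depends on an environment variable that the scripts do not validate.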
named_entity_recognition/components/preprocess/component.yaml (34 changes: 34 additions & 0 deletions)
@@ -0,0 +1,34 @@
name: preprocess
description: Performs the IOB preprocessing.
inputs:
  - {name: Input 1 URI, type: GCSPath}
  - {name: Output x URI template, type: GCSPath}
  - {name: Output y URI template, type: GCSPath}
  - {name: Output preprocessing state URI template, type: GCSPath}
outputs:
  - name: Output x URI
    type: GCSPath
  - name: Output y URI
    type: String
  - name: Output tags
    type: String
  - name: Output words
    type: String
  - name: Output preprocessing state URI
    type: String
implementation:
  container:
    image: gcr.io/<PROJECT-ID>/kubeflow/ner/preprocess:latest
    command: [
      python3, /pipelines/component/src/component.py,
      --input1-path, {inputValue: Input 1 URI},
      --output-y-path, {inputValue: Output y URI template},
      --output-x-path, {inputValue: Output x URI template},
      --output-preprocessing-state-path, {inputValue: Output preprocessing state URI template},

      --output-y-path-file, {outputPath: Output y URI},
      --output-x-path-file, {outputPath: Output x URI},
      --output-preprocessing-state-path-file, {outputPath: Output preprocessing state URI},
      --output-tags, {outputPath: Output tags},
      --output-words, {outputPath: Output words},
    ]
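The `command` list above defines the command-line interface that `component.py` has to accept: four flags fed by `inputValue` placeholders carry values, and five flags fed by `outputPath` placeholders carry local file paths that the component is expected to write. Since `component.py` is not shown in this part of the diff, the following is only an assumed sketch of a matching `argparse` setup, not the actual implementation; `build_parser`, `write_output`, and the sample values are hypothetical.

```python
import argparse
import os
import tempfile


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="IOB preprocessing (sketch)")
    # Flags fed by {inputValue: ...} placeholders: plain values.
    for flag in ("--input1-path", "--output-x-path", "--output-y-path",
                 "--output-preprocessing-state-path"):
        parser.add_argument(flag, required=True)
    # Flags fed by {outputPath: ...} placeholders: local files to write.
    for flag in ("--output-x-path-file", "--output-y-path-file",
                 "--output-preprocessing-state-path-file",
                 "--output-tags", "--output-words"):
        parser.add_argument(flag, required=True)
    return parser


def write_output(path: str, value: str) -> None:
    """Write a small output value to the file path supplied by the pipeline."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w") as handle:
        handle.write(value)


# Hypothetical invocation; a temporary directory stands in for the output paths.
workdir = tempfile.mkdtemp()
argv = [
    "--input1-path", "gs://your-bucket-name/ner.csv",
    "--output-x-path", "gs://your-bucket-name/x.npy",
    "--output-y-path", "gs://your-bucket-name/y.npy",
    "--output-preprocessing-state-path", "gs://your-bucket-name/state.pkl",
    "--output-x-path-file", os.path.join(workdir, "x-path.txt"),
    "--output-y-path-file", os.path.join(workdir, "y-path.txt"),
    "--output-preprocessing-state-path-file", os.path.join(workdir, "state-path.txt"),
    "--output-tags", os.path.join(workdir, "tags.txt"),
    "--output-words", os.path.join(workdir, "words.txt"),
]
args = build_parser().parse_args(argv)
write_output(args.output_x_path_file, args.output_x_path)
print(args.input1_path)
```

Writing each result to the file named by its `outputPath` flag is what lets downstream components consume the outputs declared in the specification.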