Before starting with the exercises, make sure that you have the following in place:
- An Azure Machine Learning workspace
  - Follow this tutorial, no need to configure the networking section!
- A Compute Instance running in your workspace (`Standard_D2_v2` is sufficient)
  - Go to AML Studio (https://ml.azure.com), sign in, then select `Compute`, then `Compute instance`, and click `Create`
  - Give it any name, select `Standard_D2_v2` as the size, and hit create - done!
We recommend running these exercises on a Compute Instance in Azure Machine Learning. To get started, open Jupyter or JupyterLab on the Compute Instance, select `New --> Terminal` (upper right corner), and clone this repo:

```
git clone https://github.com/csiebler/azure-machine-learning-mlops-workshop.git
cd azure-machine-learning-mlops-workshop/
```

Then navigate to the cloned folder in Jupyter and open `single_step_pipeline.ipynb` from this exercise. In case you're asked for a kernel, please use the `Python 3.6 - AzureML` kernel that comes with the Compute Instance.
- Provision an Azure Machine Learning Workspace in Azure
- Install the Azure CLI
- Log in to your Azure subscription via `az login`
- Make sure you are in the correct subscription (the one containing your workspace):
  - `az account list` lists all your subscriptions
  - `az account set -s '<SUBSCRIPTION_ID or NAME>'` sets the default one that the CLI should use
- Install the Azure Machine Learning CLI extension via `az extension add -n azure-cli-ml`
- Clone this repo via `git clone https://github.com/csiebler/azure-machine-learning-mlops-workshop.git`
- Navigate into the repo via `cd azure-machine-learning-mlops-workshop/`
- Attach the whole repo to your workspace via `az ml folder attach -w <YOUR WORKSPACE NAME> -g <YOUR RESOURCE GROUP>`
- Fire up your favorite notebook experience and get started!
- Our notebook creates the AzureML pipeline inside the workspace
- This pipeline exposes a REST API through which it can be invoked
- During pipeline creation, we define which scripts it should run (here it's just `train.py`; see the sketch after this list)
- We also specify in which AzureML Environment it should run (this defines the runtime environment our script will run in)
- Our pipeline is parameterized, so we can pass in other datasets if desired (e.g., for retraining with a new dataset)
- Our pipeline runs on a compute cluster (spun up on demand when the pipeline is triggered)
- Our pipeline might output data (e.g., during batch scoring) or register a model (e.g., during training) - this is covered in the subsequent examples
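To make these pieces concrete, here is a minimal sketch (AzureML SDK v1) of how such a single-step pipeline could be assembled. The compute cluster name `cpu-cluster` and the dataset name `training-dataset` are placeholders; `workshop-env`, `train-step`, `train.py`, and the `--data-path` argument follow the notebook:

```python
from azureml.core import Dataset, Environment, Workspace
from azureml.core.runconfig import RunConfiguration
from azureml.data.dataset_consumption_config import DatasetConsumptionConfig
from azureml.pipeline.core import Pipeline, PipelineParameter
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config()

# Runtime environment for the step (registered earlier as "workshop-env")
run_config = RunConfiguration()
run_config.environment = Environment.get(ws, name="workshop-env")

# Pipeline parameter, so a different dataset can be passed in at invocation time
default_dataset = Dataset.get_by_name(ws, "training-dataset")  # placeholder dataset name
dataset_param = PipelineParameter(name="training_dataset", default_value=default_dataset)
dataset_input = DatasetConsumptionConfig("training_dataset", dataset_param).as_mount()

train_step = PythonScriptStep(
    name="train-step",
    script_name="train.py",
    arguments=["--data-path", dataset_input],
    inputs=[dataset_input],
    compute_target="cpu-cluster",   # existing AmlCompute cluster (placeholder name)
    runconfig=run_config,
    source_directory=".",
    allow_reuse=False,
)

pipeline = Pipeline(workspace=ws, steps=[train_step])
published_pipeline = pipeline.publish(name="training-pipeline")
print(published_pipeline.endpoint)  # REST endpoint through which the pipeline can be invoked
```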
❓ Question: How does `train_step = PythonScriptStep(name="train-step", ...)` know which Python dependencies to use?

✅ See solution!

It uses the AML environment `workshop-env`, which we created in the first step. This environment was created from the `conda.yml` file. We could have defined all of this in Python, but keeping the conda environment in a separate file makes it easier to test locally, e.g., by using:

```
conda env create -f conda.yml
python train.py --data-path ../data-training
```
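On the AML side, the connection from `conda.yml` to the step looks roughly like this (a sketch, assuming the environment is registered under the name `workshop-env` as in the notebook):

```python
from azureml.core import Environment, Workspace
from azureml.core.runconfig import RunConfiguration

ws = Workspace.from_config()

# Build the environment from the same conda.yml used for local testing
env = Environment.from_conda_specification(name="workshop-env", file_path="conda.yml")
env.register(ws)  # make it reusable across runs and pipelines

# The environment is handed to the step through its run configuration
run_config = RunConfiguration()
run_config.environment = env  # later: PythonScriptStep(..., runconfig=run_config)
```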
❓ Question: How can we make a compute cluster scale down more quickly or more slowly?

✅ See solution!

We can adapt `idle_seconds_before_scaledown=3600`, which defines how long the cluster sits idle before it scales down to 0 nodes.
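For reference, a minimal sketch of where this parameter lives when provisioning an AmlCompute cluster (the cluster name and VM size are placeholders):

```python
from azureml.core import Workspace
from azureml.core.compute import AmlCompute, ComputeTarget

ws = Workspace.from_config()

config = AmlCompute.provisioning_configuration(
    vm_size="STANDARD_D2_V2",
    min_nodes=0,
    max_nodes=4,
    idle_seconds_before_scaledown=300,  # scale down to 0 nodes after 5 idle minutes
)
cluster = ComputeTarget.create(ws, "cpu-cluster", config)
cluster.wait_for_completion(show_output=True)
```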
❓ Question: How does `ws = Workspace.from_config()` know which workspace it needs to connect to?
✅ See solution!
The call `Workspace.from_config()` has the following behaviour (see also the sketch after this list):

- Inside a Compute Instance, it resolves to the workspace of the current instance
- If a `config.json` file is present, it loads the workspace reference from there (you can download this file from the Studio UI by clicking the book icon in the upper right):

```json
{
    "subscription_id": "*****",
    "resource_group": "aml-mlops-workshop",
    "workspace_name": "aml-mlops-workshop"
}
```

- If you attached the folder to a workspace using the az CLI via `az ml folder attach -g <resource group> -w <workspace name>`, it uses that workspace
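If none of these mechanisms is available (e.g., a plain local machine without a `config.json`), `Workspace.get()` is the explicit alternative. A small sketch; the values below are placeholders:

```python
from azureml.core import Workspace

# Preferred: resolves via Compute Instance context, config.json, or an attached folder
ws = Workspace.from_config()

# Explicit alternative when no config is available (placeholder values)
ws = Workspace.get(
    name="aml-mlops-workshop",
    subscription_id="<SUBSCRIPTION_ID>",
    resource_group="aml-mlops-workshop",
)
print(ws.name)
```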
❓ Question: What is the difference between `PublishedPipeline` and `PipelineEndpoint`?

✅ See solution!

`PublishedPipeline` publishes a pipeline as a RESTful API endpoint, through which it can be invoked. Each `PublishedPipeline` gets its own URL endpoint. `PipelineEndpoint` allows us to "hide" multiple `PublishedPipeline`s behind a single URL and routes requests to a specific default version. This enables us to continuously update the `PipelineEndpoint` with new `PublishedPipeline`s while the URL stays the same. Hence, the consumer will not notice that the pipeline got "swapped out", "replaced", or "changed". This is very helpful when we want to test pipelines before we release them or hand them over to the pipeline consumer.
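A minimal sketch of this pattern (SDK v1); the names `training-pipeline`, `training-endpoint`, and `cpu-cluster` are placeholders, and the single-step pipeline mirrors the earlier sketch:

```python
from azureml.core import Workspace
from azureml.pipeline.core import Pipeline, PipelineEndpoint
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config()

# Minimal single-step pipeline, as in the earlier sketch
train_step = PythonScriptStep(name="train-step", script_name="train.py", compute_target="cpu-cluster")
pipeline = Pipeline(workspace=ws, steps=[train_step])

# Every publish() call creates a new PublishedPipeline with its own URL
published = pipeline.publish(name="training-pipeline")

try:
    # Endpoint already exists: route its stable URL to the new version
    endpoint = PipelineEndpoint.get(workspace=ws, name="training-endpoint")
    endpoint.add_default(published)
except Exception:
    # First time: create the endpoint and point it at the new PublishedPipeline
    endpoint = PipelineEndpoint.publish(
        workspace=ws,
        name="training-endpoint",
        pipeline=published,
        description="Stable URL for the training pipeline",
    )

print(endpoint.endpoint)  # this URL stays the same across versions
```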