This document covers the design guidelines for the orchestrators:

- It will be a Databricks notebook in the Databricks workspace.
- It will be stored in Git as a Python file.
- It will use `dbutils` widgets for parametrization.
- It will use `%pip` magic commands for managing libraries.
- It will be executed from a Databricks Job.
- It will perform logging in Application Insights.
- It will log artifacts, metrics, parameters, and the trained model into MLflow (see the sketch after this list).
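As a minimal sketch of the MLflow logging, assuming scikit-learn (the parameter, metric value, artifact file, and model below are hypothetical stand-ins for real training outputs):

```python
import json

import mlflow
import mlflow.sklearn
from sklearn.linear_model import LinearRegression

# Hypothetical stand-ins for the outputs of a real training step.
model = LinearRegression().fit([[0.0], [1.0], [2.0]], [0.0, 1.0, 2.0])
with open("/tmp/metrics.json", "w") as f:
    json.dump({"rmse": 0.78}, f)

with mlflow.start_run():
    mlflow.log_param("model_name", "sklearn-diabetes")  # hypothetical parameter
    mlflow.log_metric("rmse", 0.78)                     # hypothetical metric
    mlflow.log_artifact("/tmp/metrics.json")            # hypothetical artifact file
    mlflow.sklearn.log_model(model, "model")            # trained model
```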
Parameters are defined using `dbutils.widgets.text`, for example:

```python
dbutils.widgets.text("<param_name>", "<default_value>")
```

Parameters are read using `dbutils.widgets.get`, for example:

```python
param_value = dbutils.widgets.get("<param_name>")
```
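As a concrete sketch, a training orchestrator might expose a hypothetical `model_name` parameter:

```python
# "model_name" and its default value are hypothetical examples.
dbutils.widgets.text("model_name", "sklearn-diabetes")
model_name = dbutils.widgets.get("model_name")  # widget values are returned as strings
```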
Starting with Databricks Runtime ML 6.4, the library magic commands can be enabled when creating a cluster. To do this, set `spark.databricks.conda.condaMagic.enabled` to `true` under “Spark Config” (Edit > Advanced Options > Spark).
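That is, the following line is added to the cluster's Spark configuration:

```
spark.databricks.conda.condaMagic.enabled true
```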
Libraries are installed as notebook-scoped Python libraries, for example:

```python
%pip install /dbfs/<path>/<package_name>.whl
```
MLOps Python functions are packaged as a wheel, and the orchestrator notebook calls the Python functions from the wheel package.
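As a sketch of this call pattern (the module `mlops_package.training`, the function `run_training`, and the `training_data_path` widget are hypothetical names, not part of this design):

```python
# Hypothetical module and function names; the actual wheel defines its own API.
from mlops_package.training import run_training

training_data_path = dbutils.widgets.get("training_data_path")  # hypothetical widget
model = run_training(training_data_path)
```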
Orchestrators are executed from a Databricks Job.
For error handling, a `try...except` block is used to handle exceptions:

```python
try:
    model = run_training()
except Exception as ex:
    logger.error(f"Encountered error: {ex}")  # Log the exception in Application Insights
    raise Exception(f"Encountered error - {ex}") from ex  # Fail the Databricks Job run
```
The OpenCensus library is used to capture logs and metrics and send them to Application Insights.
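A minimal logging setup, assuming the `opencensus-ext-azure` package is installed and the instrumentation key is stored in the secret scope described below (the scope and key names are hypothetical):

```python
import logging

from opencensus.ext.azure.log_exporter import AzureLogHandler

# Hypothetical scope/key names; see the secret scope section below.
instrumentation_key = dbutils.secrets.get(scope="mlops-scope", key="app-insights-key")

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
logger.addHandler(
    AzureLogHandler(connection_string=f"InstrumentationKey={instrumentation_key}")
)

logger.info("Orchestrator run started")  # shows up in Application Insights traces
```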
The following secrets need to be stored in a Databricks Secret Scope:
- Application Insights Instrumentation Key
- Azure ADLS Gen2 Storage Details (account name, container name, shared access key)
Secrets are read using `dbutils.secrets.get`, for example:

```python
secret_value = dbutils.secrets.get(scope="<scope-name>", key="<secret-name>")
```
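For example, the ADLS Gen2 details can be read from the scope and used to configure Spark access to the storage account (scope and secret names are hypothetical):

```python
# Hypothetical scope and secret names.
account_name = dbutils.secrets.get(scope="mlops-scope", key="adls-account-name")
access_key = dbutils.secrets.get(scope="mlops-scope", key="adls-access-key")

# Authenticate Spark to ADLS Gen2 using the shared access key.
spark.conf.set(f"fs.azure.account.key.{account_name}.dfs.core.windows.net", access_key)
```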