Orchestrator

Overview

This document covers the design of the following orchestrators:

  1. taxi_fares_orchestrator_train.py
  2. taxi_fares_orchestrator_batch_score.py

Considerations

  • It will be a Databricks notebook in the Databricks workspace.
  • It will be stored in Git as a Python file.
  • It will use dbutils widgets for parameterization.
  • It will use %pip magic commands for managing libraries.
  • It will be executed from a Databricks Job.
  • It will log to Application Insights.
  • It will log artifacts, metrics, parameters, and the trained model to MLflow (see the sketch after this list).
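
As a rough illustration of the MLflow calls the training orchestrator would make, here is a minimal sketch; the parameter, metric, and artifact names are placeholders, and a trivial scikit-learn model stands in for the real training step:

import mlflow
import mlflow.sklearn
from sklearn.linear_model import LinearRegression

model = LinearRegression().fit([[0.0], [1.0]], [0.0, 1.0])  # stand-in for the real training step

with mlflow.start_run():
    mlflow.log_param("fit_intercept", True)   # hypothetical parameter
    mlflow.log_metric("rmse", 4.2)            # hypothetical metric
    with open("notes.txt", "w") as f:
        f.write("training notes")
    mlflow.log_artifact("notes.txt")          # hypothetical artifact
    mlflow.sklearn.log_model(model, "model")  # the trained model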

Parameters

Define Parameters

Parameters are defined using dbutils.widgets.text, for example:

dbutils.widgets.text("<param_name>", "<default_value>")
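
For instance, a training orchestrator might define widgets like these; the parameter names are illustrative assumptions, not taken from the actual notebooks:

dbutils.widgets.text("model_name", "taxi_fares")         # hypothetical parameter
dbutils.widgets.text("training_data_path", "/mnt/data")  # hypothetical parameter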

Read Parameters

Parameters are read using dbutils.widgets.get, for example:

param_value = dbutils.widgets.get("<param_name>")
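
Continuing the illustrative widgets defined above:

model_name = dbutils.widgets.get("model_name")
training_data_path = dbutils.widgets.get("training_data_path")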

Installation of libraries

How to enable %pip magic commands

Starting with Databricks Runtime ML version 6.4, this feature can be enabled when creating a cluster. To do so, set spark.databricks.conda.condaMagic.enabled to true under “Spark Config” (Edit > Advanced Options > Spark).
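
The corresponding entry in the cluster's Spark Config is the key-value pair named above:

spark.databricks.conda.condaMagic.enabled true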

How to install libraries using pip

Libraries are installed as notebook-scoped Python libraries, for example:

%pip install /dbfs/<path>/<package_name>.whl
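
Concretely, installing the MLOps wheel could look like the following; the path and file name are illustrative placeholders, not the actual location of the package:

%pip install /dbfs/libraries/taxi_fares_mlops-0.1.0-py3-none-any.whl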

Calling MLOps Python Functions

The MLOps Python functions are packaged as a wheel, and the orchestrator notebook calls them from the installed package.
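
As a minimal sketch, assuming the installed wheel exposes a training module (the taxi_fares_mlops.training module path is hypothetical; run_training is the function used in the error-handling example below):

from taxi_fares_mlops.training import run_training  # hypothetical module path

model = run_training()  # the orchestrator delegates the actual work to the packaged function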

Execution of Orchestrator

Orchestrators are executed from a Databricks Job.
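
For illustration, a run can be triggered through the Databricks Jobs API (see the references below); the workspace URL, token, and job ID are placeholders:

import requests

response = requests.post(
    "https://<databricks-instance>/api/2.0/jobs/run-now",
    headers={"Authorization": "Bearer <personal-access-token>"},
    json={
        "job_id": 123,  # hypothetical job ID of the orchestrator job
        "notebook_params": {"model_name": "taxi_fares"},  # overrides widget defaults
    },
)
response.raise_for_status()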

Error handling

For error handling, a try..except block is used to handle exceptions:

try:
    model = run_training()
except Exception as ex:
    logger.error(f"Encountered error: {ex}")  # Log the exception to Application Insights
    raise Exception(f"Encountered error - {ex}") from ex  # Fail the Databricks Job run

Observability

The OpenCensus library is used to capture logs and metrics and send them to Application Insights.
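
A minimal sketch of wiring a Python logger to Application Insights with the opencensus-ext-azure exporter; the secret scope and key names are placeholders, matching the secret management section below:

import logging

from opencensus.ext.azure.log_exporter import AzureLogHandler

# The instrumentation key is stored in the Databricks Secret Scope (see below)
instrumentation_key = dbutils.secrets.get(scope="<scope-name>", key="<appinsights-key>")

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
logger.addHandler(AzureLogHandler(connection_string=f"InstrumentationKey={instrumentation_key}"))

logger.info("Orchestrator run started")  # appears as a trace in Application Insights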

Secret Management

The following secrets need to be stored in Databricks Secret Scope:

  • Application Insights Instrumentation Key
  • Azure ADLS Gen2 Storage Details (account name, container name, shared access key)

Secrets are read using dbutils.secrets.get, for example:

secret_value = dbutils.secrets.get(scope="<scope-name>", key="<secret-name>")
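
For example, the ADLS Gen2 details listed above could be read and applied to the Spark session as follows; the secret key names are illustrative:

account_name = dbutils.secrets.get(scope="<scope-name>", key="adls-account-name")
access_key = dbutils.secrets.get(scope="<scope-name>", key="adls-access-key")

# Configure shared-key access to the storage account for this Spark session
spark.conf.set(f"fs.azure.account.key.{account_name}.dfs.core.windows.net", access_key)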

References

  1. Enable pip magic commands
  2. OpenCensus
  3. Databricks Job API
  4. Databricks Cluster API
  5. Databricks CLI