Deployment

Overview

This document is the deployment guide for the MLOps solution.

Databricks Cluster

For the Orchestrator job, either an existing cluster can be used or a new one can be created. In either case, make sure the following properties are set on the cluster; a matching job-template snippet is sketched after the list.

  • Cluster Mode: High Concurrency

  • Databricks Runtime Version: 8.1 LTS ML (includes Apache Spark 3.0.1, Scala 2.12)

  • Enable Autoscaling: True

  • Worker Type: Standard_F4s

  • Driver Type: Standard_F4s

  • Spark Settings under “Spark Config” (Edit > Advanced Options > Spark)

    spark.databricks.cluster.profile serverless
    spark.databricks.repl.allowedLanguages sql,python,r
    spark.databricks.conda.condaMagic.enabled true
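
As a sketch, the same settings can be captured in the new_cluster block of a Databricks job template JSON. The autoscale bounds and the exact spark_version string are assumptions and should be verified against the target workspace:

{
  "new_cluster": {
    "spark_version": "8.1.x-cpu-ml-scala2.12",
    "node_type_id": "Standard_F4s",
    "driver_node_type_id": "Standard_F4s",
    "autoscale": { "min_workers": 1, "max_workers": 4 },
    "spark_conf": {
      "spark.databricks.cluster.profile": "serverless",
      "spark.databricks.repl.allowedLanguages": "sql,python,r",
      "spark.databricks.conda.condaMagic.enabled": "true"
    }
  }
}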
    

Databricks Job

The Orchestrator Databricks job can be created from a Databricks job create template using the following example CLI command:

databricks jobs create --json-file <job-template-file>.json

An existing Orchestrator Databricks job can be updated from a Databricks job reset template using the following example CLI command:

databricks jobs reset --job-id <job-id of existing job> --json-file <job-template-file>.json
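
For reference, a minimal job template sketch in Jobs API 2.0 format that both commands accept; the job name, cluster id, and notebook path are illustrative placeholders:

{
  "name": "orchestrator-job",
  "existing_cluster_id": "<cluster-id>",
  "notebook_task": {
    "notebook_path": "<databricks-workspace-path>"
  },
  "max_concurrent_runs": 1
}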

Databricks MLflow Experiment

An MLflow experiment can be created through the Databricks workspace portal or with the following CLI commands:

export MLFLOW_TRACKING_URI=databricks
export DATABRICKS_HOST=<databricks host>
export DATABRICKS_TOKEN=<databricks token>
mlflow experiments create --experiment-name /<path in databricks workspace>/<experiment name>

Refer to the Databricks CLI reference for how to obtain DATABRICKS_HOST and DATABRICKS_TOKEN.
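
Once the experiment exists, the orchestrator code can log runs against it. A minimal Python sketch, assuming the same experiment path used above (the parameter name is illustrative):

import mlflow

# Point MLflow at the experiment created above; the path must match exactly.
mlflow.set_experiment("/<path in databricks workspace>/<experiment name>")

with mlflow.start_run():
    mlflow.log_param("example_param", 1)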

Databricks DBFS Upload

The following CLI command can be used to upload the wheel package to Databricks DBFS.

databricks fs cp --overwrite python-package.whl <dbfs-path>
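
If the package should also be attached to the cluster as a library, one option is the legacy CLI libraries command; this is a sketch, assuming the wheel is consumed as a cluster library (cluster id and path are placeholders):

databricks libraries install --cluster-id <cluster-id> --whl dbfs:/<dbfs-path>/python-package.whl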

Databricks Notebook Import

The following CLI command can be used to import the orchestrator Python file into the Databricks workspace as a notebook.

databricks workspace import -l PYTHON -f SOURCE -o <orchestrator-notebook-python-file>.py <databricks-workspace-path>
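
The import can be verified by listing the target workspace path:

databricks workspace ls <databricks-workspace-path>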

Orchestrator Databricks Job Trigger

The Orchestrator Databricks job can be triggered in the following ways:

  • Scheduled:
    • Cron-based scheduling configured on the job.
  • Manual:
    • On-demand runs from the Jobs UI or via the CLI (see the sketches below).
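
A manual run can be started from the CLI using the job id returned by the create command above:

databricks jobs run-now --job-id <job-id of existing job>

Cron-based scheduling is configured through a schedule block in the job template (Jobs API 2.0); the cron expression and timezone below are illustrative:

"schedule": {
  "quartz_cron_expression": "0 0 6 * * ?",
  "timezone_id": "UTC"
}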