In short, the purpose of this project is to demonstrate how Plotly's Dash can be used in tandem with the Databricks SDK. More specifically, we kick off and parameterize a Databricks notebook from a Dash application, whether it runs locally, on Heroku, or on Dash Enterprise.
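At its core, the app uses the Databricks SDK for Python to submit a one-time notebook run against an existing cluster. A minimal sketch of the pattern is below; the notebook path, parameter names, and run name are illustrative placeholders rather than the repository's actual values:

```python
import os

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

# Credentials are resolved from ~/.databrickscfg or from the
# DATABRICKS_HOST / DATABRICKS_TOKEN environment variables.
w = WorkspaceClient()

run = w.jobs.submit(
    run_name="dash-triggered-run",
    tasks=[
        jobs.SubmitTask(
            task_key="run_notebook",
            existing_cluster_id=os.environ["DATABRICKS_CLUSTER_ID"],
            notebook_task=jobs.NotebookTask(
                # Placeholder path: point this at the notebook you upload below.
                notebook_path="/Users/<you>/Databricks-SDK-Dash-Jobs-Notebook",
                # base_parameters surface in the notebook via dbutils.widgets.
                base_parameters={"example_param": "example_value"},
            ),
        )
    ],
).result()  # block until the run finishes

print(run.state)
```

In the Dash app itself, a callback would trigger a submission like this when the user clicks a button, passing values gathered from Dash inputs as `base_parameters`.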
For more background on our joint story, please visit our Databricks + Plotly Dash Partner page.
To get started, follow the instructions below.
- Ensure that you have permissions on Databricks to kick off Jobs via the Jobs API. Check with your Databricks workspace administrator if you do not, or if any commands in this project fail unexpectedly.
- Upload the `Databricks-SDK-Dash-Jobs-Notebook.ipynb` file to Databricks by clicking (+ New) -> Notebook (screenshot below).
- Run the notebook, attaching it to the cluster that you would like to use. Importantly, note the Databricks cluster's ID: you will use it in your Dash app's `.env` file locally (a programmatic way to look it up is sketched after this list).
- Use `git clone [email protected]:plotly/Dash-Databricks-SDK-Article.git` to clone this repository to your local filesystem.
- Ensure that you have a `.databrickscfg` file that contains your Databricks domain and personal access token (PAT). By default, it should be located in your base directory, i.e. `~/.databrickscfg`. The file's structure should resemble the example provided in this repository (also sketched after this list), with your Databricks host name and personal access token filled in.
- `cd` into your project directory (called `Dash-Databricks-SDK-Article` by default).
- Remove `.databrickscfg` from your project's directory, provided you already have it at your base directory (see the earlier step).
- In the `.env` file, copy-paste your cluster's ID into the `DATABRICKS_CLUSTER_ID` field (an example entry follows this list).
- Modify `constants.py` as needed. Mainly, you may choose to rename the Databricks notebook provided in this project; if so, reflect those changes by modifying the `notebook_name` variable (a sketch follows this list).
- Run `pip install -r requirements.txt`.
- Run `python app.py` to get started!
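As referenced in the cluster step above, if you would rather look the cluster ID up programmatically than copy it from the Databricks UI, the Databricks SDK can list the clusters in your workspace. A minimal sketch, assuming your credentials are already configured via `.databrickscfg` or environment variables:

```python
from databricks.sdk import WorkspaceClient

# WorkspaceClient() resolves credentials from ~/.databrickscfg
# (or from DATABRICKS_HOST / DATABRICKS_TOKEN environment variables).
w = WorkspaceClient()

# Print each cluster's name next to the ID that goes into .env.
for cluster in w.clusters.list():
    print(cluster.cluster_name, cluster.cluster_id)
```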
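For reference, a `.databrickscfg` file generally looks like the snippet below; the host and token values are placeholders to be replaced with your own:

```ini
[DEFAULT]
host  = https://<your-workspace>.cloud.databricks.com
token = <your-personal-access-token>
```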
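Likewise, the `.env` entry mentioned above would resemble the following, with a placeholder value standing in for your cluster's ID:

```
DATABRICKS_CLUSTER_ID=<your-cluster-id>
```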
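Finally, the `notebook_name` variable in `constants.py` is where a renamed notebook gets reflected. A hypothetical sketch, not the file's exact contents:

```python
# constants.py (sketch; the repository's actual file may differ)
# Name of the notebook uploaded in the earlier step.
notebook_name = "Databricks-SDK-Dash-Jobs-Notebook"
```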