A formatting tool for your Databricks notebooks.
Install:
$ pip install blackbricks
You probably also want to install the databricks-cli, in order to use blackbricks directly on your notebooks.
$ pip install databricks-cli
$ databricks configure # Required in order to use `blackbricks` on remote notebooks.
You can use blackbricks on Python notebook files stored locally, or directly on the notebooks stored in Databricks.
For the most part, blackbricks operates very similarly to black.
$ blackbricks notebook1.py notebook2.py # Formats both notebooks.
$ blackbricks notebook_directory/ # Formats every notebook under the directory (recursively).
An important difference is that blackbricks will ignore any file that does not contain the `# Databricks notebook source` header on the first line. Databricks adds this line to all Python notebooks. This means you can happily run blackbricks on a directory with both notebooks and regular Python files, and blackbricks won't touch the latter.
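To make the header rule concrete, here is a minimal sketch of such a check. The helper name `looks_like_databricks_notebook` is hypothetical and the exact comparison is an assumption; this is not blackbricks' own code, only an illustration of the behaviour described above.

```python
from pathlib import Path

# Header that Databricks puts on the first line of Python notebooks.
NOTEBOOK_HEADER = "# Databricks notebook source"


def looks_like_databricks_notebook(path: Path) -> bool:
    """Return True if the first line of the file is the notebook header.

    Hypothetical helper for illustration; unreadable or missing files
    are simply treated as "not a notebook".
    """
    try:
        with path.open(encoding="utf-8") as f:
            first_line = f.readline()
    except (OSError, UnicodeDecodeError):
        return False
    # Assumption: ignore trailing whitespace and the line ending.
    return first_line.rstrip() == NOTEBOOK_HEADER
```

A check like this can be handy for previewing which files in a mixed directory would be picked up before running the formatter.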
If you specify the `-r` or `--remote` flag, blackbricks will work directly on your notebooks stored in Databricks.
$ blackbricks --remote /Users/username/notebook.py
When working on remote files, you cannot add whole directories.
$ blackbricks --help
Usage: blackbricks [OPTIONS] [FILENAMES]...
Formatting tool for Databricks python notebooks.
Python cells are formatted using `black`, and SQL cells are formatted by
`sqlparse`.
Local files (without the `--remote` option):
- Only files that look like Databricks (Python) notebooks will be
processed. That is, they must start with the header `# Databricks
notebook source`
- If you specify a directory as one of the file names, all files in that
directory will be added, including any subdirectory.
Remote files (with the `--remote` option):
- Make sure you have installed the Databricks CLI (``pip install
databricks_cli``)
- Make sure you have configured at least one profile (`databricks
configure`). Check the file `~/.databrickscfg` if you are not sure.
- File paths should start with `/`. Otherwise they are interpreted as
relative to `/Users/username`, where `username` is the username
specified in the Databricks profile used.
Arguments:
[FILENAMES]... Path to the notebook(s) to format.
Options:
-r, --remote If this option is used, all filenames are
treated as paths to notebooks on your
Databricks host (i.e. not local files).
[default: False]
-p, --profile NAME If using --remote, which Databricks profile
to use. [default: DEFAULT]
--line-length INTEGER How many characters per line to allow.
[default: 88]
--sql-upper / --no-sql-upper SQL keywords should be UPPERCASE or
lowercase. [default: True]
--indent-with-two-spaces / --no-indent-with-two-spaces
Use two spaces for indentation in Python
cells instead of Black's default of four.
Databricks uses two spaces. [default: True]
--check Don't write the files back, just return the
status. Return code 0 means nothing would
change.
--diff Don't write the files back, just output a
diff for each file on stdout.
--version Display version information and exit.
--help Show this message and exit.
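Since `--check` reports its result through the return code, it can be wrapped in a script or CI step. The sketch below assumes blackbricks is installed and on `PATH`; the helper names `build_check_command` and `notebooks_are_formatted` are hypothetical, and only flags shown in the help text above are used.

```python
import subprocess
from typing import List, Sequence


def build_check_command(paths: Sequence[str], line_length: int = 88) -> List[str]:
    """Build a `blackbricks --check` command line (hypothetical helper)."""
    return ["blackbricks", "--check", f"--line-length={line_length}", *paths]


def notebooks_are_formatted(paths: Sequence[str]) -> bool:
    """Run the check; per the help text, return code 0 means nothing would change."""
    result = subprocess.run(build_check_command(paths))
    return result.returncode == 0
```

For example, a CI job could call `notebooks_are_formatted(["notebook_directory/"])` and fail the build when it returns `False`.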
Use pre-commit. Add a `.pre-commit-config.yaml` file to your repo with the following content (changing or removing the `args` as you wish):
repos:
  - repo: https://github.com/bsamseth/blackbricks
    rev: 0.6.0
    hooks:
      - id: blackbricks
        args: [--line-length=120, --indent-with-two-spaces]
Set the `rev` attribute to the most recent version of blackbricks.
The `args` are optional and can be used to set any of blackbricks' options.
If you find blackbricks useful, feel free to say so with a star. If you think it is utterly broken, you are more than welcome to contribute improvements. Please open an issue first to discuss what you want added or fixed, unless you are just adding tests; in that case your pull request is extremely likely to be merged right away.
Sure! Certain SQL statements might not be parsed and indented properly by sqlparse, and the result can be jumbled formatting. You can disable SQL formatting for a cell by adding `-- nofmt` to the very first line of the cell:
%sql -- nofmt
select this,
sql_will, -- be kept just
like_this
from if_that_is.what_you_need
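As a rough illustration of the opt-out rule, the sketch below checks whether the first line of a cell carries the marker. The helper name `sql_formatting_disabled` is hypothetical, and the simple substring test is an assumption; how blackbricks detects the marker internally may differ.

```python
NOFMT_MARKER = "-- nofmt"


def sql_formatting_disabled(cell: str) -> bool:
    """Return True if the first line of the cell contains the nofmt marker.

    Hypothetical helper; only the very first line is inspected, so a
    marker further down in the cell has no effect.
    """
    first_line = cell.split("\n", 1)[0]
    return NOFMT_MARKER in first_line
```

With this sketch, the cell shown above would be reported as opted out, while a cell with `-- nofmt` on any later line would still be formatted.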
First, make sure you have set up databricks-cli on your system (see installation), and that you have at least one profile set up in `~/.databrickscfg`. As an example:
# File: ~/.databrickscfg
[DEFAULT]
host = https://dbc-b23456-a1243.cloud.databricks.com/
username = [email protected]
password = dapi12345678901234567890
[OTHERPROFILE]
host = https://dbc-c54321-d234.cloud.databricks.com
username = [email protected]
password = dapi09876543211234567890
You should use access tokens instead of your actual password.
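If you are unsure which profiles exist, the file is INI-style and can be inspected with Python's standard `configparser`. This is only a sketch: the helper name `list_profile_hosts` and the placeholder hosts and tokens are made up for illustration, and it is not a description of how databricks-cli itself reads the file.

```python
import configparser
from typing import Dict, Optional

# Placeholder content in the same shape as ~/.databrickscfg (all values made up).
EXAMPLE_CFG = """
[DEFAULT]
host = https://default.example.invalid
username = alice
password = dapi-placeholder-token

[OTHERPROFILE]
host = https://other.example.invalid
username = bob
password = dapi-another-placeholder
"""


def list_profile_hosts(cfg_text: str) -> Dict[str, Optional[str]]:
    """Map each profile name to its host (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    hosts: Dict[str, Optional[str]] = {}
    # configparser treats [DEFAULT] specially: it is not listed by sections().
    if parser.defaults():
        hosts["DEFAULT"] = parser.defaults().get("host")
    for name in parser.sections():
        hosts[name] = parser[name].get("host")
    return hosts
```

To inspect a real file, pass in `Path("~/.databrickscfg").expanduser().read_text()` (with `from pathlib import Path`) instead of `EXAMPLE_CFG`.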
You can then do:
$ blackbricks --remote /Users/[email protected]/notebook.py # Uses DEFAULT profile.
$ blackbricks --remote notebook.py # Equivalent to the above.
$ blackbricks --remote --profile OTHERPROFILE /Users/[email protected]/notebook.py
$ blackbricks --remote --profile OTHERPROFILE notebook.py # Equivalent to the above.
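The equivalences above follow from the path rule in the help text: paths starting with `/` are used as-is, and anything else is taken relative to `/Users/username`. A minimal sketch of that rule, with a hypothetical helper name and a made-up username:

```python
import posixpath


def resolve_remote_path(path: str, username: str) -> str:
    """Resolve a notebook path the way the help text describes.

    Hypothetical helper: absolute paths are returned unchanged,
    relative ones are placed under /Users/<username>.
    """
    if path.startswith("/"):
        return path
    return posixpath.join("/Users", username, path)


# "alice" is a made-up username standing in for the one in the profile.
print(resolve_remote_path("notebook.py", "alice"))
print(resolve_remote_path("/Users/alice/notebook.py", "alice"))
```

Both calls produce the same path, which is why the relative and absolute forms are interchangeable for a given profile.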
This means you had an old version of click installed from before, and your installation didn't upgrade it automatically. Updating your installation should do the trick, e.g. `pip install -U blackbricks` or similar, depending on your installation method of choice.