DaveRuijter/blackbricks

 
 

Blackbricks

A formatting tool for your Databricks notebooks.

  • Python cells are formatted with black
  • SQL cells are formatted with sqlparse

Installation

Install:

$ pip install blackbricks

You will probably also want to install databricks-cli, so that you can use blackbricks directly on your notebooks.

$ pip install databricks-cli
$ databricks configure  # Required in order to use `blackbricks` on remote notebooks.

Usage

You can use blackbricks on Python notebook files stored locally, or directly on the notebooks stored in Databricks.

For the most part, blackbricks operates very similarly to black.

$ blackbricks notebook1.py notebook2.py  # Formats both notebooks.
$ blackbricks notebook_directory/  # Formats every notebook under the directory (recursively).

An important difference is that blackbricks will ignore any file that does not contain the # Databricks notebook source header on the first line. Databricks adds this line to all Python notebooks. This means you can happily run blackbricks on a directory with both notebooks and regular Python files, and blackbricks won't touch the latter.
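The header check described above can be sketched in a few lines of Python. This is an illustration only, not blackbricks' actual implementation; the names `NOTEBOOK_HEADER`, `looks_like_notebook`, and `find_notebooks` are hypothetical.

```python
from pathlib import Path
from typing import List

# Header that Databricks writes on the first line of Python notebooks.
NOTEBOOK_HEADER = "# Databricks notebook source"


def looks_like_notebook(path: Path) -> bool:
    """Return True if the first line of the file starts with the notebook header."""
    try:
        with path.open(encoding="utf-8") as handle:
            return handle.readline().startswith(NOTEBOOK_HEADER)
    except (OSError, UnicodeDecodeError):
        # Unreadable or non-text files are treated as "not a notebook".
        return False


def find_notebooks(root: Path) -> List[Path]:
    """Recursively collect the files under root that look like notebooks."""
    return sorted(
        p for p in root.rglob("*") if p.is_file() and looks_like_notebook(p)
    )
```

With a check like this, regular Python files in the same directory tree are simply skipped.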

If you specify the -r or --remote flag, blackbricks will work directly on your notebooks stored in Databricks.

$ blackbricks --remote /Users/username/notebook.py

When working on remote files, you cannot add whole directories.

Full usage

$ blackbricks --help
Usage: blackbricks [OPTIONS] [FILENAMES]...

  Formatting tool for Databricks python notebooks.

  Python cells are formatted using `black`, and SQL cells are formatted by
  `sqlparse`.

  Local files (without the `--remote` option):

    - Only files that look like Databricks (Python) notebooks will be
    processed. That is, they must start with the header `# Databricks
    notebook source`

    - If you specify a directory as one of the file names, all files in that
    directory will be added, including any subdirectory.

  Remote files (with the `--remote` option):

    - Make sure you have installed the Databricks CLI (``pip install
    databricks_cli``)

    - Make sure you have configured at least one profile (`databricks
    configure`). Check the file `~/.databrickscfg` if you are not sure.

    - File paths should start with `/`. Otherwise they are interpreted as
    relative to `/Users/username`, where `username` is the username
    specified in the Databricks profile used.

Arguments:
  [FILENAMES]...  Path to the notebook(s) to format.

Options:
  -r, --remote                    If this option is used, all filenames are
                                  treated as paths to notebooks on your
                                  Databricks host (i.e. not local files).
                                  [default: False]

  -p, --profile NAME              If using --remote, which Databricks profile
                                  to use.  [default: DEFAULT]

  --line-length INTEGER           How many characters per line to allow.
                                  [default: 88]

  --sql-upper / --no-sql-upper    SQL keywords should be UPPERCASE or
                                  lowercase.  [default: True]

  --indent-with-two-spaces / --no-indent-with-two-spaces
                                  Use two spaces for indentation in Python
                                  cells instead of Black's default of four.
                                  Databricks uses two spaces.  [default: True]

  --check                         Don't write the files back, just return the
                                  status. Return code 0 means nothing would
                                  change.

  --diff                          Don't write the files back, just output a
                                  diff for each file on stdout.

  --version                       Display version information and exit.
  --help                          Show this message and exit.
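Because `--check` reports its result through the return code, it can be wired into a CI step or script. The sketch below assumes `blackbricks` is installed and on your `PATH`. The helper names are hypothetical, and only flags listed in the help text above are used.

```python
import subprocess
from typing import List, Sequence


def build_check_command(paths: Sequence[str], line_length: int = 88) -> List[str]:
    """Build the argv for a dry run that does not write any files back."""
    return ["blackbricks", "--check", f"--line-length={line_length}", *paths]


def notebooks_are_formatted(paths: Sequence[str], line_length: int = 88) -> bool:
    """Run `blackbricks --check`; return code 0 means nothing would change."""
    result = subprocess.run(build_check_command(paths, line_length), check=False)
    return result.returncode == 0
```

A CI script could call `notebooks_are_formatted(["notebook_directory/"])` and fail the build when it returns `False`. A non-zero return code does not by itself say whether files would change or something else went wrong, so inspect the command's output in that case.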

Version control integration

Use pre-commit. Add a .pre-commit-config.yaml file to your repo with the following content (changing/removing the args as you wish):

repos:
-   repo: https://github.com/bsamseth/blackbricks
    rev: 0.6.0
    hooks:
    - id: blackbricks
      args: [--line-length=120, --indent-with-two-spaces]

Set the rev attribute to the most recent version of blackbricks. The args are optional and can be used to set any of blackbricks' options.

Contributing

If you find blackbricks useful, feel free to say so with a star. If you think it is utterly broken, you are more than welcome to contribute improvements. Please open an issue first to discuss what you want added or fixed, unless you are just adding tests, in which case your pull request is extremely likely to be merged right away.

FAQ

Can I disable SQL formatting?

Sure! Certain SQL statements might not be parsed and indented properly by sqlparse, and the resulting formatting can be jumbled. You can disable SQL formatting for a cell by adding -- nofmt to the very first line of the cell:

%sql  -- nofmt
select this,
             sql_will,   -- be kept just
         like_this
  from if_that_is.what_you_need
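As a rough illustration of the rule above, the opt-out amounts to looking for the marker on the first line of a `%sql` cell. The following is a minimal sketch under that assumption, not the tool's actual parsing logic; `NOFMT_MARKER` and `sql_formatting_disabled` are hypothetical names.

```python
# Marker that disables SQL formatting when placed on a cell's first line.
NOFMT_MARKER = "-- nofmt"


def sql_formatting_disabled(cell: str) -> bool:
    """Return True if the very first line of a %sql cell carries the marker."""
    lines = cell.splitlines()
    first_line = lines[0] if lines else ""
    return first_line.lstrip().startswith("%sql") and NOFMT_MARKER in first_line
```

Note that the marker only counts on the first line; a `-- nofmt` comment further down in the cell would not match in this sketch.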

How do I use blackbricks on my Databricks notebooks?

First, make sure you have set up databricks-cli on your system (see installation), and that you have at least one profile set up in ~/.databrickscfg. As an example:

# File: ~/.databrickscfg

[DEFAULT]
host = https://dbc-b23456-a1243.cloud.databricks.com/
username = username@example.com
password = dapi12345678901234567890

[OTHERPROFILE]
host = https://dbc-c54321-d234.cloud.databricks.com
username = username@example.com
password = dapi09876543211234567890

You should use access tokens instead of your actual password.

You can then do:

$ blackbricks --remote /Users/username@example.com/notebook.py  # Uses DEFAULT profile.
$ blackbricks --remote notebook.py  # Equivalent to the above.
$ blackbricks --remote --profile OTHERPROFILE /Users/username@example.com/notebook.py
$ blackbricks --remote --profile OTHERPROFILE notebook.py  # Equivalent to the above.
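The equivalence between the absolute and relative forms follows from the rule in the help text: a path that does not start with `/` is interpreted relative to `/Users/username`, where the username comes from the selected profile. The sketch below illustrates that rule with Python's standard `configparser`. It is an assumption about how such a lookup could work rather than a description of how blackbricks or databricks-cli read the file; the helper names and the `username@example.com` placeholder are hypothetical.

```python
import configparser
import posixpath

# Placeholder profile data; passwords/tokens omitted on purpose.
EXAMPLE_CONFIG = """
[DEFAULT]
host = https://dbc-b23456-a1243.cloud.databricks.com/
username = username@example.com

[OTHERPROFILE]
host = https://dbc-c54321-d234.cloud.databricks.com
username = username@example.com
"""


def username_for_profile(config_text: str, profile: str = "DEFAULT") -> str:
    """Look up the username stored under the given profile section."""
    # interpolation=None keeps '%' characters in values literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    return parser[profile]["username"]


def resolve_remote_path(path: str, username: str) -> str:
    """Absolute paths are kept; others are placed under /Users/<username>."""
    if path.startswith("/"):
        return path
    return posixpath.join("/Users", username, path)
```

For example, `resolve_remote_path("notebook.py", username_for_profile(EXAMPLE_CONFIG))` yields the same path as the absolute form shown in the commands above.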

I get an error: TypeError: __init__() got an unexpected keyword argument 'no_args_is_help'

This means an old version of click was already installed and was not upgraded automatically when you installed blackbricks. Updating your installation should do the trick, e.g. pip install -U blackbricks, or the equivalent for your installation method of choice.
