Stability: Experimental
The Databricks Utilities for Scala library includes functionality to accelerate development with Scala for the Databricks Lakehouse.
The Databricks Utilities for Scala library is implemented mostly on top of the core of the Databricks SDK for Java. Consult that repository's README for information on authentication, logging, and how to make requests directly to the Databricks REST API.
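As a minimal illustration (the Java SDK's README is the authoritative reference), the SDK's unified authentication can pick up credentials from environment variables such as `DATABRICKS_HOST` and `DATABRICKS_TOKEN`:

```shell
# One of several authentication methods supported by the Databricks SDK's
# unified authentication; the values below are placeholders.
export DATABRICKS_HOST="https://my-workspace.cloud.databricks.com"
export DATABRICKS_TOKEN="dapi-example-token"
```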
- Getting started
- Migrating to DBUtils
- Limitations when running outside of Databricks Runtime
- Interface stability
- Contributing
- Disclaimer
## Getting started

You can install Databricks Utilities for Scala by adding the following to your `pom.xml`:

```xml
<dependency>
  <groupId>com.databricks</groupId>
  <artifactId>databricks-sdk-dbutils</artifactId>
  <version>0.1.4</version>
</dependency>
```
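If you build with sbt instead of Maven, the equivalent dependency (assuming the same coordinates as the `pom.xml` snippet above) would look like:

```scala
// build.sbt — same coordinates as the Maven snippet above; pin the exact version
libraryDependencies += "com.databricks" % "databricks-sdk-dbutils" % "0.1.4"
```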
Get an instance of `DBUtils` by calling `DBUtils.getDBUtils()`:

```scala
import com.databricks.sdk.scala.dbutils.DBUtils

object App {
  def main(args: Array[String]): Unit = {
    val dbutils = DBUtils.getDBUtils()
    dbutils.fs.head("/Volumes/mycatalog/myschema/myvolume/file.txt")
  }
}
```
This code is portable: it can run both within Databricks Runtime and in applications outside of Databricks Runtime. When run in Databricks Runtime, the returned `DBUtils` instance proxies all function calls to the `DBUtils` instance provided by Databricks Runtime. When run outside of Databricks Runtime, `DBUtils` uses the REST API to emulate the behavior of `DBUtils` within Databricks Runtime, providing a consistent interface for building applications that can run in either environment.
## Migrating to DBUtils

In Databricks notebooks, `DBUtils` is provided as a built-in and is automatically available to users. To make notebook code using `DBUtils` portable with this library, add the following code in your notebook:
```scala
import com.databricks.sdk.scala.dbutils.DBUtils

val dbutils = DBUtils.getDBUtils()
```

If you have imported any types from `DBUtils`, change the package of those types to `com.databricks.sdk.scala.dbutils`.
In DBConnect version 1, the `DBUtils` interface was exposed as `com.databricks.service.DBUtils`. Add the following code to your application:

```scala
import com.databricks.sdk.scala.dbutils.DBUtils

val dbutils = DBUtils.getDBUtils()
```

and replace usages of `DBUtils` with `dbutils`. Additionally, if you have imported any types from `com.databricks.service`, replace those imports with `com.databricks.sdk.scala.dbutils`.
## Limitations when running outside of Databricks Runtime

The `DBUtils` interface provides many convenient utilities for interacting with Databricks APIs, notebooks, and Databricks Runtime. When run outside of Databricks Runtime, some of these utilities are less useful. The version of `DBUtils` returned by `DBUtils.getDBUtils()` has the following limitations in that case:
- Only the `fs` and `secrets` components of `DBUtils` are supported. Other fields will throw an exception if accessed.
- Within `fs`, the mounting methods (`mount`, `updateMount`, `refreshMounts`, `mounts`, and `unmount`) are not implemented and will throw an exception if called.
- Within `fs`, the caching methods (`cacheTable`, `cacheFiles`, `uncacheTable`, and `uncacheFiles`) are not implemented and will throw an exception if called.
- `help()` methods are not implemented.
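The contract above can be sketched in plain Scala. This is a hypothetical illustration, not the library's real types: supported calls succeed, while unsupported utilities throw when used outside Databricks Runtime.

```scala
// Hypothetical sketch of the "supported vs. unsupported outside the Runtime"
// behavior described above. FsLike and EmulatedFs are illustration-only names.
import scala.util.Try

trait FsLike {
  def head(path: String): String
  def mount(source: String, mountPoint: String): Boolean
}

// Emulated implementation standing in for the REST-API-backed version:
// reads are supported, mounting is not.
class EmulatedFs extends FsLike {
  def head(path: String): String = s"contents of $path" // placeholder result
  def mount(source: String, mountPoint: String): Boolean =
    throw new UnsupportedOperationException(
      "mount is not supported outside Databricks Runtime")
}

val fs: FsLike = new EmulatedFs
val ok = Try(fs.head("/Volumes/mycatalog/myschema/myvolume/file.txt"))
val failed = Try(fs.mount("s3a://bucket-name", "/mnt/mount-point"))
```

Application code that must run in both environments can guard the unsupported calls in this way, for example with `scala.util.Try`.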
## Interface stability

During the Experimental period, Databricks is actively working on stabilizing the interfaces of Databricks Utilities for Scala. You are highly encouraged to pin the exact dependency version and to read the changelog, where Databricks documents changes. Databricks may make minor, documented, backward-incompatible changes, such as renaming methods or type names, to bring more consistency.
## Contributing

This section contains the guidelines for adding a change to the repository.

- Create a PR with the change.
- Make sure the changes are unit tested.
- Make sure the changes have been tested end to end. See the section below for manual end-to-end testing.

Testing changes end to end is not straightforward, since we don't yet have dedicated infrastructure for this repository. Follow the steps below to manually test a change end to end.
- Build and upload the local jar to Databricks Volumes. This will be used later on to install the library on the cluster.
  - Make sure the changes are in the local branch you will build the jar from.
  - From the repository root, run:

    ```
    $ mvn package
    ```

  - The jars will be built under `databricks-dbutils-scala/target`.
  - Identify the jar to upload to UC Volumes. When building on 0.1.4 with Scala 2.12, this will be `databricks-dbutils-scala_2.12-0.1.4.jar` in most cases.
- Upload the jar to Volumes:
  - Open the Databricks console.
  - Go to the volume you would like to upload to and click `Upload to this volume`.
  - Select the jar mentioned above in step 1.4 and upload it.
- Add an instance profile if needed, for example when interacting with S3:
  - On the Databricks console, click on the user icon and go to `Settings` -> `Security` -> `Manage`.
  - Click on `Add instance profile`.
  - Add the instance profile you need.
- Create a cluster and install the library:
  - On the Databricks console, go to `Compute` -> `Create Compute`.
  - Attach the instance profile (step 3.3).
  - Install the library from UC Volumes (step 2.3).
- Create a notebook with the code to test the end-to-end flow:
  - On the Databricks console, create a notebook by clicking `New` -> `Notebook`.
  - Write the code to test the end-to-end flow, for example:

    ```scala
    import com.databricks.sdk.scala.dbutils.DBUtils

    DBUtils.getDBUtils().fs.mount("s3a://bucket-name", "/mnt/mount-point")
    ```

  - Connect the cluster (step 4.3) to the notebook and run it.
## Disclaimer

- The product is in preview and not intended to be used in production;
- The product may change or may never be released;
- While we will not charge separately for this product right now, we may charge for it in the future. You will still incur charges for DBUs.
- There's no formal support or SLAs for the preview - so please reach out to your account or other contact with any questions or feedback; and
- We may terminate the preview or your access at any time.