- Add config for generating unique tmp table names for enabling parallel merge (thanks @huangxingyi-git!) (854)
- Add support for serverless job clusters on python models (706)
- Add 'user_folder_for_python' behavior to switch writing python model notebooks to the user's folder (835)
- Merge capabilities are extended (739) to include the support for the following features (thanks @mi-volodin); a config sketch follows the list:
  - `with schema evolution` clause (requires Databricks Runtime 15.2 or above);
  - `when not matched by source` clause, only for `delete` action;
  - `matched`, `not matched` and `not matched by source` condition clauses;
  - custom aliases for source and target tables can be specified and used in condition clauses;
  - `matched` and `not matched` steps can now be skipped;
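  A minimal, hypothetical sketch of how these options might be set, using the Python model config form shown elsewhere in this changelog; the option names below are assumptions inferred from this entry and are not confirmed here:

  ```python
  # Hypothetical sketch only; config key names are assumptions, not confirmed here.
  def model(dbt, session):
      dbt.config(
          materialized="incremental",
          incremental_strategy="merge",
          unique_key="id",
          merge_with_schema_evolution=True,       # assumed: WITH SCHEMA EVOLUTION (DBR 15.2+)
          source_alias="s",                       # assumed: custom source alias for conditions
          target_alias="t",                       # assumed: custom target alias for conditions
          matched_condition="t.updated_at < s.updated_at",  # assumed: extra MATCHED condition
          not_matched_by_source_action="delete",  # assumed: delete rows missing from the source
      )
      return dbt.ref("upstream_events")
  ```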
- Allow for the use of custom constraints, using the `custom` constraint type with an `expression` as the constraint (thanks @roydobbe). (792)
- Add "use_info_schema_for_columns" behavior flag to turn on use of information_schema to get column info where possible. This may have more latency but will not truncate complex data types the way that 'describe' can. (808)
- Add support for `table_format: iceberg`. This uses UniForm under the hood to provide iceberg compatibility for tables or incrementals. (815)
- Add `include_full_name_in_path` config boolean for external locations. This writes tables to {location_root}/{catalog}/{schema}/{table} (823)
- Add a new `workflow_job` submission method for python, which creates a long-lived Databricks Workflow instead of a one-time run (thanks @kdazzle!) (762); see the sketch below
- Allow for additional options to be passed to the Databricks Job API when using other python submission methods. For example, enable email_notifications (thanks @kdazzle!) (762)
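  A minimal, hypothetical sketch of the `workflow_job` submission method from the entry above, combined with an extra Jobs API option; `submission_method` is named by these entries, while the `workflow_job_config` key, its shape, and the email address are assumptions:

  ```python
  # Hypothetical sketch only; `workflow_job_config` and its contents are assumptions.
  def model(dbt, session):
      dbt.config(
          materialized="table",
          submission_method="workflow_job",  # run as a long-lived Databricks Workflow
          workflow_job_config={
              "email_notifications": {"on_failure": ["data-team@example.com"]},
          },
      )
      return dbt.ref("upstream_events")
  ```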
- Support microbatch incremental strategy using replace_where (825)
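  A minimal, hypothetical sketch of a microbatch incremental model; `event_time`, `begin`, and `batch_size` are standard dbt microbatch configs, the column and date values are placeholders, and the Python model form is used only for consistency with other examples in this changelog:

  ```python
  # Hypothetical sketch only; column name and dates are placeholders.
  def model(dbt, session):
      dbt.config(
          materialized="incremental",
          incremental_strategy="microbatch",  # batches applied via replace_where
          event_time="event_ts",              # column used to slice batches
          begin="2024-01-01",                 # earliest data to backfill
          batch_size="day",                   # one batch per day
      )
      return dbt.ref("events")
  ```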
- Replace array indexing with 'get' in split_part so as not to raise exception when indexing beyond bounds (839)
- Set queue enabled for Python notebook jobs (856)
- Significant refactoring and increased testing of python_submissions (830)
- Fix places where we were not properly closing cursors, and other test warnings (713)
- Drop support for Python 3.8 (713)
- Upgrade databricks-sql-connector dependency to 3.5.0 (833)
- Prepare for python typing deprecations (837)
- Fix behavior flag use in init of DatabricksAdapter (thanks @VersusFacit!) (836)
- Restrict pydantic to V1 per dbt Labs' request (843)
- Switching to Ruff for formatting and linting (847)
- Add config for generating unique tmp table names for enabling parallel replace-where (thanks @huangxingyi-git!) (811)
- Stop setting cluster by to None. If you want to drop liquid clustering, you will need to full-refresh (806)
- Don't define table properties on snapshot staging views (thanks @jelmerk!) (820)
- Persist table comments for incremental models, snapshots and dbt clone (thanks @henlue!) (750)
- Add relation identifier (i.e. table name) in auto generated constraint names, also adding the statement of table list for foreign keys (thanks @elca-anh!) (774)
- Update tblproperties on incremental runs. Note: only adds/edits. Deletes are too risky/complex for now (765)
- Update default scope/redirect URL for OAuth U2M, so that with the default OAuth app users can run python models (776)
- Fix foreign key constraints by switching from `parent` to `to` and `parent_columns` to `to_columns` (789)
- Now handles external shallow clones without blowing up (795)
- Alter column statements are now done before the alter table statement (thanks @frankivo!). (731)
- Always use lower case when gathering metadata (since objects are stored internally as lower case regardless of how we create them) (742)
- Persist table comments for python models (743)
- Stop cursor destructor warnings (744)
- Fix race condition on cluster creation (thanks @jurasan!) (751)
- Fix `dbt seed` command failing for a seed file when the columns for that seed file were partially defined in the properties file. (thanks @kass-artur!) (724)
- Add more tblproperties to be ignored with MV/ST (736)
- Readd the External relation type for compliance with adapter expectations (728)
- Fix missing catalog name in one of the metadata gathering calls (714)
- Undo the removal of spark.sql.sources.partitionOverwriteMode = DYNAMIC (688)
- Set spark.sql.sources.partitionOverwriteMode = STATIC on --full-refresh to ensure existing rows are removed (697)
- Migrate to using system.information_schema to fix issue with catalog renames (692)
- Cancel python jobs when dbt operation is canceled (thanks @gaoshihang for kicking this off!) (693)
- Fix the default redirect_url and scopes of the client `dbt-databricks` (704)
- Reduce severity of logging when expected 24 hour token expiration for Azure SPA (thanks @thijs-nijhuis!) (699)
- Migrate remaining unit tests off of unittest.TestCase (701)
- Support Liquid Clustering for python models (663)
- Update Liquid Clustering columns on is_incremental runs (686)
- Rerunning seed with external location + persist_doc now more resilient (662)
- Fix issue with running while a refresh is in progress with MV/ST (674)
- Fix issue with running a refresh with MV/ST that need names to be escaped (674)
- Delay loading of agate library to improve startup (thanks @dwreeves for getting this started!) (661)
- Updating to dbt-adapters~=1.2.0 (683)
- Support `on_config_change` for materialized views, expand the supported config options (536)
- Support `on_config_change` for streaming tables, expand the supported config options (569)
- Support Databricks tags for tables/views/incrementals (631)
- Upgrade databricks-sql-connector to 3.1.0 (593)
- Migrate to decoupled architecture (596)
- Finish migrating integration tests (623)
- Streamline the process of determining materialization types (655)
- Improve catalog performance by getting column description from project for UC (658)
- Fix the issue that 1.7.15 was intended to fix (conn not initialized exception) (671)
- Give sensible logs on connection errors (666)
- Auth headers should now evaluate at call time (648)
- User-configurable OAuth Scopes (currently limited to AWS) (thanks @stevenayers!) (641)
- Reduce default idle limit for connection reuse to 60s and start organizing event logging (648)
- Apply tblproperties to python models (using alter table) (633)
- Make OAuth redirect url configurable (thanks @johnsequeira-paradigm for the inspiration!) (635)
- Up default socket timeout to 10 minutes
- For HMS, ref all doc comments from dbt project due to poor performance retrieving them from Databricks (618)
- Fix a corner case for insert into where NULL should be DEFAULT (607)
- Fixed integration tests that were leaving behind schemas after running (613)
- Fix performance issue associated with persist docs by turning off incremental catalog generation (thanks @mikealfare!) (615)
- Pin protobuf to < 5 to stop incompatibility breaks (616)
- Fix for U2M flow on windows (sharding long passwords) (thanks @thijs-nijhuis-shell!) (597)
- Fix regression in incremental behavior, and align more with dbt-core expectations (604)
- Don't fail for unknown types when listing schema (600)
- Fixed the behavior of the incremental schema change ignore option to properly handle the scenario when columns are dropped (thanks @case-k-git!) (580)
- Fixed export of saved queries (thanks @peterallenwebb!) (588)
- Properly match against null for merging matching rows (590)
- Rollback databricks-sql-connector to 2.9.3 to actually fix connection timeout issue (578)
Skipped due to incorrect files in deployed package
- Pin databricks sdk to 0.17.0 to fix connection timeout issue (571)
- Added python model specific connection handling to prevent using invalid sessions (547)
- Allow schema to be specified in testing (thanks @case-k-git!) (538)
- Fix dbt incremental_strategy behavior by fixing schema table existing check (thanks @case-k-git!) (530)
- Fixed bug that was causing streaming tables to be dropped and recreated instead of refreshed. (552)
- Fixed Hive performance regression by streamlining materialization type acquisition (557)
- Fix: Python models authentication could be overridden by a `.netrc` file in the user's home directory (338)
- Fix: MV/ST REST api authentication could be overridden by a `.netrc` file in the user's home directory (555)
- Show details in connection errors (562)
- Updated connection debugging logging and setting connection last used time on session open (565)
- Adding retries around API calls in python model submission (549)
- Upgrade to databricks-sql-connector 3.0.0 (554)
- Pinning pandas to < 2.2.0 to keep from breaking multiple tests (564)
- Fix for issue where we were invoking create schema or not exists when the schema already exists (leading to permission issue) (529)
- Fix for issue where we never reused connections (517)
- Refactor macro tests to be more usable (524)
- Adding capability to specify compute on a per model basis (488)
- Selectively persist column docs that have changed between runs of incremental (513)
- Enabling access control list for job runs (thanks @srggrs!) (518)
- Allow persisting of column comments on views and retrieving comments for docs on Hive (519)
- Another attempt to improve catalog gathering performance (503)
- Added support for getting info only on specified relations to improve performance of gathering metadata (486), also (with generous help from @mikealfare) (499)
- Added support for getting freshness from metadata (481)
- Node info now gets added to SQLQuery event (thanks @davidharting!) (494)
- Compatibility with dbt-spark and dbt-core 1.7.1 (499)
- Added required adapter tests to ensure compatibility with 1.7.0 (487)
- Improved large seed performance by not casting every value (thanks @nrichards17!) (493). Note: for `file_format="parquet"` we still need to cast.
- Fixed a bug where setting a primary key constraint before a null constraint would fail by ensuring null constraints happen first (479)
- Foreign key constraints now work with dbt's constraint structure (479)
- Compatibility with dbt-spark 1.7.0rc1 (479)
- Optimize now runs after creating / updating liquid clustering tables (463)
- Fixing an issue where the new python library install from index behavior breaks users who were already customizing their installs (472)
- fix Pylance import errors (thanks @dataders) (471)
- When installing python libraries onto clusters, you can now specify an index_url (Thanks @casperdamen123) (367)
- Log job run information such as run_id when submitting Python jobs to databricks (Thanks @jeffrey-harrison) (#454)
- Node info now gets added to SQLQueryStatus (Thanks @colin-rogers-dbt) (453)
- Fixing python model compatibility with newer DBRs (459)
- Updated the Databricks SDK dependency so as to prevent reliance on an insecure version of requests (460)
- Update logic around submitting python jobs so that if the cluster is already starting, just wait for it to start rather than failing (461)
- Fixed an issue with AWS OAuth M2M flow (#445)
- Fixed an issue where every table in hive_metastore would get described (#446)
- Improved legibility of python stack traces (#434).
- Add `fetchmany`, resolves #408 (Thanks @NodeJSmith) (#409)
- Improved legibility of python stack traces (#434)
- Update our Databricks Workflow README to make clear that jobs clusters are not supported targets (#436)
- Relaxed the constraint on databricks-sql-connector to allow newer versions (#436)
- Streamlined sql connector output in dbt.log (#437)
- Switch to running integration tests with OAuth (#436)
- Follow up: re-implement fix for issue where the show tables extended command is limited to 2048 characters. (#326). Set `DBT_DESCRIBE_TABLE_2048_CHAR_BYPASS` to `true` to enable this behaviour.
- Add `liquid_clustered_by` config to enable Liquid Clustering for Delta-based dbt models (Thanks @ammarchalifah) (#398).
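  A minimal, hypothetical sketch of the `liquid_clustered_by` config, shown in the Python model config form used elsewhere in this changelog; the clustering column is a placeholder:

  ```python
  # Hypothetical sketch only; the clustering column is a placeholder.
  def model(dbt, session):
      dbt.config(
          materialized="table",
          liquid_clustered_by="event_date",
      )
      return dbt.ref("upstream_events")
  ```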
- Dropping the databricks_sql_endpoint test profile as not truly testing different behavior than databricks_uc_sql_endpoint profile (#417)
- Improve testing of python model support so that we can package the new config options in this release (#421)
- Revert change from #326 as it breaks DESCRIBE table in cases where the dbt API key does not have access to all tables in the schema
- Support for dbt-core==1.6
- Added support for materialized_view and streaming_table materializations
- Support dbt clone operation
- Support new dbt `limit` command-line flag
- Fix issue where the show tables extended command is limited to 2048 characters. (#326)
- Extend python model support to cover the same config options as SQL (#379)
- Drop support for Python 3.7
- Support for revamped `dbt debug`
- Fixed issue where starting a terminated cluster in the python path would never return
- Include log events from databricks-sql-connector in dbt logging output.
- Adapter now populates the `query_id` field in `run_results.json` with Query History API query ID.
- Added support for model contracts (#336)
- Pins dependencies to minor versions
- Sets default socket timeout to 180s
- Sets databricks sdk dependency to 0.1.6 to avoid SDK breaking changes
- Add explicit dependency to protobuf >4 to work around dbt-core issue
- Added support for OAuth (SSO and client credentials) (#327)
- Fix integration tests (#316)
- Updated dbt-spark from >=1.4.1 to >= 1.5.0 (#316)
- Throw an error if a model has an enforced contract. (#322)
- fix database not found error matching (#281)
- Auto start cluster for Python models (#306)
- Update databricks-sql-connector to 2.5.0 (#311)
- Adding replace_where incremental strategy (#293) (#310)
- [feat] Support ZORDER as a model config (#292) (#297)
- Added keyring>=23.13.0 for oauth token cache
- Added databricks-sdk>=0.1.1 for oauth flows
- Updated databricks-sql-connector from >=2.4.0 to >= 2.5.0
- Throw an error if a model has an enforced contract. (#322)
- Fix test_grants to use the error class to check the error. (#273)
- Raise exception on unexpected error of list relations (#270)
- Ignore case sensitivity in relation matches method. (#265)
- Raise an exception when schema contains '.'. (#222)
  - Containing a catalog in `schema` is not allowed anymore. Need to explicitly use `catalog` instead.
- Support Python 3.11 (#233)
- Support `incremental_predicates` (#161)
- Apply connection retry refactor, add defaults with exponential backoff (#137)
- Quote by Default (#241)
- Avoid show table extended command. (#231)
- Use show table extended with table name list for get_catalog. (#237)
- Add support for a glob pattern in the databricks_copy_into macro (#259)
- Fix copy into macro when passing `expression_list`. (#223)
- Partially revert to fix the case where schema config contains uppercase letters. (#224)
- Show and log a warning when schema contains '.'. (#221)
- Support python model through run command API, currently supported materializations are table and incremental. (dbt-labs/dbt-spark#377, #126)
- Enable Pandas and Pandas-on-Spark DataFrames for dbt python models (dbt-labs/dbt-spark#469, #181)
- Support job cluster in notebook submission method (dbt-labs/dbt-spark#467, #194)
- In `all_purpose_cluster` submission method, a config `http_path` can be specified in Python model config to switch the cluster where Python model runs.

  ```python
  def model(dbt, _):
      dbt.config(
          materialized='table',
          http_path='...'
      )
      ...
  ```
- Use builtin timestampadd and timestampdiff functions for dateadd/datediff macros if available (#185)
- Implement tests for various Python models (#189)
- Implement testing for `type_boolean` in Databricks (dbt-labs/dbt-spark#471, #188)
- Add a macro to support COPY INTO (#190)
- Apply "Initial refactoring of incremental materialization" (#148)
  - Now dbt-databricks uses `adapter.get_incremental_strategy_macro` instead of `dbt_spark_get_incremental_sql` macro to dispatch the incremental strategy macro. The overwritten `dbt_spark_get_incremental_sql` macro will not work anymore.
- Better interface for python submission (dbt-labs/dbt-spark#452, #178)
- Explicitly close cursors (#163)
- Upgrade databricks-sql-connector to 2.0.5 (#166)
- Embed dbt-databricks and databricks-sql-connector versions to SQL comments (#167)
- Support Python 3.10 (#158)
- Add grants to materializations (dbt-labs/dbt-spark#366, dbt-labs/dbt-spark#381)
- Add `connection_parameters` for databricks-sql-connector connection parameters (#135)
  - This can be used to customize the connection by setting additional parameters.
  - The full parameters are listed at Databricks SQL Connector for Python.
  - Currently, the following parameters are reserved for `dbt-databricks`. Please use the normal credential settings instead.
    - server_hostname
    - http_path
    - access_token
    - session_configuration
    - catalog
    - schema
- Incremental materialization updated to not drop table first if full refresh for delta lake format, as it already runs create or replace table (dbt-labs/dbt-spark#286, dbt-labs/dbt-spark#287)
- Update `SparkColumn.numeric_type` to return `decimal` instead of `numeric`, since SparkSQL exclusively supports the former (dbt-labs/dbt-spark#380)
- Make minimal changes to support dbt Core incremental materialization refactor (dbt-labs/dbt-spark#402, dbt-labs/dbt-spark#394, #136)
- Add new basic tests `TestDocsGenerateDatabricks` and `TestDocsGenReferencesDatabricks` (#134)
- Set upper bound for `databricks-sql-connector` when Python 3.10 (#154)
  - Note that `databricks-sql-connector` does not officially support Python 3.10 yet.
- Support for Databricks CATALOG as a DATABASE in DBT compilations (#95, #89, #94, #105)
  - Setting an initial catalog with `session_properties` is deprecated and will not work in the future release. Please use `catalog` or `database` to set the initial catalog.
  - When using catalog, `spark_build_snapshot_staging_table` macro will not be used. If trying to override the macro, `databricks_build_snapshot_staging_table` should be overridden instead.
- Block taking jinja2.runtime.Undefined into DatabricksAdapter (#98)
- Avoid using Cursor.schema API when database is None (#100)
- Drop databricks-sql-connector 1.0 (#108)
- Add support for Delta constraints (#71)
- Port testing framework changes from dbt-labs/dbt-spark#299 and dbt-labs/dbt-spark#314 (#70)
- Make internal macros use macro dispatch pattern (#72)
- Support for setting table properties as part of a model configuration (#33, #49)
- Get the session_properties map to work (#57)
- Bump up databricks-sql-connector to 1.0.1 and use the Cursor APIs (#50)
- Inherit from dbt-spark for backward compatibility with spark-utils and other dbt packages (#32, #35)
- Add SQL Endpoint specific integration tests (#45, #46)
- Make the connection use databricks-sql-connector (#3, #7)
- Make the default file format 'delta' (#14, #16)
- Make the default incremental strategy 'merge' (#23)
- Remove unnecessary stack trace (#10)
- Incremental materialization corrected to respect `full_refresh` config, by using `should_full_refresh()` macro (#260, #262)
- Add support for Apache Hudi (hudi file format) which supports incremental merge strategies (#187, #210)
- Refactor seed macros: remove duplicated code from dbt-core, and provide clearer logging of SQL parameters that differ by connection method (#249, #250)
- Replace `sample_profiles.yml` with `profile_template.yml`, for use with new `dbt init` (#247)
- Remove official support for python 3.6, which is reaching end of life on December 23, 2021 (dbt-core#4134, #253)
- Add support for structured logging (#251)
- Fix `--store-failures` for tests, by suppressing irrelevant error in `comment_clause()` macro (#232, #233)
- Add support for `on_schema_change` config in incremental models: `ignore`, `fail`, `append_new_columns`. For `sync_all_columns`, removing columns is not supported by Apache Spark or Delta Lake (#198, #226, #229); see the sketch after this list
- Add `persist_docs` call to incremental model (#224, #234)
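A minimal, hypothetical sketch of the `on_schema_change` config from the entry above, shown in the Python model config form used elsewhere in this changelog; the values come from that entry and everything else is illustrative:

```python
# Hypothetical sketch only; the upstream model name is a placeholder.
def model(dbt, session):
    dbt.config(
        materialized="incremental",
        on_schema_change="append_new_columns",  # or "ignore" / "fail"
    )
    return dbt.ref("upstream_events")
```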
- Enhanced get_columns_in_relation method to handle a bug in open source deltalake which doesn't return schema details in `show table extended in databasename like '*'` query output. This impacts dbt snapshots if file format is open source deltalake (#207)
- Properly parse columns when there are struct fields to avoid considering inner fields (#202)
- Add `unique_field` to better understand adapter adoption in anonymous usage tracking (#211)
- @harryharanb (#207)
- @SCouto (#204)
- Add pyodbc import error message to dbt.exceptions.RuntimeException to get more detailed information when running `dbt debug` (#192)
- Add support for ODBC Server Side Parameters, allowing options that need to be set with the `SET` statement to be used (#201)
- Add `retry_all` configuration setting to retry all connection issues, not just when the `_is_retryable_error` function determines (#194)
- @JCZuurmond (#192)
- @jethron (#201)
- @gregingenii (#194)
- Fix column-level `persist_docs` on Delta tables, add tests (#180)
- Allow user to specify `use_ssl` (#169)
- Allow setting table `OPTIONS` using `config` (#171)
- Add support for column-level `persist_docs` on Delta tables (#84, #170)
- Cast `table_owner` to string to avoid errors generating docs (#158, #159)
- Explicitly cast column types when inserting seeds (#139, #166)
- Parse information returned by `list_relations_without_caching` macro to speed up catalog generation (#93, #160)
- More flexible host passing, https:// can be omitted (#153)
- @friendofasquid (#159)
- @franloza (#160)
- @Fokko (#165)
- @rahulgoyal2987 (#169)
- @JCZuurmond (#171)
- @cristianoperez (#170)
- Update serialization calls to use new API in dbt-core `0.19.1b2` (#150)
- Incremental models have `incremental_strategy: append` by default. This strategy adds new records without updating or overwriting existing records. For that, use `merge` or `insert_overwrite` instead, depending on the file format, connection method, and attributes of your underlying data. dbt will try to raise a helpful error if you configure a strategy that is not supported for a given file format or connection. (#140, #141)
- Capture hard-deleted records in snapshot merge, when `invalidate_hard_deletes` config is set (#109, #126)
- Users of the `http` and `thrift` connection methods need to install extra requirements: `pip install dbt-spark[PyHive]` (#109, #126)
- Enable `CREATE OR REPLACE` support when using Delta. Instead of dropping and recreating the table, it will keep the existing table, and add a new version as supported by Delta. This will ensure that the table stays available when running the pipeline, and you can track the history.
- Add changelog, issue templates (#119, #120)
- Handle case of 0 retries better for HTTP Spark Connections (#132)
- @danielvdende (#132)
- @Fokko (#125)
- Allows users to specify `auth` and `kerberos_service_name` (#107)
- Add support for ODBC driver connections to Databricks clusters and endpoints (#116)