You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I have searched the existing issues, and I could not find an existing issue for this bug
Current Behavior
When using incremental_strategy="microbatch" with batch_size="hour", MicrobatchBuilder generates a batch ID that contains spaces and special characters. This leads to invalid temporary table names and SQL syntax errors in database adapters.
Example error from Snowflake adapter:
SQL compilation error: syntax error line 1 at position 150 unexpected '00'.
The error occurs because the generated temporary table name contains spaces and timezone information:
create or replace temporary table [...].model_name__dbt_tmp_20241218 00:00:00+00:00
Expected Behavior
MicrobatchBuilder should generate a valid batch ID without spaces or special characters for hourly batches, similar to how it handles daily batches.
For example:
create or replace temporary table [...].model_name__dbt_tmp_20241218T000000Z
Steps To Reproduce
Create a model with microbatch incremental strategy:
Run the model with dbt run --select hoge --event-time-start "2024-12-18T00:00:00" --event-time-end "2024-12-18T01:00:00"
Observe the SQL syntax error due to invalid temporary table name
Relevant log output
02:13:16 Using snowflake connection "model.fuga.hoge"
02:13:16 On model.fuga.hoge: create or replace temporary table dbname.schemaname.modelname__dbt_tmp_20241218 00:00:00+00:00
as
(
with
source as (
select*
from dbname.schemaname.modelname
),
deduplicated as (
select*
from source
qualify
row_number() over (
partition by id
order by _partition_hourly asc, bid_at asc
) = 1
)
select*
from deduplicated
)
02:13:17 Snowflake adapter: Snowflake query id: hoge
02:13:17 Snowflake adapter: Snowflake error: 001003 (42000): SQL compilation error:
syntax error line 1 at position 150 unexpected '00'.
02:13:17 Unhandled error while executing
Exception on worker thread. Database Error
001003 (42000): SQL compilation error:
syntax error line 1 at position 150 unexpected '00'.
02:13:17 Batch 1 of 1 ERROR creating batch 2024-12-18 00:00:00+00:00 of dbname.schemaname.modelname [ERROR in 0.39s]
For hourly batches (batch_size="hour"), format_batch_start returns str(batch_start) which generates a datetime string like "2024-12-18 00:00:00+00:00". While batch_id removes hyphens, it does not handle spaces and timezone information, resulting in an invalid table name.
For non-hourly batches (day/month/year), it correctly uses batch_start.date() which produces a clean format like "2024-12-18", and after removing hyphens becomes "20241218".
The issue stems from the fact that the batch ID flows from run.py through the Jinja templating system and into SQL table names without proper sanitization for hourly batches. The fix would likely involve modifying the format_batch_start method to ensure hourly timestamps use a database-friendly format (e.g., ISO format "20241218T000000Z") similar to how it handles daily batches.
The text was updated successfully, but these errors were encountered:
I was able to reproduce this issue -- see "reprex" below for details.
Potential fix
@pei0804 very nice research on the relevant areas of code 🤩.
Making the following modifications to this code worked when I tested it out -- although our final fix might look a bit different:
@staticmethoddefbatch_id(start_time: datetime, batch_size: BatchSize) ->str:
returnMicrobatchBuilder.format_batch_start(start_time, batch_size)
@staticmethoddefformat_batch_start(batch_start: datetime, batch_size: BatchSize) ->str:
# If we want a date onlyifbatch_size!=BatchSize.hour:
returnbatch_start.strftime('%Y%m%d') # e.g. "20241218"# If we want date + timereturnbatch_start.strftime('%Y%m%dT%H%M%SZ') # e.g. "20241218T000000Z"
This changed the SQL from this:
create or replace temporary table analytics_dev.dbt_dbeatty.my_microbatch_model__dbt_tmp_20250110 14:00:00+00:00
to this instead:
create or replace temporary table analytics_dev.dbt_dbeatty.my_microbatch_model__dbt_tmp_20250110T140000Z
Note:
I didn't test out what would happen if the batch_size is something else besides hour.
also didn't test with positive or negative UTC offsets or naive timestamps
Is this a new bug in dbt-core?
Current Behavior
When using
incremental_strategy="microbatch"
withbatch_size="hour"
, MicrobatchBuilder generates a batch ID that contains spaces and special characters. This leads to invalid temporary table names and SQL syntax errors in database adapters.Example error from Snowflake adapter:
The error occurs because the generated temporary table name contains spaces and timezone information:
Expected Behavior
MicrobatchBuilder should generate a valid batch ID without spaces or special characters for hourly batches, similar to how it handles daily batches.
For example:
create or replace temporary table [...].model_name__dbt_tmp_20241218T000000Z
Steps To Reproduce
dbt run --select hoge --event-time-start "2024-12-18T00:00:00" --event-time-end "2024-12-18T01:00:00"
Relevant log output
Environment
Which database adapter are you using with dbt?
snowflake
Additional Context
The issue occurs in several parts of the codebase:
core/dbt/task/run.py
:https://github.com/dbt-labs/dbt-core/blob/main/core/dbt/task/run.py#L352-L356
make_temp_relation
macro to create unique temporary table names:https://github.com/dbt-labs/dbt-adapters/blob/5407391c5cef22a5c0431daa469d6a8295c026d8/dbt/include/global_project/macros/adapters/relation.sql#L9-L16
MicrobatchBuilder
class:dbt-core/core/dbt/materializations/incremental/microbatch.py
Lines 195 to 203 in a175793
For hourly batches (
batch_size="hour"
),format_batch_start
returnsstr(batch_start)
which generates a datetime string like"2024-12-18 00:00:00+00:00"
. Whilebatch_id
removes hyphens, it does not handle spaces and timezone information, resulting in an invalid table name.For non-hourly batches (day/month/year), it correctly uses
batch_start.date()
which produces a clean format like"2024-12-18"
, and after removing hyphens becomes"20241218"
.The issue stems from the fact that the batch ID flows from
run.py
through the Jinja templating system and into SQL table names without proper sanitization for hourly batches. The fix would likely involve modifying theformat_batch_start
method to ensure hourly timestamps use a database-friendly format (e.g., ISO format"20241218T000000Z"
) similar to how it handles daily batches.The text was updated successfully, but these errors were encountered: