Refactor deferral: always merge_from_artifact & support all commands #9040
Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.
Codecov Report
@@            Coverage Diff             @@
##             main    #9040      +/-   ##
==========================================
- Coverage   86.65%   86.60%   -0.06%
==========================================
  Files         230      230
  Lines       27060    26987      -73
==========================================
- Hits        23449    23372      -77
- Misses       3611     3615       +4
core/dbt/task/runnable.py
Outdated
        favor_state=bool(self.args.favor_state),
    )
    # TODO: is it wrong to write the manifest here? I think it's right...
    write_manifest(self.manifest, self.config.project_target_path)
we may want to gate on self.config.args.write_json here, similar to what's done when the manifest is initially written here: https://github.com/dbt-labs/dbt-core/blob/main/core/dbt/cli/requires.py#L277
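A minimal sketch of the gating suggested here, assuming a stand-in `Args` object with a `write_json` attribute. `maybe_write_manifest` and `Args` are hypothetical names for illustration only, and the manifest is modelled as a plain dict rather than dbt's real `Manifest` class or `write_manifest` helper.

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Args:
    # Hypothetical stand-in for the parsed CLI flags; only write_json matters here.
    write_json: bool = True


def maybe_write_manifest(manifest: dict, target_path: str, args: Args) -> bool:
    """Write manifest.json only when write_json is enabled.

    Returns True if the file was written, False if the write was skipped.
    """
    if not args.write_json:
        return False
    out = Path(target_path) / "manifest.json"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(manifest))
    return True
```

With this shape, a run invoked with JSON writing disabled would leave the target directory untouched after deferral, matching how the initial manifest write is described as being gated.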
longer term.. I wonder if it'd actually make more sense to do this deferral + merging business in the requires.manifest wrapper to centralize where manifest loading + writing happens. It seems like at this point there is some command-specific functionality which justifies this functionality living at the task-level for now though
The biggest reason to have this at the task level right now is that we're running database queries (= populating + checking the adapter cache) while calculating deferral, to know which selected models exist in the expected target schema(s). Everything in requires.manifest ("parsing") takes place before we've made any database connections.
We could try more substantially changing the way that deferral works: During manifest loading, we add defer_relation as an attribute to every node. Then, during each node's compilation, the RuntimeRefResolver checks to see if that node exists in the expected schema (cache lookup), or skips the check if --favor-state, and resolves to defer_relation instead if not found. We wouldn't be recording in the manifest that a given node was deferred, but that doesn't feel like such a big loss, and it would still be reflected in the compiled_code of other nodes downstream.
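The proposal above could be sketched roughly as follows. This is an illustration under stated assumptions, not dbt's actual resolver: `Relation`, `Node`, and `resolve_ref` are hypothetical names, the adapter cache is modelled as a plain set of relations, and any special handling of selected nodes is left out.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class Relation:
    database: str
    schema: str
    identifier: str


@dataclass
class Node:
    unique_id: str
    relation: Relation
    # Hypothetical: populated from the --state manifest during manifest loading.
    defer_relation: Optional[Relation] = None


def resolve_ref(node: Node, cache: Set[Relation], favor_state: bool = False) -> Relation:
    """Pick the relation a ref() should resolve to at compile time."""
    if node.defer_relation is None:
        # Nothing to defer to, so the node's own relation is the only choice.
        return node.relation
    if favor_state:
        # --favor-state: skip the existence check and use the state relation.
        return node.defer_relation
    if node.relation in cache:
        # The node already exists in the expected schema (cache lookup).
        return node.relation
    # Not found in the expected schema: fall back to the state relation.
    return node.defer_relation
```

Because the decision is made per reference at compile time, nothing in this sketch records on the node that it was deferred; the effect is visible only in whichever relation downstream code ends up using.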
That feels significantly tidier. What's your (& my) appetite to try doing that here? Or in a follow-up PR? :)
In fact, come to think of it, this bit of the clone task now doesn't make nearly as much sense, because defer_to_manifest needs to populate the adapter cache as part of resolving which nodes are "deferred" and which just get a defer_relation, which isn't even a meaningful distinction for the purpose of clone:
dbt-core/core/dbt/task/clone.py
Lines 125 to 126 in ed0c432
    # unlike in other tasks, we want to add information from the --state manifest *before* caching!
    self.defer_to_manifest(adapter, selected_uids)
Risk of breaking changes? Currently, an unselected node will be actually overwritten with the node entry from the stateful manifest. Every single attribute. If you write graph-parsing Jinja logic, and traverse to that node's entry, those are the attributes you'd see. So custom integrations which write those node attributes back to the data warehouse (for example) might be a little bit funky. I think that's quite niche as backwards-incompatible behavior changes go.
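To make the behavior difference concrete, here is a small sketch contrasting the two merge strategies described in this comment. It assumes nodes are plain dicts keyed by unique ID; `overwrite_unselected` and `attach_defer_relation` are hypothetical helpers, and the existence check against the target schema is deliberately ignored.

```python
import copy
from typing import Dict, Set

Node = Dict[str, object]
Manifest = Dict[str, Node]


def overwrite_unselected(current: Manifest, state: Manifest, selected: Set[str]) -> Manifest:
    """Behavior described as current: unselected nodes are replaced wholesale."""
    merged = copy.deepcopy(current)
    for uid, state_node in state.items():
        if uid in merged and uid not in selected:
            # Every attribute now comes from the stateful manifest.
            merged[uid] = copy.deepcopy(state_node)
    return merged


def attach_defer_relation(current: Manifest, state: Manifest) -> Manifest:
    """Proposed behavior: keep each node as-is and only add defer_relation."""
    merged = copy.deepcopy(current)
    for uid, state_node in state.items():
        if uid in merged:
            merged[uid]["defer_relation"] = state_node["relation"]
    return merged
```

Under the first strategy, graph-traversing Jinja would see the state manifest's attributes (for example its `meta`) on an unselected node; under the second it would see the current project's attributes plus one extra field.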
This thought got the better of me — here's a first attempt:
Let's continue working on the follow-up refactor once this is merged to de-risk the scope of these changes, especially with all the other under-the-hood refactoring going in this release :)
This covers 3 out of the 4 UserConfig missing from global configs that showed up in a separate and independent audit. What about |
@@ -62,10 +62,6 @@ def print_result_line(self, result):


class SeedTask(RunTask):
    def defer_to_manifest(self, adapter, selected_uids):
Discussed offline with @MichelleArk: Should we actually still disable deferral for tasks like seed/freshness/list, since those tasks never (in theory) resolve runtime references?
It's technically possible to have a seed reference another node, via a pre/post hook. Or write custom code in a project (on-run-*) hook that wants to use the defer_relation property of nodes in the graph. I don't think that's good, but it is possible.
At this point it feels like a choice between consistency, or a (small) performance optimization. Either way, I want all commands to support the flags as CLI options. We could keep the overrides in for now, until the follow-up refactor (#9199) that removes deferral logic from tasks entirely, and makes it part of manifest loading. That does feel like the right direction, and this is a step on the way there.
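The "keep the overrides in for now" option might look something like the sketch below: every task accepts the deferral flag, but a task that never resolves runtime references overrides the deferral step with a no-op. `BaseTask` and `SeedLikeTask` are hypothetical simplifications, not dbt's real task classes, and the deferral work is reduced to appending a log line.

```python
from typing import List, Set


class BaseTask:
    def __init__(self, defer: bool) -> None:
        self.defer = defer
        self.log: List[str] = []

    def defer_to_manifest(self, adapter: object, selected_uids: Set[str]) -> None:
        # Default behavior: merge state information when --defer is set.
        if self.defer:
            self.log.append(f"merged state for {len(selected_uids)} selected node(s)")

    def run(self, adapter: object, selected_uids: Set[str]) -> List[str]:
        self.defer_to_manifest(adapter, selected_uids)
        return self.log


class SeedLikeTask(BaseTask):
    def defer_to_manifest(self, adapter: object, selected_uids: Set[str]) -> None:
        # The flag is accepted without error, but the deferral work is skipped.
        return None
```

The trade-off is the one named in the comment: the override saves a little work, at the cost of tasks behaving inconsistently when hooks or project code read `defer_relation`.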
resolves #7965
resolves #8715
throwback: #2740
Problem
- [...] clone and other tasks in terms of how deferral works. (I borrowed some of @MichelleArk's spike work from spike: inferred contracts #8339.)
- [...] --defer & --state, but they shouldn't raise an error. That's the case currently (e.g. if you submit run-operation within the IDE while "Defer to production" is enabled)
Solution
- [...] --defer & friends as "global" flags
@runleonarun adding copilot summary
This pull request primarily focuses on consolidating deferral methods and flags within the dbt project. The changes involve modifications to various methods and classes across multiple files, including core/dbt/cli/flags.py and core/dbt/cli/main.py. The most significant changes include the addition of a new params_assigned_from_user set and changes to the _assign_params method, reordering and removal of decorators in core/dbt/cli/main.py, and the introduction of defer_relation in core/dbt/contracts/graph/manifest.py.
Changes to _assign_params method in core/dbt/cli/flags.py:
- params_assigned_from_user is introduced and used to track user-provided and default values across both 'parent' and 'child' levels. This is to support detection of mutually exclusive flags later on.
- _assign_params method is modified to include params_assigned_from_user as a parameter.
Reordering and removal of decorators in core/dbt/cli/main.py:
- [...] global_flags function and removed from other functions. This change is likely aimed at consolidating the use of these decorators.
Introduction of defer_relation in core/dbt/contracts/graph/manifest.py:
- defer_relation attribute is added. This attribute is used when the model is deferred and the adapter doesn't support zero-copy cloning.
Changes in core/dbt/task/clone.py:
- CloneTask class now always requires a state manifest, regardless of whether the --defer flag has been set.
- defer_to_manifest method is removed. Unlike other commands, 'clone' always requires a state manifest.
Consolidation of deferral methods and flags:
- These changes seem to be part of a larger refactoring effort aimed at simplifying the codebase and improving the handling of deferral methods and flags.
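As a rough illustration of why a shared params_assigned_from_user set helps with mutually exclusive flags, consider the sketch below. It is not the code in core/dbt/cli/flags.py: `assign_params` and `check_mutually_exclusive` are hypothetical helpers, `option_x` and `option_y` are placeholder flag names, and the set is assumed to hold only names the user set explicitly.

```python
from typing import Dict, Iterable, Set


def assign_params(
    values: Dict[str, object],
    user_supplied: Iterable[str],
    params_assigned_from_user: Set[str],
) -> Dict[str, object]:
    """Copy one level's values into a flags dict, recording user-set names."""
    explicit = set(user_supplied)
    flags: Dict[str, object] = {}
    for name, value in values.items():
        flags[name] = value
        if name in explicit:
            params_assigned_from_user.add(name)
    return flags


def check_mutually_exclusive(params_assigned_from_user: Set[str], group: Iterable[str]) -> None:
    """Raise if the user explicitly set more than one flag from the group."""
    chosen = sorted(set(group) & params_assigned_from_user)
    if len(chosen) > 1:
        raise ValueError(f"{', '.join(chosen)} are mutually exclusive")


# The same set is passed at the 'parent' and 'child' levels, so a conflict
# is caught even when the two flags were supplied at different levels.
seen: Set[str] = set()
assign_params({"option_x": True, "defer": False}, ["option_x"], seen)
assign_params({"option_y": "value"}, ["option_y"], seen)
```

Calling `check_mutually_exclusive(seen, ["option_x", "option_y"])` after those two assignments would raise, whereas flags that were only filled in from defaults never enter the set and so never trigger the check.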
Checklist