diff --git a/website/blog/2025-01-24-sql-comprehension-technologies.md b/website/blog/2025-01-24-sql-comprehension-technologies.md new file mode 100644 index 00000000000..2e0457063b8 --- /dev/null +++ b/website/blog/2025-01-24-sql-comprehension-technologies.md @@ -0,0 +1,240 @@ +--- +title: "The key technologies behind SQL Comprehension" +description: "The technologies that power the three levels of SQL comprehension." +slug: sql-comprehension-technologies + +authors: [dave_connors] + +tags: [data ecosystem] +hide_table_of_contents: false + +date: 2025-01-24 +is_featured: true +---

You ever wonder what’s *really* going on in your database when you fire off a (perfect, efficient, full-of-insight) SQL query?

OK, probably not 😅. Your personal tastes aside, we’ve been talking a *lot* about SQL Comprehension tools at dbt Labs in the wake of our acquisition of SDF Labs, and we think the community would benefit from being part of the conversation too! We recently published a [blog post about the different levels of SQL Comprehension tools](https://docs.getdbt.com/blog/the-levels-of-sql-comprehension). If you read that, you may have encountered a few new terms you weren’t super familiar with.

In this post, we’ll talk about the technologies that underpin SQL Comprehension tools in more detail. Hopefully, you’ll come away with a deeper understanding of and appreciation for the hard work your computer does to turn your SQL queries into actionable business insights!

Here’s a quick refresher on the levels of SQL comprehension:

Each of these levels is powered by a distinct set of technologies. It’s useful to explore these technologies in the context of the SQL Comprehension tool you’re probably most familiar with: a database! A database, as you might have guessed, has the deepest possible SQL comprehension abilities as well as SQL *execution* abilities — it contains all the technology necessary to translate SQL query text into rows and columns.

Here’s a simplified diagram to show your query’s fantastic voyage of translation into tabular data:

First, databases use a **parser** to translate SQL code into a **syntax tree**. This enables syntax validation + error handling.

Second, database **compilers** **bind** metadata to the syntax tree to create a fully validated **logical plan**. This enables a complete understanding of the operations required to generate your dataset, including information about the datatypes that are input and output during SQL execution.

Third, the database **optimizes** and **plans** the operations defined by a logical plan, generating a **physical plan** that maps the logical steps to physical hardware, then executes the steps with data to finally return your dataset!

Let’s explore each of these levels in more depth!

## Level 1: Parsing

At Level 1, SQL comprehension tools use a **parser** to translate SQL code into a **syntax tree**. This enables syntax validation + error handling. *Key Concepts: Intermediate Representations, Parsers, Syntax Trees*

### Intermediate representations

:::tip
**Intermediate representations** are data objects created during the process of *compiling* code.
:::

Before we dive into the specific technologies, we should define a key concept in computer science that’s very relevant to understanding how this entire process works under the hood: the [**Intermediate Representation (IR)**](https://en.wikipedia.org/wiki/Intermediate_representation). When code is executed on a computer, it has to be translated from the human-readable code we write into the machine-readable code that actually does the work, in a process called *compiling*. As part of this process, your code is translated into a number of different objects as the program runs; each of these is called an *intermediate representation*.

For an analogy that will be familiar to dbt users, think about the intermediate models in your dbt DAG — a translated form of your source data, created in the process of synthesizing your final data marts. These models are effectively an intermediate representation. We’re going to talk about a few different types of IRs in this post, so it’s useful to know about them now before we get too deep!
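To keep these IRs straight as we go, here’s a tiny roadmap sketched as Python type stubs. The class names and fields here are simplifications we invented for this post, not the structures any real engine uses:

```python
from dataclasses import dataclass

# Illustrative stand-ins for the intermediate representations (IRs)
# covered in this post -- real engines use far richer structures.

@dataclass
class Token:          # produced by the lexer (Level 1)
    kind: str         # e.g. "KEYWORD", "IDENTIFIER", "NUMBER"
    text: str         # e.g. "select", "order_id", "1"

@dataclass
class SyntaxTree:     # produced by the parser (Level 1)
    node: str                      # e.g. "SELECT"
    children: list["SyntaxTree"]   # grammatical structure, no metadata yet

@dataclass
class LogicalPlan:    # produced by the binder/compiler (Level 2)
    operation: str                 # e.g. "Scan", "Filter", "Project"
    output_types: dict[str, str]   # column -> datatype, fully resolved

@dataclass
class PhysicalPlan:   # produced by the optimizer/planner (Level 3)
    operation: str                 # e.g. "HashJoin", "PartitionScan"
    location: str                  # which partitions/nodes do the work
```

Each level below produces the next representation in this chain.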
### Parsers

:::tip
**Parsers** are programs that translate raw code into *syntax trees*.
:::

All programming languages require a parser, which is often the first step in the translation process from human-readable to machine-readable code. Parsers are programs that map the syntax, or grammar, of your code into a syntax tree, and determine whether the code you wrote follows the basic rules of the language.

Under the hood, a parser has a few component technologies that build the syntax tree and capture the relationships between your variables, functions, and classes. The components of a parser include:

- **a lexer**, which takes raw code strings and returns lists of tokens recognized in the code (in SQL, `SELECT`, `FROM`, and `sum` would be examples of tokens recognized by a lexer)
- **a parser**, which takes the lists of tokens generated by a lexer and builds the syntax tree based on the grammatical rules of the language (i.e. a `SELECT` must be followed by one or more column expressions, a `FROM` must reference a table, CTE, or subquery, etc.)

In other words, the lexer first detects the tokens that are present in a SQL query (is there a filter? which functions are called?) and the parser is responsible for mapping the dependencies between them.

A quick vocab note: while technically the parser is only the component that translates tokens into a syntax tree, the word “parser” has come to be shorthand for the whole process of lexing and parsing.
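To make the lexing step concrete, here’s a toy lexer sketched in Python. It’s a deliberate simplification (real lexers handle comments, quoting, and dialect quirks), and the token names are our own invention:

```python
import re

# A toy lexer: maps raw SQL text to a list of (kind, text) tokens.
# Keywords must be listed before IDENT so they match as KEYWORD.
TOKEN_SPEC = [
    ("KEYWORD", r"\b(?:select|from|where|group|by|as|sum)\b"),
    ("NUMBER",  r"\d+"),
    ("STRING",  r"'[^']*'"),
    ("IDENT",   r"[a-zA-Z_][a-zA-Z0-9_]*"),
    ("OP",      r"[=(),*]"),
    ("SKIP",    r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{k}>{p})" for k, p in TOKEN_SPEC), re.IGNORECASE)

def lex(sql: str) -> list[tuple[str, str]]:
    """Return the (kind, text) tokens found in a SQL string."""
    tokens = []
    for match in MASTER.finditer(sql):
        if match.lastgroup != "SKIP":   # drop whitespace
            tokens.append((match.lastgroup, match.group()))
    return tokens

print(lex("select order_id, sum(amount) as total from order_items"))
# [('KEYWORD', 'select'), ('IDENT', 'order_id'), ('OP', ','), ('KEYWORD', 'sum'), ...]
```

The parser then consumes a token list like this one and assembles it into a tree.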
### Syntax trees

:::tip
**Syntax trees** are a representation of a unit of language according to a set of grammatical rules.
:::

Your first introduction to understanding syntactical rules probably came when you learned how to diagram sentences in your grade school grammar classes! Diagramming the parts of speech in a sentence and mapping the dependencies between each of its components is precisely what a parser does — the resulting representation of the sentence is a syntax tree. Here’s a silly example:

> `My cat jumped over my lazy dog`

By parsing this sentence according to the rules of the English language, we can get this syntax tree:

Let’s do the same thing with a simple SQL query:

```sql
select
  order_id,
  sum(amount) as total_order_amount
from order_items
where
  date_trunc('year', ordered_at) = '2025-01-01'
group by 1
```

By parsing this query according to the rules of the SQL language, we get something that looks like this:

The syntax trees produced by parsers are a very valuable type of intermediate representation; with a syntax tree, you can power features like syntax validation, code linting, and code formatting, since those tools only need knowledge of the *syntax* of the code you’ve written to work.

However, parsers also dutifully parse *syntactically correct code* that *means nothing at all*. To illustrate this, consider the [famous sentence](https://en.wikipedia.org/wiki/Colorless_green_ideas_sleep_furiously) developed by linguistics + philosophy professor Noam Chomsky:

> `Colorless green ideas sleep furiously`

That’s a perfectly valid, diagrammable, parsable sentence according to the rules of the English language. But it means *absolutely nothing*. In SQL engines, you need a way to imbue a syntax tree with additional metadata to understand whether or not it represents executable code. As described in our first post, Level 1 SQL Comprehension tools are not designed to provide this context — they can only provide pure syntax validation. Level 2 SQL Comprehension tools augment these syntax trees with *meaning* by fully **compiling** the SQL.

## Level 2: Compiling

At Level 2, SQL comprehension tools use a **compiler** to **bind** metadata to the syntax tree to create a fully validated **logical plan**. *Key concepts: Binders, Logical Plans, Compilers*

### Binders

:::tip
In SQL *compilers*, **binders** are programs that enhance + resolve *syntax trees* into *logical plans*.
:::

In compilers, *binders* (also called *analyzers* or *resolvers*) combine additional metadata with a syntax tree representation to produce a richer, validated, *executable* intermediate representation. In the English language example above, we’re mentally *binding* our knowledge of the definitions of each of the words to the structure of the sentence, after which we can derive *meaning*.

Binders are responsible for this process of resolution. They must bind additional information about the components of the written code (their types, their scopes, their memory implications) to the code you wrote to produce a valid, executable unit of computation.

In the case of SQL binders, a major part of their job is to combine *warehouse schema information*, like column *datatypes*, with the *type signatures* of the warehouse operators described by the syntax tree to bring full *type awareness* to the tree. It’s one thing to recognize a `substring` function in a query; it’s another to *understand* that a `substring` *must* operate on string data, *always* produces string data, and will fail if you pass it an integer.

In this example, while the syntax tree knows that the `x` column is aliased as `u`, the binder has the knowledge that `x` is indeed a column of type `int` and therefore the resulting column `u` must also be of type `int`. Similarly, it knows that the filter condition specified will produce a `bool` value, and therefore must have compatible datatypes as its two arguments. Luckily, the binder can also see that `x` and `0` are both of type `int`, so we’re confident this is a fully valid expression. This layer of validation, powered by metadata, is referred to as *type awareness*.
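Here’s a toy sketch, in Python, of the two checks we just described: resolving a column’s datatype from catalog metadata, and validating a function call against its type signature. The catalog, signature table, and error handling are invented for illustration:

```python
# A toy binder: resolve a column's datatype from catalog metadata, then
# validate a function call against its type signature. A sketch only --
# real binders also resolve scopes, aliases, implicit casts, and more.
CATALOG = {"some_table": {"x": "int"}}              # warehouse schema metadata
SIGNATURES = {"substring": (["string"], "string")}  # arg types -> return type

def bind_column(table: str, column: str) -> str:
    """Resolve a column reference to its datatype, or fail loudly."""
    if column not in CATALOG.get(table, {}):
        raise NameError(f"unknown column {table}.{column}")
    return CATALOG[table][column]

def bind_call(func: str, arg_types: list[str]) -> str:
    """Check a call's argument types against its signature; return its output type."""
    expected, returns = SIGNATURES[func]
    if arg_types != expected:
        raise TypeError(f"{func} expects {expected}, got {arg_types}")
    return returns

x_type = bind_column("some_table", "x")   # 'int', so the alias u must be int too
try:
    bind_call("substring", [x_type])      # substring requires a string argument
except TypeError as err:
    print(err)                            # substring expects ['string'], got ['int']
```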
In addition to being able to trace the way datatypes will flow and change through a set of SQL operations, the function signatures allow the binder to fully validate that you’ve provided valid arguments to a function, inclusive of the acceptable types of columns provided to the function (e.g. `split_part` can’t work on an `int` field) as well as valid function configurations (e.g. the acceptable date parts for `datediff` include `'nanosecond'` but not `'dog_years'`).

### Logical plans

:::tip
In SQL *compilers*, **logical plans** define the validated, resolved set of data processing operations described by a SQL query.
:::

The output of a binder is a richer intermediate representation that can be executed in a low-level language; in the case of database engines, this IR is known as a *logical plan*.

Critically, as a result of the binder’s work of mapping data types to the syntax tree, logical plans have *full data type awareness* — they can tell you precisely how data flows through an analysis, and can pinpoint when datatypes may change as a result of, say, an aggregation operation.

You can see we’ve gotten a more specific description of how to generate the dataset. Rather than simply mapping the SQL keywords and their dependencies, we have a resolved set of operations, in this case scanning a table, filtering the result, and projecting the values in the `x` column with an alias of `u`.

The logical plan contains a precise logical description of the computing process your query defined, and validates that it can be executed. Logical plans describe their operations as [*relational algebra*](https://en.wikipedia.org/wiki/Relational_algebra), which is what enables these plans to be fully optimized — the steps in a logical plan can be rearranged and reduced with mathematical equivalency to ensure the steps are as efficient as possible.

This plan can be very helpful for you as a developer, especially if it’s available before you execute the query. If you’ve ever executed an `explain` statement in your database, you’ve viewed a logical plan! You can know exactly what operations will be executed, and critically, you can know that they are valid! This ability to check validity before any computation happens is what is referred to as *static analysis*.

### Compilers

:::tip
**Compilers** are programs that translate high-level language to low-level language. *Parsers* and *binders* together constitute compilers.
:::

Taken together, a parser plus a binder constitute a *compiler*, a program that takes in high-level code (optimized for human readability, like SQL) and outputs low-level code (optimized for machine readability + execution). In SQL compilers, this output is the logical plan.

A compiler definitionally gives you a deeper understanding of the behavior of the query than a parser alone. We’re now able to trace the data flows and operations that we were abstractly expressing when we initially wrote our SQL query. The compiler incrementally enriches its understanding of the original SQL string and produces a logical plan, which provides static analysis and validation of your SQL logic.
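Putting it together, here’s roughly what the compiler’s output for our running example (something like `select x as u from some_table where x > 0`) could look like if you wrote it down as a data structure. The node names mirror common relational operators, but the exact shape is our simplification, not any particular engine’s IR:

```python
from dataclasses import dataclass

# A toy logical plan. Each node is a relational operation; types are
# fully resolved, so the plan can be validated (and later optimized)
# before any data is touched.
@dataclass
class Scan:
    table: str
    schema: dict[str, str]        # column -> datatype, from the catalog

@dataclass
class Filter:
    child: Scan
    predicate: str                # resolves to bool: int > int -> bool

@dataclass
class Project:
    child: Filter
    columns: dict[str, str]       # output alias -> resolved datatype

plan = Project(
    child=Filter(
        child=Scan(table="some_table", schema={"x": "int"}),
        predicate="x > 0",
    ),
    columns={"u": "int"},         # u inherits x's type: full type awareness
)
print(plan)
```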
We are, however, not all the way down the rabbit hole — a compiler-produced logical plan contains the full instructions for how to execute a piece of code, but doesn’t have any sense of how to actually execute those steps! There’s one more translation required for the rubber to fully meet the motherboard.

## Level 3: Executing

*At Level 3, the database’s **execution engine** translates the logical plan into a **physical plan**, which can finally be executed to return a dataset.* *Key concepts: Optimization and Planning, Engines, Physical Plans*

### Optimization and planning

:::tip
A logical plan goes through a process of **optimization and planning** that maps its operations to the physical hardware that will execute each step.
:::

Once the database has a resolved logical plan, it goes through a process of optimization and planning. As mentioned, because logical plans are expressed as relational algebraic expressions, the database can execute equivalent steps in whichever order is most efficient.

Let’s think through a simple example SQL statement:

```sql
select
  *
from a
join b on a.id = b.a_id
join c on b.id = c.b_id
```

The logical plan will contain steps to join the tables together as defined in SQL — great! Let’s suppose, however, that table `a` is several orders of magnitude larger than each of the other two. In that case, the order of joining makes a huge difference in the performance of the query! If we join `a` and `b` first, then join the result `ab` with `c`, the second join has to process an intermediate result that is just as enormous as table `a` itself. If instead we join `b` and `c` first, and join the much smaller result `bc` with table `a`, we get the same result `abc` at a fraction of the cost!

Layering in knowledge of the physical characteristics of the objects referenced in a query to ensure efficient execution is the job of the optimization and planning stage.
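To see why the optimizer cares so much about ordering, here’s a back-of-the-envelope cost model for the three-table join above. The row counts and the “cost is proportional to rows touched” assumption are made up for illustration:

```python
# A toy cost model for the join-ordering example above. Real optimizers
# use table statistics and far more sophisticated cost estimates.
rows = {"a": 1_000_000_000, "b": 10_000, "c": 10_000}
intermediate = {("a", "b"): 1_000_000_000, ("b", "c"): 10_000}  # assumed join output sizes

def cost(first: tuple[str, str], then: str) -> int:
    """Rows touched when joining `first`, then joining that result with `then`."""
    step_one = rows[first[0]] + rows[first[1]]   # scan both join inputs
    step_two = intermediate[first] + rows[then]  # reprocess the intermediate result
    return step_one + step_two

print(cost(("a", "b"), "c"))  # ab first: 2_000_020_000 rows touched
print(cost(("b", "c"), "a"))  # bc first: 1_000_030_000 rows touched, about half the work
```

Same result, half the work, purely from reordering mathematically equivalent steps.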
### Physical plans

:::tip
A **physical plan** is the intermediate representation that contains all necessary information to execute the query.
:::

Once we do the work to decide on the optimal plan with details about the physical characteristics of the data, we get one final intermediate representation: the physical plan. Think about the operations defined by a logical plan — we may know that we have a `TableScan` operation on a table called `some_table`. A physical plan is able to map that operation to *specific data partitions* in *specific data storage locations*. The physical plan also contains information relevant to memory allocation so the engine can plan accordingly — as in the previous example, it knows the second join will be a lot more resource-intensive!

Think about what your data platform of choice has to do when you submit a validated SQL query: the last-mile step is deciding which partitions of data on which of its servers should be scanned, and how they should be joined and aggregated to ultimately generate the dataset you need. Physical plans are among the last intermediate representations created along the way to actually returning data back from a database.

### Execution

:::tip
A query engine can **execute** a *physical plan* and return tabular data.
:::

Once a physical plan is generated, all that’s left to do is run it! The database engine executes the physical plan, and fetches, combines, and aggregates your data into the format described by your SQL code. The way the engine accomplishes this can vary significantly depending on the architecture of your database! Some databases are “single node,” in that there is a single computer doing all the work; others are “distributed,” and can federate the work across many compute nodes.

In general, the engine must:

1. **Allocate resources** — In order to run your query, a computer must be online and available to do so! This step allocates CPU to each of the operations in the physical plan, whether it be one single node or many nodes executing the full query task.

2. **Read data into memory** — The tables referenced are then scanned as efficiently as possible, and the rows are processed. This may happen in partial stages depending on whether the tasks are distributed or happening within one single node.

3. **Execute operations** — Once the required data is read into memory, it flows through a pipeline of the nodes in your physical plan. More than 50 years of work have gone into optimizing these steps as applied to different data structures and in-memory representations; everything from row-oriented databases to columnar, time-series, geospatial, and graph stores. But fundamentally, there are five common operations:

   1. **Projection** — Extract only the columns or expressions that the user requested (e.g. `order_id`).

   2. **Filtering** — Rows that don’t meet your `WHERE` condition are dropped.

   3. **Joining** — If your query involves multiple tables, the engine merges or joins them — this could be a hash join, sort-merge join, or even a nested loop join depending on data statistics.

   4. **Aggregation** — If you have an aggregation like `SUM(amount)` or `COUNT(*)`, the engine groups rows by the specified columns and calculates the aggregated values.

   5. **Sorting / Window Functions** — If the query uses `ORDER BY`, `RANK()`, or other window functions, the data flows into those operators next.

4. **Merge and return results** — The last-mile step is generating the tabular dataset. In the case of distributed systems, this may require combining the results from several nodes into a single result.

Finally! Actionable business insights, right in the palm of your hand!

## Looking ahead

That’s probably more about databases than you bargained for! I know this is a lot to absorb, but the best data practitioners have a deep understanding of their tools, and this is all extremely relevant for understanding the next evolution of data tooling and data work. Next time you run a query, don't forget to thank your database for all the hard work it's doing for you.
diff --git a/website/blog/authors.yml b/website/blog/authors.yml index da08a8aa729..1fe04c17ceb 100644 --- a/website/blog/authors.yml +++ b/website/blog/authors.yml @@ -137,7 +137,7 @@ colin_rogers: organization: dbt Labs dave_connors: image_url: /img/blog/authors/dconnors.jpeg - job_title: Senior Developer Experience Advocate + job_title: Staff Developer Experience Advocate links: - icon: fa-github url: https://github.com/dave-connors-3 diff --git a/website/docs/docs/cloud/cloud-cli-installation.md b/website/docs/docs/cloud/cloud-cli-installation.md index a80f1a587e0..9637b83f4b8 100644 --- a/website/docs/docs/cloud/cloud-cli-installation.md +++ b/website/docs/docs/cloud/cloud-cli-installation.md @@ -313,16 +313,10 @@ This alias will allow you to use the dbt-cloud command to invoke th - - - -If you've ran a dbt command and receive a Session occupied error, you can reattach to your existing session with dbt reattach and then press Control-C and choose to cancel the invocation. - - - +The dbt Cloud CLI allows only one command that writes to the data warehouse at a time. If you attempt to run multiple write commands simultaneously (for example, `dbt run` and `dbt build`), you will encounter a `stuck session` error. To resolve this, cancel the specific invocation by passing its ID to the cancel command. For more information, refer to [parallel execution](/reference/dbt-commands#parallel-execution). -The Cloud CLI allows only one command that writes to the data warehouse at a time. If you attempt to run multiple write commands simultaneously (for example, `dbt run` and `dbt build`), you will encounter a `stuck session` error. To resolve this, cancel the specific invocation by passing its ID to the cancel command. For more information, refer to [parallel execution](/reference/dbt-commands#parallel-execution). + - \ No newline at end of file + diff --git a/website/docs/docs/cloud/configure-cloud-cli.md b/website/docs/docs/cloud/configure-cloud-cli.md index 5e0a285c5c5..f4f2f795a40 100644 --- a/website/docs/docs/cloud/configure-cloud-cli.md +++ b/website/docs/docs/cloud/configure-cloud-cli.md @@ -124,7 +124,7 @@ As a tip, most command-line tools have a `--help` flag to show available command - `dbt run --help`: Lists the flags available for the `run` command ::: -### Lint SQL files +## Lint SQL files From the dbt Cloud CLI, you can invoke [SQLFluff](https://sqlfluff.com/) which is a modular and configurable SQL linter that warns you of complex functions, syntax, formatting, and compilation errors. Many of the same flags that you can pass to SQLFluff are available from the dbt Cloud CLI. @@ -154,7 +154,8 @@ When running `dbt sqlfluff` from the dbt Cloud CLI, the following are important - An SQLFluff command will return an exit code of 0 if it ran with any file violations. This dbt behavior differs from SQLFluff behavior, where a linting violation returns a non-zero exit code. dbt Labs plans on addressing this in a later release. ## FAQs - + + If you've never had a `.dbt` directory, you should perform the following recommended steps to create one. If you already have a `.dbt` directory, move the `dbt_cloud.yml` file into it. @@ -195,11 +196,12 @@ move %USERPROFILE%\Downloads\dbt_cloud.yml %USERPROFILE%\.dbt\dbt_cloud.yml This command moves the `dbt_cloud.yml` from the `Downloads` folder to the `.dbt` folder. If your `dbt_cloud.yml` file is located elsewhere, adjust the path accordingly. 
- + - + By default, [all artifacts](/reference/artifacts/dbt-artifacts) are downloaded when you execute dbt commands from the dbt Cloud CLI. To skip these files from being downloaded, add `--download-artifacts=false` to the command you want to run. This can help improve run-time performance but might break workflows that depend on assets like the [manifest](/reference/artifacts/manifest-json). + - + diff --git a/website/docs/docs/collaborate/data-tile.md b/website/docs/docs/collaborate/data-tile.md index 077a4f5a740..23c12f54578 100644 --- a/website/docs/docs/collaborate/data-tile.md +++ b/website/docs/docs/collaborate/data-tile.md @@ -27,7 +27,8 @@ Data health tiles rely on [exposures](/docs/build/exposures) to surface data hea - You must have a dbt Cloud account on a [Team or Enterprise plan](https://www.getdbt.com/pricing/). - You must be an account admin to set up [service tokens](/docs/dbt-cloud-apis/service-tokens#permissions-for-service-account-tokens). - You must have [develop permissions](/docs/cloud/manage-access/seats-and-users). -- Have [exposures](/docs/build/exposures) configured in your project and [source freshness](/docs/deploy/source-freshness) enabled in the job that generates this exposure. +- Have [exposures](/docs/build/exposures) defined in your project and [source freshness](/docs/deploy/source-freshness) enabled in the job that generates this exposure. + - The exposure used for the data health tile must have the [`type` property](/docs/build/exposures#available-properties) set to `dashboard`. Otherwise, you won't be able to view the **Embed data health tile in your dashboard** dropdown in dbt Explorer. ## View exposure in dbt Explorer @@ -61,15 +62,22 @@ Follow these steps to set up your data health tile: 5. Copy the **Metadata Only** token and save it in a secure location. You'll need it token in the next steps. 6. Navigate back to dbt Explorer and select an exposure. + + :::tip + The exposure used for the data health tile must have the [`type` property](/docs/build/exposures#available-properties) set to `dashboard`. Otherwise, you won't be able to view the **Embed data health tile in your dashboard** dropdown in dbt Explorer. + ::: + 7. Below the **Data health** section, expand on the toggle for instructions on how to embed the exposure tile (if you're an account admin with develop permissions). 8. In the expanded toggle, you'll see a text field where you can paste your **Metadata Only token**. 9. Once you’ve pasted your token, you can select either **URL** or **iFrame** depending on which you need to add to your dashboard. + + If your analytics tool supports iFrames, you can embed the dashboard tile within it. -### Examples +## Examples The following examples show how to embed the data health tile in Tableau and PowerBI. diff --git a/website/docs/docs/dbt-versions/release-notes.md b/website/docs/docs/dbt-versions/release-notes.md index bda22baa3ab..651c307c9bf 100644 --- a/website/docs/docs/dbt-versions/release-notes.md +++ b/website/docs/docs/dbt-versions/release-notes.md @@ -18,7 +18,8 @@ Release notes are grouped by month for both multi-tenant and virtual private clo ## January 2025 - +- **Enhancement**: The dbt Semantic Layer now fully supports the [`--favor-state` flag](/docs/cloud/about-cloud-develop-defer) when used with `defer` in the dbt Cloud IDE. This enhancement allows you to always resolve `{{ ref() }}` functions using staging or production metadata, ignoring any development version. 
+- **New**: Added the `dbt invocation` command to the [dbt Cloud CLI](/docs/cloud/cloud-cli-installation). This command allows you to view and manage active invocations, the sessions currently running in the dbt Cloud CLI. For more information, see [dbt invocation](/reference/commands/invocation). - **New**: Users can now switch themes directly from the user menu, available [in Preview](/docs/dbt-versions/product-lifecycles#dbt-cloud). We have added support for **Light mode** (default), **Dark mode**, and automatic theme switching based on system preferences. The selected theme is stored in the user profile and will follow users across all devices. - Dark mode is currently available on the Developer plan and will be available for all [plans](https://www.getdbt.com/pricing) in the future. We’ll be rolling it out gradually, so stay tuned for updates. For more information, refer to [Change your dbt Cloud theme](/docs/cloud/about-cloud/change-your-dbt-cloud-theme). - **Fix**: dbt Semantic Layer errors in the Cloud IDE are now displayed with proper formatting, fixing an issue where newlines appeared broken or difficult to read. This fix ensures error messages are more user-friendly and easier to parse. diff --git a/website/docs/faqs/Troubleshooting/long-sessions-cloud-cli.md b/website/docs/faqs/Troubleshooting/long-sessions-cloud-cli.md new file mode 100644 index 00000000000..22c9929b488 --- /dev/null +++ b/website/docs/faqs/Troubleshooting/long-sessions-cloud-cli.md @@ -0,0 +1,14 @@ +--- +title: "Why am I getting a \"Session occupied\" error in the dbt Cloud CLI?" +description: "How to debug long-running sessions in dbt Cloud CLI" +sidebar_label: 'Debug long-running sessions in dbt Cloud CLI' +id: long-sessions-cloud-cli +--- + +If you're receiving a `Session occupied` error in the dbt Cloud CLI or if you're experiencing a long-running session, you can use the `dbt invocation list` command in a separate terminal window to view the status of your active session. This helps debug the issue and identify the arguments that are causing the long-running session. + +To cancel an active session, use the `Ctrl + Z` shortcut. + +To learn more about the `dbt invocation` command, see the [dbt invocation command reference](/reference/commands/invocation). + +Alternatively, you can reattach to your existing session with `dbt reattach`, then press `Ctrl + C` and choose to cancel the invocation. diff --git a/website/docs/reference/commands/invocation.md b/website/docs/reference/commands/invocation.md new file mode 100644 index 00000000000..1961b2555da --- /dev/null +++ b/website/docs/reference/commands/invocation.md @@ -0,0 +1,92 @@ +--- +title: "About dbt invocation command" +sidebar_label: "invocation" +id: invocation +--- + +The `dbt invocation` command is available in the [dbt Cloud CLI](/docs/cloud/cloud-cli-installation) and allows you to: +- List active invocations to debug long-running or hanging invocations. +- Identify and investigate sessions causing the `Session occupied` error. +- Monitor currently active dbt commands (like `run`, `build`) in real time. + +The `dbt invocation` command only lists _active invocations_. If no sessions are running, the list will be empty. Completed sessions aren't included in the output. + +## Usage + +This page lists the commands you can use with `dbt invocation`. To use them, run `dbt invocation [command]`. + +The available commands are [`help`](#dbt-invocation-help) and [`list`](#dbt-invocation-list).
+ +### dbt invocation help + +The `help` command provides you with the help output for the `invocation` command in the CLI, including the available flags. + +```shell +dbt invocation help +``` + +or + +```shell +dbt help invocation +``` + +The command returns the following information: + +```bash +dbt invocation help +Manage invocations + +Usage: + dbt invocation [command] + +Available Commands: + list List active invocations + +Flags: + -h, --help help for invocation + +Global Flags: + --log-format LogFormat The log format, either json or plain. (default plain) + --log-level LogLevel The log level, one of debug, info, warning, error or fatal. (default info) + --no-color Disables colorization of the output. + -q, --quiet Suppress all non-error logging to stdout. + +Use "dbt invocation [command] --help" for more information about a command. +``` + +### dbt invocation list + +The `list` command provides you with a list of active invocations in your dbt Cloud CLI. When a long-running session is active, you can use this command in a separate terminal window to view the active session to help debug the issue. + +```shell +dbt invocation list +``` + +The command returns the following information, including the `ID`, `status`, `type`, `arguments`, and `started at` time of the active session: + +```bash +dbt invocation list + +Active Invocations: + ID 6dcf4723-e057-48b5-946f-a4d87e1d117a + Status running + Type cli + Args [run --select test.sql] + Started At 2025-01-24 11:03:19 + +➜ jaffle-shop git:(test-cli) ✗ +``` + +:::tip + +To cancel an active session in the terminal, use the `Ctrl + Z` shortcut. + +::: + +## Related docs + +- [Install dbt Cloud CLI](/docs/cloud/cloud-cli-installation) +- [Troubleshooting dbt Cloud CLI 'Session occupied' error](/faqs/Troubleshooting/long-sessions-cloud-cli) + + diff --git a/website/docs/reference/dbt-commands.md b/website/docs/reference/dbt-commands.md index 9cbc5e5e38b..bdaa74e1f3b 100644 --- a/website/docs/reference/dbt-commands.md +++ b/website/docs/reference/dbt-commands.md @@ -45,6 +45,7 @@ Commands with a ('❌') indicate write commands, commands with a ('✅') indicat | [environment](/reference/commands/dbt-environment) | Enables you to interact with your dbt Cloud environment. | N/A | dbt Cloud CLI
Requires [dbt v1.5 or higher](/docs/dbt-versions/core) | | help | Displays help information for any command | N/A | dbt Core, dbt Cloud CLI
All [supported versions](/docs/dbt-versions/core) | | [init](/reference/commands/init) | Initializes a new dbt project | ✅ | dbt Core
All [supported versions](/docs/dbt-versions/core) | +| [invocation](/reference/commands/invocation) | Enables users to debug long-running sessions by interacting with active invocations. | N/A | dbt Cloud CLI
Requires [dbt v1.5 or higher](/docs/dbt-versions/core) | | [list](/reference/commands/list) | Lists resources defined in a dbt project | ✅ | All tools
All [supported versions](/docs/dbt-versions/core) | | [parse](/reference/commands/parse) | Parses a project and writes detailed timing info | ✅ | All tools
All [supported versions](/docs/dbt-versions/core) | | reattach | Reattaches to the most recent invocation to retrieve logs and artifacts. | N/A | dbt Cloud CLI
Requires [dbt v1.6 or higher](/docs/dbt-versions/core) | diff --git a/website/docs/reference/resource-configs/full_refresh.md b/website/docs/reference/resource-configs/full_refresh.md index 5e291fa2454..46b1d81259d 100644 --- a/website/docs/reference/resource-configs/full_refresh.md +++ b/website/docs/reference/resource-configs/full_refresh.md @@ -58,20 +58,17 @@ seeds:
-- If `full_refresh:true` — the configured resources(s) will full-refresh when `dbt run --full-refresh` is invoked. -- If `full_refresh:false` — the configured resources(s) will _not_ full-refresh when `dbt run --full-refresh` is invoked. - - ## Description The `full_refresh` config allows you to optionally configure whether a resource will always or never perform a full-refresh. This config is an override for the `--full-refresh` command line flag used when running dbt commands. +You can set the `full_refresh` config in the `dbt_project.yml` file or in a resource config. | `full_refresh` value | Behavior | | ---------------------------- | -------- | -| `true` | The resource always full-refreshes, regardless of the presence or absence of the `--full-refresh` flag. | -| `false` | The resource never full-refreshes, even if the `--full-refresh` flag is provided. | -| `none` or omitted | The resource follows the behavior of the `--full-refresh` flag. If the flag is used, the resource will full-refresh; otherwise, it won't. | +| `true` | The resource _always_ performs a full refresh, regardless of whether you pass the `--full-refresh` flag in the dbt command. | +| `false` | The resource _never_ performs a full refresh, regardless of whether you pass the `--full-refresh` flag in the dbt command. | +| `none` or omitted | The resource follows the behavior of the `--full-refresh` flag. If the flag is used, the resource will perform a full refresh; otherwise, it will not. | #### Note - The `--full-refresh` flag also supports a short name, `-f`. diff --git a/website/sidebars.js b/website/sidebars.js index c1b12f293ec..a69bec01e49 100644 --- a/website/sidebars.js +++ b/website/sidebars.js @@ -1074,6 +1074,7 @@ const sidebarSettings = { "reference/commands/deps", "reference/commands/dbt-environment", "reference/commands/init", + "reference/commands/invocation", "reference/commands/list", "reference/commands/parse", "reference/commands/retry", diff --git a/website/static/img/blog/2025-01-24-sql-comprehension-technologies/binder.png b/website/static/img/blog/2025-01-24-sql-comprehension-technologies/binder.png new file mode 100644 index 00000000000..9c0ed824c89 Binary files /dev/null and b/website/static/img/blog/2025-01-24-sql-comprehension-technologies/binder.png differ diff --git a/website/static/img/blog/2025-01-24-sql-comprehension-technologies/compiler.png b/website/static/img/blog/2025-01-24-sql-comprehension-technologies/compiler.png new file mode 100644 index 00000000000..dac2d5041d8 Binary files /dev/null and b/website/static/img/blog/2025-01-24-sql-comprehension-technologies/compiler.png differ diff --git a/website/static/img/blog/2025-01-24-sql-comprehension-technologies/full_translation_flow.png b/website/static/img/blog/2025-01-24-sql-comprehension-technologies/full_translation_flow.png new file mode 100644 index 00000000000..baa9b49a4dc Binary files /dev/null and b/website/static/img/blog/2025-01-24-sql-comprehension-technologies/full_translation_flow.png differ diff --git a/website/static/img/blog/2025-01-24-sql-comprehension-technologies/logical_plan.png b/website/static/img/blog/2025-01-24-sql-comprehension-technologies/logical_plan.png new file mode 100644 index 00000000000..02c7e2765e9 Binary files /dev/null and b/website/static/img/blog/2025-01-24-sql-comprehension-technologies/logical_plan.png differ diff --git a/website/static/img/blog/2025-01-24-sql-comprehension-technologies/parser.png
b/website/static/img/blog/2025-01-24-sql-comprehension-technologies/parser.png new file mode 100644 index 00000000000..ae307ccd86e Binary files /dev/null and b/website/static/img/blog/2025-01-24-sql-comprehension-technologies/parser.png differ diff --git a/website/static/img/blog/2025-01-24-sql-comprehension-technologies/sentence_syntax_tree.png b/website/static/img/blog/2025-01-24-sql-comprehension-technologies/sentence_syntax_tree.png new file mode 100644 index 00000000000..fe89153416c Binary files /dev/null and b/website/static/img/blog/2025-01-24-sql-comprehension-technologies/sentence_syntax_tree.png differ diff --git a/website/static/img/blog/2025-01-24-sql-comprehension-technologies/sql_syntax_tree.png b/website/static/img/blog/2025-01-24-sql-comprehension-technologies/sql_syntax_tree.png new file mode 100644 index 00000000000..c3abcffcc2b Binary files /dev/null and b/website/static/img/blog/2025-01-24-sql-comprehension-technologies/sql_syntax_tree.png differ