From 4785c25d80836dfc82581768e60bb082e7cba7b7 Mon Sep 17 00:00:00 2001
From: Phil Rzewski
Date: Fri, 6 Sep 2024 11:49:02 -0700
Subject: [PATCH] Update docs for lateral subqueries and over operator

---
 docs/language/lateral-subqueries.md | 74 +++++++++++++++++++++++------
 docs/language/operators/over.md     | 34 ++-----------
 2 files changed, 64 insertions(+), 44 deletions(-)

diff --git a/docs/language/lateral-subqueries.md b/docs/language/lateral-subqueries.md
index e5e43f07cf..bd8d3bb446 100644
--- a/docs/language/lateral-subqueries.md
+++ b/docs/language/lateral-subqueries.md
@@ -7,11 +7,19 @@ sidebar_label: Lateral Subqueries
 Lateral subqueries provide a powerful means to apply a Zed query to each subsequence of values generated from an outer sequence of values.
-The inner query may be _any Zed query_ and may refer to values from
+The inner query may be _any_ dataflow operator sequence (excluding
+[`from` operators](operators/from.md)) and may refer to values from
 the outer sequence.
+:::tip Note
+This pattern rhymes with the SQL pattern of a "lateral
+join", which runs a SQL subquery for each row of the outer query's table.
+:::
+
 Lateral subqueries are created using the scoped form of the
-[`over` operator](operators/over.md) and may be nested to arbitrary depth.
+[`over` operator](operators/over.md). They may be nested to arbitrary depth
+and accesses to variables in parent lateral query bodies follows lexical
+scoping.
 For example,
 ```mdtest-command
@@ -24,7 +32,7 @@ produces
 {name:"foo",elem:2}
 {name:"bar",elem:3}
 ```
-Here the lateral scope, described below, creates a subquery
+Here the [lateral scope](#lateral-scope), described below, creates a subquery
 ```
 yield {name,elem:this}
 ```
@@ -41,7 +49,7 @@ The first subquery thus operates on the input values `1, 2` with the variable
 {name:"foo",elem:1}
 {name:"foo",elem:2}
 ```
-and the second subquery operators on the input value `3` with the variable
+and the second subquery operates on the input value `3` with the variable
 `name` set to "bar", emitting
 ```
 {name:"bar",elem:3}
@@ -81,17 +89,23 @@ between each `` evaluated in the outer scope and each
 ``, which represents a new symbol in the inner scope of the ``.
 In the field reference form, a single identifier `` refers to a field in the parent scope and makes that field's value available in the lateral scope
-with the same name.
+via the same name.
+
+Note that any such variable definitions override [implied field references](dataflow-model.md#implied-field-references) of
+`this`. If a both a field named `x` and a variable named `x` need be
+referenced in the lateral scope, the field reference should be qualified as
+`this.x` while the variable is referenced simply as `x`.
-The ``, which may be any Zed query, is evaluated once per outer value
+The `` is evaluated once per outer value
 on the sequence generated by the `over` expression. In the lateral scope, the value `this` refers to the inner sequence generated from the `over` expressions. This query runs to completion for each inner sequence and emits each subquery result as each inner sequence traversal completes.
-This structure is powerful because _any_ Zed query can appear in the body of
-the lateral scope. In contrast to the `yield` example, a sort could be
-applied to each subsequence in the subquery, where sort
+This structure is powerful because _any_ dataflow operator sequence (excluding
+[`from` operators](operators/from.md)) can appear in the body of
+the lateral scope. In contrast to the [`yield`](operators/yield.md) example above, a [`sort`](operators/sort.md) could be
+applied to each subsequence in the subquery, where `sort`
 reads all values of the subsequence, sorts them, emits them, then repeats the process for the next subsequence. For example,
 ```mdtest-command
@@ -112,13 +126,13 @@ parenthesized form:
 ```
 ( over [, ...] [with = [, ... [=]] | )
 ```
-> Note that the parentheses disambiguate a lateral expression from a lateral
-> dataflow operator.
-This form must always include a lateral scope as indicated by ``,
-which can be any dataflow operator sequence excluding [`from` operators](operators/from.md).
-As with the `over` operator, values from the outer scope can be brought into
-the lateral scope using the `with` clause.
+:::tip
+The parentheses disambiguate a lateral expression from a lateral
+dataflow operator.
+:::
+
+This form must always include a [lateral scope](#lateral-scope) as indicated by ``.
 The lateral expression is evaluated by evaluating each `` and feeding the results as inputs to the `` dataflow operators. Each time the
@@ -148,3 +162,33 @@ produces
 {sorted:[1,4,7],sum:12}
 {sorted:[1,2,3],sum:6}
 ```
+Because Zed expressions evaluate to a single result, if multiple values remain
+at the conclusion of the lateral dataflow, they are automatically wrapped in
+an array, e.g.,
+```mdtest-command
+echo '{x:1} {x:[2]} {x:[3,4]}' |
+ zq -z 'yield {s:(over x | yield this+1)}' -
+```
+produces
+```mdtest-output
+{s:2}
+{s:3}
+{s:[4,5]}
+```
+To handle such dynamic input data, you can ensure your downstream dataflow
+always receives consistently packaged values by explicitly wrapping the result
+of the lateral scope, e.g.,
+```mdtest-command
+echo '{x:1} {x:[2]} {x:[3,4]}' |
+ zq -z 'yield {s:(over x | yield this+1 | collect(this))}' -
+```
+produces
+```mdtest-output
+{s:[2]}
+{s:[3]}
+{s:[4,5]}
+```
+Similarly, a primitive value may be consistently produced by concluding the
+lateral scope with an operator such as [`head`](operators/head.md) or
+[`tail`](operators/tail.md), or by applying certain [aggregate functions](aggregates/README.md)
+such as done with [`sum`](aggregates/sum.md) above.
diff --git a/docs/language/operators/over.md b/docs/language/operators/over.md
index 2ec242adee..1793b6804e 100644
--- a/docs/language/operators/over.md
+++ b/docs/language/operators/over.md
@@ -12,45 +12,21 @@ The `over` operator traverses complex values to create a new sequence
 of derived values (e.g., the elements of an array) and either (in the first form) sends the new values directly to its output or (in the second form) sends the values to a scoped computation as indicated
-by ``, which may represent any Zed subquery operating on the
-derived sequence of values as `this`.
+by ``, which may represent any Zed [subquery](../lateral-subqueries.md) operating on the
+derived sequence of values as [`this`](../dataflow-model.md#the-special-value-this).
 Each expression `` is evaluated in left-to-right order and derived sequences are generated from each such result depending on its types:
-* an array value generates each of its element,
+* an array value generates each of its elements,
 * a map value generates a sequence of records of the form `{key:,value:}` for each entry in the map, and
 * all other values generate a single value equal to itself.
-Records can be converted to maps with the [_flatten_ function](../functions/flatten.md)
+Records can be converted to maps with the [`flatten` function](../functions/flatten.md)
 resulting in a map that can be traversed, e.g., if `this` is a record, it can be traversed with `over flatten(this)`.
-The nested subquery depicted as `` is called a "lateral query" as the
-outer query operates on the top-level sequence of values while the lateral
-query operates on subsequences of values derived from each input value.
-This pattern rhymes with the SQL pattern of a "lateral join", which runs a
-SQL subquery for each row of the outer query's table.
-
-In a Zed lateral query, each input value induces a derived subsequence and
-for each such input, the lateral query runs to completion and yields its results.
-In this way, operators like `sort` and `summarize`, which operate on their
-entire input, run to completion for each subsequence and yield to the output the
-lateral result set for each outer input as a sequence of values.
-
-Within the lateral query, `this` refers to the values of the subsequence thereby
-preventing lateral expressions from accessing the outer `this`.
-To accommodate such references, the _over_ operator includes a _with_ clause
-that binds arbitrary expressions evaluated in the outer scope
-to variables that may be referenced by name in the lateral scope.
-
-> Note that any such variable definitions override implied field references
-> of `this`. If a both a field named "x" and a variable named "x" need be
-> referenced in the lateral scope, the field reference should be qualified as `this.x`
-> while the variable is referenced simply as `x`.
-
-Lateral queries may be nested to arbitrary depth and accesses to variables
-in parent lateral query bodies follows lexical scoping.
+The nested subquery depicted as `` is called a [lateral subquery](../lateral-subqueries.md).
 ### Examples
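
The traversal and wrapping rules this patch documents can be modeled in a few lines of Python. This is only an illustrative sketch, not Zed code and not how `zq` is implemented. The names `over_values`, `lateral_expr`, and `add_one` are hypothetical, and arrays and maps are assumed to map onto Python lists and dicts. The behavior for a lateral body that produces no results is not described in the patch, so the sketch simply returns an empty list in that case.

```python
from typing import Any, Callable, Iterable, List


def over_values(value: Any) -> List[Any]:
    """Model the sequence that `over` derives from a single value.

    Assumption of this sketch: arrays are Python lists, maps are dicts.
    """
    if isinstance(value, list):
        # An array value generates each of its elements.
        return list(value)
    if isinstance(value, dict):
        # A map value generates one {key, value} record per entry.
        return [{"key": k, "value": v} for k, v in value.items()]
    # All other values generate a single value equal to itself.
    return [value]


def lateral_expr(value: Any, body: Callable[[List[Any]], Iterable[Any]]) -> Any:
    """Model a parenthesized lateral expression such as `(over x | body)`."""
    results = list(body(over_values(value)))
    if len(results) == 1:
        # A single remaining value is used directly.
        return results[0]
    # Multiple remaining values are wrapped in an array.
    # (Zero results are not covered by the docs; this sketch returns [].)
    return results


def add_one(seq: List[Any]) -> List[Any]:
    """Stand-in for the lateral body `yield this+1`."""
    return [v + 1 for v in seq]


outputs = [{"s": lateral_expr(x, add_one)} for x in (1, [2], [3, 4])]
print(outputs)
```

Under these assumptions the three inputs yield `2`, `3`, and `[4,5]`, mirroring the `mdtest-output` block added to `lateral-subqueries.md`. A body that returns its whole input as one value, such as `lambda seq: [seq]`, plays the role of `collect(this)` and gives a list for every input.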