Add more "where" coverage in the summarize doc #5316
Merged
+37
−0
What's Changing
More examples are being proposed in the user-facing doc for the `summarize` operator to show some subtleties related to including `where` filtering with an aggregation.
Why
As part of benchmarking work, I was recently converting some SQL queries to their Zed equivalents and came across the effects shown in these examples. I'm not certain if SQL users learning Zed might be tripped up by the same, but I figure it can't hurt to call it out in the docs just in case.
Details
Here's a separate example I showed to the team at a group sync using the attached sample.csv data.
In essence, I can see that it's possible in both SQL and Zed to create an aggregation result that includes what I'll call "empty buckets":
Likewise, I can also create results in both SQL and Zed without the empty buckets:
Here's my concern, though. I expect SQL users are accustomed to seeing the pattern `SELECT... [aggregate function(s)]... GROUP BY` as "an aggregation", so when such a user comes to learn Zed, they may look for a similar pattern and see `summarize... [aggregate function(s)]... BY` as an equivalent way to express "an aggregation". And since in SQL the `where` filtering happens in the middle of "an aggregation", I suspect they may try putting the `where` in the middle of the `summarize` in Zed. But that would give them the "empty buckets" behavior, which they might not expect. Since getting the "without empty buckets" behavior in Zed requires moving the filter to a separate pipeline element before the `summarize`, this seems like something they'll want to know early in their learning of Zed.
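To make the difference between the two filter placements concrete, here is a minimal Python sketch that models the semantics only; it does not run Zed or SQL. The rows, the `keep` predicate, and both function names are hypothetical and are not taken from the attached sample.csv. Representing an empty bucket as a count of `0` is also an assumption made for illustration.

```python
# Hypothetical sample rows (not the sample.csv attached to this PR).
rows = [
    {"key": "a", "val": 5},
    {"key": "a", "val": 12},
    {"key": "b", "val": 3},
    {"key": "c", "val": 20},
]


def keep(row):
    """Hypothetical filter condition, standing in for a `where` predicate."""
    return row["val"] > 10


def filter_inside_aggregate(rows):
    """Model of a filter placed in the middle of the aggregation.

    Every key gets a bucket, but only matching rows are counted,
    so keys with no matching rows show up as empty buckets.
    """
    counts = {}
    for row in rows:
        counts.setdefault(row["key"], 0)  # bucket exists regardless of filter
        if keep(row):
            counts[row["key"]] += 1
    return counts


def filter_before_aggregate(rows):
    """Model of a filter applied as a separate step before the aggregation.

    Non-matching rows never reach the grouping step, so keys with
    no matching rows produce no bucket at all.
    """
    counts = {}
    for row in rows:
        if keep(row):
            counts[row["key"]] = counts.get(row["key"], 0) + 1
    return counts


with_empty = filter_inside_aggregate(rows)
without_empty = filter_before_aggregate(rows)

print(with_empty)     # {'a': 1, 'b': 0, 'c': 1}
print(without_empty)  # {'a': 1, 'c': 1}
```

Key `b` has no rows passing the filter, so it appears as an empty bucket only in the first result. That is the behavior a SQL user might not expect when placing the filter in the middle of the `summarize`.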