Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bring back the "|" symbol and eliminate search rakes #5436

Merged
merged 3 commits into from
Nov 5, 2024
Merged

Conversation

mccanne
Copy link
Collaborator

@mccanne mccanne commented Nov 5, 2024

This commit brings back the "|" symbol for pipeline separator while retaining the "|>" as an option. The latter is needed to exit a SQL expression back into a pipe when a SQL expression is immediately followed by a shortcut that begins with an identifier (implied-yield expression or agg function).

For any valid query without shortcuts, there is never an ambiguity between the bitwise OR "|" and pipe separator because keywords cannot be used as identifiers. The argument that such ambiguity is present made in the Google SQL pipes paper presumably assumes a one-token lookahead parser, which is not the case here.

Toward this end, we changed our grammar to disallow keywords as identifier and added the backtick-string syntax to escape any string as identifier.

We also removed search syntax as a shortcut and now require a "?" (or "search" keyword) to signal the keyword search operator. This means typos that happen to compile into unintended searches are no longer accepted by the grammar thereby eliminating the so-called search rake.

This commit brings back the "|" symbol for pipeline separator
while retaining the "|>" as an option.  The latter is needed to
exit a SQL expression back into a pipe when a SQL expression
is immediately followed by a shortcut that begins with an identifier
(implied-yield expression or agg function).

For any valid query without shortcuts, there is never an ambiguity
between the bitwise OR "|" and pipe separator because keywords cannot
be used as identifiers.  The argument that such ambiguity is
present made in the Google SQL pipes paper presumably assumes a
one-token lookahead parser, which is not the case here.

Toward this end, we changed our grammar to disallow keywords as
identifier and added the backtick-string syntax to escape any
string as identifier.

Furthermore, we arranged for the expression grammar to include
"|" for bitwise OR in SQL expression but omit it in any
non-SQL pipe operator.  This means any valid SQL queries will
continue to be valid SuperSQL queries (inclusive of bitwise OR).
Moreover, expressions in pipeline operators do not have any
pipe-character ambiguity with shortcuts because bitwise-OR is omitted.
We can add bitwise functionality as named functions in place of the
archaic C-style syntax (which can work in both SQL and pipeline
operators).  Best practice will be to use these functions over
the old bitwise syntax.

We also removed search syntax as a shortcut and now require a "?"
(or "search" keyword) to signal the keyword search operator.
This means typos that happen to compile into unintended searches
are no longer accepted by the grammar thereby eliminating the so-called
search rake.
@mccanne mccanne requested review from philrz and a team November 5, 2024 01:46
Ok as disccused, I'm pushing these changes to remove bitwise-OR
from the SQL expression syntax, thus removing the ambiguity with the
SQL shortcuts.  If/when users want to run SuperSQL queries on legacy
SQL with bitwise OR, we can put this grammar back in place but under
a flag to enable it only as needed.  In the meantime, wew ill introduce
bitwise_or(), bitwise_and(), etc named functions in subsequent PR.
compiler/parser/support.go Outdated Show resolved Hide resolved
@mccanne mccanne merged commit 5a0232a into main Nov 5, 2024
4 checks passed
@mccanne mccanne deleted the pipe-char branch November 5, 2024 19:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants