Update TSQL grammar so that INTERSECT precedence is handled there #1259

Draft
wants to merge 1 commit into
base: main

@asnare (Contributor) commented Nov 29, 2024

This PR updates the TSQL grammar and accompanying IR processing so that the precedence of INTERSECT is handled by the grammar instead of during transportation to our IR.
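To illustrate what "precedence handled by the grammar" means in practice, here is a minimal Python sketch of a precedence-climbing parser for set operators. It is not the project's code: the `PRECEDENCE` table, the function names, and the tuple-based tree are assumptions made for the example, and single names stand in for full SELECT statements. It encodes the T-SQL rule that INTERSECT is evaluated before UNION and EXCEPT, which are evaluated left to right.

```python
import re

# Binding strength of each set operator: higher binds tighter.
# INTERSECT is evaluated before UNION and EXCEPT, which share a level.
PRECEDENCE = {"UNION": 1, "EXCEPT": 1, "INTERSECT": 2}


def tokenize(text):
    """Split a toy query into parentheses and upper-cased words."""
    return re.findall(r"[()]|\w+", text.upper())


def parse_primary(tokens):
    """A parenthesised expression, or a single name standing in for a SELECT."""
    token = tokens.pop(0)
    if token == "(":
        inner = parse(tokens)
        tokens.pop(0)  # consume the closing parenthesis
        return inner
    return token


def parse(tokens, min_prec=1):
    """Precedence-climbing parser; returns a nested (op, left, right) tuple."""
    left = parse_primary(tokens)
    while tokens and tokens[0] in PRECEDENCE and PRECEDENCE[tokens[0]] >= min_prec:
        keyword = tokens.pop(0)
        op = keyword
        if keyword == "UNION" and tokens and tokens[0] == "ALL":
            tokens.pop(0)
            op = "UNION ALL"
        # Left associativity: the right operand may only contain
        # operators that bind strictly tighter than this one.
        right = parse(tokens, PRECEDENCE[keyword] + 1)
        left = (op, left, right)
    return left


tree = parse(tokenize("a UNION b INTERSECT c"))
```

With this table, `a UNION b INTERSECT c` groups as `a UNION (b INTERSECT c)`, so the tree already has the right shape when it is handed to a later stage and no regrouping is needed there.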

Resolves #1225.

@asnare added labels on Nov 29, 2024: feat/ir (everything related to abstract syntax trees), tech debt (design flaws and other cascading effects), sql/tsql, antlr (Changes to any of the ANTLR g4 grammar files).
@asnare self-assigned this Nov 29, 2024

Coverage tests results

0 tests   - 464   0 ✅  - 427   0s ⏱️ -4s
0 suites  -   6   0 💤  -  37 
0 files    -   6   0 ❌ ±  0 

Results for commit 71e160c. ± Comparison against base commit 220b303.

@@ -2789,14 +2789,21 @@ predicate
;

queryExpression
: unionExpression sqlIntersection*
Contributor

I suspect this and the rules below can be greatly simplified using something like:

queryExpression
 : querySpecification
 | queryExpression UNION ALL queryExpression
 | queryExpression INTERSECT queryExpression
etc...

Contributor Author

@ericvergnaud: This is interesting. I don't fully understand why the grammar is as it is. I had already come up with this, which also seems to be correct and is closer to what I would think of as the 'classical' way to express these rules:

queryExpression
    : LPAREN queryExpression RPAREN
    | queryExpression (UNION ALL? | EXCEPT) queryExpression
    | queryExpression INTERSECT queryExpression
    | querySpecification
    ;

Are there performance/implementation reasons for preferring the existing folding (??) style over this?
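One detail worth checking with a rule of this shape: to my understanding, ANTLR 4 resolves operator ambiguity in a left-recursive rule in favour of the alternative listed first, so earlier alternatives bind tighter. The Python sketch below only models that ordering convention (it does not call ANTLR, and `precedence_from_alternatives` is a hypothetical helper) to show how the order of the alternatives would translate into binding strength.

```python
def precedence_from_alternatives(alternatives):
    """Map each operator to a binding strength; earlier alternatives bind tighter.

    This models the assumed ANTLR 4 convention for left-recursive rules.
    """
    levels = {}
    for position, operators in enumerate(alternatives):
        strength = len(alternatives) - position
        for op in operators:
            levels[op] = strength
    return levels


# Order as written in the rule above: UNION/EXCEPT listed before INTERSECT.
as_written = precedence_from_alternatives([("UNION", "EXCEPT"), ("INTERSECT",)])

# Same operators with the INTERSECT alternative listed first.
intersect_first = precedence_from_alternatives([("INTERSECT",), ("UNION", "EXCEPT")])
```

If that convention holds, the INTERSECT alternative would need to come before the UNION/EXCEPT alternative for INTERSECT to bind tighter, which is worth confirming against the ANTLR documentation and a test query.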

@jimidle (Contributor) commented Nov 29, 2024

This is probably correct. Precedence is generally best handled in a single rule, where it is explicit; if you then label the alts, the visitor becomes simpler as well.

Let me look at the existing grammar

@asnare (Contributor Author) commented Nov 29, 2024

Thanks @jimidle. @vil1 indeed suggested:

queryExpression
    : LPAREN queryExpression RPAREN                                         #inParen
    | left = queryExpression (UNION ALL? | EXCEPT) right = queryExpression  #union
    | left = queryExpression INTERSECT right = queryExpression              #intersect
    | querySpecification                                                    #simple
    ;

@jimidle (Contributor) commented Nov 29, 2024

Generally, don't use the labels left/right, as they generate more code; ctx.queryExpression(0) and ctx.queryExpression(1) are more efficient. See the README about working with the ANTLR grammar. Labels are useful for disambiguation, but even then splitting rules or labeling the alts might be better.
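As a rough illustration of positional access, the Python sketch below uses a hand-written stand-in for a generated context class. Everything in it is hypothetical: `QueryExpressionContext` here is a simplified mock rather than ANTLR output, and `visit_intersect` is an invented visitor function. It only shows the two operands being read by index instead of through `left`/`right` labels.

```python
class QueryExpressionContext:
    """Simplified, hand-written stand-in for a generated context (hypothetical)."""

    def __init__(self, text, operands=()):
        self.text = text
        self._operands = list(operands)

    def queryExpression(self, i):
        # Mimics an indexed accessor: returns the i-th nested queryExpression.
        return self._operands[i]


def visit_intersect(ctx):
    """Build a tuple node from the two operands, addressed by position."""
    left = ctx.queryExpression(0)
    right = ctx.queryExpression(1)
    return ("INTERSECT", left.text, right.text)


ctx = QueryExpressionContext(
    "a INTERSECT b",
    [QueryExpressionContext("a"), QueryExpressionContext("b")],
)
node = visit_intersect(ctx)
```

The visitor needs no extra fields on the context; the operand order in the rule is what gives index 0 and index 1 their meaning.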

Contributor

BTW the grammar is like this because it was just typed in by reading the MS docs, which don't encapsulate such things. You should have seen it 6 months ago ;). These are the occasions where we get to fix it up.
