Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consolidate MapAccess, and Subscript into CompoundExpr to handle the complex field access chain #1551

Merged
merged 34 commits into from
Dec 22, 2024

Conversation

goldmedal
Copy link
Contributor

@goldmedal goldmedal commented Nov 25, 2024

Description

close #1533

This PR introduces a new Expr, Expr::CompoundExpr, which handles the complex field access chain like a.b[1].c or func()[1]. As the previous discussion #1541 (comment), I found we have many similar expressions for the similar purpose (e.g. MapAccess, Subscript, JsonAcess, ...). CompoundExpr tries to represent a structure for all of them. As the suggestion from @iffyio (#1541 (comment)), CompoundExpr is a linear structure, it has a root expression and a list of access fields, instead of a nested representation.

    /// Multi-part Expression accessing. It's used to represent an access chain from a root expression.
    /// e.g. `expr[0]`, `expr[0][0]`, or `expr.field1.filed2[1].field3`, ...
    CompoundExpr {
        root: Box<Expr>,
        chain: Vec<AccessField>,
    },

To avoid too many breaking changes, this PR only replaces Expr::MapAccess and Expr::Subscript with Expr::CompoundExpr. Potentially, it can represent more things.

Array

A 1-dim array a[1] will be represented like

CompoundExpr(Ident('a'), vec![Subscript(1)]

A 2-dim array a[1][2] will be represented like

CompoundExpr(Ident('a'), vec![Subscript(1), Subscript(2)]

Map or Struct (Bracket-style)

A map a['fied1'] will be represented like

CompoundExpr(Ident('a'), vec![Subscript('field')]

A 2-dim map a['field1']['field2'] will be represented like

CompoundExpr(Ident('a'), vec![Subscript('field1'), Subscript('field2')]

Struct (Period-style) (only effect when the chain contains both subscript and expr)

A struct access a[field1].field2 will be represented

CompoundExpr(Ident('a'), vec![Subscript('field1'), Ident('field2')]

If a struct access likes a.field1.field2, it will be represented by CompoundIdentifer([a, field1, field2])

Breaking Change

  • Expr::Subscript and Expr::MapAccess are removed.
  • The result of the array, map, or struct will be changed.

Additional context

@goldmedal goldmedal changed the title Remove MapAccess and handle the field access chain Introduce CompoundExpr to handle the complex field access chain Dec 4, 2024
@goldmedal goldmedal marked this pull request as ready for review December 4, 2024 18:04
@goldmedal
Copy link
Contributor Author

@iffyio This PR is ready for review. Could you take a look at it? Thanks 🙇
This PR involves some breaking changes to the array syntax. I'd appreciate it if @alamb could check it as well. 🙇

src/ast/mod.rs Outdated Show resolved Hide resolved
src/ast/mod.rs Outdated Show resolved Hide resolved
src/ast/mod.rs Outdated Show resolved Hide resolved
src/ast/mod.rs Outdated Show resolved Hide resolved
src/ast/mod.rs Outdated Show resolved Hide resolved
src/lib.rs Outdated Show resolved Hide resolved
src/parser/mod.rs Outdated Show resolved Hide resolved
} else {
let mut expr = self.parse_function(ObjectName(id_parts))?;
// consume all period if it's a method chain
if self.dialect.supports_methods() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try_parse_method already does the if self.dialect.supports_methods() check so that we should be able to skip the if condition?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, if this condition was removed, the test for method parsing will fail:

    parse_method_expr
    parse_method_select

Because the following parse_compound_expr will try to consume all the dots for an expression, we need to parse the method here to avoid consume ( in parse_compound_expr.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, to clarify what I meant was that try_parse_method does this already

if !self.dialect.supports_methods() {
	return Ok(expr);
}

So that this can be simplified as following (i.e without the extra if self.dialect.supports_methods())

expr = self.try_parse_method(expr)?;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. It's more simple. Thanks!

src/parser/mod.rs Outdated Show resolved Hide resolved
src/parser/mod.rs Show resolved Hide resolved
@alamb alamb changed the title Introduce CompoundExpr to handle the complex field access chain Consolidate MapAccess, and Subscript into CompoundExpr to handle the complex field access chain Dec 6, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @goldmedal and @iffyio

I didn't study the parser changes closely, but I reviewed the Expr representation and left some comments.

In terms of the potential downstream impacts (and avoiding regressions), one thing that I suggest to help evaluate the potential downstream impact is of this trying it out in DataFusion and see if it works / passes all the tests;

You could potentially use apache/datafusion#13546 as a base to test with

src/ast/mod.rs Outdated Show resolved Hide resolved
src/ast/mod.rs Outdated Show resolved Hide resolved
src/ast/mod.rs Outdated Show resolved Hide resolved
src/ast/mod.rs Outdated Show resolved Hide resolved
src/ast/mod.rs Outdated Show resolved Hide resolved
tests/sqlparser_common.rs Show resolved Hide resolved
tests/sqlparser_common.rs Outdated Show resolved Hide resolved
@goldmedal
Copy link
Contributor Author

Thanks, @alamb and @iffyio for reviewing.

In terms of the potential downstream impacts (and avoiding regressions), one thing that I suggest to help evaluate the potential downstream impact is of this trying it out in DataFusion and see if it works / passes all the tests;

You could potentially use apache/datafusion#13546 as a base to test with

Sounds good 👍 I'll test this PR based on it tomorrow.

@goldmedal
Copy link
Contributor Author

Thanks, @alamb and @iffyio for reviewing.

In terms of the potential downstream impacts (and avoiding regressions), one thing that I suggest to help evaluate the potential downstream impact is of this trying it out in DataFusion and see if it works / passes all the tests;
You could potentially use apache/datafusion#13546 as a base to test with

Sounds good 👍 I'll test this PR based on it tomorrow.

Created apache/datafusion#13734 for testing. The good news is that It only impacts a few parts of the planner and the unparser 👍

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @goldmedal -- this looks good to me. 🙏

@alamb
Copy link
Contributor

alamb commented Dec 11, 2024

Given the potential for non trivial downstream conflicts due to this change (look at the list of conflicts it has already collected) I would like to merge it in right after we have done the next release

Copy link
Contributor

@iffyio iffyio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @goldmedal! Left some minor comments, the changes look good to me!

@@ -3167,42 +3287,23 @@ impl<'a> Parser<'a> {

pub fn parse_map_access(&mut self, expr: Expr) -> Result<Expr, ParserError> {
let key = self.parse_expr()?;
let result = match key {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm I was initially wondering about this match since it looked incorrect to restrict the key expression type. e.g. this BigQuery test case. But then I realised that test case is passing because it now takes a different codepath via parse_compound_field_access. And so I wonder if there are any scenarios that rely on this method anymore, if there aren't it seems like we might be able to remove it entirely?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. We can remove it entirely.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed by 5c54d1b

src/parser/mod.rs Show resolved Hide resolved
} else {
let mut expr = self.parse_function(ObjectName(id_parts))?;
// consume all period if it's a method chain
if self.dialect.supports_methods() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, to clarify what I meant was that try_parse_method does this already

if !self.dialect.supports_methods() {
	return Ok(expr);
}

So that this can be simplified as following (i.e without the extra if self.dialect.supports_methods())

expr = self.try_parse_method(expr)?;

}
}

fn exprs_to_idents(root: &Expr, fields: &[AccessExpr]) -> Option<Vec<Ident>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add a description of what the function does?

}
}

fn exprs_to_idents(root: &Expr, fields: &[AccessExpr]) -> Option<Vec<Ident>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since the caller wouldn't need the copy of root and fields it would be nice to avoid the cloning? maybe something like this could work?

fn exprs_to_idens(root: Expr, fields: Vec<AccessExprs>) -> Result<Vec<Ident>, (Expr, Vec<AccessExprs>)>

so that if no conversion was made the caller gets back the same values they provided?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch. I think we can save the cloning by separating it into two functions. One for check and another for transfer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed by 57830e2

Comment on lines 1490 to 1498
if dialect_of!(self is SnowflakeDialect | MsSqlDialect)
&& self.consume_tokens(&[Token::Plus, Token::RParen])
{
Ok(Expr::OuterJoin(Box::new(
match <[Ident; 1]>::try_from(id_parts) {
Ok([ident]) => Expr::Identifier(ident),
Err(parts) => Expr::CompoundIdentifier(parts),
},
)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be nice if we could reuse this logic somehow in a function? since its not trivial and lists dialects, now that its duplicated it could be easy to miss if changes are made to one but not the other?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed by 4b3818c

@@ -3095,15 +3201,29 @@ impl<'a> Parser<'a> {
})
}

/// Parse an multi-dimension array accessing like `[1:3][1][1]`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Parse an multi-dimension array accessing like `[1:3][1][1]`
/// Parses a multi-dimension array accessing like `[1:3][1][1]`

@@ -3095,15 +3201,29 @@ impl<'a> Parser<'a> {
})
}

/// Parse an multi-dimension array accessing like `[1:3][1][1]`
///
/// Parser is right after the first `[`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given this is a pub function it would be nice if the api was symmetric so that the function also consumed the leading [? I think within the parser itself that would mean calling prev_token() or replacing with peek() before invoking this function which would be a fair price to pay?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed by 67cd877

@goldmedal goldmedal requested a review from iffyio December 16, 2024 09:50
Copy link
Contributor

@iffyio iffyio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks @goldmedal!

@goldmedal
Copy link
Contributor Author

Given the potential for non trivial downstream conflicts due to this change (look at the list of conflicts it has already collected) I would like to merge it in right after we have done the next release

@alamb @iffyio, 0.53.0 has been released. Could we merge this PR?
I want to merge it to avoid some potential conflicts.

@iffyio iffyio merged commit 0647a4a into apache:main Dec 22, 2024
8 checks passed
@goldmedal goldmedal deleted the feature/1533-dereference-expr-v3 branch December 22, 2024 14:31
@iffyio
Copy link
Contributor

iffyio commented Dec 22, 2024

Thanks for working on this @goldmedal!

@goldmedal
Copy link
Contributor Author

Thanks @iffyio and @alamb 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve the complex field access chain for the struct type
3 participants