Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for BigQuery struct, array and bytes , int64, float64 datatypes #1003

Merged
merged 3 commits into from
Oct 25, 2023

Conversation

iffyio
Copy link
Contributor

@iffyio iffyio commented Oct 15, 2023

This builds on top of #817

It works around the issue of not being able to parse nested types like ARRAY<ARRAY<INT>> due to the right angle bracket ambiguity where the tokenizer chooses the right-shift operator (this affects other dialects like Hive that have similar syntax). When parsing such types, we accept a closing > or >> and track which variant is in use in order to preserve correctness.

Fixes #901
Closes #817

@coveralls
Copy link

coveralls commented Oct 15, 2023

Pull Request Test Coverage Report for Build 6523097396

  • 465 of 479 (97.08%) changed or added relevant lines in 7 files are covered.
  • 2 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.2%) to 87.579%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/ast/data_type.rs 12 13 92.31%
src/ast/mod.rs 12 13 92.31%
src/parser/mod.rs 83 95 87.37%
Files with Coverage Reduction New Missed Lines %
src/parser/mod.rs 2 82.79%
Totals Coverage Status
Change from base Build 6435373786: 0.2%
Covered Lines: 17211
Relevant Lines: 19652

💛 - Coveralls

This builds on top of apache#817

- `STRUCT` literal support via `STRUCT` keyword
   https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#constructing_a_struct
- `STRUCT` and `ARRAY` type declarations
   https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#declaring_an_array_type
   https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#declaring_a_struct_type

It works around the issue of not being able to parse nested types
like `ARRAY<ARRAY<INT>>` due to the right angle bracket ambiguity
where the tokenizer chooses the right-shift operator (this affects
other dialects like Hive that have similar syntax).
When parsing such types, we accept a closing `>` or `>>` and track
which variant is in use in order to preserve correctness.

Fixes apache#901
Closes apache#817
@alamb
Copy link
Contributor

alamb commented Oct 24, 2023

Note there appears to be a similar PR: #966

@alamb alamb changed the title Add support for BigQuery struct and array datatype Add support for BigQuery struct, array and bytes datatype Oct 24, 2023
@alamb alamb changed the title Add support for BigQuery struct, array and bytes datatype Support for BigQuery struct, array and bytes datatype Oct 24, 2023
@alamb alamb changed the title Support for BigQuery struct, array and bytes datatype Support for BigQuery struct, array and bytes , int64, float64 datatypes Oct 24, 2023
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for this contribution @iffyio 👏 -- it is very impressive both well documented and well tested. I had a few minor comments, but nothing that would prevent merging.

Please let me know if you have time to address the comments this week (I plan to make a release later this week)

#[derive(Debug, Clone, PartialEq, PartialOrd, Eq, Ord, Hash)]
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[cfg_attr(feature = "visitor", derive(Visit, VisitMut))]
pub enum ArrayElemTypeDef {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -816,6 +836,10 @@ impl<'a> Parser<'a> {
Keyword::MATCH if dialect_of!(self is MySqlDialect | GenericDialect) => {
self.parse_match_against()
}
Keyword::STRUCT if dialect_of!(self is BigQueryDialect) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to also support this syntax in the Generic dialect? Rationale is explained in https://github.com/sqlparser-rs/sqlparser-rs#new-syntax

Suggested change
Keyword::STRUCT if dialect_of!(self is BigQueryDialect) => {
Keyword::STRUCT if dialect_of!(self is BigQueryDialect | GenericDialect) => {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

/// expr [AS name]
/// ```
/// [1]: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#constructing_a_struct
pub fn parse_struct_field_expr(&mut self, typed_syntax: bool) -> Result<Expr, ParserError> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this need to be pub?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed!

Ok(Expr::Struct { values, fields })
}

/// Parse an expression value for a bigquery struct [1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add a note to the documentation about the meaning of the typed_syntax parameter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the description to cover the parameter

@iffyio
Copy link
Contributor Author

iffyio commented Oct 25, 2023

Thanks for the review @alamb! Sure I'll fetch the latest changes and look to address the comments later today 👍

Note there appears to be a similar PR: #966

Ah indeed I missed that one, thanks for the heads up!

@alamb alamb merged commit 2f437db into apache:main Oct 25, 2023
10 checks passed
@alamb
Copy link
Contributor

alamb commented Oct 25, 2023

Thanks again @iffyio

@iffyio iffyio deleted the bq-struct branch October 26, 2023 07:10
serprex pushed a commit to serprex/sqlparser-rs that referenced this pull request Nov 6, 2023
lustefaniak pushed a commit to getsynq/sqlparser-rs that referenced this pull request Dec 21, 2023
…4` datatypes (apache#1003)

# Conflicts:
#	src/ast/mod.rs
#	src/parser/mod.rs
#	tests/sqlparser_common.rs
#	tests/sqlparser_snowflake.rs
lustefaniak pushed a commit to getsynq/sqlparser-rs that referenced this pull request Dec 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support field names in BigQuery STRUCT
3 participants