Add support for TABLESAMPLE #1580

Merged
merged 10 commits into from
Dec 15, 2024

Conversation

yoavcloud
Contributor

@yoavcloud yoavcloud commented Dec 7, 2024

This PR adds support for the TABLESAMPLE option in the following dialects:

Collateral work includes expanding the use of a constructor function to create Table structs in unit tests to avoid modifying many files when adding default options to the Table struct.

@yoavcloud yoavcloud marked this pull request as draft December 7, 2024 07:07
@yoavcloud yoavcloud marked this pull request as ready for review December 7, 2024 07:36
src/dialect/mod.rs (resolved)
src/ast/query.rs Outdated
sample: Option<Box<TableSample>>,
/// Position of the table sample modifier in the table factor. Default is after the table alias
/// e.g. `SELECT * FROM tbl t TABLESAMPLE (10 ROWS)`. See `Dialect::supports_table_sample_before_alias`.
sample_before_alias: bool,
Contributor

Would it make sense to use an enum here? e.g.

enum TableSampleKind {
    BeforeTableAlias(Box<TableSample>),
    // ...
}
sample: Option<TableSampleKind>

The thinking is that this avoids the surplus flag on the table factor and would be easier to extend later if required.
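
To make the enum suggestion concrete, here is a minimal sketch of how the position could live in the sample value itself rather than in a separate boolean. It is not the code that was merged: the `AfterTableAlias` variant, the `quantity` field, and the simplified `Table` struct are assumptions made only for illustration.

```rust
use std::fmt;

/// Hypothetical, simplified stand-in for the PR's `TableSample` type.
#[derive(Debug, Clone, PartialEq)]
pub struct TableSample {
    pub quantity: String,
}

/// Records where the sample clause sits relative to the table alias.
/// `AfterTableAlias` is an assumed second variant; the review comment
/// only spells out `BeforeTableAlias`.
#[derive(Debug, Clone, PartialEq)]
pub enum TableSampleKind {
    BeforeTableAlias(Box<TableSample>),
    AfterTableAlias(Box<TableSample>),
}

/// Hypothetical, simplified table factor: name, optional alias, optional sample.
#[derive(Debug, Clone, PartialEq)]
pub struct Table {
    pub name: String,
    pub alias: Option<String>,
    pub sample: Option<TableSampleKind>,
}

impl fmt::Display for TableSample {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "TABLESAMPLE ({})", self.quantity)
    }
}

impl fmt::Display for Table {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.name)?;
        // The variant alone decides the position, so no extra flag is needed.
        if let Some(TableSampleKind::BeforeTableAlias(sample)) = &self.sample {
            write!(f, " {}", sample)?;
        }
        if let Some(alias) = &self.alias {
            write!(f, " {}", alias)?;
        }
        if let Some(TableSampleKind::AfterTableAlias(sample)) = &self.sample {
            write!(f, " {}", sample)?;
        }
        Ok(())
    }
}
```

With this shape a table factor cannot carry a position flag without a sample, and a further position would be a new variant rather than another field.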

self.expect_token(&Token::RParen)?;
TableSample::Bucket(TableSampleBucket { bucket, total, on })
} else {
let value = match self.try_parse(|p| p.parse_number_value()) {
Contributor

Should the try_parse rather be maybe_parse (we seem to be ignoring the returned error otherwise)?

@yoavcloud yoavcloud requested a review from iffyio December 8, 2024 16:49
Comment on lines 10667 to 10669
let value = match self.maybe_parse(|p| p.parse_number_value()) {
Ok(Some(num)) => num,
_ => {
Contributor

Suggested change
- let value = match self.maybe_parse(|p| p.parse_number_value()) {
- Ok(Some(num)) => num,
- _ => {
+ let value = match self.maybe_parse(|p| p.parse_number_value())? {
+ Some(num) => num,
+ None => {

I think an error here is usually the recursion limit or a similarly fatal error, which we can propagate.
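
The difference between the two forms can be sketched outside the parser. The example below is only an illustration: `ParseError`, `maybe_parse_number`, and the `depth` parameter are hypothetical stand-ins, assuming a helper that returns `Ok(None)` when the input simply does not match and `Err(_)` for fatal conditions.

```rust
/// Hypothetical error type standing in for the parser's error.
#[derive(Debug, PartialEq)]
pub enum ParseError {
    RecursionLimitExceeded,
}

/// Hypothetical stand-in for a `maybe_parse`-style helper:
/// `Ok(Some(_))` on success, `Ok(None)` when the input is not a number
/// (recoverable), and `Err(_)` for a fatal condition.
pub fn maybe_parse_number(input: &str, depth: usize) -> Result<Option<f64>, ParseError> {
    if depth > 50 {
        return Err(ParseError::RecursionLimitExceeded);
    }
    Ok(input.trim().parse::<f64>().ok())
}

/// Catch-all arm: a fatal error is silently treated like "not a number".
pub fn parse_quantity_swallowing(input: &str, depth: usize) -> Result<String, ParseError> {
    let value = match maybe_parse_number(input, depth) {
        Ok(Some(num)) => num.to_string(),
        _ => format!("expr({})", input.trim()),
    };
    Ok(value)
}

/// `?` propagates the fatal error; only `None` takes the fallback path.
pub fn parse_quantity_propagating(input: &str, depth: usize) -> Result<String, ParseError> {
    let value = match maybe_parse_number(input, depth)? {
        Some(num) => num.to_string(),
        None => format!("expr({})", input.trim()),
    };
    Ok(value)
}
```

Both functions agree on ordinary input. They differ only when the helper fails: the first falls back to the expression path and hides the failure, while the second returns the error to the caller.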

Comment on lines 10680 to 10681
if self.peek_token().token == Token::RParen
&& !self.dialect.supports_implicit_table_sample_method()
Contributor

Maybe this could be simplified as: if self.dialect.supports_implicit_table_sample_method() && self.consume_token(Token::RParen). That would also let us skip the expect_token that follows.

repeatable: seed,
})
// Try to parse without an explicit table sample method keyword
} else if self.peek_token().token == Token::LParen {
Contributor

Suggested change
- } else if self.peek_token().token == Token::LParen {
+ } else if self.consume_token(Token::LParen) {
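
The point of preferring a consume-style check over peek-then-expect can be shown with a toy token cursor. This is a sketch under stated assumptions: `Tokens`, `peek`, `consume`, and `expect` are hypothetical helpers written for this example and are not the parser's own methods.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    LParen,
    RParen,
    Number(String),
}

/// Hypothetical minimal token cursor.
pub struct Tokens {
    tokens: Vec<Token>,
    index: usize,
}

impl Tokens {
    pub fn new(tokens: Vec<Token>) -> Self {
        Self { tokens, index: 0 }
    }

    /// Look at the next token without advancing.
    pub fn peek(&self) -> Option<&Token> {
        self.tokens.get(self.index)
    }

    /// Advance only if the next token equals `expected`; report whether it did.
    pub fn consume(&mut self, expected: &Token) -> bool {
        if self.peek() == Some(expected) {
            self.index += 1;
            true
        } else {
            false
        }
    }

    /// Like `consume`, but a mismatch is an error.
    pub fn expect(&mut self, expected: &Token) -> Result<(), String> {
        if self.consume(expected) {
            Ok(())
        } else {
            Err(format!("expected {:?}, found {:?}", expected, self.peek()))
        }
    }

    pub fn position(&self) -> usize {
        self.index
    }
}

/// Two steps: check with `peek`, then advance with `expect`.
pub fn open_paren_peek_then_expect(tokens: &mut Tokens) -> Result<bool, String> {
    if tokens.peek() == Some(&Token::LParen) {
        tokens.expect(&Token::LParen)?;
        Ok(true)
    } else {
        Ok(false)
    }
}

/// One step: the check and the advance happen together.
pub fn open_paren_consume(tokens: &mut Tokens) -> bool {
    tokens.consume(&Token::LParen)
}
```

Both helpers leave the cursor in the same place, but the second cannot drift out of sync with a later `expect` call and has no error path that should never fire.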

Comment on lines 2961 to 2962
"SELECT * FROM testtable SAMPLE (10)",
"SELECT * FROM testtable TABLESAMPLE BERNOULLI (10)",
Contributor

For the scenarios that currently rely on one_statement_parses_to, could we represent them faithfully when displaying? E.g. this one and the ROW vs BERNOULLI variants, etc.

Contributor Author

Not sure if I understand the question, but if you mean whether the variants I added as interchangeable are really interchangeable in the dialect, as far as I understand yes. If not, please elaborate?

Contributor

Ah yeah, so I was rather thinking that ideally we preserve the syntax on roundtrip, even though they're interchangeable in some dialects.

snowflake_and_generic().verified_stmt("SELECT * FROM testtable TABLESAMPLE ROW (20.3)");
snowflake_and_generic().verified_stmt("SELECT * FROM testtable SAMPLE BLOCK (3) SEED (82)");
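
A roundtrip-preserving model, as requested in the review, keeps whatever keyword and method the user wrote instead of normalizing them. The following is a minimal sketch, not the AST that was merged: `SampleKeyword`, `SampleMethod`, `SampleClause`, and `parse_sample_clause` are hypothetical names, and the whitespace-based parser only handles single-token quantities such as `(10)`.

```rust
use std::fmt;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum SampleKeyword {
    Sample,
    TableSample,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum SampleMethod {
    Row,
    Bernoulli,
    Block,
    System,
}

/// Hypothetical sample clause that remembers the spelling used in the source.
#[derive(Debug, Clone, PartialEq)]
pub struct SampleClause {
    pub keyword: SampleKeyword,
    pub method: Option<SampleMethod>,
    pub quantity: String,
    pub seed: Option<String>,
}

fn strip_parens(token: &str) -> Option<&str> {
    token.strip_prefix('(')?.strip_suffix(')')
}

/// Toy parser: `SAMPLE|TABLESAMPLE [method] (quantity) [SEED (n)]`.
pub fn parse_sample_clause(input: &str) -> Option<SampleClause> {
    let mut tokens = input.split_whitespace().peekable();
    let keyword = match tokens.next()? {
        "SAMPLE" => SampleKeyword::Sample,
        "TABLESAMPLE" => SampleKeyword::TableSample,
        _ => return None,
    };
    let method = match *tokens.peek()? {
        "ROW" => Some(SampleMethod::Row),
        "BERNOULLI" => Some(SampleMethod::Bernoulli),
        "BLOCK" => Some(SampleMethod::Block),
        "SYSTEM" => Some(SampleMethod::System),
        _ => None,
    };
    if method.is_some() {
        tokens.next();
    }
    let quantity = strip_parens(tokens.next()?)?.to_string();
    let seed = match tokens.next() {
        Some("SEED") => Some(strip_parens(tokens.next()?)?.to_string()),
        Some(_) => return None,
        None => None,
    };
    if tokens.next().is_some() {
        return None;
    }
    Some(SampleClause { keyword, method, quantity, seed })
}

impl fmt::Display for SampleClause {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Emit exactly the keyword and method that were parsed.
        let keyword = match self.keyword {
            SampleKeyword::Sample => "SAMPLE",
            SampleKeyword::TableSample => "TABLESAMPLE",
        };
        write!(f, "{}", keyword)?;
        if let Some(method) = self.method {
            let name = match method {
                SampleMethod::Row => "ROW",
                SampleMethod::Bernoulli => "BERNOULLI",
                SampleMethod::Block => "BLOCK",
                SampleMethod::System => "SYSTEM",
            };
            write!(f, " {}", name)?;
        }
        write!(f, " ({})", self.quantity)?;
        if let Some(seed) = &self.seed {
            write!(f, " SEED ({})", seed)?;
        }
        Ok(())
    }
}
```

Because nothing is normalized, every accepted clause displays as it was written, which is the property `verified_stmt` checks and `one_statement_parses_to` does not.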

@yoavcloud yoavcloud requested a review from iffyio December 9, 2024 08:22
@yoavcloud
Contributor Author

@iffyio please see latest commit

Contributor

@iffyio iffyio left a comment

Thanks @yoavcloud! Left a comment to distinguish sample vs tablesample. Otherwise this looks good to me!

Comment on lines 2956 to 2959
snowflake_and_generic().one_statement_parses_to(
"SELECT * FROM testtable SAMPLE (10)",
"SELECT * FROM testtable TABLESAMPLE (10)",
);
Contributor

Oh, could we do the same here as well, so that the roundtrip stays correct? ClickHouse, for example, has SAMPLE but not TABLESAMPLE.

Contributor Author

Added support for ClickHouse as well, with a different approach to modeling it in the AST.

Contributor

@iffyio iffyio left a comment

LGTM! Thanks @yoavcloud
cc @alamb

@iffyio iffyio merged commit 316bb14 into apache:main Dec 15, 2024
8 checks passed