Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add all missing table options to be handled in any order #1747

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

benrsatori
Copy link
Contributor

No description provided.

@mvzink
Copy link
Contributor

mvzink commented Feb 26, 2025

Ah, I'm guessing this supersedes #1746. As mentioned in my comment there, I strongly believe we should implement table options in the same way as column options, with a Vec preserving the input order.

src/ast/dml.rs Outdated
@@ -138,6 +143,30 @@ pub struct CreateTable {
pub engine: Option<TableEngine>,
pub comment: Option<CommentDef>,
pub auto_increment_offset: Option<u32>,
pub key_block_size: Option<u32>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's quite a few of these options and its not clear what dialect they're coming from, I assume there will be more across the various dialects.

I'm thinking to @mvzink's point, it will be good to represent this as a Vec, also think that we can represent this similar to the existing with_options: Vec<SqlOption> on this struct (maybe we can just reuse and merge this into that field if it makes sense, otherwise it could be a new field)? essentially that it would be ideal to not have to enumerate and keep in sync all possible table options across dialects but instead always accept them as opaque entries

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will be taking over this PR, diving into this suggestion:
Currently the CreateTable struct has a few types of options which are AFAIU mainly dialect driven:

  • with_options is formatted as "WITH (....options...)" -> Postgres & other
  • options which is formatted as "OPTIONS (....options...)" -> BigQuery
  • table_properties which is formatted as "TBLPROPERTIES (....options...)" -> Databricks

Proposed design to support the mysql variant:

  • Add another parameter Vec which will include properties which do not require a preceding keyword (WITH/OPTIONS/...), e.g. plain_options
  • The Vec will include optional parameters which are currently assigned to the CreateTable struct, e.g. engine / collation / ...
  • Parameter parsing will be delegated to the respective Dialect to avoid clutter in the main flow

@iffyio @mvzink your feedback is appreciated

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a reasonable approach to me. Thanks for pushing this forward! I would suggest that whatever parsing routine/subparser produces that vector of options be written in such a way that it could be used for ALTER TABLE parsing as well; I believe their syntax is basically the same, at least for MySQL, and that struct should be updated as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add another parameter Vec which will include properties which do not require a preceding keyword (WITH/OPTIONS/...), e.g. plain_options

seeing as there's a few similar parameters, thinking it could be nice to merge the variants into an enum something like

enum CreateTableOptions {
  TableProperties(Vec<SqlOption>),
  WithOptions(Vec<SqlOption>),
  Options(Vec<SqlOption>),
  Plain(Vec<SqlOption>),
}

that doesn't have to be in this PR however, we could introduce a new field if it could be a hassle to merge the existing ones.

Parameter parsing will be delegated to the respective Dialect to avoid clutter in the main flow

How do we mean by this part? I imagined it would be reusing self.parse_options

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure it makes sense to reuse parse_options, it's designed to parse something in the format of KEYWORD(opt1 , opt2, opt3) while mysql syntax is opt1 opt2 opt3.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of an enum, will handle that on a separate PR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iffyio added a commit with merging of all variants into the (pre-existing) CreateTableOptions enum

@@ -721,6 +721,30 @@ fn test_duckdb_union_datatype() {
comment: Default::default(),
auto_increment_offset: Default::default(),
default_charset: Default::default(),
insert_method: Default::default(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use ..Default::default() for all these members?

@@ -1590,6 +1590,30 @@ fn parse_create_table_with_valid_options() {
comment: None,
auto_increment_offset: None,
default_charset: None,
insert_method: None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

..Default::default() ?

@iffyio iffyio marked this pull request as draft March 4, 2025 12:51
@iffyio
Copy link
Contributor

iffyio commented Mar 4, 2025

Marking the PR as draft in the meantime as it is not currently pending review, @benrsatori feel free to undraft and ping when ready!

@tomershaniii tomershaniii force-pushed the table-options branch 3 times, most recently from a3f2732 to 0cd38de Compare March 12, 2025 11:21
@tomershaniii
Copy link
Contributor

@iffyio @mvzink appreciate your feedback on the updated PR.
BTW, please pull this PR out of WIP (I do not have the permissions to do so)

Copy link
Contributor

@mvzink mvzink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Wishlist: I would still like to see this used in ALTER TABLE as well.)

Comment on lines 6887 to 7055
fn parse_plain_option(&mut self) -> Result<Option<SqlOption>, ParserError> {
if let Some(option) = self.dialect.parse_plain_option(self)? {
return Ok(Some(option));
}

// Parse table options in any order
if let Some(keyword) = self.parse_one_of_keywords(&[
Keyword::ENGINE,
Keyword::AUTO_INCREMENT,
Keyword::DEFAULT,
Keyword::CHARSET,
Keyword::COLLATE,
Keyword::INSERT_METHOD,
]) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand the motivation for making this a dialect parsing method. Is there a problem with putting all the supported options across all dialects in here? It especially raises my eyebrow to see the two lists have so much overlap, including with (afaik) MySQL-specific options like ENGINE and AUTO_INCREMENT in the generic set. If we are going to have Postgres and MSSQL parse those, why not have them parse DATA DIRECTORY too?

Copy link
Contributor

@tomershaniii tomershaniii Mar 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bigger question if the library should branch out to dialect specific parsing or maintain common code supporting all dialects, looking at SnowflakeDialect is seems like a specific implementation exists (parse_create_table) and looking at the common code i can see several places where the code diverges based on dialect (mostly hive related).

Having parse_create_table as part of the dialect is what guided the decision to pull this code into the dialect, I may have overlooked ENGINE and AUTO_INCREMENT, prefer to pull those into mysql code rather than take mysql code out.

LMKWYT

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, that's a bigger question. I don't have a strong suggestion at the moment and this certainly seems servicable to me. My 2¢ on the bigger question is that introducing more dialect overrides (and actually this is more similar to parse_column_option, which is global, than Snowflake's parse_create_table which is only called from SnowflakeDialect::parse_statement, the normal top-level override) is a slippery slope toward fully dynamically dispatched parsing, but it's more justifiable if it's parsing something with a substantially different syntactic structure. However, this is mostly just parsing a different list of acceptable keywords but doing it in the same way. In such cases, I personally think more permissive generic parsing is preferable; e.g. does it really matter if SnowflakeDialect accepts ROW_FORMAT = COMPACT even though it doesn't make sense, so long as it doesn't introduce syntactic ambiguity? The exceptions of special parsing routines for a few options like DATA DIRECTORY and TABLESPACE are a good argument though. If anything, I would probably favor putting all the not-so-special options in the default implementation and only using the override for these options that require special parsing. Honestly, if it wouldn't introduce ambiguity (mostly due to options not being comma separated), I would probably even prefer the default/fallback routine for non-special options just be "parse ident, optionally parse equals sign, parse expr"—that's all thats's going to end up in the SqlOption::KeyValue at the end anyway, so why do we care which specific list of idents are acceptable? That is more of a semantic than syntactic question IMO.

src/ast/mod.rs Outdated

Union(Vec<Ident>),

TableEngine(TableEngine),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tbh I haven't checked where else SqlOption is used, but it really seems like it should be called TableOption, no?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to be used all over ( ColumnOption, CreateConnector, CreateSchema, Cache & more)
looks like it's mainly used for the key/value variant.
IMO it makes sense to split out into a key/value type and a TableOption type (which has a K/V variant).
I think we should save this for another PR, there is a lot of scope in this one as is.

pub file_format: Option<FileFormat>,
pub location: Option<String>,
pub query: Option<Box<Query>>,
pub without_rowid: bool,
pub like: Option<ObjectName>,
pub clone: Option<ObjectName>,
pub engine: Option<TableEngine>,
pub comment: Option<CommentDef>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if comment should be folded in to options as well (with a new SqlOption::Comment(CommentDef) variant)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, will do

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the latest commit,
note that the hive dialect still has some specific requirements

let value = self.next_token();

match (keyword, value.token) {
(Keyword::AUTO_INCREMENT, Token::Number(n, l)) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is missing the optional = parsing that was present in the old version (and I believe was correct).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note the let _ = self.consume_token(&Token::Eq); a could of lines earlier or is it something else i've missed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, I totally missed that!

src/ast/mod.rs Outdated
Comment on lines 6932 to 7320
TableSpace(TablespaceOption),

Union(Vec<Ident>),

TableEngine(TableEngine),
Copy link
Contributor

@iffyio iffyio Mar 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add a description (ideally with syntax example) and links to the docs where these variants come from?
this might not be valid given my other comment

Comment on lines 159 to 167
//Some consts
const COMPRESSION_OPTS: [&str; 3] = ["ZLIB", "LZ4", "NONE"];
const INSERT_METHODS_OPTS: [&str; 3] = ["NO", "FIRST", "LAST"];

let keyword = match parser.parse_one_of_keywords(&[
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to clarify my comment here my thinking is that the options are plain key value pairs and idealy would be parsed as such. So that I imagined the parser shouldnt look for explicit keywords, similar to other dialects like bigquery it only looks to parse a key and a value.

Essentially, that it looks to parse and represent something like this I was thinking

 SqlOption::Plain { // Similar to SqlOption::KeyValue except it doesnt format with =
        key: Ident,
        value: Expr,
    }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it would be better to parse more accurately where is is easy to implement (i.e. only a limited set of values are allowed). it does provide some value and I do not see the downside given that it's in dialect specific code.
LMKWYT

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The downside is that there's a bit of code and API to power and maintain in any case, not isolated to the dialect's parsing, the keyword list and the AST are affected as well. But it also implies that we have to model and keep up with all options across all dialects in order to be correct/consistent going forward which would be a bigger ask.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I philosophically agree with @iffyio's ask here, as I mention in my other big ranty comment (sorry 😛). However, I am not 100% convinced in practice, mostly because these options aren't comma separated and this could produce some really confusing parses if users make errors. I imagined the default parsing routine would be "parse ident, optionally parse equals sign, parse expr". But suppose a user intended to send ROW_FORMAT COMPACT DEFAULT CHARACTER SET latin1, but accidentally deleted COMPACT. Instead of a parse error like they would get if options were comma separated or hardcoded, they get a successful parse with ROW_FORMAT = DEFAULT followed by CHARACTER SET = latin1. Is it our fault? Not really. Is it better than the maintenance burden of keeping a list of valid one-word option names up to date? I'm honestly not sure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that behavior makes sense for the parser to mitigate, maybe we could explicitly track the one word options and multiple keyword options to avoid ambiguity? something like:

fn parse_plain_option(&self, parser) -> Result<Option<SqlOption>> {
  if self.parse_keywords(START, TRANSACTION) {
    return Ok(Some(SqlOption::Ident("START TRANSACTION")))
  } // ...

  let key = if self.parse_keywords(DEFAULT, CHARACTER, SET) {
    Ident::new("DEFAULT CHARACTER SET")
  } // ...
  else {
    self.parse_ident()?
  };

  let _ = self.consume_token('=')?;
  let value = self.parse_expr()?;
  Ok(Some(SqlOption::KeyValue {
    key, value
  }))
}

In that, the special cases are only a handful, and we're only tracking their keys, values are opaque.

Hmm interesting with the example, I wonder how mysql interprets ROW_FORMAT DEFAULT CHARACTER SET latin1. Since DEFAULT seems to also be a valid value for ROW_FORMAT from their docs

@iffyio iffyio marked this pull request as ready for review March 12, 2025 19:22
@tomershaniii tomershaniii force-pushed the table-options branch 3 times, most recently from 46deb3b to f1cacd2 Compare March 23, 2025 16:46
@tomershaniii
Copy link
Contributor

@iffyio @mvzink thanks for the feedback, did some additional work per your comments (& commented on other)
LMK if makes sense


// COMPRESSION [=] {'ZLIB' | 'LZ4' | 'NONE'}
(Keyword::COMPRESSION, Token::SingleQuotedString(s))
if COMPRESSION_OPTS.contains(&s.to_uppercase().as_str()) =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine but honestly I would skip this check entirely for reasons in a similar vein to what I mentioned in my other comment about dialect overrides. Since they are all single quoted strings anyway, it's not really a syntactic question as to which compression algorithms are allowed, but a semantic one. What if I were writing a MySQL-compatible DBMS with added compression options? Are you really going to make me override and copy and paste this method when the syntax is identical but the semantic content has changed? It's also a maintainability hazard if MySQL (or any MySQL-compatible system that users are parsing with MySqlDialect) adds a new compression algorithm. I think I feel the same way about INSERT_METHODS_OPTS but it's a tad less clear since those are identifiers not strings.

If you are gonna keep these checks, I would inline the arrays instead of defining them at the top of the function, to keep that info local to where it's used.

Comment on lines 6887 to 7055
fn parse_plain_option(&mut self) -> Result<Option<SqlOption>, ParserError> {
if let Some(option) = self.dialect.parse_plain_option(self)? {
return Ok(Some(option));
}

// Parse table options in any order
if let Some(keyword) = self.parse_one_of_keywords(&[
Keyword::ENGINE,
Keyword::AUTO_INCREMENT,
Keyword::DEFAULT,
Keyword::CHARSET,
Keyword::COLLATE,
Keyword::INSERT_METHOD,
]) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, that's a bigger question. I don't have a strong suggestion at the moment and this certainly seems servicable to me. My 2¢ on the bigger question is that introducing more dialect overrides (and actually this is more similar to parse_column_option, which is global, than Snowflake's parse_create_table which is only called from SnowflakeDialect::parse_statement, the normal top-level override) is a slippery slope toward fully dynamically dispatched parsing, but it's more justifiable if it's parsing something with a substantially different syntactic structure. However, this is mostly just parsing a different list of acceptable keywords but doing it in the same way. In such cases, I personally think more permissive generic parsing is preferable; e.g. does it really matter if SnowflakeDialect accepts ROW_FORMAT = COMPACT even though it doesn't make sense, so long as it doesn't introduce syntactic ambiguity? The exceptions of special parsing routines for a few options like DATA DIRECTORY and TABLESPACE are a good argument though. If anything, I would probably favor putting all the not-so-special options in the default implementation and only using the override for these options that require special parsing. Honestly, if it wouldn't introduce ambiguity (mostly due to options not being comma separated), I would probably even prefer the default/fallback routine for non-special options just be "parse ident, optionally parse equals sign, parse expr"—that's all thats's going to end up in the SqlOption::KeyValue at the end anyway, so why do we care which specific list of idents are acceptable? That is more of a semantic than syntactic question IMO.

Comment on lines 159 to 167
//Some consts
const COMPRESSION_OPTS: [&str; 3] = ["ZLIB", "LZ4", "NONE"];
const INSERT_METHODS_OPTS: [&str; 3] = ["NO", "FIRST", "LAST"];

let keyword = match parser.parse_one_of_keywords(&[
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I philosophically agree with @iffyio's ask here, as I mention in my other big ranty comment (sorry 😛). However, I am not 100% convinced in practice, mostly because these options aren't comma separated and this could produce some really confusing parses if users make errors. I imagined the default parsing routine would be "parse ident, optionally parse equals sign, parse expr". But suppose a user intended to send ROW_FORMAT COMPACT DEFAULT CHARACTER SET latin1, but accidentally deleted COMPACT. Instead of a parse error like they would get if options were comma separated or hardcoded, they get a successful parse with ROW_FORMAT = DEFAULT followed by CHARACTER SET = latin1. Is it our fault? Not really. Is it better than the maintenance burden of keeping a list of valid one-word option names up to date? I'm honestly not sure.

@tomershaniii
Copy link
Contributor

Hi again @iffyio @mvzink, see the latest commit, per your suggestions:

  • Permissive parsing (removed value validation for key/value parameters)
  • Centralized all plain parameters into a single parsing function
  • Simplified the Key/Value parsing code (aligned @iffyio suggestion)

Copy link
Contributor

@mvzink mvzink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for keeping on with this!

Comment on lines 7174 to 7182
} else if self.parse_keywords(&[Keyword::CHARACTER, Keyword::SET]) {
// [DEFAULT] CHARACTER SET [=] charset_name
Ident::new("CHARACTER SET")
} else if self.parse_keyword(Keyword::CHARSET) {
// [DEFAULT] CHARACTER SET [=] charset_name
Ident::new("CHARSET")
} else if self.parse_keyword(Keyword::COLLATE) {
// [DEFAULT] CHARACTER SET [=] charset_name
Ident::new("COLLATE")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These comments are wrong.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed

Comment on lines 7159 to 7168
let key = if self.parse_keywords(&[Keyword::DEFAULT, Keyword::CHARSET]) {
// [DEFAULT] CHARACTER SET [=] charset_name
Ident::new("DEFAULT CHARSET")
} else if self.parse_keywords(&[Keyword::DEFAULT, Keyword::CHARACTER, Keyword::SET]) {
// [DEFAULT] CHARACTER SET [=] charset_name
Ident::new("DEFAULT CHARACTER SET")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for this PR IMO, but I'd be curious to hear your thoughts & @iffyio's: would it be appropriate to add another SqlOption variant like TableSpace and TableEngine for a few of these options to make inspecting them easier for users? I'm specifically thinking of [DEFAULT] {CHARSET | CHARACTER SET} because as it is if I wanted to extract charset info from DDL, I would have to check for 4 different hardcoded strings which aren't represented in the public API (i.e. there's no DEFAULT_CHARACTER_SET_IDENT or a set of idents I could expect to see). Does the fact that it has 4 possible representations make it worth special casing with a SqlOption::CharSet { default, shorthand, value }? Since I will in fact need to inspect character set info, I may post a PR for that.

For this PR though, to at least make it harder to miss one of those 4 variants, can we put all the charset related branches next to each other?

Copy link
Contributor

@tomershaniii tomershaniii Apr 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Grouped the branches by topic.
IMHO another SqlOption makes sense, in the same way Expr::Substring supports both the shorthand flavor (SUBSTR) and the long format (SUBSTRING)

Comment on lines +7183 to +7242
} else if self.parse_keyword(Keyword::KEY_BLOCK_SIZE) {
// KEY_BLOCK_SIZE [=] value
Ident::new("KEY_BLOCK_SIZE")
} else if self.parse_keyword(Keyword::ROW_FORMAT) {
// ROW_FORMAT [=] {DEFAULT | DYNAMIC | FIXED | COMPRESSED | REDUNDANT | COMPACT}
Ident::new("ROW_FORMAT")
} else if self.parse_keyword(Keyword::PACK_KEYS) {
// PACK_KEYS [=] {0 | 1 | DEFAULT}
Ident::new("PACK_KEYS")
} else if self.parse_keyword(Keyword::STATS_AUTO_RECALC) {
// STATS_AUTO_RECALC [=] {DEFAULT | 0 | 1}
Ident::new("STATS_AUTO_RECALC")
} else if self.parse_keyword(Keyword::STATS_PERSISTENT) {
//STATS_PERSISTENT [=] {DEFAULT | 0 | 1}
Ident::new("STATS_PERSISTENT")
} else if self.parse_keyword(Keyword::STATS_SAMPLE_PAGES) {
// STATS_SAMPLE_PAGES [=] value
Ident::new("STATS_SAMPLE_PAGES")
} else if self.parse_keyword(Keyword::DELAY_KEY_WRITE) {
// DELAY_KEY_WRITE [=] {0 | 1}
Ident::new("DELAY_KEY_WRITE")
} else if self.parse_keyword(Keyword::COMPRESSION) {
// COMPRESSION [=] {'ZLIB' | 'LZ4' | 'NONE'}
Ident::new("COMPRESSION")
} else if self.parse_keyword(Keyword::ENCRYPTION) {
// ENCRYPTION [=] {'Y' | 'N'}
Ident::new("ENCRYPTION")
} else if self.parse_keyword(Keyword::MAX_ROWS) {
// MAX_ROWS [=] value
Ident::new("MAX_ROWS")
} else if self.parse_keyword(Keyword::MIN_ROWS) {
// MIN_ROWS [=] value
Ident::new("MIN_ROWS")
} else if self.parse_keyword(Keyword::AUTOEXTEND_SIZE) {
// AUTOEXTEND_SIZE [=] value
Ident::new("AUTOEXTEND_SIZE")
} else if self.parse_keyword(Keyword::AVG_ROW_LENGTH) {
// AVG_ROW_LENGTH [=] value
Ident::new("AVG_ROW_LENGTH")
} else if self.parse_keyword(Keyword::CHECKSUM) {
// CHECKSUM [=] {0 | 1}
Ident::new("CHECKSUM")
} else if self.parse_keyword(Keyword::CONNECTION) {
// CONNECTION [=] 'connect_string'
Ident::new("CONNECTION")
} else if self.parse_keyword(Keyword::ENGINE_ATTRIBUTE) {
// ENGINE_ATTRIBUTE [=] 'string'
Ident::new("ENGINE_ATTRIBUTE")
} else if self.parse_keyword(Keyword::PASSWORD) {
// PASSWORD [=] 'string'
Ident::new("PASSWORD")
} else if self.parse_keyword(Keyword::SECONDARY_ENGINE_ATTRIBUTE) {
// SECONDARY_ENGINE_ATTRIBUTE [=] 'string'
Ident::new("SECONDARY_ENGINE_ATTRIBUTE")
} else if self.parse_keyword(Keyword::INSERT_METHOD) {
// INSERT_METHOD [=] { NO | FIRST | LAST }
Ident::new("INSERT_METHOD")
} else if self.parse_keyword(Keyword::AUTO_INCREMENT) {
Ident::new("AUTO_INCREMENT")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah so for this part I think these can be handled transparently, its what I meant by this section in my previous comment

  let _ = self.consume_token('=')?;
  let value = self.parse_expr()?;
  Ok(Some(SqlOption::KeyValue {
    key, value
  }))

in that the ideal scenario avoids enumerating the various options where possible

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Value part is handled transparently, not sure how we can prevent enumeration for the different keys (or maybe i've missed something ....)
Interesting to note that 'parse_expr' cannot be used in this context as it has explicit handling for the COLLATE keyword

}

// Key/Value parameter option
let key = if self.parse_keywords(&[Keyword::DEFAULT, Keyword::CHARSET]) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm I thought that it made a lot of sense with the previous version that delegated parsing the options to the (at least mysql) dialect, where the dialect only handles the special cases while the parser handles the generic cases. The behavior would be self documenting too. It would be nice to somehow flag where each parameter is coming from and link to the docs. No need to move things back to the dialect though if that's a hassle, could be done as a follow up afterwards so that part doesn't need to go into this PR I dont think

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conflicting feedback here :-) Prefer we save this to a future PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please defer to @iffyio, avoiding parsers inside Dialect is just my personal preference 😛

self.parse_comma_separated0(Parser::parse_identifier, Token::RParen)?;
self.expect_token(&Token::RParen)?;

return Ok(Some(SqlOption::Union(tables)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of calling this variant Union can make it reusable for future dialects or similar options? something like this maybe

SqlOption::NamedParenthesizedList {
     name: Ident, // Union, TABLEENGINE
     key: Option<Ident>, // table engine name
     values: Vec<Expr>, // (val1, val2 ...)
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that, will do

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, see latest commit

Comment on lines +7127 to +7130
Ok(Some(SqlOption::TableSpace(TablespaceOption {
name,
storage,
})))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thinking similar to Union it would be good to have a reusable representation for this, maybe?

SqlOption::NamedKeyValue {
     name: Ident, // TABLESPACE
     key: Ident,
     value: Expr
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it's a little premature to generalize, not sure what future variations will look like.
(also, this type is different than others as it does not have the optional '=', see tablespace_option )

…eEngine into SqlOption::NamedParenthesizedList
@tomershaniii
Copy link
Contributor

I will have limited internet connectivity / dev time for the rest of this month, appreciate if we could get this PR merged today.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants