Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add #[recursive] #1522

Open
wants to merge 13 commits into
base: main
Choose a base branch
from
Open

Add #[recursive] #1522

wants to merge 13 commits into from

Conversation

blaginin
Copy link

@blaginin blaginin commented Nov 14, 2024

Closes #984, related to apache/datafusion#9375 (comment)

Todo:

  • Test the performance overhead on large queries. If it's too high, move it to a separate feature to allow opting out

@blaginin
Copy link
Author

blaginin commented Nov 14, 2024

I was wondering if we should remove RecursionCounter with this PR. In my opinion, we shouldn't, because the ability to limit max recursion might be useful for some users still

@blaginin
Copy link
Author

blaginin commented Nov 15, 2024

It seems this PR doesn't have any significant performance impact.

critcmp main recursive                        
group                                                    main                                   recursive
-----                                                    ----                                   ---------
sqlparser-rs parsing benchmark/format_large_statement    1.00   309.1±12.72µs        ? ?/sec    1.01    312.4±7.45µs        ? ?/sec
sqlparser-rs parsing benchmark/parse_large_statement     1.02      6.3±0.35ms        ? ?/sec    1.00      6.2±0.40ms        ? ?/sec
sqlparser-rs parsing benchmark/parse_sql_large_query     1.00      6.1±0.23ms        ? ?/sec  
sqlparser-rs parsing benchmark/sqlparser::select         1.02  1893.8±82.19ns        ? ?/sec    1.00  1855.1±25.08ns        ? ?/sec
sqlparser-rs parsing benchmark/sqlparser::with_select    1.03     11.7±1.20µs        ? ?/sec    1.00     11.4±0.28µs        ? ?/sec

@blaginin blaginin marked this pull request as ready for review November 15, 2024 19:22
@blaginin
Copy link
Author

FYI, I marked this ready for review. @peter-toth @Eason0729, if you guys want to take a look 🌻

@iffyio
Copy link
Contributor

iffyio commented Nov 16, 2024

@blaginin thanks for looking to fix this!

Currently the preference is to avoid a third-party dependency for this issue, ideally fixing up the parser behavior instead to properly handle deeply nested input. See comment here for a bit more context on rationale
From a quick look at recursive it seems to be a procmacro on top of the stacker library so that the same considerations should apply I imagine.

Copy link
Contributor

@Eason0729 Eason0729 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @blaginin. LGTM except for some number in unit test.

I left some comments, and any number above 40000 seems to overflow stack without #[recursive].

@@ -42,6 +42,46 @@ fn basic_queries(c: &mut Criterion) {
group.bench_function("sqlparser::with_select", |b| {
b.iter(|| Parser::parse_sql(&dialect, with_query));
});

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For large_statement, making separated test would make differentiating potential(future) regression easier.
It's a suggestion(fine to leave it as it is).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understood your comment, sorry. I added tests for parsing large statements in tests/sqlparser_common.rs. Do you think we should test something else?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thinks it would be better to add as separated test.

tests/sqlparser_common.rs Show resolved Hide resolved
src/ast/visitor.rs Show resolved Hide resolved
@peter-toth
Copy link

peter-toth commented Nov 16, 2024

Currently the preference is to avoid a third-party dependency for this issue, ideally fixing up the parser behavior instead to properly handle deeply nested input. See comment here for a bit more context on rationale From a quick look at recursive it seems to be a procmacro on top of the stacker library so that the same considerations should apply I imagine.

@iffyio, we had done some research on stacker / recursive in apache/datafusion#13310 to verify concerns before we added it to datafusion: apache/datafusion#13310 (review) / apache/datafusion#13177 (comment)

@Eason0729
Copy link
Contributor

Currently the preference is to avoid a third-party dependency for this issue, ideally fixing up the parser behavior instead to properly handle deeply nested input. See comment here for a bit more context on rationale From a quick look at recursive it seems to be a procmacro on top of the stacker library so that the same considerations should apply I imagine.

@iffyio, we had done some research on stacker / recursive in apache/datafusion#13310 to verify concerns before we added it to datafusion: apache/datafusion#13310 (review) / apache/datafusion#13177 (comment)

Sorry for missing that context when reviewing. So we would like to avoid using recursive for stablility, maybe try...

  1. vender the code from recursive
  2. use stacker directly, like fix: add stacker and maybe_grow on recursion guard #1468

@blaginin
Copy link
Author

Thanks for the review!! 🙂

Sorry for missing that context when reviewing. So we would like to avoid using recursive for stablility, maybe try...

We ended up using #[recursive] in Datafusion, as Peter highlighted. Feels like it’s good to stay consistent across projects?

@iffyio
Copy link
Contributor

iffyio commented Nov 17, 2024

Ah I see, thanks for the context @peter-toth!

cc @alamb for overall thoughts on adding this dependency to sqlparser?

@alamb
Copy link
Contributor

alamb commented Nov 24, 2024

Ah I see, thanks for the context @peter-toth!

cc @alamb for overall thoughts on adding this dependency to sqlparser?

While adding new dependencies in general ls 🤮 I don't think there is any viable alternative in this case

We have tried to avoid doing something like stacker for several years with sqlparser and datafusion, but I feel like we have only been able to band aid over the problem, not fix the problem once and for all.

I am hopeful that if we adopt this particular crate we won't have to worry about it again 🤞

@alamb
Copy link
Contributor

alamb commented Nov 24, 2024

I think this PR needs a bit more documentation and we shoudl figure out how to rationalize with the existing recursion_limit argument.

https://docs.rs/sqlparser/latest/sqlparser/parser/struct.Parser.html#method.with_recursion_limit

@blaginin
Copy link
Author

@alamb, I feel like recursion_limit should stay because:

  • recursive doesn't solve the issue when std isn’t enabled.
  • Removing it would be a breaking change.
  • It can be a useful feature even with recursive protection, for example, if lib users have additional constraints on the parsed query.

I added notes in the methods I touched; should be better now ✋

Copy link
Contributor

@Eason0729 Eason0729 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Eason0729
Copy link
Contributor

It seems like we reached the decision to add recursive instead of using underlying dependency(stacker).

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it looks good -- any other thoughts @iffyio

Thank you @blaginin and @Eason0729 for pushing this along

@blaginin
Copy link
Author

blaginin commented Dec 2, 2024

Just for transparency, there's apache/datafusion#13513 raised in Datafusion but I believe it shouldn't be the reason not to merge this one (happy to be challenged)

@alamb
Copy link
Contributor

alamb commented Dec 2, 2024

Just for transparency, there's apache/datafusion#13513 raised in Datafusion but I believe it shouldn't be the reason not to merge this one (happy to be challenged)

Maybe we could make it an optional dependency 🤔

@blaginin
Copy link
Author

blaginin commented Dec 2, 2024

nice idea actually! will do

Copy link
Contributor

@iffyio iffyio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

README.md Outdated Show resolved Hide resolved
@blaginin
Copy link
Author

blaginin commented Dec 6, 2024

resolved conflicts, should be good to merge now 🤗

@alamb
Copy link
Contributor

alamb commented Dec 11, 2024

Given the potential for unintended side effects with this change, I think we should merge it in asap after we have released

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Stack overflow in Statement::to_string for deeply nested expresions
5 participants