feat: scalar regex match physical expr #12270
Thank you for this PR, @zhuliquan. Have you run any benchmarks showing that this approach is noticeably faster than the existing one? It makes sense that it would be faster, since it does not recompile the regular expression for each batch, but I think it would help to quantify the difference.
Yeah, I'll add benchmarks.
Hello @alamb, I have compared my approach to the original.
I wonder whether we can see improvements on benchmark queries that use scalar regexes, e.g. ClickBench?
Hmm, that means we should first add some regex-matching queries to the benchmarks.
Which issue does this PR close?
Closes #11146.
Rationale for this change
This PR is the successor of PR #11455.
BinaryExpr compiles a literal regex pattern every time it evaluates a RecordBatch, and compiling a regex pattern can be expensive. In this approach, the literal regex pattern is compiled once and cached so it can be reused during execution. This avoids the repeated compile cost on each evaluation and speeds up execution.
What changes are included in this PR?
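As a rough sketch of the difference this change targets (not the PR's actual code), the snippet below contrasts compiling a pattern for every batch with compiling it once and reusing it. Everything here is hypothetical and simplified: `CompiledPattern`, `compile`, `eval_recompiling`, and `eval_cached` are illustrative names, batches are plain vectors of strings rather than a RecordBatch, and a substring check stands in for a real regex engine. The `compiles` counter only exists to make the number of compilations visible.

```rust
/// Toy stand-in for a compiled regex (hypothetical type). "Compiling" here
/// only stores the pattern; a real regex engine does far more work.
struct CompiledPattern {
    needle: String,
}

/// Pretends to compile `pattern`, bumping `compiles` so the cost is visible.
fn compile(pattern: &str, compiles: &mut usize) -> CompiledPattern {
    *compiles += 1;
    CompiledPattern { needle: pattern.to_string() }
}

impl CompiledPattern {
    /// Substring containment stands in for real regex matching.
    fn is_match(&self, value: &str) -> bool {
        value.contains(self.needle.as_str())
    }
}

/// Per-batch approach: the pattern is compiled again for every batch.
fn eval_recompiling(pattern: &str, batches: &[Vec<&str>], compiles: &mut usize) -> Vec<Vec<bool>> {
    let mut out = Vec::new();
    for batch in batches {
        let compiled = compile(pattern, compiles);
        out.push(batch.iter().map(|v| compiled.is_match(v)).collect::<Vec<bool>>());
    }
    out
}

/// Cached approach: the pattern is compiled once and reused for all batches.
fn eval_cached(pattern: &str, batches: &[Vec<&str>], compiles: &mut usize) -> Vec<Vec<bool>> {
    let compiled = compile(pattern, compiles);
    let mut out = Vec::new();
    for batch in batches {
        out.push(batch.iter().map(|v| compiled.is_match(v)).collect::<Vec<bool>>());
    }
    out
}

fn main() {
    let batches = vec![vec!["foo", "bar"], vec!["food"], vec!["baz"]];
    let (mut per_batch, mut cached) = (0, 0);
    let a = eval_recompiling("foo", &batches, &mut per_batch);
    let b = eval_cached("foo", &batches, &mut cached);
    // Both strategies must produce the same match results.
    assert_eq!(a, b);
    println!("compiles: per-batch = {per_batch}, cached = {cached}");
}
```

Both functions return the same results; only the number of compilations differs, which is the cost the cached expression is meant to remove.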
Add ScalarRegexMatchExpr to handle regex matching with a literal regex pattern.
Add PhysicalScalarRegexMatchExprNode to the proto to handle ScalarRegexMatchExpr, and add an arm for it in the functions parse_physical_expr and serialize_physical_expr.
Change the BinaryExpr arm in create_physical_expr so that it creates a ScalarRegexMatchExpr instead of a BinaryExpr when the right-hand side is a string literal expression and op is RegexMatch | RegexIMatch | RegexNotMatch | RegexNotIMatch.
Are these changes tested?
Yes, see the test module in scalar_regex_match.rs.
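The planner change described under "What changes are included in this PR?" can be pictured with the simplified sketch below. It is an illustration under stated assumptions, not DataFusion's real API: the `Expr`, `Operator`, and `PhysicalExpr` enums are hypothetical stand-ins, the `negated` and `case_insensitive` fields are invented for this example, and the real create_physical_expr has a different signature. Only the dispatch rule comes from the PR description: a regex operator with a string literal on the right-hand side becomes a scalar regex match expression, and everything else stays a binary expression.

```rust
/// Hypothetical, simplified operator set; `Eq` represents any non-regex operator.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Operator {
    RegexMatch,
    RegexIMatch,
    RegexNotMatch,
    RegexNotIMatch,
    Eq,
}

/// Hypothetical, simplified expression inputs.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    StringLiteral(String),
}

/// Hypothetical physical expressions produced by the planner sketch.
#[derive(Debug, PartialEq)]
enum PhysicalExpr {
    Binary { left: Expr, op: Operator, right: Expr },
    ScalarRegexMatch { input: Expr, pattern: String, negated: bool, case_insensitive: bool },
}

/// Picks the specialised expression only for regex operators whose
/// right-hand side is a string literal; otherwise falls back to Binary.
fn create_physical_expr(left: Expr, op: Operator, right: Expr) -> PhysicalExpr {
    let is_regex_op = matches!(
        op,
        Operator::RegexMatch
            | Operator::RegexIMatch
            | Operator::RegexNotMatch
            | Operator::RegexNotIMatch
    );
    match right {
        Expr::StringLiteral(pattern) if is_regex_op => PhysicalExpr::ScalarRegexMatch {
            input: left,
            pattern,
            // The "Not" variants negate the match result.
            negated: matches!(op, Operator::RegexNotMatch | Operator::RegexNotIMatch),
            // The "I" variants match case-insensitively.
            case_insensitive: matches!(op, Operator::RegexIMatch | Operator::RegexNotIMatch),
        },
        right => PhysicalExpr::Binary { left, op, right },
    }
}

fn main() {
    let col = Expr::Column("c1".to_string());
    let lit = Expr::StringLiteral("^foo".to_string());
    // Literal pattern with a regex operator: specialised expression.
    println!("{:?}", create_physical_expr(col.clone(), Operator::RegexIMatch, lit.clone()));
    // Non-literal right-hand side: stays a binary expression.
    println!("{:?}", create_physical_expr(col.clone(), Operator::RegexMatch, col.clone()));
    // Non-regex operator: stays a binary expression.
    println!("{:?}", create_physical_expr(col, Operator::Eq, lit));
}
```

Keeping the fallback arm as a plain binary expression means patterns that are only known at run time, such as a pattern read from another column, follow the existing code path unchanged.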
Are there any user-facing changes?