-
Notifications
You must be signed in to change notification settings - Fork 916
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add cudf::strings::find_re API #16742
Add cudf::strings::find_re API #16742
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a pylibcudf test for find_re
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about stream test for the new API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding a test. I'll wait to approve until we decide whether to seperate find_re
and find_all
or combine them under a common file name as per this comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving python changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One question about the pylibcudf API / code location. Otherwise LGTM.
@@ -38,3 +38,35 @@ cpdef Column findall(Column input, RegexProgram pattern): | |||
) | |||
|
|||
return Column.from_libcudf(move(c_result)) | |||
|
|||
|
|||
cpdef Column find_re(Column input, RegexProgram pattern): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this go in a new file, or should this file be renamed? It seems incorrect to have find_re
defined in the findall
file, since they're different algorithms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was told to keep match these with the header files. So find_re
and findall
are both declared in findall.hpp
. They are common only in that both use regex I suppose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't realize find_re
was in findall.hpp
. Then yes, this is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(But it does raise the question of why find_re
is in findall.hpp
-- it doesn't "find all")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC they will be combined into a common file as per https://github.com/rapidsai/cudf/pull/16742/files#r1781466464
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The find_re
has more in common with findall
than just find
based on the parameters and how they are used just in not what they return.
Perhaps find_re()
would be better in contains.hpp
which has contains_re()
Since find.hpp
has contains()
and find()
(among other similar functions)
So contains.hpp
would have contains_re()
and find_re()
.
I could do that in a separate PR since this one is already approved.
/merge |
Description
Adds the
cudf::strings::find_re
andstr.find_re
API to libcudf/pylibcudf/cudf. This function returns the character position where the pattern first matches in each row of the input column. If a match is not found, -1 is returned for that corresponding row.Closes #16729
Checklist