Add Regex API which returns RzIterator instead of PVectors. #4852

Open
Rot127 opened this issue Jan 20, 2025 · 0 comments
Labels
performance A performance problem/enhancement refactor Refactoring requests RzUtil

Rot127 commented Jan 20, 2025

Is your feature request related to a problem? Please describe.

The regex API currently allocates RzPVectors and RzRegexMatch objects for its results. This is not optimal, because we could save those allocations: PCRE2 itself doesn't allocate its matches.

Describe the solution you'd like

Instead, we should add an implementation which returns an RzIterator.
The RzIterator->next could just call pcre2_match(), advance the offset into the buffer and return the RzRegexMatch object by value, on the stack.
RzRegexMatch is rather small, after all.
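
A rough sketch of that shape, to make the idea concrete. Everything named `Sketch*`/`sketch_*` is hypothetical and not part of RzUtil, and `strstr()` on a literal pattern only stands in for the `pcre2_match()` call so the example stays self-contained. The point is that the iterator state and each returned match live on the stack, with no heap allocation per match.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for RzRegexMatch: small enough to return by value. */
typedef struct {
	bool found;   /* false once the buffer is exhausted */
	size_t start; /* byte offset of the match in the buffer */
	size_t len;   /* length of the match in bytes */
} SketchMatch;

/* Hypothetical iterator state; it can live on the caller's stack. */
typedef struct {
	const char *buf;    /* NUL-terminated text being searched */
	const char *needle; /* literal pattern, stands in for a compiled regex */
	size_t offset;      /* where the next search starts */
} SketchMatchIter;

SketchMatchIter sketch_iter_init(const char *buf, const char *needle) {
	SketchMatchIter it = { buf, needle, 0 };
	return it;
}

/* The "next" step: search from the current offset, advance it past the
 * match and hand the match back by value. No heap allocation involved. */
SketchMatch sketch_iter_next(SketchMatchIter *it) {
	SketchMatch m = { false, 0, 0 };
	size_t nlen = strlen(it->needle);
	if (nlen == 0) {
		return m; /* avoid looping forever on empty matches */
	}
	/* A real implementation would call pcre2_match() here. */
	const char *hit = strstr(it->buf + it->offset, it->needle);
	if (!hit) {
		return m;
	}
	m.found = true;
	m.start = (size_t)(hit - it->buf);
	m.len = nlen;
	it->offset = m.start + nlen;
	return m;
}

/* Usage example: count matches without allocating anything. */
size_t sketch_count_matches(const char *buf, const char *needle) {
	SketchMatchIter it = sketch_iter_init(buf, needle);
	size_t n = 0;
	while (sketch_iter_next(&it).found) {
		n++;
	}
	return n;
}
```

A real version would presumably keep the compiled pattern and the PCRE2 match data in the iterator state and wire `sketch_iter_next` into RzIterator's next callback; how that maps onto the actual RzIterator interface is left open here.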

One problem is the grouping of multiple matches per buffer. Implementing match_all requires an additional index that indicates which match each RzRegexMatch belongs to.

Regex: "Te(x)t"
Possible groups: 0: Text, 1: x

Match all on: "TextTextText"

Gives: 
[
{0: Text, 1: x},
{0: Text, 1: x},
{0: Text, 1: x},
]

For our match_all implementation with RzIterators, we somehow have to get the index of the match that each {0: Text, 1: x} set belongs to.
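
One possible way to carry that index, sketched under assumptions: the iterator yields a flat stream of items and tags each one with a match index and a group index, so the consumer can regroup them. All `Sketch*`/`sketch_*` names are hypothetical, and a literal needle with one fixed sub-range stands in for the regex and its capture group (a real version would read the group offsets from PCRE2's output vector).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_GROUPS 2 /* group 0: whole match, group 1: one sub-group */

/* Hypothetical item type: a match plus the indices needed for grouping. */
typedef struct {
	bool found;       /* false once the buffer is exhausted */
	size_t match_idx; /* which match in the buffer this item belongs to */
	size_t group_idx; /* which group within that match */
	size_t start;     /* byte offset in the buffer */
	size_t len;       /* length in bytes */
} SketchGroupItem;

/* Hypothetical iterator state for match_all. */
typedef struct {
	const char *buf;     /* NUL-terminated text being searched */
	const char *needle;  /* literal stand-in for the regex, e.g. "Text" */
	size_t sub_off;      /* offset of group 1 inside the needle */
	size_t sub_len;      /* length of group 1 */
	size_t offset;       /* where the next search starts */
	size_t matches_seen; /* number of matches found so far */
	size_t match_start;  /* start of the match currently being yielded */
	size_t next_group;   /* SKETCH_GROUPS means "search for a new match" */
} SketchAllIter;

SketchAllIter sketch_all_init(const char *buf, const char *needle,
	size_t sub_off, size_t sub_len) {
	SketchAllIter it = { buf, needle, sub_off, sub_len, 0, 0, 0, SKETCH_GROUPS };
	return it;
}

SketchGroupItem sketch_all_next(SketchAllIter *it) {
	SketchGroupItem item = { false, 0, 0, 0, 0 };
	size_t nlen = strlen(it->needle);
	if (nlen == 0 || it->sub_off + it->sub_len > nlen) {
		return item; /* empty pattern or sub-group outside the pattern */
	}
	if (it->next_group >= SKETCH_GROUPS) {
		/* All groups of the previous match were yielded: find the next one. */
		const char *hit = strstr(it->buf + it->offset, it->needle);
		if (!hit) {
			return item;
		}
		it->match_start = (size_t)(hit - it->buf);
		it->offset = it->match_start + nlen;
		it->matches_seen++;
		it->next_group = 0;
	}
	item.found = true;
	item.match_idx = it->matches_seen - 1;
	item.group_idx = it->next_group;
	if (it->next_group == 0) {
		item.start = it->match_start;
		item.len = nlen;
	} else {
		item.start = it->match_start + it->sub_off;
		item.len = it->sub_len;
	}
	it->next_group++;
	return item;
}
```

With the buffer "TextTextText", needle "Text" and the sub-group at offset 2 with length 1, this yields six items whose `match_idx` runs 0, 0, 1, 1, 2, 2 and whose `group_idx` alternates 0, 1. The trade-off is two extra `size_t` fields per returned item; whether they belong in RzRegexMatch itself or in a wrapper is an open design question.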

Describe alternatives you've considered

None

Additional context
...

@Rot127 Rot127 added performance A performance problem/enhancement RzUtil labels Jan 20, 2025
@notxvilka notxvilka added the refactor Refactoring requests label Jan 21, 2025