-
Notifications
You must be signed in to change notification settings - Fork 569
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
regex fails to match (bad trie optimization?) #22892
Comments
Something is borked in the AHO-CORASICK code. I'm surprised this wasn't noticed long ago. It looks to me like its bailing out of the AHO-CORASICK search too early. FTR: The general idea of AHO-CORASICK is that we build a fail table so that when we fail to find an accepting transition in a given state we fail to the next most appropriate state. We seem to be doing that, but then bailing out of the process too early. The fail transition to state 7, is simultaneously an accepting state for 'C' at ofs 2, but a continuing state of BCDE on the letter D at ofs 1. For some reason we are exiting right after we accept the C, without realizing we can go further. I need to dig into this, just leaving a record in case I cant return immediately to it. Simpler case:
|
…ases properly In some circumstances the AHO-CORASICK logic wasn't matching properly when there were two possibilities whose proper prefix matches a proper suffix of a third possibilty, and one of those possibilities was shorter than the other. This was because we were resetting the failed flag properly. This bug must be rare because it took more than a decade for anyone to notice. This patch fixes the problem by resetting the failed flag after a successful transition. A good example of this problem is as follows: "ABCDE" =~ m/ABCF|BCDE|C/ This should match 'BCDE' and not 'C'. Because of the flag issue we were matching 'C' instead. This fixes #22892
In some circumstances the AHO-CORASICK logic wasn't matching properly when there were two possibilities whose proper prefix matches a proper suffix of a third possibilty, and one of those possibilities was shorter than the other. This was because we were NOT resetting the 'failed' flag properly. This bug must be rare because it took more than a decade for anyone to notice. This patch fixes the problem by resetting the failed flag after a successful transition. A good example of this problem is as follows: "ABCDE" =~ m/ABCF|BCDE|C/ This should match 'BCDE' and not 'C'. Because of the flag issue we were matching 'C' instead. This fixes #22892
and add a bit of extra metadata to both. I used a cruder version of this to debug #22892
and add a bit of extra metadata to both. I used a cruder version of this to debug #22892
and add a bit of extra metadata to both. I used a cruder version of this to debug #22892
In some circumstances the AHO-CORASICK logic wasn't matching properly when there were two possibilities whose proper prefix matches a proper suffix of a third possibilty, and one of those possibilities was shorter than the other. This was because we were NOT resetting the 'failed' flag properly. This bug must be rare because it took more than a decade for anyone to notice. This patch fixes the problem by resetting the failed flag after a successful transition. A good example of this problem is as follows: "ABCDE" =~ m/ABCF|BCDE|C/ This should match 'BCDE' and not 'C'. Because of the flag issue we were matching 'C' instead. This fixes Perl#22892
and add a bit of extra metadata to both. I used a cruder version of this to debug Perl#22892
and add a bit of extra metadata to both. I used a cruder version of this to debug Perl#22892
Description
Some regexes involving alternations unexpectedly fail to match.
See also: Initial report by jwz and discussion at https://www.jwz.org/blog/2025/01/now-i-have-two-problems/.
"ABCDE" =~ m/ABCF|BCDE|C(G)/
fails"ABCDE" =~ m/ABCF|BCDE|CG/
succeeds"ABCDE" =~ m/ABCF|BCDE|C[Gg]/
fails"ABCDE" =~ m/ABCF|BCD[Ee]|C[Gg]/
succeedsAll of these should match.
The last perl version that handled this correctly was 5.8.9. Everything from 5.10 on seems to be broken, which points to the trie optimization introduced in 5.10.
Steps to Reproduce
Expected behavior
Perl configuration
The text was updated successfully, but these errors were encountered: