fix regex catastrophic backtracking #269

Merged

NovemLinguae
Member

Fix #245

This new code can delete the wrong heading if someone puts a category in the wrong place (anywhere other than the bottom of the article), but I think that's an acceptable tradeoff for now. If it actually affects someone, we can make the solution more complex in a future patch.

@NovemLinguae NovemLinguae force-pushed the catastrophic-backtracking branch from 7b6ee32 to 5e7e3b9 Compare June 2, 2023 22:41
@NovemLinguae
Member Author

NovemLinguae commented Jun 15, 2023

Note to self. Algorithm idea to fix this last case ("someone puts a category in the wrong place (not at the bottom of the article)"). Will code this up later and add to patch (and will add some more test cases):

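// [\s\S] is used instead of . so the match can span newlines up to the end of the file
const match = wikitext.match(/\[\[:?Category:[\s\S]*$/);
// match() returns an array or null, so take the matched string ('' when there is no category)
let textBetweenFirstCategoryAndEndOfFile = match ? match[0] : '';
// delete categories from sampled text
textBetweenFirstCategoryAndEndOfFile = textBetweenFirstCategoryAndEndOfFile.replace(/\[\[:?Category:[^\]]+\]\]/g, '');
// does the non-category sample text have anything except whitespace?
const hasNonWhitespace = /\S/.test(textBetweenFirstCategoryAndEndOfFile);
if ( hasNonWhitespace ) {
    return;
}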

@NovemLinguae
Member Author

Note to self: maybe I can fix this by tweaking the existing regex. Look into some of the ideas in this article, in the "Possessive Quantifiers and Atomic Grouping to The Rescue" section:

https://www.regular-expressions.info/catastrophic.html
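One caveat with that idea: as far as I know, JavaScript regexes support neither possessive quantifiers (`a++`) nor atomic groups (`(?>...)`). A common workaround is a lookahead with a capture group followed by a backreference, `(?=(X))\1`, because the engine does not backtrack into a lookahead once it has matched. The sketch below is a generic illustration of that trick. The two patterns are made-up examples, not the regex from this patch.

```javascript
// Made-up example patterns; neither is taken from the patch.

// Nested quantifiers: on a long run of word characters followed by "!",
// the engine tries every way of splitting the run between iterations.
const naive = /^(\w+\s*)*$/;

// Same language, but \w+ is wrapped as (?=(\w+))\1, which behaves like an
// atomic group: once the lookahead has captured the run, it is never re-split.
const atomicLike = /^(?:(?=(\w+))\1\s*)*$/;

// Both patterns agree on ordinary input.
console.log( naive.test( 'foo bar baz' ) );
console.log( atomicLike.test( 'foo bar baz' ) );

// Only the atomic-like pattern is safe to run on a long non-matching input.
// (Do not try this input with `naive`; it may effectively never finish.)
const hostile = 'a'.repeat( 5000 ) + '!';
console.log( atomicLike.test( hostile ) );
```

The backreference costs one extra capture group, so any later group numbers in the pattern shift by one.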

@NovemLinguae NovemLinguae force-pushed the catastrophic-backtracking branch from b9dd73d to 51e7f73 Compare December 7, 2023 11:30
@NovemLinguae
Member Author

OK, I rewrote this and solved all the issues. Ready for review. The new algorithm should behave identically to the old one, but it is iterative instead of regex-based, so it has no catastrophic backtracking problems.
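For readers unfamiliar with the approach, here is a minimal sketch of what an iterative, regex-free scan of this kind could look like. It is not the merged code: the function names and the exact rules (one category link per line, `==`-style headings) are assumptions made for illustration. It walks upward from the end of the page, skips blank lines and category links, and removes the first remaining line if it is a heading with nothing under it.

```javascript
// Hypothetical sketch: names and exact rules are assumptions, not the merged code.

// True for an empty line or a line holding exactly one category link.
function isBlankOrCategory( line ) {
    const trimmed = line.trim();
    if ( trimmed === '' ) {
        return true;
    }
    const opens = trimmed.startsWith( '[[Category:' ) || trimmed.startsWith( '[[:Category:' );
    // The first "]]" must also be the end of the line, so only one link is present.
    return opens && trimmed.indexOf( ']]' ) === trimmed.length - 2;
}

// True for lines such as "== References ==", ignoring surrounding whitespace.
function isHeading( line ) {
    const trimmed = line.trim();
    return trimmed.length > 4 && trimmed.startsWith( '==' ) && trimmed.endsWith( '==' );
}

function removeTrailingEmptyHeading( wikitext ) {
    const lines = wikitext.split( '\n' );
    // Walk upward from the last line, skipping blank lines and category links.
    let i = lines.length - 1;
    while ( i >= 0 && isBlankOrCategory( lines[ i ] ) ) {
        i--;
    }
    // If the first substantive line is a heading, it has no body: drop it.
    if ( i >= 0 && isHeading( lines[ i ] ) ) {
        lines.splice( i, 1 );
    }
    return lines.join( '\n' );
}
```

Each line is visited at most once and `trim()` absorbs any amount of trailing whitespace, so a heading followed by 100 spaces costs no more than any other line.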

@NovemLinguae NovemLinguae merged commit 916c622 into wikimedia-gadgets:master Dec 22, 2023
2 checks passed
Successfully merging this pull request may close these issues.

Regex catastrophic backtracking when h2 followed by 100 spaces