Chunk the blacklists and watchlist to reduce regex recompiles upon reloading the blacklists and watchlist #13765

Merged


makyen
Contributor

@makyen makyen commented Oct 27, 2024

This PR does:

  1. Splits each of the watchlist, bad keywords, blacklisted websites, and blacklisted usernames Rules into two Rules: one holding the bulk of that list's entries and the other holding up to 100 entries. This dramatically reduces the computation time spent recompiling the regexes after a watchlist or blacklist change. Overall, it should cut the time spent on that task by an estimated 95%.
  2. Implements !!/scan-time and !!/scan-force-time, which add the elapsed time to the !!/scan output, showing how long the scan(s) took to execute.
  3. Adjusts some of the post titles for test cases in tests/test_findspam.py in order to make the test cases easier to identify in CI testing output.
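The split in item 1 can be illustrated with a minimal sketch, assuming that new entries are appended to the end of a list and that each chunk is compiled into one alternation regex. The names `CHUNK_SIZE`, `RegexCache`, `split_entries`, `build_rules`, and `matches` are hypothetical and are not SmokeDetector's code, and the cut point used here (the largest multiple of 100) is only one way to keep the second chunk small. The PR description does not say how the actual split point is chosen.

```python
import re
from typing import Dict, List, Pattern, Tuple

# Hypothetical split granularity: the tail chunk always holds fewer than 100 entries.
CHUNK_SIZE = 100


class RegexCache:
    """Reuses a compiled regex whenever its pattern string is unchanged."""

    def __init__(self) -> None:
        self._compiled: Dict[str, Pattern[str]] = {}
        self.compile_count = 0  # number of actual re.compile() calls made

    def get(self, pattern: str) -> Pattern[str]:
        if pattern not in self._compiled:
            self.compile_count += 1
            self._compiled[pattern] = re.compile(pattern, re.IGNORECASE)
        return self._compiled[pattern]


def split_entries(entries: List[str]) -> Tuple[List[str], List[str]]:
    """Split into a large, stable 'bulk' chunk and a small tail of recent entries."""
    cut = (len(entries) // CHUNK_SIZE) * CHUNK_SIZE
    return entries[:cut], entries[cut:]


def build_rules(entries: List[str], cache: RegexCache) -> List[Pattern[str]]:
    """Compile one alternation regex per non-empty chunk, reusing cached compiles."""
    rules = []
    for chunk in split_entries(entries):
        if chunk:  # an empty alternation would match every string, so skip it
            rules.append(cache.get("|".join(f"(?:{e})" for e in chunk)))
    return rules


def matches(rules: List[Pattern[str]], text: str) -> bool:
    """True if any chunk's regex matches somewhere in text."""
    return any(rule.search(text) for rule in rules)


if __name__ == "__main__":
    cache = RegexCache()
    entries = [f"spamword{i}" for i in range(250)]
    build_rules(entries, cache)        # compiles bulk (200 entries) and tail (50)
    entries.append(r"opjd(?<!d)")      # the entry used in the !!/watch timing test
    build_rules(entries, cache)        # bulk string unchanged: only the tail recompiles
    print(cache.compile_count)         # 3
```

With this layout, adding a watch or blacklist entry normally changes only the small tail pattern, so the expensive bulk regex is reused. The bulk regex recompiles only when the count crosses a multiple of 100, or when an entry inside the bulk chunk is edited or removed.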

Testing for this begins here, with the times for the final version tested starting here, and a baseline without the chunk/split of the BL/WL Rules after that, starting here. The notable difference is the time after the !!/watch opjd(?<!d) command: 14.439 seconds in the baseline, but 4.641 seconds with these changes. That 4.641 seconds is only slightly higher than the 4.491 to 4.619 seconds the scan takes when no change has required recompiling any of the regular expressions.
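The times above come from the elapsed-time output added by !!/scan-time. A minimal sketch of that kind of measurement, assuming a monotonic clock wrapped around the scan, is shown below. `timed`, `format_scan_output`, `scan_post`, and the "Elapsed time" wording are all hypothetical stand-ins, not SmokeDetector's actual implementation or reply format.

```python
import time
from typing import Any, Callable, List, Tuple


def timed(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Run func and return (result, elapsed seconds), measured with a monotonic clock."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def format_scan_output(lines: List[str], elapsed: float) -> str:
    """Append an elapsed-time line, as a hypothetical !!/scan-time reply might."""
    return "\n".join(lines + [f"Elapsed time: {elapsed:.3f} seconds"])


def scan_post(text: str) -> List[str]:
    # Hypothetical stand-in for the real scan: only checks for one keyword.
    return ["Would be caught" if "spam" in text else "Would not be caught"]


if __name__ == "__main__":
    result, elapsed = timed(scan_post, "cheap spam here")
    print(format_scan_output(result, elapsed))
```

`time.perf_counter()` is used rather than `time.time()` because it is monotonic and high-resolution, so wall-clock adjustments cannot distort the measured interval.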

@makyen makyen added the area: blacklists, area: spamchecks, area: CI testing, area: commands, and type: enhancement labels Oct 27, 2024
@makyen makyen force-pushed the Mak-chunk-BL-WL-to-reduce-regex-recompile branch from b119df0 to 69a48f7 October 27, 2024 13:54
@makyen
Contributor Author

makyen commented Oct 27, 2024

The force-push was only to rebase onto the current head, which makes the resulting commit tree look a bit cleaner. Previously, the parent was the last commit of the prior PR I had submitted and merged, rather than that PR's merge commit.

@makyen makyen merged commit 69b9a70 into Charcoal-SE:master Oct 28, 2024
3 checks passed
@makyen makyen deleted the Mak-chunk-BL-WL-to-reduce-regex-recompile branch October 28, 2024 20:35

1 participant