Fuzzy regex matching #80

thomasbird · 2020-11-25T12:21:38Z

Due to typos or OCR errors, regex patterns may not always match when they probably should, e.g. a capital O typed instead of a zero in a British postcode, where letters and digits are not usually interchangeable.

It might be interesting to allow regexes to be matched fuzzily, and the regex package allows this!
https://pypi.org/project/regex/#approximate-fuzzy-matching-hg-issue-12-hg-issue-41-hg-issue-109

We should investigate using it instead of the built-in re.
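
To make the idea concrete, here is a minimal sketch of the postcode case. The postcode pattern is a simplified shape I made up for illustration, not a full validator, and the sample text is invented. It assumes the third-party regex package is installed, and it only allows substitutions (`{s<=1}`) rather than insertions or deletions.

```python
import regex  # third-party package (pip install regex), not the stdlib re

# Simplified postcode shape, for illustration only (not a full validator).
POSTCODE = r"[A-Z]{1,2}[0-9][0-9A-Z]? ?[0-9][A-Z]{2}"

exact = regex.compile(rf"\b(?:{POSTCODE})\b")
# {s<=1} permits at most one substituted character inside the group.
fuzzy = regex.compile(rf"\b(?:{POSTCODE}){{s<=1}}\b")

text = "Send it to SW1A OAA please"  # capital O typed instead of zero

exact_match = exact.search(text)
fuzzy_match = fuzzy.search(text)

print(exact_match)               # None
print(fuzzy_match.group())       # SW1A OAA
print(fuzzy_match.fuzzy_counts)  # (1, 0, 0): substitutions, insertions, deletions
```

Restricting the fuzziness to substitutions keeps the match the same length as an exact one, which seems like a reasonable starting point for OCR-style errors; `{e<=1}` would also allow an inserted or dropped character.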

aCampello · 2020-11-25T23:10:45Z

Yes, that should be a really good approach. It seems regex is backwards compatible with re, so we can swap it in!

We have to figure out exactly how many errors to allow, and perhaps default to 0 to stay backwards compatible, but I can envisage every detector that detects RegexFilth having an 'exact' regex and its approximate counterpart.
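
Something like the following is what I have in mind, as a rough sketch only. `approximate` and `FuzzyRegexDetector` are hypothetical names for illustration, not existing classes in this project, and the five-digit pattern is just a stand-in. With the default of 0 errors the approximate pattern is identical to the exact one, so existing behaviour would not change.

```python
import regex  # third-party package (pip install regex)


def approximate(pattern: str, max_errors: int = 0) -> str:
    """Wrap a pattern in a fuzzy group; 0 errors returns it unchanged."""
    if max_errors < 0:
        raise ValueError("max_errors must be non-negative")
    if max_errors == 0:
        return pattern
    # {e<=N} allows up to N substitutions, insertions or deletions in total.
    return rf"(?:{pattern}){{e<={max_errors}}}"


class FuzzyRegexDetector:
    """Hypothetical detector holding an exact regex and its approximate twin."""

    def __init__(self, pattern: str, max_errors: int = 0):
        self.exact = regex.compile(pattern)
        self.approximate = regex.compile(approximate(pattern, max_errors))

    def find(self, text: str) -> list:
        # Matching always goes through the approximate pattern, which is
        # the same as the exact one when max_errors is 0.
        return [m.group() for m in self.approximate.finditer(text)]


strict = FuzzyRegexDetector(r"[0-9]{5}")
lenient = FuzzyRegexDetector(r"[0-9]{5}", max_errors=1)

print(strict.find("code 123O5"))   # []
print(lenient.find("code 123O5"))  # ['123O5']
```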
