
Fuzzy regex matching #80

Open
thomasbird opened this issue Nov 25, 2020 · 1 comment

@thomasbird
Member

Due to typos or OCR errors, regex patterns may fail to match text they probably should match. For example, someone might type a capital O instead of a zero in a British postcode, where letters and digits are not usually interchangeable.

It might be interesting to allow regexes to be matched fuzzily, and the regex package supports this:
https://pypi.org/project/regex/#approximate-fuzzy-matching-hg-issue-12-hg-issue-41-hg-issue-109

We should investigate using it instead of the built-in re module.
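A minimal sketch of the fuzzy syntax, assuming the third-party `regex` package is installed (`pip install regex`). The simplified postcode pattern and the sample string are illustrative assumptions, not the project's real postcode detector. Wrapping a pattern as `(?:...){s<=1}` permits at most one substitution, and `fuzzy_counts` on the match reports (substitutions, insertions, deletions).

```python
import regex  # third-party: pip install regex

# Simplified UK postcode pattern (an assumption; real postcode rules are stricter).
POSTCODE = r"[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}"

# Without a fuzzy constraint, regex matches exactly like the built-in re module.
exact = regex.compile(POSTCODE)

# Wrapping the whole pattern in a group with {s<=1} allows at most one substitution.
fuzzy = regex.compile(rf"(?:{POSTCODE}){{s<=1}}")

text = "W1A OAX"  # capital O typed instead of zero

print(exact.fullmatch(text))       # None: 'O' is not a digit
m = fuzzy.fullmatch(text)
print(m.group(), m.fuzzy_counts)   # W1A OAX (1, 0, 0)
```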

@aCampello
Collaborator

Yes, that should be a really good approach. It seems regex is backwards compatible with re, so we can swap it in!

We have to figure out exactly how many errors to allow, perhaps defaulting to 0 to stay backwards compatible, but I can imagine every detector that detects RegexFilth having both an 'exact' regex and its approximate counterpart.
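One way the "exact plus approximate counterpart" idea could look, as a hedged sketch only: `FuzzyRegexDetector`, `max_errors` and `errors()` are hypothetical names, not scrubadub's API, and the third-party `regex` package is assumed. With the default `max_errors=0` no fuzzy pattern is built, so behaviour stays exactly as before; otherwise the exact pattern is tried first and the `{e<=n}` version (insertions, deletions or substitutions) only as a fallback.

```python
import regex  # third-party: pip install regex


class FuzzyRegexDetector:
    """Hypothetical sketch, not scrubadub's API: an exact regex plus an
    optional approximate counterpart tolerating up to max_errors edits."""

    def __init__(self, pattern, max_errors=0):
        self.max_errors = max_errors
        self.exact = regex.compile(pattern)
        # Default of 0 errors: no fuzzy pattern, identical to exact matching.
        self.approximate = None
        if max_errors > 0:
            # {e<=n} allows up to n insertions, deletions or substitutions in total.
            self.approximate = regex.compile(f"(?:{pattern}){{e<={max_errors}}}")

    def errors(self, candidate):
        """Return 0 for an exact match, the edit count of the fuzzy match
        found, or None if nothing matches."""
        if self.exact.fullmatch(candidate):
            return 0
        if self.approximate is None:
            return None
        m = self.approximate.fullmatch(candidate)
        # fuzzy_counts is (substitutions, insertions, deletions). Without the
        # regex.BESTMATCH flag this is the first match found, not necessarily
        # the one with the fewest edits.
        return sum(m.fuzzy_counts) if m else None


POSTCODE = r"[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}"  # simplified, an assumption
strict = FuzzyRegexDetector(POSTCODE)
lenient = FuzzyRegexDetector(POSTCODE, max_errors=1)
print(strict.errors("W1A OAX"))   # None
print(lenient.errors("W1A OAX"))  # 1
```

Trying the exact pattern first keeps the cheap, existing path for clean text and means a correct postcode is never reported as a fuzzy hit.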
