
Name scrubbing issues #19

Closed
srinivasgumdelli opened this issue Mar 9, 2016 · 2 comments

@srinivasgumdelli

I was testing out this impressive library, and my text files contained some names that start with a lowercase letter (for example, sarah). These names were not being filtered.

One more issue I found with version 1.0.3 is that

Hello. Please testing

will be replaced by

{{NAME}}. {{NAME}} testing

Thanks,
Sri
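
Both symptoms are what you would expect from a detector that leans on capitalization. The sketch below is a hypothetical toy, not scrubadub's actual name detector (which, judging by this thread, depends on textblob): it treats every capitalized word as a name, so it flags the sentence-initial Hello and Please while letting a lowercase name like sarah through. The function name toy_scrub_names is made up for illustration.

```python
import re

# Hypothetical toy detector, NOT scrubadub's real implementation:
# any word made of one uppercase letter followed by lowercase letters
# is treated as a name.
CAPITALIZED_WORD = re.compile(r"\b[A-Z][a-z]+\b")


def toy_scrub_names(text: str) -> str:
    """Replace every capitalized word with the {{NAME}} placeholder."""
    return CAPITALIZED_WORD.sub("{{NAME}}", text)


if __name__ == "__main__":
    # False positives: sentence-initial words look like names.
    print(toy_scrub_names("Hello. Please testing"))
    # False negative: a lowercase name is not caught at all.
    print(toy_scrub_names("I talked to sarah today"))
```

The first call prints {{NAME}}. {{NAME}} testing, matching the output reported above, and the second returns its input unchanged because sarah is never flagged.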

@deanmalmgren
Collaborator

Thanks for bringing this to our attention, @srinivasgumdelli! After digging around a bit, it appears that the problem with words like Hello and Please started with textblob version 0.10.1. I've pinned the textblob version to 0.10.0 which should address the Hello and Please issue you were having.

Lowercase names remain an issue, though, which I suspect will be better addressed by using machine learning techniques (#16) rather than strict natural language processing. I added a unit test for this so we can be sure to address it in a more robust way in the future. For now, we're skipping that unit test.

If you have any other suggestions for the package, please let me know!
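
A skipped test of the kind described above can be sketched with Python's standard unittest module. This is an assumption-laden illustration, not the project's actual test: toy_scrub_names is a hypothetical capitalization-only stand-in rather than scrubadub's API, and the class and test names are invented. The skipped test records the desired lowercase behavior so it can be re-enabled once detection improves.

```python
import re
import unittest


def toy_scrub_names(text):
    # Hypothetical stand-in for the scrubber under test; it only
    # catches capitalized words, so lowercase names slip through.
    return re.sub(r"\b[A-Z][a-z]+\b", "{{NAME}}", text)


class NameScrubbingTests(unittest.TestCase):
    def test_capitalized_name(self):
        self.assertEqual(toy_scrub_names("I met Sarah"), "I met {{NAME}}")

    @unittest.skip("lowercase names are not detected yet; see #16")
    def test_lowercase_name(self):
        # Documents the desired behavior; this would fail if run today.
        self.assertEqual(toy_scrub_names("I met sarah"), "I met {{NAME}}")


if __name__ == "__main__":
    unittest.main()
```

Using unittest.skip keeps the expected behavior visible in the test suite without turning the whole run red while the limitation stands.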

@deanmalmgren
Collaborator

Oh, and I just released version 1.1.0 of scrubadub, which should address this issue. You should be able to run pip install -U scrubadub, and hopefully that will take care of most of the problems you were having.
