Measure scrubadub accuracy on real data #72
Comments
It would be useful to include that as a repeatable test.
Yes, agreed. We should do something similar to what we do for the fake data, where the test runs in the CI process for fast models (and offline for slower ones): https://travis-ci.org/github/LeapBeyond/scrubadub/jobs/741856233#L616
Thoughts:
This might be useful for the addresses: https://www.kaggle.com/openaddresses/openaddresses-europe. It has no UK data, but perhaps there is a related dataset that does.
For the UK locale, we could perhaps use the Companies House open database for addresses and postcodes: https://www.gov.uk/government/organisations/companies-house
We can already measure scrubadub accuracy with fake data (see #59). It would be good to find real data sources where this is also possible.
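As a rough illustration of what "accuracy" could mean on real data, here is a minimal sketch that scores predicted PII spans against hand-labelled ones at the character level. It does not call scrubadub: the names `span_of`, `covered` and `char_precision_recall` are hypothetical helpers, the example text is made up, and the predicted spans are assumed to have been extracted from the detector output by some other step.

```python
from typing import Iterable, Set, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def span_of(text: str, substring: str) -> Span:
    """Return the span of the first occurrence of substring in text."""
    start = text.index(substring)
    return start, start + len(substring)


def covered(spans: Iterable[Span]) -> Set[int]:
    """Set of character positions covered by any of the spans."""
    chars: Set[int] = set()
    for start, end in spans:
        chars.update(range(start, end))
    return chars


def char_precision_recall(true_spans, predicted_spans):
    """Character-level precision and recall of predicted vs. labelled spans."""
    truth = covered(true_spans)
    predicted = covered(predicted_spans)
    overlap = len(truth & predicted)
    # Treat an empty set as a perfect score to avoid dividing by zero.
    precision = overlap / len(predicted) if predicted else 1.0
    recall = overlap / len(truth) if truth else 1.0
    return precision, recall


# Made-up example: the labels mark a name and a postcode; the (assumed)
# detector output found the name but flagged the city instead of the postcode.
text = "Contact Jane Doe at 10 Downing Street, London SW1A 2AA."
true_spans = [span_of(text, "Jane Doe"), span_of(text, "SW1A 2AA")]
predicted_spans = [span_of(text, "Jane Doe"), span_of(text, "London")]

precision, recall = char_precision_recall(true_spans, predicted_spans)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Scoring per character rather than per span gives partial credit for partially detected addresses, which may or may not be what we want; a per-span score would be an equally reasonable choice.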