improve first last name split by using language specific rules #17

Open
chrished opened this issue Sep 16, 2022 · 1 comment
Labels
enhancement New feature or request

Comments

@chrished
Collaborator

For example, for Spanish names we currently get:

firstname : juan
lastname : gonzalez
middlename : eugenio iglesias

However, the main last name is iglesias (the first of the two surnames).
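A minimal sketch of a country-aware split. It assumes two things: the input behind the example above was "juan eugenio iglesias gonzalez", and the current rule is "first token = first name, last token = last name, everything in between = middle name". `split_name`, `TWO_SURNAME_COUNTRIES` and the `second_lastname` field are hypothetical names, and the country set is only illustrative:

```python
from typing import Dict, Optional

# Hypothetical, illustrative set of ISO 3166-1 alpha-2 codes for countries where
# people commonly carry two surnames (first surname = main last name).
TWO_SURNAME_COUNTRIES = {"ES", "MX", "AR", "CO", "CL", "PE"}


def split_name(full_name: str, country: Optional[str] = None) -> Dict[str, str]:
    """Split a full name into first, middle and last name parts.

    Default rule (assumed to mirror the current behaviour): first token is the
    first name, last token is the last name, everything in between is the
    middle name. For two-surname countries with at least four tokens, the
    second-to-last token becomes the main last name instead.
    """
    tokens = full_name.lower().split()
    parts = {"firstname": "", "middlename": "", "lastname": "", "second_lastname": ""}
    if not tokens:
        return parts
    parts["firstname"] = tokens[0]
    if len(tokens) == 1:
        return parts
    if country in TWO_SURNAME_COUNTRIES and len(tokens) >= 4:
        # Treat the last two tokens as first + second surname; keep both.
        parts["middlename"] = " ".join(tokens[1:-2])
        parts["lastname"] = tokens[-2]
        parts["second_lastname"] = tokens[-1]
    else:
        # Default rule. Three-token names in two-surname countries are ambiguous
        # (middle name + one surname, or two surnames), so they also land here.
        parts["middlename"] = " ".join(tokens[1:-1])
        parts["lastname"] = tokens[-1]
    return parts


if __name__ == "__main__":
    print(split_name("Juan Eugenio Iglesias Gonzalez"))
    print(split_name("Juan Eugenio Iglesias Gonzalez", country="ES"))
```

With the default rule this reproduces the output above (lastname gonzalez, middlename eugenio iglesias). Passing `country="ES"` makes iglesias the lastname and keeps gonzalez as a separate field, so nothing is thrown away if the second surname turns out to be useful for linking.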

Proposal: use https://nationalize.io to predict which country/language a name comes from, and implement language-specific splitting rules for those cases.
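A rough sketch of the country-prediction step. It assumes the nationalize.io JSON response contains a `country` list of `country_id`/`probability` entries. `fetch_nationalize`, `likely_country` and the `min_prob` cut-off are hypothetical. The idea is to fall back to the default split whenever the prediction is not confident. Which name to send (for example the last token) is still an open choice:

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

NATIONALIZE_URL = "https://api.nationalize.io/"


def fetch_nationalize(name: str) -> dict:
    """Query nationalize.io for a single name (network call, subject to rate limits)."""
    url = NATIONALIZE_URL + "?" + urllib.parse.urlencode({"name": name})
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def likely_country(response: dict, min_prob: float = 0.3) -> Optional[str]:
    """Return the most probable country code, or None if no prediction is confident enough.

    min_prob is a hypothetical cut-off; below it the caller should keep the
    language-agnostic default splitting rule.
    """
    candidates = response.get("country") or []
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c.get("probability", 0.0))
    if best.get("probability", 0.0) < min_prob:
        return None
    return best.get("country_id")
```

Usage would be along the lines of `likely_country(fetch_nationalize("iglesias"))`. The returned code (or None) then selects the splitting rule. The actual result depends on the live service, so predictions are probably worth caching per name.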

Caveat: for Spanish names, people sometimes give only their first last name and sometimes both, so it is not obvious how to handle this automatically.

@chrished chrished added the enhancement New feature or request label Sep 16, 2022
@f-hafner
Owner

More generally: improve gender assignment.

Moreover, from #21:

  • First names may be only one letter (an initial) -> we can use a dictionary-style lookup of the full name, based on the genderize names we already have, plus the census names.
  • For any given linked subsample, we can use the first name from the other dataset if it is spelled out.
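One possible reading of the two points above, sketched in Python. `resolve_first_name`, `female_share` and the `NameStats` table (name -> (count, share female)) are hypothetical. The table would have to be built from the genderize and census names. For a bare initial, the sketch pools all known names with that initial, weighted by count. That pooling is an assumption, not a settled approach:

```python
from typing import Dict, Optional, Tuple

# Hypothetical lookup table: first name -> (number of observations, share female),
# e.g. assembled from the genderize results and census name lists.
NameStats = Dict[str, Tuple[int, float]]


def resolve_first_name(first: str, linked_first: Optional[str] = None) -> str:
    """Prefer the spelled-out first name from the linked record when ours is only an initial."""
    first = first.strip().lower().rstrip(".")
    if len(first) == 1 and linked_first:
        candidate = linked_first.strip().lower()
        # Only borrow the linked name if it is spelled out and agrees with the initial.
        if len(candidate) > 1 and candidate.startswith(first):
            return candidate
    return first


def female_share(first: str, stats: NameStats) -> Optional[float]:
    """Share female for a first name; for a bare initial, pool all known names with that initial."""
    if len(first) > 1:
        entry = stats.get(first)
        return entry[1] if entry else None
    matches = [(n, p) for name, (n, p) in stats.items() if name.startswith(first)]
    total = sum(n for n, _ in matches)
    if total == 0:
        return None
    # Count-weighted average of the female share across matching full names.
    return sum(n * p for n, p in matches) / total
```

The linked-record lookup runs first, so the pooled estimate is only a fallback when no spelled-out name is available.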
