Documentation: https://5uperpalo.github.io/surname_heritage_classifier/
A hobby project that classifies surnames by country and world region. It is an attempt at an open-source alternative to paid services:
- https://nationalize.io/our-data
- https://namsor.app/
- https://forebears.io/onograph/
- https://census.name/
  - ~1000 EUR for their database
used data:
- data/name_dataset
- data/annotated_names_NamePrism.tsv
- Kaggle surname-dataset-classification
- data/final_all_names_code.csv
- data/name2lang.txt
aggregated data:
code based on:
other ideas:
- query names and their origin countries from wiki (see https://opendata.stackexchange.com/a/13199)
- possibly get more surnames from https://en.wiktionary.org/wiki/Appendix:Names
- rerun the data gathering from wiki-nationality-estimate
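
A minimal sketch of the first idea, assuming "wiki" can mean Wikidata and its public SPARQL endpoint. It is not code from this repository. The function names are hypothetical; `P734` (family name) and `P27` (country of citizenship) are Wikidata properties, and country of citizenship is only a rough proxy for surname origin. The sketch builds the query and request URL and parses a response in the standard SPARQL JSON results format. It does not send the request, and a large `limit` may time out on the public endpoint.

```python
from urllib.parse import urlencode

# Assumption: the public Wikidata SPARQL endpoint.
WIKIDATA_SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def build_surname_country_query(limit: int = 100) -> str:
    """Return a SPARQL query for (family name, country of citizenship) pairs."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    # P734 = family name, P27 = country of citizenship.
    return f"""
SELECT ?surnameLabel ?countryLabel WHERE {{
  ?person wdt:P734 ?surname ;
          wdt:P27 ?country .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {limit}
""".strip()


def build_request_url(query: str) -> str:
    """Encode the query as a GET URL that asks for JSON results."""
    params = urlencode({"query": query, "format": "json"})
    return f"{WIKIDATA_SPARQL_ENDPOINT}?{params}"


def parse_bindings(payload: dict) -> list[tuple[str, str]]:
    """Extract (surname, country) pairs from a SPARQL JSON results payload."""
    pairs = []
    for row in payload.get("results", {}).get("bindings", []):
        surname = row.get("surnameLabel", {}).get("value")
        country = row.get("countryLabel", {}).get("value")
        # Skip rows where either label is missing.
        if surname and country:
            pairs.append((surname, country))
    return pairs
```

The URL from `build_request_url(build_surname_country_query(50))` can be fetched with any HTTP client, and the decoded JSON passed to `parse_bindings`.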