NameDivider is a tool for dividing Japanese full names into a family name and a given name.
input: 菅義偉 -> output: 菅 義偉
NameDivider divides names using statistical information about the kanji that appear in names.
Measured on a privately held dataset, its accuracy is 99.91%.
You can see how it works with this demo.
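To give a rough idea of what "statistical information of the kanji" can mean, here is a toy sketch. It is not NameDivider's actual algorithm. It assumes hypothetical per-kanji probabilities (`FAMILY_PROB`, `GIVEN_PROB`, invented values) of a character appearing in a family name or a given name. It then tries every split point and keeps the split whose characters best fit their assigned part.

```python
from typing import Dict, Tuple

# Hypothetical per-kanji statistics: how likely a character is to appear in a
# family name vs. a given name. Values are invented for illustration only.
FAMILY_PROB: Dict[str, float] = {"菅": 0.9, "義": 0.2, "偉": 0.1}
GIVEN_PROB: Dict[str, float] = {"菅": 0.1, "義": 0.8, "偉": 0.9}
DEFAULT_PROB = 0.5  # fallback for characters with no statistics


def score_split(family: str, given: str) -> float:
    """Average how well each character fits the part it was assigned to."""
    scores = [FAMILY_PROB.get(ch, DEFAULT_PROB) for ch in family]
    scores += [GIVEN_PROB.get(ch, DEFAULT_PROB) for ch in given]
    return sum(scores) / len(scores)


def divide_name(full_name: str) -> Tuple[str, str]:
    """Try every split point (both parts non-empty) and keep the best-scoring one."""
    if len(full_name) < 2:
        raise ValueError("need at least two characters to divide")
    candidates = [(full_name[:i], full_name[i:]) for i in range(1, len(full_name))]
    return max(candidates, key=lambda parts: score_split(*parts))


family, given = divide_name("菅義偉")
print(family, given)  # → 菅 義偉
```

With these made-up numbers, the split 菅 / 義偉 averages about 0.87 while 菅義 / 偉 averages about 0.67, so the first split wins. The real tool derives its statistics from name data rather than a hand-written table.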
pip install namedivider-python
It's simple to use.
from namedivider import BasicNameDivider, GBDTNameDivider
from pprint import pprint
# BasicNameDivider is fast, with an accuracy of 99.2%
basic_divider = BasicNameDivider()
divided_name = basic_divider.divide_name("菅義偉")
print(divided_name)
# 菅 義偉
# GBDTNameDivider is slower, with an accuracy of 99.9%
gbdt_divider = GBDTNameDivider()
divided_name = gbdt_divider.divide_name("菅義偉")
print(divided_name)
# 菅 義偉
pprint(divided_name.to_dict())
# {'algorithm': 'kanji_feature',
# 'family': '菅',
# 'given': '義偉',
# 'score': 0.7300634880343344,
# 'separator': ' '}
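The printed form and the `to_dict()` keys above suggest that a divided name is a small record. Printing it appears to join `family`, `separator` and `given`. The sketch below models that record with a dataclass. `DividedNameSketch` is a hypothetical stand-in used only to illustrate the fields, not the class NameDivider actually returns.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DividedNameSketch:
    """Hypothetical stand-in mirroring the fields shown by to_dict()."""
    family: str
    given: str
    separator: str = " "
    score: float = 0.0
    algorithm: str = ""

    def __str__(self) -> str:
        # Printing joins the two parts with the separator, e.g. "菅 義偉".
        return f"{self.family}{self.separator}{self.given}"

    def to_dict(self) -> dict:
        # Field order follows the dataclass definition.
        return asdict(self)


name = DividedNameSketch("菅", "義偉", score=0.7300634880343344, algorithm="kanji_feature")
print(name)  # → 菅 義偉
```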
For more advanced features, see here.
NameDivider API is a Docker container that provides a RESTful API for dividing Japanese full names into a family name and a given name.
I am developing NameDivider API to make NameDivider available to users of languages other than Python.
docker pull rskmoi/namedivider-api
- Run Docker Image
docker run -d --rm -p 8000:8000 rskmoi/namedivider-api
- Send HTTP request
curl -X POST -H "Content-Type: application/json" -d '{"names":["竈門炭治郎", "竈門禰豆子"]}' localhost:8000/divide
- Response
{
"divided_names":
[
{"family":"竈門","given":"炭治郎","separator":" ","score":0.3004587452426102,"algorithm":"kanji_feature"},
{"family":"竈門","given":"禰豆子","separator":" ","score":0.30480429696983175,"algorithm":"kanji_feature"}
]
}
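For callers that are not using curl, the same request can be sent with Python's standard library. This is a minimal sketch that assumes the container is running locally as in the `docker run` command (port 8000, `/divide` endpoint) and that the response has the `divided_names` shape shown above. `build_request`, `parse_response` and `divide_names` are hypothetical helper names.

```python
import json
import urllib.request
from typing import List, Tuple

API_URL = "http://localhost:8000/divide"  # assumes -p 8000:8000 as in the docker run example


def build_request(names: List[str], url: str = API_URL) -> urllib.request.Request:
    """Build the same JSON POST request as the curl example."""
    body = json.dumps({"names": names}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def parse_response(raw: bytes) -> List[Tuple[str, str]]:
    """Extract (family, given) pairs from the "divided_names" list."""
    payload = json.loads(raw)
    return [(item["family"], item["given"]) for item in payload["divided_names"]]


def divide_names(names: List[str]) -> List[Tuple[str, str]]:
    """Send the request and parse the reply (requires the container to be running)."""
    with urllib.request.urlopen(build_request(names)) as response:
        return parse_response(response.read())
```

Given the sample response above, `divide_names(["竈門炭治郎", "竈門禰豆子"])` should yield `[("竈門", "炭治郎"), ("竈門", "禰豆子")]`. Keeping request building and parsing separate lets both be checked without a running server.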
names is a list of undivided names. The maximum length of the list is 1000.
- If you require speed or want to use GBDTNameDivider, please try v0.2.0-beta.
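Because a single request accepts at most 1000 names, larger inputs have to be split across several requests. The sketch below shows one way to do that while preserving order. `chunk_names` and `MAX_NAMES_PER_REQUEST` are hypothetical names, and each yielded batch would be sent as its own request body.

```python
from typing import Iterator, List

MAX_NAMES_PER_REQUEST = 1000  # documented limit on the length of "names"


def chunk_names(names: List[str], size: int = MAX_NAMES_PER_REQUEST) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` names, preserving order."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(names), size):
        yield names[start:start + size]
```

For example, 2500 names produce batches of 1000, 1000 and 500, and concatenating the per-batch results restores the original order.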
Read namedivider/cli.py for more information.
$ nmdiv name 菅義偉
菅 義偉
$ nmdiv file undivided_names.txt
100%|███████████████████████████████████████████| 4/4 [00:00<00:00, 4194.30it/s]
原 敬
菅 義偉
阿部 晋三
中曽根 康弘
$ nmdiv accuracy divided_names.txt
100%|███████████████████████████████████████████| 5/5 [00:00<00:00, 3673.41it/s]
0.8
True: 滝 登喜男, Pred: 滝登 喜男
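The `accuracy` output above (0.8 for 5 names, with one mismatch listed) is consistent with exact-match accuracy: the share of names whose predicted division equals the reference division. The sketch below assumes that definition; it is not taken from `namedivider/cli.py`. The name lists are illustrative inputs, not the contents of `divided_names.txt`.

```python
from typing import List, Tuple


def exact_match_accuracy(
    truth: List[str], predicted: List[str]
) -> Tuple[float, List[Tuple[str, str]]]:
    """Return the share of exactly matching divisions plus the mismatched pairs."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    if not truth:
        raise ValueError("need at least one name")
    mistakes = [(t, p) for t, p in zip(truth, predicted) if t != p]
    return (len(truth) - len(mistakes)) / len(truth), mistakes


# Illustrative data: five reference divisions and predictions with one error.
truth = ["原 敬", "菅 義偉", "阿部 晋三", "中曽根 康弘", "滝 登喜男"]
predicted = ["原 敬", "菅 義偉", "阿部 晋三", "中曽根 康弘", "滝登 喜男"]
accuracy, mistakes = exact_match_accuracy(truth, predicted)
print(accuracy)  # → 0.8
for t, p in mistakes:
    print(f"True: {t}, Pred: {p}")  # → True: 滝 登喜男, Pred: 滝登 喜男
```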
MIT License
cc-by-sa-4.0
- English
(1) Purpose of use
family_name_repository.pickle may be used for commercial or non-commercial purposes when you use this software to divide names or to develop algorithms for dividing names.
Any other use of family_name_repository.pickle is prohibited.
(2) Liability
The author or copyright holder assumes no responsibility for the software.
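- Japanese (translated into English)
(1) Purpose of use
When this software is used for name division or for developing name-division algorithms, family_name_repository.pickle may be used for both commercial and non-commercial purposes.
Use of family_name_repository.pickle for any other purpose is prohibited.
(2) Liability
The author or copyright holder assumes no responsibility whatsoever with respect to family_name_repository.pickle.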
The family name data used in family_name_repository.pickle is provided by Myoji-Yurai.net (名字由来net).
- Porting Python to Rust