Why are some words dropped during segmentation? (Resolved) #10

Open
GoogleCodeExporter opened this issue Mar 9, 2016 · 4 comments

@GoogleCodeExporter
Contributor

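After switching to the jar directory, I ran the built-in test directly:
java -jar jcseg-core-1.8.8.jar
The results were:
1. 叔叔亲了我妈妈也亲了我 ("Uncle kissed me, and Mom kissed me too")
   Segmentation result: 叔叔 亲了 妈妈 亲了
2. 我和你是好朋友 ("You and I are good friends")
   Segmentation result: 好朋友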

What version of the product are you using? On what operating system?
Operating system: Windows 8, 64-bit
Java version: 1.7.0_15


Original issue reported on code.google.com by [email protected] on 23 Jul 2013 at 3:32

@GoogleCodeExporter
Contributor Author

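Ha, this isn't a bug.

By default, jcseg's stop-word filtering is enabled, which is very useful in search and retrieval.

If you don't need this feature, set jcseg.clearstopword=0 in the jcseg.properties configuration file to turn it off.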
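To make the effect of this setting concrete, here is a minimal sketch of stop-word filtering applied to the two sentences from the report. It is not jcseg's code or API: the class `StopWordFilterSketch`, the five-word stop list, the hand-segmented token lists, and the assumption that any value other than `0` (including a missing key) means "enabled" are all illustrative. Only the key name `jcseg.clearstopword` and the value `0` come from this thread.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Properties;
import java.util.Set;

// Illustrative sketch only: this is NOT jcseg's implementation.
final class StopWordFilterSketch {

    // Hypothetical stop-word list, chosen to reproduce the outputs in the report.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("我", "也", "和", "你", "是"));

    // Parses properties text and reports whether stop words should be removed.
    // Assumption: a missing key or any value other than "0" means "enabled".
    static boolean clearStopWords(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return !"0".equals(props.getProperty("jcseg.clearstopword", "1").trim());
    }

    // Drops stop words from already-segmented tokens when filtering is on.
    static List<String> filter(List<String> tokens, boolean clear) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!clear || !STOP_WORDS.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hand-segmented tokens (an assumption, not real jcseg output).
        List<String> first = Arrays.asList("叔叔", "亲了", "我", "妈妈", "也", "亲了", "我");
        List<String> second = Arrays.asList("我", "和", "你", "是", "好朋友");

        boolean on = clearStopWords("");                        // key absent: filtering on
        boolean off = clearStopWords("jcseg.clearstopword=0");  // explicitly disabled

        System.out.println(String.join(" ", filter(first, on)));    // 叔叔 亲了 妈妈 亲了
        System.out.println(String.join(" ", filter(second, on)));   // 好朋友
        System.out.println(String.join(" ", filter(second, off)));  // 我 和 你 是 好朋友
    }
}
```

With filtering on, the pronouns and particles disappear and only the content words remain, which matches the output in the original report. With the setting at 0, every token is kept.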

Thanks for your feedback.

Original comment by [email protected] on 24 Jul 2013 at 10:56

  • Changed title: Why are some words dropped during segmentation? (Resolved)

@GoogleCodeExporter
Contributor Author

Haha, thanks for the reply, I understand now. This segmenter really is good.

Original comment by [email protected] on 28 Jul 2013 at 2:38

@GoogleCodeExporter
Contributor Author

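One more question, about polyphonic characters (characters with more than one reading).
For example: after 单田芳 is loaded, its pinyin comes out as: dan tian fang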

Original comment by [email protected] on 28 Jul 2013 at 2:42

@GoogleCodeExporter
Contributor Author

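This is a compound word produced by the name-recognition feature. Its pinyin is generated automatically and does not take polyphonic characters into account; after all, jcseg's main focus is segmentation.

Workaround:

Add this word as its own entry in the CJK_WORDS main lexicon, then give it the correct pinyin.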
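The idea behind this workaround can be sketched as a lookup in which a word-level entry takes precedence over pinyin assembled character by character. This is not jcseg's code or lexicon format: the class `PinyinOverrideSketch`, both maps, and the assumption that the default pinyin is built from per-character readings are hypothetical. The reading "shan" for 单 as a surname is the standard one.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: this is NOT jcseg's implementation.
final class PinyinOverrideSketch {

    // Hypothetical per-character default readings.
    static final Map<String, String> CHAR_PINYIN = new HashMap<>();
    // Hypothetical word-level entries, standing in for lexicon entries.
    static final Map<String, String> WORD_PINYIN = new HashMap<>();

    static {
        CHAR_PINYIN.put("单", "dan");
        CHAR_PINYIN.put("田", "tian");
        CHAR_PINYIN.put("芳", "fang");
    }

    // Prefers a word-level entry; otherwise joins per-character readings.
    static String pinyinOf(String word) {
        String override = WORD_PINYIN.get(word);
        if (override != null) {
            return override;
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < word.length(); ) {
            int codePoint = word.codePointAt(i);
            String ch = new String(Character.toChars(codePoint));
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(CHAR_PINYIN.getOrDefault(ch, "?")); // "?" marks an unknown character
            i += Character.charCount(codePoint);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(pinyinOf("单田芳")); // dan tian fang

        // Adding the whole name as its own entry fixes the surname reading.
        WORD_PINYIN.put("单田芳", "shan tian fang");
        System.out.println(pinyinOf("单田芳")); // shan tian fang
    }
}
```

Without a word-level entry, the default reading of each character is used, so the surname 单 comes out as "dan". Once the full name has its own entry, that entry wins.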

Original comment by [email protected] on 28 Jul 2013 at 4:01
