Candidate dataset for the tf-idf model is very large: is bigger always better? #9

Open
gekelly opened this issue Dec 3, 2019 · 0 comments

gekelly commented Dec 3, 2019

About my dataset: there are three class labels, each with more than 10,000 records. I need to extract 20-50 keywords for each label.
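A minimal sketch of per-label keyword extraction, assuming every record has already been segmented into a token list (for example with jieba, which is not called here). This is one possible scoring scheme, not the project's method: term frequency is pooled over all records of a label, IDF is `log(N / df)` over individual records, and the function name `top_keywords_per_label` is hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def top_keywords_per_label(
    docs: Iterable[Tuple[str, List[str]]], k: int = 20
) -> Dict[str, List[str]]:
    # Hypothetical helper: docs yields (label, tokens) pairs, tokens pre-segmented.
    term_counts: Dict[str, Counter] = defaultdict(Counter)  # label -> term counts
    doc_freq: Counter = Counter()  # term -> number of records containing it
    n_docs = 0
    for label, tokens in docs:
        term_counts[label].update(tokens)
        doc_freq.update(set(tokens))  # set(): a term counts once per record
        n_docs += 1

    keywords: Dict[str, List[str]] = {}
    for label, counts in term_counts.items():
        total = sum(counts.values())
        # TF = share of this label's tokens; IDF = log(N / df) over all records.
        scores = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keywords[label] = ranked[:k]
    return keywords
```

With `k` set anywhere from 20 to 50 this returns that many keywords per label. A term that occurs in every record (such as 的) gets an IDF of zero and sinks to the bottom of every list.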

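Because the dataset is so large, jieba word segmentation is slow and uses a lot of memory. On top of that, training tf-idf on it produces many negative examples (invalid keywords), which hurts the model's results. My current workaround is to filter by word frequency. But I'd like to ask: is the dataset too large, and does it need to be adjusted? If so, how?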
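The frequency-filtering workaround can be sketched as a document-frequency filter built in a single pass over a token stream, so the segmented corpus can be fed in as a generator instead of being held in memory as a whole. The name `build_vocab`, the thresholds `min_df`, `max_df_ratio` and `min_len`, and their defaults are illustrative assumptions; scikit-learn's `TfidfVectorizer` exposes comparable `min_df` and `max_df` parameters.

```python
from collections import Counter
from typing import Iterable, List, Set


def build_vocab(
    token_stream: Iterable[List[str]],
    min_df: int = 5,
    max_df_ratio: float = 0.5,
    min_len: int = 2,
    stopwords: frozenset = frozenset(),
) -> Set[str]:
    # Hypothetical helper: token_stream yields one token list per record.
    doc_freq: Counter = Counter()  # term -> number of records containing it
    n_docs = 0
    for tokens in token_stream:
        doc_freq.update(set(tokens))  # repeats within one record count once
        n_docs += 1

    max_df = max_df_ratio * n_docs
    # Keep terms that are neither too rare (noise, typos) nor too common
    # (function words shared by every label); the min_len cut against
    # single-character tokens is a heuristic assumption, not a rule.
    return {
        term
        for term, df in doc_freq.items()
        if min_df <= df <= max_df
        and len(term) >= min_len
        and term not in stopwords
    }
```

Tokens outside the returned set can then be dropped before TF-IDF scoring, which shrinks the vocabulary without discarding any records.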
