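My dataset has three class labels, each with more than 10,000 samples. I need to extract 20-50 keywords per label.
Because the dataset is so large, jieba segmentation is slow and uses a lot of memory. In addition, when I fit TF-IDF on the segmented text, many negative examples (invalid keywords) show up, which hurts model quality. My current workaround is to limit term frequency. Is the dataset too large, and does it need to be adjusted? If so, how should I do that?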