Error when using the DUTIR dictionary #5

Open
AirFin opened this issue Jul 6, 2022 · 6 comments

AirFin commented Jul 6, 2022

Code I ran:

import cntext as ct

# "I won an award today, I'm very happy, and I want to share the joy with everyone."
text = '我今天得奖了,很高兴,我要将快乐分享大家。'

ct.sentiment(text=text,
             diction=ct.load_pkl_dict('DUTIR.pkl')['DUTIR'],
             lang='chinese')

Error:

Traceback (most recent call last):
  File "d:\PythonProject\test\test_cntext.py", line 5, in <module>
    ct.sentiment(text=text,
  File "D:\Miniconda3\envs\py38\lib\site-packages\cntext\stats.py", line 159, in sentiment
    jieba.add_word(w)
  File "D:\Miniconda3\envs\py38\lib\site-packages\jieba\__init__.py", line 426, in add_word
    word = strdecode(word)
  File "D:\Miniconda3\envs\py38\lib\site-packages\jieba\_compat.py", line 79, in strdecode
    sentence = sentence.decode('utf-8')
AttributeError: 'int' object has no attribute 'decode'
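
The last frame shows jieba calling `.decode('utf-8')` on an `int`, which suggests that at least one word in the DUTIR dictionary is stored as a number rather than a string. A minimal sketch for locating such entries, assuming the dictionary maps each category name to a list of words; `find_non_str_entries` is a hypothetical helper, not part of cntext:

```python
from typing import Any, Dict, Iterable, List, Tuple


def find_non_str_entries(diction: Dict[str, Iterable[Any]]) -> List[Tuple[str, int, Any]]:
    """Return (category, position, value) for every word that is not a str.

    Hypothetical helper: assumes `diction` maps category names to lists of words.
    """
    problems = []
    for category, words in diction.items():
        for position, word in enumerate(words):
            if not isinstance(word, str):
                problems.append((category, position, word))
    return problems


if __name__ == "__main__":
    # Toy dictionary with one numeric entry, mimicking the suspected failure mode.
    sample = {"乐": ["快乐", "高兴"], "哀": ["难过", 123]}
    print(find_non_str_entries(sample))  # [('哀', 1, 123)]
```

With cntext installed, `find_non_str_entries(ct.load_pkl_dict('DUTIR.pkl')['DUTIR'])` would list the offending words; this has not been checked against the actual DUTIR file.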

If I use a different dictionary instead of DUTIR, the code runs normally, for example:

import cntext as ct

text = '我今天得奖了,很高兴,我要将快乐分享大家。'

ct.sentiment(text=text,
             diction=ct.load_pkl_dict('HOWNET.pkl')['HOWNET'],
             lang='chinese')

Output:

{'deny_num': 0,
 'ish_num': 0,
 'more_num': 0,
 'neg_num': 0,
 'pos_num': 3,
 'very_num': 1,
 'stopword_num': 8,
 'word_num': 14,
 'sentence_num': 1}

hiDaDeng (Owner) commented Jul 6, 2022 via email

hiDaDeng (Owner) commented Jul 6, 2022 via email

AirFin (Author) commented Jul 6, 2022

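Sorry, I didn't read the issue carefully. Since the code works when DUTIR is replaced with HOWNET, the problem is most likely in the dictionary. If it is a dictionary problem, first make sure the imported dictionary follows the unified dictionary format. The cntext repository has small example dictionaries that show the standard format.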
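
One way to act on this advice before calling `ct.sentiment` is to rebuild the dictionary so that every category maps to a list of non-empty strings. The target shape (category name to list of str) is an assumption about what the "unified format" means here, not cntext's documented schema, and `normalize_diction` is a hypothetical helper. It turns integers back into strings, on the guess that a numeric-looking word was stored as a number, and drops anything else that is not a string:

```python
from typing import Any, Dict, Iterable, List


def normalize_diction(diction: Dict[Any, Iterable[Any]]) -> Dict[str, List[str]]:
    """Return a copy of `diction` in which every word is a non-empty, stripped str.

    Hypothetical helper; the category -> list-of-str shape is an assumption.
    """
    cleaned: Dict[str, List[str]] = {}
    for category, words in diction.items():
        kept: List[str] = []
        for word in words:
            # Guess: a numeric-looking word was stored as an int (bools excluded).
            if isinstance(word, int) and not isinstance(word, bool):
                word = str(word)
            if isinstance(word, str) and word.strip():
                kept.append(word.strip())
            # Anything else (None, floats, bools, nested lists, ...) is dropped.
        cleaned[str(category)] = kept
    return cleaned


if __name__ == "__main__":
    raw = {"乐": ["快乐", " 高兴 ", 123, None, ""]}
    print(normalize_diction(raw))  # {'乐': ['快乐', '高兴', '123']}
```

Hypothetical usage: `ct.sentiment(text=text, diction=normalize_diction(ct.load_pkl_dict('DUTIR.pkl')['DUTIR']), lang='chinese')`.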

Hello, thanks for your reply. I made changes following your suggestion, but it still raises an error.

My Python version: 3.8.5

My full code:

import cntext as ct 
d:\Miniconda3\envs\py38\lib\site-packages\numpy\_distributor_init.py:30: UserWarning: loaded more than 1 DLL from .libs:
d:\Miniconda3\envs\py38\lib\site-packages\numpy\.libs\libopenblas.4SP5SUA7CBGXUEOC35YP2ASOICYYEQZZ.gfortran-win_amd64.dll
d:\Miniconda3\envs\py38\lib\site-packages\numpy\.libs\libopenblas.EL2C6PLE4ZYW3ECEVIV3OXXGRN2NRFM2.gfortran-win_amd64.dll
  warnings.warn("loaded more than 1 DLL from .libs:"
d:\Miniconda3\envs\py38\lib\site-packages\gensim\similarities\__init__.py:15: UserWarning: The gensim.similarities.levenshtein submodule is disabled, because the optional Levenshtein package <https://pypi.org/project/python-Levenshtein/> is unavailable. Install Levenhstein (e.g. `pip install python-Levenshtein`) to suppress this warning.
  warnings.warn(msg)
print(ct.__version__)
# Load the pkl dictionary file
ct.load_pkl_dict('DUTIR.pkl')
1.7.4
{'DUTIR': {'乐': ['急若流星',
   '最后一根稻草',
   '慌乱',
   '张皇',
   '心如悬旌',
   '鞋里长草-慌了脚',
   '紧急',
   '五色无主',
   '脚忙手乱',
   '仓卒应战',
   '缓不济急',
   '忡忡',
   '风声鹤唳',
   '心慌意乱',
   '心虚',
   '体力不支',
   '窘急',
   '惊慌失措',
   '惊慌',
   '发急',
   '心急火燎',
   '芒刺在背',
   '着慌',
   '心切',
   '手忙脚乱',
...
   '恰巧',
   '意出望外',
   '怨不得']},
 'Desc': '大连理工大学情感本体库,细粒度情感词典。含七大类情绪,依次是哀, 好, 惊, 惧, 乐, 怒, 恶',
 'Referer': '徐琳宏,林鸿飞,潘宇,等.情感词汇本体的构造[J]. 情报学报, 2008, 27(2): 180-185.'}
text = '我今天得奖了,很高兴,我要将快乐分享大家。'

ct.sentiment(text=text,
             diction=ct.load_pkl_dict('DUTIR.pkl')['DUTIR'],
             lang='chinese')
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_9260/3488132061.py in <module>
      1 text = '我今天得奖了,很高兴,我要将快乐分享大家。'
      2 
----> 3 ct.sentiment(text=text,
      4              diction=ct.load_pkl_dict('DUTIR.pkl')['DUTIR'],
      5              lang='chinese')

d:\Miniconda3\envs\py38\lib\site-packages\cntext\stats.py in sentiment(text, diction, lang)
    157             senti_category_words = diction[senti_category]
    158             for w in senti_category_words:
--> 159                 jieba.add_word(w)
    160 
    161         sentence_num = len(cn_seg_sent(text))

d:\Miniconda3\envs\py38\lib\site-packages\jieba\__init__.py in add_word(self, word, freq, tag)
    424         """
    425         self.check_initialized()
--> 426         word = strdecode(word)
    427         freq = int(freq) if freq is not None else self.suggest_freq(word, False)
    428         self.FREQ[word] = freq

d:\Miniconda3\envs\py38\lib\site-packages\jieba\_compat.py in strdecode(sentence)
     77     if not isinstance(sentence, text_type):
...
---> 79             sentence = sentence.decode('utf-8')
     80         except UnicodeDecodeError:
     81             sentence = sentence.decode('gbk', 'ignore')

AttributeError: 'int' object has no attribute 'decode'
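
The frames above show the path: `sentiment()` passes every word to `jieba.add_word`, which calls jieba's `strdecode`. Judging from the traceback lines 77 to 81, `strdecode` leaves `str` values untouched and treats anything else as bytes to decode. A simplified re-creation written only from those visible lines (line 78 is elided and assumed to be a `try:`; jieba's real function may do more) shows why an `int` fails while `str` and `bytes` succeed:

```python
def strdecode_sketch(sentence):
    """Simplified re-creation of the branch visible in jieba/_compat.py's traceback.

    Not jieba's actual source: the elided line 78 is assumed to be `try:`.
    """
    if not isinstance(sentence, str):
        try:
            # Only bytes-like objects have .decode(); an int raises AttributeError here.
            sentence = sentence.decode("utf-8")
        except UnicodeDecodeError:
            sentence = sentence.decode("gbk", "ignore")
    return sentence


if __name__ == "__main__":
    print(strdecode_sketch("快乐"))                  # str passes through unchanged
    print(strdecode_sketch("快乐".encode("utf-8")))  # bytes are decoded to str
    try:
        strdecode_sketch(123)
    except AttributeError as exc:
        print(exc)  # 'int' object has no attribute 'decode'
```

Only `UnicodeDecodeError` is caught, so the `AttributeError` from an `int` propagates up through `add_word` and `sentiment()`, matching the traceback.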

AirFin (Author) commented Jul 6, 2022

I also created a new Python 3.7.9 environment and ran the same code, but I still get the same error.

hiDaDeng (Owner) commented Jul 6, 2022

Update cntext to 1.7.5 or later:

pip3 install cntext==1.7.6
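
After upgrading, it can help to confirm that the environment actually picked up the new release (the earlier run printed `1.7.4` for `ct.__version__`). A minimal sketch that compares plain dotted version strings; `version_at_least` is a hypothetical helper and assumes purely numeric components such as `1.7.6` (pre-release tags like `1.8.0rc1` would need `packaging.version.Version` instead):

```python
def version_at_least(installed: str, required: str) -> bool:
    """Return True if `installed` >= `required` for dotted numeric versions.

    Hypothetical helper: assumes every component is an integer, e.g. "1.7.6".
    """
    def as_tuple(version: str) -> tuple:
        return tuple(int(part) for part in version.strip().split("."))

    a, b = as_tuple(installed), as_tuple(required)
    # Pad the shorter tuple with zeros so "1.7" compares equal to "1.7.0".
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    print(version_at_least("1.7.4", "1.7.5"))  # False (the version reported earlier)
    print(version_at_least("1.7.6", "1.7.5"))  # True
```

With cntext imported, `version_at_least(ct.__version__, "1.7.5")` should return True once the upgrade has taken effect; in Jupyter, restarting the kernel may be needed so the new version is the one imported.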

AirFin (Author) commented Jul 6, 2022

That solved it. Thank you very much!
