Error when using the DUTIR dictionary #5

AirFin · 2022-07-06T04:47:37Z

Code I ran:

import cntext as ct

text = '我今天得奖了，很高兴，我要将快乐分享大家。'

ct.sentiment(text=text,
             diction=ct.load_pkl_dict('DUTIR.pkl')['DUTIR'],
             lang='chinese')

Error:

Traceback (most recent call last):
  File "d:\PythonProject\test\test_cntext.py", line 5, in <module>
    ct.sentiment(text=text,
  File "D:\Miniconda3\envs\py38\lib\site-packages\cntext\stats.py", line 159, in sentiment
    jieba.add_word(w)
  File "D:\Miniconda3\envs\py38\lib\site-packages\jieba\__init__.py", line 426, in add_word
    word = strdecode(word)
  File "D:\Miniconda3\envs\py38\lib\site-packages\jieba\_compat.py", line 79, in strdecode
    sentence = sentence.decode('utf-8')
AttributeError: 'int' object has no attribute 'decode'

If I use a different dictionary instead of DUTIR, the code runs fine, for example:

import cntext as ct

text = '我今天得奖了，很高兴，我要将快乐分享大家。'

ct.sentiment(text=text,
             diction=ct.load_pkl_dict('HOWNET.pkl')['HOWNET'],
             lang='chinese')

Output:

{'deny_num': 0,
 'ish_num': 0,
 'more_num': 0,
 'neg_num': 0,
 'pos_num': 3,
 'very_num': 1,
 'stopword_num': 8,
 'word_num': 14,
 'sentence_num': 1}

hiDaDeng · 2022-07-06T08:21:08Z

The text in your data may contain purely numeric values or missing-value fields.

hiDaDeng · 2022-07-06T08:26:33Z

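Sorry, I didn't read the issue carefully. If replacing DUTIR with HowNet works, then the problem should be in the dictionary. In that case, first make sure the imported dictionary follows the standard dictionary format. The cntext repository includes a small example that shows the standard format.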
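
One way to check whether a dictionary follows the expected format is to scan it for entries that are not strings. The following is a minimal sketch, assuming the dictionary has the `{category: [words]}` shape shown in this thread; `find_non_string_entries` and the `sample` data are hypothetical and not part of cntext.

```python
def find_non_string_entries(diction):
    """Return (category, index, value) for every entry that is not a str."""
    problems = []
    for category, words in diction.items():
        for index, word in enumerate(words):
            if not isinstance(word, str):
                problems.append((category, index, word))
    return problems


# Hypothetical toy dictionary in the {category: [words]} shape.
sample = {"pos": ["高兴", "快乐"], "neg": ["难过", 520]}
print(find_non_string_entries(sample))
```

With cntext installed, the same helper could be applied to `ct.load_pkl_dict('DUTIR.pkl')['DUTIR']` to list any offending entries.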

AirFin · 2022-07-06T08:35:58Z

Hello, thank you for your reply. I made the changes you suggested, but the error persists.

My Python version: 3.8.5

My complete code is as follows:

import cntext as ct

d:\Miniconda3\envs\py38\lib\site-packages\numpy\_distributor_init.py:30: UserWarning: loaded more than 1 DLL from .libs:
d:\Miniconda3\envs\py38\lib\site-packages\numpy\.libs\libopenblas.4SP5SUA7CBGXUEOC35YP2ASOICYYEQZZ.gfortran-win_amd64.dll
d:\Miniconda3\envs\py38\lib\site-packages\numpy\.libs\libopenblas.EL2C6PLE4ZYW3ECEVIV3OXXGRN2NRFM2.gfortran-win_amd64.dll
  warnings.warn("loaded more than 1 DLL from .libs:"
d:\Miniconda3\envs\py38\lib\site-packages\gensim\similarities\__init__.py:15: UserWarning: The gensim.similarities.levenshtein submodule is disabled, because the optional Levenshtein package <https://pypi.org/project/python-Levenshtein/> is unavailable. Install Levenhstein (e.g. `pip install python-Levenshtein`) to suppress this warning.
  warnings.warn(msg)

print(ct.__version__)
# Load the pkl dictionary file
ct.load_pkl_dict('DUTIR.pkl')

1.7.4
Output exceeds the size limit. Open the full output data in a text editor
{'DUTIR': {'乐': ['急若流星',
   '最后一根稻草',
   '慌乱',
   '张皇',
   '心如悬旌',
   '鞋里长草－慌了脚',
   '紧急',
   '五色无主',
   '脚忙手乱',
   '仓卒应战',
   '缓不济急',
   '忡忡',
   '风声鹤唳',
   '心慌意乱',
   '心虚',
   '体力不支',
   '窘急',
   '惊慌失措',
   '惊慌',
   '发急',
   '心急火燎',
   '芒刺在背',
   '着慌',
   '心切',
   '手忙脚乱',
...
   '恰巧',
   '意出望外',
   '怨不得']},
 'Desc': '大连理工大学情感本体库，细粒度情感词典。含七大类情绪，依次是哀, 好, 惊, 惧, 乐, 怒, 恶',
 'Referer': '徐琳宏,林鸿飞,潘宇,等.情感词汇本体的构造[J]. 情报学报, 2008, 27(2): 180-185.'}

text = '我今天得奖了，很高兴，我要将快乐分享大家。'

ct.sentiment(text=text,
             diction=ct.load_pkl_dict('DUTIR.pkl')['DUTIR'],
             lang='chinese')

Output exceeds the size limit. Open the full output data in a text editor
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_9260/3488132061.py in <module>
      1 text = '我今天得奖了，很高兴，我要将快乐分享大家。'
      2 
----> 3 ct.sentiment(text=text,
      4              diction=ct.load_pkl_dict('DUTIR.pkl')['DUTIR'],
      5              lang='chinese')

d:\Miniconda3\envs\py38\lib\site-packages\cntext\stats.py in sentiment(text, diction, lang)
    157             senti_category_words = diction[senti_category]
    158             for w in senti_category_words:
--> 159                 jieba.add_word(w)
    160 
    161         sentence_num = len(cn_seg_sent(text))

d:\Miniconda3\envs\py38\lib\site-packages\jieba\__init__.py in add_word(self, word, freq, tag)
    424         """
    425         self.check_initialized()
--> 426         word = strdecode(word)
    427         freq = int(freq) if freq is not None else self.suggest_freq(word, False)
    428         self.FREQ[word] = freq

d:\Miniconda3\envs\py38\lib\site-packages\jieba\_compat.py in strdecode(sentence)
     77     if not isinstance(sentence, text_type):
...
---> 79             sentence = sentence.decode('utf-8')
     80         except UnicodeDecodeError:
     81             sentence = sentence.decode('gbk', 'ignore')

AttributeError: 'int' object has no attribute 'decode'
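
The traceback shows `jieba.add_word(w)` receiving an `int` from the dictionary's word lists. A possible workaround is to normalize the dictionary before passing it to `ct.sentiment`. This is only a sketch under that assumption, not the library's own fix; `clean_dictionary` is a hypothetical helper.

```python
import math


def clean_dictionary(diction):
    """Return a copy where every word is a str and missing values are dropped."""
    cleaned = {}
    for category, words in diction.items():
        kept = []
        for word in words:
            # Skip missing values (None or float NaN).
            if word is None or (isinstance(word, float) and math.isnan(word)):
                continue
            # Convert any remaining non-string entry (e.g. an int) to str.
            kept.append(word if isinstance(word, str) else str(word))
        cleaned[category] = kept
    return cleaned


# Hypothetical toy data standing in for a dictionary with bad entries.
raw = {"乐": ["高兴", 520, None, float("nan")]}
print(clean_dictionary(raw))
```

The cleaned result could then be passed as `diction=clean_dictionary(ct.load_pkl_dict('DUTIR.pkl')['DUTIR'])`.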

AirFin · 2022-07-06T09:39:02Z

I also created a new Python 3.7.9 environment and ran the same code, and it still raises the same error.

hiDaDeng · 2022-07-06T09:45:32Z

Update to 1.7.5

pip3 install cntext==1.7.6

AirFin · 2022-07-06T09:59:15Z

That solved it. Thank you very much!
