RSS feeds in Chapter 4 (Naive Bayes) no longer work #648

Closed
AIkikaze opened this issue Jan 3, 2024 · 2 comments

AIkikaze commented Jan 3, 2024

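Problem description

The third small experiment in Chapter 4 (Naive Bayes) uses the feedparser module to parse two RSS feeds and obtain text data. When I tested it, the links no longer worked and the resulting text list was empty.

Opening the site link in a browser shows the following: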

Your request has been blocked.

If you have questions, please contact us.
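
A quick way to confirm this from code, rather than from the browser, is to inspect the parsed result before using it. The sketch below is only an illustration: `describe_feed` is a hypothetical helper name, and it assumes the result behaves like a dict that may carry the `entries`, `bozo`, and `status` keys that feedparser documents (`status` is only present when the feed was fetched over HTTP).

```python
def describe_feed(feed):
    """Return a list of reasons why a parsed feed looks unusable.

    `feed` is any dict-like object shaped like the result of
    feedparser.parse(); missing keys are tolerated.
    """
    problems = []

    # HTTP status is only set when the feed was fetched from a remote server.
    status = feed.get("status")
    if status is not None and status >= 400:
        problems.append(f"HTTP status {status}")

    # feedparser sets `bozo` to a truthy value when it hit a parsing problem.
    if feed.get("bozo"):
        problems.append("parser flagged the feed (bozo)")

    # An empty entry list is what makes localWords() produce empty output.
    if not feed.get("entries"):
        problems.append("no entries")

    return problems


# Intended use with a real feed (requires feedparser and network access):
#   import feedparser
#   print(describe_feed(feedparser.parse(url)))
```

An empty list means the feed looks usable; anything else explains why the text list would come back empty.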

Affected resource

Chapter 4: Naive Bayes algorithm

Screenshot of the problem location

(screenshot: bayes_issue)

Self-test code

def localWords(feed1, feed0):
    docList = []
    classList = []
    fullText = []
    minLen = min(len(feed1["entries"]), len(feed0["entries"]))

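    # 1. Collect the text and gather statistics
    for i in range(minLen):
        # Class 1: read one entry from the first RSS feed per iteration
        wordList = textParse(feed1["entries"][i]["summary"])
        docList.append(wordList)
        fullText.extend(wordList)
        classList.append(1)
        # Class 0: read one entry from the second RSS feed per iteration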
        wordList = textParse(feed0["entries"][i]["summary"])
        docList.append(wordList)
        fullText.extend(wordList)
        classList.append(0)
    vocabList = bayes.createVocabList(docList)
    top30Words = calMostFreq(vocabList, fullText)

    print(f"Collected documents:\n{docList}")
    print(f"Vocabulary list:\n{vocabList}")

if __name__ == "__main__":
    import feedparser as fp # type: ignore
    ny = fp.parse('http://newyork.craigslist.org/stp/index.rss')
    sf = fp.parse('http://sfbay.craigslist.org/stp/index.rss')
    localWords(ny, sf)

Output

(py38) D:\PROJECT\ml>C:/tools/Anaconda3/envs/py38/python.exe d:/PROJECT/ml/4_bayes/rss.py
Collected documents:
[]
Vocabulary list:
[]

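Suggestions

  1. Replace the feeds with new ones that work
  2. Or show only the experiment results and let readers find their own feeds to test the algorithm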
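
For the second suggestion, the experiment does not strictly need a live feed: the self-test code above only reads `feed["entries"][i]["summary"]`, so any object with that shape will do. The following is a minimal sketch under that assumption; `make_feed` is a hypothetical helper and the sample strings are placeholders, not data from the book.

```python
def make_feed(summaries):
    """Wrap plain strings in the minimal structure that localWords() reads.

    Each string becomes one entry with a "summary" field, mirroring the
    part of a parsed RSS feed that the experiment actually uses.
    """
    return {"entries": [{"summary": text} for text in summaries]}


# Placeholder texts: replace them with real documents for each class.
feed1 = make_feed([
    "first placeholder document for class one",
    "second placeholder document for class one",
])
feed0 = make_feed([
    "first placeholder document for class zero",
    "second placeholder document for class zero",
])

# Same length check that localWords() performs before looping.
min_len = min(len(feed1["entries"]), len(feed0["entries"]))
```

The two objects can then be passed to `localWords(feed1, feed0)` in place of the parsed RSS feeds.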

apachecn deleted a comment from uptonyuan Jan 3, 2024
apachecn deleted a comment from zyuegege Jan 3, 2024

jiangzhonglian (Member) commented

You can refer to this one when asking a question: #649

jiangzhonglian (Member) commented

Don't get hung up on it; just skip it. It doesn't affect your learning!
