
No tweets are being scraped. #10

Closed
iprelic opened this issue Mar 23, 2021 · 7 comments

Comments

@iprelic

iprelic commented Mar 23, 2021

Hashtags are found, but it doesn't find any tweets. I have lowered the settings (delay and concurrency) and set ROBOTSTXT_OBEY to false. Any tips?
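For reference, the changes described above would look roughly like this in the project's Scrapy `settings.py`. This is a sketch: the comment doesn't say which values were used, so the numbers below are assumptions. `ROBOTSTXT_OBEY`, `DOWNLOAD_DELAY` and `CONCURRENT_REQUESTS` are standard Scrapy setting names.

```python
# settings.py (sketch): the numeric values are assumptions, not the poster's.

# Skip robots.txt checks. The "Overridden settings" in the log in a later
# comment still show 'ROBOTSTXT_OBEY': True, so it is worth confirming that
# the edited file is the settings module Scrapy actually loads.
ROBOTSTXT_OBEY = False

# Seconds to wait between consecutive requests to the same site (assumed value).
DOWNLOAD_DELAY = 1.0

# Upper bound on concurrent requests Scrapy performs (assumed value;
# Scrapy's default is 16).
CONCURRENT_REQUESTS = 4
```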

@gfhswter

I don't know how it is supposed to work, but I was searching the web for a long time and didn't find anything useful :))

@JuanDavidG1997

In my case it doesn't find any tweets.

@michael-pagan

@amitupreti any insight on what's going on here? I've already run into one issue, resolved by pip-installing ipdb, and another, resolved by updating the USER_AGENT to 'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:48.0) Gecko/20100101 Firefox/48.0', on top of the one posted here.
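In a Scrapy project, the User-Agent change mentioned above is normally made through the standard `USER_AGENT` setting in `settings.py`. This is a minimal sketch, assuming the project uses the default settings module; the string is the one quoted in the comment.

```python
# settings.py (sketch): override the User-Agent header Scrapy sends.
# USER_AGENT is a standard Scrapy setting; the value is the Firefox string
# quoted in the comment above, split across two literals for readability.
USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64; rv:48.0) "
    "Gecko/20100101 Firefox/48.0"
)
```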

This seems like a great tool, but I'm having a lot of trouble getting things to work. My current output is below: you'll see "0 tweets found," but visiting the queried URL clearly brings back results.

2021-09-28 22:00:30 [scrapy.utils.log] INFO: Scrapy 2.5.0 started (bot: TwitterHashTagCrawler)
2021-09-28 22:00:30 [scrapy.utils.log] INFO: Versions: lxml 4.6.3.0, libxml2 2.9.5, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 21.7.0, Python 3.9.7 (default, Sep 16 2021, 16:59:28) [MSC v.1916 64 bit (AMD64)], pyOpenSSL 21.0.0 (OpenSSL 1.1.1l 24 Aug 2021), cryptography 3.4.8, Platform Windows-10-10.0.19042-SP0
2021-09-28 22:00:30 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.selectreactor.SelectReactor
2021-09-28 22:00:30 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'TwitterHashTagCrawler',
'NEWSPIDER_MODULE': 'TwitterHashTagCrawler.spiders',
'ROBOTSTXT_OBEY': True,
'SPIDER_MODULES': ['TwitterHashTagCrawler.spiders']}
2021-09-28 22:00:30 [scrapy.extensions.telnet] INFO: Telnet Password: 8bfdeceaee79e82e
2021-09-28 22:00:30 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2021-09-28 22:00:31 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2021-09-28 22:00:31 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2021-09-28 22:00:31 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2021-09-28 22:00:31 [scrapy.core.engine] INFO: Spider opened
2021-09-28 22:00:31 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2021-09-28 22:00:31 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2021-09-28 22:00:31 [root] INFO: 1 hashtags found
2021-09-28 22:00:31 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://mobile.twitter.com/robots.txt> (referer: None)
2021-09-28 22:00:31 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://mobile.twitter.com/hashtag/dogsoftwitter> (referer: None)
2021-09-28 22:00:31 [root] INFO: 0 tweets found
2021-09-28 22:00:31 [root] INFO: Next page found:
2021-09-28 22:00:31 [scrapy.core.engine] INFO: Closing spider (finished)
2021-09-28 22:00:31 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 559,
'downloader/request_count': 2,
'downloader/request_method_count/GET': 2,
'downloader/response_bytes': 23053,
'downloader/response_count': 2,
'downloader/response_status_count/200': 2,
'elapsed_time_seconds': 0.43433,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2021, 9, 29, 2, 0, 31, 493310),
'httpcompression/response_bytes': 83405,
'httpcompression/response_count': 2,
'log_count/DEBUG': 2,
'log_count/INFO': 13,
'response_received_count': 2,
'robotstxt/request_count': 1,
'robotstxt/response_count': 1,
'robotstxt/response_status_count/200': 1,
'scheduler/dequeued': 1,
'scheduler/dequeued/memory': 1,
'scheduler/enqueued': 1,
'scheduler/enqueued/memory': 1,
'start_time': datetime.datetime(2021, 9, 29, 2, 0, 31, 58980)}
2021-09-28 22:00:31 [scrapy.core.engine] INFO: Spider closed (finished)
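One way to narrow this down: the log shows the hashtag page returning 200 with about 83 KB of decompressed HTML across the two responses, yet "0 tweets found" and an empty "Next page found:" line. That is consistent with (though not proof of) the spider's selectors no longer matching the markup Twitter serves. A hypothetical check is to save the response body to a file and count elements carrying the class the spider expects. The sketch below uses only the standard library's `html.parser`; the class name `"tweet"` and the helper names are illustrative assumptions, not the repo's actual selectors.

```python
# Sketch: count elements whose class attribute contains a given token.
# Hypothetical usage: save the body of the 200 response for the hashtag page
# to a file and check the class the spider's selector expects
# ("tweet" is an assumed class name, not taken from the repo).
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Counts start tags whose class attribute includes `target`."""

    def __init__(self, target: str) -> None:
        super().__init__()
        self.target = target
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        for name, value in attrs:
            if name == "class" and value and self.target in value.split():
                self.count += 1


def count_class(html: str, target: str) -> int:
    """Return how many elements in `html` carry the CSS class `target`."""
    parser = ClassCounter(target)
    parser.feed(html)
    parser.close()
    return parser.count


if __name__ == "__main__":
    sample = '<div class="tweet x"><p class="text">hi</p></div><div class="other"></div>'
    print(count_class(sample, "tweet"))  # prints 1
```

If the count is 0 on the saved page while the browser shows tweets, the content is likely structured differently (or rendered by JavaScript) in what the crawler receives.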

@adam321123

I still have the same issue. How do I fix this?

@michael-pagan

michael-pagan commented May 31, 2022

Couldn’t get it working. Wound up creating my own wrapper around tweepy.

@zinDante

Any fixes for this issue? No tweets are showing, but the hashtags are found if we visit the URL.

@superryeti
Owner

Hi everyone, the repo is no longer maintained. I am sorry about that. I will not be working on this anytime soon.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants