Error Scraping Patreon #110

Open
rebeltaz opened this issue Jan 22, 2022 · 4 comments

Comments

@rebeltaz

rebeltaz commented Jan 22, 2022

I am trying to get this to scrape Patreon, but every time the scheduled scrape runs, I get this error:

Main.Runtime - INFO - Scheduler executing class: <class 'xascraper.modules.patreon.patreonScrape.GetPatreon'>
ScraperBase Init
Starting up
Main.WebRequest - INFO - Using global chromium tab pool
Main.WebRequest - INFO - User agent overridden!
Starting up?
Main.WebRequest - INFO - Using global chromium tab pool
apscheduler.executors.default - ERROR - Job "pat (trigger: interval[0:05:00], next run at: 2022-01-22 01:30:00 CST)" raised an exception
Traceback (most recent call last):
  File "/home/bob/xA-Scraper/venv/lib/python3.6/site-packages/apscheduler/executors/base.py", line 125, in run_job
    retval = job.func(*job.args, **job.kwargs)
  File "./main_scrape.py", line 37, in runScraper
    instance = scraper_class()
  File "/home/bob/xA-Scraper/xascraper/modules/patreon/patreonScrape.py", line 62, in __init__
    'api_key': settings["captcha"]["anti-captcha"]['api_key'],
KeyError: 'anti-captcha'
Main.Runtime - INFO - Job crashed: 1e2773451533411e98cfafc059f03fe0
Main.Runtime - INFO - Traceback:   File "/home/derek/xA-Scraper/venv/lib/python3.6/site-packages/apscheduler/executors/base.py", line 125, in run_job
    retval = job.func(*job.args, **job.kwargs)
  File "./main_scrape.py", line 37, in runScraper
    instance = scraper_class()
  File "/home/bob/xA-Scraper/xascraper/modules/patreon/patreonScrape.py", line 62, in __init__
    'api_key': settings["captcha"]["anti-captcha"]['api_key'],

Any idea how to fix that? I am running Ubuntu 20.04. Also... is there any way to force the scraper to run without having to set the timer to a low refresh? I had to set it to five minutes to get it to run again so I could get that error copied. Thanks.

@fake-name
Owner

fake-name commented Jan 22, 2022

Also... is there any way to force the scraper to run without having to set the timer to a low refresh?

python3 -m manage run pat?

Did you delete the relevant line from the example config?

You don't need a valid key at the moment (the code path that actually uses it is stubbed out), but the config entry still has to exist. Patreon sometimes hits you with a reCAPTCHA, which I deal with elsewhere using anti-captcha.com.

The Patreon scraper is fairly finicky. It REQUIRES being run in a full desktop environment, with the google-chrome chromium binary present. Running chromium in a full desktop session works around some of the weird client sniffing garbage webshit assholes do these days.

@rebeltaz
Author

Did you delete the relevant line from the example config?

I didn't delete it, but I did comment that part out. The error I copied and pasted was after commenting that out.
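The KeyError in the traceback is what a chained dictionary lookup produces when the 'anti-captcha' entry is absent, and commenting the entry out has the same effect as deleting it. Below is a minimal sketch of that behavior, assuming the loaded settings behave like a plain nested dict. The two settings dicts, the placeholder key string, and the helper names are hypothetical; only the lookup chain itself comes from the traceback.

```python
# Hypothetical stand-ins for the loaded settings. The real config layout is
# only known here from the lookup shown in the traceback.
settings_commented_out = {"captcha": {}}
settings_with_placeholder = {
    "captcha": {"anti-captcha": {"api_key": "placeholder-not-a-real-key"}}
}


def read_api_key(settings):
    """Mirror the lookup chain from patreonScrape.py, line 62."""
    return settings["captcha"]["anti-captcha"]["api_key"]


def describe_lookup(settings):
    """Return the key on success, or a message naming the missing entry."""
    try:
        return read_api_key(settings)
    except KeyError as exc:
        return "missing config entry: {}".format(exc)


if __name__ == "__main__":
    print(describe_lookup(settings_commented_out))
    print(describe_lookup(settings_with_placeholder))
```

With the entry missing, the lookup fails on 'anti-captcha' just as in the log. With any placeholder value present, the lookup succeeds, which fits the note that a valid key is not needed while that code path is stubbed.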

@rebeltaz
Author

python3 -m manage run pat?

Oh, by the way... when I run that, I get:

Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/bob/xA-Scraper/manage/__main__.py", line 15, in <module>
    from . import name_importer
  File "/home/bob/xA-Scraper/manage/name_importer.py", line 6, in <module>
    import psycopg2
ModuleNotFoundError: No module named 'psycopg2'

@fake-name
Owner

Huh. Did you not install everything in requirements.txt? psycopg2-binary should provide the psycopg2 package, even though it isn't really used if you're on sqlite.
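One quick way to check whether the interpreter running the command can see psycopg2 at all is to ask the import machinery directly. This is a small standalone sketch, not part of xA-Scraper, and the helper name module_available is hypothetical. It should be run with the same python3 that is used for `python3 -m manage run pat`.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if the current interpreter can locate the top-level module `name`."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Shows which Python binary is in use, then whether psycopg2 is visible to it.
    print("interpreter:", sys.executable)
    print("psycopg2 importable:", module_available("psycopg2"))
```

If this reports False, the packages from requirements.txt are probably not installed for that interpreter, for example because they were installed into a different environment.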
