Odd issue with FA scraper cookie storage #98
Comments
The critical part is:
The exception is a bug in the cookie failure return value. FA cannot log in automatically, since they use a captcha. You have to use the web interface to solve the captcha yourself (or use a captcha solving service). The manual login stuff is kind of creaky, so I strongly suggest using a service (I like anti-captcha.com). They're a bit of an affair to get money into, but $5-10 of credit should last nearly forever.
Wow, how long since you last pulled? I don't think I've changed the settings file recently.
It's been quite some time, before the repo went down for a while. It may have been my fault that it stopped working. That last commit fixes this problem, but now it's throwing an exception:
It's odd it doesn't think it has a valid cookie; the cookies.lwp file in the project base directory is valid and should allow it to log in. (Unless it's storing the actual values somewhere else I didn't know about.)
The cookies file being valid just means that the scraper exited correctly the last time it executed. Whether the relevant cookie in particular is in the cookies file is the issue, and in this case apparently it's not. I just tested, and it appears the captcha handling is currently broken. It probably stopped working when FA did their site redesign, and I missed this fact because I had a valid auth cookie when doing the tests (derp). Additionally, the auth procedure now appears to require a Google reCAPTCHA, so I think I'll not be able to support the manual circumvention when I fix the problem.
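To illustrate the distinction between "the file parses" and "the auth cookies are actually in it", here is a minimal sketch using Python's standard `http.cookiejar`. It is not xA-Scraper's own login check. The cookie names `a` and `b` are taken from elsewhere in this thread, and the `furaffinity.net` domain suffix and the helper name `missing_auth_cookies` are assumptions made for the example.

```python
from http.cookiejar import LWPCookieJar

# Assumption: the auth cookies are named "a" and "b" (names mentioned in
# this thread) and live on a furaffinity.net domain.
REQUIRED_COOKIES = {"a", "b"}


def missing_auth_cookies(path, domain_suffix="furaffinity.net"):
    """Return the set of required cookie names NOT usable in the LWP file.

    A file can load cleanly (valid LWP format) and still lack the cookies
    that matter, so loading without error proves nothing about login state.
    """
    jar = LWPCookieJar()
    # Load everything, including expired/session cookies, then filter
    # explicitly so an expired cookie counts as missing.
    jar.load(path, ignore_discard=True, ignore_expires=True)
    present = {
        cookie.name
        for cookie in jar
        if cookie.domain.endswith(domain_suffix) and not cookie.is_expired()
    }
    return REQUIRED_COOKIES - present
```

Calling `missing_auth_cookies("cookies.lwp")` returns an empty set only when both cookies are present and unexpired. Even then, the server can still reject them, so this only rules out the "cookie is simply not in the file" case.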
Sidenote: DA is also broken ATM. I haven't had time to poke things recently.
Oh, what I meant was - I logged in on a browser and transplanted the cookie info there into xA-Scraper's cookies file. It worked the last time I tried it, whenever that was.
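For anyone wanting to script that transplant instead of hand-editing the file, the following is a rough sketch with the standard library only. The function names `make_cookie` and `transplant`, the `.furaffinity.net` domain, the 30-day lifetime and the `secure` flag are all assumptions for illustration; whether the scraper's own request layer accepts cookies written this way is not confirmed here.

```python
import os
import time
from http.cookiejar import Cookie, LWPCookieJar


def make_cookie(name, value, domain=".furaffinity.net", days_valid=30):
    """Build a Cookie object. Domain and lifetime are assumed values."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True,
        domain_initial_dot=domain.startswith("."),
        path="/", path_specified=True,
        secure=True,
        expires=int(time.time()) + days_valid * 86400,
        discard=False, comment=None, comment_url=None, rest={},
    )


def transplant(path, values):
    """Add or overwrite cookies in an LWP file, keeping existing entries."""
    jar = LWPCookieJar(path)
    if os.path.exists(path):
        jar.load(ignore_discard=True, ignore_expires=True)
    for name, value in values.items():
        jar.set_cookie(make_cookie(name, value))
    jar.save(ignore_discard=True, ignore_expires=True)
```

Usage would look like `transplant("cookies.lwp", {"a": "<value from browser>", "b": "<value from browser>"})`, with the values copied from the browser's cookie storage.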
Ah. Well, you need two cookies,
Yes, plus the __cfduid one.
That's strange. It should at least pass the login check if you do that. This login check was written way, way long ago before I was just looking at cookies, rather than querying the website and checking if I can find your username on a home page path.
I took another look at this, since my FA scraper still doesn't work. It looks like line 41 of faScrape.py is loading Changing the string makes it run the scrape without complaining, but it's indicating "artist seems to have disabled their account" for a lot of accounts that exist. I'm not sure whether it's actually logged in using the valid cookies in cookies.lwp or not. My FA account is set to use the classic theme, so if all the screen-scraping is made with the old theme in mind, it might be that it's not actually logged in and is trying to scrape pages with the modern theme. I'm not sure. The captcha-handling stuff for FA can probably be removed, as FA no longer appears to use a captcha. Edit: I figured out how to turn on debug logging; it seems to be using the cookies correctly, but gives no indication why it's raising an AccountDisabledException. Edit again: I added a log statement; it looks like maybe the submission count extraction code at line 285 in faScrape.py is failing, or at least the exception raise statement immediately below it is what's getting set off.
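One way to make this failure easier to diagnose would be to separate "the page says the account is disabled" from "the count could not be parsed", since a theme or markup change would otherwise look like a disabled account. The sketch below is hypothetical and is not the code in faScrape.py: the regex, the disabled-account marker text, and the `PageFormatError` class are all placeholders, and only the `AccountDisabledException` name comes from this thread.

```python
import re


class AccountDisabledException(Exception):
    """The page explicitly indicates a disabled account."""


class PageFormatError(Exception):
    """Hypothetical: the page loaded but did not match the expected markup."""


def extract_submission_count(
    page_html,
    count_pattern=r"Submissions:\s*(\d+)",  # placeholder, not FA's real markup
    disabled_marker="account disabled",     # placeholder marker text
):
    """Return the submission count, raising distinct errors per failure mode."""
    if disabled_marker.lower() in page_html.lower():
        raise AccountDisabledException("page reports a disabled account")
    match = re.search(count_pattern, page_html)
    if match is None:
        # Likely a layout/theme change or a logged-out page, not a dead account.
        raise PageFormatError("could not find a submission count on the page")
    return int(match.group(1))
```

With this split, a log line for `PageFormatError` would point at the scraping pattern (or the login state) rather than at the artist's account.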
I'm trying to run an FA scrape, after doing a git pull (and subsequently re-making my settings.py to get it working again), and getting this:
Unfortunately, my skills aren't good enough for me to figure out what exactly is going on here with WebRequest and the cookies file. It's a valid LWP file; I tried updating it with the "a" and "b" cookies to no avail. The "manual FA login" option on the web interface seems to no longer be functional; it looks like they removed the old secondary captcha.
Manually bypassing the cookie check by making it return True lets it scrape, but it reported possible missing art with 946 expected and 624 retrieved from the first artist, so I don't think it's logged in.