Stuck in Cloudflare hCaptcha loop. #31
Comments
Same issue here. Looking forward to a solution. Thank you! |
Hello again, I tested two more things:
Unfortunately nothing was successful, but maybe it helps to narrow down the root cause of the problem. |
It seems like Blinkist / Cloudflare moved from Google's captchas (which worked fine) to hCaptcha, which causes this issue. From GermanEngineering's tests, it seems to be more an issue of Cloudflare detecting ChromeDriver, since the problem persists even with legitimate cookies. Will need to look into it - any help welcome! |
I found a solution that at least allows me to log in and download the text. I tried to open a pull request, but I'm not really familiar with the GitHub process, so please excuse me if this is not the correct way to propose a change. Hope this helps. |
That's weird. I tried all these options and it still won't let me through the hCaptcha.
|
It'd be much better to convert this from Selenium to Puppeteer. I just tried Puppeteer and that works well, especially with the Stealth plugin. |
I think that there used to be a chrome extension from Cloudflare that
bypasses their captcha page. Perhaps that would help? Has anyone tried
it?
wywywywy - Do you think that you could create a new branch with your
changes and make a pull-request with the Puppeteer-based code? Thanks!
|
Hello, I'm not familiar with how GitHub works, but I'll just share what worked for me. I added At first it worked, but in later sessions it started going back to the captcha again. The workaround is, after logging in, when the browser goes to the Cloudflare site, to redirect it back to the Blinkist.com homepage. This is when the log says, "waiting for user to solve recaptcha and login". After that, the scraper will proceed as expected. |
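The redirect workaround described above could be sketched roughly as follows. This is only an illustration, not the project's code: the function name `leave_challenge_page`, the homepage URL, and the marker text (borrowed from the challenge page wording quoted in the original report) are assumptions. It relies only on `page_source` and `get(url)`, which a Selenium WebDriver provides, and uses a stand-in driver so it runs without a browser.

```python
# Assumed values for illustration; adjust to what the browser actually shows.
HOME_URL = "https://www.blinkist.com/"
CHALLENGE_MARKER = "Please complete the security check"


def leave_challenge_page(driver, home_url=HOME_URL, marker=CHALLENGE_MARKER):
    """Send the browser back to home_url if the challenge page is showing.

    `driver` is any object with a `page_source` attribute and a `get(url)`
    method, as a Selenium WebDriver has. Returns True if a redirect was issued.
    """
    if marker in driver.page_source:
        driver.get(home_url)
        return True
    return False


class FakeDriver:
    """Stand-in for a Selenium WebDriver so the sketch runs without a browser."""

    def __init__(self, page_source):
        self.page_source = page_source
        self.visited = []

    def get(self, url):
        # Record the navigation instead of driving a real browser.
        self.visited.append(url)


stuck = FakeDriver("<h1>One more step</h1> Please complete the security check")
print(leave_challenge_page(stuck), stuck.visited)
```

With a real driver, the call would go at the point where the scraper waits for the user to log in; whether Cloudflare then lets the session through is not guaranteed.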
Hello, I encountered the same problem as you guys, getting stuck in the infinite captcha loop... I think we definitely have to add this line As a first quick fix, it worked for me to change from the seleniumwire webdriver to the "normal" selenium webdriver. Doing this, you can at least scrape the texts, but to get the audio files you need access to the request tab, so audio scraping will no longer work with this. Edit: I think the problem has something to do with the certificate, as selenium-wire issues its own certificate (selenium-wire manual). I already added the Selenium Wire CA to Chrome's Authorities section, but the problem remains. |
I also run into the hCaptcha loop but can get around it with the following arguments:
Without these arguments, I find that my first scrape attempt after 12+ hours usually avoids triggering the captcha. However, audio scraping still doesn't work.
|
In my tests, I had to override the user agent as well, on top of implementing @usb4's flags. However, it still asked for the captcha when making a request for the blink's audio files. Reading around, I found this discussion - https://stackoverflow.com/questions/32795460/loading-json-object-in-python-using-urllib-request-and-json-modules - and magically, yes, using I pushed my changes in f4cab05, tested (albeit only on the free daily book), and it seems to work fine on my end. |
Leonardo, which user agent did you use with requests? The default one
is a scraper user-agent. That could be why 'urllib.request'
"magically" works.
|
Thank you very much leoncvlt! |
In my case, the user agent was needed to access the actual library / books pages, not specifically for the audio files. I'm using selenium wire to capture the original audio files request and re-use the cookies / auth information to request the rest of the audio blinks - if anyone can come up with an alternative way of accomplishing this, we could scrap the selenium wire requirements 😃 |
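As a rough illustration of re-using the captured request's cookies and auth information, the sketch below copies a few headers from a captured request into a standard-library `urllib.request.Request`. The helper name `build_audio_request`, the list of headers to carry over, and the example URL and values are all assumptions for illustration, not taken from the project.

```python
from urllib.request import Request

# Headers worth carrying over from the captured request (an assumption,
# not a list taken from the project).
REUSED_HEADERS = ("cookie", "authorization", "user-agent")


def build_audio_request(url, captured_headers):
    """Return a urllib Request for `url` that reuses selected captured headers."""
    reused = {
        name: value
        for name, value in captured_headers.items()
        if name.lower() in REUSED_HEADERS
    }
    return Request(url, headers=reused)


# Hypothetical values standing in for a request captured in the browser.
captured = {
    "Cookie": "session=abc123",
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "Accept-Encoding": "gzip",
}
request = build_audio_request("https://example.com/audio/1.m4a", captured)
print(request.header_items())
```

The resulting request can be passed to `urllib.request.urlopen` to fetch each remaining audio file. Note that `Request` normalizes header names with `str.capitalize()`, so the user agent is looked up as `User-agent`.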
Hello and first of all thank you very much for your work!
It looks like this is exactly the code that I was looking for, but unfortunately I'm not able to get it running because I get stuck in an endless Cloudflare hCaptcha loop on https://www.blinkist.com/en/nc/login when I try to execute it for the first time.
The "One more step - Please complete the security check to access - I am human" page appears before I can enter the login information, and no matter how often I solve it, I always end up at the next captcha (tried at least 9 times in a row).
My system:
I've already tried:
Unfortunately I don't have any other ideas at the moment and feel pretty lost/stupid.
Did you encounter this problem before and have an idea how to solve it?
Or are there some logfiles or something I can collect that might help in this case?
Thank you very much in advance!
Peter