Sounds interesting, I'll see what I can do. I think Yandex is simple enough, but I don't know if we can scrape Baidu without Selenium and I'd like to avoid that.
After some research, I don't think I can add Yandex or Baidu. Yandex keeps giving me a captcha after a couple of requests. Maybe Selenium could help with that, but I want to keep this repo as simple as possible, so I'd rather not add browser automation or OCR dependencies.
Baidu doesn't require Selenium. The problem is that its results don't contain direct links; they look like this: www.baidu.com/link?url=kh39xCQVnS7frJSxGrpfLAXdudtflGhAhAK8YjhSgpwyf0Sl8L41EGODywKx6Vvqy8UbcOnNGkuEntr1m9KLmq. The url= parameter looks like a base64 string, but it doesn't decode to text, and I don't think the decoding/decryption happens on the client side: the server redirects to the final link. We could ask the server for the actual URLs, but that would take an extra request per result, so it would be very inefficient and would probably get us banned.
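For reference, here is a rough sketch of the two things described above: checking whether the url= token decodes to readable text, and asking the server where the link points. The function names (`extract_token`, `try_decode_as_text`, `resolve_redirect`) are made up for illustration and are not part of this repo. The last function assumes the server answers a HEAD request with a `Location` header, which I have not confirmed for Baidu.

```python
import base64
import binascii
import http.client
from urllib.parse import urlsplit, parse_qs


def extract_token(link):
    """Return the value of the url= query parameter, or None if absent."""
    # Add a scheme when missing so urlsplit parses the host and query correctly.
    if "://" not in link:
        link = "https://" + link
    values = parse_qs(urlsplit(link).query).get("url")
    return values[0] if values else None


def try_decode_as_text(token):
    """Try URL-safe base64, then UTF-8; return None if the result is not printable text."""
    # Restore any padding that was stripped from the token.
    padded = token + "=" * (-len(token) % 4)
    try:
        raw = base64.urlsafe_b64decode(padded)
        text = raw.decode("utf-8")
    except (binascii.Error, ValueError):
        return None
    return text if text.isprintable() else None


def resolve_redirect(link, timeout=10.0):
    """Ask the server for the target: one extra HTTP request per search result.

    Assumption: the server replies to HEAD with a Location header.
    """
    if "://" not in link:
        link = "https://" + link
    parts = urlsplit(link)
    conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    try:
        conn.request("HEAD", parts.path + "?" + parts.query)
        return conn.getresponse().getheader("Location")
    finally:
        conn.close()
```

If `try_decode_as_text(extract_token(link))` returns `None`, the token is not plain base64-encoded text, which leaves `resolve_redirect` as the fallback, with the cost and ban risk mentioned above.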
So I don't know how to proceed. If you have any ideas, I'd love to hear them.
Thanks for your work. Please consider adding Yandex and Baidu if possible.