Feature request Yandex and Baidu #17

LeoJavaAI · 2021-04-17T12:20:36Z

Thanks for your work. Please consider adding Yandex and Baidu if possible.

tasos-py · 2021-04-21T22:34:31Z

Sounds interesting, I'll see what I can do. I think Yandex is simple enough, but I don't know if we can scrape Baidu without Selenium and I'd like to avoid that.

tasos-py · 2021-04-28T07:29:33Z

After some research, I don't think I can add Yandex or Baidu. Yandex keeps giving me a captcha after a couple of requests. Maybe Selenium could help with that, but I want to keep this repo as simple as possible, so I'd rather not add browser automation or OCR dependencies.

Baidu doesn't require Selenium. The problem is that its results don't contain direct links; they look like this: www.baidu.com/link?url=kh39xCQVnS7frJSxGrpfLAXdudtflGhAhAK8YjhSgpwyf0Sl8L41EGODywKx6Vvqy8UbcOnNGkuEntr1m9KLmq. The url= parameter looks like a base64 string, but it doesn't decode to text, and I don't think the decoding/decryption is done on the client side; the server redirects to the final link. We could request each link from the server to get the actual URLs, but that would be very inefficient and would probably result in bans.
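For anyone who wants to repeat the base64 check described above, here is a minimal sketch. It assumes the token is URL-safe base64 with the padding stripped, which is only a guess about Baidu's format, and the helper names `extract_token` and `try_decode_as_text` are hypothetical, not part of this repo.

```python
import base64
import binascii
from urllib.parse import urlparse, parse_qs


def extract_token(link: str) -> str:
    """Return the value of the url= query parameter, or '' if it is absent."""
    # urlparse needs a scheme to split host, path and query correctly.
    if "://" not in link:
        link = "http://" + link
    values = parse_qs(urlparse(link).query).get("url", [])
    return values[0] if values else ""


def try_decode_as_text(token: str):
    """Decode the token as URL-safe base64 (assumed format).

    Returns the decoded string if it is non-empty, printable UTF-8,
    otherwise None.
    """
    # Restore the '=' padding that is commonly stripped from URL tokens.
    padded = token + "=" * (-len(token) % 4)
    try:
        raw = base64.urlsafe_b64decode(padded)
        text = raw.decode("utf-8")
    except (binascii.Error, UnicodeDecodeError, ValueError):
        return None
    return text if text and text.isprintable() else None


if __name__ == "__main__":
    # A token built locally from a known URL decodes back to readable text.
    known = base64.urlsafe_b64encode(b"https://example.com/page").decode().rstrip("=")
    print(try_decode_as_text(known))
```

Passing the Baidu link from this comment through `extract_token` and then `try_decode_as_text` is a quick way to see whether a given token is plain encoded text or something opaque that only the server can resolve.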

So I don't know how to proceed further. If you have any ideas, I'd love to hear them.

braindevices mentioned this issue Jan 26, 2022

is it possible to add baidu also? #42

Closed

tasos-py mentioned this issue May 19, 2023

Add yandex to engines #56

Closed
