Proxy IPs are dynamic (fetched from an IP provider API), but Crawlee does not support this? #2770

Closed
1 task
WangShaoyu1 opened this issue Dec 9, 2024 · 1 comment
Labels
bug Something isn't working. t-tooling Issues with this label are in the ownership of the tooling team.

Comments

@WangShaoyu1

Which package is this bug report for? If unsure which one to select, leave blank

@crawlee/playwright (PlaywrightCrawler)

Issue description

With Node.js v20.16.0 and crawlee 3.12:
The proxy IPs are generated by an API, and each call to this dynamic IP API returns a different set of IPs. Crawlee does not support this.
I would expect the API to be called each time a website is scraped.

Code sample

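import {PlaywrightCrawler, HttpCrawler, ProxyConfiguration, log, Session} from 'crawlee';
// `router` (the request handler) and `getProxy` (calls the dynamic IP API) are defined elsewhere.
// Configure the crawler
const crawler = new PlaywrightCrawler({
    requestHandler: router,
    headless: true,
    requestHandlerTimeoutSecs: 200,
    autoscaledPoolOptions: {
        maxConcurrency: 20,
        minConcurrency: 10,
        desiredConcurrencyRatio: 0.5,  // stay close to the target concurrency
        scaleUpStepRatio: 0.15,        // step size when scaling concurrency up
        scaleDownStepRatio: 0.15,      // step size when scaling concurrency down
        autoscaleIntervalSecs: 5       // interval between autoscaling checks
    },
    proxyConfiguration: new ProxyConfiguration({
        // getProxy() is awaited only once, here, so the proxy list is fixed when the crawler is created
        proxyUrls: await getProxy()
    })
});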

Package version

crawlee 3.12.0 Nodejs

Node.js version

20.16

Operating system

windows

Apify platform

  • Tick me if you encountered this issue on the Apify platform

I have tested this on the next release

No response

Other context

No response

@WangShaoyu1 WangShaoyu1 added the bug Something isn't working. label Dec 9, 2024
@github-actions github-actions bot added the t-tooling Issues with this label are in the ownership of the tooling team. label Dec 9, 2024
@B4nan
Member

B4nan commented Dec 9, 2024

We don't rotate proxies in browsers by default just yet. There is an option called browserPerProxy for that, but it can easily eat your memory, since you need a new browser context for each proxy.

See #2726 for more details
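
For readers landing here, a minimal sketch of one possible workaround, not a confirmed fix: instead of passing a fixed `proxyUrls` list, hand `ProxyConfiguration` a `newUrlFunction` that re-queries the provider when its cached list is older than a chosen lifetime. The helper name `createRotatingProxySource`, the `ttlMs` parameter, and the injectable clock are hypothetical; `getProxy` stands in for the issue author's own API call. The wiring at the bottom assumes `browserPerProxy` is set under `launchContext`, and it carries the memory cost mentioned above.

```typescript
// Hypothetical stand-in for the provider call (e.g. the issue's getProxy()).
type FetchProxies = () => Promise<string[]>;

// Returns a function usable as ProxyConfiguration's `newUrlFunction`.
// It refetches the proxy list once the cached list is older than `ttlMs`
// and hands out the cached URLs round-robin in between.
function createRotatingProxySource(
    fetchProxies: FetchProxies,
    ttlMs: number,
    now: () => number = Date.now, // injectable clock, only to make the sketch testable
): () => Promise<string | null> {
    let cached: string[] = [];
    let fetchedAt = 0;
    let index = 0;

    return async function newUrl(): Promise<string | null> {
        const expired = now() - fetchedAt >= ttlMs;
        if (cached.length === 0 || expired) {
            // Note: concurrent callers may each trigger a fetch; acceptable for a sketch.
            cached = await fetchProxies();
            fetchedAt = now();
            index = 0;
        }
        if (cached.length === 0) return null; // no proxy available
        const url = cached[index % cached.length];
        index += 1;
        return url;
    };
}

// Wiring into Crawlee (shown as comments, not executed here; assumes the
// `router` and `getProxy` from the issue's code sample):
//
// const proxyConfiguration = new ProxyConfiguration({
//     newUrlFunction: createRotatingProxySource(getProxy, 60_000),
// });
// const crawler = new PlaywrightCrawler({
//     requestHandler: router,
//     proxyConfiguration,
//     launchContext: { browserPerProxy: true }, // assumption: option lives here
// });
```

The lifetime is a trade-off: a short `ttlMs` tracks the provider's changing IPs more closely but calls the API more often, and with `browserPerProxy` every new proxy URL can mean another browser instance.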

@B4nan B4nan closed this as not planned Dec 9, 2024