feat: stopping the crawlers gracefully with BasicCrawler.stop() #2792

Open: 3 commits to merge into master

barjin (Contributor) commented Jan 6, 2025

Allows users to call crawler.stop() to gracefully stop the crawler.

Currently, the crawlers are "stateless", so calling:

async requestHandler({ crawler, enqueueLinks }) {
   await enqueueLinks();
   crawler.stop();
}

...

await crawler.run(['example.com']);
await crawler.run();

will crawl example.com only once, then stop and purge the request queue (RQ) and dataset, so the second crawler.run() call yields no results.

I suppose this is expected, but we could easily add a crawler.pause() method, which would keep the inner state for resuming with crawler.run().
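The "stateless" behaviour described above can be sketched without the real BasicCrawler. The following is a minimal, self-contained model, not the implementation in this PR: `MiniCrawler`, its `enqueue` helper, and the `demo` function are hypothetical names introduced only for illustration. It assumes that stop() merely sets a flag checked between requests, and that every run() call starts from a purged queue.

```typescript
// Hypothetical stand-in for a crawler; NOT the actual BasicCrawler code.
type HandlerContext = {
    url: string;
    crawler: MiniCrawler;
    enqueue: (urls: string[]) => void;
};
type Handler = (ctx: HandlerContext) => Promise<void>;

class MiniCrawler {
    private queue: string[] = [];
    private stopRequested = false;
    private readonly requestHandler: Handler;

    constructor(requestHandler: Handler) {
        this.requestHandler = requestHandler;
    }

    // Graceful stop: the request in flight finishes, no new one is started.
    stop(): void {
        this.stopRequested = true;
    }

    // Returns the URLs handled during this run.
    async run(startUrls: string[] = []): Promise<string[]> {
        // Assumed "stateless" behaviour: each run begins with a purged queue.
        this.queue = [...startUrls];
        this.stopRequested = false;
        const handled: string[] = [];

        while (this.queue.length > 0 && !this.stopRequested) {
            const url = this.queue.shift()!;
            await this.requestHandler({
                url,
                crawler: this,
                enqueue: (urls) => {
                    this.queue.push(...urls);
                },
            });
            handled.push(url);
        }
        return handled;
    }
}

async function demo(): Promise<{ first: string[]; second: string[] }> {
    const crawler = new MiniCrawler(async ({ crawler, enqueue }) => {
        enqueue(['https://example.com/a', 'https://example.com/b']);
        crawler.stop();
    });
    // The first run handles the start URL, then stops before the enqueued links.
    const first = await crawler.run(['https://example.com']);
    // The second run starts from an empty queue, so it handles nothing.
    const second = await crawler.run();
    return { first, second };
}
```

In this model a pause() variant would differ only in skipping the queue reset at the start of run(), which is roughly the trade-off the paragraph above raises.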

Closes #2777

barjin requested review from janbuchar and B4nan January 6, 2025 15:51
barjin self-assigned this Jan 6, 2025
github-actions bot added this to the 105th sprint - Tooling team milestone Jan 6, 2025
github-actions bot added the t-tooling label (Issues with this label are in the ownership of the tooling team.) Jan 6, 2025
Linked issue (may be closed by merging this pull request): feat: implement a way to stop crawler from the user function