[Self-Host] No Engines Left Error leaves client hanging #1066

Open

aanokh opened this issue Jan 15, 2025 · 2 comments

aanokh commented Jan 15, 2025

Describe the Issue
When the crawler encounters the No Engines Left error and the crawl job fails, a client calling the crawl endpoint synchronously through the API hangs forever.

To Reproduce
Steps to reproduce the issue:

  1. Run the following code with the Python API (I only have the fetch engine enabled, so the job errors):
from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key="test", api_url="http://localhost:3002")
try:
    crawl_status = app.crawl_url(
        "https://www.omahasteaks.com/blog/steak-cooking-chart/?srsltid=AfmBOorYd-U47LCcmF2oyFeumkDx61J3XYL_DYqAqGXRaa75CS03K1S3",
        params={
            "limit": 10,
            "scrapeOptions": {"formats": ["markdown"]},
        },
    )
except Exception:
    print("an error occurred")
  2. The Firecrawl job fails (which is OK for my use case), but here is the error:
firecrawl-worker-1              | 2025-01-15 04:46:45 error [queue-worker:processJob]: Error: All scraping engines failed! -- Double check the URL to make sure it's not broken. If the issue persists, contact us at [email protected].
firecrawl-worker-1              |     at scrapeURLLoop (/app/dist/src/scraper/scrapeURL/index.js:220:15)
firecrawl-worker-1              |     at async scrapeURL (/app/dist/src/scraper/scrapeURL/index.js:258:24)
firecrawl-worker-1              |     at async runWebScraper (/app/dist/src/main/runWebScraper.js:67:24)
firecrawl-worker-1              |     at async startWebScraperPipeline (/app/dist/src/main/runWebScraper.js:12:12)
firecrawl-worker-1              |     at async processJob (/app/dist/src/services/queue-worker.js:457:26)
firecrawl-worker-1              |     at async processJobInternal (/app/dist/src/services/queue-worker.js:168:28) {"module":"queue-worker","method":"processJob","jobId":"dcfbc7aa-af37-4d30-88a8-22f76f36c07e","scrapeId":"dcfbc7aa-af37-4d30-88a8-22f76f36c07e","crawlId":"0f32eff9-809b-4cff-af4d-b3a173c0aa48","teamId":"bypass"}
firecrawl-worker-1              | 2025-01-15 16:59:57 debug [queue-worker:processJob]: Declaring job as done... 
firecrawl-worker-1              | 2025-01-15 16:59:57 debug [crawl-redis:addCrawlJobDone]: Adding done crawl job to Redis... 
firecrawl-worker-1              | 2025-01-15 16:59:57 debug [queue-worker:processJob]: Logging job to DB... 
firecrawl-worker-1              | 2025-01-15 16:59:57 debug [crawl-redis:finishCrawl]: Marking crawl as finished. 
firecrawl-worker-1              | 2025-01-15 16:59:57 debug [queue-worker:processJobInternal]: Job failed 

Expected Behavior
I would expect the client to return an error or raise an exception.

Additional Context
The Firecrawl app keeps working normally after the job has errored and accepts more requests, which is the behaviour I want. However, the client hanging forever when its job errors, instead of raising an exception, is a problem for me.
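
A client-side stopgap (not from the original report): run the blocking crawl_url call on a daemon thread and enforce a deadline, so a stuck job surfaces as an error instead of hanging the process. A minimal sketch, assuming the same FirecrawlApp setup as in the reproduction above; the 300-second deadline is an arbitrary choice:

import queue
import threading

from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key="test", api_url="http://localhost:3002")
URL = "https://www.omahasteaks.com/blog/steak-cooking-chart/?srsltid=AfmBOorYd-U47LCcmF2oyFeumkDx61J3XYL_DYqAqGXRaa75CS03K1S3"
results: queue.Queue = queue.Queue()

def crawl() -> None:
    # Put either the crawl result or the raised exception on the queue.
    try:
        results.put(app.crawl_url(
            URL,
            params={"limit": 10, "scrapeOptions": {"formats": ["markdown"]}},
        ))
    except Exception as exc:
        results.put(exc)

# Daemon thread: if the SDK call never returns, it won't keep the process alive.
worker = threading.Thread(target=crawl, daemon=True)
worker.start()
try:
    outcome = results.get(timeout=300)  # arbitrary 300 s deadline
except queue.Empty:
    raise TimeoutError("crawl did not finish within the deadline")
if isinstance(outcome, Exception):
    raise outcome
crawl_status = outcome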


aanokh commented Jan 15, 2025

Printing the status returned by the client's async crawl function shows that it never changes:

{'success': True, 'status': 'scraping', 'total': 0, 'completed': 0, 'creditsUsed': 0, 'expiresAt': '2025-01-16T16:56:06.000Z', 'data': [], 'error': None, 'next': 'http://localhost:3002/v1/crawl/a71fe622-5d49-49e4-86f1-9444fa626566?skip=0'}
{'success': True, 'status': 'scraping', 'total': 0, 'completed': 0, 'creditsUsed': 0, 'expiresAt': '2025-01-16T16:56:06.000Z', 'data': [], 'error': None, 'next': 'http://localhost:3002/v1/crawl/a71fe622-5d49-49e4-86f1-9444fa626566?skip=0'}

So the job is still marked as 'scraping' for some reason.
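
One way to avoid waiting forever on that stuck 'scraping' status: start the crawl asynchronously and poll with a hard deadline. A minimal sketch; the async_crawl_url/check_crawl_status method names follow the firecrawl-py client referenced above (treat them as an assumption), and the deadline and poll interval are arbitrary:

import time

from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key="test", api_url="http://localhost:3002")

# Kick off the crawl without blocking; the response carries the job id.
job = app.async_crawl_url(
    "https://www.omahasteaks.com/blog/steak-cooking-chart/?srsltid=AfmBOorYd-U47LCcmF2oyFeumkDx61J3XYL_DYqAqGXRaa75CS03K1S3",
    params={"limit": 10, "scrapeOptions": {"formats": ["markdown"]}},
)

deadline = time.monotonic() + 300
while True:
    status = app.check_crawl_status(job["id"])
    if status["status"] in ("completed", "failed"):
        break
    if time.monotonic() > deadline:
        raise TimeoutError(f"crawl stuck in {status['status']!r} past the deadline")
    time.sleep(5)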

@samipshah100

Same issue. Did you manage to solve it?
