High performance: Store gets flooded when too many pages are crawled #28
Comments
Hi, if you could open a PR, that would be awesome! 👍
The actual problem comes from HTTPoison, the underlying library used to make the requests.
Looking at how the library works, it uses a GenServer.cast in the worker.
Hi @happysalada, thanks for doing more investigation! To be honest, I haven't had a chance to use my library for a while, so I don't remember much off the top of my head. I welcome PR fixes! :)
I'm researching which options are best to pass to hackney.
So it's been a few years... cough. I've just pushed v1.2.0 to address the memory leak. There have also been some updates in httpoison and hackney: edgurgel/httpoison#414. I couldn't reproduce this issue, so I'm assuming it's resolved. Please feel free to reopen if there's more to discuss. :)
If I launch 100 pages to be crawled (each with depth 5), after a while the Store process gets flooded and starts dropping messages.
The upside of having a Registry is that the store is global, so the crawler can be run from multiple machines.
The downside is that this single process becomes a bottleneck under high load.
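To make the bottleneck concrete, here is a toy model written in Python purely for illustration (it is not Crawler's code, and `simulate_mailbox` and its rates are hypothetical). One store process drains its mailbox at a fixed rate while many workers send to it. With fire-and-forget sends (similar in spirit to an async cast) the backlog grows every tick whenever arrivals outpace handling. With a capped mailbox, producers are held back instead (backpressure), so the backlog stays bounded.

```python
from collections import deque


def simulate_mailbox(ticks, producers, handled_per_tick, max_mailbox=None):
    """Toy model of a single store process fed by many crawler workers.

    Each tick, every producer tries to send one message. With max_mailbox=None
    sends are fire-and-forget and always land in the mailbox. With a cap, once
    the mailbox is full the remaining producers are held back for that tick
    (a generic stand-in for backpressure). Returns the mailbox length after
    each tick.
    """
    mailbox = deque()
    history = []
    for tick in range(ticks):
        for p in range(producers):
            if max_mailbox is not None and len(mailbox) >= max_mailbox:
                break  # mailbox full: remaining producers wait this tick
            mailbox.append((tick, p))
        # the single store process handles a fixed number of messages per tick
        for _ in range(min(handled_per_tick, len(mailbox))):
            mailbox.popleft()
        history.append(len(mailbox))
    return history


print(simulate_mailbox(5, producers=100, handled_per_tick=60))
# [40, 80, 120, 160, 200]  (unbounded growth)
print(simulate_mailbox(5, producers=100, handled_per_tick=60, max_mailbox=100))
# [40, 40, 40, 40, 40]  (bounded by backpressure)
```

The numbers are arbitrary; the point is only that any sustained gap between arrival rate and handling rate makes a single unbounded mailbox grow without limit.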
Would you be open to using Mnesia? (fast, distributed, in-memory DB)
If you don't need the distributed part, I would use an ETS table for the store, which should be able to handle more load.
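The ETS idea is that workers read and write a shared table directly instead of sending every "have we seen this URL?" check through one process. A rough Python analog follows (the `VisitedStore` class is hypothetical, not part of Crawler). Its `claim` does an atomic insert-if-absent, which is the role `:ets.insert_new/2` plays for an ETS set table. In Python a lock provides the atomicity. ETS itself handles concurrent access natively and can be tuned with the `read_concurrency`/`write_concurrency` table options.

```python
import threading


class VisitedStore:
    """Shared visited-URL set, a Python stand-in for an ETS set table.

    Workers call claim() directly; the lock is held only for the
    check-and-insert, so there is no single process whose mailbox
    every request has to pass through.
    """

    def __init__(self):
        self._seen = set()
        self._lock = threading.Lock()

    def claim(self, url):
        """Return True if this caller is the first to record url, else False."""
        with self._lock:
            if url in self._seen:
                return False
            self._seen.add(url)
            return True

    def __len__(self):
        with self._lock:
            return len(self._seen)


def demo(workers=8, urls=1000):
    """Every worker tries to claim the same URLs; each URL is won exactly once."""
    store = VisitedStore()
    wins = []
    wins_lock = threading.Lock()

    def worker():
        mine = sum(store.claim(f"https://example.com/{i}") for i in range(urls))
        with wins_lock:
            wins.append(mine)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(wins), len(store)


print(demo())  # (1000, 1000)
```

Because the insert-if-absent is atomic, two workers racing on the same URL can never both crawl it, which is the guarantee the store needs to provide.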
The solution to this is to break up the crawling of all those URLs instead of sending them all at the same time.
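Breaking the work up could look roughly like the Python sketch below. It assumes a hypothetical `fetch(url)` function and hypothetical `batch_size`/`max_workers` knobs (none of these are Crawler options). URLs are submitted one batch at a time, and the next batch starts only after every result of the current one has come back, so the number of outstanding requests never exceeds `batch_size` and at most `max_workers` run at once.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def crawl_in_batches(urls, fetch, batch_size=10, max_workers=4):
    """Crawl urls one batch at a time instead of launching them all at once."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for start in range(0, len(urls), batch_size):
            batch = urls[start:start + batch_size]
            # pool.map yields results in input order; consuming them all
            # means the whole batch has finished before the next one starts
            for url, body in zip(batch, pool.map(fetch, batch)):
                results[url] = body
    return results


class FakeFetcher:
    """Hypothetical fetch function that records peak concurrency."""

    def __init__(self):
        self.lock = threading.Lock()
        self.in_flight = 0
        self.peak = 0

    def __call__(self, url):
        with self.lock:
            self.in_flight += 1
            self.peak = max(self.peak, self.in_flight)
        time.sleep(0.01)  # pretend network latency
        with self.lock:
            self.in_flight -= 1
        return f"<html>{url}</html>"


urls = [f"https://example.com/page/{i}" for i in range(25)]
fetcher = FakeFetcher()
pages = crawl_in_batches(urls, fetcher, batch_size=10, max_workers=4)
print(len(pages), fetcher.peak)  # peak never exceeds max_workers
```

Batching trades some throughput (each batch waits for its slowest page) for a hard cap on in-flight work, which is what keeps the store from being flooded.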
Let me know if you are open to this; I'm happy to put together a tentative PR.