Refactor Crawler #37
base: master
👍 So let's keep the topic open. What about starting a WISH/TODO list of the features and improvements we envision for the next releases, so that we have better context on what should come next and how to move forward?
The WISHES and TODOs
Feel free to add your point in this comment or to
Very good points! Thanks a lot for your thoughts and your help. I'm going to open a separate issue/thread for each item so that it is easy to keep track of them. Off the top of my head:
I also don't really know what to do with the current plugin implementation. Next to the existing plugins, I would propose giving plugins access to the page and moving every single configurable feature into the plugin architecture. This way, someone could replace a single feature with their own implementation, or simply get a slimmer crawler when they do not need some of the provided features. I imagine it like the middleware of Rack or Sidekiq.
Me neither :) My initial concept was to create an architecture where the user's code could run inside the Polipus scope, but I didn't invest much time in it. I'm also fine with dropping the current implementation and exploring a middleware-like implementation (that actually seems like a very good idea!!!)
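To make the middleware idea discussed above more concrete, here is a minimal sketch of a Rack-style chain. Everything in it is an assumption for illustration only: `MiddlewareChain`, `FollowLinksLike`, `PageCounter` and the hash-based page are hypothetical names and are not part of Polipus.

```ruby
# Hypothetical sketch: none of these classes exist in Polipus.
# Each middleware wraps the next one, Rack-style, and may short-circuit.
class MiddlewareChain
  def initialize
    @entries = []
  end

  # Register a middleware class plus optional constructor arguments.
  def use(klass, *args)
    @entries << [klass, args]
    self
  end

  # Build the nested call chain around the innermost handler and run it.
  def call(page, &terminal)
    app = @entries.reverse.inject(terminal) do |next_app, (klass, args)|
      klass.new(next_app, *args)
    end
    app.call(page)
  end
end

# Stops the chain for pages whose URL does not match the pattern.
class FollowLinksLike
  def initialize(app, pattern)
    @app = app
    @pattern = pattern
  end

  def call(page)
    return :skipped unless page[:url].match?(@pattern)

    @app.call(page)
  end
end

# Counts every page that reaches this point in the chain.
class PageCounter
  def initialize(app, stats)
    @app = app
    @stats = stats
  end

  def call(page)
    @stats[:pages] += 1
    @app.call(page)
  end
end

stats = { pages: 0 }
chain = MiddlewareChain.new
chain.use(FollowLinksLike, %r{/blog/}).use(PageCounter, stats)

# The block plays the role of the innermost step (for example, storing the page).
first_result  = chain.call({ url: 'http://example.com/blog/1' }) { |_page| :stored }
second_result = chain.call({ url: 'http://example.com/about' }) { |_page| :stored }
```

With this shape, dropping a feature means not calling `use` for it, and replacing a feature means passing a different class that responds to `call`.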
As a result of #33, I reconsidered the current structure of `PolipusCrawler`. In particular, `PolipusCrawler#takeover` is a very long method where a lot is going on at the same time.
`PolipusCrawler` itself has lots of methods and is responsible for everything that goes on in Polipus.
I consider this pull request more of a proof of concept and a starting point for a discussion.
I would like to move all methods of `PolipusCrawler` into their own classes or plugins so that every class has its own responsibility.
For now, I moved most of `PolipusCrawler#takeover` to `Worker#run` and split it again into smaller methods.
This would allow more thorough testing of specific Polipus features without running the full stack.
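As an illustration of why small methods help with testing, here is a rough sketch of a worker whose `run` loop only orchestrates short private steps. It is not the code from this pull request: the constructor arguments, the private method names and the stub fetcher are all assumptions made for the example.

```ruby
# Hypothetical sketch; the real Worker in this pull request may look different.
class Worker
  # All collaborators are injected, so each one can be stubbed in a test.
  def initialize(queue:, fetcher:, storage:)
    @queue = queue
    @fetcher = fetcher
    @storage = storage
  end

  # Process queued URLs until the queue is empty.
  def run
    while (url = next_url)
      page = fetch(url)
      store(page)
      enqueue_links(page)
    end
  end

  private

  def next_url
    @queue.shift
  end

  def fetch(url)
    @fetcher.call(url)
  end

  def store(page)
    @storage[page[:url]] = page
  end

  # Enqueue only links that are neither stored nor already queued.
  def enqueue_links(page)
    page[:links].each do |link|
      @queue << link unless @storage.key?(link) || @queue.include?(link)
    end
  end
end

# A stub fetcher stands in for the HTTP layer, so no network is needed.
site = {
  '/' => ['/a', '/b'],
  '/a' => ['/b'],
  '/b' => ['/']
}
fetcher = ->(url) { { url: url, links: site.fetch(url, []) } }
storage = {}
Worker.new(queue: ['/'], fetcher: fetcher, storage: storage).run
```

Because the queue, fetcher and storage are plain objects passed in from outside, a spec can exercise the loop with an array, a lambda and a hash instead of the full stack.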
The delegator used in `Worker` is more of a temporary solution.
We could allow plugins to hook into `should_be_visited?` and then add a robots plugin, a follow_links_like plugin and a store_pages plugin.
A statistics plugin would replace `incr_pages` and `incr_errors` and hook into `on_after_download` and `on_page_error`.
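One possible shape for these hooks, sketched under assumptions: the hook names `should_be_visited?`, `on_after_download` and `on_page_error` come from the description above, while `PluginRegistry`, `FollowLinksLikePlugin`, `StatisticsPlugin` and the way hooks are dispatched are hypothetical and may differ from what Polipus ends up doing.

```ruby
# Hypothetical sketch; Polipus' real plugin interface may differ.
class PluginRegistry
  def initialize
    @plugins = []
  end

  def register(plugin)
    @plugins << plugin
    self
  end

  # A URL is visited only if no plugin vetoes it.
  # Plugins that do not implement the hook are ignored.
  def should_be_visited?(url)
    @plugins.all? do |plugin|
      !plugin.respond_to?(:should_be_visited?) || plugin.should_be_visited?(url)
    end
  end

  # Notify every plugin that implements the given event hook.
  def notify(hook, *args)
    @plugins.each do |plugin|
      plugin.public_send(hook, *args) if plugin.respond_to?(hook)
    end
  end
end

# Vetoes URLs that do not match the configured pattern.
class FollowLinksLikePlugin
  def initialize(pattern)
    @pattern = pattern
  end

  def should_be_visited?(url)
    url.match?(@pattern)
  end
end

# Keeps the counters that incr_pages and incr_errors would otherwise maintain.
class StatisticsPlugin
  attr_reader :pages, :errors

  def initialize
    @pages = 0
    @errors = 0
  end

  def on_after_download(_page)
    @pages += 1
  end

  def on_page_error(_page)
    @errors += 1
  end
end

stats = StatisticsPlugin.new
plugins = PluginRegistry.new
plugins.register(FollowLinksLikePlugin.new(%r{/blog/})).register(stats)

visit_blog  = plugins.should_be_visited?('http://example.com/blog/1')
visit_about = plugins.should_be_visited?('http://example.com/about')

plugins.notify(:on_after_download, { url: 'http://example.com/blog/1' })
plugins.notify(:on_after_download, { url: 'http://example.com/blog/2' })
plugins.notify(:on_page_error, { url: 'http://example.com/blog/3' })
```

Here the statistics plugin never answers `should_be_visited?`, so it has no say in filtering, and the filter plugin never sees download events.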