- Propagate user_data values from Redis message to fetched pages
- Support for MongoDB driver ~> 2.0.6 has been added
- Minor code cleanup
- Adds RethinkDB Storage
- BugFix: Update and fix mongo driver v1.11.1 'upsert: 1' -> 'upsert: true'
- Organize and update specs to rspec 3
- BugFix: Better compatibility for mongo 2.6.x on index creation
- BugFix: When a page contains an error, Mongo throws `BSON::InvalidDocument`. `Exception` is not serializable 31647cc
- Major Code-Style changes and cleanup #35
- BugFix: proper initialization of internal_queue #38
- Better INT / TERM Signal handling #34. New option added: `enable_signal_handler: true / false`
- Zlib::GzipFile::Error handling da3b927
- Faster and easier overflow management #39
- Add `PolipusCrawler#add_to_queue` to add a page back to the queue #24
- Introduce new block `PolipusCrawler#on_page_error`, which runs when there was an error (`Page#error`), for example a connectivity error. See `/examples/error_handling.rb` #15
- Add `Page#success?`, which returns true if the HTTP code is between 200 and 206.
- Polipus now supports `robots.txt` directives. Set the option `:obey_robots_txt` to `true`. See `/examples/robots_txt_handling.rb` #30
- Add support for GZIP and deflate compressed HTTP requests #26
- Minor improvements to code style
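
The `Page#success?` and `Page#error` entries above can be illustrated with a small, self-contained sketch. It does not use Polipus itself: `SketchPage` is a hypothetical stand-in, the URLs are made up, and the 200 to 206 range is assumed to be inclusive, which the entry does not state explicitly.

```ruby
# Hypothetical stand-in for a fetched page; NOT the Polipus Page class.
SketchPage = Struct.new(:url, :code, :error) do
  # Assumption: "between 200 and 206" means the inclusive range 200..206.
  def success?
    code.is_a?(Integer) && code.between?(200, 206)
  end
end

pages = [
  SketchPage.new('http://example.com/', 200, nil),
  SketchPage.new('http://example.com/partial', 206, nil),
  SketchPage.new('http://example.com/missing', 404, nil),
  SketchPage.new('http://example.com/down', nil, 'connection refused')
]

# Pages whose status code falls inside the success range.
succeeded = pages.select(&:success?).map(&:url)

# Pages carrying an error are the kind an on_page_error block would receive;
# a caller might decide to hand them back to the queue for another attempt.
retry_candidates = pages.select(&:error).map(&:url)

puts "succeeded: #{succeeded.inspect}"
puts "retry candidates: #{retry_candidates.inspect}"
```

In a real crawl the retry decision would live inside the `on_page_error` block and use `add_to_queue`; the repository examples referenced above show the actual usage.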