Newsman

A simple RSS Feed Finder and Info Gatherer

Test Parser Feeds

ATOM

RSS

Installation

Add this line to your application's Gemfile:

gem 'newsman', :git => 'https://github.com/marshallmick007/newsman.git'

And then execute:

$ bundle

Or install it yourself as:

$ gem install newsman

Usage

Finding A Feed

url = "https://gigaom.com/"
hunter = Newsman::FeedHunter.new
options = {
  :strict_header_links => true,
  :search_wellknown_locations => true,
  :parse_body_links => false
}

feeds = hunter.find_feeds url, options

find_feeds returns a hash mapping feed titles to URIs:

{
             "Gigaom » Feed" => #<URI::HTTPS URL:https://gigaom.com/feed/>,
    "Gigaom » Comments Feed" => #<URI::HTTPS URL:https://gigaom.com/comments/feed/>
}
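A minimal sketch of working with that result, assuming only the hash shape shown above (String titles mapped to URI objects). The `feeds` literal here is a stand-in built by hand, not the output of a live `find_feeds` call, and the "skip comment feeds" rule is an illustrative choice rather than something Newsman does for you:

```ruby
require 'uri'

# Stand-in for the hash returned by find_feeds (shape copied from the example above).
feeds = {
  "Gigaom » Feed"          => URI.parse("https://gigaom.com/feed/"),
  "Gigaom » Comments Feed" => URI.parse("https://gigaom.com/comments/feed/")
}

# Drop any feed whose title mentions comments, keeping the main content feeds.
main_feeds = feeds.reject { |title, _uri| title =~ /comments/i }

main_feeds.each do |title, uri|
  puts "#{title}: #{uri}"
end
```

Because the values are URI objects, they can be passed along or converted with `to_s` as needed.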

Gathering Info About a Feed

url = "http://www.theregister.co.uk/security/headlines.atom"
parser = Newsman::RssParser.new
info = parser.fetch url

fetch returns an RssInfo object containing the raw Ruby RSS feed, the raw feed data, and a limited set of parsed fields available through the to_h method:

info.to_h
{
               :url => "http://www.theregister.co.uk/security/headlines.atom",
             :title => "The Register - Security",
        :item_count => 50,
         :feed_type => :atom,
    :published_date => 2015-02-21 17:15:18 UTC,
             :error => nil
}
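One way to consume that hash is sketched below. It assumes only the keys shown above; the `info_hash` literal is a hand-built stand-in for `info.to_h`, and `summarize` is a hypothetical helper, not part of Newsman:

```ruby
require 'time' # provides Time#iso8601

# Stand-in for info.to_h (keys and values copied from the example above).
info_hash = {
  :url            => "http://www.theregister.co.uk/security/headlines.atom",
  :title          => "The Register - Security",
  :item_count     => 50,
  :feed_type      => :atom,
  :published_date => Time.utc(2015, 2, 21, 17, 15, 18),
  :error          => nil
}

# Hypothetical helper: report the error if one was recorded,
# otherwise build a one-line summary of the feed.
def summarize(h)
  return "#{h[:url]} failed: #{h[:error]}" if h[:error]

  "#{h[:title]} (#{h[:feed_type]}, #{h[:item_count]} items, " \
    "published #{h[:published_date].iso8601})"
end

puts summarize(info_hash)
```

Checking :error first keeps the rest of the code from touching fields that may be missing when a fetch fails.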

Changelog

0.8.1 - Adds canonical_id to Newsman::Post

0.8.0 - Adds comments_url to Newsman::Post

0.7.9 - New Newsman::FeedParser fetch option :keep_source_order, which prevents Newsman from attempting to sort a feed's posts by date

0.7.7 - Computes most recent entry on Feed in the most_recent_entry property

0.7.6 - Supports loading feeds from disk, and adds static convenience methods to Newsman::FeedParser

0.7.5 - Removes any non-UTF-8 characters from the SiteInfo title

0.7.2 - Added Newsman::SiteInformation, which fetches feeds and site icons, as well as site title information

0.7.1 - Added :output_file option to FeedParser. Passing a file name will write the contents of the feed to that file

0.7.0 - Renamed RssPost to Post, RssInfo to Feed

0.6.2 - Added option :advanced_search_mode to FeedHunter. Setting it to :simple skips parsing and following of body links

0.6.0 - ATOM feeds can sometimes return #content properties instead of #summary

0.5.8 - Accept headers to fix sites like The Economist that served RSS.xml as text/html

0.5.6 - Limit body link parsing to 75

0.5.5 - Support :open_timeout and :read_timeout for feed hunter

0.5.3 - Support an options hash to FeedHunter#find_feeds. Additional well-known places are inspected

0.5.1 - Basic support for finding feeds in well-known places

0.5.0 - Can extract links for FeedBlitz

0.4.0 - Can extract canonicalized links from feeds by passing option :parse_links => true

0.3.8 - Can parse content_encoded from feeds (http://www.swiss-miss.com/feed)

0.3.6 - Can parse dc_date from feeds now (http://alistapart.com/main/feed)

0.3.5 - feed.title is now nil if it is not found in the RSS/ATOM source

0.3.2 - Can find Feedly "Subscription" links by parsing /http?:\/\/feedly.com\/i\/subscription\/feed\//

0.3.0 - Will now search for links in the body of the page that match wellknown providers like FeedBurner

0.2.4 - Parsing content-length HTTP header when available

0.2.2 - Alias :stats for :post_frequency_stats, added tracking of the download size of a feed

0.2.1 - Added ability to detect when all posts in a feed share the same publish date. In that case post_frequency_stats[:type] has the value :same_pub_date

0.1.5 - Added ability to fetch link[rel] only for types of application/rss+xml and application/atom+xml by passing a strict boolean into the find_feeds method

Contributing

  1. Fork it ( https://github.com/marshallmick007/newsman/fork )
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new Pull Request

License

MIT (see LICENSE and LICENSE.txt)
