Configuration options explained
Here you'll find detailed explanations of each configuration option available in LinkThumbnailer.
Maximum number of HTTP redirections allowed. If LinkThumbnailer cannot resolve the given URL before redirect_limit is reached, it raises a LinkThumbnailer::RedirectLimit exception.
Default is 3.
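The redirect check can be pictured as a loop that counts hops and gives up once the limit is exceeded. The sketch below is plain Ruby and is not LinkThumbnailer's own implementation: resolve, RedirectLimitError and the fetch lambda (a stand-in for a real HTTP request) are hypothetical names chosen for illustration.

```ruby
# Hypothetical error class; LinkThumbnailer itself raises LinkThumbnailer::RedirectLimit.
class RedirectLimitError < StandardError; end

# Follows redirects until a final URL is reached or the limit is exceeded.
# `fetch` stands in for an HTTP request: it returns [:redirect, location]
# when the server redirects, or [:ok, url] when the URL resolves.
def resolve(url, fetch, redirect_limit: 3)
  redirects = 0
  loop do
    status, location = fetch.call(url)
    return url if status == :ok

    redirects += 1
    if redirects > redirect_limit
      raise RedirectLimitError, "more than #{redirect_limit} redirects while resolving #{url}"
    end

    url = location
  end
end

# Usage with a fake two-hop redirect chain (no network involved).
chain = { "http://a" => "http://b", "http://b" => "http://c" }
fetch = ->(u) { chain.key?(u) ? [:redirect, chain[u]] : [:ok, u] }
resolve("http://a", fetch) # => "http://c"
```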
You can set the HTTP user agent used to resolve the given URL.
Default is link_thumbnailer.
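At the HTTP level, the user agent is just a request header. As a sketch using Ruby's standard Net::HTTP library (build_request is a hypothetical helper, not part of LinkThumbnailer), a request carrying the default value could be built like this:

```ruby
require "net/http"
require "uri"

# Builds a GET request that identifies itself with the given User-Agent.
# No connection is opened here; the request object is only prepared.
def build_request(url, user_agent: "link_thumbnailer")
  uri = URI.parse(url)
  request = Net::HTTP::Get.new(uri)
  request["User-Agent"] = user_agent # overrides Net::HTTP's default "Ruby"
  request
end

request = build_request("https://example.com/")
request["User-Agent"] # => "link_thumbnailer"
```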
You can enable or disable SSL verification for all LinkThumbnailer requests.
Default is true.
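With Net::HTTP, turning SSL verification on or off usually comes down to the client's verify_mode. This is an illustrative sketch of that mechanism, assuming a plain Net::HTTP client; build_client is a hypothetical helper, and this is not necessarily how LinkThumbnailer wires the option internally.

```ruby
require "net/http"
require "openssl"

# Returns an HTTPS client for the given host (no connection is opened yet).
# When verify_ssl is false the server certificate is not checked, which
# exposes requests to man-in-the-middle attacks.
def build_client(host, verify_ssl: true)
  http = Net::HTTP.new(host, 443)
  http.use_ssl = true
  http.verify_mode = verify_ssl ? OpenSSL::SSL::VERIFY_PEER : OpenSSL::SSL::VERIFY_NONE
  http
end
```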
The amount of time, in seconds, to wait for a connection to be opened. If the HTTP object cannot open a connection within this many seconds, it raises a Net::OpenTimeout exception.
See the Ruby Net::HTTP documentation for open_timeout for more details.
Default is 5.
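The timeout corresponds to Net::HTTP's open_timeout setting. The sketch below (build_http is a hypothetical helper) configures a client with the default of 5 seconds; the actual request is left commented out because it needs network access.

```ruby
require "net/http"
require "uri"

# Prepares a client whose connection attempt is abandoned after
# open_timeout seconds, at which point Net::HTTP raises Net::OpenTimeout.
def build_http(url, open_timeout: 5)
  uri = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = uri.scheme == "https"
  http.open_timeout = open_timeout
  http
end

# Usage (performs a real request, so it is commented out here):
# begin
#   build_http("https://example.com/").start { |conn| conn.get("/") }
# rescue Net::OpenTimeout
#   warn "could not open a connection within 5 seconds"
# end
```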
This is a list of blacklisted URL patterns (regular expressions) to skip when LinkThumbnailer fetches the website's images. Use this option to filter out advertising images.
Defaults are these well-known URLs:
- ^http://ad\.doubleclick\.net/
- ^http://b\.scorecardresearch\.com/
- ^http://pixel\.quantserve\.com/
- ^http://s7\.addthis\.com/
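Filtering image URLs against such a list amounts to rejecting any URL that matches one of the patterns. A minimal sketch in plain Ruby, where BLACKLIST_URLS and filter_images are hypothetical names holding the default patterns listed above:

```ruby
# The default patterns listed above, as Ruby regexes.
BLACKLIST_URLS = [
  %r{^http://ad\.doubleclick\.net/},
  %r{^http://b\.scorecardresearch\.com/},
  %r{^http://pixel\.quantserve\.com/},
  %r{^http://s7\.addthis\.com/}
].freeze

# Drops every image URL that matches at least one blacklisted pattern.
def filter_images(urls, blacklist = BLACKLIST_URLS)
  urls.reject { |url| blacklist.any? { |pattern| pattern.match?(url) } }
end

images = [
  "http://ad.doubleclick.net/banner.gif",
  "http://example.com/photo.jpg"
]
filter_images(images) # => ["http://example.com/photo.jpg"]
```

Note that the default patterns are anchored to http://, so the same hosts served over https:// would not match them.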
This option, introduced in v2 of LinkThumbnailer, lets you explicitly specify which attributes you expect to extract.
LinkThumbnailer will do its best to find all the given attributes on the provided website using the following scrapers (order matters):
- OpenGraph protocol scraper
- Homemade custom scraper
See here for more information about scrapers and how to build your own.
Currently, only the following attributes are available:
- title
- description
- images
See here for more information about each attribute.
Default is [:title, :images, :description].
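To picture how ordered scrapers and the attributes list interact, here is a minimal sketch that assumes the first scraper returning a non-empty value for an attribute wins and later scrapers only fill the gaps. The scrapers are hypothetical lambdas over a pre-parsed page hash, not LinkThumbnailer's scraper classes.

```ruby
# Hypothetical scrapers: each returns a hash of whatever attributes it found.
OPENGRAPH_SCRAPER = ->(page) { page.fetch(:og, {}) }
CUSTOM_SCRAPER    = ->(page) { page.fetch(:html, {}) }

# nil, "" and [] all count as "not found".
PRESENT = ->(value) { !value.nil? && !(value.respond_to?(:empty?) && value.empty?) }

# Fills each requested attribute from the first scraper, in order, that
# returns a present value for it.
def scrape(page, attributes: %i[title images description],
           scrapers: [OPENGRAPH_SCRAPER, CUSTOM_SCRAPER])
  results = scrapers.map { |scraper| scraper.call(page) }
  attributes.each_with_object({}) do |name, found|
    found[name] = results.map { |result| result[name] }.find(&PRESENT)
  end
end

page = {
  og:   { title: "OG title", images: [] },
  html: { title: "Page title", images: ["http://example.com/a.jpg"], description: "First paragraph" }
}
scrape(page)[:title]  # => "OG title"
scrape(page)[:images] # => ["http://example.com/a.jpg"]
```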
This option, introduced in v2 of LinkThumbnailer, lets you customize how LinkThumbnailer selects the best description for a given website.
When gathering all possible description candidates for a website, LinkThumbnailer scores each of them using the return value of each grader. Each grader returns a score (a positive or negative number) that is added to the overall score of the description being evaluated. LinkThumbnailer repeats this for every candidate and returns the highest-scoring one as the description that best describes the website.
See here for more information about graders and how to build your own.
Defaults are:
- Length grader: scores the description's length.
- HtmlAttribute grader: scores the class attribute of the description's HTML node.
- HtmlAttribute grader: scores the id attribute of the description's HTML node.
- Position grader: scores descriptions based on the order in which they appear on the page; the first ones are more likely to be reliable descriptions.
- LinkDensity grader: scores the description's link density.
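The additive scoring described above can be sketched in a few lines of Ruby. Candidate, the grader lambdas and their formulas are hypothetical stand-ins for the Length and Position graders; only the idea of summing each grader's score and keeping the highest-scoring candidate comes from this page.

```ruby
# A description candidate: its text plus its position on the page (0 = first).
Candidate = Struct.new(:text, :position)

# Hypothetical graders: each returns a positive or negative number.
LENGTH_GRADER   = ->(c) { c.text.length / 100.0 } # longer text scores higher
POSITION_GRADER = ->(c) { -c.position }           # earlier text scores higher

# Adds up every grader's score per candidate and returns the best candidate.
def best_description(candidates, graders)
  candidates.max_by { |c| graders.sum { |grader| grader.call(c) } }
end

candidates = [
  Candidate.new("Short teaser", 0),
  Candidate.new("A much longer paragraph " * 20, 1)
]
best_description(candidates, [LENGTH_GRADER, POSITION_GRADER]).position # => 1
```

Here the second candidate scores 4.8 - 1 = 3.8 against 0.12 for the first, so length outweighs position; changing the graders' weights shifts that balance.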
This option, introduced in v2 of LinkThumbnailer, sets the minimum length a description must have to be taken as a candidate.
Default is 25 characters.
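As a sketch, the threshold acts as a filter applied before grading. Stripping surrounding whitespace and accepting a length exactly equal to the threshold are assumptions here, and description_candidates is a hypothetical helper.

```ruby
DESCRIPTION_MIN_LENGTH = 25

# Keeps only texts long enough to be graded as description candidates.
# Leading/trailing whitespace is stripped before measuring (an assumption).
def description_candidates(texts, min_length: DESCRIPTION_MIN_LENGTH)
  texts.map(&:strip).select { |text| text.length >= min_length }
end

description_candidates(["Home", "  LinkThumbnailer fetches website previews.  "])
# => ["LinkThumbnailer fetches website previews."]
```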
This option, introduced in v2 of LinkThumbnailer, lets you customize the words used to score the class and id of HTML nodes when using the HtmlAttribute grader. These are positive keywords.
Default is /article|body|content|entry|hentry|main|page|pagination|post|text|blog|story/i.
This option, introduced in v2 of LinkThumbnailer, lets you customize the words used to score the class and id of HTML nodes when using the HtmlAttribute grader. These are negative keywords.
Default is /combx|comment|com-|contact|foot|footer|footnote|masthead|media|meta|outbrain|promo|related|scroll|shoutbox|sidebar|sponsor|shopping|tags|tool|widget|modal/i.
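A simplified take on how the positive and negative keywords could be applied to a node's class or id value. The regexes are the defaults listed above; the +25/-25 weights and the html_attribute_score helper are hypothetical, not LinkThumbnailer's actual HtmlAttribute grader.

```ruby
POSITIVE_REGEX = /article|body|content|entry|hentry|main|page|pagination|post|text|blog|story/i
NEGATIVE_REGEX = /combx|comment|com-|contact|foot|footer|footnote|masthead|media|meta|outbrain|promo|related|scroll|shoutbox|sidebar|sponsor|shopping|tags|tool|widget|modal/i

# Scores one attribute value (a node's class or id): +25 if it contains a
# positive keyword, -25 if it contains a negative one; both can apply.
def html_attribute_score(value)
  return 0 if value.nil? || value.empty?

  score = 0
  score += 25 if POSITIVE_REGEX.match?(value)
  score -= 25 if NEGATIVE_REGEX.match?(value)
  score
end

html_attribute_score("post-content")   # => 25
html_attribute_score("sidebar-widget") # => -25
html_attribute_score("main footer")    # => 0
```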
This option, introduced in v2 of LinkThumbnailer, sets the maximum number of images to fetch for a given website. Since fetching image information has a cost (one HTTP request per image), you should consider setting a limit here.
Default is 5 images.
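Because each image costs one HTTP request, the limit effectively caps the number of requests. In this sketch, fetch_images and fetch_info are hypothetical (fetch_info stands in for the per-image request), and a counter shows that only limit calls are made even when more URLs are available.

```ruby
# Requests information for at most `limit` image URLs, in page order.
def fetch_images(urls, fetch_info, limit: 5)
  urls.first(limit).map { |url| fetch_info.call(url) }
end

requests = 0
fake_fetch = ->(url) { requests += 1; { url: url, size: [640, 480] } }
urls = (1..8).map { |i| "http://example.com/#{i}.jpg" }
fetch_images(urls, fake_fetch).size # => 5
requests                            # => 5
```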