Skip to content

Library for extracting marketing attribution data from referer (sic) URLs

Notifications You must be signed in to change notification settings

astral303/referer-parser

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

referer-parser

referer-parser is a multi-language library for extracting marketing attribution data (such as search terms) from referer URLs, inspired by the ua-parser ua-parser project (an equivalent library for user agent parsing).

referer-parser is available in the following languages, each in a sub-folder of this repository:

referer-parser is a core component of SnowPlow snowplow, the open-source web-scale analytics platform powered by Hadoop and Hive.

Note that we always use the original HTTP misspelling of 'referer' (and thus 'referal') in this project - never 'referrer'.

Usage: Ruby

require 'referer-parser'

referer_url = 'http://www.google.com/search?q=gateway+oracle+cards+denise+linn&hl=en&client=safari'

r = RefererParser::Referer.new(referer_url)

puts r.known?                # =>  true
puts r.referer               # => 'Google'
puts r.search_parameter      # => 'q'     
puts r.search_term           # => 'gateway oracle cards denise linn'
puts r.uri.host              # => 'www.google.com'

For more information, please see the Ruby [README] ruby-readme.

Usage: Java

import com.snowplowanalytics.refererparser.Parser;

...

  String refererUrl = "http://www.google.com/search?q=gateway+oracle+cards+denise+linn&hl=en&client=safari";

  Parser refererParser = new Parser();
  Referal r = refererParser.parse(refererUrl);

  System.out.println(r.referer.name);       // => "Google"
  System.out.println(r.search.parameter);   // => "q"    
  System.out.println(r.search.term);        // => "gateway oracle cards denise linn"

For more information, please see the Java/Scala [README] java-scala-readme.

Usage: Scala

val refererUrl = "http://www.google.com/search?q=gateway+oracle+cards+denise+linn&hl=en&client=safari"

import com.snowplowanalytics.refererparser.scala.Parser
for (r <- Parser.parse(refererUrl)) {
  println(r.referer.name)      // => "Google"
  for (s <- r.search) {
    println(s.term)            // => "gateway oracle cards denise linn"
    println(s.parameter)       // => "q"    
  }
}

For more information, please see the Java/Scala [README] java-scala-readme.

Usage: Python

from referer_parser import Referer

referer_url = 'http://www.google.com/search?q=gateway+oracle+cards+denise+linn&hl=en&client=safari'

r = Referer(referer_url)

print(r.known)              # True
print(r.referer)            # 'Google'
print(r.search_parameter)   # 'q'     
print(r.search_term)        # 'gateway oracle cards denise linn'
print(r.uri)                # ParseResult(scheme='http', netloc='www.google.com', path='/search', params='', query='q=gateway+oracle+cards+denise+linn&hl=en&client=safari', fragment='')

For more information, please see the Python [README] python-readme.

search.yml

referer-parser identifies whether a URL is a known referer or not by checking it against the [search.yml] search-yml file; the intention is that this YAML file is reusable as-is by every language-specific implementation of referer-parser.

The file lists known search engines by name, and for each, gives a list of the parameters used in that search engine URL to identify the keywords and a list of domains that the search engine uses, for example:

google # Name of search engine referer
  parameters:
    - 'q' # First parameter used by Google
    - 'p' # Alternative parameter used by Google
  domains:
    - google.co.uk  # One domain used by Google
    - google.com    # Another domain used by Google
    - ...

The number of search engines and the domains they use is constantly growing - we need to keep search.yml up-to-date, and hope that the community will help!

In the future, we may augment search.yml with non-search engine referers - e.g. social networks like Facebook or affiliate networks like TradeDoubler. If you have any suggestions here, please [let us know] talk-to-us!

Contributing

We welcome contributions to referer-parser:

  1. New search engines and other referers - if you notice a search engine missing from search.yml, please fork the repo, add the missing entry and submit a pull request
  2. Ports of referer-parser to other languages - we welcome ports of referer-parser to new programming languages (e.g. JavaScript, PHP, C#, Haskell)
  3. Bug fixes, feature requests etc - much appreciated!

Support

General support for referer-parser is handled by the team at SnowPlow Analytics Ltd.

You can contact the SnowPlow Analytics team through any of the [channels listed on their wiki] talk-to-us.

Copyright and license

search.yml is based on [Piwik's] piwik [SearchEngines.php] piwik-search-engines, copyright 2012 Matthieu Aubry and available under the [GNU General Public License v3] gpl-license.

The original Ruby code is copyright 2012 SnowPlow Analytics Ltd and is available under the [Apache License, Version 2.0] apache-license.

The Java/Scala port is copyright 2012 SnowPlow Analytics Ltd and is available under the [Apache License, Version 2.0] apache-license.

The Python port is copyright 2012 [Don Spaulding] donspaulding and is available under the [Apache License, Version 2.0] apache-license.

About

Library for extracting marketing attribution data from referer (sic) URLs

Resources

Stars

Watchers

Forks

Packages

No packages published