SEO Crawler

A bare-bones SEO crawler built with Python and Scrapy.

This project has become part of the advertools package; check out the documentation page for details.

Using Scrapy, the crawler extracts the main SEO elements of a website for exploratory analysis. It works by taking a list of known URLs to crawl and returning structured results.
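
As a minimal sketch of that workflow using the advertools API (the URLs and output file name below are placeholders), a crawl can be run and its structured results loaded into a DataFrame:

```python
import advertools as adv
import pandas as pd

# List of known URLs to crawl (placeholder URLs for illustration)
url_list = [
    'https://example.com/',
    'https://example.com/blog/',
]

# Crawl the supplied URLs and write structured results to a jsonlines file
adv.crawl(url_list, 'seo_crawl.jl', follow_links=False)

# Load the results into a DataFrame for exploratory analysis
crawl_df = pd.read_json('seo_crawl.jl', lines=True)
print(crawl_df.head())
```

The output file uses the jsonlines format, which is why it is read back with `lines=True`.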

The main elements include:

  • url: the actual URL
  • slug: the URI (path) part of the URL
  • directories: the URI split by slashes, giving the different folders (directories) in each URL
  • title: the <title> tag
  • h1, h2, h3, h4: the header tags
  • description: the meta description
  • link_urls: not activated by default; needs special configuration to make sure you are getting links to certain sites
  • link_text: depends on the above; the anchor text of each link
  • link_count: the number of links on the page (based on your criteria)
  • load_time: page load time in seconds
  • status_code: the response status code of the page (200, 301, 404, etc.)
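
The sketch below shows how a bare-bones Scrapy spider could extract these elements; the spider name, start URL, and exact selectors are illustrative and not the project's actual code:

```python
import scrapy


class SEOSpider(scrapy.Spider):
    # Placeholder name and start URL for illustration
    name = 'seo_spider'
    start_urls = ['https://example.com/']

    def parse(self, response):
        yield {
            'url': response.url,
            # URI part of the URL (everything after the domain)
            'slug': response.url.split('/', 3)[-1],
            # Folders (directories) in the URL path
            'directories': [d for d in response.url.split('/')[3:] if d],
            'title': response.xpath('//title/text()').get(),
            'h1': response.xpath('//h1//text()').getall(),
            'h2': response.xpath('//h2//text()').getall(),
            'h3': response.xpath('//h3//text()').getall(),
            'h4': response.xpath('//h4//text()').getall(),
            'description': response.xpath(
                '//meta[@name="description"]/@content').get(),
            'link_urls': response.xpath('//a/@href').getall(),
            'link_text': response.xpath('//a//text()').getall(),
            'link_count': len(response.xpath('//a/@href').getall()),
            # Scrapy records download latency in the request/response meta
            'load_time': response.meta.get('download_latency'),
            'status_code': response.status,
        }
```

Such a spider can be run with, for example, `scrapy runspider seo_spider.py -o output.csv` to write the extracted elements to a CSV file.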

Many other elements can be added to the list, but they differ from site to site. Some examples:

  • publishing date
  • product price
  • content category
  • tags of an article
  • whether or not a certain keyword appears in a certain location
  • type of content (inferred from a URL directory, or from certain content on page)
  • etc.
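
As an illustration of how such site-specific fields might be added (the selectors, spider name, and keyword below are hypothetical and would have to be adapted to the target site's markup), the yielded item could be extended like this:

```python
import scrapy


class SiteSpecificSpider(scrapy.Spider):
    # Placeholder name and start URL for illustration
    name = 'site_specific'
    start_urls = ['https://example.com/blog/some-article']

    def parse(self, response):
        title = response.xpath('//title/text()').get()
        path_parts = [p for p in response.url.split('/')[3:] if p]
        yield {
            'url': response.url,
            'title': title,
            # Hypothetical selectors; adapt them to the target site's HTML
            'pub_date': response.xpath(
                '//meta[@property="article:published_time"]/@content').get(),
            'price': response.css('span.product-price::text').get(),
            # Content category inferred from the first URL directory
            'category': path_parts[0] if path_parts else None,
            # Whether a certain keyword appears in the title
            'keyword_in_title': 'seo' in (title or '').lower(),
        }
```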