Webapp Snapshot Java

This project contains a Java middleware implementation and a direct invocation API that leverage existing web page snapshotting service providers. These services snapshot a web page after executing its JavaScript, so that search engines and bots, which generally do not execute JavaScript, can parse the rendered page. This is helpful if you have a JavaScript web app (Backbone, Angular, Ember.js, etc.).

There are two parts to this code:

Java middleware, implemented as a servlet filter, that detects whether a search engine / bot is making a request and, if so, uses the web page snapshotting service to return a response.
An API to explicitly snapshot your web pages.

The code is based upon https://github.com/greengerong/prerender-java. The ways it differs from that project are:

multiple web app snapshotting service support - built-in support for prerender.io and ajaxsnapshots.com. You can also add support for another service.
open source project support - it has a token provider API for open source projects that don't want to put their snapshot service token in their web.xml.
app engine support - switched from org.apache.httpcomponents to HttpURLConnection to avoid socket read exceptions: http://stackoverflow.com/questions/23103124/unable-to-adjust-socket-timeout-when-using-org-apache-httpcomponents-with-app-en
supports explicit snapshotting in addition to servlet filter snapshotting

Note: If you are using a # in your URLs, make sure to change it to #!. See Google's AJAX crawling protocol.
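To illustrate why the #! form matters: under the AJAX crawling scheme, a crawler rewrites a #! URL into one carrying an _escaped_fragment_ query parameter, and that rewritten URL is what reaches the server. The sketch below shows this mapping. The class and method names are hypothetical and not part of this project, and the set of escaped bytes reflects one reading of the crawling specification, so treat it as an approximation.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of webapp-snapshot-java.
final class EscapedFragmentExample {

    // Rewrites "base#!fragment" into "base?_escaped_fragment_=fragment".
    static String toCrawlerUrl(String prettyUrl) {
        int idx = prettyUrl.indexOf("#!");
        if (idx < 0) {
            return prettyUrl; // no hash-bang, nothing to rewrite
        }
        String base = prettyUrl.substring(0, idx);
        String fragment = prettyUrl.substring(idx + 2);

        StringBuilder escaped = new StringBuilder();
        for (byte b : fragment.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            // Assumed escape set: control chars and space, '#', '%', '&', '+', and bytes >= 0x7F.
            boolean mustEscape = c <= 0x20 || c == 0x23 || c == 0x25
                    || c == 0x26 || c == 0x2B || c >= 0x7F;
            if (mustEscape) {
                escaped.append(String.format("%%%02X", c));
            } else {
                escaped.append((char) c);
            }
        }

        // Append to an existing query string if the base URL already has one.
        String separator = base.indexOf('?') >= 0 ? "&" : "?";
        return base + separator + "_escaped_fragment_=" + escaped;
    }
}
```

For example, under these assumptions http://example.com/ajax.html#!key=value becomes http://example.com/ajax.html?_escaped_fragment_=key=value.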

Note: Make sure you have more than one webserver thread/process running because the snapshotting service will make a request to your server to render the HTML.

Middleware / Servlet Filter

How the servlet filter works:

Check if a webpage snapshot is required
1. Check if the request is from a crawler (_escaped_fragment_ or agent string)
2. Check to make sure we aren't requesting a resource (js, css, etc...)
3. (optional) Check to make sure the url is in the whitelist
4. (optional) Check to make sure the url isn't in the blacklist
If a snapshot is required
1. (optional) Invoke SeoFilterEventHandler.beforeSnapshot to check if a snapshot is available. If so, use this as the snapshot and skip the remaining steps.
2. Make a request to the snapshotting service to get a snapshot.
3. (optional) Invoke SeoFilterEventHandler.afterSnapshot with the snapshot (for persistence / logging)
4. Return the snapshot result to the crawler.
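The "is a snapshot required" checks above can be sketched roughly as follows. This is not the SeoFilter source: the class name, the sample crawler agents, the resource extensions, and the choice to treat whitelist and blacklist entries as regular expressions are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical sketch of the request-selection steps; not the project's SeoFilter.
final class SnapshotDecisionSketch {

    // Assumed sample values, not the project's actual defaults.
    private static final List<String> CRAWLER_AGENTS =
            Arrays.asList("googlebot", "bingbot", "yandex");
    private static final List<String> RESOURCE_EXTENSIONS =
            Arrays.asList(".js", ".css", ".png", ".jpg", ".gif", ".ico");

    static boolean isSnapshotRequired(String path, String queryString, String userAgent,
                                      List<Pattern> whitelist, List<Pattern> blacklist) {
        // 1. Is the request from a crawler (_escaped_fragment_ or agent string)?
        boolean escapedFragment =
                queryString != null && queryString.contains("_escaped_fragment_");
        if (!escapedFragment && !isCrawlerAgent(userAgent)) {
            return false;
        }
        // 2. Skip static resources (js, css, ...).
        if (isResource(path)) {
            return false;
        }
        // 3. (optional) If a whitelist is set, the path must match it.
        if (!whitelist.isEmpty() && !matchesAny(whitelist, path)) {
            return false;
        }
        // 4. (optional) The path must not match the blacklist.
        return !matchesAny(blacklist, path);
    }

    private static boolean isCrawlerAgent(String userAgent) {
        if (userAgent == null) {
            return false;
        }
        String lower = userAgent.toLowerCase(Locale.ROOT);
        for (String agent : CRAWLER_AGENTS) {
            if (lower.contains(agent)) {
                return true;
            }
        }
        return false;
    }

    private static boolean isResource(String path) {
        String lower = path.toLowerCase(Locale.ROOT);
        for (String ext : RESOURCE_EXTENSIONS) {
            if (lower.endsWith(ext)) {
                return true;
            }
        }
        return false;
    }

    private static boolean matchesAny(List<Pattern> patterns, String path) {
        for (Pattern p : patterns) {
            if (p.matcher(path).matches()) {
                return true;
            }
        }
        return false;
    }
}
```

In a real filter, a true result would lead to the beforeSnapshot / snapshot service / afterSnapshot sequence, and a false result would pass the request down the filter chain unchanged.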

Installing the servlet filter

[1] Modify your pom.xml

<dependency>
  <groupId>com.github.avaliani.snapshot</groupId>
  <artifactId>webapp-snapshot-java</artifactId>
  <version>1.0</version>
</dependency>

[2] Modify your web.xml (you will probably want to add this filter prior to all other filters)

<filter>
    <filter-name>SeoFilter</filter-name>
    <filter-class>com.github.avaliani.snapshot.SeoFilter</filter-class>
    <init-param>
        <param-name>snapshotService</param-name>
        <param-value>com.github.avaliani.snapshot.AjaxSnapshotsSnapshotService</param-value>
    </init-param>
    <init-param>
        <param-name>snapshotServiceToken</param-name>
        <param-value>{token}</param-value>
    </init-param>
</filter>
<filter-mapping>
    <filter-name>SeoFilter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>

Servlet filter initialization parameters:

All parameters are optional except the parameter used to specify the snapshot service token: snapshotServiceToken or snapshotServiceTokenProvider. By default AjaxSnapshotsSnapshotService is used as the snapshotService.

Snapshot service parameters:

snapshotService - the snapshotting service. Two built-in services are available: (1) com.github.avaliani.snapshot.AjaxSnapshotsSnapshotService and (2) com.github.avaliani.snapshot.PrerenderSnapshotService. Or you can implement your own.
snapshotServiceHeaders - headers to use when making a request to the snapshotting service. Specified as semicolon-separated headerName=headerValue pairs, e.g. "X-AJS-SNAP-TIME=2000;{headerName2}={headerValue2}"
- AjaxSnapshotsSnapshotService headers documentation
snapshotServiceToken - specifies the snapshot service token
snapshotServiceTokenProvider - used if you want to generate your snapshot service token from a class and not from web.xml. The class must implement com.github.avaliani.snapshot.SnapshotServiceTokenProvider
snapshotServiceUrl - used to specify an explicit url for the snapshotting service. If not specified the default url for the snapshotting service will be used.
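To make the snapshotServiceHeaders format concrete, here is a minimal sketch of how a semicolon-separated list of headerName=headerValue pairs could be parsed. The class and method names are hypothetical, and the project's own parsing may differ (for example in how it handles whitespace or malformed pairs).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for the snapshotServiceHeaders value; not the project's code.
final class SnapshotHeadersExample {

    static Map<String, String> parseHeaders(String paramValue) {
        // LinkedHashMap keeps the headers in the order they were declared.
        Map<String, String> headers = new LinkedHashMap<>();
        if (paramValue == null) {
            return headers;
        }
        for (String pair : paramValue.split(";")) {
            String trimmed = pair.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate a trailing or doubled semicolon
            }
            // Split on the first '=' only, so header values may contain '='.
            int eq = trimmed.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException(
                        "Expected headerName=headerValue but got: " + trimmed);
            }
            headers.put(trimmed.substring(0, eq).trim(), trimmed.substring(eq + 1).trim());
        }
        return headers;
    }
}
```

With this sketch, the value "X-AJS-SNAP-TIME=2000" yields a single header named X-AJS-SNAP-TIME with the value 2000.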

Request selection parameters:

crawlerUserAgents - additional user agents to check for
whitelist - if set and the request url is not in the whitelist it is not snapshotted
blacklist - if set and the request url is in the blacklist it is not snapshotted

Other parameters:

loggingLevel - the Java logging level (java.util.logging.Level) at which to write logs. The default logging level is FINE.
seoFilterEventHandler - event handler to be invoked before and after taking snapshots.

Snapshot API

See com.github.avaliani.snapshot.SnapshotService for the API. Two built in services are available: (1) com.github.avaliani.snapshot.AjaxSnapshotsSnapshotService and (2) com.github.avaliani.snapshot.PrerenderSnapshotService.

Testing

If you want to make sure your pages are rendering correctly:

Open the Developer Tools in Chrome (Cmd + Alt + J)
Click the Settings gear in the bottom right corner.
Click "Overrides" on the left side of the settings panel.
Check the "User Agent" checkbox.
Choose "Other..." from the User Agent dropdown.
Type googlebot into the input box.
Refresh the page (make sure to keep the developer tools open).
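The same check can be scripted instead of done in the browser. The sketch below requests a page with the User-Agent set to googlebot using HttpURLConnection, so you can inspect the HTML that a crawler would receive. The class name, method names, and timeout values are hypothetical choices for illustration, not part of this project.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical test helper; not part of webapp-snapshot-java.
final class CrawlerRequestExample {

    // Prepares (but does not yet send) a GET request that identifies itself as googlebot.
    static HttpURLConnection openAsCrawler(String pageUrl) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(pageUrl).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("User-Agent", "googlebot");
        conn.setConnectTimeout(5000);  // assumed timeout values
        conn.setReadTimeout(30000);
        return conn;
    }

    // Sends the request and returns the response body decoded as UTF-8.
    static String fetchAsCrawler(String pageUrl) throws IOException {
        HttpURLConnection conn = openAsCrawler(pageUrl);
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }
}
```

Calling fetchAsCrawler with the URL of one of your pages should return the snapshotted HTML if the filter is installed and the snapshot service token is valid.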

License

The MIT License (MIT)
