Add script to save static pages after JS has run
#15639

We want to be able to generate some pages as part of the static site
build after JavaScript has run on them. This will allow these pages to
be populated with data by JavaScript at build time while still appearing
progressively enhanced when deployed as part of our static site.

This script fetches, from an endpoint on the application, a list of
URLs that should be scraped with JS enabled. It then uses
capybara-webkit to fetch those pages, run their JS, and save them to
disk. The target files are written to the same locations that `wget`
uses, which allows us to overwrite the static files already generated
by `wget` with these JS-enhanced versions.
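
For example (all URLs hypothetical), if the endpoint returns a
plain-text list like:

    https://example.org/countries/australia
    https://example.org/people.html

the script writes `countries/australia.html` and `people.html` relative
to the working directory, exactly where `wget` would have put them.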

Subsequent commits will need to:

* Create the endpoint that this script uses to determine which URLs to
  scrape (a hypothetical sketch of such an endpoint follows this list)
* Update the build script to run this after `wget` has run
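
One possible shape for that endpoint is sketched below. Everything in
the sketch is an assumption for illustration: the framework (a
Sinatra-style route), the route name, and the URL list are invented
here and are not part of this commit.

    # Hypothetical sketch: the framework, route name, and URL list are
    # all assumptions for illustration, not part of this commit.
    require 'sinatra'

    # Return one absolute URL per line for save_javascript_pages.rb
    get '/javascript_pages.txt' do
      content_type 'text/plain'
      [
        'https://example.org/some/ajax-enhanced/page',
      ].join("\n")
    end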
henare committed Jan 18, 2018
1 parent bd85183 commit a3f7702
Showing 2 changed files with 79 additions and 1 deletion.
.travis.yml (2 changes: 1 addition & 1 deletion)
@@ -9,7 +9,7 @@ addons:
 - cmake
 script:
 - bundle exec rake
-- travis_wait 30 scripts/release.sh
+- travis_wait 30 xvfb-run scripts/release.sh
 - bash <(curl -fsSL https://github.com/everypolitician/ensure-regression-tests/raw/master/ensure-regression-tests)
 sudo: false
 rvm:
scripts/save_javascript_pages.rb (78 changes: 78 additions & 0 deletions)
@@ -0,0 +1,78 @@
#!/usr/bin/env ruby
# frozen_string_literal: true

# Usage: save_javascript_pages.rb [URL with text list of URLs to scrape]
#
# This script accepts one argument, which is a URL that returns a text list of
# URLs that this script should scrape and save to disk after jQuery has
# finished running AJAX calls. Files are saved to a filesystem path matching
# the URL path, relative to where you run the script from. This matches `wget`
# behavior and allows this script to overwrite files previously scraped by
# `wget`.

require 'capybara-webkit'
require 'fileutils'
require 'open-uri'
require 'timeout'

class PageAfterAJAX
  attr_reader :page, :url

  def initialize
    Capybara::Webkit.configure(&:allow_unknown_urls)
    @page = Capybara::Session.new(:webkit)
  end

  def save(url)
    @url = url
    visit_and_wait
    restore_pre_js_page_classes
    write_page_to_disk
    puts "Saved #{filename}"
  end

  private

  def write_page_to_disk
    create_parent_directories
    File.write(filename, page.body)
  end

  def visit_and_wait
    page.visit(url)
    wait_for_ajax
  end

  # Restores page classes modified by running JS on the page
  def restore_pre_js_page_classes
    page.execute_script("$('html').addClass('no-js')")
    page.execute_script("$('html').removeClass('flexwrap')")
  end

  def create_parent_directories
    FileUtils.mkdir_p(File.dirname(filename))
  end

  # Mirrors wget's naming: append .html unless the path already ends in it
  def filename
    url_path.end_with?('.html') ? url_path : "#{url_path}.html"
  end

  # The URL's path with the leading slash stripped, so files are written
  # relative to the current working directory
  def url_path
    URI.parse(url).path[1..-1]
  end

  # Polls until jQuery reports no AJAX requests in flight, or times out
  def wait_for_ajax
    Timeout.timeout(Capybara.default_max_wait_time) do
      loop until finished_all_ajax_requests?
    end
  end

  def finished_all_ajax_requests?
    page.evaluate_script('jQuery.active').zero?
  end
end

page = PageAfterAJAX.new
# Fetch the list of URLs to scrape (whitespace-separated) from the endpoint
javascript_pages_to_scrape = open(ARGV[0]).read.split

javascript_pages_to_scrape.each do |url|
  page.save(url)
end
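
A usage sketch, assuming a hypothetical endpoint URL (the real endpoint
is added in a later commit):

    ruby scripts/save_javascript_pages.rb https://example.org/javascript_pages.txt

Run it from the root of the generated static site so the saved files
overwrite the ones `wget` wrote.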
