Add script to save static pages after JS has run
#15639

We want to be able to generate some pages as part of the static site build after JavaScript has run on them. This allows those pages to be populated with data by JavaScript while still appearing to progressively enhance when deployed as part of our static site.

This script fetches a list of URLs from an endpoint on the application that should be scraped with JS enabled. It then uses capybara-webkit to fetch those pages, run their JavaScript, and save the results to disk. The target files land in the same locations `wget` uses, which allows us to overwrite the static files already generated by `wget` with these JS-enhanced versions (see the path-mapping sketch below).

Subsequent commits will need to:

* Create the endpoint that this script uses to determine which URLs to scrape
* Update the build script to run this after `wget` has run
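For illustration, here is a minimal sketch of that URL-to-file mapping (the URL is hypothetical; the real logic lives in the `url_path` and `filename` methods in the script below):

require 'uri'

# Hypothetical input URL: a page on the app being scraped.
url = 'https://example.com/teams/operations'

# Drop the leading slash so the file is written relative to the current
# working directory, mirroring wget's on-disk layout.
path = URI.parse(url).path[1..-1]   # => "teams/operations"

# Append ".html" unless the path already ends in it.
file = path.end_with?('.html') ? path : "#{path}.html"
# => "teams/operations.html" -- the same file wget would have written,
# so saving here overwrites the non-JS version.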
Showing 2 changed files with 79 additions and 1 deletion.
save_javascript_pages.rb (new file)
#!/usr/bin/env ruby
# frozen_string_literal: true

# Usage: save_javascript_pages.rb [URL with text list of URLs to scrape]
#
# This script accepts one argument, which is a URL that returns a text list of
# URLs that this script should scrape and save to disk after jQuery has finished
# running AJAX calls. Files are saved to a filesystem path matching the URL
# path, relative to where you run the script from. This matches `wget` behavior
# and allows this script to overwrite files previously scraped by `wget`.

require 'capybara-webkit'
require 'fileutils'
require 'open-uri'
require 'timeout'

class PageAfterAJAX
  attr_reader :page, :url

  def initialize
    Capybara::Webkit.configure(&:allow_unknown_urls)
    @page = Capybara::Session.new(:webkit)
  end

  def save(url)
    @url = url
    visit_and_wait
    restore_pre_js_page_classes
    write_page_to_disk
    puts "Saved #{filename}"
  end

  private

  def write_page_to_disk
    create_parent_directories
    File.write(filename, page.body)
  end

  def visit_and_wait
    page.visit(url)
    wait_for_ajax
  end

  # Restores page classes modified by running JS on the page
  def restore_pre_js_page_classes
    page.execute_script("$('html').addClass('no-js')")
    page.execute_script("$('html').removeClass('flexwrap')")
  end

  def create_parent_directories
    FileUtils.mkdir_p(File.dirname(filename))
  end

  # Mirrors wget's naming: append ".html" unless the path already ends in it.
  def filename
    url_path[-5..-1] == '.html' ? url_path : "#{url_path}.html"
  end

  # URL path with the leading slash dropped, so files are written relative
  # to the current working directory.
  def url_path
    URI.parse(url).path[1..-1]
  end

  # Polls until jQuery reports no in-flight AJAX requests, bounded by
  # Capybara's default wait time.
  def wait_for_ajax
    Timeout.timeout(Capybara.default_max_wait_time) do
      loop until finished_all_ajax_requests?
    end
  end

  def finished_all_ajax_requests?
    page.evaluate_script('jQuery.active').zero?
  end
end

page = PageAfterAJAX.new
javascript_pages_to_scrape = open(ARGV[0]).read.split

javascript_pages_to_scrape.each do |url|
  page.save(url)
end
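For reference, a hypothetical invocation (the endpoint URL and its response are illustrative; the real endpoint arrives in a subsequent commit):

# Hypothetical invocation; the endpoint does not exist yet.
#   ruby save_javascript_pages.rb http://localhost:3000/javascript_pages.txt
#
# The endpoint body is expected to be a whitespace-separated list of URLs,
# for example:
#   http://localhost:3000/teams
#   http://localhost:3000/locations
#
# which this script would save to teams.html and locations.html.

The wait loop leans on `jQuery.active`, jQuery's count of in-flight XHR requests, so a page is only written once every AJAX call has settled; `Timeout.timeout` bounds the wait so a page with a stuck request raises an error and fails the build rather than hanging it.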