Adds a functional TAMES scraper using a similar API to the old system scraper. #2

Open
wants to merge 3 commits into
base: master
32 changes: 32 additions & 0 deletions data/texappscraper/courts.yml
@@ -1,4 +1,8 @@
---
1:
  name: First Court of Appeals
  city: Houston
  scraper: TAMESScraper
2:
  name: Second Court of Appeals
  city: Fort Worth
@@ -9,6 +13,18 @@
  city: Austin
  scraper: OldSystemScraper
  site: http://www.3rdcoa.courts.state.tx.us
4:
  name: Fourth Court of Appeals
  city: San Antonio
  scraper: TAMESScraper
5:
  name: Fifth Court of Appeals
  city: Dallas
  scraper: TAMESScraper
6:
  name: Sixth Court of Appeals
  city: Texarkana
  scraper: TAMESScraper
7:
  name: Seventh Court of Appeals
  city: Amarillo
@@ -19,6 +35,10 @@
  city: El Paso
  scraper: OldSystemScraper
  site: http://www.8thcoa.courts.state.tx.us
9:
  name: Ninth Court of Appeals
  city: Beaumont
  scraper: TAMESScraper
10:
  name: Tenth Court of Appeals
  city: Waco
@@ -29,3 +49,15 @@
  city: Eastland
  scraper: OldSystemScraper
  site: http://www.11thcoa.courts.state.tx.us
12:
  name: Twelfth Court of Appeals
  city: Tyler
  scraper: TAMESScraper
13:
  name: Thirteenth Court of Appeals
  city: Corpus Christi/Edinburg
  scraper: TAMESScraper
14:
  name: Fourteenth Court of Appeals
  city: Houston
  scraper: TAMESScraper
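
The `scraper:` key in each entry names the class that should handle that court. A minimal sketch of how such a lookup might work, assuming the class name is resolved with `const_get`; the `SketchScrapers` module, the `scraper_for` helper, and the stub classes are hypothetical and are not part of this PR:

```ruby
require 'yaml'

# Hypothetical stand-ins; the real classes live under lib/texappscraper/.
module SketchScrapers
  class TAMESScraper
    attr_reader :court_number

    def initialize(court_number)
      @court_number = court_number
    end
  end

  class OldSystemScraper < TAMESScraper; end
end

# Two entries in the shape used by courts.yml. The second one mirrors the
# context lines of the second hunk; its name is not shown in the diff, so
# it is left out here.
COURTS = YAML.safe_load(<<~YAML)
  1:
    name: First Court of Appeals
    city: Houston
    scraper: TAMESScraper
  3:
    city: Austin
    scraper: OldSystemScraper
    site: http://www.3rdcoa.courts.state.tx.us
YAML

# Look up the configured class name and build a scraper for that court.
def scraper_for(court_number, courts = COURTS)
  entry = courts.fetch(court_number)
  SketchScrapers.const_get(entry.fetch('scraper')).new(court_number)
end

puts scraper_for(1).class
puts scraper_for(3).class
```

Using `fetch` rather than `[]` makes a missing court number or a missing `scraper:` key fail immediately with a `KeyError` instead of a later `nil` error.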
1 change: 1 addition & 0 deletions lib/texappscraper.rb
@@ -1,5 +1,6 @@
require "texappscraper/version"
require "texappscraper/old_system_scraper"
require "texappscraper/tames_scraper"
require "texappscraper/courts"
require "texappscraper/cacher"

111 changes: 111 additions & 0 deletions lib/texappscraper/tames_scraper.rb
@@ -0,0 +1,111 @@
require 'date'
require 'texappscraper/throttled_agent'
require 'texappscraper/courts'

module TexAppScraper
  class TAMESScraper
    THROTTLE_DELAY = 3

    BASE = 'http://www.search.txcourts.gov'

    def initialize court, delay=THROTTLE_DELAY
      delay = 0 if delay == :no_throttling
      @agent = ThrottledAgent.new delay
      @court_number = court
      @court = TexAppScraper::COURTS[court]
    end

    def scrape date
      cases_with_opinions_on_day(date).each do |docket_number|
        opinions_for_case(docket_number) do |o|
          yield o
        end
      end
    end

    DAY_KEY = 'FullDate'
    CASE_LINK_RE = %r{Case\.aspx\?cn=(\d\d-\d\d-\d\d\d\d\d-..)}
    def cases_with_opinions_on_day day
      url = "#{BASE}/Docket.aspx"
      params = {
        :coa => format("coa%02d", @court_number),
        :FullDate => day.strftime('%m/%d/%Y'),
        :p => 1
      }
      page = @agent.get url, params
      page.links_with(:href => CASE_LINK_RE).map do |link|
        CASE_LINK_RE.match(link.href)[1]
      end.uniq
    end

    CASE_URL = "#{BASE}/Case.aspx"
    def opinions_for_case docket_number
      page = @agent.get CASE_URL, { :cn => docket_number, :p => 1 }
      the_case = case_metadata page
      yield_opinions page do |o|
        yield({
          :case => the_case,
          :date => o[:date],
          :type => o[:type],
          :url => o[:url]
        })
      end
    end

    OPINION_TYPES = {
      'Opinion issued' => :opinion,
      'Memorandum opinion issued' => :memorandum
    }
    def yield_opinions page
      page.search('.//tr[@class="rgRow" or @class="rgAltRow"]').each do |row|
        tds = row.search('./td').to_a
        next unless tds.count == 5
        type = tds[1].text
        if OPINION_TYPES.key? type
          links = tds[4]
          date = Date.strptime tds[0].text.strip, '%m/%d/%Y'
          links.search('.//tr').each do |linkrow|
            label = linkrow.css('td').first.text
            if /opinion/i =~ label
              yield({
                :type => OPINION_TYPES[type],
                :date => date,
                :url => BASE + '/' + linkrow.css('a').attr('href')
              })
            end
          end
        end
      end
    end

    META = '//*[@id="ctl00_ContentPlaceHolder1_tblContent"]//tr[2]/td/table//tr/td/table//tr[3]/td/table//tr/td/table//tr'
    META_KEYS = {
      'Case Number:' => :docket_number,
      'Style:' => :style,
      'v.:' => :versus
    }
    META_FORMAT = {
      :filed => lambda { |x| Date.strptime(x, '%m/%d/%Y') }
    }
    def case_metadata page
      meta = {}
      page.search(META).to_a.each do |tr|
        key = tr.at_css('td.BreadCrumbs').text.strip
        next unless META_KEYS.key? key
        value = tr.at_css('td.TextNormal').text.gsub("\u00A0", " ").strip
        meta_key = META_KEYS[key]
        formatter = META_FORMAT[meta_key]
        meta[meta_key] = formatter.nil? ? value : formatter.call(value)
      end
      if meta[:versus] && meta[:versus].length > 0
        meta[:style] = "#{meta[:style]} v. #{meta[:versus]}"
      end
      return({
        :court => @court_number,
        :docket_number => meta[:docket_number],
        :style => meta[:style]
      })
    end

  end
end
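
The docket lookup in `cases_with_opinions_on_day` has two parts that can be tried without touching the network: building the query parameters and pulling docket numbers out of link targets. A minimal sketch of both, reusing `CASE_LINK_RE` and the parameter names from the file above; the helper names `docket_query` and `docket_numbers` and the sample hrefs are hypothetical:

```ruby
require 'date'

# Same pattern as CASE_LINK_RE in tames_scraper.rb.
CASE_LINK_RE = %r{Case\.aspx\?cn=(\d\d-\d\d-\d\d\d\d\d-..)}

# Build the query parameters sent to Docket.aspx for one court and one day.
def docket_query(court_number, day)
  {
    :coa => format('coa%02d', court_number),
    :FullDate => day.strftime('%m/%d/%Y'),
    :p => 1
  }
end

# Extract unique docket numbers from a list of hrefs, skipping other links.
def docket_numbers(hrefs)
  hrefs.map { |href| CASE_LINK_RE.match(href) }.compact.map { |m| m[1] }.uniq
end

# Made-up hrefs in the shape the pattern expects (not taken from a real page).
sample_hrefs = [
  'Case.aspx?cn=01-11-01033-CV',
  'Case.aspx?cn=01-11-01033-CV',
  'Docket.aspx?coa=coa01',
  'Case.aspx?cn=01-11-00729-CR'
]

p docket_query(1, Date.new(2013, 1, 10))
p docket_numbers(sample_hrefs)
```

The `uniq` call matters because a docket page may link to the same case more than once; without it the same case page would be fetched repeatedly.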
9 changes: 0 additions & 9 deletions spec/cloud_uploader_spec.rb

This file was deleted.

851 changes: 851 additions & 0 deletions spec/fixtures/case-01-11-01033-CV

Large diffs are not rendered by default.

775 changes: 775 additions & 0 deletions spec/fixtures/case-01-12-00584-CV

Large diffs are not rendered by default.

589 changes: 589 additions & 0 deletions spec/fixtures/case-01-12-01013-CV

Large diffs are not rendered by default.

635 changes: 635 additions & 0 deletions spec/fixtures/case-01-12-01022-CV

Large diffs are not rendered by default.

487 changes: 487 additions & 0 deletions spec/fixtures/day-1-20130110

Large diffs are not rendered by default.

17 changes: 17 additions & 0 deletions spec/fixtures/day-1-20130110.yml
@@ -0,0 +1,17 @@
---
- 01-11-00488-CV
- 01-11-00827-CV
- 01-11-01033-CV
- 01-12-00211-CV
- 01-12-00491-CV
- 01-12-00607-CV
- 01-12-00951-CV
- 01-12-01021-CV
- 01-12-01056-CV
- 01-12-01078-CV
- 01-11-00729-CR
- 01-11-00977-CR
- 01-11-00978-CR
- 01-11-00979-CR
- 01-12-00429-CR
- 01-12-00689-CR
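
This fixture lists the docket numbers expected for court 1 on 2013-01-10. A minimal sketch of how a spec might compare scraper output against such a list without depending on order; the `scraped` array is a hypothetical stand-in for the result of `cases_with_opinions_on_day`, and only three of the fixture entries are reproduced:

```ruby
require 'yaml'

# A few entries in the same format as day-1-20130110.yml.
expected = YAML.safe_load(<<~YAML)
  ---
  - 01-11-00488-CV
  - 01-11-00827-CV
  - 01-11-00729-CR
YAML

# Hypothetical scraper output: same cases, different order.
scraped = ['01-11-00729-CR', '01-11-00488-CV', '01-11-00827-CV']

# Array difference reports what is absent on either side.
missing = expected - scraped
unexpected = scraped - expected

puts "missing: #{missing.inspect}"
puts "unexpected: #{unexpected.inspect}"
```

Reporting the two differences separately tends to give a more useful failure message than a single equality check on sorted arrays.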