Sketch City Hackathon Workshops: Intro to Webscraping with R

The Iron Yard | Saturday May 6 2017

Amanda Shih, Neeraj Tandon, Jeff Reichman, Meredith Maines, Randy & a packed house

Pre-Install Instructions

Download and install R.
Download and install RStudio.
Run the setup script in R: source('https://raw.githubusercontent.com/HoustonUseRs/intro-to-web-scraping/master/scripts/setup.r')

Approach | Deconstructed

Pick a page
Load a page
Look at page
Determine where info is
Select info
Copy info
New file
Paste into new file
Save new file
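
The manual steps above map fairly directly onto R. Here is a minimal sketch, assuming the rvest package is installed; the inline page, the li.dept selector, and the out_path file are hypothetical stand-ins chosen so the example runs without a network connection.

```r
library(rvest)

# Pick a page / load a page: a tiny inline page stands in for a real URL
# (hypothetical content, not the Houston site).
page <- read_html('
  <html><body>
    <h1>Departments</h1>
    <ul>
      <li class="dept">Library</li>
      <li class="dept">Parks and Recreation</li>
      <li class="dept">Public Works</li>
    </ul>
  </body></html>')

# Look at the page / determine where the info is:
# here every department sits in an <li class="dept">.
dept_nodes <- html_nodes(page, "li.dept")  # select info
dept_names <- html_text(dept_nodes)        # copy info

# New file, paste into new file, save new file.
out_path <- tempfile(fileext = ".txt")
writeLines(dept_names, out_path)

# Read the file back to check what was saved.
readLines(out_path)
```

With a real site, the string passed to read_html() would be the page URL instead of inline HTML.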

Tool Kit

R
RStudio
Chrome Dev Tools
Chrome Selector Gadget Plugin
Text Editor

Getting Started

Go to website (www.houstontx.gov/departments.html)
Using Selector Gadget, click on the type of data you want. Here we want the names of all the people, so click on one of the names: it turns green, and every other item matching the current selector turns yellow. Click on any unwanted yellow items to turn them red. Selector Gadget then works out the most succinct selector that captures the desired content. In this case, it is .table150 a:nth-child(1)
In RStudio, make a new R script (Cmd+Shift+N on Mac, Ctrl+Shift+N on Windows).
Recreate manual steps (above) in R.
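
To see why Selector Gadget narrows the selector with :nth-child(1), it can help to compare it against the plain .table150 a selector. The sketch below uses a made-up table (the names, links, and layout are hypothetical and may not match the real departments page) in which each cell holds a name link followed by an email link.

```r
library(rvest)

# Hypothetical markup: each cell has a name link, then an "Email" link.
snippet <- read_html('
  <table class="table150">
    <tr><td><a href="/dept-a.html">Ada Example</a><a href="mailto:ada@example.org">Email</a></td></tr>
    <tr><td><a href="/dept-b.html">Ben Sample</a><a href="mailto:ben@example.org">Email</a></td></tr>
  </table>')

# Every link inside .table150
all_links <- snippet %>%
  html_nodes(".table150 a") %>%
  html_text()

# Only links that are the first child of their parent element
first_links <- snippet %>%
  html_nodes(".table150 a:nth-child(1)") %>%
  html_text()

all_links    # "Ada Example" "Email" "Ben Sample" "Email"
first_links  # "Ada Example" "Ben Sample"
```

In this made-up layout, the narrower selector keeps the names and drops the second link in each cell.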

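Load the rvest package: library(rvest)

To run a line of code, press Cmd+Enter (Ctrl+Enter on Windows).
Copy the URL (http://www.houstontx.gov/departments.html).
Save the URL as a variable: depts_url <- 'http://www.houstontx.gov/departments.html'
Create a variable for the HTML: depts_html <- read_html(depts_url)
Select the links and save their href attributes as depts_emails, the variable written out in the next steps: depts_emails <- depts_html %>% html_nodes('.table150 a') %>% html_attr('href')
Open a connection to the output file: fileConn <- file("output.txt")
Write the results to the file: writeLines(depts_emails, fileConn)
Close the connection when finished: close(fileConn)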


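```r
library(rvest)  # provides read_html(), html_nodes(), html_attr() and the %>% pipe

# Save the page address, then download and parse the page
depts_url <- 'http://www.houstontx.gov/departments.html'
depts_html <- read_html(depts_url)

# Select every link inside .table150 and return each link's href attribute
depts_html %>%
  html_nodes('.table150 a') %>%
  html_attr('href')
```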
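
html_attr('href') returns each link's target, while html_text() returns the visible text, so the two can be combined to keep names and links side by side. This is a rough sketch only: the inline HTML is a hypothetical stand-in for the parsed page, and the depts data frame and out_path file are illustrative names that do not come from the workshop.

```r
library(rvest)

# Hypothetical stand-in for the parsed departments page.
depts_html <- read_html('
  <div class="table150">
    <a href="mailto:ada@example.org">Ada Example</a>
    <a href="mailto:ben@example.org">Ben Sample</a>
  </div>')

# Select the links once, then read two things from each one.
links <- html_nodes(depts_html, ".table150 a")

depts <- data.frame(
  name = html_text(links),          # visible link text
  href = html_attr(links, "href"),  # value of the href attribute
  stringsAsFactors = FALSE
)

# Save the table as a CSV file in a temporary location.
out_path <- tempfile(fileext = ".csv")
write.csv(depts, out_path, row.names = FALSE)

depts
```

Selecting the nodes once and extracting both pieces from the same set keeps each name lined up with its link.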
