This tool scrapes the EPA search engine, grabbing urls that match a list of queries, and combining them into a large CSV file that lists links
The purpose is to get a list of valid urls that match a given query from epa.gov
The go language is not necessary to run the tool, a compiled binary is attached
- If you're not running os x, you'll need to compile the code. First make sure you've downloaded the [Go Programming languge][golang.org/download] and run
go build
from the epa_search_urls directory. - Edit queries.csv, placing a query on each line.
- using the command line, run:
./epa_search_urls
- Wait for the tool to finish scraping.
Once finished, a series of csv files representing each page will be in the results directory, and a combined file of all results will be created at
Sort & De-Duplicate the concatinated results