This repository has been archived by the owner on May 5, 2021. It is now read-only.

mle2718/scrapers


This is a small collection of scripts that scrape the web. They have not been tested since 2019. I archived this repository on May 5, 2021.

GARFO Scraper

The Greater Atlantic Regional Fisheries Office (GARFO) hosts a set of quota monitoring pages. These pages are updated every week, but the old versions are not archived online. This collection of code parses the GARFO quota monitoring tables and stores the data they contain. The layout of the tables (the column headings in particular) varies slightly by Fishery Management Plan (FMP), so slightly different code is often required for each.
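As a rough illustration of the parsing step described above, the following Python sketch reads an HTML table and normalizes its column headings so that tables with slightly different headings can be stored in a consistent form. It is not taken from the repository's R scripts. The class and function names, the sample table, and its values are all hypothetical, and the real GARFO pages may be structured differently.

```python
import re
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect table rows as lists of cell strings (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # row being built, or None outside <tr>
        self._cell = None  # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def normalize_header(name):
    """Lower-case a heading and turn runs of other characters into '_'."""
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")


def parse_table(html):
    """Return one dict per data row, keyed by the normalized headings."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header = [normalize_header(h) for h in parser.rows[0]]
    return [dict(zip(header, row)) for row in parser.rows[1:]]


# Made-up sample table; the headings and values are illustrative only.
SAMPLE = """
<table>
  <tr><th>Stock</th><th>Cumulative Catch (mt)</th><th>Percent of Quota</th></tr>
  <tr><td>Example stock A</td><td>12.5</td><td>40</td></tr>
</table>
"""

records = parse_table(SAMPLE)
print(records)
```

Normalizing the headings before storing the rows is one way to cope with headings that differ slightly between FMPs. A table whose layout differs more substantially would still need its own handling.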

R scripts

These R scripts should run after very minor changes to the directory paths.

readin_sectors_from_web.R is an R script to download and parse the Sector Summary HTML tables.

readin_commonpool_from_web.R is an R script to download and parse the Common Pool Summary.

readin_others_from_web.R is an R script to download and parse the herring, haddock catch cap, RHS_mackerel, and RHS_herring HTML tables.

readin_mid_species_from_web.R is an R script to download and parse some of the Mid-Atlantic tables: Bluefish, Black Sea Bass, Fluke, Dogfish, and Scup. These tables are stored differently from the groundfish and RH tables.

Stata

  • batch_download_quota_monitoring.do is a Stata .do file that calls the scripts above. It makes some simple exploratory graphs and copies the data and graphs to a shared drive where people can see them. You'll need Stata to run this file.

A Federal Register Scraper

federal_register_scraper.do uses the Federal Register API to download Federal Register documents that match search terms.
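For readers unfamiliar with the API, the following Python sketch shows what a search-term query might look like. It is not a translation of the .do file. The endpoint and the `conditions[term]`, `per_page`, and `page` parameters reflect the public Federal Register API as I understand it and should be checked against its documentation; the function names and the search term are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the public Federal Register API (verify before use).
API_BASE = "https://www.federalregister.gov/api/v1/documents.json"


def build_search_url(term, per_page=20, page=1):
    """Build a document-search URL for one search term (hypothetical helper)."""
    params = {"conditions[term]": term, "per_page": per_page, "page": page}
    return f"{API_BASE}?{urlencode(params)}"


def fetch_documents(url):
    """Download one page of results and return the parsed JSON.

    Requires network access; the layout of the response is not assumed here.
    """
    with urlopen(url) as response:
        return json.load(response)


# Illustrative search term only.
url = build_search_url("groundfish sector")
print(url)
```

Keeping the URL construction separate from the download makes it possible to inspect or log the query without touching the network.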

