laws-africa/gazettemachine-scrapers

Gazette Machine Scrapers

These are Scrapy spiders for Gazette Machine. They run on Zyte, and the scraped URLs are written to S3, from where Gazette Machine pulls them in.

Development

To develop locally:

  1. clone this repo
  2. set up a virtualenv: python3 -m venv env
  3. activate: source env/bin/activate
  4. install dependencies: pip install -r requirements.txt

Deploying

To deploy:

  1. Install the Scrapinghub command-line client: pip install shub
  2. Run shub deploy
  3. In Zyte, configure the spider's AWS and output settings, similar to the other spiders:
     • AWS_ACCESS_KEY_ID: from AWS
     • AWS_SECRET_ACCESS_KEY: from AWS
     • FEED_FORMAT: csv
     • FEED_URI: s3://lawsafrica-gazettes-incoming/dropbox/<code>.csv
  4. If the new spider should run daily, update settings.GM['SCRAPINGHUB_SPIDERS'] in gazettemachine to include it.
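The AWS and output settings listed above follow the same pattern for every spider, so they can be generated rather than typed by hand. The following is a minimal sketch, not part of this repo: the helper name spider_settings is hypothetical, it assumes <code> is a short per-spider identifier, and it assumes the AWS credentials are available as environment variables.

```python
import os

# Bucket and prefix taken from the FEED_URI pattern in this README.
FEED_URI_PREFIX = "s3://lawsafrica-gazettes-incoming/dropbox"


def spider_settings(code, env=None):
    """Build the per-spider settings described above (hypothetical helper).

    `code` stands in for the <code> placeholder in FEED_URI; what it should
    be for a given spider is an assumption, so copy it from an existing spider.
    """
    env = os.environ if env is None else env
    if not code or "/" in code:
        raise ValueError("code must be a non-empty name without slashes")
    return {
        # Assumption: credentials are read from the environment, not hard-coded.
        "AWS_ACCESS_KEY_ID": env.get("AWS_ACCESS_KEY_ID", ""),
        "AWS_SECRET_ACCESS_KEY": env.get("AWS_SECRET_ACCESS_KEY", ""),
        "FEED_FORMAT": "csv",
        "FEED_URI": f"{FEED_URI_PREFIX}/{code}.csv",
    }


if __name__ == "__main__":
    # "xx" is a made-up code used only for illustration.
    for key, value in spider_settings("xx").items():
        # Avoid printing the secret key to the terminal.
        shown = "***" if key == "AWS_SECRET_ACCESS_KEY" and value else value
        print(f"{key}={shown}")
```

The printed values can then be entered in the spider's settings in Zyte; comparing them against an existing spider's settings is a quick way to check the <code> value.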
