These are the Scrapy spiders for Gazette Machine. They run on Zyte, and the scraped gazette URLs are written to S3, from where Gazette Machine pulls them in.
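As a rough illustration, each spider yields the gazette URLs it finds, and the feed export writes them out. This is only a minimal sketch: the spider name, start URL, selector and item fields below are hypothetical, not taken from an existing spider in this repo.

```python
import scrapy


class ExampleGazetteSpider(scrapy.Spider):
    # Hypothetical example: name, start URL and selectors are illustrative only.
    name = "example-gazette"
    start_urls = ["https://example.gov/gazettes"]

    def parse(self, response):
        # Yield one item per gazette link; the configured feed export writes
        # these items out (in production, as CSV to S3).
        for href in response.css("a.gazette::attr(href)").getall():
            yield {
                "url": response.urljoin(href),
                "name": href.rsplit("/", 1)[-1],
            }
```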
To develop locally:
- clone this repo
- set up a virtualenv:
python3 -m venv env
- activate:
source env/bin/activate
- install dependencies:
pip install -r requirements.txt
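With the dependencies installed, a spider can be run locally for testing using Scrapy's standard commands. The spider name below is a placeholder; `scrapy list` shows the names actually defined in this repo:

scrapy list

scrapy crawl <spider-name> -o output.csv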
To deploy:
- Install the Scrapinghub command-line client with
pip install shub
- Run
shub deploy
- In Zyte, configure the spider's AWS and output settings (listed below), similar to the other spiders.
- In gazettemachine, update
settings.GM['SCRAPINGHUB_SPIDERS']
to include the new spider, if it should be run daily.
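For reference, a hedged sketch of what that gazettemachine setting might look like. The real structure lives in gazettemachine's settings module; this sketch assumes `SCRAPINGHUB_SPIDERS` is a plain list of spider names, and the names shown are made up.

```python
# In gazettemachine's settings module (a sketch only; assumes SCRAPINGHUB_SPIDERS
# is a list of spider names, so check the real structure in gazettemachine itself).
GM = {
    # ... other Gazette Machine settings ...
    "SCRAPINGHUB_SPIDERS": [
        "existing-gazette-spider",  # made-up name of an existing spider
        "new-gazette-spider",       # add the new spider here if it should run daily
    ],
}
```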
The Zyte settings to configure for each spider are:
- AWS_ACCESS_KEY_ID: from AWS
- AWS_SECRET_ACCESS_KEY: from AWS
- FEED_FORMAT: csv
- FEED_URI: s3://lawsafrica-gazettes-incoming/dropbox/<code>.csv
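For local testing, the same settings can be approximated in the project's Scrapy settings. This is only a sketch of the mapping (on Zyte the values are entered in the spider's settings UI), and reading the credentials from environment variables is an assumption made here, not something this repo necessarily does.

```python
import os

# Sketch of the equivalent Scrapy settings; on Zyte these are configured per
# spider in the UI rather than committed to the repo.
FEED_FORMAT = "csv"
# <code> is the per-spider placeholder in the S3 path, as shown above.
FEED_URI = "s3://lawsafrica-gazettes-incoming/dropbox/<code>.csv"

# Credentials come from AWS; reading them from the environment (rather than
# hard-coding them) is an assumption made for this sketch.
AWS_ACCESS_KEY_ID = os.environ.get("AWS_ACCESS_KEY_ID")
AWS_SECRET_ACCESS_KEY = os.environ.get("AWS_SECRET_ACCESS_KEY")
```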