
vryazanov/scrapy-node-runner


UNDER DEVELOPMENT

node-runner

Node runner is a Scrapy command for managing Scrapy spiders through an HTTP API.

This tool aims to:

  1. easily launch multiple Scrapy spiders on the node
  2. expose an API which can be used by an external scheduler
  3. support graceful shutdown of processes to ensure data integrity and minimal disruption
  4. synchronize its configuration with ZooKeeper, so that an external scheduler can access it and integrate with the node

This command is intended to be used with the scrapy-node-operator component, which is currently under development.

How to test locally

  1. start docker compose
  2. install dependencies with poetry install
  3. go into the Scrapy project with cd example
  4. start the Scrapy node with scrapy node
  5. send the JSON body {"id": "uniq-id-1", "spider": "quotes"} to http://localhost:8000/start
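
The last step can be sketched in Python using only the standard library. This is a minimal sketch, not part of the project: the README does not state which HTTP method or headers the endpoint expects, so the use of POST and the Content-Type header are assumptions, and the helper names build_start_request and start_spider are hypothetical.

```python
import json
import urllib.request

# URL taken from the steps above; adjust if the node listens elsewhere.
NODE_URL = "http://localhost:8000/start"


def build_start_request(job_id, spider, url=NODE_URL):
    """Build the request that asks the node to start a spider.

    Assumption: the endpoint accepts a POST with a JSON body.
    The README only gives the payload and the URL.
    """
    payload = json.dumps({"id": job_id, "spider": spider}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def start_spider(job_id, spider, url=NODE_URL, timeout=10.0):
    """Send the start request and return (status code, response body)."""
    request = build_start_request(job_id, spider, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")
```

With the node running locally, calling start_spider("uniq-id-1", "quotes") sends the payload from step 5. The shape of the response is not documented here, so the sketch returns the raw body unchanged.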

Note: This document is subject to further updates.
