This repository was archived by the owner on Jul 19, 2018. It is now read-only.

Added chunkexports extension #50

Open
wants to merge 4 commits into
base: master


gatufo (Member) commented Dec 22, 2014

An extension to break item exports into chunks.

Settings:

  • CHUNKED_FEED_URI: The feed URI to use for exporting (overrides the FEED_URI setting).
  • CHUNKED_FEED_FORMAT: The feed format to use for exporting (overrides the FEED_FORMAT setting).
  • CHUNKED_FEED_ITEMS_PER_CHUNK: The number of items included in each chunk.
  • CHUNKED_FEED_TIMESTAMP_FORMAT: The format used for the timestamp URI parameter.

Example:

    CHUNKED_FEED_URI = 'export_%(chunk_number)02d.json'
    CHUNKED_FEED_FORMAT = 'json'
    CHUNKED_FEED_ITEMS_PER_CHUNK = 100

With 250 items, this generates the following files:

  • export_01.json (100 items)
  • export_02.json (100 items)
  • export_03.json (50 items)
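
The split above can be reproduced with a few lines of plain Python. This is only a sketch of the arithmetic, not the extension's actual code: `plan_chunks` is a hypothetical helper, and it assumes the URI template is expanded with Python's `%`-style formatting, as the `%(chunk_number)02d` placeholder suggests.

```python
def plan_chunks(total_items, items_per_chunk, uri_template):
    """Return (uri, item_count) pairs for a chunked export.

    Hypothetical helper for illustration; it does not touch Scrapy.
    """
    if items_per_chunk <= 0:
        raise ValueError("items_per_chunk must be positive")
    plan = []
    chunk_number = 1  # chunk numbering starts at 1
    remaining = total_items
    while remaining > 0:
        # The last chunk holds whatever is left over.
        count = min(items_per_chunk, remaining)
        plan.append((uri_template % {"chunk_number": chunk_number}, count))
        remaining -= count
        chunk_number += 1
    return plan


plan = plan_chunks(250, 100, "export_%(chunk_number)02d.json")
for uri, count in plan:
    print(uri, count)
```

With the example settings this prints the three file names listed above with 100, 100, and 50 items.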

Available URI format values:

  • chunk_number: The active chunk counter (starts at 1).
  • scrapy_job: The Scrapy job (if available).
  • scrapy_project_id: The Scrapy project id (if available).
  • timestamp: The current timestamp in UTC (formatted with the CHUNKED_FEED_TIMESTAMP_FORMAT setting).
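
As a rough illustration of how these values could combine in a URI, the sketch below builds the parameter dictionary by hand and expands a template with `%`-style formatting. The helper `build_uri_params`, the timestamp format `%Y%m%dT%H%M%S`, the job value `"42"`, and the template path are all assumptions made for the example; the extension's real defaults and internals may differ.

```python
from datetime import datetime, timezone


def build_uri_params(chunk_number, timestamp_format,
                     scrapy_job=None, scrapy_project_id=None, now=None):
    """Collect the values available to the URI template.

    Hypothetical helper; `now` can be injected to make results repeatable.
    """
    if now is None:
        now = datetime.now(timezone.utc)  # timestamps are in UTC
    return {
        "chunk_number": chunk_number,
        "scrapy_job": scrapy_job,
        "scrapy_project_id": scrapy_project_id,
        "timestamp": now.strftime(timestamp_format),
    }


# Fixed instant so the example is deterministic.
fixed = datetime(2014, 12, 22, 10, 30, 0, tzinfo=timezone.utc)
params = build_uri_params(3, "%Y%m%dT%H%M%S", scrapy_job="42", now=fixed)

# Keys that the template does not mention are simply ignored.
uri = "exports/%(timestamp)s/items_%(chunk_number)03d.json" % params
print(uri)
```

Putting the timestamp in a directory component and the chunk number in the file name keeps all chunks of one run together, which is one plausible way to use the two values.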

redapple (Contributor) commented Nov 7, 2016

Hey @gatufo, what do you think of scrapy/scrapy#1545?

3 participants