✨ add command sync.catalog for prefetching catalog from R2 to local
Marigold committed Apr 16, 2024
1 parent 01dba80 commit 960b51c
Showing 1 changed file with 9 additions and 0 deletions.
Makefile: 9 additions, 0 deletions
@@ -23,6 +23,7 @@ help:
 	@echo ' make format-all Format code (including modules in lib/)'
 	@echo ' make full Fetch all data and run full transformations'
 	@echo ' make grapher Publish supported datasets to Grapher'
+	@echo ' make sync.catalog Sync catalog from R2 into local data/ folder'
 	@echo ' make lab Start a Jupyter Lab server'
 	@echo ' make publish Publish the generated catalog to S3'
 	@echo ' make api Start the ETL API on port 8081'
@@ -118,6 +119,14 @@ prune: .venv
 	@echo '==> Prune datasets with no recipe from catalog'
 	poetry run etl d prune
 
+# Syncing catalog is useful if you want to avoid rebuilding it locally from scratch
+# which could take a few hours. This will download ~10gb from the main channels
+# (meadow, garden, open_numbers) and is especially useful when we increase ETL_EPOCH
+# or update regions.
+sync.catalog: .venv
+	@echo '==> Sync catalog from R2 into local data/ folder (~10gb)'
+	rclone sync owid-r2:owid-catalog/ data/ --verbose --fast-list --transfers=64 --checkers=64 --include "/meadow/**" --include "/garden/**" --include "/open_numbers/**"
+
 grapher: .venv
 	@echo '==> Running full etl with grapher upsert'
 	poetry run etl run --grapher
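A minimal sketch, not part of this commit: it can help to preview what `make sync.catalog` will do before running it. `rclone sync` makes the destination match the source, so it also deletes local files under the included paths (meadow, garden, open_numbers) that are not on R2. Files outside the include filters are left alone unless `--delete-excluded` is passed. The target name `sync.catalog.preview` and the `CATALOG_FILTERS` variable below are hypothetical. The sketch assumes an rclone remote named `owid-r2` is already configured (for example via `rclone config`), as the original target does.

```makefile
# Hypothetical helper (not in the commit): reuse the same include filters as sync.catalog.
CATALOG_FILTERS = --include "/meadow/**" --include "/garden/**" --include "/open_numbers/**"

# Hypothetical target: inspect the sync without modifying data/.
sync.catalog.preview:
	@echo '==> Size of the included catalog channels on R2'
	rclone size owid-r2:owid-catalog/ $(CATALOG_FILTERS)
	@echo '==> Dry run: files sync.catalog would copy or delete locally'
	rclone sync owid-r2:owid-catalog/ data/ --dry-run --fast-list $(CATALOG_FILTERS)
```

Running `make sync.catalog.preview` prints the object count and total size of the included channels, which is useful for checking the ~10gb estimate. The `--dry-run` pass then logs the transfers and deletions without performing them. Recipe lines must be indented with a tab, as in any Makefile.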