
refactors upload to R2 to handle via chunks, reformats manifest #56

Merged: 4 commits, Apr 1, 2024

@jowparks (Contributor) commented Mar 29, 2024

Changes

The uploader now writes block data to R2 in chunks. The process is as follows:

  1. Begins by reading manifest.json from the remote URL. If it is not present, archival starts from the first block. If it is present and the last chunk was finalized, the upload resumes at the block after that chunk's end. If the last chunk was not finalized, the upload restarts at that chunk's first block and overwrites the outdated chunk.
  2. Once the chunk reaches 50 MB, it is closed. A separate blocks.byteRanges.csv file is created that records the byte range of each block.
  3. If 50 MB is not reached within 24 hours, the chunk is uploaded anyway, but it is not finalized.
  4. The blocks binary file is not compressed, since compression gains very little. blocks.byteRanges.csv is gzip-compressed.
  5. manifest.json is updated with an entry for the new chunk. If the last chunk recorded in the manifest was not finalized, its entry is overwritten by the new upload. Otherwise the new entry is appended to the end of manifest.json.
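Steps 2 and 4 can be sketched as follows. This is a minimal TypeScript illustration using Node's `zlib` module, not the code from this PR. The helper names (`SerializedBlock`, `buildByteRangesCsv`, `gzipByteRanges`, `readByteRanges`) are hypothetical. The CSV layout is also an assumption, because the PR does not specify the columns. The sketch uses `sequence,start,end`, with byte offsets into the chunk's `blocks` file, where start is inclusive and end is exclusive.

```typescript
import { gunzipSync, gzipSync } from "zlib";

// Hypothetical: one serialized block as appended to a chunk's binary file.
interface SerializedBlock {
  sequence: number;
  bytes: Uint8Array;
}

// Builds the byte-ranges index for one chunk.
// ASSUMED layout: one "sequence,start,end" row per block, where start is
// inclusive and end is exclusive, both offsets into the chunk's binary file.
function buildByteRangesCsv(blocks: SerializedBlock[]): string {
  const rows: string[] = [];
  let offset = 0;
  for (const block of blocks) {
    const start = offset;
    offset += block.bytes.length;
    rows.push(`${block.sequence},${start},${offset}`);
  }
  return rows.join("\n") + "\n";
}

// Only the index is gzipped; the blocks binary is uploaded as-is.
function gzipByteRanges(csv: string): Buffer {
  return gzipSync(Buffer.from(csv, "utf8"));
}

// Reader side: decompress a downloaded blocks.byteRanges.csv.gz.
function readByteRanges(gz: Buffer): string {
  return gunzipSync(gz).toString("utf8");
}
```

One plausible benefit of this split is that the uncompressed binary can be sliced directly with the offsets from the index. Only the small index file goes through gzip.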

As a result, a new snapshot is created at least once every 24 hours (whether or not the 50 MB limit is reached), and manifest.json references the correct folder paths.
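The 50 MB / 24-hour rule reduces to a small decision function. The sketch below rests on assumptions. `decideFlush`, `FlushDecision`, and the constants are hypothetical names. "50MB" is read here as 50 MiB (50 × 1024 × 1024 bytes), although the actual uploader may count bytes differently.

```typescript
// Thresholds from the PR description.
// ASSUMPTION: "50MB" interpreted as 50 MiB.
const CHUNK_SIZE_BYTES = 50 * 1024 * 1024;
const MAX_HOURS_BETWEEN_UPLOADS = 24;

type FlushDecision =
  | { upload: false }
  | { upload: true; finalized: boolean };

// Hypothetical helper: decides whether the chunk being written should be uploaded now.
function decideFlush(bytesWritten: number, hoursSinceLastUpload: number): FlushDecision {
  if (bytesWritten >= CHUNK_SIZE_BYTES) {
    // Size limit reached: the chunk is complete and will not be rewritten.
    return { upload: true, finalized: true };
  }
  if (hoursSinceLastUpload >= MAX_HOURS_BETWEEN_UPLOADS) {
    // Time limit reached first: upload a snapshot but leave it non-finalized,
    // so the next upload overwrites it.
    return { upload: true, finalized: false };
  }
  // Neither limit reached: keep appending blocks to the current chunk.
  return { upload: false };
}
```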

NOTE: The bucket lifecycle policy retains finalized files indefinitely. Non-finalized files have a 7-day retention, since they will be overwritten.

General implementation notes:

  • manifest.json now uses the new format shown below.
  • Updates tests
  • Updates example/scripts/download-blocks.ts to reflect the new structure
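With the chunked layout, a downloader such as example/scripts/download-blocks.ts first has to find which chunk holds a given block. Here is a minimal sketch, assuming three things about the chunk entries, as in the manifest.json example: they are sorted by `range.start`, they are contiguous, and their ranges are inclusive. `findChunk` is a hypothetical helper, not necessarily how the updated script does it.

```typescript
// Shape of one manifest.json chunk entry, following the example in this PR.
interface ManifestChunk {
  blocks: string;
  byteRangesFile: string;
  timestamp: number;
  range: { start: number; end: number }; // inclusive block sequences
  finalized: boolean;
}

interface Manifest {
  chunks: ManifestChunk[];
}

// Hypothetical helper: binary-searches the sorted chunk list for the chunk
// whose range contains `sequence`. Returns undefined if no chunk does.
function findChunk(manifest: Manifest, sequence: number): ManifestChunk | undefined {
  let lo = 0;
  let hi = manifest.chunks.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const { start, end } = manifest.chunks[mid].range;
    if (sequence < start) hi = mid - 1;
    else if (sequence > end) lo = mid + 1;
    else return manifest.chunks[mid];
  }
  return undefined;
}
```

The matching entry's `blocks` and `byteRangesFile` paths then identify which objects to fetch for that block.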

manifest.json

{
  "chunks": [
    {
      "blocks": "finalized/1711996484491/blocks",
      "byteRangesFile": "finalized/1711996484491/blocks.byteRanges.csv.gz",
      "timestamp": 1711996484491,
      "range": {
        "start": 1,
        "end": 104533
      },
      "finalized": true
    },
    {
      "blocks": "finalized/1711996494758/blocks",
      "byteRangesFile": "finalized/1711996494758/blocks.byteRanges.csv.gz",
      "timestamp": 1711996494758,
      "range": {
        "start": 104534,
        "end": 208613
      },
      "finalized": true
    },
    {
      "blocks": "finalized/1711996505056/blocks",
      "byteRangesFile": "finalized/1711996505056/blocks.byteRanges.csv.gz",
      "timestamp": 1711996505056,
      "range": {
        "start": 208614,
        "end": 312360
      },
      "finalized": true
    },
    {
      "blocks": "finalized/1711996514952/blocks",
      "byteRangesFile": "finalized/1711996514952/blocks.byteRanges.csv.gz",
      "timestamp": 1711996514952,
      "range": {
        "start": 312361,
        "end": 422949
      },
      "finalized": true
    }
  ]
}
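Steps 1 and 5 both operate on the last entry of this manifest. The sketch below illustrates that logic. `nextUploadStart` and `upsertChunk` are hypothetical names, and `null` stands in for a missing manifest.json. Applied to the manifest above, `nextUploadStart` returns 422950. That matches the "Creating new upload, beginning at block 422950" line in the test logs below.

```typescript
interface ChunkEntry {
  blocks: string;
  byteRangesFile: string;
  timestamp: number;
  range: { start: number; end: number }; // inclusive block sequences
  finalized: boolean;
}

interface Manifest {
  chunks: ChunkEntry[];
}

// Step 1: the block sequence the next upload starts at.
function nextUploadStart(manifest: Manifest | null): number {
  if (!manifest || manifest.chunks.length === 0) return 1; // start from the beginning
  const last = manifest.chunks[manifest.chunks.length - 1];
  return last.finalized
    ? last.range.end + 1 // resume right after the finalized chunk
    : last.range.start; // redo the non-finalized chunk from its first block
}

// Step 5: record a newly uploaded chunk without mutating the input manifest.
function upsertChunk(manifest: Manifest | null, entry: ChunkEntry): Manifest {
  const chunks = manifest ? [...manifest.chunks] : [];
  const last = chunks[chunks.length - 1];
  if (last && !last.finalized) {
    chunks[chunks.length - 1] = entry; // overwrite the non-finalized entry
  } else {
    chunks.push(entry); // append after a finalized entry (or to an empty list)
  }
  return { chunks };
}
```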

Testing

Screenshots and logs from testing:

Building block cache... - undefined:41
Starting uploader... - undefined:45
Server is running on port 3000
No manifest.json, starting upload from beginning...
Creating new upload, beginning at block 1... - undefined:76
Chunk size reached, finishing file creation... - undefined:113
New file upload created, size 50.0004 MB, blocks: 104532 - undefined:126
Upload: begin... - undefined:80
Upload: binary file complete: finalized/1711996484491/blocks - undefined:82
Gzipping file complete: blocks.byteRanges.csv.gz - undefined:194
Upload: bytes range file complete: finalized/1711996484491/blocks.byteRanges.csv.gz - undefined:85
Upload: updating manifest json file complete: manifest.json - undefined:88
Creating new upload, beginning at block 104534... - undefined:76
Chunk size reached, finishing file creation... - undefined:113
New file upload created, size 50.0001 MB, blocks: 208612 - undefined:126
Upload: begin... - undefined:80
Upload: binary file complete: finalized/1711996494758/blocks - undefined:82
Gzipping file complete: blocks.byteRanges.csv.gz - undefined:194
Upload: bytes range file complete: finalized/1711996494758/blocks.byteRanges.csv.gz - undefined:85
Upload: updating manifest json file complete: manifest.json - undefined:88
Creating new upload, beginning at block 208614... - undefined:76
Chunk size reached, finishing file creation... - undefined:113
New file upload created, size 50.0009 MB, blocks: 312359 - undefined:126
Upload: begin... - undefined:80
Upload: binary file complete: finalized/1711996505056/blocks - undefined:82
Gzipping file complete: blocks.byteRanges.csv.gz - undefined:194
Upload: bytes range file complete: finalized/1711996505056/blocks.byteRanges.csv.gz - undefined:85
Upload: updating manifest json file complete: manifest.json - undefined:88
Creating new upload, beginning at block 312361... - undefined:76
Chunk size reached, finishing file creation... - undefined:113
New file upload created, size 50.0001 MB, blocks: 422948 - undefined:126
Upload: begin... - undefined:80
Upload: binary file complete: finalized/1711996514952/blocks - undefined:82
Gzipping file complete: blocks.byteRanges.csv.gz - undefined:194
Upload: bytes range file complete: finalized/1711996514952/blocks.byteRanges.csv.gz - undefined:85
Upload: updating manifest json file complete: manifest.json - undefined:88
Creating new upload, beginning at block 422950... - undefined:76
15.5464/50.0000 MB written, sequence: 457752, hours since last upload: 0.00/24, waiting for next block... - undefined:104
[Screenshots: 2024-04-01 at 11:37:54 AM and 11:35:44 AM]

Fixes IFL-2421

@jowparks marked this pull request as ready for review April 1, 2024 18:40
@jowparks merged commit ff57392 into main Apr 1, 2024 (1 check passed)
@dguenther deleted the chunk-lightblock-upload branch April 29, 2024 20:45