
feat: bundle and tag all related blobs #553

Open
wants to merge 5 commits into main

Conversation

islamaliev (Contributor)

Resolves recallnet/entanglement#25

After a blob is uploaded and entangled, a new hash sequence is created that bundles all related blob hashes: the original blob, the metadata blob, and the parity blobs.
Instead of returning the original blob's hash, we return the hash of the hash sequence.

During the upload process, all blobs are assigned auto tags, which are deleted at the end of the upload. The hash sequence is assigned a "temp-{hash_seq_hash}" tag that will eventually be replaced with "stored-{hash_seq_hash}" by the validator.
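For illustration, here is a minimal sketch of that flow. The blob-store calls are abstracted behind a hypothetical `BlobStore` trait (the real code goes through the iroh client and the entangler, whose APIs are not shown here); this is not the PR's `tag_entangled_data` implementation, just the described steps in code form.

```rust
// Sketch only: `BlobStore` is a hypothetical stand-in for the iroh/entangler
// calls used in this PR; names and signatures are illustrative.
use anyhow::Result;

trait BlobStore {
    /// Store a hash sequence built from `hashes` and return its hash.
    fn put_hash_seq(&self, hashes: &[String]) -> Result<String>;
    /// Attach a named tag to `hash` so it is protected from garbage collection.
    fn set_tag(&self, name: &str, hash: &str) -> Result<()>;
    /// Drop the per-blob auto tags created while the blobs were uploaded.
    fn delete_auto_tags(&self, hashes: &[String]) -> Result<()>;
}

/// Bundle the original, metadata, and parity blob hashes into one hash
/// sequence, tag it "temp-{hash_seq_hash}", and return that hash as the
/// object's hash.
fn bundle_and_tag<S: BlobStore>(
    store: &S,
    original: String,
    metadata: String,
    parity: Vec<String>,
) -> Result<String> {
    let mut hashes = vec![original, metadata];
    hashes.extend(parity);

    let hash_seq_hash = store.put_hash_seq(&hashes)?;

    // The "temp-" tag keeps the whole bundle alive until the validator
    // confirms storage and renames it to "stored-{hash_seq_hash}".
    store.set_tag(&format!("temp-{hash_seq_hash}"), &hash_seq_hash)?;

    // Once the bundle itself is tagged, the per-blob auto tags can go.
    store.delete_auto_tags(&hashes)?;

    Ok(hash_seq_hash)
}
```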

@islamaliev islamaliev self-assigned this Feb 28, 2025
islamaliev (Contributor, Author) commented Feb 28, 2025

here are some console outputs after uploading a blob:

$ iroh blobs list blobs
 pbqupxj2nul4pzli7sy6wuudhl2q44l77p5esbvt6pmezqi4ayaq (256.00 KiB) <- parity blob 1
 ps774yacwmkbhwmwaqdtmuru5sf5apco5uiplyfa2tiymwxkizka (256.00 KiB) <- parity blob 2
 qgxlv6zva5vgobqkfit6i6unbznxz7g7hjsiv2wjqlcpemr6tqka (256.00 KiB) <- parity blob 3
 uga6ypqnqyzpoozzuiznadi5f764dteh54tcekqyo3e6jxfgx2ra (160 B) <- hash_seq
 xmcorssflnxtuiocgbecra7xjpo2hvhy4irdb3nd6cu6jxk72g6q (328 B) <- metadata
 ytwgmhbmhfbrl7vpxhclxmwjbv3nkyesj4oxcfwnrr2f5o5ok3aa (255.01 KiB) <- original blob
$ iroh tags list
"stored-uga6ypqnqyzpoozzuiznadi5f764dteh54tcekqyo3e6jxfgx2ra": uga6ypqnqyzpoozzuiznadi5f764dteh54tcekqyo3e6jxfgx2ra (Raw)
$ recall bu query --address 0xff0000000000000000000000000000000000007f 
{
  "objects": [
    {
      "key": "cargo",
      "value": {
        "hash": "uga6ypqnqyzpoozzuiznadi5f764dteh54tcekqyo3e6jxfgx2ra",
        "size": 261128,
        "metadata": {
          "content-type": "application/octet-stream"
        }
      }
    }
  ],
  "common_prefixes": [],
  "next_key": null
}
$ recall bu  get --address 0xff0000000000000000000000000000000000007f cargo > bla.toml
✨  Downloaded object in 0 seconds (hash=vk7api6jjzykojjy6fgyj3phbvo6lfz37z7izfortszssmyo23la; size=261128)
$ ll Cargo.lock 
-rw-r--r--  1 islam  staff   255K Feb 24 17:22 Cargo.lock

sanderpick (Contributor) left a comment

This looks on the right track... I left some clarifying questions.

@@ -442,16 +442,76 @@ async fn handle_object_upload(
})
})?;

let hash_seq_hash = tag_entangled_data(&iroh, &ent, &metadata_hash)
sanderpick (Contributor)

Shouldn't the entangler tag its own data? I.e., it could return some list of temp tags (UUID or similar) instead of relying on the auto tags and having to scan all tags below. I think we need to avoid any design that does a full scan of tags.
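One possible shape for that suggestion (illustrative only; these names are not the entangler crate's actual API): the entangler returns the hash-sequence hash together with the temp tags it created, so the uploader never needs to list all tags.

```rust
// Hypothetical return type for the entangle step, as suggested above.
// The entangler would create and own its temp tags, then hand them back
// so the caller can delete or upgrade them without scanning the tag list.
struct EntangleOutcome {
    /// Hash of the hash sequence bundling original, metadata, and parity blobs.
    hash_seq_hash: String,
    /// Temp tag names (UUID-based or similar) protecting every blob the
    /// entangler created during this upload.
    temp_tags: Vec<String>,
}
```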

.try_filter_map(|tag| {
    let cloned_hashes = hashes.clone();
    async move {
        if cloned_hashes.contains(&tag.hash) {
sanderpick (Contributor)

what is the format of these "temp" tags?

islamaliev (Contributor, Author)

these are auto tags generated by iroh

@@ -681,6 +731,72 @@ async fn handle_object_download<F: QueryClient + Send + Sync>(
}
}

async fn extract_blob_hash_and_size(
sanderpick (Contributor)

Are there any corresponding changes we need to make to the SDK to accommodate this? Maybe not, just double checking.

islamaliev (Contributor, Author) commented Mar 3, 2025

It depends... I showed some terminal outputs above. There is the blob (or object) that is uploaded, but the hash that is displayed is the hash of the wrapping hash sequence. The size, though, is still that of the single blob, not of the whole hash-sequence bundle with parity blobs.

I'm not sure users should need to know the internal structure of the blobs. Let me know if it's fine like this; otherwise I can add whatever we need.
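For reference, this is the shape implied by the `recall bu query` output above (an illustrative struct, not the SDK's actual type): `hash` is the hash-sequence hash, while `size` remains the original blob's size.

```rust
// Illustrative only: mirrors the object value shown in the query output,
// where "hash" = uga6y... (the hash sequence) and "size" = 261128 (the
// original blob alone, excluding metadata and parity blobs).
struct ObjectValue {
    hash: String,         // hash of the wrapping hash sequence
    size: u64,            // size of the original blob only
    content_type: String, // e.g. "application/octet-stream"
}
```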

Successfully merging this pull request may close these issues:
Use Iroh tagging for all hashes