feat: bundle and tag all related blobs #553
base: main
Conversation
here are some console outputs after uploading a blob:

```
$ iroh blobs list blobs
pbqupxj2nul4pzli7sy6wuudhl2q44l77p5esbvt6pmezqi4ayaq (256.00 KiB) <- parity blob 1
ps774yacwmkbhwmwaqdtmuru5sf5apco5uiplyfa2tiymwxkizka (256.00 KiB) <- parity blob 2
qgxlv6zva5vgobqkfit6i6unbznxz7g7hjsiv2wjqlcpemr6tqka (256.00 KiB) <- parity blob 3
uga6ypqnqyzpoozzuiznadi5f764dteh54tcekqyo3e6jxfgx2ra (160 B) <- hash_seq
xmcorssflnxtuiocgbecra7xjpo2hvhy4irdb3nd6cu6jxk72g6q (328 B) <- metadata
ytwgmhbmhfbrl7vpxhclxmwjbv3nkyesj4oxcfwnrr2f5o5ok3aa (255.01 KiB) <- original blob

$ iroh tags list
"stored-uga6ypqnqyzpoozzuiznadi5f764dteh54tcekqyo3e6jxfgx2ra": uga6ypqnqyzpoozzuiznadi5f764dteh54tcekqyo3e6jxfgx2ra (Raw)

$ recall bu query --address 0xff0000000000000000000000000000000000007f
{
  "objects": [
    {
      "key": "cargo",
      "value": {
        "hash": "uga6ypqnqyzpoozzuiznadi5f764dteh54tcekqyo3e6jxfgx2ra",
        "size": 261128,
        "metadata": {
          "content-type": "application/octet-stream"
        }
      }
    }
  ],
  "common_prefixes": [],
  "next_key": null
}

$ recall bu get --address 0xff0000000000000000000000000000000000007f cargo > bla.toml
✨ Downloaded object in 0 seconds (hash=vk7api6jjzykojjy6fgyj3phbvo6lfz37z7izfortszssmyo23la; size=261128)

$ ll Cargo.lock
-rw-r--r-- 1 islam staff 255K Feb 24 17:22 Cargo.lock
```
This looks on the right track... I left some clarifying questions.
```
@@ -442,16 +442,76 @@ async fn handle_object_upload(
        })
    })?;

    let hash_seq_hash = tag_entangled_data(&iroh, &ent, &metadata_hash)
```
shouldn't the entangler tag its own data? i.e., it could return some list of temp tags (UUID / similar) instead of relying on the auto tags and having to scan all tags below. I think we need to avoid any design that does a full scan of tags.
```
    .try_filter_map(|tag| {
        let cloned_hashes = hashes.clone();
        async move {
            if cloned_hashes.contains(&tag.hash) {
```
what is the format of these "temp" tags?
these are auto tags generated by iroh
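The alternative suggested above (the entangler returning the tags it created, instead of the caller scanning every tag in the store) could look roughly like this. This is a sketch only: `TempTag`, `EntangleResult`, and `cleanup` are hypothetical names, not the entangler's or iroh's actual API.

```rust
// Hypothetical sketch: the entangler hands back the temp tags it created,
// so cleanup iterates only over those tags instead of filtering a full
// listing of every tag in the store.
#[derive(Debug, Clone, PartialEq)]
struct TempTag(String); // stand-in for an iroh temp-tag handle

struct EntangleResult {
    hash_seq_hash: String, // hash of the sequence bundling all related blobs
    temp_tags: Vec<TempTag>, // tags the entangler created during upload
}

// O(n) over the tags we created, not over every tag in the store.
fn cleanup(created: &EntangleResult, mut delete: impl FnMut(&TempTag)) {
    for tag in &created.temp_tags {
        delete(tag);
    }
}

fn main() {
    let result = EntangleResult {
        hash_seq_hash: "uga6ypqnqyzpoozzuiznadi5f764dteh54tcekqyo3e6jxfgx2ra".into(),
        temp_tags: vec![TempTag("auto-tag-1".into()), TempTag("auto-tag-2".into())],
    };
    let mut deleted = Vec::new();
    // `delete` here just records the tag; in real code it would call the store.
    cleanup(&result, |t| deleted.push(t.0.clone()));
    assert_eq!(deleted, vec!["auto-tag-1", "auto-tag-2"]);
    println!("cleaned up {} temp tags for {}", deleted.len(), result.hash_seq_hash);
}
```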
```
@@ -681,6 +731,72 @@ async fn handle_object_download<F: QueryClient + Send + Sync>(
    }
}

async fn extract_blob_hash_and_size(
```
are there any similar changes we need to make to the SDK to accommodate these changes? maybe not, just double checking.
It depends... I showed you some terminal outputs. There is the blob (or object) that is uploaded, but the hash that is displayed is the hash of the wrapping hash sequence. The size, though, is still the size of the single blob, not of the whole hash-sequence bundle with parity blobs.
I'm not sure users should know all the internals of the blob structure. Let me know if it's fine like this; otherwise I can add whatever we need.
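To make the hash-vs-size point concrete, here is a hypothetical response shape (the struct and field names are illustrative, not the actual SDK types), using the values from the terminal output above:

```rust
// Illustrative only: a response matching the behaviour described above --
// the returned hash is the hash sequence's, while the size is the
// original blob's alone, not the total of all bundled blobs.
struct UploadResponse {
    hash: String, // hash of the hash sequence (original + metadata + parity)
    size: u64,    // size in bytes of the original blob only
}

fn main() {
    let resp = UploadResponse {
        hash: "uga6ypqnqyzpoozzuiznadi5f764dteh54tcekqyo3e6jxfgx2ra".into(),
        size: 261_128,
    };
    // 261128 bytes is ~255.01 KiB: the original blob's size from the
    // listing, not the sum including the three 256 KiB parity blobs.
    assert_eq!(resp.size / 1024, 255);
    println!("{} ({} B)", resp.hash, resp.size);
}
```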
Resolves recallnet/entanglement#25
After uploading a blob and entangling it, a new hash sequence is created that bundles all related blob hashes: the original blob, the metadata blob, and the parity blobs.
Instead of returning the original blob's hash, we return the hash of the hash sequence.
During the upload process, all the blobs are assigned auto tags, which are deleted at the end of the upload. The hash sequence is assigned a "temp-{hash_seq_hash}" tag that will eventually be replaced with "stored-{hash_seq_hash}" by the validator.
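A minimal, self-contained sketch of that tag lifecycle. Only the `temp-`/`stored-` prefixes come from this PR; the in-memory map standing in for the node's tag store and the helper functions are illustrative:

```rust
use std::collections::HashMap;

// The prefixes below are the ones described in the PR; the rest is a sketch.
fn temp_tag(hash_seq_hash: &str) -> String {
    format!("temp-{hash_seq_hash}")
}

fn stored_tag(hash_seq_hash: &str) -> String {
    format!("stored-{hash_seq_hash}")
}

fn main() {
    let hash_seq_hash = "uga6ypqnqyzpoozzuiznadi5f764dteh54tcekqyo3e6jxfgx2ra";

    // Stand-in for the node's tag store: tag name -> hash it pins.
    let mut tags: HashMap<String, String> = HashMap::new();

    // Upload: the hash sequence gets a temp tag so it survives until validation.
    tags.insert(temp_tag(hash_seq_hash), hash_seq_hash.to_string());
    assert!(tags.contains_key(&temp_tag(hash_seq_hash)));

    // Validation: the validator promotes temp-{hash} to stored-{hash}.
    if let Some(hash) = tags.remove(&temp_tag(hash_seq_hash)) {
        tags.insert(stored_tag(hash_seq_hash), hash);
    }
    assert!(tags.contains_key(&stored_tag(hash_seq_hash)));
    assert!(!tags.contains_key(&temp_tag(hash_seq_hash)));
    println!("tag promoted to {}", stored_tag(hash_seq_hash));
}
```

After promotion the store matches the `iroh tags list` output shown earlier, where only the `stored-...` tag remains.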