Inline CIDs #2320

Open
Stebalien opened this issue Jul 8, 2020 · 5 comments
Labels
area/ux Area: UX impact/consensus Impact: Consensus kind/enhancement Kind: Enhancement need/team-input Hint: Needs Team Input

Comments

@Stebalien
Member

@Kubuxu took a histogram of block sizes:

Histogram of value sizes (in bytes)
Total count: 34788050
Min value: 1
Max value: 49371
Mean: 765.27
                   Range     Count
[         0,          2)         1
[         2,          4)         1
[         4,          8)        34
[         8,         16)    476039
[        16,         32)   2369324
[        32,         64)   7740959
[        64,        128)   3360095
[       128,        256)   3710247
[       256,        512)   6946622
[       512,       1024)   1477828
[      1024,       2048)   4482532
[      2048,       4096)   3761116
[      4096,       8192)    445006
[      8192,      16384)     18242
[     16384,      32768)         3
[     32768,      65536)         1

Given this, inlining small blocks into CIDs using the identity hash function would save at least 12% of disk space (probably more because these CIDs would often be smaller).
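
To explore how many blocks a given inlining threshold would cover, the histogram can be tabulated in a few lines of Go. This is only a sketch: the names `bucket`, `countBelow`, and `fractionBelow` are made up for illustration, and it measures the share of blocks by count, not by bytes, so it does not reproduce the 12% disk-space estimate (whose derivation isn't given here).

```go
package main

// bucket is one histogram row: values whose size falls in [Lo, Hi) bytes.
type bucket struct {
	Lo, Hi int
	Count  int64
}

// histogram copies the rows of the histogram posted above.
var histogram = []bucket{
	{0, 2, 1}, {2, 4, 1}, {4, 8, 34}, {8, 16, 476039},
	{16, 32, 2369324}, {32, 64, 7740959}, {64, 128, 3360095},
	{128, 256, 3710247}, {256, 512, 6946622}, {512, 1024, 1477828},
	{1024, 2048, 4482532}, {2048, 4096, 3761116}, {4096, 8192, 445006},
	{8192, 16384, 18242}, {16384, 32768, 3}, {32768, 65536, 1},
}

// total returns the number of values across all buckets.
func total() int64 {
	var n int64
	for _, b := range histogram {
		n += b.Count
	}
	return n
}

// countBelow returns how many values are strictly smaller than limit bytes.
// Only whole buckets are counted, so limit should sit on a bucket boundary.
func countBelow(limit int) int64 {
	var n int64
	for _, b := range histogram {
		if b.Hi <= limit {
			n += b.Count
		}
	}
	return n
}

// fractionBelow returns the share of values (by count) smaller than limit.
func fractionBelow(limit int) float64 {
	return float64(countBelow(limit)) / float64(total())
}
```

For example, `countBelow(64)` covers 10,586,358 of the 34,788,050 values, so roughly 30% of blocks by count are under 64 bytes.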

It would also save us from having to write/read all these small objects. Unfortunately, we don't have an access histogram.

Here's an auto-inlining CID builder: https://github.com/ipfs/go-cidutil/blob/master/inline.go
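
The idea behind an auto-inlining builder can be sketched with the standard library alone. The code below is not the go-cidutil API: `inlineOrHash` and `encodeMultihash` are hypothetical helpers, sha2-256 stands in for whatever hash function the chain actually uses, the inclusive `limit` is a choice made for this sketch, and only the multihash is produced (a full CID would also carry a version and codec prefix).

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
)

const (
	codeIdentity = 0x00 // multihash code for the identity "hash"
	codeSHA256   = 0x12 // multihash code for sha2-256
)

// encodeMultihash lays out <varint code><varint digest length><digest>.
func encodeMultihash(code uint64, digest []byte) []byte {
	buf := make([]byte, 2*binary.MaxVarintLen64, 2*binary.MaxVarintLen64+len(digest))
	n := binary.PutUvarint(buf, code)
	n += binary.PutUvarint(buf[n:], uint64(len(digest)))
	return append(buf[:n], digest...)
}

// inlineOrHash embeds data directly in the multihash when it is at most
// limit bytes long, and falls back to a real hash otherwise.
func inlineOrHash(data []byte, limit int) []byte {
	if len(data) <= limit {
		return encodeMultihash(codeIdentity, data)
	}
	sum := sha256.Sum256(data)
	return encodeMultihash(codeSHA256, sum[:])
}
```

With an identity multihash the block's bytes travel inside the identifier itself, so nothing has to be written to or read from the blockstore for that block.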

The tricky part is how to wire this in. Ideally, we'd expose the CID builder on the runtime and use it internally inside the CBOR store. Unfortunately, we have some objects that expose a Cid() function to create their own CID.

The best reasonable solution is to:

  1. Have some common package (e.g., the specs-actors?) export a common CIDBuilder.
  2. Have cbor.NewCborStore take a CIDBuilder in the constructor.
@vmx
Contributor

vmx commented Jul 9, 2020

This might not be the perfect place to bring it up, but it's closely related. While working on the Rust implementation of Multihash, it came up that the identity hash currently doesn't specify any size limit. I think it would make sense to specify an upper bound, both for optimization (which is why it came up in Rust) and for security.

Personally, I'd pick a fairly low limit, similar to the digest lengths of current hash functions. So perhaps something around 64 bytes?
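
As a rough illustration of what enforcing such a bound could involve, the sketch below parses a multihash (`<varint code><varint length><digest>`) and rejects identity payloads over a caller-supplied limit. The function name `checkIdentityLimit` and the idea of passing the limit as a parameter are assumptions for this example; no multihash library is known to expose exactly this check.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const codeIdentity = 0x00 // multihash code for the identity "hash"

var errMalformed = errors.New("malformed multihash")

// checkIdentityLimit returns an error if mh is malformed, or if it is an
// identity multihash whose payload is longer than maxIdentity bytes.
// Multihashes using any other hash function pass through unchanged.
func checkIdentityLimit(mh []byte, maxIdentity uint64) error {
	code, n := binary.Uvarint(mh)
	if n <= 0 {
		return errMalformed
	}
	length, m := binary.Uvarint(mh[n:])
	if m <= 0 {
		return errMalformed
	}
	// The declared length must match the bytes that actually follow.
	if uint64(len(mh)-n-m) != length {
		return errMalformed
	}
	if code == codeIdentity && length > maxIdentity {
		return fmt.Errorf("identity payload is %d bytes, limit is %d", length, maxIdentity)
	}
	return nil
}
```

Keeping the limit a parameter leaves the policy question open: the same check works for a 64-byte cap or for one tied to the network block size.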

@ribasushi
Collaborator

ribasushi commented Jul 9, 2020

(We should probably take this into a separate issue.)
@vmx there are definitely deployments out there today (e.g. peergos) using ~2k inlined CIDs. Generally, any data that you know won't ever be repeated is a good candidate for inlining. An upper limit already exists: the limit of a network block itself (1MiB soft, 2MiB-1 hard). 64 bytes is most definitely arbitrary, and I'd be very sad if we adopted that.

@vmx
Contributor

vmx commented Jul 9, 2020

I don't want to derail this issue, hence I opened multiformats/multihash#130 (I should've from the start, sorry).

@arajasek
Contributor

arajasek commented Nov 5, 2020

Closed by #2568, I think?

@Stebalien
Member Author

No. That paved the way to support this feature, but we still don't actually inline small blocks into CIDs.

@magik6k magik6k added the impact/consensus Impact: Consensus label Aug 31, 2021
@TippyFlitsUK TippyFlitsUK added kind/enhancement Kind: Enhancement need/team-input Hint: Needs Team Input area/ux Area: UX labels Mar 30, 2022
6 participants