DHT usage and future work #11
Comments
This issue is great historic context. But is it still relevant?
I don't think so. DHT is still being used, but not for content discovery, so closing.
Wondertan added commits that referenced this issue (Aug 23 – Sep 20, 2022).
distractedm1nd pushed a commit to distractedm1nd/celestia-node that referenced this issue (Sep 21, 2022).
Background
The Data Availability model we use requires data discovery. We rely on IPFS's Kademlia DHT, which allows any network participant to find a host for a certain piece of data by its hash.
Usage Description
To describe how we use it, let's introduce a simple pseudo-code interface:
When a block producer creates a block, it saves it and calls Provide for every Data Availability root of the block, making the data discoverable and, afterward, available. Any other node that wants to get the block's data or validate its availability can then call FindProviders, discover the block producer, and finally access the block data through Bitswap. The block producer and the block requester also call Reprovide. Overall, with the described flow, we aim for maximum confidence that the data of any particular block is always discoverable from the peers storing it.
What's Left
The current state of the implementation does not conform to the flow above; several items remain to be done.
Pain Points
Node churn
Records of someone hosting data are stored on peers selected not by their qualities but by the simple XOR distance metric. Unfortunately, this means light clients eventually end up storing those records, and they store them unreliably, as they are not meant to be full-featured daemons. Therefore, some data may become undiscoverable for some period of time.
Solutions
Providing Time
We need to ensure that providing takes less time than the interval between two subsequent block proposals by a node. Otherwise, DHT providing cannot keep up with block production, creating an ever-growing providing queue. Unfortunately, with the standard DHT client, providing can take up to 3 minutes on a large-scale network.
From this follows a rule: the bigger the committee, the more time a node has to complete providing, since it proposes blocks less often. Naturally, the larger the network, the larger the committee and the longer the providing time, so these can scale together organically without causing issues. But if slow providing still proves to be a problem, a full-routing-table DHT client for block producers would be a solution, as it significantly reduces providing time.
Other Possible Improvements