-
Notifications
You must be signed in to change notification settings - Fork 292
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Namespaced Merkle Tree #22
Comments
In the prototype, the namespaced Merkle tree was implemented by using a standard Merkle tree library, but with a different hash function. In practice I have found this to result in the need for hacks, particularly because I had to make the hash function stateful so that it knows when its hashing an erasure code parity share. This is because in the NMT all the erasure code parity shares should go in a reserved namespaced at the very end of the NMT (all bits set to 1). However, the hash function cannot determine by itself if a piece of data is a parity share or not, because parity shares in theory look like random data. It's worth thinking about the cleanest way to implement the NMT. For example, the first byte of each share could be a prefix that signifies if the share is a message share or a parity share. |
I was wondering if we couldn't make the hash function understand from (some prefix of) the data if it is hashing an erasure code parity share or not. It's still a bit hacky but then the hasher wouldn't need to be stateful. Didn't think this through yet. But this sounds very similar:
|
Revisiting this issue with the evolution of the spec in mind:
We reserve a namespace ID for parity shares as per: Hence, the simplest way is similar to the prototype but let the hash function decide via the prefix: As the underlying merkle tree, we could also use nebolouslabs' implementation, or change tendermint's implementation (which basically implements the tree of rfc6962) to accept a different hasher. The latter is probably better on the long run as we can upstream it back to tendermint. |
Thinking about it further, it might be worth to either implement our own merkle tree, or, use one that allows to plug in the underlying hasher in a slightly more fine-grained way, than the nebulousLabs tree: The tree should allow to pass in a "hasher" that enables us to hash leafs and internal nodes differently. Not only is this needed for prefixing leafs / inner nodes differently but also the prefixing with the proper namespaces. As a go interface, this could look like type TreeHasher interface {
HashLeaf(leaf []byte) []byte
HashInner(left, right []byte) []byte
} An implementation of this could take in an unmodified (plain) hasher and would need a way to extract the namespace for the value (message) that it wants to hash (for leaf nodes) and compute the n_min / n_max for inner nodes (https://github.com/lazyledger/lazyledger-specs/blob/master/specs/data_structures.md#namespace-merkle-tree). |
The namespace ID isn't in the value that's being hashed though, so the hash function still has no way to differentiate. The namespace ID is something that the hash function is responsible for computing - it's not something that's an input to the hash function.
Leaf and internal nodes are prefixed and hashed differently in the NebulousLabs library - see https://gitlab.com/NebulousLabs/merkletree/-/blob/master/tree.go#L61. |
Wait, if I submit a message for my application, aren't I supposed to submit it with the right namespace ID (which I need to know upfront)?
I know, I didn't mention nebulousLabs in that sentence but:
|
Yes but the namespace ID should be part of the transaction data. We assume a function ns(m) that takes as input a message and returns its namespace ID. The hash function being able to compute the namespaces from the leaf values is important to ensure than the tree being ordered correctly is consensus-critical. I suppose we can modify parity shares in some way such that ns(m) returns the parity shares namespace, e.g. by prefixing the parity shares, which is what I think you're suggesting. However, that would make the size of the leaves in the parity shares greater than the non-parity shares, so that's something that we may need to think about.
I think you can achieve the same effect in the NebulousLabs library by simply passing the tree a custom hasher. The library prefixes leafs and nodes differently, so the hash function can act accordingly. That's what I do in the prototype. |
I think the point I was trying to make is that it could be cleaner if the merkle tree library we use (or write) would allow to pass in a Maybe, I should just integrate a first version with nebolousLabs library as you did in the prototype and one variant with as vaguely described by me. Then we can decide what makes the most sense. |
@adlerjohn: sounds like this needs to go into the Tx data in the spec, too? It's currently only part of the message data.
If the namespace ID is embedded in the message struct like currently in the spec, it would just return the |
Updated the description of the issue to reflect celestiaorg/celestia-specs#45 (comment). I'd propose to define a minimalistic interface for the NMT such that the internal actual implementation can easily be swapped out by anything that adheres to that interface (see opening comment). |
closing in favour of #58 The NMT itself has been implemented in https://github.com/lazyledger/nmt. While the API of the NMT might still undergo minimal changes / iterations (e.g. replacing the underlying merkle tree lib), it is feature complete in the sense that it can be integrated in here. |
Description
Currently tendermint uses simple merkle tree to compute "data hash" (what we currently call
availableDataRoot
in the spec).LazyLedger uses a Namespaced Merkle Tree instead. Also the leafs of that aren't just the Tx (as currently in tendermint) but everything under
Block.Data
extended through erasure erasure coding and arranged in a 2-dim Matrix. There are 2 such trees used (one for the row- and one for column-data).This issue tracks the NMT implementation only.
Copy and pasting a simple example of a namespaced merkle tree from https://arxiv.org/abs/1905.09274
Proposal
EDIT:
Integrate a tree that implements the following interface:
leaf data
Different than the example (Fig. 2) above, we hash leafs as
(i)
nsid||nsid||hash(leafPrefix||leaf)
, whereleaf = rawData
as opposed to (in Figure 2):
(ii)
nsid||nsid||hash(leafPrefix||leaf)
, whereleaf = nid || rawData
Marked this as a "good first issue", because it should not be too difficult to integrate this into tendermint (replace the current use of
SimpleHashFromByteSlices
with the NMT forBlock.Data
. Also, the main logic is just a slightly modified hasher. An implementation that could easily changed to implement aboveinterface
and only needs to update what goes into the leaf (to adhere to (i)) can be found here.The text was updated successfully, but these errors were encountered: