Data availability fields in block header #16

Closed
liamsi opened this issue Apr 16, 2020 · 11 comments · Fixed by #17
@liamsi
Member

liamsi commented Apr 16, 2020

This issue drafts changes of a first iteration of modifying the block header to support data availability proofs.

Basically, this would be a variant of what we spec'ed out in https://github.com/LazyLedger/lazyledger-specs,
with as few changes as possible to tendermint core.

As a side note, we should track and document these decisions and changes in the repository itself. We can track them in the code directly via so-called ADRs (we used these at tendermint: https://github.com/tendermint/tendermint/tree/master/docs/architecture).

Note that the spec in its current form expects immediate execution (ref: #3), which doesn't really affect this issue beyond that the mentioned intermediate state roots included in the block of height h+1 reflect the state induced by the transactions included in the block of height h (as described in #3 (comment)).

@liamsi
Member Author

liamsi commented Apr 16, 2020

Status quo

Block and Header

Current Block and Header:
https://github.com/LazyLedger/lazyledger-core/blob/da745371227f54aa90c609845cd4cc2f36a152f1/types/block.go#L38-L44
https://github.com/LazyLedger/lazyledger-core/blob/da745371227f54aa90c609845cd4cc2f36a152f1/types/block.go#L323-L348

We can use the existing Data field for what is called availableData in the current LazyLedger spec.

Current Data type:
https://github.com/LazyLedger/lazyledger-core/blob/da745371227f54aa90c609845cd4cc2f36a152f1/types/block.go#L809-L819

Proposed changes:

Block

Changes to the Block are simple. Basically, move the Evidence field over into Data, as it needs availability too, and add the AvailableDataHeader:

type Block struct {
    Header
    AvailableDataHeader [][]byte
    Data
    LastCommit *Commit
}

Header

The Header does not need to change at all 🎉 (for now at least)

There are some fields that can be removed if we switch to immediate execution.
As this proposal tries to keep the changes as small and contained as possible,
we will keep the field names as well as some fields used in the current
tendermint (light client) verification logic. With the same reasoning we keep EvidenceHash
in the Header.

I added NOTEs where this deviates from the spec (cc @adlerjohn).

type Header struct {
    Version version.Consensus
    ChainID string
    Height  int64
    Time    time.Time

    LastBlockID BlockID

    LastCommitHash tmbytes.HexBytes
    // NOTE: becomes available Data root
    // (not renaming for now)
    DataHash tmbytes.HexBytes

    // NOTE: keep the current hashes / pointers
    // to keep the current light client logic intact
    // (will address removal in separate issues):
    ValidatorsHash     tmbytes.HexBytes       // validators for the current block
    NextValidatorsHash tmbytes.HexBytes       // validators for the next block
    ConsensusHash      tmbytes.HexBytes       // consensus params for current block
    // NOTE: different from the current spec!
    // this is not the `stateCommitment` for the
    // Tx in this block but for the Tx of the previous block:
    AppHash            tmbytes.HexBytes           // state after txs from the previous block
    // NOTE: this field is part of the deferred
    // execution model
    // (DeliverTx results on the previous blocks merkelized)
    LastResultsHash tmbytes.HexBytes

    // NOTE: this is kinda a duplicate now because the `availableDataRoot`
    // basically includes a commitment to the evidence data
    // (will address removal in a separate issue)
    EvidenceHash    tmbytes.HexBytes     // evidence included in the block
    ProposerAddress Address              // original proposer of the block
}

Data

Data will contain everything that requires data availability.
Adding fields does not cause any trouble with the current logic.
The semantics of that hash change, though (currently it is the merkle root of all fields).

Moving over evidence should be a simple change (and could be done separately
later).

The newly introduced IntermediateStateRoots can probably be computed using
the ResponseDeliverTx.Data without changing the current logic inside tendermint at all.

Note: The only thing I'm not sure about yet is how to handle the separation of
"Transactions" and "Messages" in the spec's sense. Maybe we can also include this
in ResponseDeliverTx.Data: executing a Tx returns the intermediate state root
associated with that Tx as well as the corresponding message.

Maybe we can include this in the response of CheckTx in ResponseCheckTx.Data (DeliverTx would be one off again). Note that tendermint does not know about the distinction we make between Transaction and Message. They would be sent to a node as a single (network) message / (tendermint) Tx. Somewhere these need to be split up to be encoded in the block as described in the spec, at least if we don't want to add another separate pool for messages and a separate reactor.

For the other fields in data we can hard code namespace ids.

type Data struct {
    // Txs that will be applied by state @ block.Height+1.
    // NOTE: not all txs here are valid.  We're just agreeing on the order first.
    // This means that block.AppHash does not include these txs.
    Txs Txs

    // Note: As Tendermint does not execute the
    // Tx itself (the app does), we need a way to
    // compute these intermediate state roots.
    // Simply using the `Data` field in `ResponseDeliverTx`
    // for this looks promising.
    IntermediateStateRoots []tmbytes.HexBytes

    // NOTE: moved over from Block
    Evidence   EvidenceData

    // Note: Tendermint Tx are just blobs. Tendermint core itself does
    // not understand how to interpret a Tx or its validity.
    // That is part of the App logic.
    //
    // Maybe we do not need a separate message field
    // and can leverage the generality of Tx in
    // tendermint instead?
    Messages   []Message // TODO: This needs more thought!

    // Volatile
    hash tmbytes.HexBytes
}

Message (new struct)

type Message struct {
    // TODO: figure out constraints and
    // introduce dedicated type instead of just []byte
    // namespace id is necessary as this dictates where
    // in the namespace merkle tree this will go.
    NamespaceID []byte
    Data        []byte
}

@adlerjohn
Member

This seems reasonable. The availableDataHeader (which contains a matrix of commitments to the data in RSMT2D form) looks like it's missing from the block though.

Note: The only thing I'm not sure yet is how to handle the separation of
"Transactions" and "Message" in the spec's sense.

For how they're encoded in the commitment(s), I was thinking reserve different namespace IDs for 1) transactions 2) intermediate state roots 3) evidence. Then Messages can have any other namespace ID.

For how they're handled at the networking layer, yeah you're right it looks like it'll take a bit of effort to make changes required to pass them around as a single blob then split them up. But shouldn't be unreasonably difficult (I hope!).

@liamsi
Member Author

liamsi commented Apr 17, 2020

The availableDataHeader (which contains a matrix of commitments to the data in RSMT2D form) looks like it's missing from the block though.

Ah, thanks. Good catch!

For how they're handled at the networking layer, yeah you're right it looks like it'll take a bit of effort to make changes required to pass them around as a single blob then split them up. But shouldn't be unreasonably difficult (I hope!).

Exactly, not yet sure how to handle this part. I'm confident we can use checkTx for that. Haven't thought that through yet though.

@liamsi
Member Author

liamsi commented Apr 17, 2020

Regarding the availableDataHeader: does it matter to you whether the underlying data type is []byte like in the spec or [][]byte for the row/column semantics? We should align the spec here too (but I guess that was on your radar anyway).

@adlerjohn
Member

adlerjohn commented Apr 17, 2020

does it matter to you whether the underlying data type is []byte like in the spec or [][]byte for the row/column semantics?

Is there any inherent downside to using a 2D matrix representation instead of a 1D array? I guess with the 2D matrix you'd need to specify two indices all the time. It would also be more work to Merkleize a matrix instead of an array (maybe not, if the iterator of the matrix is implemented nicely). Given that, I'd lean towards using a 1D array, and just doing pointer arithmetic to map {row, col} <-> {index}. Thoughts?

@liamsi
Member Author

liamsi commented Apr 17, 2020

It would also be more work to Merkleize a matrix instead of an array (maybe not, if the iterator of the matrix is implemented nicely)

Not really. An iterator as you mentioned is kinda equivalent to that (row, col) -> index bijection. I'm leaning towards 2-dim (for the encoding) and defining a type AvailableDataHeader that hides away that detail (in the implementation) and provides an iterator / ordering and methods (`RowRoot(i)`, `ColumnRoot(j)`, or whatever they need to be) to access them. In the end it doesn't matter too much.

@adlerjohn
Member

Fair enough. Will there be any serialization issue with matrices? If not then 2D is fine. Will be easier to reason about too.

@liamsi
Member Author

liamsi commented Apr 17, 2020

Will there be any serialization issue with matrices?

Not that I'm aware of. For binary/proto serialization this would be:

message Block {
    // [...]
    repeated bytes AvailableDataHeader = 5; // or whatever field number
}

Or we can flatten the matrix for serialization (and have it be only bytes), basically the way you described.

@liamsi
Member Author

liamsi commented Apr 17, 2020

Updated above

@adlerjohn
Member

Ugh, it looks like protobuf doesn't have native support for multidimensional arrays. So it would have to be encoded as a 1D array for serialization. It doesn't actually matter how the Go client stores this internally, so long as it serializes it properly.

@liamsi
Member Author

liamsi commented Apr 17, 2020

It doesn't actually matter how the Go client stores this internally, so long as it serializes it properly.

True, but I would still like the spec and the code to match as much as possible (where it makes sense).
