ADR 006: Block Propagation with Rows #434

Wondertan · 2021-06-24T20:46:13Z

Wondertan · 2021-06-24T20:47:38Z

Note to me: Add references.

liamsi

I think in order to merge this ADR, slightly more details are needed. The main purpose as discussed is to get a good overview of the required changes so we can triage this and make an informed decision if we want to do this now or later (without just yet going through the whole process of implementation, testing, reviews etc).

docs/lazy-adr/adr-006-row-propagation.md

musalbas · 2021-06-24T21:58:01Z

Why only rows, and not columns too?

liamsi · 2021-06-24T22:13:38Z

Why only rows, and not columns too?

It does not make much sense to broadcast both during consensus. It would only increase bandwidth requirements further and introduce more complexity on the sending and receiving end with no real benefits. Or what would those be? The purpose of block data propagation during consensus is only to gossip the data to validators (and tendermint full nodes). If a block proposer withholds the data during consensus, no validator would sign off the block and no light client would ever see that block.

musalbas · 2021-06-24T22:16:17Z

What is the relationship between RowSet and DAHeader? RowSet isn't a commitment to the row data right? Only DAHeader is?

evan-forbes · 2021-06-25T00:43:21Z

This ADR is, ~~for the most part~~, spot on and proposes an elegant solution! In fact, something similar was my preferred solution when we were discussing #423. ~~Unfortunately, it accounts for one of the three scenarios in which we will propagate blocks.~~ edit: this is not true

Besides after a block proposal, blocks also need to be gossiped after a validator goes back online during replay, and the during fast-sync. While the block's types.Data will be gossiped using RowSet, the rest of the block will not, and we will have to gossip those separately. While we could gossip those separately, we'd also have to have some commit to that data, in order to limit the ability of a peer to spam another with garbage data.

We could keep the PartSetHeader for the other scenarios when we need to gossip blocks, but I think that dramatically reduces the simplicity gained by RowSet. The other option would be to use RowSet and encode the other pieces of block into the original data square, but that would take additional time and effort to properly encode and decode that data to the original data square.

edit: the above two paragraphs are not true and can be ignored

There's also the fact that PartSetHeader provides us with a precise and adaptable control over the size of message gossiped. Presumably with RowSet, we would be stuck with a single sized message, depending on the square size.

Despite these things, I still think RowSet or something similar would be an excellent improvement, but I'm not sure that it would be worth the work at the moment, given that it is an optimizaiton. I agree that it would only take 1.5 weeks, but that's if we do not completely get rid of PartSet and the PartSetHeader. If we don't decide to follow through with this solution, it should definitely be added to our todo list.

liamsi · 2021-06-26T07:21:03Z

ref: tendermint/tendermint#7922

Wondertan · 2021-06-26T12:11:25Z

Besides after a block proposal, blocks also need to be gossiped after a validator goes back online during replay, and the during fast-sync. While the block's types.Data will be gossiped using RowSet, the rest of the block will not, and we will have to gossip those separately. While we could gossip those separately, we'd also have to have some commit to that data, in order to limit the ability of a peer to spam another with garbage data.

@evan-forbes, so I double-checked on that and

Fastsync or blockchain_reactor_v0 sends the whole block in a message and not by parts, so there is no interaction with PartSet and thus won't be any with RowSet. Also, I don't see any problem keeping sending the whole serialized block in FastSync mode, so not sure if the addition of the Part/Row concept there is needed in the future.
Regarding Replay. I don't see a problem you've described here as well. Replay does not imply any networking. Replay relies on WAL, which saves messages ahead of executing them to replay them in case of a failure. From the code, replaying relies only on local data which seems enough to restore the block.

There's also the fact that PartSetHeader provides us with a precise and adaptable control over the size of message gossiped. Presumably with RowSet, we would be stuck with a single sized message, depending on the square size.

What does that control give us? My intuition is that dynamic gossiped chunk is better than fixed. The gossiped chunk size dynamicity proportional to block size will affect propagation positively. Also, in such a case, we always get an equal chunk, while in the case of Parts, the last chunk always has unpredictable size. I would say that even for debugging any issues Rows would give us more control and precise info. Just by seeing the block size 128x128 somewhere in the proposal stage, you can immediately understand how many messages would be sent their size.

Wondertan · 2021-06-26T12:21:23Z

What is the relationship between RowSet and DAHeader?

@musalbas, RowSet is a helper structure that wraps DAHeader and tracks received Rows with their integrity against DAHeader and tells its user when the block is complete and/or can be recovered. Mostly it is a helper and is not a high-level concept.

RowSet isn't a commitment to the row data right?

Right

Only DAHeader is?

Yes

evan-forbes · 2021-06-26T20:03:21Z

@Wondertan You're right, I was very mistaken, the only time blocks are gossiped in parts is during proposal. That means that we could implement the full proposed change faster than I originally thought.

What does that control give us? My intuition is that dynamic gossiped chunk is better than fixed. The gossiped chunk size dynamicity proportional to block size will affect propagation positively. Also, in such a case, we always get an equal chunk, while in the case of Parts, the last chunk always has unpredictable size. I would say that even for debugging any issues Rows would give us more control and precise info. Just by seeing the block size 128x128 somewhere in the proposal stage, you can immediately understand how many messages would be sent their size.

While the current implementation uses a constant for the part size, we could easily change that to be dynamic should it be beneficial for some reason. The current implementation would give us more flexibility into the exact size chosen, but I don't have any intuition on whether this would actually be needed or not. As mentioned in an earlier comment, a single row of the max extended square size would be identical to types.BlockPartSizeBytes anyway (64 kB).

docs/lazy-adr/adr-006-row-propagation.md

musalbas · 2021-06-27T11:55:36Z

Since the data we're committing to in the DAHeader is erasure coded, I wonder if can also do erasure coded block gossiping similar to this or this.

This should be possible by extending this ADR so that the extended rows are gossiped too, and once the node receives enough rows, it can reconstruct the entire block by itself. The efficiency comes from the fact that the node needs any n of 2n rows to recover the block, rather than all n rows.

liamsi · 2021-06-27T13:06:42Z

I think the preferred way of moving forward with this:

remove / decouple the PartSetHeader from BlockID (and make BlockID a simpler hash) PartSetHeader will have to be used in consensus messages (mainly Proposal)-> make the BlockID a simple hash (BlockID: replace with header hash + add/use DA header instead of PartsSetHeader #184); similarly the DAHeader should be moved out of the BlockID (in case we merge Add the Data Availability Header to BlockID #312)
in votes (which already assume all data was received) only sign over the header hash and not over the PartSetHeader (which implies that validators correctly recomputed the DAHeader too as the data root in the header commits to the DAHeader) (see also: Codify the decision that validators will download all block data celestia-specs#180 (comment))
fledge out this ADR and start using rows instead of shares (using the original data only)
Write a separate ADR for optimizing block propagation as described in: ADR 006: Block Propagation with Rows #434 (comment) and implement this too

Note that 1. and 2. are independent of 3 and only necessary that the blocks produced and finalized by validators only contain a single commitment to the data in the header (data root in the header commiting to the DAHeader). This is the bare minimum changes we should definitely do before launch! Decoupling the implementation detail on how data is gossiped from the BlockID in the header should actually make the implementation of 3. a bit easier to achieve. (also we should consider upstreaming this as the PartSetHeader should not be in the vanilla tendermint header too).

Then, 4. is a bit more involved and while it certainly is the most beautiful and clean approach. Also, we might need to go down the route of 4. anyways in case the naive block propagation in tendermint does not work well with really large blocks. Similarly, this could also be upstreamed would the implementation turn out to be either of similar performance (but more robust in the light of malicious peers/network failures) or faster even (and still more robust). 3. and 4. could ofc also combined into one ADR instead but this certainly requires much more details.

Does this make sense to everyone involved?

1. what it would take to remove the PartSetHeader from BlockID 2. keep gossiping as is vs change gossiping Conclusion 1. could be finished in a few days (really): most time will be spent fixing tests. It will create a lot of changes but if carefully committed, it should still be reviewable. Need clarity if we should remove the DAHeader from the Proposal until we switch to gossiping e.g. to rows / erasured data etc. ref #434 #184)

Wondertan · 2021-06-27T19:05:33Z

This should be possible by extending this ADR so that the extended rows are gossiped too, and once the node receives enough rows, it can reconstruct the entire block by itself. The efficiency comes from the fact that the node needs any n of 2n rows to recover the block, rather than all n rows.

@musalbas, Yeah, but that would also require us to send a whole extended sqaure, while recovery would happen by rows only(4x more data, but only 1D recovery). I agree that we should go ideally with the erasure coding approach, but to make this fully efficient for our 2D case, we should also repair by columns, not just by rows, thus we need share-level gossiping. Share level gossiping was discussed here, but we are currently not interested in that as not "pragmatic".

I don't think that implementing share-level gossiping using tm's gossiping would be a good idea both now and in general, but just rows with actual data will fit ideally there instead of existing Parts, so I would keep this ADR focused on that and live without erasure coding in consensus unless we decide to rework consensus profoundly(If I had a choice to do so, I would have done it earlier than later).

musalbas · 2021-06-27T22:16:38Z

You don't need to gossip the entire extended square, just the first half of all the rows (i.e. a vertical rectangle). This is sufficient to recover all the original rows. I'm not talking about share gossiping, only row gossiping. I think share gossiping is unnecessary as the shares are too small to be gossiped individually.

Wondertan · 2021-06-28T07:41:12Z

This should be possible by extending this ADR so that the extended rows are gossiped too, and once the node receives enough rows, it can reconstruct the entire block by itself.

@MusAlba, so in the case of a vertical rectangle, we don't need to send extended rows anymore and only columns, right?

Wondertan · 2021-06-28T07:42:57Z

I think share gossiping is unnecessary as the shares are too small to be gossiped individually.

Can you elaborate? What’s the problem with gossiping smaller chunks?

Currently, we pull request shares(in storage and DA node cases), and if comparing with gossiping those, the latter seems a much better approach, because 256 bytes are very little for one request of data.

Also, doesn't share-level gossiping establish Data Availability properties for gossiping?

liamsi · 2021-06-28T09:35:44Z

so in the case of a vertical rectangle, we don't need to send extended rows anymore and only columns, right?

Let's look at the extended square:

 ------- -------
|       |       |
|   O   |  E_1  |
|       |       |
 ------- -------
|       |       |
|  E_2  |  E_3  |
|       |       |
 ------- -------

What @musalbas is suggesting is to gossip

 ------- 
|       | 
|   O   |
|       |  
 ------- 
|       |  
|  E_2  |
|       |  
 -------

during consensus (and row by row).

Alternatively, this would also work:

 ------- -------
|       |       |
|   O   |  E_1  |
|       |       |
 ------- -------

but the downside that (at least with the current rsmt2d implementation) you'd have to erasure code twice to get the full square (O to E_2 and then E_2 to E_3). The beauty of sending that vertical rectangle is that you can simply extend "to the right" without treating original row data or parity data differently. Each row can be handled separately (in parallel).

liamsi · 2021-06-28T09:44:07Z

Can you elaborate? What’s the problem with gossiping smaller chunks?

The smaller the chunks, the more we create just overhead (on the network level as well as computational overhead on the receiving end). There is likely a sweet spot for chunk size. It seems like the 64KB perform well in practice for instance. IMO we should better understand the tradeoffs between fixed-size chunks and rows (dynamic sized) if we decide to change the gossiping mechanism (and before the implementation).

Wondertan · 2021-07-21T09:22:15Z

Then we certainly would want them to be sent with a merkle proof (like tendermint does with Parts)

@liamsi, I understand your argument, and in case sending Merkle proof is needed, then share-level gossiping might be not a good idea. However, your message states sending Merkle proofs along as a certainty, while it is not clear why we need that in the first place. The fact that Tendermint sends those with parts, does mean that this applies to us in the same way. DAHeader of any form should be sufficient to validate Rows at least. Note that I do not have a strong opinion about share-level gossiping in general, while I do have a strong opinion that in scalable p2p networks every node/peer should care only about the data of its own interest and the share-level gossiping model fits in here best. And that's not only about DAS validators, but the entire network.

P.S. considerations like this should be part of the ADR process. If someone is proposing a change it is their task to convince everyone of why the change is good idea. Not the other way around.

Totally agree, but share-level gossiping is not part of this ADR and is just a side conversation, not a proposal.

I think we can close the share-level gossiping discussion for now, as it is not a priority, mostly 'wonderting'. Though we still need to go on with this ADR and as discussed, we can wait for experiments to go on implementation and then look at the results. Also, you've mentioned Merkle Proofs that I removed from gossiping in this ADR. In the case of DAHeader, they are not needed and provide us with additional bandwidth savings. What's your opinion on that?

liamsi · 2021-07-23T13:18:57Z

However, your message states sending Merkle proofs along as a certainty, while it is not clear why we need that in the first place.

Yeah, that is a good point. Sending along the proofs with the shares is a mechanism to detect as early as possible if someone is sending you garbage (while sacrificing some bandwidth). IMO, it only makes sense in the case of larger block sizes (which is not the case in many / most tendermint-based applications). In case of large blocks, you would not want to download a lot of data until you know if someone is sending you made-up data.

docs/lazy-adr/adr-006-row-propagation.md

liamsi

Some minor comments. Otherwise looks good.

docs/lazy-adr/adr-006-row-propagation.md

liamsi

One last thing: the lazy-adr directory was renamed to just adr. Can you rebase your changes on the lates master (or merge in master) and move adr 006 into the correct directory as well?

Wondertan · 2021-08-18T11:44:03Z

Rebasing should work, ok

Co-authored-by: Ismail Khoffi <[email protected]>

Co-authored-by: Evan Forbes <[email protected]>

Co-authored-by: Ismail Khoffi <[email protected]>

liamsi

Thanks for your patience @Wondertan. I feel the discussions in this PR as well as the ADR itself will become relevant again in the future 👍🏼

Wondertan · 2021-08-18T12:01:28Z

Me either 🚀

Follow-up to #427. The workflow in #427 is merely informational, which is helpful, but can be bypassed. It's probably best to enforce this check when we push tags. If the check fails, it will prevent the cutting of a release, meaning we'll have to delete the tag, fix the version number, and tag again. --- #### PR checklist - [ ] Tests written/updated - [ ] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - [ ] Updated relevant documentation (`docs/` or `spec/`) and code comments (cherry picked from commit 0d3c2f3) Co-authored-by: Thane Thomson <[email protected]>

Wondertan requested review from adlerjohn, liamsi and evan-forbes June 24, 2021 20:46

Wondertan self-assigned this Jun 24, 2021

Wondertan requested a review from musalbas as a code owner June 24, 2021 20:46

liamsi requested changes Jun 24, 2021

View reviewed changes

This was referenced Jun 25, 2021

DAS nodes (tracking issue) #381

Closed

Revert writing to local dag during proposal #436

Closed

evan-forbes mentioned this pull request Jun 25, 2021

Investigate time to consensus for really large block sizes #438

Closed

3 tasks

liamsi reviewed Jun 27, 2021

View reviewed changes

docs/lazy-adr/adr-006-row-propagation.md Outdated Show resolved Hide resolved

liamsi reviewed Jun 27, 2021

View reviewed changes

docs/lazy-adr/adr-006-row-propagation.md Outdated Show resolved Hide resolved

liamsi reviewed Jun 27, 2021

View reviewed changes

docs/lazy-adr/adr-006-row-propagation.md Outdated Show resolved Hide resolved

liamsi reviewed Aug 3, 2021

View reviewed changes

docs/lazy-adr/adr-006-row-propagation.md Outdated Show resolved Hide resolved

liamsi mentioned this pull request Aug 16, 2021

ADR: Block Data Propagation #389

Closed

Wondertan mentioned this pull request Aug 17, 2021

ipld: consider moving into separate repo #296

Closed

3 tasks

liamsi mentioned this pull request Aug 17, 2021

Revert writing to IPLD during defaultDecideProposal #493

Merged

Wondertan requested a review from liamsi August 17, 2021 15:11

liamsi reviewed Aug 17, 2021

View reviewed changes

docs/lazy-adr/adr-006-row-propagation.md Outdated Show resolved Hide resolved

docs/lazy-adr/adr-006-row-propagation.md Outdated Show resolved Hide resolved

docs/lazy-adr/adr-006-row-propagation.md Outdated Show resolved Hide resolved

Wondertan requested a review from liamsi August 18, 2021 11:39

liamsi requested changes Aug 18, 2021

View reviewed changes

Wondertan and others added 11 commits August 18, 2021 14:53

adr-006: initial description for Block Propagation with Rows ADR

504236f

Update docs/lazy-adr/adr-006-row-propagation.md

5b27ba9

Co-authored-by: Ismail Khoffi <[email protected]>

Update docs/lazy-adr/adr-006-row-propagation.md

7089cba

Co-authored-by: Ismail Khoffi <[email protected]>

adr-006: add more desing details and lesser subjectism of an Author

60d7cdc

Update docs/lazy-adr/adr-006-row-propagation.md

5435634

Co-authored-by: Evan Forbes <[email protected]>

Update docs/lazy-adr/adr-006-row-propagation.md

fb7e1fb

Co-authored-by: Evan Forbes <[email protected]>

Update docs/lazy-adr/adr-006-row-propagation.md

3cd262d

Co-authored-by: Evan Forbes <[email protected]>

Add more alternative approaches. Thanks @liamsi.

c06edf3

Update docs/lazy-adr/adr-006-row-propagation.md

c623496

Co-authored-by: Ismail Khoffi <[email protected]>

Update docs/lazy-adr/adr-006-row-propagation.md

5dd62b8

Co-authored-by: Ismail Khoffi <[email protected]>

Update docs/lazy-adr/adr-006-row-propagation.md

79385cc

Co-authored-by: Ismail Khoffi <[email protected]>

Wondertan force-pushed the hlib/row-propagation-adr branch from 5dbcf3a to 79385cc Compare August 18, 2021 11:55

Wondertan requested a review from liamsi August 18, 2021 11:55

liamsi approved these changes Aug 18, 2021

View reviewed changes

Wondertan merged commit f2a8e50 into master Aug 18, 2021

Wondertan deleted the hlib/row-propagation-adr branch August 18, 2021 12:01

evan-forbes mentioned this pull request Dec 7, 2023

Change the PartSetHeader to allow for committing to arbitrary part schemes #1147

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ADR 006: Block Propagation with Rows #434

ADR 006: Block Propagation with Rows #434

Wondertan commented Jun 24, 2021 •

edited

Loading

Wondertan commented Jun 24, 2021

liamsi left a comment •

edited

Loading

musalbas commented Jun 24, 2021

liamsi commented Jun 24, 2021

musalbas commented Jun 24, 2021

evan-forbes commented Jun 25, 2021 •

edited

Loading

liamsi commented Jun 26, 2021

Wondertan commented Jun 26, 2021 •

edited

Loading

Wondertan commented Jun 26, 2021 •

edited

Loading

evan-forbes commented Jun 26, 2021 •

edited

Loading

musalbas commented Jun 27, 2021 •

edited

Loading

liamsi commented Jun 27, 2021 •

edited

Loading

Wondertan commented Jun 27, 2021 •

edited

Loading

musalbas commented Jun 27, 2021 •

edited

Loading

Wondertan commented Jun 28, 2021

Wondertan commented Jun 28, 2021 •

edited

Loading

liamsi commented Jun 28, 2021 •

edited

Loading

liamsi commented Jun 28, 2021

Wondertan commented Jul 21, 2021 •

edited

Loading

liamsi commented Jul 23, 2021

liamsi left a comment

liamsi left a comment

Wondertan commented Aug 18, 2021

liamsi left a comment •

edited

Loading

Wondertan commented Aug 18, 2021

ADR 006: Block Propagation with Rows #434

ADR 006: Block Propagation with Rows #434

Conversation

Wondertan commented Jun 24, 2021 • edited Loading

Wondertan commented Jun 24, 2021

liamsi left a comment • edited Loading

Choose a reason for hiding this comment

musalbas commented Jun 24, 2021

liamsi commented Jun 24, 2021

musalbas commented Jun 24, 2021

evan-forbes commented Jun 25, 2021 • edited Loading

liamsi commented Jun 26, 2021

Wondertan commented Jun 26, 2021 • edited Loading

Wondertan commented Jun 26, 2021 • edited Loading

evan-forbes commented Jun 26, 2021 • edited Loading

musalbas commented Jun 27, 2021 • edited Loading

liamsi commented Jun 27, 2021 • edited Loading

Wondertan commented Jun 27, 2021 • edited Loading

musalbas commented Jun 27, 2021 • edited Loading

Wondertan commented Jun 28, 2021

Wondertan commented Jun 28, 2021 • edited Loading

liamsi commented Jun 28, 2021 • edited Loading

liamsi commented Jun 28, 2021

Wondertan commented Jul 21, 2021 • edited Loading

liamsi commented Jul 23, 2021

liamsi left a comment

Choose a reason for hiding this comment

liamsi left a comment

Choose a reason for hiding this comment

Wondertan commented Aug 18, 2021

liamsi left a comment • edited Loading

Choose a reason for hiding this comment

Wondertan commented Aug 18, 2021

Wondertan commented Jun 24, 2021 •

edited

Loading

liamsi left a comment •

edited

Loading

evan-forbes commented Jun 25, 2021 •

edited

Loading

Wondertan commented Jun 26, 2021 •

edited

Loading

Wondertan commented Jun 26, 2021 •

edited

Loading

evan-forbes commented Jun 26, 2021 •

edited

Loading

musalbas commented Jun 27, 2021 •

edited

Loading

liamsi commented Jun 27, 2021 •

edited

Loading

Wondertan commented Jun 27, 2021 •

edited

Loading

musalbas commented Jun 27, 2021 •

edited

Loading

Wondertan commented Jun 28, 2021 •

edited

Loading

liamsi commented Jun 28, 2021 •

edited

Loading

Wondertan commented Jul 21, 2021 •

edited

Loading

liamsi left a comment •

edited

Loading