Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ADR 006: Non-interactive Defaults, Wrapped Txs, and Subtree Root Message Inclusion Checks #673

Merged
merged 41 commits into from
Oct 3, 2022

Conversation

evan-forbes
Copy link
Member

@evan-forbes evan-forbes commented Aug 30, 2022

ADR 006

Non-interactive Defaults, Wrapped Txs, and Subtree Root Message Inclusion Checks

rendered

closes #598

@evan-forbes evan-forbes added documentation Improvements or additions to documentation discussion inherently unactionable issue for the sole purpose of discussion labels Aug 30, 2022
@evan-forbes evan-forbes self-assigned this Aug 30, 2022
Copy link
Collaborator

@rootulp rootulp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am SO happy about these diagrams b/c they're really helping me develop an intuition for non-interactive defaults. Thanks for creating them!

docs/architecture/ADR-003-Non-interactive-defaults.md Outdated Show resolved Hide resolved
docs/architecture/ADR-003-Non-interactive-defaults.md Outdated Show resolved Hide resolved


- Messages begin at a location aligned with the largest power of 2 that is not larger than the message length or k.
- If the largest power of 2 of a given message spans multiple rows it must begin at the start of a row (this can occur if a message is longer than k shares or if the block producer decides to start a message partway through a row and it cannot fit).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[question] I'm confused by this rule's application in example-full-block.png for ns=2. ns=2 is a message that spans 7 shares. The largest power of 2 less than the message length (7) or k (8) is 4. The message is aligned with index 4.

If the largest power of 2 of a given message spans multiple rows

The largest power of 2 for this message (4) doesn't span multiple rows but this message does because the block producer decided to start this message partway through the row and it cannot fit. If ns=2 is indeed supposed to be split across rows, then should this read:

Suggested change
- If the largest power of 2 of a given message spans multiple rows it must begin at the start of a row (this can occur if a message is longer than k shares or if the block producer decides to start a message partway through a row and it cannot fit).
- If the largest power of 2 of a given message is larger than k then it must begin at the start of a row.

If ns=2 is supposed to start at a new row, I think the current language makes sense.

example-full-block

example-full-block

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still thinking about this, but I think the first rule might be the only one we need. The we can just go into more detail of a what an "aligned power of two" actually means

@evan-forbes evan-forbes marked this pull request as ready for review September 1, 2022 03:40
docs/architecture/ADR-003-Non-interactive-defaults.md Outdated Show resolved Hide resolved
docs/architecture/ADR-003-Non-interactive-defaults.md Outdated Show resolved Hide resolved
Comment on lines 21 to 25
> **Messages must begin at a location aligned with the largest power of 2 that is not larger than the message length or k.**

![Subtree root commitment](./assets/subtree-root.png "Subtree Root based commitments")

> **If the messages are larger than k, then they must start on a new row.**
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[optional] I took a stab at rephrasing these rules so that I could grok them better. Defer to author if this is clearer to others

Suggested change
> **Messages must begin at a location aligned with the largest power of 2 that is not larger than the message length or k.**
![Subtree root commitment](./assets/subtree-root.png "Subtree Root based commitments")
> **If the messages are larger than k, then they must start on a new row.**
> **If a message length is less than or equal to k, then it must begin at a location aligned with the largest power of 2 that is less than or equal to the message length.**
![Subtree root commitment](./assets/subtree-root.png "Subtree Root based commitments")
> **If the message is larger than k, then it must start on a new row.**

Copy link
Member Author

@evan-forbes evan-forbes Sep 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm I'll defer to @adlerjohn as they were the original author. I tried to stick to as close to what is written in the specs

Messages begin at a location aligned with the largest power of 2 that is not larger than the message length or k.

the rule was mainly just modified to be first, as I felt it was more important since we're only following a portion of the other rule now

Messages that span multiple rows must begin at the start of a row (this can occur if a message is longer than k shares or if the block producer decides to start a message partway through a row and it cannot fit).

we're not following the "if the block producer decides to start a message partway through a row and it cannot fit" part, as it doesn't make a different to the commitment provided that it follows the other rule

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but we should probably be more explicit about that in the ADR come to think of it

Copy link
Member

@rach-id rach-id Sep 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the following diagram is correct: https://github.com/celestiaorg/celestia-app/raw/27ee1dac22be7498ebfb87be85c7cd4718a4d56e/docs/architecture/assets/example-full-block.png
Then, it is strictly less than the largest power of 2 that is less than or equal to the message length : ns = 4
Edit: the ns = 4 is put straight after ns = 3. If we go by the largest power of 2 that is lower than 2, then it's 1. I guess the diagrams will need to be updated and we need to decide on:

  • Case of 2
  • Whether the power is strictly less than the length, or, less or equal.

Copy link
Member Author

@evan-forbes evan-forbes Sep 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whether the power is strictly less than the length, or, less or equal.

that is not larger than the message length or k.

I interpret this to mean less than or equal to

does that clarify this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aaaah, now I get it, when calculating the largest power of 2 that is less or equal than message length:

  1. We start the count from the share after the last share of the previous namespace
  2. We start the new namespace at the last position of the count. For example, if message length is 7, the largest power of 2 that is less than message length is 4. Then, we start the new namespace at the fourth share after the last share of the previous namespace.

Thanks a lot for your help, now I get this more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then, we start the new namespace at the fourth share after the last share of the previous namespace.

just tbc, it doesn't really have that much to do with where the last namespace ended. we want to start at the next aligned power of two, because we want a single subroot to contain all of the set of 4 shares. In a square size of 8, there are exactly two spots for this, share 0, and share 4. If the largest power of two a message cannot fit on the next aligned power of two, then it must always start on the next row.

Copy link
Member

@rach-id rach-id Sep 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do more research and come back to you. thanks

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have other questions, please, when you have time 😸 :

  1. In your previous response:

In a square size of 8, there are exactly two spots for this, share 0, and share 4

You mean share 0 and share 3, right? Because the message will be aligned with the power of 2?

  1. If ns = 1 had 7 shares. Where would it start?
  • The beginning of a new row
  • The 4th position in a new row
  • The same position it is starting now
  1. if the k = 8, then the largest power of 2 that are possible are: 1, 2, 4, and 8.
    If a message length is 4. Then, the possible start of message are 1, 2 and 4. Then, the block producer checks if that collides with a previous namespace, then it does that on a new row, right?
    In the above diagram, ns = 4 is still a mystery to me.

And, for the clients, if I understand right, all they need to do is decompose the message to a merkle mountain range structure, that is bound by the row size (which is always a power of 2), and sign all the subtree roots commitments, for all row sizes.

Copy link
Member Author

@evan-forbes evan-forbes Sep 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean share 0 and share 3, right? Because the message will be aligned with the power of 2?

we actually want 4 in this case, as we want to start at the next aligned power of two, and we would want to start on the next row (5th position).

f ns = 1 had 7 shares. Where would it start?

if there was room on the row for 4 shares, the largest power of two less than or equal to the size of the message, then it would start there. If not, then it would start on the next row.

if the k = 8, then the largest power of 2 that are possible are: 1, 2, 4, and 8.

yeah!

If a message length is 4. Then, the possible start of message are 1, 2 and 4, or the start of a new row

then we have to use the largest, 4. it has to start at a share index that is divisible by 4

docs/architecture/ADR-003-Non-interactive-defaults.md Outdated Show resolved Hide resolved
docs/architecture/ADR-003-Non-interactive-defaults.md Outdated Show resolved Hide resolved
docs/architecture/ADR-003-Non-interactive-defaults.md Outdated Show resolved Hide resolved

recall our non-interactive default rule:

> **Messages must begin at a location aligned with the largest power of 2 that is not larger than the message length or k.**
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[optional] This should only be accepted if the suggestion above is accepted.

Suggested change
> **Messages must begin at a location aligned with the largest power of 2 that is not larger than the message length or k.**
> **If a message length is less than or equal to k, then it must begin at a location aligned with the largest power of 2 that is less than or equal to the message length.**

docs/architecture/ADR-003-Non-interactive-defaults.md Outdated Show resolved Hide resolved
rootulp
rootulp previously approved these changes Sep 2, 2022
Copy link
Collaborator

@rootulp rootulp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incredible ADR, thanks for writing this! Again, the diagrams are insanely helpful.

Copy link
Member

@rach-id rach-id left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome stuff 🚀 👍. Mainly adding questions that I have.

docs/architecture/ADR-003-Non-interactive-defaults.md Outdated Show resolved Hide resolved
docs/architecture/ADR-003-Non-interactive-defaults.md Outdated Show resolved Hide resolved
docs/architecture/ADR-003-Non-interactive-defaults.md Outdated Show resolved Hide resolved
docs/architecture/ADR-003-Non-interactive-defaults.md Outdated Show resolved Hide resolved
docs/architecture/ADR-003-Non-interactive-defaults.md Outdated Show resolved Hide resolved
docs/architecture/ADR-003-Non-interactive-defaults.md Outdated Show resolved Hide resolved
docs/architecture/ADR-003-Non-interactive-defaults.md Outdated Show resolved Hide resolved
docs/architecture/ADR-003-Non-interactive-defaults.md Outdated Show resolved Hide resolved
Comment on lines 21 to 25
> **Messages must begin at a location aligned with the largest power of 2 that is not larger than the message length or k.**

![Subtree root commitment](./assets/subtree-root.png "Subtree Root based commitments")

> **If the messages are larger than k, then they must start on a new row.**
Copy link
Member

@rach-id rach-id Sep 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the following diagram is correct: https://github.com/celestiaorg/celestia-app/raw/27ee1dac22be7498ebfb87be85c7cd4718a4d56e/docs/architecture/assets/example-full-block.png
Then, it is strictly less than the largest power of 2 that is less than or equal to the message length : ns = 4
Edit: the ns = 4 is put straight after ns = 3. If we go by the largest power of 2 that is lower than 2, then it's 1. I guess the diagrams will need to be updated and we need to decide on:

  • Case of 2
  • Whether the power is strictly less than the length, or, less or equal.

docs/architecture/ADR-003-Non-interactive-defaults.md Outdated Show resolved Hide resolved
docs/architecture/ADR-003-Non-interactive-defaults.md Outdated Show resolved Hide resolved
docs/architecture/ADR-003-Non-interactive-defaults.md Outdated Show resolved Hide resolved
docs/architecture/ADR-003-Non-interactive-defaults.md Outdated Show resolved Hide resolved
docs/architecture/ADR-003-Non-interactive-defaults.md Outdated Show resolved Hide resolved
docs/architecture/ADR-003-Non-interactive-defaults.md Outdated Show resolved Hide resolved
rach-id
rach-id previously approved these changes Sep 4, 2022
Copy link
Member

@rach-id rach-id left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀 👍 🎉 🎉 Awesome work. This should also be called: Demystifying non-interactive defaults.

docs/architecture/ADR-003-Non-interactive-defaults.md Outdated Show resolved Hide resolved
docs/architecture/ADR-003-Non-interactive-defaults.md Outdated Show resolved Hide resolved
docs/architecture/ADR-003-Non-interactive-defaults.md Outdated Show resolved Hide resolved
docs/architecture/ADR-003-Non-interactive-defaults.md Outdated Show resolved Hide resolved
Co-authored-by: CHAMI Rachid <[email protected]>
@evan-forbes evan-forbes dismissed stale reviews from rach-id and rootulp via a74e1ec September 4, 2022 19:11
@HoytRen
Copy link

HoytRen commented Sep 21, 2022

After reading #382, I still don't understand the importance of the following statement.

”This works for ensuring that each message was included, but this doesn't actually check that the commitment is actually a subtree root to one of the data availability header row or column roots.“

If we include the commitment into somewhere ( block header for example ), then It's even possible ( however have some additional work to do ) for a light node to validate the message, then we don't need the commitment to be a row/column root? As long as the message is in a share ( or some continuous shares ) and the row/column root ( or continuous roots ) includes these shares ( I know this is another addition work ), then it should be ok?

You mentioned the fraud proof at the end of the post. Then, why don't rely on it ( and may remove the necessary that include commitments for light node and download of the message ), but it's just a 'good to have? The fraud proof seems important for celestia' basic assumption, then we could trust this way?

@evan-forbes
Copy link
Member Author

evan-forbes commented Sep 21, 2022

@HoytRen I think the confusion here lies with term "Message Inclusion". As you stated, we could provide merkle proofs to prove that a message was included in the square. That's not the issue though.

The issue is that we have to prove that each payment transaction (PFD) has a message included in the same block.

We have to have a commitment that can be signed before the order of the messages in the block are known. That's the reason for the additional complexity.

if you haven't already, I would recommend reading the original specs

@HoytRen
Copy link

HoytRen commented Sep 21, 2022

I see 3 cases that message could be missing or wrong:

case 1: the sequenser submit a tx with wrong data. For this case the validator should keep the tx in the pool until data arrived or timeout, then the tx is failed.

case 2: the validator pick a tx into block but don't release the data. This is what the DAS states for. The full node should always get all block data or generate a DA fraud proof. When full node have all date, but the message is missing, he should generate fraud proof too.

case 3: the full node doesn't send all message to light node. This is what the NMT states for. I explained it earlier.

Is here something I forget? I believe celestia rely on fraud proof to protect light nodes, and the DAS make sure the fraud proof could be generated. If we need another proof to allow light node check the message by themself, then is here a case that DAS doesn't work well? Before light node retrieve messages of it's own namespace ( i.e. when doing DAS ), I believe he doesn't need to have the concept of message, the validators should care of this, but not the light node ( the light node help validator to get confident that he isn't under split-view-attack too ). Even if light node really care of this, he could finally retrieve all message of his namespace, and then he could validate the messages.

Let's get back to the orignial speces, it state:

'otherwise every node that validates the sanctity of the Celestia coin would need to download all message data'

This means we are talking about light node, then we should accept a probablistic result and the whole network should afford the cost. In fact, even if a node get to wrong state, he should be fixed when next block comes, because the pre-state roots are conflict. If celectia can't trust DAS when handle it's own tx and coin, how could we convince people that light node of celestia could work for them? Then DAS has no meaning...( No, I don't think so )

@evan-forbes
Copy link
Member Author

@HoytRen I'm not sure I see how this is related to this ADR. This ADR is strictly related to the mechanism that allows users to sign over a commitment that links to subtree roots of the message without knowing the square layout ahead of time.

I could be wrong, but I'm not sure this specific PR is the best place to answer your questions. Perhaps it would be easier via a different medium?

@HoytRen
Copy link

HoytRen commented Sep 22, 2022

Yes, you are right. I finally go out of the scope of this thread. We could discuss relative ideas somewhere else later ( Discord DM may be a better place when you are not that busy ). Since most of logics relative to ADR003 is implemented and both of us agree they are obviously work ( however I feel it has some drawbacks ), then the design problem isn't urgent. I find here are a lot of 'first good issue' for celestia-node project that I haven't evaluated. It's better I have a check there first.

Copy link
Member

@liamsi liamsi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not fully ready with the review but submitting my first comments.

docs/architecture/ADR-003-Non-interactive-defaults.md Outdated Show resolved Hide resolved

The main issue with that requirement is that users must know the relevant subtree roots before they sign, which is problematic considering that if the block is not organized perfectly, the subtree roots will include data unknown to the user at the time of signing.

To fix this, the spec outlines the “non-interactive default rules”. These involve a few additional **default but optional** message layout rules that enables the user to follow the above block validity rule, while also not interacting with a block producer. Commitments to messages can consist entirely of sub tree roots of the data hash, and for those sub tree roots to be generated only from the message itself (so that the user can sign something “non-interactively”).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These involve a few additional default but optional message layout rules

This is something that always confuses me. The way I understood this on calls in the past is that this messages layout will be verified during ProcessProposal. How is it only a default and even more so: how is it optional?

Copy link
Member Author

@evan-forbes evan-forbes Sep 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's optional because when other validators are verifying the square, they don't pick where the message starts. That is encoded in the wrapped transactions as a share index.

This way, its possible for the block validity rule to still be followed provided that the user signs over a share commitment that matches the subtree roots of a message at an arbitrary starting point in the square.

All share commitments included in MsgPayForData must consist only of subtree roots of the data square.

thinking about this, we don't have an integration test (although we do have many unit tests for the code that finds subtree roots) for this scenario, so I'll add an issue.

Comment on lines 87 to 88
- Refactor `PrepareProposal` to arrange the shares such that each message has the appropriate subtree roots, and so that the metadata connection between transactions and messages is correct.
- Refactor `ProcessProposal` to check for message inclusion using only subtree roots to row roots.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if the defaults are not followed during ProcessProposal? Does the block get rejected?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not if they don't follow the non-interactive defaults, but yes if they don't follow the block validity rule

All share commitments included in MsgPayForData must consist only of subtree roots of the data square.

also see the above #673 (comment)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's no way to do that atm, and it would be quite complicated. So a simpler answer would probably just be, yes.

docs/architecture/ADR-003-Non-interactive-defaults.md Outdated Show resolved Hide resolved
docs/architecture/ADR-003-Non-interactive-defaults.md Outdated Show resolved Hide resolved
docs/architecture/ADR-003-Non-interactive-defaults.md Outdated Show resolved Hide resolved
docs/architecture/ADR-003-Non-interactive-defaults.md Outdated Show resolved Hide resolved
docs/architecture/ADR-003-Non-interactive-defaults.md Outdated Show resolved Hide resolved
docs/architecture/ADR-003-Non-interactive-defaults.md Outdated Show resolved Hide resolved
@evan-forbes evan-forbes changed the title ADR 003: Non-interactive Defaults, Wrapped Txs, and Subtree Root Message Inclusion Checks ADR 006: Non-interactive Defaults, Wrapped Txs, and Subtree Root Message Inclusion Checks Sep 27, 2022
@evan-forbes evan-forbes linked an issue Sep 29, 2022 that may be closed by this pull request
4 tasks
@evan-forbes
Copy link
Member Author

evan-forbes commented Sep 29, 2022

will likely merge this Monday w/o further feedback @liamsi, just so we can officially close #382.

@evan-forbes evan-forbes merged commit e69283f into main Oct 3, 2022
@evan-forbes evan-forbes deleted the evan/adr-003-NID branch October 3, 2022 19:58
rach-id added a commit to rach-id/celestia-app that referenced this pull request Nov 16, 2022
…age Inclusion Checks (celestiaorg#673)

# ADR 006
#### Non-interactive Defaults, Wrapped Txs, and Subtree Root Message
Inclusion Checks


[rendered](https://github.com/celestiaorg/celestia-app/blob/63b0ff3671e20e397b951f3ff82a93d3e36e84c6/docs/architecture/ADR-003-Non-interactive-defaults.md)

closes celestiaorg#598

Co-authored-by: Rootul Patel <[email protected]>
Co-authored-by: CHAMI Rachid <[email protected]>
Co-authored-by: John Adler <[email protected]>
Co-authored-by: Ismail Khoffi <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion inherently unactionable issue for the sole purpose of discussion documentation Improvements or additions to documentation
Projects
No open projects
Status: Done
6 participants