Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lazy-adr: Add Data Availability library #170

Merged
merged 21 commits into from
Mar 5, 2021
Merged

lazy-adr: Add Data Availability library #170

merged 21 commits into from
Mar 5, 2021

Conversation

liamsi
Copy link
Member

@liamsi liamsi commented Mar 1, 2021

Description

This PR will contain the "DA lib" accompanied with an ADR describing the design as well as providing some context.

Rendered ADR: https://github.com/lazyledger/lazyledger-core/blob/ismail/da_lib_adr/docs/lazy-adr/adr-002-ipds-da-sampling.md

related to: #85, #163

@liamsi liamsi force-pushed the ismail/da_lib_adr branch from f661921 to 1e95a71 Compare March 1, 2021 11:26
@liamsi liamsi requested a review from evan-forbes March 1, 2021 14:07
Comment on lines +185 to +193
### A Note on IPFS/IPLD

In IPFS all data is _content addressed_ which basically means the data is identified by its hash.
Particularly, in the LazyLedger case, the root CID identifies the Namespaced Merkle tree including all its contents (inner and leaf nodes).
This means that if a `GetLeafData` request succeeds, the retrieved leaf data is in fact the leaf data in the tree.
We do not need to additionally verify Merkle proofs per leaf as this will essentially be done via IPFS on each layer while
resolving and getting to the leaf data.

> TODO: validate this assumption and link to code that shows how this is done internally
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or do we want to explicitly verify proofs either way? To not rely on the fact that ipfs in combination with our plugin handles this correctly?

cc @musalbas @adlerjohn

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's safer and more idiot-proof if GetLeafDataonly succeeds if the proof is valid. Anyway, I thought it would only succeed with IPFS is the proof is valid, with the custom hasher?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway, I thought it would only succeed with IPFS is the proof is valid, with the custom hasher?

Yes, that is my understanding as well. For every retrieved leaf, the proof nodes should also be resolved and validated on its path down.

@liamsi liamsi requested review from adlerjohn and musalbas March 2, 2021 08:44
Comment on lines +138 to +141
// The context can be used to provide a timeout.
// TODO: Should there be a constant = lower bound for #samples
func ValidateAvailability(
ctx contex.Context,
Copy link
Member Author

@liamsi liamsi Mar 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should consider moving these Context objects lower down the stack too.

@liamsi liamsi changed the title Add DA library Add DA library ADR Mar 2, 2021
@liamsi liamsi marked this pull request as ready for review March 2, 2021 09:54
@liamsi liamsi requested a review from tac0turtle as a code owner March 2, 2021 09:54
Copy link
Member

@musalbas musalbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

docs/lazy-adr/adr-002-ipds-da-sampling.md Outdated Show resolved Hide resolved
````go
// This constructs an IPFS node instance
node, _ := core.NewNode(ctx, nodeOptions)
// This attaches the Core API to the constructed node
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if you don't attach a core API?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also pass around the node object directly, or simply the DAG field's ipld.DAGService. In the former case it would just be less pluggable (as we are passing around a concrete object instead of an interface).

docs/lazy-adr/adr-002-ipds-da-sampling.md Outdated Show resolved Hide resolved
// to process the leaf data the moment it was validated.
// The context can be used to provide a timeout.
// TODO: Should there be a constant = lower bound for #samples
func ValidateAvailability(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could block for a few minutes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is definitely something that should be done asynchronously.

docs/lazy-adr/adr-002-ipds-da-sampling.md Outdated Show resolved Hide resolved
docs/lazy-adr/adr-002-ipds-da-sampling.md Outdated Show resolved Hide resolved
docs/lazy-adr/adr-002-ipds-da-sampling.md Outdated Show resolved Hide resolved
docs/lazy-adr/adr-002-ipds-da-sampling.md Outdated Show resolved Hide resolved
docs/lazy-adr/adr-002-ipds-da-sampling.md Outdated Show resolved Hide resolved
docs/lazy-adr/adr-002-ipds-da-sampling.md Outdated Show resolved Hide resolved
docs/lazy-adr/adr-002-ipds-da-sampling.md Outdated Show resolved Hide resolved
// Specifically all steps of the the protocol described in section
// _5.2 Random Sampling and Network Block Recovery_ are carried out.
//
// In more detail it will first create numSamples random unique coordinates.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: add that the domain for coordinates can excludes parts of the original data square (and extended rows!) based on the number of "real" shares in a block, i.e. the availableDataOriginalSharesUsed field in the header https://github.com/lazyledger/lazyledger-specs/blob/10732d7a258a0b64dfccf96fd863830faca73ce3/specs/data_structures.md#header

// to process the leaf data the moment it was validated.
// The context can be used to provide a timeout.
// TODO: Should there be a constant = lower bound for #samples
func ValidateAvailability(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is definitely something that should be done asynchronously.

docs/lazy-adr/adr-002-ipds-da-sampling.md Outdated Show resolved Hide resolved
// Note, that this method could also return the row and column roots.
// Tha caller is responsible for making sure that the leaves are sorted by namespace ID.
// The data will be pinned by default.
func PutLeaves(ctx contex.Context, namespacedLeaves [][]byte) error
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going to be passing the IPFS node object, then I think PutLeaves will need a format.NodeAdder argument, as it will not have access to the IPFS node object.

@evan-forbes
Copy link
Member

evan-forbes commented Mar 2, 2021

I'm not sure if defining the API perfectly is in the scope of this ADR, but I think the API will have to change to accommodate some form of access to the IPFS node object if we're passing it around. If we use one of the alternatives mentioned, then the API specified still might have to be modified slightly, but that depends on which alternative we end up going with.

@evan-forbes
Copy link
Member

evan-forbes commented Mar 4, 2021

I posted a draft for the writing portion of this ADR at #178. The PR incorporates PutLeaves into PutBlock, as that allows for more optimization in batch adding the ipld nodes to the dag. It also adds an IPFS core object as an argument to PutBlock. Notably, the PR requires the erasure data be computed twice, as it is not currently cached, but that can change.

@liamsi
Copy link
Member Author

liamsi commented Mar 4, 2021

I'm not sure if defining the API perfectly is in the scope of this ADR

I mostly created this ADR such that we have a basis on which to discuss API alternatives. If no major shortcomings in the ADR, we can use it as a blueprint to start the implementation. It is definitely not complete and it won't be until we wrapped up the implementation. I do not think it is realistic to define the APIs perfectly without drafting at least "spikes" / "tracer bullet" implementations.

Actually, my suggestion is to merge this if it is sound and sane from a high-level pov. Because then, you and I can make the modifications on portions of the ADR while we work on the implementation in separate branches. The moment, we want to merge a PR that implements a part of this, we need to make sure the ADR matches the implementation API (and we changed the status from Proposed to Implemented). At least that was the idea. I'm open to other suggestions.

@liamsi liamsi changed the title Add DA library ADR lazy-adr: Add Data Availability library Mar 4, 2021
Copy link
Member

@evan-forbes evan-forbes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, any changes needed during implementation can be made then. Two 👍 from me. 🚀

@liamsi liamsi mentioned this pull request Mar 5, 2021
5 tasks
@liamsi
Copy link
Member Author

liamsi commented Mar 5, 2021

OK, let's merge this then! I've captured a bunch of smaller todos here: #179

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants