Add docs on data retrievability and pruning #1123

Merged
merged 13 commits into from
Oct 11, 2023
30 changes: 30 additions & 0 deletions docs/developers/retrievability.md
@@ -0,0 +1,30 @@
---
sidebar_label: Data retrievability and pruning
---

# Data retrievability and pruning

The purpose of data availability layers such as Celestia is to ensure that block data is provably published to the Internet, so that applications and rollups can know what the state of their chain is, and store that data. Once the data is published, data availability layers [do not inherently guarantee that historical data will be permanently stored](https://notes.ethereum.org/@vbuterin/proto_danksharding_faq#If-data-is-deleted-after-30-days-how-would-users-access-older-blobs) and be retrievable.

In this document, we discuss the state of data retrievability and pruning in Celestia, as well as some tips for rollup developers in order to ensure that syncing new rollup nodes is possible.

## Data retrievability and pruning in celestia-node

Celestia-node's main branch does not currently support pruning, and therefore all bridge and full storage nodes currently store and serve all historical data by default, and act as **archival nodes**.

However, support for **pruned nodes** exists in an [experimental feature branch](https://github.com/celestiaorg/celestia-node/pull/2738) that is expected to land in main soon after mainnet. The data recency window during which pruned nodes will store data blobs is currently proposed to be **30 days**.

Data blobs older than the recency window will be pruned by pruned nodes, but will continue to be stored by archival nodes that do not prune data. Light nodes will be able to query historic blob data in namespaces from archival nodes, as long as archival nodes exist on the public network.

When a data recency window is established, light nodes will only perform data availability sampling for blocks within the data recency window.
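
The rules in this section can be sketched as a few predicates. This is a minimal illustration, not celestia-node code: the function names are hypothetical, the 30-day value is only the currently proposed window, and treating a block exactly 30 days old as still inside the window is an assumption.

```python
from datetime import datetime, timedelta, timezone

# Assumption: 30 days is the proposed window from the text, not a finalized parameter.
RECENCY_WINDOW = timedelta(days=30)


def within_recency_window(block_time, now, window=RECENCY_WINDOW):
    """Return True if the block's timestamp is no older than the window."""
    return now - block_time <= window


def should_prune(block_time, now, is_archival, window=RECENCY_WINDOW):
    """Archival nodes never prune; pruned nodes drop blobs older than the window."""
    return (not is_archival) and not within_recency_window(block_time, now, window)


def should_sample(block_time, now, window=RECENCY_WINDOW):
    """Light nodes only sample blocks that fall inside the window."""
    return within_recency_window(block_time, now, window)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    old_block = now - timedelta(days=45)
    print(should_prune(old_block, now, is_archival=False))
    print(should_sample(old_block, now))
```

Under these assumptions, a 45-day-old block would be pruned by a pruned node, kept by an archival node, and skipped by light-node sampling.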

## Suggested practices for rollups

Rollups may need to access historic data in order to allow new rollup nodes to reconstruct the latest state by replaying historic blocks. Once data has been published on Celestia and guaranteed to have been made available, rollups and applications are responsible for storing their historical data.

While it is possible to keep retrieving this data with the `GetAll` API method in celestia-node on historic blocks, as long as archival nodes exist on the public Celestia network, rollup developers should not rely on this as the only method to access historical data, because archival nodes are not guaranteed to serve requests for historical data for free. Below are some other suggested methods to access historical data.

Review comment (Member):

One idea I’ve been discussing with Mary is the ability to plug various data sources into the node. In the same way that we have multiple options to get data from p2p, we can also allow developers to plug in CDNs, cloud providers, and web3 storage providers. This is similar to Espresso’s marketed Tiramisu, with the difference that we already have the required foundation implemented. If we enable plugging in various standardized Getters, there will be less need for rollup devs to migrate off `GetAll` and implement custom interfaces and data retrieval. They would be able to just use standardized data sources in our ecosystem.


* **Use professional archival node or data providers.** It is expected that professional infrastructure providers will provide paid access to archival nodes, where historical data can be retrieved, for example using the `GetAll` API method. This provides better guarantees than solely relying on free archival nodes on the public Celestia network.
* **Share snapshots of rollup nodes.** Rollups could share snapshots of their data directories which can be downloaded manually by users bootstrapping new nodes. These snapshots could contain the latest state of the rollup, and/or all the historical blocks.
* **Add peer-to-peer support for historical block sync.** A less manual alternative to sharing snapshots: rollup nodes could implement built-in support for block sync, downloading historical block data from each other over a peer-to-peer network.
* [**Namespace pinning.**](https://github.com/celestiaorg/celestia-node/issues/2830) In the future, celestia-node is expected to allow nodes to choose to "pin" data from selected namespaces that they wish to store and make available for other nodes. This will allow rollup nodes to be responsible for the storage of their data, without needing to implement their own peer-to-peer historical block sync mechanism.
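
One way a rollup node might combine the options above is to try several sources in order and use the first that answers. The sketch below is a hypothetical illustration of that fallback pattern only: `fetch_historical_blobs`, `DataUnavailable`, and the source callables are made-up names, and no real celestia-node API is called. Each source is assumed to be a function taking a block height and a namespace and returning a list of blobs, `None`, or raising an error.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical source signature: (height, namespace) -> blobs, or None if it has no data.
BlobSource = Callable[[int, bytes], Optional[Sequence[bytes]]]


class DataUnavailable(Exception):
    """Raised when no configured source could return the requested blobs."""


def fetch_historical_blobs(
    height: int,
    namespace: bytes,
    sources: Sequence[Tuple[str, BlobSource]],
) -> Tuple[str, List[bytes]]:
    """Try each (name, source) pair in order and return the first result."""
    errors = []
    for name, source in sources:
        try:
            blobs = source(height, namespace)
        except Exception as exc:  # a failing source should not stop the fallback chain
            errors.append(f"{name}: {exc}")
            continue
        if blobs is not None:
            # An empty list counts as an answer: the namespace had no blobs at this height.
            return name, list(blobs)
        errors.append(f"{name}: no data")
    raise DataUnavailable("; ".join(errors))
```

A caller would pass an ordered list such as a free public archival node first, then a paid provider, then a local snapshot reader, so that losing any single source does not prevent new nodes from syncing.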
5 changes: 5 additions & 0 deletions sidebars.js
@@ -468,6 +468,11 @@ const sidebars = {
label: "Integrate Celestia",
id: "developers/integrate-celestia",
},
{
type: "doc",
label: "Data retrievability and pruning",
id: "developers/retrievability",
},
],
community: [
{ type: "doc", label: "Overview", id: "community/overview" },