State Storage - State (#571)
staffik authored Nov 6, 2024
1 parent 1ee5a74 commit 6c98441
Showing 1 changed file with 106 additions and 0 deletions.
106 changes: 106 additions & 0 deletions neps/nep-0568.md
@@ -107,6 +107,28 @@ Splitting a shard's Flat State is performed in multiple steps:
snapshots and to reload Mem Tries.

### State Storage - State

// TODO Describe integration with cold storage once design is ready

Each shard’s Trie is stored in the `State` column of the database, with keys prefixed by `ShardUId`, followed by a node's hash.
This structure uniquely identifies each shard’s data. To avoid copying all entries under a new `ShardUId` during resharding,
a mapping strategy allows child shards to access ancestor shard data without directly creating new entries.

A naive approach to resharding would copy all `State` entries under a new `ShardUId` for each child shard, effectively duplicating the state.
This method, while straightforward, is not feasible: resharding needs to appear complete between two blocks,
and copying a large state cannot finish within that window.

To address this, Resharding V3 employs an efficient mapping strategy, using the `DBCol::StateShardUIdMapping` column
to link each child shard’s `ShardUId` to the `ShardUId` of the closest ancestor holding the relevant data.
This allows child shards to access and update state data under the ancestor shard’s prefix without duplicating entries.
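
The idea can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `ToyStore`, the `u32` stand-in for `ShardUId`, and the `HashMap`-backed columns are hypothetical and do not come from the actual codebase. It shows that children read and write through the ancestor's prefix, so a split adds mapping entries but copies no state.

```rust
use std::collections::HashMap;

// Simplified stand-ins for the real types (assumptions of this sketch).
type ShardUId = u32;
type NodeHash = [u8; 32];

#[derive(Default)]
struct ToyStore {
    /// Models the `State` column: (shard prefix, node hash) -> node bytes.
    state: HashMap<(ShardUId, NodeHash), Vec<u8>>,
    /// Models the mapping column: child shard -> ancestor holding its data.
    mapping: HashMap<ShardUId, ShardUId>,
}

impl ToyStore {
    /// Shards without a mapping entry resolve to themselves.
    fn resolve(&self, shard: ShardUId) -> ShardUId {
        self.mapping.get(&shard).copied().unwrap_or(shard)
    }

    /// Reads look up the node under the resolved prefix.
    fn get(&self, shard: ShardUId, hash: &NodeHash) -> Option<&Vec<u8>> {
        self.state.get(&(self.resolve(shard), *hash))
    }

    /// Writes also land under the resolved prefix, not the caller's own.
    fn put(&mut self, shard: ShardUId, hash: NodeHash, value: Vec<u8>) {
        let prefix = self.resolve(shard);
        self.state.insert((prefix, hash), value);
    }
}
```

In this model, mapping shards `2` and `3` to shard `1` makes every node stored under prefix `1` readable through either child, while the `state` map keeps a single copy of each node.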

Initially, `StateShardUIdMapping` is empty, as existing shards map to themselves. During resharding, a mapping entry is added to `StateShardUIdMapping`,
pointing each child shard’s `ShardUId` to the appropriate ancestor. Mappings persist as long as any descendant shard references the ancestor’s data.
Once a node stops tracking all children and descendants of a shard, the entry for that shard can be removed, allowing its data to be garbage collected.
For archival nodes, mappings are retained indefinitely to maintain access to the full historical state.
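
A minimal sketch of how entries could be added across several resharding events follows. The function name `add_children` and the `u32` stand-in for `ShardUId` are hypothetical. The sketch assumes that when a shard which is itself mapped gets split, its children inherit the same target, so each shard always points directly at the ancestor that holds the data.

```rust
use std::collections::HashMap;

// Simplified stand-in for the real `ShardUId` (assumption of this sketch).
type ShardUId = u32;

/// Records mapping entries for the children of `parent`.
/// If `parent` is already mapped to an ancestor, the children inherit that
/// target, so a lookup never has to follow more than one entry.
fn add_children(
    mapping: &mut HashMap<ShardUId, ShardUId>,
    parent: ShardUId,
    children: &[ShardUId],
) {
    // An unmapped parent holds its own data.
    let data_holder = mapping.get(&parent).copied().unwrap_or(parent);
    for &child in children {
        mapping.insert(child, data_holder);
    }
}
```

For example, splitting shard `0` into `1` and `2`, and later splitting `1` into `3` and `4`, leaves all four descendants mapped to `0`, because the data of shard `1` was never stored under its own prefix.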

This mapping strategy lets a resharding event take effect by adding mapping entries,
without rewriting the data already stored in the `State` column.


### Stateless Validation

@@ -134,6 +156,90 @@ Splitting a shard's Flat State is performed in multiple steps:
The section should return to the examples given in the previous section, and explain more fully how the detailed proposal makes those examples work.]
```

### State Storage - State mapping

To enable efficient shard state management during resharding, Resharding V3 uses the `DBCol::StateShardUIdMapping` column.
This mapping allows child shards to reference ancestor shard data, avoiding the need for immediate duplication of state entries.

#### Mapping application in adapters

The core of the mapping logic is applied in `TrieStoreAdapter` and `TrieStoreUpdateAdapter`, which act as layers over the general `Store` interface.
Here’s a breakdown of the key functions involved:

- **Key resolution**:

The `get_key_from_shard_uid_and_hash` function is central to determining the correct `ShardUId` for state access.
At a high level, operations use the child shard's `ShardUId`, but within this function,
the `DBCol::StateShardUIdMapping` column is checked to determine if an ancestor `ShardUId` should be used instead.

```rust
fn get_key_from_shard_uid_and_hash(
    store: &Store,
    shard_uid: ShardUId,
    hash: &CryptoHash,
) -> [u8; 40] {
    // Use the mapped ancestor `ShardUId` if one exists, otherwise the shard's own.
    let mapped_shard_uid = store
        .get_ser::<ShardUId>(DBCol::StateShardUIdMapping, &shard_uid.to_bytes())
        .expect("get_key_from_shard_uid_and_hash() failed")
        .unwrap_or(shard_uid);
    // Key layout: 8-byte `ShardUId` prefix followed by the node hash.
    let mut key = [0; 40];
    key[0..8].copy_from_slice(&mapped_shard_uid.to_bytes());
    key[8..].copy_from_slice(hash.as_ref());
    key
}
```

This function first attempts to retrieve a mapped ancestor `ShardUId` from `DBCol::StateShardUIdMapping`.
If no mapping exists, it defaults to the provided child `ShardUId`.
This resolved `ShardUId` is then combined with the node `hash` to form the final 40-byte key used in `State` column operations.

- **State access operations**:

The `TrieStoreAdapter` and `TrieStoreUpdateAdapter` use `get_key_from_shard_uid_and_hash` to correctly resolve the key for both reads and writes.
Example methods include:

```rust
// In TrieStoreAdapter
pub fn get(&self, shard_uid: ShardUId, hash: &CryptoHash) -> Result<Arc<[u8]>, StorageError> {
    // Resolve the key through the mapping, then read from the `State` column.
    let key = get_key_from_shard_uid_and_hash(self.store, shard_uid, hash);
    self.store.get(DBCol::State, &key)
}

// In TrieStoreUpdateAdapter
pub fn increment_refcount_by(
    &mut self,
    shard_uid: ShardUId,
    hash: &CryptoHash,
    data: &[u8],
    increment: NonZero<u32>,
) {
    // Writes use the same resolved key, so refcounts accumulate under the ancestor's prefix.
    let key = get_key_from_shard_uid_and_hash(self.store, shard_uid, hash);
    self.store_update.increment_refcount_by(DBCol::State, key.as_ref(), data, increment);
}
```

The `get` function retrieves data using the resolved `ShardUId` and key, while `increment_refcount_by` manages reference counts,
ensuring correct tracking even when accessing data under an ancestor shard.
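
The key resolution described above can be reproduced in isolation. The sketch below is not the adapter code: it replaces the store with a `HashMap`, and the two-`u32` little-endian layout of the stand-in `ShardUId` is an assumption made for illustration. It shows that two child shards mapped to the same ancestor produce byte-identical `State` keys for a given node hash.

```rust
use std::collections::HashMap;

/// Stand-in for `ShardUId`; the two-`u32` layout is an assumption of this sketch.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct ShardUId {
    version: u32,
    shard_id: u32,
}

impl ShardUId {
    /// Encodes the identifier as an 8-byte prefix.
    fn to_bytes(self) -> [u8; 8] {
        let mut out = [0u8; 8];
        out[0..4].copy_from_slice(&self.version.to_le_bytes());
        out[4..8].copy_from_slice(&self.shard_id.to_le_bytes());
        out
    }
}

/// Same shape as `get_key_from_shard_uid_and_hash`, with a `HashMap`
/// standing in for the mapping column.
fn state_key(
    mapping: &HashMap<ShardUId, ShardUId>,
    shard_uid: ShardUId,
    hash: &[u8; 32],
) -> [u8; 40] {
    // Fall back to the shard's own identifier when no mapping exists.
    let mapped = mapping.get(&shard_uid).copied().unwrap_or(shard_uid);
    let mut key = [0u8; 40];
    key[0..8].copy_from_slice(&mapped.to_bytes());
    key[8..].copy_from_slice(hash);
    key
}
```

Because both children resolve to the same key, a refcount increment issued through one child and a read issued through the other operate on the same `State` entry.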

#### Mapping retention and cleanup

Mappings in `DBCol::StateShardUIdMapping` persist as long as any descendant relies on an ancestor’s data.
To manage this, the `set_shard_uid_mapping` function in `TrieStoreUpdateAdapter` adds a new mapping during resharding:

```rust
fn set_shard_uid_mapping(&mut self, child_shard_uid: ShardUId, parent_shard_uid: ShardUId) {
    self.store_update.set(
        DBCol::StateShardUIdMapping,
        child_shard_uid.to_bytes().as_ref(),
        &borsh::to_vec(&parent_shard_uid).expect("Borsh serialize cannot fail"),
    )
}
```

When a node stops tracking all descendants of a shard, the associated mapping entry can be removed, allowing RocksDB to perform garbage collection.
For archival nodes, mappings are retained permanently to ensure access to the historical state of all shards.
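
One possible reading of this cleanup rule is sketched below. The function `prune_mappings`, its parameters, and the `u32` stand-in for `ShardUId` are hypothetical, and the sketch assumes the node knows which shards it currently tracks. Entries pointing at an ancestor are dropped only when none of the shards mapped to it are still tracked, and archival nodes skip cleanup entirely.

```rust
use std::collections::{HashMap, HashSet};

// Simplified stand-in for the real `ShardUId` (assumption of this sketch).
type ShardUId = u32;

/// Removes mapping entries whose ancestor no longer has any tracked
/// descendant and returns those ancestors in ascending order.
/// Their data is then eligible for garbage collection.
fn prune_mappings(
    mapping: &mut HashMap<ShardUId, ShardUId>,
    tracked: &HashSet<ShardUId>,
    is_archival: bool,
) -> Vec<ShardUId> {
    // Archival nodes retain every mapping.
    if is_archival {
        return Vec::new();
    }
    // Ancestors that still serve at least one tracked descendant.
    let live: HashSet<ShardUId> = mapping
        .iter()
        .filter(|(child, _)| tracked.contains(*child))
        .map(|(_, ancestor)| *ancestor)
        .collect();
    // Ancestors referenced only by untracked shards, deduplicated.
    let mut released: Vec<ShardUId> = mapping
        .values()
        .copied()
        .filter(|ancestor| !live.contains(ancestor))
        .collect::<HashSet<_>>()
        .into_iter()
        .collect();
    released.sort_unstable();
    // Keep every entry of a live ancestor, including untracked siblings.
    mapping.retain(|_, ancestor| live.contains(&*ancestor));
    released
}
```

Note that an entry for an untracked child is kept while a sibling is still tracked, which follows the rule that mappings persist as long as any descendant relies on the ancestor's data.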

This implementation ensures efficient and scalable shard state transitions,
allowing child shards to use ancestor data without creating redundant entries.



## Security Implications

```text