Data processing docs #272

Merged
merged 6 commits into from
Oct 17, 2024
1 change: 0 additions & 1 deletion .gitignore
@@ -33,5 +33,4 @@ __pycache__
.arguments*.json

corelib
*.svg
*.txt
28 changes: 6 additions & 22 deletions README.md
@@ -15,26 +15,9 @@ Shinigami is a zero-knowledge Bitcoin client implemented in Cairo. It aims to pr

> **Disclaimer:** This project is in the early stages of development and should not be used in production. It will evolve rapidly, expect breaking changes.

```mermaid
flowchart TB
Pnm1(STARK proof of the chain state up to the block <i>n - 1</i>, including utxo accumulator) --> Vp(zk verifier)
Bn(blocks <i>n..m</i>) ----> Vb

subgraph Cairo
Vp{{STARK verifier}}-->ChS(verified chain state)
ChS --> Vb{{validate block<br>against the chain state}}
Vb --> ChS
end

Vb --> Pn(STARK proof of the chain state up to the block <i>m</i>,<br> including utxo accumulator)

style Bn fill:pink
style Pn fill:lightgreen
style Pnm1 fill:lightgreen
style ChS fill:greenyellow
style Vp fill:gold
style Vb fill:gold
```
<p align="center" width="100%">
<img src="./docs/img/client.svg" alt="client"/>
</p>

At its core, the consensus client accepts two inputs: a batch of consecutive blocks <i>n</i> to <i>m</i> and a STARK proof of the state of the chain up to block <i>n−1</i>. It ensures that the historical chain state is valid by verifying the STARK proof. Then it produces a new chain state by applying the new blocks on top of the historical state. As a result, a proof of the new state is generated.

@@ -154,11 +137,12 @@ pip install -r scripts/data/requirements.txt

## References

* [Data processing notes](./data/data.md)
* [ZeroSync](https://github.com/ZeroSync/ZeroSync)
* [Shinigami Script](https://github.com/keep-starknet-strange/shinigami)
* [STWO](https://github.com/starkware-libs/stwo)
* [Cairo](https://www.cairo-lang.org/)
* [Circle STARK paper](https://eprint.iacr.org/2024/278)
* [ZeroSync](https://github.com/ZeroSync/ZeroSync)
* [Shinigami](https://github.com/keep-starknet-strange/shinigami)

## Contributors ✨

19 changes: 19 additions & 0 deletions docs/data.md
@@ -0,0 +1,19 @@
# Data processing notes
Generating input for the [validate_and_apply](../packages/consensus/src/types/chain_state.cairo#L62) function requires gathering data from several sources.

## ChainState and Block
Generating ChainState and Block data involves joining information across multiple blocks and transactions. Since this kind of operation is slow with Bitcoin RPC, we use the Google Bitcoin data set, which allows us to export data with plain SQL. Unfortunately, due to the [missing transaction_index](https://github.com/blockchain-etl/bitcoin-etl/issues/47) bug in the data set, it can't be the only source of data.

<p align="center" width="100%">
<img src="./img/data.svg" alt="client"/>
</p>

Input data is processed in multiple steps:
1. The [previous_timestamps.sql](../scripts/data/previous_timestamps.sql) and [previous_utxos.sql](../scripts/data/previous_utxos.sql) queries dump data into GCS.
2. The timestamp data dump is processed by the [generate_timestamp_data.py](../scripts/data/generate_timestamp_data.py) script. Data is downloaded from GCS and index files are created. The index maps a block number to that block's timestamp-related data. The index is split into smaller files so that each one can be loaded into memory quickly.
3. The UTXO data dump is processed by the [generate_utxo_data.py](../scripts/data/generate_utxo_data.py) script. Data is downloaded from GCS and the data files are split into smaller chunks, each containing data about several blocks. Index files are created. The index maps a block number to a chunk file and is itself split into smaller files.
4. After the data dump processing is complete, the functions [`get_timestamp_data`](../scripts/data/generate_timestamp_data.py#L88) and [`get_utxo_set`](../scripts/data/generate_utxo_data.py#L125) give access to the per-block data.
5. The [generate_data](../scripts/data/generate_data.py) script generates data that can be consumed by the `validate_and_apply` function.
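
The sharded index described in steps 2 and 3 can be illustrated with a minimal sketch. This is not the code from `generate_utxo_data.py`: the shard size, the file naming, the JSON format, and the function names `write_index` and `lookup_chunk` are all assumptions made for illustration. It only shows the idea of an index that maps a block number to a chunk file while being stored as several small files.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict

# Hypothetical value: number of consecutive blocks covered by one index file.
INDEX_SHARD_SIZE = 1000


def shard_path(index_dir: Path, block_number: int) -> Path:
    """Return the index file responsible for the given block number."""
    first_block = (block_number // INDEX_SHARD_SIZE) * INDEX_SHARD_SIZE
    return index_dir / f"index_{first_block}.json"


def write_index(index_dir: Path, block_to_chunk: Dict[int, str]) -> None:
    """Split a block -> chunk file mapping into small per-range index files."""
    index_dir.mkdir(parents=True, exist_ok=True)
    shards: Dict[Path, Dict[str, str]] = {}
    for block_number, chunk_file in block_to_chunk.items():
        shard = shards.setdefault(shard_path(index_dir, block_number), {})
        shard[str(block_number)] = chunk_file  # JSON object keys must be strings
    for path, entries in shards.items():
        path.write_text(json.dumps(entries))


def lookup_chunk(index_dir: Path, block_number: int) -> str:
    """Load only the one small index file that covers the block."""
    entries = json.loads(shard_path(index_dir, block_number).read_text())
    return entries[str(block_number)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        index_dir = Path(tmp)
        # Hypothetical chunk file names.
        write_index(index_dir, {0: "chunk_0.json", 1000: "chunk_10.json"})
        print(lookup_chunk(index_dir, 1000))
```

Because a lookup reads only the index file covering the requested block, its cost depends on the shard size rather than on the total number of blocks, which is the motivation given above for splitting the index.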

## UtxoSet
tbd