keep-starknet-strange · maciejka · Oct 17, 2024 · Oct 16, 2024 · Oct 16, 2024 · Oct 16, 2024
@@ -33,5 +33,4 @@ __pycache__
 .arguments*.json
 
 corelib
-*.svg
 *.txt
@@ -15,26 +15,9 @@ Shinigami is a zero-knowledge Bitcoin client implemented in Cairo. It aims to pr
 
 > **Disclaimer:** This project is in the early stages of development and should not be used in production. It will evolve rapidly, expect breaking changes.
 
-```mermaid
-flowchart TB
-Pnm1(STARK proof of the chain state up to the block <i>n - 1</i>, including utxo accumulator) --> Vp(zk verifier)
-Bn(blocks <i>n..m</i>) ----> Vb
-
-subgraph Cairo
-    Vp{{STARK verifier}}-->ChS(verified chain state)
-    ChS --> Vb{{validate block<br>against the chain state}}
-    Vb --> ChS
-end
-
-Vb --> Pn(STARK proof of the chain state up to the block <i>m</i>,<br> including utxo accumulator)
-
-style Bn fill:pink
-style Pn fill:lightgreen
-style Pnm1 fill:lightgreen
-style ChS fill:greenyellow
-style Vp fill:gold
-style Vb fill:gold
-```
+<p align="center" width="100%">
+  <img src="./docs/img/client.svg" alt="client"/>
+</p>
 
 At its core, consensus client accepts two inputs: a batch of consecutive blocks <i>n</i> to <i>m</i> and a STARK proof of the state of the chain up to block <i>n−1</i>. It ensures that the historical chain state is valid by verifying the STARK proof. Then, it produces a new chain state by applying the new blocks on top of the historical state. As a result, a proof of the new state is generated.
 
@@ -154,11 +137,12 @@ pip install -r scripts/data/requirements.txt
 
 ## References
 
+* [Data processing notes](./data/data.md)
+* [ZeroSync](https://github.com/ZeroSync/ZeroSync)
+* [Shinigami Script](https://github.com/keep-starknet-strange/shinigami)
 * [STWO](https://github.com/starkware-libs/stwo)
 * [Cairo](https://www.cairo-lang.org/)
 * [Circle STARK paper](https://eprint.iacr.org/2024/278)
-* [ZeroSync](https://github.com/ZeroSync/ZeroSync)
-* [Shinigami](https://github.com/keep-starknet-strange/shinigami)
 
 ## Contributors ✨
 

@@ -0,0 +1,19 @@
+# Data processing notes
+In order to generate input to the [validate_and_apply](../packages/consensus/src/types/chain_state.cairo#L62) function, a lot of data needs to be gathered. 
+
+## ChainState and Block
+Generating ChainState and Block data involves joining information across multiple blocks and transactions. Since this kind of operation is slow over Bitcoin RPC, we use the Google Bitcoin data set, which allows us to export data with plain SQL. Unfortunately, due to the [missing transaction_index](https://github.com/blockchain-etl/bitcoin-etl/issues/47) bug in the data set, it can't be the only source of data.
+
+<p align="center" width="100%">
+  <img src="./img/data.svg" alt="client"/>
+</p>
+
+Input data is processed in multiple steps:
+1. The [previous_timestamps.sql](../scripts/data/previous_timestamps.sql) and [previous_utxos.sql](../scripts/data/previous_utxos.sql) queries dump data into GCS.
+2. The timestamp data dump is processed by the [generate_timestamp_data.py](../scripts/data/generate_timestamp_data.py) script. Data is downloaded from GCS and index files are created. The index maps a block number to per-block timestamp-related data. The index is broken down into smaller files so that it can be loaded into memory quickly.
+3. The UTXO data dump is processed by the [generate_utxo_data.py](../scripts/data/generate_utxo_data.py) script. Data is downloaded from GCS and the data files are broken down into smaller chunks, each containing data about several blocks. Index files are then created. The index maps a block number to a chunk file and is itself broken down into smaller files.
+4. Once the data dumps have been processed, the functions [`get_timestamp_data`](../scripts/data/generate_timestamp_data.py#L88) and [`get_utxo_set`](../scripts/data/generate_utxo_data.py#L125) give access to the per-block data.
+5. The [generate_data](../scripts/data/generate_data.py) script generates data that can be consumed by the `validate_and_apply` function.
+
+## UtxoSet
+tbd
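
The sharded index described in steps 2 and 3 of the data processing notes (an index that maps a block number to a chunk file and is itself split into smaller files) could look roughly like the sketch below. This is not the code from `generate_utxo_data.py`: the shard size, the file naming scheme, and the function names `index_path`, `write_index`, and `lookup_chunk` are all hypothetical and chosen for illustration only.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path

# Hypothetical shard size: how many consecutive block numbers share one index file.
BLOCKS_PER_INDEX_FILE = 1000


def index_path(index_dir: Path, block_number: int) -> Path:
    """Return the (hypothetical) index shard file covering the given block number."""
    shard = block_number // BLOCKS_PER_INDEX_FILE
    return index_dir / f"index_{shard}.json"


def write_index(index_dir: Path, block_to_chunk: dict[int, str]) -> None:
    """Split a block-number -> chunk-file mapping into small per-shard JSON files."""
    shards: dict[Path, dict[str, str]] = {}
    for block_number, chunk_file in block_to_chunk.items():
        path = index_path(index_dir, block_number)
        # JSON object keys must be strings, so the block number is stringified.
        shards.setdefault(path, {})[str(block_number)] = chunk_file
    for path, entries in shards.items():
        path.write_text(json.dumps(entries))


def lookup_chunk(index_dir: Path, block_number: int) -> str | None:
    """Load only the shard covering block_number and return its chunk file, if any."""
    path = index_path(index_dir, block_number)
    if not path.exists():
        return None
    return json.loads(path.read_text()).get(str(block_number))


def demo() -> tuple[str | None, str | None, int]:
    """Build a tiny index in a temporary directory and query it."""
    with tempfile.TemporaryDirectory() as tmp:
        index_dir = Path(tmp)
        write_index(
            index_dir,
            {999: "chunk_0.json", 1000: "chunk_1.json", 2500: "chunk_1.json"},
        )
        shard_count = len(list(index_dir.glob("index_*.json")))
        return lookup_chunk(index_dir, 1000), lookup_chunk(index_dir, 42), shard_count
```

Because a lookup reads only the one small shard that covers the requested block, memory use stays bounded regardless of how many blocks the full index spans, which appears to be the motivation for splitting the index in the notes above.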