BFT-475: Add BatchVotePublisher and expose it to the Executor #137

aakoshh · 2024-06-26T16:06:59Z

What ❔

Adds a BatchVotePublisher which will be used in the Executor to push L1 batch votes to the gossip layer.

Why ❔

So that we can insert the nodes attestations over L1 batches into vote collection, which automatically gets gossiped to connected peers in Network::run_stream.

node/actors/executor/src/lib.rs

## What ❔ - [x] Add an `l1_batches_consensus` table to hold [L1 batch Quorum Certificates](https://github.com/matter-labs/era-consensus/blob/177881457f392fca990dbb3df1695737d90fd0c7/node/libs/roles/src/attester/messages/batch.rs#L67) from Attesters - [x] Add attesters to the config - [x] Implement methods in `PersistentBatchStore` - [x] `persisted` - [x] `last_batch` - [x] `last_batch_qc` - [x] `get_batch` - [x] `get_batch_qc` - [x] `store_qc` - [ ] `queue_next_batch` - _not going to implement for now_ - [ ] assign `SyncBatch::proof` - _not going to implement for now_ - [x] Add tests for all new methods in `ConsensusDal` and the `PersistentBatchStore` ### Caveat Implemented the updating of `persisted` with a loop that polls the database for newly available `SyncBatch` records, even if they have no proof. This inevitably triggers the gossiping of batch statuses and the pulling of `SyncBatch` between peers. For this reason `queue_next_batch` just drop the data, since we can't do anything with it without the proof yet. Returning an error or panicking would stop the consensus tasks. I ended up disabling the `persisted` by leaving its dummy implementation in place because when enabled the full node tests keep going on forever, printing the following logs in a loop: ```console ❯ RUST_LOG=info zk test rust test_full_nodes --no-capture ... 2024-07-03T14:22:57.882784Z INFO in{addr=[::1]:53082}: zksync_consensus_network: 191: new connection 2024-07-03T14:22:57.883457Z INFO in{addr=[::1]:53082}:gossip: zksync_consensus_network::gossip::runner: 383: peer = node:public:ed25519:068ffa0b3fedbbe5c2a6da3defd26e0d084248f12bfe98db85f7785b0b08b63e 2024-07-03T14:22:57.883764Z INFO out{addr="[::1]:52998"}:gossip: zksync_consensus_network::gossip::runner: 416: peer = node:public:ed25519:7710ed90aad9f5859dfba06e13fb4e6fb0fe4d686f81f9d819464ad1fdc371bd 2024-07-03T14:22:57.886204Z INFO in{addr=[::1]:53082}:gossip: zksync_consensus_network::rpc: 222: message too large: max = 10240B, got 13773B 2024-07-03T14:22:57.886280Z INFO out{addr="[::1]:52998"}:gossip: zksync_consensus_network::rpc: 222: message too large: max = 10240B, got 13773B 2024-07-03T14:22:57.886633Z INFO in{addr=[::1]:53082}:gossip: zksync_consensus_network::rpc: 222: canceled ... 2024-07-03T14:22:57.888143Z INFO out{addr="[::1]:52998"}:gossip: zksync_consensus_network::rpc: 222: disconnected ... 2024-07-03T14:22:57.888390Z INFO zksync_consensus_network: 216: [::1]:53082: gossip.run_inbound_stream(): push_batch_store_state.: end of stream 2024-07-03T14:22:57.888446Z INFO zksync_consensus_network: 158: gossip.run_outbound_stream("[::1]:52998"): push_batch_store_state.: end of stream ``` So in the tests the message size exceeds the maximum. I think it's [hardcoded here](https://github.com/matter-labs/era-consensus/blob/decb988eb9e1a45fd5171d2cc540a360d9ca5f1f/node/actors/network/src/gossip/runner.rs#L109). Since this functionality isn't expected to work, I think we can disable it for now. ## Why ❔ The workflow of signing and submitting L1 batch certificates will be like this: 1. Data is inserted into the `l1_batches` table. 2. If the node is one of the Attesters it picks up the batch, signs and sends it to the gossip layer via matter-labs/era-consensus#137 3. The consensus collects votes about the L1 batch, and when the threshold is reached it saves the quorum certificate into Postgres 4. The node monitors Main Node (later L1) for new batch QCs and upserts them into the database (the QC can be different than what a particular node inserted based on gossip). This way a node which has been down for a period of time can backfill any QCs it missed. It is assumed that the Main Node API only serves QCs that have no gaps following them, ie. they are final - if it was L1 it wouldn't allow submissions with gaps, and this simulates that semantic. 5. The last height that doesn't have any gaps following it is used as a floor for what needs to be (re)signed and gossiped This PR supports the above workflow up to step 3. ## Checklist   - [x] PR title corresponds to the body of PR (we generate changelog entries from PRs). - [x] Tests for the changes have been added / updated. - [x] Documentation comments have been added / updated. - [x] Code has been formatted via `zk fmt` and `zk lint`.

## What ❔ - [x] Add an `l1_batches_consensus` table to hold [L1 batch Quorum Certificates](https://github.com/matter-labs/era-consensus/blob/177881457f392fca990dbb3df1695737d90fd0c7/node/libs/roles/src/attester/messages/batch.rs#L67) from Attesters - [x] Add attesters to the config - [x] Implement methods in `PersistentBatchStore` - [x] `persisted` - [x] `last_batch` - [x] `last_batch_qc` - [x] `get_batch` - [x] `get_batch_qc` - [x] `store_qc` - [ ] `queue_next_batch` - _not going to implement for now_ - [ ] assign `SyncBatch::proof` - _not going to implement for now_ - [x] Add tests for all new methods in `ConsensusDal` and the `PersistentBatchStore` ### Caveat Implemented the updating of `persisted` with a loop that polls the database for newly available `SyncBatch` records, even if they have no proof. This inevitably triggers the gossiping of batch statuses and the pulling of `SyncBatch` between peers. For this reason `queue_next_batch` just drop the data, since we can't do anything with it without the proof yet. Returning an error or panicking would stop the consensus tasks. I ended up disabling the `persisted` by leaving its dummy implementation in place because when enabled the full node tests keep going on forever, printing the following logs in a loop: ```console ❯ RUST_LOG=info zk test rust test_full_nodes --no-capture ... 2024-07-03T14:22:57.882784Z INFO in{addr=[::1]:53082}: zksync_consensus_network: 191: new connection 2024-07-03T14:22:57.883457Z INFO in{addr=[::1]:53082}:gossip: zksync_consensus_network::gossip::runner: 383: peer = node:public:ed25519:068ffa0b3fedbbe5c2a6da3defd26e0d084248f12bfe98db85f7785b0b08b63e 2024-07-03T14:22:57.883764Z INFO out{addr="[::1]:52998"}:gossip: zksync_consensus_network::gossip::runner: 416: peer = node:public:ed25519:7710ed90aad9f5859dfba06e13fb4e6fb0fe4d686f81f9d819464ad1fdc371bd 2024-07-03T14:22:57.886204Z INFO in{addr=[::1]:53082}:gossip: zksync_consensus_network::rpc: 222: message too large: max = 10240B, got 13773B 2024-07-03T14:22:57.886280Z INFO out{addr="[::1]:52998"}:gossip: zksync_consensus_network::rpc: 222: message too large: max = 10240B, got 13773B 2024-07-03T14:22:57.886633Z INFO in{addr=[::1]:53082}:gossip: zksync_consensus_network::rpc: 222: canceled ... 2024-07-03T14:22:57.888143Z INFO out{addr="[::1]:52998"}:gossip: zksync_consensus_network::rpc: 222: disconnected ... 2024-07-03T14:22:57.888390Z INFO zksync_consensus_network: 216: [::1]:53082: gossip.run_inbound_stream(): push_batch_store_state.: end of stream 2024-07-03T14:22:57.888446Z INFO zksync_consensus_network: 158: gossip.run_outbound_stream("[::1]:52998"): push_batch_store_state.: end of stream ``` So in the tests the message size exceeds the maximum. I think it's [hardcoded here](https://github.com/matter-labs/era-consensus/blob/decb988eb9e1a45fd5171d2cc540a360d9ca5f1f/node/actors/network/src/gossip/runner.rs#L109). Since this functionality isn't expected to work, I think we can disable it for now. ## Why ❔ The workflow of signing and submitting L1 batch certificates will be like this: 1. Data is inserted into the `l1_batches` table. 2. If the node is one of the Attesters it picks up the batch, signs and sends it to the gossip layer via matter-labs/era-consensus#137 3. The consensus collects votes about the L1 batch, and when the threshold is reached it saves the quorum certificate into Postgres 4. The node monitors Main Node (later L1) for new batch QCs and upserts them into the database (the QC can be different than what a particular node inserted based on gossip). This way a node which has been down for a period of time can backfill any QCs it missed. It is assumed that the Main Node API only serves QCs that have no gaps following them, ie. they are final - if it was L1 it wouldn't allow submissions with gaps, and this simulates that semantic. 5. The last height that doesn't have any gaps following it is used as a floor for what needs to be (re)signed and gossiped This PR supports the above workflow up to step 3. ## Checklist   - [x] PR title corresponds to the body of PR (we generate changelog entries from PRs). - [x] Tests for the changes have been added / updated. - [x] Documentation comments have been added / updated. - [x] Code has been formatted via `zk fmt` and `zk lint`. --------- Co-authored-by: Bruno França <[email protected]>

aakoshh changed the title ~~BFT-475: Add BatchVotesWatch to the Executor~~ BFT-475: Add BatchVotePublisher and expose it to the Executor Jun 26, 2024

aakoshh marked this pull request as ready for review June 26, 2024 18:45

aakoshh requested a review from pompon0 as a code owner June 26, 2024 18:45

BFT-475: Add BatchVotePublisher

ecda871

aakoshh force-pushed the bft-475-gossip-attestation branch from 39da8a6 to ecda871 Compare June 26, 2024 18:47

pompon0 approved these changes Jun 26, 2024

View reviewed changes

node/actors/executor/src/lib.rs Show resolved Hide resolved

brunoffranca and others added 3 commits June 27, 2024 03:02

Merge branch 'main' into bft-475-gossip-attestation

e115a7c

Merge branch 'main' into bft-475-gossip-attestation

c3136d4

Merge branch 'main' into bft-475-gossip-attestation

6951b72

aakoshh merged commit 1778814 into main Jun 27, 2024
5 checks passed

aakoshh deleted the bft-475-gossip-attestation branch June 27, 2024 09:42

aakoshh mentioned this pull request Jun 27, 2024

feat: L1 batch QC database (BFT-476) matter-labs/zksync-era#2340

Merged

16 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BFT-475: Add BatchVotePublisher and expose it to the Executor #137

BFT-475: Add BatchVotePublisher and expose it to the Executor #137

aakoshh commented Jun 26, 2024 •

edited

Loading

BFT-475: Add BatchVotePublisher and expose it to the Executor #137

BFT-475: Add BatchVotePublisher and expose it to the Executor #137

Conversation

aakoshh commented Jun 26, 2024 • edited Loading

What ❔

Why ❔

aakoshh commented Jun 26, 2024 •

edited

Loading