-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BFT-475: Add BatchVotePublisher and expose it to the Executor #137
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
aakoshh
changed the title
BFT-475: Add BatchVotesWatch to the Executor
BFT-475: Add BatchVotePublisher and expose it to the Executor
Jun 26, 2024
aakoshh
force-pushed
the
bft-475-gossip-attestation
branch
from
June 26, 2024 18:47
39da8a6
to
ecda871
Compare
pompon0
approved these changes
Jun 26, 2024
16 tasks
github-merge-queue bot
pushed a commit
to matter-labs/zksync-era
that referenced
this pull request
Jul 8, 2024
## What ❔ - [x] Add an `l1_batches_consensus` table to hold [L1 batch Quorum Certificates](https://github.com/matter-labs/era-consensus/blob/177881457f392fca990dbb3df1695737d90fd0c7/node/libs/roles/src/attester/messages/batch.rs#L67) from Attesters - [x] Add attesters to the config - [x] Implement methods in `PersistentBatchStore` - [x] `persisted` - [x] `last_batch` - [x] `last_batch_qc` - [x] `get_batch` - [x] `get_batch_qc` - [x] `store_qc` - [ ] `queue_next_batch` - _not going to implement for now_ - [ ] assign `SyncBatch::proof` - _not going to implement for now_ - [x] Add tests for all new methods in `ConsensusDal` and the `PersistentBatchStore` ### Caveat Implemented the updating of `persisted` with a loop that polls the database for newly available `SyncBatch` records, even if they have no proof. This inevitably triggers the gossiping of batch statuses and the pulling of `SyncBatch` between peers. For this reason `queue_next_batch` just drop the data, since we can't do anything with it without the proof yet. Returning an error or panicking would stop the consensus tasks. I ended up disabling the `persisted` by leaving its dummy implementation in place because when enabled the full node tests keep going on forever, printing the following logs in a loop: ```console ❯ RUST_LOG=info zk test rust test_full_nodes --no-capture ... 2024-07-03T14:22:57.882784Z INFO in{addr=[::1]:53082}: zksync_consensus_network: 191: new connection 2024-07-03T14:22:57.883457Z INFO in{addr=[::1]:53082}:gossip: zksync_consensus_network::gossip::runner: 383: peer = node:public:ed25519:068ffa0b3fedbbe5c2a6da3defd26e0d084248f12bfe98db85f7785b0b08b63e 2024-07-03T14:22:57.883764Z INFO out{addr="[::1]:52998"}:gossip: zksync_consensus_network::gossip::runner: 416: peer = node:public:ed25519:7710ed90aad9f5859dfba06e13fb4e6fb0fe4d686f81f9d819464ad1fdc371bd 2024-07-03T14:22:57.886204Z INFO in{addr=[::1]:53082}:gossip: zksync_consensus_network::rpc: 222: message too large: max = 10240B, got 13773B 2024-07-03T14:22:57.886280Z INFO out{addr="[::1]:52998"}:gossip: zksync_consensus_network::rpc: 222: message too large: max = 10240B, got 13773B 2024-07-03T14:22:57.886633Z INFO in{addr=[::1]:53082}:gossip: zksync_consensus_network::rpc: 222: canceled ... 2024-07-03T14:22:57.888143Z INFO out{addr="[::1]:52998"}:gossip: zksync_consensus_network::rpc: 222: disconnected ... 2024-07-03T14:22:57.888390Z INFO zksync_consensus_network: 216: [::1]:53082: gossip.run_inbound_stream(): push_batch_store_state.: end of stream 2024-07-03T14:22:57.888446Z INFO zksync_consensus_network: 158: gossip.run_outbound_stream("[::1]:52998"): push_batch_store_state.: end of stream ``` So in the tests the message size exceeds the maximum. I think it's [hardcoded here](https://github.com/matter-labs/era-consensus/blob/decb988eb9e1a45fd5171d2cc540a360d9ca5f1f/node/actors/network/src/gossip/runner.rs#L109). Since this functionality isn't expected to work, I think we can disable it for now. ## Why ❔ The workflow of signing and submitting L1 batch certificates will be like this: 1. Data is inserted into the `l1_batches` table. 2. If the node is one of the Attesters it picks up the batch, signs and sends it to the gossip layer via matter-labs/era-consensus#137 3. The consensus collects votes about the L1 batch, and when the threshold is reached it saves the quorum certificate into Postgres 4. The node monitors Main Node (later L1) for new batch QCs and upserts them into the database (the QC can be different than what a particular node inserted based on gossip). This way a node which has been down for a period of time can backfill any QCs it missed. It is assumed that the Main Node API only serves QCs that have no gaps following them, ie. they are final - if it was L1 it wouldn't allow submissions with gaps, and this simulates that semantic. 5. The last height that doesn't have any gaps following it is used as a floor for what needs to be (re)signed and gossiped This PR supports the above workflow up to step 3. ## Checklist <!-- Check your PR fulfills the following items. --> <!-- For draft PRs check the boxes as you complete them. --> - [x] PR title corresponds to the body of PR (we generate changelog entries from PRs). - [x] Tests for the changes have been added / updated. - [x] Documentation comments have been added / updated. - [x] Code has been formatted via `zk fmt` and `zk lint`.
github-merge-queue bot
pushed a commit
to matter-labs/zksync-era
that referenced
this pull request
Jul 8, 2024
## What ❔ - [x] Add an `l1_batches_consensus` table to hold [L1 batch Quorum Certificates](https://github.com/matter-labs/era-consensus/blob/177881457f392fca990dbb3df1695737d90fd0c7/node/libs/roles/src/attester/messages/batch.rs#L67) from Attesters - [x] Add attesters to the config - [x] Implement methods in `PersistentBatchStore` - [x] `persisted` - [x] `last_batch` - [x] `last_batch_qc` - [x] `get_batch` - [x] `get_batch_qc` - [x] `store_qc` - [ ] `queue_next_batch` - _not going to implement for now_ - [ ] assign `SyncBatch::proof` - _not going to implement for now_ - [x] Add tests for all new methods in `ConsensusDal` and the `PersistentBatchStore` ### Caveat Implemented the updating of `persisted` with a loop that polls the database for newly available `SyncBatch` records, even if they have no proof. This inevitably triggers the gossiping of batch statuses and the pulling of `SyncBatch` between peers. For this reason `queue_next_batch` just drop the data, since we can't do anything with it without the proof yet. Returning an error or panicking would stop the consensus tasks. I ended up disabling the `persisted` by leaving its dummy implementation in place because when enabled the full node tests keep going on forever, printing the following logs in a loop: ```console ❯ RUST_LOG=info zk test rust test_full_nodes --no-capture ... 2024-07-03T14:22:57.882784Z INFO in{addr=[::1]:53082}: zksync_consensus_network: 191: new connection 2024-07-03T14:22:57.883457Z INFO in{addr=[::1]:53082}:gossip: zksync_consensus_network::gossip::runner: 383: peer = node:public:ed25519:068ffa0b3fedbbe5c2a6da3defd26e0d084248f12bfe98db85f7785b0b08b63e 2024-07-03T14:22:57.883764Z INFO out{addr="[::1]:52998"}:gossip: zksync_consensus_network::gossip::runner: 416: peer = node:public:ed25519:7710ed90aad9f5859dfba06e13fb4e6fb0fe4d686f81f9d819464ad1fdc371bd 2024-07-03T14:22:57.886204Z INFO in{addr=[::1]:53082}:gossip: zksync_consensus_network::rpc: 222: message too large: max = 10240B, got 13773B 2024-07-03T14:22:57.886280Z INFO out{addr="[::1]:52998"}:gossip: zksync_consensus_network::rpc: 222: message too large: max = 10240B, got 13773B 2024-07-03T14:22:57.886633Z INFO in{addr=[::1]:53082}:gossip: zksync_consensus_network::rpc: 222: canceled ... 2024-07-03T14:22:57.888143Z INFO out{addr="[::1]:52998"}:gossip: zksync_consensus_network::rpc: 222: disconnected ... 2024-07-03T14:22:57.888390Z INFO zksync_consensus_network: 216: [::1]:53082: gossip.run_inbound_stream(): push_batch_store_state.: end of stream 2024-07-03T14:22:57.888446Z INFO zksync_consensus_network: 158: gossip.run_outbound_stream("[::1]:52998"): push_batch_store_state.: end of stream ``` So in the tests the message size exceeds the maximum. I think it's [hardcoded here](https://github.com/matter-labs/era-consensus/blob/decb988eb9e1a45fd5171d2cc540a360d9ca5f1f/node/actors/network/src/gossip/runner.rs#L109). Since this functionality isn't expected to work, I think we can disable it for now. ## Why ❔ The workflow of signing and submitting L1 batch certificates will be like this: 1. Data is inserted into the `l1_batches` table. 2. If the node is one of the Attesters it picks up the batch, signs and sends it to the gossip layer via matter-labs/era-consensus#137 3. The consensus collects votes about the L1 batch, and when the threshold is reached it saves the quorum certificate into Postgres 4. The node monitors Main Node (later L1) for new batch QCs and upserts them into the database (the QC can be different than what a particular node inserted based on gossip). This way a node which has been down for a period of time can backfill any QCs it missed. It is assumed that the Main Node API only serves QCs that have no gaps following them, ie. they are final - if it was L1 it wouldn't allow submissions with gaps, and this simulates that semantic. 5. The last height that doesn't have any gaps following it is used as a floor for what needs to be (re)signed and gossiped This PR supports the above workflow up to step 3. ## Checklist <!-- Check your PR fulfills the following items. --> <!-- For draft PRs check the boxes as you complete them. --> - [x] PR title corresponds to the body of PR (we generate changelog entries from PRs). - [x] Tests for the changes have been added / updated. - [x] Documentation comments have been added / updated. - [x] Code has been formatted via `zk fmt` and `zk lint`.
github-merge-queue bot
pushed a commit
to matter-labs/zksync-era
that referenced
this pull request
Jul 8, 2024
## What ❔ - [x] Add an `l1_batches_consensus` table to hold [L1 batch Quorum Certificates](https://github.com/matter-labs/era-consensus/blob/177881457f392fca990dbb3df1695737d90fd0c7/node/libs/roles/src/attester/messages/batch.rs#L67) from Attesters - [x] Add attesters to the config - [x] Implement methods in `PersistentBatchStore` - [x] `persisted` - [x] `last_batch` - [x] `last_batch_qc` - [x] `get_batch` - [x] `get_batch_qc` - [x] `store_qc` - [ ] `queue_next_batch` - _not going to implement for now_ - [ ] assign `SyncBatch::proof` - _not going to implement for now_ - [x] Add tests for all new methods in `ConsensusDal` and the `PersistentBatchStore` ### Caveat Implemented the updating of `persisted` with a loop that polls the database for newly available `SyncBatch` records, even if they have no proof. This inevitably triggers the gossiping of batch statuses and the pulling of `SyncBatch` between peers. For this reason `queue_next_batch` just drop the data, since we can't do anything with it without the proof yet. Returning an error or panicking would stop the consensus tasks. I ended up disabling the `persisted` by leaving its dummy implementation in place because when enabled the full node tests keep going on forever, printing the following logs in a loop: ```console ❯ RUST_LOG=info zk test rust test_full_nodes --no-capture ... 2024-07-03T14:22:57.882784Z INFO in{addr=[::1]:53082}: zksync_consensus_network: 191: new connection 2024-07-03T14:22:57.883457Z INFO in{addr=[::1]:53082}:gossip: zksync_consensus_network::gossip::runner: 383: peer = node:public:ed25519:068ffa0b3fedbbe5c2a6da3defd26e0d084248f12bfe98db85f7785b0b08b63e 2024-07-03T14:22:57.883764Z INFO out{addr="[::1]:52998"}:gossip: zksync_consensus_network::gossip::runner: 416: peer = node:public:ed25519:7710ed90aad9f5859dfba06e13fb4e6fb0fe4d686f81f9d819464ad1fdc371bd 2024-07-03T14:22:57.886204Z INFO in{addr=[::1]:53082}:gossip: zksync_consensus_network::rpc: 222: message too large: max = 10240B, got 13773B 2024-07-03T14:22:57.886280Z INFO out{addr="[::1]:52998"}:gossip: zksync_consensus_network::rpc: 222: message too large: max = 10240B, got 13773B 2024-07-03T14:22:57.886633Z INFO in{addr=[::1]:53082}:gossip: zksync_consensus_network::rpc: 222: canceled ... 2024-07-03T14:22:57.888143Z INFO out{addr="[::1]:52998"}:gossip: zksync_consensus_network::rpc: 222: disconnected ... 2024-07-03T14:22:57.888390Z INFO zksync_consensus_network: 216: [::1]:53082: gossip.run_inbound_stream(): push_batch_store_state.: end of stream 2024-07-03T14:22:57.888446Z INFO zksync_consensus_network: 158: gossip.run_outbound_stream("[::1]:52998"): push_batch_store_state.: end of stream ``` So in the tests the message size exceeds the maximum. I think it's [hardcoded here](https://github.com/matter-labs/era-consensus/blob/decb988eb9e1a45fd5171d2cc540a360d9ca5f1f/node/actors/network/src/gossip/runner.rs#L109). Since this functionality isn't expected to work, I think we can disable it for now. ## Why ❔ The workflow of signing and submitting L1 batch certificates will be like this: 1. Data is inserted into the `l1_batches` table. 2. If the node is one of the Attesters it picks up the batch, signs and sends it to the gossip layer via matter-labs/era-consensus#137 3. The consensus collects votes about the L1 batch, and when the threshold is reached it saves the quorum certificate into Postgres 4. The node monitors Main Node (later L1) for new batch QCs and upserts them into the database (the QC can be different than what a particular node inserted based on gossip). This way a node which has been down for a period of time can backfill any QCs it missed. It is assumed that the Main Node API only serves QCs that have no gaps following them, ie. they are final - if it was L1 it wouldn't allow submissions with gaps, and this simulates that semantic. 5. The last height that doesn't have any gaps following it is used as a floor for what needs to be (re)signed and gossiped This PR supports the above workflow up to step 3. ## Checklist <!-- Check your PR fulfills the following items. --> <!-- For draft PRs check the boxes as you complete them. --> - [x] PR title corresponds to the body of PR (we generate changelog entries from PRs). - [x] Tests for the changes have been added / updated. - [x] Documentation comments have been added / updated. - [x] Code has been formatted via `zk fmt` and `zk lint`.
github-merge-queue bot
pushed a commit
to matter-labs/zksync-era
that referenced
this pull request
Jul 8, 2024
## What ❔ - [x] Add an `l1_batches_consensus` table to hold [L1 batch Quorum Certificates](https://github.com/matter-labs/era-consensus/blob/177881457f392fca990dbb3df1695737d90fd0c7/node/libs/roles/src/attester/messages/batch.rs#L67) from Attesters - [x] Add attesters to the config - [x] Implement methods in `PersistentBatchStore` - [x] `persisted` - [x] `last_batch` - [x] `last_batch_qc` - [x] `get_batch` - [x] `get_batch_qc` - [x] `store_qc` - [ ] `queue_next_batch` - _not going to implement for now_ - [ ] assign `SyncBatch::proof` - _not going to implement for now_ - [x] Add tests for all new methods in `ConsensusDal` and the `PersistentBatchStore` ### Caveat Implemented the updating of `persisted` with a loop that polls the database for newly available `SyncBatch` records, even if they have no proof. This inevitably triggers the gossiping of batch statuses and the pulling of `SyncBatch` between peers. For this reason `queue_next_batch` just drop the data, since we can't do anything with it without the proof yet. Returning an error or panicking would stop the consensus tasks. I ended up disabling the `persisted` by leaving its dummy implementation in place because when enabled the full node tests keep going on forever, printing the following logs in a loop: ```console ❯ RUST_LOG=info zk test rust test_full_nodes --no-capture ... 2024-07-03T14:22:57.882784Z INFO in{addr=[::1]:53082}: zksync_consensus_network: 191: new connection 2024-07-03T14:22:57.883457Z INFO in{addr=[::1]:53082}:gossip: zksync_consensus_network::gossip::runner: 383: peer = node:public:ed25519:068ffa0b3fedbbe5c2a6da3defd26e0d084248f12bfe98db85f7785b0b08b63e 2024-07-03T14:22:57.883764Z INFO out{addr="[::1]:52998"}:gossip: zksync_consensus_network::gossip::runner: 416: peer = node:public:ed25519:7710ed90aad9f5859dfba06e13fb4e6fb0fe4d686f81f9d819464ad1fdc371bd 2024-07-03T14:22:57.886204Z INFO in{addr=[::1]:53082}:gossip: zksync_consensus_network::rpc: 222: message too large: max = 10240B, got 13773B 2024-07-03T14:22:57.886280Z INFO out{addr="[::1]:52998"}:gossip: zksync_consensus_network::rpc: 222: message too large: max = 10240B, got 13773B 2024-07-03T14:22:57.886633Z INFO in{addr=[::1]:53082}:gossip: zksync_consensus_network::rpc: 222: canceled ... 2024-07-03T14:22:57.888143Z INFO out{addr="[::1]:52998"}:gossip: zksync_consensus_network::rpc: 222: disconnected ... 2024-07-03T14:22:57.888390Z INFO zksync_consensus_network: 216: [::1]:53082: gossip.run_inbound_stream(): push_batch_store_state.: end of stream 2024-07-03T14:22:57.888446Z INFO zksync_consensus_network: 158: gossip.run_outbound_stream("[::1]:52998"): push_batch_store_state.: end of stream ``` So in the tests the message size exceeds the maximum. I think it's [hardcoded here](https://github.com/matter-labs/era-consensus/blob/decb988eb9e1a45fd5171d2cc540a360d9ca5f1f/node/actors/network/src/gossip/runner.rs#L109). Since this functionality isn't expected to work, I think we can disable it for now. ## Why ❔ The workflow of signing and submitting L1 batch certificates will be like this: 1. Data is inserted into the `l1_batches` table. 2. If the node is one of the Attesters it picks up the batch, signs and sends it to the gossip layer via matter-labs/era-consensus#137 3. The consensus collects votes about the L1 batch, and when the threshold is reached it saves the quorum certificate into Postgres 4. The node monitors Main Node (later L1) for new batch QCs and upserts them into the database (the QC can be different than what a particular node inserted based on gossip). This way a node which has been down for a period of time can backfill any QCs it missed. It is assumed that the Main Node API only serves QCs that have no gaps following them, ie. they are final - if it was L1 it wouldn't allow submissions with gaps, and this simulates that semantic. 5. The last height that doesn't have any gaps following it is used as a floor for what needs to be (re)signed and gossiped This PR supports the above workflow up to step 3. ## Checklist <!-- Check your PR fulfills the following items. --> <!-- For draft PRs check the boxes as you complete them. --> - [x] PR title corresponds to the body of PR (we generate changelog entries from PRs). - [x] Tests for the changes have been added / updated. - [x] Documentation comments have been added / updated. - [x] Code has been formatted via `zk fmt` and `zk lint`.
github-merge-queue bot
pushed a commit
to matter-labs/zksync-era
that referenced
this pull request
Jul 8, 2024
## What ❔ - [x] Add an `l1_batches_consensus` table to hold [L1 batch Quorum Certificates](https://github.com/matter-labs/era-consensus/blob/177881457f392fca990dbb3df1695737d90fd0c7/node/libs/roles/src/attester/messages/batch.rs#L67) from Attesters - [x] Add attesters to the config - [x] Implement methods in `PersistentBatchStore` - [x] `persisted` - [x] `last_batch` - [x] `last_batch_qc` - [x] `get_batch` - [x] `get_batch_qc` - [x] `store_qc` - [ ] `queue_next_batch` - _not going to implement for now_ - [ ] assign `SyncBatch::proof` - _not going to implement for now_ - [x] Add tests for all new methods in `ConsensusDal` and the `PersistentBatchStore` ### Caveat Implemented the updating of `persisted` with a loop that polls the database for newly available `SyncBatch` records, even if they have no proof. This inevitably triggers the gossiping of batch statuses and the pulling of `SyncBatch` between peers. For this reason `queue_next_batch` just drop the data, since we can't do anything with it without the proof yet. Returning an error or panicking would stop the consensus tasks. I ended up disabling the `persisted` by leaving its dummy implementation in place because when enabled the full node tests keep going on forever, printing the following logs in a loop: ```console ❯ RUST_LOG=info zk test rust test_full_nodes --no-capture ... 2024-07-03T14:22:57.882784Z INFO in{addr=[::1]:53082}: zksync_consensus_network: 191: new connection 2024-07-03T14:22:57.883457Z INFO in{addr=[::1]:53082}:gossip: zksync_consensus_network::gossip::runner: 383: peer = node:public:ed25519:068ffa0b3fedbbe5c2a6da3defd26e0d084248f12bfe98db85f7785b0b08b63e 2024-07-03T14:22:57.883764Z INFO out{addr="[::1]:52998"}:gossip: zksync_consensus_network::gossip::runner: 416: peer = node:public:ed25519:7710ed90aad9f5859dfba06e13fb4e6fb0fe4d686f81f9d819464ad1fdc371bd 2024-07-03T14:22:57.886204Z INFO in{addr=[::1]:53082}:gossip: zksync_consensus_network::rpc: 222: message too large: max = 10240B, got 13773B 2024-07-03T14:22:57.886280Z INFO out{addr="[::1]:52998"}:gossip: zksync_consensus_network::rpc: 222: message too large: max = 10240B, got 13773B 2024-07-03T14:22:57.886633Z INFO in{addr=[::1]:53082}:gossip: zksync_consensus_network::rpc: 222: canceled ... 2024-07-03T14:22:57.888143Z INFO out{addr="[::1]:52998"}:gossip: zksync_consensus_network::rpc: 222: disconnected ... 2024-07-03T14:22:57.888390Z INFO zksync_consensus_network: 216: [::1]:53082: gossip.run_inbound_stream(): push_batch_store_state.: end of stream 2024-07-03T14:22:57.888446Z INFO zksync_consensus_network: 158: gossip.run_outbound_stream("[::1]:52998"): push_batch_store_state.: end of stream ``` So in the tests the message size exceeds the maximum. I think it's [hardcoded here](https://github.com/matter-labs/era-consensus/blob/decb988eb9e1a45fd5171d2cc540a360d9ca5f1f/node/actors/network/src/gossip/runner.rs#L109). Since this functionality isn't expected to work, I think we can disable it for now. ## Why ❔ The workflow of signing and submitting L1 batch certificates will be like this: 1. Data is inserted into the `l1_batches` table. 2. If the node is one of the Attesters it picks up the batch, signs and sends it to the gossip layer via matter-labs/era-consensus#137 3. The consensus collects votes about the L1 batch, and when the threshold is reached it saves the quorum certificate into Postgres 4. The node monitors Main Node (later L1) for new batch QCs and upserts them into the database (the QC can be different than what a particular node inserted based on gossip). This way a node which has been down for a period of time can backfill any QCs it missed. It is assumed that the Main Node API only serves QCs that have no gaps following them, ie. they are final - if it was L1 it wouldn't allow submissions with gaps, and this simulates that semantic. 5. The last height that doesn't have any gaps following it is used as a floor for what needs to be (re)signed and gossiped This PR supports the above workflow up to step 3. ## Checklist <!-- Check your PR fulfills the following items. --> <!-- For draft PRs check the boxes as you complete them. --> - [x] PR title corresponds to the body of PR (we generate changelog entries from PRs). - [x] Tests for the changes have been added / updated. - [x] Documentation comments have been added / updated. - [x] Code has been formatted via `zk fmt` and `zk lint`. --------- Co-authored-by: Bruno França <[email protected]>
github-merge-queue bot
pushed a commit
to matter-labs/zksync-era
that referenced
this pull request
Jul 9, 2024
## What ❔ - [x] Add an `l1_batches_consensus` table to hold [L1 batch Quorum Certificates](https://github.com/matter-labs/era-consensus/blob/177881457f392fca990dbb3df1695737d90fd0c7/node/libs/roles/src/attester/messages/batch.rs#L67) from Attesters - [x] Add attesters to the config - [x] Implement methods in `PersistentBatchStore` - [x] `persisted` - [x] `last_batch` - [x] `last_batch_qc` - [x] `get_batch` - [x] `get_batch_qc` - [x] `store_qc` - [ ] `queue_next_batch` - _not going to implement for now_ - [ ] assign `SyncBatch::proof` - _not going to implement for now_ - [x] Add tests for all new methods in `ConsensusDal` and the `PersistentBatchStore` ### Caveat Implemented the updating of `persisted` with a loop that polls the database for newly available `SyncBatch` records, even if they have no proof. This inevitably triggers the gossiping of batch statuses and the pulling of `SyncBatch` between peers. For this reason `queue_next_batch` just drop the data, since we can't do anything with it without the proof yet. Returning an error or panicking would stop the consensus tasks. I ended up disabling the `persisted` by leaving its dummy implementation in place because when enabled the full node tests keep going on forever, printing the following logs in a loop: ```console ❯ RUST_LOG=info zk test rust test_full_nodes --no-capture ... 2024-07-03T14:22:57.882784Z INFO in{addr=[::1]:53082}: zksync_consensus_network: 191: new connection 2024-07-03T14:22:57.883457Z INFO in{addr=[::1]:53082}:gossip: zksync_consensus_network::gossip::runner: 383: peer = node:public:ed25519:068ffa0b3fedbbe5c2a6da3defd26e0d084248f12bfe98db85f7785b0b08b63e 2024-07-03T14:22:57.883764Z INFO out{addr="[::1]:52998"}:gossip: zksync_consensus_network::gossip::runner: 416: peer = node:public:ed25519:7710ed90aad9f5859dfba06e13fb4e6fb0fe4d686f81f9d819464ad1fdc371bd 2024-07-03T14:22:57.886204Z INFO in{addr=[::1]:53082}:gossip: zksync_consensus_network::rpc: 222: message too large: max = 10240B, got 13773B 2024-07-03T14:22:57.886280Z INFO out{addr="[::1]:52998"}:gossip: zksync_consensus_network::rpc: 222: message too large: max = 10240B, got 13773B 2024-07-03T14:22:57.886633Z INFO in{addr=[::1]:53082}:gossip: zksync_consensus_network::rpc: 222: canceled ... 2024-07-03T14:22:57.888143Z INFO out{addr="[::1]:52998"}:gossip: zksync_consensus_network::rpc: 222: disconnected ... 2024-07-03T14:22:57.888390Z INFO zksync_consensus_network: 216: [::1]:53082: gossip.run_inbound_stream(): push_batch_store_state.: end of stream 2024-07-03T14:22:57.888446Z INFO zksync_consensus_network: 158: gossip.run_outbound_stream("[::1]:52998"): push_batch_store_state.: end of stream ``` So in the tests the message size exceeds the maximum. I think it's [hardcoded here](https://github.com/matter-labs/era-consensus/blob/decb988eb9e1a45fd5171d2cc540a360d9ca5f1f/node/actors/network/src/gossip/runner.rs#L109). Since this functionality isn't expected to work, I think we can disable it for now. ## Why ❔ The workflow of signing and submitting L1 batch certificates will be like this: 1. Data is inserted into the `l1_batches` table. 2. If the node is one of the Attesters it picks up the batch, signs and sends it to the gossip layer via matter-labs/era-consensus#137 3. The consensus collects votes about the L1 batch, and when the threshold is reached it saves the quorum certificate into Postgres 4. The node monitors Main Node (later L1) for new batch QCs and upserts them into the database (the QC can be different than what a particular node inserted based on gossip). This way a node which has been down for a period of time can backfill any QCs it missed. It is assumed that the Main Node API only serves QCs that have no gaps following them, ie. they are final - if it was L1 it wouldn't allow submissions with gaps, and this simulates that semantic. 5. The last height that doesn't have any gaps following it is used as a floor for what needs to be (re)signed and gossiped This PR supports the above workflow up to step 3. ## Checklist <!-- Check your PR fulfills the following items. --> <!-- For draft PRs check the boxes as you complete them. --> - [x] PR title corresponds to the body of PR (we generate changelog entries from PRs). - [x] Tests for the changes have been added / updated. - [x] Documentation comments have been added / updated. - [x] Code has been formatted via `zk fmt` and `zk lint`. --------- Co-authored-by: Bruno França <[email protected]>
irnb
pushed a commit
to vianetwork/via-server
that referenced
this pull request
Jul 12, 2024
## What ❔ - [x] Add an `l1_batches_consensus` table to hold [L1 batch Quorum Certificates](https://github.com/matter-labs/era-consensus/blob/177881457f392fca990dbb3df1695737d90fd0c7/node/libs/roles/src/attester/messages/batch.rs#L67) from Attesters - [x] Add attesters to the config - [x] Implement methods in `PersistentBatchStore` - [x] `persisted` - [x] `last_batch` - [x] `last_batch_qc` - [x] `get_batch` - [x] `get_batch_qc` - [x] `store_qc` - [ ] `queue_next_batch` - _not going to implement for now_ - [ ] assign `SyncBatch::proof` - _not going to implement for now_ - [x] Add tests for all new methods in `ConsensusDal` and the `PersistentBatchStore` ### Caveat Implemented the updating of `persisted` with a loop that polls the database for newly available `SyncBatch` records, even if they have no proof. This inevitably triggers the gossiping of batch statuses and the pulling of `SyncBatch` between peers. For this reason `queue_next_batch` just drop the data, since we can't do anything with it without the proof yet. Returning an error or panicking would stop the consensus tasks. I ended up disabling the `persisted` by leaving its dummy implementation in place because when enabled the full node tests keep going on forever, printing the following logs in a loop: ```console ❯ RUST_LOG=info zk test rust test_full_nodes --no-capture ... 2024-07-03T14:22:57.882784Z INFO in{addr=[::1]:53082}: zksync_consensus_network: 191: new connection 2024-07-03T14:22:57.883457Z INFO in{addr=[::1]:53082}:gossip: zksync_consensus_network::gossip::runner: 383: peer = node:public:ed25519:068ffa0b3fedbbe5c2a6da3defd26e0d084248f12bfe98db85f7785b0b08b63e 2024-07-03T14:22:57.883764Z INFO out{addr="[::1]:52998"}:gossip: zksync_consensus_network::gossip::runner: 416: peer = node:public:ed25519:7710ed90aad9f5859dfba06e13fb4e6fb0fe4d686f81f9d819464ad1fdc371bd 2024-07-03T14:22:57.886204Z INFO in{addr=[::1]:53082}:gossip: zksync_consensus_network::rpc: 222: message too large: max = 10240B, got 13773B 2024-07-03T14:22:57.886280Z INFO out{addr="[::1]:52998"}:gossip: zksync_consensus_network::rpc: 222: message too large: max = 10240B, got 13773B 2024-07-03T14:22:57.886633Z INFO in{addr=[::1]:53082}:gossip: zksync_consensus_network::rpc: 222: canceled ... 2024-07-03T14:22:57.888143Z INFO out{addr="[::1]:52998"}:gossip: zksync_consensus_network::rpc: 222: disconnected ... 2024-07-03T14:22:57.888390Z INFO zksync_consensus_network: 216: [::1]:53082: gossip.run_inbound_stream(): push_batch_store_state.: end of stream 2024-07-03T14:22:57.888446Z INFO zksync_consensus_network: 158: gossip.run_outbound_stream("[::1]:52998"): push_batch_store_state.: end of stream ``` So in the tests the message size exceeds the maximum. I think it's [hardcoded here](https://github.com/matter-labs/era-consensus/blob/decb988eb9e1a45fd5171d2cc540a360d9ca5f1f/node/actors/network/src/gossip/runner.rs#L109). Since this functionality isn't expected to work, I think we can disable it for now. ## Why ❔ The workflow of signing and submitting L1 batch certificates will be like this: 1. Data is inserted into the `l1_batches` table. 2. If the node is one of the Attesters it picks up the batch, signs and sends it to the gossip layer via matter-labs/era-consensus#137 3. The consensus collects votes about the L1 batch, and when the threshold is reached it saves the quorum certificate into Postgres 4. The node monitors Main Node (later L1) for new batch QCs and upserts them into the database (the QC can be different than what a particular node inserted based on gossip). This way a node which has been down for a period of time can backfill any QCs it missed. It is assumed that the Main Node API only serves QCs that have no gaps following them, ie. they are final - if it was L1 it wouldn't allow submissions with gaps, and this simulates that semantic. 5. The last height that doesn't have any gaps following it is used as a floor for what needs to be (re)signed and gossiped This PR supports the above workflow up to step 3. ## Checklist <!-- Check your PR fulfills the following items. --> <!-- For draft PRs check the boxes as you complete them. --> - [x] PR title corresponds to the body of PR (we generate changelog entries from PRs). - [x] Tests for the changes have been added / updated. - [x] Documentation comments have been added / updated. - [x] Code has been formatted via `zk fmt` and `zk lint`. --------- Co-authored-by: Bruno França <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
What ❔
Adds a
BatchVotePublisher
which will be used in theExecutor
to push L1 batch votes to the gossip layer.Why ❔
So that we can insert the nodes attestations over L1 batches into vote collection, which automatically gets gossiped to connected peers in
Network::run_stream
.