fix(tee_verifier): correctly initialize storage for re-execution #3017

Merged
merged 48 commits into from
Oct 17, 2024

Conversation

haraldh
Collaborator

@haraldh haraldh commented Oct 4, 2024

What ❔

With this commit, the TEE verifier uses the WitnessStorageState of VMRunWitnessInputData to initialize the storage. This requires waiting for the BasicWitnessInputProducer to complete, so the TEE verifier input producer can be removed. The input for the TEE verifier is now collected in the proof_data_handler, which makes it possible to remove the entire job queue for the TEE verifier input producer.

Why ❔

Previously, the storage for VM re-execution was initialized only from WitnessInputMerklePaths. However, this misses the storage values for slots that are only read or written by rolled-back transactions. As a result, verification failed for blocks that would normally pass.
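
A minimal sketch of the failure mode described above, not the actual verifier code. All names here (`ReexecStorage`, `execute`, the slot numbers and values) are hypothetical stand-ins, and the toy model simply assumes that a slot touched only by a rolled-back transaction is absent from the Merkle-path-derived snapshot and therefore reads as zero during re-execution.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for storage keys and values; not the real zksync types.
type Slot = u32;
type Value = u64;

/// Toy storage snapshot used for re-execution.
/// Slots that are missing from the snapshot read as zero.
struct ReexecStorage {
    slots: HashMap<Slot, Value>,
}

impl ReexecStorage {
    fn new(slots: HashMap<Slot, Value>) -> Self {
        Self { slots }
    }

    fn read(&self, slot: Slot) -> Value {
        self.slots.get(&slot).copied().unwrap_or(0)
    }
}

/// Toy transaction: it succeeds only if the guard slot holds zero,
/// otherwise it is rolled back. Returns `true` on success.
fn execute(storage: &ReexecStorage, guard_slot: Slot) -> bool {
    storage.read(guard_slot) == 0
}

fn main() {
    // Assumption: slot 1 was touched by a committed transaction, while slot 7
    // was touched only by a transaction that was rolled back in the original run.
    let from_merkle_paths = ReexecStorage::new(HashMap::from([(1, 10)]));
    let from_witness_state = ReexecStorage::new(HashMap::from([(1, 10), (7, 5)]));

    // In the original run slot 7 held 5, so the guarded transaction was rolled back.
    let original_succeeded = false;

    let merkle_matches = execute(&from_merkle_paths, 7) == original_succeeded;
    let witness_matches = execute(&from_witness_state, 7) == original_succeeded;

    println!("Merkle-path snapshot reproduces original run: {merkle_matches}");
    println!("Witness-state snapshot reproduces original run: {witness_matches}");
}
```

In this model, only the snapshot that contains every slot read during the original run reproduces the rollback, which is the motivation given for switching to WitnessStorageState.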

Checklist

  • PR title corresponds to the body of PR (we generate changelog entries from PRs).
  • Tests for the changes have been added / updated.
  • Documentation comments have been added / updated.
  • Code has been formatted via zk_supervisor fmt and zk_supervisor lint.

@haraldh haraldh requested review from pbeza, popzxc, slowli and EmilLuta and removed request for popzxc October 4, 2024 14:09
@haraldh haraldh force-pushed the tee_prover_new_2 branch 2 times, most recently from ef9a41d to 3f93bdd Compare October 4, 2024 14:26
@haraldh haraldh marked this pull request as ready for review October 4, 2024 14:28
Previously, the storage for VM re-execution was initialized only from `WitnessInputMerklePaths`.
However, this misses the storage values for slots that are only read or written by rolled-back transactions.

With this commit, the TEE verifier uses the `WitnessStorageState` of `VMRunWitnessInputData` to initialize the storage.
This requires waiting for the BasicWitnessInputProducer to complete,
so the TEE verifier input producer can be removed.
The input for the TEE verifier is now collected in the `proof_data_handler`, which makes it possible to remove
the entire job queue for the TEE verifier input producer.

Co-authored-by: Patrick Beza <[email protected]>
Signed-off-by: Harald Hoyer <[email protected]>
simplify empty `tee_proofs` case, but pre-filter with `tee_type` to exclude other TEE techs.

Signed-off-by: Harald Hoyer <[email protected]>
Signed-off-by: Harald Hoyer <[email protected]>
correct leftovers from debug/merge

Signed-off-by: Harald Hoyer <[email protected]>
Contributor

@EmilLuta EmilLuta left a comment

I've left my concerns about the current implementation. If there's a doc that explains what's going on, that'd be great; I could then provide better input. Otherwise, writing it down (maybe as part of the PR?) would help move this forward.

core/bin/zksync_server/src/main.rs
core/lib/dal/src/tee_proof_generation_dal.rs (outdated)
core/lib/object_store/src/file.rs
core/lib/prover_interface/src/inputs.rs
core/node/vm_runner/src/impls/bwip.rs (outdated)
Collaborator

@pbeza pbeza left a comment

We should probably remove this function altogether:

pub async fn insert_tee_proof_generation_job(
    &mut self,
    batch_number: L1BatchNumber,
    tee_type: TeeType,
) -> DalResult<()> {
    let batch_number = i64::from(batch_number.0);
    let query = sqlx::query!(
        r#"
        INSERT INTO
            tee_proof_generation_details (
                l1_batch_number, tee_type, status, created_at, updated_at
            )
        VALUES
            ($1, $2, $3, NOW(), NOW())
        ON CONFLICT (l1_batch_number, tee_type) DO NOTHING
        "#,
        batch_number,
        tee_type.to_string(),
        TeeProofGenerationJobStatus::Unpicked.to_string(),
    );
    let instrumentation = Instrumented::new("insert_tee_proof_generation_job")
        .with_arg("l1_batch_number", &batch_number)
        .with_arg("tee_type", &tee_type);
    instrumentation
        .clone()
        .with(query)
        .execute(self.storage)
        .await?;
    Ok(())
}

New entries are now inserted in the lock_batch_for_proving function instead:

INSERT INTO
    tee_proof_generation_details (
        l1_batch_number, tee_type, status, created_at, updated_at, prover_taken_at
    )
SELECT
    l1_batch_number,
    $1,
    $2,
    NOW(),
    NOW(),
    NOW()
FROM
    upsert
ON CONFLICT (l1_batch_number, tee_type) DO
UPDATE
SET
    status = $2,
    updated_at = NOW(),
    prover_taken_at = NOW()
RETURNING
    l1_batch_number
"#,
tee_type.to_string(),
TeeProofGenerationJobStatus::PickedByProver.to_string(),
TeeProofGenerationJobStatus::Unpicked.to_string(),
processing_timeout,
min_batch_number

Update

Consider merging my PR (#3037), which addresses the issue mentioned above.
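
To illustrate why a separate insert function may become redundant once the locking query upserts, here is a small in-memory sketch of the two conflict behaviors shown above. It is an assumption-laden model, not the DAL code: the `Status` enum, the helper names, and the `"sgx"` TEE type string are placeholders, and a `HashMap` keyed by `(l1_batch_number, tee_type)` stands in for the table's conflict target.

```rust
use std::collections::HashMap;

/// Placeholder for the job status column; only the two states named above are modeled.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Unpicked,
    PickedByProver,
}

/// Mirrors the (l1_batch_number, tee_type) conflict target of both queries.
type Key = (u32, &'static str);

/// Models `INSERT ... ON CONFLICT DO NOTHING`: an existing row is left untouched.
fn insert_do_nothing(table: &mut HashMap<Key, Status>, key: Key, status: Status) {
    table.entry(key).or_insert(status);
}

/// Models `INSERT ... ON CONFLICT DO UPDATE SET status = ...`:
/// the row is created if missing and overwritten otherwise.
fn insert_do_update(table: &mut HashMap<Key, Status>, key: Key, status: Status) {
    table.insert(key, status);
}

fn main() {
    let mut table: HashMap<Key, Status> = HashMap::new();

    // Locking a batch that has no row yet creates the row directly.
    insert_do_update(&mut table, (1, "sgx"), Status::PickedByProver);

    // A later plain insert for the same key is a no-op.
    insert_do_nothing(&mut table, (1, "sgx"), Status::Unpicked);

    println!("{:?}", table.get(&(1, "sgx")));
}
```

Under these assumptions, the upsert path alone is enough to create and update rows, so the `DO NOTHING` insert no longer contributes anything for batches that are picked through the locking query.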

pbeza and others added 3 commits October 9, 2024 19:15
This makes handling large tables a lot more performant

Signed-off-by: Harald Hoyer <[email protected]>
EmilLuta
EmilLuta previously approved these changes Oct 16, 2024
EmilLuta
EmilLuta previously approved these changes Oct 16, 2024
Contributor

@EmilLuta EmilLuta left a comment

Rubber-stamping: as long as @slowli's comments have been addressed, I'm good with this.

slowli
slowli previously approved these changes Oct 16, 2024
core/node/proof_data_handler/src/errors.rs (outdated)
core/node/proof_data_handler/src/errors.rs (outdated)
core/node/proof_data_handler/src/errors.rs (outdated)
@pbeza
Collaborator

pbeza commented Oct 16, 2024

I think core/lib/dal/doc/TeeProofGenerationDal.md is now outdated.

@haraldh haraldh requested a review from slowli October 17, 2024 10:58
@haraldh haraldh added this pull request to the merge queue Oct 17, 2024
github-merge-queue bot pushed a commit that referenced this pull request Oct 17, 2024
## What ❔

With this commit, the TEE verifier uses the `WitnessStorageState` of
`VMRunWitnessInputData` to initialize the storage. This requires waiting
for the BasicWitnessInputProducer to complete, so the TEE verifier input
producer can be removed. The input for the TEE verifier is now collected
in the `proof_data_handler`, which makes it possible to remove the
entire job queue for the TEE verifier input producer.

## Why ❔

Previously, the storage for VM re-execution was initialized only from
`WitnessInputMerklePaths`. However, this misses the storage values for
slots that are only read or written by rolled-back transactions. As a
result, verification failed for blocks that would normally pass.

## Checklist

<!-- Check your PR fulfills the following items. -->
<!-- For draft PRs check the boxes as you complete them. -->

- [x] PR title corresponds to the body of PR (we generate changelog
entries from PRs).
- [x] Tests for the changes have been added / updated.
- [ ] Documentation comments have been added / updated.
- [x] Code has been formatted via `zk_supervisor fmt` and `zk_supervisor
lint`.

---------

Signed-off-by: Harald Hoyer <[email protected]>
Co-authored-by: Patrick Beza <[email protected]>
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 17, 2024
@haraldh haraldh added this pull request to the merge queue Oct 17, 2024
Merged via the queue into main with commit 9d88373 Oct 17, 2024
43 checks passed
@haraldh haraldh deleted the tee_prover_new_2 branch October 17, 2024 13:35
github-merge-queue bot pushed a commit that referenced this pull request Oct 23, 2024
🤖 I have created a release *beep* *boop*
---


##
[25.0.0](core-v24.29.0...core-v25.0.0)
(2024-10-23)


### ⚠ BREAKING CHANGES

* **contracts:** integrate protocol defense changes
([#2737](#2737))

### Features

* Add CoinMarketCap external API
([#2971](#2971))
([c1cb30e](c1cb30e))
* **api:** Implement eth_maxPriorityFeePerGas
([#3135](#3135))
([35e84cc](35e84cc))
* **api:** Make acceptable values cache lag configurable
([#3028](#3028))
([6747529](6747529))
* **contracts:** integrate protocol defense changes
([#2737](#2737))
([c60a348](c60a348))
* **external-node:** save protocol version before opening a batch
([#3136](#3136))
([d6de4f4](d6de4f4))
* Prover e2e test
([#2975](#2975))
([0edd796](0edd796))
* **prover:** Add min_provers and dry_run features. Improve metrics and
test. ([#3129](#3129))
([7c28964](7c28964))
* **tee_verifier:** speedup SQL query for new jobs
([#3133](#3133))
([30ceee8](30ceee8))
* vm2 tracers can access storage
([#3114](#3114))
([e466b52](e466b52))
* **vm:** Return compressed bytecodes from `push_transaction()`
([#3126](#3126))
([37f209f](37f209f))


### Bug Fixes

* **call_tracer:** Flat call tracer fixes for blocks
([#3095](#3095))
([30ddb29](30ddb29))
* **consensus:** preventing config update reverts
([#3148](#3148))
([caee55f](caee55f))
* **en:** Return `SyncState` health check
([#3142](#3142))
([abeee81](abeee81))
* **external-node:** delete empty unsealed batch on EN initialization
([#3125](#3125))
([5d5214b](5d5214b))
* Fix counter metric type to be Counter.
([#3153](#3153))
([08a3fe7](08a3fe7))
* **mempool:** minor mempool improvements
([#3113](#3113))
([cd16083](cd16083))
* **prover:** Run for zero queue to allow scaling down to 0
([#3115](#3115))
([bbe1919](bbe1919))
* restore instruction count functionality
([#3081](#3081))
([6159f75](6159f75))
* **state-keeper:** save call trace for upgrade txs
([#3132](#3132))
([e1c363f](e1c363f))
* **tee_prover:** add zstd compression
([#3144](#3144))
([7241ae1](7241ae1))
* **tee_verifier:** correctly initialize storage for re-execution
([#3017](#3017))
([9d88373](9d88373))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: zksync-era-bot <[email protected]>
github-merge-queue bot pushed a commit that referenced this pull request Oct 31, 2024
🤖 I have created a release *beep* *boop*
---


##
[16.6.0](prover-v16.5.0...prover-v16.6.0)
(2024-10-31)


### Features

* (DB migration) Rename recursion_scheduler_level_vk_hash to
snark_wrapper_vk_hash
([#2809](#2809))
([64f9551](64f9551))
* Add initial version prover_autoscaler
([#2993](#2993))
([ebf9604](ebf9604))
* added seed_peers to consensus global config
([#2920](#2920))
([e9d1d90](e9d1d90))
* attester committees data extractor (BFT-434)
([#2684](#2684))
([92dde03](92dde03))
* Bump crypto and protocol deps
([#2825](#2825))
([a5ffaf1](a5ffaf1))
* **circuit_prover:** Add circuit prover
([#2908](#2908))
([48317e6](48317e6))
* **consensus:** Support for syncing blocks before consensus genesis
over p2p network
([#3040](#3040))
([d3edc3d](d3edc3d))
* **da-clients:** add secrets
([#2954](#2954))
([f4631e4](f4631e4))
* gateway preparation
([#3006](#3006))
([16f2757](16f2757))
* Integrate tracers and implement circuits tracer in vm2
([#2653](#2653))
([87b02e3](87b02e3))
* Move prover data to
/home/popzxc/workspace/current/zksync-era/prover/data
([#2778](#2778))
([62e4d46](62e4d46))
* Prover e2e test
([#2975](#2975))
([0edd796](0edd796))
* **prover:** add CLI option to run prover with max allocation
([#2794](#2794))
([35e4cae](35e4cae))
* **prover:** Add endpoint to PJM to get queue reports
([#2918](#2918))
([2cec83f](2cec83f))
* **prover:** Add error to panic message of prover
([#2807](#2807))
([6e057eb](6e057eb))
* **prover:** Add min_provers and dry_run features. Improve metrics and
test. ([#3129](#3129))
([7c28964](7c28964))
* **prover:** Add scale failure events watching and pods eviction.
([#3175](#3175))
([dd166f8](dd166f8))
* **prover:** Add sending scale requests for Scaler targets
([#3194](#3194))
([767c5bc](767c5bc))
* **prover:** Add support for scaling WGs and compressor
([#3179](#3179))
([c41db9e](c41db9e))
* **prover:** Autoscaler sends scale request to appropriate agents.
([#3150](#3150))
([bfedac0](bfedac0))
* **prover:** Extract keystore into a separate crate
([#2797](#2797))
([e239260](e239260))
* **prover:** Optimize setup keys loading
([#2847](#2847))
([19887ef](19887ef))
* **prover:** Refactor WitnessGenerator
([#2845](#2845))
([934634b](934634b))
* **prover:** Update witness generator to zkevm_test_harness 0.150.6
([#3029](#3029))
([2151c28](2151c28))
* **prover:** Use query macro instead string literals for queries
([#2930](#2930))
([1cf959d](1cf959d))
* **prover:** WG refactoring
[#3](#3)
([#2942](#2942))
([df68762](df68762))
* **prover:** WitnessGenerator refactoring
[#2](#2)
([#2899](#2899))
([36e5340](36e5340))
* Refactor metrics/make API use binaries
([#2735](#2735))
([8ed086a](8ed086a))
* Remove prover db from house keeper
([#2795](#2795))
([85b7346](85b7346))
* **tee:** use hex serialization for RPC responses
([#2887](#2887))
([abe0440](abe0440))
* **utils:** Rework locate_workspace, introduce Workspace type
([#2830](#2830))
([d256092](d256092))
* vm2 tracers can access storage
([#3114](#3114))
([e466b52](e466b52))
* **vm:** Do not panic on VM divergence
([#2705](#2705))
([7aa5721](7aa5721))
* **vm:** EVM emulator support – base
([#2979](#2979))
([deafa46](deafa46))
* **vm:** Extract batch executor to separate crate
([#2702](#2702))
([b82dfa4](b82dfa4))
* **zk_toolbox:** `zk_supervisor prover` subcommand
([#2820](#2820))
([3506731](3506731))
* **zk_toolbox:** Add external_node consensus support
([#2821](#2821))
([4a10d7d](4a10d7d))
* **zk_toolbox:** Add SQL format for zk supervisor
([#2950](#2950))
([540e5d7](540e5d7))
* **zk_toolbox:** deploy legacy bridge
([#2837](#2837))
([93b4e08](93b4e08))
* **zk_toolbox:** Redesign zk_toolbox commands
([#3003](#3003))
([114834f](114834f))
* **zkstack_cli:** Build dependencies at zkstack build time
([#3157](#3157))
([724d9a9](724d9a9))


### Bug Fixes

* allow compilation under current toolchain
([#3176](#3176))
([89eadd3](89eadd3))
* **api:** Return correct flat call tracer
([#2917](#2917))
([218646a](218646a))
* count SECP256 precompile to account validation gas limit as well
([#2859](#2859))
([fee0c2a](fee0c2a))
* Fix Doc lint.
([#3158](#3158))
([c79949b](c79949b))
* ignore unknown fields in rpc json response
([#2962](#2962))
([692ea73](692ea73))
* **prover:** Do not exit on missing watcher data.
([#3119](#3119))
([76ed6d9](76ed6d9))
* **prover:** fix setup_metadata_to_setup_data_key
([#2875](#2875))
([4ae5a93](4ae5a93))
* **prover:** Run for zero queue to allow scaling down to 0
([#3115](#3115))
([bbe1919](bbe1919))
* **tee_verifier:** correctly initialize storage for re-execution
([#3017](#3017))
([9d88373](9d88373))
* **vm:** Prepare new VM for use in API server and fix divergences
([#2994](#2994))
([741b77e](741b77e))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).