feat: implement `anvil_dumpState`/`anvil_loadState` API endpoints #418

itegulov · 2024-11-25T05:44:38Z

We can likely reuse snapshot logic for this. Ideally #414 should be done before this is worked on to avoid changes in serialization format (although likely we don't want to provide any cross-version guarantees here).

#[rpc(name = "anvil_dumpState")]
fn dump_state(&self, preserve_historical_states: Option<bool>) -> RpcResult<Bytes>;

#[rpc(name = "anvil_loadState")]
fn load_state(&self, bytes: Bytes) -> RpcResult<bool>;

UPD: After spending some time implementing this, I realized that the semantics of these methods are different from what I expected:

anvil_dumpState only dumps storage data about accounts specifically
anvil_loadState is supposed to be additive to the existing state (while overriding conflicting keys/txs/blocks)

In short, we can't reuse snapshot logic at all.


itegulov · 2024-11-28T08:22:54Z

I am moving this back to backlog as there is some ambiguity on how this should work and I don't want to make anything half-baked that we will then have to support for the foreseeable future.

POC branch here.

Observations

This section contains some observations I have made about upstream anvil while working on this issue.

Foundry's anvil_dumpState dumps storage keys in account-state pairs. Example:

{
  "accounts": {
    "0x0000000000000000000000000000000000000000": {
      "nonce": 0,
      "balance": "0xa1f24a420c00",
      "code": "0x",
      "storage": {}
    },
    "0x5fbdb2315678afecb367f032d93f642f64180aa3": {
      "nonce": 1,
      "balance": "0x0",
      "code": "0x608060405234801561001057600080fd5b50600436106100365760003560e01c80630c55699c1461003b578063371303c014610059575b600080fd5b610043610063565b604051610050919061009b565b60405180910390f35b610061610069565b005b60005481565b60008081548092919061007b906100e5565b9190505550565b6000819050919050565b61009581610082565b82525050565b60006020820190506100b0600083018461008c565b92915050565b7f4e487b7100000000000000000000000000000000000000000000000000000000600052601160045260246000fd5b60006100f082610082565b91507fffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff8203610122576101216100b6565b5b60018201905091905056fea2646970667358221220ce715c269a6493f974b0599f932508687267018b4e3881d45acf952eac8aa45364736f6c63430008110033",
      "storage": {
        "0x0": "0x2"
      }
    }
  }
}

anvil_loadState does not simply overwrite state data -- instead it appends the provided data to the existing state, merging it in the process (with provided data being prioritized in case of a conflict).
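The additive behaviour described above can be sketched roughly as follows. This is a minimal illustration, not anvil's actual implementation: the `Account` struct and `merge_accounts` function are hypothetical names, and the sketch assumes storage is merged slot by slot, which the observation above does not confirm.

```rust
use std::collections::btree_map::Entry;
use std::collections::BTreeMap;

/// Hypothetical in-memory account record mirroring the fields of the dump above.
#[derive(Clone, Debug, PartialEq)]
struct Account {
    nonce: u64,
    balance: String, // hex string, as in the JSON dump
    code: String,    // hex string, as in the JSON dump
    storage: BTreeMap<String, String>,
}

type Accounts = BTreeMap<String, Account>;

/// Merge `incoming` into `existing`. Accounts that only exist in `existing`
/// are left alone; on a conflict the incoming values win.
/// Assumption: storage is merged per slot rather than replaced per account.
fn merge_accounts(existing: &mut Accounts, incoming: Accounts) {
    for (address, new) in incoming {
        match existing.entry(address) {
            Entry::Occupied(mut slot) => {
                let old = slot.get_mut();
                old.nonce = new.nonce;
                old.balance = new.balance;
                old.code = new.code;
                // Incoming slots overwrite matching keys; other slots survive.
                old.storage.extend(new.storage);
            }
            Entry::Vacant(slot) => {
                slot.insert(new);
            }
        }
    }
}
```

With this shape, loading a dump into a node that already has state never deletes anything; it only adds or overwrites.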

Questions

Questions that came up while implementing this:

1. Should we be able to accept upstream anvil's state format, or use purely our own format (i.e. you can only dump/load state between two era-test-nodes)? Ideally the former, but my feeling is that it will be very difficult to achieve (if even possible).
2. Should we adopt anvil's approach of exporting account-state key pairs? It works for them, but I think not for us, as our storage has a lot of extra VM-specific stuff in it. Still, it might be necessary; otherwise I am not sure we will ever be able to merge two different states together. If we don't adopt it, then how do we merge two storage states in a way that makes sense?
3. What guarantees do we want to provide to users when they interact with dump/load state across different era-test-node versions? Especially if we go with our own format from question 1, we might not get it 100% right from the get-go. It's good to establish expectations early IMHO.

itegulov · 2024-11-28T08:23:35Z

@dutterbutter @popzxc low priority but just wanted to update you guys on the status of this and that there is some ambiguity here ^

dutterbutter · 2024-11-28T16:47:00Z

My general thoughts, helpful or not lol:

As you mentioned, it would be ideal to use upstream Anvil's state format. However, if this proves infeasible or introduces significant overhead, adopting our own format is more than reasonable.

In my view, the interfaces with Anvil (e.g., API and CLI) should aim for as much parity as possible. That said, some differences in behaviour and output are inevitable due to our unique differences. We should document these divergences so it's clear for devs.

I am not familiar with what VM-specific stuff would be included, so I'm not too sure how helpful my thoughts here would be, but generally I think it's fine to adopt our own approach for the same reason I mentioned above.
Is it possible to embed metadata that includes era-test-node versions to assist here?

popzxc · 2024-11-29T06:00:02Z

Should we be able to accept upstream anvil's state format

Anvil state includes bytecode, right? So I believe it's meaningless for us until we have ZK OS. So it's fine to have different state formats. And to this point, incremental improvements are better than no improvements.

On 2 & 3 -- I believe that our state is significantly different from L1 state (e.g. nonces are stored in a system contract, ETH balance is a balance on a contract, etc). I'd say that for now we should implement a custom format that supports versioning and, for simplicity, dump state as a flat mapping of storage keys to storage values. Something like:

{
  "state_version": "1.0.0",
  "state": {
    "0x0000000000000000000000000000000000000000": "0x0000000000000000000000000000000000000000",
    "0x0000000000000000000000000000000000000001": "0x0000000000000000000000000000000000000002",
    "0x0000000000000000000000000000000000000003": "0x0000000000000000000000000000000000000004",
    "0x0000000000000000000000000000000000000005": "0x0000000000000000000000000000000000000006",
    "0x0000000000000000000000000000000000000007": "0x0000000000000000000000000000000000000008"
  }
}

Then merging is trivial: we just insert new keys there. Given that we don't care about proving, we probably don't care about things like initial writes etc.
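A rough sketch of what "just insert new keys" could look like for the flat format. `StateDump` and `load_flat_state` are hypothetical names for illustration, and keys and values are kept as hex strings purely to mirror the JSON; this is not the era-test-node implementation.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory form of the flat dump format proposed above.
#[derive(Debug, Clone, PartialEq)]
struct StateDump {
    state_version: String,
    /// Storage key -> storage value, both kept as hex strings.
    state: BTreeMap<String, String>,
}

/// Insert every dumped key into the node's storage, overwriting keys that
/// already exist. Returns how many existing keys were overwritten.
fn load_flat_state(storage: &mut BTreeMap<String, String>, dump: StateDump) -> usize {
    let mut overwritten = 0;
    for (key, value) in dump.state {
        // `insert` returns the previous value when the key was already present.
        if storage.insert(key, value).is_some() {
            overwritten += 1;
        }
    }
    overwritten
}
```

Returning the overwrite count is an optional extra; it just makes conflicts visible to the caller.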

popzxc · 2024-11-29T06:02:54Z

If it's easily doable, we can make it a bit more complex so that people can actually read the state, e.g.:

{
  "state_version": "1.0.0",
  "state": {
    // Key is the address that owns the storage. It doesn't mean anything for the loading process, it's just a hint for people:
    // e.g. if they want to manually edit the state file, they can look for the address of a particular contract
    // and figure out the relevant storage slot
    "0x0000000000000000000000000000000000000000": {
      "0x0000000000000000000000000000000000000000": "0x0000000000000000000000000000000000000000",
      "0x0000000000000000000000000000000000000001": "0x0000000000000000000000000000000000000002"
    },
    "0x0000000000000000000000000000000000000002": {
      "0x0000000000000000000000000000000000000003": "0x0000000000000000000000000000000000000004",
      "0x0000000000000000000000000000000000000005": "0x0000000000000000000000000000000000000006",
      "0x0000000000000000000000000000000000000007": "0x0000000000000000000000000000000000000008"
    }
  }
}
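Since the owning address is only a hint for readers, a loader could flatten the grouped layout back into the flat key-to-value map before inserting. A minimal sketch, assuming inner storage keys are unique across addresses (if they are not, the entry under the later address in sort order wins); `flatten_grouped` is a hypothetical helper name.

```rust
use std::collections::BTreeMap;

type Slots = BTreeMap<String, String>;

/// Flatten the address-grouped layout into a flat key -> value map.
/// The owning address is dropped because it carries no meaning for loading.
fn flatten_grouped(grouped: BTreeMap<String, Slots>) -> Slots {
    grouped.into_values().flatten().collect()
}
```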

itegulov · 2024-11-29T09:27:57Z

Anvil state includes bytecode, right? So I believe it's meaningless for us until we have ZK OS.

Ah, good point!

Then merging is trivial: we just insert new keys there. Given that we don't care about proving, we probably don't care about things like initial writes etc.

Correct me if I am wrong, but some parts of the system contracts' storage have information about things like the current block, batch and potentially other things.

Let's say we dump a node with 100 blocks and try to load it into another node with 1000 blocks: first 100 blocks will be overwritten as expected but also system contract's block/batch number will be overwritten as well. Nothing unsolvable of course, we just can't do merging in the most naive way I think.
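One possible way to avoid clobbering such slots, sketched under heavy assumptions: the loader is given a set of protected storage keys and skips them during the merge. Nothing in this thread settles on this approach; `merge_skipping` and the idea of a `protected` key set are hypothetical, and which keys (if any) actually hold block or batch numbers would need to be checked.

```rust
use std::collections::{BTreeMap, BTreeSet};

type Storage = BTreeMap<String, String>;

/// Merge `dump` into `storage`, leaving any key listed in `protected` untouched.
/// Returns the protected keys that were present in the dump and skipped.
fn merge_skipping(
    storage: &mut Storage,
    dump: Storage,
    protected: &BTreeSet<String>,
) -> Vec<String> {
    let mut skipped = Vec::new();
    for (key, value) in dump {
        if protected.contains(&key) {
            // Keep the receiving node's value, e.g. its own block number.
            skipped.push(key);
        } else {
            storage.insert(key, value);
        }
    }
    skipped
}
```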

If it's easily doable, we can make it a bit more complex so that people can actually read the state:

That would be ideal, and I think it is achievable with some effort, but it needs a lot of refactoring on the storage layer.

popzxc · 2024-11-29T10:22:54Z

Correct me if I am wrong, but some parts of the system contracts' storage have information about things like the current block, batch and potentially other things.

I may be completely wrong here, but from what I remember, we load this information right into the bootloader memory when executing a batch (e.g. things like block.timestamp are read from the VM memory, not from the contract). We obviously need to check it more thoroughly, but I think that things shouldn't be majorly broken.

One thing that is stored in system contracts is nonces, but by looking at the anvil state example you provided, I believe nonces can be overridden in anvil too, so it's not that big of an issue.

(but again -- I may be terribly wrong)

That would be ideal, and I think it is achievable with some effort, but it needs a lot of refactoring on the storage layer.

Makes sense. Well, I guess with versioning supported for storage format it shouldn't be an issue to add it later.
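To make the versioning idea concrete, here is a small sketch of a compatibility check a loader might run on the `state_version` field before accepting a dump. The policy (accept only a matching major version) is an assumption for illustration; the thread does not define one, and `major_version` and `is_loadable` are hypothetical names.

```rust
/// Parse the major component out of a "MAJOR.MINOR.PATCH" version string.
/// Returns `None` when the first component is not a number.
fn major_version(version: &str) -> Option<u32> {
    version.split('.').next()?.parse().ok()
}

/// Hypothetical policy: a dump is loadable when its major version matches
/// the major version this node supports.
fn is_loadable(dump_version: &str, supported_major: u32) -> bool {
    major_version(dump_version) == Some(supported_major)
}
```

Under this policy, adding the address-grouped layout later could be shipped as a new major version while older dumps are rejected with a clear error rather than misread.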

itegulov self-assigned this Nov 27, 2024

itegulov mentioned this issue Dec 6, 2024

feat: add anvil_dumpState/anvil_loadState #476

Merged

itegulov closed this as completed in #476 Dec 6, 2024
