feat: implement anvil_dumpState/anvil_loadState API endpoints #418

Closed
itegulov opened this issue Nov 25, 2024 · 7 comments · Fixed by #476

@itegulov
Contributor

itegulov commented Nov 25, 2024

We can likely reuse snapshot logic for this. Ideally #414 should be done before this is worked on to avoid changes in serialization format (although likely we don't want to provide any cross-version guarantees here).

#[rpc(name = "anvil_dumpState")]
fn dump_state(&self, preserve_historical_states: Option<bool>) -> RpcResult<Bytes>;

#[rpc(name = "anvil_loadState")]
fn load_state(&self, bytes: Bytes) -> RpcResult<bool>;

UPD: After spending some time implementing this, I have realized that the semantics of these methods are different from what I expected:

  • anvil_dumpState only dumps storage data about accounts specifically
  • anvil_loadState is supposed to be additive to the existing state (while overriding conflicting keys/txs/blocks)

In short, we can't reuse snapshot logic at all.

@itegulov itegulov self-assigned this Nov 27, 2024
@itegulov
Contributor Author

I am moving this back to backlog as there is some ambiguity on how this should work and I don't want to make anything half-baked that we will then have to support for the foreseeable future.

POC branch here.

Observations

This section contains some observations I have made about upstream anvil while working on this issue.

Foundry's anvil_dumpState dumps storage keys in account-state pairs. Example:

{
  "accounts": {
    "0x0000000000000000000000000000000000000000": {
      "nonce": 0,
      "balance": "0xa1f24a420c00",
      "code": "0x",
      "storage": {}
    },
    "0x5fbdb2315678afecb367f032d93f642f64180aa3": {
      "nonce": 1,
      "balance": "0x0",
      "code": "0x608060405234801561001057600080fd5b50600436106100365760003560e01c80630c55699c1461003b578063371303c014610059575b600080fd5b610043610063565b604051610050919061009b565b60405180910390f35b610061610069565b005b60005481565b60008081548092919061007b906100e5565b9190505550565b6000819050919050565b61009581610082565b82525050565b60006020820190506100b0600083018461008c565b92915050565b7f4e487b7100000000000000000000000000000000000000000000000000000000600052601160045260246000fd5b60006100f082610082565b91507fffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff8203610122576101216100b6565b5b60018201905091905056fea2646970667358221220ce715c269a6493f974b0599f932508687267018b4e3881d45acf952eac8aa45364736f6c63430008110033",
      "storage": {
        "0x0": "0x2"
      }
    }
  }
}

anvil_loadState does not simply overwrite state data -- instead it appends the provided data to the existing state, merging it in the process (with provided data being prioritized in case of a conflict).
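
The additive behaviour described above can be sketched roughly as follows. This is a minimal illustration, not upstream anvil's code: `Account`, `Accounts`, and `merge_accounts` are hypothetical names modelled on the JSON example, and merging storage slot by slot (rather than replacing an account's storage wholesale) is an assumption, since the observation above does not state the granularity.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified account record modelled on the JSON dump above.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct Account {
    pub nonce: u64,
    pub balance: String,
    pub code: String,
    pub storage: BTreeMap<String, String>,
}

/// Address -> account, as in the "accounts" object of the dump.
pub type Accounts = BTreeMap<String, Account>;

/// Merge `incoming` into `existing`. On conflict the incoming value wins;
/// accounts and slots that the dump does not mention are left untouched.
pub fn merge_accounts(existing: &mut Accounts, incoming: Accounts) {
    for (address, new) in incoming {
        match existing.get_mut(&address) {
            Some(current) => {
                current.nonce = new.nonce;
                current.balance = new.balance;
                current.code = new.code;
                // Assumption: storage is merged per slot, not replaced.
                for (slot, value) in new.storage {
                    current.storage.insert(slot, value);
                }
            }
            None => {
                existing.insert(address, new);
            }
        }
    }
}
```

With this shape, loading a dump never deletes anything: an account that exists only on the target node survives, and an account that exists in both ends up with the dumped nonce, balance, and code.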

Questions

Questions that came up while implementing this:

  1. Should we be able to accept upstream anvil's state format or use purely our own format (i.e. you can only dump/load state between two era-test-nodes)? Ideally the former, but my feeling is that it will be very difficult to achieve (if even possible).
  2. Should we adopt anvil's approach to exporting account-state key pairs? It works for them, but I think not for us, as our storage has a lot of extra VM-specific stuff in it. Still, it might be necessary to do so; otherwise I am not sure we will ever be able to merge two different states together. If we don't adopt it, then how do we merge two storage states in a way that makes sense?
  3. What guarantees do we want to provide to users when they interact with dump/load state across different era-test-node versions? Especially if we go with our own format in question 1, we might not get it 100% right from the get-go. It's good to establish expectations early IMHO.

@itegulov
Contributor Author

@dutterbutter @popzxc low priority but just wanted to update you guys on the status of this and that there is some ambiguity here ^

@dutterbutter
Collaborator

My general thoughts, helpful or not lol:

  1. As you mentioned, it would be ideal to use upstream Anvil's state format. However, if this proves infeasible or introduces significant overhead, adopting our own format is more than reasonable.

In my view, the interfaces with Anvil (e.g., API and CLI) should aim for as much parity as possible. That said, some differences in behaviour and output are inevitable because our node differs from upstream. We should document these divergences so it's clear for devs.

  2. I am not familiar with what VM-specific stuff would be included, so I'm not sure how helpful my thoughts here are, but generally I think it's fine to adopt our own approach for the same reason I mention above.

  3. Is it possible to embed metadata that includes era-test-node versions to assist here?

@popzxc
Member

popzxc commented Nov 29, 2024

Should we be able to accept upstream anvil's state format

Anvil state includes bytecode, right? So I believe it's meaningless for us until we have ZK OS. So it's fine to have different state formats. And to this point, incremental improvements are better than no improvements.

On 2 & 3 -- I believe that our state is significantly different from L1 state (e.g. nonces are stored in a system contract, the ETH balance is a balance in a contract, etc). I'd say that for now we implement a custom format that supports versioning and, for simplicity, dump state as a flat mapping of storage keys to storage values. Something like:

{
  "state_version": "1.0.0",
  "state": {
    "0x0000000000000000000000000000000000000000": "0x0000000000000000000000000000000000000000",
    "0x0000000000000000000000000000000000000001": "0x0000000000000000000000000000000000000002",
    "0x0000000000000000000000000000000000000003": "0x0000000000000000000000000000000000000004",
    "0x0000000000000000000000000000000000000005": "0x0000000000000000000000000000000000000006",
    "0x0000000000000000000000000000000000000007": "0x0000000000000000000000000000000000000008"
  }
}

Then merging is trivial: we just insert new keys there. Given that we don't care about proving, we probably don't care about things like initial writes etc.
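
A minimal sketch of that "just insert the keys" merge, assuming the dump has already been deserialized into memory. `StateDump` and `load_flat` are hypothetical names, and keys and values are kept as hex strings only to mirror the JSON above; the node's real storage types are not shown in this thread.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory form of the versioned flat dump proposed above.
#[derive(Clone, Debug, PartialEq)]
pub struct StateDump {
    pub state_version: String,
    pub state: BTreeMap<String, String>,
}

/// Insert every dumped key into `live`, overwriting existing entries.
/// Returns how many keys were already present and got overwritten.
pub fn load_flat(live: &mut BTreeMap<String, String>, dump: StateDump) -> usize {
    let mut overwritten = 0;
    for (key, value) in dump.state {
        // `insert` returns the previous value if the key existed.
        if live.insert(key, value).is_some() {
            overwritten += 1;
        }
    }
    overwritten
}
```

Returning the overwrite count is only a convenience for logging or tests; the merge itself is a plain insert per key.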

@popzxc
Member

popzxc commented Nov 29, 2024

If it's easily doable, we can make it a bit more complex so that people can actually read the state, e.g.:

{
  "state_version": "1.0.0",
  "state": {
    // Key is the address that owns the storage. It doesn't mean anything for the loading process, it's just a hint for people:
    // e.g. if they want to manually edit the state file, they can look for the address of a particular contract
    // and figure out the relevant storage slot
    "0x0000000000000000000000000000000000000000": {
      "0x0000000000000000000000000000000000000000": "0x0000000000000000000000000000000000000000",
      "0x0000000000000000000000000000000000000001": "0x0000000000000000000000000000000000000002"
    },
    "0x0000000000000000000000000000000000000002": {
      "0x0000000000000000000000000000000000000003": "0x0000000000000000000000000000000000000004",
      "0x0000000000000000000000000000000000000005": "0x0000000000000000000000000000000000000006",
      "0x0000000000000000000000000000000000000007": "0x0000000000000000000000000000000000000008"
    }
  }
}
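
Since the owning address is described as only a hint for readers, loading such a file could reduce to flattening it back into the flat key-value form. A rough sketch under that assumption follows; `Slots` and `flatten` are hypothetical names, and the sketch assumes the inner keys are already the node's flat storage keys.

```rust
use std::collections::BTreeMap;

/// Storage key -> storage value, both kept as hex strings for illustration.
pub type Slots = BTreeMap<String, String>;

/// Flatten address-grouped storage into a single key/value map.
/// The address is ignored on load; it only helps people read the file.
pub fn flatten(grouped: BTreeMap<String, Slots>) -> Slots {
    let mut flat = Slots::new();
    // Groups are visited in ascending address order.
    for (_address, slots) in grouped {
        for (key, value) in slots {
            // If the same key appears under two addresses, the later group wins.
            flat.insert(key, value);
        }
    }
    flat
}
```

A real loader might prefer to reject a file in which one key appears under two addresses instead of silently picking one.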

@itegulov
Contributor Author

Anvil state includes bytecode, right? So I believe it's meaningless for us until we have ZK OS.

Ah, good point!

Then merging is trivial: we just insert new keys there. Given that we don't care about proving, we probably don't care about things like initial writes etc.

Correct me if I am wrong, but some parts of the system contracts' storage have information about things like the current block, batch and potentially other things.

Let's say we dump a node with 100 blocks and try to load it into another node with 1000 blocks: first 100 blocks will be overwritten as expected but also system contract's block/batch number will be overwritten as well. Nothing unsolvable of course, we just can't do merging in the most naive way I think.
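
One way to avoid the most naive merge, sketched purely as an illustration: skip a caller-supplied set of keys during load. Which keys (if any) would need this treatment is an open question in this thread, so `merge_except` and the `protected` set are hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Merge `incoming` into `live`, but never touch keys listed in `protected`.
/// Returns the keys that were skipped so the caller can report them.
pub fn merge_except(
    live: &mut BTreeMap<String, String>,
    incoming: BTreeMap<String, String>,
    protected: &BTreeSet<String>,
) -> Vec<String> {
    let mut skipped = Vec::new();
    for (key, value) in incoming {
        if protected.contains(&key) {
            // Keep the target node's own value for this key.
            skipped.push(key);
        } else {
            live.insert(key, value);
        }
    }
    skipped
}
```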

If it's easily doable, we can make it a bit more complex so that people can actually read the state:

That would be ideal, and I think this is achievable with some effort, but it would need a lot of refactoring on the storage layer.

@popzxc
Member

popzxc commented Nov 29, 2024

Correct me if I am wrong but some parts of system contract's storage has information about things like current block, batch and potentially other things.

I may be completely wrong here, but from what I remember, we load this information right into the bootloader memory when executing a batch (e.g. things like block.timestamp are read from the VM memory, not from the contract). We obviously need to check it more thoroughly, but I think that things shouldn't be majorly broken.

One thing that is stored in system contracts is nonces, but by looking at the anvil state example you provided, I believe nonces can be overridden in anvil too, so it's not that big of an issue.

(but again -- I may be terribly wrong)

That would be ideal, and I think this is achievable with some effort, but it would need a lot of refactoring on the storage layer.

Makes sense. Well, I guess with versioning supported for storage format it shouldn't be an issue to add it later.
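
As a rough sketch of how a `state_version` field could gate loading: the policy below (same major version, dump minor not newer than what the node supports) is an assumption for illustration, not something decided in this thread, and `parse_version` and `is_loadable` are hypothetical helpers.

```rust
/// Parse "MAJOR.MINOR.PATCH" into numeric parts; `None` if malformed.
pub fn parse_version(s: &str) -> Option<(u64, u64, u64)> {
    let mut parts = s.split('.');
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    let patch = parts.next()?.parse().ok()?;
    // Reject trailing components such as "1.0.0.1".
    if parts.next().is_some() {
        return None;
    }
    Some((major, minor, patch))
}

/// Assumed policy: a dump is loadable when its major version matches and
/// its minor version is not newer than the one this node supports.
pub fn is_loadable(dump_version: &str, supported: (u64, u64, u64)) -> bool {
    match parse_version(dump_version) {
        Some((major, minor, _patch)) => major == supported.0 && minor <= supported.1,
        None => false,
    }
}
```

Under this policy, a later format that groups storage by address could ship as a new version while older dumps remain loadable.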
