Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API-level proxy detection and implementation resolution #1655

Closed
5 of 6 tasks
manuelwedler opened this issue Sep 23, 2024 · 24 comments · Fixed by #1711
Closed
5 of 6 tasks

API-level proxy detection and implementation resolution #1655

manuelwedler opened this issue Sep 23, 2024 · 24 comments · Fixed by #1711
Assignees

Comments

@manuelwedler
Copy link
Collaborator

manuelwedler commented Sep 23, 2024

As it has been a feature that was requested frequently, we decided to implement proxy detection and resolution of the implementation contract as an API-level functionality. This is made possible by the whatsabi library which detects major proxy styles by only looking at the bytecode. As a first proof of concept we implement this in API v1 before we incorporate it in the API v2 design.

Spec

On a request on the check-all-by-addresses endpoint, the bytecode is loaded from the db as before. We use the bytecode to call whatsabi.autoload. This will detect if the contract is a proxy and which type of proxy. It works without the need of rpc calls. If the address is a proxy we resolve the implementation address. This will make use of an rpc call depending on the type of proxy.

The response of the API should look like this then:

[
  {
    "address": "0x6F1D75a53a8805DcA5347aE5F3eDE035CAE3CBC1",
    "chainIds": [
      {
        "chainId": "5",
        "status": "perfect",
        "isProxy": true,
        "proxyType": "EIP1967Proxy",
        "implementations": [
            {
                address: "0x751D7C0Cf91a9b7704541b44E5fF7BeC3D2caA6F",
                name: "Logic contract"
            }
         ]
      }
    ]
  }
]

If a contract is not verified, we don't have the bytecode. In this case we can let whatsabi fetch the bytecode as a fallback, and show the proxy status even if a contract is not verified. This is necessary to also support Minimal Proxies (#1624).

ToDos

@kuzdogan
Copy link
Member

bytecode is loaded from the db as before

Why was the bytecode being loaded? Afaik we don't return the bytecode in the checkBy.. responses.

I'm thinking if it makes sense to save the proxy information in the db, possibly in the sourcify_matches to not modify the VerA schema. This could be an isProxy and a proxyType. I am guessing deriving the implementation address from the bytecode for an EIP1967Proxy is trivial, right? So saving the implementation address wouldn't be worth it.

In general, in APIv1 or v2 this option to get the proxy should be disabled by default and should be optional. If it's enabled by default it will result in an RPC call for every check... request.

@manuelwedler
Copy link
Collaborator Author

manuelwedler commented Sep 24, 2024

Why was the bytecode being loaded? Afaik we don't return the bytecode in the checkBy.. responses.

You are right. The bytecode isn't loaded at the moment. But we would need to if we want to save on the RPC request here.

I am guessing deriving the implementation address from the bytecode for an EIP1967Proxy is trivial, right? So saving the implementation address wouldn't be worth it.

Yes it is at a fixed storage slot, but still you need to resolve. We would save the RPC request if we store the implementation address. I actually think it is more worth storing the implementation address than the other info for this reason. Detecting a proxy can be done without RPC calls.

Edit: Thinking about this again, and we should definitely not store the implementation address. The reason for implementing the resolution on-the-fly is that we don't index the chain and don't know when to update the implementation address.

In general, in APIv1 or v2 this option to get the proxy should be disabled by default and should be optional. If it's enabled by default it will result in an RPC call for every check... request.

If we have the bytecode it won't result in an RPC call for every request, only if we detect that it is a proxy. But I agree that this extra complexity shouldn't be the default. So making it optional is good.

@kuzdogan
Copy link
Member

Yes it is at a fixed storage slot, but still you need to resolve. We would save the RPC request if we store the implementation address. I actually think it is more worth storing the implementation address than the other info for this reason. Detecting a proxy can be done without RPC calls.

Edit: Thinking about this again, and we should definitely not store the implementation address. The reason for implementing the resolution on-the-fly is that we don't index the chain and don't know when to update the implementation address.

I actually wanted to refer to Minimal Proxies (ERC-1167) here so not EIP1967, my bad. So, is it trivial to derive the implementation address?

@manuelwedler
Copy link
Collaborator Author

Yes, for Minimal Proxies this is true. No RPC call needed.

@manuelwedler
Copy link
Collaborator Author

To summarize from the conversation above:

  • We return this feature optionally with some API parameter
  • I think we should load the bytecode from the DB for running the detection on it

Anything else we should consider before implementing?

@kuzdogan
Copy link
Member

To be clear we already want to add this to APIv1 as in the first message, right? In that case we should be able to easily turn it on/off for the proof of concept.

Also, we are not saving the implementation address in the DB for sure, but do we want to save the info on whether this contract is a proxy and what type of proxy it is?

Finally, can you run a quick analysis on how many additional RPC calls this would cause? Like, how many requests we have now, estimate some percentage for proxy contracts (maybe look at #1632 ?), and have a rough daily request count. So that we can estimate the potential cost impact of this

@manuelwedler
Copy link
Collaborator Author

Also, we are not saving the implementation address in the DB for sure, but do we want to save the info on whether this contract is a proxy and what type of proxy it is?

I would rather not do it. It seems not too complex to me in terms of computation to detect a proxy. But to prove this, I can run some tests / measurements. Then we can decide if it is worth storing.

@marcocastignoli
Copy link
Member

I'm thinking from the prospective of someone using our exports, if a contract is a proxy and which type is a valuable information. It makes sense to store the proxy type in the database so that others don't have to recalculate it.

@manuelwedler
Copy link
Collaborator Author

That's a very good point. The detection should then just be integrated into the verification process IMO.

How about implementing this as simple first imlementation that runs the detection when the API is called, and add the DB functionality in another issue? The DB change will also need some migration process again.

@manuelwedler
Copy link
Collaborator Author

To gather some statistics, I ran a script detecting proxies over the complete production database. The script is using the whatsabi library.

Number of proxies in Sourcify's DB

Number of proxies detected: 2,207,547 (of 5,471,192 analysed matches)

The number of proxies is quite high (40% of all contracts). This makes sense when you have a look at #1632.

Proxy types

Here are the numbers of individual types of proxies that were detected:

{
    "EIP1967Proxy": 530980,
    "GnosisSafeProxy": 1907106,
    "DiamondProxy": 3318,
    "PROXIABLEProxy": 1502,
    "ZeppelinOSProxy": 942,
    "FixedProxy": 5942,
    "SequenceWalletProxy": 28
  }

Note that there is a small portion of FixedProxy contracts. We called these Minimal Proxies in another issue (#1624). It is possible to resolve these contracts without an rpc call as the implementation address is at a fixed position in the bytecode. The surprising thing is that we have them in our database in general, because the original ERC-1167 standard does not use any Solidity source code which we could verify.

Estimate of RPC calls

We currently have roughly 18,000 daily requests to the check-by-* API endpoints. Assuming the proxies are equally distributed among the requested contracts, we would have at least 7,243 (((2207547-5942) / 5471192) * 18000) additional RPC calls per day, if we would enable this features by default.

Multi proxy contracts

The whatsabi library can return multiple proxies for a provided bytecode, because a contract can potentially implement multiple proxy standards. I found a lot of contracts for which actually multiple proxies are returned. There were 206,246 such contracts while the number of returned proxies ranged from 2 to 31. I still need to find out if these resolve to the same implementation addresses and if some are a problem of the whatsabi library. I definitely also saw a few returning multiple proxy standards, while many returned the same standard multiple times.

Performance

I also measured the time it took to run the detection per bytecode. I found a median of 0.028 millisecond for running the detection algorithm on my machine. So the only bottleneck of this feature can be the rpc calls for resolving the implementation address.

@kuzdogan
Copy link
Member

kuzdogan commented Oct 8, 2024

That is a very insightful breakdown, thanks!

7k RPC calls for 18k current ones is not trivial but feels ok. I also suspect /check-by- is not the only endpoint people use to get contract files but also /files/any as well as /repository/contracts/....

The surprising thing is that we have them in our database in general, because the original ERC-1167 standard does not use any Solidity source code which we could verify.

Yeah very interesting. What source code are they associated with? Maybe an assembly Solidity code that compiles to the exact same bytecode?

@manuelwedler
Copy link
Collaborator Author

I pushed my script to here: https://github.com/sourcifyeth/data-analysis-scripts

Will probably add another script, for analysing the multi proxy contracts further. The issue with them is we might not be able to just return one implementation address for them from our API. For DiamonProxy contracts multiple implemenations are expected because they can map every function signature to another implementation.

Multiple implementation addresses would also mean multiple RPC calls, and therefore I would not like to return multiple addresses. In such a case we can maybe just return "implementationAddress": "multi"

@manuelwedler
Copy link
Collaborator Author

Yeah very interesting. What source code are they associated with? Maybe an assembly Solidity code that compiles to the exact same bytecode?

Regarding that, there are definitely proxies that have source code but have a fixed implementation address for example taken in the constructor. See this as an example: https://sepolia.etherscan.io/address/0x2f4eeccbe817e2b9f66e8123387aa81bae08dfec

@kuzdogan
Copy link
Member

Multiple implementation addresses would also mean multiple RPC calls, and therefore I would not like to return multiple addresses. In such a case we can maybe just return "implementationAddress": "multi"

How does other block explorers handle multiple proxies? Because normally they have an implementationAddress and that returns a single address.

Yeah very interesting. What source code are they associated with? Maybe an assembly Solidity code that compiles to the exact same bytecode?

Regarding that, there are definitely proxies that have source code but have a fixed implementation address for example taken in the constructor. See this as an example: https://sepolia.etherscan.io/address/0x2f4eeccbe817e2b9f66e8123387aa81bae08dfec

So does WhatsABI classify this example and the ERC-1176 Minimal Proxy both as FixedProxy? IMO there should be a separate class for eip1167.

@manuelwedler
Copy link
Collaborator Author

manuelwedler commented Oct 10, 2024

How does other block explorers handle multiple proxies? Because normally they have an implementationAddress and that returns a single address.

Good question. I will check Etherscan while I check the data a bit more.

So does WhatsABI classify this example and the ERC-1176 Minimal Proxy both as FixedProxy? IMO there should be a separate class for eip1167.

To my understanding, yes. But let me try it out actually

Edit: I tried it with https://optimistic.etherscan.io/address/0x6c5aDf5eDd2C77b63d6Fed3851c7C30E3f6D3C2b
I can confirm that ERC-1167 Minimal Proxy contracts are detected as a FixedProxy by whatsabi.

@manuelwedler
Copy link
Collaborator Author

How does other block explorers handle multiple proxies? Because normally they have an implementationAddress and that returns a single address.

So what I found regarding DiamonProxy (ERC-2535) contracts:

Etherscan

Etherscan does not support them. Either they were not shown as proxy and also not verifiable as proxy, or they showed just one implementation address of many. In the second case, it was also not possible to use the "Read as proxy" and "Write as proxy" functionalities.

A DiamondProxy on Etherscan: https://etherscan.io/address/0x69460570c93f9de5e2edbc3052bf10125f0ca22d

Blockscout

Blockscout adds a dropdown to select the implementation which you want to interact with when you want to "read/write as proxy". The API returns actually an array of implementations (also for other types of proxies):


  "implementations": [
    ...
    {
      "address": "0x4cf5935cc30Ee24A509b0a98b2E3d942168c542e",
      "name": "RangoSymbiosisFacet"
    },
    {
      "address": "0x05D7453335d24b3e73265C8F792a7d9FC8510E54",
      "name": "RangoVoyagerFacet"
    },
    {
      "address": "0x12692298958AC71ED2B695A363869298568bE414",
      "name": "RangoStargateFacet"
    },
    ...
 ]

Same contract on Blockscout: https://eth.blockscout.com/address/0x69460570c93f9DE5E2edbC3052bf10125f0Ca22d?tab=read_proxy

Louper

Louper is an Inspector for DiamonProxies. I just want to add this tool here, because I found it quite helpful. Here is the above contract on Louper: https://louper.dev/diamond/0x69460570c93f9de5e2edbc3052bf10125f0ca22d?network=mainnet#facets

@manuelwedler
Copy link
Collaborator Author

To be able to support multiple implementations, we decided to have the following field in the response:

        "implementations": [
            {
                address: "0x751D7C0Cf91a9b7704541b44E5fF7BeC3D2caA6F",
                name: "Logic contract"
            }
         ]

I also updated the issue description.

@kuzdogan
Copy link
Member

To be able to support multiple implementations, we decided to have the following field in the response:

        "implementations": [
            {
                address: "0x751D7C0Cf91a9b7704541b44E5fF7BeC3D2caA6F",
                name: "Logic contract"
            }
         ]

I also updated the issue description.

Shouldn't we also include the proxy type in the implementation or somewhere else? Because my understanding is we have isProxy but don't have the proxy type information anywhere

@manuelwedler
Copy link
Collaborator Author

Yes, in the issue description above, I already have a proxyType field.

@kuzdogan
Copy link
Member

Yes but this should also be in the API response, no?

@manuelwedler
Copy link
Collaborator Author

That code field in the description is how I designed the API response. Or what do you mean? For API v2 I will propose something later.

@kuzdogan
Copy link
Member

Ok I missed the proxyType field. So the APIv1 response looks like this?

[
  {
    "address": "0x6F1D75a53a8805DcA5347aE5F3eDE035CAE3CBC1",
    "chainIds": [
      {
        "chainId": "5",
        "status": "perfect",
        "isProxy": true,
        "proxyType": "EIP1967Proxy",
        "implementations": [
            {
                address: "0x751D7C0Cf91a9b7704541b44E5fF7BeC3D2caA6F",
                name: "Logic contract"
            }
         ]
      }
    ]
  }
]

My only question would be, there can only be one proxyType but multiple implementations, right?

@kuzdogan
Copy link
Member

One main use case that I'm not sure if we considered is the following.

I was trying to verify the contract https://sepolia.etherscan.io/address/0x5a6b9d7b73e5bc32044d2ca014555dc542bebc45#code for testing traces. The contract is created by a factory and is actually a minimal proxy. The verification failed because of course the contract's bytecode is just the minimal proxy runtime bytecode. I think Sourcify should automatically handle the resolution of the implementation and marking the proxy and the implementation appropriately.

@manuelwedler
Copy link
Collaborator Author

Ok I missed the proxyType field. So the APIv1 response looks like this?

Yes exactly.

My only question would be, there can only be one proxyType but multiple implementations, right?

Yes. Still, I saw some cases where the library returned multiple kinds of proxies, but these were rather a limitation of the library / bug. I could work around that, so there will definitely be only one proxyType for one contract. See this issue for further reference. The only case when we would have multiple implementations would be DiamondProxies.

I was trying to verify the contract https://sepolia.etherscan.io/address/0x5a6b9d7b73e5bc32044d2ca014555dc542bebc45#code for testing traces. The contract is created by a factory and is actually a minimal proxy. The verification failed because of course the contract's bytecode is just the minimal proxy runtime bytecode. I think Sourcify should automatically handle the resolution of the implementation and marking the proxy and the implementation appropriately.

I agree with that. I think that the proxy verification should be integrated into the verification process, and it should be possible to run our verify endpoint with minimal proxies. I propose to make this and the required db schema change with our planned verification flow refactoring.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Sprint - Done
Development

Successfully merging a pull request may close this issue.

3 participants