API-level proxy detection and implementation resolution #1655

manuelwedler · 2024-09-23T15:59:01Z

As it has been a feature that was requested frequently, we decided to implement proxy detection and resolution of the implementation contract as an API-level functionality. This is made possible by the whatsabi library which detects major proxy styles by only looking at the bytecode. As a first proof of concept we implement this in API v1 before we incorporate it in the API v2 design.

Spec

On a request on the check-all-by-addresses endpoint, the bytecode is loaded from the db as before. We use the bytecode to call whatsabi.autoload. This will detect if the contract is a proxy and which type of proxy. It works without the need of rpc calls. If the address is a proxy we resolve the implementation address. This will make use of an rpc call depending on the type of proxy.

The response of the API should look like this then:

[
  {
    "address": "0x6F1D75a53a8805DcA5347aE5F3eDE035CAE3CBC1",
    "chainIds": [
      {
        "chainId": "5",
        "status": "perfect",
        "isProxy": true,
        "proxyType": "EIP1967Proxy",
        "implementations": [
            {
                address: "0x751D7C0Cf91a9b7704541b44E5fF7BeC3D2caA6F",
                name: "Logic contract"
            }
         ]
      }
    ]
  }
]

If a contract is not verified, we don't have the bytecode. In this case we can let whatsabi fetch the bytecode as a fallback, and show the proxy status even if a contract is not verified. This is necessary to also support Minimal Proxies (#1624).

ToDos

Estimate costs in terms of additional RPC calls (see API-level proxy detection and implementation resolution #1655 (comment))
Wait for next release of whatsabi that should include Improve devex for only resolving proxies shazow/whatsabi#120
Investigate the source code of FixedProxy contracts in our DB
Investigate multi proxy contracts and what implementations they resolve to
Decide what to return as implementation address for proxies that have multiple implementations
Implement PoC in API v1

The text was updated successfully, but these errors were encountered:

kuzdogan · 2024-09-24T06:45:03Z

bytecode is loaded from the db as before

Why was the bytecode being loaded? Afaik we don't return the bytecode in the checkBy.. responses.

I'm thinking if it makes sense to save the proxy information in the db, possibly in the sourcify_matches to not modify the VerA schema. This could be an isProxy and a proxyType. I am guessing deriving the implementation address from the bytecode for an EIP1967Proxy is trivial, right? So saving the implementation address wouldn't be worth it.

In general, in APIv1 or v2 this option to get the proxy should be disabled by default and should be optional. If it's enabled by default it will result in an RPC call for every check... request.

manuelwedler · 2024-09-24T09:09:08Z

Why was the bytecode being loaded? Afaik we don't return the bytecode in the checkBy.. responses.

You are right. The bytecode isn't loaded at the moment. But we would need to if we want to save on the RPC request here.

I am guessing deriving the implementation address from the bytecode for an EIP1967Proxy is trivial, right? So saving the implementation address wouldn't be worth it.

Yes it is at a fixed storage slot, but still you need to resolve. We would save the RPC request if we store the implementation address. ~~I actually think it is more worth storing the implementation address than the other info for this reason. Detecting a proxy can be done without RPC calls.~~

Edit: Thinking about this again, and we should definitely not store the implementation address. The reason for implementing the resolution on-the-fly is that we don't index the chain and don't know when to update the implementation address.

In general, in APIv1 or v2 this option to get the proxy should be disabled by default and should be optional. If it's enabled by default it will result in an RPC call for every check... request.

If we have the bytecode it won't result in an RPC call for every request, only if we detect that it is a proxy. But I agree that this extra complexity shouldn't be the default. So making it optional is good.

kuzdogan · 2024-09-24T09:56:21Z

Yes it is at a fixed storage slot, but still you need to resolve. We would save the RPC request if we store the implementation address. I actually think it is more worth storing the implementation address than the other info for this reason. Detecting a proxy can be done without RPC calls.

Edit: Thinking about this again, and we should definitely not store the implementation address. The reason for implementing the resolution on-the-fly is that we don't index the chain and don't know when to update the implementation address.

I actually wanted to refer to Minimal Proxies (ERC-1167) here so not EIP1967, my bad. So, is it trivial to derive the implementation address?

manuelwedler · 2024-09-24T10:24:20Z

Yes, for Minimal Proxies this is true. No RPC call needed.

manuelwedler · 2024-09-27T15:31:43Z

To summarize from the conversation above:

We return this feature optionally with some API parameter
I think we should load the bytecode from the DB for running the detection on it

Anything else we should consider before implementing?

kuzdogan · 2024-09-30T10:46:16Z

To be clear we already want to add this to APIv1 as in the first message, right? In that case we should be able to easily turn it on/off for the proof of concept.

Also, we are not saving the implementation address in the DB for sure, but do we want to save the info on whether this contract is a proxy and what type of proxy it is?

Finally, can you run a quick analysis on how many additional RPC calls this would cause? Like, how many requests we have now, estimate some percentage for proxy contracts (maybe look at #1632 ?), and have a rough daily request count. So that we can estimate the potential cost impact of this

manuelwedler · 2024-10-01T08:35:52Z

Also, we are not saving the implementation address in the DB for sure, but do we want to save the info on whether this contract is a proxy and what type of proxy it is?

I would rather not do it. It seems not too complex to me in terms of computation to detect a proxy. But to prove this, I can run some tests / measurements. Then we can decide if it is worth storing.

marcocastignoli · 2024-10-01T12:56:15Z

I'm thinking from the prospective of someone using our exports, if a contract is a proxy and which type is a valuable information. It makes sense to store the proxy type in the database so that others don't have to recalculate it.

manuelwedler · 2024-10-01T13:26:40Z

That's a very good point. The detection should then just be integrated into the verification process IMO.

How about implementing this as simple first imlementation that runs the detection when the API is called, and add the DB functionality in another issue? The DB change will also need some migration process again.

manuelwedler · 2024-10-07T16:51:04Z

To gather some statistics, I ran a script detecting proxies over the complete production database. The script is using the whatsabi library.

Number of proxies in Sourcify's DB

Number of proxies detected: 2,207,547 (of 5,471,192 analysed matches)

The number of proxies is quite high (40% of all contracts). This makes sense when you have a look at #1632.

Proxy types

Here are the numbers of individual types of proxies that were detected:

{
    "EIP1967Proxy": 530980,
    "GnosisSafeProxy": 1907106,
    "DiamondProxy": 3318,
    "PROXIABLEProxy": 1502,
    "ZeppelinOSProxy": 942,
    "FixedProxy": 5942,
    "SequenceWalletProxy": 28
  }

Note that there is a small portion of FixedProxy contracts. We called these Minimal Proxies in another issue (#1624). It is possible to resolve these contracts without an rpc call as the implementation address is at a fixed position in the bytecode. The surprising thing is that we have them in our database in general, because the original ERC-1167 standard does not use any Solidity source code which we could verify.

Estimate of RPC calls

We currently have roughly 18,000 daily requests to the check-by-* API endpoints. Assuming the proxies are equally distributed among the requested contracts, we would have at least 7,243 (((2207547-5942) / 5471192) * 18000) additional RPC calls per day, if we would enable this features by default.

Multi proxy contracts

The whatsabi library can return multiple proxies for a provided bytecode, because a contract can potentially implement multiple proxy standards. I found a lot of contracts for which actually multiple proxies are returned. There were 206,246 such contracts while the number of returned proxies ranged from 2 to 31. I still need to find out if these resolve to the same implementation addresses and if some are a problem of the whatsabi library. I definitely also saw a few returning multiple proxy standards, while many returned the same standard multiple times.

Performance

I also measured the time it took to run the detection per bytecode. I found a median of 0.028 millisecond for running the detection algorithm on my machine. So the only bottleneck of this feature can be the rpc calls for resolving the implementation address.

kuzdogan · 2024-10-08T09:09:43Z

That is a very insightful breakdown, thanks!

7k RPC calls for 18k current ones is not trivial but feels ok. I also suspect /check-by- is not the only endpoint people use to get contract files but also /files/any as well as /repository/contracts/....

The surprising thing is that we have them in our database in general, because the original ERC-1167 standard does not use any Solidity source code which we could verify.

Yeah very interesting. What source code are they associated with? Maybe an assembly Solidity code that compiles to the exact same bytecode?

manuelwedler · 2024-10-09T10:16:14Z

I pushed my script to here: https://github.com/sourcifyeth/data-analysis-scripts

Will probably add another script, for analysing the multi proxy contracts further. The issue with them is we might not be able to just return one implementation address for them from our API. For DiamonProxy contracts multiple implemenations are expected because they can map every function signature to another implementation.

Multiple implementation addresses would also mean multiple RPC calls, and therefore I would not like to return multiple addresses. In such a case we can maybe just return "implementationAddress": "multi"

manuelwedler · 2024-10-09T13:52:36Z

Yeah very interesting. What source code are they associated with? Maybe an assembly Solidity code that compiles to the exact same bytecode?

Regarding that, there are definitely proxies that have source code but have a fixed implementation address for example taken in the constructor. See this as an example: https://sepolia.etherscan.io/address/0x2f4eeccbe817e2b9f66e8123387aa81bae08dfec

kuzdogan · 2024-10-10T06:59:00Z

Multiple implementation addresses would also mean multiple RPC calls, and therefore I would not like to return multiple addresses. In such a case we can maybe just return "implementationAddress": "multi"

How does other block explorers handle multiple proxies? Because normally they have an implementationAddress and that returns a single address.

Yeah very interesting. What source code are they associated with? Maybe an assembly Solidity code that compiles to the exact same bytecode?

Regarding that, there are definitely proxies that have source code but have a fixed implementation address for example taken in the constructor. See this as an example: https://sepolia.etherscan.io/address/0x2f4eeccbe817e2b9f66e8123387aa81bae08dfec

So does WhatsABI classify this example and the ERC-1176 Minimal Proxy both as FixedProxy? IMO there should be a separate class for eip1167.

manuelwedler · 2024-10-10T08:57:26Z

How does other block explorers handle multiple proxies? Because normally they have an implementationAddress and that returns a single address.

Good question. I will check Etherscan while I check the data a bit more.

So does WhatsABI classify this example and the ERC-1176 Minimal Proxy both as FixedProxy? IMO there should be a separate class for eip1167.

To my understanding, yes. But let me try it out actually

Edit: I tried it with https://optimistic.etherscan.io/address/0x6c5aDf5eDd2C77b63d6Fed3851c7C30E3f6D3C2b
I can confirm that ERC-1167 Minimal Proxy contracts are detected as a FixedProxy by whatsabi.

manuelwedler · 2024-10-10T15:57:18Z

How does other block explorers handle multiple proxies? Because normally they have an implementationAddress and that returns a single address.

So what I found regarding DiamonProxy (ERC-2535) contracts:

Etherscan

Etherscan does not support them. Either they were not shown as proxy and also not verifiable as proxy, or they showed just one implementation address of many. In the second case, it was also not possible to use the "Read as proxy" and "Write as proxy" functionalities.

A DiamondProxy on Etherscan: https://etherscan.io/address/0x69460570c93f9de5e2edbc3052bf10125f0ca22d

Blockscout

Blockscout adds a dropdown to select the implementation which you want to interact with when you want to "read/write as proxy". The API returns actually an array of implementations (also for other types of proxies):


  "implementations": [
    ...
    {
      "address": "0x4cf5935cc30Ee24A509b0a98b2E3d942168c542e",
      "name": "RangoSymbiosisFacet"
    },
    {
      "address": "0x05D7453335d24b3e73265C8F792a7d9FC8510E54",
      "name": "RangoVoyagerFacet"
    },
    {
      "address": "0x12692298958AC71ED2B695A363869298568bE414",
      "name": "RangoStargateFacet"
    },
    ...
 ]

Same contract on Blockscout: https://eth.blockscout.com/address/0x69460570c93f9DE5E2edbC3052bf10125f0Ca22d?tab=read_proxy

Louper

Louper is an Inspector for DiamonProxies. I just want to add this tool here, because I found it quite helpful. Here is the above contract on Louper: https://louper.dev/diamond/0x69460570c93f9de5e2edbc3052bf10125f0ca22d?network=mainnet#facets

manuelwedler · 2024-10-15T10:57:25Z

To be able to support multiple implementations, we decided to have the following field in the response:

        "implementations": [
            {
                address: "0x751D7C0Cf91a9b7704541b44E5fF7BeC3D2caA6F",
                name: "Logic contract"
            }
         ]

I also updated the issue description.

kuzdogan · 2024-10-15T13:48:59Z

To be able to support multiple implementations, we decided to have the following field in the response:
        "implementations": [
            {
                address: "0x751D7C0Cf91a9b7704541b44E5fF7BeC3D2caA6F",
                name: "Logic contract"
            }
         ]
I also updated the issue description.

Shouldn't we also include the proxy type in the implementation or somewhere else? Because my understanding is we have isProxy but don't have the proxy type information anywhere

manuelwedler · 2024-10-16T07:27:17Z

Yes, in the issue description above, I already have a proxyType field.

kuzdogan · 2024-10-16T17:00:48Z

Yes but this should also be in the API response, no?

manuelwedler · 2024-10-17T07:44:58Z

That code field in the description is how I designed the API response. Or what do you mean? For API v2 I will propose something later.

kuzdogan · 2024-10-21T07:39:53Z

Ok I missed the proxyType field. So the APIv1 response looks like this?

[
  {
    "address": "0x6F1D75a53a8805DcA5347aE5F3eDE035CAE3CBC1",
    "chainIds": [
      {
        "chainId": "5",
        "status": "perfect",
        "isProxy": true,
        "proxyType": "EIP1967Proxy",
        "implementations": [
            {
                address: "0x751D7C0Cf91a9b7704541b44E5fF7BeC3D2caA6F",
                name: "Logic contract"
            }
         ]
      }
    ]
  }
]

My only question would be, there can only be one proxyType but multiple implementations, right?

kuzdogan · 2024-10-21T11:08:19Z

One main use case that I'm not sure if we considered is the following.

I was trying to verify the contract https://sepolia.etherscan.io/address/0x5a6b9d7b73e5bc32044d2ca014555dc542bebc45#code for testing traces. The contract is created by a factory and is actually a minimal proxy. The verification failed because of course the contract's bytecode is just the minimal proxy runtime bytecode. I think Sourcify should automatically handle the resolution of the implementation and marking the proxy and the implementation appropriately.

manuelwedler · 2024-10-21T16:08:01Z

Ok I missed the proxyType field. So the APIv1 response looks like this?

Yes exactly.

My only question would be, there can only be one proxyType but multiple implementations, right?

Yes. Still, I saw some cases where the library returned multiple kinds of proxies, but these were rather a limitation of the library / bug. I could work around that, so there will definitely be only one proxyType for one contract. See this issue for further reference. The only case when we would have multiple implementations would be DiamondProxies.

I was trying to verify the contract https://sepolia.etherscan.io/address/0x5a6b9d7b73e5bc32044d2ca014555dc542bebc45#code for testing traces. The contract is created by a factory and is actually a minimal proxy. The verification failed because of course the contract's bytecode is just the minimal proxy runtime bytecode. I think Sourcify should automatically handle the resolution of the implementation and marking the proxy and the implementation appropriately.

I agree with that. I think that the proxy verification should be integrated into the verification process, and it should be possible to run our verify endpoint with minimal proxies. I propose to make this and the required db schema change with our planned verification flow refactoring.

github-project-automation bot added this to Sourcify Public Sep 23, 2024

github-project-automation bot moved this to Triage in Sourcify Public Sep 23, 2024

manuelwedler mentioned this issue Sep 23, 2024

Feature Request: Support for ERC-1967 Compliant Proxy Contracts Registration and Lookup #1336

Open

kuzdogan mentioned this issue Sep 24, 2024

Gather APIv2 Feedback and Iterate #1646

Closed

4 tasks

kuzdogan assigned manuelwedler Sep 25, 2024

shazow mentioned this issue Sep 30, 2024

autoload: AutoloadResult.ContractResult shazow/whatsabi#131

Merged

kuzdogan mentioned this issue Oct 15, 2024

Incorporate and finalize the proxy format to the APIv2 response #1701

Closed

This was referenced Oct 22, 2024

Add API-level proxy detection and implementation resolution #1711

Merged

Integrate proxy detection into the verification flow #1712

Open

manuelwedler closed this as completed in #1711 Nov 27, 2024

github-project-automation bot moved this from Sprint - Needs Review to Sprint - Done in Sourcify Public Nov 27, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

API-level proxy detection and implementation resolution #1655

API-level proxy detection and implementation resolution #1655

manuelwedler commented Sep 23, 2024 •

edited

Loading

kuzdogan commented Sep 24, 2024

manuelwedler commented Sep 24, 2024 •

edited

Loading

kuzdogan commented Sep 24, 2024

manuelwedler commented Sep 24, 2024

manuelwedler commented Sep 27, 2024

kuzdogan commented Sep 30, 2024

manuelwedler commented Oct 1, 2024

marcocastignoli commented Oct 1, 2024

manuelwedler commented Oct 1, 2024

manuelwedler commented Oct 7, 2024

kuzdogan commented Oct 8, 2024

manuelwedler commented Oct 9, 2024

manuelwedler commented Oct 9, 2024

kuzdogan commented Oct 10, 2024

manuelwedler commented Oct 10, 2024 •

edited

Loading

manuelwedler commented Oct 10, 2024

manuelwedler commented Oct 15, 2024

kuzdogan commented Oct 15, 2024

manuelwedler commented Oct 16, 2024

kuzdogan commented Oct 16, 2024

manuelwedler commented Oct 17, 2024

kuzdogan commented Oct 21, 2024

kuzdogan commented Oct 21, 2024

manuelwedler commented Oct 21, 2024

API-level proxy detection and implementation resolution #1655

API-level proxy detection and implementation resolution #1655

Comments

manuelwedler commented Sep 23, 2024 • edited Loading

Spec

ToDos

kuzdogan commented Sep 24, 2024

manuelwedler commented Sep 24, 2024 • edited Loading

kuzdogan commented Sep 24, 2024

manuelwedler commented Sep 24, 2024

manuelwedler commented Sep 27, 2024

kuzdogan commented Sep 30, 2024

manuelwedler commented Oct 1, 2024

marcocastignoli commented Oct 1, 2024

manuelwedler commented Oct 1, 2024

manuelwedler commented Oct 7, 2024

Number of proxies in Sourcify's DB

Proxy types

Estimate of RPC calls

Multi proxy contracts

Performance

kuzdogan commented Oct 8, 2024

manuelwedler commented Oct 9, 2024

manuelwedler commented Oct 9, 2024

kuzdogan commented Oct 10, 2024

manuelwedler commented Oct 10, 2024 • edited Loading

manuelwedler commented Oct 10, 2024

Etherscan

Blockscout

Louper

manuelwedler commented Oct 15, 2024

kuzdogan commented Oct 15, 2024

manuelwedler commented Oct 16, 2024

kuzdogan commented Oct 16, 2024

manuelwedler commented Oct 17, 2024

kuzdogan commented Oct 21, 2024

kuzdogan commented Oct 21, 2024

manuelwedler commented Oct 21, 2024

manuelwedler commented Sep 23, 2024 •

edited

Loading

manuelwedler commented Sep 24, 2024 •

edited

Loading

manuelwedler commented Oct 10, 2024 •

edited

Loading