Gateway discovery and verification protocol #342

gusinacio · 2024-10-03T14:30:11Z

Problem statement

We have a system where indexers must manually add the aggregator endpoint for specific gateways and we'd like to facilitate the onboarding of new gateways.

But new gateways may be malicious, so we need to make sure that the cost of an attack is higher than the value an attacker can extract. We currently limit the amount an indexer can lose with the max_amount_willing_to_lose setting, but this must be scaled according to the number of queries per second. If an attacker targets big indexers, max_amount_willing_to_lose alone isn't enough to make the attack unprofitable.

Expectation proposal

We propose an off-chain agreement protocol where gateways can send a registration request and indexers can verify whether it's possible to aggregate receipts.

Gateway TAP state machine

graph TD;
    Unregistered-->Verifying;
    Unregistered-->Blocked;
    Verifying-->Blocked;
    Blocked-->Verifying;
    Verifying-->Allowed;
    Allowed-->Denied;
    Denied-->Allowed;
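
As a rough illustration, the edges of the diagram above could be encoded as a transition check. This is only a sketch: the `SenderState` enum and the `can_transition_to` method are hypothetical names, not existing indexer-rs types.

```rust
/// Sender states taken from the diagram above (type name is hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SenderState {
    Unregistered,
    Verifying,
    Blocked,
    Allowed,
    Denied,
}

impl SenderState {
    /// Returns true if the diagram has an edge from `self` to `next`.
    pub fn can_transition_to(self, next: SenderState) -> bool {
        use SenderState::*;
        matches!(
            (self, next),
            (Unregistered, Verifying)
                | (Unregistered, Blocked)
                | (Verifying, Blocked)
                | (Blocked, Verifying)
                | (Verifying, Allowed)
                | (Allowed, Denied)
                | (Denied, Allowed)
        )
    }
}
```

Rejecting every pair that is not listed keeps unexpected jumps, such as Unregistered straight to Allowed, from slipping through.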

Behaviour

Currently, our system has only two states: Allowed and Denied. Every gateway that is not in the tap_aggregator_endpoints map is denied as soon as tap-agent tries to create a SenderAccount and can't find the aggregator value.

With this proposal, we update our system to behave differently depending on which state a sender is in.

Unregistered state

This state requires the sender to include a tap-aggregator header in its first request so that we can register the sender and start aggregating. If a query sent by an unregistered sender doesn't have the header, we deny it right away with an error such as "Sender not registered".

We should aggregate the first receipt to verify that the tap-aggregator is working and that we can communicate with it. If it works, we move to the Verifying state; otherwise, we move to the Blocked state.
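
The handling of a request from an unregistered sender could look roughly like the sketch below. The `Outcome` enum, the `handle_unregistered` function and the `try_aggregate` callback are hypothetical names used for illustration; the callback stands in for whatever call aggregates the first receipt against the advertised endpoint.

```rust
/// Result of handling a query from an unregistered sender (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    /// No tap-aggregator header on the request: the query is denied.
    Rejected(&'static str),
    /// The first receipt was aggregated successfully.
    Verifying,
    /// The aggregator could not be reached or failed to aggregate.
    Blocked,
}

/// `aggregator_header` is the value of the tap-aggregator header, if present.
/// `try_aggregate` is an assumed callback that aggregates the first receipt
/// using the given endpoint.
pub fn handle_unregistered<F>(aggregator_header: Option<&str>, try_aggregate: F) -> Outcome
where
    F: FnOnce(&str) -> Result<(), String>,
{
    match aggregator_header {
        None => Outcome::Rejected("Sender not registered"),
        Some(endpoint) => match try_aggregate(endpoint) {
            Ok(()) => Outcome::Verifying,
            Err(_) => Outcome::Blocked,
        },
    }
}
```

Passing the aggregation step in as a closure keeps the decision logic testable without a live tap-aggregator.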

Verifying state

In this state, every RAV request must succeed until we reach a certain (configurable) amount, at which point we can trust the sender and transition it to the Allowed state. If any RAV request fails, we transition to the Blocked state.
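
A minimal sketch of this rule follows. It assumes the configurable "amount" is a count of consecutive successful RAV requests; the proposal does not say whether it is a count or a fee value. `State`, `on_rav_result` and `required_successes` are hypothetical names.

```rust
/// Subset of sender states relevant to verification (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Verifying { successes: u32 },
    Allowed,
    Blocked,
}

/// Applies the result of one RAV request to a sender.
/// `required_successes` is the assumed configurable threshold.
pub fn on_rav_result(state: State, rav_ok: bool, required_successes: u32) -> State {
    match state {
        // Any failure while verifying blocks the sender.
        State::Verifying { .. } if !rav_ok => State::Blocked,
        // A success counts toward the threshold.
        State::Verifying { successes } => {
            let successes = successes + 1;
            if successes >= required_successes {
                State::Allowed
            } else {
                State::Verifying { successes }
            }
        }
        // Other states are not handled by this sketch.
        other => other,
    }
}
```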

Blocked state

In this state, a retry process with backoff keeps trying to aggregate the receipts that we already hold. On a successful aggregation, we transition to the Verifying state.
We should not spend many resources trying to aggregate, so after some backoff time (configurable, with a small default such as 1 day) we stop trying.

Blocked senders can request an update to their tap-aggregator by sending another query, but that query won't be processed. When the indexer gets a new receipt, the system should tentatively try to verify the gateway again (stopping after the same backoff period).
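
One possible shape for the retry schedule is sketched below. It assumes exponential backoff (the delay doubles on each attempt), which the proposal does not specify; `BackoffConfig`, `next_retry_delay` and the 30-second base delay in the usage note are hypothetical.

```rust
use std::time::Duration;

/// Hypothetical configuration for retries while a sender is Blocked.
pub struct BackoffConfig {
    /// Delay before the first retry.
    pub base_delay: Duration,
    /// Total time after which we stop retrying (the proposal suggests about 1 day).
    pub give_up_after: Duration,
}

/// Returns the delay before retry number `attempt` (0-based), or `None`
/// once the sender has been blocked for `give_up_after` or longer.
pub fn next_retry_delay(
    cfg: &BackoffConfig,
    attempt: u32,
    blocked_for: Duration,
) -> Option<Duration> {
    if blocked_for >= cfg.give_up_after {
        return None;
    }
    // Double the delay on every attempt; the exponent is capped so the
    // multiplication cannot overflow.
    let factor = 2u32.saturating_pow(attempt.min(16));
    Some(cfg.base_delay.saturating_mul(factor))
}
```

With a base delay of 30 seconds, attempts 0 to 3 would wait 30, 60, 120 and 240 seconds. A new receipt from the sender would restart the schedule from attempt 0.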

Allowed state

This is the current normal operation: each sender has a maximum unaggregated fee that it is allowed to accumulate before being denied. If the pending fees exceed the escrow balance, or the unaggregated fee is bigger than max_amount_willing_to_lose, we transition to the Denied state.

Denied state

We already have this state in the current system. In this state, we wait for the escrow balance to update, or we retry a RAV request every 30 seconds, which lowers the unaggregated fee when it succeeds. We then transition back to the Allowed state and resume operation.
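
The condition that moves a sender between the Allowed state and this one can be written as a single check, sketched below. `SenderFees` and `should_deny` are hypothetical names, and the use of `u128` amounts in the smallest GRT unit is an assumption.

```rust
/// Fee bookkeeping for one sender (hypothetical struct; amounts are assumed
/// to be in the smallest GRT unit).
pub struct SenderFees {
    pub pending_fees: u128,
    pub escrow_balance: u128,
    pub unaggregated_fees: u128,
}

/// True when an Allowed sender should be denied. A denied sender can be
/// allowed again once this returns false, for example after an escrow
/// top-up or a successful RAV request.
pub fn should_deny(fees: &SenderFees, max_amount_willing_to_lose: u128) -> bool {
    fees.pending_fees > fees.escrow_balance
        || fees.unaggregated_fees > max_amount_willing_to_lose
}
```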

Tap Aggregator Header

For a gateway to update its tap-aggregator, it must send a receipt signed by one of its signers on the TAP contracts. The query handler already demands a receipt.

Database modification

New table responsible for storing tap-aggregator endpoints.
Upgrade the deny_list table to a sender_state table.

New error responses

We should notify the sender that it should send the header updating the tap-aggregator in the next request.
It would be nice to tell the sender whether it is denied or blocked, and why:
- Low escrow funds
- Too many pending fees (tap-aggregator not working)
- Verification failed
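
The reasons above, together with the "Sender not registered" error from the Unregistered state, could be modelled as one error type. This is a sketch; `SenderRejection` and its variant names are hypothetical, and the message strings mirror the wording used in this issue.

```rust
use std::fmt;

/// Reasons a sender's query may be rejected (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SenderRejection {
    /// The sender must send the tap-aggregator header in its next request.
    NotRegistered,
    LowEscrowFunds,
    TooManyPendingFees,
    VerificationFailed,
}

impl fmt::Display for SenderRejection {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let msg = match self {
            SenderRejection::NotRegistered => "Sender not registered",
            SenderRejection::LowEscrowFunds => "Low escrow funds",
            SenderRejection::TooManyPendingFees => "Too many pending fees",
            SenderRejection::VerificationFailed => "Verification failed",
        };
        f.write_str(msg)
    }
}
```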

Alternative considerations

Register route

Instead of sending the aggregator through a header, a dedicated /register route could be used. This would save traffic on the query handler, since the endpoint would arrive in a direct request rather than in a header on queries. The registration request would need to carry a receipt to be aggregated and verified, after which the sender moves to the Verifying state.

Tap Registry

We also have #94 as another possible solution but it requires a new contract and a new subgraph which is not worth it at the current time. Also, we'd need the same verification protocol to guarantee stability.

Synchronous communication between indexer-service and tap-agent

We use async communication by sharing the same database between those two components. If synchronous communication turns out to be necessary, we should consider using #84. This needs a deeper discussion, because it means that indexer-service would need to know the tap-agent address.


pcarranzav · 2024-10-04T20:42:05Z

I like the proposed solution, imo it's super important that new gateways (or even end consumers running their own gateways) can permissionlessly start interacting with the network without establishing trust outside the network. The proposed approach seems to achieve that in a straightforward way.

I think this is much preferable to #94 - if the registry is permissioned it precludes consumer-run gateways or new permissionless gateways, and if it's permissionless it doesn't prevent the attack, so we'd need to implement something like this anyways.

calinah · 2024-10-08T16:27:08Z

hey @gusinacio having read this proposal in detail I think it makes sense. I also favour it to a TAP registry.
This being said I do have some concerns around additional support being needed for Gateway operators as they respond to additional queries from indexers and as a gateway moves back and forth from one state to another. Presumably this support would not be an issue with a TAP registry or the other solution suggested around having a GPIA as a private body for curating a list of Gateways.
So overall in favour of this approach but not sure what other challenges may arise and it's not clear to me the level of priority we'd assign for this.

gusinacio · 2024-10-15T18:40:52Z

> This being said I do have some concerns around additional support being needed for Gateway operators as they respond to additional queries from indexers and as a gateway moves back and forth from one state to another. Presumably this support would not be an issue with a TAP registry or the other solution suggested around having a GPIA as a private body for curating a list of Gateways.

You can provide your aggregator for all requests in the headers and then you don't need to worry about it. The only drawback to it is that you are paying for traffic related to those few bytes for every request.

gusinacio added size:x-large Very large p3 Low priority type:feature New or enhanced functionality labels Oct 3, 2024

gusinacio changed the title ~~[Feat.Req] Gateway verification protocol~~ [Feat.Req] Gateway discovery and verification protocol Oct 3, 2024

gusinacio added the repo:indexer-rs label Oct 17, 2024 — with Linear

gusinacio changed the title ~~[Feat.Req] Gateway discovery and verification protocol~~ Gateway discovery and verification protocol Oct 17, 2024

gusinacio mentioned this issue Dec 19, 2024

Integrate TAP Agent with gateway registry contract #94

Closed
