feat: Implement basic circuit breaker in autopilot #2667

fleupold · 2024-04-26T13:01:35Z

Problem

Solvers that are running their own driver, could theoretically submit solutions outside of the competition (simply calling settle with a bunch of signed orders). In this case the duration for which a malicious solver can continue this behavior is critical for the damage that gets accrued (the more orders settled at limit price, the higher the foregone surplus).

It's therefore important for the protocol to run an automated and diverse set of "watch dogs" that monitor misbehaviour and "circuit break" a malicious solvers from continuing damage.

Alternatives considered

Fully rely on external systems (e.g. the EBBO check, a new standalone circuit breaker tool) for this task. However, we believe that having multiple independent implementations of this crucial logic is important to ensure robust workings of the system. Moreover, the autopilot is still taking a central role in arbitrating auctions and thus is in a good position to enforce and potentially halt the issuance of new auctions in case something goes wrong.

We also already have reliable and reorg-resistant indexing of all settlements, which should™️ make adding this logic a fairly contained task (compared to building a new standalone system from scratch). This is not to say that we shouldn't have multiple independent services that can perform this task in the future.

Additional context

Since the barn and production environment operate on separate auction but share the same settlement contract instance, we probably cannot simply audit all settlement events that are emitted and ban if a settlement is emitted "out of competition". If we did, the prod autopilot would have to check the barn autopilot if a settlement satisfied its competition and vice versa. Moreover, there are some legit out of competition settlements (e.g. for periodic fee withdrawals). I'd suggest configuring each driver with a set of allow-listed solver addresses it tracks settlements for (and ignores all other settlements). It's then up to the autopilot configuration to make sure all allow-listed solver addresses are covered.
Monitoring all settlements and querying both the production and barn orderbook API can be left for the EBBO circuit breaker.

The current allow list contract is located at https://etherscan.io/address/0x2c4c28DDBdAc9C5E7055b4C863b72eA0149D8aFE with https://app.safe.global/balances?safe=eth:0xA03be496e67Ec29bC62F01a428683D7F9c204930 being the owner.

With the help of Zodiac, we can enable a Roles Modifier Module on the Safe that allows an arbitrary set of accounts to perform specific operations on behalf of the Safe without requiring additional signatures for them signers. This tx creates a role allowing a single EOA to remove solvers from the allow list and this tx executes a removal on behalf of the multi-sig on Sepolia. The latter is what we would expect the autopilot to perform in case wrong-doing is detected.

Acceptance criteria

Every driver that is available in the autopilot should be registered with their public address for solution submission. For those addresses, whenever a settlement is observed the autopilot checks:

The settlement references an auction ID
The address that submitted the settlement won this auction
Exactly the orders that were referenced in the winning solution got settled and matched at the prices indicated in the solution
Pre and post interactions found in the settlement are a subset of the pre/post interactions of the matched orders

If any of those checks fails, the autopilot should send a transaction to remove the affected solver from the allow list and send an alert.

The text was updated successfully, but these errors were encountered:

sunce86 · 2024-05-09T08:52:20Z

I'd suggest configuring each driver with a set of allow-listed solver addresses it tracks settlements for (and ignores all other settlements)

Instead I would do the following:

During competition time, when we have current auction available, consider only solutions with orders that either

are part of the auction
not part of the auction but also not existent in the db (JIT)

During competition, for winning solution, save Quality to db so it can be checked in postprocessing. Quality means EncodedSettlement metadata saved to db (tokens, clearing prices, trades). Instead of saving everything we could hash it and compare hashes later (I think we can assume solvers won't deliver better solution than promised because why would they do that instead of pocketing the diff themselves).

This way we make sure both checks are true: solver did not settle orders outside competition but also with promised quality.

MartinquaXD · 2024-05-09T09:47:59Z

Instead of saving everything we could hash it and compare hashes later (I think we can assume solvers won't deliver better solution than promised because why would they do that instead of pocketing the diff themselves).

I'm worried that using just hashes is too unstable. I think given that settlements are usually not very big we should start by storing all necessary data unhashed and only later optimize for space if this actually becomes a concern.
Also when there is a discrepancy between the executed and expected value it would already be easier to analyse what went wrong which should help with recovering damages.

sunce86 · 2024-05-09T13:06:15Z

In that case, if we don't want to hash but instead save full metadata, I would suggest saving dto representation of domain settlement object: https://github.com/cowprotocol/services/blob/main/crates/autopilot/src/domain/settlement/mod.rs#L22. This object contains subset of encoded settlement data that autopilot is interested in: it can derive surplus/fee/score based on it. If we ever add a new field autopilot is interested in, it will be part of this struct.

This object is also easily recoverable from plain calldata that is provided by /reveal/ driver endpoint.

Edit: we have a settlement_call_data database table where we save calldata of the winning solution, so we might as well fetch it from there (and convert to Settlement) instead of saving individual fields.

fleupold · 2024-05-10T09:23:42Z

I'd suggest configuring each driver with a set of allow-listed solver addresses it tracks settlements for (and ignores all other settlements)

Instead I would do the following: [...]
This way we make sure both checks are true: solver did not settle orders outside competition but also with promised quality.

I still think we need to configure each autopilot with a list of solver addresses it needs to "observe" as otherwise a solver could just return an empty solution and still settle out of competition (without specifying an auction ID). By only adding solver addresses to the on-chain allow list that are tracked by at least one autopilot (barn vs. prod) we make sure every solver is being observed.

All of the promised solution data is already stored in the competitions table (we may have to refactor this one to be more "stable" as it was considered a pure debugging table in the past).

Wrt /reveal I think we can even consider deleting this endpoint altogether. It was used to ensure we have the uninternalised calldata of the winning solution (to make sure buffer trading is "ok"), but since recently we treat buffer trading as slippage and thus don't require proof that the trading was indeed possible. The only advantage I see from /reveal is that it could simplify implementing cowprotocol/contracts#84 by simply signing the entire calldata (instead of creating a new struct), but I'm not sure the extra round trip is worth it.

sunce86 · 2024-05-10T10:06:30Z

I still think we need to configure each autopilot with a list of solver addresses it needs to "observe" as otherwise a solver could just return an empty solution and still settle out of competition (without specifying an auction ID). By only adding solver addresses to the on-chain allow list that are tracked by at least one autopilot (barn vs. prod) we make sure every solver is being observed.

I still don't understand why we need this configuration. Can you elaborate this case further as I don't see how we would be unable to detect it?

without specifying an auction ID

This is a slashable violation itself. We would call removeSolver on this violation.

fleupold · 2024-05-10T11:08:02Z

I still don't understand why we need this configuration. Can you elaborate this case further as I don't see how we would be unable to detect it?

We are running two autopilots, barn and prod. How would the barn autopilot be able to "check" that a settlement of the prod autopilot is ok (it requires access to the prod db or API)? Therefore it must either ignore settlements from a certain list of addresses (knowing that some other component checks it) or only focus on settlements coming from a certain list of addresses. As long as we make sure all addresses that are allow listed to solve are covered by at least one autopilot this is fine (we can have another more generalized backup circuit breaker that the solver team is working on which checks all settlements based on API access).

Note, that there are also fee withdrawals which create Settlement events and shouldn't lead to removals, e.g.: https://etherscan.io/tx/0x54f65d217e31cc07d759f3c23b70c564bd906bad942f4b813ce56971cc6854d7/advanced#eventlog

sunce86 · 2024-05-14T15:13:50Z

Pre and post interactions found in the settlement are a subset of the pre/post interactions of the matched orders

Pre/post interactions are not a part of promised solution (solution returned on /solve during competition). Does this mean we need to add interactions to this response? Is that what we want to compare here? Or do we want to compare with pre/post interactions in the order database table?

fleupold · 2024-05-14T18:27:51Z

Correct, pre and post interactions are available to the autopilot via the user order. In the first phase (where we are simply circuit breaking whenever we see a bad settlement) we should be able to assert that those interactions were part of the final settlement.

Later, when we want to sign off on the solution things may be a bit more complicated (the autopilot can sign off on a subset of interactions as solvers are allowed to add additional pre/post interactions to their settlements).

MartinquaXD · 2024-05-15T09:33:01Z

Later, when we want to sign off on the solution things may be a bit more complicated

This basically brings us back to needing the /reveal endpoint, no? Otherwise solvers either have to produce the complete executable solution on /solve which MMs don't like because they only want to hedge risks if they win or we blindly trust that solvers actually end up executing the pre- and post-interactions they promised in /solve.

Until the autopilot actually signs the winning calldata I don't think we need to have the pre- and post-interactions in the /solve endpoint. Whatever is returned there is not be trusted anyway and it's reasonable to expect solvers to include these custom interactions in their solution.
Since we are able to recover custom interactions for orders only with the calldata (by checking our DB with the settled order uids) I don't think we need to add the info in the /solve endpoint. Or am I missing something here?

fleupold · 2024-05-15T12:20:17Z

This basically brings us back to needing the /reveal endpoint, no?

I don't think so. The autopilot can still sign off on the list of pre and post interactions it knows from the orders (without knowing anything from the driver), as long as the driver makes sure they encode that list first (or last).
This way in the smart contract checking the settlement transaction, we can just look at the first n pre/post interactions of the solution and use that subset to re-compute the commitment and verify the autopilot signature.

This might be a bit too technical for our current issues, so I wouldn't discuss this in detail on this issue.

# Description Related to #2667 POC implementation for using "Roles" safe module to grant special role to an EOA to sign and execute "removeSolver" function on behalf of the gpv2_authenticator manager/owner safe. Need to add tests to see if this actually works. # Changes  - [ ] Added `Roles` smart contract - [ ] Added EOA account as configuration - [ ] Implemented `remove_solver` function ## How to test todo --------- Co-authored-by: Mateo <[email protected]> Co-authored-by: Mateo-mro <[email protected]>

commit ab487f4 Author: MartinquaXD <[email protected]> Date: Sat Jun 1 08:44:04 2024 +0200 Fix clippy warning commit 576cf19 Author: MartinquaXD <[email protected]> Date: Sat Jun 1 08:39:19 2024 +0200 Implemented suggested changes commit 60650da Author: Mateo <[email protected]> Date: Fri May 31 18:22:47 2024 +0200 Remove log from the mock solver commit 859380a Author: Mateo <[email protected]> Date: Fri May 31 15:37:34 2024 +0200 Linter commit 4a67faf Author: Mateo <[email protected]> Date: Fri May 31 15:22:14 2024 +0200 Implement e2e tests for JIT orders commit debfa39 Author: Dusan Stanivukovic <[email protected]> Date: Fri May 31 09:37:51 2024 +0200 Circuit breaker remove solver (#2705) # Description Related to #2667 POC implementation for using "Roles" safe module to grant special role to an EOA to sign and execute "removeSolver" function on behalf of the gpv2_authenticator manager/owner safe. Need to add tests to see if this actually works. # Changes  - [ ] Added `Roles` smart contract - [ ] Added EOA account as configuration - [ ] Implemented `remove_solver` function ## How to test todo --------- Co-authored-by: Mateo <[email protected]> Co-authored-by: Mateo-mro <[email protected]>

github-actions · 2024-07-15T00:07:16Z

This issue has been marked as stale because it has been inactive a while. Please update this issue or it will be automatically closed.

mfw78 · 2024-07-15T06:19:57Z

Not stale

github-actions · 2024-09-14T00:07:37Z

This issue has been marked as stale because it has been inactive a while. Please update this issue or it will be automatically closed.

github-actions · 2024-11-15T00:08:46Z

This issue has been marked as stale because it has been inactive a while. Please update this issue or it will be automatically closed.

github-actions · 2025-01-22T00:52:57Z

This issue has been marked as stale because it has been inactive a while. Please update this issue or it will be automatically closed.

fleupold added the E:7.1 Ext. solvers operating driver See https://github.com/cowprotocol/pm/issues/57 for details label Apr 26, 2024

fleupold mentioned this issue Apr 26, 2024

[Epic] 7.1: Ext. solvers operating driver cowprotocol/pm#57

Open

6 tasks

sunce86 mentioned this issue May 8, 2024

Circuit breaker remove solver #2705

Merged

3 tasks

sunce86 mentioned this issue May 9, 2024

[ON HOLD] Reject solution containing outside-of-auction order #2712

Closed

This was referenced May 10, 2024

feat: Ext. solvers operating driver tracker #2716

Open

Add autopilot configuration to define observable solver addresses #2722

Closed

m-lord-renkse mentioned this issue May 29, 2024

Circuit breaker: Instantiate configuration + draft struct in domain #2761

Closed

github-actions bot added the stale label Jul 15, 2024

mfw78 removed the stale label Jul 15, 2024

github-actions bot added the stale label Sep 14, 2024

sunce86 removed the stale label Sep 15, 2024

github-actions bot added the stale label Nov 15, 2024

github-actions bot closed this as completed Nov 22, 2024

m-lord-renkse removed the stale label Nov 22, 2024

m-lord-renkse reopened this Nov 22, 2024

github-actions bot added the stale label Jan 22, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Implement basic circuit breaker in autopilot #2667

feat: Implement basic circuit breaker in autopilot #2667

fleupold commented Apr 26, 2024

sunce86 commented May 9, 2024 •

edited

Loading

MartinquaXD commented May 9, 2024

sunce86 commented May 9, 2024 •

edited

Loading

fleupold commented May 10, 2024

sunce86 commented May 10, 2024

fleupold commented May 10, 2024

sunce86 commented May 14, 2024 •

edited

Loading

fleupold commented May 14, 2024

MartinquaXD commented May 15, 2024

fleupold commented May 15, 2024

github-actions bot commented Jul 15, 2024

mfw78 commented Jul 15, 2024

github-actions bot commented Sep 14, 2024

github-actions bot commented Nov 15, 2024

github-actions bot commented Jan 22, 2025

feat: Implement basic circuit breaker in autopilot #2667

feat: Implement basic circuit breaker in autopilot #2667

Comments

fleupold commented Apr 26, 2024

Problem

Suggested solution

Alternatives considered

Additional context

Acceptance criteria

sunce86 commented May 9, 2024 • edited Loading

MartinquaXD commented May 9, 2024

sunce86 commented May 9, 2024 • edited Loading

fleupold commented May 10, 2024

sunce86 commented May 10, 2024

fleupold commented May 10, 2024

sunce86 commented May 14, 2024 • edited Loading

fleupold commented May 14, 2024

MartinquaXD commented May 15, 2024

fleupold commented May 15, 2024

github-actions bot commented Jul 15, 2024

mfw78 commented Jul 15, 2024

github-actions bot commented Sep 14, 2024

github-actions bot commented Nov 15, 2024

github-actions bot commented Jan 22, 2025

sunce86 commented May 9, 2024 •

edited

Loading

sunce86 commented May 9, 2024 •

edited

Loading

sunce86 commented May 14, 2024 •

edited

Loading