Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Implement basic circuit breaker in autopilot #2667

Open
Tracked by #2716
fleupold opened this issue Apr 26, 2024 · 15 comments
Open
Tracked by #2716

feat: Implement basic circuit breaker in autopilot #2667

fleupold opened this issue Apr 26, 2024 · 15 comments
Labels
E:7.1 Ext. solvers operating driver See https://github.com/cowprotocol/pm/issues/57 for details stale

Comments

@fleupold
Copy link
Contributor

Problem

Solvers that are running their own driver, could theoretically submit solutions outside of the competition (simply calling settle with a bunch of signed orders). In this case the duration for which a malicious solver can continue this behavior is critical for the damage that gets accrued (the more orders settled at limit price, the higher the foregone surplus).

It's therefore important for the protocol to run an automated and diverse set of "watch dogs" that monitor misbehaviour and "circuit break" a malicious solvers from continuing damage.

Suggested solution

Implement basic competition checks in the autopilot upon indexing settlements and allow the autopilot to be configured with an account that can removeSolver from the current allow list implementation.

Alternatives considered

Fully rely on external systems (e.g. the EBBO check, a new standalone circuit breaker tool) for this task. However, we believe that having multiple independent implementations of this crucial logic is important to ensure robust workings of the system. Moreover, the autopilot is still taking a central role in arbitrating auctions and thus is in a good position to enforce and potentially halt the issuance of new auctions in case something goes wrong.

We also already have reliable and reorg-resistant indexing of all settlements, which should™️ make adding this logic a fairly contained task (compared to building a new standalone system from scratch). This is not to say that we shouldn't have multiple independent services that can perform this task in the future.

Additional context

Since the barn and production environment operate on separate auction but share the same settlement contract instance, we probably cannot simply audit all settlement events that are emitted and ban if a settlement is emitted "out of competition". If we did, the prod autopilot would have to check the barn autopilot if a settlement satisfied its competition and vice versa. Moreover, there are some legit out of competition settlements (e.g. for periodic fee withdrawals). I'd suggest configuring each driver with a set of allow-listed solver addresses it tracks settlements for (and ignores all other settlements). It's then up to the autopilot configuration to make sure all allow-listed solver addresses are covered.
Monitoring all settlements and querying both the production and barn orderbook API can be left for the EBBO circuit breaker.

The current allow list contract is located at https://etherscan.io/address/0x2c4c28DDBdAc9C5E7055b4C863b72eA0149D8aFE with https://app.safe.global/balances?safe=eth:0xA03be496e67Ec29bC62F01a428683D7F9c204930 being the owner.

With the help of Zodiac, we can enable a Roles Modifier Module on the Safe that allows an arbitrary set of accounts to perform specific operations on behalf of the Safe without requiring additional signatures for them signers. This tx creates a role allowing a single EOA to remove solvers from the allow list and this tx executes a removal on behalf of the multi-sig on Sepolia. The latter is what we would expect the autopilot to perform in case wrong-doing is detected.

Acceptance criteria

Every driver that is available in the autopilot should be registered with their public address for solution submission. For those addresses, whenever a settlement is observed the autopilot checks:

  1. The settlement references an auction ID
  2. The address that submitted the settlement won this auction
  3. Exactly the orders that were referenced in the winning solution got settled and matched at the prices indicated in the solution
  4. Pre and post interactions found in the settlement are a subset of the pre/post interactions of the matched orders

If any of those checks fails, the autopilot should send a transaction to remove the affected solver from the allow list and send an alert.

@fleupold fleupold added the E:7.1 Ext. solvers operating driver See https://github.com/cowprotocol/pm/issues/57 for details label Apr 26, 2024
@sunce86
Copy link
Contributor

sunce86 commented May 9, 2024

I'd suggest configuring each driver with a set of allow-listed solver addresses it tracks settlements for (and ignores all other settlements)

Instead I would do the following:

  1. During competition time, when we have current auction available, consider only solutions with orders that either
  • are part of the auction
  • not part of the auction but also not existent in the db (JIT)
  1. During competition, for winning solution, save Quality to db so it can be checked in postprocessing. Quality means EncodedSettlement metadata saved to db (tokens, clearing prices, trades). Instead of saving everything we could hash it and compare hashes later (I think we can assume solvers won't deliver better solution than promised because why would they do that instead of pocketing the diff themselves).

This way we make sure both checks are true: solver did not settle orders outside competition but also with promised quality.

@MartinquaXD
Copy link
Contributor

Instead of saving everything we could hash it and compare hashes later (I think we can assume solvers won't deliver better solution than promised because why would they do that instead of pocketing the diff themselves).

I'm worried that using just hashes is too unstable. I think given that settlements are usually not very big we should start by storing all necessary data unhashed and only later optimize for space if this actually becomes a concern.
Also when there is a discrepancy between the executed and expected value it would already be easier to analyse what went wrong which should help with recovering damages.

@sunce86
Copy link
Contributor

sunce86 commented May 9, 2024

In that case, if we don't want to hash but instead save full metadata, I would suggest saving dto representation of domain settlement object: https://github.com/cowprotocol/services/blob/main/crates/autopilot/src/domain/settlement/mod.rs#L22. This object contains subset of encoded settlement data that autopilot is interested in: it can derive surplus/fee/score based on it. If we ever add a new field autopilot is interested in, it will be part of this struct.

This object is also easily recoverable from plain calldata that is provided by /reveal/ driver endpoint.

Edit: we have a settlement_call_data database table where we save calldata of the winning solution, so we might as well fetch it from there (and convert to Settlement) instead of saving individual fields.

@fleupold
Copy link
Contributor Author

I'd suggest configuring each driver with a set of allow-listed solver addresses it tracks settlements for (and ignores all other settlements)

Instead I would do the following: [...]
This way we make sure both checks are true: solver did not settle orders outside competition but also with promised quality.

I still think we need to configure each autopilot with a list of solver addresses it needs to "observe" as otherwise a solver could just return an empty solution and still settle out of competition (without specifying an auction ID). By only adding solver addresses to the on-chain allow list that are tracked by at least one autopilot (barn vs. prod) we make sure every solver is being observed.

All of the promised solution data is already stored in the competitions table (we may have to refactor this one to be more "stable" as it was considered a pure debugging table in the past).

Wrt /reveal I think we can even consider deleting this endpoint altogether. It was used to ensure we have the uninternalised calldata of the winning solution (to make sure buffer trading is "ok"), but since recently we treat buffer trading as slippage and thus don't require proof that the trading was indeed possible. The only advantage I see from /reveal is that it could simplify implementing cowprotocol/contracts#84 by simply signing the entire calldata (instead of creating a new struct), but I'm not sure the extra round trip is worth it.

@sunce86
Copy link
Contributor

sunce86 commented May 10, 2024

I still think we need to configure each autopilot with a list of solver addresses it needs to "observe" as otherwise a solver could just return an empty solution and still settle out of competition (without specifying an auction ID). By only adding solver addresses to the on-chain allow list that are tracked by at least one autopilot (barn vs. prod) we make sure every solver is being observed.

I still don't understand why we need this configuration. Can you elaborate this case further as I don't see how we would be unable to detect it?

without specifying an auction ID

This is a slashable violation itself. We would call removeSolver on this violation.

@fleupold
Copy link
Contributor Author

I still don't understand why we need this configuration. Can you elaborate this case further as I don't see how we would be unable to detect it?

We are running two autopilots, barn and prod. How would the barn autopilot be able to "check" that a settlement of the prod autopilot is ok (it requires access to the prod db or API)? Therefore it must either ignore settlements from a certain list of addresses (knowing that some other component checks it) or only focus on settlements coming from a certain list of addresses. As long as we make sure all addresses that are allow listed to solve are covered by at least one autopilot this is fine (we can have another more generalized backup circuit breaker that the solver team is working on which checks all settlements based on API access).

Note, that there are also fee withdrawals which create Settlement events and shouldn't lead to removals, e.g.: https://etherscan.io/tx/0x54f65d217e31cc07d759f3c23b70c564bd906bad942f4b813ce56971cc6854d7/advanced#eventlog

@sunce86
Copy link
Contributor

sunce86 commented May 14, 2024

  1. Pre and post interactions found in the settlement are a subset of the pre/post interactions of the matched orders

Pre/post interactions are not a part of promised solution (solution returned on /solve during competition). Does this mean we need to add interactions to this response? Is that what we want to compare here? Or do we want to compare with pre/post interactions in the order database table?

@fleupold
Copy link
Contributor Author

Correct, pre and post interactions are available to the autopilot via the user order. In the first phase (where we are simply circuit breaking whenever we see a bad settlement) we should be able to assert that those interactions were part of the final settlement.

Later, when we want to sign off on the solution things may be a bit more complicated (the autopilot can sign off on a subset of interactions as solvers are allowed to add additional pre/post interactions to their settlements).

@MartinquaXD
Copy link
Contributor

Later, when we want to sign off on the solution things may be a bit more complicated

This basically brings us back to needing the /reveal endpoint, no? Otherwise solvers either have to produce the complete executable solution on /solve which MMs don't like because they only want to hedge risks if they win or we blindly trust that solvers actually end up executing the pre- and post-interactions they promised in /solve.

Until the autopilot actually signs the winning calldata I don't think we need to have the pre- and post-interactions in the /solve endpoint. Whatever is returned there is not be trusted anyway and it's reasonable to expect solvers to include these custom interactions in their solution.
Since we are able to recover custom interactions for orders only with the calldata (by checking our DB with the settled order uids) I don't think we need to add the info in the /solve endpoint. Or am I missing something here?

@fleupold
Copy link
Contributor Author

This basically brings us back to needing the /reveal endpoint, no?

I don't think so. The autopilot can still sign off on the list of pre and post interactions it knows from the orders (without knowing anything from the driver), as long as the driver makes sure they encode that list first (or last).
This way in the smart contract checking the settlement transaction, we can just look at the first n pre/post interactions of the solution and use that subset to re-compute the commitment and verify the autopilot signature.

This might be a bit too technical for our current issues, so I wouldn't discuss this in detail on this issue.

m-lord-renkse added a commit that referenced this issue May 31, 2024
# Description
Related to #2667

POC implementation for using "Roles" safe module to grant special role
to an EOA to sign and execute "removeSolver" function on behalf of the
gpv2_authenticator manager/owner safe.

Need to add tests to see if this actually works.

# Changes
<!-- List of detailed changes (how the change is accomplished) -->

- [ ] Added `Roles` smart contract
- [ ] Added EOA account as configuration
- [ ] Implemented `remove_solver` function

## How to test
todo

---------

Co-authored-by: Mateo <[email protected]>
Co-authored-by: Mateo-mro <[email protected]>
MartinquaXD added a commit that referenced this issue Jun 1, 2024
commit ab487f4
Author: MartinquaXD <[email protected]>
Date:   Sat Jun 1 08:44:04 2024 +0200

    Fix clippy warning

commit 576cf19
Author: MartinquaXD <[email protected]>
Date:   Sat Jun 1 08:39:19 2024 +0200

    Implemented suggested changes

commit 60650da
Author: Mateo <[email protected]>
Date:   Fri May 31 18:22:47 2024 +0200

    Remove log from the mock solver

commit 859380a
Author: Mateo <[email protected]>
Date:   Fri May 31 15:37:34 2024 +0200

    Linter

commit 4a67faf
Author: Mateo <[email protected]>
Date:   Fri May 31 15:22:14 2024 +0200

    Implement e2e tests for JIT orders

commit debfa39
Author: Dusan Stanivukovic <[email protected]>
Date:   Fri May 31 09:37:51 2024 +0200

    Circuit breaker remove solver (#2705)

    # Description
    Related to #2667

    POC implementation for using "Roles" safe module to grant special role
    to an EOA to sign and execute "removeSolver" function on behalf of the
    gpv2_authenticator manager/owner safe.

    Need to add tests to see if this actually works.

    # Changes
    <!-- List of detailed changes (how the change is accomplished) -->

    - [ ] Added `Roles` smart contract
    - [ ] Added EOA account as configuration
    - [ ] Implemented `remove_solver` function

    ## How to test
    todo

    ---------

    Co-authored-by: Mateo <[email protected]>
    Co-authored-by: Mateo-mro <[email protected]>
Copy link

This issue has been marked as stale because it has been inactive a while. Please update this issue or it will be automatically closed.

@github-actions github-actions bot added the stale label Jul 15, 2024
@mfw78 mfw78 removed the stale label Jul 15, 2024
@mfw78
Copy link
Contributor

mfw78 commented Jul 15, 2024

Not stale

Copy link

This issue has been marked as stale because it has been inactive a while. Please update this issue or it will be automatically closed.

@github-actions github-actions bot added the stale label Sep 14, 2024
@sunce86 sunce86 removed the stale label Sep 15, 2024
Copy link

This issue has been marked as stale because it has been inactive a while. Please update this issue or it will be automatically closed.

@m-lord-renkse m-lord-renkse reopened this Nov 22, 2024
Copy link

This issue has been marked as stale because it has been inactive a while. Please update this issue or it will be automatically closed.

@github-actions github-actions bot added the stale label Jan 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
E:7.1 Ext. solvers operating driver See https://github.com/cowprotocol/pm/issues/57 for details stale
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants