Notifies solver if their solution simulates successfully at least once #2077
Conversation
CLA Assistant Lite bot: All contributors have signed the CLA ✍️ ✅

I have read the CLA Document and I hereby sign the CLA
I just realized that the information I need is actually being logged internally here: services/crates/solver/src/driver_logger.rs, lines 210 to 215 in e185f8e.

So I guess the changes suggested in the PR are not what they should be :(
Good direction. The important part is that we are missing the implementation in the `solvers` crate and populating the `SolverRejectionReason` with the new data. The code you referred to in the `solver` crate is for the legacy driver and not relevant for this PR.
```diff
 while let Some(block) = stream.next().await {
     if let Err(err) = self.simulate_settlement(&settlement).await {
         observe::winner_voided(block, &err);
         *score_ref = None;
         *self.settlement.lock().unwrap() = None;
         if let Some(id) = settlement.notify_id() {
-            notify::simulation_failed(&self.solver, auction.id(), id, &err);
+            notify::simulation_failed(&self.solver, auction.id(), id, &err, simulated_once);
```
No need for `simulated_once`, just call `notify::simulation_failed(&self.solver, auction.id(), id, &err, true)`.
I guess I am misinterpreting the code then. If the condition on (new) line 195 is true in the first iteration of the loop, then it would report that the solution simulated successfully at least once, when in fact it hasn't, right? What am I missing?
The first ever simulation happens when the solution is encoded, so a successful simulation is the prerequisite for a solution to be encoded.

This whole code block is not even executed if the first simulation failed. So, if the simulation fails here, it means this is the second, third, ... attempt.
Got you, thanks!
```diff
-    SimulationFailed(eth::BlockNo, Transaction),
+    /// Failed simulation during competition. Last parameter is true
+    /// if it has simulated at least once.
+    SimulationFailed(eth::BlockNo, Transaction, bool),
```
nit: Would suggest this to be something like `SimulationFailed(eth::BlockNo, Transaction, SucceededAtLeastOnce)`, where one would define this type as `type SucceededAtLeastOnce = bool;`
```diff
@@ -103,7 +104,7 @@ pub enum Kind {
     Timeout,
     EmptySolution,
     DuplicatedSolutionId,
-    SimulationFailed(BlockNo, Tx),
+    SimulationFailed(BlockNo, Tx, bool),
```
Same here, substitute `bool` with a more readable type.
Thanks for the help. I tried to follow your directions, but I am a bit confused, as there seem to be at least two different versions of the code in the same repo. I was wondering whether a better name than `SimulatesSuccessfullyAtLeastOnce` might be `SimulatesSuccessfullyWhenEncoded` or `SimulatesSuccessfullyAtLiquidityBlock`, since, as you explained, the flag is true if encoding passes. Anyway, please let me know what the next changes are :)
The next step would be to also update the

I think I did here. No?
The change looks overall good to me. It would be good to run it end to end to see if you receive the expected notifications. We are currently in the process of creating a playground to facilitate running the whole system easily. In the meantime you can
For correct configuration files, please look at the example.

@sunce86 do you have any remaining comments?
Lg
Thanks for this walkthrough. I have some questions:

1. What is "mysolver" here? I thought this was my solver, but after running it I think it is what you call the "solver engine" in step 3 (which is not my solver engine)? If so, what should I set "mysolver" to? It can't be the "solver-name" inside the toml, since these have spaces.
2. I can't find any driver.config.toml in any directory of the services repo. Am I missing it?
3. This is clear.
It's a placeholder for the name of the solver engine you are spawning in step 3. This one connects to your Quasilabs API. The colocated setup looks roughly like: autopilot -> driver -> solver engine -> legacy solver (Quasilabs). You can cally

services/crates/driver/example.toml, line 1 in eec5702
@fleupold Thanks for clarifying, I was able to run the full stack. Well, sort of; it seems the solver engine does too many requests per second:

```
0: RPC error: Error { code: ServerError(429), message: "Your app has exceeded its compute units per second capacity. If you have retries enabled, you can safely ignore this message. If not, check out https://docs.alchemy.com/reference/throughput", data: None }
```

Also I was wondering: since I don't have access to my solver's private key, will the (local) driver/solver engine be able to simulate the solutions I'm going to send?
I've exhausted both my Infura and my Alchemy usage for the day after using the driver for 5 minutes :) Another question: the time limits I am getting for solving seem to be on the order of 2 seconds. I found the "--solve-deadline" parameter in the component of step 1 (autopilot?), but that is set to 15. Anyway, although it is very valuable to get to know the new stack, maybe this is not the place to ask for help. Since it seems it is not so easy to set up, I guess I'll open another issue or contact you on Telegram.
cc @mfw78, who is working on making running the system locally e2e much easier.
Do you think we can merge this in the meantime?
Sure, would you mind serving traffic in staging this week then to test this change?
I just ran

and it seems there are no changes to commit (?). My cargo version seems pretty recent:
You can see what it complains about in https://github.com/cowprotocol/services/actions/runs/7100150555/job/19362557587?pr=2077 and should be able to reproduce locally
Description
As a solver, I am interested in automatically identifying "bad" tokens so that they can be automatically censored. A "bad" token is one with some logic that makes it fail most of the time. When I receive a simulation failure, I run an online learning procedure to try to identify them. However, some (most?) simulation failures happen just due to volatility, in which case this procedure should be skipped, or otherwise there can be false positives.
Changes
This PR adds a boolean to the notification sent to solvers, stating whether the driver was able to simulate the solver's solution at least once. If so, this is a very strong indicator that the solution was rejected due to volatility.
I considered other options to avoid this PR:
How to test
I did not test it since I forgot how to. In fact, I would be very surprised if I changed things in all the right places. Anyway, if I failed to do so, consider the PR an elaborate feature request :)
Thanks!