-
Notifications
You must be signed in to change notification settings - Fork 86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: Investigate auction propagation delay between driver <> solver engine #2133
Comments
There are only 2 async operations happening in the measured code path:
Edit: Looks like the culprit could actually be lock contention. If you configure enough solvers you can reproduce slow times even without cache misses. |
# Description We've noticed weird delays between the autopilot sending an auction and the driver being ready to start working on it. The linked issue suggests that we spend a lot of time converting the auction and enriching it with token meta data. This should be super fast because all the data should be cached at all times. However, was one token that caused it all to break down: `0xeeeee...` (the address we use to denote the chains native token) This is roughly what happened: 1. every driver receives auction 2. every driver collects all the addresses for which we don't have cached metadata 3. this always yielded `0xeeee...` (it's not a real token so we never get data when calling ERC20 functions on it) 4. every driver wasted some time sending RPC requests for `0xeeee` 5. every driver tried to take an exclusive lock to write no data (empty vec) to the cache Overall this wasted time with unnecessary network requests and serially taking an exclusive lock for all drivers. # Changes * most importantly filter out `0xeee` * avoid sending duplicate RPC requests by using `RequestSharing` * don't take an exclusive lock if some other driver cached the same data in the meantime * moved logging of token balance update task out of the critical section All changes combined results in us now spending a couple microseconds on getting cached balances instead of seconds. And because this was the bulk of the work captured by the log `auction task execution time` we reduced that time to ~11ms. ## How to test e2e tests should still pass and manual test confirmed the reduced latency. Fixes #2133
Follow up on this. Each driver now spends 2-3 ms on this part most of the time. |
Background
Similar to #2095 there is another 1.7s propagation delay between the driver and the solver-enginer (cf. https://production-6de61f.kb.eu-central-1.aws.cloud.es.io/app/r/s/U1XTl)
It would be good to see if we can reduce this as well since we are leaking time the external solvers can actually spend on finding a solution.
The text was updated successfully, but these errors were encountered: