Reduce memory consumption of RecentBlockCache
#2102
Conversation
(2) is a safe one; I don't think anyone would care if liquidity for older blocks is gone.
For (1), do you expect this to significantly affect the liquidity fetching time even for deep and well-known liquidity pools like ETH/USDC and similar?
Btw, I know it's not a popular way of solving problems in our system, but I do think (at some point) we should optimize our system to be
- super fast and reliable for quoting and solving most used and well known tokens
- less reliable for quoting and solving rare tokens.
Liquidity fetching times for any tokens that are part of auctions should not be affected at all, and I expect the mentioned pool to always be part of an auction.
Nice find, but I'm a bit sceptical whether this will fix the steady memory increase we see over a 3h time horizon in shadow (in your graphs I also don't really see the last one being significantly flatter than the 2nd).
I'd expect that the limit of 1000 keys we keep up to date and the existing `remove_cached_blocks_older_than` logic would still evict pools regularly and lead to some upper bound on memory usage that should be reached after a few minutes (not hours).
E.g. look at these two logs:
- from 09:30 UTC: DEBUG uniswap_v2_cache: shared::recent_block_cache: the cache now contains entries for 27713 block-key combinations
- from 12:30 UTC: DEBUG uniswap_v2_cache: shared::recent_block_cache: the cache now contains entries for 22954 block-key combinations
Fewer values in the latter, yet memory consumption is 2x higher.
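One thing worth keeping in mind when comparing those log lines: the count only tracks block-key combinations, while each entry appears to hold a whole Vec of values (as in the pruning snippet further down), so memory usage is driven by the total number and size of values rather than by the entry count. A minimal sketch of that distinction, using an assumed map shape that mirrors the destructuring in the pruning loop (not the actual services types):

```rust
use std::collections::BTreeMap;

/// Assumed shape of the cache storage: entries keyed by (block, key), each
/// holding a Vec of cached values (e.g. pool states).
type Entries<K, V> = BTreeMap<(u64, K), Vec<V>>;

/// Returns (number of block-key combinations, total number of cached values).
/// Two caches can log a similar entry count while using very different
/// amounts of memory if their value Vecs differ in length.
fn cache_stats<K: Ord, V>(entries: &Entries<K, V>) -> (usize, usize) {
    let total_values: usize = entries.values().map(Vec::len).sum();
    (entries.len(), total_values)
}
```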
// remember the key for future background updates of the cached
// liquidity.
for key in found_keys {
    mutexed.recently_used.cache_set(key, ());
}
We also call `cache_set` in `get`. Should the same logic apply there? (Also a 🤓 reminder that having this `cache_set` in two code paths is error prone.)
Good catch. I'm not sure how we can get around the 2 `cache_set()`s nicely, though.
If we only had a single one at the end of the main function, we would not remember any keys for background updates whenever we time out requests after having to fetch all liquidity fresh, which happens when we restart the system.
Will leave the 2 `cache_set()`s for now and keep thinking about ways to improve that.
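For what it's worth, one way to keep this in a single place could be a small helper on the mutexed state that both call sites go through. This is only a sketch with assumed names (`Mutexed`, `recently_used`), using the `cached` crate's `SizedCache` as a stand-in for whatever LRU structure the real code uses:

```rust
use cached::{Cached, SizedCache};
use std::hash::Hash;

/// Assumed shape of the state behind the cache mutex (sketch only).
struct Mutexed<K: Hash + Eq + Clone> {
    recently_used: SizedCache<K, ()>,
}

impl<K: Hash + Eq + Clone> Mutexed<K> {
    /// Single code path that records keys for future background updates,
    /// callable both from `get` and from the fresh-fetch path.
    fn remember_for_background_updates(&mut self, keys: impl IntoIterator<Item = K>) {
        for key in keys {
            self.recently_used.cache_set(key, ());
        }
    }
}
```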
let mut cached_keys = HashSet::new();
for ((_block, key), values) in self.entries.iter_mut().rev() {
    if !cached_keys.insert(key) {
        *values = vec![];
    }
}
Is it possible that for new entries this vector is getting longer and longer for some reason (i.e. a bug in the BalancerPoolFetcher)?
Since we are now iterating over all entries anyway, maybe we can log the total number of values remaining after pruning?
> Is it possible that for new entries this vector is getting longer and longer for some reason (i.e. a bug in the BalancerPoolFetcher)?

I can't rule it out. I also had the hunch that we could have a bug along those lines, so I temporarily added an assertion verifying that all values are unique, and it didn't cause any crashes in my 5 minute test. I'd say we probably don't have an issue with that.

> Since we are now iterating over all entries anyway, maybe we can log the total number of values remaining after pruning?

Makes sense, will add that.
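For reference, counting the surviving values during the pruning pass could look roughly like this (a sketch only, reusing the loop shape from the snippet above; the function name and log text are made up):

```rust
use std::collections::{BTreeMap, HashSet};

/// Sketch: empty the value Vecs of all but the newest entry per key and log
/// how many values remain afterwards (names and log text are illustrative).
fn prune_duplicates<K: Ord + std::hash::Hash, V>(entries: &mut BTreeMap<(u64, K), Vec<V>>) {
    let mut cached_keys = HashSet::new();
    let mut remaining_values = 0;
    // Iterate newest block first so only the most recent entry per key keeps
    // its values; older duplicates get emptied.
    for ((_block, key), values) in entries.iter_mut().rev() {
        if !cached_keys.insert(key) {
            *values = vec![];
        } else {
            remaining_values += values.len();
        }
    }
    tracing::debug!("the cache now contains {} values after pruning", remaining_values);
}
```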
🤯 Speaking of this log, I noticed we never actually see a log for BalancerV2 liquidity. Could it be we never actually prune the recent block cache for this data source? This would explain the leak. It looks like this is hooked up for uni v2 (services/crates/driver/src/boundary/liquidity/uniswap/v2.rs, lines 192 to 198 in 72c207f), whereas I don't see the same for BalancerV2. Also, I'm not sure how we handle Uni v3 liquidity?
Description
Our `RecentBlockCache` works like this: whenever we see a new block we fetch the current liquidity for all the liquidity sources and write it to the cache together with its block. We have a max cache duration; whenever the cached state exceeds that duration we remove the oldest entries.
This implementation uses unnecessarily much memory in 2 ways:
1. We also remember the keys of liquidity fetched for quote requests for background updates, so the cache keeps holding liquidity that is never part of an auction.
2. There is no access pattern "Give me cached liquidity specifically from an older block with number X". That means it's enough to keep only the most recent data for any liquidity pool cached at any point (see the sketch below).
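To make point 2 more concrete, here is a rough sketch of the cache shape under the assumptions above (made-up names, not the actual services types): data is stored per (block, key), so every cached block holds its own copy of a pool's state even though callers only ever ask for the most recent one.

```rust
use std::collections::BTreeMap;

/// Rough sketch of the cache: liquidity values stored per (block, key).
struct RecentBlockCacheSketch<K: Ord, V> {
    entries: BTreeMap<(u64, K), Vec<V>>,
}

impl<K: Ord, V> RecentBlockCacheSketch<K, V> {
    /// The only access pattern callers need is "the most recent data for
    /// `key`", so the copies stored under older block numbers are dead weight.
    fn most_recent(&self, key: &K) -> Option<&[V]> {
        self.entries
            .iter()
            .rev()
            .find(|((_, k), _)| k == key)
            .map(|(_, values)| values.as_slice())
    }
}
```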
We can see these 2 things at play with this log. After ~1h of operation it shows a single `RecentBlockCache` holding ~20K items. On an average auction we fetch ~800 uni v2 sources, and we currently have a configuration where we cache up to 10 blocks worth of data. That means we have roughly 8K cache entries for liquidity that is needed in auctions and 12K entries that are only needed for quotes.
Also, this is only for a single uni v2 like liquidity source; in total we have 4 different ones configured in our backend.
Changes
We address 1 by not remembering liquidity sources for background updates for quote requests.
We address 2 by throwing away all the duplicated data.
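As a rough illustration of the first change (names are made up, not the actual services API): only requests coming from auction processing mark their keys for background updates, while quote requests go through the cache without being remembered.

```rust
/// Hypothetical marker for where a liquidity request originates (sketch only).
#[derive(Clone, Copy, PartialEq, Eq)]
enum RequestOrigin {
    /// Part of an auction: keys should be kept up to date in the background.
    Auction,
    /// Quote: serve from cache or fetch fresh, but don't remember the keys.
    Quote,
}

fn should_remember_for_background_updates(origin: RequestOrigin) -> bool {
    matches!(origin, RequestOrigin::Auction)
}
```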
How to test
I did a manual setup where I ran an autopilot locally in shadow mode (fetching auctions from prod mainnet) and a driver with all liquidity sources enabled.
I collected 3 graphs in total to measure the impact of this change on memory.