Reduce memory consumption of RecentBlockCache #2102

Merged: 12 commits into main on Dec 1, 2023
Conversation

MartinquaXD
Contributor

Description

Our RecentBlockCache works like this:

  1. somebody requests liquidity
  2. the cache checks if it's already known
  3. if it's not in the cache, query the blockchain
  4. store the result in the cache
  5. remember the requested liquidity source for updating it in the background

Whenever we see a new block we fetch the current liquidity for all the remembered liquidity sources and write it to the cache together with its block number. We have a maximum cache duration; whenever the cached state exceeds that duration we remove the oldest entries.
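Roughly, the cache has this shape (simplified sketch only; the actual type in shared::recent_block_cache has more configuration and remembers the recently used keys in a size-limited cache rather than a plain set):

use std::collections::{BTreeMap, HashSet};

// Simplified, illustrative layout. `K` identifies a liquidity source
// (e.g. a token pair), `V` is the liquidity fetched for it.
struct RecentBlockCache<K: Ord, V> {
    // Cached liquidity keyed by (block number, liquidity source).
    entries: BTreeMap<(u64, K), Vec<V>>,
    // Sources that were requested recently and get refreshed on every new block.
    recently_used: HashSet<K>,
    // Entries older than this many blocks get evicted.
    number_of_blocks_to_cache: u64,
}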

This implementation uses more memory than necessary in two ways:

  1. We can fetch liquidity for quotes. For those requests it's okay to return liquidity that is not 100% up-to-date. However, we still remember the requested liquidity source for future updates. This is not great because we can receive quote requests for all sorts of random tokens we'll never see again.
  2. We cache state for the same liquidity source for multiple blocks. But the cache only has 2 access patterns:
    • "Give me the most recent liquidity available on the blockchain"
    • "Give me the most recent liquidity available in the cache"
      There is no access pattern like "give me cached liquidity specifically from an older block with number X".
      That means it's enough to keep only the most recent data cached for any liquidity pool at any point in time.

We can see these 2 things at play with this log.
After ~1h of operation it shows a single RecentBlockCache holding ~20K items. In an average auction we fetch ~800 uni v2 sources, and we currently cache up to 10 blocks' worth of data. That means we have roughly 8K cache entries for liquidity that is needed in auctions and ~12K entries that are only needed for quotes.
Also, this is only for a single UniV2-like liquidity source; in total we have 4 different ones configured in our backend.

Changes

We address (1) by not remembering liquidity sources for background updates when they were only requested for quotes.
We address (2) by throwing away all the duplicated data, keeping only the most recent state per liquidity source.
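Conceptually, the two changes boil down to something like this (hedged sketch on the simplified layout described above; identifiers such as CacheFetching are illustrative and not the exact names in the diff):

use std::collections::{BTreeMap, HashSet};
use std::hash::Hash;

// Hypothetical flag describing why liquidity is being fetched.
enum CacheFetching {
    // Needed for settling an auction: keep the data fresh in the background.
    Auction,
    // Needed for a quote only: slightly stale data is acceptable.
    Quote,
}

// Change (1): only auction requests register their keys for background
// updates; quote requests just reuse whatever happens to be cached.
fn remember_keys<K: Hash + Eq>(
    recently_used: &mut HashSet<K>,
    keys: Vec<K>,
    fetching: CacheFetching,
) {
    if matches!(fetching, CacheFetching::Auction) {
        recently_used.extend(keys);
    }
}

// Change (2): walk the cache entries from the newest block to the oldest and
// clear the values of every key that already has data at a newer block, so
// only the most recent state per liquidity source keeps its memory.
fn drop_duplicated_entries<K: Ord + Hash + Clone, V>(entries: &mut BTreeMap<(u64, K), Vec<V>>) {
    let mut seen = HashSet::new();
    for ((_block, key), values) in entries.iter_mut().rev() {
        if !seen.insert(key.clone()) {
            *values = vec![];
        }
    }
}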

How to test

I did a manual setup where I ran an autopilot locally in shadow mode (fetching the auction from prod mainnet) and a driver with all liquidity sources enabled.
I collected 3 graphs in total to measure the impact of this change on memory consumption.

  1. This graph shows the status quo (very noisy and not really reproducible across runs).
0_current_no_optimizations
  2. This graph applies one optimization that is not part of this PR to make memory consumption more predictable across runs. I want to merge that optimization as well, but right now it's very hacky. However, I will include it in all my graphs because it makes the impact of each optimization easier to spot.
1_with_univ2_call_optimization
  3. This graph shows the effects of this PR's optimization. The memory usage is more stable overall and grows less over time.
2_cache_eviction

@sunce86 (Contributor) left a comment

(2) is a safe one, I don't think anyone would care if liquidity for older blocks is gone.

For (1), do you expect this to significantly affect liquidity fetching times even for deep and well-known liquidity pools like ETH/USDC and similar?
Btw, I know it's not a popular way of solving problems in our system, but I do think (at some point) we should optimize our system to be:

  1. super fast and reliable for quoting and solving the most used and well-known tokens
  2. less reliable for quoting and solving rare tokens.

@MartinquaXD (Contributor, Author)

For (1), do you expect this to significantly affect liquidity fetching times even for deep and well-known liquidity pools like ETH/USDC and similar?

Liquidity fetching times for any tokens that are part of auctions should not be affected at all, and I expect the mentioned pool to always be part of an auction.
For those fringe tokens that only show up during quotes we'll fetch the pool once and keep reusing it for a while. This means that a user who stays on the page and waits for the price to change will likely receive the exact same price from quasimodo and baseline until we evict the cached pool after 10 blocks, but other price estimators that don't rely on the liquidity we fetch will continue to provide the most up-to-date prices.

@fleupold (Contributor) left a comment

Nice find, but I'm a bit sceptical whether this will fix the steady memory increase we see over a 3h time horizon in shadow (in your graphs I also don't really see the last one being significantly flatter than the 2nd).

I'd expect the fact that we already limit the number of keys we keep up to date to 1000 and the existing remove_cached_blocks_older_than logic to still evict pools regularly and lead to some upper bound of memory usage that should be reached after a few minutes (not hours).


E.g. look at these two logs:

  1. from 09:30 UTC: DEBUG uniswap_v2_cache: shared::recent_block_cache: the cache now contains entries for 27713 block-key combinations
  2. from 12:30 UTC: DEBUG uniswap_v2_cache: shared::recent_block_cache: the cache now contains entries for 22954 block-key combinations

Fewer values in the latter, yet memory consumption is 2x higher.
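(For context, a hedged sketch of what the age-based eviction mentioned above, remove_cached_blocks_older_than, plausibly does, based purely on its name and the description in this thread rather than the actual code:)

use std::collections::BTreeMap;

// Illustrative only: drop every (block, key) entry whose block is older
// than the oldest block we still want to keep cached.
fn remove_cached_blocks_older_than<K: Ord, V>(
    entries: &mut BTreeMap<(u64, K), Vec<V>>,
    oldest_to_keep: u64,
) {
    entries.retain(|(block, _key), _values| *block >= oldest_to_keep);
}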

// remember the key for future background updates of the cached
// liquidity.
for key in found_keys {
    mutexed.recently_used.cache_set(key, ());
}
Contributor

We also call cache_set in get. Should the same logic apply there? (also a 🤓 reminder that having this cache_set in two code paths is error-prone)

Contributor Author

Good catch. I'm not sure how we can get around the 2 cache_set()s nicely though.
If we only have a single one at the end of the main function, we won't remember any keys for background updates when requests time out after having to fetch all liquidity fresh (e.g. when we restart the system).
Will leave the 2 cache_set()s for now and keep thinking about ways to improve that.

let mut cached_keys = HashSet::new();
for ((_block, key), values) in self.entries.iter_mut().rev() {
    if !cached_keys.insert(key) {
        *values = vec![];
    }
}
Contributor

Is it possible that for new entries this vector is getting longer and longer for some reason (i.e. a bug with the BalancerPoolFetcher)?

Since we are now iterating over all entries anyway, maybe we can log the total number of values remaining after pruning?

Contributor Author

Is it possible that for new entries this vector is getting longer and longer for some reason (i.e. a bug with the BalancerPoolFetcher)?

I have no evidence that this is impossible. I also had the hunch that we could have a bug along those lines, so I temporarily added an assertion verifying that all values are unique. It didn't cause any crashes in my 5-minute test, so I'd say we probably don't have an issue with that.

Since we are now iterating over all entries anyway, maybe we can log the total number of values remaining after pruning?

Makes sense, will add that.
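For reference, a hedged sketch of what such a log line could look like, assuming the entries layout from the excerpt above; the field name and message are illustrative:

use std::collections::BTreeMap;

// Illustrative only: count the values that survived pruning and log them.
fn log_remaining_values<K: Ord, V>(entries: &BTreeMap<(u64, K), Vec<V>>) {
    let remaining: usize = entries.values().map(Vec::len).sum();
    tracing::debug!(remaining, "values remaining in the cache after pruning");
}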

@fleupold (Contributor)

🤯 Speaking of this log, I noticed we never actually see a log line for BalancerV2 liquidity. Could it be that we never actually prune the recent block cache for this data source? This would explain the leak.

It looks like the run_maintenance of UniswapV2 pool_cache is called here:

tracing::info_span!("maintenance", block)
    .in_scope(|| async move {
        if let Err(err) = pool_cache.run_maintenance().await {
            tracing::warn!(?err, "error updating pool cache");
        }
    })
    .await;

Whereas I don't see run_maintenance being called for balancer v2 pool_fetching.

Also I'm not sure how we handle Uni v3 liquidity?

@MartinquaXD enabled auto-merge (squash) December 1, 2023 09:02
@MartinquaXD merged commit 1fe4c44 into main Dec 1, 2023
8 checks passed
@MartinquaXD deleted the recent-block-cache-gc branch December 1, 2023 09:46