
Adjustments to Token Info Fetcher #1931

Merged: 5 commits merged into main on Oct 9, 2023

Conversation

@nlordell (Contributor) commented Oct 9, 2023

Description

In debugging with @fleupold on Friday, we noticed that the token info fetcher was potentially locking up concurrent quotes for certain price estimators, notably the ParaSwap and Quasimodo estimators, which make use of token information (specifically, decimals). This is caused by two things:

  1. The token info cache holds an async Mutex for the entire duration of fetching token information.
  2. The autopilot queries token info for 0xeee...eee, which always fails and therefore never gets included in the cache, so every such query holds the lock for a full node round trip.

The combination of these two issues prevents quotes that need token information from executing fully concurrently. In particular, if quote 1 needs token info for A and quote 2 needs it for B, then quote 2 has to wait for the fetch of A to finish before it starts fetching B (even though the two fetches could run in parallel).
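
For illustration, a minimal sketch of the problematic pattern (hypothetical, simplified method body; the real fetcher operates on batches of addresses):

    // The async Mutex guards both the cache *and* the in-flight fetch, so a
    // cache miss for token A blocks an unrelated miss for token B until A's
    // node round trip completes.
    async fn token_info(&self, address: H160) -> Result<TokenInfo, Error> {
        let mut cache = self.cache.lock().await; // async Mutex acquired here...
        if let Some(info) = cache.get(&address) {
            return Ok(info.clone());
        }
        // ...and still held across this await; errors are never cached.
        let info = self.inner.token_info(address).await?;
        cache.insert(address, info.clone());
        Ok(info)
    }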

Changes

  • Implement an exception for the 0xeee...eee address.
  • Only skip caching on node failures, not on contract failures (i.e. if a contract doesn't have decimals, we now cache that fact instead of querying it every time).
  • Allow concurrent access to the cache using futures::Shared (see the sketch below). I didn't opt for using RequestSharing as it didn't quite fit and the semantics are slightly different.
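
A minimal sketch of the new caching pattern, assuming a simplified single-address API (illustrative names and types, not the exact ones from the PR):

    use futures::{future::{BoxFuture, Shared}, FutureExt};
    use std::{collections::HashMap, future::Future, sync::Mutex};

    type H160 = [u8; 20]; // stand-in for the real address type

    #[derive(Clone)]
    struct TokenInfo {
        symbol: Option<String>,
        decimals: Option<u8>,
    }

    type SharedTokenInfo = Shared<BoxFuture<'static, Result<TokenInfo, String>>>;

    // The synchronous Mutex is held only long enough to clone or insert a
    // Shared handle; the node round trip happens after the lock is released.
    // Different tokens fetch concurrently, and concurrent requests for the
    // same token share a single in-flight future.
    fn shared_fetch(
        cache: &Mutex<HashMap<H160, SharedTokenInfo>>,
        address: H160,
        fetch: impl Future<Output = Result<TokenInfo, String>> + Send + 'static,
    ) -> SharedTokenInfo {
        cache
            .lock()
            .unwrap()
            .entry(address)
            .or_insert_with(|| fetch.boxed().shared())
            .clone()
    }

Callers then await the returned SharedTokenInfo outside the lock.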

How to test

Added a new test to verify the logic.

Related Issues

@nlordell requested a review from a team as a code owner October 9, 2023 07:47
@fleupold (Contributor) left a comment:

LGTM

I didn't opt for using RequestSharing as it didn't quite fit and the semantics are slightly different.

What in particular is different about the semantics? Is it that request sharing itself doesn't keep the result around as soon as the future is completed but here we want to store the result indefinitely?

crates/shared/src/token_info.rs
let info = fetch.await;
if info.is_err() {
    let mut cache = self.cache.lock().unwrap();
    // Only evict our own errored future; a fresh one peeks as unresolved.
    if let Some(Err(_)) = cache.get(&address).and_then(|fetch| fetch.peek()) {
        cache.remove(&address);
    }
}
Contributor:
What is this condition checking? Can't we just unconditionally remove the future in case of an error?

@nlordell (author):

It checks that the shared future stored in the map has resolved to an error (via Shared::peek).

Imagine 3 concurrent requests Rn where:

  1. R1 starts a shared fetch of some token information
  2. R2 uses the same shared fetch
  3. The shared future resolves to an error, R1 removes the shared future from the cache
  4. R3 starts a new shared fetch of token information for the same address, the cache entry was removed in 3, so it creates a new one
  5. R2 gets scheduled and removes the new shared future that was created by R3.

This isn't extremely contrived: there is an await point before clearing the cache, so if R2 doesn't get polled for some time, this would happen. That said, I don't think it will happen regularly, and at worst we will be slightly sub-optimal and duplicate some queries.

Another approach would be to peek at the shared future when we take the first cache lock and replace it in case it has resolved to an error. The downside being that we could hold some useless Err(String)s in the cache that won't ever get used.
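
A sketch of that alternative, for comparison (illustrative; create_fetch stands in for whatever constructs the shared fetch):

    // On lookup, reuse the cached future unless it already resolved to an
    // error, in which case replace it in place. No post-fetch eviction is
    // needed, but errored entries linger in the cache until re-queried.
    let fetch = {
        let mut cache = self.cache.lock().unwrap();
        match cache.get(&address) {
            Some(existing) if !matches!(existing.peek(), Some(Err(_))) => existing.clone(),
            _ => {
                let fetch = create_fetch(address); // hypothetical helper
                cache.insert(address, fetch.clone());
                fetch
            }
        }
    };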

@nlordell (author) commented Oct 9, 2023

Is it that request sharing itself doesn't keep the result around as soon as the future is completed but here we want to store the result indefinitely?

Correct; we would use RequestSharing to share the updating of some cache behind an Arc<Mutex<_>>, instead of just keeping a shared future behind an Arc<Mutex<_>>. Overall, I didn't find that it made the code clearer, but I'm happy to propose an alternative so others can decide.

-    inner: Box<dyn TokenInfoFetching>,
-    cache: Arc<Mutex<HashMap<H160, TokenInfo>>>,
+    inner: Arc<dyn TokenInfoFetching>,
+    cache: Arc<Mutex<HashMap<H160, SharedTokenInfo>>>,
Contributor:

We might have to keep an eye on memory consumption here.
Storing a whole future instead of a String and a u8 for A LOT of tokens might be a problem. Maybe it makes sense to have something like:

enum CacheEntry {
    Resolved(TokenInfo),
    InFlight(SharedTokenInfoRequest)
}

No need to change in this PR, though.

@nlordell (author) commented Oct 9, 2023:

Storing a whole future instead of a String and a u8 for A LOT of tokens might be a problem

It's a boxed future though, so it should just be a ptr + vtable, which is as big as a TokenInfo.

@nlordell (author):

Wow, to my surprise, SharedTokenInfo is smaller than a TokenInfo (without counting the data that exists on the heap): https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=4d4fb54c802e8db652c6608eb9c83d35.

This is likely because of a combination of boxing and null-pointer optimization.

With that in mind, I imagine that this type is fairly space-optimized.
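
The same comparison can be reproduced locally (a sketch; TokenInfo's field layout is approximated and the byte counts assume x86_64):

    use futures::future::{BoxFuture, Shared};
    use std::mem::size_of;

    #[allow(dead_code)]
    struct TokenInfo {
        symbol: Option<String>,
        decimals: Option<u8>,
    }

    type SharedTokenInfo = Shared<BoxFuture<'static, Result<TokenInfo, String>>>;

    fn main() {
        // Shared is an Option<Arc<...>> plus a usize waker key (16 bytes),
        // while TokenInfo needs 24 bytes for Option<String> (null-pointer
        // optimized) plus Option<u8> and padding (32 bytes).
        println!("TokenInfo:       {} bytes", size_of::<TokenInfo>());
        println!("SharedTokenInfo: {} bytes", size_of::<SharedTokenInfo>());
    }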

Contributor:

It's a boxed future though, so it should just be a ptr + vtable, which is as big as a TokenInfo.

Yeah. The handle to the future is very small, but you'll still keep the actual allocation for the future around, which is what I was mainly worried about.

@nlordell (author):

It shouldn't, though, right?

Internally, Shared has an UnsafeCell containing:

enum FutureOrOutput<Fut: Future> {
    Future(Fut),
    Output(Fut::Output),
}

Here, Fut = BoxFuture, so it's 16 bytes on x86_64 (ptr + vtable). AFAIU, once the unsafe cell gets set to FutureOrOutput::Output, the future will get dropped, right?

Contributor:

Ahh, true. I assumed Shared would basically keep the inner future alive the whole time and simply special-case returning the resolved value, instead of swapping the future allocation for the resolved result. 👌

@nlordell (author):

For example here:

use futures::FutureExt;
use std::{
    future::Future,
    pin::Pin,
    task::{Context, Poll},
};

struct Foo;

impl Future for Foo {
    type Output = i32;
    
    fn poll(self: Pin<&mut Self>, _: &mut Context) -> Poll<Self::Output> {
        Poll::Ready(42)
    }
}

impl Drop for Foo {
    fn drop(&mut self) {
        println!("Foo::drop");
    }
}

fn main() {
    let foo = Foo.boxed().shared();

    {
        println!("cloning handles");
        for i in 0..3 {
            let answer = futures::executor::block_on(foo.clone());
            println!("answer {i}: {answer}");
        }
    }
    
    // We now have a handle to `foo` which was resolved, if we await it, we
    // shouldn't drop the future again.
    println!("foo is resolved");
    let _ = futures::executor::block_on(foo);
}

Output:

cloning handles
Foo::drop
answer 0: 42
answer 1: 42
answer 2: 42
foo is resolved

The future gets dropped the first time it resolves, and only dropped once.

The only issue would be if there were "dangling" futures in the HashMap. This could be fixed by removing dangling Shared futures that aren't resolved. I don't expect this to happen in practice (especially since follow-up token info requests will drive the Shared future forward).
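
Such a cleanup could look roughly like this (a sketch; it relies on Shared::strong_count, which returns None once the future has been polled to completion):

    // Keep entries that have resolved, or that still have at least one
    // handle besides the cache's own driving them forward.
    cache.lock().unwrap().retain(|_, fut| {
        fut.peek().is_some() || fut.strong_count().map_or(false, |n| n > 1)
    });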

@nlordell enabled auto-merge (squash) October 9, 2023 13:08
@nlordell merged commit e1da131 into main Oct 9, 2023
@nlordell deleted the token-info-fetching-fixes branch October 9, 2023 13:32
@github-actions bot locked and limited conversation to collaborators Oct 9, 2023
Development

Successfully merging this pull request may close these issues.

CachedTokenInfoFetcher hogs cache Mutex while requests are in flight