
Use more than one CPU core in IPA #742

Open · akoshelev opened this issue Jul 7, 2023 · 3 comments
Labels: performance (This affects protocol performance)

Comments

@akoshelev (Collaborator)

Overview

Currently, IPA is not capable of using more than one core at a time. Futures that we create while running a query are not spawned as executor tasks, so they are all polled inside a single task, on one thread at a time. The multi-threaded Tokio runtime that we use just bounces the parent future and all its children between threads/cores, but it only ever polls one future at a time.

All of the above makes IPA quite slow at the moment. One core does not generate enough work for the network layer, and our throughput is quite miserable. We need to find a way to parallelize work and leverage all (or at least more than one) of the cores available on the host.
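
To make the failure mode concrete, here is a minimal, self-contained sketch (not IPA code; compute_share is a hypothetical stand-in for real protocol work) contrasting futures polled inside one task with futures spawned as separate executor tasks:

    use std::time::Duration;

    // Hypothetical stand-in for real protocol work.
    async fn compute_share(i: u64) -> u64 {
        tokio::time::sleep(Duration::from_millis(10)).await;
        i * 2
    }

    #[tokio::main]
    async fn main() -> Result<(), tokio::task::JoinError> {
        // join polls both futures inside the *current* task, so even on a
        // multi-threaded runtime they never run on two cores at once.
        let (a, b) = futures::future::join(compute_share(0), compute_share(1)).await;

        // Spawning turns each future into its own executor task, so the runtime
        // can poll them on different cores in parallel. The catch: tokio::spawn
        // requires 'static futures, which IPA's borrowing futures can't satisfy.
        let (ha, hb) = (tokio::spawn(compute_share(0)), tokio::spawn(compute_share(1)));
        let (c, d) = (ha.await?, hb.await?);

        assert_eq!((a, b), (c, d));
        Ok(())
    }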

TODO: add more context and possible solutions

@akoshelev (Collaborator, Author)

I want to try async_scoped. Yes, it is generally unsafe to have task scopes, but in our case it may be acceptable, because we don't use mem::forget in prod code.
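
For reference, a minimal sketch of what that could look like, assuming async_scoped's Tokio support and its scope_and_collect entry point (process_record is a hypothetical stand-in; the unsafe contract is exactly the no-mem::forget condition above):

    use async_scoped::TokioScope;

    // Hypothetical stand-in for per-record protocol work.
    async fn process_record(r: &u32) -> u32 {
        *r + 1
    }

    async fn process_all(records: &[u32]) -> Vec<u32> {
        // SAFETY (per async_scoped's contract): the future returned by
        // scope_and_collect is driven to completion and never mem::forget-ed,
        // so no spawned task can outlive the borrows it captured.
        let (_, results) = unsafe {
            TokioScope::scope_and_collect(|scope| {
                for r in records {
                    scope.spawn(process_record(r));
                }
            })
            .await
        };
        // Each entry is a Tokio JoinHandle output: Result<T, JoinError>.
        results.into_iter().map(Result::unwrap).collect()
    }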

@akoshelev (Collaborator, Author)

I made an initial prototype that uses async_scoped, and it seems to work fine for a toy protocol and for @benjaminsavage's attribution and capping. However, there is one major inconvenience: internally, the scope uses FuturesUnordered, which makes the order in which the spawned futures resolve unpredictable.

    for i in 0..10 {
        spawner.spawn(async move {
            let record_id = RecordId::from(i);
            /* do something */
        });
    }

    let results = spawner.collect().await; // results may appear in any order

This is definitely happening in our tests when I plug in async_scoped:

// H1
collected results = [Ok((RecordId(1), (20_mod31, 22_mod31))), Ok((RecordId(0), (1_mod31, 20_mod31)))]
// H2
collected results = [Ok((RecordId(0), (20_mod31, 10_mod31))), Ok((RecordId(1), (22_mod31, 21_mod31)))]
// H3
collected results = [Ok((RecordId(0), (10_mod31, 1_mod31))), Ok((RecordId(1), (21_mod31, 20_mod31)))]

There are several ways we can tackle this issue:

Ignore it and let callers re-order the results

    for i in 0..10 {
        spawner.spawn(async move {
            let record_id = RecordId::from(i);
            /* do something */
            (record_id, meaningful_result)
        });
    }

    let mut results: Vec<(RecordId, _)> = spawner.collect().await;

    // sort by record_id
    results.sort_by(|a, b| a.0.cmp(&b.0));

    // drop record_id
    results.into_iter().map(|(_, result)| result).collect()

This is my least favorite option, because of the performance penalty and the mental overhead. In IPA we mostly process things in order, so this code would have to appear everywhere we process things in parallel.

Create a Scope wrapper that tracks the order internally

    for i in 0..10 {
        ipa_spawner.spawn(async move { /* do something */ meaningful_result });
    }

    // collect returns results in the order they were spawned, by keeping a buffer
    // large enough to accommodate all results; no caller-side sorting is needed
    let results: Vec<_> = ipa_spawner.collect().await;
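
For illustration, here is a minimal sketch of that wrapper idea built on plain Tokio's JoinSet ('static futures for simplicity; OrderedSpawner is a hypothetical name, and the real version would need the scoped lifetimes discussed below):

    use std::future::Future;
    use tokio::task::JoinSet;

    // Tag each task with its spawn index; collect() places results into a
    // pre-sized buffer so callers see them in spawn order.
    struct OrderedSpawner<T> {
        set: JoinSet<(usize, T)>,
        next: usize,
    }

    impl<T: Send + 'static> OrderedSpawner<T> {
        fn new() -> Self {
            Self { set: JoinSet::new(), next: 0 }
        }

        fn spawn<F>(&mut self, fut: F)
        where
            F: Future<Output = T> + Send + 'static,
        {
            let idx = self.next;
            self.next += 1;
            // Wrap the future so its result carries the spawn index.
            self.set.spawn(async move { (idx, fut.await) });
        }

        async fn collect(mut self) -> Vec<T> {
            let mut slots: Vec<Option<T>> = (0..self.next).map(|_| None).collect();
            while let Some(joined) = self.set.join_next().await {
                let (idx, value) = joined.expect("task panicked");
                slots[idx] = Some(value);
            }
            slots.into_iter().map(Option::unwrap).collect()
        }
    }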

This removes the mental and performance overhead; however, it is likely not going to play well with the seq_try_join_all API, for two reasons:

  • it needs more granular control than collect provides: seq_try_join_all removes individual futures from the front of the queue as soon as they resolve and adds new ones to the back. It can't wait until the whole active set is resolved before spawning new futures.
  • ordering is important, so seq_try_join_all will likely need to manage spare capacity to keep futures that fall outside the active window, which will complicate the implementation.

(A parallel version of seq_try_join_all does not need to manage future polling manually, because all futures are polled by the executor. It can simply traverse the list from the head and remove futures that have already resolved; see the sketch below.)
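
A rough sketch of that head-draining behavior, assuming 'static futures and plain tokio::spawn (par_join_all and its signature are hypothetical, not the real seq_try_join_all):

    use std::collections::VecDeque;
    use std::future::Future;
    use tokio::task::{JoinError, JoinHandle};

    // Hypothetical parallel counterpart of seq_try_join_all: keep `active_work`
    // spawned tasks in flight, await only the head of the queue (the executor
    // polls the rest in the background), and refill the back as the head drains.
    async fn par_join_all<T, I, F>(active_work: usize, futures: I) -> Result<Vec<T>, JoinError>
    where
        T: Send + 'static,
        I: IntoIterator<Item = F>,
        F: Future<Output = T> + Send + 'static,
    {
        let mut pending = futures.into_iter();
        let mut active: VecDeque<JoinHandle<T>> = VecDeque::new();
        let mut output = Vec::new();

        // Fill the initial active window.
        for f in pending.by_ref().take(active_work) {
            active.push_back(tokio::spawn(f));
        }
        // Results come out in spawn order because we only ever pop the head.
        while let Some(head) = active.pop_front() {
            output.push(head.await?);
            if let Some(f) = pending.next() {
                active.push_back(tokio::spawn(f));
            }
        }
        Ok(output)
    }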

Implement an IPA-owned version of async_scoped

While it is possible to work with the authors of async_scoped to bring in FuturesOrdered or another implementation that would allow the scope stream to yield elements in the order they were spawned, it may be that the async_scoped API is simply not shaped for our needs:

    impl<'a, T: Send + 'static, Sp: Spawner<T> + Blocker> Scope<'a, T, Sp> {
        pub fn spawn<F: Future<Output = T> + Send + 'a>(&mut self, f: F) { ... }
    }

The spawn method does not return a handle, and the Scope object does not provide any control over the individual futures being spawned in parallel. We could implement our own Scope object, but the core functionality of async_scoped lives inside Scope::spawn:

    let handle = Sp::spawn(unsafe {
        std::mem::transmute::<_, Pin<Box<dyn Future<Output = T> + Send>>>(
            Box::pin(f) as Pin<Box<dyn Future<Output = T>>>,
        )
    });

This is what we need to get our hands on. There is a risk that we get it wrong and introduce use-after-free or other serious UB, but this may be the only way to achieve what we want: processing many records in parallel and adding more work as soon as there is capacity available for it.

The shape of this API may look like this (if I can make it work):

    impl<'a, T: Send + 'static> ParallelScope<'a, T> {
        pub <unsafe?> fn spawn<F: Future<Output = T> + Send + 'a>(&mut self, f: F) -> ScopeHandle<'a> { ... }
    }

ScopeHandle<'_> will work similarly to JoinHandle and will be stored inside SequentialFutures. It does not need to be polled in order to activate the future, but it can be checked and removed from the list once the future resolves. As with async_scoped, we would need to be careful with drops and panics and make sure that if a ParallelScope is dropped, it cancels all of its futures before returning.
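
That drop-safety requirement can be illustrated with plain Tokio handles (a sketch only; CancelOnDrop is a hypothetical name, and real scoped cancellation must also block until the tasks have stopped):

    use tokio::task::JoinHandle;

    // Sketch of the cancel-on-drop requirement (using 'static JoinHandles for
    // illustration; ParallelScope would hold its own handle type).
    struct CancelOnDrop<T> {
        handles: Vec<JoinHandle<T>>,
    }

    impl<T> Drop for CancelOnDrop<T> {
        fn drop(&mut self) {
            for handle in &self.handles {
                handle.abort();
            }
            // NOTE: abort() only *requests* cancellation. A real scoped
            // implementation must also block here until every task has
            // actually stopped; otherwise a task could keep running after
            // the borrowed data it captured is freed (use-after-free).
        }
    }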

Thoughts @andyleiserson @martinthomson ?

@akoshelev (Collaborator, Author)

#873 is the first attempt to address this
