
Use more than one CPU core in IPA #742

Open · akoshelev opened this issue Jul 7, 2023 · 3 comments
Labels: performance (This affects protocol performance)

Comments

@akoshelev (Collaborator)

Overview

Currently, IPA is not capable of using more than one core at a time. Futures that we create while running a query are not spawned as executor tasks, so they are all polled inside a single task, on one thread at a time. The multi-threaded Tokio runtime that we use just bounces the parent future and all its children between threads/cores, but it only ever polls one future at a time.

All of the above makes IPA quite slow at the moment. One core does not generate enough work for the network layer, and our throughput is quite miserable. We need to find a way to parallelize work and leverage all (or at least more than one) of the cores available on the host.
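
To make the failure mode concrete, here is a minimal, self-contained sketch (not IPA code; compute_share is a hypothetical stand-in for real protocol work) contrasting futures polled inside one task with futures spawned as separate executor tasks:

    use std::time::Duration;

    // Hypothetical stand-in for real protocol work.
    async fn compute_share(i: u64) -> u64 {
        tokio::time::sleep(Duration::from_millis(10)).await;
        i * 2
    }

    #[tokio::main]
    async fn main() -> Result<(), tokio::task::JoinError> {
        // join polls both futures inside the *current* task, so even on a
        // multi-threaded runtime they never run on two cores at once.
        let (a, b) = futures::future::join(compute_share(0), compute_share(1)).await;

        // Spawning turns each future into its own executor task, so the runtime
        // can poll them on different cores in parallel. The catch: tokio::spawn
        // requires 'static futures, which IPA's borrowing futures can't satisfy.
        let (ha, hb) = (tokio::spawn(compute_share(0)), tokio::spawn(compute_share(1)));
        let (c, d) = (ha.await?, hb.await?);

        assert_eq!((a, b), (c, d));
        Ok(())
    }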

TODO: add more context and possible solutions

@akoshelev (Collaborator, Author)

I want to try async_scoped. Yes, it is generally unsafe to have task scopes, but in our case it may be acceptable, because we don't use mem::forget in prod code.
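
For reference, a minimal sketch of what that could look like, assuming async_scoped's Tokio support and its scope_and_collect entry point (process_record is a hypothetical stand-in; the unsafe contract is exactly the no-mem::forget condition above):

    use async_scoped::TokioScope;

    // Hypothetical stand-in for per-record protocol work.
    async fn process_record(r: &u32) -> u32 {
        *r + 1
    }

    async fn process_all(records: &[u32]) -> Vec<u32> {
        // SAFETY (per async_scoped's contract): the future returned by
        // scope_and_collect is driven to completion and never mem::forget-ed,
        // so no spawned task can outlive the borrows it captured.
        let (_, results) = unsafe {
            TokioScope::scope_and_collect(|scope| {
                for r in records {
                    scope.spawn(process_record(r));
                }
            })
            .await
        };
        // Each entry is a Tokio JoinHandle output: Result<T, JoinError>.
        results.into_iter().map(Result::unwrap).collect()
    }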

@akoshelev (Collaborator, Author)

I made an initial prototype that uses async_scoped, and it seems to work fine for a toy protocol and for @benjaminsavage's attribution and capping. However, there is one major inconvenience: internally, the scope uses FuturesUnordered, which makes the order in which the spawned futures resolve unpredictable.

    for i in 0..10 {
        spawner.spawn(async move {
            let record_id = RecordId::from(i);
            /* do something */
        });
    }

    let results = spawner.collect().await; // results may appear in any order

This is definitely happening in our tests when I plug in async_scoped:

// H1
collected results = [Ok((RecordId(1), (20_mod31, 22_mod31))), Ok((RecordId(0), (1_mod31, 20_mod31)))]
// H2
collected results = [Ok((RecordId(0), (20_mod31, 10_mod31))), Ok((RecordId(1), (22_mod31, 21_mod31)))]
// H3
collected results = [Ok((RecordId(0), (10_mod31, 1_mod31))), Ok((RecordId(1), (21_mod31, 20_mod31)))]

There are several ways we can tackle this issue:

Ignore it and let callers re-order the results

    for i in 0..10 {
        spawner.spawn(async move {
            let record_id = RecordId::from(i);
            /* do something */
            (record_id, meaningful_result)
        });
    }

    let mut results: Vec<(RecordId, _)> = spawner.collect().await;

    // sort by record_id
    results.sort_by(|a, b| a.0.cmp(&b.0));

    // drop record_id
    results.into_iter().map(|(_, result)| result).collect()

This is my least favorite option, because of the performance penalty and the mental overhead. In IPA we mostly process things in order, so this code would have to appear everywhere we process things in parallel.

Create a Scope wrapper that tracks the order internally

    for i in 0..10 {
        ipa_spawner.spawn(async move { /* do something */ meaningful_result });
    }

    // collect returns results in the order they were spawned, by keeping a buffer
    // large enough to accommodate all results; no caller-side sorting is needed
    let results: Vec<_> = ipa_spawner.collect().await;
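
For illustration, here is a minimal sketch of that wrapper idea built on plain Tokio's JoinSet ('static futures for simplicity; OrderedSpawner is a hypothetical name, and the real version would need the scoped lifetimes discussed below):

    use std::future::Future;
    use tokio::task::JoinSet;

    // Tag each task with its spawn index; collect() places results into a
    // pre-sized buffer so callers see them in spawn order.
    struct OrderedSpawner<T> {
        set: JoinSet<(usize, T)>,
        next: usize,
    }

    impl<T: Send + 'static> OrderedSpawner<T> {
        fn new() -> Self {
            Self { set: JoinSet::new(), next: 0 }
        }

        fn spawn<F>(&mut self, fut: F)
        where
            F: Future<Output = T> + Send + 'static,
        {
            let idx = self.next;
            self.next += 1;
            // Wrap the future so its result carries the spawn index.
            self.set.spawn(async move { (idx, fut.await) });
        }

        async fn collect(mut self) -> Vec<T> {
            let mut slots: Vec<Option<T>> = (0..self.next).map(|_| None).collect();
            while let Some(joined) = self.set.join_next().await {
                let (idx, value) = joined.expect("task panicked");
                slots[idx] = Some(value);
            }
            slots.into_iter().map(Option::unwrap).collect()
        }
    }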

This removes the mental and performance overhead; however, it is likely not going to play well with the seq_try_join_all API, for two reasons:

  • it needs more granular control than collect provides: seq_try_join_all removes individual futures from the front of the queue as soon as they resolve and adds new ones to the back. It can't wait until the whole active set is resolved before spawning new futures.
  • ordering is important, so seq_try_join_all will likely need to manage spare capacity to keep futures that fall outside the active window, which will complicate the implementation.

(A parallel version of seq_try_join_all does not need to manage future polling manually, because all futures are polled by the executor. It can simply traverse the list from the head and remove futures that have already resolved; see the sketch below.)
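
A rough sketch of that head-draining behavior, assuming 'static futures and plain tokio::spawn (par_join_all and its signature are hypothetical, not the real seq_try_join_all):

    use std::collections::VecDeque;
    use std::future::Future;
    use tokio::task::{JoinError, JoinHandle};

    // Hypothetical parallel counterpart of seq_try_join_all: keep `active_work`
    // spawned tasks in flight, await only the head of the queue (the executor
    // polls the rest in the background), and refill the back as the head drains.
    async fn par_join_all<T, I, F>(active_work: usize, futures: I) -> Result<Vec<T>, JoinError>
    where
        T: Send + 'static,
        I: IntoIterator<Item = F>,
        F: Future<Output = T> + Send + 'static,
    {
        let mut pending = futures.into_iter();
        let mut active: VecDeque<JoinHandle<T>> = VecDeque::new();
        let mut output = Vec::new();

        // Fill the initial active window.
        for f in pending.by_ref().take(active_work) {
            active.push_back(tokio::spawn(f));
        }
        // Results come out in spawn order because we only ever pop the head.
        while let Some(head) = active.pop_front() {
            output.push(head.await?);
            if let Some(f) = pending.next() {
                active.push_back(tokio::spawn(f));
            }
        }
        Ok(output)
    }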

Implement an IPA-owned version of async_scoped

While it is possible to work with the authors of async_scoped to bring in FuturesOrdered or another implementation that would allow the scope stream to yield elements in the order they were spawned, it may be that the async_scoped API is simply not shaped for our needs:

    impl<'a, T: Send + 'static, Sp: Spawner<T> + Blocker> Scope<'a, T, Sp> {
        pub fn spawn<F: Future<Output = T> + Send + 'a>(&mut self, f: F) { ... }
    }

The spawn method does not return a handle, and the Scope object does not provide any control over the individual futures being spawned in parallel. We could implement our own Scope object, but the core functionality of async_scoped lives inside Scope::spawn:

    let handle = Sp::spawn(unsafe {
        std::mem::transmute::<_, Pin<Box<dyn Future<Output = T> + Send>>>(
            Box::pin(f) as Pin<Box<dyn Future<Output = T>>>,
        )
    });

This is what we need to get our hands on. There is a risk that we get it wrong and introduce use-after-free or other serious UB, but this may be the only way to achieve what we want: processing many records in parallel and adding more work as soon as there is capacity available for it.

The shape of this API may look like this (if I can make it work):

    impl<'a, T: Send + 'static> ParallelScope<'a, T> {
        pub <unsafe?> fn spawn<F: Future<Output = T> + Send + 'a>(&mut self, f: F) -> ScopeHandle<'a> { ... }
    }

ScopeHandle<'_> will work similarly to JoinHandle and will be stored inside SequentialFutures. It does not need to be polled in order to activate the future, but it can be checked and removed from the list once the future resolves. As with async_scoped, we would need to be careful with drops and panics and make sure that if a ParallelScope is dropped, it cancels all of its futures before returning.
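
That drop-safety requirement can be illustrated with plain Tokio handles (a sketch only; CancelOnDrop is a hypothetical name, and real scoped cancellation must also block until the tasks have stopped):

    use tokio::task::JoinHandle;

    // Sketch of the cancel-on-drop requirement (using 'static JoinHandles for
    // illustration; ParallelScope would hold its own handle type).
    struct CancelOnDrop<T> {
        handles: Vec<JoinHandle<T>>,
    }

    impl<T> Drop for CancelOnDrop<T> {
        fn drop(&mut self) {
            for handle in &self.handles {
                handle.abort();
            }
            // NOTE: abort() only *requests* cancellation. A real scoped
            // implementation must also block here until every task has
            // actually stopped; otherwise a task could keep running after
            // the borrowed data it captured is freed (use-after-free).
        }
    }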

Thoughts @andyleiserson @martinthomson ?

@akoshelev (Collaborator, Author)

#873 is the first attempt to address this
