
Moving collect to a better place in OPRF IPA #835

Merged (3 commits into main, Nov 8, 2023)
Conversation

@benjaminsavage (Collaborator) commented on Nov 6, 2023

I updated the code to filter out all users with just a single row of data and never emit them to the outbound stream, in an effort to reduce the number of futures being created.

I also moved the "collect" to after the users are "chunked" by OPRF, so that they can be sorted by number of rows, descending.

This is a hacky workaround that prevents the infra stall caused when a batch of 1024 rows fails to generate 1024 multiplications at depth + 1.

Running IPA for 100000 records took 238.103110663s
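A minimal sketch of the two changes described above, assuming hypothetical names (`order_user_chunks`, `UserRows`); the real code operates on the stream of rows chunked by OPRF value:

```rust
use futures::{stream, Stream};

// Hypothetical stand-in for one user's rows, chunked by matching OPRF value.
type UserRows = Vec<u64>;

fn order_user_chunks(chunks: impl Iterator<Item = UserRows>) -> impl Stream<Item = UserRows> {
    // Per the description above: users with just a single row are filtered
    // out and never emitted, reducing the number of futures created.
    let mut collected: Vec<UserRows> = chunks.filter(|rows| rows.len() > 1).collect();
    // Sort by number of rows, descending, so the largest (deepest) per-user
    // circuits start first and later multiplication batches are less likely
    // to stall waiting on a straggler.
    collected.sort_by(|a, b| b.len().cmp(&a.len()));
    stream::iter(collected)
}
```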

@benjaminsavage changed the title from "debugging stall in streaming OPRF ipa" to "Moving collect to a better place in OPRF IPA" on Nov 7, 2023
```diff
- let collected_per_user_results = stream_of_per_user_circuits.collect::<Vec<_>>().await;
- let per_user_attribution_outputs = sh_ctx.parallel_join(collected_per_user_results).await?;
- let flattenned_stream = per_user_attribution_outputs.into_iter().flatten();
+ let flattenned_stream = seq_join(sh_ctx.active_work(), stream_of_per_user_circuits)
```
Collaborator
Suggested change:

```diff
- let flattenned_stream = seq_join(sh_ctx.active_work(), stream_of_per_user_circuits)
+ let flattenned_stream = sh_ctx.try_join(stream_of_per_user_circuits)
```

benjaminsavage (Collaborator, Author)

try_join is a sequential join?
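For readers following along, the distinction at issue is between joining a stream of futures with bounded, ordered concurrency versus joining them some other way. The following is a minimal illustration using the `futures` crate's `buffered` combinator, not the project's actual `seq_join` or `try_join` implementations, whose internals are not shown here:

```rust
use std::time::Duration;

use futures::{stream, StreamExt};

#[tokio::main]
async fn main() {
    // A stream of futures, analogous to the per-user circuits above.
    let work = stream::iter(0..8u32).map(|i| async move {
        tokio::time::sleep(Duration::from_millis(10)).await;
        i * 2
    });

    // `buffered(4)` drives up to 4 futures concurrently but yields their
    // outputs strictly in input order: concurrent execution with
    // sequential, ordered results ("active work" of 4).
    let results: Vec<u32> = work.buffered(4).collect().await;
    assert_eq!(results, vec![0, 2, 4, 6, 8, 10, 12, 14]);
}
```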

```diff
@@ -435,8 +440,11 @@ where
     let first_row = first_row.unwrap();
     let rows_chunked_by_user = chunk_rows_by_user(input_stream, first_row);

+    let mut collected = rows_chunked_by_user.collect::<Vec<_>>().await;
+    collected.sort_by(|a, b| std::cmp::Ord::cmp(&b.len(), &a.len()));
```
Collaborator

This sort may become a bottleneck in the future: we can't kick off processing until we receive the very last PRF shard. We should probably start thinking about what things will look like with multiple shards. I would assume that some sort of consistent hashing is required to map PRF pseudonyms to shards, meaning that each shard will have to wait until the very last event has been sent to it and the mapper has indicated that no more events will be sent.

In this model, the approach proposed here works, but we will have to keep all impressions and conversions in memory while receiving them from the mapper, i.e. no streaming.
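A minimal sketch of the kind of pseudonym-to-shard mapping described above; the function name, the `u64` pseudonym type, and the modulus scheme are all assumptions, not the project's design:

```rust
/// Hypothetical: assign a PRF pseudonym to a shard. Because the PRF output
/// is already pseudorandom, a plain modulus gives a stable, roughly uniform
/// assignment as long as every helper uses the same shard count. True
/// consistent hashing would only matter if the shard count can change
/// mid-query, since it minimizes how many pseudonyms get remapped.
fn shard_for(prf_pseudonym: u64, num_shards: u64) -> usize {
    (prf_pseudonym % num_shards) as usize
}
```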

benjaminsavage (Collaborator, Author)

I agree this is seriously sub-optimal. I think we should land this code, but re-evaluate this once we have the shuffling and sharding in place to see how we can deal with it.

@benjaminsavage merged commit 4dd9554 into main on Nov 8, 2023 (7 of 10 checks passed).
@benjaminsavage deleted the debugging_stall branch on November 8, 2023 at 02:53.