Compute expected ancestry proof size #6
Conversation
Converted to draft. Need to check some test failures.
Fixed.
high-level looks good, would like an audit too for peace of mind
Since I'm new to the code, I left a couple of questions, mainly about algorithmic complexity. I still need some time to understand why this change preserves validity.
src/ancestry_proof.rs (outdated)

    let mut sibs_processed_from_back = Vec::new();
    {
        if !queue.insert_sorted_by(value, cmp_nodes) {
Previously, the queue was sorted by pos first, and now it's (height, pos). My understanding is that the nodes will be sorted by pos, so this algorithm might be running in O(n^2).
In the worst case scenario we want to prove all the nodes. Let's say the number of nodes = n. The complexity of inserting in an ordered queue is O(log(n)). So I think the total complexity would be O(n * log(n)).
But if we want to prove just one node, we need to start from it and go up until we reach the root. Let's say the number of leaves = l. The height of the MMR is log(l). The number of nodes in the proof is at most the height of the MMR (we need at most one node per level) so the queue length is at most log(l). So the complexity should be O(log(l) * log(l))
I hope I'm not missing anything.
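To make the queue-length bound above concrete, here is a tiny sketch; `mmr_height` is a hypothetical helper (not from this crate) that computes the height of the tallest peak from the leaf count, which is what caps the queue size for a single-node proof.

```rust
/// Height of the tallest peak in an MMR with `leaf_count` leaves:
/// floor(log2(leaf_count)). A single-node proof needs at most one
/// co-path node per level, so the sorted queue never holds more than
/// roughly this many entries.
fn mmr_height(leaf_count: u64) -> u32 {
    assert!(leaf_count > 0);
    63 - leaf_count.leading_zeros()
}

fn main() {
    assert_eq!(mmr_height(1), 0); // a single leaf is already the root
    assert_eq!(mmr_height(8), 3); // perfect tree with 8 leaves
    assert_eq!(mmr_height(9), 3); // tallest peak still covers 8 leaves
}
```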
> The complexity of inserting in an ordered queue is O(log(n)).

This is not true. Insertion at a random position in a VecDeque is O(n). But if the number of nodes in the proof is always bounded to a small number, that's probably fine as is. Just want to make sure we are not introducing a DoS vector in the worst case.
Good point, you're right. It's O(n), because it's a vec-based structure. I missed that.
For our use case (ancestry proofs), yes the number of nodes in the proof should be small, but we should probably make this scalable in general. I will change it to a BTreeSet or a LinkedList.
LinkedList would still have O(n) for finding the insertion position.
If it's sorted I think it should have O(log(n)). But yes, BTreeSet would have O(log(n)) insertion; it's only that for small numbers it would be worse than a sorted vec-based structure. And usually the number of nodes should be small. BinaryHeap is also a good option, with O(1) amortized push.
Oh, sorry: in a linked list it's not that easy to take advantage of the fact that it's sorted. You're right.
Yes, BTreeSet/BTreeMap seems to be the best option. Done.
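A minimal sketch of the BTreeSet-based queue discussed above (the `insert_sorted` helper name is illustrative, not the crate's API): a BTreeSet keeps entries ordered by (height, pos) and rejects duplicates, giving O(log n) inserts instead of the O(n) element shift a VecDeque-based `insert_sorted_by` needs.

```rust
use std::collections::BTreeSet;

/// Insert a node keyed by (height, pos); returns false if it was
/// already present, mirroring the `if !queue.insert_sorted_by(...)`
/// check in the PR. BTreeSet keeps the queue sorted with O(log n)
/// inserts.
fn insert_sorted(queue: &mut BTreeSet<(u32, u64)>, node: (u32, u64)) -> bool {
    queue.insert(node)
}

fn main() {
    let mut queue = BTreeSet::new();
    assert!(insert_sorted(&mut queue, (1, 2)));
    assert!(insert_sorted(&mut queue, (0, 0)));
    assert!(!insert_sorted(&mut queue, (1, 2))); // duplicate rejected
    // The lowest (height, pos) entry comes out first.
    assert_eq!(queue.pop_first(), Some((0, 0)));
}
```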
@@ -321,3 +294,41 @@ fn calculate_root<
    let peaks_hashes = calculate_peaks_hashes::<_, M, _>(nodes, mmr_size, proof_iter)?;
    bagging_peaks_hashes::<_, M>(peaks_hashes)
}

pub fn expected_ancestry_proof_size(prev_mmr_size: u64, mmr_size: u64) -> usize {
given the title of the PR, this seems like the main change in the PR. however, the rest seems like a totally unrelated rewriting of algorithm/refactoring. any reason to mix the two in one PR? or is it not a refactoring and actually this function depends on it?
also, what is the complexity of this function (seems non-trivial to guesstimate given nested loops)
> given the title of the PR, this seems like the main change in the PR.

Yes, exactly.

> however, the rest seems like a totally unrelated rewriting of algorithm/refactoring. any reason to mix the two in one PR? or is it not a refactoring and actually this function depends on it?

Yes, it's just a refactoring. No particular reason; I did it for readability. I tried to follow the previous implementation to understand how to compute the expected proof size and it was just very hard, so I thought it might be helpful to refactor that as well. Also, if we're doing a security audit for this PR, it would be good to cover both changes. And we'll need to upstream ancestry proofs at some point; this change helps there, because it makes the code more similar to the upstream.

> also, what is the complexity of this function (seems non-trivial to guesstimate given nested loops)

In the worst case, for each peak we need to walk from a node inside it up to the peak height. So it should be something like O(num_peaks * log(num_leaves)).
Still don't fully grasp the details (probably best to ask @Lederstrumpf), but let's kick off the audit.
    let mut expected_proof_size: usize = 0;
    let mut prev_peaks = get_peaks(prev_mmr_size);
    let peaks = get_peaks(mmr_size);

    for (peak_idx, peak) in peaks.iter().enumerate() {
        let mut local_prev_peaks: Vec<u64> = take_while_vec(&mut prev_peaks, |pos| *pos <= *peak);

        // Pop lowest local prev peak under the current peak
        match local_prev_peaks.pop() {
            Some(mut node) => {
                let mut height = pos_height_in_tree(node);
                while node < *peak {
                    if pos_height_in_tree(node + 1) > height {
                        // Node is right sibling
                        node += 1;
                        height += 1;
                    } else {
                        // Node is left sibling
                        expected_proof_size += 1;
                        node += parent_offset(height);
                        height += 1;
                    }
                }
            }
            None => {
                if peak_idx <= peaks.len() {
                    // Account for rhs bagging peaks
                    expected_proof_size += 1;
                }
                break;
            }
        }
    }

    expected_proof_size
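For reference, the two position helpers the loop relies on can be sketched as follows. This follows the conventions of the ckb merkle-mountain-range crate (0-based node positions), so treat it as an illustrative reimplementation rather than the crate's exact code.

```rust
/// Height of the node at 0-based MMR position `pos`. Jumps left across
/// perfect subtrees until the 1-based position is all ones, at which
/// point the node is the peak of a perfect subtree and its height is
/// bit_length - 1.
fn pos_height_in_tree(mut pos: u64) -> u32 {
    pos += 1; // switch to 1-based position

    // A 1-based position is the peak of a perfect subtree iff all its
    // bits are set (1, 3, 7, 15, ...).
    fn all_ones(num: u64) -> bool {
        num != 0 && num.count_ones() == 64 - num.leading_zeros()
    }

    // Jump to the same height in the leftmost perfect subtree by
    // subtracting the offset contributed by the most significant bit.
    fn jump_left(pos: u64) -> u64 {
        let bit_length = 64 - pos.leading_zeros();
        let most_significant_bit = 1u64 << (bit_length - 1);
        pos - (most_significant_bit - 1)
    }

    while !all_ones(pos) {
        pos = jump_left(pos);
    }
    64 - pos.leading_zeros() - 1
}

/// Offset from a left sibling at `height` to its parent.
fn parent_offset(height: u32) -> u64 {
    2 << height
}

fn main() {
    // Positions 0 and 1 are leaves, 2 is their parent, 6 is a height-2 peak.
    assert_eq!(pos_height_in_tree(0), 0);
    assert_eq!(pos_height_in_tree(2), 1);
    assert_eq!(pos_height_in_tree(6), 2);
    // Left leaf at pos 3 has its parent at 3 + parent_offset(0) = 5.
    assert_eq!(parent_offset(0), 2);
}
```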
This works by checking the frequency of left-sibling-hood when iterating up from the last prev_peak. In some sense, it's a myopic climber ascending a dark mountain and feeling out its shape from relative location :P The mountain peaks here are the co-path items for proving said last prev_peak.
We can also determine the exact shape of the mountain from just knowing its width (i.e. the number of leaves beneath it) though, and skip the climbing step.
I've opened up #7 to show the changes, but feel free to cherry pick the new algorithm over here instead.
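The "skip the climbing" idea boils down to a popcount: an MMR with l leaves has exactly popcount(l) peaks (one perfect subtree per set bit), so counting the peaks of the flipped sub-mmr with leaf_count = leaves(peak) - leaves(prev_peaks) is a single instruction. A sketch (`peak_count` is an illustrative name, not from the PR):

```rust
/// Number of peaks in an MMR with `leaf_count` leaves: each set bit in
/// `leaf_count` corresponds to one perfect binary subtree (peak).
fn peak_count(leaf_count: u64) -> u32 {
    leaf_count.count_ones()
}

fn main() {
    // 11 leaves = 0b1011 -> peaks of sizes 8, 2 and 1.
    assert_eq!(peak_count(11), 3);
    // A power-of-two leaf count is a single perfect tree.
    assert_eq!(peak_count(8), 1);
    // Flipped sub-mmr: a prev peak covering 4 of a 16-leaf peak's
    // leaves needs peak_count(16 - 4) = 2 co-path items.
    assert_eq!(peak_count(16 - 4), 2);
}
```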
Thank you! Sounds good! I'll take a look.
Closing in favor of #7
* Adjust node proofs verification logic
* Compute optimal ancestry proof size
* Add comments
* Use BTreeSet/BTreeMap
* use entry API
* calculate proof size from flipped sub-mmr

  Calculating the number of copath items required for the proof of the prev_peaks living beneath some peak can be done both more succinctly and more efficiently as follows: each peak is a perfect merkle tree, so if we have some prev_peaks living beneath a given peak, the number of co-path items needed to prove said prev_peaks equals the number of peaks that would be formed for a sub-mmr with leaf_count = leaves(peak) - leaves(prev_peaks). This is akin to flipping the construction of the sub-mmr to be right-to-left and counting its peaks.

* add benchmarks for ancestry & node proofs
* add proof size logic from #6 for benching
* Revert "add proof size logic from #6 for benching" (reverts commit 6cd3808)
* improve readability with slice comparison (Suggested-by: Serban Iorga <[email protected]>)
* Revert "Compute optimal ancestry proof size" (reverts commit 5e0fa6e)
* Revert "Adjust node proofs verification logic" (reverts commit dc1f5e8)
* add tests: "Adjust node proofs verification logic" (Co-authored-by: Serban Iorga <[email protected]>)
* add tests: "Compute optimal ancestry proof size" (Co-authored-by: Serban Iorga <[email protected]>)
* add applicable comment from "Add comments" (Co-authored-by: Serban Iorga <[email protected]>)
Related to: paritytech/polkadot-sdk#4523
This is a follow-up to paritytech/polkadot-sdk#5188
The main purpose of this PR is to add logic for computing the expected ancestry proof size.
This PR also aligns some functions from the ancestry_proof.rs file with the mmr.rs file, which should make it easier to upstream the ancestry proofs logic in the future.