Compute expected ancestry proof size #6
Conversation
Converted to draft. Need to check some test failures.
Fixed.
high-level looks good, would like an audit too for peace of mind
Since I'm new to the code, I left a couple of questions, mainly about algorithmic complexity. I still need some time to understand why this change preserves validity.
src/ancestry_proof.rs (outdated)

    let mut sibs_processed_from_back = Vec::new();
    {
        if !queue.insert_sorted_by(value, cmp_nodes) {
Previously, the queue was sorted by pos first, and now it's (height, pos). My understanding is that the nodes will be sorted by pos, so this algorithm might be running in O(n^2).
In the worst case scenario we want to prove all the nodes. Let's say the number of nodes = n. The complexity of inserting in an ordered queue is O(log(n)). So I think the total complexity would be O(n * log(n)).
But if we want to prove just one node, we need to start from it and go up until we reach the root. Let's say the number of leaves = l. The height of the MMR is log(l). The number of nodes in the proof is at most the height of the MMR (we need at most one node per level) so the queue length is at most log(l). So the complexity should be O(log(l) * log(l))
I hope I'm not missing anything.
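To make the queue-length bound above concrete, here is a tiny sketch; `mmr_height` is a hypothetical helper (not from this crate) that computes the height of the tallest peak from the leaf count, which is what caps the queue size for a single-node proof.

```rust
/// Height of the tallest peak in an MMR with `leaf_count` leaves:
/// floor(log2(leaf_count)). A single-node proof needs at most one
/// co-path node per level, so the sorted queue never holds more than
/// roughly this many entries.
fn mmr_height(leaf_count: u64) -> u32 {
    assert!(leaf_count > 0);
    63 - leaf_count.leading_zeros()
}

fn main() {
    assert_eq!(mmr_height(1), 0); // a single leaf is already the root
    assert_eq!(mmr_height(8), 3); // perfect tree with 8 leaves
    assert_eq!(mmr_height(9), 3); // tallest peak still covers 8 leaves
}
```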
> The complexity of inserting in an ordered queue is O(log(n)).

This is not true. Insertion at a random position in a VecDeque is O(n). But if the number of nodes in the proof is always bounded to a small number, that's probably fine as is. Just want to make sure we are not introducing a DoS vector in the worst case.
Good point, you're right. It's O(n), because it's a vec-based structure. I missed that.
For our use case (ancestry proofs), yes the number of nodes in the proof should be small, but we should probably make this scalable in general. I will change it to a BTreeSet or a LinkedList.
LinkedList would still have O(n) for finding the insertion position.
If it's sorted I think it should have O(log(n)). But yes, BTreeSet would have O(log(n)) insertion; it's only that for small numbers it would be worse than a sorted vec-based structure. And usually the number of nodes should be small. BinaryHeap is also a good option, with O(1) amortized push.
Oh, sorry: in a linked list it's not that easy to take advantage of the fact that it's sorted. You're right.
Yes, BTreeSet/BTreeMap seems to be the best option. Done.
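A minimal sketch of the BTreeSet-based queue discussed above (the `insert_sorted` helper name is illustrative, not the crate's API): a BTreeSet keeps entries ordered by (height, pos) and rejects duplicates, giving O(log n) inserts instead of the O(n) element shift a VecDeque-based `insert_sorted_by` needs.

```rust
use std::collections::BTreeSet;

/// Insert a node keyed by (height, pos); returns false if it was
/// already present, mirroring the `if !queue.insert_sorted_by(...)`
/// check in the PR. BTreeSet keeps the queue sorted with O(log n)
/// inserts.
fn insert_sorted(queue: &mut BTreeSet<(u32, u64)>, node: (u32, u64)) -> bool {
    queue.insert(node)
}

fn main() {
    let mut queue = BTreeSet::new();
    assert!(insert_sorted(&mut queue, (1, 2)));
    assert!(insert_sorted(&mut queue, (0, 0)));
    assert!(!insert_sorted(&mut queue, (1, 2))); // duplicate rejected
    // The lowest (height, pos) entry comes out first.
    assert_eq!(queue.pop_first(), Some((0, 0)));
}
```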
@@ -321,3 +294,41 @@ fn calculate_root<
    let peaks_hashes = calculate_peaks_hashes::<_, M, _>(nodes, mmr_size, proof_iter)?;
    bagging_peaks_hashes::<_, M>(peaks_hashes)
}

pub fn expected_ancestry_proof_size(prev_mmr_size: u64, mmr_size: u64) -> usize {
given the title of the PR, this seems like the main change in the PR. however, the rest seems like a totally unrelated rewriting of algorithm/refactoring. any reason to mix the two in one PR? or is it not a refactoring and actually this function depends on it?
also, what is the complexity of this function (seems non-trivial to guesstimate given nested loops)
> given the title of the PR, this seems like the main change in the PR.

Yes, exactly.

> however, the rest seems like a totally unrelated rewriting of algorithm/refactoring. any reason to mix the two in one PR? or is it not a refactoring and actually this function depends on it?

Yes, it's just a refactoring. No particular reason; I did it for readability. I tried to follow the previous implementation to understand how to compute the expected proof size and it was just very hard, so I thought it might be helpful to refactor that as well. Also, if we're doing a security audit for this PR, it would be good to cover both changes. And we'll need to upstream ancestry proofs at some point; this change helps there, because it makes the code more similar to the upstream.

> also, what is the complexity of this function (seems non-trivial to guesstimate given nested loops)

In the worst case, for each peak we need to walk from a node inside it up to the peak height. So it should be something like O(num_peaks * log(num_leaves)).
Still don't fully grasp the details (probably best to ask @Lederstrumpf), but let's kick off the audit.
    let mut expected_proof_size: usize = 0;
    let mut prev_peaks = get_peaks(prev_mmr_size);
    let peaks = get_peaks(mmr_size);

    for (peak_idx, peak) in peaks.iter().enumerate() {
        let mut local_prev_peaks: Vec<u64> = take_while_vec(&mut prev_peaks, |pos| *pos <= *peak);

        // Pop lowest local prev peak under the current peak
        match local_prev_peaks.pop() {
            Some(mut node) => {
                let mut height = pos_height_in_tree(node);
                while node < *peak {
                    if pos_height_in_tree(node + 1) > height {
                        // Node is right sibling
                        node += 1;
                        height += 1;
                    } else {
                        // Node is left sibling
                        expected_proof_size += 1;
                        node += parent_offset(height);
                        height += 1;
                    }
                }
            }
            None => {
                if peak_idx <= peaks.len() {
                    // Account for rhs bagging peaks
                    expected_proof_size += 1;
                }
                break;
            }
        }
    }

    expected_proof_size
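For reference, the two position helpers the loop relies on can be sketched as follows. This follows the conventions of the ckb merkle-mountain-range crate (0-based node positions), so treat it as an illustrative reimplementation rather than the crate's exact code.

```rust
/// Height of the node at 0-based MMR position `pos`. Jumps left across
/// perfect subtrees until the 1-based position is all ones, at which
/// point the node is the peak of a perfect subtree and its height is
/// bit_length - 1.
fn pos_height_in_tree(mut pos: u64) -> u32 {
    pos += 1; // switch to 1-based position

    // A 1-based position is the peak of a perfect subtree iff all its
    // bits are set (1, 3, 7, 15, ...).
    fn all_ones(num: u64) -> bool {
        num != 0 && num.count_ones() == 64 - num.leading_zeros()
    }

    // Jump to the same height in the leftmost perfect subtree by
    // subtracting the offset contributed by the most significant bit.
    fn jump_left(pos: u64) -> u64 {
        let bit_length = 64 - pos.leading_zeros();
        let most_significant_bit = 1u64 << (bit_length - 1);
        pos - (most_significant_bit - 1)
    }

    while !all_ones(pos) {
        pos = jump_left(pos);
    }
    64 - pos.leading_zeros() - 1
}

/// Offset from a left sibling at `height` to its parent.
fn parent_offset(height: u32) -> u64 {
    2 << height
}

fn main() {
    // Positions 0 and 1 are leaves, 2 is their parent, 6 is a height-2 peak.
    assert_eq!(pos_height_in_tree(0), 0);
    assert_eq!(pos_height_in_tree(2), 1);
    assert_eq!(pos_height_in_tree(6), 2);
    // Left leaf at pos 3 has its parent at 3 + parent_offset(0) = 5.
    assert_eq!(parent_offset(0), 2);
}
```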
This works by checking the frequency of left-sibling-hood when iterating up from the last prev_peak. In some sense, it's a myopic climber ascending a dark mountain and feeling out its shape from relative location :P The mountain peaks here are the co-path items for proving said last prev_peak.
We can also determine the exact shape of the mountain from just knowing its width (i.e. the number of leaves beneath it) though, and skip the climbing step.
I've opened up #7 to show the changes, but feel free to cherry pick the new algorithm over here instead.
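The "skip the climbing" idea boils down to a popcount: an MMR with l leaves has exactly popcount(l) peaks (one perfect subtree per set bit), so counting the peaks of the flipped sub-mmr with leaf_count = leaves(peak) - leaves(prev_peaks) is a single instruction. A sketch (`peak_count` is an illustrative name, not from the PR):

```rust
/// Number of peaks in an MMR with `leaf_count` leaves: each set bit in
/// `leaf_count` corresponds to one perfect binary subtree (peak).
fn peak_count(leaf_count: u64) -> u32 {
    leaf_count.count_ones()
}

fn main() {
    // 11 leaves = 0b1011 -> peaks of sizes 8, 2 and 1.
    assert_eq!(peak_count(11), 3);
    // A power-of-two leaf count is a single perfect tree.
    assert_eq!(peak_count(8), 1);
    // Flipped sub-mmr: a prev peak covering 4 of a 16-leaf peak's
    // leaves needs peak_count(16 - 4) = 2 co-path items.
    assert_eq!(peak_count(16 - 4), 2);
}
```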
Thank you! Sounds good! I'll take a look.
Closing in favor of #7
* Adjust node proofs verification logic
* Compute optimal ancestry proof size
* Add comments
* Use BTreeSet/BTreeMap
* use entry API
* calculate proof size from flipped sub-mmr

  Calculating the number of copath items required for the proof of the prev_peaks living beneath some peak can be done both more succinctly and more efficiently as follows: each peak is a perfect merkle tree, so if we have some prev_peaks living beneath a given peak, the number of co-path items needed to prove said prev_peaks equals the number of peaks that would be formed for a sub-mmr with leaf_count = leaves(peak) - leaves(prev_peaks). This is akin to flipping the construction of the sub-mmr to be right-to-left and counting its peaks.

* add benchmarks for ancestry & node proofs
* add proof size logic from #6 for benching
* Revert "add proof size logic from #6 for benching" (reverts commit 6cd3808)
* improve readability with slice comparison (Suggested-by: Serban Iorga <[email protected]>)
* Revert "Compute optimal ancestry proof size" (reverts commit 5e0fa6e)
* Revert "Adjust node proofs verification logic" (reverts commit dc1f5e8)
* add tests: "Adjust node proofs verification logic" (Co-authored-by: Serban Iorga <[email protected]>)
* add tests: "Compute optimal ancestry proof size" (Co-authored-by: Serban Iorga <[email protected]>)
* add applicable comment from "Add comments" (Co-authored-by: Serban Iorga <[email protected]>)
Related to: paritytech/polkadot-sdk#4523
This is a follow-up to paritytech/polkadot-sdk#5188
The main purpose of this PR is to add logic for computing the expected ancestry proof size.
This PR also aligns some functions from the ancestry_proof.rs file with the mmr.rs file, which should make it easier to upstream the ancestry proofs logic in the future.