Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Add iter_from() method in CyclicBoundedVec #686

Merged
merged 5 commits into from
May 9, 2024

Conversation

vadorovsky
Copy link
Contributor

ConcurrentMerkleTree uses CyclicBoundedVec for storing changelog entries. And it often needs to iterate over changelogs which are full and started overwriting elements from the start.

Even though CyclicBoundedVec was doing good job in preserving the bounds, before this change, it didn't provide any safe way of iterating over elements with a custom index as a start.

That unfortunately resulted in attempts to solve that problem in the code of ConcurrentMerkleTree, which eventually resulted in bugs. The bugs were harder to spot, because mixing the logic of Merkle proof patching with leaked internals of CyclicBoundedVec made it confusing to read, debug and test.

To avoid such issues, introduce the iter_from() method which allows iteration from the given index and simplify the ConcurrentMerkleTree to just use that iterator.

@vadorovsky vadorovsky force-pushed the vadorovsky/cyclic-bounded-vec-iter-from branch 3 times, most recently from 044a8bc to bfc8c26 Compare May 8, 2024 10:16
@vadorovsky vadorovsky marked this pull request as ready for review May 8, 2024 10:16
@vadorovsky vadorovsky requested a review from ananas-block as a code owner May 8, 2024 10:16
@vadorovsky vadorovsky force-pushed the vadorovsky/cyclic-bounded-vec-iter-from branch 5 times, most recently from 81646b8 to 10fbd44 Compare May 8, 2024 13:58
@ananas-block
Copy link
Contributor

you might need to regenerate accounts.
(see js test failure)

@@ -335,7 +336,8 @@ where
Self {
capacity,
length: 0,
next_index: 0,
first_index: 0,
last_index: 0,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need a first and a last index?
Don't we always start from index 0 except when providing a different index to start from?

Copy link
Contributor Author

@vadorovsky vadorovsky May 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We used to do that, but I think it's not a correct thing to do.

When we don't provide a different index, I think we should start from the oldest element. The oldest element can be somewhere in the middle, if we went over capacity.

Let's assume that we have a bounded vec of integers incremented by 1. The capacity is 16. We append 22 elements. It ends up looking like:

                  $  ^
[16, 17, 18, 20, 21, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]

^ - first_index
$ - last_index

Every time we are over capacity and add more elements, we need to increment first_index and last_index.

In my opinion, a default iterator should iterate from 5 to 15, then from 0 to 21. Overall - from 5 to 21 with a reset.

To be precise, my way of thinking is that for the cyclic vector above, the result of iter().collect() should be:

[5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 21]

What you are proposing, would iterate... how exactly? From 16 to 21? From 16 to 15?
We used to iterate from 16 to 15, which I think it's incorrect. This approach made the previous implementation of .to_slice() a bit useless and it was the reason why we ended up with doing all this manual iteration dance in ConcurrentMerkleTree.

Maybe I should rename first_index and last_index to head and tail to make it clearer?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

your example makes it clear what it means

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

last index is always first index minus 1 except if first index = 0 then its length - 1, and it is 0 if it is empty.
So what is the reasoning to put it into a variable and not just compute it when needed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair I can see that it makes sense for the iterator to start at the current index

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrt heads and tail it's confusing in a different way, I think we just need a doc comment then it's clear.

self.length += 1;
} else if self.next_index == self.capacity() {
self.next_index = 0;
self.last_index += 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the looks suspicious to me since in line 446 self.first_index is used as current_index.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line 446 initializes an iterator. Our iterator (CyclicBoundedVecIterator) is a wrapper containing a reference to CyclicBoundedVec. current_index is the internal counter, belonging only to the iterator.

And there are two cases how we initialize current_index:

  • iter() - we just set it to the current value of first_index (line 446 you pointed). So the iterator starts with first_index. It increments current_index by each next() call. It stops when last_index is reached, after this point each next() call returns None.
  • iter_from(start: usize) - we set it to the provided start value. It increments current_index by each next() call. It stops when last_index is reached, after this point each next() call returns None.

And BTW, to be clear - only iter_from() is being used in ConcurrentMerkleTree for patching changelogs.

self.current = new_current;
element
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how does this achieve equivalence to what it is meant to replace?
Can you still iterate over the cyclic vec in a normal way?

Copy link
Contributor Author

@vadorovsky vadorovsky May 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This iterator struct makes it possible to do

for changelog_entry in changelog_entries.iter()

Which will iterate from the oldest (not necessarily 0!) to the newest element.

Copy link
Contributor Author

@vadorovsky vadorovsky May 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If your question is more about "how the heck all this code creates an iterator" - it was confusing for me as well some time ago. This was helpful

https://dev.to/wrongbyte/implementing-iterator-and-intoiterator-in-rust-3nio

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

///
/// Should iterate over elements 6..7 and 8..11 - 6 iterations.
#[test]
fn test_cyclic_bounded_vec_iter_reset() {
Copy link
Contributor

@ananas-block ananas-block May 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am missing a test for the edge case that we need to cover in the Merkle tree.
Consider an example:
// capacity 5 [0,1,2,3,4]
// start with index 2 and want to do 4 iterations

let elements = cyclic_bounded_vec.iter_from(2);
let expected_result = [&3, &4, &0, &1];
for i in 0..4 {
    assert_eq!(elements.next(), expected_result[i]);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can add it

Copy link
Contributor Author

@vadorovsky vadorovsky May 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your expected_result is not correct for such example though. The [0, 1, 2, 3, 4] vector didn't cycle over yet. Starting from 2 should result in [2, 3, 4]. 0 and 1 are elements which are older than than 2. Why would you want to iterate over changelog entries which predate your requested index? We want to use only newer changelogs, not older ones.

That's exactly why I'm keeping track of first_index and last_index - to figure out if I really need to jump to the beginning. These counters are making the algorithm aware of whether the vector already cycled or not.

Your example would be somehow correct if we did an explicit push of 0 and 1 elements afterwards and the vector would cycle over. But in these tests, I'm always incrementing pushed elements by 1 though, just to make it crystal clear which elements are newer and older.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a test for the input you mentioned, but with output which IMO is correct. LMK WDYT :)

Copy link
Contributor

@ananas-block ananas-block May 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair I didn't intend to put any meaning into the numbers itself.
And I got the two wrong it should have been included since we are starting from index two inclusive.

// capacity 5 [43,22,55,13,42]
// using first and last index these would be: first 2, last 1
// without first and last, self.current_index would be 1,
// and the user requests to start at index 2
let elements = cyclic_bounded_vec.iter_from(2);
let expected_result = [55,13,42, 43, 22];
for i in 0..4 {
    assert_eq!(elements.next(), expected_result[i]);
}

@vadorovsky vadorovsky force-pushed the vadorovsky/cyclic-bounded-vec-iter-from branch from 10fbd44 to 7aa9fd2 Compare May 9, 2024 09:14
@vadorovsky vadorovsky requested a review from sergeytimoshin as a code owner May 9, 2024 09:14
@vadorovsky vadorovsky force-pushed the vadorovsky/cyclic-bounded-vec-iter-from branch from 7aa9fd2 to 83ad050 Compare May 9, 2024 09:24
vadorovsky added 5 commits May 9, 2024 15:14
`ConcurrentMerkleTree` uses `CyclicBoundedVec` for storing changelog
entries. And it often needs to iterate over changelogs which are full
and started overwriting elements from the start.

Even though `CyclicBoundedVec` was doing good job in preserving the
bounds, before this change, it didn't provide any safe way of iterating
over elements with a custom index as a start.

That unfortunately resulted in attempts to solve that problem in the
code of `ConcurrentMerkleTree`, which eventually resulted in bugs.
The bugs were harder to spot, because mixing the logic of Merkle proof
patching with leaked internals of `CyclicBoundedVec` made it confusing
to read, debug and test.

To avoid such issues, introduce the `iter_from()` method which allows
iteration from the given index and simplify the `ConcurrentMerkleTree` to
just use that iterator.

Other changes:

* Stop returning `Result` in `push()`. `push()` can never panic, unless
  we have a bug in the implementation. Our tests should make sure that
  the structure is bulletproof.
@vadorovsky vadorovsky force-pushed the vadorovsky/cyclic-bounded-vec-iter-from branch from 07899a8 to ab79d1f Compare May 9, 2024 13:14
@ananas-block ananas-block merged commit e6dbd11 into main May 9, 2024
13 checks passed
@ananas-block ananas-block deleted the vadorovsky/cyclic-bounded-vec-iter-from branch May 9, 2024 14:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants