
Meta issue for potential improvements to SSZ partial proofs #143

Open
pipermerriam opened this issue Oct 30, 2020 · 0 comments

pipermerriam commented Oct 30, 2020

What was wrong?

Partial SSZ proofs are being implemented in #142 (TODO, more PRs). There are a few areas that could benefit from optimization; this issue tracks them.

How can it be improved?

Use bitarray library for tree paths

The bitarray library looks like a solid, fast implementation of bit arrays. Using it might give us a cleaner API and possibly some performance gains as well. A benchmarking suite should be in place to measure performance before tackling this.
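As a rough illustration, here is a minimal sketch of what tree paths could look like on top of bitarray; the left/right-bit convention and the helper below are assumptions for the example, not the existing API.

```python
from bitarray import bitarray

# Illustrative convention: a path is a sequence of bits from the root,
# where False means "descend left" and True means "descend right".
path = bitarray([False, True, True])

# Descending another level is a concatenation rather than building a new
# tuple of Python bools.
left_child_path = path + bitarray([False])


def common_prefix(a: bitarray, b: bitarray) -> bitarray:
    # Hypothetical helper: the shared prefix of two paths, e.g. for
    # grouping related nodes during proof construction.
    length = 0
    for bit_a, bit_b in zip(a, b):
        if bit_a != bit_b:
            break
        length += 1
    return a[:length]
```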

Better stream decoding of proofs

Currently the stream decoding can exit cleanly on a condition that should raise an exception. ProofElement.deserialize should be adjusted so that callers can differentiate between:

  • the stream is empty (exit without error)
  • the stream is not empty, but becomes empty partway through parsing an element (exit with an error)

A relatively simple initial idea is to wrap the stream: IO[bytes] in something that lets callers peek at whether the stream is empty.
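A minimal sketch of such a wrapper, assuming all we need is an emptiness check plus an exact-size read (the names here are hypothetical):

```python
from typing import IO


class PeekableStream:
    # Hypothetical wrapper around an IO[bytes] stream that distinguishes
    # "stream is already empty" from "stream ran out mid-element".

    def __init__(self, stream: IO[bytes]) -> None:
        self._stream = stream
        self._buffer = b""

    def is_empty(self) -> bool:
        # Peek one byte ahead; keep it buffered for the next read.
        if not self._buffer:
            self._buffer = self._stream.read(1)
        return self._buffer == b""

    def read_exactly(self, size: int) -> bytes:
        data, self._buffer = self._buffer[:size], self._buffer[size:]
        if len(data) < size:
            data += self._stream.read(size - len(data))
        if len(data) < size:
            # The stream ended partway through an element: this should be an error.
            raise ValueError(
                f"Unexpected end of stream: wanted {size} bytes, got {len(data)}"
            )
        return data
```

ProofElement.deserialize could then return cleanly when is_empty() is true and let read_exactly() raise when the stream is truncated mid-element.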

Proof.hash_tree_root is a foot gun 🦶 🔫

The Proof.hash_tree_root property doesn't necessarily reflect the actual merkle root. The API is currently a foot-gun since it would be easy to forget to validate the proof object before trusting the root. We should address this.
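One possible direction, purely a sketch (the class layout below is an assumption, not the code from #142), is to make the root accessor refuse to answer for an unvalidated proof:

```python
class Proof:
    # Illustrative shape only; the real Proof holds proof elements and has
    # a different constructor.

    def __init__(self, claimed_root: bytes) -> None:
        self._claimed_root = claimed_root
        self._validated = False

    def validate(self) -> None:
        # Placeholder: recompute the merkle root from the proof elements and
        # raise if it doesn't match self._claimed_root.
        self._validated = True

    def get_hash_tree_root(self) -> bytes:
        if not self._validated:
            raise ValueError("Proof has not been validated; call validate() first")
        return self._claimed_root
```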

Cost of ProofTree generation

We should look into the actual cost of tree generation. The current algorithm is recursive, which incurs the cost of many call frames. The first step is to benchmark this, after which we can optimize the algorithm.
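A micro-benchmark could be as simple as the following shape; the recursive builder below is just a stand-in for the real ProofTree construction:

```python
import timeit


def build_tree_recursive(depth: int) -> int:
    # Stand-in for the recursive construction being measured; the real code
    # would build ProofTree nodes instead of counting leaves.
    if depth == 0:
        return 1
    return build_tree_recursive(depth - 1) + build_tree_recursive(depth - 1)


duration = timeit.timeit(lambda: build_tree_recursive(16), number=10)
print(f"10 constructions of a depth-16 tree: {duration:.3f}s")
```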

Proof.serialize doesn't need to generate the tree

We should adjust Proof.serialize to avoid generating the ProofTree, since it should be possible to serialize the proof without building the tree.

Optimizations for chunking and computation of the hash_tree_root

Our "chunking" approach could probably be optimized by splitting up two different use cases.

We could have a highly optimized compute_hash_tree_root function that operates on a stream and merklizes as it goes. The current approach of computing all chunks and then merklizing has an O(n) memory footprint. This should be reducible to something like O(log(n)) if we continually group and merklize as we process the stream.
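For illustration, a minimal sketch of that streaming approach, assuming a power-of-two number of 32-byte chunks and sha256, and ignoring the SSZ padding and length-mixing rules:

```python
import hashlib
from typing import Iterable, List, Tuple


def hash_pair(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(left + right).digest()


def streaming_hash_tree_root(chunks: Iterable[bytes]) -> bytes:
    # Stack of (height, subtree_root).  Adjacent subtrees of equal height are
    # merged immediately, so at most O(log(n)) roots are held at any time.
    stack: List[Tuple[int, bytes]] = []
    for chunk in chunks:
        node = (0, chunk)
        while stack and stack[-1][0] == node[0]:
            height, left = stack.pop()
            node = (height + 1, hash_pair(left, node[1]))
        stack.append(node)
    # With a power-of-two chunk count exactly one subtree root remains.
    assert len(stack) == 1
    return stack[0][1]


# Example: the root of four 32-byte chunks, computed one chunk at a time.
root = streaming_hash_tree_root(bytes([i]) * 32 for i in range(4))
```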

Similarly, we can optimize the creation of partial proofs. Currently we compute a full proof and then trim it down to a partial proof. Instead, we should be able to use the streaming approach above, chunking and merklizing as we go, so that we never create the full proof and instead build the partial proof directly from the raw data. The streaming approach would handle chunking and merklizing the parts of the data that are not used by the partial proof.

Compression

We aren't using any compression at this stage. We should try using snappy to compress serialized proofs and see if we can squeeze out some extra bytes.
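A quick experiment with the python-snappy package would look something like this; serialized_proof is a placeholder for the output of Proof.serialize:

```python
import snappy

# Placeholder bytes standing in for a real serialized proof.
serialized_proof = bytes(1024)

compressed = snappy.compress(serialized_proof)
print(f"raw={len(serialized_proof)} bytes, snappy={len(compressed)} bytes")

# Round-trip check.
assert snappy.decompress(compressed) == serialized_proof
```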

Better benchmarking

We should update the benchmarking test suite to also test with "real" data. This will be especially important if we implement compression: currently we're using a lot of random data, which won't compress well, whereas real data will have more redundancy that actually compresses.
