What was wrong?
Partial SSZ proofs are being implemented in #142 (TODO, more PRs). There are a few areas that could benefit from optimization. This issue tracks those.
What can be improved?
Use bitarray library for tree paths
The bitarray library looks like a solid, fast implementation of bit arrays. The API might be cleaner, and we might get some performance benefits as well. A benchmarking suite should be in place to measure performance before tackling this.
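As a rough sketch of what the switch could look like (the tuple-of-booleans representation below is an assumption about the current path format, not the actual code):

```python
from bitarray import bitarray

# Assumed current representation: a root-to-leaf path as a tuple of booleans
# (False = left child, True = right child).
path_as_tuple = (False, True, True, False)

# Equivalent bitarray: one bit per step instead of one Python object per step.
path_as_bits = bitarray(path_as_tuple)

assert path_as_bits.to01() == "0110"
assert path_as_bits[:2] == bitarray("01")  # prefix/slicing checks stay cheap
```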
Better stream decoding of proofs
Currently the stream decoding can exit cleanly on a condition that should raise an exception. ProofElement.deserialize should be adjusted to allow callers to differentiate between the following cases:
the stream is empty (exit without error)
the stream is not empty, but becomes empty partway through parsing the element (exit with an error)
An initial, relatively simple idea would be to wrap the stream: IO[bytes] in something that allows callers to peek and see whether the stream is empty.
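A minimal sketch of that wrapper (the PeekableStream name and API are made up for illustration, not part of the current code):

```python
import io
from typing import IO, Optional


class PeekableStream:
    """Wrap an IO[bytes] so callers can check for end-of-stream without
    consuming data (illustrative sketch only)."""

    def __init__(self, stream: IO[bytes]) -> None:
        self._stream = stream
        self._buffer: Optional[bytes] = None

    def is_empty(self) -> bool:
        # Peek one byte ahead and keep it buffered for the next read.
        if self._buffer:
            return False
        self._buffer = self._stream.read(1)
        return not self._buffer

    def read(self, size: int = -1) -> bytes:
        prefix = self._buffer or b""
        self._buffer = None
        if size < 0:
            return prefix + self._stream.read()
        return prefix + self._stream.read(max(size - len(prefix), 0))


# ProofElement.deserialize could then check is_empty() up front and treat a
# stream that runs dry mid-element as an error rather than a clean exit.
stream = PeekableStream(io.BytesIO(b"\x01\x02"))
assert not stream.is_empty()
assert stream.read(2) == b"\x01\x02"
assert stream.is_empty()
```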
Proof.hash_tree_root is a foot gun 🦶 🔫
The Proof.hash_tree_root property doesn't necessarily reflect the actual Merkle root. This API is currently a foot-gun since it would be easy to forget to validate the proof object. We should address this.
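One possible direction, sketched with hypothetical names rather than the real Proof API: only expose the root through a call that forces validation.

```python
class SafeProof:  # illustrative stand-in, not the real Proof class
    def __init__(self, declared_root: bytes, elements) -> None:
        self.declared_root = declared_root
        self.elements = elements

    def _compute_root(self) -> bytes:
        """Placeholder for re-merklizing self.elements."""
        raise NotImplementedError

    def get_hash_tree_root(self) -> bytes:
        # Validation is forced on access, so callers can't accidentally read
        # a root that doesn't match the proof's contents.
        computed = self._compute_root()
        if computed != self.declared_root:
            raise ValueError("proof does not merklize to the declared root")
        return computed
```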
Cost of ProofTree generation
We should look into the actual cost of tree generation. The current algorithm is recursive, which incurs the cost of many call frames. The first step is to benchmark this; after that we can optimize the algorithm.
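As a starting point for the benchmarking, something along these lines (compute_proof_tree is a stand-in for the real entry point, not an existing function):

```python
import timeit


def compute_proof_tree(data: bytes):
    """Stand-in for the real (recursive) tree-generation routine."""
    ...


sample = bytes(range(256)) * 64  # ~16 KiB of placeholder input data

elapsed = timeit.timeit(lambda: compute_proof_tree(sample), number=100)
print(f"100 tree generations took {elapsed:.3f}s")
```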
Proof.serialize doesn't need to generate the tree
We should adjust Proof.serialize to avoid generating the ProofTree, since it should be possible to serialize the proof without building the tree.
Optimizations for chunking and computation of the hash_tree_root
Our "chunking" approach could probably be optimized by splitting up two different use cases.
We could have a highly optimized compute_hash_tree_root function that operates on a stream and merklizes as it goes. The current approach of computing all chunks and then merklizing has an O(N) memory footprint. This should be reducible to something like O(log(N)) if we continually group and merklize as we process the stream.
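A rough sketch of the streaming idea, keeping only one pending subtree root per level (simplified: it assumes the chunk count is a power of two and ignores SSZ padding/limit rules, which the real implementation must handle):

```python
import hashlib
import io
from typing import IO, List, Optional

CHUNK_SIZE = 32


def _hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(left + right).digest()


def compute_hash_tree_root_streaming(stream: IO[bytes], num_chunks: int) -> bytes:
    # pending[i] holds the root of a completed subtree covering 2**i chunks,
    # or None if nothing is waiting at that level. Memory use is one entry
    # per level, i.e. O(log N), instead of holding all N chunks at once.
    pending: List[Optional[bytes]] = []

    for _ in range(num_chunks):
        node = stream.read(CHUNK_SIZE).ljust(CHUNK_SIZE, b"\x00")
        level = 0
        # Merge upwards as long as a sibling subtree is waiting at this level.
        while level < len(pending) and pending[level] is not None:
            node = _hash(pending[level], node)
            pending[level] = None
            level += 1
        if level == len(pending):
            pending.append(node)
        else:
            pending[level] = node

    # With a power-of-two chunk count, only the top level remains populated.
    return pending[-1]


# Usage: four 32-byte chunks streamed from an in-memory buffer.
root = compute_hash_tree_root_streaming(io.BytesIO(bytes(128)), num_chunks=4)
```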
Similarly, we can optimize the creation of partial proofs. Currently, we compute a full proof and then trim it down to a partial proof. We should instead be able to use the streaming approach above, chunking and merklizing as we go, to reduce the overall computation needed. This avoids ever creating the full proof: the partial proof is built directly from the raw data, and the parts of the data not used by the partial proof are only streamed, chunked, and merklized into the hashes the proof needs.
Compression
We aren't using any compression at this stage. We should try using snappy to compress serialized proofs and see if we can squeeze out some extra bytes.
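A quick sketch with python-snappy (the serialized_proof bytes below are placeholder data standing in for Proof.serialize() output):

```python
import snappy  # python-snappy

# Placeholder bytes standing in for the output of Proof.serialize().
serialized_proof = b"\x00" * 512 + bytes(range(64)) * 8

compressed = snappy.compress(serialized_proof)
assert snappy.decompress(compressed) == serialized_proof

print(f"{len(serialized_proof)} -> {len(compressed)} bytes")
```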
Better benchmarking
We should update the benchmarking test suite to also test using "real" data. This will be especially important if we implement compression, because currently we're using a lot of random data, which isn't going to compress well. Real data has more structure that actually compresses.
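For example, with pytest-benchmark (the fixture path and the serialize_proof helper are placeholders, not the real test setup):

```python
import pytest


def serialize_proof(data: bytes) -> bytes:
    """Stand-in for the real proof serialization under test."""
    return data


@pytest.fixture
def real_block_data() -> bytes:
    # Load a captured real-world SSZ object from a fixture file instead of
    # generating random bytes (path and format are placeholders).
    with open("tests/fixtures/real_block.ssz", "rb") as f:
        return f.read()


def test_serialize_real_data(benchmark, real_block_data):
    benchmark(serialize_proof, real_block_data)
```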