Clarification on architecture #39

cfilipov · 2023-10-20T20:49:49Z

Hello, I've been reading through the code and trying to get a better understanding of how things work.

Fireproof stores documents using prolly-trees, which are a type of B-Tree that allows for efficient replication as the data changes.
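For readers new to prolly trees, the property that helps replication is that chunk boundaries are chosen from the content of the entries, so the same set of keys produces the same chunks no matter what order it was written in. The following is a toy sketch of that idea only. The names `fnv1a`, `isBoundary`, and `chunkKeys` are hypothetical, the hash is a non-cryptographic stand-in, and none of this is taken from Fireproof's code.

```typescript
// Toy 32-bit FNV-1a hash. A real prolly tree would use a cryptographic hash.
function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h >>> 0;
}

// An entry closes its chunk when its hash hits a fixed pattern, so the
// boundary decision depends only on the entry, not on insertion history.
function isBoundary(key: string, avgChunkSize: number): boolean {
  return fnv1a(key) % avgChunkSize === 0;
}

// Split a set of keys into content-defined chunks (one tree level only).
function chunkKeys(keys: string[], avgChunkSize = 4): string[][] {
  const sorted = [...keys].sort();
  const chunks: string[][] = [];
  let current: string[] = [];
  for (const key of sorted) {
    current.push(key);
    if (isBoundary(key, avgChunkSize)) {
      chunks.push(current);
      current = [];
    }
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}
```

Two replicas that hold the same keys would compute identical chunks here, which is what lets them compare chunk hashes instead of individual entries.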

Additionally, the architecture doc describes Pail as a "Merkle clock causal event log", but Pail's GitHub description says it's a "DAG based key value store", though it does also contain a Merkle clock implementation.

My takeaway from the docs and blog posts is an architecture that centers on prolly trees to store documents for efficient key-based retrieval, along with a Merkle clock/event log where each event references a prolly tree root. However, my read on the code is that Fireproof extends Pail to add persistence and uses it as the main KV store for the documents. The prolly tree is also persisted but only used for query operations. The get and put functions bypass the prolly tree entirely and retrieve the document directly from the persisted Pail store.

All this is new to me, so maybe I'm not understanding the basics, but it seems Pail iterates over every shard to find the value of a key on get. I would have expected a get to work off the prolly tree. Overall, I'm surprised to see that prolly trees play a fairly small role in Fireproof.

I guess my question is: is my understanding above correct? And what are the trade-offs that led to using prolly trees only for the index and not for the KV store itself? I've been following Dolt, Mikeal's work, and other projects related to prolly trees, so I was excited to find this project but surprised to see this implementation detail.

jchris · 2023-12-11T16:29:06Z

Maintainer

Thanks for the detailed questions. You are correct. In the first-pass implementation I used prolly trees as the mutated structure to cache the application of the Merkle event log. It worked and provided logical guarantees similar to the current system, but overall it had a write amplification effect that seems unnecessary.

Because each event log is unique, and that is what is stored in Fireproof's core document repository, optimizing for lean writes is more important than optimizing for multi-client index deduplication and verification. For the core write workload, the Pail CRDT (which is basically a Pail k/v kept up to date with a Merkle event log) makes sense as it's optimized for efficient writes.
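The idea of a k/v kept up to date with a causal event log can be sketched roughly as follows. This is a toy illustration under stated assumptions, not Pail's or Fireproof's actual code: `ToyClockStore`, `PutEvent`, `toyHash`, and `eventId` are hypothetical names, the hash stands in for a real content identifier, and only a single writer is modeled (a real Merkle clock can have several heads when writers are concurrent).

```typescript
// Non-cryptographic stand-in for a content hash (assumption for this sketch).
function toyHash(s: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return (h >>> 0).toString(16);
}

// Each event records one write and links to the events it follows.
type PutEvent = { key: string; value: string; parents: string[] };

function eventId(e: PutEvent): string {
  return toyHash(JSON.stringify([e.key, e.value, e.parents]));
}

class ToyClockStore {
  private events = new Map<string, PutEvent>();
  private kv = new Map<string, string>();
  head: string[] = [];

  // A write appends one small event and updates one k/v entry.
  put(key: string, value: string): string {
    const event: PutEvent = { key, value, parents: [...this.head] };
    const id = eventId(event);
    this.events.set(id, event);
    this.kv.set(key, value);
    this.head = [id];
    return id;
  }

  // Reads are served from the computed k/v, not by replaying the log.
  get(key: string): string | undefined {
    return this.kv.get(key);
  }

  // Walk the log from the head back through parent links (newest first).
  history(): PutEvent[] {
    const out: PutEvent[] = [];
    const seen = new Set<string>();
    const stack = [...this.head];
    while (stack.length > 0) {
      const id = stack.pop() as string;
      if (seen.has(id)) continue;
      seen.add(id);
      const event = this.events.get(id);
      if (!event) continue;
      out.push(event);
      stack.push(...event.parents);
    }
    return out;
  }
}
```

In this sketch each `put` touches one event and one map entry, which is the lean-write shape described above, while the full history stays reachable from the head.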

The prolly trees are still in the codebase, used for indexes. This preserves the option of seeking index agreement across users, or of letting users compare index diffs in a useful way. So far this hasn't shown up as a big requirement; people are more interested in the content-verified nature of the data structures overall than in using them for safe index diffs. I'm looking forward to seeing demand for prolly-tree index features.

2 replies

cfilipov Jan 16, 2024
Author

Thanks for the response. From my limited understanding of Pail, it seems the underlying data structure would result in lower read performance vs a prolly tree, specifically more paging as the history of changes increases? I'm trying to understand the tradeoffs in memory use and disk IO with the two approaches.

jchris Jan 17, 2024
Maintainer

The Pail log will keep history in an efficient-append format, which can be work to load. However, the computed CRDT should have similar seqs as the prolly tree.

The Fireproof indexes are prolly trees, so if you need those performance characteristics, you could define an index.
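As a rough picture of what defining an index buys you, the sketch below builds a sorted secondary index from a map function and answers a range query with a binary search. It is a plain in-memory stand-in, not Fireproof's index API and not a prolly tree: `Doc`, `IndexEntry`, `buildIndex`, `lowerBound`, and `queryRange` are hypothetical names chosen for illustration.

```typescript
type Doc = { _id: string; [field: string]: unknown };
type IndexEntry = { key: number; id: string };

// Apply a map function to every document and keep the entries sorted by key.
function buildIndex(
  docs: Doc[],
  mapFn: (doc: Doc) => number | undefined
): IndexEntry[] {
  const entries: IndexEntry[] = [];
  for (const doc of docs) {
    const key = mapFn(doc);
    if (key !== undefined) entries.push({ key, id: doc._id });
  }
  return entries.sort((a, b) => a.key - b.key || a.id.localeCompare(b.id));
}

// Binary search for the first entry whose key is >= the given key.
function lowerBound(entries: IndexEntry[], key: number): number {
  let lo = 0;
  let hi = entries.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (entries[mid].key < key) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Return document ids whose indexed key lies in [start, end], in key order.
function queryRange(entries: IndexEntry[], start: number, end: number): string[] {
  const ids: string[] = [];
  for (let i = lowerBound(entries, start); i < entries.length; i++) {
    if (entries[i].key > end) break;
    ids.push(entries[i].id);
  }
  return ids;
}
```

For example, indexing documents by a numeric `age` field and calling `queryRange(index, 25, 30)` visits only the entries in that key range rather than scanning every document.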
