-
Notifications
You must be signed in to change notification settings - Fork 266
Chunking Lists
Warning: This is out of date
- Chunk large lists into smaller lists that can be shared between/after minor "mutations".
- Adding/removing an element should only affect 1 chunk (2 if adjacent chunk is affected)
Given a list of refs (primitives are treated the same. We just care about the refs here):
[sha1-0, sha1-1, ..., sha1-n]
we run a rolling hash on the Refs and split when that hash matches a given pattern, probably using 111111
(64 - 1) which will lead to an average chunk size of 64 elements.
We then create a compound list, similarly to how have a compoundBlob
:
[ref0, len0, ref1, len1, ... refk, lenk]
where ref0
points to a list of the elements in the first chunk.
The representation is also a list so we split it again until we end up with a list that isn't chunked.
Same as types.List
today with the given big-oh notation
Len() uint64 // O(1)
Empty() bool // O(1)
Get(idx uint64) Value // O(log64(n))
Slice(start uint64, end uint64) List // O(log64(n))
Set(idx uint64, v Value) List // O(log64(n))
Append(v ...Value) List // O(log64(n) + k), k number of appended values
Insert(idx uint64, v ...Value) List
Remove(start uint64, end uint64) List
The length is stored in the data structure for compound lists.
At every level we have about 64 elements. We do a binary search over those to find the right sub list. We might need to continue the search until we found a leaf. O(log2(64) * log64(n)) => O(log64(n))
Finding the right chunks is O(log64(n))
. I believe that we only need to rechunk the start chunk and the adjacent chunk. We can just reuse the rest of the chunks until the end chunk. Proof?
Finding the right chunk is O(log64(n))
. Changing the value in there requires that we rechunk that chunk, possibly reading the next chunk and then propagating this up to the parent compound list.
It is clear that we will need to update the chunk after the current chun at every level but I believe that we only need to update 2 chunks at every level. I need to find a proof or a convincing argument for this.
Find the last chunk and append to that. Rechunk last chunk and new content
Find correct chunk. Mutate that and rechunk from there and the adjacent one. Do the same with the parents recursivey. Proof is missing here too.