
Optimizations to reduce the total number of keys in the network #163

Open
pipermerriam opened this issue Nov 6, 2020 · 7 comments


@pipermerriam (Member)

What was wrong?

A naive approach to storing the chain history in Alexandria looks like this (as of 2020/11/05):

  • 11 million headers
  • 11 million entries in the canonical header index
  • 11 million block bodies
    • a few million uncles
    • ~1 billion transactions
  • ~1 billion receipts
  • ~1 billion entries in the canonical transaction index

Depending on how you add these numbers up, you get something in the 2-3 billion range for the total number of objects that need to be stored in and retrievable from the network.

Content retrieval depends on the healthy spread of advertisements. We can put a reasonably firm upper bound of about 100 advertisements per second on what a node can send out, and even that is probably optimistic. This means a node that wishes to serve all of the content can send about 8.6 million advertisements per day (100 * 24 * 60 * 60). At that rate it would take roughly a full year of running 24/7 to broadcast everything.

We can also look at this from the perspective of the number of nodes in the network. If we assume we can bootstrap the network to somewhere between 1k and 100k nodes, we can examine what each node on the network will need to store.

How can it be fixed?

We need to optimize the storage scheme to reduce the total number of objects in the network.

Headers

For headers we may be able to group headers by an "epoch" boundary based on block height. We would pick an epoch size N and then group every N headers together for storage and retrieval.

The primary problem with this scheme is that it makes headers retrievable only if you know their block numbers. Since the block hash tends to be the canonical way to definitively reference a header, this would make serving eth_getBlockByHash difficult, requiring some sort of secondary index to look up block numbers by block hash.

It's likely that we need to store headers individually.

Canonical Header Index

The "Canonical Header Index" maps block number to block hash. Since this index is based on block numbers, it should be easy to group by epochs. Instead of storing a single entry for each block numbers, we would instead store an entry for each N blocks.

If we used an epoch size of 1000, then retrieving the index entry for height 4567 would mean fetching the epoch index covering blocks 4000-4999 and digging out entry 567 from that index.

This would let us reduce 11 million entries down to 11 thousand.
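A minimal sketch of that lookup, assuming an epoch size of 1000 and a hypothetical `fetch_epoch_index()` helper (none of these names are actual Alexandria APIs):

```python
# Minimal sketch of the epoch lookup described above, assuming an epoch size
# of 1000 and a hypothetical fetch_epoch_index() that retrieves one epoch's
# index from the network.
from typing import List

EPOCH_SIZE = 1000

def lookup_canonical_hash(block_number: int) -> bytes:
    epoch = block_number // EPOCH_SIZE    # e.g. 4567 -> epoch covering 4000-4999
    offset = block_number % EPOCH_SIZE    # e.g. 4567 -> entry 567 within the epoch
    epoch_index = fetch_epoch_index(epoch)
    return epoch_index[offset]            # the canonical block hash at that height

def fetch_epoch_index(epoch: int) -> List[bytes]:
    """Hypothetical: fetch (and verify) the list of EPOCH_SIZE block hashes
    that make up this epoch's slice of the canonical header index."""
    raise NotImplementedError
```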

Block Bodies

Block bodies house the uncles and transactions.

We do want uncles and transactions to be individually retrievable.

Simply by grouping these by block, we can reduce the 1 billion transactions down to just the 11 million blocks they are naturally grouped by. This also works well with eth_getTransactionByBlockHashAndIndex-style endpoints, which retrieve transactions by their location in the block.
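For illustration, a sketch of serving that style of request from a store keyed by block hash; `BlockBody` and `fetch_block_body()` are assumptions, not an existing API:

```python
# Sketch of serving an eth_getTransactionByBlockHashAndIndex-style request
# from a store keyed by block hash.
from typing import NamedTuple, Sequence

class BlockBody(NamedTuple):
    transactions: Sequence[bytes]   # encoded transactions, in block order
    uncles: Sequence[bytes]         # encoded uncle headers

def get_transaction_by_block_hash_and_index(block_hash: bytes, index: int) -> bytes:
    body = fetch_block_body(block_hash)   # one content lookup for the whole body
    return body.transactions[index]       # individual transactions come along for free

def fetch_block_body(block_hash: bytes) -> BlockBody:
    """Hypothetical: retrieve the block body from the network and verify it
    against the corresponding header."""
    raise NotImplementedError
```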

Receipts

Receipts naturally group themselves by blocks as well, letting us reduce the 1 billion receipts down to just the 11 million blocks they are part of.

Canonical Transaction Index

This one is unsolved. Transactions need to be retrievable by hash.

@pipermerriam (Member, Author) commented Nov 6, 2020

Ideas that don't seem to work for the canonical transaction index:

Binary Search from tiered indexes

The idea is to store multiple tiers of indexes, each of which allows you to perform a sort of binary search. We might have an index for each million blocks, and then successively smaller chunks. To find the entry for a given transaction hash you would scan the top level indices, and then work down to the lower level, more granular indices as you find the index containing the transaction you need.

This approach suffers from a few problems.

First, those hosting the top level indexes would be constantly slammed with requests since every lookup for a transaction would need to hit multiple of those indexes. This traffic pattern likely doesn't scale. (NOTE: Skip graphs don't suffer from this problem)

Second, the top level indexes would be incredibly expensive to build, and prohibitively expensive to verify. Both processes would require fetching all of the blocks under that index which for the top level ones would mean fetching millions of blocks.

Grouping by truncated hash

We could group them by the first few bytes of the hash and index all of the transactions that start with the same common prefix.

These indexes would be nearly impossible to build and verify, since doing so would require fetching every single transaction in order to find the ones that share a prefix. We could potentially mitigate this by building the indexes iteratively; however, that introduces complexity which would be nice to avoid, because it means these indexes would continually change and never be finalized, which violates some current invariants in how we treat content.

There is also the problem of picking the prefix size. Over time the total number of transactions will grow and we would need to adjust the prefix size, which would be disruptive and prohibitively difficult to migrate.
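As a concrete illustration of the scheme (and of why building it means touching every transaction), here is a minimal sketch; the 2-byte prefix and the input iterable are assumptions:

```python
# Minimal sketch of grouping CTI entries by a truncated hash prefix. Building
# these buckets requires iterating every historical transaction, which is
# exactly the cost described above.
from collections import defaultdict
from typing import Dict, Iterable, Tuple

PREFIX_BYTES = 2   # 65,536 buckets; picking this number is the disruptive part

def build_prefix_indexes(
    all_transactions: Iterable[Tuple[bytes, bytes, int]],
) -> Dict[bytes, Dict[bytes, Tuple[bytes, int]]]:
    """all_transactions yields (txn_hash, block_hash, txn_index) for every
    historical transaction."""
    buckets: Dict[bytes, Dict[bytes, Tuple[bytes, int]]] = defaultdict(dict)
    for txn_hash, block_hash, txn_index in all_transactions:
        buckets[txn_hash[:PREFIX_BYTES]][txn_hash] = (block_hash, txn_index)
    return buckets   # each bucket would be one unit of content / one advertisement
```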

@pipermerriam (Member, Author) commented Nov 6, 2020

Exploring whether it is feasible to do the canonical transaction index without any reduction.

At ~1 billion transactions and a network size between 1k and 100k nodes, with the following replication factors (R) we would expect each node in the network to store:

|      | 1k nodes | 10k nodes | 50k nodes | 100k nodes |
|------|----------|-----------|-----------|------------|
| R=1  | 1M       | 100k      | 20k       | 10k        |
| R=5  | 5M       | 500k      | 100k      | 50k        |
| R=10 | 10M      | 1M        | 200k      | 100k       |

With each advertisement being about 128 bytes we get roughly the following storage sizes

| count | size    |
|-------|---------|
| 10k   | ~1.2 MB |
| 100k  | ~12 MB  |
| 1M    | ~120 MB |
| 10M   | ~1.2 GB |
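A back-of-the-envelope script reproducing the two tables above, assuming ~1 billion CTI entries and ~128 bytes per advertisement (the figures are rough by design):

```python
# Rough per-node advertisement counts and storage sizes for the canonical
# transaction index, across network sizes and replication factors.
AD_SIZE_BYTES = 128
TOTAL_ENTRIES = 1_000_000_000

for nodes in (1_000, 10_000, 50_000, 100_000):
    for replication in (1, 5, 10):
        per_node = TOTAL_ENTRIES * replication // nodes
        size_mb = per_node * AD_SIZE_BYTES / 1_000_000
        print(f"nodes={nodes:>7,} R={replication:>2}: "
              f"{per_node:>12,} entries, ~{size_mb:,.0f} MB")
```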

In order to stick to the desired <1GB per-node storage requirement, I believe we can establish an upper bound of about 1M advertisements that each node can store. Storing 1 billion transaction index entries with a healthy replication factor of 10 or more would then mean the network could only be healthy with over 10k nodes, which is feasible but far from ideal.

We can also set an upper bound on the total number of advertisements that it is feasible for a node to broadcast in a given amount of time. At 100/sec, which is an over-estimate, we can broadcast:

| time       | amount |
|------------|--------|
| 1 minute   | 6k     |
| 10 minutes | 60k    |
| 30 minutes | 180k   |
| 1 hour     | 360k   |

Ideally, a node coming back online should be able to publish its full database of advertisements in less than an hour. This suggests that we need to aim for an upper bound of around 200k items per node, and ideally we come in much lower than that.
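For reference, a tiny check of the republish times implied by that table, assuming the (optimistic) ~100 advertisements/second rate from earlier:

```python
# How long does re-broadcasting a node's whole advertisement database take
# at ~100 advertisements per second?
ADS_PER_SECOND = 100

for count in (60_000, 180_000, 200_000, 360_000, 1_000_000):
    minutes = count / ADS_PER_SECOND / 60
    print(f"{count:>9,} advertisements -> ~{minutes:,.0f} minutes to republish")
```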

@lithp (Contributor) commented Nov 12, 2020

Brainstorming some ideas:

  1. Merry-go-round index population. Building a canonical transaction index (CTI) is difficult because it requires scanning through every block body. If you don't have every block body you must fetch the missing ones from the network. Across all nodes this amounts to a lot of duplicated work. The network could de-duplicate this work by having the nodes that already hold block bodies scan through them and gossip the results around.

    • Gossip is more efficient because looking up a block body requires one recursive lookup per CTI-serving node. Gossip allows the block-body-serving nodes to scan through block bodies at their leisure, putting an overhead on the network that is constant in the number of CTI-serving nodes. Gaining this benefit does not require any kind of coordination mechanism; each block-body-serving node can scan through at its own frequency and phase. There's also no need for any intermediate nodes to become involved: this works if each CTI-serving node finds a couple of block-body-serving nodes and asks them to start gossiping to it.
  2. Index siphoning. Since we really don't want CTI-serving nodes to scan through all blocks themselves, they might instead try to bootstrap off of existing CTI-serving nodes. Don't ask me how the first ones are built, but once they exist they have exactly the information the next generation of CTI-serving nodes needs. We could support the equivalent of a DNS zone transfer (a rough message sketch follows this list): you reach out to a node, tell it the prefix you're interested in, and it gives you everything it has under that prefix. Some care will have to be put into preventing this from becoming a DOS vector, but worst case this devolves into the same kind of merry-go-round sync described above: you subscribe to updates from a CTI-serving node and populate your index from the entries it gives you.
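Purely hypothetical message shapes for the zone-transfer idea in (2); these are not part of any existing wire protocol:

```python
# A new CTI-serving node asks an existing one for every entry under a prefix.
from typing import NamedTuple, Tuple

class ZoneTransferRequest(NamedTuple):
    prefix: bytes    # e.g. b"\x1f" -> "send me every CTI entry starting with 0x1f"

class ZoneTransferResponse(NamedTuple):
    prefix: bytes
    entries: Tuple[Tuple[bytes, bytes, int], ...]   # (txn_hash, block_hash, txn_index)
    more: bool       # True if further batches follow (for DOS / size limiting)
```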

@pipermerriam (Member, Author)

@lithp I may be misunderstanding your post, but I don't see any part that actually reduces the overall cost to the network of hosting the CTI.

Here's the best plan I have so far so that maybe we can anchor to something concrete.

For any block B there will be some node that stores the block body, aka the uncle list and transaction list for that block.

Within the block there's some set of transactions T_1, T_2, ..., T_N. The person hosting the block would publish advertisements both for the block body itself and for one CTI entry for every transaction in the block. The block itself is one unit of content, and the CTI entries are each their own unit of content. A CTI entry is txn_hash -> (block_hash, txn_idx, inclusion_proof). The inclusion_proof is against the Header.transactions_trie, so that once the user fetches the header and verifies the proof is valid they can know that the transaction is indeed part of the transactions for that block.
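To make the shape of this concrete, here is a minimal sketch of a CTI entry and the fetch-by-hash flow built on it; the `fetch_*` and proof helpers are placeholders, not an existing Alexandria API, and the real proof would be a Merkle proof against `Header.transactions_trie`:

```python
from typing import NamedTuple, Tuple

class CTIEntry(NamedTuple):
    block_hash: bytes
    txn_index: int
    inclusion_proof: Tuple[bytes, ...]   # trie nodes for Header.transactions_trie

def get_transaction_by_hash(txn_hash: bytes) -> bytes:
    entry = fetch_cti_entry(txn_hash)            # 1. fetch the CTI entry
    header = fetch_header(entry.block_hash)      # 2. fetch the header it points at
    if not verify_inclusion_proof(
        header.transactions_root,                # 3. check the proof against the
        entry.txn_index,                         #    header's transaction trie root
        txn_hash,
        entry.inclusion_proof,
    ):
        raise ValueError("invalid CTI entry")
    body = fetch_block_body(entry.block_hash)    # 4. fetch the body (verified against
    return body.transactions[entry.txn_index]    #    the header) for the txn itself

# Placeholders for network retrieval and Merkle-Patricia proof verification.
def fetch_cti_entry(txn_hash): raise NotImplementedError
def fetch_header(block_hash): raise NotImplementedError
def fetch_block_body(block_hash): raise NotImplementedError
def verify_inclusion_proof(root, txn_index, txn_hash, proof): raise NotImplementedError
```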

Things I understand about this setup:

  1. The total number of CTI entries is O(N) in the total number of historical transactions.
  2. Assuming we can get 10k nodes in the network, each node will end up storing about 120MB worth of advertisements, with the vast majority of these being CTI entries.
  3. Verifying a CTI entry involves fetching the corresponding header and validating the inclusion proof against the Header.transactions_trie.
  4. Fetching a transaction by hash involves fetching the CTI entry, then fetching the header to verify the CTI entry, then fetching the block body (and verifying it against the header) to retrieve the actual transaction.

This setup is possibly viable, but it means that as much as 99% of the content in the network will be CTI entries and that sucks.

@lithp (Contributor) commented Nov 16, 2020

I may be mis-understanding your post, but I don't see any part that actually reduces the overall cost of the network hosting the CTI.

Yes, sorry, I started out trying to answer this question, but I think I forgot what the original question was while writing it up. Advertising and storing a billion entries is not going to be fun.

The idea behind both suggestions was to prevent anyone from needing to advertise individual CTI entries. Each node that decides it wants to serve CTI entries picks the prefix it wants to serve; it then only has to make a single advertisement, for that prefix. The problem then becomes how it builds its index, and I was trying to brainstorm ways to solve that problem. It seems like it might be easier than the problem of managing 1B advertisements.

@pipermerriam (Member, Author)

it seems like it might be easier than the problem of managing 1B advertisements.

Yeah, I'm leaning that way more and more. Advertising 1B things SUCKS; if we can trim that down, we immediately drop two orders of magnitude to only being in the tens of millions.

@pipermerriam (Member, Author) commented Nov 16, 2020

A skip graph over the transaction hashes would allow ordered traversal of the transaction keys for building the index.... but that requires the network to maintain the skip graph.... which would have 1B entries in it.... which would require every one of those entries to be broadcast at some point... but it would not require those entries to be in the advertisements table....

I think that's an improvement over just doing it with advertisements, since it reduces the advertisement size and reduces total storage needs, because skip graph entries don't have to be constantly renewed like advertisements.

I think some complexity still shows back up in managing the skip graph. Ideally we want to be able to reject "junk" entries that don't represent a canonical transaction hash. That means that for each skip graph entry you'd need to be able to validate that the transaction hash is indeed part of the canonical chain... so we'd need to do something like bundle the block number and transaction index into the skip graph entry so that those values can be validated... which means the skip graph alone would be the canonical transaction index and we wouldn't need to use the advertisement system for that at all.
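For illustration only, a hedged sketch of what such a validatable skip graph entry might look like; the shape and the helper functions are assumptions, not a spec:

```python
# Bundle enough location data into each skip graph entry that "junk" entries
# can be rejected against the canonical chain.
from typing import NamedTuple

class SkipGraphEntry(NamedTuple):
    txn_hash: bytes      # the key the skip graph is ordered by
    block_number: int    # where to find the canonical header
    txn_index: int       # position within that block's transaction list

def validate_entry(entry: SkipGraphEntry) -> bool:
    """An entry is valid only if the transaction at (block_number, txn_index)
    in the canonical chain actually hashes to entry.txn_hash."""
    header = fetch_canonical_header(entry.block_number)
    body = fetch_block_body(header.block_hash)
    return keccak(body.transactions[entry.txn_index]) == entry.txn_hash

# Placeholders for canonical-chain retrieval and hashing.
def fetch_canonical_header(block_number): raise NotImplementedError
def fetch_block_body(block_hash): raise NotImplementedError
def keccak(data): raise NotImplementedError
```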
