We never benchmarked Fendermint, nor have we made targeted attempts at optimising performance. We know there are inefficiencies:
- `check_tx` runs every transaction in full while holding a mutex; this used to be just a balance and nonce check, but since fendermint#234 (_Fixes to make the Go example work_) we execute transactions to be able to support follow-up `eth_call` on pending state. NB not every Ethereum backend supports queries on pending state; we only did it because Lilypad expected it.
- Queries on pending state have to take a mutex on the check state; they are read-only and could happily work on a clone, or even behind an `RwLock`, but the state we use is based on the FVM `StateTree`, which can't be cloned or shared between threads. We could change the way it works so we back it with a cloneable state instead of keeping an instance of the FVM executor around (see the sketch after this list).
- We could try to parallelise execution with optimistic execution and collision detection, like Block-STM, which was one of the past areas of research. With ABCI++ we could have access to the whole block at the same time, not transaction-by-transaction as we do now.
- There is some cloning going on in the RocksDB database component due to the use of optimistic concurrency control.
- For a regular FVM transaction a transfer between accounts is executed directly by the FVM, but if EVM accounts are involved then for safety reasons the EVM wasm actor is involved, so funds don't have a chance of being locked up forever.
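A minimal sketch of the clone/`RwLock` idea above, with a hypothetical `PendingState` type standing in for the real check state (the actual FVM executor and `StateTree` are exactly the parts that can't be cloned or shared today):

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for a cloneable, read-only view of the check (pending) state.
#[derive(Clone, Default)]
struct PendingState {
    // a persistent / copy-on-write state tree would live here
}

impl PendingState {
    fn query_balance(&self, _addr: &str) -> u64 {
        // read-only lookup against the pending state
        0
    }
}

/// `check_tx` only needs the write lock while it applies a transaction...
fn check_tx(state: &Arc<RwLock<PendingState>>) {
    let mut pending = state.write().unwrap();
    // ... execute the transaction against `pending` ...
    let _ = &mut *pending;
}

/// ...while an `eth_call` on "pending" can run on a read lock or a cheap clone,
/// so queries no longer serialise behind transaction checks.
fn eth_call_pending(state: &Arc<RwLock<PendingState>>) -> u64 {
    let snapshot = state.read().unwrap().clone();
    snapshot.query_balance("f410...")
}

fn main() {
    let state = Arc::new(RwLock::new(PendingState::default()));
    check_tx(&state);
    println!("{}", eth_call_pending(&state));
}
```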
If we are interested in the theoretical limits we can achieve, it would be interesting to benchmark the system with different configurations and compare the relative performance. I say relative because micro-benchmarks run e.g. on a laptop won't provide meaningful absolute values, but they would still help guide expectations and find hotspots.
I can think of the following scenarios:
1. Transfer between native FVM accounts using an in-memory blockstore. This is a theoretical limit with sequential execution; it does as little extra as possible.
2. Transfer between EVM accounts using an in-memory blockstore. This would show the relative overhead of going through wasm compared to doing just 1).
3. Transfer between native FVM accounts using the RocksDB blockstore. This would bring in IO and our RocksDB implementation of a blockstore to show the overhead over 1). At this point we can simulate the effect of batching done by blocks by flushing to disk every N transactions and see how that affects the throughput (see the benchmark sketch after this list).
4. The same as 3) but with EVM actors. This would show whether the RocksDB component has a similar effect regardless of native or wasm actors.
5. Executing `eth_call` through the API to simulate transfers. This would show the overhead of going through CometBFT. We can vary the concurrency bounds setting in the tower-abci service. Doing a call would isolate the effect of execution because it doesn't involve checking transactions and gossiping; it's just an execution, but with all the ETH<->Tendermint<->Fendermint<->FVM transformations added. In theory these can be executed in parallel by the system, but we haven't really dug into how the concurrency bounds work (there is only one value, applied to all 4 connections to CometBFT, which is strange; the default is 1). A load-generation sketch follows below.
6. Broadcasting transactions with `check_tx` disabled (validators can always put invalid transactions into blocks in Tendermint). This would show the effect of CometBFT gossiping and block creation over 5).
7. Broadcasting transactions with `check_tx` enabled. Here it could be interesting to see how splitting the transaction range between the nodes on the network has an effect, because then the `check_tx` load is shared between them, while they all do execution. The theoretical limit of throughput in blocks, if we can do `T` transactions and have `N` validators, is `T / (N+1) * N`: every validator has to execute `N+1` "batches" of transactions (1 for the checks submitted directly to it, `N` for the batches included in blocks from all validators), so the size of a "batch" is `T/(N+1)` and we have `N` of these in a block. A small worked example follows below.
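For scenarios 1) and 3), a criterion harness could look roughly like the sketch below. `Executor` is a stub standing in for however the benchmark ends up driving the FVM over an in-memory or RocksDB blockstore (not a real API), and `flush_every` is the knob for simulating per-block batching:

```rust
use criterion::{criterion_group, criterion_main, Criterion};

/// Stub standing in for a real FVM executor; only here to make the skeleton compile.
struct Executor {
    balance: u64,
}

impl Executor {
    fn new() -> Self {
        Executor { balance: 1_000_000 }
    }
    fn execute_transfer(&mut self) {
        // stand-in for sending a transfer message through the FVM
        self.balance = self.balance.wrapping_sub(1);
    }
    fn flush(&mut self) {
        // stand-in for flushing the state tree / blockstore to disk
    }
}

fn bench_transfers(c: &mut Criterion) {
    // Scenario 1/2: sequential execution with no flushing (in-memory blockstore).
    c.bench_function("transfer_no_flush", |b| {
        let mut exec = Executor::new();
        b.iter(|| exec.execute_transfer())
    });

    // Scenario 3/4: flush every N transactions to mimic the batching effect of blocks.
    let flush_every = 100u64;
    c.bench_function("transfer_flush_every_100", |b| {
        let mut exec = Executor::new();
        let mut n = 0u64;
        b.iter(|| {
            exec.execute_transfer();
            n += 1;
            if n % flush_every == 0 {
                exec.flush();
            }
        })
    });
}

criterion_group!(benches, bench_transfers);
criterion_main!(benches);
```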
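For scenario 5), the load could be generated with any Ethereum client library; a rough sketch with ethers-rs (the endpoint, address, and concurrency level below are made-up placeholders, not values from Fendermint):

```rust
use ethers::prelude::*;
use std::sync::Arc;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Placeholder endpoint for the ETH API facade.
    let provider = Arc::new(Provider::<Http>::try_from("http://127.0.0.1:8545")?);
    let to: Address = "0x0000000000000000000000000000000000000001".parse()?;

    // Fire a batch of concurrent eth_call requests simulating transfers.
    let concurrency = 50;
    let mut handles = Vec::new();
    for _ in 0..concurrency {
        let provider = provider.clone();
        handles.push(tokio::spawn(async move {
            let tx = TransactionRequest::new().to(to).value(1u64);
            // eth_call only executes; it does not go through checking or gossip.
            provider.call(&tx.into(), None).await
        }));
    }
    for h in handles {
        let _ = h.await?;
    }
    Ok(())
}
```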
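As a quick sanity check of the formula in 7), with purely illustrative numbers: if `N = 4` validators can each execute `T = 1000` transactions per block interval, the limit would be `1000 / (4+1) * 4 = 800` transactions per block.

```rust
/// Theoretical per-block throughput limit from 7): each validator executes N+1
/// batches (1 it checks directly, N that end up in the block), so with capacity
/// for T transactions the batch size is T/(N+1) and a block holds N batches.
fn throughput_limit(t: f64, n: f64) -> f64 {
    t / (n + 1.0) * n
}

fn main() {
    // Illustrative numbers only: T = 1000, N = 4 => 800.
    println!("{}", throughput_limit(1000.0, 4.0));
}
```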
Related to #164