
WIP: Replacing BFT with DAG-BFT for better performance #13

Open
Rock-Lee-520 opened this issue May 25, 2023 · 0 comments

Background:

DAG-BFT is a novel consensus algorithm that combines BFT and DAG technologies to achieve better performance. Unlike traditional BFT algorithms, DAG-BFT uses a directed acyclic graph (DAG) to represent the dependency relationships between transactions. This allows nodes to process transactions in parallel and increases system throughput while ensuring security.

The core idea of DAG-BFT is to identify transactions by their hash values and organize them into a directed acyclic graph. In this graph, each transaction corresponds to a vertex, and transactions with equal hash values are treated as the same vertex. When a node receives transactions from other nodes, it must verify their validity before linking them into its local DAG.
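For intuition, here is a minimal Python sketch of that structure. All names (`Vertex`, `LocalDag`, `link`) are illustrative and not part of any real protocol, and the validity check is a placeholder for the signature and dependency verification a real node would perform:

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class Vertex:
    tx: bytes        # transaction payload
    parents: tuple   # digests of vertices this one depends on

    @property
    def digest(self) -> str:
        # Transactions with equal hash values map to the same vertex.
        return hashlib.sha256(self.tx).hexdigest()

class LocalDag:
    def __init__(self) -> None:
        self.vertices: dict[str, Vertex] = {}

    def verify(self, v: Vertex) -> bool:
        # Placeholder check: a real node would also verify signatures
        # and application-level validity here.
        return all(p in self.vertices for p in v.parents)

    def link(self, v: Vertex) -> bool:
        # Verify before linking the vertex into the local DAG.
        if not self.verify(v):
            return False
        self.vertices.setdefault(v.digest, v)  # equal hash: same vertex
        return True

dag = LocalDag()
a = Vertex(tx=b"tx-a", parents=())
assert dag.link(a)
b = Vertex(tx=b"tx-b", parents=(a.digest,))
assert dag.link(b)
```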

Because DAG-BFT uses a DAG model, communication between nodes becomes more decentralized and efficient. Nodes only need to communicate with their neighboring nodes, rather than all other nodes as in traditional BFT algorithms. This means that DAG-BFT can support larger networks while also increasing system fault tolerance.

Overall, DAG-BFT is a very promising consensus algorithm that can increase system throughput and fault tolerance while ensuring security by combining DAG and BFT technologies.

The latest research on DAG-BFT:

The earliest idea of DAG-BFT was inspired by the paper All You Need is DAG. Its core concept is to separate the network communication layer from the consensus logic in order to achieve consensus on an infinitely growing sequence of transactions.

Narwhal and Tusk is a consensus system leveraging directed acyclic graphs (DAGs). DAG-based consensus has been developed over the last 30 years; the closest theoretical ancestor of Narwhal & Tusk is DAG-Rider. Narwhal and Tusk started as a research prototype at Facebook Novi.

The main takeaway from the Narwhal & Tusk paper is that in scalable, high-throughput consensus systems, data dissemination should be reliable and decoupled from the ordering mechanism.
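A rough sketch of that decoupling, with entirely hypothetical names: workers disseminate raw transaction batches, while the ordering layer only ever sees fixed-size digests, so the cost of consensus is independent of batch size.

```python
import hashlib

class Worker:
    """Disseminates raw transaction batches; knows nothing about consensus."""
    def __init__(self) -> None:
        self.store: dict[str, list[bytes]] = {}

    def broadcast_batch(self, batch: list[bytes]) -> str:
        # In a real system the batch is sent to peer workers at network
        # speed and acknowledged; only the digest is handed upward.
        digest = hashlib.sha256(b"".join(batch)).hexdigest()
        self.store[digest] = batch
        return digest

class OrderingLayer:
    """Sequences fixed-size digests; never touches transaction payloads."""
    def __init__(self) -> None:
        self.sequence: list[str] = []

    def order(self, digest: str) -> None:
        self.sequence.append(digest)

worker, ordering = Worker(), OrderingLayer()
d = worker.broadcast_batch([b"tx1", b"tx2", b"tx3"])
ordering.order(d)  # consensus cost is independent of batch size
```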

The Narwhal system implements a highly scalable and efficient DAG construction, which, together with Tusk/Bullshark, sequences 100k+ transactions per second (tps) in a geo-replicated environment while keeping latency below 3s. For comparison, Tendermint's BFT sequences less than 1k tps in the same environment.

Bullshark replaces Tusk for even greater performance; a simplified version of Bullshark is used in the Sui network today.

Compared to previous partially synchronous BFT consensus protocols (e.g., PBFT, SBFT, Tendermint/HotStuff), Bullshark has (arguably) the simplest implementation (200 lines of code on top of the Narwhal system).

The figure below illustrates the latency and throughput of Bullshark, Tusk, and an improved version of HotStuff for varying numbers of validators in the failure-free case:

[Figure: latency vs. throughput of Bullshark, Tusk, and HotStuff, varying committee size, no faults]

As the figure shows, Bullshark strikes a balance between the high throughput of Tusk and the low latency of HotStuff. Its throughput is 2x higher than HotStuff's, reaching 110,000 tx/s with a committee of 10 and 130,000 tx/s with a committee of 50, while its latency is 33% lower than Tusk's. This is because Bullshark commits in two rounds on the DAG while Tusk requires three. Both Bullshark and Tusk scale better than HotStuff as the committee size grows, thanks to the underlying Narwhal system.
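For intuition, here is a loose sketch of a round-based anchor commit rule in the spirit of Bullshark. The data layout and names are hypothetical, and the f+1 vote threshold is a simplification of the paper's high-level description; see the Bullshark paper for the exact rule.

```python
def anchor_committed(anchor_digest: str,
                     next_round: dict[str, set[str]],
                     f: int) -> bool:
    """next_round maps each round-(r+1) vertex digest to the set of
    digests of its round-r parents; f is the fault-tolerance bound."""
    votes = sum(1 for parents in next_round.values()
                if anchor_digest in parents)
    # Bullshark can decide with support edges from the very next round,
    # whereas Tusk's rule spans an extra round; hence the lower latency.
    return votes >= f + 1
```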

The figure below depicts the performance of Bullshark, Tusk, and an improved version of HotStuff when a committee of 10 validators suffers 1 to 3 crash faults:

[Figure: latency vs. throughput of Bullshark, Tusk, and HotStuff with a committee of 10 under 1 to 3 crash faults]

HotStuff suffers a massive degradation in throughput as well as a dramatic increase in latency. With 3 faults, the throughput of HotStuff drops by over 10x and its latency increases by 15x compared to the failure-free case. In contrast, both Bullshark and Tusk maintain high throughput: the underlying Narwhal DAG continues collecting and disseminating transactions despite the crash faults, and is not overly affected by the faulty validators. The reduction in throughput is in large part due to losing the dissemination capacity of the faulty validators. When operating with 3 faults, both Bullshark and Tusk provide a 10x throughput increase and about a 7x latency reduction with respect to HotStuff.

The separation between data dissemination and metadata ordering guarantees network-speed throughput regardless of the DAG construction speed or the latency of the consensus protocol used for ordering. Data is disseminated at network speed by the workers, and the DAG contains digests that refer to all the disseminated data. Once a vertex in the DAG is committed, the vertex's entire causal history (which contains most of the disseminated data) is ordered.
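As a minimal sketch of that last step (assuming a `dag` mapping from a vertex digest to its parents' digests; all names are illustrative, and recursion is only suitable for sketch-sized DAGs), committing a vertex amounts to deterministically ordering its not-yet-ordered causal history:

```python
def order_causal_history(committed_vertex: str,
                         dag: dict[str, list[str]],
                         already_ordered: set[str]) -> list[str]:
    """Return the not-yet-ordered causal history of committed_vertex,
    parents before children (a deterministic topological order)."""
    out: list[str] = []
    seen: set[str] = set()

    def visit(v: str) -> None:
        if v in seen or v in already_ordered:
            return
        seen.add(v)
        for parent in sorted(dag.get(v, [])):  # sort for determinism
            visit(parent)
        out.append(v)  # post-order: all parents appended first

    visit(committed_vertex)
    already_ordered.update(out)
    return out
```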

Solution:

We are currently discussing why we should use DAG-BFT instead of BFT; in the future, we will demonstrate how to replace the existing Tendermint BFT with the new DAG-BFT in DIP.

Many thanks to Rati Gelashvili, Eleftherios Kokoris-Kogias, Alberto Sonnino, the Sui team, the Facebook Novi team, and all the researchers.
