The subject matter described herein relates to data processing. More specifically, the subject matter relates to methods, systems, and computer readable media for providing Byzantine fault tolerance (BFT).
Computer systems may involve multiple components or parts that can experience faults or failures. For example, a distributed computing system may involve computers that share data storage and are connected via links and network devices. In this example, one or more components in the distributed computing system may fail, and such a failure may be referred to as a Byzantine fault when the fault and its related symptoms appear differently to different observers (e.g., other system components).
Byzantine fault tolerance (BFT) generally refers to the ability of a computing system or a related application to handle Byzantine faults. For example, a Byzantine fault (e.g., a misconfigured or malfunctioning authentication module) may appear as faulty to only some components of the system. In this example, other components of the system may be unable to identify or note the fault and, as such, those components may assume that the system is working normally. Continuing with this example, a computing system that provides Byzantine fault tolerance may be able to avoid Byzantine failure (e.g., a system failure due to a Byzantine fault) because the computing system may use a fault detection mechanism which can achieve agreement among various system components about whether a Byzantine fault is occurring and then act accordingly.
One mechanism for providing BFT may include utilizing a BFT protocol such that system components can reach consensus regarding potential Byzantine faults. However, issues exist in many known BFT protocols. For example, various BFT protocols are susceptible to attacks that cause system deadlocks, thereby preventing consensus and negatively impacting the performance of those systems.
Methods, systems, and computer readable media for providing Byzantine fault tolerance (BFT) are disclosed. According to one method, a method for providing BFT occurs at a computing platform executing a BFT protocol, wherein the computing platform is acting as a leader participant of a round of the BFT protocol. The method comprising: receiving signed round-change messages from multiple participants in the round; broadcasting a signed lock message indicating that signed round-change messages have been received from a predetermined number of the participants in the round voting for a same candidate block; receiving signed commit messages from multiple participants in the round; and broadcasting a signed decide message indicating the candidate block is a finalized block after the predetermined number of the participants in the round have sent signed commit messages indicating the candidate block.
According to one system, a system for providing BFT includes at least one processor and a computing platform implemented using the at least one processor. The computing platform is executing a BFT protocol and is acting as a leader participant of a round of the BFT protocol. The computing platform is configured for: receiving signed round-change messages from multiple participants in the round; broadcasting a signed lock message indicating that signed round-change messages have been received from a predetermined number of the participants in the round voting for a same candidate block; receiving signed commit messages from multiple participants in the round; and broadcasting a signed decide message indicating the candidate block is a finalized block after the predetermined number of the participants in the round have sent signed commit messages indicating the candidate block.
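By way of non-limiting illustration, the leader-side flow summarized above can be sketched in Python as follows. The sketch assumes n=3t+1 participants and takes the "predetermined number" to be 2t+1; signature verification and networking are omitted, and the class and method names (LeaderRound, on_round_change, on_commit) are hypothetical names chosen for this example only.

```python
from collections import defaultdict


class LeaderRound:
    """Hypothetical leader-side bookkeeping for one round (signatures omitted)."""

    def __init__(self, t):
        self.quorum = 2 * t + 1               # ASSUMPTION: the "predetermined number"
        self.round_change = defaultdict(set)  # candidate block -> senders voting for it
        self.commits = defaultdict(set)       # candidate block -> senders committing it
        self.locked = None
        self.decided = None

    def on_round_change(self, sender, block):
        """Record a round-change vote; return a lock message once a quorum agrees."""
        self.round_change[block].add(sender)
        if self.locked is None and len(self.round_change[block]) >= self.quorum:
            self.locked = block
            return ("LOCK", block)
        return None

    def on_commit(self, sender, block):
        """Record a commit; return a decide message once a quorum has committed."""
        if self.locked is None or block != self.locked:
            return None                       # ignore commits for anything not locked
        self.commits[block].add(sender)
        if self.decided is None and len(self.commits[block]) >= self.quorum:
            self.decided = block
            return ("DECIDE", block)
        return None
```

With t=1, the third matching round-change message yields the lock message, and the third matching commit message yields the decide message. Using sets keyed by sender means that a duplicated message from one participant is counted only once.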
The subject matter described herein can be implemented in software in combination with hardware and/or firmware. For example, the subject matter described herein can be implemented in software executed by a processor. In one example implementation, the subject matter described herein may be implemented using a computer readable medium having stored thereon computer executable instructions that when executed by the processor of a computer cause the computer to perform steps. Example computer readable media suitable for implementing the subject matter described herein include non-transitory devices, such as disk memory devices, chip memory devices, programmable logic devices, and application specific integrated circuits. In addition, a computer readable medium that implements the subject matter described herein may be located on a single device or computing platform or may be distributed across multiple devices or computing platforms.
As used herein, each of the terms “node” and “host” refers to a physical computing platform or device including one or more processors and memory.
As used herein, the term “module” refers to hardware, firmware, or software in combination with hardware and/or firmware for implementing features described herein.
The subject matter described herein will now be explained with reference to the accompanying drawings of which:
The subject matter described herein relates to methods, systems, and computer readable media for providing Byzantine fault tolerance (BFT).
1 Introduction
Lamport, Shostak, and Pease [14] and Pease, Shostak, and Lamport [15] initiated the study of reaching consensus in the face of Byzantine failures and designed the first synchronous solution for Byzantine agreement. Dolev and Strong [8] proposed an improved protocol in a synchronous network with O(n^3) communication complexity. By assuming the existence of digital signature schemes and a public-key infrastructure, Katz and Koo [12] proposed an expected constant-round BFT protocol in a synchronous network setting against Byzantine faults.
For an asynchronous network, Fischer, Lynch, and Paterson [10] showed that there is no deterministic protocol for the BFT problem in the face of a single failure. Their proof is based on a diagonalization construction and has two assumptions: (1) when a process writes a bit on the output register, it is finalized and cannot change anymore; and (2) an honest process runs infinitely many steps in a run. Several researchers have tried to design BFT consensus protocols that circumvent this impossibility result. For example, Ben-Or [1] initiated the probabilistic approach to BFT consensus protocols in completely asynchronous networks and Dwork, Lynch, and Stockmeyer [9] designed BFT consensus protocols in partial synchronous networks. Castro and Liskov [5] initiated the study of practical BFT (PBFT) consensus protocol design and introduced the PBFT protocol for partial synchronous networks. The core idea of PBFT has been used in the design of several widely adopted BFT systems such as Tendermint BFT [3]. Tendermint BFT has been used in more than 40% of Proof of Stake blockchains (see, e.g., [13]) such as the “Internet of Blockchain” Cosmos [6]. More recently, Yin et al. [20] improved the PBFT/Tendermint protocol by changing the mesh communication network in PBFT to hub-like (or star) communication networks in HotStuff and by using threshold cryptography. Facebook's Libra blockchain has adopted HotStuff in its LibraBFT protocol [17].
There are generally two kinds of partial synchronous networks for Byzantine Agreement protocols. In Type I partial synchronous networks, all messages are guaranteed to be delivered. In this type of network, Denial of Service (DoS) attacks are not allowed and reliable point-to-point communication channels for all pairs of participants are required in the underlying network. In Type II partial synchronous networks, the network becomes synchronous after an unknown Global Synchronization Time (GST). In this type of network, DoS attacks are allowed before GST, though they are not allowed after GST. The Type II network is more realistic and is commonly used in the literature.
Several partial synchronous network models for BFT design assume the existence of reliable broadcast communication channels for certain message transmissions. In particular, these protocols normally leverage the gossip-based broadcast protocol in Bracha [2], which relies on reliable point-to-point communication channels for all pairs of participants. That is, the broadcast protocol in Bracha [2] assumes a complete network to achieve “a reliable message system in which no messages are lost or generated”. Since the Internet infrastructure is not a complete network, one needs to be very careful in building Internet-based BFT protocols using Bracha's results. Specifically, one should not assume that there is a reliable broadcast channel before GST in Type II networks.
The subject matter described herein shows that one can launch attacks against several widely deployed BFT protocols (e.g., Tendermint BFT, Ethereum's Casper FFG, and GRANDPA BFT [11]) so that participants reach a deadlock before GST and the deadlock cannot be removed after GST. Thus, after such attacks, the participants can never reach an agreement even after GST. That is, these BFT protocols cannot achieve the liveness property in Type II partial synchronous networks. For Type I networks, one does not know when a message will be delivered. Thus the broadcast protocol may be “unreliable” until the end of a fixed unknown time period. That is, the same attack used in Type II networks can be used to show that these protocols will reach a deadlock before the end of this unknown time period. On the other hand, all these protocols change views after a certain timeout and, after a view change, participants do not accept messages from previous views. That is, even if all messages are delivered at the end of this unknown time period, participants discard these messages if they have already changed views. Thus these protocols will remain deadlocked. In summary, our attacks show that these BFT protocols are insecure in all types of partial synchronous networks (including both Type I and Type II networks).
It should also be noted that though the Tendermint BFT protocol [3] claims security in Type II partial synchronous networks, it actually uses a Type I network model since it assumes a reliable point-to-point communication channel for each pair of participants in the network and that no message is ever lost (including messages before GST). However, our discussion in the preceding paragraph shows that Tendermint is not secure in Type I networks either. It should also be noted that in the first version of the LibraBFT specification (accessed on Jul. 19, 2019), the network model is a Type II partial synchronous network. In the current version [17] of the LibraBFT specification (dated Nov. 8, 2019 and accessed on Feb. 9, 2020), the network model is essentially a Type I partial synchronous network since all messages are delivered in the end (see page 3, Section 2 of [17]).
Based on the security requirement analysis for BFT protocols in asynchronous networks, we propose a BFT finality gadget protocol for blockchains, referred to herein as Blockchain DLS (BDLS). It should be noted that the first BFT protocol (i.e., the DLS protocol) for Type II networks was proposed by Dwork, Lynch, and Stockmeyer [9]. The DLS protocol leverages a star network where participants only exchange messages via round leaders. The PBFT protocol allows all participants to broadcast their messages to all other participants. By leveraging this kind of mesh network, the PBFT protocol is able to achieve consensus with reduced round complexity. By leveraging the lock mechanisms in the PBFT/Tendermint BFT protocols and changing the mesh network back to a star network, HotStuff BFT/LibraBFT is able to achieve consensus with reduced communication complexity but increased round complexity. The BDLS protocol described herein is based on the original DLS protocol [9] and is able to achieve consensus with both reduced round complexity and reduced communication complexity. Specifically, BDLS has the same round complexity as PBFT and lower communication complexity than HotStuff BFT/LibraBFT. BDLS is proved to be secure in Type II partial synchronous networks and achieves the best performance among existing BFT protocols for blockchains. Though both BDLS and HotStuff BFT leverage star networks, BDLS employs the lock mechanisms used in the DLS protocol while HotStuff employs the lock mechanisms used in the PBFT/Tendermint BFT protocols. Thus BDLS can achieve consensus in 4 steps while HotStuff requires 7 steps to achieve consensus in synchrony.
2 Synchronous, Asynchronous, and Partial Synchronous Networks
Assume that time is divided into discrete units called slots T0, T1, T2, . . . , where all time slots have equal length. Furthermore, we assume that: (1) the current time slot is determined by a publicly-known and monotonically increasing function of current time; and (2) each participant has access to the current time. In a synchronous network, if an honest participant P1 sends a message m to a participant P2 at the start of time slot Ti
For Type I partial synchronous networks, the protocol designer supplies the consensus protocol first, then the adversary chooses her Δ. For Type II partial synchronous networks, the adversary picks the Δ and the protocol designer (knowing Δ) supplies the consensus protocol, then the adversary chooses the GST. The definition of partial synchronous networks in [5, 20, 17] is the second type of partial synchronous network. That is, the value of Δ is known but the value of GST is unknown. In this kind of network, the adversary can selectively delay, drop, or re-order any messages sent by honest participants before an unknown time GST, but the network becomes synchronous after GST. Several BFT protocols in the literature (e.g., Tendermint, GRANDPA, and the current version of LibraBFT dated Nov. 8, 2019) use Type II networks, but they also assume that no message gets lost. With this additional assumption, the network is actually a Type I network since all messages are delivered within a time period GST+Δ where GST is unknown and Δ is known.
For the Type I network model, Denial of Service (DoS) attacks are not allowed since messages could be lost under DoS attacks. We think that it is more natural to use Type II partial synchronous networks for distributed BFT protocol design and analysis. Thus the subject matter described herein generally refers to Type II network scenarios.
3 Reliable Broadcast Communication Channels
The difference between point-to-point communication channels and broadcast communication channels has been extensively studied in the literature. A reliable broadcast channel requires that the following two properties be satisfied.
For complete networks, reliable broadcast protocols have been proposed in Bracha [2]. For a given integer k, a network is called k-connected if there exist k node-disjoint paths between any two nodes within the network. In non-complete networks, it is well known that (2t+1)-connectivity is necessary for reliable communication against t Byzantine faults (see, e.g., Wang and Desmedt [19] and Desmedt-Wang-Burmester [7]). On the other hand, for broadcast communication channels, Wang and Desmedt [18] showed that there exists an efficient protocol to achieve probabilistically reliable and perfectly private communication against t Byzantine faults when the underlying communication network is (t+1)-connected. The crucial point in achieving these results is the following: in a point-to-point channel, a malicious participant P1 can send a message m1 to participant P2 and send a different message m2 to participant P3, whereas in a broadcast channel, the malicious participant P1 has to send the same message m to multiple participants including P2 and P3. If a malicious P1 sends different messages to different participants in a reliable broadcast channel, it will be observed by its neighbors.
Though broadcast channels at physical layers are commonly used in local area networks, it is not trivial to design reliable broadcast channels over the Internet infrastructure since the Internet connectivity is not a complete graph and some direct communication paths between participants are missing (see, e.g., [14, 19]). Quite a few broadcast primitives have been proposed in the literature using message relays (see, e.g., Srikanth and Toueg [16], Bracha [2], and Dwork-Lynch-Stockmeyer [9]). In a message-relay-based broadcast protocol, if an honest participant accepts a message signed by another participant, it relays the signed message to other participants. However, for these message-relay-based broadcast protocols to be reliable, the network graph must be complete, which is not true for Internet environments.
A broadcast channel is unreliable if a malicious participant can broadcast a message m1 to a proper subset of the participants but not to the other participants. That is, some participants will receive the message m1 while other participants will receive a different message m2 or receive nothing at all. In the next sections, we show that several BFT protocols are insecure due to the lack of reliable broadcast channels before GST (by definition, messages before GST can get lost or re-ordered). Thus it is important to design BFT protocols that can tolerate unreliable broadcast channels before GST.
In the following sections, if not specified explicitly, we will assume that there are n=3t+1 participants P0, . . . , Pn−1 for the BFT protocol and at most t of them are malicious. Furthermore, we assume that each participant has a public and private key pair where the public key is known to all participants. We use the notation ⟨⋅⟩_i to denote that the message is digitally signed by the participant Pi.
4 Security Analysis of Tendermint BFT Protocol
Buchman, Kwon, and Milosevic [3] initiated the study of BFT protocols as a finality gadget for blockchains. Specifically, the authors in [3] proposed Tendermint BFT as an overlay atop a block proposal mechanism.
4.1 Tendermint BFT Protocol
The Tendermint BFT protocol [3] is based on the PBFT protocol. In Tendermint BFT, there are n=3t+1 participants P0, . . . , Pn−1 and at most t of them are malicious. Each participant maintains five variables step, lockedV, lockedR, validV, and validR throughout the protocol run. For each blockchain height h, the protocol runs from round to round until it reaches an agreement for the height h. Then the protocol moves to the next blockchain height. Each round contains three steps: propose, prevote, and precommit. For each height h, the participants start the process by initializing their five variables to: step=propose, lockedV=nil, lockedR=−1, validV=nil, and validR=−1. The protocol then starts from round 0 and continues until an agreement is reached for the height h. There is a public function proposer(h,r) that returns the round leader for a given round r of the height h. The round r of the height h proceeds as follows:
⟨PROPOSAL, h, r, v, vr⟩_i (1)
to all participants. All other participants Pj initialize the timeout counter to execute OnTimeoutPropose(h,r).
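By way of non-limiting illustration, the per-height variables and the proposal message (1) described above can be sketched in Python as follows. The round-robin rule inside proposer, the helper name make_proposal, and the rule that a leader re-proposes its validV when one is set are assumptions made for this example (the last is inferred from the attack descriptions in Section 4.2); signatures are omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeightState:
    """The five per-height variables, with the initial values stated in the text."""
    step: str = "propose"
    lockedV: Optional[str] = None   # nil
    lockedR: int = -1
    validV: Optional[str] = None    # nil
    validR: int = -1


def proposer(h: int, r: int, n: int) -> int:
    """Return the index of the round leader.

    ASSUMPTION: simple round-robin; the text only states that the function is public.
    """
    return (h + r) % n


def make_proposal(h: int, r: int, n: int, state: HeightState, fresh_value: str):
    """Build the (unsigned) PROPOSAL tuple sent by the leader of round r at height h."""
    leader = proposer(h, r, n)
    if state.validV is not None:
        # ASSUMPTION: a leader holding a valid value re-proposes it with its round.
        value, valid_round = state.validV, state.validR
    else:
        value, valid_round = fresh_value, -1
    return ("PROPOSAL", h, r, value, valid_round, leader)
```

For example, a leader with a fresh state proposes its new value with valid round −1, while a leader that has already set validV=v and validR=0 proposes v with valid round 0.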
4.2 Attacks on Tendermint BFT Protocol
In this section, we show that Tendermint BFT does not achieve the liveness property in partial synchronous networks. We describe our attack in the Type II networks where the broadcast channel is unreliable before GST.
Specifically, we show that if a malicious participant could choose to broadcast a message to a subset of the participants before GST, then the system will reach a deadlock and no new block will be created anymore (even after GST). In other words, Tendermint BFT will reach a deadlock before GST and the deadlock cannot be removed after GST. We then extend these attacks on Tendermint BFT to Type I networks.
For simplicity, we assume that for a given height h, the leader participant is P0 and the participants in P1={P0, . . . , Pt−1} are malicious. Furthermore, let P2={Pt, . . . , P2t}, and P3={P2t+1, . . . , P3t}.
Attack 1. In round 0 of height h, P0 chooses a minimal valid value v and broadcasts ⟨PROPOSAL, h, 0, v, −1⟩ to participants in P1∪P2. After receiving ⟨PROPOSAL, h, 0, v, −1⟩ from P0, each participant Pi∈P1 broadcasts ⟨PREVOTE, h, 0, H(v)⟩ to participants in P2, and each participant Pj∈P2 broadcasts ⟨PREVOTE, h, 0, H(v)⟩ to all participants and sets step=prevote. Each participant Pj∈P2 receives 2t+1 messages ⟨PREVOTE, h, 0, H(v)⟩. Thus each participant Pj∈P2 sets lockedV=v, lockedR=0, step=precommit, validV=v, validR=0, and then broadcasts ⟨PRECOMMIT, h, 0, H(v)⟩. Since each participant receives at most t+1 precommit messages for the value v, no decision will be made during round 0. After the timeout for round 0, all participants move to round 1 of height h. The participants in P1 remain dormant from now on. If a participant in P2 becomes the leader of round 1, it will broadcast the proposal ⟨PROPOSAL, h, 1, v, 0⟩. Since each participant Pj in P3 has received at most t+1 prevote messages for the value v in round 0, Pj will do nothing until timeout. Thus no honest participant can collect sufficient prevote messages for v to move ahead. After the timeout for round 1, the system will move to round 2 of height h. On the other hand, if a participant Pj in P3 becomes the leader of round 1, it will broadcast the proposal ⟨PROPOSAL, h, 1, v′, −1⟩. Since P0 has selected the value v as the minimal valid value and new transactions have been inserted into the system since then, the honest leader for round 1 will select a valid value v′≠v with high probability. Thus participants in P2 will not accept the proposal for v′ and will broadcast ⟨PREVOTE, h, 1, nil⟩. That is, no agreement can be made during round 1 and the system will move to round 2 of height h after timeout. This process will continue forever without reaching an agreement for the height h, even after GST.
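By way of non-limiting illustration, the round-0 message counts in Attack 1 can be checked with the following Python sketch. It models only who sends a prevote or precommit for v to whom, using the index sets P1, P2, and P3 defined above, and assumes t≥1; the function name attack1_counts and the returned dictionary keys are hypothetical.

```python
def attack1_counts(t):
    """Count round-0 messages for value v under Attack 1 (requires t >= 1)."""
    n = 3 * t + 1
    quorum = 2 * t + 1
    P1 = range(0, t)              # malicious participants, including the leader P0
    P2 = range(t, 2 * t + 1)      # honest, receive the proposal (t+1 members)
    P3 = range(2 * t + 1, n)      # honest, never see the proposal (t members)

    prevotes = [0] * n            # prevotes for H(v) received by each participant
    for _ in P1:                  # malicious prevotes are sent to P2 only
        for receiver in P2:
            prevotes[receiver] += 1
    for _ in P2:                  # honest prevotes are broadcast to everyone
        for receiver in range(n):
            prevotes[receiver] += 1

    lockers = [p for p in P2 if prevotes[p] >= quorum]
    precommits = len(lockers)     # each locked participant broadcasts one precommit
    return {
        "quorum": quorum,
        "prevotes_at_P2": max(prevotes[p] for p in P2),
        "prevotes_at_P3": max(prevotes[p] for p in P3),
        "locked": len(lockers),
        "precommits": precommits,
        "decided": precommits >= quorum,
    }
```

For t=1 the quorum is 3: every member of P2 sees 3 prevotes and locks on v, members of P3 see only 2 prevotes, and only 2 precommits exist, so no participant decides in round 0.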
Attack 2. One can launch an attack on Tendermint BFT so that some participants in P2 decide on a value v for the height h (though no participant in P3 decides on any value for the height h) and then the system enters a deadlock. It is noted that, due to the lock function in Tendermint BFT and due to the blockchain property, the adversary will not be able to make the participants in P3 decide on a different value for the height h or h+1.
In the preceding Attack 1, the adversary needs to control the t participants in the set P1. However, the attack can be revised so that the adversary only needs to control the single participant P0 to launch a similar attack. We use the same sets P1, P2, and P3, but this time we assume that only the leader P0 is malicious and all other participants are honest.
Attack 3. In round 0 of height h, P0 chooses a minimal valid value v and broadcasts ⟨PROPOSAL, h, 0, v, −1⟩ to participants in P1∪P2. P0 then broadcasts ⟨PREVOTE, h, 0, H(v)⟩ to participants in P1∪P2 and becomes dormant. After receiving ⟨PROPOSAL, h, 0, v, −1⟩ from P0, each participant Pj∈(P1∖{P0})∪P2 broadcasts ⟨PREVOTE, h, 0, H(v)⟩ to all participants and sets step=prevote. Each participant Pj∈P1∪P2 receives 2t+1 messages ⟨PREVOTE, h, 0, H(v)⟩. Each participant Pj∈(P1∖{P0})∪P2 sets lockedV=v, lockedR=0, step=precommit, validV=v, validR=0, and broadcasts ⟨PRECOMMIT, h, 0, H(v)⟩. Since each participant receives at most 2t precommit messages for the value v, no decision will be made during round 0. An argument similar to that in Attack 1 can be used to show that the protocol will enter a deadlock. Note that in Attack 3, each participant Pj in P3 has received at most 2t prevote messages for the value v in round 0, which is still insufficient for Pj to accept a proposal for a locked value v from other participants.
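By way of non-limiting illustration, the counts in Attack 3 can be checked with the following Python sketch, in which P0 is the only malicious participant. As with any such sketch, it models only who sends a prevote or precommit for v to whom and assumes t≥1; the function name attack3_counts and the returned dictionary keys are hypothetical.

```python
def attack3_counts(t):
    """Count round-0 messages for value v under Attack 3 (requires t >= 1)."""
    n = 3 * t + 1
    quorum = 2 * t + 1
    P1 = set(range(0, t))
    P2 = set(range(t, 2 * t + 1))
    P3 = set(range(2 * t + 1, n))
    informed = P1 | P2                  # receive P0's proposal and P0's prevote
    honest_informed = informed - {0}    # (P1 \ {P0}) ∪ P2, which has 2t members

    prevotes = {p: 0 for p in range(n)}
    for receiver in informed:           # P0's own prevote, sent selectively
        prevotes[receiver] += 1
    for _ in honest_informed:           # honest prevotes are broadcast to everyone
        for receiver in range(n):
            prevotes[receiver] += 1

    lockers = {p for p in honest_informed if prevotes[p] >= quorum}
    precommits = len(lockers)           # each locked participant broadcasts one precommit
    return {
        "quorum": quorum,
        "prevotes_at_informed": max(prevotes[p] for p in informed),
        "prevotes_at_P3": max(prevotes[p] for p in P3),
        "locked": len(lockers),
        "precommits": precommits,
        "decided": precommits >= quorum,
    }
```

For t=2 the quorum is 5: participants in P1∪P2 see 5 prevotes, participants in P3 see 4, and the 4 locked participants produce 4 precommits, one short of the quorum.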
5 Casper FFG
Buterin and Griffith [4] proposed the BFT protocol Casper the Friendly Finality Gadget (Casper FFG) as an overlay atop a block proposal mechanism. In Casper FFG, weighted participants validate and finalize blocks that are proposed by an existing proof of work chain or other mechanisms. To simplify our discussion, we assume that there are n=3t+1 validators of equal weight. Casper FFG works on the checkpoint tree that only contains blocks of height 100*k in the underlying block tree. Each validator Pi can broadcast a signed vote ⟨Pi: s, t⟩ where s and t are two checkpoints and s is an ancestor of t on the checkpoint tree. For two checkpoints a and b, we say that a→b is a supermajority link if there are at least 2t+1 votes for the pair. A checkpoint a is justified if there are supermajority links a0→a1→ . . . →a where a0 is the root. A checkpoint a is finalized if there are supermajority links a0→a1→ . . . →ai→a where a0 is the root and a is the direct child of ai. In Casper FFG, an honest validator Pi should not publish two distinct votes
⟨Pi: s1, t1⟩ and ⟨Pi: s2, t2⟩
such that either
h(t1)=h(t2) or h(s1)<h(s2)<h(t2)<h(t1),
where h(⋅) denotes the height of the node on the checkpoint tree. Otherwise, the validator's deposit will be slashed. Casper FFG is proved to achieve accountable safety and plausible liveness in [4] where
In order to achieve the liveness property, [4] proposed to use the “correct by construction” fork choice rule: the underlying block proposal mechanism should “follow the chain containing the justified checkpoint of the greatest height”.
The authors in [4] proposed to defeat long-range revision attacks by a fork choice rule to never revert a finalized block, as well as an expectation that each client will “log on” and gain a complete up-to-date view of the chain at some regular frequency (e.g., once per month). In order to defeat catastrophic crashes where more than t validators crash-fail at the same time (i.e., they are no longer connected to the network due to a network partition or computer failure, or the validators themselves are malicious), the authors in [4] proposed to slowly drain the deposit of any validator that does not vote for checkpoints, until eventually its deposit decreases enough that the validators who are voting form a supermajority. Mechanisms to recover from related scenarios such as network partition are considered an open problem in [4].
No specific network model is provided in [4]. Thus it is important to investigate the security of Casper FFG in various network models. The specification in [4] does not have sufficient details to guarantee its claimed plausible liveness. The authors mentioned that Casper FFG could be used on top of most proof of work chains. However, without further restrictions on the block generation mechanisms, Casper FFG can reach a deadlock (so the plausible liveness property will not be satisfied). Assume that, at time T, the checkpoint a is finalized (where there is a supermajority link from a to its direct child b) and no vote for any descendant checkpoint of b has been broadcast by any validator yet. Now assume that the underlying block production mechanism produces a fork starting from b. That is, b has two descendant checkpoints c and d. If t honest validators vote for c, t+1 honest validators vote for d, and t malicious validators vote randomly, then we reach a deadlock (since no link from b to its descendant can have a supermajority). If the checkpoints are 100 blocks away from each other and if it is expensive or slow to generate blocks (e.g., using proof of work (PoW)), then this kind of fork is unlikely, though it remains possible.
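By way of non-limiting illustration, the slashing conditions of Section 5 and the fork scenario above can be sketched in Python as follows. The function names are hypothetical, votes are modeled as unsigned tuples, and, as a simplifying assumption, the t malicious validators are modeled as withholding their votes, which is one behavior under which neither link reaches 2t+1 votes.

```python
from collections import defaultdict


def is_slashable(vote_a, vote_b, height):
    """Check two distinct (source, target) votes by one validator against both conditions."""
    if vote_a == vote_b:
        return False
    for (s1, t1), (s2, t2) in ((vote_a, vote_b), (vote_b, vote_a)):
        if height[t1] == height[t2]:
            return True                      # two votes with targets at the same height
        if height[s1] < height[s2] < height[t2] < height[t1]:
            return True                      # one vote surrounds the other
    return False


def supermajority_links(votes, t):
    """Return every (source, target) link backed by at least 2t+1 distinct validators."""
    voters = defaultdict(set)
    for validator, source, target in votes:
        voters[(source, target)].add(validator)
    return {link for link, who in voters.items() if len(who) >= 2 * t + 1}


def fork_votes(t):
    """t honest votes for b->c and t+1 honest votes for b->d; malicious votes withheld."""
    votes = [(i, "b", "c") for i in range(t)]
    votes += [(i, "b", "d") for i in range(t, 2 * t + 1)]
    return votes
```

With t=2, supermajority_links(fork_votes(2), 2) is empty, since b→c has 2 votes and b→d has 3 while 5 are required. Because c and d have the same height, is_slashable reports that an honest validator who voted for b→c cannot later vote for b→d without being slashed, which is why the honest votes cannot be moved to break the tie.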
6 Another Finality Gadget: Polkadot's GRANDPA
Based on the Casper FFG protocol, the Polkadot project (https://wiki.polkadot.network/) proposed a new BFT finality gadget protocol GRANDPA [11]. Specifically, Polkadot implements a nominated proof-of-stake (NPoS) system. At certain time periods, the system elects a group of validators to serve for block production and the finality gadget. Nominators also stake their tokens as a guarantee of good behavior, and this stake gets slashed whenever their nominated validators deviate from the protocol. On the other hand, nominators also get paid when their nominated validators play by the rules. Elected validators get equal voting power in the consensus protocol. Polkadot uses BABE as its block production mechanism and GRANDPA as its BFT finality gadget. Here we are interested in the finality gadget GRANDPA (GHOST-based Recursive ANcestor Deriving Prefix Agreement) that is implemented for the Polkadot relay chain. GRANDPA contains two protocols. The first protocol works in partially synchronous networks and tolerates ⅓ Byzantine participants. The second protocol works in fully asynchronous networks (requiring a common random coin) and tolerates ⅕ Byzantine participants. In contrast to Casper FFG, GRANDPA voters can cast votes simultaneously for blocks at different heights, and GRANDPA only depends on finalized blocks to affect the fork-choice rule of the underlying block production mechanism.
The first GRANDPA protocol assumes that after an unknown time GST, the network becomes synchronous. However, it also assumes that all messages are delivered before time GST+Δ for some given value Δ. That is, no message gets lost. This network model is equivalent to our Type I partial synchronous network and does not tolerate DoS attacks or network partition attacks. In the following paragraphs, we will show that GRANDPA is not secure even in a synchronous network.
Assume that there are n=3t+1 participants P0, . . . , Pn−1 and at most t of them are malicious. Each participant stores a tree of blocks produced by the block production mechanism with the genesis block as the root. A participant can vote for a block on the tree by digitally signing it. For a set S of votes, a participant Pi equivocates in S if Pi has more than one vote in S. S is called tolerant if at most t participants equivocate in S. A vote set S has supermajority for a block B if
|{Pi : Pi votes for B*} ∪ {Pi : Pi equivocates}| ≥ 2t+1
where “Pi votes for B*” means that Pi votes for B or votes for a descendant of B. The ⅔-GHOST function g(S) returns the block B of maximal height such that S has a supermajority for B. If a tolerant vote set S has a supermajority for a block B, then there are at least t+1 voters who vote for B or its descendant but do not equivocate. Based on this observation, it is easy to check that if S⊆T and T is tolerant, then g(S) is an ancestor of g(T).
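By way of non-limiting illustration, the supermajority test and the ⅔-GHOST function g(S) defined above can be sketched in Python as follows. The block tree is represented by a hypothetical parent dictionary (the root maps to None), votes are unsigned (voter, block) pairs, and the function names are chosen for this example only.

```python
from collections import defaultdict


def chain_to_root(parent, block):
    """Yield block, then its ancestors, ending at the root."""
    while block is not None:
        yield block
        block = parent[block]


def ghost(parent, votes, t):
    """Return the highest block with a supermajority in the vote set, or None."""
    blocks_by_voter = defaultdict(set)
    for voter, block in votes:
        blocks_by_voter[voter].add(block)
    # A voter equivocates if it has more than one vote in the set.
    equivocators = {v for v, blocks in blocks_by_voter.items() if len(blocks) > 1}

    # A vote for a block also counts for every ancestor of that block.
    support = defaultdict(set)
    for voter, blocks in blocks_by_voter.items():
        if voter in equivocators:
            continue
        for ancestor in chain_to_root(parent, next(iter(blocks))):
            support[ancestor].add(voter)

    best, best_height = None, -1
    for block in parent:
        if len(support[block] | equivocators) >= 2 * t + 1:
            height = sum(1 for _ in chain_to_root(parent, block)) - 1
            if height > best_height:
                best, best_height = block, height
    return best
```

For example, with the tree G→E→{B, C} and t=1, the vote set in which two voters vote for B and two vote for C gives g(S)=E, because only E and G are supported by at least 3 voters and E is the higher of the two.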
The authors in [11] defined the following concept of possibility for a vote set to have a supermajority for a block: “We say that it is impossible for a set S to have a supermajority for a block B if at least 2t+1 voters either equivocate or vote for blocks who are not descendant of B. Otherwise it is possible for S to have a supermajority for B.” Then the authors of [11] claimed that “a vote set S is possible to have a supermajority for a block B if and only if there exists a tolerant vote set T⊇S such that T has a supermajority for B”. However, this claim has semantic issues in practice. For example, assume that blocks B and C are inconsistent and the vote set S contains the following votes:
1. t malicious voters vote for B, one honest voter votes for B.
2. 2t honest voters vote for C.
By the definition of [11], it is not impossible for S to have a supermajority for B. Thus it is possible for S to have a supermajority for B. Since honest voters will not equivocate, there does not exist a semantically valid tolerant vote set T⊇S such that T has a supermajority for B. This observation can easily be used to show that the GRANDPA protocol cannot achieve the liveness property (see our discussion in the following paragraphs).
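By way of non-limiting illustration, the example above can be checked with the following Python sketch of the quoted “impossible” test. The sketch treats a vote for B itself as a vote for a descendant of B (an assumption consistent with the B* convention used earlier), represents the block tree by a hypothetical parent dictionary, and uses function names chosen for this example only.

```python
def is_descendant_or_self(parent, block, target):
    """Return True if block equals target or has target among its ancestors."""
    while block is not None:
        if block == target:
            return True
        block = parent[block]
    return False


def impossible(parent, votes, target, t):
    """Quoted test: at least 2t+1 voters equivocate or vote off the target's subtree."""
    blocks_by_voter = {}
    for voter, block in votes:
        blocks_by_voter.setdefault(voter, set()).add(block)
    against = 0
    for blocks in blocks_by_voter.values():
        equivocates = len(blocks) > 1
        off_subtree = not is_descendant_or_self(parent, next(iter(blocks)), target)
        if equivocates or off_subtree:
            against += 1
    return against >= 2 * t + 1


def example_votes(t):
    """t malicious voters and one honest voter vote for B; 2t honest voters vote for C."""
    votes = [(i, "B") for i in range(t + 1)]
    votes += [(i, "C") for i in range(t + 1, 3 * t + 1)]
    return votes
```

With t=2 and a tree in which B and C are sibling blocks, impossible(...) returns False for B because only 2t=4 voters vote outside B's subtree. Yet only t+1=3 voters support B and all 3t+1=7 voters have already voted, so B could reach 2t+1=5 supporters only if honest voters who voted for C were to equivocate.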
6.1 GRANDPA Protocol
The GRANDPA protocol starts from round 1. For each round, one participant is designated as the primary and all participants know who the primary is. Each round consists of two phases: prevote and precommit. Let Vr,i and Cr,i be the sets of prevotes and precommits received by Pi during round r, respectively. Let E0,i be the genesis block and Er,i be the last ancestor block of g(Vr,i) for which it is possible for Cr,i to have a supermajority. If either Er,i<g(Vr,i) or it is impossible for Cr,i to have a supermajority for any children of g(Vr,i), then we say that Pi sees that round r is completable. Let Δ be a time bound that suffices to send messages and gossip them to everyone. The protocol proceeds as follows.
At any time after the precommit step of round r, if Pi sees that B=g(Cr,i) is a descendant of the last finalized block and Vr,i has a supermajority, then Pi finalizes B.
6.2 Attacks on GRANDPA Protocol
In this section, we show that the GRANDPA protocol cannot achieve the liveness property even in synchronous networks. Assume that Er−1,0= . . . =Er−1,n−1. During round r, the block production mechanism produces a fork at Er−1,0. That is, two child blocks B and C of Er−1,0 are produced. At round r, t+1 voters (including all malicious voters) prevote for B and the remaining 2t honest voters prevote for C. For each voter Pi, we have g(Vr,i)=Er−1,i. Thus each Pi precommits g(Vr,i)=Er−1,i. Now each voter Pi estimates Er,i=g(Vr,i)=Er−1,i. Since it is possible for Cr,i to have a supermajority for any child of Er,i, the round r is not completable. That is, the process is stuck at round r forever.
Even if one revises the definition of “possible” in GRANDPA to resolve the issue discussed in the preceding paragraph, our attacks on Tendermint could easily be mounted against the GRANDPA protocol as well. Thus the GRANDPA protocol cannot be secure in Type II networks.
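The fork scenario can be replayed with a small sketch of the ⅔-GHOST function. The supermajority test used here (at least 2t+1 voters voting for a block or one of its descendants, with no equivocation in the vote set) is a simplifying assumption, as are the function names and the block labels E, B, and C.

```python
from typing import Dict, List, Optional


def ancestors_inclusive(block: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return [block, parent(block), ..., root]."""
    chain = []
    cur: Optional[str] = block
    while cur is not None:
        chain.append(cur)
        cur = parent[cur]
    return chain


def ghost(votes: Dict[str, str], parent: Dict[str, Optional[str]], t: int) -> Optional[str]:
    """Highest block supported (directly or via a descendant) by >= 2t+1 voters."""
    support = {b: 0 for b in parent}
    for voted in votes.values():
        for b in ancestors_inclusive(voted, parent):
            support[b] += 1
    winners = [b for b, s in support.items() if s >= 2 * t + 1]
    if not winners:
        return None
    # Height is measured as the length of the path back to the root.
    return max(winners, key=lambda b: len(ancestors_inclusive(b, parent)))


def fork_prevotes(t: int) -> Dict[str, str]:
    """t+1 prevotes for B and 2t prevotes for C, as in the attack."""
    votes = {f"P{i}": "B" for i in range(t + 1)}
    votes.update({f"P{i}": "C" for i in range(t + 1, 3 * t + 1)})
    return votes


parent = {"E": None, "B": "E", "C": "E"}
g = ghost(fork_prevotes(t=3), parent, t=3)   # neither child reaches 2t+1 votes
```

Under these assumptions `g` is the common parent E for every t≥1, which matches the statement that each voter computes g(Vr,i)=Er−1,i and therefore precommits the block that was already agreed on.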
7 A Secure BFT protocol in Type II Partial Synchronous Networks
In this section, we propose a Byzantine Agreement Protocol that achieves safety and liveness properties in Type II partial synchronous networks. Though our protocol could be used in other scenarios such as State Machine Replication (SMR), we present the protocol as a finality gadget for blockchains. Assume that there is a separate block proposal mechanism that produces children blocks for finalized blocks by our BFT finality gadget. Let B0, . . . , Bh−1 be the blockchain where B0 is the genesis block and Bh−1 is the most recently finalized head block. The block proposal mechanism may produce several child blocks B0h, B1h, . . . , Bn
7.1 The BFT Protocol (BDLS)
Our BFT protocol is based on the original DLS protocol in Dwork, Lynch, and Stockmeyer [9] and we call it a Blockchain version of DLS (BDLS). For each blockchain height h, BDLS protocol runs from round to round until it reaches an agreement for the height h. Then the protocol moves to the next blockchain height h+1. Let P0, . . . , Pn−1 be the n=3t+1 participants of the protocol. Assume that there are n
Generally, we can use a robust threshold signature scheme to reduce the authenticator complexity, e.g., achieve linear authenticator complexity. For simplicity, the following protocol description is based on a standard digital signature scheme. It could be easily revised to use a threshold signature scheme. Following Dwork, Lynch, and Stockmeyer [9], we assume that all messages after the unknown global stabilization time (GST) will be delivered in the same round and messages before round GST could get lost or re-ordered. Furthermore, though all participants have a common numbering for the round, they do not know when the round GST occurs. A candidate block B′ is acceptable to Pi if Pi does not have a lock on any value except possibly B′. There is a public function leader(h,r) that returns the round leader for a given round r of the height h. For each height h, the BDLS protocol proceeds from round to round (starting from round 0) until the participant decides on a value. The round r of the height h starts when at least 2t+1 participants submit a round-change message to the leader participant. The round r proceeds as follows where Pi=leader(h,r) is the leader for round r:
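$\langle \text{lock}, h, r, B', \text{proof} \rangle_i$ (2)
$\langle \text{commit}, h, r, B' \rangle_j$ (3)
$\langle \text{decide}, h, r, B', \text{proof} \rangle_i$ (4)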
Remark 1: In the BDLS protocol, the lock release step is a mesh network broadcast. In some applications, one may prefer a star network to reduce the total number of messages from n² to n, e.g., to achieve linear communication complexity. One may meet this need by replacing the “lock release” step with the following additions to the protocol. At Step 1 of round r, each participant Pj sends the message
$\langle \text{all-locked-values}, h, r, B'_j \rangle_j$
instead of only sending the message $\langle h, r, B'_j \rangle_j$ to Pi, where “all-locked-values” is the set of candidate blocks that Pj has locks on. During Step 2, if Pi cannot lock a candidate block during round r, then it broadcasts the candidate block B″=max{B:B∈BLOCKi} together with all candidate blocks locked by any participant. It is straightforward to check that the security analysis in the next section remains unchanged for this protocol revision.
Remark 2: During Step 5 of the BDLS protocol, when a participant receives a decide message, it propagates/broadcasts the decide message to its neighbors. It is recommended that each participant keep broadcasting the signed decide message for height h regularly until it has received broadcasts of the decide message for height h from at least 2t other participants. The importance of this propagation/broadcast is illustrated in Section 9.
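The rebroadcast rule of Remark 2 amounts to counting distinct other senders per height. A minimal sketch, assuming a hypothetical `DecideTracker` class and string participant identifiers (neither appears in the protocol description):

```python
from collections import defaultdict


class DecideTracker:
    """Tracks which other participants have been seen broadcasting a decide
    message for each height (hypothetical helper, not part of BDLS itself)."""

    def __init__(self, own_id: str, t: int) -> None:
        self.own_id = own_id
        self.t = t
        self.seen = defaultdict(set)   # height -> set of other senders

    def on_decide(self, height: int, sender: str) -> None:
        if sender != self.own_id:
            self.seen[height].add(sender)   # a set ignores repeated broadcasts

    def keep_broadcasting(self, height: int) -> bool:
        """True until decide broadcasts from at least 2t others were seen."""
        return len(self.seen[height]) < 2 * self.t


tracker = DecideTracker(own_id="P0", t=1)
tracker.on_decide(5, "P1")
tracker.on_decide(5, "P1")                    # duplicate, not counted twice
still_needed = tracker.keep_broadcasting(5)   # only one distinct sender so far
tracker.on_decide(5, "P2")
done = not tracker.keep_broadcasting(5)
```

Counting senders in a set rather than counting messages keeps a single participant that repeats its broadcast from satisfying the 2t threshold on its own.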
Remark 3: To achieve linear communication/authenticator complexity with threshold digital signature schemes, participant Pj may send the signed message ($\langle h, r \rangle_j$, $\langle h, r, B'_j \rangle_j$) to the leader Pi during Step 1. It should be noted that if 2t+1 participants send the same B′j to the leader, then the leader Pi can assemble a signature for $\langle h, r, B'_j \rangle$. If there is no such value B′j, then the leader can only assemble a digital signature for $\langle h, r \rangle$, which can be used for the select message. In the security proof for BDLS in the next section, the leader does not need to assemble a digital signature for B′j if it only broadcasts a select message.
7.2 Liveness and Safety
The security of BDLS protocol is proved by establishing a series of Lemmas. The proofs for Lemmas 7.1, 7.2, 7.3 and Theorem 7.4 follow from straightforward modifications of the corresponding Lemmas/Theorem in [9]. For completeness, we include these proofs here also.
Lemma 7.1 It is impossible for two candidate blocks B′ and B″ to get locked in the same round r of height h.
Proof. In order for two blocks B′ and B″ to get locked in one round r of height h, the leader Pi=leader(h,r) must send two conflicting lock messages (2) with different proofs. Each proof contains signed messages from 2t+1 of the n=3t+1 participants, so the two proofs share at least t+1 signers. This can only happen if there exist at least t+1 participants Pj, each of whom equivocates by sending both $\langle h, r, B' \rangle_j$ and $\langle h, r, B'' \rangle_j$ to Pi. This is impossible since there are at most t malicious participants.
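The counting behind this proof is the usual quorum-intersection bound: two sets of 2t+1 signers drawn from n=3t+1 participants overlap in at least (2t+1)+(2t+1)−(3t+1)=t+1 participants. The brute-force check below is only an illustration for small t; the function name is hypothetical.

```python
from itertools import combinations


def min_quorum_overlap(t: int) -> int:
    """Smallest intersection of any two sets of 2t+1 out of n = 3t+1 participants."""
    n, q = 3 * t + 1, 2 * t + 1
    quorums = [set(c) for c in combinations(range(n), q)]
    return min(len(a & b) for a in quorums for b in quorums)


overlaps = {t: min_quorum_overlap(t) for t in (1, 2, 3)}
# overlaps == {1: 2, 2: 3, 3: 4}, i.e. t+1 in every case
```

Since at most t participants are malicious, an overlap of t+1 always contains at least one honest participant, and an honest participant does not sign two different candidate blocks in the same round.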
Lemma 7.2 Suppose that the leader Pi decides a block value B′ at round r of height h and that r is the smallest round at which a decision is made. Then at least t+1 honest participants lock the candidate block B′ at round r. Furthermore, each of the honest participants that locks B′ at round r will continue to hold a lock on B′ in every round r′≥r.
Proof. In order for Pi to decide on B′, at least 2t+1 participants send commit messages (3) to Pi at round r of height h. Thus at least t+1 honest participants have locks on B′ at round r. Assume that the second conclusion is false. Let r′>r be the first round in which the lock on B′ is released. In this case, the lock is released during the lock release step of round r′ because some participant has a lock on another block B″≠B′ with associated round r″ where r′≥r″≥r. Lemma 7.1 shows that it is impossible for a participant to have a lock on B″ in round r. Thus the participant acquired the lock on B″ in round r″ with r′≥r″>r. This implies that, at Step 1 of round r″, at least 2t+1 participants sent signed messages (h,r″,B″) to the leader participant. That is, at least 2t+1 participants had not locked B′ at Step 1 of round r″. Since n=3t+1<(2t+1)+(t+1), this contradicts the fact that at least t+1 honest participants had locked B′ at the start of round r″.
Lemma 7.3 Immediately after any lock release step at or after the round GST, the set of candidate blocks locked by honest participants contains at most one value.
Proof. This follows from the lock release step.
Theorem 7.4 (Safety) Assume that there are at most t malicious participants. It is impossible for two participants to decide on different block values.
Proof. Suppose that an honest participant Pi decides on B′ at round r and this is the smallest round at which a decision is made. Lemma 7.2 implies that at least t+1 honest participants will hold a lock on B′ in all future rounds. Consequently, no block value other than B′ will be acceptable to 2t+1 participants. Thus no participant will decide on any value other than B′.
Theorem 7.5 (Liveness) Assume that there are at most t malicious participants and that valid candidate child blocks for Bh are always produced by the block proposal mechanism before the start of the first round for height h, for all h. Then the BDLS protocol will finalize blocks for each height h. That is, the BDLS protocol will not reach a deadlock.
Proof. We consider two cases. For the first case, assume that no decision has been made by any honest participant and no honest participant holds a lock on a candidate block at round r, where r≥GST is the first round after GST in which the leader participant is honest. In this case, if Pi receives 2t+1 signed messages for a candidate block B′ in Step 1 of round r, then all honest participants will decide on B′ by the end of round r. Otherwise, Pi broadcasts the maximal candidate block B″ during Step 2 of round r. Thus all honest participants will receive this maximal block, and it becomes the maximal acceptable candidate block for all honest participants. Then, in round r′>r, where r′ is the smallest round after r in which the leader participant is honest, all honest participants decide on a maximal block.
For the second case, assume that no decision has been made at the start of round GST and some participants hold a lock on a candidate block B′. By Lemma 7.3, there is at most one value locked by honest participants at the end of round GST. Furthermore, at the end of round GST, all the honest participants either decide on B′ or obtain a lock on B′. Thus if no decision is made during round GST, the decision will be made during round GST+1.
7.3 Complexity Analysis
In this section, we compare the performance of the PBFT, Tendermint BFT, HotStuff BFT, and BDLS protocols. Three kinds of primitives are used in the design of these protocols: (1) broadcast from the leader to all participants; (2) all participants send messages to the leader; and (3) all participants broadcast. We use the following symbols to denote these primitives:
: leader broadcasts
: all participants send messages to the leader
: all participants broadcast
In the following, we compare the performance of these protocols after the network is synchronized (that is, after GST) and when the round has an honest leader. Each of these protocols reaches agreement within one run of the protocol, assuming that all participants have all the necessary input values at the start of the protocol and the leader is honest.
8 Implementation and Performance Evaluation
8.1 Chained BDLS and Other Implementation Related Issues
In order to improve efficiency, several blockchain BFT protocols (e.g., Ethereum Casper FFG, HotStuff BFT, and LibraBFT) adopt the chaining paradigm where the BFT protocol phases for commitment are spread across rounds. That is, every phase is carried out in a round and contains a new proposal. The same techniques could be used to construct a chained BDLS. As noted in HotStuff BFT and LibraBFT, the block tree in chained LibraBFT and chained HotStuff BFT may contain “chains” that have gaps in round numbers. Thus the commit logic for LibraBFT and HotStuff BFT requires a 3-chain with contiguous round numbers whose last descendant has been certified. Since BDLS is a 2-phase BFT protocol, chained BDLS “decide” logic requires a 2-chain with contiguous round numbers whose last descendant has been certified.
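The contiguous-round requirement can be sketched as a walk along parent links. This is one possible reading of the rule, not the chained BDLS implementation: the `Block` type, the `chain_start` helper, and the convention that the first block of the k-chain is the one that becomes decided are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Block:
    name: str
    round: int
    parent: Optional["Block"] = None
    certified: bool = False


def chain_start(tip: Block, k: int) -> Optional[Block]:
    """If `tip` is certified and ends a k-chain with contiguous round numbers,
    return the first block of that chain; otherwise return None."""
    if not tip.certified:
        return None
    cur = tip
    for _ in range(k - 1):
        prev = cur.parent
        if prev is None or prev.round + 1 != cur.round:
            return None          # a gap in round numbers breaks the chain
        cur = prev
    return cur


genesis = Block("G", 0)
b1 = Block("B1", 1, genesis)
b2 = Block("B2", 2, b1, certified=True)      # rounds 1, 2: contiguous
b4 = Block("B4", 4, b1, certified=True)      # rounds 1, 4: gap

decided_2chain = chain_start(b2, k=2)        # 2-chain rule (chained BDLS)
decided_gap = chain_start(b4, k=2)           # None because of the gap
decided_3chain = chain_start(b2, k=3)        # 3-chain rule (HotStuff / LibraBFT)
```

With k=2 the certified block B2 is enough to reach B1, while the same tip needs one more contiguous ancestor under the 3-chain rule, which is where the shorter chain saves a round.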
For a chained BFT protocol implementation, the BFT protocol participants for the various rounds/heights should be relatively static. If the BFT protocol participants change from round to round or from height to height, it is not realistic to implement chained BFT protocols. Thus a chained BFT protocol implementation is suitable for permissioned blockchains such as the Libra blockchain, while it is not suitable for permissionless blockchains where BFT protocol participants change frequently. The same rule applies to threshold digital signature scheme implementations for BFT protocols. That is, for permissionless blockchains where BFT protocol participants change frequently, there may be limited advantage in using threshold digital signature schemes since the expensive key set-up process has to be run each time the participant set changes.
In most distributed BFT protocols, when the participants cannot reach an agreement in one round, participants move to a new round by submitting a round-change request. Thus BFT participants may be in different states and receive different messages. It is important to maximize the period of time when at least 2t+1 honest participants are in the same round. The PBFT protocol achieves round synchronization by exponentially increasing the timeout length for each round. That is, if round 0 of height h has a timeout length of Δ, then round r of height h has a timeout length of 2^r Δ. On the other hand, Tendermint BFT achieves round synchronization by linearly increasing the timeout length for each round. That is, the round r has a timeout length of rΔ where Δ is the timeout length for round 0 of height h. HotStuff proposes a functionality called PaceMaker to achieve round synchronization, without details on how to implement the PaceMaker. LibraBFT implements the PaceMaker functionality in the following way. When a participant gives up on a certain round r, it broadcasts a timeout message carrying a certificate for entering the round. This brings all honest participants to r within the transmission delay bound. When timeout messages are collected from a quorum of participants, they form a timeout certificate. BDLS may use any of these recommended approaches for round synchronization.
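The two timeout schedules can be compared with a few lines of code. This is a sketch only: the function names and the value of Δ are illustrative, and the linear schedule is coded as (r+1)Δ so that round 0 waits Δ, which is an assumption of the sketch rather than a formula taken from Tendermint.

```python
def exponential_timeout(r: int, delta: float) -> float:
    """PBFT-style schedule: round r waits 2^r * delta."""
    return (2 ** r) * delta


def linear_timeout(r: int, delta: float) -> float:
    """Linearly growing schedule; the (r + 1) offset is assumed so that
    round 0 waits exactly delta."""
    return (r + 1) * delta


delta = 0.5  # seconds, illustrative value only
schedule = [(r, exponential_timeout(r, delta), linear_timeout(r, delta))
            for r in range(5)]
```

By round 4 the exponential schedule waits 8.0 seconds against 2.5 seconds for the linear one, which shows why repeated failed rounds are costly under exponential back-off and why a pacemaker-style mechanism may be preferred.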
8.2 BDLS with Pacemaker Mechanism
Though BDLS may use a PBFT mechanism to keep round synchronization (that is, the timeout period for round r is 2^r Δ), it may be more efficient to use a pacemaker or heartbeat mechanism for BDLS round synchronization. Similar to LibraBFT, the advancement of rounds in BDLS is governed by a module referred to herein as Pacemaker. Pacemaker keeps track of votes and of time. In some embodiments, BDLS may be modified to include Pacemaker so that Pacemaker can be seamlessly integrated into the protocol without extra workload. The major change is Step 1, where Pacemaker timeout messages are combined with round-change messages for efficiency. The round r of the height h for a participant Pj starts when its Pacemaker receives round-change messages from at least 2t+1 participants, when its timeout for round r−1 expires, or when it receives a “lock”, “select”, or “decide” message for round r. Specifically, the round r proceeds as follows where Pi=leader(h,r) is the leader for round r:
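$\langle \text{lock}, h, r, B', \text{proof} \rangle_i$ (5)
$\langle \text{select}, h, r, B'', \text{proof} \rangle$ (6)
$\langle \text{commit}, h, r, B' \rangle_j$ (7)
$\langle \text{decide}, h, r, B', \text{proof} \rangle_i$ (8)
$r_1 = \max\{r' : P_j \text{ holds a lock } \langle \text{lock}, h, r', B', \text{proof} \rangle_{i'}\}$
$\langle \text{lock-release}, h, r, \langle \text{lock}, h, r_1, B', \text{proof} \rangle_i \rangle$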
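The round start condition of the Pacemaker variant and the choice of r1 for the lock-release message can be sketched as follows. The `Lock` record and both function names are hypothetical; signatures and proofs are omitted, so this illustrates only the bookkeeping.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Lock:
    height: int
    round: int
    block: str


def round_may_start(round_change_senders: Iterable[str], t: int,
                    previous_round_timed_out: bool,
                    got_lock_select_or_decide: bool) -> bool:
    """Start condition for round r: 2t+1 round-change messages, a timeout of
    round r-1, or a lock/select/decide message for round r."""
    enough_round_changes = len(set(round_change_senders)) >= 2 * t + 1
    return enough_round_changes or previous_round_timed_out or got_lock_select_or_decide


def latest_lock(locks: List[Lock], height: int) -> Optional[Lock]:
    """Lock with the largest round r1 among the locks held for `height`."""
    held = [lk for lk in locks if lk.height == height]
    return max(held, key=lambda lk: lk.round) if held else None


locks = [Lock(7, 1, "B1"), Lock(7, 3, "B3"), Lock(6, 9, "old")]
newest = latest_lock(locks, height=7)   # the lock from round r1 = 3
```

Locks from other heights are ignored even when their round number is larger, since r1 is defined over locks held for the current height h only.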
8.3 BFT Consensus Algorithm
Referring to
In step 203, it is determined whether the message is a round-change message. In step 204, if the message is a round-change message, the round-change message information is stored by the participant for the round indicated by the message. In step 205, it is determined whether the number of received round-change messages for the message round reaches or exceeds the predetermined number of participants (e.g., 2t+1, where t is the number of malicious participants). In step 206, if the threshold is reached, the participant sends a round-change message if the participant has not already done so. In step 207, the participant enters a lock status for the round. In step 208, the participant sets a lock timeout timer, wherein the lock status is removed if the timer runs out. In step 209, it is determined whether the participant is the current participant leader (for the round). In step 210, the current participant leader sets a collection timeout timer so that round-change messages can be received or collected (e.g., the timeout period may be based on round trip latency and/or other information).
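Steps 203 through 210 can be read as a single message handler. The sketch below is an illustration under stated assumptions, not the implementation evaluated in this document: the class name, the string labels for timers, the `outbox` list standing in for the network, and the `leader_of` callback are all hypothetical, and signature checks are left out.

```python
from typing import Callable, Dict, List, Optional, Set


class RoundChangeHandler:
    """Hypothetical participant state for steps 203-210."""

    def __init__(self, pid: str, t: int, leader_of: Callable[[int], str]) -> None:
        self.pid, self.t, self.leader_of = pid, t, leader_of
        self.round_changes: Dict[int, Dict[str, Optional[str]]] = {}
        self.sent_round_change: Set[int] = set()
        self.locked_rounds: Set[int] = set()
        self.timers: List[str] = []
        self.outbox: List[tuple] = []

    def on_round_change(self, sender: str, rnd: int, block: Optional[str]) -> None:
        # Step 204: store the message under its round, one entry per sender.
        self.round_changes.setdefault(rnd, {})[sender] = block
        # Step 205: stop unless the 2t+1 threshold is newly reached for this round.
        if len(self.round_changes[rnd]) < 2 * self.t + 1 or rnd in self.locked_rounds:
            return
        # Step 206: send our own round-change message if not sent yet.
        if rnd not in self.sent_round_change:
            self.outbox.append(("round-change", rnd))
            self.sent_round_change.add(rnd)
        # Steps 207-208: enter lock status and arm the lock timeout timer.
        self.locked_rounds.add(rnd)
        self.timers.append("lock-timeout")
        # Steps 209-210: only the round leader arms the collection timer.
        if self.leader_of(rnd) == self.pid:
            self.timers.append("collect-timeout")


leader = RoundChangeHandler("P0", t=1, leader_of=lambda rnd: "P0")
for sender in ("P1", "P2"):
    leader.on_round_change(sender, 0, "B")
before = list(leader.timers)            # threshold 2t+1 = 3 not yet reached
leader.on_round_change("P3", 0, "B")    # third distinct sender
```

The `locked_rounds` guard is an addition of the sketch so that late round-change messages for an already locked round do not arm the timers a second time.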
Referring to
In step 219, it is determined whether the message is a select message. In step 220, if the message is a select message, it is determined whether the message round is greater than the current round of a participant. If the message round is greater than the current round, step 221 occurs and if not then step 222 occurs. In step 221, the participant moves its current round to the message round including clearing all previous round timers and then step 222 occurs. In step 222, the participant stores the candidate block from the select message as its candidate block and enters a commit status and starts a commit timeout timer (step 237 shown in
In step 223, it is determined whether the message is a commit message. In step 224, if the message is a commit message, it may be determined whether the participant is the current participant leader (for the round). In step 225, the current participant leader determines whether the current round is the same as the round in the commit message and the current candidate block is the same as the candidate block in the commit message. If so, in step 226, the current participant leader determines whether commit messages have been received from at least 2t+1 participants. If so, in step 227, the current participant leader enters a commit status and broadcasts a decide message indicating the candidate block to other participants (step 232), and the current participant leader increments its current height by one (from the height indicated in the decide message) and then enters a round changing status.
In step 228, it may be determined whether a received message is a decide message. In step 229, if the message is a decide message, it may be determined whether the height in the message is greater than the current height stored at the participant. If so, in step 230, the participant broadcasts the decide message to other participants. In step 231, the participant decides on the candidate block for the height indicated in the decide message and increments its current height by one (from the height indicated in the decide message), and then enters a round changing status.
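The leader-side commit handling of steps 223 through 227 can be sketched as a small collector. The class and field names are hypothetical, signatures are not verified, and the height update and status change that follow the decide broadcast are deliberately left out of this sketch.

```python
from typing import Optional, Set, Tuple


class LeaderCommitCollector:
    """Hypothetical leader-side state for steps 223-227."""

    def __init__(self, t: int, height: int, rnd: int, block: str) -> None:
        self.t, self.height, self.round, self.block = t, height, rnd, block
        self.commit_senders: Set[str] = set()
        self.decided = False

    def on_commit(self, sender: str, rnd: int, block: str) -> Optional[Tuple]:
        # Step 225: ignore commits for another round or another candidate block.
        if rnd != self.round or block != self.block or self.decided:
            return None
        self.commit_senders.add(sender)
        # Step 226: wait for commits from at least 2t+1 distinct participants.
        if len(self.commit_senders) < 2 * self.t + 1:
            return None
        # Step 227: build the decide message that would be broadcast (step 232).
        self.decided = True
        return ("decide", self.height, self.round, self.block)


collector = LeaderCommitCollector(t=1, height=10, rnd=2, block="B")
results = [
    collector.on_commit("P1", 2, "B"),
    collector.on_commit("P2", 2, "C"),   # different block: ignored
    collector.on_commit("P2", 2, "B"),
    collector.on_commit("P1", 2, "B"),   # repeated sender: not counted twice
    collector.on_commit("P3", 2, "B"),   # third distinct sender reaches 2t+1
]
```

Only the last call returns a decide message; the `decided` flag is an assumption of the sketch that prevents a second decide broadcast if further commits arrive.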
After entering a round changing status, in step 233, the participant broadcasts a round-change message indicating the current (new) height and sets a round-change timeout timer (step 234), where the round-change status expires at the end of the timer.
Referring to
In step 241, if the timer is a lock release timeout timer and it expires, then the participant broadcasts a round-change message indicating a new round (e.g., increments the current round by 1) (step 242).
In step 243, if the timer is a round-change timeout timer and it expires, then the participant broadcasts a round-change message indicating a new height (e.g., increments the current height by 1) (step 244). In step 245, the participant sets a new round-change timeout timer.
In step 246, if the timer is a collect timeout timer, then before it expires, it is determined whether the participant has received round-change messages from at least 2t+1 participants, and whether these messages indicate the same candidate block B′ and B′ is not NULL (step 247). If so, in step 248, the participant broadcasts a lock message to other participants, where the lock message indicates that round-change messages indicating a same candidate block have been received from at least 2t+1 participants and, after broadcasting the lock message, the participant stops the collect timeout timer (step 249).
In step 246, if the timer is a collect timeout timer and it expires, the participant adds all received candidate blocks to its local variable BLOCKj (step 250). In step 251, the participant broadcasts a select message to other participants, where the select message indicates the maximal candidate block from the received candidate blocks and, after broadcasting the select message, the participant stops the collect timeout timer (step 249).
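The two outcomes of the collect timeout timer can be sketched as one decision function. Several details here are assumptions: the function name, the use of `None` for a NULL candidate, the label "select" for the message carrying the maximal candidate block (borrowed from the select message of Section 8.2), and plain string comparison as a stand-in for the ordering that defines the maximal block.

```python
from collections import Counter
from typing import Dict, Optional, Tuple


def collect_decision(round_changes: Dict[str, Optional[str]], t: int,
                     timer_expired: bool) -> Optional[Tuple[str, str]]:
    """Return ('lock', B) when 2t+1 senders propose the same non-NULL block,
    ('select', B_max) when the collect timer expired first, else None."""
    counts = Counter(b for b in round_changes.values() if b is not None)
    for block, votes in counts.items():
        if votes >= 2 * t + 1:
            return ("lock", block)
    if timer_expired:
        candidates = [b for b in round_changes.values() if b is not None]
        if candidates:
            return ("select", max(candidates))   # assumed ordering: string order
    return None


agree = collect_decision({"P0": "B", "P1": "B", "P2": "B", "P3": None}, 1, False)
split = collect_decision({"P0": "A", "P1": "B", "P2": "C"}, 1, False)
split_expired = collect_decision({"P0": "A", "P1": "B", "P2": "C"}, 1, True)
```

With n=3t+1 senders at most one candidate block can collect 2t+1 round-change messages, so the first match found in the loop is the only possible one.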
It will be appreciated that algorithm 200 is for illustrative purposes and that different and/or additional actions may be used. It will also be appreciated that various actions described above with regard to algorithm 200 may occur in a different order or sequence.
Referring to
BFT module 406 may include logic and/or software for performing various functions and/or operations described herein. In some embodiments,
BFT module 406 may include or utilize processor(s) 402 or other hardware to execute software and/or logic. For example, BFT module 406 may perform various functions and/or operations associated with providing BFT and/or related operations. In this example, BFT module 406 may be used in various applications, e.g., a consensus application, a blockchain application, a distributed computing application, and/or an authentication application.
In some embodiments, computer system 400 may include one or more communications interface(s) 412 for communicating with nodes, modules, and/or other entities. For example, one or more communications interface(s) 412 may be used for communications between BFT module 406 and a system operator, and a same or different communications interface may be used for communicating with other modules or network nodes.
In some embodiments, processor(s) 402 and memory 404 can be used to execute BFT module 406. In some embodiments, storage 410 can include any storage medium, storage device, or storage unit that is configured to store data accessible by processor(s) 402 via system bus 408. In some embodiments, storage 410 can include one or more databases hosted by or accessible by computer system 400.
In some embodiments, BFT module 406 may perform a method and/or technique (e.g., algorithm 200 or a variation thereof) for providing BFT in an asynchronous (e.g., partially synchronous) environment. For example, BFT module 406 may perform algorithm 200 or a variation of BDLS described herein. In this example, BFT module 406 may perform different actions based on different types of signed messages, current states, and/or various timers when reaching a consensus decision or related functionality.
In some embodiments, BFT module 406 may be associated with participants performing a distributed computing application, e.g., blockchain generation or digital currency mining. In such embodiments, BFT module 406 may utilize algorithm 200 or a similar algorithm to determine a candidate block for a given height and round. For example, computer system 400 may utilize BFT module 406 to execute a BFT protocol, wherein computer system 400 acts as a leader participant of a round in a consensus decision. In this example, computer system 400 or BFT module 406 may receive signed round-change messages from multiple participants in the round; broadcast (e.g., send to multiple participants) a signed lock message indicating that signed round-change messages indicating a same candidate block have been received from a predetermined number of participants (e.g., at least 2t+1 participants, where t represents an amount of malicious participants in the round); receive signed commit messages from multiple participants in the round; and broadcast a signed decide message indicating the candidate block is a finalized block (e.g., after a predetermined number of participants in the round have sent signed commit messages indicating the candidate block).
It will be appreciated that
In some embodiments, process 500 may include steps 502-508 and may be performed by or at one or more devices or modules, e.g., a smartphone or computer implemented using at least one processor.
In some embodiments, a computing platform may execute a BFT protocol including process 500. In such embodiments, the computing platform executing process 500 may act as a leader participant of a round of the BFT protocol, e.g., for achieving consensus in bit mining or another distributed computing application.
Referring to process 500, in step 502, signed round-change messages may be received from multiple participants in a round.
In step 504, a signed lock message indicating that signed round-change messages have been received from a predetermined number of the participants in the round voting for a same candidate block may be broadcast.
In step 506, signed commit messages may be received from multiple participants in the round.
In step 508, a signed decide message indicating the candidate block is a finalized block may be broadcast after the predetermined number of the participants in the round have sent signed commit messages indicating the candidate block.
In some embodiments, a predetermined number of the participants in a round may include at least 2t+1 participants, where t represents an amount of malicious participants in the round.
In some embodiments, a participant in the round receives the decide message from the leader participant or another participant and sends the decide message to other participants in the round.
In some embodiments, a candidate block may be a maximal acceptable candidate block for a round.
In some embodiments, a leader participant may change for a subsequent round.
In some embodiments, a round may be associated with a blockchain height and a signed decide message may indicate an agreed upon blockchain height (e.g., agreed upon by at least a predetermined number of participants).
In some embodiments, a participant in a round may utilize a round synchronization technique and a height synchronization technique, wherein the round synchronization technique involves the participant incrementing by one a current blockchain height variable associated with the participant in response to receiving the decide message, and wherein the height synchronization technique involves the participant sending a signed round-change message to the leader in response to the participant receiving a signed lock message, a commit message, or a decide message for a subsequent round relative to a current round variable associated with the participant.
In some embodiments, a participant in a round may utilize one or more timers, wherein the one or more timers may include an operation timeout timer, a round changing status timer, a lock status timer, a commit status timer, or a lock release status timer.
In some embodiments, a participant in a round may utilize an application programming interface (API) for obtaining a participant list for the round or a related blockchain height.
In some embodiments, a participant in a round may check a local participant list after receiving a BFT related message.
It will be appreciated that process 500 is for illustrative purposes and that different and/or additional actions may be used. It will also be appreciated that various actions described herein may occur in a different order or sequence.
It should be noted that computer system 400, BFT module 406, and/or functionality described herein may constitute a special purpose computing device. Further, system 400, BFT module 406, and/or functionality described herein can improve the technological field of BFT and/or related consensus applications (e.g., blockchain applications, distributed data storage applications, etc.), by providing mechanisms and/or techniques for providing BFT using algorithm 200 or similar functionality. As such, various BFT techniques and/or mechanisms described herein can provide improved BFT relative to some existing BFT protocols. For example, such BFT techniques and/or mechanisms described herein, e.g., BDLS or algorithm 200, can provide improved liveness and safety in Type II partial synchronous networks and/or other distributed networks.
The disclosure of each of the following references is incorporated herein by reference in its entirety to the extent not inconsistent herewith and to the extent that it supplements, explains, provides a background for, or teaches methods, techniques, and/or systems employed herein.
8.4 Performance Evaluation
In this section, performance of the BDLS consensus algorithm with a Pacemaker module in Section 8.2 implemented using Go Programming Language is evaluated. The implementation is based on algorithm 200 depicted in
A first testing platform utilized for evaluating an implementation of the BDLS consensus algorithm includes an AMD Ryzen 7 2700X eight-core processor with 64 gigabyte (GB) RAM and Linux 4.19.84-microsoft-standard operating system. A second testing platform utilized for evaluating an implementation of the BDLS consensus algorithm includes a BCM2835 Broadcom chip with 4 cores and 1 GB RAM and a Linux raspberry pi 4.19.75-v7I+ operating system (e.g., for approximating performance of the BFT implementation during a heavy load scenario).
Using the two testing platforms, scenarios involving 20 participants, 30 participants, 50 participants, 80 participants, and 100 participants were tested.
During testing, various network scenarios were simulated by changing values for the following parameters:
8.5 Static and Dynamic BFT Participants
For blockchain environments, the BFT participants may change from height to height (or even from round to round). In such embodiments, to obtain the BFT participant team, each participant may use an API call to obtain the participant list for the height h before submitting the round-change message for a new height h. However, for a permissionless blockchain, the full participant list may not be available at the time when it submits the round-change message. Thus each time, when a participant receives a BFT message, the participant may check whether the sender of the message is in its local list of participants or not. If not, the participant may use an API to check whether the sender is a qualified participant for this height or not. If the sender is a qualified participant, the participant may expand its participant list and adjust the parameters accordingly.
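The sender check for dynamic participant sets can be sketched as a local list that grows on demand. Everything named here is hypothetical: the `ParticipantList` class, the `is_qualified` callback standing in for the API call, and the `registry` set used as fake API data. Recomputing t as ⌊(n−1)/3⌋ is one way to "adjust the parameters" and follows from n=3t+1.

```python
from typing import Callable, Iterable, Set


class ParticipantList:
    """Local participant list for one height, extended on demand."""

    def __init__(self, height: int, known: Iterable[str],
                 is_qualified: Callable[[str, int], bool]) -> None:
        self.height = height
        self.members: Set[str] = set(known)
        self.is_qualified = is_qualified   # stand-in for the API call

    @property
    def t(self) -> int:
        """Tolerated faults, assuming n = 3t+1 (rounded down otherwise)."""
        return (len(self.members) - 1) // 3

    def accept(self, sender: str) -> bool:
        if sender in self.members:
            return True                    # already in the local list
        if self.is_qualified(sender, self.height):
            self.members.add(sender)       # expand the list; t follows via the property
            return True
        return False                       # unknown and not qualified: reject


registry = {"P0", "P1", "P2", "P3", "P4", "P5", "P6"}   # hypothetical API data
plist = ParticipantList(12, ["P0", "P1", "P2", "P3"], lambda s, h: s in registry)
t_before = plist.t
accepted = [plist.accept(s) for s in ("P1", "P4", "P5", "P6", "X9")]
```

Here the list grows from four to seven members, so the tolerated number of faults moves from 1 to 2 and the 2t+1 thresholds used elsewhere in the protocol change with it.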
On the other hand, some applications of the BDLS BFT protocol may involve static BFT participants. To make the BDLS package more efficient for these applications, one may use an API call to check whether BFT participants change from round to round. If the participant list does not change, the BDLS protocol may not carry out the extra checks discussed in the preceding paragraph.
9 Importance of Propagating Decision Messages
During Step 5 of the BDLS protocol, when a participant receives a decide message, it propagates the decide message to its neighbors. In this section, we show the importance of this process by the potential issues for the HotStuff protocol since it does not have this decision message propagation process.
9.1 HotStuff BFT Protocol
HotStuff BFT [20] includes basic HotStuff protocol and chained HotStuff protocol. For simplicity, we only review the basic HotStuff BFT protocol. Similar to PBFT and Tendermint BFT, there are n=3t+1 participants P0, . . . , Pn−1 and at most t of them are malicious. The view is defined and changes in the same way as in PBFT. The major differences between PBFT and HotStuff BFT are:
With these two differences, HotStuff achieves authenticator complexity O(n) for both the correct leader scenario and the faulty leader scenario. On the other hand, the corresponding authenticator complexity for PBFT is O(n²) for the correct leader scenario and O(n³) for the faulty leader scenario. For simplicity, we will describe the HotStuff BFT protocol using a standard digital signature scheme instead of threshold digital signature schemes. Our analysis does not depend on the underlying signature schemes.
HotStuff BFT has revised the validRound and lockedRound variables in Tendermint BFT to its prepareQC and lockedQC variables respectively. Though Tendermint BFT participants set the values for two variables in the same phase, HotStuff BFT participants set the values for these variables in different steps.
In HotStuff BFT, each participant stores a tree of pending commands as its local data structure and keeps the following state variables: viewNumber (initially 1), prepareQC (initially nil, storing the highest QC for which it voted pre-commit), and lockedQC (initially nil, storing the highest QC for which it voted commit).
Each time a new view starts, each participant should send its prepareQC variable to the leader. There is a public function LEADER(viewNumber) that determines the current leader participant. When a client sends an operation request m to the leader Pi, the n participants carry out the four phases of the BFT protocol: prepare, pre-commit, commit, and decide.
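The per-participant state and the leader function described above may be sketched as follows. The field names mirror the variables in the text; the QC record layout, the new_view_message helper, and the round-robin rule used for LEADER(viewNumber) are assumptions made for illustration, since the text only requires that the function be public and deterministic.

```python
from dataclasses import dataclass
from typing import Optional

# The four phases carried out for each client request.
PHASES = ("prepare", "pre-commit", "commit", "decide")


@dataclass
class QC:
    """A quorum certificate (hypothetical layout for illustration)."""
    phase: str
    view_number: int
    node: str


@dataclass
class ReplicaState:
    view_number: int = 1              # viewNumber, initially 1
    prepare_qc: Optional[QC] = None   # highest QC voted pre-commit, initially nil
    locked_qc: Optional[QC] = None    # highest QC voted commit, initially nil


def leader(view_number: int, n: int) -> int:
    """LEADER(viewNumber): index of the leader (assumed round-robin rule)."""
    return view_number % n


def new_view_message(state: ReplicaState) -> dict:
    """Message sent to the leader at the start of a view, carrying prepareQC."""
    return {"type": "new-view", "view": state.view_number,
            "justify": state.prepare_qc}
```

With n=4 participants under the assumed round-robin rule, view 1 is led by P1 and view 4 by P0.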
9.2 What Happens if Leader Does not Reliably Broadcast Decide Messages in HotStuff
In the following, we describe three scenarios with completely different semantics where the client receives different responses. However, the HotStuff trees are identical for these three scenarios. First assume that at the end of view v−1, we have lockedQC=prepareQC and the HotStuff path corresponding to lockedQC.node is a0→a1→al where a0 is the root.
Assume that the views v and v+1 are executed before GST. That is, the broadcast channel is not reliable before the end of view v+1. Assume that the leader for view v is Pi and the leader for view v+1 is Pi′. Furthermore, assume that both Pi and Pi′ are malicious.
Scenario I: The leader Pi for view v receives 2t+1 new-view messages that contain the identical highQC=prepareQC with the corresponding path a0→a1→al. Pi extends the path to the new path a0→a1→al→b and creates a proposal for the new leaf node b. Pi then broadcasts the digitally signed new leaf node proposal (together with highQC) to all participants in a prepare message. All participants accept this new leaf node proposal, and each sends a signed prepare vote message to Pi. In the pre-commit phase, Pi receives 2t+1 prepare votes for the current proposal, combines them into a prepareQC, and broadcasts the prepareQC in a pre-commit message to all participants. All participants set their prepareQC variable to this received prepareQC value and vote for it by sending the signed prepareQC back to Pi. During the commit phase, Pi receives 2t+1 pre-commit votes, combines them into a precommitQC, and broadcasts it in a commit message. All participants set their lockedQC variable to this received precommitQC value and vote for it by sending the signed precommitQC back to Pi. In the decide phase, Pi receives 2t+1 commit votes and combines them into a commitQC. Pi sends the commitQC only to one honest participant Pj and to no one else. After a timeout, view v+1 starts. During view v+1, the leader participant extends the path a0→a1→al→b to a0→a1→al→b→c by including a new client command in the node c. Assume that all messages during view v+1 are delivered and all participants behave honestly. Thus, at the end of view v+1, all participants except Pj have executed only the commands contained in the node c, whereas Pj has executed the commands contained in both b and c. Since the client received only one response (from Pj) indicating that the commands in node b were executed, the client will not accept it.
Scenario II: In this scenario, the leader participant Pi for view v does not send any decide message in the last step of view v. All other steps are identical to those of Scenario I. Thus, at the end of view v+1, all participants have executed the command contained in the node c, but no participant has executed the command contained in the node b.
Scenario III: In this scenario, the leader participant Pi for view v sends the decide message to all participants in the last step of view v. All other steps are identical to those of Scenario I. Thus, at the end of view v+1, all participants have executed the commands contained in the nodes b and c.
In all three scenarios, the path corresponding to the prepareQC at the end of view v+1 is a0→a1→al→b→c, even though the internal states of the honest participants differ.
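The three scenarios may be compared side by side with a small sketch. It reproduces the outcomes stated in the scenarios rather than re-deriving them from the HotStuff protocol rules; the function name run_scenario, the participant labels Pk and Pm, and the rule that a participant executes b only if it receives the decide message for view v are modeling assumptions.

```python
from typing import Dict, List, Set

# Path corresponding to prepareQC at the end of view v+1, in every scenario.
FINAL_PATH = ["a0", "a1", "al", "b", "c"]


def run_scenario(decide_recipients: Set[str],
                 honest: List[str]) -> Dict[str, List[str]]:
    """Return the commands each honest participant has executed after view v+1."""
    executed: Dict[str, List[str]] = {p: [] for p in honest}
    # View v: only participants that receive the decide message execute b.
    for p in honest:
        if p in decide_recipients:
            executed[p].append("b")
    # View v+1: all messages are delivered, so every participant executes c.
    for p in honest:
        executed[p].append("c")
    return executed


honest = ["Pj", "Pk", "Pm"]
scenario_1 = run_scenario({"Pj"}, honest)      # decide sent to Pj only
scenario_2 = run_scenario(set(), honest)       # decide sent to nobody
scenario_3 = run_scenario(set(honest), honest) # decide sent to everyone
```

The three resulting execution records differ, while FINAL_PATH is the same constant in each case, which is the ambiguity the scenarios are meant to expose.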
In the HotStuff BFT protocol [20], it is mentioned that “[i]n practice, a recipient who falls behind can catch up by fetching missing nodes from other replicas”. For all three of the scenarios that we have described, at the end of view v+1, a participant who falls behind may fetch the prepareQC corresponding to the path a0→a1→al→b→c. However, that participant does not know which scenario has happened. It should be noted that in the HotStuff BFT protocol, a node on the tree contains only the following information: the hash of the parent node and the client command. It does not contain any information about whether the command has been executed. Our analysis shows that it is important to include in the tree node whether a given command has been executed.
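The limitation noted above may be made concrete with a sketch of a tree node that holds only the parent hash and the client command. The Node class, the use of SHA-256, and the "|" separator are assumptions made for illustration and are not taken from the HotStuff implementation.

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Node:
    """A tree node holding only the parent hash and the client command."""
    parent_hash: str
    command: str

    def digest(self) -> str:
        # Hash over the node's two fields (assumed encoding).
        data = (self.parent_hash + "|" + self.command).encode()
        return hashlib.sha256(data).hexdigest()


def build_path(commands: List[str]) -> List[Node]:
    """Chain nodes from the root so that each node stores its parent's hash."""
    path: List[Node] = []
    parent = ""  # the root has no parent
    for command in commands:
        node = Node(parent, command)
        path.append(node)
        parent = node.digest()
    return path


# The path a participant fetches while catching up; it is built from the
# commands alone, so it is the same whichever commands were executed.
fetched = build_path(["a0", "a1", "al", "b", "c"])
```

Because the node contents depend only on the parent hash and the command, the fetched path carries no trace of whether the command in node b was executed.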
[20] M. Yin, D. Malkhi, M. K. Reiter, G. Golan Gueta, and I. Abraham. HotStuff: BFT consensus in the lens of blockchain. arXiv preprint arXiv:1803.05069, 2018.
It will be understood that various details of the subject matter described herein may be changed without departing from the scope of the subject matter described herein. Furthermore, the foregoing description is for the purpose of illustration only, and not for the purpose of limitation, as the subject matter described herein is defined by the claims as set forth hereinafter.
This application relates to and claims priority to U.S. Provisional Patent Application Ser. No. 62/877,942 filed Jul. 24, 2019 and 62/948,752 filed Dec. 16, 2019, the disclosures of which are incorporated by reference herein in their entireties.
Number | Date | Country
---|---|---
62877942 | Jul 2019 | US
62948752 | Dec 2019 | US