In a software program, conditional branch instructions comprise a significant percentage of all instructions performed by a control processor. Execution of a conditional branch instruction usually involves a number of steps: decoding the branch instruction, evaluating the condition posed by the branch instruction, using the result of the evaluation to determine whether the next instruction is the next sequential instruction or the targeted instruction of the branch, and fetching the next instruction. For pipelined control processors, executing all the steps of a branch instruction may require a number of clock cycles. Hence, the overall performance degradation may be significant when a large number of branches are executed.
A common method of improving performance is to employ a branch history table (BHT) or branch prediction buffer to predict whether a branch is taken prior to evaluation of the conditions associated with the branch decision. A branch address is mapped to one of the entries of the BHT, and the value associated with the entry predicts whether the branch is taken or not. A typical BHT contains thousands of entries, each of which may contain a few bits. The number of bits determines the prediction scheme used. For example, there are two prediction states if one bit is used, while there are four prediction states when two bits are used. In the one-bit prediction scheme, each entry remembers the result of the last branch that mapped to it. When the entry is accessed again, the same result will be predicted. However, if the prediction is incorrect, the value stored in the entry will be corrected.
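The one-bit scheme described above can be sketched as follows. This is a minimal illustration only, not the implementation of any particular processor: the class name `OneBitBHT`, the default of 10 index bits, the initial "not taken" value, and the choice to drop the two low-order address bits (assuming 4-byte aligned instructions) are all assumptions made for the example.

```python
class OneBitBHT:
    """One-bit branch history table: each entry stores the last outcome seen."""

    def __init__(self, index_bits=10):
        # index_bits is a hypothetical table size parameter (2**10 entries).
        self.mask = (1 << index_bits) - 1
        # Assumed initial value: every entry predicts "not taken".
        self.table = [False] * (1 << index_bits)

    def _index(self, branch_address):
        # Assumption: instructions are 4-byte aligned, so bits [1:0] are dropped.
        return (branch_address >> 2) & self.mask

    def predict(self, branch_address):
        # Predict the same outcome that was last recorded in this entry.
        return self.table[self._index(branch_address)]

    def update(self, branch_address, taken):
        # Record the actual outcome; this corrects the entry after a misprediction.
        self.table[self._index(branch_address)] = taken


bht = OneBitBHT()
addr = 0x4000
first_guess = bht.predict(addr)    # entry still holds its initial value
bht.update(addr, True)             # the branch was actually taken
second_guess = bht.predict(addr)   # the entry now predicts taken
alias_guess = bht.predict(addr + (1 << 12))  # a different branch mapping to the same entry
```

Because the table is indexed by only part of the address, two different branches can map to the same entry, as `alias_guess` shows; this aliasing is the situation discussed in the following paragraphs.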
There are a number of advantages associated with prediction schemes employing more than one bit, such as a two-bit prediction scheme. For example, a two-bit scheme is more accurate when two conditional branch instructions map to the same entry and 1) one of the instructions is executed much more frequently than the other instruction, and 2) the instruction that is executed much more frequently usually results in one of the possible outcomes.
In another instance, a two-bit prediction scheme yields only marginal performance improvement when a single conditional branch instruction that almost always results in the same outcome maps to an entry. It is found that the performance of the two-bit scheme is only slightly better than that of the one-bit scheme in this instance; as a consequence, it may not be cost effective to employ twice the memory for implementing the BHT as compared to that of a one-bit prediction scheme. The additional bit provided by a two-bit scheme contributes to performance when the actual results of a particular branch instruction are unstable. However, the improvement may not warrant the increase in memory required to implement a typical two-bit prediction scheme.
Further limitations and disadvantages of conventional and traditional approaches will become apparent to one of skill in the art, through comparison of such systems with some aspects of the present invention as set forth in the remainder of the present application with reference to the drawings.
Aspects of the invention provide for a method and system of reducing the size of the memory used to implement a predictive scheme for the execution of conditional branch instructions.
In one embodiment, a method of predicting the next state of a conditional branch instruction involves sharing one or more indexed entries in a bit array of a branch history table (BHT) used in implementing a two-bit predictive scheme. Aspects of the present invention incorporate the use of the BHT to predict whether a conditional branch instruction is taken or not taken. The BHT utilizes four states, in which the branch is predicted to be strongly taken, taken, not taken, or strongly not taken.
In one embodiment, a method of predicting the next state of a conditional branch instruction comprises indexing a branch history table comprising a first bit array and a second bit array, in which the second bit array contains a fraction of the number of entries of the first bit array. The number of entries contained in the second bit array may be one-half, one-quarter, or one-eighth the number of entries contained in the first bit array.
In one embodiment, a method of predicting the next state of a conditional branch instruction is performed by mapping a second bit array using a subset of bits used for mapping a first bit array. In one embodiment, the number of bits used for mapping the first bit array exceeds the subset of bits used for mapping the second bit array by one. In one embodiment, the number of bits used for mapping the first bit array exceeds the subset of bits used for mapping the second bit array by two. In yet another embodiment, the number of bits used for mapping the first bit array exceeds the subset of bits used for mapping the second bit array by three.
In yet another embodiment, a system for predicting the next state of a conditional branch instruction is composed of a bit array containing a fraction of a number of entries contained in one or more bit arrays.
In one embodiment, a system for predicting the next state of a conditional branch instruction is composed of a first bit array containing a number of entries and a second bit array containing a fraction of the number of entries. In one embodiment, the fraction of the number of entries is equal to one half of the number of entries. In one embodiment, the fraction of the number of entries is equal to one quarter of the number of entries. In yet another embodiment, the fraction of the number of entries is equal to one eighth of the number of entries.
These and other advantages, aspects, and novel features of the present invention, as well as details of illustrated embodiments thereof, will be more fully understood from the following description and drawings.
Aspects of the present invention may be found in a method and system to implement a branch prediction scheme used when a branch instruction, such as a conditional branch instruction, is executed in a software program. A two-bit prediction scheme is presented that utilizes less memory space than that required in a typical two-bit prediction scheme. The two-bit prediction scheme reduces memory space by implementing a second bit array with a fraction of the number of entries implemented in a first bit array. Aspects of the invention provide for a system that uses a fraction of the memory space previously used for addressing entries in a branch history table of a typical two-bit prediction scheme. In one embodiment, the system employs half the number of entries typically used to address a second bit array of a two-bit prediction scheme. In another embodiment, the system may employ one quarter (25%) of the number of entries typically used to address the second bit of a two-bit prediction scheme. In yet another embodiment, the system may employ one eighth (12.5%) of the number of entries typically used to address the second bit of a two-bit prediction scheme. As a result of the decreased number of addressable entries, the memory size required to implement a two-bit branch history table is reduced.
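The memory saving can be illustrated with a short calculation. The sketch below assumes one bit per entry in each bit array and a hypothetical first bit array of 4096 entries; the function name `bht_bits` and the variable names are chosen for illustration and do not appear in the description above.

```python
def bht_bits(first_entries, shift):
    """Total bits for a BHT whose second bit array has first_entries / 2**shift entries.

    Assumes one bit per entry in both the first and the second bit array.
    """
    second_entries = first_entries >> shift
    return first_entries + second_entries


N = 4096                      # hypothetical number of entries in the first bit array
conventional = bht_bits(N, 0)  # typical two-bit scheme: both arrays have N entries

# shift = 1, 2, 3 corresponds to a second bit array of one half,
# one quarter, and one eighth the size of the first bit array.
totals = {shift: bht_bits(N, shift) for shift in (1, 2, 3)}
fraction_of_conventional = {s: t / conventional for s, t in totals.items()}
```

Under these assumptions the total storage falls from 2N bits to 1.5N, 1.25N, and 1.125N bits respectively, that is, to 75%, 62.5%, and 56.25% of the storage of a typical two-bit table.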
As shown in
For example, when the actual result or outcome is “not taken”, a state transition occurs from state 00 (strongly taken) to state 01 (taken) if the initial state was 00 (strongly taken). As a result, the prediction states will be influenced by the actual outcomes of a conditional branch instruction. It is to be understood that with an n-bit prediction scheme, the number of possible prediction states will equal the value 2^n. For example, there are four prediction states when n=2. Of course, the number of prediction states (and the number of bits used to implement the prediction scheme) may vary depending on a particular implementation.
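A state update of this kind may be sketched as below. Only the transition from 00 to 01 on a "not taken" outcome is stated in the text; the remaining transitions are an assumption, modeled here as a conventional saturating counter that moves one state per outcome. The state encoding (00 strongly taken, 01 taken, 10 not taken, 11 strongly not taken) follows the text, and the function names are hypothetical.

```python
# State encoding taken from the description.
STRONGLY_TAKEN, TAKEN, NOT_TAKEN, STRONGLY_NOT_TAKEN = 0b00, 0b01, 0b10, 0b11


def next_state(state, taken):
    """Return the next prediction state given the actual branch outcome.

    Assumption: a saturating counter that steps one state toward the
    observed outcome and stays put at either end.
    """
    if taken:
        return max(state - 1, STRONGLY_TAKEN)
    return min(state + 1, STRONGLY_NOT_TAKEN)


def predict_taken(state):
    """States 00 and 01 predict taken; states 10 and 11 predict not taken."""
    return state in (STRONGLY_TAKEN, TAKEN)


# The transition given in the text: 00 (strongly taken) -> 01 (taken) on "not taken".
state = next_state(STRONGLY_TAKEN, taken=False)
still_predicts_taken = predict_taken(state)
```

With this assumed transition rule, a single contrary outcome in a strong state changes the state but not the predicted direction, which is why the second bit matters mainly when outcomes are unstable.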
Because the actual results are used to correct or update the prediction states, a conditional branch instruction that is encountered in the future will utilize the updated prediction states. The use of such predictive schemes, as described in
In a number of circumstances, there are advantages associated with the use of more than a single bit in a conditional branch instruction predictive scheme. This may occur, for example, when two conditional branch instructions map to the same entry, such that 1) one of the instructions is executed at a higher frequency than the other instruction, and 2) the instruction that is executed at a higher frequency generates a particular outcome more frequently than the other possible outcomes; in one instance, for example, the executed instruction results in an outcome that is always taken.
In another instance, a conditional branch instruction maps to an entry that usually generates the same outcome. Typically, the second bit provides a more significant effect in a two-bit predictive scheme when an outcome of a conditional branch instruction is unstable. However, in most instances, the outcomes of conditional branch instructions are very stable and predictable. Although it is found that the performance of a two-bit predictive scheme is slightly better than that provided by a one-bit predictive scheme, the performance improvement may not justify the cost of implementing the second bit. As a consequence, aspects of the present invention provide for a method and system to implement a two-bit scheme utilizing a fraction of the memory space typically used to implement the BHT of the second bit.
Likewise, the second bit array 6 is mapped using bits [m−1:2] of the branch addresses 1, 2 associated with their respective conditional branch instructions. In this example, the bits [m−1:2] associated with branch addresses 1, 2 of each branch instruction map to the same indexed entry 9 in the second bit array 6. However, the value for bit m may differ between the two addresses, so that they map to two different indexed entries (indicated by reference numbers 7, 8) in the first bit array 5 previously described. For example, if the values for indexed entries 7, 8, 9 are 0, 1, 0, respectively, then the predicted result of instruction a (0,0) is taken while the predicted result of instruction b (1,0) is not taken. The embodiment illustrates how the addresses 1, 2 associated with two different conditional branch instructions may share the same entry in the second bit array 6. As a result, the memory size of a bit array may be reduced based on the number of entries addressed. In this embodiment, the second bit array is configured to be one-half the size of the first bit array.
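The shared lookup described above may be sketched as follows. This is an illustration under stated assumptions rather than the disclosed hardware: the class name `SharedTwoBitBHT`, the parameter `dropped_bits`, the value m = 6, and the example addresses are hypothetical. The first bit array is indexed by address bits [m:2], the second by a subset of those bits, and only the lookup is shown; how entries are updated is not modeled here.

```python
class SharedTwoBitBHT:
    """Two-bit BHT whose second bit array is indexed by fewer address bits."""

    def __init__(self, m, dropped_bits=1):
        # First bit array: indexed by address bits [m:2], i.e. m - 1 index bits.
        self.first_index_bits = m - 1
        # Second bit array: bits [m - dropped_bits:2]; dropped_bits = 1 halves its size.
        self.second_index_bits = self.first_index_bits - dropped_bits
        self.first = [0] * (1 << self.first_index_bits)
        self.second = [0] * (1 << self.second_index_bits)

    def first_index(self, addr):
        return (addr >> 2) & ((1 << self.first_index_bits) - 1)

    def second_index(self, addr):
        return (addr >> 2) & ((1 << self.second_index_bits) - 1)

    def state(self, addr):
        # (first bit, second bit) as in the (0,0) / (1,0) notation of the text.
        return (self.first[self.first_index(addr)],
                self.second[self.second_index(addr)])

    def predict_taken(self, addr):
        # States 00 and 01 predict taken, so the first bit decides the direction.
        return self.first[self.first_index(addr)] == 0


bht = SharedTwoBitBHT(m=6)           # 32 first-array entries, 16 second-array entries
addr_a = 0b0010100                   # hypothetical address with bit m = 0
addr_b = addr_a | (1 << 6)           # same bits [m-1:2], but bit m = 1
bht.first[bht.first_index(addr_b)] = 1   # first-array values 0 and 1; shared second-array value 0

state_a = bht.state(addr_a)
state_b = bht.state(addr_b)
```

Here the two addresses select different entries in the first bit array but the same entry in the second, reproducing the (0,0) taken and (1,0) not taken predictions of the example. Setting `dropped_bits` to 2 or 3 would give a second bit array of one quarter or one eighth the size.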
By sharing one or more entries of a bit array, it is possible to further reduce the memory size requirements in a two-bit predictive scheme. For example, if the second bit array 6 is mapped using bits [m−2:2] of the [m:2] bits used in addressing the two-bit BHT shown in
While the invention has been described with reference to certain embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the scope of the invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the invention without departing from its scope. Therefore, it is intended that the invention not be limited to the particular embodiment disclosed, but that the invention will include all embodiments falling within the scope of the appended claims.
This application makes reference to and claims priority from U.S. Provisional Patent Application Ser. No. 60/486,997, entitled “Shared Two-Bit Branch Prediction”, filed on Jul. 14, 2003, the complete subject matter of which is incorporated herein by reference in its entirety.
Number | Date | Country
---|---|---
60486997 | Jul 2003 | US