The present disclosure relates to a data processing apparatus. More particularly, it relates to branch prediction in a data processing apparatus.
A data processing apparatus which performs data processing operations in response to executed data processing instructions may be provided with branch prediction capability in order to predict whether a branch defined in a branch instruction is likely to be taken or not. This capability allows a target instruction to be retrieved earlier from a target address, in particular before it is definitively known whether the branch will be taken or not. This mitigates the latency associated with retrieving the target instruction from the target address in memory.
One approach to branch prediction is to combine relative weights stored in a number of tables to generate a combined value which, if it exceeds a threshold, causes the branch to be predicted as taken. One example of such a table is a bias table providing a prediction based on the address of the branch instruction. However, the limited space typically available for storing such tables can result in conflicts (aliasing), since several different branches can share the same entry.
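Purely by way of illustration, the sketch below (not taken from any particular implementation) shows why a conventional bias table aliases: the index is formed from only a few low-order bits of the branch address, so distinct branches whose addresses share those bits land on the same weight. The 256-entry table size is an assumption chosen for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <iostream>

// Illustrative only: a conventional bias table indexed by the low-order bits
// of the branch address. The 256-entry size is an assumption for the sketch.
constexpr std::size_t kBiasTableSize = 256;

std::size_t bias_index(std::uint64_t branch_address) {
    return branch_address % kBiasTableSize;  // only low-order bits are used
}

int main() {
    // Two different branches alias onto the same bias table entry.
    std::cout << bias_index(0x1040) << " " << bias_index(0x2040) << "\n";  // 64 64
}
```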
It would be desirable to provide an improved technique for branch prediction.
Some embodiments provide an apparatus comprising: branch target storage to store entries comprising indications of branch instruction source addresses and indications of branch instruction target addresses, wherein the entries each further comprise a bias weight; history storage to store history-based weights for branch instruction source addresses, wherein a history-based weight is dependent on whether a branch to a branch instruction target address from a branch instruction source address has previously been taken for at least one previous encounter with the branch instruction source address; and prediction generation circuitry to receive the bias weight and the history-based weight of the branch instruction source address and to generate either a taken prediction or a not-taken prediction for the branch.
Some embodiments provide a method of branch target prediction in a data processing apparatus comprising the steps of: storing entries in branch target storage comprising indications of branch instruction source addresses and indications of branch instruction target addresses, wherein the entries each further comprise a bias weight; storing in history storage history-based weights for branch instruction source addresses, wherein a history-based weight is dependent on whether a branch to a branch instruction target address from a branch instruction source address has previously been taken for at least one previous encounter with the branch instruction source address; and receiving the bias weight and the history-based weight of the branch instruction source address and generating either a taken prediction or a not-taken prediction for the branch.
Some embodiments provide an apparatus comprising: means for storing entries comprising indications of branch instruction source addresses and indications of branch instruction target addresses, wherein the entries each further comprise a bias weight; means for storing history-based weights for branch instruction source addresses, wherein a history-based weight is dependent on whether a branch to a branch instruction target address from a branch instruction source address has previously been taken for at least one previous encounter with the branch instruction source address; and means for receiving the bias weight and the history-based weight of the branch instruction source address and for generating either a taken prediction or a not-taken prediction for the branch.
The present techniques will be described further, by way of example only, with reference to embodiments thereof as illustrated in the accompanying drawings.
Some embodiments provide apparatus comprising: branch target storage to store entries comprising indications of branch instruction source addresses and indications of branch instruction target addresses, wherein the entries each further comprise a bias weight; history storage to store history-based weights for branch instruction source addresses, wherein a history-based weight is dependent on whether a branch to a branch instruction target address from a branch instruction source address has previously been taken for at least one previous encounter with the branch instruction source address; and prediction generation circuitry to receive the bias weight and the history-based weight of the branch instruction source address and to generate either a taken prediction or a not-taken prediction for the branch.
The present techniques recognise that increasing the size of a table in order to seek to avoid aliasing is undesirable because it requires greater area to be occupied in the apparatus, increases power consumption, and may lower the maximum operating frequency of the apparatus. The present techniques also recognise that whilst less aliasing may be achieved by careful tuning of the indexing function into a table, it is typically nonetheless very difficult to avoid a small number of branches still conflicting. Storing the bias weight as an additional component of the entries in a branch target storage which already stores corresponding source and target addresses not only allows very efficient storage of these bias weights, but also, because the entire source address is matched, removes the potential for aliasing. For example, in the context of a relatively small branch target storage (such as a micro branch target buffer (μBTB)), which may only store say 64 separate entries, this allows 64 different bias weights to be efficiently stored and a conflict-free “bi-modal” weight (i.e. that generated in the prediction generation circuitry) to be provided. By contrast, using a standard (separate) bias table, significantly more than 64 weights, and a well-tuned indexing function, would be required to achieve even near conflict-free prediction. The prediction generation circuitry can make use of the bias weight and history-based weight in a number of ways in dependence on, for example, the relative importance of these weights, their empirically found ability to make accurate branch predictions, and so on.
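As a purely illustrative software model (not the disclosed hardware), the following sketch shows how a small branch target storage might carry a bias weight in each entry and match the entire source address on lookup. The 64-entry size, field widths and names are assumptions drawn only from the example above.

```cpp
#include <array>
#include <cstdint>
#include <optional>

// Illustrative model of a 64-entry branch target storage (e.g. a uBTB) whose
// entries also hold a per-branch bias weight. Field widths are assumptions.
struct BtbEntry {
    std::uint64_t source = 0;   // full branch instruction source address
    std::uint64_t target = 0;   // branch instruction target address
    std::int8_t   bias   = 0;   // signed bias weight for this branch
    bool          valid  = false;
};

class BranchTargetStorage {
public:
    // The entire source address is matched, so two different branches can
    // never share (alias onto) the same bias weight.
    std::optional<BtbEntry> lookup(std::uint64_t source) const {
        for (const auto& e : entries_) {
            if (e.valid && e.source == source) return e;
        }
        return std::nullopt;
    }

private:
    std::array<BtbEntry, 64> entries_{};
};
```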
In some embodiments the prediction generation circuitry is capable of combining the bias weight and the history-based weight to give a combined value and of generating either the taken prediction or the not-taken prediction for the branch in dependence on the combined value. Thus, rather than using only one of the bias weight and the history-based weight to generate the branch prediction, a combination of both is used, allowing the prediction to benefit from situations in which the bias weight provides a more accurate branch prediction than the history-based weight, and vice versa.
The particular manner in which the bias weight and the history-based weight are combined may take a variety of forms, but in some embodiments the prediction generation circuitry comprises addition circuitry to add the bias weight and the history-based weight to produce the combined value as a sum, and threshold circuitry responsive to the sum to generate the taken prediction if the sum exceeds a threshold value. Accordingly, a perceptron-type predictor can be provided, and efficiently implemented in terms of the storage space required to support it.
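A minimal sketch of this perceptron-style combination, assuming plain integer weights and a threshold of zero (both assumptions rather than disclosed values):

```cpp
// Minimal sketch: the addition circuitry sums the bias weight and the
// history-based weight, and the threshold circuitry predicts "taken" when
// the sum exceeds the threshold. The threshold of zero is an assumption.
bool predict_taken(int bias_weight, int history_weight, int threshold = 0) {
    const int sum = bias_weight + history_weight;  // addition circuitry
    return sum > threshold;                        // threshold circuitry
}
```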
Whilst the apparatus may make use of only one “bias table” (provided by the branch target storage) and one history storage, in some embodiments the apparatus further comprises at least one further storage to store at least one further set of weights, wherein the at least one further storage is responsive to a function of at least one of: the branch instruction source address, a global history value, and path information to select a further weight from the at least one further set of weights, and the prediction generation circuitry is capable of combining the further weight with the bias weight and the history-based weight and of generating either the taken prediction or the not-taken prediction for the branch. The provision of more than one history storage in this manner can for example allow different weights to be selected as a function of different items of information or combinations thereof (such as partial addresses, global history, path history, and so on), and a more adaptable predictor is thus provided, since it can respond differently to a greater range of contexts for a given branch instruction.
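One way this might look in a software model is sketched below; the XOR-based index function, table sizes and zero threshold are assumptions chosen only for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch of further weight storages, each indexed by some function of the
// branch source address, global history and/or path information. The XOR
// index and zero threshold are assumptions.
struct FurtherWeightTable {
    std::vector<int> weights;

    std::size_t index(std::uint64_t source, std::uint64_t global_history,
                      std::uint64_t path) const {
        return (source ^ global_history ^ path) % weights.size();
    }
};

bool predict_taken(int bias_weight, int history_weight,
                   const std::vector<FurtherWeightTable>& further_tables,
                   std::uint64_t source, std::uint64_t global_history,
                   std::uint64_t path) {
    int sum = bias_weight + history_weight;
    for (const auto& table : further_tables) {
        sum += table.weights[table.index(source, global_history, path)];
    }
    return sum > 0;  // assumed threshold of zero
}
```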
In some embodiments the apparatus further comprises weight update circuitry responsive to an outcome of the branch to update the bias weight stored in the branch target storage for the branch instruction source address. This enables the branch prediction circuitry to provide a branch prediction which has been dynamically updated during normal operation to take into account the correctness of its previous branch predictions in that normal operation (rather than, for example, as defined by a pre-configuration before normal operation began), which can improve the accuracy of the further branch predictions which it provides.
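A sketch of such an update, assuming a saturating signed weight in the range −31 to +31 (the bounds and the step size are assumptions, not disclosed values):

```cpp
#include <algorithm>
#include <cstdint>

// Sketch of the weight update circuitry: once the branch outcome is known,
// the stored bias weight is nudged towards that outcome, saturating at
// assumed bounds of -31..+31.
void update_bias(std::int8_t& bias, bool taken) {
    const int updated = bias + (taken ? 1 : -1);
    bias = static_cast<std::int8_t>(std::clamp(updated, -31, 31));
}
```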
The history-based weight may be selected from the history storage in a variety of ways, but in some embodiments the apparatus further comprises a global history storage to store a global history value for branches encountered, and index generation circuitry to combine an at least partial branch instruction source address and the global history value to generate a history index used to select the history-based weight from the history storage. For example, the global history storage may be provided in the form of a global history register. Such embodiments provide a branch prediction mechanism which is very efficient in terms of the storage capacity it requires, yet is nonetheless able to generate its branch prediction in dependence both on global branch prediction outcomes and on specific branch instructions (i.e. their source addresses), and therefore with good accuracy.
The index generation circuitry may combine the at least partial branch instruction source address and the global history value in a variety of ways, but in some embodiments the index generation circuitry comprises a hash function to generate the history index.
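The sketch below illustrates one such arrangement: a global history register that shifts in each branch outcome, and a simple XOR hash of the source address with that history to form the index. The XOR hash and the 1024-entry history storage size are assumptions made only for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed size of the history storage (illustration only).
constexpr std::size_t kHistoryEntries = 1024;

// Sketch of a global history register: the outcome of each encountered
// branch is shifted in at the least significant end.
void update_global_history(std::uint64_t& global_history, bool taken) {
    global_history = (global_history << 1) | (taken ? 1u : 0u);
}

// Sketch of the index generation circuitry: an at least partial source
// address is combined with the global history value by a simple XOR hash.
std::size_t history_index(std::uint64_t source, std::uint64_t global_history) {
    return (source ^ global_history) % kHistoryEntries;
}
```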
Whilst the bias weight may be the only “additional” information which is stored in the branch target storage, the present techniques recognise that the efficiency associated with adding to the entries stored in the branch target storage may find further application. For example, in some embodiments the entries each further comprise a selection value, and the prediction generation circuitry further comprises selection circuitry responsive to the selection value to generate either the taken prediction or the not-taken prediction for the branch based on either the bias weight or the history-based weight. Accordingly, this metadata provides one mechanism for the branch prediction to be made based on the particular type of weight which has been found to be useful for this entry (i.e. this branch target address).
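Sketched below, assuming the selection value is a single bit and the threshold is zero (both assumptions; the disclosure does not fix these):

```cpp
// Sketch of the selection circuitry: a per-entry selection value chooses
// whether the bias weight or the history-based weight drives the prediction.
// A single selection bit and a zero threshold are assumptions.
bool predict_with_selection(bool select_bias, int bias_weight,
                            int history_weight) {
    const int chosen = select_bias ? bias_weight : history_weight;
    return chosen > 0;
}
```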
In some embodiments the apparatus comprises more than one history storage to store at least one further set of history-based weights, and further comprises history combination circuitry to pass a single history-based weight to the prediction generation circuitry in dependence on outputs of the more than one history storage. Thus, whilst more than one history storage can be provided, only a single history-based weight is nonetheless passed to the prediction generation circuitry, enabling a simple configuration of the prediction generation circuitry, which thus only receives two input values: the bias weight from the branch target storage and the single history-based weight.
Whilst the history-based weights may take a variety of forms, for example as weights whose possible values exhibit a relatively high degree of granularity (for example, in one particular embodiment they could range from −32 to +32), the apparatus may benefit from these weights being provided in the form of “takenness” indications (counter values), for example comprising the four possibilities: strongly not taken, weakly not taken, weakly taken, and strongly taken. These can be efficiently represented by a two-bit value. History-based weights of this kind can, for example, be combined to produce the single history-based weight by an efficient mechanism such as a majority decision. Hence, in some embodiments the at least one further set of history-based weights are branch prediction counter values and the history combination circuitry is capable of generating the single history-based weight on a majority basis from the branch prediction counter values.
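A sketch of such a majority combination, assuming 2-bit counters encoded 0–3 (strongly not taken through strongly taken) and an output weight of ±1 (the ±1 mapping is an assumption):

```cpp
#include <cstdint>
#include <vector>

// Sketch of the history combination circuitry: several 2-bit branch
// prediction counters (0 = strongly not taken .. 3 = strongly taken) are
// reduced to a single history-based weight on a majority basis. Mapping the
// majority outcome to +/-1 is an assumption for illustration.
int combine_on_majority(const std::vector<std::uint8_t>& counters) {
    int taken_votes = 0;
    for (std::uint8_t c : counters) {
        if (c >= 2) ++taken_votes;  // weakly/strongly taken vote "taken"
    }
    const int not_taken_votes = static_cast<int>(counters.size()) - taken_votes;
    return (taken_votes > not_taken_votes) ? +1 : -1;
}
```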
In some embodiments the apparatus comprises multiple pipeline stages and the apparatus further comprises instruction fetch circuitry in a pipeline stage after the prediction generation circuitry, wherein the instruction fetch circuitry is responsive to the taken prediction generated by the prediction generation circuitry to retrieve an instruction stored at the branch instruction target address. The efficient storage, and rapid retrieval from that storage, of the bias weight is more relevant to an earlier pipeline stage since, once a later pipeline stage has been reached, further context and relevant information may already be available on which to base a branch prediction. In this context such embodiments may implement the present techniques in an earlier, simpler branch predictor (such as a μBTB), rather than in a later, more complex branch predictor (such as a branch target address cache (BTAC)).
Some particular embodiments will now be described with reference to the figures.
Note that when a new entry is allocated into the branch target buffer (in any of the example embodiments described herein) the bias weight (and any similar values) is set to a default value. This function is provided by the update circuitry 58 described above (though not explicitly shown in the figures).
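For illustration only, allocation might be modelled as below; the entry layout repeats the earlier sketch and the default bias of zero is an assumption.

```cpp
#include <cstdint>

// Sketch of entry allocation: a newly allocated entry has its bias weight
// (and any similar values) reset to a default. The default of zero and the
// field layout are assumptions.
struct AllocatedEntry {
    std::uint64_t source;
    std::uint64_t target;
    std::int8_t   bias;
};

AllocatedEntry allocate_entry(std::uint64_t source, std::uint64_t target) {
    return AllocatedEntry{source, target, /*bias=*/0};
}
```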
By way of brief overall summary, an apparatus which produces branch predictions and a method of operating such an apparatus are provided. A branch target storage used to store entries comprising indications of branch instruction source addresses and indications of branch instruction target addresses is further used to store bias weights. A history storage stores history-based weights for the branch instruction source addresses, and a history-based weight is dependent on whether a branch to a branch instruction target address from a branch instruction source address has previously been taken for at least one previous encounter with the branch instruction source address. Prediction generation circuitry receives the bias weight and the history-based weight of the branch instruction source address and generates either a taken prediction or a not-taken prediction for the branch. The reuse of the branch target storage to store bias weights reduces the total storage required, and the matching of entire source addresses avoids problems related to aliasing.
In the present application, the words “configured to . . . ” or “arranged to” are used to mean that an element of an apparatus has a configuration able to carry out the defined operation. In this context, a “configuration” means an arrangement or manner of interconnection of hardware or software. For example, the apparatus may have dedicated hardware which provides the defined operation, or a processor or other processing device may be programmed to perform the function. “Configured to” or “arranged to” does not imply that the apparatus element needs to be changed in any way in order to provide the defined operation.
Although illustrative embodiments of the invention have been described in detail herein with reference to the accompanying drawings, it is to be understood that the invention is not limited to those precise embodiments, and that various changes, additions and modifications can be effected therein by one skilled in the art without departing from the scope of the invention as defined by the appended claims. For example, various combinations of the features of the dependent claims could be made with the features of the independent claims without departing from the scope of the present invention.