Disclosed aspects are directed to branch prediction in processing systems. More specifically, exemplary aspects are directed to hybrid branch prediction techniques for identifying and filtering out static branch instructions, and selectively applying complex branch prediction techniques to non-static branch instructions.
Processing systems may employ instructions which cause a change in control flow, such as conditional branch instructions. The direction of a conditional branch instruction is based on how a condition evaluates, but the evaluation may only be known deep down an instruction pipeline of a processor. To avoid stalling the pipeline until the evaluation is known, the processor may employ branch prediction mechanisms to predict the direction of the conditional branch instruction early in the pipeline. Based on the prediction, the processor can speculatively fetch and execute instructions from a predicted address in one of two paths—a “taken” path which starts at the branch target address, with a corresponding direction referred to as the “taken direction”; or a “not-taken” path which starts at the next sequential address after the conditional branch instruction, with a corresponding direction referred to as the “not-taken direction”.
When the condition is evaluated and the actual branch direction is determined, if the branch was mispredicted (i.e., execution followed a wrong path), the speculatively fetched instructions may be flushed from the pipeline, and new instructions in a correct path may be fetched from the correct next address. Accordingly, improving accuracy of branch prediction for conditional branch instructions mitigates penalties associated with mispredictions and execution of wrong path instructions, and correspondingly improves performance and energy utilization of a processing system.
Conventional branch prediction mechanisms may include one or more state machines which may be trained with a history of evaluation of past and current branch instructions. For example, a bimodal branch predictor uses two bits per branch instruction (which may be indexed using a program counter (PC) of the branch instruction, and also using functions of the branch history as well as a global history involving other branch instruction histories) to represent four prediction states: strongly taken, weakly taken, weakly not-taken, and strongly not-taken, for the branch instruction. While such branch prediction mechanisms are relatively inexpensive and involve a smaller footprint (in terms of area, power consumption, latency, etc.), their prediction accuracies are also seen to be low.
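The four prediction states of the bimodal branch predictor described above may be sketched as a 2-bit saturating counter. The following Python sketch is illustrative only; the class name and the particular state encoding are assumptions for exposition and do not appear in this disclosure:

```python
class BimodalCounter:
    """2-bit saturating counter per branch instruction (illustrative
    encoding): 0 = strongly not-taken, 1 = weakly not-taken,
    2 = weakly taken, 3 = strongly taken."""

    def __init__(self, state=1):
        self.state = state  # start weakly not-taken (assumed initial state)

    def predict(self):
        # States 2 and 3 predict taken; states 0 and 1 predict not-taken.
        return self.state >= 2

    def update(self, taken):
        # Train toward the actual outcome, saturating at 0 and 3.
        if taken:
            self.state = min(self.state + 1, 3)
        else:
            self.state = max(self.state - 1, 0)
```

For example, a counter starting in the weakly not-taken state would need two consecutive taken outcomes before its prediction flips to taken, which gives the predictor some hysteresis against single anomalous outcomes.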
More complex branch prediction mechanisms are emerging in the art for improving prediction accuracies. Among these, so-called neural branch predictors (e.g., Perceptron, Fast Path branch predictors, Piecewise Linear branch predictors, etc.) utilize bias weights and weight vectors derived from individual branch histories and/or global branch histories in making branch predictions. However, these complex branch prediction mechanisms may also incur added costs in terms of area, power, and latency. The energy and resources expended in utilizing the complex branch prediction mechanisms are seen to be particularly wasteful when mispredictions occur, albeit at a lower rate than the mispredictions which may result from the use of the simpler branch prediction mechanisms such as the bimodal branch predictor.
Among the branch instructions which are predicted using the known branch prediction techniques, it is recognized that some branch instructions (e.g., in conventional program codes/applications) are fixed direction branch instructions, in the sense that they always resolve in a fixed or static direction: always-taken or always-not-taken. Thus, the energy expenditure associated with branch prediction mechanisms, particularly the complex branch prediction mechanisms, is seen to be wasteful for such static branch instructions since their outcomes are invariant.
However, there are no known mechanisms for efficiently recognizing which branch instructions are static branch instructions for selectively filtering these out and applying the complex branch prediction mechanisms for predicting only the branch instructions whose direction may vary and thus benefit from prediction. Thus, there is a corresponding need to improve energy consumption, efficiency, and prediction accuracy of conventional branch prediction mechanisms, e.g., by avoiding the aforementioned wasteful utilization of complex branch prediction mechanisms.
Exemplary aspects of the invention are directed to systems and methods for branch prediction. In this disclosure, fixed direction branch instructions refer to branch instructions which always resolve in the same direction, always-taken or always-not-taken. A subset of branch instructions in a program code or application executed by a processor may have outcomes which vary and thus benefit from complex branch prediction mechanisms, while the remaining branch instructions may be fixed direction branch instructions, which are always-taken or always-not-taken; accordingly, deploying complex branch prediction mechanisms may be wasteful for these remaining branch instructions. Correspondingly, an exemplary branch prediction mechanism comprises detecting the subset of branch instructions which are not fixed direction branch instructions and, for this subset of branch instructions, utilizing complex branch prediction mechanisms such as a neural branch predictor. Detecting the subset may involve an exemplary process of determining, e.g., by using a state machine, the branch instructions whose outcomes change between a taken direction and a not-taken direction in separate instances of their execution. For the remaining branch instructions which are fixed direction branch instructions, e.g., which are filtered out by the above process, the complex branch prediction techniques are avoided and their fixed direction is obtained from the process of filtering.
For example, an exemplary aspect is directed to a method of branch prediction, wherein the method comprises detecting a subset of branch instructions executable by a processor which are not fixed direction branch instructions, wherein the fixed direction branch instructions are always-taken or always-not-taken. For the subset of branch instructions, the method comprises obtaining branch predictions from a neural branch predictor.
Another exemplary aspect is directed to an apparatus, wherein the apparatus comprises a filter configured to detect a subset of branch instructions which are executable by a processor and are not fixed direction branch instructions, wherein the fixed direction branch instructions are always-taken or always-not-taken. The apparatus further comprises a neural branch predictor configured to provide branch predictions for the subset of branch instructions.
Yet another exemplary aspect is directed to a non-transitory computer-readable storage medium comprising code, which, when executed by a computer, causes the computer to perform operations for branch prediction. The non-transitory computer-readable storage medium comprises code for detecting a subset of branch instructions which are not fixed direction branch instructions, wherein the fixed direction branch instructions are always-taken or always-not-taken, and code for obtaining branch predictions from a neural branch predictor, for the subset of branch instructions.
Another exemplary aspect is directed to an apparatus comprising means for detecting a subset of branch instructions which are not fixed direction branch instructions, wherein the fixed direction branch instructions are always-taken or always-not-taken, and means for obtaining branch predictions from a neural branch predictor, for the subset of branch instructions.
The accompanying drawings are presented to aid in the description of aspects of the invention and are provided solely for illustration of the aspects and not limitation thereof.
Aspects of the invention are disclosed in the following description and related drawings directed to specific aspects of the invention. Alternate aspects may be devised without departing from the scope of the invention. Additionally, well-known elements of the invention will not be described in detail or will be omitted so as not to obscure the relevant details of the invention.
The word “exemplary” is used herein to mean “serving as an example, instance, or illustration.” Any aspect described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects. Likewise, the term “aspects of the invention” does not require that all aspects of the invention include the discussed feature, advantage or mode of operation.
The terminology used herein is for the purpose of describing particular aspects only and is not intended to be limiting of aspects of the invention. As used herein, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises,” “comprising,” “includes,” and/or “including,” when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Further, many aspects are described in terms of sequences of actions to be performed by, for example, elements of a computing device. It will be recognized that various actions described herein can be performed by specific circuits (e.g., application specific integrated circuits (ASICs)), by program instructions being executed by one or more processors, or by a combination of both. Additionally, these sequences of actions described herein can be considered to be embodied entirely within any form of computer-readable storage medium having stored therein a corresponding set of computer instructions that upon execution would cause an associated processor to perform the functionality described herein. Thus, the various aspects of the invention may be embodied in a number of different forms, all of which have been contemplated to be within the scope of the claimed subject matter. In addition, for each of the aspects described herein, the corresponding form of any such aspects may be described herein as, for example, “logic configured to” perform the described action.
Exemplary aspects of this disclosure are directed to systems and methods for branch prediction which overcome the aforementioned drawbacks of conventional branch prediction mechanisms. As previously noted, in this disclosure, fixed direction branch instructions refer to branch instructions which always resolve in the same direction, always-taken or always-not-taken. A subset of branch instructions in a program code or application executable by a processor may have outcomes which vary and thus benefit from complex branch prediction mechanisms. The remaining branch instructions may be fixed direction branch instructions, which are always-taken or always-not-taken; accordingly, deploying complex branch prediction mechanisms may be wasteful for these remaining branch instructions. Correspondingly, an exemplary branch prediction mechanism comprises detecting the subset of branch instructions which are not fixed direction branch instructions and, for this subset of branch instructions, utilizing complex branch prediction mechanisms such as a neural branch predictor. Detecting the subset may involve an exemplary process of determining, e.g., by using a state machine, the branch instructions whose outcomes change between a taken direction and a not-taken direction in separate instances of their execution. For the remaining branch instructions which are fixed direction branch instructions, e.g., which are filtered out by the above process, their predicted direction may correspond to their fixed direction, obtained in the process of filtering them out. The above exemplary techniques will now be explained in further detail with reference to the figures.
With reference now to
In an exemplary implementation, branch instruction 102 may have a corresponding address or program counter (PC) value of 102pc. When branch instruction 102 is fetched by processor 110 for execution, logic such as hash 104 (e.g., implementing an XOR function) may utilize the PC value 102pc (and/or other information such as a history of branch instruction 102 or global history) to access filter 106. Filter 106 may involve a state machine, as will be discussed in the following sections, and is generally configured to filter out fixed direction branch instructions, leaving a subset of branch instructions whose directions may change. For fixed direction branch instructions, the corresponding direction 121 (always-taken/always-not-taken) is obtained from filter 106.
Further, from filter 106, the subset of branch instructions which are not fixed direction branch instructions are directed to a more complex branch prediction mechanism, exemplarily shown as neural branch predictor 122 (although it will be understood that the precise implementation of the complex branch prediction mechanism is not germane to this discussion, and as such, in various examples, neural branch predictor 122 may be implemented as a Perceptron, Fast Path, Piecewise Linear predictor, etc., as known in the art). From neural branch predictor 122, prediction 123 is obtained for those branch instructions whose outcome may vary.
In exemplary aspects, for branch instructions which are filtered out as fixed direction branch instructions (e.g., by filter 106), neural branch predictor 122 may not be employed and the branch instructions may be speculatively executed in a direction corresponding to direction 121. Correspondingly, in such cases, neural branch predictor 122 may be bypassed, or even gated off or powered down, which can lead to energy savings for the cases of fixed direction branch instructions.
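The top-level dispatch between filter 106 and neural branch predictor 122 may be sketched as follows. This Python sketch is illustrative only: the function names, the `None` convention for non-fixed-direction branches, and the callable interfaces are assumptions for exposition, not part of this disclosure:

```python
def hybrid_predict(pc, filter_lookup, neural_predict):
    """Consult the filter first; invoke the (expensive) neural predictor
    only for branches the filter does not classify as fixed direction.

    filter_lookup(pc) is assumed to return "taken", "not_taken", or None
    (None meaning the branch is not a fixed direction branch).
    neural_predict(pc) is assumed to return "taken" or "not_taken".
    """
    fixed_direction = filter_lookup(pc)
    if fixed_direction is not None:
        # Fixed direction branch: the neural predictor is bypassed
        # (and could be gated off), saving its lookup energy.
        return fixed_direction
    # Only non-fixed-direction branches exercise the neural predictor.
    return neural_predict(pc)
```

Because `neural_predict` is only called on the path where the filter returns `None`, the sketch mirrors the bypassing behavior described above: for fixed direction branches the complex predictor does no work at all.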
Continuing with the description of
Referring now to
The indexed weight vector is shown as selected perceptron 204′ in logic block 210, wherein logic block 210 is used to obtain prediction 123. Specifically, global history 208 is provided as another input to logic block 210, and using a combination of the indexed bias weight, selected perceptron 204′, and global history 208, partial sum 206 for branch instruction 102 is calculated, e.g., using the example formula: partial sum = bias weight + vector product (selected perceptron, global history). Prediction 123 is obtained in one example as corresponding to the sign of partial sum 206 (e.g., using the example formula: prediction = sign(partial sum)) as shown. In some examples, positive and negative signs may respectively correspond to taken and not-taken predictions, without loss of generality. In the illustrated example, the sign of the partial sum is shown to correspond to a “taken” prediction (while the opposite sign may have resulted in a “not-taken” prediction). As mentioned with reference to
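The partial sum and sign computation described above may be sketched in Python as follows. The ±1 encoding of the global history and the convention that a non-negative sum means "taken" are assumptions consistent with the example in the text, not definitive implementation details:

```python
def perceptron_predict(bias_weight, weights, global_history):
    """Illustrative perceptron-style prediction.

    weights: per-history-bit weight vector (the selected perceptron).
    global_history: prior outcomes encoded as +1 (taken) / -1 (not-taken),
    same length as weights.
    Returns (partial_sum, predicted_taken).
    """
    # partial sum = bias weight + dot product of weights and history
    partial_sum = bias_weight + sum(
        w * h for w, h in zip(weights, global_history)
    )
    # Assumed sign convention: non-negative -> taken, negative -> not-taken.
    return partial_sum, partial_sum >= 0
```

For instance, with a bias weight of 1, weights [2, -1, 3], and history [+1, +1, -1], the partial sum is 1 + 2 - 1 - 3 = -1, yielding a not-taken prediction under this convention.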
With reference now to
Focusing on filter 106, a set of counters 302 are shown to be associated with PC values of branch instructions which may be used as a tag, identified as PC history 304. The PC value 102pc may index into one of counters 302 to obtain the value of the counter. In one implementation, if there is a match between 102pc and the corresponding PC history 304 at the indexed location, then corresponding counter 302 at the indexed location may be read out. Counters 302 may be 2-bit counters and may be repurposed from conventional bimodal branch prediction mechanisms which use similar 2-bit counters as state machines to represent the previously mentioned states of strongly taken, weakly taken, weakly not-taken, and strongly not-taken, as known in the art. In filter 106, counters 302 may be utilized to represent state machines, with transitions from one state to another effected through incrementing the counters, wherein determinations of whether a particular branch instruction is a fixed direction branch instruction or not may be based on the state or counter value for a particular branch instruction.
The value of counter 302 read out from the indexed location using 102pc is used as an initial value or state associated with the counter for 102pc, which will be used in the flow chart comprising steps or blocks 306-320. For the following discussion it will be assumed that all counters 302 including the counter corresponding to branch instruction 102 are initialized to a value of “0”.
At block 306, the value of counter 302 corresponding to 102pc for branch instruction 102 is obtained. At block 308, it is determined whether the value of the counter is “0”, and if it is, then in one implementation of filter 106 at block 310, direction 121 may be generated as branch instruction 102 being a fixed direction branch instruction which is always-not-taken. Viewed another way, all branch instructions are initialized or set to an initial prediction state as always-not-taken branch instructions (keeping in mind that in other implementations, all branch instructions may be initialized to an initial state as always-taken instead, with corresponding modifications made to the remaining process steps without deviating from the scope of this disclosure). Branch instruction 102 is speculatively executed in direction 121 set to not-taken.
At block 316, the actual outcome of branch instruction 102, which was speculatively executed based on the prediction of being not-taken, is obtained, e.g., from bus 115, and it is determined whether the prediction of not-taken was accurate. If the prediction is correct, then counter 302 is retained at a “0” value and the process returns to block 306. In other words, the initial prediction state of branch instruction 102 as being always-not-taken is maintained until there is a different value of counter 302 encountered in block 306.
If the prediction is not correct, i.e., branch instruction 102 was mispredicted as not-taken, then the value of counter 302 is incremented, and the incremented value (e.g., “1” in this case) is stored in counter 302 following path 317, and the process returns to block 306. Subsequently, the process moves to block 312 corresponding to the value of counter 302 being “1”, which leads to direction 121 of branch instruction 102 being a fixed direction branch instruction with a direction of always-taken in block 314. In other words, upon a misprediction of the branch instruction as being a fixed direction always-not-taken branch instruction, the branch instruction is treated as a fixed direction always-taken branch instruction. Branch instruction 102 is then speculatively executed in direction 121 set to taken.
Subsequently, the process once again returns to block 316 to determine whether the prediction of taken was correct. If the prediction was correct, then counter 302 is retained at the value of “1” to continue providing a fixed direction prediction of taken for branch instruction 102 by returning to block 306 upon each visit to block 316. If at any point in block 316, it is determined that branch instruction 102 was mispredicted as an always-taken fixed direction branch instruction, then counter 302 is further incremented, in this case, to a value of “2”, and the process updates counter 302 via path 317 and returns to block 306.
From block 306, for values of counter 302 greater than or equal to “2”, in block 318, a decision is made to use neural branch predictor 122 (e.g., Perceptron) for predictions of the branch instruction 102 going forward. Viewed another way, branch instruction 102 qualifies as a branch instruction which is among the subset of branch instructions that are predicted using neural branch predictor 122 after having been mispredicted at least once as an always-not-taken branch instruction (i.e., with counter 302 at a value of “0”) and at least once as an always-taken branch instruction (i.e., with counter 302 at a value of “1”). In yet other words, branch instruction 102 is detected or identified as belonging to a subset of branch instructions for which neural branch predictor 122 will be deployed after ensuring that branch instruction 102 is neither a fixed direction always-not-taken branch instruction nor a fixed direction always-taken branch instruction by using the above filtering process.
In block 320, prediction 123 for branch instruction 102 is obtained from the sign of corresponding partial sum 206 (e.g., as explained with reference to
Although it is possible to end the process in block 320, this may mean that each branch instruction which has qualified once as belonging to the subset of branch instructions for which neural branch predictor 122 will be used for predictions thereof will continue to have neural branch predictor 122 used in its prediction for each subsequent instance of the branch instruction. However, with time, the nature of some branch instructions may change and transition from a dynamically varying direction to a fixed direction. In order to account for these scenarios, the counters may be periodically or randomly reset to zero in block 322, with path 323 providing the update of the reset to counters 302, which will cause the related branch instructions to once again go through the filtering process and qualify once again (if appropriate) as belonging to the subset of branch instructions for which neural branch predictor 122 will be used.
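The filtering state machine of blocks 306-322, including the periodic reset, may be sketched as follows. This Python sketch is illustrative only; the class name, the PC-keyed dictionary in place of an indexed/tagged counter table, and the `USE_NEURAL` sentinel are assumptions for exposition:

```python
class StaticBranchFilter:
    """Illustrative per-branch filter state machine: counter 0 ->
    predict always-not-taken, counter 1 -> predict always-taken,
    counter >= 2 -> defer to the neural branch predictor."""

    USE_NEURAL = "use_neural"

    def __init__(self):
        self.counters = {}  # PC value -> counter state (all start at 0)

    def predict(self, pc):
        count = self.counters.get(pc, 0)
        if count == 0:
            return False           # fixed direction: always-not-taken
        if count == 1:
            return True            # fixed direction: always-taken
        return self.USE_NEURAL     # not a fixed direction branch

    def update(self, pc, taken):
        # In the fixed direction states (0 and 1), a misprediction
        # advances the state machine; at >= 2 the counter saturates and
        # the neural predictor is trained instead (not shown here).
        count = self.counters.get(pc, 0)
        predicted_taken = (count == 1)
        if count < 2 and taken != predicted_taken:
            self.counters[pc] = count + 1

    def reset(self):
        # Periodic/random reset (block 322): branches must requalify,
        # allowing formerly dynamic branches to be refiltered as fixed.
        self.counters.clear()
```

A branch thus reaches the neural predictor only after being mispredicted at least once as always-not-taken and at least once as always-taken, matching the qualification condition described above.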
In this manner, exemplary aspects may limit the use of neural branch predictor 122 for predicting a subset of branch instructions which are not filtered out as fixed direction branch instructions. Correspondingly, wasteful power/energy consumption by neural branch predictor 122 is minimized or eliminated.
Accordingly, it will be appreciated that exemplary aspects include various methods for performing the processes, functions and/or algorithms disclosed herein. For example,
Block 402 includes detecting a subset of branch instructions which are not fixed direction branch instructions, wherein the fixed direction branch instructions are always-taken or always-not-taken (e.g., following the steps in blocks 306-318 of filter 106 to determine that branch instruction 102 is not a fixed direction branch instruction).
In block 404, for the subset of branch instructions, obtaining branch predictions from a neural branch predictor (e.g., obtaining branch prediction using neural branch predictor 122 in block 320).
As discussed with reference to
Another example apparatus, in which exemplary aspects of this disclosure may be utilized, will now be discussed in relation to
Accordingly, in a particular aspect, input device 530 and power supply 544 are coupled to the system-on-chip device 522. Moreover, in a particular aspect, as illustrated in
It should be noted that although
Those of skill in the art will appreciate that information and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.
Further, those of skill in the art will appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the aspects disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The methods, sequences and/or algorithms described in connection with the aspects disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor.
Accordingly, an aspect of the invention can include a computer-readable medium embodying a method for branch prediction. Accordingly, the invention is not limited to illustrated examples and any means for performing the functionality described herein are included in aspects of the invention.
While the foregoing disclosure shows illustrative aspects of the invention, it should be noted that various changes and modifications could be made herein without departing from the scope of the invention as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the aspects of the invention described herein need not be performed in any particular order. Furthermore, although elements of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.