Disclosed embodiments relate to branch prediction mechanisms. More particularly, exemplary embodiments are directed to techniques for predicting the outcome of instructions, such as compare instructions, and further, encoding the predictions in the instructions.
Branch prediction mechanisms are conventionally employed in computer processors to predict the direction of branches. The direction taken by a branch, such as a conditional branch, may depend on whether a condition evaluates to true or false. For example, a branch instruction may take the form “if <condition_1> jump,” wherein, if condition_1 evaluates to true, the operational flow may jump to executing instructions at a new location indicated by a target address specified by the instruction (this scenario is also referred to as the branch being “taken”). If condition_1 evaluates to false, then the operational flow may continue to execute the next sequential instruction after the branch instruction (this scenario is also referred to as the branch being “not-taken”).
In order to improve instruction level parallelism (ILP), processors may implement branch prediction mechanisms to predict whether a branch will be taken or not-taken before its condition is resolved. In this manner, the conditional branch instruction may be scheduled to execute prior to resolution of the condition, condition_1. If the prediction turns out to be incorrect, conventionally used correction mechanisms may include flushing the instructions that were wrongly executed based on the incorrect branch prediction and replaying the instructions in the correct path.
With regard to predicting the outcome of the above conditional branch instruction, several approaches are known in the art. In a first approach, the history of evaluations of the conditional branch instruction itself may be studied, and predictions of taken or not-taken may be made based on that history. The success of this first approach relies on the same conditional branch instruction repeatedly resolving the same way; it does not take the underlying condition into account.
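As a rough illustration of this first approach, the sketch below keys a table by the branch's PC and predicts whatever that branch did the last time it was seen. The class and method names are hypothetical, and practical predictors usually keep more than one outcome of history; the point is only that the prediction is tied to the identity of the branch instruction rather than to the condition it tests.

```python
class LastOutcomePredictor:
    """Hypothetical per-branch predictor: remembers the last outcome of each branch PC."""

    def __init__(self) -> None:
        # branch PC -> last observed direction (True = taken)
        self.history: dict[int, bool] = {}

    def predict(self, pc: int) -> bool:
        # Branches that have never been seen are assumed not-taken.
        return self.history.get(pc, False)

    def update(self, pc: int, taken: bool) -> None:
        # Record the resolved direction for this branch only.
        self.history[pc] = taken


p = LastOutcomePredictor()
p.update(0x40, True)
print(p.predict(0x40))  # True: same branch, same direction as last time
print(p.predict(0x80))  # False: a different branch has no history of its own
```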
A second approach includes the use of predicate registers. The semantics of a predicated branch instruction may resemble the form: “if <predicate_1> jump.” In such predicated branch instructions, the value of the predicate register, predicate_1, controls the direction of the conditional branch between taken and not-taken. Thus, in contrast to the first approach, the same predicate register may be used for predicting the direction of several branch instructions. Moreover, the predicate register may also be employed in conditional instructions that are not branch instructions.
Processors which adopt the use of predicate registers may include instructions to generate the values for the predicate registers, referred to herein as “producer instructions.” The one or more instructions, such as conditional branch instructions, which employ the predicate registers are referred to herein as “consumer instructions.” The consumer instructions are said to be predicated on the producer instructions. Generally, producer instructions which involve a comparison of two operands or values, such as “greater than,” “less than,” “equal to” or combinations thereof, may be used to write or set the predicate registers. An example producer instruction may take the form, “predicate_1=compare (A, B),” wherein the result of a comparison operation of operands A and B will set the predicate register, predicate_1. Thereafter, the value of predicate_1 may control the direction of a consumer instruction, such as the conditional branch described above.
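The producer/consumer relationship can be sketched in a few lines. The predicate-register dictionary, the function names, the choice of “greater than” as the comparison, and the 4-byte instruction size are all assumptions made for illustration. Two different consumers, a branch and a non-branch select, read the same predicate_1 written by a single producer.

```python
# Hypothetical model of predicate registers written by a producer (compare)
# and read by consumers (a conditional branch and a conditional select).
predicates: dict[str, bool] = {}


def compare_gt(dest: str, a: int, b: int) -> None:
    """Producer: predicate_1 = compare(A, B), using "greater than"."""
    predicates[dest] = a > b


def branch_if(pred: str, pc: int, target: int) -> int:
    """Consumer: "if <predicate_1> jump". Returns the next PC (4-byte insns assumed)."""
    return target if predicates[pred] else pc + 4


def select_if(pred: str, if_true: int, if_false: int) -> int:
    """Non-branch consumer predicated on the same register."""
    return if_true if predicates[pred] else if_false


compare_gt("predicate_1", 7, 3)                     # producer sets predicate_1 = True
print(hex(branch_if("predicate_1", 0x100, 0x200)))  # 0x200 (taken)
print(select_if("predicate_1", 1, 0))               # 1
```

Both consumers must wait until `compare_gt` has written `predicate_1`, which is exactly the serialization discussed next.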
The second approach also suffers from drawbacks. For example, the correct use of predicate registers requires that they be appropriately updated. In other words, the producer instruction, such as the compare instruction, must be fully evaluated, and the corresponding predicate register must be set, before any following consumer instruction may be allowed to execute. This creates a bottleneck because the logic for performing compare operations may involve significant latency. Moreover, waiting for the producer instruction to fully evaluate and write the predicate register before allowing the consumer instructions to execute imposes serialization, thereby destroying parallelism.
Accordingly, there is a corresponding need in the art to overcome the drawbacks of the aforementioned approaches related to prediction mechanisms.
Exemplary embodiments of the invention are directed to systems and methods for branch prediction. More particularly, exemplary embodiments are directed to techniques for predicting the outcome of a producer instruction, such as a compare instruction, and encoding the predictions in prediction fields of the producer instruction. A consumer instruction, such as a conditional branch instruction predicated on the producer instruction, may be speculatively executed based on the evaluation of the producer instruction predicted from the prediction field.
For example, an exemplary embodiment is directed to a method of predicting evaluation of a producer instruction comprising: encoding a prediction field in the producer instruction; and predicting evaluation of the producer instruction, in a processor, using the prediction field.
Another exemplary embodiment is directed to a processing system comprising: a memory; a producer instruction stored in the memory, the producer instruction comprising a prediction field; and logic configured to predict evaluation of the producer instruction using the prediction field.
Yet another exemplary embodiment is directed to a processing system comprising: a producer instruction stored in a storage means, the producer instruction comprising a prediction field; and means for predicting evaluation of the producer instruction using the prediction field.
Another exemplary embodiment is directed to a non-transitory computer-readable storage medium comprising code, which, when executed by a processor, causes the processor to perform operations for predicting evaluation of a producer instruction, the non-transitory computer-readable storage medium comprising: code for encoding a prediction field in the producer instruction; and code for predicting evaluation of the producer instruction, in a processor, using the prediction field.
The accompanying drawings are presented to aid in the description of embodiments of the invention and are provided solely for illustration of the embodiments and not limitation thereof.
Aspects of the invention are disclosed in the following description and related drawings directed to specific embodiments of the invention. Alternate embodiments may be devised without departing from the scope of the invention. Additionally, well-known elements of the invention will not be described in detail or will be omitted so as not to obscure the relevant details of the invention.
The word “exemplary” is used herein to mean “serving as an example, instance, or illustration.” Any embodiment described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments. Likewise, the term “embodiments of the invention” does not require that all embodiments of the invention include the discussed feature, advantage or mode of operation.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of embodiments of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises”, “comprising”, “includes” and/or “including”, when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Further, many embodiments are described in terms of sequences of actions to be performed by, for example, elements of a computing device. It will be recognized that various actions described herein can be performed by specific circuits (e.g., application specific integrated circuits (ASICs)), by program instructions being executed by one or more processors, or by a combination of both. Additionally, these sequences of actions described herein can be considered to be embodied entirely within any form of computer readable storage medium having stored therein a corresponding set of computer instructions that upon execution would cause an associated processor to perform the functionality described herein. Thus, the various aspects of the invention may be embodied in a number of different forms, all of which have been contemplated to be within the scope of the claimed subject matter. In addition, for each of the embodiments described herein, the corresponding form of any such embodiments may be described herein as, for example, “logic configured to” perform the described action.
Exemplary embodiments are directed to improving the efficiency and performance of prediction mechanisms. More specifically, embodiments are configured to expedite prediction for producer instructions, such as compare instructions, and to lower the cost of implementing it. Moreover, embodiments allow the same prediction, encoded in a single producer instruction, to be conveniently reused by multiple consumer instructions, such as conditional instructions, and more particularly, consumer branch instructions.
In an exemplary embodiment, a producer instruction, such as a compare instruction, is configured to include a field for storing prediction information within the producer instruction itself, such that when the producer instruction is read out, the corresponding prediction information may be used to predict evaluation of the producer instruction. Moreover, embodiments allow the prediction information to include one or more prediction state bits representing a strength or confidence level of the prediction. The prediction state bits may be updated once the actual resolution of the producer instruction is known deep in the pipeline. Prediction logic may be configured to generate a prediction of the evaluation of the producer instruction as true or false based on the prediction state bits and other information. For example, the prediction logic may also take into account other information, such as a history of evaluation of the producer instruction.
With reference now to
In an exemplary implementation, compare instruction 102 may have a corresponding address or program counter (PC) value of 102pc. Further, as shown, compare instruction 102 may comprise several fields, some of which may correspond to conventional instruction formats. For example, field 102op may represent the operation code (commonly known as “op-code”) which comprises encodings for specific operations (e.g. greater than, less than, equal to, etc.). Field 102s may correspond to a source register; field 102i may include an immediate value; and field 102d may correspond to a destination register. Deviating now from conventional instruction formats, compare instruction 102 may include prediction field 102p representing a prediction state in exemplary embodiments.
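To make the field layout concrete, the sketch below packs and unpacks a compare instruction that carries an extra prediction field. The text does not give bit positions or widths, so the 32-bit layout used here (8-bit op-code, 5-bit source and destination registers, 12-bit immediate, 2-bit prediction field) and all function names are hypothetical. `with_prediction` shows that the prediction field can be rewritten without disturbing the other fields, which is what an in-place update of the stored instruction would require.

```python
from dataclasses import dataclass

# Hypothetical 32-bit layout; bit positions and widths are assumptions:
#   [31:24] op   [23:19] src   [18:14] dst   [13:2] imm   [1:0] pred
FIELDS = {  # field name -> (shift, width)
    "op": (24, 8),
    "src": (19, 5),
    "dst": (14, 5),
    "imm": (2, 12),
    "pred": (0, 2),
}


@dataclass
class CompareInsn:
    op: int    # op-code (102op)
    src: int   # source register (102s)
    dst: int   # destination register (102d)
    imm: int   # immediate value (102i)
    pred: int  # prediction field (102p)


def encode(insn: CompareInsn) -> int:
    """Pack all fields into one instruction word, rejecting values that do not fit."""
    word = 0
    for name, (shift, width) in FIELDS.items():
        value = getattr(insn, name)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << shift
    return word


def decode(word: int) -> CompareInsn:
    """Extract every field from an instruction word."""
    return CompareInsn(**{name: (word >> shift) & ((1 << width) - 1)
                          for name, (shift, width) in FIELDS.items()})


def with_prediction(word: int, pred: int) -> int:
    """Rewrite only the prediction field of an encoded instruction."""
    shift, width = FIELDS["pred"]
    mask = ((1 << width) - 1) << shift
    return (word & ~mask) | ((pred << shift) & mask)


w = encode(CompareInsn(op=0x12, src=3, dst=7, imm=100, pred=0b01))
w = with_prediction(w, 0b10)
print(decode(w))
# CompareInsn(op=18, src=3, dst=7, imm=100, pred=2)
```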
In one implementation, prediction field 102p may be a single-bit field which may encode the two prediction states, true and false. In one example, the “true” state may correspond to a prediction of “taken” for a consumer conditional branch instruction predicated on the producer instruction, and the “false” state may correspond to a prediction of “not-taken.” In other implementations, (as will be further described below with reference to
With continuing reference to
Prediction logic 104 may receive compare instruction 102 as one input. The address or PC value 102pc may also be an input to prediction logic 104, as may other appropriate information. Prediction logic 104 may be configured to extract the relevant information from compare instruction 102, such as the prediction state in prediction field 102p. Prediction logic 104 may then correlate PC value 102pc and the other information with the prediction state represented by prediction field 102p to index into prediction history table 106. The correlating and indexing may be performed, for example, by logic implementing a hash or XOR function on the PC value and the prediction state. Thereafter, the value stored at the indexed location of prediction history table 106 may be read out as prediction 107, which represents the predicted evaluation of compare instruction 102.
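A minimal sketch of prediction logic 104 indexing prediction history table 106 follows, assuming a 1024-entry table of 2-bit counters and an index formed by XOR-ing the word-aligned PC with the prediction-field bits. The table size, counter width, initial value, and the `train` method are assumptions; the text only states that a hash or XOR of the PC value and prediction state selects the entry that is read out as prediction 107.

```python
class PredictionLogic:
    """Hypothetical model of prediction logic 104 and prediction history table 106."""

    def __init__(self, index_bits: int = 10) -> None:
        self.mask = (1 << index_bits) - 1
        # One 2-bit counter per entry, assumed to start weakly false (0b01).
        self.table = [0b01] * (1 << index_bits)

    def index(self, pc: int, pred_state: int) -> int:
        # XOR the word-aligned PC with the prediction-field bits, then keep
        # enough low bits to address the table.
        return ((pc >> 2) ^ pred_state) & self.mask

    def predict(self, pc: int, pred_state: int) -> bool:
        """Read out prediction 107: True if the indexed counter is 0b10 or 0b11."""
        return self.table[self.index(pc, pred_state)] >= 0b10

    def train(self, pc: int, pred_state: int, evaluation: bool) -> None:
        """Nudge the indexed counter toward the resolved evaluation (assumed policy)."""
        i = self.index(pc, pred_state)
        c = self.table[i]
        self.table[i] = min(c + 1, 0b11) if evaluation else max(c - 1, 0b00)


logic = PredictionLogic()
print(logic.predict(0x1000, 0b11))  # False: entry still weakly false
logic.train(0x1000, 0b11, True)
print(logic.predict(0x1000, 0b11))  # True
print(logic.predict(0x1000, 0b00))  # False: same PC, different prediction state
```

Because the table is indexed rather than replicated per instruction, one such structure can serve every producer instruction that reaches it, at the cost of occasional aliasing between unrelated instructions that hash to the same entry.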
Some embodiments may avoid the use of prediction logic 104 and prediction history table 106, and directly derive prediction 107 of compare instruction 102 from the prediction state bits stored in prediction field 102p. While such implementations are less expensive than the above-described embodiments with prediction logic 104 and prediction history table 106, they may suffer from decreased accuracy of predictions. Skilled persons will recognize suitable implementations for predicting producer instructions, based on a desired tradeoff between accuracy and costs.
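For the cheaper variant without a history table, prediction 107 can be taken straight from the instruction. Assuming, as is conventional for counter-style encodings (this passage does not say so), that the most significant bit of the prediction field carries the predicted direction, a single shift and mask suffices. The function name is hypothetical.

```python
def predict_from_field(pred_field: int, width: int = 2) -> bool:
    """Return the predicted evaluation encoded in a prediction field.

    Assumes the most significant of `width` bits holds the predicted direction;
    for a single-bit field (width=1) that bit is the prediction itself.
    """
    if not 0 <= pred_field < (1 << width):
        raise ValueError("prediction field out of range for its width")
    return bool((pred_field >> (width - 1)) & 1)


print([predict_from_field(s) for s in (0b00, 0b01, 0b10, 0b11)])
# [False, False, True, True]
```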
As illustrated in
Turning now to
Thus, a bimodal predictor provides a buffer against anomalies. In other words, if a particular producer instruction tends to evaluate to true, then a single anomalous false evaluation will not change the prediction to false. In comparison, if a single-bit prediction state were employed for a producer instruction that tends to evaluate to true, a single anomalous false evaluation would toggle the prediction to false, and thus destroy the indication of the tendency to evaluate to true.
The above-described operational flow for bimodal prediction may be implemented in logic using a two-bit saturating up-down counter. The counter may count up for each evaluation of true and count down for each evaluation of false. While counting up, if the count value reaches the upper extreme value “11” (corresponding to state S11: strongly true), the counter will saturate and remain at this state until a false evaluation causes the counter to count down. Similarly, while counting down, if the count value reaches the lower extreme value “00” (corresponding to state S00: strongly false), the counter will saturate and remain in this state until a true evaluation causes the counter to count up.
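The counter described above can be written as a transition function over the four states. S11 (strongly true) and S00 (strongly false) come from the text; labelling S10 as weakly true and S01 as weakly false, and using the upper bit of the count as the predicted evaluation, are assumptions consistent with a count that moves up on true and down on false.

```python
from enum import IntEnum


class State(IntEnum):
    S00 = 0b00  # strongly false (from the text)
    S01 = 0b01  # weakly false (assumed label)
    S10 = 0b10  # weakly true (assumed label)
    S11 = 0b11  # strongly true (from the text)


def next_state(state: State, evaluation: bool) -> State:
    """Two-bit saturating up-down counter: up on true, down on false."""
    if evaluation:
        return State(min(state + 1, State.S11))  # saturates at 11
    return State(max(state - 1, State.S00))      # saturates at 00


def predicted(state: State) -> bool:
    """The upper bit of the count is the predicted evaluation."""
    return state >= State.S10


s = State.S01
for ev in (True, True, True, False):
    s = next_state(s, ev)
    print(s.name, predicted(s))
# S10 True
# S11 True
# S11 True
# S10 True
```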
Thus, embodiments may embed a prediction field, such as a bimodal prediction field, within a producer instruction, and thereby predict the evaluation of the producer instruction, rather than predict the evaluation of a corresponding consumer instruction. In certain embodiments, embedding a prediction field in a producer instruction may not incur additional costs. For example, compare instruction 102 may have unused or reserved bits, which may be used to store prediction field 102p comprising bimodal prediction states. When compare instruction 102 is first encountered, it is loaded from instruction cache 108 (or from memory if it is not present in instruction cache 108) and executed, for example, in execution pipeline 112 in processor 110 to obtain the evaluation. Using update logic 114 and updated prediction 115, compare instruction 102 with the updated prediction field 102p may be stored back in instruction cache 108 or memory. The next time compare instruction 102 is encountered, the updated prediction field 102p is consulted to make prediction 107 (e.g., using prediction logic 104 and prediction history table 106). A consumer instruction of compare instruction 102, such as a conditional branch instruction, is then speculatively executed using prediction 107, for example, in execution pipeline 112, without waiting for compare instruction 102 to complete execution in execution pipeline 112. Once compare instruction 102 completes execution in execution pipeline 112, prediction field 102p may be updated, if necessary, using update logic 114 as previously described. It will be understood that the consumer conditional branch instruction may need to be replayed if prediction 107 does not match evaluation 113, and that updated prediction 115 is used to update prediction field 102p in compare instruction 102 at its storage location, for example, instruction cache 108.
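The fetch, predict, speculate, resolve, and write-back sequence above can be traced with a toy model. Here a dictionary stands in for instruction cache 108, the compare is assumed to be “register greater than immediate”, the 2-bit prediction field is assumed to start at weakly false (01), and all names are hypothetical. Pipeline timing is not modelled, only the order of events and the resulting replays.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CompareInsn:
    pc: int
    imm: int       # compare "register > imm" (assumed semantics)
    pred: int = 1  # 2-bit prediction field, assumed to start weakly false (01)


def predict(insn: CompareInsn) -> bool:
    # Upper bit of the prediction field gives the predicted evaluation.
    return insn.pred >= 0b10


def update_field(pred: int, evaluation: bool) -> int:
    # Saturating 2-bit update toward the resolved evaluation.
    return min(pred + 1, 0b11) if evaluation else max(pred - 1, 0b00)


def encounter_compare(icache: dict[int, CompareInsn], pc: int,
                      reg_value: int) -> tuple[bool, bool]:
    """One encounter of the compare at pc; returns (prediction, replay_needed)."""
    insn = icache[pc]                  # fetch: the prediction field travels with the insn
    prediction = predict(insn)         # consumer branch speculates on this value
    evaluation = reg_value > insn.imm  # compare resolves later in the pipeline
    replay = prediction != evaluation  # a mispredicted consumer must be replayed
    # Write the (possibly unchanged) prediction field back to the cached copy.
    icache[pc] = replace(insn, pred=update_field(insn.pred, evaluation))
    return prediction, replay


icache = {0x100: CompareInsn(pc=0x100, imm=10)}
for value in (20, 20, 5, 20):
    prediction, replay = encounter_compare(icache, 0x100, value)
    print(prediction, replay, icache[0x100].pred)
# False True 2
# True False 3
# True True 2
# True False 3
```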
Additionally, it will also be understood that in exemplary embodiments, prediction logic 104 and prediction history table 106 may be reused by multiple producer instructions without any need to replicate such hardware. Accordingly, embodiments comprise low-cost solutions for accurate prediction of individual producer instructions. Moreover, as previously described, several consumer instructions may be predicated on a single producer instruction. Thus, one or more consumer instructions predicated on a single producer instruction may be speculatively scheduled in parallel to exploit ILP, without waiting for the producer instruction to complete execution.
It will be appreciated that embodiments include various methods for performing the processes, functions and/or algorithms disclosed herein. For example, as illustrated in
Those of skill in the art will appreciate that information and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.
Further, those of skill in the art will appreciate that the various illustrative blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The methods, sequences and/or algorithms described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor.
Referring to
In a particular embodiment, input device 430 and power supply 444 are coupled to the system-on-chip device 422. Moreover, in a particular embodiment, as illustrated in
It should be noted that although
Accordingly, an embodiment of the invention can include computer-readable media embodying a method for predicting evaluation of a producer instruction. The invention is therefore not limited to the illustrated examples, and any means for performing the functionality described herein are included in embodiments of the invention.
While the foregoing disclosure shows illustrative embodiments of the invention, it should be noted that various changes and modifications could be made herein without departing from the scope of the invention as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the embodiments of the invention described herein need not be performed in any particular order. Furthermore, although elements of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.