The present invention is related to processor architecture, and more particularly to systems for predicting target addresses for indirect branch instructions.
In a processor instruction set, an indirect branch instruction is an instruction that directs a processor to branch program control to a target address specified by the indirect branch instruction. For example, an indirect branch instruction may specify that a target address is stored in some register, where the next instruction should be fetched at the target address found in that register.
A problem is that the target address may not yet be known when the indirect branch instruction is decoded, because it must first be computed. The processor could wait for the target address to be computed and stored in the designated register before fetching the next instruction at the target address. However, waiting slows down the processor. To avoid this, some processor instruction sets include a hint instruction, which the assembler inserts to specify a predicted target address. This can speed up processor performance, although there is a penalty if the prediction is found to be wrong, because the processor pipeline must then be flushed and control must return to the original branch.
Some processor architectures include hardware-based prediction of target addresses. Where both hardware-based and software-based predictions of target addresses are available, the processor architecture must be designed to choose between the software hint and the hardware prediction. The way in which the hardware makes this choice can affect performance and power.
Embodiments of the invention are directed to systems and methods for qualifying software branch-target hints with hardware-based predictions.
In an embodiment, a processor includes a branch target address cache storing a table of entries, where each entry has a tag field to store instruction addresses, a target field to store predicted target addresses, and a state field to store state values. Upon decoding an indirect branch instruction, where an entry in the branch target address cache has a tag field value matching the address of the indirect branch instruction, the processor loads into a program counter the value of the target field of the entry depending upon the state value stored in the state field of the entry.
In another embodiment, a method qualifies software branch-target hints with hardware-based predictions. The method includes decoding an indirect branch instruction having an instruction address; computing a target address of the indirect branch instruction; and accessing a branch target address cache to determine if an entry has a stored address value matching the instruction address. The method further includes, provided there is a match, determining a state value stored in the entry, the state value belonging to a set, where the entry has a stored target value; and using the stored target value as the predicted target address for the indirect branch instruction only if the state value belongs to a proper subset of the set.
The accompanying drawings are presented to aid in the description of embodiments of the invention and are provided solely for illustration of the embodiments and not limitation thereof.
Aspects of the invention are disclosed in the following description and related drawings directed to specific embodiments of the invention. Alternate embodiments may be devised without departing from the scope of the invention. Additionally, well-known elements of the invention will not be described in detail or will be omitted so as not to obscure the relevant details of the invention.
The term “embodiments of the invention” does not require that all embodiments of the invention include the discussed feature, advantage or mode of operation. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of embodiments of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises”, “comprising”, “includes” and/or “including”, when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Further, many embodiments are described in terms of sequences of actions to be performed by, for example, elements of a computing device. Specific circuits (e.g., application specific integrated circuits (ASICs)), program instructions being executed by one or more processors, or a combination of both, may perform the various actions described herein. Additionally, these sequences of actions described herein can be considered to be embodied entirely within any form of computer readable storage medium having stored therein a corresponding set of computer instructions that upon execution would cause an associated processor to perform the functionality described herein. Thus, the various aspects of the invention may be embodied in a number of different forms, all of which have been contemplated to be within the scope of the claimed subject matter. In addition, for each of the embodiments described herein, the corresponding form of any such embodiments may be described herein as, for example, “logic configured to” perform the described action.
Continuing further with a brief description of some of the functional units illustrated in
Target Address Predictor 119 provides hardware prediction for target addresses of indirect branch instructions. As will be described later, embodiments add information to the predicted target addresses so that both software hints and hardware prediction are handled in a unified approach. Accordingly, any of the well-known methods of using hardware to predict the target addresses associated with indirect branch instructions may be used in the disclosed embodiments.
The above-described functional units in the processor system of
Furthermore, many functional blocks are left out or simplified for ease of discussion and illustration. For example, if an instruction is not found in Instruction Cache 104, then an instruction cache miss occurs and another level of system memory hierarchy is accessed to load the desired instruction. Similar comments apply to data stored in Data Cache 118, where the processor system handles a data cache miss by accessing another level of memory hierarchy. As another example of the simplification implied in
Furthermore, the ways in which register renaming, instruction scheduling, and the functionality of Instruction Reorder Buffer 112 are utilized to facilitate out-of-order processing and parallelism are well known in the art of processor architecture, and need not be described in this specification to support the disclosed embodiments.
According to embodiments, a branch target address cache, denoted as BTAC 120 in
When an indirect branch instruction is loaded and decoded by Fetch Functional Unit 102, the BTAC table in BTAC 120 is searched using the address of the decoded indirect branch instruction as a key. If a valid entry is found in the BTAC table having a value in the TAG field matching the indirect branch instruction address, then a hit is declared and depending upon the value stored in the STATE field of that entry, the value stored in the TARGET field for that entry may be placed in program counter register PC 124. If the value in the TARGET field is placed in PC 124, then the next instruction loaded by Fetch Functional Unit 102 is fetched from Instruction Cache 104 (or a higher level in the memory hierarchy) at the predicted target address stored in PC 124.
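The lookup described above can be sketched in software as follows. This is only an illustrative model, not the disclosed hardware: the names `BtacEntry`, `Btac`, `insert`, and `lookup`, the dictionary-based storage, and the `valid` flag are assumptions introduced for the example, as are the sample addresses.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class BtacEntry:
    """One BTAC table entry; fields mirror the TAG, TARGET, and STATE fields."""
    tag: int            # address of the indirect branch instruction
    target: int         # predicted target address
    state: int          # state value used to qualify the prediction
    valid: bool = True  # assumed representation of a "valid entry"


class Btac:
    """Hypothetical software model of the BTAC table, keyed by TAG."""

    def __init__(self) -> None:
        self._entries: Dict[int, BtacEntry] = {}

    def insert(self, entry: BtacEntry) -> None:
        self._entries[entry.tag] = entry

    def lookup(self, branch_address: int) -> Optional[BtacEntry]:
        """Return the matching valid entry (a hit), or None (a miss)."""
        entry = self._entries.get(branch_address)
        if entry is not None and entry.valid:
            return entry
        return None


btac = Btac()
btac.insert(BtacEntry(tag=0x8000, target=0x9000, state=3))
hit = btac.lookup(0x8000)   # matching TAG: a hit is declared
miss = btac.lookup(0x8004)  # no matching TAG: a miss
```

On a hit, whether `hit.target` is actually placed in the program counter still depends on `hit.state`; the lookup alone does not decide that.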
For some embodiments, in determining whether to use the value provided in the TARGET field of an entry in the BTAC table for which there is a hit, the value of the state in the STATE field of the entry is compared to a threshold. For some embodiments, the value in the TARGET field for that BTAC table entry is taken as the predicted target address and placed into PC 124 only if the value of the state for that entry exceeds the threshold, whereas for some embodiments this is done only if the value of the state is equal to or greater than the threshold.
The above determination involving the comparison of the state value to a threshold may be generalized as follows. The value provided in the TARGET field in the entry for which there is a hit is taken as the predicted target address and placed into PC 124 only if the value of the state in the STATE field of the entry belongs to some set of state values. In practice, this set of state values is a proper subset of the set of all possible state values. An example is given below.
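As a brief aside, the relationship between the two threshold variants and the generalized set-membership test can be sketched as follows. The four integer state values, the function names, and the chosen thresholds are assumptions made purely for illustration.

```python
ALL_STATES = frozenset(range(4))  # assumption: four state values, 0 through 3


def accepted_states(threshold: int, strict: bool) -> frozenset:
    """Return the state values for which the TARGET field would be used."""
    if strict:
        # Variant in which the state value must exceed the threshold.
        return frozenset(s for s in ALL_STATES if s > threshold)
    # Variant in which the state value must equal or exceed the threshold.
    return frozenset(s for s in ALL_STATES if s >= threshold)


def use_target(state: int, accepted: frozenset) -> bool:
    """Generalized test: use the stored target only if the state is in the set."""
    return state in accepted


exceeds = accepted_states(threshold=1, strict=True)
at_least = accepted_states(threshold=2, strict=False)
```

Under these assumptions both variants select the same proper subset of the state values, which is why the set-membership formulation covers either comparison.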
Suppose for the embodiment illustrated in
On the other hand, if the hardware prediction was wrong, that is, if it is found at a later time that the predicted target address is incorrect, then the state transition labeled 212 HW Incorrect is taken, indicating that the state transitions from the SH state to the WH state (204). Various pipelines will need to be flushed and program control needs to move back to the indirect branch instruction for which the target address was incorrectly predicted. Such techniques for handling a branch misprediction are well known in the art of processor architecture and need not be described in this specification because they are ancillary to the teaching of the disclosed embodiments.
Suppose for the embodiment illustrated in
On the other hand, if the hardware prediction was wrong, then the state transition labeled 216 HW Incorrect is taken, indicating that the state transitions from the WH state to the WS state (206).
Now suppose there is a hit on an entry in the BTAC table for which the state is the WS state (206). Then the value in the TARGET field of the BTAC table entry is ignored, and the target address suggested by the relevant software hint for the indirect branch instruction is taken as the predicted target address and placed into PC 124. If later it is determined that the predicted target address suggested by the software hint for the indirect branch instruction is indeed the correct target address, then the state transition for the state in the table entry for that indirect branch instruction is the state transition labeled 218 SW Correct in
On the other hand, if the software prediction was wrong, then the state transition labeled 220 SW Incorrect is taken, indicating that the state transitions from the WS state to the WH state.
Finally, suppose there is a hit on an entry in the BTAC table for which the state is the SS state (208). Then the value in the TARGET field of the BTAC table entry is ignored, and the target address suggested by the relevant software hint for the indirect branch instruction is taken as the predicted target address and placed into PC 124. If later it is determined that the predicted target address suggested by the software hint for the indirect branch instruction is indeed the correct target address, then the state transition for the state in the table entry for that indirect branch instruction is the state transition labeled 222 SW Correct in
On the other hand, if the software prediction was wrong, then the state transition labeled 224 SW Incorrect is taken, indicating that the state transitions from the SS state to the WS state.
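The state transitions walked through above can be collected into a small table. The sketch below is hedged: the four misprediction transitions (212, 216, 220, 224) are taken from the text, whereas the destinations of the four "correct" transitions are not recoverable from the text as given and are filled in here as assumptions, following the usual saturating pattern. The names `State`, `NEXT_STATE`, and `next_state` are hypothetical.

```python
from enum import Enum

State = Enum("State", "SH WH WS SS")

# States in which the TARGET field (hardware prediction) is used on a hit.
HARDWARE_STATES = frozenset({State.SH, State.WH})

# (current state, prediction was correct) -> next state
NEXT_STATE = {
    # Stated in the text:
    (State.SH, False): State.WH,  # 212 HW Incorrect
    (State.WH, False): State.WS,  # 216 HW Incorrect
    (State.WS, False): State.WH,  # 220 SW Incorrect
    (State.SS, False): State.WS,  # 224 SW Incorrect
    # ASSUMPTIONS (destinations not given in the text above):
    (State.SH, True): State.SH,   # HW Correct, assumed to stay in SH
    (State.WH, True): State.SH,   # HW Correct, assumed to strengthen to SH
    (State.WS, True): State.SS,   # 218 SW Correct, assumed to strengthen to SS
    (State.SS, True): State.SS,   # 222 SW Correct, assumed to stay in SS
}


def next_state(state: State, prediction_correct: bool) -> State:
    """Look up the next state after the prediction outcome is known."""
    return NEXT_STATE[(state, prediction_correct)]


# Two consecutive hardware mispredictions move an entry from SH to WS,
# after which the software hint is used instead of the TARGET field.
s = next_state(next_state(State.SH, False), False)
```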
In the example of
Alternatively, the state may be encoded by the following two-bit code: State_Value=00₂ for the SS state; State_Value=01₂ for the WS state; State_Value=10₂ for the WH state; and State_Value=11₂ for the SH state. The hardware prediction is taken only if the state value is such that State_Value≥10₂. In this case, the threshold as previously discussed is 10₂.
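With this two-bit encoding, the threshold test reduces to a single comparison, and the state update can be modeled as a saturating counter. The sketch below assumes that a correct prediction strengthens the current state and saturates at 00 or 11; that behavior, and the function names, are assumptions for illustration rather than something the encoding itself dictates.

```python
STATE_VALUE = {"SS": 0b00, "WS": 0b01, "WH": 0b10, "SH": 0b11}
THRESHOLD = 0b10


def take_hardware_prediction(state_value: int) -> bool:
    """Hardware prediction is taken only if the state value is >= the threshold."""
    return state_value >= THRESHOLD


def update(state_value: int, prediction_correct: bool) -> int:
    """Assumed saturating update of the two-bit state value.

    The counter moves toward 0b11 when hardware was used and was right, or
    when software was used and was wrong; otherwise it moves toward 0b00.
    """
    toward_hardware = take_hardware_prediction(state_value) == prediction_correct
    if toward_hardware:
        return min(state_value + 1, 0b11)
    return max(state_value - 1, 0b00)


after_hw_miss = update(STATE_VALUE["SH"], prediction_correct=False)
after_sw_miss = update(STATE_VALUE["WS"], prediction_correct=False)
```

The mispredict cases agree with the transitions described earlier: SH moves to WH, WH to WS, WS to WH, and SS to WS.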
The above example embodiment is easily generalized to systems employing more than four states.
If the assembler is actively providing software hints, but there is no hit in the BTAC table, then the processor system proceeds with software prediction. If the assembler is not providing software hints, then the processor system may use any well-known technique for hardware prediction, as well as no prediction if a hardware-based predicted target address is not available.
If software hinting is not active (the “N” branch of 304), then standard hardware prediction techniques follow. A determination is made as to whether there is a hit in the BTAC table (314). If there is a hit (the “Y” branch of 314), then the processor system proceeds with hardware prediction (312). If there is not a hit (the “N” branch of 314), then the processor system does not proceed with target address prediction (313).
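The overall selection logic described in the two preceding paragraphs can be summarized in a short sketch. The function name `choose_prediction`, the string return values, and the representation of states as strings are assumptions made for illustration.

```python
from typing import Optional

# States in which the hardware prediction is used when hinting is active.
HARDWARE_STATES = frozenset({"SH", "WH"})


def choose_prediction(hinting_active: bool, btac_hit: bool,
                      state: Optional[str] = None) -> str:
    """Return which prediction source is used: 'hardware', 'software', or 'none'."""
    if hinting_active:
        # Software hints are available: a BTAC hit qualifies the hint by state.
        if btac_hit and state in HARDWARE_STATES:
            return "hardware"
        # Either a miss, or a hit whose state favors the software hint.
        return "software"
    # No software hints: standard hardware prediction, if available.
    if btac_hit:
        return "hardware"
    return "none"


decisions = [
    choose_prediction(True, True, "SH"),
    choose_prediction(True, True, "WS"),
    choose_prediction(True, False),
    choose_prediction(False, True, "WH"),
    choose_prediction(False, False),
]
```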
An example of assembly language code for an ARM® processor containing a software-based branch instruction hint and an indirect branch instruction is provided in Table 1 below, where comments on the instructions follow the semicolon. (ARM is a trademark of ARM Ltd.) In the example of Table 1, the assembler has provided the instruction hint PLI indicating that the predicted target address for the indirect branch instruction BLX is the value stored in register R1. Note that the first instruction computes the value stored in register R1. However, the target address for the BLX instruction is easily predicted, for it is always the invariant value stored in register R1, so that in this example the hardware prediction should override the software prediction. This would be the case for the embodiments described above, so embodiments are expected to be more time and power efficient than prior art systems relying only upon software hints for examples of the type illustrated in Table 1.
Embodiments may find widespread application in numerous systems, such as a cellular phone network. For example,
Embodiments may be used in data processing systems associated with Communication Device 406, or with Base Station 404C, or both, for example.
Those of skill in the art will appreciate that information and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.
Further, those of skill in the art will appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The methods, sequences and/or algorithms described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor.
Accordingly, an embodiment of the invention can include a computer-readable medium embodying a method for qualifying software branch-target hints with hardware-based predictions. The invention is thus not limited to the illustrated examples, and any means for performing the functionality described herein are included in embodiments of the invention.
While the foregoing disclosure shows illustrative embodiments of the invention, it should be noted that various changes and modifications could be made herein without departing from the scope of the invention as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the embodiments of the invention described herein need not be performed in any particular order. Furthermore, although elements of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.