Embodiments of the present invention relate generally to prediction techniques, and may be applied more specifically to branch prediction for processors.
Processor design is typically an exercise in trading off performance, power consumption, and efficiency. Techniques that do not require making this tradeoff, that is, that provide an advantage for all three metrics, are highly desirable because they can give a design an advantage over competing designs. Better branch prediction is such a technique. It increases performance by reducing the time spent speculating on a mispredicted path, reduces power consumption by allowing the processor to run at a lower frequency (and hence voltage) and still meet its performance target, and increases efficiency by reducing the work wasted on misspeculation.
Thus a need exists for improved prediction techniques that may be applied to processor branch prediction and other areas.
Various embodiments of the present invention are illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements.
A method, apparatus, system, and article for a prophet/critic hybrid predictor are described. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of embodiments of the invention. It will be apparent, however, to one skilled in the art that embodiments of the invention can be practiced without these specific details. In other instances, structures and devices are shown in block diagram form in order to avoid obscuring embodiments of the invention.
Reference in the specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment.
In the execution of software instructions, processors encounter numerous branches. For example, software instructions may include a conditional branch to a subroutine if a variable has a certain value; otherwise, execution continues sequentially along the current instruction path. To increase performance, modern processors speculatively pre-fetch and execute software instructions to avoid wasting the processor's time waiting for instructions to execute and to keep the processor busy. Pre-fetching instructions along the correct path is critical to keeping the processor busy doing useful work. Branch instructions (e.g., conditional branch instructions) pose the challenge of predicting which branch will be taken when the processor executes the software such that instructions associated with the correct branch (i.e., instructions in the correct instruction path) can be pre-fetched for later execution by the processor. If instructions in the incorrect branch path (i.e., instructions following a branch that was mispredicted) are pre-fetched, then time may be wasted in speculatively executing instructions along an incorrect instruction path. In this case, the incorrect instructions may need to be flushed and the process may need to be repaired back to the correct branch path. Thus, accurate branch prediction is important to processor performance.
Some embodiments of the present invention will initially be described by drawing an analogy between a processor executing a software program and taking a ride in a taxi. The taxi is the processor, the driver is the branch predictor, and the passenger is the processor's pipeline. The system of roads represents the paths through the software program. The intersections are branches; that is, points where the driver must decide a particular path to follow. It is the driver's (branch predictor's) job to navigate the taxi through the system of roads, making the correct turns at intersections (branches), to get to the destination (correct point in the software program). Wrong turns waste the passenger's time (incorrect branch predictions waste the processor pipeline's time).
Conventional branch predictors are analogous to a taxi with just one driver. The taxi driver (branch predictor) gets the passenger to the destination using knowledge of the roads acquired from previous trips; i.e., using branch history information stored in the branch predictor's memory structures. When the taxi driver (branch predictor) reaches an intersection (branch), he uses the historical knowledge to decide which way to turn. The driver accesses this knowledge in the context of his current location. Conventional branch predictors access branch history information in the context of the current location (e.g., the program counter) plus a history of the most recent branch decisions that led to the current location.
The prophet/critic hybrid predictors of various embodiments of the present invention are analogous to a taxi with two drivers: a front-seat driver and a back-seat driver. The front-seat driver has the same role as the driver in the single-driver taxi. This role is called the prophet (or prophet predictor). The back-seat driver has the role of a critic (or critic predictor). The critic watches the turns (branch predictions) the prophet makes at intersections (branches) but may not say anything unless the prophet made a bad turn (incorrect branch prediction). When the critic thinks the prophet made a bad turn, the critic may wait until the prophet makes a few more turns (additional branch predictions) to be confident they are lost before saying anything.
Conventional branch predictors may make predictions using branch history information. Once a branch has been predicted, the predictor cannot use the information from subsequent predictions to re-predict the branch. In contrast, embodiments of the prophet/critic hybrid predictor of the present invention may use information from subsequent predictions to improve prediction accuracy.
Referring now to
The prophet predictor 100 may use branch history 102 to predict the direction of a current branch (e.g., taken or not taken). The BUP's branch history 102 for the example shown in
Still referring to
Sometime after the prophet predictor 100 has moved on to predict branches that follow the BUP, the critic predictor 110 may make its own critic prediction for the BUP 116 based on the BUP history+future 112. The critic prediction for the BUP 116 may also be referred to as a “critique” of the prophet's prediction for the BUP 106. The critique 116, whether it agrees or disagrees with the prophet's prediction 106, may be used to generate a final branch prediction for the BUP 120. In one embodiment, the critic prediction for the BUP 116 may be the final prediction for the BUP 120. In another embodiment, the critic prediction for the BUP 116 may be combined with the prophet prediction for the BUP 106 to generate the final prediction for the BUP 120. In one embodiment, the critic prediction for the BUP 116 may be a single bit indicating agreement or disagreement with the prophet and the bit may be exclusive ORed (XORed) with the prophet prediction for the BUP 106 to generate the final prediction for the BUP 120.
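The XOR combination described above can be sketched in a few lines. This is a minimal illustration only: the function name and the bit encoding (a critique bit of 1 meaning "disagree") are assumptions made for the example and are not taken from the specification.

```python
def final_prediction(prophet_taken: bool, critic_disagrees: bool) -> bool:
    """Combine a prophet prediction with a one-bit critique.

    Assumed encoding (illustrative): critic_disagrees is True when the
    critic disagrees with the prophet. XOR then leaves the prophet's
    prediction unchanged on agreement and inverts it on disagreement.
    """
    return prophet_taken ^ critic_disagrees


# Agreement keeps the prophet's direction; disagreement flips it.
agreed = final_prediction(prophet_taken=True, critic_disagrees=False)
overridden = final_prediction(prophet_taken=True, critic_disagrees=True)
```

Under this assumed encoding, a critic that never disagrees reduces the hybrid to the prophet alone.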
Still referring to
Referring now to
For the example shown in
In one embodiment, the branch history register 202 and branch outcome registers 212 may store one bit per branch and may use a 0 bit to represent a branch that is “not taken” and a 1 bit to represent a branch that is “taken.”
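One way to picture a one-bit-per-branch register is as a fixed-width shift register. The sketch below is a software illustration under stated assumptions: the class name, the shift-left update, and the width parameter are hypothetical, and the text above specifies only the 0/1 encoding.

```python
class BranchBitRegister:
    """Fixed-width register holding one bit per branch (1 = taken, 0 = not taken)."""

    def __init__(self, width: int) -> None:
        self.width = width
        self.bits = 0  # most recent branch occupies the least significant bit

    def push(self, taken: bool) -> None:
        # Assumed update rule: shift left, insert the new bit at the low end,
        # and drop the oldest bit once the register is full.
        self.bits = ((self.bits << 1) | int(taken)) & ((1 << self.width) - 1)

    def as_string(self) -> str:
        # Oldest branch on the left, newest on the right.
        return format(self.bits, f"0{self.width}b")


reg = BranchBitRegister(width=3)
for outcome in (True, False, True, True):
    reg.push(outcome)
```

After the four pushes, the oldest outcome has been shifted out and the register holds the three most recent bits.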
In one embodiment, the prophet predictor 200 may be based on any one of a variety of branch predictors that predict branches based on branch history information 202 and/or a program counter 220. In one embodiment, the critic predictor 210 may predict branches based on branch future information. In another embodiment, the critic predictor 210 may predict branches based on branch future information and branch history information. In another embodiment, the critic predictor 210 may predict branches based on branch future information and the program counter 220. In another embodiment, the critic predictor 210 may predict branches based on branch future information, branch history, and the program counter 220.
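As an illustration of a predictor that uses both branch history and the program counter, the following sketch implements a gshare-style predictor, a well-known scheme that indexes a table of two-bit saturating counters with the program counter XORed with the global history. It is offered only as one example of the kind of predictor a prophet could be based on; the specification does not name gshare, and the class name, table size, and initial counter value are assumptions.

```python
class GsharePredictor:
    """Gshare-style predictor: 2-bit counters indexed by (pc XOR history)."""

    def __init__(self, index_bits: int = 10) -> None:
        self.mask = (1 << index_bits) - 1
        # Counters range over 0..3; start at 1 (weakly not taken).
        self.counters = [1] * (1 << index_bits)
        self.history = 0  # global branch history, 1 = taken

    def _index(self, pc: int) -> int:
        return (pc ^ self.history) & self.mask

    def predict(self, pc: int) -> bool:
        # Predict taken when the counter is in its upper half.
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        # Shift the resolved outcome into the history register.
        self.history = ((self.history << 1) | int(taken)) & self.mask


predictor = GsharePredictor(index_bits=4)
for _ in range(20):
    predictor.update(pc=0x8, taken=True)
```

A critic could be indexed in a similar way with future bits included in the hash, but that extension is not shown here.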
Still referring to
Referring now to
In one embodiment, new entries may be added into the tag table 460 when a branch under prediction misses the filter 460 and the branch is also mispredicted by the prophet predictor. When an entry needs to be added to the tag table 460, a tag 462 may be added in two steps. First, the branch address 450 and the branch outcome register 412 values may be combined according to the first hash function 452 (or other suitable algorithm) to generate an address for the new tag 462. Second, the branch address 450 and the branch outcome register 412 may be hashed according to the second hash function 492 (or other suitable algorithm) to generate the key 496 to be stored as the new tag 462. In this manner, a new tag 462 may be generated for a mispredicted branch so that the next time that branch is encountered, the filtered critic's 400 prediction 416 will be used for the branch. In one embodiment, replacement of existing tags 462 in the filter or tag table 460 is managed according to a least-recently-used (LRU) replacement algorithm.
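The two-step tag insertion and LRU replacement described above might be modeled in software roughly as follows. This is a sketch under assumptions: the set-associative organization, the table dimensions, and both hash functions are hypothetical stand-ins, since the text leaves the hash functions open ("or other suitable algorithm").

```python
class TagFilter:
    """Tag table deciding whether the critic's prediction is used for a branch."""

    def __init__(self, num_sets: int = 64, ways: int = 2) -> None:
        self.num_sets = num_sets
        self.ways = ways
        # Each set lists its tags from least to most recently used.
        self.sets = [[] for _ in range(num_sets)]

    def _index(self, addr: int, bor: int) -> int:
        # Hypothetical first hash: selects where the tag lives.
        return (addr ^ bor) % self.num_sets

    def _key(self, addr: int, bor: int) -> int:
        # Hypothetical second hash: produces the value stored as the tag.
        return ((addr >> 4) ^ (bor * 31)) & 0xFF

    def hit(self, addr: int, bor: int) -> bool:
        tags = self.sets[self._index(addr, bor)]
        key = self._key(addr, bor)
        if key in tags:
            tags.remove(key)
            tags.append(key)  # mark as most recently used
            return True
        return False

    def train(self, addr: int, bor: int, filter_hit: bool,
              prophet_mispredicted: bool) -> None:
        # Allocate only on a filter miss that the prophet also mispredicted.
        if filter_hit or not prophet_mispredicted:
            return
        tags = self.sets[self._index(addr, bor)]
        key = self._key(addr, bor)
        if key in tags:
            return
        if len(tags) >= self.ways:
            tags.pop(0)  # evict the least recently used tag
        tags.append(key)
```

In this model, a branch whose address and outcome-register value hit in the filter would take the critic's prediction, and any other branch would keep the prophet's prediction.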
Referring now to
The prophet/critic hybrid branch predictor (300, 310) may use a branch target buffer to identify conditional branches. When a conditional branch is identified by the branch target buffer, the prophet 300 may make an initial prediction and insert it into the fetch target queue 330. This prediction may be immediately consumed by the instruction cache 340, but since insertions occur at the end of the fetch target queue 330 and the fetch target queue 330 is usually full, the prophet's 300 prediction usually spends many cycles in the fetch target queue before it is consumed by the instruction cache 340. When the prophet's 300 prediction is inserted in the fetch target queue 330, it may also be inserted in the critic's 310 branch outcome register 312 as a future bit for branches previously predicted by the prophet 300. As subsequent predictions are inserted in the fetch target queue 330 by the prophet 300, the critic 310 may gather them as future bits for its branch under prediction (BUP). In one embodiment, when the critic 310 has gathered a predetermined number of future bits (or branch future information) for its branch under prediction, it provides a critique 316 of the prophet's 300 prediction for the BUP.
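The gathering of future bits described above can be sketched as a small queue model. The class and method names are hypothetical, and the sketch assumes that the future bits for a branch under prediction are the prophet's predictions for the branches that follow it; the fetch target queue and instruction cache are not modeled.

```python
from collections import deque


class FutureBitCollector:
    """Collects prophet predictions until the oldest one has enough future bits."""

    def __init__(self, num_future_bits: int) -> None:
        self.num_future_bits = num_future_bits
        self.pending = deque()  # prophet predictions, oldest first

    def on_prophet_prediction(self, taken: bool):
        """Record a new prophet prediction.

        Returns (bup_prediction, future_bits) once the oldest pending branch
        has gathered the predetermined number of future bits, else None.
        """
        self.pending.append(taken)
        if len(self.pending) > self.num_future_bits:
            bup_prediction = self.pending.popleft()
            # The remaining entries are the predictions made after the BUP.
            return bup_prediction, tuple(self.pending)
        return None


collector = FutureBitCollector(num_future_bits=2)
results = [collector.on_prophet_prediction(t) for t in (True, False, True, False)]
```

With two future bits required, the first two predictions produce no critique opportunity, and each later prediction releases the oldest pending branch together with the two predictions that followed it.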
Still referring to
The critique 316 is usually provided well before the prediction is consumed by the instruction cache 340. However, there may be cases where the instruction cache 340 requires a prediction but the critic 310 has not gathered the predetermined number of future bits. To address this situation, the critic 310 may provide a critique 316 of the prophet's 300 prediction using the available future bits, or the prophet's 300 prediction can be passed to the instruction cache 340 without having been critiqued by the critic 310.
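The two fallback options described above can be expressed as a simple selection policy. This is an illustrative sketch: the function name, the `critique_partial` flag, and the representation of the critic as a callable that returns the final prediction are assumptions made for the example.

```python
from typing import Callable, Sequence


def prediction_for_icache(
    prophet_taken: bool,
    future_bits: Sequence[bool],
    required_bits: int,
    critic: Callable[[bool, Sequence[bool]], bool],
    critique_partial: bool,
) -> bool:
    """Choose the prediction handed to the instruction cache.

    If enough future bits are available, or the policy allows critiquing
    with whatever bits exist, the critic is consulted. Otherwise the
    prophet's prediction passes through without a critique.
    """
    if len(future_bits) >= required_bits or critique_partial:
        return critic(prophet_taken, future_bits)
    return prophet_taken


def toy_critic(prophet_taken: bool, future_bits: Sequence[bool]) -> bool:
    # Stand-in critic that always disagrees, used only to show the control flow.
    return not prophet_taken


uncritiqued = prediction_for_icache(True, [True], 2, toy_critic, critique_partial=False)
partial = prediction_for_icache(True, [True], 2, toy_critic, critique_partial=True)
```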
Referring now to
Embodiments may be implemented in logic circuits, state machines, microcode, or some combination thereof. Embodiments may be implemented in code and may be stored on a storage medium having stored thereon instructions that can be used to program a computer system to perform the instructions. The storage medium may include, but is not limited to, any type of disk including floppy disks, optical disks, compact disk read-only memories (CD-ROMs), compact disk rewritables (CD-RWs), and magneto-optical disks, semiconductor devices such as read-only memories (ROMs), random access memories (RAMs), dynamic random access memories (DRAMs), erasable programmable read-only memories (EPROMs), flash memories, electrically erasable programmable read-only memories (EEPROMs), magnetic or optical cards, network storage devices, or any type of media suitable for storing electronic instructions.
Embodiments may be implemented in software for execution by a suitable computer system configured with a suitable combination of hardware devices.
Referring now to
The processor 610 may include a branch predictor 612 which may be implemented according to any embodiment of the hybrid prophet/critic predictor of the present invention.
The processor 610 may be coupled over a host bus 615 to a memory hub 630 in one embodiment, which may be coupled to a system memory 620 (e.g., a dynamic RAM) via a memory bus 625. The memory hub 630 may also be coupled over an Accelerated Graphics Port (AGP) bus 633 to a video controller 635, which may be coupled to a display 637. The AGP bus 633 may conform to the Accelerated Graphics Port Interface Specification, Revision 2.0, published May 4, 1998, by Intel Corporation, Santa Clara, Calif.
The memory hub 630 may also be coupled (via a hub link 638) to an input/output (I/O) hub 640 that is coupled to an input/output (I/O) expansion bus 642 and a Peripheral Component Interconnect (PCI) bus 644, as defined by the PCI Local Bus Specification, Production Version, Revision 2.1 dated June 1995. The I/O expansion bus 642 may be coupled to an I/O controller 646 that controls access to one or more I/O devices. As shown in
The PCI bus 644 may also be coupled to various components including, for example, a network controller 660 that is coupled to a network port (not shown). Additional devices may be coupled to the I/O expansion bus 642 and the PCI bus 644, such as an input/output control circuit coupled to a parallel port, serial port, a non-volatile memory, and the like.
Thus, a method, apparatus, system, and article for a hybrid prophet/critic predictor have been described. While the present invention has been described with respect to a limited number of embodiments, those skilled in the art, having the benefit of this disclosure, will appreciate numerous modifications and variations therefrom. It is intended that the appended claims cover all such modifications and variations as fall within the true spirit and scope of the present invention.