SYSTEM AND METHOD FOR CONTROLLING RESTARTING OF INSTRUCTION FETCHING USING SPECULATIVE ADDRESS COMPUTATIONS

Description

BACKGROUND OF THE INVENTION

This invention relates generally to improving computer system efficiency, and more particularly to controlling restarting of instruction fetching using speculative address computations in conjunction with a recycle queue.

Pipeline restarts are very costly in today's highly pipelined microprocessors. One type of pipeline restart is due to instruction stream changes by taken branches. Pipelined microprocessors can include branch prediction logic that attempts to anticipate a branch target address. Due to limitations, the branch prediction logic may miss a prediction. These surprise (non-predicted) guess taken (SGT) branches are detected based on instruction text decoded later in the pipeline, requiring a flush and restart at the correct branch target address. Since the restart is dependent on the computation of the target address, there can also be additional latency between the detection and restart point, further increasing the cycles per instruction (CPI) penalty. Modern pipelined microprocessors utilize branch prediction schemes to avoid these costly restart performance penalties. However, the efficient detection and resolution of SGT branches is still important, as there is an inherent “learning” period where the branch prediction logic primes branch target buffers (BTBs) to record branch addresses taken. This learning period is exposed at startup, where a branch is encountered for the first time, and also due to branch history capacity effects, where a learned branch may be removed and needs to be re-learned when later re-encountered.

In order to minimize the effect of these execution dependencies some microprocessors allow speculative execution in conjunction with a recycle mechanism. For example, speculation may be done on the result of data cache (D-Cache) accesses and addressing mode consistency. In cases where the speculation is incorrect, the operation/instruction is repeated/recycled. This allows a performance gain over always stalling the pipeline until the result (e.g., target address calculated utilizing D-Cache operand return) of an operation is known for certain, as in most cases the speculated result is correct. In such a scheme, the SGT target fetch is speculatively initiated as soon as a branch address is computed and later repeated/restarted if the branch is recycled. However, restarting instruction fetching on speculated address values can cause additional performance bottlenecks, as an instruction can be recycled multiple times before all speculations are resolved. Additionally, useful data in the instruction cache may be replaced with speculatively fetched data that may not actually be useful.

It would be beneficial to develop an approach to control restarting of instruction fetching using speculative address computations in a processor. Such an approach should limit repeated instruction fetching to cases where a wrong target value has been identified. Furthermore, modifications to higher levels of a cache memory hierarchy should be avoided when speculative instructions are not located in local cache. Accordingly, there is a need in the art for controlling restarting of instruction fetching using speculative address computations in a processor.

BRIEF SUMMARY OF THE INVENTION

An exemplary embodiment includes a system for controlling restarting of instruction fetching using speculative address computations in a processor. The system includes a predicted target queue to hold branch prediction logic (BPL) generated target address values. The system also includes target selection logic including a recycle queue. The target selection logic selects a saved branch target value between a previously speculatively calculated branch target value from the recycle queue and an address value from the predicted target queue. The system further includes a compare block to identify a wrong target in response to a mismatch between the saved branch target value and a current calculated branch target, where instruction fetching is restarted in response to the wrong target.

Another exemplary embodiment includes a method for controlling restarting of instruction fetching using speculative address computations in a processor. The method includes receiving a current calculated branch target value, and selecting a saved branch target value between a previously speculatively calculated branch target value in a recycle queue and a predicted target queue. The method also includes identifying a wrong target in response to a mismatch between the saved branch target value and the current calculated branch target, and restarting instruction fetching in response to the wrong target.

A further exemplary embodiment includes a system for controlling restarting of instruction fetching using speculative address computations in a processor. The system includes an instruction fetching unit (IFU) including branch prediction logic (BPL). The BPL generates address values for a predicted target queue. The system also includes an instruction decoding unit (IDU) including surprise (non-predicted) guess taken (SGT) detection logic, and an address generator (AGEN) to generate a calculated branch target value. The calculated branch target is compared against a previously utilized (for target fetching) branch target value, and in response to a miscompare, instruction fetching is restarted at the IFU.

BRIEF DESCRIPTION OF THE DRAWINGS

Referring now to the drawings wherein like elements are numbered alike in the several FIGURES:

FIG. 1 depicts a block diagram of a system for controlling restarting of instruction fetching using speculative address computations in accordance with an exemplary embodiment;

FIG. 2 depicts a block diagram of queues for generating a wrong target indication in accordance with an exemplary embodiment;

FIG. 3 depicts a block diagram of an instruction stream in a processor pipeline in accordance with an exemplary embodiment; and

FIG. 4 depicts a process for controlling restarting of instruction fetching using speculative address computations in accordance with an exemplary embodiment.

DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS

An exemplary embodiment of the present invention provides control for restarting of instruction fetching using speculative address computations. In a pipelined processor, branch prediction logic (BPL) predicts branch target addresses using a history of previously taken branches in an attempt to prevent pipeline stalls that can occur when a branch instruction causes a change in a fetching target address. Surprise guess taken (SGT) branches are guess taken branches that were not predicted by the BPL. Once detected, instruction fetching needs to wait on the calculation of the branch's target address before it can begin fetching the new target stream. Deeply pipelined in-order processors can experience a substantial latency if the pipeline stalls while waiting until all older dependencies are resolved. Speculative execution may be used to mitigate the impact of pipeline stalls, allowing the instructions to advance deeper in the pipeline before reacting. If the speculation is incorrect, the instruction can be recycled by returning it to an earlier position in the pipeline to correctly resolve the instruction. A given instruction may be recycled many times before it is finally executed with all speculations resolved.

As further described herein, the restart cost associated with re-fetching instructions can be further minimized by detecting cases where the branch target address utilized for a target fetch was incorrect and repeating the fetch only in those cases. This is accomplished utilizing a recycle queue that stores the computed targets of instructions at execution time for use upon potential recycle. The recycle queue is used in conjunction with existing wrong target (wrt) compares for traditional predicted target verification. This allows an early restart of instruction fetching as soon as the initial target address of the branch is computed. If, upon recycle, it is determined that the last address used to speculatively restart was wrong, a corrective restart is taken. Otherwise, there is no further restart, and the early fetch, based on a speculative result that has now been verified to be correct, removes a number of cycles from the restart penalty.
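The effect of repeating the fetch only on a wrong target can be illustrated by counting restarts over a recycle window. The following is a minimal sketch under stated assumptions, not the hardware itself: the function names and the example addresses are hypothetical, and each list element stands for the branch target computed on one pass of the same branch.

```python
def restarts_on_every_pass(targets_per_pass):
    """Baseline: fetching is restarted on each pass, changed target or not."""
    return len(targets_per_pass)


def restarts_on_wrong_target_only(targets_per_pass):
    """Restart on the first pass, then only when the recomputed target
    differs from the value saved in the (modeled) recycle queue entry."""
    recycle_queue_entry = None
    restarts = 0
    for target in targets_per_pass:
        if recycle_queue_entry is None or target != recycle_queue_entry:
            restarts += 1                 # early restart or corrective restart
            recycle_queue_entry = target  # remember the address last fetched
    return restarts


# Hypothetical recycle windows of three passes each.
correct_first_time = [0x2000, 0x2000, 0x2000]  # speculation was right
wrong_first_time = [0x9990, 0x2000, 0x2000]    # first target was dirty

print(restarts_on_every_pass(correct_first_time))         # 3
print(restarts_on_wrong_target_only(correct_first_time))  # 1
print(restarts_on_wrong_target_only(wrong_first_time))    # 2
```

In this model the common case (speculation correct) costs a single early restart regardless of how many times the branch is recycled.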

In a recycle window, all instructions may be recycled despite their relative dependencies and re-executed in order. For recycles due to data cache misses for instance, there are instructions that are known to be unaffected by the recycle because they are not dependent on the instruction with the cache miss. For these instructions, their target address should be the same both before and after recycle, allowing the use of the compare output as a self-correcting check for enhancing system reliability, availability, and serviceability (RAS). In other words, the recycle window is taken advantage of to obtain the RAS quality of having “N” additional redundant target address calculations, where N is the number of recycle passes, without the actual hardware cost.

In an exemplary embodiment, a configurable throttle mechanism is employed to stop instruction fetches from going beyond the first level (L1) of instruction caching. When a fetch request misses at a level in the cache hierarchy, a request is typically made to the next higher level, where it can also miss. This chain of requests may go all the way back to disk before a request is successful, with increasing latencies at every level. The retrieved line of data is then returned and installed at each level of the hierarchy back down to the L1. By ensuring that instruction fetches based on speculated results under recycle do not go out to the next higher level (L2), any negative performance impact of fetching on an incorrect target address is avoided. This is most useful in the pathological case where the incorrect target address misses the L1 and retrieves a line from memory that replaces the line containing the true restart target instruction. Then on completion of the recycle window, the true target address is fetched after a wrt restart and now misses in the L1 and potentially other levels, adding the associated penalty to the overall SGT restart latency. Using the configurable throttle mechanism, higher levels in the cache hierarchy can remain undisturbed, preventing an incorrectly fetched instruction from replacing data at a location in local cache that may be needed shortly.

Turning now to FIG. 1, a system 100, which represents a portion of a pipeline in a microprocessor, is depicted. The system 100 includes an instruction fetching unit (IFU) 102 that obtains instruction text from an instruction cache (I-Cache) 104 and delivers it to an instruction decode unit (IDU) 106, which parses and determines individual instructions. To execute instructions, a combination of an operand address generator unit (AGEN) 108, a load store unit (LSU) 110 which accesses data cache (D-Cache) 112, and one or more arithmetic units 114 are utilized. BPL resides in the front end of the IFU 102, detecting instruction stream modifying instructions (e.g., taken branches) and facilitating pipeline restarts with minimal penalties. SGT detection is later in the pipeline in the IDU 106 and has a dependency on later stages in the AGEN 108. There is a second level of caching (L2) 116 that connects the I-Cache 104 and D-Cache 112 to the rest of the system memory hierarchy (not depicted). Each of the functional blocks in the system 100 can encompass multiple pipeline stages and may support parallel execution of instruction groups, e.g., a superscalar architecture.

Instructions that are fetched in the system 100 can include branches that redirect the sequence of instructions executed. The BPL in the IFU 102 may predict that a branch will be taken and start fetching instructions at the predicted branch target address. When a branch is not detected by the BPL in the IFU 102, it is later detected as a surprise branch in the IDU 106. If this is a taken indirect branch, a pipeline restart cannot be completed until a target address is generated in the AGEN 108 and delivered to the IFU 102, creating costly multi-stage bubbles in the pipeline. Even at the time when the AGEN 108 performs address calculations, the restart address may not be correct due to instruction dependencies if speculative execution is allowed to minimize the restart penalty. For example, the target of a branch may be dependent on the result of an older load instruction and resolved by an address generation interlock (AGI). To minimize these penalties the pipeline in the system 100 is optimized to execute the dependent instruction as early as possible based on the result of the load instruction before knowing that the result is correct. If a miss occurs in accessing the D-Cache 112 for the load instruction, the resulting data delivered is unpredictable and can lead to dirty/incorrect calculated target addresses for branches that were speculatively executed on the result. In such a case, both the load and the branch are recycled. Various signals can be exchanged in the system 100 to communicate target address issues. For example, the IDU 106 declares detection of an SGT via signal 118 to the IFU 102. The LSU 110 can determine address issues associated with accessing the D-Cache 112 and output a reject signal 120 to both the IFU 102 and the AGEN 108. The AGEN 108 also outputs a restart address signal 122 to the IFU 102 to trigger restarting of instruction fetching.
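The load/branch dependency described above can be sketched in software. This is an illustrative model only: the function names are hypothetical, the D-Cache and memory are plain dictionaries, unpredictable miss data is modeled as the value 0, and the branch target is assumed to be the load result plus a displacement.

```python
def lsu_load(dcache, address):
    """Return (data, reject). On a miss the delivered data is unpredictable;
    it is modeled here as 0 (an assumption) and a reject is raised."""
    if address in dcache:
        return dcache[address], False
    return 0, True


def execute_load_then_branch(dcache, memory, load_addr, displacement):
    """Replay a load and its dependent branch until the load is not rejected.
    Returns the branch target calculated on each pass."""
    targets = []
    while True:
        data, reject = lsu_load(dcache, load_addr)
        # Address generation runs speculatively on the load result.
        targets.append(data + displacement)
        if not reject:
            return targets
        # The miss is serviced, then both the load and the branch recycle.
        dcache[load_addr] = memory[load_addr]


memory = {0x10: 0x7000}
targets = execute_load_then_branch({}, memory, load_addr=0x10, displacement=0x20)
print([hex(t) for t in targets])  # ['0x20', '0x7020']
```

The first pass yields a dirty target because it was computed from rejected data; the recycled pass yields the resolved target.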

FIG. 2 depicts a block diagram of queues for generating a wrt signal in accordance with an exemplary embodiment as part of the system 100 of FIG. 1. A recycle queue 202 receives calculated branch target 204 results based on speculative results of older instructions. The recycle queue 202 can receive the calculated branch target 204 values from the AGEN 108 of FIG. 1. These addresses may be used for early fetching of SGT branch targets. Saving them enables detection of cases where the initial calculated branch target 204 was incorrect by comparing the saved values against recomputed addresses under recycle. By definition, a branch cannot be both dynamically predicted and SGT. This supports utilization of pre-existing compare hardware in the system 100 of FIG. 1 to detect the wrt at minimal additional hardware cost. Target selection logic 206 includes a combination of the recycle queue 202 and a pre-wrt compare multiplexer 208. A wrt can be detected by compare block 210 as a mismatch between the calculated branch target 204 and a saved branch target value output from the target selection logic 206. A corrective restart on the calculated branch target 204 is requested for instruction fetching (I-Fetch) logic 212 in response to the wrt.

FIG. 2 also depicts an example of dual pipelined instruction decoding and execution. Decode pipe 0 branch information 214 and decode pipe 1 branch information 216 provide input to both the I-Fetch logic 212 and sequential branch information queue 218. In an exemplary embodiment, the sequential branch information queue 218 can determine whether a SGT occurred. For example, the sequential branch information queue 218 may be incorporated in the IDU 106 of FIG. 1, determining whether a branch taken was a surprise or not. If an SGT is detected, the pre-wrt compare multiplexer 208 outputs a value from the recycle queue 202 to the compare block 210; otherwise a value from a predicted target queue 220 is output to the compare block 210. The predicted target queue 220 can hold predicted branch target 222 values, as calculated by BPL in the IFU 102 of FIG. 1. The predicted branch target 222 values are also referred to as speculatively computed address values, which can be generated by the BPL in the IFU 102 of FIG. 1. If no SGT is detected, a value from the predicted target queue 220 is compared against the calculated branch target 204 to determine if the wrong target was fetched on the last pass of the branch. Therefore, it is not necessary to restart the I-Fetch 212 to re-fetch an instruction on each pass of recycling, but only if the wrong target is detected. This takes advantage of the most common cases of there being no need to recycle, or that the speculated address was correct and does not change under recycle.
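The selection between the recycle queue 202 and the predicted target queue 220, followed by the compare, can be modeled as follows. This is a sketch under stated assumptions rather than a description of the actual circuit: the class and function names are hypothetical, and the queues are modeled as dictionaries keyed by an assumed branch tag, which the text does not specify.

```python
class TargetSelectionLogic:
    """Models recycle queue 202 plus pre-wrt compare multiplexer 208."""

    def __init__(self):
        # Assumed indexing by a branch tag; the real queue organization
        # is not given in the text.
        self.recycle_queue = {}           # tag -> last speculatively calculated target
        self.predicted_target_queue = {}  # tag -> BPL-predicted target

    def select_saved_target(self, branch_tag, sgt_detected):
        """Multiplexer: recycle queue for SGT branches, else predicted queue."""
        source = self.recycle_queue if sgt_detected else self.predicted_target_queue
        return source[branch_tag]


def compare_block(saved_target, calculated_target):
    """Return True when a wrong target (wrt) is detected."""
    return saved_target != calculated_target


logic = TargetSelectionLogic()
logic.predicted_target_queue["b0"] = 0x4000  # predicted branch
logic.recycle_queue["b1"] = 0x5000           # SGT branch, earlier pass

wrt_predicted = compare_block(logic.select_saved_target("b0", False), 0x4000)
wrt_sgt = compare_block(logic.select_saved_target("b1", True), 0x5040)
print(wrt_predicted, wrt_sgt)  # False True
```

Because a branch is never both predicted and SGT, a single compare per branch suffices in this model, which mirrors the reuse of the existing compare hardware.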

The I-Fetch 212 may also include throttle logic 224. The throttle logic 224 can limit access to a higher level of memory upon a cache miss when instruction fetching is restarted. The throttle logic 224 of FIG. 2 blocks/throttles the I-Fetch 212 from going to the L2 cache 116 if there is an L1 miss (at I-Cache 104 and/or D-Cache 112). Therefore, if there is an incorrect calculated branch target 204 and the address/data happens to be in the I-Cache 104, it is delivered and later flushed on the corrective restart based on the wrt. However, if it is not in the L1, this mechanism avoids the unnecessary memory hierarchy activity and the potential performance degradation of removing a potentially useful line from the I-Cache 104 to bring in a line fetched on an incorrect calculated branch target 204.
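The benefit of throttling can be seen in a toy cache model. The sketch below is illustrative only: the class name, the four-set direct-mapped organization, and the line numbers are assumptions chosen so that the wrong target and the true target compete for the same L1 location.

```python
NUM_SETS = 4  # assumed tiny direct-mapped L1, for illustration only


class IFetchWithThrottle:
    def __init__(self):
        self.l1 = {}  # set index -> line address currently installed

    def install(self, line):
        self.l1[line % NUM_SETS] = line

    def fetch(self, line, throttle):
        if self.l1.get(line % NUM_SETS) == line:
            return "L1 hit"
        if throttle:
            return "throttled"  # miss is not forwarded to L2; L1 is unchanged
        self.install(line)      # line returned from L2 replaces the old one
        return "filled from L2"


# With throttling: the wrong-target fetch (line 12) cannot evict line 8.
throttled = IFetchWithThrottle()
throttled.install(8)
first = throttled.fetch(12, throttle=True)
second = throttled.fetch(8, throttle=False)
print(first, "/", second)  # throttled / L1 hit

# Without throttling: line 12 replaces line 8, so the true target misses.
unthrottled = IFetchWithThrottle()
unthrottled.install(8)
third = unthrottled.fetch(12, throttle=False)
fourth = unthrottled.fetch(8, throttle=False)
print(third, "/", fourth)  # filled from L2 / filled from L2
```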

FIG. 3 illustrates an exemplary timing diagram 300 with multiple groups of instructions 302 passing through various pipeline stages over a series of cycles. The pipeline stages may represent more detailed pipeline stages within the system 100 of FIG. 1. For example, pipeline stages can include D0-D3 to decode instructions (e.g., at IDU 106 of FIG. 1), G1-G3 to dispatch instructions, A0 to perform address generation (e.g., at AGEN 108 of FIG. 1), A1-A3 for cache access (e.g., at LSU 110 of FIG. 1), A4 to execute instructions (e.g., at arithmetic unit 114 of FIG. 1), A5 to put away results and recycle instructions on failure conditions, and A6-A7 to retire instructions. Writes to the sequential branch information queue 218 of FIG. 2 and potential SGT detection may occur in stage D3 as indicated at arrow 304. In an exemplary embodiment, the write of an entry to the recycle queue 202 of FIG. 2 occurs at stage A4, as indicated by arrow 306. At stage A2 310, occurring after writing the entry to the recycle queue 202 of FIG. 2, a SGT branch early target fetch may be detected as a wrt on recycle as indicated at arrow 308. Stage A3 316 corresponds to an instruction fetching restart 312 for the correct target following the wrt. By stage A7 314, the corresponding entry for the wrt determination is removed from the recycle queue 202.

Turning now to FIG. 4, a process 400 for controlling restarting of instruction fetching using speculative address computations in a processor will now be described in reference to FIGS. 1-3 and in accordance with an exemplary embodiment. At block 402, the process 400 starts. At block 404, a calculated branch target 204 is received. A check as to whether this is the first pass through the process 400 may be performed in block 406. If this is not the first pass, a compare is performed between the calculated branch target 204 and a value previously saved in the recycle queue 202 in block 410. If the values are equivalent, then no further actions need to be performed. However, if block 412 determines that the calculated branch target 204 and the value previously saved in the recycle queue 202 are not equivalent (wrt), a further check may be performed to determine if a possible mismatch or miscompare 414 occurred. The wrt may also initiate writing the calculated branch target 204 in the recycle queue 202 in block 408. The check for the possible miscompare 414 can analyze the state of received and other control information 416 to assist in making the determination. If a miscompare was possible or this is the first pass as determined by block 406, a further check is performed at block 418 to test for a SGT branch. If an SGT branch is detected, then block 420 restarts the target fetch (e.g., at I-Fetch 212 or IFU 102). However, if it was determined at block 414 that a miscompare was not possible, then an error is signaled rather than restarting the target fetch. A further check is performed at block 422 to determine if the restart of the target fetch in block 420 results in an L1 cache hit (e.g., D-Cache 112). If not, an L1 cache miss occurred, and a further check is performed at block 424 to determine if recycling is active. On an SGT branch target fetch or if recycling initiated the restart, throttling is performed at block 426 to prevent higher levels of the cache hierarchy from being modified as a result of restarting fetching; otherwise, the fetch is allowed to proceed to L2 cache 116 at block 428. This prevents altering higher levels of the memory hierarchy for fetching intermediate values that may not be correctly resolved.
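The decisions of process 400 can be summarized as two small functions. This is a hedged sketch, not the flowchart itself: the function names and return strings are hypothetical, an impossible miscompare is treated as an error in line with the RAS discussion of block 414, and the non-SGT outcome of block 418 is not detailed in the text, so it is returned as "unspecified" rather than guessed.

```python
def restart_decision(calculated, saved, first_pass, is_sgt, miscompare_possible):
    """Blocks 406-420: decide what one pass of a branch does."""
    if not first_pass:
        if calculated == saved:      # blocks 410/412: values equivalent
            return "no_action"
        if not miscompare_possible:  # block 414: mismatch should not happen
            return "signal_error"
    if is_sgt:                       # block 418
        return "restart_fetch"       # block 420
    return "unspecified"             # non-SGT path is not detailed in the text


def l1_miss_action(sgt_target_fetch, recycle_active):
    """Blocks 424-428: what happens when the restarted fetch misses the L1."""
    if sgt_target_fetch or recycle_active:
        return "throttle"            # block 426: do not go to L2
    return "proceed_to_l2"           # block 428


print(restart_decision(0x100, None, True, True, True))     # restart_fetch
print(restart_decision(0x100, 0x100, False, True, True))   # no_action
print(restart_decision(0x200, 0x100, False, True, True))   # restart_fetch
print(restart_decision(0x200, 0x100, False, True, False))  # signal_error
print(l1_miss_action(False, False))                        # proceed_to_l2
```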

RAS benefits can result from the knowledge that some instructions, being non-dependent on resources that can change under recycle, should have the same calculated target address through each pass of recycle. This check may be performed in block 414. If there is a miscompare, but one is not expected/possible, an error is signaled. This scheme provides multi-bit flip protection under numerous recycle conditions. For example, in a recycle due to address mode changes, the compare in block 414 can be isolated to the address range that is not affected by the addressing mode change. Thus, results of a mismatch in block 412 are further verified as a function of instruction type in block 414, prior to restarting the instruction fetching in block 420.
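One way to picture the check in block 414 is as a classification of each mismatch. The sketch below rests on several assumptions that are not in the text: the function name, the choice of the low 24 bits as the range unaffected by an addressing mode change, and the idea that dependency and recycle-cause information arrive as simple flags.

```python
UNAFFECTED_BITS = 24                # assumption: low 24 bits never change with mode
MASK = (1 << UNAFFECTED_BITS) - 1


def classify_mismatch(saved, calculated, depends_on_recycled, mode_change_recycle):
    """Classify a mismatch already found in block 412 (saved != calculated)."""
    if depends_on_recycled:
        return "wrong_target"       # target may legitimately change under recycle
    if mode_change_recycle:
        # Only bits outside the unaffected range may legitimately differ.
        if (saved & MASK) == (calculated & MASK):
            return "wrong_target"
        return "error"
    return "error"                  # non-dependent target must be stable


print(classify_mismatch(0x00123456, 0x80123456, False, True))  # wrong_target
print(classify_mismatch(0x00123456, 0x00123457, False, True))  # error
print(classify_mismatch(0x1000, 0x2000, True, False))          # wrong_target
print(classify_mismatch(0x1000, 0x1001, False, False))         # error
```

In this model, an "error" outcome corresponds to a miscompare that was not expected, such as a flipped address bit.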

To optimize for the restarting, the IFU 102 may initiate fetching before a branch's calculated target address 204 can be confirmed, i.e., past recycle. It should be noted that in most cases the calculated branch target 204 is correct, leading to an overall performance gain. This is because in most cases there is either no recycle or the initial calculated target addresses are correct (e.g., the branch is not necessarily dependent on a recycled instruction). However, because there can be an incorrectly calculated address, the case that the initial restart address is incorrect is handled via blocks 406-414.

It will be understood that the process 400 can be applied to any processing circuitry that incorporates a processor pipeline. For example, process 400 can be applied to various digital designs, such as a microprocessor, an application specific integrated circuit (ASIC), a programmable logic device (PLD), or other such digital devices capable of processing instructions. Therefore, the system 100 of FIG. 1 can represent a variety of digital designs that incorporate processing circuitry.

Technical effects and benefits include increased processing system efficiency by reducing delay penalties associated with waiting to fully resolve dependencies. For example, speculative restarting can remove five cycles from the restart penalty over waiting until the results are beyond the recycle point (e.g., L1 miss). Each recycle can add another five or more cycles to the base restart penalty. This added penalty is now only observed by the processor if the initial calculated branch target was incorrect, requiring a restart on recycle due to a wrong target. The wrong target restart point, though two cycles worse than the speculative restart point, can be at least three cycles better than waiting for the recycle point, per recycle. Another benefit includes increased RAS quality of the address generator stages. Using a recycle queue can enable multiple iterations of checking for mismatches between a currently calculated branch target and saved branch target values. Mismatches can be further verified to identify error conditions, such as single event upsets, that caused an address bit to change state, rather than a true SGT.
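The cycle counts quoted above are mutually consistent, as the following short calculation shows. The variable names are hypothetical, and the recycle point is taken as an arbitrary reference of zero cycles, so only the differences are meaningful.

```python
WAIT_FOR_RECYCLE_POINT = 0  # reference restart point, in cycles (arbitrary origin)

# Speculative restart removes five cycles relative to waiting.
SPECULATIVE_RESTART = WAIT_FOR_RECYCLE_POINT - 5

# A wrong target restart is two cycles worse than the speculative restart.
WRONG_TARGET_RESTART = SPECULATIVE_RESTART + 2

gain_over_waiting = WAIT_FOR_RECYCLE_POINT - WRONG_TARGET_RESTART
print(gain_over_waiting)  # 3
```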

While the invention has been described with reference to exemplary embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted for elements thereof without departing from the scope of the invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the invention without departing from the essential scope thereof. Therefore, it is intended that the invention not be limited to the particular embodiment disclosed as the best mode contemplated for carrying out this invention, but that the invention will include all embodiments falling within the scope of the appended claims. Moreover, the use of the terms first, second, etc. do not denote any order or importance, but rather the terms first, second, etc. are used to distinguish one element from another.

Claims

1. A system for controlling restarting of instruction fetching using speculative address computations in a processor, the system comprising: a predicted target queue to hold branch prediction logic (BPL) generated target address values; target selection logic including a recycle queue, the target selection logic selecting a saved branch target value between a previously speculatively calculated branch target value from the recycle queue and an address value from the predicted target queue; and a compare block to identify a wrong target in response to a mismatch between the saved branch target value and a current calculated branch target, wherein instruction fetching is restarted in response to the wrong target.
2. The system of claim 1 wherein the target selection logic selects the previously speculatively calculated branch target value from the recycle queue in response to detecting that a surprise branch was taken.
3. The system of claim 2 wherein an instruction decode unit determines that the surprise branch that was not predicted by branch prediction logic is guessed taken.
4. The system of claim 1 further comprising: throttle logic to limit access to a higher level of memory upon a cache miss when the instruction fetching is restarted.
5. The system of claim 4 wherein the throttle logic limits instruction cache (ICache) access in response to one of: a target computation of a surprise taken branch which is dependent on older recycled pipelined instructions and all speculatively repeated computations of the branch target; and a wrong target detected on a predicted taken branch target address in the predicted target queue based on a target computation which is dependent on older recycled pipelined instructions.
6. The system of claim 1 wherein the address value from the predicted target queue is computed by branch prediction logic.
7. The system of claim 1 wherein the mismatch is further verified as a function of instruction type prior to restarting the instruction fetching.
8. A method for controlling restarting of instruction fetching using speculative address computations in a processor, the method comprising: receiving a current calculated branch target value; selecting a saved branch target value between a previously speculatively calculated branch target value in a recycle queue and a predicted target queue; identifying a wrong target in response to a mismatch between the saved branch target value and the current calculated branch target; and restarting instruction fetching in response to the wrong target.
9. The method of claim 8 wherein the previously speculatively calculated branch target value is selected in response to detecting that a surprise branch was taken.
10. The method of claim 9 wherein an instruction decode unit determines if the surprise branch was taken.
11. The method of claim 8 further comprising: limiting access to a higher level of memory upon a cache miss when the instruction fetching is restarted.
12. The method of claim 11 wherein instruction cache (ICache) access is limited in response to one of: a target computation of a surprise taken branch which is dependent on older recycled pipelined instructions and all speculatively repeated computations of the branch target; and a wrong target detected on a predicted taken branch.
13. The method of claim 8 wherein the mismatch is further verified as a function of instruction type prior to restarting the instruction fetching.
14. The method of claim 8 further comprising: performing a corrective restart in response to determining that the last address used to speculatively restart is wrong.
15. A system for controlling restarting of instruction fetching using speculative address computations in a processor, the system comprising: an instruction fetching unit (IFU) including branch prediction logic (BPL), the BPL generating address values for a predicted target queue; an instruction decoding unit (IDU) including surprise guess taken (SGT) detection logic; and an address generator (AGEN) to generate a calculated branch target, wherein the calculated branch target is compared against one of the address values as a function of the SGT detection logic, and in response to a miscompare, instruction fetching is restarted at the IFU.
16. The system of claim 15 further comprising: a load store unit (LSU), wherein the LSU is capable of determining a data cache access miss and signaling a reject to recycle.
17. The system of claim 15 wherein the address values are stored in a predicted target queue and the previously speculatively calculated branch target is stored in a recycle queue.
18. The apparatus of claim 15 further comprising: throttle logic to limit access to a higher level of memory upon a cache miss when the instruction fetching is restarted.
19. The system of claim 18 wherein the throttle logic limits instruction cache (ICache) access in response to one of: a target computation of a surprise taken branch which is dependent on older recycled pipelined instructions and all speculatively repeated computations of the branch target; and a wrong target detected on a predicted taken branch target address in the predicted target queue based on a target computation which is dependent on older recycled pipelined instructions.
20. The system of claim 19 wherein the miscompare is further verified as a function of instruction type prior to restarting the instruction fetching.
