I. Field of the Disclosure
The technology of the disclosure relates generally to processing memory instructions in an out-of-order (OOO) computer processor, and, in particular, to avoiding re-fetching and re-executing instructions due to hazards.
II. Background
Out-of-order (OOO) processors are computer processors that are capable of executing computer program instructions in an order determined by an availability of each instruction's input operands, regardless of the order of appearance of the instructions in a computer program. By executing instructions out-of-order, an OOO processor may be able to fully utilize processor clock cycles that would otherwise be wasted while the OOO processor waits for data access operations to complete. For example, instead of having to “stall” (i.e., intentionally introduce a processing delay) while input data is retrieved for an older program instruction, the OOO processor may proceed with executing a more recently fetched instruction that is able to execute immediately. In this manner, processor clock cycles may be more productively utilized by the OOO processor, resulting in an increase in the number of instructions that the OOO processor is capable of processing per processor clock cycle.
However, out-of-order execution of memory instructions may result in the occurrence of “punts.” Punts are circumstances in which one or more memory instructions must be re-fetched and re-executed due to a detected hazard. For example, a punt may result from an occurrence of a read-after-write (RAW) hazard, a read-after-read (RAR) hazard, and/or a resource constraint hazard such as a lack of available load queue entries or store queue entries, as non-limiting examples. Re-fetching and re-execution of memory instructions may reduce processor performance and result in greater power consumption.
Aspects disclosed in the detailed description include predicting memory instruction punts in a computer processor using a punt avoidance table (PAT). In this regard, in one aspect, an instruction processing circuit in a computer processor accesses a PAT for predicting and preempting memory instruction punts. As used herein, a “punt” refers to a process of re-fetching and re-executing a memory instruction and one or more older memory instructions in a computer processor, in response to a hazard condition arising from out-of-order execution of the memory instruction. The PAT contains one or more entries, each comprising an address of a memory instruction that was previously executed out-of-order and that resulted in a memory instruction punt. During execution of a computer program, an instruction processing circuit detects a memory instruction in an instruction stream, and determines whether the PAT contains an entry having an address corresponding to the memory instruction. If the PAT contains an entry having an address corresponding to the memory instruction, the instruction processing circuit may preempt a punt by preventing the detected memory instruction from taking effect before at least one pending memory instruction older than the detected memory instruction. As non-limiting examples, the instruction processing circuit in some aspects may perform an in-order dispatch of the at least one pending memory instruction older than the detected memory instruction, or may prevent an early return of data by the detected memory instruction until the at least one pending memory instruction older than the detected memory instruction has completed. In this manner, the instruction processing circuit may reduce the occurrence of memory instruction punts, thus providing improved processor performance.
Further, in some exemplary aspects in which the hazard encountered by the instruction processing circuit is a read-after-write (RAW) hazard, the instruction processing circuit may prevent the detected memory instruction from taking effect before any pending memory store instructions older than the detected memory instruction. As another exemplary aspect, when the hazard encountered by the instruction processing circuit is a read-after-read (RAR) hazard, the instruction processing circuit may prevent the detected memory instruction from taking effect before any pending memory load instructions older than the detected memory instruction. For aspects in which the hazard is a resource constraint hazard, the instruction processing circuit may prevent the detected memory instruction from taking effect before any pending memory instructions older than the detected memory instruction.
In another aspect, an instruction processing circuit in an OOO computer processor is provided. The instruction processing circuit is communicatively coupled to a front-end circuit of an execution pipeline, and comprises a PAT providing a plurality of entries. The instruction processing circuit is configured to prevent a detected memory instruction from taking effect before at least one pending memory instruction older than the detected memory instruction to preempt a memory instruction punt, responsive to determining that an address of the detected memory instruction is present in an entry of the plurality of entries of the PAT.
In another aspect, an instruction processing circuit is provided in an OOO computer processor. The instruction processing circuit comprises a means for providing a plurality of entries in a PAT. The instruction processing circuit also comprises a means for preventing a detected memory instruction from taking effect before at least one pending memory instruction older than the detected memory instruction to preempt a memory instruction punt, responsive to determining that an address of the detected memory instruction is present in an entry of the plurality of entries of the PAT.
In another aspect, a method for predicting memory instruction punts is provided. The method comprises detecting, in an instruction stream, a memory instruction. The method further comprises determining whether an address of the detected memory instruction is present in an entry of a PAT. The method also comprises, responsive to determining that the address of the detected memory instruction is present in the entry, preventing the detected memory instruction from taking effect before at least one pending memory instruction older than the detected memory instruction, to preempt a memory instruction punt.
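The three steps of this method can be illustrated with a minimal Python sketch. It is not the disclosed circuit. It assumes that the PAT is modeled as a set of instruction addresses, and the `Instr` type, its fields, and the returned action labels are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    address: int     # hypothetical: instruction address (e.g., program counter value)
    is_memory: bool  # hypothetical: True for memory load/store instructions


def handle_detected(instr: Instr, pat: set) -> str:
    """Return a hypothetical action label for an instruction in the stream."""
    if not instr.is_memory:
        return "normal"          # only memory instructions are looked up in the PAT
    if instr.address in pat:
        return "wait_for_older"  # PAT hit: do not take effect before older pending memory ops
    return "normal"              # PAT miss: allow conventional out-of-order processing


pat = {0x1008}  # address of a memory instruction that previously punted
print(handle_detected(Instr(0x1008, True), pat))  # wait_for_older
print(handle_detected(Instr(0x100C, True), pat))  # normal
```

In hardware, the lookup would be performed by the instruction processing circuit as instructions are fetched. The set here stands in for whatever associative structure an implementation might use.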
In another aspect, a non-transitory computer-readable medium is provided, having stored thereon computer-executable instructions, which when executed by a processor, cause the processor to detect, in an instruction stream, a memory instruction. The computer-executable instructions stored thereon further cause the processor to determine whether an address of the detected memory instruction is present in an entry of a PAT. The computer-executable instructions stored thereon also cause the processor to, responsive to determining that the address of the detected memory instruction is present in the entry, prevent the detected memory instruction from taking effect before at least one pending memory instruction older than the detected memory instruction, to preempt a memory instruction punt.
With reference now to the drawing figures, several exemplary aspects of the present disclosure are described. The word “exemplary” is used herein to mean “serving as an example, instance, or illustration.” Any aspect described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects.
Aspects disclosed in the detailed description include predicting memory instruction punts in a computer processor using a punt avoidance table (PAT). In this regard,
The OOO computer processor 100 includes a memory interface circuit 106, an instruction cache 108, and a load/store unit 110 comprising a data cache 112 and a load/store queue 114. In some aspects, the data cache 112 may comprise an on-chip Level 1 (L1) data cache, as a non-limiting example. The OOO computer processor 100 further comprises an execution pipeline 116 that includes the instruction processing circuit 102. The instruction processing circuit 102 provides a front-end circuit 118, an execution unit 120, and a completion unit 122. The OOO computer processor 100 additionally includes registers 124, which comprise one or more general purpose registers (GPRs) 126, a program counter 128, and a link register 130. In some aspects, such as those employing the ARM® ARM7™ architecture, the link register 130 is one of the GPRs 126, as shown in
In exemplary operation, the front-end circuit 118 of the execution pipeline 116 fetches instructions (not shown) from the instruction cache 108, which in some aspects may be an on-chip Level 1 (L1) cache, as a non-limiting example. The fetched instructions are decoded by the front-end circuit 118 and issued to the execution unit 120. The execution unit 120 executes the issued instructions, and the completion unit 122 retires the executed instructions. In some aspects, the completion unit 122 may comprise a write-back mechanism (not shown) that stores the execution results in one or more of the registers 124. It is to be understood that the execution unit 120 and/or the completion unit 122 may each comprise one or more sequential pipeline stages. In the example of
While processing instructions in the execution pipeline 116, the instruction processing circuit 102 may execute memory instructions, such as memory load instructions and/or memory store instructions, in an order that is different from the program order in which the instructions are fetched. As a result, under some circumstances, the out-of-order execution of memory instructions may result in the occurrence of memory instruction “punts,” in which a memory instruction and one or more older memory instructions must be re-fetched and re-executed due to a detected hazard. For example, a younger memory load instruction executed prior to an older memory store instruction to the same memory address may result in a RAW hazard, thereby requiring the memory load instruction and the memory store instruction to be re-fetched and re-executed. Similarly, a younger memory load instruction executed prior to an older memory load instruction to the same memory address may cause a RAR hazard to occur, necessitating the re-fetching and re-executing of both memory load instructions. In some aspects, younger memory load instructions may consume all of an available resource (e.g., load queue entries (not shown) or store queue entries (not shown), as non-limiting examples), preventing older memory instructions from executing, and thereby requiring all of the pending memory instructions to be re-fetched and re-executed. In each of these circumstances, the re-fetching and re-execution of memory instructions may negatively affect processor performance and may result in greater power consumption.
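To make the RAW and RAR cases concrete, the following Python sketch scans a small trace of executed memory instructions. It reports each younger load that executed before an older instruction to the same memory address. This is a simplified software model that assumes program order is given by a sequence number and that each instruction records the cycle in which it executed. The `MemOp` type and `find_hazards` function are hypothetical, and resource constraint hazards are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemOp:
    seq: int         # program order: lower value = older instruction
    kind: str        # "load" or "store"
    addr: int        # data address accessed
    exec_cycle: int  # cycle in which the instruction executed


def find_hazards(ops):
    """Return (hazard, older_seq, younger_seq) for every younger load that
    executed before an older load or store to the same data address."""
    hazards = []
    for older in ops:
        for younger in ops:
            if (younger.seq > older.seq
                    and younger.kind == "load"
                    and younger.addr == older.addr
                    and younger.exec_cycle < older.exec_cycle):
                hazard = "RAW" if older.kind == "store" else "RAR"
                hazards.append((hazard, older.seq, younger.seq))
    return hazards


trace = [
    MemOp(seq=0, kind="store", addr=0x40, exec_cycle=5),
    MemOp(seq=1, kind="load", addr=0x80, exec_cycle=2),
    MemOp(seq=2, kind="load", addr=0x40, exec_cycle=3),  # ran ahead of the older store
    MemOp(seq=3, kind="load", addr=0x80, exec_cycle=1),  # ran ahead of the older load
]
print(find_hazards(trace))  # [('RAW', 0, 2), ('RAR', 1, 3)]
```

In each reported pair, both the older and the younger instruction would be re-fetched and re-executed in the scenario described above.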
In this regard, the instruction processing circuit 102 of
The instruction processing circuit 102 determines whether an address of the memory instruction being fetched is present in an entry of the PAT 104. If the address of the memory instruction is found in an entry of the PAT 104 (i.e., a “hit”), it may be concluded that a previous out-of-order execution of the memory instruction resulted in a punt, and that a subsequent out-of-order execution of the memory instruction is likely to do so again. To preemptively preclude the possibility of a punt, the instruction processing circuit 102 prevents the detected memory instruction from taking effect (e.g., from being dispatched out-of-order and/or from providing an early return of data) before the at least one pending memory instruction older than the detected memory instruction. As non-limiting examples, the instruction processing circuit 102 in some aspects may perform an in-order dispatch of the at least one pending memory instruction older than the detected memory instruction, or may prevent an early return of data by the detected memory instruction until the at least one pending memory instruction older than the detected memory instruction has completed. According to some aspects, the instruction processing circuit 102 may prevent the early return of data by the detected memory instruction by adding one or more attributes (not shown) to the detected memory instruction. These attributes may indicate that an early return of data (e.g., from the data cache 112) for the detected memory instruction is to be blocked, and that the detected memory instruction should instead wait for all older memory operation hazards to be resolved.
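A minimal sketch of the attribute-based approach follows. It assumes the attribute is a single Boolean flag on a hypothetical `MemUop` record and that the load/store unit consults this flag before forwarding data from the data cache. The flag name and helper functions are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class MemUop:
    seq: int                          # program order: lower value = older
    block_early_return: bool = False  # hypothetical attribute added on a PAT hit


def mark_on_pat_hit(uop: MemUop, pat_hit: bool) -> MemUop:
    """Attach the blocking attribute when the instruction's address hit in the PAT."""
    if pat_hit:
        uop.block_early_return = True
    return uop


def may_return_data(uop: MemUop, pending_seqs: set) -> bool:
    """An early data return is allowed unless the attribute is set and some
    older memory instruction is still pending."""
    older_pending = any(seq < uop.seq for seq in pending_seqs)
    return not (uop.block_early_return and older_pending)


load = mark_on_pat_hit(MemUop(seq=5), pat_hit=True)
print(may_return_data(load, {2, 7}))  # False: instruction 2 is older and still pending
print(may_return_data(load, {7}))     # True: only a younger instruction is pending
```

Because `may_return_data` can be re-evaluated as older instructions complete, the marked load in this model simply waits rather than being re-fetched.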
As noted above, different operations for preventing the detected memory instruction from taking effect before the at least one pending memory instruction older than the detected memory instruction may be applied to different types of memory instructions depending on a type of hazard that is associated with the entry of the PAT 104. As a non-limiting example, if a previous out-of-order execution of the memory instruction resulted in a RAW hazard, the instruction processing circuit 102 may prevent the detected memory instruction from taking effect before any pending memory store instructions older than the detected memory instruction. If a RAR hazard resulted from the previous out-of-order execution of the memory instruction, the instruction processing circuit 102 may prevent the detected memory instruction from taking effect before any pending memory load instructions older than the detected memory instruction. For aspects in which the hazard is a resource constraint hazard, the instruction processing circuit 102 may prevent the detected memory instruction from taking effect before any pending memory instructions older than the detected memory instruction.
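The mapping from hazard type to the set of older instructions that the detected instruction must not overtake can be sketched in Python. The `Hazard` enumeration, the `must_wait_for` function, and the representation of pending instructions as `(seq, kind)` pairs are assumptions made for illustration.

```python
from enum import Enum


class Hazard(Enum):
    RAW = "read-after-write"
    RAR = "read-after-read"
    RESOURCE = "resource constraint"


def must_wait_for(hazard: Hazard, detected_seq: int, pending):
    """Return the program-order numbers of pending memory instructions that the
    detected instruction must not overtake. `pending` holds (seq, kind) pairs,
    where kind is "load" or "store"."""
    older = [(seq, kind) for seq, kind in pending if seq < detected_seq]
    if hazard is Hazard.RAW:
        return [seq for seq, kind in older if kind == "store"]  # older stores only
    if hazard is Hazard.RAR:
        return [seq for seq, kind in older if kind == "load"]   # older loads only
    return [seq for seq, _ in older]                            # all older memory ops


pending = [(0, "store"), (1, "load"), (3, "store")]
print(must_wait_for(Hazard.RAW, 2, pending))       # [0]
print(must_wait_for(Hazard.RAR, 2, pending))       # [1]
print(must_wait_for(Hazard.RESOURCE, 2, pending))  # [0, 1]
```

The younger store (sequence number 3) is never returned, because only instructions older than the detected one can cause the punt being avoided.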
According to some aspects disclosed herein, if the instruction processing circuit 102 detects a memory instruction but does not find the address of the memory instruction in an entry of the PAT 104, a “miss” occurs. In this case, the instruction processing circuit 102 may continue processing of the memory instruction. If a hazard associated with the detected memory instruction subsequently occurs upon execution of a pending memory instruction older than the memory instruction, an entry containing the address of the memory instruction may be generated in the PAT 104. The memory instruction and the pending memory instruction may then be re-fetched and re-executed.
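The miss-then-train sequence can be modeled in a few lines of Python. This sketch rests on several assumptions. The PAT is an unbounded dictionary keyed by instruction address, so no replacement policy is shown. Instructions are identified by hypothetical program-order numbers. `PuntAvoidanceTable`, `allocate`, and `on_hazard` are illustrative names that do not appear in the disclosure.

```python
from dataclasses import dataclass


@dataclass
class PatEntry:
    address: int  # address of the memory instruction that punted
    hazard: str   # optional hazard type, e.g. "RAW", "RAR", "RESOURCE"


class PuntAvoidanceTable:
    """Unbounded dictionary model; a hardware PAT has a fixed number of entries."""

    def __init__(self):
        self.entries = {}  # instruction address -> PatEntry

    def lookup(self, address):
        return self.entries.get(address)  # None models a PAT miss

    def allocate(self, address, hazard):
        self.entries[address] = PatEntry(address, hazard)


def on_hazard(pat, address, hazard, punted_seqs):
    """After a miss, a hazard was detected: train the PAT, then return the
    instructions to re-fetch and re-execute, oldest first."""
    pat.allocate(address, hazard)
    return sorted(punted_seqs)


pat = PuntAvoidanceTable()
print(pat.lookup(0x2000))                          # None (miss)
print(on_hazard(pat, 0x2000, "RAW", {4, 2}))       # [2, 4]
print(pat.lookup(0x2000) is not None)              # True (next fetch hits)
```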
To illustrate an exemplary PAT 200 that may correspond to the PAT 104 of
According to some aspects, each entry 202(0)-202(Y) of the PAT 200 may also include an optional hazard indicator field 208 for storing a hazard indicator such as a hazard indicator 210. The hazard indicator 210 in some aspects may comprise one or more bits that provide an indication of the type of hazard (e.g., a RAW hazard, a RAR hazard, or a resource constraint hazard, as non-limiting examples) corresponding to the associated memory instruction. The instruction processing circuit 102 may employ the hazard indicator 210 in determining the appropriate action to take to preempt a memory instruction punt. In some aspects of the PAT 200 that do not include the hazard indicator field 208, the PAT 200 may be dedicated to tracking a single type of hazard. For instance, the PAT 200 may be dedicated to tracking only RAW hazards, as a non-limiting example. In some aspects, multiple PATs 200 may be provided, each tracking a different hazard type.
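Both layouts can be sketched with Python's `enum.IntFlag`. The source states only that the hazard indicator comprises "one or more bits". The one-bit-per-hazard encoding, the `HazardBits` name, and the helper functions below are assumptions.

```python
from enum import IntFlag


class HazardBits(IntFlag):
    # Assumed encoding: one bit per hazard type.
    RAW = 0b001
    RAR = 0b010
    RESOURCE = 0b100


def encode(*hazards):
    """Combine hazard types into a single hazard indicator value."""
    bits = HazardBits(0)
    for hazard in hazards:
        bits |= hazard
    return bits


def lookup_dedicated(pats, address):
    """Alternative layout: one dedicated table (a set of addresses) per hazard
    type, so no indicator bits are stored; the hit pattern rebuilds them."""
    bits = HazardBits(0)
    for hazard, table in pats.items():
        if address in table:
            bits |= hazard
    return bits


pats = {HazardBits.RAW: {0x10}, HazardBits.RAR: set(), HazardBits.RESOURCE: {0x10, 0x20}}
print(int(lookup_dedicated(pats, 0x10)))  # 5 (RAW | RESOURCE)
```

Storing indicator bits costs a few bits per entry but allows a single table to serve every hazard type. Dedicated tables avoid those bits at the cost of duplicate address tags when one instruction triggers more than one hazard type.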
Some aspects may also provide that each of the entries 202(0)-202(Y) of the PAT 200 further includes a bias counter field 212 storing a bias counter value 214. The entries 202(0)-202(Y) of the PAT 200 may also include a bias threshold field 216 storing a bias threshold value 218. The bias counter value 214 and the bias threshold value 218 may be used by the instruction processing circuit 102 to judge a relative likelihood of a memory instruction punt occurring as a result of out-of-order execution of an associated memory instruction. The instruction processing circuit 102 may then determine whether to preempt the memory instruction punt or to continue conventional processing of the memory instruction based on the bias counter value 214 and the bias threshold value 218. For example, the bias counter value 214 may be incremented upon each occurrence of a hazard associated with the memory instruction corresponding to the entry 202(0). If the memory instruction is again detected in the instruction stream, the instruction processing circuit 102 may prevent the memory instruction from taking effect before pending memory instructions older than the memory instruction only if the bias counter value 214 exceeds the bias threshold value 218. Some aspects may provide that, instead of being stored in the bias threshold field 216, the bias threshold value 218 may be stored in a location separate from the PAT 200, such as in one of the registers 124 of
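The threshold check can be sketched as a small Python class. The source specifies only two behaviors: the counter is incremented on each hazard, and preemption occurs when the counter exceeds the threshold. The saturating upper bound, the default values, and the `BiasedEntry` name are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class BiasedEntry:
    address: int
    bias_counter: int = 0
    bias_threshold: int = 1  # assumed default; could instead be held in a register
    counter_max: int = 3     # assumed 2-bit saturating counter

    def on_hazard(self):
        """Increment on each hazard for this instruction, saturating at counter_max."""
        self.bias_counter = min(self.bias_counter + 1, self.counter_max)

    def should_preempt(self):
        """Preempt the punt only if the counter exceeds the threshold."""
        return self.bias_counter > self.bias_threshold


entry = BiasedEntry(address=0x3000)
for _ in range(2):
    entry.on_hazard()
print(entry.should_preempt())  # True (counter 2 exceeds threshold 1)
```

A threshold above zero keeps a single, possibly rare, hazard from serializing every later execution of the same instruction.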
It is to be understood that some aspects may provide that the entries 202(0)-202(Y) of the PAT 200 may include other fields in addition to the fields 204, 208, 212, and 216 illustrated in
To better illustrate exemplary communications flows between the instruction processing circuit 102 and the load/store unit 110 of
As shown in
The PAT 104 illustrated in
Referring now to
In
Turning to
It is to be understood that, in some aspects in which the hazard 318 is a RAR hazard, the instruction processing circuit 102 may prevent the memory load instruction 302(2) from taking effect before the pending memory load instruction 302(1). According to aspects in which the hazard 318 is a resource constraint hazard, the instruction processing circuit 102 may prevent the memory load instruction 302(2) from taking effect before any of the pending memory instructions 302(0)-302(1) older than the memory load instruction 302(2). Some aspects may provide that the type of hazard 318 may be determined based on a hazard indicator such as the hazard indicator 210 of
To illustrate exemplary operations for predicting memory instruction punts using the PAT 104 of
If, at decision block 402, the address 304 of the detected memory instruction 302(2) is determined to be present, the instruction processing circuit 102 in some aspects may further determine whether the bias counter value 214 of a bias counter field 212 of the entry 306(0) of the PAT 104 exceeds a bias threshold value 218 (block 406). If not, the instruction processing circuit 102 may conclude that the likelihood of a memory instruction punt is relatively low. In that case, the instruction processing circuit 102 continues conventional processing of the instruction stream 300 (block 404). If the instruction processing circuit 102 does not utilize the optional bias counter value 214, or if the instruction processing circuit 102 determines at optional decision block 406 that the bias counter value 214 exceeds the bias threshold value 218, processing resumes at block 408 of
Referring now to
In some aspects, operations of block 408 for preventing the detected memory instruction 302(2) from taking effect before the at least one pending memory instruction 302(0)-302(1) may be accomplished by the instruction processing circuit 102 first determining a type of hazard associated with the entry 306(0) of the PAT 104 (block 411). Some aspects may provide that the type of hazard may be ascertained using a hazard indicator such as the hazard indicator 210 of
If the entry 306(0) of the PAT 104 is determined at decision block 411 to be associated with a RAW hazard, the instruction processing circuit 102 may prevent the detected memory instruction 302(2) from taking effect before any pending memory store instructions 302(0) older than the detected memory instruction 302(2) (block 412). If it is determined at decision block 411 that the entry 306(0) of the PAT 104 is associated with a RAR hazard, the instruction processing circuit 102 may prevent the detected memory instruction 302(2) from taking effect before all pending memory load instructions 302(1) older than the detected memory instruction 302(2) (block 414). If the entry 306(0) of the PAT 104 is associated with a resource constraint hazard, the instruction processing circuit 102 may prevent the detected memory instruction 302(2) from taking effect before all pending memory instructions 302(0)-302(1) older than the detected memory instruction 302(2) (block 416). Processing then resumes at block 418 of
In
If the instruction processing circuit 102 determines at decision block 422 that the address 304 is not present, or if the instruction processing circuit 102 does not use the optional bias counter value 214, the instruction processing circuit 102 may generate the entry 306(0) in the PAT 104, the entry 306(0) comprising the address 304 of the detected memory instruction 302(2) (block 428). The instruction processing circuit 102 next re-executes the detected memory instruction 302(2) and the at least one pending memory instruction 302(0) (block 426). The instruction processing circuit 102 then continues processing the instruction stream 300 (block 420).
Predicting memory instruction punts using a PAT according to aspects disclosed herein may be provided in or integrated into any processor-based device. Examples, without limitation, include a set top box, an entertainment unit, a navigation device, a communications device, a fixed location data unit, a mobile location data unit, a mobile phone, a cellular phone, a computer, a portable computer, a desktop computer, a personal digital assistant (PDA), a monitor, a computer monitor, a television, a tuner, a radio, a satellite radio, a music player, a digital music player, a portable music player, a digital video player, a video player, a digital video disc (DVD) player, and a portable digital video player.
In this regard,
Other master and slave devices can be connected to the system bus 508. As illustrated in
The CPU(s) 502 may also be configured to access the display controller(s) 520 over the system bus 508 to control information sent to one or more displays 526. The display controller(s) 520 sends information to the display(s) 526 to be displayed via one or more video processors 528, which process the information to be displayed into a format suitable for the display(s) 526. The display(s) 526 can include any type of display, including but not limited to a cathode ray tube (CRT), a liquid crystal display (LCD), a plasma display, etc.
Those of skill in the art will further appreciate that the various illustrative logical blocks, modules, circuits, and algorithms described in connection with the aspects disclosed herein may be implemented as electronic hardware, instructions stored in memory or in another computer-readable medium and executed by a processor or other processing device, or combinations of both. The master and slave devices described herein may be employed in any circuit, hardware component, integrated circuit (IC), or IC chip, as examples. Memory disclosed herein may be any type and size of memory and may be configured to store any type of information desired. To clearly illustrate this interchangeability, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. How such functionality is implemented depends upon the particular application, design choices, and/or design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
The various illustrative logical blocks, modules, and circuits described in connection with the aspects disclosed herein may be implemented or performed with a processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
The aspects disclosed herein may be embodied in hardware and in instructions that are stored in hardware, and may reside, for example, in Random Access Memory (RAM), flash memory, Read Only Memory (ROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), registers, a hard disk, a removable disk, a CD-ROM, or any other form of computer readable medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a remote station. In the alternative, the processor and the storage medium may reside as discrete components in a remote station, base station, or server.
It is also noted that the operational steps described in any of the exemplary aspects herein are described to provide examples and discussion. The operations described may be performed in numerous different sequences other than the illustrated sequences. Furthermore, operations described in a single operational step may actually be performed in a number of different steps. Additionally, one or more operational steps discussed in the exemplary aspects may be combined. It is to be understood that the operational steps illustrated in the flow chart diagrams may be subject to numerous different modifications as will be readily apparent to one of skill in the art. Those of skill in the art will also understand that information and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.
The previous description of the disclosure is provided to enable any person skilled in the art to make or use the disclosure. Various modifications to the disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the spirit or scope of the disclosure. Thus, the disclosure is not intended to be limited to the examples and designs described herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The present application claims priority under 35 U.S.C. §119(e) to U.S. Patent Application Ser. No. 62/205,400 filed on Aug. 14, 2015 and entitled “PREDICTING MEMORY INSTRUCTION PUNTS IN A COMPUTER PROCESSOR USING A PUNT AVOIDANCE TABLE (PAT),” the contents of which are incorporated herein by reference in their entirety.
Number | Date | Country
---|---|---
62205400 | Aug 2015 | US