Disclosed embodiments are directed to data hazard detection. More particularly, exemplary embodiments are directed to data hazard tracking in processors employing instructions with register ranges, without expanding the instructions.
Modern processing systems may support execution of instructions in a pipelined fashion as well as out of program order. In the case of pipelined execution, an operation may start execution before the prior operation has completed. When executing out of program order, a processor may start execution of an instruction before starting the execution of one or more programmatically prior instructions. These techniques are employed to minimize wastage of instruction cycles and to exploit parallelism in instruction sequences. However, pipelining and out-of-order execution may lead to data hazards, which are situations where incorrect operation would result if a programmatically younger instruction were to read or write operands (“operands” may be source or destination operands specified by an instruction) before an older instruction has read or written them.
Data hazards arise from the order imposed by the program being executed and include Read-After-Write (RAW), Write-After-Read (WAR), and Write-After-Write (WAW) hazards. While data hazards often arise when operands have the same data size, they may also arise in cases where operands overlap in the registers used. For example, if an older instruction writes a quadword (the size of a quadword is four times the size of a word) and a younger instruction requires a word of that quadword, a hazard may arise. It would be erroneous for the younger instruction to execute before it can procure the required word produced by the older instruction.
In some architectures, operands of instructions may be expressed as a range of register addresses. For example, storage instructions for loading multiple registers, or Single Instruction Multiple Data (SIMD) instructions may comprise operands spanning several registers and expressed in terms of a range of registers. Likewise, different data types may span a different number of registers. For example, a data word may comprise one 32-bit register while a doubleword may comprise a range of two contiguous 32-bit registers and a quadword may comprise a range of four contiguous 32-bit registers. In order to detect and resolve data hazards for such instructions, it is necessary to determine if any of the registers covered by the range may give rise to a dependency. Conventional techniques for determining whether any of the component registers in a range of registers of an instruction operand may cause a data hazard include expanding the range of registers into component registers and checking for hazards on each of the component registers.
As can be seen, such conventional techniques may require a large number of compare operations to be performed. The number of compare operations increases with the number of registers expressed in the instruction operands, and also with the number of instructions which may be in flight in the pipeline. Further, conventional techniques require expansion of the range of registers expressed in instruction operands into component registers before comparison operations may be performed for checking data hazards. This expansion places an increased demand on storage space in an instruction queue holding instructions prior to dispatch, thus offsetting the benefits and efficiency of a condensed expression of the operands as a range of registers.
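The cost of the conventional approach can be illustrated with a minimal sketch. The function names `expand` and `expanded_hazard_check` are hypothetical and do not appear in the disclosure, and the model assumes that every component register of one operand is compared against every component register of the other, as a bank of parallel equality comparators would do.

```python
def expand(start, end):
    """Expand an inclusive register range into its component register numbers."""
    return list(range(start, end + 1))


def expanded_hazard_check(first, second):
    """Conventional check: compare every pair of component registers.

    `first` and `second` are (start, end) tuples of inclusive register
    addresses. Returns (overlap_found, equality_compares_performed).
    No early exit is taken, to model comparators that all operate in parallel.
    """
    first_regs = expand(*first)
    second_regs = expand(*second)
    compares = 0
    hit = False
    for a in first_regs:
        for b in second_regs:
            compares += 1
            if a == b:
                hit = True
    return hit, compares


# Two quadword operands (four registers each) that share registers 6 and 7.
print(expanded_hazard_check((4, 7), (6, 9)))  # (True, 16)
```

Under these assumptions, a single pair of four-register operands already costs sixteen equality compares, and the expanded register lists must be held somewhere before the compares can take place.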
Accordingly there is a need in the art for efficient techniques for detecting data hazards for instructions comprising operands expressed in terms of a range of registers, without requiring expansion.
Exemplary embodiments of the invention are directed to systems and methods for tracking data hazards.
For example, an exemplary embodiment is directed to a method for tracking data hazards in a processor comprising: tracking a first instruction; and comparing the first instruction to a second instruction to determine if there is a data hazard, prior to expanding the second instruction.
Another exemplary embodiment is directed to a processor comprising: a pipelined architecture configured to execute a first and a second instruction; and hit detection logic for comparing the first instruction to the second instruction to determine if there is a data hazard, prior to expanding the second instruction.
Another exemplary embodiment is directed to a processing system for tracking data hazards in a processor comprising: means for tracking a first instruction; and means for comparing the first instruction to a second instruction to determine if there is a data hazard, prior to expanding the second instruction.
Yet another exemplary embodiment is directed to a non-transitory computer-readable storage medium comprising code, which, when executed by a processor, causes the processor to perform operations for tracking data hazards in the processor, the non-transitory computer-readable storage medium comprising: code for tracking a first instruction; and code for comparing the first instruction to a second instruction to determine if there is a data hazard, prior to expanding the second instruction.
The accompanying drawings are presented to aid in the description of embodiments of the invention and are provided solely for illustration of the embodiments and not limitation thereof.
Aspects of the invention are disclosed in the following description and related drawings directed to specific embodiments of the invention. Alternate embodiments may be devised without departing from the scope of the invention. Additionally, well-known elements of the invention will not be described in detail or will be omitted so as not to obscure the relevant details of the invention.
The word “exemplary” is used herein to mean “serving as an example, instance, or illustration.” Any embodiment described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments. Likewise, the term “embodiments of the invention” does not require that all embodiments of the invention include the discussed feature, advantage or mode of operation.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of embodiments of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises”, “comprising”, “includes” and/or “including”, when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Further, many embodiments are described in terms of sequences of actions to be performed by, for example, elements of a computing device. It will be recognized that various actions described herein can be performed by specific circuits (e.g., application specific integrated circuits (ASICs)), by program instructions being executed by one or more processors, or by a combination of both. Additionally, these sequences of actions described herein can be considered to be embodied entirely within any form of computer readable storage medium having stored therein a corresponding set of computer instructions that upon execution would cause an associated processor to perform the functionality described herein. Thus, the various aspects of the invention may be embodied in a number of different forms, all of which have been contemplated to be within the scope of the claimed subject matter. In addition, for each of the embodiments described herein, the corresponding form of any such embodiments may be described herein as, for example, “logic configured to” perform the described action.
Exemplary embodiments include techniques for detecting data hazards on instructions comprising operands expressed as a range of registers, without requiring prior expansion of the range of registers into component registers. Accordingly, embodiments may require fewer compare operations than the conventional techniques described above, which require expansion to component registers before comparison. Moreover, exemplary embodiments may conserve storage space in instruction queues by operating on an un-expanded range of registers.
As discussed herein, the term “expanded instruction” may refer to an instruction comprising operands expressed as a range of registers expanded into an equivalent instruction with operands expressed as expanded component registers or alternately, as expanded into smaller ranges. Correspondingly, “non-expanded instructions” may refer to the original instruction which has not been expanded. The size/bit-width of component registers may be based on the size/bit-width of data path elements. Register ranges in instructions may be expressed in terms of a start address and an end address, including all the component registers within the range. Register ranges may also be limited to comprise only a subset of component registers within the range, such as even-numbered registers, odd-numbered registers, real/complex registers etc. In detecting data hazards, embodiments may support several forms of comparisons, such as comparisons of non-expanded instructions with non-expanded instructions, expanded instructions with non-expanded instructions, and expanded instructions with expanded instructions. Once hazards have been detected according to exemplary embodiments, they may be resolved according to well-known techniques, such as register renaming or selective delaying of younger instructions to enforce in-order execution.
With reference now to
While each entry in OOQ 106 may have room for instructions comprising three operand fields, each of pipelines VX 116, VL 118, and VS 120 may support specific instruction formats. For example, pipeline VX 116 may support instructions with a total of three operand fields: two source operand fields and one combination source and destination operand field. The three operand fields may be expressed as a range of registers. Similarly, pipelines VL 118 and VS 120 may each support instructions with two combination source and destination operand fields, once again where each operand field may be expressed as a range of registers. Accordingly, a total of seven operand fields may comprise register ranges among the instructions executed by pipelines VX 116, VL 118, and VS 120 in each pipeline stage.
Two pipeline stages, 108 and 112, are illustrated for each pipeline VX 116, VL 118, and VS 120. These pipeline stages may include one or more of expand, decode, and resolve stages. In one example, data hazards may be detected by hazard detection logic 114 when instructions reach pipeline stage 112. It will be recalled that because instructions may be released out-of-order from OOQ 106 to pipelines VX 116, VL 118, and VS 120, some instructions still residing in OOQ 106 may be older than instructions which have reached pipeline stage 112. Thus, operands of instructions in pipeline stage 112 may be checked for hazard conditions against older instructions residing in OOQ 106. Operands of instructions in pipeline stage 112 in the various pipelines, VX 116, VL 118, and VS 120, may be in expanded or non-expanded format and thus may be expressed as individual registers, a set of component registers or a range that is a subset of a register range. Both expanded and non-expanded instructions in pipeline stage 112 may be checked for hazard conditions against instructions in OOQ 106 using hazard detection logic 114. A detailed implementation of hazard detection logic 114 has been provided with reference to
As previously described, each of entries 106_0-106_15 of OOQ 106 may comprise instructions comprising a maximum of three (3) operand fields, and the total number of operand fields of instructions in pipeline stage 112 of pipelines VX 116, VL 118, and VS 120 is seven (7). Accordingly, in detecting hazards, potential overlaps may exist between the 7 operand fields in pipeline stage 112 and each of the 3 operand fields of entries 106_0-106_15 in OOQ 106. Thus, hazard detection for each entry in OOQ 106 may involve 7×3=21 comparisons of operand fields expressed as registers. It will be recognized that these 21 comparisons include comparisons of all source and destination operand fields of instructions in pipeline stage 112 with each of entries 106_0-106_15. Accordingly, the 21 comparisons will include detection of all Write-After-Read (WAR), Read-After-Write (RAW), Write-After-Write (WAW) and Read-After-Read (RAR) conditions.
However, it will also be recognized that RAR is not a true data hazard condition because reading a register does not modify its value. Thus, a younger instruction may read a register before an older instruction reads the same register, without creating a hazard. Therefore, by culling out the comparisons for RAR conditions, only 17 comparisons may be required for testing entries 106_0-106_15 for potential data hazards.
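The counts of 21 and 17 can be reproduced with a short sketch, offered under stated assumptions rather than as the disclosed implementation. The disclosure does not state which of the three operand fields of an OOQ entry are sources and which are destinations; the sketch assumes two source-only fields and one combination source and destination field, mirroring the VX format. The field labels `VX1` through `VS2`, `OP1` through `OP3`, and the function name `count_comparisons` are illustrative only.

```python
from itertools import product

# Roles: "S" = source only, "SD" = combination source and destination.
# Pipeline stage fields: VX has two sources and one combined field;
# VL and VS each have two combined fields (seven fields in total).
PIPELINE_FIELDS = {
    "VX1": "S", "VX2": "S", "VX3": "SD",
    "VL1": "SD", "VL2": "SD",
    "VS1": "SD", "VS2": "SD",
}

# ASSUMPTION: one OOQ entry holds two source-only fields and one combined field.
OOQ_ENTRY_FIELDS = {"OP1": "S", "OP2": "S", "OP3": "SD"}


def count_comparisons(pipeline_fields, entry_fields):
    """Return (all field pairs, pairs that remain after culling RAR-only pairs)."""
    pairs = list(product(pipeline_fields, entry_fields))
    # A pair can only ever produce a Read-After-Read condition when neither
    # field is written, i.e. both fields are source-only.
    needed = [
        (p, e) for p, e in pairs
        if not (pipeline_fields[p] == "S" and entry_fields[e] == "S")
    ]
    return len(pairs), len(needed)


print(count_comparisons(PIPELINE_FIELDS, OOQ_ENTRY_FIELDS))  # (21, 17)
```

With this assumed field layout, the four culled pairs are exactly those formed by the two source-only fields of pipeline VX and the two source-only fields of the OOQ entry.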
In each of the 17 comparisons, when an operand is expressed in the form of a range of registers, embodiments may be configured to implement the comparisons without expanding the range of registers into component registers. The size of each register in the range of registers may be based on a granularity of data access of a register file such as VRF 122. In order to detect a dependency between a first operand expressed as a first range of registers spanning between register addresses {first_start, first_end} and a second operand expressed as a second range of registers spanning between register addresses {second_start, second_end}, a dependency may be assumed to exist if there are any common registers (i.e. overlap) between the two ranges, {first_start, first_end} and {second_start, second_end}. Thus, if the first operand pertains to a first instruction, and the second operand pertains to a second instruction, then a data hazard between the first instruction and the second instruction is detected by comparing the first range and the second range and detecting at least one common register between the first range and the second range.
The first instruction may be a younger instruction in pipeline stage 112 of one of the pipelines VX 116, VL 118 or VS 120; and the second instruction may be an older instruction currently in flight or yet to be read from the OOQ 106 (instructions may remain in the OOQ 106 until they have written back to the register file). A dependency between the first operand and second operand may potentially result in a data hazard (i.e. one of the 17 comparisons, excluding comparisons for RAR conditions) if there is a common register between the first and second operands. In other words, a data hazard may be detected between the first range and the second range by implementing the logical function (second_start ≤ first_end) and (second_end ≥ first_start). If this logical function evaluates positively, i.e. to a “hit,” a data hazard may be determined to exist.
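The logical function above may be sketched in software as follows. The function name `ranges_overlap` is hypothetical, and the sketch assumes that start and end addresses are inclusive and that each range satisfies start ≤ end; in hardware, the same test would correspond to two magnitude comparators and an AND gate per operand pair.

```python
def ranges_overlap(first_start, first_end, second_start, second_end):
    """Return True (a "hit") if two inclusive register ranges share a register.

    Implements (second_start <= first_end) and (second_end >= first_start)
    without expanding either range into its component registers.
    """
    return second_start <= first_end and second_end >= first_start


# Partial overlap: registers 6 and 7 are common to both ranges.
print(ranges_overlap(4, 7, 6, 9))    # True
# Complete overlap: a single word inside a quadword.
print(ranges_overlap(4, 7, 5, 5))    # True
# Adjacent but disjoint ranges do not hit.
print(ranges_overlap(4, 7, 8, 11))   # False
```

Only two magnitude compares are needed per pair of operands, regardless of how many registers each range spans.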
It will be recognized that a hit may indicate either a partial overlap comprising at least one common register or a complete overlap across the entire range of registers. Regardless of whether the overlap is partial or complete, a data hazard is assumed to exist, and must be resolved such that the younger of the two instructions does not access the register before the older instruction.
In one embodiment that has been illustrated in
Hit detection may be further gated to ensure that only older instructions are compared to the instruction being evaluated in pipeline stage 112. For example, if a particular valid instruction in OOQ 106 is younger than the instruction in pipeline stage 112, hit detection may be gated from raising a hit flag for that particular instruction. Furthermore, the hit detection may be gated by the above-described mask, thereby saving the power consumed by the comparators. OOQ 106 may be written in-order but read out-of-order. When read out-of-order, the mask may be configured to enable compares to all older instructions from an arbitrary pointer (pointing to one of the entries 106_0-106_15) in the queue. The pointer may be used to track the age of the instruction being evaluated.
It will be noted that in cases where OOQ 106 is implemented as a circular queue, the instruction indices (i.e. 0-15) may wrap around. Initially, as new instructions are written into the queue, they will assume the next vacant position with the highest index (it will be recalled that entry 106_15 is the youngest instruction, while entry 106_0 is the oldest). Eventually all of the positions may be taken and the new instructions will need to be assigned vacated positions with lower indices. At this point, it may no longer be sufficient to label an instruction with a higher index as a younger instruction. Therefore the pointer will need to be reset accordingly.
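One possible way to form such an age mask for a circular queue is sketched below. This is an illustration under assumptions, not the disclosed circuit: it assumes that the index of the oldest entry (`oldest`) is tracked alongside the pointer to the entry of the instruction being evaluated (`ptr`), that a per-entry valid bit is available, and it uses modular index arithmetic in place of resetting the pointer. The names `older_mask`, `oldest`, `ptr` and `valid` are hypothetical.

```python
QUEUE_SIZE = 16  # entries 106_0 through 106_15


def older_mask(oldest, ptr, valid, size=QUEUE_SIZE):
    """Return a per-entry mask enabling compares only against older valid entries.

    Entries are visited in age order, starting at `oldest` and wrapping around
    the end of the queue, until the entry of the evaluated instruction (`ptr`)
    is reached. That entry and everything younger stay masked off.
    """
    mask = [False] * size
    i = oldest
    while i != ptr:
        mask[i] = valid[i]
        i = (i + 1) % size  # wrap from index 15 back to index 0
    return mask


# After wrap-around: the oldest entry is at index 14 and the instruction being
# evaluated sits at index 2, so entries 14, 15, 0 and 1 are older.
mask = older_mask(oldest=14, ptr=2, valid=[True] * QUEUE_SIZE)
print([i for i, enabled in enumerate(mask) if enabled])  # [0, 1, 14, 15]
```

In this example a comparison of raw indices would wrongly treat entries 14 and 15 as younger than entry 2, which is the situation the wrap-around handling is meant to avoid.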
With reference now to
Also shown is an operand field 212_VX1 with similar start and end address fields and a valid field. As described previously, operand field 212_VX1 may be one of the three operand fields of an instruction in pipeline stage 112 of pipeline VX 116. Pipeline VX 116 may comprise instructions with three operand fields, whereas pipelines VL 118 and VS 120 may each comprise instructions with two such operand fields. Accordingly, the remaining operand fields of the VX 116, VL 118, and VS 120 pipelines have been schematically represented by 212_VX2, 212_VX3, 212_VL1, 212_VL2, 212_VS1, and 212_VS2.
Each of the circles represents comparison logic for triggering a hit signal. As noted previously, only 17 such hit detection operations may need to be performed for potential data hazards for each entry of OOQ 106. The remaining four of the 21 total comparisons correspond to RAR conditions, which would not constitute a data hazard. Only a few representative circles have been labeled for the sake of clarity in hazard detection logic 114 of
Accordingly, it will be appreciated that embodiments include various methods for performing the processes, functions and/or algorithms disclosed herein. For example, as illustrated in
Those of skill in the art will appreciate that information and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.
Further, those of skill in the art will appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The methods, sequences and/or algorithms described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor.
Referring to
In a particular embodiment, input device 430 and power supply 444 are coupled to the system-on-chip device 422. Moreover, in a particular embodiment, as illustrated in
It should be noted that although
Accordingly, an embodiment of the invention can include a computer-readable medium embodying a method for tracking data hazards prior to dispatch in a processor. Accordingly, the invention is not limited to illustrated examples and any means for performing the functionality described herein are included in embodiments of the invention.
While the foregoing disclosure shows illustrative embodiments of the invention, it should be noted that various changes and modifications could be made herein without departing from the scope of the invention as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the embodiments of the invention described herein need not be performed in any particular order. Furthermore, although elements of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.