The present invention is illustrated by way of example and not limitation in the accompanying figures.
Embodiments of the invention relate to processors and computer systems. More particularly, at least one embodiment of the invention relates to a technique to efficiently allocate and deallocate various processing resources based on the need for such resources.
Some embodiments of the invention allow one or more resources within a processor to be enabled or disabled “on demand”, that is, based on whether or not they are needed to complete an operation, such as an instruction or uop (hereafter referred to generically as “instruction”), without significantly degrading processor performance. At least one embodiment of the invention allows one or more execution structures, such as an execution stack (including one or more execution logic or resources), to be disabled if the performance of an instruction does not use them, and to be re-enabled if the performance of a subsequent instruction uses them, without the subsequent instruction being delayed from being processed for a significant amount of time.
In particular, one embodiment enables or disables a SIMD and/or an FP stack depending upon whether an instruction being processed corresponds to a SIMD and/or an FP operation. Furthermore, one embodiment detects whether the instruction corresponds to a SIMD and/or FP operation at a point in the processor pipeline early enough that the corresponding stack(s) can be enabled without the execution of the instruction having to be delayed significantly.
In order to detect whether the performance of an instruction does not use one or more of the stacks illustrated in
In one embodiment, the signal 221 is a signal indicating the type of instruction being allocated. For example, in one embodiment, the signal 221 may indicate whether the instruction being allocated corresponds to a SIMD operation or an FP operation or both. In one embodiment, whether an instruction corresponds to a SIMD or FP operation or both may be determined from various fields within the instruction. In some embodiments, other information may be signaled to the stack controller, including whether the instruction being allocated corresponds to an integer operation or some other type of operation, from which the detector may determine whether to enable a corresponding processing resource, such as the INT stack.
In one embodiment, each stack, or other resource, which is to be enabled or disabled based on the type of instruction to be processed corresponds to two bits, the state of which is controlled by the stack controller 220. For example, in the embodiment illustrated in
In one embodiment, the SIMD.valid bit being in a first state (e.g., logical “1”) may indicate that the instruction being allocated corresponds to a SIMD operation, in which case the stack controller may enable the SIMD stack. Likewise, the FP.valid bit being in a first state (e.g., logical “1”) may indicate that the instruction being allocated corresponds to an FP operation, in which case the stack controller may enable the FP stack. In one embodiment, both the SIMD.valid bit and the FP.valid bit being in a first state (e.g., logical “1”) indicates that the instruction being allocated corresponds to a SIMD FP operation, in which case the stack controller may enable both the FP stack and the SIMD stack.
Conversely, the opposite logical state of the SIMD.valid and/or the FP.valid bits (e.g., “0”) may not cause the stack controller to enable the corresponding stack(s). In one embodiment, the SIMD or FP stacks may remain in the same state (enabled or disabled) they were prior to the allocation of the instruction if their corresponding bits indicate that the instruction being allocated does not correspond to an operation that uses one or both of them. In other embodiments, the stack controller may disable the stack(s) not to be used by the instruction being allocated if the stack(s) is/are in an enabled state, depending on the state of the SIMD.valid and FP.valid bits.
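The valid-bit behavior described above can be sketched as a small software model. This is only an illustrative sketch, not the circuit itself: the class name `StackController`, the method `on_allocate`, and its boolean parameters are hypothetical names chosen for the example, and the model follows the embodiment in which a “0” leaves a stack in its prior state.

```python
from dataclasses import dataclass


@dataclass
class StackController:
    """Toy model of the SIMD.valid and FP.valid bits (hypothetical names)."""
    simd_valid: int = 0  # 1 means the SIMD stack is enabled
    fp_valid: int = 0    # 1 means the FP stack is enabled

    def on_allocate(self, uses_simd: bool, uses_fp: bool) -> None:
        # A logical "1" for an instruction type enables the matching stack.
        if uses_simd:
            self.simd_valid = 1
        if uses_fp:
            self.fp_valid = 1
        # A logical "0" changes nothing here: each stack keeps the state
        # it had before this instruction was allocated.


ctrl = StackController()
ctrl.on_allocate(uses_simd=True, uses_fp=False)   # SIMD-only instruction
ctrl.on_allocate(uses_simd=False, uses_fp=False)  # e.g. an integer instruction
```

In this sketch, disabling a stack is left to a separate mechanism; an embodiment that disables unused stacks at allocation time would clear the bits instead of leaving them unchanged.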
In addition to the SIMD.valid and FP.valid bits, the stack controller 220 may maintain two or more bits to indicate one of two generations, in which a SIMD or FP instruction may be stored in a re-order buffer (ROB) 226. In one embodiment, the ROB may be a sequentially written structure in which instructions are written in the order in which they are allocated. When the instructions are retired from the ROB, the corresponding entries may be deallocated in the order in which they were allocated.
In one embodiment, the ROB entry to be written can be tracked by a write pointer, or a “head pointer”, which increments after every ROB write operation to point to the next entry to be written. Similarly, the ROB entry to be retired can be tracked by a retire pointer or a “tail pointer”, in one embodiment, which increments after every retirement to point to the next ROB entry to be retired.
The term “generation” may refer to a complete traversal of the ROB by the tail pointer, during which all ROB entries are retired and the tail pointer returns to the beginning of the ROB. Accordingly, when the tail pointer returns to the beginning of the ROB, or “wraps” back, the ROB may be said to have switched to the next generation. Similarly, a generation can be defined from the point of view of the head pointer, such that the generation wraps when all ROB entries are written and the head pointer returns to the beginning of the ROB. Because ROB entries may not be retired before they are written, the head pointer remains ahead of the tail pointer and hence the head pointer enters a new ROB generation before the tail pointer, in one embodiment.
For example, in one embodiment a ROB may contain entries corresponding to each SIMD and/or FP instruction that is allocated by allocation unit 201 of
In one embodiment, the ROB may toggle between two generations. Accordingly, the current generation of the ROB indicated by the tail or the head pointer can be tracked with a bit associated with the tail or head pointer itself. For example, a generation bit may toggle from a “0” to a “1” state and back to a 0 state as the corresponding pointer (tail or head) moves from a ROB generation 0 to a ROB generation 1 and back to ROB generation 0, respectively.
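As a rough illustration of a pointer that carries its own generation bit, the following sketch models a head or tail pointer that increments through a fixed-size ROB and toggles a one-bit generation each time it wraps. The class name `RobPointer`, its fields, and the ROB size of 4 are assumptions made for the example only.

```python
class RobPointer:
    """Toy head or tail pointer with a one-bit generation (hypothetical names)."""

    def __init__(self, rob_size: int) -> None:
        self.rob_size = rob_size
        self.index = 0  # next ROB entry to write (head) or retire (tail)
        self.wrap = 0   # generation bit, toggles between 0 and 1

    def advance(self) -> bool:
        """Step to the next entry; return True if the pointer wrapped."""
        self.index += 1
        if self.index == self.rob_size:
            self.index = 0
            self.wrap ^= 1  # entering the next generation
            return True
        return False


head = RobPointer(rob_size=4)
wrapped = [head.advance() for _ in range(4)]  # one full traversal of the ROB
```

After the four steps above, only the last call reports a wrap, and the generation bit has moved from 0 to 1; a second full traversal would return it to 0.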
In one embodiment, the stack controller 220 may maintain at least two bits, such as SIMD.wrap and FP.wrap, which may be used to detect when the last SIMD or FP instruction has retired from the processor and hence there are no instructions remaining in the processor that use the SIMD or FP stack. This information can be used to power down the SIMD or FP stack, i.e., set SIMD.valid or FP.valid bits to 0, in one embodiment.
For example, when a SIMD instruction is allocated and allocator 201 sends a signal 221 to stack controller 220, the SIMD.wrap bit is set to the current value of the wrap bit of the head pointer, which indicates the generation of the ROB entry written by the last SIMD instruction. When the tail pointer wraps to a new generation, the previous generation of the tail pointer is sent to the stack controller 220 via signal 202. The previous ROB generation is compared against SIMD.wrap. A match indicates that the ROB generation containing the last SIMD uop has been retired and hence that there are no more SIMD uops in the processor. The SIMD stack can then be powered down by setting the SIMD.valid bit to 0, for example.
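The interaction between SIMD.wrap, the head pointer, and the tail pointer can be sketched in software as follows. This is a simplified model under stated assumptions, not the described hardware: the names `StackControl`, `ToyRob`, `allocate`, and `retire` are hypothetical, the ROB holds 4 entries, and the caller is assumed never to retire more instructions than have been allocated or to overfill the ROB.

```python
class StackControl:
    """Toy model of SIMD.valid and SIMD.wrap (hypothetical names)."""

    def __init__(self) -> None:
        self.simd_valid = 0
        self.simd_wrap = 0

    def on_simd_allocate(self, head_wrap: int) -> None:
        # Remember the generation of the ROB entry written by the last SIMD uop.
        self.simd_valid = 1
        self.simd_wrap = head_wrap

    def on_tail_wrap(self, previous_tail_wrap: int) -> None:
        # The generation that just fully retired held the last SIMD uop.
        if self.simd_valid and previous_tail_wrap == self.simd_wrap:
            self.simd_valid = 0  # power down the SIMD stack


class ToyRob:
    """Minimal ROB with wrapping head/tail pointers; no occupancy checks."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.head = self.tail = 0
        self.head_wrap = self.tail_wrap = 0
        self.ctrl = StackControl()

    def allocate(self, is_simd: bool) -> None:
        if is_simd:
            self.ctrl.on_simd_allocate(self.head_wrap)
        self.head += 1
        if self.head == self.size:
            self.head, self.head_wrap = 0, self.head_wrap ^ 1

    def retire(self) -> None:
        self.tail += 1
        if self.tail == self.size:
            previous = self.tail_wrap
            self.tail, self.tail_wrap = 0, self.tail_wrap ^ 1
            self.ctrl.on_tail_wrap(previous)


rob = ToyRob(size=4)
rob.allocate(is_simd=True)       # SIMD uop written in generation 0
for _ in range(3):
    rob.allocate(is_simd=False)  # fill the rest of generation 0
for _ in range(4):
    rob.retire()                 # tail wraps; generation 0 matches SIMD.wrap
```

One consequence visible in this sketch is that the stack is powered down only when the tail pointer wraps, so it may stay enabled for part of a generation after the last SIMD uop has actually retired.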
Similar operations may be applied for the FP stack vis-à-vis the FP.wrap bit, in one embodiment. Furthermore, in some embodiments, the above operations may be applied to other resources within a processor, including memory stacks or other resources that may not always be used for each instruction.
In one embodiment, the head and tail pointers are used along with the SIMD.valid, FP.valid, SIMD.wrap, and FP.wrap bits to determine whether a corresponding stack is to be enabled or disabled. For example, if a SIMD instruction is allocated and the corresponding entry 315 is stored in the ROB, head pointer 305 may point to the entry by storing the appropriate buffer entry into an appropriate field of the pointer. Likewise, the tail pointer may traverse the ROB from top to bottom until the oldest entry that has been retired 320 is found. In order to track the generation of each entry pointed to by the head and tail pointers, a bit or bits, such as a SIMD.wrap bit, may be used, in conjunction with other information, by the stack controller 220 of
For example, when a SIMD instruction is retired, and the ROB's tail pointer wraps, the wrap bit of the last SIMD instruction to be allocated is compared to the most recent SIMD.wrap state caused by the retirement. If they are the same, then this may indicate that the last SIMD instruction allocated corresponded to the previous “generation” of the ROB traversal, which has been completely retired (i.e., the previous wrap bit state belongs to an instruction of the previous traversal generation, because the wrap bit state has changed). The previous SIMD.wrap bit state being equal to the current SIMD.wrap bit state implies that the last SIMD instruction in the ROB has retired and that there are no SIMD instructions being allocated or executed. Therefore, the SIMD.valid bit may be cleared by the stack controller, and the SIMD stack disabled. A similar technique may be followed for FP instructions using corresponding FP.valid and FP.wrap bits in order to control the FP stack. Other stacks or processor resources, such as INT stack control, may be controlled using the techniques described above.
In at least one embodiment, the SIMD.wrap bit may be replaced by recording, in the stack controller, an indication of the ROB entry of the last SIMD instruction or uop (via a “SIMD.robid” bit, for example). In one embodiment, whenever a SIMD instruction or uop is allocated in the ROB, the SIMD.robid, for example, is updated to point to it, similar to the head pointer. When an instruction or uop retires, the retiring ROB identifier (similar to the tail pointer) may be compared to the stored SIMD.robid, and if they are equal, the SIMD.valid bit can be cleared in order to power down the corresponding stack.
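The SIMD.robid variant can be sketched in the same style. The names `RobIdTracker`, `on_allocate`, and `on_retire` are hypothetical, and ROB identifiers are modeled as plain integers; the sketch assumes in-order retirement and that a ROB identifier is held by at most one in-flight instruction at a time.

```python
from typing import Optional


class RobIdTracker:
    """Toy model of SIMD.valid and SIMD.robid (hypothetical names)."""

    def __init__(self) -> None:
        self.simd_valid = 0
        self.simd_robid: Optional[int] = None

    def on_allocate(self, rob_id: int, is_simd: bool) -> None:
        if is_simd:
            # Track the ROB entry of the most recently allocated SIMD uop.
            self.simd_valid = 1
            self.simd_robid = rob_id

    def on_retire(self, rob_id: int) -> None:
        # The last SIMD uop is retiring, so no SIMD uops remain in flight.
        if self.simd_valid and rob_id == self.simd_robid:
            self.simd_valid = 0


tracker = RobIdTracker()
tracker.on_allocate(rob_id=0, is_simd=False)
tracker.on_allocate(rob_id=1, is_simd=True)
tracker.on_allocate(rob_id=2, is_simd=True)  # SIMD.robid now points at entry 2
tracker.on_retire(0)
tracker.on_retire(1)  # an older SIMD uop retires; the stack stays enabled
tracker.on_retire(2)  # the last SIMD uop retires; SIMD.valid is cleared
```

In this model the comparison happens on every retirement rather than only when the tail pointer wraps, at the cost of storing a full ROB identifier instead of a single bit.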
Illustrated within the processor of
The main memory may be implemented in various memory sources, such as dynamic random-access memory (DRAM), a hard disk drive (HDD) 520, or a memory source located remotely from the computer system via network interface 530 containing various storage devices and technologies. The cache memory may be located either within the processor or in close proximity to the processor, such as on the processor's local bus 507.
Furthermore, the cache memory may contain relatively fast memory cells, such as a six-transistor (6T) cell, or other memory cell of approximately equal or faster access speed. The computer system of
The system of
Other embodiments of the invention, however, may exist in other circuits, logic units, or devices within the system of
Processors referred to herein, or any other component designed according to an embodiment of the present invention, may be designed in various stages, from creation to simulation to fabrication. Data representing a design may represent the design in a number of manners. First, as is useful in simulations, the hardware may be represented using a hardware description language or another functional description language. Additionally or alternatively, a circuit level model with logic and/or transistor gates may be produced at some stages of the design process. Furthermore, most designs, at some stage, reach a level where they may be modeled with data representing the physical placement of various devices. In the case where conventional semiconductor fabrication techniques are used, the data representing the device placement model may be the data specifying the presence or absence of various features on different mask layers for masks used to produce an integrated circuit.
In any representation of the design, the data may be stored in any form of a machine-readable medium. An optical or electrical wave modulated or otherwise generated to transmit such information, a memory, or a magnetic or optical storage medium, such as a disc, may be the machine-readable medium. Any of these mediums may “carry” or “indicate” the design, or other information used in an embodiment of the present invention, such as the instructions in an error recovery routine. When an electrical carrier wave indicating or carrying the information is transmitted, to the extent that copying, buffering, or re-transmission of the electrical signal is performed, a new copy is made. Thus, the actions of a communication provider or a network provider may be making copies of an article, e.g., a carrier wave, embodying techniques of the present invention.
Thus, techniques for enabling and disabling processing resources on demand are disclosed. While certain embodiments have been described, and shown in the accompanying drawings, it is to be understood that such embodiments are merely illustrative of and not restrictive on the broad invention, and that this invention not be limited to the specific constructions and arrangements shown and described, since various other modifications may occur to those ordinarily skilled in the art upon studying this disclosure. In an area of technology such as this, where growth is fast and further advancements are not easily foreseen, the disclosed embodiments may be readily modifiable in arrangement and detail as facilitated by enabling technological advancements without departing from the principles of the present disclosure or the scope of the accompanying claims.
Various aspects of one or more embodiments of the invention may be described, discussed, or otherwise referred to in an advertisement for a processor or computer system in which one or more embodiments of the invention may be used. Such advertisements may include, but are not limited to, newsprint, magazines, billboards, or other paper or otherwise tangible media. In particular, various aspects of one or more embodiments of the invention may be advertised on the internet via websites, “pop-up” advertisements, or other web-based media, whether or not a server hosting the program to generate the website or pop-up is located in the United States of America or its territories.