In various embodiments, information that is typically present in configuration registers and status registers (or combinations thereof) such as control and configuration information (note the terms control and configuration are used interchangeably herein), exception status indicators, masks for such status indicators and so forth, may be stored in a register file. In so doing, the expense of updating the state of such configuration registers may be reduced. That is, the register file may include storage for multiple replicated copies of data from various instructions that write to at least a portion of the information present in status and configuration registers. To maintain ordering of this data and accurate use by different instructions, dependencies between an instruction that writes to such a control register and instructions dependent thereon may be tracked. Furthermore, the sequence of operations performed using this data may also be tracked. That is, because the dependencies are tracked, dependent operations may be held until the writing instruction is executed so that the control information provided by the writing instruction is present in the indicated entry of the register file. After execution of the writing instruction, the dependent instructions may be scheduled for execution, as the proper values in the control register to be used by these instructions are guaranteed to be present in the indicated entry of the register file. In other words, the execution of the writer instruction that loads the control information into the indicated entry of the register file can be used as a trigger to allow execution of dependent instructions.
Various control and status registers may take advantage of embodiments of the present invention to enable replicated copies of the contents of these registers to be stored so that multiple writer instructions and dependent instructions (e.g., reader instructions) can be performed in a processor without the need for frequent updates to the actual contents of these registers, enabling low latency between issuance of a writer instruction and one or more instructions dependent thereon. While the scope of the present invention is not limited in this regard, a floating point control word (FCW), which provides control and mask information for use in connection with floating point operations, may have replicated copies of its state available in a register file. Similarly, a multimedia control and status register (e.g., the MXCSR as present in an x86 processor) that is used in performing single instruction multiple data (SIMD) operations may also have multiple replicated copies of its information available in a register file.
While embodiments of the present invention may be implemented in many different processor types, referring now to
As shown in
As shown in
Referring still to
As described above, reservation station 30 controls passing of μops to execution units 40 for execution of various operations. While the scope of the present invention is not limited in this regard, the execution units may include a floating point unit (FPU), an integer unit (IU), and an address generation unit (AGU), among others. As further shown in
In some embodiments, register file 75 may include a plurality of 16-bit registers, while in other embodiments such registers may be 32 bits, although the scope of the present invention is not limited in this regard. In one embodiment, each entry 76 may include two dedicated portions, one portion for storage of replicated MXCSR information and one portion for storage of replicated FCW information. However, in other implementations separate registers of register file 75 for replicated MXCSR information and replicated FCW information may exist.
Referring now to Table 1, below, shown is a programmer's view of the MXCSR and FCW registers.
As shown in Table 1, the MXCSR register may include control information (i.e., bits 6-15 of the MXCSR) used for performing, e.g., single instruction multiple data (SIMD) operations. This information may be used to control rounding modes and other operations, as well as to identify exceptions to be masked. In addition, Table 1 shows the presence of exception flags of the MXCSR (i.e., bits 0-5). During operation of embodiments of the present invention, such exception flags may be maintained as a one-per-thread copy in a retirement register file of a reorder buffer of a retirement unit, for example, which may be written by retiring instructions in the order in which they retire. As further shown in Table 1, a programmer's view of the FCW includes control information (i.e., bits 8-11 of the FCW) which may be used to control rounding and precision. Furthermore, the FCW includes a plurality of bits to identify exceptions to mask (i.e., bits 0-5).
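The bit ranges described above can be illustrated with a short sketch. The masks follow the ranges stated in the text (MXCSR bits 0-5 and 6-15, FCW bits 0-5 and 8-11); the function names are hypothetical and chosen only for illustration, and the sample values 0x1F80 and 0x037F are the x86 power-on defaults for the MXCSR and FCW, respectively.

```python
# Bit ranges taken from the description of Table 1; names are illustrative only.
MXCSR_FLAGS_MASK = 0x003F    # MXCSR bits 0-5: exception flags
MXCSR_CONTROL_MASK = 0xFFC0  # MXCSR bits 6-15: control and mask information
FCW_EXC_MASK_BITS = 0x003F   # FCW bits 0-5: exceptions to mask
FCW_CONTROL_MASK = 0x0F00    # FCW bits 8-11: precision and rounding control


def split_mxcsr(mxcsr):
    """Return (control_and_mask_bits, exception_flags) for a 16-bit MXCSR view."""
    return mxcsr & MXCSR_CONTROL_MASK, mxcsr & MXCSR_FLAGS_MASK


def split_fcw(fcw):
    """Return (control_bits, exception_mask_bits) for an FCW value."""
    return fcw & FCW_CONTROL_MASK, fcw & FCW_EXC_MASK_BITS


if __name__ == "__main__":
    control, flags = split_mxcsr(0x1F80)
    print(hex(control), hex(flags))
    control, masks = split_fcw(0x037F)
    print(hex(control), hex(masks))
```

Separating the exception flags from the control and mask bits in this way mirrors the split described in the text, in which the flags are handled at retirement while the control information is the part that can be replicated.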
In various embodiments, multiple replicated entries of at least portions of the information in the MXCSR and the FCW (for example) can be stored in register file 75. The MXCSR format may be set forth in Table 2, which shows a layout of a register file entry for replicated MXCSR and FCW information in accordance with one embodiment of the present invention.
By aligning the contents of an entry in register file 75 in this way, reformatting of the data, e.g., via a multiplexer or other control logic, before providing the information to an execution unit can be avoided. Note that in the embodiment of Table 2, the configuration information includes control data and mask information. However, the exception information of the MXCSR (as shown in Table 1) may not be present in the replicated entries of register file 75, and may instead be provided once, at retirement of a given reader instruction that is dependent on the information in an entry of register file 75. While shown with this particular implementation in Tables 1 and 2, the scope of the present invention is not limited in this manner.
For example, although shown in
When a writer μop is provided for execution in execution units 40, an entry 76 may be written in register file 75 to store the desired state information of the μop. Then, when μops dependent on this writer μop are provided to execution units 40, the operations of these μops may be performed using the state information present in the corresponding entry 76. In this way, updating of state information in control and status registers 60 may be avoided and these dependent μops may be dispatched to execution units 40 without first retiring the writer μop and committing information to the architectural state of processor 10 (i.e., writing state information of the writer μop to control and status registers 60).
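The separation between replicated entries and architectural state can be sketched as a small behavioral model. This is a simplification under stated assumptions, not the described hardware: the class and method names are hypothetical, a Python list stands in for entries 76 of register file 75, and a single attribute stands in for control and status registers 60.

```python
class ControlStateModel:
    """Toy model: replicated control entries versus architectural state."""

    def __init__(self, architectural_value, num_entries=4):
        # Stands in for control and status registers 60 (architectural state).
        self.architectural = architectural_value
        # Stands in for entries 76 of register file 75 (replicated copies).
        self.entries = [None] * num_entries

    def execute_writer(self, entry_id, value):
        # Executing the writer fills its entry; architectural state is untouched.
        self.entries[entry_id] = value

    def execute_reader(self, entry_id):
        # A dependent reader takes its control information from the entry.
        value = self.entries[entry_id]
        if value is None:
            raise RuntimeError("reader dispatched before its writer executed")
        return value

    def retire_writer(self, entry_id):
        # Only at retirement is the value committed to architectural state.
        self.architectural = self.entries[entry_id]


if __name__ == "__main__":
    model = ControlStateModel(architectural_value=0x1F80)
    model.execute_writer(0, 0x7F80)
    print(hex(model.execute_reader(0)), hex(model.architectural))
    model.retire_writer(0)
    print(hex(model.architectural))
```

In this model a reader observes the new control value as soon as the writer has executed, while the architectural copy changes only when the writer retires.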
As further shown in
Referring now to
When needed resources for the write μop are available, the μop may be allocated into a reservation station (block 130). The reservation station may track dependency of operations and allocate μops for passing into an execution unit according to various schemes.
Referring still to
Referring still to
To enable execution of μops that are present in the reservation station, a dispatch process is performed. Referring now to
Referring still to
To take advantage of the reduced time between dispatch of the writer μop and its dependent μops, embodiments may wake up dependent readers present in CAM entries of the reservation station after the writer μop has been dispatched (block 230). Accordingly, one or more dependent μops having the same ID as the writer μop may be woken up within the CAM of the reservation station, and the reservation station may dispatch these dependent readers to the appropriate execution unit (block 240). In other words, the writer μop that writes, e.g., control information to a renamed control register may be used to schedule dependent μops. That is, because these dependent μops may have the same ID as the writer μop, the dispatching of these dependent reader μops will not occur until the writer μop has been executed by writing the requested control information to the indicated register of the register file. Such dispatching of dependent readers may occur after execution of the writer μop but prior to, and in some implementations, well prior to retirement of the writer μop. For example, one dependent μop may be a floating point add operation that is to operate in accordance with both a precision control and rounding control that is set forth in the writer μop. To effect this operation, a FPU adder may perform this floating point add based on the control information accessed from the register file entry of the writer μop, rather than default values present in the MXCSR. Note that while shown with this implementation in the embodiment of
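The ID-based wake-up described in this paragraph might be modeled as follows. This is a minimal software sketch rather than a description of the CAM hardware: the `Uop` and `ReservationStation` names, the μop names, and the list-based search are all assumptions made for illustration, and ID reuse after de-allocation is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Uop:
    name: str
    ctrl_id: int          # ID of the renamed control register entry
    is_writer: bool = False


class ReservationStation:
    """Toy model of waking dependent readers by matching on the writer's ID."""

    def __init__(self):
        self.sleeping = []        # readers held until their writer dispatches
        self.ready_ids = set()    # IDs whose writer has already dispatched
        self.dispatch_log = []    # order in which uops leave the station

    def add_reader(self, uop):
        if uop.ctrl_id in self.ready_ids:
            self.dispatch_log.append(uop.name)
        else:
            self.sleeping.append(uop)

    def dispatch_writer(self, uop):
        # Dispatch the writer, then wake every sleeping reader with a matching ID
        # (a linear search here stands in for the CAM match).
        self.dispatch_log.append(uop.name)
        self.ready_ids.add(uop.ctrl_id)
        woken = [r for r in self.sleeping if r.ctrl_id == uop.ctrl_id]
        self.sleeping = [r for r in self.sleeping if r.ctrl_id != uop.ctrl_id]
        self.dispatch_log.extend(r.name for r in woken)
        return woken


if __name__ == "__main__":
    rs = ReservationStation()
    rs.add_reader(Uop("fadd", ctrl_id=3))
    rs.add_reader(Uop("fmul", ctrl_id=5))
    rs.dispatch_writer(Uop("write_ctrl", ctrl_id=3, is_writer=True))
    print(rs.dispatch_log)
```

Readers whose ID does not match the dispatched writer remain held, which corresponds to the ordering guarantee described above.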
After instructions are executed in an execution unit, they may be passed to a retirement unit which takes the instructions that may be executed out of program order and reorders them back into program order. Referring now to
Finally, when the dependent μops have retired, the retirement unit may report the retired writer μop back to the allocator (block 340). In this way, the allocator may de-allocate the ID associated with the writer μop, making it available to a new incoming μop. In some implementations, such reporting of retirement of a first writer μop may not occur until retirement of a next writer μop, thus guaranteeing that all μops dependent on the first writer μop have also retired. While shown with this particular implementation in the embodiment of
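The deferred de-allocation policy mentioned above, in which the ID of a first writer μop is released only when the next writer μop retires, can be sketched as follows. The `ControlIdAllocator` class and its methods are hypothetical names used only for illustration, and the free list is an assumed bookkeeping structure rather than one stated in the text.

```python
class ControlIdAllocator:
    """Toy allocator that frees a writer's ID only when the next writer retires."""

    def __init__(self, num_ids):
        self.free = list(range(num_ids))
        self.pending = None   # ID of the last retired writer, not yet released

    def allocate(self):
        # Returns None when no ID is free, modeling an allocation stall.
        if not self.free:
            return None
        return self.free.pop(0)

    def writer_retired(self, ctrl_id):
        # Retirement of this writer implies that all uops dependent on the
        # previously retired writer have retired too, so that ID can be reused.
        released = self.pending
        if released is not None:
            self.free.append(released)
        self.pending = ctrl_id
        return released


if __name__ == "__main__":
    alloc = ControlIdAllocator(num_ids=2)
    first, second = alloc.allocate(), alloc.allocate()
    print(alloc.writer_retired(first))    # nothing released yet
    print(alloc.writer_retired(second))   # releases the first writer's ID
    print(alloc.allocate())
```

Because instructions retire in program order, deferring the release in this way avoids tracking each dependent μop individually, at the cost of holding one ID slightly longer.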
Embodiments may be implemented in many different system types. Referring now to
First processor 570 further includes point-to-point (P-P) interfaces 576 and 578. Similarly, second processor 580 includes P-P interfaces 586 and 588. As shown in
First processor 570 and second processor 580 may be coupled to a chipset 590 via P-P interconnects 552 and 554, respectively. As shown in
In turn, chipset 590 may be coupled to a first bus 516 via an interface 596. In one embodiment, first bus 516 may be a Peripheral Component Interconnect (PCI) bus, as defined by the PCI Local Bus Specification, Production Version, Revision 2.1, dated June 1995 or a bus such as a PCI Express™ bus or another third generation input/output (I/O) interconnect bus, although the scope of the present invention is not so limited.
As shown in
Embodiments may be implemented in code and may be stored on a storage medium having stored thereon instructions which can be used to program a system to perform the instructions. The storage medium may include, but is not limited to, any type of disk including floppy disks, optical disks, compact disk read-only memories (CD-ROMs), compact disk rewritables (CD-RWs), and magneto-optical disks, semiconductor devices such as read-only memories (ROMs), random access memories (RAMs) such as dynamic random access memories (DRAMs), static random access memories (SRAMs), erasable programmable read-only memories (EPROMs), flash memories, electrically erasable programmable read-only memories (EEPROMs), magnetic or optical cards, or any other type of media suitable for storing electronic instructions.
While the present invention has been described with respect to a limited number of embodiments, those skilled in the art will appreciate numerous modifications and variations therefrom. It is intended that the appended claims cover all such modifications and variations as fall within the true spirit and scope of this present invention.