The present invention relates to the field of information or data processing. More specifically, this invention relates to the field of implementing a computational or mathematical unit in a processor achieving an increased effective physical file size and physical register reuse via register mapping techniques.
Information or data processors are found in many contemporary electronic devices such as, for example, personal computers, personal digital assistants, game playing devices, video equipment and cellular phones. Processors used in today's most popular products are known as hardware as they comprise one or more integrated circuits. Processors execute software to implement various functions in any processor based device. Generally, software is written in a form known as source code that is compiled (by a compiler) into object code. Object code within a processor is implemented to achieve a defined set of assembly language instructions that are executed by the processor using the processor's instruction set. An instruction set defines instructions that a processor can execute. Instructions include arithmetic instructions (e.g., add and subtract), logic instructions (e.g., AND, OR, and NOT instructions), and data instructions (e.g., move, input, output, load, and store instructions). As is known, computers with different architectures can share a common instruction set. For example, processors from different manufacturers may implement nearly identical versions of an instruction set (e.g., an x86 instruction set), but have substantially different architectural designs.
Within a processor, numerical data is typically expressed using integer or floating-point representation. Mathematical computations within a processor are generally performed in computational units designed for maximum efficiency for each computation. Thus, it is common for a processor architecture to have an integer computational unit and a floating-point computational unit. As the use of graphic processing and scientific computing has expanded, the use of a processor's integer and floating-point mathematical capabilities has been increasing. Other factors, such as use for audio processing, are also contributing to an increased use of a processor's mathematical capabilities. To accommodate these and other needs, and to meet the ever growing demand for increased integer and floating-point performance, the computational capability of processors is continually evolving.
In any processor architecture, there exists a limited number of physical registers for storing instructions and data. Typically, an integer computational unit and a floating-point computational unit will each have its own set of physical registers available. However, in either computational unit, once committed, a physical register is unable to be used again until the completion of the instruction or until the data has been processed and sent to another storage location. At that time, the physical register becomes available and is added to a “free list” of available registers for reassignment. The longer a physical register remains unavailable, the more performance may suffer. This is particularly true if a data value is known, as storing a known value in a physical register for the duration of the instruction processing is wasteful of the limited resources. Moreover, moving a known value from one register to another register wastes operational cycles of the processor and consumes power.
An apparatus is provided for an efficient technique for processing known register values while improving processor performance. The apparatus comprises a processor having a plurality of physical registers available for use in computations and a decoder for determining that a logical register contains a known value. A renaming unit maps the logical register containing the known value to an address outside an address range for the plurality of physical registers once the known value is determined. Thereafter, scheduling and execution units perform computations using the known value without storing the known value in one of the plurality of physical registers.
An apparatus is also provided for an efficient technique for processing registers having a zero value while improving processor performance. The apparatus comprises a processor having a plurality of physical registers available for use in computations and a decoder for determining that a logical register contains a zero value. A renaming unit maps the logical register containing the zero value to an address outside an address range for the plurality of physical registers once the zero value is determined. Thereafter, scheduling and execution units perform computations using the zero value without storing the zero value in one of the plurality of physical registers.
A method is provided for an efficient technique for processing known register values while improving processor performance. The method comprises determining that a logical register of a processor has a known value and then mapping that logical register to a physical register address outside an expected range of physical register addresses, which indicates that the logical register represents the known value. Thereafter the processor processes any instruction using the known value without storing the known value in a physical register.
A method is also provided for an efficient technique for processing registers having a zero value while improving processor performance. The method comprises determining that a logical register of a processor has a zero value and then mapping that logical register to a physical register address outside an expected range of physical register addresses, which indicates that the logical register represents the zero value. Thereafter the processor processes any instruction using the zero value without storing the zero value in a physical register.
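The methods summarized above can be illustrated with a minimal software sketch. It is a behavioral model only, not the hardware implementation. The register count, the sentinel `ZERO_PRN`, and the `RenameMap` and `read_operand` names are assumptions introduced for illustration.

```python
# Hypothetical sizes and names, chosen only for illustration.
NUM_PHYSICAL_REGS = 8          # valid physical register numbers are 0..7
ZERO_PRN = NUM_PHYSICAL_REGS   # first address outside the physical range


class RenameMap:
    """Maps logical register numbers (LRNs) to physical register numbers (PRNs)."""

    def __init__(self, num_logical):
        self.table = [None] * num_logical

    def map_to_physical(self, lrn, prn):
        # Ordinary renaming: the PRN must address a real physical register.
        if not 0 <= prn < NUM_PHYSICAL_REGS:
            raise ValueError("PRN outside the physical register range")
        self.table[lrn] = prn

    def map_to_zero(self, lrn):
        # The out-of-range address marks the logical register as a known zero.
        self.table[lrn] = ZERO_PRN


def read_operand(rename_map, register_file, lrn):
    """Return the operand for lrn without reading storage for a known zero."""
    prn = rename_map.table[lrn]
    if prn == ZERO_PRN:
        return 0               # value supplied directly, no physical register used
    return register_file[prn]
```

In this sketch the out-of-range address costs no storage: the register file keeps exactly `NUM_PHYSICAL_REGS` entries, and the sentinel is recognized before any read takes place.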
The present invention will hereinafter be described in conjunction with the following drawing figures, wherein like numerals denote like elements, and
The following detailed description is merely exemplary in nature and is not intended to limit the invention or the application and uses of the invention. As used herein, the word “exemplary” means “serving as an example, instance, or illustration.” Thus, any embodiment described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments. Moreover, as used herein, the word “processor” encompasses any type of information or data processor, including, without limitation, Internet access processors, Intranet access processors, personal data processors, military data processors, financial data processors, navigational processors, voice processors, music processors, video processors or any multimedia processors. All of the embodiments described herein are exemplary embodiments provided to enable persons skilled in the art to make or use the invention and not to limit the scope of the invention which is defined by the claims. Furthermore, there is no intention to be bound by any expressed or implied theory presented in the preceding technical field, background, brief summary, the following detailed description or for any particular processor microarchitecture.
Referring now to
Referring now to
In operation, the decode unit 24 decodes the incoming operation-codes (opcodes) to be dispatched for the computations or processing. The decode unit 24 is responsible for the general decoding of instructions (e.g., x86 instructions and extensions thereof) and how the delivered opcodes may change from the instruction. The decode unit 24 will also pass on physical register numbers (PRNs) from an available list of PRNs (often referred to as the Free List (FL)) to the rename unit 28.
The rename unit 28 maps logical register numbers (LRNs) to the physical register numbers (PRNs) prior to scheduling and execution. According to various embodiments of the present disclosure, the rename unit 28 can be utilized to rename or remap logical registers in a manner that eliminates the need to store known data values in a physical register. In one embodiment, this is implemented with a register mapping table stored in the rename unit 28. According to the present disclosure, renaming or remapping registers saves operational cycles and power, as well as decreases latency.
The scheduler 30 contains a scheduler queue and associated issue logic. As its name implies, the scheduler 30 is responsible for determining which opcodes are passed to execution units and in what order. In one embodiment, the scheduler 30 accepts renamed opcodes from rename unit 28 and stores them in the scheduler 30 until they are eligible to be selected by the scheduler to issue to one of the execution pipes.
The register file control 32 holds the physical registers. The physical register numbers and their associated valid bits arrive from the scheduler 30. Source operands are read out of the physical registers and results are written back into the physical registers. In one embodiment, the register file control 32 also checks for parity errors on all operands before the opcodes are delivered to the execution units. In a multi-pipelined (super-scalar) architecture, an opcode (with any data) would be issued for each execution pipe.
The execute unit(s) 34 may be embodied as any general purpose or specialized execution architecture as desired for a particular processor. In one embodiment the execution unit may be realized as a single instruction multiple data (SIMD) arithmetic logic unit (ALU). In another embodiment, dual or multiple SIMD ALUs could be employed for super-scalar and/or multi-threaded embodiments, which operate to produce results and any exception bits generated during execution.
In one embodiment, after an opcode has been executed, the instruction can be retired so that the state of the floating-point unit 16 or integer unit 18 can be updated with a self-consistent, non-speculative architected state consistent with the serial execution of the program. The retire unit 36 maintains an in-order list of all opcodes in process in the floating-point unit 16 (or integer unit 18 as the case may be) that have passed the rename unit 28 stage and have not yet been committed to the architectural state. The retire unit 36 is responsible for committing all the floating-point unit 16 or integer unit 18 architectural states upon retirement of an opcode.
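The in-order bookkeeping attributed to the retire unit 36 can be modeled roughly as follows. This is a simplified sketch: the `RetireQueue` class and its method names are hypothetical, and the actual update of architectural state is not modeled.

```python
from collections import deque


class RetireQueue:
    """In-order list of opcodes that have been renamed but not yet committed."""

    def __init__(self):
        # Each entry is [opcode_id, done]; entries stay in program order.
        self.in_flight = deque()

    def allocate(self, opcode_id):
        # Called when an opcode passes the rename stage.
        self.in_flight.append([opcode_id, False])

    def mark_done(self, opcode_id):
        # Execution may finish out of order; only the flag changes here.
        for entry in self.in_flight:
            if entry[0] == opcode_id:
                entry[1] = True
                return
        raise KeyError(opcode_id)

    def retire(self):
        # Commit strictly from the oldest entry, stopping at the first
        # opcode that has not finished executing.
        retired = []
        while self.in_flight and self.in_flight[0][1]:
            retired.append(self.in_flight.popleft()[0])
        return retired
```

An opcode that finishes early waits in the queue until every older opcode has also finished, which keeps the committed state consistent with serial execution of the program.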
Referring now to
Also illustrated in
As illustrated in
For the register mapping table 42 bit setting embodiment, consider again that one of the logical registers, for example the logical register associated with LR 0 (42-0), is determined to be of a known value. In this embodiment, the register mapping table 42 includes additional bits (beyond those needed to address the physical register address space) that can be set to indicate a known value. Thus, regardless of the logical register mapping, one or more of these additional bits can be set to indicate that a known value is associated with that logical register.
In one embodiment, the known value is zero, which occurs frequently during floating-point or integer computations. However, any known value that finds frequent use in any implementation of any processor architecture may be used following the teachings of the present disclosure and is within the scope of the present disclosure.
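The bit-setting embodiment can be sketched as a packed table entry with a flag bit above the physical register number field. The field width `PRN_BITS`, the single-bit flag layout, and the helper names are assumptions made for illustration; the disclosure does not specify an encoding.

```python
PRN_BITS = 3                     # hypothetical: enough to address 8 physical registers
PRN_MASK = (1 << PRN_BITS) - 1   # low bits hold the physical register number
KNOWN_BIT = 1 << PRN_BITS        # one additional bit beyond the address field


def set_known(entry):
    """Flag the entry as holding a known value, leaving the PRN field intact."""
    return entry | KNOWN_BIT


def clear_known(entry):
    """Remove the known-value flag, leaving the PRN field intact."""
    return entry & ~KNOWN_BIT


def is_known(entry):
    """True when the additional bit marks a known value."""
    return bool(entry & KNOWN_BIT)


def prn_of(entry):
    """Extract the physical register number from the low bits."""
    return entry & PRN_MASK
```

Because the flag sits outside the address field, setting it does not disturb the existing mapping, which matches the statement that the bit can be set regardless of the logical register mapping.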
Referring now to
Next, at step 54, the physical register previously mapped in the register mapping table (prior mapping not shown) can be returned to the free list to be made available for other instructions. Finally, at execution time, any instructions (in this example B−0) using the known value would simply insert that value (zero) at the proper time to have the instruction completed. In this way, physical registers can be made available much more rapidly than in previous processor or floating-point architectures. Also, there is no need to move the zero value through the bus or the remaining sections of the processor (or computational units 16 or 18, see
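The freeing of the prior physical register and the insertion of zero at execution time can be sketched as follows. This is a behavioral model under stated assumptions: the register count, the `ZERO_PRN` sentinel, and both function names are hypothetical, and the sketch frees the old register immediately, ignoring any in-flight instructions that real hardware would have to account for.

```python
from collections import deque

NUM_PHYSICAL_REGS = 4            # hypothetical register file size
ZERO_PRN = NUM_PHYSICAL_REGS     # out-of-range number standing for known zero


def rename_to_zero(rename_table, free_list, lrn):
    """Remap lrn to the known zero and return its old register to the free list."""
    old_prn = rename_table[lrn]
    rename_table[lrn] = ZERO_PRN
    if old_prn is not None and old_prn != ZERO_PRN:
        free_list.append(old_prn)    # the register is reusable right away


def execute_subtract(rename_table, register_file, src_a, src_b):
    """Compute src_a - src_b, inserting zero for any known-zero operand."""
    def operand(lrn):
        prn = rename_table[lrn]
        return 0 if prn == ZERO_PRN else register_file[prn]
    return operand(src_a) - operand(src_b)


# Usage: LR0 initially maps to PR0 and LR1 to PR1; PR2 and PR3 are free.
rename_table = [0, 1]
register_file = [7, 10, 0, 0]
free_list = deque([2, 3])
rename_to_zero(rename_table, free_list, 0)   # LR0 is now a known zero
result = execute_subtract(rename_table, register_file, 1, 0)
```

After the remap, PR0 sits on the free list even though its stale contents were never overwritten, and the subtraction reads only the physical register that backs LR1.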
Various processor-based devices may advantageously use the processor (or computational unit) of the present disclosure, including laptop computers, digital books, printers, scanners, standard or high-definition televisions or monitors and standard or high-definition set-top boxes for satellite or cable programming reception. In each example, any other circuitry necessary for the implementation of the processor-based device would be added by the respective manufacturer. The above listing of processor-based devices is merely exemplary and not intended to be a limitation on the number or types of processor-based devices that may advantageously use the processor (or computational unit) of the present disclosure.
While at least one exemplary embodiment has been presented in the foregoing detailed description of the invention, it should be appreciated that a vast number of variations exist. It should also be appreciated that the exemplary embodiment or exemplary embodiments are only examples, and are not intended to limit the scope, applicability, or configuration of the invention in any way. Rather, the foregoing detailed description will provide those skilled in the art with a convenient road map for implementing an exemplary embodiment of the invention, it being understood that various changes may be made in the function and arrangement of elements described in an exemplary embodiment without departing from the scope of the invention as set forth in the appended claims and their legal equivalents.