The invention relates to microprocessors.
In a typical central processing unit (CPU) pipeline flow, an instruction in the pipeline will first obtain its operands and then execute before finally writing back the result and possibly forwarding the result to subsequent dependent consuming instructions. Depending on the CPU microarchitecture, this process often occurs across multiple pipeline stages so as to optimize performance and frequency.
In a superscalar processor containing multiple execution pipelines, forwarding the result of one instruction to one or more consuming instructions in the pipelines may be a performance-critical function that, if not done efficiently, may lead to pipeline stalls. A data dependency stall is the most common stall involving instructions attempting to dispatch to their respective pipelines for execution, where a stalled instruction waits for the producer of an operand to complete. Delays in forwarding the needed operand from its producer to the stalled instruction result in degraded CPU performance.
Embodiments of the invention are directed to systems and methods for forwarding literal generated data to dependent instructions more efficiently using a cache for storing constants (literals or immediates).
In an embodiment, a processor includes a register, a first pipeline, a cache, and a controller. The controller stores a value in an entry in the cache in response to the first pipeline decoding an instruction, wherein the instruction writes the value to the register upon completing execution, and wherein the value is determined or available when the first pipeline decodes the instruction. The controller sets a tag field in the entry to tag the entry with the register, and sets a flag field in the entry to indicate that the entry is valid. The instruction may be a move immediate instruction.
In another embodiment, a method includes decoding a first instruction in a first pipeline, wherein the first instruction writes a value to a register upon completing execution, and wherein the value is determined or available when the first pipeline decodes the first instruction. The method further includes storing the value in an entry in a cache; tagging the entry with the register; and setting the entry as valid.
In another embodiment, a processor includes a first pipeline to decode a first instruction, wherein the first instruction writes a value to a register upon completing execution, and wherein the value is determined or available when the first pipeline decodes the first instruction. The processor further includes a means for storing, the means for storing to store the value in an entry in a cache; a means for tagging, the means for tagging to tag the entry with the register; and a means for setting, the means for setting to set the entry as valid.
In another embodiment, a non-transitory computer readable medium has stored instructions to cause a processor to perform a process. The process includes decoding a first instruction in a first pipeline, wherein the first instruction writes a value to a register upon completing execution, and wherein the value is determined or available when the first pipeline decodes the first instruction; storing the value in an entry in a cache; tagging the entry with the register; and setting the entry as valid.
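The decode-time steps recited in the embodiments above (store the value, tag the entry with the register, set the entry as valid) may be sketched in software as follows. This is only an illustrative model, not a description of any particular hardware: the names ConstantCacheEntry, ConstantCache, record, and decode_move_immediate are hypothetical, and the number of entries and the round-robin replacement of entries are assumptions that the embodiments do not specify.

```python
from dataclasses import dataclass


@dataclass
class ConstantCacheEntry:
    tag: int = 0         # tag field: number of the destination register
    value: int = 0       # constant field: value known at decode
    valid: bool = False  # flag field: True when the entry may be forwarded


class ConstantCache:
    def __init__(self, num_entries: int = 4):
        # The entry count is an assumption made for this sketch.
        self.entries = [ConstantCacheEntry() for _ in range(num_entries)]
        self._next = 0  # hypothetical round-robin replacement pointer

    def record(self, reg: int, value: int) -> ConstantCacheEntry:
        # Reuse a valid entry already tagged with this register, if any;
        # otherwise take the next slot in round-robin order.
        for candidate in self.entries:
            if candidate.valid and candidate.tag == reg:
                entry = candidate
                break
        else:
            entry = self.entries[self._next]
            self._next = (self._next + 1) % len(self.entries)
        entry.value = value  # store the value in the entry
        entry.tag = reg      # tag the entry with the register
        entry.valid = True   # set the entry as valid
        return entry


def decode_move_immediate(cache: ConstantCache, reg: int, constant: int) -> ConstantCacheEntry:
    # Called when a MOV Rm #constant is decoded; no execution is needed
    # because the value is already available at decode.
    return cache.record(reg, constant)
```

In this sketch, a second move immediate to the same register overwrites the existing entry rather than creating a duplicate tag, which keeps a later tag lookup unambiguous.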
The accompanying drawings are presented to aid in the description of embodiments of the invention and are provided solely for illustration of the embodiments and not limitation thereof.
Aspects of the invention are disclosed in the following description and related drawings directed to specific embodiments of the invention. Alternate embodiments may be devised without departing from the scope of the invention. Additionally, well-known elements of the invention will not be described in detail or will be omitted so as not to obscure the relevant details of the invention.
The word “exemplary” is used herein to mean “serving as an example, instance, or illustration.” Any embodiment described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments. Likewise, the term “embodiments of the invention” does not require that all embodiments of the invention include the discussed feature, advantage or mode of operation.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of embodiments of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises”, “comprising”, “includes” and/or “including”, when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Further, many embodiments are described in terms of sequences of actions to be performed by, for example, elements of a computing device. It will be recognized that various actions described herein can be performed by specific circuits (e.g., application specific integrated circuits (ASICs)), by program instructions being executed by one or more processors, or by a combination of both. Additionally, these sequences of actions described herein can be considered to be embodied entirely within any form of computer readable storage medium having stored therein a corresponding set of computer instructions that upon execution would cause an associated processor to perform the functionality described herein. Thus, the various aspects of the invention may be embodied in a number of different forms, all of which have been contemplated to be within the scope of the claimed subject matter. In addition, for each of the embodiments described herein, the corresponding form of any such embodiments may be described herein as, for example, “logic configured to” perform the described action.
Illustrated in the pipelines of
A move instruction is a commonly used instruction for moving (copying or writing) data from one location to another. A move instruction is often written as MOV, and that convention will be followed here. A common use of a move instruction is to copy the value of a constant into an architected register. The constant value to be copied may be referred to as an immediate or literal. A move instruction for moving a constant to a register may be termed a move immediate instruction and written as MOV Rm #constant, where constant refers to the constant value and Rm refers to the architected register to which the constant value is written. In
Upon decoding a move immediate instruction, an embodiment stores the constant as part of an entry in a cache, referred to as a constant cache and labeled 112 in
The constant cache 112 may be realized in the processor 100 as a register file. In the illustration of
A move immediate instruction requires no subsequent execution to calculate its result. Typically, when a constant is generated, it is consumed immediately by a subsequent (in program order) consuming instruction. By utilizing the constant cache 112, subsequent consuming instructions have access to the stored constant value before the constant value is written to the destination architected register.
The contents of the constant cache 112 may be viewed as being organized into a table, where the constant value stored in an entry is written by a move immediate instruction and tagged according to the destination register of the move immediate instruction. Consider a result (the constant value) of a move immediate instruction stored in the constant cache 112 and a subsequent (in program order) instruction that depends upon the move immediate instruction, where an operand of the subsequent instruction is the constant value that the move immediate instruction is to move to a destination register. The subsequent instruction is the consuming instruction, and the destination register is the register targeted by the move instruction.
For an embodiment, execution of the consuming instruction need not wait for the result of the move immediate instruction to be forwarded, nor wait for the move immediate instruction to complete execution. Rather, the consuming instruction may use as its operand the constant value stored in the entry in the constant cache 112 associated with the move immediate instruction that it depends upon. As a result, no data forwarding is required and no data stall need occur regardless of whether the move immediate instruction has completed or is still in a pipeline.
Furthermore, the move immediate instruction and the data dependent consuming instruction may be at the same stage in different pipelines, and yet for some embodiments the data dependent consuming instruction may obtain its operand with zero pipeline cycle delay.
When a move immediate instruction is decoded and its immediate (literal or constant value) is stored in an entry in the constant cache 112, the flag field 114c associated with the entry is set to indicate that the contents of the entry are valid. When that entry is later accessed by a consuming instruction, the validity of the entry is checked before the immediate stored in the entry is forwarded to the consuming instruction. If the flag field associated with the entry indicates that the immediate stored in the entry is not valid, then the stored immediate is not forwarded to the consuming instruction.
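The validity check performed before forwarding may be illustrated by the following software sketch. The names Entry and forward_from_cache are hypothetical, and returning None to signal that the consuming instruction must obtain its operand by the ordinary path is an assumption of this sketch rather than a feature of the embodiments.

```python
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    tag: int     # register the entry is tagged with
    value: int   # stored immediate
    valid: bool  # flag field


def forward_from_cache(entries: List[Entry], source_reg: int) -> Optional[int]:
    # Forward the stored immediate only when an entry is both tagged
    # with the source register and flagged as valid.
    for entry in entries:
        if entry.tag == source_reg and entry.valid:
            return entry.value
    # No valid matching entry: nothing is forwarded from the cache.
    return None
```

For example, with entries [Entry(1, 5, True), Entry(2, 9, False)], a consumer of register 1 receives 5, while a consumer of register 2 receives nothing from the cache because the matching entry is not valid.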
Although the above description is within the context of a move immediate instruction, embodiments are not limited to move immediate instructions when employing the constant cache 112. The controller 110 may be configured so that for other types of instructions that write values to a destination register, an entry may be generated in the constant cache 112 as described with respect to the move immediate instruction, so that the stored value may be forwarded to a consuming instruction. Examples of such instructions are branch and link instructions, and program control relative branches, to name a few.
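To illustrate why such instructions qualify, the following sketch computes the value that would be written to the destination register using only information available at decode. It rests on assumptions not stated in the description: a fixed 4-byte instruction width, a branch and link instruction that writes the address of the next sequential instruction to its link register, and the hypothetical opcode names "MOV_IMM" and "BL".

```python
from typing import Optional

INSTRUCTION_SIZE = 4  # assumption: fixed-width 4-byte instructions


def decode_time_result(opcode: str, pc: int, immediate: int = 0) -> Optional[int]:
    # Returns the value the instruction will write to its destination
    # register if that value is known at decode, otherwise None.
    if opcode == "MOV_IMM":
        # The result is the immediate carried in the instruction itself.
        return immediate
    if opcode == "BL":
        # The link value is the address of the following instruction,
        # computable from the program counter at decode.
        return pc + INSTRUCTION_SIZE
    # Any other opcode is treated here as needing register operands.
    return None
```

A value returned by this function could be stored in a constant cache entry in the same manner as the immediate of a move immediate instruction.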
More generally, the described embodiments may apply to instructions that write a result to the register file, where the result can be determined by either information contained in the decode of the instruction or available at the time of decode. Such instructions do not have any operands that must be read from the register file. However, for ease of discussion, the embodiments disclosed herein are described for a move immediate instruction, where a move immediate instruction merely serves as an example instruction for which embodiments may be of utility.
When an instruction writes a result to an architected register, where the instruction needs to read from the register file before execution to determine the result, then the controller 110 invalidates any entry in the constant cache 112 with a tag matching the architected register. In this case, the controller 110 sets the flag field of the matching entry to a value indicating that the constant value stored in the entry is not valid.
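The invalidation described above may be sketched as follows. The names Entry and invalidate_on_register_write are hypothetical, and the returned count of cleared entries is included only to make the sketch easy to check.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    tag: int     # register the entry is tagged with
    value: int   # stored constant
    valid: bool  # flag field


def invalidate_on_register_write(entries: List[Entry], dest_reg: int) -> int:
    # Called when an instruction whose result depends on register file
    # reads is decoded with dest_reg as its destination. Every valid
    # entry tagged with dest_reg has its flag cleared so that a stale
    # constant is never forwarded.
    cleared = 0
    for entry in entries:
        if entry.valid and entry.tag == dest_reg:
            entry.valid = False
            cleared += 1
    return cleared
```

Only the flag field changes; the stored constant and the tag are left in place, since a cleared flag is sufficient to prevent forwarding.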
The controller 110 updates entries in the constant cache 112 according to the above-described embodiments. These actions may be performed completely by hardware. For some embodiments, instructions stored in a memory, such as for example the memory 116, may carry out the above-described actions. The memory 116 may in general be a non-transitory computer readable medium.
If the decoded instruction is a consumer of the architected register IL, as indicated in step 208, then provided there is a valid entry in the constant cache 112 associated (tagged) with the architected register IL, the constant value C stored in the constant field of that entry is forwarded to the consumer, as indicated in step 210. If the decoded instruction is an instruction that completes execution and writes (or copies) a constant value to the architected register Rm, as indicated in step 212, then the controller 110 invalidates the entry (provided there is one) in the constant cache 112 associated (tagged) with the architected register Rm, as indicated in step 214.
Embodiments may be used in data processing systems associated with the communication device 306, or with the base station 304C, or both, for example.
Those of skill in the art will appreciate that information and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.
Further, those of skill in the art will appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The methods, sequences and/or algorithms described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor.
Accordingly, an embodiment of the invention can include a computer readable medium embodying a method for forwarding literal generated data to dependent instructions more efficiently using a constant cache.
Accordingly, the invention is not limited to illustrated examples and any means for performing the functionality described herein are included in embodiments of the invention.
While the foregoing disclosure shows illustrative embodiments of the invention, it should be noted that various changes and modifications could be made herein without departing from the scope of the invention as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the embodiments of the invention described herein need not be performed in any particular order. Furthermore, although elements of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.