I. Field of the Disclosure
The technology of the disclosure relates generally to constant propagation during execution of a computer program by a processor.
II. Background
Many compilers are capable of performing an optimization process known as “constant propagation” when compiling source code into an executable computer program. Conventional constant propagation involves detecting, at compilation, an instance of a computer program instruction or function call that results in the same constant value being computed for all possible executions of the program. Based on this knowledge, a compiler may then optimize the computer program to more efficiently propagate the computed constant value to other dependent instructions that receive the computed constant value as an input.
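For illustration only, the following Python sketch shows the conventional compile-time transformation described above. It operates on a hypothetical three-address representation of straight-line code (no branches), in which each instruction is a (destination, opcode, source, source) tuple. The instruction format, the opcode set, and the name propagate_constants are assumptions made for this example.

```python
from typing import Dict, List, Tuple, Union

Operand = Union[int, str]                  # int = immediate, str = variable name
Instr = Tuple[str, str, Operand, Operand]  # (dest, op, src1, src2)

# Hypothetical opcode set; "const" marks an already-folded result.
OPS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
    "const": lambda a, b: a,
}

def propagate_constants(code: List[Instr]) -> List[Instr]:
    """Fold instructions whose inputs are all known constants, and substitute
    the folded values into later (dependent) instructions."""
    known: Dict[str, int] = {}
    out: List[Instr] = []
    for dest, op, a, b in code:
        # Replace variable inputs with their constant values where known.
        a = known.get(a, a) if isinstance(a, str) else a
        b = known.get(b, b) if isinstance(b, str) else b
        if isinstance(a, int) and isinstance(b, int):
            # Every input is constant, so the result is the same on every run.
            known[dest] = OPS[op](a, b)
            out.append((dest, "const", known[dest], 0))  # 0 is an unused slot
        else:
            known.pop(dest, None)  # dest is no longer a known constant
            out.append((dest, op, a, b))
    return out
```

Because the pass walks a single straight-line sequence, it cannot prove that a value is constant when a block is reachable along several paths. That is one of the limitations of compile-time constant propagation noted below.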
However, under some circumstances, compile-time constant propagation may be impractical or may generate suboptimal results. For example, a compiler's awareness of constant values may be hindered in cases in which blocks of computer instructions are compiled separately. In some programs, variables may be constant only on a subset of program paths due to the presence of multiple paths to an instruction block within a program. Moreover, constant propagation may result in “code bloat,” or an excessively large compiled program that may make optimization too costly in terms of storage.
Aspects disclosed in the detailed description include propagating constant values using a computed constants table. Related apparatuses and methods are also disclosed. In this regard, in one aspect, an instruction processing circuit is provided to enable constant propagation functionality at run time of computer program instructions. The instruction processing circuit may provide a computed constants table for caching computed constant values to be propagated between instructions. The instruction processing circuit may be configured to detect a deterministic instruction in an instruction stream. As used herein, a “deterministic instruction” is an instruction that can be determined to always produce a given output when provided with a particular input. In some aspects, a deterministic instruction may be an instruction that operates on an immediate constant value, or that takes as input only a constant value or a previously computed constant value cached in the computed constants table. After detecting the deterministic instruction, the instruction processing circuit determines whether an attribute (an address, as a non-limiting example) of the deterministic instruction matches an entry of the computed constants table. If the attribute of the deterministic instruction matches the entry of the computed constants table, a computed constant value stored in the entry of the computed constants table is provided for execution of at least one dependent instruction on the deterministic instruction. In this manner, the computed constant value may be propagated to dependent instructions without requiring re-execution of the deterministic instruction, resulting in improved processor performance. In some aspects, the entry of the computed constants table may also store operands for the deterministic instruction. The instruction processing circuit may then locate the entry in the computed constants table by further determining whether inputs for the detected deterministic instruction match the operands stored in the entry.
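The test for a deterministic instruction can be sketched as follows. The sketch assumes a hypothetical instruction record in which each source is either an immediate (an int) or a register name. It also assumes a mapping from registers to the computed constants table entries that produce their values. The opcode whitelist PURE_OPS is an assumption; the disclosure does not enumerate which operations qualify.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

@dataclass
class Instr:
    pc: int                      # instruction address, usable as the table attribute
    op: str
    dest: str
    srcs: List[Union[int, str]]  # int = immediate, str = register name

# Hypothetical: opcodes whose result depends only on their inputs.
PURE_OPS = {"mov", "add", "sub", "mul", "shl"}

def is_deterministic(instr: Instr, reg_to_entry: Dict[str, int]) -> bool:
    """True if instr always yields the same output for the same inputs and
    every input is an immediate or a register currently backed by an entry
    of the computed constants table."""
    if instr.op not in PURE_OPS:
        return False  # e.g., a load may return different values over time
    return all(isinstance(s, int) or s in reg_to_entry for s in instr.srcs)
```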
In another aspect, an apparatus comprising an instruction processing circuit is provided. The instruction processing circuit is configured to detect, in an instruction stream, a deterministic instruction. The instruction processing circuit is further configured to determine whether an attribute of the deterministic instruction matches an entry of a computed constants table. The instruction processing circuit is also configured to, responsive to determining that the attribute of the deterministic instruction matches the entry of the computed constants table, provide a constant value stored in the entry of the computed constants table for execution of at least one dependent instruction on the deterministic instruction.
In another aspect, a method for providing constant propagation is provided. The method comprises detecting, in an instruction stream, a deterministic instruction. The method further comprises determining whether an attribute of the deterministic instruction matches an entry of a computed constants table. The method also comprises, responsive to determining that the attribute of the deterministic instruction matches the entry of the computed constants table, providing a constant value stored in the entry of the computed constants table for execution of at least one dependent instruction on the deterministic instruction.
In another aspect, an apparatus comprising an instruction processing circuit is provided. The instruction processing circuit comprises a means for detecting, in an instruction stream, a deterministic instruction. The instruction processing circuit further comprises a means for determining whether an attribute of the deterministic instruction matches an entry of a computed constants table. The instruction processing circuit also comprises a means for providing a constant value stored in the entry of the computed constants table for execution of at least one dependent instruction on the deterministic instruction, responsive to determining that the attribute of the deterministic instruction matches the entry of the computed constants table.
With reference now to the drawing figures, several exemplary aspects of the present disclosure are described. The word “exemplary” is used herein to mean “serving as an example, instance, or illustration.” Any aspect described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects.
In this regard,
The computer processor 100 includes input/output circuits 106, an instruction cache 108, and a data cache 110. The computer processor 100 further comprises an execution pipeline 112, which includes a front-end circuit 114, an execution unit 116, and a completion unit 118. The computer processor 100 additionally includes registers 120, which comprise one or more general purpose registers (GPRs) 122, a program counter 124, and a link register 126. In some aspects, such as those employing the ARM® ARM7™ architecture, the link register 126 is one of the GPRs 122, as shown in
In exemplary operation, the front-end circuit 114 of the execution pipeline 112 fetches instructions (not shown) from the instruction cache 108, which in some aspects may be an on-chip Level 1 (L1) cache, as a non-limiting example. The fetched instructions are decoded by the front-end circuit 114 and issued to the execution unit 116. The execution unit 116 executes the issued instructions, and the completion unit 118 retires the executed instructions. In some aspects, the completion unit 118 may comprise a write-back mechanism (not shown) that stores results of instruction execution in one or more of the registers 120. It is to be understood that the execution unit 116 and/or the completion unit 118 may each comprise one or more sequential pipeline stages. In the example of
Some aspects of the computer processor 100 of
For various reasons, the instructions processed within the execution pipeline 112 may not have been optimized using constant propagation at compilation. In this regard, the instruction processing circuit 102 of
As each deterministic instruction is fetched by the front-end circuit 114 of the instruction processing circuit 102, the instruction processing circuit 102 consults the computed constants table 104. The computed constants table 104 contains one or more entries (not shown). Each entry may include an attribute of a previously-detected deterministic instruction, and a computed constant value that was previously generated by the deterministic instruction corresponding to the attribute. Some aspects may provide that the attribute comprises an address of the deterministic instruction and/or an index of the deterministic instruction, as non-limiting examples. In some aspects, the entry may also include one or more operands of the deterministic instruction. Exemplary elements of the computed constants table 104 are discussed in greater detail below with respect to
The instruction processing circuit 102 determines whether an attribute of the deterministic instruction being fetched matches an entry of the computed constants table 104. According to some aspects disclosed herein, the instruction processing circuit 102 may be configured to further determine whether one or more inputs (not shown) for the detected deterministic instruction correspond to one or more operands stored in the entry. If so (i.e., a “hit”), the instruction processing circuit 102 provides the computed constant value from the entry to at least one dependent instruction. In aspects wherein the computer processor 100 includes the optional constant cache 132, the instruction processing circuit 102 may provide the computed constant value to the at least one dependent instruction via the constant cache 132 (e.g., by writing the computed constant value to the constant cache 132). In this manner, the instruction processing circuit 102 may leverage existing functionality of the constant cache 132 to provide the computed constant value to the at least one dependent instruction, thus avoiding the need to implement an additional communications path. Alternatively, some aspects may provide that the instruction processing circuit 102 comprises a dedicated communications pathway (not shown) to provide the computed constant value to the at least one dependent instruction. The at least one dependent instruction may thus obtain the computed constant value for the deterministic instruction without requiring the deterministic instruction to be re-executed.
According to some aspects disclosed herein, if the instruction processing circuit 102 detects a deterministic instruction but does not find the attribute of the deterministic instruction in an entry of the computed constants table 104, a “miss” occurs. In this case, the instruction processing circuit 102 may generate an entry in the computed constants table 104 corresponding to the deterministic instruction upon execution of the deterministic instruction. The generated entry includes the attribute of the deterministic instruction, and stores the computed constant value generated by the deterministic instruction as the computed constant value of the entry. In some aspects, the entry may also include one or more operands in which a corresponding one or more inputs for the deterministic instruction may be stored. If and when the deterministic instruction is again detected by the instruction processing circuit 102, a “hit” in the computed constants table 104 may occur, and the computed constant value may be provided to a dependent instruction.
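A minimal Python model of the hit and miss paths is shown below. It uses only the instruction address as the attribute. The class and function names are hypothetical, and the execute callback stands in for conventional execution of the deterministic instruction. Table capacity and entry replacement are not modeled.

```python
from typing import Callable, Dict, Optional

class ComputedConstantsTable:
    """Minimal model: entries keyed by the deterministic instruction's address."""

    def __init__(self) -> None:
        self.entries: Dict[int, int] = {}  # attribute (PC) -> computed constant value
        self.hits = 0
        self.misses = 0

    def lookup(self, pc: int) -> Optional[int]:
        return self.entries.get(pc)

    def allocate(self, pc: int, value: int) -> None:
        self.entries[pc] = value

def process(pc: int, execute: Callable[[], int], table: ComputedConstantsTable) -> int:
    """Return the value to forward to dependent instructions."""
    value = table.lookup(pc)
    if value is not None:  # hit: forward the cached value, skip re-execution
        table.hits += 1
        return value
    table.misses += 1      # miss: execute, then record the result in a new entry
    value = execute()
    table.allocate(pc, value)
    return value
```

Calling process twice for the same address runs the execute callback only once; the second call is served from the table.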
Some aspects may provide that the instruction processing circuit 102 includes additional elements to facilitate constant propagation. As non-limiting examples, the instruction processing circuit 102 may include an in-flight instruction queue 134 and/or a register mapping table 136. The in-flight instruction queue 134 may be used in some aspects to track “in-flight” instructions (i.e., instructions that have been fetched but not yet executed), while the register mapping table 136 may be used to map one or more of the registers 120 to an entry in the computed constants table 104. The use of the in-flight instruction queue 134 and the register mapping table 136 is discussed in greater detail below with respect to
Each of the entries 202(0)-202(X) also includes a value field 206. The value field 206 stores a computed constant value that is generated upon a first execution of the deterministic instruction. Upon subsequent detection of the deterministic instruction, the instruction processing circuit 102 may provide the computed constant value from the value field 206 to a dependent instruction. In some aspects, to save processor area, the value field 206 may be smaller than the largest constant value supported by the computer processor 100. As a non-limiting example, the computer processor 100 may support 64-bit constants, while the value field 206 may store only the lower 32 bits of a computed constant value. In aspects in which most computed constant values have 32 or fewer significant bits, the use of a smaller value field 206 may provide space and/or power savings with little to no impact on functionality of the computed constants table 200.
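The following sketch models a 32-bit value field for a processor with 64-bit constants. It assumes a zero-extension policy: a computed constant value is cached only if its upper 32 bits are zero. The names and the policy are assumptions, and a sign-extension policy would be an equally plausible choice.

```python
from typing import Optional

VALUE_FIELD_BITS = 32        # narrower than the 64-bit constants the core supports
MASK64 = (1 << 64) - 1

def encode_value_field(value64: int) -> Optional[int]:
    """Return the 32-bit field contents, or None if the value would not survive
    zero-extension (assumed policy: such values are simply not cached)."""
    value64 &= MASK64        # view the value as a 64-bit two's-complement pattern
    if value64 >> VALUE_FIELD_BITS:
        return None          # significant bits above bit 31: does not fit
    return value64

def decode_value_field(field: int) -> int:
    """Zero-extend the stored lower 32 bits back to a 64-bit constant."""
    return field & ((1 << VALUE_FIELD_BITS) - 1)
```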
Some aspects may provide that each of the entries 202(0)-202(X) of the computed constants table 200 includes one or more operand fields 208(0)-208(Y). Each of the operand fields 208(0)-208(Y) may store a corresponding input of the deterministic instruction. By requiring that an entry 202(0)-202(X) match both the attribute and the one or more inputs of the deterministic instruction, the instruction processing circuit 102 may enable the computed constants table 200 to capture multiple paths to the same deterministic instruction. As a non-limiting example, two different constant values generated by a function call that is invoked from two different locations with two different sets of operands may be cached as two separate entries 202(0)-202(X).
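Keying each entry on both the attribute and the operand values can be sketched as follows. A Python dict stands in for the hardware match logic. The class name and the use of plain integer operand values are assumptions.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[int, Tuple[int, ...]]  # (attribute/PC, operand values)

class OperandMatchedTable:
    """Hypothetical model in which an entry matches only if both the attribute
    and every stored operand match the instruction being checked."""

    def __init__(self) -> None:
        self.entries: Dict[Key, int] = {}

    def lookup(self, pc: int, inputs: Tuple[int, ...]) -> Optional[int]:
        # A hit requires the attribute and all operands to match.
        return self.entries.get((pc, inputs))

    def allocate(self, pc: int, inputs: Tuple[int, ...], value: int) -> None:
        self.entries[(pc, inputs)] = value
```

For instance, a function at a hypothetical address 0x500 that is called once with inputs (2, 3) and once with (4, 5) occupies two entries. Each call site later hits on its own entry.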
According to some aspects disclosed herein, each of the one or more operand fields 208(0)-208(Y) may store a reference to another of the entries 202(0)-202(X), or may store a constant value. In some aspects, the operand fields 208(0)-208(Y) may store a mix of references and constant values, with a bit flag (not shown) associated with each of the operand fields 208(0)-208(Y) indicating whether a pointer or a value is stored therein.
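One possible encoding of an operand field with the bit flag is sketched below. The 16-bit payload width, and the choice of the bit just above the payload as the flag, are assumptions. The resolve helper shows how a reference would be followed to the value of another entry.

```python
from typing import List, Tuple

PAYLOAD_BITS = 16               # hypothetical operand payload width
FLAG_REF = 1 << PAYLOAD_BITS    # flag set => payload is an entry index

def pack_operand(payload: int, is_reference: bool) -> int:
    if not 0 <= payload < (1 << PAYLOAD_BITS):
        raise ValueError("payload does not fit in the operand field")
    return (FLAG_REF if is_reference else 0) | payload

def unpack_operand(field: int) -> Tuple[bool, int]:
    """Return (is_reference, payload)."""
    return bool(field & FLAG_REF), field & (FLAG_REF - 1)

def resolve(field: int, values: List[int]) -> int:
    """Follow a reference to another entry's value, or return the constant."""
    is_ref, payload = unpack_operand(field)
    return values[payload] if is_ref else payload
```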
It is to be understood that some aspects may provide that the entries 202(0)-202(X) of the computed constants table 200 may include other fields in addition to the fields illustrated in
To better illustrate exemplary communications flows among the instruction processing circuit 102 and the computed constants table 104 of
As seen in
The computed constants table 304 illustrated in
As noted above, the instruction processing circuit 300 may use the in-flight instruction queue 306 of
The register mapping table 308 in the example of
The constant cache 132 shown in
Referring now to
The instruction processing circuit 300 checks the computed constants table 304, and determines that the attribute 340 does not match any of the entries 302(0)-302(3). Thus, in response to the “miss,” the instruction processing circuit 300 generates the entry 302(0) in the computed constants table 304, and stores the attribute 340 of the deterministic instruction 312 in the PC field 320, as indicated by arrow 344. The instruction processing circuit 300 also generates the entry 326(0) in the in-flight instruction queue 306, as indicated by arrow 346. In the entry 326(0), the attribute 340 of the deterministic instruction 312 is stored in the ID field 328, while an ID of zero (0) for the entry 302(0) of the computed constants table 304 is stored in the table ID field 330. As shown by arrow 348, the instruction processing circuit 300 further generates the entry 332(0) in the register mapping table 308. An identifier for the register R0 is stored in the ID field 334 of the entry 332(0), and an ID of zero (0) for the entry 302(0) of the computed constants table 304 is stored in the table ID field 336. Conventional processing of the deterministic instruction 312 then continues.
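The bookkeeping performed on a miss can be sketched as a small Python model of the three structures. It loosely follows the fields named above: the PC field and operand fields of the computed constants table, the ID and table ID fields of the in-flight instruction queue, and the register mapping table. The class names, the list-based storage, and append-only allocation without eviction are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class CctEntry:
    pc: int                                             # PC field (attribute)
    operands: List[int] = field(default_factory=list)   # table IDs of inputs
    value: Optional[int] = None                          # filled in after execution

@dataclass
class InFlight:
    pc: int        # ID field: attribute of the in-flight instruction
    table_id: int  # table ID field: entry awaiting the computed value

class Tracker:
    def __init__(self) -> None:
        self.cct: List[CctEntry] = []
        self.in_flight: List[InFlight] = []
        self.reg_map: Dict[str, int] = {}  # register -> computed constants table ID

    def on_miss(self, pc: int, dest_reg: str, operands: List[int]) -> int:
        """Allocate an entry for a missed deterministic instruction."""
        table_id = len(self.cct)           # next free entry (no eviction modeled)
        self.cct.append(CctEntry(pc, list(operands)))
        self.in_flight.append(InFlight(pc, table_id))
        self.reg_map[dest_reg] = table_id  # dest_reg will hold this entry's value
        return table_id
```

A second miss that writes the same register simply repoints the register mapping table at the newer entry, as in the handling of the register R0 described here.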
In
As in
The instruction processing circuit 300 also generates the entry 326(1) in the in-flight instruction queue 306, as shown by arrow 360. In the entry 326(1), the attribute 350 of the deterministic instruction 314 is stored in the ID field 328, while an ID of one (1) for the entry 302(1) of the computed constants table 304 is stored in the table ID field 330. The instruction processing circuit 300 further updates the table ID field 336 of the entry 332(0) in the register mapping table 308 with an ID of one (1) for the entry 302(1) of the computed constants table 304, as indicated by arrow 362. This may serve to indicate to the instruction processing circuit 300 that the entry 302(1) will contain the most recent computed constant value for the register R0 corresponding to the input 352. Conventional processing of the deterministic instruction 314 then continues.
Referring now to
After determining that the attribute 368 does not match any of the entries 302(0)-302(3) of the computed constants table 304, the instruction processing circuit 300 generates the entry 302(2), as indicated by arrow 370. The attribute 368 of the deterministic instruction 316 is stored in the PC field 320, while an ID of one (1) for the entry 302(1) of the computed constants table 304 is stored in the operand fields 324(0) and 324(1) of the entry 302(2) as operands 372 and 374, respectively.
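Filling the operand fields from the register mapping table can be sketched as below. The function name is hypothetical, and only register sources are handled. For example, an instruction whose two inputs both map to table entry 1 would receive the operand list [1, 1].

```python
from typing import Dict, List, Optional

def operands_from_sources(srcs: List[str], reg_map: Dict[str, int]) -> Optional[List[int]]:
    """Translate each source register into the ID of the computed constants
    table entry that produces it; None if some source is not table-backed."""
    ids: List[int] = []
    for reg in srcs:
        if reg not in reg_map:
            return None  # input not known to be a computed constant
        ids.append(reg_map[reg])
    return ids
```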
As in
In
In response, the instruction processing circuit 300 writes the computed constant value 380 provided by the entry 302(0) to the entry 337(0) in the constant cache 132 corresponding to the register R0, as indicated by arrow 384. The computed constant value 380 may then be provided to the dependent instruction 314 (i.e., the next deterministic instruction 314) via the constant cache 132, as indicated by arrow 385. In this manner, the dependent instruction 314 is able to receive the computed constant value 380 without the deterministic instruction 312 having to be re-executed.
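The hit path through the constant cache can be sketched as follows. The sketch assumes a hypothetical constant cache with one slot per register, and a list of value fields indexed by table ID. The names are illustrative and are not taken from the disclosure.

```python
from typing import Dict, List, Optional

class ConstantCache:
    """Hypothetical model: one slot per architectural register."""

    def __init__(self) -> None:
        self.slots: Dict[str, int] = {}

    def write(self, reg: str, value: int) -> None:
        self.slots[reg] = value

    def read(self, reg: str) -> Optional[int]:
        return self.slots.get(reg)

def on_hit(values: List[Optional[int]], table_id: int, dest_reg: str,
           cache: ConstantCache) -> None:
    """Forward a cached computed constant to dependents via the constant cache,
    so the deterministic instruction need not be re-executed."""
    value = values[table_id]
    if value is not None:  # entry already holds a computed constant value
        cache.write(dest_reg, value)
```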
A similar process occurs in
In
Accordingly, the instruction processing circuit 300 checks the computed constants table 304, and determines that while the attribute 350 matches the entry 302(1), the input 352 in this example does not match the operand field 324(0) of the entry 302(1). The instruction processing circuit 300 thus generates another entry 302(3) in the computed constants table 304, and stores the attribute 350 of the deterministic instruction 314 in the PC field 320, as indicated by arrow 393. Because the input 352 of the deterministic instruction 314 corresponds to the existing entry 302(2), the instruction processing circuit 300 stores an ID of two (2) (i.e., a reference to the entry 302(2) of the computed constants table 304) in the operand field 324(0) of the entry 302(3) as an operand 394.
The instruction processing circuit 300 then generates the entry 326(0) in the in-flight instruction queue 306, as shown by arrow 395. In the entry 326(0), the attribute 350 of the deterministic instruction 314 is stored in the ID field 328, while an ID of three (3) for the entry 302(3) of the computed constants table 304 is stored in the table ID field 330. The instruction processing circuit 300 further updates the table ID field 336 of the entry 332(0) in the register mapping table 308 with an ID of three (3) referencing the entry 302(3) of the computed constants table 304, as indicated by arrow 396. This may serve to indicate to the instruction processing circuit 300 that the entry 302(3) will contain the most recent computed constant value for the register R0 corresponding to the input 352. Conventional processing of the deterministic instruction 314 then continues. After the deterministic instruction 314 is executed, the entry 302(3) is updated with a computed constant value 397 (in this example, the value 83).
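The step in which an executed deterministic instruction deposits its result can be sketched as follows. The in-flight instruction queue entry whose ID field matches the instruction supplies the table ID of the entry to fill. The function name, the list-based queue, and oldest-first matching are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class InFlight:
    pc: int        # ID field: attribute of the in-flight instruction
    table_id: int  # table ID field: computed constants table entry to fill

def on_execute(pc: int, result: int, in_flight: List[InFlight],
               values: List[Optional[int]]) -> bool:
    """Record the result in the entry the oldest matching in-flight queue entry
    points to, then retire that queue entry. Returns False if not tracked."""
    for i, entry in enumerate(in_flight):
        if entry.pc == pc:
            values[entry.table_id] = result
            del in_flight[i]  # safe: we return immediately after deleting
            return True
    return False
```

With a queue holding one entry that points at table ID 3, executing the instruction with result 83 leaves 83 in the value field of entry 3 and empties the queue.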
To illustrate exemplary operations for propagating constant values according to some aspects of the instruction processing circuit 102 and the computed constants table 104 of
In
The instruction processing circuit 300 next determines whether an attribute 350 of the deterministic instruction 314 matches an entry 302(1) of the computed constants table 304 (block 402). If the attribute 350 of the deterministic instruction 314 does not match the entry 302(1), processing resumes at block 404 of
Referring now to
With continuing reference to
Propagating constant values using a computed constants table according to aspects disclosed herein may be provided in or integrated into any processor-based device. Examples, without limitation, include a set top box, an entertainment unit, a navigation device, a communications device, a fixed location data unit, a mobile location data unit, a mobile phone, a cellular phone, a computer, a portable computer, a desktop computer, a personal digital assistant (PDA), a monitor, a computer monitor, a television, a tuner, a radio, a satellite radio, a music player, a digital music player, a portable music player, a digital video player, a video player, a digital video disc (DVD) player, and a portable digital video player.
In this regard,
Other master and slave devices can be connected to the system bus 608. As illustrated in
The CPU(s) 602 may also be configured to access the display controller(s) 620 over the system bus 608 to control information sent to one or more displays 626. The display controller(s) 620 sends information to the display(s) 626 to be displayed via one or more video processors 628, which process the information to be displayed into a format suitable for the display(s) 626. The display(s) 626 can include any type of display, including but not limited to a cathode ray tube (CRT), a liquid crystal display (LCD), a plasma display, etc.
Those of skill in the art will further appreciate that the various illustrative logical blocks, modules, circuits, and algorithms described in connection with the aspects disclosed herein may be implemented as electronic hardware, instructions stored in memory or in another computer-readable medium and executed by a processor or other processing device, or combinations of both. The master and slave devices described herein may be employed in any circuit, hardware component, integrated circuit (IC), or IC chip, as examples. Memory disclosed herein may be any type and size of memory and may be configured to store any type of information desired. To clearly illustrate this interchangeability, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. How such functionality is implemented depends upon the particular application, design choices, and/or design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
The various illustrative logical blocks, modules, and circuits described in connection with the aspects disclosed herein may be implemented or performed with a processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
The aspects disclosed herein may be embodied in hardware and in instructions that are stored in hardware, and may reside, for example, in Random Access Memory (RAM), flash memory, Read Only Memory (ROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), registers, a hard disk, a removable disk, a CD-ROM, or any other form of computer readable medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a remote station. In the alternative, the processor and the storage medium may reside as discrete components in a remote station, base station, or server.
It is also noted that the operational steps described in any of the exemplary aspects herein are described to provide examples and discussion. The operations described may be performed in numerous different sequences other than the illustrated sequences. Furthermore, operations described in a single operational step may actually be performed in a number of different steps. Additionally, one or more operational steps discussed in the exemplary aspects may be combined. It is to be understood that the operational steps illustrated in the flow chart diagrams may be subject to numerous different modifications as will be readily apparent to one of skill in the art. Those of skill in the art will also understand that information and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.
The previous description of the disclosure is provided to enable any person skilled in the art to make or use the disclosure. Various modifications to the disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the spirit or scope of the disclosure. Thus, the disclosure is not intended to be limited to the examples and designs described herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.