I. Field of the Disclosure
The technology of the disclosure relates generally to generating constant values during execution of a computer program by a processor.
II. Background
A conventional computer architecture may specify a number of bits (e.g., 32 bits or 64 bits, as non-limiting examples) that indicate a maximum size for data units, memory addresses, and instructions that are supported by the computer architecture. Arithmetic and other operations provided by the computer architecture may thus enable the generation and use of large constant values that approach or reach the maximum size supported. For instance, a 32-bit computer architecture may support the use of large constant values having up to 32 bits, while a 64-bit architecture may enable the use of large constant values of up to 64 bits.
However, generation of large constant values may present challenges in conventional computer architectures. For example, a computer processor implemented according to a conventional computer architecture may be unable to encode large constant values in a single instruction. As a result, the use of alternate techniques may be required to encode large constant values. These techniques may include the use of a constant-generating instruction sequence comprising one or more constant-generating instructions. The one or more constant-generating instructions may carry out a literal load from a memory, may incorporate relative addressing based on a program counter (PC) plus an offset value, and/or may comprise a series of arithmetic logic unit (ALU) operations, as non-limiting examples. One example of a constant-generating instruction sequence for encoding a large constant value using a PC plus an offset value is illustrated below:
ADRP R0, #4; Register R0 is loaded with a value from a memory location specified by the PC plus the value four (4) shifted left by 12 bits (4<<12).
ADD R0, #0x33; The hexadecimal value 0x33 is added to the value in register R0, and the sum is stored in register R0. The register R0 now contains the desired constant value.
Even in aspects in which a conventional computer architecture permits a large constant value to be encoded in a single constant-generating instruction, the constant-generating instruction may still require one or more processor cycles to execute. As a result, a dependent instruction that is fetched subsequent to the constant-generating instruction sequence, and that uses the generated constant value as an input, may suffer a latency of one or more processor cycles before the dependent instruction may execute. Accordingly, it is desirable to reduce the processor cycle latency incurred through use of constant-generating instruction sequences of one or more constant-generating instructions.
Aspects disclosed in the detailed description include accelerating constant value generation using a computed constants table. Related circuits, methods, and computer-readable media are also disclosed. In this regard, in one aspect, an instruction processing circuit is provided. The instruction processing circuit is configured to detect, in an instruction stream, a constant-generating instruction sequence. The instruction processing circuit is further configured to determine whether an address of the constant-generating instruction sequence is present in an entry of a computed constants table. The instruction processing circuit is also configured to, responsive to determining that the address of the constant-generating instruction sequence is present in the entry of the computed constants table, provide a constant value stored in the entry for execution of at least one dependent instruction on the constant-generating instruction sequence. In this manner, the generation of constant values by a constant-generating instruction or instruction sequence may be accelerated, allowing dependent instructions to use constant values with zero-cycle latency. In some aspects, the instruction processing circuit may employ one or more state machines to detect a plurality of constant-generating instructions of the constant-generating instruction sequence. Each state machine may be configured to recognize, e.g., a predetermined set of arithmetic logic unit (ALU) operations. If the constant-generating instruction sequence is detected, the instruction processing circuit may determine a last instruction address of a last instruction from among the plurality of constant-generating instructions as the address of the constant-generating instruction sequence.
In another aspect, a method of accelerating generation of constant values is provided. The method comprises detecting, in an instruction stream, a constant-generating instruction sequence. The method further comprises determining whether an address of the constant-generating instruction sequence is present in an entry of a computed constants table. The method also comprises, responsive to determining that the address of the constant-generating instruction sequence is present in the entry of the computed constants table, providing a constant value stored in the entry for execution of at least one dependent instruction on the constant-generating instruction sequence.
In another aspect, a non-transitory computer-readable medium is provided, having stored thereon computer-executable instructions to cause a processor to detect, in an instruction stream, a constant-generating instruction sequence. The computer-executable instructions stored thereon further cause the processor to determine whether an address of the constant-generating instruction sequence is present in an entry of a computed constants table. The computer-executable instructions stored thereon also cause the processor to, responsive to determining that the address of the constant-generating instruction sequence is present in the entry of the computed constants table, provide a constant value stored in the entry for execution of at least one dependent instruction on the constant-generating instruction sequence.
With reference now to the drawing figures, several exemplary aspects of the present disclosure are described. The word “exemplary” is used herein to mean “serving as an example, instance, or illustration.” Any aspect described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects.
In this regard,
The computer processor 100 includes input/output circuits 106, an instruction cache 108, and a data cache 110. The computer processor 100 further comprises an execution pipeline 112, which includes a front-end circuit 114, an execution unit 116, and a completion unit 118. The computer processor 100 additionally includes registers 120, which comprise one or more general purpose registers (GPRs) 122, a program counter 124, and a link register 126. In some aspects, such as those employing the ARM® ARM7™ architecture, the link register 126 is one of the GPRs 122, as shown in
In exemplary operation, the front-end circuit 114 of the execution pipeline 112 fetches instructions (not shown) from the instruction cache 108, which in some aspects may be an on-chip Level 1 (L1) cache, as a non-limiting example. The fetched instructions are decoded by the front-end circuit 114 and issued to the execution unit 116. The execution unit 116 executes the issued instructions, and the completion unit 118 retires the executed instructions. In some aspects, the completion unit 118 may comprise a write-back mechanism (not shown) that stores the execution results in one or more of the registers 120. It is to be understood that the execution unit 116 and/or the completion unit 118 may each comprise one or more sequential pipeline stages. In the example of
Some aspects of the computer processor 100 of
While processing instructions in the execution pipeline 112, the instruction processing circuit 102 may fetch and execute a constant-generating instruction sequence (not shown) comprising one or more constant-generating instructions (not shown). The constant-generating instruction sequence may generate a constant value for loading into one of the registers 120. However, the constant-generating instruction sequence may require one or more processor cycles to generate the constant value. As a result, the instruction processing circuit 102 may be unable to dispatch a subsequent dependent instruction (not shown) until the one or more processor cycles required by the constant-generating instruction sequence have elapsed.
In this regard, the instruction processing circuit 102 of
As the constant-generating instruction sequence is fetched by the front-end circuit 114 of the instruction processing circuit 102, the instruction processing circuit 102 consults the computed constants table 104, which contains one or more entries (not shown). Each entry may include an address of a previously-detected constant-generating instruction sequence, and a constant value that was previously generated by the constant-generating instruction sequence corresponding to the address. According to some aspects in which the constant-generating instruction sequence is a plurality of constant-generating instructions, the address may correspond to a last instruction address of a last instruction from among a plurality of constant-generating instructions.
The instruction processing circuit 102 determines whether an address of the constant-generating instruction sequence being fetched is present in an entry of the computed constants table 104. If the address of the constant-generating instruction sequence is found (i.e., a “hit”), the instruction processing circuit 102 provides the constant value from the entry to at least one dependent instruction. In aspects wherein the computer processor 100 includes the optional constant cache 132, the constant value may be provided to the at least one dependent instruction via the constant cache 132 (e.g., by writing the constant value to the constant cache 132). In this manner, the at least one dependent instruction may obtain the constant value for the constant-generating instruction sequence without incurring wasted processor cycles.
According to some aspects disclosed herein, if the instruction processing circuit 102 detects a constant-generating instruction sequence but does not find the address of the constant-generating instruction sequence in an entry of the computed constants table 104, a “miss” occurs. In this case, the instruction processing circuit 102 may generate an entry in the computed constants table 104 corresponding to the constant-generating instruction sequence upon execution of the constant-generating instruction sequence. The generated entry includes the address of the constant-generating instruction sequence, and stores the constant value generated by the constant-generating instruction sequence as the constant value of the entry. Accordingly, if and when the constant-generating instruction sequence is again detected by the instruction processing circuit 102, a “hit” in the computed constants table 104 may occur, and the constant value may be provided to a dependent instruction.
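The hit and miss behavior described above can be sketched in software as follows. This is a minimal illustrative model only, not the disclosed circuit: the names `Entry`, `ComputedConstantsTable`, `lookup`, `fill`, and `constant_for` are hypothetical, the table is assumed to be unbounded (capacity and replacement are not modeled), and execution of the constant-generating instruction sequence is stood in for by a caller-supplied function.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Entry:
    pc: int     # address of the constant-generating instruction sequence
    value: int  # constant value previously generated by that sequence


class ComputedConstantsTable:
    """Toy, unbounded model of a table indexed by sequence address."""

    def __init__(self) -> None:
        self._entries: Dict[int, Entry] = {}

    def lookup(self, address: int) -> Optional[int]:
        """Return the stored constant on a hit, or None on a miss."""
        entry = self._entries.get(address)
        return entry.value if entry is not None else None

    def fill(self, address: int, value: int) -> None:
        """Generate an entry after the sequence has executed."""
        self._entries[address] = Entry(pc=address, value=value)


def constant_for(
    table: ComputedConstantsTable,
    address: int,
    execute: Callable[[], int],
) -> Tuple[int, str]:
    """Provide the constant for a detected sequence and report hit or miss."""
    value = table.lookup(address)
    if value is not None:
        return value, "hit"   # constant supplied without executing the sequence
    value = execute()         # miss: the sequence executes conventionally
    table.fill(address, value)
    return value, "miss"
```

In this sketch, the first encounter of a given address takes the miss path and populates the table, and any later encounter of the same address returns the stored value without calling `execute` again.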
To better illustrate exemplary communications flows of the instruction processing circuit 102 of
In
The constant-generating instruction sequence 202 in this example is an ADR instruction, which directs the computer processor 100 to load a constant value from an address specified by a current value of the program counter (PC) 124 plus the hexadecimal value 0x10. The constant value is then stored in a register R0, which may be one of the registers 120 of
The computed constants table 104 illustrated in
The constant cache 132 shown in
Referring now to
Upon execution of the constant-generating instruction sequence 202, the constant value (not shown) generated by the constant-generating instruction sequence 202 is forwarded to the dependent instruction 204 using conventional mechanisms, as indicated by arrow 222. The instruction processing circuit 102 then generates the entry 208(X) in the computed constants table 104 based on the constant value, as indicated by arrow 224. The address 206 of the constant-generating instruction sequence 202 is then stored in the program counter field 210 of the entry 208(X), while the constant value is stored in the value field 212 of the entry 208(X).
In response, the instruction processing circuit 102 assigns the constant value 226 provided by the entry 208(X) to the entry 214(0) in the constant cache 132 corresponding to register R0, as indicated by arrow 230. The constant value 226 is then provided to the dependent instruction 204 via the constant cache 132, as indicated by arrow 232. In this manner, the dependent instruction 204 is able to receive the constant value 226 while incurring a zero-cycle latency.
In
The first instruction 304 of the constant-generating instruction sequence 302 in
In the example of
If another ADRP instruction is detected next in the instruction stream 300, the state machine 312 undergoes a transition ADRP 320, which returns the state machine 312 to the state DETECTED ADRP 318. If an instruction that causes program flow to be redirected is encountered, or if the next instruction encountered in the instruction stream 300 is neither an ADD instruction nor an ADRP instruction, a transition RESET 322 is triggered back to the state IDLE 314. Finally, if the next instruction detected in the instruction stream 300 is an ADD instruction such as the last instruction 306, a transition ACCEPT 324 is triggered, and the state machine 312 moves back to the state IDLE 314. An occurrence of the transition ACCEPT 324 indicates to the instruction processing circuit 102 that the constant-generating instruction sequence 302 has been detected in the instruction stream 300. It is to be understood that the state machine 312 is a non-limiting example of logic that may be employed to detect instances of the constant-generating instruction sequence 302 in the instruction stream 300. In some aspects, the instruction processing circuit 102 may employ additional and/or other state machines 312 configured to detect other constant-generating instruction sequences in addition to or instead of the constant-generating instruction sequence 302 of
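The transitions described above can be sketched as a small software state machine. This is a simplified illustration under stated assumptions, not the disclosed logic: instructions are represented as hypothetical (address, opcode) pairs, the move from IDLE to DETECTED ADRP on an ADRP instruction is assumed from the state name, program-flow redirection is modeled simply as any opcode other than ADRP or ADD, and register matching between the ADRP and ADD instructions is not modeled. The function name `detect_adrp_add` is illustrative.

```python
from enum import Enum, auto
from typing import Iterable, List, Tuple


class State(Enum):
    IDLE = auto()
    DETECTED_ADRP = auto()


def detect_adrp_add(stream: Iterable[Tuple[int, str]]) -> List[int]:
    """Return the address of the last (ADD) instruction of each detected sequence."""
    state = State.IDLE
    accepted: List[int] = []
    for address, opcode in stream:
        if state is State.IDLE:
            if opcode == "ADRP":
                state = State.DETECTED_ADRP  # assumed entry transition
        elif opcode == "ADRP":
            pass  # transition ADRP: remain in DETECTED_ADRP
        elif opcode == "ADD":
            accepted.append(address)  # transition ACCEPT: sequence detected
            state = State.IDLE
        else:
            state = State.IDLE  # transition RESET: any other instruction
    return accepted
```

Recording the address of the ADD instruction on each ACCEPT mirrors the use of the last instruction address as the address of the constant-generating instruction sequence.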
Referring now to
After the constant-generating instruction sequence 302 has executed, the constant value (not shown) generated by the constant-generating instruction sequence 302 is forwarded to the dependent instruction 310 using conventional mechanisms, as indicated by arrow 330. The instruction processing circuit 102 then generates the entry 208(X) in the computed constants table 104 based on the constant value, as indicated by arrow 332. The address 308 of the constant-generating instruction sequence 302 is stored in the program counter (PC) field 210 of the entry 208(X), while the constant value is stored in the value field 212 of the entry 208(X).
In
In response, the instruction processing circuit 102 assigns the constant value 334 provided by the entry 208(X) to the entry 214(0) in the constant cache 132 corresponding to register R0, as indicated by arrow 340. The constant value 334 is then provided to the dependent instruction 310 via the constant cache 132, as indicated by arrow 342. The dependent instruction 310 is thus able to receive the constant value 334 while incurring a zero-cycle latency.
The instruction processing circuit 102 next determines whether the address 308 of the constant-generating instruction sequence 302 is present in an entry 208(X) of the computed constants table 104 (block 402). If so, the instruction processing circuit 102 provides a constant value 334 stored in the entry 208(X) for execution of at least one dependent instruction 310 on the constant-generating instruction sequence 302 (block 404). In some aspects, operations of block 404 for providing the constant value 334 may include storing the constant value 334 in the constant cache 132 (block 406). In this manner, the at least one dependent instruction 310 may receive the constant value 334 while incurring a zero-cycle latency. The instruction processing circuit 102 then continues processing the instruction stream 300 (block 408).
If, at decision block 402, the instruction processing circuit 102 determines that the address 308 of the constant-generating instruction sequence 302 is not present in an entry 208(X) of the computed constants table 104, the instruction processing circuit 102 generates the entry 208(X) in the computed constants table 104 (block 410). In some aspects, the entry 208(X) is generated upon execution of the constant-generating instruction sequence 302, and stores the address 308 of the constant-generating instruction sequence 302 and the constant value 334 generated by the constant-generating instruction sequence 302. The instruction processing circuit 102 then continues processing the instruction stream 300 (block 408).
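The decision at block 402 and the two paths that follow it can be sketched as a single function. This is an illustrative sketch only: the function name `process_sequence`, the use of plain dictionaries for the computed constants table and the constant cache, the register name strings, and the `execute` callback standing in for conventional execution are all assumptions made for the example.

```python
from typing import Callable, Dict


def process_sequence(
    address: int,
    dest_reg: str,
    execute: Callable[[], int],
    table: Dict[int, int],
    constant_cache: Dict[str, int],
) -> str:
    """Handle one detected constant-generating instruction sequence."""
    # Block 402: is the address of the sequence present in the table?
    if address in table:
        # Blocks 404 and 406: provide the stored constant to dependent
        # instructions by writing it to the constant cache entry for the
        # destination register.
        constant_cache[dest_reg] = table[address]
        return "hit"
    # Block 410: on a miss, the sequence executes conventionally and an
    # entry holding its address and generated constant is created.
    table[address] = execute()
    return "miss"
```

On the miss path this sketch leaves the constant cache untouched, reflecting the description in which the constant value reaches the dependent instruction through conventional mechanisms the first time the sequence is encountered.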
As seen in
Accelerating constant value generation using a computed constants table according to aspects disclosed herein may be provided in or integrated into any processor-based device. Examples, without limitation, include a set top box, an entertainment unit, a navigation device, a communications device, a fixed location data unit, a mobile location data unit, a mobile phone, a cellular phone, a computer, a portable computer, a desktop computer, a personal digital assistant (PDA), a monitor, a computer monitor, a television, a tuner, a radio, a satellite radio, a music player, a digital music player, a portable music player, a digital video player, a video player, a digital video disc (DVD) player, and a portable digital video player.
In this regard,
Other master and slave devices can be connected to the system bus 608. As illustrated in
The CPU(s) 602 may also be configured to access the display controller(s) 620 over the system bus 608 to control information sent to one or more displays 626. The display controller(s) 620 send information to the display(s) 626 to be displayed via one or more video processors 628, which process the information to be displayed into a format suitable for the display(s) 626. The display(s) 626 can include any type of display, including but not limited to a cathode ray tube (CRT), a liquid crystal display (LCD), a plasma display, etc.
Those of skill in the art will further appreciate that the various illustrative logical blocks, modules, circuits, and algorithms described in connection with the aspects disclosed herein may be implemented as electronic hardware, instructions stored in memory or in another computer-readable medium and executed by a processor or other processing device, or combinations of both. The master and slave devices described herein may be employed in any circuit, hardware component, integrated circuit (IC), or IC chip, as examples. Memory disclosed herein may be any type and size of memory and may be configured to store any type of information desired. To clearly illustrate this interchangeability, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. How such functionality is implemented depends upon the particular application, design choices, and/or design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
The various illustrative logical blocks, modules, and circuits described in connection with the aspects disclosed herein may be implemented or performed with a processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
The aspects disclosed herein may be embodied in hardware and in instructions that are stored in hardware, and may reside, for example, in Random Access Memory (RAM), flash memory, Read Only Memory (ROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), registers, a hard disk, a removable disk, a CD-ROM, or any other form of computer readable medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a remote station. In the alternative, the processor and the storage medium may reside as discrete components in a remote station, base station, or server.
It is also noted that the operational steps described in any of the exemplary aspects herein are described to provide examples and discussion. The operations described may be performed in numerous different sequences other than the illustrated sequences. Furthermore, operations described in a single operational step may actually be performed in a number of different steps. Additionally, one or more operational steps discussed in the exemplary aspects may be combined. It is to be understood that the operational steps illustrated in the flow chart diagrams may be subject to numerous different modifications as will be readily apparent to one of skill in the art. Those of skill in the art will also understand that information and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.
The previous description of the disclosure is provided to enable any person skilled in the art to make or use the disclosure. Various modifications to the disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the spirit or scope of the disclosure. Thus, the disclosure is not intended to be limited to the examples and designs described herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.