The present invention relates generally to techniques for extending operand constants in a processing system and, more specifically, to advantageous techniques for encoding and decoding extension information in an instruction stream to extend operand constants in a processor.
Many portable products, such as cell phones, laptop computers, personal digital assistants (PDAs) or the like, incorporate one or more processors executing programs that support communication and multimedia applications. The processors need to operate with high performance and efficiency to support the plurality of computationally intensive functions for such products.
The processors operate by fetching and executing instructions that generally have a format of 32-bits or less. Programs often require the use of large constants, such as 32-bit or larger constants for use in generating addresses or for mathematical functions. However, since instruction formats are 32-bits or less, a single instruction cannot specify a 32-bit constant and the operation on the constant in a single instruction format. Consequently, two or more function instructions are generally used, or specialized constant storage space is implemented in hardware and allocated in the addressing space of the processor. For example, a 32-bit constant could be formed by the use of two move immediate instructions. A first move immediate instruction encoded with a first 16-bit constant specifies the first 16-bit constant to be loaded to a low half-word 16-bit portion of a 32-bit target register. A second move immediate instruction encoded with a second 16-bit constant specifies the second 16-bit constant to be loaded to a high half-word 16-bit portion of the 32-bit target register. After fetching and executing the two move immediate instructions, a 32-bit constant would be available for access from the 32-bit target register. In this approach, two instructions and their associated processor cycles are required to create a 32-bit constant which is stored in one of the limited available registers from a register file as the target register. In an alternative implementation, a 32-bit constant may be loaded from memory through the data cache, for example. Additionally, either of these conventional approaches generates a 32-bit constant and a third instruction is then required to do a specified operation using the large constant. Thus, either of these conventional approaches tends to be costly to implement, impacts performance, increases code density, and tends to increase power usage.
Among its several aspects, the present invention recognizes a need for improved implementations supporting constants that are greater in size than can be stored within an instruction format, have a low implementation cost and reduce power usage. To such ends, an embodiment of the invention applies a method for extending a constant. A plurality of instructions having extension information and a target instruction are fetched. A first set of bits from the extension information and a second set of bits within the target instruction are identified. The first set of bits are combined with the second set of bits to generate an extended constant for use as a source operand for execution of the target instruction.
Another embodiment of the invention addresses an apparatus for extending a constant. A decoder circuit is configured to receive a constant extender and a target instruction. An execution circuit is coupled to the decoder circuit and configured to execute the target instruction with an extended constant as a source operand, wherein the extended constant is created by combining a first set of bits from the target instruction with extension bits from the constant extender.
Another embodiment of the invention addresses an apparatus for extending a constant. An instruction decoder circuit is configured to receive a constant extender and a target instruction and to combine an immediate field of bits from the target instruction with extension bits from the constant extender to form an extended constant. A dispatch circuit is configured to dispatch the target instruction and the extended constant on identified dispatch paths. A function execution unit is configured to receive the dispatched target instruction and extended constant from the identified dispatch paths and to execute the target instruction with the extended constant identified as a source operand.
Another embodiment of the invention addresses an apparatus for extending a constant. A decoder and dispatch circuit is configured to receive a constant extender and a target instruction and to dispatch the constant extender and the target instruction on identified dispatch paths. A decode and read operand circuit is configured to receive the dispatched constant extender and target instruction from the dispatch paths and to combine a first set of bits from the dispatched target instruction with extension bits from the dispatched constant extender to form an extended constant. An execution circuit is configured to execute the dispatched target instruction with the extended constant identified as a source operand.
Another embodiment of the invention addresses a method for receiving a constant extender instruction comprising a first set of bits and a target instruction comprising a second set of bits. The first set of bits are combined with the second set of bits to generate an extended constant for use during execution of the target instruction. The extended constant is loaded to a register specified by the target instruction.
A further embodiment of the invention addresses an apparatus for extending a constant. A decoder circuit is configured to receive a constant extender and a memory access instruction. An execution circuit is coupled to the decoder circuit and configured to execute the memory access instruction with an extended constant as a memory address and to load the extended constant to a register specified by the memory access instruction, wherein the extended constant is created by combining a first set of bits from the target instruction with extension bits from the constant extender.
A more complete understanding of the present invention, as well as further features and advantages of the invention, will be apparent from the following Detailed Description and the accompanying drawings.
The present invention will now be described more fully with reference to the accompanying drawings, in which several embodiments of the invention are shown. This invention may, however, be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
Computer program code or “program code” for being operated upon or for carrying out operations according to the teachings of the invention may be initially written in a high level programming language such as C, C++, JAVA®, Smalltalk, JavaScript®, Visual Basic®, TSQL, Perl, or in various other programming languages. A program written in one of these languages is compiled to a target processor architecture by converting the high level program code into a native assembler program. Programs for the target processor architecture may also be written directly in the native assembler language. A native assembler program uses instruction mnemonic representations of machine level binary instructions specified in a native instruction format, such as a 32-bit native instruction format. Program code or computer readable medium as used herein refers to machine language code such as object code whose format is understandable by a processor.
In
The parse bit fields 206, 216, 224, 232, 238, and 254 of
The 26-bit immediate bit field 310 is statically determined prior to loading a program. A 32-bit constant may be statically determined by an analysis of a program and then split into a 26-bit segment and a 6-bit segment for use with the ALU instruction 203, for example. The 26-bit segment is specified in the 26-bit immediate bit field 310 of the constant extender native instruction format 302 and the 6-bit segment is specified in the ALU instruction 203.
The processor pipeline 506 includes, for example, an instruction fetch stage 512, an early decode and dispatch stage 514 having a decode circuit and a dispatch circuit, a memory access unit 516, function execution units 5201, . . . , 520N and a write back stage 524. The memory access unit 516 is used to execute load and store instructions and has a decode stage 517, a read register (Reg) stage 518, and an execute stage 519. The function execution units 5201, . . . , 520N each have decode stages 5211, . . . , 521N, read register stages 5221, . . . , 522N, and execute stages 5231, . . . , 523N, respectively. A write back stage 524 writes results to the register file.
Beginning with the first stage of the processor pipeline 506, the instruction fetch stage 512 associated with a program counter (PC) 509, fetches a packet of, for example, four instructions from the L1 Icache 530 for processing by later stages. If an instruction fetch operation misses in the L1 Icache 530, meaning that an instruction to be fetched is not in the L1 Icache 530, the instruction is fetched from the memory system 534 which may include multiple levels of cache, such as a level 2 (L2) cache, and main memory. The instruction fetch stage 512 may also be configured to identify a constant extender in one cache line and a target instruction in a second cache line and combine the two into an instruction packet for decoding by the early decode and dispatch stage 514. Instructions may be loaded to the memory system 534 from other sources, such as a boot read only memory (ROM), a hard drive, an optical disk, or from an external interface, such as a network. Instructions may be fetched in packets of one or more instructions. A constant extender instruction fetched at a first address may be associated with a target instruction specified at the next higher address, for example. The parse field indication in each 32-bit instruction specifies the length of the packet of instructions.
The early decode and dispatch stage 514 receives the packet of up to four instructions from the instruction fetch stage 512. The instructions in the packet are then classified in the early decode and dispatch unit 514 to identify which execution unit or units the instructions should be dispatched to. Fetched instructions in a very long instruction word (VLIW) packet are to be executed in parallel. For example, a branch instruction paired with a constant extender instruction and fetched in a packet could be evaluated and executed together. One type of branch instruction causes a next program counter (pc) value to be generated that is the current pc value plus an immediate offset value located in the branch instruction. The constant extender instruction may be used to extend the offset value. The early decode and dispatch stage uses the instruction group indication to determine which pipeline (516, 5201, . . . , 520N) will execute each instruction. All instructions specifying operations in the packet may be issued simultaneously to the appropriate execution units for execution. In a scalar machine, a constant extender instruction could be held pending the arrival of the target instruction, at which point both the constant extender and target instructions could be issued in parallel to the specified execution unit, for example.
The early decode operation may be implemented in a parallel process, for example, operating on the fetched plurality of instructions together at a time. For example, with an instruction packet containing four instructions, the first two instructions may be a first constant extender instruction and a move immediate instruction and the next two instructions may be a second constant extender instruction and an arithmetic logic unit (ALU) instruction. In this example, the first constant extender instruction, such as the constant extender instruction 300, is directly associated with the move immediate instruction 202 which is identified as the target instruction. For the move immediate instruction 202, the parse bit field 206 and Igroup bit field 208 are used by the early decode and dispatch stage 514 to identify the destination of the instruction is the function execution unit 5201. In a first embodiment, the move immediate instruction 202 is dispatched over instruction bus 5271 and the constant extender instruction 300 is dispatched over extender bus 5281 to the function execution unit 5201. In a second embodiment, a 32-bit constant 400 is formed in the early decode and dispatch stage 514 and the target instruction is dispatched over instruction bus 5271 and the 32-bit constant is dispatched over extender bus 5281 to the function execution unit 5201.
Similarly, the second constant extender instruction is directly associated with the ALU instruction 203 which is identified as the target instruction. For example, the parse bit field 216 and Igroup bit field 218 are used by the early decode and dispatch stage 514 to identify the destination of the second instruction as the ALU execution unit 5202. In the first embodiment, the ALU instruction 203 is dispatched over instruction bus 5272 and the third instruction encoded using the constant extender native instruction format 302 is dispatched over extender bus 5282 to the function unit 5202. In the second embodiment, the ALU instruction 203 is dispatched over the instruction bus 5272 and a 32-bit constant formed in the early decode and dispatch unit 514 is dispatched over the extender bus 5282 to the function unit 5202. It is appreciated that the four instructions in the packet are decoded and dispatched to the function execution unit 5201 and the function unit 5202 in parallel. Since architecturally a packet is not limited to four instructions, the early decode and dispatch stage 514 may be extended to operate on more than four instructions in parallel depending on an implementation and an application's requirements.
When the function execution unit 5201 receives the dispatched information, the first instruction is decoded in decode stage 5211 to determine the specifics of the move immediate operation and that a 32-bit constant is to be used in the specified operation. In the first embodiment where the move immediate instruction 202 and the constant extender instruction 300 are both dispatched to the function execution unit 5201, the read register stage 5221 fetches any data operands required for the specified load operation from the RF 510. The read register stage 5221 also creates the 32-bit constant for the specified move operation as described above with regards to
When the function unit 5202 receives the third and fourth instructions, the third instruction is decoded in decode stage 5212 to determine the specifics of the ALU function and that a 32-bit constant is to be used in the specified operation. In the first embodiment where the ALU instruction 203 and the constant extender instruction 300 are both dispatched to the function execution unit 5201, the read register stage 5222 fetches any data operands required for the specified ALU operation from the RF 510. The read register stage 5222 also creates the 32-bit constant for the specified ALU operation as described above with regards to
In another example, a hierarchical VLIW packet containing a constant extender instruction 300 and a target load instruction, having an instruction format such as the memory access instruction 204 of
When the memory access unit 516 receives the dispatched information, the first instruction is decoded in decode stage 517 to determine the specifics of the load operation and that a 32-bit constant is to be used as an address in the specified operation. In the first embodiment where the memory access instruction 204 and the constant extender instruction 300 are both dispatched to the function execution unit 516, the read register stage 518 may create the 32-bit address for the specified load operation as described above with regards to
Embodiments of the present invention may be used to improve processor performance and reduce power. For example, in an implementation without the invention, the following sequence of instructions is generally followed to load a first and second element of an array of data elements:
In another example, a hierarchical VLIW packet of two instructions may be received in the processor pipeline 506. The hierarchical VLIW packet contains a constant extender instruction and a duplex instruction, such as duplex instruction 235 of
In a further example, a hierarchical VLIW packet of three instructions may be received in the processor pipeline 506. The hierarchical VLIW packet contains a first constant extender instruction, a second constant extender instruction, and a duplex instruction, such as duplex instruction 250 of
The processor complex 500 may be configured to execute instructions under control of a program stored on a computer readable storage medium. For example, a computer readable storage medium may be either directly associated locally with the processor complex 500, such as may be available from the L1 Icache 530, for operation on data obtained from the L1 Dcache 532, and the memory system 534 or through, for example, an input/output interface (not shown).
At block 604, a plurality of instructions is received from a fetched packet, such as a four instruction packet fetched from the L1 Icache 530. At decision block 606, a determination is made whether any instruction of the packet is a constant extender instruction. Such a determination may be made in the early decode and dispatch stage 514. If the determination is negative, the process 600 proceeds to block 608 for processing the four instruction packet in the processor pipeline. If the determination is positive, the process 600 proceeds to block 610. At block 610, the constant extender, a target instruction, and a destination execution unit are identified, for example, in the early decode and dispatch stage 514. By convention, for example, a target instruction may be positioned adjacent to its associated constant extender instruction, either at a lower address than the constant extender instruction or at a higher address than the constant extender instruction. It is also appreciated, for example, that identification means may be provided to locate both a constant extender instruction and a target instruction which may not be adjacent within a fetched plurality of instructions. Also, a target instruction may be a sub-instruction of a duplex instruction, such as the duplex instruction 235 with sub-instruction 242 as a single target instruction. With two constant extender instructions in a fetched packet, the target instructions may be located in an adjacent duplex instruction, such as the duplex instruction 250 with sub-instructions 256 and 260, each a target instruction of one of the constant extender instructions.
At block 612, a first payload, such as a 26-bit immediate field, is extracted from the constant extender instruction, for example, in the early decode and dispatch stage 514. If two constant extender instructions are present, another 26-bit immediate field would be extracted from the second constant extender instruction. At block 614, a second payload, such as the 6-bit field 222, of the target instruction is combined with the first payload of the constant extender instruction to create an extended constant, such as a 32-bit constant. Similarly, if two constant extender instructions are present, another 32-bit constant would be created. Such a combining operation may be made in the early decode and dispatch stage 514. At block 616, the extended constant and the target instruction are dispatched to the identified execution unit on associated identified dispatch paths. If a second 32-bit constant was created, the second 32-bit constant and its associated target instruction would also be dispatched to the appropriate execution unit. At block 618, the target instruction is executed using the extended constant. With two extended constants and two target instructions, two execution units may each receive one of the extended constants and target instructions for parallel execution. Alternatively, a single execution unit may receive both of the extended constants and target instructions and may execute the two target instructions in parallel or sequentially, depending upon available resources for receiving and executing both extended constants and target instructions. For some types of a target instruction, such as a load instruction, the 32-bit constant is interpreted as an address and, for the processing complex 500, there is one memory access unit 516 which executes the load instruction using the 32-bit extended address. The process 600 then returns to block 604.
At block 644, a plurality of instructions is received from a fetched packet, such as a four instruction packet fetched from the L1 Icache 530. At decision block 646, a determination is made whether any instruction of the packet is a constant extender instruction. Such a determination may be made in the early decode and dispatch stage 514. If the determination is negative, the process 640 proceeds to block 648 for processing the four instruction packet in the processor pipeline. If the determination is positive, the process 640 proceeds to block 650. At block 650, the constant extender instruction, an associated target instruction, and a destination execution unit are identified. If two constant extender instructions and two target instructions are present, both are identified at block 650. At block 652, the constant extender and target instructions are dispatched to the identified execution unit, such as function unit 5201 on associated identified dispatch paths. With two extension operations to be processed, two execution units may each receive one of the constant extender instructions and one of the target instructions. Alternatively, a single execution unit may receive both. At block 654, a first payload, such as the 26-bit immediate field 310, is extracted from the constant extender instruction. At block 656, a second payload, such as the 6-bit immediate field 222, of the target instruction is combined with the first payload of the constant extender instruction to create an extended constant, such as a 32-bit constant. With two extension operations, a second 32-bit constant may be formed in a similar method to that used in blocks 654 and 656. Such a combining operation may be made, for example in the read register stage 5221. At block 658, the target instruction is executed using the 32-bit constant, for example in the execution stage 5231. With two target instructions and extended constants, both may be executed in parallel or sequentially, depending upon available resources for receiving and executing both extended constants and target instructions. The process 640 then returns to block 644.
At block 674, a constant extender instruction and an associated memory access instruction are received in the memory access unit 516. At block 676, a first payload, such as the 26-bit immediate field 310, is extracted from the constant extender instruction. At block 678, a second payload, such as the 6-bit immediate field 229, of the memory access instruction is combined with the first payload of the constant extender instruction to create an extended address, such as a 32-bit address. Such a combining operation may be made, for example, in the decode stage 517 or in the read register stage 518. At block 680, the memory access instruction is executed using the 32-bit address as the memory address to load a data element from memory to register Rx specified in the 5 b Rx field 227 of the memory access instruction. At block 682, the 32-bit address is written to the Ry register as specified by the 5-bit target Ry field 228. The process 670 then returns to block 674.
The methods described in connection with the embodiments disclosed herein may be embodied in a combination of hardware and in a software module storing non-transitory signals executed by a processor. The software module may reside in random access memory (RAM), flash memory, read only memory (ROM), electrically programmable read only memory (EPROM), hard disk, a removable disk, tape, compact disk read only memory (CD-ROM), or any other form of storage medium known in the art. A storage medium may be coupled to the processor such that the processor can read information from, and in some cases write information to, the storage medium. The storage medium coupling to the processor may be a direct coupling integral to a circuit implementation or may utilize one or more interfaces, supporting direct accesses or data streaming using down loading techniques.
While the invention is disclosed in the context of illustrated embodiments for use in processor systems it will be recognized that a wide variety of implementations may be employed by persons of ordinary skill in the art consistent with the above discussion and the claims which follow below. For example, constants larger than 32-bits may be created by using two constant extender instructions. For example, a 58-bit constant may be created by combining two 26-bit immediate fields from each constant extender instruction with a constant field in a target instruction. With three or more constant extender instructions, larger constants may be created, for example 84-bit or larger extended constants may be created.