This invention relates to single instruction multiple data (“SIMD”) operations on packed data in a processor, particularly instructions causing a processor to determine an absolute value or perform a conditional move of operands or where the result may be saturated.
Single instruction, multiple data (“SIMD”) style processing has been used to accelerate multimedia processing, including image processing and data compression. Instruction sets for processors often include SIMD instructions where multiple data elements are packed in a single wide register, with the individual data elements operated on in parallel. Using this approach, multiple operations can be performed with one instruction, thus improving performance. One example is INTEL's MMX (multimedia extension) instruction set.
It would be advantageous to provide new SIMD instructions and supporting circuitry to further enhance multimedia processing, for instance, image segmentation or clipping.
SIMD instructions, including parallel absolute value and parallel conditional move, for parallel processing of packed data are provided as well as a circuit for saturating the result of an operation. Other operations in the instruction set include parallel add, parallel subtract, parallel compare, parallel maximum, and parallel minimum. The operations indicated by the instructions are carried out in the arithmetic logic unit (“ALU”) of a processor.
An instruction indicates, among other things, the operation and the data, in the form of a data word containing data elements, on which the operation is performed. Each data word contains several elements; the number of elements is determined by the mode of operation indicated by the instruction. For instance, when an 8-bit mode is specified, a 32-bit data word contains 4 8-bit data elements, or operands, while in 16-bit mode, the same 32-bit data word contains 2 16-bit operands.
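The packing of elements into a data word described above may be sketched as follows. This is an illustrative model only; the function names `unpack_elements` and `pack_elements` are hypothetical and do not appear in the described embodiment, and the least-significant-first ordering is an assumption made for the sketch.

```python
def unpack_elements(word, elem_bits):
    """Split a 32-bit data word into elements, least significant element first."""
    assert elem_bits in (8, 16)
    mask = (1 << elem_bits) - 1
    return [(word >> shift) & mask for shift in range(0, 32, elem_bits)]


def pack_elements(elements, elem_bits):
    """Reassemble elements (least significant first) into one data word."""
    mask = (1 << elem_bits) - 1
    word = 0
    for index, value in enumerate(elements):
        word |= (value & mask) << (index * elem_bits)
    return word
```

In 8-bit mode the word 0x11223344 yields four operands, and in 16-bit mode the same word yields two.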
A parallel status flags (“PSF”) register stores the parallel status flags (PSFs), which monitor the status of data elements in a data word. PSFs indicate whether the result of an integer operation is zero, the sign of the result of an integer operation, whether there was a carry out from the ALU operation, and whether there was a 2's complement integer overflow result. The PSF register is updated whenever a SIMD instruction that updates PSF flags is performed.
A parallel conditional test (“PTEST”) register contains a code which maps to a test condition. During parallel conditional move (“PCMOV”) instructions, status flags in the PSF register are compared to the test condition in the PTEST register and, if the flags and condition match, the suboperand corresponding to the flags in the PSF register is moved to a specified register.
During parallel absolute value (“PABS”) instructions, the processor determines the absolute value of at least two operands and places the absolute value of the operands in specified registers. The absolute value is determined by using one of the following approaches based on the sign bit of each of the operands: 1) where the sign bit of an operand is 1 and at least one of the other bits is 1, the absolute value of the operand is the 2's complement of the operand; 2) where the sign bit of the operand is 1 and each of the other bits is 0, the absolute value of the operand is the 1's complement of the operand; and 3) where the sign bit of the operand is 0, the absolute value of the operand is the value of the operand.
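The three cases above may be modeled for a single operand as in the following sketch. The function name `pabs_element` is hypothetical; only the case analysis is taken from the description.

```python
def pabs_element(value, bits):
    """Absolute value of one operand using the three sign-bit cases."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    value &= mask
    if (value & sign) == 0:
        return value              # case 3: sign bit 0, value is unchanged
    if value & (sign - 1):
        return (-value) & mask    # case 1: 2's complement of the operand
    return (~value) & mask        # case 2: 1's complement of the operand
```

Case 2 covers the most negative value, whose magnitude does not fit in the operand width: in 8-bit mode 0x80 becomes 0x7F.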
A method and circuit for handling saturation of a result of an operation are also provided. When two m-bit operands are combined in an addition, average, or subtraction operation, the m most significant bits of the (m+1)-bit sum are output if an average instruction is executed; otherwise, the m least significant bits are output, and the result is saturated if there is overflow and saturation is enabled.
In one embodiment, the DSE is controlled by a processor status word (“PSW”) register. In
With respect to
A parallel status flags (“PSF”) register is part of the DSE. PSFs are used to monitor the status of data elements in data words. The flags are as follows: Zero (“Z”) indicates if the result of an integer operation is zero; Sign (“S”) indicates the sign of the result of an integer operation; Carry (“CY”) indicates there was a carry out from the ALU operation; and Overflow (“OV”) indicates a 2's complement integer overflow result. The register has the following format:
The PSF register is updated whenever a SIMD instruction that updates PSF flags is performed. In 8-bit mode, computations on byte 0 (the least significant byte) affect PSF0, computations on byte 1 affect PSF1, etc. In 16-bit mode, computations on the lower half-word affect PSF1 while computations on the upper half-word affect PSF3; PSF0 and PSF2 are undefined. Other embodiments of the invention may feature different approaches to handling PSFs.
The DSE also features a parallel condition test (“PTEST”) register. The PTEST register is used when a parallel conditional move (“PCMOV”) instruction is executed. As discussed in greater detail below, a PCMOV operation compares status flags in the PSF register against the test condition specified in the PTEST register; if the flags and the condition match, the suboperand is moved to a specified register. The PTEST register has the following format:
Each 4-bit condition code in the PTEST register maps to a test condition as follows:
Other embodiments of the invention may feature different approaches to handling condition codes and the PTEST register.
SIMD instructions may be executed when the DSE is in SIMD mode (in other words, the SIMD bit discussed above is set to “1”). These instructions take 1 cycle to execute. SIMD instructions which may be executed by the processor described above include the following: a parallel absolute value (“PABS”) instruction, which determines the absolute value of an operand and places that value in a specified register; parallel add/subtract (“PADD/PSUB”) instructions that add or subtract operands and place the results in specified registers; a parallel average (“PAVG”) instruction that averages two values and places the result in a specified register; parallel max/min (“PMAX/PMIN”) instructions that compare two values and write the greater or lesser value into a specified register; a parallel integer compare (“PCMP”) instruction that compares two operands and modifies condition code flags in the parallel status flag register; and a parallel conditional move (“PCMOV”) instruction that compares status flags in the PSF register with the condition code in the PTEST register and, if the flags and code match, moves the operand to a specified register. The instructions and their actions may be summarized as follows:
As noted above, when the HSIMD bit in the PSW is set to “1,” 16-bit, or half-word, operations are used; otherwise, 8-bit, or byte, operations are employed. (The remainder of this discussion will address the use of 32-bit data words and 16- or 8-bit operations. This limitation is for explanatory purposes only. Other embodiments may use 64- or 128-bit data words and 32- or 64-bit operations, etc.) When the USIMD bit is set to “1,” PMIN and PMAX use unsigned operands. When the NSAT bit is set to “1,” the result should not be saturated. The following table shows which instructions are affected when certain PSW bits are set:
Sample opcodes for the instruction and updated settings in the PSF register following execution of each instruction are shown below:
The OV flag is set to zero after execution of a PAVG instruction because there is never overflow when this instruction is executed. The S flag is cleared to 0 after execution of a PABS instruction. Execution of a PCMOV instruction does not affect PSFs. Other embodiments may, of course, use different opcodes to identify each instruction.
The PAVG instruction may be executed in 8- or 16-bit mode and may operate on signed or unsigned data. The USIMD PSW bit determines whether sign-extension is done before adding the operands. If the USIMD bit is set, the operands are zero-padded by one bit. If USIMD is not set, the operands are sign-extended by one bit. In 16-bit mode, the PAVG operation is as follows:
rb[31:16]=({(USIMD?0:rb[31]), rb[31:16]}+{(USIMD?0:ra[31]), ra[31:16]})[16:1]
(Here, if the USIMD bit is set, the operand is zero-padded by one bit; otherwise the operand is sign-extended (i.e., bit 31 is repeated).)
rb[15:0]=({(USIMD?0:rb[15]), rb[15:0]}+{(USIMD?0:ra[15]), ra[15:0]})[16:1]
PSFs following execution of a PAVG instruction in 16-bit mode are as follows:
In 8-bit mode, the PAVG operation is as follows:
rb[31:24]=({(USIMD?0:rb[31]), rb[31:24]}+{(USIMD?0:ra[31]), ra[31:24]})[8:1]
rb[23:16]=({(USIMD?0:rb[23]), rb[23:16]}+{(USIMD?0:ra[23]), ra[23:16]})[8:1]
rb[15:8]=({(USIMD?0:rb[15]), rb[15:8]}+{(USIMD?0:ra[15]), ra[15:8]})[8:1]
rb[7:0]=({(USIMD?0:rb[7]), rb[7:0]}+{(USIMD?0:ra[7]), ra[7:0]})[8:1]
Following execution of the PAVG operation in 8-bit mode, PSFs are as follows:
“rb” in the tables above refers to the final result of the instruction, not the input operand. The PAVG instruction always rounds down, not towards 0; negative numbers are rounded down towards negative infinity. Execution of the PAVG instruction provides the 8 or 16 most significant bits (“msbs”) of the 9- or 17-bit result of a PADD or PSUB operation. Each 8- or 16-bit operation updates the corresponding status flags in the PSF register.
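The PAVG expressions above may be illustrated for one operand pair by the following sketch. The function name `pavg_element` and its parameters are hypothetical; the extension by one bit and the selection of bits [m:1] of the sum follow the expressions given for 16-bit and 8-bit mode.

```python
def pavg_element(rb, ra, bits, usimd):
    """Average of two operands: extend by one bit, add, keep bits [bits:1]."""
    mask = (1 << bits) - 1

    def extend(x):
        # Zero-pad when USIMD is set; otherwise repeat the sign bit once.
        x &= mask
        if usimd:
            return x
        return x | (((x >> (bits - 1)) & 1) << bits)

    total = (extend(rb) + extend(ra)) & ((1 << (bits + 1)) - 1)
    return (total >> 1) & mask
```

With signed operands, averaging -3 (0xFD) and 0 gives -2 (0xFE), which shows the rounding towards negative infinity noted above.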
PADD instructions may be executed in either 16- or 8-bit mode on signed and unsigned numbers and will provide saturation if the NSAT bit is clear. (When the USIMD bit is “1,” the instructions treat the operands as unsigned operands. When the USIMD bit is “0,” the instructions treat the operands as signed operands.) In 16-bit mode, a PADD instruction operates as follows:
rb[31:16]=SATURATE(rb[31:16]+ra[31:16]) (rb and ra are the register addresses)
rb[15:0]=SATURATE(rb[15:0]+ra[15:0])
PSFs following execution of a PADD instruction in 16-bit mode are as follows:
The PADD instruction operates in 8-bit mode as follows:
rb[31:24]=SATURATE(rb[31:24]+ra[31:24])
rb[23:16]=SATURATE(rb[23:16]+ra[23:16])
rb[15:8]=SATURATE(rb[15:8]+ra[15:8])
rb[7:0]=SATURATE(rb[7:0]+ra[7:0])
PSFs following an 8-bit operation are as follows:
The “rb” in the above tables refers to the final result of the instruction, not the input operand. Each 8- or 16-bit operation updates the corresponding status flags in the PSF register.
PSUB instructions may also be executed in 8-bit or 16-bit mode on signed and unsigned numbers and will provide saturation if the NSAT bit is clear. In 16-bit mode, the PSUB instruction operates as follows:
rb[31:16]=SATURATE(rb[31:16]−ra[31:16])
rb[15:0]=SATURATE(rb[15:0]−ra[15:0])
PSFs after execution of a PSUB instruction in 16-bit mode are as follows:
The PSUB instruction operates in 8-bit mode as follows:
rb[31:24]=SATURATE(rb[31:24]−ra[31:24])
rb[23:16]=SATURATE(rb[23:16]−ra[23:16])
rb[15:8]=SATURATE(rb[15:8]−ra[15:8])
rb[7:0]=SATURATE(rb[7:0]−ra[7:0])
Following execution of the instruction in 8-bit operation, PSFs are as follows:
The “rb” in the above tables refers to the final result of the instruction, not the input operand. Each 8- or 16-bit operation updates the corresponding status flags in the PSF register.
Results may be saturated in both 8- and 16-bit mode PADD and PSUB operations (in both signed and unsigned mode). No saturation occurs for PAVG operations, since the average can never overflow, and consequently OV is always 0.
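The behavior of PADD and PSUB on a full 32-bit data word may be approximated by the following sketch. It assumes that SATURATE clamps each result to the representable range of the selected mode (0x00 to 0xFF or 0x80 to 0x7F in 8-bit mode, and the corresponding limits in 16-bit mode); the helper names and the clamping formulation are assumptions made for illustration and are not taken from the saturation tables.

```python
def to_signed(x, bits):
    """Interpret a bits-wide field as a 2's complement integer."""
    return x - (1 << bits) if x & (1 << (bits - 1)) else x


def saturate(value, bits, usimd):
    """Clamp to the unsigned or signed range of a bits-wide field."""
    if usimd:
        low, high = 0, (1 << bits) - 1
    else:
        low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(low, min(high, value))


def padd_psub(rb, ra, bits, usimd, nsat, subtract):
    """Apply PADD or PSUB to every element of a 32-bit data word."""
    mask = (1 << bits) - 1
    result = 0
    for shift in range(0, 32, bits):
        b = (rb >> shift) & mask
        a = (ra >> shift) & mask
        if not usimd:
            b, a = to_signed(b, bits), to_signed(a, bits)
        value = b - a if subtract else b + a
        if not nsat:                      # NSAT set means "do not saturate"
            value = saturate(value, bits, usimd)
        result |= (value & mask) << shift
    return result
```

For example, a signed 8-bit PADD of 0x7F01FF80 and 0x01010180 saturates the top byte at 0x7F and the bottom byte at 0x80, while the same operation with NSAT set wraps around.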
In 16-bit unsigned mode, saturation for the PADD instruction occurs as follows:
In 8-bit unsigned mode, saturation for the PADD instruction occurs as follows:
In 16-bit unsigned mode, saturation for the PSUB instruction occurs as follows:
In 8-bit unsigned mode, saturation for the PSUB instruction occurs as follows:
In 16-bit signed mode, saturation occurs as follows:
In 8-bit signed mode, saturation occurs as follows:
If OV is 1, sum[7] is the inverse of cout[7]. This follows because OV=cout[6] XOR cout[7], so OV=1 means that cout[6] and cout[7] differ, and because sum[7]=cout[6] whenever OV=1. As used here, OV represents the current value that will be written into the PSF register at the end of the current cycle.
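The relationship among OV, cout[6], cout[7], and sum[7] may be checked exhaustively for 8-bit operands with a short sketch such as the following. The function names are hypothetical, and the optional carry-in is an assumption included so that a subtraction performed as an addition of the inverted operand plus one is covered as well.

```python
def add8_with_carries(a, b, carry_in=0):
    """Return (sum[7:0], cout[6], cout[7]) for an 8-bit addition."""
    cout6 = ((a & 0x7F) + (b & 0x7F) + carry_in) >> 7   # carry into bit 7
    full = (a & 0xFF) + (b & 0xFF) + carry_in
    cout7 = full >> 8                                   # carry out of bit 7
    return full & 0xFF, cout6, cout7


def overflow_identities_hold():
    """Check that OV=1 implies sum[7]==cout[6] and sum[7]!=cout[7]."""
    for carry_in in (0, 1):
        for a in range(256):
            for b in range(256):
                total, cout6, cout7 = add8_with_carries(a, b, carry_in)
                if cout6 ^ cout7:                       # OV = cout[6] XOR cout[7]
                    sum7 = total >> 7
                    if sum7 != cout6 or sum7 == cout7:
                        return False
    return True
```

Because sum[7] is wrong exactly when OV is 1, cout[7] carries the true sign of the result and can be used to select the saturation value.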
In
Bits 6 (cout[6] 78) and 7 (cout[7] 76) of the result in the 9-bit adder 74 are input to an XOR gate 80 and the result is sent to a first AND gate 86. The other input to AND gate 86 indicates whether a PAVG instruction 82 is being executed. This input 82 is inverted 84 before it is input to the first AND gate 86. If a PAVG instruction 82 is being executed, the input to the AND gate 86 is 0. If both the inputs to the first AND gate 86 from the inverter 84 and the XOR gate 80 are 1, then the PSF OV flag 110 will be set to 1, indicating an overflow result. When a PAVG instruction 82 is executed, the PSF OV flag is always set to 0.
Cout[7] 76 is also input 208, 76 to two multiplexers 212, 204 (the bit is inverted 206 before being input to one of the multiplexers 212) along with the result 210, 202 from XOR gate 80. If the USIMD bit 60 is 1, the Cout[7] value 208, 76 is output 216, 214 to a three-way multiplexer 218. The output 108 from the three-way multiplexer 218 depends on the operation performed by the circuit—PSUB 216, PADD 214, or PAVG 200 (0 is always output if PAVG is performed). This output 108 represents the current overflow of the operation (and will be discussed further below).
The output 120 (sum[8:0]) from the adder 74 is divided into sum[7:0] 90 and sum[8:1] 88 (the average of the two operands) and sent to a multiplexer 92. If a PAVG instruction 82 is being executed, the multiplexer 92 will output 114 the average, or sum[8:1] 88, to a second multiplexer 100; otherwise, sum[7:0] 90 will be output 114 to the second multiplexer 100.
The other input 198 to the second multiplexer 100 represents saturation values. Cout[7] 76 is input to a third and fourth multiplexer 94, 192. If the value of Cout[7] 76 is 0, 0x7F 98 is output 112 from the third multiplexer 94 to a fifth multiplexer 196, while 0x00 is output 194 from the fourth multiplexer 192 to the fifth multiplexer 196. If Cout[7] 76 is 1, 0x80 95 is output 112 from the third multiplexer 94 to the fifth multiplexer 196 while 0xFF is output 194 from the fourth multiplexer 192 to the fifth multiplexer 196. If the USIMD bit 160 is 0, the output from the third multiplexer 94 is output to the second multiplexer 100; if the USIMD bit 160 is 1, the output from the fourth multiplexer 192 is sent to the second multiplexer 100.
A second AND gate 102 is connected to the second multiplexer 100. The inputs to the AND gate 102 are the output 108 from the three-way multiplexer 218, which indicates whether there is overflow in the current operation, and a line 124 indicating whether the result should be saturated. If the NSAT bit 106 is set to 1, the result should not be saturated; if it is set to 0, the result should be saturated. The NSAT bit 106 is inverted 104 and input 124 to the second AND gate 102. If there is overflow 108 and if the result should be saturated 124, the output 116 (i.e., the result of the operation) from the second multiplexer 100 is the saturated value 198. Otherwise, the result 116 is either sum[7:0] 90 for a PADD or PSUB operation or sum[8:1] 88 for a PAVG operation. A circuit for handling operands of different sizes, for instance 16 bits, works on similar principles.
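The selection logic of the 8-bit circuit described above may be modeled behaviorally as in the following sketch. The function name `datapath8` and its parameters are hypothetical. The sketch assumes that subtraction is carried out in the adder as rb plus the inverted ra plus a carry-in of 1; that assumption is not stated in the description of the circuit.

```python
def datapath8(rb, ra, op, usimd, nsat):
    """Model one 8-bit lane. op is "PADD", "PSUB" or "PAVG".

    Returns (result, ov_flag), where ov_flag is the value for the PSF OV flag.
    """
    rb &= 0xFF
    ra &= 0xFF
    if op == "PSUB":
        operand, carry_in = (~ra) & 0xFF, 1   # assumed: rb - ra = rb + ~ra + 1
    else:
        operand, carry_in = ra, 0

    def extend(x):
        # One-bit zero or sign extension feeding the 9-bit adder.
        return x if usimd else x | ((x & 0x80) << 1)

    cout6 = ((rb & 0x7F) + (operand & 0x7F) + carry_in) >> 7
    cout7 = (rb + operand + carry_in) >> 8
    sum9 = (extend(rb) + extend(operand) + carry_in) & 0x1FF

    xor = cout6 ^ cout7
    is_avg = op == "PAVG"
    ov_flag = 0 if is_avg else xor            # XOR gate gated by NOT PAVG

    # Current overflow: 0 for PAVG; in unsigned mode cout[7] for PADD and
    # inverted cout[7] for PSUB; in signed mode the XOR output.
    if is_avg:
        overflow = 0
    elif usimd:
        overflow = cout7 ^ 1 if op == "PSUB" else cout7
    else:
        overflow = xor

    normal = (sum9 >> 1) & 0xFF if is_avg else sum9 & 0xFF
    if usimd:
        saturated = 0xFF if cout7 else 0x00
    else:
        saturated = 0x80 if cout7 else 0x7F
    result = saturated if (overflow and not nsat) else normal
    return result, ov_flag
```

Under these assumptions the model agrees with ordinary clamped arithmetic: a signed PADD of 0x7F and 0x01 returns 0x7F with OV set, and an unsigned PSUB of 0x01 and 0x02 returns 0x00.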
PMIN and PMAX can operate in 8-bit or 16-bit mode with signed or unsigned data depending on the USIMD bit. In 16-bit mode, PMIN and PMAX instructions are executed as follows:
PSFs are updated as follows following execution of a PMIN or PMAX instruction in 16-bit mode:
In 8-bit mode, PMIN and PMAX instructions are executed as follows:
Following execution of the PMIN or PMAX instruction in 8-bit mode, the PSFs are as follows:
In the above tables, “rb” refers to the final result of the instruction, not the input operand. Each 8- or 16-bit operation updates the corresponding status flags in the PSF register.
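The effect of the USIMD bit on PMIN and PMAX may be sketched as follows for a 32-bit data word. The function name `pmin_pmax` and its parameters are hypothetical, and the sketch models only the selected values, not the PSF updates.

```python
def pmin_pmax(rb, ra, bits, usimd, take_max):
    """Element-wise minimum or maximum of two 32-bit data words."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)

    def order_key(x):
        # Unsigned operands compare directly; signed operands are converted
        # to their 2's complement value before comparison.
        return x if usimd or not (x & sign) else x - (1 << bits)

    result = 0
    for shift in range(0, 32, bits):
        b = (rb >> shift) & mask
        a = (ra >> shift) & mask
        pick = max if take_max else min
        result |= pick(b, a, key=order_key) << shift
    return result
```

The same bit patterns can give different results in the two modes: 0x80 is the smallest signed byte but is larger than 0x7F when treated as unsigned.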
The PABS instruction may be executed in either 8- or 16-bit mode depending on the HSIMD PSW bit. The NSAT bit in the PSW does not affect the behavior of the PABS instruction. In 16-bit mode, the PABS instruction is executed as follows:
After execution of the PABS instruction in 16-bit mode, the PSFs are updated as follows:
In 8-bit mode, the PABS instruction is executed as follows:
After execution of the PABS instruction in 8-bit mode, the PSFs are updated as follows:
In the above tables, “rb” refers to the final result of the instruction, not the input operand. Each 8- or 16-bit operation updates the corresponding status flags in the PSF register.
The flags tables assume the PABS operation results in 0-ra in the adder. Therefore, overflow will only be set in one case, when the input is 0x80. This is the only instance where the true result of the PABS operation cannot be represented in the required number of bits.
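The statement that 0x80 is the only 8-bit input for which 0-ra overflows may be confirmed with the following sketch. It assumes that the adder forms 0-ra as the inverted operand plus one and that overflow is detected as cout[6] XOR cout[7]; the function name is hypothetical.

```python
def negate_overflows(ra):
    """True when computing 0 - ra in an 8-bit adder sets the overflow flag."""
    operand = (~ra) & 0xFF                   # assumed: 0 - ra = 0 + ~ra + 1
    cout6 = ((operand & 0x7F) + 1) >> 7      # carry into bit 7
    cout7 = (operand + 1) >> 8               # carry out of bit 7
    return bool(cout6 ^ cout7)


overflowing_inputs = [ra for ra in range(256) if negate_overflows(ra)]
```

Under these assumptions the list contains the single value 0x80.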
The PABS function behaves as follows as shown in
The PCMP instruction may be executed in 8- or 16-bit mode on signed or unsigned operands. In executing this instruction, a subtraction is performed without updating the destination register. Instead, the condition code flags in the PSF register are modified. In 16-bit mode, the PCMP operation is as follows:
Following execution of a PCMP instruction in 16-bit mode, PSFs are updated as follows:
In 8-bit mode, the PCMP operation is as follows:
The PSF register is updated as follows:
Each 8- or 16-bit operation updates the corresponding status flags in the PSF register.
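A possible model of the flag computation performed by PCMP for one operand pair is sketched below. The destination register is not written; only the Z, S, CY, and OV flags are produced. The sketch assumes that the subtraction rb-ra is performed as rb plus the inverted ra plus one, that CY is the raw carry out of the adder, and that S is the most significant bit of the difference. These conventions and the function name `pcmp_flags` are assumptions for illustration, since the flag tables are not reproduced here.

```python
def pcmp_flags(rb, ra, bits):
    """Flags for rb - ra on one bits-wide operand pair (no result is stored)."""
    mask = (1 << bits) - 1
    low_mask = mask >> 1
    rb &= mask
    inverted = (~ra) & mask
    carry_into_msb = ((rb & low_mask) + (inverted & low_mask) + 1) >> (bits - 1)
    total = rb + inverted + 1
    carry_out = total >> bits
    diff = total & mask
    return {
        "Z": int(diff == 0),              # result is zero
        "S": diff >> (bits - 1),          # most significant bit of the result
        "CY": carry_out,                  # carry out of the adder
        "OV": carry_into_msb ^ carry_out, # 2's complement overflow
    }
```

Comparing equal operands sets Z, and comparing 0x80 with 0x01 in 8-bit mode sets OV because -128 minus 1 is not representable.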
PCMOV instructions may be executed in either 16- or 8-bit mode. The instructions test the condition code in the PTEST register (discussed above) against the 4 sets of flags in the PSF register. If the specified condition is true, the corresponding 8 or 16 bits is moved. The PCMOV instruction operates in 16-bit mode as follows:
To illustrate execution of a PCMOV instruction, in
If 8-bit mode is specified (block 126), the PSF3, PSF2, PSF1, and PSF0 flags are tested against the condition code in the PTEST register (blocks 136, 140, 144, 148). If the specified condition is true, the operand associated with the tested PSF is moved to a destination register, i.e., ra[31:24] is moved to rb[31:24] (block 138), ra[23:16] is moved to rb[23:16] (block 142), ra[15:8] is moved to rb[15:8] (block 146), and ra[7:0] is moved to rb[7:0] (block 150). If a specified condition is not true (blocks 136, 140, 144, 148) or an operand is moved (blocks 138, 142, 146, 150), execution of the instruction is finished (block 154).
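The PCMOV flow may be sketched as follows. Because the mapping from 4-bit condition codes to test conditions is not reproduced here, the sketch takes the test as a caller-supplied function `condition`; that parameter, the dictionary representation of each PSF, and the function name `pcmov` are hypothetical. The use of PSF1 and PSF3 in 16-bit mode follows the PSF register description above.

```python
def pcmov(rb, ra, psf, condition, half_word_mode):
    """Conditionally move elements of ra into rb.

    psf is [PSF0, PSF1, PSF2, PSF3]; condition(flags) returns True when the
    flags satisfy the test condition selected by the PTEST register.
    """
    if half_word_mode:
        # (shift, mask, PSF index): lower half-word uses PSF1, upper uses PSF3.
        lanes = [(0, 0xFFFF, 1), (16, 0xFFFF, 3)]
    else:
        lanes = [(8 * i, 0xFF, i) for i in range(4)]
    result = rb & 0xFFFFFFFF
    for shift, mask, index in lanes:
        if condition(psf[index]):
            field = mask << shift
            result = (result & ~field) | (ra & field)
    return result
```

As a usage example, with a hypothetical "zero flag set" condition, only the elements whose PSF has Z equal to 1 are replaced:

```python
flags = [{"Z": 1}, {"Z": 0}, {"Z": 1}, {"Z": 0}]
moved = pcmov(0x11223344, 0xAABBCCDD, flags, lambda f: f["Z"] == 1, False)
```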
The PCMOV instruction allows decisions on multiple data streams to be made in one cycle, for example, clipping in image processing. Suppose 8×8 mode is specified and the following transformation of each of the 4 8-bit results in register (“R”) 0 is desired:
This application claims the benefit of provisional United States patent application entitled “Digital Signal Coprocessor,” application No. 60/492,060, filed on Jul. 31, 2003.
Number | Date | Country
---|---|---
60492060 | Jul 2003 | US