Claims
- 1. An apparatus for the efficient processing of complex multiplication computations, the apparatus comprising:
at least one controller sequence processor (SP); a memory for storing process control instructions; a first multiply complex numbers instruction stored in the memory and operative to control the PEs to carry out a multiplication operation involving a pair of complex numbers; and hardware for implementing the first multiply complex numbers instruction.
- 2. The apparatus of claim 1 further comprising a plurality of processing elements (PEs) interconnected with said SP and arranged in an N×N array interconnected in a manifold array interconnection network.
- 3. The apparatus of claim 1 wherein the first multiply complex numbers instruction completes execution in 2 cycles.
- 4. The apparatus of claim 1 wherein the first multiply complex numbers instruction is tightly pipelineable.
- 5. The apparatus of claim 1 wherein each complex number is stored as a word, each word comprising a first half word and a second half word, with a real component of each complex number being stored as the first half word and an imaginary component of each complex number being stored as the second half word.
- 6. The apparatus of claim 1 wherein the first multiply complex numbers instruction includes a plurality of rounding modes, the rounding modes including:
rounding to the nearest integer; rounding toward zero; rounding toward positive infinity; and rounding toward negative infinity.
- 7. The apparatus of claim 1 wherein the first multiply complex numbers instruction is one of the following group of instructions: a multiply complex numbers instruction (MPYCX) operative to carry out the multiplication of a pair of complex numbers, a multiply complex numbers instruction (MPYCXJ) operative to carry out the multiplication of a pair of complex numbers where an argument is conjugated, a multiply complex numbers instruction (MPYCXD2) operative to carry out the multiplication of a pair of complex numbers with a result divided by two, and a multiply complex numbers instruction (MPYCXJD2) operative to carry out the multiplication of a pair of complex numbers where an argument is conjugated with a result divided by two.
- 8. The apparatus of claim 1 further comprising a multiply accumulate unit including the memory for storing the first multiply complex numbers instruction.
- 9. The apparatus of claim 8 wherein the multiply accumulate unit operates in response to a multiply accumulate instruction (MPYA) to extend a multiplication operation with an accumulate operation.
- 10. The apparatus of claim 8 wherein the multiply accumulate unit operates in response to a sum two product accumulate instruction (SUM2PA) to extend two multiplication operations with an accumulate operation.
- 11. The apparatus of claim 9 wherein the multiply accumulate unit operates in response to a multiply complex with accumulate instruction (MPYCXA) to carry out the multiplication of a pair of complex numbers with accumulation of a third complex number.
- 12. The apparatus of claim 11 wherein the MPYCXA instruction completes execution in 2 cycles.
- 13. The apparatus of claim 12 wherein the MPYCXA instruction is tightly pipelineable.
- 14. The apparatus of claim 1 further comprising one or more of the following additional instructions (MPYCXA, MPYCXAD2, MPYCXJA or MPYCXJAD2) stored in the memory to carry out complex multiplication operations pipelined in 2 cycles.
- 15. A method for the computation of an FFT by a plurality of processing elements (PEs), the method comprising the steps of:
loading input data from a memory into each PE in a cyclic manner; calculating a local FFT by each PE; multiplying by the twiddle factors and calculating an FFT by the cluster of PEs; and storing the resulting FFTs in the memory.
- 16. A method for the computation of a distributed FFT by an N×N processing element (PE) array, the method comprising the steps of:
loading a complex number x and a corresponding twiddle factor w from a memory into each of the PEs; calculating a first product by the multiplication of the complex numbers x and w; transmitting the first product from each of the PEs to another PE in the N×N array; receiving the first product and treating it as a second product in each of the PEs; selectively adding or subtracting the first product and the second product to form a first result; calculating a third product in selected PEs; transmitting the first result or third product in selected PEs to another PE in the N×N array; selectively adding or subtracting the received values to form a second result; and storing the second results in the memory.
- 17. A method for efficient computation by a 2×2 processing element (PE) array interconnected in a manifold array interconnection network, the array comprising four PEs (PE0, PE1, PE2 and PE3), the method comprising the steps of:
loading a complex number x and a corresponding twiddle factor w from a memory into each of the four PEs, complex number x including subparts x0, x1, x2 and x3, twiddle factor w including subparts w0, w1, w2 and w3; multiplying the complex numbers x and w, such that
PE0 multiplies x0 and w0 to produce a product0, PE1 multiplies x1 and w1 to produce a product1, PE2 multiplies x2 and w2 to produce a product2, and PE3 multiplies x3 and w3 to produce a product3; transmitting the product0, the product1, the product2 and the product3, such that
PE0 transmits the product0 to PE2, PE1 transmits the product1 to PE3, PE2 transmits the product2 to PE0, and PE3 transmits the product3 to PE1; and performing arithmetic logic operations, such that
PE0 adds the product0 and the product2 to produce a sum t0, PE1 adds the product1 and the product3 to produce a sum t2, PE2 subtracts the product2 from the product0 to produce a sum t1, and PE3 subtracts the product3 from the product1 to produce a result which is multiplied by −i to produce a sum t3.
- 18. The method of claim 17 further comprising the steps of:
transmitting the sums t0, t1, t2 and t3, such that
PE0 transmits t0 to PE1, PE1 transmits t2 to PE0, PE2 transmits t1 to PE3, and PE3 transmits t3 to PE2; performing the arithmetic logic operations, such that
PE0 adds t0 and t2 to produce a y0, PE1 subtracts t2 from t0 to produce a y1, PE2 adds t1 and t3 to produce a y2, and PE3 subtracts t3 from t1 to produce a y3; and storing y0, y1, y2 and y3 in a memory.
- 19. A special hardware instruction for handling the multiplication with accumulate of two complex numbers from a source register, whereby, utilizing said instruction, an accumulated complex product of two source operands is rounded according to a rounding mode specified in the instruction and loaded into a target register, with the complex numbers organized in the source such that a halfword (H1) contains the real component and a halfword (H0) contains the imaginary component.
- 20. The special hardware instruction of claim 19 wherein the accumulated complex product is divided by two before it is rounded.
- 21. An apparatus to efficiently fetch instructions including complex multiplication instructions and an accumulate form of multiplication instructions from a memory element and dispatch the fetched instruction to at least one of a plurality of multiply complex and multiply with accumulate execution units to carry out the instruction-specified operation, the apparatus comprising:
a memory element; means for fetching said instructions from the memory element; a plurality of multiply complex and multiply with accumulate execution units; and means to dispatch the fetched instruction to at least one of said plurality of execution units to carry out the instruction-specified operation.
- 22. The apparatus of claim 21 further comprising:
an instruction register to hold a dispatched multiply complex instruction (MPYCX); means to decode the MPYCX instruction and control the execution of the MPYCX instruction; two source registers each holding a complex number as operand inputs to the multiply complex execution hardware; four multiplication units to generate terms of the complex multiplication; four pipeline registers to hold the multiplication results; an add function which adds two of the multiplication results from the pipeline registers for the imaginary component of the result; a subtract function which subtracts two of the multiplication results from the pipeline registers for the real component of the result; a round and select unit to format the real and imaginary results; and a result storage location for saving the final multiply complex result, whereby the apparatus is operative for the efficient processing of multiply complex computations.
- 23. The apparatus of claim 21 wherein the means for fetching said instructions is a sequence processor (SP) controller.
- 24. The apparatus of claim 22 wherein the round and select unit provides a shift right as a divide by 2 operation for a multiply complex divide by 2 instruction (MPYCXD2).
- 25. The apparatus of claim 21 further comprising:
an instruction register to hold a dispatched multiply complex instruction (MPYCXJ); means to decode the MPYCXJ instruction and control the execution of the MPYCXJ instruction; two source registers each holding a complex number as operand inputs to the multiply complex execution hardware; four multiplication units to generate terms of the complex multiplication; four pipeline registers to hold the multiplication results; an add function which adds two of the multiplication results from the pipeline registers for the real component of the result; a subtract function which subtracts two of the multiplication results from the pipeline registers for the imaginary component of the result; a round and select unit to format the real and imaginary results; and a result storage location for saving the final multiply complex conjugate result, whereby the apparatus is operative for the efficient processing of multiply complex conjugate computations.
- 26. The apparatus of claim 25 wherein the round and select unit provides a shift right as a divide by 2 operation for a multiply complex conjugate divide by 2 instruction (MPYCXJD2).
- 27. The apparatus of claim 21 further comprising:
an instruction register to hold the dispatched multiply accumulate instruction (MPYA); means to decode the MPYA instruction and control the execution of the MPYA instruction; two source registers each holding a source operand as inputs to the multiply accumulate execution hardware; at least two multiplication units to generate two products of the multiplication; at least two pipeline registers to hold the multiplication results; at least two accumulate operand inputs to the second pipeline stage accumulate hardware; at least two add functions, each of which adds a result from the pipeline registers to the third accumulate operand, creating two multiply accumulate results; a round and select unit to format the results if required by the MPYA instruction; and a result storage location for saving the final multiply accumulate result, whereby the apparatus is operative for the efficient processing of multiply accumulate computations.
- 28. The apparatus of claim 21 further comprising:
an instruction register to hold a dispatched sum two product accumulate instruction (SUM2PA); means to decode the SUM2PA instruction and control the execution of the SUM2PA instruction; at least two source registers each holding a source operand as inputs to the SUM2PA execution hardware; at least two multiplication units to generate two products of the multiplication; at least two pipeline registers to hold the multiplication results; at least one accumulate operand input to the second pipeline stage accumulate hardware; at least one add function which adds the results from the pipeline registers and the third accumulate operand, creating a SUM2PA result; a round and select unit to format the results if required by the SUM2PA instruction; and a result storage location for saving the final result, whereby the apparatus is operative for the efficient processing of sum of two products accumulate computations.
- 29. The apparatus of claim 21 further comprising:
an instruction register to hold the dispatched multiply complex accumulate instruction (MPYCXA); means to decode the MPYCXA instruction and control the execution of the MPYCXA instruction; two source registers each holding a complex number as operand inputs to the multiply complex accumulate execution hardware; four multiplication units to generate terms of the complex multiplication; four pipeline registers to hold the multiplication results; at least two accumulate operand inputs to the second pipeline stage accumulate hardware; an add function which adds two of the multiplication results from the pipeline registers and also adds one of the accumulate operand inputs for the imaginary component of the result; a subtract function which subtracts two of the multiplication results from the pipeline registers and also adds the other accumulate operand input for the real component of the result; a round and select unit to format the real and imaginary results; and a result storage location for saving the final multiply complex accumulate result, whereby the apparatus is operative for the efficient processing of multiply complex accumulate computations.
- 30. The apparatus of claim 29 wherein the round and select unit provides a shift right as a divide by 2 operation for a multiply complex accumulate divide by 2 instruction (MPYCXAD2).
- 31. The apparatus of claim 21 further comprising:
an instruction register to hold the dispatched multiply complex conjugate accumulate instruction (MPYCXJA); means to decode the MPYCXJA instruction and control the execution of the MPYCXJA instruction; two source registers each holding a complex number as operand inputs to the multiply complex accumulate execution hardware; four multiplication units to generate terms of the complex multiplication; four pipeline registers to hold the multiplication results; at least two accumulate operand inputs to the second pipeline stage accumulate hardware; an add function which adds two of the multiplication results from the pipeline registers and also adds one of the accumulate operand inputs for the real component of the result; a subtract function which subtracts two of the multiplication results from the pipeline registers and also adds the other accumulate operand input for the imaginary component of the result; a round and select unit to format the real and imaginary results; and a result storage location for saving the final multiply complex conjugate accumulate result, whereby the apparatus is operative for the efficient processing of multiply complex conjugate accumulate computations.
- 32. The apparatus of claim 31 wherein the round and select unit provides a shift right as a divide by 2 operation for a multiply complex conjugate accumulate divide by 2 instruction (MPYCXJAD2).
- 33. The apparatus of claim 21 wherein the complex multiplication instructions and accumulate form of multiplication instructions include MPYCX, MPYCXD2, MPYCXJ, MPYCXJD2, MPYCXA, MPYCXAD2, MPYCXJA, MPYCXJAD2 instructions, and all of these instructions complete execution in 2 cycles.
- 34. The apparatus of claim 21 wherein the complex multiplication instructions and accumulate form of multiplication instructions include MPYCX, MPYCXD2, MPYCXJ, MPYCXJD2, MPYCXA, MPYCXAD2, MPYCXJA, MPYCXJAD2 instructions, and all of these instructions are tightly pipelineable.
- 35. An apparatus for the efficient processing of an FFT, the apparatus comprising:
at least one controller sequence processor (SP); a plurality of processing elements (PEs) arranged in an N×N array interconnected in a manifold (ManArray) interconnection network; and a memory for storing instructions to be processed by the SP and by the array of PEs.
- 36. The apparatus of claim 22 wherein the add function and subtract function are selectively controlled functions allowing either addition or subtraction operations as specified by the instruction.
- 37. The apparatus of claim 25 wherein the add function and subtract function are selectively controlled functions allowing either addition or subtraction operations as specified by the instruction.
- 38. The apparatus of claim 29 wherein the add function and subtract function are selectively controlled functions allowing either addition or subtraction operations as specified by the instruction.
- 39. The apparatus of claim 31 wherein the add function and subtract function are selectively controlled functions allowing either addition or subtraction operations as specified by the instruction.
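The two-cycle, tightly pipelined execution recited in claims 3, 4, 12, 13, 33 and 34 can be illustrated with a small cycle-level model. This is a sketch under assumptions the claims do not state: stage 1 holds the four multipliers and latches their products into pipeline registers, stage 2 performs the add and subtract, and a new instruction may issue every cycle. The function name `run_mpycx_pipeline` is hypothetical.

```python
def run_mpycx_pipeline(operand_pairs):
    """Cycle-level sketch of a 2-stage complex-multiply pipeline.

    operand_pairs: list of ((a, b), (c, d)) integer pairs meaning (a+bi)*(c+di).
    Returns a list of (issue_cycle, complete_cycle, (real, imag)).
    """
    queue = list(operand_pairs)
    pipe_reg = None  # products latched between stage 1 and stage 2
    results = []
    cycle = 0
    while queue or pipe_reg is not None:
        cycle += 1
        # Stage 2: combine the products latched in the previous cycle.
        if pipe_reg is not None:
            ac, bd, ad, bc, issued = pipe_reg
            results.append((issued, cycle, (ac - bd, ad + bc)))
            pipe_reg = None
        # Stage 1: the four multipliers accept a new instruction every cycle.
        if queue:
            (a, b), (c, d) = queue.pop(0)
            pipe_reg = (a * c, b * d, a * d, b * c, cycle)
    return results


if __name__ == "__main__":
    ops = [((1, 2), (3, 4)), ((0, 1), (0, 1)), ((2, 0), (5, -1))]
    for issued, done, value in run_mpycx_pipeline(ops):
        print(f"issued in cycle {issued}, result {value} in cycle {done}")
```

Each result is produced in the cycle after its issue cycle (two cycles of latency) while a new operation enters every cycle, which is one way to read "tightly pipelineable".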
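The four rounding modes of claim 6 can be modeled on an integer result that is scaled down by a power of two before being written back. The shift amount and the tie-breaking rule for round-to-nearest (ties toward positive infinity here) are assumptions; the claims do not specify them. `RoundMode` and `round_shift` are illustrative names.

```python
from enum import Enum


class RoundMode(Enum):
    NEAREST = "nearest"   # ties go toward +infinity (assumption)
    ZERO = "zero"
    POS_INF = "+inf"
    NEG_INF = "-inf"


def round_shift(value: int, shift: int, mode: RoundMode) -> int:
    """Divide an integer by 2**shift and round the quotient according to mode."""
    if shift < 0:
        raise ValueError("shift must be non-negative")
    floor_q = value >> shift                    # Python's >> floors toward -infinity
    exact = (value & ((1 << shift) - 1)) == 0   # True if no nonzero bits are shifted out
    if mode is RoundMode.NEG_INF:
        return floor_q
    if mode is RoundMode.POS_INF:
        return floor_q if exact else floor_q + 1
    if mode is RoundMode.ZERO:
        return floor_q if exact or value >= 0 else floor_q + 1
    # NEAREST: add half an output LSB, then floor.
    return (value + ((1 << shift) >> 1)) >> shift
```

For example, -5 shifted right by one bit (-2.5) becomes -2 under nearest, toward-zero and toward-positive-infinity rounding, and -3 under toward-negative-infinity rounding.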
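Claims 9, 10, 27 and 28 distinguish MPYA, which adds an accumulate operand to each of two products, from SUM2PA, which adds both products and a single accumulate operand into one result. The behavioral sketch below works on plain Python integers and ignores operand packing, rounding and overflow; those simplifications and the function names are assumptions.

```python
def mpya(x, y, acc):
    """Two-lane multiply-accumulate: returns (acc0 + x0*y0, acc1 + x1*y1)."""
    return tuple(a + p * q for p, q, a in zip(x, y, acc))


def sum2pa(x, y, acc):
    """Sum of two products accumulate: returns acc + x0*y0 + x1*y1."""
    return acc + x[0] * y[0] + x[1] * y[1]
```

With x = (2, 3), y = (4, 5) and accumulators of 1, MPYA yields (9, 16) while SUM2PA yields 24.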
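The four steps of claim 15 match the standard decomposition of an N-point FFT (N = P*M) across P PEs: with PE p holding x[p], x[p+P], ..., and Y_p its local M-point transform, the output is X[k + M*q] = sum over p of W_P^(p*q) * W_N^(p*k) * Y_p[k]. The sketch below simulates the P PEs in one process and uses a naive DFT for every sub-transform; `dft` and `distributed_fft` are illustrative names, not part of the claimed method.

```python
import cmath


def dft(x):
    """Naive O(n^2) DFT, used here for every sub-transform."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]


def distributed_fft(x, num_pes):
    """Simulate the four steps of claim 15 with num_pes PEs."""
    n = len(x)
    if n % num_pes:
        raise ValueError("len(x) must be a multiple of num_pes")
    m = n // num_pes
    # Step 1: cyclic load; PE p holds x[p], x[p + P], x[p + 2P], ...
    local = [x[p::num_pes] for p in range(num_pes)]
    # Step 2: every PE computes a local M-point FFT.
    local = [dft(seg) for seg in local]
    out = [0j] * n
    for k in range(m):
        # Step 3: twiddle W_N^(p*k) in each PE, then a P-point FFT across the PEs.
        column = [local[p][k] * cmath.exp(-2j * cmath.pi * p * k / n)
                  for p in range(num_pes)]
        for q, value in enumerate(dft(column)):
            out[k + m * q] = value  # Step 4: store X[k + M*q] back to memory.
    return out
```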
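Claims 17 and 18 together describe a four-point transform spread over PE0 to PE3. The sketch below simulates both exchange-and-combine steps in one process, reducing the transfer between PEs to reading the partner PE's value; the function name `pe_array_2x2` is hypothetical. Expanding the claimed sums shows that y0, y1, y2, y3 equal X[0], X[2], X[1], X[3] of the four-point DFT of the products x_k * w_k, i.e. the outputs come out in bit-reversed order.

```python
def pe_array_2x2(x, w):
    """Simulate claims 17-18 on a 2x2 PE array; x and w hold four complex subparts."""
    # Claim 17, multiply step: PEk multiplies xk by wk.
    product = [x[k] * w[k] for k in range(4)]

    # Claim 17, exchange PE0<->PE2 and PE1<->PE3, then add or subtract.
    t0 = product[0] + product[2]           # PE0, having received product2
    t2 = product[1] + product[3]           # PE1, having received product3
    t1 = product[0] - product[2]           # PE2, having received product0
    t3 = -1j * (product[1] - product[3])   # PE3, having received product1; times -i

    # Claim 18, exchange PE0<->PE1 and PE2<->PE3, then add or subtract.
    y0 = t0 + t2                           # PE0
    y1 = t0 - t2                           # PE1
    y2 = t1 + t3                           # PE2
    y3 = t1 - t3                           # PE3
    return [y0, y1, y2, y3]                # values stored to memory
```

With x = [1, 2, 3, 4] and all twiddle factors equal to 1, the result is [10, -2, -2+2i, -2-2i], which is the DFT of x in the order X[0], X[2], X[1], X[3].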
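Claim 21 separates fetching instructions from a memory element from dispatching each one to one of several multiply complex and multiply with accumulate execution units. A minimal software analogue follows; it assumes a hypothetical `unit` field in each instruction that selects the target unit and a per-unit set of supported opcodes, neither of which the claim specifies.

```python
from dataclasses import dataclass, field


@dataclass
class Instr:
    opcode: str
    unit: int  # hypothetical field naming the target execution unit


@dataclass
class ExecUnit:
    name: str
    supports: frozenset  # hypothetical set of opcodes this unit accepts
    executed: list = field(default_factory=list)

    def execute(self, pc: int, ins: Instr) -> None:
        """Record the instruction; a real unit would perform the operation."""
        if ins.opcode not in self.supports:
            raise ValueError(f"{self.name} cannot execute {ins.opcode}")
        self.executed.append((pc, ins.opcode))


def fetch_and_dispatch(memory: list, units: list) -> None:
    """Fetch instructions in program order and dispatch each to the unit it names."""
    for pc, ins in enumerate(memory):      # fetch from the memory element
        units[ins.unit].execute(pc, ins)   # dispatch to the selected unit
```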
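Claims 22, 25, 29 and 31 share one datapath shape (four multipliers, four pipeline registers, an add function, a subtract function, round and select), and claims 36 through 39 let the add and subtract functions be selected by the instruction. The sketch below expresses the instruction group of claims 7 and 33 as a control table: a conjugate bit swaps which unit adds and which subtracts, an accumulate bit adds the accumulate operands, and a divide-by-2 bit shifts the result right by one. The table encoding, the choice of the second operand as the conjugated one, and the use of a plain arithmetic shift (floor) in place of a selectable rounding mode are simplifying assumptions.

```python
# opcode -> (conjugate, accumulate, divide_by_2); hypothetical control encoding
CONTROL = {
    "MPYCX": (False, False, False),
    "MPYCXD2": (False, False, True),
    "MPYCXJ": (True, False, False),
    "MPYCXJD2": (True, False, True),
    "MPYCXA": (False, True, False),
    "MPYCXAD2": (False, True, True),
    "MPYCXJA": (True, True, False),
    "MPYCXJAD2": (True, True, True),
}


def add_sub(p: int, q: int, subtract: bool) -> int:
    """Selectively controlled add/subtract function."""
    return p - q if subtract else p + q


def complex_datapath(opcode, x, y, acc=(0, 0)):
    """Integer model of the shared multiply complex datapath.

    x, y and acc are (real, imag) integer pairs; acc is used only by the A forms.
    """
    conj, accumulate, div2 = CONTROL[opcode]
    (a, b), (c, d) = x, y
    # Stage 1: four multipliers fill four pipeline registers.
    ac, bd, bc, ad = a * c, b * d, b * c, a * d
    # Stage 2: the conjugate bit swaps which function adds and which subtracts.
    re = add_sub(ac, bd, subtract=not conj)  # ac - bd, or ac + bd for the J forms
    im = add_sub(bc, ad, subtract=conj)      # bc + ad, or bc - ad for the J forms
    if accumulate:
        re, im = re + acc[0], im + acc[1]
    # Round and select: a shift right implements the D2 divide by two (floor).
    shift = 1 if div2 else 0
    return re >> shift, im >> shift
```

Here MPYCX on (1, 2) and (3, 4) returns (-5, 10), MPYCXJ returns (11, 2), and MPYCXD2 returns (-3, 5) because the floor shift takes -2.5 down to -3.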
Parent Case Info
[0001] This application claims the benefit of U.S. Provisional Application Serial No. 60/103,712 filed Oct. 9, 1998, which is incorporated by reference in its entirety herein.
Provisional Applications (1)
| Number | Date | Country |
|---|---|---|
| 60103712 | Oct 1998 | US |