Claims
- 1. An extensible pipelined processor adapted for performing iterative calculations on a plurality of data, said processor having an extension instruction set associated therewith, and comprising:
at least one multiply-accumulate stage having at least one accumulator associated therewith; at least one register window; at least one extension instruction provided within said extension instruction set, said at least one extension instruction being adapted to:
(i) subtract a value present in said at least one accumulator from a multiple of a first one of said plurality of data; and (ii) preload said at least one accumulator with a second one of said plurality of data; and logic operatively connected to said at least one multiply-accumulate stage and adapted to write back the result of said subtraction to said register window.
- 2. The processor of claim 1, wherein said at least one extension instruction comprises an FBF instruction adapted to perform at least a portion of an FFT butterfly calculation.
- 3. The processor of claim 2, wherein said FBF instruction comprises a 32-bit instruction word utilizing three inputs to generate two outputs.
- 4. The processor of claim 3, wherein one of said three inputs comprises a twiddle factor, said twiddle factor being derived at least in part from a table disposed in a storage device associated with said processor.
- 5. The processor of claim 3, wherein one of said three inputs comprises a twiddle factor, said twiddle factor being of the form cosθ+j*sinθ.
- 6. The processor of claim 3, wherein said FBF instruction utilizes pairs of data blocks as inputs, at least two of said inputs comprising data obtained sequentially from different data blocks.
- 7. The processor of claim 1, wherein said value present in said at least one accumulator is loaded in parallel with at least one other operation.
- 8. The processor of claim 2, wherein said at least one accumulator comprises 16 bits, and said FBF instruction comprises two operands.
- 9. The processor of claim 2, wherein said at least one accumulator comprises 24 bits.
- 10. The processor of claim 1, wherein said extension instruction set further includes at least one multiply-accumulate instruction, and said at least one extension instruction is pipelined to the same number of stages as said at least one multiply-accumulate instruction to avoid stalling of the pipeline during execution.
- 11. A pipelined digital processor adapted for performing iterative calculations on a plurality of data, said processor having an instruction set with at least one extension instruction, said at least one extension instruction adapted for FFT butterfly calculation, the processor comprising:
at least one multiply-accumulate stage having at least one accumulator associated therewith; at least one register window; and write-back logic operatively connected to said at least one multiply-accumulate stage; wherein said at least one extension instruction is adapted to:
(i) subtract a value present in said at least one accumulator from a multiple of a first one of said plurality of data; (ii) preload said at least one accumulator with a second one of said plurality of data; and (iii) cooperate with said write-back logic to write back the result of said subtraction to said register window.
- 12. The processor of claim 11, wherein said instruction set further includes at least one multiply-accumulate instruction, and said at least one extension instruction is pipelined to the same number of stages as said at least one multiply-accumulate instruction to avoid stalling of the pipeline during execution.
- 13. An accumulator used in a digital processor, comprising:
an adder having first and second inputs, said second input being operatively coupled to a first multiplier, said adder adapted to add said inputs to produce at least one output; a first multiplexer adapted to multiplex a plurality of inputs onto at least one output, at least one of said plurality of inputs comprising said at least one output of said adder, and at least one of said plurality of inputs of said first multiplexer comprising a pre-load signal associated with a butterfly operation; and at least one register operatively coupled to said at least one output of said first multiplexer and to said first input of said adder.
- 14. The accumulator of claim 13, further comprising:
a subtractor having at least a first input coupled to said at least one register; and a second multiplexer having the outputs of said adder and said subtractor as inputs thereto.
- 15. The accumulator of claim 14, further comprising:
a second register operatively coupled to an input of said subtractor; a third multiplexer having:
(i) a plurality of inputs, at least one of said inputs comprising a multiplier product from a second multiplier; and (ii) an output operatively coupled to said second input of said adder; wherein said third multiplexer is adapted to switch inputs to said accumulator between said first and second multipliers.
- 16. An extensible processor adapted for performing iterative calculations on a plurality of data, said processor having a multi-stage instruction pipeline and an extension instruction set, comprising:
at least one multiply-accumulate stage having at least one accumulator associated therewith; at least one first extension instruction being adapted to perform at least a portion of an iterative calculation on said plurality of data; and at least one second extension instruction adapted to perform a multiply or multiply-accumulate operation using said at least one multiply-accumulate stage and accumulator; wherein said at least one first extension instruction is pipelined to the same number of stages as said at least one second extension instruction, thereby avoiding pipeline stalling during processing of said at least one first extension instruction.
- 17. An extensible processor adapted for performing iterative calculations on a plurality of data, said processor having a multi-stage instruction pipeline and an extension instruction set, comprising:
at least one multiply-accumulate stage having at least one accumulator associated therewith; at least one first extension instruction being adapted to perform at least a portion of an iterative calculation on said plurality of data; and at least one second extension instruction adapted to perform a multiply or multiply-accumulate operation using said at least one multiply-accumulate stage and accumulator; wherein the propagation of said at least one first extension instruction within said pipeline is controlled at least in part through added pipeline depth, said added pipeline depth providing for reduced execution time of said at least one first extension instruction.
- 18. A method of performing an iterative calculation using a configurable, extensible processor having an instruction set, pipeline, and at least one multiply-accumulate stage with accumulator, the method comprising:
inserting a first extension instruction into said pipeline, said first instruction adapted to:
(i) subtract a value present in said at least one accumulator from a multiple of a first input value; (ii) preload said at least one accumulator with a second input value; and (iii) write back the result of the aforementioned subtraction operation to a designated register location; providing a plurality of inputs to said at least one multiply-accumulate stage, said plurality of inputs comprising at least said first and second inputs; and executing said first extension instruction to produce at least one output from said at least one multiply-accumulate stage.
- 19. The method of claim 18, wherein said act of inserting comprises inserting an extension instruction adapted to perform FFT butterfly operations.
- 20. The method of claim 19, further comprising pipelining the first extension instruction to the same number of stages as a corresponding multiply-accumulate instruction of said instruction set so as to avoid stalling of the pipeline during execution of said first extension instruction.
- 21. A method of performing an iterative calculation using a configurable, extensible processor having an instruction set, pipeline, and at least one multiply-accumulate stage, said processor synthesized at least in part using a method comprising:
(i) providing a first extension instruction in a hardware description language (HDL), said first extension instruction being adapted to utilize existing logic within said processor related to other instructions within said instruction set; (ii) adding said first extension instruction to the design of an extended data processor; and (iii) synthesizing the extended processor design including said first extension instruction, said first extension instruction being pipelined to the same number of stages as at least one of said other instructions; the method of performing the iterative calculation comprising:
inserting said first extension instruction into said pipeline; providing a plurality of inputs to said at least one multiply-accumulate stage; and executing said first extension instruction to produce at least one output from said at least one multiply-accumulate stage.
- 22. The method of claim 21, wherein said processor comprises at least two multiply-accumulate stage channels, the method further comprising swapping the multiply results from said at least two multiply-accumulate stages during a multiply-accumulate operation following said executing of said first extension instruction.
- 23. An integrated circuit device optimized for performing iterative calculations on input data, comprising:
at least one silicon die having a plurality of circuit features formed thereon; and an extended processor having a multi-stage instruction pipeline and an extension instruction set, comprising:
at least one multiply-accumulate stage having at least one accumulator associated therewith; at least one first extension instruction being adapted to perform at least a portion of an iterative calculation on said input data; and at least one second extension instruction adapted to perform a multiply or multiply-accumulate operation using said at least one multiply-accumulate stage and accumulator; wherein said at least one first extension instruction is pipelined to the same number of stages as said at least one second extension instruction, thereby avoiding pipeline stalling during processing of said at least one first extension instruction.
- 24. The integrated circuit of claim 23, wherein said device is adapted for performance of FFT butterfly calculations.
- 25. The integrated circuit of claim 23, wherein said device is optimized for reduced power consumption based at least in part on use of a reduced clock speed.
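The butterfly arithmetic recited in claims 1, 2, 5 and 18 can be illustrated with a short behavioral model. It rests on the identity that a radix-2 butterfly with upper output X = a + W·b and lower output Y = a − W·b satisfies Y = 2a − X, so once an accumulator holds X, the lower output is a multiple of the first datum minus the accumulator value. The following is a minimal Python sketch under stated assumptions, not the patented hardware: the names `MacUnit`, `fbf`, `twiddle_table` and `fft_radix2` are hypothetical, floating-point complex numbers stand in for the 16- or 24-bit fixed-point accumulators, and returning both butterfly outputs from one step is an interpretation of the claims. Twiddle factors take the form cosθ + j·sinθ of claim 5 with θ = −2πk/N, precomputed into a table as in claim 4.

```python
import cmath
from typing import List, Tuple


def twiddle_table(n: int) -> List[complex]:
    """Hypothetical twiddle table: cos(theta) + j*sin(theta), theta = -2*pi*k/n."""
    return [cmath.exp(-2j * cmath.pi * k / n) for k in range(n // 2)]


class MacUnit:
    """Toy model of one multiply-accumulate stage with a single accumulator."""

    def __init__(self) -> None:
        self.acc: complex = 0j

    def preload(self, value: complex) -> None:
        self.acc = value

    def mac(self, x: complex, y: complex) -> None:
        self.acc += x * y  # ordinary multiply-accumulate

    def fbf(self, a: complex, next_a: complex) -> Tuple[complex, complex]:
        """Assumed FBF-style step: upper = acc, lower = 2*a - acc, then preload."""
        upper = self.acc
        lower = 2 * a - self.acc  # subtract accumulator from a multiple of a
        self.acc = next_a         # preload the next butterfly's upper input
        return upper, lower


def fft_radix2(x: List[complex]) -> List[complex]:
    """Iterative decimation-in-time FFT built from the FBF-style step."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    data = list(x)
    # Bit-reversal permutation of the input order.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            data[i], data[j] = data[j], data[i]
    w = twiddle_table(n)
    mac = MacUnit()
    size = 2
    while size <= n:
        half, step = size // 2, n // size
        # Butterflies of one stage touch disjoint index pairs.
        pairs = [(s + k, s + k + half, w[k * step])
                 for s in range(0, n, size) for k in range(half)]
        mac.preload(data[pairs[0][0]])
        for i, (top, bot, tw) in enumerate(pairs):
            a = data[top]
            mac.mac(tw, data[bot])  # acc = a + W*b
            nxt = data[pairs[i + 1][0]] if i + 1 < len(pairs) else 0j
            data[top], data[bot] = mac.fbf(a, nxt)
        size *= 2
    return data
```

For example, `fft_radix2([1, 1])` yields `[2, 0]` (as complex values). Because each step preloads the accumulator with the next butterfly's upper input, the following multiply-accumulate can start without a separate load, which is one plausible reading of why the claims pair the preload with the subtraction.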
Parent Case Info
[0001] The present application claims priority to U.S. Provisional Patent Application Serial No. 60/285,456, entitled “Data Processor With Enhanced Instruction Execution and Method” filed Apr. 19, 2001.
Provisional Applications (1)
| Number | Date | Country |
|---|---|---|
| 60285456 | Apr 2001 | US |