This application claims the priority benefit of Taiwan application serial no. 97151902, filed on Dec. 31, 2008. The entirety of the above-mentioned patent application is hereby incorporated by reference herein and made a part of this specification.
1. Field of the Invention
The present invention generally relates to a data processing architecture of Fast Fourier Transform (FFT), and more particularly, to an FFT processor.
2. Description of Related Art
FFT has been broadly used in many fields, including digital signal processing, image processing and communication systems. FFT techniques can be used to design the hardware circuit architecture of an FFT processor with high processing speed and high throughput. A high-speed FFT processor plays a critical role in fields relating to digital signal processing, for example, in an OFDM (orthogonal frequency-division multiplexing) communication system. One major challenge in designing an FFT processor is how to reach good system transmission efficiency with high throughput while keeping the implementation feasible in a low-cost CMOS (complementary metal-oxide semiconductor) process.
U.S. Pat. No. 4,534,009 discloses a “Multi-Pipelined FFT Processor”. The pipelined FFT processor is able to process continuously input signals with high efficiency to complete FFT calculations. The processing element used in the circuit architecture is based on a radix-2 butterfly unit (radix-2 BU).
In 1984, E. E. Swartzlander, Jr., et al. published a paper “A Radix 4 Delay Commutator for Fast Fourier Transform Processor Implementation” (IEEE J. Solid-State Circuits, Vol. SC-19, No. 5, October 1984). The processing element of the processor therein is based on a plurality of radix-4 butterfly units (radix-4 BUs), and all the radix-4 BUs are connected in series. The processor is accordingly termed a radix-4 multi-path delay commutator (MDC) FFT processor. With this scheme, an FFT processor for Y-point operations requires a memory capacity of (2.5Y-4).
US Patent Application Publication No. 2002/0083107A1 discloses a “Fast Fourier Transform Processor Using High Speed Area-Efficient Algorithm”. The processor therein can be seen as a modified architecture of the radix-4 processing element, wherein the processor has two different types of processing elements: one radix-4 BU and two radix-2 BUs. The FFT processor is built by alternately connecting the two types of processing elements in series. Accordingly, the processor is termed a radix-4/2 MDC FFT processor. Like the above-mentioned radix-4 MDC FFT processor, this FFT processor requires a memory capacity of (2.5Y-4) for Y-point operations.
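For orientation, the radix-2 butterfly mentioned above can be sketched as follows. This is a minimal illustration only, assuming the common decimation-in-frequency form (sum on the first output, twiddled difference on the second); the cited patent may use a different form, and the names `twiddle` and `radix2_butterfly` are hypothetical.

```python
import cmath

def twiddle(k, n):
    """Twiddle factor W_n^k = exp(-2*pi*j*k/n) (assumed sign convention)."""
    return cmath.exp(-2j * cmath.pi * k / n)

def radix2_butterfly(a, b, w=1 + 0j):
    """Decimation-in-frequency radix-2 butterfly (illustrative form only).

    The first output is the sum; the second is the difference
    multiplied by the twiddle factor w.
    """
    return a + b, (a - b) * w

# With w = 1 the butterfly is exactly a 2-point DFT.
upper, lower = radix2_butterfly(1, 2)
```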
Accordingly, the present invention is directed to an FFT processor. The provided FFT processor includes a first multi-pipelined MDC unit, a second multi-pipelined MDC unit and a switching network. The first multi-pipelined MDC unit performs M radix-2^N first butterfly operations in parallel so as to output a plurality of first operation results, wherein M and N are integers greater than 1. By changing the delayer positions in the first multi-pipelined MDC unit, the time sequence of the outputs is changed. The switching network is coupled to the first multi-pipelined MDC unit for changing the relative positions of the first operation results. The second multi-pipelined MDC unit is coupled to the switching network and uses the first operation results with changed relative positions to perform M radix-2^N second butterfly operations in parallel so as to output a plurality of second operation results.
The accompanying drawings are included to provide a further understanding of the invention, and are incorporated in and constitute a part of this specification. The drawings illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention.
Reference will now be made in detail to the present preferred embodiments of the invention, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers are used in the drawings and the description to refer to the same or like parts.
In the following, the FFT operations are described, for example, for 4096 points. To accomplish the FFT operations for a given number of operation points, a conventional MDC, due to its inherent low efficiency, needs a memory size larger than the number of operation points. For example, a conventional radix-2 MDC for processing 4096 points needs a memory size of 6142 words, and a conventional radix-4 MDC for processing 4096 points needs a memory size of 10236 words. However, by using a processing element formed by the following novel MDCs of the embodiments for processing 4096 points, only a memory size of 4096 words is needed, which largely reduces the required memory size, lowers the number of memory accesses and accordingly reduces the power consumption. In comparison with the conventional MDC circuit, the following embodiments make it easier to implement a processor with lower power consumption, a smaller circuit area and a high throughput. In particular, the throughput of the processor can be easily increased by adding processing elements.
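The memory figures quoted above can be checked with a short calculation. The radix-4 formula (2.5Y-4) is the one stated for the prior art; the radix-2 formula (1.5Y-2) is an assumption inferred here from the quoted 6142-word figure, and the function names are hypothetical.

```python
def radix2_mdc_words(points):
    """Assumed radix-2 MDC memory size, 1.5Y - 2 (inferred from 6142 words)."""
    return 3 * points // 2 - 2

def radix4_mdc_words(points):
    """Radix-4 MDC memory size, 2.5Y - 4, as stated for the prior art."""
    return 5 * points // 2 - 4

def proposed_words(points):
    """Memory size stated for the embodiments: Y words for Y points."""
    return points

for name, fn in [("radix-2 MDC", radix2_mdc_words),
                 ("radix-4 MDC", radix4_mdc_words),
                 ("embodiment", proposed_words)]:
    print(f"{name}: {fn(4096)} words")
```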
Referring to
The switching network 600 is coupled between the first multi-pipelined MDC unit 500 and the second multi-pipelined MDC unit 700. The switching network 600 can change the relative positions of the first operation results and then send the first operation results with changed positions to the second multi-pipelined MDC unit 700. In other words, the switching network 600 is able to change the routing relationship between the first multi-pipelined MDC unit 500 and the second multi-pipelined MDC unit 700. The second multi-pipelined MDC unit 700 uses the first operation results with changed relative positions to perform M radix-2^N second butterfly operations in parallel so as to output a plurality of second operation results. There is no need to use a memory to save/read the operation data between the first multi-pipelined MDC unit 500 and the second multi-pipelined MDC unit 700. By changing the delayer positions in the second multi-pipelined MDC unit 700, the time sequence of signals is changed to accomplish the butterfly operations.
The above-mentioned first multi-pipelined MDC unit 500 can include M MDCs 510-1 through 510-M, wherein each MDC has two input terminals and two output terminals. In
The above-mentioned second multi-pipelined MDC unit 700 can include M MDCs 710-1 through 710-M, wherein each MDC has two input terminals and two output terminals. In
Anyone skilled in the art can determine the above-mentioned N value according to the design requirement. The following description takes N=3 as an example. That is, in the following embodiment, the MDCs 510-1 through 510-M and the MDCs 710-1 through 710-M in
The first switch 421 has a first terminal, a second terminal, a third terminal and a fourth terminal, wherein the first terminal and the second terminal are respectively coupled to the first output terminal of the first butterfly operator 411 and the output terminal of the first delayer 431. The first switch 421 can respectively electrically connect the first terminal and the second terminal thereof to the third terminal and the fourth terminal thereof, or to the fourth terminal and the third terminal thereof. Similarly, the second switch 422 can respectively electrically connect the first terminal and the second terminal thereof to the third terminal and the fourth terminal thereof, or to the fourth terminal and the third terminal thereof.
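The two connection states of such a switch can be modeled in a few lines. This is a behavioral sketch only, not a circuit description; the function name `switch_2x2` and the boolean `cross` control are hypothetical.

```python
def switch_2x2(first, second, cross):
    """Behavioral model of a two-state switch such as 421 or 422.

    Returns (third, fourth). In the straight state the first terminal
    drives the third and the second drives the fourth; in the crossed
    state the first drives the fourth and the second drives the third.
    """
    if cross:
        return second, first
    return first, second

straight = switch_2x2("a", "b", cross=False)  # ("a", "b")
crossed = switch_2x2("a", "b", cross=True)    # ("b", "a")
```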
The input terminal of the second delayer 432 is coupled to the third terminal of the first switch 421, and the second delayer 432 delays the received data by two time slots and then outputs the delayed data from the output terminal thereof. The first input terminal of the second butterfly operator 412 is coupled to the output terminal of the second delayer 432 and the second input terminal of the second butterfly operator 412 is coupled to the fourth terminal of the first switch 421. The input terminal of the third delayer 441 is coupled to the second output terminal of the second butterfly operator 412, and the third delayer 441 delays the received data by one time slot and then outputs the delayed data from the output terminal thereof. The first terminal and the second terminal of the second switch 422 are respectively coupled to the first output terminal of the second butterfly operator 412 and the output terminal of the third delayer 441. The input terminal of the fourth delayer 442 is coupled to the third terminal of the second switch 422, and the fourth delayer 442 delays the received data by one time slot and then outputs the delayed data from the output terminal thereof. The first input terminal of the third butterfly operator 413 is coupled to the output terminal of the fourth delayer 442, and the second input terminal of the third butterfly operator 413 is coupled to the fourth terminal of the second switch 422. The first output terminal and the second output terminal of the third butterfly operator 413 respectively serve as the first output terminal and the second output terminal of the MDC 401.
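The behavior of a delayer that holds data for a fixed number of time slots can be sketched as a small FIFO. This is an illustrative model under the assumption that one datum enters and one leaves per time slot; the class name `Delayer` and the `None` placeholder for not-yet-valid outputs are hypothetical.

```python
from collections import deque

class Delayer:
    """Delays its input by a fixed number of time slots (behavioral model)."""

    def __init__(self, slots, fill=None):
        # Pre-fill with placeholders so the first `slots` outputs are `fill`.
        self._fifo = deque([fill] * slots)

    def step(self, value):
        """Advance one time slot: accept `value`, emit the datum from `slots` ago."""
        self._fifo.append(value)
        return self._fifo.popleft()

# A two-slot delayer, like the second delayer 432 in the description above.
d = Delayer(2)
outputs = [d.step(x) for x in [1, 2, 3, 4]]  # [None, None, 1, 2]
```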
The operation result of the MDC 401 must follow the algorithm of the butterfly network. Since the inputs and the outputs of the MDC 401 herein are respectively two data, to accomplish the radix-8 butterfly operation as shown by
Table 1 lists the timing relationship of the nodes A-N in
In Table 1, ‘=’ means the first terminal of the switch 421 (or 422) is electrically connected to the third terminal and the second terminal is electrically connected to the fourth terminal; ‘X’ means the first terminal of the switch 421 (or 422) is electrically connected to the fourth terminal and the second terminal is electrically connected to the third terminal. It can be seen from Table 1 that the MDC 401 of
The embodiment is able to obtain various novel MDCs by changing the positions of the delayers in a conventional pipelined MDC 401 so as to change the sequence of outputting the signals. For example,
Referring
The first terminal and the second terminal of the first switch 421 are respectively coupled to the first output terminal of the first butterfly operator 411 and the output terminal of the first delayer 431. The input terminal of the second delayer 432 is coupled to the third terminal of the first switch 421, and the second delayer 432 delays the received data by two time slots and then outputs the delayed data from the output terminal thereof. The first input terminal of the second butterfly operator 412 is coupled to the output terminal of the second delayer 432 and the second input terminal of the second butterfly operator 412 is coupled to the fourth terminal of the first switch 421. The input terminal of the third delayer 441 is coupled to the first output terminal of the second butterfly operator 412, and the third delayer 441 delays the received data by one time slot and then outputs the delayed data from the output terminal thereof. The first terminal and the second terminal of the second switch 422 are respectively coupled to the output terminal of the third delayer 441 and the second output terminal of the second butterfly operator 412. Anyone skilled in the art can use any architecture to implement the switches 421-422; for example, by using the above-mentioned switch 220 as shown by
The input terminal of the fourth delayer 442 is coupled to the fourth terminal of the second switch 422, and the fourth delayer 442 delays the received data by one time slot and then outputs the delayed data from the output terminal thereof. The first input terminal and the second input terminal of the third butterfly operator 413 are respectively coupled to the third terminal of the second switch 422 and the output terminal of the fourth delayer 442. The first output terminal and the second output terminal of the third butterfly operator 413 respectively serve as the second output terminal and the first output terminal of the MDC 402.
Table 2 lists the timing relationship of the nodes A-N in
It can be seen from Table 2 that the MDC 402 of
Referring
The first terminal and the second terminal of the first switch 421 are respectively coupled to the output terminal of the first delayer 431 and the second output terminal of the first butterfly operator 411. The input terminal of the second delayer 432 is coupled to the fourth terminal of the first switch 421, and the second delayer 432 delays the received data by two time slots and then outputs the delayed data from the output terminal thereof. The first input terminal of the second butterfly operator 412 is coupled to the third terminal of the first switch 421 and the second input terminal of the second butterfly operator 412 is coupled to the output terminal of the second delayer 432. The input terminal of the third delayer 441 is coupled to the first output terminal of the second butterfly operator 412, and the third delayer 441 delays the received data by one time slot and then outputs the delayed data from the output terminal thereof.
The first terminal and the second terminal of the second switch 422 are respectively coupled to the output terminal of the third delayer 441 and the second output terminal of the second butterfly operator 412. The input terminal of the fourth delayer 442 is coupled to the fourth terminal of the second switch 422, and the fourth delayer 442 delays the received data by one time slot and then outputs the delayed data from the output terminal thereof. The first input terminal of the third butterfly operator 413 is coupled to the third terminal of the second switch 422 and the second input terminal of the third butterfly operator 413 is coupled to the output terminal of the fourth delayer 442. The first output terminal and the second output terminal of the third butterfly operator 413 respectively serve as the second output terminal and the first output terminal of the MDC 403.
Table 3 lists the timing relationship of the nodes A-N in
It can be seen from Table 3 that the MDC 403 of
Referring
The first input terminal of the second butterfly operator 412 is coupled to the third terminal of the first switch 421 and the second input terminal of the second butterfly operator 412 is coupled to the output terminal of the second delayer 432. The input terminal of the third delayer 441 is coupled to the second output terminal of the second butterfly operator 412. The first terminal and the second terminal of the second switch 422 are respectively coupled to the first output terminal of the second butterfly operator 412 and the output terminal of the third delayer 441. The input terminal of the fourth delayer 442 is coupled to the third terminal of the second switch 422.
The first input terminal of the third butterfly operator 413 is coupled to the output terminal of the fourth delayer 442 and the second input terminal of the third butterfly operator 413 is coupled to the fourth terminal of the second switch 422. The first output terminal and the second output terminal of the third butterfly operator 413 respectively serve as the first output terminal and the second output terminal of the MDC 404.
Table 4 lists the timing relationship of the nodes A-N in
It can be seen from Table 4 that the MDC 404 of
Referring
The input terminal of the first delayer 431 is coupled to the second output terminal of the first butterfly operator 411. The first terminal and the second terminal of the first switch 421 are respectively coupled to the first output terminal of the first butterfly operator 411 and the output terminal of the first delayer 431. The input terminal of the second delayer 432 is coupled to the third terminal of the first switch 421. The first input terminal and the second input terminal of the second butterfly operator 412 are respectively coupled to the output terminal of the second delayer 432 and the fourth terminal of the first switch 421. The input terminal of the third delayer 441 is coupled to the second output terminal of the second butterfly operator 412. The first terminal and the second terminal of the second switch 422 are respectively coupled to the first output terminal of the second butterfly operator 412 and the output terminal of the third delayer 441. The input terminal of the fourth delayer 442 is coupled to the third terminal of the second switch 422. The first input terminal and the second input terminal of the third butterfly operator 413 are respectively coupled to the output terminal of the fourth delayer 442 and the fourth terminal of the second switch 422.
Table 5 lists the timing relationship of the nodes A-N in
It can be seen from Table 5 that the MDC 405 of
Referring
The input terminal of the first delayer 431 is coupled to the second output terminal of the first butterfly operator 411. The first terminal and the second terminal of the first switch 421 are respectively coupled to the first output terminal of the first butterfly operator 411 and the output terminal of the first delayer 431. The input terminal of the second delayer 432 is coupled to the third terminal of the first switch 421. The first input terminal and the second input terminal of the second butterfly operator 412 are respectively coupled to the output terminal of the second delayer 432 and the fourth terminal of the first switch 421.
The input terminal of the third delayer 441 is coupled to the first output terminal of the second butterfly operator 412. The first terminal and the second terminal of the second switch 422 are respectively coupled to the output terminal of the third delayer 441 and the second output terminal of the second butterfly operator 412. The input terminal of the fourth delayer 442 is coupled to the fourth terminal of the second switch 422. The first input terminal and the second input terminal of the third butterfly operator 413 are respectively coupled to the third terminal of the second switch 422 and the output terminal of the fourth delayer 442.
Table 6 lists the timing relationship of the nodes A-N in
It can be seen from Table 6 that the MDC 406 of
By using the above-mentioned novel MDCs as the first multi-pipelined MDC unit 500 and the second multi-pipelined MDC unit 700, there is no need to use a memory for accessing data between the operation circuits, which is advantageous not only in reducing the memory size, but also in reducing the power consumption of the memory. The N value, as described above, can be determined by the designer; the M value can likewise be determined by anyone skilled in the art according to the design requirement. In the following, a case of M=8 and N=3 is explained as an example. That is, the first multi-pipelined MDC unit 500 and the second multi-pipelined MDC unit 700 are each assumed to perform eight radix-2^3 butterfly operations in parallel to accomplish a 64-point FFT operation.
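The reason two groups of eight 8-point operations suffice for 64 points can be illustrated with the standard Cooley-Tukey factorization, in which a length-64 transform is computed as eight 8-point transforms, a twiddle multiplication, and eight more 8-point transforms. The sketch below is a plain numerical check of that factorization, not a model of the MDC datapath or of the switching network 600; the function names and the index mapping are assumptions chosen for illustration.

```python
import cmath

def dft(x):
    """Direct O(n^2) DFT, used both as a building block and as a reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]

def two_stage_dft(x, n1, n2):
    """Length n1*n2 DFT built from n2 DFTs of length n1, then n1 of length n2."""
    n = n1 * n2
    assert len(x) == n
    # Stage 1: one n1-point DFT per residue r, over samples x[n2*i + r].
    stage1 = [dft([x[n2 * i + r] for i in range(n1)]) for r in range(n2)]
    # Twiddle multiplication between the stages: W_n^(r*k1).
    for r in range(n2):
        for k1 in range(n1):
            stage1[r][k1] *= cmath.exp(-2j * cmath.pi * r * k1 / n)
    # Stage 2: one n2-point DFT per k1, taken across the residues r.
    out = [0j] * n
    for k1 in range(n1):
        col = dft([stage1[r][k1] for r in range(n2)])
        for k2 in range(n2):
            out[k1 + n1 * k2] = col[k2]
    return out

# 64 points as 8 x 8, mirroring the M=8, N=3 case.
samples = [complex(i % 7, -(i % 5)) for i in range(64)]
max_error = max(abs(a - b)
                for a, b in zip(two_stage_dft(samples, 8, 8), dft(samples)))
```

The same function applies unchanged to the 4096 = 64 x 64 case, although the direct reference DFT becomes slow at that size.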
At a third time slot, the switching network 600 changes the internal linking statuses thereof once more. As shown by
Since 4096 is the second power of 64, 64-point operation units can be used to build a 4096-point FFT processor. In the embodiment, the 64-point operation unit (for example, M=8, as shown by
Table 7 lists the data timing relationship of the first multi-pipelined MDC unit 500 in a 64-point operation unit of the embodiment.
In Table 7, except for the ‘time slot’ row, the figures such as ‘1’, ‘2’, ‘3’, . . . , ‘64’ represent the relative position of the data in a 64-point FFT operation (64-point butterfly network). For example, ‘13’ in Table 7 represents the data of the thirteenth point in the 64-point FFT operation. Note that the same number appearing at different time slots in Table 7 does not mean that the data have the same value.
Referring to
It should be noted that the above-mentioned 64-point FFT operation circuit comprising the MDC circuits and the switching network is not the only solution. Taking a radix-2^3 MDC as an example, there are eight modified architectures in total depending on the different positions of the delayers and the different positions of the output terminals, while the above-mentioned embodiments provide only six of them. A designer therefore has room to select other MDCs and the corresponding switching networks to build processing element circuits different from the given ones according to preference and different signal sequences. Similarly, there are other circuit architectures of a processing element for different N values and different numbers of points, the description of which is omitted for simplicity.
In comparison with the conventional MDC processors, the invented processor can reduce the number of memory accesses, effectively reduce the power consumption and largely reduce the required memory size; for example, a Y-point operation requires a memory size of only Y. In addition, the signals between the first multi-pipelined MDC unit 500 and the second multi-pipelined MDC unit 700 are communicated by means of an ‘inherent cache’ methodology instead of using a memory for accessing data.
In order to increase the throughput of the invented FFT processor, only some processing elements need to be added, for example, as shown by
A 4096-point FFT processor can be fabricated by using a 90 nm CMOS (complementary metal-oxide semiconductor) process to combine two processing elements into a processor. In this way, the throughput of the circuit at an operation frequency of 500 MHz can reach 8 Giga-samples per second; in association with different modulations, the maximum data transmission rate reaches 28 Giga-bits per second. When the operation voltage is 1 V, the power consumption is nearly 1 W. Table 8 lists the relevant simulation parameters of the circuit.
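The quoted throughput is consistent with a simple estimate, sketched below under two assumptions that the description does not state explicitly: each MDC accepts one sample per input terminal per clock cycle, and the pipeline uses M=8 MDCs with two input terminals each, as in the example embodiment. The function name is hypothetical.

```python
def throughput_samples_per_second(clock_hz, mdc_count, inputs_per_mdc=2):
    """Estimated throughput, assuming one sample per input terminal per cycle."""
    return clock_hz * mdc_count * inputs_per_mdc

# 500 MHz clock, M = 8 MDCs, two input terminals per MDC (assumed configuration).
estimate = throughput_samples_per_second(500_000_000, 8)
print(estimate / 1e9, "Giga-samples per second")
```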
In comparison with the prior art, the invented FFT processors are advantageous not only in high throughput and high usage efficiency (100%), but also in largely reducing the required memory size. For an invented FFT processor capable of accomplishing Y-points operation, only a memory size of Y is needed as described above, which reduces the circuit area, lowers the number of accessing the memory and further effectively reduces the power consumption.
In summary, the above-mentioned embodiments use multi-pipelined MDC units and a switching network to implement an FFT processor, wherein the core of each processing element is formed by various novel MDCs. In the above-mentioned embodiments, one of the various MDC architectures, in association with a rearrangement of the operation time sequence of the signals in parallel processing, builds a multi-pipelined processing element, which is advantageous not only in high usage efficiency and a smaller area of a processing element, but also in lowering the number of memory accesses between the processing elements, reducing the required memory size, reducing the power consumption and largely reducing the circuit area required by the memory. Since the FFT processor provided by the above-mentioned embodiments can be fabricated by using a low-cost CMOS process, the present invention has further advantages: further reducing the power consumption, easing the problems of heat dissipation and battery lifetime, and compacting the circuit area. In short, the provided technique is beneficial for developing handheld electronic products.
It will be apparent to those skilled in the art that various modifications and variations can be made to the structure of the present invention without departing from the scope or spirit of the invention. In view of the foregoing, it is intended that the present invention covers modifications and variations of this invention provided they fall within the scope of the following claims and their equivalents.
Number | Date | Country | Kind |
---|---|---|---|
97151902 | Dec 2008 | TW | national |