The present invention relates to digital signal processing and the computation of the discrete Fourier transform. More specifically, it relates to high-speed and/or low-power designs of fast Fourier transform (FFT) circuits based on radix-2^n algorithms.
Fast Fourier Transform (FFT) is one of the most important algorithms in the field of digital signal processing, used to compute the discrete Fourier transform efficiently. Pipelined hardware FFT designs play an important role in real-time applications. In biomedical applications, the power spectral density (PSD) of various signals, such as electrocardiography (ECG) or electroencephalography (EEG) signals, needs to be estimated. Further, the FFT is a key element in Orthogonal Frequency Division Multiplexing (OFDM) based communication technologies such as Wireless LAN, WiMAX, ADSL, VDSL and DVB-T.
Apart from high-speed operation, these applications demand low power consumption, since they are primarily aimed at portable and mobile devices. The most computationally intensive part of such systems is the FFT. The FFT has been shown to be both computation-intensive, in terms of arithmetic operations, and communication-intensive, in terms of data swapping in storage. Therefore, efficient implementation of FFT circuits is very important for successful low-power applications.
As will be understood by persons skilled in the relevant arts, FFT circuits are designed, for example, using pipelining and parallelism techniques. These known techniques have enabled engineers to build, with available technologies, spectral processing systems and wireless communication systems that operate at data rates in excess of 1 Gb/s. These known techniques, however, cannot always be applied successfully to the design of low-power and/or high-speed systems. Applying them is particularly difficult for FFT circuits.
The use of pipelining and parallelism techniques for FFT circuits is known. However, there are several approaches to applying parallelism to an FFT circuit, for example, the FFT circuit in a communication transceiver. Many of these approaches may improve the performance of the digital circuit to which they are applied, but at the cost of increased power consumption.
There is a current need for new design techniques and digital logic circuits that can be used to build high-speed digital communication systems and low-power spectral processing systems. In particular, a new design methodology and implementation method are needed that reduce the overall power consumption and hardware cost of FFT circuits.
Digital circuits, and methods for designing digital circuits, that determine output values based on a plurality of input values are provided. As described herein, the present invention can be used in a wide range of applications. The invention is suited for low-power biomedical monitoring systems and high-speed communication systems, although the invention is not limited to just these systems.
The key idea of the proposed design is a parallel FFT circuit that processes consecutive samples with continuous usage of its hardware elements. The present invention proposes a new method to design FFT circuits and also describes a low-power implementation method for the proposed low-complexity FFT circuits. Digital circuits are designed in accordance with an embodiment of the invention as follows. The number of samples (L) of an input stream to be processed in parallel by the digital circuit is chosen, where L is a power of 2 (i.e., L=2^k, where k is a positive integer). A clock rate (C) is selected for the digital circuit, which consumes power (P). An initial circuit that serially processes the samples of the input stream with power consumption P and computes an N-point FFT is formed, where N is a positive integer, in general a power of two. A data flow graph of the N-point FFT that processes N samples in parallel is then designed. The data flow graph is retimed and/or pipelined so that the folded circuit has non-negative delays, and it is then folded with a folding factor of N/L to form an L-parallel circuit that processes L input samples per clock cycle.
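The bookkeeping behind these design steps can be sketched in Python. This is a minimal sketch under stated assumptions, not the patented procedure: `plan_parallel_fft` is a hypothetical helper, the data flow graph is assumed to be a radix-2 flow graph with N/2 butterfly nodes per stage, and each folding set is assumed to hold N/L nodes. That assumption matches the 8-node folding sets used later for N = 16, L = 2 and the pairs of 4-node sets (A and A′) used for L = 4.

```python
def is_power_of_two(x: int) -> bool:
    return x > 0 and (x & (x - 1)) == 0


def plan_parallel_fft(n_points: int, parallelism: int, serial_clock_hz: float) -> dict:
    """Hypothetical bookkeeping for folding the data flow graph of an N-point
    radix-2 FFT into an L-parallel circuit (illustration only)."""
    if not (is_power_of_two(n_points) and is_power_of_two(parallelism)):
        raise ValueError("N and L must both be powers of two")
    if not 2 <= parallelism <= n_points:
        raise ValueError("L must satisfy 2 <= L <= N")
    return {
        # log2(N) butterfly columns in the radix-2 data flow graph
        "stages": n_points.bit_length() - 1,
        # each radix-2 butterfly node consumes two samples
        "nodes_per_stage": n_points // 2,
        # nodes executed by one butterfly unit, i.e. the size of a folding set
        "folding_factor": n_points // parallelism,
        # L samples per cycle need L/2 butterfly units (folding sets) per stage
        "butterfly_units_per_stage": parallelism // 2,
        # same sample rate as the serial circuit at 1/L of its clock
        "clock_hz": serial_clock_hz / parallelism,
    }
```

For example, `plan_parallel_fft(16, 4, 100e6)` (the 100 MHz clock is an arbitrary example value) reports a folding factor of 4 with two butterfly units per stage, and a 50 MHz clock for `plan_parallel_fft(16, 2, 100e6)`.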
In accordance with the present invention, the overall hardware cost of FFT circuits is reduced by using the proposed design. By applying the folding transformation (See, e.g., M. Ayinala, M. Brown and K. K. Parhi, “Pipelined Parallel FFT Architectures via Folding Transformation,” in IEEE Trans. VLSI Systems, 2011), FFT circuits with reduced hardware cost are designed.
In an embodiment, the data flow graph is folded to form at least two parallel processing circuits that are interconnected.
In an embodiment, the digital logic circuit according to the invention forms part of the transmitter and receiver circuits in an OFDM system. The invention can be used in Wireless LAN devices.
In an embodiment, the digital logic circuit according to the invention forms a spectral power computation unit. The invention can be used in biomedical monitoring devices.
Further embodiments, features, and advantages of the present invention, as well as the structure and operation of the various embodiments of the present invention, are described in detail below with reference to the accompanying drawings.
The present invention is described with reference to the accompanying figures. The accompanying figures, which are incorporated herein and form part of the specification, illustrate the present invention and, together with the description, further serve to explain the principles of the invention and to enable a person skilled in the relevant art to make and use the invention.
Table 1 lists the performance comparison for different designs in terms of hardware complexity.
Fast Fourier Transform (FFT) is widely used in digital signal processing (DSP) applications, such as filtering and spectral analysis, to compute the discrete Fourier transform (DFT). The FFT plays a critical role in modern digital communications such as digital video broadcasting and orthogonal frequency division multiplexing (OFDM) systems. Various algorithms have been developed to reduce the computational complexity, of which the Cooley-Tukey radix-2 FFT is one of the most popular.
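For reference, a recursive radix-2 decimation-in-frequency (DIF) FFT is sketched below in Python, together with a direct O(N^2) DFT used only for checking. It illustrates the butterfly structure (sum on the top branch, twiddled difference on the bottom branch) that the circuits in this document fold into hardware; it is a textbook software formulation, not a model of any specific circuit described here, and both function names are illustrative.

```python
import cmath


def fft_radix2_dif(x):
    """Recursive radix-2 DIF FFT; len(x) must be a power of two."""
    n = len(x)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return list(x)
    half = n // 2
    # Butterflies: top = x[k] + x[k + N/2], bottom = (x[k] - x[k + N/2]) * W_N^k
    top = [x[k] + x[k + half] for k in range(half)]
    bottom = [(x[k] - x[k + half]) * cmath.exp(-2j * cmath.pi * k / n)
              for k in range(half)]
    out = [0j] * n
    out[0::2] = fft_radix2_dif(top)     # even-indexed outputs X[2m]
    out[1::2] = fft_radix2_dif(bottom)  # odd-indexed outputs X[2m + 1]
    return out


def dft_naive(x):
    """Direct DFT definition, used only as a reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]
```

Comparing `fft_radix2_dif(x)` with `dft_naive(x)` for any power-of-two length shows agreement up to floating-point rounding.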
Algorithms including radix-4, split-radix and radix-2^2 have been developed based on the basic radix-2 FFT approach. The architectures based on these algorithms are some of the traditional FFT circuits. The radix-2 multi-path delay commutator (R2MDC), one of the most classical approaches to the pipelined implementation of the radix-2 FFT, is shown in
Many FFT circuits that process L samples in parallel have been proposed based on these traditional algorithms. In a prior invention, a 2-parallel FFT circuit was proposed (See, Jaiganesh Balakrishnan and Manish Goel, “Methods and Systems for a Multichannel Fast Fourier Transform (FFT)”, U.S. Pat. No. 7,827,225 B2, November 2010). This circuit processes samples from two different channels instead of from the same channel. Further, the main drawback of prior circuits is that their hardware is not fully utilized, which leads to high hardware complexity. In a direct realization of a 2-parallel circuit for the one shown in
Thus, a new method is needed to design parallel FFT circuits with reduced hardware complexity and power consumption. The proposed designs process L consecutive samples in parallel, where L is a power of 2. Further, the hardware elements of the circuit are utilized 100% of the time.
As will be understood by persons skilled in relevant arts, folding transformation can be used to design parallel circuits. Consider a traditional radix-2 algorithm which is shown in the
In this invention, parallel FFT circuits for complex-valued signals are designed based on the radix-2, radix-2^2 and radix-2^3 algorithms. The same approach can be extended to radix-2^4 and other radices as well. The switch block is as shown in
The 2-parallel FFT circuits are composed of radix-2 butterfly engines connected in cascade. Each butterfly engine processes two samples and computes two output samples, and contains a butterfly computation unit as shown in
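A minimal Python sketch of the butterfly computation follows. The DIF form multiplies the difference at the output by the twiddle factor W_N^k = exp(−j2πk/N); the DIT form multiplies the bottom input before the add/subtract, matching the assumption stated below for the DIT folding sets. The function names are illustrative, and the exact unit in the figure (switches, pipeline registers) is not modelled.

```python
import cmath


def twiddle(k, n):
    """Twiddle factor W_N^k = exp(-j*2*pi*k/N)."""
    return cmath.exp(-2j * cmath.pi * k / n)


def dif_butterfly(a, b, w=1.0):
    """Radix-2 DIF butterfly: sum on top, difference times twiddle on bottom."""
    return a + b, (a - b) * w


def dit_butterfly(a, b, w=1.0):
    """Radix-2 DIT butterfly: the twiddle multiplies the bottom input first."""
    t = b * w
    return a + t, a - t
```

In a butterfly engine of a 2-parallel circuit, one such call corresponds to one clock cycle: two samples enter and two results leave.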
Similarly,
The utilization of hardware components in the circuit shown in
A={A0, A2, A4, A6, A1, A3, A5, A7},
B={B5, B7, B0, B2, B4, B6, B1, B3},
C={C3, C5, C7, C0, C2, C4, C6, C1},
D={D2, D4, D6, D1, D3, D5, D7, D0} (1)
The folded circuit is derived by writing the folding equation for all the edges. Pipelining and retiming are required to obtain non-negative delays in the folded circuit. The data flow graph in
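The folding equation can be sketched for the first two stages of the 16-point radix-2 DIF graph, using folding sets A and B from (1). For an edge U→V carrying w(e) delays, the standard folding equation of the folding transformation gives the folded delay D_F(U→V) = N_f·w(e) − P_U + v − u, where N_f is the folding factor (8 here; written N_f to avoid a clash with the FFT size N), P_U is the number of pipeline stages inside U, and u, v are the positions of U and V in their folding sets. The edge list relies on an assumed node numbering (butterfly Ak emits signal indices k and k+8; node Bj pairs indices i and i+4 within each half), which may not match the labelling of the figure, and P_U = 0 is assumed.

```python
# Folding sets copied from equation (1). A node's position in its set is its
# folding order: the clock cycle (mod the folding factor) in which it executes.
FOLD_A = ["A0", "A2", "A4", "A6", "A1", "A3", "A5", "A7"]
FOLD_B = ["B5", "B7", "B0", "B2", "B4", "B6", "B1", "B3"]


def folding_order(folding_set):
    return {node: cycle for cycle, node in enumerate(folding_set)}


def folded_delay(u, v, w, order_u, order_v, folding_factor, p_u=0):
    """Folding equation: D_F(U -> V) = N_f * w(e) - P_U + v - u."""
    return folding_factor * w - p_u + order_v[v] - order_u[u]


def stage1_to_stage2_edges(n_points=16):
    """ASSUMED numbering (may differ from the figure): Ak emits signal indices
    k and k + N/2; Bj pairs indices i and i + N/4 inside each half of the data,
    so index i feeds Bj with j = (i // (N/2)) * (N/4) + i % (N/4)."""
    half, quarter = n_points // 2, n_points // 4
    edges = []
    for k in range(half):
        for i in (k, k + half):
            j = (i // half) * quarter + i % quarter
            edges.append((f"A{k}", f"B{j}"))
    return edges


def folded_delays(registers_per_edge=0):
    """Folded delay of every A -> B edge, assuming P_U = 0 and the same number
    of registers w(e) on every edge."""
    order_a, order_b = folding_order(FOLD_A), folding_order(FOLD_B)
    return {
        (u, v): folded_delay(u, v, registers_per_edge, order_a, order_b, len(FOLD_A))
        for u, v in stage1_to_stage2_edges()
    }
```

With this numbering, `folded_delays(0)` contains negative entries (A5→B5 and A7→B7 come out at −6 cycles), which is why folding alone is infeasible. Adding one register on every A→B edge (a feed-forward cutset, the simplest legal pipelining) shifts every delay by 8, so `folded_delays(1)` is non-negative throughout. The circuit in the document may use a different pipelining/retiming that needs fewer registers.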
The hardware utilization is 100% in this circuit. In the general case of an N-point FFT, with N a power of 2, the architecture requires log2(N) complex butterflies, log2(N)−1 complex multipliers and 3N/2−2 delay elements or buffers.
In a similar manner, the 2-parallel architecture can be derived for the radix-2 DIT FFT using the following folding sets, assuming that the multiplier is at the bottom input of nodes B, C and D.
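The resource counts stated above can be tabulated with a small helper (a hypothetical name, for illustration only); it simply evaluates log2(N), log2(N)−1 and 3N/2−2 for a power-of-two N.

```python
def radix2_2parallel_cost(n_points: int) -> dict:
    """Resource counts stated for the 2-parallel radix-2 architecture."""
    if n_points < 2 or n_points & (n_points - 1):
        raise ValueError("N must be a power of two, N >= 2")
    stages = n_points.bit_length() - 1  # log2(N) for a power of two
    return {
        "complex_butterflies": stages,
        "complex_multipliers": stages - 1,
        "delay_elements": 3 * n_points // 2 - 2,
    }
```

For N = 16 this gives 4 butterflies, 3 multipliers and 22 delay elements; for N = 64 it gives 6, 5 and 94.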
A={A0, A2, A1, A3, A4, A6, A5, A7},
B={B5, B7, B0, B2, B1, B3, B4, B6},
C={C6, C5, C7, C0, C2, C1, C3, C4},
D={D2, D1, D3, D4, D6, D5, D7, D0}
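One property can be checked directly from the listed folding sets, although the text does not state it explicitly: in both the DIF sets (1) and the DIT sets above, the node orders of stages B, C and D are cyclic rotations of the order of stage A by 2, 3 and 7 positions. Since a node's position is the clock cycle in which it executes, each later stage runs the same index sequence as stage A, offset in time modulo 8. The snippet below (helper names are illustrative) verifies this for both sets.

```python
# Node indices from the folding sets: DIF from equation (1), DIT from the list above.
DIF_SETS = {
    "A": [0, 2, 4, 6, 1, 3, 5, 7],
    "B": [5, 7, 0, 2, 4, 6, 1, 3],
    "C": [3, 5, 7, 0, 2, 4, 6, 1],
    "D": [2, 4, 6, 1, 3, 5, 7, 0],
}
DIT_SETS = {
    "A": [0, 2, 1, 3, 4, 6, 5, 7],
    "B": [5, 7, 0, 2, 1, 3, 4, 6],
    "C": [6, 5, 7, 0, 2, 1, 3, 4],
    "D": [2, 1, 3, 4, 6, 5, 7, 0],
}


def rotate_right(seq, r):
    r %= len(seq)
    if r == 0:
        return list(seq)
    return seq[-r:] + seq[:-r]


def rotation_offsets(sets):
    """For each stage, the right-rotation r of stage A's order that yields it
    (None if the stage's order is not a rotation of stage A's)."""
    base = sets["A"]
    offsets = {}
    for stage, order in sets.items():
        matches = [r for r in range(len(base)) if rotate_right(base, r) == order]
        offsets[stage] = matches[0] if matches else None
    return offsets
```

Both `rotation_offsets(DIF_SETS)` and `rotation_offsets(DIT_SETS)` return offsets 0, 2, 3 and 7 for stages A to D. The 4-parallel sets listed next do not follow this rotation pattern.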
The pipelined/retimed version of the data flow graph is shown in
A 4-parallel architecture can be derived using the following folding sets.
A={A0, A1, A2, A3} A′={A′0, A′1, A′2, A′3},
B={B1, B3, B0, B2} B′={B′1, B′3, B′0, B′2},
C={C2, C1, C3, C0} C′={C′2, C′1, C′3, C′0},
D={D3, D0, D2, D1} D′={D′3, D′0, D′2, D′1}
The data flow graph shown in
The flow graph of the radix-2^2 FFT algorithm is shown in
Consider the folding sets
A={A0, A2, A4, A6, A1, A3, A5, A7},
B={B5, B7, B0, B2, B4, B6, B1, B3},
C={C3, C5, C7, C0, C2, C4, C6, C1},
D={D2, D4, D6, D1, D3, D5, D7, D0} (2)
Using the folding sets above, the final circuit shown in
Similar to the 4-parallel radix-2 circuit, a 4-parallel radix-2^2 circuit can be derived using similar folding sets. The 4-parallel radix-2^2 circuit is shown in
The hardware complexity of the parallel architectures can be further reduced by using radix-2^n FFT algorithms. Consider the example of a 64-point radix-2^3 FFT algorithm. The advantage of the radix-2^3 algorithm over the radix-2 algorithm is its reduced multiplicative complexity. A 2-parallel circuit is derived using folding sets of the same pattern as in (2). Here each stage of the data flow graph contains 32 nodes instead of the 8 nodes of the 16-point FFT.
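The multiplicative saving of radix-2^n algorithms comes from splitting the twiddle factors so that some inter-stage multiplications become trivial. This can be illustrated with the simpler radix-2^2 decomposition, using the textbook index map n = (N/2)n1 + (N/4)n2 + n3 and k = k1 + 2k2 + 4k3, sketched recursively in Python below. Between the two butterfly columns of each pair of stages the factor is (−j)^(k1+2k2), which needs no real multiplier, and a general twiddle W_N^(n3(k1+2k2)) is applied only after every second stage. Radix-2^3 applies the same idea over three stages; that case is not shown, and this software sketch (function name illustrative) is not the circuit of the figures.

```python
import cmath

# Multiplying by (-j)^m only swaps/negates real and imaginary parts.
TRIVIAL = (1, -1j, -1, 1j)


def fft_radix22(x):
    """Recursive radix-2^2 DIF FFT sketch; len(x) must be a power of two."""
    n = len(x)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return list(x)
    if n == 2:
        return [x[0] + x[1], x[0] - x[1]]
    half, quarter = n // 2, n // 4
    out = [0j] * n
    for k1 in (0, 1):
        sign = 1 - 2 * k1  # (-1)^k1
        for k2 in (0, 1):
            m = k1 + 2 * k2
            seq = []
            for n3 in range(quarter):
                # First butterfly column: x(n) +/- x(n + N/2), no multiplication.
                b0 = x[n3] + sign * x[n3 + half]
                b1 = x[n3 + quarter] + sign * x[n3 + quarter + half]
                # Second column: the inter-stage factor is the trivial (-j)^m.
                h = b0 + TRIVIAL[m] * b1
                # Only here is a general twiddle W_N^(n3*m) required.
                seq.append(h * cmath.exp(-2j * cmath.pi * n3 * m / n))
            # The remaining N/4-point DFT over n3 yields X(k1 + 2*k2 + 4*k3).
            for k3, value in enumerate(fft_radix22(seq)):
                out[m + 4 * k3] = value
    return out
```

For any power-of-two length the result matches the DFT definition up to rounding, while general multiplications occur only once per pair of butterfly columns.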
The proposed circuit is shown in
A 4-parallel radix-2^3 circuit can be derived in the same way as the 4-parallel radix-2 FFT circuit. A large number of architectures can be derived using the proposed approach. Using folding sets of the same pattern, 2-parallel and 4-parallel architectures can be derived for the radix-2^2 and radix-2^4 algorithms. Other embodiments not shown here can be derived by a person skilled in the relevant art using the main ideas of this invention.
The proposed design is general and can be applied to any FFT size. It should be noted that the architectures provided here are only a few implementations of the proposed FFT circuits using the radix-2, radix-2^2 and radix-2^3 algorithms. Other circuits for larger FFT sizes (N>16), not shown here, can be derived by a person skilled in the relevant art.
Next, a hardware complexity analysis is presented to demonstrate the complexity reduction of the proposed FFT circuits, followed by an evaluation of their throughput and power consumption.
To evaluate the hardware cost, the designs are compared in terms of the required numbers of complex multipliers, adders, delay elements and twiddle factors, and in terms of throughput. Table 1 shows the hardware complexity comparison between the prior inventions and the proposed designs for computing an N-point FFT.
The proposed circuits are all feed-forward and can process 2 samples in parallel, thereby achieving higher performance than traditional designs, which are serial in nature. Compared with some prior inventions, the proposed design doubles the throughput and halves the latency while maintaining the same hardware complexity.
Next, a comparison is made between the power consumption of a serial circuit similar to the one shown in
$P_{\mathrm{ser}} = C_{\mathrm{ser}} V^2 f_{\mathrm{ser}}$,   (3)
where $C_{\mathrm{ser}}$ denotes the total capacitance of the serial circuit, $V$ is the supply voltage, $f_{\mathrm{ser}}$ is the clock frequency of the circuit, and $P_{\mathrm{ser}}$ is the power consumption of the serial architecture.
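In an L-parallel system, the clock frequency must be decreased to $f_{\mathrm{ser}}/L$ to maintain the same sample rate. Assuming the supply voltage $V$ is unchanged, the power consumption of the L-parallel system can be calculated as
$P_{\mathrm{par}} = C_{\mathrm{par}} V^2 \dfrac{f_{\mathrm{ser}}}{L}$,   (4)
where $C_{\mathrm{par}}$ is the total capacitance of the L-parallel system.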
For example, consider the proposed architecture in
Therefore, the power consumption in a 2-parallel architecture has been reduced by 37% compared to the serial architecture.
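The arithmetic behind such a comparison can be sketched as follows, assuming the same supply voltage for both designs so that the ratio P_par/P_ser reduces to (C_par/C_ser)/L. The helper names are hypothetical, and the capacitance ratio is an input the designer must estimate from the extra hardware of the chosen architecture.

```python
def relative_power(cap_ratio: float, parallelism: int) -> float:
    """P_par / P_ser for P_ser = C_ser V^2 f_ser and P_par = C_par V^2 f_ser / L,
    assuming the same supply voltage V; cap_ratio is C_par / C_ser."""
    if parallelism < 1 or cap_ratio <= 0:
        raise ValueError("need L >= 1 and a positive capacitance ratio")
    return cap_ratio / parallelism


def power_reduction_percent(cap_ratio: float, parallelism: int) -> float:
    """Percentage saving of the L-parallel design relative to the serial one."""
    return 100.0 * (1.0 - relative_power(cap_ratio, parallelism))
```

For example, with a hypothetical capacitance ratio of 1.25 (not a figure taken from this document), `power_reduction_percent(1.25, 2)` evaluates to 37.5. The ratio actually achieved depends on the additional delay elements and other hardware of the architecture being compared.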
Similarly, for the proposed 4-parallel architecture in
Various embodiments of the present invention have been described above, which are independent of the size of the FFT and/or the parallelism level. These various embodiments can be implemented in communication transceivers and spectral processing systems. These various embodiments can also be implemented in systems other than communication systems. It should be understood that these embodiments have been presented by way of example only, and not limitation.
It will be understood by those skilled in the relevant art that various changes in form and details of the embodiments described may be made without departing from the spirit and scope of the present invention as defined in the claims. Thus, the breadth and scope of the present invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.
This application claims the benefit of U.S. Provisional Application No. 61/401,552, filed on Aug. 16, 2010, the entire content of which is incorporated herein by reference in its entirety.
Number | Date | Country
---|---|---
61401552 | Aug 2010 | US