1. Field of the Invention
The present invention relates to implementation of a Fast Fourier Transform (FFT) circuit in a real-time system, for example an IEEE 802.11a based Orthogonal Frequency Division Multiplexing (OFDM) receiver.
2. Background Art
The Fast Fourier Transform (FFT) and Inverse Fast Fourier Transform (IFFT) have been frequently applied in modern communication systems due to their efficiency in OFDM systems such as xDSL modems, high definition television (HDTV), and wireless local area networking applications. Examples of wireless local area networking applications include wireless LANs (i.e., wireless infrastructures having fixed access points), mobile ad hoc networks, etc. In particular, the IEEE Standard 802.11a, entitled “Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) specifications: High-speed Physical Layer in the 5 GHz Band”, specifies an OFDM PHY for a wireless LAN with data payload communication capabilities of up to 54 Mbps. The IEEE 802.11a Standard specifies a PHY system that uses fifty-two (52) subcarrier frequencies that are modulated using binary or quadrature phase shift keying (BPSK/QPSK), 16-quadrature amplitude modulation (QAM), or 64-QAM.
A fundamental computational element of the FFT is the “butterfly element”, which in its simplest form (radix-2) transforms two complex values into two other complex values. The butterfly element is used to perform multiple calculations in the different stages of the transform, resulting in synthesis from the time domain to the frequency domain or vice versa.
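For illustration only, the radix-2 case can be sketched in a few lines of Python. The function name `radix2_butterfly` and the twiddle-factor parameter `w` are assumptions made for this sketch and do not appear in the description; the decimation-in-time form is assumed.

```python
def radix2_butterfly(a, b, w=1):
    """Radix-2 butterfly (decimation-in-time form, assumed for this sketch).

    Transforms two complex values (a, b) into two other complex values,
    using the twiddle factor w.
    """
    t = w * b          # single complex multiplication
    return a + t, a - t


# Two equal inputs with a unity twiddle: the sum carries all the energy.
print(radix2_butterfly(1 + 0j, 1 + 0j))
```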
The substantial number of calculation operations performed by the butterfly element requires highly efficient designs in order to be viable in a real-time system such as wireless LANs. For example, Radix-4 butterfly elements, having four (4) inputs and four (4) outputs, are used to reduce the number of multiplication operations required during FFT processing. The higher radix butterfly element enables a reduction in memory access rate, arithmetic workload, and hence the power consumption. Efficient memory allocation also is an important consideration: in-place computation has been used to reduce memory requirements by overwriting input values (e.g., in the time domain) supplied to the butterfly element with the respective output values (e.g., in the frequency domain) generated by the butterfly element.
The use of a butterfly element, however, requires a substantial number of repeated memory read and write operations for retrieval of input values and storage of output values. Hence, arbitrary techniques for implementing an FFT architecture may result in inefficient use of memory, requiring substantial memory controller resources that increase circuit cost and/or reduce performance of the FFT circuit.
There is a need for an arrangement that enables an FFT circuit to be implemented in a manner that provides minimal latency, optimal memory utilization and power efficiency.
There also is a need for an arrangement that provides optimal utilization of a butterfly element in an FFT circuit, with minimal idle time.
There also is a need for an arrangement that enables a wireless transceiver to perform equalization of a received frequency-modulated signal with minimum equalization error.
These and other needs are attained by the present invention, where an FFT circuit is implemented using a radix-4 butterfly element and a partitioned memory for storage of a prescribed number of data values. The radix-4 butterfly element is configured for performing an FFT operation in a prescribed number of stages, each stage including a prescribed number of in-place computation operations relative to the prescribed number of data values. The partitioned memory includes a first memory portion and a second memory portion, and the data values for the FFT circuit are divided equally for storage in the first and second memory portions in a manner that ensures that each in-place computation operation is based on retrieval of an equal number of data values retrieved from each of the first and second memory portions.
One aspect of the present invention provides a method in a Fast Fourier Transform (FFT) circuit having at least a Radix-4 (or higher) butterfly element. The method includes storing first and second equal portions of a prescribed number of data values in first and second memory portions, respectively, according to a prescribed mapping that ensures the first and second memory portions are accessed for each in-place computation operation. The method also includes executing a prescribed number of FFT stages each having a prescribed number of the in-place computation operations relative to the prescribed number of data values. The executing step includes performing each in-place computation operation by: (1) concurrently accessing an equal number of stored data values from the first memory portion and the second memory portion; and (2) supplying the accessed data values to the at least Radix-4 butterfly element for calculation of respective calculation results.
Another aspect of the present invention provides a Fast Fourier Transform (FFT) circuit. The FFT circuit includes at least a Radix-4 (or a higher Radix) butterfly element configured for generating calculation results in response to receipt of accessed data values, first and second memory portions, and a memory controller. The first and second memory portions are configured for storing first and second equal portions of a prescribed number of data values for in-place computation operations. The memory controller is configured for storing the first and second equal portions of the prescribed number of data values in the first and second memory portions, respectively, according to a prescribed mapping that ensures the first and second memory portions are accessed for each in-place computation operation. The memory controller also is configured for executing a prescribed number of FFT stages, each having a prescribed number of the in-place computation operations relative to the prescribed number of data values, based on: (1) concurrently accessing an equal number of stored data values from the first memory portion and the second memory portion; and (2) supplying the accessed data values to the at least Radix-4 butterfly element for calculation of the respective calculation results.
Additional advantages and novel features of the invention will be set forth in part in the description which follows and in part will become apparent to those skilled in the art upon examination of the following or may be learned by practice of the invention. The advantages of the present invention may be realized and attained by means of instrumentalities and combinations particularly pointed out in the appended claims.
Reference is made to the attached drawings, wherein elements having the same reference numeral designations represent like elements throughout and wherein:
The Radix-4 butterfly element 12 is configured for concurrently receiving four inputs (A1, A2, B1, B2) and generating and concurrently outputting four calculation results (A′1, A′2, B′1, B′2), according to known Radix-4 butterfly operations for performing FFT calculations.
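As a minimal sketch of one known Radix-4 butterfly operation, the following Python function computes a 4-point DFT core followed by twiddle multiplication (decimation-in-frequency form). The function name, the input ordering `x0..x3`, and the twiddle parameters `w1..w3` are assumptions for illustration; the description does not state how the inputs A1, A2, B1, B2 map onto these positions.

```python
def radix4_butterfly(x0, x1, x2, x3, w1=1, w2=1, w3=1):
    """Radix-4 butterfly: four complex inputs in, four complex results out.

    Decimation-in-frequency form (an assumption of this sketch). The
    4-point core needs only additions and multiplications by +/-j; the
    general complex multiplications are the three twiddles w1, w2, w3.
    """
    s02, d02 = x0 + x2, x0 - x2
    s13, d13 = x1 + x3, x1 - x3
    y0 = s02 + s13
    y1 = (d02 - 1j * d13) * w1
    y2 = (s02 - s13) * w2
    y3 = (d02 + 1j * d13) * w3
    return y0, y1, y2, y3


# A unit impulse transforms to a flat spectrum.
print(radix4_butterfly(1, 0, 0, 0))
```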
The memory portions 16a and 16b are configured for storing equal portions of a prescribed number of data values for in-place computation operations. In particular, assuming a 64-point FFT is to be generated, each memory portion 16a and 16b is configured for storing half of the input points, such that in this case each memory portion stores thirty-two (32) points.
As described below, the memory controller 14 is configured for initially storing the 64-point data values into the memory banks 16a and 16b according to a prescribed mapping that ensures each of the memory banks 16a and 16b is accessed for each in-place computation operation.
As illustrated in
The memory controller 14 maintains this prescribed mapping of data points using in-place computation. Consequently, memory access is optimized by ensuring that both memory banks 16a and 16b are concurrently accessed for each read operation, and that both memory banks 16a and 16b are concurrently accessed for each write operation. Further, memory portions 16a and 16b are configured as dual port memory devices, enabling concurrent read and write operations for the memory banks 16a and 16b (i.e., performed in parallel). Hence, all of the data paths 18a, 18b, 18c, and 18d can be utilized at the same time during a given clock cycle, optimizing memory utilization and minimizing latency.
The memory controller 14 is configured for implementing in-place computations by supplying the four inputs (A1, A2, B1, B2) to the butterfly element 12, and transferring the four outputs (A′1, A′2, B′1, B′2) from the butterfly element 12 to the memory portions 16a and 16b. In particular, the memory controller 14 is configured for retrieving, each clock cycle, a data value (A) from the first memory portion (“Bank 2”) 16a and concurrently a data value (B) from the second memory portion (“Bank 1”) 16b via data paths 18a and 18b, respectively. The memory controller 14 also is configured for storing, each clock cycle, a calculation result (A′) to the first memory portion 16a and concurrently a calculation result (B′) to the second memory portion 16b via data paths 18c and 18d, respectively.
For example, the memory controller 14 is configured for concurrently retrieving the stored data values A1 and B1 from the respective memory portions 16a and 16b during clock cycle C1, and retrieving the stored data values A2 and B2 from the respective memory portions 16a and 16b during clock cycle C2; the memory controller 14 buffers the accessed data values A1 and B1 retrieved during the first clock cycle C1, enabling the four inputs A1, A2, B1 and B2 to be supplied in parallel during the clock cycle C2 to the butterfly element 12. The calculation results A′1, A′2, B′1, and B′2 are output in parallel by the butterfly element 12.
As described below, the memory controller 14 completes the in-place computation by outputting the calculation results A′1, A′2, B′1, B′2 to the address locations corresponding to the original inputs A1, A2, B1, B2.
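The read, buffer, compute and write-back sequence described above can be sketched as follows. This is a behavioral illustration only: the function name `in_place_operation`, the use of Python lists as memory banks, and the `butterfly` callable are assumptions of the sketch, and the hardware overlap of reads and writes across operations is not modeled.

```python
def in_place_operation(bank_a, bank_b, addrs_a, addrs_b, butterfly):
    """One in-place computation using two reads and two writes per bank.

    bank_a, bank_b : lists standing in for memory portions 16a and 16b
    addrs_a        : (address of A1, address of A2) within bank_a
    addrs_b        : (address of B1, address of B2) within bank_b
    butterfly      : callable taking (A1, A2, B1, B2), returning four results
    """
    a1_addr, a2_addr = addrs_a
    b1_addr, b2_addr = addrs_b

    # Clock cycle C1: one read from each bank; the values are buffered.
    buffered_a1, buffered_b1 = bank_a[a1_addr], bank_b[b1_addr]

    # Clock cycle C2: second read from each bank; all four inputs are
    # now available and are supplied to the butterfly in parallel.
    a2, b2 = bank_a[a2_addr], bank_b[b2_addr]
    ra1, ra2, rb1, rb2 = butterfly(buffered_a1, a2, buffered_b1, b2)

    # Write-back: each result overwrites the location of its original input.
    bank_a[a1_addr], bank_b[b1_addr] = ra1, rb1
    bank_a[a2_addr], bank_b[b2_addr] = ra2, rb2


# Usage with a stand-in butterfly that simply increments each value.
bank_a, bank_b = [1, 2, 3], [10, 20, 30]
in_place_operation(bank_a, bank_b, (0, 2), (1, 2),
                   lambda a1, a2, b1, b2: (a1 + 1, a2 + 1, b1 + 1, b2 + 1))
print(bank_a, bank_b)
```

Because the results reuse the input addresses, no additional storage beyond the two banks and the one-cycle input buffer is needed in this sketch.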
As illustrated in
During event 64 (clock cycle 3), the memory controller 14 concurrently: stores the resulting product B′1 to the location for data point “0” in Bank 1 16b; stores the resulting product A′1 to the location for data point “16” in Bank 2 16a; retrieves the data value “17” from Bank 1 16b for execution of Stage 1, Operation 1 (S1_Op1); and retrieves the data value “1” from Bank 2 16a for execution of Stage 1, Operation 1 (S1_Op1). The memory controller 14 continues accessing the memory banks 16a and 16b for sequential execution of the Stage 1 operations 30a.
At event 66 (clock cycle 33), the butterfly element 12 executes the last Stage 1 operation (S1_Op15) and outputs the calculation results for data points “15”, “31”, “47”, “63”. During event 66 the memory controller 14 stores the calculation results for data points “15” and “31” in Bank 1 and Bank 2, respectively, and accesses the stored data values “0” and “4” from Bank 1 and Bank 2, respectively, for initiating execution of the first Stage 2 operation (S2_Op0) in step 42. The “D” reference in
The butterfly element 12 executes the last Stage 2 operation (S2_Op15) at event 68 (clock cycle 65), and the memory controller 14 concurrently stores the resulting products and retrieves the inputs as described above for initiation of stage 3 operations in step 44.
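The timing of events 64, 66 and 68 can be reproduced with a small model of the sequential schedule. The sketch below rests on several assumptions that the description does not state explicitly: operations are numbered so that operation `op` of a stage uses four points spaced by 16, 4 or 1; the two lowest-indexed points of an operation are read first; and an operation's first pair of results is written one clock cycle after its second read. The names `operation_points` and `accesses` are hypothetical.

```python
def operation_points(stage, op):
    """Data-point indices used by operation `op` (0-15) of `stage` (1-3)."""
    stride = 4 ** (3 - stage)                      # 16, 4, 1
    base = (op // stride) * 4 * stride + op % stride
    return [base + k * stride for k in range(4)]


def accesses(cycle):
    """Return (reads, writes) as data-point lists for a 1-based clock cycle."""
    def pair(n, half):
        # n-th operation overall (0-47) in stage order; half 0 selects
        # the first two points of the operation, half 1 the last two.
        if not 0 <= n < 48:
            return []
        points = operation_points(n // 16 + 1, n % 16)
        return points[2 * half: 2 * half + 2]

    n, half = divmod(cycle - 1, 2)
    # Reads belong to operation n; writes belong to the previous operation.
    return pair(n, half), pair(n - 1, half)


for cycle in (3, 33, 65):
    reads, writes = accesses(cycle)
    print(f"cycle {cycle}: read {reads}, write {writes}")
```

Under these assumptions, clock cycle 3 writes points 0 and 16 while reading 1 and 17, and clock cycle 33 writes points 15 and 31 while reading 0 and 4, which agrees with the events described above.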
Note that the sequence of Stage 2 operations also can be selected based on execution of a Stage 3 operation: as illustrated in
Hence, after the memory controller 14 has completed execution in steps 48 and 49 of four (4) Stage 2 operations associated with a Stage 3 operation, e.g., S2_Op0, S2_Op1, S2_Op2, and S2_Op3 (at which point all Stage 1 operations 30a have been completed), the memory controller 14 can initiate four (4) Stage 3 operations (S3_Op0) in step 50. Assuming in step 53 that more Stage 3 operations need to be executed, the Stage 2 operations 30b can then be completed in groups of 4, followed by execution of the associated Stage 3 operations 30c.
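One possible reading of this grouped ordering is sketched below as a Python generator. It assumes that Stage 2 operations 4g through 4g+3 produce exactly the data points consumed by Stage 3 operations 4g through 4g+3, which holds if operations are numbered by ascending data-point index within each block of sixteen points; that numbering and the name `interleaved_order` are assumptions of the sketch.

```python
def interleaved_order():
    """Yield (stage, op) pairs: all of Stage 1, then Stages 2 and 3 in groups of four."""
    # Every Stage 2 operation depends on four different Stage 1 operations,
    # so Stage 1 is completed first.
    for op in range(16):
        yield (1, op)
    for group in range(4):
        ops = range(4 * group, 4 * group + 4)
        for op in ops:          # four Stage 2 operations ...
            yield (2, op)
        for op in ops:          # ... then the Stage 3 operations they feed
            yield (3, op)


order = list(interleaved_order())
print(order[16:24])
```

In this ordering the first Stage 3 results become available after 24 operations rather than after 33, which may be useful when downstream processing can start on partial output.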
As illustrated in
Although the disclosed embodiment utilizes a Radix-4 butterfly, it will be appreciated that other higher-order (e.g., Radix-8) butterfly elements also may be used with appropriate modification to the memory controller.
Assume an address index a[5:0], where a[0] is the least significant bit, for each data value accessed during a read or write operation of the 64-point FFT. An exclusive OR operation identifies the memory bank: if F(a)=XOR(a[4], a[2], a[0])=0, then memory bank 1 is the corresponding memory; if F(a)=XOR(a[4], a[2], a[0])=1, then memory bank 2 is the corresponding memory. In either case the actual address inside the selected memory bank is a[5:1], i.e., the five (5) most significant bits of the address without memory partition. Hence, the address values A11, A12, and would have the following mappings:
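The bank-selection rule can be checked with a short Python sketch. The function names `bank_and_address` and `banks_balanced` are hypothetical; the balance check assumes that each Radix-4 operation uses four data points spaced by 16, 4 or 1, which is an assumption about the stage structure rather than a statement taken from the description.

```python
def bank_and_address(a):
    """Map a 6-bit data index a[5:0] to (bank number, address within bank)."""
    parity = ((a >> 4) ^ (a >> 2) ^ a) & 1      # XOR(a[4], a[2], a[0])
    return (1 if parity == 0 else 2), a >> 1     # address is a[5:1]


def banks_balanced():
    """True if every group of four points spaced by 16, 4 or 1 hits each bank twice."""
    for stride in (16, 4, 1):
        for base in range(64):
            if (base // stride) % 4 == 0:        # first point of a group of four
                banks = [bank_and_address(base + k * stride)[0] for k in range(4)]
                if banks.count(1) != 2:
                    return False
    return True


for index in (0, 16, 1, 17):
    print(index, bank_and_address(index))
print("balanced:", banks_balanced())
```

Because a[0] takes part in the XOR, the two indices that share a given a[5:1] always fall in different banks, so a[5:1] is a unique address within each 32-entry bank.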
An alternative implementation of the memory controller 14 would be to use a look-up table to specify which memory each data value belongs to and its associated memory index (i.e., memory address within the memory bank).
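A minimal sketch of the look-up table alternative follows. The table names `BANK_LUT` and `INDEX_LUT` and the function `lookup` are hypothetical, and the tables are populated here from the XOR rule described above purely for illustration; a table could equally hold any other mapping that keeps the two banks balanced.

```python
# Tables built once (e.g., at reset). Entry a gives the bank and the
# in-bank index for data value a; contents here follow the XOR rule.
BANK_LUT = tuple(1 + (((a >> 4) ^ (a >> 2) ^ a) & 1) for a in range(64))
INDEX_LUT = tuple(a >> 1 for a in range(64))


def lookup(a):
    """Return (bank number, memory index) for data value a via table look-up."""
    return BANK_LUT[a], INDEX_LUT[a]


print(lookup(0), lookup(63))
```

A table trades a small amount of storage for the freedom to change the mapping without altering the address logic.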
While this invention has been described with what is presently considered to be the most practical preferred embodiment, it is to be understood that the invention is not limited to the disclosed embodiments, but, on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.