The present invention relates to a fast Fourier transform circuit for executing Fourier transform at high speed.
In a fast Fourier transform based on the Cooley-Tukey algorithm (hereinafter simply called a fast Fourier transform), a computation is generally executed using a 2-point DFT (Discrete Fourier Transform) as a component (refer to PTL1 to PTL3). For still faster processing, a computation can instead be executed using a 4-point DFT as a component (a radix-4 fast Fourier transform). In this regard, for a radix-2 fast Fourier transform, the number of data “N” needs to be a power of 2, whereas for a radix-4 fast Fourier transform, the number of data “N” needs to be a power of 4.
In order to make processing faster, a combination of a 2-point DFT and a 4-point DFT can be used for processing in the case of the number of data “N” being a power of 2.
PTL1: JP2006-155487A
PTL2: JP2007-148623A
PTL3: JP2004-516551A
In the case where complex data stored in a memory is processed in a fast Fourier transform, it is necessary in a fast Fourier transform applying radix-4 that, for one 4-point DFT, four pieces of complex data are read out of the memory, and then four pieces of complex data output as a result of a computation are written in the memory.
In a commonly-used fast Fourier transform, address intervals of reading and writing in relation to the memory change in the computation of each stage, and therefore it is impossible to read and write four pieces of data at the same time. Accordingly, as the number of cycles needed for executing one 4-point DFT, four cycles are needed if a single port memory is used.
Even in the case of a single port memory, the memory may be segmented into four portions in order to enable reading and writing four pieces of data at the same time. However, when an LSI is made in that case, the greater the number of segments, the larger the required area becomes because of the test circuit added to each memory macro, and the more difficult the placement work becomes. As a result, the mounting rate does not increase, which leads to a larger memory chip area.
It is an object of the present invention to give a solution for such inconvenience, and to provide a fast Fourier transform circuit that enables high-speed reading and writing of data to be processed in a computation of each stage of a fast Fourier transform, without segmenting a memory.
A fast Fourier transform circuit according to the present invention includes: a computation unit for executing fast Fourier computations with a plurality of Discrete Fourier Transformations as components; memories for storing input/output data of the computation unit; and control means for controlling writing a computation result produced by the computation unit in the memories in such a way that sequential order of reading data from the memories becomes the same for each stage with respect to computations for a plurality of stages, which the computation unit executes for target data.
According to the present invention, data to be processed in a computation of each stage of a fast Fourier transform can be read out and written in at high speed, without segmenting a memory, so that high-speed processing can be implemented while controlling an increase of an area of an LSI.
Preferred embodiments according to the present invention are explained below with reference to the accompanying drawings.
The fast Fourier transform circuit includes a 4-point DFT computation unit 1, memories 2A and 2B, a read buffer 3, a write buffer 4, a selector 5, a twiddle factor generation unit 6, and a control unit 7.
The 4-point DFT computation unit 1 executes a fast Fourier computation using a Discrete Fourier Transform applying a radix B=4 as a component.
The memories 2A and 2B store the input/output data and the intermediate values of the computations of the P=log_B N stages executed by the 4-point DFT computation unit 1, with respect to data for which the number of data “N” is a power of a radix “B.” The memories 2A and 2B each have a capacity for storing “N” pieces of complex data, and one address is able to store four pieces of complex data.
Having a configuration of “B pieces of complex data×(N/B)×2 banks”, the read buffer 3 stores data read out of the memories 2A and 2B, and outputs each data for radix “B” to the 4-point DFT computation unit 1. The read buffer 3 has a configuration including 2 banks for “N” pieces of complex data. Therefore, while one bank reads data out of the memories 2A and 2B, the other bank can output data to the 4-point DFT computation unit 1.
Having a configuration of “B pieces of complex data×(N/B)×2 banks”, the write buffer 4 stores a computation result of each stage, which is obtained by the 4-point DFT computation unit 1, and writes it in the memories 2A and 2B. The write buffer 4 has a configuration including 2 banks for “N” pieces of complex data. Therefore, while one bank writes a computation result in the memories 2A and 2B, the other bank can receive a computation result from the 4-point DFT computation unit 1.
The selector 5 selects a read source and a write destination in the memories 2A and 2B. Namely, the selector 5 controls in such a way that one of the memories 2A and 2B works as a read side and the other of them works as a write side. Then, there is no chance that either of the memories 2A and 2B has reading operation and writing operation together at the same time.
The twiddle factor generation unit 6 generates a twiddle factor by which an output from the 4-point DFT computation unit 1 is multiplied. Since the “twiddle factor” is well known in the field of the fast Fourier transform, an explanation of it is omitted here. In an example that
The control unit 7 generates read addresses and write addresses for the memories 2A and 2B, and also generates addresses for the read buffer 3 and the write buffer 4. In doing so, it controls writing a computation result produced by the computation unit in the memories in such a way that the sequential order of reading data from the memories becomes the same at each stage with respect to computations for a plurality of stages, which the computation unit executes for target data. Furthermore, the control unit 7 controls the generation of a twiddle factor by the twiddle factor generation unit 6, and also regulates and controls the number of computations by the 4-point DFT computation unit 1 as well as the processing stages.
Besides the memories 2A and 2B, an input memory and an output memory may additionally be provided individually. In such a case, the selector 5 has a configuration that makes it possible to access these memories.
(Explanation of Fast Fourier Transform)
A fast Fourier transform based on the Cooley-Tukey algorithm can be broken into P=log_B N groups of DFT computations, where the number of data is expressed as “N” (“N” being a power of a radix “B”). One such group of DFT computations is referred to as a stage, and the stages are referred to as a first stage, a second stage, . . . , a P-th stage, starting from the input side. In the case of a radix B=4 and the number of data N=16, the number of stages is P=2, and four 4-point DFT computations are executed at each stage.
It is known that a fast Fourier transform based on the Cooley-Tukey algorithm can have two types of configurations, namely a decimation-in-time type and a decimation-in-frequency type, depending on the way of breaking it down into 2-point DFTs and 4-point DFTs. Here, a fast Fourier transform having a radix-4 decimation-in-frequency configuration is explained as an example.
{Math. 1}
X′(0)=x′(0)+x′(1)+x′(2)+x′(3)
X′(1)=x′(0)−jx′(1)−x′(2)+jx′(3)
X′(2)=x′(0)−x′(1)+x′(2)−x′(3)
X′(3)=x′(0)+jx′(1)−x′(2)−jx′(3) (1)
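As an illustration only, the 4-point DFT of Math. 1 can be sketched in Python as follows. The function name `dft4` is hypothetical and is not part of the described circuit; the sketch omits the twiddle-factor multiplication that follows the DFT in an actual stage.

```python
def dft4(x):
    """4-point DFT written out term by term as in Math. 1.

    x is a sequence of four complex numbers x'(0) .. x'(3);
    the return value is the list X'(0) .. X'(3).
    """
    x0, x1, x2, x3 = x
    return [
        x0 + x1 + x2 + x3,            # X'(0)
        x0 - 1j * x1 - x2 + 1j * x3,  # X'(1)
        x0 - x1 + x2 - x3,            # X'(2)
        x0 + 1j * x1 - x2 - 1j * x3,  # X'(3)
    ]


if __name__ == "__main__":
    # A unit impulse at index 1 yields the four powers of -j.
    print(dft4([0, 1, 0, 0]))
```

Only additions, subtractions, and multiplications by ±j appear, which is why the 4-point DFT itself needs no general complex multiplier.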
The fast Fourier transform explained with reference to
(Read Addresses as Well as Write Addresses in the Memories)
In a fast Fourier transform of a decimation-in-frequency type, output data of a DFT computation at an S-th stage (wherein S=1, 2, . . . , P) is stored in such a manner as described below, in general. It is assumed that an input data series x(n) is stored, starting from a least significant address side in the memory, so as to be arranged in ascending order along a time axis. On this occasion, an explanation is made on the premise that a following write address WA (S, k, m) and a read address RA (S, k, m) are addresses in which each complex data is stored. The “k” represents sequential order of 4-point DFT computations, under the condition of k=0 through (N/4−1).
The write address WA (S, k, m) is calculated by using the following expression. On this occasion, owing to the condition of m=0, 1, 2, and 3, and 4-point DFT, four pieces of complex data are output per one DFT computation.
The “(a mod b)” represents taking a remainder after dividing “a” by “b.” An example of N=16 is as shown in
By using these addresses as calculated, the addresses stored in the memory after completion of the final stage (P-th stage) are taken out by means of bit reversing. Accordingly, data after transformation is sorted in ascending order.
In the embodiment shown in
Specifically, the control unit 7 specifies the write address WA (S, k, m), under the condition of m=0, 1, . . . , B−1, for writing data coming from the write buffer 4 in either of the memories 2A and 2B at each S-th stage (S=1, 2, . . . , P), as the sum of: the product of “m” and the quotient obtained by dividing the number of data “N” by the radix “B” to the power of “P−S+1” (wherein “P−S” represents the number of stages remaining); the product of the radix “B” to the power of “S” and the quotient obtained by dividing the sequential order number “k” by the radix “B” to the power of “S−1”; and the remainder after dividing the sequential order number “k” by the radix “B” to the power of “S−1.” Meanwhile, the control unit 7 specifies the read address RA (S, k, m) for reading data from either of the memories 2B and 2A into the read buffer 3 at each S-th stage (S=1, 2, . . . , P), as the sum of: the product of “m” and the quotient obtained by dividing the number of data “N” by the radix “B”; and the sequential order number “k.”
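In the case of the radix B=4, the write address WA (S, k, m) is described with the expression shown below, wherein ⌊a/b⌋ represents the quotient obtained by dividing “a” by “b”:
{Math. 3}
WA(S, k, m)=(N/4^(P−S+1))×m+4^S×⌊k/4^(S−1)⌋+(k mod 4^(S−1)) (3)
In the case of the radix B=4, the read address RA (S, k, m) is described with the expression shown below:
{Math. 4}
RA(S, k, m)=(N/4)×m+k (4)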
By using the addresses calculated in the way described above, a process of bit reversing becomes unnecessary, and the read addresses in the DFT computation can be kept the same at every stage, so that the read buffer 3 and the write buffer 4 can be introduced. Furthermore, a configuration combining a DFT with the radix B=2 and a DFT with the radix B=4 can be implemented easily.
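The address rules stated in words above can be sketched in Python as follows. This is a minimal illustration, not the circuit itself: the function names are hypothetical, addresses count single pieces of complex data, and the check at the end assumes N=64 and B=4.

```python
def num_stages(N, B):
    """Return P such that N == B**P; raise if N is not a power of B."""
    P, n = 0, 1
    while n < N:
        n *= B
        P += 1
    if n != N:
        raise ValueError("N must be a power of B")
    return P


def write_address(N, B, P, S, k, m):
    """WA(S, k, m): where output m of the k-th DFT of stage S is written."""
    return ((N // B ** (P - S + 1)) * m
            + (k // B ** (S - 1)) * B ** S
            + (k % B ** (S - 1)))


def read_address(N, B, k, m):
    """RA(S, k, m): where input m of the k-th DFT is read; independent of S."""
    return (N // B) * m + k


if __name__ == "__main__":
    N, B = 64, 4
    P = num_stages(N, B)
    for S in range(1, P + 1):
        addrs = [write_address(N, B, P, S, k, m)
                 for k in range(N // B) for m in range(B)]
        # Every stage must write each of the N locations exactly once.
        assert sorted(addrs) == list(range(N))
        print("stage", S, "writes of DFT k=0:", addrs[:B])
```

Because the read address does not depend on S, the same fetch sequence can be reused at every stage; only the write side changes from stage to stage.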
The read buffer 3 and the write buffer 4 individually have a configuration including two banks, and each one of the banks has a capacity that enables storing “N” pieces of complex data. The configuration makes it possible to store data for “N/4” addresses of the memories 2A and 2B in each one bank, when four pieces of complex data in the case of the radix-B=4 are stored at an address of the memories 2A and 2B. For explanation, these banks are called a bank #0, and a bank #1.
At the time of starting a DFT process for one stage, data for 4 addresses of the memories 2A and 2B is read out, and the data is stored in one of the banks of the read buffer 3. Four pieces of complex data are stored in RD (Bi′), wherein the RD (Bi′) is read data for an address “Bi” of the memories 2A and 2B. This situation is expressed by using ‘Ai’ shown in
{Math. 5}
RD(Bi)={RD′(A(4i+3)), RD′(A(4i+2)), RD′(A(4i+1)), RD′(A(4i))}, i=0, 1, . . . , N/4−1 (5)
Data of the read buffer 3 “RB” (rb, rn) at the time of executing a k-th DFT process is expressed as below; wherein the “rb” (={0, 1}) is a bank number of the read buffer 3, and the “rn” is an address of the read buffer 3 (for one piece of complex data).
The 4-point DFT computation unit 1 executes a 4-point DFT process by using the data of the read buffer 3 stored in this way. N/4-process data of the 4-point DFT process is stored in the read buffer 3. In parallel with execution of the 4-point DFT process, subsequent N/4-process data of the 4-point DFT process is read out of the memories 2A and 2B, and it is stored in the other of the banks of the read buffer 3. Since four pieces of complex data are stored in one address of the memories 2A and 2B, data corresponding to one process of the 4-point DFT process can be read out of the memories 2A and 2B by way of one cycle. Then, 1-process data of the 4-point DFT process can be read out of the memories 2A and 2B at the same time as the DFT process, and therefore the 4-point DFT process can be executed by way of one cycle.
The same can also be applied to the write side. Four pieces of complex data to be output through one 4-point DFT process are stored in one of the banks of the write buffer 4; and in the meantime, four pieces of complex data stored in the other of the banks are written in the memories 2A and 2B. Then, one process of writing in the memories can be implemented, during execution of one DFT process.
Data of the write buffer 4 “WB” (wb, wn) is expressed as below; wherein the “wb” (={0, 1}) is a bank number of the write buffer 4, the “wn” is an address of the write buffer 4, and output data of a k-th DFT process is X′(k, m); (wherein m={0, 1, 2, 3}).
{Math. 7}
WB(wb, wn)=X′(k, m)
wn=4×(k mod 4)+m
k=0 . . . (N/4−1) (7)
{Math. 8}
WB(wb, wn)=X′(k, m)
wn=4×m+(k mod 4)
k=0, . . . , (N/4−1) (8)
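The two write-buffer address rules can be compared with a short Python sketch. It assumes that Math. 7 is the first-stage rule, which the text implies by assigning expression (8) to the second and subsequent stages; the function name is hypothetical.

```python
def write_buffer_address(stage, k, m):
    """Address wn inside one write-buffer bank for output X'(k, m).

    Assumption: Math. 7 is used at the first stage and Math. 8 from the
    second stage onward.
    """
    if stage == 1:
        return 4 * (k % 4) + m   # Math. 7: outputs of one DFT are adjacent
    return 4 * m + (k % 4)       # Math. 8: outputs of one DFT are 4 apart


if __name__ == "__main__":
    for stage in (1, 2):
        layout = [[write_buffer_address(stage, k, m) for m in range(4)]
                  for k in range(4)]
        print("stage", stage, layout)
```

Under this reading, Math. 8 is the transpose of Math. 7 within a 4×4 block: in Math. 7 the four outputs of one DFT fill one memory word, whereas in Math. 8 each memory word collects the same output index m from four consecutive DFTs.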
At each stage, N/4-process data of the 4-point DFT process in the 4-point DFT computation unit 1 needs to be stored in the read buffer 3. Therefore, the 4-point DFT computation unit 1 does not execute a process during four cycles right after starting each stage, but reads data required for the process, out of the memories 2A and 2B, in order to store the data in the read buffer 3. After completion of reading out 4×(N/4) pieces of complex data, the bank of the read buffer 3 is switched, and then data is read out of the memories 2A and 2B, and stored in the read buffer 3. At the same time, data is read out of the other bank of the read buffer 3, and a process of the 4-point DFT computation unit 1 is executed.
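The four-cycle prefetch can be illustrated with a small Python sketch. It assumes the read address (N/4)×m+k given earlier and the packing of four consecutive pieces of complex data per memory word; the function names and the choice N=64 are illustrative only.

```python
def words_for_group(N, g):
    """Memory word addresses fetched to feed DFTs k = 4g .. 4g+3.

    Input m of DFT k sits at complex-data address (N/4)*m + k, and each
    memory word holds four consecutive complex-data addresses.
    """
    return [((N // 4) * m + 4 * g) // 4 for m in range(4)]


def word_contents(w):
    """Complex-data addresses held in memory word w."""
    return [4 * w + lane for lane in range(4)]


if __name__ == "__main__":
    N = 64
    for w in words_for_group(N, 0):
        print("word", w, "holds x", word_contents(w))
```

Each fetched word contributes one input to each of four consecutive DFTs, so four word reads are needed before the first DFT of a group can start, after which the buffered data suffices for four DFTs.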
Until results of four 4-point DFT processes are stored, the write buffer 4 does not write in the memories 2A and 2B. If once the N/4-process results of the 4-point DFT process are stored after starting the process, the write buffer 4 switches the bank, and then stores a result of a subsequent DFT process in the bank switched. Then, in parallel, the write buffer 4 writes a result of a DFT process, being stored in the other bank, in the memories 2A and 2B.
For a process of one stage, four cycles are required for each of reading out of the memories 2A and 2B to the read buffer 3, and writing from the write buffer 4 to the memories 2A and 2B. Therefore, a minimum execution cycle includes “N/4+4×2” cycles.
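For a rough sense of scale, the per-stage cycle count above can be compared with the four-cycles-per-DFT figure given earlier for a conventional single port memory. The Python sketch below is only arithmetic on those two statements; the function names are hypothetical, and it covers a single stage without any overlap between stages.

```python
def buffered_stage_cycles(N):
    """Minimum cycles per stage with the read/write buffers: N/4 + 4*2."""
    return N // 4 + 4 * 2


def single_port_stage_cycles(N):
    """Cycles per stage when each of the N/4 DFTs needs four read cycles."""
    return (N // 4) * 4


if __name__ == "__main__":
    for N in (64, 1024):
        print(N, buffered_stage_cycles(N), single_port_stage_cycles(N))
```

The fixed overhead of eight cycles becomes negligible as N grows, so the per-stage cost approaches one cycle per 4-point DFT.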
If once the input data is stored in the memory 2A, the control unit 7 controls each section, and starts operation from a first stage. The selector 5 switches a connection so as to read out from the memory 2A where the input data is stored.
Each of
Data for first four words; namely, x(0) to x(3), x(16) to x(19), x(32) to x(35), and x(48) to x(51); is read out of the memory 2A and stored in the read buffer 3, subsequently the 4-point DFT computation unit 1 executes a 4-point DFT computation in due order (
In the same way hereafter, a computation by the 4-point DFT computation unit 1, storing a result of the computation in the write buffer 4, reading data out of the memory 2A, and storing the data in the read buffer 3 are executed in due order (
Each of
The operation described above is repeated until the number of 4-point DFT computations reaches N/4.
The operation described above is repeated for a second stage through a P-th stage, and then FFT operation finishes.
In the second and subsequent stages, the way of writing a computation result of the 4-point DFT in the write buffer 4 is different from that of the first stage, as shown in the expression (8). Meanwhile, data writing addresses in the memories 2A and 2B are at intervals of N/4^(P−S+1) in accordance with the expression (3). In the present example, with N=64 and P=log_4 64=3, the interval for addresses (each address storing one piece of complex data) is “4” in the second stage and “16” in the third stage.
Incidentally, “z(S, 4×k+i)” in
In the explanation with reference to
In the embodiment described above, the read buffer 3 and the write buffer 4 are introduced, while ways are contrived with respect to laying out of complex data in the memories 2A and 2B as well as writing output data, coming from the 4-point DFT computation unit 1, in the memories 2A and 2B. Thus, even in the case of executing the 4-point DFT through one cycle, it is not needed to segment the memories unnecessarily. Accordingly, any addition of a decoding circuit and a test circuit to the memories 2A and 2B becomes unnecessary so that a scale of the circuit can be reduced. For example, if the memories 2A and 2B are individually segmented into four portions, a decoding circuit and a test circuit with the size of 6 memories are needed. Fortunately, according to the present invention, those circuits can be downsized.
Moreover, because the number of memories is reduced, the number of wires to be placed in the physical layout when manufacturing the LSI can be reduced to approximately a quarter, which eases layout work. Thus, a smaller area (and hence a lower chip cost) can be realized.
The fast Fourier transform circuit shown in
{Math. 9}
X′(0)=x′(0)+x′(1)
X′(1)=x′(0)−x′(1)
X′(2)=x′(2)+x′(3)
X′(3)=x′(2)−x′(3) (9)
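The pair of 2-point DFTs in Math. 9 can be sketched in Python as follows, assuming each pair is a standard 2-point butterfly (a sum followed by a difference). The function name `dft2x2` is hypothetical.

```python
def dft2x2(x):
    """Two independent 2-point DFTs on (x'(0), x'(1)) and (x'(2), x'(3))."""
    x0, x1, x2, x3 = x
    return [
        x0 + x1,  # X'(0)
        x0 - x1,  # X'(1)
        x2 + x3,  # X'(2)
        x2 - x3,  # X'(3)
    ]


if __name__ == "__main__":
    print(dft2x2([1, 2, 3, 4]))
```

Processing two butterflies per call keeps the data width at four pieces of complex data, the same as for the 4-point DFT, so the same memory word and buffer organization can be reused.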
According to this configuration, at each stage of P=log B/2 (N/2)+1, the control unit 12 writes data, to be objective for the calculations in a subsequent stage, in the memories in such a way that the data is arranged in sequential order of addresses, which enables reading out of the memories with the same address for each value obtained by dividing the number of data “N” by a value of “B” with respect to sequential order of the fast Fourier computations, i.e., k=0, 1, . . . , N/B−1.
Specifically to describe, the control unit 12 specifies the write address WA (S, k, m); under the condition of m=0, 1, . . . , B−1; for writing data in the memories 2A and 2B at a stage for executing two sets of B/2-point Discrete Fourier Transform, with a value; the value being a result of adding twice “k” and “m” under the condition of m=0, 1, . . . , B/2−1 and the value being a result of adding “N/2”, twice “k”, and “m−B/2” under the condition of m=B/2, . . . , B−1. Moreover, the control unit 12 specifies the write address WA (S, k, m) for writing data in the memories 2A and 2B at each stage for executing a B-point Discrete Fourier Transform, with a sum value as a result of adding: a product as a result of multiplying a quotient by “m”, the quotient being obtained by dividing the number of data “N” by a value of “B” to the power of “P−S+1”; wherein “P−S” represents the number of stages remaining; a product as a result of multiplying a quotient by a half value of the value of “B” to the power of “S”, the quotient being obtained by dividing twice the sequential order number “k” by the value of “B” to the power of “S−1”; and a remainder after dividing the sequential order number “k” by a half value of the value of “B” to the power of “S−1.” In the meantime, the control unit 12 specifies the read address RA (S, k, m) for reading data from the memories 2B and 2A at each of all the ‘P’ stages with a sum value as a result of adding: a product as a result of multiplying a quotient by “m”, the quotient being obtained by dividing the number of data “N” by the value of “B”; and the sequential order number “k.”
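The write-address rules just described for the combined 2-point/4-point configuration can be sketched in Python as follows. This is an illustration under stated assumptions: the 2-point stage is taken to be the first stage (S=1), the number of stages P is passed in explicitly rather than derived, and the check uses N=32 with P=3 (one 2-point stage followed by two 4-point stages). The function names are hypothetical.

```python
def mixed_write_address(N, P, S, k, m, B=4):
    """WA(S, k, m) for a first stage of two (B/2)-point DFTs followed by
    B-point DFT stages, following the verbal description above."""
    half = B // 2
    if S == 1:
        if m < half:
            return 2 * k + m                   # first butterfly: lower half
        return N // 2 + 2 * k + (m - half)     # second butterfly: upper half
    return ((N // B ** (P - S + 1)) * m
            + ((2 * k) // B ** (S - 1)) * (B ** S // 2)
            + (k % (B ** (S - 1) // 2)))


def mixed_read_address(N, k, m, B=4):
    """RA(S, k, m): the same for all P stages."""
    return (N // B) * m + k


if __name__ == "__main__":
    N, P = 32, 3
    for S in range(1, P + 1):
        addrs = [mixed_write_address(N, P, S, k, m)
                 for k in range(N // 4) for m in range(4)]
        # Each stage must write every one of the N locations exactly once.
        assert sorted(addrs) == list(range(N))
        print("stage", S, "writes of DFT k=0:", addrs[:4])
```

The check confirms only that each stage writes every location exactly once; it does not by itself verify the ordering of the transform output.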
In the case of a configuration shown in
In the second and its following stages, a 4-point DFT computation is executed. Meanwhile, being different from what Expression 3 describes, the write address is described with an expression shown below (S≧2). Then, the read address is the same as what Expression 4 describes, as it is in the first stage.
A storage address in the read buffer 3, for data read out of the memories 2A and 2B, is the same as what Expression 6 describes. A write address in the write buffer 4 is specific as described below, depending on the processing stages.
The same as the expression (8).
An example of operation using the addresses described above is explained below with reference to
Each of
Prior to a start of a 2-point DFT process, at first, data for first four words; namely, x(0) to x(3), x(8) to x(11), x(16) to x(19), and x(24) to x(27); is read out of the memory 2A, and the data is stored in the read buffer 3. Subsequently, the 2-point/4-point DFT computation unit 11 executes the 2-point DFT computation with respect to these data. In the meantime, subsequent data; namely, x(4) to x(7); is read out of the memory 2A and stored in the read buffer 3. A computation result from the 2-point/4-point DFT computation unit 11 is stored in the write buffer 4 (Steps described above are shown in
In parallel with writing the computation results of the 5th to 8th 2-point DFT computations by the 2-point/4-point DFT computation unit 11 in the write buffer 4, the data stored in the write buffer 4 is written in the memory 2B one word (four pieces of complex data) at a time. After completion of the eight 2-point DFT computations, the data stored in the write buffer 4 is written in the memory 2B. At this time, the data is written in the memory 2B at 1-address intervals.
Each of
Although the above explanation is made on the basis of a fast Fourier transform having a configuration of a decimation-in-frequency type as an example, the present invention can also be embodied with a fast Fourier transform having a configuration of a decimation-in-time type.
In recent years, Orthogonal Frequency Division Multiplexing (OFDM), which is a wireless access system with a high frequency usage efficiency, has been used in the field of wireless communication with the aim of improving communication rates. OFDM is employed in digital terrestrial broadcasting, wireless Local Area Networks (LAN), mobile communication, and Long Term Evolution (LTE), which the Third Generation Partnership Project (3GPP) is promoting as a new communication method.
The communication system has a base station 101. The base station 101 includes: an encoding unit 102, a modulator 103, an OFDM signal generator 104, a D/A converter 105, and a plurality of antennas 106 (the drawing shows only one of them). The communication system also has a mobile station 111, such as a user terminal, which communicates with the base station 101. The mobile station 111 includes: a decoding unit 112, a demodulator 113, an OFDM signal demodulator 114, an A/D converter 115, and a plurality of antennas 116 (the drawing shows only one of them).
In the base station 101, for example, a CPU (not shown) of the base station 101 inputs data, to be transmitted, as information bits into the encoding unit 102. Then, the encoding unit 102 carries out adding a CRC (Cyclic Redundancy Check) and convolutional coding with respect to the input information bits. Then, the modulator 103 modulates the encoded input data. The OFDM signal generator 104 performs mapping the modulated data onto a frequency axis, and transforms the data on the frequency axis into data on a time axis by way of Inverse Discrete Fourier Transformation. Then, the transformed data is output to the D/A converter 105. The D/A converter 105 converts a digital signal, which the OFDM signal generator 104 outputs, into an analog signal. Then, the modulated data, converted into the analog signal, is transmitted through the plurality of antennas 106.
The mobile station 111 at a receiving side receives the data, transmitted out of the antennas 106 of the transmission side 101, through the plurality of antennas 116. At this point, it is taken into account that the data received by the antennas 116 is affected by noise during the time of propagating through space after being launched from the antennas 106. The data received by the antennas 116 is input into the A/D converter 115. The A/D converter 115 converts the analog signal of the input data into a digital signal. The A/D converter 115 outputs the converted digital signal to the OFDM signal demodulator 114. Then, the OFDM signal demodulator 114 transforms the digital signal on the time axis, which is output from the A/D converter 115, into data on a frequency axis by means of a discrete Fourier Transform, and carries out mapping the data on an IQ plane. The demodulator 113 demodulates the data output from the OFDM signal demodulator 114. Then, the demodulator 113 outputs the demodulated data, obtained by means of demodulation, to the decoding unit 112. The decoding unit 112 performs error correction decoding with respect to the demodulated input data. Using decoded data obtained as a result, a processing circuit in a later stage, such as a CPU, carries out a predetermined process.
A discrete Fourier transform (DFT) is used in the OFDM signal generator 104 and the OFDM signal demodulator 114 of a communication system using the OFDM method. The DFT requires a great amount of computation. According to the LTE specification, the number of sub-carriers in a 20-MHz band is 1200. To use a DFT for the computations, a set of computations must be executed 1200 times, where one set includes 1199 operations of multiplication and addition of complex numbers. Moreover, these computations must be completed within 66.67 microseconds, which is one OFDM symbol period.
As an algorithm for reducing the amount of computations, for example, used is a fast Fourier transform on the basis of the Cooley-Tukey algorithm as described above. By combining 2-point DFTs and 4-point DFTs, the amount of computations in the case of the number of sub-carriers being 1200 becomes as described below:
2048 operations of multiplication and addition of complex numbers+(2048 computations; where one computation includes 3 additions of complex numbers and 1 multiplication of complex numbers)×5 times.
On this occasion, the number of data needs to be a power of 2, in the case of a fast Fourier transform on the basis of the Cooley-Tukey algorithm. Therefore, the number of data is 2048, being greater than 1200 and a minimum number as a power of 2.
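The figures above can be reproduced with a few lines of Python. This is a sketch of the arithmetic only; the function names are hypothetical, and the split into stages assumes at most one 2-point stage with all remaining stages being 4-point stages.

```python
def fft_size(num_subcarriers):
    """Smallest power of 2 that is not less than the number of sub-carriers."""
    n = 1
    while n < num_subcarriers:
        n *= 2
    return n


def stage_split(n):
    """Return (radix2_stages, radix4_stages) for a power-of-2 size n,
    using at most one radix-2 stage."""
    log2n = n.bit_length() - 1
    return log2n % 2, log2n // 2


def direct_dft_operations(num_points):
    """Multiply-add count of a direct DFT: num_points sets of (num_points - 1)."""
    return num_points * (num_points - 1)


if __name__ == "__main__":
    n = fft_size(1200)
    print(n, stage_split(n), direct_dft_operations(1200))
```

For 1200 sub-carriers this gives a size of 2048 with one 2-point stage and five 4-point stages, matching the "×5 times" term above, against 1,438,800 multiply-add operations for the direct DFT.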
In an actual operation, a twiddle factor used in a multiplication of complex numbers may sometimes consist of only a real part or only an imaginary part, and therefore the number of multiplications may be less than that described above. Nevertheless, the fact remains that the amount of computation is still great.
A fast Fourier transform using a 4-point DFT as a component can reduce the amount of computations. Even so, for materializing the computations as described above with hardware, typically adopted is a configuration in which input/output data is stored in a memory. In order to execute a 4-point DFT, it is needed to pick up four pieces of complex data out of a memory, as input data. Then, if the memory is with a single port, four cycles are needed. A memory with multiple ports, or segmenting a memory with a single port into four portions makes it possible to read four pieces of complex data out of the memory through one cycle. Unfortunately, an area of the memory cell itself becomes large in the former, and a test circuit and/or a decoder circuit attached to a memory macro in the latter leads to an increase of an area. Furthermore, segmenting the memory quadruples the number of wire placements. Such inconvenience increases the difficulty in a layout design at the time of manufacturing an LSI, so that it may be conceivably needed to enlarge an area of the LSI for enabling wiring work.
In such a case, applying the fast Fourier transform circuit according to the embodiment described above makes it possible to execute one 4-point DFT computation through one cycle, without using a memory having multiple ports or segmenting a memory, so as to reduce process time.
Furthermore, the present invention can also be utilized in a measuring unit that performs a Fourier transform, such as a spectrum analyzer.
1. 4-point DFT computation unit
2A and 2B. memories
3. read buffer
4. write buffer
5. selector
6. twiddle factor generation unit
7 and 12. control unit
11. 2-point/4-point DFT computation unit
Number | Date | Country | Kind |
---|---|---|---|
2010-030796 | Feb 2010 | JP | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/JP2011/052844 | 2/10/2011 | WO | 00 | 10/10/2012 |