The present invention will now be described in greater detail based on embodiments with reference to the accompanying drawings, in which:
In the following, the embodiments of the present invention will be described in connection with DFT implementations based on the Cooley-Tukey algorithm.
The Cooley-Tukey algorithm is disclosed in James W. Cooley and John W. Tukey, “An algorithm for the machine calculation of complex Fourier series,” Math. Comput. 19, 297-301 (1965). It is a divide-and-conquer algorithm that recursively breaks down a DFT of any composite size N = N1·N2 into smaller DFTs of sizes N1 and N2, together with O(N) multiplications by complex roots of unity, traditionally called twiddle factors. If N1 is the radix, the algorithm is called decimation in time (DIT); if N2 is the radix, it is called decimation in frequency (DIF, also known as the Sande-Tukey algorithm).
One example of the use of the Cooley-Tukey algorithm is to divide the transform into two pieces of size N/2 at each step, which limits it to power-of-two sizes; in general, however, any factorization can be used. These are called the radix-2 and mixed-radix cases, respectively (other variants have their own names as well). Although the basic idea is recursive, most traditional implementations rearrange the algorithm to avoid explicit recursion. Moreover, because the Cooley-Tukey algorithm breaks the DFT into smaller DFTs, it can be combined arbitrarily with any other DFT algorithm.
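By way of illustration only, the recursive mixed-radix idea can be sketched in Python as follows. The function names `naive_dft`, `smallest_factor` and `cooley_tukey` are hypothetical, the sketch uses decimation in time for brevity, and the negative exponent is the conventional forward-transform sign, which is an assumption and not taken from the embodiments.

```python
import cmath

def naive_dft(x):
    """Direct O(N^2) DFT, used as reference and as base case for prime sizes."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]

def smallest_factor(n):
    """Smallest prime factor of n (n itself if n is prime or below 4)."""
    f = 2
    while f * f <= n:
        if n % f == 0:
            return f
        f += 1
    return n

def cooley_tukey(x):
    """Recursive mixed-radix decimation-in-time DFT for any length."""
    n = len(x)
    n1 = smallest_factor(n)
    if n1 == n:                      # prime or trivial size: direct sum
        return naive_dft(x)
    n2 = n // n1
    # n1 sub-transforms of length n2 over the inputs x[r], x[r + n1], ...
    subs = [cooley_tukey(x[r::n1]) for r in range(n1)]
    # Combine with twiddle factors; each sub-transform is periodic in n2
    return [sum(subs[r][k % n2] * cmath.exp(-2j * cmath.pi * r * k / n)
                for r in range(n1))
            for k in range(n)]
```

Calling `cooley_tukey` on a list of 75 complex samples splits it into three sub-transforms of length 25, each of which is split again into five of length 5.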
According to the following embodiments, DFT modules and devices are implemented based on the Cooley-Tukey algorithm for a wide range of vector lengths, e.g., 1200, 600, 300, 150, 75, 50 and 25. Furthermore, the implementation is optimized for hardware realization due to a reduced number of multiplications, which results in processing time of the order n*log(n).
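As a small worked example, each of the listed vector lengths is a power of two times either 25 or 75 (for instance 1200 = 2^4 · 75 and 50 = 2 · 25). The Python sketch below, with the hypothetical helper `split_length`, makes this arithmetic explicit; reading the power of two as a count of preceding radix-2 stages is an assumption.

```python
def split_length(n):
    """Return (s, base) such that n == 2**s * base and base is odd."""
    s = 0
    while n % 2 == 0:
        n //= 2
        s += 1
    return s, n

# Map each supported vector length to (power of two, odd base length)
table = {n: split_length(n) for n in (1200, 600, 300, 150, 75, 50, 25)}
```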
The embodiments are implemented as DIF, although an implementation as decimation in time (DIT) would of course be possible as well.
In the embodiments, the basic module for all modes is the DFT-25, while other modules are then built based on the DFT-25, and output values are re-ordered.
with k=0 . . . 24, and can be further transformed to:
with i=0 . . . 4.
The 25 input values X[0] to X[24] are in natural order (DIF) and divided into 5 groups with 5 values each. The twiddle factors $W_x^y$ to be multiplied at the input and at the output according to the above transformed equations are grouped into five groups with five twiddle factors in each group. These are given in brackets beside the “+” symbol of the data flow graph, and can be calculated as follows:

$$W_x^y = e^{j \cdot y \cdot 2\pi / x}.$$
The first twiddle factor in each bracket corresponds to the first transition which leads to the “+” symbol, the second twiddle factor corresponds to the second transition, and so on. The results Y[0] to Y[24] of the DFT-25 data processing are not reordered immediately. The reordering will be made after all values of the DFT are calculated.
Input data (X[i]) is supplied to a 5x hold unit 20 and the stored samples are supplied to a first multiplier and multiplied with an assigned twiddle factor $W_5^x$ generated in a first twiddle factor generating unit 10. Five successive outputs of the first multiplier are added in a first integrator unit 30 and then supplied to a second multiplier where the obtained sum is multiplied with another assigned twiddle factor $W_{25}^x$ generated in a second twiddle factor generating unit 12. Again, five successive outputs of the second multiplier are added in a second integrator unit 32 to obtain the output data (Y[i]).
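The multiply-and-integrate chain can be illustrated by the following Python sketch. It is a sketch under stated assumptions: the index split n = n1 + 5·n2 and k = 5·k1 + k2 is the textbook DIF decomposition and is assumed, not taken from the equations of the embodiment; the helper names `W`, `dft25_two_stage` and `reorder25` are hypothetical; and `W` uses the conventional negative exponent of a forward transform, whereas the definition printed above carries no explicit sign.

```python
import cmath

def W(x, y):
    """Twiddle factor W_x^y; the negative exponent is an assumed convention."""
    return cmath.exp(-2j * cmath.pi * y / x)

def dft25_two_stage(x):
    """DFT-25 as two multiply-and-sum stages; output is not yet reordered."""
    assert len(x) == 25
    out = []
    for k2 in range(5):
        # Stage 1: multiply five inputs spaced 5 apart by W_5 and add them
        stage1 = [sum(x[n1 + 5 * n2] * W(5, n2 * k2) for n2 in range(5))
                  for n1 in range(5)]
        for k1 in range(5):
            k = 5 * k1 + k2
            # Stage 2: multiply the partial sums by W_25 and add them
            out.append(sum(stage1[n1] * W(25, n1 * k) for n1 in range(5)))
    return out  # position 5*k2 + k1 holds Y[5*k1 + k2]

def reorder25(out):
    """Restore natural order once all output values have been calculated."""
    y = [0j] * 25
    for k2 in range(5):
        for k1 in range(5):
            y[5 * k1 + k2] = out[5 * k2 + k1]
    return y
```

In this sketch the results leave the second stage in index-swapped order, and `reorder25` is applied only afterwards, mirroring the deferred reordering described above.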
The algorithm can be described as follows:
with k=0 . . . 74, and can be further transformed to:
The output values are not reordered.
Input data (X[i]) is now supplied to a 3x hold unit 22 and the stored samples are supplied to a first multiplier and multiplied with an assigned twiddle factor $W_{75}^x$ generated in a first twiddle factor generating unit 14. Three successive outputs of the first multiplier are added in a first integrator unit 30 and then supplied to a second multiplier where the obtained sum is supplied to a basic DFT-25 module 40 (shown in
Thus, the implementation of the DFT-1200, 600, 300, 150, and 50 modules is based on the basic DFT-25 module and the derived DFT-75 module described above.
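The derivation of the DFT-75 module from the DFT-25 module may be sketched in Python as follows. The index split n = n1 + 25·n2 and k = 3·k1 + k2 is an assumed textbook DIF decomposition; the names `W`, `naive_dft` and `dft75` are hypothetical; a direct DFT stands in for the DFT-25 module; and the negative exponent in `W` is an assumed sign convention. Unlike the embodiment, the sketch writes its results directly in natural order.

```python
import cmath

def W(x, y):
    """Twiddle factor W_x^y; the negative exponent is an assumed convention."""
    return cmath.exp(-2j * cmath.pi * y / x)

def naive_dft(x):
    """Direct DFT, used here as a stand-in for the basic DFT-25 module."""
    n = len(x)
    return [sum(x[m] * W(n, m * k) for m in range(n)) for k in range(n)]

def dft75(x, dft25=naive_dft):
    """DFT-75 as a 3-point twiddle-and-sum stage followed by DFT-25 modules."""
    assert len(x) == 75
    y = [0j] * 75
    for k2 in range(3):
        # Multiply three inputs spaced 25 apart by W_75^(n*k2), then add them
        z = [sum(x[n1 + 25 * n2] * W(75, (n1 + 25 * n2) * k2)
                 for n2 in range(3))
             for n1 in range(25)]
        # One DFT-25 per k2 delivers the outputs Y[3*k1 + k2]
        y[k2::3] = dft25(z)
    return y
```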
If the DFT length is less than the stage length n, the stage is simply bypassed through a selective butter-fly or bypass unit 72 and a subsequent delete or bypass unit 74. In the butter-fly or bypass unit 72, the two input lines are selectively either connected directly to the output lines or crossed (butter-fly connection), so that the upper input line is connected to the lower output line and vice versa. In the delete or bypass unit 74, the samples are selectively either deleted or bypassed. If the DFT length is equal to or greater than the stage length n, the first n/2 incoming samples are stored in a FIFO (First-In-First-Out) memory 70 (e.g. a shift register). Together with the next n/2 incoming samples, the butter-fly operation of the butter-fly or bypass unit 72 is performed on these n samples. After multiplication at a subsequent multiplier by twiddle factors $W_n^x$ generated at a twiddle factor generating unit 76, the samples are output to the next stage of length n/2.
Thus, a modular and flexible DFT implementation for vector lengths other than $2^x$ can be obtained.
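A single active stage of length n, followed by the hand-over to the next stage of length n/2, can be sketched in Python as shown below. This is the textbook radix-2 DIF butter-fly and is assumed to correspond to the stage described above; the bypass and delete paths are not modelled, the names `dif_stage` and `dft_with_radix2_stages` are hypothetical, and the negative exponent is an assumed sign convention.

```python
import cmath
from collections import deque

def naive_dft(x):
    """Direct DFT, used here as a stand-in for the basic odd-length module."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]

def dif_stage(samples, n):
    """One radix-2 DIF stage of length n applied to a block of n samples."""
    assert len(samples) == n and n % 2 == 0
    fifo = deque(samples[: n // 2])      # first n/2 samples wait in the FIFO
    upper, lower = [], []
    for i, b in enumerate(samples[n // 2:]):
        a = fifo.popleft()
        upper.append(a + b)                                    # sum branch
        lower.append((a - b) * cmath.exp(-2j * cmath.pi * i / n))  # twiddled
    return upper, lower

def dft_with_radix2_stages(x, base_dft=naive_dft):
    """Apply radix-2 DIF stages until the length is odd, then the base DFT."""
    n = len(x)
    if n < 2 or n % 2:
        return base_dft(x)
    upper, lower = dif_stage(x, n)
    y = [0j] * n
    y[0::2] = dft_with_radix2_stages(upper, base_dft)  # even-indexed outputs
    y[1::2] = dft_with_radix2_stages(lower, base_dft)  # odd-indexed outputs
    return y
```

For a vector of length 150, one such stage produces two blocks of length 75, each of which is then handed to the base module.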
However, the above DFT implementations based on the Cooley-Tukey algorithm generally have a butter-fly structure except the last stage of operation (basic DFT module). An example of a DFT of length n was shown in
According to the third embodiment, multiplications are combined with the twiddle factors in the last two stages of a DFT implementation. In case of the DFT-75 module, the number of multiplications can be reduced by 7%. The proposed solution can be used for all DFT implementations based on the Cooley-Tukey algorithm.
In the upper part of
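$$a \cdot W_{x_1}^{y_1} \cdot W_{x_2}^{y_2} = a \cdot e^{j \cdot y_1 \cdot 2\pi / x_1} \cdot e^{j \cdot y_2 \cdot 2\pi / x_2} = a \cdot e^{j \cdot (y_1/x_1 + y_2/x_2) \cdot 2\pi}$$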
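The identity above means that two successive twiddle multiplications can be replaced by a single one whose phase is the sum of the two phases. A minimal Python sketch, assuming the hypothetical helpers `W` and `combined_twiddle`, reduces the summed phase exactly with rational arithmetic; the identity holds for either sign of the exponent, and the negative sign used in `W` is an assumption.

```python
import cmath
from fractions import Fraction

def W(x, y):
    """Twiddle factor W_x^y; the negative exponent is an assumed convention."""
    return cmath.exp(-2j * cmath.pi * y / x)

def combined_twiddle(x1, y1, x2, y2):
    """Return (x, y) such that W_x^y equals W_x1^y1 * W_x2^y2."""
    # Phase as a fraction of a full turn, reduced modulo one turn
    phase = (Fraction(y1, x1) + Fraction(y2, x2)) % 1
    return phase.denominator, phase.numerator

# Example: W_5^3 * W_25^7 collapses to the single factor W_25^22
x, y = combined_twiddle(5, 3, 25, 7)
```

A sample a then needs one complex multiplication by `W(x, y)` instead of two.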
As an example, the lower part of
L=a·b·c,
where a, b and c are positive integers. In case of the DFT-75 module, the factor “a” is 3, while “b” and “c” are 5.
As regards the above first to third embodiments, it is noted that the functionalities of the individual blocks shown in
In summary, a method and apparatus for implementing a DFT of a predetermined vector size have been described, wherein at least one DFT module is configured to perform DFTs of a first predetermined number and of a vector size corresponding to a second predetermined number, to multiply by twiddle factors, and to perform DFTs of said second predetermined number and of a vector size corresponding to said first predetermined number. At least two of the at least one DFT modules are combined to obtain the predetermined vector size. Thereby, an implementation of a non-$2^x$-radix Fourier transformation can be achieved with moderate hardware complexity.
The preferred embodiments can be used in any DFT processing environment, for example in wireless access networks, such as UTRAN or EUTRAN, or alternatively in any other signal processing environment. The DFT modules are not restricted to the above DFT-25 and/or DFT-75 modules. Rather, any suitable module size can be implemented. The preferred embodiments may thus vary within the scope of the attached claims.
| Number | Date | Country | Kind |
|---|---|---|---|
| 06 013 260.2 | Jun 2006 | EP | regional |