For a more complete understanding of the disclosure and the advantages thereof, reference is now made to the following brief description, taken in connection with the accompanying drawings and detailed description, wherein like reference numerals represent like parts.
It should be understood at the outset that although an exemplary implementation of one embodiment of the disclosure is illustrated below, the system may be implemented using any number of techniques, whether currently known or in existence. The disclosure should in no way be limited to the exemplary implementations, drawings, and techniques illustrated below, including the exemplary design and implementation illustrated and described herein, but may be modified within the scope of the appended claims along with their full scope of equivalents.
In communication systems, such as modems for asymmetric digital subscriber line (ADSL) and very high data rate digital subscriber line (VDSL), many signal processing operations may be used. For example, an ADSL modem or a VDSL modem may utilize a time domain windowing operation, a fast Fourier transform (FFT) operation, and an inverse fast Fourier transform (IFFT) operation in a signal processing chain. Hardware optimization may be achieved with a processor architecture that balances processor area, power consumption of the processor, and processing capability requirements of the processor. Hardware optimization in communication systems may be accomplished by using the same processor architecture for performing a plurality of signal processing operations. For example, the same processor architecture may perform the time domain windowing operation, the FFT operation, and the IFFT operation.
Disclosed herein is a butterfly processor architecture that uses a single high speed multiplier unit and two adder/subtracter units that are structured to efficiently execute radix-2 decimation-in-time (DIT) butterfly operations. The computations for windowing operations, FFT operations, and IFFT operations may be realized in terms of butterfly operations, and hence the butterfly processor architecture may be used to perform the computations of a plurality of signal processing operations. Using the butterfly processor architecture, the throughput may be limited only by the read and write operations to memory for each butterfly operation. The butterfly operations may be performed in-place, whereby the results of each operation may be stored in the same location in memory from which the inputs for that operation were retrieved. Performing the butterfly operations in-place means that the memory need only be large enough to hold one frame of data. The butterfly processor architecture may also use scaling elements for implementation of a dynamic scaling algorithm. The dynamic scaling algorithm may reduce the precision requirements of intermediate results when performing the windowing operations, FFT operations, or IFFT operations and hence may reduce the data word length in the memory.
Similarly, data may be received into the signal processing chain 100 through an analog-to-digital converter (ADC) 108 and one or more filters 110 and passed to an adder unit 112. From the adder unit 112, the signal processing chain 100 may split into dual time domain equalization (TEQ) paths with a TEQ 114 and a TEQ 116. The TEQ 114 may output data to the adder unit 118 and the TEQ 116 may output data to the adder unit 120. The signal processing chain 100 includes a feedback loop from the other processing 102 to an echo cancellation (EC) unit 122. The EC unit 122 may provide data to one or more of the adder unit 112, the adder unit 118, or the adder unit 120 to perform echo cancellation. Each of the adder unit 118 and the adder unit 120 provides data from the dual TEQ paths to a buffer 124.
The buffer 124 may communicate with a butterfly processor 126 for performing various signal processing operations. For example, the butterfly processor 126 may perform windowing operations, FFT operations, and IFFT operations on the data stored in the buffer 124. The butterfly processor 126 may be programmable to perform the windowing, FFT, and IFFT operations on blocks of around 64 to 4096 or more real samples. The buffer 124 may supply data processed by the butterfly processor 126 to the other processing 102, for example to be interpreted or to have other processing operations performed.
$$u = x + W y \qquad (1)$$
$$v = x - W y \qquad (2)$$
where u, v, W, x, and y may be complex numbers. In FFT and IFFT operations, the multiplication factor W 210 is sometimes referred to as a twiddle factor, which may be a complex number expressed as $W_N^i = e^{-j2\pi i/N}$.
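The arithmetic of equations (1) and (2) can be sketched as a floating-point Python model. It is a behavioral illustration only. The hardware operates on fixed-point data, and the function names `twiddle` and `butterfly` are hypothetical. The product W·y is computed once and shared by both outputs, which is why one multiplier and two adder/subtracters are enough for each butterfly.

```python
import cmath


def twiddle(i: int, n: int) -> complex:
    """Twiddle factor W_N^i = exp(-j*2*pi*i/N)."""
    return cmath.exp(-2j * cmath.pi * i / n)


def butterfly(x: complex, y: complex, w: complex) -> tuple[complex, complex]:
    """Radix-2 DIT butterfly of equations (1) and (2)."""
    t = w * y            # one complex multiplication, shared by both outputs
    return x + t, x - t  # u = x + W*y, v = x - W*y


# With W_4^1 = -j, x = 1 and y = 1 give u = 1 - j and v = 1 + j.
u, v = butterfly(1 + 0j, 1 + 0j, twiddle(1, 4))
```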
With the in-place radix-2 butterfly operation, the results of each operation may be written back into the same location in the buffer 124 from which the inputs were retrieved. Using an in-place radix-2 butterfly operation means that the buffer 124 need only be large enough to hold one frame of data. In a dual TEQ path implementation, the buffer 124 may need to hold two frames of data. Each TEQ path may store data in one of two logical or physical partitions of the buffer 124. For example, with a physical partition, the buffer 124 may comprise two physical buffers, each configured to store data for one of the dual TEQ paths. One skilled in the art will recognize that the output u 206 may be stored in the buffer 124 at the address 304 and the output v 208 may be stored in the buffer 124 at the address 302. Further, one skilled in the art will recognize that one or both of the output u 206 and the output v 208 may not be stored in the buffer 124.
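The in-place property can be modeled on a plain Python list that stands in for the buffer 124. The sketch assumes a flat, word-addressed buffer. The function name and its parameters are hypothetical. Each butterfly makes two reads and two writes to the same two addresses, so no scratch storage beyond one frame is needed.

```python
def butterfly_in_place(buf: list[complex], addr_x: int, addr_y: int, w: complex) -> None:
    """Hypothetical model of one in-place radix-2 butterfly on a shared buffer.

    Two reads fetch x and y. Two writes store u = x + W*y back where x was
    read and v = x - W*y back where y was read, so no scratch memory is needed.
    """
    x, y = buf[addr_x], buf[addr_y]   # two reads
    t = w * y
    buf[addr_x] = x + t               # u overwrites x
    buf[addr_y] = x - t               # v overwrites y


frame = [1 + 0j, 2 + 0j, 3 + 0j, 4 + 0j]
butterfly_in_place(frame, 0, 2, 1 + 0j)   # W = 1
# frame is now [4, 2, -2, 4] (as complex numbers)
```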
Input data read from the buffer 124 by the memory access unit 402 may be stored in a data buffer 406 or a data buffer 408. Each of the data buffer 406 and the data buffer 408 may be a one-frame data buffer. While the butterfly processor 126 operates on the data in one of the data buffer 406 or the data buffer 408, the input for the next signal processing operation may be stored in the other of the data buffer 406 or the data buffer 408. The address generator 404 may generate addresses for retrieving the appropriate input for the operations from one of the data buffer 406 or the data buffer 408.
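Data buffers 406 and 408 can be pictured as a ping-pong (double) buffer. The class below is a hypothetical behavioral model, not the disclosed circuit. One buffer is "active" and is processed, while the next frame is loaded into the idle one. The roles are then swapped.

```python
class PingPongBuffers:
    """Hypothetical model of data buffers 406 and 408 used as ping-pong buffers."""

    def __init__(self, frame_len: int) -> None:
        self.bufs = [[0j] * frame_len, [0j] * frame_len]
        self.active = 0                          # index of the buffer being processed

    def work(self) -> list[complex]:
        """Frame the butterfly processor is currently operating on."""
        return self.bufs[self.active]

    def load_next(self, samples: list[complex]) -> None:
        """Fill the idle buffer with the input for the next operation."""
        self.bufs[1 - self.active][:] = samples

    def swap(self) -> None:
        """Exchange roles once the current operation and the load are done."""
        self.active = 1 - self.active


pp = PingPongBuffers(4)
pp.load_next([1, 2, 3, 4])   # next frame arrives while buffer 0 is busy
pp.swap()                    # the processor now works on the frame just loaded
```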
A multiplexer 410 may select which of the data buffer 406 or the data buffer 408 to read data from for processing. The multiplexer 410 may provide the data to a scaling unit 412. The scaling unit 412 may shift input data to the right by a variable number of bits and round the result. For example, the scaling unit 412 may shift input to the right by one bit to perform a divide-by-two operation. The scaling unit 412 may also simply pass data through without shifting the input data. The scaling unit 412 may provide input data corresponding to the input y 204 in the butterfly operation 200 to a multiplier 420. The scaling unit 412 may also provide input data corresponding to the input x 202 in the butterfly operation 200 to each of an adder/subtracter 424 and an adder/subtracter 422.
The butterfly processor 126 may include or have access to a memory 414. The memory 414 may be a read-only memory (ROM) for storing twiddle factors used in performing FFT and IFFT operations. The butterfly processor 126 may also include or have access to a memory 416. The memory 416 may be a random access memory (RAM) for storing window coefficients used in time domain windowing operations. Each of the memory 414 and memory 416 may provide data to a multiplexer 418 in accordance with addresses generated by the address generator 404.
The multiplexer 418 may select which data to provide to the multiplier 420. For example, when utilizing the butterfly processor 126 to perform an FFT or an IFFT operation, the multiplexer 418 may select a twiddle factor supplied by the memory 414. Similarly, when utilizing the butterfly processor 126 to perform a time domain windowing operation, the multiplexer 418 may select a window coefficient supplied by the memory 416.
The multiplier 420 may multiply the input supplied by the scaling unit 412 and the twiddle factor or the window coefficient supplied by the multiplexer 418. The output from the multiplier 420 may be supplied to each of the adder/subtracter 424 and the adder/subtracter 422. The adder/subtracter 424 and the adder/subtracter 422 may perform an addition or subtraction operation on the input provided by the scaling unit 412 and the input provided by the multiplier 420. As described above, the butterfly processor 126 includes the multiplier 420, the adder/subtracter 424, and the adder/subtracter 422 that may be used to perform the butterfly operation 200 on data input from the buffer 124 through the memory access unit 402.
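One pass through the coefficient multiplexer, the multiplier, and the two adder/subtracters can be sketched as the behavioral function below. The `Mode` enum, the `datapath` signature, and the list-based ROM/RAM models are hypothetical. The text does not say which adder/subtracter forms the sum and which forms the difference, so the sketch simply returns both.

```python
from enum import Enum


class Mode(Enum):
    FFT = "fft"          # coefficient from the twiddle-factor ROM (memory 414)
    WINDOW = "window"    # coefficient from the window-coefficient RAM (memory 416)


def datapath(x: complex, y: complex, mode: Mode, addr: int,
             twiddle_rom: list[complex], window_ram: list[float]) -> tuple[complex, complex]:
    """Hypothetical sketch of multiplexer 418, multiplier 420, and
    adder/subtracters 422/424 for one butterfly."""
    coeff = twiddle_rom[addr] if mode is Mode.FFT else window_ram[addr]  # multiplexer 418
    product = coeff * y                  # multiplier 420
    return x + product, x - product      # one unit adds, the other subtracts


rom = [1 + 0j, -1j]   # toy twiddle table: W_4^0, W_4^1
ram = [0.5]           # toy window table
```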
Each of the adder/subtracter 422 and the adder/subtracter 424 may supply its output to a scaling and rounding unit 426. The scaling and rounding unit 426 may shift input data to the right by a variable number of bits and round the result. For example, the scaling and rounding unit 426 may shift input to the right by one bit to perform a divide-by-two operation. The scaling and rounding unit 426 may also simply pass data through without shifting the input data.
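The shift-and-round behavior of the scaling unit 412 and the scaling and rounding unit 426 can be sketched on two's-complement integers. The text does not specify the rounding mode. This sketch assumes round-half-up, which adds half an LSB before an arithmetic right shift. For complex data, the function would be applied to the real and imaginary parts separately. The name `scale_round` is hypothetical.

```python
def scale_round(value: int, shift: int) -> int:
    """Arithmetic right shift by `shift` bits with round-half-up (assumed mode).

    shift == 0 passes the value through unchanged; shift == 1 is a
    divide-by-two with rounding.
    """
    if shift == 0:
        return value
    return (value + (1 << (shift - 1))) >> shift   # add half an LSB, then shift


# Divide-by-two with rounding: 7 -> 4, 6 -> 3, -7 -> -3, -6 -> -3
```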
The scaling unit 412 and the scaling and rounding unit 426 may be used to perform a dynamic scaling algorithm that may reduce the precision requirements of intermediate results when performing the windowing operations, FFT operations, or IFFT operations, and hence may reduce the data word length in the buffer 124. Theoretically, the amplitude of the output of an FFT operation can grow to as much as $4096\sqrt{2}$ for a 4096-point FFT, with the required precision growing at each stage of the FFT. This growth in precision may necessitate additional bits, increased precision of computation elements, and larger memory sizes.
The dynamic scaling algorithm performed by the scaling unit 412 and the scaling and rounding unit 426 may be used to limit the maximum value possible at a butterfly stage output to $1+\sqrt{2}$. In an embodiment, the scaling and rounding unit 426 may utilize the dynamic scaling technique described in U.S. Pat. No. 6,137,839 to Mannering et al., which is incorporated by reference herein as if reproduced in full below. For example, the dynamic scaling algorithm may examine, at each butterfly stage, the maximum overflow seen in the previous stage. The maximum overflow seen in the previous stage may be used to determine the scaling of inputs of the current stage at the scaling unit 412. The accumulated scaling from previous stages may optionally be undone at the end of the FFT/IFFT operation with the scaling and rounding unit 426, or passed on to the next stage along with the data. The precision may be chosen to provide quantization noise power less than around −86 dBm.
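The idea of scaling each stage's inputs by the overflow observed at the previous stage's outputs can be sketched as a generic block-floating-point scheme. This is an illustrative assumption. It is not necessarily the technique of U.S. Pat. No. 6,137,839. The Q-format parameter `frac_bits`, the function names, and the shared block exponent are hypothetical. Data are flattened to a list of integer real and imaginary components.

```python
def overflow_bits(components: list[int], frac_bits: int) -> int:
    """Bits by which the largest |component| exceeds the +/-1.0 range of a
    Q(frac_bits) format; 0 when everything already fits."""
    limit = 1 << frac_bits             # fixed-point representation of 1.0
    peak = max(abs(c) for c in components)
    bits = 0
    while peak >= (limit << bits):     # each extra bit doubles the covered range
        bits += 1
    return bits


def rescale_stage_inputs(components: list[int], frac_bits: int,
                         exponent: int) -> tuple[list[int], int]:
    """Shift the current stage's inputs right by the overflow seen at the
    previous stage's outputs, and accumulate the shift in a block exponent
    that can be undone (value * 2**exponent) at the end of the transform."""
    shift = overflow_bits(components, frac_bits)
    if shift:
        half = 1 << (shift - 1)
        components = [(c + half) >> shift for c in components]  # shift and round
    return components, exponent + shift


# Q4 (1.0 == 16): a peak of 40 overflows by 2 bits, so every input is shifted by 2.
scaled, exp = rescale_stage_inputs([40, -20, 5], frac_bits=4, exponent=0)
```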
The output from the scaling and rounding unit 426 may be supplied to the memory access unit 402 such that the results of each butterfly operation may be written back into the same location in the buffer 124 that the inputs were retrieved from. Therefore, the butterfly processor 126 may operate to perform an in-place radix-2 butterfly operation. The butterfly processor 126 may perform multiple iterations of the in-place radix-2 butterfly operation to perform various signal processing operations, as described in more detail below.
As described above, the FFT block 520 may perform a DIT Cooley-Tukey FFT operation. A sequence of data may be decomposed into two subsequences, the even-indexed samples and the odd-indexed samples, which are combined into a single complex sequence. That is, for N real samples x(n), n = 0, 1, …, N−1, rather than performing an N-point FFT, the N real samples may be converted into N/2 complex samples y(k) as shown below:
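$$y(0) = x(0) + j\,x(1) \qquad (3)$$
$$y(1) = x(2) + j\,x(3) \qquad (4)$$
and so on, where y(k) may generally be expressed as:
$$y(k) = x(2k) + j\,x(2k+1), \quad k = 0, 1, \ldots, N/2 - 1 \qquad (5)$$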
As shown in equation (5), each of the N/2 complex samples y(k) may be a complex sum of an even sample and an odd sample of the N real samples. Therefore, rather than an N-point FFT, only an N/2-point complex FFT need be performed.
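The packing of equations (3) and (4), which generalizes to y(k) = x(2k) + j·x(2k+1), is a simple reinterpretation of adjacent real samples as one complex sample. The sketch below assumes that N is even. The function name is hypothetical.

```python
def pack_real_to_complex(x: list[float]) -> list[complex]:
    """y(k) = x(2k) + j*x(2k+1) for k = 0 .. N/2 - 1."""
    if len(x) % 2:
        raise ValueError("N must be even")
    return [complex(x[2 * k], x[2 * k + 1]) for k in range(len(x) // 2)]


# pack_real_to_complex([0, 1, 2, 3]) -> [1j, (2+3j)], matching equations (3) and (4)
```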
As shown in
The FFT block 520 may also include the stage 1 DIT radix-2 butterfly block 506 through the stage M DIT radix-2 butterfly block 508. For performing an N/2-point DIT FFT operation on a sequence of N real samples, the number of stages of DIT radix-2 butterfly operations performed may be $M = \log_2(N/2)$, with each stage comprising $N/4$ butterfly operations. For the exemplary stages of butterfly operations shown in
At a stage 1 butterfly operation 604, four butterfly operations are performed, one for each pair of inputs y(k). For example, a butterfly operation may be performed on the input y(0) and the input y(4) with a twiddle factor $W_{16}^0$. Similarly, a stage 2 butterfly operation 606 may perform four butterfly operations on different pairs of the results of the stage 1 butterfly operation 604 with the appropriate twiddle factors. Finally, a stage 3 butterfly operation 608 may perform four butterfly operations on different pairs of the results of the stage 2 butterfly operation 606 with the appropriate twiddle factors to generate an FFT output Y(k) for each of the inputs y(k). The butterfly processor 126 may successively perform each of the four butterfly operations for each stage of butterfly operations. Therefore, the butterfly processor 126 iteratively performs $\frac{N}{4}\log_2\frac{N}{2}$ butterfly operations to accomplish an $\frac{N}{2}$-point FFT operation.
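The stage structure can be sketched as a textbook iterative, in-place radix-2 DIT FFT. The input is first put into bit-reversed order, which plays the role of the bit reversal block 504 in this sketch. Then log2(N/2) stages of N/4 butterflies are applied. The function names are hypothetical, and the twiddles are computed with `cmath.exp` for brevity rather than read from a ROM. For an 8-point transform (N = 16 real samples), the sketch performs 3 stages of 4 butterflies, 12 in total.

```python
import cmath


def bit_reverse_permute(a: list[complex]) -> None:
    """In-place bit-reversal reordering of a power-of-two-length list."""
    n = len(a)
    bits = n.bit_length() - 1
    for i in range(n):
        r = int(format(i, f"0{bits}b")[::-1], 2)
        if r > i:                      # swap each pair only once
            a[i], a[r] = a[r], a[i]


def fft_dit_in_place(a: list[complex]) -> int:
    """Iterative radix-2 DIT FFT computed in place; returns the butterfly count.

    For L = len(a) = N/2 complex inputs it runs log2(L) stages of L/2 = N/4
    butterflies, each computing u = x + W*y and v = x - W*y.
    """
    n = len(a)
    assert n & (n - 1) == 0, "length must be a power of two"
    bit_reverse_permute(a)
    count = 0
    span = 1                            # distance between the two butterfly inputs
    while span < n:
        for start in range(0, n, 2 * span):
            for k in range(span):
                w = cmath.exp(-2j * cmath.pi * k / (2 * span))   # W_{2*span}^k
                x, y = a[start + k], a[start + k + span]
                t = w * y
                a[start + k], a[start + k + span] = x + t, x - t  # written back in place
                count += 1
        span *= 2
    return count
```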
The results from the stage 1 DIT radix-2 butterfly block 506 through the stage M DIT radix-2 butterfly block 508 may be expressed as:
The results may also be expressed as:
However, for performing the DIT radix-2 FFT, the output needed may be expressed as:
The needed output, X(i), may be generated by performing the post-processing block 510.
with the twiddle factor W=1+j*0. Therefore, the multiplication by the twiddle factor in the first butterfly operation is simply a multiplication by one. An output of the subtraction operation, t0, may be multiplied by negative j. For example, if t0=a+jb, then t2=−j*t0=b−ja. So, the multiplication by negative j simply rearranges the real and imaginary parts of t0 in t2: the real part of t0 is negated and stored as the imaginary part of t2, and the imaginary part of t0 is stored as the real part of t2. The outputs of the first butterfly operation are:
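Multiplication by −j needs no multiplier, only a swap of the real and imaginary parts and one sign change. The small sketch below checks this against ordinary complex multiplication. The function name is hypothetical.

```python
def mul_neg_j(z: complex) -> complex:
    """-j * (a + jb) = b - ja: swap real and imaginary parts, negate the new imaginary part."""
    return complex(z.imag, -z.real)


# mul_neg_j(3 + 4j) -> (4-3j)
```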
for each of the inputs to the second butterfly operation. The symbol
represents an operation to shift the inputs to the right by one bit to perform a divide-by-two operation. The butterfly processor 126 may perform the divide-by-two operation using the scaling unit 412. The outputs of the second butterfly operation are:
which are the desired outputs according to equations (9), (10), and (11). The butterfly processor 126 may store the outputs of the butterfly operation in the buffer 124 at the locations from which the inputs were read. Therefore, each of the stage 1 DIT radix-2 butterfly block 506 through the stage M DIT radix-2 butterfly block 508 and the post-processing block 510 of the FFT block 520 may be performed as a plurality of butterfly operations by the butterfly processor 126.
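The complete real-input FFT path, consisting of packing, an N/2-point complex FFT, and post-processing, can be sketched end to end. Equations (6) through (11) are not reproduced in this text. The sketch therefore uses the standard identities for obtaining a real FFT from a half-length complex FFT:

- Xe(i) = (Y(i) + Y*(N/2−i))/2
- Xo(i) = −j(Y(i) − Y*(N/2−i))/2
- X(i) = Xe(i) + W_N^i Xo(i)
- X(i+N/2) = Xe(i) − W_N^i Xo(i)

We assume, without confirmation from the text, that these correspond to the desired outputs. A compact recursive FFT stands in for stages 1 through M. The comments map the arithmetic onto the W = 1 butterfly, the −j multiplication, and the twiddled second butterfly as we understand them.

```python
import cmath


def fft(a: list[complex]) -> list[complex]:
    """Compact recursive radix-2 DIT FFT, standing in for stages 1..M."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def real_fft(x: list[float]) -> list[complex]:
    """N-point DFT of real x from one N/2-point complex FFT plus post-processing."""
    n = len(x)
    half = n // 2
    y = [complex(x[2 * k], x[2 * k + 1]) for k in range(half)]  # pack: x(2k) + j*x(2k+1)
    Y = fft(y)
    X = [0j] * n
    for i in range(half):
        yc = Y[(half - i) % half].conjugate()       # Y*(N/2 - i), index taken mod N/2
        xe = (Y[i] + yc) / 2                        # first butterfly (W = 1): sum, halved
        d = Y[i] - yc                               # first butterfly: difference
        xo = complex(d.imag, -d.real) / 2           # times -j (swap and negate), halved
        t = cmath.exp(-2j * cmath.pi * i / n) * xo  # second butterfly: W_N^i * Xo(i)
        X[i], X[i + half] = xe + t, xe - t
    return X
```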
As described above in conjunction with
The time domain output from the stage M DIT radix-2 butterfly block 508 in the IFFT block 530 may be expressed as:
One skilled in the art will recognize that equation (20) has the same form as the corresponding expression for the FFT block 520 described above, so the twiddle factors used in the FFT block 520 may also be used for the IFFT block 530. Therefore, the memory 414 need only store one set of twiddle factors for performing both FFT and IFFT operations.
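A familiar identity shows why one twiddle table can serve both directions: an inverse DFT can be computed with forward twiddles only, as x = conj(DFT(conj(X)))/N. The sketch below illustrates that generic identity with a direct DFT. The IFFT block 530 may arrange its conjugations differently, for example through the pre-processing of equation (21). The function names are hypothetical.

```python
import cmath


def dft(a: list[complex]) -> list[complex]:
    """Direct forward DFT using twiddles W_N^{ik} = exp(-j*2*pi*i*k/N)."""
    n = len(a)
    return [sum(a[k] * cmath.exp(-2j * cmath.pi * i * k / n) for k in range(n))
            for i in range(n)]


def idft_via_forward(X: list[complex]) -> list[complex]:
    """Inverse DFT computed with the forward twiddles only: conj(DFT(conj(X))) / N."""
    n = len(X)
    return [z.conjugate() / n for z in dft([v.conjugate() for v in X])]
```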
The pre-processing block 512 may perform two butterfly operations to generate:
$$Y(i) = X_e^*(i) + j\,X_o^*(i). \qquad (21)$$
From the post-processing operation, it can be seen that
with the twiddle factor W=1+j*0. Therefore, the multiplication by the twiddle factor in the first butterfly operation is simply a multiplication by one. An output of the subtraction operation, t0, may be multiplied by negative j. For example, if t0=a+jb, then q=−j*t0=b−ja. So, the multiplication by negative j simply rearranges the real and imaginary parts of t0 in q: the real part of t0 is negated and stored as the imaginary part of q, and the imaginary part of t0 is stored as the real part of q. The outputs of the first butterfly operation are:
for each of the inputs to the second butterfly operation. The symbol
represents an operation to shift the inputs to the right by one bit to perform a divide-by-two operation. The butterfly processor 126 may perform the divide-by-two operation using the scaling unit 412. The outputs of the second butterfly operation are:
which are the desired outputs according to equations (21), (24), and (25). The butterfly processor 126 may store the outputs of the butterfly operation in the buffer 124 at the locations from which the inputs were read. Therefore, the pre-processing block 512 and each of the stage 1 DIT radix-2 butterfly block 506 through the stage M DIT radix-2 butterfly block 508 of the IFFT block 530 may be performed as a plurality of butterfly operations by the butterfly processor 126.
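A round-trip sketch shows how the pre-processing of equation (21) lets a forward FFT recover real time samples. It assumes the standard split Xe(i) = (X(i) + X(i+N/2))/2 and Xo(i) = (X(i) − X(i+N/2))·W_N^{−i}/2, where W_N^{−i} is simply the conjugate of a stored twiddle. These definitions are an assumption, and the text's equations (22) through (25) may differ in scaling. Under these definitions, a forward N/2-point FFT of Y(i) = Xe*(i) + j·Xo*(i) yields (N/2)(x(2k) + j·x(2k+1)). The names `fft` and `inverse_real_fft` are hypothetical.

```python
import cmath


def fft(a: list[complex]) -> list[complex]:
    """Compact recursive radix-2 DIT FFT, standing in for stages 1..M."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]   # forward twiddle W_n^k
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def inverse_real_fft(X: list[complex]) -> list[float]:
    """Recover N real samples from their N-point spectrum X using only a
    forward N/2-point FFT, after forming Y(i) = Xe*(i) + j*Xo*(i)."""
    n = len(X)
    half = n // 2
    Y = []
    for i in range(half):
        xe = (X[i] + X[i + half]) / 2                           # spectrum of even samples
        w_conj = cmath.exp(-2j * cmath.pi * i / n).conjugate()  # W_N^{-i} = conj(W_N^i)
        xo = (X[i] - X[i + half]) * w_conj / 2                  # spectrum of odd samples
        Y.append(xe.conjugate() + 1j * xo.conjugate())          # equation (21)
    z = fft(Y)   # forward FFT gives (N/2) * (x(2k) + j*x(2k+1))
    x: list[float] = []
    for v in z:
        x.extend([v.real / half, v.imag / half])                # unpack, undo N/2 gain
    return x
```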
As shown in
$$y(n) = y(F - P + n) \qquad (36)$$
for n = 0, 1, …, P−1, where F is the frame length of a discrete multi-tone (DMT) transceiver, N is a real FFT length, P is a cyclic prefix length, and W is a number of window coefficients. As shown in
$$p = [y(n-N) + y(n)] + j\,[y(n+1-N) + y(n+1)] \qquad (39)$$
and
$$q = [y(n-N) - y(n)] + j\,[y(n+1-N) - y(n+1)]. \qquad (40)$$
The butterfly processor 126 may store the outputs of the butterfly operation in the buffer 124 at the locations from which the inputs were read. Although p is generated as part of the first butterfly operation, the output p may not be stored in the location of f.
which is the desired output according to equation (38). The output u may be stored in the location of q to restore the contents of the input g. As mentioned above, the output v is the desired result and may be stored in the location of f. Therefore the time domain windowing block 502 may also be performed as a plurality of butterfly operations by the butterfly processor 126.
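Two pieces of the windowing step can be checked in isolation: the cyclic extension of equation (36) and the first butterfly of equations (39) and (40). In the butterfly, two pairs of real samples are packed as a = y(n−N) + j·y(n+1−N) and b = y(n) + j·y(n+1), and combined with W = 1 to give p = a + b and q = a − b. The sketch assumes that the frame is held in a Python list of length F and that n ≥ N. Equations (37) and (38) are not reproduced here, so the second, coefficient-weighted butterfly is not modeled. Both function names are hypothetical.

```python
def add_cyclic_prefix(y: list[float], P: int) -> None:
    """Equation (36): y(n) = y(F - P + n) for n = 0 .. P-1, with F = len(y).
    Copies the last P samples of the frame into its first P positions, in place."""
    F = len(y)
    for n in range(P):
        y[n] = y[F - P + n]


def window_first_butterfly(y: list[float], n: int, N: int) -> tuple[complex, complex]:
    """Equations (39)-(40): pack two sample pairs as complex numbers and apply
    a butterfly with W = 1, returning (p, q) = (a + b, a - b)."""
    if n < N:
        raise ValueError("n must be at least N so that y(n - N) is in range")
    a = complex(y[n - N], y[n + 1 - N])   # y(n-N) + j*y(n+1-N)
    b = complex(y[n], y[n + 1])           # y(n)   + j*y(n+1)
    return a + b, a - b
```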
As described above, each of the time domain windowing, FFT, and IFFT operations may be performed in terms of butterfly operations by the butterfly processor 126. Each butterfly operation may be performed by the butterfly processor 126 in four clock cycles. Each butterfly operation may be preceded by two read operations to read the inputs from the buffer 124 and followed by two write operations to write the outputs to the buffer 124. When performing the FFT or the IFFT operations, the butterfly processor 126 may complete the stage 1 DIT radix-2 butterfly block 506 through the stage M DIT radix-2 butterfly block 508 in a total of 4*M*(N/4) clock cycles, that is, four clock cycles for each of the N/4 butterfly operations in each of the M stages. Also, the butterfly processor 126 may perform the bit reversal block 504 in 4*N/2 clock cycles.
Each of the pre-processing block 512, the post-processing block 510, and the time domain windowing block 502 may be performed in two butterfly operations. The butterfly processor 126 may perform each of the pre-processing block 512, the post-processing block 510, and the time domain windowing block 502 in 4*2*(N/4) clock cycles. Therefore each of the processing sequences depicted in
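The cycle counts stated above can be tabulated for a given N. The sketch applies the text's formulas as written: 4*M*(N/4) for the butterfly stages with M = log2(N/2), 4*N/2 for bit reversal, and 4*2*(N/4) each for pre-processing, post-processing, and windowing. The function name and dictionary keys are hypothetical. Summing the entries for the blocks of a given processing sequence gives that sequence's total.

```python
def cycle_counts(N: int) -> dict[str, int]:
    """Clock-cycle counts from the text for N real samples (N a power of two)."""
    if N < 4 or N & (N - 1):
        raise ValueError("N must be a power of two, at least 4")
    M = (N // 2).bit_length() - 1                    # log2(N/2)
    return {
        "butterfly_stages": 4 * M * (N // 4),        # 4*M*(N/4)
        "bit_reversal": 4 * (N // 2),                # 4*N/2
        "pre_or_post_processing": 4 * 2 * (N // 4),  # 4*2*(N/4)
        "windowing": 4 * 2 * (N // 4),               # 4*2*(N/4)
    }


print(cycle_counts(4096))
# {'butterfly_stages': 45056, 'bit_reversal': 8192, 'pre_or_post_processing': 8192, 'windowing': 8192}
```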
In an implementation of the butterfly processor 126, such as that shown in
The butterfly processor 126 may be implemented in 90 nm, 1.1 V CMOS technology to perform 64- to 4096-point FFT/IFFT/windowing operations within around 183 µs and consume around 19.8 mW of dynamic power for the largest size. The butterfly processor 126 may be implemented into the physical layer blocks of a VDSL2 transceiver or other communication device and occupy an area of 0.38 mm². Therefore, the architecture of the butterfly processor 126 may be comparable to that of other known architectures and may match the throughput of pipelined architectures at the same latency.
While several embodiments have been provided in the present disclosure, it should be understood that the disclosed systems and methods may be embodied in many other specific forms without departing from the spirit or scope of the present disclosure. The present examples are to be considered as illustrative and not restrictive, and the intention is not to be limited to the details given herein. For example, the various elements or components may be combined or integrated in another system or certain features may be omitted, or not implemented. For example, while only a single butterfly processor 126 is shown in the implementation of
Also, techniques, systems, subsystems and methods described and illustrated in the various embodiments as discrete or separate may be combined or integrated with other systems, modules, techniques, or methods without departing from the scope of the disclosure. Other items shown or discussed as directly coupled or communicating with each other may be coupled through some interface or device, such that the items may no longer be considered directly coupled to each other but may still be indirectly coupled and in communication, whether electrically, mechanically, or otherwise with one another. Other examples of changes, substitutions, and alterations are ascertainable by one skilled in the art and could be made without departing from the spirit and scope disclosed herein.
This application claims priority to U.S. Provisional Application No. 60/825,672, filed Sep. 14, 2006, entitled “64-4096 Point FFT/IFFT/Windowing Processor for Multi-Standard ADSL/VDSL Applications,” which is incorporated by reference herein as if reproduced in full below.
Number | Date | Country
---|---|---
60825672 | Sep 2006 | US