The present disclosure is generally related to digital signal processing systems configured to perform fast Fourier transforms (FFTs), and more particularly to a low-complexity input/output pruning FFT.
Input pruning FFTs are commonly used in zero-padded FFT processing, an up-sampling technique in digital signal processing that extends a signal (or spectrum) with zeros. Extending a spectrum with zeros increases the time sampling density, which is commonly known as time-domain interpolation; likewise, extending a signal with zeros forces the FFT algorithm to sample the spectrum at smaller frequency intervals.
FFT algorithms are used in digital signal processing to break down complex signals into elementary components, with the transform length N decomposed into arbitrary factors (N = r1 × r2 × . . . × rk). An input pruning FFT is an FFT whose efficiency is increased by removing operations on input values that are zero. An output pruning FFT is a method of computing a discrete Fourier transform (DFT) when only a subset of the outputs is needed. The embodiments described herein provide a generalized radix-r input/output pruning FFT that efficiently computes selected spectrum bins of a sequence of size N that contains M consecutive non-zero input points and from which only Lo outputs are desired.
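The effect of zero padding on frequency sampling can be illustrated with a minimal sketch. The helper `dft` and the sample values are hypothetical and chosen only for illustration; a naive DFT is used instead of an FFT so that the example stays self-contained.

```python
import cmath

def dft(x):
    """Naive O(N^2) DFT: X(k) = sum_n x(n) * exp(-2j*pi*n*k/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

signal = [1.0, 2.0, 0.5, -1.0]      # 4 non-zero samples (illustrative values)
padded = signal + [0.0] * 12        # the same signal extended with zeros to 16 points

X_short = dft(signal)               # spectrum sampled at 4 frequencies
X_padded = dft(padded)              # same spectrum sampled at 16 frequencies

# Every 4th bin of the padded transform coincides with a bin of the short
# transform; the bins in between are the additional, finer frequency samples.
step = len(padded) // len(signal)
max_err = max(abs(X_padded[step * k] - X_short[k]) for k in range(len(signal)))
print(f"max difference on shared bins: {max_err:.2e}")
```

Most inputs of the padded transform are zero, which is the situation an input pruning FFT is intended to exploit.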
In some embodiments, a circuit may include an input configured to receive a signal and a radix-r input/output pruning fast Fourier transform (FFT) processing element coupled to the input. The radix-r input/output pruning FFT processing element may be configured to remove FFT operations on input values of zero within the signal and to determine a discrete Fourier Transform (DFT) output having fewer output values than a number of input values of the signal.
In the following discussion, the same reference numbers are used in the various embodiments to indicate the same or similar elements.
Theoretical treatments of the pruning FFT (PFFT) have mainly concentrated on sequences that have Li consecutive non-zero input points at the beginning. Many applications, such as Orthogonal Frequency-Division Multiplexing (OFDM)-based cognitive radio, may utilize FFT pruning in which zeros are applied efficiently to inputs with arbitrary distributions.
In many applications, the percentage of required input/output bins may be very small. For instance, in Third Generation Partnership Project (3GPP) Long Term Evolution (LTE), where the Orthogonal Frequency-Division Multiple Access (OFDMA) symbol size is 1024 and 12 users equally share the available 600 sub-carriers, only fifty of the 1024 FFT output bins (4.88%) may be used for each mobile terminal. These partial input/output cases are important for future wireless systems because the pruning FFT (PFFT) can potentially achieve a significant speed improvement, which is desirable for a wide variety of applications such as OFDMA cognitive radio, very long instruction word (VLIW) digital signal processing (DSP) for mobile applications, multi-channel OFDM systems, Multiple Input Multiple Output Orthogonal Frequency-Division Multiplexing (MIMO-OFDM) systems, and other applications.
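The LTE figures quoted above can be checked with a few lines of arithmetic. The variable names are hypothetical; the numbers are the ones given in the text.

```python
fft_size = 1024      # OFDMA symbol size
subcarriers = 600    # available sub-carriers
users = 12           # users sharing the sub-carriers equally

bins_per_user = subcarriers // users          # output bins needed per terminal
fraction = bins_per_user / fft_size           # share of the full FFT output
print(bins_per_user, f"{100 * fraction:.2f}%")
```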
Embodiments of a generalized radix-r input/output pruning FFT are described below that can be used to efficiently determine selected spectrum bins of a sequence of input values of size N that contains M consecutive non-zero input points and from which only Lo outputs are desired. In certain embodiments, the FFT may be used to compute a discrete Fourier transform (DFT) where only a subset of the outputs is needed: for a transform of size M that has been zero-padded to a size N, and where only Lo≤P consecutive outputs of the size-N transform are desired, the FFT can produce P consecutive outputs determined from the Li consecutive non-zero inputs, where it is assumed that Lo/Dip≈P=M/Dop. Other embodiments are also possible.
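As a point of reference for the quantities N, Li, and Lo, the following sketch evaluates a few consecutive output bins directly from the leading non-zero inputs. It is a brute-force baseline under stated assumptions, not the radix-r algorithm of this disclosure; the function name `pruned_dft` and the test values are hypothetical.

```python
import cmath

def dft(x):
    """Naive full DFT used only as a reference."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def pruned_dft(nonzero, N, first_bin, num_bins):
    """Directly evaluate num_bins consecutive bins of an N-point DFT whose only
    non-zero inputs are nonzero[0..Li-1], located at positions 0..Li-1."""
    def w(e):
        return cmath.exp(-2j * cmath.pi * (e % N) / N)
    return [sum(v * w(n * k) for n, v in enumerate(nonzero))
            for k in range(first_bin, first_bin + num_bins)]

N, Li, Lo = 32, 4, 3
nonzero = [1.0, -2.0, 3.0, 0.5]

full = dft(nonzero + [0.0] * (N - Li))                     # N*N multiply-adds
part = pruned_dft(nonzero, N, first_bin=5, num_bins=Lo)    # Li*Lo multiply-adds
err = max(abs(a - b) for a, b in zip(part, full[5:5 + Lo]))
print(f"max difference from the full transform: {err:.2e}")
```

The direct evaluation skips both the zero inputs and the unwanted outputs, at a cost proportional to Li times Lo; a pruning FFT aims to reach the same outputs with fewer operations when Li and Lo are not very small.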
It should be appreciated that the generalized radix-r input/output pruning FFT disclosed herein may be implemented in a digital signal processor executing instructions, in a field-programmable gate array circuit, or in other hardware or software. Further, the operation or execution of the operations defined by the generalized radix-r input/output pruning FFT may improve the speed of the processing of the input data values, while reducing the number of operations and improving the overall efficiency of the circuit. Further, the resulting FFT output values may be used in a variety of contexts, including image processing, audio processing, encryption and decryption, and so on. One possible implementation of the FFT algorithm in a digital signal processor is described below with respect to
In some embodiments, the DSP 102 may be configured to execute instructions including a generalized radix-r input/output pruning FFT 108, which may be configured to reduce the overall number of computations and memory accesses needed to determine the ordered FFT output. The FFT 108 is depicted as a block within the DSP 102 to indicate that the instructions were previously loaded by the DSP 102; however, the FFT 108 may be stored as instructions within a memory (read-only memory, random access memory, solid state memory, hard disc device, or any combination thereof) and may be loaded and executed by the DSP 102 as needed.
The basis of the radix-2 FFT is that a DFT can be divided into two smaller DFTs, each of which can be divided into two smaller DFTs, and so on, resulting in a combination of two points DFTs. Several methods can be used repeatedly to split the DFTs into smaller (two or four-point) core calculations. By appropriately breaking the DFT into partial DFTs in this way, the number of multiplications and the number of stages of the DFT calculation may be controlled. The number of stages often corresponds to the amount of global communication and/or memory accesses, and thus, a reduction in the number of stages is beneficial (in terms of speed and complexity).
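The recursive splitting described above can be sketched as a textbook radix-2 decimation-in-time FFT. This is a generic illustration, not the pruned algorithm of this disclosure; the function names are hypothetical.

```python
import cmath

def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    N = len(x)
    if N == 1:
        return [complex(x[0])]
    even = fft_radix2(x[0::2])          # DFT of the even-indexed samples
    odd = fft_radix2(x[1::2])           # DFT of the odd-indexed samples
    out = [0j] * N
    for k in range(N // 2):
        t = cmath.exp(-2j * cmath.pi * k / N) * odd[k]   # twiddle multiplication
        out[k] = even[k] + t            # two-point butterfly: sum
        out[k + N // 2] = even[k] - t   # two-point butterfly: difference
    return out

def dft(x):
    """Naive DFT used only to check the result."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

x = [1, 2, 3, 4, 0, -1, -2, -3]
err = max(abs(a - b) for a, b in zip(fft_radix2(x), dft(x)))
print(f"max difference from the naive DFT: {err:.2e}")
```

Each level of recursion corresponds to one stage of the calculation, so an 8-point transform uses three stages of two-point butterflies.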
The DSP 102 of
The generalized radix-r pruned FFT can formulate the radix-r as composed engines with identical structures and a systematic means of accessing the corresponding multiplier coefficients. This formulation may enable the design of an engine with the lowest rate of complex multipliers and adders, which utilizes r or r−1 complex multipliers in parallel to implement each of the butterfly computations. There can be a simple mapping from the three indices (FFT stage, butterfly, and element) to the addresses of the multiplier coefficients needed in the DFT computation.
The DFT computation may be expressed as follows:
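$$X(k) = \sum_{n=0}^{N-1} x(n)\, w_N^{nk} \quad \text{for } k = 0, 1, \ldots, N-1, \qquad \text{(Equation 1)}$$
where $w_N = e^{-j 2\pi / N}$ is the twiddle factor.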
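Consider that the number Li of consecutive input elements that can be different from zero satisfies Li≤M=N/Dip, where the variable N represents the number of input points and the variable Dip represents a threshold value. Equation 1 could then be factorized as follows:
$$X(k_1 + D_{ip} k_2) = \sum_{n_2=0}^{D_{ip}-1} \sum_{n_1=0}^{M-1} x(n_1 + M n_2)\, w_N^{(n_1 + M n_2)(k_1 + D_{ip} k_2)}, \qquad \text{(Equation 2)}$$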
with the following identities:
$$n = n_1 + M n_2, \qquad k = k_1 + D_{ip} k_2,$$
$$n_1 = 0, 1, \ldots, M-1, \qquad k_1 = 0, 1, \ldots, D_{ip}-1,$$
$$n_2 = 0, 1, \ldots, D_{ip}-1, \qquad k_2 = 0, 1, \ldots, M-1. \qquad \text{(Equation 3)}$$
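The index identities above map the pairs (n1, n2) and (k1, k2) onto the full ranges of n and k. A minimal check, assuming illustrative values N = 16 and Dip = 4 (the names `D_ip`, `n_values`, and `k_values` are hypothetical):

```python
N, D_ip = 16, 4
M = N // D_ip                      # M = N / Dip

# n = n1 + M*n2 with n1 in [0, M) and n2 in [0, Dip)
n_values = sorted(n1 + M * n2 for n1 in range(M) for n2 in range(D_ip))

# k = k1 + Dip*k2 with k1 in [0, Dip) and k2 in [0, M)
k_values = sorted(k1 + D_ip * k2 for k1 in range(D_ip) for k2 in range(M))

# Each mapping visits every index 0..N-1 exactly once.
print(n_values == list(range(N)), k_values == list(range(N)))
```

With n2 fixed at zero, n runs only over 0 to M−1, which corresponds to the case of non-zero inputs located at the beginning of the sequence.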
The indices n2 may determine the position of the nonzero consecutive inputs into the sequence where zeroes have been applied efficiently to the inputs with arbitrary distributions. With respect to input data sequences that have Li consecutive non-zero input points at the beginning, the index n2 can be set to zero. As a result, Equation 3 may be rewritten as follows:
$$X(k_1 + D_{ip} k_2) = \sum_{n=0}^{M-1} x(n)\, w_N^{n (k_1 + D_{ip} k_2)}. \qquad \text{(Equation 4)}$$
The computation of Equation 4 can be performed in two ways. The following paragraphs compare these two methods.
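A first method may be referred to as the “direct way” or “direct method” in which Equation 3 can be expressed as follows:
$$X(k_1 + D_{ip} k_2) = \sum_{n=0}^{M-1} x(n)\, w_N^{n k_1}\, w_M^{n k_2} = \sum_{n=0}^{M-1} y(n)\, w_M^{n k_2}, \qquad \text{(Equation 5)}$$
where y(n) can be expressed as follows:
$$y(n) = x(n)\, w_N^{n k_1}. \qquad \text{(Equation 6)}$$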
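The direct method can be sketched numerically under the assumption that it amounts to the following: for each k1, the M leading inputs are multiplied by the twiddle factors w_N^(n·k1) to form y(n), and an M-point transform of y(n) over k2 yields the outputs X(k1 + Dip·k2). The naive `dft` helper stands in for the size-M FFT, and all names and values are hypothetical.

```python
import cmath

def dft(x):
    """Naive DFT; stands in for an FFT of the same size."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

N, D_ip = 16, 4
M = N // D_ip
x = [1.0, -0.5, 2.0, 0.25]                  # the M leading non-zero inputs

reference = dft(x + [0.0] * (N - M))        # full N-point transform

X = [0j] * N
for k1 in range(D_ip):
    # y(n) = x(n) * w_N^(n*k1): pre-multiplication by the twiddle factors
    y = [x[n] * cmath.exp(-2j * cmath.pi * n * k1 / N) for n in range(M)]
    Y = dft(y)                              # M-point transform over k2
    for k2 in range(M):
        X[k1 + D_ip * k2] = Y[k2]

err = max(abs(a - b) for a, b in zip(X, reference))
print(f"max difference from the full transform: {err:.2e}")
```

In this sketch the full spectrum is obtained from Dip transforms of size M plus M twiddle multiplications per transform, without touching any of the zero inputs.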
The computational complexity of Equation 4 can be determined as follows:
tc-input = Dip M(tcFFT
where the variable tcFFT represents the complexity of the FFT algorithm of size M, and the variable tcm is the complexity of the complex multiplier.
Logically, the optimal solution of Equation 5 may be obtained by optimizing the complexity of the FFT algorithm; conventional research, however, has been oriented toward optimizing the FFT algorithm rather than its complexity. According to embodiments of the present disclosure, Equation 5 could be optimized by incorporating the twiddle factors wNnk
In the following discussion, this simplification is demonstrated relative to a radix-2 FFT in which the split-radix algorithm has been excluded. The complexity of the Cooley-Tukey (radix-2 FFT) algorithm in terms of complex multiplications can be determined as follows:
where the variable SM=log2 M. On the other hand, each radix-2 butterfly may require two complex additions/subtractions. As a result, the total number tca/s of complex additions/subtractions in the DIT process may be determined as follows:
It should be appreciated that each complex multiplication can require six arithmetic operations, and each addition/subtraction can require two arithmetic operations. Therefore, the total number to of arithmetic operations in the DIT Cooley-Tukey algorithm can be estimated as follows:
Accordingly, the total amount of arithmetic operations required to compute the input pruning FFT can be determined as follows:
$$t_{c\text{-input-Pruning\_Medina}} = D_{ip} M (4 M S_M + M + 6) = N (4 M \log_2 M - 3M + 6). \qquad \text{(Equation 11)}$$
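The per-operation costs stated above (six arithmetic operations per complex multiplication, two per complex addition/subtraction, and two additions/subtractions per radix-2 butterfly) can be turned into a rough textbook estimate for a size-M radix-2 FFT. The sketch assumes one twiddle multiplication per butterfly with no savings for trivial twiddle factors, so it is not a reproduction of the equations of this disclosure; the function name is hypothetical.

```python
def radix2_operation_counts(M):
    """Rough operation counts for a size-M radix-2 FFT (M a power of two)."""
    stages = M.bit_length() - 1            # S_M = log2(M)
    butterflies = (M // 2) * stages        # M/2 butterflies per stage
    complex_mults = butterflies            # assumption: one twiddle multiply each
    complex_adds = 2 * butterflies         # two additions/subtractions each
    real_ops = 6 * complex_mults + 2 * complex_adds
    return complex_mults, complex_adds, real_ops

for size in (64, 1024):
    print(size, radix2_operation_counts(size))
```

Under these assumptions the total reduces to 5·M·log2 M arithmetic operations; counts that exclude trivial multiplications, as in the expressions of this section, will be lower.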
According to some embodiments, the low-complexity input/output (I/O) pruning FFT utilizes radix-r DFT factorization, and Equation 5 can be re-written as follows:
In the summations, the variables r, k1 and k2 are independent of the variable n2. The variable wMrk
To subdivide the axis k2 in Equation 13 in two new axes (v and l), it is assumed that k2=v+lV with v=0, 1, . . . , V−1 and l=0, 1, . . . , r−1 where V=M/r. Therefore, the variable X(k
Considering that $w_V^{\alpha V} = (w_V^{V})^{\alpha} = 1^{\alpha} = 1$ and $V = M/r$, Equation 14 could be expressed as follows:
The first matrix is the well-known adder tree matrix Tr, and the second matrix is known as the input pruning FFT twiddle factor matrix WN. Equation 16 can be expressed in a compact form as follows:
for k1=0, 1, . . . , Dip−1 and v=0, 1, . . . , V−1, where the variable X(k
Further, the variable (WN) can be expressed as follows:
WN=diag(wND
and the matrix TM can be expressed as follows:
In terms of the digital signal processor, the factorization of the FFT can be interpreted as a dataflow that depicts the arithmetic operations and their dependencies. When Equation 16 is read from left to right, the decimation in frequency (DIF) algorithm can be obtained. When Equation 16 is read from right to left, the decimation in time (DIT) algorithm can be determined. It should be noted that the DIF algorithm may require one shuffling stage in order to obtain ordered output data.
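The need for a shuffling stage in a DIF algorithm can be illustrated with a generic radix-2 sketch, in which the in-place DIF butterflies leave the outputs in bit-reversed order and a final permutation restores natural order. This is a textbook radix-2 example rather than the radix-r formulation of this disclosure; all function names are hypothetical.

```python
import cmath

def fft_dif_unordered(x):
    """Iterative radix-2 DIF FFT; outputs are left in bit-reversed order."""
    a = [complex(v) for v in x]
    N = len(a)
    span = N // 2
    while span >= 1:
        for start in range(0, N, 2 * span):
            for j in range(span):
                w = cmath.exp(-2j * cmath.pi * j / (2 * span))
                u, v = a[start + j], a[start + j + span]
                a[start + j] = u + v                # upper butterfly output
                a[start + j + span] = (u - v) * w   # lower output, then twiddle
        span //= 2
    return a

def bit_reverse_shuffle(a):
    """Shuffling stage: move element i to the bit-reversed index of i."""
    N = len(a)
    bits = N.bit_length() - 1
    out = [0j] * N
    for i in range(N):
        out[int(format(i, f"0{bits}b")[::-1], 2)] = a[i]
    return out

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

x = [1, 2, 3, 4, 0, -1, -2, -3]
ordered = bit_reverse_shuffle(fft_dif_unordered(x))
err = max(abs(a - b) for a, b in zip(ordered, dft(x)))
print(f"max difference after shuffling: {err:.2e}")
```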
The write address generator (WAG) can be determined as follows:
WAG=lV+v. (Equation 21)
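The write address expression of Equation 21 can be tabulated for a small case. The sketch assumes V = M/r words, v = 0, 1, . . . , V−1 and l = 0, 1, . . . , r−1, as defined earlier in this section; the function name `write_addresses` and the sizes are hypothetical.

```python
def write_addresses(M, r):
    """Return, for each butterfly index v, the r write addresses l*V + v."""
    V = M // r
    return [[l * V + v for l in range(r)] for v in range(V)]

table = write_addresses(M=16, r=4)
for v, addresses in enumerate(table):
    print(v, addresses)

# Taken together, the addresses cover 0..M-1 exactly once.
flat = sorted(a for row in table for a in row)
```

Each butterfly writes its r outputs V words apart, so no two butterflies write to the same location.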
The read address generator (RAG) can be determined as follows:
Further, the DIT Coefficient address generator (CAG) can be determined as follows:
where the notation [[x]]N can represent the operation x modulo N; the notation ⌊x⌋ may represent the integer part of x; the indices are v=0, 1, . . . , V−1; the variable s=0, 1, . . . , SM; the variable r is the radix; the variable V is the number of words
and the variable SM is the number of stages (SM=logr M−1). Accordingly, the low-complexity I/O pruning FFT can be determined as follows:
X(k1+Dip×k2) = X(k1+Dip×k2, s+1, v, w) = Σn=0r-1[X(k
The computational complexity of the algorithm of Equation 24 may be similar to the complexity of the DIT Cooley-Tukey algorithm. Therefore, the complexity of Equation 17 in terms of arithmetic operations for Equation 24 can be expressed as follows:
$$t_{c\text{-input-IPJMFFT}} = D_{ip} M (4 M S_M) = N (4 M S_M) = N (4 M \log_2 M). \qquad \text{(Equation 25)}$$
For real value arithmetic operations, the complexity ratio between the low-complexity I/O pruning FFT and a conventional input pruning FFT as taught by Medina-Melendrez, et al. (M. Medina-Melendrez, M. Arias-Estrada and A. Castro, “Input and/or Output Pruning of Composite Length FFTs Using a DIF-DIT Transform Decomposition”, IEEE Transactions on Signal Processing, Vol. 57, No. 10, pp. 4124-4128, October 2009) (hereinafter “Medina”) can be determined as follows:
This ratio is sketched in
$$t_{c\text{-Input/Output-JMIOPFFT}} = t_{cm}(L_o D_{op}) + D_{ip} D_{op}\, FFT_P + t_{ca/s}(D_{ip} D_{op} + 2 L_o D_{op}), \qquad \text{(Equation 27)}$$
Therefore, the complexity of the generalized radix-r I/O pruning FFT algorithm can be determined as follows:
The complexity comparison between the generalized radix-r I/O pruning FFT and the Medina FFT reveals that both methods have the same complexity for Li=8192. Further, the graph 200 reveals that, for Li=307, the complexity of the generalized radix-r I/O pruning FFT algorithm is approximately equivalent to the Medina FFT algorithm for Li=33 for a number of outputs greater than 28. The operation's reduction ratio between the generalized radix-r I/O pruning FFT algorithm and the Medina FFT is presented in
It should be appreciated from the graphs 500 and 600 of
According to Equation 33, it seems that the output stage would be costly in implementation for Lo>Dip. The direct method and the 2 BF filtering method proposed by Sorensen, et al. (H. V. Sorensen and C. S. Burrus, “Efficient computation of the DFT with only a subset of input or output points,” IEEE Transactions on Signal Processing, vol. 41, no. 3, pp. 1184-1199, March 1993.) can be combined for 1<Dop≤4 in order to achieve a gain estimated at 30% for large N, which can be weighed against the loss in precision as shown in
It should be appreciated that the code example provided in
In conjunction with the systems, circuits, devices, and methods described above with respect to
In some implementations, an efficient radix-r input pruning FFT is disclosed that can reduce the complexity and the computational effort required to produce the FFT outputs as compared to conventional approaches. Further, this approach can be applied to sequences that have Li (a multiple of the FFT radix r) consecutive non-zero input points at any position n (where n is a multiple of the FFT radix r) within the sequence, and not necessarily at the beginning. It is an indispensable tool for orthogonal frequency division multiplexing (OFDM)-based cognitive radio, which will be mainly based on FFT pruning that efficiently applies zeros to the inputs with arbitrary distributions within the sequence.
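One way to see why a block of non-zero inputs at an arbitrary position can be handled like a block at the beginning is the standard DFT shift property: moving the block by n0 samples multiplies bin k by w_N^(n0·k). The sketch below only illustrates that property and is not asserted to be the mechanism used in this disclosure; the names and values are hypothetical.

```python
import cmath

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

N, Li, n0 = 16, 4, 8                 # Li non-zero points starting at position n0
block = [1.0, 2.0, -1.0, 0.5]

x = [0.0] * N
x[n0:n0 + Li] = block                # block placed in the middle of the sequence
reference = dft(x)

# Transform of the same block placed at the beginning, then a per-bin rotation.
leading = dft(block + [0.0] * (N - Li))
shifted = [cmath.exp(-2j * cmath.pi * ((n0 * k) % N) / N) * leading[k]
           for k in range(N)]

err = max(abs(a - b) for a, b in zip(shifted, reference))
print(f"max difference from the direct transform: {err:.2e}")
```

Here both Li and n0 are multiples of 2 and 4, matching the stated condition for a radix-2 or radix-4 decomposition.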
Implementations that may be used within the scope of the present disclosure may be illustrated by way of the following clauses:
Clause 1: A circuit comprises an input configured to receive a signal including a plurality of input values and a radix-r input/output pruning fast Fourier transform (FFT) processing element coupled to the input. The radix-r input/output pruning FFT processing element prunes FFT operations related to a subset of the plurality of input values having a value of zero and determines, based on others of the plurality of input values, discrete Fourier Transform (DFT) output having fewer output values than a number of the plurality of input values.
Clause 2: The circuit of clause 1, wherein the circuit determines, from the signal, a sequence of the input values that includes a number of consecutive non-zero input points to determine the DFT output having a selected number of output values.
Clause 3: The circuit of any of the preceding clauses, wherein the selected number of output values is less than a total number of input values of the signal.
Clause 4: The circuit of any of the preceding clauses, wherein the radix-r input/output pruning FFT processing element is configured to provide a number (r) of complex multipliers in parallel to implement each of a plurality of butterfly computations of the FFT operations.
Clause 5: The circuit of any of the preceding clauses, wherein the plurality of input values includes a number (M) of consecutive non-zero input values.
Clause 6: The circuit of any of the preceding clauses, wherein the radix-r input/output pruning FFT processing element is configured to determine the subset of the plurality of input values from the number (M) of the consecutive non-zero input values.
Clause 7: The circuit of any of the preceding clauses, wherein the radix-r input/output pruning FFT processing element incorporates twiddle factors and adder tree matrices of the FFT operations into a single stage.
Clause 8: A method comprises receiving a plurality of input values and determining a subset of the plurality of input values having non-zero values. The method further includes determining a discrete Fourier Transform (DFT) output based on the subset of the plurality of input values using a radix-r input/output pruning fast Fourier transform (FFT) and providing the DFT output including a plurality of output values to an output interface.
Clause 9: The method of clause 8, wherein a number of the plurality of output values is less than a number of the plurality of input values.
Clause 10: The method of clause 8, further comprising determining a sequence of the plurality of input values that includes a number of consecutive non-zero input points.
Clause 11: The method of clause 10, wherein the DFT output is determined from the sequence of the plurality of input values.
Clause 12: The method of clause 8, further comprising providing a number (r) of complex multipliers in parallel to determine a plurality of butterfly computations of the FFT operations.
Clause 13: The method of clause 8, wherein the plurality of input values includes a number (M) of consecutive non-zero input values.
Clause 14: The method of clause 13, further comprising determining the subset of the plurality of input values based on the number (M) of the consecutive non-zero input values.
Clause 15: The method of clause 8, further comprising incorporating twiddle factors and adder tree matrices of the FFT into a single stage.
Clause 16: A circuit comprises an input interface to receive a signal including a plurality of input values, an output interface to provide a discrete Fourier Transform (DFT) output including a plurality of output values, and a processing element to perform radix-r input/output pruning fast Fourier transform (FFT) operations on the plurality of input values to produce the DFT output. The processing element prunes FFT operations related to a subset of the plurality of input values having a value of zero and determines, based on others of the plurality of input values, the plurality of output values comprising the DFT output having fewer values than a number of the plurality of input values.
Clause 17: The circuit of clause 16, wherein the processing element determines the DFT output based on a number of consecutive non-zero input values.
Clause 18: The circuit of any of the clauses 16 through 17, wherein a number of the plurality of output values is less than a number of the plurality of input values.
Clause 19: The circuit of any of the clauses 16 through 18 wherein the processing element provides a number (r) of complex multipliers in parallel to implement each of a plurality of butterfly computations of the FFT operations.
Clause 20: The circuit of any of the clauses 16 through 19 wherein the plurality of input values includes a number (M) of consecutive non-zero input values and the processing element determines the subset of the plurality of input values from the number (M) of the consecutive non-zero input values.
Clause 21: The circuit of any of the clauses 16 through 20, wherein the processing element incorporates twiddle factors and adder tree matrices of the FFT operations into a single stage.
Although the present invention has been described with reference to preferred embodiments, workers skilled in the art will recognize that changes may be made in form and detail without departing from the scope of the invention.
This application is a non-provisional of and claims priority to U.S. Provisional Patent Application No. 62/686,453 filed on Jun. 18, 2018 and entitled “Processor and Methods Configured to Provide a Low-Complexity Input/Output Pruning Fast Fourier Transform”, which is incorporated herein by reference in its entirety.
Number | Date | Country
---|---|---
62686453 | Jun 2018 | US