1. Field of the Invention
Embodiments of the present invention relate to a fast Fourier transform architecture. More particularly, embodiments of the present invention relate to a system for calculating a fast Fourier transform that utilizes a split-radix tree-based architecture.
2. Description of the Related Art
The calculation of the discrete Fourier transform (DFT) involves many repetitive calculations. Cooley and Tukey realized this fact and developed an algorithm to significantly reduce the number of calculations required to compute the DFT. This algorithm became known as the fast Fourier transform (FFT). Implementations of the FFT usually include one or more processing elements to compute the FFT in stages, wherein the processing elements are generally implemented with a fixed-radix architecture, such as radix-2 or radix-4. The FFT typically operates on N points of data, where N is a power of 2, e.g., 2, 4, 8, or 16. Often, the fixed-radix architecture requires that N/r processors, where r is the radix, complete their calculations for every stage, wherein the total number of processors may depend on the number of stages. Furthermore, an entire stage of calculations for the N points of data is usually required to be complete before the next stage of calculations can begin. This type of architecture might not lend itself to implementation among distributed calculation resources, where the calculations for data of size less than N might be more easily performed on discrete components.
Embodiments of the present invention solve the above-mentioned problems and provide a distinct advance in the art of calculating the fast Fourier transform. More particularly, embodiments of the invention provide a method and system for calculating the fast Fourier transform that utilize a split-radix tree-based architecture that may be implemented on multiple field programmable gate arrays.
A fast Fourier transform (FFT) computation system constructed in accordance with various embodiments of the current invention may comprise a plurality of field programmable gate arrays (FPGAs), a plurality of initial calculations modules, a plurality of butterfly modules, a plurality of external interfaces, and a plurality of FPGA interfaces. The FPGAs may include a plurality of configurable logic elements that may be configured to perform mathematical calculations for the FFT. The initial calculations modules may be formed from the configurable logic elements and may be implemented according to a split-radix tree architecture that includes a plurality of interconnected nodes. The initial calculations modules may perform the initial split-radix calculations of the FFT. The butterfly modules may be formed from the configurable logic elements and may be implemented according to the split-radix tree architecture to perform at least a portion of the FFT computation in an order that corresponds to the connection of the nodes of the split-radix tree architecture. The FPGA interfaces are included in each FPGA and allow communication between the FPGAs. The external interfaces are also included in each FPGA and allow communication with one or more external devices in order to receive data which requires an FFT computation and to transmit the FFT computation results.
A method in accordance with various embodiments of the current invention may comprise creating a split-radix tree architecture to accommodate a number of points for an FFT computation. A number of interconnected nodes are created within the tree architecture, wherein each node represents a plurality of mathematical calculations that compute at least a portion of the FFT. The connection of the nodes determines the order of the calculations. The tree architecture includes a plurality of leaf nodes, a plurality of branch nodes, and a single root node. Resources are allocated to compute the FFT among a plurality of FPGAs. The FFT computation is performed according to the tree architecture wherein the calculations associated with the leaf nodes are performed before the calculations associated with the branch nodes which are performed before the calculations associated with the root node.
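By way of illustration only, the tree architecture and the leaf-before-branch-before-root ordering described above may be sketched in software. The sketch below assumes the conventional split-radix decomposition, in which an N-point node has one N/2-point child and two N/4-point children, and assumes that nodes of size 2 or 4 are leaf nodes. The names Node, build_tree, and schedule are hypothetical and are not taken from any embodiment.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One node of a split-radix tree; `size` is the FFT length it computes."""
    size: int
    children: List["Node"] = field(default_factory=list)

    @property
    def is_leaf(self) -> bool:
        return not self.children


def build_tree(n: int) -> Node:
    """Split an n-point FFT into one n/2-point and two n/4-point sub-FFTs.

    Assumption: nodes of size 2 or 4 are leaves and are computed directly.
    """
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of 2 and at least 2")
    if n <= 4:
        return Node(n)
    return Node(n, [build_tree(n // 2), build_tree(n // 4), build_tree(n // 4)])


def schedule(root: Node) -> List[int]:
    """Post-order walk: every child node is listed before its parent."""
    order: List[int] = []

    def visit(node: Node) -> None:
        for child in node.children:
            visit(child)
        order.append(node.size)

    visit(root)
    return order


def leaf_sizes(root: Node) -> List[int]:
    """Sizes of all leaf nodes, left to right."""
    if root.is_leaf:
        return [root.size]
    return [s for child in root.children for s in leaf_sizes(child)]
```

For example, schedule(build_tree(8)) lists the node sizes 4, 2, 2, 8, so that the three leaf nodes are computed before the root node. The leaf sizes of any such tree sum to N, since every input point enters exactly one leaf node.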
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Other aspects and advantages of the present invention will be apparent from the following detailed description of the embodiments and the accompanying drawing figures.
Embodiments of the present invention are described in detail below with reference to the attached drawing figures, wherein:
The drawing figures do not limit the present invention to the specific embodiments disclosed and described herein. The drawings are not necessarily to scale, emphasis instead being placed upon clearly illustrating the principles of the invention.
The following detailed description of the invention references the accompanying drawings that illustrate specific embodiments in which the invention can be practiced. The embodiments are intended to describe aspects of the invention in sufficient detail to enable those skilled in the art to practice the invention. Other embodiments can be utilized and changes can be made without departing from the scope of the present invention. The following detailed description is, therefore, not to be taken in a limiting sense. The scope of the present invention is defined only by the appended claims, along with the full scope of equivalents to which such claims are entitled.
A discrete Fourier transform (DFT) converts a sampled time-domain data stream into a frequency-domain representation of the data stream. The DFT is utilized for applications such as spectral analysis, where it is desired to know the frequency components of a signal, such as an audio signal, a video signal, or a signal derived from naturally occurring phenomena. The DFT computation includes many repetitive calculations which may be more efficiently computed using an algorithm known as a fast Fourier transform (FFT). The FFT is generally performed on points of data, that is, data sampled from a signal at regular time intervals. The number of points, N, is usually a power of 2, e.g., 4, 8, 16, or 32. Thus, the FFT is performed on N points of time-domain data. The result of the computation is N points of frequency-domain data.
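For reference, the DFT that the FFT accelerates may be written directly from its definition, X[k] = sum over t of x[t]·exp(-2πi·k·t/N). The following is a minimal sketch of that direct computation, not of any embodiment; the function name dft is hypothetical. It performs on the order of N² multiplications, which is the repetitive work an FFT reduces.

```python
import cmath
from typing import List, Sequence


def dft(x: Sequence[complex]) -> List[complex]:
    """Direct N-point DFT from the definition; O(N^2) operations."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
```

A direct routine of this kind may serve as a reference against which the output of a faster implementation is compared.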
A multiple field-programmable gate array (FPGA) FFT computation system 10 as constructed in accordance with various embodiments of the current invention is shown in
The system 10 performs the FFT computation according to the structure of a tree architecture 24. An example of the tree architecture 24 for a 32-point FFT is shown in
The FPGA 12 generally provides the resources to implement the external interface 14, the initial calculations module 16, the real-data compensation module 18, the FPGA interface 20, and the butterfly modules 22. The FPGA 12 may include standard gate array components, such as configurable logic blocks that include combinational logic gates and latches or registers, programmable switch and interconnect networks, random-access memory (RAM) components, and input/output (I/O) pads. The FPGA 12 may also include specialized functional blocks such as arithmetic/logic units (ALUs) that include high-performance adders and multipliers, or communications blocks for standardized protocols. An example of the FPGA 12 is the Xilinx Virtex™ series, particularly the Virtex™-5 FPGA, from Xilinx, Inc. of San Jose, Calif.
The FPGA 12 may be programmed in a generally traditional manner using electronic programming hardware that couples to standard computing equipment, such as a workstation, a desktop computer, or a laptop computer. The functional description or behavior of the circuitry may be programmed by writing code using a hardware description language (HDL), such as very high-speed integrated circuit hardware description language (VHDL) or Verilog, which is then synthesized and/or compiled to program the FPGA 12. Alternatively, a schematic of the circuit may be drawn using a computer-aided drafting or design (CAD) program, which is then converted into FPGA 12 programmable code using electronic design automation (EDA) software tools, such as a schematic-capture program. The FPGA 12 may be physically programmed or configured using FPGA programming equipment, as is known in the art.
The external interface 14 generally provides communication with external components to manage the flow of data in and out of the system 10. The external interface 14 may prepare the incoming data for the FFT calculation by parsing the data and removing any header, packet, or framing information. The external interface 14 may also put the data in the proper numerical format to be operated on by the initial calculations module 16. Once the FFT calculation is complete, the external interface 14 may prepare the data to be received by other components or systems, such as by converting the numerical format of the data, or by adding headers, packet and framing information, or other communications, bus, or network protocol data. An example of a protocol with which the external interface 14 may be compatible is PCI Express 2.0 or PCI Express 3.0.
The external interface 14 may be an endpoint component (compatible with the PCI Express or similar protocol) that is included as a built-in block of the FPGA 12 or may be programmed into the FPGA 12 using one or more code segments of a hardware description language (HDL) or other FPGA-programming language. Thus, each FPGA 12 might have its own external interface 14. In certain embodiments, the external interface 14 may be a standalone component that communicates with the FPGA 12 through the standard FPGA 12 I/O ports. Furthermore, there may be a plurality of external interfaces.
The external interface 14 may couple with a communications bus 34 that connects to one or more external devices 36, as shown in
While the communications bus 34 is described above as transmitting and receiving data electrically, the communications bus 34 may also communicate data optically or wirelessly. Thus, the communications bus 34 may include optical transmitting and receiving components, such as lasers, light-emitting diodes (LEDs), and detectors, as well as optical communications media, such as optical fibers or other waveguides. In addition, the communications bus 34 may include radio-frequency (RF) receivers and transmitters that are capable of communicating data according to standard protocols, such as the Institute of Electrical and Electronics Engineers (IEEE) wireless standards 802.11, 802.15, 802.16, and the like.
The external device 36, as shown in
The initial calculations module 16 generally performs the initial calculations of the FFT according to the structure of the tree architecture 24. The initial calculations are those that are associated with the lowest nodes 26 of the tree architecture 24 as shown in
The system 10 generally performs calculations on a data set that is presented in bit-reverse order. An example of a data set presented in bit-reverse order is the column of numbers in boxes along the left side of the 32-point FFT implementation of
When performing an FFT calculation on an N-point set of data with only real components, the initial calculations module 16 may interleave the odd-numbered and even-numbered samples, treating the odd-numbered samples as a real component and the even-numbered samples as an imaginary component, to create N/2 complex data samples. In the case of real-component only data, an N-point real FFT is treated as an N/2-point complex FFT that includes some additional calculations to compensate for the real-only data set, with the initial calculations module 16 putting the data in the proper order.
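The interleaving described above may be sketched as follows. The sketch assumes that the samples are numbered starting from 1, so that the first, third, fifth, and subsequent odd-numbered samples become real components and the second, fourth, sixth, and subsequent even-numbered samples become imaginary components. The function name pack_real is hypothetical.

```python
from typing import List, Sequence


def pack_real(samples: Sequence[float]) -> List[complex]:
    """Pack N real samples into N/2 complex samples.

    Assumption: samples are numbered from 1, so Python index 0 holds
    sample 1 (odd-numbered, real part) and index 1 holds sample 2
    (even-numbered, imaginary part).
    """
    if len(samples) % 2:
        raise ValueError("the number of real samples must be even")
    return [
        complex(samples[i], samples[i + 1])
        for i in range(0, len(samples), 2)
    ]
```

For example, the four real samples 1, 2, 3, 4 are packed into the two complex samples 1+2j and 3+4j, so that a 2-point complex FFT operates on data that originally required a 4-point real FFT.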
The initial calculations module 16 may include the components necessary to perform split-radix calculations, which include N=2 and N=4 calculations. Thus, the initial calculations module 16 may include one or more N=2 processors as well as one or more N=4 processors. An N=2 processor 38 may perform the calculations necessary for a 2-point FFT. An N=4 processor 40 may perform the calculations necessary for a 4-point FFT. In various embodiments, the initial calculations module 16 may include the necessary components that can be configured as either two N=2 processors 38, as shown in
The initial calculations module 16 may include specialized functional blocks, combinational logic gates (e.g., AND, OR, NOT), adders, multipliers, multiply/accumulate units (MACs), ALUs, lookup tables, and the like. The initial calculations module 16 may also include buffers in the form of flip-flops, latches, registers, static RAM (SRAM), dynamic RAM (DRAM), and the like to store data before and after the calculations are performed, as well as the intermediate results while the initial calculations are being performed. The initial calculations module 16 may be formed from one or more code segments of an HDL or one or more schematic drawings, and may be programmed into the FPGA 12 as discussed above. The initial calculations module 16 is typically a component or group of components in the FPGA 12. However, in some embodiments, the initial calculations module 16 may be a component or group of components external to the FPGA 12.
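The arithmetic performed by an N=2 processor and an N=4 processor may be illustrated with the following sketch, which assumes the standard 2-point and 4-point DFT formulas and, for readability, takes its inputs in natural order rather than in bit-reverse order. The function names fft2 and fft4 are hypothetical. Neither calculation requires a general complex multiplication, since multiplication by -j or +j is only a swap of real and imaginary parts with a sign change.

```python
from typing import Tuple


def fft2(x0: complex, x1: complex) -> Tuple[complex, complex]:
    """2-point FFT: one addition and one subtraction."""
    return (x0 + x1, x0 - x1)


def fft4(
    x0: complex, x1: complex, x2: complex, x3: complex
) -> Tuple[complex, complex, complex, complex]:
    """4-point FFT on natural-order inputs, returning X[0]..X[3]."""
    s0, d0 = x0 + x2, x0 - x2  # sum and difference of inputs 0 and 2
    s1, d1 = x1 + x3, x1 - x3  # sum and difference of inputs 1 and 3
    return (s0 + s1, d0 - 1j * d1, s0 - s1, d0 + 1j * d1)
```

The 4-point calculation contains two sum/difference pairs followed by two more, which is consistent with hardware that can be configured either as two N=2 processors or as a single N=4 processor.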
The real-data compensation module 18 generally executes a final set of operations on the resulting data after an FFT has been performed on real-component only data. As described above, the odd and even numbered components of an N-point real-component only input data may be treated as real and imaginary components for an N/2-point complex-data FFT. Once the FFT has been calculated, the real-data compensation module 18 utilizes twiddle factors to perform a final calculation on the data to correct the reordering of the data in the time domain. In addition, whether the input data is real-only or is complex, the real-data compensation module 18 buffers the frequency-domain data before it is forwarded out of the system 10 through the external interface 14.
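One well-known form of this compensation is sketched below, under the assumption that the module applies the standard even/odd separation followed by twiddle-factor recombination; the source does not specify the exact arithmetic. A direct DFT stands in for the N/2-point complex FFT so that the sketch is self-contained. The names dft and real_fft_via_half_size are hypothetical.

```python
import cmath
from typing import List, Sequence


def dft(x: Sequence[complex]) -> List[complex]:
    """Direct DFT, standing in here for the N/2-point complex FFT."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def real_fft_via_half_size(samples: Sequence[float]) -> List[complex]:
    """N-point FFT of real data using one N/2-point complex transform."""
    n = len(samples)
    if n < 2 or n % 2:
        raise ValueError("need an even number of real samples")
    m = n // 2
    # Pack alternate samples as real and imaginary parts.
    z = dft([complex(samples[2 * i], samples[2 * i + 1]) for i in range(m)])
    out = [0j] * n
    for k in range(m):
        zk = z[k]
        zc = z[(m - k) % m].conjugate()
        re_part_fft = (zk + zc) / 2   # transform of the samples packed as real parts
        im_part_fft = (zk - zc) / 2j  # transform of the samples packed as imaginary parts
        w = cmath.exp(-2j * cmath.pi * k / n)  # twiddle factor W_N^k
        out[k] = re_part_fft + w * im_part_fft
        out[k + m] = re_part_fft - w * im_part_fft
    return out
```

The result may be checked against a direct N-point DFT of the same real samples; the two agree to within floating-point rounding.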
The real-data compensation module 18 may include combinational logic gates, ALUs, shift registers or other serializer/deserializer (SERDES) components, and the like. The real-data compensation module 18 may also include buffers in the form of flip-flops, latches, registers, SRAM, DRAM, and the like.
The system 10 generally allows communication from one FPGA 12 to another FPGA 12. Typically, one or more butterfly modules 22 on one FPGA 12 send data to one or more butterfly modules 22 on another FPGA 12. The FPGA interface 20 couples one or more butterfly modules 22 within the FPGA 12 to an inter-FPGA bus 48. The FPGA interface 20 may buffer the data and add packet data, serialize the data, or otherwise prepare the data for transmission on the inter-FPGA bus 48.
The FPGA interface 20 may include buffers in the form of flip-flops, latches, registers, SRAM, DRAM, and the like, as well as shift registers or SERDES components. The FPGA interface 20 may be a built-in functional FPGA block or may be formed from one or more code segments of an HDL or one or more schematic drawings, and may be programmed into the FPGA 12 as discussed above. The FPGA interface 20 may also be compatible with or include GTP components.
The inter-FPGA bus 48 generally carries data from one FPGA 12 to another FPGA 12 and is coupled with the FPGA interface 20 of each FPGA 12. The inter-FPGA bus 48 may be a single-channel serial line, wherein all the data is transmitted in serial fashion, a multi-channel (or multi-bit) parallel link, wherein different bits of the data stream are transmitted on different channels, or a variation thereof, wherein the inter-FPGA bus 48 may include multiple lanes of bi-directional data links. The inter-FPGA bus 48 may be compatible with GTP components included in the FPGA interface 20. The inter-FPGA bus 48 may also be implemented as disclosed in U.S. Patent Application No. 2005/0256969, filed May 11, 2004, which is hereby incorporated by reference in its entirety.
The inter-FPGA bus 48 may be implemented on a PCB and may utilize various electrically-conductive elements, such as copper traces. The inter-FPGA bus 48 may also include optical media, such as optical backplanes or optical waveguides.
The butterfly module 22 generally computes at least a portion of the N-point FFT, wherein that portion may correspond to the calculations performed at one branch node 30 of the tree architecture 24. The butterfly modules 22 as a group receive the output of the initial calculations modules 16 and generally perform the calculations associated with the branch nodes 30 and the root node 32. The butterfly module 22 may operate alone or in parallel with other butterfly modules 22 to perform the calculations of a branch node 30, as seen in
The butterfly module 22 may include specialized functional blocks, combinational logic gates (e.g., AND, OR, NOT), adders, multipliers, MACs, ALUs, lookup tables, and the like. The butterfly module 22 may also include buffers in the form of flip-flops, latches, registers, SRAM, DRAM, and the like. The butterfly module 22 may be formed from one or more code segments of an HDL or one or more schematic drawings, and may be programmed into the FPGA 12 as discussed above.
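The calculation associated with a branch node or the root node may be illustrated with the conventional split-radix butterfly, which combines the result of one N/2-point sub-transform with the results of two N/4-point sub-transforms. The sketch below is a software model under that assumption and is not a description of the hardware; it treats sizes 2 and 4 as directly computed leaves, and the names butterfly and split_radix_fft are hypothetical.

```python
import cmath
from typing import List, Sequence


def butterfly(
    u: Sequence[complex], z: Sequence[complex], z3: Sequence[complex]
) -> List[complex]:
    """Combine an N/2-point result `u` with two N/4-point results.

    `u` is the transform of inputs 0, 2, 4, ...; `z` of inputs 1, 5, 9, ...;
    `z3` of inputs 3, 7, 11, ...
    """
    n = 2 * len(u)
    q = n // 4
    out = [0j] * n
    for k in range(q):
        a = cmath.exp(-2j * cmath.pi * k / n) * z[k]       # W^k twiddle
        b = cmath.exp(-2j * cmath.pi * 3 * k / n) * z3[k]  # W^(3k) twiddle
        out[k] = u[k] + (a + b)
        out[k + 2 * q] = u[k] - (a + b)
        out[k + q] = u[k + q] - 1j * (a - b)
        out[k + 3 * q] = u[k + q] + 1j * (a - b)
    return out


def split_radix_fft(x: Sequence[complex]) -> List[complex]:
    """Split-radix FFT for a power-of-2 length of at least 2."""
    n = len(x)
    if n < 2 or n & (n - 1):
        raise ValueError("length must be a power of 2 and at least 2")
    if n == 2:  # leaf node of N=2
        return [x[0] + x[1], x[0] - x[1]]
    if n == 4:  # leaf node of N=4
        s0, d0 = x[0] + x[2], x[0] - x[2]
        s1, d1 = x[1] + x[3], x[1] - x[3]
        return [s0 + s1, d0 - 1j * d1, s0 - s1, d0 + 1j * d1]
    return butterfly(
        split_radix_fft(x[0::2]),
        split_radix_fft(x[1::4]),
        split_radix_fft(x[3::4]),
    )
```

Because each call to butterfly depends only on the outputs of its three child calculations, independent subtrees may in principle be evaluated on separate resources before their results are combined.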
The tree architecture 24 generally determines the nature and the order of the calculations to compute the FFT, and includes a plurality of leaf nodes 28, a plurality of branch nodes 30, and a single root node 32, as seen in
The tree architecture 24 may be distributed among a plurality of FPGAs 12. Some node calculations for the tree architecture 24 may be performed in one FPGA 12, while other node calculations may be performed in a different FPGA 12, and some of the larger node calculations may be divided among one or more FPGAs 12. Generally, however, calculations for nodes 26 that have connectivity and are clustered together on the tree architecture 24 are performed in the same FPGA 12. An exemplary distribution of the tree architecture 24 for a 32-point FFT is shown in
The calculations performed at the leaf nodes 28 are generally performed by the initial calculations module 16. The calculations for a leaf node 28 of N=2 may be performed by the initial calculations module 16 configured with two N=2 processors 38. Generally, the leaf nodes 28 of N=2 occur in pairs, such that one N=2 processor 38 can handle the first N=2 calculation, while the other N=2 processor 38 handles the second N=2 calculation. The calculations for a leaf node 28 of N=4 may be performed by a single N=4 processor 40, as opposed to decomposing the N=4 calculation to include an N=2 node 26. Thus, as depicted in the implementation of
The implementation of the system 10 as shown in
At least some of the steps that are performed in a method for calculating an FFT in accordance with various embodiments of the current invention are shown in the flow diagram 600 of
In connection with step 602 of
In connection with step 604 of
Step 604 may also include the substeps of creating the FPGA 12 structure and programming the FPGAs 12. The FPGA 12 structure may be created by generating one or more code segments in an HDL that describe the behavior or the architecture of the system 10, which is then synthesized and/or compiled into FPGA-ready code. The FPGA 12 structure may also be created by inputting one or more schematics that display the circuitry or components necessary to perform the calculations into a schematic-capture or similar EDA tool that produces FPGA-ready code. The FPGA 12 may be programmed with the FPGA-ready code by using standard FPGA-programming equipment.
In connection with step 606 of
The invention is disclosed primarily to be utilized in computing the fast Fourier transform. However, the system may be used to perform other calculations that are implemented using a tree-based architecture and include distributed processing elements, such as the inverse fast Fourier transform, which generally transforms frequency-domain data points into a time-domain data set.
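As an illustration of how a forward-transform structure may be reused for the inverse transform, the following sketch applies the well-known conjugation identity: the inverse transform equals the conjugate of the forward transform of the conjugated input, divided by N. This is a general mathematical property and is not stated in the source as the method of any embodiment. A direct DFT stands in for the forward FFT, and the names dft and idft are hypothetical.

```python
import cmath
from typing import List, Sequence


def dft(x: Sequence[complex]) -> List[complex]:
    """Direct forward DFT, standing in here for the forward FFT."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum: Sequence[complex]) -> List[complex]:
    """Inverse transform computed with the forward transform only."""
    n = len(spectrum)
    conjugated = [complex(v).conjugate() for v in spectrum]
    return [v.conjugate() / n for v in dft(conjugated)]
```

Applying idft to the output of dft recovers the original time-domain data points to within floating-point rounding.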
Although the invention has been described with reference to the embodiments illustrated in the attached drawing figures, it is noted that equivalents may be employed and substitutions made herein without departing from the scope of the invention as recited in the claims.
Having thus described various embodiments of the invention, what is claimed as new and desired to be protected by Letters Patent includes the following: