The present disclosure relates to channelizers, and more particularly to dynamically reconfigurable, two times (2×) oversampled channelizers.
Many signal processing applications, including communications, radar systems, and electronic warfare applications, require that an input signal be channelized or separated into frequency bins for subsequent analysis and/or manipulation. As the frequency range for signals of interest increases, the computational complexity requirements for channelization also increase. As such, existing channelizers may become unsuitable for some applications where size, weight, and power are constrained.
Although the following Detailed Description will proceed with reference being made to illustrative embodiments, many alternatives, modifications, and variations thereof will be apparent in light of this disclosure.
Techniques are provided herein for dynamically reconfigurable two times (2×) oversampled channelizers with increased efficiency. As noted previously, many signal processing applications, including communications, radar systems, and electronic warfare applications, require that an input signal be channelized or separated into frequency bins for subsequent analysis and/or manipulation. Different signals may require different degrees of channelization (e.g., number of frequency bins or frequency resolution). A dynamically reconfigurable channelizer architecture allows for in-field modification of the number of frequency bins and their frequency response. The dynamically reconfigurable channelizer is suitable for field programmable gate arrays (FPGAs) and complex programmable logic devices (CPLDs) as well as application specific integrated circuits (ASICs) where the architecture cannot be modified once deployed. Additionally, in some applications the signal environment can be rapidly changing, creating a need for dynamic reconfiguration of the channelization parameters (e.g., frequency bin shape and size). Dynamically reconfigurable channelizers accommodate various channelization requirements without requiring multiple instantiations of customized channelizers. Dynamically reconfigurable channelizers are therefore particularly suitable in applications that require such flexibility, while being constrained in size and power consumption, such as airborne or spaceborne platforms, or smartphones and tablets. Unfortunately, existing channelizer architectures are associated with a number of layout-based and performance-based inefficiencies, such as use of duplicative circuitry rather than shared resources, over-reliance on relatively large registers, and underutilized mathematical operators (e.g., complex multiplies) and memories, to name a few examples.
To this end, and in accordance with an embodiment of the present disclosure, a dynamically reconfigurable 2× oversampled channelizer is disclosed which provides improved efficiency through the use of pipelined stages and other techniques, as described below. Additionally, the techniques lend themselves particularly well to implementation in an application specific integrated circuit (ASIC).
The disclosed channelizer can be used, for instance, with receivers in a wide variety of applications including, for example, radar systems and communication systems that can be deployed on aircraft (manned and unmanned), guided munitions and projectiles, space-based systems, electronic warfare systems, and other communication systems including cellular telephones, and smartphones, although other applications will be apparent. In a more general sense, the disclosed techniques are useful for any systems in which RF signals of interest are received, digitized, and channelized, in an environment or application where channelization parameters need to be dynamically reconfigured (e.g., updated in real time while the system is operating). In accordance with an embodiment, the reconfigurable channelizer includes a polyphase filter, a two phase reorder circuit, an FFT (or IFFT) circuit, and a two phase merge circuit. The polyphase filter is configured to filter time domain input data to control spectral shaping of frequency bins of the channelizer output. The two phase reorder circuit is configured to split the filtered data into first and second phases to be provided to two pipelined channels of the FFT circuit. The FFT circuit is configured to transform the first and second phase channels to first and second phase frequency domain data. The two phase merge circuit is configured to merge the first and second phase frequency domain data for distribution into the output frequency bins of the 2× oversampled channelizer. Reconfigurable parameters for the channelizer include the filter coefficients, the number of filter folds or taps, and the number of frequency bins, according to an example.
It will be appreciated that the techniques described herein may provide improved channelization capabilities, compared to channelizers that operate with fixed or otherwise pre-determined (non-configurable) parameters, or other types of circuits that are not as efficient as ASICs. For instance, while the channelizer techniques provided herein may be implemented in an FPGA according to some example embodiments, an ASIC implementation is not constrained by the existing routing architecture of an FPGA. Numerous embodiments and applications will be apparent in light of this disclosure.
System Architecture
The antenna 110 is configured to receive one or more RF signals. The RF front end 120 is configured to pre-process or condition the received RF signal. In some embodiments, the preprocessing may include one or more of automatic gain control, low noise amplification, additional amplification, pre-filtering to relatively wide frequency bands of interest, or any other suitable operations, depending on the needs of the downstream applications 155. The ADC 130 is configured to convert the analog pre-processed RF signal to a digital signal as input data 135 to the reconfigurable channelizer 140.
The reconfigurable channelizer 140 is configured to convert the input data 135 into frequency domain output data 145 that is channelized or separated into a selected number of frequency bins. The number of channels or frequency bins, as well as the coefficients for the filters that generate those channels, are dynamically programmable, which is to say that they can be modified as the system is running. For example, any number of downstream applications 155a-155n that consume the output data 145 can configure the channelizer 140 to meet the needs of that application. For instance, frequency bands of interest may vary over time and the channelizer can be reprogrammed in a relatively rapid manner to adapt to those changes. Applications may include, for example, communications systems, radar systems, or any system in which RF signals are received or processed. In some embodiments, the channelizer 140 may be configured or implemented within an ASIC. In some embodiments, the ADC 130 and the channelizer 140 may be implemented within a combination of ASICs.
The reconfigurable channelizer operates at 2× oversampling to allow for full frequency band coverage (although band coverage can be limited as desired by choice of programmable filter coefficients). In a 2× oversampled channelizer, the input data is sampled at a rate of Fs samples per second and the output data rate (per bin) is 2Fs/N, where N is the number of bins. Furthermore, in a 2× oversampled channelizer, the bin spacing is Fs/N, which allows for alias free bin overlap. In some embodiments, the input sampling rate Fs is 8000 MHz, with 16 samples provided per clock cycle of a 500 MHz clock.
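By way of illustration, the following is a minimal, non-pipelined reference sketch of a 2× oversampled analysis channelizer (a weighted overlap-add style formulation). It is not the hardware datapath described below; the function name, the IFFT kernel, the newest-sample-first weighting, and the circular-shift convention are illustrative assumptions rather than details taken from the disclosure.

```python
import numpy as np

def channelize_2x(x, h, N):
    """Illustrative reference only; not the disclosed hardware datapath.

    x : 1-D complex input samples (rate Fs)
    h : prototype lowpass filter of length N*F (F folds/taps per branch)
    N : number of frequency bins

    Returns an array of shape (num_frames, N): one frame of N bin outputs
    per N/2 input samples, giving bin spacing Fs/N and per-bin rate 2*Fs/N.
    """
    F = len(h) // N
    R = N // 2                                  # hop of N/2 samples (2x oversampling)
    num_frames = (len(x) - N * F) // R + 1
    out = np.empty((num_frames, N), dtype=complex)
    for m in range(num_frames):
        newest = N * F - 1 + m * R              # index of the newest sample this frame
        block = x[newest - N * F + 1 : newest + 1]
        v = h * block[::-1]                     # weight newest-first data with the prototype taps
        u = v.reshape(F, N).sum(axis=0)         # fold: sum the F polyphase branches
        u = np.roll(u, -(newest % N))           # compensate this frame's time origin
        out[m] = N * np.fft.ifft(u)             # N-point IFFT kernel (scaled)
    return out
```

With this formulation, bin k of frame m equals the input down-converted by k·Fs/N, filtered by the prototype h, and decimated to the 2Fs/N per-bin output rate.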
The programmable polyphase filter coefficients (hk) 160a are loaded into programmable coefficient storage 420. The coefficients are selected, for example using a filter design tool, to provide the desired or required shape for the channelizer bins, which may depend on the application. In some embodiments, the coefficients are 18 bits.
Input data (xi) 135 enters the channelizer in blocks of 16 samples per clock cycle. The input data is buffered (as described below in connection with
In this embodiment, the input buffer 500 is configured to buffer 16 samples of input data 135 provided on a first clock cycle and forward that data, along with a subsequent 16 samples of input data 135 provided on a second clock cycle, to the input data distributor 510. In this embodiment, each data sample is complex, comprising 16 bits for I and 16 bits for Q, or 32 bits total, and is provided on a lane, as previously described. As such, the input data distributor 510 is fed with 32 samples of input data 135, on 32 32-bit lanes, on every other clock cycle. The input data distributor 510 is configured to distribute the input data to the data storage 410.
In this embodiment, data storage 410 is configured as 8 (i.e., F+1) banks of dual port RAM (DPR) to store input data 135. The additional DPR bank (i.e., the bank in excess of F) is used to avoid read/write contention issues. Coefficient storage 420 is configured as 7 (i.e., F) banks of DPR to store the programmable coefficients. Each of the DPR banks is sized to support a 1024 channel polyphase filter bank (e.g., the largest example described herein). So, for a 64 channel configuration, each DPR bank comprises 32 2-deep DPRs. For a 128 channel configuration, each DPR bank comprises 32 4-deep DPRs, and so on up to a 1024 channel configuration which comprises 32 32-deep DPRs.
The input data distributor MUX 510 writes input data to one data storage DPR bank at a time (e.g., 32 data samples are written into 32 DPRs at a time). So, for an N bin channelizer, DPR bank 0 is loaded with N data samples, then DPR bank 1 is loaded with the next N data samples, and so on. After DPR bank F is loaded, the system cycles back to DPR bank 0.
Additionally, the polyphase filter circuit is configured to reverse the order of the filter output relative to the filter input. As can be seen, in
The F DPR banks are read on every clock cycle and their contents are fed through the data alignment crossbar 520 to the pointwise multiply and row add circuit 530. The reading and writing of DPR banks are organized such that simultaneous reading and writing of the same DPR location does not occur when that location is an output of the data alignment crossbar 520, thus avoiding read-write contention issues. The crossbar circuit 520 is a switch matrix that can selectively couple any of a number of input ports to any of a number of output ports.
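The write/read rotation can be summarized with a short sketch at the granularity of N-sample blocks. The block-level round-robin follows the description above; the clock-cycle-level timing and the exact choice of which F banks are read are assumptions, and the function name is illustrative.

```python
def bank_schedule(block_index: int, F: int = 7):
    """Illustrative sketch of the round-robin DPR bank usage (not a timing model).

    block_index : index of the current N-sample input block
    F           : number of filter folds (there are F+1 data banks in total)

    Returns the bank being written with this block and the F banks, newest
    first, holding the previous F blocks that feed the filter folds.
    """
    write_bank = block_index % (F + 1)
    read_banks = [(write_bank - 1 - i) % (F + 1) for i in range(F)]
    return write_bank, read_banks

# e.g. while bank 0 is being refilled after a full cycle, banks 7, 6, ..., 1 are read
assert bank_schedule(8) == (0, [7, 6, 5, 4, 3, 2, 1])
```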
Data alignment crossbar 520 comprises a bank of 32 8×7 crossbars that are configured to align the input data samples with the coefficients. The alignment is performed because the DPR bank number associated with the newest data samples changes as the process proceeds. In some embodiments, the data alignment crossbar 520 may be configured to align the coefficients to the data, rather than the data to the coefficients, as this may reduce the area of the crossbar component on the ASIC, since the data width may be greater than the coefficient width.
Pointwise multiply and row add circuit 530 is configured to perform the pointwise multiply between the input data and the stored programmable coefficients, followed by a summation across rows, as previously described. In this embodiment, on each clock cycle, F*32 real-times-complex multiplies are performed along with 32 complex F-input additions. The pointwise multiply and row add circuit 530 is shown as a single circuit but can readily be implemented as two distinct circuits (e.g., multiply circuit 530a and adder circuit 530b). More generally, circuit 530 can be thought of as including a multiply circuit and an adder circuit. In some embodiments, the coefficients are scaled such that the maximum output of any polyphase partition stage is close to, and strictly less than, 1 (in the sum-of-absolute-values sense). Typically, this means that the coefficients for an N bin channelizer are scaled up by N/2 so that the DC gain is N/2.
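The arithmetic of circuit 530 and the coefficient scaling rule can be sketched as follows. The array shapes and the fold-major tap layout are assumptions, and the function names are illustrative.

```python
import numpy as np

def pointwise_multiply_row_add(data, coeffs):
    """Illustrative sketch (not the hardware datapath).

    data   : complex array of shape (F, 32), one aligned sample per fold per lane
    coeffs : real array of shape (F, 32), the programmable taps for those folds

    Returns 32 filtered outputs: per lane, the F real-times-complex products
    summed across the folds (the pointwise multiply followed by the row add).
    """
    return (coeffs * data).sum(axis=0)

def partition_scaling_ok(h, N):
    """Check the scaling rule described above: the sum of absolute values of
    the taps in every polyphase partition stays strictly below 1.
    (Assumes the prototype h is stored fold-major, i.e. F consecutive groups of N taps.)"""
    partitions = np.abs(h).reshape(-1, N)   # F rows (folds) x N columns (partitions)
    return partitions.sum(axis=0).max() < 1.0
```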
In some embodiments, data storage 410 comprises 32 times (F+1) or 256 32-wide×32-deep DPRs, and coefficient storage 420 comprises 32 times F or 224 18-wide×32-deep DPRs. In some embodiments, the pointwise multiply and row add circuit 530 comprises 448 real multipliers (64 real multipliers per fold) and 384 real adders (64 7-input adder trees, one adder input per fold).
As previously described, the polyphase filter circuit 200 generates N-sample frames of data in a single output stream 210, at 32 samples per clock cycle (on 32 lanes, one sample per lane). The two phase reorder circuit 220 is configured to separate the 32 sample wide data stream 210 into two 16 sample wide streams (phase 0 230 and phase 1 240) to be provided to the two inputs of the FFT. As will be described below, the FFT circuit 250 is implemented as a dual channel FFT capable of processing two streams of data at a time. In the case of a 2× oversampled channelizer, the second stream originates from input data that was delayed from that of the first stream (e.g., delayed by N/2 samples). The frames alternate such that every other frame is processed by the same FFT channel. For example, frames A, C, . . . are processed by the first FFT channel and frames B, D, . . . are processed by the second FFT channel.
On every clock cycle, 32 samples of data 210 are provided to the reorder circuit and two streams of 16 sample output data (phase 0 230 and phase 1 240) are generated by the reorder circuit. DPR bank 0 700 and DPR bank 1 710 are configured to buffer the input samples to the reorder circuit (the polyphase output data 210). Delay element z^-k 720 is configured to delay the top stream by k clock cycles, where k=N/32. In some embodiments, delay element 720 may be implemented as an additional DPR. MUX 730 is configured to select every other frame (e.g., A, C, . . . ) for output as phase 0 230 and MUX 740 is configured to select every other alternate frame (e.g., B, D, . . . ) for output as phase 1 240.
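At the frame level, the routing performed by the reorder circuit amounts to the sketch below. The DPR double-buffering, the z^-k delay, and the lane widths are not modeled, and the function name is illustrative.

```python
def two_phase_reorder(frames):
    """Illustrative frame-level sketch of the two phase reorder (not cycle-accurate).

    frames : list of N-sample frames produced by the polyphase filter

    Frames 0, 2, 4, ... are routed to the phase 0 FFT channel and frames
    1, 3, 5, ... to the phase 1 channel, so each channel sees every other frame.
    """
    phase0 = frames[0::2]   # frames A, C, ...
    phase1 = frames[1::2]   # frames B, D, ...
    return phase0, phase1
```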
In this embodiment, the FFT functionality is employed to implement an IFFT by swapping the input I and Q components, swapping the output I and Q components, and scaling or normalizing the output by a factor of N.
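The I/Q-swap technique can be checked numerically; the sketch below is a software restatement of the underlying identity rather than the hardware implementation, and the function name is illustrative.

```python
import numpy as np

def ifft_via_fft(x):
    """Illustrative sketch: compute an IFFT using only a forward FFT by swapping
    I and Q at the input, swapping I and Q at the output, and scaling by 1/N."""
    N = len(x)
    swapped_in = x.imag + 1j * x.real       # swap I and Q components
    y = np.fft.fft(swapped_in)
    swapped_out = y.imag + 1j * y.real      # swap I and Q components again
    return swapped_out / N                  # normalize by N

# quick numerical check against a direct IFFT
x = np.random.randn(64) + 1j * np.random.randn(64)
assert np.allclose(ifft_via_fft(x), np.fft.ifft(x))
```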
This two channel architecture merges the two FFT circuits to provide a pipelined, streaming implementation that improves efficiency by sharing the resources needed to process two FFT channels. For example, with regard to the butterfly stages (described below), instead of employing X complex multipliers running at a 50% duty cycle, only X/2 complex multipliers are employed, running at a 100% duty cycle. Additionally, read only memories (ROMs) that store the twiddle factors (also referred to as coefficients) may be shared between the two circuits. In some embodiments, miscellaneous control logic may also be shared.
The two-channel, streaming FFT circuit 250, used to implement an IFFT, is shown in
For each input channel, the I/Q swap circuit 810 is configured to swap the 16-bit I and 16-bit Q components of the data samples.
For each input channel, the FFT sample reorder circuit 820 is configured to reorder the data samples, depending on the bin configuration N, so that the subsequent FFT stage associated with that bin configuration generates the correct FFT. The FFT sample reorder circuit 820 will be described in greater detail below in connection with
For each input channel, stage 0 through stage 4 circuits (840, 845, 850, 855, 860) are configured to perform a cascaded series of computations (e.g., in a pipelined fashion) from which an FFT of desired size (e.g., 64-point through 1024-point) can be obtained by tapping into the output of the appropriate stage. Each stage uses a butterfly circuit 1100, as will be described in greater detail below in connection with
For each input channel, normalization and I/Q swap circuit 870 is configured to scale the output by a factor of 1/N and re-swap the 16-bit I and 16-bit Q components so that the result is an IFFT.
The FFT sample reorder circuit 820 is shown to include an input crossbar 900, DPR bank 910 comprising DPRs 0-15 910a-910d, an output crossbar 920, and a controller 930.
The controller 930 is configured to generate an input frame selection value (e.g., a routing command) 940 to control the input crossbar 900 to determine the routing of input lanes 905 to DPRs 910. The controller 930 is also configured to generate an input address (e.g., a write address) 950 to select an address (e.g., a location) in the DPR bank to receive the input sample (e.g., through a DPR write port). The controller 930 is also configured to generate an output address (e.g., a read address) 960 to select an address (e.g., a location) in the DPR bank, from which the sample is to be read (e.g., through a DPR read port). The controller 930 is also configured to generate an output DPR selection value (e.g., a routing command) 970 to control the output crossbar 920 to determine the routing of DPRs to output lanes 925.
For a 64-point FFT, no reordering is required. An example of a sample reorder process 1000 for a 128-point FFT is illustrated in
In some embodiments, the parameters which specify the sample reordering (input frame select 940, input address 950, output address 960, and output DPR select 970) are pre-determined and may be stored in a look-up table (LUT), such as a ROM. The parameters may be selected to manipulate the data flow through the DPRs to avoid the need to read a DPR more than once in the same clock cycle so that only one DPR read port and one DPR write port are required. The architecture shown in
The multiplexed butterfly circuit 1100 is used in the implementation of the 64-point FFT of the stage 0 circuit 840. According to an embodiment, the 64-point FFT is constructed using two 16-point FFTs, and two multiplexed butterfly circuits 1100: one configured for a 32-point FFT output, and one configured for a 64-point FFT output. The stage 1 circuit 845 includes a butterfly circuit 1100 configured for a 128-point FFT output. The stage 2 circuit 850 includes a butterfly circuit 1100 configured for a 256-point FFT output. The stage 3 circuit 855 includes a butterfly circuit 1100 configured for a 512-point FFT output. The stage 4 circuit 860 includes a butterfly circuit 1100 configured for a 1024-point FFT output. The butterfly circuits, which are shared between the phase 0 channel 230 and the phase 1 channel 240, also include memory that is shared between the channels (e.g., to store the pre-computed twiddle factors for the FFT butterfly, as explained below). Each butterfly circuit used in stages 0 to 4 accepts an input frame index 340a, a phase 0 channel input 230, and a phase 1 channel input 240, and generates outputs that include a delayed version of the input frame index 340b, a phase 0 channel output 260, and a phase 1 channel output 270. A selection of the phase 0 and phase 1 channel outputs from among the stages 0 through 4 may be made based on the desired FFT size.
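Mathematically, each successive stage performs a radix-2 decimation-in-time combine: it builds a 2M-point transform from two M-point transforms using one layer of butterflies, and tapping an earlier stage yields a smaller FFT. The sketch below shows that combine for a forward FFT; the hardware's multiplexing of two phases through shared butterflies is not modeled, and the function name is illustrative.

```python
import numpy as np

def combine_stage(X_even, X_odd):
    """Illustrative radix-2 decimation-in-time combine (not the hardware pipeline).

    X_even, X_odd : M-point FFTs of the even- and odd-indexed input samples
    Returns the 2M-point FFT; each output pair is one butterfly:
    sum = top + W*bottom, difference = top - W*bottom.
    """
    M = len(X_even)
    W = np.exp(-2j * np.pi * np.arange(M) / (2 * M))   # twiddle factors for this stage
    top, bottom = X_even, W * X_odd
    return np.concatenate([top + bottom, top - bottom])

# cascading stages doubles the FFT size at each step
x = np.random.randn(128) + 1j * np.random.randn(128)
X128 = combine_stage(np.fft.fft(x[0::2]), np.fft.fft(x[1::2]))
assert np.allclose(X128, np.fft.fft(x))
```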
The multiplexed butterfly circuit 1100 is shown to include bit slicing circuit 1105, delay elements 1110 and 1140; MUXs 1120; and butterfly core circuit 1130. The following descriptions of the butterfly circuit 1100 (and the butterfly core 1130 of
As shown in the table, M = (output FFT size)/32, m = log2(M), and p = m − 1, except for the case where the output FFT size is 32, for which the 32-point butterfly has fixed coefficients and no LUT is needed.
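A small sketch of those per-stage constants (the function name and the power-of-two assumption are illustrative):

```python
def stage_params(output_fft_size: int):
    """Illustrative helper: per-stage constants described above.

    M = output FFT size / 32, m = log2(M) (the frame-index bit steering the MUXes),
    p = m - 1 (the top bit of the twiddle-memory address). Assumes a power-of-two size > 32.
    """
    M = output_fft_size // 32
    m = M.bit_length() - 1      # log2(M) for powers of two
    p = m - 1
    return M, m, p

# e.g. the stage producing the 128-point output uses M = 4, m = 2, p = 1
assert stage_params(128) == (4, 2, 1)
```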
Bit slicing circuit 1105 is configured to slice or extract bit m from the input frame index 340. Bit m is used to control MUXs 1120, to select from either of the MUX input ports (labeled ‘0’ or ‘1’) based on the value of bit m. For example, a bit value of 0 selects input port ‘0’ while a bit value of 1 selects input port ‘1’.
Delay element 1110a is configured to delay the phase 1 input by M clock cycles before providing it to port 1 of the MUX 1120a and port 0 of MUX 1120b.
Delay element 1110b is configured to delay the output of MUX 1120a by M clock cycles before providing it to the first branch (e.g., top branch) of the butterfly core circuit 1130. The output of MUX 1120b is provided to the second branch (e.g., bottom branch) of the butterfly core circuit. The input frame index 340 is also provided to the butterfly circuit.
The butterfly core circuit 1130 is configured to compute the sum and difference values from the top and bottom branches of the butterfly, as will be described in greater detail below in connection with
Delay element 1110c is configured to delay the difference (delta) output of the butterfly circuit by M clock cycles before providing it to port 0 of MUX 1120c and port 1 of MUX 1120d. The sum output of the butterfly circuit is provided to port 1 of MUX 1120c and port 0 of MUX 1120d.
Delay element 1140 is configured to delay the input frame index by 2M clock cycles as it is passed on to the next butterfly circuit. Delay element 1110d is configured to delay the output of MUX 1120c by M clock cycles to generate the phase 0 output 260 for this butterfly circuit stage. The output of MUX 1120d is provided as the phase 1 output 270 for this butterfly stage. Delay element 1110d provides alignment of the phase 0 output and the phase 1 output. In some embodiments, the delay provided by element 1110d of the current stage can be used to generate the delay provided by delay element 1110a of the subsequent stage.
Bit extraction circuit 1200 is configured to extract bits p down to 0 from the input frame index for use as an index into the twiddle memory 1210.
Twiddle memory 1210 is configured to store the precomputed twiddle factors (e.g., complex roots of unity) for that FFT stage, for example as a lookup table. In some embodiments, the twiddle memory 1210 is configured as a read only memory.
Multiplier 1220 is configured to multiply the input to the bottom branch of the butterfly with the retrieved twiddle factor to generate the scaled bottom branch input.
Summer 1230 is configured to compute the butterfly sum (e.g., the sum of the input to the top branch of the butterfly with the scaled bottom branch input). Summer 1240 is configured to generate the butterfly difference (e.g., the delta between the input to the top branch of the butterfly and the scaled bottom branch input).
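Functionally, the butterfly core reduces to the following sketch; the twiddle ROM contents, the addressing details, and the function name are illustrative assumptions.

```python
import numpy as np

def butterfly_core(top, bottom, frame_index, twiddle_rom, p):
    """Illustrative sketch of the butterfly core (not the hardware implementation).

    Bits p down to 0 of the frame index address the twiddle memory; the
    retrieved twiddle factor scales the bottom branch, and the core outputs
    the sum and difference of the top branch and the scaled bottom branch.
    """
    addr = frame_index & ((1 << (p + 1)) - 1)   # extract bits p..0 of the frame index
    w = twiddle_rom[addr]                       # precomputed complex root of unity
    scaled = w * bottom
    return top + scaled, top - scaled
```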
Delay element z^-k 1400 is configured to delay the second stream 270 by k clock cycles, where k=N/32. This delay is equivalent to N/2 input samples. Multiplier 1410 is configured to correct the time varying bin-dependent phase shift between the FFT output channels which results from delaying channel 1 by N/2 input samples with respect to channel 0. The correction can be achieved by multiplying every other lane in channel 1 by −1 (e.g., by (−1)^k) before merging the streams, as can be seen from the following equations:
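One way to see this (a sketch using the standard DFT time-shift property; the original equations are not reproduced here): delaying the input to channel 1 by N/2 samples multiplies bin k of its N-point transform by

$$e^{-j 2\pi k (N/2) / N} \;=\; e^{-j\pi k} \;=\; (-1)^{k},$$

so multiplying bin k of the delayed channel by (−1)^k (equivalently, negating every other lane) removes the bin-dependent phase offset before the two phases are merged.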
DPR bank 0 1420 and DPR bank 1 1430 are configured to buffer consecutive channels of 16 lanes of samples from the top stream and merge them into a first set of 32 lanes. DPR bank 2 1440 and DPR bank 3 1450 are configured to buffer consecutive channels of 16 lanes of samples from the bottom stream and merge them into a second set of 32 lanes. MUX 1460 is configured to alternate between the first set of 32 lanes (e.g., the outputs of DPR banks 0 and 1) and the second set of 32 lanes (e.g., the outputs of DPR banks 2 and 3) to form channelizer output data 145.
In some embodiments, each of the DPR banks is configured as 16 32-wide×32-deep DPRs.
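At the frame level, the merge is the inverse of the two phase reorder, as the sketch below illustrates; the DPR buffering that widens 16 lanes to 32 and the (−1)^k correction are not modeled, and the function name is illustrative.

```python
def two_phase_merge(phase0_frames, phase1_frames):
    """Illustrative frame-level sketch of the two phase merge (not cycle-accurate).

    Interleaves the frames from the two FFT channels back into a single
    stream of channelizer output frames: A, B, C, D, ...
    """
    merged = []
    for f0, f1 in zip(phase0_frames, phase1_frames):
        merged.extend((f0, f1))
    return merged
```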
In some embodiments, the components of the dynamically reconfigurable channelizer may be cascaded and the delays may be consolidated, allowing some of the delay elements to be eliminated, thus reducing overall latency.
Some embodiments may be described using the expressions “coupled” and “connected” along with their derivatives. These terms are not intended as synonyms for each other. For example, some embodiments may be described using the terms “connected” and/or “coupled” to indicate that two or more elements are in direct physical or electrical contact with each other. The term “coupled,” however, may also mean that two or more elements are not in direct contact with each other, but yet still cooperate or interact with each other.
Unless specifically stated otherwise, it may be appreciated that terms such as “processing,” “computing,” “calculating,” “determining,” or the like refer to the action and/or process of a computer or computing system, or similar electronic computing device, that manipulates and/or transforms data represented as physical quantities (for example, electronic) within the registers and/or memory units of the computer system into other data similarly represented as physical entities within the registers, memory units, or other such information storage, transmission, or display devices of the computer system. The embodiments are not limited in this context.
The terms “circuit” or “circuitry,” as used in any embodiment herein, refer to functional structures that include hardware, or a combination of hardware and software, and may comprise, for example, singly or in any combination, hardwired circuitry, programmable circuitry such as computer processors comprising one or more individual instruction processing cores, state machine circuitry, and/or gate level logic. The circuitry may include a processor and/or controller programmed or otherwise configured to execute one or more instructions to perform one or more operations described herein. The instructions may be embodied as, for example, an application, software, firmware, etc. configured to cause the circuitry to perform any of the aforementioned operations. Software may be embodied as a software package, code, instructions, instruction sets and/or data recorded on a computer-readable storage device. Software may be embodied or implemented to include any number of processes, and processes, in turn, may be embodied or implemented to include any number of threads, etc., in a hierarchical fashion. Firmware may be embodied as code, instructions or instruction sets and/or data that are hard-coded (e.g., nonvolatile) in memory devices. The circuitry may, collectively or individually, be embodied as circuitry that forms part of a larger system, for example, an integrated circuit (IC), an application-specific integrated circuit (ASIC), a system-on-a-chip (SoC), communications system, radar system, desktop computers, laptop computers, tablet computers, servers, smartphones, etc. Other embodiments may be implemented as software executed by a programmable device. In any such hardware cases that include executable software, the terms “circuit” or “circuitry” are intended to include a combination of software and hardware such as a programmable control device or a processor capable of executing the software. As described herein, various embodiments may be implemented using hardware elements, software elements, or any combination thereof. Examples of hardware elements may include processors, microprocessors, circuits, circuit elements (e.g., transistors, resistors, capacitors, inductors, and so forth), integrated circuits, application specific integrated circuits (ASIC), programmable logic devices (PLD), digital signal processors (DSP), field programmable gate arrays (FPGA), logic gates, registers, semiconductor devices, chips, microchips, chip sets, and so forth.
Numerous specific details have been set forth herein to provide a thorough understanding of the embodiments. It will be understood, however, that other embodiments may be practiced without these specific details, or otherwise with a different set of details. It will be further appreciated that the specific structural and functional details disclosed herein are representative of example embodiments and are not necessarily intended to limit the scope of the present disclosure. In addition, although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described herein. Rather, the specific features and acts described herein are disclosed as example forms of implementing the claims.
The following examples pertain to further embodiments, from which numerous permutations and configurations will be apparent.
One example embodiment of the present disclosure provides a fast Fourier transform (FFT) butterfly circuit comprising: a first multiplexer configured to select one of a first channel or a delayed version of a second channel, the selection based on a frame index associated with the first channel and/or the second channel; a second multiplexer configured to select the one of the first channel or the delayed version of the second channel that was not selected by the first multiplexer; and a butterfly core circuit configured to receive a delayed version of the selected channel from the first multiplexer as a top butterfly branch, receive the selected channel from the second multiplexer as a bottom butterfly branch, apply FFT twiddle factors to the bottom butterfly branch to generate a scaled bottom butterfly branch, and generate a sum channel output as the sum of the top butterfly branch and the scaled bottom butterfly branch, and a difference channel output as the difference between the top butterfly branch and the scaled bottom butterfly branch.
In some cases, the butterfly core circuit comprises a memory configured to store the FFT twiddle factors. In some such cases, the FFT twiddle factors are selected from the memory based on a size of an FFT for which the FFT butterfly circuit is employed. In some such cases, the butterfly core circuit comprises a bit extraction circuit configured to extract selected bits from the frame index for use as an address to the memory to select the FFT twiddle factors. In some cases, the FFT butterfly circuit further comprises a bit slice circuit configured to extract a selected bit from the frame index to control operation of the first multiplexer and the second multiplexer, the selected bit based on a size of an FFT for which the FFT butterfly circuit is employed. In some such cases, the FFT butterfly circuit further comprises a third multiplexer configured to select one of the sum channel output or a delayed version of the difference channel output as a first butterfly output channel, the selection based on the extracted selected bit and based on the frame index. In some such cases, the FFT butterfly circuit further comprises a fourth multiplexer configured to select the one of the sum channel output or the delayed version of the difference channel output that was not selected by the third multiplexer as a second butterfly output channel. In some such cases, the FFT butterfly circuit further comprises a delay circuit to delay the first butterfly output channel to align with the second butterfly output channel. In some cases, the FFT butterfly circuit is implemented in an application specific integrated circuit or a field programmable gate array.
Another example embodiment of the present disclosure provides a reconfigurable channelizer comprising: a multi-stage fast Fourier transform (FFT) circuit configured to transform time domain input data to output frequency domain data distributed into frequency bins, wherein a number of the frequency bins is dynamically programmable; and the multi-stage FFT circuit comprising a plurality of FFT butterfly circuits, each FFT butterfly circuit configured to compute an FFT butterfly for an associated stage of the multi-stage FFT circuit.
In some cases, the FFT butterfly circuit comprises: a first multiplexer configured to select one of a first channel of the time domain input data or a delayed version of a second channel of the time domain input data, the selection by the first multiplexer based on a frame index associated with the first channel and/or the second channel; a second multiplexer configured to select the one of the first channel or the delayed version of the second channel that was not selected by the first multiplexer; and a butterfly core circuit configured to receive a delayed version of the selected channel from the first multiplexer as a top butterfly branch, receive the selected channel from the second multiplexer as a bottom butterfly branch, apply FFT twiddle factors to the bottom butterfly branch to generate a scaled bottom butterfly branch, and generate a sum channel output as the sum of the top butterfly branch and the scaled bottom butterfly branch, and a difference channel output as the difference between the top butterfly branch and the scaled bottom butterfly branch. In some such cases, the butterfly core circuit comprises a memory configured to store the FFT twiddle factors and the FFT twiddle factors are selected from the memory based on an FFT size associated with the stage. In some such cases, the butterfly core circuit comprises a bit extraction circuit configured to extract selected bits from the frame index for use as an address to the memory to select the FFT twiddle factors. In some such cases, the FFT butterfly circuit further comprises a bit slice circuit configured to extract a selected bit from the frame index to control operation of the first multiplexer and the second multiplexer, the selected bit based on an FFT size associated with the stage. In some such cases, the FFT butterfly circuit further comprises a third multiplexer configured to select one of the sum channel output or a delayed version of the difference channel output as a first butterfly output channel associated with the stage, the selection by the third multiplexor based on the extracted selected bit and further based on the frame index. In some such cases, the FFT butterfly circuit further comprises a fourth multiplexer configured to select the one of the sum channel output or the delayed version of the difference channel output that was not selected by the third multiplexer as a second butterfly output channel associated with the stage. In some such cases, the FFT butterfly circuit further comprises a delay circuit to delay the first butterfly output channel to align with the second butterfly output channel. In some such cases, the delay circuit is further configured to generate the delayed version of the second channel for a next stage of the multi-stage FFT circuit.
Another example embodiment of the present disclosure provides a method for computing a fast Fourier transform (FFT) butterfly, the method comprising: selecting, by a first multiplexer, one of a first channel or a delayed version of a second channel, the selection based on a frame index associated with the first channel and/or the second channel; selecting, by a second multiplexer, the one of the first channel or the delayed version of the second channel that was not selected by the first multiplexer; routing a delayed version of the selected channel from the first multiplexer to a top butterfly branch of a butterfly core circuit; routing the selected channel from the second multiplexer to a bottom butterfly branch of the butterfly core circuit; selecting FFT twiddle factors, the selection based on a size of an FFT for which the FFT butterfly is computed; applying, by the butterfly core circuit, the FFT twiddle factors to the bottom butterfly branch to generate a scaled bottom butterfly branch; and generating, by the butterfly core circuit, a sum channel output as the sum of the top butterfly branch and the scaled bottom butterfly branch, and a difference channel output as the difference between the top butterfly branch and the scaled bottom butterfly branch. In some cases, the method further comprises selecting, by a third multiplexer, one of the sum channel output or a delayed version of the difference channel output as a first butterfly output channel, the selecting by the third multiplexer based on the frame index; and selecting, by a fourth multiplexer, the one of the sum channel output or the delayed version of the difference channel output that was not selected by the third multiplexer as a second butterfly output channel.
The terms and expressions which have been employed herein are used as terms of description and not of limitation, and there is no intention, in the use of such terms and expressions, of excluding any equivalents of the features shown and described (or portions thereof), and it is recognized that various modifications are possible within the scope of the claims. Accordingly, the claims are intended to cover all such equivalents. Various features, aspects, and embodiments have been described herein. The features, aspects, and embodiments are susceptible to combination with one another as well as to variation and modification, as will be appreciated in light of this disclosure. The present disclosure should, therefore, be considered to encompass such combinations, variations, and modifications. It is intended that the scope of the present disclosure be limited not by this detailed description, but rather by the claims appended hereto. Future filed applications claiming priority to this application may claim the disclosed subject matter in a different manner and may generally include any set of one or more elements as variously disclosed or otherwise demonstrated herein.