1. Field of the Invention
The present invention relates to techniques for performing fast Fourier transforms (FFT).
2. Background Art
The discrete Fourier transform (DFT) is a form of Fourier analysis. The DFT transforms a first function to a second function, which may be referred to as the “frequency domain representation” or the “DFT.” The DFT has many applications, including enabling spectral analysis and processing in audio and video applications. A fast Fourier transform (FFT) is an algorithm used to determine the DFT and the inverse DFT. The FFT enables the DFT to be determined more quickly than by direct evaluation. Many electronic devices that are used in audio and/or video applications include a processor and/or logic configured to perform the FFT algorithm.
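By way of illustration only, the DFT that an FFT computes may be written as a direct sum. The following minimal sketch is not part of the described embodiments; the function names `naive_dft` and `naive_idft` are hypothetical, and the sign convention (a negative exponent for the forward transform) is an assumption, since the text does not specify one.

```python
import cmath


def naive_dft(x):
    """Direct evaluation of X[k] = sum over n of x[n] * exp(-2j*pi*k*n/N)."""
    n_points = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_points)
            for n in range(n_points))
        for k in range(n_points)
    ]


def naive_idft(spectrum):
    """Inverse transform: conjugate kernel and a 1/N scale factor."""
    n_points = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * n / n_points)
            for k in range(n_points)) / n_points
        for n in range(n_points)
    ]
```

Direct evaluation of this kind requires on the order of N squared complex multiplications for N points, which is the cost that an FFT reduces.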
For instance, ARM (Advanced RISC Machine) central processing units (CPUs) frequently used in electronic devices may be configured to perform the FFT algorithm. The ARM architecture is a 32-bit RISC processor architecture widely used in embedded designs. Because of their low power consumption, ARM CPUs are frequently used in mobile electronic devices, which are frequently battery powered.
A need exists for improved ways of performing the FFT algorithm in processors, such as in ARM CPUs. Conventionally, FFTs performed in processors that have limited resources are implemented according to a radix-2 technique, which has disadvantages. For example, performing an FFT in an ARM processor according to a radix-2 technique is relatively slow. A relatively large amount of time is required for computations, and a large amount of power is consumed as a result. Furthermore, the radix-2 technique does not take advantage of the ARM CPU architecture. Still further, performing an FFT in an ARM processor according to a radix-2 technique typically results in output signals having relatively poor dynamic range.
Thus, what is desired are improved techniques for performing the FFT algorithm in processors, including in processors having limited resources and/or used in mobile devices.
Embodiments of the present invention provide a way of implementing fast Fourier transforms (FFTs) more efficiently. An FFT is enabled to be performed “in place” in a small set of registers. Performing an FFT in this manner may reduce or eliminate a number of memory accesses that are required by conventional techniques. Furthermore, the FFT may be cascaded to perform higher bit-level FFTs on larger sets of data points. The data points may be reordered between cascaded FFTs to enable further efficient use of memory.
In one implementation, a method for performing an FFT on a plurality of input data points is provided. The plurality of input data points includes a first input data point, a second input data point, a third input data point, and a fourth input data point. A real portion of the first input data point is stored in a first register and an imaginary portion of the first input data point is stored in a second register. A real portion of the second input data point is stored in a third register and an imaginary portion of the second input data point is stored in a fourth register. A real portion of the third input data point is stored in a fifth register and an imaginary portion of the third input data point is stored in a sixth register. A real portion of the fourth input data point is stored in a seventh register and an imaginary portion of the fourth input data point is stored in an eighth register. Operations are performed on the first-fourth input data points in place in the first-eighth registers and in a ninth register to generate a first output data point, a second output data point, a third output data point, and a fourth output data point.
In another implementation, a system for performing an FFT is provided. The system includes an FFT module and a plurality of registers that includes a first register, a second register, a third register, a fourth register, a fifth register, a sixth register, a seventh register, an eighth register, and a ninth register. The FFT module is configured to store a real portion of the first input data point in the first register and an imaginary portion of the first input data point in the second register, a real portion of the second input data point in the third register and an imaginary portion of the second input data point in the fourth register, a real portion of the third input data point in the fifth register and an imaginary portion of the third input data point in the sixth register, and a real portion of the fourth input data point in the seventh register and an imaginary portion of the fourth input data point in the eighth register. The FFT module is configured to perform operations on the first-fourth input data points in place in the first-eighth registers and in the ninth register to generate a first output data point, a second output data point, a third output data point, and a fourth output data point.
In still another implementation, a method for performing a radix-M FFT is provided. A first plurality of data points is received in a first order. The first plurality of data points is reordered into a second order. A radix-N FFT operation is performed on the first plurality of data points in groups of N data points received according to the second order to generate a second plurality of data points. A radix-N FFT operation is performed on the second plurality of data points in groups of N data points sequentially received to generate a third plurality of data points. The third plurality of data points is reordered into a third order. A radix-N FFT operation is performed on the third plurality of data points in groups of N data points received according to the third order to generate a fourth plurality of data points.
In still another implementation, a system for performing a radix-M FFT is provided. The system includes a first permutation module, a first FFT module, a second FFT module, a second permutation module, and a third FFT module. The first permutation module is configured to receive a first plurality of data points in a first order, and to reorder the first plurality of data points into a second order. The first FFT module is configured to receive the first plurality of data points in the second order, and to perform a radix-N FFT operation on the first plurality of data points in groups of N data points received according to the second order to generate a second plurality of data points. The second FFT module is configured to receive the second plurality of data points, and to perform a radix-N FFT operation on the second plurality of data points in groups of N data points sequentially received to generate a third plurality of data points. The second permutation module is configured to receive the third plurality of data points, and to reorder the third plurality of data points into a third order. The third FFT module is configured to receive the third plurality of data points in the third order, and to perform a radix-N FFT operation on the third plurality of data points in groups of N data points received according to the third order to generate a fourth plurality of data points.
Still further, the system may include a scaling module configured to scale at least one of the second plurality of data points, third plurality of data points, and fourth plurality of data points according to a corresponding set of twiddle factors.
These and other objects, advantages and features will become readily apparent in view of the following detailed description of the invention. Note that the Summary and Abstract sections may set forth one or more, but not all exemplary embodiments of the present invention as contemplated by the inventor(s).
The accompanying drawings, which are incorporated herein and form a part of the specification, illustrate the present invention and, together with the description, further serve to explain the principles of the invention and to enable a person skilled in the pertinent art to make and use the invention.
The present invention will now be described with reference to the accompanying drawings. In the drawings, like reference numbers indicate identical or functionally similar elements. Additionally, the left-most digit(s) of a reference number identifies the drawing in which the reference number first appears.
The present specification discloses one or more embodiments that incorporate the features of the invention. The disclosed embodiment(s) merely exemplify the invention. The scope of the invention is not limited to the disclosed embodiment(s). The invention is defined by the claims appended hereto.
References in the specification to “one embodiment,” “an embodiment,” “an example embodiment,” etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.
Furthermore, it should be understood that spatial descriptions (e.g., “above,” “below,” “up,” “left,” “right,” “down,” “top,” “bottom,” “vertical,” “horizontal,” etc.) used herein are for purposes of illustration only, and that practical implementations of the structures described herein can be spatially arranged in any orientation or manner.
The example embodiments described herein are provided for illustrative purposes, and are not limiting. The examples described herein may be adapted in various ways for implementation in many types of processors and/or processing logic, including ARM CPUs. Furthermore, additional structural and operational embodiments, including modifications/alterations, will become apparent to persons skilled in the relevant art(s) from the teachings herein.
Embodiments enable faster computations in ARM CPUs and use less power than conventional techniques. Further embodiments handle twiddle factors, fixed-point shifting, and overflow protection in unique ways. Embodiments improve the dynamic range significantly compared to conventional implementations. In some applications, embodiments can replace a hardware accelerator used for FFT computations. Embodiments are applicable to a variety of applications, including audio applications.
For example,
Note that filter 102 and audio processor 104 may be implemented in hardware, software, firmware, or any combination thereof. For example, filter 102 and/or audio processor 104 may be implemented as one or more processors and/or as computer code configured to be executed in one or more processors, such as an ARM CPU. Alternatively, filter 102 and/or audio processor 104 may be implemented as hardware logic/electrical circuitry. Memory 108 may be a memory device such as a RAM device, a ROM device, etc., and/or any other suitable type of storage medium, such as a hard disc drive. Speaker 106 may be any type of speaker configured for broadcasting audio.
System 100 may be implemented in any type of electronic device that may be configured with audio processing functionality, including a desktop computer (e.g., a personal computer, etc.), a mobile computing device (e.g., a cell phone, smart phone, a personal digital assistant (PDA), a laptop computer, a notebook computer, etc.), a mobile email device (e.g., a RIM Blackberry® device), an audio device (e.g., an MP3 or other music file format player such as an Apple iPod) or other electronic device. Although described above as a system for processing audio, system 100 may be used to process other forms of data, including video data and/or other data.
Many processors, including ARM processors, are typically configured to perform a fast Fourier transform (FFT) operation in a radix-2 fashion, where two input data samples or points (e.g., received on filtered audio data signal 112) are processed in a radix-2 FFT butterfly configuration. The radix-2 FFT butterflies may be used in groups to process input audio data in larger groups of samples than two data samples. For example, two stages of radix-2 FFT butterflies may be cascaded to process four input data samples, four stages of radix-2 FFT butterflies may be cascaded to process sixteen input data samples, six stages of radix-2 FFT butterflies may be cascaded to process sixty-four input data samples, etc.
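The relationship between the number of data samples and the number of cascaded butterfly stages may be illustrated with a short sketch. The helper name `butterfly_stages` is hypothetical and is used here only for illustration; it assumes the number of samples is an exact power of the radix.

```python
def butterfly_stages(num_points, radix):
    """Count the cascaded radix-`radix` stages needed for `num_points` samples.

    Assumes num_points is an exact power of radix; raises ValueError otherwise.
    """
    if num_points < 1 or radix < 2:
        raise ValueError("num_points must be >= 1 and radix must be >= 2")
    stages = 0
    remaining = num_points
    while remaining > 1:
        if remaining % radix != 0:
            raise ValueError("num_points is not a power of radix")
        remaining //= radix
        stages += 1
    return stages
```

Under this sketch, 4, 16, and 64 samples correspond to 2, 4, and 6 radix-2 stages, respectively, in agreement with the examples above, while 16 and 64 samples correspond to 2 and 3 radix-4 stages.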
As shown in
Input FFT module 202, audio processing module 204, output FFT module 206, time domain estimation module 214, and time domain processing module 216 may be implemented in hardware, software, firmware, or any combination thereof. For example, input FFT module 202, audio processing module 204, output FFT module 206, time domain estimation module 214, and/or time domain processing module 216 may be implemented as computer code configured to be executed in one or more processors, such as an ARM CPU. Alternatively, input FFT module 202, audio processing module 204, output FFT module 206, time domain estimation module 214, and/or time domain processing module 216 may be implemented as hardware logic/electrical circuitry.
Registers 212 store audio data during processing performed by input FFT module 202 and output FFT module 206. Registers 212 may be accessed by FFT modules 202 and 206 faster than memory 108 can be accessed, and thus are preferably used by FFT modules 202 and 206 to save computational time. Many processors, such as ARM processors, have limited resources. In an ARM processor implementation, registers 212 include 16 registers. One of the registers is used for a program counter, and another of the registers is used for a stack pointer. Thus, at most 14 of the 16 registers of registers 212 are available for FFT processing by input and output FFT modules 202 and 206. Typically, further registers of the 14 registers may be required for further housekeeping procedures, and thus in some cases, only 9 or 10 of the 16 registers of registers 212 are available for usage during FFT processing. Because of the limited number of registers of registers 212 that are available for FFT processing, typically input and output FFT modules 202 and 206 are configured to perform radix-2 FFT operations to conserve registers.
Embodiments of the present invention enable use of a radix-4 FFT in an ARM processor. For example,
As shown in
In embodiments, one of the registers of registers 212 is used as an input and output buffer index. Embodiments may provide a significant savings in processor resources (e.g., 40% savings). In an embodiment, 17 processor instructions may be used to execute a radix-4 FFT butterfly operation of radix-4 FFT butterfly 300. In alternative embodiments, other numbers of instructions may be used.
As described above, groups of radix-4 FFT butterflies 300 may be used to perform FFT operations of any size. For example,
In an embodiment, input and/or output FFT 202 and 206 may include an index table 602, as shown in
Embodiments enable improved dynamic range. For example, as described above, conventional implementations of FFTs in ARM processors use radix-2 butterfly configurations. For a 16-data sample input signal, a radix-2 butterfly configuration requires four stages of radix-2 butterflies. In contrast, a 16-data sample input signal processed by radix-4 FFT butterflies 300 uses two stages of radix-4 butterflies (e.g., as shown in
Example embodiments for input and output FFT modules 202 and 206 are described in the next subsection, and example embodiments for optionally handling twiddle factors are described in the subsequent subsection.
Example embodiments are described in this subsection for FFT modules 202 and 206, and for performing FFT operations therewith. These example embodiments are provided for illustrative purposes, and are not limiting. Although described below with reference to audio signal processing, the examples described herein may be adapted to other types of signal processing. Furthermore, additional structural and operational embodiments, including modifications/alterations, will become apparent to persons skilled in the relevant art(s) from the teachings herein.
Embodiments of FFT modules 202 and 206 may operate in various ways. For example,
Flowchart 800 and system 900 are described as follows. Other structural and operational embodiments will be apparent to persons skilled in the relevant art(s) based on the discussion regarding flowchart 800 and system 900. For example, fewer than or greater numbers of radix-N FFT modules than the three as shown in
Referring to flowchart 800 in
In step 804, the first plurality of data points is reordered into a second order. For example, in an embodiment, first permutation module 902 shown in
Reordered first plurality of data points 914 may be stored in memory 108, in an embodiment.
For example, first plurality of data points 912 may include 64 data points that are ordered data point 0 through data point 63. In an embodiment, first permutation module 902 may be configured to reorder the 64 data points of first plurality of data points 912 as indicated in a table. For example,
As indicated in table 1000, data point 0 through data point 63 are reordered into the following sequential order of data point 0, data point 32, data point 16, data point 48, data point 8, data point 40, data point 24, data point 56, data point 4, data point 36, data point 20, data point 52, data point 12, data point 44, data point 28, data point 60, data point 2, data point 34, data point 18, data point 50, data point 10, data point 42, data point 26, data point 58, data point 6, data point 38, data point 22, data point 54, data point 14, data point 46, data point 30, data point 62, data point 1, data point 33, data point 17, data point 49, data point 9, data point 41, data point 25, data point 57, data point 5, data point 37, data point 21, data point 53, data point 13, data point 45, data point 29, data point 61, data point 3, data point 35, data point 19, data point 51, data point 11, data point 43, data point 27, data point 59, data point 7, data point 39, data point 23, data point 55, data point 15, data point 47, data point 31, and data point 63.
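The sequence listed above appears to match a 6-bit bit-reversal of the data point index (for example, index 1, binary 000001, maps to data point 32, binary 100000). The following minimal sketch generates such an order. The function name `bit_reversed_order` is hypothetical, and treating the reordering as a bit reversal is an inference from the listed sequence rather than a statement taken from the text.

```python
def bit_reversed_order(num_points):
    """Return the list whose i-th entry is i with its binary digits reversed.

    Assumes num_points is a power of two and at least 2.
    """
    bits = num_points.bit_length() - 1
    if num_points < 2 or (1 << bits) != num_points:
        raise ValueError("num_points must be a power of two and at least 2")
    # Format each index as a fixed-width binary string, reverse it, parse it.
    return [int(format(i, "0{}b".format(bits))[::-1], 2)
            for i in range(num_points)]


order = bit_reversed_order(64)
# Groups of four consecutive entries, as consumed by a radix-4 stage.
groups = [order[i:i + 4] for i in range(0, 64, 4)]
```

Here `groups[0]` is `[0, 32, 16, 48]` and `groups[15]` is `[15, 47, 31, 63]`, corresponding to the first four and the last four data points of the listed sequence.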
In step 806, a radix-N FFT operation is performed on the first plurality of data points in groups of N data points received according to the second order to generate a second plurality of data points. For example, in an embodiment, first radix-N FFT module 904 shown in
For example, N may be equal to 4, and thus radix-N FFT module 904 may be configured to perform a radix-4 FFT operation on reordered first plurality of data points 914. In such an embodiment, radix-N FFT module 904 may be configured to perform the FFT operation on groups of 4 data points received in reordered first plurality of data points 914, such as each of the 16 groups of data points shown in table 1000 in
First radix-N FFT module 904 may be configured to perform the radix-N FFT operation in various ways. Example embodiments for radix-N FFT operations that may be performed by radix-N FFT module 904 are described further below. Second plurality of data points 916 generated by first radix-N FFT module 904 may be stored in memory 108 (as indicated by a dotted line in
In step 808, a radix-N FFT operation is performed on the second plurality of data points in groups of N data points sequentially received to generate a third plurality of data points. For example, in an embodiment, second radix-N FFT module 906 shown in
For example, N may be equal to 4, and thus the second radix-N FFT module 906 may be configured to perform a radix-4 FFT operation on second plurality of data points 916. In such an embodiment, second radix-N FFT module 906 may be configured to perform the FFT operation on groups of 4 data points received in second plurality of data points 916. As shown in
Second radix-N FFT module 906 may be configured to perform the radix-N FFT operation in various ways. Example embodiments for radix-N FFT operations that may be performed by second radix-N FFT module 906 are described further below. Third plurality of data points 918 generated by second radix-N FFT module 906 may be stored in memory 108 (as indicated by a dotted line in
In step 810, the third plurality of data points is reordered into a third order. For example, in an embodiment, second permutation module 908 shown in
Reordered third plurality of data points 920 may be stored in memory 108, in an embodiment.
For example, third plurality of data points 918 may include 64 data points that are ordered data point 0 through data point 63. In an embodiment, second permutation module 908 may be configured to reorder the 64 data points of third plurality of data points 918 as indicated in a table similar to table 1000. For example,
As indicated in table 1100, data point 0 through data point 63 are reordered into the following sequential order of data point 0, data point 4, data point 8, data point 12, data point 16, data point 20, data point 24, data point 28, data point 32, data point 36, data point 40, data point 44, data point 48, data point 52, data point 56, data point 60, data point 1, data point 5, data point 9, data point 13, data point 17, data point 21, data point 25, data point 29, data point 33, data point 37, data point 41, data point 45, data point 49, data point 53, data point 57, data point 61, data point 2, data point 6, data point 10, data point 14, data point 18, data point 22, data point 26, data point 30, data point 34, data point 38, data point 42, data point 46, data point 50, data point 54, data point 58, data point 62, data point 3, data point 7, data point 11, data point 15, data point 19, data point 23, data point 27, data point 31, data point 35, data point 39, data point 43, data point 47, data point 51, data point 55, data point 59, and data point 63.
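The sequence listed above steps through the data points with a stride of four (0, 4, 8, ..., 60), then repeats from data points 1, 2, and 3. A minimal sketch that produces such an order follows. The function name `stride_order` is hypothetical and is introduced only for illustration.

```python
def stride_order(num_points, stride):
    """Visit indices 0, stride, 2*stride, ... and then restart at 1, 2, ...

    Assumes num_points is a multiple of stride.
    """
    if stride < 1 or num_points % stride != 0:
        raise ValueError("num_points must be a multiple of stride")
    per_pass = num_points // stride  # entries produced by each pass
    return [(i % per_pass) * stride + i // per_pass
            for i in range(num_points)]


order_1100 = stride_order(64, 4)
# Groups of four consecutive entries, as consumed by a radix-4 stage.
groups_1100 = [order_1100[i:i + 4] for i in range(0, 64, 4)]
```

Here `groups_1100[0]` is `[0, 4, 8, 12]` and `groups_1100[4]` is `[1, 5, 9, 13]`, in agreement with the listed sequence.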
In step 812, a radix-N FFT operation is performed on the third plurality of data points in groups of N data points received according to the third order to generate a fourth plurality of data points. For example, in an embodiment, third radix-N FFT module 910 shown in
For example, N may be equal to 4, and thus third radix-N FFT module 910 may be configured to perform a radix-4 FFT operation on reordered third plurality of data points 920. In such an embodiment, third radix-N FFT module 910 may be configured to perform the FFT operation on groups of four data points received in reordered third plurality of data points 920, such as each of the 16 groups of data points shown in table 1100 in
Third radix-N FFT module 910 may be configured to perform the radix-N FFT operation in various ways. Example embodiments for radix-N FFT operations that may be performed by third radix-N FFT module 910 are described further below.
Fourth plurality of data points 922 may be stored in memory 108 by third radix-N FFT module 910, in an embodiment. Fourth plurality of data points 922 may be a plurality of data points in frequency domain audio data 208 (output by input FFT module 202) or frequency domain processed time domain audio data signal 220 (output by output FFT module 206) in
As described above, first-third radix-N FFT modules 904, 906, and 910 may be configured to perform a radix-N FFT operation in various ways. For example,
For illustrative purposes, flowchart 1200 is described below in the context of a radix-4 FFT embodiment. For instance,
In the following example, radix-4 FFT module 1302 is described as performing an FFT on a group of four input data points: a first input data point, a second input data point, a third input data point, and a fourth input data point. In embodiments, the four data points may be a group of four data points of reordered first plurality of data points 914 received by first radix-N FFT module 904 (e.g., one of the first-sixteenth groups of data points in table 1100 of
In step 1202, a real portion of the first input data point is stored in a first register and an imaginary portion of the first input data point is stored in a second register. For instance, referring to
In step 1204, a real portion of the second input data point is stored in a third register and an imaginary portion of the second input data point is stored in a fourth register. For instance, radix-4 FFT module 1302 may be configured to store a real portion of the second input data point in third register 1304c and an imaginary portion of the second input data point in fourth register 1304d.
In step 1206, a real portion of the third input data point is stored in a fifth register and an imaginary portion of the third input data point is stored in a sixth register. For instance, radix-4 FFT module 1302 may be configured to store a real portion of the third input data point in fifth register 1304e and an imaginary portion of the third input data point in sixth register 1304f.
In step 1208, a real portion of the fourth input data point is stored in a seventh register and an imaginary portion of the fourth input data point is stored in an eighth register. For instance, radix-4 FFT module 1302 may be configured to store a real portion of the fourth input data point in seventh register 1304g and an imaginary portion of the fourth input data point in eighth register 1304h.
In step 1210, operations are performed on the first-fourth input data points in place in the first-eighth registers and in a ninth register to generate a first output data point, a second output data point, a third output data point, and a fourth output data point. For instance, in an embodiment, radix-4 FFT module 1302 may be configured to perform operations on the first-fourth input data points in place in first-eighth registers 1304a-1304h and in ninth register 1304i (e.g., which may function as a “dummy” or temporary data register) to generate a first output data point, a second output data point, a third output data point, and a fourth output data point. Radix-4 FFT module 1302 may output the first-fourth output data points on a first-fourth output data point signal 1308, which may be stored in memory 108 (
In embodiments, a radix-4 FFT algorithm may be performed in step 1210 by radix-4 FFT module 1302 that uses first-ninth registers 1304a-1304i. By performing the radix-4 FFT algorithm in place in first-ninth registers 1304a-1304i, rather than having to perform the radix-4 FFT algorithm by having to repeatedly access memory 108 to copy data points into registers 212 and/or to store computational results in memory 108, a number of memory I/O operations is greatly reduced or even completely eliminated, saving time and processing resources. In this manner, in embodiments, an efficient radix-4 FFT algorithm may be performed in place in registers 212, in contrast to conventional techniques which either use multiple stages of less efficient radix-2 FFTs or perform less efficient radix-4 FFTs that require many time consuming accesses to memory 108.
In embodiments, the radix-4 FFT algorithm may be performed in place in first-ninth registers 1304a-1304i using a relatively low number of instructions. For instance,
Referring to
In step 1404, a result of a subtraction of a contents of the third register from a contents of the first register is stored in the third register. For instance, radix-4 FFT module 1302 may be configured to subtract a contents of third register 1304c (which is initially the real portion of the second data point) from a contents of first register 1304a (which is the first sum) to generate a first subtraction result, and to store the first subtraction result in third register 1304c.
In step 1406, a sum of a contents of the second register and a contents of the fourth register is stored in the second register. For instance, radix-4 FFT module 1302 may be configured to sum a contents of second register 1304b (which is initially the imaginary portion of the first data point) and a contents of fourth register 1304d (which is initially the imaginary portion of the second data point) to generate a second sum, and to store the second sum in second register 1304b.
In step 1408, a result of a subtraction of a contents of the fourth register from a contents of the second register is stored in the fourth register. For instance, radix-4 FFT module 1302 may be configured to subtract a contents of fourth register 1304d (which is initially the imaginary portion of the second data point) from a contents of second register 1304b (which is the second sum) to generate a second subtraction result, and to store the second subtraction result in fourth register 1304d.
In step 1410, a sum of a contents of the fifth register and a contents of the seventh register is stored in the fifth register. For instance, radix-4 FFT module 1302 may be configured to sum a contents of fifth register 1304e (which is initially the real portion of the third data point) and a contents of seventh register 1304g (which is initially the real portion of the fourth data point) to generate a third sum, and to store the third sum in fifth register 1304e.
In step 1412, a result of a subtraction of a contents of the seventh register from a contents of the fifth register is stored in the seventh register. For instance, radix-4 FFT module 1302 may be configured to subtract a contents of seventh register 1304g (which is initially the real portion of the fourth data point) from a contents of fifth register 1304e (which is the third sum) to generate a third subtraction result, and to store the third subtraction result in seventh register 1304g.
In step 1414, a sum of a contents of the sixth register and a contents of the eighth register is stored in the sixth register. For instance, radix-4 FFT module 1302 may be configured to sum a contents of sixth register 1304f (which is initially the imaginary portion of the third data point) and a contents of eighth register 1304h (which is initially the imaginary portion of the fourth data point) to generate a fourth sum, and to store the fourth sum in sixth register 1304f.
In step 1416, a result of a subtraction of a contents of the eighth register from a contents of the sixth register is stored in the eighth register. For instance, radix-4 FFT module 1302 may be configured to subtract a contents of eighth register 1304h (which is initially the imaginary portion of the fourth data point) from a contents of sixth register 1304f (which is the fourth sum) to generate a fourth subtraction result, and to store the fourth subtraction result in eighth register 1304h.
Referring to
In step 1420, a result of a subtraction of a contents of the fifth register from a contents of the first register is stored in the fifth register. For instance, radix-4 FFT module 1302 may be configured to subtract a contents of fifth register 1304e (which is the third sum) from a contents of first register 1304a (which is the fifth sum) to generate a fifth subtraction result, and to store the fifth subtraction result in fifth register 1304e.
In step 1422, a sum of a contents of the second register and a contents of the sixth register is stored in the second register. For instance, radix-4 FFT module 1302 may be configured to sum a contents of second register 1304b (which is the second sum) and a contents of sixth register 1304f (which is the fourth sum) to generate a sixth sum, and to store the sixth sum in second register 1304b.
In step 1424, a result of a subtraction of a contents of the sixth register from a contents of the second register is stored in the sixth register. For instance, radix-4 FFT module 1302 may be configured to subtract a contents of sixth register 1304f (which is the fourth sum) from a contents of second register 1304b (which is the sixth sum) to generate a sixth subtraction result, and to store the sixth subtraction result in sixth register 1304f.
In step 1426, a result of a subtraction of a contents of the eighth register from a contents of the third register is stored in the ninth register. For instance, radix-4 FFT module 1302 may be configured to subtract a contents of eighth register 1304h (which is the fourth subtraction result) from a contents of third register 1304c (which is the first subtraction result) to generate a seventh subtraction result, and to store the seventh subtraction result in ninth register 1304i.
In step 1428, a sum of a contents of the third register and a contents of the eighth register is stored in the third register. For instance, radix-4 FFT module 1302 may be configured to sum a contents of third register 1304c (which is the first subtraction result) and a contents of eighth register 1304h (which is the fourth subtraction result) to generate a seventh sum, and to store the seventh sum in third register 1304c.
In step 1430, a sum of a contents of the fourth register and a contents of the seventh register is stored in the eighth register. For instance, radix-4 FFT module 1302 may be configured to sum a contents of fourth register 1304d (which is the second subtraction result) and a contents of seventh register 1304g (which is the third subtraction result) to generate an eighth sum, and to store the eighth sum in eighth register 1304h.
In step 1432, a result of a subtraction of a contents of the seventh register from a contents of the fourth register is stored in the fourth register. For instance, radix-4 FFT module 1302 may be configured to subtract a contents of seventh register 1304g (which is the third subtraction result) from a contents of fourth register 1304d (which is the second subtraction result) to generate an eighth subtraction result, and to store the eighth subtraction result in fourth register 1304d.
In step 1434, the contents of the ninth register are stored in the seventh register. For instance, radix-4 FFT module 1302 (or other mechanism) may be configured to store the contents of ninth register 1304i (which is the seventh subtraction result) in seventh register 1304g.
As a result of the in-place FFT operation of flowchart 1400, a real portion of the first output data point is stored in first register 1304a, an imaginary portion of the first output data point is stored in second register 1304b, a real portion of the second output data point is stored in third register 1304c, an imaginary portion of the second output data point is stored in fourth register 1304d, a real portion of the third output data point is stored in fifth register 1304e, an imaginary portion of the third output data point is stored in sixth register 1304f, a real portion of the fourth output data point is stored in seventh register 1304g, and an imaginary portion of the fourth output data point is stored in eighth register 1304h.
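The second stage of the in-place operation (steps 1420 through 1434) may be sketched as follows. This is a minimal Python illustration under stated assumptions, not the ARM implementation: the list `r` stands in for registers 1304a through 1304i, and the function name `radix4_second_stage` is hypothetical. The first statement, which forms the fifth sum, is inferred from the description of step 1420. Because the first and second registers already hold the fifth and sixth sums when steps 1420 and 1424 execute, the sketch subtracts twice the other operand so that the result equals the difference of the two earlier sums; that doubling is an assumption of this sketch and is not stated in the text.

```python
def radix4_second_stage(r):
    """Apply the second butterfly stage in place and return the list.

    r[0]..r[8] model the first through ninth registers (1304a-1304i).
    On entry, following the parenthetical notes in the description:
      r[0], r[1] = first and second sums
      r[2], r[3] = first and second subtraction results
      r[4], r[5] = third and fourth sums
      r[6], r[7] = third and fourth subtraction results
      r[8]       = scratch (ninth register)
    """
    r[0] = r[0] + r[4]      # inferred step: fifth sum
    r[4] = r[0] - 2 * r[4]  # step 1420 (doubling is an assumption)
    r[1] = r[1] + r[5]      # step 1422: sixth sum
    r[5] = r[1] - 2 * r[5]  # step 1424 (doubling is an assumption)
    r[8] = r[2] - r[7]      # step 1426: seventh subtraction, kept in ninth register
    r[2] = r[2] + r[7]      # step 1428: seventh sum
    r[7] = r[3] + r[6]      # step 1430: eighth sum, stored in eighth register
    r[3] = r[3] - r[6]      # step 1432: eighth subtraction
    r[6] = r[8]             # step 1434: ninth register copied to seventh register
    return r
```

For instance, starting from first-stage values `[4, 1, -2, 3, 3, 4, -7, 4]` plus one scratch entry, the function leaves `[7, 5, 2, 10, 1, -3, -6, -4]` in the first eight entries, which are the real and imaginary portions of the four output data points in the register order described above.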
Note that first permutation module 902, first radix-N FFT module 904, second radix-N FFT module 906, second permutation module 908, and third radix-N FFT module 910 may be implemented in hardware, software, firmware, or any combination thereof.
For example, first permutation module 902, first radix-N FFT module 904, second radix-N FFT module 906, second permutation module 908, and/or third radix-N FFT module 910 may be implemented as one or more processors and/or as computer code configured to be executed in one or more processors, such as an ARM CPU. Alternatively, first permutation module 902, first radix-N FFT module 904, second radix-N FFT module 906, second permutation module 908, and/or third radix-N FFT module 910 may be implemented as hardware logic/electrical circuitry. Tables 1000 and 1100 may be stored in memory 108 or other storage device in any suitable form, such as in the form of a table, a data array, a database, etc.
Example embodiments are described in this section for the optional handling of twiddle factors for FFT modules 202 and 206. Twiddle factors are complex-valued scaling factors, each having a real part and an imaginary part, that are applied to data points in an FFT. The precision of the twiddle factors affects the dynamic range of the resulting signals, such as audio signals.
Embodiments described herein may include scaling with twiddle factors to improve dynamic range to a desired degree, including scaling with twiddle factors configured to obtain a maximum possible dynamic range.
For example,
As shown in
For instance, in an embodiment, TFSM 1502 may be configured to scale the 64 data points of first plurality of data points 912 with the twiddle factors indicated in a table. For example,
For example, a first row of table 1600 lists a first twiddle factor pair and a second twiddle factor pair. The first twiddle factor pair includes a real twiddle factor of 40000000 (hex) and an imaginary twiddle factor of 00000000 (hex). The second twiddle factor pair includes a real twiddle factor of 3FB11B47 (hex) and an imaginary twiddle factor of F9BA1651 (hex). TFSM 1502 may be configured to scale a first data point received in first plurality of data points 912 using the first twiddle factor pair, and each subsequent data point according to the corresponding twiddle factor pair. For instance, in an embodiment, TFSM 1502 may be configured to multiply the real portion of the first data point by the real twiddle factor of 40000000 (hex), and to multiply the imaginary portion of the first data point by the imaginary twiddle factor of 00000000 (hex) to determine the corresponding scaled real and imaginary portions of the first data point. In another embodiment, TFSM 1502 may be configured to scale each of the real and imaginary portions of the first data point using both of the real and imaginary twiddle factors of the first twiddle factor pair. For example, TFSM 1502 may calculate the scaled real portion of the first data point according to Equation 1 shown as follows:
ReDPnew = ReDPold × ReTF − ImDPold × ImTF    (Equation 1)
where:
ReDPnew=the scaled real portion of the data point,
ReDPold=the real portion of the data point (prior to scaling),
ReTF=the real twiddle factor of the twiddle factor pair,
ImDPold=the imaginary portion of the received data point (prior to scaling), and
ImTF=the imaginary twiddle factor of the twiddle factor pair.
In a similar manner, TFSM 1502 may calculate the scaled imaginary portion of the first data point according to Equation 2 shown as follows:
ImDPnew = ReDPold × ImTF + ImDPold × ReTF    (Equation 2)
where:
ImDPnew=the scaled imaginary portion of the data point.
TFSM 1502 may be configured to calculate scaled real and imaginary portions of each received data point according to Equations 1 and 2, or according to other algorithms.
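Equations 1 and 2 together amount to a complex multiplication of a data point by a twiddle factor pair. The following minimal Python sketch illustrates them using the two twiddle factor pairs quoted from the first row of table 1600. The fixed-point interpretation is an assumption: the sketch treats each 32-bit word as a signed two's-complement value with 30 fractional bits, so that 40000000 (hex) represents 1.0, and shifts each result right by 30 bits to return it to the scale of the data. The function names `to_signed32` and `scale_data_point` are hypothetical.

```python
Q = 30  # assumed number of fractional bits: 0x40000000 represents 1.0


def to_signed32(word):
    """Interpret a 32-bit word as a two's-complement signed integer."""
    return word - (1 << 32) if word & 0x80000000 else word


def scale_data_point(re_dp, im_dp, re_tf, im_tf):
    """Apply Equations 1 and 2, then shift the products back by Q bits."""
    re_new = (re_dp * re_tf - im_dp * im_tf) >> Q  # Equation 1
    im_new = (re_dp * im_tf + im_dp * re_tf) >> Q  # Equation 2
    return re_new, im_new


# Twiddle factor pairs quoted from the first row of table 1600.
first_pair = (to_signed32(0x40000000), to_signed32(0x00000000))
second_pair = (to_signed32(0x3FB11B47), to_signed32(0xF9BA1651))
```

Under this assumed format, `scale_data_point(1000, -2000, *first_pair)` returns `(1000, -2000)`, because the first pair represents a real factor of 1.0 and an imaginary factor of zero. Python integers do not overflow; an implementation on 32-bit registers would need a wider intermediate product or a high-word multiply, which is outside the scope of this sketch.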
In the current example of table 1600, each twiddle factor is shown as a 32-bit value. In other embodiments, twiddle factors may have other bit lengths, such as 16 bits. Embodiments of the present invention enable 32-bit twiddle factors to be used, as opposed to conventional techniques, which use 16-bit twiddle factors. For example, registers 1304a-1304p of registers 212 shown in
The use of 32-bit twiddle factors makes 16 additional twiddle factor bits available, which enables more accurate calculations and thereby preserves dynamic range. For instance, in an embodiment, system 1500 may be configured to perform a 64-point FFT algorithm using three radix-4 FFT modules for a voice and/or streaming audio application. In such an application, the incoming data to system 1500 typically may be 16-bit linear PCM data. Embodiments enable high quality audio with large dynamic range to be achieved, because 32-bit twiddle factors may be applied to the 16-bit (or other bit length) data.
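The effect of the additional twiddle factor bits on accuracy can be illustrated numerically. The following Python sketch is an illustration under assumptions rather than a description of any embodiment: it assumes the twiddle factors for a 64-point FFT are the cosine and negated sine of 2πk/64, assumes a 16-bit format with 14 fractional bits and a 32-bit format with 30 fractional bits, and rounds each value to the nearest representable number. The function names `quantize` and `max_twiddle_error` are hypothetical.

```python
import math


def quantize(value, frac_bits):
    """Round value to a fixed-point integer with frac_bits fractional bits."""
    return round(value * (1 << frac_bits))


def max_twiddle_error(n_points, frac_bits):
    """Largest absolute rounding error over all real and imaginary twiddle terms."""
    worst = 0.0
    for k in range(n_points):
        angle = 2.0 * math.pi * k / n_points
        for exact in (math.cos(angle), -math.sin(angle)):
            approx = quantize(exact, frac_bits) / (1 << frac_bits)
            worst = max(worst, abs(exact - approx))
    return worst


err16 = max_twiddle_error(64, 14)  # assumed 16-bit format, 0x4000 represents 1.0
err32 = max_twiddle_error(64, 30)  # assumed 32-bit format, 0x40000000 represents 1.0
```

With rounding to nearest, the worst-case error is bounded by half of the least significant bit in each format, so `err32` is smaller than `err16` by a factor on the order of 2^16. The smaller coefficient error is what allows more of the dynamic range of the data to be preserved through the multiplications.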
Second and third TFSMs 1504 and 1506 shown in
As shown in
Note that first through third TFSMs 1502, 1504, and 1506 may be implemented in hardware, software, firmware, or any combination thereof. For example, first through third TFSMs 1502, 1504, and/or 1506 may be implemented as one or more processors and/or computer code configured to be executed in one or more processors, such as an ARM CPU. Alternatively, first through third TFSMs 1502, 1504, and 1506 may be implemented as hardware logic/electrical circuitry.
As described above, audio processor 104 (e.g., shown in
Devices in which embodiments may be implemented may include storage, such as storage drives, memory devices, and further types of computer-readable media. Examples of such computer-readable media include a hard disk, a removable magnetic disk, a removable optical disk, flash memory cards, digital video disks, random access memories (RAMs), read only memories (ROM), and the like. As used herein, the terms “computer program medium” and “computer-readable medium” are used to generally refer to the hard disk associated with a hard disk drive, a removable magnetic disk, a removable optical disk (e.g., CDROMs, DVDs, etc.), zip disks, tapes, magnetic storage devices, MEMS (micro-electromechanical systems) storage, nanotechnology-based storage devices, as well as other media such as flash memory cards, digital video discs, RAM devices, ROM devices, and the like. Such computer-readable media may store program modules that include logic for implementing audio processor 104, system 900, system 1500, input FFT module 202, audio processing module 204, output FFT module 206, time domain estimation module 214, and time domain processing module 216 (
While various embodiments of the present invention have been described above, it should be understood that they have been presented by way of example only, and not limitation. It will be apparent to persons skilled in the relevant art that various changes in form and detail can be made therein without departing from the spirit and scope of the invention. Thus, the breadth and scope of the present invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.
This application claims the benefit of U.S. Provisional Application No. 61/018,200, filed on Dec. 31, 2007, which is incorporated by reference herein in its entirety.
| Number | Date | Country |
|---|---|---|
| 61018200 | Dec 2007 | US |