The present invention relates generally to signal processing. A system and method for analyzing a signal into frequency components is disclosed.
A useful step in analyzing a signal is the separation of the signal into frequency components. For some time, the fast Fourier transform or FFT algorithm has been used to analyze a time domain signal into its frequency components. For various types of processing, and in particular for processing audio signals, it would be desirable to analyze a signal into its frequency components with improved temporal resolution at high frequencies and better spectral resolution at low frequencies. Numerous techniques have been proposed for accomplishing this. Included among such techniques are systems that use a set of filters to separate the signal being analyzed into different channels or frequency components. Such filter sets operate roughly in a manner analogous to a biological cochlea, which produces a series of filtered output signals that correspond to different frequency channels.
Filter sets may be implemented with analog or digital filters. Previous instantiations of filter sets have been limited by practical considerations in designing filters. For example, high order bandpass filters to separate each channel output are expensive to implement. Various approaches have been implemented using combinations of high pass and low pass filters; however, more efficient techniques are needed to allow real time processing of signals for various important applications including speech recognition, source separation of audio signals and stream separation of audio signals.
The present invention will be readily understood by the following detailed description in conjunction with the accompanying drawings, wherein like reference numerals designate like structural elements, and in which:
A detailed description of a preferred embodiment of the invention is provided below. While the invention is described in conjunction with that preferred embodiment, it should be understood that the invention is not limited to any one embodiment. On the contrary, the scope of the invention is limited only by the appended claims and the invention encompasses numerous alternatives, modifications and equivalents. For the purpose of example, numerous specific details are set forth in the following description in order to provide a thorough understanding of the present invention. The present invention may be practiced according to the claims without some or all of these specific details. For the purpose of clarity, technical material that is known in the technical fields related to the invention has not been described in detail so that the present invention is not unnecessarily obscured.
A filter cascade for frequency analysis is disclosed that includes a number of features. In various embodiments, the features are implemented either separately or together. For example, in some embodiments, each frequency component is computed by subtracting the output of a low pass filter from the input to the filter. In this manner, a bandpass signal is derived. In some embodiments, low pass filters are chained or cascaded, with each filter output being fed to the next filter input in a filter set. The output of the last filter in the set is downsampled, with the filter set itself collectively acting as a high order antialiasing filter. The downsampled filter set output, composed of lower frequency components, may then be processed more efficiently. Filters in the cascade may be designed so that the Q of the filters varies with frequency.
U.S. patent application Ser. No. 09/534,682, which was previously incorporated by reference (hereinafter, “the 682 application”), discloses a digital filter cascade for frequency analysis. The filters in the cascade are chained together, and sets of filters are separated into octaves with downsampling between octaves. Filter parameters are shared among corresponding filters in different octaves. As described herein, advantages may be realized if filter parameters are varied from octave to octave so that the Q, or sharpness, of the filters differs among octaves. In one embodiment, the Q is varied substantially according to critical bandwidth.
In different embodiments, second order or higher order digital or analog filters may be used. The nature of the filters, of course, determines the exact nature of each channel output, which generally emphasizes a given frequency band and thus has a general bandpass character. Collectively, the channel outputs represent the frequency components of the signal. Because each channel output is the difference between the input and the output of an LPF, each channel output represents a band or slice of frequencies, and the sum of all the channel outputs, together with the output of the last LPF, represents the entire input signal.
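The subtract-and-chain structure can be checked numerically. The following is a minimal sketch, not the described implementation: it assumes a hypothetical first-order smoother, `one_pole_lowpass`, in place of the second order sections. Each channel is an input-minus-output difference, so the channels telescope. Adding the last LPF output back therefore restores the input whatever filters are used.

```python
def one_pole_lowpass(x, alpha):
    """Hypothetical stand-in low pass: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    out, prev = [], 0.0
    for sample in x:
        prev += alpha * (sample - prev)
        out.append(prev)
    return out

def lpf_chain_channels(x, alphas):
    """Chain low pass filters; each channel is (LPF input - LPF output).

    Returns (channels, residual), where residual is the output of the last LPF.
    """
    channels = []
    stage_input = list(x)
    for alpha in alphas:
        stage_output = one_pole_lowpass(stage_input, alpha)
        channels.append([a - b for a, b in zip(stage_input, stage_output)])
        stage_input = stage_output  # each output feeds the next filter in the chain
    return channels, stage_input

signal = [1.0, -0.5, 0.25, 0.0, 0.75, -1.0, 0.5, 0.0]
channels, residual = lpf_chain_channels(signal, alphas=[0.9, 0.6, 0.3, 0.1])
# Sum of all channels plus the final LPF output reconstructs the input.
rebuilt = [sum(ch[n] for ch in channels) + residual[n] for n in range(len(signal))]
print(rebuilt)
```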
Because the output of each LPF is fed to the input of the next LPF, forming a chain of low pass filters, the output of the last LPF in the chain has the characteristics of a filter of much higher order than the last filter alone. This higher order filtering effect may be exploited when the output of the last filter in the chain is downsampled. Essentially, the chain of low pass filters used to separate out frequency channels collectively acts as a high order filter that performs the function of an anti-aliasing filter when the signal is downsampled.
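The higher order effect can be estimated because cascading multiplies magnitude responses, so the chain's attenuation in decibels is the sum of the stages' attenuations. This is an illustrative sketch under stated assumptions. Each stage is modeled as an analog second-order Butterworth low pass (`lowpass2_db`, hypothetical), and the 60 cutoffs are spaced exponentially over one octave from 20 kHz. The chain is then evaluated at 15 kHz, which would alias after halving a 44.1 kHz rate.

```python
import math

def lowpass2_db(f, fc, q=1 / math.sqrt(2)):
    """Magnitude in dB of an analog second-order low pass (Butterworth when q = 1/sqrt(2))."""
    r = f / fc
    mag = 1.0 / math.sqrt((1.0 - r * r) ** 2 + (r / q) ** 2)
    return 20.0 * math.log10(mag)

def chain_db(f, cutoffs):
    """Cascading multiplies magnitudes, so the chain's dB response is the sum of stage dB values."""
    return sum(lowpass2_db(f, fc) for fc in cutoffs)

# Hypothetical octave: 60 cutoffs spaced exponentially from 20 kHz down toward 10 kHz.
cutoffs = [20000.0 * 2 ** (-k / 60) for k in range(60)]
f_alias = 15000.0  # above the 11.025 kHz Nyquist limit after halving 44.1 kHz
print(f"last filter alone: {lowpass2_db(f_alias, cutoffs[-1]):.1f} dB")
print(f"whole chain:       {chain_db(f_alias, cutoffs):.1f} dB")
```

A single section gives only a few dB of attenuation at that frequency. Each of the roughly 35 sections tuned at or below 15 kHz contributes at least 3 dB, so the chain attenuates by more than 100 dB.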
An example of this is depicted in
In one embodiment, second order individual filters are used, and a chain of 60 filters processes one octave of the signal before downsampling. Downsampling may be implemented by simply discarding every other sample or by any other appropriate technique. The amount of downsampling is determined by the Nyquist criterion, and a suitable amount of oversampling may be retained as desired. The combined effect of the chain of filters is that of a very high order anti-aliasing filter. Thus, the signal may be downsampled to speed the processing of lower frequency octaves without requiring an expensive high order anti-aliasing filter.
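The octave-by-octave flow can be sketched as follows. This is an assumption-laden illustration, not the described implementation. `one_pole_lowpass` is a hypothetical stand-in for the second order sections, the same coefficients are reused in every octave, and downsampling simply keeps every other sample of the chain's final output.

```python
def one_pole_lowpass(x, alpha):
    """Hypothetical stand-in low pass: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    out, prev = [], 0.0
    for sample in x:
        prev += alpha * (sample - prev)
        out.append(prev)
    return out

def decimate_by_two(x):
    """Keep every other sample; the preceding LPF chain acts as the anti-aliasing filter."""
    return x[::2]

def analyze_octaves(x, alphas, n_octaves):
    """Run one LPF chain per octave, tapping (input - output) channels, then halve the rate."""
    taps = []  # (octave index, channel samples)
    signal = list(x)
    for octave in range(n_octaves):
        stage = signal
        for alpha in alphas:  # same coefficients reused in every octave
            out = one_pole_lowpass(stage, alpha)
            taps.append((octave, [a - b for a, b in zip(stage, out)]))
            stage = out
        signal = decimate_by_two(stage)  # only the chain's final output is downsampled
    return taps, signal

x = [float(n % 7) for n in range(1024)]
taps, tail = analyze_octaves(x, alphas=[0.5, 0.3, 0.2], n_octaves=4)
print(len(taps), len(tail))
```

Each lower octave works on half as many samples as the one above it. This halving is where the computational savings come from.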
It should be noted that the benefit of chaining the low pass filters is realized in certain embodiments without implementing the subtractors to calculate the frequency bands. The output of each low pass filter may be used directly to represent the energy in each frequency channel. The output of the last filter in each chain is downsampled with the filter chain itself performing the function of an antialiasing filter.
The filter cascade may be implemented using either analog or digital filters. In one embodiment, the filters are implemented as digital filters with cutoff frequencies designed to produce the desired channel resolution. Each filter has a set of coefficients (a0, a1, a2, b1, b2) associated with it. The output of each filter is calculated according to the following function:
$$y_n = a_0 x_n + a_1 x_{n-1} + a_2 x_{n-2} - b_1 y_{n-1} - b_2 y_{n-2} \qquad \text{(Equation 1)}$$
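Equation 1 is a second order (biquad) recursion and can be evaluated directly. The following minimal sketch uses direct form I with zero initial state. The function name `biquad` is illustrative only.

```python
def biquad(x, a0, a1, a2, b1, b2):
    """Direct form I evaluation of Equation 1 with zero initial state:
    y[n] = a0*x[n] + a1*x[n-1] + a2*x[n-2] - b1*y[n-1] - b2*y[n-2]
    """
    x1 = x2 = y1 = y2 = 0.0  # previous inputs and outputs
    y = []
    for xn in x:
        yn = a0 * xn + a1 * x1 + a2 * x2 - b1 * y1 - b2 * y2
        y.append(yn)
        x2, x1 = x1, xn  # shift input history
        y2, y1 = y1, yn  # shift output history
    return y

# Impulse response of a single pole at 0.5 (b1 = -0.5 gives y[n] = x[n] + 0.5*y[n-1]).
print(biquad([1.0, 0.0, 0.0, 0.0], 1.0, 0.0, 0.0, -0.5, 0.0))  # [1.0, 0.5, 0.25, 0.125]
```

Note the sign convention: the feedback coefficients enter with a minus sign, as in Equation 1.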
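The filter response H(z) is given by the following:

$$H(z) = \frac{a_0 + a_1 z^{-1} + a_2 z^{-2}}{1 + b_1 z^{-1} + b_2 z^{-2}} \qquad \text{(Equation 2)}$$

The frequency response is obtained by evaluating H(z) on the unit circle,

$$z = e^{j 2 \pi f / f_s},$$

where fs is the sampling frequency.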
Substitution of the above into the transfer function of Equation 2 produces a filter response H(f), which is a function of the filter coefficients a0, a1, a2, b1, b2 and the sampling rate fs.
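The substitution can be carried out numerically. The following sketch assumes the Equation 2 form H(z) = (a0 + a1 z^-1 + a2 z^-2) / (1 + b1 z^-1 + b2 z^-2), evaluated at z = exp(j2πf/fs). The helper name `freq_response` is hypothetical.

```python
import cmath
import math

def freq_response(f, fs, a0, a1, a2, b1, b2):
    """Complex response H(f) of the Equation 1 filter at frequency f for sampling rate fs."""
    zinv = cmath.exp(-2j * math.pi * f / fs)  # z^-1 on the unit circle
    num = a0 + a1 * zinv + a2 * zinv ** 2
    den = 1.0 + b1 * zinv + b2 * zinv ** 2
    return num / den

fs = 44100.0
# Single pole at 0.5: gain 2 at DC and 2/3 at the Nyquist frequency.
print(abs(freq_response(0.0, fs, 1.0, 0.0, 0.0, -0.5, 0.0)))
print(abs(freq_response(fs / 2, fs, 1.0, 0.0, 0.0, -0.5, 0.0)))
```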
As described in the 682 application, the filter coefficients may be reused between sets of filters with the response of the filters being altered as a result of downsampling between the sets of filters. In the embodiment shown, the filters are evenly distributed over the octaves, resulting in 60 filters per octave. 60 objects are created in a computer. Each object has a set of coefficients as described above, and additionally has ten sets of state variables, corresponding to ten filters running at frequencies that are whole octaves apart. The 60 objects using their first sets of state variables correspond to the first octave group of filters, while the 60 objects using their second sets of state variables (and sampling at a lower frequency) correspond to the second octave group of filters, and so on. In another embodiment, each object contains a set of coefficients, but only one set of state variables, and is run at a single frequency. In this case, 600 objects are required to represent 600 filters.
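The first arrangement (60 objects, each holding one coefficient set and ten sets of state variables) can be sketched as a small class. The class name and the state layout are hypothetical. The caller is assumed to step the octave-k copy once for every 2^k input samples, because that copy runs at a lower sampling rate.

```python
class SharedCoefficientFilter:
    """One of the 60 per-octave objects: one coefficient set, one state set per octave."""

    def __init__(self, coeffs, n_octaves=10):
        self.a0, self.a1, self.a2, self.b1, self.b2 = coeffs
        # Each state is [x[n-1], x[n-2], y[n-1], y[n-2]] for one octave's copy of the filter.
        self.states = [[0.0, 0.0, 0.0, 0.0] for _ in range(n_octaves)]

    def step(self, octave, xn):
        """Advance the copy of this filter used in `octave` by one sample (Equation 1)."""
        x1, x2, y1, y2 = self.states[octave]
        yn = (self.a0 * xn + self.a1 * x1 + self.a2 * x2
              - self.b1 * y1 - self.b2 * y2)
        self.states[octave] = [xn, x1, yn, y1]
        return yn

bank = [SharedCoefficientFilter((1.0, 0.0, 0.0, -0.5, 0.0)) for _ in range(60)]
print(bank[0].step(0, 1.0), bank[0].step(1, 1.0))  # octaves keep independent state
```

The second arrangement described above corresponds to building 600 such objects with `n_octaves=1`.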
The filters in the first octave are tuned to the frequencies in the highest octave, 20 kHz to 10 kHz, and are sampled at 44.1 kHz, which satisfies the Nyquist sampling criterion. The filters in the second octave are tuned to half of the frequencies of the corresponding filters in the first octave, and range from 10 kHz to 5 kHz. These filters in the second octave are sampled at 22.05 kHz, half of the first sampling frequency. Coefficients for each filter are stored in memory and applied in the computations for the filters. The cascade response is the product of the responses of the individual filters, that is, the sum of their responses expressed in decibels (each response is weak by itself, but the combined response is much stronger). The coefficients of the filters are determined by the desired response.
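The octave bands and sampling rates follow from repeated halving. The following sketch tabulates them for the ten-octave, 20 kHz to 20 Hz example and checks the Nyquist criterion in every octave. The helper name `octave_plan` is hypothetical.

```python
def octave_plan(top_hz=20000.0, fs_hz=44100.0, n_octaves=10):
    """(octave, upper edge, lower edge, sampling rate), each octave at half the previous rate."""
    plan = []
    for k in range(n_octaves):
        scale = 2 ** k
        plan.append((k + 1, top_hz / scale, top_hz / (2 * scale), fs_hz / scale))
    return plan

for octave, hi, lo, fs in octave_plan():
    assert fs > 2 * hi, "Nyquist criterion violated"
    print(f"octave {octave:2d}: {hi:8.2f}-{lo:8.2f} Hz sampled at {fs:8.2f} Hz")
```

Because the band edges and the sampling rate halve together, satisfying the criterion in the first octave (44.1 kHz > 2 × 20 kHz) satisfies it in every octave.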
As the audio signal is passed through each filter, the signal is sampled and filtered before being passed to the next filter.
Downsampling each successive octave significantly decreases the computational complexity of the system. In addition, the required precision for filter coefficients is lower, and thus, fewer bits are required to represent each coefficient. Digital low-pass filters have the property that the numerical precision required to represent the filter coefficients depends on the ratio between the cutoff frequency and the sampling frequency. For a given sampling frequency, a filter with a low cutoff frequency will require higher-precision coefficients than a filter with a higher cutoff frequency. Without the successive downsampling technique, very high-precision filter coefficients (on the order of 23 bits) are required to represent the lowest-cutoff-frequency filters (30 Hz) at the 44.1 kHz sampling rate. With the successive downsampling technique, lower-precision coefficients (on the order of 12 bits) can be used to represent the 30-Hz cutoff filters, since the sampling rate is much lower in the lowest octave after many downsampling steps. This reduced precision results in lower hardware complexity (less memory, smaller registers, lower-precision arithmetic operators) and thus lower overall cost in a custom hardware implementation.
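A rough experiment shows why precision depends on the ratio of cutoff to sampling frequency. This sketch makes several assumptions: a hypothetical pole placement (angle θ = 2π·f0/fs, radius exp(−θ/(2Q)), Q = 1.6), rounding of b1 and b2 to a fixed-point grid, and a 1% pole-angle accuracy target. It counts the fractional bits needed for a 30 Hz pole at 44.1 kHz and at 44.1 kHz / 2^9 (the tenth octave). It is not intended to reproduce the 23-bit and 12-bit figures, which depend on the actual filter structure and accuracy requirement.

```python
import cmath
import math

def resonator_coeffs(f0, fs, q):
    """(theta, b1, b2) for a pole pair at angle 2*pi*f0/fs and radius exp(-theta/(2q)) (illustrative)."""
    theta = 2.0 * math.pi * f0 / fs
    r = math.exp(-theta / (2.0 * q))
    return theta, -2.0 * r * math.cos(theta), r * r

def quantize(value, frac_bits):
    """Round to a fixed-point grid with `frac_bits` fractional bits."""
    scale = 2 ** frac_bits
    return round(value * scale) / scale

def pole_angle(b1, b2):
    """Angle of the upper root of z^2 + b1*z + b2; 0 or pi if quantization made the poles real."""
    root = (-b1 + cmath.sqrt(b1 * b1 - 4.0 * b2)) / 2.0
    return abs(cmath.phase(root))

def bits_needed(f0, fs, q, tol=0.01):
    """Smallest fractional word length keeping the pole angle within `tol` (relative) of the design."""
    theta, b1, b2 = resonator_coeffs(f0, fs, q)
    for bits in range(1, 49):
        angle = pole_angle(quantize(b1, bits), quantize(b2, bits))
        if abs(angle - theta) / theta < tol:
            return bits
    return None

full_rate = bits_needed(30.0, 44100.0, q=1.6)
decimated = bits_needed(30.0, 44100.0 / 2 ** 9, q=1.6)  # tenth octave after nine halvings
print(full_rate, decimated)
```

At the full rate the poles sit extremely close to z = 1, so the small differences that encode the 30 Hz resonance need many more bits to resolve.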
In the embodiment described in the 682 application, each filter shares filter parameters with filters that are one, two, or more octaves higher or lower, resulting in reduced storage requirements. For example, the highest frequency filter 40 in the first octave shares filter coefficients with the highest frequency filter 50 in the second octave, the highest frequency filter 60 in the third octave, and so on. The second-highest frequency filter 42 in the first octave shares filter coefficients with the second-highest frequency filters 52 and 62 in the second and third octaves, and with all other corresponding filters (tuned to frequencies that are one, two, or more octaves lower).
Alternatively, it has been determined that the delay at low frequencies can be improved by changing the filter parameters within each octave as described below. For many systems, this is preferable to sharing filter parameters between corresponding filters in different octaves because the benefit of improved delay at low frequencies outweighs the increased memory storage requirements.
In one embodiment, filter coefficients are tuned to produce a desired Q (quality factor, or degree of sharpness or frequency selectivity) depending on the frequency band (determined by the frequency cutoff) being processed by the filter. Reusing filter coefficients in the cascade results in a cascade with constant Q, and all the filter responses will have the same shape (Q). This “constant-Q” configuration has the advantages of conceptual simplicity and shared filter coefficients, but has significant delays at low frequencies. For example, for a constant-Q design with a phase accumulation of four cycles at all frequencies, the delay at the 20 kHz tap will be 200 μs, while the delay at the 20 Hz tap will be 200 ms. Faster performance at low frequencies is desirable to improve the response time of the cascade, which may be accomplished by changing the filter coefficients of the filters in lower octaves.
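The delay figures follow from treating the accumulated phase as a pure delay: N cycles at frequency f take N / f seconds. The following minimal sketch uses a hypothetical helper name.

```python
def constant_q_delay(cycles, freq_hz):
    """Delay in seconds when a tap accumulates `cycles` of phase at `freq_hz`."""
    return cycles / freq_hz

for f in (20000.0, 2000.0, 200.0, 20.0):
    print(f"{f:8.0f} Hz tap: {constant_q_delay(4, f) * 1e3:7.2f} ms")
```

With a fixed four-cycle accumulation, the delay grows by a factor of 1000 from the 20 kHz tap (0.2 ms) to the 20 Hz tap (200 ms).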
The filters may be designed to have zeros as well as poles, and the equation for such a system is given by
If 600 filters are used and implemented as a cascade of 600 pole-only sections, each section contributes a quarter-cycle of phase accumulation at its best frequency, resulting in a large amount of delay. In one embodiment, the filter cascade is configured so that the center frequencies decrease exponentially through the cascade. The Qs decrease gradually through the cascade, giving sharp responses at high frequencies, where delay is not an issue, and fast responses at low frequencies, where some loss of sharpness is acceptable in return for faster response. This implementation of nonconstant-Q filters is particularly useful for signal processing systems used, for example, in submarine passive sonar, speech recognition, music transcription, audio stream separation and sound localization. It should be noted that this approach is not limited to downsampled filter cascades and may be used with filter cascades with no downsampling.
Design of a filter cascade with constant-Q involves choosing the range of cutoff frequencies and the number of taps per octave, such as a frequency range of 20 Hz to 20 kHz, 600 taps, 10 octaves (60 taps/octave). This determines fp for each tap. Fixed values are chosen for Qp, Qz, and fratio=fz/fp, based on the sharpness and delay desired through the cascade. In one embodiment, values used for a constant-Q design may be Qp=7.0, Qz=7.5, and fratio=1.03. In another embodiment, the values may be Qp=23, Qz=26, and fratio=1.01.
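The design parameters can be turned into Equation 1 coefficients. The section form used here is an assumption, not taken from the text. Each tap is modeled as an analog two-pole, two-zero section, with poles at (fp, Qp) and zeros at (fz = fratio·fp, Qz). The section is mapped to the digital domain with the matched-z transform and normalized to unity gain at DC. The fp values are spaced exponentially at 60 taps per octave from 20 kHz. All function names are hypothetical.

```python
import cmath
import math

def tap_frequencies(f_top=20000.0, n_taps=600, taps_per_octave=60):
    """Pole frequency fp for each tap, decreasing exponentially (60 taps per octave)."""
    return [f_top * 2.0 ** (-k / taps_per_octave) for k in range(n_taps)]

def pair_coeffs(f, q, fs):
    """(c1, c2) of 1 + c1 z^-1 + c2 z^-2 whose roots are the matched-z image of an analog pair (q > 0.5)."""
    w = 2.0 * math.pi * f
    sigma = w / (2.0 * q)                             # analog damping
    wd = w * math.sqrt(1.0 - 1.0 / (4.0 * q * q))     # analog damped frequency
    r = math.exp(-sigma / fs)                         # z = exp(s / fs)
    return -2.0 * r * math.cos(wd / fs), r * r

def pole_zero_section(fp, qp, qz, fratio, fs):
    """Equation 1 coefficients (a0, a1, a2, b1, b2) for one tap, normalized to unity DC gain."""
    b1, b2 = pair_coeffs(fp, qp, fs)
    c1, c2 = pair_coeffs(fratio * fp, qz, fs)
    g = (1.0 + b1 + b2) / (1.0 + c1 + c2)  # makes H(z = 1) equal to 1
    return g, g * c1, g * c2, b1, b2

def gain(coeffs, f, fs):
    """|H(f)| of an Equation 1 section at sampling rate fs."""
    a0, a1, a2, b1, b2 = coeffs
    zinv = cmath.exp(-2j * math.pi * f / fs)
    return abs((a0 + a1 * zinv + a2 * zinv ** 2) / (1.0 + b1 * zinv + b2 * zinv ** 2))

fs = 44100.0
fps = tap_frequencies()
section = pole_zero_section(1000.0, qp=7.0, qz=7.5, fratio=1.03, fs=fs)
print(gain(section, 0.0, fs), gain(section, 970.0, fs))
```

Each section is only mildly peaked on its own, with a gain slightly above 1 just below fp. The sharp overall response comes from the cascade of many such sections.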
For a variable-Q filter cascade using 600 taps in 10 octaves, one embodiment may employ the following values: Qp=7.0, Qz=7.0, and fratio=1.03, with a sampling rate of 44.1 kHz and 2× oversampling in the highest octave. These values are used for the first 360 taps, and then varied linearly over the next 240 taps to Qp=1.6, Qz=1.6, and fratio=1.1 at tap 600 (the lowest frequency tap). This results in a design with broader filter responses at low frequencies, but much faster time response.
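The parameter schedule for this variable-Q embodiment can be written as a small function. This is a sketch that assumes 1-based tap numbering: taps 1 to 360 use the starting values, and taps 361 to 600 interpolate linearly so that tap 600 reaches the final values. The function name is hypothetical.

```python
def variable_q_params(tap, n_taps=600, flat_taps=360,
                      start=(7.0, 7.0, 1.03), end=(1.6, 1.6, 1.1)):
    """(Qp, Qz, fratio) for a 1-based tap: constant for `flat_taps`, then linear to `end` at the last tap."""
    if tap <= flat_taps:
        return start
    frac = (tap - flat_taps) / (n_taps - flat_taps)
    return tuple(s + frac * (e - s) for s, e in zip(start, end))

for tap in (1, 360, 480, 600):
    qp, qz, fratio = variable_q_params(tap)
    print(f"tap {tap:3d}: Qp={qp:.2f} Qz={qz:.2f} fratio={fratio:.3f}")
```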
In another embodiment, the Qp, Qz, and fratio parameters are selected to match the filter responses to appropriate psychophysical critical bandwidth and loudness perception curves. Critical bandwidth is the tuning width of the filter response curves, within which signal components can interact with each other. Critical bandwidth curves are given in Rossing, 1982, “The Science of Sound” (Addison-Wesley, Reading, Mass.), the disclosure of which is hereby incorporated by reference. The critical bandwidth varies from a little less than 100 Hz at low frequencies to between two and three musical semitones (12% to 19%) at high frequencies. Loudness perception describes how sensitive hearing is to different frequencies; matching it determines how sensitive the filters are to different frequencies. For example, the threshold of audibility at 20 Hz is about 65 dB higher than at 1 kHz.
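The 12% to 19% figures are the equal-tempered ratios 2^(2/12) − 1 and 2^(3/12) − 1. Using the usual definition Q = f / Δf, each fractional bandwidth can also be expressed as an equivalent Q. The following sketch shows this arithmetic; the helper name is hypothetical.

```python
def semitone_ratio(n_semitones):
    """Fractional bandwidth spanned by n equal-tempered semitones: 2**(n/12) - 1."""
    return 2.0 ** (n_semitones / 12.0) - 1.0

for n in (2, 3):
    frac = semitone_ratio(n)
    print(f"{n} semitones: {100 * frac:.1f}% bandwidth, equivalent Q = {1 / frac:.1f}")
```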
One embodiment of a variable-Q filter cascade uses the following parameters:
A filter cascade for analyzing a signal into frequency components has been described. In various embodiments, the filter cascade utilizes different techniques to improve temporal resolution at high frequencies and spectral resolution at low frequencies. As a result, each of the disclosed filter cascade embodiments is particularly useful as a component of a voice recognition system. In addition, the filter cascade is useful for audio stream separation and sound localization.
Although the foregoing invention has been described in some detail for purposes of clarity of understanding, it will be apparent that certain changes and modifications may be practiced within the scope of the appended claims. It should be noted that there are many alternative ways of implementing both the process and apparatus of the present invention. Accordingly, the present embodiments are to be considered as illustrative and not restrictive, and the invention is not to be limited to the details given herein, but may be modified within the scope and equivalents of the appended claims.
This application claims priority to co-pending U.S. patent application Ser. No. 09/534,682 (Attorney Docket No. ANSCP001) entitled EFFICIENT COMPUTATION OF LOG-FREQUENCY-SCALE DIGITAL FILTER CASCADE filed Mar. 24, 2000, which is incorporated herein by reference for all purposes.
Relation | Number | Date | Country
---|---|---|---
Parent | 10074991 | Feb 2002 | US
Child | 10613224 | Jul 2003 | US