Disclosed herein is a feedback suppression system using phase enhanced frequency estimation.
A microphone may receive an audio signal and transmit the same to an amplifier to amplify the received audio signals. Any number of loudspeakers may be used to playback the amplified audio signal. The amplified audio signal may often be subject to acoustic feedback due to a loop gain created from a closed loop established by the loudspeaker, the microphone and the amplifier.
Feedback suppression systems are often placed between the microphone and the amplifier to help mitigate the effects of feedback. These suppression systems may analyze an audio signal to detect feedback peaks.
A feedback suppression system for reducing acoustic feedback may include a controller configured to buffer a series of incoming digital sample signals to provide a plurality of buffered signals, the incoming digital sample signals being indicative of an audio input signal that includes audio data and may include acoustic feedback; determine a complex spectrum of the plurality of buffered signals; determine a magnitude squared spectrum from the complex spectrum; identify at least one peak in the magnitude squared spectrum; identify a frequency of the at least one identified peak using a phase enhanced frequency estimate; and set a notch filter at the identified frequency to eliminate the acoustic feedback of the audio input signal.
The embodiments of the present disclosure are pointed out with particularity in the appended claims. However, other features of the various embodiments will become more apparent and will be best understood by referring to the following detailed description in conjunction with the accompanying drawings in which:
As required, detailed embodiments of the present invention are disclosed herein; however, it is to be understood that the disclosed embodiments are merely exemplary of the invention that may be embodied in various and alternative forms. The figures are not necessarily to scale; some features may be exaggerated or minimized to show details of particular components. Therefore, specific structural and functional details disclosed herein are not to be interpreted as limiting, but merely as a representative basis for teaching one skilled in the art to variously employ the present invention.
Disclosed herein is a frequency estimation system to be used with a feedback suppression system. The frequency estimation system estimates the frequencies at which feedback peaks occur. A notch filter is then placed at these frequencies to reduce the gain and thus reduce feedback. The estimated frequency may be determined using a phase spectrum of a Fast Fourier Transform (FFT) analysis of the audio signal in conjunction with the magnitude spectrum. The disclosed system provides for an improved frequency estimation system.
The processor 106 may be a hardware based computing device or may be within a computing device. The processor 106 may include a controller including computer-executable instructions, where the instructions may be executable by one or more computing devices.
The RAMs 206, 214, 216 may be memory devices capable of storing data items and enabling such data items to be read therefrom. The RAMs 206, 214, 216 may include circular buffers. The non-volatile memories 204, 218 may store program instructions and may be in the form of flash memory or read only memory (ROM). The program instructions may be loaded into the appropriate RAM 206, 214, 216 during a start-up process.
At block 304, the processor 106 stores the digital samples in a buffer in RAM 206.
At block 306, the processor 106 may analyze the digital samples and determine notch filter parameters such as frequency, gain, and bandwidth or, alternatively, quality factor (i.e., Q-value), which is inversely related to the bandwidth. This process may be performed at intervals, such as every 85 milliseconds.
At block 308, the processor 106 may apply at least one notch filter to the samples using the determined notch filter parameters. The samples are processed in the time domain using the filter parameters determined in block 306. While the samples may be processed at one processing rate, the notch filter parameters may be defined at a different processing rate (typically a much slower rate) at block 306, as indicated by the line 312. Advantages exist in running block 306 at a slower rate than the filtering block 308 since block 306 is computationally complex. When the notch filter parameters are changed in block 306, the filter parameters used in block 308 are slowly changed (i.e., interpolated) from their current values to the new target values defined by block 306 over a time of approximately 50-200 ms to avoid introducing clicks in the audio. The interpolation can be done on the filter parameters, on the actual computed filter coefficients, or on a combination of both.
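The gradual parameter change described above can be illustrated with a minimal sketch. It assumes linear interpolation of the filter parameters over a fixed number of processing blocks; the function name `ramp_parameters`, the parameter names, and the 10 ms block length are hypothetical and are not taken from the disclosure.

```python
def ramp_parameters(current, target, ramp_ms, block_ms):
    """Yield intermediate parameter sets moving linearly from current to target.

    Assumption: parameters are updated once per processing block of block_ms
    milliseconds, and the ramp lasts ramp_ms milliseconds in total.
    """
    steps = max(1, round(ramp_ms / block_ms))
    for i in range(1, steps + 1):
        t = i / steps  # fraction of the ramp completed, ending exactly at 1.0
        yield {name: current[name] + t * (target[name] - current[name])
               for name in current}


# Hypothetical example: deepen a notch from 0 dB to -6 dB over 100 ms.
current = {"freq_hz": 1000.0, "gain_db": 0.0, "q": 30.0}
target = {"freq_hz": 1000.0, "gain_db": -6.0, "q": 30.0}
trajectory = list(ramp_parameters(current, target, ramp_ms=100.0, block_ms=10.0))
```

Each intermediate parameter set would be converted to filter coefficients before use; interpolating the coefficients directly, as the text notes, is an alternative.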
At block 310, once the notch filter has been applied, the processed samples are sent to the DAC 108. The process 300 then ends.
At block 410, the processor 106 may receive the digital samples from the ADC.
At block 412, along path 403, the processor 106 may transmit the stored copies of the digital signals to a buffer (in RAM 206). The notch filter parameters determined along path 403 may be determined at one rate while the digital samples may be processed along path 402 at a different rate. That is, the stored copies of the digital signals may be used to generate the notch filter parameters at a different rate than the rate at which the digital signals are processed. In one example, the notch filter parameters may be determined at a rate of once every 85 milliseconds while the digital signals may be processed at a rate of once every 21 microseconds.
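As a rough illustration of how these two rates could relate, assume a sample rate of 48 kHz and an analysis block of 4096 samples. Both values are hypothetical and are not stated in the disclosure; they are chosen only because they reproduce the quoted figures approximately.

```latex
T_s = \frac{1}{f_s} = \frac{1}{48\,000\ \text{Hz}} \approx 20.8\ \mu\text{s},
\qquad
T_{\text{analysis}} = \frac{4096}{48\,000\ \text{Hz}} \approx 85.3\ \text{ms}
```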
At block 414, the processor 106 may perform a spectral analysis of the buffered signals to isolate peaks in the magnitude spectrum. During this process, frequency estimates as well as other spectral features such as the average spectral level may be used to isolate the peaks. This process is described in more detail with respect to
At block 416, the processor 106 may perform a spectral peak analysis to identify a peak trajectory based on a tracking of the peaks over a time period. Several peak features may be extracted from the peak trajectory, such as the rate of growth of the peak magnitudes, the standard deviations of the peak magnitudes, the rate of change of the peak frequencies, and the standard deviations of the peak frequencies. Other measures of deviation could also be used here such as maximum absolute deviation.
At block 418, the processor 106 may use the extracted features for each peak trajectory to classify each peak as either a feedback peak or a program material peak. The classifier can be based on simple thresholds for each of the extracted features or it can use more advanced techniques such as a Bayesian classifier or a neural network. The parameters of the classifier may either be tuned by hand or they may be estimated by using a training set of peaks that are pre-classified as feedback peaks or program material peaks. The deviation in frequency of the classified peak is a useful feature when the frequency is estimated using fast frequency reassignment. This may be due, at least in part, to the very small measurement error associated with fast frequency reassignment (see, e.g., equation 14 below) that allows the natural deviation of the peaks to be accurately estimated. Feedback peaks tend to have very small deviation, whereas most program material peaks from voices or instruments tend to have significantly larger deviation. Thus, the deviation in the reassigned frequency of the peak trajectories is a powerful discriminant for classifying peaks as either program material or feedback. For example, the deviation in frequency can be computed as:
where kpeak is the index of the kth peak, F(kpeak, k) is the reassigned frequency of the kth peak at a delay of k measurement intervals, and F̄(kpeak) is the mean value of the reassigned frequency over the past N measurement intervals. The absolute value is taken of the difference F(kpeak, k)−F̄(kpeak). A measurement interval k may refer to each time the reassigned frequency is computed for a peak (typically every 85 ms). Most feedback peaks will have a dF(kpeak) of less than 1 cent, whereas peaks from real program material will have a dF(kpeak) of 5 cents or more (1 cent is 1/100 of a semitone), which is why dF(kpeak) is an excellent feature for classifying peaks into feedback or program material groups.
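A minimal sketch of such a deviation feature is given below. It assumes the deviation is the mean absolute difference between each reassigned frequency and the trajectory mean, expressed in cents; the exact form of the deviation equation is not reproduced here, so this is one plausible reading rather than the definitive formula. The function names, the simple threshold classifier, and the example trajectories are hypothetical.

```python
import math


def cents(freq_hz, ref_hz):
    """Interval between two frequencies in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(freq_hz / ref_hz)


def frequency_deviation_cents(track_hz):
    """Mean absolute deviation of a peak's frequency trajectory, in cents.

    Assumption: deviation is measured against the mean frequency of the
    last N measurement intervals, where N = len(track_hz).
    """
    mean_hz = sum(track_hz) / len(track_hz)
    return sum(abs(cents(f, mean_hz)) for f in track_hz) / len(track_hz)


def looks_like_feedback(track_hz, threshold_cents=1.0):
    """Threshold classifier using the 1 cent figure quoted in the text."""
    return frequency_deviation_cents(track_hz) < threshold_cents


# Hypothetical trajectories, one frequency per measurement interval.
feedback_track = [1000.0, 1000.1, 999.9, 1000.05, 999.95]
vibrato_track = [1000.0, 1006.0, 994.0, 1005.0, 995.0]
```

In practice this feature would be combined with the other trajectory features (growth rate, magnitude deviation) before a peak is classified.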
If the peak trajectory is determined to be a feedback peak, then the frequency of the respective peak may be determined to be a candidate frequency and may be transmitted to block 420, as described below.
It should be noted that each of the processes in blocks 416 and 418 may include a series of routines or sub-processes. Further, the path 402 may be referred to as an implementation process. Once the notch filter parameters are determined (e.g., blocks 412-418 along path 403), the implementation process (e.g., blocks 420-424) may test candidate frequencies received from block 418 by applying a corresponding notch filter at the candidate frequency to the digital signal.
At block 420, the processor 106 may receive the candidate frequencies and assign a state machine subroutine to each candidate frequency. The processor 106 may assign the candidate frequencies based on a control scheme that runs the state machine subroutines in succession from zero to the last state machine routine. There may be N state machines for N notch filters, wherein one state machine may control one notch filter. For each candidate frequency, the assignment process 420 searches all state machine routines (blocks 422). If the candidate frequency is close to a frequency that has already been assigned to a state machine (e.g., is already in use), the candidate frequency is assigned to that same state machine routine. In this case, the notch filter frequency associated with the state machine may be adjusted to the average of its current frequency and the new frequency. In addition, the gain of its notch filter may be adjusted by a nominal amount (typically −3 to −6 dB) up to a maximum attenuation (typically −18 dB), and the bandwidth may be increased by an amount proportional to the difference between the state machine's current notch frequency and the new candidate frequency so the filter can more easily cover the two feedback peaks. If the candidate frequency is not close to any existing state machine notch frequencies, then the candidate frequency is assigned to the first free state machine routine with a nominal gain (typically −6 dB) and bandwidth (typical Q of 10-120). If there are no free state machines, then the oldest state machine (i.e., the state machine that was assigned a frequency earlier than any of the others) is used and the new candidate frequency is assigned to it with a nominal gain (typically −6 dB) and bandwidth (typical Q of 10-120).
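The assignment rules above can be sketched as follows. This is an illustrative sketch only: the `NotchState` class, the `assign_candidate` function, the 2% closeness criterion, and the widening factor are assumptions introduced for the example, while the nominal gain, gain step, and maximum attenuation are taken from the typical values quoted in the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NotchState:
    freq_hz: Optional[float] = None  # None marks a free state machine
    gain_db: float = 0.0
    q: float = 30.0
    assigned_at: int = 0             # measurement interval of first assignment


def assign_candidate(states: List[NotchState], candidate_hz: float, now: int,
                     close_ratio: float = 0.02, step_db: float = -3.0,
                     max_atten_db: float = -18.0, nominal_db: float = -6.0,
                     nominal_q: float = 30.0, widen: float = 1.0) -> NotchState:
    # Case 1: the candidate is close to a notch already in use.
    for s in states:
        if s.freq_hz is not None and abs(candidate_hz - s.freq_hz) <= close_ratio * s.freq_hz:
            spread_hz = abs(candidate_hz - s.freq_hz)
            # Widen the bandwidth in proportion to the frequency difference.
            bandwidth_hz = s.freq_hz / s.q + widen * spread_hz
            s.freq_hz = 0.5 * (s.freq_hz + candidate_hz)        # average the two
            s.gain_db = max(max_atten_db, s.gain_db + step_db)  # deepen, with a floor
            s.q = s.freq_hz / bandwidth_hz
            return s
    # Case 2: use the first free state machine; Case 3: otherwise reuse the oldest.
    target = next((s for s in states if s.freq_hz is None), None)
    if target is None:
        target = min(states, key=lambda s: s.assigned_at)
    target.freq_hz = candidate_hz
    target.gain_db = nominal_db
    target.q = nominal_q
    target.assigned_at = now
    return target
```

For example, with two state machines, candidates at 1000 Hz and 1010 Hz would share one notch near 1005 Hz at −9 dB, a candidate at 2000 Hz would take the second notch, and a later candidate at 3000 Hz would replace the older of the two.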
At block 424, the filter parameters frequency, gain and bandwidth (or Q-value) are converted into filter coefficients using a standard notch filter design, where each notch filter is implemented with a single biquadratic filter, or biquad.
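The disclosure does not specify which standard design is used. As one possibility, the sketch below uses the widely published peaking-equalizer biquad from the Audio EQ Cookbook, which yields a finite-depth notch when the gain is negative. The function names and the 48 kHz sample rate are assumptions made for the example.

```python
import cmath
import math


def notch_biquad(freq_hz, gain_db, q, sample_rate_hz):
    """Peaking-EQ biquad (Audio EQ Cookbook form); negative gain gives a notch.

    Returns (b, a) coefficient lists normalized so that a[0] == 1.
    """
    a_lin = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * freq_hz / sample_rate_hz
    alpha = math.sin(w0) / (2.0 * q)
    b = [1.0 + alpha * a_lin, -2.0 * math.cos(w0), 1.0 - alpha * a_lin]
    a = [1.0 + alpha / a_lin, -2.0 * math.cos(w0), 1.0 - alpha / a_lin]
    return [x / a[0] for x in b], [x / a[0] for x in a]


def gain_db_at(b, a, freq_hz, sample_rate_hz):
    """Magnitude response of the biquad at freq_hz, in dB."""
    z1 = cmath.exp(-1j * 2.0 * math.pi * freq_hz / sample_rate_hz)  # z^-1
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return 20.0 * math.log10(abs(num / den))


# Hypothetical notch: 1 kHz, -6 dB, Q of 30, at an assumed 48 kHz sample rate.
b, a = notch_biquad(1000.0, -6.0, 30.0, 48000.0)
```

With this design the response equals the requested gain at the notch frequency and returns to approximately 0 dB away from it, which can be checked with `gain_db_at`.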
At blocks 426, the processor 106 applies the notch filters using the generated filter coefficients from block 424. That is, the notch filter is applied at the estimated frequency from block 414, or in the case where one state machine shares multiple candidate frequencies, the notch filter is applied at a frequency derived from the individual candidate frequencies derived in block 414.
At block 428, the processor 106 transmits the filtered digital samples to the DAC 108 for conversion back to the analog domain (e.g., analog electrical signals). The process 400 may end. The resultant analog electrical signals may ultimately be passed to the amplifier 110 and the loudspeaker 112 for reproduction.
At block 503, the processor 106 may compute the discrete Fourier transform using a Fast Fourier Transform (FFT) to obtain a complex spectrum Sh(w) based on the window function generated in block 502. While FFTs are discussed herein, other methods for computing a Fourier transform such as a Discrete Fourier Transform (DFT), may also be used.
At block 504, the processor 106 may determine the squared magnitude spectrum. The squared magnitude spectrum may be represented by:
M2=Sh(w)·Sh*(w) Eq.2
At block 505, the phase may be determined by:
φ(w)=arctan(Im{Sh(w)}/Re{Sh(w)}) Eq.3
At block 506, the processor 106 may identify the peaks of the magnitude spectrum. Peaks are identified as bins kpeak such that M2(kpeak−1)≦M2(kpeak)>M2(kpeak+1), where typically the N largest peaks are kept, with N typically equal to 6-12, although other values of N may be used.
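The peak criterion above can be sketched directly. The function names are hypothetical, and the default of eight retained peaks is simply one value within the 6-12 range mentioned in the text.

```python
def magnitude_squared(spectrum):
    """M2 = Sh * conj(Sh) for each complex bin."""
    return [(s * s.conjugate()).real for s in spectrum]


def find_peaks(m2, max_peaks=8):
    """Return bins k with m2[k-1] <= m2[k] > m2[k+1], keeping the largest peaks.

    The retained bins are returned in ascending bin order.
    """
    candidates = [k for k in range(1, len(m2) - 1)
                  if m2[k - 1] <= m2[k] > m2[k + 1]]
    candidates.sort(key=lambda k: m2[k], reverse=True)
    return sorted(candidates[:max_peaks])


# Hypothetical magnitude squared spectrum with a plateau at bins 7 and 8.
example_m2 = [0.0, 1.0, 4.0, 2.0, 2.0, 3.0, 1.0, 9.0, 9.0, 0.0]
example_peaks = find_peaks(example_m2)
```

Note that, because the left comparison is non-strict and the right comparison is strict, a flat-topped peak is reported once, at its right-most bin.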
At block 507, the processor 106 may estimate the frequencies of each identified peak. This frequency estimation may be accomplished by using the computed phase to achieve a more accurate frequency estimation. In one example, the phase in a peak bin in the current frame may be compared to the phase in the same bin in a past frame. The rate of change of the phase may be used to determine the frequency estimate. However, this method may require a second FFT to be computed a short time before the current frame, since the past frame must be relatively close in time (i.e., much closer than the typical interval between peak analyses in the feedback suppression system). In another example, frequency reassignment may be used, which allows for a faster approximation and requires only one FFT to be calculated. This process is described in greater detail in
Once the peak frequencies are estimated, the processor 106 may then provide the estimated peak frequency to block 416 of
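The first approach mentioned above, comparing the phase of a peak bin across two closely spaced frames, can be sketched as follows. This is an illustrative sketch under stated assumptions: the frames are Hann windowed over n = 0 to N−1, the two frames are separated by a hop of four samples, and a direct single-bin DFT stands in for the FFT. All names and numeric values are hypothetical.

```python
import cmath
import math


def dft_bin(frame, k):
    """Single DFT bin of a frame indexed n = 0 .. N-1."""
    n_fft = len(frame)
    return sum(x * cmath.exp(-2j * math.pi * k * n / n_fft)
               for n, x in enumerate(frame))


def phase_difference_frequency(frame_past, frame_now, k, hop):
    """Estimate frequency (rad/sample) from the phase advance of bin k."""
    n_fft = len(frame_now)
    bin_omega = 2.0 * math.pi * k / n_fft
    dphi = cmath.phase(dft_bin(frame_now, k)) - cmath.phase(dft_bin(frame_past, k))
    # Deviation from the advance expected at the bin center, wrapped to [-pi, pi).
    dev = (dphi - bin_omega * hop + math.pi) % (2.0 * math.pi) - math.pi
    return bin_omega + dev / hop


def windowed_frame(start, n_fft, omega0):
    """Hann-windowed frame of a test cosine starting at sample index start."""
    return [(0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_fft))
            * math.cos(omega0 * (start + n) + 0.7) for n in range(n_fft)]


N, HOP, TRUE_BIN = 256, 4, 20.3
omega0 = 2.0 * math.pi * TRUE_BIN / N
past = windowed_frame(0, N, omega0)
now = windowed_frame(HOP, N, omega0)
estimate_bins = phase_difference_frequency(past, now, 20, HOP) * N / (2.0 * math.pi)
```

The need for two transforms per estimate is visible in `phase_difference_frequency`, which is the cost that the frequency reassignment approach avoids.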
At block 602, the processor 106 may calculate an FFT resulting in the complex spectrum S(w), similar to block 502 above. Unlike process 500, process 600 does not window the digital signals first. That is, the FFT is performed on the unwindowed digital samples.
At block 603, the processor 106 convolves the complex spectrum S(w) with the Fourier transform H of the Hann window to obtain the Hann-windowed complex spectrum Sh(w)=S*H(w), where * denotes convolution.
At block 604, the processor 106 may determine the squared magnitude spectrum. The squared magnitude spectrum may be represented by:
M2=Sh(w)·Sh*(w) Eq.4
At block 605, the phase may be determined by:
φ(w)=arctan(Im{Sh(w)}/Re{Sh(w)}) Eq.5
At block 605, the processor 106 may identify the peaks of the magnitude spectrum, where, as before, peaks are identified as bins kpeak such that M2(kpeak−1)≦M2(kpeak)>M2(kpeak+1), where typically the N largest peaks are kept, with N typically equal to 6-12, although other values of N may be used.
At block 606, the processor 106 may estimate the frequencies of each identified peak. This frequency estimation may be accomplished by using the reassigned frequency of each peak bin to achieve a more accurate frequency estimation. Because process 600 implements fast frequency reassignment, only one FFT is calculated and there is no need to compute the phase directly, thus avoiding the computation of the inverse tan function.
Other conventional methods of estimating the frequency of a peak may involve simply using the frequency of the spectral bin where the peak is located. In some cases, the two adjacent bins are also used and parabolic interpolation is used to obtain a more accurate peak frequency (i.e., an inverted parabola is fit through the 3 points and the peak of the inverted parabola is used as the center). The accuracy of these methods is limited by the resolution of the FFT bins. During spectral analysis, the higher the resolution in the frequency domain, the lower the resolution in the time domain. Because of this, in short-time Fourier transform (STFT) spectral analysis, if better frequency resolution is desired, the analysis window must be very wide in the time domain, losing any information on the location of events in the window and smearing any time varying events. On the other hand, making a window narrower in the time domain causes the window to be wider in the frequency domain, creating poor frequency resolution. Accordingly, in a standard spectrogram image, which plots magnitudes of the FFT spectrum as columns for each time instance that the FFT is computed, a tradeoff can be seen as time events become smeared when the window size is wide and frequency events become smeared when the window size is narrow.
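For comparison, the conventional parabolic interpolation mentioned above can be sketched as follows. The function name is hypothetical; the inputs are the magnitudes (often taken in dB) at the peak bin and its two neighbors.

```python
def parabolic_peak_offset(m_left, m_center, m_right):
    """Offset, in bins, of the vertex of the parabola through three points.

    The points are taken at bins k-1, k and k+1; the refined peak location
    is k plus the returned offset.
    """
    denom = m_left - 2.0 * m_center + m_right
    if denom == 0.0:
        return 0.0  # the three points are collinear; no refinement is possible
    return 0.5 * (m_left - m_right) / denom


# Samples of y = 10 - (x - 0.25)**2 at x = -1, 0, 1; the vertex is at x = 0.25.
offset = parabolic_peak_offset(8.4375, 9.9375, 9.4375)
```

The refined frequency would then be (k + offset) multiplied by the bin spacing. For real windowed spectra the main lobe is only approximately parabolic, which is one source of the residual error discussed above.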
Alternatively, a more accurate estimate of a standard short-time Fourier transform (STFT) spectrogram analysis may be accomplished by assigning the energy of an FFT bin to a center of gravity of the energy contributions rather than the center of the window. Reassignment of frequency may be computed using the following equation:
ω̂(t, ω)=ω−Im{Sdh(t, ω)·Sh*(t, ω)/|Sh(t, ω)|²} Eq.6
where ω is the bin frequency, ω̂(t, ω) is the reassigned frequency, Sh(t, ω) is the complex spectrum of a signal s(t) windowed by a window function h(t), and Sdh(t, ω) is the complex spectrum of a signal s(t) windowed by a window function dh(t), where dh(t) is the time derivative of h(t).
A Hann window may be defined as:
h(n)=0.5+0.5 cos(2πn/N) for −N/2<n≦N/2, 0 otherwise Eq.7
Using Euler's equation:
h(n)=0.5+0.25e^(j2πn/N)+0.25e^(−j2πn/N) Eq.8
H(k) may represent the FFT coefficients of h(n) for frequency bin k. Because the Fourier basis functions are orthogonal, it may be understood that h(n) has three non-zero Fourier transform coefficients H(−1)=0.25, H(0)=0.5 and H(1)=0.25.
Based on the Fourier Convolution Theorem, multiplication in the time domain corresponds to convolution in the frequency domain. Thus:
Sh=S*H Eq.9
where S is the unwindowed Fourier transform of s(t). For a single bin k in the FFT, we have:
Sh(k)=0.25S(k−1)+0.5S(k)+0.25S(k+1) Eq.10
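The three-term relationship of equation 10 can be checked numerically with a short sketch. It assumes the samples are indexed over −N/2<n≦N/2 to match equation 7, and it uses a direct single-bin DFT in place of an FFT; the helper names, the FFT length of 64, and the test tone are hypothetical.

```python
import cmath
import math

N = 64  # hypothetical, even FFT length


def index_range(n_fft):
    """Sample indices -N/2 < n <= N/2, matching the Hann definition of Eq. 7."""
    return range(-(n_fft // 2) + 1, n_fft // 2 + 1)


def hann(n, n_fft):
    return 0.5 + 0.5 * math.cos(2.0 * math.pi * n / n_fft)


def dft_bin(values, k, n_fft):
    """Single DFT bin of samples laid out over index_range(n_fft)."""
    return sum(v * cmath.exp(-2j * math.pi * k * n / n_fft)
               for n, v in zip(index_range(n_fft), values))


signal = [math.cos(2.0 * math.pi * 5.3 * n / N + 0.4) for n in index_range(N)]
windowed = [s * hann(n, N) for n, s in zip(index_range(N), signal)]

k = 5
direct = dft_bin(windowed, k, N)                       # window, then transform
via_eq10 = (0.25 * dft_bin(signal, k - 1, N)           # transform, then combine
            + 0.5 * dft_bin(signal, k, N)
            + 0.25 * dft_bin(signal, k + 1, N))
```

Under these assumptions `direct` and `via_eq10` agree to within floating-point rounding, which is what allows the windowing step to be moved into the frequency domain.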
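Differentiating the Hann window of equation 7 with respect to time results in dh(n):
dh(n)=−(π/N) sin(2πn/N) for −N/2<n≦N/2, 0 otherwise Eq.11
Using Euler's equation:
dh(n)=−(π/(2Nj))e^(j2πn/N)+(π/(2Nj))e^(−j2πn/N) Eq.12
As shown in the above equation, two Fourier coefficients exist: DH(−1)=−π/(2Nj) and DH(1)=π/(2Nj). Thus, for a single bin k of an FFT:
Sdh(k)=−(π/(2Nj))S(k−1)+(π/(2Nj))S(k+1) Eq.13
The fast frequency reassignment may be determined by substituting equation 13 into equation 6, where the reassigned frequency for a bin k of an FFT is:
ω̂(k)=ω(k)−Im{(π/(2Nj))(S(k+1)−S(k−1))·Sh*(k)/|Sh(k)|²} Eq.14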
where ω(k)=2π*k/N. Since Sh(k) can be computed using equation 10, it is shown that the fast frequency reassignment shown in equation 14 may be computed from a single unwindowed FFT spectrum S. Equation 14 may be computed from a single FFT and simple convolutions. This, unlike a typical frequency reassignment that requires at least two FFT computations, is a simpler and faster method. Although both methods may have near equivalent accuracy, fast frequency reassignment is significantly faster computationally due at least in part to the absence of the additional FFT computation. Once the peak frequency is estimated, the processor 106 may then provide the estimated peak frequency to block 416 of
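A minimal sketch of the fast frequency reassignment described above is given below. It assumes the samples are indexed over −N/2<n≦N/2 to match equation 7, takes the derivative-window spectrum as (π/(2Nj))(S(k+1)−S(k−1)), and subtracts the imaginary part of Sdh(k)·Sh*(k)/|Sh(k)|² from the bin frequency. A direct single-bin DFT stands in for the FFT, and the function names, FFT length, and test tone are hypothetical.

```python
import cmath
import math


def index_range(n_fft):
    """Sample indices -N/2 < n <= N/2, matching the Hann definition of Eq. 7."""
    return range(-(n_fft // 2) + 1, n_fft // 2 + 1)


def dft_bin(values, k, n_fft):
    """Single unwindowed DFT bin S(k) of samples laid out over index_range."""
    return sum(v * cmath.exp(-2j * math.pi * k * n / n_fft)
               for n, v in zip(index_range(n_fft), values))


def fast_reassigned_bin(values, k, n_fft):
    """Reassigned frequency of bin k, returned in fractional bins."""
    s_prev = dft_bin(values, k - 1, n_fft)
    s_mid = dft_bin(values, k, n_fft)
    s_next = dft_bin(values, k + 1, n_fft)
    sh = 0.25 * s_prev + 0.5 * s_mid + 0.25 * s_next           # Eq. 10
    sdh = (math.pi / (2.0 * n_fft * 1j)) * (s_next - s_prev)   # derivative window
    correction = (sdh * sh.conjugate()).imag / abs(sh) ** 2
    omega_hat = 2.0 * math.pi * k / n_fft - correction         # rad/sample
    return omega_hat * n_fft / (2.0 * math.pi)


N, TRUE_BIN = 256, 20.3  # hypothetical test tone between bins 20 and 21
tone = [math.cos(2.0 * math.pi * TRUE_BIN * n / N + 0.7) for n in index_range(N)]
estimate = fast_reassigned_bin(tone, 20, N)
```

In this example the peak bin is 20, so a bin-center estimate would be off by 0.3 bins, whereas the reassigned estimate lands very close to 20.3 using only the three unwindowed bins around the peak.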
As explained, the processor 106 may be a computing device or within a computing device. The processor 106 may include a controller including computer-executable instructions, where the instructions may be executable by one or more computing devices. Computer-executable instructions may be compiled or interpreted from computer programs created using a variety of programming languages and/or technologies, including, without limitation, and either alone or in combination, Java™, C, C++, Visual Basic, JavaScript, Perl, Matlab Simulink, TargetLink, etc. In general, a processor 106 (or a microprocessor) receives instructions, e.g., from a memory, a computer-readable medium, etc., and executes these instructions, thereby performing one or more processes, including one or more of the processes described herein. Such instructions and other data may be stored and transmitted using a variety of computer-readable media.
A computer-readable medium (also referred to as a processor-readable medium) includes any non-transitory (e.g., tangible) medium that participates in providing data (e.g., instructions) that may be read by a computer (e.g., by a processor of a computer). Such a medium may take many forms, including, but not limited to, non-volatile media and volatile media. Non-volatile media may include, for example, electrically erasable programmable read-only memory (EEPROM), a type of non-volatile memory used in computers and other electronic devices to store small amounts of data that must be saved when power is removed (e.g., calibration tables or device configuration), optical or magnetic disks, and other persistent memory. Volatile media may include, for example, dynamic random access memory (DRAM), which typically constitutes a main memory. Such instructions may be transmitted by one or more transmission media, including coaxial cables, copper wire and fiber optics, including the wires that comprise a system bus coupled to a processor of a computer. Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EEPROM, any other memory chip or cartridge, or any other medium from which a computer can read.
Databases, data repositories or other data stores described herein may include various kinds of mechanisms for storing, accessing, and retrieving various kinds of data, including a hierarchical database, a set of files in a file system, an application database in a proprietary format, a relational database management system (RDBMS), etc. Each such data store is generally included within a computing device employing a computer operating system, and is accessed via a network in any one or more of a variety of manners. A file system may be accessible from a computer operating system, and may include files stored in various formats. An RDBMS generally employs the Structured Query Language (SQL) in addition to a language for creating, storing, editing, and executing stored procedures, such as the PL/SQL language.
In some examples, system elements may be implemented as computer-readable instructions (e.g., software) on one or more computing devices (e.g., servers, personal computers, etc.), stored on computer readable media associated therewith (e.g., disks, memories, etc.). A computer program product may comprise such instructions stored on computer readable media for carrying out the functions described herein.
While embodiments are described above, it is not intended that these embodiments describe all possible forms of the invention. Rather, the words used in the specification are words of description rather than limitation, and it is understood that various changes may be made without departing from the spirit and scope of the invention. Additionally, the features of various implementing embodiments may be combined to form further embodiments of the invention.
Number | Date | Country | |
---|---|---|---|
20160029123 A1 | Jan 2016 | US |