1. Field of the Invention
The present invention relates to the field of acoustics, and in particular to a method and apparatus for suppressing wind noise.
2. Description of Related Art
When a microphone is used in the presence of wind or strong airflow, or when the breath of the speaker hits the microphone directly, wind pressure fluctuations at the microphone can induce a distinct, impulsive, low-frequency puffing sound. This puffing sound can severely degrade the quality of an acoustic signal. Most solutions to this problem involve a physical barrier to the wind, such as a fairing, open-cell foam, or a shell around the microphone. Such a physical barrier is not always practical or feasible, and physical barriers also fail at high wind speeds. For this reason, the prior art includes methods for suppressing wind noise electronically.
For example, Shust and Rogers, in "Electronic Removal of Outdoor Microphone Wind Noise" (Acoustical Society of America, 136th meeting, Oct. 13, 1998, Norfolk, Va., Paper 2pSPb3), presented a method that uses a hot-wire anemometer to measure the local wind velocity and thereby predict the wind noise level at a nearby microphone. The need for a hot-wire anemometer limits the applicability of that method. Two patents, U.S. Pat. No. 5,568,559 issued Oct. 22, 1996, and U.S. Pat. No. 5,146,539 issued Dec. 23, 1997, both require that two microphones be used to make the recordings, and therefore cannot be used in the common case of a single microphone.
These prior art inventions require special hardware, which severely limits their applicability and increases their cost. Thus, it would be advantageous to analyze acoustic data and selectively suppress wind noise when it is present, while preserving the signal, without the need for special hardware.
The invention includes a method, apparatus, and computer program to suppress wind noise in acoustic data by analysis-synthesis. The input signal may represent human speech, but it should be recognized that the invention could be used to enhance any type of narrow band acoustic data, such as music or machinery. The data may come from a single microphone, but it could as well be the output of combining several microphones into a single processed channel, a process known as “beamforming”. The invention also provides a method to take advantage of the additional information available when several microphones are employed.
The preferred embodiment of the invention attenuates wind noise in acoustic data as follows. Sound input from a microphone is digitized into binary data. Then, a time-frequency transform (such as the short-time Fourier transform) is applied to the data to produce a series of frequency spectra. The frequency spectra are then analyzed to detect the presence of wind noise and of narrow-band signals such as voice, music, or machinery. When wind noise is detected, it is selectively suppressed. Then, where the signal is masked by the wind noise, the signal is reconstructed by extrapolating it into the affected times and frequencies. Finally, a time series that can be listened to is synthesized. In another embodiment of the invention, the system suppresses all low-frequency wide-band noise after performing the time-frequency transform, and then synthesizes the signal.
The invention has the following advantages. No special hardware is required apart from the computer that performs the analysis. Data from a single microphone is sufficient, but the invention can also be applied when several microphones are available. The resulting time series is pleasant to listen to because the loud wind puffing noise has been replaced by near-constant low-level noise and signal.
The details of one or more embodiments of the invention are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the invention will be apparent from the description and drawings, and from the claims.
For a more complete description of the present invention and further aspects and advantages thereof, reference is now made to the following drawings in which:
A method, apparatus and computer program for suppressing wind noise is described. In the following description, numerous specific details are set forth in order to provide a more detailed description of the invention. It will be apparent, however, to one skilled in the art, that the present invention may be practiced without these specific details. In other instances, well-known details have not been described so as not to obscure the invention.
Overview of Operating Environment
The output of the enhancement process can be applied to other processing systems, such as a voice recognition system, or saved to a file, or played back for the benefit of a human listener. Playback is typically accomplished by converting the processed digital output stream into an analog signal by means of a digital-to-analog converter 28, and amplifying the analog signal with an output amplifier 30 which drives an audio speaker 32 (e.g., a loudspeaker, headphone, or earphone).
Functional Overview of System
One embodiment of the wind noise suppression system of the present invention comprises the following components. These components can be implemented in the signal processing system as described in
A first functional component of the invention is a time-frequency transform of the time series signal.
A second functional component of the invention is background noise estimation, which provides a means of estimating continuous or slowly varying background noise. The dynamic background noise estimation estimates the continuous background noise alone. In the preferred embodiment, a power detector acts in each of multiple frequency bands. Noise-only portions of the data are used to generate the mean of the noise in decibels (dB).
The dynamic background noise estimation works closely with a third functional component, transient detection. Preferably, when the power exceeds the mean by more than a specified number of decibels in a frequency band (typically 6 to 12 dB), the corresponding time period is flagged as containing a transient and is not used to estimate the continuous background noise spectrum.
The fourth functional component is a wind noise detector. It looks for patterns typical of wind buffets in the spectral domain and how these change with time. This component helps decide whether to apply the following steps. If no wind buffeting is detected, then the following components can be optionally omitted.
A fifth functional component is signal analysis, which discriminates between signal and noise and tags signal for its preservation and restoration later on.
The sixth functional component is the wind noise attenuation. This component selectively attenuates the portions of the spectrum that were found to be dominated by wind noise, and reconstructs the signal, if any, that was masked by the wind noise.
The seventh functional component is a time series synthesis. An output signal is synthesized that can be listened to by humans or machines.
A more detailed description of these components is given in conjunction with
Wind Suppression Overview
The samples of a current window are subjected to a time-frequency transformation, which may include appropriate conditioning operations such as pre-filtering and shading (step 206). Any of several time-frequency transformations can be used, such as the short-time Fourier transform, filter-bank analysis, or the discrete wavelet transform. The time-frequency transformation converts the initial time series x(t) into transformed data comprising a time-frequency representation X(f, i), where t is the sampling index of the time series x, and f and i are discrete variables indexing, respectively, the frequency and time dimensions of X. The two-dimensional array X(f, i), a function of frequency and time, will be referred to as the "spectrogram" from now on. The power levels in individual bands f are then subjected to background noise estimation (step 208) coupled with transient detection (step 210). Transient detection looks for the presence of transient signals buried in stationary noise and determines estimated starting and ending times for such transients. Transients can be instances of the sought signal, but can also be "puffs" induced by wind (i.e., instances of wind noise) or any other impulsive noise. The background noise estimation updates the estimate of the background noise parameters between transients. Because background noise is defined as the continuous part of the noise, and transients as anything that is not continuous, the two need to be separated in order for each to be measured. That is why the background estimation must work in tandem with the transient detection.
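The following is a minimal sketch of step 206, assuming a Hann-shaded short-time Fourier transform (one of the transforms listed above). The window length `n_fft`, hop size `hop`, the dB floor, and the function names are illustrative assumptions, not part of the described embodiment. A direct DFT is used so that the example needs only the standard library; the frames are stored time-first, so `X[i][f]` corresponds to X(f, i) in the text.

```python
import cmath
import math


def hann(n_fft):
    """Periodic Hann window of length n_fft (the 'shading' operation)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n_fft) for k in range(n_fft)]


def spectrogram_db(x, n_fft=64, hop=32, floor_db=-120.0):
    """Return a list of frames; frame i holds the power in dB of bins 0..n_fft//2.

    X[i][f] here corresponds to X(f, i) in the text (time index first).
    """
    w = hann(n_fft)
    X = []
    for start in range(0, len(x) - n_fft + 1, hop):
        seg = [x[start + k] * w[k] for k in range(n_fft)]  # shaded window
        row = []
        for f in range(n_fft // 2 + 1):
            # Direct DFT of bin f (an FFT would be used in practice).
            z = sum(seg[k] * cmath.exp(-2j * math.pi * f * k / n_fft)
                    for k in range(n_fft))
            power = abs(z) ** 2
            row.append(10 * math.log10(power) if power > 0 else floor_db)
        X.append(row)
    return X
```

For example, a sinusoid completing 8 cycles per 64-sample frame produces its largest value at bin 8 of every frame.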
One embodiment of background noise estimation comprises a power detector that averages the acoustic power in a sliding window for each frequency band f. When the power within a predetermined number of frequency bands exceeds a threshold set a certain number c of decibels above the background noise, the power detector declares the presence of a transient, i.e., when:
$$X(f,i) > B(f) + c, \tag{1}$$
where B(f) is the mean background noise power in band f and c is the threshold value. B(f) is the background noise estimate that is being determined.
Once a transient signal is detected, background noise tracking is suspended, so that transient signals do not contaminate the background noise estimate. When the power falls back below the threshold, tracking of the background noise resumes. In one embodiment, the threshold value c is obtained by measuring a few initial buffers of signal, assuming that they contain no transients. In one embodiment, c is set to a value between 6 and 12 dB. In an alternative embodiment, noise estimation need not be dynamic but could be measured once (for example, during boot-up of a computer running software implementing the invention), and it need not be frequency dependent.
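A minimal sketch of steps 208 and 210 follows, applying equation (1) per band and suspending background tracking during transients. It assumes band powers already in dB, initializes B(f) from the first few frames (assumed transient-free), and substitutes a first-order exponential average (`alpha`) for the sliding-window average described above. The parameter names `c_db`, `alpha`, `min_bands` (the "predetermined number of frequency bands"), and `init_frames` are hypothetical.

```python
def track_background(X, c_db=9.0, alpha=0.9, min_bands=1, init_frames=3):
    """Estimate B(f) in dB and flag transient frames with equation (1).

    X is a list of frames, each a list of band powers in dB.
    Returns (B, flags), where flags[i] is True if frame i is a transient.
    """
    n_bands = len(X[0])
    # Initialise B(f) from the first frames, assumed to contain no transients.
    init = X[:init_frames]
    B = [sum(frame[f] for frame in init) / len(init) for f in range(n_bands)]
    flags = []
    for frame in X:
        # Equation (1): count bands where X(f, i) > B(f) + c.
        exceeding = sum(1 for f in range(n_bands) if frame[f] > B[f] + c_db)
        transient = exceeding >= min_bands
        flags.append(transient)
        if not transient:
            # Tracking is suspended during transients and resumes afterwards.
            B = [alpha * B[f] + (1 - alpha) * frame[f] for f in range(n_bands)]
    return B, flags
```

Because the 20 dB burst in the usage below is flagged as a transient, it never enters the average and B(f) stays at the true background level.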
Next, in step 212, the spectrogram X is scanned for the presence of wind noise by looking for spectral patterns typical of wind noise and for how these patterns change with time. This component helps decide whether to apply the following steps. If no wind noise is detected, steps 214, 216, and 218 can be omitted and the process skips to step 220.
If wind noise is detected, the transformed data that triggered the transient detector is passed to a signal analysis function (step 214). This step detects and marks the signal of interest, allowing the system subsequently to preserve the signal of interest while attenuating wind noise. For example, if speech is the signal of interest, a voice detector is applied in step 214. This step is described in more detail in the section titled "Signal Analysis."
Next, a low-noise spectrogram C is generated by selectively attenuating X at frequencies dominated by wind noise (step 216). This step attenuates the portions of the spectrum that were found to be dominated by wind noise while preserving those portions that were found to be dominated by signal. The next step, signal reconstruction (step 218), reconstructs the signal, if any, that was masked by the wind noise by interpolating or extrapolating the signal components that were detected in the periods between wind buffets. A more detailed description of the wind noise attenuation and signal reconstruction steps is given in the section titled "Wind Noise Attenuation and Signal Reconstruction."
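The sketch below illustrates steps 216 and 218 on a dB spectrogram under several stated assumptions: boolean masks `wind` and `signal` are taken as given (from the wind detector and signal analysis), wind-dominated bins not tagged as signal are pulled down to the background estimate B(f), and masked signal bins are restored by linear interpolation in time between the nearest unmasked signal frames, or by holding the nearest value when only one side is available. The `max_gap` limit and all names are hypothetical; the exact attenuation and reconstruction rules of the embodiment are not reproduced here.

```python
def attenuate_and_reconstruct(X, wind, signal, B, max_gap=3):
    """Return a low-noise spectrogram C (dB), laid out frame by bin like X.

    wind[i][f]:   True where bin f of frame i is dominated by wind noise.
    signal[i][f]: True where signal analysis tagged bin f of frame i as signal.
    B[f]:         background noise estimate in dB.
    """
    n_frames, n_bins = len(X), len(X[0])
    C = [row[:] for row in X]
    masked = [[wind[i][f] and not signal[i][f] for f in range(n_bins)]
              for i in range(n_frames)]
    for i in range(n_frames):
        for f in range(n_bins):
            if masked[i][f]:
                # Step 216: pull wind-dominated bins down to the background level.
                C[i][f] = min(X[i][f], B[f])
    for f in range(n_bins):
        # Frames in which bin f carried signal outside the wind buffets.
        anchors = [i for i in range(n_frames) if signal[i][f] and not wind[i][f]]
        for i in range(n_frames):
            if not masked[i][f] or not anchors:
                continue
            prev = [a for a in anchors if a < i]
            nxt = [a for a in anchors if a > i]
            p = prev[-1] if prev else None
            q = nxt[0] if nxt else None
            if p is not None and q is not None and q - p <= 2 * max_gap:
                t = (i - p) / (q - p)
                est = (1 - t) * X[p][f] + t * X[q][f]  # interpolate in time
            elif p is not None and i - p <= max_gap:
                est = X[p][f]  # extrapolate forward by holding the last value
            elif q is not None and q - i <= max_gap:
                est = X[q][f]  # extrapolate backward from the next value
            else:
                continue
            # Step 218: restore the estimated signal level, never below C.
            C[i][f] = max(C[i][f], est)
    return C
```

In this sketch a bin that never carries signal outside the buffets is simply left at the background level, which matches the intent of replacing wind puffs with near-constant low-level noise.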
In step 220, a low-noise output time series y is synthesized. The time series y is suitable for listening by either humans or an Automated Speech Recognition system. In the preferred embodiment, the time series is synthesized through an inverse Fourier transform.
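A minimal sketch of inverse-Fourier-transform synthesis with weighted overlap-add is shown below. The Hann window, `n_fft`, `hop`, and the function names are assumptions; a small forward `stft` is included only so the example can be run and checked on its own (analysis followed by synthesis should return the input away from the edges).

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]


def stft(x, n_fft=64, hop=16):
    """Complex half-spectra (bins 0..n_fft//2) of Hann-windowed frames."""
    w = hann(n_fft)
    frames = []
    for s in range(0, len(x) - n_fft + 1, hop):
        seg = [x[s + k] * w[k] for k in range(n_fft)]
        frames.append([sum(seg[k] * cmath.exp(-2j * math.pi * f * k / n_fft)
                           for k in range(n_fft))
                       for f in range(n_fft // 2 + 1)])
    return frames


def istft(frames, n_fft=64, hop=16):
    """Inverse DFT of each half-spectrum followed by weighted overlap-add."""
    w = hann(n_fft)
    length = hop * (len(frames) - 1) + n_fft
    y = [0.0] * length
    norm = [0.0] * length
    for i, half in enumerate(frames):
        # Rebuild the Hermitian-symmetric full spectrum of a real frame.
        full = half + [half[n_fft - f].conjugate()
                       for f in range(n_fft // 2 + 1, n_fft)]
        for k in range(n_fft):
            v = sum(full[f] * cmath.exp(2j * math.pi * f * k / n_fft)
                    for f in range(n_fft)).real / n_fft
            y[i * hop + k] += v * w[k]        # synthesis window
            norm[i * hop + k] += w[k] ** 2    # accumulated window energy
    return [y[n] / norm[n] if norm[n] > 1e-8 else 0.0 for n in range(length)]
```

To synthesize from a modified power spectrogram C in dB, one common approach (assumed here, not stated in the text) is to reuse the phase of the original transform, e.g. `10 ** (c / 20) * cmath.exp(1j * cmath.phase(z))` for each bin value `c` of C and `z` of the original complex frame.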
In step 222, it is determined whether any input data remains to be processed. If so, the entire process is repeated on the next window of acoustic data (step 204). Otherwise, processing ends (step 224). The final output is a time series in which the wind noise has been attenuated while the narrow band signal is preserved.
The order of some of the components may be changed, or some components omitted, while remaining within the scope of the present invention. For example, in some embodiments wind noise detection could be performed before background noise estimation, or omitted entirely.
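The per-window control flow of steps 204 through 224 can be sketched as below, with each stage passed in as a callable so that a stage can be swapped or replaced by an identity function. The function and parameter names are hypothetical, and concatenating each window's output is a simplification of the overlap-add a real synthesis stage would perform.

```python
def suppress_wind(windows, transform, track_background, detect_wind,
                  analyse_signal, attenuate, reconstruct, synthesize):
    """Per-window control flow of steps 204-224; every stage is a callable."""
    output = []
    for window in windows:                      # step 204: next window
        X = transform(window)                   # step 206
        transients = track_background(X)        # steps 208 and 210
        if detect_wind(X):                      # step 212
            tags = analyse_signal(X, transients)          # step 214
            X = reconstruct(attenuate(X, tags), tags)     # steps 216 and 218
        output.extend(synthesize(X))            # step 220
    return output                               # steps 222 and 224: input exhausted
```

When `detect_wind` returns False, steps 214 to 218 are skipped entirely, as described for step 212.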
Signal Analysis
The preferred embodiment of signal analysis makes use of at least three different features for distinguishing narrow band signals from wind noise in a single-channel (single-microphone) system. An additional feature can be used when more than one microphone is available. The results obtained from these features are then combined to make a detection decision. The features comprise:
1) the peaks in the spectrum of narrow band signals are harmonically related, unlike those of wind noise,
2) their peaks are narrower than those of wind noise,
3) they last for longer periods of time than wind noise,
4) the rate of change of their positions and amplitudes is less drastic than that of wind noise, and
5) (multi-microphone only) they are more strongly correlated among microphones than wind noise.
The signal analysis (performed in step 214) of the present invention takes advantage of the quasi-periodic nature of the signal of interest to distinguish it from non-periodic wind noise. This is accomplished by recognizing that a variety of quasi-periodic acoustic waveforms, including speech, music, and motor noise, can be represented as a sum of sinusoids whose amplitude, frequency, and phase are slowly time-varying:
$$s(n) = \sum_{k=1}^{K} A_k(n)\,\sin\!\big(2\pi k f_0 n + \phi_k(n)\big), \tag{2}$$
in which the sine-wave frequencies k f_0 are multiples of the fundamental frequency f_0, A_k(n) is the time-varying amplitude of each component, and \phi_k(n) is its time-varying phase.
The spectrum of a quasi-periodic signal such as voice has distinct peaks at the corresponding harmonic frequencies. Furthermore, the peaks are equally spaced across the frequency band, and the distance between any two adjacent peaks is determined by the fundamental frequency.
In contrast to quasi-periodic signals, noise-like signals such as wind noise have no clear harmonic structure. Their frequencies and phases are random and vary within a short time. As a result, the spectrum of wind noise has irregularly spaced peaks.
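One simple way to score the harmonic relationship described above is sketched below, assuming peak frequencies have already been found. The fundamental f_0 is estimated as the median spacing between adjacent peaks, and the score is the fraction of peaks lying within a tolerance `tol` (a fraction of f_0) of an integer multiple of it. This estimator, the tolerance, and the function name are illustrative assumptions rather than the embodiment's method.

```python
def harmonicity(peaks_hz, tol=0.1):
    """Fraction of peaks lying near integer multiples of an estimated f0.

    f0 is estimated as the median spacing between adjacent peaks; tol is
    the allowed deviation as a fraction of f0.
    """
    p = sorted(peaks_hz)
    if len(p) < 3:
        return 0.0
    gaps = sorted(b - a for a, b in zip(p, p[1:]))
    f0 = gaps[len(gaps) // 2]  # median adjacent-peak spacing
    if f0 <= 0:
        return 0.0
    hits = 0
    for freq in p:
        k = round(freq / f0)  # nearest harmonic number
        if k >= 1 and abs(freq - k * f0) <= tol * f0:
            hits += 1
    return hits / len(p)
```

Equally spaced voice-like peaks such as 200, 400, ..., 1000 Hz score 1.0, while irregularly spaced wind-like peaks score low.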
Besides the harmonic nature of the peaks, three other features are used. First, in most cases the peaks of the wind noise spectrum in the low frequency band are wider than the peaks in the spectrum of the narrow band signal, due to the overlapping effect of closely spaced frequency components of the noise. Second, the distance between adjacent peaks of the wind noise spectrum is also inconsistent (non-constant). Finally, narrow band signals are relatively stable in time: their spectra generally change more slowly than that of wind noise. The rate of change of the peak positions and amplitudes is therefore also used as a feature to discriminate between wind noise and signal.
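The width and spacing features above can be measured as in the following sketch. It assumes a spectrum in dB, measures peak width as the number of contiguous bins within `drop_db` (3 dB assumed) of the peak top, and expresses spacing inconsistency as the coefficient of variation of adjacent peak distances. The thresholds and function names are hypothetical.

```python
def peak_width(s_db, i, drop_db=3.0):
    """Width in bins of the peak at index i, measured drop_db below its top."""
    level = s_db[i] - drop_db
    lo = i
    while lo > 0 and s_db[lo - 1] > level:
        lo -= 1
    hi = i
    while hi < len(s_db) - 1 and s_db[hi + 1] > level:
        hi += 1
    return hi - lo + 1


def spacing_spread(peak_bins):
    """Coefficient of variation of the distances between adjacent peaks."""
    gaps = [b - a for a, b in zip(peak_bins, peak_bins[1:])]
    if len(gaps) < 2:
        return 0.0
    mean = sum(gaps) / len(gaps)
    var = sum((g - mean) ** 2 for g in gaps) / len(gaps)
    return var ** 0.5 / mean
```

A sharp harmonic peak yields a width of one or two bins and equally spaced peaks yield a spread of zero; broad wind peaks and irregular spacing push both measures up.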
Examples of Signal Analysis
When more than one microphone is present, the method uses an additional feature to distinguish wind noise, in addition to the heuristic rules described in
Signal Analysis Implementation
In one embodiment, any one of the following features can be used alone or in any combination thereof to accomplish step 504:
1) finding all peaks in spectra having SNR>T
2) measuring peak width as a way to determine whether the peaks are stemming from wind noise
3) measuring the harmonic relationship between peaks
4) comparing peaks in spectra of the current buffer to the spectra from the previous buffer
5) comparing peaks in spectra from different microphones (if more than one microphone is used).
Given the spectrum value s(i) at the ith frequency bin, that bin is considered a peak if and only if
$$s(i) > s(i-1) \tag{3}$$
and
$$s(i) > s(i+1). \tag{4}$$
Furthermore, a peak is classified as voice (i.e., the signal of interest) if
$$s(i) > s(i-2) + 7\ \text{dB} \tag{5}$$
and
$$s(i) > s(i+2) + 7\ \text{dB}. \tag{6}$$
Otherwise, the peak is classified as noise (e.g., wind noise). The specific values in these equations (e.g., the offset of two bins and the 7 dB margin) are those of one example embodiment and can be modified in other embodiments. Note that a peak is classified as stemming from the signal of interest when it is sharply higher than the neighboring points (equations 5 and 6). This is consistent with the example shown in
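Equations (3) through (6) translate directly into the following sketch. The parameters `margin_db` and `offset` default to the example values (7 dB, two bins); treating a neighbor that falls outside the spectrum as minus infinity is an assumption, since the text does not say how edge bins are handled. The function name is hypothetical.

```python
def classify_peaks(s_db, margin_db=7.0, offset=2):
    """Return (peak_index, label) pairs following equations (3)-(6).

    A bin is a peak if it exceeds both immediate neighbours; it is labelled
    'signal' if it also exceeds the bins `offset` away by margin_db.
    """
    out = []
    for i in range(1, len(s_db) - 1):
        if s_db[i] > s_db[i - 1] and s_db[i] > s_db[i + 1]:  # eqs (3), (4)
            # Out-of-range neighbours are treated as -inf (assumption).
            left = s_db[i - offset] if i - offset >= 0 else float("-inf")
            right = s_db[i + offset] if i + offset < len(s_db) else float("-inf")
            sharp = s_db[i] > left + margin_db and s_db[i] > right + margin_db
            out.append((i, "signal" if sharp else "noise"))  # eqs (5), (6)
    return out
```

In the usage below, the isolated 20 dB peak at bin 2 is labeled signal, while the broad hump peaking at bin 6 fails equation (6) and is labeled noise.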
Following along again in
In step 520, the stability of the peaks in narrow band signals is then measured. This step compares the frequency of the peaks in the previous spectra to that of the present one. Peaks that are stable from buffer to buffer receive added evidence that they belong to an acoustic source and not to wind noise.
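The buffer-to-buffer stability check of step 520 might look like the sketch below, assuming peaks are represented as a mapping from bin index to level in dB. A current peak counts as stable if some previous peak lies within `max_shift` bins and within `max_level_change_db` of its level; both tolerances and the function name are hypothetical.

```python
def stable_peaks(prev, curr, max_shift=1, max_level_change_db=6.0):
    """Bins of current peaks that have a close match in the previous buffer.

    prev and curr map peak bin -> level in dB.
    """
    stable = []
    for b, level in curr.items():
        # Candidate matches: previous peaks within max_shift bins.
        matches = [p for p in prev if abs(p - b) <= max_shift]
        if any(abs(prev[p] - level) <= max_level_change_db for p in matches):
            stable.append(b)
    return sorted(stable)
```

Peaks returned here would receive added evidence of belonging to an acoustic source rather than to wind noise.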
Finally, in step 522, if signals from more than one microphone are available, the phases and amplitudes of the spectra at their respective peaks are compared. Peaks whose amplitude or phase differences exceed certain thresholds are considered to belong to wind noise. Conversely, peaks whose amplitude or phase differences fall below those thresholds are considered to belong to an acoustic signal. The evidence from these different steps is combined in step 524, preferably by a fuzzy classifier or an artificial neural network, giving the likelihood that a given peak belongs to either signal or wind noise. Signal analysis ends at step 526.
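A minimal sketch of steps 522 and 524 follows. The inter-microphone test compares the level and wrapped phase difference of the complex bin values at a peak against thresholds (3 dB and pi/4 are assumed values). For step 524, a plain weighted average of per-feature scores in [0, 1] is substituted for the fuzzy classifier or neural network named in the text; it is a stand-in for illustration only, and all names and weights are hypothetical.

```python
import cmath
import math


def mic_agreement(z1, z2, max_level_diff_db=3.0, max_phase_diff=math.pi / 4):
    """True if two microphones' complex bin values at a peak agree closely."""
    if z1 == 0 or z2 == 0:
        return False
    level_diff = abs(20 * math.log10(abs(z1) / abs(z2)))
    phase_diff = abs(cmath.phase(z1 * z2.conjugate()))  # wrapped to [0, pi]
    return level_diff <= max_level_diff_db and phase_diff <= max_phase_diff


def signal_likelihood(evidence, weights):
    """Weighted average of per-feature scores in [0, 1] (stand-in for step 524)."""
    total = sum(weights[name] for name in evidence)
    return sum(weights[name] * score for name, score in evidence.items()) / total
```

Only the features actually available contribute to the average, so a single-microphone system simply omits the inter-microphone score.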
Wind Noise Detection
Wind Noise Attenuation and Signal Reconstruction
Computer Implementation
The invention may be implemented in hardware or software, or a combination of both (e.g., programmable logic arrays). Unless otherwise specified, the algorithms included as part of the invention are not inherently related to any particular computer or other apparatus. In particular, various general-purpose machines may be used with programs written in accordance with the teachings herein, or it may be more convenient to construct more specialized apparatus to perform the required method steps. Preferably, however, the invention is implemented in one or more computer programs executing on programmable systems each comprising at least one processor, at least one data storage system (including volatile and non-volatile memory and/or storage elements), and at least one microphone input. The program code is executed on the processors to perform the functions described herein.
Each such program may be implemented in any desired computer language (including machine, assembly, high level procedural, or object oriented programming languages) to communicate with a computer system. In any case, the language may be a compiled or interpreted language.
Each such computer program is preferably stored on a storage medium or device (e.g., solid-state, magnetic, or optical media) readable by a general- or special-purpose programmable computer, for configuring and operating the computer when the storage medium or device is read by the computer to perform the procedures described herein. For example, the computer program can be stored in storage 26 of
A number of embodiments of the invention have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the invention. The invention is defined by the following claims and their full scope and equivalents.
This application claims the benefit of U.S. Provisional Patent Application No. 60/449,511, filed Feb. 21, 2003.
Number | Name | Date | Kind |
---|---|---|---|
4486900 | Cox et al. | Dec 1984 | A |
4531228 | Noso et al. | Jul 1985 | A |
4630304 | Borth et al. | Dec 1986 | A |
4630305 | Borth et al. | Dec 1986 | A |
4811404 | Vilmur et al. | Mar 1989 | A |
4843562 | Kenyon et al. | Jun 1989 | A |
4845466 | Hariton et al. | Jul 1989 | A |
5012519 | Adlersberg et al. | Apr 1991 | A |
5027410 | Williamson et al. | Jun 1991 | A |
5056150 | Yu et al. | Oct 1991 | A |
5146539 | Doddington et al. | Sep 1992 | A |
5251263 | Andrea et al. | Oct 1993 | A |
5313555 | Kamiya | May 1994 | A |
5400409 | Linhard | Mar 1995 | A |
5426703 | Hamabe et al. | Jun 1995 | A |
5426704 | Tamamura et al. | Jun 1995 | A |
5442712 | Kawamura et al. | Aug 1995 | A |
5479517 | Linhard | Dec 1995 | A |
5485522 | Solve et al. | Jan 1996 | A |
5495415 | Ribbens et al. | Feb 1996 | A |
5502688 | Recchione et al. | Mar 1996 | A |
5526466 | Takizawa | Jun 1996 | A |
5550924 | Helf et al. | Aug 1996 | A |
5568559 | Makino | Oct 1996 | A |
5584295 | Muller et al. | Dec 1996 | A |
5586028 | Sekine et al. | Dec 1996 | A |
5617508 | Reaves | Apr 1997 | A |
5651071 | Lindemann et al. | Jul 1997 | A |
5677987 | Seki et al. | Oct 1997 | A |
5680508 | Liu | Oct 1997 | A |
5692104 | Chow et al. | Nov 1997 | A |
5701344 | Wakui | Dec 1997 | A |
5727072 | Raman | Mar 1998 | A |
5752226 | Chan et al. | May 1998 | A |
5809152 | Nakamura et al. | Sep 1998 | A |
5859420 | Borza | Jan 1999 | A |
5878389 | Hermansky et al. | Mar 1999 | A |
5920834 | Sih et al. | Jul 1999 | A |
5933495 | Oh | Aug 1999 | A |
5933801 | Fink et al. | Aug 1999 | A |
5949888 | Gupta et al. | Sep 1999 | A |
5982901 | Kane et al. | Nov 1999 | A |
6011853 | Koski et al. | Jan 2000 | A |
6108610 | Winn | Aug 2000 | A |
6122384 | Mauro | Sep 2000 | A |
6130949 | Aoki et al. | Oct 2000 | A |
6163608 | Romesburg et al. | Dec 2000 | A |
6167375 | Miseki et al. | Dec 2000 | A |
6173074 | Russo | Jan 2001 | B1 |
6175602 | Gustafsson et al. | Jan 2001 | B1 |
6192134 | White et al. | Feb 2001 | B1 |
6199035 | Lakaniemi et al. | Mar 2001 | B1 |
6208268 | Scarzello et al. | Mar 2001 | B1 |
6230123 | Mekuria et al. | May 2001 | B1 |
6252969 | Ando | Jun 2001 | B1 |
6289309 | deVries | Sep 2001 | B1 |
6405168 | Bayya et al. | Jun 2002 | B1 |
6415253 | Johnson | Jul 2002 | B1 |
6434246 | Kates et al. | Aug 2002 | B1 |
6453285 | Anderson et al. | Sep 2002 | B1 |
6507814 | Gao | Jan 2003 | B1 |
6510408 | Hermansen | Jan 2003 | B1 |
6587816 | Chazan et al. | Jul 2003 | B1 |
6615170 | Liu et al. | Sep 2003 | B1 |
6643619 | Linhard et al. | Nov 2003 | B1 |
6687669 | Schrögmeier et al. | Feb 2004 | B1 |
6711536 | Rees | Mar 2004 | B2 |
6741873 | Doran et al. | May 2004 | B1 |
6766292 | Chandran et al. | Jul 2004 | B1 |
6768979 | Menéndez-Pidal et al. | Jul 2004 | B1 |
6782363 | Lee et al. | Aug 2004 | B2 |
6822507 | Buchele | Nov 2004 | B2 |
6859420 | Coney et al. | Feb 2005 | B1 |
6882736 | Dickel et al. | Apr 2005 | B2 |
6910011 | Zakarauskas | Jun 2005 | B1 |
6937980 | Krasny et al. | Aug 2005 | B2 |
6959276 | Droppo et al. | Oct 2005 | B2 |
7043030 | Furuta | May 2006 | B1 |
7047047 | Acero et al. | May 2006 | B2 |
7062049 | Inoue et al. | Jun 2006 | B1 |
7072831 | Etter | Jul 2006 | B1 |
7092877 | Ribic | Aug 2006 | B2 |
7117145 | Venkatesh et al. | Oct 2006 | B1 |
7117149 | Zakarauskas | Oct 2006 | B1 |
7158932 | Furuta | Jan 2007 | B1 |
7165027 | Kellner et al. | Jan 2007 | B2 |
7313518 | Scalart et al. | Dec 2007 | B2 |
7386217 | Zhang | Jun 2008 | B2 |
20010028713 | Walker | Oct 2001 | A1 |
20020037088 | Dickel et al. | Mar 2002 | A1 |
20020071573 | Finn | Jun 2002 | A1 |
20020094100 | Kates et al. | Jul 2002 | A1 |
20020094101 | De Roo et al. | Jul 2002 | A1 |
20020176589 | Buck et al. | Nov 2002 | A1 |
20030040908 | Yang et al. | Feb 2003 | A1 |
20030147538 | Elko | Aug 2003 | A1 |
20030151454 | Buchele | Aug 2003 | A1 |
20030216907 | Thomas | Nov 2003 | A1 |
20040078200 | Alves | Apr 2004 | A1 |
20040093181 | Lee | May 2004 | A1 |
20040138882 | Miyazawa | Jul 2004 | A1 |
20040161120 | Petersen et al. | Aug 2004 | A1 |
20040165736 | Hetherington et al. | Aug 2004 | A1 |
20040167777 | Hetherington et al. | Aug 2004 | A1 |
20050114128 | Hetherington et al. | May 2005 | A1 |
20050238283 | Faure et al. | Oct 2005 | A1 |
20050240401 | Ebenezer | Oct 2005 | A1 |
20060034447 | Alves et al. | Feb 2006 | A1 |
20060074646 | Alves et al. | Apr 2006 | A1 |
20060100868 | Hetherington et al. | May 2006 | A1 |
20060115095 | Giesbrecht et al. | Jun 2006 | A1 |
20060116873 | Hetherington et al. | Jun 2006 | A1 |
20060136199 | Nongpiur et al. | Jun 2006 | A1 |
20060251268 | Hetherington et al. | Nov 2006 | A1 |
20060287859 | Hetherington et al. | Dec 2006 | A1 |
20070019835 | Ivo de Roo et al. | Jan 2007 | A1 |
20070033031 | Zakarauskas | Feb 2007 | A1 |
Number | Date | Country |
---|---|---|
2158847 | Sep 1994 | CA |
2157496 | Oct 1994 | CA |
2158064 | Oct 1994 | CA |
1325222 | Dec 2001 | CN |
0 076 687 | Apr 1983 | EP |
0 629 996 | Dec 1994 | EP |
0 750 291 | Dec 1996 | EP |
1 450 353 | Aug 2004 | EP |
1 450 354 | Aug 2004 | EP |
1 669 983 | Jun 2006 | EP |
64-039195 | Feb 1989 | JP |
06269084 | Sep 1994 | JP |
6282297 | Oct 1994 | JP |
06319193 | Nov 1994 | JP |
6349208 | Dec 1994 | JP |
2000-261530 | Sep 2000 | JP |
2001215992 | Aug 2001 | JP |
2001-350498 | Dec 2001 | JP |
138806 | Jun 1998 | KR |
WO 00-41169 | Jul 2000 | WO |
WO 0156255 | Aug 2001 | WO |
WO 01-73761 | Oct 2001 | WO |
Number | Date | Country |
---|---|---|
20040165736 A1 | Aug 2004 | US |
Number | Date | Country |
---|---|---|
60449511 | Feb 2003 | US |