1. Field of the Invention
The present invention relates to the area of audio signal processing, and more particularly to a method and apparatus for reducing wind noise.
2. Description of Related Art
Wind may introduce an annoying noise when voice is recorded outdoors. In strongly windy conditions especially, the wind noise picked up by a microphone may be so loud that it almost drowns out the target voice to be recorded.
Fast-moving air forms a rotating airflow around the microphone, which generates the wind noise. In general, the wind noise is mainly concentrated in low frequency bands.
Generally, a windscreen may be used to weaken the impact of the wind noise. However, many small devices, e.g. a digital video camera or a recording pen, are not equipped with a windscreen, so the impact of the wind noise is unavoidable. Alternatively, a high pass filter may be used to reduce the wind noise, since the wind noise mainly comprises low band components. However, the low band components of the voice itself are then cut along with the wind noise, so the quality of the recorded sound is degraded.
Thus, improved techniques for reducing wind noise are desired to overcome the above disadvantages.
This section is for the purpose of summarizing some aspects of the present invention and to briefly introduce some preferred embodiments. Simplifications or omissions in this section as well as in the abstract or the title of this description may be made to avoid obscuring the purpose of this section, the abstract and the title. Such simplifications or omissions are not intended to limit the scope of the present invention.
In general, the present invention pertains to improved techniques to reduce wind noise effectively in recorded signals. One aspect of the present invention relies on the observation that, for two voice signals sampled simultaneously by a pair of microphones in a common scene, the target voices in the same frequency band are strongly correlated, while the wind noises in the same frequency band are only weakly correlated. This feature is exploited by providing a larger gain to a frequency band having a strong correlation and a smaller gain to a frequency band having a weak correlation, so that the wind noise is reduced efficiently with minimum impact on the target voices.
One of the features, benefits and advantages in the present invention is to provide techniques to remove wind noises with minimum impact on recorded signals.
Other objects, features, and advantages of the present invention will become apparent upon examining the following detailed description of an embodiment thereof, taken in conjunction with the attached drawings.
These and other features, aspects, and advantages of the present invention will become better understood with regard to the following description, appended claims, and accompanying drawings where:
The detailed description of the present invention is presented largely in terms of procedures, steps, logic blocks, processing, or other symbolic representations that directly or indirectly resemble the operations of devices or systems contemplated in the present invention. These descriptions and representations are typically used by those skilled in the art to most effectively convey the substance of their work to others skilled in the art.
Reference herein to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Further, the order of blocks in process flowcharts or diagrams or the use of sequence numbers representing one or more embodiments of the invention do not inherently indicate any particular order nor imply any limitations in the invention.
Embodiments of the present invention are discussed herein with reference to
Improved techniques are provided to reduce wind noises effectively according to one embodiment of the present invention. For two voice signals sampled simultaneously by a pair of microphones in a common scene, the correlation of the target voices in the same frequency band is strong, while the correlation of the wind noises in the same frequency band is very weak. This feature is exploited by providing a larger gain to a frequency band having a strong correlation and a smaller gain to a frequency band having a weak correlation, so that the wind noise is reduced efficiently with minimum impact on the target voices.
The microphones 11 and 12 are configured to sample two voice signals (e.g. a left or first voice signal and a right or second voice signal) simultaneously in a common scene, to output the two voice signals to the band pass filter 13, and to output the two voice signals to the analysis window module 15 and the analysis window module 17 respectively.
The cross correlation module 14 is configured to calculate a cross correlation of the two voice signals within the frequency range of 100-200 Hz to determine whether the two voice signals currently sampled contain the wind noise. The two voice signals processed by the band pass filter 13 are denoted as x1 and x2, and the following calculations are performed by the cross correlation module 14:
where Corrx1x2 is a cross correlation of x1 and x2, Corrx1 is a self correlation of x1, and Corrx2 is a self correlation of x2. So, the normalized cross correlation corrx1x2 of x1 and x2 is:
where corrx1x2 is a number between 0 and 1 that reflects the cross correlation between the two voice signals. A value of corrx1x2 close to 0 indicates that the two voice signals contain the wind noise, because the wind noise is only weakly correlated between the two microphones. A value of corrx1x2 close to 1 indicates that the two voice signals do not contain strong wind noise. The cross correlation module 14 outputs the normalized cross correlation corrx1x2 to the wind noise reduction module 19. Hence, corrx1x2 is used as an overall probability parameter to determine whether the two voice signals contain the wind noise.
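The equations for the overall correlation are not reproduced in this text, so the following is only a minimal sketch under stated assumptions: the cross and self correlations are taken as zero-lag sums over one block of band-passed samples, and the normalization divides the magnitude of the cross correlation by the geometric mean of the two self correlations, which keeps the result between 0 and 1. The function name and the handling of silent input are hypothetical.

```python
import math


def normalized_cross_correlation(x1, x2):
    """Return a value in [0, 1] describing how correlated x1 and x2 are.

    Assumed form (not confirmed by the description):
        |sum(x1*x2)| / sqrt(sum(x1*x1) * sum(x2*x2))
    """
    if len(x1) != len(x2):
        raise ValueError("x1 and x2 must have the same length")
    corr_x1x2 = sum(a * b for a, b in zip(x1, x2))  # cross correlation, zero lag
    corr_x1 = sum(a * a for a in x1)                # self correlation of x1
    corr_x2 = sum(b * b for b in x2)                # self correlation of x2
    denom = math.sqrt(corr_x1 * corr_x2)
    if denom == 0.0:
        return 0.0  # assumption: a silent block is treated as uncorrelated
    return min(1.0, abs(corr_x1x2) / denom)
```

Identical inputs give a value of 1, and inputs that never overlap in time give 0.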
The analysis window modules 15 and 17 are configured to apply an analysis window to the two voice signals respectively. The FFT (Fast Fourier Transform) modules 16 and 18 are configured to convert the two windowed voice signals from the time domain to the frequency domain respectively. The two voice signals in the frequency domain are sent to the wind noise reduction module 19.
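The windowing and transform stage for one channel can be sketched as follows. This is an illustration only: the description does not name the analysis window, so a periodic Hann window is assumed, and a naive discrete Fourier transform stands in for the FFT (an FFT yields the same values with less computation). All function names are hypothetical.

```python
import cmath
import math


def hann_window(n):
    """Periodic Hann window of length n (the window type is an assumption)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / n) for k in range(n)]


def dft(frame):
    """Naive O(N^2) discrete Fourier transform of a real or complex frame."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def analyze(frame):
    """Apply the analysis window to one frame, then transform it."""
    window = hann_window(len(frame))
    return dft([s * w for s, w in zip(frame, window)])
```

Each channel would be passed through `analyze` separately, one frame at a time, before the two spectra are compared band by band.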
The cross correlation computing unit 191 is configured to calculate a cross correlation of the two voice signals in the frequency domain to obtain a normalized cross correlation corrLR(i) of each frequency band of the two voice signals within the frequency range below 1000 Hz, wherein i is the index of the frequency band of the two voice signals in the frequency domain.
The weighted module 192 is configured to weight the normalized cross correlation corrLR(i) of each frequency band by the overall normalized cross correlation corrx1x2 to get a weighted normalized cross correlation corrLR′(i).
The average computing unit 193 is configured to compute an average value of the two voice signals within the frequency range of 0-1000 Hz.
The gain control unit 194 is configured to control a gain of the average value of the two voice signals within the frequency range of 0-1000 Hz depending on the weighted normalized cross correlation corrLR′(i).
The operations of the wind noise reduction module 19 are described in detail hereafter. A real part of an ith frequency band of the voice signal inputted from the microphone 11 is denoted as Re_L(i), and an imaginary part of the ith frequency band of the voice signal inputted from the microphone 11 is denoted as Im_L(i). A real part of an ith frequency band of the voice signal inputted from the microphone 12 is denoted as Re_R(i), and an imaginary part of the ith frequency band of the voice signal inputted from the microphone 12 is denoted as Im_R(i).
The following calculations are performed by the cross correlation computing unit 191:
CorrLR(i)=Re_L(i)*Re_R(i)+Im_L(i)*Im_R(i);
CorrLL(i)=Re_L(i)*Re_L(i)+Im_L(i)*Im_L(i);
CorrRR(i)=Re_R(i)*Re_R(i)+Im_R(i)*Im_R(i).
where CorrLR(i) is a cross correlation of the ith frequency band of the voice signal from the microphone 11 and the voice signal from the microphone 12, CorrLL(i) is a self correlation of the ith frequency band of the voice signal from the microphone 11, and CorrRR(i) is a self correlation of the ith frequency band of the voice signal from the microphone 12. So, the normalized cross correlation corrLR(i) of the ith frequency band of the two voice signals is:
The cross correlation of the two voice signals needs to be calculated only within the frequency range below 1000 Hz, since the wind noise is mainly concentrated in frequencies under 1 kHz. Accordingly, i=0~N/8 if the number of FFT points is N and the sampling rate is 8 kHz. It is noted that corrLR(i) may be used as a partial probability parameter to determine whether the ith frequency band of the two voice signals contains the wind noise.
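The per-band calculation can be sketched as follows. CorrLR(i), CorrLL(i) and CorrRR(i) follow the equations given above. The normalization step is an assumption, since that equation is not reproduced in this text: CorrLR(i) is divided by the square root of CorrLL(i)*CorrRR(i), and negative or undefined results are clamped to 0 so that the value stays between 0 and 1. The function names are hypothetical.

```python
import math


def max_band_index(n_fft, sample_rate_hz, cutoff_hz=1000.0):
    """Index of the highest FFT band at or below cutoff_hz."""
    return int(cutoff_hz * n_fft / sample_rate_hz)


def band_correlations(spec_l, spec_r, last_band):
    """Return corrLR(i) for i = 0..last_band from two complex spectra."""
    result = []
    for i in range(last_band + 1):
        left, right = spec_l[i], spec_r[i]
        corr_lr = left.real * right.real + left.imag * right.imag
        corr_ll = left.real * left.real + left.imag * left.imag
        corr_rr = right.real * right.real + right.imag * right.imag
        denom = math.sqrt(corr_ll * corr_rr)
        # Assumption: an empty band counts as uncorrelated, and a
        # negative correlation is clamped to 0.
        value = corr_lr / denom if denom > 0.0 else 0.0
        result.append(min(1.0, max(0.0, value)))
    return result
```

With a 256-point FFT at a sampling rate of 8 kHz, `max_band_index(256, 8000)` returns 32, which matches the N/8 limit stated above.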
The weighted module 192 gets the weighted normalized cross correlation corrLR′(i) according to the following equation:
corrLR′(i)=corrLR(i)*corrx1x2.
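As a brief illustration of this weighting, the sketch below multiplies every per-band value by the single overall value. The function and parameter names are hypothetical and are not taken from the description.

```python
def weight_band_correlations(band_corr, overall_corr):
    """Return corrLR'(i) = corrLR(i) * overall_corr for every band i."""
    return [c * overall_corr for c in band_corr]


# Example: three bands weighted by an overall correlation of 0.5.
weighted = weight_band_correlations([1.0, 0.5, 0.0], 0.5)
```

Because both factors lie between 0 and 1, each weighted value also lies between 0 and 1, so when it is later applied as a gain it can only attenuate a band and never amplify it.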
The average computing unit 193 computes the average value of the two voice signals within the frequency range of 0-1000 Hz according to the following equations:
Re(i)=(Re_L(i)+Re_R(i))/2;
Im(i)=(Im_L(i)+Im_R(i))/2.
Because the target voices in the two voice signals are strongly correlated and the wind noises in the two voice signals are almost uncorrelated, averaging the two voice signals has no effect on the target voices but attenuates the wind noise by 6 dB. Thereby, the signal to noise ratio of the voice signal is enhanced.
The gain control unit 194 controls the gain of the average value of the two voice signals according to the following equations:
Re_out(i)=Re(i)*corrLR′(i);
Im_out(i)=Im(i)*corrLR′(i).
The value of corrLR′(i) is lower if the ith frequency band contains stronger wind noise, so the values of Re_out(i) and Im_out(i) are smaller. In other words, a smaller gain is provided to a frequency band containing stronger wind noise. The value of corrLR′(i) is higher if the ith frequency band contains weaker wind noise, so the values of Re_out(i) and Im_out(i) are larger. In other words, a larger gain is provided to a frequency band containing weaker wind noise. Thereby, the signal to noise ratio of the voice signal is further enhanced.
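The averaging and gain equations above can be combined into one per-band step, sketched below. The function name is hypothetical, spectra are represented as lists of Python complex numbers, and the weighted correlation values are assumed to be supplied for bands 0 up to the last band below 1000 Hz. Writing the same processed band into both channels follows the description. Leaving the remaining bands untouched, including the mirrored negative-frequency bands, is a simplification of this sketch.

```python
def apply_wind_noise_gain(spec_l, spec_r, weighted_corr):
    """Average the two spectra band by band and scale by corrLR'(i).

    Only bands 0..len(weighted_corr)-1 are processed. Other bands are
    copied unchanged (a simplification; a full implementation would also
    keep the spectrum conjugate-symmetric before the inverse transform).
    """
    out_l, out_r = list(spec_l), list(spec_r)
    for i, gain in enumerate(weighted_corr):
        re = (spec_l[i].real + spec_r[i].real) / 2.0  # Re(i)
        im = (spec_l[i].imag + spec_r[i].imag) / 2.0  # Im(i)
        band = complex(re * gain, im * gain)          # Re_out(i), Im_out(i)
        out_l[i] = band  # the same band replaces both channels
        out_r[i] = band
    return out_l, out_r
```

A band whose weighted correlation is 1 passes the average through unchanged, while a band whose weighted correlation is 0 is muted.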
Re_out(i) is the real part of the voice signal, and Im_out(i) is the imaginary part of the voice signal. The voice signal consisting of Re_out(i) and Im_out(i) is duplicated to replace the two voice signals from the microphone 11 and the microphone 12 in the same frequency band. The two voice signals
The IFFT modules 20 and 22 are configured to convert the two voice signals in the frequency domain from the wind noise reduction module 19 back to the two voice signals in the time domain respectively. The integrated window modules 21 and 23 are configured to process the two voice signals to get the final two voice signals with the wind noise reduced respectively.
At 501, a cross correlation of two voice signals sampled simultaneously in a common scene is calculated to generate a normalized cross correlation corrLR(i) of each frequency band of the two voice signals.
At 502, gains of the two voice signals are adjusted according to the normalized cross correlation value of each frequency band of the two voice signals to reduce the wind noise in the two voice signals.
In a preferred embodiment, the method 500 further comprises the following operation before 501. The two voice signals are band pass filtered so that a certain frequency range thereof is passed and the other frequency ranges thereof are rejected. The certain frequency range is about 100-200 Hz, since the energy of the wind noise is mainly concentrated in a frequency range of 100-200 Hz. A normalized cross correlation corrx1x2 of the two voice signals within the certain frequency range is calculated to determine whether the two voice signals contain the wind noise. The normalized cross correlation corrLR(i) of each frequency band is weighted by the normalized cross correlation corrx1x2 to get a weighted normalized cross correlation corrLR′(i). So, the gains of the two voice signals are adjusted according to the weighted normalized cross correlation corrLR′(i) of each frequency band of the two voice signals to reduce the wind noise in the two voice signals.
The present invention has been described in sufficient detail with a certain degree of particularity. It is understood by those skilled in the art that the present disclosure of embodiments has been made by way of example only and that numerous changes in the arrangement and combination of parts may be resorted to without departing from the spirit and scope of the invention as claimed. Accordingly, the scope of the present invention is defined by the appended claims rather than the foregoing description of embodiments.
Number | Date | Country | Kind |
---|---|---|---|
200810240479.0 | Dec 2008 | CM | national |