This disclosure relates generally to audio systems, methods and devices, and more particularly to the detection and reduction of wind noise in audio devices.
In various audio devices using a single microphone or an array of microphones, wind noise may contribute to audio interference, due to local air turbulence around one or more microphone inlets in the audio device. Although wind screening devices that are positionable over a microphone inlet opening are in widespread use, these devices generally attenuate the sound pressure at the one or more inlet openings, resulting in reduced overall audio performance.
Methods and apparatuses for detection and reduction of wind noise in audio devices are disclosed. In an aspect, a method includes acquiring and transforming the audio signals. Correlations from the transformed audio signals are computed. A cross correlation index is compared to a predetermined value to determine if a wind noise spectral content is present. In another aspect, an apparatus includes an audio processing unit to receive non-decomposed audio signals, and an audio decomposition unit to receive the non-decomposed audio signals and to generate decomposed audio signals. A wind noise spectrum estimation unit receives non-decomposed audio signals and decomposed audio signals and identifies wind noise spectral components in at least one of the non-decomposed and decomposed audio signals. A wind noise spectrum reduction unit receives the wind noise spectral components and removes the wind noise spectral components from at least one of the non-decomposed and the decomposed audio signals.
Various embodiments are described in detail in the discussion below and with reference to the following drawings.
Audio systems, methods and devices configured to reduce wind noise effects are disclosed. Briefly, and in general terms, wind noise may constitute a problem in a variety of audio devices, such as mobile phones, hearing aids and sound recording devices. Disturbances resulting from turbulent air flow proximate to the one or more microphones coupled to the audio device may generate noise that may cause degradation in the audio signal. In particular, audio devices that include more than one microphone may have an elevated susceptibility to wind noise, since the wind noise at the individual microphones is generally uncorrelated. The various embodiments may also find application in reducing the effects of still other sources of noise in audio signals, such as noise stemming from background sources other than wind, self-generated electronic noise, and self-generated electromechanical noise due to the movement of an electromechanical device configured to translate a lens focusing apparatus.
In accordance with the various embodiments, a method of detecting the presence of wind noise spectral content in an audio signal will be described. In the discussion that follows, reference may be made to an array of four microphones that may be positioned on an audio device. It is understood in the following discussion that the microphone array may include fewer than four microphones, or more than four microphones. Further, in the discussion that follows, it is understood that the methods and apparatuses may be implemented using hardware, software and/or firmware elements, or any combination of hardware, software and/or firmware elements. Accordingly, the various embodiments are not to be interpreted as depending on any particular form of implementation.
A time domain response for each of four microphones in a microphone array may be expressed as:
r1(n;blknum); (1)
r2(n;blknum); (2)
r3(n;blknum); (3)
r4(n;blknum); (4)
where the parameter n indicates an index to access the sample points in the time-domain block, and blknum is the block number. The time domain signals as expressed in expressions (1) through (4) may be subjected to a window function, such as a Hamming or a Hanning window function, in order to reduce spectral leakage, as well as other undesired effects. Still other window functions may also be employed, such as, for example, a rectangular or a cosine window function.
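The windowing step can be sketched as follows. This is a minimal illustration only, assuming a Hanning (Hann) window; the function names `hann_window` and `apply_window` and the block length of eight samples are hypothetical and do not come from this disclosure.

```python
import math

def hann_window(length):
    """Return Hann window coefficients for a block of the given length."""
    if length == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]

def apply_window(block, window):
    """Multiply each time-domain sample r(n) by the window coefficient w(n)."""
    if len(block) != len(window):
        raise ValueError("block and window must have the same length")
    return [sample * w for sample, w in zip(block, window)]

# Example: taper a constant block (hypothetical data). The end samples are
# driven to zero, which is what limits leakage in the subsequent transform.
block = [1.0] * 8
windowed = apply_window(block, hann_window(len(block)))
```

A rectangular window would correspond to skipping this step altogether, since every coefficient would equal one.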
The time domain signals as expressed in expressions (1) through (4) may be decomposed into discrete frequency components by performing a discrete Fourier transform (DFT) on the time domain signals in expressions (1) through (4). In accordance with the various embodiments, the DFT may include any one of the algorithms collectively known as the Fast Fourier Transform (FFT). The frequency distributions corresponding to the time domain signals may therefore be represented by:
f1(f;blknum)=F(r1(n;blknum)); (5)
f2(f;blknum)=F(r2(n;blknum)); (6)
f3(f;blknum)=F(r3(n;blknum)); (7)
f4(f;blknum)=F(r4(n;blknum)); (8)
where F is a generalized DFT operator, which may represent the application of the FFT algorithm to the time domain signals in expressions (1) through (4). Autocorrelations may also be generated that may represent an instantaneous power from each of the respective microphones in the microphone array (for selected frequency bins):
f11(f;blknum)=αf11(f;blknum−1)+(1−α)f1(f;blknum)f1*(f;blknum); (9)
f22(f;blknum)=αf22(f;blknum−1)+(1−α)f2(f;blknum)f2*(f;blknum); (10)
f33(f;blknum)=αf33(f;blknum−1)+(1−α)f3(f;blknum)f3*(f;blknum); (11)
f44(f;blknum)=αf44(f;blknum−1)+(1−α)f4(f;blknum)f4*(f;blknum). (12)
In the foregoing expressions (9) through (12), the terms f1*, f2*, f3* and f4* represent the complex conjugates of the transforms f1, f2, f3 and f4 in expressions (5) through (8), and α is a smoothing constant that ranges between zero and one. In the various embodiments, the smoothing constant α may be approximately 0.9, although other suitable values may also be used. Cross correlations may also be generated, which may be expressed as:
f12(f;blknum)=αf12(f;blknum−1)+(1−α)f1(f;blknum)f2*(f;blknum); (13)
f13(f;blknum)=αf13(f;blknum−1)+(1−α)f1(f;blknum)f3*(f;blknum); (14)
f14(f;blknum)=αf14(f;blknum−1)+(1−α)f1(f;blknum)f4*(f;blknum); (15)
f23(f;blknum)=αf23(f;blknum−1)+(1−α)f2(f;blknum)f3*(f;blknum); (16)
f24(f;blknum)=αf24(f;blknum−1)+(1−α)f2(f;blknum)f4*(f;blknum); (17)
f23(f;blknum)=αf23(f;blknum−1)+(1−α)f2(f;blknum)f3*(f;blknum); (18)
f34(f;blknum)=αf34(f;blknum−1)+(1−α)f3(f;blknum)f4*(f;blknum). (19)
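The transform and the recursive smoothing of the correlations can be sketched together. The following is a rough illustration under stated assumptions: it uses a naive O(N²) DFT from the Python standard library in place of an FFT, the names `dft` and `smooth_correlation` are hypothetical, and the four-sample blocks are made-up data rather than values from this disclosure.

```python
import cmath

def dft(block):
    """Naive discrete Fourier transform of one time-domain block."""
    n_samples = len(block)
    return [sum(block[n] * cmath.exp(-2j * cmath.pi * k * n / n_samples)
                for n in range(n_samples))
            for k in range(n_samples)]

def smooth_correlation(previous, spec_a, spec_b, alpha=0.9):
    """Update one smoothed correlation for every frequency bin.

    previous: smoothed values from block blknum-1
    spec_a, spec_b: transforms of the current block for two microphones;
    passing the same spectrum twice yields an autocorrelation.
    """
    return [alpha * p + (1.0 - alpha) * a * b.conjugate()
            for p, a, b in zip(previous, spec_a, spec_b)]

# Example with two short hypothetical microphone blocks.
f1 = dft([0.0, 1.0, 0.0, -1.0])
f2 = dft([1.0, 0.0, -1.0, 0.0])
zeros = [0j] * len(f1)                     # state before the first block
f11 = smooth_correlation(zeros, f1, f1)    # autocorrelation of microphone 1
f12 = smooth_correlation(zeros, f1, f2)    # cross correlation of microphones 1 and 2
```

With α near one, each new block contributes only a small fraction (1−α) of the updated value, so the correlations track slowly varying conditions rather than block-to-block fluctuations.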
Based upon the autocorrelations presented in expressions (9) through (12) and the cross correlations presented in expressions (13) through (19), a cross correlation index (CCI) may be defined as follows:
CCI(f;blknum)=[ABS(f12(f;blknum))/√(f11(f;blknum)f22(f;blknum))]
+[ABS(f13(f;blknum))/√(f11(f;blknum)f33(f;blknum))]
+[ABS(f14(f;blknum))/√(f11(f;blknum)f44(f;blknum))]
+[ABS(f23(f;blknum))/√(f22(f;blknum)f33(f;blknum))]
+[ABS(f24(f;blknum))/√(f22(f;blknum)f44(f;blknum))]
+[ABS(f34(f;blknum))/√(f33(f;blknum)f44(f;blknum))] (20)
In the foregoing expression (20), the operator ABS represents the absolute value function, and the cross correlation index CCI may be evaluated for a selected frequency bin (e.g., a selected portion of the sampled spectrum) and for a selected block number, blknum. Since each of the six terms in expression (20) may range in value between zero, which corresponds to uncorrelated signals, and one, which corresponds to completely correlated signals, expression (20) may have a magnitude that ranges in value between approximately zero and approximately six. Consequently, a relatively low magnitude value for the CCI generally reflects uncorrelated signals, and may indicate the presence of wind noise spectral content in an audio signal. Correspondingly, a relatively higher magnitude value for the CCI generally reflects correlated signals, and may indicate that wind noise spectral content is absent from the audio signal, or present only to a limited degree.
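The evaluation of the cross correlation index for a single frequency bin can be sketched as below. This is an illustrative sketch only: the function name `cross_correlation_index`, the dictionary layout, and the small constant `eps` that guards against division by zero are assumptions, not part of this disclosure.

```python
import math
from itertools import combinations

def cross_correlation_index(auto, cross, eps=1e-12):
    """Sum the normalized cross correlations over all microphone pairs.

    auto:  dict mapping microphone index i to the smoothed autocorrelation
    cross: dict mapping a pair (i, j), i < j, to the smoothed cross correlation
    eps:   hypothetical guard against a zero denominator
    """
    total = 0.0
    for i, j in combinations(sorted(auto), 2):
        denominator = math.sqrt(abs(auto[i]) * abs(auto[j])) + eps
        total += abs(cross[(i, j)]) / denominator
    return total

# Example with four microphones and made-up correlation values.
pairs = list(combinations(range(1, 5), 2))      # six microphone pairs
auto = {i: 1.0 for i in range(1, 5)}
cci_correlated = cross_correlation_index(auto, {p: 1.0 for p in pairs})
cci_uncorrelated = cross_correlation_index(auto, {p: 0.0 for p in pairs})
```

With four microphones there are six pairs, which is why the index saturates near six for fully correlated signals and falls toward zero for uncorrelated ones.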
At 14, the audio signals acquired at 12 may be transformed using a DFT algorithm to generate frequency distributions corresponding to the audio signals, as shown in expressions (5) through (8) above. In an embodiment, one of the Fast Fourier Transform algorithms may be employed. Accordingly, the frequency distributions may include frequency bins of a predetermined frequency range, for example, each bin may be approximately 16 Hz wide, although the bins may include any suitable frequency range. At 16, autocorrelations and cross correlations may be computed, in accordance with the expressions (9) through (19) above, and the Cross Correlation Index (CCI) may be computed, as shown in expression (20) above, at 18.
At decision 20, the CCI may be compared to a predetermined value VAL to determine if wind noise spectral content is present. Since the CCI may range between approximately zero and approximately six, the predetermined value VAL may include any value between approximately zero and approximately six. For example, VAL may be selected to be approximately 3.5, although other values that are either greater than 3.5 or less than 3.5 may also be used. In any case, if the CCI computed at 18 is greater than the selected value for VAL, then the method 10 determines that wind noise spectral content is not present in the selected block of the acquired audio signals, as shown at 22. Alternatively, if the CCI is determined to be less than the selected value for VAL, then the method 10 determines that wind noise spectral content is present in the selected block, at 24, and the pertinent data, such as the block number (blknum), or other pertinent data, may be stored, as shown at 26. At decision 28, the method 10 determines whether all blocks have been processed by the method 10, by comparing the blknum to a fixed value that expresses the maximum number of blocks to be processed (MAXBLK). If all blocks have been processed, then the method 10 ends. Otherwise, at 29, blknum is incremented, and the method 10 returns to 12, and acquires another audio signal block.
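The per-block decision loop can be sketched as follows, assuming that one CCI value has already been computed for each block. The function name `detect_wind_blocks` is hypothetical, and treating a CCI exactly equal to VAL as "no wind noise" is an assumption of this sketch, since the text addresses only the greater-than and less-than cases.

```python
def detect_wind_blocks(cci_per_block, val=3.5):
    """Return the block numbers whose CCI falls below the value VAL."""
    flagged = []
    for blknum, cci in enumerate(cci_per_block):
        if cci < val:
            # Low correlation between microphones: wind noise spectral
            # content is taken to be present, so the block number is stored.
            flagged.append(blknum)
    return flagged

# Example with made-up CCI values for four consecutive blocks.
wind_blocks = detect_wind_blocks([5.2, 1.1, 3.5, 0.4])
```

Here the list length plays the role of MAXBLK, and the returned block numbers stand in for the pertinent data that may be stored.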
In accordance with the various embodiments, another method of detecting the presence of wind noise spectral content in an audio signal will now be described. Briefly, and in general terms, the method includes calculating a power in a low frequency region of the audio signals, and does not require the intermediate computation of the various autocorrelations and cross correlations, as described in the method disclosed above. The power in the low frequency audio region may be expressed as follows:
LFP(blknum)=α[LFP(blknum−1)]+(1−α)[SUM[ABS(f1(0,blknum)) . . . ABS(f1(LFNUM,blknum))]/SUM[ABS(f1(0,blknum)) . . . ABS(f1(BLKLEN/2−1,blknum))]] (21)
In the foregoing expression (21), SUM represents a summation operator, which is operable to form a sum of all of the arguments. Accordingly, in expression (21), the suitably transformed time domain signals are summed over all frequency bins in a selected low frequency range. For example, in the numerator portion of expression (21), the transformed time domain signals may be summed from the zero frequency bin to a selected upper limit, LFNUM. In the various embodiments, the LFNUM may be approximately 40, so that if each frequency bin is approximately 16 Hz wide, the numerator of expression (21) is summed for all frequency bins up to approximately 640 Hz, although other values for LFNUM and other frequency bin values may also be used. In the denominator portion of expression (21), the suitably transformed time domain signals are summed from the zero frequency bin to an upper limit (BLKLEN/2−1), which covers essentially all of the non-redundant frequency bins when the time domain signals are real-valued. In expression (21), α is the smoothing constant as previously described, and may have a value of approximately 0.9, although other suitable values may also be used.
If the computed value for LFP(blknum) is greater than a predetermined threshold value, then wind noise spectral content is present in the audio signals, since wind noise tends to concentrate power in the low frequency region. Correspondingly, if the computed value for LFP(blknum) is less than, or even equal to, the selected threshold value, then wind noise spectral content is absent from the audio signals. In accordance with the various embodiments, the threshold value may be greater than 0.1, and less than 0.9. In another of the various embodiments, the selected threshold value ranges between approximately 0.5 and approximately 0.7.
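The low frequency power of expression (21) can be sketched for a single microphone as follows. This is an illustration under stated assumptions: the function name `low_frequency_power` is hypothetical, the eight-bin spectrum is made-up data, and the handling of an all-zero block is a choice of this sketch rather than something the text specifies.

```python
def low_frequency_power(spectrum, previous_lfp, lfnum=40, alpha=0.9):
    """Smoothed ratio of low-frequency magnitude to total magnitude.

    spectrum:     transformed bins of the current block (length BLKLEN)
    previous_lfp: value of LFP for block blknum-1
    """
    half = len(spectrum) // 2                  # bins 0 .. BLKLEN/2 - 1
    magnitudes = [abs(x) for x in spectrum[:half]]
    total = sum(magnitudes)
    if total == 0:
        return alpha * previous_lfp            # silent block: sketch assumption
    low = sum(magnitudes[:lfnum + 1])          # bins 0 .. LFNUM inclusive
    return alpha * previous_lfp + (1.0 - alpha) * low / total

# Example with a made-up 8-bin spectrum and LFNUM = 1.
spectrum = [4.0, 3.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0]
ratio_only = low_frequency_power(spectrum, 0.0, lfnum=1, alpha=0.0)
smoothed = low_frequency_power(spectrum, 0.5, lfnum=1)
```

With the default LFNUM of 40 and bins of approximately 16 Hz, the numerator covers bins up to roughly 40 × 16 Hz = 640 Hz.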
At 34, the audio signals acquired at 32 may be transformed using a DFT algorithm, such as one of the Fast Fourier Transform (FFT) algorithms, as discussed in greater detail above. At 36, the low frequency power LFP(blknum) may be computed, in accordance with the foregoing expression (21). At decision 38, the calculated LFP may be compared to the selected threshold value. Accordingly, if LFP is less than the selected threshold value, then no wind noise spectral content is detected in the selected block, at 40. Otherwise, if the calculated LFP is greater than the selected threshold value, at 42, wind noise spectral content is detected in the selected block. At 44, the noise-related data may be stored for the selected block. At 46, the method 30 determines whether all blocks have been processed by comparing the blknum to MAXBLK. If all blocks have been processed, then the method 30 ends. Otherwise, blknum is incremented at 48, and the method 30 returns to 32, and acquires another audio signal block.
Still another method of detecting the presence of wind noise spectral content in an audio signal may now be described. The presently disclosed method includes arranging the suitably transformed time domain signals into an array A, so that:
A(f,blknum)=[LOG(fi(f,blknum))] (22)
Where LOG is a logarithmic operator that operates on the transformed time domain signals fi(f, blknum). A slope may be calculated by applying a LINREG operator to the array A. The LINREG operator performs a linear regression on the elements of the array A, and returns a value for SLOPE, as follows:
SLOPE(blknum)=LINREG(A) (23)
The value of SLOPE obtained from expression (23) may then be compared to a selected threshold value, so that if SLOPE<threshold, then the audio signal includes wind noise spectral content. In the various embodiments, a suitable threshold value may be within a range of values between approximately negative one and approximately one. In another of the various embodiments, the threshold value may be approximately zero. In still another of the various embodiments, the threshold may be approximately −0.02.
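The slope calculation can be sketched as an ordinary least-squares fit. Several details here are assumptions, because the text does not fix them: the logarithm is taken base 10 on the magnitude of each bin, the regression is performed against the bin index, and a small constant `eps` avoids taking the logarithm of zero. The function name `spectral_slope` is hypothetical.

```python
import math

def spectral_slope(spectrum, eps=1e-12):
    """Least-squares slope of log-magnitude versus frequency bin index."""
    log_mag = [math.log10(abs(x) + eps) for x in spectrum]
    n = len(log_mag)
    if n < 2:
        raise ValueError("at least two bins are needed for a regression")
    mean_x = (n - 1) / 2.0
    mean_y = sum(log_mag) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(log_mag))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator

# Example: a made-up spectrum whose magnitude falls by 0.1 decades per bin,
# and a flat spectrum for comparison.
falling = [10.0 ** (-0.1 * k) for k in range(10)]
slope_falling = spectral_slope(falling)
slope_flat = spectral_slope([1.0] * 10)
```

A spectrum that decays with frequency gives a negative slope, so it falls below a threshold near zero, whereas a flat spectrum gives a slope of approximately zero.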
At 54, the acquired audio signals may be transformed using a DFT algorithm, such as one of the Fast Fourier Transform (FFT) algorithms. At 56, the transformed signals may be arranged into an array. When the transformed signals are arranged in the array, the logarithm of each of the elements may be taken, as shown in expression (22) above. At 58, a linear regression on the array elements may be performed, as shown in expression (23), to generate a slope value. At decision 60, the slope value generated at 58 may be compared to a predetermined threshold value. At 62, if the slope value is greater than the threshold value, then the method 50 determines that there is no wind noise spectral content in the audio signals. If the slope value is less than the threshold value, then the method 50 determines that wind noise spectral content is present in the audio signals, at 64. At 66, the noise-related data may be stored. At 68, the method 50 determines whether all blocks have been processed by comparing the blknum to MAXBLK. If all blocks have been processed, then the method 50 ends. Otherwise, blknum is incremented at 70, and the method 50 returns to 52, and acquires another audio signal block.
Still yet another method of detecting the presence of wind noise spectral content in an audio signal is described below. The disclosed method includes calculating a Cross Correlation Index (CCI), as shown in expression (20), for a selected number of the frequency bins. The CCI values may then be averaged to yield an average value (AVAL) that may be compared to a predetermined threshold value to determine if wind noise spectral content is present in an audio signal. Accordingly, AVAL may be expressed as:
AVAL=AVG(CCI(0,blknum) . . . CCI(LFNUM,blknum)) (24)
Where AVG is an operator that performs arithmetic averaging on the arguments in expression (24), and LFNUM is a parameter that expresses a maximum bin number to be included in the averaging. The value for AVAL may be compared to a predetermined threshold value, so that if AVAL<threshold, wind noise spectral content may be present in the audio signals.
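The averaging in expression (24) can be sketched as follows, assuming the CCI has already been evaluated for each frequency bin of the current block. The function name `average_cci` and the sample values are hypothetical.

```python
def average_cci(cci_bins, lfnum):
    """Arithmetic mean of the CCI over bins 0 .. LFNUM inclusive."""
    selected = cci_bins[:lfnum + 1]
    if not selected:
        raise ValueError("no frequency bins selected")
    return sum(selected) / len(selected)

# Example with made-up per-bin CCI values and LFNUM = 2.
aval = average_cci([1.0, 2.0, 3.0, 6.0], lfnum=2)
```

Averaging over several bins reduces the influence of any single bin on the decision, at the cost of frequency resolution.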
In another method, a combination of the foregoing methods may be employed to determine if audio signals include wind noise spectral content. As a preliminary matter, the audio signals, generally referred to as A-format signals, may be decomposed (or processed) into B-format signals having non-directional and directional components that may generally include an omnidirectional component W, and X, Y and Z directional components. Accordingly, the following definitions may be made:
AVG1=AVG(LFP(a(1)), LFP(a(2)) . . . LFP(a(n)));
AVG2=AVG(LFP(b(1)), LFP(b(2)) . . . LFP(b(n)));
AVG3=AVG(SLOPE(a(1)), SLOPE(a(2)) . . . SLOPE(a(n)));
AVG4=AVG(SLOPE(b(1)), SLOPE(b(2)) . . . SLOPE(b(n))); and
AVG5=AVG(CCI)
Where LFP may be calculated according to expression (21) presented above, the SLOPE may be calculated according to expression (23) presented above, and the CCI may be calculated according to expression (20). In the foregoing, a(i) represents A-format signals corresponding to the discrete frequency bins, and b(i) includes B-format signals derived from the A-format signals. Briefly, and in general terms, audio signals (e.g., A-format audio signals) received by an audio device may be decomposed to yield B-format signals that exhibit both non-directional and directional characteristics. For example, the A-format signals may be decomposed and processed to form the B-format signals having a W component, which is a generally non-directional monaural component, and up to three directional components, generally referred to as the X, Y and Z B-format components. The foregoing definitions may be combined to yield a parameter COMB:
COMB=C1·AVG1+C2·AVG2+C3·AVG3+C4·AVG4+C5·AVG5 (25)
Where C1, C2, C3, C4 and C5 are constants that may be selected to provide suitable weighting in expression (25). In order to determine if a wind noise spectral content may be present in the audio signals, the parameter COMB may be compared to a threshold value, so that if COMB<threshold, then wind noise spectral content may be present in the audio signals. Although expression (25) utilizes the LFP, SLOPE and CCI in evaluating COMB, it is understood that in other embodiments, the LFP and the SLOPE may be used, or alternatively, the LFP and CCI, or the SLOPE and the CCI may be used.
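The weighted combination of expression (25) can be sketched as below. The function name `combined_score` is hypothetical, and the averages and weights in the example are purely illustrative values, since the text does not give values for C1 through C5.

```python
def combined_score(averages, weights):
    """Weighted sum of the averaged detection features."""
    if len(averages) != len(weights):
        raise ValueError("one weight is required for each average")
    return sum(c * a for c, a in zip(weights, averages))

# Example with made-up values for AVG1..AVG5 and C1..C5.
averages = [0.8, 0.7, -0.05, -0.04, 1.5]
weights = [-1.0, -1.0, 10.0, 10.0, 0.5]
comb = combined_score(averages, weights)
wind_present = comb < 0.0        # hypothetical threshold of zero
```

Using only two of the three feature types, as the text permits, corresponds to setting the weights of the unused averages to zero.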
The microphone array 132 may be coupled to an audio processing unit 134 that may be configured to provide power to the array 132, and to receive and amplify signals received from the array 132. The audio processing unit 134 may also be configured to perform other signal processing functions, such as analog-to-digital (A/D) conversion of the analog signals received from the array 132, and to provide storage for analog or digital signals. The audio processing unit 134 may also be configured to provide level and data compression of the received signals. Still other audio enhancements may be provided by the audio processing unit 134, including equalization and filtering.
The audio processing unit 134 may be coupled to an audio decomposition unit 136. Briefly, the audio decomposition unit 136 may be configured to receive non-decomposed audio signals (e.g., A-format audio signals) from the audio processing unit 134, and decompose the received A-format audio signals into B-format signals (e.g., W, X, Y and Z components) that exhibit both non-directional and directional characteristics.
The apparatus 130 also includes a wind noise spectrum reduction unit 140 that is configured to receive B-format signals from the audio decomposition unit 136, and also coupled to the wind noise spectrum estimation unit 138. The wind noise spectrum reduction unit 140 may be configured to receive the wind noise spectral information generated by the wind noise spectrum estimation unit 138, and to remove the wind noise spectral effects from the B-format signals received by the wind noise spectrum reduction unit 140. Accordingly, the wind noise spectrum reduction unit 140 may generate an output 142 that has the wind noise spectral portion removed, as will be discussed in greater detail below.
The apparatus 150 may also include a frequency domain cross correlation unit 152 that is operable to transform the A-format signals, and to form autocorrelations and cross correlations based upon the transformed values. In addition, the frequency domain cross correlation unit 152 may be configured to generate the cross correlation index (CCI) and to compare the CCI to a threshold value, as discussed in detail above.
The apparatus 170 may also include the frequency domain cross correlation unit 152, which was discussed earlier.
The wind noise spectral estimation unit 174 may include the wind noise spectral estimation unit 138, as also discussed earlier.
A noise spectrum estimation unit 182 may be configured to receive input signals 184 from a plurality of other spectrum estimation units, each of which is specifically tailored to estimate the noise spectrum of a particular noise source. For example, the input signals 184 may include signals from an electronic noise spectrum estimation unit, a motor drive mechanism noise spectrum estimation unit, and a background noise spectrum estimation unit, although other input signals 184 may be dedicated to other noise spectrum estimation units, if desired. The noise spectrum estimation unit 182 may therefore be configured to process the input signals 184 and the wind noise spectrum estimate received from the wind noise spectrum estimation unit 138. The noise spectrum estimation unit 182 may be coupled to a noise reduction unit 186 that may be configured to substantially remove the effects of the noise sources, so that an output 188 may communicate audio signals that are substantially unaffected by the noise sources.
One suitable form for the gain factor may be expressed by G(f;n)=1−(ABS(N(f;n))/ABS(Sin(f;n))), provided that ABS(Sin(f;n))−C6 ABS(N(f;n))≥C7 ABS(N(f;n)), where C6 and C7 are suitably selected constants. Another suitable form for the gain factor may include G(f;n)=ABS(N(f;n))/ABS(Sin(f;n)), when ABS(Sin(f;n))−C6 ABS(N(f;n))<C7 ABS(N(f;n)). In the various embodiments, the constant C6 may range between approximately zero and one. In still other embodiments, the constant C6 may be approximately 0.1. The constant C7 may also range between approximately zero and one, and in some embodiments may range between approximately 0.3 and 0.4.
COND=(ABS(S(f;n))−C8 ABS(N(f;n))≥C9 ABS(N(f;n))) ‘OR’ ((f<fL) ‘AND’ (maskCC(f;n)=1)), where S(f;n) and N(f;n) are the input spectrum and the wind noise estimate, respectively, and ‘OR’ and ‘AND’ are Boolean logical operators. MaskCC(f;n) is a variable mask, which will be described in greater detail below. The constant C8 may be greater than, or equal to one, but in the various embodiments, may range between approximately three and approximately six. The constant C9 may range between zero and one, but in the various embodiments, may range between approximately 0.005 and approximately 0.1. Since COND is a logical expression, it will yield a value of ‘TRUE’ or ‘FALSE’, depending upon whether the foregoing condition is satisfied. At decision 204, a logical state of the logical conditional variable COND may be determined. If COND is ‘TRUE’, then the gain may be expressed as: G(f;n)=1−RATIO, where RATIO=ABS(N(f;n))/ABS(Sin(f;n)), as shown at 206. Otherwise, the method 200 proceeds to 208, where the gain may be expressed as: G(f;n)=RATIO.
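The gain selection governed by COND can be sketched for a single frequency bin as follows. This is a rough illustration under stated assumptions: S(f;n) and Sin(f;n) are treated as the same input spectrum, the default values for C8 and C9 are simply picked from within the ranges given above, a zero-magnitude input bin is assigned zero gain, and the function and parameter names (`bin_gain`, `f_low`, `mask_cc`) are hypothetical.

```python
def bin_gain(s_mag, n_mag, f, f_low, mask_cc, c8=4.0, c9=0.05):
    """Select the gain for one frequency bin according to COND.

    s_mag:   magnitude of the input spectrum for this bin
    n_mag:   magnitude of the wind noise estimate for this bin
    f:       bin index (or frequency) compared against the limit f_low
    mask_cc: current value of the mask for this bin
    """
    if s_mag == 0.0:
        return 0.0                      # sketch assumption for a silent bin
    ratio = n_mag / s_mag
    cond = (s_mag - c8 * n_mag >= c9 * n_mag) or (f < f_low and mask_cc == 1)
    return 1.0 - ratio if cond else ratio

# Examples with made-up magnitudes.
g_strong_signal = bin_gain(10.0, 1.0, f=100, f_low=50, mask_cc=0)
g_noise_dominated = bin_gain(4.0, 1.0, f=100, f_low=50, mask_cc=0)
g_masked_low_bin = bin_gain(4.0, 1.0, f=10, f_low=50, mask_cc=1)
```

In the first example the input exceeds the noise estimate by a wide margin, so COND is satisfied and the gain stays close to one. In the third example the same magnitudes as the second are retained at a higher gain because the bin lies below the limit and the mask marks it as a signal bin.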
The variable maskCC(f;n) may be calculated by first assigning a value to a temporary variable mask(f;n) according to a comparison between CCI(f;n) and a selected threshold value CCTH. Accordingly, if CCI(f;n)>CCTH, the temporary variable mask(f;n) may be assigned a value of one. Otherwise, mask(f;n) is set equal to zero. With values assigned to the temporary variable mask(f;n), if any bin is characterizable as a signal bin, the immediate neighbors of that bin may also be included. Accordingly, if the temporary variable mask(f;n)>0, then mask(f−PKWDTH; n)= . . . mask(f+PKWDTH; n)=1, where PKWDTH is a selected parameter. Furthermore, if any frequency bin may be categorized as a signal bin, then the status may be maintained using a hangover method. Briefly, a hangover method may be implemented by selecting temporary variable mask(f;n) values that are greater than zero, and assigning maskCC(f;n)=HOVER, where HOVER is a selected hover block. If maskCC(f;n)>0, then the maskCC(f;n) may be decremented, so that maskCC(f;n)=maskCC(f;n)−1.
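The mask calculation, including neighbor widening and hangover, can be sketched as below. This reflects one possible reading of the description: bins flagged in the current block are reset to HOVER, and all other bins count down toward zero. The function name `update_mask_cc` and the example values are hypothetical.

```python
def update_mask_cc(cci_bins, mask_cc_prev, ccth, pkwdth, hover):
    """Return the updated mask values for one block."""
    n_bins = len(cci_bins)
    # Step 1: temporary mask from the comparison of the CCI against CCTH.
    raw = [1 if cci > ccth else 0 for cci in cci_bins]
    # Step 2: extend every signal bin to its neighbors within PKWDTH bins.
    mask = [0] * n_bins
    for f, flag in enumerate(raw):
        if flag:
            for g in range(max(0, f - pkwdth), min(n_bins, f + pkwdth + 1)):
                mask[g] = 1
    # Step 3: hangover. Signal bins restart at HOVER; others are decremented.
    mask_cc = []
    for flag, previous in zip(mask, mask_cc_prev):
        if flag:
            mask_cc.append(hover)
        else:
            mask_cc.append(previous - 1 if previous > 0 else 0)
    return mask_cc

# Example: one correlated bin in the first block, none in the second.
first = update_mask_cc([1, 1, 5, 1, 1, 1], [0] * 6, ccth=3, pkwdth=1, hover=3)
second = update_mask_cc([1, 1, 1, 1, 1, 1], first, ccth=3, pkwdth=1, hover=3)
```

The widening in step 2 is computed from the unwidened comparison result, so a single signal bin spreads by at most PKWDTH bins in each direction rather than cascading across the spectrum.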
From the foregoing it will be appreciated that, although specific embodiments have been described herein for purposes of illustration, various modifications may be made without deviating from the spirit and scope of the disclosure. Furthermore, where an alternative is disclosed for a particular embodiment, this alternative may also apply to other embodiments even if not specifically stated.