The present invention relates to the digital processing of signals from microphones or other such transducers, and in particular relates to a device and method for measuring the amount of wind noise or the like in such signals, for example to enable wind noise compensation or suppression to be initiated or controlled depending on the amount of wind noise present.
Wind noise is defined herein as a microphone signal generated from turbulence in an air stream flowing past a microphone port or over a microphone membrane. This is as opposed to the sound of wind blowing past other objects distal from the microphone, such as the sound of rustling leaves as wind blows past a tree in the far field, and such distal noise sources do not comprise wind noise within the present definition.
For wearable devices, the proximity of a human body (e.g. head, torso, and/or hand) may generate additional turbulence and wind noise. Wind noise is impulsive and often has an amplitude large enough to exceed the nominal speech amplitude. Wind noise can thus be objectionable to the user and/or can mask other signals of interest. It is desirable that digital signal processing devices are configured to take steps to ameliorate the deleterious effects of wind noise upon signal quality. To do so requires a suitable means for reliably measuring wind noise when it occurs, without falsely indicating that wind noise exists to some extent when in fact other factors are affecting the signal.
Some previous approaches to wind noise detection (WND) assume that non-wind sounds are generated in the far field and thus have a similar sound pressure level (SPL) and phase at each microphone, whereas wind noise is substantially uncorrelated across microphones. However, for non-wind sounds generated in the far field, the SPL between microphones can substantially differ due to localized sound reflections, room reverberation, and/or differences in microphone coverings, obstructions, or location such as due to orthogonal plane placement of microphones on a smartphone with one looking inwards and the other looking outwards. Substantial SPL differences between microphones can also occur with non-wind sounds generated in the near field, such as a telephone handset held close to the microphones. Differences in microphone output signals can also arise due to differences in microphone sensitivity, i.e. mismatched microphones, which can be due to relaxed manufacturing tolerances for a given model of microphone, or the use of different models of microphone in a system.
The spacing between the microphones causes non-wind sounds to have different phase at each microphone sound inlet, unless the sound arrives from a direction where it reaches both microphones simultaneously. In directional microphone applications, the axis of the microphone array is usually pointed towards the desired sound source, which gives the worst-case time delay and hence the greatest phase difference between the microphones.
When the wavelength of a received sound is much greater than the spacing between microphones, i.e. at low frequencies, the microphone signals are fairly well correlated and previous WND methods might not falsely detect wind at such frequencies. However, when the received sound wavelength approaches the microphone spacing, the phase difference causes the microphone signals to become less correlated and non-wind sounds can be falsely detected as wind. The greater the microphone spacing, the lower the frequency above which non-wind sounds will be, or might be, falsely detected as wind, i.e. the greater the portion of the audible spectrum in which false detections might occur. False detection may also occur due to other causes of phase differences between microphone signals, such as localized sound reflections, room reverberation, and/or differences in microphone phase response or inlet port length. Given that the spectral content of wind noise at microphones can extend from below 100 Hz to above 10 kHz depending on factors such as the hardware configuration, the presence of a user's head or hand, and the wind speed, it is desirable for wind noise detection to operate satisfactorily throughout much if not all of the audible spectrum, so that wind noise can be detected and suitable suppression means activated only in sub bands where wind noise is problematic.
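As a rough worked illustration of the spacing argument above, the following sketch computes the frequency at which the acoustic wavelength equals a given microphone spacing, together with the worst-case phase difference for sound arriving along the array axis. The speed of sound (343 m/s), the example spacings and the function names are assumptions chosen for illustration and do not appear in this specification.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at roughly 20 degrees C


def wavelength_match_frequency(spacing_m):
    """Frequency (Hz) at which the wavelength equals the microphone spacing."""
    return SPEED_OF_SOUND_M_S / spacing_m


def endfire_phase_difference(frequency_hz, spacing_m):
    """Worst-case phase difference (radians) for sound arriving along the array axis."""
    delay_s = spacing_m / SPEED_OF_SOUND_M_S
    return 2.0 * math.pi * frequency_hz * delay_s


# Larger spacings push the problematic region lower in frequency.
for spacing in (0.01, 0.05, 0.15):
    f = wavelength_match_frequency(spacing)
    print(f"spacing {spacing * 100:.0f} cm: wavelength equals spacing near {f:.0f} Hz")
```

Under these assumptions, widening the spacing from 1 cm to 15 cm moves the frequency at which the wavelength equals the spacing from tens of kilohertz down into the speech band, consistent with the observation that greater spacing enlarges the portion of the audible spectrum in which false detections might occur.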
In light of the above-noted difficulties of differentiating wind noise from other signal types, to date wind noise has been addressed by coarse detection methods, being systems which simply output a binary flag indicating whether wind noise is present or absent. In such systems the binary output detection flag is then used to alter the operation of other processing modules, such as to switch wind noise reduction on or off in a binary manner. To even produce such a binary detection output is nevertheless difficult to accomplish with sufficient accuracy, due to the complexities noted above.
Any discussion of documents, acts, materials, devices, articles or the like which has been included in the present specification is solely for the purpose of providing a context for the present invention. It is not to be taken as an admission that any or all of these matters form part of the prior art base or were common general knowledge in the field relevant to the present invention as it existed before the priority date of each claim of this application.
Throughout this specification the word “comprise”, or variations such as “comprises” or “comprising”, will be understood to imply the inclusion of a stated element, integer or step, or group of elements, integers or steps, but not the exclusion of any other element, integer or step, or group of elements, integers or steps.
In this specification, a statement that an element may be “at least one of” a list of options is to be understood that the element may be any one of the listed options, or may be any combination of two or more of the listed options.
According to a first aspect, the present invention provides a device for measuring wind noise, the device comprising:
According to a second aspect, the present invention provides a non-transitory computer readable medium comprising computer program code means to make a computer execute a procedure for wind noise measurement, the computer program product comprising:
According to a third aspect, the present invention provides a method for measuring wind noise, the method comprising:
In some embodiments of the invention, the scalar metric reflecting the intensity of wind noise may be a single scalar non-binary value. The scalar metric reflecting the intensity of wind noise may, in some embodiments, be expressed as a probability between 0 and 1, reflecting a probability of the presence of wind noise.
In some embodiments of the invention, the scalar non-binary metric reflecting an intensity of wind noise comprises a plurality of measures respectively determined from distinct microphone signals. In some embodiments of the invention, at least some of the plurality of measures comprise scalar non-binary values. In some embodiments of the invention, the scalar metric reflecting the intensity of wind noise is a measure of wind noise power.
In some embodiments of the invention, there may be provided at least one wind noise measurement cell receiving microphone signals from at least two microphones, wherein the wind noise measurement cell is controllable by a control signal to measure wind noise either by (a) comparing sample distributions from two microphone signals or (b) comparing temporally spaced sample distributions from a single microphone signal. The control signal may in some embodiments be configured to exclude a particular microphone signal from the cell measurements at times when the respective microphone is occluded. In some embodiments of the invention, wind noise measures from at least two wind noise measurement cells are passed to a decision function module configured to produce a combined output measure from the individual wind noise measures.
In some embodiments of the invention, the first and second signals are made to be temporally distinct by taking temporally distinct samples of a single microphone signal. In some embodiments of the invention, the first and second signals are made to be spatially distinct by taking the first signal from a first microphone and taking the second signal from a second microphone spaced apart from the first microphone.
In some embodiments of the invention, for each sub-band of a plurality of sub-bands, a scalar non-binary metric reflecting an intensity of wind noise present in the first and second signals in that sub-band may be derived. In some embodiments of the invention, wind noise may be measured first in respect of a lower frequency sub-band, and may be measured in respect of a higher frequency sub-band only if non-negligible wind noise is measured in the lower frequency sub-band. In some embodiments of the invention, wind noise reduction may be applied only in each sub-band in which the measurement of wind noise is greater than a respective sub-band threshold.
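The lower-band-first measurement and the per-sub-band thresholding described above might be sketched as follows. This is a minimal illustration only: `measure_fn`, the `negligible` level and the threshold values are hypothetical and are not specified in this document.

```python
def measure_sub_bands(measure_fn, num_bands, negligible=0.05):
    """Measure wind noise from the lowest sub-band upward.

    measure_fn(band) is a hypothetical callable returning a scalar wind
    metric for one sub-band. Once a band shows only negligible wind noise,
    higher bands are not measured and are reported as 0.0.
    """
    metrics = [0.0] * num_bands
    for band in range(num_bands):
        metrics[band] = measure_fn(band)
        if metrics[band] <= negligible:
            break  # assumed wind-free above this band; skip further measurement
    return metrics


def bands_needing_reduction(metrics, thresholds):
    """Indices of sub-bands whose metric exceeds that band's own threshold."""
    return [b for b, (m, t) in enumerate(zip(metrics, thresholds)) if m > t]
```

Stopping at the first negligible band reflects the tendency, noted elsewhere in this specification, for wind noise energy to be concentrated at the low portion of the spectrum.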
In some embodiments of the invention, the difference between the first distribution and the second distribution may be calculated and copied to more than one wind noise measurement block.
In some embodiments of the invention, the decision function module is configured to produce the combined output measure as a scalar metric from the individual wind noise measures by applying a neural network. In some embodiments of the invention, the decision function module may be configured to produce the combined output measure as a scalar metric from the individual wind noise measures by applying a hidden Markov model. In some embodiments of the invention, the decision function module is configured to produce the combined output measure as a binary metric from the individual wind noise measures by applying a truth table.
In some embodiments of the invention, the device may be a telephony headset or handset, a still camera, a video camera, a tablet computer, a cochlear implant or a hearing aid.
In some embodiments, the decision function module is configured to produce the combined output measure as a scalar metric from the individual wind noise measures by applying one or more of: averaging, weighted sums, maxima, minima, or a combination thereof.
In some embodiments, each microphone signal is matched for amplitude so that an expected variance of each signal is the same or approximately the same. In some embodiments, the first and second microphone signals are matched for an acoustic signal of interest such as speech before the wind noise measurement is performed.
In some embodiments, the distribution of each of the first and second signals comprises a cumulative distribution of signal sample magnitude. In some embodiments, the distribution of each of the first and second signals is determined only at one or more selected values. In some embodiments, calculating the difference between the first distribution and the second distribution is performed by calculating the point-wise difference between the first and second distribution at each selected value, and summing the absolute values of the point-wise differences to produce a measure of the difference between the first distribution and the second distribution.
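The distribution comparison described above can be sketched as follows: the cumulative distribution of sample magnitude is evaluated only at a few selected values, and the absolute point-wise differences are summed. The function names and the choice of selected values are illustrative assumptions, and frames are assumed to be non-empty.

```python
def edf_at(samples, points):
    """Empirical cumulative distribution of sample magnitude, evaluated at `points`."""
    n = len(samples)
    magnitudes = [abs(s) for s in samples]
    return [sum(1 for m in magnitudes if m <= p) / n for p in points]


def distribution_difference(frame_a, frame_b, points):
    """Sum of absolute point-wise differences between the two distributions."""
    dist_a = edf_at(frame_a, points)
    dist_b = edf_at(frame_b, points)
    return sum(abs(a - b) for a, b in zip(dist_a, dist_b))
```

Two frames with identical magnitude distributions yield a difference of zero, while frames whose magnitudes are distributed differently yield a larger value, which is what allows the result to serve as a non-binary measure.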
In some embodiments, the or each microphone signal is high pass filtered to remove any DC component. In some embodiments, the wind noise measurement is performed on a frame-by-frame basis by comparing the distribution of samples from a single frame of each signal. In some embodiments, the difference between the first distribution and the second distribution is smoothed over multiple frames.
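A minimal sketch of the DC removal and multi-frame smoothing steps is given below. The first-order DC-blocking filter, the exponential smoother and the coefficients `alpha` and `beta` are assumptions made for illustration; this specification does not prescribe a particular filter or smoothing method.

```python
def remove_dc(samples, alpha=0.995):
    """First-order DC-blocking high-pass filter: y[n] = x[n] - x[n-1] + alpha * y[n-1]."""
    out = []
    prev_x = 0.0
    prev_y = 0.0
    for x in samples:
        y = x - prev_x + alpha * prev_y
        out.append(y)
        prev_x, prev_y = x, y
    return out


def smooth_over_frames(per_frame_values, beta=0.9):
    """Exponential smoothing of a per-frame metric: s = beta * s + (1 - beta) * v."""
    smoothed = []
    s = None
    for v in per_frame_values:
        s = v if s is None else beta * s + (1.0 - beta) * v
        smoothed.append(s)
    return smoothed
```

A larger `beta` gives a steadier metric at the cost of slower response to the onset of wind noise, so the value would in practice be tuned to the needs of the downstream modules.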
An example of the invention will now be described with reference to the accompanying drawings, in which:
Corresponding reference characters indicate corresponding components throughout the drawings.
The signals from each microphone can however each be independently impacted by wind noise arising from wind passing the respective microphone port and in the immediate vicinity of each respective microphone.
In general, in accordance with the invention, the wind noise measurement output comprises at least one scalar metric, or a collection of binary metrics and/or scalar metrics which collectively constitute a non-binary output, to thereby indicate a severity or intensity of wind noise observed in the microphone signals. In this embodiment this analysis is performed on a sub-band basis, whereby for each sub-band the WNM 200 produces an output of at least one scalar non-binary metric, or a collection of binary and/or scalar metrics which collectively constitute a non-binary output, which indicates an intensity of wind noise observed in the microphone signals in that particular sub-band. The “per-sub-band” wind intensity metrics are output for example for use by a wind noise reduction (WNR) module configured to apply any suitable technique to reduce wind noise in affected sub-bands while attempting to preserve the target signal (e.g. speech), responsive to the observed severity of wind noise. Any suitable wind noise reduction technique may be applied.
The scalar y2 is a combined overall wind presence indicator which in this embodiment is produced by a decision function block DFΣ 360, by OR-ing the individual decisions y11, y21, and y31. In alternative embodiments, the individual decisions may be AND-ed, or any other method of aggregating the individual decisions into a single indicator y2 may be used.
The vector Y3 contains grouped aggregations of individual wind presence indicators such that Y3={y123,y133,y233}, where yij3 (i=1..3, j=1..3, i≠j, yij3=yji3) is an aggregated wind presence indicator produced by combining the individual wind presence indicators for the i-th and j-th microphones, this being accomplished by blocks DF12 370, DF13 380 and DF23 390. In this embodiment, each grouped aggregated wind presence indicator yij3 is produced by the respective block DF12 370, DF13 380 or DF23 390 by OR-ing the individual decisions yi1, yj1 from cells 310, 330, 350. In alternative embodiments, the individual decisions from cells 310, 330, 350 may be AND-ed by the respective blocks DF12 370, DF13 380 and DF23 390, or any other method of aggregating the individual decisions into a single indicator yij3 may be used.
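The OR-based aggregation of individual decisions into an overall indicator and into grouped pairwise indicators might be sketched as follows. The dictionary layout keyed by microphone index and the function names are illustrative assumptions.

```python
from itertools import combinations


def overall_indicator(individual):
    """Overall wind presence flag: OR of all individual per-microphone decisions."""
    return any(individual.values())


def grouped_indicators(individual):
    """Grouped flags: OR of the individual decisions for each microphone pair (i, j)."""
    return {
        (i, j): individual[i] or individual[j]
        for i, j in combinations(sorted(individual), 2)
    }
```

Replacing `any` with `all`, and `or` with `and`, would give the AND-ed alternative mentioned above.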
It is to be noted that, while the embodiment of
Addressing
Each Cell 310, 330, 350 also receives an individual control signal, control, from Select module 340. The control signal is used to switch between single- and multi-microphone wind noise measurement schemes. The Select module 340 may be configured such that individual control signals are changed in real time in response to changing environmental or situational conditions. For example, at times one or more of the microphones may be blocked, obstructed or occluded, such as by dirt or the user's hand, with the result that the electrical signal xi generated by the microphone Mi is severely attenuated or distorted, and features of xi used for the wind presence decision therefore become unreliable. Detection of blocked microphones may be performed by any suitable method, including but not limited to the teachings of U.S. Provisional Patent Application No. 62/529,295 by the present Applicant.
In response to detection of a blocked microphone, or detection of any other circumstance requiring exclusion of a given microphone signal, the Select module 340 generates a control signal which configures the Celli such that it would only use electrical signal xi+1 from an unobstructed microphone, Mi+1, as discussed further in the following with reference to
In the WNM 200 of
By outputting a whole suite of WNM indications comprising the individual measures Y1, the overall measure y2, and the grouped measures Y3, other modules in the device can use the set of outputs Y1, y2, and Y3 in any unique manner suitable for that particular module. This is particularly important as the WNM module 200 acts as something of a “master switch” in relation to a number of other signal processing stages of the device, and thus has the significant responsibility of activating, deactivating, or substantially modifying the operation of a number of other processing functions. The present invention thus recognises that a single binary wind noise “detection” output is inappropriately coarse and that outputting multiple binary indicators Y1, y2, and Y3, or one or more soft WNM indicators, enables the powerful system-wide effect of the WNM module to be more accurately applied throughout the remainder of the signal processing functions of the device.
For example, some modules should be deactivated or should pause adaption very quickly in response to the onset of even very small amounts of wind noise, in cases where the operation or adaption of such modules is easily and quickly corrupted by wind noise. In contrast, other modules should be activated in response to the onset of wind noise only once there is a great degree of certainty that wind noise is present in the signal. The present invention thus recognises that a single binary detection output is inappropriately coarse and fails to meet the respective unique and diverse requirements of multiple diverse other processing functions. The present invention recognises that instead outputting a wind noise measure comprising multiple binary indicators, or one or more soft non-binary metrics, enables the powerful system-wide effect of the WNM module 200 to be more accurately applied on a case-by-case basis throughout the remainder of the signal processing functions of the device.
The digitised signals xi and xi+1 are input into the feature calculation modules of block 316, Featurei and Featurei+1 respectively, where signal features fi and fi+1, used for measurement of wind in the input signals, are calculated. The features fi and fi+1 are fed from block 316 to the Criterion Function module 318, CFi, where a final decision about wind noise presence in the signals xi and xi+1 is made. The Criterion Function module 318 implements a criterion function Q(*) which combines the features into a single scalar yi so that:
yi=Q(fi,fi+1)
Thus, in the Dual Microphone configuration of
In the case of the Dual Microphone configuration of
On the other hand, when a Single Microphone configuration is used as depicted in
Thus, in Single Microphone configuration, the scalar yi indicates an intensity of wind noise in the input signal xi, but conveys nothing about wind noise in the input signal xi+1. Similarly, in single microphone configuration the scalar yi indicates intensity of wind at the microphone Mi, but conveys nothing about the intensity of wind at the microphone Mi+1.
In the embodiment of
In this embodiment, empirical distribution functions (EDF) of signals xi and xi+1 are used as the features fi and fi+1, and the criterion function Q(*) is the mean absolute difference between fi and fi+1, quantised to a single bit, 0 (wind not present) or 1 (wind present), via a predefined threshold or a combination of thresholds (e.g. if a Schmitt trigger is used). To explain the nature of the EDF features we consider
However, the use of EDFs in the present invention recognises that EDFs are affected by wind noise in the input signal(s).
Alternative embodiments may calculate the difference between cumulative distribution functions 620 and 630 in any suitable manner, such as by using a smaller or larger number of points or by characterising the respective distribution and then comparing the extracted characteristics to each other. Other embodiments may also omit normalisation of the result and/or may normalise the result to any suitable scale. Smoothing of the metric is desirable due to the wide variability visible in the unsmoothed results of
With the above-described approach to determining the individual metrics, we turn now to
In both Dual and Single Microphone configurations (
In the Dual Microphone configuration (
In the Single Microphone configuration (
As noted for
The calculated features, f1, f2, and f3 are fed from 910, 912, 914, respectively, into a set of criterion functions, CF1 930, CF2 932 and CF3 934, together with a respective delayed copy of the features, f1, f2, and f3 provided via delay blocks 920, 922, 924. Criterion function modules CF1 930, CF2 932 and CF3 934 calculate individual (single microphone) wind measures y1, y2, and y3 from features f1, f2, and f3, respectively.
The features, f1, f2, and f3 are also fed from 910, 912, 914, respectively into a set of criterion function modules CF12 936, CF13 938 and CF23 940, so that pairwise dual microphone wind presence measures y12, y13, and y23 are calculated from each pair combination of features f1, f2, and f3 respectively.
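The single microphone and pairwise calculations described above might be sketched as follows, using a mean absolute difference as a stand-in criterion function Q. The dictionary layout keyed by microphone index, the function names and the choice of Q are assumptions for illustration, not a statement of how the criterion function modules are implemented.

```python
from itertools import combinations


def mean_abs_difference(f_a, f_b):
    """Stand-in criterion function Q: mean absolute difference of two feature vectors."""
    return sum(abs(a - b) for a, b in zip(f_a, f_b)) / len(f_a)


def single_mic_measures(features, delayed_features):
    """y_i from each feature and its own delayed copy (temporally distinct signals)."""
    return {
        i: mean_abs_difference(features[i], delayed_features[i])
        for i in features
    }


def pairwise_measures(features):
    """y_ij from each pair of microphones (spatially distinct signals)."""
    return {
        (i, j): mean_abs_difference(features[i], features[j])
        for i, j in combinations(sorted(features), 2)
    }
```

With three microphones this yields three single microphone measures and three pairwise measures, matching the six inputs passed on for fusion.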
While not shown in
yi,i+1, . . . ,L=Q(fi,fi+1, . . . ,fL)
where L>2 is the total number of features taking part in the decision; L depends on the number of microphones and allowed wind noise measurement complexity.
In the WNM 900 of
Individual (single mic) measures y1, y2, y3, and pairwise (dual mic) measures y12, y13, and y23 are then passed to the multiple-input multiple-output (MIMO) Decision Fusion module, DF 950. The Decision Fusion module outputs the Individual Measures (single mic) Y1, the Overall Measure y2 and the Grouped Measures (dual mics) Y3, as described previously. The DF module 950 may be implemented as a neural network, hidden Markov model (HMM) or any other appropriate algorithm for generating scalar non-binary measures, or as a MIMO Truth Table or any other appropriate algorithm in alternative embodiments where one or more of the measures Y1, y2 and Y3 are binary decisions.
In each of the described embodiments, with regard to the Decision Function, in the case of a two microphone wind noise measurement module, two single channel wind noise blocks and one two channel wind noise block can be instantiated. The output of the two channel block should increase if wind is present on either microphone. The output of a single channel block should increase if wind is present on that single microphone. If the two channel block goes high without either single channel block going high then this is likely to be a false fire, and it is to be noted that the decision block can protect against this false fire by only setting the output high (or biasing it higher) if the two channel block is high AND either of the single channel blocks is high. Similar logic can be applied in the decision block of the single channel wind noise measurement modules, whereby the output can only go high (or be biased higher) if the single channel block is high AND the two channel block is high.
Other examples of methods to combine “soft” wind presence decisions, which may be utilised in the DF block in other embodiments of the invention, include:
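The false-fire protection described above reduces to two small Boolean rules, sketched below for the binary case. The function names are hypothetical, and the "biasing higher" variant for soft outputs is not shown.

```python
def guarded_two_channel(two_channel_high, single_a_high, single_b_high):
    """Two channel output goes high only if at least one single channel block agrees."""
    return two_channel_high and (single_a_high or single_b_high)


def guarded_single_channel(single_high, two_channel_high):
    """Single channel output goes high only if the two channel block agrees."""
    return single_high and two_channel_high
```

For example, a two channel block that fires while both single channel blocks stay low is treated as a likely false fire and the guarded output remains low.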
where N is the number of “soft” decisions
where w is a corresponding weight
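The combination rules named in this specification for soft decisions (averaging, weighted sums, maxima and minima) might be sketched as follows. The function names are illustrative, and whether the weights are normalised to sum to one is an assumption left open here.

```python
def average(decisions):
    """Mean of the N soft decisions."""
    return sum(decisions) / len(decisions)


def weighted_sum(decisions, weights):
    """Sum of the soft decisions, each scaled by its corresponding weight w."""
    return sum(w * d for w, d in zip(weights, decisions))


def maximum(decisions):
    """Most pessimistic combination: the largest soft decision."""
    return max(decisions)


def minimum(decisions):
    """Most conservative combination: the smallest soft decision."""
    return min(decisions)
```

Taking the maximum responds as soon as any one measure indicates wind, whereas taking the minimum responds only when all measures agree, so the choice depends on how quickly a given downstream module needs to react.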
A further advantage provided by embodiments of the invention outputting a scalar non-binary wind noise measure relates to switching behaviour. Applying a single threshold to produce a binary output may result in rapid and repeated switching of that output from ON to OFF and back to ON again, many times in quick succession, even when the measure is smoothed. Use of a soft output enables some hysteresis to be introduced so that an OFF to ON threshold can differ from an ON to OFF threshold, on a module by module basis, so that when the WNM indication is hovering around one such threshold it will not cause inappropriately fluctuating responses in each downstream module.
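The hysteresis described above might be sketched for a single downstream module as follows. The class name and the threshold values are hypothetical; each module would choose its own pair of thresholds.

```python
class HysteresisSwitch:
    """Binary state driven by a soft metric, with separate ON and OFF thresholds."""

    def __init__(self, on_threshold, off_threshold):
        if off_threshold > on_threshold:
            raise ValueError("off_threshold must not exceed on_threshold")
        self.on_threshold = on_threshold
        self.off_threshold = off_threshold
        self.state = False

    def update(self, metric):
        """Advance the state by one metric value and return the new state."""
        if self.state:
            if metric < self.off_threshold:
                self.state = False  # only a clear drop switches OFF
        elif metric > self.on_threshold:
            self.state = True  # only a clear rise switches ON
        return self.state


switch = HysteresisSwitch(on_threshold=0.6, off_threshold=0.4)
states = [switch.update(m) for m in (0.5, 0.65, 0.55, 0.45, 0.58, 0.35, 0.5)]
```

For the metric sequence shown, a single threshold at 0.5 would toggle four times, whereas the two-threshold switch turns ON once and OFF once.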
While in
It is noted that wind noise energy tends to be concentrated at the low portion of the spectrum; and with increased wind velocity the wind noise occupies progressively more and more bandwidth. As wind noise energy for many wind noise situations is thus mainly located at low frequencies, a significant portion of the speech spectrum remains relatively unaffected by wind noise. Therefore in order to preserve the naturalness of the processed audio signal by not modifying the unaffected bands, some embodiments of the present invention recognise that wind-noise reduction techniques which attempt to reduce wind noise energy while preserving signal (e.g. speech) energy, should be applied selectively only to the portion of spectrum which is affected by wind noise. Thus the “wind noise-free” parts of the speech signal spectrum will not be unnecessarily modified by the system. Hence, this selective reduction of wind noise requires an improved measurement metric which can indicate a severity of wind noise in particular spectral sub-bands. Accordingly, it is to be understood that the techniques described herein for full band wind noise measurement can similarly be applied on a sub-band basis, whereby sub-band microphone signals are created by use of appropriate time domain bandpass filters and wind noise detection is applied in each sub band.
In the described embodiments each microphone signal can be matched for amplitude so that an expected variance of each signal is the same or approximately the same. The microphone signals can also be matched for an acoustic signal of interest before the wind noise measurement is performed.
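Amplitude matching of this kind might be sketched as scaling one signal so that its variance equals that of a reference signal. The function names are illustrative, and it is an assumption of this sketch that the gain is estimated from frames representative of the expected (wind-free) signal.

```python
def variance(samples):
    """Population variance of a frame of samples."""
    mean = sum(samples) / len(samples)
    return sum((s - mean) ** 2 for s in samples) / len(samples)


def match_amplitude(reference, other):
    """Scale `other` so that its variance equals that of `reference`."""
    var_other = variance(other)
    if var_other == 0.0:
        return list(other)  # silent frame: no meaningful gain can be estimated
    gain = (variance(reference) / var_other) ** 0.5
    return [gain * s for s in other]
```

Matching in this way helps ensure that a difference between the sample distributions of two microphones reflects wind noise rather than a mere sensitivity mismatch between the microphones.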
In some embodiments, while the microphone signals are captured by the headset 100, the microphone signals and/or features may be transmitted using the transceiver 18 to a remote system such as a smartphone or a remote system located on one or more remote servers in a cloud computing environment, for computation of one or more parts of the described wind noise measurement. Signals based on the determinations of the remote system may then be returned to the headset 100 or an associated smartphone or other local device for further action.
It will be appreciated by persons skilled in the art that numerous variations and/or modifications may be made to the invention as shown in the specific embodiments without departing from the spirit or scope of the invention as broadly described. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive.
Number | Date | Country | |
---|---|---|---|
20190244627 A1 | Aug 2019 | US |