The present disclosure relates generally to noise activity detectors for use in, for example, noise reduction systems.
In many signal processing applications, such as echo cancellation, speech recognition, speech encoding, voice-over-IP, and in particular noise reduction systems, it is important to gather real-time information and statistics about the noise in the signal. This is most often achieved by detecting when there is a useful amount of the desired signal and treating that portion of the signal as “non-noise.” At other times, the signal is assumed to be only noise and the information and statistics that are desired are gathered during those times.
In single-channel systems, the noise and desired signal are mixed, and the incoming mixed noisy signal is considered to be a linear sum of the desired signal and unwanted noise. When the desired signal is detected in the mixed signal, the noise information is not updated during that part of the signal. Instead, the noise characteristics are updated at other times, which allows noise reduction, for example, to be executed with appropriate processing.
In voice communication systems, the need for determining the presence of noise-only periods has given rise to the proliferation of numerous voice determination methods, often called voice detection or voice activity detection (VAD) methods, since the voice portion of the mixed signal is the desired portion.
Such methods usually rely upon the fact that talkers must hear at least a portion of their own voice in order to form their words properly. In order to reliably hear themselves speak, talkers need to keep their own voice about 10 dB above the ambient or background noise level. Thus, in the presence of loud background noise, talkers naturally elevate their voice level to keep it slightly above the competing background noise level.
Voice activity detection methods, whether implemented in the time domain or in the frequency domain, utilize this fact. Many such systems are based upon means that detect when the total energy of the incoming noisy signal is above a threshold, and indicate the presence of voice when this condition is met. Of course, the threshold must always be adjusted to lie above the level of the background noise portion of the signal but below the combined voice-plus-noise level. Many complex methods have been devised to create such real-time dynamic threshold adjustment for this purpose.
However, such “reverse” methods, that is, methods that detect the desired signal so that the noise periods can be implied rather than directly detecting the noise portions themselves, have drawbacks. For example, in noise above approximately 90 dB SPL (Sound Pressure Level) it becomes nearly impossible for humans to further elevate the loudness of their voice, and the SNR (signal-to-noise ratio) of the input signal drops, often to below 0 dB (1:1).
Conventional voice detection systems operate poorly, or not at all, when the SNR becomes low—for example below 10 dB. As long as the voice signal power is significantly above the noise signal power, such systems are able to detect the presence of voice. But in increasingly noisy situations, the voice detection accuracy decreases until such systems fail to operate at all.
Another significant problem is the detection of wind noise, the noise created when air flows over microphones used in voice detection systems. With the proliferation of mobile communication devices, wind noise is becoming of critical importance. Such noise can exhibit highly variable properties, and therefore the noise of wind is often misclassified by such systems. When this happens, the noise reduction of VAD-based noise reduction systems can be compromised because the noise template is incorrectly updated. For wind noise to be correctly classified, additional methods or processes must be implemented to reliably detect it, at the cost of more complexity and expense.
Yet another difficulty with conventional voice detection schemes is that voice signals do not abruptly terminate but slowly decay after each utterance. Voice detection based upon the voice power being above a noise power threshold will falsely indicate the end of voicing when the voice signal's decaying tail drops below the threshold level, even though voice is still present. Therefore these systems often add a so-called “hangover” timer to delay the onset of the noise indication.
Classical voice detection methods assume that the background noise is stationary or only slowly varying. In non-stationary noise conditions, classical voice detection schemes are unreliable, since rapid changes in noise level, especially upward jumps in noise, cannot be distinguished from the onset of a voice burst and therefore give false indications of voice presence.
Such voice detectors also react to the presence of nearby voices other than that of the user, even though background voices are actually “noise” in systems where the user's own voice is the only desired signal.
Further, virtually all voice detection methods rely upon setting or updating one or more thresholds based upon the prior history of the signal, rather than on instantaneous current conditions. By relying upon prior information, such thresholds cannot update quickly, and the voice detection output is slow to react to rapid changes in background noise, creating errors until the system can eventually adjust.
The problems with voice detection methods historically have been addressed by adding enhancements to the basic principle of signal power threshold detection. Such enhancements include means for tracking noise levels in order for the threshold to be updated in real time, the addition of separate wind detector schemes, improved sensitivity methods allowing the threshold to be set with greater precision to operate in lower SNR conditions, adding hangover methods to prevent the false indication that voicing has ended when at the end of an utterance it has simply decayed below the threshold, and creating lockout periods that wait for a time longer than any expected naturally occurring voicing period after which the threshold is allowed to adjust more rapidly in order to attempt to accommodate bursts or steps in background noise level. However, using such enhancements still produces limited operation and still results in the false detection of noise-only signal conditions.
Yet other voice detection methods have been created that rely upon the availability of more than one signal, such as from an array of sensors or microphones. However, these systems have the great disadvantage that they only work when multiple signals are available, or where multiple sensors can be accommodated. Also, they increase the complexity, cost, size and power consumption of such systems.
Other solutions that are known rely upon complex signal processing computations such as autocorrelation, cross correlation, variance, Linear Predictive Coding (LPC) coefficients, various statistical noise predictors (e.g. Gaussian, Laplacian and Gamma distributions), stationarity measures, and so on. In general these solutions do not significantly improve performance, and are still aimed at the detection of voicing periods rather than detection of the noise-only periods themselves.
As described herein, a method for generating an indication of noise activity in a signal includes:
a) calculating average energy of the signal in a critical bandwidth;
b) determining a frequency-dependent threshold function;
c) generating a dynamic modification of the frequency-dependent threshold function using the average energy;
d) identifying frequency components of the signal having energy that is above threshold values determined by the threshold function at corresponding frequencies, and determining a first average energy value representing an average energy of the identified frequency components with energy above the threshold values;
e) identifying frequency components of the signal having energy that is below threshold values determined by the threshold function at corresponding frequencies, and determining a second average energy value representing an average energy of the identified frequency components with energy below the threshold values;
f) applying an offset value to at least one of the first and second average energy values;
g) comparing, after application of said offset value, the resultant first and second average energy values with one another; and
h) indicating the presence of noise activity if, as a result of said comparison, it is determined that the resultant first average energy value is below the resultant second average energy value.
Also as described herein, a noise activity detector for generating an indication of noise activity in a signal includes:
a) a first circuit configured to calculate the average energy in a critical bandwidth;
b) a second circuit configured to determine a frequency-dependent threshold function;
c) a third circuit configured to generate a dynamic modification of the frequency-dependent threshold function using the average energy;
d) a fourth circuit configured to identify frequency components of the signal having energy that is above threshold values determined by the threshold function at corresponding frequencies, and to determine a first average energy value representing an average energy of the identified frequency components with energy above the threshold;
e) a fifth circuit configured to identify frequency components of the signal having energy that is below threshold values determined by the threshold function at corresponding frequencies, and to determine a second average energy value representing an average energy of the identified frequency components with energy below the threshold;
f) a sixth circuit configured to apply an offset value to at least one of the first and second average energy values;
g) a seventh circuit configured to compare, after application of said offset value, the resultant first and second average energy values with one another; and
h) an eighth circuit configured to indicate the presence of noise activity if, as a result of said comparison, it is determined that the resultant first average energy value is below the resultant second average energy value.
Also as described herein, a noise activity detector for generating an indication of noise activity in a signal includes:
a) means for calculating average energy of the signal in a critical bandwidth;
b) means for determining a frequency-dependent threshold function;
c) means for generating a dynamic modification of the frequency-dependent threshold function using the average energy;
d) means for identifying frequency components of the signal having energy that is above threshold values determined by the threshold function at corresponding frequencies, and determining a first average energy value representing an average energy of the identified frequency components with energy above the threshold values;
e) means for identifying frequency components of the signal having energy that is below threshold values determined by the threshold function at corresponding frequencies, and determining a second average energy value representing an average energy of the identified frequency components with energy below the threshold values;
f) means for applying an offset value to at least one of the first and second average energy values;
g) means for comparing, after application of said offset value, the resultant first and second average energy values with one another; and
h) means for indicating the presence of noise activity if, as a result of said comparison, it is determined that the resultant first average energy value is below the resultant second average energy value.
Also as described herein, a program storage device readable by a machine embodies a program of instructions executable by the machine to perform a method for generating an indication of noise activity in a signal, the method including:
a) calculating average energy of the signal in a critical bandwidth;
b) determining a frequency-dependent threshold function;
c) generating a dynamic modification of the frequency-dependent threshold function using the average energy;
d) identifying frequency components of the signal having energy that is above threshold values determined by the threshold function at corresponding frequencies, and determining a first average energy value representing an average energy of the identified frequency components with energy above the threshold values;
e) identifying frequency components of the signal having energy that is below threshold values determined by the threshold function at corresponding frequencies, and determining a second average energy value representing an average energy of the identified frequency components with energy below the threshold values;
f) applying an offset value to at least one of the first and second average energy values;
g) comparing, after application of said offset value, the resultant first and second average energy values with one another; and
h) indicating the presence of noise activity if, as a result of said comparison, it is determined that the resultant first average energy value is below the resultant second average energy value.
The accompanying drawings, which are incorporated into and constitute a part of this specification, illustrate one or more examples of embodiments and, together with the description of example embodiments, serve to explain the principles and implementations of the embodiments.
In the drawings:
Example embodiments are described herein in the context of a processor or individual circuits, or a flow diagram of a process that is performed. Those of ordinary skill in the art will realize that the following description is illustrative only and is not intended to be in any way limiting. Other embodiments will readily suggest themselves to such skilled persons having the benefit of this disclosure. Reference will now be made in detail to implementations of the example embodiments as illustrated in the accompanying drawings. The same reference indicators will be used to the extent possible throughout the drawings and the following description to refer to the same or like items.
In the interest of clarity, not all of the routine features of the implementations described herein are shown and described. It will, of course, be appreciated that in the development of any such actual implementation, numerous implementation-specific decisions must be made in order to achieve the developer's specific goals, such as compliance with application- and business-related constraints, and that these specific goals will vary from one implementation to another and from one developer to another. Moreover, it will be appreciated that such a development effort might be complex and time-consuming, but would nevertheless be a routine undertaking of engineering for those of ordinary skill in the art having the benefit of this disclosure.
In accordance with this disclosure, the components, process steps, and/or data structures described herein may be implemented using various types of operating systems, computing platforms, computer programs, and/or general purpose machines. In addition, those of ordinary skill in the art will recognize that devices of a less general purpose nature, such as hardwired devices, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), or the like, may also be used without departing from the scope and spirit of the inventive concepts disclosed herein. Where a method comprising a series of process steps is implemented by a computer or a machine and those process steps can be stored as a series of instructions readable by the machine, they may be stored on a tangible medium such as a computer memory device (e.g., ROM (Read Only Memory), PROM (Programmable Read Only Memory), EEPROM (Electrically Erasable Programmable Read Only Memory), FLASH Memory, Jump Drive, and the like), magnetic storage medium (e.g., tape, magnetic disk drive, and the like), optical storage medium (e.g., CD-ROM, DVD-ROM, paper card, paper tape and the like) and other types of program memory.
The noise detector, also referred to as a noise activity detector (NAD), as disclosed herein is based upon the unique characteristics of noise as differentiated from the characteristics of other signals, in particular the characteristics of desired signals. Generally, it is applicable to the detection of periods when a signal is only noise, and is especially useful therefore in systems, such as noise reduction systems, where knowledge of noise-only periods is needed for their function. In particular, the arrangement disclosed herein is directed at reliable detection of periods with only acoustic noise in a mixed microphone input signal which may contain speech, wind and acoustic background noise. An alternate use is as a voice activity detector. More particularly, it is directed to use in voice grade communication systems and devices such as cellular telephones, Bluetooth® wireless headsets, voice command and control and automatic speech recognition, among others. For purposes of this discussion, three types of sound are identified: acoustic noise, wind noise, and voice.
It is instructive to note that the model curve (long-dashed line) crosses the average noise level line (short-dashed line) at what will be termed the effective frequency of slightly above 700 Hz. The significance of this effective frequency is explained in more detail below, as will be the manner in which the model curve is selected and constructed.
Assuming that the model curve (long-dashed line) has been properly determined so as to relatively accurately correspond to the typical noise power frequency characteristic shape, the average power for the model curve over the selected bandwidth of about 250 Hz to about 2,500 Hz is made to be equal to the actual average noise power in the measured data by raising or lowering the model curve until the two power averages are the same. This is accomplished mathematically by solving for the magnitude of the model which makes the average model power match the actual average measured power. The effective frequency at which the model and actual average power lines cross (i.e. are equal) then can be determined. In effect, the model curve passes through the average power line such that it creates equal areas between it and the average power line, above and below the effective frequency crossing point when plotted on a magnitude squared vs. frequency plot (not shown). It can be seen that for this data the −6 dB sloped model provides a close approximation of the noise data characteristic when they cross at approximately 700 Hz. Thus, 700 Hz is determined to be the effective frequency for this data.
It should be recognized that the shape of the measured data is dependent upon the characteristics of the specific signal pickup system. With other systems, a curved (non-straight) line may be a more appropriate model for the noise response of the system. For the data depicted in
From
By analyzing numerous noise signals measured with the system in which the system disclosed herein may be used, it was determined that when the model curve (a straight −6 dB/oct. line in this case) was set to equal the average measured noise signal power at 750 Hz, the model did indeed create a good approximation to all acoustic noise signals. However, whereas acoustic noise signals exhibit little deviation from the model, voice (discussed below) and wind noise both exhibit significant deviation from the model. As explained above, for purposes of this discussion, three types of sound are identified: acoustic noise, wind noise, and voice. Acoustic noise is generally a catch-all for all non-wind noise and non-voice sounds.
It can be seen from the plots, that while the noise data in
The distinction between low and high wind-induced noise is a relative concept; it can be seen that the plots are significantly different. Since wind “noise” is generated at the port(s) of the microphone, the transition wind speed between the results of
The noise activity detector (NAD) disclosed herein uses the characteristics described above to identify a signal and indicate when noise-only periods of the signal are present. There are myriad applications for such an operation—for example, it can be used to provide a control signal that gates other functions such as updating a noise template in a spectral subtraction process, updating an automatic microphone matching table, blocking an automatic gain circuit from raising the gain when only noise is present, and so on. The noise activity detector disclosed herein is described in the context of audio signals in a communication system. However, the process disclosed herein is not limited to single-channel, single-band applications, but is also applicable to multi-channel applications, as well as to multi-band applications. Since the process is performed in the frequency domain, selection of the frequency range over which it operates is simple, and additional implementations of the noise detector can be used for other frequency ranges. An example of such an application would be a multi-band spectral subtraction process in which it may be necessary to independently update the noise template for each band when there is only noise in the respective band, even though there may be voice- and/or wind-induced signal in other bands. The noise activity detector can also be used with multi-channel applications to provide an indication for each channel when its signal was only noise. Although for many multi-channel systems each input signal may be similar to the signals that the other sensors receive, there are many situations where that is not the case, such as for wind-induced noise, and noise generated mechanically at a port such as by physical contact with the operator's skin or with other objects.
As examples of possible applications, a control signal from a noise detector, applied to each channel of a multi-channel system, could be used for channel-specific spectral subtraction processes, and/or the signals from the noise detectors on the different channels could be combined to enable an automatic microphone matching process to compensate for variations in the sensitivities of multiple microphones. In the latter application, the channel-specific noise detectors will ensure that the microphone matching does not match to noise present on a single channel.
With reference to
In an example embodiment, a communications audio signal with an 8 ksps (kilo-samples per second) sample rate is separated into 512-sample frames, windowed with a Hanning window, converted to the frequency domain using an FFT (Fast Fourier Transform), and a single sub-band consisting of the frequency bins between 250 Hz and 2,500 Hz is selected.
The resulting sub-band bin values are provided as input to NAD 20, the output of which is provided for subsequent control of a desired process associated with the particular communications application.
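The front-end steps of this example embodiment can be sketched as follows. This is a minimal illustration rather than the disclosed implementation: the function and constant names are hypothetical, the per-bin power is taken as the squared FFT magnitude (an assumption), and a small pure-Python radix-2 FFT stands in for whatever FFT routine a real system would use.

```python
import cmath
import math

SAMPLE_RATE_HZ = 8000   # 8 ksps, from the text
FRAME_LEN = 512         # samples per frame, from the text
F_LOW_HZ = 250.0        # sub-band edges, from the text
F_HIGH_HZ = 2500.0


def hann_window(n):
    """Symmetric Hann (Hanning) window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def subband_bins():
    """Indices of FFT bins whose centre frequency lies in [F_LOW_HZ, F_HIGH_HZ]."""
    bin_hz = SAMPLE_RATE_HZ / FRAME_LEN  # 15.625 Hz per bin
    first = math.ceil(F_LOW_HZ / bin_hz)
    last = math.floor(F_HIGH_HZ / bin_hz)
    return list(range(first, last + 1))


def frame_to_subband_power(frame):
    """Window one frame, transform it, and return (freqs, powers) for the sub-band."""
    if len(frame) != FRAME_LEN:
        raise ValueError("frame must contain FRAME_LEN samples")
    window = hann_window(FRAME_LEN)
    spectrum = fft([s * w for s, w in zip(frame, window)])
    bin_hz = SAMPLE_RATE_HZ / FRAME_LEN
    bins = subband_bins()
    freqs = [b * bin_hz for b in bins]
    powers = [abs(spectrum[b]) ** 2 for b in bins]  # assumed power measure
    return freqs, powers
```

With these parameters each bin is 15.625 Hz wide, so the selected sub-band spans bins 16 through 160. The text does not say whether frames overlap, so the sketch processes a single frame and leaves framing policy to the caller.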
Block 16 represents the determination of the noise model and frequency process performed by the practitioner during the design of the system in which the noise detector is to be used, and is a function of the particular application. Typical noise, as sensed by the sensor system of the intended application, is analyzed for a curve fit using well known curve-fitting methods. The shape of the fitted mathematical curve is the noise model, and for example, in
Block 17 represents the determination of the critical bandwidth. The critical bandwidth is generally a contiguous range of frequencies that includes the range in which the data fits the model. In the signals of
The critical bandwidth, noise power model and effective frequency determination processes of blocks 16 and 17 may use the following steps:
The process of block 16 is described in more detail with reference to
Noise model determination is performed at step 26, together with a determination of the effective frequency at step 28. Steps 26 and 28 correspond to block 16 of
The determination of effective frequency FE (step 28) is also accomplished as mentioned above and described here more fully. After the shape of the noise power model 26 and the critical bandwidth 17 have been determined, the power model is mathematically integrated over the critical bandwidth to determine the average model power level. The frequency at which this level intersects the noise power model curve is the effective frequency FE.
Let the noise power model be defined as
$$P_{NM}(f) = \alpha \cdot S_N(f) \qquad (1)$$
where PNM(f) is the noise power model, SN(f) is the noise power model shape function, f is frequency, and α is a magnitude scale factor to be determined. The shape model is integrated over the critical bandwidth and then divided by the critical bandwidth, BWC, to produce the average noise power model level.
Let the critical bandwidth of the sub-band be defined by its lower frequency boundary, flow, and its upper frequency boundary, fhi. In the exemplary case being discussed here, flow=200 and fhi=2500. Therefore
$$BW_C = f_{hi} - f_{low} \qquad (2)$$
and the average noise power model level is
$$P_{NM\,avg} = \frac{1}{BW_C} \int_{f_{low}}^{f_{hi}} \alpha \cdot S_N(f)\, df \qquad (3)$$
This average noise model power level will equal the value of the noise power model at the effective frequency FE. That is,
$$P_{NM\,avg} = P_{NM}(F_E) \qquad (4)$$
therefore, FE can be found by solving equation 4. As can be readily seen, model curves that are monotonic are preferred.
For the example case,
which is effectively about 700 Hz.
The above parameters of critical bandwidth, noise power model and effective frequency can all be predetermined during the design of the noise detector, and need not be calculated in real-time, thereby reducing the calculation power required for the operating system.
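As a worked check of the effective-frequency calculation, the following derivation assumes a specific shape function, S_N(f) = 1/f², which corresponds to the straight −6 dB/octave power model mentioned earlier. This closed form is an illustrative assumption; other sensor systems may call for a different shape and would then need the integral evaluated for that shape.

$$P_{NM\,avg} = \frac{\alpha}{f_{hi} - f_{low}} \int_{f_{low}}^{f_{hi}} \frac{df}{f^{2}} = \frac{\alpha}{f_{hi} - f_{low}} \left( \frac{1}{f_{low}} - \frac{1}{f_{hi}} \right) = \frac{\alpha}{f_{low} \cdot f_{hi}}$$

$$\frac{\alpha}{F_E^{2}} = \frac{\alpha}{f_{low} \cdot f_{hi}} \;\Rightarrow\; F_E = \sqrt{f_{low} \cdot f_{hi}} = \sqrt{200 \cdot 2500} \approx 707\ \text{Hz}$$

Under this assumption the scale factor α cancels, so the effective frequency depends only on the model shape and the band edges, which is consistent with it being fixed at design time.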
The real time operation of the noise activity detector (NAD) 20 is now described with reference to
At step 30 in
The Define Threshold Function, Th(f), step 32 (and circuit 104) determines a dynamic frequency-dependent threshold using the noise power model, PNM(f), determined in step 26 and the effective frequency, FE, determined in step 28. It calculates the average power in the current frame of data and sets the level, α, of the model so that the value of the model at the effective frequency, FE, equals the average power level for the current frame. That is,
$$\alpha_i \cdot S_N(F_E) = P_{N\,avg} \qquad (7)$$
where PN avg is the current average power level. Thus, the threshold function for the ith frame of data is determined by circuit 104 and in step 32 as
$$Th_i(f) = \alpha_i \cdot S_N(f) = P_{NM\,i}(f) \qquad (8)$$
Note that this threshold is not a single level and is not dependent upon prior frames of data, both of which are common in other such detectors. Because the threshold is immediate—that is, calculated for and used by only the current frame—the NAD 20 is able to follow rapid changes in background noise. Thus a dynamic modification of the frequency-dependent threshold function using the average energy is used.
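A per-frame threshold of this kind might be computed as in the sketch below. The names are hypothetical, the 1/f² shape function is an assumption based on the −6 dB/octave model discussed earlier, and the 707 Hz effective frequency is an illustrative design-time value. The scale factor for the frame is chosen so that the model evaluated at the effective frequency equals the average power of the current frame.

```python
def model_shape(f_hz):
    """Assumed noise power shape S_N(f): a -6 dB/octave slope, i.e. 1/f^2."""
    return 1.0 / (f_hz * f_hz)


EFFECTIVE_FREQ_HZ = 707.0  # F_E, fixed at design time (illustrative value)


def frame_threshold(powers, freqs, shape=model_shape, f_e=EFFECTIVE_FREQ_HZ):
    """Return Th_i(f) at each bin frequency, using only the current frame."""
    avg_power = sum(powers) / len(powers)  # average power of this frame
    alpha_i = avg_power / shape(f_e)       # so alpha_i * S_N(F_E) == avg_power
    return [alpha_i * shape(f) for f in freqs]
```

Because no state is carried between calls, a sudden step in background level changes the threshold in the same frame in which the step occurs.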
The threshold function, Thi(f), is used to divide the spectral data of the current frame into two groups, those FFT frequency bins whose power data magnitudes are greater than the threshold, and those whose power data magnitudes are less than the threshold.
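The grouping step can be sketched as below, with the two group averages standing in for EABOVE and EBELOW. The function name is hypothetical, and two details are assumptions the text does not settle: bins exactly equal to the threshold are left out of both groups, and an empty group yields `None`.

```python
def split_and_average(powers, thresholds):
    """Split bin powers by the per-bin threshold and average each group.

    Returns (e_above, e_below). A group with no bins yields None
    (an assumption; the text does not say how an empty group is handled).
    """
    above = [p for p, t in zip(powers, thresholds) if p > t]
    below = [p for p, t in zip(powers, thresholds) if p < t]
    e_above = sum(above) / len(above) if above else None
    e_below = sum(below) / len(below) if below else None
    return e_above, e_below
```

A caller would need a policy for the `None` case, for example holding the previous smoothed value for that frame.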
The logarithms of the energy averages EBELOW and EABOVE are each calculated in steps 38 and 40, and the resulting values optionally provided to filters that create a smoothing function across time by acting on the values from sequential frames. Log circuit 110 and filtering circuit 112 of
When desired, the filtering of steps 38 and 40 in an exemplary embodiment is performed with an exponential filter of the following form:
$$E_{X\,avg} = \alpha_X \cdot \left( \log(E_{Y,i}) - \log(E_{X,i-1}) \right) + \log(E_{X,i-1}) \qquad (9)$$
where EY is either EBELOW or EABOVE, and αX is a smoothing coefficient between 0 and 1 that sets the time constant and hence the amount of smoothing; a typical value may be 0.1. The subscript X denotes that αX may have different values for the ABOVE and BELOW cases. EX avg is the smoothed output signal, where X can be ABV or BLW, designating which signal is being smoothed.
There is no limitation on the type and complexity of the smoothing filter(s), and many are known in the art. More complex smoothing filters can be used which can provide asymmetrical rise (attack) and fall (decay) time constants. Hangover is created when the ABOVE smoothed signal is able to move up faster than down, and the BELOW smoothed signal is able to move down faster than up.
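One way to realize such a smoother, including the asymmetric rise and fall behavior, is sketched below. The class and parameter names are hypothetical, the base-10 logarithm is an assumption (the text does not name a base), and the small floor applied before taking the logarithm is an added safeguard rather than something the text describes.

```python
import math


class LogSmoother:
    """Exponential smoother acting on the logarithm of a per-frame energy."""

    def __init__(self, alpha_up=0.1, alpha_down=0.1):
        self.alpha_up = alpha_up      # coefficient used when the input rises
        self.alpha_down = alpha_down  # coefficient used when the input falls
        self.state = None             # smoothed log energy from the last frame

    def update(self, energy):
        # Floor the energy to avoid log(0); the floor value is an assumption.
        x = math.log10(max(energy, 1e-30))
        if self.state is None:
            self.state = x            # first frame: start at the input value
        else:
            alpha = self.alpha_up if x > self.state else self.alpha_down
            self.state += alpha * (x - self.state)
        return self.state
```

To obtain the hangover behavior described above, the ABOVE signal would use `alpha_up` larger than `alpha_down`, and the BELOW signal would use `alpha_down` larger than `alpha_up`. Setting both coefficients equal gives the plain exponential filter.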
The approach described above provides two signals that are similar in magnitude for a typical noise signal input to noise detector 20, so detection of the noise-only portion of the input signal is simplified if one of these signals is offset from the other. During system design, an offset is determined by the practitioner in Determine Offset step 42, where the offset is slightly larger than the random variation in the two logarithmic signals when a noise signal is input to noise detector 20. This amount of offset then prevents false negative triggers of the noise detector, i.e. false indications that other-than-noise is present when the input signal is in fact only noise. Such false triggers do not create errors in the operation of the associated noise reduction or other process with which the noise detector is used, but they do slow the operation of some such processes. The offset, therefore, is meant to minimize this effect. The offset, which may be a negative number, is added to the output of log & filter step 40 in add offset step 44. Alternatively, the add offset step could follow step 38, with the offset applied to the signal EAV-LO. In this case, in order to achieve the same result, the offset value determined in step 42 would have the opposite sign.
After offsetting one of the two signals, the resulting values are compared in the decision step 46 (circuit 114). Decision step 46 causes Set Noise Indicator step 48 (circuit 116) to set the NAD output to an “on” state indicating the presence of noise only if the output from step 38, EAV-LO, is greater than the output from step 44, EAV-HI. When EAV-LO is less than EAV-HI, decision test step 46 causes reset noise indicator step 50 to reset NAD output to an “off” state indicating the presence of other-than-noise in the input signal. An alternate embodiment uses an offset value dependent upon whether the NAD output is currently on or off, and in this way hysteresis can be incorporated into the NAD switching for applications where it is desirable to have a more stable NAD output.
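The comparison and the hysteresis variant can be sketched as follows. The function name and the two offset values are hypothetical, chosen only to illustrate the mechanism; in practice the offset would be determined during design from the observed variation of the two logarithmic signals.

```python
def noise_indicator(e_av_lo, e_av_hi, nad_on,
                    offset_when_on=-0.3, offset_when_off=-0.2):
    """Return the new NAD state (True means noise only).

    e_av_lo and e_av_hi are the smoothed log energies of the BELOW and
    ABOVE groups before any offset. The offset is added to the ABOVE
    value and depends on the current state, which gives hysteresis.
    The offset values are illustrative assumptions.
    """
    offset = offset_when_on if nad_on else offset_when_off
    return e_av_lo > e_av_hi + offset
```

With these example values, an input with `e_av_lo = 1.0` and `e_av_hi = 1.25` keeps the indicator on if it was already on, but does not switch it on if it was off. Using the same offset for both states reduces this to the single-offset comparison of decision step 46.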
To illustrate the performance of this noise detector,
Across the top are numbered sections indicating the input signal characteristics at different times. Sections (1) and (5) are periods of time with only silence and when noise detector 20 had no signal input. In this case, whichever state the noise detector indicates is acceptable since the input signal is neither noise nor non-noise, and a noise reduction system would have no input noise to reduce.
Section (2) is a period during which the signal input to the noise detector 20 was clean voice in quiet ambient conditions. A short period at the end of this second section has only normal room ambient sound with no voice. The noise detector properly handled this relatively easy condition, indicating the presence of the voice as non-noise and yet detecting the absence of voice during noise only periods. The system used for the plot of
Section (3) consists of very loud (85 dB SPL) noise-only input sound that was a mixture of music, a single loud voice, and voice babble from multiple directions. Here it can be seen that the noise detector indicates mostly noise only, but also creates non-noise indications as a result of the single loud background voice, even though the SNR for the background voice is less than −10 dB.
In section (4) nearby voice speech was added to the noise from Section (3), with the added voice SNR being approximately −3 dB. As designed, the NAD output shows that the noise-only periods are correctly indicated while during voicing, the NAD correctly indicates non-noise. Correct operation at such low input SNR levels shows the capability of this new noise/voice detector.
While embodiments and applications have been shown and described, it would be apparent to those skilled in the art having the benefit of this disclosure that many more modifications than mentioned above are possible without departing from the inventive concepts disclosed herein. The invention, therefore, is not to be restricted except in the spirit of the appended claims.
This application claims the benefit of U.S. Provisional Patent Application No. 60/965,854, filed on Aug. 22, 2007, entitled “Noise Detector”, the disclosure of which is hereby incorporated by reference for all purposes.
Number | Date | Country
---|---|---
60965854 | Aug 2007 | US