1. Field of the Invention
The invention generally relates to noise suppression.
2. Background
Modern communication devices often include a primary sensor (e.g., a primary microphone) for detecting speech of a user and a reference sensor (e.g., a reference microphone) for detecting noise that may interfere with accuracy of the detected speech. A signal that is received by the primary sensor is referred to as a primary signal. In practice, the primary signal usually includes a speech component (e.g., a user's speech) and a noise component (e.g., background noise). A signal that is received by the reference sensor is referred to as a reference signal. The reference signal usually includes reference noise (e.g., background noise), which may be combined with the primary signal to provide a speech signal that has a reduced noise component, as compared to the primary signal.
For example, a communication device may include a dual-channel adaptive noise canceller that is configured to approximate a transfer function between a primary sensor and a reference sensor. In accordance with this example, the noise canceller may filter a reference signal and subtract reference noise that is included in the reference signal from a primary signal to provide a speech signal. The speech signal is intended to be an accurate representation of a speech component that is included in the primary signal.
However, the speech signal often includes residual noise. Many techniques for decreasing the residual noise of the speech signal involve estimating the noise power spectrum of the speech signal. These techniques traditionally average the speech signal over non-speech portions thereof (i.e., portions of the speech signal in which speech is not present). For instance, a voice activity detector (VAD) is usually used to indicate which portions of the speech signal do not include speech. However, detection reliability of a VAD may decrease substantially for low input signal-to-noise ratios (SNRs) and/or for speech signals having relatively weak speech components. Moreover, the number of presumable non-speech portions of the speech signal may not be sufficient for a noise estimator to accurately estimate the noise power spectrum of the speech signal. For instance, an insufficient number of non-speech portions may limit the ability of the noise estimator to track a varying noise power spectrum.
A system and/or method is described for providing noise estimation using an adaptive smoothing factor based on a Teager energy ratio in a multi-channel noise suppression system, substantially as shown in and/or described in connection with at least one of the figures, and as set forth more completely in the claims.
The accompanying drawings, which are incorporated herein and form part of the specification, illustrate embodiments of the present invention and, together with the description, further serve to explain the principles involved and to enable a person skilled in the relevant art(s) to make and use the disclosed technologies.
The features and advantages of the disclosed technologies will become more apparent from the detailed description set forth below when taken in conjunction with the drawings, in which like reference characters identify corresponding elements throughout. In the drawings, like reference numbers generally indicate identical, functionally similar, and/or structurally similar elements. The drawing in which an element first appears is indicated by the leftmost digit(s) in the corresponding reference number.
The following detailed description refers to the accompanying drawings that illustrate example embodiments of the present invention. However, the scope of the present invention is not limited to these embodiments, but is instead defined by the appended claims. Thus, embodiments beyond those shown in the accompanying drawings, such as modified versions of the illustrated embodiments, may nevertheless be encompassed by the present invention.
References in the specification to “one embodiment,” “an embodiment,” “an example embodiment,” or the like, indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Furthermore, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to implement such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.
Various approaches are described herein for, among other things, providing noise estimation using an adaptive smoothing factor based on a Teager energy ratio in a multi-channel noise suppression system. A Teager energy ratio is a ratio of an average Teager energy operator (TEO) energy of a first signal to an average TEO energy of a second signal.
The average TEO energy of a signal is defined by the equation:
In Equation 1, Ēsignal represents the average TEO energy of the signal x(n), and N represents the number of samples (a.k.a. frames) of the signal x(n). N may be any positive integer (e.g., 3, 10, 51, 80, 152, etc.).
In accordance with the noise suppression techniques described herein, the average TEO energies of the respective first and second signals are calculated using Equation 1. The average TEO energy of the first signal is divided by the average TEO energy of the second signal to provide a ratio of the average TEO energy of the first signal to the average TEO energy of the second signal.
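By way of non-limiting illustration, the two calculations described above may be sketched in Python as follows. Equation 1 is not reproduced in this text, so the sketch assumes the commonly used discrete Teager energy operator, x(n)² − x(n−1)·x(n+1), averaged over the interior samples of a frame. The function names, the skipping of the two edge samples, the minimum frame length of three samples, and the small `eps` guard against division by zero are illustrative assumptions and are not taken from the described embodiments.

```python
def average_teo_energy(x):
    """Average Teager energy of one frame (assumed form of Equation 1)."""
    if len(x) < 3:
        raise ValueError("need at least three samples")
    # Assumed discrete TEO at sample n: x[n]^2 - x[n-1] * x[n+1].
    # The first and last samples are skipped because each lacks a neighbour.
    teo = [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]
    return sum(teo) / len(teo)


def teager_energy_ratio(first, second, eps=1e-12):
    """Ratio of the average TEO energy of `first` to that of `second`."""
    # eps is a hypothetical guard so that a silent second frame does not
    # cause a division by zero.
    return average_teo_energy(first) / (average_teo_energy(second) + eps)
```

Under this assumed operator, a pure tone of amplitude A and normalized angular frequency ω has a constant Teager energy of A²·sin²(ω), so doubling the amplitude of the first signal while leaving the second unchanged multiplies the ratio by four.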
In accordance with some example embodiments, the first signal is a primary signal that is received at a primary sensor (e.g., a primary microphone), and the second signal is a reference signal that is received at a reference sensor (e.g., a reference microphone). For instance, these embodiments may process the primary signal based on the ratio of the average TEO energy of the primary signal to the average TEO energy of the reference signal to provide a speech signal that includes less noise than the primary signal.
In accordance with other example embodiments, the first signal is a speech signal, and the second signal is a noise signal. For instance, these embodiments may process the speech signal based on the ratio of the average TEO energy of the speech signal to the average TEO energy of the noise signal to provide an output signal that includes less noise than the speech signal.
An example system is described that includes a first constraint module, a second constraint module, an adaptive speech filter, and an adaptive noise filter. The first constraint module is configured to determine a value of a first speech indicator to indicate whether a primary signal includes speech according to a first determination technique. The second constraint module is configured to determine a value of a second speech indicator to indicate whether the primary signal includes speech according to a second determination technique that is different from the first determination technique. At least one of the first constraint module or the second constraint module is configured to utilize a ratio of an average TEO energy of the primary signal to an average TEO energy of a reference signal to determine a respective at least one of the first speech indicator or the second speech indicator. The adaptive speech filter is configured to filter the primary signal based on the first speech indicator and a noise signal to provide a speech signal. The adaptive noise filter is configured to filter the reference signal based on the second speech indicator and the speech signal to provide the noise signal.
Another example system is described that includes an energy calculator, a factor calculator, and a single-channel noise suppressor. The energy calculator is configured to calculate an average TEO energy of a speech signal and an average TEO energy of a noise signal. The energy calculator is further configured to calculate a ratio of the average TEO energy of the speech signal to the average TEO energy of the noise signal. The factor calculator is configured to calculate an adaptive smoothing factor that is based on the ratio. The single-channel noise suppressor is configured to estimate a noise power spectrum of the speech signal based on the smoothing factor.
Yet another example system is described that includes the first and second example systems. For instance, an output of the first example system may be coupled to an input of the second example system, such that the second example system estimates the noise power spectrum of the speech signal that is provided by the first example system.
An example method is described for suppressing noise. In accordance with this example method, a value of a first speech indicator is determined to indicate whether a primary signal includes speech using a first determination technique. A value of a second speech indicator is determined to indicate whether the primary signal includes speech using a second determination technique. The second determination technique is different from the first determination technique. At least one of the first determination technique or the second determination technique utilizes a ratio of an average TEO energy of the primary signal to an average TEO energy of a reference signal. The primary signal is filtered using an asymmetric crosstalk resistant adaptive noise canceller (ACTRANC) based on the first speech indicator and a noise signal to provide a speech signal. The reference signal is filtered using the ACTRANC based on the second speech indicator and the speech signal to provide the noise signal.
Another example method is described for suppressing noise. In accordance with this example method, an average TEO energy of a speech signal is calculated. An average TEO energy of a noise signal is calculated. A ratio of the average TEO energy of the speech signal to the average TEO energy of the noise signal is calculated. An adaptive smoothing factor is determined that is based on the ratio. A noise power spectrum of the speech signal is estimated based on the smoothing factor.
The noise suppression techniques described herein have a variety of benefits as compared to conventional noise suppression techniques. For instance, the techniques described herein may reduce distortion of a primary or speech signal and/or suppress noise (e.g., background noise, babble noise, etc.) that is associated with the primary or speech signal more than conventional techniques. The use of multiple constraint modules having different decision rules may increase the accuracy of determinations regarding whether a primary signal and/or a reference signal includes speech. For instance, the constraint modules may provide more accurate determinations than voice activity detectors (VADs) that are often included in conventional noise suppression systems.
Using an adaptive smoothing factor that is based on a Teager energy ratio to estimate noise may allow for continuous updating of the noise power spectrum frame-by-frame (e.g., regardless of whether the frames include speech), rather than updating only during speech-inactive periods as is common with VADs. Speech-inactive periods are periods during which speech does not occur. Accordingly, using such an adaptive smoothing factor may avoid errors that are commonly introduced by VADs because changes in the noise may continue to be tracked during active speech periods. Comparing speech and noise signals at an output of an ACTRANC, for example, rather than using a VAD or comparing primary and reference signals at an input of the ACTRANC, to determine the smoothing factor may provide more accurate detection of speech in situations that are characterized by weak speech, low input signal-to-noise ratios (SNRs), and/or substantial speech leakage to the reference sensor. Moreover, using TEO energy may enhance the discriminability between speech and noise signals.
By positioning primary sensor 104 so that it is closer to the user's mouth than reference sensor 106 during regular use, a magnitude of the user's speech that is detected by primary sensor 104 is likely to be greater than a magnitude of the user's speech that is detected by reference sensor 106. Furthermore, a magnitude of background noise that is detected by primary sensor 104 is likely to be less than a magnitude of the background noise that is detected by reference sensor 106. Example techniques for suppressing noise with respect to a user's speech are described in greater detail in the following discussion.
Primary sensor 104 and reference sensor 106 are shown to be positioned on the respective front and back portions of wireless communication device 102 in respective
One reference sensor 106 is shown in
As shown in
ACTRANC 304 is configured to process the primary signal P(n) and the reference signal R(n) to provide the speech signal e1(n) and a noise signal e2(n). ACTRANC 304 includes a delay module 308, an adaptive speech filter 310A, and an adaptive noise filter 310B. Delay module 308 is configured to delay the primary signal P(n) with respect to the reference signal R(n). For example, leakage of the speech component of the primary signal P(n) onto the reference signal R(n) may not occur instantaneously. In accordance with this example, leakage of the speech component of the primary signal P(n) onto the reference signal R(n) may be delayed by a time period that corresponds to a difference between a duration of time it takes for the primary signal P(n) to travel from a user's mouth to primary sensor 302A and a duration of time it takes for the primary signal P(n) to travel from the user's mouth to reference sensor 302B.
Adaptive speech filter 310A is configured to filter the primary signal P(n) based on the noise signal e2(n) and a first speech indicator that is received from first constraint module 306A to provide the speech signal e1(n). Accordingly, adaptive speech filter 310A adaptively removes noise from the speech signal e1(n). Adaptive speech filter 310A includes a combiner 312A and a first filter module 314A. Combiner 312A subtracts a first intermediate signal y1(n) from the primary signal P(n) to provide the speech signal e1(n). First filter module 314A manipulates the noise signal e2(n) based on the speech signal e1(n) and the first speech indicator to provide the first intermediate signal y1(n).
First filter module 314A may be configured to determine whether to update coefficient(s) of a transfer function of first filter module 314A based on a value of the first speech indicator. For example, if the first speech indicator has a first value, first filter module 314A updates the coefficient(s) of its transfer function. In accordance with this example, if the first speech indicator has a second value, first filter module 314A does not update the coefficient(s) of its transfer function. For instance, the first value may indicate that the primary signal P(n) does not include speech, and the second value may indicate that the primary signal P(n) includes speech. In accordance with an example embodiment, first filter module 314A updates the coefficient(s) of its transfer function if and only if the value of the first speech indicator indicates that the primary signal P(n) does not include speech.
A volume change or a change of the user's distance from primary sensor 302A may affect whether the coefficient(s) of the transfer function are updated. For instance, if the volume of the user's speech decreases or the distance from the user's mouth to primary sensor 302A increases, first filter module 314A may increase the coefficient(s) of the transfer function.
Adaptive noise filter 310B is configured to filter the reference signal R(n) based on the speech signal e1(n) and a second speech indicator that is received from second constraint module 306B to provide the noise signal e2(n). Accordingly, adaptive noise filter 310B adaptively removes speech from the noise signal e2(n). Adaptive noise filter 310B includes a combiner 312B and a second filter module 314B. Combiner 312B subtracts a second intermediate signal y2(n) from the reference signal R(n) to provide the noise signal e2(n). Second filter module 314B manipulates the speech signal e1(n) based on the noise signal e2(n) and the second speech indicator to provide the second intermediate signal y2(n). For instance, second filter module 314B may be configured to reduce and/or eliminate crosstalk with respect to the primary signal.
Second filter module 314B may be configured to determine whether to update coefficient(s) of a transfer function of second filter module 314B based on a value of the second speech indicator. For example, if the second speech indicator has a third value, second filter module 314B updates the coefficient(s) of its transfer function. In accordance with this example, if the second speech indicator has a fourth value, second filter module 314B does not update the coefficient(s) of its transfer function. For instance, the third value may indicate that the primary signal P(n) includes speech, and the fourth value may indicate that the primary signal P(n) does not include speech. In accordance with an example embodiment, second filter module 314B updates the coefficient(s) of its transfer function if and only if the value of the second speech indicator indicates that the primary signal P(n) includes speech.
First filter module 314A and second filter module 314B may be configured to update coefficients of their transfer functions using any suitable technique, including but not limited to a normalized least mean square technique, a recursive least square technique, an adaptive filtering technique that utilizes an adaptive step size, etc. For instance, using an adaptive step size may increase the rate of convergence for updating the coefficients. In an example embodiment, a normalized least mean square technique is used with a filter length of sixty-four samples and step sizes of 0.009 and 0.01 for the respective first and second filter modules 314A and 314B, though the example embodiments are not limited in this respect.
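By way of non-limiting illustration, the cross-coupled arrangement of adaptive speech filter 310A and adaptive noise filter 310B may be sketched in Python as follows, using the normalized least mean square update with the example filter length and step sizes given above. Several details are assumptions made only for this sketch: each filter module operates on past samples of the other output (which avoids a same-sample dependency between e1(n) and e2(n)), delay module 308 is omitted, the speech indicators are passed in as per-sample booleans where True means "update the coefficients", and the function and parameter names are hypothetical.

```python
def actranc(primary, reference, update_speech, update_noise,
            length=64, mu1=0.009, mu2=0.01, eps=1e-8):
    """Cross-coupled two-filter canceller with gated NLMS updates (sketch)."""
    w1 = [0.0] * length   # first filter module: noise signal e2 -> y1
    w2 = [0.0] * length   # second filter module: speech signal e1 -> y2
    x1 = [0.0] * length   # past samples of e2, newest first (assumption)
    x2 = [0.0] * length   # past samples of e1, newest first (assumption)
    e1, e2 = [], []
    for n in range(len(primary)):
        y1 = sum(w * x for w, x in zip(w1, x1))
        y2 = sum(w * x for w, x in zip(w2, x2))
        s = primary[n] - y1      # combiner 312A: speech signal e1(n)
        v = reference[n] - y2    # combiner 312B: noise signal e2(n)
        if update_speech[n]:     # first indicator: primary has no speech
            g = mu1 * s / (sum(x * x for x in x1) + eps)
            w1 = [w + g * x for w, x in zip(w1, x1)]
        if update_noise[n]:      # second indicator: primary has speech
            g = mu2 * v / (sum(x * x for x in x2) + eps)
            w2 = [w + g * x for w, x in zip(w2, x2)]
        # Shift the newest outputs into the opposite filter's history.
        x1 = [v] + x1[:-1]
        x2 = [s] + x2[:-1]
        e1.append(s)
        e2.append(v)
    return e1, e2
```

In this sketch the two gates are independent, so a frame judged to be noise-only adapts only the speech-path filter, and a frame judged to contain speech adapts only the noise-path filter, mirroring the complementary update rules described for filter modules 314A and 314B.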
First constraint module 306A is configured to process the primary signal P(n) and the reference signal R(n) in accordance with a first technique to determine whether the primary signal P(n) includes speech. Upon making the determination, first constraint module 306A provides the first speech indicator to first filter module 314A for processing as described above. The value of the first speech indicator indicates whether the primary signal P(n) includes speech, as determined in accordance with the first technique. Further detail regarding example functionality and structure of first constraint module 306A is described below with reference to respective
Second constraint module 306B is configured to process the primary signal P(n) and potentially the reference signal R(n) in accordance with a second technique to determine whether the primary signal P(n) includes speech. Upon making the determination, second constraint module 306B provides a second speech indicator to second filter module 314B for processing as described above. The value of the second speech indicator indicates whether the primary signal P(n) includes speech, as determined in accordance with the second technique. Further detail regarding example functionality and structure of second constraint module 306B is described below with reference to
As shown in
At step 404, a value of a second speech indicator is determined to indicate whether the primary signal includes speech using a second determination technique that is different from the first determination technique. At least one of the first determination technique or the second determination technique utilizes a ratio of an average Teager energy operator (TEO) energy of the primary signal to an average TEO energy of a reference signal. In an example implementation, second constraint module 306B determines the value of the second speech indicator to indicate whether the primary signal P(n) includes speech using the second determination technique.
At step 406, the primary signal is filtered using an asymmetric crosstalk resistant adaptive noise canceller based on the first speech indicator and a noise signal to provide a speech signal. In an example implementation, ACTRANC 304 filters the primary signal. For instance, adaptive speech filter 310A may filter the primary signal P(n) based on the first speech indicator and noise signal e2(n) to provide speech signal e1(n).
At step 408, the reference signal is filtered using the asymmetric crosstalk resistant adaptive noise canceller based on the second speech indicator and the speech signal to provide the noise signal. In an example implementation, ACTRANC 304 filters the reference signal. For instance, adaptive noise filter 310B may filter reference signal R(n) based on the second speech indicator and the speech signal e1(n) to provide the noise signal e2(n).
As shown in
where P(n) represents the primary signal, and N represents the number of samples of the primary signal P(n). In an example implementation, energy calculator 602 calculates the average TEO energy of the primary signal.
At step 504, an average TEO energy of a reference signal is calculated. For example, using Equation 1, the average TEO energy of the reference signal may be represented by the equation:
where R(n) represents the reference signal, and N represents the number of samples of the reference signal R(n). In an example implementation, energy calculator 602 calculates the average TEO energy of the reference signal.
At step 506, a ratio of the average TEO energy of the primary signal to the average TEO energy of the reference signal is calculated. For example, the ratio of the average TEO energy of the primary signal to the average TEO energy of the reference signal may be represented by the equation:
In an example implementation, energy calculator 602 calculates the ratio of the average TEO energy of the primary signal to the average TEO energy of the reference signal.
At step 508, a determination is made whether the ratio is less than a noise threshold. A noise threshold is a representative magnitude below which speech is considered to be absent from a signal. For example, the ratio being less than the noise threshold may indicate that the primary signal does not include speech. In accordance with this example, the ratio being greater than the noise threshold may indicate that the primary signal includes speech. In an example implementation, comparison module 604 determines whether the ratio is less than the noise threshold. If the ratio is less than the noise threshold, flow continues to step 510. Otherwise, flow continues to step 512.
At step 510, a speech indicator having a first value is provided to an adaptive speech filter. The first value indicates that filter coefficient(s) of a transfer function of the adaptive speech filter are to be updated. In an example implementation, indicator module 606 provides the speech indicator to the adaptive speech filter. For instance, indicator module 606 may determine that the speech indicator is to have the first value in response to the primary signal not including speech.
At step 512, a speech indicator having a second value is provided to an adaptive speech filter. The second value indicates that filter coefficient(s) of a transfer function of the adaptive speech filter are not to be updated. The second value is different from the first value. In an example implementation, indicator module 606 provides the speech indicator to the adaptive speech filter. For instance, indicator module 606 may determine that the speech indicator is to have the second value in response to the primary signal including speech.
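By way of non-limiting illustration, the decision made in steps 508 through 512 may be sketched in Python as follows. The boolean encoding of the first and second values (True for "update the coefficients", False for "do not update") and the function names are assumptions made only for this sketch; the ratio is assumed to have been calculated already, as in step 506.

```python
def first_speech_indicator(ratio, noise_threshold):
    """Return True (first value: update) or False (second value: hold)."""
    if ratio < noise_threshold:   # step 508
        return True               # step 510: primary treated as noise only
    return False                  # step 512: primary treated as speech


def gate_updates(ratios, noise_threshold):
    """Per-frame update flags for the adaptive speech filter."""
    return [first_speech_indicator(r, noise_threshold) for r in ratios]
```

A ratio that exactly equals the noise threshold falls into the "otherwise" branch of step 508 and therefore yields the second value in this sketch.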
In an example embodiment, first constraint module 600 is configured to compare the ratio to a leakage threshold. The leakage threshold denotes the amount of the speech component of the primary signal that leaks onto the reference signal. In accordance with this example embodiment, first constraint module 600 is further configured to update the noise threshold to take into consideration a first proportion of the ratio if the ratio is less than the leakage threshold and to take into consideration a second proportion of the ratio if the ratio is greater than the leakage threshold. The second proportion is different from the first proportion.
For example, the noise threshold may be updated in accordance with Equations 5 and 6 below if the ratio is less than the leakage threshold.
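$\bar{E}_{n_{thresh}}^{new} = \alpha \times (\bar{E}_{n}$ … (Equation 5)
$\bar{E}_{n_{thresh}} = \rho \times \bar{E}_{n_{thresh}}^{new}$ (Equation 6)
where $\bar{E}_{n}$ …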
In accordance with this example, the noise threshold may be updated in accordance with Equations 7 and 8 below if the ratio is greater than the leakage threshold.
$\bar{E}_{n_{thresh}}^{new} = \beta \times (\bar{E}_{n}$ … (Equation 7)
$\bar{E}_{n_{thresh}} = \rho \times \bar{E}_{n_{thresh}}^{new}$ (Equation 8)
where 0<β<1. In accordance with one example implementation, β=0.999, though the scope of the example embodiments is not limited in this respect.
As shown in
At step 704, a determination is made whether the average TEO energy of the primary signal is greater than a primary threshold. For example, the average TEO energy of the primary signal being greater than the primary threshold may indicate that the primary signal includes speech. In accordance with this example, the average TEO energy of the primary signal being less than the primary threshold may indicate that the primary signal does not include speech. In an example implementation, comparison module 804 determines whether the average TEO energy of the primary signal is greater than the primary threshold. If the average TEO energy of the primary signal is greater than the primary threshold, flow continues to step 706. Otherwise, flow continues to step 718.
In an example embodiment, second constraint module 800 is configured to update the primary threshold to take into consideration the average TEO energy of the primary signal. For example, the primary threshold may be updated in accordance with Equation 9 below.
$\bar{E}_{p_{thresh}}^{new} = \alpha_{TG} \times (\bar{E}_{p}$ … (Equation 9)
where $\bar{E}_{p}$ …
At step 706, an average TEO energy of a reference signal is calculated. In an example implementation, energy calculator 802 calculates the average TEO energy of the reference signal.
At step 708, a ratio of the average TEO energy of the primary signal to the average TEO energy of the reference signal is calculated. In an example implementation, energy calculator 802 calculates the ratio of the average TEO energy of the primary signal to the average TEO energy of the reference signal.
At step 710, a determination is made whether the ratio is greater than a speech threshold. A speech threshold is a representative magnitude above which a signal is considered to include speech. For example, the ratio being greater than the speech threshold may indicate that the primary signal includes speech. In accordance with this example, the ratio being less than the speech threshold may indicate that the primary signal does not include speech. In an example implementation, comparison module 804 determines whether the ratio is greater than the speech threshold. If the ratio is greater than the speech threshold, flow continues to step 712. Otherwise, flow continues to step 718.
In an example embodiment, second constraint module 800 is configured to update the speech threshold to take into consideration a first proportion of the ratio if the ratio is less than a leakage threshold and to take into consideration a second proportion of the ratio if the ratio is greater than the leakage threshold. The second proportion is different from the first proportion.
For example, the speech threshold may be updated in accordance with Equations 10 and 11 below if the ratio is less than the leakage threshold.
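$\bar{E}_{s_{thresh}}^{new} = \alpha \times (\bar{E}_{s}$ … (Equation 10)
$\bar{E}_{s_{thresh}} = \rho \times \bar{E}_{s_{thresh}}^{new}$ (Equation 11)
where $\bar{E}_{s}$ …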
In accordance with this example, the speech threshold may be updated in accordance with Equations 12 and 13 below if the ratio is greater than the leakage threshold.
$\bar{E}_{s_{thresh}}^{new} = \beta \times (\bar{E}_{s}$ … (Equation 12)
$\bar{E}_{s_{thresh}} = \rho \times \bar{E}_{s_{thresh}}^{new}$ (Equation 13)
where 0<β<1. In accordance with one example implementation, β=0.999, though the scope of the example embodiments is not limited in this respect.
At step 712, a maximum correlation is determined between the primary signal and instances of the reference signal that correspond to respective time instances that include a time instance to which the primary signal corresponds. In an example implementation, correlation module 806 determines the maximum correlation between the primary signal and the instances of the reference signal. An example technique to determine a maximum correlation between a primary signal and instances of a reference signal is described below with reference to
At step 714, a determination is made whether the maximum correlation is greater than a correlation threshold. For example, the maximum correlation being greater than the correlation threshold may indicate that the primary signal includes speech. In accordance with this example, the maximum correlation being less than the correlation threshold may indicate that the primary signal does not include speech. In one example embodiment, the correlation threshold is equal to 0.65, though the scope of the example embodiments is not limited in this respect. In an example implementation, comparison module 804 determines whether the maximum correlation is greater than the correlation threshold. If the maximum correlation is greater than the correlation threshold, flow continues to step 716. Otherwise, flow continues to step 718.
At step 716, a speech indicator having a first value is provided to an adaptive noise filter. The first value indicates that filter coefficient(s) of a transfer function of the adaptive noise filter are to be updated. In an example implementation, indicator module 808 provides the speech indicator to the adaptive noise filter. For instance, indicator module 808 may determine that the speech indicator is to have the first value in response to the primary signal including speech.
At step 718, a speech indicator having a second value is provided to an adaptive noise filter. The second value indicates that filter coefficient(s) of a transfer function of the adaptive noise filter are not to be updated. In an example implementation, indicator module 808 provides the speech indicator to the adaptive noise filter. For instance, indicator module 808 may determine that the speech indicator is to have the second value in response to the primary signal not including speech.
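By way of non-limiting illustration, the cascade of determinations in steps 704 through 718 may be sketched in Python as follows. The boolean encoding of the indicator (True for the first value, False for the second value), the `eps` guard, the function name, and the use of a callable for the correlation are assumptions made only for this sketch. The callable is used so that the maximum correlation is evaluated only when the two earlier determinations have passed, as in the flow described above.

```python
def second_speech_indicator(primary_energy, reference_energy, correlation_fn,
                            primary_threshold, speech_threshold,
                            correlation_threshold=0.65, eps=1e-12):
    """Return True (first value: update) or False (second value: hold)."""
    if primary_energy <= primary_threshold:            # step 704
        return False                                   # step 718
    ratio = primary_energy / (reference_energy + eps)  # steps 706 and 708
    if ratio <= speech_threshold:                      # step 710
        return False                                   # step 718
    if correlation_fn() <= correlation_threshold:      # steps 712 and 714
        return False                                   # step 718
    return True                                        # step 716
```

All three determinations must indicate speech before the coefficients of the adaptive noise filter are allowed to update in this sketch, which makes the second determination technique stricter than a single ratio comparison.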
In some example embodiments, one or more of steps 702, 704, 706, 708, 710, 712, 714, 716, and/or 718 of flowchart 700 may not be performed. Moreover, steps in addition to or in lieu of steps 702, 704, 706, 708, 710, 712, 714, 716, and/or 718 may be performed.
It will be recognized that second constraint module 800 may not include one or more of energy calculator 802, comparison module 804, correlation module 806, and/or indicator module 808. Furthermore, second constraint module 800 may include modules in addition to or in lieu of energy calculator 802, comparison module 804, correlation module 806, and/or indicator module 808.
The correlations that correspond to the respective instances 902A-902N of the reference signal R(n) are compared to determine the maximum correlation between the primary signal and the instances 902A-902N. For instance, the maximum correlation may be compared to a correlation threshold to determine whether filter coefficient(s) of a transfer function of an adaptive noise filter are to be updated, as described above in step 714 of flowchart 700.
Example Matlab® code for implementing the example technique described with reference to
In this example code, fstart denotes the start of the current frame, and fend denotes the end of the current frame. SL and SR determine the length of a sliding window through which the reference signal R(n) is incremented. In an example embodiment, SL=−8, and SR=8. However, these example values are provided for illustrative purposes and are not intended to be limiting. It will be recognized that SL and SR may be any suitable values.
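For illustrative purposes, the sliding-window search for the maximum correlation may be sketched in Python as follows. This is not the example code referred to above. The function names, the use of a normalized (signed) correlation, and the handling of shifts that fall outside the reference signal are assumptions; fstart, fend, SL, and SR follow the meanings given above, with fstart and fend treated as inclusive sample indices.

```python
import math


def normalized_correlation(a, b):
    """Normalized cross-correlation of two equal-length sequences (0.0 if either is silent)."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0.0 else 0.0


def max_shifted_correlation(primary, reference, fstart, fend, sl=-8, sr=8):
    """Return (maximum correlation, shift) over reference instances shifted by sl..sr samples."""
    frame = primary[fstart:fend + 1]
    best_corr, best_shift = 0.0, None
    for shift in range(sl, sr + 1):
        lo, hi = fstart + shift, fend + shift
        if lo < 0 or hi >= len(reference):
            continue  # this instance would fall outside the reference signal
        corr = normalized_correlation(frame, reference[lo:hi + 1])
        if best_shift is None or corr > best_corr:
            best_corr, best_shift = corr, shift
    return best_corr, best_shift
```

The returned maximum may then be compared to a correlation threshold, as described above with reference to step 714 of flowchart 700.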
The technique depicted in
As shown in
As shown in
where e1(n) represents the speech signal, and N represents the number of samples of the speech signal e1(n). In an example embodiment, the sampling rate is eight kilohertz (kHz), though the scope of the example embodiments is not limited in this respect. The sampling rate may be any suitable rate. In an example implementation, energy calculator 1002 calculates the average TEO energy of the speech signal.
At step 1104, an average TEO energy of a noise signal is calculated. For example, using Equation 1, the average TEO energy of the noise signal may be represented by the equation:
where e2(n) represents the noise signal, and N represents the number of samples of the noise signal e2(n). In an example implementation, energy calculator 1002 calculates the average TEO energy of the noise signal.
At step 1106, a ratio of the average TEO energy of the speech signal to the average TEO energy of the noise signal is calculated. For example, the ratio of the average TEO energy of the speech signal to the average TEO energy of the noise signal may be represented by the equation:
In an example implementation, energy calculator 1002 calculates the ratio of the average TEO energy of the speech signal to the average TEO energy of the noise signal.
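By way of example, steps 1102, 1104, and 1106 may be sketched as follows, assuming the standard discrete Teager energy operator, psi[n] = x[n]^2 - x[n-1]*x[n+1], averaged over the interior samples of a frame. The function names, the treatment of frame edges, and the small constant eps are assumptions made for this sketch and may differ from the equations referred to above.

```python
def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def average_teo_energy(x):
    """Mean of the TEO output over the interior samples of a frame."""
    psi = teager_energy(x)
    return sum(psi) / len(psi) if psi else 0.0


def teo_ratio(speech, noise, eps=1e-12):
    """Ratio of average TEO energies (steps 1102 to 1106); eps avoids division by zero."""
    return average_teo_energy(speech) / (average_teo_energy(noise) + eps)
```

Because individual TEO samples may be negative for some wideband signals, a practical implementation may need additional safeguards that this sketch omits.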
At step 1108, an adaptive smoothing factor that is based on the ratio is calculated. In an example implementation, factor calculator 1004 calculates the adaptive smoothing factor.
At step 1110, a noise power spectrum of the speech signal is estimated based on the smoothing factor. In an example implementation, single-channel noise suppressor 1008 estimates the noise power spectrum of the speech signal.
Sub-band module 1006 is configured to divide the speech signal into a plurality of sub-bands. For instance, each sub-band may correspond to a respective frame of the speech signal. Any one or more of the sub-bands may include speech. Speech may be absent from any one or more of the sub-bands. In accordance with an example embodiment, single-channel noise suppressor 1008 is configured to determine a plurality of noise power estimates that corresponds to the plurality of respective sub-bands based on the smoothing factor. In further accordance with this example embodiment, single-channel noise suppressor 1008 is configured to combine the plurality of noise power estimates to estimate the noise power spectrum of the speech signal. It will be recognized that factor calculator 1004 may calculate the smoothing factor in full-band or in sub-bands. For instance, the smoothing factor may include a plurality of sub-factors that corresponds to the plurality of sub-bands. In accordance with another example embodiment, multi-channel post processor 1000 does not include sub-band module 1006.
As shown in
In this example code, function [z] represents curve 1202. In an example embodiment, noise_thres=0.1, speech_thres=10, lower_thres=0.5, upper_thres=0.9999, alpha=0.4966, and beta=0.07. However, these example values are provided for illustrative purposes and are not intended to be limiting. It will be recognized that noise_thres, speech_thres, lower_thres, upper_thres, alpha, and beta may be any suitable values. For instance, the values may depend on an extent of leakage of the speech signal onto the noise signal. Moreover, curve 1202 is provided for illustrative purposes and is not intended to be limiting. It will be recognized that the smoothing factor may be related to the ratio of the speech signal to the noise signal in any suitable manner. For instance, the smoothing factor may be linearly related to the ratio with respect to a range of values of the ratio.
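For illustration, one possible mapping from the ratio to the smoothing factor may be sketched as follows. This sketch does not reproduce curve 1202. It assumes the linear relationship mentioned above between noise_thres and speech_thres, and it assumes that the factor is held at lower_thres below noise_thres and at upper_thres above speech_thres. The parameters alpha and beta are omitted because their role in curve 1202 cannot be determined from the text.

```python
def smoothing_factor(ratio, noise_thres=0.1, speech_thres=10.0,
                     lower_thres=0.5, upper_thres=0.9999):
    """Map a TEO energy ratio to a smoothing factor in [lower_thres, upper_thres].

    Assumed shape: clamped outside [noise_thres, speech_thres], linear in between.
    """
    if ratio <= noise_thres:
        return lower_thres  # noise-dominated frame: assumed fastest adaptation
    if ratio >= speech_thres:
        return upper_thres  # speech-dominated frame: estimate is nearly frozen
    frac = (ratio - noise_thres) / (speech_thres - noise_thres)
    return lower_thres + frac * (upper_thres - lower_thres)
```

Under these assumptions, a factor near one leaves a recursively smoothed noise estimate almost unchanged, whereas a smaller factor lets the estimate follow the current frame more closely.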
As shown in
At step 1304, a second noise power estimate is determined based on the smoothing factor. The second noise power estimate corresponds to a second portion of the speech signal that does not include speech. In an example implementation, noise power estimator 1402 determines the second noise power estimate.
At step 1306, the first noise power estimate and the second noise power estimate are combined to estimate a noise power spectrum of the speech signal. In an example implementation, estimate combiner 1404 combines the first noise power estimate and the second noise power estimate to estimate the noise power spectrum of the speech signal.
The noise power spectrum of a speech signal may be estimated using a ratio of an average Teager energy operator (TEO) energy of the speech signal to an average TEO energy of a noise signal in any of a variety of ways. In accordance with one example technique for estimating the noise power spectrum, let x(n) and d(n) denote a speech signal and an uncorrelated additive noise signal, respectively, where n is a discrete-time index. The observed noisy signal y(n) is defined as the sum of the speech and uncorrelated additive noise signals. Accordingly, y(n) may be represented by the equation:
y(n)=x(n)+d(n). (Equation 17)
The observed noisy signal y(n) is divided into overlapping frames by the application of a window function and analyzed using a short-time Fourier transform (STFT) in accordance with the following equation:
In Equation 18, k is a frequency bin index that indicates a designated sub-band of the observed noisy signal y(n); l is a time frame index that indicates a designated frame of the observed noisy signal y(n); h is an analysis window of size N; and M is a frame update step in time. Two hypotheses, H0(k,l) and H1(k,l), respectively indicate speech absence (i.e., VAD=0) and speech presence (i.e., VAD=1) in the lth frame of the kth sub-band of the observed noisy signal y(n). These hypotheses may be defined in accordance with Equations 19 and 20.
$H_0(k,l):\ Y(k,l) = D(k,l)$ (Equation 19)
$H_1(k,l):\ Y(k,l) = X(k,l) + D(k,l)$ (Equation 20)
In Equations 19 and 20, X(k,l) and D(k,l) represent the STFTs of the respective clean and noise signals. The variance of the noise in the kth sub-band may be denoted as:
$\lambda_d(k,l) = E[|D(k,l)|^2]$, (Equation 21)
where $E[|D(k,l)|^2]$ represents the expectation (i.e., estimate) of the energy of the noise signal.
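By way of illustration only, the framing, windowing, and transform steps that produce Y(k,l), together with the per-bin energy |Y(k,l)|^2, may be sketched as follows. The Hann window, the direct (non-fast) discrete Fourier transform, and the function names are assumptions made for this sketch; size and hop stand in for the window size N and the frame update step M.

```python
import cmath
import math


def hann(size):
    """Periodic Hann analysis window h of the given size (an assumed window choice)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / size) for n in range(size)]


def stft(y, size, hop):
    """Return frames[l][k], the transform of the l-th windowed frame at frequency bin k."""
    h = hann(size)
    frames = []
    for start in range(0, len(y) - size + 1, hop):
        segment = [y[start + n] * h[n] for n in range(size)]
        frames.append([
            sum(segment[n] * cmath.exp(-2j * math.pi * n * k / size) for n in range(size))
            for k in range(size)
        ])
    return frames


def periodogram(frame):
    """|Y(k,l)|^2 for every frequency bin k of one frame."""
    return [abs(v) ** 2 for v in frame]
```

A practical implementation would typically use a fast Fourier transform; the direct sum is used here only to keep the sketch self-contained.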
One technique that may be used to estimate the noise power spectrum of the input signal is to apply temporal recursive smoothing to the noisy measurement during periods of speech absence. Such a technique may be described using Equations 22 and 23.
$H_0'(k,l):\ \hat{\lambda}_d(k,l+1) = \alpha_d \hat{\lambda}_d(k,l) + (1 - \alpha_d)|Y(k,l)|^2$ (Equation 22)
$H_1'(k,l):\ \hat{\lambda}_d(k,l+1) = \hat{\lambda}_d(k,l)$ (Equation 23)
In Equations 22 and 23, αd is a fixed smoothing parameter, 0<αd<1, and H0′ and H1′ designate hypothetical speech absence and presence, respectively. A distinction may be made between the hypotheses defined in Equations 19 and 20, which are used for estimating the clean speech, and the hypotheses defined in Equations 22 and 23, which control the adaptation of the noise spectrum. For instance, the fixed smoothing parameter αd of Equations 22 and 23 may be replaced with an adaptive smoothing factor $f(R_{TEO})$:
$\hat{\lambda}_d(k,l+1) = f(R_{TEO})\,\hat{\lambda}_d(k,l) + (1 - f(R_{TEO}))|Y(k,l)|^2$
where the adaptive smoothing factor $f(R_{TEO})$ is based on the ratio of the average TEO energy of the speech signal to the average TEO energy of the noise signal.
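For illustrative purposes, the temporal recursive smoothing of Equations 22 and 23 may be sketched as follows, assuming that an adaptive smoothing factor simply takes the place of the fixed parameter αd. The function name and the option of passing one factor per sub-band are assumptions made for this sketch.

```python
def update_noise_estimate(noise_est, periodogram, factor):
    """Perform one recursive-smoothing step for every sub-band k.

    noise_est[k] is the current noise power estimate for frame l,
    periodogram[k] is |Y(k,l)|^2, and factor is either a single smoothing
    factor or a list holding one smoothing factor per sub-band.
    """
    if isinstance(factor, (int, float)):
        factor = [factor] * len(noise_est)
    return [a * est + (1.0 - a) * p
            for a, est, p in zip(factor, noise_est, periodogram)]
```

In this sketch, a factor equal to one reproduces the hold behavior of Equation 23, and a factor equal to αd reproduces the update of Equation 22, so a single expression covers both hypotheses when the factor varies with the ratio.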
The example noise suppression techniques described herein may be employed with respect to any suitable noise suppression application, including but not limited to beam forming, adaptive noise cancellation, blind source separation (BSS), etc.
It will be recognized that a wireless communication device (e.g., wireless communication device 102) may include multi-channel noise suppression system 300, including any one or more of primary sensor 302A, reference sensor 302B, ACTRANC 304, first constraint module 306A, second constraint module 306B, delay module 308, adaptive speech filter 310A, adaptive noise filter 310B, combiner 312A, combiner 312B, first filter module 314A, second filter module 314B, energy calculator 602, comparison module 604, indicator module 606, energy calculator 802, comparison module 804, correlation module 806, and/or indicator module 808; and/or multi-channel post processor 1000, including any one or more of energy calculator 1002, factor calculator 1004, sub-band module 1006, single-channel noise suppressor 1008, noise power estimator 1402, and/or estimate combiner 1404. However, the embodiments described herein are not limited to wireless communication devices. For instance, any one or more of the aforementioned elements may be included in a non-wireless communication device.
It will be further recognized that ACTRANC 304, first constraint module 306A, second constraint module 306B, delay module 308, adaptive speech filter 310A, adaptive noise filter 310B, combiner 312A, combiner 312B, first filter module 314A, and second filter module 314B depicted in
For example, ACTRANC 304, first constraint module 306A, second constraint module 306B, delay module 308, adaptive speech filter 310A, adaptive noise filter 310B, combiner 312A, combiner 312B, first filter module 314A, second filter module 314B, energy calculator 602, comparison module 604, indicator module 606, energy calculator 802, comparison module 804, correlation module 806, indicator module 808, energy calculator 1002, factor calculator 1004, sub-band module 1006, single-channel noise suppressor 1008, noise power estimator 1402, and/or estimate combiner 1404 may be implemented as computer program code configured to be executed in one or more processors.
In another example, ACTRANC 304, first constraint module 306A, second constraint module 306B, delay module 308, adaptive speech filter 310A, adaptive noise filter 310B, combiner 312A, combiner 312B, first filter module 314A, second filter module 314B, energy calculator 602, comparison module 604, indicator module 606, energy calculator 802, comparison module 804, correlation module 806, indicator module 808, energy calculator 1002, factor calculator 1004, sub-band module 1006, single-channel noise suppressor 1008, noise power estimator 1402, and/or estimate combiner 1404 may be implemented as hardware logic/electrical circuitry.
For instance,
Computer 1800 also includes a primary or main memory 1808, such as a random access memory (RAM). Main memory 1808 has stored therein control logic 1824A (computer software) and data.
Computer 1800 also includes one or more secondary storage devices 1810. Secondary storage devices 1810 include, for example, a hard disk drive 1812 and/or a removable storage device or drive 1814, as well as other types of storage devices, such as memory cards and memory sticks. For instance, computer 1800 may include an industry standard interface, such as a universal serial bus (USB) interface for interfacing with devices such as a memory stick. Removable storage drive 1814 represents a floppy disk drive, a magnetic tape drive, a compact disk drive, an optical storage device, tape backup, etc.
Removable storage drive 1814 interacts with a removable storage unit 1816. Removable storage unit 1816 includes a computer useable or readable storage medium 1818 having stored therein computer software 1824B (control logic) and/or data. Removable storage unit 1816 represents a floppy disk, magnetic tape, compact disc (CD), digital versatile disc (DVD), Blu-ray disc, optical storage disk, memory stick, memory card, or any other computer data storage device. Removable storage drive 1814 reads from and/or writes to removable storage unit 1816 in a well-known manner.
Computer 1800 also includes input/output/display devices 1804, such as monitors, keyboards, pointing devices, etc. For instance, input/output/display devices 1804 may include a primary sensor (e.g., primary sensor 302A) and/or a reference sensor (e.g., reference sensor 302B).
Computer 1800 further includes a communication or network interface 1820. Communication interface 1820 enables computer 1800 to communicate with remote devices. For example, communication interface 1820 allows computer 1800 to communicate over communication networks or mediums 1822 (representing a form of a computer useable or readable medium), such as local area networks (LANs), wide area networks (WANs), the Internet, cellular networks, etc. Network interface 1820 may interface with remote sites or networks via wired or wireless connections.
Control logic 1824C may be transmitted to and from computer 1800 via the communication medium 1822.
Any apparatus or manufacture comprising a computer useable or readable medium having control logic (software) stored therein is referred to herein as a computer program product or program storage device. This includes, but is not limited to, computer 1800, main memory 1808, secondary storage devices 1810, and removable storage unit 1816. Such computer program products, having control logic stored therein that, when executed by one or more data processing devices, cause such data processing devices to operate as described herein, represent embodiments of the invention.
Devices in which embodiments may be implemented may include storage, such as storage drives, memory devices, and further types of computer-readable media. Examples of such computer-readable storage media include a hard disk, a removable magnetic disk, a removable optical disk, flash memory cards, digital video disks, random access memories (RAMs), read only memories (ROMs), and the like. As used herein, the terms “computer program medium” and “computer-readable medium” are used to generally refer to the hard disk associated with a hard disk drive, a removable magnetic disk, a removable optical disk (e.g., CD-ROMs, DVDs, etc.), zip disks, tapes, magnetic storage devices, micro-electromechanical systems-based (MEMS-based) storage devices, nanotechnology-based storage devices, as well as other media such as flash memory cards, digital video discs, RAM devices, ROM devices, and the like.
Such computer-readable storage media may store program modules that include computer program logic for ACTRANC 304, first constraint module 306A, second constraint module 306B, delay module 308, adaptive speech filter 310A, adaptive noise filter 310B, combiner 312A, combiner 312B, first filter module 314A, second filter module 314B, energy calculator 602, comparison module 604, indicator module 606, energy calculator 802, comparison module 804, correlation module 806, indicator module 808, energy calculator 1002, factor calculator 1004, sub-band module 1006, single-channel noise suppressor 1008, noise power estimator 1402, and/or estimate combiner 1404; flowchart 400 (including any one or more steps of flowchart 400), flowchart 500 (including any one or more steps of flowchart 500), flowchart 700 (including any one or more steps of flowchart 700), flowchart 1100 (including any one or more steps of flowchart 1100), and/or flowchart 1300 (including any one or more steps of flowchart 1300); and/or further embodiments described herein. Some example embodiments are directed to computer program products comprising such logic (e.g., in the form of program code or software) stored on any computer useable medium. Such program code, when executed in one or more processors, causes a device to operate as described herein.
The invention can be put into practice using software, firmware, and/or hardware implementations other than those described herein. Any software, firmware, and hardware implementations suitable for performing the functions described herein can be used.
While various embodiments have been described above, it should be understood that they have been presented by way of example only, and not limitation. It will be understood by those skilled in the relevant art(s) that various changes in form and details may be made to the embodiments described herein without departing from the spirit and scope of the invention as defined in the appended claims. Accordingly, the breadth and scope of the present invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.
This application claims the benefit of U.S. Provisional Application No. 61/254,032, filed Oct. 22, 2009, the entirety of which is incorporated by reference herein.
Number | Date | Country
---|---|---
61254032 | Oct 2009 | US