This application is a National Stage Entry of PCT/JP2014/058961 filed on Mar. 27, 2014, which claims priority from Japanese Patent Application JP2013-083411 filed on Apr. 11, 2013, the contents of all of which are incorporated herein by reference, in their entirety.
The present invention relates to a technique of suppressing noise with a non-stationary component.
In the above technical field, patent literature 1 discloses a technique of reducing wind noise by separating an input acoustic signal into low, middle, and high bands. In patent literature 1, a restored signal in the low band is generated from the middle-band component, a modified acoustic signal for the low band is generated by a weighted sum of the restored signal and the original low-band signal, and a modified acoustic signal for the middle band is generated by reducing the signal level of the middle-band component. Finally, the original high-band signal and the modified acoustic signals for the low and middle bands are combined to generate an enhanced signal.
Patent literature 2 discloses a technique of separating an input sound into low and high bands, and suppressing wind noise included in a low-band noisy speech signal in accordance with the probability of wind noise.
However, both of the techniques described in patent literatures 1 and 2 merely suppress wind noise by reducing the level of the low-band speech signal, which is not an effective way to suppress non-stationary noise such as wind noise. As a result, neither technique can change an input sound into an easy-to-hear sound.
The present invention provides a technique that solves the above-described problem.
One aspect of the present invention provides a signal processing apparatus comprising:
a transformer that transforms an input signal into an amplitude component signal in a frequency domain;
a stationary component estimator that estimates a stationary component signal having a frequency spectrum with a stationary characteristic based on the amplitude component signal in the frequency domain;
a replacement unit that generates a new amplitude component signal using the amplitude component signal obtained by the transformer and the stationary component signal, and replaces the amplitude component signal by the new amplitude component signal; and
an inverse transformer that inversely transforms the new amplitude component signal into an enhanced signal.
Another aspect of the present invention provides a signal processing method comprising:
transforming an input signal into an amplitude component signal in a frequency domain;
estimating a stationary component signal having a frequency spectrum with a stationary characteristic based on the amplitude component signal in the frequency domain;
generating a new amplitude component signal using the amplitude component signal obtained in the transform and the stationary component signal, and replacing the amplitude component signal by the new amplitude component signal; and
inversely transforming the new amplitude component signal into an enhanced signal.
Still another aspect of the present invention provides a signal processing program for causing a computer to execute a method comprising:
transforming an input signal into an amplitude component signal in a frequency domain;
estimating a stationary component signal having a frequency spectrum with a stationary characteristic based on the amplitude component signal in the frequency domain;
generating a new amplitude component signal using the amplitude component signal obtained in the transform and the stationary component signal, and replacing the amplitude component signal by the new amplitude component signal; and
inversely transforming the new amplitude component signal into an enhanced signal.
According to the present invention, it is possible to change an input sound into an easy-to-hear sound.
Preferred embodiments of the present invention will now be described in detail with reference to the drawings. It should be noted that the relative arrangement of the components, the numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless it is specifically stated otherwise. Note that “speech signal” in the following explanation indicates a direct electrical change that occurs in accordance with the influence of speech or another sound. The speech signal transmits speech or another sound and is not limited to speech.
A signal processing apparatus 100 according to the first embodiment of the present invention will be described with reference to
The transformer 101 transforms an input signal 110 into an amplitude component signal 130 in a frequency domain.
The stationary component estimator 102 estimates a stationary component signal 140 having a frequency spectrum with a stationary characteristic based on the amplitude component signal 130 in the frequency domain. The replacement unit 103 generates a new amplitude component signal 150 using the amplitude component signal 130 and the stationary component signal 140, and replaces the amplitude component signal 130 by the new amplitude component signal 150. The inverse transformer 104 inversely transforms the new amplitude component signal 150 into an enhanced signal 160.
With the above arrangement, it is possible to suppress unpleasant non-stationary noise by replacing noise included in an input sound by stationary, easy-to-hear noise.
<<Overall Arrangement>>
A signal processing apparatus according to the second embodiment of the present invention will be described with reference to the accompanying drawings. The signal processing apparatus according to this embodiment, for example, appropriately suppresses non-stationary noise like wind noise. Simply speaking, in the frequency domain, a stationary component in an input sound is estimated, and part or all of the input sound is replaced by the estimated stationary component. The input sound is not limited to speech. For example, an environmental sound (noise on the street, the traveling sound of a train/car, an alarm/warning sound, a clap, or the like), a person's voice or animal's sound (chirping of a bird, barking of a dog, mewing of a cat, laughter, a tearful voice, a cheer, or the like), music, or the like may be used as an input sound. Note that speech is exemplified as a representative example of the input sound in this embodiment.
The stationary component estimator 202 estimates a stationary component included in the noisy signal amplitude spectrum |X(k, n)| supplied from the transformer 201, and generates a stationary component signal (stationary component spectrum) N(k, n).
The replacement unit 203 replaces the noisy signal amplitude spectrum |X(k, n)| supplied from the transformer 201 using the generated stationary component spectrum N(k, n), and transmits an enhanced signal amplitude spectrum |Y(k, n)| to the inverse transformer 204 as a replacement result.
The inverse transformer 204 combines the enhanced signal amplitude spectrum |Y(k, n)| supplied from the replacement unit 203 with the noisy signal phase spectrum 220 supplied from the transformer 201, inversely transforms the result, and supplies the resultant signal to an output terminal 207 as an enhanced signal.
<<Arrangement of Transformer>>
Two successive frames may partially overlap each other and be windowed. Assume that the overlap length is 50% of the frame length. For t=0, 1, . . . , K−1, the windowing unit 212 outputs the left-hand sides of
A symmetric window function is used for a real signal. The window function is designed so that, when the output of the transformer 201 is supplied directly to the inverse transformer 204, the input signal and the output signal match except for calculation error. This means that $w^2(t)+w^2(t+K/2)=1$.
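As a quick check of this condition, the sketch below builds one window that satisfies it exactly: the square root of a periodic Hann window, w(t)=sin(πt/K). This is an illustrative choice, not necessarily the window used in the embodiment, and the function names are hypothetical.

```python
import numpy as np

def sqrt_hann(K: int) -> np.ndarray:
    """Square root of a periodic Hann window of even length K: w(t) = sin(pi*t/K)."""
    t = np.arange(K)
    return np.sin(np.pi * t / K)

def satisfies_overlap_condition(w: np.ndarray) -> bool:
    """Check w^2(t) + w^2(t + K/2) = 1 for t = 0, ..., K/2 - 1 (50% overlap)."""
    half = len(w) // 2
    return bool(np.allclose(w[:half] ** 2 + w[half:2 * half] ** 2, 1.0))

print(satisfies_overlap_condition(sqrt_hann(256)))       # True
print(satisfies_overlap_condition(sqrt_hann(256) ** 2))  # False: plain Hann fails the squared condition
```

Because sin²(πt/K)+cos²(πt/K)=1, applying this window at both analysis and synthesis reconstructs the input with 50% overlap, whereas a plain Hann window only satisfies w(t)+w(t+K/2)=1.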
The description will be continued below assuming an example in which windowing is performed for two successive frames that overlap 50%. As w(t), the windowing unit can use, for example, a Hanning window given by
Various window functions, such as the Hamming window and the triangular window, are also known. The windowed output is supplied to the Fourier transformer 213 and transformed into a noisy signal spectrum X(k, n). The noisy signal spectrum X(k, n) is separated into phase and amplitude. The noisy signal phase spectrum arg X(k, n) is supplied to the inverse transformer 204, whereas the noisy signal amplitude spectrum |X(k, n)| is supplied to the stationary component estimator 202 and the replacement unit 203. As already described, a power spectrum may be used in place of the amplitude spectrum.
<<Arrangement of Inverse Transformer>>
$$Y(k,n) = |Y(k,n)| \cdot \exp\big(j \arg X(k,n)\big) \qquad (4)$$
where j represents the imaginary unit.
An inverse Fourier transform is performed on the obtained enhanced signal spectrum. The resulting signal is supplied to the windowing unit 242 as a series of time-domain sample values y(t, n) (t=0, 1, . . . , K−1), in which one frame includes K samples, and is multiplied by the window function w(t). A signal obtained by windowing the nth-frame enhanced signal y(t, n) (t=0, 1, . . . , K−1) by w(t) is given by the left-hand side of
The frame composition unit 243 takes the outputs of two adjacent frames from the windowing unit 242 in units of K/2 samples, overlaps and adds them, and obtains an output signal (the left-hand side of equation (6)) for t=0, 1, . . . , K/2−1 by
$$\hat{y}(t,n) = w(t+K/2)\,y(t+K/2,\,n-1) + w(t)\,y(t,n) \qquad (6)$$
An obtained output signal 260 is transmitted from the frame composition unit 243 to the output terminal 207.
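The transformer and inverse transformer chain (windowing, Fourier transform, recombination with the noisy phase per equation (4), re-windowing, and 50% overlap-add per equation (6)) can be sketched with NumPy as follows. The square-root Hann window, the real FFT, and the function names are assumptions for illustration. Passing |X(k, n)| through unchanged should reproduce the input except at the first and last K/2 samples.

```python
import numpy as np

def analyze(x: np.ndarray, K: int):
    """Window 50%-overlapping frames of K samples and return (|X(k,n)|, arg X(k,n)).
    Rows are frames n, columns are frequency bins k."""
    w = np.sin(np.pi * np.arange(K) / K)  # square-root Hann (assumed window)
    hop = K // 2
    n_frames = (len(x) - K) // hop + 1
    frames = np.stack([x[n * hop:n * hop + K] * w for n in range(n_frames)])
    X = np.fft.rfft(frames, axis=1)
    return np.abs(X), np.angle(X)

def synthesize(Y_amp: np.ndarray, phase: np.ndarray, K: int) -> np.ndarray:
    """Combine |Y(k,n)| with the noisy phase (eq. (4)), inverse-transform,
    window again with w(t), and overlap-add adjacent frames (eq. (6))."""
    w = np.sin(np.pi * np.arange(K) / K)
    hop = K // 2
    Y = Y_amp * np.exp(1j * phase)          # Y(k,n) = |Y(k,n)| exp(j arg X(k,n))
    y = np.fft.irfft(Y, n=K, axis=1) * w    # windowed frames w(t) y(t,n)
    out = np.zeros(hop * (len(y) + 1))
    for n, frame in enumerate(y):
        out[n * hop:n * hop + K] += frame   # second half of frame n-1 + first half of frame n
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal(640)
amp, ph = analyze(x, K=64)
x_hat = synthesize(amp, ph, K=64)
print(np.allclose(x_hat[32:-32], x[32:-32]))  # True
```

In the apparatus, the stationary component estimator and replacement unit would operate on `amp` between these two calls.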
Note that the transform in the transformer 201 and the inverse transformer 204 in
The stationary component estimator 202 can estimate a stationary component after a plurality of frequency components obtained by the transformer 201 are integrated. The number of frequency components after integration is smaller than that before integration. More specifically, a stationary component spectrum common to each integrated frequency component is obtained and used in common for the individual frequency components belonging to that integrated frequency component. When a stationary component signal is estimated after a plurality of frequency components are integrated in this way, fewer frequency components need to be processed, which reduces the total amount of calculation.
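A minimal sketch of this band integration, assuming that "integrating" means averaging the amplitude over contiguous bins and that the band edges are chosen by the designer (both are assumptions; the text does not fix them). Any stationary component estimator can then run on the smaller band-level spectrum, and its output is copied back to every bin of each band.

```python
import numpy as np

def integrate_bands(amp: np.ndarray, edges) -> np.ndarray:
    """Average |X(k,n)| over each band [edges[b], edges[b+1]); the last axis is k."""
    return np.stack([amp[..., lo:hi].mean(axis=-1)
                     for lo, hi in zip(edges[:-1], edges[1:])], axis=-1)

def expand_bands(band_values: np.ndarray, edges) -> np.ndarray:
    """Use each band's common value for every bin k that belongs to the band."""
    return np.repeat(band_values, np.diff(edges), axis=-1)

amp = np.arange(8.0)[None, :]        # one frame, 8 bins
edges = [0, 2, 4, 8]                 # 3 integrated components instead of 8
bands = integrate_bands(amp, edges)  # band values 0.5, 2.5, 5.5
print(expand_bands(bands, edges))    # each band value repeated over its bins
```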
(Definition of Stationary Component Spectrum)
The stationary component spectrum indicates a stationary component included in the input signal amplitude spectrum. A temporal change in power of the stationary component is smaller than that of the input signal. The temporal change is generally calculated by a difference or ratio. If the temporal change is calculated by a difference, when an input signal amplitude spectrum and a stationary component spectrum are compared with each other in a given frame n, there is at least one frequency k which satisfies
$$\big(|N(k,n-1)| - |N(k,n)|\big)^2 < \big(|X(k,n-1)| - |X(k,n)|\big)^2 \qquad (7)$$
Alternatively, if the temporal change is calculated by a ratio, there is at least one frequency k which satisfies
That is, if the left-hand side of the above expression is equal to or greater than the right-hand side for all frames n and all frequencies k, N(k, n) is, by this definition, not a stationary component spectrum. The same definition applies when X and N are replaced by their exponentials, logarithms, or powers.
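The difference-based criterion of expression (7) can be checked per frame as in the sketch below. The function name is hypothetical, and the arrays are indexed as [n, k].

```python
import numpy as np

def meets_difference_criterion(X_amp: np.ndarray, N_amp: np.ndarray, n: int) -> bool:
    """True if some bin k in frame n (n >= 1) satisfies expression (7):
    (|N(k,n-1)| - |N(k,n)|)^2 < (|X(k,n-1)| - |X(k,n)|)^2."""
    dN = (np.abs(N_amp[n - 1]) - np.abs(N_amp[n])) ** 2
    dX = (np.abs(X_amp[n - 1]) - np.abs(X_amp[n])) ** 2
    return bool(np.any(dN < dX))

X = np.array([[1.0, 1.0], [3.0, 1.0]])
N = np.ones_like(X)
print(meets_difference_criterion(X, N, 1))  # True: N changes less than X at k=0
print(meets_difference_criterion(X, X, 1))  # False: N = X never changes less than X
```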
(Method of Deriving Stationary Component Spectrum)
Various estimation methods such as the methods described in non-patent literatures 1 and 2 can be used to estimate a stationary component spectrum.
For example, non-patent literature 1 discloses a method of obtaining, as an estimated noise spectrum, the average value of noisy signal amplitude spectra of frames in which no target sound is included. In this method, it is necessary to detect the target sound. A section where the target sound is included can be determined by the power of the enhanced signal.
In the ideal operating state, the enhanced signal contains only the target sound, with the noise removed. In addition, the level of the target sound or noise does not change greatly between adjacent frames. For these reasons, the enhanced signal level of the immediately preceding frame is used as an index to determine a noise section: if it is equal to or smaller than a predetermined value, the current frame is determined to be a noise section. A noise spectrum can be estimated by averaging the noisy signal amplitude spectra of frames determined to be noise sections.
Non-patent literature 1 also discloses a method of obtaining, as an estimated noise spectrum, the average value of the noisy signal amplitude spectra in the initial period after their supply starts. This requires that the target sound not be included immediately after the start of estimation. If this condition is met, the average of the noisy signal amplitude spectra in the initial period of estimation can be used as the estimated noise spectrum.
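Both averaging methods attributed to non-patent literature 1 reduce to averaging a selected set of noisy amplitude frames. In the sketch below, the previous-frame enhanced levels, the threshold, and the number of initial frames are hypothetical inputs; in a running system, the enhanced level would come from the apparatus's own output.

```python
import numpy as np

def noise_from_noise_sections(X_amp, prev_enhanced_level, threshold):
    """Average |X(k,n)| over frames n whose preceding enhanced-signal level
    is <= threshold (those frames are treated as noise sections)."""
    X_amp = np.asarray(X_amp, dtype=float)
    is_noise = np.asarray(prev_enhanced_level) <= threshold
    if not np.any(is_noise):
        return np.zeros(X_amp.shape[1])  # no noise section found yet
    return X_amp[is_noise].mean(axis=0)

def noise_from_initial_frames(X_amp, num_initial):
    """Average the first num_initial frames, assuming no target sound there."""
    return np.asarray(X_amp, dtype=float)[:num_initial].mean(axis=0)

X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(noise_from_noise_sections(X, [0.1, 5.0, 0.2], threshold=1.0))  # frames 0 and 2 averaged
print(noise_from_initial_frames(X, 2))                              # frames 0 and 1 averaged
```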
Non-patent literature 2 discloses a method of obtaining an estimated noise spectrum from the minimum value (minimum statistic) of the noisy signal amplitude spectrum. In this method, the minimum value of the noisy signal amplitude spectrum within a predetermined time is held, and a noise spectrum is estimated from the minimum value. The minimum value of the noisy signal amplitude spectrum is similar to the shape of a noise spectrum and can therefore be used as the estimated value of the noise spectrum shape. However, the minimum value is smaller than the original noise level. Hence, a spectrum obtained by appropriately amplifying the minimum value is used as an estimated noise spectrum.
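A simplified minimum-statistics sketch: hold the per-bin minimum of |X(k, n)| over the last `window` frames and amplify it by a fixed `gain`. Both parameters are hypothetical, and the bias compensation in non-patent literature 2 is more elaborate than this single constant.

```python
import numpy as np

def minimum_statistics(X_amp, window: int, gain: float) -> np.ndarray:
    """N(k,n) = gain * min of |X(k,m)| for m in [n-window+1, n]; rows are frames n."""
    X_amp = np.abs(np.asarray(X_amp, dtype=float))
    N = np.empty_like(X_amp)
    for n in range(len(X_amp)):
        start = max(0, n - window + 1)
        N[n] = gain * X_amp[start:n + 1].min(axis=0)  # amplify the held minimum
    return N

print(minimum_statistics([[4.0], [1.0], [3.0], [5.0]], window=2, gain=2.0).ravel())
# values: 8.0, 2.0, 2.0, 6.0
```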
In addition, an estimated noise spectrum may be obtained using a median filter. An estimated noise spectrum may also be obtained by WiNE (Weighted Noise Estimation), a noise estimation method that follows changing noise by exploiting the characteristic that noise changes slowly.
The thus obtained estimated noise spectrum can be used as a stationary component spectrum.
(Spectrum Shape)
The function for obtaining the amplitude spectrum used for replacement (replacement amplitude spectrum) is not limited to the linear mapping α(k, n)N(k, n) of N(k, n). For example, a linear function such as α(k, n)N(k, n)+C(k, n) can be adopted. In this case, if C(k, n)>0, the level of the replacement amplitude spectrum is raised as a whole, which improves the perceived stationarity. If C(k, n)<0, the level of the replacement amplitude spectrum is lowered as a whole, but C(k, n) must be adjusted so that no band in which the spectrum value becomes negative appears. In addition, another form of function of the stationary component spectrum N(k, n), such as a high-order polynomial or a nonlinear function, can be used.
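A sketch of the replacement amplitude α(k, n)N(k, n)+C(k, n). Instead of tuning C(k, n) band by band, this version simply clips negative values at zero; that clipping is our own simplification, not the adjustment described above.

```python
import numpy as np

def replacement_amplitude(N, alpha, C=0.0):
    """|Y(k,n)| = alpha(k,n) * N(k,n) + C(k,n), clipped so that no band goes negative."""
    return np.maximum(alpha * np.asarray(N, dtype=float) + C, 0.0)

print(replacement_amplitude([1.0, 0.2], alpha=0.5, C=-0.2))
# approximately 0.3 and 0.0 (the second band would be -0.1 without clipping)
```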
Because wind noise is highly non-stationary, estimating it directly gives low accuracy, and conventional noise estimation methods cannot cope with it. However, when a stationary component signal is generated by, for example, averaging in the frequency direction and is then used for replacement, wind noise can be changed into a sound that is not unpleasant while the tracking capability is maintained.
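One way to obtain such a frequency-direction average is a moving mean over neighboring bins within each frame, sketched below. The half-width and the edge handling (averaging only the bins that exist) are assumptions.

```python
import numpy as np

def smooth_in_frequency(X_amp, half_width: int = 2) -> np.ndarray:
    """Average |X(k,n)| over bins k-half_width .. k+half_width within each frame."""
    A = np.abs(np.asarray(X_amp, dtype=float))
    K = A.shape[-1]
    out = np.empty_like(A)
    for k in range(K):
        lo, hi = max(0, k - half_width), min(K, k + half_width + 1)
        out[..., k] = A[..., lo:hi].mean(axis=-1)
    return out

print(smooth_in_frequency([[0.0, 3.0, 0.0, 3.0, 0.0]], half_width=1))
# values: 1.5, 1.0, 2.0, 1.0, 1.5 (the alternating peaks are flattened)
```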
(Coefficient α)
An empirically appropriate value is determined as the coefficient α(k, n) by which the stationary component signal N(k, n) is multiplied. For example, if α(k, n)=1, |Y(k, n)|=N(k, n) is obtained, and the stationary component signal N(k, n) is output directly to the inverse transformer 204. In this case, if the stationary component signal N(k, n) is large, large noise remains. To solve this problem, the coefficient α(k, n) may be determined so that the maximum value of the amplitude component output to the inverse transformer 204 is equal to or smaller than a predetermined value. For example, if α(k, n)=0.5, replacement is performed by a signal with half the level of the stationary component signal N(k, n). If α(k, n)=0.1, the output becomes quiet but keeps the same spectral shape as the stationary component signal N(k, n).
For example, if the SNR (signal-to-noise ratio) is low, the target sound is small, and strong suppression may therefore be performed by decreasing α(k, n). Conversely, if the SNR is high, the noise is small, and the stationary component may be used without attenuation by setting α(k, n) to 1.
In addition, because a sound whose high band is emphasized is perceived as unpleasant, α(k, n) may be a function that becomes sufficiently small when k is equal to or larger than a threshold, or a monotonically decreasing function of k.
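The two ideas above, an SNR-dependent α and a small α in the high band, can be combined as in the hypothetical schedule below. The interpolation range, the floor `alpha_min`, the cutoff bin `k_cut`, and `high_band_alpha` are illustrative parameters, not values from the text.

```python
import numpy as np

def alpha_for_frame(snr_db: float, num_bins: int, k_cut: int,
                    snr_lo: float = 0.0, snr_hi: float = 20.0,
                    alpha_min: float = 0.1, high_band_alpha: float = 0.05) -> np.ndarray:
    """Hypothetical alpha(k, n) for one frame n.

    Low SNR -> small alpha (strong suppression); high SNR -> alpha = 1.
    Bins k >= k_cut are capped at high_band_alpha to avoid an emphasized high band.
    """
    base = float(np.interp(snr_db, [snr_lo, snr_hi], [alpha_min, 1.0]))
    alpha = np.full(num_bins, base)
    alpha[k_cut:] = min(base, high_band_alpha)
    return alpha

print(alpha_for_frame(10.0, num_bins=8, k_cut=6))  # 0.55 below bin 6, 0.05 from bin 6 on
```

A step at `k_cut` is used for brevity; a monotonically decreasing function of k would follow the alternative mentioned in the text.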
According to this embodiment, since it is possible to make the noise component of the output signal stationary, the sound quality improves, as compared with the conventional techniques. Note that the replacement unit 203 may replace an amplitude component on a sub-band basis in place of a frequency basis.
A signal processing apparatus according to the third embodiment of the present invention will be described with reference to
The comparator 631 compares the noisy signal amplitude spectrum |X(k, n)| with a first threshold obtained by applying a linear mapping function, as the first function, to the stationary component spectrum N(k, n). This embodiment explains the representative case among linear mapping functions of a constant multiple, that is, α1(k, n) times N(k, n). If the amplitude (power) component |X(k, n)| is larger than α1(k, n) times the stationary component signal N(k, n), the higher amplitude replacement unit 632 performs replacement by the replacement amplitude spectrum given by the second function, that is, α2(k, n) times the stationary component signal N(k, n); otherwise, |X(k, n)| is used directly as the output signal |Y(k, n)| of the replacement unit 603. That is, if |X(k, n)|>α1(k, n)N(k, n), |Y(k, n)|=α2(k, n)N(k, n) is obtained; otherwise, |Y(k, n)|=|X(k, n)| is obtained.
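The comparator and the higher amplitude replacement unit together amount to an element-wise selection, sketched below with NumPy. The function name is hypothetical; α1 and α2 may be scalars or per-bin arrays.

```python
import numpy as np

def replace_higher(X_amp, N, alpha1, alpha2):
    """|Y(k,n)| = alpha2*N(k,n) where |X(k,n)| > alpha1*N(k,n), else |X(k,n)|."""
    X_amp = np.abs(np.asarray(X_amp, dtype=float))
    N = np.asarray(N, dtype=float)
    return np.where(X_amp > alpha1 * N, alpha2 * N, X_amp)

print(replace_higher([1.0, 5.0, 2.0], [1.0, 1.0, 1.0], alpha1=2.0, alpha2=1.5))
# values: 1.0, 1.5, 2.0 (only the bin above the threshold is replaced)
```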
The method of calculating the spectrum to be compared with the noisy signal amplitude spectrum |X(k, n)| is not limited to the linear mapping function of the stationary component spectrum N(k, n). For example, a linear function such as α1(k, n)N(k, n)+C(k, n) can be adopted. In this case, if C(k, n)<0, more bands are replaced by the stationary component signal, so unpleasant non-stationary noise can be suppressed more strongly. In addition, another form of function of the stationary component spectrum N(k, n), such as a high-order polynomial or a nonlinear function, can be used.
This is effective when the input signal varies greatly in frequency bands whose power is larger than the threshold α1(k, n)N(k, n) obtained by multiplying the stationary component signal by the predetermined coefficient. Moreover, because naturalness is maintained in bands whose power is smaller than this threshold, the sound quality improves.
To cope with this, it is possible to replace the spectrum by a spectrum with higher stationarity by setting α1(k, n)>α2(k, n) before and after time t3, as shown in the lower portion of
At each time, α2(k, n) can be obtained by the following procedure, (1) followed by (2).
(1) A short-time moving average |X_bar(k, n)| of the input signal (k and n are the frequency and time indices, respectively) is calculated in advance by, for example, |X_bar(k, n)|=(|X(k, n−2)|+|X(k, n−1)|+|X(k, n)|+|X(k, n+1)|+|X(k, n+2)|)/5.
(2) The difference between the short-time moving average |X_bar(k, n)| and the value after replacement, α2(k, n)·N(k, n), is calculated, and if the difference is large, α2(k, n) is changed to decrease the difference. Denoting the changed value by α2_hat(k, n), any of the following change methods may be used.
(a) α2_hat(k, n)=0.5·α2(k, n) is set uniformly (multiplication by a predetermined constant).
(b) α2_hat(k, n)=|X_bar(k, n)|/|N(k, n)| is set (calculation using |X_bar(k, n)| and |N(k, n)|).
(c) α2_hat(k, n)=0.8·|X_bar(k, n)|/|N(k, n)|+0.2 is set (likewise calculated from |X_bar(k, n)| and |N(k, n)|).
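Steps (1) and (2) can be sketched as follows. The edge handling of the moving average (averaging only the frames that exist), the tolerance `tol` that decides when the difference is "large", and the function names are assumptions; N(k, n) is assumed positive.

```python
import numpy as np

def moving_average_5(X_amp) -> np.ndarray:
    """|X_bar(k,n)|: mean of |X(k,n-2)| .. |X(k,n+2)|; rows are frames n."""
    A = np.abs(np.asarray(X_amp, dtype=float))
    out = np.empty_like(A)
    T = len(A)
    for n in range(T):
        lo, hi = max(0, n - 2), min(T, n + 3)
        out[n] = A[lo:hi].mean(axis=0)
    return out

def adjust_alpha2(alpha2, X_bar, N, tol, method="b"):
    """Return alpha2_hat, changed only where | |X_bar| - alpha2*N | > tol."""
    alpha2 = np.asarray(alpha2, dtype=float)
    X_bar = np.abs(np.asarray(X_bar, dtype=float))
    N = np.asarray(N, dtype=float)
    candidates = {
        "a": 0.5 * alpha2,            # (a) multiplication by a constant
        "b": X_bar / N,               # (b) ratio of the moving average to N
        "c": 0.8 * X_bar / N + 0.2,   # (c) weighted ratio plus offset
    }
    large = np.abs(X_bar - alpha2 * N) > tol
    return np.where(large, candidates[method], alpha2)

print(moving_average_5([[1.0], [2.0], [3.0], [4.0], [5.0]]).ravel())
# values: 2.0, 2.5, 3.0, 3.5, 4.0
```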
However, a method of obtaining α2(k, n) is not limited to the above-described one. For example, α2(k, n) which is a constant value regardless of the time may be set in advance. In this case, the value of α2(k, n) may be determined by actually hearing a processed signal. That is, the value of α2(k, n) may be determined in accordance with the characteristics of a microphone and a device to which the microphone is attached.
For example, when the following condition is met, the coefficient α2(k, n) may be obtained by dividing the short-time moving average |X_bar(k, n)| by the stationary component signal |N(k, n)| before and after time n using equations 1 to 3, and the input signal |X(k, n)| may be replaced by the short-time moving average |X_bar(k, n)| as a result. When the following condition is not met, α2(k, n)=α1(k, n) may be set.
Condition: |X(k,n)|>α1(k,n)·N(k,n) and α1(k,n)·N(k,n)−|X_bar(k,n)|>δ
Equation 1: α2(k,n−1)=|X_bar(k,n)|/N(k,n)
Equation 2: α2(k,n)=|X_bar(k,n)|/N(k,n)
Equation 3: α2(k,n+1)=|X_bar(k,n)|/N(k,n)
As described above, in the stationary component signal N(k, n), if it is impossible to prevent a “spike” of the amplitude component signal within a short time, it is possible to perform replacement using the short-time moving average, thereby improving the sound quality.
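The spike handling defined by the condition and equations 1 to 3 can be sketched as follows. The arrays are indexed [n, k], `delta` corresponds to δ, and N(k, n) is assumed positive. When spikes are adjacent, the later one simply overwrites the shared neighbors; that tie-breaking rule is our assumption.

```python
import numpy as np

def spike_alpha2(X_amp, X_bar, N, alpha1, delta):
    """alpha2(k,n) per the condition and equations 1 to 3; alpha1 elsewhere."""
    X_amp = np.abs(np.asarray(X_amp, dtype=float))
    X_bar = np.abs(np.asarray(X_bar, dtype=float))
    N = np.asarray(N, dtype=float)
    threshold = alpha1 * N
    alpha2 = np.broadcast_to(alpha1, X_amp.shape).astype(float)  # default: alpha2 = alpha1
    spike = (X_amp > threshold) & (threshold - X_bar > delta)
    T = X_amp.shape[0]
    for n, k in zip(*np.nonzero(spike)):
        for m in (n - 1, n, n + 1):               # equations 1 to 3
            if 0 <= m < T:
                alpha2[m, k] = X_bar[n, k] / N[n, k]
    return alpha2

X = np.array([[1.0], [10.0], [1.0]])
X_bar = np.array([[1.0], [2.0], [1.0]])
N = np.ones_like(X)
a2 = spike_alpha2(X, X_bar, N, alpha1=3.0, delta=0.5)
Y = np.where(X > 3.0 * N, a2 * N, X)  # higher amplitude replacement using alpha2
print(Y.ravel())  # values: 1.0, 2.0, 1.0 (the spike is replaced by |X_bar|)
```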
A signal processing apparatus according to the fourth embodiment of the present invention will be described with reference to
The comparator 931 compares the noisy signal amplitude spectrum |X(k, n)| with a second threshold given by the third function, that is, β1(k, n) times the stationary component signal N(k, n). If the amplitude (power) component |X(k, n)| is smaller than β1(k, n) times the stationary component signal N(k, n), the lower amplitude replacement unit 932 performs replacement by the fourth function, that is, β2(k, n) times the stationary component signal N(k, n); otherwise, |X(k, n)| is used directly as the output signal |Y(k, n)| of the replacement unit 903. That is, if |X(k, n)|<β1(k, n)N(k, n), |Y(k, n)|=β2(k, n)N(k, n) is obtained; otherwise, |Y(k, n)|=|X(k, n)| is obtained.
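The lower-side counterpart mirrors the higher amplitude replacement with the comparison reversed. A minimal sketch (hypothetical function name):

```python
import numpy as np

def replace_lower(X_amp, N, beta1, beta2):
    """|Y(k,n)| = beta2*N(k,n) where |X(k,n)| < beta1*N(k,n), else |X(k,n)|."""
    X_amp = np.abs(np.asarray(X_amp, dtype=float))
    N = np.asarray(N, dtype=float)
    return np.where(X_amp < beta1 * N, beta2 * N, X_amp)

print(replace_lower([0.1, 1.0, 2.0], [1.0, 1.0, 1.0], beta1=0.5, beta2=0.8))
# values: 0.8, 1.0, 2.0 (only the dip in the first bin is filled in)
```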
This is effective when the input signal varies greatly in frequency bands whose power is smaller than the threshold β1(k, n)N(k, n) obtained by multiplying the stationary component signal by the predetermined coefficient. Moreover, because naturalness is maintained in bands whose power is larger than this threshold, the sound quality improves.
To cope with this, it is possible to replace the spectrum by a spectrum with higher stationarity by setting β1(k, n)<β2(k, n) before and after time n=t5, as shown in the lower portion of
At each time, β2(k, n) can be obtained by the following procedure, (1) followed by (2).
(1) A short-time moving average |X_bar(k, n)| of the input signal (k and n are the frequency and time indices, respectively) is calculated in advance by, for example, |X_bar(k, n)|=(|X(k, n−2)|+|X(k, n−1)|+|X(k, n)|+|X(k, n+1)|+|X(k, n+2)|)/5.
(2) The difference between the short-time moving average |X_bar(k, n)| and the value after replacement, β2(k, n)·N(k, n), is calculated, and if the difference is large, β2(k, n) is changed to decrease the difference. Denoting the changed value by β2_hat(k, n), any of the following change methods may be used.
(a) β2_hat(k, n)=0.5·β2(k, n) is set uniformly (multiplication by a predetermined constant).
(b) β2_hat(k, n)=|X_bar(k, n)|/N(k, n) is set (calculation using |X_bar(k, n)| and N(k, n)).
(c) β2_hat(k, n)=0.8·|X_bar(k, n)|/N(k, n)+0.2 is set (likewise calculated from |X_bar(k, n)| and N(k, n)).
However, a method of obtaining β2(k, n) is not limited to the above-described one. For example, β2(k, n) which is a constant value regardless of the time may be set in advance. In this case, the value of β2(k, n) may be determined by actually hearing a processed signal. That is, the value of β2(k, n) may be determined in accordance with the characteristics of a microphone and a device to which the microphone is attached.
For example, when the following condition is met, the coefficient β2(k, n) may be obtained by dividing the short-time moving average |X_bar(k, n)| by the stationary component signal N(k, n) before and after time n using equations 1 to 3, and the input signal |X(k, n)| may be replaced by the short-time moving average |X_bar(k, n)| as a result. When the following condition is not met, β2(k, n)=β1(k, n) may be set.
Condition: |X(k,n)|<β1(k,n)·N(k,n) and |X_bar(k,n)|−β1(k,n)·N(k,n)>δ
Equation 1: β2(k,n−1)=|X_bar(k,n)|/N(k,n)
Equation 2: β2(k,n)=|X_bar(k,n)|/N(k,n)
Equation 3: β2(k,n+1)=|X_bar(k,n)|/N(k,n)
As described above, in the stationary component signal N(k, n), if it is impossible to prevent a “spike” of the amplitude component within a short time, it is possible to perform replacement using the short-time moving average, thereby improving the sound quality.
A signal processing apparatus according to the fifth embodiment of the present invention will be described with reference to
The first comparator 1231 compares a noisy signal amplitude spectrum |X(k, n)| with a multiple (third threshold), serving as the fifth function, of α1(k, n) of a stationary component signal N(k, n). If the amplitude (power) component |X(k, n)| is larger than the multiple of α1(k, n) of the stationary component signal N(k, n), the higher amplitude replacement unit 1232 performs replacement by a multiple, serving as the sixth function, of α2(k, n) of the stationary component signal N(k, n); otherwise, the spectrum shape is directly used as an output signal |Y1(k, n)| to the second comparator 1233. That is, if |X(k, n)|>α1(k, n)N(k, n), |Y1(k, n)|=α2(k, n)N(k, n) is obtained; otherwise, |Y1(k, n)|=|X(k, n)| is obtained.
The second comparator 1233, in turn, compares the output signal |Y1(k, n)| from the higher amplitude replacement unit 1232 with a fourth threshold given by the seventh function, that is, β1(k, n) times the stationary component signal N(k, n). If |Y1(k, n)| is smaller than β1(k, n) times the stationary component signal N(k, n), the lower amplitude replacement unit 1234 performs replacement by the eighth function, that is, β2(k, n) times the stationary component signal N(k, n); otherwise, |Y1(k, n)| is used directly as the output signal |Y2(k, n)|. That is, if |Y1(k, n)|<β1(k, n)N(k, n), |Y2(k, n)|=β2(k, n)N(k, n) is obtained; otherwise, |Y2(k, n)|=|Y1(k, n)| is obtained.
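The two-stage arrangement, higher amplitude replacement followed by lower amplitude replacement applied to its output |Y1(k, n)|, can be sketched as follows (hypothetical function name; scalar or per-bin coefficients).

```python
import numpy as np

def replace_both(X_amp, N, alpha1, alpha2, beta1, beta2):
    """Stage 1 (unit 1232): replace values above alpha1*N by alpha2*N -> |Y1|.
    Stage 2 (unit 1234): replace values of |Y1| below beta1*N by beta2*N -> |Y2|."""
    X_amp = np.abs(np.asarray(X_amp, dtype=float))
    N = np.asarray(N, dtype=float)
    Y1 = np.where(X_amp > alpha1 * N, alpha2 * N, X_amp)
    Y2 = np.where(Y1 < beta1 * N, beta2 * N, Y1)
    return Y2

print(replace_both([0.1, 1.0, 5.0], [1.0, 1.0, 1.0],
                   alpha1=2.0, alpha2=1.5, beta1=0.5, beta2=0.8))
# values: 0.8, 1.0, 1.5
```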
This is effective when a variation in input signal is large in a frequency band in which power is larger than the threshold α1(k, n)N(k, n) obtained by multiplying the stationary component signal by the predetermined coefficient and a frequency band in which power is smaller than the threshold β1(k, n)N(k, n).
A signal processing apparatus according to the sixth embodiment of the present invention will be described with reference to
If the amplitude (power) component |X(k, n)| is larger than a multiple of α1(k, n) of the stationary component signal N(k, n), the higher amplitude replacement unit 1432 performs replacement by a multiple of α2(k, n) of the amplitude component X(k, n); otherwise, the spectrum shape is directly used as an output signal |Y(k, n)| of the replacement unit 1403. That is, if |X(k, n)|>α1(k, n)N(k, n), |Y(k, n)|=α2(k, n)|X(k, n)| is obtained; otherwise, |Y(k, n)|=|X(k, n)| is obtained.
This is effective when the input signal varies greatly in frequency bands whose power is larger than the threshold α1(k, n)N(k, n) obtained by multiplying the stationary component signal by the predetermined coefficient, and when the characteristics of the spectral shape should be retained in the output signal as much as possible. For example, the processing according to this embodiment is effective in speech sections when speech recognition is to be performed while wind noise is suppressed. Moreover, because naturalness is maintained in bands whose power is smaller than the threshold α1(k, n)N(k, n), the sound quality improves.
A signal processing apparatus according to the seventh embodiment of the present invention will be described with reference to
This is effective when the input signal varies greatly in frequency bands whose power is larger than a threshold α1(k, n)N(k, n) obtained by multiplying the stationary component signal by a predetermined coefficient and in frequency bands whose power is smaller than a threshold β1(k, n)N(k, n), and when the characteristics of the spectral shape should be retained in the output signal as much as possible.
A signal processing apparatus according to the eighth embodiment of the present invention will be described with reference to
The speech detector 1701 determines, on a frequency basis, whether speech is included in the noisy signal amplitude spectrum |X(k, n)|. The replacement unit 1703 uses the stationary component spectrum N(k, n) to replace the noisy signal amplitude spectrum |X(k, n)| at frequencies at which no speech is included. Specifically, if the output of the speech detector 1701 is 0 (it is determined that no speech is included), |Y(k, n)|=α(k, n)N(k, n) is obtained. If the output of the speech detector 1701 is 1 (it is determined that speech is included), |Y(k, n)|=|X(k, n)| is obtained.
According to this embodiment, since replacement by the stationary component signal N(k, n) is performed only at frequencies at which no speech is included, distortion of speech caused by suppression can be avoided.
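Frequency-wise gating by a speech detector can be sketched as below. The boolean `speech_present` stands in for the detector output (True where speech is detected); how the speech detector 1701 makes that decision is outside this sketch, and the function name is hypothetical.

```python
import numpy as np

def replace_non_speech(X_amp, N, alpha, speech_present):
    """Keep |X(k,n)| where speech is present; use alpha*N(k,n) elsewhere."""
    X_amp = np.abs(np.asarray(X_amp, dtype=float))
    N = np.asarray(N, dtype=float)
    return np.where(np.asarray(speech_present, dtype=bool), X_amp, alpha * N)

print(replace_non_speech([3.0, 3.0], [1.0, 1.0], alpha=0.5, speech_present=[True, False]))
# values: 3.0, 0.5
```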
A signal processing apparatus according to the ninth embodiment of the present invention will be described with reference to
The speech detector 1801 calculates a probability p(k, n) that speech is included in a noisy signal amplitude spectrum |X(k, n)| on a frequency basis where p(k, n) is a real number of 0 (inclusive) to 1 (inclusive). The replacement unit 1803 replaces the noisy signal amplitude spectrum |X(k, n)| using the speech presence probability p(k, n) and a stationary component signal N(k, n). By using, for example, a function α(p(k, n)) of p(k, n) ranging from 0 to 1, an output signal |Y(k, n)|=α(p(k, n))N(k, n)+(1−α(p(k, n)))|X(k, n)| may be obtained.
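This soft variant blends the two spectra with a weight derived from p(k, n). The mapping α(p)=1−p used here is only an illustrative choice (more replacement where speech is unlikely); the text leaves α(·) open, and the function name is hypothetical.

```python
import numpy as np

def blend_by_speech_probability(X_amp, N, p):
    """|Y| = a*N + (1 - a)*|X| with a = alpha(p) = 1 - p (hypothetical mapping)."""
    X_amp = np.abs(np.asarray(X_amp, dtype=float))
    N = np.asarray(N, dtype=float)
    a = 1.0 - np.clip(np.asarray(p, dtype=float), 0.0, 1.0)
    return a * N + (1.0 - a) * X_amp

print(blend_by_speech_probability([2.0, 2.0], [1.0, 1.0], p=[0.25, 1.0]))
# values: 1.25, 2.0 (certain speech leaves |X| untouched)
```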
On the other hand, a time direction smoother 2004 smooths the input amplitude component in the time direction. A frequency direction difference calculator 2005 calculates the differences between amplitude components at adjacent frequencies. An absolute value sum calculator 2006 calculates the sum of the absolute values of the differences calculated by the frequency direction difference calculator 2005.
A determiner 2007 derives the speech presence probability p(k, n) based on the sums of absolute values calculated by the absolute value sum calculators 2003 and 2006.
In each of
According to this embodiment, it is possible to make noise stationary in accordance with the speech presence probability, and suppress non-stationary noise like wind noise while effectively avoiding a distortion of speech and the like.
A signal processing apparatus according to the 10th embodiment of the present invention will be described with reference to the accompanying drawings.
The higher amplitude replacement unit 2232 receives a speech detection flag (0/1) from a speech detector 1701. If the flag indicates non-speech and |X(k, n)|>α1(k, n)N(k, n), |Y(k, n)|=α2(k, n)N(k, n) is obtained; otherwise, |Y(k, n)|=|X(k, n)| is obtained.
This arrangement is effective when, in a non-speech section, the input signal varies greatly in frequency bands whose power exceeds the threshold α1(k, n)N(k, n), i.e., the stationary component signal multiplied by a predetermined coefficient. In a speech section, and in bands whose power is below the threshold α1(k, n)N(k, n), the input is output unchanged, so that naturalness is maintained and the sound quality improves.
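A minimal per-frame sketch of the higher amplitude replacement unit 2232, assuming NumPy arrays over bins and a 0/1 speech flag per bin; the helper name is hypothetical, and α1, α2 may be scalars or per-bin arrays.

```python
import numpy as np


def higher_amplitude_replace(x_mag, n_stat, is_speech, alpha1, alpha2):
    """Replace large non-speech bins by alpha2 * N (hypothetical helper)."""
    x_mag = np.asarray(x_mag, dtype=float)
    n_stat = np.asarray(n_stat, dtype=float)
    non_speech = ~np.asarray(is_speech, dtype=bool)
    # Replace only where the flag says non-speech AND |X| > alpha1 * N.
    replace = non_speech & (x_mag > alpha1 * n_stat)
    return np.where(replace, alpha2 * n_stat, x_mag)
```

With N=1, α1=2 and α2=1, a non-speech bin of 10 drops to 1, a non-speech bin of 1 is kept, and a speech bin of 10 is kept.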
A signal processing apparatus according to the 11th embodiment of the present invention will be described with reference to the accompanying drawings.
The lower amplitude replacement unit 2332 receives a speech detection flag (0/1) from a speech detector 1701. If the flag indicates non-speech and |X(k, n)|<β1(k, n)N(k, n), |Y(k, n)|=β2(k, n)N(k, n) is obtained; otherwise, |Y(k, n)|=|X(k, n)| is obtained.
This arrangement is effective when, in a non-speech section, the input signal varies greatly in frequency bands whose power is below the threshold β1(k, n)N(k, n), i.e., the stationary component signal multiplied by a predetermined coefficient. In a speech section, and in bands whose power is above the threshold β1(k, n)N(k, n), the input is output unchanged, so that naturalness is maintained and the sound quality improves.
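The mirror-image rule of the lower amplitude replacement unit 2332 can be sketched the same way, under the same assumptions (per-frame NumPy arrays, 0/1 speech flag, hypothetical helper name).

```python
import numpy as np


def lower_amplitude_replace(x_mag, n_stat, is_speech, beta1, beta2):
    """Lift small non-speech bins to beta2 * N (hypothetical helper)."""
    x_mag = np.asarray(x_mag, dtype=float)
    n_stat = np.asarray(n_stat, dtype=float)
    non_speech = ~np.asarray(is_speech, dtype=bool)
    # Replace only where the flag says non-speech AND |X| < beta1 * N.
    replace = non_speech & (x_mag < beta1 * n_stat)
    return np.where(replace, beta2 * n_stat, x_mag)
```

Filling spectral "holes" this way keeps the residual noise floor close to the stationary level.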
A signal processing apparatus according to the 12th embodiment of the present invention will be described with reference to the accompanying drawings.
The higher amplitude replacement unit 2432 receives a speech detection flag (0/1) from a speech detector 1701. If the flag indicates non-speech and |X(k, n)|>α1(k, n)N(k, n), |Y1(k, n)|=α2(k, n)N(k, n) is obtained; otherwise, |Y1(k, n)|=|X(k, n)| is obtained. That is, if the amplitude (power) component |X(k, n)| is larger than α1(k, n) times the stationary component signal N(k, n) in a non-speech section, the higher amplitude replacement unit 2432 replaces it with α2(k, n) times the stationary component signal N(k, n); otherwise, the spectrum shape is passed unchanged as an output signal |Y1(k, n)| to the second comparator 1233.
On the other hand, in a non-speech section, the lower amplitude replacement unit 2434 replaces the output signal |Y1(k, n)| from the higher amplitude replacement unit 2432 with β2(k, n) times the stationary component signal N(k, n), but only at frequencies at which |Y1(k, n)| is smaller than β1(k, n) times N(k, n). At frequencies at which |Y1(k, n)| is not smaller than β1(k, n)N(k, n), the spectrum shape is passed unchanged as an output signal |Y2(k, n)|. That is, if |Y1(k, n)|<β1(k, n)N(k, n), |Y2(k, n)|=β2(k, n)N(k, n) is obtained; otherwise, |Y2(k, n)|=|Y1(k, n)| is obtained.
This arrangement is effective when the input signal varies greatly both in frequency bands whose power exceeds the threshold α1(k, n)N(k, n) and in frequency bands whose power is below the threshold β1(k, n)N(k, n), each threshold being the stationary component signal multiplied by a predetermined coefficient, and when the characteristics of the spectrum shape should be preserved as much as possible in a speech section.
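The two-stage cascade of units 2432 and 2434 can be sketched as follows. This is a minimal sketch, assuming per-frame NumPy arrays and that the speech flag gates both stages as described for non-speech sections; the helper name is hypothetical.

```python
import numpy as np


def two_sided_replace(x_mag, n_stat, is_speech, alpha1, alpha2, beta1, beta2):
    """Higher then lower amplitude replacement for one frame (hypothetical helper)."""
    x_mag = np.asarray(x_mag, dtype=float)
    n_stat = np.asarray(n_stat, dtype=float)
    non_speech = ~np.asarray(is_speech, dtype=bool)
    # Stage 1 (unit 2432): clip peaks above alpha1 * N down to alpha2 * N.
    y1 = np.where(non_speech & (x_mag > alpha1 * n_stat), alpha2 * n_stat, x_mag)
    # Stage 2 (unit 2434): raise dips below beta1 * N up to beta2 * N, acting on Y1.
    y2 = np.where(non_speech & (y1 < beta1 * n_stat), beta2 * n_stat, y1)
    return y2
```

Note that stage 2 compares the stage-1 output |Y1(k, n)|, not the original |X(k, n)|, with the lower threshold.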
A signal processing apparatus according to the 13th embodiment of the present invention will be described with reference to the accompanying drawings.
If the amplitude (power) component |X(k, n)| is larger than α1(k, n) times the stationary component signal N(k, n) in a non-speech section, the higher amplitude replacement unit 2532 replaces it with α2(k, n) times the input amplitude component |X(k, n)|; otherwise, the spectrum shape is passed unchanged as the output signal |Y(k, n)| of the replacement unit 2503. That is, in a non-speech section, if |X(k, n)|>α1(k, n)N(k, n), |Y(k, n)|=α2(k, n)|X(k, n)| is obtained; otherwise, |Y(k, n)|=|X(k, n)| is obtained.
This arrangement is effective when the input signal varies greatly in frequency bands whose power exceeds the threshold α1(k, n)N(k, n), i.e., the stationary component signal multiplied by a predetermined coefficient, and when the characteristics of the spectrum shape should be preserved as much as possible in the output signal. For example, suppose that speech is to be recognized in speech sections while wind noise is suppressed in non-speech sections. Because high-power components are scaled rather than replaced, their spectrum shape is retained even in a section determined to be non-speech. Therefore, even if the speech presence/absence determination is wrong, the speech recognition accuracy can be improved.
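The difference from the 10th embodiment is that the replacement value scales |X(k, n)| instead of N(k, n), so relative levels between replaced bins survive. A minimal sketch under the same per-frame assumptions, with a hypothetical helper name:

```python
import numpy as np


def scaled_higher_replace(x_mag, n_stat, is_speech, alpha1, alpha2):
    """Attenuate large non-speech bins to alpha2 * |X| (hypothetical helper)."""
    x_mag = np.asarray(x_mag, dtype=float)
    n_stat = np.asarray(n_stat, dtype=float)
    non_speech = ~np.asarray(is_speech, dtype=bool)
    replace = non_speech & (x_mag > alpha1 * n_stat)
    # Scaling |X| (not substituting N) keeps the spectrum shape of replaced bins.
    return np.where(replace, alpha2 * x_mag, x_mag)
```

With α2=0.1, non-speech bins of 10 and 20 become 1 and 2: attenuated, but still in a 1:2 ratio.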
A signal processing apparatus according to the 14th embodiment of the present invention will be described with reference to the accompanying drawings.
If the amplitude (power) component |X(k, n)| is larger than α1(k, n) times a stationary component signal N(k, n) in a non-speech section, the higher amplitude replacement unit 2632 replaces it with α2(k, n) times the input amplitude component |X(k, n)|; otherwise, the spectrum shape is passed unchanged as an output signal |Y1(k, n)| to the second comparator 1233. That is, if |X(k, n)|>α1(k, n)N(k, n), |Y1(k, n)|=α2(k, n)|X(k, n)| is obtained; otherwise, |Y1(k, n)|=|X(k, n)| is obtained.
This arrangement is effective when the input signal varies greatly in frequency bands whose power exceeds the threshold α1(k, n)N(k, n), i.e., the stationary component signal multiplied by a predetermined coefficient, and when the characteristics of the spectrum shape should be preserved as much as possible in the output signal |Y2(k, n)|. For example, when speech is to be recognized in speech sections while wind noise is suppressed in non-speech sections, high-power components are scaled rather than replaced, so their spectrum shape is retained even in a section determined to be non-speech. Therefore, even if the speech presence/absence determination is wrong, the speech recognition accuracy can be improved.
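Combining the scaled higher stage with a lower stage that lifts dips to β2(k, n)N(k, n), as in supplementary note 13, gives the following sketch. Gating both stages by the speech flag is an assumption, as is the helper name.

```python
import numpy as np


def scaled_two_sided_replace(x_mag, n_stat, is_speech, alpha1, alpha2, beta1, beta2):
    """Scaled higher stage followed by a stationary-level lower stage (hypothetical)."""
    x_mag = np.asarray(x_mag, dtype=float)
    n_stat = np.asarray(n_stat, dtype=float)
    non_speech = ~np.asarray(is_speech, dtype=bool)
    # Stage 1 (unit 2632): |Y1| = alpha2 * |X| where |X| > alpha1 * N.
    y1 = np.where(non_speech & (x_mag > alpha1 * n_stat), alpha2 * x_mag, x_mag)
    # Stage 2 (assumed lower stage): |Y2| = beta2 * N where |Y1| < beta1 * N.
    return np.where(non_speech & (y1 < beta1 * n_stat), beta2 * n_stat, y1)
```

A non-speech frame [10, 0.1, 20] with N=1, α1=2, α2=0.1, β1=0.5 and β2=1 becomes [1, 1, 2].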
A signal processing apparatus according to the 15th embodiment of the present invention will be described with reference to the accompanying drawings.
The noise suppressor 2701 suppresses noise using a noisy signal amplitude spectrum |X(k, n)| supplied from a transformer 201 and a stationary component spectrum N(k, n) estimated by a stationary component estimator 202, and transmits an enhanced signal amplitude spectrum G(k, n)|X(k, n)| to the replacement unit 203 as a noise suppression result.
If G(k, n)|X(k, n)|>α1(k, n)N(k, n), the replacement unit 203 sets |Y(k, n)|=α2(k, n)N(k, n); otherwise, the replacement unit 203 sets |Y(k, n)|=G(k, n)|X(k, n)|.
A multiplier 2802 obtains the enhanced signal amplitude spectrum G(k, n)|X(k, n)| by multiplying the input signal |X(k, n)| by the gain G(k, n) obtained by the gain calculator 2801. The replacement unit 203 replaces the enhanced signal amplitude spectrum G(k, n)|X(k, n)| by a multiple of a coefficient α(k, n) of the stationary component spectrum N(k, n) in accordance with a condition.
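The 15th embodiment leaves the gain rule to known noise suppression techniques. As a stand-in, this sketch uses a simple magnitude spectral-subtraction gain with a floor, G=max(1−N/|X|, floor), followed by the replacement rule of the replacement unit 203. The gain rule, the floor value and the helper name are assumptions, not the gain calculator 2801 itself.

```python
import numpy as np


def suppress_then_replace(x_mag, n_stat, alpha1, alpha2, floor=0.1, eps=1e-12):
    """Gain-based suppression followed by replacement (hypothetical sketch)."""
    x_mag = np.asarray(x_mag, dtype=float)
    n_stat = np.asarray(n_stat, dtype=float)
    # Stand-in gain (spectral subtraction with a floor), NOT the patent's gain rule.
    gain = np.maximum(1.0 - n_stat / np.maximum(x_mag, eps), floor)
    enhanced = gain * x_mag  # G(k, n)|X(k, n)| from the multiplier
    # Replacement unit: |Y| = alpha2 * N where G|X| > alpha1 * N, else G|X|.
    return np.where(enhanced > alpha1 * n_stat, alpha2 * n_stat, enhanced)
```

For |X|=[4, 20], N=1, α1=5 and α2=1, the enhanced values [3, 19] become [3, 1]: only the bin still above the threshold after suppression is replaced.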
According to this embodiment, it is possible to make a signal after noise suppression stationary in accordance with a condition, and suppress other noise while effectively suppressing noise such as wind noise with a strong non-stationary component.
A signal processing apparatus according to the 16th embodiment of the present invention will be described with reference to the accompanying drawings.
In this embodiment, the replacement unit 2903 suppresses noise using a gain and, at the same time, suppresses non-stationary noise by performing replacement on that gain.
The gain calculator 2935 calculates a gain G(k, n) using a noisy signal amplitude spectrum |X(k, n)| supplied from a transformer 201 and a stationary component spectrum N(k, n) estimated by a stationary component estimator 202. This calculation method may use a known noise suppression technique, similarly to the 15th embodiment.
The first comparator 2931 compares G(k, n)|X(k, n)| with α1(k, n)N(k, n). If G(k, n)|X(k, n)|>α1(k, n)N(k, n), the higher amplitude replacement unit 2932 sets G1(k, n)=α2(k, n)N(k, n)/|X(k, n)|; otherwise, the higher amplitude replacement unit 2932 sets G1(k, n)=G(k, n).
On the other hand, the second comparator 2933 compares G1(k, n)|X(k, n)| with β1(k, n)N(k, n). If G1(k, n)|X(k, n)|<β1(k, n)N(k, n), the lower amplitude replacement unit 2934 sets G2(k, n)=β2(k, n)N(k, n)/|X(k, n)|; otherwise, the lower amplitude replacement unit 2934 sets G2(k, n)=G1(k, n).
Lastly, a multiplier 2936 multiplies the input amplitude spectrum |X(k, n)| by the gain G2(k, n), and outputs a replaced new amplitude spectrum G2(k, n)|X(k, n)|.
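The same two comparisons can be carried out in the gain domain, as units 2931 to 2936 do. In this sketch the input gain G(k, n) is assumed to be given; the helper name and the `eps` guard against division by zero are assumptions. Where |X(k, n)|=0, the gain form cannot reach β2(k, n)N(k, n) and the output stays 0.

```python
import numpy as np


def gain_domain_replace(x_mag, n_stat, gain, alpha1, alpha2, beta1, beta2, eps=1e-12):
    """Replacement expressed as gain modification (hypothetical sketch)."""
    x_mag = np.asarray(x_mag, dtype=float)
    n_stat = np.asarray(n_stat, dtype=float)
    gain = np.asarray(gain, dtype=float)
    x_safe = np.maximum(x_mag, eps)  # avoid dividing by zero
    # Units 2931/2932: G1 = alpha2 * N / |X| where G|X| > alpha1 * N, else G.
    g1 = np.where(gain * x_mag > alpha1 * n_stat, alpha2 * n_stat / x_safe, gain)
    # Units 2933/2934: G2 = beta2 * N / |X| where G1|X| < beta1 * N, else G1.
    g2 = np.where(g1 * x_mag < beta1 * n_stat, beta2 * n_stat / x_safe, g1)
    return g2 * x_mag  # multiplier 2936: G2(k, n)|X(k, n)|
```

With G=1 the result matches the amplitude-domain cascade: |X|=[10, 0.2, 2] and N=1 with α1=3, α2=1, β1=0.5, β2=1 give [1, 1, 2].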
As described above, when the replacement unit 2903 calculates a gain and performs the replacement processing by modifying that gain, it is possible to make a signal after noise suppression stationary in accordance with a condition, and suppress other noise while effectively suppressing noise such as wind noise with a strong non-stationary component.
A signal processing apparatus according to the 17th embodiment of the present invention will be described with reference to the accompanying drawings.
In accordance with the speech detection result of the speech detector 1701 (a 0/1 flag or a speech presence probability p), a replacement unit 3003 replaces the noise suppression result G(k, n)|X(k, n)| output by a noise suppressor with α(k, n) times a stationary component signal N(k, n) from a stationary component estimator 202. The replacement unit 3003 may have the arrangement described in any one of the ninth to 14th embodiments.
In addition, for example, a noise suppressor 2701 may calculate an MMSE STSA gain function value G(k, n) for each frequency band based on a speech presence probability p(k, n) output from the speech detector 1701 by using the technique described in patent literature 3, multiply an input signal |X(k, n)| by the MMSE STSA gain function value, and obtain an enhanced signal G(k, n)|X(k, n)|, thereby outputting the enhanced signal to the replacement unit 3003.
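The MMSE STSA gain can be sketched from the standard Ephraim and Malah closed form G=(√π/2)(√v/γ)exp(−v/2)[(1+v)I0(v/2)+vI1(v/2)], with v=ξγ/(1+ξ), where ξ and γ are the a priori and a posteriori SNRs. The modified Bessel functions are evaluated with an exponentially scaled power series so that only the standard library is needed. Weighting the gain by p(k, n) follows the soft-decision form used under speech presence uncertainty. Treating the detector output as that probability is an assumption, as are the helper names, and this is not claimed to reproduce the technique of patent literature 3. Estimating ξ (for example, by a decision-directed rule) is not shown.

```python
import math


def scaled_bessel_i0_i1(x):
    """Return (exp(-x) * I0(x), exp(-x) * I1(x)) for x >= 0 via a log-space series."""
    if x == 0.0:
        return 1.0, 0.0
    log_half = math.log(x / 2.0)
    s0 = s1 = 0.0
    for m in range(int(x) + 60):  # terms peak near m = x / 2, so this covers the mass
        s0 += math.exp(2 * m * log_half - 2 * math.lgamma(m + 1) - x)
        s1 += math.exp((2 * m + 1) * log_half - math.lgamma(m + 1) - math.lgamma(m + 2) - x)
    return s0, s1


def mmse_stsa_gain(xi, gamma):
    """Ephraim-Malah MMSE-STSA gain for a priori SNR xi > 0 and a posteriori SNR gamma > 0."""
    v = xi * gamma / (1.0 + xi)
    e0, e1 = scaled_bessel_i0_i1(v / 2.0)  # exp(-v/2) is folded into the series
    return (math.sqrt(math.pi) / 2.0) * (math.sqrt(v) / gamma) * ((1.0 + v) * e0 + v * e1)


def enhance_bin(x_mag, xi, gamma, p):
    """Soft-decision output p * G * |X| for one bin (hypothetical helper)."""
    return p * mmse_stsa_gain(xi, gamma) * x_mag
```

At high SNR the gain approaches the Wiener value ξ/(1+ξ), and at low SNR it becomes small, which is the expected behaviour of this estimator.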
According to this embodiment, it is possible to make the signal after noise suppression stationary in accordance with a speech detection result, and output clear speech while effectively suppressing both noise with a strong non-stationary component, such as wind noise, and other noise.
The signal processing apparatus according to each of the above-described embodiments is applicable to suppressing wind noise during video shooting or voice recording, as well as the sound of passing vehicles (cars or bullet trains), helicopter sound, street noise, cafeteria noise, office noise, the rustling of clothes, and the like. Note that the present invention is not limited to these examples and is applicable to any signal processing apparatus required to suppress non-stationary noise in an input signal.
Note that the present invention is not limited to the above-described embodiments. The arrangement and details of the present invention can variously be modified without departing from the spirit and scope thereof, as will be understood by those skilled in the art. The present invention also incorporates a system or apparatus that combines different features included in the embodiments in any form.
The present invention may be applied to a system including a plurality of devices or a single apparatus. The present invention is also applicable even when a signal processing program for implementing the functions of the embodiments is supplied to the system or apparatus directly or from a remote site. Hence, the present invention also incorporates the program installed in a computer to implement the functions of the present invention by the computer, a medium storing the program, and a WWW (World Wide Web) server that causes a user to download the program. In particular, the present invention incorporates a non-transitory computer readable medium storing a program for causing a computer to execute processing steps included in the above-described embodiments.
As an example, a processing procedure executed by a CPU 3102 provided in a computer 3100 when the speech processing explained in the first embodiment is implemented by software will be described below with reference to the accompanying drawings.
An input signal is transformed into an amplitude component signal in the frequency domain (S3101). Based on the amplitude component signal in the frequency domain, a stationary component signal having a frequency spectrum with a stationary characteristic is estimated (S3103). A new amplitude component signal is generated using the input amplitude component signal and the stationary component signal (S3105). The amplitude component signal is replaced by the new amplitude component signal (S3107). In addition, the new amplitude component signal is inversely transformed into an enhanced signal (S3109).
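Steps S3101 to S3109 can be sketched end to end in NumPy. The frame length, hop, square-root Hann window, recursive-average stationary estimator and the use of the higher amplitude rule for S3105 are all illustrative assumptions, and the first embodiment's own estimator and replacement rule may differ. Reusing the input phase for resynthesis is also an assumption. The input must be at least one frame long.

```python
import numpy as np


def process(x, n_fft=256, hop=128, smooth=0.98, alpha1=2.0, alpha2=1.0):
    """Sketch of S3101-S3109: STFT, stationary estimate, replacement, inverse STFT."""
    x = np.asarray(x, dtype=float)
    # Square-root periodic Hann: analysis * synthesis = Hann, which sums to 1 at 50% overlap.
    win = np.sqrt(np.hanning(n_fft + 1)[:-1])
    starts = range(0, len(x) - n_fft + 1, hop)
    # S3101: amplitude component in the frequency domain (phase kept for resynthesis).
    spec = np.array([np.fft.rfft(win * x[s:s + n_fft]) for s in starts])
    mag, phase = np.abs(spec), np.angle(spec)
    # S3103: stationary component (assumed: first-order recursive average per bin).
    n_stat = np.empty_like(mag)
    n_stat[0] = mag[0]
    for n in range(1, len(mag)):
        n_stat[n] = smooth * n_stat[n - 1] + (1.0 - smooth) * mag[n]
    # S3105/S3107: new amplitude (assumed higher amplitude rule) replaces the old one.
    new_mag = np.where(mag > alpha1 * n_stat, alpha2 * n_stat, mag)
    # S3109: inverse transform and weighted overlap-add.
    frames = np.fft.irfft(new_mag * np.exp(1j * phase), n=n_fft, axis=1) * win
    y = np.zeros_like(x)
    for frame, s in zip(frames, starts):
        y[s:s + n_fft] += frame
    return y
```

With `alpha1` set very large nothing is replaced and the fully overlapped samples are reconstructed exactly, which is a convenient check of the transform pair; with the defaults, an isolated click is pulled down toward the stationary level.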
Program modules for executing these processes are stored in a memory 3104. When the CPU 3102 sequentially executes the program modules stored in the memory 3104, it is possible to obtain the same effects as those in the first embodiment.
Similarly, for the second to 17th embodiments, when the CPU 3102 executes, from the memory 3104, program modules corresponding to the functional components described with reference to the block diagrams, it is possible to obtain the same effects as those in the respective embodiments.
[Other Expressions of Embodiments]
Some or all of the above-described embodiments can also be described as in the following supplementary notes, but are not limited to the following.
(Supplementary Note 1)
There is provided a signal processing apparatus comprising:
a transformer that transforms an input signal into an amplitude component signal in a frequency domain;
a stationary component estimator that estimates a stationary component signal having a frequency spectrum with a stationary characteristic based on the amplitude component signal in the frequency domain;
a replacement unit that generates a new amplitude component signal using the amplitude component signal obtained by the transformer and the stationary component signal, and replaces the amplitude component signal by the new amplitude component signal; and
an inverse transformer that inversely transforms the new amplitude component signal into an enhanced signal.
(Supplementary Note 2)
There is provided the signal processing apparatus according to supplementary note 1, wherein the replacement unit generates the new amplitude component signal based on a function of the stationary component signal for at least some frequencies.
(Supplementary Note 3)
There is provided the signal processing apparatus according to supplementary note 1 or 2, wherein the replacement unit generates the new amplitude component signal by multiplying the stationary component signal by a coefficient for at least some frequencies.
(Supplementary Note 4)
There is provided the signal processing apparatus according to supplementary note 1, 2, or 3, wherein the replacement unit generates the new amplitude component signal based on a second function of the stationary component signal at a frequency at which the amplitude component signal is larger than a first threshold determined based on a first function of the stationary component signal.
(Supplementary Note 5)
There is provided the signal processing apparatus according to supplementary note 4, wherein the replacement unit includes
a comparator that compares the first threshold and the amplitude component signal, and
a higher amplitude replacement unit that generates the new amplitude component signal based on the second function of the stationary component signal at a frequency at which the amplitude component signal is larger than the first threshold, and directly obtains, as the new amplitude component signal, the amplitude component signal obtained by the transformer at a frequency at which the amplitude component signal is not larger than the first threshold.
(Supplementary Note 6)
There is provided the signal processing apparatus according to supplementary note 4, wherein the replacement unit includes
a comparator that compares the amplitude component signal with a multiple, serving as the first threshold, of a first coefficient of the stationary component signal, and
a higher amplitude replacement unit that obtains, as the new amplitude component signal, a multiple, serving as the second function, of a second coefficient of the stationary component signal when the amplitude component signal is larger than the multiple of the first coefficient of the stationary component signal, and directly obtains, as the new amplitude component signal, the amplitude component signal obtained by the transformer when the amplitude component signal is not larger than the multiple of the first coefficient of the stationary component signal.
(Supplementary Note 7)
There is provided the signal processing apparatus according to any one of supplementary notes 1 to 6, wherein the replacement unit generates the new amplitude component signal based on a fourth function of the stationary component signal at a frequency at which the amplitude component signal is smaller than a second threshold determined based on a third function of the stationary component signal.
(Supplementary Note 8)
There is provided the signal processing apparatus according to supplementary note 7, wherein the replacement unit includes
a comparator that compares the second threshold and the amplitude component signal, and
a lower amplitude replacement unit that generates the new amplitude component signal based on the fourth function of the stationary component signal at a frequency at which the amplitude component signal is smaller than the second threshold, and directly obtains, as the new amplitude component signal, the amplitude component signal obtained by the transformer at a frequency at which the amplitude component signal is not smaller than the second threshold.
(Supplementary Note 9)
There is provided the signal processing apparatus according to supplementary note 7, wherein the replacement unit includes
a comparator that compares the amplitude component signal with a multiple, serving as the second threshold, of a third coefficient of the stationary component signal, and
a lower amplitude replacement unit that obtains, as the new amplitude component signal, a multiple of a fourth coefficient of the stationary component signal when the amplitude component signal is smaller than the multiple of the third coefficient of the stationary component signal, and directly obtains, as the new amplitude component signal, the amplitude component signal obtained by the transformer when the amplitude component signal is not smaller than the multiple of the third coefficient of the stationary component signal.
(Supplementary Note 10)
There is provided the signal processing apparatus according to any one of supplementary notes 1 to 9, wherein the replacement unit
generates the new amplitude component signal based on a sixth function of the stationary component signal at a frequency at which the amplitude component signal is larger than a third threshold determined based on a fifth function of the stationary component signal, and replaces the amplitude component signal by the new amplitude component signal, and
generates the new amplitude component signal based on an eighth function of the stationary component signal at a frequency at which the amplitude component signal is smaller than a fourth threshold determined based on a seventh function of the stationary component signal, and replaces the amplitude component signal by the new amplitude component signal, and
the third threshold is not smaller than the fourth threshold.
(Supplementary Note 11)
There is provided the signal processing apparatus according to supplementary note 10, wherein the replacement unit includes
a first comparator that compares the amplitude component signal with a multiple, serving as the third threshold, of a fifth coefficient of the stationary component signal,
a higher amplitude replacement unit that replaces the amplitude component signal using a multiple of a sixth coefficient of the stationary component signal as the new amplitude component signal when the amplitude component signal is larger than the multiple of the fifth coefficient of the stationary component signal, and directly obtains, as the new amplitude component signal, the amplitude component signal obtained by the transformer when the amplitude component signal is not larger than the multiple of the fifth coefficient of the stationary component signal,
a second comparator that compares the multiple, serving as the fourth threshold, of the sixth coefficient of the stationary component signal with the new amplitude component signal output from the higher amplitude replacement unit, and
a lower amplitude replacement unit that further replaces the new amplitude component signal obtained by the higher amplitude replacement unit using a multiple of a seventh coefficient of the stationary component signal when the new amplitude component signal output from the higher amplitude replacement unit is smaller than the multiple of the sixth coefficient of the stationary component signal, and directly outputs the new amplitude component signal obtained by the higher amplitude replacement unit when that new amplitude component signal is not smaller than the multiple of the sixth coefficient of the stationary component signal.
(Supplementary Note 12)
There is provided the signal processing apparatus according to supplementary note 1, wherein the replacement unit includes
a comparator that compares the amplitude component signal with a multiple of a seventh coefficient of the stationary component signal, and
a higher amplitude replacement unit that replaces the amplitude component signal using a multiple of an eighth coefficient of the amplitude component signal as the new amplitude component signal when the amplitude component signal is larger than the multiple of the seventh coefficient of the stationary component signal, and directly obtains, as the new amplitude component signal, the amplitude component signal obtained by the transformer when the amplitude component signal is not larger than the multiple of the seventh coefficient of the stationary component signal.
(Supplementary Note 13)
There is provided the signal processing apparatus according to supplementary note 1, wherein the replacement unit includes
a first comparator that compares the amplitude component signal with a multiple of a ninth coefficient of the stationary component signal,
a higher amplitude replacement unit that replaces the amplitude component signal using a multiple of a 10th coefficient of the amplitude component signal as the new amplitude component signal when the amplitude component signal is larger than the multiple of the ninth coefficient of the stationary component signal, and directly obtains, as the new amplitude component signal, the amplitude component signal obtained by the transformer when the amplitude component signal is not larger than the multiple of the ninth coefficient of the stationary component signal,
a second comparator that compares the new amplitude component signal output from the higher amplitude replacement unit with a multiple of an 11th coefficient of the stationary component signal, and
a lower amplitude replacement unit that further replaces the new amplitude component signal obtained by the higher amplitude replacement unit using a multiple of a 12th coefficient of the stationary component signal when that new amplitude component signal is smaller than the multiple of the 11th coefficient of the stationary component signal, and outputs the new amplitude component signal obtained by the higher amplitude replacement unit when that new amplitude component signal is not smaller than the multiple of the 11th coefficient of the stationary component signal.
(Supplementary Note 14)
There is provided the signal processing apparatus according to any one of supplementary notes 1 to 13, further comprising:
a speech detector that detects speech from the amplitude component signal,
wherein the replacement unit replaces the amplitude component signal obtained by the transformer in a non-speech section.
(Supplementary Note 15)
There is provided the signal processing apparatus according to any one of supplementary notes 1 to 13, further comprising:
a speech detector that generates a speech presence probability from the amplitude component signal,
wherein the replacement unit replaces the amplitude component signal obtained by the transformer so that, at each frequency, the amplitude component signal becomes closer to the stationary component signal as the speech presence probability becomes lower.
(Supplementary Note 16)
There is provided the signal processing apparatus according to any one of supplementary notes 1 to 15, further comprising:
a noise suppressor that suppresses noise included in the amplitude component signal,
wherein the replacement unit generates a new amplitude component signal using the stationary component signal and an enhanced amplitude component signal obtained by the noise suppressor, and replaces the amplitude component signal by the new amplitude component signal.
(Supplementary Note 17)
There is provided a signal processing method comprising:
transforming an input signal into an amplitude component signal in a frequency domain;
estimating a stationary component signal having a frequency spectrum with a stationary characteristic based on the amplitude component signal in the frequency domain;
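generating a new amplitude component signal using the amplitude component signal obtained in the transforming and the stationary component signal, and replacing the amplitude component signal by the new amplitude component signal; and
inversely transforming the new amplitude component signal into an enhanced signal.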
(Supplementary Note 18)
There is provided a signal processing program for causing a computer to execute a method, comprising:
transforming an input signal into an amplitude component signal in a frequency domain;
estimating a stationary component signal having a frequency spectrum with a stationary characteristic based on the amplitude component signal in the frequency domain;
generating a new amplitude component signal using the amplitude component signal obtained in the transforming and the stationary component signal, and replacing the amplitude component signal by the new amplitude component signal; and
inversely transforming the new amplitude component signal into an enhanced signal.
This application claims the benefit of Japanese Patent Application No. 2013-83411 filed on Apr. 11, 2013, which is hereby incorporated by reference herein in its entirety.
Foreign Application Priority Data
Number | Date | Country | Kind |
---|---|---|---|
2013-083411 | Apr 2013 | JP | national |
PCT Information
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2014/058961 | 3/27/2014 | WO | 00 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2014/168021 | 10/16/2014 | WO | A |
U.S. Patent Documents
Number | Name | Date | Kind |
---|---|---|---|
6122384 | Mauro | Sep 2000 | A |
20040049383 | Kato et al. | Mar 2004 | A1 |
20040185804 | Kanamori | Sep 2004 | A1 |
20060271362 | Kato et al. | Nov 2006 | A1 |
20100014681 | Sugiyama | Jan 2010 | A1 |
20100296665 | Ishikawa et al. | Nov 2010 | A1 |
20110081026 | Ramakrishnan | Apr 2011 | A1 |
20120288116 | Saito | Nov 2012 | A1 |
20130010974 | Nakadai et al. | Jan 2013 | A1 |
20130246056 | Sugiyama | Sep 2013 | A1 |
Foreign Patent Documents
Number | Date | Country |
---|---|---|
101627428 | Jan 2010 | CN |
102549659 | Jul 2012 | CN |
2002-204175 | Jul 2002 | JP |
2003-058186 | Feb 2003 | JP |
2004-187283 | Jul 2004 | JP |
2006-337415 | Dec 2006 | JP |
2009-055583 | Mar 2009 | JP |
2010-271411 | Dec 2010 | JP |
2012-239017 | Dec 2012 | JP |
2013-020252 | Jan 2013 | JP |
2008111462 | Sep 2008 | WO |
2011041738 | Apr 2011 | WO |
2012070668 | May 2012 | WO |
Other References
Entry |
---|
Masanori Kato et al., Noise Suppression with High Speech Quality Based on Weighted Noise Estimation and MMSE STSA, Electronics and Communications in Japan, Part 3, vol. 89, No. 2, Jan. 1, 2006, pp. 43-53, XP-001236340. |
Extended European Search Report for EP Application No. EP14783172.1 dated Nov. 23, 2016. |
M. Kato et al., “Noise suppression with high speech quality based on weighted noise estimation and MMSE STSA,” IEICE Trans. Fundamentals (Japanese Edition), Jul. 2004, pp. 851-860, vol. J87-A, No. 7, IEICE, Japan, Cited in the Specification. |
R. Martin, “Spectral subtraction based on minimum statistics,” EUSIPCO-94, Sep. 1994, pp. 1182-1185, Aachen, Germany, Cited in the Specification. |
“IEEE Transactions on Acoustics, Speech, and Signal Processing”, Dec. 1984, pp. 1109-1121, vol. 32, No. 6, IEEE, NJ, USA, Cited in the Specification. |
3GPP TS 26.094 V5.0.0 (Jun. 2002), “Technical Specification Group Services and System Aspects; Mandatory speech codec speech processing functions; Adaptive Multi-Rate (AMR) speech codec; Voice Activity Detector (VAD) (Release 5)”, Jun. 2002, Valbonne, France, Cited in the Specification. |
3GPP TS 26.194 V5.0.0 (Mar. 2001), “Technical Specification Group Services and System Aspects; Speech Codec speech processing functions; AMR Wideband speech codec; Voice Activity Detector (VAD) (Release 5)”, Mar. 2001, Valbonne, France, Cited in the Specification. |
S. Nordholm et al., “Statistical Voice Activity Detection Using Low-Variance Spectrum Estimation and an Adaptive Threshold”, IEEE Transactions on Audio, Speech, and Language Processing, Mar. 2006, pp. 412-424, vol. 14, No. 2, IEEE, NJ, USA, Cited in the Specification. |
K. Li et al., “An Improved Voice Activity Detection Using Higher Order Statistics,” IEEE Transactions on Speech and Audio Processing, Sep. 2005, pp. 965-974, vol. 13, No. 5, IEEE, NJ, USA, Cited in the Specification. |
Shingo Kuroiwa et al., “Wind Noise Reduction Method Using the Observed Spectrum Fine Structure and Estimated Spectrum Envelope”, 2006 International Conference on Communication Technology, Jan. 1, 2007, pp. 1-12, vol. J90-A, No. 1, IEEE, NJ, USA, Cited in ISR. |
International Search Report for PCT Application No. PCT/JP2014/058961, dated Jul. 1, 2014. |
Sugiyama, “Single-Channel Impact-Noise Suppression With no Auxiliary Information for its Detection”, 2007 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, Dec. 2007, pp. 127-130. (Cited in JPOA). |
Japanese Office Action for JP Application No. 2015-511204 dated Apr. 3, 2018 with English Translation. |
Chinese Office Action for CN Application No. 201480020786.1 dated Jun. 26, 2018 with English Translation. |
Chinese Office Action for CN Application No. 201480020786.1 dated Mar. 1, 2019 with English Translation. |
Prior Publication Data
Number | Date | Country |
---|---|---|
20160055863 A1 | Feb 2016 | US |