1. Field of the Invention
The present invention relates to a signal processing device that estimates a correlation matrix adaptively according to input signals.
2. Description of the Related Art
Correlation matrix estimation is used for MUSIC, BF (BeamFormer), or BSS (Blind Source Separation) (see N. Kikuma, Adaptive Signal Processing with Array Antenna, Kagaku Gijutsu Shuppan, Inc., 1999, and H. Saruwatari et al., IEEE Trans. on Speech and Audio Process, vol. 14(2), pp. 666-678, 2006). The correlation matrix $R_{xx}$ is defined by Equation (1) for a signal x(t) represented as an N-dimensional column vector using discrete time t as a variable.
$R_{xx} = E[x(t)\,x^H(t)]$, (1)
where H denotes the complex conjugate transpose, and E[A(t)] represents an expected value or time average of A(t). Since Equation (1) includes the expected value operation, the calculation of the correlation matrix requires that the signal x(t) be known at all times t.
However, since the adaptive BF or adaptive BSS cannot use future signals and the signal varies over time, signals acquired up to the time of buffering are used to estimate the correlation matrix. Specifically, an estimated correlation matrix $\hat{R}_{xx}(t)$ at time t is calculated using a window function w(t) for signal extraction according to Equation (2).
$\hat{R}_{xx}(t) = w(t) * [x(t)\,x^H(t)]$, (2)
where * represents convolution. In the actual processing, a rectangular window having a certain length is often used as the window function w(t). In order to reduce the amount of calculation, a technique for calculating the estimated correlation matrix $\hat{R}_{xx}(t)$ only at spaced time intervals and a technique using values calculated from only the data at the current time have been proposed (see J. M. Valin et al., Proc. IEEE/RSJ Intelligent Robot and Systems, pp. 2123-2128, 2004, and Nakajima et al., Technical Report of IEICE, Vol. EA2007-30, pp. 19-24, 2007).
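As a rough illustration of estimating Equation (1) with a rectangular window, the following sketch averages the outer products $x(t)x^H(t)$ of the most recent samples. The function name `windowed_correlation` and the list-of-lists matrix layout are assumptions made for this example and are not part of the described device.

```python
def windowed_correlation(frames, length):
    """Average x(t) x^H(t) over the last `length` frames (rectangular window)."""
    recent = frames[-length:]
    dim = len(recent[0])
    acc = [[0j] * dim for _ in range(dim)]
    for x in recent:
        for i in range(dim):
            for j in range(dim):
                # Element (i, j) of the outer product x x^H.
                acc[i][j] += x[i] * x[j].conjugate()
    return [[v / len(recent) for v in row] for row in acc]


# Three identical two-channel frames; only the last two are averaged.
frames = [[1 + 0j, 1j], [1 + 0j, 1j], [1 + 0j, 1j]]
R = windowed_correlation(frames, 2)
```

The off-diagonal elements of `R` are the correlation values between channels, and `R` is Hermitian by construction.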
The accuracy of correlation matrix estimation varies depending on the kind of window. Here, two signals x1(t) and x2(t) are considered, which are respectively defined by Equations (3) and (4) in which the correlation value r is a (|a|≦1). For the sake of simplicity, signal power is normalized.
$x_1(t) = n_1(t)$, (3)
$x_2(t) = a\,n_1(t) + (1 - a^2)^{1/2}\,n_2(t)$, (4)
where $n_1(t)$ and $n_2(t)$ represent noise signals that follow the standard normal distribution. If the expected value operation using a rectangular window having length N is defined by Equation (5), the estimated correlation value $\hat{r}$ of the two signals is calculated according to Equation (7).
$E_N[u(t)] = (1/N)\sum_{t=0}^{N-1} u(t)$ (5)
As N approaches infinity, $E_N[n_1^2(t)]$ approaches 1 whereas $E_N[n_1(t)n_2(t)]$ approaches 0, so the estimated correlation value $\hat{r}$ substantially matches the true value. However, if N is finite, an error e expressed by Equation (8) occurs.
$N\,E_N[n_1^2(t)]$ follows the chi-square distribution $\chi_N^2$ with N degrees of freedom, so $E_N[n_1^2(t)]$ has a mean of 1 and a variance of 2/N. $E_N[n_1(t)n_2(t)]$ is an average of products of independent normal variables, and its mean is 0 and its variance is 1/N. This shows that the average power $E[e^2]$ of the estimation error of the correlation value is inversely proportional to N, as expressed in Equation (9).
$E[e^2] = (a^2+1)/N$ (9)
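Equation (9) can be checked numerically. The sketch below draws the two signals of Equations (3) and (4), estimates their correlation with a rectangular window of length N, and compares the measured error power with $(a^2+1)/N$. The function name, the trial count, and the seed are assumptions chosen for illustration.

```python
import random


def correlation_error_power(a, n, trials, seed=0):
    """Monte Carlo estimate of E[e^2] for a rectangular window of length n."""
    rng = random.Random(seed)
    b = (1.0 - a * a) ** 0.5
    total = 0.0
    for _ in range(trials):
        acc = 0.0
        for _ in range(n):
            n1 = rng.gauss(0.0, 1.0)
            n2 = rng.gauss(0.0, 1.0)
            acc += n1 * (a * n1 + b * n2)   # x1(t) * x2(t)
        e = acc / n - a                      # estimated minus true correlation
        total += e * e
    return total / trials


a, n = 0.5, 50
measured = correlation_error_power(a, n, trials=4000)
predicted = (a * a + 1.0) / n                # Equation (9)
```

With these settings `predicted` is 0.025, and `measured` is expected to land close to it, up to Monte Carlo noise.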
When a correlation value is estimated for input signals discretely or continuously, an exponential window exhibits better effects than the rectangular window in terms of the amount of storage and the amount of calculation. The exponential window is often used for signal power estimation (see I. Cohen and B. Berdugo, Signal Processing, Vol. 81, pp. 2403-2418, 2001). On the other hand, there are fewer reports on its use for correlation matrix estimation.
The following describes the fact that the estimation accuracy of the estimate value depends on the area of a squared window. When the expected value operation using an exponential window with attenuation factor α (0<α<1) is defined by Equation (10), the estimate values using the window are recursively calculated according to Equation (11).
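$E_\alpha[u(t)] = (1-\alpha)\sum_{\tau} u(t-\tau)\,\alpha^{\tau}$ (10)
$E_\alpha[u(t)] = \alpha\,E_\alpha[u(t-1)] + (1-\alpha)\,u(t)$ (11)
The variance of signals averaged with the window w(t) is $\sum_t w(t)^2$ times the variance of signals that are not averaged, i.e., it is proportional to the area of the squared window. For the exponential window of Equation (10), this area is $(1-\alpha)/(1+\alpha)$. Therefore, the average power of the estimation error of the correlation value using the exponential window is estimated by Equation (12).
$E[e^2] = (a^2+1)(1-\alpha)/(1+\alpha)$ (12)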
It is found from Equations (9) and (12) that the estimated error with the exponential window having the attenuation factor α matches the error with a rectangular window having a window length N=(1+α)/(1−α).
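The relationship between the recursive form, the explicit weighted sum, and the equivalent rectangular-window length can be sketched as follows. The function names are illustrative assumptions, and the recursion is assumed to start from a zero estimate.

```python
def exponential_average(samples, alpha):
    """Recursive exponential-window average, Equation (11), started from zero."""
    est = 0.0
    for u in samples:
        est = alpha * est + (1.0 - alpha) * u
    return est


def explicit_average(samples, alpha):
    """Direct weighted sum of Equation (10), truncated to the available samples."""
    t = len(samples) - 1
    return (1.0 - alpha) * sum(samples[t - tau] * alpha ** tau for tau in range(t + 1))


def equivalent_length(alpha):
    """Rectangular-window length with the same error power: N = (1+alpha)/(1-alpha)."""
    return (1.0 + alpha) / (1.0 - alpha)


alpha = 0.9
samples = [1.0, 4.0, 2.0, 8.0, 5.0]
# Area of the squared exponential window, summed far enough to converge.
squared_area = sum(((1.0 - alpha) * alpha ** tau) ** 2 for tau in range(2000))
```

For `alpha = 0.9` the equivalent length is about 19, and `squared_area` is its reciprocal, which is the factor that replaces 1/N in Equation (9). The recursion needs only the previous estimate, which is why the exponential window is cheaper in storage than a rectangular window of comparable accuracy.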
However, since the estimation accuracy of the correlation value depends on the area of the squared window, a longer window improves the estimation accuracy of the correlation value but degrades the tracking of variations in sequential processing.
Therefore, it is an object of the present invention to provide a device capable of improving the convergence rate and estimation accuracy in estimating a correlation value.
A signal processing device of the first invention comprises a state detection unit which outputs plural signals according to a state, a correlation calculation unit which calculates a correlation matrix of plural output signals from the state detection unit, and a correlation estimation unit which smooths the correlation matrix calculated by the correlation calculation unit according to a window function to determine an estimated correlation matrix. The signal processing device further comprises an estimated error evaluation unit which evaluates an estimated error based on the estimated correlation matrix determined by the correlation estimation unit and a window length adjusting unit which adjusts a window length, i.e., the length of the window function, so as to reduce the estimated error evaluated by the estimated error evaluation unit.
According to the signal processing device of the first invention, since the window length is adjusted to reduce the estimated error in the correlation matrix, the convergence rate and estimation accuracy in estimating the correlation matrix and the correlation value as its off-diagonal element can be improved.
A signal processing device of the second invention is based on the signal processing device of the first invention. The correlation estimation unit determines the estimated correlation matrix according to an exponential window as the window function. The window length adjusting unit adjusts the window length α(t) according to the following equations (C1) and (C2) based on an allowable error rate β, an estimated correlation value $\hat{r}(t)$ as an off-diagonal element of the estimated correlation matrix at time t, and an upper limit $N_{max}$ on the window length expressed as the equivalent length of a rectangular window:
$\hat{N}(t) = \min[1/(\beta\,\hat{r}(t))^2,\ N_{max}]$, (C1)
$\alpha(t) = (\hat{N}(t)-1)/(\hat{N}(t)+1)$. (C2)
According to the signal processing device of the second invention, since the window length of the exponential window is adjusted to reduce the estimated error in the correlation matrix, the convergence rate and estimation accuracy in estimating the correlation value as the off-diagonal element of the correlation matrix can be improved. Further, the window length is prevented from exceeding the threshold $N_{max}$ when the estimated correlation value $\hat{r}(t)$ approaches 0 due to an error or the like.
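A minimal sketch of the mapping in Equations (C1) and (C2) is given below. The function name `adjust_window`, the use of the magnitude of the estimated correlation value (so that complex values can be passed), and the explicit handling of an exactly zero estimate are assumptions made for illustration.

```python
def adjust_window(r_hat, beta, n_max):
    """Return (N_hat, alpha) from Equations (C1) and (C2)."""
    magnitude = beta * abs(r_hat)            # abs() for complex values is an assumption
    if magnitude == 0.0:
        n_hat = float(n_max)                 # zero estimate: fall back to the upper limit
    else:
        n_hat = min(1.0 / magnitude ** 2, float(n_max))   # (C1)
    alpha = (n_hat - 1.0) / (n_hat + 1.0)                 # (C2)
    return n_hat, alpha


# A strongly correlated pair gets a short window; a weak estimate hits the cap.
n_strong, alpha_strong = adjust_window(r_hat=0.8, beta=0.25, n_max=1000)
n_weak, alpha_weak = adjust_window(r_hat=0.01, beta=0.25, n_max=1000)
```

Here `n_strong` is about 25 samples, whereas the weak estimate would call for 160000 samples and is therefore limited to `n_max`. Note that if the product of β and the estimate exceeds 1, the computed length falls below 1 and α becomes negative, so a practical implementation would probably also need a lower bound.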
A signal processing device of the third invention is based on the signal processing device of the first or second invention, further comprising a state estimation unit which estimates the state by performing signal processing on the plural output signals according to the estimated correlation matrix when the estimated error has become a predetermined value after the window length adjusting unit adjusted the window length in such a manner that the estimated error would be equal to or less than the predetermined value.
According to the signal processing device of the third invention, in such a high-probability condition that the correlation of plural output signals according to the state is estimated with a high degree of precision, signal processing is performed on the plural signals, so that the state can be estimated with a high degree of precision.
FIGS. 6(a)-6(c) are bar charts, where FIG. 6(a) shows a comparison of SNR as the sound source separation results of the respective methods, FIG. 6(b) shows a comparison of CC as the sound source separation results of the respective methods, and FIG. 6(c) shows a comparison of ASR as the sound source separation results of the respective methods.
An embodiment of a signal processing device of the present invention will now be described with reference to the accompanying drawings. The technique of the present invention is called an OCRA (Optimum Controlled Recursive Average) method below. A signal processing device 10 shown in
The following describes the functions of the signal processing device 10 having the above-mentioned structure. First, an exponent k indicating discrete time t is set to “1” (S001 in
The correlation estimation unit 13 smooths the correlation matrix $x(k)x^H(k)$ calculated by the correlation calculation unit 12 with the exponential window serving as the window function w(k) to determine the estimated correlation matrix $\hat{R}(k)$ according to the above-mentioned Equation (2) (S006 in
The estimated error evaluation unit 14 evaluates the estimated error e(k) based on the estimated correlation matrix R^(k) calculated by the correlation estimation unit 13 (S008 in
Then, the window length adjusting unit 15 determines whether the estimated error (precisely, its average power) is equal to or less than a predetermined value (S009 in
On the other hand, if it is determined that the estimated error is more than the predetermined value (NO in S009), the window length adjusting unit 15 adjusts the window length so as to reduce the estimated error evaluated by the estimated error evaluation unit 14 (S012 in
$\hat{N}(k) = \min[1/(\beta\,\hat{r}(k))^2,\ N_{max}]$ (C1)
$\alpha(k) = (\hat{N}(k)-1)/(\hat{N}(k)+1)$ (C2)
Note that a model independent of the error accuracy a was used in consideration of handling complex signals. After that, the exponent k is incremented by “1” (S013 in
According to the signal processing device 10 that achieves the above-mentioned functions, since the window length is so adjusted that the estimated error of the correlation matrix R will be reduced, the convergence rate and estimation accuracy in estimating the correlation matrix and the correlation value as its off-diagonal element can be improved (see Equations (C1), (C2), and S012 in
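The sequential behavior described above can be sketched for a single pair of real signals as follows. This is a simplified illustration, not the device's procedure: it re-adjusts the window at every sample instead of branching on the error threshold of S009, it clamps the equivalent length to at least 1, and the function name, default parameters, and test signals are all assumptions.

```python
import random


def ocra_correlation(x1, x2, beta=0.1, n_max=500.0):
    """Track the correlation of two real signals with an adaptive exponential window."""
    alpha = 0.0                               # first sample is taken as-is (assumption)
    r_hat = 0.0
    history = []
    for u1, u2 in zip(x1, x2):
        r_hat = alpha * r_hat + (1.0 - alpha) * u1 * u2    # recursive smoothing
        scaled = beta * abs(r_hat)
        n_hat = n_max if scaled == 0.0 else min(1.0 / scaled ** 2, n_max)   # (C1)
        n_hat = max(n_hat, 1.0)               # keeps alpha non-negative (assumption)
        alpha = (n_hat - 1.0) / (n_hat + 1.0)              # (C2)
        history.append((r_hat, alpha))
    return history


# Two unit-power signals with true correlation a, as in Equations (3) and (4).
rng = random.Random(1)
a = 0.7
n1 = [rng.gauss(0.0, 1.0) for _ in range(5000)]
n2 = [rng.gauss(0.0, 1.0) for _ in range(5000)]
x1 = n1
x2 = [a * p + (1.0 - a * a) ** 0.5 * q for p, q in zip(n1, n2)]
history = ocra_correlation(x1, x2)
```

Each entry of `history` holds the running estimate and the attenuation factor chosen for the next sample, so the trade-off between window length and tracking can be inspected over time.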
The following describes the performance testing results of the signal processing device 10.
The following describes the performance testing results when the OCRA method is applied to BSS. Plural microphones Mi (i=1, 2, . . . , n) that constitute the state detection unit 11 are arranged, for example, as shown in
In this performance testing, a GSSAS method for adaptively adjusting the step size to the optimum value (see Nakajima et al., Technical Report of IEICE, Vol. EA2007-30, pp. 19-24, 2007), which is based on a GSS separation method by decorrelation with a geometric constraint (see J. M. Valin et al., Proc. IEEE/RSJ Intelligent Robot and Systems, pp. 2123-2128, 2004), was used. The algorithm of the GSSAS method is expressed by Equations (21) to (28).
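$y = W_t x$ (21)
$W_{t+1} = W_t - \mu_{LC} J'_{LC} - \mu_{ss} J'_{ss}$ (22)
$E_{ss} = R_{yy} - \mathrm{diag}[R_{yy}]$ (23)
$J'_{ss} = 2 E_{ss} W_t R_{xx}$ (24)
$\mu_{ss} = \|E_{ss}\|^2 / (2\|J'_{ss}\|^2)$ (25)
$E_{LC} = W_t D - I$ (26)
$J'_{LC} = E_{LC} D^H$ (27)
$\mu_{LC} = \|E_{LC}\|^2 / (2\|J'_{LC}\|^2)$ (28)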
where x is the vector of output signals from the respective microphones Mi, y is the vector of separated signals (the number of sound sources N is smaller than the number of microphones), $W_t$ is an unmixing matrix, and D is a transfer function matrix of direct sound components. According to GSSAS, instantaneous data ($xx^H$, $yy^H$) is used for the correlation matrices ($R_{xx}$, $R_{yy}$) in Equations (23) and (24). On the other hand, according to the OCRA method, correlation matrix data smoothed with the exponential window is used. Further, the case where a fixed value is used as the attenuation factor α of the window was compared with the case where the attenuation factor α is adaptively determined by the OCRA method.
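One update of Equations (21) to (28) might be sketched as below. The sketch reads the first factor of Equation (24) as the separation error $E_{ss}$ of Equation (23), takes the norms as Frobenius norms, and sets a step size to zero when its gradient vanishes; these readings, like the helper names and the optional `rxx`/`ryy` arguments for passing smoothed matrices, are assumptions.

```python
def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def hermitian(a):
    """Complex conjugate transpose."""
    return [[a[i][j].conjugate() for i in range(len(a))] for j in range(len(a[0]))]


def sq_norm(a):
    """Squared Frobenius norm (the choice of norm is an assumption)."""
    return sum(abs(v) ** 2 for row in a for v in row)


def gssas_step(w, x, d, rxx=None, ryy=None):
    """One update of the unmixing matrix w following Equations (21) to (28)."""
    col = [[v] for v in x]
    y = matmul(w, col)                                            # (21)
    if rxx is None:
        rxx = matmul(col, hermitian(col))                         # instantaneous x x^H
    if ryy is None:
        ryy = matmul(y, hermitian(y))                             # instantaneous y y^H
    n = len(w)
    e_ss = [[0.0 if i == j else ryy[i][j] for j in range(n)] for i in range(n)]   # (23)
    j_ss = [[2 * v for v in row] for row in matmul(matmul(e_ss, w), rxx)]         # (24)
    wd = matmul(w, d)
    e_lc = [[wd[i][j] - (1.0 if i == j else 0.0) for j in range(n)]
            for i in range(n)]                                                    # (26)
    j_lc = matmul(e_lc, hermitian(d))                                             # (27)
    norm_ss, norm_lc = sq_norm(j_ss), sq_norm(j_lc)
    mu_ss = sq_norm(e_ss) / (2 * norm_ss) if norm_ss else 0.0                     # (25)
    mu_lc = sq_norm(e_lc) / (2 * norm_lc) if norm_lc else 0.0                     # (28)
    w_next = [[w[i][j] - mu_lc * j_lc[i][j] - mu_ss * j_ss[i][j]
               for j in range(len(w[0]))] for i in range(n)]                      # (22)
    return w_next, [row[0] for row in y]


# With D = I and a silent input, only the geometric constraint acts on W.
identity = [[1.0, 0.0], [0.0, 1.0]]
w0 = [[2.0, 0.0], [0.0, 2.0]]
w1, y = gssas_step(w0, [0.0, 0.0], identity)
```

In this toy case the constraint error $W D - I$ is halved in one step, so `w1` is 1.5 times the identity. Passing exponentially smoothed matrices through `rxx` and `ryy` corresponds to the OCRA variant described above, while leaving them unset corresponds to the instantaneous data of GSSAS.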
In the experiment, two clean voices were used as the sound source signals sj(t). Specifically, a male voice was used as the first sound source signal and a female voice as the second sound source signal. As the impulse response hji(t), an actual measurement value in an experimental laboratory was employed. The experimental laboratory was 4.0 m wide, 7.0 m long, and 3.0 m high, and the reverberation time was about 0.2 s.
$\mathrm{SNR}\,[\mathrm{dB}] = 10\log_{10}\left[(1/T)\sum_{t=1}^{T} |y(t)|^2/|\hat{n}(t)|^2\right]$,
$\hat{n} = y - \hat{s}$ (31)
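Taking the printed form of Equation (31) literally, as a mean of per-sample power ratios, the SNR could be computed as in the sketch below. The function name is an assumption, and `s_hat` is assumed here to be the target component contained in the separated signal, since the surviving text does not define it.

```python
import math


def snr_db(y, s_hat):
    """SNR in dB per Equation (31): mean over t of |y(t)|^2 / |n_hat(t)|^2."""
    ratios = []
    for yt, st in zip(y, s_hat):
        noise = yt - st                      # n_hat = y - s_hat
        ratios.append(abs(yt) ** 2 / abs(noise) ** 2)
    return 10.0 * math.log10(sum(ratios) / len(ratios))


# The separated signal has twice the amplitude of its residual noise.
value = snr_db([2.0, -2.0, 2.0], [1.0, -1.0, 1.0])
```

Each sample contributes a power ratio of 4, so `value` is 10 log10(4), about 6.02 dB. A sample with zero residual noise would cause a division by zero, which a practical implementation would have to guard against.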
The separation results were further evaluated based on an average correlation coefficient CC calculated in the time-frequency domain according to Equation (32). It means that the lower the average correlation coefficient CC, the more accurately the sound source is separated.
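$\mathrm{CC}\,[\mathrm{dB}] = 10\log_{10}\left[(1/F)\sum_{f=1}^{F} CC_\omega(2\pi f)\right]$,
$CC_\omega(\omega) \equiv \left|\sum_{t=1}^{T} y_1^*(\omega,t)\,y_2(\omega,t)\right| / (Y_1(\omega)\,Y_2(\omega))$,
$Y_1(\omega) \equiv \left(\sum_{t=1}^{T} |y_1(\omega,t)|^2\right)^{1/2}$,
$Y_2(\omega) \equiv \left(\sum_{t=1}^{T} |y_2(\omega,t)|^2\right)^{1/2}$ (32)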
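Equation (32) can be sketched as follows for two separated signals given as time-frequency arrays. The function name and the `y[f][t]` list layout (frequency bin first, then frame) are assumptions for illustration.

```python
import math


def cc_db(y1, y2):
    """Average correlation coefficient in dB; y1[f][t] and y2[f][t] are complex spectra."""
    per_bin = []
    for row1, row2 in zip(y1, y2):
        cross = abs(sum(p.conjugate() * q for p, q in zip(row1, row2)))
        norm1 = math.sqrt(sum(abs(p) ** 2 for p in row1))
        norm2 = math.sqrt(sum(abs(q) ** 2 for q in row2))
        per_bin.append(cross / (norm1 * norm2))   # CC at one frequency bin
    return 10.0 * math.log10(sum(per_bin) / len(per_bin))


# Worst case for separation: the second output is a scaled copy of the first.
same = [[1 + 1j, 2 - 1j], [0.5j, 1 + 0j]]
scaled = [[2 * v for v in row] for row in same]
cc_same = cc_db(same, scaled)
```

Fully correlated outputs give a coefficient of 1 in every bin, i.e., 0 dB, which is the upper bound. Better separation yields increasingly negative values, and exactly uncorrelated outputs would make the logarithm undefined.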
Using Julius as the speech recognition engine (see A. Lee et al., Proc. 7th European Conf. on Speech Comm. and Tech., Vol. 3, pp. 1691-1694, 2001), the correct rate of isolated word recognition for 216 ATR phonemically balanced words was evaluated as the correct rate of automatic speech recognition (ASR). The words were trained according to a clean model without reverberation and noise. For the speech features, a total of 48-dimensional mel-frequency log spectral features of a 24-dimensional mel-frequency log power value and a corresponding 24-dimensional linear regression coefficient were used. The direct sound components D of the transfer function matrix were created based on waveforms at the beginning of the impulse response.
FIG. 6(a) illustrates the SNR of the sound source signals separated by each technique.
Note that, in addition to BSS, the OCRA method is also applicable to position estimation by MUSIC or the like, reverberation suppression, noise suppression, estimation of the number of sound sources, etc. Further, in addition to the speech signal, the target signal can be a waveform signal such as an electroencephalographic signal or a communication signal. Further, in addition to the robot R, the signal processing device 10 can be installed in a vehicle (four-wheel vehicle), or any other machine or device in an environment in which plural sound sources exist. In addition, the number of microphones Mi can be arbitrarily changed.
Number | Date | Country | Kind |
---|---|---|---|
2008-182616 | Jul 2008 | JP | national |
Number | Name | Date | Kind |
---|---|---|---|
6377213 | Odachi et al. | Apr 2002 | B1 |
6642888 | Kishigami et al. | Nov 2003 | B2 |
7031719 | Miyano et al. | Apr 2006 | B2 |
7076433 | Ito et al. | Jul 2006 | B2 |
7379020 | Tsuchihashi et al. | May 2008 | B2 |
8050141 | Carroll et al. | Nov 2011 | B1 |
8380150 | Aoyama | Feb 2013 | B2 |
20040202243 | Lin et al. | Oct 2004 | A1 |
20050013369 | Lee | Jan 2005 | A1 |
20050101264 | Farlow et al. | May 2005 | A1 |
20060136402 | Lee | Jun 2006 | A1 |
20060208947 | Tsuchihashi et al. | Sep 2006 | A1 |
20080005048 | Weng | Jan 2008 | A1 |
Number | Date | Country |
---|---|---|
2003-318792 | Nov 2003 | JP |
Entry |
---|
Adaptive Signal Processing with Array Antenna, Published by: Kagaku Gijutsu Shuppan Inc., Date of Publication: Nov. 25, 1998, English abstract included. |
Blind Source Separation Based on a Fast-Convergence Algorithm Combining ICA and Beamforming, Hiroshi Saruwatari et al., IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, No. 2, Mar. 2006, (English text). |
High performance blind source separation using an adaptive step-size parameter method, Hirofumi Nakajima et al., IEICE Technical Report EA2007-30 (Jun. 2007), (English abstract). |
Speech enhancement for non-stationary noise environments, Israel Cohen et al., Signal Processing 81 (2001) 2403-2418, (English text). |
Number | Date | Country | |
---|---|---|---|
20090063605 A1 | Mar 2009 | US |
Number | Date | Country | |
---|---|---|---|
60968444 | Aug 2007 | US |