The present invention relates to an apparatus, a method and a program for estimation of positional information of each of a plurality of signal sources such as sound sources and radio wave sources on the basis of observation with a plurality of sensors of signals which are radiated from the signal sources and then mixed together, or more specifically, for estimation of information containing at least one of parameters which indicate the position to be used in the detection of arrival directions of signals and in the separation of signals into each signal source as well as in the recovery of signals.
The use of the independent component analysis (hereafter abbreviated as ICA) has been proposed to estimate arrival directions of source signals and to separate source signals from oncoming signals observed by a plurality of sensors when signals from a plurality of signal sources are mixed in the space before they reach the sensors. Mixing in the space results in a convolutive mixture in which certain signals are mixed with plural time delays inasmuch as arrival delay and attenuation factors from the signal sources to the sensors have different values for direct waves and for a plurality of reflected waves caused by propagation obstacles. ICA process which directly determines separation filters in the time domain is very slow in convergence to a final solution, and hence an approach which applies the ICA process to individual frequency in the frequency domain is more realistic.
A conventional approach which uses the ICA process in the frequency domain to estimate the direction of a signal source as positional information will be briefly described with reference to
The estimation of the arrival direction of a signal is made frequently in the frequency domain. At this end, the observed signal xj(t) is subject to a short-time Fourier transform to obtain a time series signal Xj(ω, m) in the frequency domain where ω represents an angular frequency (ω=2πf where f represents a frequency) and m is a number representing time. Assuming that the source signal si(t) (i=1, . . . , I) is similarly transformed into time series signal Si(ω, m) in the frequency domain, the observed signal Xj(ω, m) can be expressed as
where Aji(ω) represents a frequency response from the signal source of the signal si to the sensor 1j. This can be expressed in terms of vectors and a matrix as follows:
X(ω, m)=A(ω)S(ω, m) (1)
where
X(ω, m)=[X1(ω, m), . . . , Xj(ω, m)]T (2)
S(ω, m)=[S1(ω, m), . . . , S1(ω, m)]T (3)
are vector representations of observed signals by J sensors and I source signals. A(ω) is a J×I matrix having a frequency response Aji(ω) as elements, and is referred to as a mixture matrix since it represents the frequency response of a signal mixture system. Denotation [a]T represents the transposition of a vector or a matrix a.
In
Aji(ω)=exp(jωc−1dj cos θi) (4)
Denoting the arrival direction vector which has the direction θ as follows:
a(ω, θ)=[exp(jωc−1d1 cos θ), exp(jωc−1d2 cos θ), . . . , exp(jωc−1dj cos θ)], the observed signal can be expressed by an approximation
A method of estimating the direction of a signal source using the independent component analysis is described in S. Kurita, H. Saruwatari, S. Kajita, K. Takeda and F. Itakura, “Evaluation of blind signal separation method using directivity pattern under reverberant conditions”, in Proc. ICASS2000, 2000, pp. 3140–3143 (referred to as literature 1), for example. This method will be briefly described below.
The observed signal X(ω, m) is equal to A(ω)S(ω, m), indicating a mixture of the source signal S(ω, m), and thus is not mutually independent. When the independent component analysis is applied to X(ω, m),
Y(ω, m)=W(ω)X(ω, m) (5)
there are obtained mutually independent separated signals
Y(ω, m)=[Y1(ω, m), . . . , Y1(ω, m)]T (6)
W(ω) is an I×J matrix having elements Wij(ω), and is referred to as a separation matrix. For example, when I=J=2, the independent component analysis seeks for the separation matrix W(ω) which satisfies
so that Y1(ω, m) and Y2(ω, m) are independent from each other. When the source signals S1(ω, m), . . . , S1(ω, m) are independent from each other, it follows that the separated signals Y1(ω, m), . . . , Y1(ω, m) should correspond to some one of the source signals. However, it should be noted that the independent component analysis is based only on the independence of signals, and accordingly there remains arbitrariness as regards the sequence and the magnitude of the separated signals. In other words, if rows of the separation matrix W(ω) are interchanged or if rows of W(ω) are multiplied by a constant, they still remain to be solutions of the independent component analysis. As will be described later, the arbitrariness of the sequence leads to a permutation problem and the arbitrariness of the magnitude leads to a scaling problem.
Considering an i-th row of the separation matrix W(ω), which is wi(ω)=[Wij(ω), . . . , WiJ(ω)], it is seen that wi(ω) creates the separated signal Yi(ω, m). Accordingly, it follows that wi(ω) designates one of the source signals S1(ω, m), . . . , S1(ω, m) by emphasis while suppressing others. By analyzing the directivity pattern formed by wi(ω), an analysis can be made to see what is the direction in which the oncoming signal is extracted and what is the direction in which the oncoming signal is suppressed. Thus, this analysis can be relied upon to estimate the arrival direction of the source signal si(t). When this process is repeated for every wi(ω), i=1, . . . , I, the arrival direction Θ=[θ1(ω), . . . , θ1(ω)]T of the source signal which is extracted by each of wi(ω) in the separation matrix W(ω) can be estimated.
The directivity pattern defined by wi(ω) can be expressed as Bi(ω, θ)=wi(ω)a(ω, θ) using an arrival direction vector a(ω, θ). Bi(ω, θ) is considered as a frequency response from a source signal located in the direction θ to the separated signal Yi(ω, θ). The gain |Bi(ω, θ)| of directivity patterns obtained by the independent component analysis at 3156 Hz is shown in
MUSIC (Multiple Signal Classification) method (see S. Unnikrishna Pillai, “Array Signal Processing”, Springer-Verlag, 1989, ISBN 0-387-96951-9, ISBN 3-540-96951-9, for example) is known as a method of estimating the directions of a plurality of signal sources using a plurality of sensors and transforming observed signals from the sensors in the frequency domain. With this method, the directions of signal sources up to (J−1) which is one less than the number J of the sensors can be estimated. By contrast, according to the method which incorporates the independent component analysis (such method is simply referred to as ICA method), two sensors can accommodate for a mixture of two signals, and thus this method is superior to MUSIC method in this respect. However, with this ICA method, accommodation for a mixture of three or more signals involves difficulties which will be described later. In addition, the determination of a minimum gain of the directivity pattern requires a high computational cost calculations.
The application of the ICA method for a mixture of three signals using three sensors will be described. In this instance, the ICA can take place in the similar manner as before with a 3×3 separation matrix, but the analysis of the gain of a directivity pattern involves difficulties. The gain |Bi(ω, θ)| of the directivity pattern at the frequency of 2734 Hz subsequent to the ICA process is shown in
Blind Signal Separation
The prior art of blind signal separation utilizing the ICA will now be described. The blind signal separation represents a technology which estimates a source signal or signals from observed mixed signals. In the description to follow, an example will be dealt with in which a mixed signal comprising a mixture of I source signals is observed with J sensors.
Denoting a source signal generated by a signal source i by si(t) (i=1, . . . , I; t represents time) and a mixed signal observed by a sensor j by xj(t) (j=1, . . . , J), the mixed signal xj(t) can be expressed as follows:
where aji represents an impulse response from the signal source i to the sensor j, and * a convolution operator. The purpose of the blind signal separation is to determine a filter wkj which is required for the separation and the separated signal yk(t)(k=1, . . . , I) using only the observed signal xj(t) according to the following equation:
A convolutive mixture in the time domain can be transformed into a plurality of instantaneous mixtures in the frequency domain. Specifically, above equations (7) and (8) are represented by the equations (1) and (5), respectively, where W(ω) is a separation matrix which is calculated using ICA so that Yk(ω, m) and Yk′(ω, m) are mutually independent, and is a solution of the ICA.
What matters during the blind signal separation in the frequency domain relates to a permutation problem and a scaling problem.
As mentioned previously, an interchange of rows in the separation matrix W(ω) also results in a solution of the independent component analysis. Thus, assuming that W(ω) is a solution of the ICA at a certain angular frequency ω, and denoting an arbitrary diagonal matrix by D(ω) and denoting an arbitrary permutation matrix (a multiplication of this matrix from the left of an arbitrary matrix results in a matrix which is obtained by the permutation of the rows of the arbitrary matrix) by P(ω), then P(ω)D(ω)W(ω) is also a solution of the ICA. This is because the ICA performs a separation of source signal based only on the statistical independence between the source signals. The freedom of a solution which is given by D(ω) is called a scaling ambiguity, and the freedom of a solution which is given by P(ω) is called a permutation ambiguity.
Accordingly, in order to perform an appropriate blind signal separation, a solution W(ω) must be identified which is appropriate to serve as a separation matrix from among the solutions of the ICA for all values of ω. Generally, the identification of the appropriate solution W(ω) is made by multiplying a solution of the ICA which is arbitrarily obtained by an appropriate D(ω) or P(ω), with the result being chosen as an appropriate solution W(ω). Determining D(ω) in an appropriate manner for all values of ω is called a scaling problem, and determining P(ω) in an appropriate manner is called a permutation problem. Permutation refers to a bijection function Z:{1, 2, . . . , I}→{1, 2, . . . , I} from {1, 2, . . . , I} to {1, 2, . . . , I}, and has a one-to-one correspondence to a permutation matrix.
The scaling freedom is equivalent to the freedom of a filter which changes a frequency response in the time domain. Accordingly, in order to produce distortion-free separated signals in the time domain, it is necessary to determine D(ω) in an appropriate manner for all values of ω. This scaling problem can be readily solved by choosing D(ω)=diag(W−1(ω)), for example. diag(α) represents a diagonalization of a matrix α (which is to make all elements other than diagonal elements to be 0). Thus, for a solution of the ICA which is arbitrarily obtained, W0(ω), an inverse matrix is obtained, and it is diagonalized to provide a matrix D(ω), and D(ω) W0(ω) is identified as a appropriate separation matrix W(ω). This is already known in the art. For example, it is described in a reference literature: K. Matsuoka and S. Nakashima, “Minimal Distortion Principle of Blind Source Separation”, Proc. ICA 2001, pp. 722–727.
On the other hand, because of the permutation ambiguity, it is possible that as a result of calculation according to the equation (5), a separated signal Y1(ω, m) is delivered as an estimate for a source signal S1(ω, m) at a certain angular frequency ω1 while the separated signal Y1(ω2, m) may be delivered as an estimate of a source signal S2 (ω2, m) at another angular frequency ω2. In such instance, a component of a source signal s1(t) and a component of a source signal s2(t) in the time domain may be present in admixture in an output y1(t) in the time domain, preventing separated signals from being properly produced. Accordingly, in order for the output signal y1(t) in the time domain to properly be an estimate for the source signal s1(t), it is necessary that P(ω) be properly determined so that Y1(ω, m) be an estimate of S1(ω, m) for all values of ω.
A method of estimating the arrival direction of a signal as disclosed in the cited literature 1 is known as a typical solution of the prior art for the permutation problem. Specifically, a directivity pattern corresponding to each row of the separation matrix W(ω) is determined at each frequency in a manner as described above with reference to
This method of solving the permutation problem requires a high computational cost in determining minimum gains of directivity patterns as mentioned previously, and in addition, where the number I of signal sources is equal to or greater than 3, a trial-and-error of appropriately rearranging W(ω)'s for all frequencies is necessary. In addition, as described above with reference to
In addition, the accuracy of estimating the arrival direction of a signal by searching for a low gain of the directivity pattern depends on the position of the signal source. In particular, when the arrival direction of a signal is close to a straight line (hereafter referred to as a sensor pair axis) which joins a pair of sensors 1j and 1j′, the magnitude of the error increases. This has been experimentally demonstrated. As shown in
A blind signal separation which has been conducted while moving the sound sources 111 and 112 in this manner is illustrated in
It is seen from
The process of using the independent component analysis to determine a separation matrix, obtaining directivity characteristic pattern from each row of the separation matrix, searching for a direction of a low gain to determine the direction of a signal source (arrival signal direction) and utilizing it to effect the blind signal separation requires a large volume of calculation time in determining the directivity characteristic pattern and searching for a direction of a low gain.
It is an object of the present invention to provide an apparatus, a method and a program for estimation of positional information which allows a calculation time required in estimating positional information of signal sources to be reduced.
According to the present invention, inverse matrices (or pseudo-inverse matrices for I<J) for separation matrices W(ω1), . . . , W(ωN) in the frequency domain are calculated to produce estimates H(ω1), . . . , H(ωN) of mixed matrices A(ω1), . . . , A(ωN) up to the scaling and the permutation ambiguity. On the basis of the ratio between two elements Hji(ωn) and Hj′i(ωn) for each column of H(ωn)(n=1, . . . , N) for each frequency (where j and j′ are parameters representing sensors and i is a parameter representing a signal source), one of parameters of positional information of a signal source i such as a conical surface or a curved surface on which the signal source exists, for example, is calculated.
What is required for the calculation is the calculation according to formulae which are expressed in terms of the ratio of elements in the matrix, and the amount of calculation can be reduced than when determining the directivity pattern of a separated signal and searching for a minimum location angle thereof. Using the ratio of elements avoids the influence of the scaling ambiguity.
An embodiment in which the present invention is applied in the estimation of direction information which represents positional information of a signal source will be described first. In the description to follow, identical or corresponding parts are designated by like reference numerals throughout the drawings in order to avoid a duplicate description.
In this first embodiment, the direction of a signal source, or the arrival direction of a source signal which is radiated from the signal source is determined.
J sensors 11, 12, . . . , 1J, which are equal to or greater in number than the number I of signal sources, are disposed in an array as shown in
W(ω)=(W(ω1), W(ω2), . . . , W(ωN))
An inverse matrix of the separation matrix W(ωn) for each frequency is calculated in an inverse matrix calculator 13, thus determining an inverse matrix H(ωn) (step S3, FIG. 6):
H(ω)=(H(ω1), H(ω2), . . . , H(ωN))
It is to be noted that the calculation of the inverse matrix is changed into the calculation of the pseudo-inverse matrix for J>I. The pseudo-inverse matrix may be the Moore-Penrose generalized inverse, for example.
In this embodiment, direction information, or specifically, the arrival direction of a source signal is calculated in an angle calculator 14 from the argument of the ratio of two elements Hji(ωn) and Hj′i(ωn) of each column i of an inverse matrix H(ωn) for at least one frequency (step S4,
An argument calculator 14b calculates an argument of the ratio between the selected elements Gi(ωn)=arg [Hji(ωn)/Hj′i(ωn)], and a spacing calculator 14c derives position information dj and dj′ of sensors 1j and 1j′ from a sensor information storage 15 to determine a spacing dj−dj′ between the sensors 1j and 1j′ (step S4c,
A decision unit 14f determines whether or not a result of division Gi(ωn) by the divider 14e has an absolute magnitude equal to or less than 1 (step S4e,
G1(ωn)=arg [Hji(ωn)/Hj′i(ωn)]/(ω(dj−dj′)/c) θi(ωn)=cos−1Gi(ω): for |Gi(ωn)|≦1 (9)
If |Gi(ωn)| is not equal to or less than 1 at step S4e, the angle θi(ωn) would be an imaginary number, resulting in the selection of another combination. For this reason, a decision is rendered by the decision unit 14f to see if every combination of elements in the i-th column has been selected (step S4g,
A result of calculation of the equation (9), namely, the directions of I signal sources (signal arrival directions) Θ(ωn)=(θ1(ωn), θ2(ωn), . . . , θ1(ωn)) are delivered from the angle calculator 14 in a manner corresponding to each column of the inverse matrix H(ωn) for the selected angular frequency ωn in H(ω) in the sequence selected. Specifically, if the selection is sequentially made beginning with a first column, in that sequence. It will be seen that θ1(ωn), θ2(ωn), . . . , θ1(ωn) should correspond to either one of arrival directions of the source signals s1(t), s2(t), . . . , s1(t) (the directions of the signal sources 1, 2, . . . , I).
A mechanism which is used in this embodiment to allow the estimation of signal arrival directions will now be described. If the separation were accomplished by the independent component analysis (ICA) process, it follows that the separation matrix W(ω) which is calculated by the ICA process and a true mixing matrix A(ω) is related such that P(ω)D(ω)W(ω)A(ω)=I where D(ω) represents a diagonal matrix indicating the scaling ambiguity, P(ω) a permutation matrix indicating the permutation ambiguity and I a unit matrix. If the ICA process is used, the mixing matrix A(ω) itself cannot be generally calculated. However, when an inverse matrix of W(ω) or H(ω)=W−1(ω)=A(ω)P(ω)D(ω) is calculated, an estimate of the mixed matrix containing the scaling ambiguity and the permutation ambiguity is obtained. Thus, the inverse matrix H(ω) comprises the mixed matrix A(ω), the columns of which are permutated according to P(ω) and are multiplied by the diagonal elements of D(ω).
In this embodiment, two elements Hji(ω) and Hj′i(ω) are taken from the same column i of the inverse matrix H(ω), and the ratio Hji(ω)/Hj′i(ω) is determined to eliminate the scaling ambiguity which is caused by D(ω) which cannot be calculated. Thus
where Z represents a permutation which corresponds to the multiplication of the permutation matrix P(ω) from the right. When the calculation according to the equation (21) is made for every column i of the inverse matrix H(ω), the arrival directions of all signals can be estimated irrespective of the permutation Z by P(ω).
In the description of the background technique, an element of the mixed matrix A(ω) has been modeled as Aji(ω)=exp(jωc−1dj cos θi)). However, such a simple model is insufficient for the purpose of the present embodiment because an estimate of the mixed matrix A(ω) up to the scaling ambiguity and the permutation ambiguity is calculated using the inverse matrix H(ω) of the separation matrix W(ω). Accordingly, using an amplitude attenuation factor αji (a real number) and a phase difference exp(jφi) at the origin, a different model Aji(ω)=αji exp(jφi)exp(jωc−1dj cos is used. When AjZ(i)(ω)/Aj′z(ω) is calculated using this model, it follows from the equation (21):
As a consequence, we have
Gi(ω)=arg[Hji(ω)/Hj′i(ω)]/(ωc−1(dj−dj))=cos θZ(i).
If |Gi(ω)|≦1, θZ(i)=cos−1Gi(ω) represents a real number, allowing an arrival direction to be estimated. All of I directions Θ(ω)=[θZ(i)(ω), . . . , θZ(1)(ω)] which are properly rearranged or permuted according to the permutation Z correspond to the directions of signals s1, . . . , s1.
Alternatively, the angle θi(ωn) may be determined for each column and for each inverse matrix H(ωn) (n=, . . . , N) for a plurality of frequencies or every frequency in H(ω) in the manner mentioned above, and individual arrival directions may be determined on the basis of the whole assembly of these angles. Specifically, the arrival directions of individual frequencies which are estimated by the angle calculator 14 are sorted into different angles in a sorter 32 shown in
Accordingly, the sorted angles θi(ω1), θi(ω2), . . . , θi (ωN) are unified into a single angle θi in a unification unit 33 and this angle θi is deemed as an arrival direction (step S6,
In the calculation and the estimation of the angle shown in
An experimental example of the first embodiment will now be described. Three microphones are arrayed in one row at a spacing of 56.6 mm in a room having a reverberation time of 190 ms, and three sound sources are disposed at angles of 48°, 73° and 119° as referenced to the direction of the array. Acoustical signals from the sound sources are mixed together for six seconds, and observed signals are sampled with a sampling frequency of 8 kHz and with a maximum frequency of 3 kHz below which a spatial aliasing is prevented together with a short-time Fourier transform frame of 1024 samples. Angles calculated for each frequency are shown in
To compare the method of this embodiment against MUSIC method, a similar experiment has been conducted by placing two sound sources which have directions of sound sources at 48° and 119°, thus a relatively large angular offset therebetween. A result obtained with the method of the present embodiment is shown in
As discussed above, according to the present embodiment, a substitution of values into the equation (9) allows estimated directions to be determined, and accordingly, a computational time according to the embodiment are much less than a conventional method which searches for a direction where the gain of a directivity pattern is low. The separation matrix W(ω) which is obtained by the ICA process contains the scaling freedom and the permutation freedom as mentioned previously, and accordingly, it may be contemplated to calculate an inverse matrix of a separation matrix W′(ω) in which the problem of the freedom is solved, or to calculate a true mixed matrix A(ω), and to estimate an arrival direction on the basis of a ratio of two elements for each column of the mixed matrix A(ω). However, the true mixed matrix A(ω) itself cannot be determined without the use of a restriction that the average power of a signal si(t) of a signal source is chosen to be 1, for example. The use of such a restriction upon a signal source may be possible in the radio communication field, but where the signal si(t) of a signal source is a voice signal which is directly uttered by a man, a restriction of the kind described cannot be used. On the other hand, according to the first embodiment, the problem of the scaling freedom is solved by forming the ratio of two elements for each column of the inverse matrix H(ω) of the separation matrix W(ω) which contains the scaling and the permutation ambiguity, and such approach is applicable to any signal source. In addition, this avoids the need of calculating a separation matrix in which the both problems are solved, thus reducing the calculation time. In addition, once estimated directions which are obtained for individual frequencies are sorted in a predetermined sequence, the permutation problem can also be solved in a simple manner. If the number of signal sources is equal to the number of sensors J, the directions of each signal source can be estimated. If the directions of certain sound sources are located relatively close to each other, an estimation can be made with a fairly good accuracy.
A second embodiment is directed to obtaining direction information which is one item of positional information of signal sources. According to the second embodiment, at least three sensors which are disposed in at least two dimensions are used to allow a direction of a signal source to be estimated wherever the signal source is directed, thus allowing the permutation problem involved with the blind signal separation to be solved in a relatively simple manner. Specifically, a conical surface which is based on direction information is estimated, and a straight line which is in common to a plurality of conical surfaces is estimated to determine direction information.
A functional arrangement of the second embodiment as applied to the blind signal separation system is shown in
A separation matrix calculator 12 calculates by the independent component analysis a separation matrix for each frequency:
from the signal Xj(ωn) in the frequency domain (step S12).
An inverse matrix calculator 13 calculates an inverse matrix H(ω) for each separation matrix W(ω) for each frequency (step S13):
In the second embodiment, a conical surface estimator 14 estimates conical surfaces on which some signal source would be present on the basis of ratios of elements for a plurality of element pairs, for each column of an inverse matrix H(ω) for each frequency. Each conical surface has a center axis defined by a sensor pair axis joining two sensors which correspond to the elements. In this manner, a plurality of conical surfaces, each corresponding to each column of one mixing matrix H(ω), are estimated (step S14).
The functional arrangement of the conical surface estimator 14 is substantially similar to the angle calculator 14 shown in
Initially, control parameters i which is stored in a register within the conical surface estimator 14 is initialized (step S20) where i corresponds to the number of each signal source.
i is incremented by one and control parameter p is stored in a register within the number of conical surface estimator 14 is intialized where p represents the number of conical surfaces which have been estimated for each value of I (step S21), P is incremented by one (step S22), and control parameters j, j′ which are mutually different natural numbers (j≠j′) equal to or less than J are selected at random, for example (step S23). The pair of control parameters j, j′ which is once selected is not selected again for the same value of i. For example, if (j, j′)=(1, 2) is selected once for i=1, (j, j′)=(1, 2) is not selected again at step S23 until the end of the processing operation for i=1. (It is desirable that this selection be made such that a sensor pair axis, namely a straight line passing through sensors j and j′ which are specified by the selected j and j′ is not aligned with a sensor pair axis specified by j, j′ which were previously selected during this routine. It then follows that the conical surface estimator 14 will estimate a plurality of conical surfaces having center axes which do not overlap within a given range of errors. Whether or not several sensor pair axes are aligned can be determined, for example, by storing vectors which indicate the positions of sensors in a sensor information storage 15, and by retrieving information representing the vectors which indicate the positions of the sensors.)
A vector dj indicating the position of a j-th sensor j which corresponds to the parameter j selected at S23 and a vector dj′ indicating the position of a j′-th sensor j′ corresponding to the parameter j′ are retrieved from the sensor information storage 15 (step S24). A j-row i-column element Hji(ω) and a j′-row and an i-column element Hj′i(ω) are designated to be retrieved (step S25). These operations are performed by the selector 14a shown in
Using retrieved information, a calculation is made according to the following equation: (step S26)
where ∥dj−dj′∥ represents a spacing or distance between sensors 1j and 1j′. In the second embodiment, a plurality of sensors are disposed in two or three dimensions. Accordingly, positional information of individual sensors are represented by two or three element coordinate vector having an origin at the center of a circle on which sensors 11 to 14 are disposed. It will be recalled that the equation (9) is developed for a two-dimensional angle indicating an arrival direction of a signal when sensors are disposed as a linear array, but the equation (9′) is an extension of the equation (9) in that the sensors may be disposed in two or three dimensions and angles indicating arrival directions of signals may be in a three-dimensional space. Accordingly, it is to be understood that the equation (9′) comprehends the equation (9). The angle {circumflex over (θ)}i,jj′ which is estimated according to the equation (9′) and associated parameters i, j and j′ are temporarily stored in registers (memory) within the conical surface estimator 14 as a conical surface information (step S27). As indicated in broken lines in
A decision is made to see if p=P (step S28). P is the number of conical surfaces which are to be estimated for each value of i, and this step decides whether or not P conical surfaces have been estimated for the particular i. If p is not equal to P, the operation returns to step S22, and if p=P, a decision is made to see if i=I (step S29). Thus, a decision is made to see if the estimation of conical surfaces have been completed for all values of i. If i is not equal to I, the operation returns to step S21, but if i=I, the processing operation is completed (this completes the description of a specific example of step S14). In the first embodiment, when one value of the angle θi(ω) is estimated for a particular value of i, the operation transfers to the estimation of the angle for next value of i. However, in the second embodiment, the plurality of angles (conical surfaces) {circumflex over (θ)}i,jj′(ω) are estimated for each value of i. The operations of steps S28 and S29 are performed by the decision unit 14f.
An arrival direction decision unit 16 shown in
A method of estimating an arrival direction ui(ω) of a signal will now be described with reference to
A specific example of the method of determining the direction θi(ω) of the common straight line 5i which takes place in the arrival direction decision unit 16 shown in
A normalized axis vector calculator 16a normalizes an axis vector (dj(p)−dj′(p)) which joins the pair of sensor positions to a length of 1. In other words
vp=(dj(p)−dj′(p))/∥dj(p)−dj′(p)∥
is calculated. An inner product of vp and a conical surface vector is supposed to be a cosine of an angle formed between these vectors. Thus,
vpT·u/∥u∥=cos {circumflex over (θ)}jj′(ω, p)
applies. Since what we want to know is only the direction of the common straight line 5i, the conical surface vector u is represented by a unit vector, or ∥u∥=1. To determine the direction of a straight line which is in common with all conical surfaces, denoting
V=(v1. . . vP)T, ĉ(ω)=(cos {circumflex over (θ)}jj′(ω, 1) . . . cos {circumflex over (θ)}jj′(ω, P))T
the following simultaneous equations may be solved for u.
Vu=ĉ(ω) (∥u∥=1)
A solution for the simultaneous equations generally does not exist or cannot be determined uniquely. Accordingly, u which minimizes ∥Vu−ĉ(ω)∥ is then determined and is made to be a solution of the simultaneous equations or the direction ui(ω) of the common straight line 5i. A calculation for determining u which minimizes this error is performed in a calculator 16b. Since the direction ui(ω) represents a direction in three dimensions, it follows that the direction is given in polar coordinates in terms of the azimuth θi(ω) and the angle of elevation φi(ω).
The following approach may be used to simplify the calculation. As shown in
ui(ω)=V+ĉ(ω)/∥V+ĉ(ω)∥
is calculated in a calculator 16d.
In this manner, the direction of a straight line which is regarded as being common to a plurality of estimated conical surfaces is determined for each frequency and for each signal source.
A permutation solver 17 shown in
To give a specific example of what is performed by the permutation solver 17, a permutation is performed in accordance with an arrival azimuth angle θi in a manner to be described below, and for a column or columns for which the problem could not have been solved, a permutation is performed in accordance with an arrival angle of elevation φi in a similar manner. Specifically, a permutated matrix in which columns in the inverse matrix H(ω) are permuted so that arrival azimuth angles (θi, ω) which have been calculated and determined assume a given sequence, for example, an ascending order of θ1, θ2, . . . , 741 for any frequency and so that angles of elevation φi are similarly in an ascending order for those columns for which the permutation failed is produced in a permuted matrix generator 17a. An inverse matrix P(ω) of the permuted matrix is produced in an inverse matrix generator 17b. In a permutater 17c, the inverse matrix P(ω) is multiplied to the separation matrix W(ω) from the left. The permuted matrix generator and the inverse matrix generator 17b form together a permutation matrix generator.
A processing operation which takes place in the permuted matrix generator 17a will be specifically described. In this example, it is assumed that (θ1(ω), φi(ω)) is calculated for a first column of the inverse matrix H(ω), (θ2(ω), φ2(ω)) is calculated for a second column . . . and (θ1(ω), φ1(ω)) is calculated for the I-th column in accordance with the equation (9′). Arrival directions which are input from the arrival direction decision unit 61 are denoted by θ and φ having suffices with primes “′” or as (θ1′(ω), φ1′(ω)), (θ2′(ω), φ2′(ω)), . . . , (θ1′(ω), φ1′(ω)) in order to discriminate them angles which are permuted in the ascending order and they are arranged in an ascending order of θi′(ω), for example. If the result is such that (θ3′(ω), φ3′(ω))>(θ1′(ω), φ1′(ω))>(θ2′(ω), φ2′(ω))>. . . , a movement takes place so that the third column of the inverse matrix H(ω) assumes the first column, the first column assumes the second column and the second column assumes the third column together with a similar movement of remaining columns. For columns which assumes an identical value of θi′(ω), columns are moved so that φi′(ω) assume an ascending order. A permuted matrix which moves or permutates columns in this manner is produced. Producing a matrix which performs such a permutation is known in the art. A permuted matrix is produced using arrival directions (θ1(ω), φ1(ω)) . . . (θ1(ω), φ2(ω)) which are obtained for every frequency, and an inverse matrix thereof or a permutation matrix P(ω) is calculated (step S16,
The permutation matrix P(ω) which is calculated in this manner is multiplied to the separation matrix W(ω) from the left in a permuter unit (17c), and a resulting matrix W′(ω)=P(ω)W(ω) is delivered as a separation matrix for which the permutation problem has been solved (step S17). Thus, for any frequency, in the separation matrix W′(ω), the first row contains elements which separate a signal from signal source 21, the second row contains elements which separate a signal from signal source 22, and similarly elements in a common row are elements which separate signals from a same signal source.
The separation matrix W′(ω) is transformed in a time domain transformer 18, by an inverse Fourier transform, for example, into a time domain separation coefficient bank
which is set up in a signal separator 19.
The signal separator 19 performs the calculation according to the equation (8) using observed signals x1(t), . . . , xJ(t) from the sensors and the separation filter coefficient bank to deliver separated signals y1(t), . . . , yJ(t).
As indicated in broken lines in
As described above, in this embodiment again, the calculation according to the equation (9′) estimates conical surface information ({circumflex over (θ)}jj′(ω), {circumflex over (φ)}jj′(ω)) without searching for a direction of a low gain in the directivity pattern, and accordingly, the amount of calculation is reduced. In addition, because a plurality of a conical surfaces are estimated for a single signal source, and the arrival direction of the signal is determined on the basis of a common straight line therebetween, it is possible to estimate the direction of a signal source 3; uniquely irrespective of wherever the signal source is located in a range from 0° to 360°. The estimated direction is utilized in the determination of the permutation matrix P(ω), and accordingly, the permutation problem can be properly solved irrespective of locations of signal sources.
In a third embodiment, a curved surface on which a signal source exists is used as positional information which is based on the ratio of distances between a pair of sensors and a single signal source. In the first and the second embodiment, an assumption is made that signal sources are located remotely from sensors, and accordingly, signals from the signal sources are oncoming as plane waves to the sensors. However, when distances between signal sources and sensors are relatively short, a signal is oncoming as a spherical wave to a sensor. In view of this, when a ratio Aji(ω)/Aj′i(ω) of elements of a mixing matrix A(ω) is interpreted according to a spherical wave (close distance field) model, information other than the direction of a signal source i can be estimated.
Specifically, using a close distance field model, a frequency response Aji(ω) can be expressed as follows:
Aji(ω)=(1/∥qi−dj∥)exp(jωc−1(∥qi−dj∥))
where qi is a vector indicating the position of a signal source i.
A ratio of two elements in a common column of a mixing matrix, Aji(ω)/Aj′i(ω), is formed using the frequency response expressed in the manner mentioned above, and the absolute magnitude of the ratio is calculated as follows:
∥qi−dj′∥/∥qi−dj∥=|Aji(ω)/Aj′i(ω)| (10)
where |β| represents the absolute magnitude of β.
A set of innumerable number of points qi which satisfy the equation (10) defines a curved surface on which a signal source i exists, and allows a distance from a sensor to the signal source i to be estimated when used in combination with a direction (or a conical surface) which is estimated using a far distance field (plane wave model). Accordingly, if two or more signal sources are located in a same direction or mutually adjacent directions, if there is a difference in the distance, they can be discriminated, allowing the permutation problem to be properly solved.
In the third embodiment illustrated here, positional information representing a curved surface on which a signal source exists and the direction information which is treated above are used in solving the permutation problem of a separation matrix.
A functional arrangement of the third embodiment is shown in FIG. 17 and a processing procedure is shown in
∥qz(i)−dj′∥/∥qz(i)−dj∥=|Ajz(i)(ω)/Aj′z(i)(ω)|=|Hji(ω)/Hj′i(ω)|=DRi,jj′ (10′)
A specific example of steps 35 which takes place in the distance ratio calculator 31 will be described with reference to
In this embodiment, a ratio DRi,jj′(ω) of the selected two elements is calculated (step S41). A decision is then rendered to see if i=I (step S29), and unless i=I, the operation returns to step S21, but if i=I, the processing operation is completed.
Distance ratio information DRi,jj′(ω) which is calculated by the distance ratio calculator 31 is fed to a permutation solver 17, which uses direction information ui(ω) which is estimated in the arrival direction decision unit 16 and the distance ratio information DRi,jj′(ω) calculated by the distance ratio calculator 31 to solve the permutation problem for the separation matrix which is calculated by separation matrix calculator 12.
The permutation problem is solved by performing a permutation of rows in W(ω). For example, using direction information and distance ratio information, the distance ∥qi(ω)∥ of a signal source 2i is calculated by a distance estimator 17d (step S36).
A method of calculating the distance ∥qi(ω)∥ will be described with reference to
DR1,23(ω)=|H21(ω)/H31(ω)|=∥q1−d3∥/∥q1−d2∥.
In this manner, ∥q1(ω)∥ can be estimated. If a signal source 22 exists, a curved surface 62 on which the signal source 22 exists can be estimated from a distance ratio
DR2,23(ω)=|H22(ω)/H32(ω)|=∥q2−d3∥/∥q2−d2∥.
Thus ∥q2(ω)∥ can be estimated.
The position of the signal source 21 can be estimated as existing on a common area between the straight line u1=u2 and the curved surface 61, and the position of the signal source 22 can be estimated as existing on a common area on the straight line u1=u2 and a curved surface 62. For example, an equation representing the straight line u1=u2 and equations which represent the curved surfaces 61 and 62 may be used as simultaneous equations, which can be solved to determine ∥q1(ω)∥ and ∥q2(ω)∥. In this manner, if the directions of signal sources are same or closely related, it is possible to discriminate the positions of the signal sources.
The permutation solver 17 performs a permutation of rows in the separation matrix W(ω) so that the distances of signal sources ∥qi(ω)∥ for each frequency which are obtained in the manner mentioned above are in a given sequence, for example, in an ascending order. At this end, a permutation matrix P(ω) is calculated (step S37). The permutation matrix P(ω) can be calculated in the similar manner as the permutation matrix is calculated by the permutation solver in the second embodiment. The calculated permutation matrix P(ω) is multiplied to the separation matrix W(ω) from the left, and a resulting matrix W′(ω)=P(ω)W(ω) is delivered as a separation matrix (step S17).
The delivered separation matrix W′(ω) is fed to a time domain transformer 18 where it is used for purpose of signal separation.
As will be understood from
In the permutation solver 17, a permuted matrix in which the distance ratio DRi,jj′(ω) determined are in an ascending order in the sequence of the first column, the second column, the third column, . . . , and the I-th column of the inverse matrix H(ω) for each frequency may be produced using the calculated distance ratio DRi,jj′(ω), as indicated at step S37 in
The distance ratio DRi,jj′(ω) is determined initially, a permutation of rows in the separation matrix W(ω) is then performed on the basis of a result determined and where the distance ratio DRi,jj′(ω) could not have been determined, a permutation of the separation matrix W(ω) can be continued by using direction information ui(ω). Again, a permutation matrix P(ω) which performs both permutations simultaneously is produced. Generally, ui(ω) can be utilized with a higher accuracy than with DRi,jj′(ω), and accordingly, it is preferred that the permutation according to ui(ω) is principally made, and where it is impossible, a permutation in accordance with DRi,jj′(ω) is performed.
When the right side of the equation (10′) is denoted by the distance ratio DRi,jj′(ω) and the equation is solved for qi, there is obtained a spherical surface having a center Oi,jj′(ω) and a radius Ri,jj′(ω) which are given by the following equations (11) and (12):
Oi,jj′(ω))=dj−(dj′−dj)/(DR2i,jj′(ω)−1) (11)
Ri,jj′(ω)=∥DRi,jj′(ω)·(dj′−dj)/(DR2i,jj′(ω)−1)∥ (12)
By way of example, when the sensors 1j′ and 1j are located such that dj=(0, 0.15, 0) and dj′=(0, −0.15, 0) where the unit is given in meter, the spherical surface which is determined by the equation (10′) will appear as shown in
This means that the signal source i exists on the spherical surface given by the equations (11) and (12) which serve as positional information. Accordingly, the distance estimator 17d of the permutation solver 17 shown in
Where the permutation matrix P(ω) cannot be determined if the conical surface information {circumflex over (θ)}i,jj′(ω) or direction information ui(ω) in combination with spherical surface information DRi,jj′(ω) or Ri,jj′(ω) is used, a conventional correlation method (see, for example, H. Sawada et al “A robust and precise method for solving the permutation problem of frequency-domain blind source separation”, in Proc. Intl. Symp. on Independent Component Analysis and Blind Signal Separation (ICA 2003), 2003, pp. 505–510) may be applied for such frequency or frequencies. As indicated in broken lines in
An experimental example of the third embodiment will now be described in which a separation experiment has been conducted using a mixed voice as a signal source in which impulse responses which are actually measured in a room are convoluted. As shown in
An estimation of direction of sound source (conical surface) is made using rows of the INVERSE H(ω) frequency domain separation matrix W(ω) which correspond to the microphone pair 11 and 13, 12 and 14, 11 and 12, and 12 and 13. The histogram of the estimated diretions is shown in
For two remaining sound sources, oncoming signals have been discriminated in accordance with the radius of a spherical surface on which a sound source may exist, using widely spaced microphone pairs 17 and 15, 17 and 18, 16 and 15, and 16 and 18. The radius of a spherical surface which is estimated using the microphone pair 17 and 15 is shown in
It is impossible to solve the permutation problem completely on the basis of positional information alone due to influences of reverberations and errors of the estimation. Accordingly, for frequencies for which signals could have been sorted without contradiction on the basis of the estimated positional information, a permutation matrix is produced on the basis of such information, and for remaining frequencies, the approach which is based on the correlation is used to solve the permutation problem. Lastly, a spectral smoothing has been made when determining separation filter coefficients in the time domain. For the spectral smoothing, see, for example, H. Sawada et al, “Spectral smoothing for frequency-domain blind source separation”, in Proc. IWAENC 2003, 2003, pp. 311–314. An evaluation result of the separation performance (SIR) is indicated in
In the second embodiment, sensors are disposed in two dimensions, but because a spherical surface which is estimated by a pair of sensors appears symmetrically with respect to a bisector of the sensors, where a signal source exists in three dimensions, it cannot be determined using sensors disposed in two dimensions, requiring that sensors be also disposed in three dimensions.
As discussed above, in the third embodiment, conical surface information is estimated according to the equation (9′) and curved surface information is estimated according to the equation (10′), and therefore the amount of calculation is reduced. In addition, since the permutation problem is solved by a combination of the conical surface and one of the distance ratio DRi,jj′(ω) or the distance ∥qi(ω)∥ or the radius Ri,jj′(ω) of the spherical surface, if two or more signal sources are located in a same direction or closely adjacent directions, they can be discriminated. When the correlation method is added, it is possible to achieve a more reliable separation. It will be seen that DRi,jj′(ω) is preferred as spherical information because this simplifies the calculation.
In a fourth embodiment, the reliability of an estimated conical surface is verified, and a conical surface which has been determined as having a high reliability is used in solving the permutation of the separation matrix. As illustrated in
For this reason, a minimum value θmin and a maximum value θmax which are estimated as permitting the permutation to take place properly are stored as a permissible angle information in the permissible angle storage 42. If the estimated conical surface information {circumflex over (θ)}i,jj′(ω) lies between θmin and θmax, it is determined to be a reliable conical surface in the conical surface verifier 41 to be delivered, namely, it can be used in solving the permutation problem. However, if it is not located between θmin and θmax that {circumflex over (θ)}i,jj′(ω) is discarded as lacking the reliability and is not used in solving the permutation. For example, a conical surface 413 shown in
Conical surface information {circumflex over (θ)}i,jj′(ω) which is verified to be reliable by the conical surface verifier 41 is fed to the arrival direction decision unit 16 shown in
|Δ{circumflex over (θ)}/Δ arg Ĥ|=|1/(ωc−1|dj−dj′|sin {circumflex over (ω)}i))| (13)
The equation (13) has been calculated for several frequencies, and results are shown in
In the fourth embodiment, estimated conical surface information which lacks the reliability is discarded, and accordingly, the arrival direction could be estimated correctly in that it is not adversely influenced by non-reliable information, and accordingly, a correct permutation matrix P(ω) can be produced to improve SIR (performance) of separated signals.
A fifth embodiment uses the distance ratio or spherical surface information which is estimated therefrom as positional information. A functional arrangement therefor is shown in
In the similar manner as the preceding embodiments, the time domain observed signals are transformed into frequency domain signals as indicated in
A permutation matrix which is to operate upon the separation matrix W(ω) is produced using such spherical surface information so that the sequence in which such information appears assumes a predetermined sequence, and the permutation of rows of the separation matrix W(ω) takes place (step S54). In the third embodiment, this processing operation has taken place in the permutation solver 17, but in the fifth embodiment, only spherical surface information is used. If there is a frequency for which the permutation problem cannot be solved with spherical surface information alone, the permutation problem is solved according to the correlation method mentioned above for such frequency (step S55).
A separation experiment has been conducted for twelve combinations mixed voices which are of impulse responses which are measured in the experiment room convoluted with voices of four speakers by using the microphones 16 and 18 disposed within the room in the manner shown in
A result of an experiment which used the approach according to the position of the sound sources and the approach according to the correlation in combination under the conditions shown in
With the fifth embodiment, spherical surface information is determined according to the equation (10′), and accordingly, the amount of calculation is reduced. The distance ratio DRi,jj′(ω) is preferred as spherical surface information.
A sixth embodiment intends to solve the permutation problem based on a single estimated conical surface, for example. Conical surfaces {circumflex over (θ)}i,jj′(ω), . . . , {circumflex over (θ)}i,jj′(ωN) which are estimated in the conical surface estimator 14 are directly input to the permutation solver 17 as indicated in broken lines in
In this instance, a single conical surface may only be estimated for each signal source i at step S14. While not shown, when a permutation of rows failed, a permutation of rows may be performed according to the correlation method which is mentioned above.
According to the sixth embodiment, the scaling problem is simply eliminated by forming a ratio of two elements in each column of the inverse matrix H(ω), and what is required is the calculation according to the equation (9′), whereby a calculation time can be reduced.
Summary of Signal Separation
A method of producing a separation matrix after the permutation problem has been solved during the blind signal separation is illustrated in
In
In
In addition, there is an approach which is shown in
The signal separation may take place using the frequency domain separation matrix W(ω) and the observed signal X(ω), and subsequently separated frequency domain signal Y(ω) may be transformed into the time domain signal y(t) in the third, the fifth and the sixth embodiment in the similar manner as described above in connection with the second embodiment.
In the description of the first embodiment, the estimation of the arrival direction of the signal in two dimensions is made, but it is also applicable to the estimation of the arrival direction of the signal in three dimensions as mentioned above in connection with the second embodiment. The second to the sixth embodiment are applicable to the signal separation in two dimensions. In this instance, the estimated conical surface {circumflex over (θ)}i,jj′(ω) assumes two directions which are located symmetrically with respect to the sensor axis of the sensor pair which is used in the estimation, and the estimated spherical surface Ri,jj′(ω) or DRi,jj′(ω) will be the radius of the circle or a substantial equivalent thereto.
The apparatus shown in
Number | Date | Country | Kind |
---|---|---|---|
2003-057070 | Mar 2003 | JP | national |
2003-297580 | Aug 2003 | JP | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/JP2004/002610 | 3/3/2004 | WO | 00 | 9/30/2004 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2004/079388 | 9/16/2004 | WO | A |
Number | Name | Date | Kind |
---|---|---|---|
6243471 | Brandstein et al. | Jun 2001 | B1 |
6625587 | Erten et al. | Sep 2003 | B1 |
Number | Date | Country |
---|---|---|
2000-242624 | Sep 2000 | JP |
2002-084593 | Mar 2002 | JP |
2003-090871 | Mar 2003 | JP |
2003-099093 | Apr 2003 | JP |
2004-145172 | May 2004 | JP |
98 58450 | Dec 1998 | WO |
00 54404 | Sep 2000 | WO |
Number | Date | Country | |
---|---|---|---|
20050203981 A1 | Sep 2005 | US |