The present invention relates to a beam-forming device that carries out beamforming in order to acquire a signal in which a target signal is enhanced from a plurality of microphone signals.
In order to construct a call system, such as a vehicle-mounted handsfree system, in a high-noise environment or an environment where a plurality of signal sources exist, a technique of separating and extracting only a signal of a specific signal source (speaker) is required. A beamformer is provided as one example of this technique. The beamformer enhances a signal in a target direction by adding signals of multiple channels provided by a microphone array, and includes a fixed beamformer and an adaptive beamformer.
The simplest fixed beamformer is based on Delay and Sum, and is comprised of microphones 901 and 902 of two channels, a signal delaying unit 903, and a delay summing unit 904, as shown in
In order to improve the directivity in a low frequency region, it is necessary to lengthen the array length of the entire microphone array. For example, when a main lobe is made to have directivity of about ±10 degrees for a 1000-Hz sound, it is necessary to make the array length be about 2 m. A further problem is that when the array length is increased by simply lengthening the intervals of the microphone array, a grating lobe occurs in a direction other than the target direction, and the directivity degrades (refer to nonpatent reference 1). Therefore, in order to suppress the grating lobe and maintain the directivity in the low frequency region, it is necessary to arrange a large number of microphones densely, and hence the fixed beamformer is costly.
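The grating-lobe problem described above can be illustrated numerically. The following is a minimal sketch, not part of the disclosed device, that evaluates the normalized response of a Delay and Sum beamformer for a uniformly spaced linear array under a far-field, single-frequency assumption. The function name `array_gain`, the speed of sound of 343 m/s, and the example spacings are assumptions made for illustration.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value, not taken from the text


def array_gain(num_mics, spacing_m, freq_hz, angle_deg, steer_deg=0.0):
    """Normalized Delay and Sum response of a uniform linear array.

    Angles are measured from broadside; 1.0 means the same gain as the
    steered (target) direction.
    """
    wavelength = SPEED_OF_SOUND / freq_hz
    phase_step = 2.0 * math.pi * spacing_m / wavelength * (
        math.sin(math.radians(angle_deg)) - math.sin(math.radians(steer_deg))
    )
    total = sum(cmath.exp(1j * m * phase_step) for m in range(num_mics))
    return abs(total) / num_mics


# Four microphones at 1000 Hz (wavelength about 0.343 m), response at 90 degrees.
half_wave = array_gain(4, 0.1715, 1000.0, 90.0)  # spacing of half a wavelength
full_wave = array_gain(4, 0.343, 1000.0, 90.0)   # spacing of one wavelength
# Two microphones only 5 cm apart at the same frequency.
compact = array_gain(2, 0.05, 1000.0, 90.0)
```

In this sketch the array spaced at one wavelength responds to the 90-degree direction as strongly as to the target direction (a grating lobe), the array spaced at half a wavelength suppresses that direction, and the compact two-microphone array attenuates it only slightly, which is the low-frequency limitation the text refers to.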
In contrast, the adaptive beamformer forms directivity in such a way that a noise sound source is located in a blind spot (null) while the sensitivity in the target direction is held at a constant level; it is effective also for a low frequency region and can carry out noise suppression in a reverberant environment. Although various methods are available for the adaptive beamformer, the generalized sidelobe canceller (GSC) is one that can be regarded as an extension of the Delay and Sum. The generalized sidelobe canceller is a beamformer that suppresses noise by using a fixed beamformer and an adaptive filter, and a typical Griffiths-Jim type GSC using microphones of two channels is constructed as shown in
It is considered that only a noise component in which a target signal is subtracted remains in the output of the subtraction-type beamformer, and the noise component can be removed from the result of the Delay and Sum by applying the output as an input to the adaptive filter. A problem is, however, that only the simple subtraction cannot sufficiently remove the target signal in many cases, and the adaptive filter cannot sufficiently remove the noise, but ends up removing the target signal.
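One situation in which simple subtraction leaves the target signal behind is a sensitivity mismatch between the two channels. The following minimal sketch assumes, for illustration only, that the target reaches the first microphone one sample later and at 0.8 times the amplitude; these values and the function name `subtraction_beamformer` are hypothetical and are not taken from the references.

```python
def subtraction_beamformer(x1, x2, delay):
    """Subtract x2, delayed by `delay` samples, from x1 (zero before the start)."""
    out = []
    for n in range(len(x1)):
        delayed = x2[n - delay] if n - delay >= 0 else 0.0
        out.append(x1[n] - delayed)
    return out


# Assumed scenario: the second microphone receives the target as it is, the
# first microphone receives it one sample later and attenuated to 0.8.
target = [0.0, 1.0, -0.5, 0.25, 0.75, -1.0]
x2 = list(target)
x1 = [0.0] + [0.8 * s for s in target[:-1]]

# Even with the delay compensated exactly, the gain mismatch leaves
# -0.2 times the target in the output that is supposed to contain only noise.
residual = subtraction_beamformer(x1, x2, delay=1)
```

Because the residual still contains a scaled copy of the target, an adaptive filter driven by it would treat part of the target as noise, which is the failure mode described above.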
As a measure against this problem, in a device disclosed by patent reference 1, a target sound blocker is constructed of an adaptive filter using an output of a fixed beamformer and microphone inputs, and is constructed in such a way as to remove a target signal from each of the microphone inputs. Because a signal from which the target sound is removed more sufficiently as compared with a simple subtraction-type beamformer is acquired, the noise suppression performance of the adaptive filter in the next stage can be improved.
Patent reference 1: Japanese Unexamined Patent Application Publication No. H 08-122424
Nonpatent reference 1: “Acoustic systems and digital technology” written by Ohga Juro, Yamazaki Yoshio, Kaneda Yutaka, First Edition, The Institute of Electronics, Information and Communication Engineers, Mar. 25, 1995, pp. 181-186
A problem with the technique disclosed by above-mentioned patent reference 1 is that the SN ratio (Signal to Noise Ratio) is improved by synchronizing the phases of a plurality of input signals with a fixed FIR (Finite Impulse Response) filter or the like in a fixed beamformer. Consequently, when the phase shift or the intensity differs or changes for each frequency band depending on the sound field environment, the phases cannot be synchronized with a high degree of accuracy and the performance of phase synchronization degrades.
The present invention is made in order to solve the above-mentioned problem, and it is therefore an object of the present invention to provide an improvement in the accuracy of phase synchronization of a plurality of input signals, and acquire an output signal having an improved SN ratio.
In accordance with the present invention, there is provided a beam-forming device including: a first target sound blocker and a second target sound blocker that remove a target signal having a correlation mutually from a first sound signal and a second sound signal into which sounds collected by different microphones are converted respectively; a phase synchronizer that synchronizes the phases of the first sound signal and the second sound signal and synthesizes these sound signals by using information acquired when the first target sound blocker removes the target signal; and a noise learner that learns a noise component included in an output signal of the phase synchronizer from signals from which the target signal is removed by the first target sound blocker and the second target sound blocker.
According to the present invention, synchronization of the phases of the plurality of input signals can be carried out with a high degree of accuracy, and an output signal having an improved SN ratio can be acquired.
Hereafter, in order to explain this invention in greater detail, the preferred embodiments of the present invention will be described with reference to the accompanying drawings.
The beam-forming device in accordance with Embodiment 1 is comprised of a first microphone 101, a second microphone 102, a first target sound blocker 103, a second target sound blocker 104, a phase synchronizer 105, and a noise learner 106.
The first microphone 101 and the second microphone 102 convert an external sound into electric signals (a first sound signal and a second sound signal). The first target sound blocker 103 performs a process of blocking a target sound from a signal of the first microphone 101 by using a signal of the second microphone 102. The second target sound blocker 104 performs a process of blocking the target sound from the signal of the second microphone 102 by using the signal of the first microphone 101. The phase synchronizer 105 carries out phase synchronization between the input signals inputted thereto from the first microphone 101 and the second microphone 102 by using a processed result inputted thereto from the first target sound blocker 103. The noise learner 106 learns a noise component from an output signal of the phase synchronizer 105 by using a signal which is a mixture of signals outputted from the first target sound blocker 103 and the second target sound blocker 104.
Next, the operation of the beam-forming device in accordance with this Embodiment 1 will be explained.
Hereafter, an explanation will be made by taking, as an example, a case in which an adaptive filter using the LMS (Least Mean Squares) algorithm is used for each of the first target sound blocker 103 and the second target sound blocker 104.
As shown in
When the signal of the first microphone 101 at a time n is expressed by x1(n), the signal of the second microphone 102 at the time n is expressed by x2(n), the output of the LMS adaptive filter of the first target sound blocker 103 is expressed by y1(n), and the filter coefficient of that LMS adaptive filter is expressed by $F(n) = [h_0(n),\ h_1(n),\ \ldots,\ h_{p-1}(n)]^T$, a signal e1(n) after sound removal is determined by using the following equations (1) to (3).
$$X_2(n) = [x_2(n),\ x_2(n-1),\ \ldots,\ x_2(n-p+1)]^T \tag{1}$$
$$e_1(n) = x_1(n) - y_1(n) = x_1(n) - F^T(n) \cdot X_2(n) \tag{2}$$
$$F(n+1) = F(n) + \mu \cdot e_1(n) \cdot X_2(n) \tag{3}$$
In the equation (3), μ is a constant for determining a learning speed and is a positive value smaller than 1. In the equation (1), p is the length of the LMS adaptive filter. In the equations (1) and (2), T denotes transposition. As the length p of the LMS adaptive filter, a length over which a sound signal has a correlation is used. Because the learning of the filter coefficient advances easily when the input signal of the LMS adaptive filter has high power, the learning advances mainly during a sound interval, and the sound signal can be easily removed from the signal x1 of the first microphone 101.
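The update described by equations (1) to (3) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the device: the function name `lms_target_blocker`, the filter length p = 4, the step size μ = 0.05, zero padding before the first sample, and the synthetic test signal (a target that reaches the first microphone one sample later and at 0.8 times the amplitude) are all chosen for the example.

```python
import random


def lms_target_blocker(x1, x2, p=4, mu=0.05):
    """Remove from x1 the component correlated with x2, per equations (1)-(3).

    Returns the residual e1 and the final filter coefficient F.
    """
    F = [0.0] * p
    e1 = []
    for n in range(len(x1)):
        # Equation (1): the p most recent samples of x2, zero before the start.
        X2 = [x2[n - k] if n - k >= 0 else 0.0 for k in range(p)]
        # Equation (2): residual after subtracting the filter output y1(n).
        y1 = sum(h * x for h, x in zip(F, X2))
        err = x1[n] - y1
        # Equation (3): coefficient update using the current residual.
        F = [h + mu * err * x for h, x in zip(F, X2)]
        e1.append(err)
    return e1, F


# Assumed test signal: a white target observed directly by microphone 2 and
# one sample later, scaled by 0.8, by microphone 1.
rng = random.Random(0)
target = [rng.uniform(-1.0, 1.0) for _ in range(3000)]
x2 = target
x1 = [0.0] + [0.8 * s for s in target[:-1]]

e1, F = lms_target_blocker(x1, x2)
```

In this noise-free setting the coefficient vector approaches a one-sample delay with a gain of 0.8, so the residual e1 decays toward zero, which corresponds to the target sound being blocked.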
Similarly, the second target sound blocker 104 receives, as its inputs, the signal x2 of the second microphone 102 and the signal x1 of the first microphone 101, and determines a residual signal by using the LMS adaptive filter. As a result, the signal (target signal) which is included in both the signal of the second microphone 102 and the signal of the first microphone 101 and which has a correlation can be removed from the signal x2 of the second microphone 102.
On the other hand, the phase synchronizer 105 synthesizes the signal x1 of the first microphone 101 and the signal x2 of the second microphone 102 by making the latter pass through an FIR filter. In this case, the filter coefficient F(n) of the LMS adaptive filter which the first target sound blocker 103 has learned is used as the coefficient of the FIR filter. Because the filter coefficient F(n) is learned in such a way that the phase of the signal x2 of the second microphone 102 is synchronized with that of the signal x1 of the first microphone 101, a signal whose phase is synchronized with the signal x1 of the first microphone 101 can be acquired by convolving the filter coefficient with the signal x2 of the second microphone 102. More specifically, the signal x1 of the first microphone 101 and the signal which is acquired by convolving the filter coefficient F(n) with the signal x2 of the second microphone 102 are added and averaged. The output signal z(n) of the phase synchronizer 105 at the time n is expressed by the following equation (4).
$$z(n) = \frac{x_1(n) + F^T(n) \cdot X_2(n)}{2} \tag{4}$$
Through the process by the phase synchronizer 105, beamforming of further enhancing the sound as compared with the Delay and Sum shown in the conventional example can be implemented.
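The synthesis of equation (4) can be sketched as below. As a simplifying assumption, the sketch takes a coefficient vector that has already been learned and keeps it fixed, whereas the text uses the coefficient F(n) at each time n; the function name `phase_synchronize` and the toy signals are hypothetical.

```python
def phase_synchronize(x1, x2, F):
    """Average x1 with x2 filtered by F, following equation (4).

    F is assumed to be the coefficient learned by the first target sound
    blocker; here it is held constant for simplicity.
    """
    p = len(F)
    z = []
    for n in range(len(x1)):
        X2 = [x2[n - k] if n - k >= 0 else 0.0 for k in range(p)]
        aligned = sum(h * x for h, x in zip(F, X2))  # x2 brought into phase with x1
        z.append(0.5 * (x1[n] + aligned))
    return z


# Toy case: microphone 1 receives the same ramp as microphone 2, one sample later.
x2 = [1.0, 2.0, 3.0, 4.0]
x1 = [0.0, 1.0, 2.0, 3.0]
F = [0.0, 1.0]  # a one-sample delay, as the blocker would learn in this case

z = phase_synchronize(x1, x2, F)
# Averaging without alignment, for comparison, smears the two arrival times.
unaligned = [0.5 * (a + b) for a, b in zip(x1, x2)]
```

With the learned delay applied, z reproduces x1 exactly in this toy case, while the unaligned average does not.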
Further, the signal e1 after sound removal outputted by the first target sound blocker 103 and the signal e2 after sound removal outputted by the second target sound blocker 104 are added to generate a noise signal noise, and this noise signal is inputted to the noise learner 106. The noise learner 106 receives this noise signal noise as its input, and learns a noise component included in the output signal z of the phase synchronizer 105 by using an NLMS (Normalized Least Mean Squares) adaptive filter that assumes the output signal z of the phase synchronizer 105 as the target signal. By subtracting the output signal of the noise learner 106 from the output signal z of the phase synchronizer 105, a signal e from which the noise is removed can be acquired.
When the sum of the signal e1(n) outputted by the first target sound blocker 103 at the time n and the signal e2(n) outputted by the second target sound blocker 104 at the time n is expressed by noise(n), and the filter coefficient of the NLMS adaptive filter is expressed by $F_N(n) = [hn_0(n),\ hn_1(n),\ \ldots,\ hn_{p-1}(n)]^T$, the signal e(n) after noise removal is calculated according to the following equations (5) to (7).
$$N(n) = [noise(n),\ noise(n-1),\ \ldots,\ noise(n-p+1)]^T \tag{5}$$
$$e(n) = z(n) - F_N^T(n) \cdot N(n) \tag{6}$$
$$F_N(n+1) = F_N(n) + \mu \cdot e(n) \cdot \frac{N(n)}{N^T(n)\,N(n)} \tag{7}$$
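The noise learning step can be sketched as a normalized LMS canceller. This is a minimal sketch under stated assumptions rather than the implementation of the noise learner 106: the function name `nlms_noise_canceller`, the small constant `eps` added to the normalization to avoid division by zero, the step size, and the synthetic signals (a sinusoidal target plus a two-tap filtered copy of a white noise reference) are all assumptions.

```python
import math
import random


def nlms_noise_canceller(z, noise, p=4, mu=0.1, eps=1e-8):
    """Subtract from z the component predictable from the noise reference."""
    FN = [0.0] * p
    e = []
    for n in range(len(z)):
        # Tap vector of the noise reference, zero before the start.
        N = [noise[n - k] if n - k >= 0 else 0.0 for k in range(p)]
        # Output after subtracting the learned noise estimate.
        err = z[n] - sum(h * v for h, v in zip(FN, N))
        # Normalized update; eps is an assumed safeguard, not in the text.
        norm = sum(v * v for v in N) + eps
        FN = [h + mu * err * v / norm for h, v in zip(FN, N)]
        e.append(err)
    return e, FN


def mean_square_error(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


# Assumed scenario: z contains a target sinusoid plus filtered reference noise.
rng = random.Random(1)
length = 3000
reference = [rng.uniform(-1.0, 1.0) for _ in range(length)]
target = [0.5 * math.sin(0.05 * n) for n in range(length)]
z = [
    target[n] + 0.6 * reference[n] + (0.3 * reference[n - 1] if n > 0 else 0.0)
    for n in range(length)
]

e, FN = nlms_noise_canceller(z, reference)
error_before = mean_square_error(z[-500:], target[-500:])
error_after = mean_square_error(e[-500:], target[-500:])
```

Comparing `error_after` with `error_before` indicates how much of the reference-correlated noise has been removed; a smaller step size trades slower learning for less fluctuation of the coefficients once they have settled.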
Although the example of using LMS as the adaptive filter of each of the first target sound blocker 103 and the second target sound blocker 104 and using NLMS as the adaptive filter of the noise learner 106 is shown in the above-mentioned explanation, each of the adaptive filters can be alternatively constructed by using another adaptive filter, such as RLS (Recursive Least Squares) or an affine projection filter.
As mentioned above, because the beam-forming device in accordance with this Embodiment 1 is constructed in such a way as to apply the filter coefficient which the first target sound blocker 103 has learned as the filter coefficient of the phase synchronizer 105, a signal having a better SN ratio compared with those provided by a generalized sidelobe canceller (GSC) and a fixed beamformer can be acquired from the phase synchronizer 105. Further, because the coefficient acquired in the arithmetic process by the first target sound blocker 103 can be applied as the filter coefficient of the phase synchronizer 105, the phase synchronization process can be performed efficiently.
In addition, because the beam-forming device in accordance with this Embodiment 1 is constructed in such a way that the noise learner 106 learns the noise component included in the output signal of the phase synchronizer 105 and subtracts the learned noise component, the noise can be suppressed and a signal having an improved SN ratio can be acquired.
Hereafter, the same components as those of the beam-forming device in accordance with Embodiment 1 or like components are designated by the same reference numerals as those used in Embodiment 1, and the explanation of the components will be omitted or simplified.
The first target sound blocker 103′ is comprised of an adaptive filter, and estimates a noise component y1 included in a signal x1 of a first microphone 101 from the signal x1 of the first microphone 101 and a signal x2 of a second microphone 102. By removing the estimated noise component y1 from the signal x1 of the first microphone 101, a signal e1 after sound removal is acquired. The second target sound blocker 104′ is comprised of an adaptive filter, and estimates a noise component y2 included in the signal x2 of the second microphone 102 from the signal x1 of the first microphone 101 and the signal x2 of the second microphone 102. By removing the estimated noise component y2 from the signal x2 of the second microphone 102, a signal e2 after sound removal is acquired.
The gain adjuster 107a adjusts the gain of the signal e1 after sound removal of the first target sound blocker 103′, and the synthesizer 107b subtracts the gain-adjusted signal from the signal x1 of the first microphone 101. As a result, a signal which is the same as the output signal z of the phase synchronizer 105 in accordance with Embodiment 1 is acquired. A noise learner 106 learns a noise component from the output signal z after gain adjustment by using a sum signal which is the sum of the signal e1 after sound removal of the first target sound blocker 103′ and the signal e2 after sound removal of the second target sound blocker 104′. By subtracting an output signal of the noise learner 106 from the output signal z after gain adjustment, a signal e from which noise is removed can be acquired.
Although the example of performing a convolution operation by using the FIR filter in the phase synchronizer 105 is shown in above-mentioned Embodiment 1, the convolution operation using the FIR filter becomes unnecessary when an adaptive filter is used for each of the first target sound blocker 103′ and the second target sound blocker 104′, as shown in this Embodiment 2. An output signal z(n) can be acquired by using the output of the first target sound blocker 103′ and the gain adjuster 107a according to the following equations (8) and (9), which are derived from the above-mentioned equations (2) and (4).
First, the following equation (8) is acquired from the above-mentioned equation (2).
$$F^T(n) \cdot X_2(n) = x_1(n) - e_1(n) \tag{8}$$
Using the above-mentioned equations (4) and (8), the output signal z(n) is expressed by the signal x1(n) of the first microphone 101 and the signal e1(n) after sound removal on which the gain adjustment is performed, as shown in the following equation (9).
$$z(n) = \frac{x_1(n) + \left(x_1(n) - e_1(n)\right)}{2} = x_1(n) - \frac{1}{2}\,e_1(n) \tag{9}$$
As shown in the equation (9), after the signal e1(n) after sound removal is outputted to the gain adjuster 107a and the gain adjuster 107a adjusts the gain of the signal e1(n) to ½, the output signal z(n) is acquired by subtracting the signal from the signal x1(n) of the first microphone 101. Although the case in which the gain in the gain adjuster 107a is set to ½ in the equation (9) in order to acquire the same result as that acquired in above-mentioned Embodiment 1 is shown, the numerical value can be changed properly according to the gain balance between the first microphone 101 and the second microphone 102, etc.
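The equivalence between the FIR-based synthesis of equation (4) and the gain-adjust-and-subtract structure described above can be checked numerically. The following minimal sketch runs one LMS adaptive filter and computes the output both ways at every sample; the function name `synthesize_both_ways`, the filter length, the step size, and the sinusoidal test signals are assumptions for illustration.

```python
import math


def synthesize_both_ways(x1, x2, p=3, mu=0.05):
    """Return z computed from the filter output and z computed from e1 alone."""
    F = [0.0] * p
    z_fir, z_gain = [], []
    for n in range(len(x1)):
        X2 = [x2[n - k] if n - k >= 0 else 0.0 for k in range(p)]
        y1 = sum(h * x for h, x in zip(F, X2))
        e1 = x1[n] - y1
        z_fir.append(0.5 * (x1[n] + y1))  # Embodiment 1: equation (4)
        z_gain.append(x1[n] - 0.5 * e1)   # Embodiment 2: gain of 1/2, then subtract
        F = [h + mu * e1 * x for h, x in zip(F, X2)]
    return z_fir, z_gain


# Assumed test signals: microphone 1 sees a delayed, attenuated copy.
x2 = [math.sin(0.3 * n) for n in range(200)]
x1 = [0.9 * math.sin(0.3 * (n - 1)) for n in range(200)]

z_fir, z_gain = synthesize_both_ways(x1, x2)
```

The two outputs agree up to floating-point rounding at every sample, including while the filter is still adapting, because the second form only reuses the residual that the adaptive filter already computes.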
As mentioned above, because the beam-forming device in accordance with this Embodiment 2 is constructed in such a way that the noise component included in the signal of the first microphone 101 and the signal of the second microphone 102 is estimated by using adaptive filters as the first target sound blocker 103′ and the second target sound blocker 104′, and the gain adjuster 107a adjusts the gain of the signal after sound removal and subtracts this signal from the signal of the first microphone 101, it is not necessary to dispose an FIR filter for performing phase synchronization, and the amount of computations can be reduced.
Although the structure equipped with the following two microphones: the first microphone 101 and the second microphone 102 is shown in above-mentioned Embodiments 1 and 2, a beam-forming device in which the number of microphones is increased to N which is three or more will be explained in this Embodiment 3.
The beam-forming device in accordance with Embodiment 3 is comprised of an array microphone unit 108, a target sound blocking pair collective unit 109, a phase synchronizer 105, and a noise learner 106.
The array microphone unit 108 is comprised of the following N microphones: a first microphone 108A, a second microphone 108B, . . . , and an Nth microphone 108N. Each of the microphones 108A, 108B, . . . , and 108N converts an external sound into an electric signal. The target sound blocking pair collective unit 109 is provided with N−1 target sound blocking pairs with respect to the number N of microphones. In the example of
The first target sound blocking pair 109A is comprised of a first input target sound blocker 111A and a second input target sound blocker 112A. The first input target sound blocker 111A blocks the target sound from the signal x1 of the first microphone 108A, and outputs information for performing phase synchronization in the phase synchronizer 105. The second input target sound blocker 112A blocks the target sound from the signal x2 of the second microphone 108B, and outputs a signal for learning noise in the noise learner 106.
The phase synchronizer 105 performs phase synchronization on signals inputted thereto from the N microphones 108A, 108B, . . . , and 108N by using results inputted thereto from the N−1 target sound blocking pairs 109A, 109B, . . . , and 109 (N−1). The noise learner 106 learns a noise component from an output signal of the phase synchronizer 105 by using a sum signal which is the sum of the signals outputted from the N−1 target sound blocking pairs 109A, 109B, . . . , and 109 (N−1).
The first input target sound blocker 111K in the Kth target sound blocking pair 109K (1≦K≦N−1) performs a learning process of removing the target signal from the signal x1 of the first microphone 108A by using an LMS adaptive filter, as shown in the following equations (10) to (12), like the above-mentioned equations (1) to (3), with the signal x1 of the first microphone 108A being set as a teacher signal and the signal xK+1 of the (K+1)th microphone being set as an input signal.
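$$X_K(n) = [x_{K+1}(n),\ x_{K+1}(n-1),\ \ldots,\ x_{K+1}(n-p+1)]^T \tag{10}$$
$$e_{1K}(n) = x_1(n) - y_{1K}(n) = x_1(n) - F_K^T(n) \cdot X_K(n) \tag{11}$$
$$F_K(n+1) = F_K(n) + \mu \cdot e_{1K}(n) \cdot X_K(n) \tag{12}$$
In the above-mentioned equations (10) to (12), XK is the signal xK+1 of the (K+1)th microphone, FK is the filter coefficient of the LMS adaptive filter, y1K is the output of that filter, and e1K is the residual signal.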
On the other hand, the second input target sound blocker 112K in the Kth target sound blocking pair 109K performs a learning process reverse to that shown by the above-mentioned equations (10) to (12) according to the following equations (13) to (15) with the signal x1 of the first microphone 108A being set as an input signal and the signal xK+1 of the (K+1)th microphone being set as a teacher signal.
$$X_1(n) = [x_1(n),\ x_1(n-1),\ \ldots,\ x_1(n-p+1)]^T \tag{13}$$
$$e_K(n) = x_{K+1}(n) - y_K(n) = x_{K+1}(n) - F_{1K}^T(n) \cdot X_1(n) \tag{14}$$
$$F_{1K}(n+1) = F_{1K}(n) + \mu \cdot e_K(n) \cdot X_1(n) \tag{15}$$
In the above-mentioned equations (13) to (15), X1 is the signal of the first microphone 108A, F1K is the filter coefficient of the LMS adaptive filter, yK is the output of that filter, and eK is the residual signal, which is the output signal of the Kth target sound blocking pair 109K.
The phase synchronizer 105 convolves the signal of each of the second microphone 108B through the Nth microphone 108N with an FIR filter having, as its coefficient, the filter coefficient FK learned by the corresponding first input target sound blocker 111K, and adds the resulting signals to the signal x1 of the first microphone 108A.
The noise learner 106 receives, as its input, a noise signal noise which is the sum of the residual signals e1, e2, . . . , eN-1 which are outputted from the second input target sound blockers 112A, 112B, . . . , and 112(N−1) of the first through (N−1)th target sound blocking pairs 109A, 109B, . . . , and 109 (N−1) and in which the target sound is blocked, and learns the noise component included in the output signal z of the phase synchronizer 105 by using an NLMS adaptive filter that assumes the output signal z of the phase synchronizer 105 as the target signal. By subtracting the output signal of the noise learner 106 from the output signal z of the phase synchronizer 105, a signal e from which the noise is removed can be acquired.
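The phase synchronization for three or more microphones can be sketched as follows, with one LMS adaptive filter per non-representative microphone and the first microphone as the representative. This is a minimal sketch under stated assumptions: the function names, the division of the sum by the number of microphones (a generalization of the averaging in equation (4) that the text does not state), and the synthetic delays and gains are all hypothetical.

```python
import random


def taps(x, n, p):
    """The p most recent samples of x at time n, zero before the start."""
    return [x[n - k] if n - k >= 0 else 0.0 for k in range(p)]


def multi_mic_synchronize(mics, p=4, mu=0.05):
    """Align every other microphone to mics[0] with its own LMS filter and average."""
    ref, others = mics[0], mics[1:]
    filters = [[0.0] * p for _ in others]
    z = []
    for n in range(len(ref)):
        total = ref[n]
        for k, x in enumerate(others):
            X = taps(x, n, p)
            aligned = sum(h * v for h, v in zip(filters[k], X))
            err = ref[n] - aligned  # residual of the first input target sound blocker
            total += aligned        # contribution to the synthesized signal
            filters[k] = [h + mu * err * v for h, v in zip(filters[k], X)]
        z.append(total / len(mics))  # assumed normalization by the microphone count
    return z, filters


# Assumed scenario: one white target seen with different delays and gains.
rng = random.Random(2)
s = [rng.uniform(-1.0, 1.0) for _ in range(5000)]
mic1 = [0.0, 0.0] + s[:-2]                # representative: delayed by two samples
mic2 = list(s)                            # no delay
mic3 = [0.0] + [0.5 * v for v in s[:-1]]  # one sample delay, half amplitude

z, filters = multi_mic_synchronize([mic1, mic2, mic3])
```

In this noise-free setting each filter settles on the delay and gain that map its microphone onto the representative microphone, so the synthesized signal z converges to the signal of the representative microphone.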
As mentioned above, because the beam-forming device in accordance with Embodiment 3 is constructed in such a way that the beam-forming device includes the array microphone unit 108 comprised of the N microphones whose number is three or more, and the target sound blocking pair collective unit 109 comprised of the N−1 target sound blocking pairs, and each of the target sound blocking pairs includes the first input target sound blocker that receives a signal of a representative microphone and signals of the other microphones as its input, and removes a target signal from the signal of the representative microphone, and the second input target sound blocker that removes the target signal from the input signal of each of the other microphones, the device equipped with the three or more microphones, too, can improve the accuracy of phase synchronization. Further, efficient phase synchronization can be carried out.
Although the example of constructing the target sound blocking pair collective unit 109 by using both the signal of the first microphone 108A which is the representative microphone and the signals of the other microphones 108B, . . . , and 108N is shown in above-mentioned Embodiment 3, the representative microphone can alternatively consist of a microphone other than the first microphone 108A. For example, switching among the microphones, such as a selection of a microphone having the highest SN ratio as the representative microphone, can be carried out according to surrounding conditions.
Further, although the example of using LMS as each adaptive filter is shown in above-mentioned Embodiment 3, each adaptive filter can be alternatively constructed by using another algorithm, such as NLMS or an affine projection filter.
The sound interval detector 120 receives a signal of a first microphone 101 and a signal of a second microphone 102 as its input, and detects a sound interval of each of the inputted signals. A known technique can be applied to the detection of a sound interval. For example, a detection technique which a sound interval discriminating device, disclosed by reference 1 shown below, uses can be applied.
Reference 1: Japanese Unexamined Patent Application Publication No. Hei 10-171487
A first target sound blocker 103 and a second target sound blocker 104 can be constructed in such a way as to refer to the detection results of the sound interval detector 120, and, when the detection results showing that it is a sound interval are inputted, perform a learning process of learning an adaptive filter; otherwise, not perform the learning process of learning the adaptive filter.
As mentioned above, because the beam-forming device in accordance with Embodiment 4 is constructed in such a way that the beam-forming device includes the sound interval detector 120 that detects a sound interval of each of the signals of the first and second microphones 101 and 102, and the first and second target sound blockers 103 and 104 refer to the detection results of the sound interval detector 120, and, only when the detection results showing that it is a sound interval are inputted, perform the learning process of learning the adaptive filter, erroneous learning of the adaptive filter can be prevented and the filter coefficient can be learned with a higher degree of accuracy.
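Learning that is gated by a sound interval detector can be sketched as below. The detector here is a simple frame-energy threshold chosen only for illustration; it is not the detection technique of the cited reference, and the function names, frame length, threshold, and the rule of learning only when both microphone signals are judged to be in a sound interval are assumptions.

```python
import random


def detect_sound_intervals(x, frame=32, threshold=0.01):
    """Flag each sample as sound when the mean energy of its frame exceeds threshold."""
    flags = []
    for start in range(0, len(x), frame):
        chunk = x[start:start + frame]
        energy = sum(v * v for v in chunk) / len(chunk)
        flags.extend([energy > threshold] * len(chunk))
    return flags


def gated_lms(x1, x2, active, p=4, mu=0.05):
    """LMS target sound blocker that updates its coefficient only when active[n]."""
    F = [0.0] * p
    e1 = []
    for n in range(len(x1)):
        X2 = [x2[n - k] if n - k >= 0 else 0.0 for k in range(p)]
        err = x1[n] - sum(h * x for h, x in zip(F, X2))
        if active[n]:  # skip learning outside sound intervals
            F = [h + mu * err * x for h, x in zip(F, X2)]
        e1.append(err)
    return e1, F


# Assumed scenario: 640 samples of low-level background, then a target sound.
rng = random.Random(3)
quiet = [rng.uniform(-0.01, 0.01) for _ in range(640)]
speech = [rng.uniform(-1.0, 1.0) for _ in range(3000)]
x2 = quiet + speech
x1 = [0.0] + [0.8 * v for v in x2[:-1]]

active = [
    a and b
    for a, b in zip(detect_sound_intervals(x1), detect_sound_intervals(x2))
]
_, F_quiet = gated_lms(x1[:640], x2[:640], active[:640])
_, F_all = gated_lms(x1, x2, active)
```

The coefficient stays at its initial value while only background is present and is learned once the target sound starts, which reflects the intent of avoiding erroneous learning outside sound intervals.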
Although the example of applying the sound interval detector 120 to the beam-forming device shown in Embodiment 1 is shown in above-mentioned Embodiment 4, the sound interval detector can also be applied to the beam-forming device shown in Embodiments 2 and 3.
While the invention has been described in its preferred embodiments, it is to be understood that an arbitrary combination of two or more of the above-mentioned embodiments can be made, various changes can be made in an arbitrary component in accordance with any one of the above-mentioned embodiments, and an arbitrary component in accordance with any one of the above-mentioned embodiments can be omitted within the scope of the invention.
Because the beam-forming device in accordance with the present invention can carry out phase synchronization in a fixed beamformer with a high degree of accuracy, the beam-forming device is suitable for use in a sound system having a function of carrying out high-accuracy beamforming which is not affected by variations in a sound field environment.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/JP2012/069997 | 8/6/2012 | WO | 00 | 12/30/2014 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2014/024248 | 2/13/2014 | WO | A |
Number | Name | Date | Kind |
---|---|---|---|
20070076898 | Sarroukh | Apr 2007 | A1 |
20110313763 | Amada | Dec 2011 | A1 |
20120243559 | Pan | Sep 2012 | A1 |
20130114835 | Holmberg | May 2013 | A1 |
20130271902 | Lai | Oct 2013 | A1 |
20150181329 | Mikami | Jun 2015 | A1 |
Number | Date | Country |
---|---|---|
8 122424 | May 1996 | JP |
10 171487 | Jun 1998 | JP |
2006 217649 | Aug 2006 | JP |
2010 152021 | Jul 2010 | JP |
2010 160245 | Jul 2010 | JP |
2010 109708 | Sep 2010 | WO |
Entry |
---|
Ohga, J. et al., “Acoustic systems and digital technology”, The Institute of Electronics, Information and Communication Engineers, pp. 181-186, (Mar. 25, 1995). |
International Search Report Issued Nov. 13, 2012 in PCT/JP12/069997 Filed Aug. 6, 2012. |
Number | Date | Country | |
---|---|---|---|
20150181329 A1 | Jun 2015 | US |