1. Field of the Invention
The present invention relates to a microphone system that executes an adaptive signal processing by using signals outputted from two microphones and outputs a speaker's voice signal with the signal to noise ratio improved.
2. Related Art
Voice recognition technology has now advanced to the point where a recognition rate of about 95% can be achieved in an environment where an SN (signal-to-noise) ratio of more than 15 dB is obtained. However, the conventional voice recognition system has the property that the recognition rate decreases sharply as the SN ratio is lowered by surrounding noise.
Accordingly, inside a car's passenger compartment filled with various noises (engine noise, road noise, pattern noise, whistling noise, etc.) that a running car creates, the deterioration of the foregoing recognition capability is unavoidable. This is a significant problem when incorporating a voice recognition system in a car.
In view of these circumstances, various systems have been proposed that reduce the influence of surrounding noise so that the voice is received with a high SN ratio; one example is the high SN ratio voice reception system using plural microphones and digital signal processing. The simplest configuration of such a high SN ratio voice reception system is illustrated in
In
4 signifies a target response setter that receives a signal outputted from the microphone 1 as the target signal to satisfy the causality. When the signal delay time of half the tap length of the adaptive filter 3b is given by d, the target response setter 4 has a delay characteristic of the time d, and flat characteristic (characteristics of the gain 1) in the audio frequency band. That is, the target response setter 4 is provided with the flat frequency response characteristics of the gain 1 as shown in
Returning to
During the non-recognition of a voice, the microphones 1, 2 receive only noises, and the adaptive signal processor 3 determines the filter coefficient W so as to minimize the power of the error signal e, namely, the noise output. On the other hand, during the recognition of a voice, the adaptive signal processor 3 does not update the filter coefficient, and outputs a voice signal with the adaptive filter 3b set to the filter coefficient W determined during the non-recognition of a voice.
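The adapt-then-freeze operation described above can be sketched as follows. This is a minimal illustration, not the implementation of the conventional system: the function name `lms_noise_canceller`, the tap count, the step size `mu`, and the synthetic noise paths (gains 0.8 and 1.0) are all assumptions chosen for the example. The voice-activity flags are taken as given, since the text does not say how the recognition and non-recognition periods are distinguished.

```python
import numpy as np

def lms_noise_canceller(x1, x2, voice_active, taps=32, mu=0.005):
    """Return (error signal e, final coefficients w).

    x1: samples from the microphone feeding the target response setter.
    x2: samples from the microphone used as the reference signal.
    voice_active: boolean flags; coefficients are updated only where False.
    """
    d = taps // 2                      # target delay: half the tap length
    w = np.zeros(taps)                 # adaptive FIR coefficients W
    buf = np.zeros(taps)               # recent reference samples, newest first
    e = np.zeros(len(x1))
    for n in range(len(x1)):
        buf = np.roll(buf, 1)
        buf[0] = x2[n]
        target = x1[n - d] if n >= d else 0.0   # delayed, gain-1 target response
        e[n] = target - w @ buf                 # subtracter output
        if not voice_active[n]:
            w += mu * e[n] * buf                # LMS update, noise-only periods
    return e, w

# Hypothetical noise-only input: the same white noise reaches both microphones
# with different gains (0.8 and 1.0 are assumed values).
rng = np.random.default_rng(0)
noise = rng.standard_normal(6000)
x1, x2 = 0.8 * noise, noise
e, w = lms_noise_canceller(x1, x2, np.zeros(6000, dtype=bool))
print("residual noise power:", np.mean(e[-1000:] ** 2))
print("coefficient at the delay tap:", w[16])
```

In this idealized case the filter settles on a single dominant coefficient at the delay tap, and passing flags that are all True leaves the coefficients untouched, which corresponds to the frozen filter used during the recognition of a voice.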
The ideal characteristic desired for the system shown in
En(z) = Xn1(z)·z^{−d} − Xn2(z)·W(z)   (1)
Ideally, determining the adjustable parameters (the coefficient W of the adaptive filter 3b) so as to minimize the power of the error signal e in expression (1) should also realize the following expression (2).
Es(z) = Xs1(z)·z^{−d} − Xs2(z)·W(z) ≈ Xs(z)   (2)
Here, Xn1(z), Xn2(z) are the noises contained in the output signals from the microphones 1, 2, and given that the propagation characteristics from a noise source (noise=xn) to the first and second microphones 1, 2 are CN1, CN2,
Xn1(z)=CN1·xn
Xn2(z)=CN2·xn
expression (1) is reduced to the following.
En(z) = (CN1·z^{−d} − CN2·W(z))·xn   (1′)
Further, Xs1(z), Xs2(z) are the voice signals contained in the output signals from the microphones 1, 2, and given that the propagation characteristics from the mouth of a speaker (speaker's voice=xs) to the first and second microphones 1, 2 are CS1, CS2,
Xs1(z)=CS1·xs
Xs2(z)=CS2·xs
expression (2) is reduced to the following.
Es(z) = (CS1·z^{−d} − CS2·W(z))·xs   (2′)
Here, considering the actual conditions in a car passenger compartment, there are many noise sources and the coherence of the noises in the car that the microphones 1, 2 pick up is inclined to decrease, as the distance between the microphones 1, 2 is set larger. Accordingly, as the two microphones 1, 2 are moved further apart, the noise output expressed by the equation (1) becomes greater, so that the microphones 1, 2 need to be laid out as close together as possible.
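The fall-off of coherence with microphone spacing can be illustrated with a textbook model. The sketch below assumes an ideal diffuse (spherically isotropic) noise field and omnidirectional microphones, neither of which is stated in the text; under that assumption the coherence between two points a distance apart is sin(kd)/(kd), with k = 2πf/c. The function name `diffuse_coherence` and the speed of sound of 343 m/s are likewise assumptions.

```python
import numpy as np

def diffuse_coherence(freq_hz, spacing_m, c=343.0):
    """Coherence between two points in an ideal diffuse noise field.

    Assumed model (not from the source text): sin(kd)/(kd), k = 2*pi*f/c.
    """
    kd = 2.0 * np.pi * freq_hz * spacing_m / c
    return np.sinc(kd / np.pi)   # np.sinc(x) is sin(pi*x)/(pi*x)

for spacing_cm in (2, 5, 10, 20):
    gamma = diffuse_coherence(1000.0, spacing_cm / 100.0)
    print(f"{spacing_cm:2d} cm spacing at 1 kHz: coherence {gamma:+.2f}")
```

Under this model the coherence stays close to 1 for a spacing of a few centimetres and drops markedly as the spacing approaches the acoustic wavelength, which is consistent with the qualitative statement above.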
However, if they are laid out as close together as possible, the two microphones 1, 2 will likely receive voice and noise of virtually the same level and components. If the noise is eliminated with the adaptive filter coefficient W determined in the optimum condition for removing the noise, the voice will be eliminated as well. Conversely, if the adaptive filter coefficient W is determined so as to satisfy the expression (2), the voice will not be damaged, but the noise will hardly be eliminated either, and the SN ratio will hardly be improved. This is the problem to be solved.
Thus, in pursuit of achieving the maximum suppression of the noises, it is desirable to lay out the two microphones adjacently. On the other hand, in order to minimize the suppression of the voice, it is desirable that the two microphones are separated far from each other. Both of the two conditions cannot be satisfied at the same time. Therefore, in the conventional microphone system, the SN ratio of the voice signal cannot be improved significantly, which is disadvantageous.
Therefore, it is an object of the invention to provide a microphone system (noise reduction system) using two microphones that improves the SN ratio of the voice signal.
According to one aspect of the invention to accomplish the object, the microphone system executes an adaptive signal processing by using output signals from two microphones and outputs a speaker's voice signal with an improved SN ratio, in which the two microphones having directional characteristics are laid out close to each other, and the angles formed by the orientations of the microphones and a speaker's vocalizing direction are made different for each of the microphones.
With this configuration, in spite of the close layout of the two microphones, one microphone can pick up the speaker's voice with a high SN ratio, and the other microphone can pick up the speaker's voice with a low SN ratio. At the same time, since the close layout of the microphones limits the loss of coherence between the noises outputted from the two microphones, the correlation between the noises received by the microphones can be increased while the difference between their reception sensitivities to a voice is enlarged, thereby improving the SN ratio of the voice signal.
As an example of the microphone layout, the two microphones are mounted adjacently on the sun visor, or on the ceiling above the driver's assistant seat (i.e., front passenger seat) or the driver's seat of a vehicle, with the angles formed by the orientations of the microphones and the speaker's vocalizing direction made different.
Further, according to another aspect of the invention, the microphone system executes the adaptive signal processing by using the output signals from the two microphones and outputs the speaker's voice signal with an improved SN ratio, in which the microphones are laid out adjacently, and the SN ratio of the output signal from one microphone is raised, and the SN ratio of the output signal from the other microphone is decreased.
With this configuration, the noises Xn1(z), Xn2(z) contained in the output signals of the two microphones can be made almost equal. On the other hand, the voice signals Xs1(z), Xs2(z) contained in the output signals of the two microphones can be differentiated. Therefore, when the adaptive filter coefficients are determined to minimize the root mean square of En(z) during the noise signal input, the voice output Es(z) given by the expression (2) does not become zero, thus improving the SN ratio of the voice signal.
As an example of the microphone layout, one microphone is disposed right above a speaker's face, and the other microphone is spaced apart on the occipital side by about 1 to 5 cm from the position of the first microphone. With this configuration, in spite of the adjacent positioning of the two microphones, one microphone can pick up the speaker's voice with as high an SN ratio as possible, and the other microphone can pick up the speaker's voice with as low an SN ratio as possible.
Principle of the Invention
In a noise reduction system using two microphones, it is ideal to intensify the correlation between the reception noises of the microphones and, in addition, to increase the difference between the reception sensitivities of the microphones to a voice. However, there is a trade-off between “the correlation between the reception noises” and “the difference between the reception sensitivities to a voice” of the two microphones: adjusting the distance to satisfy one fails to satisfy the other. For example, as the two microphones are moved closer, the correlation between the reception noises increases, but at the same time the difference between the reception sensitivities to a voice diminishes, so that both microphones receive the voice equally. If the adaptive signal processing is then executed, the noise will be suppressed, but the voice will be suppressed as well, and consequently no improvement of the SN ratio can be expected.
In the present invention, two microphones having directional characteristics are laid out adjacently, and the angles formed by the orientations of the microphones with respect to the speaker's vocalizing direction are different for each microphone. With the microphones positioned in this manner, although the two microphones are laid out adjacently, the configuration of the two can be set such that one microphone picks up the speaker's voice with a high SN ratio, and the other one picks up the speaker's voice with a low SN ratio. Accordingly, the close placement of the two microphones enhances the correlation between the reception noises as well as increases the difference between the reception sensitivities of the two microphones to a voice, which improves the SN ratio of the voice signal.
Further, in this invention, the relatively adjacent layout of the microphones 11, 12 restricts the decrease of the coherence between the noises outputted from the two microphones. Also, in consideration of the voice emission characteristics of a human being, in spite of the relatively adjacent layout of the microphones 11, 12, one microphone 11 picks up the voice with as high an SN ratio as possible, and the other microphone 12 picks up the voice with as low an SN ratio as possible. As the result, if the adaptive filter coefficient W is determined so as to zero the noise output, the voice output will not be diminished in the same manner as the noise output, whereby the SN ratio of the voice signal can be improved.
(a) Configuration of the Microphone System
E(θ)=E0(1+cosθ)/2
and the sensitivity of the microphone decreases as the direction of the microphone deviates from the orientation θ=0°.
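The directional characteristic E(θ) = E0(1 + cos θ)/2 can be evaluated directly. The sketch below is only an illustration: the function names and the choice of comparing orientations of 0° and 90° are assumptions, not values taken from the text.

```python
import math

def cardioid_sensitivity(theta_deg, e0=1.0):
    """E(theta) = E0 * (1 + cos(theta)) / 2, with theta in degrees."""
    return e0 * (1.0 + math.cos(math.radians(theta_deg))) / 2.0

def level_difference_db(theta_a_deg, theta_b_deg):
    """Level difference in dB between two orientations toward the same source."""
    return 20.0 * math.log10(
        cardioid_sensitivity(theta_a_deg) / cardioid_sensitivity(theta_b_deg)
    )

for theta in (0, 45, 90, 135):
    print(f"theta = {theta:3d} deg: E/E0 = {cardioid_sensitivity(theta):.3f}")
print(f"0 deg vs 90 deg: {level_difference_db(0, 90):.1f} dB")
```

A microphone pointed at the source and an identical one turned 90° away differ in sensitivity by a factor of two under this characteristic, which is the kind of voice-sensitivity difference that two closely spaced microphones with different orientations can provide.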
As an example, the first and second microphones 11, 12 in
3 signifies an adaptive signal processor which receives an error signal e and an output signal X2 from the microphone 12 as the reference signal, and executes the adaptive signal processing on the basis of the LMS (Least Mean Square) algorithm so as to minimize the power of the error signal e. In the adaptive signal processor 3, 3a signifies an LMS calculator, 3b an adaptive filter with a configuration of the FIR type digital filter. The LMS calculator 3a determines the coefficients of the adaptive filter 3b so as to minimize the power of the error signal e by the adaptive signal processing. The adaptive signal processor 3 determines the coefficient W of the adaptive filter 3b only during the non-recognition of a voice, by the adaptive signal processing. During the recognition of a voice, the adaptive signal processor 3 does not update the filter coefficient, and sets the filter coefficient W determined during the non-recognition of a voice to the adaptive filter 3b.
4 signifies a target response setter that receives a signal outputted from the microphone 11 as the target signal, and has a delay characteristic of the time d and flat characteristics (characteristics of the gain 1) in the audio frequency band. 5 signifies a subtracter that subtracts the output signal of the adaptive filter 3b from a target response outputted from the target response setter 4, and outputs the error signal e.
According to the layout of the microphones in
(b) Operation
During the non-recognition of a voice, when only noises are inputted to the microphones 11, 12, the adaptive signal processor 3 determines the filter coefficient W of the adaptive filter 3b so as to minimize the power of the error signal e by the adaptive signal processing. Ideally, the filter coefficient W(z) is reduced to the following.
W(z) = CN1·z^{−d}/CN2   (3)
On the other hand, during the recognition of a voice, the adaptive signal processor 3 does not update the filter coefficient, and outputs a voice signal with the adaptive filter 3b set to the filter coefficient W(z) determined during the non-recognition of a voice. As a result, from the expressions (2′) and (3), the voice signal is reduced to the following expression.
Es(z) = (CS1 − CS2·CN1/CN2)·z^{−d}·xs   (4)
Provided that CN1≈CN2 is met by the adjacent layout of the microphones, the voice signal Es(z) of the expression (4) is given by the following expression:
Es(z) = (CS1 − CS2)·z^{−d}·xs   (4′)
From the sensitivity difference of the microphones 11, 12 and the voice emission characteristics, CS1≠CS2 is given; accordingly, the voice signal Es(z) will not be reduced to zero. In other words, even when the adaptive filter coefficient W(z) is determined so as to minimize the power of the error signal e during the noise input, the voice signal Es(z) of the expression (4) is not reduced to zero, and the SN ratio of the voice signal can be improved. And, when CN1≈CN2 is met, the magnitude of the voice signal Es(z) depends mainly on the difference of (CS1−CS2), namely, the difference between the sensitivities of the microphones 11, 12.
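The relations (1′), (2′) and (3) can be checked numerically at a single frequency. The sketch below uses complex numbers as stand-ins for the propagation characteristics; every numerical value, and the helper names `ideal_filter` and `output_gain`, are assumptions made for illustration.

```python
import cmath

def ideal_filter(cn1, cn2, z, d):
    """Expression (3): W(z) = CN1 * z^-d / CN2."""
    return cn1 * z ** (-d) / cn2

def output_gain(c1, c2, w, z, d):
    """Bracketed factor of (1') or (2'): C1 * z^-d - C2 * W(z)."""
    return c1 * z ** (-d) - c2 * w

# All values below are assumed for illustration only.
z = cmath.exp(1j * 0.3)                   # one point on the unit circle
d = 16                                    # target delay in samples
cn1, cn2 = 1.0, 0.95 * cmath.exp(-0.1j)   # nearly equal noise paths
cs1, cs2 = 1.0, 0.5                       # clearly different voice paths

w = ideal_filter(cn1, cn2, z, d)
noise_gain = output_gain(cn1, cn2, w, z, d)
voice_gain = output_gain(cs1, cs2, w, z, d)
print("noise gain magnitude:", abs(noise_gain))
print("voice gain magnitude:", abs(voice_gain))
```

With the filter of expression (3), the noise gain vanishes up to rounding, while the voice gain stays well away from zero because the voice paths differ more than the noise paths do.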
(c) Examination of the Microphone Layout and the SN Ratio Improvement Rate
Thus, to improve the SN ratio, the fundamental philosophy is that, while the two microphones receive noise with as high a correlation as possible, the voice should be received, as far as possible, by only one microphone. Based on this fundamental philosophy, the optimum microphone layout was examined. As the place where the microphones are mounted, (1) the sun visor of a car and (2) the ceiling above the front passenger seat of a car are selected.
(c-1) Layout of the Microphones
a) illustrates a layout with the microphones mounted on the sun visor, in which the first and second microphones 11, 12 are spaced apart with a distance d on the sun visor (not illustrated) in front of the speaker 10, the orientation of the first microphone 11 is fixed to coincide with the speaker's vocalizing direction, and the orientation of the second microphone 12 is set with the angle θ against the speaker's vocalizing direction. The vertical distance H from the speaker's mouth to the microphones, and the horizontal distance D from the speaker's mouth to the microphones are constant, both of which are approximately 30 cm. In the examination of the SN ratio improvement rate,
(1) the positions of the first and second microphones 11, 12 are fixed, and the orientation of the second microphone 12 is varied (refer to
(2) the orientations of the first and second microphones 11, 12 are fixed, and the position of the second microphone 12 is moved to vary the distance between the microphones (refer to
b) illustrates a layout with the microphones mounted on the ceiling above the front passenger seat, in which the first and second microphones 11, 12 are spaced apart a distance d longitudinally on the ceiling above the driver's seat, and the orientations of the first and second microphones 11, 12 are set perpendicularly or with a specific angle θ to the speaker's vocalizing direction. The vertical distance H and horizontal distance D from the speaker's mouth to the microphones are constant, both of which are approximately 30 cm. In the examination of the SN ratio improvement rate,
(3) the orientations of the first and second microphones 11, 12 are set perpendicularly to the speaker's vocalizing direction, and the position of the second microphone 12 is moved (refer to
(4) the orientation of the first microphone 11 is fixed to form the angle θ with respect to the direction perpendicular to the speaker's vocalizing direction (set to face to the speaker's mouth), while the orientation of the second microphone 12 is set perpendicularly to the speaker's vocalizing direction, and the position of the second microphone is varied (refer to
(c-2) Result of the Examination
(1)
(2)
(3)
(4)
Thus, by adopting the microphone layouts of the cases (1) through (4), the SN ratio can be improved by about 4 to 5 dB. This improvement of the SN ratio will enhance the recognition rate to a great extent.
In
(a) Configuration of the Microphone System
3 signifies an adaptive signal processor which receives an error signal e and an output signal x2 from the microphone 12 as the reference signal, and executes the adaptive signal processing on the basis of the LMS (Least Mean Square) algorithm so as to minimize the power of the error signal e. In the adaptive signal processor 3, 3a signifies an LMS calculator, 3b an adaptive filter with a configuration of the FIR type digital filter. The LMS calculator 3a determines the coefficients of the adaptive filter 3b so as to minimize the power of the error signal e by the adaptive signal processing. The adaptive signal processor 3 determines the coefficient W of the adaptive filter 3b only during the non-recognition of a voice, by the adaptive signal processing; and during the recognition of a voice, the adaptive signal processor 3 does not update the filter coefficient, and sets the filter coefficient W determined during the non-recognition of a voice to the adaptive filter 3b.
4 signifies a target response setter that receives a signal outputted from the microphone 11 as the target signal, and has a delay characteristic of the time d and flat characteristics (characteristics of the gain 1) in the audio frequency band. 5 signifies a subtracter that subtracts the output signal of the adaptive filter 3b from a target response signal outputted from the target response setter 4, and outputs the error signal e.
(b) Voice Emission Characteristics of a Human Being
Therefore, if the first microphone 11 is disposed on the ceiling right above the face of the speaker 10, and the second microphone 12 is disposed on the ceiling on the occipital side by about 1 to 5 cm from the first microphone position, as shown in
(c) Operation
During the non-recognition of a voice, when only the noise is inputted to the microphones 11, 12, the adaptive signal processor 3 determines the filter coefficient W of the adaptive filter 3b to minimize the average of {En(z)}^2 in the expression:
En(z) = Xn1(z)·z^{−d} − Xn2(z)·W(z)   (1)
On the other hand, during the recognition of a voice, the adaptive signal processor 3 does not update the filter coefficient, and sets the filter coefficient W determined during the non-recognition of a voice to the adaptive filter 3b to output a voice signal. Here, the voice signals Xs1(z), Xs2(z) contained in the output signals of the microphones 11, 12 are different, and accordingly [Xn1(z)/Xn2(z)]≠[Xs1(z)/Xs2(z)] is satisfied. Therefore, the voice output Es(z) given by the following expression (2) does not become minimum (does not become diminished very much, compared to the noise).
Es(z) = Xs1(z)·z^{−d} − Xs2(z)·W(z)   (2)
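The role of the inequality between the two ratios can be made explicit with a short derivation. It is a sketch under the idealizing assumption, not stated in this form in the text, that the coefficient W(z) cancels the noise exactly, so that En(z) = 0 in expression (1):

```latex
\begin{align*}
E_n(z) = 0 \;&\Rightarrow\; W(z) = \frac{X_{n1}(z)}{X_{n2}(z)}\, z^{-d}, \\
E_s(z) &= X_{s1}(z)\, z^{-d} - X_{s2}(z)\, W(z)
        = X_{s2}(z)\, z^{-d} \left[ \frac{X_{s1}(z)}{X_{s2}(z)} - \frac{X_{n1}(z)}{X_{n2}(z)} \right].
\end{align*}
```

Under that assumption the voice output vanishes only if the two ratios are equal, and its magnitude grows with the difference between them.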
Thus, when the adaptive filter coefficient W is determined to zero the power of the noise output En(z) given by the expression (1), the voice output Es(z) given by the expression (2) does not become as diminished as the noise, and the SN ratio of the voice signal can be improved accordingly.
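The effect on the SN ratio can be illustrated with synthetic signals. The sketch below is not the measurement reported in this document: all signals are random stand-ins, the gains (1.0 and 0.3 for the voice, a shared noise component plus 10% independent noise per microphone) are assumed, and a batch least-squares fit replaces the LMS iteration for brevity. Noise and voice are passed through the fixed filter separately, mirroring expressions (1) and (2).

```python
import numpy as np

def delay_matrix(x, taps):
    """Column i holds x delayed by i samples (zeros before the start)."""
    padded = np.concatenate([np.zeros(taps - 1), x])
    return np.stack(
        [padded[taps - 1 - i : taps - 1 - i + len(x)] for i in range(taps)], axis=1
    )

def delayed(x, d):
    """x delayed by d samples, same length as x."""
    return np.concatenate([np.zeros(d), x[: len(x) - d]])

def power_db(x):
    return 10.0 * np.log10(np.mean(x ** 2))

rng = np.random.default_rng(1)
n, taps = 8000, 16
d = taps // 2                                   # target delay, half the tap length

# Assumed signal model: highly correlated noise, unequal voice levels.
common = rng.standard_normal(n)
xn1 = common + 0.1 * rng.standard_normal(n)     # noise at microphone 11
xn2 = common + 0.1 * rng.standard_normal(n)     # noise at microphone 12
voice = rng.standard_normal(n)                  # stand-in for a voice signal
xs1, xs2 = 1.0 * voice, 0.3 * voice             # voice at microphones 11 and 12

# Coefficients minimizing the noise output power (least squares instead of LMS).
w, *_ = np.linalg.lstsq(delay_matrix(xn2, taps), delayed(xn1, d), rcond=None)

en = delayed(xn1, d) - delay_matrix(xn2, taps) @ w   # expression (1)
es = delayed(xs1, d) - delay_matrix(xs2, taps) @ w   # expression (2)

snr_in = power_db(xs1) - power_db(xn1)
snr_out = power_db(es) - power_db(en)
print(f"SN ratio at microphone 11: {snr_in:.1f} dB, at the output: {snr_out:.1f} dB")
```

Because the model is idealized, the improvement it shows should not be read as a prediction for a real passenger compartment; it only demonstrates that the voice survives a filter tuned to cancel the noise when the voice levels at the two microphones differ.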
To summarize the above explanations, the relatively close disposition of the microphones 11, 12 as shown in
(d) Examination of the Microphone Position and the SN Ratio Improvement Rate
The emission characteristics in
Although the embodiment in which the two microphones are positioned above the head of the speaker has been explained, if one microphone can pick up a voice with as high an SN ratio as possible, and the other microphone can pick up the voice with as low an SN ratio as possible in the condition of a relatively adjacent disposition of the two microphones, the positioning is not limited to “above the head”.
Thus, according to the invention, since the two microphones having directional characteristics are positioned adjacently, and in addition the angles formed by the orientations of the microphones relative to the speaker's vocalizing direction are different for each microphone, the SN ratio of a voice signal outputted from one microphone can be raised, and the SN ratio of the voice signal outputted from the other microphone can be lowered. Consequently, if the adaptive filter coefficient is determined to minimize the noise output, the voice signal output will not become zero, which improves the SN ratio of the voice signal.
Further, according to the invention, with a simplified configuration such that the microphones are mounted on the sun visor of a car, or on the ceiling above the front passenger seat or the driver's seat, and the orientations of the microphones are different, in spite of the relatively adjacent positioning of the microphones, one microphone can pick up a voice with as high an SN ratio as possible, and the other microphone can pick up the voice with as low an SN ratio as possible, thus improving the SN ratio.
Further, according to the invention, since the two microphones are laid out adjacently, and the SN ratio of a voice signal outputted from one microphone is raised while the SN ratio of the voice signal outputted from the other microphone is lowered, if the adaptive filter coefficient is determined to minimize the noise output, the voice signal output will not become zero, which improves the SN ratio of the voice signal. In other words, in spite of the limited number of microphones, the microphone system is able to receive and output the voice signal with a high SN ratio.
Also, according to the invention, with the layout of one microphone on the ceiling right above the face of the speaker and the layout of the other microphone on the ceiling on the occipital side by about 1 to 5 cm from the position of the first microphone, in spite of the relatively adjacent layout of the microphones, the first microphone can pick up a voice with as high an SN ratio as possible, and the other microphone can pick up the voice with as low an SN ratio as possible.
The invention being thus described, it will be obvious that the same may be varied in many ways. Such variations are not to be regarded as a departure from the spirit and scope of the invention, and all such modifications as would be obvious to one skilled in the art are intended to be included within the scope of the following claims.
Number | Date | Country | Kind |
---|---|---|---|
11-121517 | Apr 1999 | JP | national |
11-121518 | Apr 1999 | JP | national |