The present invention relates to a system which can provide a sound solution using a microphone array which is set to form a plurality of beams. More particularly, the present invention relates to a multi-beam sound system which is disposed inside a vehicle such that it can provide an efficient sound solution inside the vehicle.
Recently, in response to the development of a variety of technologies such as Bluetooth, in-vehicle sound solutions (e.g. a voice call solution) have become convenient and are being actively used. However, their call quality has not yet reached that of a typical mobile phone, and the operating environment is poorer because of problems such as noise inside a moving vehicle and echo caused by the use of a loudspeaker. In addition, as voice recognition becomes more common, a reasonably high speech recognition success rate is expected to be guaranteed inside the vehicle.
For in-vehicle calling and speech recognition, a voice signal must first be inputted through a microphone. When the voice signal is inputted through only a single microphone, a sufficient signal-to-noise ratio (SNR) cannot be ensured. In addition, the voice signal is very vulnerable to acoustic interference such as driving noise, and to the distortion and echo caused by the vehicle cabin, which is problematic.
In addition, a sound solution for in-vehicle calling or speech recognition is required to receive the voices of other occupants as well as that of the driver. To address these problems, a beam can be formed using a microphone array, which improves the input SNR and makes the system robust against acoustic interference signals.
As an approach for forming such a beam, adaptive beamformer techniques have been disclosed. Among them, the linearly constrained minimum variance (LCMV) adaptive beamformer was disclosed by Otis Lamont Frost III in 1972.
The adaptive beamformer is frequently used for in-vehicle sound solutions. It is generally used to provide a more efficient sound solution inside a vehicle by adaptively changing the direction of a beam in response to a sound source (e.g. a person speaking) or a noise.
However, even when the beam is formed using the adaptive beamformer of the related art, a microphone array using only one beam has difficulty receiving the voices of several persons, or receives them with low performance.
This can require a microphone array system having two or more beams. Moreover, because various noises and interference signals are present, the difference in steering direction between an interference signal and a desired signal can be small, which requires more precise beamforming.
Therefore, an object of the present invention is to provide a microphone array which can form a multiplicity of beams and an adaptive beamformer for the microphone array.
Also provided is a multi-beam sound system which is more robust in the in-vehicle environment as described above, and to which an adaptive algorithm is applied for this purpose.
Also provided is a beamformer which uses a self-tuning algorithm having a relatively small amount of computation and is robust against non-stationary interference signals and echo.
According to an aspect of the invention for realizing the foregoing object, provided is a multi-beam sound system that includes: a microphone array disposed at a predetermined position inside a vehicle, the microphone array comprising a plurality of microphones, and receiving an input signal; and an adaptive beamformer which forms beams of the microphone array. The adaptive beamformer forms at least two beams of the microphone array in different directions.
The beamformer may include a fixed beamforming section which steers the input signal inputted from the microphone array to an intended direction; a blocking matrix which receives the input signal and acquires a noise reference signal from the input signal; a variable beamforming section which acquires an adaptive noise signal from the noise reference signal outputted from the blocking matrix; and a generalized sidelobe canceller (GSC) which includes canceling means for outputting an object signal from the input signal outputted from the fixed beamforming section by removing the adaptive noise signal from the input signal. The fixed beamforming section steers the input signal in at least two directions.
The generalized sidelobe canceller (GSC) may be designed under constraints according to the following formula in order to steer the input signal in at least two directions:
where $C_i$ indicates the ith constraint matrix, $a(\theta_i)$ indicates a steering vector, w is a weight vector, and f indicates the desired impulse response.
The variable beamforming section may acquire the adaptive noise signal using a self-tuning recursive least squares (RLS) algorithm.
The microphone array may be disposed at a predetermined position corresponding to two seats from among seats provided inside the vehicle. The predetermined position may be situated at a predetermined point between the two seats on a line orthogonal to a direction of the two seats, at least one of the at least two beams may be formed in a direction facing toward a seat which is positioned on one side with respect to the orthogonal line, and the other at least one of the at least two beams may be formed in a direction facing toward a seat which is positioned on the other side with respect to the orthogonal line.
The multi-beam sound system may further include an echo remover which removes an echo signal from the input signal when the echo signal based on a sound signal outputted from an audio-video-navigation (AVN) device of the vehicle is included in the input signal.
The echo remover may receive information about the sound signal from the audio-video-navigation (AVN) device, store the received information about the sound signal, estimate the echo signal based on the stored information about the sound signal, and remove the estimated echo signal from the input signal.
The multi-beam sound system may further include a second microphone array which is disposed so as to correspond to at least one seat from among the seats provided inside the vehicle except for the two seats.
The multi-beam sound system may further include a second adaptive beamformer, wherein the adaptive beamformer or the second adaptive beamformer forms at least one beam of the second microphone array.
According to another aspect of the invention for realizing the foregoing object, provided is a multi-beam sound system that includes: an adaptive beamformer which forms beams of a microphone array, the microphone array being disposed at a predetermined position inside a vehicle, comprising a plurality of microphones, and receiving an input signal; and an echo remover which removes an echo signal from an output signal outputted from the microphone array, the echo signal being based on a sound signal outputted from an audio-video-navigation (AVN) device of the vehicle. The adaptive beamformer forms at least two beams of the microphone array.
The input signal may include a speaker signal or a voice command signal of an occupant inside the vehicle. The multi-beam sound system may output an echo-removed speaker signal or an echo-removed voice command signal to the audio-video-navigation (AVN) device by removing the echo signal from the speaker signal or the voice command signal inputted through at least one beam of the at least two beams. The audio-video-navigation (AVN) device may transmit the echo-removed speaker signal to a counterpart communication device or output a control signal for controlling a predetermined device of the vehicle in response to the echo-removed voice command signal.
According to a further aspect of the invention for realizing the foregoing object, provided is a multi-beam sound system that includes: a fixed beamforming section which steers the input signal inputted from the microphone array to an intended direction; a blocking matrix which receives the input signal and acquires a noise reference signal from the input signal; a variable beamforming section which acquires an adaptive noise signal from the noise reference signal outputted from the blocking matrix; and a generalized sidelobe canceller (GSC) which includes canceling means for outputting an object signal from the input signal outputted from the fixed beamforming section by removing the adaptive noise signal from the input signal. The fixed beamforming section steers the input signal in at least two directions.
The generalized sidelobe canceller (GSC) may be designed under constraints according to the following formula in order to steer the input signal in at least two directions:
where $C_i$ indicates the ith constraint matrix, $a(\theta_i)$ indicates a steering vector, w is a weight vector, and f indicates the desired impulse response.
Since the multi-beam sound system according to the invention can adaptively form a plurality of beams, there is an effect in that the recognition of a plurality of sound sources can be improved.
In addition, when the multi-beam sound system according to the invention is applied to a vehicle, there is an effect in that not only a voice of the driver but also voices of other passengers can be efficiently received.
Furthermore, the present invention can be robust against noises and echo inside and outside the vehicle. In particular, when the echo remover is provided, there is an effect in that the invention can be more robust against noises and echo.
In addition, there is an effect in that beams robust against non-stationary interference signals and echo can be formed within a relatively short time using the self-tuning algorithm having a relatively small amount of computation.
Brief descriptions of individual figures are given in order to enhance understanding of the drawings which are referred to in the detailed description of the invention.
The present invention, advantages associated with the operation of the present invention and objects that are realized by the practice of the present invention will be apparent from the accompanying drawings which illustrate exemplary embodiments of the invention and the detailed description of the invention which are illustrated in the drawings.
Throughout the specification, it will be understood that, when an element is referred to as “transmitting” data to another element, the element can transmit the data to the other element either directly or indirectly via at least one intervening element.
In contrast, when an element is referred to as “directly transmitting” data to another element, the element transmits the data to the other element without an intervening element.
The present invention will now be described more fully hereinafter with reference to the accompanying drawings, in which exemplary embodiments thereof are shown. Reference should be made to the drawings, in which the same reference numerals are used throughout the different drawings to designate the same or similar components.
First, referring to
In this way, the multi-beam sound system 1 according to an embodiment of the invention can form at least two beams based on the technical principle described later. A plurality of beams can thus be easily formed so that voice signals are received not only from the driver of the vehicle but also from passengers.
The microphone array 200 can be disposed such that it corresponds to a plurality of seats. The configuration in which the microphone array 200 is disposed so as to correspond to the plurality of seats can indicate that the beams are formed in the direction facing toward the positions of the plurality of seats. For example, when the microphone array 200 forms two beams, as shown in
According to an embodiment of the invention, when two beams are formed by the microphone array 200, the two beams can be realized such that they are formed in different directions about the center of the microphone array 200. Forming the beams in different directions like this can have the effect of reducing the influence of interference between the beams.
Therefore, as shown in
Referring to
The second microphone array 200-1 can be disposed or embedded at a position where it can easily receive the voices of passengers seated in rear seats. For example, as shown in
In addition, since a plurality of beams is formed, there is an effect in that the amount of data to be computed in real time is smaller than in the case in which one beam is formed and the direction of the beam is changed in real time.
Referring to
The control unit 100 can be connected to a predetermined audio-video-navigation (AVN) unit 300. An output signal from the control unit 100 (e.g. a signal received through the microphone array 200) can be transmitted to a predetermined device (e.g. a counterpart communication device or a device which is intended to execute a voice command) via the AVN unit 300. In addition, the control unit 100 can receive a signal that is to be transmitted from the AVN unit 300 to the outside or a sound signal that is to be outputted through a speaker, and use the received signal. For example, the echo remover 120 included in the control unit 100 can receive information about the signal that is to be transmitted from the AVN unit 300 to the outside or information about the sound signal that is to be outputted into the vehicle, and estimate an echo signal in an input signal inputted from the microphone array 200 using the received information, thereby cancelling the estimated echo signal.
When the multi-beam sound system 1 according to an embodiment of the invention is disposed in the vehicle, the AVN unit 300 can be any type of sound system provided in the vehicle, such as an audio system, a video system, a navigation system or a voice call system. In addition, when the multi-beam sound system 1 is used in a place other than a vehicle, the AVN unit 300 can be any type of sound system which can transmit a signal to another device or output a signal received from the outside.
When a sound signal is outputted from a speaker, which is a constituent part of the AVN unit 300, and is inputted again as a part of an input signal that is inputted through the microphone array 200, the echo remover 120 can perform the function of removing an echo signal from the input signal. In other words, the echo remover 120 removes the echo signal from the input signal inputted from the microphone array 200, and the input signal from which the echo signal is removed can be transmitted to the AVN unit 300.
For this, the echo remover 120 can receive from the AVN unit 300 information about the sound signal that is (to be) outputted through the speaker, and temporarily store the received information. The echo signal can then be estimated based on the information about the sound signal, and the estimated echo signal can be removed from the input signal inputted through the microphone array 200. In this way, the echo remover 120 can use the technical idea of storing in advance a signal that is to be outputted through the speaker and removing an echo signal using the stored signal. This technical idea is disclosed in a Korean patent application filed by the applicant on Nov. 18, 2009 (Korean Patent Application No. 10-2009-0111323, titled “SIGNAL SEPARATION METHOD, AND COMMUNICATION SYSTEM AND VOICE RECOGNITION SYSTEM USING THE SIGNAL SEPARATION METHOD,” hereinafter referred to as “earlier-filed application”). The technical idea and entire contents of the earlier-filed application are incorporated herein by reference.
The echo remover 120 can separate the echo signal from the input signal in real time using a modified blind source separation (BSS) algorithm, as disclosed in the earlier-filed application. According to that modified BSS algorithm, the two signals can be separated from each other using only one microphone. Specifically, when the input signal inputted from the microphone array 200 is outputted as an object signal through the adaptive beamformer 110, the echo remover 120 can remove the echo signal by treating the object signal as an input signal inputted through one microphone, as in the earlier-filed application. Of course, the echo remover 120 can also remove the echo by a variety of other known techniques in addition to the technical idea disclosed in the earlier-filed application.
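The modified BSS method of the earlier-filed application is not reproduced here. As an illustration of the other known techniques mentioned above, the following Python sketch is a normalized least-mean-squares (NLMS) acoustic echo canceller: it stores the most recent loudspeaker samples received from the AVN side, estimates the echo from them, and subtracts the estimate from the beamformer output. The class name, the step size `mu`, and the filter length are hypothetical choices, not values from this description.

```python
from collections import deque
from typing import Deque, List


class NlmsEchoCanceller:
    """Hypothetical NLMS acoustic echo canceller (a common, well-known technique)."""

    def __init__(self, taps: int, mu: float = 0.5, eps: float = 1e-8) -> None:
        self.h: List[float] = [0.0] * taps                        # echo-path estimate
        self.x: Deque[float] = deque([0.0] * taps, maxlen=taps)  # stored reference, newest first
        self.mu = mu
        self.eps = eps                                            # guards against division by zero

    def process(self, far_end: float, mic: float) -> float:
        """Store one loudspeaker sample, cancel its echo from one mic sample, then adapt."""
        self.x.appendleft(far_end)                                  # oldest sample drops off the right
        echo_hat = sum(hi * xi for hi, xi in zip(self.h, self.x))  # estimated echo
        e = mic - echo_hat                                          # echo-removed output sample
        step = self.mu * e / (self.eps + sum(xi * xi for xi in self.x))
        self.h = [hi + step * xi for hi, xi in zip(self.h, self.x)]  # NLMS update
        return e
```

In practice such a canceller also needs double-talk handling, so that near-end speech from the occupants does not disturb the adaptation.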
For example, when a voice signal from a driver or passenger in the vehicle is received through at least one of the two beams formed by the microphone array 200, the voice signal can be a speaker signal for calling or a voice command signal for a voice recognition command. The speaker signal or voice command signal can then be outputted through the adaptive beamformer 110 to the echo remover 120. The echo remover 120 can remove an echo signal from the speaker signal or voice command signal and output the resultant signal to the AVN unit 300. The AVN unit 300 can then transmit the echo-removed speaker signal to a counterpart communication device, or output a control signal for controlling a predetermined device subject to the voice recognition command (e.g., a navigation device, a window of the vehicle, or other devices of the vehicle). When the input signal is a voice recognition command, the control unit 100 and/or the AVN unit 300 can include a voice recognition device (not shown) which recognizes a voice signal and converts it into a command signal. If the voice recognition device (not shown) is provided as a part of the AVN unit 300, the signal outputted from the echo remover 120 is inputted into the voice recognition device (not shown), which can then generate the control signal based on that signal and output the generated control signal to a predetermined device of the vehicle.
In addition, as shown in
The second microphone array 200-1 can form at least one beam, as described above. For example, it can form one beam such that the direction of the beam is adaptively changed depending on the position of a passenger in a rear seat, or form two or more beams that face toward the positions where passengers are to be seated in rear seats.
In addition, the function of the adaptive beamformer 110 will be described with reference to
First, referring to
It can be assumed that the beamformer shown in
Here, an input signal and a weight vector can be defined as follows:
$$X(n) = \left[\vec{x}(n)^T \;\; \vec{x}(n-1)^T \;\; \cdots \;\; \vec{x}(n-J+1)^T\right]^T$$
$$\vec{x}(n) = \left[x_1(n) \;\; x_2(n) \;\; \cdots \;\; x_K(n)\right]^T \qquad \text{[Formula 1]}$$
$$W(n) = \left[w_1(n) \;\; w_2(n) \;\; \cdots \;\; w_{KJ}(n)\right]^T \qquad \text{[Formula 2]}$$
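Here, n indicates the sample number, $\vec{x}(n)$ indicates the input signal vector holding one sample from each of the K microphones, X(n) indicates the stacked input vector spanning J taps, and W(n) indicates the weight vector with elements $w_1(n)$ to $w_{KJ}(n)$.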
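To make the tap stacking of Formula 1 concrete, the following Python sketch assembles X(n) from per-sample K-channel snapshots and evaluates the beamformer output $y(n) = W(n)^T X(n)$. It assumes real-valued samples and keeps J snapshots so that X(n) has the KJ elements that W(n) in Formula 2 requires; the function names `stack_taps` and `beamformer_output` are illustrative only.

```python
from typing import List, Sequence


def stack_taps(frames: Sequence[Sequence[float]], n: int, J: int) -> List[float]:
    """Return X(n) = [x(n)^T x(n-1)^T ... x(n-J+1)^T]^T as one flat list.

    frames[m] is the K-channel snapshot x(m) = [x_1(m), ..., x_K(m)].
    Snapshots before m = 0 are taken as zeros (an assumption for start-up).
    """
    K = len(frames[0])
    X: List[float] = []
    for j in range(J):  # group j holds the snapshot delayed by j samples
        m = n - j
        X.extend(frames[m] if m >= 0 else [0.0] * K)
    return X


def beamformer_output(W: Sequence[float], X: Sequence[float]) -> float:
    """y(n) = W(n)^T X(n) for real-valued time-domain samples."""
    if len(W) != len(X):
        raise ValueError("W(n) and X(n) must both have K*J elements")
    return sum(w * x for w, x in zip(W, X))
```

For example, with K = 2 and J = 2, `stack_taps([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], 2, 2)` yields `[5.0, 6.0, 3.0, 4.0]`: the newest snapshot first, then the one delayed by a sample.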
In addition, linear constraints are given as follows:
$$C^T W = F \qquad \text{[Formula 3]}$$
In the above formula, F is the desired impulse response, a J×1 vector with elements $f_1$ to $f_J$. As shown in
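$$C = \left[\vec{c}_1 \;\; \cdots \;\; \vec{c}_j \;\; \cdots \;\; \vec{c}_J\right]$$
where,
$$\vec{c}_j = \left[\mathbf{0}_K^T \;\; \cdots \;\; \mathbf{0}_K^T \;\; \vec{a}(\theta)^T \;\; \mathbf{0}_K^T \;\; \cdots \;\; \mathbf{0}_K^T\right]^T \qquad \text{[Formula 4]}$$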
In Formula 4 above, $\mathbf{0}_K$ indicates a zero column vector of length K, and $\vec{a}(\theta)$ indicates a steering vector of length K. Therefore, $\vec{c}_j$ is a column vector of length KJ in which the steering vector occupies the jth group of K elements and all other elements are 0.
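A minimal Python sketch of Formula 4 follows. It assumes a hypothetical uniform linear array with microphone spacing `spacing` and a single narrowband design frequency `freq` for the steering vector; the array geometry, the 343 m/s speed of sound, and the helper names are illustrative assumptions, not part of the described system. Each column of C carries $\vec{a}(\theta)$ in its own group of K rows.

```python
import cmath
import math
from typing import List, Sequence

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C


def ula_steering_vector(K: int, spacing: float, freq: float, theta_deg: float) -> List[complex]:
    """Narrowband steering vector of a hypothetical uniform linear array.

    theta is measured from broadside; microphone k lags microphone 0 by
    k * spacing * sin(theta) / c seconds, i.e. a phase factor exp(-j*2*pi*f*k*tau).
    """
    tau = spacing * math.sin(math.radians(theta_deg)) / SPEED_OF_SOUND
    return [cmath.exp(-2j * math.pi * freq * k * tau) for k in range(K)]


def constraint_matrix(a: Sequence[complex], J: int) -> List[List[complex]]:
    """C = [c_1 ... c_J] from Formula 4, stored as KJ rows by J columns.

    Column j (0-based here) is zero except rows j*K .. j*K + K - 1, which hold a(theta).
    """
    K = len(a)
    C = [[0j] * J for _ in range(K * J)]
    for j in range(J):
        for k in range(K):
            C[j * K + k][j] = complex(a[k])
    return C
```

At broadside (θ = 0) the steering vector is all ones, which is the presteered case often assumed for time-domain broadband beamformers.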
The typical broadband beamformer of the related art is designed such that it has only one beam, i.e. one mainlobe, in which the constraint matrix can be expressed by the following formula.
Here, a(θ) indicates the steering vector in the look direction, and the impulse response f indicates the desired response in that direction. In addition, as will be described later, the broadband beamformer can be implemented as a generalized sidelobe canceller (GSC), in which the columns of the blocking matrix $C_a$ span the orthogonal complement of the column space of C (the null space of $C^H$), so that its size is KJ×(K−1)J. Accordingly, the adaptive weight vector has size (K−1)J×1.
The adaptive beamformer 110 according to an embodiment of the invention can also be configured based on the broadband LCMV adaptive beamformer shown in
Let $C_a$ be a matrix whose columns span the orthogonal complement of the column space of the constraint matrix C of Formula 3. A KJ×KJ matrix U and a KJ×1 vector $\vec{q}$ can then be defined as follows:
$$U = \left[C \;\; C_a\right] \qquad \text{[Formula 6]}$$
$$\vec{q} = U^{-1}\vec{w} = \left[\vec{v}^T \;\; -\vec{w}_a^T\right]^T \qquad \text{[Formula 7]}$$
According to Formula 6 and Formula 7, the weight vector can be expressed as follows:
$$\vec{w} = U\vec{q} = C\vec{v} - C_a\vec{w}_a \qquad \text{[Formula 8]}$$
According to Formula 8 and Formula 3, the following formula can be obtained.
$$C^H C\vec{v} - C^H C_a\vec{w}_a = F \qquad \text{[Formula 9]}$$
By the definition of the orthogonal complement, $C^H C_a = 0$, so Formula 9 reduces to:
$$\vec{v} = \left(C^H C\right)^{-1} F \qquad \text{[Formula 10]}$$
Substituting Formula 10 into the first term of Formula 8 gives the fixed beamformer component of the weight vector:
$$\vec{w}_q = C\vec{v} = C\left(C^H C\right)^{-1} F \qquad \text{[Formula 11]}$$
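Under the single-beam structure of Formula 4, Formula 11 simplifies: the columns of C occupy disjoint row groups, so $C^H C = (\vec{a}^H\vec{a})\,I_J$ and the matrix inverse becomes a scalar division. The Python sketch below (hypothetical helper names) computes $\vec{w}_q$ this way and checks, using $C^H$ as in Formulas 9 to 11, that it reproduces F. It is also the minimum-norm weight vector satisfying the constraints.

```python
from typing import List, Sequence


def quiescent_weights(a: Sequence[complex], F: Sequence[complex]) -> List[complex]:
    """w_q = C (C^H C)^{-1} F (Formula 11) for the single-beam C of Formula 4.

    The columns of C occupy disjoint groups of K rows, so C^H C = (a^H a) I_J and
    the matrix inverse reduces to dividing by the scalar a^H a.
    """
    K, J = len(a), len(F)
    norm2 = sum(abs(x) ** 2 for x in a)  # a^H a
    w_q = [0j] * (K * J)
    for j in range(J):
        for k in range(K):
            w_q[j * K + k] = a[k] * F[j] / norm2  # group j of C F, scaled
    return w_q


def constraint_response(a: Sequence[complex], w: Sequence[complex]) -> List[complex]:
    """Evaluate C^H w group by group; the constraints require this to equal F."""
    K = len(a)
    return [sum(a[k].conjugate() * w[j * K + k] for k in range(K))
            for j in range(len(w) // K)]
```

A delayed unit pulse such as `F = [0.0, 1.0, 0.0]` is one simple choice of desired response; the description does not fix F.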
According to Formula 8 and Formula 11, the weight vector can be expressed as follows:
$$\vec{w} = \vec{w}_q - C_a\vec{w}_a \qquad \text{[Formula 12]}$$
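Formula 12 means the beamformer can run as two branches instead of one composite weight vector. The Python sketch below, for real-valued data and with hypothetical names, computes the output both ways: as the fixed-beam output minus the adaptive noise estimate $\vec{w}_a^T(C_a^T X)$, and as $\vec{w}^T X$ with $\vec{w} = \vec{w}_q - C_a\vec{w}_a$. The two should agree to rounding error.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(p * q for p, q in zip(u, v))


def block(C_a: Matrix, X: Sequence[float]) -> List[float]:
    """Noise reference u = C_a^T X (real-valued data, so ^H equals ^T)."""
    return [sum(C_a[r][c] * X[r] for r in range(len(X))) for c in range(len(C_a[0]))]


def gsc_output(w_q: Sequence[float], C_a: Matrix, w_a: Sequence[float], X: Sequence[float]) -> float:
    """Two-branch GSC: fixed-beam output minus adaptive noise estimate."""
    d = _dot(w_q, X)         # fixed beamforming section
    u = block(C_a, X)        # blocking matrix
    return d - _dot(w_a, u)  # canceller removes the adaptive noise signal w_a^T u


def direct_output(w_q: Sequence[float], C_a: Matrix, w_a: Sequence[float], X: Sequence[float]) -> float:
    """Same output via the composite weight w = w_q - C_a w_a (Formula 12)."""
    w = [w_q[r] - _dot(C_a[r], w_a) for r in range(len(w_q))]
    return _dot(w, X)
```

With two microphones, `C_a = [[1.0], [-1.0]]` is a blocking matrix for a broadside look direction: `block(C_a, [1.0, 1.0])` returns `[0.0]`, so the desired signal never reaches the adaptive branch.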
From Formula 12, the GSC can be produced, and the structure of the GSC can be the same as in
Referring to
In addition, whereas the existing GSC is designed to satisfy Formula 3, the constraints used to construct the blocking matrix 112, which generates the noise reference signal, can be set according to an embodiment of the invention to satisfy the following formula.
The fixed weight vector used by the fixed beamforming section 111 and the matrix $C_a$ used by the blocking matrix 112 can be preset to the design values of the adaptive beamformer 110. The fixed weight vector is set to correspond to the plurality of beams formed according to an embodiment of the invention, and the constraint matrix C from which $C_a$ is derived is designed according to Formula 13.
In the beamformer of the related art, the constraint matrix C has size KJ×J. In contrast, for the adaptive beamformer 110 having a plurality of beams (mainlobes) according to an embodiment of the invention, C has size KJ×NJ, where N is the number of mainlobes. Correspondingly, the impulse response vector F grows from J×1 to NJ×1 and is formed by vertically stacking N impulse responses, one per beam.
The design values of the adaptive beamformer 110 are modified accordingly. In particular, the adaptive weight vector has size (K−N)J×1, and the blocking matrix $C_a$, which maps the adaptive weight vector into the KJ-dimensional weight space (Formula 12), has size KJ×(K−N)J. $C_a$ can be obtained by computing the singular value decomposition of C and taking the (K−N)J left singular vectors that remain after the NJ vectors associated with its nonzero singular values; these span the orthogonal complement of the column space of C.
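The Python sketch below builds a multi-beam constraint matrix and a matching blocking matrix. It assumes that Formula 13 (not reproduced here) stacks one Formula 4-style block per beam, $C = [C_1 \cdots C_N]$ with $\vec{a}(\theta_i)$ in tap group j of column (i, j); that form is our assumption. Instead of the SVD described above, it uses modified Gram-Schmidt on the per-tap structure, a swapped-in technique that yields an orthonormal basis of the same orthogonal complement. Any orthonormal basis of that subspace can serve as $C_a$. The function names are hypothetical.

```python
import math
from typing import List, Sequence

Vec = List[complex]


def _inner(u: Sequence[complex], v: Sequence[complex]) -> complex:
    """Return u^H v."""
    return sum(x.conjugate() * y for x, y in zip(u, v))


def multibeam_constraints(steering: Sequence[Sequence[complex]], J: int) -> List[Vec]:
    """C = [C_1 ... C_N] as KJ rows by NJ columns (assumed form of Formula 13).

    Column i*J + j holds a(theta_i) in tap group j and zeros elsewhere.
    """
    K, N = len(steering[0]), len(steering)
    C = [[0j] * (N * J) for _ in range(K * J)]
    for i, a in enumerate(steering):
        for j in range(J):
            for k in range(K):
                C[j * K + k][i * J + j] = complex(a[k])
    return C


def per_tap_blocker(steering: Sequence[Sequence[complex]], tol: float = 1e-9) -> List[Vec]:
    """Orthonormal basis of the complement of span{a(theta_1), ..., a(theta_N)} in C^K.

    Modified Gram-Schmidt over the steering vectors, then the unit vectors
    e_1..e_K; unit vectors that survive span the blocked subspace (K - N of them
    when the steering vectors are linearly independent).
    """
    K, N = len(steering[0]), len(steering)
    candidates = [[complex(x) for x in a] for a in steering]
    candidates += [[1 + 0j if r == k else 0j for r in range(K)] for k in range(K)]
    basis: List[Vec] = []
    blocker: List[Vec] = []
    for idx, v in enumerate(candidates):
        for b in basis:
            coef = _inner(b, v)
            v = [vi - coef * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(abs(x) ** 2 for x in v))
        if norm < tol:
            continue  # already spanned by earlier vectors
        v = [x / norm for x in v]
        basis.append(v)
        if idx >= N:
            blocker.append(v)
    return blocker


def blocking_matrix(steering: Sequence[Sequence[complex]], J: int) -> List[Vec]:
    """C_a = blockdiag(B, ..., B) with J copies: KJ rows by (K - N)J columns."""
    B = per_tap_blocker(steering)
    K, M = len(steering[0]), len(B)
    C_a = [[0j] * (M * J) for _ in range(K * J)]
    for j in range(J):
        for m, col in enumerate(B):
            for k in range(K):
                C_a[j * K + k][j * M + m] = col[k]
    return C_a
```

Under the same assumption, the matching F for two beams is simply the two per-beam responses concatenated, e.g. `f_1 + f_2` for two Python lists.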
In general, for a GSC without constraints, the degrees of freedom of the variable beamforming section 113 equal the number of sensors (microphones). With constraints, the degrees of freedom become the number of sensors (microphones) minus the number of constraints, as the size of the adaptive weight vector above shows. As the number of constraints increases, the degrees of freedom of the variable beamforming section 113 decrease and the interference-cancelling performance of the GSC deteriorates. Therefore, the number of beams that are actually usable is limited by the number of sensors (microphones). When the microphone array 200 is implemented with 4 sensors (microphones), about 1 or 2 beams are practically meaningful. Accordingly, the microphone array 200 preferably includes 4 or more sensors (microphones).
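The degree-of-freedom count above is simple arithmetic. This small Python helper (a hypothetical name) returns the adaptive weight length (K−N)J and shows how quickly a 4-microphone array runs out of free weights as beams are added.

```python
def adaptive_dof(K: int, N: int, J: int = 1) -> int:
    """Adaptive weight length (K - N) * J for K microphones, N beams and J taps."""
    if not 1 <= N <= K:
        raise ValueError("need 1 <= N <= K")
    return (K - N) * J


# Free weights per tap for a 4-microphone array; prints N=1: 3, N=2: 2, N=3: 1, N=4: 0
for beams in range(1, 5):
    print(f"N={beams}: {adaptive_dof(4, beams)}")
```

With N = K nothing is left for the variable beamforming section, which is why only one or two beams are practical for four microphones.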
In the meantime, the adaptive beamformer 110 according to an embodiment of the invention can use a self-tuning recursive least squares (RLS) algorithm to adjust the adaptive weight vector with which the variable beamforming section 113 produces the adaptive noise signal. Since the self-tuning RLS algorithm is among the fastest-adapting adaptive algorithms, it is robust even to non-stationary interference signals and can readapt quickly after the look direction changes.
The variable beamforming section 113 can acquire the adaptive weight vector using the self-tuning RLS algorithm, which can be an algorithm for recursively obtaining the solution of a least squares problem.
The general least squares problem is described below.
A data matrix A and a desired signal vector $\vec{d}$ can be defined as follows:
In addition, an optimum solution targeted in common adaptive signal processing satisfies the following condition.
$$A(n)\vec{w}(n) = \vec{d}(n) \qquad \text{[Formula 16]}$$
Since Formula 16 generally has no exact solution when A(n) has more rows than columns, the least squares problem is defined instead as follows:
$$J(n) = \left\lVert \vec{d}(n) - A(n)\vec{w}(n) \right\rVert^2 \qquad \text{[Formula 17]}$$
The solution that minimizes this cost function, assuming $A^H(n)A(n)$ is invertible, is given as follows:
$$\vec{w}(n) = \left(A^H(n)A(n)\right)^{-1} A^H(n)\vec{d}(n) \qquad \text{[Formula 18]}$$
With the time-averaged autocorrelation matrix $\Phi(n) = A^H(n)A(n)$ and the time-averaged cross-correlation vector $\vec{z}(n) = A^H(n)\vec{d}(n)$, this can be written as follows:
$$\vec{w}(n) = \Phi^{-1}(n)\vec{z}(n) \qquad \text{[Formula 19]}$$
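For a small real-valued problem, Formulas 18 and 19 can be evaluated directly: form $\Phi = A^T A$ and $\vec{z} = A^T\vec{d}$, then solve $\Phi\vec{w} = \vec{z}$. The Python sketch below does this with plain Gaussian elimination. The helper names are hypothetical, and a practical implementation would more likely use a QR-based solver, because forming $A^T A$ squares the condition number.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def solve(M: Matrix, b: Sequence[float]) -> List[float]:
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    A = [list(row) + [bi] for row, bi in zip(M, b)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= factor * A[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (A[r][n] - sum(A[r][c] * x[c] for c in range(r + 1, n))) / A[r][r]
    return x


def least_squares(A: Matrix, d: Sequence[float]) -> List[float]:
    """w = Phi^{-1} z with Phi = A^T A and z = A^T d (Formulas 18 and 19, real data)."""
    cols = len(A[0])
    Phi = [[sum(row[i] * row[j] for row in A) for j in range(cols)] for i in range(cols)]
    z = [sum(row[i] * di for row, di in zip(A, d)) for i in range(cols)]
    return solve(Phi, z)
```

For instance, fitting one constant to the observations 1, 2 and 3 (`A = [[1.0], [1.0], [1.0]]`) returns their mean, 2.0.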
The RLS algorithm computes this solution recursively, as follows:
Here, ζ(n) indicates the a priori error, $\vec{k}(n)$ indicates the gain vector, and λ indicates the forgetting factor.
In Formula 20, the input vector $\vec{u}(n)$ corresponds to the output signal of the blocking matrix 112, so the adaptive weight vector of the adaptive beamformer 110 according to an embodiment of the invention, which has the constraints of Formula 13, can be obtained recursively using Formula 20.
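Formula 20 is not reproduced here, so the Python sketch below implements the standard exponentially weighted RLS recursion in its textbook form (a priori error ζ(n), gain vector k(n), forgetting factor λ) for real-valued data. It uses a fixed λ and does not implement the self-tuning variant, whose tuning rule is not given in this section. The class name and the initialisation $P(0) = \delta^{-1}I$ are assumptions. In the GSC, u(n) would be the blocking-matrix output and d(n) the fixed beamformer output, so the returned a priori error corresponds to the beamformer output.

```python
from typing import List, Sequence


class RLS:
    """Standard exponentially weighted RLS with a fixed forgetting factor lam.

    This is not the self-tuning variant; it is the textbook recursion only.
    """

    def __init__(self, order: int, lam: float = 0.99, delta: float = 0.01) -> None:
        self.lam = lam
        self.w: List[float] = [0.0] * order
        # P(0) = I / delta: the usual regularised start for the inverse correlation matrix
        self.P: List[List[float]] = [
            [1.0 / delta if r == c else 0.0 for c in range(order)] for r in range(order)
        ]

    def update(self, u: Sequence[float], d: float) -> float:
        """Consume one pair (u(n), d(n)) and return the a priori error zeta(n)."""
        n = len(self.w)
        pi = [sum(self.P[r][c] * u[c] for c in range(n)) for r in range(n)]  # P(n-1) u(n)
        denom = self.lam + sum(u[r] * pi[r] for r in range(n))
        k = [p / denom for p in pi]                                  # gain vector k(n)
        zeta = d - sum(wi * ui for wi, ui in zip(self.w, u))         # a priori error
        self.w = [wi + ki * zeta for wi, ki in zip(self.w, k)]       # weight update
        # P(n) = (P(n-1) - k(n) pi^T) / lam, using the symmetry of P
        self.P = [[(self.P[r][c] - k[r] * pi[c]) / self.lam for c in range(n)]
                  for r in range(n)]
        return zeta
```

A smaller λ forgets old data faster and tracks non-stationary interference more quickly, at the cost of noisier weights; a self-tuning scheme would adjust this trade-off on line.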
Referring to
According to an embodiment of the invention, the microphone array 200 can be positioned at a specific point 10 on a vertical line 11 between predetermined seats (e.g. the seats S1 and S2). Among the plurality of beams formed by the microphone array 200, one beam can be formed on one side with respect to the vertical line 11, and the other beam can be formed on the other side with respect to the vertical line 11. Of course, when the microphone array 200 is positioned at a different point instead of being positioned at the specific point 10 on the vertical line 11 between the seats (e.g. the seats S1 and S2), the plurality of beams can be formed in the same direction.
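A short worked example of this geometry: with the array on line 11 and one seat on each side of it, the look directions measured from line 11 have opposite signs, which is what places one beam on each side. The cabin coordinates in the Python sketch below are invented for illustration, and the sign convention (negative toward one side, positive toward the other) is an assumption.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def look_angle_deg(array_pos: Point, seat_pos: Point) -> float:
    """Signed angle of a seat from the line through the array toward the seats.

    x is the lateral (left-right) coordinate and y the distance toward the seats;
    negative angles fall on one side of the orthogonal line, positive on the other.
    """
    dx = seat_pos[0] - array_pos[0]
    dy = seat_pos[1] - array_pos[1]
    return math.degrees(math.atan2(dx, dy))


# Hypothetical cabin coordinates in metres: array at point 10 on line 11 (x = 0).
array_point = (0.0, 0.0)
driver, passenger = (-0.4, 0.6), (0.4, 0.6)
# prints -33.7 33.7
print(round(look_angle_deg(array_point, driver), 1), round(look_angle_deg(array_point, passenger), 1))
```

If the array were moved off line 11 toward one side, both angles could take the same sign, matching the remark that the beams can then be formed in the same direction.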
In the meantime, the multi-beam sound system 1 can also include the second microphone array 200-1, which can also form a plurality of beams. In addition, according to an implementation, the second microphone array 200-1 can be configured such that it forms one beam and adaptively changes the direction of the beam.
When the second microphone array 200-1 forms a plurality of beams, the plurality of beams can be formed depending on the positions (e.g. S3, S4 and S5) of passengers seated in the rear seats.
Although
The multi-beam sound system according to an embodiment of the invention can be embodied as computer readable codes that are stored in a computer readable record medium. The computer readable record medium includes all sorts of record devices in which data that are readable by a computer system are stored. Examples of the computer readable record medium include read only memory (ROM), random access memory (RAM), compact disc read only memory (CD-ROM), a magnetic tape, a hard disc, a floppy disc, an optical data storage device and the like. Further, the record medium may be implemented in the form of a carrier wave (e.g. Internet transmission). In addition, the computer readable record medium may be distributed to computer systems over a network, in which the computer readable codes are stored and executed in a decentralized fashion. In addition, functional programs, codes and code segments for embodying the invention can be easily construed by programmers having ordinary skill in the art to which the invention pertains.
While the present invention has been described with reference to the certain exemplary embodiments which are shown in the drawings, it will be understood by a person having ordinary skill in the art that various modifications and equivalent other embodiments may be made therefrom. Therefore, the true scope of the present invention shall be defined by the technical principle of the appended claims.
The present invention is applicable to a sound system for a vehicle.
Number | Date | Country | Kind |
---|---|---|---|
10-2010-0106625 | Oct 2010 | KR | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/KR2011/008213 | 10/31/2011 | WO | 00 | 4/26/2013 |