The present disclosure relates to a sound collecting apparatus for beam forming.
Beamforming is a technique for generating a signal in which sound from a target sound direction is emphasized, using voice signals acquired from a plurality of microphone elements. As one example of a beamformer using an adaptive filter, a generalized sidelobe canceller is disclosed in L. Griffiths and C. W. Jim, “An alternative approach to linearly constrained adaptive beamforming”, IEEE Trans. Antennas Propagation, vol. AP-30, pp. 27-34, January 1982.
One non-limiting and exemplary embodiment provides a sound collecting apparatus capable of effectively suppressing sounds other than a target sound.
In one general aspect, the techniques disclosed here feature a sound collecting apparatus including a plurality of microphone elements. Among a plurality of microphone pairs, each configured of any two microphone elements included in the plurality of microphone elements, the total number of effective microphone pairs, in which the distance between the two microphone elements is smaller than a distance D, is larger than the total number of the plurality of microphone elements. The distance D is represented by D = c/(2f), where f is the frequency of a target sound acquired from the plurality of microphone elements and c is the sound velocity. When θ is the angle formed by a straight line connecting the two microphone elements configuring each effective microphone pair and a predetermined straight line, the angles θ of all of the effective microphone pairs acquired from the plurality of microphone elements differ from one another.
The sound collecting apparatus of the present disclosure can effectively suppress sounds other than a target sound.
Additional benefits and advantages of the disclosed embodiments will become apparent from the specification and drawings. The benefits and/or advantages may be individually obtained by the various embodiments and features of the specification and drawings, which need not all be provided in order to obtain one or more of such benefits and/or advantages.
In the following, embodiments are described with reference to the drawings. The embodiments described below represent general or specific examples. Numerical values, shapes, materials, components, arrangement and connection modes of the components, and so forth described in the following embodiments are merely examples, and are not meant to restrict the present disclosure. Also, among the components in the following embodiments, a component not described in an independent claim representing a broadest concept is described as an optional component.
Furthermore, each drawing is merely a schematic drawing, and is not strictly depicted. Still further, in each drawing, components having a substantially same function are provided with the same reference character, and redundant description may be omitted or simplified.
Still further, in the following embodiments, when the sound collecting apparatus takes a sound coming from one direction as a main output target, that direction is represented as a target sound direction and that sound is represented as a target sound. Still further, sounds other than the target sound may be represented as noise.
In the following, a general outline of the sound collecting apparatus according to one embodiment is described by using
As depicted in
The signal processing unit 30 performs beamforming by using the voice signal acquired from each of the plurality of microphone elements 20a to 20d. Beamforming in the signal processing unit 30 is signal processing that shapes the directivity so that noise falls in a dead angle (a direction of very low sensitivity) while sensitivity in the target sound direction is maintained. That is, beamforming by the signal processing unit 30 suppresses noise coming from directions other than the target sound direction. Although each of the plurality of microphone elements 20a to 20d is a non-directional microphone element, the sound collecting apparatus 10 has high sensitivity in the target sound direction owing to the beamforming of the signal processing unit 30.
Next, a functional structure of the sound collecting apparatus 10 is described.
The plurality of microphone elements 20a to 20d form a microphone array that provides the signals from which the main signal Xm and the reference signals Xr1 to Xr6 used in beamforming are generated. In other words, the plurality of microphone elements 20a to 20d acquire the voice signals that the signal processing unit 30, acting as a beamformer, processes. The plurality of microphone elements 20a to 20d are arranged on the same plane. In the present embodiment, the sound collecting apparatus 10 includes four microphone elements 20a to 20d, but the total number of microphone elements is not particularly limited and may be either even or odd. The sound collecting apparatus 10 may include, for example, four or more microphone elements.
The signal processing unit 30 is a beamformer. More specifically, the signal processing unit 30 has a structure similar to that of a generalized sidelobe canceller. The signal processing unit 30 is achieved by a processor, for example, such as a digital signal processor (DSP), but may be achieved by a microcomputer or circuit. Also, the signal processing unit 30 may be achieved by a combination of two or more of a processor, a microcomputer, and a circuit. The signal processing unit 30 includes delay devices 31a to 31d, a main signal generating unit 31, reference signal generating units 32a to 32f, adaptive filter units 33a to 33f, a subtracting unit 34, and a coefficient updating unit 35.
The delay devices 31a to 31d correspond to voice signals acquired from the plurality of microphone elements 20a to 20d in a one-to-one relation. The delay devices 31a to 31d give the voice signals acquired from the plurality of microphone elements 20a to 20d, respectively, a delay in accordance with the target sound direction, and output the resultant signal as an output signal.
The main signal generating unit 31 is one example of a first signal generating unit and generates the main signal Xm by adding the voice signals acquired from the plurality of microphone elements 20a to 20d after the delay devices 31a to 31d have given them the delay in accordance with the target sound direction. The main signal Xm is one example of a first signal.
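As a rough illustration of the delay devices 31a to 31d and the main signal generating unit 31, the following Python sketch forms the main signal Xm for a single frequency bin. It assumes a far-field plane wave, microphone coordinates in meters on the array plane, a sound velocity of 340 m/s, and delays realized as phase shifts in the frequency domain (equivalent, up to a common delay, to the time delays in the text). The names `steering_phases`, `main_signal`, and `SOUND_SPEED` are hypothetical and not taken from the disclosure.

```python
import numpy as np

SOUND_SPEED = 340.0  # m/s; assumed value for air at room temperature


def steering_phases(positions, target_deg, freq_hz, c=SOUND_SPEED):
    """Per-microphone phase factors that delay each channel so that a far-field
    plane wave arriving from target_deg (in the array plane) lines up."""
    phi = np.deg2rad(target_deg)
    u = np.array([np.cos(phi), np.sin(phi)])    # unit vector pointing toward the source
    lead = positions @ u / c                     # how much earlier each mic hears the wave (s)
    return np.exp(-2j * np.pi * freq_hz * lead)  # a delay of `lead` seconds at freq_hz


def main_signal(spectra, positions, target_deg, freq_hz):
    """Delay-and-sum for one frequency bin: delay every channel, then add them.

    spectra: complex array of shape (M,), one STFT value per microphone.
    Returns (Xm, delayed channels)."""
    delayed = spectra * steering_phases(positions, target_deg, freq_hz)
    return delayed.sum(), delayed
```

For a plane wave that really comes from `target_deg`, every delayed channel equals the source spectrum, so Xm is M times that value; changing `target_deg` re-steers the beam without touching the rest of the processing.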
The reference signal generating units 32a to 32f are one example of a second signal generating unit. The reference signal generating units 32a to 32f correspond one-to-one to the six microphone pairs, each configured of any two microphone elements included in the plurality of microphone elements 20a to 20d. Each reference signal generating unit generates a reference signal by taking the difference between the voice signals acquired from the two microphone elements configuring its microphone pair after the delay devices 31a to 31d have given them the delay in accordance with the target sound direction. Each of the reference signals Xr1 to Xr6 is one example of a second signal.
Also, the adaptive filter units 33a to 33f correspond one-to-one to the reference signal generating units 32a to 32f. The adaptive filter units 33a to 33f apply the filter coefficients α1 to α6 to the reference signals Xr1 to Xr6 output from the corresponding reference signal generating units 32a to 32f.
For example, the reference signal generating unit 32a generates the reference signal Xr1 by taking the difference between the output signals of the delay devices 31a and 31b, that is, the voice signals acquired from the microphone elements 20a and 20b and given the delay in accordance with the target sound direction. The adaptive filter unit 33a applies the filter coefficient α1 to the reference signal Xr1.
Similarly, the reference signal generating unit 32b generates the reference signal Xr2 from the output signals of the delay devices 31a and 31c (microphone elements 20a and 20c), and the adaptive filter unit 33b applies the filter coefficient α2 to the reference signal Xr2.
The reference signal generating unit 32c generates the reference signal Xr3 from the output signals of the delay devices 31a and 31d (microphone elements 20a and 20d), and the adaptive filter unit 33c applies the filter coefficient α3 to the reference signal Xr3.
The reference signal generating unit 32d generates the reference signal Xr4 from the output signals of the delay devices 31b and 31c (microphone elements 20b and 20c), and the adaptive filter unit 33d applies the filter coefficient α4 to the reference signal Xr4.
The reference signal generating unit 32e generates the reference signal Xr5 from the output signals of the delay devices 31b and 31d (microphone elements 20b and 20d), and the adaptive filter unit 33e applies the filter coefficient α5 to the reference signal Xr5.
The reference signal generating unit 32f generates the reference signal Xr6 from the output signals of the delay devices 31c and 31d (microphone elements 20c and 20d), and the adaptive filter unit 33f applies the filter coefficient α6 to the reference signal Xr6.
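The six reference signals can be sketched by enumerating every pair of delay-compensated channels for one frequency bin. This assumes the channels have already been aligned toward the target sound direction and that each pair is subtracted as first minus second, an order the text does not specify; `reference_signals` is a hypothetical name. With four channels, `itertools.combinations` yields the pairs in the same order as Xr1 to Xr6 above (20a-20b, 20a-20c, 20a-20d, 20b-20c, 20b-20d, 20c-20d).

```python
from itertools import combinations

import numpy as np


def reference_signals(delayed):
    """Blocking step: one reference signal per microphone pair, formed as the
    difference of the two delay-compensated channels (first minus second).

    delayed: complex array of shape (M,) for one frequency bin.
    Returns the list of index pairs and an array of M*(M-1)/2 reference values."""
    pairs = list(combinations(range(len(delayed)), 2))
    refs = np.array([delayed[i] - delayed[j] for i, j in pairs])
    return pairs, refs
```

Because the target component is identical in every aligned channel, all entries of `refs` vanish for a pure target-direction wave, which is what lets the adaptive filters remove noise without cancelling the target sound.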
The subtracting unit 34 subtracts the reference signals Xr1 to Xr6, to which the filter coefficients α1 to α6 have been applied, from the main signal Xm. The resulting output signal Y is represented by the following Equation 1 and is one example of a third signal. In Equation 1, n is the number of microphone pairs (a natural number); n = 6 in the sound collecting apparatus 10.
$$Y = X_m - \sum_{k=1}^{n} \alpha_k X_{rk} \qquad (1)$$
The coefficient updating unit 35 updates the filter coefficients α1 to α6 based on the output signal Y acquired by subtraction of the subtracting unit 34.
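Equation 1 and the coefficient update can be sketched per frequency bin as follows. The disclosure does not name the update rule used by the coefficient updating unit 35; this sketch assumes a normalized LMS (NLMS) step that reduces the power of the output signal Y, a common choice for generalized sidelobe cancellers. The step size `mu`, the regularizer `eps`, and the function names are illustrative assumptions.

```python
import numpy as np


def gsc_output(xm, xr, alpha):
    """Equation 1 for one frequency bin: Y = Xm - sum_k alpha_k * Xr_k."""
    return xm - np.dot(alpha, xr)


def nlms_update(alpha, xm, xr, mu=0.5, eps=1e-12):
    """One normalized-LMS step (assumed rule, not stated in the text).

    Moves alpha along the negative gradient of |Y|^2 with respect to
    conj(alpha), normalized by the reference-signal power."""
    y = gsc_output(xm, xr, alpha)
    alpha = alpha + mu * y * np.conj(xr) / (np.vdot(xr, xr).real + eps)
    return alpha, y
```

Each step shrinks the output computed on the current frame by a factor of about (1 − mu), so repeated updates drive α toward the coefficients that cancel the noise component the reference signals share with Xm.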
As depicted in
In the sound collecting apparatus 10, the signal processing unit 30 can change the beam direction in the output signal Y. For example, the sound collecting apparatus 10 includes a user interface such as a touch panel or operation button, and the signal processing unit 30 changes the beam direction based on user operation accepted through the user interface. Alternatively, the signal processing unit 30 automatically changes the beam direction by detecting a sound volume or the like.
When the signal processing unit 30 performs beamforming with a variable beam direction in this manner, the sensitivity of the output signal Y in directions other than the selected beam direction should be reduced as much as possible, whichever beam direction is selected. To ensure this performance, the arrangement of the plurality of microphone elements 20a to 20d in the sound collecting apparatus 10 is defined as follows.
In the sound collecting apparatus 10, the total number of effective microphone pairs is larger than the total number of the plurality of microphone elements 20a to 20d. Here, an effective microphone pair is a microphone pair, configured of any two microphone elements included in the plurality of microphone elements 20a to 20d, in which the distance between the two microphone elements is shorter than a distance D. The distance D is represented by D = c/(2f), where f is the frequency of the target sound acquired from the plurality of microphone elements 20a to 20d and c is the sound velocity. In the sound collecting apparatus 10, the total number of effective microphone pairs is six, and the total number of the plurality of microphone elements is four.
Note that the distance D varies depending on the frequency of the target sound. For example, when the target sound has a frequency of 8 kHz, the distance D is 2.125 cm if the sound velocity c=34000 cm/s. Also, when the target sound has a frequency of 4 kHz, the distance D is 4.25 cm if the sound velocity c=34000 cm/s.
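The distance D is half the wavelength c/f of the target sound; for a pair spaced less than D apart, the phase difference 2πfd/c between its two elements stays below π. The small sketch below reproduces the two numerical examples above, assuming lengths in centimeters and c = 34000 cm/s as in the text; the function names are hypothetical.

```python
import math


def aliasing_distance_cm(freq_hz, c_cm_per_s=34000.0):
    """D = c / (2 f), i.e. half the wavelength of the target sound."""
    return c_cm_per_s / (2.0 * freq_hz)


def max_phase_difference(distance_cm, freq_hz, c_cm_per_s=34000.0):
    """Largest inter-element phase difference (rad), reached for end-fire arrival."""
    return 2.0 * math.pi * freq_hz * distance_cm / c_cm_per_s


def is_effective_pair(distance_cm, freq_hz, c_cm_per_s=34000.0):
    """A pair is effective when its spacing is strictly shorter than D."""
    return distance_cm < aliasing_distance_cm(freq_hz, c_cm_per_s)


print(aliasing_distance_cm(8000))  # 2.125
print(aliasing_distance_cm(4000))  # 4.25
```

At exactly d = D the end-fire phase difference reaches π, which is why the boundary case is treated as non-effective.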
A reference signal calculated from a non-effective microphone pair, in which the distance between the two microphone elements is equal to or longer than the distance D (that is, at least half the wavelength of the target sound), may not have the sensitivity characteristics expected from the arrangement of that pair, for example because a folding (aliasing) component occurs in signal processing. In other words, such a reference signal may have unexpected sensitivity characteristics, which hinders generation of an accurate output signal Y. In the sound collecting apparatus 10, making the total number of effective microphone pairs larger than the total number of the plurality of microphone elements 20a to 20d allows the output signal Y to be generated with high accuracy.
Note that in the sound collecting apparatus 10, all the microphone pairs acquired from the plurality of microphone elements 20a to 20d are effective microphone pairs. That is, the total number of microphone pairs acquired from the plurality of microphone elements 20a to 20d is equal to the total number of effective microphone pairs. However, only some of the microphone pairs acquired from the plurality of microphone elements 20a to 20d may be effective microphone pairs.
Also, consider a planar view in which the plane where the plurality of microphone elements 20a to 20d are arranged is viewed from a direction perpendicular to that plane, and let θ be the angle formed by a straight line connecting the two microphone elements configuring an effective microphone pair and a predetermined straight line. Then the angles θ of all effective microphone pairs acquired from the plurality of microphone elements 20a to 20d differ from one another.
As depicted in
Similarly, an angle formed by a straight line L4 connecting the microphone elements 20a and 20c configuring an effective microphone pair and the X axis is θ4. An angle formed by a straight line L5 connecting the microphone elements 20a and 20b configuring an effective microphone pair and the X axis is θ5, and an angle formed by a straight line L6 connecting the microphone elements 20c and 20d configuring an effective microphone pair and the X axis is θ6.
Here, θ1 is different from any of θ2 to θ6, and θ2 is different from any of θ1 and θ3 to θ6. The same goes for θ3 to θ6. Note that θ being different from the others means that θ defined based on the same reference as that depicted in
This difference in θ produces a difference in the sensitivity characteristics of the reference signals. If θ1 to θ6 were all the same, the reference signals Xr1 to Xr6 acquired from the six effective microphone pairs would have similar sensitivity characteristics.
As depicted in
By contrast, when θ1 to θ6 are all different from one another, the reference signals Xr1 to Xr6 acquired from six effective microphone pairs have different sensitivity characteristics.
As depicted in
As described above, the arrangement of the plurality of microphone elements 20a to 20d in the sound collecting apparatus 10 needs to satisfy only two requirements. One requirement is that the total number of effective microphone pairs included in the sound collecting apparatus 10 is larger than the total number of the plurality of microphone elements 20a to 20d included in the sound collecting apparatus 10. The other requirement is that the angles θ of all effective microphone pairs included in the sound collecting apparatus 10 differ from one another.
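The two requirements can be checked mechanically from element coordinates. In this hedged sketch, θ is measured from the X axis and folded into [0°, 180°) because a line has no orientation, pairs count as effective when strictly closer than `d_max` (playing the role of D), and the function names and tolerance `tol_deg` are assumptions for illustration.

```python
import math
from itertools import combinations


def pair_angles(positions, d_max):
    """Angle theta in degrees, folded into [0, 180), of every pair whose
    spacing is strictly shorter than d_max. theta is measured from the X axis."""
    angles = []
    for (x1, y1), (x2, y2) in combinations(positions, 2):
        if math.hypot(x2 - x1, y2 - y1) < d_max:
            angles.append(math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180.0)
    return angles


def satisfies_requirements(positions, d_max, tol_deg=1e-6):
    """Requirement 1: more effective pairs than microphone elements.
    Requirement 2: no two effective pairs share the same angle theta."""
    angles = sorted(pair_angles(positions, d_max))
    if len(angles) <= len(positions):
        return False
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(angles[0] + 180.0 - angles[-1])  # 0 and 180 degrees are the same line
    return min(gaps) > tol_deg
```

For instance, four elements at the corners of a square fail, because opposite sides are parallel and share the same θ, whereas three elements on a circle plus one at its center can pass when all six pairs are effective.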
This allows the sound collecting apparatus 10 to compensate for the dead angle of one reference signal with another reference signal. As a result, fewer directions remain in which the sensitivity of the output signal Y cannot be reduced, and noise from various directions can be suppressed. That is, the sound collecting apparatus 10 can effectively suppress sounds other than the target sound. Also, this arrangement of the plurality of microphone elements 20a to 20d is particularly useful when the sound collecting apparatus 10 can change the target sound direction or is used in a system that can change the target sound direction.
The dead angles of the reference signals Xr1 to Xr6 are preferably distributed, and ideally equally distributed. To distribute them equally, θ1 to θ6 in the sound collecting apparatus 10 preferably differ in steps of 180°/6 = 30° (the step is a fraction of 180° rather than 360° because θ and θ + 180° describe the same straight line). For example, (θ2, θ3, θ4, θ5, θ6) = (θ1+30°, θ1+60°, θ1+90°, θ1+120°, θ1+150°) is preferable. In general, when the total number of effective microphone pairs is n (n is a natural number), the angles θ of the n effective microphone pairs preferably differ in steps of 180°/n. This reduces the directions in which the sensitivity of the output signal Y cannot be reduced and allows noise from various directions to be suppressed.
Here, one conceivable scheme for evaluating the arrangement of the plurality of microphone elements is based on the differences in the angle θ between effective microphone pairs. Specifically, the effective microphone pairs are sorted in descending order of θ, and the arrangement of the plurality of microphone elements is evaluated based on the differences in θ between adjacent effective microphone pairs. An evaluation value A is represented, for example, by the following Equation 2, where Tk in Equation 2 is represented by Equation 3 and Tideal in Equation 2 is represented by Equation 4.
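Equations 2 to 4 are not reproduced in this text, so the sketch below does not compute the evaluation value A. It only illustrates the ingredients described above: the ideal arrangement in which the angles θ step by 180°/n, and the differences between adjacent angles after sorting in descending order. Including the wrap-around difference across 0°/180° is an assumption, and `max_gap_deviation` is a hypothetical stand-in score, not Equation 2.

```python
def ideal_pair_angles(theta1_deg, n):
    """theta_k = theta_1 + (k - 1) * 180/n, folded into [0, 180)."""
    step = 180.0 / n
    return [(theta1_deg + k * step) % 180.0 for k in range(n)]


def adjacent_gaps(angles_deg):
    """Sort angles in descending order and return the differences between
    neighbours, including the wrap-around gap (0 and 180 deg are the same line)."""
    s = sorted((a % 180.0 for a in angles_deg), reverse=True)
    gaps = [a - b for a, b in zip(s, s[1:])]
    gaps.append(s[-1] + 180.0 - s[0])
    return gaps


def max_gap_deviation(angles_deg):
    """Hypothetical score (NOT Equation 2): largest |gap - 180/n|; 0 is ideal."""
    ideal = 180.0 / len(angles_deg)
    return max(abs(g - ideal) for g in adjacent_gaps(angles_deg))
```

The gaps always sum to 180°, so a uniform spacing of 180°/n is the only arrangement in which every gap equals the ideal value.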
A smaller evaluation value A indicates a better arrangement: the smaller A is, the fewer directions remain in which the sensitivity of the output signal Y cannot be reduced, and the more readily noise from various directions can be suppressed.
In
Meanwhile, in
Meanwhile, in the arrangement of
As described above, the total number of effective microphone pairs may be equal to or smaller than (the total number of the plurality of microphone elements − 1) × 2.
While the sound collecting apparatus 10 includes four microphone elements 20a to 20d, the total number of microphone elements included in the sound collecting apparatus 10 is not particularly limited. The sound collecting apparatus 10 may include, for example, six or more microphone elements.
As depicted in
Note that in the field of acoustic technology, an even number of loudspeakers or microphone elements is often used in devices such as stereo systems. Thus, if the total number of microphone elements included in the sound collecting apparatus 10 is an even number, the apparatus is easier to use together with other hardware.
All effective microphone pairs acquired from the plurality of microphone elements 20a to 20d may be arranged so that dead angle ranges of the reference signals acquired from the effective microphone pairs do not overlap one another. In the following, the dead angle ranges of the reference signals are described.
A first dead angle range R1 is, for example, an angle range in which sensitivity is equal to or smaller than −60 dB in the sensitivity characteristics of the first reference signal. A second dead angle range R2 is, for example, an angle range in which sensitivity is equal to or smaller than −60 dB in the sensitivity characteristics of the second reference signal. Note that each dead angle range is in a range in which sensitivity is equal to or smaller than a predetermined value in the sensitivity characteristics of the reference signal and −60 dB is one example of the predetermined value.
Here,
When the distance between the two microphone elements configuring a target microphone pair is 2.125 cm, the dead angle range spans ±0.05° around the angle at which sensitivity is minimum. Note that the difference between the angle of minimum sensitivity in the sensitivity characteristics of the first reference signal and that in the sensitivity characteristics of the second reference signal is equal to the difference between the angle θ of the first effective microphone pair and the angle θ of the second effective microphone pair. Therefore, for the first dead angle range R1 and the second dead angle range R2 not to overlap, the angles θ of the first and second effective microphone pairs must differ by at least 0.1°.
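Because the sensitivity minimum of each reference signal shifts one-for-one with θ, the non-overlap condition reduces to a pairwise angle test. A minimal sketch, assuming a dead angle half-width of 0.05° as in the 2.125 cm example and treating θ modulo 180°; the function name and parameter are hypothetical.

```python
from itertools import combinations


def dead_ranges_overlap(angles_deg, half_width_deg=0.05):
    """Two dead angle ranges of +/-half_width_deg overlap exactly when the
    angle difference (taken modulo 180 deg) is below 2 * half_width_deg."""
    for a, b in combinations(angles_deg, 2):
        d = abs(a - b) % 180.0
        d = min(d, 180.0 - d)  # lines at 0.01 and 179.99 deg are 0.02 deg apart
        if d < 2 * half_width_deg:
            return True
    return False
```

An arrangement with θ stepping by 30° passes easily; the test only becomes critical for nearly parallel pairs.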
In this manner, in the sound collecting apparatus 10, all effective microphone pairs acquired from the plurality of microphone elements 20a to 20d may be arranged so that their dead angle ranges do not overlap. The dead angle range is the angle range in which the sensitivity characteristics of the second signal acquired from an effective microphone pair have a value equal to or smaller than a predetermined value. This reduces the directions in which the sensitivity of the output signal Y cannot be reduced and allows noise from various directions to be suppressed.
As has been described in the foregoing, the sound collecting apparatus 10 includes the plurality of microphone elements 20a to 20d. Among microphone pairs each configured of any two microphone elements included in the plurality of microphone elements 20a to 20d, the total number of effective microphone pairs in which a distance between the two microphone elements is shorter than the distance D is larger than the total number of the plurality of microphone elements 20a to 20d.
The distance D is represented by D = c/(2f), where f is the frequency of the target sound acquired from the plurality of microphone elements 20a to 20d and c is the sound velocity. When θ is the angle formed by a straight line connecting the two microphone elements configuring an effective microphone pair and a predetermined straight line, the angles θ of all effective microphone pairs acquired from the plurality of microphone elements 20a to 20d differ from one another.
This allows the sound collecting apparatus 10 to compensate for the dead angle of one reference signal with another reference signal, thereby suppressing noise from various directions. That is, the sound collecting apparatus 10 can effectively suppress sounds other than the target sound.
Also, for example, the total number of the plurality of microphone elements 20a to 20d is an even number.
This makes the sound collecting apparatus 10 easier to use together with other hardware.
Furthermore, for example, the total number of the plurality of microphone elements 20a to 20d is equal to or larger than six.
This provides a sufficient amount of noise suppression.
Still further, for example, when the total number of effective microphone pairs is n (n is a natural number), the angles θ of all of the effective microphone pairs acquired from the plurality of microphone elements 20a to 20d differ in steps of 180°/n.
This reduces the directions in which the sensitivity of the output signal Y cannot be reduced and allows noise from various directions to be suppressed.
Still further, for example, in all effective microphone pairs acquired from the plurality of microphone elements 20a to 20d, angle ranges in which sensitivity in the sensitivity characteristics of the second signal acquired from the effective microphone pair has a value equal to or smaller than a predetermined value do not overlap one another.
This reduces the directions in which the sensitivity of the output signal Y cannot be reduced and allows noise from various directions to be suppressed.
Still further, for example, the total number of microphone pairs acquired from the plurality of microphone elements 20a to 20d is equal to the total number of effective microphone pairs.
Thus, since all microphone pairs function as effective microphone pairs, the sound collecting apparatus 10 can effectively suppress sounds other than the target sound.
Still further, for example, the plurality of microphone elements are arranged at positions corresponding to vertexes of an equilateral N-gon (N is an odd number) and a center position of the equilateral N-gon.
In this manner, if the plurality of microphone elements are arranged so as to form an equilateral N-gon (N is an odd number) surrounding and centering on one microphone element, as depicted in
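The geometry of a regular polygon with an odd number of vertices plus a center element can be checked numerically. The sketch below is an illustration under stated assumptions (unit circumradius, θ measured from the X axis modulo 180°, hypothetical function names), not a reproduction of the drawings. For N = 3 with every pair effective, the six angles step by 30°. For N = 5, choosing D between the side length (about 1.18 times the circumradius) and the diagonal (about 1.90 times) keeps the five sides and the five center-to-vertex pairs, whose ten angles step by 18° = 180°/10; the diagonals would add nothing new here, since each one is parallel to a side.

```python
import math
from itertools import combinations


def ngon_with_center(n_sides, radius=1.0):
    """One element at the center plus one at each vertex of a regular n_sides-gon."""
    points = [(0.0, 0.0)]
    for k in range(n_sides):
        a = math.pi / 2 + 2 * math.pi * k / n_sides
        points.append((radius * math.cos(a), radius * math.sin(a)))
    return points


def effective_pair_angles(positions, d_max):
    """Sorted angles theta (degrees, modulo 180) of pairs spaced closer than d_max."""
    angles = []
    for (x1, y1), (x2, y2) in combinations(positions, 2):
        if math.hypot(x2 - x1, y2 - y1) < d_max:
            angles.append(math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180.0)
    return sorted(angles)


def angle_steps(sorted_angles):
    """Differences between neighbouring angles, including the wrap at 180 degrees."""
    steps = [b - a for a, b in zip(sorted_angles, sorted_angles[1:])]
    steps.append(sorted_angles[0] + 180.0 - sorted_angles[-1])
    return steps
```

With N = 5, the six elements yield ten effective pairs, more than the number of elements, with evenly spaced angles.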
Still further, for example, the sound collecting apparatus 10 further includes: the delay devices 31a to 31d which give a delay to voice signals acquired from the plurality of microphone elements 20a to 20d; the main signal generating unit 31 which generates the main signal Xm by adding the output signals from the delay devices 31a to 31d; the reference signal generating units 32a to 32f which generate the reference signals Xr1 to Xr6 by performing subtraction on output signals corresponding to two microphone elements configuring an effective microphone pair among output signals from the delay devices 31a to 31d; the adaptive filter units 33a to 33f which apply filter coefficients to the reference signals Xr1 to Xr6; the subtracting unit 34 which subtracts the reference signals Xr1 to Xr6 applied with the filter coefficients from the generated main signal Xm; and the coefficient updating unit 35 which updates the filter coefficients based on the output signal Y acquired by subtraction of the subtracting unit 34.
The delay devices 31a to 31d are one example of delay devices. The main signal Xm is one example of the first signal and is acquired by adding the voice signals acquired from the respective microphone elements 20a to 20d after the delay devices 31a to 31d have given them the delay in accordance with the target sound direction (that is, by adding the output signals from the delay devices 31a to 31d). The reference signals Xr1 to Xr6 are one example of the second signal; each is acquired by taking the difference between the voice signals acquired from the two microphone elements configuring an effective microphone pair after the delay devices 31a to 31d have given them the delay in accordance with the target sound direction. The main signal generating unit 31 is one example of the first signal generating unit, each of the reference signal generating units 32a to 32f is one example of the second signal generating unit, and the output signal Y is one example of the third signal.
This allows the sound collecting apparatus 10 to perform beamforming based on the voice signals acquired from the plurality of microphone elements 20a to 20d.
While the present embodiment has been described, the present disclosure is not limited to this embodiment.
For example, the shape and other features of the sound collecting apparatus described in the above embodiment are merely examples, and the sound collecting apparatus may have another shape, such as a rectangular parallelepiped.
The configuration of the signal processing unit according to the above embodiment is merely one example. The signal processing unit may include a component such as, for example, a D/A converter, a low-pass filter (LPF), a high-pass filter (HPF), a power amplifier, or an A/D converter. Also, signal processing to be performed by the signal processing unit is, for example, digital processing, but may be partially analog signal processing.
Also in the above embodiment, the signal processing unit may be achieved by being configured of dedicated hardware or by executing a software program suitable for the signal processing unit. The signal processing unit may be achieved by a program executing unit such as a CPU or processor reading and executing a software program recorded on a recording medium such as a hard disk or semiconductor memory.
Also, the signal processing unit may be a circuit (or an integrated circuit). These circuits may configure one circuit as a whole, or may be separate circuits. Also, these circuits may be general-purpose circuits or dedicated circuits.
Other forms obtained by applying various modifications conceived by those skilled in the art to the above embodiment, and forms achieved by combining any of the components and functions described in the above embodiment without departing from the gist of the present disclosure, are also included in the present disclosure.
For example, the present disclosure may be achieved as a system including the sound collecting apparatus of the above embodiment. Also, the present disclosure may be an evaluation method to be executed by a computer as a method of evaluating the arrangement of a plurality of microphone elements based on the above Equations 2 to 4.
The sound collecting apparatus of the present disclosure is useful as a sound collecting apparatus for use in a telephone conference system or the like.
Number | Date | Country | Kind |
---|---|---|---|
2017-124815 | Jun 2017 | JP | national |