1. Technical Field
The present invention relates to a sound pickup apparatus, which is incorporated in a portable communication terminal and a speech recognition terminal, capable of suppressing ambient sounds and clearly picking up the sound of a user, a portable communication apparatus and an image pickup apparatus provided with the sound pickup apparatus.
2. Background Art
There are many cases where a portable communication terminal and a speech recognition terminal are used in an environment, in which much noise exists, such as outdoors, and a lowering in communication sound quality and speech recognition performance becomes problematic due to a mixture of noise into sound signals. It is desired that a sound pickup apparatus incorporated in such a terminal has a directivity by which a beam (a direction of especially high sensitivity) is formed in the direction in which a user utters. Therefore, noise that reaches the sound pickup apparatus from the surroundings of the user is suppressed, wherein the sound of the user is intensified, and improvement in the communication sound quality and speech recognition performance can be expected. Hereinafter, it is assumed that target signals such as the sound of a user are called “target sounds”, and signals other than the above signals are called “noise”.
In recent years, a sound pickup apparatus of a microphone array system has been developed in order to achieve such a directivity, which is composed of a plurality of microphones and can obtain a desired directional characteristic by processing and combining signals output from the microphones. In comparison with a sound pickup apparatus composed of a single microphone, it may be listed, as advantages of the microphone array system, that a desired directional characteristic can be easily obtained by digital signal processing and there is little restriction in arrangement of sound holes since non-directional-type microphones can be utilized. Here, the sound hole means a hole made in the casing of a communication terminal in order to guide sound to microphones in the casing of the communication terminal.
Several types of systems have been known as signal processing to form directivity using a microphone array. As a representative system, a delay-and-sum type microphone array may be listed, which is described in Acoustic Systems and Digital Processing For Them edited by the Institute of Electronics, Information and Communication Engineers and published in April, 1995 and JP-A-2007-27939. Also, as another system, a two-channel SS system microphone array may be listed, which is described in JP-A-2004-289762.
A description is given of an example of the delay-and-sum type microphone array composed of two microphones with reference to
Based on the above description, the output signal of the microphone 121 is delayed by delay devices 123 and 124 by D sin θ/c with respect to the microphone 122, the phases of the signals are adjusted, and the output signals are added by an adder 125, whereby a directivity having a beam (a direction of especially high sensitivity) in the direction θ can be formed for the output signal 126 of the adder 125. Therefore, if the beam is turned to the direction in which the target sound comes, it is possible to suppress noise and to intensify the target sound. Also, although the interval D between the microphones is required to be equal to or less than one half (½) the wavelength in the upper limit frequency of input sound waves, the sensitivity of the entire microphone array will be lowered if the interval D between the microphones is too small.
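The relationship between the delay D sin θ/c and the resulting beam can be illustrated with a minimal sketch for a single frequency. The function names and the acoustic velocity of 340 m/s are assumptions made only for this illustration.

```python
import cmath
import math

SPEED_OF_SOUND = 340.0  # m/s, assumed value for the acoustic velocity c


def steering_delay(spacing_m, theta_deg, c=SPEED_OF_SOUND):
    """Delay D*sin(theta)/c that aligns the two microphone signals."""
    return spacing_m * math.sin(math.radians(theta_deg)) / c


def delay_and_sum_gain(spacing_m, steer_deg, arrival_deg, freq_hz, c=SPEED_OF_SOUND):
    """Magnitude of the adder output for a unit plane wave at one frequency."""
    # Time difference with which the arriving wave reaches the two microphones.
    arrival = steering_delay(spacing_m, arrival_deg, c)
    # Time difference compensated by the delay devices for the beam direction.
    applied = steering_delay(spacing_m, steer_deg, c)
    residual = arrival - applied
    # Sum of the two phase-adjusted signals; equals 2 when they are in phase.
    return abs(1.0 + cmath.exp(-2j * math.pi * freq_hz * residual))
```

When the arrival direction equals the beam direction, the residual delay is zero and the gain takes its maximum value of 2; for other directions the two signals are partly out of phase and the gain is lower.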
On the other hand, in the delay-and-sum type microphone array shown in
The directional characteristic formed by the delay-and-sum type microphone array is determined by the delay time given to the delay devices 123 and 124. An adaptive-type microphone array, by contrast, has been known as a means of automatically forming a null in the noise arriving direction.
Using
Generally in a portable communication apparatus and a speech recognition terminal, it is preferable that a sound pickup apparatus is disposed in a planar-shaped casing, and directivity having a beam in the front side direction thereof is formed. However, in order to achieve the same by a delay/addition-type microphone array, it is necessary to arrange a number of microphones. In this case, since the space and cost are increased, it becomes difficult to mount the microphones in a small-sized terminal. In addition, in the case of a delay-and-subtraction type microphone array using a subtractor in the delay/addition-type microphone array, although the null can be formed with a small number of microphones, the delay-and-subtraction type microphone array is not suitable for use for forming a beam in a desired direction. According to the microphone array of the two-channel SS system, which is described in JP-A-2004-289762, although a comparatively sharp beam can be formed with two microphones, the microphone array is still not suitable for the purpose of forming a beam only in the front side direction of the sound pickup apparatus as shown in
The present invention has been developed in view of such situations, and it is therefore an object of the invention to provide a sound pickup apparatus capable of forming a directivity having a sharp beam or a null in a specified direction by a microphone array composed of a small number of microphones, and a portable communication apparatus including the sound pickup apparatus, and an image pickup apparatus.
According to an aspect of the present invention, there is provided a sound pickup apparatus, including: a microphone array including at least three microphones, wherein a first pair of microphones in which two of the at least three microphones are aligned on a first axis, and a second pair of microphones in which two of the at least three microphones are aligned on a second axis; a first null signal generator which outputs a first null signal based on a differential output of the first pair of microphones, the first null signal having a directional characteristic in which a first null surface is defined by rotating a virtual line extending toward a direction of the lowest sensitivity around the first axis; a second null signal generator which outputs a second null signal, based on a differential output of the second pair of microphones, the second null signal having a directional characteristic in which a second null surface is defined by rotating a virtual line extending toward a direction of the lowest sensitivity around the second axis; and a combiner which generates a first target signal based on the first null signal and the second null signal, the first target signal having a directional characteristic in which the lowest sensitivity is formed in a direction to a line along which the first null surface meets the second null surface.
In addition, the sound pickup apparatus may further include a frequency domain subtractor which is adapted to perform subtraction in frequency domain of the first target signal from a signal output from one of the at least three microphones to output a second target signal.
According to the above configurations, since a beam (a direction of especially high sensitivity) or a null (a direction of especially low sensitivity) is formed only in the direction of a target sound by means of a microphone array including at least three microphones, which can be easily mounted in a small-sized terminal, it is possible to achieve a sound pickup apparatus having favorable performance to suppress ambient sounds.
In the accompanying drawings:
An aspect of the present invention provides a sound pickup apparatus, including: a microphone array including at least three microphones, wherein a first pair of microphones in which two of the at least three microphones are aligned on a first axis, and a second pair of microphones in which two of the at least three microphones are aligned on a second axis; a first null signal generator which outputs a first null signal based on a differential output of the first pair of microphones, the first null signal having a directional characteristic in which a first null surface is defined by rotating a virtual line extending toward a direction of the lowest sensitivity around the first axis; a second null signal generator which outputs a second null signal, based on a differential output of the second pair of microphones, the second null signal having a directional characteristic in which a second null surface is defined by rotating a virtual line extending toward a direction of the lowest sensitivity around the second axis; and a combiner which generates a first target signal based on the first null signal and the second null signal, the first target signal having a directional characteristic in which the lowest sensitivity is formed in a direction to a line along which the first null surface meets the second null surface.
Therefore, it becomes possible to form a null (a direction of especially low sensitivity) only in the direction of the target sound by an easily mountable microphone array including at least three microphones, wherein a sound pickup apparatus having favorable performance to suppress noise in a specified direction can be composed.
The sound pickup apparatus may further include a frequency domain subtractor which is adapted to perform subtraction in frequency domain of the first target signal from a signal output from one of the at least three microphones to output a second target signal.
Therefore, it becomes possible to form a beam (a direction of especially high sensitivity) only in the direction of the target sound by an easily mountable microphone array including at least three microphones, wherein a sound pickup apparatus having favorable performance to suppress noise can be composed.
In the sound pickup apparatus, one microphone of the first pair of microphones may be the same as one microphone of the second pair of microphones.
Therefore, a sound pickup apparatus having favorable performance to suppress ambient sound can be composed by an easily mountable microphone array including at least three microphones, and the mounting cost can be reduced.
In the sound pickup apparatus, the first axis may intersect the second axis at right angles.
Therefore, it becomes possible to further accurately form a null (a direction of especially low sensitivity) or beam (a direction of especially high sensitivity) only in the direction of the target sound, wherein it is possible to compose a sound pickup apparatus having favorable performance to suppress ambient sounds.
The sound pickup apparatus may be configured in that the combiner includes: a first FFT section which transforms the first null signal into a first frequency signal having a first frequency characteristic related to first frequency bins; a second FFT section which transforms the second null signal into a second frequency signal having a second frequency characteristic related to second frequency bins; and an operator which generates the first target signal based on the first frequency signal related to the first frequency bins and the second frequency signal related to the first frequency bins.
Therefore, it becomes possible to estimate ambient sound signals upon changing the signals in the time domain to those in the frequency domain.
In the sound pickup apparatus, the operator may generate the first target signal by selecting each value of respective frequency bins of the first or second frequency signals, whichever is greater, in each frequency bin.
Therefore, since, in output signals of the two sets of null signal generators, the ambient sound signal existing in both the sets and the ambient signals existing only in either one of them are reflected in the output signals of the ambient sound signal estimator by the same weighting, it becomes possible to uniformly lower the side lobe (the sensitivity in the direction other than the direction of target sound) in the output signals of the frequency domain subtractor.
In the sound pickup apparatus, the operator adds each value of the respective frequency bins of the first frequency signal to each value of the respective frequency bins of the second frequency signal.
Therefore, it becomes possible to form a null (a direction of especially low sensitivity) in the direction of the target sound.
In the sound pickup apparatus, each of the first and second null signal generators may include a delay device and a subtractor to be implemented as a delay-and-subtraction type microphone array.
Therefore, a null is formed in an intended direction by the null signal generator applying a preset delay time to the delay device, wherein it becomes possible to form a beam in the intended direction in the output signals, of the frequency domain subtractor, obtained by using the same.
In the sound pickup apparatus, each of the first and second null signal generators may include a delay device and an adaptive filter to be implemented as an adaptive-type microphone array.
Therefore, where the null signal generator forms a null by automatically following the direction where the direction of the target sound is not obvious or fluctuates, it becomes possible to continuously form a beam having a high sensitivity in the direction of the target sound in the output signals, of the frequency domain subtractor, obtained by using the same.
The sound pickup apparatus may include an adjustor for adjusting individual differences in sensitivity of the at least three microphones so that the microphones have the same sensitivity as each other.
Therefore, such an effect can be brought about by which influences due to individual differences with respect to microphone sensitivity are reduced, and particularly, the accuracy is improved where a null signal is formed by a preset coefficient.
Further, there can be provided a portable communication apparatus including a display screen and the sound pickup apparatus disposed on a plane for arranging the display screen thereon.
In the portable communication apparatus, the direction of the line along which the first null surface meets the second null surface may be fixed in a front direction of the display screen.
Therefore, in a case of a video phone by which a user is capable of hands-free communication while looking at a display screen of a communication terminal, such an effect can be brought about by which the sound of a speaker located in the front side direction of the display screen can be clearly picked up.
In the portable communication apparatus, the direction of the line along which the first null surface meets the second null surface may automatically follow a direction of a target sound within a certain area centered around a front direction of the display screen.
Therefore, in a case of a video phone by which a user is capable of hands-free communication while looking at a display screen of a communication terminal, a beam is formed so as to follow the speaker even if the direction of the speaker changes around the front side direction of the display screen, wherein such an effect can be brought about by which the sound of the speaker can be clearly picked up and a favorable communication quality is obtained.
Further, there can be provided a portable communication apparatus including a key pad and the sound pickup apparatus disposed on a plane for arranging the key pad thereon.
Therefore, where a user carries out communications while operating keys, such an effect can be brought about by which the sound of the speaker located in the front side direction of the key pad can be clearly picked up.
The sound pickup apparatus may be configured in that the first null signal generator generates a third null signal based on signals output from the first pair of microphones, and the second null signal generator generates a fourth null signal based on signals output from the second pair of microphones, and the combiner directs, based on the third null signal and the fourth null signal, a direction of a line along which a third null surface of the third null signal meets a fourth null surface of the fourth null signal toward a direction of another target sound to be picked up.
Therefore, since sound waves arriving from a plurality of directions are individually separated and picked up where a user utters from a plurality of directions, the apparatus is effective for a sound conference apparatus and a speech recognition apparatus.
In the sound pickup apparatus, the frequency domain subtractor may be adapted to perform the subtraction based on an arbitrary subtraction ratio.
Therefore, it is possible to control the strength of the directivity of the sound pickup apparatus in accordance with the intention and situations of a user.
Further, there can be provided an image pickup apparatus including a camera for capturing an image and the sound pickup apparatus, wherein the direction of the line along which the first null surface meets the second null surface is set to a direction of the image to be captured, and wherein the subtraction ratio is determined in conjunction with a zoom ratio of the camera.
Therefore, such an effect can be brought about by which sound pickup limited to the sound sources existing in the image pickup range of a camera device is performed, and ambient sounds coming from outside the image pickup range can be suppressed.
Further, there can be provided an image pickup apparatus including a camera for capturing an image and the sound pickup apparatus, wherein a delay time of at least one of delay devices included in the first and second null signal generators is changed in response to a variation of a capturing direction of the camera so as to direct the line along which the first null surface meets the second null surface toward a direction of the image to be captured.
Therefore, even if the image capturing direction is changed by performing a pan and tilt operation of the image pickup apparatus, the beam direction can be followed to the direction, wherein such an effect can be brought about by which the image pickup screen and acoustic signals are continuously coincident with each other.
Hereinafter, a description is given of embodiments of the present invention with reference to the drawings.
A user of the terminal carries out a communication operation by using the key pad 16 and carries out sound input by the microphones while watching the display screen 14. In the case of such a use method, it is assumed that it is desirable that the sound pickup apparatus 10 has a beam (a direction of especially high sensitivity) in the direction of the z axis when it is assumed that the direction from the microphone 12 to the microphone 11 is x axis, the direction from the microphone 12 to the microphone 13 is y axis, and the direction perpendicular to the x-y plane is z axis in a three-dimensional orthogonal coordinate system.
As the sound pickup apparatus to achieve such directivity, a microphone array 20 composed of three microphones 11 through 13 is mounted in the communication terminal 1 in Embodiment 1. Here, although it is necessary to set the intervals Dx and Dy between the microphones to half the wavelength at the upper limit frequency of the signal band or less in order not to produce spatial aliasing (folding noise), the sensitivity of the sound pickup apparatus 10 will be lowered if the interval is excessively small. For example, where the analog output signal of the microphone is converted to a digital signal with a sampling frequency of 16 kHz, the upper limit of the frequency is 8 kHz and the wavelength becomes 40 mm or slightly more, so it is favorable that the intervals Dx and Dy between the microphones are 20 mm or slightly less.
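The arithmetic of the above example can be checked with a short sketch. The function name and the acoustic velocity of 340 m/s are assumptions made for this illustration.

```python
SPEED_OF_SOUND = 340.0  # m/s, assumed value for the acoustic velocity c


def max_mic_spacing(sampling_rate_hz, c=SPEED_OF_SOUND):
    """Largest microphone interval that avoids spatial aliasing."""
    # The upper limit frequency of the signal band is half the sampling frequency.
    upper_limit_hz = sampling_rate_hz / 2.0
    # Wavelength at the upper limit frequency.
    wavelength_m = c / upper_limit_hz
    # The interval must not exceed half of that wavelength.
    return wavelength_m / 2.0


spacing_mm = max_mic_spacing(16000.0) * 1000.0
```

Under these assumptions the wavelength at 8 kHz is 42.5 mm and the bound on the interval is 21.25 mm, so an interval of 20 mm satisfies the condition.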
In addition, in order to make the sensitivities of the microphones 11 through 13 almost equivalent to each other, it is desirable that an adjustor for adjusting individual differences in the sensitivity of microphones is provided. A coefficient for adjustment is preset in the adjustor, for example, before shipment. Therefore, influences due to individual differences with respect to microphone sensitivity are reduced.
In the above configuration, analog signals that the microphones 11 through 13 output are subjected to signal processing in the DSP 30 after having been digitalized in the ADC 34. That is, respective processing of the X-direction null signal generator 21, the Y-direction null signal generator 22, the ambient sound signal estimator 23 and the frequency domain subtractor 24 in the operation block in
The ambient sound signal estimator 23 includes frame dividing sections 413 through 415, window framing sections 417 through 419, FFT sections 406 through 408, and a combiner 409. The frequency domain subtractor 24 includes an attenuation filter calculator 410, a spectral attenuator 411, an IFFT section 412, and a frame combiner 416.
Hereinafter, a detailed description is given of the operation of the sound pickup apparatus according to Embodiment 1 of the present invention.
First, a description is given of the operation of the X-direction and Y-direction null signal generators 21 and 22. Analog electric signals output upon sound waves reaching the microphones 11 through 13 are converted to digital signals by the ADC 34 and are input into the DSP 30. The X-direction null signal generator 21 and the Y-direction null signal generator 22 form directivity having a null (a direction of especially low sensitivity) in the direction of the target sound in the output signal on the planes (x-z plane and y-z plane) defined by the x axis and the z axis, and the y axis and the z axis in
Here, the angle between a plane and a straight line is defined as follows. As shown in
Using the delay-and-subtraction type microphone array shown in
It is assumed that the coordinates of the point P are made into (x, y, z), and the straight line linking the origin O to the point P is a straight line r, and that the angle between the straight line r and the yz plane defined by the y axis and the z axis is made into θx. That is, ∠POPy becomes θx. The X-direction null signal generator 21 forms directivity having a null in the direction of θx. Therefore, the relationship between the delay times τ1 and τ2 given by the delay devices 401 and 402 in
τ1−τ2=Dx·sin θx/c (c: acoustic velocity) [Mathematical Expression 1]
That is, since the sound wave of the sound source P located at the point P in
Similarly, with respect to the Y-direction null signal generator 22, the angle between the straight line r and the xz plane defined by the x axis and the z axis is made into θy, wherein ∠POPx becomes θy. The relationship between the delay times τ2 and τ3 given by the delay devices 402 and 403 in
τ3−τ2=Dy·sin θy/c (c: acoustic velocity) [Mathematical Expression 2]
Here, since τ2 is common in the x direction of [Mathematical Expression 1] and the y direction of [Mathematical Expression 2], τ1 and τ3 may be obtained as the already known fixed value as in [Mathematical Expression 3]. If the value of τ2 is set to, for example, a value obtained by dividing either one of Dx or Dy, whichever is greater, by the acoustic velocity c, there is no case where τ1 and τ3 become negative in all the angle ranges that are obtainable by θx and θy.
τ1=τ2+Dx·sin θx/c
τ3=τ2+Dy·sin θy/c [Mathematical Expression 3]
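A minimal sketch of [Mathematical Expression 3] follows, with τ2 set to the greater of Dx and Dy divided by the acoustic velocity as suggested above. The function name and the value of 340 m/s for c are assumptions made for this illustration.

```python
import math

SPEED_OF_SOUND = 340.0  # m/s, assumed value for the acoustic velocity c


def null_delays(dx_m, dy_m, theta_x_deg, theta_y_deg, c=SPEED_OF_SOUND):
    """Return (tau1, tau2, tau3) in seconds for the requested null direction."""
    # tau2 is the common reference delay; this choice keeps tau1 and tau3
    # non-negative over the whole range of -90 to +90 degrees.
    tau2 = max(dx_m, dy_m) / c
    tau1 = tau2 + dx_m * math.sin(math.radians(theta_x_deg)) / c
    tau3 = tau2 + dy_m * math.sin(math.radians(theta_y_deg)) / c
    return tau1, tau2, tau3
```

For example, with Dx = Dy = 20 mm and θx = θy = −90°, both τ1 and τ3 fall to zero but do not become negative, so every delay can be realized by a causal delay device.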
In addition, if a difference is provided between the delay τ1 of the delay device 401 into which signals are input from the microphone 11 and the delay τ2 of the delay device 402 into which signals are input from the microphone 12, the direction of the null surface can be varied. The pattern is shown in
In the above description, the ideal condition is that the microphone is spot-shaped, and the difference in the phase of sound waves reaching the microphone is accurately obtained in accordance with the angle of the sound source. Actually, however, the wider the area of the diaphragm of the microphone becomes, the more unclear the difference in phase becomes, wherein a shallow null having spread to some extent is brought about.
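Under the ideal spot-shaped microphone condition, the null formed by one delay-and-subtraction pair can be sketched at a single frequency as follows. The function name, the value of 340 m/s for c, and the sign convention (a source at a positive angle reaches the microphone 11 first) are assumptions made for this illustration.

```python
import cmath
import math


def pair_null_gain(spacing_m, null_deg, arrival_deg, freq_hz, c=340.0):
    """Magnitude of the subtractor output for a unit plane wave."""
    # Delay difference tau1 - tau2 chosen for the null direction
    # (the form of [Mathematical Expression 1]).
    delay_difference = spacing_m * math.sin(math.radians(null_deg)) / c
    # Time by which the wave reaches the first microphone ahead of the second.
    arrival_lead = spacing_m * math.sin(math.radians(arrival_deg)) / c
    # After the delays, the two signals differ only by this residual phase.
    phase = 2.0 * math.pi * freq_hz * (arrival_lead - delay_difference)
    return abs(cmath.exp(1j * phase) - 1.0)
```

The gain is exactly zero when the arrival angle equals the null angle and grows as the arrival direction moves away from it. Because the gain depends only on the angle measured from the microphone axis, the set of zero-gain directions forms a surface around that axis rather than a single line.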
Next, a description is given of the operation of the ambient sound signal estimator 23. Output signals of the X-direction null signal generator 21, the delay device 402 and the Y-direction null signal generator 22 are divided into frame signals having a predetermined time length and interval by the frame dividing sections 413 through 415, respectively. For example, the output signals are divided so that sampling is carried out at 8 kHz, the frame length is 128 points and the frame interval is 64 points. Therefore, the front half of the frame overlaps the latter half of the former frame, and the latter half of the frame overlaps the front half of the subsequent frame. This is to prevent the waveform from becoming discontinuous at the boundary of frames when the frames are combined and connected by the frame combiner 416 in the subsequent stage.
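The frame division with a frame length of 128 points and a frame interval of 64 points can be sketched as follows. The function name is hypothetical, and discarding an incomplete final frame is an assumption of this sketch.

```python
def split_into_frames(samples, frame_length=128, frame_interval=64):
    """Divide a sample row into overlapping frames of equal length."""
    frames = []
    start = 0
    # Only complete frames are produced; trailing samples that do not
    # fill a whole frame are dropped in this sketch.
    while start + frame_length <= len(samples):
        frames.append(samples[start:start + frame_length])
        start += frame_interval
    return frames


frames = split_into_frames(list(range(320)))
```

With 320 input samples this yields four frames, and the front half of each frame is identical to the latter half of the preceding frame.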
The window framing sections 417 through 419 carry out a window framing process on frame-by-frame divided signals so that frequency resolution accuracy required to perform an FFT process in a subsequent stage is obtained. A Hanning window as shown in, for example, the next [Mathematical Expression 4] may be used as the window function.
w(n)=0.5−0.5·cos {2πn/(L−1)} [Mathematical Expression 4]
Where L is the number of samples per frame, n expresses the sample position in a frame, that is, n=(0, 1, . . . , L−1) is established. In the window function, when the former frame is overlapped on the latter frame, the sums of the overlapped sections become equal to each other.
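The overlap property of the window can be checked numerically with the standard Hanning window w(n) = 0.5 − 0.5·cos{2πn/(L−1)}. The function names below are hypothetical.

```python
import math


def hanning_window(length):
    """Hanning window with L - 1 in the denominator of the cosine argument."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]


def overlap_sums(window, frame_interval):
    """Sum of the window and its copy shifted by one frame interval."""
    return [window[n] + window[n + frame_interval]
            for n in range(len(window) - frame_interval)]


window = hanning_window(128)
sums = overlap_sums(window, 64)
```

With L = 128 and a frame interval of 64 points, the sums of the overlapped sections stay within roughly 1.3 percent of 1. The sum would be exactly constant if the denominator were L instead of L − 1, which is a common variant of the window.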
It is assumed that the sample row obtained by processing the output of the subtractor 404 by the window framing section 417 is xX-R,n, where n is a sample number. It is assumed that the sample row obtained by processing the output of the delay device 402 by the window framing section 418 is xR,n. The sample row obtained by processing the output of the subtractor 405 by the window framing section 419 is xY-R,n.
The processes of the FFT sections 406, 407 and 408 are shown in the following [Mathematical Expression 5]. The output of the FFT section 406 is expressed by XX-R,p, the output of the FFT section 407 is expressed by XR,p and the output of the FFT section 408 is expressed by XY-R,p.
where N is the total number of frequency bins, and p is a frequency bin number.
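The body of [Mathematical Expression 5] is not reproduced in this text. Assuming that the FFT sections 406 through 408 compute the standard discrete Fourier transform without normalization, the expression would take the following form; the sign convention and the absence of a scaling factor are assumptions.

```latex
\begin{aligned}
X_{X\text{-}R,\,p} &= \sum_{n=0}^{N-1} x_{X\text{-}R,\,n}\, e^{-j 2\pi n p / N},\\
X_{R,\,p}          &= \sum_{n=0}^{N-1} x_{R,\,n}\, e^{-j 2\pi n p / N},\\
X_{Y\text{-}R,\,p} &= \sum_{n=0}^{N-1} x_{Y\text{-}R,\,n}\, e^{-j 2\pi n p / N},
\qquad p = 0, 1, \ldots, N-1 .
\end{aligned}
```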
In the process of the combiner 409, it is assumed that the real part of XX-R,p is ℜ[XX-R,p], the imaginary part thereof is ℑ[XX-R,p], the real part of XR,p is ℜ[XR,p], and the imaginary part thereof is ℑ[XR,p], and the real part of XY-R,p is ℜ[XY-R,p] and the imaginary part thereof is ℑ[XY-R,p]. The real part ℜ[XM,p] of the selection-processed output signal XM,p and the imaginary part ℑ[XM,p] thereof are obtained by the next [Mathematical Expression 6].
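The selection process of the combiner 409 can be sketched as follows, assuming that the spectrum with the greater amplitude is taken in each frequency bin. The function name and the handling of equal amplitudes are assumptions of this sketch.

```python
def select_greater_bins(spectrum_x, spectrum_y):
    """Per-bin selection between the X-direction and Y-direction null spectra."""
    selected = []
    for bin_x, bin_y in zip(spectrum_x, spectrum_y):
        # abs() of a complex number is its amplitude; the complex value of the
        # bin with the greater amplitude is passed through unchanged.
        # Ties are resolved in favor of the X-direction bin (an assumption).
        selected.append(bin_x if abs(bin_x) >= abs(bin_y) else bin_y)
    return selected


ambient = select_greater_bins([3 + 4j, 1 + 0j], [1 + 1j, 0 - 2j])
```

In this example the first bin is taken from the X-direction spectrum (amplitude 5 against about 1.41) and the second from the Y-direction spectrum (amplitude 2 against 1).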
Next, the frequency domain subtractor 24 carries out a subtraction process in the frequency domain using XR,p and XM,p with respect to all the frequencies p, and outputs a sample row xZ,n of the time domain. Hereinafter, a detailed description is given of the operations of the frequency domain subtractor 24. First, in the attenuation filter calculator 410, Hp that is the ratio of XR,p and XM,p is calculated as in the [Mathematical Expression 7]. δ is a coefficient to prevent the denominator from becoming zero.
Hp=(ℜ[XM,p]²+ℑ[XM,p]²)/(ℜ[XR,p]²+ℑ[XR,p]²+δ)
Hp=1 if Hp>1 [Mathematical Expression 7]
Next, the spectral attenuator 411 multiplies the real part ℜ[XR,p] and the imaginary part ℑ[XR,p] of XR,p by (1−Hp) as in [Mathematical Expression 8], and the real part ℜ[XZ,p] of XZ,p and the imaginary part ℑ[XZ,p] thereof are obtained. In this way, XM,p is subtracted from XR,p in the frequency domain.
ℜ[XZ,p]=(1−Hp)×ℜ[XR,p]
ℑ[XZ,p]=(1−Hp)×ℑ[XR,p] [Mathematical Expression 8]
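The processing of [Mathematical Expression 7] and [Mathematical Expression 8] can be sketched per frequency bin as follows. The function names and the default value of δ are assumptions made for this illustration.

```python
def attenuation_filter(reference, ambient, delta=1e-12):
    """Ratio Hp of ambient power to reference power, limited to 1."""
    gains = []
    for x_r, x_m in zip(reference, ambient):
        power_m = x_m.real ** 2 + x_m.imag ** 2
        power_r = x_r.real ** 2 + x_r.imag ** 2
        # delta keeps the denominator from becoming zero.
        gains.append(min(power_m / (power_r + delta), 1.0))
    return gains


def spectral_attenuate(reference, gains):
    """Scale real and imaginary parts of each reference bin by (1 - Hp)."""
    return [(1.0 - h) * x_r for x_r, h in zip(reference, gains)]


reference = [2 + 0j, 1 + 1j]
ambient = [1 + 0j, 3 + 0j]
gains = attenuation_filter(reference, ambient)
output = spectral_attenuate(reference, gains)
```

In the first bin the ambient power is one quarter of the reference power, so the bin is scaled by 0.75. In the second bin the ratio exceeds 1 and is limited to 1, so the bin is suppressed entirely.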
The IFFT section 412 performs an inverse FFT calculation of [Mathematical Expression 9] using XZ,p, and obtains a sample row xZ,n of the time domain.
The frame combiner 416 combines continuous sound waveforms by adding the overlapped frames between the former and the latter frames one after another with respect to the frame-by-frame sample rows xZ,n, and finishes combining.
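The combining of overlapped frames by addition can be sketched as follows. The function name is hypothetical, and all frames are assumed to have the same length.

```python
def overlap_add(frames, frame_interval):
    """Rebuild a continuous sample row by adding overlapped frames."""
    if not frames:
        return []
    frame_length = len(frames[0])
    total_length = frame_interval * (len(frames) - 1) + frame_length
    output = [0.0] * total_length
    for index, frame in enumerate(frames):
        start = index * frame_interval
        # Samples that fall in the overlap with a neighboring frame are summed.
        for n, value in enumerate(frame):
            output[start + n] += value
    return output


combined = overlap_add([[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]], 2)
```

With three frames of four samples and a frame interval of two, the result has eight samples, and each overlapped section holds the sum of the two frames that cover it.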
A description is given of a state where a selection process of such spectral signals is carried out, using
Power spectra may be calculated instead of the amplitude spectra in the ambient sound signal estimator 23, and the frequency filter bank may be used without carrying out the FFT process.
As described above, since, in the combiner 409, the ambient sound signals existing in both output signals of the two sets of null signal generators and the ambient sound signal existing in only either one thereof are reflected onto the output signal of the ambient sound signal estimator at the same weighting, it becomes possible to uniformly lower the side lobe (the sensitivity in the direction other than the target sound) in the output signal of the frequency domain subtractor 24 described above.
Further, in Embodiment 1, a description has been given of the case where a selection process is carried out between the spectra of the null signal in the X direction and the null signal in the Y direction. However, the present invention is not limited thereto. That is, simple addition of the two spectra may be adopted instead of the selection process.
A null is formed along the direction of 0 degrees in the X axis and the Y axis, respectively, in
Since Embodiment 1 according to the present invention, which is achieved as described above, can form a sharp beam only in the target directions including the front side direction by a microphone array composed of a small number (three) of microphones, Embodiment 1 is suitable for the purpose of being incorporated in a small-sized apparatus as shown in
A description is given of Embodiment 2 according to the present invention by use of
In the present embodiment, two types of null signals are formed by an adaptive-filter-type microphone array, respectively. In the operation of the X-direction null signal generator 221, the signal of microphone 11 is delayed by the delay device 401, the adaptive filter 244 performs filter calculations using the signal of the microphone 12 as input, and the output signal of the adaptive filter 244 and the output signal of the delay device 401 are added to each other by the adder 241. In the adaptive filter 244, the filter coefficient is continuously updated so that the output signal of the adder 241 is minimized. Similarly, in the operation of the Y-direction null signal generator 222, the signal of the microphone 13 is delayed by the delay device 403, the adaptive filter 245 performs filter calculations using the signal of the microphone 12 as input, and the output signal of the adaptive filter 245 and the output signal of the delay device 403 are added to each other by the adder 243. And, in the adaptive filter 245, the filter coefficient is continuously updated so that the output signal of the adder 243 is minimized. The configurations of the ambient sound signal estimator 23 and the frequency domain subtractor 24, which come in the subsequent stage, are similar to those of Embodiment 1.
Such an adaptive filter can be achieved by an algorithm such as the LMS (Least Mean Square) method and the learning identification method. By applying a restriction condition to the learning process of the adaptive filter, the range to follow the target sound may be restricted, or distortion of the output signal can be reduced, and as such a method, a restriction learning method of Griffiths-Jim and AMNOR (Adaptive Microphone array for NOise Reduction) method have been known.
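The coefficient update of such an adaptive filter can be sketched as follows. A normalized LMS update is used here as one possible choice and is not asserted to be the method of the embodiment; the function name, the tap count, the step size, and the synthetic test signals are all assumptions made for this illustration.

```python
import random


def nlms_null_filter(reference, delayed, taps=4, step=0.5, eps=1e-8):
    """Adapt filter weights so that the adder output is minimized."""
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    errors = []
    for x, d in zip(reference, delayed):
        history = [x] + history[:-1]
        filtered = sum(w * h for w, h in zip(weights, history))
        # Adder output: delayed signal plus adaptive filter output.
        error = d + filtered
        # Normalized update that moves the weights to reduce the adder output.
        norm = sum(h * h for h in history) + eps
        weights = [w - step * error * h / norm for w, h in zip(weights, history)]
        errors.append(error)
    return weights, errors


# Synthetic check: the delayed signal is the reference delayed by one sample.
rng = random.Random(0)
mic12 = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
mic11_delayed = [0.0] + mic12[:-1]
weights, errors = nlms_null_filter(mic12, mic11_delayed)
```

In this synthetic case the weights converge toward a filter that cancels the delayed signal (the second tap approaches −1 and the others approach 0), so the adder output approaches zero, which corresponds to a null toward the source of the common signal.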
Based on the above configuration, the X-direction null signal generator 221 and the Y-direction null signal generator 222 automatically detect the direction of the target sound on the respective axes and can continuously form a null in that direction. The respective null signals output from the X-direction null signal generator 221 and the Y-direction null signal generator 222 are combined by the combiner 409 of the ambient sound signal estimator 23. As a result, such an effect can be obtained by which a sharp beam is continuously formed only in the direction of the target sound in the output 225 of the frequency domain subtractor 24. In an actual use environment, the coefficient of the adaptive filter must be updated only while the target sound is present, which requires distinguishing the target sound from the ambient sound. One conceivable method distinguishes the two by paying attention to the frequency bias between the target sound and the ambient sound, for which the output of the FFT section can be used.
A description is given of Embodiment 3 according to the present invention with reference to
The target sound direction information section 341 shown in
In detail, the microphones 11 through 13 are disposed in the form that the pan (horizontal) direction of the image pickup section 302 corresponds to the X axis, and the tilt (vertical) direction corresponds to the Y axis. In this case, the Z axis corresponds to the image capturing direction of a camera in the default state of the image pickup section 302 (that is, in a state where the camera is not panned or tilted).
When the image pickup section 302 is moved in the horizontal direction from the default state, the image capturing direction, that is, the target sound direction moves on the X axis. That is, θx becomes a greater value than 0°. Also, when the image pickup section 302 is moved in the vertical direction from the default state, the image capturing direction, that is, the target sound direction moves on the Y axis. That is, θy becomes a greater value than 0°.
When θx and θy change, the delay times that determine the direction of the sound pickup directivity, and that are given to the delay devices τ1 and τ3 in FIG. 4, are set with reference to τ2 as in [Mathematical Expression 3]. Therefore, in the null signals output from the X-direction null signal generator 21 and the Y-direction null signal generator 22, a null can be formed that follows the image capturing direction. As a result, the null direction of the null signal output by the ambient sound signal estimator 23 can be made to coincide with the image capturing direction, and the beam direction of the beam signal output by the frequency domain subtractor 324 can likewise be made to coincide with the image capturing direction.
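The way the delay times follow the pan and tilt angles can be sketched as below. The exact form of [Mathematical Expression 3] is not reproduced in this section, so the relation used here is the common far-field plane-wave approximation, taken purely as an assumption. The function names, the sign convention, the microphone spacing and the speed of sound are likewise hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def steering_delay(theta_deg, spacing_m, tau_ref_s, c=SPEED_OF_SOUND):
    """Delay for the outer microphone of a pair, referenced to tau_ref_s.

    Assumes a plane wave arriving at angle theta_deg from the Z axis, so the
    path difference across the pair is spacing_m * sin(theta).
    """
    return tau_ref_s + spacing_m * math.sin(math.radians(theta_deg)) / c


def delays_for_direction(theta_x_deg, theta_y_deg, spacing_m, tau2_s):
    """Return (tau1, tau3) for pan angle theta_x and tilt angle theta_y."""
    tau1 = steering_delay(theta_x_deg, spacing_m, tau2_s)
    tau3 = steering_delay(theta_y_deg, spacing_m, tau2_s)
    return tau1, tau3


# Hypothetical values: 10 mm spacing, tau2 = 100 microseconds,
# camera panned by 30 degrees and not tilted.
tau1, tau3 = delays_for_direction(30.0, 0.0, 0.010, 100e-6)
```

With the camera in the default state (both angles 0°) the sketch returns τ1 = τ3 = τ2, and panning changes only τ1 while tilting changes only τ3. For the delays to remain non-negative at every angle, τ2 would have to be at least the spacing divided by the speed of sound.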
Further, the sound pickup magnification information section 343 acquires information on the zoom ratio of image pickup from the image pickup apparatus 301 and sets, in the attenuation ratio setting section 342, the degree to which the ambient sound signals are subtracted, thereby changing the level of directivity of the sound pickup apparatus. In detail, as in [Mathematical Expression 10], the level of the directivity can be adjusted by multiplying the coefficient Hp of [Mathematical Expression 7] by an attenuation ratio α.
Hp′ = α·Hp, 0 ≤ α ≤ 1 [Mathematical Expression 10]
The level of directivity can thus be adjusted. For example, narrow directivity is obtained when the attenuation ratio α is near 1, the non-directivity of the microphone 12 is obtained when α is near 0, and an intermediate directivity is obtained when α is around 0.5. Therefore, the acoustic signals picked up can be matched to the sound sources existing within the range of the image pickup screen, which has the effect of preventing ambient sounds from outside the image pickup range from being mixed in.
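A minimal sketch of this adjustment is given below. The scaling Hp′ = α·Hp follows [Mathematical Expression 10]. The linear mapping from zoom ratio to α, the zoom limits, and the function names are assumptions made only for illustration, since the embodiment does not specify how α is derived from the zoom ratio.

```python
def attenuation_ratio(zoom, zoom_min=1.0, zoom_max=10.0):
    """Map a zoom ratio to alpha in [0, 1] (hypothetical linear mapping)."""
    zoom = min(max(zoom, zoom_min), zoom_max)  # clamp to the supported range
    return (zoom - zoom_min) / (zoom_max - zoom_min)


def scale_coefficients(hp, alpha):
    """Apply Hp' = alpha * Hp to each frequency-bin coefficient."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must satisfy 0 <= alpha <= 1")
    return [alpha * h for h in hp]


# Wide angle gives alpha = 0 (non-directional), full zoom gives alpha = 1.
hp = [1.0, 0.5, 0.25]  # placeholder coefficients, not measured values
hp_wide = scale_coefficients(hp, attenuation_ratio(1.0))
hp_mid = scale_coefficients(hp, attenuation_ratio(5.5))
hp_tele = scale_coefficients(hp, attenuation_ratio(10.0))
```

Clamping the zoom ratio keeps α inside the range required by [Mathematical Expression 10] even if the image pickup apparatus reports a value outside the assumed limits.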
Also, it is not necessary to provide both of the target sound direction information section 341 and a set of the attenuation ratio setting section 342 and the sound pickup magnification information section 343. The target sound direction information section 341 may be independently provided, or only the attenuation ratio setting section 342 and the sound pickup magnification information section 343 may be provided.
In addition, although the target sound direction was set to the center in the image capturing direction of the image pickup section 302, the target sound direction may be set to the direction based on the result obtained from a calculation using parameters preset in the target sound direction information section 341 with respect to the information on the acquired image capturing direction.
In the above, the embodiments of the present invention were described. However, the present invention is not limited to the above-described embodiments, and appropriate modifications and changes can be made without departing from the essence of the present invention. Further, materials, shapes, dimensions and forms of the constituent elements can be set arbitrarily and no limitation is placed thereon.
In the above-described embodiments, a sound pickup apparatus having favorable performance to suppress ambient sounds has been achieved by forming a beam (the point of especially high sensitivity) in the target sound direction. However, the present invention can also be applied to suppress the sound only in a specified direction by using, for example, an output signal (that is, a null signal having a null (the point of especially low sensitivity) in the target sound direction as shown in
In the above-described embodiments, the three microphones 11 through 13 were disposed at right angles centering around the microphone 12. However, the arrangement of the microphones is not limited to the right angle. That is, it suffices that the axis on which the first pair (the microphones 11 and 12) is disposed and the axis on which the second pair (the microphones 12 and 13) is disposed cross each other, so that the two pairs can form nulls in different directions. In this case, although the accuracy of the beam of the output signal of the frequency domain subtractor 24 is somewhat lowered, the degree of freedom in disposing the microphones is increased. Accordingly, the configuration is effective where the arrangement of microphones is restricted, as in a small-sized terminal such as a mobile phone.
In Embodiment 1 described above, a folding-type communication terminal 1 was assumed. However, as in
In the above-described embodiments, the microphone 12 of the three microphones 11 through 13 is used as a common microphone to form a null in the X direction and the Y direction. However, a common microphone need not be provided, and a configuration may be adopted in which a null is formed separately in the X direction and the Y direction. That is, as shown in
In the above-described embodiment, a beam is formed in one certain target sound direction. However, since the direction of the target sound is determined by setting the delay time as shown in [Mathematical Expression 3], a beam may be formed in a plurality of directions.
According to the present invention, since a beam or a null can be formed only in the target sound direction by a microphone array composed of at least three microphones, it is possible to achieve a sound pickup apparatus that can be easily mounted in a small-sized terminal and has favorable performance to suppress ambient sounds.