The present invention relates to an audio conference device used, for example, in an audio conference held among a plurality of sites connected via a network and, more particularly, to an audio conference device that detects a talker's direction and collects the sound emitted from that direction.
In the prior art, a common way to hold an audio conference between remote locations is to provide an audio conference device at each site where the conference is held and to connect these devices via a network so that they transmit and receive sound signals. Various audio conference devices for such conferences have also been devised.
In the audio conference device of Patent Literature 1, sound is emitted from a speaker arranged on the ceiling based on the sound signal received via the network, sound is collected by microphones provided on the respective side surfaces so as to face a plurality of different directions, and the sound signals from the microphones are sent out via the network.
In the in-hall loudspeaker system of Patent Literature 2, the talker's direction is detected by applying a delay process to the signals collected by the respective microphones of a microphone array, and the volume of the sound emitted from the speakers near the talker is lowered.
Patent Literature 1: JP-A-8-298696
Patent Literature 2: JP-A-11-55784
However, in the audio conference device of Patent Literature 1, the microphones and the speaker are arranged close to each other, so the signals collected by the microphones contain a large amount of sound that detours around from the speaker. Therefore, when the talker's direction is determined from the collected signals of the microphones and the collected sound signal corresponding to that direction is selected, the detoured sound can cause the talker's direction to be detected incorrectly.
Likewise, in the in-hall loudspeaker system of Patent Literature 2, the talker's direction is detected by applying the delay process to collected signals that contain the detoured sound. Therefore, as with Patent Literature 1, the influence of the detoured sound cannot be removed, and in some cases the talker's direction is detected incorrectly.
Apart from the audio conference devices of Patent Literature 1 and Patent Literature 2, there is also an elongated audio conference device equipped with a microphone array, as shown in
An audio conference device 1′ shown in
When the microphone interval D0 is wide, the sound signal in a low frequency range (fLOW) obtains sufficient gain over a wide range of directions around the beam front direction, as shown in
For this reason, as shown in
Therefore, an object of the present invention is to provide an audio conference device capable of detecting a talker's direction accurately and collecting the sound emitted from that direction with a high S/N ratio.
An audio conference device of the present invention includes:
a microphone array that has a plurality of microphones aligned linearly, wherein the interval between the microphones in a center portion of the array is smaller than the interval between the microphones in both end portions of the array;
a first sound collecting beam generating portion that applies a delay process to the sound collecting signals obtained from the microphones in the center portion of the microphone array to generate a plurality of detecting sound collecting beam signals having directivity in respectively different directions;
a detecting portion that detects a talker's direction by comparing the plurality of detecting sound collecting beam signals; and
a second sound collecting beam generating portion that generates a sound collecting beam signal having directivity in the detected talker's direction by applying a predetermined delay process to the sound collecting signals obtained from all the microphones of the microphone array.
In this configuration, the plurality of microphones in the microphone array are aligned linearly such that the interval between a predetermined number of microphones in the center portion of the group is smaller than the interval between the microphones in the two end portions that sandwich the center portion. The detecting sound collecting beam signals generated by the first sound collecting beam generating portion are produced only from the sound collecting signals of the microphones in the center portion. Therefore, as shown in
The detecting portion compares the signal intensities of the generated detecting sound collecting beam signals, extracts the one with the highest signal intensity, and obtains the corresponding direction. The detecting portion passes the detected direction to the second sound collecting beam generating portion, which then generates, from the sound collecting signals of all the microphones, the sound collecting beam signal whose directivity is strongest in the detected direction, and outputs it. When all the microphones are used in this manner, as shown in
Also, in the audio conference device of the present invention, the first sound collecting beam generating portion has a band-pass filter whose pass band is a particular frequency band determined based on the talker's voice, and it generates the plurality of detecting sound collecting beam signals from the sound collecting signals that have passed through the band-pass filter.
In this configuration, the band-pass filter eliminates noise components other than the talker's voice, so the first sound collecting beam generating portion receives substantially only the sound emitted by the talker. As a result, the signal intensity of each detecting sound collecting beam signal depends substantially only on the talker's voice, and the talker's direction can be detected more precisely.
According to the present invention, an audio conference device can be implemented that reliably detects the talker's direction and collects the sound emitted from that direction at a high level.
An audio conference device according to an embodiment of the present invention will be explained hereinafter with reference to the drawings.
The audio conference device 1 of the present embodiment is constructed by providing a plurality of speakers SP1 to SP16, a plurality of microphones MIC101 to MIC116 and MIC201 to MIC216, and the function portions shown in
The case 2 is a substantially rectangular parallelepiped elongated in one direction. Foot portions 3 of a predetermined height, which keep the lower surface of the case 2 a predetermined distance away from the mounting surface, are provided at both ends of the long side of the case 2. In the following explanation, the longer of the four side surfaces of the case 2 are called long surfaces, and the shorter ones are called short surfaces.
Non-directional single-body speakers SP1 to SP16 of identical shape are provided on the lower surface of the case 2 and are arranged linearly at a predetermined interval along the longitudinal direction.
Microphones MIC101 to MIC116 of identical specifications are aligned linearly along the longitudinal direction on one long surface of the case 2, and this group of microphones constitutes a microphone array. In the center portion of the array in the alignment (longitudinal) direction, from the microphone MIC104 to the microphone MIC113, the microphone interval is set to D1 (< D0, the microphone interval in the prior art), while in both end portions, from MIC101 to MIC104 and from MIC113 to MIC116, the interval is set to D2 (> D0 > D1). That is, the ten microphones in the center portion are aligned densely, whereas the three microphones at each end are aligned coarsely.
Microphones MIC201 to MIC216 of identical specifications are aligned on the other long surface of the case 2 at positions opposite the microphones MIC101 to MIC116. Specifically, the microphone interval is set to D1 (< D0, the microphone interval in the prior art) from MIC204 to MIC213 in the center portion in the alignment (longitudinal) direction, and to D2 (> D0 > D1) from MIC201 to MIC204 and from MIC213 to MIC216 in both end portions.
In the present embodiment, each microphone array has 16 microphones. However, the number of microphones is not limited to this and may be set appropriately according to the specifications.
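The center-dense layout of MIC101 to MIC116 can be sketched as follows. The source gives no values for D1 or D2, so the 3 cm and 8 cm below are hypothetical, as is the 343 m/s speed of sound. The second helper applies the standard half-wavelength rule (a uniform spacing below λ/2 produces no grating lobes when the beam is steered anywhere within ±90°); under these assumed spacings it suggests why the dense center portion, rather than the coarse end portions, suits direction detection in the 2 kHz to 3 kHz band.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def mic_positions(d1, d2, n_center=10, n_end=3):
    """Return x-coordinates (m) for one array, MIC101..MIC116 by default.

    The n_end microphones at each end are spaced d2 from their neighbours,
    and the n_center central microphones (MIC104..MIC113) are spaced d1 apart.
    """
    gaps = [d2] * n_end + [d1] * (n_center - 1) + [d2] * n_end
    xs = [0.0]
    for gap in gaps:
        xs.append(xs[-1] + gap)
    return xs


def alias_free_max_freq(spacing, c=SPEED_OF_SOUND):
    """Highest frequency (Hz) at which `spacing` is still below half a wavelength."""
    return c / (2.0 * spacing)


xs = mic_positions(d1=0.03, d2=0.08)       # hypothetical D1 = 3 cm, D2 = 8 cm
print(len(xs), round(xs[-1], 2))           # 16 0.75  (microphones, aperture in m)
print(round(alias_free_max_freq(0.03)))    # 5717  (dense centre spacing)
print(round(alias_free_max_freq(0.08)))    # 2144  (coarse end spacing)
```

With these assumed numbers the end spacing alone would alias in the upper part of the 2 kHz to 3 kHz band, while the center spacing would not; actual values of D1 and D2 would need to be checked the same way.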
Next, as shown in
The input/output I/F 12 converts the input sound signal, received from another sound emitting device via the input/output connector 11, from the network-compatible data format (protocol) into a predetermined sound data format, and sends the converted signal to the directivity-of-emitted-sound controlling portion 13 via the echo canceling portion 20. The input/output I/F 12 also converts the output sound signal generated by the echo canceling portion 20 into the network-compatible data format (protocol) and sends it out to the network via the input/output connector 11.
The directivity-of-emitted-sound controlling portion 13 applies to the input sound signal the delay processing, amplitude processing, and so on specific to each of the speakers SP1 to SP16 of the speaker array, according to the specified directivity of the emitted sound, thereby producing individual emitted sound signals. It outputs these individual emitted sound signals to the D/A converters 14 provided for the respective speakers SP1 to SP16. The D/A converters 14 convert the individual emitted sound signals into analog form and output them to the sound emitting amplifiers 15, which amplify them and feed them to the speakers SP1 to SP16.
The speakers SP1 to SP16 convert the fed individual emitted sound signals into sound and emit it. Because the speakers SP1 to SP16 are mounted on the lower surface of the case 2, the emitted sound is reflected by the top surface of the desk on which the audio conference device 1 is placed and then propagates obliquely upward toward the conferees seated at the audio conference device.
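The per-speaker processing in the directivity-of-emitted-sound controlling portion 13 can be sketched as one delay and one gain per channel. This is only an illustration: the source does not say which directivity is set, so steering the wavefront toward an angle along the array axis, the 5 cm speaker spacing, the 48 kHz sample rate, and the raised-cosine gain taper are all assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def speaker_drive_params(n_speakers, spacing, theta_deg, fs, c=SPEED_OF_SOUND):
    """(delay in samples, gain) for each speaker of a uniform linear array.

    Delays tilt the emitted wavefront toward theta_deg (0 = broadside);
    the raised-cosine gain taper is an illustrative way to lower side lobes.
    """
    s = math.sin(math.radians(theta_deg))
    lag = [i * spacing * s / c for i in range(n_speakers)]   # relative firing times (s)
    base = min(lag)
    params = []
    for i, t in enumerate(lag):
        gain = 0.5 - 0.5 * math.cos(2 * math.pi * (i + 1) / (n_speakers + 1))
        params.append((round((t - base) * fs), gain))
    return params


def individual_signals(x, params):
    """Apply each speaker's delay and gain to the common input sound signal."""
    out = []
    for delay, gain in params:
        shifted = ([0.0] * delay + x)[: len(x)]   # delay, keep original length
        out.append([gain * v for v in shifted])
    return out


params = speaker_drive_params(16, 0.05, 30.0, 48000)      # hypothetical values
feeds = individual_signals([1.0] + [0.0] * 99, params)   # impulse in, 16 feeds out
```

Each of the 16 feeds would then go to its own D/A converter 14 and amplifier 15, as described above.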
Each of the microphones MIC101 to MIC116 and MIC201 to MIC216 may be either non-directional or directional, although directional microphones are preferable. The microphones pick up sound from outside the audio conference device 1, convert it into electric signals, and output sound collecting signals to the sound collecting amplifiers 16. The sound collecting amplifiers 16 amplify the sound collecting signals and feed them to the A/D converters 17, which convert them into digital signals and output them to the sound collecting beam generating portions 181 and 182.
The sound collecting signals picked up by the microphones MIC101 to MIC116 of the microphone array MA10 on one long surface are input into the sound collecting beam generating portion 181, and those picked up by the microphones MIC201 to MIC216 of the microphone array MA20 on the other long surface are input into the sound collecting beam generating portion 182.
The sound collecting beam generating portions 181 and 182 have the same configuration.
The sound collecting beam generating portion 181 has a band-pass filter (BPF) 810, a detecting beam generating portion 811, an outputting beam generating portion 812, and an outputting beam selecting portion 813.
The band-pass filter 810 passes only a predetermined frequency component of the sound collecting signals SS104 to SS113 picked up by the microphones MIC104 to MIC113 and outputs it to the detecting beam generating portion 811. The predetermined frequency component is a particular frequency component of the human voice; in the present embodiment, a high frequency range (2 kHz to 3 kHz) in which the voice energy is relatively small is set as the pass band.
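One way to realize the band-pass filter 810 is a single second-order (biquad) section. The sketch assumes the band-pass design from the Audio EQ Cookbook (0 dB peak gain), with the center placed at the geometric mean of 2 kHz and 3 kHz and Q chosen so that the -3 dB band spans roughly 2 kHz to 3 kHz; the filter type and the 16 kHz sample rate are assumptions, not details from the source.

```python
import math


def bandpass_biquad(fs, f_lo=2000.0, f_hi=3000.0):
    """Normalised biquad coefficients (b0, b1, b2), (a1, a2) for a band-pass filter."""
    f0 = math.sqrt(f_lo * f_hi)            # centre frequency, about 2449 Hz
    q = f0 / (f_hi - f_lo)                 # bandwidth f0 / Q = 1 kHz (before warping)
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (-2 * math.cos(w0) / a0, (1 - alpha) / a0)
    return b, a


def apply_biquad(x, b, a):
    """Direct-form I filtering of a list of samples."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


fs = 16000                                  # hypothetical sample rate
b, a = bandpass_biquad(fs)


def tone(f, n=4000):
    return [math.sin(2 * math.pi * f * i / fs) for i in range(n)]


def rms(v):
    return math.sqrt(sum(s * s for s in v) / len(v))


in_band = rms(apply_biquad(tone(2500.0), b, a)[2000:])   # inside the pass band
low = rms(apply_biquad(tone(200.0), b, a)[2000:])        # low-frequency component
```

A single biquad rolls off only gently outside the band; a steeper filter would reject more low-frequency energy at higher cost.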
The detecting beam generating portion 811 applies delay-and-sum control to the sound collecting signals SS104 to SS113 that are picked up by the center microphones MIC104 to MIC113 and passed through the band-pass filter 810. Thus, as shown in
The outputting beam generating portion 812 applies delay-and-sum control to the sound collecting signals SS101 to SS116, thereby generating sound collecting beam signals MB101′ to MB114′ that have sharp directivity in the same directions as the sound collecting beam signals MB101 to MB114, respectively.
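A minimal delay-and-sum sketch of the detecting beams follows. It assumes plane waves, whole-sample delays, a hypothetical 3 cm spacing for the ten center microphones, a 48 kHz sample rate, and fourteen beam directions from -65° to +65°; none of these values come from the source. The 2.5 kHz test tone already lies inside the 2 kHz to 3 kHz pass band, so the filtering step is left out. Running the same `delay_and_sum` over all sixteen signals SS101 to SS116 would correspond to the outputting beams.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def delay_and_sum(signals, xs, theta_deg, fs, c=SPEED_OF_SOUND):
    """Steer a linear array toward theta_deg (0 = broadside) and average the channels.

    A plane wave from theta_deg reaches a microphone at position x earlier by
    x * sin(theta) / c, so each channel is delayed by that lead (rounded to whole
    samples, measured from the latest-arriving microphone) before summing.
    """
    s = math.sin(math.radians(theta_deg))
    lead = [x * s / c for x in xs]
    base = min(lead)
    delays = [round((t - base) * fs) for t in lead]
    n = len(signals[0])
    out = [0.0] * n
    for sig, k in zip(signals, delays):
        for i in range(k, n):
            out[i] += sig[i - k]
    return [v / len(signals) for v in out]


def beam_energy(beam, skip=64):
    """Mean-square level, ignoring start-up samples that the delays leave partly empty."""
    tail = beam[skip:]
    return sum(v * v for v in tail) / len(tail)


# Hypothetical test: ten centre microphones 3 cm apart, a 2.5 kHz tone from +35 degrees.
fs, f, source_deg = 48000, 2500.0, 35.0
xs = [i * 0.03 for i in range(10)]
leads = [x * math.sin(math.radians(source_deg)) / SPEED_OF_SOUND for x in xs]
mics = [[math.sin(2 * math.pi * f * (n / fs + t)) for n in range(2048)] for t in leads]

angles = list(range(-65, 66, 10))                 # 14 detecting beam directions
energies = {a: beam_energy(delay_and_sum(mics, xs, a, fs)) for a in angles}
best_angle = max(energies, key=energies.get)
```

Rounding delays to whole samples keeps the sketch short; fractional-delay filters would align the channels more exactly at higher frequencies.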
When direction data MS is input from the sound collecting beam selecting portion 19 described later, the outputting beam selecting portion 813 selects, from the sound collecting beam signals MB101′ to MB114′, the one corresponding to that direction, and outputs it as an output sound collecting beam signal MB100 to the sound collecting beam selecting portion 19.
The sound collecting beam generating portion 182 has the same configuration as the sound collecting beam generating portion 181. It generates sound collecting beam signals MB201 to MB214 from the sound collecting signals SS201 to SS216 (not shown) fed from the microphones MIC201 to MIC216 and outputs them to the sound collecting beam selecting portion 19. It also outputs an output sound collecting beam signal MB200 to the sound collecting beam selecting portion 19 based on the direction data MS from the sound collecting beam selecting portion 19.
The sound collecting beam selecting portion 19 compares the signal intensities of the sound collecting beam signals MB101 to MB114 and MB201 to MB214 and detects the sound collecting beam signal with the highest signal intensity. It then feeds the direction data MS corresponding to the detected sound collecting beam signal to the sound collecting beam generating portions 181 and 182.
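The comparison performed by the sound collecting beam selecting portion 19 amounts to an arg-max over the detecting beams of both arrays. In this sketch, mean-square level stands in for "signal intensity", and the (array, beam index) pair is a hypothetical encoding of the direction data MS; the toy beam signals are made up.

```python
def beam_energy(beam):
    """Mean-square level of one beam signal (a list of samples)."""
    return sum(v * v for v in beam) / len(beam)


def select_direction(beams_a, beams_b):
    """Return ("A" or "B", index) of the loudest beam across both arrays.

    beams_a and beams_b stand for MB101..MB114 and MB201..MB214.
    """
    candidates = [("A", i, beam_energy(b)) for i, b in enumerate(beams_a)]
    candidates += [("B", i, beam_energy(b)) for i, b in enumerate(beams_b)]
    side, index, _ = max(candidates, key=lambda c: c[2])
    return side, index


beams_a = [[0.1] * 8, [0.3] * 8]
beams_b = [[0.2] * 8, [0.9, -0.9] * 4]
print(select_direction(beams_a, beams_b))  # ('B', 1)
```

In this encoding, the returned index would then pick the matching outputting beam in the outputting beam selecting portion 813.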
Then, the sound collecting beam selecting portion 19 outputs the output sound collecting beam signal MB100 from the sound collecting beam generating portion 181 and the output sound collecting beam signal MB200 from the sound collecting beam generating portion 182 to the echo canceling portion 20 as an output sound collecting beam signal MB.
With this configuration, the audio conference device 1 of the present embodiment has the following advantages.
As shown in
Then, as shown in
In this manner, with the arrangement of the present embodiment, the talker's direction can be detected precisely, and the talker's voice can be collected by a beam with sharp directivity.
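The claim that the full array gives the sharper beam can be checked numerically. The sketch below computes the half-power width of an equal-weight delay-and-sum beam steered to broadside, first for the ten center microphones and then for all sixteen. The spacings (D1 = 3 cm, D2 = 8 cm), the 1 kHz test frequency, and the 343 m/s speed of sound are assumptions; with them, the longer aperture of the full array yields a noticeably narrower main lobe.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def array_gain(xs, f, steer_deg, look_deg, c=SPEED_OF_SOUND):
    """Normalised response of an equal-weight delay-and-sum beam.

    The beam is steered to steer_deg; the plane wave arrives from look_deg.
    """
    k = 2 * math.pi * f / c
    du = math.sin(math.radians(look_deg)) - math.sin(math.radians(steer_deg))
    return abs(sum(cmath.exp(1j * k * x * du) for x in xs)) / len(xs)


def half_power_width(xs, f, steer_deg=0.0, step=0.1):
    """Width (degrees) of the main lobe over which power stays at or above one half."""
    angle = steer_deg
    while angle < 90 and array_gain(xs, f, steer_deg, angle) ** 2 >= 0.5:
        angle += step
    right = angle
    angle = steer_deg
    while angle > -90 and array_gain(xs, f, steer_deg, angle) ** 2 >= 0.5:
        angle -= step
    return right - angle


gaps = [0.08] * 3 + [0.03] * 9 + [0.08] * 3      # hypothetical D2, D1, D2 (m)
xs_all = [sum(gaps[:i]) for i in range(16)]     # MIC101..MIC116
xs_center = xs_all[3:13]                        # MIC104..MIC113
width_center = half_power_width(xs_center, 1000.0)
width_all = half_power_width(xs_all, 1000.0)
```

The dense center microphones still contribute most of the weight, so the full array is effectively tapered; the exact widths depend on the real D1 and D2.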
The echo canceling portion 20 has echo cancellers 21 to 23, which are provided independently and connected in series. That is, the output sound collecting beam signal MB from the sound collecting beam selecting portion 19 is input into the echo canceller 21, the output of the echo canceller 21 is input into the echo canceller 22, the output of the echo canceller 22 is input into the echo canceller 23, and the output of the echo canceller 23 is input into the input/output I/F 12.
The echo canceller 21 has an adaptive filter 211 and a post processor 212. Although not shown, the echo cancellers 22 and 23 have the same configuration as the echo canceller 21, with adaptive filters 221 and 231 and post processors 222 and 232, respectively.
In response to an input sound signal S1, the adaptive filter 211 of the echo canceller 21 generates a pseudo regression sound signal (an estimate of the echo) based on the set directivity of the emitted sound and the directivity of the collected sound of the selected sound collecting beam signal MB. The post processor 212 subtracts this pseudo regression sound signal from the sound collecting beam signal output from the sound collecting beam selecting portion 19 and outputs the result to the post processor 222 of the echo canceller 22.
In response to an input sound signal S2, the adaptive filter 221 of the echo canceller 22 likewise generates a pseudo regression sound signal based on the same directivities. The post processor 222 subtracts it from the first subtraction signal output from the post processor 212 of the echo canceller 21 and outputs the result to the post processor 232 of the echo canceller 23.
In response to an input sound signal S3, the adaptive filter 231 of the echo canceller 23 likewise generates a pseudo regression sound signal based on the same directivities. The post processor 232 subtracts it from the second subtraction signal output from the post processor 222 of the echo canceller 22 and outputs the result to the input/output I/F 12 as the output sound signal. When only one input sound signal is present, only one of the echo cancellers 21 to 23 operates; when two input sound signals are present, two of them operate.
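The series connection of echo cancellers can be sketched with one adaptive FIR filter per input sound signal. The source does not name the adaptation algorithm; normalized LMS (NLMS) is assumed here, each stage adapts on its own residual, and the tap count, step size, echo paths, and noise-like input signals are all hypothetical. Only two stages are active, matching the case of two input sound signals.

```python
import random


class NlmsEchoCanceller:
    """One stage: an adaptive FIR filter plus the subtraction done by the post processor."""

    def __init__(self, taps=8, mu=0.05, eps=1e-6):
        self.w = [0.0] * taps          # estimated echo path
        self.x = [0.0] * taps          # recent far-end samples, newest first
        self.mu, self.eps = mu, eps

    def process(self, far_end_sample, in_sample):
        self.x = [far_end_sample] + self.x[:-1]
        pseudo_echo = sum(w * x for w, x in zip(self.w, self.x))
        residual = in_sample - pseudo_echo
        norm = self.eps + sum(x * x for x in self.x)
        step = self.mu * residual / norm   # NLMS update (assumed algorithm)
        self.w = [w + step * x for w, x in zip(self.w, self.x)]
        return residual


def cascade(stages, far_end_signals, beam_signal):
    """Pass each sample through the stages in order, like cancellers 21 -> 22 -> 23."""
    out = []
    for n, y in enumerate(beam_signal):
        for stage, far in zip(stages, far_end_signals):
            y = stage.process(far[n], y)
        out.append(y)
    return out


def fir(h, x):
    """Simulate an echo path h acting on signal x."""
    return [sum(h[k] * x[i - k] for k in range(len(h)) if i >= k) for i in range(len(x))]


rng = random.Random(0)
n = 4000
s1 = [rng.gauss(0.0, 1.0) for _ in range(n)]     # hypothetical input sound signal S1
s2 = [rng.gauss(0.0, 1.0) for _ in range(n)]     # hypothetical input sound signal S2
mic = [e1 + e2 for e1, e2 in zip(fir([0.6, 0.3, -0.1], s1), fir([0.0, 0.5, 0.2, 0.1], s2))]
residual = cascade([NlmsEchoCanceller(), NlmsEchoCanceller()], [s1, s2], mic)
```

Because each stage sees the other signal's echo as noise while adapting, a small step size is used here; a joint update driven by the final residual is another common design.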
Through this echo canceling process, echoes are eliminated appropriately, so that only the voice of the talker at the local audio conference device is sent out to the network as the output sound signal.
As described above, the arrangement of the present embodiment provides an audio conference device that detects the talker's direction precisely and outputs only the talker's voice at a high S/N ratio.
The above explanation illustrates an example in which fourteen sound collecting beam signals are generated for each microphone array, but the number of beams may be set appropriately according to the specifications.
Similarly, although the above explanation illustrates an example in which the band-pass filter is provided, a configuration without the band-pass filter may also be employed.
Also, the above explanation illustrates a method in which the detecting sound collecting beam signals MB101 to MB114 and the outputting sound collecting beam signals MB101′ to MB114′ are generated simultaneously and the appropriate beam is then selected. Alternatively, only the required outputting sound collecting beam signal may be generated from all the sound collecting signals SS101 to SS116, based on the detection result obtained from the detecting sound collecting beam signals MB101 to MB114.
Also, the sound collecting beam selecting portion 19 may select one sound collecting beam based on the amplitude of the sound collecting beam signals, or based on a time-averaged energy of the sound collecting beam signals, or the like.
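The two selection criteria mentioned above can behave differently. In the sketch, beam 0 contains a single loud click and beam 1 a steady, quieter signal; peak amplitude favors the click, while an exponentially time-averaged power favors the steady signal. The smoothing constant and the test signals are hypothetical.

```python
def peak_amplitude(beam):
    """Largest absolute sample value."""
    return max(abs(v) for v in beam)


def smoothed_energy(beam, alpha=0.01):
    """Exponentially time-averaged power at the end of the beam signal."""
    p = 0.0
    for v in beam:
        p += alpha * (v * v - p)
    return p


def select(beams, metric):
    """Index of the beam that scores highest under the given metric."""
    return max(range(len(beams)), key=lambda i: metric(beams[i]))


beams = [[1.0] + [0.0] * 199,    # one short click, e.g. a knock on the desk
         [0.5] * 200]            # steady, quieter level
print(select(beams, peak_amplitude))    # 0
print(select(beams, smoothed_energy))   # 1
```

A time average makes the selection less sensitive to short transients, at the cost of reacting more slowly when the talker changes.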
In the above explanation, the interval between the microphones MIC104 to MIC113 is constant (D1), and the interval between the microphones MIC101 to MIC104 and MIC113 to MIC116 is constant (D2). However, as long as the arrangement is not affected by side lobes of the detecting beam in the detection frequency band, all the microphone intervals may, for example, be made different.
The present invention has been explained in detail with reference to a particular embodiment. However, it is obvious to those skilled in the art that various changes and modifications can be made without departing from the spirit, scope, or intended range of the present invention.
The present invention is based upon Japanese Patent Application (Patent Application No. 2006-145697) filed on May 25, 2006; the contents of which are incorporated herein by reference.
Number | Date | Country | Kind |
---|---|---|---|
2006-145697 | May 2006 | JP | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/JP2007/060167 | 5/17/2007 | WO | 00 | 11/24/2008 |