This invention relates to a sound emitting and collecting apparatus that detects the talker direction based on a collected sound signal.
Generally, a sound emitting and collecting apparatus changes the directivity of a microphone array made up of a plurality of microphones and detects, as the arrival direction of a sound source, the sound collecting direction in which the output of the microphone array becomes the maximum.
However, the sound emitting and collecting apparatus described above has a problem: when a loudspeaker produces a sound, the produced sound is collected by the microphones, and the sound collecting direction (azimuth) of a microphone positioned near the loudspeaker is erroneously detected as the sound arrival direction.
Patent Document 1 discloses a sound emitting and collecting apparatus that, upon detecting a receiving signal from a communication destination, prevents the directivity of a microphone array from being aimed at a sound collecting area positioned near the loudspeaker that emits a sound based on the receiving signal.
However, the sound emitting and collecting apparatus shown in Patent Document 1 has a problem in that, while a loudspeaker emits a sound based on the receiving signal (produced sound signal), the apparatus cannot precisely detect the talker direction.
It is therefore an object of the invention to provide a sound emitting and collecting apparatus that can precisely detect the talker direction based on a collected sound signal even when a sound is emitted from a loudspeaker.
A sound emitting and collecting apparatus of the invention includes a sound emitting section, a plurality of sound collecting sections, a difference level calculation section, and a talker direction detection section; it emits a sound based on an emitting sound signal, collects sound from its surroundings to generate collected sound signals, and detects the talker direction based on the collected sound signals. The sound emitting section outputs an emitting sound based on the emitting sound signal. The plurality of sound collecting sections form sound collecting areas, which are set so that the emitting sound from the sound emitting section is collected equally by all of the sound collecting sections, and collect sound from the sound collecting areas to generate the collected sound signals. The difference level calculation section calculates the logarithm value of the power of each collected sound signal from the plurality of sound collecting sections and the average of these logarithm values, and subtracts the average value from the logarithm value of the power of each collected sound signal to generate difference level signals corresponding respectively to the sound collecting sections. The talker direction detection section compares the level values of the difference level signals to find the maximum value among them, and detects, as the talker direction, the direction of the sound collecting section corresponding to the difference level signal indicating the maximum value.
In this configuration, sound is collected through sound collecting areas that are set so that the sound emitted from the sound emitting section is collected equally by all of the sound collecting sections, and the collected sound signals are generated. The logarithm values of the power of the collected sound signals and their average value are calculated, and the average value is subtracted from each logarithm value to generate the difference level signals. The sound collecting direction of the sound collecting section corresponding to the difference level signal indicating the maximum value is then detected as the talker direction. Accordingly, even while the sound emitting section emits a sound, the talker direction can be detected by comparing the difference level signals and identifying the sound collecting area that indicates the maximum value.
Preferably, the talker direction detection section presets a talker sound detection threshold value for the level value of the difference level signal. When the maximum value becomes larger than the talker sound detection threshold value, the talker direction detection section detects the direction of the sound collecting section corresponding to the difference level signal indicating the maximum value as the talker direction.
Preferably, the difference level calculation section calculates the logarithm values of power of the collected sound signals and the average value of the logarithm values of power of the collected sound signals using only a low frequency component of the collected sound signal.
Accordingly, the talker direction can be detected using the low frequency component, which contains much of the frequency content of the human voice, out of the audible-range frequency components contained in the collected sound signal.
According to the invention, in a sound emitting and collecting apparatus in which a loudspeaker and a plurality of microphones are installed in one case, the talker direction can be precisely detected based on the collected sound signal even while the loudspeaker emits a sound.
A sound emitting and collecting apparatus 1 according to one embodiment of the invention will be discussed below with reference to the accompanying drawings:
The sound emitting and collecting apparatus 1 has a tubular case (not shown) that is circular in top view.
As shown in
The loudspeakers SP1 and SP2 are provided in the case roughly at the center of the sound emitting and collecting apparatus 1 in top view, and emit a sound based on an emitting sound signal S, using the areas on the upper face side and the lower face side of the case as sound emitting areas.
The microphone units MU1 to MU8 are placed so as to have 45-degree rotational symmetry about the placement position of the loudspeakers SP1 and SP2 in top view. Here, “45-degree rotational symmetry” means that when a pattern is rotated 45 degrees about the center point of rotational symmetry, it coincides with the original pattern. 45-degree rotational symmetry can also be described as 8-fold rotational symmetry.
Sound collecting directivity is set in each of the microphone units MU1 to MU8 so as to collect a sound in each of sound collecting areas MA1 to MA8 respectively. The sound collecting areas MA1 to MA8 are formed so as to have 8-fold rotational symmetry with the placement position of the loudspeakers SP1 and SP2 as the center.
With this placement, the echo paths along which a sound emitted from the loudspeakers SP1 and SP2 travels before being collected by the microphone units MU1 to MU8 through the sound collecting areas MA1 to MA8 have roughly the same length for all of the microphone units MU1 to MU8. Accordingly, the echo levels at which the microphone units MU1 to MU8 collect the sound emitted from the loudspeakers SP1 and SP2 can be made uniform.
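The equal-echo condition can be checked geometrically. The sketch below is a simplification: it assumes the echo path length is well approximated by the straight-line distance from the loudspeaker axis (the origin) to each microphone unit, whereas real echo paths also depend on the case acoustics. The radius and all helper names are hypothetical. It places n units at equal angular steps, then confirms that all distances match and that the layout maps onto itself under a rotation of 360/n degrees.

```python
import math

def unit_positions(n_units: int, radius: float) -> list[tuple[float, float]]:
    """Place n_units points at equal angular steps on a circle centred on the loudspeakers (origin)."""
    step = 2 * math.pi / n_units  # 45 degrees when n_units == 8
    return [(radius * math.cos(i * step), radius * math.sin(i * step)) for i in range(n_units)]

def distances_to_loudspeaker(points: list[tuple[float, float]]) -> list[float]:
    """Straight-line distance from the origin, used here only as a proxy for echo path length."""
    return [math.hypot(x, y) for x, y in points]

def rotate(point: tuple[float, float], angle: float) -> tuple[float, float]:
    """Rotate a point about the origin by `angle` radians."""
    x, y = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)

def maps_onto_itself(points: list[tuple[float, float]], angle: float, tol: float = 1e-9) -> bool:
    """True if every rotated point lands on some original point (rotational symmetry)."""
    return all(any(math.dist(rotate(p, angle), q) < tol for q in points) for p in points)

if __name__ == "__main__":
    layout = unit_positions(8, radius=0.1)  # hypothetical 10 cm radius
    d = distances_to_loudspeaker(layout)
    print(max(d) - min(d) < 1e-12)                     # True: equal distance for all eight units
    print(maps_onto_itself(layout, math.radians(45)))  # True: 8-fold rotational symmetry
```

Calling `unit_positions(3, ...)` with a 120-degree rotation models the equilateral-triangle variant mentioned later in the description.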
The configuration of each of the microphone units MU1 to MU8 will be discussed below by taking the microphone unit MU1 as an example. The microphone units MU1 to MU8 differ only in sound collecting area and have the same configuration.
The microphone unit MU1 has microphones MIC1 to MIC4, linear filters F1 to F4, and an adder SU1.
The microphones MIC1 to MIC4 are placed in a row along a predetermined reference plane, and each has a predetermined sound collecting directivity.
The linear filters F1 to F4 perform delay processing on the collected sound signals collected by the microphones MIC1 to MIC4. The adder SU1 combines the collected sound signals delayed by the linear filters F1 to F4. With this configuration and processing, the microphone unit MU1 as a whole is given a sound collecting directivity that realizes the sound collecting area MA1.
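As a rough illustration of delay processing in the linear filters F1 to F4 followed by combining in the adder SU1, the sketch below implements plain integer-sample delay-and-sum beamforming for a uniform linear array. This is an assumption for illustration: the text does not say that F1 to F4 are pure delays (they may be FIR filters with fractional delays or additional shaping), and the function names, the microphone spacing, and the 343 m/s sound speed are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def delay(signal: list[float], d: int) -> list[float]:
    """Delay a signal by d whole samples, zero-padding the start and keeping the length."""
    d = max(0, min(d, len(signal)))
    return [0.0] * d + list(signal[: len(signal) - d])

def steering_delays(n_mics: int, spacing: float, angle: float, fs: float) -> list[int]:
    """Integer delays that time-align a plane wave arriving from `angle` (radians from broadside).

    Microphone m is reached m * spacing * sin(angle) / c seconds after microphone 0,
    so each channel is delayed until it lines up with the latest arrival."""
    taus = [m * spacing * math.sin(angle) / SPEED_OF_SOUND for m in range(n_mics)]
    latest = max(taus)
    return [round((latest - tau) * fs) for tau in taus]

def delay_and_sum(mic_signals: list[list[float]], delays: list[int]) -> list[float]:
    """Delay each microphone signal, then add the results sample by sample (the role of adder SU1)."""
    delayed = [delay(s, d) for s, d in zip(mic_signals, delays)]
    return [sum(samples) for samples in zip(*delayed)]
```

Sounds from the steered direction add coherently (four aligned impulses sum to 4), while sounds from other directions stay misaligned and add to a lower peak, which is what gives the unit its sound collecting area.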
The adder SU1 outputs a composite signal SA1 resulting from the combining processing to the logarithm calculation section L1 (see
The logarithm calculation sections L1 to L8 calculate, according to Expression (1), the logarithm value (logarithm power) of a low frequency component contained in the composite signal SAk output from each of the microphone units MU1 to MU8. Here, k is a subscript from 1 to 8 identifying the microphone units MU1 to MU8.
Generally, the audible frequency range of human beings is from 20 Hz to 20000 Hz, and the human voice contains much of its content in the comparatively low band of 400 Hz to 4000 Hz within that range.
Accordingly, the logarithm calculation sections L1 to L8 of the sound emitting and collecting apparatus 1 use, for example, the logarithm of the signal power in the 400 Hz to 4000 Hz band of this low frequency component. The frequency components that dominate the human voice are thus used for talker direction detection, so the talker direction can be detected more precisely.
where xk indicates the signal level of the composite signal SAk (SA1 to SA8) and Pk indicates the logarithm value of the signal level (power level) of a power signal SBk (SB1 to SB8) for the composite signal SAk. k is a subscript of 1 to 8 indicating which of the microphone units MU1 to MU8 outputs the composite signal. t indicates the time. T is set according to the sampling time length of the composite signal SAk.
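Expression (1) itself is not reproduced in this text. A common form consistent with the definitions above, assumed here rather than taken from the source, is Pk = 10·log10 of the mean square of the band-limited xk over a window of T samples. The sketch keeps only the 400 Hz to 4000 Hz band by summing the DFT bins in that band (Parseval's theorem), which stands in for whatever band-pass filtering the apparatus actually uses; the function names and the power floor are hypothetical.

```python
import cmath
import math

def band_power(frame: list[float], fs: float, f_lo: float = 400.0, f_hi: float = 4000.0) -> float:
    """Mean-square power of `frame` restricted to [f_lo, f_hi] Hz, via the DFT and Parseval's theorem."""
    n = len(frame)
    total = 0.0
    for k in range(n):
        freq = min(k, n - k) * fs / n  # bins above n/2 mirror the negative frequencies
        if f_lo <= freq <= f_hi:
            bin_value = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            total += abs(bin_value) ** 2
    return total / n ** 2  # equals the mean square of the band-limited frame

def log_power_db(frame: list[float], fs: float, floor: float = 1e-12) -> float:
    """Assumed form of Expression (1): 10*log10 of the band-limited mean-square power.

    The floor (a hypothetical value) avoids log10(0) for a silent frame."""
    return 10.0 * math.log10(max(band_power(frame, fs), floor))
```

With a 160-sample window at 16 kHz, a unit-amplitude 1000 Hz tone yields a band power of 0.5 (about -3 dB), while a 100 Hz hum falls outside the band and drops to the floor. The naive DFT is O(T²) and is chosen only for readability.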
The logarithm calculation sections L1 to L8 each output the logarithm value Pk of the power level calculated according to Expression (1) mentioned above (see
The adder 10 and the amplification section 11 calculate the power level average value AV from the logarithm values Pk of the power levels based on Expression (2). More specifically, the adder 10 calculates the sum of the logarithm values Pk and outputs the result to the amplification section 11. The amplification section 11 divides this sum by the number N of composite signals SAk (N = 8 in the embodiment), thereby calculating the power level average value AV.
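Written out from the description above, the averaging performed by the adder 10 and the amplification section 11 is the arithmetic mean of the N logarithm values:

$$AV = \frac{1}{N}\sum_{k=1}^{N} P_k, \qquad N = 8$$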
The subtracters SR1 to SR8 each subtract the power level average value AV from the logarithm value Pk of the power level to generate a differential signal level Dk, as given by the following Expression (3).
[Expression 3]
$$D_k = P_k - AV \tag{3}$$
Here, Dk indicates the differential signal level.
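Expressions (2) and (3) together reduce to a few lines. In this minimal sketch, the function name is hypothetical and the input is assumed to be the eight values Pk already computed for one time window. Because AV is the mean of the Pk, the Dk always sum to zero, so a sound collecting area stands out only by exceeding the others.

```python
def differential_levels(log_powers: list[float]) -> list[float]:
    """Dk = Pk - AV for one time window, where AV is the mean of the Pk."""
    n = len(log_powers)                   # N = 8 in the embodiment
    av = sum(log_powers) / n              # adder 10 followed by amplification section 11
    return [p - av for p in log_powers]   # subtracters SR1 to SR8

# Hypothetical window: equal echo in every area, with a talker raising MA3 by 10 dB.
levels = [-20.0] * 8
levels[2] = -10.0
print(differential_levels(levels))  # MA3 -> 8.75, all others -> -1.25
```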
The maximum value detection section 12 detects the differential signal level DkM indicating the maximum value from among the differential signal levels Dk and outputs the detected differential signal level DkM to the comparator 14 (see
The comparator 14 compares a threshold value Th with the differential signal level DkM indicating the maximum value output from the maximum value detection section 12. If the differential signal level DkM is larger than the threshold value Th, the comparator 14 outputs the differential signal level DkM to the control section 20. The threshold value Th is a level at which it can be determined that a talker is speaking to the apparatus and the speech is being collected; it is set based on the differential signal level obtained when the collected sound level is as high as a predetermined level relative to the emitting sound level. On the other hand, if the differential signal level DkM is equal to or less than the threshold value Th, the comparator 14 does not output the differential signal level DkM to the control section 20. Accordingly, when a talker in any of the sound collecting areas MA1 to MA8 speaks sufficiently louder than the emitted sound, the differential signal level DkM of the sound collecting area in which the talker speaks can be used for talker direction detection.
When the control section 20 accepts the differential signal level DkM from the comparator 14, it outputs, as talker direction information, the direction information associated with the microphone unit, among the microphone units MU1 to MU8, that corresponds to the differential signal level DkM. The control section 20 maintains the detected talker direction until it newly accepts a differential signal level DkM exceeding the threshold value Th from the comparator 14.
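The chain from the maximum value detection section 12 through the comparator 14 to the control section 20, including the hold behavior, can be sketched as a small stateful class. The class name, the 6 dB threshold, and the table assigning 45 degrees per microphone unit are hypothetical values chosen for illustration; the source gives no numerical value for Th.

```python
from typing import Optional, Sequence

class TalkerDirectionDetector:
    """Maximum detection, threshold comparison and hold (roles of sections 12, 14 and 20)."""

    def __init__(self, threshold_db: float, directions_deg: Sequence[float]) -> None:
        self.threshold_db = threshold_db             # Th, assumed fixed
        self.directions_deg = list(directions_deg)   # direction associated with each unit
        self.current: Optional[float] = None         # last detected talker direction

    def update(self, log_powers: Sequence[float]) -> Optional[float]:
        av = sum(log_powers) / len(log_powers)                  # Expression (2)
        diffs = [p - av for p in log_powers]                    # Expression (3)
        k_max = max(range(len(diffs)), key=diffs.__getitem__)   # maximum value detection
        if diffs[k_max] > self.threshold_db:                    # comparator: DkM > Th
            self.current = self.directions_deg[k_max]
        return self.current  # otherwise the previous direction is held

detector = TalkerDirectionDetector(6.0, [45.0 * k for k in range(8)])
print(detector.update([-30.0] * 8))                          # None: echo alone, every Dk is 0
print(detector.update([-30.0] * 3 + [-20.0] + [-30.0] * 4))  # 135.0: talker in the fourth area
print(detector.update([-30.0] * 8))                          # 135.0: held
```

Keeping the threshold strictly positive matters here: with the echo alone, every Dk is 0, so no direction is reported.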
Accordingly, even while the loudspeakers SP1 and SP2 emit a sound based on the emitting sound signal S, the talker direction can be precisely detected based on the composite signals SAk output from the microphone units MU1 to MU8.
In the sound emitting and collecting apparatus 1 according to the embodiment, the comparator 14 compares the differential signal level DkM with the threshold value Th by way of example. However, the invention is not limited to this example. For example, instead of using the comparator 14, the differential signal level DkM indicating the maximum value may be output directly to the control section 20 at predetermined time intervals for detecting the talker direction.
As another method of detecting the talker direction, one could compare the signal level of the emitting sound signal S with the signal levels xk of the composite signals SA1 to SA8 and detect the talker position based on the difference between them. In this case, however, when no sound is emitted, the value of the emitting sound signal S becomes 0. If calculation is then performed using the emitting sound signal level “0” as the reference level, a large calculation error easily occurs, and problems may arise in signal processing. A further problem is that the emitting sound signal and the collected sound signal differ in noise characteristics, so simply comparing the two makes it difficult to detect the talker direction accurately.
On the other hand, in the sound emitting and collecting apparatus 1, the differential signal level Dk is calculated by subtracting the power level average value AV of the logarithm values from the logarithm value Pk of the power level, as shown in Expression (3). The differential signal level Dk can therefore be calculated without using the signal level of the emitting sound signal S in the calculation, and the talker direction can be detected accurately based only on the signal levels xk of the composite signals SA1 to SA8. In addition, because Expression (3) operates on logarithm values, the differential signal level Dk is a simple difference between the logarithm value Pk of the power level and the power level average value AV. This gives the further advantage that the threshold value Th can be set as a fixed value, and the talker direction can be detected using this fixed threshold value.
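One way to see why a fixed threshold works follows from Expression (3) itself. Assuming Pk is 10·log10 of a power, Dk = Pk − AV equals 10·log10 of the ratio between the k-th power and the geometric mean of all N powers. A gain applied equally to every channel therefore cancels, and Dk depends only on how the channels compare with one another. The sketch below checks this numerically; the function names and power values are illustrative only.

```python
import math

def differential_levels_db(powers: list[float]) -> list[float]:
    """Dk computed from linear powers, assuming Pk = 10*log10(power)."""
    logs = [10.0 * math.log10(p) for p in powers]
    av = sum(logs) / len(logs)
    return [x - av for x in logs]

def geometric_mean(values: list[float]) -> float:
    """N-th root of the product of N positive values."""
    return math.prod(values) ** (1.0 / len(values))

powers = [1e-3, 2e-3, 5e-2, 1e-3, 1e-3, 1e-3, 1e-3, 1e-3]   # illustrative values only
quiet = differential_levels_db(powers)
loud = differential_levels_db([100.0 * p for p in powers])   # same scene, +20 dB overall gain
print(max(abs(a - b) for a, b in zip(quiet, loud)) < 1e-9)   # True: a common gain cancels
```

The sketch covers only a gain shared by all channels. An additive echo that is equal in every channel is not removed exactly, but it raises all Pk together and so mainly lowers the contrast of the talker channel rather than moving its maximum elsewhere.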
In the embodiment, the threshold value Th is fixed by way of example. However, the invention is not limited to this example. For example, a plurality of threshold values may be stored in the comparator 14 in advance, and the threshold value Th may be switched according to the use environment of the sound emitting and collecting apparatus 1.
Next, a specific example of talker direction detection of the sound emitting and collecting apparatus 1 will be discussed with
In a time zone I shown in
In a time zone II shown in
In this case, as shown in
In a time zone III shown in
In a time zone IV shown in
In this case, as shown in
By performing such processing, the talker direction can be reliably detected regardless of the sound emitting state of the loudspeakers SP1 and SP2. If the talker direction cannot be detected because of the level of the sound emitted from the loudspeakers, the immediately preceding talker direction is maintained. The talker direction therefore neither disappears nor changes at random, and the direction most likely to be the talker direction is kept unchanged.
In the embodiment described above, the microphone units MU1 to MU8 are placed in an octagon so as to have 8-fold rotational symmetry about the loudspeakers SP1 and SP2. However, the invention is not limited to this embodiment. It suffices that the echo of the sound emitted from the loudspeakers reaches all of the microphone units equally. For example, as long as the sound collecting areas are formed with rotational symmetry about the loudspeakers SP1 and SP2, the microphone units may be placed in an equilateral triangle. In this case, the sound collecting areas of the microphone units can be formed with 3-fold rotational symmetry, so that advantages similar to those of the embodiment described above can be achieved.
In the embodiment described above, the sound collecting areas MA1 to MA8 are formed with rotational symmetry about the loudspeakers SP1 and SP2 by way of example. However, the invention is not limited to this example. For example, provided that the echo of the sound emitted from the loudspeakers within a predetermined sampling time width is equal in all of the microphone units collecting the sound, the microphone units that collect sound may be switched ON and OFF for each predetermined sampling time width, or the shape of each sound collecting area may be changed. In this case, advantages similar to those of the embodiment described above can also be provided.
If the sound producing characteristic (directivity) of the loudspeakers SP1 and SP2 is variable, the sound collecting directivity of each microphone unit may be controlled in response to the change so that all microphone units obtain echoes at the same level. In other words, as long as the echo levels in all microphone units are the same, the mechanical positional relationship is not restricted.
It is to be understood that the description of the embodiment is illustrative and not restrictive. The scope of the invention is indicated by the Claims rather than by the embodiment described above. Further, all changes that fall within the metes and bounds of the Claims, or equivalents of such metes and bounds, are intended to be embraced by the Claims.
This application is based on Japanese Patent Application (No. 2007-257419) filed on Oct. 1, 2007, which is incorporated herein by reference.
Number | Date | Country | Kind |
---|---|---|---|
2007-257419 | Oct 2007 | JP | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/JP2008/067770 | 9/30/2008 | WO | 00 | 4/1/2010 |