This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2018-181307, filed on Sep. 27, 2018, the entire contents of which are incorporated herein by reference.
The embodiments discussed herein are related to a sound-source-direction determining apparatus, a sound-source-direction determining method, and a storage medium.
There are sound-source-direction determining apparatuses that determine the direction in which a sound source is located. In each of such sound-source-direction determining apparatuses, a first directional microphone is arranged to detect sound propagating in a first direction and a second directional microphone is arranged to detect sound propagating in a second direction that intersects with the first direction. If sound pressure of sound detected by the first directional microphone is greater than sound pressure of the sound detected by the second directional microphone, the sound-source-direction determining apparatus determines that the sound is sound that has propagated in the first direction. On the other hand, if sound pressure of sound detected by the second directional microphone is greater than sound pressure of the sound detected by the first directional microphone, the sound-source-direction determining apparatus determines that the sound is sound that has propagated in the second direction.
Examples of the related art documents include, for example, Japanese Laid-open Patent Publication No. 2018-40982; Watanabe et al., “Basic study on estimating the sound source position using directional microphone”, [online], [retrieved on Sep. 13, 2018], Internet (URL: http://www.cit.nihon-u.ac.jp/kouendata/No.41/2_denki/2-008.pdf); and Yamamoto Kohei, “Calculation Methods for Noise Screen Effect”, The journal of the INCE of Japan, Japan, Vol. 21, No. 3, pp. 143 to 147, 1997.
Directional microphones are larger in size and more costly than omnidirectional microphones. Thus, sound-source-direction determining apparatuses using directional microphones are undesirably larger in size and more costly than those using omnidirectional microphones.
According to an aspect of the embodiments, a sound-source-direction determining apparatus includes a microphone disposed portion having therein a first sound path having a first end and a second end and a second sound path having a first end and a second end, the first sound path having, at the first end thereof, a first opening that is open at a first flat surface, sound propagating through the first sound path from the first opening, the second sound path having, at the first end thereof, a second opening that is open at a second flat surface intersecting with the first flat surface, sound propagating through the second sound path from the second opening, a first microphone that is omnidirectional and is disposed at or in the vicinity of the second end of the first sound path, a second microphone that is omnidirectional and is disposed at or in the vicinity of the second end of the second sound path, a speaker that outputs synthesized sound, and a processor, wherein the processor updates a reference threshold such that the reference threshold increases as a sound pressure difference increases, the sound pressure difference being a difference between sound pressure of a certain frequency component of sound acquired by the first microphone and sound pressure of the certain frequency component of the sound acquired by the second microphone when the synthesized sound is output from the speaker, and determines a direction in which a sound source of sound is located, based on comparison between the reference threshold and a sound pressure difference between sound pressure of a certain frequency component of the sound acquired by the first microphone and sound pressure of the certain frequency component of the sound acquired by the second microphone when the synthesized sound is not output from the speaker.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention.
It is desirable to increase the accuracy of determining the direction of the sound source by using omnidirectional microphones, regardless of the size of a gap between a housing of an information processing terminal and a wearer of the information processing terminal.
Hereinafter, an example of a first embodiment will be described in detail with reference to the accompanying drawings.
The sound-source-direction determining apparatus 10 includes a first microphone 11, a second microphone 12, a determining unit 13, an updating unit 14, and a speaker 15. The speech translating apparatus 16 includes a first translating unit 16A and a second translating unit 16B.
Each of the first microphone 11 and the second microphone 12 is an omnidirectional microphone, and acquires sound propagating from all directions. The determining unit 13 determines a direction in which a sound source of sound acquired by the first microphone 11 and the second microphone 12 is located (hereinafter referred to as the direction of the sound source).
The updating unit 14 updates a reference threshold used when the determining unit 13 determines the direction of the sound source. Based on the direction of the sound source determined by the determining unit 13, the speech translating apparatus 16 translates a language represented by a sound signal corresponding to the sound that propagates from the direction of the sound source and is acquired by the first microphone 11 or the second microphone 12 into a certain language.
Specifically, for example, when the determining unit 13 determines that the sound source is located in a first direction which is the upward direction, the first translating unit 16A translates a language represented by a sound signal corresponding to the acquired sound into a first language (for example, English). For example, when the determining unit 13 determines that the sound source is located in a second direction which is the forward direction, the second translating unit 16B translates a language represented by a sound signal corresponding to the acquired sound into a second language (for example, Japanese). The speaker 15 outputs the language obtained as a result of the first translating unit 16A or the second translating unit 16B translating the original language, voice guidance, and the like, by using synthesized sound.
An opening 11O provided at one end of a first sound path is present at the upper surface of the housing 18. The opening 11O is an example of a first opening. The first microphone 11 is disposed at the other end of the first sound path. An arrow FR in the drawings indicates the front direction of the information processing terminal 1.
An opening 12O provided at one end of a second sound path is present at the front surface of the housing 18. The second microphone 12 is disposed at the other end of the second sound path. An arrow UP in the drawings indicates the upward direction of the information processing terminal 1.
The sound-source-direction determining apparatus 10 determines that sound whose sound source is determined to be located in the upward direction is voice uttered by the user. The sound-source-direction determining apparatus 10 then sends a sound signal corresponding to the sound to the first translating unit 16A of the speech translating apparatus 16 so that the sound is translated into the first language and the resulting voice is output from the speaker 15. The sound-source-direction determining apparatus 10 determines that sound whose sound source is determined to be located in the forward direction is voice uttered by the interaction partner. The sound-source-direction determining apparatus 10 sends a sound signal corresponding to the sound to the second translating unit 16B of the speech translating apparatus 16 so that the sound is translated into the second language and the resulting voice is output from the speaker 15.
The opening 11O that is open at the upper surface of the housing 18 is present at one end of a first sound path 11R. The first microphone 11 is disposed at the other end of the first sound path 11R.
On the other hand, in the case where the area of the front surface of the information processing terminal 1 is equal to 63 [square cm], which is an example of a size larger than the certain value, sound pressure of sound whose sound source is located in front of the information processing terminal 1 is equal to −24 [dBov]. Sound pressure of sound whose sound source is located above the information processing terminal 1 is equal to −30 [dBov]. Thus, the sound pressure difference between the sound pressure of the sound from the sound source located in front of the information processing terminal 1 and the sound pressure of the sound from the sound source located above the information processing terminal 1 is equal to 6 [dB].
That is, the sound pressure difference is larger and thus it is easier to determine the direction of the sound source in the case where the area of the front surface of the information processing terminal 1 is equal to 63 [square cm] than in the case where the area of the front surface of the information processing terminal 1 is equal to 2 [square cm]. This is because sound whose sound source is located in front of the information processing terminal 1 is sufficiently reflected if the area of the front surface is larger than the certain value.
The certain value may be, for example, 1000 times the cross-sectional area of the sound path. Specifically, in the case where the diameter of the microphone hole of the second microphone 12 is equal to 0.5 [mm], for example, and the second sound path 12R has a circular cross section whose diameter is 1 [mm], which is twice the diameter of the microphone hole of the second microphone 12, the certain value is approximately 785 [square mm]; that is, the area of the front surface may be larger than approximately 785 [square mm]. For example, the second sound path 12R may have a uniform diameter from the one end to the other end. Alternatively, the diameter of the second sound path 12R may gradually decrease from the one end toward the other end. The second sound path 12R may also have a quadrangular cross section, for example.
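For reference, the approximately 785 [square mm] figure follows directly from the stated dimensions; a minimal worked calculation, assuming the circular 1 [mm] cross section given above:

```latex
% Cross-sectional area of the second sound path 12R (circular cross section, 1 mm diameter)
A_{\text{path}} = \pi \left( \tfrac{1\,\text{mm}}{2} \right)^{2} \approx 0.785\,\text{mm}^{2}
% The certain value is 1000 times this cross-sectional area
1000 \times A_{\text{path}} \approx 785\,\text{mm}^{2} = 7.85\,\text{cm}^{2}
```

The 63 [square cm] front surface in the earlier example therefore exceeds the certain value, while the 2 [square cm] front surface does not.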
The length from the one end to the other end of the second sound path 12R may be equal to, for example, 3 [mm]. However, the length may be longer than or shorter than 3 [mm]. The second sound path 12R may be orthogonal to the front surface of the housing 18. Alternatively, the second sound path 12R and the front surface of the housing 18 may intersect at an angle other than 90 [degrees].
Sound pressures obtained by the first microphone 11 in the case where the sound source is located above the information processing terminal 1 and in the case where the sound source is located in front of the information processing terminal 1 will be described next.
The length of the upper surface of the housing 18 in the front-rear direction is short, and the area of the upper surface is less than or equal to the certain value. Thus, in the case where the sound source is located above the information processing terminal 1, the first microphone 11 acquires little reflected sound or diffracted sound and mainly acquires sound that propagates directly to the opening 11O.
Specifically, a distance between the solid line and the broken line in the vertical direction represents the sound pressure difference between the sound pressure of the sound acquired by the first microphone 11 in the case where the sound source is located above the information processing terminal 1 and the sound pressure of the sound acquired by the first microphone 11 in the case where the sound source is located in front of the information processing terminal 1. In the graph, the horizontal axis represents the frequency of the sound and the vertical axis represents the sound pressure.
A sound attenuation amount R [dB] due to diffraction is expressed by Equation (1), for example.
In Equation (1), N is a Fresnel number and is denoted by Equation (2).
N = δ/(λ/2) = δ·f/165   (2)
In Equation (2), δ denotes a path difference [m] between a diffraction path and a direct path, λ denotes a wavelength [m] of the sound, and f denotes a frequency [Hz] of the sound. Equation (2) assumes the case where the sound velocity (=λ×f) is equal to 330 [m/s]. That is, the Fresnel number N, and hence the sound attenuation amount R due to diffraction, increases as the frequency f of the sound increases.
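As a rough worked example of Equation (2), assuming a path difference δ of 0.01 [m] purely for illustration, the Fresnel number grows in proportion to the frequency of the sound:

```latex
% delta = 0.01 m is an assumed value used only for illustration
N(f = 1\,\text{kHz}) = \frac{0.01 \times 1000}{165} \approx 0.06, \qquad
N(f = 8\,\text{kHz}) = \frac{0.01 \times 8000}{165} \approx 0.48
```

Because the attenuation amount R of Equation (1) increases with N, high-frequency components of sound that reaches a microphone only by diffraction are attenuated more strongly than low-frequency components, which is the property exploited in the determination described below.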
In the case where the diameter of the microphone hole of the first microphone 11 is equal to 0.5 [mm], the first sound path 11R may have a circular cross section having a diameter of 1 mm, which is twice the diameter of the microphone hole. For example, the first sound path 11R may have a uniform diameter from the one end to the other end. Alternatively, the diameter of the first sound path 11R may gradually decrease from the one end toward the other end.
The first sound path 11R may have a diameter that gradually decreases from the one end toward the bend 11K and that is uniform from the bend 11K to the other end. Further, the first sound path 11R may have a quadrangular cross section, for example.
The length from the one end to the bend 11K of the first sound path 11R and the length from the bend 11K to the other end of the first sound path 11R may be equal to, for example, 3 [mm]. Alternatively, the lengths may be longer than or shorter than 3 [mm]. In addition, a portion from the one end to the bend 11K of the first sound path 11R may be orthogonal to the upper surface of the housing 18. Alternatively, the portion of the first sound path 11R may intersect with the upper surface of the housing 18 at an angle other than 90 [degrees]. Further, a portion from the bend 11K to the other end of the first sound path 11R may be orthogonal to the portion from the one end to the bend 11K of the first sound path 11R. Alternatively, the portions may intersect at an angle other than 90 [degrees].
Further, the first microphone 11 is surrounded by a side wall constituting the first sound path 11R and the other end of the first sound path 11R. There is no gap between the other end and the side wall of the first sound path 11R. The first microphone 11 is open in a direction toward the opening 11O. Also, the second microphone 12 is surrounded by a side wall constituting the second sound path 12R and the other end of the second sound path 12R. There is no gap between the other end and the side wall of the second sound path 12R. The second microphone 12 is open in a direction toward the opening 12O. The upper surface and the front surface of the housing 18 are orthogonal to each other. However, the first embodiment is not limited to an example in which the upper surface and the front surface of the housing 18 are orthogonal to each other. The upper surface and the front surface of the housing 18 may intersect at an angle other than 90 [degrees].
As described above, the sound pressure difference between the sound pressure of the sound acquired by the first microphone 11 and the sound pressure of the sound acquired by the second microphone 12 appears markedly in high-frequency components. Therefore, a high-frequency sound-pressure-difference calculating unit 13C calculates, as a high-frequency sound pressure difference, an average of sound pressure differences in respective frequency bands at frequencies higher than a certain frequency. A sound-source-direction determining unit 13D determines the direction of the sound source based on the high-frequency sound pressure difference calculated by the high-frequency sound-pressure-difference calculating unit 13C.
Specifically, the high-frequency sound-pressure-difference calculating unit 13C calculates spectral power pow1[bin] of the sound signal corresponding to the sound acquired by the first microphone 11, by using Equation (3). The high-frequency sound-pressure-difference calculating unit 13C calculates spectral power pow2[bin] of the sound signal corresponding to the sound acquired by the second microphone 12, by using Equation (4).
pow1[bin] = re1[bin]^2 + im1[bin]^2   (3)
pow2[bin] = re2[bin]^2 + im2[bin]^2   (4)
In Equations (3) and (4), bin=0, . . . , F−1, and F denotes the number of frequency bands and may be equal to 256, for example. In Equation (3), re1[bin] denotes the real part of the frequency spectrum of the frequency band bin, which is obtained when the sound signal of the sound acquired by the first microphone 11 is subjected to the time-frequency conversion. In addition, im1[bin] denotes the imaginary part of the frequency spectrum of the frequency band bin, which is obtained when the sound signal of the sound acquired by the first microphone 11 is subjected to the time-frequency conversion.
In Equation (4), re2[bin] denotes the real part of the frequency spectrum of the frequency band bin, which is obtained when the sound signal of the sound acquired by the second microphone 12 is subjected to the time-frequency conversion. In addition, im2[bin] is the imaginary part of the frequency spectrum of the frequency band bin, which is obtained when the sound signal of the sound acquired by the second microphone 12 is subjected to the time-frequency conversion.
Then, the high-frequency sound-pressure-difference calculating unit 13C calculates a high-frequency sound pressure difference d_pow by using Equation (5).
d_pow = ( Σ_{i=s}^{F-1} 10·log10( pow1[i] / pow2[i] ) ) / ((F−1) − s)   (5)
The high-frequency sound pressure difference d_pow is an example of a difference between a first sound pressure and a second sound pressure. The high-frequency sound pressure difference d_pow is an average of the level differences 10·log10(pow1[i]/pow2[i]) [dB] over the high-frequency bands. In Equation (5), s denotes the lower limit of the frequency band number of the high-frequency bands and may be equal to 96, for example. In the case where the sampling frequency of the sound signal is equal to 16 [kHz] and s is equal to 96, the high-frequency bands correspond to approximately 3 [kHz] to 8 [kHz].
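The computation of Equations (3) to (5) can be sketched as follows. This is a minimal illustration rather than the implementation of the apparatus; the frame length, the window, the function names, and the small constant eps (added only to avoid taking the logarithm of zero) are assumptions.

```python
import numpy as np

F = 256            # number of frequency bands (example value from the description)
S = 96             # lower limit of the high-frequency bands (about 3 kHz at 16 kHz sampling)
FRAME_LEN = 2 * F  # assumed frame length in samples (512 samples, i.e., 32 ms at 16 kHz)

def spectral_power(frame):
    """Equations (3)/(4): per-band power re[bin]^2 + im[bin]^2 of one frame."""
    windowed = frame * np.hanning(FRAME_LEN)       # window choice is an assumption
    spec = np.fft.rfft(windowed, n=FRAME_LEN)[:F]  # keep frequency bands bin = 0 .. F-1
    return spec.real ** 2 + spec.imag ** 2

def high_frequency_sound_pressure_difference(frame1, frame2, eps=1e-12):
    """Equation (5): average of 10*log10(pow1[i]/pow2[i]) over the high bands i = s .. F-1."""
    pow1 = spectral_power(frame1)   # sound acquired by the first microphone 11
    pow2 = spectral_power(frame2)   # sound acquired by the second microphone 12
    ratios_db = 10.0 * np.log10((pow1[S:] + eps) / (pow2[S:] + eps))
    return float(np.sum(ratios_db) / ((F - 1) - S))   # divisor (F-1)-s, as in Equation (5)
```

Under the convention of Equation (5), a larger d_pow means that the high-frequency component acquired through the opening 11O at the upper surface is stronger relative to that acquired through the opening 12O at the front surface.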
The sound-source-direction determining unit 13D compares the high-frequency sound pressure difference d_pow with a reference threshold. If the high-frequency sound pressure difference d_pow is greater than the reference threshold, the sound-source-direction determining unit 13D determines that the sound source is located at a position facing the upper surface of the housing 18, that is, above the housing 18. If the high-frequency sound pressure difference d_pow is equal to or less than the reference threshold, the sound-source-direction determining unit 13D determines that the sound source is located at a position facing the front surface of the housing 18, that is, in front of the housing 18.
When the high-frequency sound pressure difference d_pow is determined, the spectral power for the second microphone 12, for which the opening 12O is provided at the front surface of the housing 18, is used as the reference in Equation (5). However, as indicated by Equation (6), the high-frequency sound pressure difference d_pow may instead be determined by using, as the reference, the spectral power for the first microphone 11, for which the opening 11O is provided at the upper surface of the housing 18. In that case, the comparison used for the determination is reversed as follows.
d_pow = ( Σ_{i=s}^{F-1} 10·log10( pow2[i] / pow1[i] ) ) / ((F−1) − s)   (6)
The sound-source-direction determining unit 13D compares the high-frequency sound pressure difference d_pow with the reference threshold. If the high-frequency sound pressure difference d_pow is greater than the reference threshold, the sound-source-direction determining unit 13D determines that the sound source is located at a position facing the front surface of the housing 18, that is, in front of the housing 18. If the high-frequency sound pressure difference d_pow is equal to or less than the reference threshold, the sound-source-direction determining unit 13D determines that the sound source is located at a position facing the upper surface of the housing 18, that is, above the housing 18.
Note that Equations (5) and (6) used to determine the high-frequency sound pressure difference are merely examples and the first embodiment is not limited to these equations. Further, the example has been described in which the high-frequency sound pressure difference, which is a difference between sound pressure of a high-frequency component of sound acquired by the first microphone 11 and sound pressure of the high-frequency component of the sound acquired by the second microphone 12, is used. However, the first embodiment is not limited to this example.
A difference between sound pressure of a certain frequency component of sound acquired by the first microphone 11 and sound pressure of the certain frequency component of the sound acquired by the second microphone 12 may be used instead of the high-frequency sound pressure difference. The certain frequency component may be a high-frequency component or a frequency component for which the sound pressure difference appears markedly between the first microphone 11 and the second microphone 12 depending on the direction of the sound source.
The updating unit 14 updates the reference threshold. The sound pressure difference changes depending on the size of a gap between the body of a wearer and the information processing terminal 1. Thus, the direction of the sound source may be erroneously determined if a fixed threshold is used to determine the direction of the sound source. The size of the gap between the body of the wearer and the information processing terminal 1 changes depending on the posture or the like of the wearer.
The updating unit 14 updates the reference threshold based on a sound pressure difference of sound collected when synthesized sound is reproduced. In the case where a synthesized-sound output control unit 14A performs control so that synthesized sound is output from the speaker 15, the high-frequency sound pressure difference calculated by the high-frequency sound-pressure-difference calculating unit 13C is output to a reference threshold updating unit 14B instead of being output to the sound-source-direction determining unit 13D.
The reference threshold updating unit 14B updates the reference threshold such that the reference threshold increases as the sound pressure difference of the sound collected when the synthesized sound is reproduced increases. Specifically, for example, as indicated by Equation (7), the reference threshold updating unit 14B updates the reference threshold by adding, to an initial threshold TH, a value obtained by subtracting a minimum sound pressure difference DX_MIN, obtained when the synthesized sound is reproduced, from an average sound pressure difference dx of the synthesized sound interval and by multiplying the subtraction result by a correction coefficient a. The correction coefficient a varies depending on the positions of the speaker 15, the first microphone 11, and the second microphone 12, and may be experimentally determined in advance. The initial threshold TH may be equal to 0.0 [dB], for example. The minimum sound pressure difference DX_MIN may be equal to 3.0 [dB], for example. The correction coefficient a may be equal to 0.75, for example.
Reference threshold = TH + (dx − DX_MIN) × a   (7)
The calculation described above may be performed in advance, and the reference thresholds corresponding to the respective average sound pressure differences of the synthesized sound interval may be stored in a table.
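A minimal sketch of the update of Equation (7) and of the optional precomputed table, using the example constants given above (the function name and the table granularity are assumptions):

```python
TH = 0.0        # initial threshold [dB] (example value)
DX_MIN = 3.0    # minimum sound pressure difference during synthesized sound [dB] (example value)
A_COEF = 0.75   # correction coefficient a (example value; determined experimentally in advance)

def update_reference_threshold(dx):
    """Equation (7): the reference threshold grows with the average sound pressure
    difference dx of the synthesized sound interval."""
    return TH + (dx - DX_MIN) * A_COEF

# Alternatively, the mapping may be precomputed and stored as a table,
# here for dx = 3.0 dB to 8.0 dB in 0.5 dB steps (granularity assumed):
reference_threshold_table = {
    round(3.0 + 0.5 * k, 1): update_reference_threshold(3.0 + 0.5 * k) for k in range(11)
}
```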
The sound pressure differences obtained in the cases where the sound source is located above or in front of the information processing terminal 1, with and without a gap between the information processing terminal 1 and the body UB of the wearer, are considered next.
When the threshold is set to TH_C1, the sound pressure difference obtained in the case GU where the sound source is located above and there is a gap is less than the threshold TH_C1. Thus, it is erroneously determined that the corresponding sound is sound propagating from the front. On the other hand, when the threshold is set to TH_C2, which is smaller than the threshold TH_C1, the sound pressure difference obtained in the case NF where the sound source is located in front and there is no gap is greater than the threshold TH_C2. Thus, it is erroneously determined that the corresponding sound is sound propagating from above. That is, since the sound pressure of the sound acquired by the first microphone 11 changes depending on the size of the gap between the information processing terminal 1 and the body UB of the wearer, the direction of the sound source may be erroneously determined.
In the first embodiment, the reference threshold is updated by using sound collected when synthesized sound is reproduced so that the direction of the sound source is not erroneously determined because of the size of the gap between the information processing terminal 1 and the body UB of the wearer. The information processing terminal 1 is expected to reproduce synthesized sound, such as voice guidance and notifications of translation results, frequently.
Sound pressure differences were measured, for the case where there is a gap and the case where there is no gap, while five kinds of synthesized sound were reproduced and collected. The measurement confirmed that the sound pressure differences of the sound collected during reproduction of the synthesized sound differ clearly, by 3 [dB] to 5 [dB], between the case where there is a gap and the case where there is no gap. That is, the size of the gap can be determined based on the sound pressure difference of sound collected when synthesized sound is reproduced.
Thus, in the first embodiment, the reference threshold is updated by using Equation (7), for example, such that the reference threshold increases as the average sound pressure difference dx of the synthesized sound interval increases.
The CPU 51, the primary storage unit 52, the secondary storage unit 53, the external interface 54, the first microphone 11, the second microphone 12, and the speaker 15 are connected to each other via a bus 59.
The primary storage unit 52 is, for example, a volatile memory such as a random access memory (RAM).
The secondary storage unit 53 includes a program storage area 53A and a data storage area 53B. The program storage area 53A stores, by way of example, programs such as a sound-source-direction determining program and a speech translating program. The sound-source-direction determining program causes the CPU 51 to execute the sound-source-direction determining process. The speech translating program causes the CPU 51 to execute a speech translating process based on the determination result obtained in the sound-source-direction determining process. The data storage area 53B stores sound signals corresponding to sound acquired by the first microphone 11 and the second microphone 12, intermediate data temporarily generated in the sound-source-direction determining process and the speech translating process, and so forth.
The CPU 51 reads out the sound-source-direction determining program from the program storage area 53A and loads the sound-source-direction determining program to the primary storage unit 52. The CPU 51 executes the sound-source-direction determining program to operate as the determining unit 13 and the updating unit 14.
An external device is connected to the external interface 54. The external interface 54 manages transmission and reception of various kinds of information performed between the external device and the CPU 51. For example, the speaker 15 may be an external device that is connected via the external interface 54, instead of being included in the information processing terminal 1.
An overview of an operation performed by the information processing terminal 1 will be described next.
In step 102, the CPU 51 performs time-frequency conversion on each of the sound signals read in step 101. In step 103, the CPU 51 calculates the spectral power of each of the sound signals subjected to the time-frequency conversion by using Equations (3) and (4), and calculates the high-frequency sound pressure difference d_pow by using Equation (5).
In step 104, the CPU 51 determines whether or not the sound signals read in step 101 are sound signals of a synthesized sound interval. Since synthesized sound is output under the control of the CPU 51, the CPU 51 may make this determination based on whether or not it is currently causing synthesized sound to be output.
If the determination in step 104 is YES, the CPU 51 cumulatively adds the high-frequency sound pressure difference d_pow in step 107. The process then returns to step 101. If the determination in step 104 is NO, the CPU 51 determines whether or not the previous frame is in the synthesized sound interval in step 108.
If the determination in step 108 is YES, the CPU 51 calculates in step 109 the average sound pressure difference dx by dividing the cumulative sum of the high-frequency sound pressure difference d_pow calculated in step 107 by the number of frames of the synthesized sound interval for which the cumulative addition has been performed. The CPU 51 updates the reference threshold based on the average sound pressure difference dx by using, for example, Equation (7). The process then proceeds to step 110. If the determination in step 108 is NO, the CPU 51 does not update the reference threshold. The process then proceeds to step 110.
In step 110, the CPU 51 determines whether or not the sound signals read in step 101 are sound signals of an utterance interval. An existing utterance interval determining technique may be used to determine whether or not the target interval is an utterance interval.
If the determination made by the CPU 51 in step 110 is NO, the process returns to step 101. If the determination in step 110 is YES, the CPU 51 compares in step 111 the high-frequency sound pressure difference d_pow calculated in step 103 with the reference threshold updated in step 109. If the high-frequency sound pressure difference d_pow is greater than the reference threshold, the CPU 51 determines that the sound source is located above the information processing terminal 1. The process then proceeds to step 112. In step 112, the CPU 51 distributes the sound signals to a process of translating a second language into a first language. The process then proceeds to step 114. The distributed sound signals are translated from the second language into the first language by using an existing speech translation processing technology. The result is output as voice from the speaker 15, for example.
If the CPU 51 determines in step 111 that the high-frequency sound pressure difference d_pow is equal to or less than the reference threshold, the CPU 51 determines that the sound source is located in front of the information processing terminal 1. In step 113, the CPU 51 distributes the sound signals to a process of translating the first language into the second language. The process then proceeds to step 114. The distributed sound signals are translated from the first language into the second language by using an existing speech translation processing technology. The result is output as voice from the speaker 15, for example.
In step 114, the CPU 51 determines whether or not the sound-source-direction determining function of the information processing terminal 1 is turned off by a user operation, for example. If the determination in step 114 is NO, that is, if the sound-source-direction determining function is ON, the process returns to step 101. In step 101, the CPU 51 reads sound signals of the next frame and continues the sound-source-direction determining process. If the determination in step 114 is YES, that is, if the sound-source-direction determining function is OFF, the CPU 51 ends the sound-source-direction determining process.
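The per-frame flow of steps 101 to 114 can be summarized by the following sketch, which reuses the helpers high_frequency_sound_pressure_difference, update_reference_threshold, and TH from the sketches above. The callables read_frame, is_synthesized_interval, is_utterance_interval, translate_second_to_first, translate_first_to_second, and direction_function_on are hypothetical placeholders, not part of the described apparatus.

```python
def sound_source_direction_loop(read_frame, is_synthesized_interval, is_utterance_interval,
                                translate_second_to_first, translate_first_to_second,
                                direction_function_on):
    reference_threshold = TH      # start from the initial threshold
    d_pow_sum = 0.0               # cumulative sum of d_pow over the synthesized sound interval
    synth_frames = 0
    prev_frame_was_synth = False

    while direction_function_on():                                        # step 114
        frame1, frame2 = read_frame()                                     # step 101
        d_pow = high_frequency_sound_pressure_difference(frame1, frame2)  # steps 102-103

        if is_synthesized_interval():                                     # step 104
            d_pow_sum += d_pow                                            # step 107
            synth_frames += 1
            prev_frame_was_synth = True
            continue                                                      # back to step 101

        if prev_frame_was_synth and synth_frames > 0:                     # step 108
            dx = d_pow_sum / synth_frames                                 # step 109
            reference_threshold = update_reference_threshold(dx)
            d_pow_sum, synth_frames = 0.0, 0
        prev_frame_was_synth = False

        if not is_utterance_interval(frame1, frame2):                     # step 110
            continue

        if d_pow > reference_threshold:                  # step 111: sound source above
            translate_second_to_first(frame1, frame2)    # step 112
        else:                                            # sound source in front
            translate_first_to_second(frame1, frame2)    # step 113
```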
The case where the speech translating apparatus 16 is included in the housing 18 of the information processing terminal 1 together with the sound-source-direction determining apparatus 10 has been described. However, the first embodiment is not limited to this configuration. For example, the speech translating apparatus 16 may be located outside the housing 18 of the information processing terminal 1 and may be connected to the sound-source-direction determining apparatus 10 via a wired or wireless link.
If the high-frequency sound pressure difference d_pow is greater than the reference threshold, it is determined in step 111 that the sound source is located above the information processing terminal 1. If the high-frequency sound pressure difference d_pow is equal to or less than the reference threshold, it is determined that the sound source is located in front of the information processing terminal 1. Such an example has been described. However, the first embodiment is not limited to this example.
For example, if the high-frequency sound pressure difference d_pow is greater than the reference threshold + DT, it may be determined that the sound source is located above the information processing terminal 1. If the high-frequency sound pressure difference d_pow is less than the reference threshold − DT, it may be determined that the sound source is located in front of the information processing terminal 1. In this case, if the high-frequency sound pressure difference d_pow is equal to or less than the reference threshold + DT and is equal to or greater than the reference threshold − DT, the direction of the sound source is not determined. DT may be equal to, for example, 0.5 [dB]. This configuration may further reduce the possibility that the direction of the sound source is erroneously determined.
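A sketch of this variant of the comparison in step 111 (DT = 0.5 [dB] as in the example above; the return values are hypothetical labels):

```python
DT = 0.5  # dead band around the reference threshold [dB] (example value)

def decide_direction(d_pow, reference_threshold):
    """Three-way decision: above, in front, or undetermined."""
    if d_pow > reference_threshold + DT:
        return "above"   # sound source faces the upper surface of the housing 18
    if d_pow < reference_threshold - DT:
        return "front"   # sound source faces the front surface of the housing 18
    return None          # within the dead band: the direction is not determined
```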
In the first embodiment, a sound-source-direction determining apparatus includes a microphone disposed portion having therein a first sound path and a second sound path. The first sound path has a first opening at one end thereof. The first opening is open at a first flat surface. Sound propagates through the first sound path from the first opening. The second sound path has a second opening at one end thereof. The second opening is open at a second flat surface that intersects with the first flat surface. Sound propagates through the second sound path from the second opening. The sound-source-direction determining apparatus further includes a first microphone, a second microphone, and a speaker. The first microphone is omnidirectional and is disposed at or in the vicinity of the other end of the first sound path. The second microphone is omnidirectional and is disposed at or in the vicinity of the other end of the second sound path. The speaker outputs synthesized sound. An updating unit updates a reference threshold such that the reference threshold increases as a sound pressure difference increases. The sound pressure difference is a difference between sound pressure of a certain frequency component of sound acquired by the first microphone and sound pressure of the certain frequency component of the sound acquired by the second microphone when the synthesized sound is output from the speaker. A determining unit determines a direction in which a sound source of sound is located, based on comparison between the reference threshold and a sound pressure difference between sound pressure of a certain frequency component of the sound acquired by the first microphone and sound pressure of the certain frequency component of the sound acquired by the second microphone when the synthesized sound is not output from the speaker.
According to the first embodiment, with the above-described configuration, the accuracy of determining the direction of the sound source by using omnidirectional microphones is successfully increased, regardless of the size of a gap between the information processing terminal and the body of a wearer.
An example of a second embodiment will be described next. The description of the configuration and operation that are substantially the same as those of the first embodiment will be omitted.
In the second embodiment, the reference threshold is updated by using a sound pressure difference of synthesized sound of a frame that is less affected by noise. If sound other than synthesized sound, that is, noise is present in a synthesized sound interval, the sound pressure difference of the synthesized sound is not appropriately obtained. Consequently, the reference threshold is not appropriately updated. The noise is, for example, sound generated by an utterance of an interaction partner.
Therefore, even if the reference threshold is updated by using the sound pressure difference between the sound pressure of the sound acquired by the first microphone 11 and the sound pressure of the sound acquired by the second microphone 12 in the synthesized sound interval, an appropriate reference threshold may not be obtained.
The reference threshold updating unit 14B calculates a similarity d1 between the spectral power of the sound acquired by the first microphone 11 and the spectral power of the synthesized sound, and a similarity d2 between the spectral power of the sound acquired by the second microphone 12 and the spectral power of the synthesized sound, by using Equation (8), for example.
In Equation (8), res[bin] denotes the real part of the frequency spectrum of the frequency band bin, which is obtained when the sound signal of the synthesized sound is subjected to the time-frequency conversion. In addition, ims[bin] denotes the imaginary part of the frequency spectrum of the frequency band bin, which is obtained when the sound signal of the synthesized sound is subjected to the time-frequency conversion. The spectral power pows[bin] of the synthesized sound is calculated from res[bin] and ims[bin] in the same manner as in Equations (3) and (4). Data of the synthesized sound is stored in the data storage area 53B. Data corresponding to a frame of the synthesized sound, output of which is controlled by the synthesized-sound output control unit 14A, is used.
The similarities d1 and d2 are calculated by using all the frequency bands, that is, i = 0 to 255. However, the similarities d1 and d2 may be calculated by using frequency bands excluding a low-frequency component such as a direct-current frequency component, for example. The inner product may be used to calculate the similarities d1 and d2, as indicated by Equation (9).
d1 = Σ_{i=0}^{F-1} pow1[i]·pows[i]
d2 = Σ_{i=0}^{F-1} pow2[i]·pows[i]   (9)
The covariance may be used to calculate the similarities d1 and d2, as indicated by Equation (10), where pow1ave, pow2ave, and powsave denote the averages of pow1[i], pow2[i], and pows[i] over the frequency bands, respectively.
d1 = Σ_{i=0}^{F-1} (pow1[i] − pow1ave)(pows[i] − powsave)
d2 = Σ_{i=0}^{F-1} (pow2[i] − pow2ave)(pows[i] − powsave)   (10)
An overview of an operation performed by the sound-source-direction determining apparatus 10 included in the information processing terminal 1 will be described next.
In step 105, the CPU 51 calculates the similarity d1 between the sound collected by the first microphone 11 and the synthesized sound and the similarity d2 between the sound collected by the second microphone 12 and the synthesized sound by using, for example, Equation (8). In step 106, the CPU 51 determines whether or not both of the similarities d1 and d2 exceed a certain similarity threshold. The certain similarity threshold may be equal to, for example, 0.6.
If the determination made by the CPU 51 in step 106 is YES, the process proceeds to step 107. If the determination made by the CPU 51 in step 106 is NO, the process returns to step 101.
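A sketch of the similarity check of steps 105 and 106 is given below. Equation (8) is not reproduced above, so the normalized (correlation-coefficient-style) similarity to which the 0.6 threshold is applied here is an assumption standing in for it; the inner-product and covariance forms of Equations (9) and (10) are also shown for reference. The function names are placeholders.

```python
import numpy as np

SIMILARITY_THRESHOLD = 0.6  # example value from the description

def similarity_normalized(pow_mic, pow_synth):
    """Assumed normalized similarity (stand-in for Equation (8)); a value near 1
    means the collected spectrum closely matches the synthesized sound spectrum."""
    a = pow_mic - pow_mic.mean()
    b = pow_synth - pow_synth.mean()
    return float(np.sum(a * b) / (np.sqrt(np.sum(a * a) * np.sum(b * b)) + 1e-12))

def similarity_inner_product(pow_mic, pow_synth):
    """Equation (9): inner product of the spectral powers."""
    return float(np.sum(pow_mic * pow_synth))

def similarity_covariance(pow_mic, pow_synth):
    """Equation (10): sum of products of deviations from the per-band averages."""
    return float(np.sum((pow_mic - pow_mic.mean()) * (pow_synth - pow_synth.mean())))

def frame_usable_for_threshold_update(pow1, pow2, pows):
    """Steps 105-106: use the frame for the reference-threshold update only when the
    sound acquired by both microphones is sufficiently similar to the synthesized sound."""
    d1 = similarity_normalized(pow1, pows)
    d2 = similarity_normalized(pow2, pows)
    return d1 > SIMILARITY_THRESHOLD and d2 > SIMILARITY_THRESHOLD
```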
In the second embodiment, the updating unit calculates a similarity between the synthesized sound output from the speaker and the sound acquired by the first microphone when the synthesized sound is output from the speaker and a similarity between the synthesized sound output from the speaker and the sound acquired by the second microphone when the synthesized sound is output from the speaker. If both of the similarities exceed a similarity threshold, the updating unit updates the reference threshold such that the reference threshold increases as the sound pressure difference between sound pressures of a certain frequency component of sound acquired by the first microphone and the second microphone when the synthesized sound is output from the speaker increases.
In the second embodiment, the reference threshold may be appropriately updated by reducing the influence of the noise. Thus, the accuracy of determining the direction of the sound source by using omnidirectional microphones may be further increased, regardless of the size of a gap between the housing of the information processing terminal and the wearer of the information processing terminal.
An example of a third embodiment will be described next. The description of the configuration and operation that are substantially the same as those of the first and second embodiments will be omitted.
In the third embodiment, a first sound path 11AR has a diffraction portion, which is an example of a first diffraction portion that diffracts sound, at an opening 11AO. The first sound path 11AR also has a diffraction portion, which is a bend 11AK that diffracts sound and is an example of a second diffraction portion, midway thereof. A second sound path 12AR has a diffraction portion, which is an example of a third diffraction portion that diffracts sound, at a second opening 12AO. The second sound path 12AR also has a diffraction portion, which is a bend 12AK that diffracts sound and is an example of a fourth diffraction portion, midway thereof.
The front surface of the housing 18A of the information processing terminal 1A has an area greater than the certain value as in the first and second embodiments. The second sound path 12AR has midway thereof the bend 12AK that is a diffraction portion, unlike the first and second embodiments.
In the third embodiment, with the above-described configuration, the accuracy of determining the direction of the sound source by using omnidirectional microphones may be increased based on a sound reduction in a certain frequency component (for example, a high-frequency component) due to diffraction. Thus, the accuracy of determining the direction of the sound source by using omnidirectional microphones may be further increased, regardless of the size of a gap between the housing of the information processing terminal and the wearer of the information processing terminal.
In the first to third embodiments, the example has been described in which a sound signal, for which the direction of the sound source is determined, is translated by the speech translating apparatus 16 from the first language into the second language or from the second language into the first language depending on the direction of the sound source. However, the first to third embodiments are not limited to this example. The speech translating apparatus 16 may include, for example, only one of the first translating unit 16A and the second translating unit 16B.
Also, the information processing terminal 1 may include a conference support apparatus or the like instead of the speech translating apparatus 16. The processing order illustrated in the flowcharts described above is merely an example and may be changed as appropriate.
[Related Art]
The related art will be described next. In the related art, two directional microphones are arranged such that directivity 11XOR of a directional microphone 11X and directivity 12XOR of a directional microphone 12X intersect with each other.
With this configuration, the direction of the sound source may be determined by using a sound pressure difference between sound pressure of sound acquired by the directional microphone 11X and sound pressure of the sound acquired by the directional microphone 12X. Specifically, if the sound pressure of the sound acquired by the directional microphone 11X is greater than the sound pressure of the sound acquired by the directional microphone 12X, the sound source is located above. If the sound pressure of the sound acquired by the directional microphone 12X is greater than the sound pressure of the sound acquired by the directional microphone 11X, the sound source is located in front.
However, directional microphones are larger than omnidirectional microphones.
It is difficult to implement a sound-source-direction determining apparatus capable of accurately determining the direction of the sound source by simply replacing the directional microphones of the related-art sound-source-direction determining apparatus with omnidirectional microphones.
When the sound source is located in front of the information processing terminals 1 and 1Y, the sound pressure difference between the sound pressure of the sound acquired by the first microphone and the sound pressure of the sound acquired by the second microphone is equal to −2.9 [dB] in the related art and is equal to −4.2 [dB] in the first embodiment. When the sound source is located above the information processing terminals 1 and 1Y, the sound pressure difference calculated in the first embodiment is greater than that of the related art by 4.3 [dB]. When the sound source is located in front of the information processing terminals 1 and 1Y, the sound pressure difference calculated in the first embodiment is smaller than that of the related art by 1.3 [dB].
Therefore, in the first embodiment, the possibility of obtaining an erroneous determination result in the determination performed in step 111 may be reduced compared with the related art.
All examples and conditional language provided herein are intended for the pedagogical purposes of aiding the reader in understanding the invention and the concepts contributed by the inventor to further the art, and are not to be construed as limitations to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although one or more embodiments of the present invention have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.