A preferred embodiment of the present invention relates to a sound pickup device and a sound pickup method that obtain sound from a sound source by using a microphone.
Japanese Unexamined Patent Application Publication No. 2016-042613, Japanese Unexamined Patent Application Publication No. 2013-061421, and Japanese Unexamined Patent Application Publication No. 2006-129434 disclose techniques that obtain the coherence of two microphones and emphasize a target sound such as the voice of a speaker.
For example, the technique of Japanese Unexamined Patent Application Publication No. 2016-042613 obtains an average coherence of two signals by using two non-directional microphones and determines whether or not the sound is a target sound based on an obtained average coherence value.
The conventional techniques do not disclose reducing distant noise.
In view of the foregoing, an object of a preferred embodiment of the present invention is to provide a sound pickup device and a sound pickup method that are able to reduce distant noise with higher accuracy than conventional techniques.
A sound pickup device according to a preferred embodiment of the present invention includes a correlation calculator and a level controller. The correlation calculator obtains a correlation between a first sound pickup signal to be generated from a first microphone and a second sound pickup signal to be generated from a second microphone. The level controller performs level control of the first sound pickup signal or the second sound pickup signal, according to a ratio of a frequency component of which the correlation exceeds a threshold value.
According to a preferred embodiment of the present invention, distant noise is able to be reduced with higher accuracy than with conventional techniques.
The above and other elements, features, steps, characteristics and advantages of the present invention will become more apparent from the following detailed description of the preferred embodiments with reference to the attached drawings.
A sound pickup device of the present preferred embodiment includes a first microphone, a second microphone, and a level controller. The level controller obtains a correlation between a first sound pickup signal to be generated from the first microphone and a second sound pickup signal to be generated from the second microphone, and performs level control of the first sound pickup signal or the second sound pickup signal, according to a ratio of a frequency component of which the correlation exceeds a threshold value.
Since both nearby sound and distant sound include at least some reflected sound, the coherence at a particular frequency may be extremely low. When the calculated values include such an extremely low coherence value, the average may be reduced. The ratio, however, depends only on how many frequency components have coherence equal to or greater than the threshold value; whether the coherence at a frequency below the threshold value is low or high does not affect the level control at all. Accordingly, by performing the level control according to the ratio, the sound pickup device is able to emphasize a target sound with high accuracy and reduce distant noise.
The microphone 10A and the microphone 10B are disposed on an upper surface of the housing 70. However, the shape of the housing 70 and the placement of the microphones are merely examples, and the device is not limited to them.
The level controller 15 receives an input of a sound pickup signal S1 of the microphone 10A and a sound pickup signal S2 of the microphone 10B. The level controller 15 performs level control of the sound pickup signal S1 of the microphone 10A or the sound pickup signal S2 of the microphone 10B, and outputs the signal to the I/F 19. The I/F 19 is a communication interface such as a USB or a LAN. The sound pickup device 1A outputs a pickup signal to other devices through the I/F 19.
The coherence calculator 20 receives an input of the sound pickup signal S1 of the microphone 10A and the sound pickup signal S2 of the microphone 10B. The coherence calculator 20 calculates coherence of the sound pickup signal S1 and the sound pickup signal S2 as an example of the correlation.
The gain controller 21 determines a gain of the gain adjuster 22, based on a calculation result of the coherence calculator 20. The gain adjuster 22 receives an input of the sound pickup signal S2. The gain adjuster 22 adjusts a gain of the sound pickup signal S2, and outputs the adjusted signal to the I/F 19.
It is to be noted that, while this example shows an aspect in which the gain of the sound pickup signal S2 of the microphone 10B is adjusted and the signal is outputted to the I/F 19, an aspect in which a gain of the sound pickup signal S1 of the microphone 10A is adjusted and the adjusted signal is outputted to the I/F 19 may be employed. However, the microphone 10B as a non-directional microphone is able to pick up sound of the whole surroundings. Therefore, it is preferable to adjust the gain of the sound pickup signal S2 of the microphone 10B, and to output the adjusted signal to the I/F 19.
The coherence calculator 20 converts the signals into a signal X(f, k) and a signal Y(f, k) of a frequency axis (S11) by applying the Fourier transform to each of the sound pickup signal S1 and the sound pickup signal S2. The “f” represents a frequency and the “k” represents a frame number. The coherence calculator 20 calculates coherence (a time average value of the complex cross spectrum) according to the following Expression 1 (S12).
However, the Expression 1 is an example. For example, the coherence calculator 20 may calculate the coherence according to the following Expression 2 or Expression 3.
It is to be noted that the “m” represents a cycle number (an identification number that represents a group of signals including a predetermined number of frames) and the “T” represents the number of frames of 1 cycle.
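The calculation described above can be sketched as follows. This is a minimal sketch and not a reproduction of Expression 1, 2, or 3: it assumes the widely used magnitude-squared coherence, in which the cross spectrum and the two power spectra are each averaged over the T frames of one cycle. The function name and the data layout (a list of frames, each frame a list of complex frequency-bin values) are assumptions made for illustration.

```python
def magnitude_squared_coherence(X, Y):
    """Per-bin coherence of two spectra averaged over the frames of one cycle.

    X, Y: lists of T frames; each frame is a list of complex bin values
    (assumed layout: X[k][f] corresponds to X(f, k) in the text).
    Returns one value in [0, 1] per frequency bin.
    """
    T = len(X)
    n_bins = len(X[0])
    result = []
    for f in range(n_bins):
        # Time average of the complex cross spectrum over the T frames.
        cross = sum(X[k][f] * Y[k][f].conjugate() for k in range(T)) / T
        # Time averages of the two power spectra.
        pxx = sum(abs(X[k][f]) ** 2 for k in range(T)) / T
        pyy = sum(abs(Y[k][f]) ** 2 for k in range(T)) / T
        denom = pxx * pyy
        # Guard against silent bins (assumption: treat them as uncorrelated).
        result.append(abs(cross) ** 2 / denom if denom > 0 else 0.0)
    return result
```

With identical inputs every bin evaluates to approximately 1, and when the phase relation between the two inputs flips from frame to frame the averaged cross spectrum cancels and the value approaches 0.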
The gain controller 21 determines the gain of the gain adjuster 22, based on the coherence. For example, the gain controller 21 obtains a ratio R(k) of the number of frequency bins in which the amplitude of the coherence exceeds a predetermined threshold value γth to the total number of frequency bins (S13).
The threshold value γth is set to γth=0.6, for example. It is to be noted that f0 in the Expression 4 is a lower limit frequency bin, and f1 is an upper limit frequency bin.
The gain controller 21 determines the gain of the gain adjuster 22 according to this ratio R(k) (S14). More specifically, the gain controller 21 determines whether or not coherence exceeds a threshold value γth for each frequency bin, totals the number of frequency bins that exceed the threshold value, and determines a gain according to a total result.
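The counting step described above can be sketched as follows. This is a minimal illustration, not Expression 4 itself: the function name, the inclusive treatment of the bin limits f0 and f1, and the strict "exceeds" comparison are assumptions, while the default threshold of 0.6 follows the example value given for γth.

```python
def coherence_ratio(coh, threshold=0.6, f0=0, f1=None):
    """Ratio R of frequency bins whose coherence exceeds the threshold.

    coh: per-bin coherence amplitudes for one frame.
    f0, f1: lower and upper limit bins (assumed inclusive).
    """
    if f1 is None:
        f1 = len(coh) - 1
    band = coh[f0:f1 + 1]
    if not band:
        return 0.0
    # Count bins above the threshold; how far above or below does not matter.
    count = sum(1 for c in band if c > threshold)
    return count / len(band)
```

For example, `coherence_ratio([0.9, 0.7, 0.2, 0.61, 0.6])` counts three of five bins, because a bin exactly at the threshold is not counted under the strict comparison assumed here.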
Coherence shows a high value when the correlation between two signals is high. Distant sound has a large number of reverberant sound components, and is a sound of which an arrival direction is not fixed. For example, in a case in which the microphone 10A is directional and the microphone 10B is non-directional, their sound pickup capabilities for distant sound differ greatly. Therefore, coherence is reduced in a case in which sound from a distant sound source is inputted, and is increased in a case in which sound from a sound source near the device is inputted.
Therefore, the sound pickup device 1A does not pick up sound from a sound source far from the device, and is able to emphasize sound from a sound source near the device as a target sound.
The sound pickup device 1A of the present preferred embodiment has been described using an example in which the gain controller 21 obtains the ratio R(k) of frequencies at which the coherence exceeds a predetermined threshold value γth to all frequencies, and performs gain control according to the ratio. Since both nearby sound and distant sound include reflected sound, the coherence at a particular frequency may be extremely low. When such an extremely low value is included, the average may be reduced. The ratio R(k), however, depends only on how many frequency components have coherence equal to or greater than the threshold value; whether a coherence value below the threshold value is low or high does not affect the gain control at all. Therefore, by performing the gain control according to the ratio R(k), distant noise is able to be reduced and a target sound is able to be emphasized with high accuracy.
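The difference between the average and the ratio can be illustrated with a small numerical sketch. The coherence values below are invented for illustration only and are not measurements; the helper name is likewise an assumption.

```python
from statistics import mean


def ratio_above(coh, threshold=0.6):
    """Fraction of bins whose coherence exceeds the threshold."""
    return sum(1 for c in coh if c > threshold) / len(coh)


# Two hypothetical frames that differ only in one bin that is already
# below the threshold: in the second frame a reflection drives that
# bin's coherence extremely low.
mild_dip = [0.9, 0.9, 0.9, 0.55]
deep_dip = [0.9, 0.9, 0.9, 0.05]

avg_mild, avg_deep = mean(mild_dip), mean(deep_dip)
ratio_mild, ratio_deep = ratio_above(mild_dip), ratio_above(deep_dip)
```

Here the average drops from about 0.81 to about 0.69 because of the single deep dip, while the ratio stays at 0.75 in both frames, so a gain derived from the ratio is unchanged.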
It is to be noted that, although the predetermined value R1 and the predetermined value R2 may be set to any value, the predetermined value R1 is preferably set according to the maximum range in which sound is desired to be picked up without being attenuated. For example, in a case in which the value of the ratio R decreases when the sound source is farther than about 30 cm from the device, the value of the ratio R at a distance of about 40 cm is set as the predetermined value R1, so that sound is able to be picked up without being attenuated up to a distance of about 40 cm in radius. In addition, the predetermined value R2 is set according to the minimum range in which sound is desired to be attenuated. For example, the value of the ratio R at a distance of 100 cm is set as the predetermined value R2, so that sound is hardly picked up at a distance of 100 cm or more, while the gain is gradually increased as the sound source comes closer than 100 cm.
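The relationship between the ratio R and the gain described above can be sketched as follows. This is a sketch under stated assumptions: the text only says that sound is not attenuated at or above R1, is hardly picked up at or below R2, and that the gain changes gradually in between, so the linear interpolation, the function name, and the `min_gain` parameter are illustrative choices.

```python
def gain_from_ratio(r, r1, r2, min_gain=0.0):
    """Map the ratio R to a linear gain in [min_gain, 1.0].

    r1: ratio at or above which no attenuation is applied.
    r2: ratio at or below which the signal is attenuated to min_gain.
    Linear interpolation between r2 and r1 is an assumption.
    """
    if r1 <= r2:
        raise ValueError("r1 must be greater than r2")
    if r >= r1:
        return 1.0
    if r <= r2:
        return min_gain
    return min_gain + (1.0 - min_gain) * (r - r2) / (r1 - r2)
```

With R1 = 0.75 and R2 = 0.25, a ratio of 0.5 falls halfway between the two values and yields a gain of 0.5 under this linear assumption.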
In addition, the predetermined value R1 and the predetermined value R2 may not be fixed values, and may dynamically be changed. For example, the level controller 15 obtains an average value R0 (or the greatest value) of the ratio R obtained in the past within a predetermined time, and sets the predetermined value R1=R0+0.1 and the predetermined value R2=R0-0.1. As a result, with reference to a position of the current sound source, sound in a range closer to the position of the sound source is picked up and sound in a range farther than the position of the sound source is not picked up.
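The dynamic setting described above can be sketched as follows. The margin of 0.1 follows the example in the text; the class name, the use of a fixed number of recent frames to stand in for "a predetermined time", and the clamping of the results to the range 0 to 1 are assumptions made for illustration.

```python
from collections import deque


class AdaptiveThresholds:
    """Track recent ratio values and derive R1 and R2 from their average."""

    def __init__(self, history_len=50, margin=0.1):
        # Only the most recent history_len ratios are kept (assumed window).
        self.history = deque(maxlen=history_len)
        self.margin = margin

    def update(self, r):
        """Record a new ratio R and return the pair (R1, R2)."""
        self.history.append(r)
        r0 = sum(self.history) / len(self.history)  # average value R0
        r1 = min(1.0, r0 + self.margin)
        r2 = max(0.0, r0 - self.margin)
        return r1, r2
```

Replacing the average with `max(self.history)` would give the variant that uses the greatest value instead of the average value R0.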
It is to be noted that the example of
Subsequently,
The directivity former 25 outputs an output signal M2 of the microphone 10B as the sound pickup signal S2 as it is. The directivity former 26, as shown in
The subtractor 261 obtains a difference between an output signal M1 of the microphone 10A and the output signal M2 of the microphone 10B, and inputs the difference into the selector 262.
The selector 262 compares a level of the output signal M1 of the microphone 10A and a level of a difference signal obtained from the difference between the output signal M1 of the microphone 10A and the output signal M2 of the microphone 10B, and outputs a signal at a high level as the sound pickup signal S1 (S101)(refer to
In this manner, the level controller 15 according to Modification 1, even when using a directional microphone (having no sensitivity to sound in a specific direction), is able to provide sensitivity to the whole surroundings of the device. Even in such a case, the sound pickup signal S1 is directional and the sound pickup signal S2 is non-directional, so that their sound pickup capabilities for distant sound differ. Therefore, the level controller 15 according to Modification 1, while providing sensitivity to the whole surroundings of the device, does not pick up sound from a sound source far from the device, and is able to emphasize sound from a sound source near the device as a target sound.
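The operation of the subtractor 261 and the selector 262 can be sketched as follows. This is a minimal sketch: the text does not specify how the level is measured, so the use of the RMS value over one block of samples, the function names, and the choice of M1 when the two levels are equal are assumptions.

```python
import math


def rms(block):
    """Root-mean-square level of one block of samples."""
    return math.sqrt(sum(v * v for v in block) / len(block)) if block else 0.0


def select_s1(m1, m2):
    """Return whichever of M1 and the difference M1 - M2 has the higher level."""
    # Subtractor: sample-by-sample difference of the two output signals.
    diff = [a - b for a, b in zip(m1, m2)]
    # Selector: output the higher-level signal as the sound pickup signal S1.
    return m1 if rms(m1) >= rms(diff) else diff
```

When M1 and M2 are nearly identical the difference is small and M1 is output; when M1 is weak, for example for sound arriving from a direction in which the microphone 10A has little sensitivity, the difference signal is output instead.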
The aspect of the directivity former 25 and the directivity former 26 is not limited to the example of
For example,
As shown in
The directivity former 26 in
The directivity former 25 in
As a result, the sound pickup device 1B, even when all of the microphones are directional (having no sensitivity in a specific direction), is able to provide sensitivity to the whole surroundings of the device. Even in such a case, the sound pickup signal S1 is directional and the sound pickup signal S2 is non-directional, so that their sound pickup capabilities for distant sound differ. Therefore, the sound pickup device 1B, while providing sensitivity to the whole surroundings of the device, does not pick up sound from a sound source far from the device, and is able to emphasize sound from a sound source near the device as a target sound.
In addition, for example, even when all the microphones are non-directional microphones, for example, as shown in
Subsequently,
Human voice (sound) has a harmonic structure having a peak component for each predetermined frequency. Therefore, the comb filter setter 75, as shown in the following Expression 5, passes the peak component of human voice, obtains a gain characteristic G(f, t) of reducing components except the peak component, and sets the obtained gain characteristic as a gain characteristic of the comb filter 76.
In other words, the comb filter setter 75 applies the Fourier transform to the sound pickup signal S2, and further applies the Fourier transform to the logarithmic amplitude to obtain a cepstrum value z(c, t). The comb filter setter 75 extracts the c value c_peak(t) = argmax_c {z(c, t)} that maximizes this cepstrum value z(c, t). The comb filter setter 75 extracts the peak component z_peak(c, t) of the cepstrum by setting the cepstrum value to z(c, t) = 0 wherever the c value is other than c_peak(t) or a value close to c_peak(t). The comb filter setter 75 converts this peak component z_peak(c, t) back into a signal of the frequency axis, and sets the signal as the gain characteristic G(f, t) of the comb filter 76. As a result, the comb filter 76 serves as a filter that emphasizes a harmonic component of human voice.
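The cepstrum-based procedure described above can be sketched as follows. This is an illustrative sketch and not Expression 5: a naive discrete Fourier transform is used only to keep the example self-contained, and the function names, the lowest searched c value `min_c`, the neighborhood `width` kept around the peak, and the small constant `eps` added before taking the logarithm are all assumptions. Whether the frequency-axis result is used directly or exponentiated before being applied as a gain is left open here.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (inverse is scaled by 1/N)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def comb_gain_from_cepstrum(frame, min_c=2, width=1, eps=1e-9):
    """Return (c_peak, g): the cepstral peak index and the frequency-axis
    signal obtained from the peak component alone."""
    n = len(frame)
    # Fourier transform, then logarithmic amplitude.
    log_amp = [math.log(abs(v) + eps) for v in dft(frame)]
    # Second Fourier transform gives the cepstrum z(c); it is real here
    # because the log amplitude of a real signal is symmetric.
    z = [v.real for v in dft(log_amp)]
    # c value that maximizes z(c), skipping the lowest c values (assumption).
    c_peak = max(range(min_c, n // 2 + 1), key=lambda c: z[c])
    # Keep c_peak, its neighbors, and their mirror images; zero the rest.
    keep = set()
    for c in range(c_peak - width, c_peak + width + 1):
        keep.add(c % n)
        keep.add((n - c) % n)
    z_peak = [z[c] if c in keep else 0.0 for c in range(n)]
    # Convert the peak component back to the frequency axis.
    g = [v.real for v in dft(z_peak, inverse=True)]
    return c_peak, g
```

For a 64-sample pulse train with a period of 8 samples, the harmonics fall on every eighth frequency bin, and the returned signal is larger at such a bin than at a bin between harmonics, which is the comb-like characteristic the paragraph describes.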
It is to be noted that the gain controller 21 may adjust the intensity of the emphasis processing by the comb filter 76, based on a calculation result of the coherence calculator 20. For example, the gain controller 21, in a case in which the value of the ratio R(k) is equal to or greater than the predetermined value R1, turns on the emphasis processing by the comb filter 76, and, in a case in which the value of the ratio R(k) is less than the predetermined value R1, turns off the emphasis processing by the comb filter 76. In such a case, the emphasis processing by the comb filter 76 is also included in one aspect in which the level control of the sound pickup signal S2 (or the sound pickup signal S1) is performed according to the calculation result of the correlation. Therefore, the sound pickup device 1 may perform only emphasis processing on a target sound by the comb filter 76.
It is to be noted that the level controller 15, as shown in
In this example, the sound pickup device 1A does not include the level controller 15. The CPU 151 reads out a program from the memory 152 and performs the function of the VoIP 521. In this example, the VoIP 521 converts the pickup signal S1 and the pickup signal S2 into packet data, respectively. Alternatively, the VoIP 521 converts the pickup signal S1 and the pickup signal S2 into one piece of packet data. Even when being converted into one piece of packet data, the pickup signal S1 and the pickup signal S2 are distinguished, respectively, and are stored in the packet data as different data.
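Storing both signals in one piece of packet data while keeping them distinguishable can be sketched as follows. The layout below is hypothetical and is not a real VoIP or RTP payload format: it assumes 16-bit signed samples and a header that holds the sample count of each signal, and the function names are illustrative.

```python
import struct


def pack_signals(s1, s2):
    """Pack S1 and S2 into one payload: two counts, then S1, then S2."""
    header = struct.pack("<II", len(s1), len(s2))
    body = struct.pack(f"<{len(s1)}h", *s1) + struct.pack(f"<{len(s2)}h", *s2)
    return header + body


def unpack_signals(packet):
    """Recover S1 and S2 as separate sample lists from one payload."""
    n1, n2 = struct.unpack_from("<II", packet, 0)
    offset = struct.calcsize("<II")
    s1 = list(struct.unpack_from(f"<{n1}h", packet, offset))
    # S2 starts right after the n1 two-byte samples of S1.
    s2 = list(struct.unpack_from(f"<{n2}h", packet, offset + 2 * n1))
    return s1, s2
```

A receiving side such as a server can then call `unpack_signals` to obtain the two signals separately before applying level control.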
In this example, the I/F 19 is a communication interface such as a LAN, and is connected to the network 7. The CPU 151 outputs the packet data that has been converted by the VoIP 521 through I/F 19, to the network 7.
The I/F 91 of the server 9 is a communication interface such as a LAN, and is connected to the network 7. The CPU 93 receives an input of the packet data from the sound pickup device 1A through the I/F 91. The CPU 93 reads out a program stored in the memory 94 and performs the function of a VoIP 92. The VoIP 92 converts the packet data into the pickup signal S1 and the pickup signal S2. In addition, the CPU 93 reads out a program from the memory 94 and performs the function of the above-stated level controller 95. The level controller 95 has the same function as the level controller 15. The CPU 93 outputs again the pickup signal on which the level control has been performed by the level controller 95, to the VoIP 92. The CPU 93 converts the pickup signal into packet data in the VoIP 92. The CPU 93 outputs the packet data that has been converted by the VoIP 92 to the network 7 through the I/F 91. For example, the CPU 93 transmits the packet data to a communication destination of the sound pickup device 1A. Therefore, the sound pickup device 1A is able to transmit the pickup signal on which the level control has been performed by the level controller 95, to the communication destination.
It is to be noted that the I/F 91 may be a USB interface, for example, and may be connected to the I/F 19 of the sound pickup device 1A with a USB cable.
Finally, the foregoing preferred embodiments are illustrative in all points and should not be construed to limit the present invention. The scope of the present invention is defined not by the foregoing preferred embodiment but by the following claims. Further, the scope of the present invention is intended to include all modifications within the scopes of the claims and within the meanings and scopes of equivalents.
Number | Date | Country | Kind |
---|---|---|---|
2017-059020 | Mar 2017 | JP | national |
The present application is a continuation application of International Patent Application No. PCT/JP2018/011318, filed on Mar. 22, 2018, which claims priority to Japanese Patent Application No. 2017-059020, filed on Mar. 24, 2017. The contents of these applications are incorporated herein by reference in their entirety.
 | Number | Date | Country |
---|---|---|---|
Parent | PCT/JP2018/011318 | Mar 2018 | US |
Child | 16572825 | | US |