The embodiments discussed herein relate to a microphone array device.
A microphone array device obtains a target sound from a target sound source. The microphone array device uses, for example, a synchronous subtraction method illustrated in
A microphone array device 01 in
A delay unit 1 delays a sound signal that includes noise obtained by the microphone MIC2 for a certain delay time. A subtraction unit 2 subtracts an output signal of the delay unit 1 from a sound signal that includes a target sound obtained by the microphone MIC1. The microphone array device 01 is configured as a device with directivity that is illustrated by the dotted line in
A microphone array device 02 in
An FFT3a applies a Fast Fourier Transform (FFT) to convert a sound signal obtained by the microphone MIC1 into a complex spectrum IN1(f) on a frequency axis. Likewise, an FFT3b applies a Fast Fourier Transform (FFT) to convert a sound signal obtained by the microphone MIC2 into a complex spectrum IN2(f) on a frequency axis. A phase spectrum difference calculation unit 4 calculates a phase spectrum difference DIFF(f) between the sound signal obtained by the microphone MIC1 and the sound signal obtained by the microphone MIC2 based on the complex spectrum IN1(f) and the complex spectrum IN2(f). The microphone array device 02 may identify, for each frequency, a range in which a sound source is included based on the phase spectrum difference DIFF(f). A gain calculation unit 5 calculates a noise suppression gain G(f) based on the identified range of the sound source. The noise suppression gain G(f) is a variable to determine an input and output ratio. The microphone array device 02 determines how much noise is suppressed by adjusting the noise suppression gain G(f). A noise suppression unit 6 calculates an output OUT(f) in which noise is suppressed based on the complex spectrum IN1(f) and the noise suppression gain G(f). An IFFT7 applies an inverse FFT to the output OUT(f) to obtain an output on a time axis. The microphone array device 02 may obtain a target sound from the target sound source SS while suppressing noise.
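This flow can be summarized in a short sketch. The following Python/NumPy fragment is only a minimal illustration of the signal path (FFT, phase spectrum difference, gain, suppression, inverse FFT); the rule that maps DIFF(f) to the gain G(f), the threshold at 0, and the noise_gain value are placeholder assumptions, not the documented method.

import numpy as np

def suppress_frame(x1, x2, noise_gain=0.1):
    # FFT3a/FFT3b: complex spectra IN1(f) and IN2(f)
    IN1 = np.fft.rfft(x1)
    IN2 = np.fft.rfft(x2)
    # phase spectrum difference DIFF(f) per frequency bin
    diff = np.angle(IN2 * np.conj(IN1))
    # gain calculation: placeholder rule keeping bins whose phase
    # difference falls on the assumed target-sound side
    G = np.where(diff <= 0.0, 1.0, noise_gain)
    # noise suppression followed by the inverse FFT (IFFT7)
    OUT = G * IN1
    return np.fft.irfft(OUT, n=len(x1))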
The above-described related technology is discussed, for example, in Japanese Laid-open Patent Publication No. 2007-318528.
According to an aspect of the invention, a microphone array device includes a first sound reception unit configured to obtain a first sound signal that is input from a first microphone, a second sound reception unit configured to obtain a second sound signal that is input from a second microphone different from the first microphone, a noise state evaluation unit configured to compare the first sound signal and the second sound signal and to obtain an evaluation parameter to evaluate an influence of a non-target sound included in the second sound signal on a target sound included in the first sound signal according to a result of the comparison, a subtraction adjustment unit configured to set a suppression amount for the second sound signal based on the evaluation parameter and to generate a third sound signal based on the second sound signal and the suppression amount, and a subtraction unit configured to generate a signal to be output based on the third sound signal and the first sound signal.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention, as claimed.
According to the above-described synchronous subtraction method in
Moreover, the microphone array device may erroneously recognize that a target sound source SS is present in the suppression direction even when the target sound source SS is present in the sound reception direction. Such erroneous recognition may be caused by fluctuation of the incoming direction of a sound due to, for example, a movement of a speaker who is the target sound source SS, reflection from a wall, or a surrounding environment such as an air flow. In this case, the microphone array device treats a target sound that comes from the suppression direction as noise even though the target sound source SS is actually present in the sound reception direction, and performs the synchronous subtraction as described above. This erroneous recognition also results in distortion of the spectrum of the sound signal that includes the target sound output from the subtraction unit 2, and, for example, the quality of the target sound that is eventually output may be degraded.
A similar phenomenon occurs in the case of
Furthermore, when a target sound from a target sound source SS is received, for example, by a mobile phone, the sound reception direction and the sound reception range may change depending on how the mobile phone is held by the user. In this case, the microphone array device treats the target sound as noise when the target sound arrives from the suppression direction, the suppression range, or the shift range. As a result, the target sound is distorted.
Suppressing noise using, for example, the above-described synchronous subtraction method in
Hence, embodiments disclosed herein provide a technology to suppress distortion of a target sound while suppressing noise.
According to an embodiment described below, processing is performed using sound signals obtained by two microphones among a plurality of microphones. Out of the two microphones, one microphone mainly obtains a sound that includes a target sound from a sound reception direction or a sound reception range. The other microphone mainly obtains a sound that includes noise from a suppression direction, a suppression range, or a shift range. In other words, the microphone positioned in the sound reception direction or the sound reception range obtains a non-suppression sound signal, that is, a sound signal from a non-suppression direction other than the suppression direction, the suppression range, or the shift range. On the other hand, the microphone positioned in the suppression direction, the suppression range, or the shift range obtains a suppression sound signal. The non-suppression sound signal includes a target sound, while the suppression sound signal includes a non-target sound. The non-target sound differs from the target sound and is, for example, noise.
A microphone array device according to the embodiment described below suppresses distortion of a target sound while suppressing noise. The microphone array device obtains an evaluation parameter to evaluate an influence of a non-target sound on the target sound based on a result of comparison between a non-suppression sound signal from the non-suppression direction and a suppression sound signal from the suppression direction. The microphone array device controls a suppression amount of the non-target sound based on the evaluation parameter. Furthermore, the microphone array device controls directivity of the microphones.
The evaluation parameter includes a parameter that indicates a state of noise such as a noise level and a noise level change. Moreover, the evaluation parameter includes a parameter that indicates a direction of a target sound source based on an evaluation result of the level of each sound signal. Hereinafter, examples of methods to suppress noise based on an evaluation parameter that indicates a state of noise will be described with reference to the first to third embodiments. Moreover, one example of a method to determine a sound reception direction based on an evaluation parameter that indicates a target sound direction will be described in a fourth embodiment.
According to the first embodiment, a microphone array device obtains a state of noise by processing sound signals obtained by two microphones on a time axis, and suppresses noise by synchronous subtraction processing based on the state of noise.
(1) Hardware Configuration
The microphone array 104 includes at least two microphones, and here includes microphones MIC1, MIC2, . . . MICn (n is an integer of 3 or more). Controlling directivity of the microphone array 104 allows the device to receive mainly a desired target sound from the sound reception direction and thereby to suppress noise.
The ROM 102 stores various control programs for various controls, which will be described later, performed by the microphone array device 100. The various programs include, for example, a program to obtain a state of noise and a program to suppress noise, which will be described later. The ROM 102 also stores various values such as a value A1 and a value A2 as thresholds, and constants or coefficients such as α, β, and τ, which will be described later. Moreover, the ROM 102 stores preset relationships, for example, between a noise level L(f) and a relative level value (f), and between a noise level change S(f) and Rate(f), which will be described later.
The RAM 103 temporarily stores the various control programs stored in the ROM 102 and sound signals obtained by the microphone array 104. The RAM 103 also temporarily stores information such as various flags according to execution of the various control programs.
The CPU 101 expands various programs stored in the ROM 102 into the RAM 103 and performs various controls.
A communication I/F 105 connects the microphone array device 100 to an external network etc. based on control by the CPU 101. For example, the microphone array device 100 is connected to a sound recognition device through the communication I/F 105 and outputs a sound signal processed by the microphone array device 100 to the sound recognition device.
(2) Functional Configuration
In
A distance d between the microphone MIC1 and the microphone MIC2 is set by the following expression (1) so as to satisfy the sampling theorem.
microphone distance d=speed of sound c/sampling frequency fs (1)
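For example, assuming a speed of sound c of about 340 m/s and a sampling frequency fs of 8 kHz, expression (1) gives a microphone distance d=340/8000=0.0425 m, that is, about 4.3 cm. These concrete values are only an illustration.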
Processing by functional units of the microphone array device 100 is executed in collaboration with the CPU 101, the ROM 102, the RAM 103, and the microphone array 104 and so on.
The functional units of the microphone array device 100 include, for example, a first sound reception unit 111, a second sound reception unit 112, a first delay unit 113, a first subtraction unit 114, a second delay unit 115, a second subtraction unit 116, a noise state evaluation unit 117, and a subtraction adjustment unit 118. Each of the functional units will be described below.
(2-1) the First Sound Reception Unit and the Second Sound Reception Unit
The microphone MIC1 obtains a sound that includes a target sound. The microphone MIC1 converts the obtained sound into an analog signal and inputs the analog signal to the first sound reception unit 111. The first sound reception unit 111 includes an Amplifier (AMP) 111a, a Low Pass Filter (LPF) 111b, and an analog to digital (A/D) converter 111c. The first sound reception unit 111 generates a sound signal by processing the sound including the target sound that is input from the microphone MIC1.
The AMP111a amplifies the analog signal that is input from the microphone MIC1 and inputs the amplified signal to the LPF 111b.
The LPF111b, which is a low pass filter, applies low-pass filtering to the output of the AMP111a, for example, with a cut-off frequency fc. Typically, the low pass filter is used alone. However, the low pass filter may be used together with a band pass filter or a high pass filter.
The A/D converter 111c takes in the output of the LPF 111b at a sampling frequency fs (fs>2fc), and converts the output of the LPF 111b into a digital signal. The A/D converter 111c outputs a sound signal in1(ti) on a time axis.
The microphone MIC2 obtains a sound including noise, converts the sound into an analog signal, and inputs the analog signal to the second sound reception unit 112. The second sound reception unit 112 includes an AMP112a, an LPF112b, and an A/D converter 112c. The second sound reception unit 112 processes the sound including noise that is input from the microphone MIC2 to generate a sound signal. Processing by the AMP112a, the LPF112b, and the A/D converter 112c is substantially the same as that of the AMP111a, the LPF111b, and the A/D converter 111c. The second sound reception unit 112 outputs a sound signal in2(ti) as a digital signal on a time axis.
(2-2) the Second Delay Unit and the Second Subtraction Unit
The second delay unit 115 and the second subtraction unit 116 control directivity of the microphone array that is made up of the microphone MIC1 and the microphone MIC2. For example, the second delay unit 115 and the second subtraction unit 116 control directivity so that a sound from a direction other than the sound reception direction, in other words, a sound from the suppression direction is taken in. One example of directivity of a sound signal that is output from the second delay unit 115 and the second subtraction unit 116 is indicated in
Processing by the second delay unit 115 and the second subtraction unit 116 is applied to a direction opposite to processing by the first delay unit 113 and the first subtraction unit 114. Processing by the first delay unit 113 and the first subtraction unit 114 controls directivity so that a sound from the sound reception direction is taken in as will be described later. In other words, directivity controlled by the first delay unit 113 and the first subtraction unit 114 is indicated by the dashed line in
The second delay unit 115 receives a sound signal in1(ti) that includes a target sound from the first sound reception unit 111. The second delay unit 115 generates a sound signal that is obtained by delaying the sound signal in1(ti) for a certain period Ta. The sound signal delayed by the second delay unit 115 is represented by in1(ti−1). The certain period Ta here is, for example, time dependent on the microphone distance d between the microphone MIC1 and the microphone MIC2. When the microphone distance d is set as in the above expression (1), the certain period Ta is defined by the expression below:
certain period Ta=signal sampling interval=1/sampling frequency fs
The ti is the time when a sound signal is taken into the microphone, and the subscript i of t is a sampling number of each sound signal when the sound is taken in at the sampling frequency fs. The subscript i is an integer of one or more.
The second subtraction unit 116 receives a sound signal in2(ti) that includes noise from the second sound reception unit 112 and subtracts the sound signal in1(ti−1) after applying the delay from the sound signal in2(ti). In other words, the second subtraction unit 116 calculates a noise signal N (ti) by the expression (2) below.
noise signal N(ti)=sound signal in2(ti)−sound signal in1(ti−1) (2)
The above described processing sets the directivity of the noise signal N(ti) that is output from the second subtraction unit 116 to "opposite directivity." In other words, a sound from a direction other than the sound reception direction where the target sound source SS is present is mainly taken in, while a sound signal that includes a target sound from the sound reception direction is suppressed. As a result, the second subtraction unit 116 outputs a noise signal N(ti) in which noise from the suppression direction is emphasized. The microphone array device 100 according to the embodiment may recognize a state of noise by the noise signal N(ti).
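As a concrete sketch, the processing of the second delay unit 115 and the second subtraction unit 116 fits in a few lines of Python/NumPy. Treating the certain period Ta as exactly one sample (Ta=1/fs) is an assumption that holds when the microphone distance d is set by expression (1).

import numpy as np

def opposite_directivity(in1, in2):
    # second delay unit 115: delay in1(ti) by one sample (Ta = 1/fs)
    in1_delayed = np.concatenate(([0.0], in1[:-1]))
    # second subtraction unit 116: noise signal N(ti), expression (2)
    return in2 - in1_delayed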
(2-3) Noise State Evaluation Unit
The noise state evaluation unit 117 evaluates a state of noise based on the noise signal N(ti) that is an output of the second subtraction unit 116. The state of noise includes, for example, a noise level and a noise level change. The noise level is an indicator that represents the magnitude of noise. The noise level change is an indicator that represents whether the temporal noise level change is large or small. When the noise level change is small, steadiness of the noise is high; in other words, non-steadiness of the noise is low. Conversely, when the noise level change is large, steadiness of the noise is low; in other words, non-steadiness of the noise is high. The noise level and the noise level change are represented, for example, by the expressions (3) and (4) below.
noise level L(ti)=10 log10(N(ti)^2) (3)
noise level change S(ti)=noise level L(ti)/average value of noise level before time ti (4)
The noise state evaluation unit 117 may obtain a combined value LS (ti) as a function in which both the noise level L(ti) and the noise level change S(ti) are variables.
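A Python/NumPy sketch of expressions (3) and (4) follows. Using a running mean of all past levels as the "average value of noise level before time ti", and the small eps that guards the logarithm, are assumptions made for the example.

import numpy as np

def noise_state(N, eps=1e-12):
    # expression (3): noise level L(ti) in decibels
    L = 10.0 * np.log10(N ** 2 + eps)
    # assumed realization of the average noise level before each ti
    mean_before = np.cumsum(L) / np.arange(1, len(L) + 1)
    S = np.ones_like(L)
    # expression (4): noise level change S(ti)
    S[1:] = L[1:] / mean_before[:-1]
    return L, S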
(2-4) Subtraction Adjustment Unit
The subtraction adjustment unit 118 sets a gain g(ti) for adjusting a suppression amount of noise on a time axis according to a state of noise. Adjusting the gain g(ti) adjusts an input and output ratio of the subtraction adjustment unit 118. The subtraction adjustment unit 118 adjusts a subtraction amount when the first subtraction unit 114 subtracts the sound signal in2(ti-1) from the sound signal in1(ti). As a result, a suppression amount of noise included in a sound that is obtained by the microphone MIC1 is adjusted. The gain g(ti) is 0 or more and 1.0 or less. Moreover, the gain g(ti) may be updated at each sampling of a sound signal. Alternatively, the gain g(ti) may be updated in units of a plurality of samplings.
For example, the subtraction adjustment unit 118 makes the gain g(ti) closer to 1.0 as the noise level L(ti) becomes higher. The subtraction adjustment unit 118 makes the gain g(ti) closer to 1.0 as the noise level change S(ti) is larger and steadiness is lower. The subtraction adjustment unit 118 makes the gain g(ti) closer to 0 as the noise level change S(ti) is smaller and steadiness is higher. Specific examples will be described below.
(a) Setting Gain g(ti) According to Noise Level L(ti)
(a1) Noise Level L(ti)<Value A1: Gain g(ti)=0
For example, when the noise level L(ti) is smaller than the value A1, the subtraction adjustment unit 118 determines that the noise level L(ti) is low and sets the gain g(ti) to 0.
(a2) Noise Level L(ti)>Value A2: Gain g(ti)=1.0
Conversely, when the noise level L(ti) is greater than the value A2, the subtraction adjustment unit 118 determines that the noise level L(ti) is high and sets the gain g(ti) to 1.0.
(a3) Value A1≦Noise Level L(ti)≦Value A2
When the noise level L(ti) is the value A1 or more and the value A2 or less, the gain g(ti) is set, for example, by a simple weighted average indicated by the following expression (5). The simple weighted average is one example, and an arithmetic average, a quadratic weighted average, and a cubic weighted average may be used as well.
gain g(ti)=(noise level L(ti)−A1)/(A2−A1) (5)
(b) Setting Gain g(ti) According to a Noise Level Change S(ti)
(b1) Noise Level Change S(ti)<Value B1: Gain g(ti)=0
For example, when a noise level change S(ti) is smaller than the value B1, the subtraction adjustment unit 118 determines that the noise level change is small and steadiness is high, and sets the gain g(ti) to 0.
(b2) Noise Level Change S(ti)>Value B2: Gain g(ti)=1.0
Conversely, when a noise level change S(ti) is greater than the value B2, the subtraction adjustment unit 118 determines that the noise level change is large and steadiness is low, and sets the gain g(ti) to 1.0.
(b3) Value B1≦Noise Level Change S(ti)≦Value B2
When the noise level change S(ti) is the value B1 or more and the value B2 or less, the subtraction adjustment unit 118 sets the gain g(ti) by a simple weighted average by the following expression (6). The simple weighted average is one example, and an arithmetic average, a quadratic weighted average, and a cubic weighted average may be used as well.
gain g(ti)=(noise level change S(ti)−B1)/(B2−B1) (6)
(c) Setting Gain g(ti) According to Noise Level L(ti) and Noise Level change S(ti)
The subtraction adjustment unit 118 may set a gain g(ti) based on either one of the noise level L(ti) or the noise level change S(ti), or both of the noise level L(ti) and the noise level change S(ti).
For example, when noise level L(ti)<value A1 and/or noise level change S(ti)<value B1, the subtraction adjustment unit 118 sets the gain g(ti) to 0. Moreover, when noise level L(ti)>value A2 and/or noise level change S(ti)>value B2, the subtraction adjustment unit 118 sets the gain g(ti) to 1.0.
When one of the following conditions is satisfied: value A1≦noise level L(ti)≦value A2, and/or value B1≦noise level change S(ti)≦value B2, the gain g(ti) may be set as follows. The subtraction adjustment unit 118 sets the gain g(ti) based on the above expression (5) when the state of noise that satisfies the condition is the noise level L(ti). Moreover, the subtraction adjustment unit 118 sets the gain g(ti) based on the above expression (6) when the state of noise that satisfies the condition is the noise level change S(ti). Meanwhile, the subtraction adjustment unit 118 sets the gain g(ti) based on the above expression (5) or expression (6) when both of the conditions are satisfied.
Other than the above described settings, the subtraction adjustment unit 118 may set the gain g(ti) according to a combined value LS(ti). Accordingly, noise suppression processing that takes account of the noise level L(ti) and noise level change S(ti) may be performed.
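The settings in (a) to (c) amount to a clipped piecewise-linear mapping from the state of noise to the gain. The sketch below is one possible realization in Python; taking the larger of the two partial gains when both conditions fall in the middle bands is an assumption, since the text leaves the exact combination open.

def gain_from_state(L, S, A1, A2, B1, B2):
    # expression (5) with clipping for cases (a1) and (a2)
    gL = min(max((L - A1) / (A2 - A1), 0.0), 1.0)
    # expression (6) with clipping for cases (b1) and (b2)
    gS = min(max((S - B1) / (B2 - B1), 0.0), 1.0)
    # combining rule (assumption): favor the stronger need to suppress
    return max(gL, gS)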
The subtraction adjustment unit 118 receives a sound signal in2(ti−1) from the first delay unit 113, which will be described later. The subtraction adjustment unit 118 multiplies the sound signal in2(ti−1) by the gain g(ti) and outputs the multiplication result to the first subtraction unit 114.
(2-5) the First Delay Unit and the First Subtraction Unit
The first delay unit 113 and the first subtraction unit 114 control directivity so that a sound mainly from the sound reception direction is taken in. The directivity is indicated by the dashed line in
The first delay unit 113 takes in a sound signal in2(ti) including noise from the second sound reception unit 112. The first delay unit 113 generates a sound signal, for example, in2(ti-1) that is obtained by delaying the sound signal in2(ti) for a certain period Ta. The first delay unit 113 outputs the in2(ti-1) to the subtraction adjustment unit 118.
The first subtraction unit 114 receives a sound signal in1(ti) including a target sound from the first sound reception unit 111. The first subtraction unit 114 receives a result of multiplying the sound signal in2(ti-1) by the gain g(ti) from the subtraction adjustment unit 118. The first subtraction unit 114 subtracts the multiplication result from the sound signal in1(ti) and outputs a target sound signal OUT (ti) as represented by the expression (7) below.
target sound signal OUT(ti)=sound signal in1(ti)−sound signal in2(ti−1)×gain g(ti) (7)
Through the above described processing, the target sound signal OUT (ti) that is output from the first subtraction unit 114 indicates a directivity that takes in a sound from the sound reception direction as indicated by the dashed line in
The gain g(ti) determines a subtraction amount of the sound signal in2(ti-1) to be subtracted from the sound signal in1(ti) by the first subtraction unit 114. In other words, the gain g(ti) determines a suppression amount of noise in the sound signal in1(ti) that includes the target sound. Moreover, a suppression amount of noise is determined by a state of noise because the gain g(ti) is determined by a state of noise as described above.
As described above, noise is suppressed when needed according to a state of noise or suppression processing is alleviated or stopped when the necessity to suppress noise is small. Accordingly, distortion of a target sound from a target sound source SS is suppressed while suppressing noise.
The microphone array device 100 may erroneously recognize that a target sound source SS in the sound reception range is present in the suppression direction. The erroneous recognition may be caused due to fluctuation of an incoming direction of the sound due to a movement of, for example, a speaker who is a target sound source SS, reflection from a wall, and surrounding environment such as an air flow. Even in the above case, distortion of the target sound may be suppressed when a degree of noise suppression is small because noise is suppressed according to the state of noise.
Identifying the direction of a sound source of noise with high steadiness by a microphone array is generally difficult. For example, noise with high steadiness generally comes from various directions and its level change is small, so identifying the sound source direction is difficult. Therefore, the microphone array device 100 according to the embodiment reduces the suppression amount of such noise. In other words, when steadiness of noise is high, the microphone array device 100 performs control so as to suppress distortion of the target sound from the target sound source SS rather than to suppress the noise. Meanwhile, identifying the sound source direction of noise with low steadiness is generally easy. Accordingly, the microphone array device suppresses the identified noise relative to the target sound.
(3) Processing Flow
Hereinafter, processing according to the embodiment will be described by referring to
Operation S1:
The first sound reception unit 111 obtains a sound signal in1(ti) that includes a target sound from the sound reception direction. The second sound reception unit 112 obtains a sound signal in2(ti) that includes noise from the suppression direction.
Operation S2:
The second delay unit 115 receives the sound signal in1(ti) that includes the target sound from the first sound reception unit 111 and generates a sound signal in1(ti-1) that is obtained by delaying the sound signal in1(ti) for a certain period Ta.
Operation S3:
The second subtraction unit 116 subtracts the sound signal in1(ti−1) from the sound signal in2(ti) and calculates a noise signal N(ti).
Operation S4:
The noise state evaluation unit 117 evaluates a state of noise based on a noise signal N(ti) that is an output from the second subtraction unit 116. The state of noise includes, for example, a noise level L(ti) and a noise level change S(ti).
Operation S5:
The subtraction adjustment unit 118 sets a gain g(ti) for adjusting a suppression amount of noise on a time axis according to a state of noise.
Operation S6:
The first delay unit 113 receives a sound signal in2(ti) that includes noise from the second sound reception unit 112 and generates a sound signal in2(ti-1) that is obtained by delaying the sound signal in2(ti) for a certain period Ta.
Operation S7:
The subtraction adjustment unit 118 multiplies the sound signal in2(ti-1) by the gain g(ti) and outputs the multiplication result to the first subtraction unit 114.
Operation S8:
The first subtraction unit 114 receives the sound signal in1(ti) that includes the target sound from the first sound reception unit 111 and subtracts the multiplication result from the sound signal in1(ti).
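Putting Operations S1 to S8 together, one block of samples could be processed as in the following sketch. The one-sample delay, the per-sample gain update, the use of the block mean as the level history, and the max() combination are all assumptions that keep the example short.

import numpy as np

def process_block(in1, in2, A1, A2, B1, B2):
    delay = lambda x: np.concatenate(([0.0], x[:-1]))  # Ta = 1/fs
    N = in2 - delay(in1)                    # S2-S3: noise signal N(ti)
    L = 10.0 * np.log10(N ** 2 + 1e-12)     # S4: noise level L(ti)
    S = L / (np.mean(L) + 1e-12)            # S4: level change (block mean assumed)
    gL = np.clip((L - A1) / (A2 - A1), 0.0, 1.0)  # S5: expression (5)
    gS = np.clip((S - B1) / (B2 - B1), 0.0, 1.0)  # S5: expression (6)
    g = np.maximum(gL, gS)                  # S5: combination (assumption)
    return in1 - g * delay(in2)             # S6-S8: expression (7)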
A microphone array device 200 according to a second embodiment obtains a state of noise by processing sound signals obtained by two microphones on a frequency axis and suppresses the noise by synchronous subtraction processing based on the state of noise. The hardware configuration of the microphone array device 200 according to the second embodiment is substantially the same as that of the first embodiment. Moreover, the same reference numerals are assigned to components that are the same as the first embodiment.
(1) Functional Configuration
In
In
A microphone distance d between the microphone MIC1 and the microphone MIC2 is set substantially the same as that of the first embodiment.
Processing by functional units of the microphone array device 200 is executed in collaboration with the CPU 101, the ROM 102, the RAM 103, and the microphone array 104.
The microphone array device 200 includes a first sound reception unit 111, a second sound reception unit 112, a range setting unit 121, a first signal converter 122, a second signal converter 123, a phase spectrum difference calculation unit 124, a noise state evaluation unit 125, a synchronization coefficient calculation unit 126, a synchronization unit 127, a subtraction unit 128, and a signal restoration unit 129. According to the embodiment, a suppression unit 130 includes the range setting unit 121, the synchronization coefficient calculation unit 126, the synchronization unit 127, and the subtraction unit 128. Hereinafter, each of the functional units will be described.
(1-1) Range Setting Unit
The range setting unit 121 makes initial settings of a sound reception range, a shift range, and a suppression range for each microphone, for example, based on a user input. The microphone array device 200 accepts a user input through a user input acceptance unit (not illustrated) and the user input acceptance unit outputs the accepted user input to the range setting unit 121.
The range setting unit 121 may make initial settings of a sound reception range, a shift range, and a suppression range for each microphone based on initial values stored in the ROM 102.
Moreover, the range setting unit 121 receives a state of noise from the noise state evaluation unit 125 that includes a noise level L(f), a noise level change S(f), and a combined value LS(f). The range setting unit 121 controls the sound reception range, the shift range, and the suppression range based on the state of noise. Controlling the ranges will be described in the paragraph on the noise state evaluation unit 125.
(1-2) the First Sound Reception Unit and the Second Sound Reception Unit
The first sound reception unit 111 and the second sound reception unit 112 are substantially the same as those of the first embodiment. The first sound reception unit 111 samples a sound signal from the microphone MIC1 at a certain sampling frequency fs. The first sound reception unit 111 outputs a sound signal in1(ti) as a digital signal on a time axis. The second sound reception unit 112 samples a sound signal from the microphone MIC2 at a certain sampling frequency fs. The second sound reception unit 112 outputs a sound signal in2(ti) as a digital signal on a time axis.
(1-3) First Signal Converter and Second Signal Converter
The first signal converter 122 frequency-converts the sound signal in1(ti) on the time axis and generates a complex spectrum IN1(f). The f here indicates a frequency. For example, a fast Fourier transform (FFT), a discrete cosine transform (DCT), and a wavelet transform may be used for the frequency conversion. A plurality of band pass filtering techniques such as subband decomposition may be used as well. Here, the first signal converter 122 uses the FFT and multiplies the sound signal in1(ti) by a window function while overlapping each signal interval. The first signal converter 122 applies an FFT to the multiplication result and generates a complex spectrum IN1(f) on a frequency axis.
Likewise, the second signal converter 123 frequency-converts the sound signal in2(ti) on the time axis and generates a complex spectrum IN2(f) on the frequency axis.
The complex spectrum IN1(f) and the complex spectrum IN2(f) are represented by the following expressions (8) and (9).
IN1(f)=W1(f)exp(j(2πfti+φ1(f))) (8)
IN2(f)=W2(f)exp(j(2πfti+φ2(f))) (9)
The f represents a frequency, W1 and W2 represent amplitudes, j represents a unit imaginary number, φ1 (f) and φ2 (f) represent phase delays that are functions of a frequency f. The ti represents time when a sound signal is fed to the microphone. The subscript i of t is a sampling number of each sound signal when the sound is taken in at sampling frequency fs. The subscript i is an integer of one or more.
The overlap window functions include the Hamming window function, the Hanning window function, the Blackman window function, the 3-sigma Gaussian window function, and the triangular window function.
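The analysis described above is a standard overlapped, windowed FFT. A minimal Python/NumPy sketch follows; the frame length, the 50% overlap, and the Hanning window are assumed parameters chosen for the example.

import numpy as np

def to_spectra(x, frame=256, hop=128):
    # multiply each overlapped signal interval by a window and apply
    # the FFT, yielding one complex spectrum per analysis frame
    w = np.hanning(frame)
    starts = range(0, len(x) - frame + 1, hop)
    return np.array([np.fft.rfft(w * x[s:s + frame]) for s in starts])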
(1-4) Phase Spectrum Difference Calculation Unit
The phase spectrum difference calculation unit 124 receives the complex spectrum IN1(f) and the complex spectrum IN2(f) from the first signal converter 122 and the second signal converter 123, respectively. The phase spectrum difference calculation unit 124 calculates a phase spectrum difference DIFF(f) for each frequency based on the complex spectrum IN1(f) and the complex spectrum IN2(f). The phase spectrum difference DIFF(f) represents a sound source direction for each frequency f between the microphone MIC1 and the microphone MIC2, which are spaced apart by the distance d.
The phase spectrum difference DIFF(f) is represented by the following expression (10):
DIFF(f)=φ2(f)−φ1(f) (10)
The phase spectrum difference calculation unit 124 identifies a range where a sound source of an incoming sound is included based on the relationship in
The phase spectrum difference DIFF(f) is included in one of the sound reception range, the shift range, and the suppression range because the microphone distance d is set by the expression (1) according to the first embodiment.
As described above, processing a sound signal for each frequency on the frequency axis allows a phase spectrum difference between the microphones to be detected more accurately than processing a sound signal on the time axis. For example, a target sound from a target sound source SS and noise generated at various frequencies by a plurality of other sound sources coexist in the sound signal from the microphone MIC1 and the sound signal from the microphone MIC2. Hence, the sound source direction and the state of noise for each sound may be detected with higher accuracy by detecting a phase spectrum difference for each frequency.
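A sketch of the per-frequency phase difference and the range decision follows. The phase difference is computed as the angle of IN2(f)·conj(IN1(f)), and the two boundary angles standing in for the configured sound reception, shift, and suppression ranges are assumptions.

import numpy as np

def classify_bins(IN1, IN2, recv_max=0.5, shift_max=1.5):
    # phase spectrum difference DIFF(f) for each frequency bin
    diff = np.angle(IN2 * np.conj(IN1))
    a = np.abs(diff)
    # assumed boundaries: reception, then shift, then suppression
    labels = np.where(a <= recv_max, "reception",
                      np.where(a <= shift_max, "shift", "suppression"))
    return diff, labels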
(1-5) Noise State Evaluation Unit
The noise state evaluation unit 125 receives, from the phase spectrum difference calculation unit 124, the range of the sound source of an incoming sound that is determined by the phase spectrum difference DIFF(f). The noise state evaluation unit 125 evaluates a state of noise. The noise state evaluation unit 125 assumes that an incoming sound is noise when the phase spectrum difference DIFF(f) is included in the suppression range in
The state of the noise includes, for example, a noise level and a noise level change, and examples of calculating the noise level and the noise level change will be described below.
(a) Calculating a State of Noise
(a1) Calculating a Noise Level L(f)
A method to calculate a noise level L(f) is described.
The noise state evaluation unit 125 calculates an average value of |IN1(f)| based on the following expression (11) when a sound source of an incoming sound is included in the suppression range.
average value of |IN1(f)|=β×(average value of |IN1(f)| of the preceding analysis frame)+(1−β)×|IN1(f)| (11)
Here, the β represents a time constant to obtain the average value of |IN1(f)| and indicates an addition ratio or a combination ratio of the preceding analysis frame. The preceding analysis frame here means one shift of the analysis window in the FFT earlier, in other words, the time that goes back by the amount of the overlap. The β is larger than 0 and less than 1.0.
Calculating an average of |IN1(f)| is substantially the same as applying a smoothing filter to |IN1(f)|, and in this case, the β is a time constant of the smoothing filter.
The noise state evaluation unit 125 calculates a relative level value (f) with respect to a full scale of the noise level represented by the average value of |IN1(f)|. The |IN1(f)|, which is a digital signal, is represented in bits. The full scale here is the ratio, expressed in decibels, of the substantially maximum value to the substantially minimum value of the level of |IN1(f)| represented in bits. For example, when |IN1(f)| is represented in 16 bits, the ratio of the substantially maximum value to the substantially minimum value of the level of |IN1(f)| is about 98 decibels. Accordingly, in this case, the full scale may be set to 98 decibels. Note that the value of the full scale changes according to the number of bits that represents |IN1(f)|. Hereinafter, |IN1(f)| is represented in 16 bits.
The relative level value (f) of the average value of |IN1(f)| is represented by the following expression (12):
relative level value (f)=20 log10(average value of |IN1(f)|) (12)
Moreover, the noise state evaluation unit 125 calculates a noise level L(f) based on a preset relationship between the noise level L(f) and the relative level value (f).
For example, when the relative level value (f) is larger than γ2 (relative level value (f)>γ2), in other words, when the noise level is high, the noise state evaluation unit 125 calculates the noise level L(f) as 1.0. Moreover, when the relative level value (f) is smaller than γ1 (relative level value (f)<γ1), in other words, when the noise level is low, the noise state evaluation unit 125 calculates the noise level L(f) as 0. For example, the γ1 is 58 dB and the γ2 is 68 dB, and the values may be obtained through an experiment.
When the relative level value (f) is γ1 or more and γ2 or less (γ1≦relative level value (f)≦γ2), for example, the noise level is calculated by a simple weighted average represented by the following expression (13). The simple weighted average is just one example, and an arithmetic average, a quadratic weighted average, and a cubic weighted average may be used as well.
noise level L(f)=(relative level value (f)−γ1)/(γ2−γ1) (13)
(a2) Calculating a Noise Level Change S(f)
A method to calculate a noise level change S(f) is described.
The noise state evaluation unit 125 calculates an average value of |IN1(f)| based on the above expression (11) when a sound source of an incoming sound is included in the suppression range.
The noise state evaluation unit 125 calculates a Rate(f) that is a ratio of |IN1(f)| to an average value of |IN1(f)| by the expression (14) below.
Rate(f)=|IN1(f)|/average value of |IN1(f)| (14)
Moreover, the noise state evaluation unit 125 calculates the noise level change S(f) based on a preset relationship between the noise level change S(f) and the Rate(f).
For example, when the Rate(f) is larger than δ2 (Rate(f)>δ2), the noise state evaluation unit 125 calculates the noise level change S(f) as 1.0. When the Rate(f) is smaller than δ1 (Rate(f)<δ1), the noise state evaluation unit 125 calculates the noise level change S(f) as 0. For example, the δ1 is 0.7, and δ2 is 1.4, and the values may be obtained by an experiment.
The noise level change S(f) is calculated, for example, by a simple weighted average represented by the expression (15) below when the Rate(f) is δ1 or more and δ2 or less (δ1≦Rate(f)≦δ2). The simple weighted average is just one example, and an arithmetic average, a quadratic weighted average, and a cubic weighted average may be used as well.
noise level change S(f)=(Rate(f)−δ1)/(δ2−δ1) (15)
(a3) Calculating a Combined Value LS(f)
The noise state evaluation unit 125 calculates a combined value LS(f) as a function in which both the noise level L(f) and the noise level change S(f) are variables. The combined value LS(f) may be calculated by a simple weighted average of the noise level L(f) and the noise level change S(f) using the expression (16) below.
Combined value LS(f)=τ×L(f)+(1−τ)×S(f) (16)
The τ here determines the ratio at which the noise level L(f) and the noise level change S(f) contribute to the combined value LS(f), and may be obtained by an experiment. Moreover, the τ is defined in a range of 0≦τ≦1.0.
The combined value LS(f) is defined in a range of 0≦combined value LS(f)≦1.0. The combined value LS(f) approaches 1.0 as the noise level L(f) and the noise level change S(f) are greater. Conversely, the combined value LS(f) approaches 0 as the noise level L(f) and the noise level change S(f) are smaller.
The noise state evaluation unit 125 increases τ when a state in which noise level L(f)<noise level change S(f) continues for a certain period. Accordingly, the noise state evaluation unit 125 reduces the impact of the noise level change S(f) on the combined value LS(f) under the state of noise level L(f)<noise level change S(f). Conversely, the noise state evaluation unit 125 decreases τ when a state in which noise level L(f)>noise level change S(f) continues for a certain period. Accordingly, the noise state evaluation unit 125 reduces the impact of the noise level L(f) on the combined value LS(f) under the state of noise level L(f)>noise level change S(f). Through the above described processing, the combined value LS(f) may become a function in which both the noise level L(f) and the noise level change S(f) are appropriately taken into account.
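The calculations in (a1) to (a3) can be collected into one stateful update per analysis frame. In the sketch below, the 20·log10 realization of the relative level value, the eps guards, and the values of β, γ1, γ2, δ1, δ2, and τ are assumptions (the thresholds echo the example values quoted above).

import numpy as np

class NoiseState:
    def __init__(self, nbins, beta=0.9, g1=58.0, g2=68.0,
                 d1=0.7, d2=1.4, tau=0.5):
        self.avg = np.zeros(nbins)
        self.beta, self.g1, self.g2 = beta, g1, g2
        self.d1, self.d2, self.tau = d1, d2, tau

    def update(self, IN1):
        mag = np.abs(IN1)
        # expression (11): smoothed magnitude, beta as time constant
        self.avg = self.beta * self.avg + (1.0 - self.beta) * mag
        # relative level value (f) in decibels (assumed realization)
        rel = 20.0 * np.log10(self.avg + 1e-12)
        L = np.clip((rel - self.g1) / (self.g2 - self.g1), 0.0, 1.0)   # (13)
        rate = mag / (self.avg + 1e-12)                                # (14)
        S = np.clip((rate - self.d1) / (self.d2 - self.d1), 0.0, 1.0)  # (15)
        return self.tau * L + (1.0 - self.tau) * S                     # (16)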
(b) Controlling Ranges Based on a State of Noise by a Range Setting Unit
A method to control the sound reception range, the shift range, and the suppression range based on a state of noise will be described.
The range setting unit 121 receives a state of noise that includes the noise level L(f) and the noise level change S(f). The range setting unit 121 controls the sound reception range, the shift range, and the suppression range based on the state of noise. In other words, the range setting unit 121 controls directivity of the microphone array that includes the microphone MIC1 and the microphone MIC2.
The range setting unit 121 controls each range in the same manner as
In
The microphone array device 200 may erroneously recognize that a target sound source SS that is actually in the sound reception range is present in the shift range. The erroneous recognition may be caused by fluctuation of the incoming direction of a sound due to, for example, a movement of a speaker who is the target sound source SS and the surrounding environment. Even in the above case, controlling the ranges as illustrated in
The range setting unit 121 controls each range in the same manner as in
In the above description, the range setting unit 121 typically controls the shift range and the suppression range. However, the sound reception range may be controlled as well. For example, in
(1-6) Synchronization Coefficient Calculation Unit
The synchronization coefficient calculation unit 126 receives information on the sound reception range, the shift range, and the suppression range that are set based on a state of noise from the range setting unit 121. The synchronization coefficient calculation unit 126 receives a phase spectrum difference DIFF(f) from the phase spectrum difference calculation unit 124. The synchronization coefficient calculation unit 126 calculates synchronization coefficients as will be described in (a1) to (a3) below based on the sound reception range, the shift range, and the suppression range that are set based on a state of noise and the phase spectrum difference DIFF(f).
(a) Synchronization Coefficient C(f)
(a1) when the Phase Spectrum Difference DIFF(f) is in the Suppression Range
The synchronization coefficient calculation unit 126 calculates a synchronization coefficient C(f) when the phase spectrum difference DIFF(f) is in the suppression range.
The synchronization coefficient calculation unit 126 makes the following estimation on the noise obtained by the microphone MIC1. A sound obtained by the microphone MIC1 at a specific frequency f includes noise from the suppression range. The synchronization coefficient calculation unit 126 estimates that the noise obtained by the microphone MIC1 is substantially the same as the noise included in the sound obtained by the microphone MIC2 and that the noise reaches the microphone MIC1 after being delayed by the phase spectrum difference DIFF(f). Based on this estimation, the synchronization coefficient C(f) is updated by the following expression (17).
synchronization coefficient C(f)=α×C(f)′+(1−α)×(IN1(f)/IN2(f)) (17)
Here, the C(f)′ is a synchronization coefficient before an update. The synchronization coefficient C(f) may be updated, for example, for each analysis frame. The α represents an addition ratio or a combination ratio of a phase delay amount of a preceding analysis frame for synchronization. The α is larger than 0 and less than 1.0.
(a2) when a Phase Spectrum Difference DIFF(f) is in the Sound Reception Range
The synchronization coefficient calculation unit 126 calculates a synchronization coefficient C(f) based on the following expressions (18) or (19) when the phase spectrum difference DIFF(f) is in the sound reception range.
synchronization coefficient C(f)=exp(−j2πf/fs) (18)
synchronization coefficient C(f)=0 (19)
(a3) when a Phase Spectrum Difference DIFF(f) is in the Shift Range
The synchronization coefficient calculation unit 126 applies, for example, a weighted average to a calculated result of the synchronization coefficient C(f) based on the above-described (a1) and (a2). Accordingly, the synchronization coefficient calculation unit 126 calculates a synchronization coefficient C(f).
An example of calculating a synchronization coefficient C(f) will be described by referring to
In
(b) Synchronization Coefficient Cg(f) Dependent on the Gain g(f)
The synchronization coefficient calculation unit 126 may calculate the synchronization coefficient Cg(f) that is dependent on the gain g(f) by further multiplying the synchronization coefficient C(f) that is calculated based on the above (a1) to (a3) by a gain g(f).
synchronization coefficient Cg(f)=gain g(f)×synchronization coefficient C(f) (20)
The gain g(f) is a value to adjust a suppression amount of noise on a frequency axis. The synchronization coefficient calculation unit 126 sets the gain g(f) according to a state of noise.
Here, the gain g(f) is calculated based on the combined value LS(f). However, the gain g(f) may be calculated based on a noise level L(f) or a noise level change S(f).
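A sketch of the coefficient update follows. It implements expression (17) for bins classified into the suppression range and uses expression (19) (C(f)=0) for the sound reception range; expression (18) is the other documented choice, the shift-range weighted averaging of (a3) is omitted for brevity, and alpha and the eps guard are example values.

import numpy as np

def update_sync_coeff(C_prev, IN1, IN2, in_suppression, alpha=0.9, g=None):
    # avoid division by (near) zero in IN1(f)/IN2(f)
    safe_IN2 = np.where(np.abs(IN2) < 1e-12, 1e-12, IN2)
    C = np.where(in_suppression,
                 alpha * C_prev + (1.0 - alpha) * (IN1 / safe_IN2),  # (17)
                 0.0)                                                # (19)
    # expression (20): gain-dependent coefficient Cg(f), if g(f) given
    return C if g is None else g * C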
(1-7) Synchronization Unit
The synchronization unit 127 receives the synchronization coefficient C(f) or the synchronization coefficient Cg(f) that is dependent on the gain g(f) from the synchronization coefficient calculation unit 126. The synchronization unit 127 performs synchronization by using the synchronization coefficient C(f) or the synchronization coefficient Cg(f) based on the state of noise. Alternatively, the synchronization unit 127 may perform synchronization based on an initial setting that specifies which of the synchronization coefficients is used.
For example, when the synchronization coefficient Cg(f) is used, the synchronization unit 127 multiplies the complex spectrum IN2(f) by the synchronization coefficient Cg(f) as represented by the expression (21) below. Accordingly, a complex spectrum INs2(f) that is obtained by synchronizing the complex spectrum IN2(f) with the complex spectrum IN1(f) is calculated.
INs2(f)=Cg(f)×IN2(f) (21)
Here, the Cg(f) is used as the synchronization coefficient; however, the C(f) may be used instead.
(1-8) Subtraction Unit
The subtraction unit 128 subtracts the synchronized complex spectrum INs2(f) from the complex spectrum IN1(f) to obtain an output OUT(f), as represented by the following expression (22).
OUT(f)=IN1(f)−INs2(f) (22)
(1-9) Signal Restoration Unit
The signal restoration unit 129 converts the output OUT(f) from the subtraction unit 128 into a signal on a time axis. Processing by the signal restoration unit 129 is inverse to conversions by the first signal converter 122 and the second signal converter 123. Here, the signal restoration unit 129 applies an inverse Fast Fourier Transform (IFFT) to the output OUT(f). Moreover, the signal restoration unit 129 performs an overlap add operation for the result of the IFFT to generate an output signal of the microphone MIC1 on a time axis.
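The synchronization, subtraction, and restoration steps (expressions (21) and (22), followed by the IFFT and the overlap add operation) can be sketched together for a run of analysis frames. The synthesis window and the frame/hop sizes below are assumptions matching the analysis sketch above.

import numpy as np

def synthesize(IN1, IN2, Cg, frame=256, hop=128):
    OUT = IN1 - Cg * IN2   # expressions (21) and (22) per frame
    w = np.hanning(frame)
    n = OUT.shape[0]
    y = np.zeros(hop * (n - 1) + frame)
    for k in range(n):
        # IFFT each frame, window, and overlap-add onto the time axis
        y[k * hop:k * hop + frame] += w * np.fft.irfft(OUT[k], n=frame)
    return y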
(2) Processing Flow
Hereinafter, processing according to the embodiment will be described by referring to
Operation S11:
The range setting unit 121 makes initial settings of a sound reception range, a shift range, and a suppression range for each microphone, for example, based on a user input.
Operation S12:
The first sound reception unit 111 and the second sound reception unit 112 obtain a sound signal in1(ti) and a sound signal in2(ti) on a time axis.
Operation S13 and Operation S14:
The first signal converter 122 multiplies each signal interval of the sound signal in1(ti) by an overlap window function (Operation S13) and generates a complex spectrum IN1(f) on a frequency axis by further applying the FFT (Operation S14). Likewise, the second signal converter 123 frequency-converts the sound signal in2(ti) to generate a complex spectrum IN2(f) on the frequency axis.
Operation S15:
The phase spectrum difference calculation unit 124 calculates a phase spectrum difference DIFF(f) between a complex spectrum IN1(f) and a complex spectrum IN2(f) for each frequency.
Operation S16:
The phase spectrum difference calculation unit 124 determines the range in which the phase spectrum difference DIFF(f) is included among the sound reception range, the shift range, and the suppression range. When the phase spectrum difference DIFF(f) is included in the suppression range, the process proceeds to Operation S17; otherwise, the process returns to Operation S12.
Operation S17:
The noise state evaluation unit 125 assumes an incoming sound as noise and evaluates the state of noise when the phase spectrum difference DIFF(f) is included in the suppression range, in other words, the sound source of the incoming sound is included in the suppression range. The state of noise includes, for example, a noise level L(f), a noise level change S(f), and a combined value LS(f) of the noise level L(f) and the noise level change S(f).
Operation S18:
The range setting unit 121 obtains the state of noise from the noise state evaluation unit 125 and controls directivity of the microphone array by controlling the sound reception range, the shift range, and the suppression range based on the state of noise.
Operation S19:
The synchronization coefficient calculation unit 126 calculates the synchronization coefficient C(f) based on the sound reception range, the shift range, and the suppression range that are set based on the state of noise and the phase spectrum difference DIFF(f).
Operation S20:
When the synchronization coefficient C(f) is further adjusted to calculate the synchronization coefficient Cg(f) that is dependent on the gain g(f), the process proceeds to Operation S21; otherwise, the process proceeds to Operation S24.
Operation S21:
The synchronization coefficient calculation unit 126 multiplies the synchronization coefficient C(f) by the gain g(f) to calculate the synchronization coefficient Cg(f) that is dependent of the gain g(f). The gain g(f) is a numerical value to adjust a suppression amount of noise on the frequency axis.
Operation S22:
The synchronization unit 127 multiplies the complex spectrum IN2(f) by the synchronization coefficient Cg(f) to synchronize the complex spectrum IN2(f) with the complex spectrum IN1(f).
Operation S23:
The subtraction unit 128 subtracts the multiplication result of Operation S22 from the complex spectrum IN1(f) to obtain an output OUT(f).
Operation S24:
The synchronization unit 127 multiplies the complex spectrum IN2(f) by the synchronization coefficient C(f) to synchronize the complex spectrum IN2(f) with the complex spectrum IN1(f).
Operation S25:
The subtraction unit 128 subtracts the multiplication result of Operation S24 from the complex spectrum IN1(f) to obtain an output OUT(f).
Operation S26:
The signal restoration unit 129 converts the output OUT(f) from the subtraction unit 128 to a signal on the time axis and further performs an overlap add operation and outputs an output signal in a time domain of the microphone MIC1. After completing the processing, the process returns to Operation S12 and the above described processing is repeated at an interval, for example, based on a certain sampling frequency.
The microphone array device 200 according to the embodiment controls the sound reception range, the shift range, and the suppression range according to a state of noise, and therefore may suppress noise according to the state of noise. For example, when a noise level L(f) is high, the microphone array device 200 may efficiently suppress noise the sound source of which is in the suppression range by narrowing the shift range to expand the suppression range.
Meanwhile, for example, when the noise level L(f) is small, the microphone array device 200 according to the embodiment may suppress noise whose sound source is in the suppression range while also suppressing distortion of a target sound from a target sound source SS, by expanding the shift range to narrow the suppression range. At this time, the shift from the sound reception range to the suppression range is gradual because the shift range is expanded. As a result, the microphone array device 200 according to the embodiment may gradually change the degree of noise suppression.
Even if a target sound source SS that is actually in the sound reception range is erroneously recognized as present in the shift range, the degree to which an incoming sound that reaches the microphone array device 200 from the shift range is suppressed may be reduced depending on the state of noise. For example, as described above, when the shift range is expanded, the degree of suppressing the target sound that is erroneously recognized as noise is reduced, and distortion of the target sound from the target sound source SS may be suppressed.
As described above, noise is suppressed according to a state of noise, and therefore according to how much the noise needs to be suppressed. Hence, distortion of a target sound may be suppressed.
A microphone array device 300 according to a third embodiment obtains a state of noise by processing sound signals obtained by two microphones on a frequency axis. Moreover, the microphone array device 300 suppresses noise by adjusting a gain for adjusting a suppression amount of noise based on the state of noise.
The hardware configuration of the microphone array device 300 according to the third embodiment is substantially the same as that of the first embodiment. Moreover, the same reference numerals are assigned to components that are the same as the second embodiment.
(1) Functional Configuration
Hereinafter, a gain calculation unit 140 and a gain multiplication unit 141 will be described. In the third embodiment, the suppression unit 130 includes the range setting unit 121 and the gain calculation unit 140.
(1-1) Gain Calculation Unit
The gain calculation unit 140 receives information on a sound reception range, a shift range, and a suppression range that are set based on a state of noise from the range setting unit 121. Moreover, the gain calculation unit 140 receives a phase spectrum difference DIFF(f) from the phase spectrum difference calculation unit 124. The gain calculation unit 140 calculates a gain G(f) for adjusting a suppression amount of noise on a frequency axis based on the sound reception range, the shift range, and the suppression range that are set based on a state of noise, and the phase spectrum difference DIFF(f). The gain G(f) is 0 or more and 1.0 or less.
For example, the gain calculation unit 140 sets a gain G(f) to 1.0 when the phase spectrum difference DIFF(f) is included in the sound reception range, and to 0 when the phase spectrum difference DIFF(f) is included in the suppression range. Moreover, the gain calculation unit 140 obtains a simple weighted average of the gain G(f) in the suppression range and the gain G(f) in the sound reception range according to a position of the phase spectrum difference DIFF(f) when the phase spectrum difference DIFF(f) is included in the shift range. The simple weighted average is just one example, and an arithmetic average, a quadratic weighted average, and a cubic weighted average may be used as well.
Adjusting the gain G(f) by the gain calculation unit 140 adjusts the amount by which the gain multiplication unit 141 suppresses the level of the complex spectrum IN1(f). The microphone array device 300 thereby adjusts the amount of suppressing noise included in the sound obtained by the microphone MIC1. Furthermore, the gain G(f) may be updated at each sampling of a sound signal.
The range setting unit 121 sets each range, for example, as illustrated in
Meanwhile, the range setting unit 121 sets each range, for example, as illustrated in
(1-2) Gain Multiplication Unit
The gain multiplication unit 141 obtains a gain G(f) from the gain calculation unit 140. The gain multiplication unit 141 multiplies the complex spectrum IN1(f) by the gain G(f) to output an OUT(f) as represented by the following expression (23).
OUT(f)=IN1(f)×G(f) (23)
The OUT(f) is processed by the signal restoration unit 129 and is output as an output signal of the microphone MIC1 on a time axis.
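A sketch of the third-embodiment gain path follows; it sets G(f) to 1.0 in the sound reception range and 0 in the suppression range and interpolates linearly (a simple weighted average) across the shift range, then applies expression (23). The boundary angles stand in for the configured ranges and are assumptions.

import numpy as np

def gain_suppress(IN1, diff, recv_max=0.5, shift_max=1.5):
    a = np.abs(diff)
    # 0 inside the reception range, rising to 1 at the suppression edge
    t = np.clip((a - recv_max) / (shift_max - recv_max), 0.0, 1.0)
    G = 1.0 - t   # gain G(f): 1.0 (reception) down to 0 (suppression)
    return IN1 * G   # expression (23): OUT(f) = IN1(f) x G(f)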
(2) Processing Flow
Hereinafter, processing according to the embodiment will be described by referring to
Operation S31 to Operation S38:
Operations S31 to S38 are substantially the same as Operations S11 to S18 in
Operation S39:
The gain calculation unit 140 calculates a gain G(f) for adjusting a suppression amount of noise on a frequency axis based on the sound reception range, the shift range, and the suppression range that are set based on a state of noise, and the phase spectrum difference DIFF(f).
Operation S40:
The gain multiplication unit 141 multiplies the complex spectrum IN1(f) by the gain G(f) to output an OUT(f).
Operation S41:
The signal restoration unit 129 converts the output OUT(f) into a signal on a time axis, performs an overlap add operation, and outputs an output signal in the time domain of the microphone MIC1. After completing the processing, the process returns to Operation S32. The above-described processing is repeated at an interval, for example, based on a certain sampling frequency.
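The restoration step may be pictured with the following sketch. The frame length N, the 50% overlap, and the use of a real-valued inverse FFT are assumptions of the sketch; the embodiment does not fix these parameters.

```python
import numpy as np

N, HOP = 512, 256  # hypothetical frame length and hop size (50% overlap)

def restore(out_spec, out_buf, pos):
    """Convert one OUT(f) frame to the time axis and overlap-add it."""
    frame = np.fft.irfft(out_spec, n=N)  # inverse FFT back to the time axis
    out_buf[pos:pos + N] += frame        # overlap add into the output signal
    return pos + HOP                     # the next frame starts one hop later
```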
As in the first and the second embodiments, noise is suppressed according to the state of noise in the third embodiment as well, so the suppression amount matches how much the noise actually needs to be suppressed. Hence, distortion of a target sound may be suppressed.
According to the first to the third embodiments, a direction where a target sound source SS is present, in other words, a sound reception direction from which the target sound comes, is initially set. The microphone array device adjusts a suppression amount by using the sound reception direction and the sound reception range, assuming that the target sound comes from the sound reception direction. Meanwhile, a microphone array device 400 according to a fourth embodiment detects a direction of a target sound source SS and sets a sound reception direction based on the detected direction. The microphone array device 400 according to the embodiment is also applicable to a case where a sound reception direction is initially set and the initially set sound reception direction is then changed, for example, based on the detected direction of the target sound source SS. Hereinafter, the microphone array device 400 according to the fourth embodiment will be described.
In the microphone array device 400 according to the fourth embodiment, as in the second and the third embodiments, sound signals obtained by the two microphones MIC1 and MIC2 are processed on a frequency axis. The hardware configuration of the microphone array device 400 according to the fourth embodiment is substantially the same as that of the first embodiment. Moreover, the same reference numerals are assigned to components that are substantially the same as those in the first embodiment.
(1) Functional Configuration
Hereinafter, a part of the configuration that is different from that of the second embodiment will be described.
(1-1) Range Setting Unit
The range setting unit 121 does not perform initial settings of a sound reception range, a shift range, and a suppression range for each microphone. Accordingly, each of the microphones is set to a state of non-directivity at the initial settings.
Alternatively, the range setting unit 121 may set initial settings of a sound reception range, a shift range, and a suppression range for each microphone based on a user input. Moreover, the range setting unit 121 may set initial settings of the sound reception range, the shift range, and the suppression range for each microphone based on initial values stored in a ROM 102.
Furthermore, the range setting unit 121 receives an evaluation result of a level of a sound received by the two microphones MIC1 and MIC2. The range setting unit 121 controls the sound reception range, the shift range, and the suppression range based on the evaluation result. The control of the ranges will be described below in connection with the level evaluation unit 150.
(1-2) Level Evaluation Unit
(a) Level Evaluation
The level evaluation unit 150 receives the complex spectrum IN1(f) and the complex spectrum IN2(f) from the first signal converter 122 and the second signal converter 123 respectively. The level evaluation unit 150 calculates, for each frequency, a level 1 of a sound signal in1(ti) obtained by the microphone MIC1 and a level 2 of a sound signal in2(ti) obtained by the microphone MIC2. A level of each sound signal may be calculated by the following expressions (24) and (25).
Level 1 = Σ|IN1(f)|² (24)
Level 2 = Σ|IN2(f)|² (25)
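A direct reading of expressions (24) and (25) in Python might look as follows, assuming the summation Σ runs over the frequency bins of a complex spectrum held in a numpy array.

```python
import numpy as np

def level(spec):
    """Expressions (24)/(25): the sum over f of the squared magnitude |IN(f)|²."""
    return float(np.sum(np.abs(spec) ** 2))

# level1 = level(in1_spec); level2 = level(in2_spec)
```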
(b) Detecting a Direction of a Target Sound Source SS
The level evaluation unit 150 compares the magnitudes of the levels of the above-described sound signals to detect a direction of a target sound source SS. For example, the level evaluation unit 150 may detect a direction of a target sound source SS based on the evaluation described below.
The level evaluation unit 150 determines that a target sound source SS is present near the microphone MIC1 side when level 1>>level 2. Here, level 1>>level 2 means, for example, Σ|IN1(f)|² ≧ 2.0×Σ|IN2(f)|².
The level evaluation unit 150 determines that a target sound source SS is present at a position where the distances to the microphone MIC1 and the microphone MIC2 are substantially the same when level 1≈level 2.
The level evaluation unit 150 determines that a target sound source SS is present near the microphone MIC2 side when level 1<<level 2. Here, level 1<<level 2 means, for example, 2.0×Σ|IN1(f)|² ≦ Σ|IN2(f)|².
The relationship of the level 1, the level 2, and the direction of the target sound source SS may be determined, for example, by an experiment.
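The three determinations above may be sketched as follows. The factor 2.0 follows the example thresholds; as noted, the actual relationship may be determined by an experiment.

```python
def detect_direction(level1, level2, ratio=2.0):
    """Classify the direction of the target sound source SS from the levels."""
    if level1 >= ratio * level2:   # level 1 >> level 2
        return "near MIC1"
    if level2 >= ratio * level1:   # level 1 << level 2
        return "near MIC2"
    return "equidistant"           # level 1 is roughly equal to level 2
```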
The level evaluation unit 150 may make the determination described above when the target sound source SS is present within a distance of, for example, about 10 times the microphone distance d from the microphone MIC1 or the microphone MIC2. According to the embodiment, a sound source near the microphones, for example, the mouth of a user who uses a handset of a telephone, is assumed to be a target sound source SS.
(c) Controlling Ranges Based on a Direction of a Target Sound Source SS by a Range Setting Unit
A method to control the sound reception range, the shift range, and the suppression range based on a direction of the target sound source SS detected by the level evaluation unit 150 will be described.
The range setting unit 121 sets each range, for example, as illustrated in
The range setting unit 121 sets the sound reception range narrower than the suppression range because the level 1 of the sound signal in1(ti) obtained by the microphone MIC1 is higher than the level 2 of the sound signal in2(ti) obtained by the microphone MIC2. The microphone MIC1 may sufficiently receive a target sound from the target sound source SS even if the sound reception range is narrow because the target sound source is estimated to be near the microphone MIC1.
The range setting unit 121 sets each range as illustrated in
The range setting unit 121 sets each range as illustrated in
Respective sizes of the sound reception range, the suppression range, and the shift range according to a ratio of the level 1 and the level 2 may be determined, for example, by an experiment.
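A hedged sketch of the range control follows. The boundary widths below are hypothetical placeholders, since the respective sizes according to the level ratio may be determined by an experiment; the orientation of each range (which sign of DIFF(f) it covers) also follows the detected direction.

```python
import math

# Hypothetical boundaries on the phase difference in radians; the gap
# between the reception and suppression boundaries is the shift range.
RANGES = {
    "near MIC1":   (0.15 * math.pi, 0.45 * math.pi),  # narrow reception range
    "equidistant": (0.25 * math.pi, 0.60 * math.pi),
    "near MIC2":   (0.15 * math.pi, 0.45 * math.pi),  # oriented toward MIC2
}

def set_ranges(direction):
    """Return the (reception, suppression) boundaries for a detected direction."""
    return RANGES[direction]
```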
(1-3) Synchronization Coefficient Calculation Unit
The synchronization coefficient calculation unit 126 receives information on the sound reception range, the shift range, and the suppression range that are set based on the level evaluation from the range setting unit 121. The synchronization coefficient calculation unit 126 receives a phase spectrum difference DIFF(f) from the phase spectrum difference calculation unit 124. The synchronization coefficient calculation unit 126 calculates a synchronization coefficient C(f) based on the sound reception range, the shift range, and the suppression range that are set based on the level evaluation, and the phase spectrum difference DIFF(f). A method to calculate the synchronization coefficient C(f) is substantially the same as that of the second embodiment. Moreover, the synchronization coefficient calculation unit 126 may calculate a synchronization coefficient Cg(f) that is dependent on a gain g(f) by further multiplying the synchronization coefficient C(f) by the gain g(f) as represented by the expression (20).
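Because expression (20) is a per-frequency multiplication, the gain-dependent coefficient reduces to an element-wise product, as in this minimal sketch.

```python
def gain_dependent_coefficient(c, g):
    """Expression (20): Cg(f) = C(f) x g(f), an element-wise product."""
    return c * g  # works for numpy arrays of per-frequency values
```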
(2) Processing Flow
Hereinafter, processing according to the embodiment will be described by referring to
Operation S51 to Operation S53:
Operation S51 to Operation S53 are substantially the same as Operation S12 to Operation S14 according to the second embodiment. The first sound reception unit 111 and the second sound reception unit 112 obtain a sound signal in1(ti) and a sound signal in2(ti) on a time axis. The first signal converter 122 generates a complex spectrum IN1(f) from the sound signal in1(ti) on a frequency axis. The second signal converter 123 generates a complex spectrum IN2(f) from the sound signal in2(ti) on the frequency axis.
Operation S54:
The level evaluation unit 150 calculates a level 1 and a level 2 of each sound signal based on the complex spectrum IN1(f) and the complex spectrum IN2(f). Moreover, the level evaluation unit 150 identifies a direction of a target sound source SS based on a result of comparison between the level 1 and the level 2.
Operation S55:
The range setting unit 121 controls the sound reception range, the shift range, and the suppression range based on the direction of the target sound source SS.
Operation S56:
The phase spectrum difference calculation unit 124 calculates a phase spectrum difference DIFF(f) between a complex spectrum IN1(f) and a complex spectrum IN2(f) for each frequency.
Operation S57 to Operation S60:
Operation S57 to Operation S60 are substantially the same as Operation S19 to Operation S26 according to the second embodiment. The synchronization coefficient calculation unit 126 calculates the synchronization coefficient C(f) based on the sound reception range, the shift range, and the suppression range that are set based on the level evaluation, and the phase spectrum difference DIFF(f) (Operation S57). Moreover, a synchronization coefficient Cg(f) that is dependent on the gain g(f) may be calculated.
The synchronization unit 127 multiplies the complex spectrum IN2(f) by the synchronization coefficient C(f) or the synchronization coefficient Cg(f) to synchronize the complex spectrum IN2(f) with the complex spectrum IN1(f) (Operation S58). The subtraction unit 128 subtracts the multiplication result of Operation S58 from the complex spectrum IN1(f) to obtain an output OUT(f) (Operation S59). The signal restoration unit 129 converts the output OUT(f) from the subtraction unit 128 into a signal on a time axis, performs an overlap add operation, and outputs an output signal in the time domain of the microphone MIC1 (Operation S60). After completing the processing, the process returns to Operation S51 and the above-described processing is repeated at an interval, for example, based on a certain sampling frequency.
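Operations S58 to S60 may be summarized by the following sketch, assuming the complex spectra and the coefficient are numpy arrays and omitting the overlap add bookkeeping shown earlier.

```python
import numpy as np

def process_frame(in1_spec, in2_spec, c):
    """One frame of Operations S58 to S60 on the frequency axis."""
    synced = in2_spec * c          # S58: synchronize IN2(f) with IN1(f)
    out_spec = in1_spec - synced   # S59: OUT(f) = IN1(f) - C(f) x IN2(f)
    return np.fft.irfft(out_spec)  # S60: convert back to the time axis
```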
The microphone array device 400 according to the embodiment sets each range according to a direction of a target sound source SS. For example, an actual direction of a target sound source SS may be different from a direction of a target sound source SS that is set beforehand, depending on how a mobile phone is held. The microphone array device 400 according to the embodiment may set the ranges, for example, the sound reception range, according to a change of the direction of the target sound source SS even when the direction of the target sound source SS is changed. Accordingly, the microphone array device 400 may receive a target sound from the target sound source SS as a sound from the sound reception range, and may suppress noise while suppressing distortion of the target sound.
(3) Combination with the Second Embodiment and the Third Embodiment
The fourth embodiment may be combined with the second embodiment and the third embodiment. In other words, the microphone array device controls the sound reception range, the shift range, and the suppression range based on an evaluation result of the levels of sounds received by the two microphones MIC1 and MIC2 as described in the fourth embodiment, and also controls these ranges according to a state of noise as described in the second embodiment and the third embodiment.
(3-1) Combination of the Second Embodiment and the Fourth Embodiment
The level evaluation unit 150 calculates a level 1 and a level 2 of each sound signal of the microphone MIC1 and the microphone MIC2. Moreover, the level evaluation unit 150 identifies a direction of a target sound source SS by comparing the level 1 and the level 2. The range setting unit 121 controls the sound reception range, the shift range, and the suppression range based on the direction of the target sound source SS. A synchronization coefficient C(f) and so on are calculated based on the range settings, and the signal restoration unit 129 outputs an output signal. The above-described processing to control each range based on the detected direction of the target sound source SS is repeated at an interval, for example, based on a certain sampling frequency.
Meanwhile, the noise state evaluation unit 125 assumes an incoming sound as noise when a phase spectrum difference DIFF(f) is included in the suppression range and evaluates a state of noise as in the second embodiment. The range setting unit 121 obtains a state of noise from the noise state evaluation unit 125 and controls the sound reception range, the shift range, and the suppression range based on the state of noise. Furthermore, a synchronization coefficient C(f) and so on are calculated and the signal restoration unit 129 outputs an output signal. The above-described processing to control each range based on the state of noise is repeated at an interval, for example, based on a certain sampling frequency.
An example of controlling ranges will be described by referring to
For example, as a result of an evaluation by the level evaluation unit 150, the levels of the sound signals of the microphones MIC1 and MIC2 are assumed to satisfy level 1>>level 2. In this case, the level evaluation unit 150 determines that a target sound source SS is present at the microphone MIC1 side. The range setting unit 121 sets a sound reception range at the microphone MIC1 side as illustrated in
The noise state evaluation unit 125 assumes an incoming sound as noise when a phase spectrum difference DIFF(f) is included in the suppression range as illustrated in
(3-2) Combination of the Third Embodiment and the Fourth Embodiment
The range setting unit 121 controls a sound reception range, a shift range, and a suppression range based on a result of comparison between the level 1 and the level 2 by the level evaluation unit 150.
Meanwhile, the noise state evaluation unit 125 assumes an incoming sound as noise when a phase spectrum difference DIFF(f) is included in the suppression range and evaluates the state of noise as in the second embodiment. The gain calculation unit 140 calculates a gain G(f) for adjusting a suppression amount of noise on a frequency axis based on the sound reception range, the shift range, and the suppression range that are set based on the state of noise, and the phase spectrum difference DIFF(f). The gain multiplication unit 141 multiplies the complex spectrum IN1(f) by the gain G(f) to output an OUT(f). The signal restoration unit 129 converts the output OUT(f) into a signal on the time axis and further performs an overlap add operation and outputs an output signal in a time domain of the microphone MIC1. The above-described processing is repeated at an interval, for example, based on a certain sampling frequency
As described above, setting each range according to a direction of the target sound source SS and a state of noise may suppress noise while suppressing distortion of the target sound.
The above-described embodiments may be applied to the alternative embodiments described below.
The first, second, third, and fourth embodiments use a noise level, a noise level change, and a combined value obtained from the noise level and the noise level change to represent a state of noise. However, elements other than those described above may be used to represent a state of noise. Moreover, methods to calculate a noise level, a noise level change, and a combined value are not limited to those described in the first to the fourth embodiments.
The second embodiment and the third embodiment adjust a suppression amount of noise by appropriately taking account of both a noise level L(f) and a noise level change S(f). To this end, the microphone array devices according to the second embodiment and the third embodiment measure the duration of a state in which noise level L(f)<noise level change S(f) or noise level L(f)>noise level change S(f). The microphone array device adjusts an influence of the noise level L(f) or the noise level change S(f) on the combined value LS(f) according to the duration. In other words, the microphone array device adjusts an influence of noise on a suppression amount of noise.
The adjustment method may be applied to the first embodiment as well. In the first embodiment, the noise level L(ti) and the noise level change S(ti) are defined so that the two values may be compared as in the second embodiment. For example, the noise state evaluation unit 125 calculates the value of the average of |in1(ti)|, which represents a noise level, relative to the full scale, and calculates a noise level L(ti) based on the relative value. Furthermore, the noise state evaluation unit 125 calculates the ratio of |in1(ti)| to the average value of |in1(ti)| and calculates a noise level change S(ti) based on the ratio. As a result, both the noise level L(ti) and the noise level change S(ti) become 0 or more and 1 or less and may be compared.
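A sketch of this normalization follows. The 16-bit full scale and the clipping of both values into the range 0 to 1 are assumptions of the sketch; the text does not fix the exact mapping.

```python
import numpy as np

FULL_SCALE = 32768.0  # hypothetical full scale for 16-bit samples

def noise_level(in1):
    """L(ti): the average of |in1(ti)| relative to the full scale, in [0, 1]."""
    return min(float(np.mean(np.abs(in1))) / FULL_SCALE, 1.0)

def noise_level_change(sample, in1):
    """S(ti): the ratio of |in1(ti)| to its average, clipped into [0, 1]."""
    avg = float(np.mean(np.abs(in1)))
    return min(abs(sample) / avg, 1.0) if avg > 0 else 0.0
```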
The first to the fourth embodiments disclose methods to adjust a suppression amount of noise based on a state of noise and to suppress distortion of a target sound. The configuration to adjust a suppression amount of noise based on the state of noise may be applied, for example, to a synchronous addition method.
According to the first to the fourth embodiments, a plurality of microphones is one-dimensionally disposed on a substantially straight line. Among the plurality of microphones, the microphone MIC1 and the microphone MIC2 are used. However, the plurality of microphones may be two-dimensionally disposed, for example, at the vertices of a triangle. Arranging the plurality of microphones two-dimensionally may achieve more complex and finer control of directivity.
A microphone array device may be incorporated in devices such as on-vehicle equipment, a car navigation device with an audio recognition device, a hands-free telephone, or a mobile phone.
The above-described processing may be achieved by the CPU 101 executing programs stored in the ROM 102 to implement each functional unit. Alternatively, a signal processing circuit implemented as hardware may execute the above-described processing according to the programs.
Moreover, computer programs that make a computer execute the above-described method and a computer readable storage medium that stores the computer programs are included in the scope of the present disclosure. The computer readable storage medium includes, for example, a flexible disk, a hard disk, a Compact Disc-Read Only Memory (CD-ROM), a Magneto Optical (MO) disk, a Digital Versatile Disc (DVD), a DVD-ROM, a DVD-Random Access Memory (DVD-RAM), a Blu-ray Disc (BD), a universal serial bus (USB) memory, and a semiconductor memory. The above-described computer programs are not limited to those stored in the storage medium but may be provided through an electric communication line, a wireless or wired communication line, or a network such as the Internet.
All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the principles of the invention and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although the embodiments of the present inventions have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
Number: 2010-114897 | Date: May 2010 | Country: JP | Kind: national
This application is a divisional of and based upon and claims the benefit of priority under 35 U.S.C. §120 for U.S. Ser. No. 13/107,497, filed May 13, 2011, and claims the benefit of priority under 35 U.S.C. §119 from Japanese Patent Application No. 2010-114897, filed on May 19, 2010, the entire contents of which are incorporated herein by reference.
Parent: Number 13107497 | Date: May 2011 | Country: US
Child: Number 14512849 | Country: US