This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2015-129112, filed on Jun. 26, 2015, the entire contents of which are incorporated herein by reference.
The embodiments discussed herein are related to a noise suppression device and a method of noise suppression.
In a mobile phone, a video conference system, a broadcasting system, or the like, various techniques are known for suppressing noise included in a sound signal collected by a microphone, or the like (hereinafter also referred to simply as a “microphone”). Noise included in a sound signal includes, for example, an engine sound of a vehicle that passes by the vicinity of a microphone, an operation sound (stationary noise) of a fan and a motor that are installed in a factory, and the like.
One known technique for suppressing noise uses a plurality of sound signals collected by a microphone array including a plurality of microphones. As one of the noise suppression techniques of this kind, a microphone array noise reduction control method is known in which spatial orientation information of sound is directly captured by a microphone array, and the updating of the filtering by an adaptive filter is controlled more accurately using the orientation information.
Also, as a noise suppression technique using a microphone array, a technique is known that suppresses noise based on the phase difference between a plurality of sound signals collected by the microphone array.
Also, as one of related noise suppression techniques, a technique is known for suppressing noise by performing filter processing using a Kalman filter on the sound data in frequency domain, which has been obtained using Fourier transformation. Further, as another related noise suppression technique, a technique is known in which the variation width of an amplitude spectrum is restricted in accordance with the variation direction of the amplitude spectrum obtained by the time-to-frequency transformation, and noise is estimated based on this in order to perform noise suppression.
As examples of related-art techniques, Japanese National Publication of International Patent Application No. 2013-511750, Japanese Laid-open Patent Publication No. 2011-186384, Japanese Laid-open Patent Publication No. 2013-120358, and Japanese Laid-open Patent Publication No. 2008-309955 are known.
According to an aspect of the invention, a noise suppression device includes a memory, and a processor coupled to the memory and configured to generate a first input signal and a second input signal by converting a first sound signal and a second sound signal from time domain to frequency domain, the first sound signal and the second sound signal being collected by a first microphone and a second microphone, respectively, based on the first input signal and the second input signal, determine a stationary noise model, calculate a signal to noise ratio (SNR) based on the first input signal and the stationary noise model, based on the SNR, set a range of phase difference to suppress the first input signal, calculate a phase difference between the first input signal and the second input signal, and when the phase difference is within the range of phase difference, suppress the first input signal.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention, as claimed.
In the above-described noise suppression techniques, if noise included in a sound signal is large and the signal-to-noise ratio (hereinafter also referred to as an “SNR”) is low, the sound to be collected is also suppressed, and thus it becomes difficult to catch the sound.
Reference Example
In (a) in
When noise suppression processing is performed on an input signal as illustrated in (a) in
When a frequency spectrum for a section ΔT2 in the section ΔT1 having large noise in the input signal illustrated in (a) in
In the example illustrated in
Also, when a sound is uttered from a position at the same distance away from the two microphones, in an environment in which SNR is high, as illustrated in (a) in
However, if the phase difference range N is fixed although the distribution of the phase difference is changed by SNR, in the environment, in which SNR is low as illustrated in (b) in
[First Embodiment]
As illustrated in
The signal reception unit 101 receives input of a first input signal collected by a first microphone 2A and a second input signal collected by a second microphone 2B.
The transformation unit 102 transforms the first input signal and the second input signal from the signals in time domain into signals in frequency domain. Hereinafter the first input signal and the second input signal transformed into frequency domain by the transformation unit 102 are referred to as a first sound signal and a second sound signal, respectively.
The stationary noise estimation unit 103 estimates stationary noise models of the first sound signal and the second sound signal.
The phase difference calculation unit 104 calculates the phase difference of each frequency band based on the first sound signal and the second sound signal.
The state determination unit 105 determines the state of the first sound signal based on the first sound signal and the stationary noise model. The state determination unit 105 according to the present embodiment determines whether or not the first sound signal is in a low SNR state. The state determination unit 105 calculates an SNR based on the first sound signal and the stationary noise model, and if the calculated SNR is lower than a predetermined threshold value, the state determination unit 105 determines that the first sound signal is in a low SNR state.
The suppression range setting unit 106 sets the phase difference range in which each frequency band is suppressed in accordance with the determination result (whether or not a low SNR) by the state determination unit 105. In the present embodiment, two suppression phase difference range tables having different phase difference ranges where the input signal is suppressed are provided in advance, and a determination is made of which of the suppression range tables is used in accordance with the SNR.
The suppression coefficient determination unit 107 determines a suppression coefficient to be applied to each frequency band of the first sound signal based on the phase difference calculated by the phase difference calculation unit 104 and the suppression range (the phase difference range where the input signal is suppressed) set by the suppression range setting unit 106.
The suppression signal generation unit 108 multiplies each frequency band of the first sound signal by the suppression coefficient determined by suppression coefficient determination unit 107 to generate a suppression signal.
The inverse transformation unit 109 transforms the suppression signal that is generated from the first sound signal from the signal in frequency domain into a signal in time domain to generate an output sound signal.
The storage unit 110 stores the first suppression phase difference range table and the second suppression phase difference range table, or the like.
In the noise suppression device 1 according to the present embodiment, for example, the first sound signal and the second sound signal are divided for each predetermined frequency band (for example, for each 31.25 Hz), and a suppression coefficient β for suppressing noise is determined based on the phase difference for each frequency band.
It is assumed that if the phase difference is within a predetermined range, the suppression coefficient β is “1”, and if the phase difference is out of the range, the suppression coefficient β is a predetermined value less than 1. Also, the range of the phase difference that causes the suppression coefficient β to be 1 is made wider as the frequency becomes higher. Further, in the present embodiment, as described above, the range of the phase difference where the input signal is suppressed is changed in accordance with the SNR.
If SNR is equal to or higher than a predetermined threshold value (in the case of high SNR), for example, as illustrated in (a) in
On the other hand, if SNR is lower than the predetermined threshold value (in the case of low SNR), for example, as illustrated in (b) in
In the present embodiment, the range of the phase difference dP(f) where the input signal is suppressed is obtained for each frequency band f for each of the cases of high SNR and low SNR, and the suppression phase difference range tables as illustrated in
The phase difference ranges SA21 and SA22 where the input signal is suppressed at the time of low SNR are set to a value, for example, about ½ or ⅓ times that of the phase difference ranges SA11 and SA12 where the input signal is suppressed at the time of high SNR.
When sound collection by the first microphone 2A and the second microphone 2B is started, the noise suppression device 1 according to the present embodiment performs the processing as illustrated in
The noise suppression device 1 first starts reception of the first input signal and the second input signal (step S1). Step S1 is performed by the signal reception unit 101. The signal reception unit 101 passes the input signals received from the first microphone 2A and the second microphone 2B to the transformation unit 102. In this regard, the signal reception unit 101 continues the processing in step S1 until the sound collection by the first microphone 2A and the second microphone 2B terminates.
Next, the transformation unit 102 transforms the input signal of one frame from time domain into frequency domain (step S2). The transformation unit 102 transforms the input signal, which is a signal in time domain, into a sound signal (frequency spectrum), which is a signal in frequency domain, for example, by Fast Fourier transformation (FFT). When the transformation unit 102 transforms each frame into frequency domain, the transformation unit 102 passes the transformed first sound signal and second sound signal to the stationary noise estimation unit 103 and the phase difference calculation unit 104. Further, the transformation unit 102 passes, for example, the transformed first sound signal to the suppression signal generation unit 108.
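As an illustration of step S2 only, the following minimal sketch transforms one frame into a frequency spectrum. It uses a naive discrete Fourier transform, which yields the same values an FFT would compute, only more slowly. The function names, the 8 kHz sampling rate, and the 256-sample frame length are assumptions made for this example (chosen so that the band width matches the 31.25 Hz example given earlier); they are not specified by the embodiment.

```python
import cmath
import math


def frame_to_spectrum(frame):
    """Naive DFT of one real-valued frame; returns bins 0..N/2 as complex values."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, x in enumerate(frame):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        spectrum.append(acc)
    return spectrum


def band_width_hz(sample_rate_hz, frame_length):
    """Frequency spacing between adjacent bins of the transform."""
    return sample_rate_hz / frame_length


# Assumed parameters: 8 kHz sampling and 256 samples per frame.
print(band_width_hz(8000, 256))
```

In practice an FFT routine would replace the double loop; the sketch only shows which quantity each frequency band carries.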
Next, the stationary noise estimation unit 103 estimates a stationary noise model based on the received first sound signal and second sound signal (step S3). The stationary noise estimation unit 103 estimates the stationary noise model using any one of the known estimation methods. Further, the stationary noise estimation unit 103 passes the first sound signal and the estimated stationary noise model to the state determination unit 105.
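The embodiment does not fix a particular estimation method for step S3. Purely as one possible illustration, the sketch below tracks a per-band noise amplitude by recursive averaging and skips the update in bands whose amplitude is far above the current estimate. The function name, the smoothing factor, the update limit, and the use of a single amplitude spectrum are all assumptions of this example, not details taken from the embodiment.

```python
def update_noise_model(noise_model, amplitudes, smoothing=0.9, update_limit=2.0):
    """Return an updated per-band noise amplitude estimate (hypothetical method).

    A band is updated only when its amplitude does not exceed
    update_limit times the current estimate, so that loud, non-stationary
    sound does not leak into the stationary noise model.
    """
    updated = []
    for noise, amp in zip(noise_model, amplitudes):
        if amp <= update_limit * noise:
            updated.append(smoothing * noise + (1.0 - smoothing) * amp)
        else:
            updated.append(noise)
    return updated
```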
Also, when the phase difference calculation unit 104 receives the first sound signal and the second sound signal, the phase difference calculation unit 104 calculates the phase difference between the first sound signal and the second sound signal for each frequency band (step S4). The phase difference calculation unit 104 calculates the phase difference using any one of the known calculation methods. Further, the phase difference calculation unit 104 passes the calculated phase difference to the suppression coefficient determination unit 107.
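One commonly used way to obtain the phase difference of step S4, shown here only as a sketch, is to take the argument of the product of one spectrum with the complex conjugate of the other. This keeps the result wrapped to the interval from -π to π. The function name and the list-of-complex representation of the spectra are assumptions of this example.

```python
import cmath


def phase_differences(spectrum1, spectrum2):
    """Per-band phase difference dP(f) in radians, wrapped to [-pi, pi]."""
    return [
        cmath.phase(x1 * x2.conjugate())
        for x1, x2 in zip(spectrum1, spectrum2)
    ]
```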
Also, when the state determination unit 105 receives the first sound signal and the estimated stationary noise model, the state determination unit 105 performs suppression range setting processing in cooperation with the suppression range setting unit 106 (step S5). The state determination unit 105 determines whether or not in the low SNR state based on the first sound signal and the estimated stationary noise model, and notifies the determination result to the suppression range setting unit 106. The suppression range setting unit 106 sets either the first suppression phase difference range table or the second suppression phase difference range table to be used based on the notified determination result. The suppression range setting unit 106 reads the set first suppression phase difference range table or second suppression phase difference range table from the storage unit 110, and passes the table to the suppression coefficient determination unit 107.
Next, the suppression coefficient determination unit 107 performs suppression coefficient determination processing that determines the suppression coefficient β(f) to be applied to each frequency band f of the first sound signal (step S6). The suppression coefficient determination unit 107 determines the suppression coefficient β(f) in accordance with the phase difference of each frequency band f, calculated by the phase difference calculation unit 104, based on the set first suppression phase difference range table or second suppression phase difference range table that has been set by the suppression range setting unit 106. Further, the suppression coefficient determination unit 107 passes the determined suppression coefficient β(f) of each frequency band f to the suppression signal generation unit 108.
When the suppression signal generation unit 108 receives the suppression coefficient β(f) of each frequency band f, the suppression signal generation unit 108 generates a suppression signal produced by applying the suppression coefficient β(f) to a signal component of each frequency band f of the first sound signal received from the transformation unit 102 (step S7). The suppression signal generation unit 108 multiplies the amplitude of each frequency band f by the suppression coefficient β(f) to generate a suppression signal. Further, the suppression signal generation unit 108 passes the generated suppression signal to the inverse transformation unit 109.
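Step S7 can be sketched as a per-band multiplication. Multiplying a complex spectrum value by a real coefficient β(f) scales its amplitude and leaves its phase unchanged. The function name and the list representation are assumptions of this example.

```python
def apply_suppression(spectrum, coefficients):
    """Scale the amplitude of each frequency band by its suppression coefficient."""
    if len(spectrum) != len(coefficients):
        raise ValueError("one coefficient is needed per frequency band")
    return [beta * x for beta, x in zip(coefficients, spectrum)]
```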
The inverse transformation unit 109 transforms the received suppression signal from frequency domain to time domain (step S8). The inverse transformation unit 109 transforms the suppression signal, which is a signal in frequency domain, into an output sound signal, which is a signal in time domain, by Inverse Fast Fourier transformation (IFFT), for example. Further, the inverse transformation unit 109 outputs the transformed output sound signal to a predetermined output destination (for example, a speaker, a memory, a terminal of the other party on the phone, or the like) (step S9).
Also, the noise suppression device 1 checks whether or not there are unprocessed frames after outputting the output sound signal (step S10). If there is an unprocessed frame (step S10; Yes), the noise suppression device 1 repeats the processing of steps S2 to S9 on the input signal, frame by frame, until sound collection by the first microphone 2A and the second microphone 2B is terminated and there are no unprocessed frames. When there are no unprocessed frames (step S10; No), the noise suppression device 1 terminates the noise suppression processing.
In the suppression range setting processing that is performed by the state determination unit 105 in cooperation with the suppression range setting unit 106, as illustrated in
Next, the state determination unit 105 compares the calculated entire band SNR average value M1 and the threshold value TH1 and checks whether or not M1<TH1 (step S512).
If the sound included in the sound signal is only stationary noise, the entire band SNR average value becomes a value close to 1.0. Then the entire band SNR average value when a significant sound, such as human voices, or the like is included in the sound signal becomes higher than the entire band SNR average value when the sound signal includes only stationary noise. Further, as the ratio of the stationary noise included in the sound signal becomes lower, the entire band SNR average value becomes higher. The threshold value TH1 to be used for determining whether or not the sound signal is a low SNR is therefore set to a value of about 2.0, for example.
If the entire band SNR average value M1 is equal to or higher than the threshold value TH1 (step S512; No), the state determination unit 105 determines that the first sound signal is a high SNR (not a low SNR) and notifies the determination result to the suppression range setting unit 106. In this case, the suppression range setting unit 106 sets the range of the phase difference where the input signal is suppressed to the first phase difference range based on the notified determination result (step S513). In this regard, the first phase difference range is a phase difference range where the input signal is suppressed, which is defined by the first suppression phase difference range table.
On the other hand, if the entire band SNR average value M1 is lower than the threshold value TH1 (step S512; Yes), the state determination unit 105 determines that the first sound signal is a low SNR and notifies the determination result to the suppression range setting unit 106. In this case, the suppression range setting unit 106 sets the range of the phase difference where the input signal is suppressed to the second phase difference range based on the notified determination result (step S514). In this regard, the second phase difference range is a phase difference range where the input signal is suppressed, which is defined by the second suppression phase difference range table.
Also, when the suppression range setting unit 106 sets the phase difference range where the input signal is suppressed in step S513 or S514, the suppression range setting unit 106 reads the suppression phase difference range table corresponding to the set phase difference range from the storage unit 110 and passes the table to the suppression coefficient determination unit 107. Thereby, the suppression range setting processing for one frame is terminated (return).
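The suppression range setting processing of the first embodiment can be sketched as follows. The sketch assumes that the SNR of a band is the ratio of its amplitude to the amplitude of the stationary noise model, which is consistent with the remark that the average is close to 1.0 when only stationary noise is present, but is not stated explicitly. The function names, the small floor that guards against division by zero, and the representation of the tables as opaque objects are likewise assumptions of this example.

```python
def entire_band_snr_average(amplitudes, noise_model, floor=1e-12):
    """Average, over all bands, of amplitude divided by noise-model amplitude."""
    ratios = [amp / max(noise, floor) for amp, noise in zip(amplitudes, noise_model)]
    return sum(ratios) / len(ratios)


def select_range_table(m1, first_table, second_table, th1=2.0):
    """Pick the second (low SNR) table when M1 < TH1, otherwise the first table."""
    return second_table if m1 < th1 else first_table
```

With TH1 set to 2.0, an average of exactly 2.0 selects the first table, because the low SNR branch is taken only when M1 is strictly lower than TH1.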
In the suppression coefficient determination processing performed by the suppression coefficient determination unit 107, as illustrated in
If the phase difference dP(f) is within the range where the input signal is suppressed (step S612; Yes), the suppression coefficient determination unit 107 calculates a suppression coefficient β(f) corresponding to the phase difference dP(f) (step S613). The suppression coefficient β(f) corresponding to the phase difference dP(f) is calculated by a known method. For example, the suppression coefficient β(f) within the phase difference range where the input signal is suppressed is set to a fixed value less than 1 (for example, 0.5, or the like) regardless of the phase difference. Alternatively, for example, the suppression coefficient β(f) within the phase difference range where the input signal is suppressed may be set so as to be inversely proportional to the absolute value of the phase difference dP(f).
On the other hand, if the phase difference dP(f) is not within the range where the input signal is suppressed (step S612; No), the suppression coefficient determination unit 107 sets the suppression coefficient β(f) to “1” regardless of the phase difference dP(f) (step S614).
After that, the suppression coefficient determination unit 107 checks whether or not the determination processing of the suppression coefficient β(f) has been performed for all the frequency bands f (step S615). If there is an unprocessed frequency band f (step S615; No), the suppression coefficient determination unit 107 repeats the processing from steps S611 to S614 for all the unprocessed frequency bands f. Then if the processing has been performed for all the frequency bands f (step S615; Yes), the suppression coefficient determination unit 107 passes the determined suppression coefficient β(f) of each frequency band f to the suppression signal generation unit 108, and terminates the suppression coefficient determination processing for one frame (return).
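The per-band loop of the suppression coefficient determination processing can be sketched as below, using the fixed-value variant (β(f) = 0.5 inside the suppression range, 1 outside). The table layout, in which each frequency band holds a list of (low, high) phase difference intervals in radians where the input signal is suppressed, is a hypothetical representation chosen for this example, as are the function and parameter names.

```python
def suppression_coefficients(phase_diffs, range_table, suppressed_beta=0.5):
    """Return beta(f) for every band: suppressed_beta inside the range, 1.0 outside."""
    betas = []
    for band, dp in enumerate(phase_diffs):
        # range_table[band] is a list of (low, high) intervals (assumed layout).
        in_range = any(low <= dp <= high for low, high in range_table[band])
        betas.append(suppressed_beta if in_range else 1.0)
    return betas
```

For instance, with a table whose non-suppressed range is wider in the higher band, the same phase difference may be suppressed in a low band and passed unchanged in a high band.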
In this manner, in the noise suppression processing according to the present embodiment, the phase difference range where the input signal is suppressed is changed in accordance with the SNR of the input sound signal. Specifically, when the SNR is low, the phase difference range where the input signal is not suppressed is made wider than when the SNR is high, and the phase difference range where the input signal is suppressed is narrowed. Widening the phase difference range where the input sound signal is not suppressed reduces the amount of suppression of the significant sound in a low SNR section. Accordingly, in the output sound signal suppressed by the noise suppression device 1 according to the present embodiment, it becomes easy to catch sound in a low SNR section.
In this regard, in (a) in
If the noise suppression processing based on the phase difference is performed on the input signal of the waveform as illustrated in (b) in
Further, when a section ΔT3 in a low SNR section in (b) in
As described above, in the noise suppression processing according to the present embodiment, when much noise is included and the SNR is low, the phase difference range where the input signal is not suppressed is widened, and the phase difference range where the input signal is suppressed is narrowed so that the amount of suppression of the speech sound is reduced. Accordingly, with the present embodiment, the amount of suppression of the speech sound when the SNR is low is reduced, and thus the voice in the output sound becomes easy to catch.
In this regard, the phase difference ranges SA21 and SA22 where the input signal is suppressed at the time of low SNR may be calculated using a predetermined function in place of storing the ranges in the storage unit 110 as the second suppression phase difference range table as described above. Also, the phase difference ranges SA21 and SA22 where the input signal is suppressed at the time of low SNR, illustrated in
Also, the phase difference ranges SA11 and SA12 where the input signal is suppressed at the time of high SNR, illustrated in
[Second Embodiment]
In a second embodiment, the phase difference range where the input signal is suppressed is set in accordance with whether or not the sound signal to be suppressed is a low SNR and in a voiced state (hereinafter also referred to as a “low SNR voiced state”).
The functional configuration of a noise suppression device according to the present embodiment is the same as that of the noise suppression device 1 according to the first embodiment excluding the state determination unit 105 and the suppression range setting unit 106. The state determination unit 105 of the noise suppression device 1 according to the present embodiment includes, as illustrated in
The entire band SNR average value calculation unit 105A calculates the entire band SNR average value M1 described in the first embodiment.
The low frequency SNR average value calculation unit 105B calculates the average value (low frequency SNR average value) M2 of the SNR of only the frequency bands having a larger amplitude than that of the stationary noise model among the frequency bands lower than a predetermined frequency.
The low SNR voiced state determination unit 105C determines whether or not the sound signal to be suppressed is in a low SNR voiced state based on the entire band SNR average value M1 and the low frequency SNR average value M2. If the entire band SNR average value M1 is lower than the first threshold value TH1, and the low frequency SNR average value M2 is higher than the second threshold value TH2, the low SNR voiced state determination unit 105C determines that the sound signal to be suppressed is in a low SNR voiced state. The low SNR voiced state determination unit 105C passes the determination result to the suppression range setting unit 106.
The suppression range setting unit 106 sets the phase difference range where the input signal is suppressed for each frequency band in accordance with the determination result (whether or not in the low SNR voiced state). In the present embodiment, two suppression phase difference range tables having different phase difference ranges where the input signal is suppressed are provided in advance in the same manner as the first embodiment, and a determination is made of which suppression phase difference range table is used based on whether or not in a low SNR voiced state.
Whether or not in the low SNR voiced state is determined based on the entire band SNR average value M1 and the low frequency SNR average value M2 as described above. The entire band SNR average value M1 is used for determining whether or not a low SNR, and the low frequency SNR average value M2 is used for determining whether or not in a voiced state. The low frequency SNR average value M2 is produced by calculating the average value of the SNR only by the frequency band having a larger amplitude than that of the stationary noise model among the frequency bands of less than or equal to 500 Hz, for example. Accordingly, the low frequency SNR average value M2 becomes higher than the entire band SNR average value M1. For example, the relationship between the entire band SNR average value M1 in a section in the low SNR voiced state and the low frequency SNR average value M2 becomes a relationship as illustrated in
In the noise suppression device 1 according to the present embodiment, in the same manner as the first embodiment, when sound collection by the first microphone 2A and the second microphone 2B is started, the noise suppression processing as illustrated in
In the suppression range setting processing in the noise suppression processing according to the present embodiment, as illustrated in
Also, the state determination unit 105 calculates a low frequency SNR average value M2 (step S522). Step S522 is performed by the low frequency SNR average value calculation unit 105B. The low frequency SNR average value calculation unit 105B calculates the low frequency SNR average value M2 of only the low frequency band (for example, less than or equal to 500 Hz) and the frequency band where an amplitude of sound signal is larger than an amplitude of the stationary noise model, and passes the calculated low frequency SNR average value M2 to the low SNR voiced state determination unit 105C.
When the low SNR voiced state determination unit 105C receives the entire band SNR average value M1 and the low frequency SNR average value M2, the low SNR voiced state determination unit 105C checks whether M1<TH1 and M2>TH2 (step S523). The first threshold value TH1 to be compared with the entire band SNR average value M1 is set to a value of about 2.0, for example, as described above. Also, the low frequency SNR average value M2 becomes a value higher than the entire band SNR average value M1, and the second threshold value TH2 to be compared with the low frequency SNR average value M2 is set to a value of about 3.0, for example.
If M1≧TH1, the sound signal is not a low SNR. Also, if M2≦TH2, the sound signal is not a voiced state. Thus, if either or both of M1≧TH1 and M2≦TH2 are satisfied (step S523; No), the low SNR voiced state determination unit 105C determines that the sound signal is not in a low SNR voiced state, and notifies the determination result to the suppression range setting unit 106. In this case, the suppression range setting unit 106 sets the phase difference range where the input signal is suppressed to the first phase difference range based on the notified determination result (step S524).
On the other hand, if M1<TH1 and M2>TH2 (step S523; Yes), the low SNR voiced state determination unit 105C determines that the sound signal is in a low SNR voiced state, and notifies the determination result to the suppression range setting unit 106. In this case, the suppression range setting unit 106 sets the phase difference range where the input signal is suppressed to the second phase difference range based on the notified determination result (step S525).
Also, after the suppression range setting unit 106 sets the phase difference range where the input signal is suppressed in step S524 or S525, the suppression range setting unit 106 reads a suppression phase difference range table corresponding to the set phase difference range from the storage unit 110 and passes the table to the suppression coefficient determination unit 107. Thereby, the suppression range setting processing for one frame is terminated (return).
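The determination of the low SNR voiced state can be sketched as follows. As in the first embodiment, the per-band SNR is assumed here to be the ratio of the amplitude to the stationary noise model amplitude. The function names, the mapping from band index to frequency through a fixed band width, and the value 0.0 returned when no band qualifies are assumptions of this example.

```python
def low_frequency_snr_average(amplitudes, noise_model,
                              band_width_hz=31.25, cutoff_hz=500.0):
    """Average SNR over bands at or below cutoff_hz whose amplitude exceeds the noise model."""
    ratios = []
    for band, (amp, noise) in enumerate(zip(amplitudes, noise_model)):
        if band * band_width_hz > cutoff_hz:
            break  # remaining bands lie above the low frequency region
        if amp > noise > 0:
            ratios.append(amp / noise)
    # Assumption: report 0.0 when no low frequency band exceeds the noise model.
    return sum(ratios) / len(ratios) if ratios else 0.0


def is_low_snr_voiced(m1, m2, th1=2.0, th2=3.0):
    """True only when M1 < TH1 (low SNR) and M2 > TH2 (voiced)."""
    return m1 < th1 and m2 > th2
```

Because only bands louder than the noise model enter the average, every ratio that contributes to M2 is greater than 1.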
In this manner, in the second embodiment, only when the sound signal to be suppressed is a low SNR and is in a voiced state, the phase difference range where the input signal is not suppressed (the range of setting the suppression coefficient β to 1) is widened, and the phase difference range where the input signal is suppressed is narrowed. That is to say, even when the sound signal to be suppressed is a low SNR, if the sound signal is in a voiceless state, the suppression coefficient determination unit 107 determines the suppression coefficient β based on the same first suppression phase difference range table as that of when the sound signal is a high SNR. Accordingly, it is possible to increase the amount of suppression of noise when the sound signal is a low SNR and is in a voiceless state, and thus uncomfortable feeling, or the like due to large noise is reduced.
On the other hand, if the sound signal to be suppressed is a low SNR and is in a voiced state, the suppression coefficient determination unit 107 determines the suppression coefficient β based on the second suppression phase difference range table in which the phase difference range where the input signal is not suppressed is widened. Accordingly, when the sound signal is a low SNR and in a voiced state, the amount of suppression of the speech sound is reduced, and the speech sound in a low SNR section becomes easy to catch.
[Third Embodiment]
In a third embodiment, a suppression coefficient β is calculated based on the phase difference between the first sound signal and the second sound signal, a suppression coefficient α for the stationary noise is calculated, and a suppression coefficient γ to be applied to a component of the frequency band f is determined based on the suppression coefficients β and α.
The functional configuration of the noise suppression device according to the present embodiment is the same as that of the noise suppression device 1 according to the second embodiment with the exception of the suppression range setting unit 106 and the suppression coefficient determination unit 107. That is to say, the state determination unit 105 in the noise suppression device 1 illustrated in
The suppression range setting unit 106 includes a suppression phase difference range setting unit 106A and a suppression SNR range setting unit 106B.
The suppression phase difference range setting unit 106A sets the phase difference range where the input signal is suppressed when suppression by the phase difference is performed based on the determination result of the state determination unit 105. If the determination result is that it is not in a low SNR voiced state, the suppression phase difference range setting unit 106A sets the phase difference range of the first suppression phase difference range table in the phase difference range where the input signal is suppressed. If the determination result is that it is in a low SNR voiced state, the suppression phase difference range setting unit 106A sets the phase difference range of the second suppression phase difference range table in the phase difference range where the input signal is suppressed.
The suppression SNR range setting unit 106B sets the SNR range when stationary noise is suppressed based on the determination result of the state determination unit 105. If the determination result is that it is not in a low SNR voiced state, the suppression SNR range setting unit 106B sets the SNR range of the first suppression SNR range table in the SNR range where the input signal is suppressed. If the determination result is that it is in a low SNR voiced state, the suppression SNR range setting unit 106B sets the SNR range of the second suppression SNR range table in the SNR range where the input signal is suppressed. In this regard, the first and the second suppression SNR range tables are tables representing corresponding relationships between SNRs and suppression coefficients α, respectively. Compared with the first suppression SNR range table, in the second suppression SNR range table, the SNR range where the input signal is not suppressed (the SNR range that causes the suppression coefficient α to be “1”) is widened so as to narrow the SNR range where the input signal is suppressed. The first and second suppression SNR range tables are stored in the storage unit 110.
The suppression coefficient determination unit 107 includes a first suppression coefficient calculation unit 107A, a second suppression coefficient calculation unit 107B, and a suppression coefficient decision unit 107C.
The first suppression coefficient calculation unit 107A calculates a suppression coefficient β(f) in accordance with the phase difference dP(f) for each frequency band f based on the first or the second suppression phase difference range table set by the suppression phase difference range setting unit 106A.
The second suppression coefficient calculation unit 107B calculates a suppression coefficient α(f) in accordance with the SNR(f) for each frequency band f based on the first or the second suppression SNR range tables set by the suppression SNR range setting unit 106B.
The suppression coefficient decision unit 107C decides a suppression coefficient γ(f) to be applied to the signal component (amplitude) of the frequency band f based on the suppression coefficient β(f) calculated by the first suppression coefficient calculation unit 107A and the suppression coefficient α(f) calculated by the second suppression coefficient calculation unit 107B. The suppression coefficient γ(f) to be applied is set to, for example, the product of the suppression coefficients α(f) and β(f). Alternatively, the suppression coefficient γ(f) may be set to, for example, the smaller of the suppression coefficients α(f) and β(f).
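The two example rules for deciding γ(f) can be sketched as follows. This is only an illustration; the function name `decide_gamma` and the `mode` parameter are assumptions introduced for the example.

```python
def decide_gamma(alpha: float, beta: float, mode: str = "product") -> float:
    """Combine the SNR-based coefficient alpha and the phase-based coefficient beta.

    mode="product": gamma = alpha * beta
    mode="min":     gamma = the smaller of alpha and beta
    """
    if mode == "product":
        return alpha * beta
    if mode == "min":
        return min(alpha, beta)
    raise ValueError(f"unknown mode: {mode!r}")
```

For coefficients between 0 and 1, the product never exceeds the smaller coefficient, so the product rule suppresses at least as strongly as the minimum rule.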
For the suppression coefficient α when stationary noise is suppressed, for example, as a broken line illustrated by a solid line in
If the suppression coefficient α is determined based on the broken line illustrated by the solid line in
For the solid broken line (function) to be used for determining the suppression coefficient α at the time of high SNR, illustrated in
In the same manner as the first embodiment, when the noise suppression device 1 according to the present embodiment starts sound collection by the first microphone 2A and the second microphone 2B, the noise suppression device 1 performs the noise suppression processing as illustrated in
In the suppression range setting processing in the noise suppression processing according to the present embodiment, as illustrated in
Also, the state determination unit 105 calculates the low frequency SNR average value M2 (step S532). Step S532 is performed by the low frequency SNR average value calculation unit 105B. The low frequency SNR average value calculation unit 105B calculates the average value (low frequency SNR average value M2) of the SNRs of only those frequency bands that are in the low frequency range (for example, lower than or equal to 500 Hz) and in which the amplitude of the sound signal is larger than that of the stationary noise model, and passes the calculated low frequency SNR average value M2 to the low SNR voiced state determination unit 105C.
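A minimal sketch of the calculation of the low frequency SNR average value M2 is given below. It assumes that the SNR of a frequency band is the ratio of the amplitude of the sound signal to the amplitude of the stationary noise model in that band, and that M2 is 0 when no band qualifies; neither assumption is stated in this passage. The function and parameter names are hypothetical.

```python
from typing import Sequence


def low_freq_snr_average(
    freqs_hz: Sequence[float],
    amplitudes: Sequence[float],
    noise_model: Sequence[float],
    cutoff_hz: float = 500.0,
) -> float:
    """Average the SNRs of bands at or below cutoff_hz whose amplitude
    exceeds the stationary noise model (SNR assumed to be an amplitude ratio)."""
    snrs = [
        amp / noise
        for freq, amp, noise in zip(freqs_hz, amplitudes, noise_model)
        if freq <= cutoff_hz and noise > 0.0 and amp > noise
    ]
    # Returning 0.0 for an empty selection is an assumption of this sketch.
    return sum(snrs) / len(snrs) if snrs else 0.0
```

For example, with bands at 100 Hz, 300 Hz, 500 Hz, and 1000 Hz, only the bands at or below 500 Hz whose amplitude exceeds the noise model contribute to the average.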
When the low SNR voiced state determination unit 105C receives the entire band SNR average value M1 and the low frequency SNR average value M2, the low SNR voiced state determination unit 105C checks whether or not M1<TH1 and M2>TH2 (step S533). The first threshold value TH1 and the second threshold value TH2 are a value of about 2.0 and a value of about 3.0, respectively, as described above.
If either or both of M1≧TH1 and M2≦TH2 are satisfied (step S533; No), the low SNR voiced state determination unit 105C determines that the sound signal is not in a low SNR voiced state. In this case, the state determination unit 105 (low SNR voiced state determination unit 105C) notifies the suppression phase difference range setting unit 106A and the suppression SNR range setting unit 106B of the suppression range setting unit 106 that the sound signal is not in the low SNR voiced state. The suppression range setting unit 106 that has received the notification sets the phase difference range and the SNR range where the input signal is suppressed to a first range (step S534). In this regard, the first range consists of the phase difference range, where the input signal is suppressed, defined by the first suppression phase difference range table, and the SNR range, where the input signal is suppressed, defined by the first suppression SNR range table. That is to say, in step S534, the suppression phase difference range setting unit 106A determines the phase difference range where the input signal is suppressed to be the phase difference range in the first suppression phase difference range table, and the suppression SNR range setting unit 106B determines the SNR range where the input signal is suppressed to be the SNR range in the first suppression SNR range table.
On the other hand, if M1<TH1 and M2>TH2 (step S533; Yes), the low SNR voiced state determination unit 105C determines that the sound signal is in a low SNR voiced state. In this case, the state determination unit 105 (the low SNR voiced state determination unit 105C) notifies the suppression phase difference range setting unit 106A and the suppression SNR range setting unit 106B of the suppression range setting unit 106 that the sound signal is in the low SNR voiced state. The suppression range setting unit 106 that has received the notification sets the phase difference range and the SNR range where the input signal is suppressed to a second range (step S535). In this regard, the second range consists of the phase difference range, where the input signal is suppressed, defined by the second suppression phase difference range table, and the SNR range, where the input signal is suppressed, defined by the second suppression SNR range table. That is to say, in step S535, the suppression phase difference range setting unit 106A determines the phase difference range where the input signal is suppressed to be the phase difference range in the second suppression phase difference range table, and the suppression SNR range setting unit 106B determines the SNR range where the input signal is suppressed to be the SNR range in the second suppression SNR range table.
Also, when the suppression phase difference range setting unit 106A sets the phase difference range where the input signal is suppressed in step S534 or S535, the suppression phase difference range setting unit 106A reads the suppression phase difference range table corresponding to the set phase difference range from the storage unit 110 and passes the table to the first suppression coefficient calculation unit 107A. In the same manner, when the suppression SNR range setting unit 106B sets the SNR range where the input signal is suppressed in step S534 or S535, the suppression SNR range setting unit 106B reads the suppression SNR range table corresponding to the set SNR range from the storage unit 110 and passes the table to the second suppression coefficient calculation unit 107B. Thereby, the suppression range setting processing for one frame is terminated (return).
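The determination of step S533 and the subsequent selection of the tables (steps S534 and S535) can be sketched as follows. The threshold values follow the approximate values of 2.0 and 3.0 given above; the function names and the layout of the `storage` dictionary, which stands in for the storage unit 110, are assumptions made for illustration.

```python
from typing import Any, Dict, Tuple

TH1 = 2.0  # first threshold, about 2.0 in the description
TH2 = 3.0  # second threshold, about 3.0 in the description


def is_low_snr_voiced(m1: float, m2: float, th1: float = TH1, th2: float = TH2) -> bool:
    """Step S533: low SNR voiced state when M1 < TH1 and M2 > TH2."""
    return m1 < th1 and m2 > th2


def select_tables(m1: float, m2: float, storage: Dict[str, Dict[str, Any]]) -> Tuple[Any, Any]:
    """Steps S534/S535: pick the first or second table of each kind.

    `storage` is a hypothetical stand-in for the storage unit, laid out as
    {"phase": {"first": ..., "second": ...}, "snr": {"first": ..., "second": ...}}.
    """
    key = "second" if is_low_snr_voiced(m1, m2) else "first"
    return storage["phase"][key], storage["snr"][key]
```

Because both conditions must hold for the second range to be selected, a frame with M1 equal to TH1 or M2 equal to TH2 falls back to the first range.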
In the suppression coefficient determination processing in the noise suppression processing according to the present embodiment, as illustrated in
Next, the first suppression coefficient calculation unit 107A performs processing for calculating the suppression coefficient β based on the phase difference (step S632), and the second suppression coefficient calculation unit 107B performs processing for calculating the suppression coefficient α based on the SNR (step S633). The first suppression coefficient calculation unit 107A performs, for example, the processing of steps S611 to S615, illustrated in
When the suppression coefficient decision unit 107C receives the suppression coefficients β(f) and α(f), the suppression coefficient decision unit 107C determines a suppression coefficient γ(f) to be applied to the component of the frequency band f based on the received suppression coefficients β(f) and α(f) (step S634). In step S634, the suppression coefficient decision unit 107C determines that, for example, γ(f)=α(f)×β(f) is the suppression coefficient applied to the signal component of the frequency band f.
After that, the suppression coefficient determination unit 107 checks whether or not the processing for determining a suppression coefficient γ(f) has been completed for all the frequency bands f (step S635). If there is an unprocessed frequency band f (step S635; No), the suppression coefficient determination unit 107 repeats the processing of steps S631 to S634 on the unprocessed frequency band f. Then, if the processing for all the frequency bands f has been completed (step S635; Yes), the suppression coefficient determination unit 107 passes the decided suppression coefficient γ(f) of each frequency band f to the suppression signal generation unit 108 and terminates the suppression coefficient determination processing (return).
When the suppression signal generation unit 108 receives the suppression coefficient γ(f), the suppression signal generation unit 108 applies the suppression coefficient γ(f) to the signal component of each frequency band f in the first sound signal to generate a suppression signal.
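As a rough illustration of this step, the following sketch multiplies the component of each frequency band by its suppression coefficient. It assumes that the signal components are complex spectral values, so that scaling by a real coefficient reduces the amplitude and leaves the phase unchanged. The function name `apply_suppression` is hypothetical.

```python
from typing import List, Sequence


def apply_suppression(spectrum: Sequence[complex], gammas: Sequence[float]) -> List[complex]:
    """Scale each frequency-band component by its suppression coefficient gamma(f)."""
    if len(spectrum) != len(gammas):
        raise ValueError("spectrum and gammas must have the same length")
    return [gamma * component for component, gamma in zip(spectrum, gammas)]
```

A coefficient of 1 leaves a band untouched, and smaller coefficients attenuate it proportionally.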
In this manner, in the third embodiment, a suppression coefficient γ(f) to be applied to the component of the frequency band f is decided (determined) based on the suppression coefficient β(f) based on the phase difference and the suppression coefficient α(f) based on the stationary noise. Also, if in a low SNR voiced state, the suppression range setting unit 106 widens the phase difference range where the input signal is not suppressed to calculate a suppression coefficient β(f), and widens the SNR range where the input signal is not suppressed to calculate a suppression coefficient α(f). Accordingly, under an environment having stationary noise, the amount of suppression of the speech sound at the time of a low SNR and in a voiced state is reduced, and thus it becomes easy to catch the speech sound in a low SNR section.
In this regard, the second suppression SNR range table to be used for calculating a suppression coefficient α(f) is not limited to a graph produced by parallel translation of the graph corresponding to the first suppression SNR range table, and may be created based on a graph that narrows the SNR range in which the suppression coefficient α(f) takes the minimum value.
A broken line illustrated by a solid line in
If the second suppression SNR range table is made to correspond to the dotted broken line (function) illustrated in
Further, the second suppression SNR range table according to the present embodiment is not limited to the dotted broken line (function) illustrated in
[Fourth Embodiment]
In a fourth embodiment, a suppression coefficient β is calculated based on the phase difference between the first sound signal and the second sound signal, a suppression coefficient α for the stationary noise is calculated, and a suppression coefficient γ to be applied to a component of the frequency band f is determined based on the suppression coefficients β and α. Also, in the fourth embodiment, when the suppression coefficient α for the stationary noise is calculated, an SNR range in which suppression by the phase difference is studied is set.
The functional configuration of the noise suppression device according to the present embodiment is the same as that of the noise suppression device 1 according to the second embodiment with the exception of the suppression range setting unit 106 and the suppression coefficient determination unit 107. That is to say, the state determination unit 105 in the noise suppression device 1 illustrated in
The suppression range setting unit 106 includes a suppression phase difference range setting unit 106A, a suppression SNR range setting unit 106B, and a study range setting unit 106C.
The suppression phase difference range setting unit 106A sets, based on the determination result of the state determination unit 105, the phase difference range where the input signal is suppressed when suppression by the phase difference is performed. If the determination result is that the sound signal is not in a low SNR voiced state, the suppression phase difference range setting unit 106A sets the phase difference range of the first suppression phase difference range table as the phase difference range where the input signal is suppressed. If the determination result is that the sound signal is in a low SNR voiced state, the suppression phase difference range setting unit 106A sets the phase difference range of the second suppression phase difference range table as the phase difference range where the input signal is suppressed.
The suppression SNR range setting unit 106B sets, based on the determination result of the state determination unit 105, the SNR range used when stationary noise is suppressed. If the determination result is that the sound signal is not in a low SNR voiced state, the suppression SNR range setting unit 106B sets the SNR range of the first suppression SNR range table as the SNR range where the input signal is suppressed. If the determination result is that the sound signal is in a low SNR voiced state, the suppression SNR range setting unit 106B sets the SNR range of the second suppression SNR range table as the SNR range where the input signal is suppressed.
The study range setting unit 106C sets a range for studying suppression by the phase difference in the suppression SNR range set by the suppression SNR range setting unit 106B.
The suppression coefficient determination unit 107 determines a suppression coefficient to be applied to the component of each frequency band f based on the suppression SNR range set by the suppression range setting unit 106, the range for studying suppression by the phase difference, and the suppression phase difference range.
In the present embodiment, two kinds of suppression SNR ranges for stationary noise are provided, for example, a first suppression SNR range corresponding to a broken line illustrated by a solid line in
Also, in the present embodiment, the ranges for studying suppression by the phase difference are set to the first suppression SNR range and the second suppression SNR range, respectively. For example, in the example illustrated in
In the same manner as the first embodiment, when the noise suppression device 1 according to the present embodiment starts sound collection by the first microphone 2A and the second microphone 2B, the noise suppression device 1 performs the noise suppression processing as illustrated in
In the suppression range setting processing in the noise suppression processing according to the present embodiment, as illustrated in
Also, the state determination unit 105 calculates a low frequency SNR average value M2 (step S542). Step S542 is performed by the low frequency SNR average value calculation unit 105B. The low frequency SNR average value calculation unit 105B calculates the average value (the low frequency SNR average value M2) of the SNRs of only those frequency bands that are in the low frequency range (for example, lower than or equal to 500 Hz) and in which the amplitude of the sound signal is larger than that of the stationary noise model, and passes the calculated low frequency SNR average value M2 to the low SNR voiced state determination unit 105C.
When the low SNR voiced state determination unit 105C receives the entire band SNR average value M1 and the low frequency SNR average value M2, the low SNR voiced state determination unit 105C checks whether M1<TH1 and M2>TH2 (step S543). The first threshold value TH1 and the second threshold value TH2 are assumed to be values of about 2.0 and about 3.0, respectively, as described above.
If either or both of M1≧TH1 and M2≦TH2 are satisfied (step S543; No), the low SNR voiced state determination unit 105C determines that the sound signal is not in a low SNR voiced state. In this case, the state determination unit 105 (low SNR voiced state determination unit 105C) notifies the suppression phase difference range setting unit 106A and the suppression SNR range setting unit 106B in the suppression range setting unit 106 that the sound signal is not in a low SNR voiced state. The notified suppression range setting unit 106 sets the phase difference range and the SNR range where the input signal is suppressed to the first range (step S544). In step S544, the suppression phase difference range setting unit 106A sets the phase difference range where the input signal is suppressed to the phase difference range in the first suppression phase difference range table, and the suppression SNR range setting unit 106B sets the SNR range where the input signal is suppressed to the SNR range in the first suppression SNR range table.
On the other hand, if M1<TH1 and M2>TH2 (step S543; Yes), the low SNR voiced state determination unit 105C determines that the sound signal is in a low SNR voiced state. In this case, the state determination unit 105 (low SNR voiced state determination unit 105C) notifies the suppression phase difference range setting unit 106A and the suppression SNR range setting unit 106B in the suppression range setting unit 106 that the sound signal is in the low SNR voiced state. The notified suppression range setting unit 106 sets the phase difference range and the SNR range where the input signal is suppressed to the second range (step S545). In step S545, the suppression phase difference range setting unit 106A sets the phase difference range where the input signal is suppressed to the phase difference range in the second suppression phase difference range table, and the suppression SNR range setting unit 106B sets the SNR range where the input signal is suppressed to the SNR range in the second suppression SNR range table.
Also, after the suppression phase difference range setting unit 106A sets the phase difference range where the input signal is suppressed in step S544 or S545, the suppression phase difference range setting unit 106A reads the suppression phase difference range table corresponding to the set phase difference range from the storage unit 110 and passes the table to the suppression coefficient determination unit 107. In the same manner, after the suppression SNR range setting unit 106B sets the SNR range where the input signal is suppressed in step S544 or S545, the suppression SNR range setting unit 106B reads the suppression SNR range table corresponding to the set SNR range from the storage unit 110 and passes the table to the suppression coefficient determination unit 107. Further, after the suppression SNR range setting unit 106B determines the SNR range where the input signal is suppressed in step S544 or S545, the suppression SNR range setting unit 106B notifies the study range setting unit 106C of the determined SNR range. When the study range setting unit 106C receives the notification of the SNR range where the input signal is suppressed, the study range setting unit 106C sets the SNR range to be studied for suppression by the phase difference based on the notified SNR range (step S546). The study range setting unit 106C notifies the suppression coefficient determination unit 107 of the set SNR range to be studied for suppression by the phase difference. Thereby, the suppression range setting processing for one frame is terminated (return).
In the suppression coefficient determination processing in the noise suppression processing according to the present embodiment, as illustrated in
Next, the suppression coefficient determination unit 107 calculates a suppression coefficient α(f) corresponding to the SNR(f) (step S642).
Also, the suppression coefficient determination unit 107 checks whether or not the SNR(f) is within the range to be studied for suppression by the phase difference in parallel with the processing of step S642 (step S643). If the SNR(f) is not within the range to be studied for suppression by the phase difference (step S643; No), the suppression coefficient determination unit 107 sets the suppression coefficient β(f) based on the phase difference to 1 (step S644).
On the other hand, if the SNR(f) is within the range to be studied for suppression by the phase difference (step S643; Yes), the suppression coefficient determination unit 107 next compares the phase difference dP(f) of the frequency band f with the phase difference range where the input signal is suppressed (step S645).
Next, the suppression coefficient determination unit 107 checks whether or not the phase difference dP(f) is within the range in which suppression by the phase difference is to be performed (step S646). The suppression coefficient determination unit 107 refers to the first suppression phase difference range or the second suppression phase difference range set by the suppression phase difference range setting unit 106A, and determines whether or not the phase difference dP(f) is within the range where the input signal is suppressed.
If the phase difference dP(f) is within the range where the input signal is suppressed (step S646; Yes), the suppression coefficient determination unit 107 calculates a suppression coefficient β(f) in accordance with the phase difference dP(f) (step S647). On the other hand, if the phase difference dP(f) is out of the range where the input signal is suppressed (step S646; No), the suppression coefficient determination unit 107 sets the suppression coefficient β(f) to 1 (step S644).
After that, the suppression coefficient determination unit 107 determines a suppression coefficient γ(f) to be applied to the component of the frequency band f based on the suppression coefficient α(f) calculated in step S642 and the suppression coefficient β(f) calculated in step S644 or S647 (step S648). In step S648, the suppression coefficient determination unit 107 determines, for example, γ(f)=α(f)×β(f) to be a suppression coefficient to be applied to the signal component of the frequency band f.
When the suppression coefficient γ(f) to be applied to the component of the frequency band f is determined in step S648, the suppression coefficient determination unit 107 next checks whether or not the processing has been performed for all the frequency bands f (step S649). If there is a frequency band for which processing has not been performed (step S649; No), the suppression coefficient determination unit 107 performs the processing of step S641 and after that for the frequency band f that has not been processed. If the processing for all the frequency bands is performed (step S649; Yes), the suppression coefficient determination unit 107 passes the suppression coefficient γ(f) to be applied for each frequency band f to the suppression signal generation unit 108, and the suppression coefficient determination processing for one frame is terminated (return).
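The flow of steps S642 to S649 can be sketched as the following loop over the frequency bands. This is a minimal sketch under stated assumptions: the range to be studied for suppression by the phase difference is taken to be every SNR above a single lower bound `study_snr_min`, and the table lookups are passed in as callables. All function and parameter names are hypothetical.

```python
from typing import Callable, List, Sequence


def determine_gammas(
    snrs: Sequence[float],
    phase_diffs: Sequence[float],
    alpha_of_snr: Callable[[float], float],
    beta_of_phase: Callable[[float], float],
    in_suppressed_phase_range: Callable[[float], bool],
    study_snr_min: float,
) -> List[float]:
    """Return gamma(f) for every frequency band f."""
    gammas = []
    for snr, dp in zip(snrs, phase_diffs):
        alpha = alpha_of_snr(snr)                    # step S642
        in_study_range = snr > study_snr_min         # step S643 (assumed form)
        if in_study_range and in_suppressed_phase_range(dp):  # steps S645, S646
            beta = beta_of_phase(dp)                 # step S647
        else:
            beta = 1.0                               # step S644
        gammas.append(alpha * beta)                  # step S648
    return gammas                                    # all bands done (step S649)
```

In this sketch a band whose SNR lies outside the study range is suppressed only by α(f), regardless of its phase difference.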
In the present embodiment, when a suppression SNR range for the stationary noise is set, an SNR range higher than a predetermined SNR is set to the SNR range to be studied for suppression by the phase difference. That is to say, if the SNR is high and the stationary noise is low, suppression by the phase difference is studied in addition to suppression by the SNR. Thus if the stationary noise is small, but non-stationary noise is included in the sound signal, non-stationary noise is suppressed by suppression by the phase difference.
In this regard, in the present embodiment, when the suppression coefficient γ(f) to be applied to the component of the frequency band f is determined from the suppression coefficients α(f) and β(f), the smaller of α(f) and β(f) may be used, for example, as the suppression coefficient γ(f) in place of γ(f)=α(f)×β(f).
It is possible to achieve the noise suppression device 1 according to the first to the fourth embodiments described above by a computer and a program for causing the computer to execute the above-described noise suppression processing. In the following, a description will be given of the noise suppression device 1 that is achieved by the computer and the program with reference to
As illustrated in
The processor 501 is an arithmetic processing unit, such as a central processing unit (CPU), a micro processing unit (MPU), or the like. The processor 501 executes various programs including an operating system so as to control the overall operation of the computer 5.
The main storage device 502 includes a read only memory (ROM) and a random access memory (RAM). In the ROM, for example, a predetermined basic control program or the like that is read by the processor 501 at the time of starting the computer 5 is recorded. Also, the RAM is used as a working storage area as needed when the processor 501 executes the various programs. In the noise suppression device 1, the RAM of the main storage device 502 may be used for temporarily storing, for example, the suppression phase difference range table, the suppression SNR range table, the suppression signal, and the like.
The auxiliary storage device 503 is a storage device having a large capacity compared with the main storage device 502, such as a hard disk drive (HDD), a solid state drive (SSD), or the like. The auxiliary storage device 503 stores the various programs that are executed by the processor 501, various kinds of data, or the like. The programs that are stored in the auxiliary storage device 503 include, for example, a program of the sound input and output processing including the above-described noise suppression processing, and the like.
The input device 504 is, for example, a keyboard device or a mouse device, and when the input device 504 is operated by an operator of the computer 5, the input device 504 transmits input information associated with the operation contents to the processor 501.
The display device 505 is, for example, a display device, such as a liquid crystal display, or the like. The liquid crystal display displays various texts, images, or the like in accordance with the display data transmitted from the processor 501, or the like.
The input and output I/F device 506 is an interface device for coupling various external devices, such as a microphone array 2, a speaker 3, or the like to the computer 5 in order to enable the devices.
The storage medium drive device 507 reads a program and data that are stored in a portable storage medium not illustrated in the figure, and writes the data, or the like stored in the auxiliary storage device 503 into the portable storage medium. As a portable storage medium, for example, it is possible to use a flash memory provided with a USB standard connector. Also, as a portable storage medium, it is possible to use an optical disc, such as a compact disk (CD), a digital versatile disc (DVD), a Blu-ray Disc (Blu-ray is a registered trademark), or the like.
The communication device 508 is a device that couples, for example, the computer 5 and a communication network, such as the Internet, or the like in a communicable manner, and that performs communication with external communication devices, or the like via a communication network. Also, the communication device 508 may be a device that performs telephone calls and communication via a telephone network, such as a mobile phone line, or the like, for example.
In the computer 5, the processor 501 reads a program including the above-described noise suppression processing from the auxiliary storage device 503, or the like, and executes the program so as to suppress noise in the input signal input from the microphone array 2. Also, an output sound signal, from which noise is suppressed, may be output from the speaker 3, for example. Also, if the computer 5 is capable of telephone conversation, such as a mobile phone terminal, a smartphone, or the like, the output sound signal may be transmitted to a terminal of the other party on the phone via the communication device 508.
Also, the computer 5 may be, for example, a car navigation system, or the like. In this case, a program for executing the above-described noise suppression processing may be, for example, combined with a speech recognition program.
All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the invention and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although the embodiments of the present invention have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
Number | Date | Country | Kind |
---|---|---|---|
2015-129112 | Jun 2015 | JP | national |

Number | Name | Date | Kind |
---|---|---|---|
8898058 | Shin | Nov 2014 | B2 |
20100323652 | Visser | Dec 2010 | A1 |
20120035920 | Hayakawa | Feb 2012 | A1 |
20120134509 | Matsumoto | May 2012 | A1 |
20120197638 | Li et al. | Aug 2012 | A1 |
20130166286 | Matsumoto | Jun 2013 | A1 |
20130188799 | Otani | Jul 2013 | A1 |

Number | Date | Country |
---|---|---|
2008-309955 | Dec 2008 | JP |
2011-186384 | Sep 2011 | JP |
2013-511750 | Apr 2013 | JP |
2013-120358 | Jun 2013 | JP |

Number | Date | Country | |
---|---|---|---|
20160379614 A1 | Dec 2016 | US |