This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2011-207508, filed on Sep. 22, 2011, the entire contents of which are incorporated herein by reference.
The embodiments discussed herein are related to a reverberation suppression device, a reverberation suppression method, and a reverberation suppression program configured to suppress reverb in sound input into a microphone provided in a device such as a mobile device.
When a mobile device is used indoors, sound emitted by the user not only reaches the microphone of the mobile device directly, but also reaches the microphone after reflecting off objects such as the surrounding walls and ceiling. In the following description, sound that reaches a microphone directly will be designated direct sound, while sound that reaches the microphone after reflecting off objects such as the surrounding walls and ceiling will be designated reverb. Also, a signal obtained by the microphone in response to the arrival of sound will be designated an input signal.
For example, in a comparatively small room such as a bathroom, reverb reflected off the surroundings is greater than in another place such as a living room. For this reason, when the telephony functions of a mobile device are used in a room such as a bathroom, it may be difficult in some cases to generate clear sound from the input signal obtained by the microphone because of the superposition of direct sound and reverb.
Japanese Laid-open Patent Publication No. 2008-58900 proposes a technology that suppresses reverb components included in an input signal obtained by a microphone, in which a reverb power spectrum estimated from the power spectra of past frames is subtracted from the power spectrum of the current frame. This technique attempts reverberation suppression by determining filter coefficients so as to minimize a weighted sum of the residual speech power in a reverb segment at the end of an utterance and the subtracted power in an utterance segment, which are estimated on the basis of change in the input signal over time.
According to an aspect of the embodiment, a reverberation suppression device includes an analyzer configured to analyze change over time in the power of an input signal obtained from a microphone in response to sound input, and thereby compute the decrease per unit time in the power of the input signal in a reverb segment following the end of a segment in which the sound is produced; and a suppression controller configured to control a suppression gain which indicates the rate at which the input signal is attenuated, on the basis of analysis results from the analyzer.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims. It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention, as claimed.
These and/or other aspects and advantages will become apparent and more readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, of which:
Hereinafter, embodiments of a reverberation suppression device, a reverberation suppression method, and a reverberation suppression program of the present disclosure will be described in detail on the basis of the drawings.
A reverberation suppression device 100 of the present disclosure may be applied to the reverberation suppression of input signals obtained by a microphone 101 mounted in various electronic devices, including personal digital assistants equipped with communication functions, telephone handsets, and portable videogame systems.
The reverberation suppression device 100 illustrated by example in
$$S(n,f) = 10 \log_{10} |X(n,f)|^{2} \tag{1}$$
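As an illustration of Eq. 1, the following minimal sketch computes the input power spectrum in decibels for a single frame. It uses a naive discrete Fourier transform for self-containment; the function name `power_spectrum_db` and the `floor` parameter (which guards against taking the logarithm of zero) are assumptions made for this example and do not appear in the description.

```python
import cmath
import math


def power_spectrum_db(frame, floor=1e-12):
    """Return S(f) = 10*log10(|X(f)|^2) for one frame via a naive DFT.

    `floor` is a hypothetical lower bound on the linear power, added only
    so that empty bins do not produce log10(0).
    """
    n = len(frame)
    spectrum_db = []
    for k in range(n):
        # X(k) = sum_t x(t) * exp(-j*2*pi*k*t/n)
        x_k = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power = abs(x_k) ** 2
        spectrum_db.append(10.0 * math.log10(max(power, floor)))
    return spectrum_db


# A 16-sample frame holding two full cycles of a unit-amplitude sine.
frame = [math.sin(2 * math.pi * 2 * t / 16) for t in range(16)]
s_db = power_spectrum_db(frame)
```

In this example the energy concentrates in frequency number 2 (and its mirror image), while the remaining bins fall to the floor value. A practical implementation would use a fast Fourier transform, as the description indicates.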
The analyzer 110 analyzes characteristics of the change over time of an input signal x(t) in a reverb segment following the end of a segment in which sound is produced, on the basis of the input signal spectrum X(n, f) or the input power spectrum S(n, f) for each frame, as discussed later. On the basis of analysis results from the analyzer 110, the suppression controller 120 controls a suppression gain G(n, f) which expresses the attenuation rate applied to the input signal spectra X(n, f) by the suppression applier 103 in order to suppress the reverb component included in the input signal spectra X(n, f). Additionally, by applying such suppression gain G(n, f) to the input signal spectra X(n, f), the suppression applier 103 generates output signal spectra Y(n, f) in which the reverb component has been appropriately suppressed. The inverse transform unit 104 generates the output signal y(t) by, for example, applying an inverse Fourier transform to the output signal spectra Y(n, f) generated by the suppression applier 103.
Next, a technique by which the analyzer 110 analyzes characteristics of change over time in the reverb segment of an input signal x(t) will be described.
The segments labeled Ta1 and Ta3 in
Compared to the reverb segments Ta2 and Ta4 appearing in the input signal x(t) illustrated in
However, the decrease per unit time of the input signal x(t) in the reverb segments Ta2 and Ta4 illustrated in
This is because the reverb component is correlated with the preceding input sound and attenuates according to the reverb characteristics of the room, and thus the decrease per unit time of an input signal x(t) in a reverb segment represents the attenuation rate of the reverb component according to the reverb characteristics. In other words, in the regions not filled with background noise, it is possible to ascertain the attenuation rate of the reverb component according to the reverb characteristics, on the basis of the decrease per unit time in a reverb segment of the input signal x(t).
Consequently, by causing the analyzer 110 illustrated by example in
For example, a small decrease per unit time of the input signal x(t) in a reverb segment indicates that attenuation of the reverb component is slow in the environment where the microphone 101 is placed. In contrast, a large decrease per unit time of the input signal x(t) in a reverb segment indicates that the reverb component rapidly attenuates in the environment where the microphone 101 is placed. In this way, the decrease per unit time of the input signal x(t) in a reverb segment obtained as analysis results by the analyzer 110 indicates the attenuation rate of the reverb component in the environment where the microphone 101 is placed.
Consequently, by causing the suppression controller 120 illustrated by example in
The suppression controller 120 may also apply control so as to reduce the suppression gain G(n, f) applied to the input signal spectra X(n, f) in the case where analysis results obtained by the analyzer 110 indicate a large decrease per unit time of an input signal x(t) in a reverb segment, for example. By having the suppression controller 120 apply such control, it is possible to mitigate over-suppression of an input signal x(t) obtained by a microphone 101 placed in an environment where the reverb component attenuates rapidly.
In step S301, the analyzer 110 illustrated by example in
Subsequently, the analyzer 110 analyzes change in the input signal x(t) over time on the basis of the respective input power spectra S(j, f) (where j=1 to n) of the frames received thus far (step S302). In step S302, the analyzer 110 may also compute an index indicating the decrease per unit time in a reverb segment of the input signal x(t). The analyzer 110 may then output the computed index as an analysis result. Furthermore, the analyzer 110 may also extract characteristics of change over time in the input signal x(t) in a reverb segment on the basis of change over time in the input signal x(j, t) (where j=1 to n) itself up to the nth frame.
On the basis of the analysis result obtained by the processing in step S302, the suppression controller 120 illustrated by example in
Subsequently, the suppression applier 103 and the inverse transform unit 104 illustrated by example in
As discussed above, analysis results from the analyzer 110 indicate how readily the reverb component attenuates in an indoor environment, regardless of the magnitude of background noise. The suppression gain G(n, f) determined for each frame by the suppression controller 120 on the basis of such analysis results becomes a suitable value for suppressing the reverb component included in an input signal x(t), regardless of the magnitude of background noise.
Consequently, by executing the processing in the above steps S301 to S304 on individual frame input signals x(n, t), it is possible to obtain an output signal y(t) in which just the reverb component has been accurately suppressed, regardless of the magnitude of background noise. Since the components expressing sound included in the input signal x(t) are faithfully reproduced in an output signal y(t) obtained in this way, reproduction of the original sound with low distortion is possible on the basis of the output signal y(t).
Next, the analyzer 110 illustrated by example in
The change calculator 111 calculates a change D(n) on the basis of the difference between the input power spectrum S(n, f) of the nth frame and the input power spectrum S(n−1, f) of the (n−1)th frame received from the transform unit 102.
The change calculator 111 may also calculate the change D(n) as a sum of differences between the input power spectrum S(n, f) of the nth frame and the input power spectrum S(n−1, f) of the (n−1)th frame for respective frequency numbers, as in Eq. 2, for example.
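Eq. 2 is not reproduced in this text. Going only by the description above (a sum over frequency numbers of the frame-to-frame difference in the input power spectrum), the change D(n) might be sketched as follows. The function name `frame_change` is hypothetical, and any normalization or weighting that Eq. 2 may contain is not modeled.

```python
def frame_change(s_curr, s_prev):
    """Change D(n): sum over frequency numbers of S(n, f) - S(n-1, f), in dB.

    Assumption: a plain unweighted sum; the exact form of Eq. 2 is not
    available in this text.
    """
    if len(s_curr) != len(s_prev):
        raise ValueError("power spectra must have the same number of bins")
    return sum(c - p for c, p in zip(s_curr, s_prev))


# Illustrative power spectra (dB) of two consecutive frames in a decaying segment.
s_prev = [-20.0, -22.0, -25.0, -30.0]
s_curr = [-23.0, -24.5, -28.0, -33.5]
d_n = frame_change(s_curr, s_prev)  # negative, since the power is falling
```

A negative result indicates that the power decreased between the two frames, which is the case of interest for a reverb segment.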
In the exemplary input signal x(t) illustrated in
Consequently, the changes D(j) (where j=n−2 to n+1) computed using the above Eq. 2 for each frame included in this segment become values that reflect the attenuation rate of the input signal x(t) over time. In other words, the change calculator 111 is able to compute values for the changes D(j) (where j=n−2 to n+1) that reflect the slope of a line L approximating the change in the input signal x(t) in the segment from the (n−2)th to the (n+1)th frames illustrated in
Furthermore, the change calculator 111 may also apply weights so as to suppress the effects of the background noise component included in the input signal x(t) when computing a change D(n). By suppressing such a background noise component, the change calculator 111 is able to compute a change D(n) that more faithfully reflects the slope of the change in the input signal x(t) over time in the nth frame.
The changes D(n) computed in this way are passed to the averaging unit 114 via the selector 113 illustrated by example in
Herein, a reverb segment is a segment in which the input signal x(t) attenuates in response to the end of an utterance produced indoors. Consequently, among the changes D(n) obtained by the change calculator 111, changes D(n) with negative values reflect the attenuation rate of the input signal x(t) in the reverb segment.
In other words, by having the selector 113 selectively pass the changes D(n) with negative values to the averaging unit 114, it is possible to make the averaging unit 114 compute an average change Dav(n) that indicates the decrease per unit time of the input signal x(t) in the reverb segment.
The selector 113 may, for example, selectively pass to the averaging unit 114 changes D(n) included in a range expressed by given constants d1 and d2, both of which are negative values. Also, the averaging unit 114 may compute an average change Dav(n) for the nth frame by performing a weighted sum of the change D(n) for the nth frame and the average change Dav(n−1) for previous frames up to the (n−1)th frame, with the applied weights being expressed using a given coefficient α. Such an average change Dav(n) computed by the averaging unit 114 may be expressed as in Eq. 3.
Herein, the value of the constant d2 may be determined on the basis of the attenuation rate of an input signal x(t) in an environment where the reverb component is anticipated to be most resistant to attenuation, for example. Also, by using the constant d1 to restrict the minimum value of the change D(n) to be used for computing an average change Dav(n), it is possible to mitigate the effects of sudden noise, for example. Furthermore, the value of the coefficient α may be set such that the value of the change D(n) and the average change Dav(n−1) for previous frames up to the (n−1)th frame are reflected in the value of the average change Dav(n) in respectively suitable ratios.
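Eq. 3 is not reproduced in this text, but the selection and averaging described above might be sketched as follows. The assignment of the weight α to the previous average and (1−α) to the current change follows the step-by-step description given later for the processor 21. The inclusive range test, the initial value of the average, and the numeric values of d1, d2, and α are assumptions chosen only for illustration.

```python
def update_average_change(d_n, dav_prev, d1, d2, alpha):
    """Return Dav(n) given D(n) and Dav(n-1).

    Changes inside [d1, d2] (both negative, d1 < d2) are averaged in;
    all other changes leave the average unchanged. Whether the bounds
    are inclusive is an assumption.
    """
    if d1 <= d_n <= d2:
        return alpha * dav_prev + (1.0 - alpha) * d_n
    return dav_prev


# Hypothetical constants: d1 rejects sudden drops, d2 rejects near-flat or rising frames.
D1, D2, ALPHA = -40.0, -2.0, 0.9

dav = -10.0  # assumed starting value
for d in [-12.0, 5.0, -60.0, -8.0]:
    dav = update_average_change(d, dav, D1, D2, ALPHA)
```

In this run the positive change (5.0) and the abrupt drop (−60.0) are ignored, so only the two changes inside the range influence the average.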
The average change Dav(n) computed in this way reflects the attenuation rate of the reverb component in the environment where the input signal x(t) was obtained. Consequently, it is possible to use the average change Dav(n) as a basis for determining the desirability of applying a reverberation suppression process to an input signal x(t) in the environment where the microphone 101 is placed.
Comparing the input signal x1(t) and the input signal x2(t) illustrated in
If a first threshold Th1 indicating such a threshold value is determined in advance, the first threshold Th1 may be used in the process of controlling suppression gain conducted by the suppression controller 120 illustrated by example in
The above first threshold Th1 may also be determined on the basis of the decrease per unit time in the reverb segment of an input signal x(t) such that the reverberation suppression process is not applied to signals such as the input signal x2(t) illustrated by example in
Next, the suppression controller 120 illustrated by example in
The threshold value storage 125 illustrated by example in
Consequently, an input signal spectrum X(f) corresponding to an input signal x(t) observed by the microphone 101 in response to sound produced by a sound source may be expressed as the sum of a direct sound component spectrum Xd(f) and a reverb component spectrum Xr(f), as in Eq. 4.
X(f)=Xd(f)+Xr(f) (4)
The direct sound component spectrum Xd(f) may be expressed using a sound spectrum φ(f) that corresponds to sound produced by a sound source So, and the transfer characteristics Hd(f) of the path Pd that reaches the microphone 101 directly from the sound source So, as in Eq. 5. Similarly, the reverb component spectrum Xr(f) may be expressed using the sound spectrum φ(f) and the transfer characteristics Hr(f) of paths that reach the microphone 101 via reflection off the walls and ceiling of the room C, as in Eq. 6.
Xd(f)=Hd(f)·φ(f) (5)
Xr(f)=Hr(f)·φ(f) (6)
Eqs. 4 to 6 may be transformed to obtain Eq. 7, which expresses the relationship between the reverb component spectrum Xr(f) and the input signal spectrum X(f).
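Eq. 7 itself does not appear in this text. As a sketch of the transformation, substituting Eqs. 5 and 6 into Eq. 4 and eliminating the sound spectrum gives a relationship of the following form, where H(f) = Hd(f) + Hr(f) denotes the overall transfer characteristics and γ(f) denotes the reverb characteristics. This is a reconstruction from the surrounding definitions rather than a quotation of Eq. 7.

```latex
\begin{aligned}
X(f)   &= \bigl(H_d(f) + H_r(f)\bigr)\,\varphi(f) = H(f)\,\varphi(f), \\
X_r(f) &= H_r(f)\,\varphi(f) = \frac{H_r(f)}{H(f)}\,X(f) = \gamma(f)\,X(f).
\end{aligned}
```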
In other words, the reverb characteristics γ(f) may be obtained as the ratio of the transfer characteristics Hr(f) regarding the transfer of reverb versus the overall transfer characteristics H(f) regarding the transfer of all paths reaching the microphone 101 from the sound source So. Reverb characteristics γ(f) thus obtained may then be stored in the reverb characteristics storage 121. Note that the transfer characteristics H(f) and the transfer characteristics Hr(f) may be computed with established techniques, such as by measuring impulse response in a given indoor area where the application of reverberation suppression is desirable, such as a bathroom, for example. For a specific technique of computing reverb characteristics γ(f), see “Reverberation suppression device, reverberation suppression method, and reverberation suppression program”, Japanese Patent Application No. 2011-165274, previously submitted by the Inventors.
The estimator 122 uses reverb characteristics γ(f) stored in the reverb characteristics storage 121 to estimate a reverb power spectrum R(n, f) expressing the reverb component included in the input signal spectrum X(n, f) of the nth (i.e., current) frame.
The estimator 122 may also compute a reverb power spectrum R(n, f) as the convolution of the reverb characteristics γ(f) and the input power spectra S(n−d, f) (where d=1 to M) of the last M frames preceding the current frame, as illustrated in Eq. 8, for example.
On the basis of a reverb power spectrum R(n, f) obtained by the estimator 122, the gain calculator 123 illustrated by example in
The gain calculator 123 may use a function like that illustrated by the bold line in
The gain corrector 124 computes a suppression gain G(n, f) by applying a correction based on analysis results obtained by the analyzer 110 discussed earlier to a standard suppression gain Gs(n, f) computed by the gain calculator 123 as above.
The gain corrector 124 may also use Eq. 9 to compute a suppression gain G(n, f) on the basis of an average change Dav(n) obtained as an index indicating the decrease per unit time in a reverb segment of an input signal x(t) according to analysis by the analyzer 110, for example. According to Eq. 9, the gain corrector 124 takes the suppression gain G(n, f) to be the standard suppression gain Gs(n, f) in the case where the value of the average change Dav(n) is greater than the first threshold Th1 discussed earlier. In contrast, the gain corrector 124 takes the suppression gain G(n, f) to be a given value of 0 dB in the case where the value of the average change Dav(n) is not greater than the first threshold Th1 discussed earlier.
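Eq. 9 is not reproduced in this text, but the two cases described above might be sketched as follows. The function name `correct_gain` and the numeric threshold are hypothetical; gains are expressed in dB, with 0 dB meaning that no suppression is applied.

```python
def correct_gain(gs, dav, th1):
    """Return G(n, f) for all frequency numbers.

    Keeps the standard suppression gain Gs(n, f) when Dav(n) > Th1
    (slow decay, suppression desirable); otherwise returns 0 dB for
    every frequency number (fast decay, suppression stopped).
    """
    if dav > th1:
        return list(gs)
    return [0.0] * len(gs)


TH1 = -15.0            # hypothetical first threshold
gs = [3.0, 6.0, 9.0]   # illustrative standard suppression gains in dB

g_slow_decay = correct_gain(gs, -8.0, TH1)   # Dav(n) > Th1: gains kept
g_fast_decay = correct_gain(gs, -20.0, TH1)  # Dav(n) <= Th1: all 0 dB
```

Because the average change is negative in a reverb segment, a value closer to zero (greater than Th1) corresponds to a slower decay.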
Herein, a value of the average change Dav(n) that is greater than the first threshold Th1 discussed earlier indicates that the attenuation rate of the input signal x(t) in the reverb segment is less than the rate corresponding to the first threshold Th1, similarly to the input signal x1(t) illustrated by example in
In other words, on the basis of a comparison between the value of the average change Dav(n) and the first threshold Th1 discussed earlier, the gain corrector 124 is able to determine whether or not the reverb component readily attenuates in the environment where the input signal x(t) was acquired, or in other words, whether or not reverberation suppression is desirable.
As a result of the gain corrector 124 applying such gain correction, the suppression gain G(n, f) may be set to a given value of 0 dB in the case where the input signal x(t) attenuates sharply in the reverb segment, regardless of the value of the standard suppression gain Gs(n, f). In other words, in the case where the input signal x(t) attenuates at a rate approximately equal to that of an environment where the reverb component attenuates readily, the gain corrector 124 sets the suppression gain G(n, f) to a given value of 0 dB, and is thereby able to stop reverberation suppression of the input signal x(t). In contrast, in the case where reverberation suppression is determined to be desirable on the basis of a comparison between the value of the average change Dav(n) and the first threshold Th1 discussed earlier, the suppression gain G(n, f) corrected by the gain corrector 124 becomes a standard suppression gain Gs(n, f) computed on the basis of the reverb characteristics γ(f). However, the gain corrector 124 may also compute the suppression gain G(n, f) by subtracting a correction value depending on the value of the average change Dav(n) from the standard suppression gain Gs(n, f) in the case where the value of the average change Dav(n) is greater than the first threshold Th1 discussed earlier. For example, the gain corrector 124 may determine the above correction value such that the correction value decreases as the value of the average change Dav(n) approaches the decrease per unit time exhibited by the input signal x(t) in the reverb segment in an environment imparting reverb characteristics γ(f).
In this way, by causing the gain corrector 124 to compute a suppression gain G(n, f) according to analysis results from the analyzer 110, it is possible to realize control of the suppression gain G(n, f) according to the environment in which the microphone 101 illustrated in
The suppression applier 103 uses a suppression gain G(n, f) computed in this way to execute a process that computes an output signal spectrum Y(n, f) in which the reverb component has been suppressed.
The suppression applier 103 may also, for example, compute a corrected power spectrum S′(n, f) corresponding to the output signal spectrum Y(n, f) by applying the suppression gain G(n, f) to the input power spectrum S(n, f) of the nth frame, as expressed in Eq. 10. Furthermore, the output signal spectrum Y(n, f) may also be computed by utilizing the corrected power spectrum S′(n, f) expressed in terms of the output signal spectrum Y(n, f) as in Eq. 11.
S′(n,f)=S(n,f)−G(n,f) (10)
$$S'(n,f) = 10 \log_{10} |Y(n,f)|^{2} \tag{11}$$
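Eqs. 10 and 11 together determine the magnitude of the output signal spectrum: subtracting the gain in the dB domain and inverting Eq. 11 gives |Y(n, f)| = 10^(S′(n, f)/20). A minimal sketch follows. Retaining the phase of X(n, f) for Y(n, f) is an assumption of this example, since the description does not state how the phase is handled, and the function name `apply_suppression` is hypothetical.

```python
import cmath
import math


def apply_suppression(x_spec, gain_db):
    """Apply G(n, f) in dB to a complex spectrum X(n, f) and return Y(n, f)."""
    y_spec = []
    for x, g in zip(x_spec, gain_db):
        mag = abs(x)
        if mag == 0.0:
            y_spec.append(0j)  # nothing to attenuate in an empty bin
            continue
        s_db = 10.0 * math.log10(mag ** 2)   # Eq. 1
        s_corr = s_db - g                    # Eq. 10
        y_mag = 10.0 ** (s_corr / 20.0)      # Eq. 11 solved for |Y|
        # Assumption: the output keeps the phase of the input.
        y_spec.append(cmath.rect(y_mag, cmath.phase(x)))
    return y_spec


x_spec = [1 + 1j, 2 + 0j]
y_spec = apply_suppression(x_spec, [6.0, 0.0])
```

With a gain of 6 dB the first bin is scaled to roughly half its magnitude, while the 0 dB gain leaves the second bin unchanged.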
An output signal y(t) may be generated by having the inverse transform unit 104 apply an inverse fast Fourier transform to the output signal spectra Y(n, f) computed for respective frames in this way.
As discussed above, according to the reverberation suppression device 100 illustrated by example in
In addition, the suppression controller 120 illustrated by example in
A reverberation suppression device 100 of the present disclosure may be realized using mobile device hardware, for example.
The mobile device 10 includes a processor 21, memory 22, a microphone 101, a communication processor 105, and a speaker 106. The mobile device 10 additionally includes a recording processor 24, a removable memory card 25, a display controller 26, a liquid crystal display (LCD) 27, an input interface (I/F) 28, and an operable panel 29. In the mobile device 10 illustrated in
The processor 21, memory 22, communication processor 105, microphone 101, speaker 106, recording processor 24, display controller 26, and input I/F 28 are connected to each other via a bus. The recording processor 24 reads data from and writes data to the memory card 25. The display controller 26 controls display processing by the LCD 27. The input I/F 28 relays information representing operations made on the operable panel 29 to the processor 21.
The memory 22 stores the operating system of the mobile device 10, as well as an application program by which the processor 21 executes the reverberation suppression process discussed earlier. The application program includes programs for executing the processing that analyzes change in an input signal over time and the processing that corrects an input signal, which are included in a reverberation suppression method of the present disclosure. The application program for executing the above reverberation suppression process may be distributed by being recorded on the memory card 25, for example. By loading such a memory card into the recording processor 24 and reading out data therefrom, the application program for executing the reverberation suppression process is stored in the memory 22. Additionally, it is also possible to load an application program for executing the reverberation suppression process into the memory 22 via the communication processor 105 and a network such as the Internet.
Also, the reverb characteristics storage 121 illustrated by example in
Also, the processor 21 may fulfill the function of the analyzer 110 illustrated in
First, in step S311 the processor 21 receives an input signal spectrum X(n, f) obtained by applying a fast Fourier transform to the input signal x(n, t) of the nth frame. Subsequently, the processor 21 uses the above Eq. 1 to compute the input power spectrum S(n, f) of the input signal spectrum X(n, f) (step S312).
Next, the processor 21 uses the input power spectra S(n, f) and S(n−1, f) of the nth and the (n−1)th frames as well as Eq. 2 to compute the change D(n) in the input power spectrum S(n, f) for the nth frame (step S313). In this way, the processor 21 is able to fulfill the function of the change calculator 111 illustrated by example in
Next, by conducting the processing in steps S314 to S316, the processor 21 uses the change D(n) computed in step S313 and Eq. 3 to compute an average change Dav(n) that acts as an index indicating the decrease per unit time in the reverb segment of the input signal x(t). First, the processor 21 determines whether or not the change D(n) in the input power spectrum S(n, f) for the nth frame is included in a range expressed by the values d1 and d2 (step S314). In the case of a positive determination in step S314, the processor 21 computes the average change Dav(n) up to the nth frame by multiplying the average change Dav(n−1) up to the (n−1)th frame and the change D(n) by the weights α and (1−α), respectively, and adding the results together (step S315). Meanwhile, in the case of a negative determination in step S314, the processor 21 inherits the value of the average change Dav(n−1) up to the (n−1)th frame without change as the average change Dav(n) up to the nth frame (step S316). In this way, the processor 21 is able to fulfill the function of the index calculator 112 illustrated by example in
First, the processor 21 estimates the reverb power spectrum R(n, f) included in the input power spectrum S(n, f) of the current frame from the input power spectra S(n−d, f) (where d=1 to M) of past frames and the reverb characteristics γ(f) (step S321). The processor 21 may also use the above Eq. 8 and reverb characteristics γ(f) stored in the memory 22 for estimating the reverb power spectrum R(n, f), for example. In this way, the processor 21 is able to fulfill the functions of the reverb characteristics storage 121 and the estimator 122 illustrated by example in
Next, the processor 21 computes the signal-to-reverb ratio SRR(n, f) by subtracting the reverb power spectrum R(n, f) computed in step S321 from the input power spectrum S(n, f) of the current frame (step S322). Subsequently, the processor 21 computes a standard suppression gain Gs(n, f) on the basis of the signal-to-reverb ratio SRR(n, f) computed in step S322 (step S323). The processor 21 may also use a function like that illustrated in
After that, the processor 21 determines the desirability of applying a reverberation suppression process to the input signal x(t), on the basis of a comparison between the average change Dav(n) obtained by the processing in the above step S302 and the first threshold Th1 (step S324). In the case where the average change Dav(n) is less than or equal to the first threshold Th1 (step S324, Yes), the processor 21 determines that there is low desirability to suppress reverb in the environment where the microphone 101 is placed. In this case, the processor 21 computes a suppression gain G(n, f) such that the attenuation rate is lower than the case of applying the standard suppression gain Gs(n, f) (step S325). In step S325, the processor 21 may, for example, uniformly set the suppression gain G(n, f) to a lower-limit value of 0 dB, regardless of the value of the standard suppression gain Gs(n, f) obtained in step S323.
In contrast, in the case where the average change Dav(n) is greater than the first threshold Th1 (step S324, No), the processor 21 determines that there is comparatively high reverb in the environment where the microphone 101 is placed. In this case, the processor 21 may simply take the standard suppression gain Gs(n, f) directly as the suppression gain G(n, f) (step S326).
In this way, the processor 21 is able to fulfill the function of the gain corrector 124 illustrated by example in
Additionally, on the basis of the suppression gain G(n, f) and the input power spectrum S(n, f) computed as above, the processor 21 computes a corrected power spectrum S′(n, f) in which the reverb component has been suppressed. The processor 21 may also, for example, compute a corrected power spectrum S′(n, f) corresponding to the output signal spectrum Y(n, f) by subtracting the suppression gain G(n, f) from the input power spectrum S(n, f) of the nth frame, as expressed in the above Eq. 10. Then, on the basis of the corrected power spectrum S′(n, f) obtained in this way, the processor 21 computes an output signal spectrum Y(n, f) according to the above Eq. 11. By executing such processes, the processor 21 is able to realize the function of the suppression applier 103 illustrated by example in
An output signal y(t) may be generated by having the processor 21 apply an inverse fast Fourier transform to the output signal spectra Y(n, f) computed for respective frames in this way.
Thus, as a result of the processor 21 executing processing that determines a suppression gain G(n, f) on the basis of the slope of the change over time in an input signal x(t) in a reverb segment, it is possible to obtain an output signal y(t) in which suitable reverberation suppression has been applied, regardless of the magnitude of background noise. The processor 21 is then able to supply the output signal y(t) obtained in this way to the communication processor 105 for signal processing.
Thus, according to a mobile device 10 that includes the reverberation suppression device 100 illustrated by example in
In other words, according to a mobile device 10 that includes a reverberation suppression device 100, it is possible to transmit signals expressing clear sound via the communication processor 105 and a network to a mobile device or other device being used by the person with whom the user is communicating, regardless of the environment where the user is using the mobile device 10. Consequently, if the user of a mobile device 10 equipped with a reverberation suppression device 100 of the present disclosure has moved to or is currently in a bathroom, for example, it is possible for the user to conceal that fact from the person with whom he or she is communicating.
The analyzer 110 illustrated by example in
The noise estimator 115 estimates the signal-to-noise ratio (SNR) θ(n, f) of the input signal x(t) for the nth frame, on the basis of an input signal spectrum X(n, f) obtained by the transform unit 102. The noise estimator 115 may also, for example, use established technology to compute a noise power spectrum N(n, f) expressing the noise component on the basis of the input signal spectrum X(n, f) or the input power spectrum S(n, f). The noise estimator 115 may then compute the SNR θ(n, f) by subtracting the noise power spectrum N(n, f) from the input power spectrum S(n, f), as expressed in Eq. 12.
θ(n,f)=S(n,f)−N(n,f) (12)
The noise estimator 115 inputs SNRs θ(n, f) computed for respective frames in this way into the counter 116 included in the index calculator 112 illustrated by example in
Herein, the above constant θ1 may be determined on the basis of the results of actual tests computing the SNR θ(n, f) for plural frames included in a reverb segment, for example. The input signal spectra X(n, f) of frames with an SNR θ(n, f) that is larger than such a constant θ1 faithfully reflect reverb-containing sound input into the microphone 101.
Consequently, on the basis of a comparison between the SNR θ(n, f) obtained by the noise estimator 115 and the above constant θ1, the counter 116 is able to count reliable changes D(n) obtained from frames that are weakly affected by the noise component.
The counter 116 counts the number of changes D(n) respectively occurring in N classes K1 to KN, which correspond to respective ranges obtained by splitting a range from Dmin to Dmax into N parts. Herein, Dmin and Dmax represent values considered to be the minimum and maximum values for the change D(n).
For example, in the case where the value of a change D(n) to be counted is less than the upper limit Kmaxp and equal to or greater than the lower limit Kminp of a range corresponding to the pth class Kp, the counter 116 may count the frequency of occurrence by updating the count for that class Kp.
The above processing by the counter 116 may also be expressed as in Eq. 13, as processing that updates a histogram Hist(n−1, j) (where j=1 to N) according to the comparison results between the SNR θ(n, f) and the constant θ1, with the histogram Hist(n−1, j) including counts for respective classes Kj (where j=1 to N) up to the (n−1)th frame. In this way, a histogram Hist(n, j) (where j=1 to N) may be obtained by adding the value 1 to Hist(n−1, p), which expresses a count of the number of times a class Kp includes a change D(n), but limited to the case where the SNR θ(n, f) of the current frame is greater than a given constant θ1.
By conducting such a counting process, the counter 116 is able to compute a histogram Hist(n, j) (where j=1 to N) for reliable changes D(n) occurring up to the nth frame. On the basis of a histogram Hist(n, j) (where j=1 to N) obtained in this way, the frequency calculator 117 calculates an index expressing the decrease per unit time in the reverb segment of an input signal x(t), as discussed later.
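The counting process described above may be sketched as follows. This is a minimal illustration rather than the claimed implementation: the function names, the use of a single per-frame SNR value, equal-width classes, and the clamping of out-of-range changes into the first or last class are all assumptions introduced for the example.

```python
from typing import List


def class_index(d: float, d_min: float, d_max: float, n_classes: int) -> int:
    """Return the 0-based index p of the class Kp whose range contains d.

    Each class covers [lower, upper), matching the rule that a change is
    counted when it is >= the lower limit and < the upper limit.
    Assumption: values outside [d_min, d_max) are clamped to the end classes.
    """
    width = (d_max - d_min) / n_classes
    p = int((d - d_min) // width)
    return min(max(p, 0), n_classes - 1)


def update_histogram(hist: List[int], d: float, snr: float, theta1: float,
                     d_min: float, d_max: float) -> List[int]:
    """Return Hist(n, j) computed from Hist(n-1, j) for one new frame.

    The change d is counted only when the frame SNR exceeds theta1;
    otherwise all counts are carried over unchanged.
    """
    new_hist = list(hist)  # leave the previous histogram untouched
    if snr > theta1:
        new_hist[class_index(d, d_min, d_max, len(hist))] += 1
    return new_hist


if __name__ == "__main__":
    h0 = [0] * 5
    h1 = update_histogram(h0, d=-7.0, snr=12.0, theta1=6.0, d_min=-10.0, d_max=0.0)
    h2 = update_histogram(h1, d=-3.0, snr=2.0, theta1=6.0, d_min=-10.0, d_max=0.0)
    print(h1, h2)
```

In this sketch the second frame is ignored because its SNR does not exceed the constant, so only changes from frames weakly affected by noise contribute to the histogram.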
In
In
The input signal x1(t) illustrated in
In the histogram H1 illustrated in
If change D(n) histograms are collected for a sufficient number of frames, a peak corresponding to the decrease per unit time in the reverb segment will appear in the histogram, as illustrated in
Such differences are also reflected as differences between frequencies δ1 and δ2, which express the ratios of total counts Sh1 and Sh2 distributed over the range to the left of the first threshold Th1 versus the overall total for the histograms H1 and H2 illustrated in
The above differences also appear in a histogram Hist(n, j) (where j=1 to N) obtained by the counter 116 counting changes D(n) for a number of frames that is less than the number of frames sufficient to obtain a histogram having a clear peak as illustrated in
In other words, as the decrease per unit time of an input signal x(t) in a reverb segment becomes larger, so too does a frequency δ(n) of changes D(n) which indicates that the decrease per unit time is equal to or greater than a given value in the histogram Hist(n, j) (where j=1 to N). Consequently, the frequency δ(n) of changes D(n) which indicates that the decrease per unit time is equal to or greater than a given value may be used as an index expressing the decrease per unit time of an input signal x(t) in a reverb segment.
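As a rough illustration of how such a frequency could be derived from a histogram, the sketch below computes the ratio of the counts lying to the left of the first threshold Th1 to the overall total. It does not reproduce Eq. 14. The function name, the equal-width classes, the convention that a faster decay gives a smaller (more negative) change D(n), and the choice to count only classes lying entirely at or below Th1 are assumptions made for the example.

```python
from typing import Sequence


def decay_frequency(hist: Sequence[int], d_min: float, d_max: float,
                    th1: float) -> float:
    """Return the fraction of counted changes that fall to the left of th1.

    hist[j] is the count for the j-th equal-width class between d_min and
    d_max. Assumption: a class contributes only if its upper limit is at or
    below th1. An empty histogram yields 0.0.
    """
    total = sum(hist)
    if total == 0:
        return 0.0
    width = (d_max - d_min) / len(hist)
    below = sum(count for j, count in enumerate(hist)
                if d_min + (j + 1) * width <= th1)
    return below / total


if __name__ == "__main__":
    # Five classes of width 2 covering -10 to 0; the first two lie left of -6.
    print(decay_frequency([3, 2, 4, 1, 0], d_min=-10.0, d_max=0.0, th1=-6.0))
```

Because the result is a ratio rather than a raw count, it can be compared against a fixed threshold even when only a modest number of frames has been counted.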
The frequency calculator 117 illustrated by example in
The index calculator 112 illustrated by example in
A frequency δ(n) obtained in this way indicates the probability that the decrease per unit time in the reverb segment of an input signal x(t) is equal to or greater than a decrease corresponding to the slope indicated by the first threshold Th1. In the case where this probability is high, there is low desirability to apply a reverberation suppression process to the input signal x(t). Conversely, in the case where this probability is low, it may be determined that applying a reverberation suppression process to the input signal x(t) is highly desirable. Consequently, a second threshold Th2 for determining whether or not to apply a reverberation suppression process to an input signal x(t) may be set on the basis of the frequency δ(n), similarly to the average change Dav(n) discussed earlier. By storing the second threshold Th2 in the threshold value storage 125 illustrated by example in
The value of the second threshold Th2 may also be determined on the basis of a frequency obtained using the above Eq. 14 for a histogram whose peak corresponding to changes obtained for respective frames included in a reverb segment is within a range corresponding to the class Kk that contains the first threshold Th1, for example.
The analyzer 110 that includes the noise estimator 115, counter 116, and frequency calculator 117 discussed above may be realized by the cooperative action of the processor 21 and the memory 22 illustrated in
Herein, like reference signs are given to steps illustrated in
Following the processing in step S313, the processor 21 computes a noise power spectrum N(n, f) on the basis of the input power spectrum S(n, f) obtained in step S312 (step S331). Subsequently, the processor 21 computes an SNR θ(n) according to the above Eq. 12 using the noise power spectrum N(n, f) obtained in step S331 and the input power spectrum S(n, f) (step S332). In this way, the processor 21 is able to fulfill the function of the noise estimator 115 illustrated by example in
Next, the processor 21 determines whether or not the SNR θ(n) computed in step S332 is greater than the constant θ1 (step S333). By executing the processing in steps S334 to S336 according to the determination result in step S333, the processor 21 updates the histogram Hist(n, j) (where j=1 to N) of changes D(n) up to the nth frame.
For example, in the case of a positive determination in step S333, the processor 21 first identifies the class Kp containing a change D(n) (step S334). Then, the processor 21 updates the histogram Hist(n, j) (where j=1 to N) in accordance with the occurrence of the change D(n) contained in the class Kp identified in step S334 (step S335). At this point, the processor 21 may add the value 1 to the count for the class Kp expressed by the histogram Hist(n−1, j) (where j=1 to N) up to the (n−1)th frame, while also inheriting the counts for other classes Kj (where j≠p) without change as the histogram Hist(n, j) (where j≠p). In contrast, in the case of a negative determination in step S333, the processor 21 may inherit the counts for each class Kj (where j=1 to N) expressed by the histogram Hist(n−1, j) (where j=1 to N) without change as the histogram Hist(n, j) (where j=1 to N) (step S336). In this way, the processor 21 is able to fulfill the function of the counter 116 illustrated by example in
Subsequently, the processor 21 uses the above Eq. 14 to compute the frequency δ(n) of changes D(n) with values smaller than the first threshold Th1 in the histogram Hist(n, j) (where j=1 to N) up to the nth frame (step S337). In this way, the processor 21 is able to fulfill the function of the frequency calculator 117 illustrated by example in
In addition, the processor 21 is able to fulfill the function of the index calculator 112 illustrated by example in
In the reverberation suppression device 100 illustrated by example in
The threshold value storage 125 included in the suppression controller 120 illustrated by example in
First, on the basis of a frequency δ(n) obtained by the analyzer 110, the gain corrector 124 illustrated by example in
In this way, the correction controller 126 controls computation of a suppression gain G(n, f) as follows, on the basis of the corrected gain G′(n, f) for the nth frame obtained by the gain corrector 124 and the suppression gain G(n−j, f) (where j=1 to m) of the last m frames.
First, on the basis of the suppression gain G(n−j, f) (where j=1 to m) of the last m frames and the corrected gain G′(n, f) for the nth frame, the correction controller 126 computes an index indicating the slope of the magnitude of the suppression gain G(n, f) in a period up to the nth frame. The correction controller 126 may compute an average gain Gav(n, f) as expressed in Eq. 16 as the index indicating the slope of the magnitude of the suppression gain G(n, f) up to the nth frame, for example.
Gav(n,f)=βGav(n−1,f)+(1−β)G′(n,f) (16)
According to Eq. 16, the average gain Gav(n, f) up to the nth frame is the result of weighted addition of the average gain Gav(n−1, f) up to the (n−1)th frame and the corrected gain G′(n, f) of the nth frame, with the weights expressed by a given weighting coefficient β. By suitably adjusting the value of this weighting coefficient β, from Eq. 16 it is possible to compute an average gain Gav(n, f) that reflects the magnitude of the suppression gain G(n−j, f) (where j=1 to m) applied to the last m frames preceding the current frame.
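The recursion of Eq. 16 may be sketched for a single frequency bin as follows. The function name, the treatment of gains as plain floating-point values, and the range check on β are assumptions introduced for the example.

```python
def update_average_gain(prev_avg: float, corrected_gain: float,
                        beta: float) -> float:
    """Apply Eq. 16 once: Gav(n) = beta * Gav(n-1) + (1 - beta) * G'(n).

    Assumption: beta lies in [0, 1], so the result is a weighted average.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta is assumed to lie between 0 and 1")
    return beta * prev_avg + (1.0 - beta) * corrected_gain


if __name__ == "__main__":
    avg = 4.0
    for g in (8.0, 8.0, 0.0):  # corrected gains of three successive frames
        avg = update_average_gain(avg, g, beta=0.75)
        print(avg)
```

A value of β close to 1 makes the average respond slowly, so that more past frames influence it, while a value close to 0 makes it follow the corrected gain of the current frame almost directly.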
The correction controller 126 may then determine the desirability of applying reverberation suppression to the input signal x(n, t) of the nth frame on the basis of a comparison between the average gain Gav(n, f) computed in this way and a given third threshold Th3. The value of the third threshold Th3 may, for example, be determined on the basis of a minimum suppression gain at which human hearing may perceive differences between sound played back from an output signal y(t) with suppression gain applied by the suppression applier 103, and sound played back from an output signal y(t) without suppression gain applied.
For example, the correction controller 126 may determine that there is low desirability to apply reverberation suppression in the case where the average gain Gav(n, f) is less than or equal to the third threshold Th3, or in other words, in the case where the suppression effect over the past several frames is minuscule to a degree that might not be humanly perceivable. In this case, the correction controller 126 causes the gain corrector 124 to compute a suppression gain G(n, f) with a value smaller than the corrected gain G′(n, f). In contrast, the correction controller 126 may determine that there is high desirability to apply reverberation suppression in the case where the average gain Gav(n, f) is greater than the third threshold Th3, or in other words, in the case where the suppression effect over the past several frames is large to a degree that may be humanly perceivable. In this case, the correction controller 126 causes the gain corrector 124 to output a corrected gain G′(n, f) computed using Eq. 15, for example, directly as the suppression gain G(n, f).
Consequently, the suppression gain G(n, f) computed by the gain corrector 124 illustrated by example in
By applying such control, the correction controller 126 is able to stop reverberation suppression exercised on the input signal x(n, t) of a frame where the efficacy of reverberation suppression is anticipated to be slight, and reduce distortion in sound played back from the output signal y(n, t).
The suppression controller 120 that includes the gain corrector 124 and the correction controller 126 illustrated by example in
Following the processing in step S323, the processor 21 determines the desirability of applying the reverberation suppression process to the input signal x(t), on the basis of a comparison between the frequency δ(n) obtained by the processing in the above step S337 and the second threshold Th2 (step S341). In the case where the frequency δ(n) is greater than the second threshold Th2 (step S341, Yes), the processor 21 determines that there is low desirability to suppress reverb in the environment where the microphone 101 is placed. In this case, the processor 21 computes a corrected gain G′(n, f) with a value that is smaller than the standard suppression gain Gs(n, f) (such as a value of 0 dB, for example), similarly to step S325 illustrated in
In this way, by executing the processing in steps S341 to S343 the processor 21 is able to fulfill the function of the gain corrector 124 which computes a corrected gain G′(n, f) on the basis of comparison results between the above frequency δ(n) and the second threshold Th2.
Next, the processor 21 uses the above Eq. 16 to compute an average gain Gav(n, f) as an index indicating the slope of magnitude of the suppression gain G(n, f) up to the nth frame (step S344). Subsequently, the processor 21 determines whether or not the average gain Gav(n, f) obtained by the processing in step S344 is less than or equal to the third threshold Th3 (step S345). In the case of a positive determination in step S345, the processor 21 determines that there is low desirability to apply reverberation suppression. In this case, the processor 21 computes a suppression gain G(n, f) with a value that is smaller than the above corrected gain G′(n, f) (such as a value of 0 dB, for example) (step S346). In contrast, in the case of a negative determination in step S345, the processor 21 determines that there is high desirability to apply reverberation suppression. In this case, the processor 21 takes the above corrected gain G′(n, f) directly as the suppression gain G(n, f) (step S347).
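The determination in steps S345 to S347 may be sketched for one frequency bin as below. The function name, the representation of gains as suppression amounts in dB, and the fixed fallback value of 0 dB (taken from the example value mentioned above) are assumptions for illustration only.

```python
def select_suppression_gain(avg_gain_db: float, corrected_gain_db: float,
                            th3_db: float,
                            no_suppression_db: float = 0.0) -> float:
    """Choose the suppression gain G(n, f) for one frequency bin.

    Positive determination (average gain <= Th3): suppression is judged to
    have low desirability, and the fallback value is returned (step S346).
    Negative determination: the corrected gain is used directly (step S347).
    """
    if avg_gain_db <= th3_db:
        return no_suppression_db
    return corrected_gain_db


if __name__ == "__main__":
    print(select_suppression_gain(1.0, 6.0, th3_db=2.0))  # recent effect is small
    print(select_suppression_gain(5.0, 6.0, th3_db=2.0))  # recent effect is large
```

Note that the boundary case, in which the average gain equals the third threshold, is treated as a positive determination, in line with the "less than or equal to" comparison in step S345.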
In this way, by executing the processing in the steps enclosed by the box labeled S348 in
However, the respective units included in the analyzer 110 and the suppression controller 120 illustrated in
For example, the correction controller 126 illustrated by example in
All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the invention and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although the embodiments of the present invention have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
Number | Date | Country | Kind |
---|---|---|---|
2011-207508 | Sep 2011 | JP | national |

Number | Name | Date | Kind |
---|---|---|---|
20030026437 | Janse et al. | Feb 2003 | A1 |
20060115095 | Giesbrecht et al. | Jun 2006 | A1 |
20080059157 | Fukuda et al. | Mar 2008 | A1 |

Number | Date | Country |
---|---|---|
1 469 703 | Oct 2004 | EP |
1 667 416 | Jun 2006 | EP |
2006-129434 | May 2006 | JP |
2006-157920 | Jun 2006 | JP |
2008-58900 | Mar 2008 | JP |
2008-288718 | Nov 2008 | JP |
2011-065128 | Mar 2011 | JP |
WO 2006011104 | Feb 2006 | WO |

Entry |
---|
James Eaton et al., “Noise-Robust Reverberation Time Estimation using Spectral Decay Distributions with Reduced Computational Cost”, Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), Apr. 11, 2013, XP055072650, Vancouver Canada, 5 pages. |
Extended Search Report dated Aug. 2, 2013 in European Patent Application No. 12173939.5-1901/2573768. |
Japanese Office Action dated Jan. 27, 2015 in corresponding Japanese Patent Application No. 2011-207508 (3 pages) (2 pages English Translation). |

Number | Date | Country | |
---|---|---|---|
20130077798 A1 | Mar 2013 | US |