The present invention relates to a direct sound extraction device and a reverberant sound extraction device, and more particularly to a direct sound extraction device that can extract a direct sound from an input signal containing a reverberant sound, and a reverberant sound extraction device that can extract a reverberant sound from the input signal.
If music, speeches, and the like are performed and recorded in an environment where reverberation easily occurs, such as a hall, the recorded acoustic signals often contain not only a direct sound but also a reverberant sound that is convolved into them during the recording. Therefore, if acoustic signals into which the reverberant sound has been convolved are played back in another acoustic environment, the clarity of the direct sound is reduced, which can make the played-back signals very difficult to listen to.
Likewise, if speech into which a reverberant sound has been convolved is used for voice recognition or the like, the recognition rate of the speech (its content) decreases because the reverberant sound reduces its clarity.
For acoustic signals into which a reverberant sound has been convolved as described above, a conventional technique for reducing the reverberant sound is known (see, for example, Patent Literature 1). This technique makes it possible to clarify the direct sound by reducing the reverberant sound.
Patent Literature 1: JP-A-2010-74531
However, the method described in Patent Literature 1 requires various types of signal processing to reduce the reverberant sound contained in an input signal, such as a pseudo-whitening process, a multi-step linear prediction process, and a rear reverberation prediction process, and therefore imposes a heavy processing load. Actually reducing the reverberant sound thus requires high-performance devices such as microprocessors or digital signal processors. As a result, in terms of cost and other factors, the method of Patent Literature 1 cannot easily be used as it is.
The present invention has been made in view of the above problems. An object of the present invention is to provide a direct sound extraction device and a reverberant sound extraction device that can easily extract a direct sound or a reverberant sound from an acoustic signal containing the reverberant sound.
According to the present invention, a direct sound extraction device includes: a Fourier transform unit which performs a Fourier transform process on an input signal that includes a reverberant sound in a direct sound; a spectrum transform unit which transforms, on the basis of frequency spectra of real and imaginary numbers of the input signal on which a Fourier transform process has been performed by the Fourier transform unit, the input signal to a first amplitude spectrum signal and a phase spectrum signal; a low-pass filter unit which carries out a low-pass filtering process on the first amplitude spectrum signal by using a preset normalized cutoff frequency for each frequency; a first limiter unit which limits a negative side of an amplitude of a second amplitude spectrum signal on which a low-pass filtering process has been performed by the low-pass filter unit, so as to bring the amplitude to zero; a first subtraction unit which calculates a third amplitude spectrum signal by subtracting the second amplitude spectrum signal whose negative-side amplitude has been limited by the first limiter unit from the first amplitude spectrum signal; a second limiter unit which limits a negative side of an amplitude of the third amplitude spectrum signal calculated by the first subtraction unit, so as to bring the amplitude to zero; an inverse spectrum transform unit which calculates, on the basis of the phase spectrum signal and the third amplitude spectrum signal whose negative-side amplitude has been limited by the second limiter unit, a signal that is made from frequency spectra of real and imaginary numbers; and an inverse Fourier transform unit which performs an inverse Fourier transform process on the signal calculated by the inverse spectrum transform unit to generate a direct sound signal that is obtained by extracting the direct sound from the input signal.
The direct sound extraction device of the present invention performs a Fourier transform on an input signal that includes a reverberant sound in a direct sound, and uses a preset normalized cutoff frequency to carry out a low-pass filtering process on the first amplitude spectrum signal calculated by the spectrum transform unit. In this manner, the direct sound extraction device calculates a signal that is integrated over time for each spectrum (integral signal: second amplitude spectrum signal). The integrated signal is equivalent to the spectrum signal that constitutes the stationary component of the input signal as it changes over time, that is, a reverberant sound signal.
Accordingly, the third amplitude spectrum signal, which the first subtraction unit calculates by subtracting the second amplitude spectrum signal from the first amplitude spectrum signal, is obtained by subtracting the reverberant sound from the input signal, and is therefore equivalent to a direct sound signal.
Therefore, a signal that is generated by the inverse spectrum transform unit and the inverse Fourier transform unit is a signal that is obtained by extracting a direct sound from the input signal. As a result, from the input signal that includes a reverberant sound in a direct sound, the direct sound can be easily extracted.
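As an illustration only, the processing chain described above can be sketched on a sequence of amplitude spectrum frames for a single channel. The sketch assumes a first-order recursive low-pass filter applied along time in each frequency bin, with coefficient a = exp(-2π·fc) derived from a normalized cutoff fc expressed in cycles per frame; the patent does not specify the filter design, and the function name `extract_direct` is hypothetical.

```python
import math


def extract_direct(frames, fc):
    """Sketch of the claimed direct-sound chain on amplitude spectra.

    frames: list of amplitude spectrum frames (the first amplitude spectrum
            signal), each a list of non-negative floats, one per frequency bin.
    fc:     normalized cutoff frequency in cycles per frame (assumed 0 < fc < 0.5).
    Returns the third amplitude spectrum signal after the second limiter.
    """
    a = math.exp(-2.0 * math.pi * fc)   # assumed one-pole coefficient
    lp = [0.0] * len(frames[0])         # low-pass state for each bin
    out = []
    for frame in frames:
        direct = []
        for k, x in enumerate(frame):
            lp[k] = a * lp[k] + (1.0 - a) * x  # low-pass across frames: "integral" signal
            second = max(lp[k], 0.0)           # first limiter (a no-op here for x >= 0)
            third = x - second                 # first subtraction
            direct.append(max(third, 0.0))     # second limiter
        out.append(direct)
    return out


# One frequency bin: a tone held for 5 frames, then a quieter tail.
frames = [[1.0]] * 5 + [[0.2]] * 5
for i, d in enumerate(extract_direct(frames, fc=0.1)):
    print(i, round(d[0], 4))
```

With a held tone, the extracted amplitude is largest at the onset and decays toward zero as the low-pass output catches up; once the level falls below the low-pass output, the extracted amplitude is limited to zero, which matches the separation of non-stationary and stationary components described above.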
Furthermore, the extraction time of the direct sound contained in the input signal can be adjusted by adjusting the normalized cutoff frequency. The smaller the normalized cutoff frequency, the more slowly the low-pass filtered signal follows the first amplitude spectrum signal, so the extraction time of the direct sound becomes longer and the extracted direct sound contains not only a non-stationary sound but also a stationary sound. Because the extracted direct sound retains a stationary sound, it keeps properties such as timbre and ease of listening that a direct sound containing no stationary sound at all would lack. As a result, a listener perceives the direct sound as natural, without a sense of strangeness.
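The relation between the normalized cutoff frequency and the extraction time can be made concrete if the low-pass filter is assumed to be first-order recursive with coefficient a = exp(-2π·fc), fc in cycles per frame. After a unit step onset, the extracted amplitude in frame n is then a^(n+1), so it stays above a given threshold for more frames as fc decreases. The hypothetical helper `extraction_frames` counts those frames by simulation.

```python
import math


def extraction_frames(fc, threshold=0.1, max_frames=10000):
    """Count frames after a unit step onset during which x - LPF(x) >= threshold.

    Assumes a one-pole low-pass with a = exp(-2*pi*fc), fc in cycles per frame.
    """
    a = math.exp(-2.0 * math.pi * fc)
    state, count = 0.0, 0
    for _ in range(max_frames):
        state = a * state + (1.0 - a) * 1.0  # low-pass output for a held input of 1.0
        if 1.0 - state < threshold:          # extracted direct amplitude fell below threshold
            break
        count += 1
    return count


for fc in (0.2, 0.1, 0.05, 0.02):
    print(fc, extraction_frames(fc))  # 1, 3, 7 and 18 frames respectively
```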
The direct sound extraction device of the present invention can easily extract the direct sound from the input signal that includes the reverberant sound in the direct sound. The reverberant sound extraction device of the present invention can easily extract the reverberant sound from the input signal that includes the reverberant sound in the direct sound.
a) shows one example of filter coefficients for each amplitude spectrum in an LPF unit according to an embodiment of the present invention; and
a) is a diagram showing one example of frequency changes of an amount of weighting of amplification and attenuation of a first gain unit according to an embodiment of the present invention; and
a) is a diagram schematically showing a state in which the waveform of a direct sound signal shown in
An acoustic processing device, which is an example of the direct sound extraction device and the reverberant sound extraction device of the present invention, will now be described in detail with reference to the accompanying drawings.
Incidentally, when a reverberant sound is convolved into a direct sound such as voice or instrumental sound, a stationary signal corresponding to the reverberation time is added, in the frequency spectrum, to the non-stationary signal of the voice or instrumental sound. The acoustic processing device of the present embodiment extracts a direct sound by extracting or separating the non-stationary signal from an input signal, and extracts a reverberant sound by extracting or separating the stationary signal from the input signal.
Two-channel input signals L and R (L-channel and R-channel) are input into the FFT unit 3 from a sound source unit (not shown). In the two-channel input signals L and R, a reverberant sound (e.g. a reflected sound of a speech) is convolved into (or contained in) a direct sound (e.g. a voice such as a speech). The FFT unit 3 weights each of the two-channel input signals L and R, into which the reverberant sound has been convolved, by a window function.
After the weighting by the window function, the FFT unit 3 performs a short-time Fourier transform process on each of the input signals L and R, thereby transforming them from the time domain to the frequency domain and calculating frequency spectra of real and imaginary numbers.
Furthermore, the FFT unit 3 transforms the two-channel frequency spectra obtained by the frequency-domain conversion into amplitude spectrum signals Lfa and Rfa (first amplitude spectrum signals) and phase spectrum signals Lfp and Rfp. The FFT unit 3 then outputs the two-channel amplitude spectrum signals Lfa and Rfa to the frequency spectrum region filtering unit 4, and outputs the two-channel phase spectrum signals Lfp and Rfp to the IFFT unit 5a and the IFFT unit 5b. Because the FFT unit 3 transforms the input signals into the amplitude spectrum signals Lfa and Rfa and the phase spectrum signals Lfp and Rfp, it works as the spectrum transform unit of the present invention.
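A minimal sketch of the windowing and short-time Fourier transform step for one channel, written in plain Python with a naive DFT so that it runs without external libraries. The square-root periodic Hann window, the frame length of 16 samples and the hop of 8 samples are illustrative assumptions; the patent does not state which window function or frame size the FFT unit 3 uses. Only bins 0 to N/2 are kept, since the input is real.

```python
import cmath
import math


def sqrt_hann(n_fft):
    """Square root of a periodic Hann window (assumed choice of window)."""
    return [math.sqrt(0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_fft)) for n in range(n_fft)]


def dft_half(segment):
    """Naive DFT of a real segment, returning complex bins 0..N/2."""
    n_fft = len(segment)
    return [sum(segment[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft))
            for k in range(n_fft // 2 + 1)]


def stft(signal, n_fft=16, hop=8):
    """Weight each frame by the window, then transform it to the frequency domain."""
    win = sqrt_hann(n_fft)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        segment = [signal[start + n] * win[n] for n in range(n_fft)]
        frames.append(dft_half(segment))
    return frames


# A cosine that completes exactly two cycles per 16-sample frame.
x = [math.cos(2.0 * math.pi * 2 * n / 16) for n in range(64)]
spectra = stft(x)
peak = max(range(len(spectra[0])), key=lambda k: abs(spectra[0][k]))
print(len(spectra), len(spectra[0]), peak)  # 7 frames, 9 bins, peak at bin 2
```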
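The conversion from real and imaginary parts to amplitude and phase spectra can be sketched with Python's built-in `abs` and `cmath.phase`; the reverse direction, which the IFFT units need later, is `cmath.rect`. The function names are hypothetical, and the sketch handles one frame of one channel.

```python
import cmath


def to_amplitude_phase(spectrum):
    """Split complex bins into an amplitude spectrum (like Lfa) and a phase spectrum (like Lfp)."""
    amplitude = [abs(c) for c in spectrum]
    phase = [cmath.phase(c) for c in spectrum]  # radians in [-pi, pi]
    return amplitude, phase


def from_amplitude_phase(amplitude, phase):
    """Inverse spectrum transform: rebuild the complex bins from amplitude and phase."""
    return [cmath.rect(m, p) for m, p in zip(amplitude, phase)]


bins = [3 + 4j, -1 + 0j, 2j]
amp, ph = to_amplitude_phase(bins)
print(amp)  # [5.0, 1.0, 2.0]
print([round(p, 4) for p in ph])
rebuilt = from_amplitude_phase(amp, ph)
print(all(abs(a - b) < 1e-12 for a, b in zip(bins, rebuilt)))  # True
```

Keeping the phase untouched and filtering only the amplitude is what allows the direct and reverberant paths to share the same phase spectra Lfp and Rfp at the inverse transform.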
As shown in
The LPF unit 10 performs, on the basis of a predetermined normalized cutoff frequency, a low-pass filtering process for each spectrum (each frequency) on the amplitude spectrum signal Lfa input from the FFT unit 3. The first limiter unit 12 limits the negative-side amplitude of the amplitude spectrum signal (second amplitude spectrum signal) low-pass filtered by the LPF unit 10, bringing that amplitude to zero. The first gain unit 16 amplifies or attenuates the amplitude of the amplitude spectrum signal whose negative-side amplitude has been limited. Through this low-pass filtering of the amplitude spectrum signal Lfa, a signal Lfa1 integrated for each spectrum (integral signal: second amplitude spectrum signal) is generated.
The first subtraction unit 18 subtracts, from the amplitude spectrum signal Lfa that is input from the FFT unit 3, the integral signal Lfa1 that is input from the first gain unit 16, thereby calculating a non-stationary spectrum signal (third amplitude spectrum signal) that changes with time. Then, the second limiter unit 13 limits the negative-side amplitude of the spectrum signal (third amplitude spectrum signal) calculated by the first subtraction unit 18, thereby bringing the amplitude to zero. The signal whose amplitude has been limited by the second limiter unit 13 is output as a direct sound signal Lfd to the IFFT unit 5a.
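Because the frequency spectrum region filtering unit 4 receives the amplitude spectrum one STFT frame at a time, its direct-sound path for one channel can be sketched as a small stateful object that keeps the low-pass memory between frames. The one-pole low-pass and its exp(-2π·fc) coefficient are assumptions, `DirectPath` and its method are hypothetical names, and the gain may be a scalar or a per-bin list standing in for the first gain unit 16.

```python
import math


class DirectPath:
    """Hypothetical per-frame model of LPF unit 10, limiters 12 and 13,
    first gain unit 16 and first subtraction unit 18 for one channel."""

    def __init__(self, n_bins, fc, gain=1.0):
        self.a = math.exp(-2.0 * math.pi * fc)  # assumed one-pole coefficient
        self.state = [0.0] * n_bins             # low-pass memory, one value per bin
        self.gain = list(gain) if isinstance(gain, (list, tuple)) else [gain] * n_bins

    def process(self, lfa):
        """Consume one amplitude frame Lfa and return (Lfa1, Lfd)."""
        lfa1, lfd = [], []
        for k, x in enumerate(lfa):
            self.state[k] = self.a * self.state[k] + (1.0 - self.a) * x  # LPF unit 10
            limited = max(self.state[k], 0.0)                             # first limiter 12
            integral = self.gain[k] * limited                             # first gain 16
            lfa1.append(integral)
            lfd.append(max(x - integral, 0.0))  # subtraction 18, then second limiter 13
        return lfa1, lfd


path = DirectPath(n_bins=2, fc=0.1, gain=[1.0, 0.5])
for frame in range(3):
    lfa1, lfd = path.process([1.0, 1.0])
    print(frame, [round(v, 3) for v in lfd])
```

Keeping the state inside the object means each new frame costs one multiply-add per bin, which fits the low processing load the invention aims for.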
The HPF unit 11 performs, on the basis of a predetermined normalized cutoff frequency, a high-pass filtering process for each spectrum (each frequency) on the amplitude spectrum signal Lfa input from the FFT unit 3. The third limiter unit 14 limits the negative-side amplitude of the amplitude spectrum signal high-pass filtered by the HPF unit 11, bringing that amplitude to zero. The second gain unit 17 amplifies or attenuates the amplitude of the amplitude spectrum signal whose negative-side amplitude has been limited. Through this high-pass filtering of the amplitude spectrum signal Lfa, a signal Lfa2 differentiated for each spectrum (differential signal) is generated.
The second subtraction unit 19 subtracts the differential signal Lfa2 input from the second gain unit 17 from the amplitude spectrum signal Lfa input from the FFT unit 3, thereby calculating a stationary spectrum signal that changes only slightly with time. The fourth limiter unit 15 then limits the negative-side amplitude of the spectrum signal calculated by the second subtraction unit 19, bringing that amplitude to zero. The signal whose amplitude has been limited by the fourth limiter unit 15 is output as a reverberant sound signal Lfr to the IFFT unit 5b.
Incidentally, the normalized cutoff frequency of the low-pass filter for each amplitude spectrum in the LPF unit 10 and the normalized cutoff frequency of the high-pass filter for each amplitude spectrum in the HPF unit 11 are used to adjust the division time of the direct sound and the reverberant sound (that is, to adjust the extraction time of the direct sound and the extraction time of the reverberant sound). Moreover, by changing the amount of amplification or attenuation weighting in the first gain unit 16 and the second gain unit 17, it becomes possible to adjust the blend ratio of the direct sound and the reverberant sound (that is, to adjust the percentage of the reverberant sound contained in the direct sound and the percentage of the direct sound contained in the reverberant sound).
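The reverberant-sound path can be sketched in the same plain-Python style. Here the HPF unit 11 is modeled, as an assumption, by the first-order recursive high-pass y[n] = a·(y[n-1] + x[n] - x[n-1]) applied along time in each bin, with a = exp(-2π·fc). The filter state is kept unlimited and only its output passes through the third limiter, so a falling level (a negative high-pass output) leaves the full amplitude on the reverberant side. The function name is hypothetical.

```python
import math


def extract_reverberant(frames, fc, gain=1.0):
    """Sketch of HPF unit 11, third/fourth limiters, second gain unit 17 and
    second subtraction unit 19 on amplitude spectrum frames (one channel).

    fc is an assumed normalized cutoff in cycles per frame; the first-order
    high-pass y[n] = a * (y[n-1] + x[n] - x[n-1]) is an assumed design.
    """
    a = math.exp(-2.0 * math.pi * fc)
    n_bins = len(frames[0])
    y = [0.0] * n_bins        # unlimited high-pass state per bin
    x_prev = [0.0] * n_bins   # previous amplitude per bin
    out = []
    for frame in frames:
        lfr = []
        for k, x in enumerate(frame):
            y[k] = a * (y[k] + x - x_prev[k])   # high-pass across frames: "differential" signal
            x_prev[k] = x
            lfa2 = gain * max(y[k], 0.0)        # third limiter, then second gain unit
            lfr.append(max(x - lfa2, 0.0))      # second subtraction, then fourth limiter
        out.append(lfr)
    return out


frames = [[1.0]] * 5 + [[0.2]] * 5   # onset, held level, then a quieter tail
for i, r in enumerate(extract_reverberant(frames, fc=0.1)):
    print(i, round(r[0], 4))
```

Under these assumptions the onset frame contributes only 1 - a to the reverberant side, the output then rises toward the held level, and after the level drops the whole remaining amplitude is treated as reverberant, which is the complement of the direct-sound path.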
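The effect of the first gain unit 16 on the blend ratio can be reasoned out for a fully stationary bin, assuming a one-pole low-pass: its output converges to the input level x, so the direct output settles to max(1 - g, 0)·x for a gain g. A gain below 1 therefore leaves part of the stationary (reverberant) sound in the direct signal, while a gain of 1 or more removes it. The sketch checks this prediction against a simulation; the linear per-bin curve in `gain_curve` is purely hypothetical and only illustrates a frequency-dependent weighting.

```python
import math


def gain_curve(n_bins, g_low, g_high):
    """Hypothetical per-bin weighting that moves linearly from g_low to g_high."""
    if n_bins == 1:
        return [g_low]
    return [g_low + (g_high - g_low) * k / (n_bins - 1) for k in range(n_bins)]


def steady_state_direct(level, gains):
    """Predicted direct output for a stationary bin: max(1 - g, 0) * level."""
    return [max(1.0 - g, 0.0) * level for g in gains]


def simulate_direct(level, gain, fc, n_frames):
    """Run the assumed one-pole low-pass with a gain and return the last direct output."""
    a = math.exp(-2.0 * math.pi * fc)
    lp, direct = 0.0, 0.0
    for _ in range(n_frames):
        lp = a * lp + (1.0 - a) * level
        direct = max(level - gain * max(lp, 0.0), 0.0)
    return direct


gains = gain_curve(3, 0.5, 1.0)                   # [0.5, 0.75, 1.0]
print(steady_state_direct(2.0, gains))            # [1.0, 0.5, 0.0]
print([round(simulate_direct(2.0, g, 0.1, 500), 6) for g in gains])
```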
a) shows one example of filter coefficients for each amplitude spectrum in the LPF unit 10 according to the present embodiment.
a) is a diagram showing one example of frequency changes of an amount of weighting of amplification and attenuation of the first gain unit 16 according to the present embodiment.
Incidentally, in the example of operation shown in
What is shown in
First, for a direct sound-side signal shown in
Incidentally, the subtraction process by the first subtraction unit 18 makes the amplitude of the direct sound signal Lfd negative. However, since the amplitude has been limited by the second limiter unit 13 and brought to zero, as shown in
Then, for a reverberant sound-side signal shown in
Incidentally, the subtraction process by the second subtraction unit 19, too, makes the amplitude of the reverberant sound signal Lfr negative. However, since the amplitude has been limited by the fourth limiter unit 15 and brought to zero, as shown in
As shown in
On the basis of the amplitude spectrum signals made from the direct sound filtered by the frequency spectrum region filtering unit 4 (direct sound signals Lfd and Rfd) and the phase spectrum signals Lfp and Rfp acquired from the FFT unit 3, the IFFT unit 5a calculates frequency spectra of real and imaginary numbers and weights them by a window function. The IFFT unit 5a then performs a short-time inverse Fourier transform process and an overlap-add process on the weighted signal, thereby converting it from the frequency domain to the time domain and generating direct sound signals Ld and Rd made from the direct sound.
Similarly, on the basis of the amplitude spectrum signals made from the reverberant sound filtered by the frequency spectrum region filtering unit 4 (reverberant sound signals Lfr and Rfr) and the phase spectrum signals Lfp and Rfp acquired from the FFT unit 3, the IFFT unit 5b calculates frequency spectra of real and imaginary numbers and weights them by a window function. The IFFT unit 5b then performs a short-time inverse Fourier transform process and an overlap-add process on the weighted signal, thereby converting it from the frequency domain to the time domain and generating reverberant sound signals Lr and Rr made from the reverberant sound.
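The inverse path can be sketched as follows: amplitude and phase spectra are turned back into complex bins with `cmath.rect`, a naive inverse DFT rebuilds each real frame from bins 0 to N/2, and the frames are weighted by a synthesis window and overlap-added. Two points are assumptions rather than statements from the text: the synthesis window is applied after the inverse transform, which is the usual arrangement, and a square-root periodic Hann window with 50% overlap is used for both analysis and synthesis, because the squared windows then sum to one and samples covered by two frames are reconstructed exactly. The `analyze` function exists only so that the round trip can be checked; all names are hypothetical.

```python
import cmath
import math


def sqrt_hann(n_fft):
    """Square root of a periodic Hann window (assumed analysis/synthesis window)."""
    return [math.sqrt(0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_fft)) for n in range(n_fft)]


def analyze(signal, n_fft, hop):
    """Windowed naive DFT per frame, split into amplitude and phase (for the demo only)."""
    win = sqrt_hann(n_fft)
    amps, phases = [], []
    for start in range(0, len(signal) - n_fft + 1, hop):
        seg = [signal[start + n] * win[n] for n in range(n_fft)]
        spec = [sum(seg[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft))
                for k in range(n_fft // 2 + 1)]
        amps.append([abs(c) for c in spec])
        phases.append([cmath.phase(c) for c in spec])
    return amps, phases


def idft_real(half):
    """Naive inverse DFT of a real frame given its bins 0..N/2 (N even)."""
    n_fft = 2 * (len(half) - 1)
    full = half + [c.conjugate() for c in reversed(half[1:-1])]  # Hermitian symmetry
    return [(sum(full[k] * cmath.exp(2j * math.pi * k * n / n_fft) for k in range(n_fft)) / n_fft).real
            for n in range(n_fft)]


def synthesize(amp_frames, phase_frames, hop):
    """Inverse spectrum transform, inverse DFT, synthesis window and overlap-add."""
    n_fft = 2 * (len(amp_frames[0]) - 1)
    win = sqrt_hann(n_fft)
    out = [0.0] * (hop * (len(amp_frames) - 1) + n_fft)
    for i, (amp, ph) in enumerate(zip(amp_frames, phase_frames)):
        half = [cmath.rect(m, p) for m, p in zip(amp, ph)]  # back to real/imaginary parts
        frame = idft_real(half)
        for n in range(n_fft):
            out[i * hop + n] += frame[n] * win[n]            # window, then overlap-add
    return out


x = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(64)]
amps, phases = analyze(x, n_fft=16, hop=8)
y = synthesize(amps, phases, hop=8)
# Samples covered by two overlapping frames are reconstructed up to rounding error.
print(max(abs(y[n] - x[n]) for n in range(8, 56)))
```

In the device, `amps` would be replaced by the filtered direct or reverberant amplitude spectra while `phases` stays the unmodified input phase.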
Incidentally, the IFFT units 5a and 5b carry out, on the basis of the amplitude spectrum signals and the phase spectrum signals, a process of converting to frequency spectra of real and imaginary numbers. Therefore, the IFFT units 5a and 5b correspond to an inverse spectrum transform unit of the present invention. Furthermore, the IFFT units 5a and 5b carry out a short-time inverse Fourier transform process on a signal on which the weighting process has been performed. Therefore, the IFFT units 5a and 5b correspond to an inverse Fourier transform unit of the present invention.
In the cases of
In the cases of
In
In the case of
In the case of
In
a) is a diagram schematically showing a state in which the waveform of the direct sound shown in
In that manner, by adjusting the value of the normalized cutoff frequency, it is possible to change the extraction time of the direct sound in the input signal. Accordingly, as the value of the normalized cutoff frequency is decreased, the extraction time of the direct sound in the input signal becomes longer, enabling extraction of the direct sound in such a way as to contain not only a non-stationary sound but also a stationary sound. For example, to an extent shown in
b) is a diagram schematically showing a state in which the waveform of the reverberant sound shown in
Accordingly, by adjusting the value of the normalized cutoff frequency, it is possible to change the extraction time of the reverberant sound in the input signal. Decreasing the value of the normalized cutoff frequency reduces the effect of the direct sound contained in the reverberant sound signal, whereas increasing it yields a reverberant sound signal that contains a small amount of direct sound.
Although the present invention has been described in detail with reference to the accompanying drawings, the direct sound extraction device and the reverberant sound extraction device of the present invention are not limited to the above embodiment. It will be apparent to those having ordinary skill in the art that a number of modifications or alterations may be made to the invention as described herein. All such modifications or alterations should therefore be seen as within the scope of the present invention.
The direct sound extraction device and the reverberant sound extraction device of the present invention also make it possible to build various acoustic environments. For example, the direct sound extraction device extracts a direct sound signal from an input signal that includes a reverberant sound in a direct sound, and the direct sound signal is output from a speaker placed near a listener. Compared with outputting the input signal from a speaker as it is, this makes vocal sounds clearer and easier for the listener to hear. Moreover, the reverberant sound extraction device extracts a reverberant sound signal from the input signal, and the reverberant sound signal is output from a speaker placed distant from the listener, so that the reverberant sound is reproduced effectively.
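The trade-off on the reverberant side can be quantified under an assumed first-order high-pass model of the HPF unit 11, y[n] = a·(y[n-1] + x[n] - x[n-1]) with a = exp(-2π·fc). In the first frame after a unit step onset, the reverberant output is 1 - a, which grows as the normalized cutoff frequency increases, so a larger cutoff lets more of a sudden (direct) onset leak into the reverberant sound signal. The hypothetical helper `onset_leakage` sums this leakage over the first few frames.

```python
import math


def onset_leakage(fc, n_frames=3):
    """Sum of the reverberant output over the first n_frames after a unit step,
    using the assumed one-pole high-pass, third limiter and fourth limiter."""
    a = math.exp(-2.0 * math.pi * fc)
    y, x_prev, total = 0.0, 0.0, 0.0
    for _ in range(n_frames):
        y = a * (y + 1.0 - x_prev)             # high-pass output for input x = 1.0
        x_prev = 1.0
        total += max(1.0 - max(y, 0.0), 0.0)   # reverberant amplitude Lfr for this frame
    return total


for fc in (0.02, 0.05, 0.1, 0.2):
    print(fc, round(onset_leakage(fc), 3))
```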
1: acoustic processing device (direct sound extraction device and reverberant sound extraction device)
3: FFT unit (Fourier transform unit and spectrum transform unit)
4: frequency spectrum region filtering unit
5a, 5b: IFFT unit (inverse Fourier transform unit and inverse spectrum transform unit)
10: LPF unit (low-pass filter unit)
11: HPF unit (high-pass filter unit)
12: first limiter unit
13: second limiter unit
14: third limiter unit
15: fourth limiter unit
16: first gain unit
17: second gain unit
18: first subtraction unit
19: second subtraction unit
L, R: input signal
Lfa, Rfa: amplitude spectrum signal
Lfp, Rfp: phase spectrum signal
Lfa1: integral signal
Lfa2: differential signal
Lfd, Ld, Rfd, Rd: direct sound signal
Lfr, Lr, Rfr, Rr: reverberant sound signal
Number | Date | Country | Kind |
---|---|---|---|
2011-147021 | Jul 2011 | JP | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/JP2012/065222 | 6/14/2012 | WO | 00 | 10/21/2013 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2013/005550 | 1/10/2013 | WO | A |
Number | Name | Date | Kind |
---|---|---|---|
20080300869 | Derkx et al. | Dec 2008 | A1 |
20140177857 | Kuster | Jun 2014 | A1 |
Number | Date | Country |
---|---|---|
2010-74531 | Apr 2010 | JP |
Entry |
---|
Fukane et al., “Different Approaches of Spectral Subtraction method for Enhancing the Speech Signal in Noisy Environments”, International Journal of Scientific & Engineering Research, Mar. 1, 2011, vol. 2, Issue 5, XP055172923, see NPL Cite No. 5. |
Hirsch, “Robust Speech Recognition in Noisy and Reverberant Environments”, Speech Recognition and Understanding, Recent Advances, Jan. 1, 1992, pp. 101-106, vol. 75, Part 1, XP008175013, see NPL Cite No. 5. |
Lebart et al., “A New Method Based on Spectral Subtraction for Speech Dereverberation”, ACUSTICA, S. Hirzel Verlag, May 1, 2001, pp. 359-366, vol. 87, No. 3, XP009053193, see NPL Cite No. 5. |
Smith, “Chapter 14. Introduction to Digital Filters”, The Scientist and Engineer's Guide to Digital Signal Processing, Jan. 1, 2002, pp. 261-276, XP055172473, see NPL Cite No. 5. |
Supplementary European Search Report for corresponding EP Application No. 12807065.3, Mar. 11, 2015. |
International Search Report for corresponding International Application No. PCT/JP2012/065222, Jul. 10, 2012. |
Chinese Office Action for corresponding CN Application No. 201280015523.2, Nov. 4, 2014. |
Number | Date | Country |
---|---|---|
20140044273 A1 | Feb 2014 | US |