1. Technical Field
The present invention relates to a technique for suppressing a noise component of a signal representing a sound (hereinafter referred to as “sound signal”) in which a desired signal component (target sound component) and a noise component are mixed.
2. Background Art
Conventionally, various techniques for suppressing a noise component of a sound signal (or emphasizing a signal component) have been proposed. For example, Non-Patent Document 1 and Patent Document 1 disclose a spectrum subtraction method in which an estimated spectrum of a noise component (hereinafter referred to as “estimation noise spectrum”) is subtracted from a spectrum of a sound signal.
[Non-Patent Document 1] Ephraim Y., Malah D., “Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator”, DECEMBER 1984, IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL. 32, NO. 6, PP. 1109-1121
[Patent Document 1] JP-A-2003-131689
However, in the technique of Non-Patent Document 1 or Patent Document 1, a noise component may not be completely removed. A noise component remaining in an interval in which the strength of a signal component is low is remarkably perceived by a listener. In particular, there is a problem in that a noise component irregularly remaining on a time axis and a frequency axis is perceived as strident musical noise (birdie noise). A level of suppressing an estimation noise spectrum from a spectrum of a sound signal needs to be increased in a situation where a signal to noise ratio is low, but the musical noise is remarkably perceived as the suppression level of the estimation noise spectrum is increased.
In view of the above situation, an object of the present invention is to make it difficult to perceive a noise component (particularly, musical noise).
A noise suppressing apparatus related to one aspect of the present invention is provided for addressing the above problem. The inventive noise suppressing apparatus suppresses a noise component of a sound signal which contains the noise component and a signal component. The noise suppressing apparatus comprises: a frequency analyzing section that divides the sound signal into a plurality of frames such that adjacent frames overlap with each other along a time axis, and that computes a first spectrum of each frame; a noise suppressing section that suppresses a noise component of the first spectrum so as to provide a second spectrum of each frame in which the noise component is suppressed; a frequency specifying section that specifies a frequency of a noise component of each frame; a phase controlling section that varies a phase of the noise component corresponding to the specified frequency in the second spectrum by a different variation amount each frame; and a signal synthesizing section that combines the frames after the second spectrum of each frame is processed by the phase controlling section, such that adjacent frames overlap with each other along the time axis so as to output the sound signal.
According to the above configuration, the clearness of the noise component is reduced by varying a phase of the noise component by a different variation amount in each frame. Accordingly, this can make it difficult to perceive a noise component (for example, musical noise) as compared with a configuration in which a sound signal after suppression by a noise suppressing section is directly output.
In a case where a signal component is specified and the remaining component is then treated as a noise component, the frequency specifying section includes a section that specifies a frequency of the signal component. Moreover, the frequency specifying section may use any information to specify the frequency. For example, the frequency of the noise component can be specified on the basis of the first spectrum computed in the frequency analyzing section or the second spectrum after processing by the noise suppressing section. Alternatively, the frequency of the noise component can be specified on the basis of a spectrum obtained by means separate from the frequency analyzing section and the noise suppressing section.
The noise suppressing apparatus related to a preferred aspect of the present invention includes a variation amount setting section that sets a different variation amount according to a random number generated for each frame. The phase controlling section varies the phase of the noise component corresponding to the specified frequency by the different variation amount set by the variation amount setting section for each frame. According to the above aspect, the clearness of musical noise can be effectively reduced since phase variation amounts of the frames are set according to random numbers.
According to a preferred aspect, the phase controlling section varies the phase of the noise component corresponding to the specified frequency provided that the specified frequency falls in a predetermined frequency range of the second spectrum. The predetermined frequency range is set, for example, to include frequencies that are easily perceived by a listener. This aspect is advantageous in that the amount of processing by the phase controlling section is reduced in comparison with a configuration in which a phase is controlled for noise component frequencies over the entire frequency range. There can be adopted a configuration in which the phase controlling section selectively controls only the phase of a frequency belonging to the predetermined frequency range among the noise component frequencies specified in the frequency specifying section, or a configuration in which the frequency specifying section specifies only a frequency belonging to the predetermined frequency range.
The noise suppressing apparatus related to the present invention can be realized with hardware (an electronic circuit) such as a DSP (Digital Signal Processor) dedicated to suppressing a noise component, and can also be realized through cooperation between a general-purpose arithmetic processing unit such as a CPU (Central Processing Unit) and a program. A computer program related to one aspect of the present invention is executable by a computer for suppressing a noise component of a sound signal which contains the noise component and a signal component. The computer program comprises: a frequency analyzing process of dividing the sound signal into a plurality of frames such that adjacent frames overlap with each other along a time axis, and computing a first spectrum of each frame; a noise suppressing process of suppressing a noise component of the first spectrum so as to provide a second spectrum of each frame in which the noise component is suppressed; a frequency specifying process of specifying a frequency of a noise component of each frame; a phase controlling process of varying a phase of the noise component corresponding to the specified frequency in the second spectrum by a different variation amount each frame; and a signal synthesizing process of combining the frames after the second spectrum of each frame is processed by the phase controlling process, such that adjacent frames overlap with each other along the time axis so as to output the sound signal.
Moreover, the present invention is provided as a method for suppressing a noise component. The noise suppressing method related to one aspect of the present invention suppresses a noise component of a sound signal which contains the noise component and a signal component. The method comprises: a frequency analyzing process of dividing the sound signal into a plurality of frames such that adjacent frames overlap with each other along a time axis, and computing a first spectrum of each frame; a noise suppressing process of suppressing a noise component of the first spectrum so as to provide a second spectrum of each frame in which the noise component is suppressed; a frequency specifying process of specifying a frequency of a noise component of each frame; a phase controlling process of varying a phase of the noise component corresponding to the specified frequency in the second spectrum by a different variation amount each frame; and a signal synthesizing process of combining the frames after the second spectrum of each frame is processed by the phase controlling process, such that adjacent frames overlap with each other along the time axis so as to output the sound signal.
As shown in
The frequency analyzing section 20 is means for computing a spectrum (amplitude spectrum or power spectrum) QA for each of a plurality of frames into which a sound signal SIN is divided along the time axis. As shown in FIG. 1, the frequency analyzing section 20 includes a dividing section 22, a windowing section 24, and a converting section 26. The dividing section 22 divides the sound signal SIN into a plurality of frames and sequentially outputs the divided frames. Frames adjacent to each other partially overlap along the time axis. That is, the time difference between adjacent frames is shorter than the time length of each frame. The windowing section 24 multiplies the sound signal SIN of each frame by a window function (for example, Hamming window or Hanning window).
The converting section 26 computes a first spectrum QA of a frequency domain by performing frequency analysis of an FFT (Fast Fourier Transform) process or the like for the sound signal SIN of each frame multiplied by the window function. As the converting section 26, any means (for example, a filter bank) for converting the sound signal SIN of a time domain into a frequency domain signal is adopted. The spectrum QA is expressed as a plurality of components (hereinafter, referred to as “frequency bins”) corresponding to separate frequencies (or frequency bands).
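The division, windowing, and conversion steps described above can be sketched roughly as follows. This is only an illustrative sketch, not the embodiment itself: the function names, the frame length of 8 samples, the hop of 4 samples, and the naive transform are all assumptions made for the example, and complex frequency bins are kept (rather than only an amplitude or power spectrum) so that the phase remains available for later processing.

```python
import cmath
import math

def split_into_frames(signal, frame_len, hop):
    """Divide a signal into frames; hop < frame_len makes adjacent frames overlap."""
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]

def hann_window(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]

def dft(frame):
    """Naive O(N^2) discrete Fourier transform (an FFT computes the same values)."""
    n = len(frame)
    return [sum(frame[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]

def analyze(signal, frame_len=8, hop=4):
    """Return one complex spectrum (a list of frequency bins) per frame."""
    window = hann_window(frame_len)
    return [dft([x * w for x, w in zip(frame, window)])
            for frame in split_into_frames(signal, frame_len, hop)]
```

With a hop of half the frame length, each sample of the sound signal (apart from the edges) falls into two frames, which is the partial overlap the dividing section 22 is described as producing.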
The noise suppressing section 30 is means for suppressing the noise component from the spectrum QA computed in the frequency analyzing section 20. As shown in
The subtracting section 36 generates a second spectrum QB by subtracting the estimation noise spectrum QN from the first spectrum QA of each frame sequentially supplied from the frequency analyzing section 20. There can be adopted a configuration in which a suppression level of the noise component is suitably adjusted by subtraction from the spectrum QA after multiplying the estimation noise spectrum QN by a predetermined coefficient (suppression coefficient).
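As a hedged illustration of the subtraction with a suppression coefficient, the following sketch subtracts a scaled noise magnitude from each frequency bin while leaving the phase of the bin unchanged. The function name, the `floor` parameter, and the clamping of negative results are assumptions introduced for the example; the text above does not state how negative differences are handled.

```python
import cmath

def spectral_subtract(spectrum, noise_magnitude, coefficient=1.0, floor=0.0):
    """Subtract a scaled noise magnitude estimate from each bin, keeping the bin's phase."""
    result = []
    for bin_value, noise in zip(spectrum, noise_magnitude):
        # Assumption: magnitudes that would become negative are clamped to `floor`.
        magnitude = max(abs(bin_value) - coefficient * noise, floor)
        result.append(cmath.rect(magnitude, cmath.phase(bin_value)))
    return result
```

A larger `coefficient` removes more of the estimated noise, which corresponds to raising the suppression level for a sound signal with a low signal to noise ratio.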
A noise component that is present on average over a plurality of frames of the spectra QA is effectively suppressed by the subtraction process of the subtracting section 36. However, a local noise component incidentally occurring in an individual frame is not completely removed by the processing in the subtracting section 36. As described above, the local noise component remaining in the spectrum QB is perceived as musical noise by the listener. The frequency specifying section 40 and the phase controlling section 50 function as means for making it difficult for the listener to perceive the musical noise.
The frequency specifying section 40 is means for specifying a noise component frequency of the spectrum QB of each frame. In this embodiment, the frequency specifying section 40 classifies frequencies of a plurality of frequency bins (or frequency bands) configuring the spectrum QB into a frequency of a dominant signal component (hereinafter, referred to as “signal dominant frequency”) BS and a frequency of a dominant noise component (hereinafter, referred to as “noise dominant frequency”) BN. For the classification of the signal dominant frequency BS and the noise dominant frequency BN, for example, the following method is adopted.
A vocal sound has a property called harmonic structure in which a spectrum peak appears at a frequency of an integer multiple of a predetermined frequency (fundamental tone). The frequency specifying section 40 selects a frequency approximating each frequency (that is, the frequency of the integer multiple of the frequency of the fundamental tone) configuring the harmonic structure among a plurality of frequencies corresponding to a frequency bin as the signal dominant frequency BS, and selects each frequency other than the signal dominant frequency BS as the noise dominant frequency BN.
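The classification based on the harmonic structure can be sketched as follows. The sketch assumes that the frequency of the fundamental tone is already known and that "approximating" means lying within a fixed tolerance of an integer multiple; the function name and the parameters `bin_spacing_hz` and `tolerance_hz` are hypothetical and do not appear in the description above.

```python
def classify_bins(num_bins, bin_spacing_hz, fundamental_hz, tolerance_hz):
    """Split bin indices into signal-dominant (near a harmonic) and noise-dominant."""
    signal_bins, noise_bins = [], []
    for k in range(num_bins):
        freq = k * bin_spacing_hz
        harmonic = round(freq / fundamental_hz)  # nearest integer multiple
        near_harmonic = (harmonic >= 1
                         and abs(freq - harmonic * fundamental_hz) <= tolerance_hz)
        (signal_bins if near_harmonic else noise_bins).append(k)
    return signal_bins, noise_bins
```

For example, with bins spaced 50 Hz apart and a 200 Hz fundamental tone, the bins at 200 Hz and 400 Hz would be treated as signal dominant frequencies BS and all remaining bins as noise dominant frequencies BN.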
The phase controlling section 50 of
The phase controlling section 50 varies a phase of a component of the noise dominant frequency BN in the spectrum QB by a variation amount set for the corresponding frame in the variation amount setting section 52. That is, the phase variation amount of the component corresponding to the noise dominant frequency BN differs between the frames. A third spectrum QC, which is based on the second spectrum QB and contains each frequency bin of the signal dominant frequency BS together with each frequency bin of the noise dominant frequency BN whose phase has been controlled by the phase controlling section 50, is output from the phase controlling section 50 to the signal synthesizing section 60 on a frame by frame basis.
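A minimal sketch of this per-frame phase control is given below, assuming that the variation amount is drawn uniformly from 0 to 2π by a random number generator and that the listed bins belong to a one-sided spectrum. These choices, like the function name, are assumptions for illustration; in a full-length transform of a real signal, the mirrored negative-frequency bins would need the opposite rotation to keep the output real.

```python
import cmath
import math
import random

def randomize_noise_phase(spectrum, noise_bins, rng):
    """Rotate the phase of the noise-dominant bins by one random amount for this frame."""
    theta = rng.uniform(0.0, 2.0 * math.pi)  # variation amount for this frame (assumed range)
    rotated = list(spectrum)                 # signal-dominant bins are copied unchanged
    for k in noise_bins:
        rotated[k] = spectrum[k] * cmath.exp(-1j * theta)
    return rotated, theta
```

Calling the function once per frame with the same generator yields a different variation amount for each frame, while the magnitude of every bin is left unchanged.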
The signal synthesizing section 60 is means for synthesizing a sound signal SOUT of the time domain from the third spectrum QC of a plurality of frames. The signal synthesizing section 60 includes a converting section 62, a windowing section 64, and a summing section 66. The converting section 62 generates a time domain signal C for each frame by performing an inverse FFT process for the spectra QC. The windowing section 64 multiplies the sound signal C of each frame by a window function (for example, Hamming window or Hanning window). The summing section 66 generates a sound signal SOUT by sequentially combining sound signals C of the frames multiplied by the window function to be overlapped along the time axis. A type of window function or a window length may be common or different between the frequency analyzing section 20 and the signal synthesizing section 60.
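The inverse conversion, windowing, and summing steps can be sketched as an overlap-add procedure. The code below is an assumption-laden illustration: it uses a naive inverse transform, takes the real part of each sample, and applies no gain normalization for the window overlap, none of which is specified in the description above.

```python
import cmath
import math

def inverse_dft(spectrum):
    """Naive inverse DFT returning the real part of each time-domain sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * m / n)
                for k in range(n)).real / n
            for m in range(n)]

def overlap_add(spectra, hop, window):
    """Inverse-transform each frame, window it, and sum frames at their time offsets."""
    frame_len = len(window)
    output = [0.0] * (hop * (len(spectra) - 1) + frame_len)
    for index, spectrum in enumerate(spectra):
        frame = inverse_dft(spectrum)
        for m in range(frame_len):
            output[index * hop + m] += frame[m] * window[m]
    return output
```

Here `hop` plays the role of the time difference between adjacent frames, so samples in the overlapping region receive contributions from more than one frame.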
The arithmetic content in which the phase controlling section 50 varies a phase of the noise dominant frequency BN by a variation amount θ is expressed by the following Expression (1).
$$S'(k) = S(k)\,e^{-j\theta} \tag{1}$$
In Expression (1), S(k) corresponds to a k-th frequency bin (frequency bin of the noise dominant frequency BN), and S′(k) corresponds to a k-th frequency bin after the phase is varied.
s′(m) computed by performing an inverse FFT process for S′(k) of Expression (1) in the converting section 62 is expressed as follows. W of Expression (2) is a rotator.
As seen from Expression (2), s′(m) is a signal obtained by delaying the time domain signal s(m), which corresponds to S(k) before processing by the phase controlling section 50, on the time axis by an amount corresponding to the variation amount θ. That is, noise components remaining after processing by the noise suppressing section 30 are delayed by individual delay amounts on a frame by frame basis, and are then overlapped and added in the summing section 66. In other words, adding components of the noise dominant frequency BN after their phases have been varied by individual variation amounts θ on a frame by frame basis corresponds to applying a reverb effect to the musical noise.
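The relation between a phase rotation and a delay can be checked with a short derivation for a single frequency bin. The following is a sketch under stated assumptions rather than a reproduction of the expression referred to above: it assumes an N-point transform, a bin index k other than zero, and a rotator W = e^{-j2π/N}, and it considers only the contribution of the k-th bin to the time domain signal.

$$
s'(m) = \frac{1}{N}\,S(k)\,e^{-j\theta}\,W^{-km}
      = \frac{1}{N}\,S(k)\,W^{-k\left(m - \frac{N\theta}{2\pi k}\right)}
      = s\!\left(m - \frac{N\theta}{2\pi k}\right),
\qquad W = e^{-j2\pi/N}
$$

Under these assumptions, rotating the phase of the k-th bin by θ shifts its time domain contribution by Nθ/(2πk) samples, so the delay grows with θ and is shorter for higher frequency bins.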
As described above, since the reverb effect is applied to the musical noise, this embodiment can make it difficult for the listener to perceive the musical noise (the impression of a strident sound), in comparison with the conventional configuration in which the musical noise is clearly perceived when a voice is reproduced after processing by the noise suppressing section 30. Since noise component suppression by the noise suppressing section 30 and phase control by the phase controlling section 50 are performed separately, the perception of the musical noise is effectively reduced while the noise component is sufficiently suppressed in the noise suppressing section 30, even when a sound signal SIN whose signal to noise ratio is low is processed. Since the phase control by the phase controlling section 50 is selectively performed for only the noise dominant frequency BN in the spectrum QB, the signal component of the signal dominant frequency BS retains the same clearness as in the sound signal SIN.
The above embodiment can be variously modified.
Aspects of concrete modifications are illustrated as follows. The following aspects can be suitably combined.
In the above embodiment, a configuration for controlling a phase for a component of a noise dominant frequency BN over all frequency bands of the spectrum QB has been illustrated, but a configuration for controlling a phase for only a noise dominant frequency BN within a specific frequency band (for example, a frequency range that is easily perceived by the listener) can also be adopted. For example, the phase controlling section 50 varies a phase of a noise dominant frequency BN belonging to a predetermined frequency band among the noise dominant frequencies BN specified in the frequency specifying section 40, and does not vary a phase of a noise dominant frequency BN outside that frequency band. Alternatively, the frequency specifying section 40 can specify only the noise dominant frequency BN belonging to the predetermined frequency band. As compared with a configuration for controlling a phase for all noise dominant frequencies BN, the above configuration is advantageous in that the amount of processing by the phase controlling section 50 is reduced.
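The restriction to a predetermined frequency band can be sketched as a simple filter over the noise dominant frequencies. The function name and the band limits used in the usage note are hypothetical; the description above does not state a concrete frequency range.

```python
def bins_in_band(noise_bins, bin_spacing_hz, low_hz, high_hz):
    """Keep only the noise-dominant bins whose frequency lies inside [low_hz, high_hz]."""
    return [k for k in noise_bins if low_hz <= k * bin_spacing_hz <= high_hz]
```

For instance, with bins spaced 50 Hz apart and an assumed band of 1000 Hz to 4000 Hz, only the bins at 1000 Hz and 2000 Hz of the list `[1, 5, 20, 40, 100]` would be passed on for phase control.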
As shown in
In the above, a configuration for specifying a noise dominant frequency BN on the basis of a harmonic structure of a spectrum (a second spectrum QB of
As shown in
In the above embodiment, a configuration for subtracting an estimation noise spectrum QN from a spectrum QA has been illustrated, but the noise suppressing section 30 can suppress a noise component by various methods. For example, a configuration for performing an individual weighting process for each frequency band of the spectrum QA can be adopted. A weight value of a frequency band of a signal component and a weight value of a frequency band of a noise component are individually set such that the noise component is suppressed. Moreover, a spectrum QB can be generated by extracting only the components of the frequency band of the signal component from the spectrum QA (namely, discarding the components of the frequency band of the noise component).
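The weighting-based suppression can be sketched as follows, assuming the bins have already been classified into signal and noise bins. The function name and the default weight values are illustrative assumptions; setting the noise weight to zero corresponds to extracting only the signal band.

```python
def apply_band_weights(spectrum, signal_bins, signal_weight=1.0, noise_weight=0.1):
    """Scale each bin by a weight chosen by whether it is a signal or a noise bin."""
    signal_set = set(signal_bins)
    return [value * (signal_weight if k in signal_set else noise_weight)
            for k, value in enumerate(spectrum)]
```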
In a configuration in which a frequency band of a signal component and a frequency band of a noise component are separated from each other to suppress the noise component, a configuration is preferable in which a result of specification by the frequency specifying section 40 is shared between the noise suppressing section 30 and the phase controlling section 50. That is, as shown in
The variation amount setting section 52 can set a phase variation amount by various methods. A configuration in which the variation amount setting section 52 performs a predetermined arithmetic operation to compute a variation amount for each frame can also be adopted. For example, there can be adopted a configuration in which the phase variation amount of a frame is computed by basic arithmetic operations (for example, addition of a strength and a predetermined value) according to the strength of the spectrum QB at a noise dominant frequency BN of that frame. Moreover, one of a predetermined number of numerical values can be selected as a variation amount in an order filter process. That is, a configuration in which phase variation amounts differ between successive frames is suitably adopted in the present invention. In this regard, phase variation amounts do not need to differ between all successive frames. A configuration in which a phase variation amount is controlled in units of two or more frames can be adopted.
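Two of the alternatives mentioned above can be sketched as follows: deriving the variation amount from the strength of a noise dominant frequency by addition, and holding a variation amount fixed over a unit of several frames. The function names, the wrap into the range 0 to 2π, and the cyclic selection from a list of amounts are assumptions made for illustration only.

```python
import math

def variation_from_strength(noise_strength, offset=1.0):
    """Derive a phase variation amount from a frame's noise-bin strength by addition."""
    # Assumption: the sum is wrapped into [0, 2*pi) so it can be used as a phase.
    return (noise_strength + offset) % (2.0 * math.pi)

def variation_per_unit(frame_index, frames_per_unit, amounts):
    """Pick a variation amount that stays fixed within each unit of frames_per_unit frames."""
    return amounts[(frame_index // frames_per_unit) % len(amounts)]
```

With `frames_per_unit` set to 2, for example, frames 0 and 1 share one variation amount and frames 2 and 3 share the next, so the amount changes in units of two frames rather than every frame.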
Number | Date | Country | Kind |
---|---|---|---|
2007-100757 | Apr 2007 | JP | national |