The present invention relates to a noise suppression device for suppressing noise superposed upon a desired sound signal, and to a method and a program therefor.
A noise suppression device (hereinafter referred to as a noise suppressor) is known as a device for suppressing the background noise of an input signal composed of desired sound and background noise. The noise suppressor suppresses noise superposed upon a desired sound signal. In general, it converts the input signal into the frequency region, estimates the power spectrum of the noise component, and subtracts this estimated power spectrum from the input signal, thereby suppressing the noise mixed into the desired sound signal. In addition, estimating the power spectrum of the noise component successively allows the noise suppressor to be applied also to the suppression of non-stationary noise. One example of such a noise suppressor is the technique described in Patent document 1.
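The generic procedure just described (convert to the frequency region, estimate the noise power spectrum, subtract it) can be sketched as a textbook power spectral subtraction. This is an illustration under our own assumptions, not the method of Patent document 1: the noise power is taken as already estimated, `floor` is a hypothetical spectral floor that keeps the subtracted power from going negative, and the noisy phase is reused for resynthesis.

```python
import numpy as np


def spectral_subtraction(frame, noise_power, floor=0.01):
    """Suppress noise in one frame by subtracting an estimated noise power spectrum.

    Assumptions (not from the source): noise_power is already known, and a
    spectral floor keeps the subtracted power from going negative.
    """
    spectrum = np.fft.rfft(frame)                     # convert to the frequency region
    power = np.abs(spectrum) ** 2                     # power spectrum of the noisy input
    clean_power = np.maximum(power - noise_power, floor * power)  # subtract, then floor
    gain = np.sqrt(clean_power / np.maximum(power, 1e-12))        # amplitude gain per bin
    return np.fft.irfft(gain * spectrum, n=len(frame))  # reuse the noisy phase
```

With `noise_power=0.0` the frame passes through unchanged; with a very large noise estimate every bin is clamped to the floor, so the output is the input scaled by the square root of `floor` (0.1 here).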
A configuration of the noise suppressor disclosed in the Patent document 1 will be explained by making a reference to
Patent document 1: JP-P2002-204175A
However, in the conventional configuration explained by employing
The present invention has therefore been accomplished in view of the above-mentioned problems, and an object thereof is to provide a noise suppression device capable of realizing high-quality noise suppression with a small amount of computation, as well as a method and a program therefor.
To solve the above-mentioned problems, the present invention provides a noise suppression device comprising: a conversion means for converting an input signal into a frequency region signal for each predetermined first frame; a frame generation means for generating a second frame that differs from said first frame; a representative frequency region signal generation means for generating a representative frequency region signal from said frequency region signals of the first frames included in said second frame; and a noise suppression degree calculation means for obtaining a degree of noise suppression of said second frame based upon said representative frequency region signal.
To solve the above-mentioned problems, the present invention also provides a noise suppression method comprising: a conversion step of converting an input signal into a frequency region signal for each predetermined first frame; a frame generation step of generating a second frame that differs from said first frame; a representative frequency region signal generation step of generating a representative frequency region signal from said frequency region signals of the first frames included in said second frame; and a noise suppression degree calculation step of obtaining a degree of noise suppression of said second frame based upon said representative frequency region signal.
To solve the above-mentioned problems, the present invention further provides a noise suppression program for causing a computer to execute: a conversion process of converting an input signal into a frequency region signal for each predetermined first frame; a frame generation process of generating a second frame that differs from said first frame; a representative frequency region signal generation process of generating a representative frequency region signal from said frequency region signals of the first frames included in said second frame; and a noise suppression degree calculation process of obtaining a degree of noise suppression of said second frame based upon said representative frequency region signal.
In the configuration of the present invention, the noise suppression information is calculated for each processing frame (second frame) into which two or more converted frames (first frames) are integrated. Consequently, high-quality noise suppression can be realized with a small amount of computation.
Embodiments of the noise suppression device of the present invention will now be explained in detail with reference to the accompanying drawings.
A configuration of the best mode of the present invention will be explained by making a reference to
The input signal, which is a degraded sound signal, is supplied to the input terminal 1 as a sequence of sample values. The input signal samples are supplied to the converted frame division unit 2 and divided into converted frames of a predetermined length. The converted frame division unit 2 outputs the input signal samples of the n-th converted frame to the conversion unit 5. The conversion unit 5 converts the input signal samples of the n-th converted frame into a degraded sound spectrum Y_n(k), which is a frequency region signal. Here, n is the index of the converted frame in the time direction and k is the index in the frequency direction; the input signal samples of the n-th converted frame are divided into K frequency bands (0≦k<K). The conversion unit 5 separates the degraded sound spectrum Y_n(k) into phase and amplitude, outputs the phase arg Y_n(k) to the inverse conversion unit 6, and outputs the degraded sound power spectrum |Y_n(k)|^2 to the processing frame information generation unit 7, the representative frequency region signal generation unit 8, and the noise suppression processing unit 10.
To convert the input signal samples of the n-th converted frame into the degraded sound spectrum Y_n(k), the conversion unit 5 applies a frequency conversion to the input signal samples divided into converted frames. Known examples of the frequency conversion include the Fourier transform, the cosine transform, and the KL (Karhunen-Loève) transform. The specific computation of these transforms and their properties are described in Non-patent document 1 (DIGITAL CODING OF WAVEFORMS, PRINCIPLES AND APPLICATIONS TO SPEECH AND VIDEO, PRENTICE-HALL, 1990). It is also widely known that other transforms, such as the Hadamard transform, the Haar transform, and the wavelet transform, can be employed.
The conversion unit 5 can also apply the foregoing transforms to the input signal samples of the converted frame after weighting them with a window function W. Known window functions include the Hamming window, the Hanning (Hann) window, the Kaiser window, and the Blackman window, and more complicated window functions can also be employed. These window functions are described in Non-patent document 2 (DIGITAL SIGNAL PROCESSING, PRENTICE-HALL, 1975) and Non-patent document 3 (MULTIRATE SYSTEMS AND FILTER BANKS, PRENTICE-HALL, 1993). It is also common practice to partially overlap two or more consecutive converted frames when windowing; in this case, the frequency conversion is applied to the overlapped, windowed signal. Blocking with overlap and the associated conversion are described in Non-patent document 2.
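The framing, windowing, and transform steps above, together with the phase/power split performed by the conversion unit 5, can be sketched as follows. The frame length of 256, the hop of 128 (50% overlap), the Hann window, and the use of a real FFT are illustrative assumptions; the function name `analyze` is hypothetical, and the band-division filter bank alternative is not shown. The input is assumed to be at least one frame long.

```python
import numpy as np


def analyze(x, frame_len=256, hop=128):
    """Split x into overlapping converted frames; return arg Y_n(k) and |Y_n(k)|^2.

    Assumed parameters: Hann window, 50 % overlap, real FFT, so that
    K = frame_len // 2 + 1 frequency bands per converted frame.
    """
    window = np.hanning(frame_len)                        # window function W
    n_frames = 1 + (len(x) - frame_len) // hop            # requires len(x) >= frame_len
    frames = np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])
    Y = np.fft.rfft(frames * window, axis=1)              # degraded sound spectrum Y_n(k)
    return np.angle(Y), np.abs(Y) ** 2                    # phase and power spectrum
```

The phase array would go to the inverse conversion and the power array to the later stages, mirroring the split described for the conversion unit 5.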
In addition, the conversion unit 5 may be configured as a band-division filter bank to calculate the degraded sound spectrum Y_n(k). The band-division filter bank consists of a plurality of band-pass filters, whose frequency bands may be spaced at equal or unequal intervals. Unequal-interval band division makes it possible to vary the time resolution from band to band: dividing the low-frequency region into narrow bands lowers the time resolution there, whereas dividing the high-frequency region into wide bands raises it. Typical examples of unequal-interval division are the octave division, in which the bandwidth halves successively toward the low-frequency region, and the critical band division, which corresponds to the auditory characteristics of human beings. The conversion unit 5 may also divide the signal into equal-interval frequency bands and then employ a hybrid filter bank that further divides only the low-frequency region, in order to enhance the frequency resolution there. The band-division filter bank and its design method are described in Non-patent document 3.
The processing frame information generation unit 7 calculates processing frame information, which is used to generate the representative degraded sound power spectrum (described later) from the degraded sound power spectrum. The processing frame information includes information for integrating a plurality of degraded sound power spectra in the time direction and in the frequency direction. The processing frame information generation unit 7 being included in
The converted frame energy calculation unit 50 obtains the converted frame energy E(n) of each converted frame from the degraded sound power spectrum |Y_n(k)|^2 and outputs it to the time group generation unit 51. The converted frame energy E(n) is given by the following equation.

$$E(n) = \sum_{k=0}^{K-1} |Y_n(k)|^2 \qquad \text{[Numerical equation 1]}$$
Here, the converted frame energy is defined as the sum of the degraded sound power spectra over all frequency bands. However, the converted frame energy may instead be calculated from the degraded sound power spectrum of only some of the frequency bands. For example, it may be calculated only from the bands in which the power of the sound signal is concentrated, which allows the processing frames (described later) to be generated with high accuracy. Further, excluding the low-frequency bands from the calculation removes the influence of the noise component, which tends to concentrate in the low-frequency region.
In addition, the degraded sound power spectrum may be weighted in the frequency direction and the sum of the weighted values used as the converted frame energy. The calculated converted frame energy may also be smoothed in the time direction.
The calculated converted frame energy can also be modified according to auditory characteristics. For example, it is known that human perception of sound intensity is approximately proportional to the logarithm of the intensity. Using this property, the logarithm of the energy can be defined as the converted frame energy. The converted frame energy can also be modified with a more complicated function or a polynomial rather than a simple logarithm; for example, a polynomial approximating the logarithm contributes to a reduction in the amount of computation.
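The variants of the converted frame energy described above (all bands, a subset of bands, frequency weighting, time smoothing, logarithmic scaling) can be combined in one small function. This is a sketch under stated assumptions: the smoothing is taken to be a first-order recursive average with a hypothetical factor `alpha`, the logarithm is expressed in dB, and the parameter names are illustrative rather than the device's own.

```python
import numpy as np


def frame_energy(power, band=None, weights=None, use_log=False, alpha=None):
    """Converted frame energy E(n) from a (frames x bands) power spectrum |Y_n(k)|^2.

    band=(lo, hi) keeps only bands lo..hi-1; weights must match that band count.
    alpha (hypothetical) enables first-order recursive smoothing in time.
    """
    P = np.asarray(power, dtype=float)
    if band is not None:
        P = P[:, band[0]:band[1]]                  # use only part of the frequency bands
    if weights is not None:
        P = P * np.asarray(weights, dtype=float)   # weighting in the frequency direction
    E = P.sum(axis=1)                              # Numerical equation 1 when no options are set
    if alpha is not None:
        for n in range(1, len(E)):                 # smoothing in the time direction
            E[n] = alpha * E[n - 1] + (1 - alpha) * E[n]
    if use_log:
        E = 10 * np.log10(np.maximum(E, 1e-12))    # logarithmic (dB) scale
    return E
```

For example, `frame_energy([[1, 2, 3], [4, 5, 6]], band=(1, 3))` sums only the last two bands of each frame and gives `[5, 11]`.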
The time group generation unit 51 decides the delimiter positions of the processing frames, which are used to generate the representative degraded sound power spectrum (described later), based upon the converted frame energy, and outputs the processing frames generated from these delimiter positions to the frequency group generation unit 52. One method of deciding the delimiter position of the processing frame is to base it on the change in the converted frame energy.
An example of a change in the converted frame energy will be explained by making a reference to
One method of detecting a location at which the converted frame energy changes greatly is to determine that a large change has occurred when the following equation is satisfied for a predetermined threshold TH_A.
$$E(n_L) - E(n_L - 1) > TH_A \qquad \text{[Numerical equation 2]}$$
In this method, the delimiter position of the processing frame is decided so that the processing frame is divided at n = n_L. The threshold TH_A may also be varied. For example, TH_A can be changed adaptively based upon the average value or the variance of the converted frame energies so that the proportion of converted frames satisfying Numerical equation 2 stays constant within a block of fixed length. This reduces the variation in the number of times the noise suppression information, described later, has to be calculated.
Another method of generating the delimiter position of the processing frame calculates the amount of change not only from the energies of two neighboring converted frames but from a plurality of converted frame energies. For example, using three converted frame energies, the delimiter position can be decided so that the processing frame is divided at n = n_L when the following conditional equation is satisfied.
$$\bigl(E(n_L) - E(n_L - 1)\bigr) \cdot \bigl(E(n_L) - E(n_L - 2)\bigr) > TH_B \qquad \text{[Numerical equation 3]}$$
Here, TH_B is a threshold, which may also be varied. For example, TH_B can be changed adaptively based upon the average value or the variance of the converted frame energies so that the proportion of converted frames satisfying Numerical equation 3 stays constant within a block of fixed length. This likewise reduces the variation in the number of times the noise suppression information, described later, has to be calculated.
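Numerical equations 2 and 3 can be checked frame by frame as below. The source only says the thresholds may be adapted from the average or variance of the energies in a block; the specific rule here (TH_A = c_a times the standard deviation, TH_B = c_b times the variance, so that TH_B has the squared units of the product in equation 3) is a hypothetical choice, and `c_a`, `c_b` are illustrative parameters.

```python
import numpy as np


def detect_boundaries(E, c_a=1.0, c_b=1.0):
    """Return the frames n_L at which a processing frame is divided (equations 2 and 3).

    Hypothetical adaptive thresholds: TH_A = c_a * std(E), TH_B = c_b * var(E).
    """
    E = np.asarray(E, dtype=float)
    th_a = c_a * E.std()
    th_b = c_b * E.var()
    boundaries = []
    for n in range(1, len(E)):
        eq2 = E[n] - E[n - 1] > th_a                                   # Numerical equation 2
        eq3 = n >= 2 and (E[n] - E[n - 1]) * (E[n] - E[n - 2]) > th_b  # Numerical equation 3
        if eq2 or eq3:
            boundaries.append(n)
    return boundaries
```

For the energies `[0, 0, 0, 10, 10, 10]` the only boundary is the jump at n = 3; a constant block yields no boundaries.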
Yet another method decides the delimiter position so that the difference between the maximum and minimum values of the converted frame energy within the processing frame is equal to or less than a predetermined threshold. In this case, the signals within a processing frame have roughly equal energy, and the noise suppression information, described later, can be calculated with high accuracy. Alternatively, processing frames of a fixed length may be generated starting from the location at which the converted frame energy changed greatly. This reduces the amount of computation, because a change in energy has to be evaluated fewer times.
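The range-constrained method can be sketched as a single greedy pass: frames are added to the current processing frame until the spread between its largest and smallest energy would exceed the threshold, at which point a new processing frame starts. The greedy strategy and the name `max_range` are assumptions for illustration; other segmentations could also satisfy the same constraint.

```python
def range_segments(E, max_range):
    """Start indices of processing frames with max(E) - min(E) <= max_range in each."""
    starts = [0]
    lo = hi = E[0]
    for n in range(1, len(E)):
        lo, hi = min(lo, E[n]), max(hi, E[n])
        if hi - lo > max_range:      # adding frame n would violate the constraint
            starts.append(n)         # so a new processing frame begins at n
            lo = hi = E[n]
    return starts
```

For example, `range_segments([0, 1, 2, 10, 11, 30], 2)` returns `[0, 3, 5]`.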
The above description calculates the converted frame energy for each converted frame and generates the delimiter position of the processing frame from it. With the same method, the converted frame energy can instead be calculated in units obtained by integrating a plurality of converted frames, and the delimiter position of the processing frame can be generated from that energy. This reduces the amount of computation in the time group generation unit 51, because the converted frame energy no longer has to be calculated for every converted frame. It is also possible to analyze the change in the signal for each frequency band and decide the delimiter position of the processing frame accordingly, which allows a degree of importance assigned to each frequency band to be reflected. For example, assigning a large importance to the band that contains the sound signal makes changes in that band more readily reflected in the delimiter position.
A feature of the degraded sound spectrum other than the converted frame energy may also be used as the index for deciding the delimiter position of the processing frame, for example the psychoacoustic (perceptual) entropy. This method actively exploits psychoacoustic masking, a property of human hearing whereby a quiet sound close to a loud sound is hard to hear. Using masking, the delimiter position is decided so that the processing frame is divided where the sound components that a human listener can hear change. This generates processing frames that reflect human auditory characteristics, and the noise suppression information, described later, can be calculated with high accuracy.
The above methods may be used not only individually but also in combination when deciding the delimiter position of the processing frame.
Herein, one example of a processing operation of the time group generation unit 51 will be explained by making a reference to a flowchart of
The time group generation unit 51 calculates the variance of the converted frame energies of the N converted frames within a predetermined block (S001). It then determines whether any of the N converted frames within the block satisfies Numerical equation 2 or Numerical equation 3 (S002). If at least one converted frame satisfies either equation, the process proceeds to S007; if none does, the process proceeds to S003.
In S003, the time group generation unit 51 determines whether the calculated variance is larger than a threshold Thr1. If it is larger, the process proceeds to S007; otherwise, the process proceeds to S004. In S004, the time group generation unit 51 determines whether the variance is larger than a threshold Thr2; if it is smaller, the process proceeds to S005.
In S005, the N converted frames are defined as one processing frame. Here, n0 and n1 each indicate a delimiter position of the processing frame, and Kosu indicates how many processing frames have been generated from the N converted frames. If, on the other hand, the variance is larger than Thr2 in S004, the process proceeds to S006, where the N converted frames are defined as two processing frames. The delimiter position is set so that the two processing frames have the same length, that is, n1 = N/2.
Next, the operation from S007 onward is explained. In S007, the time group generation unit 51 initializes the necessary variables and then examines the N converted frames in order from n = 0 to n = N−1, determining for each whether its position becomes a delimiter position of the processing frame. In S008, the unit determines whether the absolute value of the difference between the minimum and maximum energies of the converted frames included in the current processing frame is larger than a predetermined threshold. If it is larger, the process proceeds to S010; if it is smaller, the process proceeds to S009. In S009, the unit determines whether converted frame n satisfies Numerical equation 2 or Numerical equation 3; if it does, the process proceeds to S010, and if it does not, the process proceeds to S011. In S010, the unit sets a delimiter position so that the processing frame is divided at converted frame n, increases the number of processing frames by one, and proceeds to S011. In S011, the unit determines whether the examination has reached converted frame N−1. If converted frames remain to be examined, it sets n = n + 1 (S012) and returns to S008. When all N converted frames have been examined, the generation of processing frames is finished.
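The flowchart steps S001 to S012 can be sketched as one function that returns the start index of every processing frame in the block, so that Kosu equals the length of the returned list. Assumptions: variances exactly equal to Thr1 or Thr2 take the "not larger" branch (the source does not specify ties), the population variance is used, and the helper and parameter names are hypothetical.

```python
import numpy as np


def large_change(E, n, th_a, th_b):
    """Hypothetical helper: True if frame n satisfies Numerical equation 2 or 3."""
    eq2 = n >= 1 and E[n] - E[n - 1] > th_a
    eq3 = n >= 2 and (E[n] - E[n - 1]) * (E[n] - E[n - 2]) > th_b
    return bool(eq2 or eq3)


def time_groups(E, th_a, th_b, thr1, thr2, range_thr):
    """Start indices of the processing frames for one block of N converted frames."""
    E = np.asarray(E, dtype=float)
    N = len(E)
    variance = E.var()                                                  # S001
    any_change = any(large_change(E, n, th_a, th_b) for n in range(N))  # S002
    if not any_change and variance <= thr1:                             # S003
        if variance <= thr2:                                            # S004
            return [0]                                                  # S005: one frame
        return [0, N // 2]                                              # S006: two equal halves
    starts = [0]                                                        # S007
    for n in range(1, N):            # n = 0 can never split, so start at 1 (S011/S012 loop)
        seg = E[starts[-1]:n + 1]                                       # current frame up to n
        if seg.max() - seg.min() > range_thr or large_change(E, n, th_a, th_b):  # S008, S009
            starts.append(n)                                            # S010
    return starts
```

A flat block collapses to one processing frame, a block with moderate variance but no sharp change is halved, and a block with a jump is divided at the jump.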
Above, the explanation of one example of the processing operation of the time group generation unit 51 by making a reference to
The frequency group generation unit 52 integrates frequency bands within each processing frame supplied by the time group generation unit 51, and decides the delimiter positions of the integrated frequency bands used to calculate the representative degraded sound power spectrum (described later). The frequency group generation unit 52 then outputs the delimiter positions of the processing frames and of the integrated frequency bands to the representative frequency region signal generation unit 8 as processing frame information.
A situation in which the frequency bands are integrated will be explained by making a reference to
At this time, more bands may be integrated into one in the high-frequency region than in the low-frequency region. That is, more frequency components are integrated into one band toward higher frequencies, which yields an unequal-interval division. Known examples of such unequal-interval division include the octave division, in which the bandwidth widens by powers of 2 toward the high-frequency side, and the division into critical bands, which are based upon the auditory characteristics of human beings. Division according to critical bands is particularly widely employed because it is highly consistent with human auditory characteristics. In addition, integrating the frequency bands into groups narrower than the critical band prevents deterioration of the noise suppression performance.
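An octave-style integration of K frequency bands can be expressed as a list of delimiter positions whose spacing doubles toward high frequencies. In this sketch the first integrated band is assumed to hold a single band (the DC bin) and the last band is truncated at K; both choices, and the function name, are illustrative. K is assumed to be at least 1.

```python
def octave_band_edges(K):
    """Delimiters of integrated bands whose widths double toward high frequencies."""
    edges = [0, 1]                   # assumed: the first integrated band is band 0 alone
    width = 1
    while edges[-1] < K:
        edges.append(min(edges[-1] + width, K))   # last band is truncated at K
        width *= 2
    return edges
```

For K = 16 this gives `[0, 1, 2, 4, 8, 16]`, i.e. integrated bands of widths 1, 1, 2, 4, and 8.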
Next, a second configuration example of the processing frame information generation unit 7 will be explained in detail with reference to
From the degraded sound power spectrum and the processing frame, the frequency energy calculation unit 53 obtains a frequency energy Ef_L(k), which is the sum of the degraded sound power spectra of the same frequency band within the processing frame, and outputs it to the frequency group generation unit 54. That is, the frequency energy Ef_L(k) of the L-th processing frame, which consists of the converted frames n_L ≦ n < n_{L+1}, is given by the following equation.

$$\mathit{Ef}_L(k) = \sum_{n=n_L}^{n_{L+1}-1} |Y_n(k)|^2$$
The frequency group generation unit 54 integrates, for each processing frame, frequency bands whose degraded sound power spectra have similar features, based upon the processing frame supplied from the time group generation unit 51 and the frequency energy Ef_L(k) supplied from the frequency energy calculation unit 53. In this way, the frequency group generation unit 54 decides the delimiter positions of the integrated frequency bands.
A situation in which the frequency bands are integrated in each processing frame will be explained by making a reference to
The delimiter positions of the integrated frequency bands are decided so that an integrated band is divided where the frequency energy changes greatly. For example, the frequency bands may be integrated by applying the energy-change method described for the time group generation unit 51 in the frequency direction. This configuration realizes the most suitable integration of frequency bands in each processing frame. Consequently, when the signal changes little, division into an unnecessarily large number of integrated bands is avoided, and the amount of computation can be reduced.
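Applying the energy-change idea in the frequency direction might look as follows: compute Ef_L(k) over the converted frames of one processing frame, then start a new integrated band wherever neighbouring bands differ by more than a threshold. Using the absolute difference (so that both rises and falls split a band) and the name `th` are assumptions; the source only states that bands are divided where the frequency energy changes greatly.

```python
import numpy as np


def frequency_groups(power, start, stop, th):
    """Integrated-band delimiters for the processing frame of converted frames start..stop-1."""
    Ef = np.asarray(power, dtype=float)[start:stop].sum(axis=0)   # frequency energy Ef_L(k)
    edges = [0]
    for k in range(1, len(Ef)):
        if abs(Ef[k] - Ef[k - 1]) > th:   # hypothetical rule: large change between bands
            edges.append(k)
    edges.append(len(Ef))
    return edges
```

Because each processing frame gets its own delimiters, a frame whose spectrum is flat ends up with a single integrated band.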
This concludes the explanation of the second configuration example of the processing frame information generation unit 7.
Configuring the processing frame information generation unit 7 as described above makes it possible to generate processing frames into each of which a plurality of converted frames are integrated. The converted frames included in a processing frame resemble each other in the features of their degraded sound power spectra, so the items of noise suppression information (described later) calculated for each of those converted frames would take similar values. Consequently, noise suppression using noise suppression information calculated for each converted frame differs little in effect from noise suppression using noise suppression information calculated for each processing frame, and using the latter hardly degrades the noise suppression. Thus, calculating the noise suppression information for each processing frame reduces the amount of computation with little influence on the final noise suppression result.
This concludes the explanation of the processing frame information generation unit 7.
The representative frequency region signal generation unit 8 generates a representative degraded sound power spectrum from the processing frame information and the degraded sound power spectrum, and outputs it to the noise suppression information calculation unit 9. One method of generating the representative degraded sound power spectrum is to use the average of the degraded sound power spectra included in the processing frame and in the integrated frequency band. In this case, the representative degraded sound power spectrum |Z_L(m)|^2 (m = 0, ..., M_L−1) of the L-th processing frame is given by the following equation.

$$|Z_L(m)|^2 = \frac{1}{(n_{L+1} - n_L)(k_{m+1} - k_m)} \sum_{n=n_L}^{n_{L+1}-1} \sum_{k=k_m}^{k_{m+1}-1} |Y_n(k)|^2$$

Here, the L-th processing frame consists of the converted frames n_L ≦ n < n_{L+1}, the m-th integrated frequency band consists of the frequency bands k_m ≦ k < k_{m+1}, and M_L is the number of integrated frequency bands in the L-th processing frame.
That is, in
Besides averaging all of the degraded sound power spectra, the average may be taken after excluding the largest and the smallest degraded sound power spectra. This removes outlying values, which stabilizes the representative degraded sound power spectrum and allows the degree of noise suppression, described later, to be calculated with high accuracy.
Instead of an average, a specific degraded sound power spectrum may also be used as the representative degraded sound power spectrum. For example, when the maximum of the degraded sound power spectra included in the processing frame and in the integrated frequency band is used as the representative degraded sound power spectrum, the noise component is estimated at a high level when the noise suppression information, described later, is calculated. In this case, the residual noise in the noise-suppressed enhanced sound can be made small. Conversely, when the minimum of the degraded sound power spectra included in the processing frame and in the integrated frequency band is used, the noise component is estimated at a low level when the noise suppression information is calculated. In this case, the distortion of the noise-suppressed enhanced sound can be made small.
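The four ways of forming |Z_L(m)|^2 described above (plain average, average without the largest and smallest values, maximum, minimum) can be sketched over one grid of processing frames and integrated bands. For simplicity the sketch assumes the same band delimiters for every processing frame (as in the first configuration example); `t_edges` and `f_edges` are hypothetical lists of delimiters that include both ends, and the trimmed variant drops exactly one largest and one smallest value.

```python
import numpy as np


def representative_power(power, t_edges, f_edges, method="mean"):
    """|Z_L(m)|^2 for each processing frame L and integrated band m."""
    P = np.asarray(power, dtype=float)
    Z = np.empty((len(t_edges) - 1, len(f_edges) - 1))
    for L in range(len(t_edges) - 1):
        for m in range(len(f_edges) - 1):
            cell = np.sort(P[t_edges[L]:t_edges[L + 1], f_edges[m]:f_edges[m + 1]], axis=None)
            if method == "mean":                 # average of all spectra in the cell
                Z[L, m] = cell.mean()
            elif method == "trimmed":            # drop the largest and smallest values
                Z[L, m] = cell[1:-1].mean() if cell.size > 2 else cell.mean()
            elif method == "max":                # noise estimated high: less residual noise
                Z[L, m] = cell[-1]
            elif method == "min":                # noise estimated low: less distortion
                Z[L, m] = cell[0]
            else:
                raise ValueError(f"unknown method: {method}")
    return Z
```

With `method="mean"` each entry is exactly the equation for |Z_L(m)|^2 given above; the other options trade residual noise against distortion as described in the text.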
The noise suppression information calculation unit 9 obtains, for each representative degraded sound power spectrum, noise suppression information indicating one degree of noise suppression, and outputs it to the noise suppression processing unit 10. That is, the noise suppression information calculation unit 9 calculates noise suppression information common to a plurality of degraded sound power spectra. This is equivalent to calculating one item of noise suppression information C_L(m) (m = 0, ..., M_L−1) per grid cell encircled by gray in
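Because one value C_L(m) serves every converted frame and frequency band inside its grid cell, the per-cell values only need to be spread back over the (n, k) grid before use. The sketch below does this with `np.repeat`; the delimiter lists are hypothetical inputs, and the idea that the noise suppression processing unit 10 then multiplies this expanded array with the degraded sound spectrum is an assumption, not something stated here.

```python
import numpy as np


def expand_gains(C, t_edges, f_edges):
    """Copy each C_L(m) to every converted frame n and band k inside its grid cell."""
    C = np.asarray(C, dtype=float)
    rows = np.repeat(C, np.diff(t_edges), axis=0)       # one row per converted frame
    return np.repeat(rows, np.diff(f_edges), axis=1)    # one column per frequency band
```

Only L × M values are computed instead of N × K, which is where the saving in computation comes from.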
A first configuration example of the noise suppression information calculation unit 9 will be explained in detail with reference to
The noise estimation unit 300 estimates the energy of the noise component included in the degraded sound based upon the representative degraded sound power spectrum, and outputs it to the noise suppression coefficient generation unit 601 as an estimated noise power spectrum. The noise suppression coefficient generation unit 601 obtains a suppression coefficient based upon the representative degraded sound power spectrum, the estimated noise power spectrum, and an amended suppression coefficient (described later), and estimates an inherent SNR indicating the ratio between the sound and the noise included in the input signal; the estimated inherent SNR is described later. The noise suppression coefficient generation unit 601 outputs the suppression coefficient and the estimated inherent SNR to the suppression coefficient amendment unit 1501. The suppression coefficient amendment unit 1501 amends the input suppression coefficient based upon the estimated inherent SNR to obtain the amended suppression coefficient, outputs the amended suppression coefficient as noise suppression information, and at the same time feeds it back to the noise suppression coefficient generation unit 601.
A configuration example of the noise estimation unit 300 being included in
A configuration of the estimated noise calculation unit 310 being included in
On the other hand, the count value, the representative degraded sound power spectrum, and the estimated noise power spectrum are inputted into the update determination unit 400. The update determination unit 400 outputs a signal of 1 or 0 to the counter 480, the switch 430, and the shift register 440. The update determination unit 400 outputs 1 at any time until the count value being inputted reaches a pre-set value. Further, the update determination unit 400 outputs 1 when it has been determined that the inputted degraded sound signal is noise after the count value reaches the pre-set value, and outputs 0 in the cases other than it. The switch 430 closes the circuit when the signal inputted from the update determination unit 400 is 1, and opens the circuit when it is 0. The counter 480 increase the count value when the signal inputted from the update determination unit 400 is 1, and does not change the count value when it is 0. The shift register 440 incorporates the signal sample being inputted from the switch 430 by one (1) sample when the signal inputted from the update determination unit 400 is 1. In addition, the shift register 440 shifts the storage value of the internal register to the neighboring register simultaneously therewith the incorporation of one (1) sample. The output of the counter 480 and the output of the register length storage unit 410 are inputted into the minimum value selection unit 460.
The minimum value selection unit 460 selects the smaller of the input count value and register length, and outputs it to the division unit 470. The division unit 470 divides the addition value input from the adder 450 by the smaller of the count value and the register length. It outputs the resulting quotient as an estimated noise power spectrum λL(m). Defining Bl(m) (l=0, 1, . . . , P−1) as the sample values of the weighted degraded sound power spectrum saved in the shift register 440, λL(m) is given by the following equation.
$\lambda_L(m) = \frac{1}{P}\sum_{l=0}^{P-1} B_l(m)$
Here, P is the smaller of the count value and the register length. The count value increases monotonically from zero, so the addition value is first divided by the count value. Once the count value exceeds the register length, the addition value is divided by the register length instead, which yields the average of the values stored in the shift register.

At first, the shift register 440 does not yet hold enough values, so the division uses the number of registers in which a value has actually been stored. This number equals the count value while the count value is smaller than the register length. It equals the register length once the count value becomes larger.
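The counter, switch, shift register, and minimum-value division described above can be sketched in Python as follows. This is a minimal illustration, not the device's implementation. The class and parameter names are hypothetical, and the noise/non-noise decision of the update determination unit 400 is assumed to be supplied from outside as the flag `is_noise`.

```python
from collections import deque


class EstimatedNoiseCalculator:
    """Hypothetical sketch of the shift-register noise averaging (units 400 to 480)."""

    def __init__(self, register_length, preset_count):
        self.register_length = register_length  # register length storage unit 410
        self.preset_count = preset_count        # count the counter must reach first
        self.count = 0                          # counter 480
        # A deque with maxlen drops its oldest entry when a new one is appended,
        # which mimics shifting each stored value to the neighboring register.
        self.register = deque(maxlen=register_length)
        self.estimate = None

    def process(self, weighted_spectrum, is_noise):
        """weighted_spectrum: per-band values B(m); is_noise: assumed external decision."""
        # Update determination unit 400: always update until the preset count,
        # afterwards only when the frame is judged to be noise.
        update = self.count < self.preset_count or is_noise
        if update:
            self.register.append(list(weighted_spectrum))  # switch 430 closed
            self.count += 1
        if self.register:
            # Minimum value selection unit 460: P = min(count, register length).
            p = min(self.count, self.register_length)
            # Adder 450 followed by division unit 470, band by band.
            self.estimate = [sum(col) / p for col in zip(*self.register)]
        return self.estimate
```

Because the count is incremented exactly when a sample is stored, P always equals the number of registers actually holding values. This is the property the text relies on.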
A configuration of the update determination unit 400 included in
A configuration of the weighted degraded sound calculation unit 320 included in the noise estimation unit 300 will be explained in detail with reference to
Here, λL−1(m) is the estimated noise power spectrum stored one processing frame earlier.
The non-linear processing unit 3204 calculates a weight coefficient vector using the SNR input from the SNR calculation unit 3202, and outputs the weight coefficient vector to the multiplier 3203. The multiplier 3203 calculates a product of the representative degraded sound power spectrum being inputted from the representative frequency region signal generation unit 8 of
The non-linear processing unit 3204 has a non-linear function that outputs a real value corresponding to each input value. An example of the non-linear function is shown in
Here, a and b are arbitrary real numbers.
The non-linear processing unit 3204 processes the SNR input from the SNR calculation unit 3202 with the non-linear function to obtain the weight coefficient, and outputs it to the multiplier 3203. That is, the non-linear processing unit 3204 outputs a weight coefficient between 0 and 1 according to the SNR: 1 when the SNR is small, and 0 when the SNR is large.
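The weighting step can be sketched in Python as follows. The non-linear function's equation is not reproduced in this text, so the sketch assumes a piecewise-linear shape: 1 for SNR ≤ a, 0 for SNR ≥ b, and linear in between (with a < b). It also assumes the SNR is the ratio of the representative degraded sound power spectrum to λL−1(m). The defaults a=1.0 and b=4.0 are hypothetical.

```python
def nonlinear_weight(snr, a, b):
    """Hypothetical piecewise-linear weight: 1 at low SNR, 0 at high SNR (assumes a < b)."""
    if snr <= a:
        return 1.0
    if snr >= b:
        return 0.0
    return (b - snr) / (b - a)


def weighted_degraded_spectrum(rep_spectrum, prev_noise, a=1.0, b=4.0):
    """Per-band product of the weight and the representative degraded power spectrum."""
    out = []
    for y, lam in zip(rep_spectrum, prev_noise):
        # SNR calculation unit 3202 (assumed ratio form against lambda_{L-1}(m)).
        snr = y / lam if lam > 0 else float("inf")
        # Non-linear processing unit 3204 followed by multiplier 3203.
        out.append(nonlinear_weight(snr, a, b) * y)
    return out
```

Bands that look like noise (low SNR) pass through at full weight. Bands dominated by sound are zeroed, so they do not bias the noise average.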
The multiplier 3203 of
This concludes the explanation of the noise estimation unit 300.
Next, a configuration of the noise suppression coefficient generation unit 601 of
The noise suppression coefficient generation unit 601 consists of an acquired SNR calculation unit 610, an estimated inherent-SNR calculation unit 620, a noise suppression coefficient calculation unit 630, and a sound non-existence probability storage unit 640.

The acquired SNR calculation unit 610 calculates the SNR for each integrated frequency band using the input representative degraded sound power spectrum and estimated noise power spectrum. It outputs the result as an acquired SNR to the estimated inherent-SNR calculation unit 620 and the noise suppression coefficient calculation unit 630.

The estimated inherent-SNR calculation unit 620 estimates the inherent SNR using the input acquired SNR and the amended suppression coefficient input from the suppression coefficient amendment unit 1501. It outputs the estimated inherent SNR to both the suppression coefficient amendment unit 1501 and the noise suppression coefficient calculation unit 630.
The noise suppression coefficient calculation unit 630 generates the suppression coefficient using the input acquired SNR and estimated inherent SNR, together with a sound non-existence probability input from the sound non-existence probability storage unit 640, and outputs the suppression coefficient. The sound non-existence probability is a predetermined probability that no sound is included in the input signal.
A configuration of the estimated inherent-SNR calculation unit 620 included in
An acquired SNR γL(m) (m=0, 1, . . . , ML−1) being inputted from the acquired SNR calculation unit 610 of
−1 is supplied to the other terminal of the adder 6208, and the addition result γL(m)−1 is output to the value range restriction processing unit 6201. The value range restriction processing unit 6201 applies a value range restriction operator P[•] to the addition result γL(m)−1 input from the adder 6208. It conveys the result, P[γL(m)−1], as a momentarily-estimated SNR to the weighted addition unit 6207. Here, P[x] is defined by the following equation.
A weight is input into the weighted addition unit 6207 from the weight storage unit 6206. The weighted addition unit 6207 obtains the estimated inherent SNR using the input momentarily-estimated SNR, the past estimated SNR, and the weight. Defining the weight as α and ξL(m)-hat as the estimated inherent SNR, ξL(m)-hat is calculated by the following equation.
$\hat{\xi}_L(m) = \alpha\,\gamma_{L-1}(m)\,C_{L-1}^2(m) + (1-\alpha)\,P[\gamma_L(m)-1]$ [Numerical equation 10]
Here, it is assumed that $\gamma_{-1}(m)\,C_{-1}^2(m) = 1$.
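Numerical equation 10 can be sketched in Python as follows. The definition of P[x] is not reproduced in this text, so the sketch assumes the half-wave rectification P[x] = max(x, 0) used in the classical Ephraim–Malah decision-directed estimator. The default α = 0.98 is likewise a hypothetical choice.

```python
def estimate_inherent_snr(gamma, prev_gamma, prev_coef, alpha=0.98):
    """xi_hat_L(m) = alpha * gamma_{L-1}(m) * C_{L-1}(m)^2 + (1 - alpha) * P[gamma_L(m) - 1].

    gamma:      acquired SNR gamma_L(m) of the current processing frame
    prev_gamma: acquired SNR gamma_{L-1}(m) of the previous processing frame
    prev_coef:  amended suppression coefficient C_{L-1}(m) of the previous frame
    """
    xi = []
    for g, g_prev, c_prev in zip(gamma, prev_gamma, prev_coef):
        momentary = max(g - 1.0, 0.0)  # assumed P[x] = max(x, 0)
        xi.append(alpha * g_prev * c_prev ** 2 + (1.0 - alpha) * momentary)
    return xi
```

For the first frame, passing `prev_gamma` and `prev_coef` filled with 1.0 reproduces the initial condition γ−1(m)C−1²(m) = 1.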
A configuration of the noise suppression coefficient calculation unit 630 included in
It is assumed that the processing frame number is L, the frequency number is m, γL(m) is a per-frequency acquired SNR being inputted from the acquired SNR calculation unit 610 of
The MMSE STSA gain function value calculation unit 6301 calculates an MMSE STSA gain function value for each frequency band based upon the acquired SNR γL(m) being inputted from the acquired SNR calculation unit 610 of
Here, I0(z) and I1(z) are the modified Bessel functions of the first kind of order zero and order one, respectively.
The generalized likelihood ratio calculation unit 6302 calculates a generalized likelihood ratio for each frequency band based upon the acquired SNR γL(m) being inputted from the acquired SNR calculation unit 610 of
The suppression coefficient calculation unit 6303 calculates the suppression coefficient for each frequency band from two inputs: the MMSE STSA gain function value GL(m)-bar from the MMSE STSA gain function value calculation unit 6301, and the generalized likelihood ratio ΛL(m) from the generalized likelihood ratio calculation unit 6302. It then outputs the suppression coefficient to the suppression coefficient amendment unit 1501 of
Instead of calculating the SNR for each frequency band, it is also possible to obtain an SNR common to a wide band consisting of a plurality of frequency bands and to use this common SNR.
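The equations of the units 6301 to 6303 are not reproduced in this text. The following Python sketch therefore assumes the classical Ephraim–Malah forms, with ξ as the estimated inherent (a priori) SNR, γ as the acquired (a posteriori) SNR, and q as the sound non-existence probability:

- G = (√π/2)(√v/γ)e^(−v/2)[(1+v)I0(v/2) + vI1(v/2)], with v = ξγ/(1+ξ)
- Λ = ((1−q)/q)e^v/(1+ξ)
- suppression coefficient = Λ/(1+Λ)·G

The Bessel terms are summed in the log domain so that no external library is needed. The default q = 0.2 is hypothetical.

```python
import math


def scaled_bessel_i(n, x):
    """exp(-x) * I_n(x) for x >= 0, summed in the log domain to avoid overflow."""
    if x == 0.0:
        return 1.0 if n == 0 else 0.0
    log_half = math.log(x / 2.0)
    total = 0.0
    # Series I_n(x) = sum_k (x/2)^(2k+n) / (k! (k+n)!); terms peak near k = x/2.
    for k in range(int(x) + 60):
        total += math.exp((2 * k + n) * log_half
                          - math.lgamma(k + 1) - math.lgamma(k + n + 1) - x)
    return total


def mmse_stsa_gain(xi, gamma):
    """Ephraim-Malah MMSE STSA gain; exp(-v/2) is folded into the scaled Bessel values."""
    v = xi * gamma / (1.0 + xi)
    half = v / 2.0
    return (math.sqrt(math.pi) / 2.0) * (math.sqrt(v) / gamma) * (
        (1.0 + v) * scaled_bessel_i(0, half) + v * scaled_bessel_i(1, half))


def presence_factor(xi, gamma, q):
    """Lambda / (1 + Lambda) with Lambda = ((1 - q) / q) * exp(v) / (1 + xi), computed stably."""
    v = xi * gamma / (1.0 + xi)
    return 1.0 / (1.0 + (q / (1.0 - q)) * (1.0 + xi) * math.exp(-v))


def suppression_coefficient(xi, gamma, q=0.2):
    """Gain weighted by the probability-like factor derived from the likelihood ratio."""
    return presence_factor(xi, gamma, q) * mmse_stsa_gain(xi, gamma)
```

At high SNR the MMSE STSA gain approaches the Wiener gain ξ/(1+ξ), which is a convenient sanity check.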
A configuration of the suppression coefficient amendment unit 1501 will be explained in detail with reference to
On the other hand, the suppression coefficient lower-limit value storage unit 1592 outputs the lower limit value of the suppression coefficient that it stores to the maximum value selection unit 1591. The maximum value selection unit 1591 compares the suppression coefficient for each integrated frequency band input from the noise suppression coefficient calculation unit 630 of
Several units output values describing past processing frames, such as the shift register 440, the estimated noise storage unit 3201, and the acquired SNR storage unit 6202. So far, the calculation of the noise suppression information involving these units has been explained for one case. In that case, the stored value of a past processing frame is output under the same index number as the integrated frequency band of the current processing frame.

However, the integrated frequency bands may differ from one processing frame to another. The same index number can then correspond to different actual frequency bands in the current and past processing frames. In this case, outputting the stored value of the past processing frame whose band is nearest to the band in question enables high-quality noise suppression in the current processing frame. Alternatively, instead of using the stored value of the past processing frame as it stands, a value corresponding to the band of the current processing frame can be calculated and used.
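The nearest-band lookup can be sketched in Python as follows. This is a minimal illustration under stated assumptions. Bands are represented as hypothetical (low_bin, high_bin) index pairs, and "nearest" is taken to mean the smallest distance between band centers. The text does not specify a distance measure.

```python
def nearest_band_value(past_bands, past_values, current_band):
    """Return the stored past value whose band center is closest to current_band.

    past_bands:   list of (low_bin, high_bin) for the past processing frame (hypothetical layout)
    past_values:  stored values, one per past band (e.g. previous noise estimate)
    current_band: (low_bin, high_bin) of the band in the current processing frame
    """
    center = (current_band[0] + current_band[1]) / 2.0
    best = min(range(len(past_bands)),
               key=lambda i: abs((past_bands[i][0] + past_bands[i][1]) / 2.0 - center))
    return past_values[best]
```

For example, with past bands (0, 3), (4, 7), (8, 15), the current band (6, 9) has its center at 7.5. It therefore picks the value stored for (4, 7).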
This concludes the explanation of the first configuration of the noise suppression information calculation unit 9.
Next, a second configuration example of the noise suppression information calculation unit 9 of
A configuration of the suppression coefficient amendment unit 1502 included in
The multiplier 660 obtains a product of the representative degraded sound power spectrum and the suppression coefficient. It outputs this product as a temporary emphasized sound power spectrum to the sound existence probability calculation unit 670 and the temporary output SNR calculation unit 680.

The sound existence probability calculation unit 670 obtains a sound existence probability VL of the L-th processing frame from the temporary emphasized sound power spectrum and the estimated noise power spectrum. It outputs VL to the temporary output SNR calculation unit 680 and the suppression coefficient lower-limit value calculation unit 6512. As one example, the ratio of the temporary emphasized sound power spectrum to the estimated noise power spectrum can serve as the sound existence probability. The probability is high when this ratio is large, and low when it is small.

The temporary output SNR calculation unit 680 obtains a temporary output SNR DL(m) from the temporary output and the estimated noise power spectrum using the sound existence probability VL. It outputs DL(m) to the suppression coefficient lower-limit value calculation unit 6512. As one example, a long-time output SNR can serve as the temporary output SNR; it is derived from a long-time average of the temporary output and from the estimated noise power spectrum. The temporary output SNR calculation unit 680 updates this long-time average according to the magnitude of the sound existence probability VL input from the sound existence probability calculation unit 670.
The suppression coefficient lower-limit value calculation unit 6512 calculates the lower-limit value of the suppression coefficient from the temporary output SNR DL(m) and the sound existence probability VL, and outputs it to the maximum value selection unit 6511. This lower-limit value A(VL,DL(m)) can be expressed by the following equation. The equation uses a function A(DL(m)) and a suppression coefficient minimum value fs corresponding to a sound section.
$A(V_L, D_L(m)) = f_s\,V_L + (1-V_L)\,A(D_L(m))$ [Numerical equation 14]
The function A(DL(m)) is basically shaped so that it yields a small value for a large SNR. Because A(DL(m)) takes this shape with respect to the temporary output SNR DL(m), the higher the temporary output SNR, the smaller the lower-limit value of the suppression coefficient for a non-sound section. This corresponds to a decrease in residual noise, and it reduces the discontinuity in sound quality between the sound section and the non-sound section. Additionally, the function A(DL(m)) may differ for each frequency component, or a common function A(DL(m)) may be used for a plurality of frequency components. Its shape may also change over time.
The maximum value selection unit 6511 compares two values: the suppression coefficient CL(m)-bar input from the noise suppression coefficient calculation unit 630, and the lower-limit value of the suppression coefficient input from the suppression coefficient lower-limit value calculation unit 6512. It outputs the larger value as the amended suppression coefficient CL(m). This process can be expressed with the following equation.
$C_L(m) = \max\{\bar{C}_L(m),\ A(V_L, D_L(m))\}$
That is, fs is the suppression coefficient minimum value when the section is regarded as completely a sound section. When the section is regarded as completely a non-sound section, the minimum value is instead the value that the monotonically decreasing function assigns to the temporary output SNR DL(m). When the section is regarded as in between, these two values are mixed according to VL.

Because A(DL(m)) decreases monotonically, a large suppression coefficient minimum value is guaranteed at a low SNR. This maintains continuity with the immediately preceding sound section, in which much unremoved noise still remains. At a high SNR, the suppression coefficient minimum value is made small so that the residual noise is small. In that case the residual noise in the sound section is negligibly small, so continuity is maintained even when the residual noise in the non-sound section is also small.

Further, setting fs larger than A(DL(m)) alleviates the level of noise suppression in a sound section, or when the section is highly likely to be a sound section. This reduces the distortion of the sound. It is particularly effective for sound containing distortion caused by coding/decoding, for which the noise cannot be estimated with sufficiently high precision.
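Numerical equation 14 and the maximum value selection can be sketched in Python as follows. The text only requires A(D) to be monotonically decreasing, so the exponential shape and its parameters below (a_max, a_min, slope, and fs = 0.5) are hypothetical. fs is chosen larger than A(D), as the text recommends.

```python
import math


def a_of_d(d, a_max=0.3, a_min=0.05, slope=0.1):
    """Hypothetical monotonically decreasing A(D_L(m)): a_max at D = 0, tending to a_min."""
    return a_min + (a_max - a_min) * math.exp(-slope * d)


def lower_limit(v, d, fs=0.5):
    """A(V_L, D_L(m)) = fs * V_L + (1 - V_L) * A(D_L(m))  (Numerical equation 14)."""
    return fs * v + (1.0 - v) * a_of_d(d)


def amended_coefficient(c_bar, v, d, fs=0.5):
    """Maximum value selection unit 6511: C_L(m) = max(C_bar_L(m), A(V_L, D_L(m)))."""
    return max(c_bar, lower_limit(v, d, fs))
```

With V = 1 (certainly sound) the floor is fs. With V = 0 (certainly non-sound) it falls to A(D), which shrinks as the temporary output SNR grows.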
This concludes the explanation of the second configuration of the noise suppression information calculation unit 9.
Returning to
As another method, the emphasized sound power spectrum can be calculated using the noise suppression information of a plurality of processing frames. For example, performing an interpolation with the noise suppression information CL−1(m) of the preceding processing frame yields the following equation.
Using noise suppression information interpolated in this manner reduces the perceived discontinuity near the boundaries of processing frames and realizes high-quality noise suppression. The above-mentioned method may also be applied after smoothing the noise suppression information of a plurality of processing frames in advance. In this case, drastic changes in the noise suppression information are avoided, and high-quality noise suppression is realized. Besides, the emphasized sound power spectrum may be calculated after interpolating the noise suppression information in the frequency direction in advance. Further, noise suppression information smoothed in both the time direction and the frequency direction may be applied to the degraded sound power spectrum.
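The interpolation equation is not reproduced in this text, so the following Python sketch assumes a simple linear ramp. Across the converted frames of processing frame L, the coefficient moves from CL−1(m) toward CL(m), so that the last converted frame uses CL(m) fully. The sketch also assumes that C is an amplitude gain, consistent with the term γ C² in Numerical equation 10. Each frequency bin k is therefore scaled by C(m)² for its band m. All function names and the `band_of_bin` mapping are hypothetical.

```python
def interpolated_coefficients(prev_coef, coef, num_frames):
    """Linearly ramp from C_{L-1}(m) to C_L(m) across the converted frames of frame L."""
    frames = []
    for j in range(num_frames):
        w = (j + 1) / num_frames  # weight of the current processing frame's coefficient
        frames.append([(1.0 - w) * cp + w * c for cp, c in zip(prev_coef, coef)])
    return frames


def emphasized_power(power, band_coef, band_of_bin):
    """|X_n(k)|^2 = C(m)^2 * |Y_n(k)|^2, where m = band_of_bin[k] (C assumed an amplitude gain)."""
    return [band_coef[band_of_bin[k]] ** 2 * p for k, p in enumerate(power)]
```

Smoothing in the time or frequency direction, as mentioned above, could be applied to `prev_coef` and `coef` before the ramp is built.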
The inverse conversion unit 6 obtains an emphasized sound amplitude spectrum |Xn(k)|-bar from the emphasized sound power spectrum |Xn(k)|2-bar input from the noise suppression processing unit 10. It multiplies this amplitude spectrum by the phase term of arg Yn(k) input from the conversion unit 5, and thereby obtains an emphasized sound spectrum Xn(k)-bar. That is, the following is executed.
$\bar{X}_n(k) = |\bar{X}_n(k)|\, e^{\,j \arg Y_n(k)}$
The inverse conversion unit 6 subjects the obtained emphasized sound spectrum Xn(k)-bar to an inverse frequency conversion to generate a time region signal. This inverse frequency conversion is preferably the inverse of the frequency conversion applied by the conversion unit 5. When the conversion unit 5 performs weighting with a window function W, the inverse conversion unit 6 multiplies the signal subjected to the inverse frequency conversion by the window function W. When the conversion unit 5 consists of a band-division filter bank, the inverse conversion unit 6 consists of a band-composition filter bank. The technology relating to band-composition filter banks and their design is disclosed in Non-patent document 3. The time region signal subjected to the inverse frequency conversion is output to the converted frame composition unit 3.
The converted frame composition unit 3 composes the input inverse-converted time region signals, which are divided into converted frame lengths, and outputs the emphasized sound signal samples to the output terminal 4.
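The inverse conversion and frame composition can be sketched with NumPy as follows. This sketch assumes several details that the text leaves open:

- The conversion unit 5 is a real DFT (`numpy.fft.rfft`).
- The window W is a square-root periodic Hann window, applied both before analysis and after the inverse transform.
- Frames overlap by 50%, so the two window applications multiply to a Hann window, which overlap-adds to one.

The function name `synthesize` is hypothetical.

```python
import numpy as np


def synthesize(emph_power_frames, phase_frames, window, hop):
    """Rebuild each frame from |X|^2 and arg Y, inverse-FFT, window, and overlap-add."""
    frame_len = len(window)
    out = np.zeros(hop * (len(emph_power_frames) - 1) + frame_len)
    for i, (pw, ph) in enumerate(zip(emph_power_frames, phase_frames)):
        # X_n(k) = |X_n(k)| * exp(j arg Y_n(k)), with |X_n(k)| = sqrt(|X_n(k)|^2).
        spectrum = np.sqrt(pw) * np.exp(1j * ph)
        frame = np.fft.irfft(spectrum, n=frame_len)  # inverse frequency conversion
        # Synthesis window W, then overlap-add (converted frame composition unit 3).
        out[i * hop:i * hop + frame_len] += window * frame
    return out
```

With no suppression applied (the emphasized power equals |Y|²), the interior of the output reproduces the input signal. This is a quick way to check that the analysis and synthesis stages match.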
A short converted frame length capable of following changes in the input signal is used. At the same time, the noise suppression information is calculated per processing frame, into which several converted frames are integrated. As a result, high-quality noise suppression is realized while both the number of times the noise suppression information is calculated and the arithmetic quantity are reduced. In addition, adaptively deciding the processing frame according to the input signal enables high-quality noise suppression with a low arithmetic quantity.
This concludes the explanation of the best mode of the present invention.
Next, a second embodiment of the present invention will be explained in detail with reference to
The second embodiment of the present invention, upon comparing
A first configuration example of the noise suppression information calculation unit 11 included in
A configuration of the noise estimation unit 301 included in
$\mathrm{Cnt}(L+1) = \mathrm{Cnt}(L) + (n_{L+1} - n_L)$ [Numerical equation 19]
Thus, as a rule, when the update determination unit 400 of the estimated noise calculation unit 310 compares the count value of the counter 331 with the threshold, the value of the threshold storage unit 4003 of
With the foregoing configuration, the predetermined time can be determined accurately, and high-quality noise estimation can be realized even when the processing frame length differs from one processing frame to another.
A second configuration example of the noise suppression information calculation unit 11 will be explained in detail with reference to
This embodiment explained the operation of the counter 331 as an example of control using the processing frame length, but such control is applicable to other parts as well. For example, the shift register 440 of the estimated noise calculation unit 310 saves the weighted degraded sound power spectra. When the estimated noise power spectrum is calculated, only those spectra belonging to processing frames within a predetermined time before the current processing frame may be used, and their average defined as the estimated noise power spectrum. With such a configuration, the estimated noise is calculated from the signal within a constant time irrespective of the processing frame length, so high-quality noise estimation is realized.
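Both uses of the processing frame length can be sketched in Python as follows. The first is the counter update of Numerical equation 19. The second is averaging only the spectra from a fixed recent time span. The sketch assumes that nL is the index of the first converted frame of processing frame L, so that nL+1 − nL is that frame's length in converted frames. It also assumes a hypothetical (nL, spectrum) layout for the values held in the shift register 440.

```python
def update_counter(cnt, n_start, n_next):
    """Cnt(L+1) = Cnt(L) + (n_{L+1} - n_L): advance by the frame length in converted frames."""
    return cnt + (n_next - n_start)


def windowed_noise_estimate(history, current_n, max_age):
    """Average the stored weighted spectra of processing frames that began within max_age
    converted frames before current_n.

    history: list of (n_L, spectrum) pairs (hypothetical layout of shift register 440)
    """
    recent = [spec for n, spec in history if current_n - n < max_age]
    if not recent:
        return None
    return [sum(col) / len(recent) for col in zip(*recent)]
```

Because the counter advances by frame length rather than by one per processing frame, a threshold on `cnt` corresponds to a fixed elapsed time whatever the processing frame lengths are.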
This concludes the explanation of the second embodiment of the present invention.
Next, a third embodiment of the present invention will be explained in detail with reference to
The third embodiment of the present invention, upon comparing
A first configuration example of the processing frame information generation unit 14 of
A second configuration example of the processing frame information generation unit 14 of
This processing frame information generation unit 14, upon making a comparison with the processing frame information generation unit 7 of
Configuring the processing frame information generation unit 14 in this manner makes it possible to set an upper limit on the number of processing frames within a constant time. Thus, the number of times the noise suppression information is calculated can be controlled, and the arithmetic quantity can be reduced.
This concludes the explanation of the third embodiment of the present invention.
Next, a fourth embodiment of the present invention will be explained in detail with reference to
The fourth embodiment of the present invention, upon comparing
A configuration example of the processing frame information generation unit 12 of
Defining the maximum value input into the processing frame information generation unit 12 as LM, the number TN of processing frames that the time group generation unit 55 generates is expressed as TN=f(LM) using a function f. As an example of the function f, TN may be defined as the largest positive integer that does not exceed the square root of LM. Alternatively, TN may be defined as the largest integer that does not exceed LM divided by a constant. The time group generation unit 55 integrates the converted frames and decides the delimiter positions of the processing frames so that the number of processing frames is TN. One method of deciding these delimiter positions is based upon the change quantity of the converted frame energy E(n), as already explained by making a reference to
The frequency group generation unit 56 integrates a plurality of frequency bands in each processing frame, decides the delimiter positions of the integrated frequency bands, and outputs the processing frame information. The maximum number FN of integrated frequency bands in each processing frame is decided as FN=int(LM/TN), where int(X) is the largest integer that does not exceed X. That is, the frequency group generation unit 56 sets the integrated frequency bands so that the number ML of the integrated frequency bands of the L-th processing frame already explained by making a reference to
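The budget split TN = f(LM), FN = int(LM/TN) can be sketched in Python as follows. The sketch uses the square-root example of f. The delimiter rule is a hypothetical stand-in for the energy-change method referred to above. It places TN − 1 processing frame boundaries at the converted frames with the largest |E(n) − E(n−1)|.

```python
import math


def processing_frame_counts(lm):
    """TN = largest positive integer not exceeding sqrt(LM); FN = int(LM / TN)."""
    tn = max(1, math.isqrt(lm))
    fn = lm // tn
    return tn, fn


def time_delimiters(energy, tn):
    """Hypothetical rule: start a new processing frame at the TN - 1 converted frames
    with the largest energy change |E(n) - E(n-1)|."""
    changes = sorted(range(1, len(energy)),
                     key=lambda n: abs(energy[n] - energy[n - 1]), reverse=True)
    return sorted(changes[:tn - 1])
```

Since TN·FN ≤ LM, the number of noise suppression calculations per constant time never exceeds LM. This is the property the fourth embodiment relies on.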
Configuring the processing frame information generation unit in this manner makes it possible to set an upper limit on the number of times the noise suppression information is calculated within a constant time, whereby the arithmetic quantity can be reduced.
This concludes the explanation of the fourth embodiment of the present invention.
Next, a fifth embodiment of the present invention will be explained in detail with reference to
A configuration example of the processing frame information generation unit 13 will be explained in detail with reference to
This embodiment is characterized in that the processing frame information is calculated by analyzing the time signal rather than the frequency-converted signal. For this reason, the frequency conversion and the calculation of the processing frame information can be performed in parallel, which reduces the arithmetic quantity. In addition, using a parallel processor or the like enhances this reduction further.
This concludes the explanation of the fifth embodiment of the present invention.
Next, a sixth embodiment of the present invention will be explained in detail with reference to
The sixth embodiment of the present invention, upon comparing
A configuration example of the processing frame information generation unit 15 will be explained in detail with reference to
Configuring the processing frame information generation unit 15 in this manner drastically reduces the arithmetic quantity required to calculate the processing frame information, so noise suppression is performed with a low arithmetic quantity.
This concludes the explanation of the sixth embodiment of the present invention.
Next, a seventh embodiment of the present invention will be explained in detail with reference to
The seventh embodiment of the present invention, upon comparing
The noise suppression processing unit 16 calculates the emphasized sound power spectrum from the noise suppression information CL(m), the processing frame information, and the representative degraded sound power spectrum, and outputs it to the inverse conversion unit 6. The emphasized sound power spectrum |Xn(k)|2-bar is given by the following equation.
As another method, the emphasized sound power spectrum can be calculated using the noise suppression information of a plurality of processing frames. For example, performing an interpolation with the noise suppression information CL−1(m) of the preceding processing frame yields the following equation.
Needless to say, the interpolation may also use the noise suppression information of more processing frames. Using noise suppression information interpolated in this manner reduces the perceived discontinuity near the boundaries of processing frames and realizes high-quality noise suppression. The above-mentioned method may also be applied after smoothing the noise suppression information of a plurality of processing frames in advance. In this case, drastic changes in the noise suppression information are avoided, and high-quality noise suppression is realized. Besides, the emphasized sound power spectrum may be calculated after interpolating the noise suppression information in the frequency direction in advance. Further, noise suppression information smoothed in both the time direction and the frequency direction may be applied to the degraded sound power spectrum.
This concludes the explanation of the seventh embodiment of the present invention.
Next, an eighth embodiment of the present invention will be explained in detail with reference to
The eighth embodiment of the present invention consists of a record unit 30 and a reproduction unit 31. The record unit 30 receives the input signal from the input terminal 1, calculates information for suppressing the noise of the input signal, multiplexes the input signal and the calculated information, and outputs a multiplexed signal. The reproduction unit 31 receives the multiplexed signal output by the record unit 30. Based upon the noise suppression information included in the multiplexed signal, it suppresses the noise of the input signal also included there, and outputs the result to the output terminal 4.
The record unit 30 consists of the converted frame division unit 2, the conversion unit 5, the processing frame information generation unit 7, the representative frequency region signal generation unit 8, the noise suppression information calculation unit 9, and a multiplexing unit 32. The converted frame division unit 2, the conversion unit 5, the processing frame information generation unit 7, the representative frequency region signal generation unit 8, and the noise suppression information calculation unit 9 were already explained by making a reference to
The multiplexing unit 32 multiplexes the input signal, the processing frame information, and the noise suppression information, and outputs the multiplexed signal.
The reproduction unit 31 consists of a separation unit 33, the converted frame division unit 2, the conversion unit 5, the noise suppression processing unit 10, the inverse conversion unit 6, and the converted frame composition unit 3. The converted frame division unit 2, the conversion unit 5, the noise suppression processing unit 10, the inverse conversion unit 6, and the converted frame composition unit 3 were already explained by making a reference to
The separation unit 33 separates the inputted multiplexed signal into the input signal, the processing frame information, and the noise suppression information, outputs the input signal to the converted frame division unit 2, and outputs the processing frame information and the noise suppression information to the noise suppression processing unit 10.
Herein, the multiplexed signal may be stored temporarily in an accumulation medium and read out from it at the time of reproduction. Further, instead of multiplexing the input signal as it stands, the input signal may be encoded and the resulting compressed data multiplexed. In this case, the reproduction unit 31 is provided with a decoding unit that decodes the input signal, performing the inverse of the encoding in the record unit 30. Likewise, the processing frame information and the noise suppression information can obviously be encoded as well.
The explanation herein assumed that the record unit 30 and the reproduction unit 31 exist in the same terminal, but they may exist in different terminals. In this case, the multiplexed signal output by the record unit 30 may be transmitted to the reproduction unit 31 in another terminal through a transmission path or the like. Further, the multiplexed signal may be stored in the accumulation medium and then input into the reproduction unit 31 in another terminal.
This configuration reduces the arithmetic quantity because the noise suppression information does not need to be calculated when the recorded signal is reproduced.
This concludes the explanation of the eighth embodiment of the present invention.
Next, a ninth embodiment of the present invention will be explained in detail with reference to
The ninth embodiment of the present invention is provided with a computer 1000 that operates under program control. The computer 1000 operates based upon a program that performs the processing of any one of the foregoing best mode and second to eighth embodiments of the present invention. It applies this processing to the input signal received from the input terminal 1 and outputs the emphasized sound to the output terminal 4.
This concludes the explanation of the ninth embodiment of the present invention.
All of the embodiments so far were explained on the assumption that the noise is suppressed with the minimum mean-square error short-time spectral amplitude technique, but other methods are also applicable. Examples include the Wiener filtering method disclosed in Non-patent document 5 (PROCEEDINGS OF THE IEEE, Vol. 67, No. 12, pp. 1586 to 1604, December, 1979). Another is the spectrum subtraction method disclosed in Non-patent document 6 (IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, Vol. 27, No. 2, pp. 113 to 120, April, 1979). Explanation of their detailed configuration examples is omitted.
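For comparison with the MMSE STSA coefficient, one common textbook form of each alternative gain is sketched below in Python. These are not taken from the cited documents, which may use different variants. The Wiener gain is written in terms of the estimated inherent SNR ξ. The spectral subtraction gain is magnitude subtraction with half-wave rectification, written in terms of the acquired SNR γ. Either value could take the place of the suppression coefficient generated by the unit 630.

```python
import math


def wiener_gain(xi):
    """Wiener filter gain from the estimated inherent (a priori) SNR: xi / (1 + xi)."""
    return xi / (1.0 + xi)


def magnitude_subtraction_gain(gamma, floor=0.0):
    """Magnitude spectral subtraction: |X| = max(|Y| - sqrt(noise), 0), written as a gain.

    gamma is the acquired SNR |Y|^2 / noise power; floor clamps negative results.
    """
    return max(1.0 - 1.0 / math.sqrt(gamma), floor)
```

A nonzero `floor` plays a role similar to the suppression coefficient lower-limit value described earlier.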
While the embodiments were explained above, examples of the present invention will be described below.
The 1st embodiment of the present invention is characterized by a noise suppression device comprising: a conversion means for converting an input signal into a frequency region signal for each decided first frame; a frame generation means for generating a second frame so that it differs from said first frame; a representative frequency region signal generation means for generating a representative frequency region signal from said frequency region signal of the first frame being included in said second frame; and a noise suppression degree calculation means for obtaining a degree of noise suppression of said second frame based upon said representative frequency region signal.
Furthermore, the 2nd embodiment of the present invention is characterized in that, in the above-mentioned embodiment, said frame generation means generates said second frame with a frame length longer than that of said first frame.
Furthermore, the 3rd embodiment of the present invention is characterized in that, in the above-mentioned embodiments, said frame generation means generates said second frames so that they are independent of one another.
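The 2nd and 3rd embodiments can be pictured as partitioning the sequence of first frames into contiguous groups. A minimal sketch, assuming (this is not stated in the text) that "independent" means non-overlapping and that every second frame spans a fixed number of first frames given by the hypothetical parameter `frames_per_second`; the last second frame may be shorter.

```python
from typing import List, Tuple


def fixed_second_frames(num_first_frames: int,
                        frames_per_second: int) -> List[Tuple[int, int]]:
    """Partition first-frame indices into non-overlapping second frames [start, end)."""
    if frames_per_second < 2:
        # A second frame should be longer than a single first frame.
        raise ValueError("frames_per_second must be at least 2")
    return [(s, min(s + frames_per_second, num_first_frames))
            for s in range(0, num_first_frames, frames_per_second)]
```

For example, seven first frames grouped three at a time give the second frames (0, 3), (3, 6), and (6, 7).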
Furthermore, the 4th embodiment of the present invention is characterized in that, in the above-mentioned embodiments, said noise suppression degree calculation means applies said degree of noise suppression to said frequency region signal included in said second frame, thereby suppressing noise.
Furthermore, the 5th embodiment of the present invention is characterized in that, in the above-mentioned embodiments, said noise suppression degree calculation means applies, to said frequency region signal included in said second frame, a degree of noise suppression calculated by interpolating the degrees of noise suppression of the other second frames, thereby suppressing noise.
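The difference between the 4th and 5th embodiments lies in how one gain vector per second frame is expanded to the first frames it contains. The sketch below is illustrative only: it assumes each second frame's gain is anchored at the centre of that second frame and that linear interpolation between neighbouring centres is used (the text does not specify the interpolation rule); the gains are then multiplied into each first frame's complex spectrum. Function names are hypothetical, and `segments` must be sorted and non-overlapping.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def per_frame_gains(segments: Sequence[Tuple[int, int]],
                    seg_gains: Sequence[Sequence[float]],
                    interpolate: bool) -> List[List[float]]:
    """Expand one gain vector per second frame into one per first frame."""
    centers = [(s + e - 1) / 2.0 for s, e in segments]
    out = []
    for i, (s, e) in enumerate(segments):
        for t in range(s, e):
            if not interpolate:
                out.append(list(seg_gains[i]))  # hold the second frame's gain
                continue
            j = bisect_right(centers, t) - 1  # last centre at or before t
            if j < 0:
                out.append(list(seg_gains[0]))  # before the first centre
            elif j >= len(centers) - 1:
                out.append(list(seg_gains[-1]))  # after the last centre
            else:
                w = (t - centers[j]) / (centers[j + 1] - centers[j])
                out.append([(1.0 - w) * a + w * b
                            for a, b in zip(seg_gains[j], seg_gains[j + 1])])
    return out


def apply_gains(spectrum: Sequence[complex], gains: Sequence[float]) -> List[complex]:
    """Scale each complex frequency bin by its real-valued gain."""
    return [g * x for g, x in zip(gains, spectrum)]
```

Interpolation smooths the gain trajectory at second-frame boundaries, at the cost of a few multiply-adds per bin; holding the gain constant is cheaper but may produce step changes.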
Furthermore, the 6th embodiment of the present invention is characterized in that, in the above-mentioned embodiments, said frame generation means generates the second frame based upon a feature of said frequency region signal.
Furthermore, the 7th embodiment of the present invention is characterized in that, in the above-mentioned embodiments, said feature of the frequency region signal is a change in the energy of said input signal.
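One way to realise the 6th and 7th embodiments is to close the current second frame wherever the energy changes abruptly, so that stationary stretches share one gain computation while onsets start a new second frame. This is a hypothetical sketch: the decibel threshold `threshold_db`, the length cap `max_len`, and the frame-to-frame comparison rule are all assumptions, not values from the text.

```python
import math
from typing import List, Sequence, Tuple


def energy_driven_second_frames(frames: Sequence[Sequence[float]],
                                threshold_db: float = 6.0,
                                max_len: int = 8) -> List[Tuple[int, int]]:
    """Split first frames into second frames [start, end) at energy jumps.

    A boundary is placed where the first-frame energy changes by more than
    threshold_db relative to the previous first frame, or where the current
    second frame has reached max_len first frames.
    """
    eps = 1e-12  # avoids log10(0) for silent frames
    energies_db = [10.0 * math.log10(sum(f) + eps) for f in frames]
    segments = []
    start = 0
    for t in range(1, len(frames)):
        jump = abs(energies_db[t] - energies_db[t - 1])
        if jump > threshold_db or t - start >= max_len:
            segments.append((start, t))
            start = t
    if frames:
        segments.append((start, len(frames)))
    return segments
```

For power spectra whose energy rises from 1.0 to 100.0 at the third first frame, the sketch returns the second frames (0, 2) and (2, 5).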
Furthermore, the 8th embodiment of the present invention is characterized in that, in the above-mentioned embodiments, the noise suppression device further comprises a frequency delimiter position generation means for generating a delimiter position in the frequency direction for each said second frame, and said representative frequency region signal generation means generates said representative frequency region signal from said frequency region signal based upon said second frame and said delimiter position in the frequency direction.
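With delimiter positions in the frequency direction, each second frame is divided into time-frequency tiles, and one representative value (hence one degree of noise suppression) is computed per tile. A minimal sketch, assuming the delimiters are ascending bin indices starting at 0 and ending at the number of bins, and that the representative value is the mean power over the tile; the function names are hypothetical.

```python
from typing import List, Sequence


def band_representatives(frames: Sequence[Sequence[float]], start: int, end: int,
                         delimiters: Sequence[int]) -> List[float]:
    """Mean power over first frames [start, end) and bins [d_m, d_m+1) per band."""
    reps = []
    for lo, hi in zip(delimiters[:-1], delimiters[1:]):
        cells = [frames[t][k] for t in range(start, end) for k in range(lo, hi)]
        reps.append(sum(cells) / len(cells))
    return reps


def expand_to_bins(band_gains: Sequence[float], delimiters: Sequence[int]) -> List[float]:
    """Copy each band's gain to every bin inside that band."""
    out: List[float] = []
    for g, lo, hi in zip(band_gains, delimiters[:-1], delimiters[1:]):
        out.extend([g] * (hi - lo))
    return out
```

Coarser bands reduce the number of gain evaluations further; the delimiters can differ from one second frame to the next.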
Furthermore, the 9th embodiment of the present invention is characterized in that, in the above-mentioned embodiments, said frame generation means generates said second frames so that the number of second frames in a fixed-length block is kept within a predetermined number.
Furthermore, the 10th embodiment of the present invention is characterized in that, in the above-mentioned embodiments, said frame generation means obtains said second frame and said delimiter position in the frequency direction so that the number of times said degree of noise suppression is calculated in a fixed-length block is kept within a predetermined number of times.
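The 9th and 10th embodiments bound the computational load per fixed-length block. The text does not say how the bound is enforced, so the following is a purely hypothetical strategy: adjacent second frames are merged, shortest combined span first, until the count is within the limit. The cost in the sense of the 10th embodiment is counted as one evaluation per (second frame, frequency band) pair. All names are assumptions.

```python
from typing import List, Sequence, Tuple


def limit_second_frames(segments: Sequence[Tuple[int, int]],
                        max_count: int) -> List[Tuple[int, int]]:
    """Merge adjacent second frames until at most max_count remain (min. 1)."""
    segs = list(segments)
    while len(segs) > max(max_count, 1):
        # Pick the adjacent pair whose merged span is shortest (ties: earliest).
        i = min(range(len(segs) - 1), key=lambda j: segs[j + 1][1] - segs[j][0])
        segs[i:i + 2] = [(segs[i][0], segs[i + 1][1])]
    return segs


def evaluations_in_block(band_counts: Sequence[int]) -> int:
    """Gain evaluations in a block: one per band of each second frame."""
    return sum(band_counts)
```

For instance, limiting the second frames (0, 1), (1, 2), (2, 5), (5, 6) to two merges them into (0, 2) and (2, 6).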
Furthermore, the 11th embodiment of the present invention is characterized in that, in the above-mentioned embodiments, said degree of noise suppression is expressed as a noise suppression coefficient.
Furthermore, the 12th embodiment of the present invention is characterized in that, in the above-mentioned embodiments, said degree of the noise suppression is expressed as an estimated value of the noise.
The 13th embodiment of the present invention is a noise suppression method comprising: a conversion step of converting an input signal into a frequency region signal for each predetermined first frame; a frame generation step of generating a second frame that differs from said first frame; a representative frequency region signal generation step of generating a representative frequency region signal from said frequency region signals of the first frames included in said second frame; and a noise suppression degree calculation step of obtaining a degree of noise suppression of said second frame based upon said representative frequency region signal.
Furthermore, the 14th embodiment of the present invention is characterized in that, in the above-mentioned embodiments, said frame generation step generates said second frame with a frame length longer than that of said first frame.
Furthermore, the 15th embodiment of the present invention is characterized in that, in the above-mentioned embodiments, said frame generation step generates said second frames so that they are independent of one another.
Furthermore, the 16th embodiment of the present invention is characterized in that, in the above-mentioned embodiments, said noise suppression degree calculation step applies said degree of noise suppression to said frequency region signal included in said second frame, thereby suppressing noise.
Furthermore, the 17th embodiment of the present invention is characterized in that, in the above-mentioned embodiments, said noise suppression degree calculation step applies, to said frequency region signal included in said second frame, a degree of noise suppression calculated by interpolating the degrees of noise suppression of the other second frames, thereby suppressing noise.
Furthermore, the 18th embodiment of the present invention is characterized in that, in the above-mentioned embodiments, said frame generation step generates said second frame based upon a feature of said frequency region signal.
Furthermore, the 19th embodiment of the present invention is characterized in that, in the above-mentioned embodiments, said feature of the frequency region signal is a change in the energy of said input signal.
Furthermore, the 20th embodiment of the present invention is characterized in that, in the above-mentioned embodiments, the noise suppression method further comprises a frequency delimiter position generation step of generating a delimiter position in the frequency direction for each said second frame, and said representative frequency region signal generation step generates said representative frequency region signal from said frequency region signal based upon said second frame and said delimiter position in the frequency direction.
Furthermore, the 21st embodiment of the present invention is characterized in that, in the above-mentioned embodiments, said frame generation step generates said second frames so that the number of second frames in a fixed-length block is kept within a predetermined number.
Furthermore, the 22nd embodiment of the present invention is characterized in that, in the above-mentioned embodiments, said frame generation step generates said second frame and said delimiter position in the frequency direction so that the number of times said degree of noise suppression is calculated in a fixed-length block is kept within a predetermined number of times.
Furthermore, the 23rd embodiment of the present invention is characterized in that, in the above-mentioned embodiments, in said noise suppression degree calculation step, said degree of the noise suppression is expressed as a noise suppression coefficient.
Furthermore, the 24th embodiment of the present invention is characterized in that, in the above-mentioned embodiments, in said noise suppression degree calculation step, said degree of the noise suppression is expressed as an estimated value of the noise.
Furthermore, the 25th embodiment of the present invention is a noise suppression program for causing a computer to execute: a conversion process of converting an input signal into a frequency region signal for each predetermined first frame; a frame generation process of generating a second frame that differs from said first frame; a representative frequency region signal generation process of generating a representative frequency region signal from said frequency region signals of the first frames included in said second frame; and a noise suppression degree calculation process of obtaining a degree of noise suppression of said second frame based upon said representative frequency region signal.
Furthermore, the 26th embodiment of the present invention is characterized in that, in the above-mentioned embodiments, said frame generation process generates said second frame with a frame length longer than that of said first frame.
Furthermore, the 27th embodiment of the present invention is characterized in that, in the above-mentioned embodiments, said frame generation process generates said second frames so that they are independent of one another.
Furthermore, the 28th embodiment of the present invention is characterized in that, in the above-mentioned embodiments, said noise suppression degree calculation process applies said degree of noise suppression to said frequency region signal included in said second frame, thereby suppressing noise.
Furthermore, the 29th embodiment of the present invention is characterized in that, in the above-mentioned embodiments, said noise suppression degree calculation process applies, to said frequency region signal included in said second frame, a degree of noise suppression calculated by interpolating the degrees of noise suppression of the other second frames, thereby suppressing noise.
Furthermore, the 30th embodiment of the present invention is characterized in that, in the above-mentioned embodiments, said frame generation process generates said second frame based upon a feature of said frequency region signal.
Furthermore, the 31st embodiment of the present invention is characterized in that, in the above-mentioned embodiments, said feature of the frequency region signal is a change in the energy of said input signal.
Furthermore, the 32nd embodiment of the present invention is characterized in that, in the above-mentioned embodiments, the noise suppression program further causes the computer to execute a frequency delimiter position generation process of generating a delimiter position in the frequency direction for each said second frame, and said representative frequency region signal generation process generates said representative frequency region signal from said frequency region signal based upon said second frame and said delimiter position in the frequency direction.
Furthermore, the 33rd embodiment of the present invention is characterized in that, in the above-mentioned embodiments, said frame generation process generates said second frames so that the number of second frames in a fixed-length block is kept within a predetermined number.
Furthermore, the 34th embodiment of the present invention is characterized in that, in the above-mentioned embodiments, said frame generation process generates said second frame and said delimiter position in the frequency direction so that the number of times said degree of noise suppression is calculated in a fixed-length block is kept within a predetermined number of times.
Furthermore, the 35th embodiment of the present invention is characterized in that, in the above-mentioned embodiments, in said noise suppression degree calculation process, said degree of the noise suppression is expressed as a noise suppression coefficient.
Furthermore, the 36th embodiment of the present invention is characterized in that, in the above-mentioned embodiments, in said noise suppression degree calculation process, said degree of the noise suppression is expressed as an estimated value of the noise.
While the present invention has been described above with respect to the preferred embodiments and examples, the present invention is not limited to the above-mentioned embodiments and examples, and alterations, variations, and equivalents of these embodiments and examples can be implemented without departing from the spirit and scope of the present invention.
This application is based upon and claims the benefit of priority from Japanese patent application No. 2007-243001, filed on Sep. 19, 2007, the disclosure of which is incorporated herein in its entirety by reference.
Number | Date | Country | Kind |
---|---|---|---|
JP 2007-243001 | Sep 2007 | JP | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/JP2008/066871 | 9/18/2008 | WO | 00 | 3/18/2010 |