This invention relates to a noise suppression system and, more particularly, to a noise suppression system, a noise suppression method and a noise suppression program suited for suppressing noise components in speech recognition.
The conventional noise suppression technique for speech recognition may roughly be classified into the following two types.
(a) The noise component is subtracted from an input signal using a signal processing technique.
(b) An acoustic model and a noise model are synthesized on a decoder to create a noise adapted acoustic model.
In the present specification, noise designates any signal other than the speech signal. In addition to background noise, which is thought to be relatively stationary, it includes, for example, unexpectedly occurring noise, reverberation, echo and the speech of speakers other than the target speaker.
According to Patent Document 1, the techniques (a) and (b) are classified as front-end processing and decoder processing, respectively.
A method widely used as the signal processing technique (a) is a “spectrum subtraction method (abbreviated as SS method)”.
The system of this configuration has the following advantages.
An amount of computation is small.
The system may readily be used in combination with other techniques, such as a technique of updating the noise mean spectrum.
However, if the noise mean spectrum is simply subtracted from the input signal, residual noise (musical noise) is generated by the subtraction owing to the variance of the noise or to the phase difference between the speech and the noise. Such residual noise may cause recognition errors.
Thus, in the SS method, flooring must be carried out, which buries the information in the spectral valleys of the speech. If the flooring level is raised, the residual noise generated in the subtraction may be suppressed; however, performance may degrade because the information in the valleys of the speech is buried.
Patent Document 1 and Non-Patent Documents 2 and 6 disclose a technique of calculating a noise reducing filter using a smoothed a priori SNR (the estimate speech divided by the noise mean spectrum).
Referring to
If smoothing is carried out thoroughly, the residual noise in the subtraction may be suppressed, however, there persist problems such as
That is, the signal processing technique suffers from the following problem:
It is therefore difficult to make universal use of the signal processing technique.
As for the technique (b), which adapts the acoustic model to the noise, the “Parallel Model Combination (PMC) method” disclosed in Non-Patent Document 3 is widely known.
This technique uses a unit for formulating a noise model, an acoustic model HMM learned in advance in a noise-free environment, a unit for transforming the noise model into the linear spectral domain, and a unit for transforming the acoustic model HMM into the linear spectral domain. The technique also uses a unit for adding the noise model and the acoustic model HMM, both transformed into the linear spectral domain, to formulate a noise adapted acoustic model HMM, and a unit for transforming the noise adapted model so formulated into the cepstral domain.
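The core of the combination step may be illustrated by the following minimal sketch. It is a simplification under stated assumptions, not the full PMC procedure of Non-Patent Document 3: only one pair of mean vectors is combined (the so-called log-add approximation, which ignores the variances), the cepstrum is assumed to be an untruncated orthonormal DCT-II of the log spectrum so that it can be inverted exactly, and the function names are hypothetical.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix: cepstrum = C @ log_spectrum, log_spectrum = C.T @ cepstrum."""
    k = np.arange(n)[:, None]
    f = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * k * (f + 0.5) / n)
    c[0, :] /= np.sqrt(2.0)  # scale the first row so that C is orthonormal
    return c


def pmc_log_add(speech_ceps: np.ndarray, noise_ceps: np.ndarray) -> np.ndarray:
    """Log-add approximation of PMC for one pair of mean vectors (variances ignored)."""
    C = dct_matrix(len(speech_ceps))
    speech_lin = np.exp(C.T @ speech_ceps)  # cepstrum -> log spectrum -> linear spectrum
    noise_lin = np.exp(C.T @ noise_ceps)
    noisy_lin = speech_lin + noise_lin      # combination in the linear spectral domain
    return C @ np.log(noisy_lin)            # back to the cepstral domain
```

In an actual PMC system this combination would be carried out for every Gaussian of every HMM state, which illustrates why the computation cost is high.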
The system of this configuration has the following advantages.
That is, since the acoustic model HMM has been adapted to the noise, recognition may be achieved regardless of the type of noise or the SNR.
However, there persist the following problems.
The computation for formulating the noise adapted acoustic model HMM is extremely costly.
It is not easy to use the technique in combination with other techniques, such as the technique of updating the noise mean spectrum.
As a method of adapting to the noise not the acoustic model but a reference pattern of the speech in the form of a GMM (Gaussian Mixture Model), the “method for speech signal estimation by GMM” has been proposed in Non-Patent Document 4.
Referring to
The system, configured as described above, has the following merit.
That is, the system is able to perform speech recognition with high stability by replacing the operation of subtracting the noise component, which has been a problem in the above-described signal processing technique, with the operation of finding the expected value of the variance G between the reference pattern and the noise adaptive patterns.
Similarly to the PMC method, the system, having the above configuration, suffers from the following problem.
The computation for formulating the noise adapted model is extremely costly.
It is not easy to use the system in combination with other techniques, such as the technique of updating the noise mean spectrum.
As described above, the conventional systems suffer from the following problems.
The first problem is that, with the signal processing technique, flooring or smoothing must be carried out, so that information in the original speech may occasionally be lost. The reason is that, in a highly noisy environment, the variance of the noise and the effect of the phase difference between the speech and the noise can hardly be disregarded, so that residual noise is generated when the noise mean spectrum is subtracted from the input speech.
The second problem is that, with the signal processing technique, parameter tuning is required depending on the type of noise and the SNR. The reason is that a parameter that minimizes information loss while suppressing the residual noise can only be found empirically.
The third problem is that, with the technique of adapting the acoustic model or the reference pattern to the noise, it is difficult to combine a method of updating the noise mean spectrum to follow time-varying noise so as to adapt the acoustic model or the reference pattern to the noise from frame to frame. The reason is that adapting the acoustic model or the reference pattern to the noise requires costly computation.
Accordingly, it is an object of the present invention to provide a system, a method and a computer program product with which it is possible to remove noise components with high accuracy without causing loss of speech information.
It is another object of the present invention to provide a system, a method and a computer program product for noise suppression in which the number of tuning parameters may be reduced and which are not sensitive to the values of the tuning parameters.
It is yet another object of the present invention to provide a system, a method and a computer program product for noise suppression in which computation cost may be reduced and in which time variations of the noise may be followed easily.
The above and other objects are attained by the invention summarized substantially as follows:
A first system according to the present invention includes means for calculating a noise mean spectrum from an input signal, means for deriving a provisional estimate speech in a spectral domain from the input signal and the noise mean spectrum, and means for correcting the provisional estimate speech using a reference pattern of the speech stored in a storage unit.
A first noise suppressing method according to the present invention includes the steps of:
calculating a noise mean spectrum from an input signal;
deriving the provisional estimate speech in a spectral domain from the input signal and the noise mean spectrum; and
correcting the provisional estimate speech using a reference pattern of the speech.
A first computer program according to the present invention causes a computer, which receives an input signal and suppresses the noise therein to estimate the speech, to execute the processing of calculating the noise mean spectrum from the input signal, the processing of deriving the provisional estimate speech in a spectral domain from the input signal and the noise mean spectrum, and the processing of correcting the provisional estimate speech using a reference pattern of the speech.
With this configuration, the residual noise, produced by subtraction, may be corrected, on the basis of the reference pattern, so that the first object of the present invention may be achieved.
Moreover, since certain inaccuracies of the provisional estimate speech may be tolerated, the processing can be expected to be insensitive to the values of the tuning parameters, and hence the second object of the present invention may be achieved.
In addition, since it is unnecessary to adapt the reference pattern to the noise, the computation cost may be reduced and time variations of the noise may be followed easily, so that the third object of the present invention may be achieved.
A second noise suppressing method according to the present invention is such a method which, in the first noise suppression method, further comprises the steps of:
transforming the provisional estimate speech derived in the spectral domain, into a feature vector; and
correcting the provisional estimate speech, transformed into the feature vector, using the reference pattern in the feature vector domain.
A third noise suppression method according to the present invention is such a method in which, in the first or second noise suppression method, a probability distribution is presupposed as the reference pattern, an expected value of the speech is found from the probability that the probability distribution forming the reference pattern outputs the provisional estimate speech, and from a mean value of the probability distribution forming the reference pattern, and the expected value of the speech is used as a value for correction of the provisional estimate speech.
A fourth noise suppression method according to the present invention is such a method in which, in the step of correcting the provisional estimate speech in the first or second noise suppression method, the provisional estimate speech is corrected using a reference pattern formed by a plurality of speech patterns: either the speech pattern closest to the input speech is selected for use as the value for correction of the provisional estimate speech, or a plurality of speech patterns close to the input speech are averaged, with weights that vary with their distances, for use as the value for correction of the provisional estimate speech.
A fifth noise suppression method according to the present invention is such a method in which, in any of the first to fourth noise suppression methods, the step of correcting the provisional estimate speech includes a step of finding the standard deviation of the noise, and the standard deviation of the noise thus found is taken into account in controlling the correction of the provisional estimate speech.
A sixth noise suppressing method according to the present invention is such a method which, in any of the first to fifth noise suppression methods, further includes a step of calculating a noise reducing filter from the value for correction of the provisional estimate speech and from the noise mean spectrum, and a step of applying filtering by the noise reducing filter to the input signal to derive an estimate speech.
A seventh noise suppression method according to the present invention is such a method in which, in the sixth noise suppression method, the noise reducing filter is calculated using the input signal in addition to using the provisional estimate speech as corrected and the noise mean spectrum.
An eighth noise suppression method according to the present invention is such a method in which, in calculating the noise reducing filter in the sixth or seventh noise suppression method, the corrected provisional estimate speech, or the a priori SNR (signal to noise ratio) obtained by dividing the corrected provisional estimate speech by the noise mean spectrum, is smoothed in at least one of the time domain, the frequency domain and the dimension domain of the feature vector.
A ninth noise suppression method according to the present invention is such a method in which, in any of the first to eighth noise suppression methods, the operation of taking the provisional estimate speech corrected using the reference pattern as a new provisional estimate speech and correcting it again using the reference pattern is carried out a plurality of times.
A tenth noise suppression method according to the present invention is such a method in which, in any of the first to ninth noise suppression methods, a plurality of input signals are provided, the step of calculating the noise mean spectrum calculates the noise spectrum from at least one of the plurality of input signals, and the step of deriving the provisional estimate speech finds the provisional estimate speech from at least one of the plurality of input signals and from the noise spectrum.
A speech recognition method according to the present invention includes a step of recognizing the noise-suppressed speech using any of the first to tenth noise suppression methods.
A second computer program according to the present invention is such a program in which, in the first program, the processing of correcting the provisional estimate speech includes the processing of transforming the provisional estimate speech derived in the spectral domain, into a feature vector, and
the processing of correcting the provisional estimate speech, transformed into the feature vector, using the reference pattern in the feature vector domain.
A third computer program according to the present invention is such a program in which, in the first or second program, the processing of correcting the provisional estimate speech presupposes a probability distribution as the reference pattern, and an expected value of the speech is found from the probability that the probability distribution forming the reference pattern outputs the provisional estimate speech and from a mean value of the probability distribution forming the reference pattern. The expected value of the speech is used as a value for correction of the provisional estimate speech.
A fourth computer program according to the present invention is such a program in which, in the first or second program, the processing of correcting the provisional estimate speech uses a reference pattern made up of a plurality of speech patterns, and either the speech pattern closest to the input speech is selected for use as the value for correction of the provisional estimate speech, or a plurality of speech patterns close to the input speech are averaged, with weights that vary with their distances, for use as the value for correction of the provisional estimate speech.
A fifth computer program according to the present invention is such a program in which, in any one of the first to fourth programs, the processing of correcting the provisional estimate speech includes the processing of finding the standard deviation of the noise, and controls the correction by taking the standard deviation of the noise into account.
A sixth computer program according to the present invention is such a program which, in any one of the first to fifth programs, allows the computer to further execute the processing of calculating a noise reducing filter from the provisional estimate speech as corrected and from the noise mean spectrum, and the processing of applying filtering by the noise reducing filter to the input signal to derive the estimate speech.
A seventh computer program according to the present invention is such a program in which, in the sixth program, the processing of calculating the noise reducing filter calculates the noise reducing filter using the input signal in addition to the corrected provisional estimate speech and the noise mean spectrum.
An eighth computer program according to the present invention is such a program in which, in the sixth or seventh program, the corrected provisional estimate speech, or the a priori SNR obtained by dividing the corrected provisional estimate speech by the noise mean spectrum, is smoothed in at least one of the time domain, the frequency domain and the dimension domain of the feature vector.
A ninth computer program according to the present invention is such a program in which, in any one of the first to eighth programs, the processing of taking the estimate speech, obtained by correcting the provisional estimate speech using the reference pattern, as a new provisional estimate value and correcting that value again using the reference pattern is repeated a plurality of times.
A tenth computer program according to the present invention is such a program in which, in any one of the first to ninth programs, the processing of calculating a noise mean spectrum calculates the spectrum of the noise from at least one of a plurality of input signals, and the processing of deriving the provisional estimate speech from the input signal and from the noise mean spectrum finds the provisional estimate speech from at least one of the input signals and from the noise spectrum.
An eleventh computer program according to the present invention causes a computer constituting a speech recognition apparatus to receive a speech signal noise-suppressed by any one of the first to tenth programs and to execute speech recognition on it.
The meritorious effects of the present invention are summarized as follows.
According to the present invention, the residual noise in the provisional estimate speech may be properly corrected using the knowledge contained in the reference pattern.
According to the present invention, the provisional estimate speech may be inaccurate to some extent, and hence processing that is not particularly sensitive to the values of the tuning parameters may be expected.
According to the present invention, there is no necessity for adapting the reference pattern to the noise, and hence the costs for calculations may be reduced, while the noise may be followed readily.
Still other features and advantages of the present invention will become readily apparent to those skilled in this art from the following detailed description in conjunction with the accompanying drawings wherein only the preferred embodiments of the invention are shown and described, simply by way of illustration of the best mode contemplated of carrying out this invention. As will be realized, the invention is capable of other and different embodiments, and its several details are capable of modifications in various obvious respects, all without departing from the invention. Accordingly, the drawing and description are to be regarded as illustrative in nature, and not as restrictive.
Referring to the drawings, the present invention will now be described in further detail.
Let the input signal spectrum X be expressed as X(f, t).
It is noted that f stands for the frequency filter bank number (f=1, . . . , Lf, where Lf is the number of frequency filter banks) and t stands for the frame number (t=1, 2, . . . ). The input signal spectrum X(f, t) is obtained by short-time, frame-based spectral analysis of the signal acquired by the input signal acquisition unit 1, for example, through a microphone.
The noise mean spectrum calculation unit 2 calculates the noise mean spectrum N (f, t) from the input signal spectrum X(f, t) (step S1).
In calculating the noise mean spectrum N (f, t), any of the following techniques, for example, may be used.
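As one hypothetical illustration of such a technique, the following sketch assumes that the first n_init frames contain noise only, and then updates the estimate by recursive averaging in frames judged to be non-speech by a crude energy test, so that slowly varying noise can be followed. The names n_init, ratio and beta, and their default values, are illustrative assumptions, not values taken from this specification.

```python
import numpy as np


def noise_mean_spectrum(X: np.ndarray, n_init: int = 10,
                        ratio: float = 2.0, beta: float = 0.98) -> np.ndarray:
    """Return N(f, t) for a power spectrogram X of shape (Lf, T)."""
    N = np.empty(X.shape, dtype=float)
    n = X[:, :n_init].mean(axis=1)  # assume the first n_init frames are noise only
    for t in range(X.shape[1]):
        frame = X[:, t]
        # hypothetical non-speech test: frame energy below `ratio` times the noise energy
        if t >= n_init and frame.sum() < ratio * n.sum():
            n = beta * n + (1.0 - beta) * frame  # recursive averaging tracks slow changes
        N[:, t] = n
    return N
```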
The provisional estimate speech calculation unit 3 then calculates a provisional estimate speech S′ (f, t) by known techniques, such as
If the SS method is used, the provisional estimate speech S′ (f, t) may be calculated as follows:
S′ (f,t)=max(X(f,t)−N(f,t),αN(f,t)) (1).
where α is a flooring parameter.
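Equation (1) may be sketched as follows, assuming that X(f, t) and N(f, t) are power spectra held in NumPy arrays of shape (Lf, T); the default alpha=0.1 is an arbitrary placeholder for the flooring parameter α, and the function name is hypothetical.

```python
import numpy as np


def provisional_estimate_ss(X: np.ndarray, N: np.ndarray, alpha: float = 0.1) -> np.ndarray:
    """Equation (1): S'(f, t) = max(X(f, t) - N(f, t), alpha * N(f, t))."""
    return np.maximum(X - N, alpha * N)  # subtraction with flooring at alpha * N
```

Raising alpha suppresses more residual noise but buries more of the spectral valleys of the speech, which is the trade-off discussed for the SS method.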
In the present embodiment, it is assumed that the reference pattern 4 includes a reference pattern of speech learned in advance in a noise-free environment, although this is not restrictive; the reference pattern 4 may instead include a reference pattern of speech learned under a known noise. For details of the method of learning the reference pattern, reference is made, for example, to Non-Patent Document 7, which describes the EM (Expectation-Maximization) algorithm for the GMM (Gaussian Mixture Model) and the corresponding algorithm for the HMM.
In the present embodiment, it is assumed that the reference pattern 4 holds the pattern of the speech in the form of a cepstrum GMM, for example. However, the reference pattern may, of course, be held in the form of any other suitable features, such as a log spectrum GMM, a linear spectrum GMM or an LPC (Linear Predictive Coding) cepstrum GMM. It is also possible to use probability distributions other than the Gaussian mixture distribution.
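The learning of a GMM reference pattern by the EM algorithm may be sketched as follows. This is a minimal illustration assuming diagonal covariances and clean training feature vectors (for example, cepstra) stacked in an array of shape (n_frames, D); the function name, the farthest-point initialization and the variance floor are hypothetical choices and are not taken from Non-Patent Document 7.

```python
import numpy as np


def train_diag_gmm(data: np.ndarray, K: int, n_iter: int = 50,
                   var_floor: float = 1e-3, seed: int = 0):
    """EM for a diagonal-covariance GMM; returns weights (K,), means (K, D), variances (K, D)."""
    rng = np.random.default_rng(seed)
    n, D = data.shape
    # initialization: one random frame, then repeatedly the frame farthest from the chosen means
    means = [data[rng.integers(n)]]
    for _ in range(1, K):
        d2 = np.min([((data - m) ** 2).sum(axis=1) for m in means], axis=0)
        means.append(data[np.argmax(d2)])
    mu = np.array(means, dtype=float)
    var = np.tile(data.var(axis=0) + var_floor, (K, 1))
    w = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibilities r[i, k] = P(k | x_i), computed in the log domain
        diff2 = (data[:, None, :] - mu[None, :, :]) ** 2
        log_p = np.log(w)[None, :] - 0.5 * (
            diff2 / var[None] + np.log(2.0 * np.pi * var[None])).sum(axis=2)
        log_p -= log_p.max(axis=1, keepdims=True)
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and (floored) variances
        nk = r.sum(axis=0) + 1e-12
        w = nk / nk.sum()
        mu = (r.T @ data) / nk[:, None]
        var = np.maximum((r.T @ data ** 2) / nk[:, None] - mu ** 2, var_floor)
    return w, mu, var
```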
The provisional estimate speech correction unit 5 corrects the provisional estimate speech S′ (f, t), as calculated by the provisional estimate speech calculation unit 3, using the reference pattern 4 (step S3).
A more specific example of the above-described correcting method will now be described.
First, the a posteriori probability of the k-th Gaussian distribution, given the provisional estimate speech, is determined as follows:
P(k|S′(f,t))=W(k)p(S′(f,t)|μs(k),σs(k))/Σk′W(k′)p(S′(f,t)|μs(k′),σs(k′)) (2).
where k is the index of the Gaussian distribution as an element of the GMM (k=1, . . . , K, K being the number of mixture components),
W(k) is the weight of the k-th Gaussian distribution, and
p(S′|μs(k), σs(k)) is the probability (density) with which the Gaussian distribution having the mean value μs(k) and the variance σs(k) outputs the provisional estimate speech S′.
In the present embodiment, the provisional estimate speech S′ is transformed into a cepstrum so as to conform to the form of the speech patterns held in the reference pattern 4.
Of course, if the form of the speech patterns held in the reference pattern 4 is changed, the form of the provisional estimate speech S′ is changed accordingly.
Then, using the above a posteriori probability, an expected value of the speech
<S(f,t)>=Σkμs(k)P(k|S′(f,t)) (3)
is found and output as the value for correction of the provisional estimate speech S′.
<S(f, t)> is an estimate of the speech, that is, of the input signal from which the noise has been removed.
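Equations (2) and (3) may be sketched for a single frame as follows, assuming a diagonal-covariance GMM with weights w, means mu and variances var (arrays of shape (K,), (K, D) and (K, D)), and a provisional estimate speech s_prov already transformed into the same feature domain as the reference pattern. The denominator of equation (2) is evaluated in the log domain for numerical stability; the function names are hypothetical.

```python
import numpy as np


def log_gauss_diag(x, mu, var):
    """log p(x | mu_s(k), var_s(k)) for each diagonal Gaussian k; returns shape (K,)."""
    return -0.5 * (((x - mu) ** 2) / var + np.log(2.0 * np.pi * var)).sum(axis=1)


def correct_with_gmm(s_prov, w, mu, var):
    """Return (<S>, P(k|S')) for one frame, following equations (2) and (3)."""
    log_num = np.log(w) + log_gauss_diag(s_prov, mu, var)  # log W(k) p(S'|mu_s(k), var_s(k))
    post = np.exp(log_num - log_num.max())                 # shift by the max for stability
    post /= post.sum()                                     # divide by the sum over k, eq. (2)
    return post @ mu, post                                 # <S> = sum_k mu_s(k) P(k|S'), eq. (3)
```

A provisional estimate lying close to one mixture component is pulled onto that component's mean, which is how residual noise in S′ is corrected by the knowledge held in the reference pattern.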
The meritorious effect of the present invention will now be described.
In the present embodiment, the provisional estimate speech is corrected, using the reference pattern for the speech. Hence, the distortion of the estimate speech, produced by
It is seen from above that, with the present embodiment, the problem of the conventional signal processing technique may be solved.
In the present embodiment, the estimate speech is corrected using the reference speech pattern. Hence, the tolerance for tuning parameters, such as the flooring parameter α in equation (1), is enlarged, so that the tuning parameters may be set somewhat inaccurately.
Moreover, in the present embodiment, since it is unnecessary to adapt the reference pattern to the noise, the computation cost is reduced, and hence an algorithm for estimating time-varying noise may be used in the noise mean spectrum calculation unit 2. Noise tracking is thus made easy.
In the first embodiment, at least one of units 1, 2, 3 and 5 may be implemented by a computer program, which may be recorded in a medium and loaded on a computer constituting a noise suppression system to cause the computer to execute the function/processing of the associated unit.
A second embodiment of the present invention will now be described with reference to the drawings.
A more specific example of the above correction will be described below. Initially, the distances between the provisional estimate speech S′ (f, t) and each of the plurality of speech patterns composing the reference pattern (for example, the mean values of the speech patterns) are computed and compared. Here, the distances are computed in the log-spectral domain, although they may also be computed in other forms, such as the cepstrum.
d(k)=Σf(S′(f,t)−μs(k)(f))² (4)
where f is the frequency filter bank number (f=1, . . . , Lf, Lf being the number of the frequency filter banks), k=1, . . . , K, K being the number of speech patterns making up the reference pattern, and μs(k) is the mean value of the k-th speech pattern forming the reference pattern.
If the provisional estimate speech S′ (f, t) is in some other form, such as a cepstrum, f is replaced by the corresponding index of that form.
Then, the k that minimizes the distance between the provisional estimate speech S′ (f, t) and the reference speech pattern is selected, and S′ (f, t) is replaced by the corresponding speech pattern μs(k), which is used as the correction value. Alternatively, a plurality of k's giving smaller distances are selected, and the corresponding speech patterns μs(k) are averaged with weights depending on the distances; the resulting average is then used as the correction value. The distance need not be limited to the sum of squared differences; other forms, such as the sum of absolute differences, may also be used.
In the second embodiment, the computation cost may be reduced, since only distances to the speech patterns, rather than a posteriori probabilities, need to be computed.
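Equation (4) and the two selection rules may be sketched for one frame as follows, assuming that s_prov and the pattern means mu are NumPy arrays in the same (for example, log-spectral) domain. The inverse-distance weighting used when n_best > 1 is only one hypothetical choice of weights depending on the distances, and the function name is illustrative.

```python
import numpy as np


def correct_by_nearest_patterns(s_prov, mu, n_best=1, eps=1e-12):
    """Return the correction value for one frame using the distances of equation (4)."""
    d = ((s_prov[None, :] - mu) ** 2).sum(axis=1)  # d(k) = sum_f (S'(f,t) - mu_s(k)(f))^2
    if n_best == 1:
        return mu[np.argmin(d)].copy()             # nearest speech pattern
    idx = np.argsort(d)[:n_best]                   # the n_best nearest patterns
    weights = 1.0 / (d[idx] + eps)                 # hypothetical inverse-distance weights
    weights /= weights.sum()
    return weights @ mu[idx]                       # distance-weighted average
```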
In the second embodiment, at least one of units 1, 2, 3 and 5a may be implemented by a computer program, which may be recorded in a medium and loaded on a computer constituting a noise suppression system to cause the computer to execute the function/processing of the associated unit.
A third embodiment of the present invention will now be described.
Moreover, the provisional estimate speech calculation unit 3 of
The points of difference of the operation of the present embodiment from that of the first embodiment will now be described.
The noise mean spectrum/standard deviation calculation unit 2a calculates the noise mean spectrum N(f, t) from the input signal spectrum X(f, t), using a technique similar to that used by the noise mean spectrum calculation unit 2. In addition, the noise mean spectrum/standard deviation calculation unit 2a calculates the standard deviation of the noise V(f, t).
The standard deviation of the noise V(f, t) may be calculated by known methods, such as by
evaluating the deviation between the first several tens of frames of the input signal spectrum X(f, t) and the noise mean spectrum N(f, t), or
detecting the speech and non-speech sections and using the standard deviation of the input signal spectrum X(f, t) in the non-speech section as the standard deviation V(f, t) of the noise.
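The first of these methods may be sketched as follows. It is a minimal illustration assuming that the first n_init frames contain noise only and that X(f, t) and N(f, t) are arrays of shape (Lf, T); the resulting V(f) is held constant over t, and the function name and default n_init are hypothetical.

```python
import numpy as np


def noise_std_from_leading_frames(X: np.ndarray, N: np.ndarray, n_init: int = 20) -> np.ndarray:
    """V(f): RMS deviation of the first n_init frames of X(f, t) from N(f, t)."""
    dev = X[:, :n_init] - N[:, :n_init]       # deviation from the noise mean spectrum
    return np.sqrt((dev ** 2).mean(axis=1))   # one standard deviation per filter bank f
```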
The provisional estimate speech/reliability calculation unit 3a finds the provisional estimate speech S′ (f, t), using a technique similar to that used by the provisional estimate speech calculation unit 3 of
Specifically, as the reliability of S′ (f, t),
The provisional estimate speech correction unit 5b, which uses the reference pattern, corrects the provisional estimate speech S′ (f, t), calculated by the provisional estimate speech/reliability calculation unit 3a, using the reference pattern 4.
At this time, the range of correction is limited, using the reliability of the provisional estimate speech S′ (f, t), as calculated by the provisional estimate speech/reliability calculation unit 3a.
Specifically, when the value <S(f, t)>, obtained by correcting the provisional estimate speech using the reference pattern, lies within the range from the provisional estimate speech S′ (f, t) minus the standard deviation of the noise V(f, t) to S′ (f, t) plus V(f, t), that is, when
S′(f,t)−V(f,t)≦<S(f,t)>≦S′(f,t)+V(f,t) (6)
the provisional estimate speech S′ (f, t) is replaced by the correction value <S(f, t)>; otherwise, no replacement is made.
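The limitation of equation (6) may be sketched elementwise as follows, assuming that S′(f, t), <S(f, t)> and V(f, t) are expressed in the same domain and held in NumPy arrays of equal shape; the function name is hypothetical.

```python
import numpy as np


def limit_correction(s_prov: np.ndarray, s_corr: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Equation (6): accept <S> only where S' - V <= <S> <= S' + V; otherwise keep S'."""
    inside = (s_corr >= s_prov - V) & (s_corr <= s_prov + V)  # reliability test per element
    return np.where(inside, s_corr, s_prov)
```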
The meritorious effect of the present embodiment will now be described.
In the present embodiment, since the reliability based on the standard deviation of the noise is taken into account in correcting the provisional estimate speech, it is possible to prevent the correction by the reference pattern from deviating markedly from the provisional estimate speech.
In the third embodiment, at least one of units 1, 2a, 3a and 5b may be implemented by a computer program, which may be recorded in a medium and loaded on a computer constituting a noise suppression system to cause the computer to execute the function/processing of the associated unit.
A fourth embodiment of the present invention will now be described with reference to the drawings.
The operation of the present embodiment will now be described in detail.
The noise reducing filter calculation unit 6 calculates a noise reducing filter from the provisional estimate speech <S(f, t)>, as corrected by the provisional estimate speech correction unit 5, employing the reference pattern, and from the noise mean spectrum N(f, t), as calculated by the noise mean spectrum calculation unit 2.
More specifically, the corrected provisional estimate speech <S(f, t)> is transformed into a linear spectrum to derive the a priori SNR η (f, t) which is given as follows:
η(f,t)=<S(f,t)>/N(f,t) (7).
The above a priori SNR η(f, t) may also be found by smoothing with the a priori SNR η(f, t−1) of the immediately preceding frame, as follows:
η(f,t)=β×η(f,t−1)+(1−β)×<S(f,t)>/N(f,t) (8)
where β (0≦β≦1) is a parameter for controlling the smoothing.
A noise reducing filter W(f, t) is calculated by
W(f,t)=η(f,t)/(1+η(f,t)) (9).
Finally, the estimate speech calculation unit 7 calculates the estimate speech S(f, t) by
S(f,t)=W(f,t)×X(f,t) (10)
from the noise-reducing filter W(f, t), as calculated by the noise reducing filter calculation unit 6, and from the input signal X (f, t), as acquired from the input signal acquisition unit 1.
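Equations (7) to (10) may be sketched as follows, assuming that <S(f, t)>, N(f, t) and X(f, t) are linear-domain spectra in NumPy arrays of shape (Lf, T). With beta = 0 the smoothing of equation (8) reduces to equation (7); initializing the recursion with the first frame's unsmoothed value is an assumption, as is the function name.

```python
import numpy as np


def wiener_from_corrected(S_corr, N, X, beta=0.0):
    """Return (S, W) from equations (7)-(10); all inputs are linear spectra of shape (Lf, T)."""
    Lf, T = X.shape
    eta = np.empty((Lf, T))
    prev = S_corr[:, 0] / N[:, 0]                      # assumed initial value for eta(f, t-1)
    for t in range(T):
        inst = S_corr[:, t] / N[:, t]                  # equation (7)
        eta[:, t] = beta * prev + (1.0 - beta) * inst  # equation (8); beta = 0 gives (7)
        prev = eta[:, t]
    W = eta / (1.0 + eta)                              # equation (9)
    return W * X, W                                    # equation (10)
```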
The meritorious effect of the present embodiment will now be described.
In the present embodiment, the a priori SNR is calculated using the corrected provisional estimate speech, and the final estimate speech is found using the noise reducing filter so constructed. It is thus possible to avoid the quantization inherent in representing the speech by the finite number of speech patterns making up the reference pattern, thereby obtaining an estimate speech of high accuracy.
In the fourth embodiment, at least one of units 1, 2, 3, 5, 6 and 7 may be implemented by a computer program, which may be recorded in a medium and loaded on a computer constituting a noise suppression system to cause the computer to execute the function/processing of the associated unit.
The points in which the operation of the present embodiment differs from that of the fourth embodiment will now be described.
In the present embodiment, the noise reducing filter calculation unit 6a derives the a posteriori SNR γ(f, t), from the input signal spectrum X(f, t) and from the noise mean spectrum N(f, t), as follows:
γ(f,t)=X(f,t)/N(f,t) (11)
in addition to finding the a priori SNR η(f, t) using a technique similar to that used in the noise reducing filter calculation unit 6.
As the noise reducing filter W(f, t), a filter combining the a priori SNR η(f, t) and the a posteriori SNR γ(f, t), such as the MMSE (minimum mean square error) filter disclosed in Non-Patent Document 2, is used.
In the fifth embodiment, at least one of units 1, 2, 3, 5, 6a and 7 may be implemented by a computer program, which may be recorded in a medium and loaded on a computer constituting a noise suppression system to cause the computer to execute the function/processing of the associated unit.
This condition may, for example, be decision means, such as
The meritorious effect of the present embodiment will now be explained.
In the present embodiment, the true value can be approached asymptotically by repeating the processing, whereby an estimate speech of high accuracy may be produced.
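The repeated correction may be sketched as follows, reusing the GMM-based correction of equations (2) and (3) and assuming a diagonal-covariance GMM with weights w, means mu and variances var. The stopping rule (a maximum number of iterations or a change smaller than tol) is a hypothetical example of the termination condition, and all names are illustrative.

```python
import numpy as np


def iterate_correction(s_prov, w, mu, var, max_iter=10, tol=1e-6):
    """Repeat S' <- <S> = sum_k mu_s(k) P(k|S') (equations (2)-(3)) until convergence."""
    s = np.asarray(s_prov, dtype=float)
    for _ in range(max_iter):
        # log of W(k) p(S' | mu_s(k), var_s(k)) for each diagonal Gaussian k
        log_p = np.log(w) - 0.5 * (((s - mu) ** 2) / var
                                   + np.log(2.0 * np.pi * var)).sum(axis=1)
        post = np.exp(log_p - log_p.max())
        post /= post.sum()                   # P(k | S'), equation (2)
        s_new = post @ mu                    # <S>, equation (3)
        if np.max(np.abs(s_new - s)) < tol:  # hypothetical termination condition
            return s_new
        s = s_new                            # the corrected value becomes the new provisional estimate
    return s
```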
In the sixth embodiment, at least one of units 1, 2, 3, 5 and 8 may be implemented by a computer program, which may be recorded in a medium and loaded on a computer constituting a noise suppression system to cause the computer to execute the function/processing of the associated unit.
The meritorious effect of the present embodiment is as follows:
In the seventh embodiment, in which plural input signals are provided, the accuracy of the provisional estimate speech and of the noise spectrum may be improved, so that the estimate speech is produced with high accuracy.
In the seventh embodiment, at least one of units 1, 2b, 3b and 5 may be implemented by a computer program, which may be recorded in a medium and loaded on a computer constituting a noise suppression system to cause the computer to execute the function/processing of the associated unit.
The above-described first to seventh embodiments may be combined together.
In the eighth embodiment, at least one of units 1, 12 and 13 may be implemented by a computer program, which may be recorded in a medium and loaded on a computer constituting a speech recognition system to cause the computer to execute the function/processing of the associated unit.
The meritorious effect of the present embodiment is as follows:
With the present embodiment, it is possible to construct a speech recognition system with a high recognition rate even in highly noisy environments.
The configuration of the present invention may be applied where noise components are removed in a noisy environment to extract only the target speech components. The present invention may also be used for speech recognition in noisy environments.
It should be noted that other objects, features and aspects of the present invention will become apparent from the entire disclosure, and that modifications may be made without departing from the gist and scope of the present invention as disclosed herein and claimed as appended herewith.
Also it should be noted that any combination of the disclosed and/or claimed elements, matters and/or items may fall under the modifications aforementioned.
Number | Date | Country | Kind |
---|---|---|---|
2005-217694 | Jul 2005 | JP | national |
Number | Name | Date | Kind |
---|---|---|---|
5359695 | Ohora et al. | Oct 1994 | A |
5390280 | Kato et al. | Feb 1995 | A |
5577161 | Pelaez Ferrigno | Nov 1996 | A |
5655057 | Takagi | Aug 1997 | A |
5749068 | Suzuki | May 1998 | A |
5943429 | Handel | Aug 1999 | A |
6415253 | Johnson | Jul 2002 | B1 |
6591234 | Chandran et al. | Jul 2003 | B1 |
6643619 | Linhard et al. | Nov 2003 | B1 |
6910011 | Zakarauskas | Jun 2005 | B1 |
7231347 | Zakarauskas | Jun 2007 | B2 |
7266494 | Droppo et al. | Sep 2007 | B2 |
7359857 | Mahe et al. | Apr 2008 | B2 |
7453963 | Joublin et al. | Nov 2008 | B2 |
7483831 | Rankovic | Jan 2009 | B2 |
7584097 | Yao | Sep 2009 | B2 |
7590529 | Zhang et al. | Sep 2009 | B2 |
20020116177 | Bu et al. | Aug 2002 | A1 |
20030177007 | Kanazawa et al. | Sep 2003 | A1 |
20030225577 | Deng et al. | Dec 2003 | A1 |
20040002858 | Attias et al. | Jan 2004 | A1 |
20040064307 | Scalart et al. | Apr 2004 | A1 |
20040172241 | Mahe et al. | Sep 2004 | A1 |
20040230428 | Choi | Nov 2004 | A1 |
20050119882 | Bou-Ghazale | Jun 2005 | A1 |
20050143989 | Jelinek | Jun 2005 | A1 |
20060136203 | Ichikawa | Jun 2006 | A1 |
20060271362 | Katou et al. | Nov 2006 | A1 |
20070027685 | Arakawa et al. | Feb 2007 | A1 |
20070055505 | Doclo et al. | Mar 2007 | A1 |
20070106504 | Deng et al. | May 2007 | A1 |
Number | Date | Country |
---|---|---|
7-191689 | Jul 1995 | JP |
11-327593 | Nov 1999 | JP |
2003-507764 | Feb 2003 | JP |
2003-216180 | Jul 2003 | JP |
2004-520616 | Jul 2004 | JP |
2005-84653 | Mar 2005 | JP |
0113364 | Feb 2001 | WO |
Entry |
---|
S. Kamath and P. Loizou, “A Multi-Band Spectral Subtraction Method for Enhancing Speech Corrupted by Colored Noise,” in Proceedings of ICASSP, 2002. |
R. Martin, “Speech Enhancement Using MMSE Short Time Spectral Estimation with Gamma Distributed Speech Priors,” in Proc. IEEE Intl. Conf. Acoustics, Speech, Signal Processing (ICASSP), vol. I, pp. 253-256, 2002. |
Japanese Office Action dated Nov. 4, 2009 with partial English-language translation. |
Takayuki Arakawa, “Model-Based Wiener Filter for noise robust speech recognition,” IEICE Technical Report, vol. 2005, No. 127, pp. 151-152, Dec. 22, 2005, The Institute of Electronics, Information and Communication Engineers, Japan. |
Hiroshi Matsumoto, “Speech Recognition Techniques for Noisy Environments”, Information Science Technological Forum FIT2003, Sep. 10, 2003. |
Y. Ephraim and D. Malah, “Speech Enhancement Using a Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator,” IEEE Trans. on ASSP, vol. ASSP-32, No. 6, pp. 1109-1121, Dec. 1984. |
M.J.F. Gales and S.J. Young, “Robust Continuous Speech Recognition Using Parallel Model Combination”, IEEE Trans. SAP-4, No. 5, pp. 352-359, Sep. 1996. |
J.C. Segura, A. de la Torre, M.C. Benitez and A.M. Peinado, “Model-Based Compensation of the Additive Noise for Continuous Speech Recognition Experiments Using Aurora II Database and Tasks,” Eurospeech '01, vol. 1, pp. 221-224, 2001. |
Rainer Martin, “Noise Power Spectral Density Estimation Based on Optimal Smoothing and Minimum Statistics,” IEEE Trans. on Speech and Audio Processing, vol. 9, No. 5, Jul. 2001. |
ETSI ES 202 050 V1.1.1, “Speech Processing, Transmission and Quality aspects (STQ); Distributed speech recognition; Advanced front-end feature extraction algorithm; Compression algorithms,” 2002. |
Guorong Xuan, Wei Zhang and Peiqi Chai, “EM Algorithms of Gaussian Mixture Model and Hidden Markov Model,” IEEE International Conference on Image Processing (ICIP 2001), vol. 1, pp. 145-148, Oct. 2001. |
Number | Date | Country |
---|---|---|
20070027685 A1 | Feb 2007 | US |