This invention relates generally to the field of adaptive signal processing for human speech, particularly to the use of adaptive filters for the enhancement of speech signals against background noise.
The ability of a person to understand speech is greatly limited if background noise is present. A person with normal hearing can generally comprehend noisy speech as long as the power of the noise is less than the power of the speech signal. If the power of the noise is greater than that of the speech signal, the speech will not be understood. A person with hearing impairment is much more affected by noise than a person with normal hearing. For most people with hearing loss, the slightest noise is enough to prevent speech understanding. The purpose of the present invention is to enhance speech signals in the presence of background noise, that is, to reduce the noise amplitude while retaining the speech volume and intelligibility. Applications of the present invention include improved hearing aids and hearing devices for people with hearing impairment, and speech processing and communication equipment designed to deliver clear, understandable speech from noisy speech signals.
It is an object of this invention to provide systems that reduce the noise of noisy speech signals while preserving the intelligibility of the speech. These systems take advantage of the differences that exist between human speech and additive noise. Speech is predictable over short periods of time, and noise, being wideband, is much less predictable. An adaptive predictor is used to separate speech and noise. The predictor is made to adapt rapidly in real time to the nuances of the speech.
Human speech is highly nonstationary from a statistical viewpoint. A speech predictor needs to be adaptive in order to adjust to the varying character of the speech signal. Rapid adaptation is necessary since substantial changes in the predictor need to take place during the time span of an individual spoken word.
The input signal to the adaptive predictor is noisy speech. The output signal is the speech, with the noise greatly attenuated. The speech is enhanced relative to the noise because it is much more predictable than the noise.
The foregoing and other objects of the invention will be more clearly understood from the following detailed description when read in conjunction with the accompanying drawings, wherein:
Referring now to
These signals are multiplied, or weighted, by the weights $w_{1k}, w_{2k}, \ldots, w_{nk}$. The weight vector is represented by:
$$W_k = [w_{1k}, w_{2k}, \ldots, w_{nk}]^T$$
The number of weights is n. The ADC 26 samples the input regularly in time, and the time index or sample time number is k. The weighted signals are summed by the summer 15 to provide a weighted sum signal $y_k$, 29. The weighted sum $y_k$ can be written as the inner product of the input signal vector $X_k$ and the weight vector $W_k$. That is,
$$y_k = X_k^T W_k$$
The filter output signal 2 is obtained from $y_k$ by digital-to-analog conversion, by DAC 27. The DAC includes an analog low pass filter, so that output 2 is a continuous signal. A desired response signal 3 is generally supplied as a training signal. Subtracting the filter output signal 2 from the desired response 3 gives an error signal 21 that is used by the adaptive algorithm to train or adapt the weights. The error signal 21 is digitized by ADC 28 to form the discrete error signal $e_k$, 20, for the adaptive algorithm. The mean square of the error is known to be a quadratic function of the weights. This function has a global minimum and no local minima. The method of steepest descent is generally used to find the global optimum iteratively.
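As a minimal sketch of the adaptive linear combiner described above, the following Python forms the input vector, the weighted sum, and the error for one sample time. It assumes the input signals are the taps of a tapped delay line (the figure is not reproduced here, so that arrangement is an assumption), and `tap_vector` and `combiner_output` are hypothetical helper names.

```python
from typing import List, Sequence


def tap_vector(x: Sequence[float], k: int, n: int) -> List[float]:
    """Return X_k = [x_k, x_{k-1}, ..., x_{k-n+1}] (assumed tapped delay line).

    Samples before the start of the record are taken as zero.
    """
    return [x[k - i] if k - i >= 0 else 0.0 for i in range(n)]


def combiner_output(X: Sequence[float], W: Sequence[float]) -> float:
    """Weighted sum y_k = X_k^T W_k, the inner product of input and weights."""
    return sum(xi * wi for xi, wi in zip(X, W))


x = [1.0, 2.0, 3.0]              # sampled input x_0, x_1, x_2
W = [0.5, -1.0]                  # n = 2 weights
X2 = tap_vector(x, 2, len(W))    # [x_2, x_1] = [3.0, 2.0]
y2 = combiner_output(X2, W)      # 0.5*3.0 + (-1.0)*2.0 = -0.5
d2 = 0.0                         # desired response d_k at k = 2
e2 = d2 - y2                     # error e_k = d_k - y_k = 0.5
```

In the analog-input filter the ADCs would supply `x` and the desired response; here they are plain lists.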
The most widely used adaptive algorithm in the world is the LMS algorithm of Widrow and Hoff (see B. Widrow and S. D. Stearns, “Adaptive Signal Processing”, New Jersey: Prentice-Hall, Inc., 1985, incorporated herein by reference). This algorithm was invented in 1959 and patented by B. Widrow and M. E. Hoff, Jr. under U.S. Pat. No. 3,222,654. LMS is an iterative algorithm based on the method of steepest descent, and it is given by
$$W_{k+1} = W_k + 2\mu e_k X_k$$
where
$$e_k = d_k - y_k.$$
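The LMS update can be illustrated with a small system-identification experiment. This is a sketch under assumed conditions: the "unknown" two-tap system with coefficients 0.8 and -0.3, the unit-power white Gaussian input, and the step size are illustrative choices, not values taken from the invention.

```python
import random
from typing import List


def lms_identify(n_samples: int, mu: float, seed: int = 0) -> List[float]:
    """Adapt two weights with the LMS rule W_{k+1} = W_k + 2*mu*e_k*X_k.

    Illustrative setup (assumed): d_k = 0.8*x_k - 0.3*x_{k-1} with x_k
    unit-power white Gaussian noise, so the ideal weights are [0.8, -0.3].
    """
    rng = random.Random(seed)
    w = [0.0, 0.0]
    x_prev = 0.0
    for _ in range(n_samples):
        x_now = rng.gauss(0.0, 1.0)
        X = [x_now, x_prev]                        # input vector X_k
        d = 0.8 * x_now - 0.3 * x_prev             # desired response d_k
        y = sum(xi * wi for xi, wi in zip(X, w))   # output y_k = X_k^T W_k
        e = d - y                                  # error e_k = d_k - y_k
        w = [wi + 2.0 * mu * e * xi for wi, xi in zip(w, X)]
        x_prev = x_now
    return w


# mu * trace(R) = 0.01 * 2 = 0.02, well inside the stability range
weights = lms_identify(n_samples=5000, mu=0.01)
```

Each update uses only the current error and input vector, which is why LMS is cheap enough to run at audio sampling rates.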
The parameter μ is chosen to control the rate of convergence and stability. When μ has a small value, convergence is slow, and the algorithm causes the weight vector to converge in the mean to a Wiener solution, the best linear least squares solution $W^*$, given by
$$W^* = R^{-1} P$$
where
$$R = E[X_k X_k^T]$$
and
$$P = E[d_k X_k]$$
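To make the Wiener solution concrete, the sketch below estimates R and P by time averages for a hypothetical two-tap example (a little independent noise is added to the desired response) and solves W* = R^{-1}P with the explicit 2x2 inverse. With a small μ, LMS weights would be expected to hover around this solution.

```python
import random
from typing import List


def wiener_two_tap(n_samples: int, seed: int = 1) -> List[float]:
    """Estimate R = E[X_k X_k^T] and P = E[d_k X_k] by time averages, then
    solve W* = R^{-1} P. Assumed example: X_k = [x_k, x_{k-1}] with white
    x_k, and d_k = 0.8*x_k - 0.3*x_{k-1} plus a small independent noise."""
    rng = random.Random(seed)
    r00 = r01 = r11 = p0 = p1 = 0.0
    x_prev = 0.0
    for _ in range(n_samples):
        x_now = rng.gauss(0.0, 1.0)
        d = 0.8 * x_now - 0.3 * x_prev + rng.gauss(0.0, 0.1)
        r00 += x_now * x_now      # accumulates E[x_k^2]
        r01 += x_now * x_prev     # accumulates E[x_k x_{k-1}]
        r11 += x_prev * x_prev    # accumulates E[x_{k-1}^2]
        p0 += d * x_now           # accumulates E[d_k x_k]
        p1 += d * x_prev          # accumulates E[d_k x_{k-1}]
        x_prev = x_now
    r00, r01, r11, p0, p1 = (v / n_samples for v in (r00, r01, r11, p0, p1))
    det = r00 * r11 - r01 * r01
    # Explicit inverse of the symmetric 2x2 matrix [[r00, r01], [r01, r11]]
    return [(r11 * p0 - r01 * p1) / det, (r00 * p1 - r01 * p0) / det]


w_star = wiener_two_tap(20000)   # expected to be close to [0.8, -0.3]
```

For longer filters one would solve R W = P with a linear solver rather than an explicit inverse; the 2x2 case just keeps the sketch dependency-free.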
The algorithm is stable as long as 0 < μ trace R < 1. This is the condition for convergence of the variance of the weight vector. Various proofs of convergence and formulas for speed of convergence are given in the literature. With μ chosen so that μ trace R = 0.1, a typical convergence time of an adaptive filter would be a number of sample periods equal to ten times the number of weights n, or about ten times the length of the filter impulse response. This rate of convergence is suitable for the adaptive filter used with this invention.
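The stability condition and the convergence estimate translate into a simple recipe for picking μ. This sketch assumes a tapped-delay-line filter on a stationary input, so that trace R equals n times the input power. The settling estimate uses the Widrow and Stearns rule of thumb that the learning-curve time constant is roughly n/(4 μ trace R) samples (most accurate when the eigenvalues of R are similar), so about four time constants is n/(μ trace R), which is ten times n when μ trace R = 0.1. The input power of 0.5 below is a hypothetical value.

```python
def lms_step_size(mu_trace_r: float, n_weights: int, input_power: float) -> float:
    """Return mu such that mu * trace(R) equals mu_trace_r.

    Assumes a tapped delay line on a stationary input, so each diagonal
    element of R is E[x_k^2] and trace(R) = n_weights * input_power.
    """
    if not 0.0 < mu_trace_r < 1.0:
        raise ValueError("stability requires 0 < mu * trace(R) < 1")
    return mu_trace_r / (n_weights * input_power)


def settling_samples(mu_trace_r: float, n_weights: int) -> float:
    """About four learning-curve time constants, using the rule of thumb
    tau ~ n / (4 * mu * trace(R)); this gives n / (mu * trace(R)) samples."""
    return n_weights / mu_trace_r


n = 256
mu = lms_step_size(0.1, n, input_power=0.5)   # hypothetical input power
t_settle = settling_samples(0.1, n)           # ten times n, i.e. 2560 samples
```

Specifying the target as μ trace R rather than μ itself keeps the convergence behavior roughly independent of input level and filter length.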
Many algorithms other than LMS exist for adapting the weights and can be used with the present invention. The literature is extensive. An excellent summary is given by S. Haykin, “Adaptive Filter Theory”, Third Edition, Prentice-Hall, Englewood Cliffs, N.J., 1996, incorporated herein by reference. This book describes the recursive least squares (RLS) algorithm, which is often used to adapt an adaptive filter having either a tapped delay line or a lattice architecture.
The adaptive filter of
An analog-input, analog-output adaptive filter is desirable for most of the circuits of the present invention. If, however, the input to the adaptive filter is already in digital form and a digital output is desired, then ADCs 26 and 28 and DAC 27 can be eliminated. In that case, the sampling rate of the data signals flowing through the adaptive filter would need to be synchronized with the clock rate of the adaptive filter itself.
The adaptive filter of
In
The adaptive predictor is described in the Widrow and Stearns book, Chapter 12. FIG. 12.36 of this book shows the adaptive predictor as it would be used to separate wideband noise from a noisy periodic signal. This invention uses the adaptive predictor to separate wideband noise from a noisy speech signal. Human speech is of course very different from a periodic signal. These two applications of the adaptive predictor differ in how the adaptive filter is used and how the predictor is configured.
A periodic signal is perfectly predictable, and its statistical properties are stable, or stationary, over time. Human speech, on the other hand, is not perfectly predictable, and its statistical properties are highly nonstationary. Speech can be predicted over a short time, not perfectly but to a good approximation, and the further into the future one tries to predict it, the poorer the approximation becomes. A periodic signal, by contrast, can be predicted perfectly as far into the future as desired. Wideband noise, unlike both a periodic signal and human speech, is essentially unpredictable: it can be predicted only approximately, and only about as far into the future as the reciprocal of its bandwidth. Noise with a large bandwidth can therefore be predicted only a very short time into the future. Prediction thus provides a mechanism for separating periodic signals, and speech signals, from wideband additive noise. When using a predictor to separate signals from background noise, one must choose how far into the future the predictor should predict. For the adaptive predictor of
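The contrast in predictability can be seen in sample autocorrelations, since a linear predictor can only exploit correlation between past and future samples. In this sketch, band-limited noise is modeled (an assumption for illustration) as white noise passed through a 4-point moving average, whose autocorrelation falls to zero after 4 lags, roughly the reciprocal of its bandwidth; a sinusoid with a period of 20 samples stands in for a periodic signal.

```python
import math
import random
from typing import List


def autocorr(x: List[float], lag: int) -> float:
    """Sample autocorrelation at `lag`, normalized by the zero-lag energy."""
    num = sum(x[k] * x[k + lag] for k in range(len(x) - lag))
    return num / sum(v * v for v in x)


rng = random.Random(2)
N = 4000
L = 4  # moving-average length; bandwidth is roughly fs / L (assumed model)
white = [rng.gauss(0.0, 1.0) for _ in range(N + L)]
noise = [sum(white[k:k + L]) / L for k in range(N)]        # band-limited noise
tone = [math.sin(2 * math.pi * k / 20) for k in range(N)]  # period of 20 samples

for lag in (1, 2, 8, 20):
    print(lag, round(autocorr(noise, lag), 2), round(autocorr(tone, lag), 2))
```

The noise correlation decays to about zero within a few samples, whereas the sinusoid remains fully correlated one period later, so a predictor looking further ahead than the noise correlation time keeps the signal and loses the noise.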
The adaptive predictor functions in the following way. To make the error 21 small, which is accomplished by the adaptive algorithm in the adaptive filter, the adaptive filter 25 cascaded with the delay 35 must produce an output signal 2 that is close to the predictor input signal 3. This corresponds to the adaptive filter and the delay 35 together having a transfer characteristic approximating unity gain. For this to happen, the adaptive filter would need to reverse the effects of the delay, i.e., to create an output 2 that is a predicted version of the adaptive filter input 1. The prediction would be Δ units of time into the future, an amount of time equal to the delay time.
The above is an intuitive explanation of how the adaptive predictor functions. A mathematical analysis of the predictor with noisy periodic inputs is given in the Widrow and Stearns book. No mathematical analysis yet exists for the behavior of the adaptive predictor with noisy speech inputs.
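The predictor configuration described above (a delay in front of the adaptive filter, the undelayed input as the desired response, and the filter output taken as the enhanced signal) can be sketched in Python using LMS. Since no speech recording is assumed here, the demonstration substitutes a sinusoid in white noise, the classic periodic case; the filter length of 32, the one-sample delay, and μ trace R = 0.05 are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence


def adaptive_predictor(x: Sequence[float], n_weights: int, delay: int,
                       mu: float) -> List[float]:
    """LMS adaptive predictor.

    The adaptive filter sees the input delayed by `delay` samples and is
    trained so that its output matches the undelayed input x_k (the desired
    response). The filter output y_k, i.e. the part of x predictable `delay`
    samples ahead, is returned as the enhanced signal.
    """
    w = [0.0] * n_weights
    out = []
    for k in range(len(x)):
        # X_k = [x_{k-delay}, ..., x_{k-delay-n+1}], zero before the record
        X = [x[k - delay - i] if k - delay - i >= 0 else 0.0
             for i in range(n_weights)]
        y = sum(xi * wi for xi, wi in zip(X, w))
        e = x[k] - y                          # prediction error
        w = [wi + 2.0 * mu * e * xi for wi, xi in zip(w, X)]
        out.append(y)
    return out


rng = random.Random(3)
N = 10000
s = [math.sin(2 * math.pi * k / 20) for k in range(N)]   # clean stand-in signal
x = [sk + rng.gauss(0.0, 1.0) for sk in s]              # noisy input, about -3 dB SNR
n = 32
mu = 0.05 / (n * 1.5)        # mu * trace(R) = 0.05; input power about 0.5 + 1.0
y = adaptive_predictor(x, n_weights=n, delay=1, mu=mu)

start = N // 2               # score only after the weights have settled
in_err = sum((x[k] - s[k]) ** 2 for k in range(start, N)) / (N - start)
out_err = sum((y[k] - s[k]) ** 2 for k in range(start, N)) / (N - start)
print(f"noise power at input {in_err:.3f}, error power at output {out_err:.3f}")
```

Because the sinusoid is predictable one sample ahead and the white noise is not, the output error measured against the clean sinusoid should be well below the input noise power once the weights have settled. For speech, the same loop would be run with the faster adaptation and the parameter ranges discussed for speech enhancement.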
For speech enhancement, the delay 35 should be long enough to decorrelate the noise contained in the filter input signal 1 from the noise contained in the desired response signal 3. A good choice of delay would be several times the reciprocal of the noise bandwidth. With a sampling rate of 22 kHz in the adaptive filter, for example, a typical delay would be from 1 to 20 sampling periods. A good choice for the number of weights of the adaptive filter would be from 64 to 512, and a good choice for the parameter μ would be one for which μ trace R ranges from 0.05 to 0.25. Parameter choices within these ranges are not critical; good performance is obtained within them for a wide variety of input signal-to-noise ratios.
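The parameter ranges above are easier to interpret as durations. This sketch converts them at the 22 kHz rate and also estimates a decorrelating delay, assuming (for illustration only) wideband noise that occupies the full 11 kHz band up to the Nyquist frequency and a delay of three times the reciprocal bandwidth.

```python
FS_HZ = 22_000.0   # sampling rate from the example in the text


def samples_to_ms(n_samples: float, fs_hz: float = FS_HZ) -> float:
    """Convert a count of sampling periods to milliseconds."""
    return 1000.0 * n_samples / fs_hz


def decorrelating_delay(noise_bw_hz: float, multiple: float,
                        fs_hz: float = FS_HZ) -> float:
    """Delay of `multiple` times the reciprocal noise bandwidth, in samples."""
    return multiple * fs_hz / noise_bw_hz


print(f"delay 1-20 samples: {samples_to_ms(1):.3f}-{samples_to_ms(20):.3f} ms")
print(f"64-512 weights: {samples_to_ms(64):.2f}-{samples_to_ms(512):.2f} ms")
# Assumed: noise fills the band up to Nyquist (11 kHz), delay = 3 / bandwidth
print(decorrelating_delay(11_000.0, 3.0))   # 6.0 samples
```

The delay is a fraction of a millisecond, far shorter than the filter memory of a few to about 23 ms, so the predictor looks only slightly ahead while spanning many pitch periods of speech.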
With μ trace R set to 0.1, substantial variation takes place in the weights (in the impulse response) of the adaptive filter during the time period of an individual spoken word. This variation is the key to speech enhancement. Experiments were performed using optimal weight settings for best least squares prediction over phrases of noisy speech. The Wiener solution obtained for each phrase gave the set of weights that produced the best prediction averaged over that phrase. When the weights were fixed at the Wiener solution and the noisy speech phrase was played through the predictor, the output was as noisy as the input. But when the noisy speech was played through the adaptive predictor, which was free to adapt to the speech in real time, substantial noise reduction was obtained. What is needed for speech enhancement is adaptive filtering that provides short-term, nonstationary Wiener solutions that vary as the words are spoken. These solutions are obtained in real time by the adaptive predictor of
The adaptive predictor has been used in the past to enhance periodic signals against wideband additive noise. For this purpose, the adaptive filter is used to obtain long-term Wiener solutions, which is done by making μ trace R much smaller, generally less than 0.01. Speech enhancement requires much faster adaptation, and this difference is critically important.
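The difference between the two regimes shows up directly in the learning time constant. Using the Widrow and Stearns rule of thumb τ ≈ n/(4 μ trace R) samples, and an assumed 256-weight filter at 22 kHz, the sketch below compares μ trace R = 0.1 with the periodic-signal setting of 0.01.

```python
def time_constant_ms(n_weights: int, mu_trace_r: float,
                     fs_hz: float = 22_000.0) -> float:
    """Learning-curve time constant tau ~ n / (4 * mu * trace(R)) samples
    (Widrow and Stearns rule of thumb), expressed in milliseconds."""
    return 1000.0 * (n_weights / (4.0 * mu_trace_r)) / fs_hz


fast = time_constant_ms(256, 0.1)    # speech-enhancement setting: about 29 ms
slow = time_constant_ms(256, 0.01)   # periodic-signal setting: about 291 ms
```

A tenfold reduction in μ trace R makes the filter tenfold slower, which averages the weights over a much longer stretch of signal than a single spoken word.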
This invention represents a new idea for speech enhancement in the presence of background noise, based on fast adaptive prediction. In the adaptive predictor, the adaptive filter acts as a least-squares statistical predictor of its input signal, predicting Δ units of time into the future. The output signal contains the predictable components of the input signal. An input signal composed of speech and additive uncorrelated noise has a relatively unpredictable component, the noise, and a much more predictable component, the speech. The noise is blocked by the adaptive filter, and the speech propagates through it with a small amount of distortion. Experiments have shown that when the input is speech without noise, the output is speech with essentially no distortion. When the input SNR is 0 dB (speech and noise having equal powers), the speech at the input is intelligible only if one listens carefully, but it is easily understood at the predictor output. The output speech signal has the same amplitude as the input speech signal, but the noise is almost gone. When the input SNR is −10 dB, the noise is so great that one is barely aware that someone is speaking when listening to the input, but one can detect speech and even understand what is being said when listening to the predictor output. When the input SNR is −20 dB, one cannot detect speech when listening to the input, but it is easy to detect speech and even understand some of the words at the predictor output.
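The input SNR figures above follow the usual definition SNR = 10 log10(P_speech / P_noise), so -10 dB means the noise power is ten times the speech power and -20 dB a hundred times. A test input at a chosen SNR can be built by scaling the noise before adding it; `mix_at_snr` is a hypothetical helper, and the sinusoid below merely stands in for a speech recording.

```python
import math
import random
from typing import List, Sequence, Tuple


def mix_at_snr(speech: Sequence[float], noise: Sequence[float],
               snr_db: float) -> Tuple[List[float], float]:
    """Scale `noise` so that 10*log10(P_speech / P_noise) equals snr_db,
    add it to `speech`, and return the mixture and the noise gain used."""
    p_s = sum(v * v for v in speech) / len(speech)
    p_n = sum(v * v for v in noise) / len(noise)
    gain = math.sqrt(p_s / (p_n * 10.0 ** (snr_db / 10.0)))
    return [s + gain * v for s, v in zip(speech, noise)], gain


rng = random.Random(4)
speech = [math.sin(2 * math.pi * k / 50) for k in range(8000)]  # stand-in only
noise = [rng.gauss(0.0, 1.0) for _ in range(8000)]
noisy, g = mix_at_snr(speech, noise, snr_db=-10.0)  # noise power = 10x speech power
```

Real speech has silent gaps, so SNR conventions based on active speech level would give somewhat different scaling than this whole-record average.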
Further enhancement of speech against background noise can be made with the system diagrammed in
Sometimes the noise of noisy speech contains periodic as well as broadband components. The adaptive predictor of
To prevent the canceller from canceling speech signals along with the periodic noise, the delay 50 must be long enough to ensure that the speech components at the adaptive filter input 56 are not correlated with the speech components of the input signal 55. A delay 50 of several seconds or more will do this. Such a delay will not decorrelate the periodic noise components of 56 from those of 55, so the periodic noise will be canceled. The periodic noise canceller works like a notch filter, automatically forming notches at the fundamental and harmonic frequencies of the periodic noise. When operating at 22 kHz, a noise canceller having 1024 weights has an adaptive filter impulse response duration of about 0.0465 sec (1024/22,000). The width of each notch is approximately the reciprocal of the impulse response duration, or about 21.5 Hz. Because the notches are this narrow, they do not significantly harm the spectrum of the speech signal, whose bandwidth is about 200 times that of a single notch. The adaptive canceller therefore works well and does not significantly distort the speech signal.
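The notch-width arithmetic can be checked directly, taking the notch width to be approximately the reciprocal of the impulse-response duration n/f_s as stated above, with 1024 weights at 22 kHz.

```python
from typing import Tuple


def notch_width_hz(n_weights: int, fs_hz: float) -> Tuple[float, float]:
    """Return (impulse-response duration in s, approximate notch width in Hz).

    The notch width is approximated as the reciprocal of the duration n / fs.
    """
    duration_s = n_weights / fs_hz
    return duration_s, 1.0 / duration_s


duration, width = notch_width_hz(1024, 22_000.0)
print(f"duration {duration:.4f} s, notch width {width:.1f} Hz")
print(f"200 notch widths span {200 * width:.0f} Hz")
```

Doubling the number of weights would halve the notch width, at the cost of slower adaptation and more computation per sample.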
Signal 3 consists of wideband noise plus speech. The adaptive predictor reduces or removes the wideband noise, so that the output 2 is enhanced speech.
In the cascade of the periodic noise canceller and adaptive predictor shown in
All of the methods described above for enhancement of speech against additive noise can be used to improve the performance of hearing aids. The adaptive system shown in
The speech enhancement methods described above could also be used to improve the performance of cellular phones used in noisy environments, such as in an automobile, in a restaurant, or outdoors when windy. The speech enhancing system could be incorporated within the cell phone housing and connected anywhere between the microphone output and the input to the modulator. This would make it easier for the person at the other end of the call to understand what is being said under noisy circumstances. The same methodology could be used to improve speech quality with computer microphones, conference room microphones, news reporting microphones, etc.
The above description is based on preferred embodiments of the present invention; however, it will be apparent that modifications and variations thereof could be effected by one with skill in the art without departing from the spirit or scope of the invention, which is to be determined by the following claims.
This application claims priority to Provisional Application Ser. No. 60/509,315 filed Oct. 6, 2003.
Number | Date | Country
---|---|---
60509315 | Oct 2003 | US