1. Field of the Invention
The present invention relates to reducing the level of noise in a speech signal.
2. Background Art
Electrical renditions of human speech are increasingly used for inter-person communication, for storing speech, and for man-machine interfaces. One limit on the comprehensibility of speech signals is the amount of noise intermixed with the speech. A wide variety of techniques have been proposed to reduce the amount of noise contained in speech signals. Many of these techniques are not practical because they assume information that is not readily available, such as the noise characteristics, the location of noise sources, precise speech characteristics, and the like.
One technique for reducing noise is to filter the noisy speech signal. This may be accomplished by converting the speech signal into its frequency domain equivalent, multiplying the frequency domain signal by the desired filter, and then converting back to a time domain signal. Converting between time domain and frequency domain representations is commonly accomplished using a fast Fourier transform and an inverse fast Fourier transform. Alternatively, the speech signal may be broken into subbands and a gain applied to each subband. The amplified or attenuated subbands are then combined to produce the filtered speech signal. In either case, filter or gain parameters must be calculated. This calculation depends upon determining characteristics of the noise contaminating the speech signal.
Typically, speech contains quiet periods when only the noise component appears in the speech signal. Quiet periods occur naturally when the speaker pauses or takes a breath. A voice activity detector (VAD) may be used to detect the presence of speech in a speech signal. In use, a VAD is connected to the noisy speech signal. The output of the VAD signals parameter calculation logic when speech is occurring in the input signal. One problem with using a VAD is that the VAD is typically complex if the speech signal contains widely varying levels of noise.
What is needed is to produce improved speech signals in the presence of varying levels of noise without requiring complex logic for calculating noise reducing coefficients.
The present invention detects the presence of speech in a filtered speech signal for the purpose of suspending noise floor level calculations during periods of speech.
A method for reducing noise in a speech signal is provided. A noise floor in a received speech signal is estimated. The received speech signal is split into a plurality of subband signals. A subband variable gain is determined for each subband based on the noise floor estimation and on the subband signals. Each subband signal is multiplied by the subband variable gain for that subband. The scaled subband signals are combined to produce an output voice signal. The presence of speech is determined in a filtered voice signal. Noise floor estimation is suspended during periods when speech is determined to be present in the filtered voice signal.
The filtered voice signal may be the output voice signal. Alternatively, the filtered voice signal may be determined by multiplying each subband signal by a speech determination subband gain different from the corresponding subband variable gain. The products of the subband signals with their speech determination subband gains are combined to produce the filtered voice signal. This results in one path for enhanced speech and another, lower quality path for voice detection.
In an embodiment of the present invention, the method further includes decimation of each subband signal prior to multiplication by the subband variable gain and interpolation of the subband signal following multiplication by the subband variable gain.
In another embodiment of the present invention, each subband variable gain is determined as a ratio of a noisy speech level to the noise floor level. At least one of the noisy speech level and the noise floor level may be determined as a decaying average of levels expressed by a time constant. The time constant value may be based on a comparison of a previous level with a current level.
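By way of illustration only, a decaying average whose coefficient depends on a comparison of the previous level with the current level can be sketched as follows. The function names, the coefficient values, and the specific rule (use an attack coefficient when the current magnitude exceeds the previous level, otherwise a decay coefficient) are assumptions of this sketch and are not taken from the specification.

```python
def smooth_level(prev_level, current_abs, attack_coeff, decay_coeff):
    """One step of a decaying average with a direction-dependent coefficient.

    Hypothetical rule: a rising input uses attack_coeff, a falling or
    unchanged input uses decay_coeff.
    """
    coeff = attack_coeff if current_abs > prev_level else decay_coeff
    return coeff * prev_level + (1.0 - coeff) * current_abs


def track_level(samples, attack_coeff=0.5, decay_coeff=0.99):
    """Run the smoother over a sequence, starting from a level of zero."""
    level = 0.0
    levels = []
    for sample in samples:
        level = smooth_level(level, abs(sample), attack_coeff, decay_coeff)
        levels.append(level)
    return levels
```

A smaller attack coefficient lets the level rise quickly at a speech onset, while a coefficient near one makes it fall slowly afterward.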
In yet another embodiment of the present invention, the method further includes determining a state based on the estimated noise floor. The subband variable gain is determined for each subband based on the determined state.
In still another embodiment of the present invention, each subband variable gain is determined as a ratio of a noisy speech level to a noise floor level. The noise floor level is determined as a decaying average of noise floor levels. Determination of the noise floor level is suspended during periods when speech is determined to be present in the filtered voice signal.
A system for reducing noise in an input speech signal is also provided. The system includes an analysis filter bank accepting the speech signal. The analysis filter bank includes a plurality of filters, each filter extracting a subband signal from the speech signal. The system also includes a plurality of variable gain multipliers. Each variable gain multiplier multiplies one subband signal by a subband variable gain to produce a subband product signal. A synthesizer accepts the subband product signals and generates a reduced noise speech signal. A voice activity detector detects the presence of speech in the reduced noise speech signal. Gain calculation logic determines a noise floor level based on the input speech signal if the presence of speech is not detected and holds the noise floor level constant if the presence of speech is detected. The subband variable gains are determined based on the noise floor level.
Another system for reducing noise in an input speech signal is provided. The system includes an analysis filter bank extracting subband signals from the input speech signal. A variable gain multiplier for each subband multiplies the subband signal by a subband variable gain to produce a subband product signal. A speech signal synthesizer accepts the plurality of subband product signals and generates a reduced noise speech signal. The system also includes a plurality of speech detection multipliers. Each speech detection multiplier multiplies one subband signal by a speech detection subband gain to produce a detection subband signal. A voice detection synthesizer accepts the plurality of detection subband signals and generates a speech detection signal. A voice activity detector detects the presence of speech in the speech detection signal. Gain calculation logic generates the subband variable gains based on the detected presence of speech.
The above objects and other objects, features, and advantages of the present invention are readily apparent from the following detailed description of the best mode for carrying out the invention when taken in connection with the accompanying drawings.
Referring to
Subband filters 26 may be constructed by a variety of means as is known in the art. Subband filters 26 may be implemented as a uniform filter bank. Subband filters 26 may also be implemented as a wavelet filter bank, DFT filter bank, filter bank based on the BARK scale, octave filter bank, and the like. The first subband filter 26, indicated by H1(n), may be a low pass filter or a band pass filter. The last subband filter, indicated by HL(n), may be a high pass filter or a band pass filter. Other subband filters 26 are typically band pass filters.
Subband signals 28 are received by gain section 30, which modifies the gain of each subband signal 28 by a gain factor 32. Within each subband, multiplier 34 accepts subband signal 28 and gain 32 and generates product signal 36. As will be recognized by one of ordinary skill in the art, multiplier 34 may be implemented by a variety of means such as, for example, by a hardware multiplication circuit, by multiplication in software, by shift-and-add operations, with a transconductance amplifier, and the like.
Synthesis section 38 accepts product signal 36 and generates output voice signal y′(n) 40. In the embodiment shown, synthesis section 38 is implemented with summer 42. Synthesis section 38 may also be implemented with a synthesis filter bank to improve performance.
By properly selecting the number of subbands 28, frequency range of subband filters 26 and gains 32, the effect of noise in input speech signal 22 can be greatly reduced in output voice signal 40.
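By way of illustration only, the analysis, gain, and summation structure described above can be sketched with a toy two-band split. The moving-average low band and its complementary high band are stand-ins chosen for brevity and are assumptions of this sketch; they are not the subband filters 26 of any particular embodiment, and the function names are hypothetical.

```python
def split_two_bands(x, taps=4):
    """Split x into a low band (causal moving average) and the complementary high band."""
    low = []
    for n in range(len(x)):
        window = x[max(0, n - taps + 1): n + 1]
        # Dividing by taps (not len(window)) treats samples before n=0 as zero.
        low.append(sum(window) / taps)
    high = [xn - ln for xn, ln in zip(x, low)]
    return [low, high]


def apply_gains_and_sum(subbands, gains):
    """Multiply each subband by its gain and sum the products sample by sample."""
    length = len(subbands[0])
    return [
        sum(gain * band[i] for band, gain in zip(subbands, gains))
        for i in range(length)
    ]
```

Because the two bands are complementary, unit gains reproduce the input exactly, and lowering one gain attenuates only that band.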
Referring now to
A synthesis/analysis system without decimation, as shown in
Referring now to
Preferably, variable gain 32 is calculated for the kth subband using the envelope of the subband noisy speech signal, Yk(n), and subband noise floor envelope, Vk(n). Equation 1 provides a formula for obtaining the envelope of subband signal 28 where |yk(n)| represents the absolute value of subband signal 28.
Yk(n)=αYk(n−1)+(1−α)|yk(n)|. (1)
The constant, α, is defined as shown in Equation 2:
where fs represents the sampling frequency of input speech signal 22, M is the down sampling factor, and speech_decay is a time constant that determines the decay time of the speech envelope. The initial value Yk(0) is set to zero. Similarly, the noise floor envelope may be expressed as in Equation 3:
Vk(n)=βVk(n−1)+(1−β)|yk(n)|. (3)
The constant, β, is defined as shown in Equation 4:
where noise_decay is a time constant that determines the decay time of the noise envelope.
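By way of illustration only, the envelope recursions of Equations 1 and 3 can be sketched as follows. The mapping from a time constant to a smoothing coefficient used here, exp(−1/((fs/M)·τ)) with τ in seconds, is a common choice and is an assumption of this sketch; the function names are hypothetical.

```python
import math


def smoothing_coeff(fs, m, time_constant):
    """Assumed mapping: coefficient = exp(-1 / ((fs / m) * time_constant)).

    fs is the sampling frequency in Hz, m the down sampling factor, and
    time_constant the decay time in seconds (an assumption of this sketch).
    """
    return math.exp(-1.0 / ((fs / m) * time_constant))


def envelope(subband, coeff):
    """First-order envelope: E(n) = coeff * E(n-1) + (1 - coeff) * |y(n)|."""
    level = 0.0  # initial value of zero, as stated for Yk(0)
    levels = []
    for sample in subband:
        level = coeff * level + (1.0 - coeff) * abs(sample)
        levels.append(level)
    return levels
```

A short time constant gives a coefficient well below one, so the envelope follows speech quickly; a long time constant gives a coefficient near one, so the envelope settles toward the slowly varying noise floor.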
The constants α and β can be implemented to allow different attack and decay time constants, as indicated in Equations 5 and 6:
where the subscript “a” indicates the attack time constant and the subscript “d” indicates the decay time constant. Example parameters are:
Once the values of Yk(n) and Vk(n) have been obtained, variable gain 32 for each subband may be computed as in Equation 7:
where the constant, γ, provides an estimate of the noise reduction. For example, if the speech and noise envelopes have approximately the same value as may occur, for example, during periods of silence, the gain factor becomes:
Thus, if γ=10, the noise reduction will be approximately 20 dB. In an embodiment of the present invention, values for gamma may be based on noise characteristics such as, for example, the level of noise in input speech signal 22. Also, a different gain factor, γk, may be used for each subband k. Typically, variable gain 32 is limited to magnitudes of one or less.
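By way of illustration only, a gain consistent with the description above (a ratio of the speech envelope to the noise floor envelope, scaled by γ and limited to one) can be sketched as follows. The exact form Y/(γ·V), the small guard against division by zero, and the function name are assumptions of this sketch.

```python
import math


def subband_gain(speech_env, noise_env, gamma=10.0, eps=1e-12):
    """Assumed gain form: min(1, Y / (gamma * V)).

    eps guards against a zero noise floor and is not part of the description.
    """
    gain = speech_env / (gamma * max(noise_env, eps))
    return min(gain, 1.0)


def gain_in_db(gain):
    """Express a linear gain in decibels."""
    return 20.0 * math.log10(gain)
```

With equal envelopes and γ=10 the gain is 0.1, which corresponds to the approximately 20 dB of attenuation stated above; when the speech envelope is well above γ times the noise floor, the limit holds the gain at one.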
Voice activity detector 74 may be implemented in a variety of manners as is known in the art. One difficulty with voice activity detectors commonly in use is that such detectors require complex logic in the presence of high or medium levels of noise. VAD 74 monitors output speech signal 40 for the presence of speech. Since much of the noise intermixed with input speech signal 22 has already been removed, the design of VAD 74 may be much simpler than if VAD 74 monitored input speech signal 22. One implementation of VAD 74 detects the presence of speech by examining the power in output speech signal 40. If the power level is above a preset threshold, speech is detected.
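By way of illustration only, a power-threshold detector of the kind described above can be sketched as follows. Processing in frames, using mean squared amplitude as the power measure, and the function names are assumptions of this sketch.

```python
def frame_power(frame):
    """Mean squared amplitude of a non-empty frame of samples."""
    return sum(sample * sample for sample in frame) / len(frame)


def power_vad(frame, threshold):
    """Report speech when the frame power exceeds a preset threshold."""
    return frame_power(frame) > threshold
```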
In another embodiment, VAD 74 may detect the presence of speech in output speech signal 40 by obtaining a signal-to-noise ratio. For example, the ratio of an output speech level envelope to an output noise floor estimation may be used, as shown in Equation 9:
where T is a threshold value and VAD is voice activity signal 76. Speech level envelope, Y′(n), and noise floor level envelope, V′(n), may be calculated as described above with regard to Equations 1–6. The threshold T may be chosen based on the noise floor estimation of the input signal. Hysteresis may also be used with the threshold.
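By way of illustration only, a ratio test with hysteresis can be sketched as follows. Asserting speech when Y′(n)/V′(n) exceeds the threshold, the use of a lower threshold once speech is already active, the threshold values, and the function name are assumptions of this sketch.

```python
def ratio_vad(speech_env, noise_env, prev_active, t_on=3.0, t_off=2.0, eps=1e-12):
    """Assumed decision: speech is present when Y'/V' exceeds a threshold.

    Hysteresis: a higher threshold (t_on) must be exceeded to enter the
    speech state, and a lower one (t_off) applies once speech is active.
    """
    ratio = speech_env / max(noise_env, eps)
    threshold = t_off if prev_active else t_on
    return ratio > threshold
```

Using two thresholds keeps the decision from toggling when the ratio hovers near a single threshold value.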
Problems can occur in a noise reduction system if voice is present in any subband signal 28 for an extended period of time. This problem can occur in continuous speech, which may be more common in certain languages and in signals from certain speakers. Continuous speech causes the noise floor envelope to grow. As a result, the gain factor for each subband, Gk(n), will be smaller than it should be, resulting in an undesirable attenuation in processed speech signal 40. This problem can be reduced if the update of the noise floor envelope estimation is halted during speech periods. In other words, when voice activity signal 76 is asserted, the value of Vk(n) is not updated. This operation is described in Equation 10 as follows:
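By way of illustration only, holding the noise floor envelope while speech is detected can be sketched as follows. The update branch reuses the recursion of Equation 3; the function and parameter names are hypothetical.

```python
def update_noise_floor(prev_floor, sample, beta, speech_active):
    """Update Vk(n) only while no speech is detected.

    When speech_active is true the previous value is held unchanged;
    otherwise V(n) = beta * V(n-1) + (1 - beta) * |y(n)|.
    """
    if speech_active:
        return prev_floor
    return beta * prev_floor + (1.0 - beta) * abs(sample)
```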
Referring now to
Separate analysis sections for generating speech detection signal 102 and for generating reduced noise speech signal 40 permit different characteristics to be used for each. For example, speech detection subband gains 94 may be different from subband variable gains 32 to better suit the task of detecting speech. Also, speech detection subband gains 94 and detection multipliers 92 may have different, typically lower, resolution requirements than subband variable gains 32 and variable gain multipliers 34.
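By way of illustration only, the two-path arrangement can be sketched as one set of subband signals combined twice with different gain sets. Summation as the combining step and the function names are assumptions of this sketch.

```python
def combine(subbands, gains):
    """Multiply each subband by its gain and sum the products sample by sample."""
    length = len(subbands[0])
    return [
        sum(gain * band[i] for band, gain in zip(subbands, gains))
        for i in range(length)
    ]


def two_path_outputs(subbands, variable_gains, detection_gains):
    """Return (reduced noise signal, speech detection signal) from shared subbands."""
    reduced_noise = combine(subbands, variable_gains)
    detection = combine(subbands, detection_gains)
    return reduced_noise, detection
```

The detection gains can be coarser than the variable gains, since the detection signal only feeds the voice activity decision and is never heard.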
Referring now to
ŷ(n)=y(n)−a1·y(n−1) (11)
where ŷ(n) is the output of preemphasis filter 112, y(n) is its input, and the constant a1 is typically between 0.96 and 0.99. Deemphasis filter 114 removes the effects of preemphasis filter 112. A corresponding deemphasis filter 114 may be described by Equation 12:
y′(n)=ỹ(n)+a1·y′(n−1) (12)
where ỹ(n) is the input to deemphasis filter 114. If necessary, more complex structures may be used to implement preemphasis filter 112 and deemphasis filter 114.
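By way of illustration only, a conventional first-order preemphasis and deemphasis pair, in which deemphasis is the exact inverse of preemphasis, can be sketched as follows. The non-recursive preemphasis form, the recursive deemphasis form, zero initial conditions, and the function names are assumptions of this sketch.

```python
def preemphasis(y, a1=0.97):
    """Conventional form: out(n) = y(n) - a1 * y(n-1), with y(-1) taken as zero."""
    out = []
    prev_in = 0.0
    for sample in y:
        out.append(sample - a1 * prev_in)
        prev_in = sample
    return out


def deemphasis(y_tilde, a1=0.97):
    """Inverse form: out(n) = y_tilde(n) + a1 * out(n-1), with out(-1) taken as zero."""
    out = []
    prev_out = 0.0
    for sample in y_tilde:
        prev_out = sample + a1 * prev_out
        out.append(prev_out)
    return out
```

Applying deemphasis directly to the output of preemphasis returns the original samples, so any change in the final signal comes only from the processing placed between the two filters.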
In real world applications, the characteristics of noise can change at any time. Further, the level of noise may vary widely from low noise conditions to high noise conditions. Differing noise conditions may be used to trigger different sets of parameters for calculating variable gains 32. Inappropriate selection of parameters may actually degrade performance of speech processing system 110. For example, in low noise conditions, an aggressive set of gain parameters may result in undesirable speech distortion in output speech signal 40.
Gain logic 78 may include state machine 116 and noise floor estimator 118 for determining gain calculation parameters. Fullband noise estimation 120 is obtained by subtracting delayed input signal 22 from filtered speech signal 102. This results in an amount of noise, extracted from noisy input 22, used by noise floor estimator 118 to generate an estimation of the noise floor present in input signal 22. The amount of delay, d, applied to input 22 compensates for the delay created by the subband structure. The noise floor estimation is updated only during periods of no speech in order to improve the estimation process. Noise floor estimator 118 may be described by Equation 13 as follows:
where V(n) is the envelope of extracted noise signal 120.
State machine 116 changes to one of P states based on noise floor signal 120 and thresholds T1, T2, . . . , Tp, as follows:
For each state p, different parameters such as γ, β, α, and the like, can be used in calculating gains 32. This allows more aggressive noise cancellation in higher levels of noise and less aggressive, less distorting noise cancellation during periods of low noise. In addition, hysteresis may be used in state transitions to prevent rapid fluctuations between states.
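By way of illustration only, selecting a state from the noise floor with hysteresis, and looking up a parameter set for that state, can be sketched as follows. The mapping of thresholds to states (state p when the noise floor lies between consecutive ascending thresholds), the hysteresis margin, the parameter values, and all names are assumptions of this sketch.

```python
# Hypothetical per-state parameters: larger gamma for higher noise levels.
STATE_PARAMS = [
    {"gamma": 2.0},   # state 0: low noise, mild reduction
    {"gamma": 5.0},   # state 1: medium noise
    {"gamma": 10.0},  # state 2: high noise, aggressive reduction
]


def select_state(noise_floor, thresholds, prev_state=0, margin=0.0):
    """Pick a state index from ascending thresholds, with optional hysteresis.

    The state rises only when the noise floor exceeds the next threshold by
    margin, and falls only when it drops below the current state's lower
    threshold by margin.
    """
    state = prev_state
    while state < len(thresholds) and noise_floor > thresholds[state] + margin:
        state += 1
    while state > 0 and noise_floor < thresholds[state - 1] - margin:
        state -= 1
    return state
```

For example, with thresholds of 0.01 and 0.1, a noise floor of 0.05 selects state 1, and `STATE_PARAMS[1]` would then supply the value of γ used when calculating the gains.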
Referring now to
With reference to the above
Referring now to
While embodiments of the invention have been illustrated and described, it is not intended that these embodiments illustrate and describe all possible forms of the invention. Words used in this specification are words of description rather than limitation, and it is understood that various changes may be made without departing from the spirit and scope of the invention.
Number | Date | Country | |
---|---|---|---|
20040078200 A1 | Apr 2004 | US |