The invention relates generally to approaches for noise removal in electronic circuits.
Vehicles are often equipped with various types of devices that produce and receive sound energy. For example, various hands-free systems allow vehicle occupants to control vehicular functions by speaking commands into a microphone; the commands are then recognized and executed by one or more control modules in the vehicle. Vehicle occupants may also use cellular phones or other types of sound-producing or sound-receiving devices.
Noise removal or suppression is important for clear mobile voice communications or accurate automatic speech recognition. However, effectively removing ambient noise without introducing distortion to speech has long been a difficult challenge. Over the past few decades, numerous noise suppression (NS) algorithms have been developed, particularly in the category of single channel noise suppressors. Some of these algorithms are widely used in mobile phones, Bluetooth headsets, hearing aids and hands-free car kits for the purpose of enhancing speech in noisy environments.
These algorithms are sometimes capable of suppressing stationary noise contaminating speech (e.g., with a 15 dB SNR improvement under a static car engine noise condition). However, the performance degrades significantly if the ambient noise changes dynamically over time (e.g., a 4 dB SNR improvement in babble noise conditions). One reason for this degradation is that most voice activity detection (VAD) approaches used in these previous algorithms have difficulty separating speech from non-stationary noise (e.g., multi-talker babble noise). Another reason for the degradation is that the estimated noise and the noise presence are not time aligned. More specifically, noise suppression algorithms typically estimate noise when speech is absent, but freeze noise estimation when speech is present. As a consequence, the noise subtraction/attenuation during speech periods typically depends on an “out-of-date” noise estimate.
Although this asynchronous noise estimation/utilization process is sometimes acceptable when the ambient noise is stationary, it becomes over-simplistic and unsuitable for canceling non-stationary noises, such as transient traffic noise or babble noise. In these latter cases, outdated information is used and the noise removal is not effective or acceptable. The absence of effective noise removal produces audio quality that is unacceptable to many users.
The present invention is illustrated, by way of example and not limitation, in the accompanying figures, in which like reference numerals indicate similar elements, and in which:
Skilled artisans will appreciate that elements in the figures are illustrated for simplicity and clarity and have not necessarily been drawn to scale. For example, the dimensions and/or relative positioning of some of the elements in the figures may be exaggerated relative to other elements to help to improve understanding of various embodiments of the present invention. Also, common but well-understood elements that are useful or necessary in a commercially feasible embodiment are often not depicted in order to facilitate a less obstructed view of these various embodiments of the present invention. It will further be appreciated that certain actions and/or steps may be described or depicted in a particular order of occurrence while those skilled in the art will understand that such specificity with respect to sequence is not actually required. It will also be understood that the terms and expressions used herein have the ordinary meaning as is accorded to such terms and expressions with respect to their corresponding respective areas of inquiry and study except where specific meanings have otherwise been set forth herein.
In the approaches described herein, noise is estimated continuously or substantially continuously (e.g., during speech). The noise estimate is removed from the signal of interest (which includes both speech and noise), and the noise removal is more effective than in previous approaches because, for instance, the noise cancellation and the noise estimate are synchronous with each other (i.e., there is no substantial delay between these events).
In many of the approaches described herein, a multi-source signal separation algorithm is used to achieve more effective noise suppression. The present approaches remove the need for the voice activity detection (VAD) and conventional noise estimates typically utilized in previous approaches. In this respect, a smoothing factor is calculated and applied to the noise estimate. In some aspects, the smoothing factor is based on the discrepancy between a long term noise estimate and a short term noise estimate. In some examples, the continuous noise estimate is incorporated into a gain function calculation for noise suppression.
More specifically and in many of these embodiments, a continuous stream of noise is created from a plurality of input signals. A smoothing spectrum estimate is continuously calculated from the continuous stream of noise. Noise is responsively removed from a selected one of the plurality of input signals using the smoothing spectrum estimate. The removal of the noise from the selected input signal is performed substantially synchronously and in time alignment with the creating of the continuous stream of noise and the calculating of the smoothing spectrum estimate.
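The per-frame flow just described can be sketched in Python/NumPy as follows. This is a minimal illustration, not a literal transcription of the embodiments: the function name, the simple spectral-subtraction gain, the spectral floor, and the fixed smoothing factor `alpha` are all assumed for the example.

```python
import numpy as np

def suppress_frame(primary_psd, noise_psd, smoothed_noise_psd, alpha):
    """One frame of synchronous noise removal (illustrative sketch).

    primary_psd:        power spectrum of the primary (speech + noise) frame
    noise_psd:          power spectrum of the continuous noise stream, same frame
    smoothed_noise_psd: running smoothed noise spectrum estimate
    alpha:              smoothing factor in [0, 1] (assumed fixed here)
    """
    # Update the smoothed noise spectrum from the *current* frame of the
    # continuous noise stream -- no freezing during speech, so the estimate
    # and the removal stay time aligned.
    smoothed_noise_psd = alpha * smoothed_noise_psd + (1.0 - alpha) * noise_psd
    # Spectral-subtraction style gain using the up-to-date estimate,
    # floored at 0.05 to avoid negative or zeroed bins.
    gain = np.maximum(1.0 - smoothed_noise_psd / np.maximum(primary_psd, 1e-12), 0.05)
    return gain * primary_psd, smoothed_noise_psd
```

Because the smoothed estimate is refreshed on every frame, the gain applied during speech always reflects the current noise, rather than an estimate frozen at the last speech pause.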
In other aspects, the noise removal utilizes one or more of a gain function, a noise subtraction approach, or a Wiener filter. Other examples are possible. Calculating the smoothing spectrum estimate may include calculating a difference in spectral deviation between a long term noise estimate and a short term noise estimate.
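Two of the gain options mentioned above can be illustrated as follows. Both functions map a noisy power spectrum and a noise power spectrum estimate to a per-bin gain; the spectral floor value is an assumed parameter, and neither function is mandated by this description.

```python
import numpy as np

def wiener_gain(noisy_psd, noise_psd, floor=0.05):
    # Wiener-style gain: estimated speech power divided by noisy power,
    # floored so no bin is driven all the way to zero.
    speech_psd = np.maximum(noisy_psd - noise_psd, 0.0)
    return np.maximum(speech_psd / np.maximum(noisy_psd, 1e-12), floor)

def magnitude_subtraction_gain(noisy_psd, noise_psd, floor=0.05):
    # Magnitude-domain spectral subtraction expressed as a gain:
    # G = 1 - |N| / |Y|, again with a spectral floor.
    return np.maximum(1.0 - np.sqrt(noise_psd / np.maximum(noisy_psd, 1e-12)), floor)
```

Either gain would be multiplied against the noisy spectrum before synthesis; the floor trades residual noise against "musical noise" artifacts.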
In other aspects, the plurality of input signals comprises a plurality of microphone signals. In yet other aspects, the plurality of microphone signals are formed from a plurality of microphones disposed at a device, and the device may be a mobile phone, a hands-free vehicular application, or a hearing aid. The microphones may be deployed at other types of devices as well. In still other aspects, the plurality of input signals includes a first signal from a primary microphone and a second signal from a secondary microphone. In some examples, creating a continuous stream of noise includes cancelling a speech component from the secondary microphone signal using the first signal as a reference to leave a continuous noise signal.
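One plausible way to realize the speech cancellation just described is an adaptive filter that predicts the speech component of the secondary signal from the primary signal and keeps the residual as the continuous noise stream. The normalized LMS (NLMS) sketch below is an assumption for illustration; the description does not mandate a particular adaptive algorithm, and the tap count and step size are hypothetical values.

```python
import numpy as np

def cancel_speech(primary, secondary, taps=16, mu=0.5, eps=1e-8):
    """Cancel the speech component (correlated with the primary mic) from
    the secondary mic signal, leaving a continuous noise stream."""
    w = np.zeros(taps)          # adaptive filter weights
    x = np.zeros(taps)          # delay line of primary-mic samples
    noise = np.zeros(len(secondary))
    for n in range(len(secondary)):
        x = np.roll(x, 1)
        x[0] = primary[n]
        predicted_speech = w @ x
        e = secondary[n] - predicted_speech  # residual = continuous noise stream
        noise[n] = e
        w = w + mu * e * x / (x @ x + eps)   # normalized LMS update
    return noise
```

Because the residual is produced on every sample, speech present or not, it supplies the uninterrupted noise reference that the smoothing spectrum estimate consumes.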
In others of these embodiments a first signal and a second signal are received. A continuous stream of noise is created based upon the first signal and the second signal. A smoothing spectrum estimate is continuously calculated using the continuous stream of noise. Noise is responsively removed from the first signal using the smoothing spectrum estimate. The removal of the noise is performed substantially synchronously and in time alignment with creating the continuous stream of noise and calculating the smoothing spectrum estimate.
In still others of these embodiments, a system for suppressing noise from a signal includes a noise creation module, a smoothing spectrum creation module, and a noise removal module. The noise creation module is configured to create a continuous stream of noise from a plurality of input signals. The smoothing spectrum creation module is coupled to the noise creation module and is configured to continuously calculate a smoothing spectrum estimate from the continuous stream of noise. The noise removal module is coupled to the smoothing spectrum module and is configured to remove noise from a selected one of the plurality of input signals using the smoothing spectrum estimate. The noise removal module removes noise from the selected one of the plurality of input signals substantially synchronously and in time alignment with the noise creation module creating the continuous stream of noise.
In some aspects, the smoothing spectrum creation module is configured to calculate the smoothing spectrum estimate by determining a difference in spectral deviation between a long term noise estimate and a short term noise estimate. In other aspects, the application of the smoothing spectrum estimate is effective to suppress noise in the microphone signal. In some other aspects, the plurality of input signals comprises a first signal from a primary microphone and a second signal from a secondary microphone. In yet other aspects, the noise creation module is configured to create the continuous stream of noise by cancelling a speech component from the secondary microphone signal using the first signal as a reference to leave a continuous noise signal.
Referring now to
The noise reduction module 106 as described elsewhere herein is configured to remove noise from the signals. In one aspect, the approach used combines a multi-sensor module followed by single channel noise suppression.
More specifically, the noise reduction module 106 includes a noise creation module 120, a smoothing spectrum creation module 122, and a noise removal module 124. The noise creation module 120 is configured to create a continuous stream of noise from a plurality of input signals (the microphone signals). The smoothing spectrum creation module 122 is coupled to the noise creation module 120 and is configured to continuously calculate a smoothing spectrum estimate from the continuous stream of noise. The noise removal module 124 is coupled to the smoothing spectrum creation module 122 and is configured to remove noise from a selected one of the plurality of input signals using the smoothing spectrum estimate. The noise removal module 124 removes noise from the selected one of the plurality of input signals substantially synchronously and in time alignment with the noise creation module 120 creating the continuous stream of noise.
In some aspects, the smoothing spectrum creation module 122 is configured to calculate the smoothing spectrum estimate by determining a difference in spectral deviation between a long term noise estimate and a short term noise estimate. In other aspects, the application of the smoothing spectrum estimate is effective to suppress noise in the microphone signal. In some other aspects, the plurality of input signals comprises a first signal from a primary microphone and a second signal from a secondary microphone. In yet other aspects, the noise creation module 120 is configured to create the continuous stream of noise by cancelling a speech component from the secondary microphone signal using the first signal as a reference to leave a continuous noise signal.
Referring now to
As illustrated in
More specifically and as shown in
The first analysis window element 210 and the second analysis window element 212 apply analysis windows to their respective signals. The first fast Fourier transform element 214 and the second fast Fourier transform element 216 obtain Fourier transforms of the windowed signals. The first squaring element 218 and the second squaring element 220 square the spectral magnitudes to obtain power spectra. The gain function module 222 calculates a gain function. The smoothed noise estimation module 224 provides a smoothed noise estimate. The summer 226 sums the output of the gain function module 222 and the output of the first squaring element 218. The inverse fast Fourier transform module 228 obtains an inverse Fourier transform of its input. The synthesis window element 230 applies a synthesis window to the signal. The overlap and add module 232 performs overlap-and-add reconstruction of the signal.
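The chain of elements 210 through 232 corresponds to a standard short-time analysis/synthesis structure. The sketch below (Python/NumPy, with assumed frame and hop sizes and square-root Hann windows) passes a signal through the analysis window, FFT, squaring, gain, inverse FFT, synthesis window, and overlap-add stages. It applies the gain multiplicatively to the spectrum, a common arrangement offered here for illustration rather than as a literal transcription of the figure; with a unity gain the interior of the signal is reconstructed exactly.

```python
import numpy as np

def process(signal, frame=256, hop=128, gain_fn=None):
    # Periodic square-root Hann window shared by analysis (210) and
    # synthesis (230); at 50% overlap the squared windows sum to one,
    # so a unity gain yields perfect reconstruction of interior samples.
    win = np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * np.arange(frame) / frame))
    out = np.zeros(len(signal))
    for start in range(0, len(signal) - frame + 1, hop):
        seg = signal[start:start + frame] * win            # analysis window (210)
        spec = np.fft.rfft(seg)                            # FFT (214)
        psd = np.abs(spec) ** 2                            # squaring (218)
        g = gain_fn(psd) if gain_fn is not None else 1.0   # gain function (222)
        out_seg = np.fft.irfft(g * spec, frame)            # inverse FFT (228)
        out[start:start + frame] += out_seg * win          # synthesis window (230)
                                                           # and overlap-add (232)
    return out
```

Supplying a `gain_fn` derived from the smoothed noise estimate turns this skeleton into the suppressor; with no gain function it simply resynthesizes the input.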
In one example, the desired speech is captured along with background speech (e.g., babble noise) via two microphones that are displaced a predetermined distance apart (e.g., 4 cm). Using the approaches described herein and to give one example, the SNR gain is approximately 8.5 dB, which is approximately 4.2 dB higher than some previous single channel noise suppression algorithms. The use of a separate and reliable noise source in the single channel based noise suppression of the present approaches cancels non-stationary (as well as stationary) noise effectively during speech presence, and is immune to the errors made by the VAD inside mainstream single channel NS algorithms.
Referring now to
At step 304, a smoothing spectrum estimate is continuously calculated from the continuous stream of noise. In one aspect, the smoothing spectrum estimate is determined by calculating a spectral deviation between a long term noise estimate and a short term noise estimate. Other examples are possible.
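One way the calculation at step 304 could be realized is sketched below. The exponential time constants and the mapping from spectral deviation to smoothing factor are assumptions for illustration only; the description states only that the smoothing is derived from the deviation between a long term and a short term noise estimate.

```python
import numpy as np

def update_estimates(noise_psd, long_est, short_est, a_long=0.99, a_short=0.8):
    """Track long- and short-term noise spectra from the continuous noise
    stream and map their deviation to a smoothing factor (illustrative)."""
    # Slow and fast exponential averages of the noise power spectrum.
    long_est = a_long * long_est + (1 - a_long) * noise_psd
    short_est = a_short * short_est + (1 - a_short) * noise_psd
    # Normalized spectral deviation between the two estimates.
    dev = np.mean(np.abs(short_est - long_est) / (long_est + 1e-12))
    # Large deviation (non-stationary noise) -> less smoothing, faster tracking.
    alpha = np.clip(0.95 - dev, 0.5, 0.95)
    return long_est, short_est, alpha
```

Under stationary noise the two estimates agree and the smoothing stays heavy; when the noise changes abruptly the deviation grows and the estimate is allowed to track quickly.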
At step 306, noise is responsively removed from a selected one of the plurality of input signals using the smoothing spectrum estimate. The removal of the noise from the selected input signal is performed substantially synchronously and in time alignment with the creating of the continuous stream of noise and the calculating of the smoothing spectrum estimate. By substantially synchronously and in time alignment it is meant that there is no significant or substantial delay between the noise estimation and the noise removal.
Referring now to
At step 404, a continuous stream of noise is created based upon the first signal and the second signal. In one aspect, creating the continuous stream of noise may include cancelling a speech component from the second signal using the first signal as a reference.
At step 406, a smoothing spectrum estimate is continuously calculated using the continuous stream of noise. In one aspect, the smoothing spectrum estimate is determined by calculating a spectral deviation between a long term noise estimate and a short term noise estimate.
At step 408, noise is responsively removed from the first signal using the smoothing spectrum estimate. The removal of the noise is performed substantially synchronously and in time alignment with creating the continuous stream of noise and calculating the smoothing spectrum estimate. The noise may be removed, for example, by using an approach such as a gain function, a noise subtraction approach, or a Wiener filter. Other examples are possible.
Referring now to
It will be understood that the functions described herein may be implemented by computer instructions stored on a computer-readable medium (e.g., in a memory) and executed by a processing device (e.g., a microprocessor, controller, or the like).
It is understood that the implementation of other variations and modifications of the present invention and its various aspects will be apparent to those of ordinary skill in the art and that the present invention is not limited by the specific embodiments described. It is therefore contemplated to cover by the present invention any modifications, variations or equivalents that fall within the spirit and scope of the basic underlying principles disclosed and claimed herein.
U.S. Patent Documents

Number | Name | Date | Kind
---|---|---|---
6549586 | Gustafsson et al. | Apr 2003 | B2
6717991 | Gustafsson et al. | Apr 2004 | B1
20050060142 | Visser | Mar 2005 | A1
20090106021 | Zurek et al. | Apr 2009 | A1
20090164212 | Chan | Jun 2009 | A1
20110099007 | Zhang | Apr 2011 | A1

Foreign Patent Documents

Number | Date | Country
---|---|---
9718647 | May 1997 | WO

Other Publications

- Osamu Hoshuyama and Akihiko Sugiyama, "Robust Adaptive Beamforming," in Microphone Arrays: Signal Processing Techniques and Applications, 2001, ISBN 3-540-41953-5. NEC Media Research Labs, Kawasaki, Japan.
- International Search Report and Written Opinion for PCT Application No. PCT/US2012/070010, mailed Jun. 28, 2013, 10 pages.
- Ivan J. Tashev, Sound Capture and Processing: Practical Approaches, Microsoft Research, USA, John Wiley & Sons Ltd., 2009.

Publication Data

Number | Date | Country
---|---|---
20130158989 A1 | Jun 2013 | US