Noise suppression assisted automatic speech recognition

Information

  • Patent Grant
  • Patent Number
    9,558,755
  • Date Filed
    Tuesday, December 7, 2010
  • Date Issued
    Tuesday, January 31, 2017
Abstract
Noise suppression information is used to optimize or improve automatic speech recognition performed for a signal. Noise suppression can be performed on a noisy speech signal using a gain value. The gain to apply to the noisy speech signal is selected to optimize speech recognition analysis of the resulting signal. The gain may be selected based on one or more features for a current sub band and time frame, as well as one or more features for other sub bands and/or time frames. Noise suppression information can be provided to a speech recognition module to improve the robustness of the speech recognition analysis. Noise suppression information can also be used to encode and identify speech.
Description
BACKGROUND OF THE INVENTION

Speech recognition systems have been used to convert spoken words into text. In medium and high noise environments, however, the accuracy of automatic speech recognition systems tends to degrade significantly. As a result, most speech recognition systems are used with audio captured in a noise-free environment.


In noise reduction systems, by contrast, a standard strategy consists of strongly attenuating the portions of the acoustic spectrum that are dominated by noise, while preserving the spectrum portions dominated by speech.


While strong attenuation of undesired spectrum portions is a valid strategy from the point of view of noise reduction and perceived output signal quality, it is not necessarily a good strategy for an automatic speech recognition system. In particular, the spectral regions strongly attenuated by noise suppression may have been needed to extract features for speech recognition. As a result, the attenuation applied by noise suppression may corrupt the features of the speech signal more than the original noise did. When the corruption introduced by noise suppression exceeds the corruption caused by the added noise, the noise reduction algorithm can render automatic speech recognition results unusable.


SUMMARY OF THE INVENTION

The present technology may utilize noise suppression information to optimize or improve automatic speech recognition performed for a signal. Noise suppression may be performed on a noisy speech signal using a gain value. The gain to apply to the noisy signal as part of the noise suppression is selected to optimize speech recognition analysis of the resulting signal. The gain may be selected based on one or more features for a current sub band and time frame, as well as one or more features for other sub bands and/or time frames. Noise suppression information may be provided to a speech recognition module to improve the robustness of the speech recognition analysis. Noise suppression information may also be used to encode and identify speech. Resources spent on automatic speech recognition (such as a bit rate of a speech codec) may be selected based on the speech to noise ratio (SNR).


An embodiment may enable processing of an audio signal. Sub-band signals may be generated from a received primary acoustic signal and a secondary acoustic signal. One or more features may be determined for a sub-band signal. Noise suppression information may be determined based on the one or more features and provided to a speech recognition module.





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 illustrates an environment in which the present technology may be utilized.



FIG. 2 is a block diagram of an exemplary audio device.



FIG. 3 is a block diagram of an exemplary audio processing system.



FIG. 4 is a flow chart of an exemplary method for performing speech recognition based on noise suppression information.



FIG. 5 is a flow chart of an exemplary method for performing noise suppression on a sub band signal.



FIG. 6 is a flow chart of an exemplary method for providing noise suppression information to a speech recognition module.





DETAILED DESCRIPTION OF THE INVENTION

The present technology may utilize noise suppression information to optimize or improve automatic speech recognition performed for a signal. Noise suppression may be performed on a noisy speech signal using a gain value. The gain to apply to the noisy signal as part of the noise suppression is selected to optimize speech recognition analysis of the resulting signal. The gain may be selected based on one or more features for a current sub band and time frame, as well as one or more features for other sub bands and/or time frames.


Noise suppression information may be provided to a speech recognition module to improve the robustness of the speech recognition analysis. Noise suppression information may include voice activity detection (VAD) information, such as, for example, a noise estimate, an indication of whether a signal includes speech, an indication of a speech to noise ratio (SNR) for a signal, and other information. Noise suppression information may also be used to encode and identify speech. Resources spent on automatic speech recognition (such as a bit rate of a speech codec) may be selected based on the SNR.



FIG. 1 is an illustration of an environment in which embodiments of the present technology may be used. A user may act as an audio (speech) source 102 to an audio device 104. The exemplary audio device 104 includes two microphones: a primary microphone 106 relative to the audio source 102 and a secondary microphone 108 located a distance away from the primary microphone 106. Alternatively, the audio device 104 may include a single microphone. In yet other embodiments, the audio device 104 may include more than two microphones, such as for example three, four, five, six, seven, eight, nine, ten or even more microphones.


The primary microphone 106 and secondary microphone 108 may be omni-directional microphones. Alternatively, embodiments may utilize other forms of microphones or acoustic sensors, such as directional microphones.


While the microphones 106 and 108 receive sound (i.e. acoustic signals) from the audio source 102, the microphones 106 and 108 also pick up noise 110. Although the noise 110 is shown coming from a single location in FIG. 1, the noise 110 may include any sounds from one or more locations that differ from the location of audio source 102, and may include reverberations and echoes. The noise 110 may be stationary, non-stationary, and/or a combination of both stationary and non-stationary noise.


Some embodiments may utilize level differences (e.g. energy differences) between the acoustic signals received by the two microphones 106 and 108. Because the primary microphone 106 is much closer to the audio source 102 than the secondary microphone 108 in a close-talk use case, the intensity level is higher for the primary microphone 106, resulting in a larger energy level received by the primary microphone 106 during a speech/voice segment, for example.


The level difference may then be used to discriminate speech and noise in the time-frequency domain. Further embodiments may use a combination of energy level differences and time delays to discriminate speech. Based on binaural cue encoding, speech signal extraction or speech enhancement may be performed.
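
As a non-limiting illustration (not part of the original disclosure), the following Python sketch shows how a per-frame energy level difference between a primary and a secondary microphone signal might be thresholded to flag speech-dominated frames in a close-talk use case. The frame length, threshold, and toy signals are assumptions for illustration only.

```python
import numpy as np

def frame_energies(x, frame_len):
    """Split a 1-D signal into non-overlapping frames and return per-frame energy."""
    n_frames = len(x) // frame_len
    frames = x[:n_frames * frame_len].reshape(n_frames, frame_len)
    return np.sum(frames ** 2, axis=1)

def ild_speech_flags(primary, secondary, frame_len=128, threshold_db=6.0, eps=1e-12):
    """Flag frames where the primary microphone is markedly louder than the
    secondary one, which in a close-talk setup suggests speech rather than
    diffuse noise.  threshold_db is an illustrative value, not from the patent."""
    e_p = frame_energies(primary, frame_len)
    e_s = frame_energies(secondary, frame_len)
    ild_db = 10.0 * np.log10((e_p + eps) / (e_s + eps))
    return ild_db > threshold_db

# Toy usage: speech-like burst close to the primary mic, diffuse noise on both.
rng = np.random.default_rng(0)
noise = 0.05 * rng.standard_normal(4096)
speech = np.zeros(4096)
speech[1024:3072] = np.sin(2 * np.pi * 200 * np.arange(2048) / 8000.0)
primary = speech + noise
secondary = 0.3 * speech + 0.05 * rng.standard_normal(4096)
print(ild_speech_flags(primary, secondary).astype(int))
```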



FIG. 2 is a block diagram of an exemplary audio device 104. In the illustrated embodiment, the audio device 104 includes a receiver 200, a processor 202, the primary microphone 106, an optional secondary microphone 108, an audio processing system 210, and an output device 206. The audio device 104 may include further or other components necessary for audio device 104 operations. Similarly, the audio device 104 may include fewer components that perform similar or equivalent functions to those depicted in FIG. 2.


Processor 202 may execute instructions and modules stored in a memory (not illustrated in FIG. 2) in the audio device 104 to perform functionality described herein, including noise reduction for an acoustic signal, speech recognition, and other functionality. Processor 202 may include hardware and software implemented as a processing unit, which may process floating point operations and other operations for the processor 202.


The exemplary receiver 200 may include an acoustic sensor configured to receive and transmit a signal to and from a communications network. In some embodiments, the receiver 200 may include an antenna device. The signal received may be forwarded to the audio processing system 210 to reduce noise using the techniques described herein, and provide an audio signal to the output device 206. Similarly, a signal received by one or more of primary microphone 106 and secondary microphone 108 may be processed for noise suppression and ultimately transmitted to a communications network via receiver 200. Hence, the present technology may be used in one or both of the transmit and receive paths of the audio device 104.


The audio processing system 210 is configured to receive the acoustic signals from an acoustic source via the primary microphone 106 and secondary microphone 108 (or a far-end signal via receiver 200) and process the acoustic signals. Processing may include performing noise reduction within an acoustic signal and speech recognition for an acoustic signal. The audio processing system 210 is discussed in more detail below.


The primary and secondary microphones 106, 108 may be spaced a distance apart in order to allow for detecting an energy level difference, time difference or phase difference between them. The acoustic signals received by primary microphone 106 and secondary microphone 108 may be converted into electrical signals (i.e. a primary electrical signal and a secondary electrical signal). The electrical signals may themselves be converted by an analog-to-digital converter (not shown) into digital signals for processing in accordance with some embodiments. In order to differentiate the acoustic signals for clarity purposes, the acoustic signal received by the primary microphone 106 is herein referred to as the primary acoustic signal, while the acoustic signal received by the secondary microphone 108 is herein referred to as the secondary acoustic signal. The primary acoustic signal and the secondary acoustic signal may be processed by the audio processing system 210 to produce a signal with an improved signal-to-noise ratio. It should be noted that embodiments of the technology described herein may be practiced utilizing only the primary microphone 106.


The output device 206 is any device which provides an audio output to the user. For example, the output device 206 may include a speaker, an earpiece of a headset or handset, or a speaker on a conference device.


In various embodiments, where the primary and secondary microphones are omni-directional microphones that are closely-spaced (e.g., 1-2 cm apart), a beamforming technique may be used to simulate forwards-facing and backwards-facing directional microphones. The level difference may be used to discriminate speech and noise in the time-frequency domain which can be used in noise reduction.
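
A minimal sketch, assuming a frequency-domain delay-and-subtract formulation, of how two closely spaced omni-directional microphones could be combined into simulated forwards-facing and backwards-facing directional beams. The 1.5 cm spacing and the test signals are illustrative; the patent does not specify a particular beamforming implementation.

```python
import numpy as np

def differential_beams(p1, p2, fs, mic_spacing_m=0.015, c=343.0):
    """Form forward- and backward-facing first-order differential beams from two
    closely spaced omni microphones via delay-and-subtract in the frequency domain.
    The 1.5 cm spacing is an illustrative assumption."""
    n = len(p1)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    tau = mic_spacing_m / c                     # acoustic travel time between mics
    delay = np.exp(-2j * np.pi * freqs * tau)   # pure delay of tau seconds
    P1, P2 = np.fft.rfft(p1), np.fft.rfft(p2)
    front = np.fft.irfft(P1 - P2 * delay, n=n)  # null toward the back
    back = np.fft.irfft(P2 - P1 * delay, n=n)   # null toward the front
    return front, back

fs = 16000
t = np.arange(fs) / fs
p1 = np.sin(2 * np.pi * 440 * t)
p2 = np.sin(2 * np.pi * 440 * (t - 0.015 / 343.0))  # source arriving from the front
front, back = differential_beams(p1, p2, fs)
print(np.sum(front ** 2) / (np.sum(back ** 2) + 1e-12))  # front beam carries far more energy
```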



FIG. 3 is a block diagram of an exemplary audio processing system 210 for performing noise reduction and automatic speech recognition. In exemplary embodiments, the audio processing system 210 is embodied within a memory device within audio device 104. The audio processing system 210 may include a frequency analysis module 302, a feature extraction module 304, a source inference engine module 306, gain data store 307, mask selector module 308, noise canceller module 310, modifier module 312, reconstructor module 314, and automatic speech recognition 316. Audio processing system 210 may include more or fewer components than illustrated in FIG. 3, and the functionality of modules may be combined or expanded into fewer or additional modules. Exemplary lines of communication are illustrated between various modules of FIG. 3, and in other figures herein. The lines of communication are not intended to limit which modules are communicatively coupled with others, nor are they intended to limit the number of and type of signals communicated between modules.


In operation, acoustic signals received from the primary microphone 106 and secondary microphone 108 are converted to electrical signals, and the electrical signals are processed through frequency analysis module 302. The acoustic signals may be pre-processed in the time domain before being processed by frequency analysis module 302. Time domain pre-processing may include applying input limiter gains, speech time stretching, and filtering using an FIR or IIR filter.


The frequency analysis module 302 receives the acoustic signals and mimics the frequency analysis of the cochlea (e.g., cochlear domain) to generate sub-band signals, simulated by a filter bank. The frequency analysis module 302 separates each of the primary and secondary acoustic signals into two or more frequency sub-band signals. A sub-band signal is the result of a filtering operation on an input signal, where the bandwidth of the filter is narrower than the bandwidth of the signal received by the frequency analysis module 302. The filter bank may be implemented by a series of cascaded, complex-valued, first-order IIR filters. Alternatively, other filters such as short-time Fourier transform (STFT), sub-band filter banks, modulated complex lapped transforms, cochlear models, wavelets, etc., can be used for the frequency analysis and synthesis. The samples of the frequency sub-band signals may be grouped sequentially into time frames (e.g. over a predetermined period of time). For example, the length of a frame may be 4 ms, 8 ms, or some other length of time. In some embodiments there may be no frame at all. The results may include sub-band signals in a fast cochlea transform (FCT) domain.
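
The following sketch illustrates only one of the analysis alternatives named above, a short-time Fourier transform, to produce complex sub-band samples grouped into time frames. The 8 ms frame, 50% hop, and Hann window are illustrative choices rather than parameters taken from the patent, which describes a cochlea-like filter bank.

```python
import numpy as np

def stft_subbands(x, fs, frame_ms=8.0):
    """Decompose a time-domain signal into frequency sub-band frames using a
    short-time Fourier transform -- one of the analysis alternatives named in
    the text.  Returns an array of shape (n_frames, n_bins); the frame length
    and Hann window are illustrative choices."""
    frame_len = int(fs * frame_ms / 1000.0)
    hop = frame_len // 2                       # 50% overlap
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)         # complex sub-band samples per frame

fs = 16000
t = np.arange(fs) / fs
signal = np.sin(2 * np.pi * 300 * t) + 0.1 * np.random.default_rng(1).standard_normal(fs)
subbands = stft_subbands(signal, fs)
print(subbands.shape)   # (n_frames, n_bins)
```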


The sub-band frame signals are provided from frequency analysis module 302 to an analysis path sub-system 320 and a signal path sub-system 330. The analysis path sub-system 320 may process the signal to identify signal features, distinguish between speech components and noise components of the sub-band signals, and determine a signal modifier. The signal path sub-system 330 is responsible for modifying sub-band signals of the primary acoustic signal by reducing noise in the sub-band signals. Noise reduction can include applying a modifier, such as a multiplicative gain mask determined in the analysis path sub-system 320, or by subtracting components from the sub-band signals. The noise reduction may reduce noise and preserve the desired speech components in the sub-band signals.


Signal path sub-system 330 includes noise canceller module 310 and modifier module 312. Noise canceller module 310 receives sub-band frame signals from frequency analysis module 302. Noise canceller module 310 may subtract (e.g., cancel) a noise component from one or more sub-band signals of the primary acoustic signal. As such, noise canceller module 310 may output sub-band estimates of noise components in the primary signal and sub-band estimates of speech components in the form of noise-subtracted sub-band signals.


Noise canceller module 310 may provide noise cancellation, for example in systems with two-microphone configurations, based on source location by means of a subtractive algorithm. Noise canceller module 310 may also provide echo cancellation and is intrinsically robust to loudspeaker and Rx path non-linearity. By performing noise and echo cancellation (e.g., subtracting components from a primary signal sub-band) with little or no voice quality degradation, noise canceller module 310 may increase the speech-to-noise ratio (SNR) in sub-band signals received from frequency analysis module 302 and provided to modifier module 312 and post filtering modules. The amount of noise cancellation performed may depend on the diffuseness of the noise source and the distance between microphones, both of which contribute towards the coherence of the noise between the microphones, with greater coherence resulting in better cancellation.


Noise canceller module 310 may be implemented in a variety of ways. In some embodiments, noise canceller module 310 may be implemented with a single null processing noise subtraction (NPNS) module. Alternatively, noise canceller module 310 may include two or more NPNS modules, which may be arranged for example in a cascaded fashion.


An example of noise cancellation performed in some embodiments by the noise canceller module 310 is disclosed in U.S. patent application Ser. No. 12/215,980, entitled “System and Method for Providing Noise Suppression Utilizing Null Processing Noise Subtraction,” filed Jun. 30, 2008, U.S. application Ser. No. 12/422,917, entitled “Adaptive Noise Cancellation,” filed Apr. 13, 2009, and U.S. application Ser. No. 12/693,998, entitled “Adaptive Noise Reduction Using Level Cues,” filed Jan. 26, 2010, the disclosures of which are each incorporated herein by reference.


The feature extraction module 304 of the analysis path sub-system 320 receives the sub-band frame signals derived from the primary and secondary acoustic signals provided by frequency analysis module 302 as well as the output of NPNS module 310. Feature extraction module 304 may compute frame energy estimations of the sub-band signals, inter-microphone level differences (ILD), inter-microphone time differences (ITD) and inter-microphone phase differences (IPD) between the primary acoustic signal and the secondary acoustic signal, self-noise estimates for the primary and secondary microphones, as well as other monaural or binaural features which may be utilized by other modules, such as pitch estimates and cross-correlations between microphone signals. The feature extraction module 304 may both provide inputs to and process outputs from NPNS module 310.


The NPNS module 310 may provide noise cancelled sub-band signals to the ILD block in the feature extraction module 304. Since the ILD may be determined as the ratio of the NPNS output signal energy to the secondary microphone energy, ILD is often interchangeable with Null Processing Inter-microphone Level Difference (NP-ILD). “Raw-ILD” may be used to disambiguate a case where the ILD is computed from the “raw” primary and secondary microphone signals.
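
As an illustration (not the patent's implementation), the sketch below computes a per-sub-band, per-frame level difference in dB; feeding raw primary-microphone frames yields the "raw ILD," while feeding NPNS output frames yields the NP-ILD described above.

```python
import numpy as np

def ild_per_subband(out_frames, sec_frames, eps=1e-12):
    """Per-sub-band, per-frame level difference in dB between a (possibly
    noise-cancelled) output and the secondary microphone.  Feeding raw primary
    frames gives the 'raw ILD'; feeding NPNS output frames gives the NP-ILD."""
    e_out = np.abs(out_frames) ** 2
    e_sec = np.abs(sec_frames) ** 2
    return 10.0 * np.log10((e_out + eps) / (e_sec + eps))

# Toy usage with random complex sub-band frames (shape: n_frames x n_bands).
rng = np.random.default_rng(2)
primary = rng.standard_normal((10, 65)) + 1j * rng.standard_normal((10, 65))
secondary = 0.5 * primary + 0.1 * rng.standard_normal((10, 65))
print(ild_per_subband(primary, secondary).mean())
```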


Determining energy level estimates and inter-microphone level differences is discussed in more detail in U.S. patent application Ser. No. 11/343,524, entitled “System and Method for Utilizing Inter-Microphone Level Differences for Speech Enhancement”, which is incorporated by reference herein.


Source inference engine module 306 may process the frame energy estimations provided by feature extraction module 304 to compute noise estimates and derive models of the noise and speech in the sub-band signals. Source inference engine module 306 adaptively estimates attributes of the acoustic sources, such as the energy spectra of the output signal of the NPNS module 310. The energy spectra attribute may be utilized to generate a multiplicative mask in mask selector module 308.


The source inference engine module 306 may receive the NP-ILD from feature extraction module 304 and track the NP-ILD probability distributions or “clusters” of the target audio source 102, background noise and optionally echo.


This information is then used, along with other auditory cues, to define classification boundaries between source and noise classes. The NP-ILD distributions of speech, noise and echo may vary over time due to changing environmental conditions, movement of the audio device 104, position of the hand and/or face of the user, other objects relative to the audio device 104, and other factors. The cluster tracker adapts to the time-varying NP-ILDs of the speech or noise source(s).


An example of tracking clusters by a cluster tracker module is disclosed in U.S. patent application Ser. No. 12/004,897, entitled “System and method for Adaptive Classification of Audio Sources,” filed on Dec. 21, 2007, the disclosure of which is incorporated herein by reference.


Source inference engine module 306 may include a noise estimate module which may receive a noise/speech classification control signal from the cluster tracker module and the output of noise canceller module 310 to estimate the noise N(t,w), wherein t is a point in time and w represents a frequency or sub-band. A speech to noise ratio (SNR) can be generated by source inference engine module 306 from the noise estimate and a speech estimate, and the SNR can be provided to other modules within the audio device, such as automatic speech recognition module 316 and mask selector 308.
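
The patent does not specify how the noise estimate N(t,w) is computed; the sketch below uses a simple asymmetric recursive smoother as a hypothetical stand-in and derives a per-band SNR from it. The smoothing constants are assumptions for illustration.

```python
import numpy as np

def track_noise_and_snr(subband_energy, rise=0.02, fall=0.5, eps=1e-12):
    """Per sub-band noise estimate N(t, w) tracked with an asymmetric recursive
    smoother (fast to follow drops in energy, slow to follow rises), and the
    resulting per-frame, per-band speech-to-noise ratio in dB.  The smoothing
    constants are illustrative assumptions, not values from the patent."""
    n_frames, n_bands = subband_energy.shape
    noise = np.zeros((n_frames, n_bands))
    noise[0] = subband_energy[0]
    for t in range(1, n_frames):
        alpha = np.where(subband_energy[t] > noise[t - 1], rise, fall)
        noise[t] = (1.0 - alpha) * noise[t - 1] + alpha * subband_energy[t]
    speech = np.maximum(subband_energy - noise, 0.0)
    snr_db = 10.0 * np.log10((speech + eps) / (noise + eps))
    return noise, snr_db

energy = np.abs(np.random.default_rng(3).standard_normal((50, 32))) ** 2
noise_est, snr_db = track_noise_and_snr(energy)
print(noise_est.shape, snr_db.shape)
```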


Gain data store 307 may include one or more stored gain values and may communicate with mask selector 308. Each stored gain may be associated with a set of one or more features. An exemplary set of features may include a speech to noise ratio and a frequency (i.e., a center frequency for a sub band). Other feature data may also be stored in gain data store 307. Each gain stored in gain data store 307 may, when applied to a sub-band signal, provide a signal that is as close to a clean speech signal as possible. Though the gains provide a speech signal with a reduced amount of noise, they may not provide the perceptually most desirable sounding speech.


In some embodiments, each gain stored in gain data store 307 may be optimized for a set of features, such as for example a particular frequency and speech to noise ratio. For example, to determine an optimal gain value for a particular combination of features, a known speech spectrum may be combined with noise at various speech to noise ratios. Because the speech spectrum and the noise are known, a gain can be determined which suppresses the combined speech-noise signal into a clean speech signal that is ideal for speech recognition. In some embodiments, the gain is configured to suppress the speech-noise signal such that noise is reduced but no portion of the speech signal is attenuated or degraded. These gains, derived from the combined signals for a known SNR, are stored in the gain data store for different combinations of frequency and speech to noise ratio.
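
A minimal sketch of such an offline table, assuming a Wiener-style gain as a stand-in for the optimization described above; the band centers, SNR grid, and gain formula are illustrative assumptions, not values from the patent.

```python
def build_gain_table(center_freqs_hz, snr_grid_db):
    """Build a lookup table of gains keyed by (sub-band center frequency, SNR).
    The Wiener-style gain below is only an illustrative stand-in for the
    offline optimization the text describes; a real table would be derived
    by comparing suppressed output against known clean speech."""
    table = {}
    for f in center_freqs_hz:
        for snr_db in snr_grid_db:
            snr_lin = 10.0 ** (snr_db / 10.0)
            gain = snr_lin / (1.0 + snr_lin)      # attenuate more as SNR drops
            table[(f, snr_db)] = gain
    return table

center_freqs = [250, 500, 1000, 2000, 4000]        # Hz, illustrative band centers
snr_grid = list(range(-10, 31, 5))                  # dB
gain_table = build_gain_table(center_freqs, snr_grid)
print(gain_table[(1000, 10)])
```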


Mask selector 308 may receive a set of one or more features and/or other data from source inference engine 306, query gain data store 307 for a gain associated with a particular set of features and/or other data, and provide an accessed gain to modifier 312. For example, for a particular sub band, mask selector 308 may receive a particular speech to noise ratio from source inference engine 306 for the particular sub band in the current frame. Mask selector 308 may then query data store 307 for a gain that is associated with the combination of the speech to noise ratio and the current sub band center frequency. Mask selector 308 receives the corresponding gain from gain data store 307 and provides the gain to modifier 312.
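
The lookup itself might resemble the following nearest-neighbour query over a table keyed by (center frequency, SNR); this is a simple stand-in for the data-store query described above, and the table contents here are placeholders.

```python
def select_gain(gain_table, band_freq_hz, measured_snr_db):
    """Pick the stored gain whose (frequency, SNR) key is closest to the current
    sub-band's center frequency and measured SNR -- a nearest-neighbour stand-in
    for the data-store query described in the text."""
    key = min(gain_table,
              key=lambda k: (abs(k[0] - band_freq_hz), abs(k[1] - measured_snr_db)))
    return gain_table[key]

# Tiny illustrative table keyed by (center frequency in Hz, SNR in dB).
gain_table = {(500, 0): 0.3, (500, 10): 0.7, (1000, 0): 0.25, (1000, 10): 0.65}
print(select_gain(gain_table, band_freq_hz=900, measured_snr_db=8.0))  # -> 0.65
```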


The accessed gain may be applied, for example as a multiplicative mask, to the noise-subtracted sub-band signals provided by noise canceller 310 to modifier 312. The modifier module 312 multiplies the noise-subtracted sub-band signals of the primary acoustic signal output by the noise canceller module 310 by the gain masks. Applying the mask reduces energy levels of noise components in the sub-band signals of the primary acoustic signal and results in noise reduction.


Modifier module 312 receives the signal path cochlear samples from noise canceller module 310 and applies a gain mask received from mask selector 308 to the received samples. The signal path cochlear samples may include the noise subtracted sub-band signals for the primary acoustic signal. The gain mask provided by mask selector 308 may vary quickly, such as from frame to frame, and noise and speech estimates may vary between frames. To help address the variance, the upwards and downwards temporal slew rates of the mask may be constrained to within reasonable limits by modifier 312. The mask may be interpolated from the frame rate to the sample rate using simple linear interpolation, and applied to the sub-band signals by multiplicative noise suppression. Modifier module 312 may output masked frequency sub-band signals.
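
A minimal sketch, with assumed slew limits, of constraining the frame-to-frame gain change and linearly interpolating the gain from frame rate to sample rate before multiplicative application:

```python
import numpy as np

def smooth_and_upsample_mask(frame_gains, samples_per_frame,
                             max_rise=0.2, max_fall=0.1):
    """Constrain how fast a per-frame gain may move between frames (separate
    upward and downward slew limits, illustrative values) and then linearly
    interpolate the frame-rate gain to the sample rate before it is applied
    multiplicatively to the sub-band samples."""
    limited = np.empty_like(frame_gains, dtype=float)
    limited[0] = frame_gains[0]
    for t in range(1, len(frame_gains)):
        step = np.clip(frame_gains[t] - limited[t - 1], -max_fall, max_rise)
        limited[t] = limited[t - 1] + step
    frame_idx = np.arange(len(limited))
    sample_idx = np.linspace(0, len(limited) - 1, len(limited) * samples_per_frame)
    return np.interp(sample_idx, frame_idx, limited)

mask = smooth_and_upsample_mask(np.array([1.0, 0.2, 0.9, 0.1]), samples_per_frame=4)
print(np.round(mask, 2))
```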


Reconstructor module 314 may convert the masked frequency sub-band signals from the cochlea domain back into the time domain. The conversion may include adding the masked frequency sub-band signals and phase shifted signals. Alternatively, the conversion may include multiplying the masked frequency sub-band signals with an inverse frequency of the cochlea channels. Once conversion to the time domain is completed, the synthesized acoustic signal may be output to the user via output device 206 and/or provided to a codec for encoding.
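
For illustration only, the sketch below shows time-domain reconstruction by inverse transform and overlap-add for STFT-style sub-band frames; the patent's reconstructor instead operates on its cochlea-domain channels as described above.

```python
import numpy as np

def overlap_add_reconstruct(subband_frames, frame_len, hop):
    """Convert complex sub-band frames back to the time domain by inverse FFT
    and overlap-add -- a simple counterpart of the STFT-style analysis sketched
    earlier, standing in for the cochlea-domain reconstruction in the text."""
    frames = np.fft.irfft(subband_frames, n=frame_len, axis=1)
    n_frames = frames.shape[0]
    out = np.zeros((n_frames - 1) * hop + frame_len)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + frame_len] += frame
    return out

frames = np.fft.rfft(np.random.default_rng(4).standard_normal((10, 128)), axis=1)
print(overlap_add_reconstruct(frames, frame_len=128, hop=64).shape)
```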


In some embodiments, additional post-processing of the synthesized time domain acoustic signal may be performed. For example, comfort noise generated by a comfort noise generator may be added to the synthesized acoustic signal prior to providing the signal to the user. Comfort noise may be a uniform constant noise that is not usually discernable to a listener (e.g., pink noise). This comfort noise may be added to the synthesized acoustic signal to enforce a threshold of audibility and to mask low-level non-stationary output noise components. In some embodiments, the comfort noise level may be chosen to be just above a threshold of audibility and may be settable by a user. In some embodiments, the mask selector module 308 may have access to the level of comfort noise in order to generate gain masks that will suppress the noise to a level at or below the comfort noise.
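
A hypothetical sketch of adding a constant low-level comfort noise to the synthesized signal; the -60 dBFS level and the white-noise stand-in (rather than true pink noise) are illustrative assumptions.

```python
import numpy as np

def add_comfort_noise(synthesized, level_dbfs=-60.0, seed=0):
    """Add a constant, very low-level noise floor to mask residual non-stationary
    noise; white noise at an assumed -60 dBFS stands in for the pink comfort
    noise described in the text."""
    rng = np.random.default_rng(seed)
    amplitude = 10.0 ** (level_dbfs / 20.0)
    comfort = amplitude * rng.standard_normal(len(synthesized))
    return synthesized + comfort

out = add_comfort_noise(np.zeros(16000))
print(20 * np.log10(np.sqrt(np.mean(out ** 2))))   # roughly -60 dBFS
```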


Automatic speech recognition module 316 may perform a speech recognition analysis on the reconstructed signal output by reconstructor 314. Automatic speech recognition module 316 may receive a voice activity detection (VAD) signal as well as a speech to noise ratio (SNR) indication or other noise suppression information from source inference engine 306. The information received from source inference engine 306, such as the VAD and SNR, may be used to optimize the speech recognition process performed by automatic speech recognition module 316. Speech recognition module 316 is discussed in more detail below.


The system of FIG. 3 may process several types of signals received by an audio device. The system may be applied to acoustic signals received via one or more microphones. The system may also process signals, such as a digital Rx signal, received through an antenna or other connection.


An exemplary system which may be used to implement at least a portion of audio processing system 210 is described in U.S. patent application Ser. No. 12/832,920, titled “Multi-Microphone Robust Noise Suppression,” filed Jul. 8, 2010, the disclosure of which is incorporated herein by reference.



FIG. 4 is a flow chart of an exemplary method for performing speech recognition based on noise suppression information. First, a primary acoustic signal and a secondary acoustic signal are received at step 410. The signals may be received through microphones 106 and 108 of audio device 104. Sub band signals may then be generated from the primary acoustic signal and secondary acoustic signal at step 420. The received signals may be converted to sub band signals by frequency analysis module 302.


A feature is determined for a sub band signal at step 430. Feature extractor 304 may extract features for each sub band in the current frame or the frame as a whole. Features may include a speech energy level for a particular sub band, a noise level, pitch, and other features. Noise suppression information is then generated from the features at step 440. The noise suppression information may be generated and output by source inference engine 306 from features received from feature extraction module 304. The noise suppression information may include an SNR for each sub band in the current frame, a VAD signal for the current frame, ILD, and other noise suppression information.


Noise suppression may be performed on a sub band signal based on noise suppression information at step 450. The noise suppression may include accessing a gain value based on one or more features and applying the gain to a sub band acoustic signal. Performing noise suppression on a sub band signal is discussed in more detail below with respect to the method of FIG. 5. Additionally, noise suppression performed on a sub band signal may include performing noise cancellation by noise canceller 310 in the audio processing system of FIG. 3.


Noise suppression information may be provided to speech recognition module 316 at step 460. Speech recognition module 316 may receive noise suppression information to assist with speech recognition. Providing noise suppression information to speech recognition module 316 is discussed in more detail below with respect to FIG. 6.


Speech recognition is automatically performed based on the noise suppression information at step 470. The speech recognition process may be optimized based on the noise suppression information. Performing speech recognition based on noise suppression information may include modulating a bit rate of a speech encoder or decoder based on a speech to noise ratio for a particular frame. In some embodiments, the bit rate is decreased when the speech to noise ratio is large. In some embodiments, speech recognition based on noise suppression may include setting a node search depth level by a speech recognition module based on a speech to noise ratio for a current frame. The node search depth level, for example, may be decreased when the speech to noise ratio is large.
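
One hypothetical mapping from frame SNR to recognition resources, consistent with decreasing the codec bit rate and the node search depth as the SNR grows; the break-points and values below are assumptions for illustration only.

```python
def choose_recognition_resources(snr_db):
    """Illustrative mapping from a frame's speech-to-noise ratio to codec bit
    rate and recognizer search depth: cleaner audio needs fewer resources, so
    both are reduced as the SNR grows.  The break-points are assumptions."""
    if snr_db >= 20.0:          # clean speech
        return {"codec_bitrate_kbps": 8, "node_search_depth": 500}
    if snr_db >= 10.0:          # moderately noisy
        return {"codec_bitrate_kbps": 16, "node_search_depth": 1000}
    return {"codec_bitrate_kbps": 24, "node_search_depth": 2000}  # very noisy

print(choose_recognition_resources(25.0))
print(choose_recognition_resources(5.0))
```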



FIG. 5 is a flow chart of an exemplary method for performing noise suppression on a sub band signal. The method of FIG. 5 provides more detail for step 450 in the method of FIG. 4. A speech to noise ratio (SNR) for a sub band is accessed at step 510. The SNR may be received by mask selector 308 from source inference engine 306. Mask selector 308 also has access to sub band information for the sub band being considered.


A gain which corresponds to the sub band signal frequency and the speech to noise ratio is accessed at step 520. The gain is accessed by mask selector 308 from gain data store 307 and may correspond to a particular sub band signal frequency and SNR. The accessed gain is then applied to one or more sub band frequencies at step 530. The accessed gain may be provided to modifier 312, which then applies the gain to a sub band which may or may not have undergone noise cancellation.



FIG. 6 is a flow chart of an exemplary method for providing noise suppression information to a speech recognition module. The method of FIG. 6 may provide more detail for step 460 of the method of FIG. 4. A determination as to whether speech is detected in a primary acoustic signal based on one or more features is performed at step 610. The detection may include detecting whether speech is or is not present within the signal within the current frame. In some embodiments, an SNR for the current sub band or for all sub bands may be compared to a threshold level. If the SNR is above the threshold value, then speech may be determined to be present in the primary acoustic signal. If the SNR is not above the threshold value, then speech may be determined to not be present in the current frame.
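
A minimal sketch of the threshold comparison described above, with an assumed 5 dB threshold and a simple average across sub-bands:

```python
import numpy as np

def vad_from_snr(snr_db_per_band, threshold_db=5.0):
    """Declare speech present in the current frame if the SNR (here averaged
    across sub-bands) exceeds a threshold; the 5 dB threshold is illustrative."""
    return float(np.mean(snr_db_per_band)) > threshold_db

print(vad_from_snr(np.array([12.0, 8.0, 3.0])))   # True  -> speech detected
print(vad_from_snr(np.array([-2.0, 1.0, 0.5])))   # False -> noise only
```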


Each of steps 620-640 describes how speech recognition may be optimized based on noise suppression or noise suppression information, and the steps may be performed in combination or separately. Hence, in some embodiments, only one of steps 620-640 may be performed. In some embodiments, more than one of steps 620-640 may be performed when providing noise suppression information to a speech recognition module.


A speech recognition module is provided with a noise signal if speech is not detected in a current frame of a signal at step 620. For example, if the determination is made that speech is not present in the current frame of a reconstructed signal, a noise signal is provided to automatic speech recognition module 316 in order to ensure that no false positive for speech detection occurs. The noise signal may be any type of signal that has a high likelihood of not being mistaken for speech by the speech recognition module.


A speech recognition module may be provided with an indication that speech is present in the acoustic signal at step 630. In this case, automatic speech recognition module 316 may be provided with a VAD signal provided by source inference engine 306. The VAD signal may indicate whether or not speech is present in the signal provided to automatic speech recognition module 316. Automatic speech recognition module 316 may use the VAD signal to determine whether or not to perform speech recognition on the signal.


A speech to noise ratio (SNR) signal may be provided for the current frame and/or sub band to the speech recognition module at step 640. In this case, the SNR may provide a value within a range of values indicating whether or not speech is present. This may help the automatic speech recognition module learn when to expend resources to recognize speech and when not to.


The above described modules, including those discussed with respect to FIG. 3, may include instructions stored in a storage media such as a machine readable medium (e.g., computer readable medium). These instructions may be retrieved and executed by the processor 202 to perform the functionality discussed herein. Some examples of instructions include software, program code, and firmware. Some examples of storage media include memory devices and integrated circuits.


While the present invention is disclosed by reference to the preferred embodiments and examples detailed above, it is to be understood that these examples are intended in an illustrative rather than a limiting sense. It is contemplated that modifications and combinations will readily occur to those skilled in the art, which modifications and combinations will be within the spirit of the invention and the scope of the following claims.

Claims
  • 1. A method for processing an audio signal, comprising: generating sub-band signals from a received primary acoustic signal and a received secondary acoustic signal; determining two or more features for the sub-band signals, the two or more features including a speech energy level for the sub-band noise level and at least one of the following: inter-microphone level differences, inter-microphone time differences, and inter-microphone phase differences between the primary acoustic signal and the secondary acoustic signal; suppressing a noise component in the primary acoustic signal based on the two or more features, the suppressing configured to clean the primary acoustic signal to create a cleaned speech signal optimized for accurate speech recognition processing by an automatic speech recognition processing module, the suppressing comprising: applying a gain to a sub-band of the primary acoustic signal to provide a noise suppressed signal, the applying comprising: determining a speech to noise ratio (SNR) for the sub-band of the primary acoustic signal; accessing the gain, based on the frequency of the sub-band and the determined SNR for the sub-band, from a datastore, the datastore including a plurality of pre-stored gains configured to create cleaned speech signals optimized for accurate speech recognition processing by the automatic speech recognition processing module, each pre-stored gain in the plurality of pre-stored gains associated with a corresponding frequency and an SNR value; and applying the accessed gain to the sub-band frequency; and providing the cleaned speech signal and corresponding noise suppression information to the automatic speech recognition processing module, the noise suppression information based on the two or more features and including a voice activity detection signal.
  • 2. The method of claim 1, further comprising determining whether the primary acoustic signal includes speech, the determination performed based on the two or more features.
  • 3. The method of claim 2, further comprising providing a noise signal to the automatic speech recognition processing module in response to detecting that the primary acoustic signal does not include speech.
  • 4. The method of claim 2, wherein the voice activity detection signal is generated based on the determination of whether the primary acoustic signal includes speech, and the voice activity detection signal indicating whether automatic speech recognition is to occur.
  • 5. The method of claim 4, wherein the voice activity detection signal is a value within a range of values corresponding to the level of speech detected in the primary acoustic signal.
  • 6. The method of claim 2, wherein the noise suppression information includes a speech to noise ratio for the current time frame and the sub-band to the automatic speech recognition processing module.
  • 7. The method of claim 1, wherein the noise suppression information includes a speech to noise ratio, the method further comprising modulating a bit rate of a speech encoder or decoder based on the speech to noise ratio for a particular frame.
  • 8. The method of claim 1, wherein the noise suppression information includes a speech to noise ratio, the method further comprising setting a node search depth level by the automatic speech recognition processing module based on the speech to noise ratio for a current frame.
  • 9. A non-transitory computer readable storage medium having embodied thereon a program, the program being executable by a processor to perform a method for reducing noise in an audio signal, the method comprising: generating sub-band signals from a received primary acoustic signal and a received secondary acoustic signal; determining two or more features for a sub-band signal, the two or more features including a speech energy level for the sub-band noise level and at least one of the following: inter-microphone level differences, inter-microphone time differences, and inter-microphone phase differences between the primary acoustic signal and the secondary acoustic signal; suppressing a noise component in the primary acoustic signal based on the two or more features, the suppressing configured to clean the primary acoustic signal to create a cleaned speech signal optimized for accurate speech recognition processing by an automatic speech recognition processing module, the suppressing comprising: applying a gain to a sub-band of the primary acoustic signal to provide a noise suppressed signal, the applying comprising: determining a speech to noise ratio (SNR) for the sub-band of the primary acoustic signal; accessing the gain, based on the frequency of the sub-band and the determined SNR for the sub-band, from a datastore, the datastore including a plurality of pre-stored gains configured to create cleaned speech signals optimized for accurate speech recognition processing by the automatic speech recognition processing module, each pre-stored gain in the plurality of pre-stored gains associated with a corresponding frequency and an SNR value; and applying the accessed gain to the sub-band frequency; and providing the cleaned speech signal and corresponding noise suppression information to the automatic speech recognition processing module, the noise suppression information based on the two or more features and including a speech to noise ratio for each of the sub-band signals and a voice activity detection signal.
  • 10. The non-transitory computer readable storage medium of claim 9, further comprising providing a noise signal to the automatic speech recognition processing module in response to detecting that the primary acoustic signal does not include speech.
  • 11. The non-transitory computer readable storage medium of claim 9, wherein the voice activity detection signal is generated based on the determination of whether the primary acoustic signal includes speech, and the voice activity detection signal indicating whether automatic speech recognition is to occur.
  • 12. The non-transitory computer readable storage medium of claim 9, wherein the noise suppression information includes a speech to noise ratio for the current time frame and the sub-band to the automatic speech recognition processing module.
  • 13. The non-transitory computer readable storage medium of claim 9, wherein the noise suppression information includes a speech to noise ratio, the method further comprising modulating a bit rate of a speech encoder or decoder based on the speech to noise ratio for a particular frame.
  • 14. The non-transitory computer readable storage medium of claim 9, wherein the noise suppression information includes a speech to noise ratio, the method further comprising setting a node search depth level by the automatic speech recognition processing module based on the speech to noise ratio for a current frame.
CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the priority benefit of U.S. Provisional Application Ser. No. 61/346,851, titled “Noise Suppression Assisted Automatic Speech Recognition,” filed May 20, 2010; the disclosure of the aforementioned application is incorporated herein by reference.

US Referenced Citations (393)
Number Name Date Kind
4025724 Davidson, Jr. et al. May 1977 A
4630304 Borth et al. Dec 1986 A
4802227 Elko et al. Jan 1989 A
4969203 Herman Nov 1990 A
5115404 Lo et al. May 1992 A
5289273 Lang Feb 1994 A
5400409 Linhard Mar 1995 A
5406635 Jarvinen Apr 1995 A
5546458 Iwami Aug 1996 A
5550924 Helf et al. Aug 1996 A
5555306 Gerzon Sep 1996 A
5625697 Bowen et al. Apr 1997 A
5706395 Arslan et al. Jan 1998 A
5715319 Chu Feb 1998 A
5734713 Mauney et al. Mar 1998 A
5754665 Hosoi May 1998 A
5774837 Yeldener et al. Jun 1998 A
5806025 Vis et al. Sep 1998 A
5819215 Dobson et al. Oct 1998 A
5839101 Vahatalo et al. Nov 1998 A
5905969 Mokbel May 1999 A
5917921 Sasaki et al. Jun 1999 A
5943429 Handel Aug 1999 A
5978824 Ikeda Nov 1999 A
5991385 Dunn et al. Nov 1999 A
6011853 Koski et al. Jan 2000 A
6035177 Moses et al. Mar 2000 A
6065883 Herring et al. May 2000 A
6084916 Ott Jul 2000 A
6098038 Hermansky et al. Aug 2000 A
6122384 Mauro Sep 2000 A
6122610 Isabelle Sep 2000 A
6144937 Ali Nov 2000 A
6188769 Jot et al. Feb 2001 B1
6205421 Morii Mar 2001 B1
6219408 Kurth Apr 2001 B1
6263307 Arslan et al. Jul 2001 B1
6266633 Higgins et al. Jul 2001 B1
6281749 Klayman et al. Aug 2001 B1
6327370 Killion et al. Dec 2001 B1
6339706 Tillgren et al. Jan 2002 B1
6339758 Kanazawa et al. Jan 2002 B1
6343267 Kuhn et al. Jan 2002 B1
6381284 Strizhevskiy Apr 2002 B1
6381469 Wojick Apr 2002 B1
6389142 Hagen et al. May 2002 B1
6411930 Burges Jun 2002 B1
6424938 Johansson et al. Jul 2002 B1
6449586 Hoshuyama Sep 2002 B1
6453284 Paschall Sep 2002 B1
6480610 Fang et al. Nov 2002 B1
6487257 Gustafsson et al. Nov 2002 B1
6504926 Edelson et al. Jan 2003 B1
6526140 Marchok et al. Feb 2003 B1
6615170 Liu et al. Sep 2003 B1
6717991 Gustafsson et al. Apr 2004 B1
6738482 Jaber May 2004 B1
6745155 Andringa et al. Jun 2004 B1
6748095 Goss Jun 2004 B1
6751588 Menendez-Pidal Jun 2004 B1
6768979 Menendez-Pidal Jul 2004 B1
6778954 Kim et al. Aug 2004 B1
6782363 Lee et al. Aug 2004 B2
6804651 Juric et al. Oct 2004 B2
6810273 Mattila et al. Oct 2004 B1
6873837 Yoshioka et al. Mar 2005 B1
6882736 Dickel et al. Apr 2005 B2
6931123 Hughes Aug 2005 B1
6980528 LeBlanc et al. Dec 2005 B1
7006881 Hoffberg Feb 2006 B1
7010134 Jensen Mar 2006 B2
7020605 Gao Mar 2006 B2
RE39080 Johnston Apr 2006 E
7035666 Silberfenig et al. Apr 2006 B2
7054808 Yoshida May 2006 B2
7058572 Nemer Jun 2006 B1
7065486 Thyssen Jun 2006 B1
7072834 Zhou Jul 2006 B2
7092529 Yu et al. Aug 2006 B2
7092882 Arrowood et al. Aug 2006 B2
7103176 Rodriguez et al. Sep 2006 B2
7110554 Brennan et al. Sep 2006 B2
7127072 Rademacher et al. Oct 2006 B2
7145710 Holmes Dec 2006 B2
7146013 Saito et al. Dec 2006 B1
7165026 Acero et al. Jan 2007 B2
7171246 Mattila et al. Jan 2007 B2
7190775 Rambo Mar 2007 B2
7209567 Kozel et al. Apr 2007 B1
7221622 Matsuo et al. May 2007 B2
7225001 Eriksson et al. May 2007 B1
7245710 Hughes Jul 2007 B1
7245767 Moreno et al. Jul 2007 B2
7254535 Kushner et al. Aug 2007 B2
7289955 Deng et al. Oct 2007 B2
7327985 Morfitt, III et al. Feb 2008 B2
7359520 Brennan et al. Apr 2008 B2
7376558 Gemello et al. May 2008 B2
7383179 Alves et al. Jun 2008 B2
7447631 Truman et al. Nov 2008 B2
7469208 Kincaid Dec 2008 B1
7516067 Seltzer et al. Apr 2009 B2
7548791 Johnston Jun 2009 B1
7562140 Clemm et al. Jul 2009 B2
7574352 Quatieri, Jr. Aug 2009 B2
7617282 Han Nov 2009 B2
7657038 Doclo et al. Feb 2010 B2
7664495 Bonner et al. Feb 2010 B1
7664640 Webber Feb 2010 B2
7685132 Hyman Mar 2010 B2
7725314 Wu et al. May 2010 B2
7773741 LeBlanc et al. Aug 2010 B1
7791508 Wegener Sep 2010 B2
7796978 Jones et al. Sep 2010 B2
7895036 Hetherington et al. Feb 2011 B2
7899565 Johnston Mar 2011 B1
7925502 Droppo et al. Apr 2011 B2
7970123 Beaucoup Jun 2011 B2
8032364 Watts Oct 2011 B1
8036767 Soulodre Oct 2011 B2
8046219 Zurek et al. Oct 2011 B2
8081878 Zhang et al. Dec 2011 B1
8107656 Dreßler et al. Jan 2012 B2
8126159 Goose et al. Feb 2012 B2
8140331 Lou Mar 2012 B2
8143620 Malinowski et al. Mar 2012 B1
8155953 Park et al. Apr 2012 B2
8175291 Chan et al. May 2012 B2
8189429 Chen et al. May 2012 B2
8194880 Avendano Jun 2012 B2
8194882 Every et al. Jun 2012 B2
8204252 Avendano Jun 2012 B1
8204253 Solbach Jun 2012 B1
8223988 Wang et al. Jul 2012 B2
8229137 Romesburg Jul 2012 B2
8280731 Yu Oct 2012 B2
8345890 Avendano et al. Jan 2013 B2
8359195 Li Jan 2013 B2
8363823 Santos Jan 2013 B1
8363850 Amada Jan 2013 B2
8369973 Risbo Feb 2013 B2
8447596 Avendano et al. May 2013 B2
8467891 Huang et al. Jun 2013 B2
8473285 Every et al. Jun 2013 B2
8494193 Zhang et al. Jul 2013 B2
8531286 Friar et al. Sep 2013 B2
8538035 Every et al. Sep 2013 B2
8606249 Goodwin Dec 2013 B1
8615392 Goodwin Dec 2013 B1
8615394 Avendano et al. Dec 2013 B1
8639516 Lindahl et al. Jan 2014 B2
8682006 Laroche et al. Mar 2014 B1
8694310 Taylor Apr 2014 B2
8705759 Wolff et al. Apr 2014 B2
8718290 Murgia et al. May 2014 B2
8744844 Klein Jun 2014 B2
8750526 Santos et al. Jun 2014 B1
8762144 Cho et al. Jun 2014 B2
8774423 Solbach Jul 2014 B1
8781137 Goodwin Jul 2014 B1
8798290 Choi et al. Aug 2014 B1
8880396 Laroche Nov 2014 B1
8886525 Klein Nov 2014 B2
8903721 Cowan Dec 2014 B1
8949120 Every et al. Feb 2015 B1
8949266 Phillips et al. Feb 2015 B2
9007416 Murgia et al. Apr 2015 B1
9008329 Mandel et al. Apr 2015 B1
9143857 Every et al. Sep 2015 B2
9185487 Solbach et al. Nov 2015 B2
9197974 Clark et al. Nov 2015 B1
9343056 Goodwin May 2016 B1
9431023 Avendano et al. Aug 2016 B2
20010044719 Casey Nov 2001 A1
20020002455 Accardi et al. Jan 2002 A1
20020041678 Basburg-Ertem et al. Apr 2002 A1
20020071342 Marple et al. Jun 2002 A1
20020138263 Deligne Sep 2002 A1
20020156624 Gigi Oct 2002 A1
20020160751 Sun et al. Oct 2002 A1
20020176589 Buck et al. Nov 2002 A1
20020177995 Walker Nov 2002 A1
20020194159 Kamath et al. Dec 2002 A1
20030014248 Vetter Jan 2003 A1
20030040908 Yang et al. Feb 2003 A1
20030056220 Thornton et al. Mar 2003 A1
20030063759 Brennan et al. Apr 2003 A1
20030093279 Malah et al. May 2003 A1
20030099370 Moore May 2003 A1
20030101048 Liu May 2003 A1
20030103632 Goubran et al. Jun 2003 A1
20030118200 Beaucoup et al. Jun 2003 A1
20030128851 Furuta Jul 2003 A1
20030147538 Elko Aug 2003 A1
20030169891 Ryan et al. Sep 2003 A1
20030177006 Ichikawa Sep 2003 A1
20030179888 Burnett et al. Sep 2003 A1
20030191641 Acero et al. Oct 2003 A1
20040013276 Ellis et al. Jan 2004 A1
20040066940 Amir Apr 2004 A1
20040076190 Goel et al. Apr 2004 A1
20040078199 Kremer et al. Apr 2004 A1
20040102967 Furuta et al. May 2004 A1
20040131178 Shahaf et al. Jul 2004 A1
20040145871 Lee Jul 2004 A1
20040148166 Zheng Jul 2004 A1
20040172240 Crockett Sep 2004 A1
20040184882 Cosgrove Sep 2004 A1
20040185804 Kanamori et al. Sep 2004 A1
20040263636 Cutler et al. Dec 2004 A1
20050008169 Muren et al. Jan 2005 A1
20050027520 Mattila et al. Feb 2005 A1
20050049857 Seltzer et al. Mar 2005 A1
20050066279 LeBarton et al. Mar 2005 A1
20050069162 Haykin et al. Mar 2005 A1
20050075866 Widrow Apr 2005 A1
20050080616 Leung et al. Apr 2005 A1
20050114123 Lukac et al. May 2005 A1
20050114128 Hetherington et al. May 2005 A1
20050152563 Amada et al. Jul 2005 A1
20050213739 Rodman et al. Sep 2005 A1
20050238238 Xu et al. Oct 2005 A1
20050240399 Makinen Oct 2005 A1
20050261894 Balan et al. Nov 2005 A1
20050261896 Schuijers et al. Nov 2005 A1
20050267369 Lazenby et al. Dec 2005 A1
20050276363 Joublin et al. Dec 2005 A1
20050281410 Grosvenor et al. Dec 2005 A1
20050288923 Kok Dec 2005 A1
20060053007 Niemisto Mar 2006 A1
20060058998 Yamamoto et al. Mar 2006 A1
20060063560 Herle Mar 2006 A1
20060072768 Schwartz et al. Apr 2006 A1
20060092918 Talalai May 2006 A1
20060100868 Hetherington et al. May 2006 A1
20060122832 Takiguchi Jun 2006 A1
20060136201 Landron et al. Jun 2006 A1
20060136203 Ichikawa Jun 2006 A1
20060153391 Hooley et al. Jul 2006 A1
20060165202 Thomas et al. Jul 2006 A1
20060184363 McCree et al. Aug 2006 A1
20060206320 Li Sep 2006 A1
20060224382 Taneda Oct 2006 A1
20060282263 Vos et al. Dec 2006 A1
20070003097 Langberg et al. Jan 2007 A1
20070005351 Sathyendra et al. Jan 2007 A1
20070025562 Zalewski et al. Feb 2007 A1
20070033020 (Kelleher) Francois et al. Feb 2007 A1
20070033032 Schubert et al. Feb 2007 A1
20070041589 Patel et al. Feb 2007 A1
20070055508 Zhao Mar 2007 A1
20070058822 Ozawa Mar 2007 A1
20070064817 Dunne et al. Mar 2007 A1
20070071206 Gainsboro et al. Mar 2007 A1
20070081075 Canova, Jr. et al. Apr 2007 A1
20070110263 Brox May 2007 A1
20070127668 Ahya et al. Jun 2007 A1
20070150268 Acero Jun 2007 A1
20070154031 Avendano et al. Jul 2007 A1
20070185587 Kondo Aug 2007 A1
20070195968 Jaber Aug 2007 A1
20070230712 Belt et al. Oct 2007 A1
20070230913 Ichimura Oct 2007 A1
20070237339 Konchitsky Oct 2007 A1
20070253574 Soulodre Nov 2007 A1
20070282604 Gartner et al. Dec 2007 A1
20070287490 Green et al. Dec 2007 A1
20070294263 Punj et al. Dec 2007 A1
20080019548 Avendano Jan 2008 A1
20080059163 Ding et al. Mar 2008 A1
20080069366 Soulodre Mar 2008 A1
20080111734 Fam et al. May 2008 A1
20080159507 Virolainen et al. Jul 2008 A1
20080160977 Ahmaniemi et al. Jul 2008 A1
20080170703 Zivney Jul 2008 A1
20080187143 Mak-Fan Aug 2008 A1
20080192955 Merks Aug 2008 A1
20080228474 Huang et al. Sep 2008 A1
20080233934 Diethorn Sep 2008 A1
20080247567 Kjolerbakken et al. Oct 2008 A1
20080259731 Happonen Oct 2008 A1
20080273476 Cohen et al. Nov 2008 A1
20080298571 Kurtz et al. Dec 2008 A1
20080304677 Abolfathi et al. Dec 2008 A1
20080317259 Zhang Dec 2008 A1
20080317261 Yoshida et al. Dec 2008 A1
20090012783 Klein Jan 2009 A1
20090034755 Short et al. Feb 2009 A1
20090060222 Jeong et al. Mar 2009 A1
20090063143 Schmidt et al. Mar 2009 A1
20090089054 Wang et al. Apr 2009 A1
20090116652 Kirkeby et al. May 2009 A1
20090116656 Lee et al. May 2009 A1
20090134829 Baumann et al. May 2009 A1
20090141908 Jeong et al. Jun 2009 A1
20090147942 Culter Jun 2009 A1
20090150149 Culter et al. Jun 2009 A1
20090164905 Ko Jun 2009 A1
20090177464 Gao et al. Jul 2009 A1
20090192791 El-Maleh et al. Jul 2009 A1
20090192803 Nagaraja Jul 2009 A1
20090220107 Every et al. Sep 2009 A1
20090226010 Schnell et al. Sep 2009 A1
20090228272 Herbig et al. Sep 2009 A1
20090240497 Usher et al. Sep 2009 A1
20090253418 Makinen Oct 2009 A1
20090264114 Virolainen et al. Oct 2009 A1
20090271187 Yen et al. Oct 2009 A1
20090292536 Hetherington et al. Nov 2009 A1
20090303350 Terada Dec 2009 A1
20090323655 Cardona et al. Dec 2009 A1
20090323925 Sweeney et al. Dec 2009 A1
20090323981 Cutler Dec 2009 A1
20090323982 Solbach et al. Dec 2009 A1
20100017205 Visser et al. Jan 2010 A1
20100036659 Haulick et al. Feb 2010 A1
20100082339 Konchitsky et al. Apr 2010 A1
20100092007 Sun Apr 2010 A1
20100094622 Cardillo et al. Apr 2010 A1
20100103776 Chan Apr 2010 A1
20100105447 Sibbald et al. Apr 2010 A1
20100128123 DiPoala May 2010 A1
20100130198 Kannappan et al. May 2010 A1
20100138220 Matsumoto et al. Jun 2010 A1
20100177916 Gerkmann et al. Jul 2010 A1
20100215184 Buck et al. Aug 2010 A1
20100217837 Ansari et al. Aug 2010 A1
20100245624 Beaucoup Sep 2010 A1
20100278352 Petit et al. Nov 2010 A1
20100282045 Chen et al. Nov 2010 A1
20100303298 Marks et al. Dec 2010 A1
20100315482 Rosenfeld et al. Dec 2010 A1
20110026734 Hetherington et al. Feb 2011 A1
20110038486 Beaucoup Feb 2011 A1
20110060587 Phillips et al. Mar 2011 A1
20110081024 Soulodre Apr 2011 A1
20110081026 Ramakrishnan et al. Apr 2011 A1
20110091047 Konchitsky et al. Apr 2011 A1
20110101654 Cech May 2011 A1
20110129095 Avendano et al. Jun 2011 A1
20110173006 Nagel et al. Jul 2011 A1
20110173542 Imes et al. Jul 2011 A1
20110178800 Watts Jul 2011 A1
20110182436 Murgia et al. Jul 2011 A1
20110224994 Norvell et al. Sep 2011 A1
20110261150 Goyal et al. Oct 2011 A1
20110280154 Silverstrim et al. Nov 2011 A1
20110286605 Furuta et al. Nov 2011 A1
20110300806 Lindahl et al. Dec 2011 A1
20110305345 Bouchard et al. Dec 2011 A1
20120010881 Avendano et al. Jan 2012 A1
20120027217 Jun et al. Feb 2012 A1
20120027218 Every et al. Feb 2012 A1
20120050582 Seshadri et al. Mar 2012 A1
20120062729 Hart et al. Mar 2012 A1
20120063609 Triki et al. Mar 2012 A1
20120087514 Williams et al. Apr 2012 A1
20120093341 Kim et al. Apr 2012 A1
20120116758 Murgia et al. May 2012 A1
20120116769 Malah et al. May 2012 A1
20120133728 Lee May 2012 A1
20120143363 Liu et al. Jun 2012 A1
20120179461 Every et al. Jul 2012 A1
20120179462 Klein Jul 2012 A1
20120182429 Forutanpour et al. Jul 2012 A1
20120197898 Pandey et al. Aug 2012 A1
20120202485 Mirbaha et al. Aug 2012 A1
20120220347 Davidson Aug 2012 A1
20120231778 Chen et al. Sep 2012 A1
20120249785 Sudo et al. Oct 2012 A1
20120250882 Mohammad et al. Oct 2012 A1
20130011111 Abraham et al. Jan 2013 A1
20130024190 Fairey Jan 2013 A1
20130034243 Yermeche et al. Feb 2013 A1
20130051543 McDysan et al. Feb 2013 A1
20130182857 Namba et al. Jul 2013 A1
20130196715 Hansson et al. Aug 2013 A1
20130231925 Avendano et al. Sep 2013 A1
20130251170 Every et al. Sep 2013 A1
20130268280 Del Galdo et al. Oct 2013 A1
20130332171 Avendano et al. Dec 2013 A1
20140039888 Taubman et al. Feb 2014 A1
20140098964 Rosca et al. Apr 2014 A1
20140108020 Sharma et al. Apr 2014 A1
20140112496 Murgia et al. Apr 2014 A1
20140142958 Sharma et al. May 2014 A1
20140241702 Solbach et al. Aug 2014 A1
20140337016 Herbig et al. Nov 2014 A1
20150030163 Sokolov Jan 2015 A1
20150100311 Kar et al. Apr 2015 A1
20160027451 Solbach et al. Jan 2016 A1
20160063997 Nemala et al. Mar 2016 A1
20160066089 Klein Mar 2016 A1
Foreign Referenced Citations (63)
Number Date Country
0756437 Jan 1997 EP
1232496 Aug 2002 EP
1536660 Jun 2005 EP
20100431 Dec 2010 FI
20125812 Oct 2012 FI
20135038 Apr 2013 FI
124716 Dec 2014 FI
H0553587 Mar 1993 JP
H07248793 Sep 1995 JP
H05300419 Dec 1995 JP
2001159899 Jun 2001 JP
2002366200 Dec 2002 JP
2002542689 Dec 2002 JP
2003514473 Apr 2003 JP
2003271191 Sep 2003 JP
2004187283 Jul 2004 JP
2006094522 Apr 2006 JP
2006515490 May 2006 JP
2006337415 Dec 2006 JP
2007006525 Jan 2007 JP
2008015443 Jan 2008 JP
2008135933 Jun 2008 JP
2008542798 Nov 2008 JP
2009037042 Feb 2009 JP
2010532879 Oct 2010 JP
2011527025 Oct 2011 JP
H07336793 Dec 2011 JP
2013517531 May 2013 JP
2013534651 Sep 2013 JP
5762956 Jun 2015 JP
1020100041741 Apr 2010 KR
1020110038024 Apr 2011 KR
1020120116442 Oct 2012 KR
1020130117750 Oct 2013 KR
101461141 Nov 2014 KR
101610656 Apr 2016 KR
526468 Apr 2003 TW
I279776 Apr 2007 TW
200910793 Mar 2009 TW
201009817 Mar 2010 TW
201143475 Dec 2011 TW
201214418 Apr 2012 TW
I463817 Dec 2014 TW
I488179 Jun 2015 TW
WO8400634 Feb 1984 WO
WO0137265 May 2001 WO
WO0156328 Aug 2001 WO
WO2006027707 Mar 2006 WO
WO2007001068 Jan 2007 WO
WO2007049644 May 2007 WO
WO2008034221 Mar 2008 WO
WO2008101198 Aug 2008 WO
WO2009008998 Jan 2009 WO
WO2010005493 Jan 2010 WO
WO2011068901 Jun 2011 WO
WO2011091068 Jul 2011 WO
WO2011129725 Oct 2011 WO
WO2012009047 Jan 2012 WO
WO2012097016 Jul 2012 WO
WO2013188562 Dec 2013 WO
WO2014063099 Apr 2014 WO
WO2014131054 Aug 2014 WO
WO2016033364 Mar 2016 WO
Non-Patent Literature Citations (125)
Entry
Non-Final Office Action, Aug. 18, 2010, U.S. Appl. No. 11/825,563, filed Jul. 6, 2007.
Final Office Action, Apr. 28, 2011, U.S. Appl. No. 11/825,563, filed Jul. 6, 2007.
Non-Final Office Action, Apr. 24, 2013, U.S. Appl. No. 11/825,563, filed Jul. 6, 2007.
Final Office Action, Dec. 30, 2013, U.S. Appl. No. 11/825,563, filed Jul. 6, 2007.
Notice of Allowance, Mar. 25, 2014, U.S. Appl. No. 11/825,563, filed Jul. 6, 2007.
Non-Final Office Action, Sep. 14, 2011, U.S. Appl. No. 12/004,897, filed Dec. 21, 2007.
Notice of Allowance, Jan. 27, 2012, U.S. Appl. No. 12/004,897, filed Dec. 21, 2007.
Non-Final Office Action, Jul. 28, 2011, U.S. Appl. No. 12/072,931, filed Feb. 29, 2008.
Notice of Allowance, Mar. 1, 2012, U.S. Appl. No. 12/072,931, filed Feb. 29, 2008.
Notice of Allowance, Mar. 1, 2012, U.S. Appl. No. 12/080,115, filed Mar. 31, 2008.
Non-Final Office Action, Nov. 14, 2011, U.S. Appl. No. 12/215,980, filed Jun. 30, 2008.
Final Office Action, Apr. 24, 2012, U.S. Appl. No. 12/215,980, filed Jun. 30, 2008.
Advisory Action, Jul. 3, 2012, U.S. Appl. No. 12/215,980, filed Jun. 30, 2008.
Non-Final Office Action, Mar. 11, 2014, U.S. Appl. No. 12/215,980, filed Jun. 30, 2008.
Final Office Action, Jul. 11, 2014, U.S. Appl. No. 12/215,980, filed Jun. 30, 2008.
Non-Final Office Action, Dec. 8, 2014, U.S. Appl. No. 12/215,980, filed Jun. 30, 2008.
Notice of Allowance, Jul. 7, 2015, U.S. Appl. No. 12/215,980, filed Jun. 30, 2008.
Non-Final Office Action, Sep. 1, 2011, U.S. Appl. No. 12/286,909, filed Oct. 2, 2008.
Notice of Allowance, Feb. 28, 2012, U.S. Appl. No. 12/286,909, filed Oct. 2, 2008.
Non-Final Office Action, Nov. 15, 2011, U.S. Appl. No. 12/286,995, filed Oct. 2, 2008.
Final Office Action, Apr. 10, 2012, U.S. Appl. No. 12/286,995, filed Oct. 2, 2008.
Notice of Allowance, Mar. 13, 2014, U.S. Appl. No. 12/286,995, filed Oct. 2, 2008.
Non-Final Office Action, Aug. 1, 2012, U.S. Appl. No. 12/860,043, filed Aug. 20, 2010.
Notice of Allowance, Jan. 18, 2013, U.S. Appl. No. 12/860,043, filed Aug. 22, 2010.
Non-Final Office Action, Aug. 17, 2012, U.S. Appl. No. 12/868,622, filed Aug. 25, 2010.
Final Office Action, Feb. 22, 2013, U.S. Appl. No. 12/868,622, filed Aug. 25, 2010.
Advisory Action, May 14, 2013, U.S. Appl. No. 12/868,622, filed Aug. 25, 2010.
Notice of Allowance, May 1, 2014, U.S. Appl. No. 12/868,622, filed Aug. 25, 2010.
Non-Final Office Action, Feb. 19, 2013, U.S. Appl. No. 12/944,659, filed Nov. 11, 2010.
Final Office Action, Jan. 12, 2016, U.S. Appl. No. 12/959,994, filed Dec. 3, 2010.
Notice of Allowance, May 25, 2011, U.S. Appl. No. 13/016,916, filed Jan. 28, 2011.
Notice of Allowance, Aug. 4, 2011, U.S. Appl. No. 13/016,916, filed Jan. 28, 2011.
Notice of Allowance, Oct. 3, 2013, U.S. Appl. No. 13/157,238, filed Jun. 9, 2011.
Non-Final Office Action, Nov. 2013, U.S. Appl. No. 13/363,362, filed Jan. 31, 2012.
Final Office Action, Sep. 12, 2014, U.S. Appl. No. 13/363,362, filed Jan. 31, 2012.
Non-Final Office Action, Oct. 28, 2015, U.S. Appl. No. 13/363,362, filed Jan. 31, 2012.
Non-Final Office Action, Dec. 4, 2013, U.S. Appl. No. 13/396,568, filed Feb. 14, 2012.
Final Office Action, Sep. 23, 2014, U.S. Appl. No. 13/396,568, filed Feb. 14, 2012.
Non-Final Office Action, Nov. 5, 2015, U.S. Appl. No. 13/396,568, filed Feb. 14, 2012.
Non-Final Office Action, May 11, 2012, U.S. Appl. No. 13/424,189, filed Mar. 19, 2012.
Final Office Action, Sep. 4, 2012, U.S. Appl. No. 13/424,189, filed Mar. 19, 2012.
Final Office Action, Nov. 28, 2012, U.S. Appl. No. 13/424,189, filed Mar. 19, 2012.
Notice of Allowance, Mar. 7, 2013, U.S. Appl. No. 13/424,189, filed Mar. 19, 2012.
Non-Final Office Action, Jun. 7, 2012, U.S. Appl. No. 13/426,436, filed Mar. 21, 2012.
Final Office Action, Dec. 31, 2012, U.S. Appl. No. 13/426,436, filed Mar. 21, 2012.
Non-Final Office Action, Sep. 12, 2013, U.S. Appl. No. 13/426,436, filed Mar. 21, 2012.
Notice of Allowance, Jul. 16, 2014, U.S. Appl. No. 13/426,436, filed Mar. 21, 2012.
Non-Final Office Action, Nov. 7, 2012, U.S. Appl. No. 13/492,780, filed Jun. 8, 2012.
Non-Final Office Action, May 8, 2013, U.S. Appl. No. 13/492,780, filed Jun. 8, 2012.
Final Office Action, Oct. 23, 2013, U.S. Appl. No. 13/492,780, filed Jun. 8, 2012.
Notice of Allowance, Nov. 24, 2014, U.S. Appl. No. 13/492,780, filed Jun. 8, 2012.
Non-Final Office Action, May 23, 2014, U.S. Appl. No. 13/859,186, filed Apr. 9, 2013.
Final Office Action, Dec. 3, 2014, U.S. Appl. No. 13/859,186, filed Apr. 9, 2013.
Non-Final Office Action, Jul. 7, 2015, U.S. Appl. No. 13/859,186, filed Apr. 9, 2013.
Final Office Action, Feb. 2, 2016, U.S. Appl. No. 13/859,186, filed Apr. 9, 2013.
Notice of Allowance, Apr. 28, 2016, U.S. Appl. No. 13/859,186, filed Apr. 9, 2013.
Non-Final Office Action, Apr. 17, 2015, U.S. Appl. No. 13/888,796, filed May 7, 2013.
Non-Final Office Action, Jul. 14, 2015, U.S. Appl. No. 14/046,551, filed Oct. 4, 2013.
Notice of Allowance, May 20, 2015, U.S. Appl. No. 13/888,796, filed May 7, 2013.
Non-Final Office Action, Apr. 19, 2016, U.S. Appl. No. 14/046,551, filed Oct. 4, 2013.
Non-Final Office Action, May 21, 2015, U.S. Appl. No. 14/189,817, filed Feb. 25, 2014.
Final Office Action, Dec. 15, 2015, U.S. Appl. No. 14/189,817, filed Feb. 25, 2014.
Non-Final Office Action, Jul. 15, 2015, U.S. Appl. No. 14/058,059, filed Oct. 18, 2013.
Non-Final Office Action, Jun. 26, 2015, U.S. Appl. No. 14/262,489, filed Apr. 25, 2014.
Notice of Allowance, Jan. 28, 2016, U.S. Appl. No. 14/313,883, filed Jun. 24, 2014.
Non-Final Office Action, Jun. 26, 2015, U.S. Appl. No. 14/626,489, filed Apr. 25, 2014.
Non-Final Office Action, Jun. 10, 2015, U.S. Appl. No. 14/628,109, filed Feb. 20, 2015.
Final Office Action, Mar. 16, 2016, U.S. Appl. No. 14/628,109, filed Feb. 20, 2015.
Non-Final Office Action, Apr. 8, 2016, U.S. Appl. No. 14/838,133, filed Aug. 27, 2015.
Dahl, Mattias et al., “Simultaneous Echo Cancellation and Car Noise Suppression Employing a Microphone Array”, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing, Apr. 21-24, pp. 239-242.
Graupe, Daniel et al., “Blind Adaptive Filtering of Speech from Noise of Unknown Spectrum Using a Virtual Feedback Configuration”, IEEE Transactions on Speech and Audio Processing, Mar. 2000, vol. 8, No. 2, pp. 146-158.
Kato et al., “Noise Suppression with High Speech Quality Based on Weighted Noise Estimation and MMSE STSA,” Proc. IWAENC [Online] 2001, pp. 183-186.
Soon et al., “Low Distortion Speech Enhancement”, Proc. Inst. Elect. Eng. [Online] 2000, vol. 147, pp. 247-253.
Stahl, V. et al., “Quantile Based Noise Estimation for Spectral Subtraction and Wiener Filtering,” 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing, Jun. 5-9, vol. 3, pp. 1875-1878.
Tchorz, Jurgen et al., “SNR Estimation Based on Amplitude Modulation Analysis with Applications to Noise Suppression”, IEEE Transactions on Speech and Audio Processing, vol. 11, No. 3, May 2003, pp. 184-192.
Yoo, Heejong et al., “Continuous-Time Audio Noise Suppression and Real-Time Implementation”, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing, May 13-17, pp. IV3980-IV3983.
International Search Report and Written Opinion dated Oct. 1, 2008 in Patent Cooperation Treaty Application No. PCT/US2008/008249.
International Search Report and Written Opinion dated Aug. 27, 2009 in Patent Cooperation Treaty Application No. PCT/US2009/003813.
Dahl, Mattias et al., “Acoustic Echo and Noise Cancelling Using Microphone Arrays”, International Symposium on Signal Processing and its Applications, ISSPA, Gold Coast, Australia, Aug. 25-30, 1996, pp. 379-382.
International Search Report and Written Opinion dated Sep. 1, 2011 in Patent Cooperation Treaty Application No. PCT/US11/37250.
Fazel et al., “An overview of statistical pattern recognition techniques for speaker verification,” IEEE, May 2011.
Sundaram et al., “Discriminating Two Types of Noise Sources Using Cortical Representation and Dimension Reduction Technique,” IEEE, 2007.
Bach et al., “Learning Spectral Clustering with Application to Speech Separation,” Journal of Machine Learning Research, 2006.
Tognieri et al., “A Comparison of the LBG, LVQ, MLP, SOM and GMM Algorithms for Vector Quantisation and Clustering Analysis,” University of Western Australia, 1992.
Klautau et al., “Discriminative Gaussian Mixture Models a Comparison with Kernel Classifiers,” ICML, 2003.
Mokbel et al., “Automatic Word Recognition in Cars,” IEEE Transactions on Speech and Audio Processing, vol. 3, No. 5, Sep. 1995, pp. 346-356.
Office Action mailed Oct. 14, 2013 in Taiwan Patent Application 097125481, filed Jul. 4, 2008.
Office Action mailed Oct. 29, 2013 in Japan Patent Application 2011-516313, filed Jun. 26, 2009.
Office Action mailed Dec. 9, 2013 in Finland Patent Application 20100431, filed Jun. 26, 2009.
Office Action mailed Jan. 20, 2014 in Finland Patent Application 20100001, filed Jul. 3, 2008.
International Search Report & Written Opinion dated Mar. 18, 2014 in Patent Cooperation Treaty Application No. PCT/US2013/065752, filed Oct. 18, 2013.
Office Action mailed Oct. 17, 2013 in Taiwan Patent Application 097125481, filed Jul. 4, 2008.
Allowance mailed May 21, 2014 in Finland Patent Application 20100001, filed Jan. 4, 2010.
Office Action mailed May 2, 2014 in Taiwan Patent Application 098121933, filed Jun. 29, 2009.
Office Action mailed Apr. 15, 2014 in Japan Patent Application 2010-514871, filed Jul. 3, 2008.
Office Action mailed Jun. 27, 2014 in Korean Patent Application No. 10-2010-7000194, filed Jan. 6, 2010.
International Search Report & Written Opinion dated Jul. 15, 2014 in Patent Cooperation Treaty Application No. PCT/US2014/018443, filed Feb. 25, 2014.
Notice of Allowance dated Sep. 16, 2014 in Korean Application No. 10-2010-7000194, filed Jul. 3, 2008.
Notice of Allowance dated Sep. 29, 2014 in Taiwan Application No. 097125481, filed Jul. 4, 2008.
Notice of Allowance dated Oct. 10, 2014 in Finland Application No. 20100001, filed Jul. 3, 2008.
Notice of Allowance mailed Feb. 10, 2015 in Taiwan Patent Application No. 098121933, filed Jun. 29, 2009.
Office Action mailed Mar. 24, 2015 in Japan Patent Application No. 2011-516313, filed Jun. 26, 2009.
Office Action mailed Apr. 16, 2015 in Korean Patent Application No. 10-2011-7000440, filed Jun. 26, 2009.
Notice of Allowance mailed Jun. 2, 2015 in Japan Patent Application 2011-516313, filed Jun. 26, 2009.
Kim et al., “Improving Speech Intelligibility in Noise Using Environment-Optimized Algorithms,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, No. 8, Nov. 2010, pp. 2080-2090.
Sharma et al., “Rotational Linear Discriminant Analysis Technique for Dimensionality Reduction,” IEEE Transactions on Knowledge and Data Engineering, vol. 20, No. 10, Oct. 2008, pp. 1336-1347.
Temko et al., “Classification of Acoustic Events Using SVM-Based Clustering Schemes,” Pattern Recognition 39, No. 4, 2006, pp. 682-694.
Office Action mailed Jun. 9, 2015 in Japan Patent Application 2014-165477 filed Jul. 3, 2008.
Office Action mailed Jun. 17, 2015 in Japan Patent Application 2013-519682 filed May 19, 2011.
International Search Report & Written Opinion dated Nov. 27, 2015 in Patent Cooperation Treaty Application No. PCT/US2015/047263, filed Aug. 27, 2015.
Notice of Allowance dated Feb. 24, 2016 in Korean Application No. 10-2011-7000440, filed Jun. 26, 2009.
Hu et al., “Robust Speaker's Location Detection in a Vehicle Environment Using GMM Models,” IEEE Transactions on Systems, Man, and Cybernetics—Part B: Cybernetics, vol. 36, No. 2, Apr. 2006, pp. 403-412.
International Search Report and Written Opinion dated Feb. 7, 2011 in Application No. PCT/US10/58600.
International Search Report dated Dec. 20, 2013 in Patent Cooperation Treaty Application No. PCT/US2013/045462, filed Jun. 12, 2013.
Office Action dated Aug. 26, 2014 in Japanese Application No. 2012-542167, filed Dec. 1, 2010.
Office Action mailed Oct. 31, 2014 in Finnish Patent Application No. 20125600, filed Jun. 1, 2012.
Office Action mailed Jul. 21, 2015 in Japanese Patent Application 2012-542167 filed Dec. 1, 2010.
Office Action mailed Sep. 29, 2015 in Finnish Patent Application 20125600, filed Dec. 1, 2010.
Goodwin, Michael M. et al., “Key Click Suppression”, U.S. Appl. No. 14/745,176, filed Jun. 19, 2015, 25 pages.
Final Office Action, May 5, 2016, U.S. Appl. No. 13/363,362, filed Jan. 31, 2012.
Non-Final Office Action, May 6, 2016, U.S. Appl. No. 14/495,550, filed Sep. 24, 2014.
Non-Final Office Action, May 31, 2016, U.S. Appl. No. 14/874,329, filed Oct. 2, 2015.
Final Office Action, Jun. 17, 2016, U.S. Appl. No. 13/396,568, filed Feb. 14, 2012.
Advisory Action, Jul. 29, 2016, U.S. Appl. No. 13/363,362, filed Jan. 31, 2012.
Final Office Action, Aug. 30, 2016, U.S. Appl. No. 14/838,133, filed Aug. 27, 2015.
Provisional Applications (1)
Number Date Country
61346851 May 2010 US