Noise suppression assisted automatic speech recognition

Information

  • Patent Grant
  • Patent Number
    9,558,755
  • Date Filed
    Tuesday, December 7, 2010
  • Date Issued
    Tuesday, January 31, 2017
Abstract
Noise suppression information is used to optimize or improve automatic speech recognition performed for a signal. Noise suppression can be performed on a noisy speech signal using a gain value. The gain to apply to the noisy speech signal is selected to optimize speech recognition analysis of the resulting signal. The gain may be selected based on one or more features for a current sub band and time frame, as well as one or more features for other sub bands and/or time frames. Noise suppression information can be provided to a speech recognition module to improve the robustness of the speech recognition analysis. Noise suppression information can also be used to encode and identify speech.
Description
BACKGROUND OF THE INVENTION

Speech recognition systems have been used to convert spoken words into text. In medium and high noise environments, however, the accuracy of automatic speech recognition systems tends to degrade significantly. As a result, most speech recognition systems are used with audio captured in a noise-free environment.


In noise reduction systems, by contrast, a standard strategy consists of strongly attenuating the portions of the acoustic spectrum that are dominated by noise, while preserving the spectrum portions dominated by speech.


While strong attenuation of undesired spectrum portions is a valid strategy from the point of view of noise reduction and perceived output signal quality, it is not necessarily a good strategy for an automatic speech recognition system. In particular, the spectral regions strongly attenuated by noise suppression may have been needed to extract features for speech recognition. As a result, the attenuation applied by noise suppression may corrupt the features of the speech signal more than the original noise did. When the corruption introduced by noise suppression exceeds the corruption caused by the added noise, the noise reduction algorithm can render automatic speech recognition results unusable.


SUMMARY OF THE INVENTION

The present technology may utilize noise suppression information to optimize or improve automatic speech recognition performed for a signal. Noise suppression may be performed on a noisy speech signal using a gain value. The gain to apply to the noisy signal as part of the noise suppression is selected to optimize speech recognition analysis of the resulting signal. The gain may be selected based on one or more features for a current sub band and time frame, as well as one or more features for other sub bands and/or time frames. Noise suppression information may be provided to a speech recognition module to improve the robustness of the speech recognition analysis. Noise suppression information may also be used to encode and identify speech. Resources spent on automatic speech recognition (such as a bit rate of a speech codec) may be selected based on the speech to noise ratio (SNR).


An embodiment may enable processing of an audio signal. Sub-band signals may be generated from a received primary acoustic signal and a secondary acoustic signal. One or more features may be determined for a sub-band signal. Noise suppression information may be determined based on the one or more features and provided to a speech recognition module.





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 illustrates an environment in which the present technology may be utilized.



FIG. 2 is a block diagram of an exemplary audio device.



FIG. 3 is a block diagram of an exemplary audio processing system.



FIG. 4 is a flow chart of an exemplary method for performing speech recognition based on noise suppression information.



FIG. 5 is a flow chart of an exemplary method for performing noise suppression on a sub band signal.



FIG. 6 is a flow chart of an exemplary method for providing noise suppression information to a speech recognition module.





DETAILED DESCRIPTION OF THE INVENTION

The present technology may utilize noise suppression information to optimize or improve automatic speech recognition performed for a signal. Noise suppression may be performed on a noisy speech signal using a gain value. The gain to apply to the noisy signal as part of the noise suppression is selected to optimize speech recognition analysis of the resulting signal. The gain may be selected based on one or more features for a current sub band and time frame, as well as one or more features for other sub bands and/or time frames.


Noise suppression information may be provided to a speech recognition module to improve the robustness of the speech recognition analysis. Noise suppression information may include voice activity detection (VAD) information, such as, for example, a noise estimate, an indication of whether a signal includes speech, an indication of a speech to noise ratio (SNR) for a signal, and other information. Noise suppression information may also be used to encode and identify speech. Resources spent on automatic speech recognition (such as a bit rate of a speech codec) may be selected based on the SNR.



FIG. 1 is an illustration of an environment in which embodiments of the present technology may be used. A user may act as an audio (speech) source 102 to an audio device 104. The exemplary audio device 104 includes two microphones: a primary microphone 106 relative to the audio source 102 and a secondary microphone 108 located a distance away from the primary microphone 106. Alternatively, the audio device 104 may include a single microphone. In yet other embodiments, the audio device 104 may include more than two microphones, such as for example three, four, five, six, seven, eight, nine, ten or even more microphones.


The primary microphone 106 and secondary microphone 108 may be omni-directional microphones. Alternatively, embodiments may utilize other forms of microphones or acoustic sensors, such as directional microphones.


While the microphones 106 and 108 receive sound (i.e. acoustic signals) from the audio source 102, the microphones 106 and 108 also pick up noise 110. Although the noise 110 is shown coming from a single location in FIG. 1, the noise 110 may include any sounds from one or more locations that differ from the location of audio source 102, and may include reverberations and echoes. The noise 110 may be stationary, non-stationary, and/or a combination of both stationary and non-stationary noise.


Some embodiments may utilize level differences (e.g. energy differences) between the acoustic signals received by the two microphones 106 and 108. Because the primary microphone 106 is much closer to the audio source 102 than the secondary microphone 108 in a close-talk use case, the intensity level is higher for the primary microphone 106, resulting in a larger energy level received by the primary microphone 106 during a speech/voice segment, for example.


The level difference may then be used to discriminate speech and noise in the time-frequency domain. Further embodiments may use a combination of energy level differences and time delays to discriminate speech. Based on binaural cue encoding, speech signal extraction or speech enhancement may be performed.
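
As a non-limiting illustration (not part of the original disclosure), the following Python sketch shows how a per-frame energy level difference between a primary and a secondary microphone signal might be thresholded to flag speech-dominated frames in a close-talk use case. The frame length, threshold, and toy signals are assumptions for illustration only.

```python
import numpy as np

def frame_energies(x, frame_len):
    """Split a 1-D signal into non-overlapping frames and return per-frame energy."""
    n_frames = len(x) // frame_len
    frames = x[:n_frames * frame_len].reshape(n_frames, frame_len)
    return np.sum(frames ** 2, axis=1)

def ild_speech_flags(primary, secondary, frame_len=128, threshold_db=6.0, eps=1e-12):
    """Flag frames where the primary microphone is markedly louder than the
    secondary one, which in a close-talk setup suggests speech rather than
    diffuse noise.  threshold_db is an illustrative value, not from the patent."""
    e_p = frame_energies(primary, frame_len)
    e_s = frame_energies(secondary, frame_len)
    ild_db = 10.0 * np.log10((e_p + eps) / (e_s + eps))
    return ild_db > threshold_db

# Toy usage: speech-like burst close to the primary mic, diffuse noise on both.
rng = np.random.default_rng(0)
noise = 0.05 * rng.standard_normal(4096)
speech = np.zeros(4096)
speech[1024:3072] = np.sin(2 * np.pi * 200 * np.arange(2048) / 8000.0)
primary = speech + noise
secondary = 0.3 * speech + 0.05 * rng.standard_normal(4096)
print(ild_speech_flags(primary, secondary).astype(int))
```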



FIG. 2 is a block diagram of an exemplary audio device 104. In the illustrated embodiment, the audio device 104 includes a receiver 200, a processor 202, the primary microphone 106, an optional secondary microphone 108, an audio processing system 210, and an output device 206. The audio device 104 may include further or other components necessary for audio device 104 operations. Similarly, the audio device 104 may include fewer components that perform similar or equivalent functions to those depicted in FIG. 2.


Processor 202 may execute instructions and modules stored in a memory (not illustrated in FIG. 2) in the audio device 104 to perform functionality described herein, including noise reduction for an acoustic signal, speech recognition, and other functionality. Processor 202 may include hardware and software implemented as a processing unit, which may process floating point operations and other operations for the processor 202.


The exemplary receiver 200 may include an acoustic sensor configured to receive and transmit a signal to and from a communications network. In some embodiments, the receiver 200 may include an antenna device. The signal received may be forwarded to the audio processing system 210 to reduce noise using the techniques described herein, and provide an audio signal to the output device 206. Similarly, a signal received by one or more of primary microphone 106 and secondary microphone 108 may be processed for noise suppression and ultimately transmitted to a communications network via receiver 200. Hence, the present technology may be used in one or both of the transmit and receive paths of the audio device 104.


The audio processing system 210 is configured to receive the acoustic signals from an acoustic source via the primary microphone 106 and secondary microphone 108 (or a far-end signal via receiver 200) and process the acoustic signals. Processing may include performing noise reduction within an acoustic signal and speech recognition for an acoustic signal. The audio processing system 210 is discussed in more detail below.


The primary and secondary microphones 106, 108 may be spaced a distance apart in order to allow for detecting an energy level difference, time difference or phase difference between them. The acoustic signals received by primary microphone 106 and secondary microphone 108 may be converted into electrical signals (i.e. a primary electrical signal and a secondary electrical signal). The electrical signals may themselves be converted by an analog-to-digital converter (not shown) into digital signals for processing in accordance with some embodiments. In order to differentiate the acoustic signals for clarity purposes, the acoustic signal received by the primary microphone 106 is herein referred to as the primary acoustic signal, while the acoustic signal received by the secondary microphone 108 is herein referred to as the secondary acoustic signal. The primary acoustic signal and the secondary acoustic signal may be processed by the audio processing system 210 to produce a signal with an improved signal-to-noise ratio. It should be noted that embodiments of the technology described herein may be practiced utilizing only the primary microphone 106.


The output device 206 is any device which provides an audio output to the user. For example, the output device 206 may include a speaker, an earpiece of a headset or handset, or a speaker on a conference device.


In various embodiments, where the primary and secondary microphones are omni-directional microphones that are closely-spaced (e.g., 1-2 cm apart), a beamforming technique may be used to simulate forwards-facing and backwards-facing directional microphones. The level difference may be used to discriminate speech and noise in the time-frequency domain which can be used in noise reduction.
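
A minimal sketch, assuming a frequency-domain delay-and-subtract formulation, of how two closely spaced omni-directional microphones could be combined into simulated forwards-facing and backwards-facing directional beams. The 1.5 cm spacing and the test signals are illustrative; the patent does not specify a particular beamforming implementation.

```python
import numpy as np

def differential_beams(p1, p2, fs, mic_spacing_m=0.015, c=343.0):
    """Form forward- and backward-facing first-order differential beams from two
    closely spaced omni microphones via delay-and-subtract in the frequency domain.
    The 1.5 cm spacing is an illustrative assumption."""
    n = len(p1)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    tau = mic_spacing_m / c                     # acoustic travel time between mics
    delay = np.exp(-2j * np.pi * freqs * tau)   # pure delay of tau seconds
    P1, P2 = np.fft.rfft(p1), np.fft.rfft(p2)
    front = np.fft.irfft(P1 - P2 * delay, n=n)  # null toward the back
    back = np.fft.irfft(P2 - P1 * delay, n=n)   # null toward the front
    return front, back

fs = 16000
t = np.arange(fs) / fs
p1 = np.sin(2 * np.pi * 440 * t)
p2 = np.sin(2 * np.pi * 440 * (t - 0.015 / 343.0))  # source arriving from the front
front, back = differential_beams(p1, p2, fs)
print(np.sum(front ** 2) / (np.sum(back ** 2) + 1e-12))  # front beam carries far more energy
```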



FIG. 3 is a block diagram of an exemplary audio processing system 210 for performing noise reduction and automatic speech recognition. In exemplary embodiments, the audio processing system 210 is embodied within a memory device within audio device 104. The audio processing system 210 may include a frequency analysis module 302, a feature extraction module 304, a source inference engine module 306, gain data store 307, mask selector module 308, noise canceller module 310, modifier module 312, reconstructor module 314, and automatic speech recognition 316. Audio processing system 210 may include more or fewer components than illustrated in FIG. 3, and the functionality of modules may be combined or expanded into fewer or additional modules. Exemplary lines of communication are illustrated between various modules of FIG. 3, and in other figures herein. The lines of communication are not intended to limit which modules are communicatively coupled with others, nor are they intended to limit the number of and type of signals communicated between modules.


In operation, acoustic signals received from the primary microphone 106 and secondary microphone 108 are converted to electrical signals, and the electrical signals are processed through frequency analysis module 302. The acoustic signals may be pre-processed in the time domain before being processed by frequency analysis module 302. Time domain pre-processing may include applying input limiter gains, speech time stretching, and filtering using an FIR or IIR filter.


The frequency analysis module 302 receives the acoustic signals and mimics the frequency analysis of the cochlea (e.g., cochlear domain) to generate sub-band signals, simulated by a filter bank. The frequency analysis module 302 separates each of the primary and secondary acoustic signals into two or more frequency sub-band signals. A sub-band signal is the result of a filtering operation on an input signal, where the bandwidth of the filter is narrower than the bandwidth of the signal received by the frequency analysis module 302. The filter bank may be implemented by a series of cascaded, complex-valued, first-order IIR filters. Alternatively, other filters such as short-time Fourier transform (STFT), sub-band filter banks, modulated complex lapped transforms, cochlear models, wavelets, etc., can be used for the frequency analysis and synthesis. The samples of the frequency sub-band signals may be grouped sequentially into time frames (e.g. over a predetermined period of time). For example, the length of a frame may be 4 ms, 8 ms, or some other length of time. In some embodiments there may be no frame at all. The results may include sub-band signals in a fast cochlea transform (FCT) domain.
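
The following sketch illustrates only one of the analysis alternatives named above, a short-time Fourier transform, to produce complex sub-band samples grouped into time frames. The 8 ms frame, 50% hop, and Hann window are illustrative choices rather than parameters taken from the patent, which describes a cochlea-like filter bank.

```python
import numpy as np

def stft_subbands(x, fs, frame_ms=8.0):
    """Decompose a time-domain signal into frequency sub-band frames using a
    short-time Fourier transform -- one of the analysis alternatives named in
    the text.  Returns an array of shape (n_frames, n_bins); the frame length
    and Hann window are illustrative choices."""
    frame_len = int(fs * frame_ms / 1000.0)
    hop = frame_len // 2                       # 50% overlap
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)         # complex sub-band samples per frame

fs = 16000
t = np.arange(fs) / fs
signal = np.sin(2 * np.pi * 300 * t) + 0.1 * np.random.default_rng(1).standard_normal(fs)
subbands = stft_subbands(signal, fs)
print(subbands.shape)   # (n_frames, n_bins)
```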


The sub-band frame signals are provided from frequency analysis module 302 to an analysis path sub-system 320 and a signal path sub-system 330. The analysis path sub-system 320 may process the signal to identify signal features, distinguish between speech components and noise components of the sub-band signals, and determine a signal modifier. The signal path sub-system 330 is responsible for modifying sub-band signals of the primary acoustic signal by reducing noise in the sub-band signals. Noise reduction can include applying a modifier, such as a multiplicative gain mask determined in the analysis path sub-system 320, or by subtracting components from the sub-band signals. The noise reduction may reduce noise and preserve the desired speech components in the sub-band signals.


Signal path sub-system 330 includes noise canceller module 310 and modifier module 312. Noise canceller module 310 receives sub-band frame signals from frequency analysis module 302. Noise canceller module 310 may subtract (e.g., cancel) a noise component from one or more sub-band signals of the primary acoustic signal. As such, noise canceller module 310 may output sub-band estimates of noise components in the primary signal and sub-band estimates of speech components in the form of noise-subtracted sub-band signals.


Noise canceller module 310 may provide noise cancellation, for example in systems with two-microphone configurations, based on source location by means of a subtractive algorithm. Noise canceller module 310 may also provide echo cancellation and is intrinsically robust to loudspeaker and Rx path non-linearity. By performing noise and echo cancellation (e.g., subtracting components from a primary signal sub-band) with little or no voice quality degradation, noise canceller module 310 may increase the speech-to-noise ratio (SNR) in sub-band signals received from frequency analysis module 302 and provided to modifier module 312 and post filtering modules. The amount of noise cancellation performed may depend on the diffuseness of the noise source and the distance between microphones, both of which contribute towards the coherence of the noise between the microphones, with greater coherence resulting in better cancellation.


Noise canceller module 310 may be implemented in a variety of ways. In some embodiments, noise canceller module 310 may be implemented with a single null processing noise subtraction (NPNS) module. Alternatively, noise canceller module 310 may include two or more NPNS modules, which may be arranged for example in a cascaded fashion.


An example of noise cancellation performed in some embodiments by the noise canceller module 310 is disclosed in U.S. patent application Ser. No. 12/215,980, entitled “System and Method for Providing Noise Suppression Utilizing Null Processing Noise Subtraction,” filed Jun. 30, 2008, U.S. application Ser. No. 12/422,917, entitled “Adaptive Noise Cancellation,” filed Apr. 13, 2009, and U.S. application Ser. No. 12/693,998, entitled “Adaptive Noise Reduction Using Level Cues,” filed Jan. 26, 2010, the disclosures of which are each incorporated herein by reference.


The feature extraction module 304 of the analysis path sub-system 320 receives the sub-band frame signals derived from the primary and secondary acoustic signals provided by frequency analysis module 302 as well as the output of NPNS module 310. Feature extraction module 304 may compute frame energy estimations of the sub-band signals, inter-microphone level differences (ILD), inter-microphone time differences (ITD) and inter-microphone phase differences (IPD) between the primary acoustic signal and the secondary acoustic signal, self-noise estimates for the primary and secondary microphones, as well as other monaural or binaural features which may be utilized by other modules, such as pitch estimates and cross-correlations between microphone signals. The feature extraction module 304 may both provide inputs to and process outputs from NPNS module 310.


The NPNS module 310 may provide noise cancelled sub-band signals to the ILD block in the feature extraction module 304. Since the ILD may be determined as the ratio of the NPNS output signal energy to the secondary microphone energy, ILD is often interchangeable with Null Processing Inter-microphone Level Difference (NP-ILD). “Raw-ILD” may be used to disambiguate a case where the ILD is computed from the “raw” primary and secondary microphone signals.
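
As an illustration (not the patent's implementation), the sketch below computes a per-sub-band, per-frame level difference in dB; feeding raw primary-microphone frames yields the "raw ILD," while feeding NPNS output frames yields the NP-ILD described above.

```python
import numpy as np

def ild_per_subband(out_frames, sec_frames, eps=1e-12):
    """Per-sub-band, per-frame level difference in dB between a (possibly
    noise-cancelled) output and the secondary microphone.  Feeding raw primary
    frames gives the 'raw ILD'; feeding NPNS output frames gives the NP-ILD."""
    e_out = np.abs(out_frames) ** 2
    e_sec = np.abs(sec_frames) ** 2
    return 10.0 * np.log10((e_out + eps) / (e_sec + eps))

# Toy usage with random complex sub-band frames (shape: n_frames x n_bands).
rng = np.random.default_rng(2)
primary = rng.standard_normal((10, 65)) + 1j * rng.standard_normal((10, 65))
secondary = 0.5 * primary + 0.1 * rng.standard_normal((10, 65))
print(ild_per_subband(primary, secondary).mean())
```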


Determining energy level estimates and inter-microphone level differences is discussed in more detail in U.S. patent application Ser. No. 11/343,524, entitled “System and Method for Utilizing Inter-Microphone Level Differences for Speech Enhancement”, which is incorporated by reference herein.


Source inference engine module 306 may process the frame energy estimations provided by feature extraction module 304 to compute noise estimates and derive models of the noise and speech in the sub-band signals. Source inference engine module 306 adaptively estimates attributes of the acoustic sources, such as the energy spectra of the output signal of the NPNS module 310. The energy spectra attribute may be utilized to generate a multiplicative mask in mask selector module 308.


The source inference engine module 306 may receive the NP-ILD from feature extraction module 304 and track the NP-ILD probability distributions or “clusters” of the target audio source 102, background noise and optionally echo.


This information is then used, along with other auditory cues, to define classification boundaries between source and noise classes. The NP-ILD distributions of speech, noise and echo may vary over time due to changing environmental conditions, movement of the audio device 104, position of the hand and/or face of the user, other objects relative to the audio device 104, and other factors. The cluster tracker adapts to the time-varying NP-ILDs of the speech or noise source(s).


An example of tracking clusters by a cluster tracker module is disclosed in U.S. patent application Ser. No. 12/004,897, entitled “System and method for Adaptive Classification of Audio Sources,” filed on Dec. 21, 2007, the disclosure of which is incorporated herein by reference.


Source inference engine module 306 may include a noise estimate module which may receive a noise/speech classification control signal from the cluster tracker module and the output of noise canceller module 310 to estimate the noise N(t,w), wherein t is a point in time and w represents a frequency or sub-band. A speech to noise ratio (SNR) can be generated by source inference engine module 306 from the noise estimate and a speech estimate, and the SNR can be provided to other modules within the audio device, such as automatic speech recognition module 316 and mask selector 308.
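
The patent does not specify how the noise estimate N(t,w) is computed; the sketch below uses a simple asymmetric recursive smoother as a hypothetical stand-in and derives a per-band SNR from it. The smoothing constants are assumptions for illustration.

```python
import numpy as np

def track_noise_and_snr(subband_energy, rise=0.02, fall=0.5, eps=1e-12):
    """Per sub-band noise estimate N(t, w) tracked with an asymmetric recursive
    smoother (fast to follow drops in energy, slow to follow rises), and the
    resulting per-frame, per-band speech-to-noise ratio in dB.  The smoothing
    constants are illustrative assumptions, not values from the patent."""
    n_frames, n_bands = subband_energy.shape
    noise = np.zeros((n_frames, n_bands))
    noise[0] = subband_energy[0]
    for t in range(1, n_frames):
        alpha = np.where(subband_energy[t] > noise[t - 1], rise, fall)
        noise[t] = (1.0 - alpha) * noise[t - 1] + alpha * subband_energy[t]
    speech = np.maximum(subband_energy - noise, 0.0)
    snr_db = 10.0 * np.log10((speech + eps) / (noise + eps))
    return noise, snr_db

energy = np.abs(np.random.default_rng(3).standard_normal((50, 32))) ** 2
noise_est, snr_db = track_noise_and_snr(energy)
print(noise_est.shape, snr_db.shape)
```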


Gain data store 307 may include one or more stored gain values and may communicate with mask selector 308. Each stored gain may be associated with a set of one or more features. An exemplary set of features may include a speech to noise ratio and a frequency (i.e., a center frequency for a sub band). Other feature data may also be stored in gain data store 307. Each gain stored in gain data store 307 may, when applied to a sub-band signal, provide a signal that is as close to a clean speech signal as possible. Though the gains provide a speech signal with a reduced amount of noise, they may not provide the perceptually most desirable sounding speech.


In some embodiments, each gain stored in gain data store 307 may be optimized for a set of features, such as for example a particular frequency and speech to noise ratio. For example, to determine an optimal gain value for a particular combination of features, a known speech spectrum may be combined with noise at various speech to noise ratios. Because the speech spectrum and the noise are known, a gain can be determined which suppresses the combined speech-noise signal into a clean speech signal that is ideal for speech recognition. In some embodiments, the gain is configured to suppress the speech-noise signal such that noise is reduced but no portion of the speech signal is attenuated or degraded. These gains, derived from the combined signals for a known SNR, are stored in the gain data store for different combinations of frequency and speech to noise ratio.
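
A minimal sketch of such an offline table, assuming a Wiener-style gain as a stand-in for the optimization described above; the band centers, SNR grid, and gain formula are illustrative assumptions, not values from the patent.

```python
def build_gain_table(center_freqs_hz, snr_grid_db):
    """Build a lookup table of gains keyed by (sub-band center frequency, SNR).
    The Wiener-style gain below is only an illustrative stand-in for the
    offline optimization the text describes; a real table would be derived
    by comparing suppressed output against known clean speech."""
    table = {}
    for f in center_freqs_hz:
        for snr_db in snr_grid_db:
            snr_lin = 10.0 ** (snr_db / 10.0)
            gain = snr_lin / (1.0 + snr_lin)      # attenuate more as SNR drops
            table[(f, snr_db)] = gain
    return table

center_freqs = [250, 500, 1000, 2000, 4000]        # Hz, illustrative band centers
snr_grid = list(range(-10, 31, 5))                  # dB
gain_table = build_gain_table(center_freqs, snr_grid)
print(gain_table[(1000, 10)])
```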


Mask selector 308 may receive a set of one or more features and/or other data from source inference engine 306, query gain data store 307 for a gain associated with a particular set of features and/or other data, and provide an accessed gain to modifier 312. For example, for a particular sub band, mask selector 308 may receive a particular speech to noise ratio from source inference engine 306 for the particular sub band in the current frame. Mask selector 308 may then query data store 307 for a gain that is associated with the combination of the speech to noise ratio and the current sub band center frequency. Mask selector 308 receives the corresponding gain from gain data store 307 and provides the gain to modifier 312.
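
The lookup itself might resemble the following nearest-neighbour query over a table keyed by (center frequency, SNR); this is a simple stand-in for the data-store query described above, and the table contents here are placeholders.

```python
def select_gain(gain_table, band_freq_hz, measured_snr_db):
    """Pick the stored gain whose (frequency, SNR) key is closest to the current
    sub-band's center frequency and measured SNR -- a nearest-neighbour stand-in
    for the data-store query described in the text."""
    key = min(gain_table,
              key=lambda k: (abs(k[0] - band_freq_hz), abs(k[1] - measured_snr_db)))
    return gain_table[key]

# Tiny illustrative table keyed by (center frequency in Hz, SNR in dB).
gain_table = {(500, 0): 0.3, (500, 10): 0.7, (1000, 0): 0.25, (1000, 10): 0.65}
print(select_gain(gain_table, band_freq_hz=900, measured_snr_db=8.0))  # -> 0.65
```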


The accessed gain may be applied, for example as a multiplicative mask, to the noise-subtracted sub-band signals provided by noise canceller 310 to modifier 312. The modifier module 312 multiplies the noise-subtracted sub-band signals of the primary acoustic signal output by the noise canceller module 310 by the gain masks. Applying the mask reduces energy levels of noise components in the sub-band signals of the primary acoustic signal and results in noise reduction.


Modifier module 312 receives the signal path cochlear samples from noise canceller module 310 and applies a gain mask received from mask selector 308 to the received samples. The signal path cochlear samples may include the noise subtracted sub-band signals for the primary acoustic signal. The gain mask provided by mask selector 308 may vary quickly, such as from frame to frame, and noise and speech estimates may vary between frames. To help address the variance, the upwards and downwards temporal slew rates of the mask may be constrained to within reasonable limits by modifier 312. The mask may be interpolated from the frame rate to the sample rate using simple linear interpolation, and applied to the sub-band signals by multiplicative noise suppression. Modifier module 312 may output masked frequency sub-band signals.
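
A minimal sketch, with assumed slew limits, of constraining the frame-to-frame gain change and linearly interpolating the gain from frame rate to sample rate before multiplicative application:

```python
import numpy as np

def smooth_and_upsample_mask(frame_gains, samples_per_frame,
                             max_rise=0.2, max_fall=0.1):
    """Constrain how fast a per-frame gain may move between frames (separate
    upward and downward slew limits, illustrative values) and then linearly
    interpolate the frame-rate gain to the sample rate before it is applied
    multiplicatively to the sub-band samples."""
    limited = np.empty_like(frame_gains, dtype=float)
    limited[0] = frame_gains[0]
    for t in range(1, len(frame_gains)):
        step = np.clip(frame_gains[t] - limited[t - 1], -max_fall, max_rise)
        limited[t] = limited[t - 1] + step
    frame_idx = np.arange(len(limited))
    sample_idx = np.linspace(0, len(limited) - 1, len(limited) * samples_per_frame)
    return np.interp(sample_idx, frame_idx, limited)

mask = smooth_and_upsample_mask(np.array([1.0, 0.2, 0.9, 0.1]), samples_per_frame=4)
print(np.round(mask, 2))
```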


Reconstructor module 314 may convert the masked frequency sub-band signals from the cochlea domain back into the time domain. The conversion may include adding the masked frequency sub-band signals and phase shifted signals. Alternatively, the conversion may include multiplying the masked frequency sub-band signals with an inverse frequency of the cochlea channels. Once conversion to the time domain is completed, the synthesized acoustic signal may be output to the user via output device 206 and/or provided to a codec for encoding.
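
For illustration only, the sketch below shows time-domain reconstruction by inverse transform and overlap-add for STFT-style sub-band frames; the patent's reconstructor instead operates on its cochlea-domain channels as described above.

```python
import numpy as np

def overlap_add_reconstruct(subband_frames, frame_len, hop):
    """Convert complex sub-band frames back to the time domain by inverse FFT
    and overlap-add -- a simple counterpart of the STFT-style analysis sketched
    earlier, standing in for the cochlea-domain reconstruction in the text."""
    frames = np.fft.irfft(subband_frames, n=frame_len, axis=1)
    n_frames = frames.shape[0]
    out = np.zeros((n_frames - 1) * hop + frame_len)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + frame_len] += frame
    return out

frames = np.fft.rfft(np.random.default_rng(4).standard_normal((10, 128)), axis=1)
print(overlap_add_reconstruct(frames, frame_len=128, hop=64).shape)
```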


In some embodiments, additional post-processing of the synthesized time domain acoustic signal may be performed. For example, comfort noise generated by a comfort noise generator may be added to the synthesized acoustic signal prior to providing the signal to the user. Comfort noise may be a uniform constant noise that is not usually discernable to a listener (e.g., pink noise). This comfort noise may be added to the synthesized acoustic signal to enforce a threshold of audibility and to mask low-level non-stationary output noise components. In some embodiments, the comfort noise level may be chosen to be just above a threshold of audibility and may be settable by a user. In some embodiments, the mask selector module 308 may have access to the level of comfort noise in order to generate gain masks that will suppress the noise to a level at or below the comfort noise.
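
A hypothetical sketch of adding a constant low-level comfort noise to the synthesized signal; the -60 dBFS level and the white-noise stand-in (rather than true pink noise) are illustrative assumptions.

```python
import numpy as np

def add_comfort_noise(synthesized, level_dbfs=-60.0, seed=0):
    """Add a constant, very low-level noise floor to mask residual non-stationary
    noise; white noise at an assumed -60 dBFS stands in for the pink comfort
    noise described in the text."""
    rng = np.random.default_rng(seed)
    amplitude = 10.0 ** (level_dbfs / 20.0)
    comfort = amplitude * rng.standard_normal(len(synthesized))
    return synthesized + comfort

out = add_comfort_noise(np.zeros(16000))
print(20 * np.log10(np.sqrt(np.mean(out ** 2))))   # roughly -60 dBFS
```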


Automatic speech recognition module 316 may perform a speech recognition analysis on the reconstructed signal output by reconstructor 314. Automatic speech recognition module 316 may receive a voice activity detection (VAD) signal as well as a speech to noise ratio (SNR) indication or other noise suppression information from source inference engine 306. The information received from source inference engine 306, such as the VAD and SNR, may be used to optimize the speech recognition process performed by automatic speech recognition module 316. Speech recognition module 316 is discussed in more detail below.


The system of FIG. 3 may process several types of signals received by an audio device. The system may be applied to acoustic signals received via one or more microphones. The system may also process signals, such as a digital Rx signal, received through an antenna or other connection.


An exemplary system which may be used to implement at least a portion of audio processing system 210 is described in U.S. patent application Ser. No. 12/832,920, titled “Multi-Microphone Robust Noise Suppression,” filed Jul. 8, 2010, the disclosure of which is incorporated herein by reference.



FIG. 4 is a flow chart of an exemplary method for performing speech recognition based on noise suppression information. First, a primary acoustic signal and a secondary acoustic signal are received at step 410. The signals may be received through microphones 106 and 108 of audio device 104. Sub band signals may then be generated from the primary acoustic signal and secondary acoustic signal at step 420. The received signals may be converted to sub band signals by frequency analysis module 302.


A feature is determined for a sub band signal at step 430. Feature extractor 304 may extract features for each sub band in the current frame or the frame as a whole. Features may include a speech energy level for a particular sub band, a noise level, pitch, and other features. Noise suppression information is then generated from the features at step 440. The noise suppression information may be generated and output by source inference engine 306 from features received from feature extraction module 304. The noise suppression information may include an SNR for each sub band in the current frame, a VAD signal for the current frame, ILD, and other noise suppression information.


Noise suppression may be performed on a sub band signal based on noise suppression information at step 450. The noise suppression may include accessing a gain value based on one or more features and applying the gain to a sub band acoustic signal. Performing noise suppression on a sub band signal is discussed in more detail below with respect to the method of FIG. 5. Additionally, noise suppression performed on a sub band signal may include performing noise cancellation by noise canceller 310 in the audio processing system of FIG. 3.


Noise suppression information may be provided to speech recognition module 316 at step 460. Speech recognition module 316 may receive noise suppression information to assist with speech recognition. Providing noise suppression information to speech recognition module 316 is discussed in more detail below with respect to FIG. 6.


Speech recognition is automatically performed based on the noise suppression information at step 470. The speech recognition process may be optimized based on the noise suppression information. Performing speech recognition based on noise suppression information may include modulating a bit rate of a speech encoder or decoder based on a speech to noise ratio for a particular frame. In some embodiments, the bit rate is decreased when the speech to noise ratio is large. In some embodiments, speech recognition based on noise suppression may include setting a node search depth level by a speech recognition module based on a speech to noise ratio for a current frame. The node search depth level, for example, may be decreased when the speech to noise ratio is large.
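
One hypothetical mapping from frame SNR to recognition resources, consistent with decreasing the codec bit rate and the node search depth as the SNR grows; the break-points and values below are assumptions for illustration only.

```python
def choose_recognition_resources(snr_db):
    """Illustrative mapping from a frame's speech-to-noise ratio to codec bit
    rate and recognizer search depth: cleaner audio needs fewer resources, so
    both are reduced as the SNR grows.  The break-points are assumptions."""
    if snr_db >= 20.0:          # clean speech
        return {"codec_bitrate_kbps": 8, "node_search_depth": 500}
    if snr_db >= 10.0:          # moderately noisy
        return {"codec_bitrate_kbps": 16, "node_search_depth": 1000}
    return {"codec_bitrate_kbps": 24, "node_search_depth": 2000}  # very noisy

print(choose_recognition_resources(25.0))
print(choose_recognition_resources(5.0))
```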



FIG. 5 is a flow chart of an exemplary method for performing noise suppression on a sub band signal. The method of FIG. 5 provides more detail for step 450 in the method of FIG. 4. A speech to noise ratio (SNR) for a sub band is accessed at step 510. The SNR may be received by mask selector 308 from source inference engine 306. Mask selector 308 also has access to sub band information for the sub band being considered.


A gain which corresponds to the sub band signal frequency and the speech to noise ratio is accessed at step 520. The gain is accessed by mask selector 308 from gain data store 307 and may correspond to a particular sub band signal frequency and SNR. The accessed gain is then applied to one or more sub band frequencies at step 530. The accessed gain may be provided to modifier 312, which then applies the gain to a sub band which may or may not have undergone noise cancellation.



FIG. 6 is a flow chart of an exemplary method for providing noise suppression information to a speech recognition module. The method of FIG. 6 may provide more detail for step 460 of the method of FIG. 4. A determination as to whether speech is detected in a primary acoustic signal based on one or more features is performed at step 610. The detection may include detecting whether speech is or is not present within the signal within the current frame. In some embodiments, an SNR for the current sub band or for all sub bands may be compared to a threshold level. If the SNR is above the threshold value, then speech may be determined to be present in the primary acoustic signal. If the SNR is not above the threshold value, then speech may be determined to not be present in the current frame.
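
A minimal sketch of the threshold comparison described above, with an assumed 5 dB threshold and a simple average across sub-bands:

```python
import numpy as np

def vad_from_snr(snr_db_per_band, threshold_db=5.0):
    """Declare speech present in the current frame if the SNR (here averaged
    across sub-bands) exceeds a threshold; the 5 dB threshold is illustrative."""
    return float(np.mean(snr_db_per_band)) > threshold_db

print(vad_from_snr(np.array([12.0, 8.0, 3.0])))   # True  -> speech detected
print(vad_from_snr(np.array([-2.0, 1.0, 0.5])))   # False -> noise only
```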


Each of steps 620-640 describes how speech recognition may be optimized based on noise suppression or noise suppression information, and the steps may be performed in combination or separately. Hence, in some embodiments, only one of steps 620-640 may be performed. In some embodiments, more than one of steps 620-640 may be performed when providing noise suppression information to a speech recognition module.


A speech recognition module is provided with a noise signal if speech is not detected in a current frame of a signal at step 620. For example, if the determination is made that speech is not present in the current frame of a reconstructed signal, a noise signal is provided to automatic speech recognition module 316 in order to ensure that no false positive for speech detection occurs. The noise signal may be any type of signal that has a high likelihood of not being mistaken for speech by the speech recognition module.


A speech recognition module may be provided with an indication that speech is present in the acoustic signal at step 630. In this case, automatic speech recognition module 316 may be provided with a VAD signal provided by source inference engine 306. The VAD signal may indicate whether or not speech is present in the signal provided to automatic speech recognition module 316. Automatic speech recognition module 316 may use the VAD signal to determine whether or not to perform speech recognition on the signal.


A speech to noise ratio (SNR) signal may be provided for the current frame and/or sub band to the speech recognition module at step 640. In this case, the SNR may provide a value within a range of values indicating whether or not speech is present. This may help the automatic speech recognition module learn when to expend resources to recognize speech and when not to.


The above described modules, including those discussed with respect to FIG. 3, may include instructions stored in a storage media such as a machine readable medium (e.g., computer readable medium). These instructions may be retrieved and executed by the processor 202 to perform the functionality discussed herein. Some examples of instructions include software, program code, and firmware. Some examples of storage media include memory devices and integrated circuits.


While the present invention is disclosed by reference to the preferred embodiments and examples detailed above, it is to be understood that these examples are intended in an illustrative rather than a limiting sense. It is contemplated that modifications and combinations will readily occur to those skilled in the art, which modifications and combinations will be within the spirit of the invention and the scope of the following claims.

Claims
  • 1. A method for processing an audio signal, comprising: generating sub-band signals from a received primary acoustic signal and a received secondary acoustic signal; determining two or more features for the sub-band signals, the two or more features including a speech energy level for the sub-band noise level and at least one of the following: inter-microphone level differences, inter-microphone time differences, and inter-microphone phase differences between the primary acoustic signal and the secondary acoustic signal; suppressing a noise component in the primary acoustic signal based on the two or more features, the suppressing configured to clean the primary acoustic signal to create a cleaned speech signal optimized for accurate speech recognition processing by an automatic speech recognition processing module, the suppressing comprising: applying a gain to a sub-band of the primary acoustic signal to provide a noise suppressed signal, the applying comprising: determining a speech to noise ratio (SNR) for the sub-band of the primary acoustic signal; accessing the gain, based on the frequency of the sub-band and the determined SNR for the sub-band, from a datastore, the datastore including a plurality of pre-stored gains configured to create cleaned speech signals optimized for accurate speech recognition processing by the automatic speech recognition processing module, each pre-stored gain in the plurality of pre-stored gains associated with a corresponding frequency and an SNR value; and applying the accessed gain to the sub-band frequency; and providing the cleaned speech signal and corresponding noise suppression information to the automatic speech recognition processing module, the noise suppression information based on the two or more features and including a voice activity detection signal.
  • 2. The method of claim 1, further comprising determining whether the primary acoustic signal includes speech, the determination performed based on the two or more features.
  • 3. The method of claim 2, further comprising providing a noise signal to the automatic speech recognition processing module in response to detecting that the primary acoustic signal does not include speech.
  • 4. The method of claim 2, wherein the voice activity detection signal is generated based on the determination of whether the primary acoustic signal includes speech, and the voice activity detection signal indicating whether automatic speech recognition is to occur.
  • 5. The method of claim 4, wherein the voice activity detection signal is a value within a range of values corresponding to the level of speech detected in the primary acoustic signal.
  • 6. The method of claim 2, wherein the noise suppression information includes a speech to noise ratio for the current time frame and the sub-band to the automatic speech recognition processing module.
  • 7. The method of claim 1, wherein the noise suppression information includes a speech to noise ratio, the method further comprising modulating a bit rate of a speech encoder or decoder based on the speech to noise ratio for a particular frame.
  • 8. The method of claim 1, wherein the noise suppression information includes a speech to noise ratio, the method further comprising setting a node search depth level by the automatic speech recognition processing module based on the speech to noise ratio for a current frame.
  • 9. A non-transitory computer readable storage medium having embodied thereon a program, the program being executable by a processor to perform a method for reducing noise in an audio signal, the method comprising: generating sub-band signals from a received primary acoustic signal and a received secondary acoustic signal; determining two or more features for a sub-band signal, the two or more features including a speech energy level for the sub-band noise level and at least one of the following: inter-microphone level differences, inter-microphone time differences, and inter-microphone phase differences between the primary acoustic signal and the secondary acoustic signal; suppressing a noise component in the primary acoustic signal based on the two or more features, the suppressing configured to clean the primary acoustic signal to create a cleaned speech signal optimized for accurate speech recognition processing by an automatic speech recognition processing module, the suppressing comprising: applying a gain to a sub-band of the primary acoustic signal to provide a noise suppressed signal, the applying comprising: determining a speech to noise ratio (SNR) for the sub-band of the primary acoustic signal; accessing the gain, based on the frequency of the sub-band and the determined SNR for the sub-band, from a datastore, the datastore including a plurality of pre-stored gains configured to create cleaned speech signals optimized for accurate speech recognition processing by the automatic speech recognition processing module, each pre-stored gain in the plurality of pre-stored gains associated with a corresponding frequency and an SNR value; and applying the accessed gain to the sub-band frequency; and providing the cleaned speech signal and corresponding noise suppression information to the automatic speech recognition processing module, the noise suppression information based on the two or more features and including a speech to noise ratio for each of the sub-band signals and a voice activity detection signal.
  • 10. The non-transitory computer readable storage medium of claim 9, further comprising providing a noise signal to the automatic speech recognition processing module in response to detecting that the primary acoustic signal does not include speech.
  • 11. The non-transitory computer readable storage medium of claim 9, wherein the voice activity detection signal is generated based on the determination of whether the primary acoustic signal includes speech, and the voice activity detection signal indicating whether automatic speech recognition is to occur.
  • 12. The non-transitory computer readable storage medium of claim 9, wherein the noise suppression information includes a speech to noise ratio for the current time frame and the sub-band to the automatic speech recognition processing module.
  • 13. The non-transitory computer readable storage medium of claim 9, wherein the noise suppression information includes a speech to noise ratio, the method further comprising modulating a bit rate of a speech encoder or decoder based on the speech to noise ratio for a particular frame.
  • 14. The non-transitory computer readable storage medium of claim 9, wherein the noise suppression information includes a speech to noise ratio, the method further comprising setting a node search depth level by the automatic speech recognition processing module based on the speech to noise ratio for a current frame.
CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the priority benefit of U.S. Provisional Application Ser. No. 61/346,851, titled “Noise Suppression Assisted Automatic Speech Recognition,” filed May 20, 2010; the disclosure of the aforementioned application is incorporated herein by reference.

US Referenced Citations (393)
Number Name Date Kind
4025724 Davidson, Jr. et al. May 1977 A
4630304 Borth et al. Dec 1986 A
4802227 Elko et al. Jan 1989 A
4969203 Herman Nov 1990 A
5115404 Lo et al. May 1992 A
5289273 Lang Feb 1994 A
5400409 Linhard Mar 1995 A
5406635 Jarvinen Apr 1995 A
5546458 Iwami Aug 1996 A
5550924 Helf et al. Aug 1996 A
5555306 Gerzon Sep 1996 A
5625697 Bowen et al. Apr 1997 A
5706395 Arslan et al. Jan 1998 A
5715319 Chu Feb 1998 A
5734713 Mauney et al. Mar 1998 A
5754665 Hosoi May 1998 A
5774837 Yeldener et al. Jun 1998 A
5806025 Vis et al. Sep 1998 A
5819215 Dobson et al. Oct 1998 A
5839101 Vahatalo et al. Nov 1998 A
5905969 Mokbel May 1999 A
5917921 Sasaki et al. Jun 1999 A
5943429 Handel Aug 1999 A
5978824 Ikeda Nov 1999 A
5991385 Dunn et al. Nov 1999 A
6011853 Koski et al. Jan 2000 A
6035177 Moses et al. Mar 2000 A
6065883 Herring et al. May 2000 A
6084916 Ott Jul 2000 A
6098038 Hermansky et al. Aug 2000 A
6122384 Mauro Sep 2000 A
6122610 Isabelle Sep 2000 A
6144937 Ali Nov 2000 A
6188769 Jot et al. Feb 2001 B1
6205421 Morii Mar 2001 B1
6219408 Kurth Apr 2001 B1
6263307 Arslan et al. Jul 2001 B1
6266633 Higgins et al. Jul 2001 B1
6281749 Klayman et al. Aug 2001 B1
6327370 Killion et al. Dec 2001 B1
6339706 Tillgren et al. Jan 2002 B1
6339758 Kanazawa et al. Jan 2002 B1
6343267 Kuhn et al. Jan 2002 B1
6381284 Strizhevskiy Apr 2002 B1
6381469 Wojick Apr 2002 B1
6389142 Hagen et al. May 2002 B1
6411930 Burges Jun 2002 B1
6424938 Johansson et al. Jul 2002 B1
6449586 Hoshuyama Sep 2002 B1
6453284 Paschall Sep 2002 B1
6480610 Fang et al. Nov 2002 B1
6487257 Gustafsson et al. Nov 2002 B1
6504926 Edelson et al. Jan 2003 B1
6526140 Marchok et al. Feb 2003 B1
6615170 Liu et al. Sep 2003 B1
6717991 Gustafsson et al. Apr 2004 B1
6738482 Jaber May 2004 B1
6745155 Andringa et al. Jun 2004 B1
6748095 Goss Jun 2004 B1
6751588 Menendez-Pidal Jun 2004 B1
6768979 Menendez-Pidal Jul 2004 B1
6778954 Kim et al. Aug 2004 B1
6782363 Lee et al. Aug 2004 B2
6804651 Juric et al. Oct 2004 B2
6810273 Mattila et al. Oct 2004 B1
6873837 Yoshioka et al. Mar 2005 B1
6882736 Dickel et al. Apr 2005 B2
6931123 Hughes Aug 2005 B1
6980528 LeBlanc et al. Dec 2005 B1
7006881 Hoffberg Feb 2006 B1
7010134 Jensen Mar 2006 B2
7020605 Gao Mar 2006 B2
RE39080 Johnston Apr 2006 E
7035666 Silberfenig et al. Apr 2006 B2
7054808 Yoshida May 2006 B2
7058572 Nemer Jun 2006 B1
7065486 Thyssen Jun 2006 B1
7072834 Zhou Jul 2006 B2
7092529 Yu et al. Aug 2006 B2
7092882 Arrowood et al. Aug 2006 B2
7103176 Rodriguez et al. Sep 2006 B2
7110554 Brennan et al. Sep 2006 B2
7127072 Rademacher et al. Oct 2006 B2
7145710 Holmes Dec 2006 B2
7146013 Saito et al. Dec 2006 B1
7165026 Acero et al. Jan 2007 B2
7171246 Mattila et al. Jan 2007 B2
7190775 Rambo Mar 2007 B2
7209567 Kozel et al. Apr 2007 B1
7221622 Matsuo et al. May 2007 B2
7225001 Eriksson et al. May 2007 B1
7245710 Hughes Jul 2007 B1
7245767 Moreno et al. Jul 2007 B2
7254535 Kushner et al. Aug 2007 B2
7289955 Deng et al. Oct 2007 B2
7327985 Morfitt, III et al. Feb 2008 B2
7359520 Brennan et al. Apr 2008 B2
7376558 Gemello et al. May 2008 B2
7383179 Alves et al. Jun 2008 B2
7447631 Truman et al. Nov 2008 B2
7469208 Kincaid Dec 2008 B1
7516067 Seltzer et al. Apr 2009 B2
7548791 Johnston Jun 2009 B1
7562140 Clemm et al. Jul 2009 B2
7574352 Quatieri, Jr. Aug 2009 B2
7617282 Han Nov 2009 B2
7657038 Doclo et al. Feb 2010 B2
7664495 Bonner et al. Feb 2010 B1
7664640 Webber Feb 2010 B2
7685132 Hyman Mar 2010 B2
7725314 Wu et al. May 2010 B2
7773741 LeBlanc et al. Aug 2010 B1
7791508 Wegener Sep 2010 B2
7796978 Jones et al. Sep 2010 B2
7895036 Hetherington et al. Feb 2011 B2
7899565 Johnston Mar 2011 B1
7925502 Droppo et al. Apr 2011 B2
7970123 Beaucoup Jun 2011 B2
8032364 Watts Oct 2011 B1
8036767 Soulodre Oct 2011 B2
8046219 Zurek et al. Oct 2011 B2
8081878 Zhang et al. Dec 2011 B1
8107656 Dreßler et al. Jan 2012 B2
8126159 Goose et al. Feb 2012 B2
8140331 Lou Mar 2012 B2
8143620 Malinowski et al. Mar 2012 B1
8155953 Park et al. Apr 2012 B2
8175291 Chan et al. May 2012 B2
8189429 Chen et al. May 2012 B2
8194880 Avendano Jun 2012 B2
8194882 Every et al. Jun 2012 B2
8204252 Avendano Jun 2012 B1
8204253 Solbach Jun 2012 B1
8223988 Wang et al. Jul 2012 B2
8229137 Romesburg Jul 2012 B2
8280731 Yu Oct 2012 B2
8345890 Avendano et al. Jan 2013 B2
8359195 Li Jan 2013 B2
8363823 Santos Jan 2013 B1
8363850 Amada Jan 2013 B2
8369973 Risbo Feb 2013 B2
8447596 Avendano et al. May 2013 B2
8467891 Huang et al. Jun 2013 B2
8473285 Every et al. Jun 2013 B2
8494193 Zhang et al. Jul 2013 B2
8531286 Friar et al. Sep 2013 B2
8538035 Every et al. Sep 2013 B2
8606249 Goodwin Dec 2013 B1
8615392 Goodwin Dec 2013 B1
8615394 Avendano et al. Dec 2013 B1
8639516 Lindahl et al. Jan 2014 B2
8682006 Laroche et al. Mar 2014 B1
8694310 Taylor Apr 2014 B2
8705759 Wolff et al. Apr 2014 B2
8718290 Murgia et al. May 2014 B2
8744844 Klein Jun 2014 B2
8750526 Santos et al. Jun 2014 B1
8762144 Cho et al. Jun 2014 B2
8774423 Solbach Jul 2014 B1
8781137 Goodwin Jul 2014 B1
8798290 Choi et al. Aug 2014 B1
8880396 Laroche Nov 2014 B1
8886525 Klein Nov 2014 B2
8903721 Cowan Dec 2014 B1
8949120 Every et al. Feb 2015 B1
8949266 Phillips et al. Feb 2015 B2
9007416 Murgia et al. Apr 2015 B1
9008329 Mandel et al. Apr 2015 B1
9143857 Every et al. Sep 2015 B2
9185487 Solbach et al. Nov 2015 B2
9197974 Clark et al. Nov 2015 B1
9343056 Goodwin May 2016 B1
9431023 Avendano et al. Aug 2016 B2
20010044719 Casey Nov 2001 A1
20020002455 Accardi et al. Jan 2002 A1
20020041678 Basburg-Ertem et al. Apr 2002 A1
20020071342 Marple et al. Jun 2002 A1
20020138263 Deligne Sep 2002 A1
20020156624 Gigi Oct 2002 A1
20020160751 Sun et al. Oct 2002 A1
20020176589 Buck et al. Nov 2002 A1
20020177995 Walker Nov 2002 A1
20020194159 Kamath et al. Dec 2002 A1
20030014248 Vetter Jan 2003 A1
20030040908 Yang et al. Feb 2003 A1
20030056220 Thornton et al. Mar 2003 A1
20030063759 Brennan et al. Apr 2003 A1
20030093279 Malah et al. May 2003 A1
20030099370 Moore May 2003 A1
20030101048 Liu May 2003 A1
20030103632 Goubran et al. Jun 2003 A1
20030118200 Beaucoup et al. Jun 2003 A1
20030128851 Furuta Jul 2003 A1
20030147538 Elko Aug 2003 A1
20030169891 Ryan et al. Sep 2003 A1
20030177006 Ichikawa Sep 2003 A1
20030179888 Burnett et al. Sep 2003 A1
20030191641 Acero et al. Oct 2003 A1
20040013276 Ellis et al. Jan 2004 A1
20040066940 Amir Apr 2004 A1
20040076190 Goel et al. Apr 2004 A1
20040078199 Kremer et al. Apr 2004 A1
20040102967 Furuta et al. May 2004 A1
20040131178 Shahaf et al. Jul 2004 A1
20040145871 Lee Jul 2004 A1
20040148166 Zheng Jul 2004 A1
20040172240 Crockett Sep 2004 A1
20040184882 Cosgrove Sep 2004 A1
20040185804 Kanamori et al. Sep 2004 A1
20040263636 Cutler et al. Dec 2004 A1
20050008169 Muren et al. Jan 2005 A1
20050027520 Mattila et al. Feb 2005 A1
20050049857 Seltzer et al. Mar 2005 A1
20050066279 LeBarton et al. Mar 2005 A1
20050069162 Haykin et al. Mar 2005 A1
20050075866 Widrow Apr 2005 A1
20050080616 Leung et al. Apr 2005 A1
20050114123 Lukac et al. May 2005 A1
20050114128 Hetherington et al. May 2005 A1
20050152563 Amada et al. Jul 2005 A1
20050213739 Rodman et al. Sep 2005 A1
20050238238 Xu et al. Oct 2005 A1
20050240399 Makinen Oct 2005 A1
20050261894 Balan et al. Nov 2005 A1
20050261896 Schuijers et al. Nov 2005 A1
20050267369 Lazenby et al. Dec 2005 A1
20050276363 Joublin et al. Dec 2005 A1
20050281410 Grosvenor et al. Dec 2005 A1
20050288923 Kok Dec 2005 A1
20060053007 Niemisto Mar 2006 A1
20060058998 Yamamoto et al. Mar 2006 A1
20060063560 Herle Mar 2006 A1
20060072768 Schwartz et al. Apr 2006 A1
20060092918 Talalai May 2006 A1
20060100868 Hetherington et al. May 2006 A1
20060122832 Takiguchi Jun 2006 A1
20060136201 Landron et al. Jun 2006 A1
20060136203 Ichikawa Jun 2006 A1
20060153391 Hooley et al. Jul 2006 A1
20060165202 Thomas et al. Jul 2006 A1
20060184363 McCree et al. Aug 2006 A1
20060206320 Li Sep 2006 A1
20060224382 Taneda Oct 2006 A1
20060282263 Vos et al. Dec 2006 A1
20070003097 Langberg et al. Jan 2007 A1
20070005351 Sathyendra et al. Jan 2007 A1
20070025562 Zalewski et al. Feb 2007 A1
20070033020 (Kelleher) Francois et al. Feb 2007 A1
20070033032 Schubert et al. Feb 2007 A1
20070041589 Patel et al. Feb 2007 A1
20070055508 Zhao Mar 2007 A1
20070058822 Ozawa Mar 2007 A1
20070064817 Dunne et al. Mar 2007 A1
20070071206 Gainsboro et al. Mar 2007 A1
20070081075 Canova, Jr. et al. Apr 2007 A1
20070110263 Brox May 2007 A1
20070127668 Ahya et al. Jun 2007 A1
20070150268 Acero Jun 2007 A1
20070154031 Avendano et al. Jul 2007 A1
20070185587 Kondo Aug 2007 A1
20070195968 Jaber Aug 2007 A1
20070230712 Belt et al. Oct 2007 A1
20070230913 Ichimura Oct 2007 A1
20070237339 Konchitsky Oct 2007 A1
20070253574 Soulodre Nov 2007 A1
20070282604 Gartner et al. Dec 2007 A1
20070287490 Green et al. Dec 2007 A1
20070294263 Punj et al. Dec 2007 A1
20080019548 Avendano Jan 2008 A1
20080059163 Ding et al. Mar 2008 A1
20080069366 Soulodre Mar 2008 A1
20080111734 Fam et al. May 2008 A1
20080159507 Virolainen et al. Jul 2008 A1
20080160977 Ahmaniemi et al. Jul 2008 A1
20080170703 Zivney Jul 2008 A1
20080187143 Mak-Fan Aug 2008 A1
20080192955 Merks Aug 2008 A1
20080228474 Huang et al. Sep 2008 A1
20080233934 Diethorn Sep 2008 A1
20080247567 Kjolerbakken et al. Oct 2008 A1
20080259731 Happonen Oct 2008 A1
20080273476 Cohen et al. Nov 2008 A1
20080298571 Kurtz et al. Dec 2008 A1
20080304677 Abolfathi et al. Dec 2008 A1
20080317259 Zhang Dec 2008 A1
20080317261 Yoshida et al. Dec 2008 A1
20090012783 Klein Jan 2009 A1
20090034755 Short et al. Feb 2009 A1
20090060222 Jeong et al. Mar 2009 A1
20090063143 Schmidt et al. Mar 2009 A1
20090089054 Wang et al. Apr 2009 A1
20090116652 Kirkeby et al. May 2009 A1
20090116656 Lee et al. May 2009 A1
20090134829 Baumann et al. May 2009 A1
20090141908 Jeong et al. Jun 2009 A1
20090147942 Culter Jun 2009 A1
20090150149 Culter et al. Jun 2009 A1
20090164905 Ko Jun 2009 A1
20090177464 Gao et al. Jul 2009 A1
20090192791 El-Maleh et al. Jul 2009 A1
20090192803 Nagaraja Jul 2009 A1
20090220107 Every et al. Sep 2009 A1
20090226010 Schnell et al. Sep 2009 A1
20090228272 Herbig et al. Sep 2009 A1
20090240497 Usher et al. Sep 2009 A1
20090253418 Makinen Oct 2009 A1
20090264114 Virolainen et al. Oct 2009 A1
20090271187 Yen et al. Oct 2009 A1
20090292536 Hetherington et al. Nov 2009 A1
20090303350 Terada Dec 2009 A1
20090323655 Cardona et al. Dec 2009 A1
20090323925 Sweeney et al. Dec 2009 A1
20090323981 Cutler Dec 2009 A1
20090323982 Solbach et al. Dec 2009 A1
20100017205 Visser et al. Jan 2010 A1
20100036659 Haulick et al. Feb 2010 A1
20100082339 Konchitsky et al. Apr 2010 A1
20100092007 Sun Apr 2010 A1
20100094622 Cardillo et al. Apr 2010 A1
20100103776 Chan Apr 2010 A1
20100105447 Sibbald et al. Apr 2010 A1
20100128123 DiPoala May 2010 A1
20100130198 Kannappan et al. May 2010 A1
20100138220 Matsumoto et al. Jun 2010 A1
20100177916 Gerkmann et al. Jul 2010 A1
20100215184 Buck et al. Aug 2010 A1
20100217837 Ansari et al. Aug 2010 A1
20100245624 Beaucoup Sep 2010 A1
20100278352 Petit et al. Nov 2010 A1
20100282045 Chen et al. Nov 2010 A1
20100303298 Marks et al. Dec 2010 A1
20100315482 Rosenfeld et al. Dec 2010 A1
20110026734 Hetherington et al. Feb 2011 A1
20110038486 Beaucoup Feb 2011 A1
20110060587 Phillips et al. Mar 2011 A1
20110081024 Soulodre Apr 2011 A1
20110081026 Ramakrishnan et al. Apr 2011 A1
20110091047 Konchitsky et al. Apr 2011 A1
20110101654 Cech May 2011 A1
20110129095 Avendano et al. Jun 2011 A1
20110173006 Nagel et al. Jul 2011 A1
20110173542 Imes et al. Jul 2011 A1
20110178800 Watts Jul 2011 A1
20110182436 Murgia et al. Jul 2011 A1
20110224994 Norvell et al. Sep 2011 A1
20110261150 Goyal et al. Oct 2011 A1
20110280154 Silverstrim et al. Nov 2011 A1
20110286605 Furuta et al. Nov 2011 A1
20110300806 Lindahl et al. Dec 2011 A1
20110305345 Bouchard et al. Dec 2011 A1
20120010881 Avendano et al. Jan 2012 A1
20120027217 Jun et al. Feb 2012 A1
20120027218 Every et al. Feb 2012 A1
20120050582 Seshadri et al. Mar 2012 A1
20120062729 Hart et al. Mar 2012 A1
20120063609 Triki et al. Mar 2012 A1
20120087514 Williams et al. Apr 2012 A1
20120093341 Kim et al. Apr 2012 A1
20120116758 Murgia et al. May 2012 A1
20120116769 Malah et al. May 2012 A1
20120133728 Lee May 2012 A1
20120143363 Liu et al. Jun 2012 A1
20120179461 Every et al. Jul 2012 A1
20120179462 Klein Jul 2012 A1
20120182429 Forutanpour et al. Jul 2012 A1
20120197898 Pandey et al. Aug 2012 A1
20120202485 Mirbaha et al. Aug 2012 A1
20120220347 Davidson Aug 2012 A1
20120231778 Chen et al. Sep 2012 A1
20120249785 Sudo et al. Oct 2012 A1
20120250882 Mohammad et al. Oct 2012 A1
20130011111 Abraham et al. Jan 2013 A1
20130024190 Fairey Jan 2013 A1
20130034243 Yermeche et al. Feb 2013 A1
20130051543 McDysan et al. Feb 2013 A1
20130182857 Namba et al. Jul 2013 A1
20130196715 Hansson et al. Aug 2013 A1
20130231925 Avendano et al. Sep 2013 A1
20130251170 Every et al. Sep 2013 A1
20130268280 Del Galdo et al. Oct 2013 A1
20130332171 Avendano et al. Dec 2013 A1
20140039888 Taubman et al. Feb 2014 A1
20140098964 Rosca et al. Apr 2014 A1
20140108020 Sharma et al. Apr 2014 A1
20140112496 Murgia et al. Apr 2014 A1
20140142958 Sharma et al. May 2014 A1
20140241702 Solbach et al. Aug 2014 A1
20140337016 Herbig et al. Nov 2014 A1
20150030163 Sokolov Jan 2015 A1
20150100311 Kar et al. Apr 2015 A1
20160027451 Solbach et al. Jan 2016 A1
20160063997 Nemala et al. Mar 2016 A1
20160066089 Klein Mar 2016 A1
Foreign Referenced Citations (63)
Number Date Country
0756437 Jan 1997 EP
1232496 Aug 2002 EP
1536660 Jun 2005 EP
20100431 Dec 2010 FI
20125812 Oct 2012 FI
20135038 Apr 2013 FI
124716 Dec 2014 FI
H0553587 Mar 1993 JP
H07248793 Sep 1995 JP
H05300419 Dec 1995 JP
2001159899 Jun 2001 JP
2002366200 Dec 2002 JP
2002542689 Dec 2002 JP
2003514473 Apr 2003 JP
2003271191 Sep 2003 JP
2004187283 Jul 2004 JP
2006094522 Apr 2006 JP
2006515490 May 2006 JP
2006337415 Dec 2006 JP
2007006525 Jan 2007 JP
2008015443 Jan 2008 JP
2008135933 Jun 2008 JP
2008542798 Nov 2008 JP
2009037042 Feb 2009 JP
2010532879 Oct 2010 JP
2011527025 Oct 2011 JP
H07336793 Dec 2011 JP
2013517531 May 2013 JP
2013534651 Sep 2013 JP
5762956 Jun 2015 JP
1020100041741 Apr 2010 KR
1020110038024 Apr 2011 KR
1020120116442 Oct 2012 KR
1020130117750 Oct 2013 KR
101461141 Nov 2014 KR
101610656 Apr 2016 KR
526468 Apr 2003 TW
I279776 Apr 2007 TW
200910793 Mar 2009 TW
201009817 Mar 2010 TW
201143475 Dec 2011 TW
201214418 Apr 2012 TW
I463817 Dec 2014 TW
I488179 Jun 2015 TW
WO8400634 Feb 1984 WO
WO0137265 May 2001 WO
WO0156328 Aug 2001 WO
WO2006027707 Mar 2006 WO
WO2007001068 Jan 2007 WO
WO2007049644 May 2007 WO
WO2008034221 Mar 2008 WO
WO2008101198 Aug 2008 WO
WO2009008998 Jan 2009 WO
WO2010005493 Jan 2010 WO
WO2011068901 Jun 2011 WO
WO2011091068 Jul 2011 WO
WO2011129725 Oct 2011 WO
WO2012009047 Jan 2012 WO
WO2012097016 Jul 2012 WO
WO2013188562 Dec 2013 WO
WO2014063099 Apr 2014 WO
WO2014131054 Aug 2014 WO
WO2016033364 Mar 2016 WO
Non-Patent Literature Citations (125)
Entry
Non-Final Office Action, Aug. 18, 2010, U.S. Appl. No. 11/825,563, filed Jul. 6, 2007.
Final Office Action, Apr. 28, 2011, U.S. Appl. No. 11/825,563, filed Jul. 6, 2007.
Non-Final Office Action, Apr. 24, 2013, U.S. Appl. No. 11/825,563, filed Jul. 6, 2007.
Final Office Action, Dec. 30, 2013, U.S. Appl. No. 11/825,563, filed Jul. 6, 2007.
Notice of Allowance, Mar. 25, 2014, U.S. Appl. No. 11/825,563, filed Jul. 6, 2007.
Non-Final Office Action, Sep. 14, 2011, U.S. Appl. No. 12/004,897, filed Dec. 21, 2007.
Notice of Allowance, Jan. 27, 2012, U.S. Appl. No. 12/004,897, filed Dec. 21, 2007.
Non-Final Office Action, Jul. 28, 2011, U.S. Appl. No. 12/072,931, filed Feb. 29, 2008.
Notice of Allowance, Mar. 1, 2012, U.S. Appl. No. 12/072,931, filed Feb. 29, 2008.
Notice of Allowance, Mar. 1, 2012, U.S. Appl. No. 12/080,115, filed Mar. 31, 2008.
Non-Final Office Action, Nov. 14, 2011, U.S. Appl. No. 12/215,980, filed Jun. 30, 2008.
Final Office Action, Apr. 24, 2012, U.S. Appl. No. 12/215,980, filed Jun. 30, 2008.
Advisory Action, Jul. 3, 2012, U.S. Appl. No. 12/215,980, filed Jun. 30, 2008.
Non-Final Office Action, Mar. 11, 2014, U.S. Appl. No. 12/215,980, filed Jun. 30, 2008.
Final Office Action, Jul. 11, 2014, U.S. Appl. No. 12/215,980, filed Jun. 30, 2008.
Non-Final Office Action, Dec. 8, 2014, U.S. Appl. No. 12/215,980, filed Jun. 30, 2008.
Notice of Allowance, Jul. 7, 2015, U.S. Appl. No. 12/215,980, filed Jun. 30, 2008.
Non-Final Office Action, Sep. 1, 2011, U.S. Appl. No. 12/286,909, filed Oct. 2, 2008.
Notice of Allowance, Feb. 28, 2012, U.S. Appl. No. 12/286,909, filed Oct. 2, 2008.
Non-Final Office Action, Nov. 15, 2011, U.S. Appl. No. 12/286,995, filed Oct. 2, 2008.
Final Office Action, Apr. 10, 2012, U.S. Appl. No. 12/286,995, filed Oct. 2, 2008.
Notice of Allowance, Mar. 13, 2014, U.S. Appl. No. 12/286,995, filed Oct. 2, 2008.
Non-Final Office Action, Aug. 1, 2012, U.S. Appl. No. 12/860,043, filed Aug. 20, 2010.
Notice of Allowance, Jan. 18, 2013, U.S. Appl. No. 12/860,043, filed Aug. 22, 2010.
Non-Final Office Action, Aug. 17, 2012, U.S. Appl. No. 12/868,622, filed Aug. 25, 2010.
Final Office Action, Feb. 22, 2013, U.S. Appl. No. 12/868,622, filed Aug. 25, 2010.
Advisory Action, May 14, 2013, U.S. Appl. No. 12/868,622, filed Aug. 25, 2010.
Notice of Allowance, May 1, 2014, U.S. Appl. No. 12/868,622, filed Aug. 25, 2010.
Non-Final Office Action, Feb. 19, 2013, U.S. Appl. No. 12/944,659, filed Nov. 11, 2010.
Final Office Action, Jan. 12, 2016, U.S. Appl. No. 12/959,994, filed Dec. 3, 2010.
Notice of Allowance, May 25, 2011, U.S. Appl. No. 13/016,916, filed Jan. 28, 2011.
Notice of Allowance, Aug. 4, 2011, U.S. Appl. No. 13/016,916, filed Jan. 28, 2011.
Notice of Allowance, Oct. 3, 2013, U.S. Appl. No. 13/157,238, filed Jun. 9, 2011.
Non-Final Office Action, Nov. 2013, U.S. Appl. No. 13/363,362, filed Jan. 31, 2012.
Final Office Action, Sep. 12, 2014, U.S. Appl. No. 13/363,362, filed Jan. 31, 2012.
Non-Final Office Action, Oct. 28, 2015, U.S. Appl. No. 13/363,362, filed Jan. 31, 2012.
Non-Final Office Action, Dec. 4, 2013, U.S. Appl. No. 13/396,568, filed Feb. 14, 2012.
Final Office Action, Sep. 23, 2014, U.S. Appl. No. 13/396,568, filed Feb. 14, 2012.
Non-Final Office Action, Nov. 5, 2015, U.S. Appl. No. 13/396,568, filed Feb. 14, 2012.
Non-Final Office Action, May 11, 2012, U.S. Appl. No. 13/424,189, filed Mar. 19, 2012.
Final Office Action, Sep. 4, 2012, U.S. Appl. No. 13/424,189, filed Mar. 19, 2012.
Final Office Action, Nov. 28, 2012, U.S. Appl. No. 13/424,189, filed Mar. 19, 2012.
Notice of Allowance, Mar. 7, 2013, U.S. Appl. No. 13/424,189, filed Mar. 19, 2012.
Non-Final Office Action, Jun. 7, 2012, U.S. Appl. No. 13/426,436, filed Mar. 21, 2012.
Final Office Action, Dec. 31, 2012, U.S. Appl. No. 13/426,436, filed Mar. 21, 2012.
Non-Final Office Action, Sep. 12, 2013, U.S. Appl. No. 13/426,436, filed Mar. 21, 2012.
Notice of Allowance, Jul. 16, 2014, U.S. Appl. No. 13/426,436, filed Mar. 21, 2012.
Non-Final Office Action, Nov. 7, 2012, U.S. Appl. No. 13/492,780, filed Jun. 8, 2012.
Non-Final Office Action, May 8, 2013, U.S. Appl. No. 13/492,780, filed Jun. 8, 2012.
Final Office Action, Oct. 23, 2013, U.S. Appl. No. 13/492,780, filed Jun. 8, 2012.
Notice of Allowance, Nov. 24, 2014, U.S. Appl. No. 13/492,780, filed Jun. 8, 2012.
Non-Final Office Action, May 23, 2014, U.S. Appl. No. 13/859,186, filed Apr. 9, 2013.
Final Office Action, Dec. 3, 2014, U.S. Appl. No. 13/859,186, filed Apr. 9, 2013.
Non-Final Office Action, Jul. 7, 2015, U.S. Appl. No. 13/859,186, filed Apr. 9, 2013.
Final Office Action, Feb. 2, 2016, U.S. Appl. No. 13/859,186, filed Apr. 9, 2013.
Notice of Allowance, Apr. 28, 2016, U.S. Appl. No. 13/859,186, filed Apr. 9, 2013.
Non-Final Office Action, Apr. 17, 2015, U.S. Appl. No. 13/888,796, filed May 7, 2013.
Non-Final Office Action, Jul. 14, 2015, U.S. Appl. No. 14/046,551, filed Oct. 4, 2013.
Notice of Allowance, May 20, 2015, U.S. Appl. No. 13/888,796, filed May 7, 2013.
Non-Final Office Action, Apr. 19, 2016, U.S. Appl. No. 14/046,551, filed Oct. 4, 2013.
Non-Final Office Action, May 21, 2015, U.S. Appl. No. 14/189,817, filed Feb. 25, 2014.
Final Office Action, Dec. 15, 2015, U.S. Appl. No. 14/189,817, filed Feb. 25, 2014.
Non-Final Office Action, Jul. 15, 2015, U.S. Appl. No. 14/058,059, filed Oct. 18, 2013.
Non-Final Office Action, Jun. 26, 2015, U.S. Appl. No. 14/262,489, filed Apr. 25, 2014.
Notice of Allowance, Jan. 28, 2016, U.S. Appl. No. 14/313,883, filed Jun. 24, 2014.
Non-Final Office Action, Jun. 26, 2015, U.S. Appl. No. 14/626,489, filed Apr. 25, 2014.
Non-Final Office Action, Jun. 10, 2015, U.S. Appl. No. 14/628,109, filed Feb. 20, 2015.
Final Office Action, Mar. 16, 2016, U.S. Appl. No. 14/628,109, filed Feb. 20, 2015.
Non-Final Office Action, Apr. 8, 2016, U.S. Appl. No. 14/838,133, filed Aug. 27, 2015.
Dahl, Mattias et al., “Simultaneous Echo Cancellation and Car Noise Suppression Employing a Microphone Array”, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing, Apr. 21-24, pp. 239-242.
Graupe, Daniel et al., “Blind Adaptive Filtering of Speech from Noise of Unknown Spectrum Using a Virtual Feedback Configuration”, IEEE Transactions on Speech and Audio Processing, Mar. 2000, vol. 8, No. 2, pp. 146-158.
Kato et al., “Noise Suppression with High Speech Quality Based on Weighted Noise Estimation and MMSE STSA,” Proc. IWAENC [Online] 2001, pp. 183-186.
Soon et al., “Low Distortion Speech Enhancement”, Proc. Inst. Elect. Eng. [Online] 2000, vol. 147, pp. 247-253.
Stahl, V. et al., “Quantile Based Noise Estimation for Spectral Subtraction and Wiener Filtering,” 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing, Jun. 5-9, vol. 3, pp. 1875-1878.
Tchorz, Jurgen et al., “SNR Estimation Based on Amplitude Modulation Analysis with Applications to Noise Suppression”, IEEE Transactions on Speech and Audio Processing, vol. 11, No. 3, May 2003, pp. 184-192.
Yoo, Heejong et al., “Continuous-Time Audio Noise Suppression and Real-Time Implementation”, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing, May 13-17, pp. IV3980-IV3983.
International Search Report and Written Opinion dated Oct. 1, 2008 in Patent Cooperation Treaty Application No. PCT/US2008/008249.
International Search Report and Written Opinion dated Aug. 27, 2009 in Patent Cooperation Treaty Application No. PCT/US2009/003813.
Dahl, Mattias et al., “Acoustic Echo and Noise Cancelling Using Microphone Arrays”, International Symposium on Signal Processing and its Applications, ISSPA, Gold Coast, Australia, Aug. 25-30, 1996, pp. 379-382.
International Search Report and Written Opinion dated Sep. 1, 2011 in Patent Cooperation Treaty Application No. PCT/US11/37250.
Fazel et al., “An overview of statistical pattern recognition techniques for speaker verification,” IEEE, May 2011.
Sundaram et al., “Discriminating Two Types of Noise Sources Using Cortical Representation and Dimension Reduction Technique,” IEEE, 2007.
Bach et al., “Learning Spectral Clustering with Application to Speech Separation,” Journal of Machine Learning Research, 2006.
Tognieri et al., “A Comparison of the LBG, LVQ, MLP, SOM and GMM Algorithms for Vector Quantisation and Clustering Analysis,” University of Western Australia, 1992.
Klautau et al., “Discriminative Gaussian Mixture Models a Comparison with Kernel Classifiers,” ICML, 2003.
Mokbel et al., “Automatic Word Recognition in Cars,” IEEE Transactions on Speech and Audio Processing, vol. 3, No. 5, Sep. 1995, pp. 346-356.
Office Action mailed Oct. 14, 2013 in Taiwan Patent Application 097125481, filed Jul. 4, 2008.
Office Action mailed Oct. 29, 2013 in Japan Patent Application 2011-516313, filed Jun. 26, 2009.
Office Action mailed Dec. 9, 2013 in Finland Patent Application 20100431, filed Jun. 26, 2009.
Office Action mailed Jan. 20, 2014 in Finland Patent Application 20100001, filed Jul. 3, 2008.
International Search Report & Written Opinion dated Mar. 18, 2014 in Patent Cooperation Treaty Application No. PCT/US2013/065752, filed Oct. 18, 2013.
Office Action mailed Oct. 17, 2013 in Taiwan Patent Application 097125481, filed Jul. 4, 2008.
Allowance mailed May 21, 2014 in Finland Patent Application 20100001, filed Jan. 4, 2010.
Office Action mailed May 2, 2014 in Taiwan Patent Application 098121933, filed Jun. 29, 2009.
Office Action mailed Apr. 15, 2014 in Japan Patent Application 2010-514871, filed Jul. 3, 2008.
Office Action mailed Jun. 27, 2014 in Korean Patent Application No. 10-2010-7000194, filed Jan. 6, 2010.
International Search Report & Written Opinion dated Jul. 15, 2014 in Patent Cooperation Treaty Application No. PCT/US2014/018443, filed Feb. 25, 2014.
Notice of Allowance dated Sep. 16, 2014 in Korean Application No. 10-2010-7000194, filed Jul. 3, 2008.
Notice of Allowance dated Sep. 29, 2014 in Taiwan Application No. 097125481, filed Jul. 4, 2008.
Notice of Allowance dated Oct. 10, 2014 in Finland Application No. 20100001, filed Jul. 3, 2008.
Notice of Allowance mailed Feb. 10, 2015 in Taiwan Patent Application No. 098121933, filed Jun. 29, 2009.
Office Action mailed Mar. 24, 2015 in Japan Patent Application No. 2011-516313, filed Jun. 26, 2009.
Office Action mailed Apr. 16, 2015 in Korean Patent Application No. 10-2011-7000440, filed Jun. 26, 2009.
Notice of Allowance mailed Jun. 2, 2015 in Japan Patent Application 2011-516313, filed Jun. 26, 2009.
Kim et al., “Improving Speech Intelligibility in Noise Using Environment-Optimized Algorithms,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, No. 8, Nov. 2010, pp. 2080-2090.
Sharma et al., “Rotational Linear Discriminant Analysis Technique for Dimensionality Reduction,” IEEE Transactions on Knowledge and Data Engineering, vol. 20, No. 10, Oct. 2008, pp. 1336-1347.
Temko et al., “Classification of Acoustic Events Using SVM-Based Clustering Schemes,” Pattern Recognition 39, No. 4, 2006, pp. 682-694.
Office Action mailed Jun. 9, 2015 in Japan Patent Application 2014-165477 filed Jul. 3, 2008.
Office Action mailed Jun. 17, 2015 in Japan Patent Application 2013-519682 filed May 19, 2011.
International Search Report & Written Opinion dated Nov. 27, 2015 in Patent Cooperation Treaty Application No. PCT/US2015/047263, filed Aug. 27, 2015.
Notice of Allowance dated Feb. 24, 2016 in Korean Application No. 10-2011-7000440, filed Jun. 26, 2009.
Hu et al., “Robust Speaker's Location Detection in a Vehicle Environment Using GMM Models,” IEEE Transactions on Systems, Man, and Cybernetics—Part B: Cybernetics, vol. 36, No. 2, Apr. 2006, pp. 403-412.
International Search Report and Written Opinion dated Feb. 7, 2011 in Application No. PCT/US10/58600.
International Search Report dated Dec. 20, 2013 in Patent Cooperation Treaty Application No. PCT/US2013/045462, filed Jun. 12, 2013.
Office Action dated Aug. 26, 2014 in Japanese Application No. 2012-542167, filed Dec. 1, 2010.
Office Action mailed Oct. 31, 2014 in Finnish Patent Application No. 20125600, filed Jun. 1, 2012.
Office Action mailed Jul. 21, 2015 in Japanese Patent Application 2012-542167 filed Dec. 1, 2010.
Office Action mailed Sep. 29, 2015 in Finnish Patent Application 20125600, filed Dec. 1, 2010.
Goodwin, Michael M. et al., “Key Click Suppression”, U.S. Appl. No. 14/745,176, filed Jun. 19, 2015, 25 pages.
Final Office Action, May 5, 2016, U.S. Appl. No. 13/363,362, filed Jan. 31, 2012.
Non-Final Office Action, May 6, 2016, U.S. Appl. No. 14/495,550, filed Sep. 24, 2014.
Non-Final Office Action, May 31, 2016, U.S. Appl. No. 14/874,329, filed Oct. 2, 2015.
Final Office Action, Jun. 17, 2016, U.S. Appl. No. 13/396,568, filed Feb. 14, 2012.
Advisory Action, Jul. 29, 2016, U.S. Appl. No. 13/363,362, filed Jan. 31, 2012.
Final Office Action, Aug. 30, 2016, U.S. Appl. No. 14/838,133, filed Aug. 27, 2015.
Provisional Applications (1)
Number Date Country
61346851 May 2010 US