The present invention relates to noise reduction in speech signal processing for automatic speech recognition and hands free speech communication.
In speech signal processing for automatic speech recognition (ASR) and hands free speech communication, a microphone signal is usually first segmented into overlapping blocks of appropriate size and a window function is applied. The speech signal processing can be performed in the time domain and/or in the frequency domain. Time domain processing operates directly on speech waveform signals while frequency domain processing operates on spectral representations of the speech signal.
Operations in the frequency domain are carried out by means of a short-term Fourier transform (STFT). In this process, the sequence of sampled amplitude values x, indexed by the sample index i, is multiplied by a window sequence w at every Rth sample and then transformed to the frequency domain by a discrete Fourier transform. This step is called analysis, and its realization is often referred to as an analysis filter bank:

$$X(k,\mu) = \sum_{i=0}^{L-1} x(i + kR)\, w(i)\, e^{-j 2\pi \mu i / N}$$

for each sample i, frame k, frequency bin μ, frame shift R, window length L, and DFT size N. After processing in the frequency domain, the resulting spectrum is transformed back to the time domain by an inverse STFT. In analogy to the previous step, this one is called synthesis, and its implementation is referred to as the synthesis filter bank.
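The analysis step can be illustrated with a minimal sketch. It assumes a periodic Hann window and a direct (non-FFT) DFT for readability; the function names `hann_window` and `stft_analysis` and the test signal are hypothetical and not taken from the described embodiments.

```python
import cmath
import math


def hann_window(length):
    """Periodic Hann window (the window shape is an assumption)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / length) for i in range(length)]


def stft_analysis(x, window, frame_shift, dft_size):
    """Return spectra[k][mu] for every complete frame k of the signal x."""
    length = len(window)
    spectra = []
    k = 0
    while k * frame_shift + length <= len(x):
        start = k * frame_shift
        # Window the block that starts at sample k * R.
        frame = [x[start + i] * window[i] for i in range(length)]
        spectrum = []
        for mu in range(dft_size):
            acc = 0j
            for i, value in enumerate(frame):
                acc += value * cmath.exp(-2j * math.pi * mu * i / dft_size)
            spectrum.append(acc)
        spectra.append(spectrum)
        k += 1
    return spectra


# Usage: a cosine centred on bin 2 of a 16-point DFT.
N = 16
signal = [math.cos(2.0 * math.pi * 2 * i / N) for i in range(64)]
spectra = stft_analysis(signal, hann_window(N), frame_shift=4, dft_size=N)
magnitudes = [abs(v) for v in spectra[0][: N // 2 + 1]]
peak_bin = magnitudes.index(max(magnitudes))
```

A practical implementation would normally replace the inner loops with an FFT; the direct form is used here only to keep the roles of R, L, and N visible.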
Frequency domain processing produces noisy short-term spectra signals. In order to reduce the undesirable noise components while keeping the speech signal as natural as possible, SNR-dependent (SNR: signal-to-noise ratio) weighting coefficients are computed and applied to the spectra signals. Common noise reduction algorithms make assumptions about the type of noise present in a noisy signal. The Wiener filter, for example, introduces the mean squared error (MSE) cost function as an objective distance measure to minimize the distance between the desired and the filtered signal. The MSE, however, does not account for human perception of signal quality. Also, filtering algorithms are usually applied to each of the frequency bins independently. Thus, all types of signals are treated equally. This allows for good noise reduction performance under many different circumstances.
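As a rough illustration of bin-wise SNR-dependent weighting, the following sketch computes a Wiener-type gain per frequency bin and limits it from below by a spectral floor. The particular gain formula, the floor value, and the name `wiener_gains` are assumptions chosen for illustration, not a statement of the suppression rule used in any embodiment.

```python
def wiener_gains(noisy_power, noise_power, spectral_floor=0.1):
    """Per-bin gain max(floor, 1 - noise/noisy), computed independently per bin."""
    gains = []
    for p_x, p_d in zip(noisy_power, noise_power):
        if p_x <= 0.0:
            # No input power in this bin: fall back to maximum attenuation.
            gains.append(spectral_floor)
            continue
        gains.append(max(spectral_floor, 1.0 - p_d / p_x))
    return gains


# Usage: three bins with high, zero, and very high SNR.
gains = wiener_gains([4.0, 1.0, 10.0], [1.0, 1.0, 0.0])
```

Bins whose estimated noise power approaches the input power receive the floor value, which corresponds to the maximum attenuation mentioned in the text.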
However, mobile communication situations in an automobile environment are special in that they contain speech as their desired signal. The noise present while driving is mainly characterized by increasing noise levels with lower frequency. Speech signal processing starts with an input audio signal from a speech-sensing microphone. The microphone signal represents a composite of multiple different sound sources. Except for the speech component, all of the other sound source components in the microphone signal act as undesirable noise that complicates the processing of the speech component.
Separating the desired speech component from the noise components has been especially difficult in moderate to high noise settings, for example within the cabin of an automobile traveling at highway speeds, when multiple persons are speaking simultaneously, or in the presence of audio content. Often in high noise conditions, only a low output quality is achieved. Usually, speech signal components are distorted to such an extent that desired speech components are masked by the background noise. Standard noise suppression rules classify these parts as noise; as a consequence, a maximum attenuation is applied.
Embodiments of the present invention are directed to an arrangement for speech signal processing for automatic speech recognition and hands free speech communication. The processing may be accomplished on a speech signal prior to speech recognition, for example, with mobile telephony signals and more specifically in automotive environments that are noisy.
A signal pre-processor module transforms an input microphone signal into corresponding speech component signals. A noise suppression module applies noise reduction to the speech component signals to generate noise reduced speech component signals. A speech reconstruction module produces corresponding synthesized speech component signals for distorted speech component signals. A signal combination block adaptively combines the noise reduced speech component signals and the synthesized speech component signals based on signal to noise conditions to generate enhanced speech component signals for automatic speech recognition and hands free speech communication.
In further specific embodiments, the speech component signals may be time-domain speech component signals or frequency-domain speech component signals. The speech reconstruction module may use a non-linear function of speech components for producing the synthesized speech component signals. For example, distorted or missing harmonics of the speech component signals can be restored by the non-linear function.
The speech reconstruction module may use relevant features extracted from the distorted speech component signals for producing the synthesized speech component signals. The speech reconstruction module may use a source-filter model for producing the synthesized speech component signals. The speech reconstruction module may use a parametric model of envelope shape for producing the synthesized speech component signals. The speech reconstruction module may include a noise pre-processing module for noise suppression of the distorted speech component signals to provide speech synthesis reference signals. The speech reconstruction module may divide the distorted speech component signals into different frequency bands for producing the synthesized speech component signals.
Embodiments of the present invention are directed to enhancing highly distorted speech using a computationally efficient partial speech signal reconstruction algorithm. The partial signal reconstruction exploits the correlation of the speech components and replaces highly disturbed speech component signals with a corresponding synthesized speech component signal. The synthesized speech component signal is based on speech components with sufficient signal-to-noise ratio (SNR) using a nonlinear operator (nonlinear characteristic) as well as relevant features extracted from the noisy speech signal. The synthesized speech component signals are combined with the noise reduced speech component signals to generate an enhanced speech component signal output. Specific embodiments may operate on speech component signals in either the time domain or in the frequency domain. The following discussion is specific to operation in the frequency domain, but those in the field will appreciate that the principle of the invention is equally applicable to signal processing of time domain speech component signals.
Looking in greater detail at the speech reconstruction module 102, a noise pre-processing module 107 utilizes a noise suppression algorithm to attenuate undesired signal components. Typically, high noise attenuation (e.g., about 30 dB) is applied by the pre-processing module 107 to generate a greatly noise-reduced reference signal ŜPP for properly regenerating distorted speech components.
Non-linear operator module 108 regenerates distorted or missing harmonics of the noise-reduced reference signal ŜNR utilizing a nonlinear function that applies a nonlinear characteristic to a harmonic signal to produce a non-linear speech signal Ŝnl that has sub- and super-harmonics. Normally a nonlinear characteristic can only be applied efficiently (in terms of computational complexity) to speech signals in the time domain, and using a nonlinear function for speech signal processing in the frequency domain comes at a much higher computational cost. So for efficiency reasons, in the frequency domain the non-linear operator module 108 may apply the non-linear operation using auto-convolution of a subset of the current sub-band signals. For example, the non-linear operator module 108 may divide the short-term spectrum into different frequency ranges (e.g., into 1 kHz bands from 100 Hz up to 4 kHz) and, depending on the measured variability within each band, an upper frequency limit for the convolution can be adaptively determined. In embodiments operating on time domain speech component signals, the non-linear operator module 108 may apply a non-linear operation based on, for example, a quadratic characteristic or a half-wave rectification of the speech component signals.
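Squaring a signal in the time domain corresponds, up to scaling and circular wrap-around, to convolving its spectrum with itself, which is why auto-convolution can stand in for a quadratic characteristic in the frequency domain. The sketch below is a simplified illustration that convolves only the one-sided bins up to an assumed upper limit, so it produces sum frequencies (super-harmonics) only; the name `auto_convolve` and the fixed `upper_bin` are hypothetical, and the adaptive choice of the limit is not modelled.

```python
def auto_convolve(spectrum, upper_bin):
    """Convolve bins 0..upper_bin of a one-sided spectrum with themselves."""
    part = spectrum[: upper_bin + 1]
    out = [0j] * (2 * len(part) - 1)
    for a, value_a in enumerate(part):
        for b, value_b in enumerate(part):
            # Each pair of bins contributes at the sum of its indices.
            out[a + b] += value_a * value_b
    return out


# Usage: harmonics present at bins 2 and 4 only.
spectrum = [0, 0, 1, 0, 1, 0, 0]
result = auto_convolve(spectrum, upper_bin=4)
result_magnitudes = [abs(v) for v in result]
```

In this toy case the output carries energy at bins 4, 6, and 8, so harmonics that were absent from the input (bins 6 and 8) are regenerated. Restricting the convolution to a subset of bins keeps the cost quadratic in the subset size rather than in the full spectrum length.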
An excitation signal generator 109 receives the non-linear speech signal Ŝnl and for voiced signals generates a synthetic excitation signal S̃exc having harmonics at the desired frequencies but with biased amplitudes. In order to reduce the bias, the excitation signal generator 109 may divide the incoming non-linear speech signal Ŝnl by its envelope.
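A separate envelope estimation module 110 estimates the frequency-domain envelope of the original speech signal.
Synthesis combination module 111 uses a source-filter model to combine the estimated envelope with the synthetic excitation signal to produce the synthesized spectra signals Ŝest.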
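The source-filter idea can be sketched in two steps: the non-linear signal is divided by its own envelope to obtain a flattened excitation, and the separately estimated speech envelope is then imprinted on that excitation. The function names, the numerical values, and the assumption that both envelopes are available as real-valued magnitudes per bin are illustrative only.

```python
def flatten_by_envelope(spectrum, envelope, eps=1e-12):
    """Divide each bin by its envelope value to reduce the amplitude bias."""
    return [s / max(e, eps) for s, e in zip(spectrum, envelope)]


def apply_envelope(excitation, envelope):
    """Source-filter combination: scale the excitation by the envelope."""
    return [x * e for x, e in zip(excitation, envelope)]


# Usage with hypothetical four-bin spectra.
nonlinear = [4 + 0j, 0.5j, 2 + 0j, 0.1 + 0j]
nonlinear_envelope = [4.0, 1.0, 2.0, 0.5]   # assumed envelope of the non-linear signal
speech_envelope = [1.0, 0.8, 0.6, 0.4]      # assumed estimate of the speech envelope
excitation = flatten_by_envelope(nonlinear, nonlinear_envelope)
synthesized = apply_envelope(excitation, speech_envelope)
```

The phase of each bin is untouched by both steps, since only real-valued scaling is applied.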
The adaptive mixer 106 adaptively combines the synthesized spectra signals Ŝest with the noise reduced spectra signals ŜNR to produce enhanced frequency domain speech signals Ŝenhanced. At low SNRs, highly distorted noise reduced spectra signals ŜNR are replaced by the synthesized spectra signals Ŝest. At more moderate SNR conditions (i.e. medium and high SNRs), only the magnitudes of the spectra signals Ŝest and of the noise reduced spectra signals ŜNR are adaptively combined. The adaptive mixer 106 afterwards combines the resulting magnitude estimate with the phase of the noise reduced spectra signals ŜNR. Thus speech reconstruction is performed just for degraded voiced speech components. The synthesized phase is only applied for highly degraded voiced speech components.
A voice activity detector can be used to detect the presence of speech segments, and a voiced speech detector can be used to detect voiced speech components. To detect corrupted voiced speech components, the adaptive mixer 106 can perform a comparison at each frequency bin between the synthesized spectra signals Ŝest and the noise reduced spectra signals ŜNR. For degraded speech components (highly degraded harmonics), the amplitudes of the synthesized spectra signals Ŝest at the pitch frequency and at its harmonics are always greater than those of the noise reduced spectra signals ŜNR, because the noise reduction filter applies a maximum attenuation to degraded harmonics. Thus the speech reconstruction is only performed at those frequency bins where the synthesized spectra signals Ŝest are greater than the noise reduced spectra signals ŜNR by a specified margin.
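The per-bin decision logic of the adaptive mixer can be sketched as follows. The multiplicative form of the margin, the SNR threshold, the fixed mixing weight, and the name `mix_bin` are all assumptions made for illustration; the text only states that a margin is used and that the combination depends on the SNR conditions.

```python
import cmath


def mix_bin(s_nr, s_est, snr_db, margin=2.0, low_snr_db=0.0, weight=0.5):
    """Combine one noise-reduced bin with one synthesized bin."""
    mag_nr, mag_est = abs(s_nr), abs(s_est)
    if mag_est <= margin * mag_nr:
        # Not flagged as degraded: keep the noise-reduced bin unchanged.
        return s_nr
    if snr_db < low_snr_db:
        # Highly degraded: use synthesized magnitude and phase.
        return s_est
    # Moderate SNR: mix magnitudes, keep the noise-reduced phase.
    magnitude = weight * mag_est + (1.0 - weight) * mag_nr
    return cmath.rect(magnitude, cmath.phase(s_nr))


# Usage on three hypothetical bins.
kept = mix_bin(1 + 0j, 1.5 + 0j, snr_db=10.0)
replaced = mix_bin(0.1j, 2 + 0j, snr_db=-5.0)
mixed = mix_bin(1j, 3 + 0j, snr_db=10.0)
```

In the third case the result has magnitude 2 and the phase of the noise-reduced bin, so the synthesized phase is not used outside the low-SNR branch.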
Looking back at the envelope estimation more formally, the task of the envelope estimation module 110 is to estimate the smoothed spectrum of the clean speech signal as closely as possible. A decision is made as to how to separate the contributions of the voiced excitation and the vocal tract to the clean speech spectrum so as to supply comparable results. A time-smoothed version Φ̂D̂ …
where the discrete-time smoothing constant γ …
where R is the frame shift, τ …
The smoothed amplitude spectrum of the microphone or of the noise reduced signal is a good estimator of the envelope at high SNR conditions. But at low SNR levels, reliable estimation no longer can be achieved, especially at low frequencies if speech formants are masked by the background noise. Hence the envelope estimation module 110 can develop the envelope estimation using a parametric model for the shape of the envelope
For off-line parameter optimization based on a training data set of clean speech, the same definition can be used for the known signal's amplitude spectrum Φ̂S:
The estimation task is the minimization of the sum of absolute differences in each relevant frame.
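A minimal sketch of this minimization is given below, assuming that the model and the clean reference are both available as per-bin envelope values and that the parameters are searched over a small candidate grid. The helper names, the toy line model, and the exhaustive search strategy are hypothetical.

```python
def abs_difference_cost(model_env, clean_env, bins):
    """Sum of absolute differences over the relevant frequency bins."""
    return sum(abs(model_env[mu] - clean_env[mu]) for mu in bins)


def grid_search(candidates, build_model, clean_env, bins):
    """Return (cost, params) of the candidate with the lowest cost."""
    best = None
    for params in candidates:
        cost = abs_difference_cost(build_model(params), clean_env, bins)
        if best is None or cost < best[0]:
            best = (cost, params)
    return best


# Usage: find the slope of a straight-line model that matches a toy envelope.
clean_env = [0.0, 1.0, 2.0, 3.0]
best_cost, best_slope = grid_search(
    candidates=[0.5, 1.0, 1.5],
    build_model=lambda slope: [slope * mu for mu in range(4)],
    clean_env=clean_env,
    bins=range(4),
)
```

Because this search runs off-line on training data, its cost does not affect the run-time complexity of the estimator.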
On modern computers, unsupervised vector quantization methods such as a codebook approach are mainly used to reduce the dimension of a feature space in order to save memory space and computation power. But codebook based estimation is still in common use in embedded systems, where these resources are limited. In the training step, codebooks store information on prototype feature vectors extracted from a database. In the testing step, incomplete feature vectors can be matched against all codebook entries by use of a cost or distance function. The codebook entry that best matches the incomplete vector is used to fill in missing features. In the task at hand, missing features are frequency bins where the corresponding filter coefficients lie below a certain threshold, thus indicating a low SNR. In these bins, the estimated amplitude value is dominated by the noise power, and the speech power is bound to be much lower. A codebook can be implemented using the well-known Linde-Buzo-Gray (LBG) algorithm for training on clean speech signal envelopes using 256 feature vectors, each with M amplitude values.
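The matching step can be sketched as follows, assuming a squared-error distance evaluated only over the reliable bins and a tiny two-entry codebook. The distance measure, the function name `fill_from_codebook`, and the example values are assumptions; codebook training with the LBG algorithm is not shown.

```python
def fill_from_codebook(features, reliable, codebook):
    """Fill unreliable bins from the codebook entry closest on the reliable bins."""
    def distance(entry):
        return sum(
            (entry[m] - features[m]) ** 2
            for m in range(len(features))
            if reliable[m]
        )

    best = min(codebook, key=distance)
    return [features[m] if reliable[m] else best[m] for m in range(len(features))]


# Usage: the two lowest bins are flagged as unreliable (low filter coefficients).
codebook = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]]
features = [9.0, 9.0, 2.1, 0.9]
reliable = [False, False, True, True]
filled = fill_from_codebook(features, reliable, codebook)
```

Every codebook entry is visited for every frame, which is the source of the memory and computation cost attributed to this approach.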
Besides the codebook approach, other estimation methods can be considered, for example, based on bin-wise computation of available spectral information. In frequency bins where the noise estimation is accurate and close to 0, the noise reduction filter will output coefficients near 1 (0 dB on a logarithmic scale). Here, the microphone spectrum provides a fairly good estimate for the speech spectrum at high SNR levels.
Microphone. The noisy microphone signal's amplitude spectrum Φ̂X can be smoothed and used as a first approximation:
As is to be expected, this approximation only works well where no or very little noise is present. It can however be useful as a source of information applied by other estimators.
Average. While similar to the microphone estimator, an average estimation method indirectly takes into account the filter coefficients H(k, μ) available from a preceding noise reduction. These can range from the spectral floor up to a value of 1 and contain information on how much noise is present in each bin μ and frame k. Higher values indicate a better SNR. The actual implementation can rely on the smoothed amplitudes of the noise reduced spectra signals ŜNR:
As explained above, in a separate processing stage, the frequency-domain envelope of the original speech signal is estimated. The estimated envelope is then imprinted on the generated excitation signal in order to reconstruct the original speech spectrum as accurately as possible. The envelope estimation is implemented in a novel way that can be applied online at a very low computational power cost. First, the noise suppression filter's coefficients are analyzed to distinguish bins with good and bad speech signal power estimation. The well-estimated bins are then used as supporting points for the estimation of the badly estimated ones. The estimation itself consists in the use of a parametric model for the envelope shape. For extrapolating towards low frequencies, a logarithmic parabola is used. The shape is derived from observed envelope shapes and is modified by several parameters, such as the parabola's curvature. Several features can easily be extracted from the signal, such as the position of the lowest well estimated frequency bin. These features are used to determine a good estimate for the curves' parameters.
Disturbances in the low frequency spectrum are prevalent in hands-free automobile telecommunication systems. Typically a number of consecutive low-frequency bins do not offer enough speech signal information for estimating the envelope at that frequency bin. In a linear extrapolation approach, missing envelope information can be approximated by constructing a straight line in the logarithmic amplitude spectrum.
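A straight-line extrapolation in the logarithmic amplitude spectrum can be sketched as below. The sketch assumes that the index of the lowest well-estimated bin and a slope per bin (in log units, positive when the envelope rises with frequency) are already known; how the slope is obtained is not modelled, and the name `extrapolate_line` is hypothetical.

```python
def extrapolate_line(log_env, lowest_good_bin, slope):
    """Overwrite bins below lowest_good_bin with a straight line in the log domain."""
    out = list(log_env)
    for mu in range(lowest_good_bin - 1, -1, -1):
        # Step one bin down in frequency along the line.
        out[mu] = out[mu + 1] - slope
    return out


# Usage: bins 0..2 are badly estimated, bins 3..5 fall by 2 log units per bin.
log_env = [0.0, 0.0, 0.0, -10.0, -12.0, -14.0]
extended = extrapolate_line(log_env, lowest_good_bin=3, slope=-2.0)
```

The well-estimated bins are left unchanged and serve only as supporting points for the extrapolated ones.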
The lowest well-estimated bin's index in a frame k is referred to as μ°(k), and the logarithmic amplitude value in that place is
Parabolic Extrapolation. It is not known directly whether the low-frequency band contains an obscured formant or not. As an extension to the aforementioned linear extrapolation, a parabolic estimation model can be parameterized by the slope of the reference estimation to guess the optimal course of the envelope. This can be more flexible than the linear method, including it as a special case. Hence, it bears an equal-or-better potential performance but also introduces higher worst-case errors. From the reference estimation, a reference slope mR can be computed:
and used to determine the parabola's initial slope m°(k)=m(k, μ°)=mR at μ°. The curve can be fixed by setting the remaining degree of freedom, called here the curvature J(k). Additionally, a restriction on the curve's maximum slope mmax can be introduced and the minimum slope can be restricted to that of the estimated noise amplitude spectrum:
For lower frequencies,
$$p(k,\mu) := J(k)\cdot p(k,\mu+1)\cdot \max\{m_{\hat{D}},\ \min\{m_{\max},\ m(k,\mu)\}\}$$
where the slope m(k, μ) is:
This results in a parabolic shape that is determined by the parameters initial slope m°(k), curvature J(k), and minimum slope mmin(k), as well as by the course of the noise spectrum via mD̂.
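The effect of the parameters can be illustrated with a simplified sketch written additively in the logarithmic domain. This is an assumption about one possible reading, not the exact recursion of the text: the slope is changed by a constant curvature at each step toward lower frequencies and is clamped between a minimum and a maximum slope before it is applied. All names and values are hypothetical.

```python
def extrapolate_parabola(log_env, lowest_good_bin, initial_slope,
                         curvature, min_slope, max_slope):
    """Extend a log-domain envelope downward with a clamped, linearly varying slope."""
    out = list(log_env)
    slope = initial_slope
    for mu in range(lowest_good_bin - 1, -1, -1):
        clamped = max(min_slope, min(max_slope, slope))
        out[mu] = out[mu + 1] - clamped
        # A constant change of slope per bin gives a parabola in the log domain.
        slope += curvature
    return out


# Usage: bins 0..3 are badly estimated; the curve rises, flattens, then falls.
log_env = [0.0, 0.0, 0.0, 0.0, -10.0]
curved = extrapolate_parabola(log_env, lowest_good_bin=4, initial_slope=-2.0,
                              curvature=1.0, min_slope=-5.0, max_slope=0.5)
straight = extrapolate_parabola(log_env, lowest_good_bin=4, initial_slope=-2.0,
                                curvature=0.0, min_slope=-5.0, max_slope=0.5)
```

With zero curvature the sketch reduces to a straight line, which mirrors the statement that the linear method is a special case of the parabolic one.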
In the foregoing the construction of a parabolic curve depending on parameters was described. But it has not yet been formally defined how these parameters are derived from the features extracted from the available information. Functional dependencies between the features and the optimal parameters can be established off-line from a training data set to obtain a simple constant or linear dependency from a feature to a parameter. In this way, the computational power cost under operating conditions stays minimal.
Detecting functional dependencies can be performed in two steps. In the first one, a minimum search for the cost function can be performed on the available parameters. This establishes the optimum approximation that a model is capable of. The parameters defining this optimal approximation can be saved together with the feature extracted from information that would be available also in testing conditions. In the second step, the tuples of features and corresponding optimal parameters can be analyzed to detect approximate functional dependencies between them. If there appears to be a dependency, a zero- to second-order polynomial can be fit on the data to represent it. This can be evaluated in a testing step.
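The second step can be sketched for the first-order case as an ordinary least-squares line fit from one feature to one parameter. The closed-form fit, the function name `fit_line`, and the example tuples are assumptions for illustration; zero-order and second-order fits would follow the same pattern.

```python
def fit_line(features, parameters):
    """Least-squares fit parameter ~ a * feature + b; returns (a, b)."""
    n = len(features)
    mean_x = sum(features) / n
    mean_y = sum(parameters) / n
    sxx = sum((x - mean_x) ** 2 for x in features)
    if sxx == 0.0:
        # Feature does not vary: fall back to a constant (zero-order) model.
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(features, parameters))
    a = sxy / sxx
    return a, mean_y - a * mean_x


# Usage: hypothetical (feature, optimal parameter) tuples from training frames.
features = [1.0, 2.0, 3.0, 4.0]
optimal_parameters = [3.0, 5.0, 7.0, 9.0]
a, b = fit_line(features, optimal_parameters)
predicted = a * 2.5 + b
```

At run time only the evaluation of the fitted polynomial remains, which is consistent with the goal of keeping the computational cost under operating conditions minimal.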
Optimum Parameter Detection and Feature Extraction. First, the testing data set can be searched for frames in which reconstruction can likely be successfully performed. A criterion for exclusion may be insufficient voiced excitation power (with modified parameters). The band used goes from flow=800 Hz to fhigh=2400 Hz, and the threshold value is PVUD=2 dB. In the remaining frames, the noisy signals can be processed and the envelope estimation parameters for each gap in the smoothed spectrum can be optimized to approximate the known clean speech smoothed spectrum. The cost function C used in each frame k for finding the best fit of the parametric model
Apart from the optimized parameters for each frame, several features can be extracted from the noisy speech signal. These features may be likely to carry valuable information on the corresponding optimal parameters. Note that no features need be taken from the known clean speech signals, but from the noisy speech signals only. The optimal parameters together with the corresponding features can be treated as a tuple for each relevant frame, and all of the tuples can be sequentially written to a single file.
Applying the parametric estimation process on a testing data set yielded better estimation results than the codebook approach, which is also much more expensive in terms of memory and computation power requirements. Even a limited amount of available training data sufficed to improve the results. While the “average” estimator gave a good reference even for some of the badly estimated frequency bins, it failed to provide a good estimation under low-SNR conditions. After un-biasing, it was still the best choice in situations with high SNR due to its very low complexity. The codebook approach spread the mean error evenly over the spectrum and thus performed better in very low frequencies where other estimators have difficulties, especially the non-parametric ones. It did not profit very much from un-biasing and had a singular advantage at 100 km/h in very low frequencies. It was not obvious why this was the case. Its demanding complexity in memory and computation power was not justified by its performance. Under high-noise conditions, the “parabola” and “line” estimators yielded the best results of the tested estimators, the latter slightly outperforming the former. However, the “parabola” estimation profited more from un-biasing than the “line” method. This makes it a viable choice, since the absolute computation power needed is only slightly higher than that of the “line” estimator.
Embodiments of the invention may be implemented in whole or in part in any conventional computer programming language such as VHDL, SystemC, Verilog, ASM, etc. Alternative embodiments of the invention may be implemented as pre-programmed hardware elements, other related components, or as a combination of hardware and software components.
Embodiments can be implemented in whole or in part as a computer program product for use with a computer system. Such implementation may include a series of computer instructions fixed either on a tangible medium, such as a computer readable medium (e.g., a diskette, CD-ROM, ROM, or fixed disk) or transmittable to a computer system, via a modem or other interface device, such as a communications adapter connected to a network over a medium. The medium may be either a tangible medium (e.g., optical or analog communications lines) or a medium implemented with wireless techniques (e.g., microwave, infrared or other transmission techniques). The series of computer instructions embodies all or part of the functionality previously described herein with respect to the system. Those skilled in the art should appreciate that such computer instructions can be written in a number of programming languages for use with many computer architectures or operating systems. Furthermore, such instructions may be stored in any memory device, such as semiconductor, magnetic, optical or other memory devices, and may be transmitted using any communications technology, such as optical, infrared, microwave, or other transmission technologies. It is expected that such a computer program product may be distributed as a removable medium with accompanying printed or electronic documentation (e.g., shrink wrapped software), preloaded with a computer system (e.g., on system ROM or fixed disk), or distributed from a server or electronic bulletin board over the network (e.g., the Internet or World Wide Web). Of course, some embodiments of the invention may be implemented as a combination of both software (e.g., a computer program product) and hardware. Still other embodiments of the invention are implemented as entirely hardware, or entirely software (e.g., a computer program product).
Although various exemplary embodiments of the invention have been disclosed, it should be apparent to those skilled in the art that various changes and modifications can be made which will achieve some of the advantages of the invention without departing from the true scope of the invention.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2012/062549 | 10/30/2012 | WO | 00 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2014/070139 | 5/8/2014 | WO | A |
Number | Name | Date | Kind |
---|---|---|---|
4015088 | Dubnowski et al. | Mar 1977 | A |
4052568 | Jankowski | Oct 1977 | A |
4057690 | Vagliani et al. | Nov 1977 | A |
4359064 | Kimble | Nov 1982 | A |
4410763 | Strawcyznski et al. | Oct 1983 | A |
4672669 | DesBlache et al. | Jun 1987 | A |
4688256 | Yasunaga | Aug 1987 | A |
4764966 | Einkauf et al. | Aug 1988 | A |
4825384 | Sakurai | Apr 1989 | A |
4829578 | Roberts | May 1989 | A |
4864608 | Miyamoto et al. | Sep 1989 | A |
4914692 | Hartwell et al. | Apr 1990 | A |
5034984 | Bose | Jul 1991 | A |
5048080 | Bell et al. | Sep 1991 | A |
5125024 | Gokcen et al. | Jun 1992 | A |
5155760 | Johnson et al. | Oct 1992 | A |
5220595 | Uehara | Jun 1993 | A |
5239574 | Brandman et al. | Aug 1993 | A |
5349636 | Irribarren | Sep 1994 | A |
5394461 | Garland | Feb 1995 | A |
5416887 | Shimada | May 1995 | A |
5434916 | Hasegawa | Jul 1995 | A |
5475791 | Schalk et al. | Dec 1995 | A |
5574824 | Slyh et al. | Nov 1996 | A |
5577097 | Meek | Nov 1996 | A |
5581620 | Brandstein et al. | Dec 1996 | A |
5652828 | Silverman | Jul 1997 | A |
5708704 | Fisher | Jan 1998 | A |
5708754 | Wynn | Jan 1998 | A |
5721771 | Higuchi et al. | Feb 1998 | A |
5761638 | Knittle et al. | Jun 1998 | A |
5765130 | Nguyen | Jun 1998 | A |
5784484 | Umezawa | Jul 1998 | A |
5959675 | Mita et al. | Sep 1999 | A |
5978763 | Bridges | Nov 1999 | A |
6018771 | Hayden | Jan 2000 | A |
6061651 | Nguyen | May 2000 | A |
6098043 | Forest et al. | Aug 2000 | A |
6246986 | Ammicht et al. | Jun 2001 | B1 |
6266398 | Nguyen | Jul 2001 | B1 |
6279017 | Walker | Aug 2001 | B1 |
6373953 | Flaks | Apr 2002 | B1 |
6449593 | Valve | Sep 2002 | B1 |
6496581 | Finn et al. | Dec 2002 | B1 |
6526382 | Yuschik | Feb 2003 | B1 |
6574595 | Mitchell et al. | Jun 2003 | B1 |
6636156 | Damiani et al. | Oct 2003 | B2 |
6647363 | Claassen | Nov 2003 | B2 |
6717991 | Gustafsson et al. | Apr 2004 | B1 |
6778791 | Shimizu et al. | Aug 2004 | B2 |
6785365 | Nguyen | Aug 2004 | B2 |
7065486 | Thyssen | Jun 2006 | B1 |
7068796 | Moorer | Jun 2006 | B2 |
7069213 | Thompson | Jun 2006 | B2 |
7069221 | Crane et al. | Jun 2006 | B2 |
7117145 | Venkatesh et al. | Oct 2006 | B1 |
7162421 | Zeppenfeld et al. | Jan 2007 | B1 |
7171003 | Venkatesh et al. | Jan 2007 | B1 |
7206418 | Yang et al. | Apr 2007 | B2 |
7224809 | Hoetzel | May 2007 | B2 |
7274794 | Rasmussen | Sep 2007 | B1 |
7643641 | Haulick et al. | Jan 2010 | B2 |
8000971 | Ljolje | Aug 2011 | B2 |
8050914 | Schmidt et al. | Nov 2011 | B2 |
8359195 | Li | Jan 2013 | B2 |
8706483 | Gerl et al. | Apr 2014 | B2 |
20010038698 | Breed et al. | Nov 2001 | A1 |
20020184031 | Brittan et al. | Dec 2002 | A1 |
20030072461 | Moorer | Apr 2003 | A1 |
20030100345 | Gum | May 2003 | A1 |
20030185410 | June et al. | Oct 2003 | A1 |
20040047464 | Yu et al. | Mar 2004 | A1 |
20040076302 | Christoph | Apr 2004 | A1 |
20040230637 | Lecoueche et al. | Nov 2004 | A1 |
20050265560 | Haulick et al. | Dec 2005 | A1 |
20060206320 | Li | Sep 2006 | A1 |
20060222184 | Buck et al. | Oct 2006 | A1 |
20060271370 | Li | Nov 2006 | A1 |
20070010291 | Deng | Jan 2007 | A1 |
20070230712 | Belt et al. | Oct 2007 | A1 |
20080004881 | Attwater et al. | Jan 2008 | A1 |
20080077399 | Yoshida | Mar 2008 | A1 |
20080107280 | Haulick et al. | May 2008 | A1 |
20090112579 | Li et al. | Apr 2009 | A1 |
20090119096 | Gerl et al. | May 2009 | A1 |
20100189275 | Christoph | Jul 2010 | A1 |
20100250242 | Li | Sep 2010 | A1 |
20120259626 | Li et al. | Oct 2012 | A1 |
Number | Date | Country |
---|---|---|
101350108 | Jan 2009 | CN |
102035562 | Apr 2011 | CN |
101 56 954 | Jun 2003 | DE |
10 2005 002 865 | Jun 2006 | DE |
0 053 584 | Jun 1982 | EP |
0 856 834 | Aug 1998 | EP |
1 083 543 | Mar 2001 | EP |
1 116 961 | Jul 2001 | EP |
1 343 351 | Sep 2003 | EP |
EP 1 850 328 | Oct 2007 | EP |
2 058 803 | May 2009 | EP |
2 107 553 | Oct 2009 | EP |
2 148 325 | Jan 2010 | EP |
2 097 121 | Oct 1982 | GB |
WO 9418666 | Aug 1994 | WO |
WO 0232356 | Apr 2002 | WO |
WO 03107327 | Dec 2003 | WO |
WO 2004100602 | Nov 2004 | WO |
WO 2006117032 | Nov 2006 | WO |
WO 2011119168 | Sep 2011 | WO |
Entry |
---|
International Search Report, PCT/US2012/62549, date of mailing Jan. 4, 2013, 2 pages. |
Written Opinion of the International Searching Authority, PCT/US2012/62549, date of mailing Jan. 4, 2013, 4 pages. |
International Preliminary Report on Patentability, PCT/US2012/062549, date of issuance May 5, 2015, 6 pages. |
Office Action dated Jun. 14, 2013; for U.S. Appl. No. 12/254,488; 22 pages. |
Response filed on Dec. 3, 2013; to Office Action dated Jun. 14, 2013; for U.S. Appl. No. 12/254,488; 11 pages. |
Notice of Allowance dated Dec. 23, 2013; for U.S. Appl. No. 12/254,488; 11 pages. |
European Search Report dated Apr. 24, 2008; for European Pat. App. No. EP 07 02 1121; 1 page. |
European Application No. 12878823.9 Extended Search Report dated Jul. 20, 2016, 16 pages. |
Sang-Mun Chi et al: “Lombard effect compensation and noise suppression for noisy Lombard speech recognition”, Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on Philadelphia, PA, USA Oct. 3-6, 1996, New York, NY, USA, IEEE, US, vol. 4, Oct. 3, 1996, pp. 2013-2016, XP010238177, DOI: 10.1109/ICSLP.1996.607193. ISBN: 978-0-7803-3555-4, 4 pages. |
Jung et al: “On the Lombard Effect Induced by Vehicle Interior Driving Noises, Regarding Sound Pressure Level and Long-Term Average Speech Spectrum”, Acustica United With Acta Acustica, S. Hirzel Verlag, Stuttgart, DE, vol. 98, Mar. 1, 2012, pp. 334-341, XP008178809, ISSN: 1610-1928, DOI: 10.3813/AAA.918517. 8 pages. |
Schmidt G et al: “Signal processing for in-car communication systems”, Signal Processing, Elsevier Science Publishers B.V. Amsterdam, NL, vol. 86, No. 6, Jun. 1, 2006, pp. 1307-1326, XP024997680, ISSN: 0165-1684, DOI: 10.1016/J.SIGPRO.2005.07.040. 20 pages. |
Alfonso Ortega et al: “Cabin car communication system to improve communications inside a car”, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. (ICASSP). Orlando, FL, May 13-17, 2002; [IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)], New York, NY: IEEE, US, May 13, 2002, pp. IV-3836, XP032015678, DOI: 10.1109/ICASSP.2002, 5745493, ISBN: 978-0-7803-7402-7. 4 pages. |
U.S. Appl. No. 14/406,628 Response to Office Action filed Jun. 30, 2016, 13 pages. |
U.S. Appl. No. 14/406,628 Notice of Allowance dated Aug. 15, 2016, 12 pages. |
U.S. Appl. No. 14/423,543 Final Office Action dated Aug. 29, 2016, 16 pages. |
Chinese Office Action (with English translation) dated Aug. 10, 2016; for Chinese Pat. App. No. 201280074944.2; 22 pages. |
Arslan et al. “New Methods for Adaptive Noise Suppression,” IEEE, vol. 1, May 1995, 4 pages. |
Ljolje et al. “Discriminative Training of Multi-Stage Barge-in Models,” IEEE, Dec. 1, 2007, 6 pages. |
Setlur et al. “Recognition-based Word Counting for Reliable Barge-In and Early Endpoint Detection in Continuous Speech Recognition,” International Conference on spoken Language Processing, Oct. 1, 1998, 4 pages. |
Ittycheriah et al. “Detecting User Speech in Barge-in Over Prompts Using Speaker Identification Methods,” Eurospeech 99, Sep. 5, 1999, 4 pages. |
Rose et al. “A Hybrid Barge-In Procedure for More Reliable Turn-Taking in Human-Machine Dialog Systems,” 5th International Conference on Spoken Language Processing, Oct. 1, 1998, 6 pages. |
Decision to grant dated Feb. 28, 2014 for European Application No. 08013196.4; 52 pages. |
Supplemental Decision to grant dated Jun. 5, 2014 for European Application No. 08013196.4; 43 pages. |
Office Action dated Apr. 1, 2013 for U.S. Appl. No. 12/507,444, 17 pages. |
Response to Office Action dated Aug. 1, 2013 U.S. Appl. No. 12/507,444, 16 pages. |
Final Office Action dated Nov. 15, 2013 for U.S. Appl. No. 12/507,444, 19 pages. |
Hansler et al. “Acoustic Echo and Noise Control: A Practical Approach”, John Wiley & Sons, New York, New York, USA, Copy right 2004, Part 1, 250 pages. |
Hansler et al. “Acoustic Echo and Noise Control: A Practical Approach”, John Wiley & Sons, New York, New York, USA, Copy right 2004, Part 2, 221 pages. |
Office Action dated Jun. 14, 2013 for U.S. Appl. No. 12/254,488; 22 pages. |
Response to Office Action dated Dec. 3, 2013 for U.S. Appl. No. 12/254,488; 12 pages. |
Notice of Allowance dated Dec. 23, 2013 for U.S. Appl. No. 12/254,488; 11 pages. |
European Search Report dated Apr. 24, 2008 for European Application No. 07021121.4, 3 pages. |
European Search Report dated Jun. 14, 2011 for European Application No. 07021932.4, 2 pages. |
Decision to Grant dated Dec. 5, 2013 for European Application No. 07021932.4, 1 page. |
Richardson et al. “LPC-Synthesis Mixture: A Low Computational Cost Speech Enhancement Algorithm”, Proceedings of the IEEE, Apr. 11, 1996, 4 pages. |
International Preliminary Report on Patentability dated Nov. 11, 2005 for PCT Application No. PCT/EP2004/004980; 8 pages. |
Written Opinion dated Nov. 4, 2004 for PCT Application No. PCT/EP2004/004980; 7 pages. |
Search Report dated Aug. 11, 2004 for PCT Application No. PCT/EP2004/004980; 3 pages. |
Office Action dated Nov. 28, 2007 for U.S. Appl. No. 10/556,232; 17 pages. |
Response to Office Action filed Mar. 28, 2008 for U.S. Appl. No. 10/556,232; 11 pages. |
Office Action dated May 29, 2008 for U.S. Appl. No. 10/556,232; 10 pages. |
Response to Office Action filed Aug. 29, 2008 for U.S. Appl. No. 10/556,232; 9 pages. |
Office Action dated Dec. 9, 2008 for U.S. Appl. No. 10/556,232; 17 pages. |
Response to Office Action filed Mar. 9, 2009 for U.S. Appl. No. 10/556,232; 13 pages. |
Office Action dated May 13, 2009 for U.S. Appl. No. 10/556,232; 17 pages. |
Response to Office Action filed May 29, 2009 for U.S. Appl. No. 10/556,232; 6 pages. |
Notice of Allowance dated Aug. 26, 2009 for U.S. Appl. No. 10/556,232; 7 pages. |
Notice of Allowance dated Jan. 15, 2014 for U.S. Appl. No. 11/924,987; 7 pages. |
Office Action dated Jan. 7, 2014 for U.S. Appl. No. 13/518,406; 10 pages. |
Response to Office Action filed May 5, 2014 for U.S. Appl. No. 13/518,406; 8 pages. |
Final Office Action dated Jun. 10, 2014 for U.S. Appl. No. 13/518,406; 10 pages. |
Response to Final Office Action filed Nov. 13, 2014 for U.S. Appl. No. 13/518,406; 11 pages. |
Office Action dated Nov. 26, 2014 for U.S. Appl. No. 13/518,406; 6 pages. |
Response to Office Action filed Feb. 17, 2015 for U.S. Appl. No. 13/518,406; 9 pages. |
Notice of Allowance dated Mar. 10, 2015 for U.S. Appl. No. 13/518,406; 7 pages. |
European Office Action dated Oct. 16, 2014 for European Application No. 10716929.4; 5 pages. |
Decision to grant dated Jan. 18, 2016 for European Application No. 10716929.4; 24 pages. |
Response to Written Opinion filed Jan. 9, 2015 for European Application No. 10716929.4; 9 pages. |
International Preliminary Report on Patentability dated Oct. 2, 2012 for PCT Application No. PCT/US2010/028825; 8 pages. |
Search Report dated Dec. 28, 2010 for PCT Application No. PCT/US2010/028825; 4 pages. |
Written Opinion dated Dec. 28, 2010 for PCT Application No. PCT/US2010/028825; 7 pages. |
U.S. Appl. No. 11/928,251. |
U.S. Appl. No. 12/507,444. |
U.S. Appl. No. 12/254,488. |
U.S. Appl. No. 12/269,605. |
U.S. Appl. No. 13/273,890. |
U.S. Appl. No. 14/254,007. |
U.S. Appl. No. 10/556,232. |
U.S. Appl. No. 13/518,406. |
U.S. Appl. No. 14/406,628. |
U.S. Appl. No. 14/423,543. |
Amendment filed on Nov. 14, 2016 to the Final Office Action dated Aug. 29, 2016; for U.S. Appl. No. 14/423,543, 8 pages. |
RCE filed on Nov. 14, 2016 to the Final Office Action dated Aug. 29, 2016; for U.S. Appl. No. 14/423,543, 3 pages. |
Chinese Office Action with English translation dated Nov. 16, 2016; for Chinese Pat. App. No. 201280076334.6; 13 pages. |
Chinese Response with English claims filed Dec. 26, 2016 to Office Action dated Aug. 10, 2016; for Chinese Pat. App. No. 201280074944.2; 20 pages. |
European Response (with Amended Claims and Replacement Specification Page) to European Office Action dated Aug. 5, 2016; Response filed on Jan. 25, 2017 for European Application No. 12878823.9; 10 pages. |
Number | Date | Country | |
---|---|---|---|
20150255083 A1 | Sep 2015 | US |