1. Technical Field
This disclosure relates to speech processing, and more particularly to a process that improves speech intelligibility and quality.
2. Related Art
Processing speech in a vehicle is challenging. Systems may be susceptible to environmental noise and vehicle interference. Some sounds heard in vehicles may combine with noise and other interference to reduce speech intelligibility and quality.
Some systems suppress a fixed amount of noise across large frequency bands. In noisy environments, high levels of residual noise may remain at lower frequencies because in-car noise is often more severe at lower frequencies than at higher frequencies. The residual noise may degrade speech quality and intelligibility.
In some situations, systems may attenuate or eliminate large portions of speech while suppressing noise, making voiced segments unintelligible. There is a need for a speech reconstruction system that is accurate, has minimal latency, and reconstructs speech across a perceptible frequency band.
A system improves speech intelligibility by reconstructing speech segments. The system includes a low-frequency reconstruction controller programmed to select a predetermined portion of a time domain signal. The low-frequency reconstruction controller substantially blocks signals above and below the selected predetermined portion. A harmonic generator generates low-frequency harmonics in the time domain that lie within a frequency range controlled by a background noise modeler. A gain controller adjusts the low-frequency harmonics so that their signal strength substantially matches that of the original time-domain input signal.
Other systems, methods, features, and advantages will be, or will become, apparent to one with skill in the art upon examination of the following figures and detailed description. It is intended that all such additional systems, methods, features, and advantages be included within this description, be within the scope of the invention, and be protected by the following claims.
The system may be better understood with reference to the following drawings and description. The components in the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the invention. Moreover, in the figures, like referenced numerals designate corresponding parts throughout the different views.
Hands-free systems, communication devices, and phones in vehicles or enclosures are susceptible to noise. The spatial, linear, and non-linear properties of noise may suppress or distort speech. A speech reconstruction system improves speech quality and intelligibility by dynamically generating sounds that may otherwise be masked by noise. A speech reconstruction system may produce voice segments by generating harmonics in select frequency ranges or bands. The system may improve speech intelligibility in vehicles or systems that transport persons or things.
A portion of the amplitude adjusted signal is selected at 318. The selection may occur through a dynamic process that allows substantially all frequencies below a threshold to pass to an output while substantially blocking or substantially attenuating signals that occur above the threshold. In one process, the selection may be based on multiple (e.g., two, three, or more) linear models that model background noise or other noise.
One exemplary process digitizes an input speech signal (optional if the signal is received in digital form). The input may be converted to the frequency domain by a Short-Time Fourier Transform (STFT) that separates the digitized signal into frequency bins.
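The framing and binning step can be sketched in Python. The frame length, hop size, Hann window, and the naive DFT (used to keep the sketch dependency-free) are illustrative assumptions, not details from the disclosure:

```python
import cmath
import math

def stft_frames(signal, frame_len=256, hop=128):
    """Split a digitized signal into windowed frames and return the
    magnitude of each frame's DFT (one value per frequency bin).
    Frame length, hop size, and the Hann window are illustrative choices."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        # Hann window reduces spectral leakage between bins.
        windowed = [s * 0.5 * (1 - math.cos(2 * math.pi * n / (frame_len - 1)))
                    for n, s in enumerate(frame)]
        # Naive O(N^2) DFT keeps the sketch self-contained;
        # a real system would use an FFT.
        bins = []
        for k in range(frame_len // 2 + 1):
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n, x in enumerate(windowed))
            bins.append(abs(acc))
        frames.append(bins)
    return frames

# A 1 kHz tone sampled at 8 kHz concentrates its energy near bin
# 1000 / 8000 * 256 = 32 of each 256-point frame.
fs = 8000
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(1024)]
spectra = stft_frames(tone)
peak_bin = max(range(len(spectra[0])), key=lambda k: spectra[0][k])
```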
The background noise power in the signal may be estimated at an nth frame at 310. The background noise power of each frame, Bn, may be converted into the dB domain as described by equation 1.
φn = 10 log10(Bn) (1)
The dB power spectrum may be divided into a low frequency portion and a high frequency portion at 312. The division may occur at a predetermined frequency f0, such as a cutoff frequency, which may separate multiple linear regression models at 314 and 316. An exemplary process may apply two substantially linear models, such as the linear regression models described by equations 2 and 3.
YL=aLXL+bL (2)
YH=aHXH+bH (3)
In equations 2 and 3, X is the frequency, Y is the dB power of the background noise, aL, aH are the slopes of the low and high frequency portions of the dB noise power spectrum, and bL, bH are the intercepts of the two lines at zero frequency.
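The two-segment fit of equations 2 and 3 can be sketched with ordinary least squares. The cutoff frequency, frequency spacing, and synthetic dB values below are hypothetical:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b; returns (slope, intercept)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

def model_noise_spectrum(freqs, db_power, f0):
    """Fit separate lines (equations 2 and 3) to the portions of the dB
    noise power spectrum below and above the cutoff frequency f0."""
    low = [(f, p) for f, p in zip(freqs, db_power) if f <= f0]
    high = [(f, p) for f, p in zip(freqs, db_power) if f > f0]
    aL, bL = fit_line([f for f, _ in low], [p for _, p in low])
    aH, bH = fit_line([f for f, _ in high], [p for _, p in high])
    return (aL, bL), (aH, bH)

# Synthetic spectrum: noise power falls at -0.02 dB/Hz below 1 kHz
# and -0.005 dB/Hz above it (illustrative values only).
freqs = list(range(0, 4000, 50))
db = [(-0.02 * f + 60) if f <= 1000 else (-0.005 * f + 45) for f in freqs]
low_model, high_model = model_noise_spectrum(freqs, db, 1000)
```

Because the synthetic data lie exactly on the two lines, the regression recovers the slopes and intercepts directly.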
Based on the difference between the intercepts of the low and high frequency portions of the dB noise power spectrum, the scalar coefficients (e.g., m1(k), m2(k), mL(k)) of the transfer function of an exemplary dynamic selection process 318 may be determined by equations 4 and 5.
mi(k)=fi(b) (4)
In this process, b is the dynamic noise level expressed by equation 5,
b = bL − bH (5)
where bL, bH are the intercepts of the two linear models (equations 2 and 3) that model the background noise in the low and high frequency ranges.
h(k)=m1(k)h1+m2(k)h2+ . . . +mL(k)hL (6)
In equation 6, h(k) is the updated filter-coefficient vector, and h1, h2, . . . , hL are the L basis filter-coefficient vectors. In an exemplary application having three filter-coefficient vectors, h1, h2, and h3 may have maximally flat or monotonic passbands and smooth roll-offs, respectively, as shown in the accompanying figures.
An optional signal combination process 320 may combine the output of the signal selection process 318 with the received input signal. In some processes, a perceptual weighting process combines the output of the signal selection process with the input signal. The perceptual weighting process may emphasize the harmonic structure of the speech signal and/or modeled harmonics, allowing the noise or discontinuities that lie between the harmonics to become less audible.
The methods and descriptions above may be encoded in a signal-bearing medium or a computer-readable medium, and may be executed by a processor.
A computer-readable medium, machine-readable medium, propagated-signal medium, and/or signal-bearing medium may comprise any medium that contains, stores, communicates, propagates, or transports software for use by or in connection with an instruction-executable system, apparatus, or device. The machine-readable medium may be, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. A non-exhaustive list of examples of a machine-readable medium would include: an electrical or tangible connection having one or more wires, a portable magnetic or optical disk, a volatile memory such as a Random Access Memory “RAM” (electronic), a Read-Only Memory “ROM,” an Erasable Programmable Read-Only Memory (EPROM or Flash memory), or an optical fiber. A machine-readable medium may also include a tangible medium upon which software is printed, as the software may be electronically stored as an image or in another format (e.g., through an optical scan), then compiled by a controller, and/or interpreted or otherwise processed. The processed medium may then be stored in a local or remote computer and/or machine memory.
When implemented through multiple filters, a highpass and a lowpass filter, for example, the highpass filter may have a cutoff frequency at around 1200 Hz and the lowpass filter may have a cutoff frequency at around 3000 Hz. The filters may comprise finite impulse response (FIR) filters and/or infinite impulse response (IIR) filters. To maintain a frequency response that is as flat as possible in the passbands (having a maximally flat or monotonic magnitude) and that rolls off smoothly, the filters may be implemented as second-order Butterworth filters having responses expressed as equations 7 and 8.
The filters' coefficients may comprise aH0 = 0.5050, aH1 = −1.0100, aH2 = 0.5050, bH1 = −0.7478, and bH2 = 0.2722 for the highpass filter, and aL0 = 0.5690, aL1 = 1.1381, aL2 = 0.5690, bL1 = 0.9428, and bL2 = 0.3333 for the lowpass filter.
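A minimal sketch of applying the listed coefficients, assuming the standard Direct Form I difference equation y[n] = a0·x[n] + a1·x[n−1] + a2·x[n−2] − b1·y[n−1] − b2·y[n−2] (the sign convention is an assumption; the disclosure does not spell it out):

```python
def biquad(x, a0, a1, a2, b1, b2):
    """Second-order IIR filter (Direct Form I), assuming the convention
    y[n] = a0*x[n] + a1*x[n-1] + a2*x[n-2] - b1*y[n-1] - b2*y[n-2]."""
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for s in x:
        out = a0 * s + a1 * x1 + a2 * x2 - b1 * y1 - b2 * y2
        x2, x1 = x1, s
        y2, y1 = y1, out
        y.append(out)
    return y

dc = [1.0] * 200  # constant (0 Hz) input probes the DC gain

# Highpass coefficients from the text: a constant input should be rejected.
hp = biquad(dc, 0.5050, -1.0100, 0.5050, -0.7478, 0.2722)
# Lowpass coefficients from the text: a constant input should pass at ~unity gain.
lp = biquad(dc, 0.5690, 1.1381, 0.5690, 0.9428, 0.3333)
```

With this convention, the listed highpass coefficients drive the output toward zero for a 0 Hz input while the lowpass coefficients settle at roughly unity gain, which is consistent with the stated cutoffs.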
A nonlinear transformation controller 506 may reconstruct speech by generating harmonics in the time domain. The nonlinear transformation controller 506 may generate harmonics through one, two, or more functions, including, for example, through a full-wave rectification function, half-wave rectification function, square function, and/or other nonlinear functions. Some exemplary functions are expressed in equations 9, 10, and 11.
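Equations 9 through 11 are not reproduced here, so the sketch below illustrates the named nonlinearities generically and shows, via a small DFT probe, how full-wave rectification of a pure tone creates a new component at twice the tone's frequency:

```python
import math

def full_wave(x):  # full-wave rectification
    return [abs(s) for s in x]

def half_wave(x):  # half-wave rectification
    return [max(s, 0.0) for s in x]

def square(x):     # squaring nonlinearity
    return [s * s for s in x]

def bin_magnitude(x, k):
    """Magnitude of DFT bin k of sequence x (naive, dependency-free)."""
    n = len(x)
    re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(x))
    im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(x))
    return math.hypot(re, im)

# A pure tone sitting exactly on bin 8 of a 256-sample window...
n = 256
tone = [math.sin(2 * math.pi * 8 * i / n) for i in range(n)]
# ...acquires energy at bin 16 (a new harmonic) after full-wave rectification.
rect = full_wave(tone)
```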
The amplitudes of the harmonics may be adjusted by a gain control 508 and multiplier 510. The gain may be determined by a ratio of energies measured or estimated in the original speech signal (S) and the reconstructed signal (R) as expressed by equation 12.
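Equation 12 is not reproduced here; one plausible reading of an energy-ratio gain is g = sqrt(E_S / E_R), sketched below with hypothetical segments:

```python
import math

def energy(x):
    return sum(s * s for s in x)

def match_energy(original, reconstructed):
    """Scale the reconstructed harmonics so their energy matches that of
    the original speech segment. The form g = sqrt(E_S / E_R) is an
    assumption, since equation 12 is not reproduced in the text."""
    g = math.sqrt(energy(original) / energy(reconstructed))
    return [g * s for s in reconstructed]

s = [0.5, -0.5, 0.5, -0.5]   # original segment, energy 1.0
r = [2.0, -2.0, 2.0, -2.0]   # reconstructed harmonics, energy 16.0
scaled = match_energy(s, r)  # gain 0.25 brings the energies into line
```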
A perceptual filter processes the output of the multiplier 510. The filter selectively passes certain portions of the adjusted output while minimizing or dampening the remaining portions. In some systems, a dynamic filter selects signals by dynamically varying gain and/or cutoff characteristics based on the strength of background noise detected or estimated over time. The gain and cutoff frequency or frequencies may vary according to the amount of dynamic noise detected or estimated in the speech signal.
In an exemplary system, the dynamic filter coefficients may be updated according to equation 6.
h(k)=m1(k)h1+m2(k)h2+ . . . +mL(k)hL (6)
In equation 6, h(k) is the updated filter-coefficient vector, and h1, h2, . . . , hL are the basis filter-coefficient vectors. The filter coefficients may be updated on a temporal basis or at each iteration of some or every speech segment using an exemplary dynamic noise function fi(.). The dynamic noise function may be described by equation 4.
mi(k)=fi(b) (4)
In equation 4, b comprises a dynamic noise level expressed by equation 5.
b = bL − bH (5)
In this example, bL, bH comprise the dynamic noise levels or intercepts of multiple linear models that describe the background noise in the low and high aural frequency ranges. In this relationship, the more the dynamic noise levels or intercepts differ, the larger the bandwidth and amplitude response of the filter. When the differences in the dynamic noise levels or intercepts are small, the bandwidth and amplitude response of the low-pass filter are small.
The linear models may be approximated in the decibel power domain. A spectral converter 514 may convert the time domain speech signal into the frequency domain. A background noise estimator 516 measures or estimates the continuous or ambient noise that may accompany the speech signal. The background noise estimator 516 may comprise a power detector that averages the acoustic power when little or no speech is detected. To prevent biased noise estimations during transients, a transient detector (not shown) may disable the background noise estimator during abnormal or unpredictable increases in power in some alternate systems.
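A recursive (leaky-average) noise power estimator gated by a speech/transient flag can be sketched as follows; the smoothing factor and the externally supplied flags are illustrative assumptions:

```python
def estimate_noise(frame_powers, speech_flags, alpha=0.9):
    """Recursively average frame power only when no speech is detected.
    The smoothing factor alpha and the external speech flags are
    illustrative; a transient detector could clear the flag the same way."""
    estimate = None
    for power, is_speech in zip(frame_powers, speech_flags):
        if is_speech:
            continue  # hold the estimate during speech or transients
        estimate = (power if estimate is None
                    else alpha * estimate + (1 - alpha) * power)
    return estimate

# Noise floor of 1.0 with a burst of speech frames at power 100; gating
# the update keeps the estimate at the noise floor.
powers = [1.0] * 20 + [100.0] * 5 + [1.0] * 20
flags = [False] * 20 + [True] * 5 + [False] * 20
noise = estimate_noise(powers, flags)
```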
A spectral separator 518 may divide the estimated noise power spectrum into multiple sub-bands, including a low/middle frequency band and a high frequency band. The division may occur at a predetermined frequency or frequencies, such as a designated cutoff frequency or frequencies.
To determine the required signal reconstruction, a modeler 520 may fit separate lines to selected portions of the noise power spectrum. For example, the modeler 520 may fit one line to a portion of the low and/or medium frequency spectrum and a separate line to a portion of the high frequency spectrum. Using linear regression logic, a best-fit line may model the severity of vehicle noise in two or more portions of the spectrum.
In an exemplary application having three filter-coefficient vectors, h1, h2, and h3, the filter-coefficient vectors may have the amplitude responses shown in the accompanying figures.
Here the thresholds t1, t2, and t3 may be estimated empirically and may lie within the range 0<t1<t2<t3<1.
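Combining equation 6 with these thresholds, one assumed (hypothetical) policy is to give full weight to the basis filter whose threshold bracket contains the dynamic noise level b and zero weight to the rest; the disclosure leaves the exact mapping fi(.) open:

```python
def blend_filters(b, basis, thresholds):
    """Form h(k) = m1(k)*h1 + m2(k)*h2 + ... + mL(k)*hL (equation 6).
    The mapping from the dynamic noise level b to the weights m_i via
    thresholds t1 < t2 < t3 is an assumed piecewise choice: the basis
    filter whose bracket contains b gets weight 1, the others 0.
    b is assumed already normalized into [0, 1] by the caller."""
    weights = [0.0] * len(basis)
    for i, t in enumerate(thresholds):
        if b <= t:
            weights[i] = 1.0
            break
    else:
        weights[-1] = 1.0  # b above the top threshold: widest filter
    taps = len(basis[0])
    return [sum(w * h[j] for w, h in zip(weights, basis))
            for j in range(taps)]

# Three hypothetical 3-tap basis filters with increasing passband width.
h1, h2, h3 = [0.25, 0.5, 0.25], [0.2, 0.6, 0.2], [0.1, 0.8, 0.1]
hk = blend_filters(0.35, [h1, h2, h3], thresholds=[0.2, 0.5, 0.8])
```

Here b = 0.35 falls between t1 and t2, so the blended coefficients reduce to h2.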
A portion of the amplitude adjusted signal is selected by a speech reconstruction filter 708. The speech reconstruction filter 708 may allow substantially all frequencies below a threshold to pass through while substantially blocking or substantially attenuating signals above a variable threshold. A perceptual filter 710 combines the output of the reconstruction filter 708 with the input speech signal from filter 702.
The speech reconstruction system improves speech intelligibility and/or speech quality. The reconstruction may occur in real time (or after a delay, depending on an application or desired result) based on signals received from an input device such as a vehicle microphone, speaker, piezoelectric element, or voice activity detector, for example. The system may interface additional compensation devices and may communicate with a system that suppresses specific noises, such as, for example, wind noise from a voiced or unvoiced signal (e.g., speech), such as the system described in U.S. patent application Ser. No. 10/688,802, entitled “System for Suppressing Wind Noise,” filed on Oct. 16, 2003, or background noise from a voiced or unvoiced signal (e.g., speech), such as the system described in U.S. application Ser. No. 11/923,358, entitled “Dynamic Noise Reduction,” filed Oct. 24, 2007, which is incorporated by reference.
The system may dynamically reconstruct speech in a signal detected in an enclosure or an automobile. In an alternate system, aural signals may be selected by a dynamic filter and the harmonics may be generated by a harmonic processor (e.g., programmed to process a non-linear function). Signal power may be measured by a power processor and the level of background noise measured or estimated by a background noise processor. Based on the output of the background noise processor, multiple linear relationships of the background noise may be modeled by a linear model processor. Harmonic gain may be rendered by a controller, an amplifier, or a programmable filter. In some systems the programmable filter, signal processor, or dynamic filter may select or filter the output to reconstruct speech.
Other alternate speech reconstruction systems include combinations of some or all of the structure and functions described above or shown in one or more or each of the Figures. These speech reconstruction systems are formed from any combination of structure and function described or illustrated within the figures. The logic may be implemented in software or hardware. The hardware may be implemented through a processor or a controller accessing a local or remote volatile and/or non-volatile memory that interfaces peripheral devices or the memory through a wireless or a tangible medium. In a high noise or a low noise condition, the spectrum of the original signal may be reconstructed so that intelligibility and signal quality are improved or reach a predetermined threshold.
While various embodiments of the invention have been described, it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible within the scope of the invention. Accordingly, the invention is not to be restricted except in light of the attached claims and their equivalents.
This application is a continuation-in-part of U.S. application Ser. No. 11/923,358, entitled “Dynamic Noise Reduction” filed Oct. 24, 2007, which is incorporated by reference.
Number | Name | Date | Kind |
---|---|---|---|
4853963 | Bloy et al. | Aug 1989 | A |
5406635 | Jarvinen | Apr 1995 | A |
5408580 | Stautner et al. | Apr 1995 | A |
5414796 | Jacobs et al. | May 1995 | A |
5493616 | Iidaka et al. | Feb 1996 | A |
5499301 | Sudo et al. | Mar 1996 | A |
5524057 | Akiho et al. | Jun 1996 | A |
5692052 | Tanaka et al. | Nov 1997 | A |
5701393 | Smith et al. | Dec 1997 | A |
5978783 | Meyers et al. | Nov 1999 | A |
5978824 | Ikeda | Nov 1999 | A |
6044068 | El Malki | Mar 2000 | A |
6144937 | Ali | Nov 2000 | A |
6163608 | Romesburg et al. | Dec 2000 | A |
6263307 | Arslan et al. | Jul 2001 | B1 |
6336092 | Gibson et al. | Jan 2002 | B1 |
6493338 | Preston et al. | Dec 2002 | B1 |
6570444 | Wright | May 2003 | B2 |
6690681 | Preston et al. | Feb 2004 | B1 |
6741874 | Novorita et al. | May 2004 | B1 |
6771629 | Preston et al. | Aug 2004 | B1 |
6862558 | Huang | Mar 2005 | B2 |
6963649 | Vaudrey et al. | Nov 2005 | B2 |
7072831 | Etter | Jul 2006 | B1 |
7142533 | Ghobrial et al. | Nov 2006 | B2 |
7146324 | Den Brinker et al. | Dec 2006 | B2 |
7366161 | Mitchell et al. | Apr 2008 | B2 |
7580893 | Suzuki | Aug 2009 | B1 |
7716046 | Nongpiur et al. | May 2010 | B2 |
7773760 | Sakamoto et al. | Aug 2010 | B2 |
7792680 | Iser et al. | Sep 2010 | B2 |
8015002 | Li et al. | Sep 2011 | B2 |
20010006511 | Matt | Jul 2001 | A1 |
20010018650 | DeJaco | Aug 2001 | A1 |
20010054974 | Wright | Dec 2001 | A1 |
20030050767 | Bar-Or | Mar 2003 | A1 |
20030055646 | Yoshioka et al. | Mar 2003 | A1 |
20040066940 | Amir | Apr 2004 | A1 |
20040153313 | Aubauer et al. | Aug 2004 | A1 |
20040167777 | Hetherington et al. | Aug 2004 | A1 |
20050065792 | Gao | Mar 2005 | A1 |
20050119882 | Bou-Ghazale | Jun 2005 | A1 |
20060100868 | Hetherington et al. | May 2006 | A1 |
20060136203 | Ichikawa | Jun 2006 | A1 |
20060142999 | Takada et al. | Jun 2006 | A1 |
20060293016 | Giesbrecht et al. | Dec 2006 | A1 |
20070025281 | McFarland et al. | Feb 2007 | A1 |
20070058822 | Ozawa | Mar 2007 | A1 |
20070185711 | Jang et al. | Aug 2007 | A1 |
20070237271 | Pessoa et al. | Oct 2007 | A1 |
20080077399 | Yoshida | Mar 2008 | A1 |
20080120117 | Choo et al. | May 2008 | A1 |
20080262849 | Buck et al. | Oct 2008 | A1 |
20090112579 | Li et al. | Apr 2009 | A1 |
20090112584 | Li et al. | Apr 2009 | A1 |
20090216527 | Oshikiri | Aug 2009 | A1 |
Number | Date | Country |
---|---|---|
1 450 354 | Aug 2004 | EP |
2000-347688 | Dec 2000 | JP |
2002-171225 | Jun 2002 | JP |
2002-221988 | Aug 2002 | JP |
2004-254322 | Sep 2004 | JP |
WO 0173760 | Oct 2001 | WO |
Entry |
---|
Martinez et al. “Combination of adaptive filtering and spectral subtraction for noise removal”, Circuits and Systems, 2001. ISCAS 2001, pp. 793-796 vol. 2. |
Ephraim, Yariv et al., “Speech Enhancement Using a Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-32, No. 6, Dec. 1984, pp. 1109-1121. |
Ephraim, Y. et al., “Speech Enhancement Using a Minimum Mean-Square Error Log-Spectral Amplitude Estimator,” IEEE Transactions on Acoustic, Speech, and Signal Processing, vol. ASSP-33, No. 2, Apr. 1985, pp. 443-445. |
Linhard, Klaus et al., “Spectral Noise Subtraction with Recursive Gain Curves,” Daimler Benz AG, Research and Technology, Jan. 9, 1998, 4 pages. |
Extended European Search Report dated Jan. 23, 2012 for corresponding European Application No. 08018600.0, 11 pages. |
Office Action dated Apr. 10, 2012 for corresponding Japanese Patent Application No. 2008-273648, 10 pages. |
Number | Date | Country | |
---|---|---|---|
20090112579 A1 | Apr 2009 | US |
Number | Date | Country | |
---|---|---|---|
Parent | 11923358 | Oct 2007 | US |
Child | 12126682 | US |