1. Technical Field
This invention relates to signal processing systems, and more particularly to systems that estimate pitch.
2. Related Art
Some audio processing systems capture sound, reproduce sound, and convey sound to other devices. In some environments, unwanted components may reduce the clarity of a speech signal. Wind, engine noise and other background noises may obscure the signal. As the noise increases, the intelligibility of the speech may decrease.
Many speech signals may be classified into voiced and unvoiced segments. In the time domain, unvoiced segments display a noise-like structure; little or no periodicity may be apparent. In the speech spectrum, voiced speech segments have an almost periodic structure.
Some natural speech has a combination of a harmonic spectrum and a noise spectrum. A mixture of harmonics and noise may appear across a large bandwidth. Non-stationary and/or varying levels of noise may be highly objectionable, especially when the noise masks voiced segments and non-speech intervals. While the spectral characteristics of non-stationary noise may not vary greatly, its amplitude may vary drastically.
To facilitate reconstruction of a speech signal having voiced and unvoiced segments, it may be necessary to estimate the pitch of the signal during the voiced speech. Accurate pitch estimations may improve the perceptual quality of a processed speech segment. Therefore, there is a need for a system that facilitates the extraction of pitch from a speech signal.
A system extracts pitch from a speech signal. The system estimates the pitch in voiced speech by enhancing the signal through derived adaptive filter coefficients and estimating the pitch from those coefficients.
Other systems, methods, features and advantages of the invention will be, or will become, apparent to one with skill in the art upon examination of the following figures and detailed description. It is intended that all such additional systems, methods, features and advantages be included within this description, be within the scope of the invention, and be protected by the following claims.
The invention can be better understood with reference to the following drawings and description. The components in the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the invention. Moreover, in the figures, like referenced numerals designate corresponding parts throughout the different views.
Enhancement logic improves the perceptual quality of a processed speech signal. The logic may automatically identify and enhance speech segments. Selected voiced and/or unvoiced segments may be processed and amplified in one or more frequency bands. To improve perceptual quality, the pitch of the signal is estimated. The versatility of the system allows the enhancement logic to enhance speech before it is passed to or processed by a second system. In some applications, speech or other audio signals may be passed to a remote, local, or mobile system such as an automatic speech recognition engine that may capture and extract voice in the time and/or frequency domains.
The enhancement systems may interface or comprise a unitary part of a vehicle or a communication system (e.g., a wireless telephone, an automatic speech recognition system, etc.). The systems may include preprocessing logic and/or post-processing logic and may be implemented in hardware and/or software. In some systems, software is processed by a digital signal processor (DSP), a general purpose processor (GPP), or some combination of DSP and GPP. The DSP may execute instructions that delay an input signal, track frequency components of a signal, filter a signal, and/or reinforce selected spectral content. In other systems, the hardware or software may be programmed or implemented in discrete logic or circuitry, a combination of discrete and integrated logic or circuitry, and/or may be distributed across and executed by multiple controllers or processors.
The system for estimating the pitch in a speech signal locates the peak position, k, of the adaptive filter coefficients. The pitch may be estimated by equation 1:
fp = fs / (D + k) Equation 1
where fp is the estimated pitch, fs is the sampling frequency, the signal has been delayed by D samples before passing through an adaptive filter, and k is a peak position of the adaptive filter coefficients.
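As a concrete illustration (not taken from the specification; the function and parameter names are assumptions), Equation 1 can be evaluated directly once the delay D and the coefficient peak position k are known:

```python
def estimate_pitch_hz(fs, D, k):
    """Equation 1: pitch estimate from sampling rate fs, delay D,
    and peak position k of the adaptive filter coefficients."""
    return fs / (D + k)

# An 8 kHz signal whose coefficient peak sits at tap k = 79 after a
# one-sample delay corresponds to a 100 Hz pitch: 8000 / (1 + 79).
print(estimate_pitch_hz(8000, 1, 79))  # → 100.0
```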
The adaptive filter 108 passes output signal “y(n)”. The adaptive filter 108 may track one or more frequency components of the input signal based on the delayed input signal. The filter 108 tracks the fundamental frequencies of the input signal as the pitch changes during voiced speech. The filter 108 may comprise a Finite Impulse Response Filter (FIR) adapted by a Normalized Least Mean Squares (NLMS) technique or other adaptive filtering technique such as Recursive Least Squares (RLS) or Proportional NLMS.
In some enhancement systems the adaptive filter 108 changes or adapts its coefficients to match or approximate the response of the input signal “x(n)”. Using an adaptive filtering algorithm, the error signal “e(n)” is derived through adder logic or an adder circuit 110 (e.g., a vector adder) that subtracts the input signal “x(n)” from the adapted predicted output vector “y(n)”. As shown in equation 2:
e(n) = y(n) − x(n) Equation 2
Using this measure, the adaptive filter 108 changes its coefficients in an attempt to reduce the difference between the adapted predicted output vector “y(n)” and the discrete input signal “x(n).”
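A minimal sketch of such an adaptive predictor, using the NLMS update and the document's error convention e(n) = y(n) − x(n). The function name, tap count, and step size below are illustrative assumptions, not the patent's implementation:

```python
def nlms_coefficients(x, D=1, L=64, mu=0.5, eps=1e-8):
    """Adapt an L-tap FIR filter on the D-sample-delayed input to
    predict x(n); returns the final coefficient vector (NLMS)."""
    w = [0.0] * L
    for n in range(D + L - 1, len(x)):
        u = [x[n - D - i] for i in range(L)]      # delayed input vector
        y = sum(wi * ui for wi, ui in zip(w, u))  # predicted output
        e = y - x[n]                              # error, Equation 2
        norm = sum(ui * ui for ui in u) + eps     # NLMS normalization
        w = [wi - mu * e * ui / norm for wi, ui in zip(w, u)]
    return w

# For a periodic impulse train with a 50-sample period, the largest
# coefficient settles at tap k = 49, so D + k recovers the period.
x = [1.0 if n % 50 == 0 else 0.0 for n in range(2000)]
w = nlms_coefficients(x, D=1, L=64)
k = max(range(len(w)), key=lambda i: abs(w[i]))
print(k)  # → 49
```

With fs = 8000 this peak position gives fs / (D + k) = 8000 / 50 = 160 Hz, illustrating how the adapted coefficients carry the pitch information described above.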
The output of adaptive filter 108, “y(n)”, is processed by weighting logic or a weighting circuit 112 to yield a scalar output. In
When a speech signal “x(n)” is enhanced, the filter coefficients of adaptive filter 108 approximate the autocorrelation values of the speech signal. Therefore, these filter coefficients obtained by adaptive filter 108 may be used to estimate the pitch in voiced speech.
In
The periodic components of input signal “x(n)” form substantially organized regions on the plot. The periodic components of the signal contribute to larger peak values in the filter coefficients of the adaptive filter, while the background noise and non-periodic components of speech do not contribute to large filter coefficient values. By using
When the input signal “x(n)” is periodic, the filter coefficient values, which are similar to the autocorrelation function values, may be used to calculate the period of this signal. As expressed in equation 1, the pitch of signal “x(n)” may be approximated by the inverse of the position of the peak filter coefficient. The position of the filter coefficients is analogous to the lag of the autocorrelation function.
For clean speech with relatively low noise, the position of the peak of the filter coefficient values may yield the lag. Taking the inverse of the lag, the pitch frequency may be obtained. For example, in
The adaptive filter coefficients shown in
In
In operation, signal “x(n)” is passed through low-pass filter 502 before passing to adaptive filter 108. Digital delay unit 104 couples the input signal “x(n)” to the low-pass filter 502 and a programmable filter 506 that may have a single input and multiple outputs. While the system encompasses many techniques for choosing the coefficients of the programmable filter 506, in
In
One technique for improving the convergence rate of the adaptive filter in such conditions is to spectrally flatten the input signal before passing it to the adaptive filter. In
In
A leaky average may be used to reduce the adverse impact of tonal noise on the filter coefficients. The leaky average of the adaptive filter coefficients may be approximated by equation 3:
y(n)=(1−α)y(n−1)+αh(n) Equation 3
where y(n) is the leaky average vector of the filter coefficients, h(n) is the input filter coefficient vector, and α is the leakage factor. Because tonal noise remains relatively constant over time, taking a leaky average of the adaptive filter coefficients substantially captures any tonal noise present in the input signal. The leaky average obtained from equation 3 may then be subtracted from the substantially instantaneous estimate of the adaptive filter coefficients, yielding revised filter coefficients in which the effect of tonal noise is substantially reduced.
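Equation 3 and the subtraction step can be sketched as follows (the function names are illustrative assumptions):

```python
def leaky_average(y_prev, h, alpha=0.05):
    """Equation 3: y(n) = (1 - alpha) * y(n-1) + alpha * h(n),
    applied element-wise to the coefficient vectors."""
    return [(1 - alpha) * yp + alpha * hn for yp, hn in zip(y_prev, h)]

def remove_tonal_bias(h, y_avg):
    """Subtract the leaky average (the slowly varying, tonal part)
    from the instantaneous coefficient estimate."""
    return [hn - yn for hn, yn in zip(h, y_avg)]

# A constant (tonal) component is absorbed into the average over time,
# so the revised coefficients tend toward zero where only tone remains.
avg = [0.0, 0.0]
for _ in range(200):
    avg = leaky_average(avg, [1.0, 2.0], alpha=0.05)
revised = remove_tonal_bias([1.0, 2.0], avg)
print(revised)  # both entries near zero
```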
In
In
In
In
The delayed signal may be passed through a low-pass filter (Act 1306), or may be passed through spectral modification logic (Act 1308). The spectral modification logic substantially flattens the spectral character of all or a portion of the background noise before it is filtered by one or more (e.g., multistage) filters (e.g., a low-pass filter, high-pass filter, band-pass filter, and/or spectral mask) at optional Act 1308. In some methods, the frequency and amplitude of the background noise are detected during talk spurts and pauses and may be modeled by a linear predictive coding filter. In these and other methods, some or all of the background noise is substantially flattened; in other systems, some or all of the background noise is dampened. The noise may be dampened to a comfort noise level, a noise floor, or a predetermined level that a user expects to hear.
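The specification models the noise with a linear predictive coding filter; as a far simpler stand-in (an assumption for illustration only, not the patent's method), a first-order pre-emphasis filter crudely flattens a low-pass noise spectrum:

```python
def pre_emphasize(x, a=0.95):
    """Crude spectral flattening: y(n) = x(n) - a * x(n-1).
    A simple stand-in for the LPC-based whitening the
    specification describes; `a` is an assumed constant."""
    return [x[0]] + [x[n] - a * x[n - 1] for n in range(1, len(x))]

# A constant (maximally low-frequency) input is attenuated to 1 - a,
# while rapid sample-to-sample changes pass through largely intact.
y = pre_emphasize([1.0, 1.0, 1.0, 1.0])
print(y)
```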
An adaptive filter such as a moving average filter, nonrecursive discrete-time filter, or adaptive FIR filter may model a portion of the speech spectrum with the flattened or dampened noise spectrum at Act 1310. In some enhancement systems, the adaptive filter changes or adapts its coefficients to match or approximate the input signal “x(n)” at discrete points in time. Using an adaptive filtering algorithm, the error signal “e(n)” is derived through adder logic or an adder circuit (e.g., a vector adder) that subtracts the input signal “x(n)” from the adapted predicted output vector “y(n)”, as shown in equation 2 above.
In
At Act 1316, portions of the delayed input “x(n−D)” are processed by the programmed filter to yield a predictive output vector “ŷ(n)”. The predictive output vector “ŷ(n)” is then processed by weighting logic or a weighting circuit to yield a scalar output at Act 1318. In
The systems provide improved pitch estimation based on the calculated adaptive filter coefficients. The accuracy of the pitch estimate may vary with the pitch value. The accuracy of the pitch estimate from the filter coefficients may be expressed as:
Δfp = (fp)² / fs Equation 4
where “Δfp” is the pitch tolerance range, fp is the estimated pitch, and fs is the sampling frequency.
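Equation 4 implies the tolerance grows quadratically with the pitch, which can be checked directly (the function name is illustrative):

```python
def pitch_tolerance_hz(fp, fs):
    """Equation 4: resolution of the lag-based estimate, fp**2 / fs."""
    return fp ** 2 / fs

# At an 8 kHz sampling rate, a 100 Hz estimate is resolvable to about
# 1.25 Hz, while a 400 Hz estimate is only resolvable to about 20 Hz.
print(pitch_tolerance_hz(100, 8000), pitch_tolerance_hz(400, 8000))  # → 1.25 20.0
```

This follows because adjacent integer lags D + k and D + k + 1 map to pitches that differ by roughly fs/(D+k) − fs/(D+k+1) ≈ fp²/fs.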
Each of the systems and methods described above may be encoded in a signal bearing medium, a computer readable medium such as a memory, programmed within a device such as one or more integrated circuits, or processed by a controller, a digital signal processor, and/or a general purpose processor (GPP). If the methods are performed by software, the software may reside in a memory resident to or interfaced to the spectral modification logic 602, adaptive filter 108, programmed filter 506, or any other type of non-volatile or volatile memory interfaced or resident to the elements or logic that comprise the enhancement system. The memory may include an ordered listing of executable instructions for implementing logical functions. A logical function may be implemented through digital circuitry, through source code, through analog circuitry, or through an analog source such as an analog electrical or optical signal. The software may be embodied in any computer-readable or signal-bearing medium, for use by, or in connection with, an instruction executable system, apparatus, or device. Such a system may include a computer-based system, a processor-containing system, or another system that may selectively fetch instructions from an instruction executable system, apparatus, or device that may also execute instructions.
A “computer-readable medium,” “machine-readable medium,” “propagated-signal” medium, and/or “signal-bearing medium” may comprise any apparatus that contains, stores, communicates, propagates, or transports software for use by or in connection with an instruction executable system, apparatus, or device. The machine-readable medium may be, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. A non-exhaustive list of examples of a machine-readable medium would include: an electrical connection “electronic” having one or more wires, a portable magnetic or optical disk, a volatile memory such as a Random Access Memory “RAM” (electronic), a Read-Only Memory “ROM” (electronic), an Erasable Programmable Read-Only Memory (EPROM or Flash memory) (electronic), or an optical fiber (optical). A machine-readable medium may also include a tangible medium upon which software is printed, as the software may be electronically stored as an image or in another format (e.g., through an optical scan), then compiled, and/or interpreted or otherwise processed. The processed medium may then be stored in a computer and/or machine memory.
The enhancement system may be modified or adapted to any technology or devices. The above described enhancement systems may couple or interface remote or local automatic speech recognition “ASR” engines. The ASR engines may be embodied in instruments that convert voice and other sounds into a form that may be transmitted to remote locations, such as landline and wireless communication devices (including wireless protocols such as those described in this disclosure) that may include telephones and audio equipment and that may be in a device or structure that transports persons or things (e.g., a vehicle) or stand alone within the devices. Similarly, the enhancement may be embodied in a vehicle with ASR or without ASR.
The ASR engines may be embodied in telephone logic that in some devices is a unitary part of a vehicle control system or interfaces a vehicle control system. The enhancement system may couple pre-processing and post-processing logic, such as that described in U.S. application Ser. No. 10/973,575, “Periodic Signal Enhancement System,” filed Oct. 26, 2004, which is incorporated herein by reference. Similarly, all or some of the delay unit, adaptive filter, vector adder, and scalar adder may be modified or replaced by the enhancement system or logic described in U.S. application Ser. No. 10/973,575.
The speech enhancement system is also adaptable and may interface systems that detect and/or monitor sound wirelessly or through electrical or optical devices or methods. When certain sounds or interference are detected, the system may enable the enhancement system to prevent the amplification or gain adjustment of these sounds or interference. Through a bus, such as a communication bus, a noise detector may send a notice such as an interrupt (hardware or software interrupt) or a message to prevent the enhancement of these sounds or interference while enhancing some or all of the speech signal. In these applications, the enhancement logic may interface or be incorporated within one or more circuits, logic, systems, or methods described in “Method for Suppressing Wind Noise,” U.S. Ser. Nos. 10/410,736 and 10/688,802; and “System for Suppressing Rain Noise,” U.S. Ser. No. 11/006,935, each of which is incorporated herein by reference.
While various embodiments of the invention have been described, it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible within the scope of the invention. Accordingly, the invention is not to be restricted except in light of the attached claims and their equivalents.
This application is a continuation of U.S. application Ser. No. 11/298,052, filed Dec. 9, 2005, now U.S. Pat. No. 7,979,520, which is a continuation-in-part of U.S. application Ser. No. 10/973,575, filed Oct. 26, 2004, now U.S. Pat. No. 7,680,652. The disclosures of the above-identified applications are incorporated herein by reference.
Publication:

Number | Date | Country
---|---|---
20110276324 A1 | Nov 2011 | US

Related U.S. Application Data:

Relation | Number | Date | Country
---|---|---|---
Parent | 11298052 | Dec 2005 | US
Child | 13105612 | | US
Parent | 10973575 | Oct 2004 | US
Child | 11298052 | | US