1. Technical Field
This invention relates to signal processing systems, and more particularly to systems that estimate pitch.
2. Related Art
Some audio processing systems capture sound, reproduce sound, and convey sound to other devices. In some environments, unwanted components may reduce the clarity of a speech signal. Wind, engine noise and other background noises may obscure the signal. As the noise increases, the intelligibility of the speech may decrease.
Many speech signals may be classified into voiced and unvoiced segments. In the time domain, unvoiced segments display a noise-like structure; little or no periodicity may be apparent. In the speech spectrum, voiced speech segments have an almost periodic structure.
Some natural speech has a combination of a harmonic spectrum and a noise spectrum. A mixture of harmonics and noise may appear across a large bandwidth. Non-stationary and/or varying levels of noise may be highly objectionable, especially when the noise masks voiced segments and non-speech intervals. While the spectral characteristics of non-stationary noise may not vary greatly, its amplitude may vary drastically.
To facilitate reconstruction of a speech signal having voiced and unvoiced segments, it may be necessary to estimate the pitch of the signal during the voiced speech. Accurate pitch estimations may improve the perceptual quality of a processed speech segment. Therefore, there is a need for a system that facilitates the extraction of pitch from a speech signal.
A system extracts pitch from a speech signal. The system estimates the pitch in voiced speech by enhancing the signal through derived adaptive filter coefficients and estimating the pitch from those coefficients.
Other systems, methods, features and advantages of the invention will be, or will become, apparent to one with skill in the art upon examination of the following figures and detailed description. It is intended that all such additional systems, methods, features and advantages be included within this description, be within the scope of the invention, and be protected by the following claims.
The invention can be better understood with reference to the following drawings and description. The components in the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the invention. Moreover, in the figures, like referenced numerals designate corresponding parts throughout the different views.
Enhancement logic improves the perceptual quality of a processed speech signal. The logic may automatically identify and enhance speech segments. Selected voiced and/or unvoiced segments may be processed and amplified in one or more frequency bands. To improve perceptual quality, the pitch of the signal is estimated. The versatility of the system allows the enhancement logic to enhance speech before it is passed to or processed by a second system. In some applications, speech or other audio signals may be passed to a remote, local, or mobile system such as an automatic speech recognition engine that may capture and extract voice in the time and/or frequency domains.
The enhancement systems may interface or comprise a unitary part of a vehicle or a communication system (e.g., a wireless telephone, an automatic speech recognition system, etc.). The systems may include preprocessing logic and/or post-processing logic and may be implemented in hardware and/or software. In some systems, software is processed by a digital signal processor (DSP), a general purpose processor (GPP), or some combination of DSP and GPP. The DSP may execute instructions that delay an input signal, track frequency components of a signal, filter a signal, and/or reinforce selected spectral content. In other systems, the hardware or software may be programmed or implemented in discrete logic or circuitry, a combination of discrete and integrated logic or circuitry, and/or may be distributed across and executed by multiple controllers or processors.
The system for estimating the pitch in a speech signal locates the peak position, k, of the adaptive filter coefficients. The pitch may be estimated by equation 1:
fp = fs / (D + k) Equation 1
where fp is the estimated pitch, fs is the sampling frequency, the signal has been delayed by D samples before passing through an adaptive filter, and k is a peak position of the adaptive filter coefficients.
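As a concrete illustration (not taken from the specification; the function and parameter names are assumptions), Equation 1 can be evaluated directly once the delay D and the coefficient peak position k are known:

```python
def estimate_pitch_hz(fs, D, k):
    """Equation 1: pitch estimate from sampling rate fs, delay D,
    and peak position k of the adaptive filter coefficients."""
    return fs / (D + k)

# An 8 kHz signal whose coefficient peak sits at tap k = 79 after a
# one-sample delay corresponds to a 100 Hz pitch: 8000 / (1 + 79).
print(estimate_pitch_hz(8000, 1, 79))  # → 100.0
```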
The adaptive filter 108 passes output signal “y(n)”. The adaptive filter 108 may track one or more frequency components of the input signal based on the delayed input signal. The filter 108 tracks the fundamental frequencies of the input signal as the pitch changes during voiced speech. The filter 108 may comprise a Finite Impulse Response Filter (FIR) adapted by a Normalized Least Mean Squares (NLMS) technique or other adaptive filtering technique such as Recursive Least Squares (RLS) or Proportional NLMS.
In some enhancement systems the adaptive filter 108 changes or adapts its coefficients to match or approximate the response of the input signal “x(n)”. Using an adaptive filtering algorithm, the error signal “e(n)” is derived through adder logic or an adder circuit 110 (e.g., a vector adder) that subtracts the input signal “x(n)” from the adapted predicted output vector “y(n)”. As shown in equation 2:
e(n) = y(n) − x(n) Equation 2
Using this measure, the adaptive filter 108 changes its coefficients in an attempt to reduce the difference between the adapted predicted output vector “y(n)” and the discrete input signal “x(n).”
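A minimal sketch of such an adaptive predictor, using the NLMS update and the document's error convention e(n) = y(n) − x(n). The function name, tap count, and step size below are illustrative assumptions, not the patent's implementation:

```python
def nlms_coefficients(x, D=1, L=64, mu=0.5, eps=1e-8):
    """Adapt an L-tap FIR filter on the D-sample-delayed input to
    predict x(n); returns the final coefficient vector (NLMS)."""
    w = [0.0] * L
    for n in range(D + L - 1, len(x)):
        u = [x[n - D - i] for i in range(L)]      # delayed input vector
        y = sum(wi * ui for wi, ui in zip(w, u))  # predicted output
        e = y - x[n]                              # error, Equation 2
        norm = sum(ui * ui for ui in u) + eps     # NLMS normalization
        w = [wi - mu * e * ui / norm for wi, ui in zip(w, u)]
    return w

# For a periodic impulse train with a 50-sample period, the largest
# coefficient settles at tap k = 49, so D + k recovers the period.
x = [1.0 if n % 50 == 0 else 0.0 for n in range(2000)]
w = nlms_coefficients(x, D=1, L=64)
k = max(range(len(w)), key=lambda i: abs(w[i]))
print(k)  # → 49
```

With fs = 8000 this peak position gives fs / (D + k) = 8000 / 50 = 160 Hz, illustrating how the adapted coefficients carry the pitch information described above.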
The output of adaptive filter 108, “y(n)”, is processed by weighting logic or a weighting circuit 112 to yield a scalar output. In
When a speech signal “x(n)” is enhanced, the filter coefficients of adaptive filter 108 approximate the autocorrelation values of the speech signal. Therefore, these filter coefficients obtained by adaptive filter 108 may be used to estimate the pitch in voiced speech.
In
The periodic components of input signal “x(n)” form substantially organized regions on the plot. The periodic components of the signal contribute to larger peak values in the filter coefficients of the adaptive filter, while the background noise and non-periodic components of speech do not contribute to large filter coefficient values. By using
When the input signal “x(n)” is periodic, the filter coefficient values, which are similar to the autocorrelation function values, may be used to calculate the period of this signal. As expressed in equation 1, the pitch of signal “x(n)” may be approximated by the inverse of the position of the peak filter coefficient. The position of the filter coefficients is analogous to the lag of the autocorrelation function.
For clean speech with relatively low noise, the position of the peak of the filter coefficient values may yield the lag. Taking the inverse of the lag, the pitch frequency may be obtained. For example, in
The adaptive filter coefficients shown in
In
In operation, signal “x(n)” is passed through low-pass filter 502 before passing to adaptive filter 108. Digital delay unit 104 couples the input signal “x(n)” to the low-pass filter 502 and a programmable filter 506 that may have a single input and multiple outputs. While the system encompasses many techniques for choosing the coefficients of the programmable filter 506, in
In
One technique for improving the convergence rate of the adaptive filter in such conditions is to spectrally flatten the input signal before passing it to the adaptive filter. In
In
A leaky average may be used to reduce the adverse impact of tonal noise on the filter coefficients. The leaky average of the adaptive filter coefficients may be approximated by equation 3:
y(n)=(1−α)y(n−1)+αh(n) Equation 3
where y(n) is the leaky average vector of the filter coefficients, h(n) is the input filter coefficient vector, and α is the leakage factor. Because tonal noise remains relatively constant over time, taking a leaky average of the adaptive filter coefficients substantially captures any tonal noise present in the input signal. The leaky average obtained from equation 3 may then be subtracted from the substantially instantaneous estimate of the adaptive filter coefficients, yielding revised filter coefficients in which the effect of tonal noise is substantially reduced.
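Equation 3 and the subtraction step can be sketched as follows (the function names are illustrative assumptions):

```python
def leaky_average(y_prev, h, alpha=0.05):
    """Equation 3: y(n) = (1 - alpha) * y(n-1) + alpha * h(n),
    applied element-wise to the coefficient vectors."""
    return [(1 - alpha) * yp + alpha * hn for yp, hn in zip(y_prev, h)]

def remove_tonal_bias(h, y_avg):
    """Subtract the leaky average (the slowly varying, tonal part)
    from the instantaneous coefficient estimate."""
    return [hn - yn for hn, yn in zip(h, y_avg)]

# A constant (tonal) component is absorbed into the average over time,
# so the revised coefficients tend toward zero where only tone remains.
avg = [0.0, 0.0]
for _ in range(200):
    avg = leaky_average(avg, [1.0, 2.0], alpha=0.05)
revised = remove_tonal_bias([1.0, 2.0], avg)
print(revised)  # both entries near zero
```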
In
In
In
In
The delayed signal may be passed through a low-pass filter (Act 1306), or may be passed through spectral modification logic (Act 1308). The spectral modification logic substantially flattens the spectral character of all or a portion of the background noise before it is filtered by one or more (e.g., multistage) filters (e.g., a low-pass filter, high-pass filter, band-pass filter, and/or spectral mask) at optional Act 1308. In some methods, the frequency and amplitude of the background noise are detected during talk spurts and pauses and may be modeled by a linear predictive coding filter. In these and other methods, some or all of the background noise is substantially flattened; in other systems, some or all of the background noise is dampened. The noise may be dampened to a comfort noise level, a noise floor, or a predetermined level that a user expects to hear.
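The specification models the noise with a linear predictive coding filter; as a far simpler stand-in (an assumption for illustration only, not the patent's method), a first-order pre-emphasis filter crudely flattens a low-pass noise spectrum:

```python
def pre_emphasize(x, a=0.95):
    """Crude spectral flattening: y(n) = x(n) - a * x(n-1).
    A simple stand-in for the LPC-based whitening the
    specification describes; `a` is an assumed constant."""
    return [x[0]] + [x[n] - a * x[n - 1] for n in range(1, len(x))]

# A constant (maximally low-frequency) input is attenuated to 1 - a,
# while rapid sample-to-sample changes pass through largely intact.
y = pre_emphasize([1.0, 1.0, 1.0, 1.0])
print(y)
```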
An adaptive filter such as a moving average filter, nonrecursive discrete-time filter, or adaptive FIR filter may model a portion of the speech spectrum with the flattened or dampened noise spectrum at Act 1310. In some enhancement systems, the adaptive filter changes or adapts its coefficients to match or approximate the input signal “x(n)” at discrete points in time. Using an adaptive filtering algorithm, the error signal “e(n)” is derived through adder logic or an adder circuit (e.g., a vector adder) that subtracts the input signal “x(n)” from the adapted predicted output vector “y(n)”, as shown in equation 2 above.
In
At Act 1316, portions of the delayed input “x(n−D)” are processed by the programmed filter to yield a predictive output vector “ŷ(n)”. The predictive output vector “ŷ(n)” is then processed by weighting logic or a weighting circuit to yield a scalar output at Act 1318. In
The systems provide improved pitch estimation based on the calculated adaptive filter coefficients. The accuracy of the pitch estimate may vary with the pitch value. The accuracy of the pitch estimate from the filter coefficients may be expressed as:
Δfp = (fp)² / fs Equation 4
where “Δfp” is the pitch tolerance range, fp is the estimated pitch, and fs is the sampling frequency.
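Equation 4 implies the tolerance grows quadratically with the pitch, which can be checked directly (the function name is illustrative):

```python
def pitch_tolerance_hz(fp, fs):
    """Equation 4: resolution of the lag-based estimate, fp**2 / fs."""
    return fp ** 2 / fs

# At an 8 kHz sampling rate, a 100 Hz estimate is resolvable to about
# 1.25 Hz, while a 400 Hz estimate is only resolvable to about 20 Hz.
print(pitch_tolerance_hz(100, 8000), pitch_tolerance_hz(400, 8000))  # → 1.25 20.0
```

This follows because adjacent integer lags D + k and D + k + 1 map to pitches that differ by roughly fs/(D+k) − fs/(D+k+1) ≈ fp²/fs.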
Each of the systems and methods described above may be encoded in a signal bearing medium, a computer readable medium such as a memory, programmed within a device such as one or more integrated circuits, or processed by a controller, a digital signal processor, and/or a general purpose processor (GPP). If the methods are performed by software, the software may reside in a memory resident to or interfaced to the spectral modification logic 602, adaptive filter 108, programmed filter 506, or any other type of non-volatile or volatile memory interfaced or resident to the elements or logic that comprise the enhancement system. The memory may include an ordered listing of executable instructions for implementing logical functions. A logical function may be implemented through digital circuitry, through source code, through analog circuitry, or through an analog source such as an analog electrical or optical signal. The software may be embodied in any computer-readable or signal-bearing medium, for use by, or in connection with, an instruction executable system, apparatus, or device. Such a system may include a computer-based system, a processor-containing system, or another system that may selectively fetch instructions from an instruction executable system, apparatus, or device that may also execute instructions.
A “computer-readable medium,” “machine-readable medium,” “propagated-signal” medium, and/or “signal-bearing medium” may comprise any apparatus that contains, stores, communicates, propagates, or transports software for use by or in connection with an instruction executable system, apparatus, or device. The machine-readable medium may be, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. A non-exhaustive list of examples of a machine-readable medium would include: an electrical connection “electronic” having one or more wires, a portable magnetic or optical disk, a volatile memory such as a Random Access Memory “RAM” (electronic), a Read-Only Memory “ROM” (electronic), an Erasable Programmable Read-Only Memory (EPROM or Flash memory) (electronic), or an optical fiber (optical). A machine-readable medium may also include a tangible medium upon which software is printed, as the software may be electronically stored as an image or in another format (e.g., through an optical scan), then compiled, and/or interpreted or otherwise processed. The processed medium may then be stored in a computer and/or machine memory.
The enhancement system may be modified or adapted to any technology or devices. The above described enhancement systems may couple or interface remote or local automatic speech recognition “ASR” engines. The ASR engines may be embodied in instruments that convert voice and other sounds into a form that may be transmitted to remote locations, such as landline and wireless communication devices (including wireless protocols such as those described in this disclosure) that may include telephones and audio equipment and that may be in a device or structure that transports persons or things (e.g., a vehicle) or stand alone within the devices. Similarly, the enhancement may be embodied in a vehicle with ASR or without ASR.
The ASR engines may be embodied in telephone logic that in some devices is a unitary part of a vehicle control system or interfaces a vehicle control system. The enhancement system may couple pre-processing and post-processing logic, such as that described in U.S. application Ser. No. 10/973,575, “Periodic Signal Enhancement System,” filed Oct. 26, 2004, which is incorporated herein by reference. Similarly, all or some of the delay unit, adaptive filter, vector adder, and scalar adder may be modified or replaced by the enhancement system or logic described in U.S. application Ser. No. 10/973,575.
The speech enhancement system is also adaptable and may interface systems that detect and/or monitor sound wirelessly or through electrical or optical devices or methods. When certain sounds or interference are detected, the system may enable the enhancement system to prevent the amplification or gain adjustment of these sounds or interference. Through a bus, such as a communication bus, a noise detector may send a notice such as an interrupt (hardware or software interrupt) or a message to prevent the enhancement of these sounds or interference while enhancing some or all of the speech signal. In these applications, the enhancement logic may interface or be incorporated within one or more circuits, logic, systems, or methods described in “Method for Suppressing Wind Noise,” U.S. Ser. Nos. 10/410,736 and 10/688,802; and “System for Suppressing Rain Noise,” U.S. Ser. No. 11/006,935, each of which is incorporated herein by reference.
While various embodiments of the invention have been described, it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible within the scope of the invention. Accordingly, the invention is not to be restricted except in light of the attached claims and their equivalents.
This application is a continuation of U.S. application Ser. No. 11/298,052, filed Dec. 9, 2005, now U.S. Pat. No. 7,979,520, which is a continuation-in-part of U.S. application Ser. No. 10/973,575, filed Oct. 26, 2004, now U.S. Pat. No. 7,680,652. The disclosures of the above-identified applications are incorporated herein by reference.
Publication:

Number | Date | Country
---|---|---
20110276324 A1 | Nov 2011 | US

Related U.S. Application Data:

Relation | Number | Date | Country
---|---|---|---
Parent | 11298052 | Dec 2005 | US
Child | 13105612 | | US
Parent | 10973575 | Oct 2004 | US
Child | 11298052 | | US