Claims
- 1. A method for performing pitch estimation comprising:
- receiving a speech frame comprising a plurality of speech samples;
- determining an order-two inverse filter for said speech frame using said plurality of speech samples;
- determining a dominant formant frequency from coefficients of the order-two inverse filter;
- calculating an autocorrelation function for said speech frame; and
- estimating a pitch period for said speech frame using said autocorrelation function, wherein said estimating includes using said dominant formant frequency to discriminate a dominant formant from pitch information in the autocorrelation function;
- wherein said determining an order-two inverse filter for said speech frame comprises:
- computing a plurality of candidate order-two inverse filters at a plurality of locations in said speech frame, wherein said computing generates a set of coefficients for each of said candidate order-two inverse filters;
- computing an energy value for each of said candidate order-two inverse filters, wherein said energy value is computed from said set of coefficients of the corresponding candidate order-two inverse filter;
- identifying a minimizing order-two inverse filter with a minimum energy value among said plurality of candidate order-two inverse filters as said order-two inverse filter.
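The candidate search in claim 1 can be sketched in Python. This is a minimal illustration, not the patented implementation. It makes several assumptions:

- The predictor sign convention is $x[n] \approx a_1 x[n-1] + a_2 x[n-2]$, with inverse filter $1 - a_1 z^{-1} - a_2 z^{-2}$. This matches the discriminant $d = a_1^2 + 4a_2$ in claim 7.
- Each candidate filter is fitted by the autocorrelation method.
- Candidate windows are placed at a fixed hop.

The function names and the `win_len`/`hop` parameters are hypothetical.

```python
import math
from typing import Sequence, Tuple


def order2_lpc(x: Sequence[float]) -> Tuple[float, float]:
    """Order-two LPC by the autocorrelation method.

    Assumes the predictor form x[n] ~ a1*x[n-1] + a2*x[n-2].
    """
    n = len(x)
    r0, r1, r2 = (sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(3))
    if r0 <= 0.0:
        return 0.0, 0.0  # silent window: nothing to predict
    rho1, rho2 = r1 / r0, r2 / r0
    den = 1.0 - rho1 * rho1
    if den <= 1e-12:
        return 0.0, 0.0  # numerically degenerate window
    # Closed-form solution of the 2x2 normal equations.
    a1 = rho1 * (1.0 - rho2) / den
    a2 = (rho2 - rho1 * rho1) / den
    return a1, a2


def energy_from_coeffs(a1: float, a2: float) -> float:
    """Normalized prediction-error energy from (a1, a2) via reflection coefficients."""
    k2 = a2
    k1 = a1 / (1.0 - a2)  # step-down recursion; 1 - a2 > 0 for a stable fit
    return (1.0 - k1 * k1) * (1.0 - k2 * k2)


def select_inverse_filter(frame: Sequence[float], win_len: int,
                          hop: int) -> Tuple[float, int, float, float]:
    """Return (energy, start, a1, a2) for the minimum-energy candidate window."""
    best = None
    for start in range(0, len(frame) - win_len + 1, hop):
        a1, a2 = order2_lpc(frame[start:start + win_len])
        e = energy_from_coeffs(a1, a2)
        if best is None or e < best[0]:
            best = (e, start, a1, a2)
    if best is None:
        raise ValueError("frame is shorter than one analysis window")
    return best
```

A low normalized energy means the second-order filter predicts that stretch of samples well. In practice this tends to happen where a single strong resonance dominates.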
- 2. The method of claim 1, wherein said computing of each of said candidate order-two inverse filters comprises analyzing a number of speech samples which spans less than a full pitch period in time duration.
- 3. The method of claim 2, wherein said number of speech samples is determined using the pitch value estimated from a previous speech frame.
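One way to choose the analysis length for claims 2 and 3 is to take a fixed fraction of the previous frame's pitch estimate. That keeps each candidate window shorter than one pitch period. All of the following are hypothetical choices for illustration:

- the 0.75 fraction;
- the 40-sample default (5 ms at an assumed 8 kHz sampling rate);
- the 16-sample floor;
- the function name.

The claims only require that the window span less than a full pitch period and be derived from the previous pitch value.

```python
from typing import Optional


def analysis_window_length(prev_pitch: Optional[int],
                           fraction: float = 0.75,
                           default_len: int = 40,
                           min_len: int = 16) -> int:
    """Samples per candidate order-two analysis (hypothetical rule).

    prev_pitch is the pitch period, in samples, estimated for the previous frame.
    """
    if prev_pitch is None or prev_pitch <= 1:
        return default_len  # no usable history: fall back to a fixed length
    n = int(fraction * prev_pitch)
    n = max(n, min_len)            # enough samples for a stable second-order fit
    return min(n, prev_pitch - 1)  # but always shorter than one pitch period
```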
- 4. The method of claim 1, wherein said computing a candidate order-two inverse filter comprises performing an order-two Linear Predictive Coding (LPC) analysis.
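The order-two LPC analysis of claim 4 can be carried out with the Levinson-Durbin recursion, stopped after two steps. The recursion yields the filter coefficients and the reflection coefficients together. This sketch assumes the autocorrelation method and the predictor convention $x[n] \approx a_1 x[n-1] + a_2 x[n-2]$. The helper names are hypothetical.

```python
from typing import List, Sequence, Tuple


def autocorr(x: Sequence[float], max_lag: int) -> List[float]:
    """Biased (autocorrelation-method) estimates R[0..max_lag]."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_order2(r: Sequence[float]) -> Tuple[float, float, float, float, float]:
    """Levinson-Durbin recursion run to order two.

    Returns (a1, a2, k1, k2, e_norm), where e_norm is the final prediction
    error divided by R[0], i.e. (1 - k1**2) * (1 - k2**2).
    """
    if r[0] <= 0.0:
        raise ValueError("R[0] must be positive")
    err = r[0]
    # Order 1.
    k1 = r[1] / err
    a11 = k1
    err *= 1.0 - k1 * k1
    if err <= 0.0:
        raise ValueError("degenerate autocorrelation sequence")
    # Order 2: new reflection coefficient, then update the order-1 coefficient.
    k2 = (r[2] - a11 * r[1]) / err
    a2 = k2
    a1 = a11 - k2 * a11
    err *= 1.0 - k2 * k2
    return a1, a2, k1, k2, err / r[0]
```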
- 5. The method of claim 1, wherein said set of coefficients generated for each of said candidate order-two inverse filters includes a pair of filter coefficients $a_1$ and $a_2$.
- 6. The method of claim 5, wherein said computing an energy value for each of said candidate order-two inverse filters comprises:
- calculating a corresponding pair of reflection coefficients $k_1$ and $k_2$ from the corresponding filter coefficients $a_1$ and $a_2$ according to the relations ##EQU17##; and
- calculating the energy value from the corresponding reflection coefficients according to the relation $E = (1 - k_1^2)(1 - k_2^2)$.
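Claim 6 computes the energy from the filter coefficients by first recovering the reflection coefficients. The relations in ##EQU17## are not reproduced in this text. The sketch therefore assumes the predictor convention $x[n] \approx a_1 x[n-1] + a_2 x[n-2]$. Under that convention, the standard step-down (inverse Levinson) recursion gives $k_2 = a_2$ and $k_1 = a_1 / (1 - a_2)$. Sign conventions for $a_i$ and $k_i$ vary between texts, but $E$ depends only on $k_1^2$ and $k_2^2$. The function names are hypothetical.

```python
from typing import Tuple


def reflection_from_filter(a1: float, a2: float) -> Tuple[float, float]:
    """Step-down from order-two predictor coefficients to (k1, k2).

    Assumes x[n] ~ a1*x[n-1] + a2*x[n-2].
    """
    if abs(a2) >= 1.0:
        raise ValueError("|a2| must be < 1 for a stable order-two filter")
    k2 = a2
    k1 = a1 / (1.0 - a2)
    return k1, k2


def filter_energy(a1: float, a2: float) -> float:
    """E = (1 - k1^2)(1 - k2^2): normalized residual energy of the candidate."""
    k1, k2 = reflection_from_filter(a1, a2)
    return (1.0 - k1 * k1) * (1.0 - k2 * k2)
```

For example, $a_1 = 1.35$ and $a_2 = -0.5$ step down to $k_1 = 0.9$ and $k_2 = -0.5$. This gives $E = 0.19 \times 0.75 = 0.1425$.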
- 7. The method of claim 5, wherein said determining a dominant formant frequency comprises:
- calculating a discriminant $d$ according to the equation $d = a_1^2 + 4a_2$, wherein $a_1$ and $a_2$ denote the coefficients of the order-two inverse filter;
- calculating the angle of the complex number ##EQU18##; and
- multiplying said angle by a scaling factor, wherein said scaling factor equals the sampling rate for said speech frame divided by $2\pi$.
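Claim 7 reads the dominant formant off the pole angle of the order-two filter. The complex number in ##EQU18## is not reproduced here. The sketch assumes it is the pole $(a_1 + j\sqrt{-d})/2$, a root of $z^2 - a_1 z - a_2$ under the predictor convention $x[n] \approx a_1 x[n-1] + a_2 x[n-2]$, which is consistent with $d = a_1^2 + 4a_2$. A complex-conjugate pole pair exists only when $d < 0$. Returning `None` when $d \ge 0$ is an assumption, since the claim does not say how real poles are handled. The function name is hypothetical.

```python
import cmath
import math
from typing import Optional


def dominant_formant_hz(a1: float, a2: float, fs: float) -> Optional[float]:
    """Dominant formant frequency (Hz) from order-two inverse-filter coefficients.

    Assumes x[n] ~ a1*x[n-1] + a2*x[n-2], so the poles are the roots of
    z**2 - a1*z - a2, i.e. (a1 +/- sqrt(d)) / 2 with d = a1**2 + 4*a2.
    """
    d = a1 * a1 + 4.0 * a2
    if d >= 0.0:
        return None  # two real poles: no resonance to report (assumed handling)
    pole = complex(a1, math.sqrt(-d)) / 2.0  # upper-half-plane pole
    angle = cmath.phase(pole)                 # radians, in (0, pi)
    return angle * fs / (2.0 * math.pi)       # scale by fs / (2*pi)
```

As a worked case, take pole radius $r = 0.95$ and a 500 Hz resonance at $f_s = 8000$ Hz. Then $a_1 = 2r\cos(\pi/8)$, $a_2 = -r^2$, and the function returns 500 Hz. Expressed in samples, the corresponding dominant formant period is $f_s / F = 16$.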
- 8. A system for estimating the pitch period of a speech waveform comprising:
- an input for receiving a plurality of speech samples;
- at least one processor coupled to said input;
- wherein said at least one processor determines an order-two inverse filter based on said plurality of speech samples;
- wherein said at least one processor determines a dominant formant frequency from coefficients of the order-two inverse filter;
- wherein said at least one processor calculates an autocorrelation function for said plurality of speech samples;
- wherein said at least one processor estimates a pitch period for said plurality of speech samples using the autocorrelation function, wherein said at least one processor uses said dominant formant frequency to discriminate a dominant formant from pitch information in the autocorrelation function;
- wherein, in determining the order-two inverse filter, said at least one processor:
- computes a plurality of candidate order-two inverse filters at a plurality of locations in said speech frame, wherein said computing generates a set of coefficients for each of said candidate order-two inverse filters;
- computes an energy value for each of said candidate order-two inverse filters, wherein said energy value is computed from said set of coefficients of the corresponding candidate order-two inverse filter;
- identifies a minimizing order-two inverse filter with a minimum energy value among said plurality of candidate order-two inverse filters as said order-two inverse filter.
- 9. The system of claim 8, wherein in computing each of said candidate order-two inverse filters said at least one processor analyzes a number of speech samples which spans less than a full pitch period in time duration.
- 10. The system of claim 9, wherein said number of speech samples is determined using the pitch value estimated from a previous speech frame.
- 11. The system of claim 8, wherein in computing a candidate order-two inverse filter said at least one processor performs an order-two Linear Predictive Coding (LPC) analysis.
- 12. The system of claim 8, wherein said set of coefficients generated for each of said candidate order-two inverse filters comprises a pair of filter coefficients $a_1$ and $a_2$.
- 13. The system of claim 12, wherein, in computing the energy value for each of said candidate order-two inverse filters, said at least one processor calculates a corresponding pair of reflection coefficients $k_1$ and $k_2$ from the corresponding filter coefficients $a_1$ and $a_2$ according to the relations ##EQU19## and calculates the energy value according to the equation $E = (1 - k_1^2)(1 - k_2^2)$.
- 14. The system of claim 13, wherein, in determining a dominant formant frequency, said at least one processor:
- calculates a discriminant $d$ according to the equation $d = a_1^2 + 4a_2$, wherein $a_1$ and $a_2$ denote the coefficients of the order-two inverse filter;
- calculates the angle of the complex number ##EQU20##; and
- multiplies said angle by a scaling factor, wherein said scaling factor equals the sampling rate for said speech frame divided by $2\pi$.
- 15. A method for performing pitch estimation comprising:
- receiving a speech frame comprising a plurality of speech samples;
- determining an order-two inverse filter for said speech frame using said plurality of speech samples;
- determining a dominant formant frequency from coefficients of the order-two inverse filter;
- calculating an autocorrelation function for said speech frame; and
- estimating a pitch period for said speech frame using said autocorrelation function, wherein said estimating includes using said dominant formant frequency to discriminate a dominant formant from pitch information in the autocorrelation function;
- wherein said estimating a pitch period further comprises:
- identifying a list of time-delays corresponding to peaks in the autocorrelation function;
- setting the pitch period equal to the dominant formant period if the dominant formant period and its second, third, fourth, and fifth multiples occur in said list of time-delays, wherein said dominant formant period is the inverse of the dominant formant frequency; and
- if it is not the case that the dominant formant period and its first, second, third, fourth, and fifth multiples occur in said list of time-delays, removing the dominant formant period from the list of time-delays and, after said removing, scanning a remaining list of time-delays.
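The decision rule in claim 15 can be sketched as follows. Several details are assumptions not stated in the claim:

- Peaks are the positive local maxima of a biased autocorrelation.
- A multiple "occurs" in the list if some peak lag lies within `tol` samples of it.
- The dominant formant period is expressed in samples as $f_s / F$.
- The scan of the remaining list, which the claim does not specify, picks the remaining peak with the largest autocorrelation value.

All names are hypothetical.

```python
from typing import List, Optional, Sequence


def autocorrelation(x: Sequence[float], max_lag: int) -> List[float]:
    """Biased autocorrelation R[0..max_lag] of one speech frame."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def peak_lags(r: Sequence[float], min_lag: int = 2) -> List[int]:
    """Time-delays (lags) at positive local maxima of the autocorrelation."""
    return [k for k in range(max(min_lag, 1), len(r) - 1)
            if r[k] > r[k - 1] and r[k] >= r[k + 1] and r[k] > 0.0]


def _occurs(lag: float, lags: Sequence[int], tol: int) -> bool:
    """True if some peak lag lies within tol samples of the given lag."""
    return any(abs(candidate - lag) <= tol for candidate in lags)


def estimate_pitch(r: Sequence[float], formant_period: float,
                   tol: int = 1, min_lag: int = 2) -> Optional[int]:
    """Pitch period in samples, using the dominant formant period to screen peaks."""
    lags = peak_lags(r, min_lag)
    if not lags:
        return None
    # First through fifth multiples of the dominant formant period all present.
    if all(_occurs(m * formant_period, lags, tol) for m in range(1, 6)):
        return round(formant_period)
    # Otherwise drop the formant period and scan what is left.
    remaining = [lag for lag in lags if abs(lag - formant_period) > tol]
    if not remaining:
        return None
    # Hypothetical scan rule: strongest remaining autocorrelation peak.
    return max(remaining, key=lambda lag: r[lag])
```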
- 16. A system for estimating the pitch period of a speech waveform comprising:
- an input for receiving a plurality of speech samples;
- at least one processor coupled to said input;
- wherein said at least one processor determines an order-two inverse filter based on said plurality of speech samples;
- wherein said at least one processor determines a dominant formant frequency from coefficients of the order-two inverse filter;
- wherein said at least one processor calculates an autocorrelation function for said plurality of speech samples;
- wherein said at least one processor estimates a pitch period for said plurality of speech samples using the autocorrelation function, wherein said at least one processor uses said dominant formant frequency to discriminate a dominant formant from pitch information in the autocorrelation function;
- wherein in estimating said pitch period said at least one processor:
- identifies a list of time-delays corresponding to peaks in the autocorrelation function;
- sets the pitch period equal to the dominant formant period if the dominant formant period and its second, third, fourth, and fifth multiples occur in said list of time-delays, wherein said dominant formant period is the inverse of the dominant formant frequency; and
- if it is not the case that the dominant formant period and its first, second, third, fourth, and fifth multiples occur in said list of time-delays, removes the dominant formant period from the list of time-delays and, after said removal, scans a remaining list of time-delays.
CONTINUATION DATA
This is a continuation-in-part of application Ser. No. 08/647,843 titled "System and Method for Improved Pitch Estimation Which Performs First Formant Energy Removal For A Frame Using Coefficients From A Prior Frame" filed May 15, 1996, now U.S. Pat. No. 5,937,374, whose inventors are John G. Bartkowiak and Mark A. Ireton.
Continuation in Parts (1)
| | Number | Date | Country |
| Parent | 647843 | May 1996 | |