Claims
- 1. A low bit rate codec for coding and decoding a speech signal comprising:
- means for receiving the speech signal and dividing the speech signal into speech frames;
- linear predictive code analysis means operative on a speech frame for performing linear predictive code analysis on a first and a second linear prediction window, the first linear prediction window being centered at the middle of the speech frame and the second linear prediction window being centered at the edge of the speech frame, wherein the linear predictive code analysis means generates a first set of filter coefficients for the first linear prediction window and a second set of filter coefficients for the second linear prediction window;
- pitch estimation means for generating a pitch estimate for each of a first and a second pitch estimation window, the first pitch estimation window being centered at the middle of the speech frame and the second pitch estimation window being centered at the edge of the speech frame;
- mode classification means responsive to the first and the second sets of filter coefficients and the first and the second pitch estimates, for classifying the speech frame into one of a plurality of modes, wherein a first mode is predominantly voiced and a second mode is not predominantly voiced;
- encoding means for encoding the speech frame based on the classified mode of the speech frame, wherein, for a speech frame classified in the first mode, the encoded speech frame encodes information derived from the second set of filter coefficients and the second pitch estimate, and for a speech frame classified in the second mode, the encoded speech frame encodes information derived from the first and the second sets of filter coefficients;
- transmitting means for transmitting the encoded speech frame;
- receiving means for receiving a transmission of an encoded speech frame and identifying the transmitted speech frame as one of a first mode and a second mode speech frame; and
- decoder means for decoding the transmitted speech frame in a mode-specific manner based on the identified mode of the transmitted speech frame.
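For illustration, the two-window linear predictive analysis recited in claim 1 can be sketched as follows. This is a minimal sketch, not the patent's implementation: the frame length, window length, predictor order, Hamming windowing, and the autocorrelation/Levinson-Durbin method are all assumptions made for the example.

```python
import numpy as np

FRAME_LEN = 160   # assumed: 20 ms frame at 8 kHz sampling
WIN_LEN = 160     # assumed analysis window length
LPC_ORDER = 10    # assumed predictor order

def levinson_durbin(r, order):
    """Filter coefficients a = [1, a1, ..., ap] from autocorrelations r[0..p]."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                      # reflection coefficient
        a[1:i + 1] = a[1:i + 1] + k * a[i - 1::-1]
        err *= 1.0 - k * k
    return a

def two_window_lpc(signal, frame_start):
    """One coefficient set per analysis window: the first window is centered
    at the middle of the frame, the second at its trailing edge."""
    centers = (frame_start + FRAME_LEN // 2,    # first window: frame middle
               frame_start + FRAME_LEN)         # second window: frame edge
    coeff_sets = []
    for c in centers:
        seg = signal[c - WIN_LEN // 2:c + WIN_LEN // 2] * np.hamming(WIN_LEN)
        r = np.correlate(seg, seg, mode="full")[WIN_LEN - 1:WIN_LEN + LPC_ORDER]
        coeff_sets.append(levinson_durbin(r, LPC_ORDER))
    return coeff_sets   # [first set, second set]
```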
- 2. The low bit rate codec recited in claim 1 wherein said pitch estimation means comprises:
- error computing means receiving data for computing an error function for each of the first and the second pitch estimation windows;
- refining means responsive to the computed error functions for refining past pitch estimates;
- pitch tracking means responsive to said refined past pitch estimates for producing a set of pitch candidates for each of the first and the second pitch estimation windows; and
- a pitch selector for selecting and outputting a pitch estimate from the set of pitch candidates for each of the first and the second pitch estimation windows.
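A minimal sketch of the pitch-estimation pipeline of claim 2 follows, assuming a normalized-autocorrelation error function and a simple continuity-based tracking and selection rule; the lag range, candidate count, refinement radius, and the 0.1 continuity weight are illustrative assumptions, not the patent's parameters.

```python
import numpy as np

PITCH_MIN, PITCH_MAX = 20, 147   # assumed lag range in samples at 8 kHz

def error_function(window):
    """Error computation: normalized autocorrelation over the lag range."""
    e = np.zeros(PITCH_MAX + 1)
    for lag in range(PITCH_MIN, PITCH_MAX + 1):
        x, y = window[lag:], window[:-lag]
        e[lag] = np.dot(x, y) / (np.sqrt(np.dot(x, x) * np.dot(y, y)) + 1e-9)
    return e

def refine_past_pitch(past_pitch, err, radius=3):
    """Refinement: re-search the new error function near the past estimate."""
    lo = max(PITCH_MIN, past_pitch - radius)
    hi = min(PITCH_MAX, past_pitch + radius)
    return int(lo + np.argmax(err[lo:hi + 1]))

def pitch_candidates(err, count=4):
    """Pitch tracking: keep the best-scoring lags as candidates."""
    ranked = [int(l) for l in np.argsort(err)[::-1] if l >= PITCH_MIN]
    return ranked[:count]

def select_pitch(candidates, err, refined_past):
    """Pitch selection: trade correlation strength against continuity
    with the refined past estimate."""
    return max(candidates,
               key=lambda lag: err[lag] - 0.1 * abs(lag - refined_past) / refined_past)
```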
- 3. The low bit rate codec recited in claim 2 wherein said mode classification means comprises:
- an interpolator for generating an interpolated set of filter coefficients for the first linear prediction window based on the second set of filter coefficients;
- a cepstral distortion tester for comparing a cepstral distortion measure between the first set of filter coefficients and the interpolated set of filter coefficients against a threshold value;
- a first pitch deviation tester for comparing a refined pitch estimate for the second pitch estimation window and the first pitch estimate;
- a second pitch deviation tester for comparing the second pitch estimate and the first pitch estimate; and
- mode selection means for selecting one of the first mode and the second mode for classifying the speech frame based on the comparisons by the cepstral distortion tester and the first and second pitch deviation testers.
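A minimal sketch of the mode classifier of claim 3 follows. The interpolation rule (averaging the edge coefficient sets of the current and previous frames to estimate the mid-frame set), the cepstral recursion, and both thresholds are illustrative assumptions.

```python
import numpy as np

CEPSTRAL_THRESHOLD = 0.5     # assumed distortion threshold
PITCH_DEV_THRESHOLD = 0.15   # assumed relative pitch deviation threshold

def lpc_to_cepstrum(a, n_coeffs=12):
    """Cepstrum of 1/A(z) via the standard LPC-to-cepstrum recursion."""
    c = np.zeros(n_coeffs + 1)
    for n in range(1, n_coeffs + 1):
        acc = a[n] if n < len(a) else 0.0
        for k in range(1, n):
            acc += (k / n) * c[k] * (a[n - k] if n - k < len(a) else 0.0)
        c[n] = -acc
    return c[1:]

def classify_mode(first_set, second_set, prev_second_set,
                  first_pitch, second_pitch, refined_second_pitch):
    # Interpolator: assumed mid-frame estimate from the two edge sets.
    interp = 0.5 * (np.asarray(second_set) + np.asarray(prev_second_set))
    # Cepstral distortion tester: distance between actual and interpolated sets.
    d = np.linalg.norm(lpc_to_cepstrum(first_set) - lpc_to_cepstrum(interp))
    spectrum_stable = d < CEPSTRAL_THRESHOLD
    # First and second pitch deviation testers.
    dev1 = abs(refined_second_pitch - first_pitch) / first_pitch
    dev2 = abs(second_pitch - first_pitch) / first_pitch
    pitch_stable = max(dev1, dev2) < PITCH_DEV_THRESHOLD
    # Mode selector: stationary voiced frames go to the first mode.
    return "MODE_1" if spectrum_stable and pitch_stable else "MODE_2"
```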
- 4. A method of encoding and decoding a speech signal comprising the steps of:
- receiving a speech signal and dividing the speech signal into speech frames;
- performing linear predictive code analysis on a speech frame in each of a first and a second linear prediction window, the first linear prediction window being centered at the middle of the speech frame and the second linear prediction window being centered at the edge of the speech frame;
- generating a first set of filter coefficients for the first linear prediction window and a second set of filter coefficients for the second linear prediction window;
- generating a first pitch estimate for a first pitch estimation window and a second pitch estimate for a second pitch estimation window, the first pitch estimation window being centered at the middle of the speech frame and the second pitch estimation window being centered at the edge of the speech frame;
- classifying the speech frame into one of a plurality of modes based on the first and the second sets of filter coefficients and the first and the second pitch estimates, wherein a first mode is predominantly voiced and a second mode is not predominantly voiced;
- encoding the speech frame based on the classified mode of the speech frame, wherein, for a speech frame classified in the first mode, the encoded speech frame encodes information derived from the second set of filter coefficients and the second pitch estimate, and for a speech frame classified in the second mode, the encoded speech frame encodes information derived from the first and the second sets of filter coefficients;
- transmitting the encoded speech frame;
- receiving a transmission of an encoded speech frame and identifying the transmitted speech frame as one of a first mode and a second mode speech frame; and
- decoding the transmitted speech frame in a mode-specific manner, based on the identified mode of the transmitted speech frame.
- 5. The method of claim 4, further including the steps of:
- synthesizing a speech signal from the decoded speech frame; and
- post filtering the synthesized speech signal.
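The synthesis and post-filtering steps of claim 5 admit a short sketch. This assumes an all-pole LPC synthesis filter and the common bandwidth-expanded short-term postfilter A(z/g_num)/A(z/g_den); the patent's actual postfilter structure and the g values are not taken from the source.

```python
import numpy as np
from scipy.signal import lfilter

def synthesize(excitation, lpc):
    """All-pole synthesis: filter the decoded excitation through 1/A(z),
    where lpc = [1, a1, ..., ap]."""
    return lfilter([1.0], lpc, excitation)

def postfilter(speech, lpc, g_num=0.5, g_den=0.8):
    """Short-term postfilter A(z/g_num)/A(z/g_den) to shape coding noise
    under the speech formants (g values assumed)."""
    n = np.arange(len(lpc))
    return lfilter(lpc * g_num ** n, lpc * g_den ** n, speech)
```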
- 6. A coder for encoding a speech signal comprising:
- a receiver for receiving the speech signal and dividing the speech signal into speech frames;
- a linear predictor for performing linear predictive code analysis on a speech frame in a first and a second linear prediction window, the first linear prediction window being centered at the middle of the speech frame and the second linear prediction window being centered at the edge of the speech frame, wherein the linear predictor generates a first set of filter coefficients for the first linear prediction window and a second set of filter coefficients for the second linear prediction window;
- a pitch estimator for generating a pitch estimate for each of a first and a second pitch estimation window, the first pitch estimation window being centered at the middle of the speech frame and the second pitch estimation window being centered at the edge of the speech frame;
- a mode classifier responsive to the first and the second sets of filter coefficients and the first and the second pitch estimates, for classifying the speech frame into one of a plurality of modes, wherein a first mode is predominantly voiced and a second mode is not predominantly voiced; and
- an encoder for encoding the speech frame based on the classified mode of the speech frame.
- 7. The coder recited in claim 6 wherein the pitch estimator comprises:
- an error calculator for receiving data for calculating an error function for the first and the second pitch estimation windows;
- a refiner responsive to the calculated error functions for refining past pitch estimates;
- a pitch tracker responsive to the refined past pitch estimates for producing a set of pitch candidates for each of the first and the second pitch estimation windows; and
- a pitch selector for selecting and outputting a pitch estimate from the set of pitch candidates for each of the first and the second pitch estimation windows.
- 8. The coder recited in claim 6 wherein the mode classifier comprises:
- an interpolator for generating an interpolated set of filter coefficients for the first linear prediction window based on the second set of filter coefficients;
- a cepstral distortion tester for comparing a cepstral distortion measure between the first set of filter coefficients and the interpolated set of filter coefficients against a threshold value;
- a first pitch deviation tester for comparing a refined pitch estimate for the second pitch estimation window and the first pitch estimate;
- a second pitch deviation tester for comparing the second pitch estimate and the first pitch estimate; and
- a mode selector for selecting one of the first mode and the second mode for classifying the speech frame, based on the comparisons by the cepstral distortion tester and the first and second pitch deviation testers.
- 9. The coder recited in claim 6, wherein each speech frame is partitioned into subframes, and the coder further comprises a closed loop pitch estimator for estimating a pitch for each subframe of a speech frame classified in the first mode based on the second pitch estimate for the speech frame.
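A minimal sketch of the closed-loop pitch estimator of claim 9: for each subframe of a first-mode frame, lags near the open-loop (second) pitch estimate are searched against a target signal. The search radius and the matched-energy criterion are illustrative assumptions.

```python
import numpy as np

def closed_loop_pitch(target, past_exc, open_loop_lag, radius=5):
    """Pick the lag near the open-loop estimate whose delayed-excitation
    vector best matches the subframe target."""
    n = len(target)
    best_lag, best_score = open_loop_lag, -np.inf
    for lag in range(open_loop_lag - radius, open_loop_lag + radius + 1):
        v = past_exc[len(past_exc) - lag:len(past_exc) - lag + n]
        if len(v) < n:               # lags shorter than the subframe repeat
            v = np.resize(v, n)
        score = np.dot(target, v) ** 2 / (np.dot(v, v) + 1e-9)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```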
- 10. The coder recited in claim 6, wherein the speech frame is partitioned into subframes, and the coder further comprises a delayed decision excitation modeler for modeling the excitation of each subframe with a set of excitation parameters by:
- generating M pitch estimates for each subframe;
- determining a set of MN excitation parameter candidates for each excitation parameter for each of the M pitch estimates based on N previously coded speech subframes; and
- selecting L excitation parameter estimates from each set of MN excitation parameter candidates;
- wherein M, N and L are positive integers variable with each subframe.
- 11. The coder recited in claim 10, further comprising a glottal pulse fixed codebook and a multi-innovation fixed codebook, wherein for a speech frame classified in the first mode, one of the set of excitation parameters is an index into the glottal pulse fixed codebook, and for a speech frame classified in the second mode, one of the set of excitation parameters is an index into the multi-innovation fixed codebook.
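The delayed-decision excitation modeling of claims 10 and 11 is essentially a beam search, sketched below: M pitch hypotheses per subframe are expanded against N surviving paths from earlier subframes, and the best L of the resulting MN candidates are carried forward. The `search_codebook` callback is a stub standing in for the codebook searches; per claim 11 it would index the glottal pulse fixed codebook in the first mode and the multi-innovation fixed codebook in the second.

```python
from dataclasses import dataclass, field

@dataclass
class Path:
    score: float = 0.0                          # accumulated coding error
    params: list = field(default_factory=list)  # (pitch, codebook index) per subframe

def expand_subframe(paths, pitch_hypotheses, search_codebook, L):
    """One delayed-decision step: N surviving paths x M pitch hypotheses,
    pruned back to the best L candidates."""
    candidates = []
    for path in paths:                  # N previously coded paths
        for pitch in pitch_hypotheses:  # M pitch estimates for this subframe
            index, err = search_codebook(path, pitch)  # stub codebook search
            candidates.append(Path(path.score + err,
                                   path.params + [(pitch, index)]))
    candidates.sort(key=lambda p: p.score)
    return candidates[:L]               # L survivors carried to the next subframe
```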
- 12. A method of encoding a speech signal comprising the steps of:
- receiving a speech signal and dividing the speech signal into speech frames;
- performing linear predictive code analysis on a speech frame in a first and a second linear prediction window, the first linear prediction window being centered at the middle of the speech frame and the second linear prediction window being centered at the edge of the speech frame;
- generating a first set of filter coefficients for the first linear prediction window and a second set of filter coefficients for the second linear prediction window;
- generating a first pitch estimate for a first pitch estimation window and a second pitch estimate for a second pitch estimation window, the first pitch estimation window being centered at the middle of the speech frame and the second pitch estimation window being centered at the edge of the speech frame;
- classifying the speech frame into one of a plurality of modes based on the first and the second sets of filter coefficients and the first and the second pitch estimates, wherein a first mode is predominantly voiced and a second mode is not predominantly voiced;
- encoding the speech frame based on the classified mode of the speech frame; and
- transmitting the encoded speech frame.
- 13. The encoding method recited in claim 12 wherein the pitch estimate generation step further comprises:
- receiving data for calculating an error function for the first and the second pitch estimation windows;
- refining past pitch estimates responsive to the calculated error functions;
- producing a set of pitch candidates for each of the first and the second pitch estimation windows responsive to the refined past pitch estimates; and
- selecting and outputting a pitch estimate from the set of pitch candidates for each of the first and the second pitch estimation windows.
- 14. The encoding method recited in claim 12 wherein the mode classification step further comprises:
- generating an interpolated set of filter coefficients for the first linear prediction window based on the second set of filter coefficients;
- comparing a cepstral distortion measure between the first set of filter coefficients and the interpolated set of filter coefficients against a threshold value;
- computing a first pitch deviation between the refined pitch estimate for the second pitch estimation window and the first pitch estimate;
- computing a second pitch deviation between the second pitch estimate and the first pitch estimate; and
- selecting one of the first mode and the second mode for classifying the speech frame, based on the cepstral distortion comparison and the first and second pitch deviations.
- 15. The encoding method recited in claim 12, further comprising the steps of:
- partitioning each speech frame into subframes; and
- estimating a pitch through a closed loop pitch estimation for each subframe of a speech frame classified in the first mode based on the second pitch estimate for the speech frame.
- 16. The encoding method recited in claim 12, further comprising the steps of:
- partitioning the speech frame into subframes; and
- modeling the excitation of each subframe with a set of excitation parameters by:
- generating M pitch estimates for each subframe;
- determining a set of MN excitation parameter candidates for each excitation parameter for each of the M pitch estimates based on N previously coded speech subframes; and
- selecting L excitation parameter estimates from each set of MN excitation parameter candidates;
- wherein M, N and L are positive integers variable with each subframe.
- 17. The encoding method recited in claim 16, further comprising the step of providing a glottal pulse fixed codebook and a multi-innovation fixed codebook, wherein for a speech frame classified in the first mode, one of the set of excitation parameters is an index into the glottal pulse fixed codebook, and for a speech frame classified in the second mode, one of the set of excitation parameters is an index into the multi-innovation fixed codebook.
BACKGROUND OF THE INVENTION
This application is a Continuation-in-Part under 37 CFR 1.62 of pending prior application Ser. No. 07/891,596, filed Jun. 1, 1992, by Kumar Swaminathan for CELP EXCITATION ANALYSIS FOR VOICED SPEECH.
Foreign Referenced Citations (1)

| Number | Date | Country |
| --- | --- | --- |
| 127729 | Feb 1984 | EPX |
Continuation in Parts (1)

| | Number | Date | Country |
| --- | --- | --- | --- |
| Parent | 891596 | Jun 1992 | |