Claims
- 1. A coding system for a coder/decoder (codec) for providing adaptive bandwidth broadening to an encoder, comprising:
a linear prediction (LP) front end, adapted to process an input signal which provides LP parameters that are computed during a predetermined interval; an open loop pitch estimator, adapted to perform pitch frequency estimation on said input signal for substantially all of said predetermined intervals; an adaptive bandwidth broadening module, adapted to perform the following operations:
derive a spectrum sampling frequency for said predetermined interval as the pitch frequency or its integer submultiple depending on the pitch frequency; determine a LP power spectrum at the harmonics of said spectrum sampling frequency for said input signal for said frame; compute a peak to average ratio of said LP spectrum based on said spectrum sampling frequency of said frame; and adaptively bandwidth broaden said LP filter coefficients based on said peak to average ratio of said LP spectrum for all harmonic multiples of said spectral sampling frequency.
- 2. A system as recited in claim 1, wherein said predetermined interval is preferably 20 ms in duration.
- 3. A system as recited in claim 1, wherein said codec comprises a frequency domain interpolative (FDI) codec.
- 4. A system as recited in claim 1, wherein said harmonic multiples of the spectrum sampling frequency are within 0 to 4 kHz.
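The operations recited in claims 1 to 4 can be illustrated with a minimal Python sketch. It is not the claimed implementation: the 8 kHz sampling rate, the 200 Hz spacing limit used to choose an integer submultiple, the linear mapping from peak to average ratio to an expansion factor `gamma`, and all function names are assumptions introduced here for illustration only.

```python
import cmath
import math

FS_HZ = 8000.0  # assumed sampling rate; claim 4 limits harmonics to 0-4 kHz


def spectrum_sampling_frequency(pitch_hz, max_spacing_hz=200.0):
    """Return the pitch frequency or an integer submultiple of it.

    The 200 Hz spacing limit is a hypothetical choice, not from the claims.
    """
    divisor = max(1, math.ceil(pitch_hz / max_spacing_hz))
    return pitch_hz / divisor


def lp_power_at_harmonics(lp_coeffs, f_s_hz, band_hz=FS_HZ / 2):
    """Evaluate 1/|A(e^{jw})|^2 at multiples of f_s_hz below band_hz.

    lp_coeffs holds a_1..a_M of A(z) = 1 + sum_k a_k z^-k.
    """
    powers = []
    k = 1
    while k * f_s_hz < band_hz:
        w = 2.0 * math.pi * k * f_s_hz / FS_HZ
        a = 1.0 + sum(c * cmath.exp(-1j * w * (m + 1))
                      for m, c in enumerate(lp_coeffs))
        powers.append(1.0 / abs(a) ** 2)
        k += 1
    return powers


def peak_to_average(powers):
    """Ratio of the largest sampled power to the mean sampled power."""
    return max(powers) / (sum(powers) / len(powers))


def broaden(lp_coeffs, par, par_lo=5.0, par_hi=20.0, gamma_min=0.96):
    """Scale a_k by gamma**k, with gamma falling as the ratio grows.

    The linear mapping and its constants are illustrative assumptions.
    """
    t = min(1.0, max(0.0, (par - par_lo) / (par_hi - par_lo)))
    gamma = 1.0 - t * (1.0 - gamma_min)
    return [c * gamma ** (m + 1) for m, c in enumerate(lp_coeffs)], gamma
```

Scaling each coefficient a_k by gamma**k moves the poles of the LP synthesis filter toward the origin, which widens sharp spectral peaks; a flat spectrum (ratio near 1) leaves the coefficients unchanged in this sketch.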
- 5. A coding system for a codec, comprising:
a linear prediction front end adapted to process an input signal to provide LP parameters which are quantized and encoded over predetermined intervals and are used to compute a LP residual signal; an open loop pitch estimator adapted to process the LP residual signal, pitch information, pitch interpolation information and provide a pitch contour within the predetermined intervals; a prototype waveform extraction module, which is adapted in response to the LP residual signal and the pitch contour to extract a prototype waveform (PW) for a number of equal subintervals within the predetermined intervals and to extract an additional approximate PW in the subinterval immediately after the ending of a previous subinterval; a PW gain computation module, adapted to compute a PW gain for substantially all the subintervals; and a gain vector predictive vector quantization (VQ) module, adapted to quantize and encode the PW gains for substantially all the subintervals after they are filtered by a weighted window, decimated, and after subtracting from them a predicted average PW gain value for a current predetermined interval computed from the quantized PW gain values of a preceding predetermined interval.
- 6. A system as recited in claim 5, wherein said predetermined interval is preferably 20 ms in duration.
- 7. A system as recited in claim 5, wherein said weighted window comprises a 3 point window.
- 8. A system as recited in claim 5, wherein said decimation comprises a 2:1 decimation.
- 9. A system as recited in claim 5, wherein said gain vector predictive VQ module is further adapted to perform predictive vector quantization of the decimated and smoothed PW gains based on the predicted average PW gain estimate and a codebook indicating corrections to the estimated PW gains.
- 10. A system as recited in claim 5, further comprising:
a gain decoder interpolation module, adapted to decay the average PW gain value for the preceding predetermined interval in order to mitigate the effect of transmission errors on the PW gain parameter.
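The gain processing recited in claims 5 to 10 (weighted window, 2:1 decimation, subtraction of a predicted average gain, and decay of the previous average at the decoder) might be sketched as follows. The window weights, the edge handling, the choice of which samples survive decimation, the prediction coefficient `alpha`, and the decay factor are all hypothetical values chosen for illustration; the claims do not specify them.

```python
def smooth_and_decimate(gains, window=(0.25, 0.5, 0.25)):
    """Apply a 3 point weighted window, then keep every second value.

    Window weights and edge handling (end samples repeated) are assumptions.
    """
    padded = [gains[0]] + list(gains) + [gains[-1]]
    smoothed = [
        window[0] * padded[i] + window[1] * padded[i + 1] + window[2] * padded[i + 2]
        for i in range(len(gains))
    ]
    return smoothed[1::2]  # 2:1 decimation


def prediction_residual(decimated, prev_quantized, alpha=0.75):
    """Subtract a predicted average gain derived from the preceding interval.

    alpha is a hypothetical prediction coefficient.
    """
    predicted = alpha * sum(prev_quantized) / len(prev_quantized)
    return [g - predicted for g in decimated], predicted


def decayed_average(prev_average, decay=0.9):
    """Decoder side: decay the previous average gain (claim 10).

    The decay factor is an assumed value.
    """
    return decay * prev_average
```

In this sketch the residual vector returned by `prediction_residual` is what a codebook search would quantize; the codebook itself is outside the scope of the example.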
- 11. A frequency domain interpolative (FDI) coder/decoder (codec), comprising:
a PW normalization and alignment module, adapted to compute a sequence of aligned prototype waveform (PW) vectors for a frame via a low complexity alignment process; and a PW subband correlation computation module, adapted to compute a PW correlation vector for all harmonics for the frame and average the PW correlation vector across the harmonics in five subbands in order to derive a PW subband correlation vector.
- 12. A system as recited in claim 11, further comprising:
a voicing measure computation module, adapted to provide a voicing measure that characterizes a degree of voicing.
- 13. A system as recited in claim 12, wherein said voicing measure is derived from input factors that are correlated to a degree of periodicity for the frame.
- 14. A system as recited in claim 11, wherein said PW correlation vector comprises the average correlation between successive PW vectors as a function of frequency.
- 15. A system as recited in claim 11, wherein said PW subband correlation vector comprises a degree of stationarity of successive pitch cycles of an input signal.
- 16. A system as recited in claim 12 further comprising:
a PW correlation and vector measure vector quantization (VQ) module, adapted to encode a composite vector derived from said PW subband correlation vector and the voicing measure based on spectrally weighted vector quantization.
- 17. A system as recited in claim 11, further comprising:
an autoregressive module, adapted to reconstruct a PW phase at the decoder substantially every sub-frame using the received voicing measure, PW subband correlation vector and pitch frequency contour information.
- 18. A system as recited in claim 17, wherein said autoregressive module is further adapted to compute a value for the input signal via a weighted combination of a first complex vector and a second complex vector.
- 19. A system as recited in claim 18, wherein said first complex vector is derived from a random phase vector and said second complex vector is derived from a fixed phase vector.
- 20. A system as recited in claim 19, wherein said second complex vector is obtained by oversampling a phase spectrum of a voiced pitch pulse.
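As an illustration of the PW subband correlation vector of claims 11, 14 and 15, the sketch below computes a normalized correlation between successive PW vectors for each harmonic and averages it over five subbands. The normalization, the subband edges in `BAND_EDGES_HZ`, and the function names are assumptions made for this example; the claims state only that five subbands are used.

```python
BAND_EDGES_HZ = (0.0, 400.0, 800.0, 1600.0, 2400.0, 4000.0)  # hypothetical edges


def pw_correlation(pw_vectors):
    """Average normalized correlation of successive PW vectors, per harmonic.

    pw_vectors is a list of equal-length lists of complex harmonic values.
    """
    n_harm = len(pw_vectors[0])
    corr = []
    for k in range(n_harm):
        num = 0.0
        den = 0.0
        for prev, curr in zip(pw_vectors, pw_vectors[1:]):
            num += (curr[k] * prev[k].conjugate()).real
            den += abs(curr[k]) * abs(prev[k])
        corr.append(num / den if den > 0.0 else 0.0)
    return corr


def subband_correlation(corr, pitch_hz, edges_hz=BAND_EDGES_HZ):
    """Average the per-harmonic correlation inside each subband.

    Harmonic k (counting from 1) is taken to lie at k * pitch_hz.
    Bands that contain no harmonic are reported as 0.0.
    """
    n_bands = len(edges_hz) - 1
    sums = [0.0] * n_bands
    counts = [0] * n_bands
    for k, c in enumerate(corr, start=1):
        f = k * pitch_hz
        for b in range(n_bands):
            if edges_hz[b] <= f < edges_hz[b + 1]:
                sums[b] += c
                counts[b] += 1
                break
    return [s / n if n else 0.0 for s, n in zip(sums, counts)]
```

Values near 1 indicate that successive pitch cycles are nearly identical in that band (stationary, voiced-like behavior), while values near 0 or below indicate weak correlation.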
- 21. A frequency domain interpolative (FDI) coder/decoder (codec), comprising:
a PW magnitude quantizer, adapted to perform the following:
directly quantize a prototype waveform (PW) in a magnitude domain for substantially every frame without said PW being decomposed into complex components; hierarchically quantize a PW magnitude vector based on a voicing classification using a mean-deviations representation; adaptively vector quantize the mean component of the representation in multiple subbands; derive a variable dimension deviations vector as the difference of the input PW magnitude vector and the full band representation of the quantized PW subband mean vector for all harmonics; select a fixed dimensional deviations subvector from the said variable dimensional deviations vector based on location of speech formant frequencies for a subframe; and provide the said fixed dimensional deviations subvector for adaptive vector quantization.
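The mean-deviations representation of claim 21 can be sketched in a few lines. The example below is only illustrative: it uses unquantized subband means as a stand-in for the quantized means, it picks the deviations subvector by distance to the nearest formant, and the band edges, formant values and function names are hypothetical.

```python
def band_index(freq_hz, edges_hz):
    """Index of the subband containing freq_hz (clamped to the last band)."""
    for b in range(len(edges_hz) - 1):
        if freq_hz < edges_hz[b + 1]:
            return b
    return len(edges_hz) - 2


def subband_means(magnitudes, pitch_hz, edges_hz):
    """Mean PW magnitude per subband; harmonic k lies at k * pitch_hz."""
    n_bands = len(edges_hz) - 1
    sums = [0.0] * n_bands
    counts = [0] * n_bands
    for k, m in enumerate(magnitudes, start=1):
        b = band_index(k * pitch_hz, edges_hz)
        sums[b] += m
        counts[b] += 1
    return [s / n if n else 0.0 for s, n in zip(sums, counts)]


def deviations(magnitudes, means, pitch_hz, edges_hz):
    """Variable dimension deviations: magnitude minus full band mean."""
    return [
        m - means[band_index(k * pitch_hz, edges_hz)]
        for k, m in enumerate(magnitudes, start=1)
    ]


def select_deviations(devs, pitch_hz, formants_hz, dim):
    """Pick the dim harmonics closest to any formant, in frequency order.

    Returns the chosen harmonic indices (0-based) and the subvector.
    """
    def distance(k):
        return min(abs((k + 1) * pitch_hz - f) for f in formants_hz)

    picked = sorted(sorted(range(len(devs)), key=distance)[:dim])
    return picked, [devs[k] for k in picked]
```

Because the number of harmonics changes with pitch, the deviations vector has variable dimension; selecting a fixed number of entries around the formants gives a vector that a fixed-dimension codebook can handle.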
- 22. A coding system for a coder/decoder (codec), comprising:
a linear prediction (LP) front end, adapted to process an input signal which provides LP parameters that are computed during a predetermined interval; an open loop pitch estimator, adapted to perform pitch estimation on said input signal for substantially all of said predetermined intervals; a voice activity detection module, that uses the LP parameters and pitch information; a voicing measure computation module, adapted to provide a voicing measure that characterizes a degree of voicing and is derived from a plurality of input parameters that are correlated to the degree of periodicity of the input signal for substantially all predetermined intervals; a prototype waveform (PW) subband correlation computation module, adapted to provide a PW subband correlation vector, said PW subband correlation vector characterizing a degree of correlation between successive PW vectors as a function of frequency and computed for substantially all predetermined intervals; an adaptive bandwidth broadening module, adapted to reduce annoying artifacts due to spurious spectral peaks by performing the following:
compute a measure of VAD likelihood based on voice activity detection (VAD) flags for a preceding, a current and a next predetermined interval; and compute average PW gain values for inactive predetermined intervals and active unvoiced predetermined intervals.
- 23. A system as recited in claim 22 wherein said adaptive bandwidth broadening module is further adapted to perform the following:
compute a parameter αfatt to determine the degree of bandwidth broadening necessary for the interpolated LP synthesis filter coefficients using a VAD likelihood measure, PW gain averages and the PW subband correlation quantization index.
- 24. A system as recited in claim 22 wherein said adaptive bandwidth broadening module is further adapted to attenuate out-of-band components of a reconstructed PW vector by performing the following:
compute a first corner frequency for a low frequency based on a pitch frequency; compute a second corner frequency at a high frequency based on the pitch frequency and αfatt; and determine a rate of attenuation of high frequency components as a square law function, based on αfatt.
- 25. A system as recited in claim 22, wherein said predetermined interval is preferably 20 ms in duration.
- 26. A system as recited in claim 22, wherein said predetermined interval comprises a frame.
- 27. A low bit rate coding system for a coder/decoder (codec), comprising:
a linear prediction (LP) front end, adapted to process an input signal which provides LP parameters that are computed during a predetermined interval; an open loop pitch estimator, adapted to perform pitch estimation on said input signal for substantially all of said predetermined intervals; a voice activity detection module, adapted to process and provide the LP parameters and pitch information to the decoder; a prototype waveform (PW) encoder, adapted to provide a look ahead based on said predetermined interval in order to smooth PW parameters; and a voicing measure computation module, adapted to provide a voicing measure, said voicing measure characterizing a degree of voicing derived from a plurality of input parameters that are correlated to the degree of periodicity of the input signal for substantially all predetermined intervals.
- 28. A system as recited in claim 27 wherein said PW parameters comprise at least one of gain, a voicing measure, subband correlations and spectral magnitude.
- 29. A system as recited in claim 27 further comprising:
a prototype waveform (PW) subband correlation computation module, adapted to provide a PW subband correlation vector, said PW subband correlation vector characterizing a degree of correlation between successive PW vectors as a function of frequency and computed for substantially all predetermined intervals to obtain PW vectors for a current predetermined interval and a look ahead predetermined interval.
- 30. A system as recited in claim 27 further comprising:
a PW gain computation module, adapted to compute a PW gain for substantially all sub-predetermined intervals including a current predetermined interval and a look ahead predetermined interval.
- 31. A system as recited in claim 27 further comprising:
a voicing measure smoothing module, adapted to smooth a voicing measure by combining a voicing measure associated with a current predetermined interval and a look ahead predetermined interval.
- 32. A system as recited in claim 27 further comprising:
a PW gain smoothing module, adapted to provide PW gain smoothing via a parabolic symmetric window for each predetermined interval and a 2:1 decimation, quantization and transmission to the decoder, said parabolic symmetric window is centered at an edge of the predetermined interval; and a PW magnitude smoothing module, adapted to represent a PW spectral magnitude at a frame edge via a smoothed PW subband mean approximation.
- 33. A system as recited in claim 32 further comprising:
a PW magnitude quantization module, adapted to quantize and provide a smoothed PW subband mean approximation to the decoder.
- 34. A system as recited in claim 27 further comprising:
an adaptive bandwidth broadening module, adapted to reduce annoying artifacts due to spurious spectral peaks by performing the following:
compute a measure of VAD likelihood based on voice activity detection (VAD) flags for a preceding, a current and a next two predetermined intervals; and compute average PW gain values for inactive predetermined intervals and active unvoiced predetermined intervals.
- 35. A system as recited in claim 27, wherein said codec operates at 2.4 kbps.
- 36. A low bit rate coding system for a coder/decoder (codec), comprising:
a linear prediction (LP) front end, adapted to process an input signal which provides LP parameters that are estimated, quantized and transmitted for substantially all frames of a first duration; an open loop pitch estimator, adapted to perform pitch estimation on said input signal for substantially all of said frames of a first duration and quantize and transmit pitch information for substantially all frames of a second duration; a voice activity detection module, adapted to combine voice activity detection (VAD) flags associated with two successive frames of a first duration based on processing the LP parameters and the pitch information every frame of a first duration and transmitting the VAD flags to the decoder substantially every frame of a second duration; and a prototype waveform (PW) encoder, adapted to provide a look ahead frame based on said frame of a first duration in order to smooth PW parameters including at least one of PW gain, a voicing measure, subband correlations and spectral magnitude.
- 37. A system as recited in claim 36, wherein said codec operates at 1.2 kbps.
- 38. A system as recited in claim 36, wherein said frames of a first duration comprise 20 ms each, and frames of a second duration comprise 40 ms each.
- 39. A system as recited in claim 36 further comprising:
a voicing measure computation module, adapted to provide a voicing measure, said voicing measure characterizing a degree of voicing derived from a plurality of input parameters that are correlated to the degree of periodicity of the input signal for substantially all the frames of a first duration.
- 40. A system as recited in claim 36 further comprising:
a voicing measure smoothing module, adapted to combine a voicing measure associated with a second half of a current frame of a second duration and a voicing measure associated with a look ahead frame of a first duration based on their respective energies in order to smooth the voicing measures; a prototype waveform (PW) subband correlation computation module, adapted to provide a PW subband correlation vector, said PW subband correlation vector characterizing a degree of correlation between successive PW vectors as a function of frequency and computed for a current frame of a first duration in order to provide PW vectors for a current frame of a second duration and a look ahead frame of a first duration; a PW gain computation module, adapted to compute a PW gain for substantially all subframes for both the current frame of a second duration and the look ahead frame of a first duration; and said prototype waveform (PW) subband correlation computation module being further adapted to quantize and transmit a composite PW subband correlation vector and voicing measure to the decoder.
- 41. A system as recited in claim 36 further comprising:
a PW gain smoothing module, adapted to provide PW gain smoothing via a parabolic symmetric window for each instant of time followed by a 4:1 decimation, quantization and transmission to the decoder for substantially all the frames of a second duration, said parabolic symmetric window is centered at an edge of the frame of a second duration; and a PW magnitude smoothing module, adapted to represent a PW spectral magnitude at the frame edge of a second duration via a smoothed PW subband mean approximation.
- 42. A system as recited in claim 36 further comprising:
a PW magnitude quantization module, adapted to quantize and provide a smoothed PW subband mean approximation to the decoder.
- 43. A system as recited in claim 36 further comprising:
an adaptive bandwidth broadening module at the decoder, adapted to reduce annoying artifacts due to spurious spectral peaks in inactive noise frames by performing the following:
compute a measure of VAD likelihood based on the VAD flags for a preceding, a current and a next frame of a second duration; and compute average PW gain values for the inactive noise frames and active unvoiced voice frames.
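Two of the smoothing steps recited in claims 40 and 41 lend themselves to a short sketch: combining two voicing measures according to their energies, and smoothing PW gains with a parabolic symmetric window centered on a chosen sample such as a frame edge. The energy-weighted average, the window formula, the window length and the edge clamping below are assumptions chosen for illustration and are not specified in the claims.

```python
def combine_voicing(v_curr, e_curr, v_look, e_look):
    """Energy weighted average of two voicing measures (assumed form)."""
    total = e_curr + e_look
    if total <= 0.0:
        return 0.5 * (v_curr + v_look)
    return (e_curr * v_curr + e_look * v_look) / total


def parabolic_window(half_width):
    """Symmetric parabolic weights over 2 * half_width + 1 points, summing to 1."""
    raw = [
        1.0 - (n / (half_width + 1)) ** 2
        for n in range(-half_width, half_width + 1)
    ]
    total = sum(raw)
    return [w / total for w in raw]


def smooth_at(gains, center, window):
    """Weighted sum of gains around index center; indices clamp at the ends."""
    half = len(window) // 2
    acc = 0.0
    for i, w in enumerate(window):
        j = min(max(center + i - half, 0), len(gains) - 1)
        acc += w * gains[j]
    return acc
```

Evaluating `smooth_at` only at every fourth index would give a 4:1 decimated gain track in this sketch; evaluating at every second index would correspond to the 2:1 case of claim 32.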
- 44. A method for providing adaptive bandwidth broadening to an encoder of a coder/decoder (codec), comprising:
processing an input signal which provides LP parameters that are computed during a predetermined interval; performing pitch frequency estimation on said input signal for substantially all of said predetermined intervals; deriving a spectrum sampling frequency for said predetermined interval as the pitch frequency or its integer submultiple depending on the pitch frequency; determining a LP power spectrum at the harmonics of said spectrum sampling frequency for said input signal for said frame; computing a peak to average ratio of said LP spectrum based on said spectrum sampling frequency of said frame; and adaptively bandwidth broadening said LP filter coefficients based on said peak to average ratio of said LP spectrum for all harmonic multiples of said spectral sampling frequency.
- 45. A method of providing a coding system for a codec, comprising:
processing an input signal to provide LP parameters which are quantized and encoded over predetermined intervals and are used to compute a LP residual signal; processing the LP residual signal, pitch information, pitch interpolation information and providing a pitch contour within the predetermined intervals; extracting a prototype waveform (PW) for a number of equal subintervals within the predetermined intervals and extracting an additional approximate PW in the subinterval immediately after the ending of a previous subinterval in response to the LP residual signal and the pitch contour; computing a PW gain for substantially all the subintervals; and quantizing and encoding the PW gains for substantially all the subintervals after the subintervals are filtered by a weighted window, decimated, and subtracted from a predicted average PW gain value for a current predetermined interval which is computed from the quantized PW gain values of a preceding predetermined interval.
- 46. A method of providing a coding system for a coder/decoder (codec), comprising:
computing a sequence of aligned prototype waveform (PW) vectors for a frame via a low complexity alignment process; and computing a PW correlation vector for all harmonics for the frame and averaging the PW correlation vector across the harmonics in five subbands in order to derive a PW subband correlation vector.
- 47. A method of providing a coding system for a frequency domain interpolative (FDI) coder/decoder (codec), comprising:
directly quantizing a prototype waveform (PW) in a magnitude domain for substantially every frame without said PW being decomposed into complex components; hierarchically quantizing a PW magnitude vector based on a voicing classification using a mean-deviations representation; adaptively vector quantizing the mean component of the representation in multiple subbands; deriving a variable dimension deviations vector as the difference of the input PW magnitude vector and the full band representation of the quantized PW subband mean vector for all harmonics; selecting a fixed dimensional deviations subvector from the said variable dimensional deviations vector based on a location of speech formant frequencies for a subframe; and providing the said fixed dimensional deviations subvector for adaptive vector quantization.
- 48. A method of providing a coding system for a coder/decoder (codec), comprising:
processing an input signal which provides LP parameters that are computed during a predetermined interval; performing a pitch estimation on said input signal for substantially all of said predetermined intervals; processing the LP parameters and pitch information; providing a voicing measure that characterizes a degree of voicing and is derived from a plurality of input parameters that are correlated to the degree of periodicity of the input signal for substantially all predetermined intervals; providing a PW subband correlation vector, said PW subband correlation vector characterizing a degree of correlation between successive PW vectors as a function of frequency and computed for substantially all predetermined intervals; reducing annoying artifacts due to spurious spectral peaks by performing the following:
computing a measure of VAD likelihood based on voice activity detection (VAD) flags for a preceding, a current and a next predetermined interval; and computing average PW gain values for inactive predetermined intervals and active unvoiced predetermined intervals.
- 49. A method of providing a low bit rate coding system for a coder/decoder (codec), comprising:
processing an input signal which provides LP parameters that are computed during a predetermined interval; performing pitch estimation on said input signal for substantially all of said predetermined intervals; processing the LP parameters and pitch information to the decoder; providing a look ahead based on said predetermined interval in order to smooth PW parameters; and providing a voicing measure, said voicing measure characterizing a degree of voicing derived from a plurality of input parameters that are correlated to the degree of periodicity of the input signal for substantially all predetermined intervals.
- 50. A method of providing a low bit rate coding system for a coder/decoder (codec), comprising:
processing an input signal which provides LP parameters that are estimated, quantized and transmitted for substantially all frames of a first duration; performing a pitch estimation on said input signal for substantially all of said frames of a first duration and quantizing and transmitting pitch information for substantially all frames of a second duration; combining voice activity detection (VAD) flags associated with two successive frames of a first duration; processing the LP parameters and the pitch information every frame of a first duration and transmitting the VAD flags to the decoder substantially every frame of a second duration; and providing a look ahead frame based on said frame of a first duration in order to smooth PW parameters including at least one of PW gain, a voicing measure, subband correlations and a spectral magnitude.
PRIORITY
[0001] This application claims benefit under 35 U.S.C. §119(e) from U.S. Provisional Patent Application Serial No. 60/362,706, entitled “A 1.2/2.4 KBPs Voice CODEC Based On Frequency Domain Interpolation (FDI) Technology”, filed on Mar. 8, 2002, the entire contents of which are incorporated herein by reference.
[0002] Related material may also be found in U.S. Non-Provisional patent application Ser. No. 10/073,128, entitled “Prototype Waveform Magnitude Quantization For A Frequency Domain Interpolative Speech CODEC”, filed on Aug. 23, 2002, the entire contents of which are incorporated herein by reference.
Provisional Applications (1)
| Number | Date | Country |
|---|---|---|
| 60362706 | Mar 2002 | US |