Claims
- 1. In a communication system, a method comprising the steps of:
- converting a caller's voice message to a sequence of phonemes, whereby the caller's voice message is intended for a receiving device of the communication system;
- generating a sequence of phoneme indexes corresponding to the sequence of phonemes;
- generating corresponding voice features for each phoneme of the sequence of phonemes, wherein the voice features are determined from the caller's voice message; and
- transmitting the sequence of phoneme indexes and corresponding voice features to the receiving device for generating a voice signal representative of the caller's voice message, wherein the corresponding voice features comprise:
- spectral features of at least a portion of the corresponding phoneme;
- an average energy level of the corresponding phoneme;
- a duration of the corresponding phoneme; and
- a pitch period representative of a periodicity of the corresponding phoneme.
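For illustration only, the transmitted payload of claim 1 can be pictured as one record per phoneme: an index plus the four recited voice features. The sketch below is a minimal assumption-laden example, not the claimed implementation. The phoneme table, the class and field names, and the millisecond units are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical phoneme inventory. A real system would share a much larger
# table between the sending side and the receiving device.
PHONEME_TABLE: Tuple[str, ...] = ("sil", "hh", "eh", "l", "ow")


@dataclass(frozen=True)
class VoiceFeatures:
    """The four per-phoneme features recited in the claim (units assumed)."""
    spectral: Tuple[float, ...]   # spectral features of part of the phoneme
    avg_energy: float             # average energy level
    duration_ms: float            # duration of the phoneme
    pitch_period_ms: float        # pitch period (periodicity)


@dataclass(frozen=True)
class PhonemeRecord:
    """One transmitted unit: a phoneme index and its voice features."""
    index: int
    features: VoiceFeatures


def build_message(phonemes: Sequence[str],
                  features: Sequence[VoiceFeatures]) -> List[PhonemeRecord]:
    """Map phoneme names to table indexes and pair each with its features."""
    if len(phonemes) != len(features):
        raise ValueError("need exactly one feature set per phoneme")
    lookup = {name: i for i, name in enumerate(PHONEME_TABLE)}
    return [PhonemeRecord(lookup[p], f) for p, f in zip(phonemes, features)]
```

Sending indexes rather than phoneme names or audio keeps each record small; the receiving device needs only the same table to interpret them.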
- 2. The method as recited in claim 1, wherein the converting step comprises the steps of:
- sampling a voice signal;
- applying a Fourier transform to a plurality of frame intervals of the sampled voice signal to generate spectral data having a spectral envelope for each of the plurality of frame intervals;
- subdividing the spectral data for each of the plurality of frame intervals into a plurality of bands;
- filtering out the spectral envelope from the spectral data of each of the plurality of frame intervals to generate filtered spectral data for each of the plurality of frame intervals;
- applying a Fourier transform to the filtered spectral data for each of the plurality of bands to generate an autocorrelation function for each of the plurality of bands;
- measuring a value of the magnitude of the autocorrelation function for each of the plurality of bands, whereby the value is a measure of a degree of voiceness for each of the plurality of bands;
- applying the degree of voiceness for each of the plurality of bands to a corresponding plurality of phoneme models; and
- deriving the sequence of phonemes from the voice signal by searching through a phoneme library according to predictions made by the corresponding plurality of phoneme models.
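The band-wise voicing measurement recited in claim 2 can be sketched for a single frame as follows. This is a simplified illustration under stated assumptions, not the claimed implementation: a naive DFT stands in for the Fourier transform, the spectral envelope is estimated with a moving average (the claim does not specify the filter), and the normalization of the peak by the summed absolute band values is a choice made here so that the result falls between 0 and 1. The function names and default parameters are hypothetical. The sketch relies on the fact that the Fourier transform of a power spectrum is an autocorrelation function.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    return [sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(x))
            for k in range(n)]


def band_voicing(frame, n_bands=4, smooth=9):
    """Return one degree-of-voicing value in [0, 1] per band for one frame."""
    n = len(frame)
    # Hann-windowed frame -> one-sided power spectrum (the "spectral data").
    win = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
           for i, s in enumerate(frame)]
    power = [abs(c) ** 2 for c in dft(win)[: n // 2]]
    # Assumed envelope filter: subtract a moving average of the spectrum.
    half = smooth // 2
    flat = []
    for k in range(len(power)):
        lo, hi = max(0, k - half), min(len(power), k + half + 1)
        flat.append(power[k] - sum(power[lo:hi]) / (hi - lo))
    scale = sum(power) / len(power)
    width = len(flat) // n_bands
    degrees = []
    for b in range(n_bands):
        band = flat[b * width:(b + 1) * width]
        total = sum(abs(v) for v in band)
        if total <= 1e-9 * scale:
            # Nothing left after envelope removal: treat the band as unvoiced.
            degrees.append(0.0)
            continue
        # Transforming the flattened band spectrum gives an
        # autocorrelation-like function; take its peak away from index 0.
        acf = [abs(c) for c in dft(band)]
        peak = max(acf[1: width // 2 + 1])
        degrees.append(min(1.0, peak / total))
    return degrees
```

A frame built from harmonics of a single fundamental leaves a regularly spaced ripple in every band after envelope removal, so each band yields a strong peak. A frame with a flat spectrum leaves nothing behind and yields zero in every band.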
- 3. The method as recited in claim 2, further comprising the steps of:
- determining an average magnitude for each of the plurality of bands;
- applying a logarithmic function to the average magnitude to generate a converted average magnitude;
- decorrelating the converted average magnitude to generate spectral envelope features; and
- applying the spectral envelope features for each of the plurality of bands to the corresponding plurality of phoneme models.
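The feature chain recited in claim 3 (band average, logarithm, decorrelation) can be sketched as below. The claim does not name a decorrelating transform; the sketch assumes an unnormalized DCT-II, as is common for cepstral-style features, and that choice is an assumption rather than something stated in the claim. The function name and the small offset guarding against `log(0)` are likewise hypothetical.

```python
import math


def envelope_features(magnitudes, n_bands=4):
    """Band-average a magnitude spectrum, take logs, then decorrelate."""
    width = len(magnitudes) // n_bands
    # Average magnitude for each band.
    avg = [sum(magnitudes[b * width:(b + 1) * width]) / width
           for b in range(n_bands)]
    # Logarithmic conversion; the tiny offset avoids log(0) on empty bands.
    log_avg = [math.log(a + 1e-12) for a in avg]
    # Assumed decorrelation step: unnormalized DCT-II across the bands.
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n_bands)
                for i, v in enumerate(log_avg))
            for k in range(n_bands)]
```

With this transform, a spectrum that is flat across bands produces a single non-zero coefficient (the first), which reflects how the transform concentrates the overall level into one feature and the spectral shape into the others.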
- 4. The method as recited in claim 2, wherein the value of the magnitude of the autocorrelation function is a peak magnitude.
- 5. The method as recited in claim 2, wherein for each of the plurality of frame intervals, the value of the magnitude of the autocorrelation function for each of the plurality of bands is determined by:
- summing the autocorrelation function of each of the plurality of bands to generate a composite autocorrelation function;
- determining a peak magnitude of the composite autocorrelation function;
- determining from the peak magnitude a corresponding frequency mark; and
- utilizing the corresponding frequency mark to determine a corresponding magnitude value for each of the plurality of bands.
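The selection procedure recited in claim 5 can be sketched as follows, assuming each band's autocorrelation function is already available as a list of magnitudes of equal length. In this sketch the "frequency mark" is represented simply as a list index, and index 0 is skipped when searching for the peak; both are assumptions made for illustration. The function name is hypothetical.

```python
def voicing_at_common_peak(band_acfs):
    """Pick one common index from the summed autocorrelation and read every
    band's magnitude at that index."""
    n = len(band_acfs[0])
    # Composite autocorrelation: element-wise sum across bands.
    composite = [sum(acf[i] for acf in band_acfs) for i in range(n)]
    # Peak of the composite, ignoring index 0 (assumed to be the trivial peak).
    mark = max(range(1, n), key=lambda i: composite[i])
    # Per-band magnitude at the shared mark.
    return mark, [acf[mark] for acf in band_acfs]
```

For example, `voicing_at_common_peak([[9, 1, 5, 2], [9, 4, 3, 1], [9, 0, 6, 1]])` returns `(2, [5, 3, 6])`: the composite is `[27, 5, 14, 4]`, its largest value away from index 0 is at index 2, and each band is read at that index. Using one shared mark ties all bands to the same periodicity, instead of letting each band pick its own peak as in claim 4.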
- 6. The method as recited in claim 2, wherein the Fourier transform comprises a fast Fourier transform.
- 7. The method as recited in claim 1, wherein the communication system is a radio communication system, and wherein the receiving device is a SCR (selective call radio).
- 8. In a receiving device, a method comprising the steps of:
- receiving from a communication system a sequence of phoneme indexes representative of a caller's voice message;
- receiving voice features corresponding to each phoneme index of the sequence of phoneme indexes, wherein the voice features for each phoneme index are representative of a corresponding phoneme derived from the caller's voice message;
- searching for a sequence of phoneme models corresponding to the sequence of phoneme indexes, wherein the sequence of phoneme models are derived from a plurality of predetermined phoneme models;
- modifying each phoneme model of the sequence of phoneme models according to the voice features of each phoneme index; and
- generating an audible voice message representative of the caller's voice message according to the sequence of modified phoneme models, wherein the voice features for the corresponding phoneme comprise:
- spectral features of at least a portion of the corresponding phoneme;
- an average energy level of the corresponding phoneme;
- a duration of the corresponding phoneme; and
- a pitch period representative of a periodicity of the corresponding phoneme.
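The receiving-side steps of claim 8 (look up a predetermined model by index, modify it with the received voice features, generate a signal) can be pictured with the toy sketch below. It is a deliberately crude assumption, not the claimed synthesizer: each phoneme is rendered as a plain sine wave, the spectral features are ignored, and the model table, class, field names, and units are all hypothetical.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PhonemeModel:
    name: str
    energy: float            # mean-square level of the rendered segment
    duration_ms: float
    pitch_period_ms: float   # 0.0 means silent/unvoiced in this toy


# Hypothetical predetermined models stored in the receiving device.
MODELS = (
    PhonemeModel("sil", 0.0, 50.0, 0.0),
    PhonemeModel("eh", 1.0, 90.0, 8.0),
    PhonemeModel("ow", 1.0, 120.0, 9.0),
)


def adapt(model, energy, duration_ms, pitch_period_ms):
    """Overwrite the model defaults with the caller-specific features."""
    return replace(model, energy=energy, duration_ms=duration_ms,
                   pitch_period_ms=pitch_period_ms)


def synthesize(models, sample_rate=8000):
    """Render each adapted model as a sine segment (toy stand-in)."""
    samples = []
    for m in models:
        count = round(m.duration_ms * sample_rate / 1000)
        period = m.pitch_period_ms * sample_rate / 1000  # in samples
        amp = math.sqrt(2 * m.energy)  # a sine of this amplitude has mean square = energy
        for t in range(count):
            samples.append(amp * math.sin(2 * math.pi * t / period)
                           if period > 0 else 0.0)
    return samples


def decode(message):
    """message: iterable of (index, energy, duration_ms, pitch_period_ms)."""
    adapted = [adapt(MODELS[i], e, d, p) for i, e, d, p in message]
    return synthesize(adapted)
```

Calling `decode([(1, 0.5, 10.0, 5.0), (0, 0.0, 5.0, 0.0)])` yields 120 samples at the assumed 8 kHz rate: 80 samples of a tone carrying the received energy and pitch period, followed by 40 samples of silence. The stored model supplies the generic phoneme, and the received features supply what is specific to the caller.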
- 9. The method as recited in claim 8, wherein the communication system is a radio communication system, and wherein the receiving device is a SCR (selective call radio).
- 10. A communication system, comprising:
- a transmitter for transmitting messages to a plurality of receiving devices coupled to the communication system; and
- a processing system coupled to the transmitter, wherein the processing system is adapted to:
- convert a caller's voice message to a sequence of phonemes, whereby the caller's voice message is intended for a receiving device;
- generate a sequence of phoneme indexes corresponding to the sequence of phonemes;
- generate voice features corresponding to each phoneme of the sequence of phonemes, wherein the voice features are determined from a portion of the caller's voice message; and
- cause the transmitter to transmit the sequence of phoneme indexes and the corresponding voice features to the receiving device for generating a voice signal representative of the caller's voice message, wherein the voice features for a corresponding phoneme comprise:
- spectral features of at least a portion of the corresponding phoneme;
- an average energy level of the corresponding phoneme;
- a duration of the corresponding phoneme; and
- a pitch period representative of a periodicity of the corresponding phoneme.
- 11. The communication system as recited in claim 10, wherein the processing system is adapted to cause a voice recognition system to perform the converting step and the generating step.
- 12. The communication system as recited in claim 10, wherein the step of converting the caller's voice message to the sequence of phonemes includes the steps of:
- sampling a voice signal generated by a caller during a plurality of frame intervals, wherein the voice signal is representative of a message intended for the receiving device;
- applying a Fourier transform to a plurality of frame intervals of the sampled voice signal to generate spectral data having a spectral envelope for each of the plurality of frame intervals;
- subdividing the spectral data for each of the plurality of frame intervals into a plurality of bands;
- filtering out the spectral envelope from the spectral data of each of the plurality of frame intervals to generate filtered spectral data for each of the plurality of frame intervals;
- applying a Fourier transform to the filtered spectral data for each of the plurality of bands to generate an autocorrelation function for each of the plurality of bands;
- measuring a value of the magnitude of the autocorrelation function for each of the plurality of bands, whereby the value is a measure of a degree of voiceness for each of the plurality of bands;
- applying the degree of voiceness for each of the plurality of bands to a corresponding plurality of phoneme models; and
- deriving the sequence of phonemes from the voice signal by searching through a phoneme library according to predictions made by the corresponding plurality of phoneme models.
- 13. The communication system as recited in claim 10, wherein the communication system is a radio communication system, and wherein the receiving device is a SCR (selective call radio).
- 14. A receiving device, comprising:
- a receiver;
- a memory;
- an audio circuit; and
- a processor coupled to the receiver, the memory, and the audio circuit, wherein the processor is adapted to:
- cause the receiver to receive from a communication system a sequence of phoneme indexes representative of a caller's voice message;
- cause the receiver to receive from the communication system voice features for each phoneme index of the sequence of phoneme indexes, wherein the voice features for each phoneme index are representative of a corresponding phoneme derived from the caller's voice message;
- search in the memory for a sequence of phoneme models corresponding to the sequence of phoneme indexes, wherein the sequence of phoneme models are derived from a plurality of predetermined phoneme models stored in the memory;
- modify each phoneme model of the sequence of phoneme models according to the voice features of each corresponding phoneme index;
- generate a voice signal according to the sequence of modified phoneme models; and
- cause the audio circuit to generate an audible voice message in response to the voice signal, wherein the audible voice message is representative of the caller's voice message, wherein the voice features for the corresponding phoneme comprise:
- spectral features of at least a portion of the corresponding phoneme;
- an average energy level of the corresponding phoneme;
- a duration of the corresponding phoneme; and
- a pitch period representative of a periodicity of the corresponding phoneme.
- 15. The receiving device as recited in claim 14, wherein the processor is adapted to cause a synthesizer circuit to perform the generating step.
- 16. The receiving device as recited in claim 14, wherein the communication system is a radio communication system, and wherein the receiving device is a SCR (selective call radio).
RELATED INVENTIONS
The present invention is related to the following inventions which are assigned to the same assignee as the present invention:
U.S. application Ser. No. 09/050,184 filed Mar. 30, 1998 by Andric et al., entitled "Voice Recognition System in a Radio Communication System and Method Therefor."
U.S. application Ser. No. 09/067,779, filed Apr. 27, 1998, mailed Apr. 23, 1998 by Cheng et al., entitled "Reliable Conversion of Voice in a Radio Communication System and Method Therefor."