Claims
- 1. A method for speech synthesis in an electronic speech synthesis system, the speech synthesis method comprising:
- a) storing in a memory of the electronic speech synthesis system a voice table comprised of a set of phonetic waveforms, each phonetic waveform of the set of phonetic waveforms corresponding to a demi-diphone of the voice table;
- b) receiving as an input to the electronic speech synthesis system a phonetic string representative of speech to be synthesized by the electronic speech system, the phonetic string comprising diphones, the diphones comprising demi-diphones;
- c) generating synthetic speech of the phonetic string representative of speech in the electronic speech synthesis system by outputting stored voice table phonetic waveforms by:
- i) retrieving a stored voice table phonetic waveform corresponding to a demi-diphone of the input phonetic string representative of speech in the case of the demi-diphone of the phonetic string representative of speech having a phonetic waveform in the voice table corresponding to the demi-diphone;
- ii) retrieving a stored voice table phonetic waveform not corresponding to a demi-diphone of the input phonetic string representative of speech in the case of the demi-diphone of the phonetic string representative of speech not having a phonetic waveform in the voice table corresponding to the demi-diphone by locating a substitute demi-diphone of the voice table having a corresponding stored voice table phonetic waveform which has phonetic features meeting:
- A) a threshold set of phonetic features of the demi-diphone not having a corresponding stored voice table phonetic waveform, wherein the threshold set describes a minimum set of characteristics that must be shared by:
- 1) the demi-diphone of the phonetic string representative of speech not having a phonetic waveform in the voice table corresponding to the demi-diphone and,
- 2) the substitute demi-diphone of the voice table having a corresponding stored voice table phonetic waveform; and,
- B) the most features in common with the demi-diphone not having a corresponding stored voice table phonetic waveform.
- 2. An electronic speech synthesis system comprising:
- a) means for storing in a memory of the electronic speech synthesis system a voice table comprised of a set of phonetic waveforms, each phonetic waveform of the set of phonetic waveforms corresponding to a demi-diphone of the voice table;
- b) means for receiving as an input to the electronic speech synthesis system a phonetic string representative of speech to be synthesized by the electronic speech system, the phonetic string comprising diphones, the diphones comprising demi-diphones;
- c) means for generating synthetic speech of the phonetic string representative of speech in the electronic speech synthesis system by outputting stored voice table phonetic waveforms by:
- i) retrieving a stored voice table phonetic waveform corresponding to a demi-diphone of the input phonetic string representative of speech in the case of the demi-diphone of the phonetic string representative of speech having a phonetic waveform in the voice table corresponding to the demi-diphone;
- ii) retrieving a stored voice table phonetic waveform not corresponding to a demi-diphone of the input phonetic string representative of speech in the case of the demi-diphone of the phonetic string representative of speech not having a phonetic waveform in the voice table corresponding to the demi-diphone by locating a substitute demi-diphone of the voice table having a corresponding stored voice table phonetic waveform which has phonetic features meeting:
- A) a threshold set of phonetic features of the demi-diphone not having a corresponding stored voice table phonetic waveform, wherein the threshold set describes a minimum set of characteristics that must be shared by:
- 1) the demi-diphone of the phonetic string representative of speech not having a phonetic waveform in the voice table corresponding to the demi-diphone and,
- 2) the substitute demi-diphone of the voice table having a corresponding stored voice table phonetic waveform; and,
- B) the most features in common with the demi-diphone not having a corresponding stored voice table phonetic waveform.
- 3. An electronic speech synthesis system comprising:
- a) a memory for storing a voice table comprised of a set of phonetic waveforms, each phonetic waveform of the set of phonetic waveforms corresponding to a demi-diphone of the voice table;
- b) an input comprising a phonetic string representative of speech to be synthesized by the electronic speech system, the phonetic string comprising diphones, the diphones comprising demi-diphones;
- c) an output comprising:
- i) a stored voice table phonetic waveform corresponding to a demi-diphone of the input phonetic string representative of speech in the case of the demi-diphone of the phonetic string representative of speech having a phonetic waveform in the voice table corresponding to the demi-diphone;
- ii) a stored voice table phonetic waveform not corresponding to a demi-diphone of the input phonetic string representative of speech not having a phonetic waveform in the voice table corresponding to the demi-diphone by locating a substitute demi-diphone of the voice table having a corresponding stored voice table phonetic waveform which has phonetic features meeting:
- A) a threshold set of phonetic features of the demi-diphone not having a corresponding stored voice table phonetic waveform, wherein the threshold set describes a minimum set of characteristics that must be shared by:
- 1) the demi-diphone of the phonetic string representative of speech not having a phonetic waveform in the voice table corresponding to the demi-diphone and,
- 2) the substitute demi-diphone of the voice table having a corresponding stored voice table phonetic waveform; and,
- B) the most features in common with the demi-diphone not having a corresponding stored voice table phonetic waveform.
- 4. A program storage medium having a program stored therein for causing a computer to perform the steps of:
- a) storing in a memory of the computer a voice table comprised of a set of phonetic waveforms, each phonetic waveform of the set of phonetic waveforms corresponding to a demi-diphone of the voice table;
- b) receiving as an input to the computer a phonetic string representative of speech to be synthesized by the computer, the phonetic string comprising diphones, the diphones comprising demi-diphones;
- c) generating synthetic speech of the phonetic string representative of speech in the computer by outputting stored voice table phonetic waveforms by:
- i) retrieving a stored voice table phonetic waveform corresponding to a demi-diphone of the input phonetic string representative of speech in the case of the demi-diphone of the input phonetic string representative of speech having a phonetic waveform in the voice table corresponding to the demi-diphone;
- ii) retrieving a stored voice table phonetic waveform not corresponding to a demi-diphone of the input phonetic string representative of speech in the case of the demi-diphone of the phonetic string representative of speech not having a phonetic waveform in the voice table corresponding to the demi-diphone by locating a substitute demi-diphone of the voice table having a corresponding stored voice table phonetic waveform which has phonetic features meeting:
- A) a threshold set of phonetic features of the demi-diphone not having a corresponding stored voice table phonetic waveform, wherein the threshold set describes a minimum set of characteristics that must be shared by:
- 1) the demi-diphone of the phonetic string representative of speech not having a phonetic waveform in the voice table corresponding to the demi-diphone and,
- 2) the substitute demi-diphone of the voice table having a corresponding stored voice table phonetic waveform; and,
- B) the most features in common with the demi-diphone not having a corresponding stored voice table phonetic waveform.
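The selection logic recited in step c) of the claims above can be illustrated with a minimal Python sketch. It is offered only as an illustration under stated assumptions, not as the claimed implementation: the demi-diphone keys (`"b_a"`, `"p_a"`), the feature tags, the contents of `THRESHOLD_FEATURES`, and the function name `select_waveform` are all hypothetical, since the claims do not enumerate phonetic features or prescribe a data layout. The sketch returns the stored waveform when one exists; otherwise it keeps only candidates that share the threshold set and picks the one with the most features in common.

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical feature tags; the claims do not list concrete features.
THRESHOLD_FEATURES: FrozenSet[str] = frozenset(
    {"consonant", "vowel", "stop", "fricative", "nasal"}
)

# Hypothetical phonetic features for each demi-diphone key.
FEATURES: Dict[str, FrozenSet[str]] = {
    "p_a": frozenset({"consonant", "stop", "bilabial", "unvoiced"}),
    "b_a": frozenset({"consonant", "stop", "bilabial", "voiced"}),
    "d_a": frozenset({"consonant", "stop", "alveolar", "voiced"}),
    "s_a": frozenset({"consonant", "fricative", "alveolar", "unvoiced"}),
    "a_p": frozenset({"vowel", "low", "voiced"}),
}

# Hypothetical voice table: "p_a" and "a_p" have no stored waveform.
VOICE_TABLE: Dict[str, bytes] = {
    "b_a": b"waveform-b_a",
    "d_a": b"waveform-d_a",
    "s_a": b"waveform-s_a",
}


def select_waveform(
    target: str,
    voice_table: Dict[str, bytes],
    features: Dict[str, FrozenSet[str]],
    threshold: FrozenSet[str] = THRESHOLD_FEATURES,
) -> Tuple[str, bytes]:
    """Return (key, waveform) for the target or for its best substitute."""
    # Step i): the demi-diphone has its own stored waveform.
    if target in voice_table:
        return target, voice_table[target]

    # Step ii): locate a substitute demi-diphone in the voice table.
    target_feats = features[target]
    # Condition A): the minimum characteristics a substitute must share.
    required = target_feats & threshold

    best_key = None
    best_shared = -1
    for key in voice_table:
        candidate_feats = features[key]
        if not required <= candidate_feats:
            continue  # fails the threshold set
        # Condition B): prefer the candidate with the most shared features.
        shared = len(target_feats & candidate_feats)
        if shared > best_shared:
            best_key, best_shared = key, shared

    if best_key is None:
        raise LookupError(f"no substitute meets the threshold for {target!r}")
    return best_key, voice_table[best_key]
```

With these illustrative tables, a request for `"p_a"` falls back to `"b_a"`: both `"b_a"` and `"d_a"` satisfy the threshold set (consonant, stop), and `"b_a"` shares three features with the target while `"d_a"` shares two. A request for `"a_p"` raises `LookupError` because no stored demi-diphone is a vowel; how such a case should be handled is not addressed by the claims and is an assumption of the sketch.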
CROSS-REFERENCE TO RELATED APPLICATIONS
This is a continuation of application Ser. No. 08/675,424, filed Jul. 3, 1996, which is a continuation of application Ser. No. 08/007,297, filed Jan. 21, 1993.
This application is related to co-pending patent application having Ser. No. 08/006,881, entitled "METHOD AND APPARATUS FOR SYNTHETIC SPEECH IN FACIAL ANIMATION" having the same inventive entity, assigned to the assignee of the present application, and filed with the United States Patent and Trademark Office on the same day as the present application.
US Referenced Citations (7)
Non-Patent Literature Citations (3)
- J.R. Deller, "Discrete-Time Processing of Speech Signals," 1987, pp. 115-137.
- T. Parsons, "Voice and Speech Processing," 1987, pp. 92-96.
- L.R. Rabiner, "Digital Processing of Speech Signals," 1978, pp. 42-43.
Continuations (2)
| | Number | Date | Country |
|---|---|---|---|
| Parent | 675424 | Jul 1996 | |
| Parent | 007297 | Jan 1993 | |