Claims
- 1. A method for converting text to concatenated voice by utilizing a digital voice library and a set of playback rules, the digital voice library including a plurality of speech items including words and syllables and a corresponding plurality of voice recordings wherein each speech item corresponds to at least one available voice recording, the method comprising:
training the digital voice library to associate each syllable speech item with a literal text syllable of the particular syllable speech item.
- 2. The method of claim 1 further comprising:
receiving a sequence of words including known words that correspond to word speech items in the digital voice library and including unknown words;
converting each known word into a word speech item in accordance with the digital voice library; and
for each unknown word, parsing the unknown word to determine a sequence of literal text syllables and converting the text syllable sequence to a sequence of syllable speech items in accordance with the digital voice library.
- 3. The method of claim 2 further comprising:
converting the sequence of word speech items and syllable speech items into a sequence of voice recordings in accordance with the set of playback rules.
- 4. The method of claim 3 further comprising:
generating voice data based on the sequence of voice recordings by concatenating adjacent recordings in the sequence of voice recordings.
- 5. The method of claim 4 wherein training the digital voice library further comprises:
utilizing a neural network having an input and an output to train the digital voice library with the neural network receiving the literal text syllable of the particular syllable speech item as input and with the neural network outputting the associated syllable speech item.
- 6. The method of claim 4 wherein training the digital voice library further comprises:
manually associating each syllable speech item with the literal text syllable of the particular syllable speech item.
- 7. The method of claim 4 wherein, for each unknown word, parsing and converting further comprises:
parsing the unknown word to determine a sequence of literal text syllables and known words, and converting the sequence to a sequence of syllable speech items and word speech items in accordance with the digital voice library.
- 8. The method of claim 7 wherein parsing further comprises:
parsing the unknown word in the forward direction to determine any known words;
parsing the unknown word in the reverse direction to determine any known words;
where any known words overlap, selecting the larger word;
parsing the unknown word in the forward direction to determine any literal text syllables; and
parsing the unknown word in the reverse direction to determine any literal text syllables.
- 9. The method of claim 7 wherein multiple voice recordings that correspond to a single speech item represent various inflections of that single speech item, and wherein converting the sequence of word speech items and syllable speech items further comprises:
determining a desired inflection for each speech item in the sequence of speech items based on the set of playback rules; and
determining a sequence of voice recordings by determining a voice recording for each speech item based on the desired inflection for the particular speech item and based on the available voice recordings that correspond to the particular speech item.
- 10. The method of claim 7 wherein multiple voice recordings that correspond to a single speech item represent various inflections and ligatures of that single speech item, and wherein converting the sequence of word speech items and syllable speech items further comprises:
determining a desired inflection and desired ligatures for each speech item in the sequence of speech items based on the set of playback rules; and
determining a sequence of voice recordings by determining a voice recording for each speech item based on the desired inflection and desired ligatures for the particular speech item and based on the available voice recordings that correspond to the particular speech item.
- 11. The method of claim 4 further comprising:
for each unknown word, after the unknown word is parsed, storing results of the parsing in the digital voice library so that a next encounter with the same unknown word may be handled more efficiently.
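For illustration only, the parsing, conversion, and concatenation steps recited in claims 3, 4, 7, 8, 9, and 11 might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the toy library contents, the byte-string stand-ins for voice recordings, the "neutral"/"falling" inflection labels, the final-item playback rule, and every function name are hypothetical. Characters that match neither a known word nor a literal text syllable are simply skipped in this sketch.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # half-open [start, end) character range

# Hypothetical toy library: speech item text -> {inflection: recording bytes}.
WORD_ITEMS: Dict[str, Dict[str, bytes]] = {
    "base": {"neutral": b"<base>", "falling": b"<base.>"},
    "ball": {"neutral": b"<ball>", "falling": b"<ball.>"},
}
SYLLABLE_ITEMS: Dict[str, Dict[str, bytes]] = {
    "er": {"neutral": b"<er>"},
    "s": {"neutral": b"<s>", "falling": b"<s.>"},
}
PARSE_CACHE: Dict[str, List[str]] = {}  # stored parse results (claim 11)


def forward_matches(text: str, vocab) -> List[Span]:
    """Scan left to right, taking the longest vocab entry at each position."""
    spans, i = [], 0
    while i < len(text):
        ends = [j for j in range(len(text), i, -1) if text[i:j] in vocab]
        if ends:
            spans.append((i, ends[0]))
            i = ends[0]
        else:
            i += 1
    return spans


def reverse_matches(text: str, vocab) -> List[Span]:
    """Scan right to left, taking the longest vocab entry ending at each position."""
    spans, j = [], len(text)
    while j > 0:
        starts = [i for i in range(0, j) if text[i:j] in vocab]
        if starts:
            spans.append((starts[0], j))
            j = starts[0]
        else:
            j -= 1
    return spans


def keep_larger(spans: List[Span]) -> List[Span]:
    """Where candidate spans overlap, keep the longer one; return in text order."""
    kept: List[Span] = []
    for s in sorted(set(spans), key=lambda sp: (sp[0] - sp[1], sp[0])):
        if all(s[1] <= k[0] or s[0] >= k[1] for k in kept):
            kept.append(s)
    return sorted(kept)


def both_directions(text: str, vocab) -> List[Span]:
    """Combine the forward and reverse scans, resolving overlaps."""
    return keep_larger(forward_matches(text, vocab) + reverse_matches(text, vocab))


def parse_unknown(word: str) -> List[str]:
    """Split an unknown word into known words and literal text syllables."""
    if word in PARSE_CACHE:
        return PARSE_CACHE[word]
    items: List[str] = []
    pos = 0
    # The zero-length sentinel span lets the loop also handle the trailing gap.
    for start, end in both_directions(word, WORD_ITEMS) + [(len(word), len(word))]:
        gap = word[pos:start]  # text between known words is parsed for syllables
        items += [gap[a:b] for a, b in both_directions(gap, SYLLABLE_ITEMS)]
        if end > start:
            items.append(word[start:end])
        pos = end
    PARSE_CACHE[word] = items
    return items


def desired_inflection(index: int, count: int) -> str:
    """Hypothetical playback rule: the final item falls, the rest stay neutral."""
    return "falling" if index == count - 1 else "neutral"


def to_voice_data(items: List[str]) -> bytes:
    """Pick one recording per speech item, then concatenate adjacent recordings."""
    recordings = []
    for n, item in enumerate(items):
        available = WORD_ITEMS.get(item) or SYLLABLE_ITEMS[item]
        wanted = desired_inflection(n, len(items))
        # Fall back to the neutral recording when the desired one is unavailable.
        recordings.append(available.get(wanted, available["neutral"]))
    return b"".join(recordings)
```

With this toy library, `parse_unknown("baseballers")` yields the known words "base" and "ball" followed by the syllables "er" and "s", and `to_voice_data` joins one recording per item, using the "falling" recording only for the last item. A second call with the same unknown word returns the stored parse without rescanning.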
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of U.S. provisional application Ser. No. 60/241,572 filed Oct. 19, 2000.
Provisional Applications (1)
| Number | Date | Country |
| --- | --- | --- |
| 60241572 | Oct 2000 | US |