Claims
- 1. A method of identifying prosody for a synthesized speech segment that is formed from a string of lexical words, the method comprising:
converting the string of lexical words into a string of prosodic words; and identifying the prosody from the string of prosodic words.
- 2. The method of claim 1 wherein converting the string of lexical words into a string of prosodic words comprises combining at least two lexical words in the string of lexical words to form a prosodic word in the string of prosodic words.
- 3. The method of claim 2 wherein combining at least two lexical words comprises:
identifying at least one category for each lexical word; and determining whether to combine the two lexical words based on the categories of the lexical words.
- 4. The method of claim 3 wherein determining whether to combine the two lexical words comprises applying the categories of the lexical words to a classification and regression tree.
- 5. The method of claim 3 wherein determining whether to combine the two lexical words comprises examining a probability that describes the likelihood that the lexical words form a prosodic word given the categories.
- 6. The method of claim 1 wherein converting the string of lexical words into a string of prosodic words comprises dividing a lexical word into smaller prosodic words.
- 7. The method of claim 6 wherein dividing a lexical word into smaller prosodic words comprises accessing an annotated lexicon to determine how to divide the lexical word into smaller prosodic words.
- 8. The method of claim 1 wherein converting the string of lexical words into a string of prosodic words comprises:
dividing at least one lexical word in the string of lexical words into smaller prosodic words to form a modified string; and combining at least two words in the modified string into a prosodic word.
- 9. The method of claim 1 wherein identifying the prosody from the string of prosodic words comprises identifying at least one prosodic feature from the set of prosodic features consisting of pitch contour, duration, pauses, word initial, word middle and word end.
- 10. A method of training a model for converting a string of lexical words into a string of prosodic words, the method comprising:
identifying a pair of lexical words that form a single prosodic word when spoken; identifying categories for the pair of lexical words; and training the model based on the identification of the pair of lexical words and the categories for the pair of lexical words.
- 11. The method of claim 10 wherein training the model comprises training a statistical model.
- 12. The method of claim 11 wherein training a statistical model comprises:
identifying a set of categories for each pair of lexical words in the strings of lexical words; producing a category count for each set of categories by counting the number of pairs of lexical words for which the set of categories was identified; producing a prosodic word count for each set of categories by counting the number of pairs of lexical words that were identified as forming a single prosodic word and for which the set of categories was identified; and using the prosodic word count and the category count to train the statistical model.
- 13. The method of claim 12 further comprising using a weighting function with the prosodic word count and the category count to train the statistical model.
- 14. The method of claim 13 wherein the weighting function gives preference to sets of categories that have a high category count.
- 15. The method of claim 10 wherein training the model comprises training a classification and regression tree.
- 16. The method of claim 10 further comprising annotating a lexicon to indicate how to divide at least one lexical word into multiple prosodic words.
- 17. The method of claim 16 wherein annotating a lexicon comprises:
removing words with more than a selected number of characters from a lexicon to form a short-word lexicon; and segmenting each removed word based on words in the short-word lexicon to produce smaller words.
- 18. The method of claim 17 wherein annotating the lexicon further comprises:
combining at least some of the smaller words to form combined words, the combined words and the smaller words that are not combined forming prosodic words; and annotating the lexicon based on the prosodic words.
- 19. The method of claim 18 wherein combining at least some of the smaller words comprises using the model to convert the smaller words into combined words.
- 20. A computer-readable medium having computer-executable instructions for performing steps comprising:
identifying lexical words in a string of characters; identifying prosodic words from the lexical words; and using the prosodic word boundaries when setting the prosody for synthesized speech formed from the string of characters.
- 21. The computer-readable medium of claim 20 wherein the step of identifying prosodic words comprises combining a pair of lexical words to form a prosodic word.
- 22. The computer-readable medium of claim 21 wherein combining lexical words comprises combining lexical words on the basis of a model.
- 23. The computer-readable medium of claim 22 wherein the model comprises a statistical model.
- 24. The computer-readable medium of claim 22 wherein the model comprises a classification and regression tree.
- 25. The computer-readable medium of claim 20 wherein the step of identifying prosodic words comprises dividing a lexical word into at least two prosodic words.
- 26. The computer-readable medium of claim 25 wherein dividing a lexical word comprises:
accessing a lexicon to find an entry for the lexical word; retrieving information from the entry describing how the lexical word is to be divided; and dividing the lexical word based on the information.
- 27. The computer-readable medium of claim 20 wherein the step of identifying prosodic words comprises:
dividing at least one lexical word into at least two prosodic words and replacing the lexical word with the prosodic words to form an intermediate string of words comprising at least one of the lexical words identified from the string of characters and the at least two prosodic words; and combining at least two words in the intermediate string of words to form a prosodic word.
- 28. A method for annotating a text corpus, the method comprising:
converting lexical words in the text corpus into prosodic words; and annotating the text corpus to indicate the location of the prosodic words in the text corpus.
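By way of illustration only, the counting scheme recited in claims 12 to 14 can be sketched in Python as follows. The sketch keeps a category count and a prosodic word count for each set of categories and combines them into a score. The function name `train_pair_model`, the representation of a set of categories as a pair of tags, and the weighting function `n / (n + k)` are assumptions made for this example; the claims do not specify any of them.

```python
from collections import Counter


def train_pair_model(labeled_pairs, k=1.0):
    """Estimate a score per category set from labeled pairs of lexical words.

    labeled_pairs: iterable of ((left_category, right_category), is_prosodic_word).
    k: hypothetical smoothing constant for the assumed weighting function.
    """
    category_count = Counter()   # pairs observed for each set of categories
    prosodic_count = Counter()   # of those, pairs forming a single prosodic word
    for categories, is_prosodic_word in labeled_pairs:
        category_count[categories] += 1
        if is_prosodic_word:
            prosodic_count[categories] += 1

    model = {}
    for categories, n in category_count.items():
        probability = prosodic_count[categories] / n
        # Assumed weighting: approaches 1 as the category count grows,
        # so frequently observed category sets are given preference.
        weight = n / (n + k)
        model[categories] = probability * weight
    return model
```

With this assumed weighting, a category set seen only once scores lower than a set seen many times with the same proportion of prosodic words, which is one possible reading of the preference described in claim 14.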
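The two-stage conversion recited in claims 6 to 8 (dividing lexical words using an annotated lexicon, then combining adjacent words) can be sketched in Python as below. This is a minimal sketch under stated assumptions: the function name `to_prosodic_words`, the dictionary layout of the annotated lexicon, the greedy left-to-right pairing, and the fixed `threshold` are all hypothetical choices for illustration, and `pair_scores` stands in for whatever model supplies the likelihood of claim 5.

```python
def to_prosodic_words(tagged_words, split_lexicon, pair_scores, threshold=0.5):
    """Convert (word, category) pairs into a list of prosodic words.

    tagged_words: list of (lexical_word, category).
    split_lexicon: hypothetical annotated lexicon mapping a lexical word to a
        list of (smaller_word, category) entries describing how to divide it.
    pair_scores: mapping from (left_category, right_category) to a score.
    Each prosodic word is returned as a tuple of its constituent words.
    """
    # Stage 1: divide lexical words that the lexicon marks as divisible.
    modified = []
    for word, category in tagged_words:
        parts = split_lexicon.get(word)
        if parts:
            modified.extend(parts)
        else:
            modified.append((word, category))

    # Stage 2: combine adjacent pairs whose categories score above threshold.
    prosodic_words = []
    i = 0
    while i < len(modified):
        word, category = modified[i]
        if i + 1 < len(modified):
            next_word, next_category = modified[i + 1]
            score = pair_scores.get((category, next_category), 0.0)
            if score >= threshold:
                prosodic_words.append((word, next_word))
                i += 2
                continue
        prosodic_words.append((word,))
        i += 1
    return prosodic_words
```

The greedy pairing merges at most two words at a time, mirroring the "at least two" language of claim 2 in its simplest form; a fuller implementation might consider longer combinations.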
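The lexicon annotation of claim 17 can likewise be sketched in Python. The sketch removes words longer than a selected number of characters to form a short-word lexicon and then segments each removed word using the remaining short words. The claims do not say how the segmentation is performed; forward maximum matching with a single-character fallback is assumed here, and the function name `annotate_long_words` and the parameter `max_len` are hypothetical. The subsequent combining step of claim 18 is not shown.

```python
def annotate_long_words(lexicon, max_len=2):
    """Map each long word to a segmentation built from the short-word lexicon.

    lexicon: iterable of words.
    max_len: the selected number of characters; longer words are removed.
    """
    words = list(lexicon)
    short_words = {w for w in words if len(w) <= max_len}
    removed = [w for w in words if len(w) > max_len]

    annotations = {}
    for word in removed:
        pieces = []
        i = 0
        while i < len(word):
            # Assumed strategy: try the longest candidate first.
            for size in range(min(max_len, len(word) - i), 0, -1):
                piece = word[i:i + size]
                # A single character is always accepted so the loop terminates.
                if size == 1 or piece in short_words:
                    pieces.append(piece)
                    i += size
                    break
        annotations[word] = pieces
    return annotations
```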
REFERENCE TO RELATED APPLICATION
[0001] The present application claims priority to a U.S. Provisional application having serial No. 60/251,167, filed on Dec. 4, 2000 and entitled “PROSODIC WORD SEGMENTATION AND MULTI-TIER NONUNIFORM UNIT SELECTION”.
Provisional Applications (1)
Number | Date | Country
60251167 | Dec 2000 | US