Claims
- 1. A method of modeling phenomena comprising the steps of:
creating a set of tags, each tag controlling one or more aspects of one or more phenomena; arranging selected members of the set of tags in a desired sequence to produce phenomena as defined by the sequence of tags; and processing the tags in order to produce phenomena having the characteristics defined by the tags.
- 2. The method of claim 1 wherein the phenomena controlled by the tags are characteristics of speech, wherein the step of arranging selected members of the tags in a desired sequence comprises placing the selected members of the set of tags into a body of text and wherein the step of processing the tags comprises processing the body of text and the tags to produce speech having characteristics defined by the tags.
- 3. The method of claim 2 wherein the characteristics of speech are prosodic characteristics of speech.
- 4. The method of claim 3 wherein each tag imposes a constraint on the prosodic characteristics of speech affected by the tag.
- 5. The method of claim 4 wherein each of the tags specifies an action to be taken and includes parameters defining attributes and associated values providing information about the action to be taken.
- 6. The method of claim 5 wherein each of the tags may include a parameter specifying the location at which the tag takes effect.
- 7. The method of claim 6 wherein the set of tags includes tags which establish settings which remain unchanged until altered by a subsequent tag.
- 8. The method of claim 7 wherein the set of tags includes members which define the pitch behavior of speech over the course of a phrase.
- 9. The method of claim 8 wherein the set of tags includes tags defining accents which define the pitch behavior of local influences within a phrase.
- 10. The method of claim 6 wherein the set of tags includes tags defining phrase boundaries which mark boundaries between regions at which tags have effect.
- 11. The method of claim 10 wherein a tag which defines a phrase boundary prevents tags following the tag which marks the boundary from influencing speech components preceding the tag which marks the boundary.
- 12. The method of claim 9 wherein each of the tags may include values defining type and strength in order to define interaction of the tag with other tags.
- 13. The method of claim 12 wherein a tag may compromise its shape, average pitch or both depending on the value defining type.
- 14. The method of claim 8 wherein the step of processing the tags includes establishing a phrase curve by creating and solving equations defined by tags which specify changes in pitch and tags which specify rates of changes in pitch.
- 15. The method of claim 14 wherein the body of text and the tags are processed one minor phrase at a time.
- 16. The method of claim 15 wherein processing of a phrase includes using values describing properties prevailing near the end of an immediately preceding phrase.
- 17. The method of claim 9 wherein the step of processing the tags includes establishing a pitch curve by creating and solving equations defined by tags which specify accents.
- 18. The method of claim 17 wherein the body of text and the tags are processed one minor phrase at a time.
- 19. The method of claim 18 wherein processing of a phrase includes using values describing properties prevailing near the end of an immediately preceding phrase.
- 20. A method of processing a body of text including tags defining prosodic characteristics of speech to be produced by processing the text, comprising the steps of:
extracting the tags from the text; creating a set of equations defining a phrase curve; solving the set of equations to produce the phrase curve; creating a set of equations defining a pitch curve; solving the set of equations to produce the pitch curve; mapping linguistic concepts represented by the phrase curve and the pitch curve to acoustical observables; and performing a nonlinear transformation to adjust the prosodic characteristics defined by tags to human perceptions and expectations.
- 21. A method of defining a set of tags specifying prosodic characteristics of a target speaker, comprising the steps of:
selecting a body of training text; receiving speech representing reading of the training text by the target speaker to form a training corpus; analyzing the training corpus to identify prosodic characteristics of the training corpus; and creating a set of tags defining the identified prosodic characteristics of the training corpus.
- 22. A method of placing tags in text for text-to-speech processing comprising the steps of:
placing tags in a body of training text to model prosodic characteristics of a training corpus produced by reading of the training text; analyzing the placement of the tags in the training text to develop a set of rules for placement of tags in text; and applying the rules to text for which text-to-speech processing is desired to place tags in the text in order to produce speech having desired prosodic characteristics.
- 23. A text-to-speech system for receiving text inputs comprising text to be processed to generate speech and tags defining prosodic characteristics of the speech to be generated, comprising:
a text input interface for receiving the text input; a speech modeler operative to process the text inputs to produce speech having the prosodic characteristics specified by the tags; and a speech output interface for producing the speech output.
- 24. The system of claim 23 wherein the speech modeler is further operative to process a training corpus representing a reading of text by a target speaker to produce tags defining prosodic characteristics of the training corpus and use the tags to produce speech having prosodic characteristics typical of the target speaker.
- 25. A method of modeling a series of motions comprising:
selecting and placing a sequence of tags to define a desired sequence of motions; analyzing the tags in order to define the motions defined by the tags; identifying a time sequence of motions which minimizes motion effort and motion error; and producing the time sequence of motions.
- 26. The method of claim 25 wherein the step of selecting and placing the tags is preceded by a step of producing a set of tags to produce desired motion components and wherein the step of selecting and placing the tags comprises making selections from the set of tags.
- 27. The method of claim 2 wherein each tag imposes a constraint on motion of an articulator used to produce speech.
- 28. The method of claim 1 wherein each tag imposes a constraint on modeled muscular motions used to simulate gestures or facial expression.
- 29. A method of modeling muscle dynamics, comprising the steps of:
creating a set of tags, each tag controlling one or more aspects of modeled muscular motion; arranging selected members of the set of tags in a desired sequence to produce modeled muscular motion as defined by the sequence of tags; and processing the tags in order to produce modeled muscular motion having the characteristics defined by the tags.
- 30. A method of processing tags defining a model of muscular dynamics, comprising the steps of:
creating a set of equations defining a sequence of modeled muscular motions; solving the set of equations to produce a motion curve defining the sequence of modeled muscular motions; mapping muscular motion dynamics represented by the motion curve and the pitch curve to a sequence of observable motions; and performing a nonlinear transformation to adjust the muscle dynamics defined by tags to reflect characteristics of natural muscle dynamics.
- 31. The method of claim 9 wherein one or more tags are placed within a proper noun comprising two or more words, each such tag producing prosody indicating to a listener that the proper noun is to be interpreted as a single entity rather than as more than one entity.
- 32. The method of claim 31 wherein the tag produces an increase in the pitch and speed of speech over the speech affected by the tag.
- 33. The method of claim 9 wherein one or more tags are placed to produce a word having prosody indicating that the word requires confirmation.
- 34. The method of claim 33 wherein the prosody indicating that the word requires confirmation is characterized by a relatively high and increasing pitch across the word requiring confirmation.
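Illustrative sketch (not part of the claims). The following Python example shows one way the steps recited in claims 1, 2, 7, 14 and 20 could fit together: tags are extracted from a body of text, each tag contributes a weighted linear equation, and the equations are solved in the least-squares sense to give a phrase curve with one value per word. The tag syntax (`<step to=... strength=...>`, `<slope rate=...>`), the function names, the one-sample-per-word grid, and the use of weighted least squares are all assumptions made for illustration; the claims do not specify them.

```python
import re

# Hypothetical tag syntax, e.g. <step to=1.0 strength=4> or <slope rate=-0.1>.
TAG_RE = re.compile(r"<(\w+)((?:\s+\w+=-?[\d.]+)*)\s*>")


def extract_tags(tagged_text):
    """Split tagged text into words and (word_index, name, params) tags.

    A tag takes effect at the word that follows it.
    """
    words, tags, pos = [], [], 0
    for m in TAG_RE.finditer(tagged_text):
        words.extend(tagged_text[pos:m.start()].split())
        params = {k: float(v)
                  for k, v in re.findall(r"(\w+)=(-?[\d.]+)", m.group(2))}
        tags.append((len(words), m.group(1), params))
        pos = m.end()
    words.extend(tagged_text[pos:].split())
    return words, tags


def solve_linear(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    A = [row[:] + [v[i]] for i, row in enumerate(M)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n + 1):
                A[r][k] -= f * A[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = A[r][n] - sum(A[r][k] * x[k] for k in range(r + 1, n))
        x[r] = s / A[r][r]
    return x


def phrase_curve(tagged_text):
    """Return (words, curve), where curve holds one pitch value per word."""
    words, tags = extract_tags(tagged_text)
    n = len(words)
    if n == 0:
        return words, []
    equations = []      # each entry: ({word_index: coefficient}, target, weight)
    rates = [0.0] * n   # slope setting in force at each word
    for idx, name, params in tags:
        i = min(idx, n - 1)  # a trailing tag applies to the last word
        if name == "step":   # tag specifying a pitch value: p[i] = to
            equations.append(({i: 1.0}, params["to"],
                              params.get("strength", 1.0)))
        elif name == "slope":  # setting persists until a later slope tag
            for j in range(i, n):
                rates[j] = params["rate"]
    for j in range(n - 1):   # rate-of-change equations: p[j+1] - p[j] = rate
        equations.append(({j: -1.0, j + 1: 1.0}, rates[j], 1.0))
    for j in range(n):       # very weak pull toward 0 keeps the system solvable
        equations.append(({j: 1.0}, 0.0, 1e-6))
    # Accumulate the weighted normal equations (A^T W A) p = A^T W b.
    M = [[0.0] * n for _ in range(n)]
    v = [0.0] * n
    for coeffs, target, w in equations:
        for a, ca in coeffs.items():
            v[a] += w * ca * target
            for b, cb in coeffs.items():
                M[a][b] += w * ca * cb
    return words, solve_linear(M, v)


words, curve = phrase_curve(
    "<step to=1.0 strength=1000> hello big <step to=0.0 strength=1000> world")
for word, pitch in zip(words, curve):
    print(f"{word}: {pitch:.3f}")
```

In this sketch the `strength` value acts only as a least-squares weight, so a strong tag dominates a weak neighbor and two tags of similar strength settle on a compromise. That loosely mirrors the interaction of tags described in claims 12 and 13, but it is a simplification and not the claimed method.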
Parent Case Info
[0001] This application claims the benefit of U.S. Provisional Application Serial No. 60/230,204, filed Sep. 5, 2000 and U.S. Provisional Application Serial No. 60/236,002, filed Sep. 28, 2000, both of which are incorporated herein by reference in their entirety.
Provisional Applications (2)
| Number | Date | Country |
| --- | --- | --- |
| 60230204 | Sep 2000 | US |
| 60236002 | Sep 2000 | US |