Claims
- 1. A method for synthesizing a voice signal based on a predetermined voice control information stream, the voice signal selectively synthesized to have a particular prosodic style, the method comprising the steps of:
analyzing said predetermined voice control information stream to identify one or more portions thereof for prosody control; selecting one or more prosody control templates based on the particular prosodic style selected for said voice signal synthesis; applying said one or more selected prosody control templates to said one or more identified portions of said predetermined voice control information stream, thereby generating a stylized voice control information stream; and synthesizing said voice signal based on said stylized voice control information stream so that said synthesized voice signal has said particular prosodic style.
- 2. The method of claim 1 wherein said voice signal comprises a speech signal and wherein said predetermined voice control information stream comprises predetermined text.
- 3. The method of claim 1 wherein said voice signal comprises a speech signal and wherein said predetermined voice control information stream comprises predetermined annotated text.
- 4. The method of claim 1 wherein said voice signal comprises a singing voice signal and wherein said predetermined voice control information stream comprises a predetermined musical score.
- 5. The method of claim 1 wherein said particular prosodic style is representative of a specific person.
- 6. The method of claim 1 wherein said particular prosodic style is representative of a particular group of people.
- 7. The method of claim 1 wherein said step of analyzing said predetermined voice control information stream comprises parsing said predetermined voice control information stream and extracting one or more features therefrom.
- 8. The method of claim 1 wherein said one or more prosody control templates comprise tag templates which are selected from a tag template database.
- 9. The method of claim 8 wherein said step of applying said selected prosody control templates to said identified portions of said predetermined voice control information stream comprises the steps of:
expanding each of said tag templates into one or more tags; converting said one or more tags into a time series of prosodic features; and generating said stylized voice control information stream based on said time series of prosodic features.
- 10. The method of claim 1 further comprising the step of computing one or more phoneme durations, and wherein said step of synthesizing said voice signal is also based on said one or more phoneme durations.
- 11. An apparatus for synthesizing a voice signal based on a predetermined voice control information stream, the voice signal selectively synthesized to have a particular prosodic style, the apparatus comprising:
means for analyzing said predetermined voice control information stream to identify one or more portions thereof for prosody control; means for selecting one or more prosody control templates based on the particular prosodic style selected for said voice signal synthesis; means for applying said one or more selected prosody control templates to said one or more identified portions of said predetermined voice control information stream, thereby generating a stylized voice control information stream; and means for synthesizing said voice signal based on said stylized voice control information stream so that said synthesized voice signal has said particular prosodic style.
- 12. The apparatus of claim 11 wherein said voice signal comprises a speech signal and wherein said predetermined voice control information stream comprises predetermined text.
- 13. The apparatus of claim 11 wherein said voice signal comprises a speech signal and wherein said predetermined voice control information stream comprises predetermined annotated text.
- 14. The apparatus of claim 11 wherein said voice signal comprises a singing voice signal and wherein said predetermined voice control information stream comprises a predetermined musical score.
- 15. The apparatus of claim 11 wherein said particular prosodic style is representative of a specific person.
- 16. The apparatus of claim 11 wherein said particular prosodic style is representative of a particular group of people.
- 17. The apparatus of claim 11 wherein said means for analyzing said predetermined voice control information stream comprises means for parsing said predetermined voice control information stream and means for extracting one or more features therefrom.
- 18. The apparatus of claim 11 wherein said one or more prosody control templates comprise tag templates which are selected from a tag template database.
- 19. The apparatus of claim 18 wherein said means for applying said selected prosody control templates to said identified portions of said predetermined voice control information stream comprises:
means for expanding each of said tag templates into one or more tags; means for converting said one or more tags into a time series of prosodic features; and means for generating said stylized voice control information stream based on said time series of prosodic features.
- 20. The apparatus of claim 11 further comprising means for computing one or more phoneme durations, and wherein said means for synthesizing said voice signal is also based on said one or more phoneme durations.
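The method of claims 1 and 9 can be illustrated with a minimal Python sketch. It analyzes a text stream into portions, selects a tag template per portion according to a chosen prosodic style, expands the templates into tags, and converts those tags into a time series of prosodic features. The sketch rests on heavily simplified assumptions. Prosody is reduced to a single pitch contour in semitones, the "portions" are words (with the sentence-final word treated as a phrase end), and every word receives the same fixed duration in place of the computed phoneme durations of claim 10. All names (`Tag`, `TagTemplate`, `TEMPLATE_DB`, `analyze`, `select_templates`, `expand`, `to_time_series`, `stylize`) and the template values are hypothetical illustrations, not the data structures of the claimed system.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    """Hypothetical prosody tag: a pitch target at a time offset (s) within a portion."""
    offset: float
    pitch: float  # semitones relative to an assumed speaker baseline


@dataclass(frozen=True)
class TagTemplate:
    """Hypothetical tag template: a named, reusable group of tags."""
    name: str
    tags: tuple


# Hypothetical tag template database keyed by (prosodic style, portion type).
TEMPLATE_DB = {
    ("calm", "word"): TagTemplate("flat", (Tag(0.0, 0.0),)),
    ("calm", "phrase_end"): TagTemplate("fall", (Tag(0.0, 2.0), Tag(0.3, -2.0))),
    ("excited", "word"): TagTemplate("peak", (Tag(0.0, 1.0), Tag(0.1, 3.0))),
    ("excited", "phrase_end"): TagTemplate("rise", (Tag(0.0, 0.0), Tag(0.3, 4.0))),
}


def analyze(text):
    """Identify portions for prosody control: every word, with sentence-final words marked."""
    portions = []
    for sentence in text.split("."):
        words = sentence.split()
        for i, word in enumerate(words):
            kind = "phrase_end" if i == len(words) - 1 else "word"
            portions.append((word, kind))
    return portions


def select_templates(portions, style):
    """Select one template per portion according to the requested style."""
    return [TEMPLATE_DB[(style, kind)] for _, kind in portions]


def expand(templates, word_dur):
    """Expand templates into absolute-time (time, pitch) tags.

    Assumes every word lasts word_dur seconds, a stand-in for computed durations.
    """
    tags = []
    for i, template in enumerate(templates):
        start = i * word_dur
        for tag in template.tags:
            tags.append((start + tag.offset, tag.pitch))
    return sorted(tags)


def to_time_series(tags, frame, total):
    """Linearly interpolate tag targets onto a fixed frame grid, holding the end values."""
    times = [t for t, _ in tags]
    pitches = [p for _, p in tags]
    series = []
    for k in range(int(round(total / frame))):
        t = k * frame
        j = bisect.bisect_right(times, t)  # times[j-1] <= t < times[j]
        if j == 0:
            series.append(pitches[0])
        elif j == len(times):
            series.append(pitches[-1])
        else:
            t0, t1 = times[j - 1], times[j]
            p0, p1 = pitches[j - 1], pitches[j]
            series.append(p0 + (t - t0) / (t1 - t0) * (p1 - p0))
    return series


def stylize(text, style, word_dur=0.5, frame=0.05):
    """Return the analyzed portions and a per-frame pitch contour for the chosen style."""
    portions = analyze(text)
    tags = expand(select_templates(portions, style), word_dur)
    return portions, to_time_series(tags, frame, len(portions) * word_dur)
```

For example, `stylize("Hello world.", "excited")` returns the two analyzed portions and a 20-frame pitch contour that differs from the contour produced for the "calm" style. The final step of claim 1, driving a synthesizer with that contour, is outside the scope of this sketch.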
CROSS-REFERENCE TO RELATED APPLICATION
[0001] The present application hereby claims the benefit of previously filed Provisional patent application Ser. No. 60/314,043, “Method and Apparatus for Controlling a Speech Synthesis System to Provide Multiple Styles of Speech,” filed by G. P. Kochanski et al. on Aug. 22, 2001.
Provisional Applications (1)
| Number | Date | Country |
|---|---|---|
| 60314043 | Aug 2001 | US |