Claims
- 1. Apparatus for speech animation of desired text, comprising:
- first input means for receiving speech samples derived from input audio data and for providing a sample speech signal representing said speech samples, said speech samples being in the voice of a selected person;
- first segmentation means coupled to said first input means for extracting constituent speech segments in accordance with a predetermined speech segmentation plan from said sample speech signal;
- encoding means for digitally encoding said constituent speech segments;
- second input means for receiving and encoding desired speech text;
- second segmentation means, coupled to said second input means and responsive to desired speech text for segmenting said desired speech text into a plurality of constituent text segments in accordance with said predetermined segmentation plan;
- combining means for combining a plurality of said encoded constituent speech segments for providing a digital speech signal representative of desired animated speech corresponding to said desired speech text, said digital speech signal being representative of desired animated speech in the voice of said selected person, each of said plurality of encoded constituent speech segments corresponding to at least one of said plurality of constituent text segments; and
- storage means for storing said digitally encoded constituent speech segments in at least one predefined voice reference file, said predefined voice reference file comprising a language library for storing predefined sets of language rules associated with a selected language, a recording library for storing recorded speech sequences in said selected language for said selected person, and a voice library for storing said encoded constituent speech segments in said selected language for said selected person, whereby a separate predefined voice reference file is defined and identified for each said selected person;
- one of said language libraries being defined for each of a plurality of selectable languages, each said language library being accessed by each said voice reference file associated with a selected language, each said language library including:
- a set of language segmentation rules defined for said selected language;
- a set of prosody rules defined in accordance with said language segmentation rules for said selected language;
- a set of text segmentation rules defined in accordance with said language segmentation rules for said selected language; and
- a set of resynthesis configuration parameters for configuring said combining means for said selected language.
- 2. Apparatus for speech animation of desired text, comprising:
- first input means for receiving speech samples derived from input audio data and for providing a sample speech signal representing said speech samples, said speech samples being in the voice of a selected person;
- first segmentation means coupled to said first input means for extracting constituent speech segments in accordance with a predetermined speech segmentation plan from said sample speech signal;
- encoding means for digitally encoding said constituent speech segments;
- second input means for receiving and encoding desired speech text;
- second segmentation means, coupled to said second input means and responsive to desired speech text for segmenting said desired speech text into a plurality of constituent text segments in accordance with said predetermined segmentation plan;
- combining means for combining a plurality of said encoded constituent speech segments for providing a digital speech signal representative of desired animated speech corresponding to said desired speech text, said digital speech signal being representative of desired animated speech in the voice of said selected person, each of said plurality of encoded constituent speech segments corresponding to at least one of said plurality of constituent text segments; and
- storage means for storing said digitally encoded constituent speech segments in at least one predefined voice reference file, said predefined voice reference file comprising a language library for storing predefined sets of language rules associated with a selected language, a recording library for storing recorded speech sequences in said selected language for said selected person, and a voice library for storing said encoded constituent speech segments in said selected language for said selected person, whereby a separate predefined voice reference file is defined and identified for each said selected person;
- said voice library including:
- at least one selectable predetermined speech segmentation plan; and
- a segment library associated with each said selectable predetermined speech segmentation plan for storing said constituent speech segments extracted from said speech samples in accordance with said associated speech segmentation plan.
- 3. Apparatus as in claim 2 further comprising a segmentation dictionary file associated with each said selectable predetermined speech segmentation plan for associating each of said speech segments in said associated segment library with a corresponding utterance containing said associated speech segment, said speech samples being derived from said utterances.
- 4. Apparatus as in claim 2 wherein said voice library further comprises:
- a resynthesis data file associated with each said selectable predetermined speech segmentation plan for storing selected data and parameters corresponding to said selected voice; and
- a resynthesis configuration file associated with each said selectable predetermined speech segmentation plan for storing selected data and parameters for configuring said combining means for said selected voice utilizing said selectable predetermined speech segmentation plan.
- 5. Apparatus for speech animation of desired text, comprising:
- first input means for receiving speech samples derived from input audio data and for providing a sample speech signal representing said speech samples;
- first segmentation means including automatic extraction means coupled to said first input means for automatically extracting constituent speech segments in accordance with a predetermined speech segmentation plan from said sample speech signal;
- encoding means for digitally encoding said constituent speech segments;
- second input means for receiving and encoding desired speech text;
- second segmentation means, coupled to said second input means and responsive to desired speech text for segmenting said desired speech text into a plurality of constituent text segments in accordance with said predetermined segmentation plan;
- combining means for combining a plurality of said encoded constituent speech segments for providing a digital speech signal representative of desired animated speech corresponding to said desired speech text, each of said plurality of encoded constituent speech segments corresponding to at least one of said plurality of constituent text segments;
- storage means for storing said digitally encoded constituent speech segments in a predefined voice library, said speech samples being input audibly in the voice of a selected person and said predefined voice library being identified as the voice of said selected person providing said speech samples;
- said voice library including at least one selectable predetermined speech segmentation plan;
- a segment library associated with each said selectable predetermined speech segmentation plan for storing said constituent speech segments extracted from said speech samples in accordance with said associated speech segmentation plan; and
- editing means for manually editing and modifying said automatically extracted constituent speech segments.
- 6. Apparatus as in claim 5 wherein said editing means includes means for manually extracting said constituent speech segments from said speech samples.
- 7. Apparatus as in claim 6 wherein said editing means further includes:
- display means for displaying a visual image of said sample speech signal and of said extracted constituent speech segments; and
- audio test means for providing an audio output corresponding to the constituent speech segment or segments currently being edited.
- 8. Apparatus as in claim 7 wherein said editing means is coupled to said combining means providing for the testing and editing of said digital speech signal.
- 9. Apparatus for speech animation of desired text, comprising:
- first input means for receiving speech samples derived from input audio data and for providing a sample speech signal representing said speech samples, said speech samples being input in the voice of a selected person;
- first segmentation means including automatic extraction means coupled to said first input means for automatically extracting constituent speech segments in accordance with a predetermined speech segmentation plan from said sample speech signal, said first segmentation means including editing means for manually editing and modifying said automatically extracted constituent speech segments, said first segmentation means including means for providing a residual excitation signal associated with said sample speech signal;
- encoding means for digitally encoding said constituent speech segments and said residual excitation signal as a voiced component and an unvoiced component thereof;
- second input means for receiving and encoding desired speech text;
- second segmentation means, coupled to said second input means and responsive to desired speech text for segmenting said desired speech text into a plurality of constituent text segments in accordance with said predetermined segmentation plan;
- combining means for combining a plurality of said encoded constituent speech segments for providing a digital speech signal representative of desired animated speech corresponding to said desired speech text, each of said plurality of encoded constituent speech segments corresponding to at least one of said plurality of constituent text segments; and
- storage means for storing said digitally encoded constituent speech segments and said digitally encoded components of said residual excitation signal in a predefined voice library, said predefined voice library being identified as the voice of said selected person providing said speech samples;
- said voice library including at least one selectable predetermined speech segmentation plan; and
- a segment library associated with each said selectable predetermined speech segmentation plan for storing said constituent speech segments extracted from said speech samples in accordance with said associated speech segmentation plan.
- 10. Apparatus as in claim 9 wherein said editing means includes means for manually extracting said constituent speech segments from said speech samples.
- 11. Apparatus as in claim 10 wherein said editing means further includes:
- display means for displaying a visual image of said sample speech signal, said residual excitation signal and of said extracted constituent speech segments; and
- audio test means for providing an audio output corresponding to the speech segment or segments currently being edited.
- 12. Apparatus as in claim 11 wherein said editing means is coupled to said combining means providing for the testing and editing of said digital speech signal.
- 13. A method for providing animated speech corresponding to user input text, comprising the steps of:
- receiving speech samples derived from input audio data and providing a sample speech signal representing said speech samples;
- extracting constituent speech segments from said speech samples in accordance with a predetermined segmentation plan;
- encoding said constituent speech segments;
- receiving and encoding desired speech text unrelated to said speech samples;
- segmenting said desired speech text into a plurality of constituent text segments in accordance with said predetermined segmentation plan;
- combining a plurality of said encoded constituent speech segments, each of said plurality of encoded constituent speech segments corresponding to at least one of said plurality of constituent text segments for providing a speech signal representative of desired animated speech;
- storing said encoded constituent speech segments in a voice library file, said voice library file including at least one selectable predetermined speech segmentation plan and a segment library associated with each said selectable predetermined speech segmentation plan for storing said constituent speech segments extracted from said speech samples in accordance with said associated speech segmentation plan; and
- editing said speech signal.
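The method of claim 13 (store encoded segments per segmentation plan, segment the desired text under the same plan, then combine the matching encoded segments) can be illustrated with the minimal sketch below. It is not the patented implementation: the names `VoiceLibrary`, `word_plan`, `encode` and `decode` are hypothetical, the segmentation plan is assumed to be a simple word-level split, 16-bit PCM stands in for the claimed digital encoding, and prosody rules, resynthesis parameters, residual excitation and the final editing step are all omitted.

```python
from array import array
from dataclasses import dataclass, field
from typing import Callable, Dict, List


def word_plan(text: str) -> List[str]:
    """Hypothetical segmentation plan: lowercase words with punctuation removed."""
    cleaned = "".join(c.lower() if c.isalnum() or c.isspace() else " " for c in text)
    return cleaned.split()


def encode(samples: List[float]) -> bytes:
    """Stand-in encoding: clip to [-1, 1] and store as 16-bit signed PCM."""
    pcm = array("h", (int(max(-1.0, min(1.0, s)) * 32767) for s in samples))
    return pcm.tobytes()


def decode(data: bytes) -> List[float]:
    """Inverse of encode(): 16-bit PCM bytes back to floats."""
    pcm = array("h")
    pcm.frombytes(data)
    return [v / 32767 for v in pcm]


@dataclass
class VoiceLibrary:
    """One voice library per speaker: segmentation plans plus a segment library per plan."""
    speaker: str
    plans: Dict[str, Callable[[str], List[str]]] = field(default_factory=dict)
    segment_libraries: Dict[str, Dict[str, bytes]] = field(default_factory=dict)

    def add_plan(self, name: str, plan: Callable[[str], List[str]]) -> None:
        # Each selectable plan gets its own (initially empty) segment library.
        self.plans[name] = plan
        self.segment_libraries[name] = {}

    def store(self, plan_name: str, text_segment: str, samples: List[float]) -> None:
        # Encode an extracted speech segment and file it under its text segment.
        self.segment_libraries[plan_name][text_segment] = encode(samples)

    def animate(self, text: str, plan_name: str, gap: int = 0) -> List[float]:
        """Segment text with the same plan, then concatenate the matching segments."""
        plan = self.plans[plan_name]
        library = self.segment_libraries[plan_name]
        out: List[float] = []
        for i, seg in enumerate(plan(text)):
            if seg not in library:
                raise KeyError(f"no recorded segment for {seg!r}")
            if i > 0:
                out.extend([0.0] * gap)  # optional silence between segments
            out.extend(decode(library[seg]))
        return out
```

For example, after storing segments for "hello" and "world", `lib.animate("Hello, world!", "word", gap=1)` returns the two decoded segments separated by one sample of silence. Keying each segment library by plan mirrors the claims' requirement that the text and the speech samples be segmented under the same predetermined plan.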
CROSS REFERENCE TO RELATED APPLICATION
This is a continuation of application Ser. No. 07/497,937, filed Mar. 23, 1990, now abandoned.
Continuations (1)
Parent: No. 497937, Mar 1990