Claims
- 1. An automated method of generating a subword model for speech recognition dependent on phoneme context for processing speech information using a Hidden Markov Model in which static features of speech and dynamic features of speech are modeled as a chain of a plurality of output probability density distributions, comprising the step of:
- determining a phoneme context class which is a model unit allocated to each model, the number of states used for representing each model, a relationship of state sharing among a plurality of models, and an output probability density distribution of each model, by repeatedly splitting a small number of states provided in an initial Hidden Markov Model.
- 2. The method according to claim 1, wherein splitting includes
- the step of splitting, in a parallel domain, one state into two states corresponding to different phoneme context classes to absorb change in the static features of speech derived from a difference of the phoneme context, and the step of splitting, in a serial domain, one state into two states corresponding to different phoneme segments to absorb change in the dynamic features of speech generated in a certain phoneme context class, and
- repeating state splitting in whichever domain yields a higher evaluation value with respect to an actual speech sample, to successively make the model structure more precise.
- 3. The method according to claim 2, further comprising the step of:
- in state splitting in a contextual domain, splitting with respect to the actual phoneme context class and allocating the two split phoneme context classes that attain the highest evaluation value with respect to the actual speech sample, one to each of the two states generated by state splitting, to make the model unit successively smaller.
- 4. The method according to claim 1, further comprising the step of:
- allocating a mixture Gaussian density distribution having a mixture number of 2 as each output probability density distribution, and
- allocating a single Gaussian distribution, being one of the two distributions constituting the mixture Gaussian density distribution, to each of the two states newly generated by the state splitting, to realize a significant reduction in the amount of calculation necessary for re-estimating output probability density distribution parameters after state splitting.
- 5. The method according to claim 4, further comprising the step of:
- every time state splitting is executed, restoring the single Gaussian distributions generated by state splitting to mixture Gaussian density distributions having a mixture number of 2, and carrying out re-training to optimize the model parameters.
- 6. The method according to claim 1, further comprising the step of:
- prior to state splitting, calculating the magnitude of every state existing in a speech parameter space and determining the state having the largest magnitude as the splittee state, thereby significantly reducing the amount of calculation by avoiding the round-robin calculation otherwise necessary for determining an optimal splittee state.
- 7. The method according to claim 4, further comprising the step of:
- after state splitting is completed and the final model unit and structure are determined, carrying out re-training to replace the mixture Gaussian density distribution having a mixture number of 2, allocated to each state, by the output probability density distribution to be used in an actual Hidden Markov Network.
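The splitting loop of claims 1, 2, and 6 can be illustrated with a minimal sketch: pick the state with the largest distribution magnitude as the splittee (claim 6), try both a contextual (parallel) and a temporal (serial) split, and keep whichever yields the higher likelihood on the data (claim 2). This is a deliberately simplified illustration assuming 1-D features and a single Gaussian per state; the function names (`best_split`, `pick_splittee`) are hypothetical and not from the patent.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def log_likelihood(xs):
    # Gaussian log-likelihood of samples under their own ML mean/variance.
    v = max(variance(xs), 1e-6)
    return -0.5 * len(xs) * (math.log(2.0 * math.pi * v) + 1.0)

def best_split(samples, contexts):
    """Try a contextual (parallel) and a temporal (serial) split of one
    state's samples and return the domain with the higher total likelihood."""
    # Contextual split: partition samples by phoneme-context label.
    ctx_set = sorted(set(contexts))
    left = set(ctx_set[: max(1, len(ctx_set) // 2)])
    ca = [x for x, c in zip(samples, contexts) if c in left]
    cb = [x for x, c in zip(samples, contexts) if c not in left]

    # Temporal split: partition samples into first and second halves in time.
    mid = len(samples) // 2
    ta, tb = samples[:mid], samples[mid:]

    gain_ctx = log_likelihood(ca) + log_likelihood(cb) if ca and cb else -math.inf
    gain_tmp = log_likelihood(ta) + log_likelihood(tb) if ta and tb else -math.inf
    if gain_ctx >= gain_tmp:
        return "contextual", (ca, cb)
    return "temporal", (ta, tb)

def pick_splittee(states):
    # Claim 6: the splittee is simply the state with the largest
    # distribution magnitude (variance here), avoiding a round-robin
    # trial over all candidate states.
    return max(range(len(states)), key=lambda i: variance(states[i][0]))
```

In a full implementation the chosen split would also trigger the mixture-of-2 bookkeeping of claims 4 and 5: each new state inherits one component of the parent's 2-mixture Gaussian, so re-estimation after a split starts from already-fitted parameters.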
Priority Claims (1)
| Number | Date | Country | Kind |
| --- | --- | --- | --- |
| 4-064296 | Mar 1992 | JPX | |
Parent Case Info
This application is a continuation of application Ser. No. 07/953,354 filed Sep. 30, 1992.
US Referenced Citations (4)
Foreign Referenced Citations (3)
| Number | Date | Country |
| --- | --- | --- |
| 0 312 209 | Apr 1989 | EPX |
| 33 37 353 | Apr 1984 | DEX |
| 0271325 | Mar 1990 | JPX |
Non-Patent Literature Citations (1)
| Entry |
| --- |
| Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition", Proc. IEEE, vol. 77, no. 2, 1989. |
Continuations (1)
| Number | Date | Country | Parent |
| --- | --- | --- | --- |
| 953354 | Sep 1992 | | |