Claims
- 1. In a speech recognition system using a method for recognizing human speech, the method being of the type comprising the steps of:
selecting a model to represent a selected subunit of speech, the model having associated with it a plurality of states and each state having associated with it a probability function, the probability function having undetermined parameters, the probability functions being represented by a mixture of simple probability functions; extracting features from a set of speech training data; using the features to determine parameters for the probability functions in the model, an improved method for recognizing speech, the improvement comprising the steps of:
determining states that may be represented by a set of simple probability functions; clustering said states that may be represented by a set of simple probability functions into a limited number of clusters; defining a plurality of cluster codebooks, one cluster codebook associated with each one of said clusters and said cluster codebooks having a large number of simple probability functions per codebook; and estimating the simple probability functions in each cluster codebook to fit states in that cluster.
- 2. The method according to claim 1 wherein the number of clusters is kept low to improve processing speed while the number of simple probability functions per cluster is increased for greater recognition accuracy.
- 3. The method according to claim 1 wherein the number of clusters is between approximately 40 and 200 and wherein at least one cluster has assigned to it 500 to 2000 simple probability functions.
- 4. The method according to claim 1 wherein the number of clusters is more than 10 and wherein the ratio of the number of clusters to the total number of simple probability functions in the system is less than 0.002.
- 5. The method according to claim 1 wherein the number of clusters is more than 9 and at least one cluster has more than approximately 1,000 simple probability functions.
- 6. The method according to claim 1 wherein the simple probability functions are Gaussians.
- 7. The method according to claim 1 wherein different numbers of simple probability functions are used in different clusters.
- 8. The method according to claim 7 wherein the number of simple probability functions used for a particular cluster is determined by a training algorithm.
- 9. The method according to claim 7 wherein the number of simple probability functions used for a particular cluster is indicated by a human system designer.
- 10. The method according to claim 1 wherein the number of said clusters is equal to the number of phones in the system.
- 11. The method according to claim 1 wherein the model is a three-state Hidden Markov Model.
- 12. The method according to claim 1 wherein states are clustered according to an agglomerative hierarchical clustering scheme.
- 13. The method according to claim 1 wherein states are clustered so as to nearly eliminate overlap between clusters.
- 14. The method according to claim 1 further comprising:
caching log-likelihoods for the simple probability functions in a mixture as soon as they are computed for a frame so that if the same mixture needs to be evaluated at that frame for another triphone state, the cache is used.
- 15. The method according to claim 1 wherein redundant simple probability functions in the state cluster overlap region are more effectively used to cover the acoustic space of the clusters, resulting in smaller variances and reducing the number of distance components to be computed.
- 16. The method according to claim 1 further comprising:
reducing the size of the simple probability function shortlists by decreasing the number of state clusters, with a corresponding reduction in simple probability function computations.
- 17. A computer readable data file stored or transmitted on a medium that when loaded into an appropriately configured computer system will enable the system to perform probabilistic recognition using a set of probabilistic models including a limited number of state clusters, at least one state cluster represented by a large number of simple probability functions.
- 18. A speech recognizer comprising:
a logic processing device; storage means; a set of probabilistic models stored in the storage means; said models including a limited number of state clusters, at least one state cluster represented by a large number of simple probability functions; a feature extractor in the computer for extracting feature data capable of being processed by said computer from a speech signal; and recognizing means for matching features from unidentified speech data to the models to produce a most likely path through the models where the path defines the most likely subunits and words in the speech data.
- 19. A method for developing tied-transform HMMs able to estimate probability function parameters with little data comprising:
selecting a small enough number of state clusters M so that robust estimates are possible; training an HMM for a smaller number of state clusters M, for which it is assumed there is enough data to robustly estimate each probability function's parameters; determining for each state cluster in the larger HMM an ancestor-descendant relationship with a cluster in the smaller HMM; defining a mapping from the smaller to the larger HMM in terms of the ancestor-descendant relationship; wherein probability functions in the state clusters of the larger HMM are transformed versions of the ancestor probability functions in the smaller HMM.
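By way of illustration only, the shared cluster codebook of claim 1 and the per-frame caching of claim 14 might be sketched as follows. The sketch assumes Gaussian simple probability functions (claim 6) with diagonal covariances and state-specific mixture weights over a shared codebook; neither of those two details is recited in the claims, and the names `ClusterCodebook`, `log_gaussian`, and `state_log_likelihood` are hypothetical.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


class ClusterCodebook:
    """Gaussians shared by every state in one cluster (hypothetical name)."""

    def __init__(self, means, variances):
        self.means = means
        self.variances = variances
        self._cache = {}      # frame index -> per-Gaussian log-likelihoods
        self.evaluations = 0  # full codebook evaluations, for illustration

    def log_likelihoods(self, frame_index, x):
        # Compute the Gaussians once per frame; later states reuse the cache.
        if frame_index not in self._cache:
            self.evaluations += 1
            self._cache[frame_index] = [
                log_gaussian(x, m, v)
                for m, v in zip(self.means, self.variances)
            ]
        return self._cache[frame_index]


def state_log_likelihood(codebook, weights, frame_index, x):
    """log sum_k w_k N(x; mu_k, var_k), with weights assumed state-specific."""
    logs = codebook.log_likelihoods(frame_index, x)
    terms = [math.log(w) + l for w, l in zip(weights, logs) if w > 0.0]
    peak = max(terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


# Two states in the same cluster, evaluated at the same frame.
codebook = ClusterCodebook(
    means=[[0.0, 0.0], [3.0, 3.0]],
    variances=[[1.0, 1.0], [1.0, 1.0]],
)
frame = [0.1, -0.2]
ll_a = state_log_likelihood(codebook, [0.9, 0.1], 0, frame)
ll_b = state_log_likelihood(codebook, [0.2, 0.8], 0, frame)
```

In this sketch the second state reuses the cached Gaussian log-likelihoods, so the codebook is evaluated only once for the frame. A practical recognizer would also discard cache entries for frames that are no longer needed.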
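As a further non-limiting illustration, the agglomerative hierarchical clustering of states recited in claim 12 might be sketched as below. The claims do not specify a distance measure, so this sketch assumes each state is summarized by a single vector and that clusters are compared by squared Euclidean distance between their centroids; the function name `agglomerative_cluster` and the toy data are hypothetical.

```python
def agglomerative_cluster(points, num_clusters):
    """Merge the two closest clusters until num_clusters remain.

    points: one summary vector per state (an assumption of this sketch).
    Returns a list of clusters, each a list of state indices.
    """
    clusters = [[i] for i in range(len(points))]

    def centroid(members):
        dim = len(points[0])
        return [
            sum(points[i][d] for i in members) / len(members)
            for d in range(dim)
        ]

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    while len(clusters) > num_clusters:
        best = None  # (distance, index of first cluster, index of second)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sq_dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge j into i
        del clusters[j]
    return clusters


# Five one-dimensional "states" grouped into three clusters.
state_vectors = [[0.0], [0.1], [5.0], [5.2], [10.0]]
result = agglomerative_cluster(state_vectors, 3)
```

Each resulting cluster would then receive its own codebook, as in claim 1. The exhaustive pairwise search is adequate for a small illustration but would be slow for a large number of states.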
Parent Case Info
[0001] This application claims priority from provisional patent application 60/097,789, filed Aug. 25, 1998, which is incorporated herein by reference.
Government Interests
[0002] This invention was supported in part by a grant from DARPA through Naval Command And Control Ocean Surveillance Center under Contract N66001-94-c-6048 and in part by SRI International of Menlo Park, Calif. The Government may have certain rights in this material.
Provisional Applications (1)
| Number | Date | Country |
| 60097789 | Aug 1998 | US |
Continuations (1)
| | Number | Date | Country |
| Parent | 09379411 | Aug 1999 | US |
| Child | 10029420 | Oct 2001 | US |