Claims
- 1. A method for estimating pronunciation probabilities in word pronunciation networks that incorporate dialectal variations in pronunciation, for use in a speech recognition system, wherein said word networks comprise a plurality of nodes, each node connected to its successive node by one or more arcs, each arc having associated with it a phone and a numerical variable for storing the pronunciation probability that the arc is taken, the method comprising the steps of:
- determining equivalence classes for a plurality of nodes in the word pronunciation networks by:
- if context with surrounding nodes is not relevant to phone choice at a node, classifying that node with nodes having similar phone choices into the same equivalence class; and
- if context with surrounding nodes is important in determining phone choice at a node, classifying that node in an equivalence class with similar nodes having similar phone choices and sharing identical relevant contextual constraints, such that all nodes in the same equivalence class may share training samples for estimating pronunciation probabilities; and
- using a set of training samples to estimate the pronunciation probabilities in the word pronunciation network such that training samples for a given word will contribute to the training of networks for all other words that have any nodes in an equivalence class with any of the nodes of that word.
- 2. The method according to claim 1, wherein said arcs in said word networks are brought into existence by applying a set of phonological rules to a set of baseform word models and wherein context with surrounding nodes is considered important and nodes are placed in the same equivalence class if and only if there is a one-to-one correspondence between phones of outgoing arcs; and there is a one-to-one correspondence between sets of phonological rules responsible for bringing each of said outgoing arcs into existence.
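The grouping described in claims 1 and 2 can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Arc` and `Node` types, the rule names, and the phone symbols are hypothetical, and the one-to-one correspondence of claim 2 is approximated by comparing the set of (phone, rule set) pairs on each node's outgoing arcs.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Arc:
    phone: str              # phone emitted when this arc is taken
    rules: FrozenSet[str]   # names of the rules that created the arc (hypothetical)


@dataclass(frozen=True)
class Node:
    word: str
    index: int
    arcs: Tuple[Arc, ...]   # outgoing arcs


def class_key(node: Node, use_context: bool = True) -> frozenset:
    """Key that two nodes must share to fall in the same equivalence class."""
    if use_context:
        # Context matters: phones AND the rule sets behind them must match.
        return frozenset((a.phone, a.rules) for a in node.arcs)
    # Context ignored: only the phone choices must match.
    return frozenset(a.phone for a in node.arcs)


def equivalence_classes(nodes: List[Node], use_context: bool = True) -> Dict[frozenset, List[Node]]:
    classes: Dict[frozenset, List[Node]] = defaultdict(list)
    for node in nodes:
        classes[class_key(node, use_context)].append(node)
    return dict(classes)


# Illustrative nodes; the rule names "flapping" and "nt_flapping" are assumptions.
NO_RULE = frozenset()
n1 = Node("water", 2, (Arc("t", NO_RULE), Arc("dx", frozenset({"flapping"}))))
n2 = Node("butter", 2, (Arc("dx", frozenset({"flapping"})), Arc("t", NO_RULE)))
n3 = Node("winter", 3, (Arc("t", NO_RULE), Arc("dx", frozenset({"nt_flapping"}))))

with_context = equivalence_classes([n1, n2, n3], use_context=True)
without_context = equivalence_classes([n1, n2, n3], use_context=False)
```

In this sketch the "water" and "butter" nodes share a class because the same rule produced their alternative arc, while the "winter" node joins them only when context is ignored.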
- 3. The method according to claim 1, further comprising the steps of:
- replacing arcs in said word pronunciation networks with hidden Markov models of phones;
- estimating counts of traversals of each transition of a forward-backward algorithm for each training sample, wherein transitions into initial nodes of hidden Markov model phone models correspond to arcs in word pronunciation networks; and
- after combining counts for nodes in equivalence classes, employing said combined counts for estimating probabilities of each of the outgoing arcs from each node in the word pronunciation networks.
- 4. The method according to claim 3, wherein said estimating step is given by: ##EQU4## where P_{i,j} = estimated probability for arc i in equivalence class j;
- C_{i,k} = forward-backward count for arc i leaving node k.
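The equation referenced by the ##EQU4## placeholder is not reproduced in this text. One reading consistent with claim 3 and the two definitions above is a normalized pooled count: sum the counts C_{i,k} over every node k in class j, then divide by the total over all arcs in that class. The Python sketch below follows that assumed reading; the function name, the dictionary layout, and the labeling of arcs by phone are illustrative choices, not taken from the patent.

```python
from collections import defaultdict
from typing import Dict, Hashable


def pooled_arc_probabilities(
    counts: Dict[Hashable, Dict[str, float]],
    node_class: Dict[Hashable, Hashable],
) -> Dict[Hashable, Dict[str, float]]:
    """Estimate P[j][i] from forward-backward counts counts[k][i].

    counts:     node k -> {arc label i -> count C_{i,k}}
    node_class: node k -> equivalence class j
    """
    pooled: Dict[Hashable, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for node, arc_counts in counts.items():
        j = node_class[node]
        for arc, c in arc_counts.items():
            pooled[j][arc] += c          # combine counts across nodes of class j

    probs: Dict[Hashable, Dict[str, float]] = {}
    for j, arc_totals in pooled.items():
        total = sum(arc_totals.values())
        if total > 0:                    # skip classes with no training data
            probs[j] = {arc: c / total for arc, c in arc_totals.items()}
    return probs


# Made-up counts for two nodes assumed to be in the same class "flap".
example_counts = {
    ("water", 2): {"t": 3.0, "dx": 9.0},
    ("butter", 2): {"t": 1.0, "dx": 3.0},
}
example_classes = {("water", 2): "flap", ("butter", 2): "flap"}
example_probs = pooled_arc_probabilities(example_counts, example_classes)
```

With these made-up counts the pooled totals are 4 for "t" and 12 for "dx", so both nodes receive the shared estimates 0.25 and 0.75, which is how samples of one word contribute to the networks of other words.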
- 5. A method for building word pronunciation networks that incorporate dialectal variations in pronunciation, useful for recognizing speech by a data processing computer, comprising the steps of:
- acquiring a set of baseform word models for each of the words in the vocabulary to be recognized;
- acquiring a set of phonological rules that define allowed pronunciation variations, at least one of said phonological rules specifying allophonic choices allowed in a particular context;
- applying the phonological rules to the baseform word models to obtain a set of word models that incorporate dialectal variations in pronunciation; and
- determining equivalence classes, each equivalence class being a grouping of nodes within said word models that incorporate dialectal variations in pronunciation wherein each node in an equivalence class represents the same possible pronunciation variations governed by the same phonological rules.
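The rule-application step of claim 5 can be sketched as follows. This is a simplified illustration under stated assumptions: only context-dependent substitution rules are modeled (no insertions or deletions), the network is a linear chain with one node per baseform phone, and the `Rule` type, the "flapping" rule, the vowel set, and the phone transcriptions are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple

ArcSpec = Tuple[str, FrozenSet[str]]   # (phone, names of rules that created it)


@dataclass(frozen=True)
class Rule:
    name: str
    target: str                        # baseform phone the rule applies to
    replacement: str                   # allophone allowed in this context
    left: FrozenSet[Optional[str]]     # permitted left-neighbor phones
    right: FrozenSet[Optional[str]]    # permitted right-neighbor phones


def apply_rules(baseform: List[str], rules: List[Rule]) -> List[List[ArcSpec]]:
    """Return, for each baseform position, the list of alternative arcs."""
    network: List[List[ArcSpec]] = []
    for i, phone in enumerate(baseform):
        left = baseform[i - 1] if i > 0 else None
        right = baseform[i + 1] if i + 1 < len(baseform) else None
        arcs: List[ArcSpec] = [(phone, frozenset())]   # baseform arc, no rule
        for rule in rules:
            if rule.target == phone and left in rule.left and right in rule.right:
                arcs.append((rule.replacement, frozenset({rule.name})))
        network.append(arcs)
    return network


# Illustrative data only.
VOWELS = frozenset({"aa", "ah", "ao", "er", "iy"})
flapping = Rule("flapping", target="t", replacement="dx", left=VOWELS, right=VOWELS)

water = apply_rules(["w", "ao", "t", "er"], [flapping])
butter = apply_rules(["b", "ah", "t", "er"], [flapping])
```

Here position 2 of both words ends up with the same two arcs produced by the same rule, so those two nodes would be candidates for a single equivalence class in the final step of the claim.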
- 6. The method according to claim 5 further comprising using a set of training samples to estimate the probabilities of the pronunciation variations in the word models that incorporate dialectal variations in pronunciation and sharing said probabilities among nodes in the same equivalence class such that training samples for a given word will contribute to the training of the models for other words that have a node in an equivalence class in common with the node of that given word.
- 7. The method according to claim 5 wherein the baseform word models are generated automatically by executing a set of letter to sound rules on the alphabetic representation of the words.
Parent Case Info
This application is a file-wrapper continuation of application Ser. No. 08/071,801, filed Jun. 1, 1993, which is a division of application Ser. No. 07/648,097, filed Jan. 21, 1991, now U.S. Pat. No. 5,268,990.
ACKNOWLEDGEMENT OF SPONSORSHIP
This invention was partially funded under grants from the National Science Foundation (NSF EE-8517751) and the United States Navy (Contract No. N-00039-85-C-0302), in addition to funding from SRI International of Menlo Park, Calif.
Non-Patent Literature Citations (2)
Flanagan, "Speech Analysis, Synthesis and Perception," New York, Academic Press Inc., 1965, p. 14.
Bourlard et al., "Speaker Dependent Connected Speech Recognition via Phonemic Markov Models," ICASSP 85, 26-29 Mar. 1985, Tampa, FL, pp. 1213-1216.
Divisions (1)
| | Number | Date | Country |
| Parent | 648097 | Jan 1991 | |
Continuations (1)
| | Number | Date | Country |
| Parent | 71801 | Jun 1993 | |