Claims
- 1. In a system that accepts an input automaton wherein the input automaton represents multiple hypotheses, a method for finding N-best distinct hypotheses of the input automaton, the method comprising:
computing an input potential for each state of the input automaton to a set of final states, wherein at least one input potential is used to determine a determinized potential of each state in a result of determinization and wherein the determinized potential of a particular state in the result of determinization can be determined without fully determinizing the input automaton; and identifying N-best distinct paths in the result of determinization of the input automaton using the determinized potential of the states in the result of determinization, wherein the N-best distinct paths of the result of determinization are labeled with the N-best hypotheses of the input automaton.
- 2. A method as defined in claim 1, wherein computing an input potential for each state of the input automaton to a set of final states further comprises running a shortest-paths algorithm from the set of final states on the reverse of the digraph of the input automaton.
- 3. A method as defined in claim 1, wherein identifying N-best distinct paths in the result of determinization of the input automaton further comprises computing the result of determinization on-the-fly.
- 4. A method as defined in claim 3, wherein computing the result of determinization further comprises:
creating a new state in the result of determinization; creating a new transition from a previous state to the new state in the result of determinization, wherein the new transition has a label that is the same as a label from a transition of the input automaton; creating a subset of state pairs that correspond to the new state; and determining a determinized potential for the new state using an input potential from at least one state of the input automaton.
- 5. A method as defined in claim 4, wherein creating a subset of state pairs further comprises constructing a pair for each destination state of the transition in the input automaton, wherein each pair includes a state of the input automaton and a remainder weight.
- 6. A method as defined in claim 4, wherein creating a new transition from a previous state in the result of determinization to the new state depends only on the states and remainder weights of the subset corresponding to that previous state and on the input automaton.
- 7. A method as defined in claim 1, wherein identifying N-best distinct paths in the result of determinization of the input automaton using the determinized potential of the states in the result of determinization further comprises propagating the input potential of a particular state to states of the result of determinization.
- 8. In a system that receives an automaton representing multiple strings, a method for identifying the N-best distinct strings of the automaton, the method comprising:
partially determinizing the automaton by partially creating a determinized automaton while searching for the N-best distinct strings, by:
creating an initial state of the partially determinized automaton, wherein the initial state corresponds to a state pair; creating a transition leaving the initial state, wherein the transition has a label and a weight; creating a destination state for that transition, wherein the destination state corresponds to a subset of state pairs; creating additional states in the partially determinized automaton, wherein each additional state corresponds to a different subset of state pairs and wherein the states in the partially determinized automaton are connected by transitions that have labels and weights; propagating a potential from at least one state of the automaton to each state in the partially determinized automaton; and identifying the N-best distinct strings of the automaton from the partially created determinized automaton using an N-best paths process.
- 9. A method as defined in claim 8, further comprising computing an input potential for each state of the automaton, wherein each input potential represents a shortest distance from a corresponding state to a set of final states.
- 10. A method as defined in claim 9, wherein computing an input potential for each state of the automaton further comprises running a shortest-paths algorithm from the set of final states on the reverse of the digraph of the automaton.
- 11. A method as defined in claim 8, wherein identifying the N-best distinct strings of the automaton from the partially determinized automaton occurs without fully determinizing the automaton.
- 12. A method as defined in claim 8, wherein creating additional states in the partially determinized automaton further comprises creating new transitions that connect previous states of the partially determinized automaton with the additional states, wherein creating the new transitions depends on a subset of state pairs and on the automaton, wherein each state pair includes a state of the automaton and a remainder weight.
- 13. A method as defined in claim 8, wherein partially determinizing the automaton during a search for the N-best distinct strings comprises determinizing only a portion of the automaton visited during the search for the N-best strings.
- 14. A method as defined in claim 8, wherein partially determinizing the automaton during a search for the N-best distinct strings further comprises:
creating subsets of pairs for each state of the partially determinized automaton, wherein each pair includes a state of the automaton and a remainder weight; and assigning a weight to a final state of the partially determinized automaton.
- 15. A method as defined in claim 8, further comprising terminating the search for the N-best distinct strings when a final state of the partially determinized automaton has been extracted N times.
- 16. A method as defined in claim 8, further comprising prioritizing pairs in each subset according to a determinized potential.
- 17. In an automatic speech recognition system that produces a weighted automaton that represents a plurality of strings, wherein some of the strings are redundant, a method for finding the N-best distinct strings of the weighted automaton, the method comprising:
computing a shortest distance from each input state of the weighted automaton to a set of final states of the weighted automaton; partially creating a determinized automaton in an order dictated by an N-best paths search, wherein partially creating the determinized automaton further comprises:
forming determinized states wherein each determinized state corresponds to a weighted subset of pairs, wherein each pair includes:
a state that references the weighted automaton; and a remainder weight that is calculated based at least on the weights of transitions included in the weighted automaton; making determinized transitions, wherein each determinized transition includes:
a label corresponding to one of the labels on a transition of the weighted automaton; and a transition weight, wherein the transition weight is calculated based on at least the remainder weights of the determinized state from which the determinized transition leaves and the weights of the transitions in the weighted automaton that have the same label as the determinized transition; and repeating the steps of forming and making until enough of the partially determinized automaton has been created to find the N-best strings of the weighted automaton; and identifying the N-best paths of the partially determinized automaton, wherein the N-best paths of the partially determinized automaton correspond exactly to the N-best distinct strings of the weighted automaton.
- 18. A method as defined in claim 17, wherein computing a shortest distance from each input state of the weighted automaton to a set of final states of the weighted automaton further comprises running a shortest-paths algorithm from the set of final states on the reverse of the digraph of the weighted automaton.
- 19. A method as defined in claim 17, wherein the transition weight is the negative log of a probability corresponding to the determinized transition.
- 20. In a system where an input weighted automaton comprises input states and input transitions, wherein the input transitions interconnect the input states to form a plurality of complete paths from any one member of a set of beginning input states to any one member of a set of final input states, wherein each input transition comprises a label and a weight, a method of partially determinizing the input automaton to identify the N-best strings of the input automaton, the method comprising:
creating a deterministic automaton by:
forming a sufficient number of determinized states and determinized transitions in an order dictated by an N-shortest paths algorithm to create N complete paths; and interconnecting the determinized states and determinized transitions to form complete paths; and identifying the N-best strings of the input weighted automaton by searching for the N-best complete paths of the deterministic automaton.
- 21. The method as set forth in claim 20, wherein the deterministic automaton represents at least one of a word lattice and a phone lattice.
- 22. The method as set forth in claim 20, wherein the determinized transitions have a weight that corresponds to a probability.
- 23. The method as set forth in claim 20, further comprising the step of computing potentials for each of the input states of the input weighted automaton prior to the step of creating.
- 24. The method as set forth in claim 23, wherein the N-shortest paths algorithm is dependent on the potentials obtained in the step of computing.
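The method of claims 1 through 24 can be illustrated with a minimal Python sketch. It assumes weights in the tropical semiring (for example, negative log probabilities as in claim 19), an input automaton given as hypothetical `(source, label, weight, destination)` arc tuples with a single start state and a dictionary of final weights, and a hypothetical "super-final" heap entry used to count how many times a final state has been extracted (claim 15). Input potentials are computed by Dijkstra on the reversed graph (claims 2, 10 and 18); determinized states are subsets of (state, remainder weight) pairs built only when the search reaches them (claims 4, 5 and 13); and each subset's determinized potential is the minimum of remainder weight plus input potential (claim 7). The function names are illustrative assumptions, not the patented implementation.

```python
import heapq
from collections import defaultdict

INF = float("inf")


def reverse_potentials(num_states, arcs, finals):
    """Shortest distance from each state to the final states (tropical semiring),
    computed by Dijkstra on the reversed graph seeded with the final weights."""
    rev = defaultdict(list)
    for src, _label, w, dst in arcs:
        rev[dst].append((src, w))
    dist = [INF] * num_states
    heap = []
    for q, fw in finals.items():
        dist[q] = fw
        heapq.heappush(heap, (fw, q))
    while heap:
        d, q = heapq.heappop(heap)
        if d > dist[q]:
            continue  # stale heap entry
        for p, w in rev[q]:
            if d + w < dist[p]:
                dist[p] = d + w
                heapq.heappush(heap, (dist[p], p))
    return dist


def expand(subset, out_arcs):
    """Lazily build the determinized transitions leaving one subset of
    (state, remainder weight) pairs: {label: (weight, next_subset)}."""
    best = defaultdict(dict)  # label -> {destination state: best weight}
    for q, rem in subset:
        for label, w, dst in out_arcs[q]:
            cand = rem + w
            if cand < best[label].get(dst, INF):
                best[label][dst] = cand
    result = {}
    for label, dests in best.items():
        w_new = min(dests.values())  # weight of the determinized transition
        nxt = frozenset((dst, v - w_new) for dst, v in dests.items())  # remainders
        result[label] = (w_new, nxt)
    return result


def n_best_strings(num_states, arcs, start, finals, n):
    """Return up to n (string, cost) pairs: the n lowest-cost distinct strings."""
    pot = reverse_potentials(num_states, arcs, finals)
    out_arcs = defaultdict(list)
    for src, label, w, dst in arcs:
        out_arcs[src].append((label, w, dst))

    def det_potential(subset):
        # Determinized potential: min over pairs of remainder + input potential.
        return min(rem + pot[q] for q, rem in subset)

    init = frozenset({(start, 0.0)})
    # Heap items: (cost + potential, tie-breaker, cost, subset, labels, is_final).
    heap = [(det_potential(init), 0, 0.0, init, (), False)]
    tie = 1
    expanded = defaultdict(int)
    results = []
    while heap and len(results) < n:
        _prio, _t, cost, subset, labels, is_final = heapq.heappop(heap)
        if is_final:  # a final state was extracted: one more distinct string
            results.append(("".join(labels), cost))
            continue
        expanded[subset] += 1
        if expanded[subset] > n:  # each determinized state needs at most n expansions
            continue
        final_w = min((rem + finals[q] for q, rem in subset if q in finals), default=INF)
        if final_w < INF:
            heapq.heappush(heap, (cost + final_w, tie, cost + final_w, None, labels, True))
            tie += 1
        for label, (w, nxt) in expand(subset, out_arcs).items():
            phi = det_potential(nxt)
            if phi < INF:  # skip subsets that cannot reach a final state
                heapq.heappush(heap, (cost + w + phi, tie, cost + w, nxt, labels + (label,), False))
                tie += 1
    return results


# Two paths spell "ab" (costs 2.0 and 2.5); only the cheaper one is reported.
arcs = [(0, "a", 1.0, 1), (0, "a", 2.0, 2), (1, "b", 1.0, 3),
        (2, "b", 0.5, 3), (2, "c", 1.0, 3), (0, "b", 4.0, 3)]
print(n_best_strings(4, arcs, 0, {3: 0.0}, 3))
# [('ab', 2.0), ('ac', 3.0), ('b', 4.0)]
```

Because the on-the-fly result is deterministic, every path popped from the heap spells a different string, so redundant hypotheses never need to be filtered after the fact. In this sketch the determinized potential is exact, so the search expands only the part of the determinized automaton that the N-best paths actually visit.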
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of U.S. Provisional Patent Application Serial No. 60/369,109 entitled “An Efficient Algorithm for the N-Best Strings Problem”, filed on Mar. 29, 2002, which is incorporated herein by reference.
Provisional Applications (1)
| Number | Date | Country |
| 60369109 | Mar 2002 | US |