Claims
- 1. A method of speech recognition, the method comprising:
receiving an observation value that describes a portion of a speech signal;
identifying a predicted value for a hypothesis phonological unit using a production-related dynamics value that is a linear interpolation between a production-related dynamics value at a time corresponding to the end of a preceding phonological unit and a production-related target of the hypothesis phonological unit, wherein the linear interpolation utilizes a time-dependent interpolation weight; and
determining the difference between the observation value and the predicted value to determine a likelihood for the hypothesis phonological unit.
- 2. The method of claim 1 wherein the time-dependent weight comprises a term that is an exponential function of time.
- 3. The method of claim 2 wherein the time-dependent weight further comprises a critical damping function.
- 4. The method of claim 1 wherein the time-dependent interpolation weight is dependent on the amount of time that has passed since the hypothesis phonological unit began.
- 5. The method of claim 1 wherein the time-dependent interpolation weight comprises at least one constant that is selected based on the hypothesis phonological unit.
- 6. The method of claim 5 further comprising:
receiving a separate observation value for each of a set of time indices aligned with the hypothesis phonological unit;
identifying a separate predicted value at each of the time indices using a same constant to determine a separate time-dependent interpolation weight for each time index; and
determining the difference between each respective observation value and each respective predicted value to determine a likelihood for the phonological unit.
- 7. The method of claim 6 wherein identifying a predicted value further comprises multiplying a production-related dynamics value determined for a time index by a value that is not dependent on the phonological unit.
- 8. The method of claim 1 wherein determining a likelihood for the hypothesis phonological unit further comprises determining the likelihood of a sequence of hypothesis phonological units that end in the hypothesis phonological unit, the likelihood of the sequence based on a score associated with a class of production-related dynamics values.
- 9. The method of claim 8 wherein the score associated with a class of production-related dynamics values is determined at a boundary between the hypothesis phonological unit and the previous phonological unit in the sequence of phonological units.
- 10. The method of claim 9 wherein the score associated with a class of production-related dynamics values is selected from a set of scores, one for each class of production-related dynamics values, based on a class of production-related dynamics value at a previous time that maximizes the likelihood of the sequence of phonological units.
- 11. A computer-readable medium having computer-executable instructions for performing steps comprising:
selecting a hypothesis speech unit;
selecting a hypothesis duration for the speech unit;
identifying a production-related target and a time constant based on the hypothesis speech unit;
selecting a starting production-related value;
using the time constant and the hypothesis duration to generate a time-dependent interpolation weight; and
using the starting production-related value, the production-related target, and the time-dependent interpolation weight to determine a likelihood for the combination of the hypothesis speech unit and the hypothesis duration.
- 12. The computer-readable medium of claim 11 wherein selecting a starting production-related value comprises selecting a starting production-related value that maximizes the likelihood for the combination of the hypothesis speech unit and the hypothesis duration.
- 13. The computer-readable medium of claim 12 wherein determining a likelihood for the combination of the hypothesis speech unit and the hypothesis duration further comprises determining the likelihood for a sequence of speech units that ends in the hypothesis speech unit.
- 14. The computer-readable medium of claim 13 wherein determining a likelihood for the sequence of speech units comprises utilizing a score associated with the end of a speech unit that precedes the hypothesis speech unit in the sequence of speech units.
- 15. The computer-readable medium of claim 14 wherein utilizing a score associated with the end of a speech unit comprises utilizing a score associated with a class of production-related values.
- 16. The computer-readable medium of claim 15 wherein utilizing a score associated with a class of production-related values comprises utilizing a score associated with the class of the starting production-related value.
- 17. The computer-readable medium of claim 11 wherein determining a likelihood for the combination of the hypothesis speech unit and the hypothesis duration comprises determining a production-related value for the hypothesis speech unit based on the starting production-related value, the production-related target and the time-dependent interpolation weight.
- 18. The computer-readable medium of claim 17 wherein determining a likelihood for the combination of the hypothesis speech unit and the hypothesis duration further comprises generating a predicted value through steps comprising multiplying the production-related value for the hypothesis speech unit by zero when the hypothesis speech unit is silence.
- 19. The computer-readable medium of claim 17 wherein determining a likelihood for the combination of the hypothesis speech unit and the hypothesis duration further comprises generating a predicted value through steps comprising multiplying the production-related value for the hypothesis speech unit by zero when the hypothesis speech unit is noise.
- 20. The computer-readable medium of claim 17 wherein determining the likelihood for the combination of the hypothesis speech unit and the hypothesis duration further comprises determining the difference between the predicted value and an observation value.
- 21. A method of decoding a speech signal by generating a score for a current state in a finite state system, the method comprising:
determining a production-related value for the current state based on an optimal production-related value at the end of a preceding state, the optimal production-related value being selected from a set of continuous values;
using the production-related value to determine a likelihood of a phone being represented by a set of observation vectors that are aligned with a path between the preceding state and the current state; and
combining the likelihood of the phone with a score from the preceding state to determine the score for the current state, the score from the preceding state being associated with a discrete class of production-related values wherein the class matches the class of the optimal production-related value.
- 22. The method of claim 21 wherein the optimal production-related value is the production-related value that maximizes the score for the current state.
- 23. The method of claim 21 wherein the method is used to generate a separate score for each class of production-related values in a plurality of classes.
- 24. The method of claim 21 wherein the production-related value is further based on the length of time between the current time in the current state and the end of the preceding state.
- 25. The method of claim 24 wherein the production-related value is calculated using a time-dependent interpolation weight.
- 26. The method of claim 25 wherein the time-dependent interpolation weight is further dependent on a time constant associated with the phone.
- 27. The method of claim 26 wherein using the production-related value to determine the likelihood of a phone comprises multiplying the production-related value by a value that is not dependent on the phone.
- 28. A computer-readable medium having computer-executable instructions for performing steps comprising:
selecting an optimal hidden dynamic value from a set of continuous hidden dynamic values;
using the optimal hidden dynamic value to determine a probability for a current phone;
using the optimal hidden dynamic value to select a path score from a set of path scores, each path score associated with a different discrete class of hidden dynamic values, and the selected path score being associated with the class of the optimal hidden dynamic value; and
combining the selected path score and the probability for the current phone to form a path score for a path that includes the current phone.
- 29. The computer-readable medium of claim 28 wherein selecting an optimal hidden dynamic value comprises selecting a hidden dynamic value that maximizes the path score for the path that includes the current phone.
- 30. The computer-readable medium of claim 28 wherein using the optimal hidden dynamic value to determine a probability for a current phone comprises generating an expected hidden dynamic value for the current phone based on the optimal hidden dynamic value.
- 31. The computer-readable medium of claim 30 wherein the expected hidden dynamic value is based on an interpolation between the optimal hidden dynamic value and a target value associated with the current phone.
- 32. The computer-readable medium of claim 31 wherein the expected hidden dynamic value is further based on a length of time associated with the current phone.
- 33. The computer-readable medium of claim 32 wherein the interpolation utilizes a time-dependent interpolation weight that is associated with the current phone and the length of time associated with the current phone.
- 34. The computer-readable medium of claim 33 wherein using the optimal hidden dynamic value to determine a probability further comprises multiplying the expected hidden dynamic value by zero when the current phone is a noise phone.
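The steps recited in the claims above can be illustrated with a minimal Python sketch for a scalar production-related value. The claims give no formulas, so everything specific here is an assumption made for illustration: the critically damped weight `(1 + gamma*t) * exp(-gamma*t)`, the Gaussian residual score, the uniform binning used for classes, and a finite candidate list standing in for the selection of a starting value from a continuous set. All function and parameter names (`interpolation_weight`, `dynamics_value`, `log_likelihood`, `classify`, `best_path_score`, `gamma`, `scale`) are hypothetical.

```python
import math


def interpolation_weight(t, gamma):
    """Assumed time-dependent weight on the starting value.

    Combines an exponential term with a critical damping factor; t is the
    time elapsed since the hypothesis unit began, gamma a per-unit constant.
    """
    return (1.0 + gamma * t) * math.exp(-gamma * t)


def dynamics_value(t, start, target, gamma):
    """Linear interpolation between the starting value and the unit target."""
    w = interpolation_weight(t, gamma)
    return w * start + (1.0 - w) * target


def log_likelihood(observations, start, target, gamma, scale=1.0, variance=1.0):
    """Score a hypothesis unit whose duration is len(observations).

    `scale` is a unit-independent multiplier (set to 0.0 for silence or
    noise). The Gaussian form of the score is an assumption.
    """
    total = 0.0
    for k, obs in enumerate(observations, start=1):  # k = frames since unit start
        predicted = scale * dynamics_value(k, start, target, gamma)
        residual = obs - predicted
        total += -0.5 * (math.log(2.0 * math.pi * variance) + residual ** 2 / variance)
    return total


def classify(value, lo, hi, n_classes):
    """Map a continuous value to a discrete class by uniform binning (assumed)."""
    clipped = min(max(value, lo), hi)
    idx = int((clipped - lo) / (hi - lo) * n_classes)
    return min(idx, n_classes - 1)


def best_path_score(observations, target, gamma, prev_scores, candidates, lo, hi):
    """Pick the starting value that maximizes previous score plus unit score.

    prev_scores holds one path score per class at the end of the preceding
    unit. Returns (score, start, class_index).
    """
    best = None
    for start in candidates:
        cls = classify(start, lo, hi, len(prev_scores))
        score = prev_scores[cls] + log_likelihood(observations, start, target, gamma)
        if best is None or score > best[0]:
            best = (score, start, cls)
    return best


if __name__ == "__main__":
    # Synthetic observations generated from a known starting value.
    obs = [dynamics_value(k, 0.2, 1.0, 0.5) for k in range(1, 6)]
    print(best_path_score(obs, 1.0, 0.5, [0.0] * 4, [0.0, 0.2, 0.4, 0.6], 0.0, 1.0))
```

In this sketch the weight equals 1 at the unit boundary, so the trajectory starts at the value carried over from the preceding unit and moves toward the target as time passes. A practical decoder would operate on vectors and would likely optimize the starting value analytically rather than over a fixed candidate list.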
Parent Case Info
[0001] The present application claims priority from U.S. Provisional Application Ser. No. 60/398,166, filed on Jul. 23, 2002, and from U.S. Provisional Application Ser. No. 60/405,971, filed on Aug. 26, 2002.
Provisional Applications (2)
| Number | Date | Country |
|---|---|---|
| 60398166 | Jul 2002 | US |
| 60405971 | Aug 2002 | US |