Claims
- 1. A method of recognizing speech, the method comprising:
receiving an input speech signal; performing an initial recognition on the input speech signal to generate a first pass result; generating a first grammar based upon the first pass result, the first grammar having a portion set to match a first part of the input speech signal; and applying the first grammar to the input speech signal to generate a second pass result.
- 2. The method of claim 1, wherein generating a first grammar comprises:
determining a context of the first pass result; determining the portion of the first grammar to be set to match the first part of the input speech signal based upon the determined context of the first pass result; and generating the first grammar with the portion set to match the first part of the input speech signal.
- 3. The method of claim 1, wherein applying the first grammar comprises:
setting the first part of the input speech signal as matched with the portion of the first grammar; recognizing a second part of the input speech signal; and generating the second pass result based upon the recognized second part of the input speech signal.
- 4. The method of claim 1, wherein the second pass result is modified based upon location-based information.
- 5. The method of claim 1, further comprising:
generating a second grammar based upon the second pass result, the second grammar limiting the second part of the input speech signal to the second pass result and configured to recognize the first part of the input speech signal within the second pass result; and applying the second grammar to the input speech signal to generate a third pass result.
- 6. The method of claim 5, wherein generating a second grammar comprises:
limiting the second part of the input speech signal to the second pass result; and generating a model corresponding to the first part of the input speech signal and varying within the second pass result.
- 7. The method of claim 6, wherein applying the second grammar comprises comparing the first part of the input speech signal to the model while the second part of the input speech signal is limited to the second pass result.
- 8. The method of claim 5, wherein the third pass result is modified based upon location-based information.
- 9. The method of claim 1, wherein the first part of the input speech signal corresponds to a street address and the second part of the input speech signal corresponds to a city name.
- 10. A computer program product for recognizing speech, the computer program product stored on a computer readable medium and adapted to perform a method comprising:
receiving an input speech signal; performing an initial recognition on the input speech signal to generate a first pass result; generating a first grammar based upon the first pass result, the first grammar having a portion set to match a first part of the input speech signal; and applying the first grammar to the input speech signal to generate a second pass result.
- 11. The computer program product of claim 10, wherein generating a first grammar comprises:
determining a context of the first pass result; determining the portion of the first grammar to be set to match the first part of the input speech signal based upon the determined context of the first pass result; and generating the first grammar with the portion set to match the first part of the input speech signal.
- 12. The computer program product of claim 10, wherein applying the first grammar comprises:
setting the first part of the input speech signal as matched with the portion of the first grammar; recognizing a second part of the input speech signal; and generating the second pass result based upon the recognized second part of the input speech signal.
- 13. The computer program product of claim 10, wherein the second pass result is modified based upon location-based information.
- 14. The computer program product of claim 10, wherein the method further comprises:
generating a second grammar based upon the second pass result, the second grammar limiting the second part of the input speech signal to the second pass result and configured to recognize the first part of the input speech signal within the second pass result; and applying the second grammar to the input speech signal to generate a third pass result.
- 15. The computer program product of claim 14, wherein generating a second grammar comprises:
limiting the second part of the input speech signal to the second pass result; and generating a model corresponding to the first part of the input speech signal and varying within the second pass result.
- 16. The computer program product of claim 15, wherein applying the second grammar comprises comparing the first part of the input speech signal to the model while the second part of the input speech signal is limited to the second pass result.
- 17. The computer program product of claim 14, wherein the third pass result is modified based upon location-based information.
- 18. The computer program product of claim 10, wherein the first part of the input speech signal corresponds to a street address and the second part of the input speech signal corresponds to a city name.
- 19. A speech recognition system using a multiple pass speech recognition method including at least a first pass and a second pass, the speech recognition system comprising:
a speech recognition engine for performing an initial recognition on an input speech signal in the first pass to generate a first pass result and applying a first grammar to the input speech signal in the second pass to generate a second pass result; a grammar database for storing a plurality of grammars; and a dynamic grammar generator for generating the first grammar based upon the first pass result using the grammars stored in the grammar database, the first grammar having a portion set to match a first part of the input speech signal and configured to recognize a second part of the input speech signal.
- 20. The speech recognition system of claim 19, wherein the dynamic grammar generator determines a context of the first pass result and determines the portion of the first grammar to be set to match the first part of the input speech signal based upon the determined context of the first pass result.
- 21. The speech recognition system of claim 19, further comprising a processor coupled to the speech recognition engine and configured to modify the second pass result based upon location-based information.
- 22. The speech recognition system of claim 19, wherein the multiple pass speech recognition method further comprises a third pass,
the dynamic grammar generator generating a second grammar based upon the second pass result, the second grammar limiting the second part of the input speech signal to the second pass result and configured to recognize the first part of the input speech signal within the second pass result; and the speech recognition engine applying the second grammar to the input speech signal to generate a third pass result.
- 23. The speech recognition system of claim 22, wherein the dynamic grammar generator limits the second part of the input speech signal to the second pass result and generates a model corresponding to the first part of the input speech signal and varying within the second pass result as part of the second grammar.
- 24. The speech recognition system of claim 23, wherein the speech recognition engine applies the second grammar to the input speech signal in the third pass by comparing the first part of the input speech signal to the model while limiting the second part of the input speech signal to the second pass result.
- 25. The speech recognition system of claim 22, further comprising a processor coupled to the speech recognition engine configured to modify the third pass result based upon location-based information.
- 26. The speech recognition system of claim 19, wherein the first part of the input speech signal corresponds to a street address and the second part of the input speech signal corresponds to a city name.
- 27. The speech recognition system of claim 19, wherein the speech recognition system is networked and includes a server and a client, the client comprising a speech buffer and the server comprising the speech recognition engine, the dynamic grammar generator, and the grammar database.
- 28. The speech recognition system of claim 19, wherein the speech recognition system is networked and includes a server and a client, the client comprising a speech buffer and the speech recognition engine, and the server comprising the dynamic grammar generator and the grammar database.
- 29. A method of recognizing speech, the method comprising:
receiving an input speech signal; performing an initial recognition on the input speech signal to generate a first pass result; determining a level of the first pass result in a knowledge hierarchy; generating a first grammar having a level higher in the knowledge hierarchy than the level of the first pass result, the first grammar having a portion set to match a first part of the input speech signal; and applying the first grammar to the input speech signal to generate a second pass result.
- 30. The method of claim 29, wherein generating a first grammar comprises:
determining the portion of the first grammar to be set to match the first part of the input speech signal based upon the determined level of the first pass result; and generating the first grammar having the portion set to match the first part of the input speech signal.
- 31. The method of claim 29, wherein applying the first grammar comprises:
setting the first part of the input speech signal as matched with the portion of the first grammar; recognizing a second part of the input speech signal; and generating the second pass result based upon the recognized second part of the input speech signal.
- 32. The method of claim 29, wherein the second pass result is modified based upon location-based information.
- 33. The method of claim 29, further comprising:
generating a second grammar based upon the second pass result, the second grammar having a level lower in the knowledge hierarchy than both the level of the second pass result and the level of the first grammar, the second grammar limiting the second part of the input speech signal to the second pass result and configured to recognize the first part of the input speech signal within the second pass result; and applying the second grammar to the input speech signal to generate a third pass result.
- 34. The method of claim 33, wherein generating a second grammar comprises:
limiting the second part of the input speech signal to the second pass result; and generating a model corresponding to the first part of the input speech signal and varying within the second pass result.
- 35. The method of claim 34, wherein applying the second grammar comprises comparing the first part of the input speech signal to the model while the second part of the input speech signal is limited to the second pass result.
- 36. The method of claim 33, wherein the third pass result is modified based upon location-based information.
- 37. The method of claim 29, wherein the first part of the input speech signal corresponds to a street address and the second part of the input speech signal corresponds to a city name.
- 38. A server for use in a networked speech recognition system using a multiple pass speech recognition method including at least a first pass and a second pass for recognition of an input speech signal, the server comprising:
a grammar database for storing a plurality of grammars; and a dynamic grammar generator for generating a first grammar based upon a result of the first pass using the grammars stored in the grammar database, the first grammar having a portion set to match a first part of the input speech signal and configured to recognize a second part of the input speech signal.
- 39. The server of claim 38, wherein the dynamic grammar generator further generates a second grammar based upon a result of the second pass, the second grammar limiting the second part of the input speech signal to the result of the second pass and configured to recognize the first part of the input speech signal within the result of the second pass.
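For illustration only, the three-pass flow recited in claims 1, 5, and 9 can be sketched in Python. This is a toy under stated assumptions, not the claimed implementation: a text string stands in for the input speech signal, string similarity from `difflib` stands in for acoustic scoring, and every name here (`GRAMMAR_DB`, `first_pass`, `generate_first_grammar`, `apply_first_grammar`, `generate_second_grammar`, `apply_second_grammar`, `recognize`) is hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical grammar database: each city maps to the streets known in it.
GRAMMAR_DB = {
    "palo alto": ["university avenue", "el camino real"],
    "los altos": ["main street", "el camino real"],
}


def similarity(a: str, b: str) -> float:
    """Stand-in for acoustic scoring: plain string similarity in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def first_pass(signal: str) -> str:
    """Initial recognition, reduced here to a coarse context label."""
    return "address" if any(ch.isdigit() for ch in signal) else "unknown"


def generate_first_grammar(first_pass_result: str) -> dict:
    """First grammar: the street portion is preset as matched, cities are open."""
    if first_pass_result != "address":
        raise ValueError("this sketch only handles the address context")
    return {"street": "<matched>", "cities": sorted(GRAMMAR_DB)}


def apply_first_grammar(signal: str, grammar: dict) -> str:
    """Second pass: treat the street part as matched, recognize only the city."""
    return max(grammar["cities"],
               key=lambda city: similarity(signal[-len(city):], city))


def generate_second_grammar(second_pass_result: str) -> dict:
    """Second grammar: city fixed to the second pass result, streets vary within it."""
    return {"city": second_pass_result,
            "streets": GRAMMAR_DB[second_pass_result]}


def apply_second_grammar(signal: str, grammar: dict) -> tuple:
    """Third pass: compare the street part against streets of the fixed city."""
    street_part = signal[: -len(grammar["city"])].strip()
    street = max(grammar["streets"],
                 key=lambda s: similarity(street_part, s))
    return street, grammar["city"]


def recognize(signal: str) -> tuple:
    """Run all three passes over the same input signal."""
    context = first_pass(signal)
    city = apply_first_grammar(signal, generate_first_grammar(context))
    return apply_second_grammar(signal, generate_second_grammar(city))


print(recognize("10 university avenue palo alto"))
```

In this sketch the same input is reused on every pass, and each generated grammar narrows the search: the second pass considers only city names, and the third pass considers only the streets listed for the recognized city.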
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims priority under 35 U.S.C. §119(e) to co-pending U.S. Provisional Patent Application No. 60/413,958, entitled “Multiple Pass Speech Recognition Method and System,” filed on Sep. 25, 2002, the subject matter of which is incorporated by reference herein in its entirety.
Provisional Applications (1)
| Number | Date | Country |
|---|---|---|
| 60413958 | Sep 2002 | US |