Claims
- 1. A system for disambiguating speech input comprising:
a speech recognition component that receives recorded audio or speech input and generates:
one or more tokens corresponding to the speech input; and
for each of the one or more tokens, a confidence value indicative of the likelihood that a given token correctly represents the speech input;
a selection component that identifies, according to a selection algorithm, which two or more tokens are to be presented to a user as alternatives;
one or more disambiguation components that perform an interaction with the user to present the alternatives and to receive a selection of alternatives from the user, the interaction taking place in at least a visual mode; and
an output interface that presents the selected alternative to an application as input.
- 2. The system of claim 1, wherein the disambiguation components and the application reside on a single computing device.
- 3. The system of claim 1, wherein the disambiguation components and the application reside on separate computing devices.
- 4. The system of claim 1, wherein the one or more disambiguation components perform said interaction by presenting the user with alternatives in a visual mode, and by receiving the user's selection in a visual mode.
- 5. The system of claim 4, wherein the disambiguation components present the alternatives to the user in a visual form and allow the user to select from among the alternatives using a voice input.
- 6. The system of claim 1, wherein the one or more disambiguation components perform said interaction by presenting the user with alternatives in a visual mode, and by receiving the user's selection in either a visual mode, a voice mode, or a combination of visual mode and voice mode.
- 7. The system of claim 1, wherein the selection component filters the one or more tokens according to a set of parameters.
- 8. The system of claim 7, wherein the set of parameters is user specified.
- 9. The system of claim 1, wherein the one or more disambiguation components disambiguate the alternatives in plural iterative stages, whereby the first stage narrows the alternatives to a number of alternatives that is smaller than that initially generated by the selection component, but greater than one, and whereby the one or more disambiguation components operate iteratively to narrow the alternatives in subsequent iterative stages.
- 10. The system of claim 9, whereby the number of iterative stages is limited to a specified number.
- 11. A method of processing speech input comprising:
receiving a speech input from a user;
determining whether the speech input is ambiguous;
if the speech input is not ambiguous, then communicating a token representative of the speech input to an application as input to the application; and
if the speech input is ambiguous:
performing an interaction with the user whereby the user is presented with plural alternatives and selects an alternative from among the plural alternatives, the interaction being performed in at least a visual mode; and
communicating the selected alternative to the application as input to the application.
- 12. The method of claim 11, wherein the interaction comprises the concurrent use of said visual mode and a voice mode.
- 13. The method of claim 12, wherein the interaction comprises the user selecting from among the plural alternatives using a combination of speech and visual-based input.
- 14. The method of claim 11, wherein the interaction comprises the user selecting from among the plural alternatives using visual input.
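
For illustration only, the flow recited in claims 1 and 9 through 11 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the names `Token`, `select_alternatives`, `is_ambiguous`, `disambiguate`, and `process_speech` are hypothetical, as are the confidence threshold, the ambiguity margin, and the stage limit. The `narrow` callback stands in for the user interaction in at least a visual mode, and `send_to_application` stands in for the output interface.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Token:
    text: str
    confidence: float  # likelihood in [0, 1] that the token matches the speech input


def select_alternatives(tokens: Sequence[Token],
                        min_confidence: float = 0.2,
                        max_alternatives: int = 4) -> List[Token]:
    """Hypothetical selection algorithm: keep the top-ranked tokens above a threshold."""
    ranked = sorted(tokens, key=lambda t: t.confidence, reverse=True)
    kept = [t for t in ranked[:max_alternatives] if t.confidence >= min_confidence]
    # Assumption: always offer at least two alternatives when two are available.
    return kept if len(kept) >= 2 else ranked[:2]


def is_ambiguous(tokens: Sequence[Token], margin: float = 0.3) -> bool:
    """Assumed test: the input is ambiguous when the two best scores are close."""
    ranked = sorted(tokens, key=lambda t: t.confidence, reverse=True)
    return len(ranked) >= 2 and ranked[0].confidence - ranked[1].confidence < margin


def disambiguate(alternatives: List[Token],
                 narrow: Callable[[List[Token]], List[Token]],
                 max_stages: int = 3) -> Token:
    """Narrow the alternatives over a limited number of iterative stages."""
    for _ in range(max_stages):
        if len(alternatives) <= 1:
            break
        narrowed = narrow(alternatives)  # e.g. user taps or speaks a choice
        if not narrowed:
            raise ValueError("the interaction returned no alternatives")
        alternatives = narrowed
    # Assumption: if the stage limit is reached, fall back to the best remaining token.
    return max(alternatives, key=lambda t: t.confidence)


def process_speech(tokens: Sequence[Token],
                   narrow: Callable[[List[Token]], List[Token]],
                   send_to_application: Callable[[Token], None]) -> Token:
    """Send an unambiguous token directly; otherwise ask the user first."""
    if not tokens:
        raise ValueError("the recognizer produced no tokens")
    if is_ambiguous(tokens):
        chosen = disambiguate(select_alternatives(tokens), narrow)
    else:
        chosen = max(tokens, key=lambda t: t.confidence)
    send_to_application(chosen)
    return chosen


if __name__ == "__main__":
    recognized = [Token("Austin", 0.48), Token("Boston", 0.45), Token("Houston", 0.05)]
    # Simulated user who picks the second alternative shown on the display.
    result = process_speech(recognized, lambda alts: [alts[1]], lambda token: None)
    print(result.text)  # prints "Boston"
```

In this sketch the two best tokens differ by less than the assumed margin, so the alternatives are presented and the simulated user's pick is passed to the application. A real system would replace the `narrow` callback with the visual, voice, or combined interaction.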
CROSS-REFERENCE TO RELATED CASES
[0001] This application claims the benefit of U.S. Provisional Application No. 60/432,227, entitled “Techniques for Disambiguating Speech Input Using Multimodal Interfaces,” filed on Dec. 10, 2002.
Provisional Applications (1)
Number | Date | Country
60432227 | Dec 2002 | US