Some computer systems may be adapted to detect and recognize spoken words. Typically, an input device, such as a microphone or a telephone, receives the spoken words and converts the words into an analog or digital computer readable representation. An automated speech recognition (ASR) engine may utilize the representation to detect and recognize the words.
In many situations, the ASR engine may be licensed to an organization from an external developer of the engine. The license may specify the maximum number of simultaneous connections allowed to be established with the ASR engine. Unfortunately, the number of connections needed may exceed the number of connections allowed by the license. In addition, modifying the license to increase the number of allowable connections may result in a fee imposed by the developer.
In accordance with at least some embodiments, a system comprises a first speech recognition engine, a second speech recognition engine, and evaluation logic coupled to the first and second speech recognition engines. The evaluation logic evaluates the first and second speech recognition engines based on evaluation voice signals from a user and, based on the evaluation, selects one of said speech recognition engines to process additional speech signals from the user.
For a detailed description of exemplary embodiments of the invention, reference will now be made to the accompanying drawings in which:
Certain terms are used throughout the following description and claims to refer to particular system components. As one skilled in the art will appreciate, various companies may refer to a component by different names. This document does not intend to distinguish between components that differ in name but not function. In the following discussion and in the claims, the terms “including” and “comprising” are used in an open-ended fashion, and thus should be interpreted to mean “including, but not limited to . . . .” Also, the term “couple” or “couples” is intended to mean either an indirect or direct electrical connection. Thus, if a first device couples to a second device, that connection may be through a direct electrical connection, or through an indirect electrical connection via other devices and connections.
The following discussion is directed to various embodiments of the invention. Although one or more of these embodiments may be preferred, the embodiments disclosed should not be interpreted, or otherwise used, as limiting the scope of the disclosure, including the claims. In addition, one skilled in the art will understand that the following description has broad application, and the discussion of any embodiment is meant only to be exemplary of that embodiment, and not intended to intimate that the scope of the disclosure, including the claims, is limited to that embodiment.
The network 104 couples together the audio device 106 and the computer system 102 and facilitates the exchange of data between the audio device 106 and the computer system 102. The audio device 106 may comprise a telephone, and the network 104 may comprise the infrastructure of telephone lines and signal switches that route telephone calls. In some embodiments of the invention, the network 104 may be an internet protocol (IP) network, such as the Internet, and the audio device 106 may comprise a voice-over-IP (VoIP) transmitter and receiver.
The I/O interface 112 couples together the network 104 and the computer system 102 and facilitates the exchange of data between the network 104 and the computer system 102. The I/O interface 112 comprises hardware that is capable of establishing a connection with the network 104, such as modems and network adapters. “Utterances” from a user 116 of the audio device 106 may be converted into an analog or digital representation by the audio device 106 and routed through the network 104 to the I/O interface 112. As used herein, an utterance is a vocalization that represents a certain meaning to the system 100. Utterances may be a single word, a few words, a sentence, or even multiple sentences. Once received by the I/O interface 112, the representation may be stored in the memory 110 and processed by the SR module 114 and the CPU 108.
The IVR platform 202 may comprise a plurality of speech recognition applications that facilitate messaging, portals, and other enhanced voice-enabled interactive services. Typically, the IVR platform 202 is capable of handling a plurality of simultaneous user sessions. Each user session represents an established connection between the IVR platform 202 and the user 116 of the system 100.
To enable ASR functionality, the IVR platform 202 may establish connections with the primary and secondary ASR engines 212 and 214 through the dialog manager 204. The interface 216 negotiates the desired connections with the ASR switch 206. The ASR switch 206 may establish and release connections to the primary ASR engine 212 via the interface 218 and establish and release connections to the secondary ASR engine 214 via the interface 220.
The primary and secondary ASR engines 212 and 214 may comprise logic that performs ASR functions, such as signal processing and matching. The logic embodied in the ASR engine 212 may be the same as or different from the logic embodied in the ASR engine 214. If the ASR logic differs between the engines 212 and 214, the relative accuracy or performance of the engines may differ as well. The primary and secondary ASR engines 212 and 214 may be representative of a commercial grade ASR engine and an in-house or open source ASR engine, respectively.
The primary ASR engine 212 is used pursuant to an associated license that specifies the number of simultaneous connections that may be established between the IVR platform 202 and the primary ASR engine 212. The license may carry an associated fee that increases with the number of licensed connections. For example, a twenty-connection license may cost twice the amount of a ten-connection license. The secondary ASR engine 214 may not have an associated license and thus may establish any number of connections with the IVR platform 202. The secondary ASR engine 214 may be exemplary of an open source ASR engine.
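By way of illustration only, the licensed connection limit described above may be modeled as a counter that refuses new connections once the cap is reached. The class name `EngineConnections`, its methods, and the two-connection limit are hypothetical and are not taken from any particular ASR product.

```python
class EngineConnections:
    """Hypothetical bookkeeping for simultaneous connections to one ASR engine."""

    def __init__(self, name, max_connections=None):
        self.name = name
        # None models an engine with no license cap (e.g. the secondary engine).
        self.max_connections = max_connections
        self.active = 0

    def try_connect(self):
        """Count a new connection and return True, or return False if the cap is reached."""
        if self.max_connections is not None and self.active >= self.max_connections:
            return False
        self.active += 1
        return True

    def release(self):
        """Release one connection; the count never drops below zero."""
        if self.active > 0:
            self.active -= 1


# Assumed example values: a two-connection license for the primary engine.
primary = EngineConnections("primary", max_connections=2)
secondary = EngineConnections("secondary")
```

In this sketch, a third call to `primary.try_connect()` is refused until an earlier connection is released, whereas `secondary.try_connect()` always succeeds.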
The embodiments of the invention effectively reduce the number of connections established to the primary ASR engine 212 by utilizing the secondary ASR engine 214 whenever a predetermined evaluation condition is met. Since the secondary ASR engine 214 may not have an associated licensing fee, the overall costs associated with ASR functionality in the system 100 may be reduced.
Referring again to
Verification-based evaluation criteria compare the output of the primary and secondary ASR engines 212 and 214. If the secondary engine 214 produces output identical to the primary ASR engine 212, the secondary ASR engine 214 may be used, thereby allowing other connections to use the licensed ports of the primary ASR engine 212.
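As a minimal sketch of the verification-based criterion, the recognized text from each engine may be compared utterance by utterance. The function name and the choice to require at least one evaluation utterance are assumptions made for illustration.

```python
def verification_criterion_met(primary_outputs, secondary_outputs):
    """Return True when the secondary engine's output matches the primary's exactly.

    Both arguments are lists of recognized strings, one per evaluation utterance.
    """
    # Assumption: an evaluation with no utterances, or with mismatched
    # utterance counts, is treated as a failed verification.
    if not primary_outputs or len(primary_outputs) != len(secondary_outputs):
        return False
    return all(p == s for p, s in zip(primary_outputs, secondary_outputs))
```

When the function returns True, the secondary engine may be used and the licensed port of the primary engine may be freed for another connection.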
Response time-based evaluation criteria determine (e.g., measure) a parameter such as the response time of the primary and secondary ASR engines 212 and 214. If, compared to the primary ASR engine 212, the secondary ASR engine 214 has an identical or shorter response time, the secondary ASR engine 214 may be used after validation.
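One possible way to measure and compare response times is sketched below. The engines are assumed to be plain callables that accept audio data, and the `clock` parameter is a hypothetical hook that allows a different time source to be substituted. The sketch checks only the timing condition and does not perform the subsequent validation.

```python
import time


def measure_response_time(engine, audio, clock=time.perf_counter):
    """Return the elapsed time, in seconds, for one recognition call."""
    start = clock()
    engine(audio)  # hypothetical engine interface: a callable taking audio data
    return clock() - start


def response_time_criterion_met(primary_seconds, secondary_seconds):
    """True when the secondary engine responds as fast as or faster than the primary."""
    return secondary_seconds <= primary_seconds
```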
Confidence-based evaluation criteria use a confidence score generated by the primary and secondary ASR engines 212 and 214 during the evaluation. A threshold may be set that determines when the evaluator 210 should select the secondary ASR engine 214 over the primary ASR engine 212. For example, the threshold may represent a fraction of the confidence score obtained from the primary ASR engine 212. If the confidence score of the secondary ASR engine 214 is equal to or higher than the threshold level, the secondary ASR engine 214 may be utilized.
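The threshold comparison described above may be sketched as follows. The function name and the default fraction of 0.9 are assumptions chosen for illustration; the description does not specify a particular fraction.

```python
def select_by_confidence(primary_confidence, secondary_confidence, fraction=0.9):
    """Select an engine using a threshold derived from the primary engine's score.

    The threshold is a fraction of the primary engine's confidence score.
    """
    threshold = fraction * primary_confidence
    if secondary_confidence >= threshold:
        return "secondary"
    return "primary"
```

For example, with a primary score of 0.8 and a fraction of 0.9, the threshold is approximately 0.72, so a secondary score of 0.75 selects the secondary engine and a secondary score of 0.70 does not.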
Continuation-based evaluation criteria determine whether a user has successfully navigated through an ASR menu. For example, if the user is able to reach a menu beyond the first level of a menu system with both ASR engines 212 and 214, the secondary engine 214 may be selected and utilized for the user's future utterances. Successful navigation to a secondary level of the menu system may provide a relative indicator that the secondary ASR engine 214 is detecting and recognizing the user's voice commands.
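The continuation-based criterion may be sketched as a comparison of the menu depth reached with each engine. Representing menu levels as integers, with the first level numbered 1, is an assumption of this example.

```python
def continuation_criterion_met(depth_with_primary, depth_with_secondary, first_level=1):
    """True when the user reached a menu beyond the first level with both engines."""
    return depth_with_primary > first_level and depth_with_secondary > first_level
```

If the function returns True, the secondary engine may be selected for the user's future utterances.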
The ASR switch 206 may use the results of the evaluation, as well as the optional port monitor 208, to determine which connections may be maintained and which connections may be released. In some embodiments, the port monitor 208 may be included and used to monitor currently used ports of the primary ASR engine 212. The port monitor 208, optionally in conjunction with the evaluator 210, determines whether the primary ASR engine 212 should be used without further consideration or whether the exemplary procedure of
The above discussion is meant to be illustrative of the principles and various embodiments of the present invention. Numerous variations and modifications will become apparent to those skilled in the art once the above disclosure is fully appreciated. It is intended that the following claims be interpreted to embrace all such variations and modifications.