Many automated systems require a secure password or code to be entered using telephone keys to access information or to perform different functions. For example, automated banking systems may require a secure password or security code to retrieve account information. Such systems may prompt a user to input secret information, such as a birth date or social security number, or other password associated with the user. The system then verifies the user's input or response against a stored record of the secret information or password to verify the authenticity of the user. These simple numeric passwords are often relatively easy to discover surreptitiously.
Different applications use phone or dialog systems to prompt a user to enter speech information as a response to the prompt, in order to perform tasks. These applications use speech recognition systems to recognize the input speech. Such speech recognition systems use grammars to identify words in a spoken utterance. In the context of a phone or dialog system for secure information, it is difficult to build a grammar for the secure data. This is because, for a grammar to recognize a word, it must have a rule written for that word. Thus, proper names and other words often used as secret password information are not well dealt with in grammars. Further, even if the grammar does contain the secret password, if the automated speech recognition takes place in the telephone dialog system, outside of a secure application or system, security is compromised because the secret password information is now generally unsecured.
Embodiments of the present invention address one or more of these and/or other problems. This background is not intended to limit the invention in any way, and is provided by way of example only.
Embodiments of the present invention relate to a speech recognition system for secure information. The speech recognition system includes a sub-word speech unit recognition component which interfaces with a security system. The sub-word speech unit recognition component receives a speech input utterance, representing a password or secret information, from a user, recognizes the sub-word speech units in the utterance and provides the sub-word speech units to the security system to compare the sub-word speech units against stored information or data.
The above summary is provided to introduce a selection of concepts in a simplified form that are further described in the Detailed Description section below. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
Embodiments of the present invention relate to sub-word speech recognition for secure information. Prior to describing the invention in more detail, one illustrative embodiment of a computing environment 100 in which the invention can be implemented will be described with respect to
The computing system environment 100 shown in
The invention is operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known computing systems, environments, and/or configurations that may be suitable for use with the invention include, but are not limited to, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Those skilled in the art can implement aspects of the present invention as instructions stored on computer readable media based on the description and figures provided herein.
The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
With reference to
Computer 110 typically includes a variety of computer readable media. Computer readable media can be any available media that can be accessed by computer 110 and includes both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer readable media may comprise computer storage media and communication media. Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.
Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computer 110. Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer readable media.
The system memory 130 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 131 and random access memory (RAM) 132. A basic input/output system 133 (BIOS), containing the basic routines that help to transfer information between elements within computer 110, such as during start-up, is typically stored in ROM 131. RAM 132 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 120. By way of example, and not limitation,
The computer 110 may also include other removable/non-removable volatile/nonvolatile computer storage media. By way of example only,
The drives and their associated computer storage media discussed above and illustrated in
A user may enter commands and information into the computer 110 through input devices such as a keyboard 162, a microphone 163, and a pointing device 161, such as a mouse, trackball or touch pad. Other input devices (not shown) may include a joystick, game pad, satellite dish, scanner, or the like. These and other input devices are often connected to the processing unit 120 through a user input interface 160 that is coupled to the system bus, but may be connected by other interface and bus structures, such as a parallel port, game port or a universal serial bus (USB). A monitor 191 or other type of display device is also connected to the system bus 121 via an interface, such as a video interface 190. In addition to the monitor, computers may also include other peripheral output devices such as speakers 197 and printer 196, which may be connected through an output peripheral interface 195.
The computer 110 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 180. The remote computer 180 may be a personal computer, a hand-held device, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 110. The logical connections depicted in
When used in a LAN networking environment, the computer 110 is connected to the LAN 171 through a network interface or adapter 170. When used in a WAN networking environment, the computer 110 typically includes a modem 172 or other means for establishing communications over the WAN 173, such as the Internet. The modem 172, which may be internal or external, may be connected to the system bus 121 via the user-input interface 160, or other appropriate mechanism. In a networked environment, program modules depicted relative to the computer 110, or portions thereof, may be stored in the remote memory storage device. By way of example, and not limitation,
Embodiments of the present invention relate to a speech recognition system 200 for secure information which has varied applications and is not limited to the specific embodiments shown. In the embodiment shown in
In one embodiment, speech recognition system 206 includes a sub-word speech unit recognition component 212. The sub-word speech unit recognition component 212 receives the response or utterance 210 from user 207. Component 212 recognizes, in the input speech utterance or response 210, sub-word speech units 214, such as phonemes.
In the embodiment shown, the security system 204 includes a secure database or secure information 220. In the embodiment described, the database 220 includes sub-word speech units corresponding to security data, such as passwords or security codes. As shown, the recognition component 212 interfaces with the security system 204 through a secure interface 222 for authentication of the input speech or utterance 210. Secure interface 222 illustratively is a firewall or other interface that employs a security protocol. The particular interface or protocol is not important for purposes of the present invention other than to say that the data in security system 204 is more secure than that in application 202.
In particular, in an illustrated embodiment, the system 200 is used to verify or authenticate a password or security code. The password or code is input by the user 207 in response to prompt 208. The utterance is processed into sub-word speech units 214 by the sub-word speech unit recognition component 212. The application 202 provides the sub-word speech units 214 in addition to a user identification 224, such as the user's name, account number or other identification code, to the security system 204.
The security system 204 uses the sub-word speech units 214 and user identification 224 to access stored information indicative of the password or security code corresponding to the received user identification 224. The stored information may be, for example, stored sub-word speech units. Sub-word speech units corresponding to the input speech are compared to stored data or stored sub-word speech units by a speech unit comparator component 225.
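The comparison performed by speech unit comparator component 225 can be sketched as a comparison of two sequences of sub-word unit labels. The following is a minimal illustration only. The function names, the ARPAbet-style phoneme labels, and the `max_distance` tolerance are assumptions introduced here; the description above does not specify how the comparison is scored or whether any mismatch is tolerated.

```python
from typing import Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two sequences of sub-word unit labels."""
    previous = list(range(len(b) + 1))
    for i, unit_a in enumerate(a, start=1):
        current = [i]
        for j, unit_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (unit_a != unit_b)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def units_match(input_units: Sequence[str],
                stored_units: Sequence[str],
                max_distance: int = 0) -> bool:
    """Hypothetical comparator: with max_distance=0 only an exact match passes.

    A non-zero tolerance is an assumption, not something the text requires.
    """
    return edit_distance(input_units, stored_units) <= max_distance


# Illustrative phoneme labels for a stored password and a slightly
# misrecognized input utterance.
stored = ["S", "M", "IH", "TH"]
spoken = ["S", "M", "IH", "T"]
exact_only = units_match(spoken, stored)                     # one unit differs
with_tolerance = units_match(spoken, stored, max_distance=1)
```

Setting `max_distance` above zero trades security for robustness to recognition errors; whether that trade is acceptable would depend on the deployment.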
If the input sub-word speech units 214 match the stored password or security code, then an authorization message 226 indicating that the password is correct is provided to application 202 through the secure interface 222. Otherwise, the message 226 indicates that the password is not correct. As described, for the secure information, only sub-word speech units are recognized at application 202 and passed to the security system 204 over secure interface 222. Thus, to protect the security of the information, word-level recognition of the secure information is not available outside of the security system 204.
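One way to picture this boundary is a security-system object that keeps the stored sub-word speech units private and returns only a match or no-match message. The sketch below is an assumption-laden illustration: the class names, the `store` and `verify` methods, and the use of an in-memory dictionary in place of secure database 220 are all hypothetical, and exact tuple equality stands in for whatever comparison the comparator actually performs.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


@dataclass(frozen=True)
class AuthorizationMessage:
    """Stand-in for message 226: carries only the outcome, never the secret."""
    user_id: str
    authorized: bool


class SecuritySystem:
    """Hypothetical stand-in for security system 204."""

    def __init__(self) -> None:
        # Stand-in for secure database 220, keyed by user identification.
        self._stored_units: Dict[str, Tuple[str, ...]] = {}

    def store(self, user_id: str, units: Sequence[str]) -> None:
        self._stored_units[user_id] = tuple(units)

    def verify(self, user_id: str, input_units: Sequence[str]) -> AuthorizationMessage:
        stored = self._stored_units.get(user_id)
        # An unknown user and a wrong password produce the same negative result.
        authorized = stored is not None and stored == tuple(input_units)
        return AuthorizationMessage(user_id=user_id, authorized=authorized)


security = SecuritySystem()
security.store("acct-1", ["S", "M", "IH", "TH"])
message = security.verify("acct-1", ["S", "M", "IH", "TH"])
```

The application side sees only `message.authorized`; the stored units and any word-level form of the password never cross the interface in this sketch.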
In response to the prompt 208, the user 207 utters a response 210 as shown in block 234. The sub-word speech units in uttered response 210 are recognized by the sub-word speech unit recognition component 212 as illustrated by block 236. The sub-word speech units 214 are provided to the security system 204 through the secure interface 222 along with other identifying information 224 as illustrated by step 238. The security system 204 compares sub-word speech units 214 with secure data or information stored in store 220 for the identified user 207.
In particular, in the illustrated embodiment, speech unit comparator component 225 retrieves stored sub-word speech units for the secure data or information and compares the stored sub-word speech units to the input sub-word speech units 214 for the input utterance as illustrated by block 240. The stored sub-word speech units and the sub-word speech units for the input speech or utterance are compared to determine if the input utterance matches the stored data or password for the user 207 as illustrated by block 242.
If there is a match, then the security system 204 sends a message 226 to the application 202 verifying the match as shown in block 248 and the application 202 unlocks the task or information sought by user 207, as shown in block 250. For example, if the sub-word speech units for the input utterance match the sub-word speech units or phonemes for the stored information, the security system can unlock the application 202 so that the user can access otherwise locked information or perform a desired task or tasks.
If there is no match, then the security system 204 sends a message to the application 202 that there is no match as shown in block 252, and the application 202 remains locked and/or displays an error message to the user 207 as illustrated by block 254.
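The sequence of blocks 234 through 254 can be summarized, from the application's point of view, as a short control flow. The sketch below is illustrative only and rests on assumptions: `recognize_units` and `verify` are hypothetical callables standing in for recognition component 212 and the round trip to security system 204, and the returned strings are placeholders for whatever the application actually does when it unlocks or stays locked.

```python
from typing import Callable, List, Sequence

# Hypothetical interfaces: a recognizer maps audio to sub-word unit labels,
# and a verifier asks the security system whether those units match.
Recognizer = Callable[[bytes], List[str]]
Verifier = Callable[[str, Sequence[str]], bool]


def handle_secure_response(user_id: str,
                           audio: bytes,
                           recognize_units: Recognizer,
                           verify: Verifier) -> str:
    """Application-side flow after the user utters a response (block 234)."""
    units = recognize_units(audio)   # block 236: sub-word units only
    if verify(user_id, units):       # blocks 238-242: send units, compare
        return "unlocked"            # blocks 248-250: match, unlock task
    return "locked"                  # blocks 252-254: no match, stay locked


# Stubs used only to exercise the flow; a real system would call an
# actual recognizer and the secure interface.
def fake_recognizer(audio: bytes) -> List[str]:
    return ["S", "M", "IH", "TH"]


def fake_verifier(user_id: str, units: Sequence[str]) -> bool:
    return user_id == "acct-1" and list(units) == ["S", "M", "IH", "TH"]


state = handle_secure_response("acct-1", b"\x00\x01", fake_recognizer, fake_verifier)
```

Passing the recognizer and verifier in as parameters keeps the application logic independent of where recognition and comparison actually run.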
In the embodiments described, the secure information is never fully recognized outside of security system 204. Instead, only the sub-word speech units corresponding to the secure information are recognized and passed to the security system 204. Thus, word-level grammars for the secure information need not be available outside of the security system 204. For example, if the user is prompted to input the user's mother's maiden name to unlock a bank account of a telephonic banking system, the word level recognition is not available outside of the security system 204. Instead, the input utterance of the user's mother's maiden name is recognized as sub-word speech units, and the sub-word speech units are passed to the security system 204 to verify that the user's input utterance matches the data for the user's mother's maiden name stored in the secure database 220.
As illustrated in
As illustrated by block 288, the system determines if the user's response is non-audible (such as text) or speech. If the user's secure information is entered via the audio input device 260, sub-word speech units are recognized for the secure information entered by the user with the sub-word speech unit recognizer 268 as illustrated by block 290. If the user's response is entered as text input, sub-word speech units are generated for the text input or response by the sub-word speech unit generator 270 as illustrated by step 292. Once the sub-word speech units 271 are generated or recognized, the sub-word speech units 271 are stored in the secure database 220 under the user's identification or account, as illustrated by block 294.
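The branch between speech and text input during enrollment can be sketched as follows. This is a minimal illustration under stated assumptions: the toy lexicon, the function names, and the dictionary used in place of secure database 220 are hypothetical, and a real sub-word speech unit generator would use a pronunciation dictionary or letter-to-sound rules rather than a two-entry table.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical toy lexicon mapping words to ARPAbet-style phoneme labels.
TOY_LEXICON: Dict[str, List[str]] = {
    "smith": ["S", "M", "IH", "TH"],
    "jones": ["JH", "OW", "N", "Z"],
}


def generate_units_from_text(text: str) -> List[str]:
    """Stand-in for sub-word speech unit generator 270 (block 292)."""
    units: List[str] = []
    for word in text.lower().split():
        if word not in TOY_LEXICON:
            raise KeyError(f"no pronunciation available for {word!r}")
        units.extend(TOY_LEXICON[word])
    return units


def enroll(database: Dict[str, Tuple[str, ...]],
           user_id: str,
           *,
           text: Optional[str] = None,
           audio: Optional[bytes] = None,
           recognize_units: Optional[Callable[[bytes], List[str]]] = None
           ) -> Tuple[str, ...]:
    """Store sub-word units for a user's secure information (blocks 288-294)."""
    if audio is not None:
        if recognize_units is None:
            raise ValueError("speech input requires a recognizer")
        units = recognize_units(audio)          # block 290
    elif text is not None:
        units = generate_units_from_text(text)  # block 292
    else:
        raise ValueError("either text or audio must be provided")
    database[user_id] = tuple(units)            # block 294
    return database[user_id]


secure_db: Dict[str, Tuple[str, ...]] = {}
from_text = enroll(secure_db, "acct-1", text="Smith")
from_speech = enroll(secure_db, "acct-2", audio=b"\x00",
                     recognize_units=lambda a: ["JH", "OW", "N", "Z"])
```

Either path ends with the same representation in the store, so later verification does not need to know how the secure information was originally entered.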
Although the present invention has been described with reference to particular embodiments, workers skilled in the art will recognize that changes may be made in form and detail without departing from the spirit and scope of the invention.