This application relates to mobile telephone communication systems. In particular, it relates to methods for the real-time extraction and storage of information that is received over the voice channel and temporarily saved on a mobile telephone as an audio-buffering record.
In the last decade, mobile networking has become a mature technology combining capabilities that range from wireless telephony to basic computing and internet connectivity. The heart of such networking remains the mobile phone, which conventionally processes voice signals. However, the networking capabilities of mobile phones remain limited. In particular, mobile phones have not been adapted to support a real-time memo function. As a result, a mobile-phone user who receives, for example, a telephone number from the transmitting party during a phone conversation has to interrupt the flow of the conversation to write down the spoken number, or else must memorize it.
Phone numbers are likely the single most common datum shared over the phone, and they are very often shared while the user is distracted by other parallel tasks. In a situation such as driving, having to use both hands and eyes to find a pen and paper to record a spoken telephone number can be life-threatening. Yet the urge to do so is frequent: the whole purpose of using a mobile phone while driving is communication, and the spoken number is needed for further communication. Real-time capture of the telephone number in such a context can be considered critical, because otherwise the information is lost.
Kim, in U.S. Pat. No. 6,421,353, which is incorporated herein by reference in its entirety, suggested a particular implementation of a mobile radio phone capable of generally recording and reproducing data received over a voice channel. However, the problem of automatically extracting and recording, in real time, a telephone number transmitted by a communicating party without interrupting the phone conversation remains largely unsolved.
Embodiments of the present invention use speech recognition to realize a real-time memo function on a mobile phone or other mobile device for capturing and storing contact information, such as a telephone number, contained in recently processed audio data. A user input is received at a mobile device to capture contact information contained in recent audio data processed by the mobile device. Based on the received user input, speech in the recent audio data that corresponds to the contact information is identified. A speech recognition program running on a processor then extracts the contact information from the identified speech, and the contact information is stored in a memory storage of the mobile device.
Embodiments of the present invention also include a mobile device for wireless networking. An audio buffer buffers recent audio data to be processed by the mobile device. A user input element receives a user input from a user to process the recent audio data buffered on the audio buffer. A device processor uses a speech recognition program for: (i.) identifying speech data in the recent audio data that corresponds to spoken contact information, (ii.) extracting the spoken contact information from the speech data, and (iii.) storing the contact information in a memory storage.
Embodiments of the present invention also include a computer program product for capturing contact information on a mobile device. The computer program product includes a tangible storage medium having computer-readable program code thereon. The computer program product includes program code for receiving a user input to capture contact information contained in recent audio data processed by the mobile device, program code for identifying speech in the recent audio data corresponding to the contact information, program code for using speech recognition to extract the contact information from the identified speech, and program code for storing the contact information in a memory storage of the mobile device.
In further specific embodiments, the extracted contact information is provided to the user and a confirmation input is received from the user that the contact information has been correctly extracted. For example, the extracted contact information may be audibly and/or visually provided to the user for confirmation. The extracted contact information also may be provided to the user in response to a confirmation request input from the user. The user input may be received from a hardware button on the mobile device or a programmable user input element on the mobile device.
In some specific embodiments, extracting the contact information may include outputting to the user a success tone indicating that the contact information has been confidently extracted; for example, when an extraction confidence level exceeds a confidence threshold value. Extracting the contact information also may include outputting to the user a warning tone indicating that the contact information may not have been successfully extracted; for example, when an extraction confidence level fails to reach a confidence threshold value.
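The threshold comparison that selects between the success tone and the warning tone can be illustrated with a minimal sketch. The function name `select_feedback_tone`, the `Tone` enumeration, and the default threshold of 0.8 are hypothetical choices made for illustration only; the specification does not fix a threshold value, and the treatment of a confidence exactly equal to the threshold (warning tone) is likewise an assumption.

```python
from enum import Enum


class Tone(Enum):
    SUCCESS = "success"  # number confidently extracted
    WARNING = "warning"  # number may not have been extracted correctly


def select_feedback_tone(confidence: float, threshold: float = 0.8) -> Tone:
    """Choose the tone to play after an extraction attempt.

    Hypothetical helper: the 0.8 default is an illustrative assumption,
    not a value taken from the specification.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    # Success only when the confidence strictly exceeds the threshold;
    # a confidence at or below the threshold yields the warning tone.
    return Tone.SUCCESS if confidence > threshold else Tone.WARNING
```

For example, `select_feedback_tone(0.95)` returns `Tone.SUCCESS`, while `select_feedback_tone(0.5)` returns `Tone.WARNING`.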
The contact information may specifically include a telephone number, and the telephone number may be dialed in response to a dialing request from the user.
The embodiments of the present invention will become more apparent by referring to the following detailed description of the invention and the attached drawings in which:
Various embodiments of the present invention are directed to techniques for the real-time extraction and storage of contact information, such as a telephone number, that is spoken by the transmitting party to the user over the mobile device and temporarily stored on the mobile device as an audio-buffering record. For the purposes of this disclosure and the accompanying claims, real-time performance of a system is understood as performance that is subject to operational deadlines from a given event to the system's response to that event. For example, a real-time extraction of contact information (such as a telephone number, an address, or an e-mail address) from an audio buffer of a mobile device may be triggered by the user and executed concurrently with, and without interrupting, the mobile communication during which that information was recorded. Although specific embodiments of the invention are described for the extraction of a telephone number, the telephone number is used only as an example, and real-time extraction of any other predetermined type of information stored on a mobile device is within the scope of the invention.
The microprocessor 104 continuously and automatically buffers a predetermined amount of the recent audio data 102 on an audio buffer 106 of the mobile device 100, while simultaneously delivering the recent audio data to the user as audio output 108 through a speaker 110. The amount of audio data held in the buffer at any instant may be set in different ways, for example by keeping on record only the last N seconds of the phone conversation. In response to a capture request, which may be one of the user inputs 114, a speech recognition and extraction application 112 may then search this buffered N-second segment to extract a telephone number from the speech represented by the buffered audio data.
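The "last N seconds" behavior of the audio buffer 106 can be sketched as a fixed-capacity ring buffer. This is a minimal illustration assuming mono PCM samples delivered at a fixed sample rate; the class `RecentAudioBuffer` and its methods are hypothetical names, not part of the described device.

```python
from collections import deque
from typing import Deque, Iterable, List


class RecentAudioBuffer:
    """Keeps only the most recent N seconds of audio samples.

    Minimal sketch: assumes mono PCM samples at a fixed sample rate.
    Class and method names are hypothetical.
    """

    def __init__(self, seconds: float, sample_rate: int) -> None:
        self.capacity = int(seconds * sample_rate)
        # A deque with maxlen silently discards the oldest samples
        # once the capacity is reached.
        self._samples: Deque[int] = deque(maxlen=self.capacity)

    def push(self, frame: Iterable[int]) -> None:
        # Called for every frame that is also routed to the speaker.
        self._samples.extend(frame)

    def snapshot(self) -> List[int]:
        # Copy taken when the capture request arrives; the recognizer
        # searches the copy while live audio keeps flowing into the buffer.
        return list(self._samples)
```

Taking a snapshot rather than searching the live buffer is one way to let recognition run while the conversation continues, which matches the uninterrupted-call requirement; other designs (for example, double buffering) would also work.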
Various user inputs 114 may be implemented with the help of a user-input element, which may be, for example, a programmable element 116 or, in some cases, a hardware button 120 of the user interface (UI) 118 of the mobile device 100. Both the programmable element and the hardware button are specifically configured to accept the user input, in the form of the capture request, and to initiate processing of the recent audio data 102 held as buffered data on the audio buffer 106 in order to extract the telephone number. In embodiments where the hardware button 120 is used, it is preferably located on the side of the mobile device 100, as shown in
Referring to
The search for and identification of the speech segment can be carried out using techniques well known in the art, such as grammar-based speech end-pointing. Grammar-based end-pointing generally matches the elements of speech against an appropriate grammatical format. For a domestic telephone number, for example, such a format may be predetermined to limit the telephone number to ten digits, the first three of which designate an area code. An international phone number may additionally require a country code, which may comprise up to three digits and precedes the ten-digit number. An optional extension to the telephone number, which is typically introduced by cradling words (such as “extension”), can therefore also be readily recognized. It is understood, however, that the invention is not limited to telephone number formats. Specific embodiments of the invention may judiciously use various other formats corresponding to different types of well-structured contact information spoken to the user (such as a street address, an e-mail address, or a URL) to facilitate identification of the speech segment containing the sought-after spoken information.
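The telephone-number format described above can be illustrated with a simplified, text-level analogue of the grammar. Real grammar-based end-pointing operates on the speech itself; the sketch below assumes instead that a recognizer has already produced a word transcript, and applies the same format (ten digits, an optional leading country code of one to three digits, and an optional extension introduced by the word "extension"). The names `DIGIT_WORDS`, `PHONE_PATTERN`, `to_symbols`, and `parse_phone_number` are hypothetical.

```python
import re
from typing import Dict, Optional

# Spoken digit words mapped to digits ("oh" is a common spoken zero).
DIGIT_WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7",
    "eight": "8", "nine": "9",
}

# A run of 10 to 13 digits (optional 1-3 digit country code followed by a
# 3-digit area code and 7-digit local number), optionally followed by an
# extension marker "x" and its digits.
PHONE_PATTERN = re.compile(r"(?<!\d)(\d{10,13})(?!\d)(?:\s*x\s*(\d+))?")


def to_symbols(transcript: str) -> str:
    """Map digit words to digits; the cue word 'extension' becomes ' x '.

    Any other word becomes a space, which breaks the current digit run.
    """
    parts = []
    for word in transcript.lower().split():
        if word in DIGIT_WORDS:
            parts.append(DIGIT_WORDS[word])
        elif word == "extension":
            parts.append(" x ")
        else:
            parts.append(" ")
    return "".join(parts)


def parse_phone_number(transcript: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the first phone number matching the format, or None."""
    match = PHONE_PATTERN.search(to_symbols(transcript))
    if match is None:
        return None
    digits, extension = match.group(1), match.group(2)
    national = digits[-10:]  # area code + local number
    return {
        "country": digits[:-10] or None,  # leading digits, if any
        "area": national[:3],
        "local": national[3:],
        "extension": extension,
    }
```

For example, "call me at five five five one two three four five six seven extension four two" yields area code `555`, local number `1234567`, and extension `42`, whereas a three-digit fragment yields `None`.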
Referring, again, to
Embodiments of the invention warrant a minimum level of accuracy and confidence in telephone-number extraction and recognition, compared with conventional automatic speech-recognition technology. On one hand, the accuracy of speech recognition is inversely affected by the amount of buffered data that must be searched for the target information. To this end, in some embodiments, the buffer length may be determined and preset by, for example, configuring the buffer to store only the data received during the last N seconds of the telephone conversation. Such determination and presetting may be based on, for example, the statistically averaged time needed to speak a telephone number. In such an instance, the buffer length (N seconds) may be large enough for the user to easily capture a just-spoken telephone number, but not so large as to accommodate a large amount of additional, targetless audio data that might be misconstrued as part of a target utterance. This increases the accuracy of capturing the target information. On the other hand, once N has been preset for the system, the user's input increases the probability of speech-recognition success, because the input marks the end of, and therefore unambiguously, uniquely, and completely defines, the N-second segment of received audio data to be searched. Moreover, by optimizing the length N of the buffer 106, the time required to complete the capture and extraction processes is optimized as well, because the processor 104 does not have to handle excessive, targetless data.
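The sizing of N and the role of the capture input as the end marker of the searched segment can be made concrete with a short sketch. The rule "average speaking time multiplied by a safety margin" and the default margin of 1.5 are illustrative assumptions, since the specification only says N may be based on a statistically averaged duration; all function names are hypothetical.

```python
import math
from statistics import mean
from typing import Sequence, Tuple


def buffer_length_seconds(durations: Sequence[float], margin: float = 1.5) -> float:
    """Hypothetical rule: average time to speak a number, scaled by a margin."""
    if not durations:
        raise ValueError("need at least one measured duration")
    return mean(durations) * margin


def buffer_capacity_samples(seconds: float, sample_rate: int) -> int:
    # Round up so the buffer never holds less than N seconds of audio.
    return math.ceil(seconds * sample_rate)


def capture_window(press_time: float, n_seconds: float) -> Tuple[float, float]:
    # The capture input marks the END of the searched segment; the window
    # extends N seconds back but cannot start before the call began (t = 0).
    return (max(0.0, press_time - n_seconds), press_time)
```

With measured speaking times of 4, 5, and 6 seconds, this rule gives N = 7.5 s, or 60,000 samples at 8 kHz; a capture request 30 s into the call then defines the segment from 22.5 s to 30 s.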
In addition, to maximize the accuracy of recognizing and extracting the spoken telephone number in specific embodiments, the grammar-based speech end-pointing algorithm of the invention may be judiciously designed to statistically incorporate the existing history of telephone connections established with a particular mobile device. For example, a list of contacts saved in the memory of the device, containing phone numbers and other information previously used to place calls or extracted from previously received calls, may be used to bias the end-pointing algorithm toward the recognition hypothesis that has the higher probability of success without user intervention. As another example, if many of the contacts in the contact list have associated e-mail addresses from a particular domain (such as yahoo.com), the recognition process may be weighted or biased to prefer new contacts that are associated with the same domain.
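One way to realize the domain-based bias described above is to add a log-prior, estimated from the contact list, to each recognition hypothesis's score and then re-rank. This is a sketch of that general rescoring technique under stated assumptions, not the specific biasing method of the invention: it assumes the recognizer returns (text, log-score) pairs, uses add-one smoothing so unseen domains are penalized but never excluded, and the names `domain_of` and `rerank` and the `weight` parameter are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def domain_of(address: str) -> str:
    # Part after the last "@", lower-cased.
    return address.rsplit("@", 1)[-1].lower()


def rerank(hypotheses: List[Tuple[str, float]],
           contact_emails: List[str],
           weight: float = 1.0) -> List[Tuple[str, float]]:
    """Bias e-mail hypotheses toward domains common in the contact list.

    hypotheses: (address, recognizer log-score) pairs (assumed format).
    Returns the pairs with adjusted scores, best first.
    """
    counts = Counter(domain_of(e) for e in contact_emails)
    total = sum(counts.values())
    rescored = []
    for text, score in hypotheses:
        # Add-one smoothed domain probability estimated from the contacts.
        prior = (counts[domain_of(text)] + 1) / (total + len(counts) + 1)
        rescored.append((text, score + weight * math.log(prior)))
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)
```

With two yahoo.com contacts and one other, a hypothesis ending in "yahoo.com" overtakes an acoustically slightly stronger "yahoo.co" hypothesis; setting `weight` to zero restores the recognizer's original ordering.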
After the results of processing the spoken telephone number 304 from the recent audio data stored on the audio buffer have been announced to the user, the mobile device 100 switches into one of two idle states, 220 or 222. These idle states ensure that a live mobile phone conversation between the user and the transmitting party continues uninterrupted or, alternatively, that the voicemail interface remains uncompromised. While idling in state 220 or 222, the mobile device 100 may wait for a user input that directs its further operation. For example, the user may either request a re-capture 224 of the spoken phone number at step 226 (in case the extracted number was not recognized at step 214) or otherwise request confirmation of the recognized phone number at step 228. Either request may be communicated to the mobile device 100 through the user input element of the UI 118 after the live mobile phone conversation or voicemail session has been completed, by operating either the programmable element 116 or the hardware button 120, which is specifically configured to accept both re-capture and confirmation requests.
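The handling of the two requests accepted in the idle states can be sketched as a small dispatcher. This is a simplified illustration under assumptions not stated in the text: the extraction and announcement steps are passed in as callables, a confirmation simply re-announces the current result, and the names `IdleRequest` and `on_idle_request` are hypothetical.

```python
from enum import Enum, auto
from typing import Callable, Optional


class IdleRequest(Enum):
    RECAPTURE = auto()  # search the buffered audio again
    CONFIRM = auto()    # present the recognized number again


def on_idle_request(request: IdleRequest,
                    extracted: Optional[str],
                    recapture: Callable[[], Optional[str]],
                    announce: Callable[[str], None]) -> Optional[str]:
    """Hypothetical dispatcher for the idle states.

    Nothing here touches the live call or voicemail session; the device
    only re-runs extraction and/or announces the current result.
    """
    if request is IdleRequest.RECAPTURE:
        extracted = recapture()
    if extracted is not None:
        announce(extracted)
    return extracted
```

For instance, a confirmation request with an already extracted number simply announces that number again, while a re-capture request replaces the previous result with whatever the new extraction returns.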
At step 230 and as shown in
As described, embodiments of the invention allow telephone numbers exchanged by voice over the mobile device to be saved and reused with minimal intervention by the user. The user's attention is required only to mark the relevant buffered audio data to be searched, to direct further operation of the idling mobile device, and to dispose of the correctly extracted telephone number appropriately. Specifically, the user may provide a capture input that initiates the extraction and recognition of the spoken telephone number; a re-capture or confirmation request in response to the results of extraction; and a request to permanently store the extracted number in the device memory, to dial it, or to otherwise handle it. The user is therefore minimally distracted during real-time capture of the spoken number. The embodiments can be easily implemented as a combination of a computer program product and hardware, compatible with and integrated within existing mass-producible mobile phone devices.
It is understood that operation of the embodiments of the invention requires programmable computer instructions, configuration, and support embodying all or part of the functionality previously described herein with respect to the invention and locally loaded onto the mobile device 100. Those skilled in the art should appreciate that such computer instructions and support can be written in a number of programming languages for use with many computer architectures or operating systems. For example, some embodiments may be implemented entirely as software (e.g., a computer program product) in a procedural programming language (e.g., “C”) or an object-oriented programming language (e.g., “C++”). Furthermore, such instructions may be stored in any memory device, such as semiconductor, magnetic, optical or other memory devices, and may be either transmitted to the mobile device 100 using any communications technology (such as optical, infrared, microwave, or other transmission technologies) or embedded in it in the form of a programmable hardware chip with a computer program product fixed in it. It is expected that such a computer program product may be distributed as a removable storage medium with accompanying printed or electronic documentation (e.g., shrink-wrapped software), preloaded on a mobile device 100 (e.g., on a mobile device ROM or fixed disk), or distributed from a server or electronic bulletin board over the network (e.g., the Internet or World Wide Web). Of course, some embodiments of the invention may be implemented as a combination of both software and hardware. Still other alternative embodiments of the invention can be implemented entirely as pre-programmed hardware elements.
Although various exemplary embodiments of the invention have been disclosed, it should be apparent to those skilled in the art that various changes and modifications can be made which will achieve some of the advantages of the invention without departing from the true scope of the invention.
This application claims priority from U.S. Provisional Patent Application No. 61/050,281, filed on May 5, 2008, the disclosure of which is incorporated herein by reference in its entirety.
Application Number | Filing Date | Country
---|---|---
61050281 | May 2008 | US