This invention relates generally to smart phones and other portable electronic devices and, in particular, to such devices with self-training, lip-reading, and eye-tracking capabilities.
There are many instances in which it would be advantageous for a smart phone or other portable electronic device to have a speech-to-text capability. For example, a user may wish to use the device as a dictation instrument, or to convert spoken words into text so that a communication can be sent as a text message rather than as a voice transmission.
One problem with speech-to-text systems is that they are inconvenient to train. Speaker-independent algorithms are more challenging to implement than speaker-dependent algorithms. One advantage of a cell phone or other personal electronic device is that it is typically used by a single person, so speaker-dependent training would suffice in almost all cases.
In training a speech-to-text system, such as Dragon Speak or other such programs, one has to sit down and go through an initial training program which can be quite lengthy and cumbersome. Any method which could alleviate this burden would be desirable.
Another issue with portable telephone use has to do with etiquette. When people use their phones in restaurants, theaters, and similar settings, their voices often disturb others around them, which can cause irritation. At the same time, there are situations, such as emergencies, when a user may need to use a cell phone or other portable electronic device in public. Accordingly, any system or method that lets a user communicate by phone in public without disturbing others would also be welcome.
Furthermore, many smart phones have user-facing video cameras, and it would be advantageous to use such a camera for purposes other than video conferencing, such as eye tracking.
A method of training a smartphone or other portable electronic device having a microphone, a display, a keyboard, an audio output and a memory, comprising the steps of: receiving words spoken by a user through the microphone; utilizing a speech-to-text algorithm to convert the spoken words into raw text; displaying the raw text on the display; correcting errors in the text using the keyboard; storing, in the memory, data representative of the spoken words in conjunction with the corrected text; and using the stored information to train the device so as to increase the likelihood that the corrected text will be generated when the same word or words are spoken in the future. The spoken words may form part of a phone conversation, with the raw text being displayed whether or not the user wishes to correct it. The method may further include the step of suggesting words for the user to speak, using the display or the audio output.
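As a rough illustration of the correct-and-store training loop, the following Python sketch keeps a per-word record of the user's keyboard corrections and reuses it on later recognizer output. The class `CorrectionMemory`, its methods, and the word-for-word alignment are assumptions made for illustration. A production system would more likely adapt the recognizer's acoustic and language models using the stored audio instead of remapping words after the fact.

```python
from __future__ import annotations

from collections import Counter, defaultdict


class CorrectionMemory:
    """Hypothetical store pairing raw speech-to-text output with user corrections.

    Simplification: instead of adapting acoustic or language models, this
    only remaps words the recognizer has previously been corrected on.
    """

    def __init__(self, min_count: int = 1) -> None:
        # raw word (lower-cased) -> Counter of the corrected words typed for it
        self._corrections: defaultdict[str, Counter[str]] = defaultdict(Counter)
        self.min_count = min_count

    def record(self, raw_text: str, corrected_text: str) -> None:
        """Store a (raw, corrected) pair, assuming word-for-word alignment."""
        raw_words = raw_text.split()
        fixed_words = corrected_text.split()
        if len(raw_words) != len(fixed_words):
            return  # insertions/deletions are not handled in this sketch
        for raw, fixed in zip(raw_words, fixed_words):
            self._corrections[raw.lower()][fixed] += 1

    def apply(self, raw_text: str) -> str:
        """Replace each raw word with its most frequent past correction."""
        out = []
        for word in raw_text.split():
            counts = self._corrections.get(word.lower())
            if counts:
                best, n = counts.most_common(1)[0]
                if n >= self.min_count:
                    out.append(best)
                    continue
            out.append(word)
        return " ".join(out)
```

For example, after `mem.record("meet me at the bark", "meet me at the park")`, a later raw result of `"see you at the bark"` is rewritten to `"see you at the park"`.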
A method of training a smartphone or other portable electronic device having a microphone, a camera and a memory, comprising the steps of: watching a user's lips with the camera as they speak or mouth words; storing, in the memory, data representative of the words in conjunction with the user's lip movements; and using the stored information to generate the words based upon future lip movements by a user. The step of generating the words based upon future lip movements may include synthesizing speech representative of the words and, optionally, transmitting the synthesized speech to a listener as part of a phone conversation.
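One simple way to realize "store lip movements with words, then generate words from future lip movements" is a nearest-neighbour classifier. In the Python sketch below, `LipReader` and its methods are hypothetical. It assumes that each utterance's lip movements have already been reduced to a fixed-length feature vector, and that the known word for each training example comes from the speech-to-text output or from a word the device prompted the user to say.

```python
from __future__ import annotations

import math
from typing import Optional


class LipReader:
    """Hypothetical nearest-neighbour lip reader.

    Assumes each utterance's lip movements have already been reduced to a
    fixed-length feature vector (for example, mouth width and height sampled
    at fixed points in time); that feature extraction is not shown.
    """

    def __init__(self) -> None:
        self.examples: list[tuple[list[float], str]] = []

    def train(self, features: list[float], word: str) -> None:
        # Store the observed lip-movement features together with the known word.
        self.examples.append((features, word))

    def read(self, features: list[float]) -> Optional[str]:
        # Return the word whose stored example is closest in Euclidean distance.
        if not self.examples:
            return None
        _, word = min(self.examples, key=lambda ex: math.dist(ex[0], features))
        return word
```

A practical system would use a sequence model over video frames. The nearest-neighbour rule is shown only because it makes the "store, then match" idea explicit.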
The method may include the steps of training the device to learn the user's voice by storing phonemes or other units of the user's speech. The step of generating the words based upon future lip movements may include synthesizing speech representative of the words in the user's voice using the phonemes or other units of the user's speech, and transmitting the synthesized user's speech to a listener as part of a phone conversation, for example.
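Generating speech "in the user's voice using the phonemes or other units of the user's speech" can be pictured as simple concatenative synthesis. The sketch below is an assumption-laden illustration: `VoiceBank`, the pronunciation dictionary (`lexicon`), and the storage of units as raw PCM sample lists are all hypothetical. Real concatenative synthesizers also smooth the joins between units and select among many recordings of each unit.

```python
from __future__ import annotations


class VoiceBank:
    """Hypothetical store of the user's recorded speech units (phonemes).

    Units are kept as lists of PCM samples; a pronunciation dictionary maps
    each word to its phoneme sequence. Both are illustrative assumptions.
    """

    def __init__(self, lexicon: dict[str, list[str]]) -> None:
        self.lexicon = lexicon
        self.units: dict[str, list[int]] = {}

    def store_unit(self, phoneme: str, samples: list[int]) -> None:
        # Keep the most recent recording of each phoneme from the user.
        self.units[phoneme] = samples

    def synthesize(self, words: list[str], gap: int = 0) -> list[int]:
        """Concatenate stored units for each word, inserting `gap` silent samples between words."""
        out: list[int] = []
        for i, word in enumerate(words):
            if i:
                out.extend([0] * gap)
            for ph in self.lexicon[word]:
                out.extend(self.units[ph])  # KeyError if the phoneme was never recorded
        return out
```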
A method of training a smartphone or other portable electronic device having a keyboard, a display, a camera and a memory, comprising the steps of: tracking a user's eyes with the camera as they enter text using the keyboard; storing, in the memory, data representative of the text in conjunction with the user's eye movements; and using the stored information to move a pointer on the display, or to control the device in some other manner, based upon future eye movements by a user. The method may further include the steps of determining, based upon the user's eye movements, whether the user is texting while driving, and performing a function if it is determined that the user is texting while driving.
A method of determining if the user of a smartphone or other portable electronic device is texting while driving includes the step of providing a smartphone or other portable electronic device with a keypad or touch screen to enter text, a display to show the text entered or received, a video camera having a field of view that includes the user of the device, and an eye-tracking application operative to use the video camera to track the eye movements of the user while text is being entered or read on the display.
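The training idea is that each keystroke reveals where the user is probably looking. Pairing the pressed key's screen position with the measured eye position therefore yields calibration data for free. The Python sketch below fits an independent linear map per axis by ordinary least squares. `GazeCalibration`, `fit_axis`, the normalized pupil coordinates, and the assumption that the user looks at each key while pressing it are all hypothetical choices for illustration.

```python
from __future__ import annotations


def fit_axis(eye: list[float], screen: list[float]) -> tuple[float, float]:
    """Ordinary least squares for screen = a * eye + b on one axis."""
    n = len(eye)
    mean_e = sum(eye) / n
    mean_s = sum(screen) / n
    var = sum((e - mean_e) ** 2 for e in eye)
    if var == 0:
        raise ValueError("need samples with varied eye positions")
    cov = sum((e - mean_e) * (s - mean_s) for e, s in zip(eye, screen))
    a = cov / var
    return a, mean_s - a * mean_e


class GazeCalibration:
    """Learns eye position -> screen position from keystrokes (hypothetical).

    Assumes an eye tracker already yields a normalized pupil position per
    frame and that the user looks at each key as they press it.
    """

    def __init__(self) -> None:
        self.samples: list[tuple[float, float, float, float]] = []
        self.x_fit = self.y_fit = (1.0, 0.0)  # identity until fitted

    def add_keystroke(self, pupil: tuple[float, float], key_xy: tuple[float, float]) -> None:
        self.samples.append((*pupil, *key_xy))

    def fit(self) -> None:
        if len(self.samples) < 2:
            raise ValueError("need at least two keystrokes")
        ex, ey, sx, sy = zip(*self.samples)
        self.x_fit = fit_axis(list(ex), list(sx))
        self.y_fit = fit_axis(list(ey), list(sy))

    def cursor(self, pupil: tuple[float, float]) -> tuple[float, float]:
        """Predict where on the screen the user is looking, for pointer control."""
        (ax, bx), (ay, by) = self.x_fit, self.y_fit
        return ax * pupil[0] + bx, ay * pupil[1] + by
```

A linear map per axis is the simplest possible model. Real gaze trackers typically use higher-order polynomial or head-pose-aware models.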
If it is determined, through GPS or other methods, that the user is moving at a rate of speed associated with motor vehicle travel, a determination is made whether the user is engaged in a text-messaging session, such as the user entering a text message or the device receiving a text message, and whether the user looks away from the device a predetermined number of times within a predetermined interval of time during the text-messaging session. If both criteria are satisfied, a determination is made that the user is texting while driving, and an action is initiated in response.
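The decision rule above combines a speed gate, an active text session, and a count of look-aways inside a sliding time window. The Python sketch below shows one way to structure it. The class and method names are hypothetical. The default values (15 mph, a 10-second window taken from the example given later in this description, and three glances) are assumptions standing in for the unspecified "predetermined" values.

```python
from __future__ import annotations

from collections import deque


class TextingWhileDrivingDetector:
    """Sketch of the speed + session + glance-count test (hypothetical names).

    speed_mph, window_s and max_glances stand in for the "rate of speed
    associated with motor vehicle travel", the "predetermined interval" and
    the "predetermined number of times"; the default values are assumptions.
    """

    def __init__(self, speed_mph: float = 15.0, window_s: float = 10.0,
                 max_glances: int = 3) -> None:
        self.speed_mph = speed_mph
        self.window_s = window_s
        self.max_glances = max_glances
        self._glances: deque[float] = deque()  # timestamps (s) of look-aways

    def record_glance_away(self, t: float) -> None:
        """Called by the eye tracker whenever the user's gaze leaves the screen."""
        self._glances.append(t)

    def is_texting_while_driving(self, now: float, speed_mph: float,
                                 in_text_session: bool) -> bool:
        # Forget glances that fall outside the sliding time window.
        while self._glances and now - self._glances[0] > self.window_s:
            self._glances.popleft()
        if speed_mph < self.speed_mph or not in_text_session:
            return False
        return len(self._glances) >= self.max_glances
```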
The method may include the step of determining if the user is looking away from the device in the middle of entering or reading a sentence, or repeatedly looking away from the device at a particular angle indicative of needing to watch the road while texting. The method may include the step of providing a device with a forward-looking camera and, if the camera shows oncoming traffic, deciding that the user is texting while driving if the user's glances away from the device are related to oncoming traffic.
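The two refinements in the preceding paragraph, glances that interrupt a sentence and glances that repeatedly land at the same angle, can be expressed as a small heuristic. In the sketch below, the `Glance` record, the sentence-ending test, and the thresholds (three glances, a 10-degree spread) are illustrative assumptions rather than values from this description.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import pstdev


@dataclass
class Glance:
    angle_deg: float     # horizontal gaze angle while looking away (assumed convention)
    mid_sentence: bool   # True if the user looked away before finishing the sentence


def is_mid_sentence(typed_so_far: str) -> bool:
    """Treat text that does not end in '.', '!' or '?' as an unfinished sentence."""
    text = typed_so_far.rstrip()
    return bool(text) and text[-1] not in ".!?"


def looks_like_road_watching(glances: list[Glance], min_count: int = 3,
                             max_spread_deg: float = 10.0) -> bool:
    """Several mid-sentence glances clustered around one angle suggest the
    user is watching the road; thresholds are illustrative assumptions."""
    mid = [g for g in glances if g.mid_sentence]
    if len(mid) < min_count:
        return False
    return pstdev([g.angle_deg for g in mid]) <= max_spread_deg
```

The population standard deviation is used here as a simple measure of how tightly the glance angles cluster. Any comparable spread statistic would serve.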
The action initiated in response to the determination that the user is texting while driving may be to: terminate or delay texting operations until certain criteria are met, such as the vehicle's speed falling below 10 MPH or the vehicle stopping; issue a text or audio warning to the user of the device; issue a text or audio warning to the recipient(s) of the text message; and/or record, for law enforcement or insurance purposes, the user's eye movements or, if the device has a forward-looking camera, the scene in front of the vehicle.
This invention broadly involves methods and apparatus enabling the user of a smart phone or other portable electronic device to train the device to convert speech into text and, in one embodiment, to convert lip movement into speech or text. The training is carried out gradually, through an interface that may even be enjoyable to use, resulting in a sophisticated electronic device with numerous capabilities not currently available. In an alternative embodiment, the system and method include eye-tracking capabilities. In all embodiments described herein, “keyboard” or “keypad” should be taken to include physical buttons or touch screens.
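The "delay texting operations until the vehicle's speed falls below 10 MPH" option can be pictured as a gated outbox. In the Python sketch below, `OutboxGate` and its methods are hypothetical. The 10 MPH threshold comes from the example above, and a full stop is simply the case of zero speed. Warnings to recipients and recording are omitted for brevity.

```python
from __future__ import annotations

RESUME_SPEED_MPH = 10.0  # example resumption threshold from the text


class OutboxGate:
    """Hypothetical outbox that delays sends while texting while driving is flagged."""

    def __init__(self, resume_speed_mph: float = RESUME_SPEED_MPH) -> None:
        self.resume_speed_mph = resume_speed_mph
        self.held: list[str] = []
        self.sent: list[str] = []
        self.warnings: list[str] = []

    def submit(self, message: str, speed_mph: float, flagged: bool) -> None:
        if flagged and speed_mph >= self.resume_speed_mph:
            # Delay the message and warn the user of the device.
            self.held.append(message)
            self.warnings.append("Texting while driving detected; message delayed.")
        else:
            self.sent.append(message)

    def on_speed_update(self, speed_mph: float) -> None:
        # Release delayed messages once the vehicle slows below the threshold or stops.
        if speed_mph < self.resume_speed_mph and self.held:
            self.sent.extend(self.held)
            self.held.clear()
```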
In accordance with the speech-to-text conversion aspect of the invention,
However, as shown in
In one mode of operation, the device 100 would continuously convert the words spoken by a user into text, whether or not the user chooses to correct the text. If the text is always generated, it may actually be enjoyable for a user to “see” what they said and go in and correct it, particularly to obtain a more sophisticated and accurate result. For example, during “down times,” such as while sitting in airports, a user might enjoy playing with the device and training it in an off-line fashion, that is, when they are not talking to another individual.
In accordance with a different aspect of the invention,
It will be appreciated that if the user holds the smart phone or other device away from their face, any camera oriented toward the user may be used for lip reading. For example, if the device is being used as a walkie-talkie or in speaker-phone mode, a camera at the upper end of the device may be used. In addition, particularly in this configuration, the device may present words for the user to say, with the device automatically interpreting the user's lip movements. This may be done whether the user actually enunciates the words aloud or simply moves their lips without sound. The words presented to the user may be randomly selected or, more preferably, chosen to advance the lip-reading capabilities. That is, words may be selected that exercise particular lip movements, and such words may be repeated over time to enhance the learning process.
The advantages of a smart phone or other portable electronic device having a lip-reading function are many. Background noise, such as wind, and other conditions often make it difficult for the device to pick up the user's voice. In such situations, a trained system may rely on lip movements entirely, or it may make intelligent decisions based on both the lip movements and whatever sounds the device can interpret, thereby processing or deriving audio that is much more intelligible to the listening party.
Another advantage is that if a person using the device suddenly finds themselves in a situation where they need to speak quietly, they can switch from their normal speaking voice to a silent, lip-movement-only mode of operation. The system will then automatically recognize that the person is still “speaking” but does not want to use a loud voice. In such situations, the device will access the memory used to train the system and automatically generate the user's voice for transmission to the receiving end. Again, as with background noise, the user need not go from a loud speaking voice to complete silence but may instead whisper, with the device making intelligent decisions about what the person is attempting to say and generating a voice signal corresponding to that intention.
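Choosing prompt words "that exercise particular lip movements" can be done by favouring words whose mouth shapes (visemes) have been practised least. The Python sketch below uses a small, hypothetical word-to-viseme table and a simple novelty score. Because the scores of practised shapes drop as their counts rise, a word is naturally prompted again once the other words have caught up, which echoes the repetition over time mentioned above.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical word -> viseme (visible mouth shape) table; a real system
# would derive visemes from a pronunciation dictionary.
VISEMES: dict[str, list[str]] = {
    "map": ["bilabial", "open", "bilabial"],
    "five": ["labiodental", "wide", "labiodental"],
    "wow": ["rounded", "open-round", "rounded"],
    "tea": ["tongue-teeth", "spread"],
}


def next_prompt(practice: Counter, visemes: dict[str, list[str]] = VISEMES) -> str:
    """Pick the word whose mouth shapes have been practised least so far."""
    def novelty(word: str) -> float:
        shapes = set(visemes[word])
        # Rarely practised shapes contribute more; ties go to the first word listed.
        return sum(1.0 / (1 + practice[v]) for v in shapes)
    return max(visemes, key=novelty)
```

After each prompt, the caller would run `practice.update(VISEMES[word])` so that the next choice shifts toward the shapes that are still under-practised.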
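The "intelligent decisions regarding the lip movements and those sounds which the device can interpret" can be illustrated with late fusion. In this approach, the audio recognizer and the lip reader each score candidate words, and the audio is trusted less as the noise level rises. The function name `fuse`, the log-linear weighting, and the 0 to 20 dB ramp for the signal-to-noise ratio (SNR) in the Python sketch below are assumptions chosen for illustration, not a method stated in this description.

```python
from __future__ import annotations

import math


def fuse(audio_scores: dict[str, float], lip_scores: dict[str, float],
         snr_db: float) -> str:
    """Late fusion of per-word probabilities from audio and lip reading.

    The audio weight rises linearly from 0 to 1 as SNR goes from 0 dB to 20 dB
    (an assumed, illustrative ramp), so in heavy noise the lip reader dominates.
    """
    w_audio = min(max(snr_db / 20.0, 0.0), 1.0)
    eps = 1e-9  # avoids log(0) for words that one modality did not propose
    candidates = set(audio_scores) | set(lip_scores)

    def score(word: str) -> float:
        return (w_audio * math.log(audio_scores.get(word, 0.0) + eps)
                + (1.0 - w_audio) * math.log(lip_scores.get(word, 0.0) + eps))

    return max(candidates, key=score)
```

With audio scores of `{"ship": 0.6, "chip": 0.4}` and lip scores of `{"ship": 0.2, "chip": 0.8}`, a clean 30 dB signal selects `"ship"`, while a 0 dB signal (wind noise) defers to the lips and selects `"chip"`.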
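Automatically recognizing that the user is still "speaking" after dropping to a whisper or to silent mouthing amounts to comparing the microphone level with whether the camera sees lip movement. The Python sketch below classifies each audio frame accordingly. The thresholds assume samples normalized to the range [-1, 1] and are illustrative only. In the "whisper" and "silent" modes, the device would fall back on the trained lip-reading and voice-synthesis path.

```python
from __future__ import annotations

import math


def rms(samples: list[float]) -> float:
    """Root-mean-square level of one audio frame (0.0 for an empty frame)."""
    return math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0


def speaking_mode(samples: list[float], lips_moving: bool,
                  whisper_rms: float = 0.02, voice_rms: float = 0.1) -> str:
    """Classify a frame as 'voice', 'whisper', 'silent' (lip-only) or 'idle'.

    Thresholds assume samples normalized to [-1, 1] and are illustrative.
    """
    level = rms(samples)
    if level >= voice_rms:
        return "voice"    # normal speech: transmit the microphone signal
    if not lips_moving:
        return "idle"     # quiet and no lip movement: user is not speaking
    return "whisper" if level >= whisper_rms else "silent"
```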
A further embodiment of the invention involves eye tracking. This capability would preferably be carried out when the user is texting with the smart phone or other device moved away from their face enabling the camera(s) to obtain a view of the user's eyes. In one mode, the camera(s) watch the user's eyes as they are entering words, with the device recording the user's gaze in relation to the letter or word being entered on the screen. Although such movements may be physically subtle, it is anticipated that the resolution of smart phone cameras will increase to gigapixels in the coming years, rendering such tracking capabilities highly practical.
In the text-entry mode of tracking, the relationship between the user's eyes (gaze) and the precise location on the screen will be learned and saved. This would facilitate various modes of operation, including the ability to move a cursor on the screen without touching it. Such a capability would be useful in a hands-free mode of operation and, if the device were programmed to recognize its common user(s), would also enhance security during log-on, for example.
In another eye-tracking mode of operation, the device monitors the user's eye movements while texting to detect behaviors that suggest texting while driving, such as the following:
1) Does the user glance away from the keypad or display screen of the device more often than they would if they were not driving? For example, in a 10-second interval while text is being entered, does the user look away from the keypad or display screen multiple times? If so, the user may be texting while driving.
2) Does the user glance away from the keypad or display screen of the device at times requiring their attention elsewhere? For example, does the user glance away from the keypad or display screen and stop texting in the middle of a sentence? Do they do this multiple times during one sentence or one message? If so, the user may be texting while driving.
3) Does the user look away from the keypad or display screen of the device multiple times at a particular angle indicative of needing to watch the road? Referring to
If the device has a forward-looking camera, additional tests may be performed. If the camera shows oncoming traffic, and the user's glances away from the portable electronic device are related to that traffic, the user may be texting while driving. For example, if the user looks away from the device as oncoming traffic gets closer to the user's vehicle, this would almost certainly indicate texting while driving. Note that if the device can sense oncoming traffic, a speed sensor in the device may not be necessary.
If one or more of the above tests indicate texting while driving, the device may take one or more of several actions:
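Whether glances "are related to" oncoming traffic can be estimated by measuring how often a glance begins shortly after the forward camera reports an approaching vehicle. In the Python sketch below, the function name, the timestamp inputs, and the 1.5-second lag are assumptions. A caller might, for instance, treat a fraction above one half as strong evidence of texting while driving, though that cut-off is likewise illustrative.

```python
from __future__ import annotations

from bisect import bisect_right


def traffic_correlated_fraction(glance_times: list[float],
                                traffic_times: list[float],
                                lag_s: float = 1.5) -> float:
    """Fraction of glances that begin within `lag_s` seconds after the forward
    camera reports approaching traffic. lag_s is an assumed value."""
    if not glance_times:
        return 0.0
    events = sorted(traffic_times)
    hits = 0
    for g in glance_times:
        # Index just past the latest traffic event at or before this glance.
        i = bisect_right(events, g)
        if i and g - events[i - 1] <= lag_s:
            hits += 1
    return hits / len(glance_times)
```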
This application claims priority from U.S. Provisional Patent Application Ser. No. 61/658,558, filed Jun. 12, 2012, the entire content of which is incorporated herein by reference.
Number | Date | Country
---|---|---
61658558 | Jun 2012 | US