1. Field
The disclosed embodiments generally relate to user interfaces and, more particularly, to user interfaces including speech recognition.
2. Brief Description of Related Developments
Automatic speech recognition can be used in a variety of devices to enter text electronically by dictating the desired text. Depending on, for example, the speech recognition algorithm, the speaker's voice and the environmental conditions surrounding the speaker, the text recognition accuracy can range anywhere from zero to one-hundred percent for any given word, sentence or paragraph. The errors introduced in the speech recognition process generally take the form of, for example, wrong words, extra words or missing words in the resulting text. While the dictation of the desired text may be reasonably fast and effortless, the correction of the incorrect words in the resulting text is generally time consuming and tedious.
Generally, the correction of incorrect text occurs one word at a time, one character at a time or by correcting a string of adjacent text (e.g. text arranged one after another in a continuous string, such as the words of a sentence). Generally, the corrections are made manually (e.g. through a keyboard or other physical input) by retyping the incorrect text, by selecting a better candidate for the intended text from a menu, or through speech recognition by re-dictating the incorrect text. For non-adjacent text, the correction algorithm generally must be restarted for each non-adjacent piece of text, which makes correction of non-adjacent text repetitive, tedious and time consuming.
It would be advantageous to quickly and efficiently correct non-adjacent pieces of text that are input with automatic speech recognition.
The aspects of the disclosed embodiments are directed to a method including detecting a selection of a plurality of erroneous words in text presented on a display of a device, receiving, in an automatic speech recognition system, sequentially dictated corrections for the selected erroneous words in a single, continuous operation, where each dictated correction corresponds to at least one of the selected erroneous words, and replacing the plurality of erroneous words with one or more corresponding words of the dictated corrections, where each erroneous word is matched with the one or more corresponding words of the dictated corrections in an order in which the erroneous words appear according to a reading direction of the text.
In another aspect, the disclosed embodiments are directed to a computer program product stored in a memory. The computer program product includes computer readable program code embodied in a computer readable medium for detecting a selection of a plurality of erroneous words in text presented on a display of a device, receiving, in an automatic speech recognition system, sequentially dictated corrections for the selected erroneous words in a single, continuous operation, where each dictated correction corresponds to at least one of the selected erroneous words, and replacing the plurality of erroneous words with one or more corresponding words of the dictated corrections, where each erroneous word is matched with the one or more corresponding words of the dictated corrections in an order in which the erroneous words appear according to a reading direction of the text.
Other aspects of the disclosed embodiments are directed to an apparatus including a display and a processor configured to detect a selection of a plurality of erroneous words in text presented on the display, receive, through an automatic speech recognition module, sequentially dictated corrections for the selected erroneous words in a single, continuous operation where each dictated correction corresponds to at least one of the selected erroneous words, and replace the plurality of erroneous words with one or more corresponding words of the dictated corrections where each erroneous word is matched with the one or more corresponding words of the dictated corrections in an order the erroneous words appear according to a reading direction of the text.
Still other aspects of the disclosed embodiments are directed to a user interface including a display configured to display computer readable text, at least one input device configured to receive sequentially dictated corrections through automatic speech recognition for replacing a plurality of selected erroneous words in a single, continuous operation where each dictated correction corresponds to at least one of the selected erroneous words, and a processor being configured to detect a selection of the plurality of erroneous words in the computer readable text presented on the display, and replace the plurality of erroneous words with one or more corresponding words of the dictated corrections where each erroneous word is matched with the one or more corresponding words of the dictated corrections in an order the erroneous words appear according to a reading direction of the text.
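The word-to-correction matching described in the aspects above can be sketched as follows. This is a minimal illustration only, not the claimed implementation: the function name, the word-list representation and the one-to-one pairing are assumptions introduced here for clarity.

```python
def apply_corrections(words, selected_indices, corrections):
    """Replace each selected (erroneous) word with the dictated correction
    that matches it, pairing them in the order the erroneous words appear
    in the reading direction of the text (here, ascending word position)."""
    result = list(words)
    for index, correction in zip(sorted(selected_indices), corrections):
        result[index] = correction
    return result
```

For instance, with a hypothetical transcription "meet me as anew" in which "as" and "anew" are selected, dictating "at noon" yields "meet me at noon".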
The foregoing aspects and other features of the embodiments are explained in the following description, taken in connection with the accompanying drawings.
The aspects of the disclosed embodiments provide for the correction of adjacent text or words (e.g. pieces of text located next to each other) and non-adjacent text or words (e.g. incorrect text separated by correct text) in transcribed text that is entered into, for example, the system 100 through automatic speech recognition. The text corrections can be made quickly and efficiently by selecting all of the text to be corrected in the transcribed text and correcting the text in one operation or instance, as will be described in greater detail below. The aspects of the disclosed embodiments substantially eliminate repeating a correction task for each and every non-adjacent piece of text, such that the automatic speech recognition feature of the system 100 is activated only once for correcting all of the incorrect text in the transcribed text, irrespective of the number of corrections performed.
In accordance with aspects of the disclosed embodiments, the system may include a speech recognition module 137, a display 114 and a touch/proximity screen 112 (referred to herein generally as a touch screen) or any other suitable input device. The speech recognition module 137 may be configured for continuous speech recognition. The speech recognition module 137 may include any suitable speech recognizer that may include algorithms for reducing the error rate of the speech recognition module including, but not limited to, background noise reduction and speech training features. Referring also to
The user may dictate any desired text into the system 100 using, for example, microphone 111 or any other suitable input device. In other embodiments the system 100 may acquire the text in any suitable manner including, but not limited to, electronic file/data transfers, creation in word processing documents or in any other manner such that the text is computer readable text. The text may be stored in a memory 182 of the system 100 or accessed remotely by the system. As used in the disclosed embodiments, the term “word” includes, but is not limited to, one or more individual characters or strings of characters (including, but not limited to, e.g. numbers, letters and symbols) and the term “text” includes, but is not limited to, individual words, one or more strings of words, or phrases. In this example, the dictated text is recognized and transcribed by, for example, the speech recognition module 137 in any suitable manner (
Referring to
To correct these incorrect texts 340, 350, the user activates, for example, the text correction module 138 (and/or the text correction application 195, which may be part of or work in conjunction with the text correction module 138) in any suitable manner including, but not limited to, voice commands or a menu of the system, such as menu 124 and the options soft key 320. In other embodiments, the text correction module 138 may be activated automatically after dictation of the intended text is completed. In another example, the system 100 may query the user through, for example, a "pop up" menu after the transcribed text is presented on the display 300, allowing the user to either accept or decline whether incorrect text is to be indicated or identified. The incorrect text is selected by the user as shown in
In one aspect, the speech recognition is activated for correcting the identified texts 340, 350 in any suitable manner. In one example, the user may start the speech recognition correction in any suitable manner including, but not limited to, a voice command, selecting speech recognition from a menu associated with the options soft key, pressing a dedicated speech recognition key or activating any suitable predetermined application such as, for example, a spell/grammar check application. In other examples, the speech recognition correction may be initiated automatically after indication of the incorrect texts is complete. For example, the system 100 may be configured to automatically start the speech recognition correction after a predetermined time period has elapsed since the last text was indicated (e.g. the system waits "x" seconds to start the speech recognition correction after the last text is indicated). When the speech recognition correction is started, the user dictates the intended corrections. In one embodiment, the system 100 may list the selected incorrect texts on the display 114 in the order in which they appear in the text to aid the user in making the corrections. In other embodiments, the user may be able to scroll through the text when making the corrections so that the selected words can be viewed during dictation of the corrections. In this example, the intended corrections are dictated sequentially in the order in which the indicated text appears in the transcribed text. For example, in the English language the transcribed text is read from left to right, such that the indicated texts would appear in the order "as anew". It should be understood that the order in which the texts are dictated for correction depends on the direction in which the language being inputted is read. For example, in Hebrew the intended corrections would be dictated in the order they appear from right to left.
In other examples, the intended corrections may be dictated in any suitable order or sequence.
To correct the indicated texts 340, 350, the user dictates the words "at noon". The text correction application 195, for example, may be configured to place each recognized intended correction in place of a corresponding one of the indicated texts. In one aspect, in the case of a mismatch between the number of intended corrections and the number of indicated texts, such that there are more intended corrections than texts to be corrected (e.g. indicated texts), the extra intended corrections are placed after the last indicated text of the transcribed text. For example, referring to
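The overflow behaviour described above, where extra dictated words are placed after the last indicated text, can be sketched as below. The function name and the word-list representation are illustrative assumptions.

```python
def apply_corrections_with_overflow(words, selected_indices, corrections):
    """Replace selected words with dictated corrections in reading order;
    if more corrections were dictated than words were selected, insert
    the extra corrections directly after the last selected word."""
    slots = sorted(selected_indices)
    result = list(words)
    for index, correction in zip(slots, corrections):
        result[index] = correction
    extra = corrections[len(slots):]       # corrections beyond the selected slots
    if extra:
        insert_at = slots[-1] + 1          # position just after the last indicated text
        result[insert_at:insert_at] = extra
    return result
```

For example, dictating three words "at noon today" against two selected words would leave "today" inserted after "noon".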
Referring now to
In another example, still referring to
In another example, the system 100 may include a language model (which may be part of the speech recognition and/or text correction module or any other suitable module or application of the system). The system 100 may use the language model to determine how the corrections should be applied. Still referring to
The linguistic check based on the language model may also be applied when the number of selected words for correction 410, 420, 430 does not match the number of dictated corrections. In one example, referring to
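One way such a linguistic check could work is to enumerate the ways the dictated words can be grouped over the selected positions and keep the grouping the language model scores highest. This is a sketch under stated assumptions: `lm_score` is a hypothetical callable returning a higher score for more fluent word sequences, and the contiguous-grouping strategy is one of many possible designs, not the disclosed one.

```python
from itertools import combinations

def best_assignment(words, selected_indices, corrections, lm_score):
    """Distribute dictated corrections over selected word positions when
    the counts differ, keeping the candidate sentence the language model
    scores highest. Assumes len(corrections) >= len(selected_indices)."""
    slots = sorted(selected_indices)
    n, k = len(corrections), len(slots)
    best, best_score = None, float("-inf")
    # Split the corrections into k contiguous groups; each group replaces
    # one selected word (a group may hold several dictated words).
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0, *cuts, n)
        groups = [corrections[bounds[i]:bounds[i + 1]] for i in range(k)]
        candidate = list(words)
        # Replace from the end so earlier indices remain valid.
        for index, group in sorted(zip(slots, groups), reverse=True):
            candidate[index:index + 1] = group
        score = lm_score(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best
```

A real system would likely prune this search and use recognition confidences alongside the language model; the exhaustive enumeration here is only for clarity.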
It should also be understood that in one aspect the disclosed embodiments may also allow a user to correct any suitable number of individual characters in a manner substantially similar to those described above. For example, the user may dictate the word “foot” which is transcribed by the system 100 and displayed on, for example, display 114 as the word “soot”. The user can indicate or otherwise highlight the letter “s” in the word “soot”. When the speech recognition is activated the user may dictate the letter “f” which is recognized by the system 100 as an individual letter such that the letter “s” is replaced by the letter “f” in a manner substantially similar to that described above.
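The character-level case can be sketched the same way as the word-level case; the function name and the string representation are illustrative assumptions.

```python
def replace_characters(text, selected_positions, dictated_letters):
    """Replace each selected character with the dictated letter that
    matches it, pairing them in the order the characters appear."""
    chars = list(text)
    for position, letter in zip(sorted(selected_positions), dictated_letters):
        chars[position] = letter
    return "".join(chars)
```

For the example above, replacing the selected "s" in "soot" with a dictated "f" yields "foot".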
Referring again to
The input device 104 is generally configured to allow a user to input data and commands to the system or device 100. The input device 104 may include any suitable input features including, but not limited to, hard and/or soft keys 110 and touch/proximity screen 112. The output device 106 is configured to allow information and data to be presented to the user via the user interface 102 of the device 100. The process module 122 is generally configured to execute the processes and methods of the disclosed embodiments. The application process controller 132 can be configured to interface with the applications module 180 and execute applications processes with respect to the other modules of the system 100. The communications module 134 may be configured to allow the device to receive and send communications and messages, such as, for example, one or more of voice calls, text messages, chat messages and email. The communications module 134 is also configured to receive communications from other devices and systems.
The applications module 180 can include any one of a variety of applications or programs that may be installed, configured or accessible by the device 100. In one embodiment the applications module 180 can include text correction application 195, web browser, office, business, media player and multimedia applications. The applications or programs can be stored directly in the applications module 180 or accessible by the applications module. For example, in one embodiment, an application or program such as the text correction application 195 may be network based, and the applications module 180 includes the instructions and protocols to access the program/application and render the appropriate user interface and controls to the user.
In one embodiment, the system 100 comprises a mobile communication device. The mobile communication device can be Internet enabled. The input device 104 can also include a camera or such other image capturing system 113. In one aspect the imaging system 113 may be used to image any suitable text. The image of the text may be converted into, for example, an editable document (e.g. word processor text, email message, text message or any other suitable document) with, for example, an optical character recognition module 139. Any incorrectly recognized text in the converted text can be corrected in a manner substantially similar to that described above with respect to
While the input device 104 and output device 106 are shown as separate devices, in one embodiment, the input device 104 and output device 106 can be combined and be part of and form the user interface 102. The user interface 102 can be used to display information pertaining to content, control, inputs, objects and targets as described herein.
The display 114 of the system 100 can comprise any suitable display, such as a touch screen display, proximity screen device or graphical user interface. The type of display is not limited to any particular type or technology. In other alternate embodiments, the display may be any suitable display, such as for example a flat display 114 that is typically made of a liquid crystal display (LCD) with optional back lighting, such as a thin film transistor (TFT) matrix capable of displaying color images.
In one embodiment, the user interface of the disclosed embodiments can be implemented on or in a device that includes a touch screen display or a proximity screen device 112. In alternate embodiments, the aspects of the user interface disclosed herein could be embodied on any suitable device that will display information and allow the selection and activation of applications or system content. The terms "select", "touch" and "indicate" are generally described herein with respect to a touch screen display. However, in alternate embodiments, the terms are intended to encompass the required user action with respect to other input devices. For example, with respect to a proximity screen device, it is not necessary for the user to make direct contact in order to select an object or other information. Thus, the above noted terms are intended to include that a user only needs to be within the proximity of the device to carry out the desired function, such as, for example, selecting the text(s) to be corrected as described above.
Similarly, the scope of the intended devices is not limited to single touch or contact devices. Multi-touch devices, where contact by one or more fingers or other pointing devices can navigate on and about the screen, are also intended to be encompassed by the disclosed embodiments. Non-touch devices are also intended to be encompassed by the disclosed embodiments. Non-touch devices include, but are not limited to, devices without touch or proximity screens, where navigation on the display and menus of the various applications is performed through, for example, keys 110 of the system or through voice commands via voice recognition features of the system.
Some examples of devices on which aspects of the disclosed embodiments can be practiced are illustrated with respect to
As shown in
In the embodiment where the device 500 comprises a mobile communications device, the device can be adapted for communication in a telecommunication system, such as that shown in
In one embodiment the system is configured to enable any one or combination of voice communication, chat messaging, instant messaging, text messaging and/or electronic mail. It is to be noted that for different embodiments of the mobile terminal 600 and in different situations, some of the telecommunications services indicated above may or may not be available. The aspects of the disclosed embodiments are not limited to any particular set of services or applications in this respect.
The mobile terminals 600, 606 may be connected to a mobile telecommunications network 610 through radio frequency (RF) links 602, 608 via base stations 604, 609. The mobile telecommunications network 610 may be in compliance with any commercially available mobile telecommunications standard such as for example global system for mobile communications (GSM), universal mobile telecommunication system (UMTS), digital advanced mobile phone service (D-AMPS), code division multiple access 2000 (CDMA2000), wideband code division multiple access (WCDMA), wireless local area network (WLAN), freedom of mobile multimedia access (FOMA) and time division-synchronous code division multiple access (TD-SCDMA).
The mobile telecommunications network 610 may be operatively connected to a wide area network 620, which may be the Internet or a part thereof. A server, such as Internet server 622 can include data storage 624 and processing capability and is connected to the wide area network 620, as is an Internet client/personal computer 626. The server 622 may host a worldwide web/wireless application protocol server capable of serving worldwide web/wireless application protocol content to the mobile terminal 600.
A public switched telephone network (PSTN) 630 may be connected to the mobile telecommunications network 610 in a familiar manner. Various telephone terminals, including the stationary line telephone 632, may be connected to the public switched telephone network 630.
The mobile terminal 600 is also capable of communicating locally via a local link(s) 601 to one or more local devices 603. The local link(s) 601 may be any suitable type of link with a limited range, such as for example Bluetooth, a Universal Serial Bus (USB) link, a wireless Universal Serial Bus (WUSB) link, an IEEE 802.11 wireless local area network (WLAN) link, an RS-232 serial link, etc. The local devices 603 can, for example, be various sensors that can communicate measurement values or other signals to the mobile terminal 600 over the local link 601. The above examples are not intended to be limiting, and any suitable type of link may be utilized. The local devices 603 may be antennas and supporting equipment forming a wireless local area network implementing Worldwide Interoperability for Microwave Access (WiMAX, IEEE 802.16), WiFi (IEEE 802.11x) or other communication protocols. The wireless local area network may be connected to the Internet. The mobile terminal 600 may thus have multi-radio capability for connecting wirelessly using mobile communications network 610, wireless local area network or both. Communication with the mobile telecommunications network 610 may also be implemented using WiFi, Worldwide Interoperability for Microwave Access, or any other suitable protocols, and such communication may utilize unlicensed portions of the radio spectrum (e.g. unlicensed mobile access (UMA)). In one embodiment, the communications module 134 is configured to interact with, and communicate to/from, the system described with respect to
Although the above embodiments are described as being implemented on and with a mobile communication device, it will be understood that the disclosed embodiments can be practiced on any suitable device incorporating a display, processor, memory and supporting software or hardware. For example, the disclosed embodiments can be implemented on various types of music, gaming and/or multimedia devices with one or more communication capabilities as described above. In one embodiment, the system 100 of
The user interface 102 of
The disclosed embodiments may also include software and computer programs incorporating the process steps and instructions described above. In one embodiment, the programs incorporating the process steps described herein can be stored on and/or executed in one or more computers.
Computer systems 702 and 704 may also include a microprocessor for executing stored programs. Computer 702 may include a data storage device 708 on its program storage device for the storage of information and data. The computer program or software incorporating the processes and method steps incorporating aspects of the disclosed embodiments may be stored in one or more computers 702 and 704 on an otherwise conventional program storage device. In one embodiment, computers 702 and 704 may include a user interface 710, and/or a display interface 712 from which aspects of the disclosed embodiments can be accessed. The user interface 710 and the display interface 712, which in one embodiment can comprise a single interface, can be adapted to allow the input of queries and commands to the system, as well as present the results of the commands and queries, as described with reference to FIGS. 1 and 3A-4C for example.
The aspects of the disclosed embodiments are directed to improving how corrections are made to text input in a device using automatic speech recognition. Aspects of the disclosed embodiments provide for selecting incorrectly transcribed adjacent and non-adjacent pieces of text for correction, where all of the indicated pieces of text are corrected with one activation of the speech recognition module/application. Aspects of the disclosed embodiments also provide for the correction/replacement of a single word with multiple words and vice versa. The disclosed embodiments effectively avoid having to initiate the speech recognition module/application for each piece of text to be corrected, saving the user time and decreasing the number of key presses needed to make the corrections.
It is noted that the embodiments described herein can be used individually or in any combination thereof. It should be understood that the foregoing description is only illustrative of the embodiments. Various alternatives and modifications can be devised by those skilled in the art without departing from the embodiments. Accordingly, the present embodiments are intended to embrace all such alternatives, modifications and variances that fall within the scope of the appended claims.