1. Field
The aspects of the disclosed embodiments generally relate to user interfaces and more particularly to a user interface for a peripheral device controlling applications of a device.
2. Brief Description of Related Developments
There are many applications that allow a user to provide voice commands to activate and control functions of the application. Headsets are one example of commonly used peripheral devices that allow for voice command applications. However, there can be many problems related to voice command applications when the user cannot see the display of the device. For example, the user may not necessarily know which application is currently active or open and therefore cannot provide the proper voice command instructions. Furthermore, without a view of the display, the user may not even know which applications are available for selection and activation.
Some peripheral devices, such as headset devices, include input keys, such as buttons. However, it may not always be clear what the function of the input key is in certain situations or applications. For example, in some situations, activating the input key could provide an “accept” function or a “cancel” function. Before making either choice, the user would need to be sure of the selection and may need a period of time in which to make such a determination. However, that time cannot be too long; otherwise, the system does not function efficiently.
In some situations, the technical limitations of current speech recognition systems, in terms of both accuracy and speed on portable devices, can pose a challenge for headset voice user interfaces. Speech recognition systems do make recognition errors. At the same time, speed can be as important as recognition accuracy and can make the difference as to whether the user wants to use a system or not. Very long instructions or speech prompts are not an attractive option for such systems. Typically, user interaction would preferably provide short prompts, such as text-to-speech (“TTS”) prompts, that allow ease-of-use with speed and efficiency.
The aspects of the disclosed embodiments are directed to at least a method, apparatus and computer program product. In one embodiment, the method includes detecting an activation of an application control input key of a peripheral device to open an initial application. Once open, the initial application becomes an active application. An identifier of the open application is provided. A speech prompt to input a voice command to activate at least one function related to the active application can then be provided, and, upon detection of the voice command, the command can be executed by the active application, or a command to open a next application can be provided.
In one embodiment, after delivering the speech prompt to input a voice command to activate at least one function related to the active application, the application control input key can be used to open another or subsequent application, if a responsive voice command is not inputted. Once the subsequent application is opened, an identifier of the open subsequent application is provided and the subsequent application becomes the active application.
The foregoing aspects and other features of the embodiments are explained in the following description, taken in connection with the accompanying drawings, wherein:
The aspects of the disclosed embodiments generally provide a user interface that takes into account many technical, practical and user friendly aspects related to speech recognition and text-to-speech conversion using a peripheral device 108, such as for example a headset device. A peripheral device, as that term is used herein, generally refers to any device capable of being coupled or attached to an electronic device, such as for example, computing devices, telecommunication devices or internet capable devices, that can be used to expand functionalities of the device, such as for example in the form of input and output devices. Although the aspects of the disclosed embodiments will generally be described with respect to a headset, it will be understood that the disclosed embodiments are not so limited. In alternate embodiments, any device that includes an application control input key, other than a headset device, can be encompassed by the disclosed embodiments. For example, the peripheral device 108 could include a remote control, camera, multimedia device, microphone, joystick, pointing device, trackball or keyboard. In a situation where the peripheral device 108 includes only a single application control input key and the system 100 does not include a display 114, or the user has limited access to a display 114, the aspects of the disclosed embodiments can provide voice prompts, text-to-speech prompts and audible application opening indications, and allow for pauses so that the user can access and switch between applications 180 of the system 100.
In one embodiment, the user opens an application by pressing the application control input key 108a associated with the headset 108. Although only a single key is shown in the drawings as being associated with the headset, it will be understood that the headset can include more than one input key for activating and controlling functions of the headset. An audible indication can be provided when the application opens. A prompt, such as for example a text-to-speech prompt, can be provided to request the input of a command or function of the application. In alternate embodiments, the prompt can comprise any suitable prompt including an audible or visual prompt, or a combination thereof. For example, if the system 100 includes lights or LEDs, the prompt could comprise illumination of one or more of the lights or LEDs, either alone or in combination with an audible prompt. Each application could be assigned a particular light or LED, or combination, which will allow each application to be uniquely identified by illumination of same. In one embodiment, each application could be assigned a number, and a representative number of lights could be illuminated, or a number displayed on a digital display or screen. In this way, the aspects of the disclosed embodiments are not limited to audio or speech applications, and can be applied where speech is not available or not desired.
If the user wishes to access or open a different application 180, a press of the application control input key 108a, before providing a voice command, can cause the next application to be opened or activated. In essence, the pressing of the application control input key 108a at certain times will scroll the user through the applications 180 of the device or system 100. A prompt, such as a speech prompt, will inform the user which application is open and allow the user to recognize the application and decide whether to exercise the application or search for a different application.
Referring to
The input device(s) 104 are generally configured to allow for the input of data, instructions and commands to the system 100. In one embodiment, the input device 104 can be configured to receive input commands remotely or from another device that is not local to the system 100. The input device 104 can include devices such as, for example, keys 110, voice/speech recognition system 113, touch screen 112, menu 124, a camera device 125 or such other image capturing system. In alternate embodiments the input device can comprise any suitable device(s) or means that allows or provides for the input and capture of data, information and/or instructions to a device, as described herein. The output device(s) 106 are configured to allow information and data to be presented via the user interface 102 of the system 100 and can include one or more devices such as, for example, a display 114, audio device 115 or tactile output device 116. In one embodiment, the output device 106 can be configured to transmit output information to another device, which can be remote from the system 100. While the input device 104 and output device 106 are shown as separate devices, in one embodiment, the input device 104 and output device 106 can be combined into a single device, and be part of and form, the user interface 102. The user interface 102 can be used to receive and display information pertaining to content, objects and targets, as will be described below. In one embodiment, the headset device 108 can be part of both the input and output devices 104, 106, as it can be used to input commands and instructions and receive data and information in the form of audible sounds, data and prompts. While certain devices are shown in
The process module 122 is generally configured to execute the processes and methods of the disclosed embodiments. The application process controller 132 can be configured to interface with the applications module 180, for example, and execute applications processes with respect to the other modules of the system 100. In one embodiment the applications module 180 is configured to interface with applications that are stored either locally to or remote from the system 100 and/or web-based applications. The applications module 180 can include any one of a variety of applications that may be installed, configured or accessible by the system 100, such as for example, office, business, media players and multimedia applications, web browsers and maps. In alternate embodiments, the applications module 180 can include any suitable application. The communication module 134 shown in
The process module 122 can also include an application control unit or module 136 that is configured to open and close applications based on respective inputs and commands received. In one embodiment, the application control unit 136 is configured to recognize an input received from the headset device application control or input key 108a and open the corresponding application.
The process module 122 can also include an application indication unit or module 137 that is configured to recognize an opening of an application and provide an indication, such as for example, an audible indication, to the user that a new application has opened. For example, in one embodiment, when an application opens, the user is provided with a tone or beep to indicate that a new application has opened. In alternate embodiments, the indication can comprise any suitable indication that will inform a user that an application has opened.
In one embodiment, the process module 122 also includes an application ordering system or module 138 that is configured to maintain a list of all applications 180 stored in, or available to, the system 100. The list can be configured by the user or pre-set by the device, and the list can comprise any suitable or desired ordering of applications. The application ordering module 138 can be configured to scroll through an application list as application opening commands are received by the application control module 136. When an application opening command is inputted and transmitted, the application ordering module 138 can determine which application 180 is next to be opened. This allows the user to, in essence, scroll through a list of applications using the application control input key 108a in accordance with the aspects of the disclosed embodiments, even though the user may not necessarily be able to visualize the application list, the application or the next application. The set of applications 180 that are selected to be active in this voice-enabled user interface can be a preset default set of applications. In one embodiment, the set of applications 180 can be configurable by the user. In an exemplary embodiment, the user can remove unwanted applications from the list of active applications. In one embodiment, the application ordering module 138 can be configured to assign each application a unique identifier, such as for example, an audible or visual identifier or indicator. For example, the indicator can be an audible indicator, such as a beep, tone or other unique sound. In one embodiment, the indicator is a text-to-speech prompt, such as the application name. In another embodiment, the indicator is a visual indicator, such as the illumination of one or more LEDs, or an image on a display. Alternatively, the indicator is tactile, such as for example, a vibration or pulsing action.
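The behavior of the application ordering module 138 described above can be sketched in code. The following Python sketch is illustrative only; the class and method names, and the pairing of each application with an identifier, are assumptions made for the example and are not part of the disclosed embodiments.

```python
# Hypothetical sketch of an application ordering module (cf. module 138):
# it maintains an ordered, user-configurable list of applications and
# cycles through it as application opening commands arrive.

class ApplicationOrdering:
    """Orders voice-enabled applications and scrolls through them."""

    def __init__(self, applications):
        # Each entry pairs an application name with its unique identifier
        # (e.g. a tone file, a TTS prompt, or an LED pattern).
        self.applications = list(applications)
        self.index = -1  # no application open yet

    def remove(self, name):
        # The user may remove unwanted applications from the active list.
        self.applications = [a for a in self.applications if a[0] != name]

    def next_application(self):
        # Each activation of the control key advances to the next entry,
        # wrapping so the user can scroll the whole list repeatedly.
        self.index = (self.index + 1) % len(self.applications)
        return self.applications[self.index]


apps = ApplicationOrdering([
    ("Voice dialing", "dial-tone.wav"),
    ("Music search", "music-clip.wav"),
    ("Messaging", "message-beep.wav"),
])
apps.remove("Messaging")
print(apps.next_application())  # ('Voice dialing', 'dial-tone.wav')
print(apps.next_application())  # ('Music search', 'music-clip.wav')
print(apps.next_application())  # wraps back to 'Voice dialing'
```

In this sketch the identifier strings stand in for whatever audible, visual or tactile indicator a given embodiment assigns to each application.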
In one embodiment, the system 100 can also include a text-to-speech module and voice recognition system 142 that provides and receives voice commands, prompts and instructions. In the exemplary embodiment, these commands, prompts and instructions are received and delivered through the peripheral device 108. The text-to-speech module 142 can interpret an application state and provide a voice prompt based on the application state that prompts for a command or input to take the application to the next state. The aspects of the disclosed embodiments allow a device to change application states or to move from one application to another when the application control input key 108a of the peripheral device 108 is activated. When an application is activated or opened, the text-to-speech module 142 can provide a prompt that indicates the application name of the open application. Activation of the application control input key 108a can cause a next application to be opened. When the desired application is reached, a voice command to activate a function of the application can be provided, if the user is familiar with the voice commands of the application, or the user can wait for an appropriate speech prompt that asks for a command input. After recognition of the opened application, the peripheral device application control input key 108a can be used to cancel the operation. This allows the user to quickly browse an application list. Different audio indications or sounds can be used during each process of the interaction with the headset user interface system. Using different sounds can allow the user to identify the particular state of the user interface, application and process. For example, the application opening indication can be one type of audio, while the execution of one or more application commands and functions can have a different sound or audio file associated therewith. Closing an application can be associated with a different sound or audio file.
In alternate embodiments, any suitable audio indications and/or sounds can be used in the different stages of the process and processes described herein.
The user interface system of the disclosed embodiments takes advantage of a situation in which the peripheral device 108 includes only a single application control input key 108a, and a limited or no display. Audio sounds, such as beeps, can inform a user that a new application is opening, and speech prompts provide the user with enough information to identify the open application. In one embodiment, this can include the name of the application. By opening the applications separately, the size of the vocabulary can be limited for voice recognition purposes. Since a speech or voice prompt can be provided each time an application is opened, the correct application can easily and quickly be found. Experienced users can immediately access the functions of an application by inputting an appropriate command immediately after hearing the application name. However, in one embodiment, short explanatory speech prompts can be provided for non-experienced users to provide guidance as to the instructions or prompts that are required to access the application functions. The aspects of the disclosed embodiments do not require that the user remember any phrases or commands that the speech or voice recognition system is trained to recognize with respect to the applications and their names. If the user has accessed the wrong application or receives the wrong recognition result, the peripheral device application control input key 108a can be used to cancel or exit the application. Similar logic and patterns are used throughout all of the applications. The user can also pre-sort the different applications so that the order of the applications is logical or optimal for the best use or user experience.
Referring to
When the initial application opens, in one embodiment, an open application tone is generated 206. In one embodiment the open application tone comprises a short beep or other similar sound. In alternate embodiments, any suitable audible indication or tone can be used to indicate to the user that an application has opened.
After the open application tone, an application identifier or similar prompt can be provided 208. In one embodiment, the text-to-speech module 142 of
In one embodiment, generation of the open application indication 206 and application identifier 208 can be combined into a single function. For example, in a voice dialing application, both the open application indication and the application identifier can be identified by the same tone, such as a short sound resembling a dial tone. When the application is a music search, both the open application and application identifier can be identified by a short segment of music. The foregoing examples generally require that the user be familiar with the application identifier for each application. In an embodiment where the user is not familiar with the application identifier, the user can rely on voice commands and prompts, such as the voice command prompt 220. In a dialing application, the prompt can be “To dial a contact, say a contact name.” This prompt can be the application identifier in this type of situation.
In one embodiment, the application identifier prompt 208 will also include a prompt or request for the user to provide a voice command to activate at least one function of the open application.
After the application identifier prompt 208 is generated, in one embodiment it is determined 210 whether a responsive command is detected. In one embodiment, three different actions can be detected and/or recognized in this state, which can be referred to as a speech recognition, or user interface engine state. These actions can include a key press 212, a voice command 214 or a time out or rejection 216.
If a key press 212 is detected, in one embodiment, it is determined 224 if the number of attempts “a” to open an application or detect an input command exceeds a pre-determined number, or whether there are any more applications for the user to browse. In this determination, the variable “a” is the number of retries and “N” is the number of allowed retries. To determine whether there are any more available applications, “b”, the number of applications checked by the user, is compared to “M”, the total number of applications available. In one embodiment, the number of permitted retries “N” will be 0 or 1. Thus, in step 224, it is determined whether the user is in the voice command prompt state, where a>0, or whether the user has browsed all of the applications, where b>=M. If the answer is no, the next application can be opened 226. When the next application opens, the value of variable “b” is incremented by one (b=b+1). If the answer to determination 224 is yes, the application is closed or exits 228.
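The determination at step 224 can be sketched as a small function. The function name and the string return values are hypothetical; the variables a, b and M follow the description above.

```python
# Illustrative sketch of the key-press determination (cf. step 224).
# a: retries already used (a>0 means the voice command prompt state
#    has been reached); b: applications browsed so far; M: total
#    applications available.

def on_key_press(a, b, M):
    """Decide what a control-key press does, returning the action
    plus the updated browse count b."""
    if a > 0 or b >= M:
        # Already in the voice-command-prompt state, or every application
        # has been browsed: close the current application (step 228).
        return "close_application", b
    # Otherwise open the next application (step 226) and count it.
    return "open_next_application", b + 1


print(on_key_press(a=0, b=1, M=3))  # ('open_next_application', 2)
print(on_key_press(a=1, b=1, M=3))  # ('close_application', 1)
```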
If, after the application identifier prompt 208, a voice command 214 is detected or recognized the application or function corresponding to the detected command can be executed 222.
In one embodiment, if, after the application identifier prompt 208, it is determined that a voice command input 214 is not detected or recognized and an input key 108a has not been activated 212, it is determined 216 whether a time-out is detected or a rejection is activated. A time-out can comprise a predetermined time period that has elapsed since the application identifier prompt 208. If a time-out or rejection 216 is detected, the variable “a”, the number of retries, is incremented by one. A determination 218 is made as to whether the number of retries “a” exceeds or is equal to the number of allowed retries “N”. If no, the system 100 can provide a prompt 220, such as a voice command prompt, to advise the user that the system 100 is still waiting for an appropriate input command, such as a key press 212 or voice command 214. The system 100 can remain in this speech recognition/UI state while waiting for a command input 210. If the number of retries “a” is greater than or equal to the number “N” of allowed retries, or permitted retries, the application exits.
When the number of retries “a” is not greater than or equal to “N”, a voice command or other suitable prompt 220 can be generated that represents a request for an input of a command with respect to the open application. Thus, in a situation where the user is unfamiliar with or unsure of the voice command that corresponds to a function of the open application, the user can simply wait, and the application can provide appropriate guidance and instruction. For example, in a dialing application, the prompt can be a voice prompt such as “please say number.” In an alternate embodiment where the prompt is a visual cue, an indicator on the device may be highlighted that corresponds to an input key for the desired function. For example, in a phone or dialing application, the address book function key (hard or soft key) can be highlighted, prompting the user to select a contact from the address book, or a list of contacts might be displayed with a prompt to the user to select one of the contacts or input a dialing number. With respect to the example above, if the device includes a display, the text “please input number” might be displayed. Thus, although the exemplary embodiments are described with respect to voice command prompts and inputs, in alternate embodiments other audio and visual prompts and commands can be utilized. If a command 214 is detected, the appropriate function of the application will be executed 222, including exiting 228 the application, for example.
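The time-out and retry handling of steps 216 through 220 can be sketched as a simple loop. The helper callables (wait_for_input, play_prompt) and the prompt text are hypothetical stand-ins for the system's recognition and prompt services.

```python
# Minimal sketch of the time-out/retry handling (cf. steps 216-220).
# wait_for_input returns 'key' for a key press, ('voice', cmd) for a
# recognized voice command, or None on a time-out/rejection.

def recognition_state(wait_for_input, play_prompt, N=1, timeout_s=2.0):
    """Wait for a key press or voice command, re-prompting up to N times."""
    a = 0  # retries used so far
    while True:
        event = wait_for_input(timeout_s)
        if event is not None:
            return event            # key press 212 or voice command 214
        a += 1                      # time-out/rejection 216: count retry
        if a >= N:
            return "exit"           # retries exhausted: exit (step 228)
        play_prompt("Please say a command")  # voice command prompt 220


# Demo with a scripted input source: one time-out, then a voice command.
script = iter([None, ("voice", "call Alice")])
result = recognition_state(lambda timeout: next(script), print, N=2)
print(result)  # ('voice', 'call Alice')
```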
In one embodiment, detection of the activation of the application control input key 108a prior to a voice command input 214 will cause the next application, as determined by the application ordering system 138, to open 226. In this way, the user can essentially “scroll” through a list of applications even though the user might not be able to visualize the application list.
As shown in
After hearing the speech prompt “voice dialing” 306, the user can provide a responsive or command instruction, wait for another speech prompt to input a command, or press the application control input key 330 if the voice dialing application is not the desired application. For example, if the user does not provide a voice command and does not press the application control input key 330, the system goes into a voice dialing mode or function and a speech prompt, such as “Please say a name” is provided 310. The speech prompt 310 can be automatically provided after a predetermined time period 308 has elapsed. The time period 308 should be sufficient to allow the user to either provide a voice command or press the application control input key 330. In this example, the predetermined time period 308 is approximately two seconds. However in alternate embodiments, any suitable time period can be used that provides the user with sufficient time to provide an appropriate input to the system 100.
After the voice prompt 310, the user will have a sufficient period within which to provide a response command 312, which in this case is to say the recipient's name. After the recipient's name is recognized 314, the call can be made. The speech prompt 316 can be provided to inform the user that the call is in process. In one embodiment, during the speech prompt time and for a predetermined period of time thereafter, the application control input key 330 can be enabled so that pressing the input key 330 will cancel 322 the call. In one embodiment, the function of the application control input key 108a can vary depending on the particular application and the task or function being executed. If the call is not canceled, the application will make the call 320 in a suitable fashion.
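The voice dialing flow above, including the cancel window tied to the application control input key, can be sketched as follows. All of the callables, and the two-second cancel window, are hypothetical stand-ins for the system services described in this example.

```python
# Illustrative sketch of the voice dialing flow (cf. steps 310-322):
# prompt, recognize the name, then allow a short cancel window in which
# a press of the application control input key cancels the call.

def dial_flow(recognize_name, key_pressed_within, make_call, say):
    """Run the voice dialing flow; returns the dialed name or None."""
    say("Please say a name")                 # speech prompt 310
    name = recognize_name()                  # command 312 / recognition 314
    say(f"Calling {name}")                   # in-process prompt 316
    if key_pressed_within(seconds=2.0):      # control key cancels 322
        say("Call canceled")
        return None
    make_call(name)                          # otherwise place the call 320
    return name


log = []
dial_flow(
    recognize_name=lambda: "Alice",
    key_pressed_within=lambda seconds: False,   # user does not cancel
    make_call=lambda name: log.append(name),
    say=log.append,
)
print(log)  # ['Please say a name', 'Calling Alice', 'Alice']
```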
In one embodiment, referring to
Referring to
Referring to
In one embodiment, the display 114 can be integral to the system 100. In alternate embodiments the display may be a peripheral display connected or coupled to the system 100. A pointing device, such as for example, a stylus, pen or simply the user's finger may be used with the display 114. In alternate embodiments any suitable pointing device may be used. In other alternate embodiments, the display may be any suitable display, such as for example a flat display 114 that is typically made of a liquid crystal display (LCD) with optional back lighting, such as a thin film transistor (TFT) matrix capable of displaying color images.
The terms “select” and “touch” are generally described herein with respect to a touch-screen display. However, in alternate embodiments, the terms are intended to encompass the required user action with respect to other input devices. For example, with respect to a proximity screen device, it is not necessary for the user to make direct contact in order to select an object or other information. Thus, the above noted terms are intended to include that a user only needs to be within the proximity of the device to carry out the desired function.
Similarly, the scope of the intended devices is not limited to single touch or contact devices. Multi-touch devices, where contact by one or more fingers or other pointing devices can navigate on and about the screen, are also intended to be encompassed by the disclosed embodiments. Non-touch devices are also intended to be encompassed by the disclosed embodiments. Non-touch devices include, but are not limited to, devices without touch or proximity screens, where navigation on the display and menus of the various applications is performed through, for example, keys 110 of the system or through voice commands via voice recognition features of the system.
Some examples of devices on which aspects of the disclosed embodiments can be practiced are illustrated with respect to
As shown in
In the embodiment where the device 600 comprises a mobile communications device, the device can be adapted for communication in a telecommunication system, such as that shown in
In one embodiment the system is configured to enable any one or combination of chat messaging, instant messaging, text messaging and/or electronic mail. It is to be noted that for different embodiments of the mobile terminal 700 and in different situations, some of the telecommunications services indicated above may or may not be available. The aspects of the disclosed embodiments are not limited to any particular set of services or communication system or protocol in this respect.
The mobile terminals 700, 706 may be connected to a mobile telecommunications network 710 through radio frequency (RF) links 702, 708 via base stations 704, 709. The mobile telecommunications network 710 may be in compliance with any commercially available mobile telecommunications standard such as for example the global system for mobile communications (GSM), universal mobile telecommunication system (UMTS), digital advanced mobile phone service (D-AMPS), code division multiple access 2000 (CDMA2000), wideband code division multiple access (WCDMA), wireless local area network (WLAN), freedom of mobile multimedia access (FOMA) and time division-synchronous code division multiple access (TD-SCDMA).
The mobile telecommunications network 710 may be operatively connected to a wide area network 720, which may be the Internet or a part thereof. An Internet server 722 has data storage 724 and is connected to the wide area network 720, as is an Internet client 726. The server 722 may host a worldwide web/wireless application protocol server capable of serving worldwide web/wireless application protocol content to the mobile terminal 700.
A public switched telephone network (PSTN) 730 may be connected to the mobile telecommunications network 710 in a familiar manner. Various telephone terminals, including the stationary telephone 732, may be connected to the public switched telephone network 730.
The mobile terminal 700 is also capable of communicating locally via a local link 701 to one or more local devices 703. The local links 701 may be any suitable type of link or piconet with a limited range, such as for example Bluetooth™, a Universal Serial Bus (USB) link, a wireless Universal Serial Bus (WUSB) link, an IEEE 802.11 wireless local area network (WLAN) link, an RS-232 serial link, etc. The local devices 703 can, for example, be various sensors that can communicate measurement values or other signals to the mobile terminal 700 over the local link 701. The above examples are not intended to be limiting, and any suitable type of link or short range communication protocol may be utilized. The local devices 703 may be antennas and supporting equipment forming a wireless local area network implementing Worldwide Interoperability for Microwave Access (WiMAX, IEEE 802.16), WiFi (IEEE 802.11x) or other communication protocols. The wireless local area network may be connected to the Internet. The mobile terminal 700 may thus have multi-radio capability for connecting wirelessly using mobile communications network 710, wireless local area network or both. Communication with the mobile telecommunications network 710 may also be implemented using WiFi, Worldwide Interoperability for Microwave Access, or any other suitable protocols, and such communication may utilize unlicensed portions of the radio spectrum (e.g. unlicensed mobile access (UMA)). In one embodiment, the navigation module 122 of
Although the above embodiments are described as being implemented on and with a mobile communication device, it will be understood that the disclosed embodiments can be practiced on any suitable device incorporating a processor, memory and supporting software or hardware. For example, the disclosed embodiments can be implemented on various types of music, gaming and multimedia devices. In one embodiment, the system 100 of
The user interface 102 of
The disclosed embodiments may also include software and computer programs incorporating the process steps and instructions described above. In one embodiment, the programs incorporating the process steps described herein can be executed in one or more computers.
Computer systems 802 and 804 may also include one or more processors for executing stored programs. Computer 802 may include a data storage device 808 on its program storage device for the storage of information and data. The computer program or software incorporating the processes and method steps incorporating aspects of the disclosed embodiments may be stored in one or more computers 802 and 804 on an otherwise conventional program storage device. In one embodiment, computers 802 and 804 may include a user interface 810, and/or a display interface 812 from which aspects of the disclosed embodiments can be accessed. The user interface 810 and the display interface 812, which in one embodiment can comprise a single interface, can be adapted to allow the input of queries and commands to the system, as well as present the results of the commands and queries, as described with reference to
The aspects of the disclosed embodiments are suitable for all applications that require the recognition of a command or selection from a list of items or search from this vocabulary. The user can press the application control input key 108a to open a new application, and a prompt can be provided to inform the user that the application has been opened as well as identify the application to the user. The user can either provide a voice command to the application or press the application control input key again to open another application. After recognition of the voice command based on the recognition results, the wanted action can be executed.
The user interface of the disclosed embodiments is intuitive for the first time user since the same pattern repeats for each process. First, the user selects the application by activating the application control input key, the recipient or command is recognized, and the wanted action takes place. If a recognition error occurs, the action can be canceled by pressing the application control input key 108a. Applications can be pre-sorted to a desired order, and unused or unwanted applications can be removed from the active voice user input device application list. Recognition errors are minimized compared to “free” recognition, since only the vocabulary of the open application is active.
It is noted that the embodiments described herein can be used individually or in any combination thereof. It should be understood that the foregoing description is only illustrative of the embodiments. Various alternatives and modifications can be devised by those skilled in the art without departing from the embodiments. Accordingly, the present embodiments are intended to embrace all such alternatives, modifications and variances that fall within the scope of the appended claims.