The present disclosure relates generally to electronic device control, and more particularly to methods and an apparatus for controlling services on an electronic device using voice data.
Home electronic devices may include many features in addition to display of broadcast television. Some of these features may be network based services. Conventionally, users of these home electronic devices use remote controls to control device operations. Remote controls, however, often lack the ability to allow a user to quickly search for certain features provided by the home electronic device. Similarly, remotes do not allow for control of network based services. As a result, conventional remote controls provide access to limited features. Thus, one or more solutions are desired to allow users to control features provided by home electronic devices, such as network based services.
Disclosed and claimed herein are methods and an apparatus for controlling applications on a device. In one embodiment, a method includes detecting voice data from a user, converting the voice data to text data, matching the text data to an identifier, the identifier associated with a list of identifiers for controlling operation of at least one of the applications, and controlling the at least one of the applications based on the identifier matched with the text data. In another embodiment, the act of acquiring voice data is performed by a control device, which then sends the data to the main device.
Other aspects, features, and techniques will be apparent to one skilled in the relevant art in view of the following detailed description of the embodiments.
The features, objects, and advantages of the present disclosure will become more apparent from the detailed description set forth below when taken in conjunction with the drawings in which like reference characters identify correspondingly throughout and wherein:
One aspect of the disclosure relates to controlling operation of a device based on detected voice commands. In one embodiment, detected voice data may be employed to control an application running on an audio/video device such as a television. A method is provided for detecting voice data from a user, matching the voice data to an identifier, and controlling an application based on the matched identifier. One advantage of the embodiments described herein may be the ability to use voice commands to launch and operate network based services, which include applications and content. Services may include network based applications such as email services, social networking services, video sharing services, news services, and others. Content may include videos, audio, pictures, and text in a variety of formats, from various channels. In certain embodiments, a secondary device may be employed to acquire voice data, convert the voice data to text, and send the text data to the device to control the services.
In one embodiment, a method is provided for detecting voice data from a user and converting the voice data to text data. The method may include matching text data to an identifier associated with a list of identifiers for controlling operation of at least one of the applications, and controlling the at least one application based on the matched identifier. Voice data may be used to control the services, in contrast to conventional methods for controlling services.
As used herein, the terms “a” or “an” shall mean one or more than one. The term “plurality” shall mean two or more than two. The term “another” is defined as a second or more. The terms “including” and/or “having” are open ended (e.g., comprising). The term “or” as used herein is to be interpreted as inclusive or meaning any one or any combination. Therefore, “A, B or C” means “any of the following: A; B; C; A and B; A and C; B and C; A, B and C”. An exception to this definition will occur only when a combination of elements, functions, steps or acts are in some way inherently mutually exclusive.
Reference throughout this document to “one embodiment,” “certain embodiments,” “an embodiment,” or similar term means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. Thus, the appearances of such phrases in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner on one or more embodiments without limitation.
In accordance with the practices of persons skilled in the art of computer programming, one or more embodiments are described below with reference to operations that are performed by a computer system or a like electronic system. Such operations are sometimes referred to as being computer-executed. It will be appreciated that operations that are symbolically represented include the manipulation by a processor, such as a central processing unit, of electrical signals representing data bits and the maintenance of data bits at memory locations, such as in system memory, as well as other processing of signals. The memory locations where data bits are maintained are physical locations that have particular electrical, magnetic, optical, or organic properties corresponding to the data bits.
When implemented in software, the elements of the embodiments are essentially the code segments to perform the necessary tasks. The code segments can be stored in a processor readable medium, which may include any medium that can store or transfer information. Examples of processor readable media include an electronic circuit, a semiconductor memory device, a read-only memory (ROM), a flash memory or other non-volatile memory, a floppy diskette, a CD-ROM, an optical disk, a hard disk, etc.
Referring now to
I/O interface 120 may be employed to communicate with processor 110 and control operation of device 102. I/O interface 120 may include one or more buttons for user input, such as a numerical keypad, volume control, menu controls, pointing device, track ball, mode selection buttons, and playback functionality (e.g., play, stop, pause, forward, reverse, slow motion, etc.). Buttons of I/O interface 120 may include hard and soft buttons, wherein functionality of the soft buttons may be based on one or more applications running on device 102. I/O interface 120 may include one or more elements to allow device 102 to communicate over wired or wireless links. For example, I/O interface 120 may allow for communication with the device through remote 160. Remote 160 may wirelessly send data to device 102 to control operation of device 102.
I/O interface 120 may also include one or more ports for receiving data, including ports for removable memory. I/O interface 120 may be configured to allow for network-based communications including but not limited to LAN, WAN, Wi-Fi, Bluetooth, etc. I/O interface 120 may allow device 102 to access network applications/services and to display services on display 140, such as applications and content, found on the internet. For example, in one embodiment, device 102 may run a main application that displays various services to a user. The services may be entertainment network based applications, such as email services, social networking services, video sharing services, or numerous other entertainment applications. Other network based applications, such as maps, news feeds, weather feeds, and other applications, may also be displayed. Furthermore, in another embodiment, device 102 may run a main application that displays content to a user. The main application may display entertainment content such as video, audio, or pictures. In yet another embodiment, the main application may display both applications and content to a user by way of display 140.
Display 140 may be employed to display one or more applications executed by processor 110. In certain embodiments, display 140 may relate to a touch screen display and operate as an I/O interface. Microphone 150 may be configured to detect voice data and other audio data from a user or another source.
Referring now to
In one embodiment, process 200 may be initiated when a device (e.g., device 102) detects a trigger at block 210. The trigger may indicate to the device that a user is ready to speak identifier words that may be used by the device to control the services provided on the device. Detecting the trigger at block 210 may include detecting an input through an I/O interface (e.g., I/O interface 120). In one embodiment, the detected trigger may originate from a hard or soft button on an I/O interface. In another embodiment, the detected trigger may originate from a remote.
In another embodiment, process 200 may be initiated when a control device, which is used to control a display device that provides and displays services, detects a trigger at block 210. The trigger may indicate to the control device that a user is ready to speak identifier words that may be used by the display device to control services provided on the display device. Detecting the trigger at block 210 may include detecting an input through an I/O interface. In one embodiment, the detected trigger may originate from a hard or soft button on an I/O interface.
In one embodiment, a device may be configured to detect an audio command, such as a voice command, based on a detected trigger. At block 220, voice data may be detected. Voice data may be detected utilizing a microphone. The detected voice data may be processed, digitized, and stored in a storage medium.
In another embodiment, a control device used to control a display device may be configured to detect an audio command, such as a voice command, based on a detected trigger. At block 220, voice data may be detected. Voice data may be detected utilizing a microphone. The detected voice data may be processed, digitized, and stored in a storage medium of the control device. The control device may then send the voice data to the display device and process 200 may continue on the display device. Alternatively, the control device may not send the voice data to the display device and process 200 may continue on the control device.
Once the voice data is detected at block 220, the detected voice data is converted to text data at block 230. The voice data may be converted to text data using a speech to text application or algorithm. For example, in one embodiment, the voice data may be converted to text data using the speech to text application available on an operating system of a device. It should be understood that many different applications and algorithms may be used to convert the voice data to text data.
In another embodiment, the voice data may be converted to text data at block 230 using the speech to text application available on an operating system of a control device. The control device may then send the text data to the display device and process 200 may continue on the display device.
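By way of illustration only, the hand-off from a control device to a display device might be sketched in Python as follows. The disclosure does not specify a message format; the JSON envelope, the "kind" and "payload" fields, and the function names `encode_message` and `decode_message` are all assumptions made for this sketch. The envelope carries either digitized voice data or converted text data, so that the display device can tell at which block process 200 should continue.

```python
import base64
import json

# Hypothetical message kinds: raw voice data (process continues at block 230
# on the display device) or converted text data (process continues at block 240).
VALID_KINDS = ("voice", "text")


def encode_message(kind: str, payload: bytes) -> bytes:
    """Wrap a payload in a JSON envelope for transmission to the display device."""
    if kind not in VALID_KINDS:
        raise ValueError(f"unsupported message kind: {kind!r}")
    body = {
        "kind": kind,
        # Base64 keeps arbitrary audio bytes safe inside a JSON string.
        "payload": base64.b64encode(payload).decode("ascii"),
    }
    return json.dumps(body).encode("utf-8")


def decode_message(raw: bytes):
    """Recover the message kind and the original payload bytes on the display device."""
    body = json.loads(raw.decode("utf-8"))
    if body.get("kind") not in VALID_KINDS:
        raise ValueError("unexpected message kind")
    return body["kind"], base64.b64decode(body["payload"])
```

In such a sketch, a control device that has already performed the conversion would call `encode_message("text", text.encode("utf-8"))`, while a control device that defers conversion would send the digitized audio with the kind "voice".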
Once the voice data is converted to text data at block 230, the voice to text conversion may be verified at block 236. At block 236, the voice to text conversion may be verified by a user. The conversion may be verified by the user by first displaying the converted text on a display (e.g. display 140), as depicted in
In some situations, the voice to text data conversion at block 230 may produce multiple text strings. For example, a user may say the word “email” and the voice to text application or algorithm may generate two or more alternative text strings such as “delete email,” “send email,” or “save email,” as depicted in
At block 240, text data is matched with an identifier that is associated with a list of identifiers for controlling operations of services, such as network based services, which include applications and content. The listing of identifiers may contain identifiers associated with the provided services. The listing of identifiers may also contain identifiers associated with actions that control the provided services. For example, the identifiers may include the actions of playing, pausing, stopping, or traversing content. The identifiers may further include the actions of navigating, selecting, or interacting with applications. In short, the listing of identifiers may include identifiers that correspond to any action that could be performed with another form of input on services such as network based services, which include applications and content. Furthermore, it should be understood that the listing of identifiers is not static. The listing of identifiers may be updated, augmented, or otherwise changed when content or an application provided by the network based services or otherwise is updated or changed.
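As a non-limiting illustration, the matching at block 240 may be pictured as a normalized lookup against the listing of identifiers. In the following Python sketch, the names `IDENTIFIERS`, `normalize`, and `match_identifier` are hypothetical, the sample identifiers are chosen only for the example, and the exact-match policy after normalization is an assumption; the disclosure does not limit how text data is compared with identifiers.

```python
from typing import Optional, Set

# Hypothetical listing of identifiers: names of services plus control actions.
IDENTIFIERS: Set[str] = {"email", "news", "play", "pause", "stop", "next", "back", "close"}


def normalize(text_data: str) -> str:
    """Lower-case the text and collapse whitespace so ' Pause ' compares equal to 'pause'."""
    return " ".join(text_data.lower().split())


def match_identifier(text_data: str, identifiers: Set[str] = IDENTIFIERS) -> Optional[str]:
    """Return the identifier matched by the text data, or None when nothing matches."""
    candidate = normalize(text_data)
    return candidate if candidate in identifiers else None
```

A return value of None corresponds to the case in which no identifier is matched and the user is to be notified.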
In another embodiment, the listing of identifiers may be updated, augmented, or otherwise changed when content or an application is selected to allow for actions and information within the selected content or application to be included in the listing of identifiers. For example, if an application is selected, the listing of identifiers may be augmented to include names of the content provided by and commands associated with the selected application. In another embodiment, the listing of identifiers may be updated, augmented, or otherwise changed to incorporate actions and information within content or applications before they are selected by a user. In another embodiment, the listing of identifiers may include identifiers associated with user generated commands. For example, a user may specify that the phrase “3×” may be associated with the command to fast forward content at 3× speed.
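A listing of identifiers that is not static might be organized as in the following Python sketch, which is offered only as one possible arrangement. The class name `IdentifierListing`, its method names, and the command string `fast_forward_3x` are hypothetical and do not appear in the disclosure. The sketch keeps base identifiers, identifiers contributed by a selected application, and user generated commands in separate collections so that selecting a different application replaces only the application-specific entries.

```python
class IdentifierListing:
    """Hypothetical container for a listing of identifiers that changes over time."""

    def __init__(self, base_identifiers):
        self._base = {i.lower() for i in base_identifiers}  # always available
        self._application = set()                           # from the selected application
        self._user_commands = {}                            # user phrase -> command name

    def select_application(self, content_names, commands):
        """Augment the listing with content names and commands of the selected application."""
        self._application = {n.lower() for n in content_names} | {c.lower() for c in commands}

    def add_user_command(self, phrase, command):
        """Associate a user generated phrase with an existing command."""
        self._user_commands[phrase.lower()] = command

    def resolve(self, text_data):
        """Return the command for a user phrase, the matched identifier, or None."""
        key = text_data.lower()
        if key in self._user_commands:
            return self._user_commands[key]
        if key in self._base or key in self._application:
            return key
        return None
```

Under these assumptions, a user who registers the phrase “3×” with `add_user_command` would have that phrase resolved to the fast forward command, while an identifier such as the name of a mailbox would resolve only after the corresponding application has been selected.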
Referring again to
In certain embodiments where text data is not verified and more than one text string is produced from the conversion from voice data to text data, process 200 may attempt to match the provided text strings until one text string matches an identifier. Alternatively, process 200 may attempt to match every provided text string with an identifier. In this situation, if more than one text string matches an identifier, process 200 may verify with the user which text string is the correct conversion of the voice data received at block 220. The process 200 may verify with the user by displaying the multiple text strings to the user as depicted in
If the text data is matched with an identifier from the list of identifiers, process 200 proceeds to block 250. If the text data is not matched with an identifier from the list of identifiers, process 200 proceeds to block 246. At block 246, the user is notified that an identifier was not matched to the text data. The user may be notified by displaying a text box on a display (e.g. display 140). Alternatively, the user may be notified by a change in the visual appearance of a display, such as one or more of pulsing, fading, flashing, or undergoing other changes. Alternatively, an audio recording, such as a beep, tone, or voice recording, may indicate to the user that the voice data was not matched with an identifier at block 240. It should be understood that a variety of ways might be used to notify the user that a match was not made between the voice data and the list of identifiers. After notifying the user that an identifier was not matched to the text data at block 246, process 200 may return to block 210 to await another trigger.
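The branch between a successful match, an ambiguous match, and no match may be sketched in Python as shown below. This is a minimal sketch under stated assumptions: the function names `match_candidates` and `resolve_match` are hypothetical, and the callbacks `ask_user` and `notify_user` stand in for whatever display or audio mechanism a given embodiment uses to verify a conversion or to report that no identifier was matched.

```python
def match_candidates(text_strings, identifiers):
    """Return every candidate text string that matches an identifier, in order."""
    return [t for t in text_strings if t.lower() in identifiers]


def resolve_match(text_strings, identifiers, ask_user, notify_user):
    """Pick the identifier to act on, or return None after notifying the user.

    ask_user(matches) -> one element of matches (user verification).
    notify_user(message) -> None (e.g., a text box, a visual change, or a tone).
    """
    matches = match_candidates(text_strings, identifiers)
    if not matches:
        # No identifier matched: notify the user (block 246); the caller may
        # then return to block 210 to await another trigger.
        notify_user("No identifier matched the spoken command")
        return None
    if len(matches) == 1:
        return matches[0].lower()
    # More than one text string matched: let the user choose the correct one.
    return ask_user(matches).lower()
```

A caller that receives an identifier would proceed to block 250; a caller that receives None would await another trigger.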
With the text data matched to an identifier, process 200 controls one of the services provided by the application at block 250 according to the identifier matched with the text data. Each identifier within the list of identifiers may be associated with a command for controlling the services provided by the main application. Each identifier within the list of identifiers may be associated with a certain API (i.e., application programming interface) for the provided services. The name of a service, such as an application, may be an identifier that is associated with the command to launch the application. Here, the identifier may be linked to the API to launch an application. For example, if the identifier matched with the text data is the word “email,” then process 200 may launch an email application using a device (e.g., device 102) and display the application on a display (e.g., display 140). Furthermore, the name of content, such as music content, may be an identifier that is associated with the command to launch the music content.
For example, if the identifier matched with the text data is the word “mozart,” then process 200 may launch music composed by “mozart”.
Additional identifiers, besides the names of a service, may also be included in the list of identifiers. Words such as “play,” “pause,” “stop,” “next,” “back,” “forward,” “close,” and a host of other words associated with navigating and controlling services, channels, and sub-features may be used as identifiers. These identifiers may be matched with text data during process 200. After being matched with text data, process 200 may control services according to the matched identifier at block 250. For example, if a movie was being displayed and the identifier matched with the text data is “pause,” then the movie may be paused. In another example, if an application had been launched, such as email, and the identifier matched with the text data is “close,” then the email may be closed. Process 200 thus enables a user to use their voice to control network based services, applications, content, or any combination of the three on a device.
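The association between a matched identifier and a command at block 250 may be illustrated by a simple dispatch table. In the Python sketch below, the `Player` class is a hypothetical stand-in for a service exposing control calls, and `build_command_table` and `control_service` are likewise illustrative names; an actual embodiment would link each identifier to whatever API the provided service exposes.

```python
class Player:
    """Hypothetical stand-in for a service (e.g., movie playback) with control calls."""

    def __init__(self):
        self.state = "playing"

    def play(self):
        self.state = "playing"

    def pause(self):
        self.state = "paused"

    def stop(self):
        self.state = "stopped"


def build_command_table(player):
    """Associate each identifier with the call that carries out its command."""
    return {"play": player.play, "pause": player.pause, "stop": player.stop}


def control_service(identifier, command_table):
    """Invoke the command linked to the identifier; return False if none is linked."""
    handler = command_table.get(identifier)
    if handler is None:
        return False
    handler()
    return True
```

For instance, with a movie being displayed, passing the identifier “pause” to `control_service` would invoke the pause call of the hypothetical player, mirroring the pause example above.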
Referring now to
Referring now to
As depicted in
As further depicted in
Microphone 580 may be configured to detect voice data and other audio data from a user or another source. I/O interface 576 may be employed to communicate with the processor 572. I/O interface 576 may include one or more buttons for user input, such as a numerical keypad, volume control, menu controls, pointing device, track ball, mode selection buttons, and playback functionality (e.g., play, stop, pause, forward, reverse, slow motion, etc.). Buttons of I/O interface 576 may include hard and soft buttons, wherein functionality of the soft buttons may be based on one or more applications running on control device 570. I/O interface 576 may include one or more elements to allow control device 570 to communicate over wired or wireless links. For example, I/O interface 576 may allow for communication between device 502 and control device 570. For example, control device 570 may send data wirelessly to device 502 to control operation of device 502. I/O interface 520 may also include one or more ports for receiving data, including ports for removable memory. I/O interface 520 may be configured to allow for network-based communications including but not limited to LAN, WAN, Wi-Fi, Bluetooth, etc.
While this disclosure has been particularly shown and described with references to exemplary embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the embodiments encompassed by the appended claims.