1. Field of the Invention
The present invention relates to the field of computing device interfaces and, more particularly, to user positional anchors for directional, user controlled audio playback from voice-enabled interfaces.
2. Description of the Related Art
Voice-enabled interfaces are able to accept and process speech input and/or to produce speech output. Voice-enabled interfaces are particularly advantageous for interacting with mobile and embedded computing devices, which often have limited input/output peripherals due to their compact size and/or restrictions of their intended operational environment. Speech based interactions can be highly advantageous in situations where a device user is performing one or more tasks that require focused attention (e.g., driving or walking). For instance, media playing mobile devices and/or mobile telephones can be dangerous when they require a user to look at an LCD screen and to manipulate selection controls by hand. Despite this potential danger, visual and tactile based controls remain the most commonly implemented and used interactive mechanisms for mobile computing devices.
One reason that visual/tactile interactions remain predominant is that conventional voice-enabled interface controls are cumbersome to use in many common, recurring situations. For example, a device that audibly enumerates long playlists of selectable songs can quickly try a user's patience. Indexing a large set of songs by artist, album, and/or customizable playlists and then audibly presenting organized subsets of songs mitigates the problem in some instances, but fails to resolve underlying systemic flaws.
For instance, hard drive equipped music playing devices can include hundreds of songs by a user preferred artist so that audibly enumerating available songs by the preferred artist results in too many entries for a user's comfort. In contrast, a user is able to quickly identify a desired song from a complete list of songs presented upon a scrollable visual display. What is needed is a new mechanism for interacting with computing devices that minimizes an amount of time a user is distracted by interactive controls (i.e., so that a user is not endangered while performing concurrent activities, such as driving), yet which permits a user to quickly target a desired item from a potentially large listing of items.
There are shown in the drawings, embodiments which are presently preferred, it being understood, however, that the invention is not limited to the precise arrangements and instrumentalities shown.
In various contemplated configurations, a rate of playback speed can be adjusted by a user. Further, audio samples can be played (e.g., an audio fast forward or audio reverse capability) to allow a user to quickly skip through audibly played content. When audio fast forwarding capabilities exist, a user can configure a sample duration of playback before skipping to another playback position and/or a distance of each audio skip. Additionally, in one embodiment a direction and speed of playback can be adjusted in proportion to a distance between a playback point and a previously established audio anchor. Thus a skip distance for an audio fast forwarding operation can automatically increase as distance from the audio anchor increases.
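By way of illustration only, the proportional behavior described above can be sketched as follows. The function names, the linear relationship, and the parameters `base_skip` and `gain` are assumptions made for this example; the specification does not prescribe a particular formula.

```python
def skip_distance(playback_pos: float, anchor_pos: float,
                  base_skip: float = 4.0, gain: float = 0.25) -> float:
    """Return a skip distance that grows with distance from the audio anchor.

    base_skip and gain are hypothetical tuning values, not taken from the
    specification. Positions and distances share one unit (e.g., seconds).
    """
    return base_skip + gain * abs(playback_pos - anchor_pos)


def next_position(playback_pos: float, anchor_pos: float,
                  direction: int = 1) -> float:
    """Advance the playback point by one audio skip.

    direction is +1 for audio fast forward and -1 for audio reverse.
    """
    return playback_pos + direction * skip_distance(playback_pos, anchor_pos)
```

Under these assumptions, each successive fast-forward skip covers more content than the previous one, because the playback point moves farther from the anchor with every skip.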
As illustrated, the device 100 can include an audio transducer 110, a voice user interface 116, and an anchor processor 120, as well as an optional set of tactile controls 114 and an optional display 112. In various embodiments, the device 100 can be a media player, an entertainment system, a mobile phone, a desktop computer, a laptop computer, a navigation system, an embedded computing device, a standalone consumer electronic device, a kiosk, or another such device.
The audio transducer 110 of device 100 can include a speaker and/or microphone which plays audio output and/or accepts audio input. Audio interactions between a user and the device 100 can occur via the voice user interface (VUI) 116. The VUI 116 can be a voice-only interface or can be a voice interfacing component of a multimodal interface. The display 112 and/or tactile controls 114 can be selectively included in embodiments that visually present content and/or that accept tactile input. The device 100 can also include one or more speech processing components (not shown) or be communicatively linked via a transceiver (not shown) to a speech processing system. The optional speech processing components can include a speech recognition engine for processing received audio input and/or a speech synthesizer for generating speech output from text. Speech output from device 100 need not be output converted from text, but can instead result from a playing of stored audio files that contain encoded speech. Audio anchors can be established and manipulated by the tactile controls 114, by voice commands, and/or by GUI based controls.
The anchor processor 120 can handle operations related to audio anchors, such as establishing audio anchors, removing audio anchors, setting audio anchor parameters, modifying device 100 behavior in accordance with established audio anchor parameters, playing content from an audio anchor, and the like. The anchor processor 120 can utilize one or more configuration parameters 124-127, which can be stored in memory space 122. The configuration parameters can include an anchor position 124, an anchor direction 125, an anchor magnitude 126, an anchor mode 127, and the like.
The anchor position 124 can specify a user established point within content that is to be audibly presented. The anchor direction 125 can indicate whether playback from the anchor point is to be forward, backward, from top-to-bottom, from bottom-to-top, from right-to-left, from left-to-right, and the like. The anchor magnitude 126 can include a rate of playback. The anchor magnitude 126 can also indicate a skipping distance and/or sampling duration for audio fast forwarding operations. The anchor mode 127 can be a configurable mode used to interpret a meaning intended for overloaded operators. For example, if the anchor mode 127 is in an audio fast forwarding configuration, pressing an overloaded tactile control (e.g., a minus sign or a less than arrow) can indicate that a skipping distance is to be decreased. When the anchor mode 127 is in a playback rate configuration, pressing the same control as before (e.g., a minus sign or a less than arrow) can decrease an audio playback rate.
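As a minimal sketch of how the configuration parameters 124-127 and the overloaded control might be modeled in software, consider the following. The class names, field names, step sizes, and lower bounds are hypothetical and serve only to illustrate how the anchor mode 127 can select the meaning of a single control.

```python
from dataclasses import dataclass
from enum import Enum


class AnchorMode(Enum):
    """Hypothetical values for the anchor mode 127."""
    FAST_FORWARD = "fast_forward"
    PLAYBACK_RATE = "playback_rate"


@dataclass
class AnchorConfig:
    position: int = 0             # anchor position 124
    direction: int = 1            # anchor direction 125: +1 forward, -1 backward
    playback_rate: float = 1.0    # anchor magnitude 126: rate of playback
    skip_distance: float = 10.0   # anchor magnitude 126: skipping distance
    mode: AnchorMode = AnchorMode.PLAYBACK_RATE  # anchor mode 127


def press_minus(cfg: AnchorConfig, rate_step: float = 0.25,
                skip_step: float = 2.0) -> AnchorConfig:
    """Interpret one press of the overloaded 'minus' control.

    The active mode decides which magnitude is decreased. Each value is
    kept at or above one step so it never reaches zero (an assumption).
    """
    if cfg.mode is AnchorMode.FAST_FORWARD:
        cfg.skip_distance = max(skip_step, cfg.skip_distance - skip_step)
    else:
        cfg.playback_rate = max(rate_step, cfg.playback_rate - rate_step)
    return cfg
```

In this sketch the same key press changes the skipping distance in the audio fast forwarding configuration and the playback rate in the playback rate configuration, leaving the other value untouched.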
In one arrangement, speech processing technologies can use a set of voice commands to establish and utilize audio anchors (as opposed to utilizing controls 215). Any of a variety of different voice commands (e.g., “anchor” for establishing an audio anchor, “faster” for increasing a speaking rate, “slower” for decreasing a speaking rate, “reverse” for changing an enumeration direction, and the like) can be used.
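The example voice commands listed above could be dispatched along the following lines once a speech recognition engine has returned a recognized word. This is an illustrative sketch only: the state dictionary, its keys, and the 0.25 rate step are assumptions, and speech recognition itself is outside the scope of the example.

```python
def apply_voice_command(state: dict, command: str) -> dict:
    """Update playback state in response to one recognized voice command.

    state is a hypothetical dictionary with the keys 'position', 'anchor',
    'rate', and 'direction' (+1 forward, -1 backward).
    """
    word = command.strip().lower()
    if word == "anchor":
        # Establish an audio anchor at the current playback position.
        state["anchor"] = state["position"]
    elif word == "faster":
        state["rate"] += 0.25
    elif word == "slower":
        # Keep the speaking rate positive.
        state["rate"] = max(0.25, state["rate"] - 0.25)
    elif word == "reverse":
        # Change the enumeration direction.
        state["direction"] = -state["direction"]
    else:
        raise ValueError(f"unrecognized voice command: {command!r}")
    return state
```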
The tactile controls 215 can include any of a variety of controls, such as a main selector 220, a mode control 222, a magnitude control 224, a backward direction control 226, and a forward direction control 228. Each of the controls 215 can be overloaded. The display 230 can include a list of interface items 232. One of the interface items 232 can have focus 234 that can be visually indicated in display 230. The controls 215 and display 230 are to illustrate concepts only and the illustrated arrangement is not to be construed as a limitation of the scope of the device.
For example, in one contemplated embodiment (not shown), the controls 215 can include a Force Sensing Resistor (FSR) region, such as a region of a click wheel control used for many popular media playing devices (e.g., the IPOD). A rate of movement of a finger along the FSR region can determine a speed of a fast-forward or fast-rewind operation and/or a magnitude of a change made to a playback rate. In other embodiments, controls 215 can include a scroll wheel, a rotating dial, a twistable handle, an accelerometer, and the like that can each be used to adjust a playback rate, an enumeration direction, and/or a fast-forward/fast-rewind rate.
It should be emphasized that one advantage of the arrangement shown in
The interface 310 can include interface items for contacts, relation, phone, an item list, and user comments. An audio anchor 330 can be established near the relation element. An anchor direction 332 of forward and an anchor magnitude 334 of four can be established. The magnitude 334 can indicate a rate of speech playback, which can be adjusted. A forward anchor direction can indicate that items are to be enumerated from left-to-right and from top-to-bottom starting at the audio anchor 330. Thus, a voice user interface 340 can audibly enumerate “Select relation . . . Family” followed by “Item List . . . Item A; Item B; Item C; Item D” followed by “Phone . . . 555-1234” as shown. If the anchor direction 332 were set to backwards, then voice user interface 340 could audibly enumerate “Select Contact . . . Jim Smith.”
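The directional enumeration in this example can be sketched as shown below. The sketch assumes that the interface items have already been flattened into reading order (left-to-right, top-to-bottom) and that the audio anchor is modeled as a boundary index immediately before the item it is nearest to; both assumptions, along with the function name and the sample item order, are illustrative rather than taken from the specification.

```python
def enumerate_from_anchor(items: list, anchor_index: int,
                          direction: str = "forward") -> list:
    """Return the interface items in the order they would be spoken.

    items is assumed to be in reading order. anchor_index marks a boundary
    placed just before items[anchor_index] (an assumption of this sketch).
    """
    if direction == "forward":
        # Speak the item at the anchor, then everything after it.
        return items[anchor_index:]
    if direction == "backward":
        # Speak the items before the anchor, nearest one first.
        return items[:anchor_index][::-1]
    raise ValueError(f"unknown direction: {direction!r}")


# Hypothetical reading order for an interface similar to the one described.
ITEMS = [
    "Select Contact . . . Jim Smith",
    "Select relation . . . Family",
    "Item List . . . Item A; Item B; Item C; Item D",
    "Phone . . . 555-1234",
]
```

With the anchor at index 1, a forward direction yields the relation, item list, and phone entries in that order, while a backward direction yields only the contact entry.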
The present invention may be realized in hardware, software, or a combination of hardware and software. The present invention may be realized in a centralized fashion in one computer system, or in a distributed fashion where different elements are spread across several interconnected computer systems. Any kind of computer system or other apparatus adapted for carrying out the methods described herein is suited. A typical combination of hardware and software may be a general purpose computer system with a computer program that, when being loaded and executed, controls the computer system such that it carries out the methods described herein.
The present invention also may be embedded in a computer program product, which comprises all the features enabling the implementation of the methods described herein, and which when loaded in a computer system is able to carry out these methods. Computer program in the present context means any expression, in any language, code or notation, of a set of instructions intended to cause a system having an information processing capability to perform a particular function either directly or after either or both of the following: a) conversion to another language, code or notation; b) reproduction in a different material form.
This invention may be embodied in other forms without departing from the spirit or essential attributes thereof. Accordingly, reference should be made to the following claims, rather than to the foregoing specification, as indicating the scope of the invention.