The described embodiments relate to content-analysis techniques, including identifying audio/video (A/V) content based on analysis of associated thumbnail images.
The versatility and capabilities of consumer-electronic devices are increasing their popularity. For example, the communication capabilities of these electronic devices allow users to access content from a wide variety of sources, including high-definition content.
However, while the electronic devices typically include high-resolution displays that allow users to view high-definition content, the interface circuits and the communication bandwidth in many electronic devices can pose obstacles to simultaneous viewing of high-definition content.
In addition, the user interfaces associated with many electronic devices can be difficult to use. For example, the process of identifying content from a particular source, selecting the content and having the content piped to a particular display often requires that users perform multiple operations. This convoluted process is time-consuming and cumbersome. Moreover, users often make mistakes when attempting to navigate through a complicated set of options in different menus, which frustrates users and degrades their user experience.
The described embodiments include an audio/video (A/V) hub. This A/V hub includes: an antenna; an interface circuit that, during operation, communicates with an A/V display device and a portable electronic device; and a control circuit coupled to the interface circuit. During operation, the control circuit receives, via the interface circuit, user-interface activity information from the portable electronic device that specifies a selection at a location in a user interface (which may be displayed on the portable electronic device), where an image in a set of images is displayed at or proximate to the location. Note that the image is associated with A/V content from a content provider, and the A/V content is unknown to the A/V hub. Then, the control circuit provides, via the interface circuit, a request to the content provider with information that specifies the location. In response to the request, the control circuit receives, via the interface circuit, the A/V content from the content provider, and the control circuit provides the A/V content to the A/V display device. Next, the control circuit identifies the A/V content by performing image analysis on the image based on predefined available A/V content from the content provider.
Note that the identified A/V content may include: a television program (such as a broadcast-television program), a cable-television program, an entertainment event, and/or a movie. For example, the entertainment event may include: a concert, and/or a sporting event.
Moreover, the predefined available A/V content may include a broadcast or transmission schedule of the content provider.
Furthermore, the image analysis may include identification of text in the image, and the identification may involve comparing the text to a set of titles associated with the predefined available A/V content. Alternatively or additionally, the identification may involve performing image analysis on the set of images. Prior to the image analysis, the control circuit may correct for an orientation (such as tilting) of the image.
In some embodiments, the control circuit identifies the A/V content by analyzing audio in the A/V content and/or by analyzing video in the A/V content.
Moreover, the control circuit may include: a processor; and a memory, coupled to the processor, which stores a program module. During operation, the program module may be executed by the processor. This program module may include instructions for at least some operations performed by the control circuit.
Another embodiment provides a computer-program product for use with the A/V hub. This computer-program product includes instructions for at least some of the operations performed by the A/V hub.
Another embodiment provides a method for identifying A/V content. This method includes at least some of the operations performed by the A/V hub.
Another embodiment provides the A/V display device and/or the portable electronic device.
This Summary is provided merely for purposes of illustrating some exemplary embodiments, so as to provide a basic understanding of some aspects of the subject matter described herein. Accordingly, it will be appreciated that the above-described features are merely examples and should not be construed to narrow the scope or spirit of the subject matter described herein in any way. Other features, aspects, and advantages of the subject matter described herein will become apparent from the following Detailed Description, Figures, and Claims.
Note that like reference numerals refer to corresponding parts throughout the drawings. Moreover, multiple instances of the same part are designated by a common prefix separated from an instance number by a dash.
An audio/video (A/V) hub that identifies A/V content is described. In particular, the A/V hub may identify the A/V content based on an image associated with the A/V content in a user interface that was selected by a user. For example, the user interface may include, at different locations, images (such as thumbnail images) that are associated with different A/V content. When the user selects the A/V content associated with one of the images (such as by activating a virtual command icon at a location in the display), the A/V hub may request the A/V content from a content provider. However, while the A/V content is indirectly requested based on the location (i.e., based on the user's actions or user-interface activity), the specific A/V content may be unknown to the A/V hub. Consequently, the A/V hub may identify the A/V content by performing image analysis on the image based on predefined available A/V content from the content provider.
By identifying the specific A/V content (which may be provided by one of multiple different content providers), this content-analysis technique may facilitate additional value-added services that can be provided to the user. For example, the A/V hub may be able to determine patterns of usage by the user, which may allow automated display of the A/V content or related A/V content (such as A/V content having similar characteristics) to the user. This may simplify use of the A/V hub, thereby reducing user frustration and/or improving the user experience when using the A/V hub, an A/V display device that displays the A/V content, and/or a portable electronic device that displays the user interface. Consequently, the content-analysis technique may increase customer loyalty and revenue of a provider of the A/V hub.
In the discussion that follows, the A/V hub (which is sometimes referred to as an ‘electronic device’), the A/V display device, the portable electronic device, one or more content sources (which may be associated with one or more content providers), and/or another electronic device (such as a speaker and, more generally, a consumer-electronic device) may include radios that communicate packets or frames in accordance with one or more communication protocols, such as: an Institute of Electrical and Electronics Engineers (IEEE) 802.11 standard (which is sometimes referred to as ‘Wi-Fi®,’ from the Wi-Fi® Alliance of Austin, Tex.), Bluetooth® (from the Bluetooth Special Interest Group of Kirkland, Wash.), a cellular-telephone communication protocol, a near-field-communication standard or specification (from the NFC Forum of Wakefield, Mass.), and/or another type of wireless interface. In the discussion that follows, Wi-Fi is used as an illustrative example. For example, the cellular-telephone communication protocol may include or may be compatible with: a 2nd generation of mobile telecommunications technology, a 3rd generation of mobile telecommunications technology (such as a communication protocol that complies with the International Mobile Telecommunications-2000 specifications by the International Telecommunication Union of Geneva, Switzerland), a 4th generation of mobile telecommunications technology (such as a communication protocol that complies with the International Mobile Telecommunications Advanced specification by the International Telecommunication Union of Geneva, Switzerland), and/or another cellular-telephone communication technique. In some embodiments, the communication protocol includes Long Term Evolution or LTE. However, a wide variety of communication protocols may be used (such as Ethernet or a power-line communication protocol). In addition, the communication may occur via a wide variety of frequency bands. Note that the portable electronic device, the A/V hub, the A/V display device, the one or more content sources, one or more speakers and/or another electronic device may communicate using infra-red communication that is compatible with an infra-red communication standard (including unidirectional or bidirectional infra-red communication).
Moreover, A/V content in the following discussion may include video and associated audio (such as music, sound, dialog, etc.), video only, or audio only.
Communication among electronic devices is shown in
As described further below with reference to
As can be seen in
In the described embodiments, processing of a packet or frame in portable electronic device 110 and A/V hub 112 (and optionally one or more of the one or more A/V display devices 114, the one or more speakers 116 and/or the one or more content sources 126) includes: receiving wireless signals 120 with the packet or frame; decoding/extracting the packet or frame from received wireless signals 120 to acquire the packet or frame; and processing the packet or frame to determine information contained in the packet or frame (such as the information associated with a data stream). For example, the information from portable electronic device 110 may include user-interface activity information associated with a user interface displayed on touch-sensitive display (TSD) 124 in portable electronic device 110, which a user of portable electronic device 110 uses to control A/V hub 112, the one or more A/V display devices 114, the one or more speakers 116 and/or one of the one or more content sources 126. (In some embodiments, instead of or in addition to touch-sensitive display 124, portable electronic device 110 includes a user interface with physical knobs and/or buttons that a user can use to control A/V hub 112, the one or more A/V display devices 114, the one or more speakers 116 and/or one of the one or more content sources 126.) Alternatively, the information from A/V hub 112 may include device-state information about a current device state of one or more of A/V display devices 114, one or more of speakers 116 or one of the one or more content sources 126 (such as on, off, play, rewind, fast forward, a selected channel, selected content, a content source, etc.), or may include user-interface information for the user interface (which may be dynamically updated based on the device-state information and/or the user-interface activity information). Furthermore, the information from A/V hub 112 and/or one of the one or more content sources 126 may include audio and video (which is sometimes denoted as ‘audio/video’ or ‘A/V’) that are displayed or presented on one or more of A/V display devices 114 and/or one or more of speakers 116, as well as display instructions that specify how the audio and video are to be displayed.
However, as noted previously, the audio and video may be communicated between components in system 100 via wired communication. Therefore, as shown in
Note that A/V hub 112 may determine display instructions (with a display layout) for the A/V content based on a format of a display in one of A/V display devices 114, such as A/V display device 114-1. Alternatively, A/V hub 112 can use pre-determined display instructions or A/V hub 112 can modify or transform the A/V content based on the display layout so that the modified or transformed A/V content has an appropriate format for display on the display. Moreover, the display instructions may specify information to be displayed on the display in A/V display device 114-1, including where A/V content is displayed (such as in a central window, in a tiled window, etc.). Consequently, the information to be displayed (i.e., an instance of the display instructions) may be based on a format of the display, such as: a display size, display resolution, display aspect ratio, display contrast ratio, a display type, etc. Furthermore, note that when A/V hub 112 receives the A/V content from one of content sources 126, A/V hub 112 may provide the A/V content and display instructions to A/V display device 114-1 as frames with the A/V content are received from one of content sources 126 (e.g., in real time), so that the A/V content is displayed on the display in A/V display device 114-1. For example, A/V hub 112 may collect the A/V content in a buffer until a frame is received, and then A/V hub 112 may provide the complete frame to A/V display device 114-1. Alternatively, A/V hub 112 may provide packets with portions of a frame to A/V display device 114-1 as they are received. In some embodiments, the display instructions may be provided to A/V display device 114-1 differentially (such as when the display instructions change), regularly or periodically (such as in one of every N packets or in a packet in each frame) or in each packet.
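For instance, the buffering behavior described above might be implemented along the following lines. This is a minimal Python sketch, not part of the described embodiments: the send transport, the end-of-frame marker and the packet structure are assumptions made for illustration.

    from dataclasses import dataclass, field
    from typing import Callable, Optional

    @dataclass
    class FrameForwarder:
        # Buffers incoming A/V packets and forwards complete frames, attaching
        # display instructions differentially (only when they change).
        send: Callable[[dict], None]  # hypothetical transport to the A/V display device
        buffer: bytearray = field(default_factory=bytearray)
        last_instructions: Optional[dict] = None

        def on_packet(self, payload: bytes, end_of_frame: bool, instructions: dict) -> None:
            self.buffer.extend(payload)
            if not end_of_frame:
                return  # keep collecting until the complete frame has been received
            frame = bytes(self.buffer)
            self.buffer.clear()
            # Provide display instructions differentially, i.e., only on change.
            if instructions != self.last_instructions:
                self.send({"type": "display-instructions", "body": instructions})
                self.last_instructions = instructions
            self.send({"type": "frame", "body": frame})

In this sketch the alternative behavior (forwarding packets with portions of a frame as they are received) would simply bypass the buffering step.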
Moreover, note that the communication between portable electronic device 110 and A/V hub 112 (and optionally one or more of the one or more A/V display devices 114, the one or more speakers 116 and/or the one or more content sources 126) may be characterized by a variety of performance metrics, such as: a data rate, a data rate discounting radio protocol overhead (which is sometimes referred to as a ‘throughput’), an error rate (such as a packet error rate, or a retry or resend rate), a mean-square error of equalized signals relative to an equalization target, intersymbol interference, multipath interference, a signal-to-noise ratio, a width of an eye pattern, a ratio of number of bytes successfully communicated during a time interval (such as 1-10 s) to an estimated maximum number of bytes that can be communicated in the time interval (the latter of which is sometimes referred to as the ‘capacity’ of a channel or link), and/or a ratio of an actual data rate to an estimated data rate (which is sometimes referred to as ‘utilization’). Moreover, the performance during the communication associated with different channels may be monitored individually or jointly (e.g., to identify dropped packets).
The communication between portable electronic device 110 and A/V hub 112 (and optionally one or more of the one or more A/V display devices 114, the one or more speakers 116 and/or the one or more content sources 126) in
As noted previously, a user may control A/V hub 112, one or more of the one or more A/V display devices 114, one or more of the one or more speakers 116 and/or one of the one or more content sources 126 via the user interface displayed on touch-sensitive display 124 on portable electronic device 110. In particular, at a given time, the user interface may include one or more virtual icons that allow the user to activate, deactivate or change functionality or capabilities of A/V hub 112, one or more of A/V display devices 114, one or more speakers 116 and/or one or more of content sources 126. For example, a given virtual icon in the user interface may have an associated strike area on a surface of touch-sensitive display 124. If the user makes and then breaks contact with the surface (e.g., using one or more fingers or digits, or using a stylus) within the strike area, portable electronic device 110 (such as a processor executing a program module) may receive user-interface activity information indicating activation of this command or instruction from a touch-screen input/output (I/O) controller, which is coupled to touch-sensitive display 124. (Alternatively, touch-sensitive display 124 may be responsive to pressure. In these embodiments, the user may maintain contact with touch-sensitive display 124 with an average contact pressure that is usually less than a threshold value, such as 10-20 kPa, and may activate a given virtual icon by increasing the average contact pressure with touch-sensitive display 124 above the threshold value.) In response, the program module may instruct an interface circuit in portable electronic device 110 to wirelessly communicate the user-interface activity information indicating the command or instruction to A/V hub 112, and A/V hub 112 may communicate the command or the instruction to the target component in system 100 (such as A/V display device 114-1). This instruction or command may result in A/V display device 114-1 turning on or off, displaying A/V content from a particular content source, performing a trick mode of operation (such as fast forward, reverse, fast reverse or skip), etc. For example, A/V hub 112 may request the A/V content from content source 126-1, and then may provide the A/V content, along with display instructions, to A/V display device 114-1, so that A/V display device 114-1 displays the A/V content. Alternatively or additionally, A/V hub 112 may provide audio content associated with video content from content source 126-1 to one or more of speakers 116.
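As an illustration only, the strike-area test and the pressure threshold described above might look as follows in Python; the icon geometry, the helper names and the 15 kPa value (chosen within the 10-20 kPa range mentioned above) are assumptions.

    from dataclasses import dataclass
    from typing import Iterable, Optional

    @dataclass
    class VirtualIcon:
        name: str
        x: float
        y: float
        width: float
        height: float  # strike area on the touch-sensitive display

        def contains(self, px: float, py: float) -> bool:
            return (self.x <= px <= self.x + self.width and
                    self.y <= py <= self.y + self.height)

    PRESSURE_THRESHOLD_KPA = 15.0  # assumed value in the 10-20 kPa range

    def activated_icon(icons: Iterable[VirtualIcon], px: float, py: float,
                       pressure_kpa: Optional[float] = None) -> Optional[VirtualIcon]:
        # For pressure-sensitive displays, activation requires exceeding the threshold.
        if pressure_kpa is not None and pressure_kpa < PRESSURE_THRESHOLD_KPA:
            return None
        # Otherwise, the activated icon is the one whose strike area contains the touch.
        return next((icon for icon in icons if icon.contains(px, py)), None)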
One problem with using existing remote controls to control the operation of another component or electronic device is that the remote control does not receive any feedback from the electronic device. For example, many existing remote controls use infra-red communication. However, existing infra-red communication protocols are typically unidirectional or one-way, i.e., communication flows from the remote control to the electronic device. Consequently, the remote control usually does not have any knowledge of the effects of the commands or instructions that are communicated to the electronic device. In particular, the remote control is typically unaware of a current state of the electronic device, such as whether the electronic device is in: a power-on state, a power-off state, a playback state, a trick-mode state (such as fast forward, fast reverse, or skip), a pause state, a standby (reduced-power) state, a record state, a state in which A/V content associated with a given content source (such as cable television, a satellite network, a web page on the Internet, etc.) is received or provided, and/or another state. (Note that one or more of the states may be nested or concurrent with each other, such as the power-on state and the playback state.) By operating blindly in this way, existing remote controls are unable to leverage knowledge of the current state of the electronic device to improve the user experience.
This problem is addressed in system 100. In particular, as described further below with reference to
Using the device-state information, A/V hub 112 and/or portable electronic device 110 may dynamically adapt the user interface displayed on touch-sensitive display 124 on portable electronic device 110. For example, A/V hub 112 may provide, via radio 118-2, device-state information to portable electronic device 110 specifying a current state of the given electronic device. (Thus, this feedback technique may include bidirectional or two-way communication between A/V hub 112 and portable electronic device 110.) After radio 118-1 receives the device-state information, portable electronic device 110 (such as a program module executed in an environment, e.g., an operating system, in portable electronic device 110) may generate a user interface that includes one or more virtual command icons associated with the current state and one or more related states of the given electronic device. (Alternatively, portable electronic device 110 may select a predefined or predetermined user interface.) Note that the one or more related states may be related to the current state in a state diagram (which may be stored in memory in portable electronic device 110) by corresponding operations that transition the given electronic device from the current state to the one or more related states. Then, portable electronic device 110 may display the user interface on touch-sensitive display 124.
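By way of a hedged example, such a state diagram might be stored as a simple mapping, with the virtual command icons derived from the operations that leave the current state. The specific states and operations below are illustrative assumptions, not a definitive implementation.

    # Illustrative state diagram: current state -> {operation: related state}.
    STATE_DIAGRAM = {
        "power-off": {"power-on": "power-on"},
        "power-on": {"play": "playback", "power-off": "power-off"},
        "playback": {"pause": "pause", "fast-forward": "trick-mode", "stop": "power-on"},
        "pause": {"play": "playback", "stop": "power-on"},
        "trick-mode": {"play": "playback"},
    }

    def icons_for_state(current_state: str) -> list:
        # The virtual command icons are the operations that transition the
        # device from its current state to one of the related states.
        return sorted(STATE_DIAGRAM.get(current_state, {}))

    # e.g., icons_for_state("playback") -> ["fast-forward", "pause", "stop"]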
In some embodiments, A/V hub 112 provides information specifying the type of the given electronic device, the manufacturer of the given electronic device, and/or context information that specifies a context of content (such as A/V content or information specifying a content provider) displayed on the given electronic device (such as on a display in A/V display device 114-1). For example, the context may include a type of the A/V content (such as sports, television, a movie, etc.), a location in the A/V content (such as a timestamp, an identifier of a sub-section in the content and/or a proximity to a beginning or an end of the A/V content), etc. In these embodiments, the one or more virtual command icons (and, thus, the user interface) may be based on the type of the given electronic device, the manufacturer and/or the context. Thus, only virtual command icons that are relevant to the given electronic device, the manufacturer and/or the context may be included in the user interface. For example, if the user selects or activates a content source (such as a channel) that is associated with a particular content provider, the user interface may include a series of virtual command icons at different locations in the user interface that are associated with different A/V content from the content provider. In addition, the user interface may include images (such as thumbnail images) of the A/V content proximate or adjacent to the virtual command icons.
Moreover, when the user activates one of the virtual command icons in the user interface, the touch-screen I/O controller in portable electronic device 110 may provide user-interface activity information specifying activation of a virtual command icon in the one or more virtual command icons, where the activation of the virtual command icon specifies a transition of the given electronic device from the current state to a new current state in the state diagram. As noted previously, the activation of the virtual command icon may involve a user of portable electronic device 110 contacting touch-sensitive display 124 within a strike area of the virtual command icon and then releasing the contact. In response to receiving the user-interface activity information, portable electronic device 110 may: request the A/V content associated with the virtual command icon; modify the user interface to change the one or more virtual command icons based on the new current state; and display the modified user interface on touch-sensitive display 124. Note that portable electronic device 110 may wait to change the one or more virtual command icons until the device-state information received from A/V hub 112 indicates that the given electronic device has transitioned to the new current state in response to a command or an instruction associated with the activation of the one of the virtual command icons. Thus, portable electronic device 110 may repeatedly perform the generating and the displaying operations so that the user interface is dynamically updated as the current state changes.
Alternatively or additionally, instead of portable electronic device 110 generating the user interface, A/V hub 112 may generate user-interface information that specifies the user interface (or instructions specifying objects or graphical information in the user interface) based on the one or more related states in the state diagram (which may be stored in memory in A/V hub 112) and one or more of: the device-state information, the type of the given electronic device, the manufacturer of the given electronic device, the context, user-interface activity information specifying activation (by the user) of one of the virtual command icons in the user interface (which may be received, via radio 118-2, from portable electronic device 110), and/or a display format in portable electronic device 110. (Alternatively, instead of generating the user interface, A/V hub 112 may select predefined or predetermined user-interface information.) Then, A/V hub 112 may provide, via radios 118, the user-interface information to portable electronic device 110 for display on touch-sensitive display 124.
In this way, the user interface may be dynamically updated as the components in system 100 respond to commands or instructions received from portable electronic device 110 and/or A/V hub 112, so that the currently relevant one or more virtual icons are included in the user interface. This capability may simplify the user interface and make it easier for the user to navigate through and/or use the user interface.
Moreover, as described further below with reference to
However, from the perspective of A/V hub 112, the A/V content may still be unknown (i.e., A/V hub 112 may only ‘know’ indirect information that is associated with the A/V content, such as the location in the user interface). Thus, A/V hub 112 may not know the specific movie, television program, sporting event, entertainment event, etc. that the user selected. This ignorance may restrict the ability of A/V hub 112 to offer value-added services to the user, such as automating program recording, program offering and/or program selection on behalf of the user based on a history of user selections or activity when using A/V hub 112. In addition to an opportunity cost, the ignorance may also degrade the user experience, because the user may be forced to perform operations when using portable electronic device 110 and/or A/V hub 112, which can increase user frustration and, thus, may degrade the user experience.
In order to address this problem, A/V hub 112 may analyze additional information to directly identify the A/V content. For example, the content provider may provide a set of images (such as thumbnails) that are associated with the A/V content, and which are presented proximate to or adjacent to the locations and the virtual command icons in the user interface on portable electronic device 110. Then, when the user activates a virtual command icon at a location, the associated image may be analyzed to identify the selected A/V content. The image analysis may include optical character recognition of text in the image (such as a title of the A/V content, an actor in the A/V content, a director or a producer of the A/V content, etc.). In some embodiments, the image analysis includes a correction for tilting or a two- or three-dimensional orientation or rotation of the image, so that the image is transformed to a different two-dimensional plane that facilitates the image analysis. Furthermore, the image analysis may include pattern matching with a predefined group of A/V content provided by the content provider and/or face recognition of an actor in the predefined group of A/V content. In particular, the A/V content available from the content provider may be finite (such as a library with a finite number of movies or a broadcast or transmission schedule with a finite number of television programs, cable-television programs, sporting events or entertainment events), and this bound may be used to constrain (and, therefore, to simplify) the image analysis. Thus, instead of trying to identify the selected A/V content from tens of thousands of possible titles, the set of possible A/V content may be restricted to a few hundred titles available from the content provider or that the content provider is currently providing (which, as described further below with reference to
In some embodiments, the identification (or classification) involves analysis of audio or audio information and/or video or video information (e.g., at least a portion of one or more images) in the selected A/V content. Note that image analysis may be understood to be extracting meaningful information from image information, such as by using digital image processing. For example, the image analysis may include: optical character recognition or text recognition, two-dimensional object recognition, three-dimensional object recognition, image segmentation, motion detection, face recognition, etc. In some embodiments, the image analysis includes: computing the Levenshtein distance between text in the image and text in a set of images provided by the content provider, computing a histogram of one or more images in the A/V content, computing a Fourier transform of one or more images in the A/V content (and, more generally, performing another type of transformation on the A/V content), identifying individuals (e.g., based on their behaviors relative to stored historical behaviors, face recognition analysis, etc.), comparing descriptors in one or more images with a feature data structure (which is sometimes referred to as a ‘model library’) using an object-detection or recognition technique (such as: the scale-invariant feature transform (SIFT), speeded-up robust features (SURF), a binary descriptor (such as ORB), binary robust invariant scalable keypoints (BRISK), fast retinal keypoint (FREAK), etc.), performing video analysis (using a video-analysis technique such as: optical flow, a bag of systems representation, probabilistic kernels for the classification of auto-regressive visual processes, a mixture of dynamic textures, a histogram of oriented gradients, clouds of space-time interest points, mined hierarchical compound features, boosting efficient motion features, pyramid of histogram of gradients, scale-invariant feature transform, color histograms, bag of visual words representation, scene classification, face recognition, object recognition, etc.), and/or another analysis technique. In general, the text and video analysis may involve analyzing image information to determine one or more image features that characterize the image and/or the A/V content, which are then, respectively, compared to one or more image features associated with the set of images or the set of A/V content from the content provider to identify the A/V content based on the best match.
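One way to realize the constrained, text-based identification described in the two preceding paragraphs is sketched below in Python. The use of pytesseract for optical character recognition is an assumption (any OCR engine would do), and catalog_titles stands in for the content provider's finite catalog, such as its broadcast or transmission schedule.

    from PIL import Image
    import pytesseract  # assumed OCR engine; not part of the described embodiments

    def levenshtein(a: str, b: str) -> int:
        # Classic dynamic-programming edit distance.
        previous = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            current = [i]
            for j, cb in enumerate(b, 1):
                current.append(min(previous[j] + 1,                # deletion
                                   current[j - 1] + 1,             # insertion
                                   previous[j - 1] + (ca != cb)))  # substitution
            previous = current
        return previous[-1]

    def identify_content(thumbnail_path: str, catalog_titles: list) -> str:
        # Orientation or tilt correction (e.g., a perspective transform) would
        # precede OCR in a fuller implementation; it is omitted here.
        text = pytesseract.image_to_string(Image.open(thumbnail_path)).strip().lower()
        # Constrain the search to the content provider's finite catalog and
        # return the title with the smallest edit distance to the OCR'd text.
        return min(catalog_titles, key=lambda title: levenshtein(text, title.lower()))

Restricting the comparison to the catalog is what makes the matching tractable: a noisy OCR result need only beat a few hundred candidates rather than tens of thousands.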
Similarly, audio analysis may be understood to be extracting meaningful information from audio information and/or sound, such as by using digital signal processing. This audio analysis may involve analyzing audio information to determine acoustic features that characterize the audio information, which are then compared to acoustic features associated with the set of A/V content from the content provider to identify the A/V content based on the best match. Note that the acoustic features may specify: time-domain information, frequency-domain information, spectral content, Mel frequency cepstral coefficients, Mel spectrum, cepstrum, chroma features, a spectral flux, a spectral flatness, a zero-crossing rate, an attack-time, a temporal increase, a temporal decrease, an effective duration, a temporal centroid, an energy modulation, a frequency modulation of an energy envelope, one or more auto-correlation coefficients, energy information (such as: global energy, harmonic energy, noise energy), a root-mean-square level, a bandwidth, a band-energy ratio, a delta spectrum magnitude, a pitch, a pitch strength, a spectral centroid, a spectral spread, a spectral skewness, a spectral kurtosis, a spectral slope, a spectral decrease, a spectral roll-off, a spectral variation, a fundamental frequency, noisiness, inharmonicity, a harmonic spectral deviation, an odd-to-even harmonic energy ratio, a harmonic tristimulus, a total loudness, a specific loudness, a relative specific loudness, a roughness, a sharpness, a spread, a spectral crest factor, temporal statistics (such as: the mean, variance, and/or deviation), and/or acoustic features based on gammatone filters. Thus, the acoustic features may be used to identify an actor in the A/V content based on their voice or vocal characteristics and/or what they say in the A/V content (e.g., using a voice-recognition technique and a corpus of dialog in the set of A/V content).
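For example, a best-match identification based on one of the acoustic features listed above (Mel frequency cepstral coefficients) might be sketched as follows; the use of the librosa library, the signature construction and the simple Euclidean comparison are assumptions made for illustration.

    import numpy as np
    import librosa  # assumed audio-analysis library

    def acoustic_signature(path: str, n_mfcc: int = 13) -> np.ndarray:
        # Summarize a clip by the mean and standard deviation of its MFCCs.
        y, sr = librosa.load(path, sr=None)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
        return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

    def best_audio_match(query_path: str, catalog: dict) -> str:
        # catalog maps titles in the set of A/V content to stored signatures;
        # identify the A/V content as the entry closest to the query signature.
        query = acoustic_signature(query_path)
        return min(catalog, key=lambda title: np.linalg.norm(catalog[title] - query))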
In some embodiments, the acoustic features include first, second and/or higher order instantaneous derivatives of one or more of the preceding specified information. Alternatively or additionally, the acoustic features may be determined using a pooling function over sets of several acoustic features extracted at different temporal locations of the audio information, where the pooling function can be, but is not restricted to: maximum, minimum, variance, standard deviation, mean, higher-order moments, higher-order centered moments, median, (1−x)·100 percentiles (where x is a percentile, an order statistic and/or more generally any summary statistic related to the value of any given acoustic feature), and/or integrals over the sets of features. Other embodiments may include a bag of features and/or a permutation feature computed from one or more of the preceding specified information and acoustic features. For example, given a spectral feature that represents the energy of the audio information in different frequency bands, a permutation feature may be computed by sorting the frequency bands based on their energy, and using the result of ranking the frequency bands as the permutation feature. Furthermore, statistical models computed from one or more of the preceding specified information and features may be used as acoustic features. In this case, given a set of features for the A/V content that are computed at the same and/or different temporal locations in the audio information, the audio information may be represented using a statistical model that describes the shape of the distribution of the set of features. Additionally, the features may include one or more sets of weights, derived from one or more of the preceding specified information, features and statistical models for the audio information in one or more instances of A/V content (such as in the set of A/V content). For example, cleaner and more robust identification for a particular instance of A/V content can be produced by modeling the co-occurrences/correlations of different features for several instances of A/V content. In particular, given A/V content and a set of statistical models representing a predefined set of related or similar A/V content, the audio information in the given A/V content can be represented by a set of weights for the predefined set of A/V content. This set of weights may represent a high-level feature that can subsequently be used in a second stage of statistical modeling of the set of A/V content. In some embodiments, features derived by automatic recognition of speech and/or of the individuals who are talking, based on the audio information or on other features that characterize the audio information, may be used in the content-analysis technique.
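As a concrete and purely illustrative rendering of two of the constructions above, a pooling function over per-frame features and the permutation feature obtained by ranking frequency bands might be written as follows.

    import numpy as np

    def pooled(features: np.ndarray) -> np.ndarray:
        # Pool a (time x feature) matrix into summary statistics, as one
        # instance of the pooling functions described above.
        return np.concatenate([features.max(axis=0), features.min(axis=0),
                               features.mean(axis=0), features.std(axis=0),
                               np.median(features, axis=0)])

    def permutation_feature(band_energies: np.ndarray) -> np.ndarray:
        # Rank the frequency bands by energy; the permutation itself, not the
        # raw energies, serves as the feature. Applying argsort twice yields
        # each band's rank, e.g. [0.2, 0.9, 0.5] -> [0, 2, 1].
        return np.argsort(np.argsort(band_energies))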
More generally, the image features and/or the acoustic features may be determined using an unsupervised learning technique. For example, the acoustic or image features may include absolute or relative counts of prototype acoustic or visual patterns, where the prototype acoustic or visual patterns may be learned from possibly large amounts of unlabeled data using unsupervised learning techniques such as: deep belief nets, clustering, vector quantization, and/or wavelets.
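A hedged sketch of the counting of prototype patterns mentioned above, using k-means clustering as the vector-quantization step, appears below; the prototype count k and the use of scikit-learn are assumptions for illustration.

    import numpy as np
    from sklearn.cluster import KMeans  # assumed clustering library

    def learn_prototypes(unlabeled_features: np.ndarray, k: int = 64) -> KMeans:
        # Learn k prototype patterns from (possibly large amounts of)
        # unlabeled feature vectors, one row per observation.
        return KMeans(n_clusters=k, n_init=10).fit(unlabeled_features)

    def bag_of_patterns(features: np.ndarray, prototypes: KMeans) -> np.ndarray:
        # Represent content by the relative counts of its nearest prototypes.
        assignments = prototypes.predict(features)
        counts = np.bincount(assignments, minlength=prototypes.n_clusters)
        return counts / counts.sum()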
In these ways, the content-analysis technique may allow automated identification of selected A/V content even when a content provider of the A/V content does not explicitly share this information with A/V hub 112. This capability may allow A/V hub 112 to build a robust history of user selections or activity when using A/V hub 112, which in turn may allow value-added services to be provided to the user, such as automating program recording, program offering and/or program selection on behalf of the user. Consequently, the content-analysis technique may reduce user frustration and/or may improve the user experience when using portable electronic device 110, A/V hub 112, the one or more of A/V display devices 114, the one or more speakers 116 and/or the one or more content sources 126.
While the preceding embodiments illustrated the content-analysis technique with A/V content that is to be displayed on one of A/V display devices 114, in other embodiments the content-analysis technique may be used with audio content that is to be output by one or more of speakers 116. For example, a user may select initially unknown audio content (such as music) by making a selection at a location in a user interface. Then, while requesting and providing the selected audio content to one or more of speakers 116 (such as by providing a request that specifies the location to one of content sources 126), A/V hub 112 (or another component in system 100) may perform the content-analysis technique to identify the selected audio content.
Although we describe the network environment shown in
We now describe embodiments of a content-analysis technique.
In response to the request, the A/V hub receives, via the interface circuit, the A/V content (operation 214) from the content provider, and the A/V hub provides the A/V content (operation 216) to an A/V display device. Next, the A/V hub identifies the A/V content (operation 218) by performing analysis (such as image analysis) on the image based on predefined available A/V content from the content provider.
In some embodiments, prior to providing the A/V content (operation 216), the A/V hub optionally transforms the A/V content from an initial format that is compatible with an initial communication protocol to a format that is compatible with a different communication protocol. For example, the A/V hub may transform or convert the A/V content from an initial format that is compatible with a Transmission Control Protocol/Internet Protocol (TCP/IP) communication protocol to a format that is compatible with HDMI.
For example, interface circuit 318 in portable electronic device 110 may receive user-interface information 312, which is then provided to processor 320. Alternatively, processor 320 may generate user-interface information 312. Then, based on user-interface information 312, processor 320 may provide information specifying user interface 316 to touch-sensitive input/output (I/O) controller 322, which provides the information specifying user interface 316 to touch-sensitive display 124.
Moreover, touch-sensitive display 124 may provide information specifying user interaction 324 to touch-sensitive I/O controller 322. In turn, touch-sensitive I/O controller 322 may interpret the information specifying user interaction 324 to determine user-interface activity information 326. For example, user-interface activity information 326 may specify user selection of content source 126-1, such as user activation of the virtual command icon associated with content source 126-1. Touch-sensitive I/O controller 322 may provide user-interface activity information 326 to processor 320, which may provide user-interface activity information 326 to interface circuit 318.
Next, portable electronic device 110 (e.g., via interface circuit 318) may provide user-interface activity information 326 to A/V hub 112. After receiving user-interface activity information 326, interface circuit 314 may provide user-interface activity information 326 to processor 310. In response, processor 310 may instruct interface circuit 314 to provide request 328 for A/V content 330 (such as the A/V content) to content source 126-1. This request may be based on the virtual command icon that was activated or a location of the virtual command icon in the user interface (i.e., without explicit or direct knowledge of the A/V content). In addition, processor 310 may optionally determine display instructions 332 based on a format of a display in A/V display device 114-1. Alternatively, display instructions 332 may be predetermined or predefined.
After receiving request 328, content source 126-1 may provide A/V content 330 to A/V hub 112. Next, interface circuit 314 may optionally convert or transform 334 A/V content 330 from one format to another, such as from a format compatible with a TCP/IP communication protocol to a format compatible with a different communication protocol, such as HDMI. Moreover, interface circuit 314 may provide A/V content 330 and/or display instructions 332 (which may be provided differentially when there are changes, or regularly, such as in each packet or in one of every N packets) to A/V display device 114-1 as frames with A/V content 330 are received from content source 126-1, so that A/V content 330 is displayed on the display in A/V display device 114-1. (Alternatively, in some embodiments interface circuit 314 provides A/V content 330 to processor 310, which instructs interface circuit 314 to provide A/V content 330 and display instructions 332 to A/V display device 114-1 as frames with A/V content 330 are received from content source 126-1.)
As shown in
In an exemplary embodiment, the selected A/V content is a movie in a group of movies provided by a content provider via a content source. As shown in
When a user selects one of the movies by activating one of virtual command icons 512 (such as virtual command icon 512-4), the A/V hub may analyze image 514-4 to identify the movie. This is shown in
In this way, the content-analysis technique may allow the A/V hub to track user activity and A/V content selections, and to use this information to provide value-added services to the user. For example, the identified A/V content may be added to a history of user activity or selections, and this history may be used to automate future operations for the user. Thus, the A/V hub may use the history to determine which A/V content to: record or store for the user, recommend to the user and/or provide to the user at different locations, times of day, days of the week, etc. This capability may make it easier and more intuitive for a user to view A/V content using the A/V hub with a minimum of effort, thereby reducing user frustration and, thus, improving user satisfaction when using the portable electronic device, the A/V hub, and/or one or more A/V display devices. Consequently, method 200 (
In some embodiments of method 200 there are additional or fewer operations. Moreover, the order of the operations may be changed, and/or two or more operations may be combined into a single operation. Furthermore, one or more operations may be modified. For example, additionally, display instructions may be provided to an A/V display device differentially (such as when the display instructions change), regularly or periodically (such as in one of every N packets or in a packet in each frame) or in each packet.
Note that in this content-analysis technique the A/V hub may display the A/V content to an arbitrary A/V display device (including an A/V display device that is located remotely from the A/V hub, such as in another room) without a need for a separate set-top box that is located proximate to the A/V display device. Instead, the A/V hub may perform all of the frame-by-frame transcoding of the video content that is needed for the A/V display device to display the video content before providing the video content to the A/V display device. Thus, in contrast with many existing cable and satellite systems, the A/V hub may provide video content to multiple A/V display devices (such as N A/V display devices) without the use of N associated set-top boxes. Consequently, the A/V hub may eliminate the need for a separate set-top box in the same room as an A/V display device (although there may be a local wireless receiver that is associated with the A/V hub). This capability may be enabled by the A/V hub's knowledge of the device-state information and of the content selected by users. In addition, this capability may eliminate the need for a user to know where or how a particular A/V display device is connected to a content source, such as cable television, a satellite dish or a security camera.
While the preceding embodiments illustrated the A/V hub performing the content-analysis technique, in some embodiments some or all of the operations in the content-analysis technique are performed remotely, such as by a cloud-based computer (such as a server that is accessed via a network, e.g., the Internet).
We now describe embodiments of determining device-state information. As noted previously, the device-state information (such as whether an electronic device is: electrically coupled to A/V hub 112 in
When the electrical coupling between the electronic device and input connector 712 is detected, control logic 724 may optionally attempt to identify the electronic device by providing consumer-electronics-control commands (which may be compatible with an HDMI standard) to the electronic device. Alternatively or additionally (such as when the attempt is unsuccessful), control logic 724 may provide a set of first control commands associated with different types of electronic devices until, in response, content activity (such as packets or frames associated with a data stream of content communicated to and/or from the electronic device) is detected by control logic 724 via input connector 712. For example, the set of first commands may include: a play command for the different types of electronic devices; and/or a trick-mode command (such as fast forward, reverse, fast reverse, or skip) for the different types of electronic devices. Moreover, when the content activity is detected, control logic 724 may provide a set of second control commands associated with different providers of electronic devices until a change in a state of the electronic device is detected by control logic 724 via input connector 712 and state-detection circuit 710. The set of second control commands may include: power-on control commands for the different providers of electronic devices; and/or power-off control commands for the different providers of electronic devices.
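The two-stage probing described above might proceed roughly as follows; the command sets, the callables standing in for control logic 724 and state-detection circuit 710, and the return convention are all illustrative assumptions rather than the described circuitry.

    # Hypothetical command sets; actual consumer-electronics-control or
    # infra-red codes vary by device type and provider.
    FIRST_CONTROL_COMMANDS = {"dvd-player": ["play", "fast-forward"],
                              "dvr": ["play", "skip"]}
    SECOND_CONTROL_COMMANDS = {"provider-a": ["power-on", "power-off"],
                               "provider-b": ["power-on", "power-off"]}

    def probe_device(send_command, content_activity_detected, state_changed):
        # Stage 1: issue first control commands until content activity
        # (e.g., packets of a data stream) is detected on the input connector.
        device_type = None
        for candidate_type, commands in FIRST_CONTROL_COMMANDS.items():
            for command in commands:
                send_command(command)
                if content_activity_detected():
                    device_type = candidate_type
                    break
            if device_type is not None:
                break
        if device_type is None:
            return None, None
        # Stage 2: issue second control commands until a change in the state
        # of the electronic device is detected.
        for provider, commands in SECOND_CONTROL_COMMANDS.items():
            for command in commands:
                send_command(command)
                if state_changed():
                    return device_type, provider
        return device_type, None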
Alternatively or additionally, during operation control logic 724 may detect whether there is electrical coupling between the electronic device and input connector 712 using state-detection circuit 710 (
We now describe embodiments of an electronic device.
Memory subsystem 912 includes one or more devices for storing data and/or instructions for processing subsystem 910 and networking subsystem 914. For example, memory subsystem 912 can include dynamic random access memory (DRAM), static random access memory (SRAM), and/or other types of memory. In some embodiments, instructions for processing subsystem 910 in memory subsystem 912 include: one or more program modules or sets of instructions (such as program module 922 or operating system 924), which may be executed by processing subsystem 910. Note that the one or more computer programs or program modules may constitute a computer-program mechanism. Moreover, instructions in the various modules in memory subsystem 912 may be implemented in: a high-level procedural language, an object-oriented programming language, and/or in an assembly or machine language. Furthermore, the programming language may be compiled or interpreted, e.g., configurable or configured (which may be used interchangeably in this discussion), to be executed by processing subsystem 910.
In addition, memory subsystem 912 can include mechanisms for controlling access to the memory. In some embodiments, memory subsystem 912 includes a memory hierarchy that comprises one or more caches coupled to a memory in electronic device 900. In some of these embodiments, one or more of the caches is located in processing subsystem 910.
In some embodiments, memory subsystem 912 is coupled to one or more high-capacity mass-storage devices (not shown). For example, memory subsystem 912 can be coupled to a magnetic or optical drive, a solid-state drive, or another type of mass-storage device. In these embodiments, memory subsystem 912 can be used by electronic device 900 as fast-access storage for often-used data, while the mass-storage device is used to store less frequently used data.
Networking subsystem 914 includes one or more devices configured to couple to and communicate on a wired and/or wireless network (i.e., to perform network operations), including: control logic 916, interface circuits 918 and associated antennas 920. (While
Networking subsystem 914 includes processors, controllers, radios/antennas, sockets/plugs, and/or other devices used for coupling to, communicating on, and handling data and events for each supported networking system. Note that mechanisms used for coupling to, communicating on, and handling data and events on the network for each network system are sometimes collectively referred to as a ‘network interface’ for the network system. Moreover, in some embodiments a ‘network’ between the electronic devices does not yet exist. Therefore, electronic device 900 may use the mechanisms in networking subsystem 914 for performing simple wireless communication between the electronic devices, e.g., transmitting advertising or beacon frames and/or scanning for advertising frames transmitted by other electronic devices as described previously.
Within electronic device 900, processing subsystem 910, memory subsystem 912, networking subsystem 914 and optional feedback subsystem 934 are coupled together using bus 928. Bus 928 may include an electrical, optical, and/or electro-optical connection that the subsystems can use to communicate commands and data among one another. Although only one bus 928 is shown for clarity, different embodiments can include a different number or configuration of electrical, optical, and/or electro-optical connections among the subsystems.
In some embodiments, electronic device 900 includes a display subsystem 926 for displaying information on a display (such as a communication warning message), which may include a display driver, an I/O controller and the display. Note that a wide variety of display types may be used in display subsystem 926, including: a two-dimensional display, a three-dimensional display (such as a holographic display or a volumetric display), a head-mounted display, a retinal-image projector, a heads-up display, a cathode ray tube, a liquid-crystal display, a projection display, an electroluminescent display, a display based on electronic paper, a thin-film transistor display, a high-performance addressing display, an organic light-emitting diode display, a surface-conduction electronic-emitter display, a laser display, a carbon-nanotube display, a quantum-dot display, an interferometric modulator display, a multi-touch touchscreen (which is sometimes referred to as a touch-sensitive display), and/or a display based on another type of display technology or physical phenomenon.
Furthermore, optional feedback subsystem 934 may include one or more sensor-feedback mechanisms or devices, such as: a vibration mechanism or a vibration actuator (e.g., an eccentric-rotating-mass actuator or a linear-resonant actuator), a light, one or more speakers, etc., which can be used to provide feedback to a user of electronic device 900 (such as sensory feedback about the status of a user instruction to change the state of one of the components in system 100 in
Electronic device 900 can be (or can be included in) any electronic device with at least one network interface. For example, electronic device 900 can be (or can be included in): a desktop computer, a laptop computer, a subnotebook/netbook, a server, a tablet computer, a smartphone, a cellular telephone, a consumer-electronic device (such as a television, a set-top box, audio equipment, video equipment, etc.), a remote control, a portable computing device, an access point, a router, a switch, communication equipment, test equipment, and/or another electronic device.
Although specific components are used to describe electronic device 900, in alternative embodiments, different components and/or subsystems may be present in electronic device 900. For example, electronic device 900 may include one or more additional processing subsystems, memory subsystems, networking subsystems, and/or display subsystems. Moreover, while one of antennas 920 is shown coupled to a given one of interface circuits 918, there may be multiple antennas coupled to the given one of interface circuits 918. For example, an instance of a 3×3 radio may include three antennas. Additionally, one or more of the subsystems may not be present in electronic device 900. Furthermore, in some embodiments, electronic device 900 may include one or more additional subsystems that are not shown in
Moreover, the circuits and components in electronic device 900 may be implemented using any combination of analog and/or digital circuitry, including: bipolar, PMOS and/or NMOS gates or transistors. Furthermore, signals in these embodiments may include digital signals that have approximately discrete values and/or analog signals that have continuous values. Additionally, components and circuits may be single-ended or differential, and power supplies may be unipolar or bipolar.
An integrated circuit may implement some or all of the functionality of networking subsystem 914, such as one or more radios. Moreover, the integrated circuit may include hardware and/or software mechanisms that are used for transmitting wireless signals from electronic device 900 and receiving signals at electronic device 900 from other electronic devices. Aside from the mechanisms herein described, radios are generally known in the art and hence are not described in detail. In general, networking subsystem 914 and/or the integrated circuit can include any number of radios.
In some embodiments, networking subsystem 914 and/or the integrated circuit include a configuration mechanism (such as one or more hardware and/or software mechanisms) that configures the radios to transmit and/or receive on a given channel (e.g., a given carrier frequency). For example, in some embodiments, the configuration mechanism can be used to switch the radio from monitoring and/or transmitting on a given channel to monitoring and/or transmitting on a different channel. (Note that ‘monitoring’ as used herein comprises receiving signals from other electronic devices and possibly performing one or more processing operations on the received signals, e.g., determining if the received signal comprises an advertising frame, calculating a performance metric, performing spectral analysis, etc.) Furthermore, networking subsystem 914 may include at least one port (such as an HDMI port 932) to receive and/or provide the information in the data stream to A/V display device 114-1 (
While a communication protocol compatible with Wi-Fi was used as an illustrative example, the described embodiments may be used in a variety of network interfaces. Furthermore, while some of the operations in the preceding embodiments were implemented in hardware or software, in general the operations in the preceding embodiments can be implemented in a wide variety of configurations and architectures. Therefore, some or all of the operations in the preceding embodiments may be performed in hardware, in software or both. For example, at least some of the operations in the content-analysis technique may be implemented using program module 922, operating system 924 (such as drivers for interface circuits 918) and/or in firmware in interface circuits 918. Alternatively or additionally, at least some of the operations in the content-analysis technique may be implemented in a physical layer, such as hardware in interface circuits 918.
Moreover, while the preceding embodiments included a touch-sensitive display in the portable electronic device that the user touches (e.g., with a finger or digit, or a stylus), in other embodiments the user interface is displayed on a display in the portable electronic device and the user interacts with the user interface without making contact or touching the surface of the display. For example, the user's interaction(s) with the user interface may be determined using time-of-flight measurements, motion sensing (such as a Doppler measurement) or another non-contact measurement that allows the position, direction of motion and/or speed of the user's finger or digit (or a stylus) relative to position(s) of one or more virtual command icons to be determined. In these embodiments, note that the user may activate a given virtual command icon by performing a gesture (such as ‘tapping’ their finger in the air without making contact with the surface of the display). In some embodiments, the user navigates through the user interface and/or activates/deactivates functions of one of the components in system 100 (
Furthermore, while A/V hub 112 (
While the preceding embodiments illustrated the content-analysis technique with audio and video content (such as HDMI content), in other embodiments the content-analysis technique is used in the context of an arbitrary type of data or information. For example, the content-analysis technique may be used with home-automation data. In these embodiments, A/V hub 112 (
In the preceding description, we refer to ‘some embodiments.’ Note that ‘some embodiments’ describes a subset of all of the possible embodiments, but does not always specify the same subset of embodiments.
The foregoing description is intended to enable any person skilled in the art to make and use the disclosure, and is provided in the context of a particular application and its requirements. Moreover, the foregoing descriptions of embodiments of the present disclosure have been presented for purposes of illustration and description only. They are not intended to be exhaustive or to limit the present disclosure to the forms disclosed. Accordingly, many modifications and variations will be apparent to practitioners skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present disclosure. Additionally, the discussion of the preceding embodiments is not intended to limit the present disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.