The invention generally relates to annotating and reviewing consumable data such as any electronically accessible entertainment, and more particularly to applying the collective effort of consumers in identifying interesting regions of consumable data to facilitate identifying annotations or “highlights” for the consumable data.
Current trend analysis suggests streamed consumable data will become a dominant distribution technique. In-Stat, LLC (see http://www.instat.com), a company providing analysis and forecasts of digital media and content, including video streaming, downloads and digital TV, estimates streaming and online access of consumable data is preferred by audience members over retail disc sales as the major distribution channel for people to receive consumable data in future digital entertainment delivery. This represents a rapid growth in online consumable data access, as exemplified by statistics provided by Cisco showing Internet video is approximately one-quarter of all non-peer-to-peer consumer Internet traffic, and it is expected that in 2012, Internet video traffic will be nearly 400 times the data usage for the entire U.S. Internet backbone in the year 2000 (see http://www.cisco.com/en/US/solutions/collateral/ns341/ns525/ns537/ns705/ns827/white_paper_c11-481374_ns827_Networking_Solutions_White_Paper.html). Similarly, the New York Times newspaper estimated that YouTube's 2007 video traffic alone exceeded the total Internet traffic for the United States in the year 2000 (see, e.g., http://www.nytimes.com/2008/03/13/technology/13net.html).
Existing research has resulted in numerous technologies, such as video analytics and applying artificial intelligence to consumable data, in an effort to better understand and recognize consumable data content. See, for example, TREC Video Retrieval Evaluation at http://trecvid.nist.gov, a conference sponsored by the National Institute of Standards and Technology (NIST) with support from other U.S. government agencies. The goal of TREC is to encourage information retrieval research; in 2001 and 2002 TREC provided video data to assist research in automatic segmentation, indexing, and content-based retrieval of digital video. However, this and other technologies have been unsuccessful in, for example, trying to identify areas of heightened interest to a particular audience.
The features and advantages of the present invention will become apparent from the following detailed description of the present invention in which:
Various embodiments of the invention concern utilizing collective behavior to improve identification results. In various illustrated embodiments an effort is made to identify regions of interest within an audio, video or other consumable/accessible data; the phrase “consumable data” will be used to collectively refer to such data, and it is intended to refer to data that is stored in any state preserving media or medium and that may be singly or multiply or simultaneously accessed. Consumable data may represent, for example, stored and/or streamed video or audio data, as well as individual frames, sections, portions, cuts, etc. of such audio, video, etc. data. It will be appreciated by one skilled in the art that audio and video data are presented for exemplary purposes and any data collection in which portions of interest may be identified by one or more entities are intended to be within the scope of the recited embodiments.
It will be appreciated that “interest” is a relative term that may have a different meaning depending on the intended audience, e.g., what is interesting to an adult audience may be very different than what is interesting to a young adult audience. Thus, even if not specifically called out below, it should be appreciated by one skilled in the art the same techniques described herein may give different results depending on the nature of the audience performing the described operations, and that results from diverse audiences may be selectively combined as desired.
In the illustrated embodiments, it is assumed interactive behaviors of a target audience (or audiences) are monitored as members of the audience interact with consumable data. This monitoring may be performed in real or near-real time as audiences interact with consumable data. Or, monitoring may occur after the fact based on data accumulated with respect to a particular viewing or data consumption experience. For convenience in describing various features of the inventive concepts presented herein, it will be assumed an audience is interacting with a video, such as a recorded (or buffered) video broadcast or electronically accessible movie. However, as discussed above, the principles herein apply to any consumable data. Through monitoring collective audience interaction, a collective intelligence can be harnessed to identify meaningful regions within consumable data, e.g., audio, video, etc. Meaningful regions for a video could be, for example, segments of a video (typically referred to as video highlights) identified as being interesting.
The phrase “interactive audience analysis” or IAA, may be used to refer to analysis performed on the actions of the target audience(s). IAA differs from, for example, current automated video analytics technologies such as those that attempt to extract video highlights based on automated computer vision, machine learning and other artificial intelligence technologies. It will be appreciated automated video analytics technology and the disclosed embodiments need not be mutually exclusive, e.g., the disclosed embodiments may be utilized in conjunction with video analytics. It will be appreciated video analytics may be performed before, during or after the IAA, e.g., video analytics may be a pre, post, or interim processing stage depending on the needs and/or goals of the IAA.
In the illustrated embodiment, it is assumed audience members are being monitored as they interact with streamed consumable data. This is a simplifying assumption since it is typically easier to monitor access to streamed data, e.g., attempts to seek within a data stream can be determined by watching the commands to move within a stream that needs to be provided from an external source. However, it will be appreciated existing/stored content may be similarly monitored through use of hardware and/or software enabled devices configured to monitor data corresponding to seeking within a stream, and to provide the monitored data, e.g., by way of sending (pushing) the monitored data or allowing it to be accessed (pulled), to an external entity, such as a cable television or satellite broadcast head end, Internet server (which may also provide streamed consumable data), etc.
As illustrated in
One consecutive (or relatively consecutive) consumption time of the consumable data is, as noted above, represented with the illustrated region 112. Region 112 has a width representing a length of time of consuming the consumable data. It is expected the length of time is less than (tn−t0), otherwise the audience member would have consumed the entire consumable data. It will be appreciated that if the consumable data is video data, then the region 112 represents the amount of time of the video that has been watched, and if the consumable data is audio data, it represents the amount of time for which the audio data was listened to. In the illustrated embodiment, it is expected the audience member may use a “fast forward” type of control, skip button or feature, or directly drag a current play position marker, to move consumption of the consumable data from timing marker 104 indicating the end of the consumed region 112 to some other marker location, such as to marker 106, to skip over content within the consumable data considered less interesting, and allow accessing more interesting content. In the illustrated embodiment, movement of the current play marker within the consumable data represents a judgment or opinion of an audience member on whether a particular section of the consumable data is worth consuming, e.g., worth viewing, listening, reading, etc. as determined by the type of consumable data.
As with region 112, in the illustrated embodiment, marker 106 identifies the start of another region 114 representing more interesting content. At some point in time (not illustrated) the consumable data consumer moves the current play marker and skips to timing marker 108 and again watches or otherwise consumes another region 116 of consumable data. This repeats again where the current play jumps to timing marker 110, at which point the consumable data must have been interesting because a larger region 118 (larger with respect to the other regions 112-116) of the consumable data is viewed or otherwise consumed.
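By way of a non-limiting illustration, the derivation of consumed regions from monitored play-position changes might be sketched as follows. The event format (tuples of "play", "seek" and "stop"), the function name and the sample positions in seconds are hypothetical and are not taken from the illustrated embodiments; the sketch only shows that each skip of the current play marker closes one consumed region and opens another.

```python
from typing import List, Tuple

Region = Tuple[float, float]  # (start, end) positions in seconds


def consumed_regions(events: List[tuple]) -> List[Region]:
    """Convert a hypothetical event log into consumed regions.

    Assumed event shapes (illustrative only):
      ("play", position)            playback starts at position
      ("seek", left_at, landed_at)  play marker skipped from left_at to landed_at
      ("stop", position)            playback ends at position
    """
    regions: List[Region] = []
    start = None  # start of the region currently being consumed
    for event in events:
        kind = event[0]
        if kind == "play":
            start = event[1]
        elif kind == "seek":
            _, left_at, landed_at = event
            # Close the region consumed before the skip, if any.
            if start is not None and left_at > start:
                regions.append((start, left_at))
            start = landed_at
        elif kind == "stop":
            if start is not None and event[1] > start:
                regions.append((start, event[1]))
            start = None
    return regions


log = [("play", 0), ("seek", 60, 300), ("seek", 330, 900), ("stop", 1500)]
print(consumed_regions(log))  # [(0, 60), (300, 330), (900, 1500)]
```

In this sample log the last region is the longest, which corresponds to a consumer lingering on content found to be more interesting.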
In the illustrated embodiment, it is assumed a consumer utilizes a fast forward/rewind, skip feature or button, or other technique to change the current play position. When access to the consumable data is for a subsequent, e.g., 2nd, 3rd, etc. time, it is assumed the consumer's judgment on what is an interesting region, e.g., a “highlight”, within the data is more accurate. Service providers may track the collective behaviors of a large group of consumers, and use the subsequent consumptions to refine what is considered interesting within a particular consumable data. For example, the most popular movie on youku.com (a Chinese video streaming site) is usually watched more than 3,000,000 times, representing an enormous number of consumers that may be monitored. The service provider may monitor and learn how consumers extract highlights, and determine a collective judgment for the consumption. In selected embodiments determining a collective judgment is an iterative and adaptive process. In the illustrated embodiment, after consumption has identified the larger region 118, the consumer continues to consume the data, such as by skipping the current play marker to locations 202-206 and respectively watching or otherwise consuming data portions 210-214.
The embodiment represents the consumer, after watching or otherwise consuming for some period of time as illustrated in
As with
With such multiple consumer inputs, a service provider or other entity can combine the input to perform interactive audience analysis (IAA). Note that while the
In one embodiment, this weighting can be defined with respect to a set such that: {[t1, duration1, weight1], [t2, duration2, weight2], . . . , [tn, durationn, weightn]}, where after determining the first region collection 418, n=3 and the values for the regions 116, 306, 318 are pre-assigned to be 1 for the first consumer of the consumable data, e.g., the first viewer of a video. In one embodiment, when a second consumer accesses the consumable data and generates the second collection 420 of interesting regions, each of the second consumer's regions is also assigned a value of 1 for the second consumer's consumption, but an overlapping region, e.g., the portion 422 identified by dashed brackets, would, assuming simple addition, be assigned a value of 2. Over time, after many consumers access the consumable data, there will be certain regions of the consumable data that are statistically considered significantly more interesting to the aggregate audience that consumed the data.
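As a non-limiting sketch of the simple-addition weighting described above, the region collections of several consumers may be combined by splitting the timeline at every region boundary and summing the weights of the regions covering each resulting span. The function name and the sample times are hypothetical; the [t, duration, weight] triple follows the set notation above.

```python
from typing import List, Tuple

Triple = Tuple[int, int, int]  # (t, duration, weight)


def combine(collections: List[List[Triple]]) -> List[Triple]:
    """Sum region weights across consumers, assuming simple addition."""
    regions = [r for collection in collections for r in collection]
    # Every region start or end is a point where the summed weight may change.
    cuts = sorted({t for t, d, _ in regions} | {t + d for t, d, _ in regions})
    combined: List[Triple] = []
    for left, right in zip(cuts, cuts[1:]):
        weight = sum(w for t, d, w in regions if t <= left and right <= t + d)
        if weight > 0:
            combined.append((left, right - left, weight))
    return combined


first_consumer = [(10, 20, 1)]   # covers 10..30
second_consumer = [(20, 20, 1)]  # covers 20..40
print(combine([first_consumer, second_consumer]))
# [(10, 10, 1), (20, 10, 2), (30, 10, 1)]
```

The span from 20 to 30 is covered by both consumers and therefore receives a value of 2, while the non-overlapping spans retain a value of 1.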
In one embodiment, region weightings will be f(N) if the consumer has consumed the entire consumable data N times, e.g., watched a “full-length” video N times, where N>1 and f(N)>>1 (much greater than 1) so as to give great weight to the presumed accuracy of interesting region identification by consumers having knowledge of the entire consumable data from multiple complete consumptions, e.g., from having watched an entire video multiple times. It will be appreciated that a service provider may offer some incentive, discount, coupon, or the like, e.g., a microeconomic stimulus, to encourage complete consumption and interesting region identification.
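For illustration only, one possible form of f(N) is sketched below. The description does not specify f(N) beyond requiring f(N)>>1 for N>1, so the linear form, the boost factor of 10 and the function names are assumptions chosen solely to make the sketch concrete.

```python
from typing import List, Tuple


def full_view_weight(n_full_views: int, boost: float = 10.0) -> float:
    """Hypothetical f(N): default weight 1, boosted when N > 1.

    The linear boost is an assumption; any function with f(N) >> 1
    for N > 1 would serve the same purpose.
    """
    if n_full_views > 1:
        return boost * n_full_views
    return 1.0


def weight_regions(regions: List[Tuple[int, int]], n_full_views: int):
    """Attach the consumer's weight to each (t, duration) region."""
    weight = full_view_weight(n_full_views)
    return [(t, duration, weight) for t, duration in regions]


print(weight_regions([(5, 10)], n_full_views=3))  # [(5, 10, 30.0)]
```

A consumer who has not yet consumed the entire data more than once keeps the default weight of 1 in this sketch.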
For example, if the consumable data includes a publicly released video such as a movie, one can acquire 502 data identifying interesting portions of consumable data, which for a movie would typically include trailers and other advertising regarding the movie. The acquired data can then be mapped 504 against the consumable data to identify 506 interesting regions within the consumable data. The phrase “exemplar data” will be used herein to refer to any data concerning the consumable data that may be mapped 504 to identify 506 interesting regions within the consumable data.
For the movie, exemplar data includes the trailers and other advertising regarding the movie, and video analytics may be employed to match exemplar data to the movie to identify the region or regions within the consumable data corresponding to the exemplar data. Movie trailer type of exemplar data are typically a “Director's Cut” of highlights, but they are usually combined into a single end-to-end presentation. In one embodiment, the entity or device pre-annotating the timeline may employ video analytics to detect 508 changes, such as scene changes, within the exemplar data and distinguish 510 multiple interesting sub-regions within the exemplar data. Video searching and/or video matching technologies may be applied 512 to identify longer versions of the distinguished 510 highlights within the exemplar data. Similarly, if the consumable data includes audio data such as a song or soundtrack, audio analytics (not illustrated) may be employed to identify where exemplar data may be found within the consumable data, as well as to find similar “sounds like” matches.
After identifying 506 interesting regions, in one embodiment, “fuzzy” matching may be performed 514 to allow finding portions of the consumable data that are “like” the exemplar data, and thus increase the number of identified interesting regions. To do so, for example, content analysis of video or audio data may be used to find other portions of the consumable data that are like the exemplar data. It will be appreciated fuzzy matching typically has an associated relevance rating to reflect a degree of relevance between a candidate match and the exemplar data. In one embodiment a required minimum degree of relevance, which can be arbitrarily set or determined with respect to the exemplar data, can be required for the candidate match to be considered an additional interesting region to be added to the identified 506 interesting regions.
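The relevance threshold applied during fuzzy matching might, purely as an example, be sketched as follows. The description does not name a particular similarity measure; cosine similarity over numeric feature vectors is substituted here as an assumption, and the feature vectors, the 0.9 threshold and the function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity, used here as a stand-in relevance rating."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fuzzy_matches(
    exemplar: List[float],
    candidates: Dict[Tuple[int, int], List[float]],
    min_relevance: float = 0.9,
) -> List[Tuple[Tuple[int, int], float]]:
    """Keep candidate regions whose relevance meets the required minimum."""
    hits = []
    for region, features in candidates.items():
        relevance = cosine(exemplar, features)
        if relevance >= min_relevance:
            hits.append((region, relevance))
    # Most relevant candidates first.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


candidates = {
    (0, 10): [1.0, 0.0],   # identical to the exemplar features
    (10, 20): [0.0, 1.0],  # unrelated content
    (20, 30): [1.0, 0.1],  # similar content
}
for region, relevance in fuzzy_matches([1.0, 0.0], candidates):
    print(region, round(relevance, 3))
```

Only the first and third candidate regions meet the assumed minimum relevance and would be added to the identified interesting regions.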
Once interesting regions have been identified 506, 514 within the consumable data, these can be used to define the Collective Cut (CT), and they can be used to pre-annotate 516 the timeline for the consumable data. In one embodiment, the initially identified 506 regions are associated with a heavy weighting because the Director's Cut is considered to have high accuracy as to what is interesting.
As illustrated there are interesting region collections 622, 624 corresponding to the combined input from
Regions 624 include additional interesting regions 626-630 which may be identified by a consumer as discussed above in the other illustrated embodiments. In the
As more consumers contribute their refined and/or original identification of interesting regions within consumable data, the collection of interesting regions will continue to acquire more regions, each having varying weights. In one embodiment, a service provider, intermediary device along a transmission path or data path to the consumer, endpoint device utilized by the consumer, or other device, may elect to periodically condense region collections to reduce the number of regions being managed. In one embodiment, if two adjacent interesting regions have the same weight, they can be coalesced into one region. It will be appreciated consumer identification of interesting regions will not be precise, hence a tolerance may be applied when determining whether regions are adjacent. In one embodiment, multiple service providers may share interesting region identifications for consumable data common to the service providers to increase accuracy.
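A minimal sketch of the condensing step follows, assuming regions are held as (start, end, weight) tuples and that "adjacent" means the gap between two regions does not exceed a chosen tolerance. The tuple layout, the function name and the tolerance value are illustrative assumptions.

```python
from typing import List, Tuple

Region = Tuple[float, float, float]  # (start, end, weight)


def coalesce(regions: List[Region], tolerance: float = 0.0) -> List[Region]:
    """Merge neighbouring regions of equal weight separated by at most `tolerance`."""
    merged: List[Region] = []
    for start, end, weight in sorted(regions):
        if merged:
            prev_start, prev_end, prev_weight = merged[-1]
            if weight == prev_weight and start - prev_end <= tolerance:
                # Extend the previous region instead of storing a new one.
                merged[-1] = (prev_start, max(prev_end, end), prev_weight)
                continue
        merged.append((start, end, weight))
    return merged


regions = [(0, 10, 2), (11, 20, 2), (20, 30, 1), (40, 50, 1)]
print(coalesce(regions, tolerance=2))
# [(0, 20, 2), (20, 30, 1), (40, 50, 1)]
```

The first two regions share a weight and lie within the tolerance, so they become one region; the last two share a weight but are too far apart to be treated as adjacent.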
In one embodiment, when service providers have enough confidence in the collection of interesting regions, they may publish some or all of the identified regions, e.g., the service provider may elect to only release interesting regions that have been selected by a certain percentage of a targeted audience. Further, it will be appreciated that with the current ability to track a consumer's age and social, economic, religious, political, geographic, ethnic, food, etc. interests, a sufficiently large collection of interesting regions may be defined for and presented to specific audiences, e.g., a specific set of consumers sharing one or more desired characteristics. In one embodiment, service providers may provide customized annotations for specific customers having known interests and time availability, e.g., by way of questionnaires and/or monitored behavior or other metadata known about the consumer. The data known about the consumer can be used to select interesting regions relevant to the consumer, which are then presented as the annotations for the consumable data. Regarding time availability, different consumers may have different amounts of available time to consume data, such as the length of a bus or train ride to/from work, or other known time duration, and this may be a factor in the selection of regions for an annotation. For example, if one is short of time, an annotation may be defined such that it has only the highest rated regions that fit within the time available to the consumer.
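One hedged way to sketch the time-limited selection described above is a greedy pass that takes regions in order of decreasing weight while they still fit within the consumer's available time. The greedy strategy, the (start, end, weight) tuple layout and the function name are assumptions; a greedy pass is simple but is not guaranteed to maximize total weight.

```python
from typing import List, Tuple

Region = Tuple[int, int, int]  # (start, end, weight), times in seconds


def pick_highlights(regions: List[Region], time_available: int) -> List[Region]:
    """Choose the highest-weighted regions that fit in the available time."""
    chosen: List[Region] = []
    remaining = time_available
    for start, end, weight in sorted(regions, key=lambda r: r[2], reverse=True):
        length = end - start
        if length <= remaining:
            chosen.append((start, end, weight))
            remaining -= length
    # Present the selected regions in timeline order.
    return sorted(chosen)


regions = [(0, 60, 5), (100, 400, 9), (500, 620, 7)]
print(pick_highlights(regions, time_available=200))
# [(0, 60, 5), (500, 620, 7)]
```

Here the highest-weighted region lasts 300 seconds and does not fit a 200-second budget, so the annotation falls back to the next highest rated regions that do fit.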
Typically, the environment includes a machine 800 that includes a system bus 802 to which is attached processors 804, a memory 806, e.g., random access memory (RAM), read-only memory (ROM), or other state preserving medium, storage devices 808, a video interface 810, and input/output interface ports 812. It will be appreciated that while elements of the machine 800 may be referenced in the singular, multiple elements not illustrated may be present. The machine may be controlled, at least in part, by input from conventional input devices, such as keyboards, mice, etc., as well as by directives received from another machine, interaction with a virtual reality (VR) environment, biometric feedback, cooperative or aggregate learning or other input source or signal.
The machine may include embedded controllers, such as programmable or non-programmable logic devices or arrays, Application Specific Integrated Circuits, embedded computers, smart cards, and the like. The machine may utilize one or more connections to one or more remote machines 814, 816, such as through a network interface 818, modem 820, or other communicative coupling. Machines may be interconnected by way of one or more physical and/or logical networks 822, such as an intranet, the Internet, local area networks, wide area networks, cloud network, distributed network, peer-to-peer network, and the like. One skilled in the art will appreciate that communication with network 822 may utilize various wired and/or wireless short range or long range carriers and protocols, including radio frequency (RF), satellite, microwave, Institute of Electrical and Electronics Engineers (IEEE) 802.11, Bluetooth, optical, infrared, cable, laser, etc. In some embodiments, multiple ones of networks 822 may be simultaneously utilized, and metrics such as cost, efficiency, preferences, power, etc. may be applied to control how particular ones of networks 822 are selected and how data is apportioned across multiple active networks.
The invention may be described by reference to or in conjunction with associated data including functions, procedures, data structures, application programs, etc. which when accessed by a machine results in machine 800 components performing tasks or defining abstract data types or low-level hardware contexts. Associated data may be stored in, for example, volatile and/or non-volatile memory 806, or in storage devices 808 and their associated storage media, including hard-drives, floppy-disks, optical storage, tapes, flash memory, memory sticks, digital video disks, biological storage, etc. Associated data may be delivered wholly or in part over transmission environments, including network 822, in the form of packets, serial data, parallel data, propagated signals sent and/or received by a tangible component, etc., and may be used in a compressed or encrypted format. Associated data may be used in a distributed environment, and stored locally and/or remotely for access by single or multi-processor machines.
Thus, for example, with respect to the illustrated embodiments, assuming machine 800 embodies a device utilized by a
Having described and illustrated the principles of the invention with reference to illustrated embodiments, it will be recognized that the illustrated embodiments can be modified in arrangement and detail without departing from such principles. And, though the foregoing discussion has focused on particular embodiments, other configurations are contemplated. In particular, even though expressions such as “in one embodiment,” “in another embodiment,” or the like are used herein, these phrases are meant to generally reference embodiment possibilities, and are not intended to limit the invention to particular embodiment configurations. As used herein, these terms may reference the same or different embodiments that are combinable into other embodiments.
Consequently, in view of the wide variety of permutations to the embodiments described herein, this detailed description is intended to be illustrative only, and should not be taken as limiting the scope of the invention. What is claimed as the invention, therefore, is all such modifications as may come within the scope and spirit of the following claims and equivalents thereto.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/CN10/02025 | 12/13/2010 | WO | 00 | 12/30/2013 |