The identification of interesting portions of video content for playback, for example, as highlights, is often manually performed by the producer of the content. Thus, the portions chosen as highlights may be representative of the producer's best guess as to the interests of the broad viewing audience, rather than any particular individual or sub-group of the audience.
Various embodiments are disclosed herein that relate to selecting portions of video items based upon data from video viewing environment sensors. For example, one embodiment provides a method comprising receiving, for a video item, an emotional response profile for each viewer of a plurality of viewers, each emotional response profile comprising a temporal correlation of a particular viewer's emotional response to the video item when viewed by the particular viewer, and then selecting, using the emotional response profiles, a first portion of the video item judged to be more emotionally stimulating than a second portion of the video item. The selected first portion is then sent to another computing device in response to a request for the first portion of the video item without sending the second portion of the video item.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Furthermore, the claimed subject matter is not limited to implementations that solve any or all disadvantages noted in any part of this disclosure.
As mentioned above, selecting portions of video content items, such as sports presentations or movies, for use as highlights, trailers, or other such edited presentations has generally relied upon human editorial efforts. More recently, scraping has been used to aggregate computer network-accessible content into an easily browsable format to assist with content discovery. Scraping is an automated approach in which programs are used to harvest information from one or more content sources, such as websites, semantically sort the information, and present the sorted information so that a user may quickly access information customized to the user's interests.
Scraping may be fairly straightforward where entire content items are identified in the scrape results. For example, still images, video images, audio files, and the like may be identified in their entirety by title, artist, keywords, and other such metadata applied to the content as a whole. However, the identification of intra-video clips (i.e. video clips taken from within a larger video content item) poses challenges. For example, many content items may lack intra-media metadata that allows clips of interest to be identified and separately pulled from the larger content item. In other cases, video content items may be stored as a collection of segments that can be separately accessed. However, such segments may still be defined via human editorial input.
Thus, the disclosed embodiments relate to the automated identification of portions of video content that may be of particular interest compared to other portions of the same video content, and presenting the identified portions to viewers separate from the other portions. The embodiments may utilize viewing environment sensors, such as image sensors, depth sensors, acoustic sensors, and potentially other sensors such as biometric sensors, to assist in determining viewer preferences for use in identifying such segments. Such sensors may allow systems to identify individuals, detect and understand human emotional expressions of the identified individuals, and utilize such information to identify particularly interesting portions of a video content item.
In turn, emotional response profiles of the viewers to the video items are sent to a server computing device 130 via network 110, where, for each of the video items, the emotional responses from a plurality of viewers are synthesized into an aggregated emotional response profile for that video item. Later, a requesting viewer seeking an interesting or emotionally stimulating video clip taken from one of those video items may receive a list of portions of those video items judged to be more emotionally stimulating than other portions of those same items. From that list, the requesting viewer may request one or more portions of those video item(s) to view, individually or as a compilation. On receiving the request, the server computing device sends the requested portions to the requesting computing device without sending the comparatively less stimulating and/or less interesting portions of those video item(s). Thus, the requesting viewer is provided with a segment of the video item that the requesting viewer may likely find interesting and emotionally stimulating. Likewise, such analysis may be performed on plural video items to present a list of potentially interesting video clips taken from different video content items. This may help in content discovery, for example.
Video viewing environment sensor system 106 may include any suitable sensors, including but not limited to one or more image sensors, depth sensors, and/or microphones or other acoustic sensors. Data from such sensors may be used by media computing device 104 to detect facial and/or body postures and gestures of a viewer, which may be correlated by media computing device 104 to human affect displays. As an example, such postures and gestures may be compared to predefined reference affect display data, such as posture and gesture data, that may be associated with specified emotional states. It will be understood that the term “human affect displays” as used herein may represent any detectable human response to content being viewed, including but not limited to human emotional expressions and/or detectable displays of human emotional behaviors, such as facial, gestural, and vocal displays, whether performed consciously or subconsciously.
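While the disclosure does not prescribe a particular implementation of such a comparison, the following minimal Python sketch illustrates one hypothetical approach in which a detected posture/gesture feature vector is matched to the nearest predefined reference affect display; the feature values, emotional state labels, and distance metric are assumptions for illustration only.

```python
# Hypothetical sketch: classify a detected posture/gesture feature vector by
# comparing it against predefined reference affect-display data, one reference
# vector per emotional state. All names and the distance metric are illustrative.
import math

REFERENCE_AFFECT_DISPLAYS = {
    # emotional state -> reference feature vector (e.g., normalized joint angles)
    "fear":    [0.9, 0.1, 0.8],
    "joy":     [0.2, 0.9, 0.1],
    "boredom": [0.1, 0.2, 0.2],
}

def classify_affect_display(features):
    """Return the emotional state whose reference vector is closest to the
    detected posture/gesture features (smallest Euclidean distance)."""
    def distance(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(REFERENCE_AFFECT_DISPLAYS,
               key=lambda state: distance(features, REFERENCE_AFFECT_DISPLAYS[state]))

print(classify_affect_display([0.85, 0.15, 0.75]))  # -> "fear"
```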
Media computing device 104 may process data received from sensor system 106 to generate temporal relationships between video items viewed by a viewer and each viewer's emotional response to the video item. As explained in more detail below, such relationships may be recorded as a viewer's emotional response profile for a particular video item and included in a viewing interest profile cataloging the viewer's video interests. This may allow the viewing interest profile for a requesting viewer to be later retrieved and used to select portions of one or more video items of potential interest to the requesting viewer.
As a more specific example, image data received from viewing environment sensor system 106 may capture conscious displays of human emotional behavior of a viewer, such as an image of viewer 160 cringing or covering his face. In response, the viewer's emotional response profile for that video item may indicate that the viewer was scared at that time during the item. The image data may also include subconscious displays of human emotional states. In such a scenario, image data may show that the viewer was looking away from the display at a particular time during a video item. In response, the viewer's emotional response profile for that video item may indicate that the viewer was bored or distracted at that time. Eye tracking, facial posture characterization, and other suitable techniques may also be employed to gauge a viewer's degree of emotional stimulation and engagement with video item 150.
In some embodiments, an image sensor may collect light within a spectral region that is diagnostic of human physiological conditions. For example, infrared light may be used to approximate blood oxygen levels and/or heart rate levels within the body. In turn, such levels may be used to estimate the person's emotional stimulation.
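As one purely illustrative sketch of the above (the sensing pipeline and numeric mapping are assumptions, not part of the disclosure), an estimated heart rate derived from such spectral data might be mapped to a normalized stimulation score as follows.

```python
# Hypothetical sketch: map an estimated heart rate (e.g., derived from infrared
# imaging) to a rough emotional-stimulation score in [0, 1]. The resting and
# maximum rates and the linear mapping are illustrative assumptions only.
def stimulation_from_heart_rate(bpm, resting_bpm=60.0, max_bpm=160.0):
    score = (bpm - resting_bpm) / (max_bpm - resting_bpm)
    return max(0.0, min(1.0, score))  # clamp to [0, 1]

print(stimulation_from_heart_rate(95))  # -> 0.35
```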
Further, in some embodiments, sensors that reside in other devices than viewing environment sensor system 106 may be used to provide input to media computing device 104. For example, in some embodiments, an accelerometer and/or other sensors included in a mobile computing device 140 (e.g., mobile phones and laptop and tablet computers) held by a viewer 160 within video viewing environment 100 may detect gesture-based or other emotional expressions for that viewer.
As shown in
In some embodiments, sensor data from sensors on a viewer's mobile device may be provided to the media computing device. Further, supplemental content related to a video item being watched may be provided to the viewer's mobile device. Thus, in some embodiments, a mobile computing device 140 may be registered with media computing device 104 and/or server computing device 130. Suitable mobile computing devices include, but are not limited to, mobile phones and portable personal computing devices (e.g., laptops, tablet, and other such computing devices).
As shown in
At 202, method 200 includes collecting sensor data via the video viewing environment sensor system, and potentially from mobile computing device 140 or other suitable sensor-containing devices. At 204, method 200 comprises sending the sensor data to the media computing device, which receives the input of sensor data. Any suitable sensor data may be collected, including but not limited to image sensor data, depth sensor data, acoustic sensor data, biometric sensor data, etc.
At 206, method 200 includes determining an identity of a viewer in the video viewing environment from the input of sensor data. In some embodiments, the viewer's identity may be established from a comparison of image data included in the sensor data with image data stored in the viewer's personal profile. For example, a facial similarity comparison between a face included in image data collected from the video viewing environment and an image stored in the viewer's profile may be used to establish the identity of that viewer. A viewer's identity also may be determined from acoustic data or any other suitable data.
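One hypothetical sketch of such a facial similarity comparison is shown below; the use of face embeddings, the cosine similarity metric, and the matching threshold are illustrative assumptions rather than elements of the disclosure.

```python
# Hypothetical sketch: identify a viewer by comparing a face embedding computed
# from viewing-environment image data against embeddings stored in viewer
# profiles. The embedding source, threshold, and profile store are assumptions.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def identify_viewer(observed_embedding, profiles, threshold=0.8):
    """Return the profile id whose stored face embedding best matches the
    observed embedding, or None if no match clears the threshold."""
    best_id, best_score = None, threshold
    for viewer_id, stored_embedding in profiles.items():
        score = cosine_similarity(observed_embedding, stored_embedding)
        if score > best_score:
            best_id, best_score = viewer_id, score
    return best_id

profiles = {"viewer_160": [0.9, 0.1, 0.4], "viewer_161": [0.1, 0.8, 0.5]}
print(identify_viewer([0.88, 0.12, 0.42], profiles))  # -> "viewer_160"
```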
At 208, method 200 includes generating an emotional response profile for the viewer, the emotional response profile representing a temporal correlation of the viewer's emotional response to the video item being displayed in the video viewing environment. Put another way, the viewer's emotional response profile for the video item indexes that viewer's emotional expressions and behavioral displays as a function of a time position within the video item.
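Although the disclosure does not mandate any particular data structure, the following Python sketch shows one hypothetical representation of such an emotional response profile as entries indexed by time position within the video item; the field names and intensity scale are illustrative assumptions.

```python
# Hypothetical sketch of an emotional response profile: a list of entries that
# index a viewer's detected emotional state and its intensity by time position
# within a video item. Field names and the intensity scale are illustrative.
from dataclasses import dataclass, field

@dataclass
class EmotionalResponseEntry:
    time_position_s: float   # offset into the video item, in seconds
    emotional_state: str     # e.g., "fear", "joy", "boredom"
    intensity: float         # normalized magnitude of the response, 0..1

@dataclass
class EmotionalResponseProfile:
    viewer_id: str
    video_item_id: str
    entries: list = field(default_factory=list)

    def record(self, time_position_s, emotional_state, intensity):
        self.entries.append(
            EmotionalResponseEntry(time_position_s, emotional_state, intensity))

profile = EmotionalResponseProfile("viewer_160", "video_150")
profile.record(754.0, "fear", 0.9)   # e.g., the viewer cringed at 12:34
print(profile.entries[0])
```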
In the example shown in
In some embodiments, semantic mining module 302 may be configured to distinguish between the viewer's emotional response to a video item and the viewer's general temper. For example, in some embodiments, semantic mining module 302 may ignore those human affect displays detected when the viewer's attention is not focused on the display device, or may record information regarding the viewer's attentive state in such instances. Thus, as an example scenario, if the viewer is visibly annoyed because of a loud noise originating external to the video viewing environment, semantic mining module 302 may be configured not to ascribe the detected annoyance to the video item, and/or may not record the annoyance at that temporal position within the viewer's emotional response profile for the video item. In embodiments in which an image sensor is included as a video viewing environment sensor, suitable eye tracking and/or face position tracking techniques may be employed to determine a degree to which the viewer's attention is focused on the display device and/or the video item.
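A minimal sketch of such attention gating is shown below, assuming each detected affect display carries an estimated attention value derived from eye or face tracking; the field names and threshold are illustrative assumptions.

```python
# Hypothetical sketch of the attention-gating behavior described above: drop
# detected affect displays that occur while eye/face tracking indicates the
# viewer is not looking at the display. Names and threshold are illustrative.
def gate_by_attention(detections, min_attention=0.5):
    """detections: list of dicts with 'time_position_s', 'emotional_state',
    'intensity', and 'attention' (estimated fraction of gaze on the display).
    Returns only those detections attributable to the video item itself."""
    return [d for d in detections if d["attention"] >= min_attention]

detections = [
    {"time_position_s": 120.0, "emotional_state": "annoyance",
     "intensity": 0.7, "attention": 0.1},   # loud noise outside the room
    {"time_position_s": 754.0, "emotional_state": "fear",
     "intensity": 0.9, "attention": 0.95},  # reaction to the video item
]
print(gate_by_attention(detections))  # keeps only the second detection
```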
A viewer's emotional response profile 304 for a video item may be analyzed to determine the types of scenes/objects/occurrences that evoked positive and negative responses in the viewer. For example, in the example shown in
By performing such analysis for other content items viewed by the viewer, as shown at 310 of
Turning back to
At 214, method 200 includes aggregating a plurality of emotional response profiles from different viewers to form an aggregated emotional response profile for that video item. In some embodiments, method 200 may include presenting a graphical depiction of the aggregated emotional response profile at 216. Such views may provide a viewer with a way to distinguish emotionally stimulating and interesting portions of a video item from other portions of the same item at a glance, and also may provide a mechanism for a viewer to select such video content portions for viewing (e.g., where the aggregated profile acts as a user interface element that controls video content presentation).
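One hypothetical way to perform such aggregation, assuming each per-viewer profile reduces to (time position, intensity) pairs, is to average response intensity within fixed time bins; the bin size and data layout below are illustrative assumptions.

```python
# Illustrative sketch only: aggregate emotional response profiles from a
# plurality of viewers into a single aggregated profile for a video item by
# averaging response intensity within fixed time bins. Bin size is assumed.
from collections import defaultdict

def aggregate_profiles(per_viewer_entries, bin_seconds=10.0):
    """per_viewer_entries: iterable of lists of (time_position_s, intensity)
    pairs, one list per viewer. Returns {bin_start_s: mean intensity}."""
    sums, counts = defaultdict(float), defaultdict(int)
    for entries in per_viewer_entries:
        for time_position_s, intensity in entries:
            bin_start = (time_position_s // bin_seconds) * bin_seconds
            sums[bin_start] += intensity
            counts[bin_start] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}

viewer_a = [(5.0, 0.2), (12.0, 0.9)]
viewer_b = [(7.0, 0.4), (14.0, 0.7)]
print(aggregate_profiles([viewer_a, viewer_b]))  # -> approx. {0.0: 0.3, 10.0: 0.8}
```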
Further, in some embodiments, such views may be provided to content providers and/or advertising providers so that those providers may discover those portions of video items that made emotional connections with viewers (and/or with viewers in various market segments). For example, in a live broadcast scenario, a content provider receiving such views may provide, in real time, suggestions to broadcast presenters about ways to engage and further connect with the viewing audience, potentially retaining viewers who might otherwise be tempted to change channels.
For example,
Returning to
In some embodiments, the request may include a search term and/or a filter condition provided by the requesting viewer, so that selection of the first portion of the video content may be based in part on the search term and/or filter condition. However, it will be appreciated that a requesting viewer may supply such search terms and/or filter conditions at any suitable point within the process without departing from the scope of the present disclosure.
At 220, method 200 includes selecting, using the emotional response profiles, a first portion of the video item judged to be more emotionally stimulating than a second portion of the video item. Thus, the emotional response profiles may be used to identify portions of the video item that were comparatively more interesting to the aggregated viewing audience (e.g., the viewers whose emotional response profiles constitute the aggregated emotional response profile) than other portions evoking less of an emotional reaction in the audience. As a consequence, interesting portions of video media may be selected and/or summarized based on crowd-sourced emotional response information gathered for the longer video media.
In some embodiments, the crowd-sourced results may be weighted by the emotional response profiles for a group of potentially positively correlated viewers (e.g., people who may be likely to respond to a video item in a manner similar to the requesting viewer, as determined by a social relationship or other link between the viewers). Thus, in some embodiments, emotional response profiles for group members may have a higher weight than those for non-members. The weights may be assigned in any suitable manner, for example, as a number in a range of zero to one. Once weights are assigned, selection may be performed in any suitable way. In one example, a weighted arithmetic mean may be calculated, as a function of time, to identify a mean magnitude of emotional stimulation at various time positions within the video item. As a consequence, the selection result may be comparatively more likely to be interesting to the viewer than an unweighted selection result (e.g., a selection result in which the aggregated emotional response profile is unweighted).
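The following sketch illustrates the weighted arithmetic mean and threshold-based selection described above under simplifying assumptions (binned profiles, a single contiguous above-threshold span, and illustrative weights, threshold, and bin size).

```python
# Illustrative sketch only: compute a weighted arithmetic mean of emotional
# stimulation per time bin across viewers, then select the span of bins whose
# mean exceeds a preselected threshold. Weights, threshold, and bin size are
# hypothetical values chosen for the example.
def weighted_mean_profile(profiles, weights):
    """profiles: {viewer_id: {bin_start_s: intensity}}; weights: {viewer_id: w}.
    Returns {bin_start_s: weighted mean intensity across viewers}."""
    bins = sorted({b for profile in profiles.values() for b in profile})
    total_weight = sum(weights[v] for v in profiles)
    return {b: sum(weights[v] * profiles[v].get(b, 0.0) for v in profiles) / total_weight
            for b in bins}

def select_first_portion(mean_profile, threshold, bin_seconds=10.0):
    """Return (start_s, end_s) spanning the bins whose weighted mean exceeds
    the threshold (assumes those bins form one contiguous run)."""
    above = [b for b, value in sorted(mean_profile.items()) if value > threshold]
    return (above[0], above[-1] + bin_seconds) if above else None

profiles = {"a": {0.0: 0.2, 10.0: 0.9, 20.0: 0.8},
            "b": {0.0: 0.3, 10.0: 0.7, 20.0: 0.4}}
weights = {"a": 1.0, "b": 0.5}  # e.g., viewer "a" is in the requester's social network
mean = weighted_mean_profile(profiles, weights)
print(select_first_portion(mean, threshold=0.6))  # -> (10.0, 30.0)
```

Giving group members a larger weight, as in the example weights above, biases the selected span toward moments those viewers found stimulating.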
Further, in some embodiments, weights for a group (or a member of a group) may be based on viewer input. For example, weights may be based on varying levels of social connection and/or intimacy in a viewer's social network. In another example, weights may be based on confidence ratings assigned by the viewer that reflect a relative level of the viewer's trust and confidence in that group's (or member's) tastes and/or ability to identify portions of video items that the viewer finds interesting. In some other embodiments, confidence ratings may be assigned without viewer input according to characteristics, such as demographic group characteristics, suggesting positive correlations between group member interests and viewer interests. It will be understood that these methods for weighting emotional response profiles are presented for the purpose of example, and are not intended to be limiting in any manner.
In scenario 410, aggregated viewer emotional response profile 314 is weighted by viewers in the requesting viewer's social network. Thus, selection of the first portion of the video item is based on using a subset of the aggregated emotional response profiles corresponding to viewers belonging to the requesting viewer's social network. It will be appreciated that a social network may be any suitable collection of people with a social link to the viewer such that the viewer's interests may be particularly well-correlated with the collective interest of the network members. Such a network may be user-defined or defined automatically by a common characteristic between users (e.g., alumni relationships). In scenario 410, weighted emotional response profile 412 is used with preselected threshold 406 to identify first portion 404. Aggregated emotional response profile 314 is shown in dotted line for reference purposes only. Selecting the first portion based on the requesting viewer's social network may provide the requesting viewer with portions of the video item that are interesting and relevant to the requesting viewer's close social connections. This may enhance the degree of personalization of the first portion selected for the requesting viewer.
In scenario 420, aggregated viewer emotional response profile 314 is weighted by viewers in a demographic group to which the requesting viewer belongs. Thus, selection of the first portion of the video item is based on using a subset of the aggregated emotional response profiles corresponding to viewers belonging to the requesting viewer's demographic group. It will be appreciated that a demographic group may be defined based upon any suitable characteristics that may lead to potentially more highly correlated interests between group members than between all users. Weighted emotional response profile 422 is then used with preselected threshold 406 to identify first portion 404. Aggregated emotional response profile 314 is shown in dotted line for reference purposes only. Selecting the first portion based on the requesting viewer's demographic group may help the requesting viewer discover portions of the video item that are interesting to people with tastes and interests similar to the requesting viewer's.
It will be appreciated that further personalization may be realized by using viewer-provided filters, such as search terms and/or viewer-defined viewing interests. For example, in some embodiments, selection of the first portions may also be based on the requesting viewer's viewing interest profile 308. In some embodiments, selection may be further based on a requesting-viewer supplied search term and/or filter condition, as shown at 430 in
In yet other embodiments, selection of the first portion of the video item may be based on a subset of the emotional response profiles selected by the viewer. For example, the viewer may opt to receive selected portions of video items and other content (such as the highlight lists, viewer reaction videos, and reaction highlight lists described below) that are based solely on the emotional response profiles of the viewer's social network. By filtering the emotional response profiles in this way, rather than relying on a weighted or unweighted aggregated emotional response profile, the relative level of personalization of the user experience may be enhanced.
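A minimal sketch of such subset-based filtering, assuming a simple mapping of viewer identifiers to profiles and a set of social network member identifiers, might look as follows; the data layout is an illustrative assumption.

```python
# Hypothetical sketch: restrict selection to emotional response profiles from
# viewers in the requesting viewer's social network before any aggregation.
# The profile mapping and membership set are illustrative assumptions.
def filter_profiles_to_social_network(profiles, social_network_ids):
    """profiles: {viewer_id: profile}; returns only entries for viewers who
    belong to the requesting viewer's social network."""
    return {viewer_id: profile for viewer_id, profile in profiles.items()
            if viewer_id in social_network_ids}

profiles = {"a": {0.0: 0.4}, "b": {0.0: 0.8}, "c": {0.0: 0.6}}
print(filter_profiles_to_social_network(profiles, social_network_ids={"a", "c"}))
# -> {'a': {0.0: 0.4}, 'c': {0.0: 0.6}}
```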
Turning back to
Optionally, 222 may include, at 224, generating a viewer reaction video clip comprising a particular viewer's emotional, physical, and/or behavioral response to the video content item, as expressed by a human affect display recorded by a video viewing environment sensor. Such viewer reaction clips, at the option of the recorded viewer, may be stored with and/or presented concurrently with a related portion of the video item, so that a requesting viewer may view the video item and the emotional reaction of the recorded viewer to the video item. Thus, a requesting viewer searching for emotionally stimulating portions of a sporting event may also see other viewers' reaction clips for that event. In some embodiments, the viewer reaction clips may be selected from viewers in the requesting viewer's social network and/or demographic group, which may strengthen the sense of affinity that the requesting viewer experiences for the reactions shown in the viewer reaction clips.
In some embodiments, 222 may also include, at 226, generating a viewer reaction highlight clip list comprising video clips capturing reactions of each of one or more viewers to a plurality of portions of the video content item selected via the emotional response profiles. Such viewer reaction highlight clip lists may be generated by reference to the emotional reactions of other viewers to those clips in much the same way as interesting portions of the video item are selected, so that a requesting viewer may directly search for such viewer reaction clips and/or see popular and/or emotionally stimulating (as perceived by other viewers who viewed the viewer reaction clips) viewer reaction clips at a glance.
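One hypothetical sketch of assembling such a viewer reaction highlight list is shown below; the clip records and aggregate stimulation scores are illustrative assumptions rather than elements of the disclosure.

```python
# Hypothetical sketch: rank candidate viewer reaction clips by the aggregated
# emotional stimulation they evoked in viewers who watched those clips, and
# keep the top few. The clip records and scores are illustrative only.
def build_reaction_highlight_list(reaction_clips, top_n=3):
    """reaction_clips: list of dicts with 'clip_id', 'source_video_id', and
    'aggregate_stimulation' (mean response of viewers who watched the clip).
    Returns the top_n clip ids, most stimulating first."""
    ranked = sorted(reaction_clips,
                    key=lambda clip: clip["aggregate_stimulation"], reverse=True)
    return [clip["clip_id"] for clip in ranked[:top_n]]

clips = [
    {"clip_id": "reaction_01", "source_video_id": "video_150", "aggregate_stimulation": 0.62},
    {"clip_id": "reaction_02", "source_video_id": "video_150", "aggregate_stimulation": 0.91},
    {"clip_id": "reaction_03", "source_video_id": "video_150", "aggregate_stimulation": 0.48},
]
print(build_reaction_highlight_list(clips, top_n=2))  # -> ['reaction_02', 'reaction_01']
```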
While the description of
At 234, method 200 includes receiving a request for the first portion of the requested video item. Receiving the request at 234 may include receiving a request for a first portion of a single requested video item and/or a request for a plurality of portions selected from respective requested video items.
In some embodiments, the request for the requested video item(s) may include a search term and/or a filter condition provided by the requesting viewer. In such embodiments, the search term and/or filter condition may allow the requesting viewer to sort through a list of first portions of respective video items according to criteria (such as viewing preferences) provided in the search term and/or filter condition.
Responsive to the request received at 234, method 200 includes, at 236, sending the first portion of the video content item to the requesting computing device without sending the second portion of the video content item. For example, each of the scenarios depicted in
In some embodiments in which respective first portions of more than one video item were requested, 236 may include sending the respective first portions as a single video composition. Further, in some embodiments, 236 may include, at 238, sending the viewer reaction video clip. At 240, the portion (or portions) of the video item(s) sent are output for display.
As introduced above, in some embodiments, the methods and processes described in this disclosure may be tied to a computing system including one or more computers. In particular, the methods and processes described herein may be implemented as a computer application, computer service, computer API, computer library, and/or other computer program product.
The computing system includes a logic subsystem (for example, logic subsystem 116 of media computing device 104 of
The logic subsystem may include one or more physical devices configured to execute one or more instructions. For example, the logic subsystem may be configured to execute one or more instructions that are part of one or more applications, services, programs, routines, libraries, objects, components, data structures, or other logical constructs. Such instructions may be implemented to perform a task, implement a data type, transform the state of one or more devices, or otherwise arrive at a desired result.
The logic subsystem may include one or more processors that are configured to execute software instructions. Additionally or alternatively, the logic subsystem may include one or more hardware or firmware logic machines configured to execute hardware or firmware instructions. Processors of the logic subsystem may be single core or multicore, and the programs executed thereon may be configured for parallel or distributed processing. The logic subsystem may optionally include individual components that are distributed throughout two or more devices, which may be remotely located and/or configured for coordinated processing. One or more aspects of the logic subsystem may be virtualized and executed by remotely accessible networked computing devices configured in a cloud computing configuration.
The data-holding subsystem may include one or more physical, non-transitory, devices configured to hold data and/or instructions executable by the logic subsystem to implement the herein described methods and processes. When such methods and processes are implemented, the state of the data-holding subsystem may be transformed (e.g., to hold different data).
The data-holding subsystem may include removable media and/or built-in devices. The data-holding subsystem may include optical memory devices (e.g., CD, DVD, HD-DVD, Blu-Ray Disc, etc.), semiconductor memory devices (e.g., RAM, EPROM, EEPROM, etc.) and/or magnetic memory devices (e.g., hard disk drive, floppy disk drive, tape drive, MRAM, etc.), among others. The data-holding subsystem may include devices with one or more of the following characteristics: volatile, nonvolatile, dynamic, static, read/write, read-only, random access, sequential access, location addressable, file addressable, and content addressable. In some embodiments, the logic subsystem and the data-holding subsystem may be integrated into one or more common devices, such as an application specific integrated circuit or a system on a chip.
It is to be appreciated that the data-holding subsystem includes one or more physical, non-transitory devices. In contrast, in some embodiments aspects of the instructions described herein may be propagated in a transitory fashion by a pure signal (e.g., an electromagnetic signal, an optical signal, etc.) that is not held by a physical device for at least a finite duration. Furthermore, data and/or other forms of information pertaining to the present disclosure may be propagated by a pure signal.
The terms “module,” “program,” and “engine” may be used to describe an aspect of the computing system that is implemented to perform one or more particular functions. In some cases, such a module, program, or engine may be instantiated via the logic subsystem executing instructions held by the data-holding subsystem. It is to be understood that different modules, programs, and/or engines may be instantiated from the same application, service, code block, object, library, routine, API, function, etc. Likewise, the same module, program, and/or engine may be instantiated by different applications, services, code blocks, objects, routines, APIs, functions, etc. The terms “module,” “program,” and “engine” are meant to encompass individual or groups of executable files, data files, libraries, drivers, scripts, database records, etc.
It is to be appreciated that a “service”, as used herein, may be an application program executable across multiple user sessions and available to one or more system components, programs, and/or other services. In some implementations, a service may run on a server responsive to a request from a client.
When included, a display subsystem may be used to present a visual representation of data held by the data-holding subsystem. As the herein described methods and processes change the data held by the data-holding subsystem, and thus transform the state of the data-holding subsystem, the state of display subsystem may likewise be transformed to visually represent changes in the underlying data. The display subsystem may include one or more display devices utilizing virtually any type of technology. Such display devices may be combined with the logic subsystem and/or the data-holding subsystem in a shared enclosure, or such display devices may be peripheral display devices.
It is to be understood that the configurations and/or approaches described herein are exemplary in nature, and that these specific embodiments or examples are not to be considered in a limiting sense, because numerous variations are possible. The specific routines or methods described herein may represent one or more of any number of processing strategies. As such, various acts illustrated may be performed in the sequence illustrated, in other sequences, in parallel, or in some cases omitted. Likewise, the order of the above-described processes may be changed.
The subject matter of the present disclosure includes all novel and nonobvious combinations and subcombinations of the various processes, systems and configurations, and other features, functions, acts, and/or properties disclosed herein, as well as any and all equivalents thereof.