AUDIO/VIDEO SYSTEM WITH VIEWER-STATE BASED RECOMMENDATIONS AND METHODS FOR USE THEREWITH

TECHNICAL FIELD

The present disclosure relates to audio/video systems that process and present audio and/or display video signals.

DESCRIPTION OF RELATED ART

Modern users have many options to view audio/video programming. Home media systems can include a television, a home theater audio system, a set top box and digital audio and/or A/V player. The user typically is provided one or more remote control devices that respond to direct user interactions such as buttons, keys or a touch screen to control the functions and features of the device. Audio/video content is also available via a personal computer, smartphone or other device. Such devices are typically controlled via a buttons, keys, a mouse or other pointing device or a touch screen.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

FIGS. 1-4 present pictorial diagram representations of various video devices in accordance with embodiments of the present disclosure.

FIG. 5 presents a block diagram representation of a system in accordance with an embodiment of the present disclosure.

FIG. 6 presents a pictorial representation of a screen display in accordance with an embodiment of the present disclosure.

FIG. 7 presents a pictorial representation of a screen display in accordance with an embodiment of the present disclosure.

FIG. 8 presents a block diagram representation of a user interest processor in accordance with an embodiment of the present disclosure.

FIG. 9 presents a pictorial representation of a presentation area in accordance with an embodiment of the present disclosure.

FIG. 10 presents a pictorial representation of a video image in accordance with an embodiment of the present disclosure.

FIG. 11 presents a graphical diagram representation of interest data in accordance with an embodiment of the present invention.

FIGS. 12 and 13 present pictorial diagram representations of components of a video system in accordance with embodiments of the present invention.

FIGS. 14 and 15 present pictorial diagram representations of video systems in accordance with embodiments of the present invention.

FIG. 16 presents a flowchart representation of a method in accordance with an embodiment of the present disclosure.

FIG. 17 presents a flowchart representation of a method in accordance with an embodiment of the present disclosure.

DETAILED DESCRIPTION

FIGS. 1-4 present pictorial diagram representations of various video devices in accordance with embodiments of the present disclosure. In particular, device 10 represents a set top box with or without built-in digital video recorder functionality or a stand-alone digital video player such as an internet video player, Blu-ray player, digital video disc (DVD) player or other video player. Device 20 represents a tablet computer, smartphone, phablet or other communications device. Device 30 represents a laptop, netbook or other portable computer. Device 40 represents a video display device such as a television or monitor. Device 50 represents an audio player such as a compact disc (CD) player, a MP3 player or other audio player.

The devices 10, 20, 30, 40 and 50 each represent examples of electronic devices that incorporate one or more elements of a system 125 that includes features or functions of the present disclosure. While these particular devices are illustrated, system 125 includes any device or combination of devices that is capable of performing one or more of the functions and features described in conjunction with FIGS. 5-16 and the appended claims.

FIG. 5 presents a block diagram representation of a system in accordance with an embodiment of the present disclosure. In an embodiment, system 125 includes a face 100, such as a television receiver, cable television receiver, satellite broadcast receiver, broadband modem, a Multimedia over Coax Alliance (MoCA) interface, Ethernet interface, local area network transceiver, Bluetooth, 3G or 4G transceiver and/or other information receiver or transceiver or network interface that is capable of receiving a received signal 98 and extracting one or more audio/video signals 110. In some implementations in the field, the audio video signals 110 is typically encrypted or copy protected (known collectively as Digital Rights Management or DRM). The decryption module and buffer 101 decrypts, descrambles or otherwise addresses the DRM so the rest of the system can process the unencrypted data available via clear A/V signal 111.

In addition to receiving video signal 98, the network interface 100 can provide an Internet connection, local area network connection or other wired or wireless connection to a remote recommendations database 94, as well as one or more portable device 103 such as tablets, smart phones, lap top computers or other portable devices. While shown as a single device, network interface 100 can be implemented by two or more separate devices, for example, to receive the received signal 98 via one network and to communicate with portable devices 103 and recommendations database 94 via one or more other networks.

The received signal 98 can be a broadcast video signal, such as a television signal, high definition television signal, enhanced definition television signal or other broadcast video signal that has been transmitted over a wireless medium, either directly or through one or more satellites or other relay stations or through a cable network, optical network or other transmission network. In addition, received signal 98 can be generated from a stored video file, played back from a recording medium such as a magnetic tape, magnetic disk or optical disk, and can include a streaming video signal that is transmitted over a public or private network such as a local area network, wide area network, metropolitan area network or the Internet.

Received signal 98 can include a compressed digital video signal complying with a digital video codec standard such as H.264, MPEG-4 Part 10 Advanced Video Coding (AVC), VC-1, H.265, or another digital format such as a Motion Picture Experts Group (MPEG) format (such as MPEG1, MPEG2 or MPEG4), QuickTime format, Real Media format, Windows Media Video (WMV) or Audio Video Interleave (AVI), etc. When the received signal 98 includes a compressed digital video signal, a decoding module 102 or other video codec decompresses the clear AV signal 111 from decryption module and buffer 101 to produce a decoded audio/video signal 112 suitable for display by a video display device of audio/video player 104 that creates an optical image stream either directly or indirectly, such as by projection. In the case where the A/V signal is not protected, the decryption module and buffer 101 can act as a pass through buffer of the signals as decryption/descrambling is not needed in this case.

In addition or in the alternative embodiment, the received signal 98 can include an audio component of a video signal, a broadcast audio signal, such as a radio signal, high definition radio signal or other audio signal that has been transmitted over a wireless medium, either directly or through one or more satellites or other relay stations or through a cable network, optical network or other transmission network. In addition, received signal 98 can be an audio component of a stored video file or streamed video signal, an MPEG3 (MP3) or other digital audio signal generated from a stored audio file, played back from a recording medium such as a magnetic tape, magnetic disk or optical disk, and can include a streaming audio signal that is transmitted over a public or private network such as a local area network, wide area network, metropolitan area network or the Internet.

When the received signal 98 includes a compressed digital audio signal, the decoding module 102 can decompress the clear A/V signal 111 from decryption module and buffer 101 and otherwise process the clear A/V signal 111 to produce a decoded audio signal suitable for presentation by an audio player included in audio/video player 104 and further to extract time-coded metadata 114 that indicates the content of the video program at various times. The decoded audio/video signal 112 can include a high definition media interface (HDMI) signal, digital video interface (DVI) signal, a composite video signal, a component video signal, an S-video signal, and/or one or more analog or digital audio signals.

When clear A/V signal 111 is received as digital video and the decoded video signal 112 is produced in a digital video format, the digital video signal may be optionally scrambled or encrypted, may include corresponding audio and may be formatted for transport via one or more container formats. Examples of such container formats are encrypted Internet Protocol (IP) packets such as used in IP TV, Digital Transmission Content Protection (DTCP), etc. In this case the payload of IP packets contain several transport stream (TS) packets and the entire payload of the IP packet is encrypted. Other examples of container formats include encrypted TS streams used in Satellite/Cable Broadcast, etc. In these cases, the payload of TS packets contain packetized elementary stream (PES) packets. Further, digital video discs (DVDs) and Blu-Ray Discs (BDs) utilize PES streams where the payload of each PES packet is encrypted. When the received signal 98 is scrambled or encrypted, the decoding module 102 will always be working on the clear A/V signal 111 from decryption module and buffer 101 to produce the decoded audio/video signal 112.

In an embodiment, the decoding module 102 not only decodes the clear A/V signal 111 but also includes a pattern recognition module to detect patterns of interest in the video signal and to generate time-coded metadata 114 that indicates patterns and corresponding features, such as people, objects, places, activities or other features as well as timing information that correlates the presence or absence of these people, objects, places, activities or other features in particular images in the decoded A/V signal 112. Examples of such a decoding module 102 is presented in conjunction with the U.S. Published Application 2013/0279603, entitled, VIDEO PROCESSING SYSTEM WITH VIDEO TO TEXT DESCRIPTION GENERATION, SEARCH SYSTEM AND METHODS FOR USE THEREWITH, the contents of which are incorporated herein by reference for any and all purposes. In addition or in the alternative, the decoding module 102 extracts time coded metadata 114 that was already included in the clear A/V signal 111. For example, the clear A/V signal 111 can have the time coded metadata 114 embedded as a watermark or other signal in the video content itself, or be in some different format that includes the video content from the received signal 98 and the time-coded metadata 114 as described in U.S. Pat. No. 8,842,879, entitled, “VIDEO PROCESSING DEVICE FOR EMBEDDING TIME-CODED METADATA AND METHODS FOR USE THEREWITH,” the contents of which are incorporated herein by reference for any and all purposes.

The system 125 includes a user interest processor 120 for use with the audio/video (A/V) player 104 that is playing a video program included in the decoded A/V signal 112. In particular, the user interest processor 120 includes a user interest analysis (UIA) generator 124 and a viewer state generator 128 that are both configured to analyze input data corresponding to the viewing of the video program via the A/V player 104 by one or more viewers. The input data can include sensor data 108 generated by one or more viewer sensors 106, A/V commands and/or other A/V control data 122 from the A/V player 104, input data received from one or more portable devices 103. The UIA generator 124 analyzes the input data to determine a period of interest corresponding to a viewer, to viewers collectively or to viewers individually and generates viewer interest data that indicates these periods of viewer interest. The viewer state generator 128 analyzes the input data to determine a viewer state of the at least one viewer corresponding to the plurality of video programs and to generate viewer state data that indicates the viewer state corresponding to the plurality of video programs.

Currently, Netflix and TiVo and others create customized recommendations for programming such as movies. They look at what content were watched by a user and generic movie info like genre, sports teams, actors, etc. to similar videos. The problem is, they don't necessarily know which viewers are watching, or know if any particular viewer liked a movie and they don't really know what portions of the movie that the viewer or viewers liked.

In an optional addition to the system, the portable device 103 with remote control or user enhancement (UE) application 130 is a special case of portable device(s) 103 such as a smartphone that is typically not shared with users and can be used as a unique way to identify the particular user/viewer to the system. A remote control application or user enhancement application 130 can interface with the system 125 via network interface 100, such as via Bluetooth, WiFi, Infrared or other wireless link. Besides operating as a remote control to control the operation of the system 125, it also can also store user preferences, including playlists, background desktop and other configurable user settings that would improve the navigation or visual experience to each individual user. The portable device 103 in this case then can also be used to uniquely identify each viewer and can improve the accuracy and effectiveness of user interest processor 120 in building a useful database for each unique user.

The user interest analysis generator 124 can be used to identify precise content features that are of interest to a particular viewer that can be used to generate the customized recommendations via recommendation selection generator 126. The recommendation selection generator 126 of the user interest processor 120 is configured to process the viewer interest data and time coded metadata 114 corresponding to the video program, and automatically generates recommendation data indicating at least one additional video program related to content of the video program during the period of interest. Because actual interest is monitored and correlated to particular content being displayed at that time, a wider range of features can be extracted and used to generate recommendations. Obscure actors of interest, fleeting scenes relating to a particular setting or a particular activity can be used to locate recommendations that are more focused on these features of interest that occur at particular times in a video.

In addition to monitoring viewer's interest in content during a video, the viewer state generator 128 generates viewer state data that indicates the viewer state of the viewers, such as the activities and emotional states of viewers during a video. For example, the activities and experiences surrounding the viewing of a video program such as eating snack foods, drinking beer or wine, texting, etc. can be identified and monitored. Viewer state information can be generated that indicates that the video program caused Mom to be sad, Dad to be excited and/or the kids to be bored. The system can consider what a user is doing on other devices, like looking up an actor on IMDB or searching the internet relating to Paris that may indicate an activity during the video such texting, web surfing. This viewer state information can also be used, in addition to or as an alternative to viewer interest in generating recommendations. In particular, the recommendation selection generator 126 is configured to process the viewer state data, and desired viewer state data than indicates a desired viewer state to generate recommendation data indicating at least one additional video program.

In an embodiment, the viewer specifies desired viewer state data either as preferences stored in a viewer profile or otherwise interactively prior to receiving recommendations. The desired viewer state data can indicate a desired emotion state such as a desired level of happiness/sadness, a desired level of excitement, a desired level of fear, etc. The desired viewer state data can also, or in the alternative, indicate a desired activity associated with the viewing such as eating, drinking, Internet activities, texting, dozing, etc. The recommendation selection generator 126 then generates the recommendation data based on a comparison of the desired viewer state data to the viewer state data corresponding to a plurality of previously viewed video programs. In this fashion, activities and/or emotional states of the viewer can be correlated with viewer interest and used to generate video program recommendations that attempt to recreate the entire experience in other video programs. In one example of operation, the viewer can specify desired viewer state data that indicates that he/she does not want too much fear or too much excitement—because the viewer plans to go to bed afterward and too much excitement could interfere with a desire for a full night's sleep. In another example, the viewer is hosting a party and wishes for recommendations that fit well with a party setting. The desired viewer state data can be received as A/V control data 122 via the viewer's interaction with A/V player, from the portable device(s) 103 and/or can be stored in a memory associated with the user interest processor 120.

In an embodiment, the recommendation selection generator 126 compares the desired viewer state data to the viewer state data corresponding to a plurality of video programs previously watched by the viewer to determine a subset of the plurality of video programs that match the desired viewer state data. In accordance with the previous example where the desired viewer state data indicates a desire for not too much fear or too much excitement, the recommendation selection generator 126 can identify a subset of previously viewed videos that did not generate too much fear or too much excitement in the viewer. The recommendation selection generator 126 can then generate recommendation data by searching a recommendation database, such as recommendations database 94 to determine one or more video programs to recommend that are correlated or similar to one or more of the subset of the plurality of video programs with corresponding viewer state data that match the desired viewer state data. While this search can be based only on general properties of the subset of previously watched video programs, in other embodiments, determining a correlation or similarity between the subset of previously viewed video programs and other video programs can be aided by also considering the interest data associated with this subset of previously viewed video programs. In this case, the recommendation selection generator 126 can generate the recommendation data by searching a recommendation database to determine that the at least one additional video program is correlated to time coded metadata indicating content during the period of interest corresponding to the at least one viewer in the subset of the plurality of video programs.

The operation of the user interest processor 120 can be described in conjunction with a further example where the desired viewer state data indicates a desire for the viewer to doze during the video program. The recommendation selection generator 126 can first generate a subset of previously watched videos where the viewer was engaged in the activity of dozing. Time coded metadata indicating content during a period of interest of the viewer while watching this subset of previously viewed videos is used to determine genres, actors, settings, situations, etc. in these videos that can aid in the search for additional video programs that might also be of interest to the viewer, but nevertheless recreate this same experience for the viewer.

The recommendation data can be presented for display to the viewer by a display device, such as the display device 105 associated with the A/V player 104. For example, the display device 105 can concurrently display at least a portion of the video program in conjunction with the recommendations data in a split screen mode, as a graphical or other media overlay or in other combinations during or after the presentation of the video program. In addition or in the alternative, the select portions of the recommendations data can be displayed on a display device associated with one or more portable devices 103 associated with the viewer or viewers—separately from the A/V player 104. Consider an example where the system 125 is implemented via a set top box and television with an associated cable connection. In addition, the network interface 100 of the system 125 further includes a cable modem with MoCA and WiFi capability that can communicate with the set top box via WiFi or MoCA, with the portable devices 103 via WiFi either directly or via a MoCA bridge device. In this fashion, a family viewing a video program on the television associated with the set top box can view the recommendations data via the portable devices 103 that are held by the family members.

In an embodiment, the user interest processor 120 operates based on input data that includes image data in a presentation area of the A/V player 104. For example, a viewer sensor 106 generates sensor data 108 in a presentation area of the A/V player 104. The viewer sensor 106 can include a digital camera such as a still or video camera that is either a stand-alone device, or is incorporated in any one of the devices 10, 20, 30, 40 or 50 or other device that generates sensor data 108 in the form of image data. In addition or in the alternative, the viewer sensor 106 can include an infrared sensor, thermal imager, background temperature sensor or other thermal sensor, an ultrasonic sensor or other sonar-based sensor, a proximity sensor, an audio sensor such as a microphone, a motion sensor, brightness sensor, wind speed sensor, humidity sensor, one or more biometric sensors and/or other sensors for generating sensor data 108 that can be used by the user interest analysis generator 124 for determining the presence of viewers, for identifying particular viewers, for characterizing their activities, their emotional states and/or for determining that one or more viewers are currently interested in the content of the video program and for generating viewer interest data in response thereto.

Consider again an example where a family is watching TV. One or more video cameras are stand-alone devices or are built into the TV, a set top, Blu-Ray player, or mobile devices associated with the users. The camera or cameras capture video of the presentation environment and users. The system 125 processes the video and detects if there are viewers present, how many viewers are present, the identities of each of the viewers and further the activities engaged in by each of the viewers and/or to determine the period of interest by each of the viewers and other emotional states. In particular, the system 125 determines which users are watching closely and are interested in or excited by, scared by, happy about what is being shown, from what angles they are watching, which users are not watching closely or engaged in a conversation, which users are not watching at all, and which users are asleep, texting, surfing the Internet, engaged with social media, etc.

In an embodiment, the viewer state generator 128 generates viewer state data and the user interest analysis generator 124 determines a period of interest corresponding to one or more viewers based on facial modeling and recognition that the at least one viewer has a facial expression corresponding to interest or some other emotional state. In addition, the input data can include audio data from a viewer sensor 106 in the form of a microphone included in a presentation area of the A/V player 104. The user interest analysis generator 124 can determine a period of interest or other emotional state corresponding to the at least one viewer based on recognition that utterances by the at least one viewer correspond to interest, fear, excitement, sadness, happiness, etc. An excited voice from a user can indicate interest, while a side conversation unrelated the video content can indicate a lack of interest and snoring can indicate that a viewer is dozing.

In another embodiment, the input data can include A/V control data 122 that includes commands from the A/V player 104 such as a pause command or a specific user interest command that is generated in response to commands issued by a user via a user interface of the A/V player 104. The user interest analysis generator 124 can determine a period of interest based on pausing of the video, and/or in response to a specific user indication of interest via another command. For example, when a viewer is interested in an actor/actress playing in a video and pauses the video, input data in the form of A/V control data 122 is presented to the user interest processor 120, the user interest analysis generator 124 detects the pause command and indicates a period of interest. The recommendation selection generator 126 analyzes the time coded metadata 114 to determine the actors or actresses, scenes, places, situations, etc. that are currently shown in the paused scene of the video program. As previously discussed the time coded metadata 114 can be generated by the decoding module 102 operating to automatically recognize the actor/actress in the video program at this point or based on other time coded metadata 114 extracted from the decoded A/V data. The recommendation selection generator 126 can then generate video program recommendations pertaining to the actor/actress in the video program at this point in the video such as his/her other films. This recommendation data can be passed to the A/V player 104 as A/V control data for display on the display device 105 during or after the video program and/or passed to one or more portable devices 103 via network interface 100 during the presentation of the video program or after the conclusion of the video program.

In another embodiment, the input data includes sensor data 108 from at least one biometric sensor associated with the viewer or viewers. The viewer state generator 128 generates viewer state data and the user interest analysis generator 124 determines a period of interest or excitement corresponding to the viewer or viewers based on recognition that the sensor data 108 indicates interest of the viewer or viewers or other emotional state. Such biometric sensor data 108 in response to, or that otherwise indicates, the interest or excitement of the user—in particular, the user's interest associated with the display of the video program by the A/V player 104.

In an embodiment, the viewer state generator 128 generates viewer state data and the user interest analysis generator 124 generates viewer interest data that indicates the periods of interest either on an individual viewer basis or collectively based on interest by any, all or a majority of viewers that are present. In circumstances where the recommendation data is passed to one or more portable devices 103 via network interface 100, individual interest on the part of a single user can trigger the recommendation data to be sent to only the viewer or viewers that are showing interest at the time.

In an embodiment, the viewer sensors 106 can include an optical sensor, resistive touch sensor, capacitive touch sensor or other sensor that monitors the heart rate and/or level of perspiration of the user. In these embodiments, a high level of interest or other emotional state can be determined by the viewer state generator 128 or the user interest analysis generator 124 based on a sudden increase in heart rate or perspiration.

In an embodiment, the viewer sensors 106 can include a microphone that captures the voice of the user and/or voices or others in the surrounding area. In these cases the voice of the user can be analyzed by the user interest analysis generator 124 and/or by the viewer state generator 128 based on speech patterns such as pitch, cadence or other factors and/or cheers, applause or other sounds can be analyzed to detect a high level of interest or other emotional state of the user or others.

In an embodiment, the viewer sensors 106 can include an imaging sensor or other sensor that generates a biometric signal that indicates a dilation of an eye of the user and/or a wideness of opening of an eye of the user. In these cases, a high level of user interest or other emotional state can be determined by the user interest analysis generator 124 and the viewer state generator 128 based on a sudden dilation of the user's eyes and/or based on a sudden widening of the eyes. It should be noted that multiple viewer sensors 106 can be implemented and the user interest analysis generator 124 can generate interest data and/or the viewer state generator 128 can generate viewer state data based on an analysis of the sensor data 108 from each of multiple viewer sensors 106. In this fashion, periods of time corresponding to high levels of interest and other emotional states and activities can be more accurately determined based on multiple different criteria.

Consider an example where a family is watching a video program. A sudden increase in heart rate, perspiration, eye wideness, pupil dilation, smile, changes in voice and spontaneous cheers, may together or separately indicate that one or more particular viewers have suddenly become interested, and highly excited. This period of excitement can be used to characterise the viewer's experience with the video program and further used to select portions of time coded metadata associated with the particular actors, places, events or situations, and/or objects and be used to generate recommendations data that other video programs that relate to these particular actors, places, events or situations, and/or objects. The recommendation data can be presented for display to all the viewers via the display device 105 or only to the particular viewer or viewers showing interest via portable device(s) 103 associated with these viewer(s).

It should be noted that while the sensor data 108 has been primarily described as coming from standalone sensors 106, sensors in a portable device or devices 103 in communication with network interface 100 and associated with one or more viewers can also be used to generate any of the input data previously described and further used to associate periods of viewer interest and generate viewer state data for particular viewers. Other input data can be generated by portable devices 103 for use by user interest analysis generator 124 and the viewer state generator 128. Consider a case where the portable device 103 includes an application or app such as a social media application, a browser application, or a media database application, that is downloaded to the portable device 103 and executed by the user/viewer. Input data can be generated by one or more of these apps to indicate user/viewer interest and other activities such as texting, engaging in social media, Internet browsing, etc. In particular, interest in a video can inspire someone to use a portable device and go looking for related topics on the Internet. The portable device may not be directly linked to the video and this may be interpreted as either interest or disinterest depending on the content of the information being accessed. For example, if a viewer is watching a movie and searching for an actor in a media database application such as IMDB or via a web browser, this portable device input data can be used by the user interest analysis generator 124 to indicate a period of interest and time coded metadata corresponding to the actor can be selected for display. In a similar fashion, a viewer that is generating a Facebook post or Twitter tweet regarding a particular actor can be used in determining a period of interest for that particular user/viewer. In the alternative, accessing unrelated information on the Internet, playing an unrelated game or engaging in other unrelated activities can generate portable device input data that can be used by the user interest analysis generator 124 to indicate a period of disinterest. In addition to receiving portable device input data from the device itself, in an embodiment other methods to monitor browsing traffic or other input data can be employed such as monitoring activity and receiving portable device input data through a home gateway, a remote server or other device.

In an embodiment, the user interest analysis generator 124 and the viewer state generator 128 can operate to identify the particular user based on input data such as: (1) User mobile device WiFi or other unique identifiers; (2) pattern or voice or face recognition of the user; (3) fingerprint recognition on any remote input device such as a remote control; (4) explicit choice by user on self-identification and/or (4) input from the remote control or user enhancement application 130. The user interest analysis generator 124 and the viewer state generator 128 can extract interest information simultaneously for multiple different viewers, e.g. Dad is interested in and excited by the action scenes, Mom liked the romance but was sad at the end of the video, the daughter was excited and really liked the actor that played the boy next door. In one mode of operation, the recommendation selection generator 126 is a self learning system, so for example, it can ship with a default set of rules based on geographical location derived from GPS or any location services available. The default system settings can include a default set of interesting topics associated with that geographic region. The system can, over time, collect profile data for each unique user in the home by picking up unique users as described previously and storing interest data regarding their interests and viewer state data for each video program that was viewed. In this fashion, the profile data for a particular viewer could start with all the sports teams available in the region and general user demographic data such as home renovation or gardening if the particular neighborhood has some likelihood of interest. Further any known information of the household obtained from social media could be made available for review by the key owners of the household and also used to feed the recommendation selection generator 126 for the system in setting up rules for each individual viewer. A socially active cyclist for example can be expected to get many cycling recommendations to start with. Further, with each user selection, over time, the system will learn what each individual user chooses to watch or repeat or skip over and what generates interest. The recommendation selection generator 126 builds up this user's likes and dislikes and viewer states associated with programs watched to update the profiles used by the recommendation selection generator 126 to match the history of choices, reactions and states associated with each user.

The decoding module 102, decryption module and buffer 101, A/V player 104 and the user interest processor 120 can each be implemented using a single processing device or a plurality of processing devices. Such a processing device may be a microprocessor, co-processors, a micro-controller, digital signal processor, microcomputer, central processing unit, field programmable gate array, programmable logic device, state machine, logic circuitry, analog circuitry, digital circuitry, and/or any device that manipulates signals (analog and/or digital) based on operational instructions that are stored in a memory. These memories may each be a single memory device or a plurality of memory devices. Such a memory device can include a hard disk drive or other disk drive, read-only memory, random access memory, volatile memory, non-volatile memory, static memory, dynamic memory, flash memory, cache memory, and/or any device that stores digital information. Note that when decoding module 102, decryption module and buffer 101, A/V player 104 and the user interest processor 120 implement one or more of their functions via a state machine, analog circuitry, digital circuitry, and/or logic circuitry, the memory storing the corresponding operational instructions may be embedded within, or external to, the circuitry comprising the state machine, analog circuitry, digital circuitry, and/or logic circuitry.

While the user interest analysis generator 124, the recommendation selection generator 126 and viewer state generator 128 are shown as separate portions of the user interest processor 120, these three modules can be implemented together or separately. While the recommendations database 94 is shown separately from the system 125, the recommendations database 94 can be incorporated in the user interest processor 120. While system 125 is shown as an integrated system, it should be noted that the system 125 can be implemented as a single device or as a plurality of individual components that communicate with one another wirelessly and/or via one or more wired connections. The further operation of video system 125, including illustrative examples and several optional functions and features is described in greater detail in conjunction with FIGS. 6-17 that follow.

FIG. 6 presents a pictorial representation of a screen display in accordance with an embodiment of the present disclosure. In particular, a screen display generated in conjunction with a system, such as system 125, is described in conjunction with functions and features of FIG. 5 that are referred to by common reference numerals.

In this example, a screen display 140 of a user interface of A/V player 104 is shown where a viewer has identified himself as the user “Dad” and has selected desired viewer state data that is passed to the user interest processor 120 as A/V control data 122. The recommendation selection generator 126 then selects and/or retrieves information pertaining to films to recommend. This recommendation data is passed to the A/V player 104 as A/V control data 122 on the display device 105 in region 144.

FIG. 7 presents a pictorial representation of a screen display in accordance with an embodiment of the present disclosure. In particular, a screen display generated in conjunction with a system, such as system 125, is described in conjunction with functions and features of FIG. 5 that are referred to by common reference numerals.

In this example, on the display device of portable device 103 is shown where a viewer has identified himself as the user “Dad” and has selected desired viewer state data that is passed to the user interest processor 120 via network interface 100. The recommendation selection generator 126 then selects and/or retrieves information pertaining to films to recommend. This recommendation data can be passed to network interface 100 for display on the display device of portable device 103, such as the tablet 14 shown.

FIG. 8 presents a block diagram representation of a user interest processor in accordance with an embodiment of the present disclosure. In particular, a block diagram is presented in conjunction with a system, such as system 125, that is described in conjunction with functions and features of FIG. 5 referred to by common reference numerals.

In particular, the user interest processor 120 includes a user interest analysis (UIA) generator 124 and a viewer state generator 128 that are both configured to analyze input data 99 corresponding to the viewing of the video program via the A/V player 104 by one or more viewers. The input data can be sensor data 108 generated by one or more viewer sensors 106, A/V commands and/or other A/V control data 122 from the A/V player 104, portable device input data 121 received from one or more portable devices 103. The UIA generator 124 analyzes the input data 99 to determine a period of interest corresponding to a viewer, to viewers collectively or to viewers individually and generates viewer interest data 75 that indicates these periods of viewer interest. The viewer state generator 128 analyzes the input data 99 to determine a viewer state of the at least one viewer corresponding to the plurality of video programs and to generate viewer state data 85 that indicates the viewer state corresponding to the plurality of video programs. The recommendation selection generator 126 is configured to process the viewer state data 85, and desired viewer state data received via either A/V control data 122 or portable device input 121 than indicates a desired viewer state to generate recommendation data indicating at least one additional video program to recommend.

This recommendation data is output as A/V control data 122 for display to the viewer by a display device, such as the display device 105 associated with the A/V player 104. In addition or in the alternative, the recommendation data can be output as secondary device output 123 for display on a display device associated with one or more portable devices 103 associated with the viewer or viewers—separately from the A/V player 104.

In an embodiment, the recommendation selection generator 126 implements a clustering algorithm, a heuristic prediction engine and/or artificial intelligence engine that operates in conjunction with a recommendations database 94 and optionally profile data collected that pertains to one or more viewers of the video program. As previously discussed, the recommendations can be tailored to the particular viewers based on viewer interest data 75 and/or viewer state data 85 associated with each viewer as well as profile data gathered and stored for each viewer.

FIG. 9 presents a pictorial representation of a presentation area in accordance with an embodiment of the present disclosure. In particular, the use of an example system 125 presented in conjunction with FIG. 5 is shown.

In this example, a viewer sensor 106 generates sensor data 108 in a presentation area 220 of the A/V player 104. The A/V player 104 includes a flat screen television 200 and speakers 210 and 212. The viewer sensor 106 can include a digital camera such as a still or video camera that is either a stand-alone device, or is incorporated in the flat screen television 200 and that generates sensor data 108 that includes image data. The viewer state generator 128 and the user interest analysis generator 124 analyze the sensor data 108 to detect and recognize the users 204 and 206 of the A/V player 104 and their level of interest, other emotional states and/or activities engaged in while the current video content being displayed.

FIG. 10 presents a pictorial representation of a video image in accordance with an embodiment of the present disclosure. In particular, a screen display 230 generated in conjunction with a system, such as system 125, is described in conjunction with functions and features of FIG. 5 that are referred to by common reference numerals.

In an embodiment, the viewer state generator 128 and the user interest analysis generator 124 generate viewer state data 85 and viewer interest data 75 corresponding to one or more viewers based on facial modelling and recognition of a viewer. In an embodiment, the viewer state generator 128 and the user interest analysis generator 124 analyze the sensor data 108 to determine a number of users that are present, the locations of the users, the viewing angle for each of the users and further viewer interest and emotion states while the audio or video content is being presented or otherwise displayed. These factors can be determined via a look-up table, state machine, algorithm or other logic.

In one mode of operation, the viewer state generator 128 and the user interest analysis generator 124 analyze sensor data 108 in the form of image data together with a skin color model used to roughly partition face candidates. The viewer state generator 128 and the user interest analysis generator 124 identify and track candidate facial regions over a plurality of images (such as a sequence of images of the image data) and detect a face in the image based on the one or more of these images. For example, the viewer state generator 128 and the user interest analysis generator 124 can operate via detection of colors in the image data. The viewer state generator 128 and the user interest analysis generator 124 generate a color bias corrected image from the image data and a color transformed image from the color bias corrected image. The viewer state generator 128 and the user interest analysis generator 124 then operate to detect colors in the color transformed image that correspond to skin tones. In particular, the viewer state generator 128 and the user interest analysis generator 124 can operate using an elliptic skin model in the transformed space such as a C_bC_rsubspace of a transformed YC_bC_rspace. In particular, a parametric ellipse corresponding to contours of constant Mahalanobis distance can be constructed under the assumption of Gaussian skin tone distribution to identify a facial region based on a two-dimension projection in the C_bC_rsubspace. As exemplars, the 853,571 pixels corresponding to skin patches from the Heinrich-Hertz-Institute image database can be used for this purpose, however, other exemplars can likewise be used in broader scope of the present disclosure.

In an embodiment, the viewer state generator 128 and the user interest analysis generator 124 track candidate facial regions over a sequence of images and detect a facial region based on an identification of facial motion and/or facial features in the candidate facial region over the sequence of images. This technique is based on 3D human face model that looks like a mesh. For example, face candidates can be validated for face detection based on the further recognition by the viewer state generator 128 and the user interest analysis generator 124 of facial features, like eye blinking (both eyes blink together, which discriminates face motion from others; the eyes are symmetrically positioned with a fixed separation, which provides a means to normalize the size and orientation of the head), shape, size, motion and relative position of face, eyebrows, eyes, nose, mouth, cheekbones and jaw. Any of these facial features extracted from the image data can be used by the viewer state generator 128 and the user interest analysis generator 124 to detect and recognize each viewer that is present.

Further, the viewer state generator 128 and the user interest analysis generator 124 can employ temporal recognition to extract three-dimensional features based on different facial perspectives included in the plurality of images to improve the accuracy of the detection and recognition of the face of each viewer. Using temporal information, the problems of face detection including poor lighting, partially covering, size and posture sensitivity can be partly solved based on such facial tracking. Furthermore, based on profile view from a range of viewing angles, more accurate and 3D features such as contour of eye sockets, nose and chin can be extracted.

Based on the number facial regions that are detected, the number of users present can be identified. In addition, the viewer state generator 128 and the user interest analysis generator 124 can identify the viewing angle of the users that are present based on the position of the detected faces in the field of view of the image data. In addition, the emotion states and activities being performed by each user can be determined based on an extraction of facial characteristic data such as relative position of face, position and condition of the eyebrows, eyes, nose, mouth, cheekbones and jaw, etc.

In addition to detecting and identifying the particular users, the viewer state generator 128 and the user interest analysis generator 124 can further analyze the faces of the users to generate viewer interest data 75 that indicates periods of viewer interest in particular content and viewer state data 85 that indicates emotional states and activities. In an embodiment, the image capture device is incorporated in the video display device such as a TV or monitor or is otherwise positioned so that the position and orientation of the users with respect to the video display device can be detected. In an embodiment the orientation of the face is determined to indicate whether or not the user is facing the video display device and whether the viewer is smiling. In this fashion, when the user's head is down or facing elsewhere, the user's level of interest in the content being displayed is low. Likewise, if the eyes of the user are closed for an extended period indicating sleep, the user's interest in the displayed content can be determined to be low. If, on the other hand, the user is facing the video display device and/or the position of the eyes and condition of the mouth indicate a heighten level of awareness, the user's interest can be determined to be high.

For example, a user can be determined to be watching closely if the face is pointed at the display screen and the eyes are open except during blinking events. Further other aspects of the face such as the eyebrows and mouth may change positions indicating that the user is following the display with interest or that the users is happy or sad. A user can be determined to be not watching closely if the face is not pointed at the display screen for more than a transitory period of time. A user can be determined to be engaged in conversation if the face is not pointed at the display screen for more than a transitory period of time, audio conversation is detected from one or more viewers, the face is pointed toward another user and/or if the mouth of the user is moving. A user can be determined to be sleeping if the eyes of the user are closed for more than a transitory period of time and/or if other aspects of the face such as the eyebrows and mouth fail to change positions over an extended period of time.

FIG. 11 presents a graphical diagram representation of interest data in accordance with an embodiment of the present invention. In particular, a graph of viewer interest data 75 as a function of time, generated in conjunction with a system, such as system 125, is described in conjunction with functions and features of FIG. 5 that are referred to by common reference numerals. In this example, an analysis of input data 99 are used to generate binary interest data that indicate periods of time that the viewer has reached a high level of interest. In the example shown, the viewer interest data 75 is presented as a binary value with a high logic state (periods 262 and 266) corresponding to high interest and a low logic state (periods 260, 264 and 268) corresponding to a low level of interest or otherwise a lack of high interest. While a single set of viewer interest data 75 is shown, this viewer interest data 75 can represent a collective group of viewers of a single viewer. While not specifically shown, viewer interest data 75 of this kind can be separately generated and tracked for a plurality of different viewers.

In an embodiment, the timing of periods 262 and 266 can be correlated to time stamps of clear video signal 111 to generate recommendations data based on the determine time-coded metadata 114 corresponding to the video content during these periods of high interest of the viewer or viewers. While the viewer interest data 75 is shown as a binary value, in other embodiments, viewer interest data 75 can be a multivalued signal that indicates a specific level of interest of the viewer or others and/or a rate of increase in interest of the viewer or viewers.

FIGS. 12 and 13 present pictorial diagram representations of components of a video system in accordance with embodiments of the present invention. In particular, a pair of glasses/goggles 16 are presented that can be used to implement system 125 or a component of video system 125.

The glasses/goggles 16, such as 3D viewing goggles or video display goggles include viewer sensors 106 in the form of perspiration and/or viewer sensors incorporated in the nosepiece 254, bows 258 and/or earpieces 256 as shown in FIG. 12. In addition, one or more imaging sensors implemented in the frames 252 can be used to indicate eye wideness and pupil dilation of an eye of the wearer 250 as shown in FIG. 13.

In an embodiment, the glasses/goggles 16 further include a short-range wireless interface such as a Bluetooth or Zigbee radio that communicates sensor data 108 via a network interface 100 or indirectly via a portable device 103 such as a smartphone, video camera, digital camera, tablet, laptop or other device that is equipped with a complementary short-range wireless interface. In another embodiment, the glasses/goggles 16 include a video player 104 with a heads up display, and some or all of the other components of the system 125.

FIGS. 14 and 15 present pictorial diagram representations of video systems in accordance with embodiments of the present invention. In these embodiments, the smartphone 14 includes resistive or capacitive sensors in its cases that generate input data 99 for monitoring heart rate and/or perspiration levels of the user as they grasp the device. Further the microphone or camera in each device can be used a viewer sensor 106 as previously described.

In yet another embodiment, a Bluetooth headset 18 or other audio/video adjunct device that is paired or otherwise coupled to the smartphone 14 can include resistive or capacitive sensors in their cases that generate input data 99 for monitoring heart rate and/or perspiration levels of the user. In addition, the microphone in the headset 18 can be used to generate further input data 99.

FIG. 16 presents a flowchart representation of a method in accordance with an embodiment of the present disclosure. In particular, a method is presented for use in with one or more features described in conjunction with FIGS. 1-15. Step 400 includes analyzing input data corresponding to a viewing of a plurality video programs via an audio/video (A/V) player by at least one viewer, to determine a viewer state of the at least one viewer corresponding to the plurality of video programs. Step 402 includes generating viewer state data that indicates the viewer state corresponding to the plurality of video programs. Step 404 includes processing the viewer state data, and desired viewer state data than indicates a desired viewer state to generate recommendation data indicating at least one additional video program.

In an embodiment, step 404 includes comparing the desired viewer state data to the viewer state data corresponding to the plurality of video programs to determine a subset of the plurality of video programs that match the desired viewer state data, and generating the recommendation data by searching a recommendation database to determine that the at least one additional video program is correlated to the subset of the plurality of video programs. The viewer state data can indicate one of a plurality of activities of the viewer during the viewing, such as dozing, eating, or interacting with a mobile device, etc.

FIG. 17 presents a flowchart representation of a method in accordance with an embodiment of the present disclosure. In particular, a method is presented for use in with one or more features described in conjunction with FIGS. 1-16. Step 410 includes analyzing the input data corresponding to the viewing of the plurality of video programs via the A/V player, to determine a period of interest corresponding to the at least one viewer. Step 412 includes generating viewer interest data that indicates the period of viewer interest.

In an embodiment, this method is used in conjunction with the method of FIG. 16, wherein the recommendation data in step 404 is generated further based on the viewer interest data. Step 404 can include comparing the desired viewer state data to the viewer state data corresponding to the plurality of video programs to determine a subset of the plurality of video programs that match the desired viewer state data, and generating the recommendation data by searching a recommendation database to determine that the at least one additional video program is correlated to time coded metadata indicating content during the period of interest corresponding to the at least one viewer in the subset of the plurality of video programs.

As may also be used herein, the term(s) “configured to”, “operably coupled to”, “coupled to”, and/or “coupling” includes direct coupling between items and/or indirect coupling between items via an intervening item (e.g., an item includes, but is not limited to, a component, an element, a circuit, and/or a module) where, for an example of indirect coupling, the intervening item does not modify the information of a signal but may adjust its current level, voltage level, and/or power level. As may further be used herein, inferred coupling (i.e., where one element is coupled to another element by inference) includes direct and indirect coupling between two items in the same manner as “coupled to”. As may even further be used herein, the term “configured to”, “operable to”, “coupled to”, or “operably coupled to” indicates that an item includes one or more of power connections, input(s), output(s), etc., to perform, when activated, one or more its corresponding functions and may further include inferred coupling to one or more other items. As may still further be used herein, the term “associated with”, includes direct and/or indirect coupling of separate items and/or one item being embedded within another item.

As may also be used herein, the terms “processing module”, “processing circuit”, “processor”, and/or “processing unit” may be a single processing device or a plurality of processing devices. Such a processing device may be a microprocessor, micro-controller, digital signal processor, microcomputer, central processing unit, field programmable gate array, programmable logic device, state machine, logic circuitry, analog circuitry, digital circuitry, and/or any device that manipulates signals (analog and/or digital) based on hard coding of the circuitry and/or operational instructions. The processing module, module, processing circuit, and/or processing unit may be, or further include, memory and/or an integrated memory element, which may be a single memory device, a plurality of memory devices, and/or embedded circuitry of another processing module, module, processing circuit, and/or processing unit. Such a memory device may be a read-only memory, random access memory, volatile memory, non-volatile memory, static memory, dynamic memory, flash memory, cache memory, and/or any device that stores digital information. Note that if the processing module, module, processing circuit, and/or processing unit includes more than one processing device, the processing devices may be centrally located (e.g., directly coupled together via a wired and/or wireless bus structure) or may be distributedly located (e.g., cloud computing via indirect coupling via a local area network and/or a wide area network). Further note that if the processing module, module, processing circuit, and/or processing unit implements one or more of its functions via a state machine, analog circuitry, digital circuitry, and/or logic circuitry, the memory and/or memory element storing the corresponding operational instructions may be embedded within, or external to, the circuitry comprising the state machine, analog circuitry, digital circuitry, and/or logic circuitry. Still further note that, the memory element may store, and the processing module, module, processing circuit, and/or processing unit executes, hard coded and/or operational instructions corresponding to at least some of the steps and/or functions illustrated in one or more of the Figures. Such a memory device or memory element can be included in an article of manufacture.

One or more embodiments have been described above with the aid of method steps illustrating the performance of specified functions and relationships thereof. The boundaries and sequence of these functional building blocks and method steps have been arbitrarily defined herein for convenience of description. Alternate boundaries and sequences can be defined so long as the specified functions and relationships are appropriately performed. Any such alternate boundaries or sequences are thus within the scope and spirit of the claims. Further, the boundaries of these functional building blocks have been arbitrarily defined for convenience of description. Alternate boundaries could be defined as long as the certain significant functions are appropriately performed. Similarly, flow diagram blocks may also have been arbitrarily defined herein to illustrate certain significant functionality.

To the extent used, the flow diagram block boundaries and sequence could have been defined otherwise and still perform the certain significant functionality. Such alternate definitions of both functional building blocks and flow diagram blocks and sequences are thus within the scope and spirit of the claims. One of average skill in the art will also recognize that the functional building blocks, and other illustrative blocks, modules and components herein, can be implemented as illustrated or by discrete components, application specific integrated circuits, processors executing appropriate software and the like or any combination thereof.

In addition, a flow diagram may include a “start” and/or “continue” indication. The “start” and “continue” indications reflect that the steps presented can optionally be incorporated in or otherwise used in conjunction with other routines. In this context, “start” indicates the beginning of the first step presented and may be preceded by other activities not specifically shown. Further, the “continue” indication reflects that the steps presented may be performed multiple times and/or may be succeeded by other by other activities not specifically shown. Further, while a flow diagram indicates a particular ordering of steps, other orderings are likewise possible provided that the principles of causality are maintained.

The one or more embodiments are used herein to illustrate one or more aspects, one or more features, one or more concepts, and/or one or more examples. A physical embodiment of an apparatus, an article of manufacture, a machine, and/or of a process may include one or more of the aspects, features, concepts, examples, etc. described with reference to one or more of the embodiments discussed herein. Further, from figure to figure, the embodiments may incorporate the same or similarly named functions, steps, modules, etc. that may use the same or different reference numbers and, as such, the functions, steps, modules, etc. may be the same or similar functions, steps, modules, etc. or different ones.

Unless specifically stated to the contra, signals to, from, and/or between elements in a figure of any of the figures presented herein may be analog or digital, continuous time or discrete time, and single-ended or differential. For instance, if a signal path is shown as a single-ended path, it also represents a differential signal path. Similarly, if a signal path is shown as a differential path, it also represents a single-ended signal path. While one or more particular architectures are described herein, other architectures can likewise be implemented that use one or more data buses not expressly shown, direct connectivity between elements, and/or indirect coupling between other elements as recognized by one of average skill in the art.

The term “module” is used in the description of one or more of the embodiments. A module implements one or more functions via a device such as a processor or other processing device or other hardware that may include or operate in association with a memory that stores operational instructions. A module may operate independently and/or in conjunction with software and/or firmware. As also used herein, a module may contain one or more sub-modules, each of which may be one or more modules.

While particular combinations of various functions and features of the one or more embodiments have been expressly described herein, other combinations of these features and functions are likewise possible. The present disclosure is not limited by the particular examples disclosed herein and expressly incorporates these other combinations.

	Number	Date	Country
Parent	14669876	Mar 2015	US
Child	14677191		US
Parent	14590303	Jan 2015	US
Child	14669876		US
Parent	14217867	Mar 2014	US
Child	14590303		US
Parent	14477064	Sep 2014	US
Child	14217867		US

AUDIO/VIDEO SYSTEM WITH VIEWER-STATE BASED RECOMMENDATIONS AND METHODS FOR USE THEREWITH

Information

Publication Number

Date Filed

Date Published

Inventors

Original Assignees

CPC

International Classifications

Abstract

Description

Claims

CROSS REFERENCE TO RELATED PATENTS

Continuation in Parts (4)