Videos are sequences of a large number of images, each of which is called a frame, displayed at a rate fast enough that the human eye perceives them as continuous content. Each frame of a video can have multiple objects, some of which may be animate (e.g., animals, insects, human beings, etc.) and some of which may be inanimate (e.g., rocks, chairs, books, or other things that do not have a life). In many applications, a viewer of a video may be interested in watching a specific object included in the video without having to watch the other objects. For example, the quarterback coach of a sports team might be interested in watching only the video frames that include the quarterback and no other player. In a similar manner, the defensive coordinator of a sports team may be interested only in watching the performance of a specific linebacker and no other player. As another example, a mother watching the video of a dance performance of her son's dancing group might be interested in watching only her son's moves. Thus, in these applications, a viewer of an original video may be interested in viewing the performance/progression of a specific object of interest in the video. Consequently, there is a need for systems and methods that provide a personalized viewing experience to viewers by focusing on an object of interest to the viewer.
Disclosed embodiments are directed at systems, methods, and apparatus for facilitating a personalized viewing experience. The method includes receiving a source video stream including multiple objects, wherein the source video stream includes a plurality of frames; identifying, within a frame in the plurality of frames, an object of interest to a viewer from the multiple objects based on specific audio or video features of the object of interest; automatically switching across the plurality of frames based on the specific audio or video features of the object of interest by identifying at least one frame in the plurality of frames having the object of interest; segmenting the source video stream into multiple chunks, wherein each chunk includes the at least one frame having the object of interest; and generating a target video stream by multiplexing the multiple chunks, the target video stream including sequentially-arranged frames having the object of interest.
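As a non-limiting illustration, the segmenting and multiplexing steps of the method can be sketched as follows. The sketch assumes that an earlier identification step has already produced one Boolean flag per frame indicating whether the object of interest is present; the function names `segment_into_chunks` and `multiplex` are hypothetical and are not part of the disclosure.

```python
from typing import List, Tuple


def segment_into_chunks(has_object: List[bool]) -> List[Tuple[int, int]]:
    """Group consecutive frames that contain the object of interest.

    Returns (start, end) frame-index pairs, with end exclusive.
    The per-frame flags are an assumed output of the identification step.
    """
    chunks: List[Tuple[int, int]] = []
    start = None
    for index, present in enumerate(has_object):
        if present and start is None:
            start = index                    # a new chunk begins here
        elif not present and start is not None:
            chunks.append((start, index))    # the current chunk ends here
            start = None
    if start is not None:                    # chunk runs to the last frame
        chunks.append((start, len(has_object)))
    return chunks


def multiplex(chunks: List[Tuple[int, int]]) -> List[int]:
    """Concatenate the chunks into one sequential list of source frame indices."""
    return [index for start, end in chunks for index in range(start, end)]


flags = [False, True, True, False, False, True, False, True, True, True]
chunks = segment_into_chunks(flags)   # [(1, 3), (5, 6), (7, 10)]
target = multiplex(chunks)            # [1, 2, 5, 7, 8, 9]
```

In this sketch the target video stream is represented only by the ordered indices of the retained source frames; an actual embodiment would operate on encoded video data.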
The following detailed description of the invention is merely exemplary in nature and is not intended to limit the invention or the application and uses of the invention. Furthermore, there is no intention to be bound by any theory presented in the preceding background or the following detailed description.
The various embodiments described herein generally provide apparatus, systems and methods related to processing of a source video stream for generation of a target video stream that includes an object of interest to a viewer. In some embodiments, the target video stream may exclusively or primarily include the performance of the object of interest to the viewer, without including other persons in that video. This allows a viewer to focus on an object of his or her interest and not necessarily have to view the performances of other objects in the source video stream.
The object of interest in the original source video stream can be any animate or inanimate object. Non-limiting examples of a source video stream can be personal video recordings, TV shows, sports, movies, music, documentaries, streaming content from content providers such as NETFLIX©, HULU©, YOUTUBE©, HBO GO©, etc. In some embodiments, the viewer can express an indication of an object of interest in the source video stream by inputting/providing a textual description, by uploading/providing/submitting one or more images of the object of interest, by identifying (based on an identifier such as a name, a role in an event, or an image) an object of interest from a plurality of objects displayed on a menu of a user interface, or by providing one or more videos including the object of interest.
For example, the father of a high school student may be interested in watching the recorded performance of his son, i.e., an object of interest in a video of a game of the school's basketball team. The father may provide/upload/submit an image of his son to the disclosed system. The disclosed system receives the image of the son and tries to find a match between objects in each frame of the game's recording and the image of the son. Upon identifying the frames that include the son, the disclosed system can create a (new) target video stream that includes the son without necessarily including other players in the game. In some embodiments, a match between objects in a frame and the image of the son can be detected using any object recognition or machine learning algorithm. The disclosed system processes each of the video frames to identify frames that include the object of interest. These frames are then aggregated (“stitched”) together to produce the target video stream.
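As one hedged example of how such a match might be detected, the sketch below compares a feature vector derived from the submitted image against feature vectors of the objects detected in each frame, using cosine similarity. The feature vectors, the 0.9 threshold, and the function names are assumptions made for illustration only; the disclosure permits any object recognition or machine learning algorithm.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def frames_with_match(
    reference: Sequence[float],
    frame_objects: List[List[Sequence[float]]],
    threshold: float = 0.9,   # assumed value, chosen only for this example
) -> List[int]:
    """Return indices of frames in which any detected object matches the reference."""
    matched = []
    for index, objects in enumerate(frame_objects):
        if any(cosine_similarity(reference, vec) >= threshold for vec in objects):
            matched.append(index)
    return matched


# Hypothetical two-dimensional feature vectors; real features would be larger.
son = [1.0, 0.0]
frames = [
    [[0.0, 1.0]],               # frame 0: one object, not the son
    [[0.99, 0.1], [0.0, 1.0]],  # frame 1: two objects, one close to the son
    [],                         # frame 2: no objects detected
    [[1.0, 0.05]],              # frame 3: one object close to the son
]
matching_frames = frames_with_match(son, frames)   # [1, 3]
```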
In some implementations, the target video stream is re-encoded at the same quality or a lower quality than the source video stream. For example, the source video stream can be a 4K video and a target video stream may be of ultra high definition (UHD), high definition (HD) or standard definition (SD) quality. The source video stream can be raw (unedited) footage shot on a video camera, an over-the-air broadcast from a satellite or cable television distribution system, or any other video. The source video stream may also be transferred over any type of communication network, such as the Internet or other wide area network, a local area network, a private network, a mobile communication system, a terrestrial television network, a cable television network, and a satellite television network. Additionally, the source video stream may be transmitted by way of any communication technology, such as by satellite, wire or optical cable, wireless, or other means. Further, the source video stream may be delivered by any transmission method, such as broadcast, multicast, simulcast, closed circuit, pay-per-view, on-demand, over-the-top (by “streaming,” file transfer, or other means), or other methods. The source video stream could also be saved as a digital file in any computing device.
As one example environment, a school operating the disclosed system can have a publicly-accessible web portal of the school's football team. The web portal can have a video of each game played by the team. For each video, the web portal can provide a menu displaying one or more members of the school's football team as selectable objects of interest. A person who is interested in team member A can select team member A as the object of interest. When the person chooses a game, the disclosed system can display a video (a/k/a target video stream) of the game to the person that includes team member A primarily or exclusively. In some embodiments, the disclosed system can also provide the person an option to view the original video (a/k/a source video stream) of the game. The objects of interest can be displayed on the menu by their name, a position that they play, an identifying image, or by any other suitable means.
Prior to generating the target video stream, the disclosed system is trained to identify the objects of interest from a training set of videos/images, using machine learning methodologies such as TENSORFLOW or YOLO. In the example of the school football team, a training set of videos can be a collection of videos or images of an object of interest (e.g., a quarterback). Once trained on the training set of videos, the disclosed system is able to identify and extract frames that include the object of interest (e.g., the quarterback) from a source video stream of a recorded game. A training set, for example, can be created for one or more persons/actors in a video. In the context of a football game, if each team has eleven players on the field at any instant, then there can be at most twenty-two (22) objects of interest. Hence, twenty-two (22) training sets can be used to train the system, with each set having several videos/images of a player.
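By way of a minimal sketch, the per-object training sets described above can be organized by grouping labeled samples under the object of interest they depict. The labels, file names, and the helper `build_training_sets` are hypothetical; the sketch covers only the bookkeeping and not the training itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_training_sets(
    labeled_samples: Iterable[Tuple[str, str]],
) -> Dict[str, List[str]]:
    """Group (label, sample path) pairs into one training set per object of interest."""
    training_sets: Dict[str, List[str]] = defaultdict(list)
    for label, path in labeled_samples:
        training_sets[label].append(path)
    return dict(training_sets)


# Upper bound on objects of interest in the football example.
PLAYERS_PER_TEAM = 11
TEAMS_ON_FIELD = 2
max_objects_of_interest = PLAYERS_PER_TEAM * TEAMS_ON_FIELD   # 22

samples = [
    ("quarterback", "qb_001.jpg"),      # hypothetical file names
    ("linebacker", "lb_001.jpg"),
    ("quarterback", "qb_002.jpg"),
]
training_sets = build_training_sets(samples)
```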
In some embodiments, once trained using the training set of videos, the disclosed system is able to tag one or more objects of interest in the source video stream. These tags can include an identification of the object of interest and can be overlaid on the objects of interest in the source video stream and displayed on a user interface. When a viewer viewing the source video stream desires to exclusively or primarily watch the performance of the object of interest to the viewer, the viewer can click on the tag, which causes the system to start playing a target video stream associated with the object of interest to the viewer. For example, this functionality can be included in the context of a set top box (STB) or digital video recorder (DVR) which can switch from playing the source video stream to playing the target video stream, in response to a prompt by a viewer clicking on a remote control operative to control the STB or the DVR.
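The tag-driven switching described above can be illustrated, under stated assumptions, as a hit test of the viewer's click against the rectangular regions occupied by the overlaid tags. The `Tag` class, its fields, and the stream identifiers returned by `select_stream` are hypothetical names introduced only for this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Tag:
    """A tag overlaid on an object of interest (screen coordinates in pixels)."""
    object_id: str
    x: int
    y: int
    width: int
    height: int

    def contains(self, px: int, py: int) -> bool:
        """True if the point (px, py) falls inside the tag's rectangle."""
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def select_stream(tags: List[Tag], click_x: int, click_y: int,
                  source_stream: str = "source") -> str:
    """Return the stream to play after a click.

    A click on a tag switches to that object's target stream;
    a click elsewhere leaves the source stream playing.
    """
    for tag in tags:
        if tag.contains(click_x, click_y):
            return f"target:{tag.object_id}"
    return source_stream


tags = [Tag("player_7", 100, 50, 40, 80), Tag("player_12", 300, 60, 40, 80)]
on_tag = select_stream(tags, 110, 60)   # "target:player_7"
off_tag = select_stream(tags, 0, 0)     # "source"
```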
Embodiments of the disclosed system can be owned and operated by organizations (for profit or non-profit), schools, cable companies, broadcasting companies, or private individuals.
Memory 205 can store instructions for running one or more applications or modules on processor(s) 210. For example, memory 205 could be used in one or more embodiments to house all or some of the instructions needed to execute the functionality of training module 215, object identification module 225, and target video generation module 230. Generally, memory 205 can include any device, mechanism, or populated data structure used for storing information. In accordance with some embodiments of the present disclosure, memory 205 can encompass, but is not limited to, any type of volatile memory, nonvolatile memory, and dynamic memory. For example, memory 205 can be random access memory, memory storage devices, optical memory devices, magnetic media, floppy disks, magnetic tapes, hard drives, SIMMs, SDRAM, DIMMs, RDRAM, DDR RAM, SODIMMS, EPROMs, EEPROMs, compact discs, DVDs, and/or the like. In accordance with some embodiments, memory 205 may include one or more disk drives, flash drives, one or more databases, one or more tables, one or more files, local cache memories, processor cache memories, relational databases, flat databases, and/or the like. In addition, those of ordinary skill in the art will appreciate many additional devices and techniques for storing information that can be used as memory 205.
Training module 215 is configured to “learn” specific audio and video features associated with the object of interest from one or more items of training content during a training phase. Examples of specific audio and video features associated with an object of interest can be a color of the object of interest, geometric attributes such as a height/width/depth of the object of interest, and whether the object of interest changes form (animate) or does not change form (inanimate). Additionally, if the object of interest is animate, then specific audio and video features can be a color of different parts (e.g., the eyes, the shirt, the pants, etc.) of the object of interest, a pitch or a frequency of a voice of the object of interest, or an instrument played by the object of interest.
Object identification module 225 is configured to identify frames that include the object of interest.
Target video generation module 230 is configured to generate the target video streams. For example, the frames identified by object identification module 225 may be aggregated (“stitched”) together to produce the target video stream as a continuous stream and not a disjointed stream.
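One possible way to make the stitched output continuous, offered here only as a sketch, is to assign new, evenly spaced presentation timestamps to the retained frames so that gaps left by discarded frames do not appear in the target video stream. The function name `restamp` and the frame-rate parameter are assumptions and are not drawn from the disclosure.

```python
from typing import List, Tuple


def restamp(frame_indices: List[int], fps: float = 30.0) -> List[Tuple[int, float]]:
    """Pair each retained source frame index with a new timestamp in seconds.

    Timestamps depend only on the frame's position in the target stream,
    so the output plays back without gaps even when source indices jump.
    """
    return [(source_index, round(position / fps, 6))
            for position, source_index in enumerate(frame_indices)]


# Source frames 10-11 and 50-51 become four consecutive frames at 25 fps.
stitched = restamp([10, 11, 50, 51], fps=25.0)
# [(10, 0.0), (11, 0.04), (50, 0.08), (51, 0.12)]
```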
The snapshot 302 corresponds to a source video stream and includes four objects, object 1, object 2, object 3, object 4 denoted as 302, 304, 306, 308 respectively. In some embodiments, the snapshot 302 can be a static snapshot, e.g., produced when a viewer pauses the source video stream while playing. In some embodiments, a viewer does not necessarily have to pause the video stream while playing, and the four objects in the snapshot are tagged or labeled as object 1, object 2, object 3, or object 4. Region 330 of snapshot 302 includes a message/notification region in which the user interface queries the viewer whether the viewer wishes to focus on object 1, object 2, object 3, or object 4. A user can specify a selection of his or her object of interest by clicking on any of buttons 320, 322, 324, or 326. In some embodiments, region 330 is generated when the user clicks on a button or otherwise interacts with the user interface. Accordingly, the disclosed system may generate a target video stream with the specified object of interest.
In some embodiments, the disclosed system automatically identifies the object of interest based on tracking an eye movement of the viewer that is viewing the user interface. For example, cameras coupled to a TV, a STB, a computer monitor, a mobile device, or other devices can track the eye movement of a viewer over time. The cameras may transmit to the disclosed system information (e.g., over a wired or a wireless network) relating to a viewer's eye movement. Upon receiving the information, the disclosed system can process this information to determine which object is of interest to the user.
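As a hedged illustration of how eye-movement information might be processed, the sketch below counts how many sampled gaze points fall within the on-screen region of each object and selects the object with the most samples. The dwell-count heuristic, the region format, and the function name `object_of_interest` are assumptions introduced for this example; the disclosure does not prescribe a particular technique.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Region = Tuple[int, int, int, int]   # (x, y, width, height) in pixels


def object_of_interest(
    gaze_points: List[Tuple[int, int]],
    regions: Dict[str, Region],
) -> Optional[str]:
    """Return the object whose region received the most gaze samples, if any."""
    dwell: Counter = Counter()
    for gaze_x, gaze_y in gaze_points:
        for object_id, (x, y, width, height) in regions.items():
            if x <= gaze_x < x + width and y <= gaze_y < y + height:
                dwell[object_id] += 1
                break                 # count each sample for one object only
    if not dwell:
        return None                   # the viewer looked at none of the objects
    return dwell.most_common(1)[0][0]


regions = {"object 1": (0, 0, 100, 100), "object 2": (100, 0, 100, 100)}
gaze = [(10, 10), (150, 20), (160, 30), (500, 500)]
selected = object_of_interest(gaze, regions)   # "object 2"
```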
In some embodiments, the user interface corresponds to a web page (e.g., of a school's football team) showing four possible choices (e.g., four players on the football team) that can be of possible interest to a user in connection with a source video stream (e.g., a recording of a football game). A user can click on one of the four objects to specify an object of interest. Accordingly, the disclosed system may generate a target video stream with the specified object of interest.
In some embodiments, the disclosed application program can be integrated with a voice control system such as AMAZON© ALEXA© or GOOGLE© ASSISTANT©.
Some of the embodiments described herein are described in the general context of methods or processes, which may be implemented in one embodiment by a computer program product, embodied in a computer-readable medium, including computer-executable instructions, such as program code, executed by computers in networked environments. A computer-readable medium may include removable and non-removable storage devices including, but not limited to, Read Only Memory (ROM), Random Access Memory (RAM), compact discs (CDs), digital versatile discs (DVDs), etc. Therefore, the computer-readable media may include non-transitory storage media. Generally, program modules may include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Computer- or processor-executable instructions, associated data structures, and program modules represent examples of program code for executing steps of the methods disclosed herein. The particular sequence of such executable instructions or associated data structures represents examples of corresponding acts for implementing the functions described in such steps or processes.
Some of the disclosed embodiments may be implemented as devices or modules using hardware circuits, software, or combinations thereof. For example, a hardware circuit implementation may include discrete analog and/or digital components that are, for example, integrated as part of a printed circuit board. Alternatively, or additionally, the disclosed components or modules may be implemented as an Application Specific Integrated Circuit (ASIC) and/or as a Field Programmable Gate Array (FPGA) device. Some implementations may additionally or alternatively include a digital signal processor (DSP) that is a specialized microprocessor with an architecture optimized for the operational needs of digital signal processing associated with the disclosed functionalities of this application. Similarly, the various components or sub-components within each module may be implemented in software, hardware or firmware. The connectivity between the modules and/or components within the modules may be provided using any one of the connectivity methods and media that are known in the art, including, but not limited to, communications over the Internet, wired, or wireless networks using the appropriate protocols. For example, the communications can include any combination of local area and/or wide area networks, using wired and/or wireless communication systems. The networks could use any one or more protocols/technologies: Ethernet, IEEE 802.11 or Wi-Fi, worldwide interoperability for microwave access (WiMAX), cellular telecommunication (e.g., 3G, 4G, 5G), CDMA, cable, digital subscriber line (DSL), etc. Similarly, the networking protocols may include multiprotocol label switching (MPLS), transmission control protocol/Internet protocol (TCP/IP), User Datagram Protocol (UDP), hypertext transfer protocol (HTTP), simple mail transfer protocol (SMTP), and file transfer protocol (FTP).
Data exchanged over the one or more networks may be represented using technologies, languages, and/or formats including hypertext markup language (HTML) or extensible markup language (XML). In addition, all or some links can be encrypted using conventional encryption technologies such as secure sockets layer (SSL), transport layer security (TLS), and Internet Protocol security (IPsec).
The foregoing description of embodiments has been presented for purposes of illustration and description. The foregoing description is not intended to be exhaustive or to limit embodiments of the present invention to the precise form disclosed, and modifications and variations are possible in light of the above teachings or may be acquired from practice of various embodiments. The embodiments discussed herein were chosen and described in order to explain the principles and the nature of various embodiments and their practical application to enable one skilled in the art to utilize the present invention in various embodiments and with various modifications as are suited to the particular use contemplated. The features of the embodiments described herein may be combined in all possible combinations of methods, apparatus, modules, systems, and computer program products.
This patent application is a Continuation of U.S. Non-Provisional patent application Ser. No. 17/338,515, filed Jun. 3, 2021, entitled “APPARATUS, SYSTEMS AND METHODS FOR FACILITATING A PERSONALIZED VIEWING EXPERIENCE,” which is a Continuation of U.S. Non-Provisional patent application Ser. No. 15/845,704, filed Dec. 18, 2017, entitled “APPARATUS, SYSTEMS AND METHODS FOR FACILITATING A PERSONALIZED VIEWING EXPERIENCE,” the disclosures of which are incorporated herein by reference in their entirety.
Number | Date | Country | |
---|---|---|---|
20220353553 A1 | Nov 2022 | US |
Number | Date | Country | |
---|---|---|---|
Parent | 17338515 | Jun 2021 | US |
Child | 17869678 | US | |
Parent | 15845704 | Dec 2017 | US |
Child | 17338515 | US |