The present disclosure relates generally to capturing user responses to spatial video content and, more particularly, to systems and methods for automatically creating and displaying a focal point density map to indicate areas of interest in space and time within the video content.
Over the past decade there has been exponential growth in the prevalence of streaming media in the lives of the general public. Users frequently view video content on various websites or within mobile applications. The video content may be user-generated (e.g., videos captured from user devices and posted on social networking sites such as Facebook and Snapchat), professionally published content such as television and movies on sites such as Hulu, Netflix, and YouTube, or commercial content created for brands and companies and published on their respective websites or within an application. Existing forms of interactive video players allow a viewer to make choices on how to proceed through a video by playing the video, pausing the video, restarting the video, or exiting from the video at any point in time. Applications exist that can capture these temporal events as they relate to the video generally. Spatial video, by contrast, has gained adoption only over the last few years. This new medium is rendered in a sphere around a viewer, who may move and manipulate their point of view as the video plays. The format presents new opportunities for interaction, and information about where viewers focus their attention greatly benefits the creators of such content. Current techniques, however, do not capture a user's spatial focal point over time.
Systems and methods are presented for creating a spatially coordinated, temporally synchronized focal point density map indicating the elements of focus within video content. The focal point density map may, in certain instances, represent an amplitude of user engagement with the video content, which can be displayed in various manners to indicate elements within the content that attract user attention, where in space the user is looking, and when that attention span starts, wanes, stops, or transitions to other elements. User engagement may be tracked and stored at an individual user level as well as aggregated. The engagement data may be presented as a visual layer overlaid with the video content against which the engagement data was collected, effectively displaying a temporal interest heat map over the video content. Separately, or in addition to the heat map overlay, a graphical representation of the engagement data may be displayed. For example, an engagement map may include a horizontal axis representing the spatial dimension (e.g., degrees or radians from the center of the video) and the vertical axis representing the temporal dimension (e.g., the top of the graph represents the start of the video, and the bottom the end).
Therefore, in one aspect, a computer-implemented method for measuring and displaying user engagement with video content is provided. Orientation data is received from user devices as users of each device view video content on each respective user device and, based on the orientation data, each user's focal point within the video is determined, either periodically or when a change in the focal point occurs. A focal point density map is created for the video content, wherein the focal point density map visually indicates an aggregated temporal and spatial distribution of the users' focal points, and a display of the focal point density map and the associated video content is presented, thereby indicating elements of interest within the video content. The video may be standard form and resolution, panoramic, high-definition, and/or three-dimensional, and may contain audio tracks.
In some embodiments, the device orientation data includes accelerometer data, gyroscope data, and/or GPS data, each received from sensors within the user devices. In embodiments in which the video is viewed using a desktop or other stationary device, mouse or pointer events may be used to determine orientation data. In some cases, a field of view of the video content is adjusted in response to the orientation data such that the focal point is substantially centered on a viewing screen of the user device. The orientation data can be stored such that the orientation data comprises a temporal data element, a spatial data element, a user identifier and a video content identifier, among other metadata describing the video content itself.
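As an illustrative sketch only, and assuming field names that are not specified in the disclosure, a stored orientation record of the kind described above might combine the temporal element, spatial element, user identifier, and video content identifier as follows:

    // Hypothetical record combining the stored data elements described above.
    // All field names are assumptions for illustration only.
    struct FocalPointRecord: Codable {
        let timestampMs: Int        // temporal element: playback position in milliseconds
        let yawDegrees: Double      // spatial element: horizontal angle from the video's center
        let pitchDegrees: Double    // spatial element: vertical angle from the video's center
        let userId: String          // identifies the viewer
        let videoId: String         // identifies the video content being viewed
    }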
The display including the focal point density map and the associated video content may be presented as a layered display such that the density map is overlaid on the video content (which, if panoramic, may be presented as an equirectangular projection of the panoramic video content) and such that the focal point density map and video content are temporally and spatially synchronized. In some instances, the video content may be spherical, allowing for both horizontal and vertical movements. The focal point density map may be substantially transparent, thereby facilitating the viewing of elements within the video content behind the focal point density map. In some instances, the aggregate spatial distribution of the focal point density map is displayed as a gradient, such as a color gradient, a shading gradient and/or a transparency gradient.
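One way such spatial synchronization could be achieved, offered here only as a sketch and under the assumption that the focal point is expressed as yaw and pitch angles, is to map each focal point onto the equirectangular frame before drawing the overlay:

    // Illustrative sketch (not the disclosed method) of mapping a spherical focal point,
    // given as yaw/pitch in degrees, onto an equirectangular projection of the panoramic
    // frame so that a density-map overlay can be spatially aligned with the video.
    func equirectangularPoint(yawDegrees: Double,
                              pitchDegrees: Double,
                              frameWidth: Double,
                              frameHeight: Double) -> (x: Double, y: Double) {
        // Yaw spans -180°...180° across the full frame width;
        // pitch spans -90°...90° across the full frame height.
        let x = (yawDegrees + 180.0) / 360.0 * frameWidth
        let y = (90.0 - pitchDegrees) / 180.0 * frameHeight
        return (x, y)
    }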
The display may be presented in conjunction with a set of player controls (within or adjacent to the display), whereby the player controls facilitate manual manipulation of the video content and the focal point density map by a user. The aggregated temporal and aggregate spatial distribution of users' focal points can, in some embodiments, be filtered such that the focal point density map comprises a subset of the focal points based, for example, on user attributes and/or device attributes.
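By way of a hypothetical example (the types and attribute names below are assumptions, not part of the disclosure), such filtering might restrict the aggregated focal points to viewers matching a requested device attribute:

    // Hypothetical filtering step: keep only focal-point samples from viewers whose
    // profile matches a requested device model, so the resulting density map
    // reflects just that subset of users.
    struct Sample { let userId: String; let timestampMs: Int; let yawDegrees: Double }
    struct ViewerProfile { let deviceModel: String; let country: String }

    func filterByDevice(samples: [Sample],
                        profiles: [String: ViewerProfile],
                        deviceModel: String) -> [Sample] {
        samples.filter { profiles[$0.userId]?.deviceModel == deviceModel }
    }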
In another aspect, a system for displaying and measuring viewer engagement among elements of video content is provided. The system includes one or more computers programmed to perform certain operations, including receiving user device orientation data from user devices as users of each device view video content on each respective user device and periodically determining from the user device orientation data each user's focal point within the video. The computers are programmed to automatically create a focal point density map for the video content, wherein the focal point density map visually indicates an aggregated temporal and spatial distribution of users' focal points, and to present a display of the focal point density map and the associated video content, thereby indicating elements of interest within the video content.
In some embodiments, the device orientation data includes accelerometer data, gyroscope data, and/or GPS data, each received from sensors within the user devices. In some cases, a field of view of the video content is adjusted in response to the orientation data such that the focal point is substantially centered on a viewing screen of the user device. The orientation data can be stored such that the orientation data comprises a temporal data element, a spatial data element, a user identifier and a video content identifier, among other metadata describing the video content itself.
The display including the focal point density map and the associated video content may be presented as a layered display such that the density map is overlaid on the video content (which, if panoramic, may be presented as an equirectangular projection of the panoramic video content) and such that the focal point density map and video content are temporally and spatially synchronized. In some instances, the video content may be spherical, allowing for both horizontal and vertical movements. The focal point density map may be substantially transparent, thereby facilitating the viewing of elements within the video content behind the focal point density map. In some instances, the statistical spatial distribution of the focal point density map is displayed as a gradient, such as a color gradient, a shading gradient and/or a transparency gradient.
The display may be presented in conjunction with a set of player controls (within or adjacent to the display), whereby the player controls facilitate manual manipulation of the video content and the focal point density map by a user. The aggregated temporal and aggregate spatial distribution of users' focal points can, in some embodiments, be filtered such that the focal point density map comprises a subset of the focal points based, for example, on user attributes and/or device attributes.
Other aspects and advantages of the invention will become apparent from the following drawings, detailed description, and claims, all of which illustrate the principles of the invention, by way of example only.
A more complete appreciation of the invention and many attendant advantages thereof will be readily obtained as the same becomes better understood by reference to the following detailed description when considered in connection with the accompanying drawings. In the drawings, like reference characters generally refer to the same parts throughout the different views. Further, the drawings are not necessarily to scale, with emphasis instead generally being placed upon illustrating the principles of the invention.
Described herein are various implementations of methods and supporting systems for capturing, measuring, analyzing and displaying users' engagement with visual (still and moving video) content on user display devices. As used herein, video content may refer to any form of visually presented information, data, and images, including still images, moving pictures, data maps, virtual reality landscapes, video games, etc.
The mobile device 105, server 110, display device 115 and data storage server 120 communicate with each other (as well as other devices and data sources) via a network 125. The network communication may take place via any media such as standard and/or cellular telephone lines, LAN or WAN links (e.g., T1, T3, 56 kb, X.25), broadband connections (ISDN, Frame Relay, ATM), wireless links, and so on. Preferably, the network 125 can carry TCP/IP protocol communications, and HTTP/HTTPS requests made by the mobile device 105, as well as the connection between the mobile device 105 and the server 110, can be communicated over such networks. In some implementations, the network includes various cellular data networks such as 2G, 3G, 4G, and others. The type of network is not limited, however, and any suitable network may be used. Typical examples of networks that can serve as the communications network 125 include a wireless or wired Ethernet-based intranet, a local or wide-area network (LAN or WAN), and/or the global communications network known as the Internet, which may accommodate many different communications media and protocols.
The mobile device 105 may include various functional components that facilitate the display and analysis of content on the device 105. For example, the mobile device 105 may include a video player component 130. The video player component 130 receives content via the network 125 or from stored memory of the device 105 and renders the content in response to user commands. In some instances, the video player 130 may be native to the device 105, whereas in other instances the video player 130 may be a specially-designed application installed on the device 105 by the user. The content rendered by the video player may take any form, including still photographs, panoramic photos, video, three-dimensional video, high-definition video, etc.
The mobile device 105 may also include one or more components that sense and provide data representing the location, orientation and/or movement of the device 105. For example, the mobile device 105 may include one or more accelerometers 135. In certain mobile devices, for example, three accelerometers 135 are used—one for each of the x, y, and z axes. Each accelerometer 135 measures changes in velocity over time along a linear path. Combining readings from the three accelerometers 135 indicates device movement in any direction and the device's current orientation. The device 105 may also include a gyroscope 140 to measure the rate of rotation about each axis. In addition to the motion sensing capabilities provided by the accelerometer 135 and gyroscope 140, a GPS chipset 145 may be used to indicate a physical location of the device 105. Together, data gathered from the accelerometer 135 and gyroscope 140 indicates the rate and direction of movement of the device 105 in space, and data from the GPS chipset 145 may provide location-based information such that applications operating on the device 105 may receive and respond to such information, as well as report such information to the server 110.
The server 110 may include various functional components, including, for example, a communications server 150 and an application server 155. The communications server 150 provides the conduit through which requests for data and processing are received from the mobile device 105, as well as interaction with other servers that may provide additional content and user engagement data. The application server 155 stores and executes the primary programming instructions for facilitating the functions executed on the server 110. In some instances, the server 110 also includes an analytics engine 160 that analyzes user engagement data and provides historical, statistical and predictive breakdowns or aggregated summaries of the data. Content and data describing the content, user profiles, and user engagement data may be stored in a data storage application 165 on the data storage server 120. In some instances, data representing user orientation and interest include a temporal element (e.g., a timestamp and/or time range), a spatial element (such as an angular field of view and/or a focal point location), a user identifier to identify the individual viewing the content, and a content identifier to uniquely identify the content being viewed.
Once the application server 155 and the analytics engine 160 receive, analyze and format user engagement data, one or more displays 170 may be presented to a user who can view, interact with and otherwise manipulate the display 170 using keyboard commands, mouse movements, touchscreen commands and other means of command inputs.
Referring to
As an example only, implementations using an Apple iPhone as the device 105 utilize the Core Motion framework, in which device motion events are represented by three data objects, each encapsulating one or more measurements: a CMAccelerometerData object captures the acceleration along each of the spatial axes; a CMGyroData object captures the rate of rotation around each of the three spatial axes; and a CMDeviceMotion object encapsulates several different measurements, including attitude and more useful measurements of rotation rate and acceleration. The CMMotionManager class is the central access point for Core Motion. Creating an instance of the class allows an application to specify an update interval, request that updates start, and handle motion events as they are delivered. All of the data-encapsulating classes of Core Motion are subclasses of CMLogItem, which defines a timestamp so that motion data can be tagged with a time and stored in the data storage device as described above. Motion data may be captured using a “pull” technique, in which an application periodically samples the most recent measurement of motion data, or a “push” technique, in which an application specifies an update interval and implements a block for handling the data. The Core Motion framework then delivers each update to the block, which can execute as a task in the operation queue.
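A minimal sketch of the “push” technique, assuming an application that simply reads the attitude angles from each update (reporting them to the server is omitted for brevity), might look like the following:

    import CoreMotion

    // The application specifies an update interval and supplies a handler block that
    // receives each CMDeviceMotion update on an operation queue.
    let motionManager = CMMotionManager()

    func startOrientationUpdates() {
        guard motionManager.isDeviceMotionAvailable else { return }
        motionManager.deviceMotionUpdateInterval = 1.0 / 30.0   // e.g., thirty samples per second
        motionManager.startDeviceMotionUpdates(to: OperationQueue()) { motion, error in
            guard let motion = motion, error == nil else { return }
            // CMDeviceMotion timestamps each sample (via CMLogItem) and exposes attitude,
            // rotation rate, and user acceleration.
            let yaw = motion.attitude.yaw        // radians
            let pitch = motion.attitude.pitch    // radians
            _ = (motion.timestamp, yaw, pitch)   // in practice, report these to the server 110
        }
    }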
Referring to
In some embodiments, the density map 505 may be uniform along the vertical axis if, for example, the users' focal points are measured only along the horizontal axis. In other cases, the density map 505 may include a non-uniform gradient where the users' focal points are measured along both the horizontal and vertical axes. In some implementations, the data may be structured and stored such that one dimension may be held constant while another changes.
As described above, the focal point data may be measured periodically while users are engaged with the content, thereby facilitating a temporal representation of the heat map. Specifically, the heat map display can indicate, over time, the relative engagement or interest in elements within the content. As the content is played or displayed, the density map changes to indicate users' interest at that point in the content. In some cases, the frequency with which the focal point data measures users' interest matches the frame rate of the content, thus showing the density map for each particular frame of the content.
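As an illustrative sketch only (the bucket structure and names below are assumptions rather than the disclosed implementation), the aggregation behind such a temporal heat map could count the focal points falling into each time and angle bucket:

    // Rows of the map are time buckets, columns are horizontal-angle buckets, and each
    // cell counts how many viewers' focal points fell within that angle range during
    // that time range of the content.
    func densityMap(samples: [(timestampMs: Int, yawDegrees: Double)],
                    videoDurationMs: Int,
                    timeBuckets: Int,
                    angleBuckets: Int) -> [[Int]] {
        var map = Array(repeating: Array(repeating: 0, count: angleBuckets), count: timeBuckets)
        for sample in samples {
            let row = min(timeBuckets - 1,
                          max(0, sample.timestampMs * timeBuckets / max(videoDurationMs, 1)))
            let normalizedYaw = (sample.yawDegrees + 180.0) / 360.0   // assumes yaw in -180°...180°
            let col = min(angleBuckets - 1, max(0, Int(normalizedYaw * Double(angleBuckets))))
            map[row][col] += 1
        }
        return map
    }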
Still referring to
Referring now to
As described above, the focal point density map comprises an aggregation of user-based focal point data collected over time and across a potentially wide variety of users (e.g., ages, locations, etc.).
Mobile device 105 and server(s) 110 may be implemented in any suitable way.
Exemplary mobile device 105 and exemplary server 110 may have one or more input and output devices. These devices can be used, among other things, to present a user interface and/or communicate (e.g., via a network) with other devices or computers. Examples of output devices that can be used to provide a user interface include printers or display screens for visual presentation of output and speakers or other sound generating devices for audible presentation of output. Examples of input devices that can be used for a user interface include keyboards, and pointing devices, such as mice, touch pads, and digitizing tablets. As another example, a computer may receive input information through speech recognition or in other audible format.
Although examples provided herein may have described the servers as residing on separate computers, it should be appreciated that the functionality of these components can be implemented on a single computer, or on any larger number of computers in a distributed fashion.
Having thus described several aspects of at least one embodiment of this invention, it is to be appreciated that various alterations, modifications, and improvements will readily occur to those skilled in the art.
Such alterations, modifications, and improvements are intended to be part of this disclosure, and are intended to be within the spirit and scope of the invention. Accordingly, the foregoing description and drawings are by way of example only. The above-described embodiments of the present invention can be implemented in any of numerous ways. For example, the embodiments may be implemented using hardware, software or a combination thereof. When implemented in software, the software code can be executed on any suitable processor or collection of processors, whether provided in a single computer or distributed among multiple computers. Further, it should be appreciated that a computer may be embodied in any of a number of forms, such as a rack-mounted computer, a desktop computer, a laptop computer, or a tablet computer. Additionally, a computer may be embedded in a device not generally regarded as a computer but with suitable processing capabilities, including a Personal Digital Assistant (PDA), a smart phone or any other suitable portable or fixed electronic device.
Such computers may be interconnected by one or more networks in any suitable form, including as a local area network or a wide area network, such as an enterprise network or the Internet. Such networks may be based on any suitable technology and may operate according to any suitable protocol and may include wireless networks, wired networks or fiber optic networks.
Also, the various methods or processes outlined herein may be coded as software that is executable on one or more processors that employ any one of a variety of operating systems or platforms. Additionally, such software may be written using any of a number of suitable programming languages and/or programming or scripting tools, and also may be compiled as executable machine language code or intermediate code that is executed on a framework or virtual machine.
In this respect, the invention may be embodied as a computer readable medium (or multiple computer readable media) (e.g., a computer memory, one or more floppy discs, compact discs, optical discs, magnetic tapes, flash memories, circuit configurations in Field Programmable Gate Arrays or other semiconductor devices, or other tangible computer storage medium) encoded with one or more programs that, when executed on one or more computers or other processors, perform methods that implement the various embodiments of the invention discussed above. The computer readable medium or media can be transportable, such that the program or programs stored thereon can be loaded onto one or more different computers or other processors to implement various aspects of the present invention as discussed above. The terms “program” or “software” are used herein in a generic sense to refer to any type of computer code or set of computer-executable instructions that can be employed to program a computer or other processor to implement various aspects of the present invention as discussed above. Additionally, it should be appreciated that according to one aspect of this embodiment, one or more computer programs that when executed perform methods of the present invention need not reside on a single computer or processor, but may be distributed in a modular fashion amongst a number of different computers or processors to implement various aspects of the present invention.
Computer-executable instructions may be in many forms, such as program modules, executed by one or more computers or other devices. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Typically, the functionality of the program modules may be combined or distributed as desired in various embodiments.
Also, data structures may be stored in computer-readable media in any suitable form. For simplicity of illustration, data structures may be shown to have fields that are related through location in the data structure. Such relationships may likewise be achieved by assigning storage for the fields with locations in a computer-readable medium that conveys relationship between the fields. However, any suitable mechanism may be used to establish a relationship between information in fields of a data structure, including through the use of pointers, tags or other mechanisms that establish a relationship between data elements.
Various aspects of the present invention may be used alone, in combination, or in a variety of arrangements not specifically discussed in the embodiments described in the foregoing and is therefore not limited in its application to the details and arrangement of components set forth in the foregoing description or illustrated in the drawings. For example, aspects described in one embodiment may be combined in any manner with aspects described in other embodiments.
Also, the invention may be embodied as a method, of which an example has been provided. The acts performed as part of the method may be ordered in any suitable way. Accordingly, embodiments may be constructed in which acts are performed in an order different than illustrated, which may include performing some acts simultaneously, even though shown as sequential acts in illustrative embodiments.
In some embodiments the functions may be implemented as computer instructions stored in portions of a computer's random access memory to provide control logic that affects the processes described above. In such an embodiment, the program may be written in any one of a number of high-level languages, such as FORTRAN, PASCAL, C, C++, C#, Java, JavaScript, Tcl, or BASIC. Further, the program can be written in a script, macro, or functionality embedded in commercially available software, such as EXCEL or VISUAL BASIC. Additionally, the software may be implemented in an assembly language directed to a microprocessor resident on a computer. For example, the software can be implemented in Intel 80x86 assembly language if it is configured to run on an IBM PC or PC clone. The software may be embedded on an article of manufacture including, but not limited to, “computer-readable program means” such as a floppy disk, a hard disk, an optical disk, a magnetic tape, a PROM, an EPROM, or CD-ROM.
Although the systems and methods described herein relate primarily to audio and video playback, the invention is equally applicable to various streaming and non-streaming media, including animation, video games, interactive media, and other forms of content usable in conjunction with the present systems and methods. Further, there can be more than one audio, video, and/or other media content stream played in synchronization with other streams. Streaming media can include, for example, multimedia content that is continuously presented to a user while it is received from a content delivery source, such as a remote video server. If a source media file is in a format that cannot be streamed and/or does not allow for seamless connections between segments, the media file can be transcoded or converted into a format supporting streaming and/or seamless transitions.
While various implementations of the present invention have been described herein, it should be understood that they have been presented by example only. Where methods and steps described above indicate certain events occurring in certain order, those of ordinary skill in the art having the benefit of this disclosure would recognize that the ordering of certain steps can be modified and that such modifications are in accordance with the given variations. For example, although various implementations have been described as having particular features and/or combinations of components, other implementations are possible having any combination or sub-combination of any features and/or components from any of the implementations described herein.