In the following, a multi-video player is disclosed. Specifically, a method for displaying a plurality of videos is disclosed, the method comprising displaying the videos with respect to their spatial and temporal relationship. A corresponding device is also disclosed.
This section is intended to introduce the reader to various aspects of art, which may be related to various aspects of the present invention that are described and/or claimed below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present invention. Accordingly, it should be understood that these statements are to be read in this light, and not as admissions of prior art.
Given that camera devices are now at hand for practically everybody, and given the facilities for sharing media on social networks, the amount of video data around us is growing faster and faster. However, navigation systems for these data have not evolved much over the years. Users still have to switch from one video to another manually and, when thousands of videos of an event are available, this can become a tedious task. Moreover, the average user has difficulty finding what he wants to see and seeing everything that may interest him.
A method, known to a person skilled in the art, for displaying several videos at the same time so that the user can explore the collection with the help of relations learned by the system is the wall of videos. The viewing area is divided into several windows and a video is displayed in each window, as in a wall of synchronized videos of a concert. However, it is difficult for a user to pay attention to multiple videos at the same time, and the setting is static, i.e. the same video from the same point of view is continuously displayed in the same window.
Other prior art either neglects the links between the videos, for example by using a file navigation system such as the file explorer of an operating system, or needs GPS information relating to spatial and temporal localisation to present the videos according to their relationship.
An enhanced method is therefore needed for displaying, at the same time, multiple videos among a collection of videos while dynamically visualising spatial and temporal information between the multiple videos.
A salient idea is to display a principal video at the center of the screen, in high resolution and with no deformation, and to have a secondary linked video (or several secondary videos) dynamically appear with respect to the temporal instant at which the link appears and with respect to the relative spatial position of this secondary video. The size, resolution and shape of the secondary videos are defined so as to limit the disturbance to the viewing of the principal video while letting the user know that there is a potential link towards another video. The number of secondary videos is limited for a given screen. The user can, at any instant, select a secondary video, which then becomes the principal one and is displayed in the center along with its linked secondary videos.
To that end, a method for displaying a plurality of videos is disclosed. The method comprises:
According to a specific characteristic, the size, structure, position, transparency, overlap of secondary graphical units further depend on a maximum number of graphical units; advantageously, the graphical interface of such a multi-video player is adapted to the size of the display on which the graphical interface is presented.
According to another specific characteristic, the information representative of spatio-temporal connectivity comprises a connectivity score and the size of the secondary graphical unit increases when the connectivity score increases. Advantageously the level of importance of the link is therefore intuitively rendered to a viewer.
According to another specific characteristic, the information representative of spatio-temporal connectivity comprises a relative spatial position between the main and secondary video segment and the relative spatial position between the main graphical unit and the secondary graphical unit corresponds to the relative spatial position between the main and secondary video segment. Such spatial information is based on the video content, for instance by finding a left or right view of the current scene of the main video. Advantageously, further to the temporal and relevance information of the link, the spatial information is therefore rendered to a viewer.
According to another specific characteristic, the information representative of spatio-temporal connectivity comprises a connectivity score and the transparency of a secondary graphical unit decreases when the connectivity score increases.
According to another specific characteristic, the information representative of spatio-temporal connectivity comprises a connectivity score and the overlap of a secondary graphical unit decreases when the connectivity score increases.
According to another specific characteristic, the information representative of spatio-temporal connectivity comprises a connectivity score and the secondary graphical unit is placed closer to the main graphical unit when the connectivity score increases.
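The dependence of size, transparency, overlap and distance on the connectivity score can be illustrated by the following minimal sketch. It is only one possible mapping, given under assumptions that are not part of the disclosure: the score is assumed to be normalised to the range 0 to 1, the mappings are assumed to be linear, and the class `UnitStyle`, the function `style_for_score` and all numeric constants are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UnitStyle:
    """Hypothetical display properties of one secondary graphical unit."""
    size: float          # width as a fraction of the main unit's width
    transparency: float  # 0.0 = opaque, 1.0 = fully transparent
    overlap: float       # fraction of the unit hidden behind the main unit
    distance: float      # gap to the main unit, in main-unit widths


def style_for_score(score: float) -> UnitStyle:
    """Map a connectivity score (assumed in [0, 1]) to display properties.

    A higher score gives a larger, more opaque, less overlapped unit
    placed closer to the main unit. All constants are illustrative.
    """
    s = min(max(score, 0.0), 1.0)  # clamp out-of-range scores
    return UnitStyle(
        size=0.2 + 0.3 * s,
        transparency=0.8 * (1.0 - s),
        overlap=0.5 * (1.0 - s),
        distance=0.4 * (1.0 - s),
    )
```

With such a mapping, a link of score 0.9 yields a larger and more opaque unit than a link of score 0.2, which is the qualitative behaviour stated in the characteristics above.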
According to another specific characteristic, the size, structure and position of the graphical units (main and secondary) are quantized; thus secondary graphical units appear and disappear at fixed positions according to the existence and relevance of the links.
According to another specific characteristic, the size, structure, position, transparency, overlap of the at least one secondary graphical unit vary continuously from a first size, structure, position, transparency, overlap corresponding to a first information to a second size, structure, position, transparency, overlap corresponding to a second information. Thus, smooth and dynamic positioning of links on the displayed video (both temporally and spatially) towards other videos of the collection is allowed.
According to other specific characteristics, the main graphical unit is centered in the display, the main graphical unit is larger than the at least one secondary graphical unit, and the main graphical unit is placed in front of the at least one secondary graphical unit.
A graphical interface for displaying a plurality of videos is disclosed. The graphical interface comprises:
A device for displaying a plurality of videos is disclosed. The device comprises at least one processor configured to:
A device for comprising at least one face is disclosed. The device comprises:
A computer program product is also disclosed, comprising program code instructions to execute the steps of the displaying method according to any of the embodiments and variants disclosed when this program is executed on a computer.
A processor-readable medium is also disclosed, having stored therein instructions for causing a processor to perform at least the steps of the displaying method according to any of the embodiments and variants disclosed.
In the drawings, an embodiment of the present invention is illustrated. It shows:
According to an exemplary and non-limitative embodiment of the invention, the processing device 1 further comprises a computer program stored in the memory 120. The computer program comprises instructions which, when executed by the processing device 1, in particular by the processor 110, make the processing device 1 carry out the processing method described with reference to
According to exemplary and non-limitative embodiments, the processing device 1 is a device, which belongs to a set comprising:
The method for displaying a collection of videos in an intuitive and user-friendly way enhancing video browsing relies on two main steps:
Then, in a step S20, a video is presented to a user in a main graphical unit along with link information displayed in a first embodiment as a graphical unit (arrows) or in a second embodiment as properties of at least one second graphical unit displaying a secondary video related to the main video through the link.
The person skilled in the art will appreciate that the disclosed method may be shared between a screen and a processing device, the processing device generating a user interface with first and second graphical units and sending information to the screen for displaying the generated user interface.
The step S10 of time-space link generation is explained according to a particularly interesting embodiment wherein the links are described as homographies (including affine transformations) between videos.
In a step S11 of key frame selection, the videos of the collection of videos are represented as sequences of key frames. Since an exhaustive computation of the relative position of all the frames of each video within the dataset is not tractable, the first step is to detect key frames in order to reduce the cardinality of the input frame set. The person skilled in the art knows that step S11 is common in many video processing applications (scene summary, copy detection, etc.) and that many algorithmic solutions exist to achieve it, based on clustering plus election of a representative frame, shot detection plus stable frames, motion analysis, etc. Unlike in copy detection, in the context of user generated content of a same event, images within a video that are very different from the previous ones mean that the point of view or the object has changed. Consequently, the problem of finding a homography between one video and another is solved by first finding temporal segments of the first video and temporal segments of the second one, and then selecting a meaningful key frame for each segment, the homography being computed between pairs of key frames of the videos.
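The segmentation and key frame election of step S11 can be sketched as follows. This is a minimal illustration and not the disclosed implementation: frames are assumed to be flat lists of grey-level values, a new temporal segment is assumed to start whenever the mean absolute difference with the previous frame exceeds a hypothetical `threshold`, and the middle frame of each segment is arbitrarily elected as its key frame. The names `split_segments` and `key_frames` are illustrative.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[float]  # assumed representation: flat list of grey levels


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute pixel difference between two frames of equal size."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def split_segments(frames: List[Frame], threshold: float) -> List[Tuple[int, int]]:
    """Cut the video where consecutive frames differ strongly.

    Returns (start, end) frame index pairs, with end exclusive.
    """
    if not frames:
        return []
    segments = []
    start = 0
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            segments.append((start, i))
            start = i
    segments.append((start, len(frames)))
    return segments


def key_frames(frames: List[Frame], threshold: float) -> List[int]:
    """Elect the middle frame of each temporal segment as its key frame."""
    return [(start + end - 1) // 2
            for start, end in split_segments(frames, threshold)]
```

Any of the other strategies mentioned above (clustering, shot detection plus stable frames, motion analysis) could replace the simple difference test used here.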
In a step S12 of key points selection, some points of interest are detected and described on each key frame selected in the previous step.
In a step S13 of key points matching, the key points of each pair of key frames of the videos are matched. This is also a common step in image processing (stitching, retrieval, etc.), and one well-known technique is the SIFT algorithm. This step is performed on each pair of key frames.
In a step S14 of affine transform calculation, an affine transform is computed for each pair of key frames in different videos of the collection using the matched key points. As the person skilled in the art knows, such an affine transform is represented by a 2×2 transformation matrix and a constant displacement vector.
In a step S15 of affine transform interpretation, an advantageous processing is disclosed to estimate the relative position in space of the pair of key frames using the homography. The problem to solve is to find the relative position of two images (i.e. key frames), given that the coefficients of the homography matrix carry translation and rotation information. The step proposed here allows a fast geometric interpretation of the relative position between two images. As represented on
In a step S16 of link generation, information relative to the spatial positioning of two videos is defined. That information is stored for each pair of key frames as metadata or recorded as a graph. Indeed, the previous process is iterated for all the pairs of key frames, hence for all the pairs of video segments. A complete connection graph can then be constructed, giving the relationship between all the video segments. Even the relationship between segments with no direct relation can be estimated through transitivity, i.e. pathfinding through the graph. This graph can be appended to the videos as a metadata file and interpreted by the video player as it will be seen later with
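As an illustration of step S13, the sketch below matches key point descriptors by nearest-neighbour search with a ratio test, a criterion commonly used with SIFT descriptors. The text does not specify the matching strategy, so this is an assumption; descriptors are assumed to be plain lists of floats, and the function name `match_descriptors` and the default `ratio` value are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Descriptor = Sequence[float]


def match_descriptors(desc_a: List[Descriptor],
                      desc_b: List[Descriptor],
                      ratio: float = 0.75) -> List[Tuple[int, int]]:
    """Match descriptors of key frame A to those of key frame B.

    A match (i, j) is kept only if the best candidate j is clearly
    closer than the second best one (ratio test), which discards
    ambiguous correspondences.
    """
    matches = []
    for i, da in enumerate(desc_a):
        # Euclidean distance from descriptor i to every descriptor of B.
        dists = sorted((math.dist(da, db), j) for j, db in enumerate(desc_b))
        if len(dists) >= 2 and dists[0][0] < ratio * dists[1][0]:
            matches.append((i, dists[0][1]))
    return matches
```

The resulting index pairs identify which key points of the two key frames are considered to depict the same scene point.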
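The representation of an affine transform as a 2×2 matrix A and a displacement vector t, mapping a point p to A·p + t, can be illustrated as follows. This is a minimal sketch that solves for A and t exactly from three matched key points using Cramer's rule; it is an assumption made for illustration, since with many noisy matches a least-squares or robust estimate would normally be preferred. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def det3(m: Sequence[Sequence[float]]) -> float:
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve a 3x3 linear system with Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("the three source points are collinear")
    solution = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        solution.append(det3(mc) / d)
    return solution


def affine_from_pairs(src: List[Point], dst: List[Point]):
    """Return (A, t) such that dst[k] = A * src[k] + t for three pairs."""
    m = [[x, y, 1.0] for x, y in src]
    a11, a12, tx = solve3(m, [p[0] for p in dst])
    a21, a22, ty = solve3(m, [p[1] for p in dst])
    return [[a11, a12], [a21, a22]], [tx, ty]


def apply_affine(a: List[List[float]], t: List[float], p: Point) -> Point:
    """Apply the affine transform (A, t) to a point."""
    return (a[0][0] * p[0] + a[0][1] * p[1] + t[0],
            a[1][0] * p[0] + a[1][1] * p[1] + t[1])
```

For a pure shift between two key frames, the estimated matrix is the identity and the displacement vector carries the whole relationship.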
In a step S21, a video MV is displayed in a graphical unit, e.g. by the module 12, i.e. the main video decoder. This video is called the main video MV and the graphical unit is called the main graphical unit. The main graphical unit is advantageously larger than the secondary graphical units and placed in the center of the display window. According to a variant, the main video MV is selected by a user among a collection of videos. According to another variant, the main video is automatically selected for reproduction by an application, for instance by selecting the last video viewed or the most viewed video. The main video MV is split into temporal segments. Thus, when the main video MV is displayed in the main graphical unit, the module 12 is further configured to determine the temporal segment of the main video that is currently displayed.
In a step S22, a secondary video SVi is displayed in a secondary graphical unit, e.g. by the module 18, i.e. the secondary video decoder. In the following, a secondary video and the graphical unit presenting that video are interchangeably named SV. According to a particularly advantageous characteristic, features of the secondary graphical unit depend on information representative of spatio-temporal connectivity between a pair comprising a main video segment currently displayed and a secondary video segment currently displayed. As detailed later with the various embodiments, the features of the graphical units that are disclosed include their size, the structure of their arrangement, their relative position, their transparency and their overlap. According to a preferred embodiment, there is more than one secondary graphical unit, for instance from 3 to 12 as represented on
Thus, in a substep S23, information representative of spatio-temporal connectivity from the main video is selected. Indeed, as explained for the graph, a plurality of links connects the temporal segment of the main video that is currently displayed (corresponding to a node in the graph) with segments of other videos (corresponding to other nodes) and, in a variant, with segments of the main video itself (intra relationship). Advantageously, a score, called connectivity score, is attached to each link according to its relevance. The method is compatible with any metric for estimating the connectivity score. All methods share the hypothesis that a relational graph has been established between the videos of the collection. Such a graph may determine when, where and to what extent (if this can be quantified) two videos are related (by appearance similarity, action, point of view, semantics, etc.).
The links are sorted from the highest connectivity score to the lowest connectivity score, and the secondary videos SVi associated with the links of highest score are selected for visualization by the viewer of the main video MV. According to a first characteristic represented on
The links are continuously sorted for each current temporal segment of the main video, i.e. each time a new temporal segment is reached while the main video is decoded. Accordingly, from a first segment of the main video to the following second segment of the main video, the links are sorted again, a first and a second information are selected, and the dependent graphical unit characteristics and associated secondary videos are obtained for each of the first and second segments, for instance by the module 16. According to a sixth characteristic, the characteristics of the secondary graphical unit, such as size, structure, position, transparency and overlap, vary continuously from a first size, structure, position, transparency, overlap corresponding to the first information to a second size, structure, position, transparency, overlap corresponding to the second information, thus allowing a smooth transition between two settings of graphical units. In a variant of the sixth characteristic, a direct change occurs from the first to the second graphical unit setting.
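Determining the temporal segment that is currently displayed can be sketched as a lookup of the playback time in the sorted list of segment start times. This is only one possible way of doing it; the function `current_segment` and the representation of segments by their start times in seconds are assumptions made for the illustration.

```python
from bisect import bisect_right
from typing import Sequence


def current_segment(segment_starts: Sequence[float], t: float) -> int:
    """Return the index of the temporal segment containing playback time t.

    segment_starts must be sorted in increasing order; segment k is
    assumed to span from segment_starts[k] up to the next start time.
    """
    if not segment_starts or t < segment_starts[0]:
        raise ValueError("playback time precedes the first segment")
    return bisect_right(segment_starts, t) - 1
```

For instance, with segments starting at 0.0 s, 4.5 s and 9.0 s, a playback time of 8.9 s falls in the segment of index 1.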
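A relational graph of this kind, and the selection of the links attached to the segment currently displayed, can be sketched as follows. The data layout is an assumption: nodes are identified by a hypothetical (video name, segment index) pair, each link carries a connectivity score and a relative position label, and `select_links` simply keeps the best-scored links up to a maximum number of secondary graphical units.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

SegmentId = Tuple[str, int]  # assumed node key: (video name, segment index)


@dataclass(frozen=True)
class Link:
    """Hypothetical link from one video segment to another."""
    target: SegmentId
    score: float    # connectivity score, any metric
    position: str   # relative spatial position, e.g. "left" or "right"


Graph = Dict[SegmentId, List[Link]]


def select_links(graph: Graph, current: SegmentId, max_units: int) -> List[Link]:
    """Return the links of the current segment with the highest scores."""
    links = graph.get(current, [])
    ranked = sorted(links, key=lambda link: link.score, reverse=True)
    return ranked[:max_units]
```

A node without any link yields an empty selection, in which case no secondary graphical unit would be shown for that segment.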
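The continuous variation between two settings of a secondary graphical unit can be illustrated by a simple linear interpolation. This is a sketch under assumptions: a setting is represented as a dictionary of numeric properties, the transition progress `u` goes from 0 to 1, and the names `lerp` and `interpolate_setting` are hypothetical. The variant with a direct change corresponds to jumping from `u = 0` to `u = 1` without intermediate values.

```python
from typing import Dict


def lerp(a: float, b: float, u: float) -> float:
    """Linear interpolation between a (u = 0) and b (u = 1)."""
    return a + (b - a) * u


def interpolate_setting(first: Dict[str, float],
                        second: Dict[str, float],
                        u: float) -> Dict[str, float]:
    """Blend two graphical unit settings property by property.

    Both settings are assumed to define the same numeric properties
    (for example size, transparency, overlap, x and y position).
    """
    u = min(max(u, 0.0), 1.0)  # keep the progress within [0, 1]
    return {name: lerp(first[name], second[name], u) for name in first}
```

Calling `interpolate_setting` once per displayed frame, with `u` increasing over the duration of the transition, produces the smooth change from the first setting to the second.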
According to a seventh characteristic as represented on
In a step, not represented in
The implementations described herein may be implemented in, for example, a method or a process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method or a device), the implementation of features discussed may also be implemented in other forms (for example a program). An apparatus may be implemented in, for example, appropriate hardware, software, and firmware. The methods may be implemented in, for example, an apparatus such as, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants (“PDAs”), and other devices that facilitate communication of information between end-users.
Implementations of the various processes and features described herein may be embodied in a variety of different equipment or applications. Examples of such equipment include an encoder, a decoder, a post-processor processing output from a decoder, a pre-processor providing input to an encoder, a video coder, a video decoder, a video codec, a web server, a set-top box, a laptop, a personal computer, a cell phone, a PDA, and other communication devices. As should be clear, the equipment may be mobile and even installed in a mobile vehicle.
Additionally, the methods may be implemented by instructions being performed by a processor, and such instructions (and/or data values produced by an implementation) may be stored on a processor-readable medium such as, for example, an integrated circuit, a software carrier or other storage device such as, for example, a hard disk, a compact diskette (“CD”), an optical disc (such as, for example, a DVD, often referred to as a digital versatile disc or a digital video disc), a random access memory (“RAM”), or a read-only memory (“ROM”). The instructions may form an application program tangibly embodied on a processor-readable medium. Instructions may be, for example, in hardware, firmware, software, or a combination. Instructions may be found in, for example, an operating system, a separate application, or a combination of the two. A processor may be characterized, therefore, as, for example, both a device configured to carry out a process and a device that includes a processor-readable medium (such as a storage device) having instructions for carrying out a process. Further, a processor-readable medium may store, in addition to or in lieu of instructions, data values produced by an implementation.
As will be evident to one of skill in the art, implementations may produce a variety of signals formatted to carry information that may be, for example, stored or transmitted. The information may include, for example, instructions for performing a method, or data produced by one of the described implementations. For example, a signal may be formatted to carry as data the rules for writing or reading the syntax of a described embodiment, or to carry as data the actual syntax-values written by a described embodiment. Such a signal may be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal. The formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream. The information that the signal carries may be, for example, analog or digital information. The signal may be transmitted over a variety of different wired or wireless links, as is known. The signal may be stored on a processor-readable medium.
A number of implementations have been described. Nevertheless, it will be understood that various modifications may be made. For example, elements of different implementations may be combined, supplemented, modified, or removed to produce other implementations. Additionally, one of ordinary skill will understand that other structures and processes may be substituted for those disclosed and the resulting implementations will perform at least substantially the same function(s), in at least substantially the same way(s), to achieve at least substantially the same result(s) as the implementations disclosed. Accordingly, these and other implementations are contemplated by this application.
| Number | Date | Country | Kind |
|---|---|---|---|
| 14306374.1 | Sep 2014 | EP | regional |
| 14188491.6 | Oct 2014 | EP | regional |

| Filing Document | Filing Date | Country | Kind |
|---|---|---|---|
| PCT/EP2015/070273 | 9/4/2015 | WO | 00 |