This application claims priority from European Application No. 15306839.0, entitled “Method for Generating a User Interface Presenting a Plurality of Videos”, filed Nov. 19, 2015, the contents of which are hereby incorporated by reference in their entirety.
The present principles relate generally to the presentation of videos selected from a plurality of videos related to the same event. More specifically, the present principles concern a method for generating a user interface presenting a plurality of synchronized videos on a display device, and a device for implementing said method.
This section is intended to introduce the reader to various aspects of art, which may be related to various aspects of the present principles that are described and/or claimed below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present principles. Accordingly, it should be understood that these statements are to be read in this light, and not as admissions of prior art.
Today, with the ever-growing availability of digital devices, more and more videos are captured by people. With this explosion of video sharing on social networks, multiple users provide a potentially large number of videos of the same event. All these pieces of video content constitute a database containing a large amount of raw video material. Presenting these pieces of crowd-created video content is therefore a challenge, since the capacity of a user to view videos on a display device is limited.
A first technique for presenting videos is to create a mosaic where the display screen is split into as many units as there are video sources. However, this technique is only suited to a limited number of videos, so as to ease the presentation and the user's switching between video sources. This technique, inspired by thumbnail image display, does not scale as the size of the database increases, since the user must dig into a huge database while the display screen is not extendible.
A second technique for presenting videos, usually used in video compositing interfaces, is to create a film strip where videos captured at a same instant are displayed. To that end, an absolute time stamp is defined, which can be the beginning of the event, and the videos are synchronized along this time stamp. The number of displayed videos in the second technique is thus reduced compared to the first technique: only the videos capturing a same scene at a same time, but from different points of view, are displayed. However, the number of videos for a determined time stamp may vary dynamically and may still be very large.
The present principles aim to solve the issue of rendering simultaneous synchronized videos when a large number of contributions are to be processed and when multiple points of view are to be presented to the user, while ensuring the best viewing experience.
The present principles provide a user interface presenting a plurality of temporally synchronized videos on a display device, wherein the user interface comprises a scalable number of video units in which videos are arranged according to their quality. To that end, a method, performed by a computer, is disclosed that generates a user interface presenting a plurality of temporally synchronized videos on a display device, wherein the user interface comprises a scalable number of video units. The method comprises:
According to various characteristics, either taken alone or in combination:
Advantageously, the plurality of video units comprises a first (or main) video unit and secondary video units. A reference video, being either the video with the highest value representative of video quality among the plurality of videos or a video selected by a user, is displayed in the first video unit and serves as a reference, for instance for the temporal alignment of videos and for content similarity. Secondary (or auxiliary) videos, selected according to their quality, contributor fame or similarity, are then displayed in the secondary video units. Advantageously, thanks to one main video window and auxiliary video (or still picture) windows, such an embodiment allows easier content browsing.
According to a second aspect, a device is disclosed that comprises a processor configured to produce a video presentation user interface (UI) for a display device.
In a variant, the device comprises:
According to a specific embodiment, the device belongs to a set comprising:
According to another aspect, the present principles are directed to a graphics processing unit comprising means for executing code instructions for performing the method previously described.
According to a third aspect, a computer program product comprising program code instructions to execute the steps of the UI generating method in any of its variants when this program is executed on a computer is disclosed.
According to a fourth aspect, a processor readable medium is disclosed that has stored therein instructions for causing a processor to perform at least generating a user interface presenting a plurality of temporally synchronized videos for a display device, wherein the user interface comprises a scalable number of video units; obtaining a value representative of video quality from each of the plurality of videos; and selecting videos with the highest values representative of video quality among the plurality of videos for display in each of the video units.
According to a fifth aspect, a non-transitory program storage device is disclosed that is readable by a computer, tangibly embodies a program of instructions executable by the computer to perform a method for at least generating a user interface presenting a plurality of temporally synchronized videos for a display device, wherein the user interface comprises a scalable number of video units; obtaining a value representative of video quality from each of the plurality of videos; and selecting videos with the highest values representative of video quality among the plurality of videos for display in each of the video units.
While not explicitly described, the present embodiments may be employed in any combination or sub-combination. For example, the present embodiments are not limited to the described arrangement of video units.
Besides, any characteristic or embodiment described for the UI generating method is compatible with a device intended to process the disclosed method and with a computer-readable storage medium storing program instructions.
Preferred features of the present principles will now be described, by way of non-limiting example, with reference to the accompanying drawings, in which:
A salient idea of the present principles is to present a subset of temporally synchronized videos in a video wall where the subset is selected according to information representative of video quality with respect to video parameters such as light, movement or saliency.
According to an exemplary and non-limitative embodiment of the present principles, the processing device 1 further comprises a computer program stored in the memory 120. The computer program comprises instructions which, when executed by the processing device 1, in particular by the processor 110, make the processing device 1 carry out the processing method described with reference to
According to exemplary and non-limitative embodiments, the processing device 1 is a device belonging to a set comprising:
Those skilled in the art will appreciate that the present principles as described in the preferred embodiments are advantageously computed using a graphics processing unit (GPU) on a graphics processing board, for instance with regard to the decoders or to the obtaining of video parameters.
The described method is advantageously well adapted to a system or service allowing the ingest of various videos of a same event. As previously explained, the videos are simultaneously rendered on a display to a user so as to ensure the best viewing experience, even in the case of a large number of videos with multiple viewpoints.
According to the present principles, videos for display are temporally synchronized.
In a variant of the representation of videos by segments as illustrated on
In yet another variant, a video is divided into temporally aligned chunks of equal time length and a timestamp (for instance 0, 200, 400) is obtained for each chunk representing the video. In the following, any subdivision of a video is treated as a video. For instance, considering chunks of time length 200, video GoPro1_1 is divided into chunks GoPro1_1_1 (timestamp 0), GoPro1_1_2 (timestamp 200), GoPro1_1_3 (timestamp 400), and so on.
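By way of non-limiting illustration only, such fixed-length chunking could be sketched as follows; the helper name and time units are hypothetical and not part of the present principles.

```python
# Illustrative sketch: divide a video into temporally aligned chunks of
# equal time length and attach a timestamp to each chunk.
def split_into_chunks(video_id, duration, chunk_length=200):
    """Return (chunk_id, timestamp) pairs, e.g. for GoPro1_1 of duration 600:
    [("GoPro1_1_1", 0), ("GoPro1_1_2", 200), ("GoPro1_1_3", 400)]."""
    return [(f"{video_id}_{i + 1}", t)
            for i, t in enumerate(range(0, duration, chunk_length))]

print(split_into_chunks("GoPro1_1", 600))
```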
In a first step S10, a value representative of video quality is obtained for each video of the plurality of videos. To that end, each video of the database D is processed to extract a plurality of video parameters. According to an exemplary and non-limitative embodiment, the parameters belong to a set comprising:
Depending on the parameter, values are obtained for a frame of the video (such as blur) or globally for the video sequence (such as movement, or the spatial resolution that is set at capture). Thus, in non-limiting examples, parameters are determined either for a whole video, for each frame of a video, at a regular frame interval of a video, or for a frame (be it the first frame or a key frame) representative of a video. In other examples, for a given parameter, such as saliency, a set of values is defined at a regular time interval over the whole video. Advantageously, a global value for the given video parameter is then obtained, for instance as the mean of these values over the time length of the video. In other words, a value is obtained on the fly every N frames, and the values are integrated over the P frames of the whole video by averaging the P/N obtained values.
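As a minimal sketch of this integration, assuming a per-frame parameter function and a sampling step of N frames (both hypothetical here):

```python
# Sketch: sample a per-frame parameter value every N frames, then average
# the P/N samples to obtain one global value for a video of P frames.
def global_parameter_value(frames, parameter, N=10):
    samples = [parameter(frame) for frame in frames[::N]]
    return sum(samples) / len(samples)
```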
The detailed operation of such parameter extraction as disclosed in the non-limiting examples above is out of the scope of the present principles. Besides, those skilled in the art will appreciate that such parameter values might be pre-processed off-line and stored in the database D along with the video and the temporal information.
According to a particularly advantageous characteristic, a value representative of video quality is a weighted mean integrating the values of several video parameters. As with the individual parameter values, a video quality value is defined for the whole video (or for each temporal chunk of a video). According to a variant, the weighting of the different video parameters used to create the video quality value is defined as a system value. In another variant, the weighting is defined by the user through preference settings. In a preferred variant, the higher the quality value, the higher the quality of the video with respect to the defined parameters and weights.
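For illustration only, such a weighted mean could be computed as below; the parameter names and weights are assumptions, whether set as system values or via user preference settings.

```python
# Sketch: video quality value as a weighted mean of per-parameter values.
def quality_value(parameter_values, weights):
    """Both arguments are dicts keyed by parameter name."""
    total = sum(weights.values())
    return sum(weights[p] * parameter_values[p] for p in weights) / total

q = quality_value({"blur": 0.8, "movement": 0.6, "saliency": 0.7},
                  {"blur": 2.0, "movement": 1.0, "saliency": 1.0})
```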
According to another particular characteristic, a value representative of video contributor fame is obtained for each video of the plurality of videos. Alternative or complementary information attached to the videos can be the name of the contributor and his or her fame, where fame is a piece of system information recovered from service users' feedback or social network data. In that case, the best videos are considered to be the ones uploaded by the most famous contributors. In another variant, the fame is defined locally by the user: each time the user selects a video as the reference, the contributor's local fame value is incremented. In a preferred variant, the higher the contributor fame value, the more likely the contributor's videos are to be selected for display.
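The local fame variant could be sketched as a simple counter keyed by contributor; the structure below is hypothetical, not prescribed by the present principles.

```python
# Sketch: local contributor fame, incremented each time the user selects
# one of that contributor's videos as the reference.
local_fame = {}  # contributor name -> local fame value

def on_reference_selected(contributor):
    local_fame[contributor] = local_fame.get(contributor, 0) + 1
```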
According to another particular characteristic, a value representative of video similarity is obtained from two videos of the plurality of videos. The goal here is to avoid a wall composed of videos that have too-similar viewpoints, since the user may want to exploit the richness of the various viewpoints, as described hereafter. Advantageously, a video similarity value is obtained by determining the geometric transformation between a frame of a first video and the corresponding frame of a second video, wherein the two corresponding frames are temporally aligned with respect to a time reference. Those skilled in the art know that a geometric transformation is classically determined by extracting points of interest in both frames, computing an image descriptor such as SIFT (as described by Lowe, D. G. in “Distinctive image features from scale-invariant keypoints”, International Journal of Computer Vision, 2004, vol. 60, pp. 91-110) and estimating a geometric homography between the frames through a RANSAC regression. A homography is usually represented by a 3×3 matrix. The matrix representing the transformation between a frame Xi of a video i and a frame Xj of a video j is noted Hij. A point xi of frame Xi corresponding to a point xj of frame Xj satisfies the equation xi = Hij × xj. Then, in the case where a homography is estimated, a value of the similarity metric is defined, for instance, as the inverse of the Frobenius norm of the transformation matrix; however, the present principles are compatible with any other norm applied to the matrix. The idea is that the larger the transformation, the lower the similarity value. In the case where the frames are so distinct that a homography cannot be estimated, the similarity value is set to zero. According to a particular variant, a transformation matrix is obtained for frames of the first and second videos at a regular interval and a similarity value is obtained by integrating (as for the quality parameters) the similarity values over the whole video.
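A minimal sketch of this metric with OpenCV is given below, assuming two temporally aligned grayscale frames as NumPy arrays; the ratio-test threshold is a common but illustrative choice.

```python
import cv2
import numpy as np

def geometric_similarity(frame_i, frame_j):
    """Inverse Frobenius norm of the homography between two aligned frames;
    returns 0.0 when no homography can be estimated (frames too distinct)."""
    sift = cv2.SIFT_create()
    kp_i, des_i = sift.detectAndCompute(frame_i, None)
    kp_j, des_j = sift.detectAndCompute(frame_j, None)
    if des_i is None or des_j is None:
        return 0.0
    matches = cv2.BFMatcher().knnMatch(des_j, des_i, k=2)
    good = [m[0] for m in matches
            if len(m) == 2 and m[0].distance < 0.75 * m[1].distance]
    if len(good) < 4:  # a homography needs at least 4 correspondences
        return 0.0
    pts_j = np.float32([kp_j[m.queryIdx].pt for m in good])
    pts_i = np.float32([kp_i[m.trainIdx].pt for m in good])
    H_ij, _ = cv2.findHomography(pts_j, pts_i, cv2.RANSAC)  # x_i = H_ij . x_j
    if H_ij is None:
        return 0.0
    return 1.0 / np.linalg.norm(H_ij, "fro")  # larger transform, lower similarity
```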
In the following, this metric will be referred to as the geometric similarity metric, or simply the similarity metric. This metric is stored in a similarity table. For instance, a similarity table has the videos GoPro1_1, GoPro1_2, GoPro2_1, GoPro2_2, GoPro3_1 and GoPro3_2 as columns and rows; the similarity value for the pair GoPro1_2, GoPro1_1, which are temporally synchronous, is stored in the table at (GoPro1_2, GoPro1_1). Advantageously, the similarity value for videos that are not aligned, or for a same video, is set to 0. In a variant, the similarity value for videos that are not aligned, or for a same video, is set to a negative value (for instance −1).
where H_wx-yz represents the similarity value between the considered videos. Since H_21-11 = H_11-21, i.e. the similarity is commutative, advantageously only half of the table is filled, as shown above.
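For instance, the half-filled commutative table could be stored once per unordered pair of videos; the dictionary layout and the similarity value below are purely illustrative.

```python
# Sketch: store the commutative similarity metric once per unordered pair.
similarity_table = {}

def set_similarity(a, b, value):
    similarity_table[tuple(sorted((a, b)))] = value

def get_similarity(a, b):
    if a == b:
        return 0.0  # same video: conventionally 0 (or -1 in a variant)
    return similarity_table.get(tuple(sorted((a, b))), 0.0)  # 0 if not aligned

set_similarity("GoPro1_2", "GoPro1_1", 0.42)  # illustrative value
assert get_similarity("GoPro1_1", "GoPro1_2") == 0.42  # commutativity
```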
Advantageously, videos are pre-processed to obtain the described values (quality, similarity, contributor fame) and the values are stored with the videos in the database D.
In a second step S20, a user interface is generated. The user interface is designed for presenting a plurality of videos on a display device. The user interface comprises a scalable number of video units as presented on
The number, size, aspect ratio and position of the video units are defined according to the rendering device capabilities to ensure a maximal viewing experience on the display device. According to a preferred characteristic, the video wall rendering is made of one main unit 1 and a series of smaller video units 2-18 as illustrated on
In a variant, the video units of the wall are all fed with videos. In another variant, video units with lower numbers (1 to 17 on
In a third step S30, the videos with the highest quality values among the plurality of videos are selected for display in each of the video units. The quality values at a given time stamp with respect to the reference time are sorted in decreasing order.
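Under the assumption of one quality value per video at the current time stamp, this selection reduces to a sort, as in the following sketch:

```python
# Sketch: keep the K highest-quality videos, one per video unit.
def select_for_units(quality_at_timestamp, num_units):
    """quality_at_timestamp: dict video_id -> quality at the given time."""
    ranked = sorted(quality_at_timestamp,
                    key=quality_at_timestamp.get, reverse=True)
    return ranked[:num_units]
```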
In a variant where the quality and contributor fame metrics are combined, the selection is performed by choosing, among the set of available videos at a given time stamp, the videos with the highest values representative of video quality and the highest values representative of video contributor fame. A weighted linear combination is computed from the values representative of video quality and the values representative of video contributor fame. For instance, for a video VideoCurrent_i among the videos GoPro1_1, GoPro1_2, GoPro2_1, GoPro2_2, GoPro3_1, GoPro3_2:
Score_i = QualityMetric(VideoCurrent_i) + α · FameMetric(VideoCurrent_i)
where α is a weight defined by the system or by the user through a preference settings interface; this weight explicitly controls the importance of the contributor with respect to the quality in the final composition of the wall. In a variant, it is advantageous to consider the persistency of a contributor inside a same video unit, in order to allow the same contributor's uploads to be followed more easily. Thus, for instance, GoPro1_1 and GoPro1_2, captured by the same device (here associated with a contributor), should be presented in a same video unit, for instance video unit 1.
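A sketch of this combined score, together with one possible reading of the persistency rule (the unit-assignment policy below is an assumption):

```python
# Sketch: Score_i = quality_i + alpha * fame_i, then try to keep each
# contributor in the video unit it occupied at the previous update.
def combined_scores(videos, quality, fame, alpha):
    return {v: quality[v] + alpha * fame[v] for v in videos}

def assign_units(ranked_videos, previous_assignment, contributor_of):
    """previous_assignment maps unit index -> contributor shown there before."""
    assignment, pending = {}, []
    for video in ranked_videos:
        # Reuse the unit that previously held this contributor, if still free.
        unit = next((u for u, c in previous_assignment.items()
                     if c == contributor_of[video] and u not in assignment),
                    None)
        if unit is not None:
            assignment[unit] = video
        else:
            pending.append(video)
    free = [u for u in range(len(ranked_videos)) if u not in assignment]
    assignment.update(zip(free, pending))
    return assignment
```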
In another variant where the quality and similarity metrics are combined, the selection is performed by choosing, among the set of available videos at a given time stamp, the highest-quality videos that are not too similar, so as to select different points of view of a same scene, since the videos are temporally synchronized. A weighted linear combination of these two factors can be used to make a decision, with the following iterative process:
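The enumerated steps of that process are not reproduced in this text; purely as an illustration, one plausible greedy variant is sketched below, where the weight beta and the stopping rule are assumptions.

```python
# Hypothetical greedy sketch: iteratively pick the candidate whose quality,
# penalized by its similarity to the already selected videos, is highest;
# beta trades intrinsic quality against viewpoint diversity.
def select_diverse(candidates, quality, get_similarity, num_units, beta=1.0):
    """candidates is a mutable list of video ids; it is consumed in place."""
    selected = []
    while candidates and len(selected) < num_units:
        def penalized(v):
            worst = max((get_similarity(v, s) for s in selected), default=0.0)
            return quality[v] - beta * worst
        best = max(candidates, key=penalized)
        selected.append(best)
        candidates.remove(best)
    return selected
```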
Of course, this mechanism also applies when other metric combinations are considered, for instance with contributor fame: another weight is defined that controls the contributor fame in a score combining the intrinsic quality metric, the fame metric and possibly the similarity metric.
Once the videos to display are selected in S30, in an optional rendering step (not shown), the Technicolor video reframing technology is applied in order to automatically adapt the video and crop it to the size and aspect ratio of the video wall.
Besides, the step S30 is iterated for a dynamic rendering of the video wall.
According to a first variant, the rendering is updated each time the reference video changes. Indeed, the reference video displayed in the main video unit 1 does not change until it ends or until the user selects another video inside the wall, as described above. When the reference video changes, the whole video selection process of step S30 is iterated, and a new distribution of videos to display, including the reference video, is defined and presented through the user interface. Besides, in the variant with chunks of fixed duration, the rendering is updated each time a new chunk GoPro1_1_1 of the reference video GoPro1_1 is reached. The reference video segment frequency thus controls the secondary units' update, since the metrics of the synchronized secondary videos may vary, producing a new distribution of videos to display. Advantageously, as previously described, one can consider the persistency of contributors inside the same video unit in order to ease the monitoring of video uploads of a same contributor. When the last video unit displays a sequence of still key frames, the unit is refreshed continuously at a frequency defined by the system, typically the segment length, or by the user through a preference parameter.
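In the fixed-duration chunk variant, the update frequency can be sketched as follows; the playback times and chunk length are illustrative.

```python
# Sketch: iterate step S30 whenever the reference playback crosses a
# chunk boundary (chunk length 200 as in the example above).
def chunk_index(time, chunk_length=200):
    return time // chunk_length

last = None
for t in [0, 50, 150, 200, 250, 400]:  # illustrative playback times
    if chunk_index(t) != last:
        last = chunk_index(t)
        print(f"t={t}: new reference chunk {last} -> iterate step S30")
```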
According to a second variant of the dynamic rendering, the rendering is updated each time a secondary video ends. When one secondary video ends, the whole distribution is updated except for the reference video. Again, the persistency of a contributor inside a same video unit constitutes an interesting variant, resulting in a possible update of only the unit where the secondary video has ended.
According to a third variant of the dynamic rendering, the user interface is configured so that a user is able to select a reference video among the secondary videos. The rendering is then updated each time a user changes the reference video.
In a fourth step S40, the generated user interface and the selected videos are output or sent to a display device for rendering to a user.
The implementations described herein may be implemented in, for example, a method or a process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method or a device), the implementation of features discussed may also be implemented in other forms (for example a program). An apparatus may be implemented in, for example, appropriate hardware, software, and firmware. The methods may be implemented in, for example, an apparatus such as, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants (“PDAs”), and other devices that facilitate communication of information between end-users.
Implementations of the various processes and features described herein may be embodied in a variety of different equipment or applications. Examples of such equipment include an encoder, a decoder, a post-processor processing output from a decoder, a pre-processor providing input to an encoder, a video coder, a video decoder, a video codec, a web server, a set-top box, a laptop, a personal computer, a cell phone, a PDA, and other communication devices. As should be clear, the equipment may be mobile and even installed in a mobile vehicle.
Additionally, the methods may be implemented by instructions being performed by a processor, and such instructions (and/or data values produced by an implementation) may be stored on a processor-readable medium such as, for example, an integrated circuit, a software carrier or other storage device such as, for example, a hard disk, a compact diskette (“CD”), an optical disc (such as, for example, a DVD, often referred to as a digital versatile disc or a digital video disc), a random access memory (“RAM”), or a read-only memory (“ROM”). The instructions may form an application program tangibly embodied on a processor-readable medium. Instructions may be, for example, in hardware, firmware, software, or a combination. Instructions may be found in, for example, an operating system, a separate application, or a combination of the two. A processor may be characterized, therefore, as, for example, both a device configured to carry out a process and a device that includes a processor-readable medium (such as a storage device) having instructions for carrying out a process. Further, a processor-readable medium may store, in addition to or in lieu of instructions, data values produced by an implementation.
As will be evident to one of skill in the art, implementations may produce a variety of signals formatted to carry information that may be, for example, stored or transmitted. The information may include, for example, instructions for performing a method, or data produced by one of the described implementations. For example, a signal may be formatted to carry as data the rules for writing or reading the syntax of a described embodiment, or to carry as data the actual syntax-values written by a described embodiment. Such a signal may be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal. The formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream. The information that the signal carries may be, for example, analog or digital information. The signal may be transmitted over a variety of different wired or wireless links, as is known. The signal may be stored on a processor-readable medium.
A number of implementations have been described. Nevertheless, it will be understood that various modifications may be made. For example, elements of different implementations may be combined, supplemented, modified, or removed to produce other implementations. Additionally, one of ordinary skill will understand that other structures and processes may be substituted for those disclosed and the resulting implementations will perform at least substantially the same function(s), in at least substantially the same way(s), to achieve at least substantially the same result(s) as the implementations disclosed. Accordingly, these and other implementations are contemplated by this application.
Number | Date | Country | Kind
---|---|---|---
15306839.0 | Nov. 19, 2015 | EP | regional