The present invention relates generally to the field of video editing. More precisely, the invention relates to a method and a device for editing a video sequence comprising multiple frames.
This section is intended to introduce the reader to various aspects of art, which may be related to various aspects of the present invention that are described and/or claimed below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present invention. Accordingly, it should be understood that these statements are to be read in this light, and not as admissions of prior art.
Photo editing applications are known wherein a single image (photo) is modified. Several tools are currently available, either for the professional artist or for the home user. Among the vast palette of image modification tasks that one can apply, we can mention: re-coloring, tinting, blurring, painting/drawing, segmenting/masking associated with per-region effect application, cloning, inpainting, texture insertion, logo insertion, object removal, etc.
With the advent of modern video capturing, processing and streaming systems, huge amounts of video data either captured by an end-user or from professional footage (films, publicity, SFX) are available, and video editing tools are becoming more popular.
Firstly, reasoning at the level of a single image, one would like to be able to apply the previously mentioned image modification tasks to a whole video sequence. However, manually editing each frame of a video in such a photo editing application is very time-consuming and painful for the operator.
Secondly, reasoning at the level of a video (multiple images), solutions are provided in a number of professional video editing applications, such as Adobe After Effects software, for modifying the video as a whole. Methods are known for propagating information from a first frame to subsequent frames. For instance, the document US2010/0046830 A1, relating to the so-called "RotoBrush" tool bundled with Adobe After Effects software, describes a method for propagating a segmentation mask. It comprises an initialization usually provided by the user, followed by an automatic segmentation of a next frame into two classes (foreground/background) based on motion estimation and on a combined color model extracted from the previous segmented frame and the original frame. However, such methods for automatic temporal propagation of segmentation rely on a sequential propagation of information. A user is able to correct the automatic segmentation before continuing with the propagation to subsequent images, but a user cannot correct any frame of the video sequence and see the result propagated to any other frame without applying the process to all of the intermediate frames. Thus, such methods do not provide a display interface where results are displayed simultaneously for several frames of the sequence for interactive multi-frame editing tasks. In the same domain, WO2012/088477 discloses automatically applying color or depth information using masks throughout a video sequence. Besides, WO2012/088477 provides a display interface where a series of sequential frames are displayed simultaneously, ready for mask propagation to the subsequent frames via an automatic mask-fitting method. However, WO2012/088477 fails to disclose arbitrarily modifying a single pixel. WO2012/088477 uses a mask propagation process to determine where an image transformation is or is not applied on the whole foreground or moving-object mask (or segment).
A mask in a first frame is identified as corresponding to the same object in the subsequent frames. However, an individual pixel of the first frame cannot be matched with a pixel in the subsequent frames, and such a match cannot be deduced from the mask propagation process. This makes it impossible for the prior art to apply point-wise operations with instantaneous propagation. Pixel-wise image editing tools (paintbrush, erasing, drawing, paint bucket, etc.) cannot be trivially extrapolated to video with the masks used in WO2012/088477.
Thirdly, solutions are provided in the domain of video editing methods for modifying the texture of a 3D object. For instance, EP1498850 discloses automatically rendering an image of the 3D object based on simple texture images, modifying the rendered image and propagating the modification by updating the 3D model. However, this method does not apply to video images obtained from a source wherein, unlike in 3D synthesis, the operator does not have access to a model.
A highly desirable functionality of a video editing application is to be able to edit any image of the sequence at a pixel level and automatically propagate the change to the rest of the sequence.
The invention is directed to a method for editing and visualizing several video frames simultaneously while propagating the changes applied on one image to the others.
In a first aspect, the invention is directed to a method, performed by a processor, for editing a video sequence, comprising the steps of displaying a mother frame of the video sequence; capturing an information representative of a frame editing task applied by a user to the displayed mother frame wherein the frame editing task modifies an information related to at least a pixel of the displayed mother frame; and simultaneously displaying at least one child frame of the video sequence wherein the captured information is temporally propagated wherein the information representative of a frame editing task is propagated to at least a pixel in the at least one child frame corresponding to the at least a pixel of the displayed mother frame based on a motion field between the mother frame and the at least one child frame.
According to a further advantageous characteristic, the method comprises displaying a visual element linking the mother frame to the at least one child frame wherein when the visual element is inactivated by a user input, the temporal propagation of the captured information to the at least one child frame is inactivated.
According to another advantageous characteristic, the at least one child frame comprises any frame of the video sequence, distant from the mother frame by at least one frame.
According to another advantageous characteristic, the captured information is temporally propagated from the mother to the at least one child frame by a motion field of from-the-reference type.
According to another advantageous characteristic, the temporally propagated captured information in the at least one child frame is determined from the mother frame by a motion field of to-the-reference type.
According to a first variant, the information representative of an editing task comprises the location of a pixel of the mother frame on which a pointing element is placed; and the temporally propagated captured information comprises at least a location in the at least one child frame which is a function of a motion vector associated with the pixel of the mother frame toward the child frame.
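This first variant can be sketched minimally (for illustration only; the array layout, function name and toy values are assumptions, not the claimed implementation): a dense from-the-reference motion field stores one (dx, dy) vector per mother-frame pixel, and the pointed location is displaced by that vector toward the child frame.

```python
import numpy as np

def propagate_location(flow, x, y):
    """Map pixel (x, y) of the mother frame into a child frame using a
    from-the-reference motion field `flow` of shape (H, W, 2), which holds,
    for each mother-frame pixel, its (dx, dy) displacement toward the child.
    Illustrative sketch only."""
    dx, dy = flow[y, x]
    return float(x + dx), float(y + dy)

# Toy field: every pixel moves 3 right and 1 down between the two frames.
flow = np.tile(np.array([3.0, 1.0]), (4, 5, 1))  # shape (H=4, W=5, 2)
print(propagate_location(flow, 2, 1))  # (5.0, 2.0)
```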
According to a second variant, the information representative of an editing task comprises a location of a pixel of the mother frame on which a painting element is placed and comprises a region in the mother frame associated with the painting element; and the temporally propagated captured information comprises at least a location in the at least one child frame which is a function of a motion vector associated with the pixel of the mother frame toward the child frame and comprises a region in the at least one child frame which is the result of a transformation of the region from the mother to the child frame.
According to a third variant, the information representative of an editing task comprises an ordered list of locations in the mother frame corresponding to an ordered list of vertices of a polygon; and the temporally propagated captured information comprises an ordered list of locations in the at least one child frame, wherein each location in the at least one child frame is a function of a motion vector associated with the corresponding location in the mother frame toward the child frame.
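The third variant can be sketched the same way: each polygon vertex is displaced, in order, by the motion vector found at its position in the mother frame. The function name and the rounding of vertex coordinates to the nearest pixel are illustrative assumptions:

```python
import numpy as np

def propagate_polygon(flow, vertices):
    """Propagate an ordered list of polygon vertices from the mother frame
    to a child frame, displacing each vertex by the motion vector found at
    its (rounded) position in the from-the-reference field `flow` of shape
    (H, W, 2).  The vertex order is preserved.  Illustrative sketch only."""
    out = []
    for x, y in vertices:
        dx, dy = flow[int(round(y)), int(round(x))]
        out.append((float(x + dx), float(y + dy)))
    return out

# Toy field: uniform motion of 2 right, 1 up between the two frames.
flow = np.tile(np.array([2.0, -1.0]), (6, 6, 1))
print(propagate_polygon(flow, [(1, 1), (4, 1), (4, 4)]))
# [(3.0, 0.0), (6.0, 0.0), (6.0, 3.0)]
```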
According to a fourth variant, the information representative of an editing task comprises a color value for a set of pixels of the mother frame; and the temporally propagated captured information comprises a color value of a pixel in the at least one child frame which is a function of the color values of the set of pixels of the mother frame, wherein the location of the set of pixels in the mother frame is a function of a motion vector associated with the pixel in the at least one child frame toward the mother frame.
According to a refinement of first to third variants, when the location in the at least one child frame, corresponding to the location in the mother frame, is occluded in the at least one child frame, the captured information is not propagated.
According to a refinement of the fourth variant, when the location of the set of pixels in the mother frame, corresponding to the location of the pixel in the at least one child frame, is occluded in the mother frame, the captured information is not propagated.
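The fourth variant and its occlusion refinement can be sketched together: each child pixel's to-the-reference motion vector points back into the mother frame, the returned color is a blend of the mother pixels around the landing point, and nothing is propagated when that mother location is occluded. The function name, the bilinear blend and the toy data are illustrative assumptions:

```python
import numpy as np

def pull_color(mother, flow_to_ref, occl_in_mother, x, y):
    """Fetch the color of child pixel (x, y) from the mother frame via a
    to-the-reference motion field (shape (H, W, 2)), as a bilinear blend of
    the four mother pixels around the landing point.  Returns None when the
    mother location is occluded, i.e. the information is not propagated."""
    dx, dy = flow_to_ref[y, x]
    mx, my = x + dx, y + dy                        # landing point in the mother
    x0, y0 = int(np.floor(mx)), int(np.floor(my))
    if occl_in_mother[y0, x0]:
        return None                                # occluded: do not propagate
    fx, fy = mx - x0, my - y0
    return float((1 - fx) * (1 - fy) * mother[y0, x0]
                 + fx * (1 - fy) * mother[y0, x0 + 1]
                 + (1 - fx) * fy * mother[y0 + 1, x0]
                 + fx * fy * mother[y0 + 1, x0 + 1])

mother = np.arange(16, dtype=float).reshape(4, 4)   # toy grayscale frame
flow_to_ref = np.tile(np.array([0.5, 0.0]), (4, 4, 1))
occl = np.zeros((4, 4), dtype=bool)                 # nothing occluded
print(pull_color(mother, flow_to_ref, occl, 1, 1))  # 5.5
```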
According to another advantageous characteristic, the method comprises a step of selecting, in response to a user input, a child frame as a new mother frame replacing the mother frame.
Advantageously, the method for multi-frame video editing accelerates the video compositing workflow for the professional user and is compatible with existing technologies. Advantageously, the method for multi-frame video editing is also compatible with implementation on mobile devices or tablets, as long as the editing tasks are simple enough and adapted to the home user, for instance text insertion, object segmentation and per-region filtering, color modification, or object removal. These functionalities can then be integrated into mobile applications, such as Technicolor Play, for modifying personal videos and sharing them on a social network. Collaborative video editing and compositing between multiple users can profit from these tools as well.
In a second aspect, the invention is directed to a computer-readable storage medium storing program instructions computer-executable to perform the disclosed method.
In a third aspect, the invention is directed to a device comprising at least one processor; a display coupled to the at least one processor; and a memory coupled to the at least one processor, wherein the memory stores program instructions, wherein the program instructions are executable by the at least one processor to perform the disclosed method on the display.
Any characteristic or variant described for the method is compatible with a device intended to process the disclosed methods and with a computer-readable storage medium storing program instructions.
Preferred features of the present invention will now be described, by way of non-limiting example, with reference to the accompanying drawings, in which:
The technology behind such a method for multi-frame video editing comprising the propagation process is dense motion estimation. That is, a motion field is available that, for each pixel of the reference image, assigns a motion vector linking the position of that pixel in the reference to its position in another image of the video sequence. Such a method for generating a motion field is described in the international application PCT/EP13/050870 filed on Jan. 17, 2013 by the same applicant. The international application describes how to generate an improved dense displacement map, also called a motion field, between two frames of the video sequence using a multi-step flow method. Such motion fields are compatible with the motion field used in the present invention. Besides, the international application introduces the concepts of from-the-reference and to-the-reference motion fields.
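The distinction between the two field types can be illustrated with a toy example (the array shapes and names are assumptions for illustration, not the cited application's format): a from-the-reference field stores one displacement per mother-frame pixel toward the child, while a consistent to-the-reference field stores one displacement per child-frame pixel back to the mother, so a round trip returns to the starting pixel.

```python
import numpy as np

H, W = 4, 4
# from-the-reference: one displacement per mother pixel, toward the child
flow_from_ref = np.tile(np.array([1.0, 0.0]), (H, W, 1))
# to-the-reference: one displacement per child pixel, back to the mother;
# for a consistent pair of fields the two are (approximately) inverse
flow_to_ref = -flow_from_ref

# Round trip: mother pixel (2, 1) -> child frame -> back to the mother.
x, y = 2, 1
dx, dy = flow_from_ref[y, x]
cx, cy = x + dx, y + dy                        # position in the child frame
bx, by = flow_to_ref[int(cy), int(cx)]
mx, my = float(cx + bx), float(cy + by)        # back in the mother frame
print((mx, my))  # (2.0, 1.0)
```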
In a variant, such motion fields are pre-computed upstream of the video-editing task, which increases the amount of stored data for a video sequence. In another variant, requiring more computing power, such motion fields are computed on-line along with the propagation tasks.
In a first step 10 of displaying the mother frame, a frame among the frames of the video sequence is chosen by a user as the mother frame through an input interface. This mother frame is displayed on a display device attached to the processing device implementing the method. The mother frame, also called the reference frame, corresponds to the frame to which the editing task will be applied. In the following, the terms frame and image will be used interchangeably, as will the terms mother frame and reference frame.
In a second step 20 of capturing an information representative of an editing task, an editing task such as those previously detailed is manually applied by the user on the displayed mother frame and captured through an input interface. Variants of the editing tasks (also called editing tools) are hereinafter detailed along with their respective propagation modes for multi-frame image editing. In a variant particularly interesting in the scope of the disclosed method, the frame editing task modifies a piece of information related to at least a pixel of the displayed mother frame. The modification of the video sequence according to the modification of the mother frame thus requires a pixel-wise propagation of the modified mother frame.
In a third step 30 of displaying child frames with temporally propagated information, at least one frame among the frames of the video sequence is chosen by a user as a child frame. Advantageously, the child frames are temporally distributed in the video sequence. That is, a child frame comprises any frame of the video sequence, advantageously temporally distant from the mother frame. The child frames are also displayed on the display device attached to the processing device implementing the method. The pixels in a child frame corresponding to the pixels modified in the mother frame are modified accordingly. To that end, the pixels in the child frame corresponding to the modified pixels in the mother frame are determined through dense motion fields between the reference frame and the child frames. In addition to the motion field attached to a given frame and linking it to another one, an occlusion mask indicates the pixels in the current frame that are occluded in the other one. Advantageously, the occlusion mask deactivates the temporal propagation when a pixel of the mother frame is occluded in a child frame, or when a pixel of the child frame is occluded in the mother frame, according to the variant of the propagation model of the editing task.
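The pixel-wise forward propagation of this third step, combined with the occlusion-mask deactivation, can be sketched as follows (function and variable names, the nearest-pixel rounding and the toy data are illustrative assumptions, not the claimed implementation):

```python
import numpy as np

def propagate_edit(child, edited_pixels, flow_from_ref, occlusion):
    """Copy each edited mother pixel's new color into the corresponding
    child pixel.  `edited_pixels` maps (x, y) in the mother frame to a new
    color; `flow_from_ref` (shape (H, W, 2)) gives each mother pixel's
    displacement toward the child; `occlusion[y, x]` is True where the
    mother pixel is occluded in the child.  Illustrative sketch only."""
    out = child.copy()
    for (x, y), color in edited_pixels.items():
        if occlusion[y, x]:
            continue  # occluded in the child: the information is not propagated
        dx, dy = flow_from_ref[y, x]
        cx, cy = int(round(x + dx)), int(round(y + dy))
        if 0 <= cx < out.shape[1] and 0 <= cy < out.shape[0]:
            out[cy, cx] = color
    return out

child = np.zeros((3, 3))
flow = np.tile(np.array([1.0, 0.0]), (3, 3, 1))  # uniform motion: 1 right
occl = np.zeros((3, 3), dtype=bool)
occl[0, 0] = True                      # mother pixel (0, 0) occluded in child
edited = {(0, 0): 9.0, (1, 1): 7.0}    # user paints two mother pixels
result = propagate_edit(child, edited, flow, occl)
```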
The steps of the method are advantageously performed in parallel; that is, the mother frame and child frames are displayed simultaneously, and once a user enters a modification on the mother frame, the propagation of the modification to the displayed child frames is applied instantaneously. In a variant, the steps of the method are performed sequentially; that is, the mother frame and child frames are displayed together, and once a user has entered a modification on the mother frame, the propagation of the modification to the displayed child frames is applied only after the user enters a command for propagating the modification.
In a refinement, the propagation is controlled by a user not only for the displayed child frames but also for all the frames of the video sequence. This embodiment is particularly advantageous when the processing of the propagation is time-consuming. Thus it can be preferable to apply the editing task first on the fly to the displayed child frames and then, after validation by the user, to propagate it to all the other frames. Advantageously, the method comprises a further step of rendering the video sequence as a whole, wherein the video sequence includes the propagated information relative to the editing task.
According to a variant, any frame of the video sequence is selected by a user as the reference frame or as a child frame during a same editing task. In other words, the modification is applied to any of the images and the change is automatically propagated to the other frames of the video sequence. Thus a user editing a first mother frame may change the mother frame by giving focus to one of the displayed child frames through the input interface and committing the focused child frame as the new mother frame. When the user gives focus to such an image, it momentarily becomes the reference. This feature raises the technical issue of backward and forward propagation of modifications between frames of the video sequence. The conflicts that may occur when applying different changes to different images can be resolved in different ways:
Other conflicts may occur when editing tasks are applied to multiple frames. When propagating such editing tasks to all the frames of the video sequence, one can wonder which frames among the multiple edited frames serve as a reference for the remaining frames of the video. This issue is particularly significant in the variant wherein the propagation to the rest of the video is performed off-line after a user validation. The person skilled in the art will appreciate that various embodiments are compatible with the invention: the latest edited frame serves as the reference frame for all the frames of the sequence, or the closest edited frame serves as the reference frame for a given frame of the sequence. Advantageously, the various embodiments regarding the choice of the reference frame for a propagation based on a dense motion field are controlled by a user.
According to other advantageous characteristics, variant embodiments of the editing tasks (also called editing tools) are described. Each editing task is associated with a multi-frame propagation mode.
Images of the sequence are linked to the reference image by means of dense motion fields, computed for example using the multi-step flow method. Advantageously, a long-distance motion and correspondence field estimator is used. This kind of estimator is well adapted to cope with temporal occlusions, illumination variations and with minimal displacement drift error. In a variant, those motion fields are assumed to be pre-computed by another system or algorithm.
The way changes are automatically propagated from the reference frame to the rest can be done in several ways:
The skilled person will also appreciate that the method can be implemented quite easily, without the need for special equipment, by devices such as PCs, laptops, tablets, PDAs or mobile phones, with or without a graphics processing unit. According to different variants, features described for the method are implemented in a software module or in a hardware module.
Each feature disclosed in the description and (where appropriate) the claims and drawings may be provided independently or in any appropriate combination. Features described as being implemented in software may also be implemented in hardware, and vice versa. Reference numerals appearing in the claims are by way of illustration only and shall have no limiting effect on the scope of the claims.
In another aspect of the invention, the program instructions may be provided to the device 400 via any suitable computer-readable storage medium. A computer readable storage medium can take the form of a computer readable program product embodied in one or more computer readable medium(s) and having computer readable program code embodied thereon that is executable by a computer. A computer readable storage medium as used herein is considered a non-transitory storage medium given the inherent capability to store the information therein as well as the inherent capability to provide retrieval of the information therefrom. A computer readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. It is to be appreciated that the following, while providing more specific examples of computer readable storage mediums to which the present principles can be applied, is merely an illustrative and not exhaustive listing as is readily appreciated by one of ordinary skill in the art: a portable computer diskette; a hard disk; a random access memory (RAM); a read-only memory (ROM); an erasable programmable read-only memory (EPROM or Flash memory); a portable compact disc read-only memory (CD-ROM); an optical storage device; a magnetic storage device; or any suitable combination of the foregoing.
Naturally, the invention is not limited to the embodiments previously described. In particular, although various embodiments of editing tools along with their multi-frame propagation models are described, the invention is not limited to the described tools. In particular, the person skilled in the art could easily generalize, from the described embodiments, a propagation model for other editing tools known in photo editing.
Number | Date | Country | Kind
---|---|---|---
13305699.4 | May 2013 | EP | regional
13305951.9 | Jul 2013 | EP | regional

Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/EP2014/060738 | 5/23/2014 | WO | 00