This application claims priority to the EP Application number 16306425.6, filed Oct. 28, 2016, which is herein incorporated by reference in its entirety and for all purposes.
The present disclosure relates to the field of video processing. More specifically, the present disclosure relates to the deblurring of video. More particularly, the methods and devices proposed in the present disclosure are adapted for deblurring User Generated Content (UGC) videos such as hand-held camera videos.
The present section is intended to introduce the reader to various aspects of art, which may be related to various aspects of the present disclosure that are described and/or claimed below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present disclosure. Accordingly, it should be understood that these statements are to be read in this light, and not as admissions of prior art.
Videos captured by cameras often contain significant camera shake, causing some frames to be blurry. This is especially the case for hand-held camera videos and user generated content videos. Blur is thus a main issue of low-quality UGC when displayed on a good screen. UGC is usually captured by smartphones or sportcams. A good screen is, for example, a 4K TV screen.
Several techniques have already been investigated for deblurring such video content: using a correspondence image, or using sharp regions of close frames.
For example, deblurring by example using correspondence proposes to deblur an image thanks to a look-alike sharp reference image. The main idea of this prior art technique is to estimate a blur kernel and apply blind deconvolution. The main problem of this method is that it leads to the typical artifacts of deconvolution. Additionally, the blur kernel estimation is only roughly local (the image is for example divided into 3×4 tiles), and the blur variation across the image is forced to be smooth.
Another technique consists in selecting sharp regions in the video, and using these regions to restore blurry regions of the same content in nearby frames. This method is for example described in the article “Video Deblurring for Hand-held Cameras Using Patch-based Synthesis”. However, this technique mainly works for hand-shake blur, since it only estimates a parametric, homography-based motion for each frame as an approximation of the real motion. Even though this technique does not apply a deconvolution, it still needs to estimate the blur locally in order to look for similar patches between the blurry patch and the sharp region convolved with the estimated blur. Another problem is that a patch-based texture synthesis approach is used to copy the estimated deblurred pixels into the result frame.
The proposed technique reduces the drawbacks of the prior art. More specifically, the proposed technique does not need extensive calculation.
One embodiment of the described general aspects is a method for deblurring a frame (FC) of a video, the video comprising a plurality of frames (F0 . . . FX). The method comprises obtaining (10), from the plurality of frames (F0 . . . FX), a set of neighboring frames of the current frame wherein a global score of sharpness is greater than a predetermined sharpness threshold, called set of selected frames (FS0 . . . FSX). The method further comprises, for at least one of the frames of the set of selected frames (FS0, . . . FSX) and for the current frame (FC), generating (20) a local blur map, delivering a local blur map of the at least one frame (LBM FS0 . . . LBM FSX) and a local blur map of the current frame (LBMFC) and further comprises performing (30) a local warping of the at least one frame of the set of selected frames (FS0, . . . FSX) and of the local blur map (LBM FS0 . . . LBM FSX) associated with the at least one frame as a function of a local motion estimation between the current frame (FC) and the at least one frame of the set of selected frames (FS0, . . . FSX), delivering at least one locally warped frame (LWFS0, . . . LWFSX) and an associated locally warped blur map (LWBM FS0 . . . LWBM FSX). The method further comprises performing (40) a weighted aggregation of a part of the at least one locally warped frame (LWFS0, . . . LWFSX) and a corresponding part of the current frame (FC), based on the at least one locally warped blur map and the local blur map of the current frame (LBMFC).
Another embodiment of the described general aspects is an apparatus for deblurring a frame (FC) of a video, the video comprising a plurality of frames (F0 . . . FX), said apparatus comprising at least one processor and memory, wherein the at least one processor is configured to:
obtain (10), from the plurality of frames (F0 . . . FX), a set of neighboring frames of the current frame wherein a global score of sharpness is greater than a predetermined sharpness threshold, called set of selected frames (FS0 . . . FSX); for at least one of the frames of the set of selected frames (FS0, . . . FSX) and for the current frame (FC), generate (20) a local blur map, delivering a local blur map of the at least one frame (LBM FS0 . . . LBM FSX) and a local blur map of the current frame (LBMFC); perform (30) a local warping of the at least one frame of the set of selected frames (FS0, . . . FSX) and of the local blur map (LBM FS0 . . . LBM FSX) associated with the at least one frame as a function of a local motion estimation between the current frame (FC) and the at least one frame of the set of selected frames (FS0, . . . FSX), providing at least one locally warped frame (LWFS0, . . . LWFSX) and an associated locally warped blur map (LWBM FS0 . . . LWBM FSX); perform (40) a weighted aggregation of a part of the at least one locally warped frame (LWFS0, . . . LWFSX) and a corresponding part of the current frame (FC), based on the at least one locally warped blur map and the local blur map of the current frame (LBMFC).
A non-transitory processor readable medium having stored thereon such a deblurred video is also disclosed.
According to one implementation, the different steps of the method for deblurring a video as described here above are implemented by one or more software programs or software module programs comprising software instructions intended for execution by a data processor of an apparatus for deblurring a video, these software instructions being designed to command the execution of the different steps of the methods according to the present principles.
A computer program is also disclosed that is capable of being executed by a computer or by a data processor, this program comprising instructions to command the execution of the steps of a method for deblurring a video as mentioned here above.
This program can use any programming language whatsoever and be in the form of source code, object code or intermediate code between source code and object code, such as in a partially compiled form or any other desirable form whatsoever.
The information carrier can be any entity or apparatus whatsoever capable of storing the program. For example, the carrier can comprise a storage means such as a ROM, for example a CD ROM or a microelectronic circuit ROM or a magnetic recording means, for example a floppy disk or a hard disk drive.
Again, the information carrier can be a transmissible carrier such as an electrical or optical signal which can be conveyed via an electrical or optical cable, by radio or by other means. The program according to the present principles can be especially uploaded to an Internet type network.
As an alternative, the information carrier can be an integrated circuit into which the program is incorporated, the circuit being adapted to executing or to being used in the execution of the methods in question.
According to one embodiment, the methods/apparatus may be implemented by means of software and/or hardware components. In this respect, the term “module” or “unit” can correspond in this document equally well to a software component and to a hardware component or to a set of hardware and software components.
A software component corresponds to one or more computer programs, one or more sub-programs of a program or more generally to any element of a program or a piece of software capable of implementing a function or a set of functions as described here below for the module concerned. Such a software component is executed by a data processor of a physical entity (terminal, server, etc.) and is capable of accessing hardware resources of this physical entity (memories, recording media, communications buses, input/output electronic boards, user interfaces, etc.).
In the same way, a hardware component corresponds to any element of a hardware unit capable of implementing a function or a set of functions as described here below for the module concerned. It can be a programmable hardware component or a component with an integrated processor for the execution of software, for example an integrated circuit, a smartcard, a memory card, an electronic board for the execution of firmware, etc.
According to the disclosure, a method for deblurring a frame of a video is proposed. The method may be used for every frame of a video, as long as the frame comprises blurred parts. This method uses several other frames of the video, which are close to the frame to deblur, by extracting useful information from these frames and inserting a part of the useful information in the current frame to realize the deblurring. The frames from which the information is extracted are selected in view of a global index of sharpness, which avoids trying to obtain useful information from frames which are not sufficiently sharp. Once these globally sharp frames are extracted from the video, a specific calculation is done on these globally sharp frames. A key part of the disclosure is to realize a local motion estimation between the globally sharp frames and the current frame, in order to realize a kind of local realignment of the frames on the basis of the current frame.
Indeed, a discrete evaluation of the motion between the current frame and a previous and/or next frame is proposed to ease and enhance the obtaining of a patch, said patch being used to “deblur” a portion of the current frame. Thanks to the proposed technique, the frame is deblurred, in several locations, by using several patches which are constructed using the globally sharp frames. Thus, in the proposed method there is no model assumption about the frame or about the blur itself, and the deblurring is truly local.
Thus, the way the deblurring is done allows not only obtaining a sharp frame but also taking advantage of the best parts of the neighboring frames of the video to deblur specific portions of the current frame. This allows obtaining a deblurred frame with the information which is the most accurate in view of the blur of each part of the frame (to deblur) and in view of the local motion of this part relative to the other frames. Since no global model is applied on the frame itself, the method allows tuning the deblurring more accurately than prior art solutions, which are mainly based on a global blur model. Several embodiments of the proposed method may be implemented.
For example, the way the set of selected frames is obtained may vary between embodiments. In a first embodiment, some neighbor frames are preselected and a global index of sharpness is calculated on these preselected frames. This is explained in detail below. In another embodiment, some neighbor frames can be preselected in view of additional information attached to the frames. For example, additional information may be linked to the way the video has been shot. Indeed, user handheld devices such as smartphones are often equipped with an accelerometer. The accelerometer records can be saved along with the video while the video is being shot. From the accelerometer records, one can estimate the movement of the device and therefore determine whether a video frame is blurred or may be blurred. This information may be used twice: the first use is to easily track the frames which need to be deblurred, and the second use is to exclude the blurred frames from the set of neighbor frames on which the global score of sharpness is calculated. The calculating resources of the handheld device are therefore saved.
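As an illustration of the accelerometer-based preselection, the following is a minimal sketch, not the disclosed implementation. The function name `likely_blurred_frames`, the sample layout (timestamp plus three acceleration components with gravity already removed), the time window and the motion threshold are all assumptions introduced for this example.

```python
import math


def likely_blurred_frames(accel_samples, frame_times, motion_threshold, window=0.02):
    """Return the indices of frames captured during strong device motion.

    accel_samples: list of (timestamp_s, ax, ay, az); gravity is assumed
                   to be already removed (assumption of this sketch).
    frame_times:   list of frame capture timestamps, in seconds.
    """
    flagged = []
    for idx, t in enumerate(frame_times):
        # Magnitudes of the accelerometer samples recorded close to the frame.
        mags = [
            math.sqrt(ax * ax + ay * ay + az * az)
            for (ts, ax, ay, az) in accel_samples
            if abs(ts - t) <= window
        ]
        # Hypothetical rule: strong acceleration suggests a blurred frame.
        if mags and max(mags) > motion_threshold:
            flagged.append(idx)
    return flagged
```

The flagged indices could serve both uses described above: they point to frames that are candidates for deblurring, and they can be excluded from the neighbor frames on which the global score of sharpness is calculated.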
In the following, a specific embodiment is disclosed.
A video is composed of a plurality of frames comprising picture elements. A video made by a handheld device normally has a frame rate of 25 to 30 frames per second. In other words, one second of a user generated video normally contains 25 to 30 frames. While taking the video, the handheld device is often in an unstable state due to the movement of the user who holds the device. Some frames may be blurred if the device was not held still, while other frames may be less blurred if the motion of the device was less intense. In the less blurred frames, the picture elements may be identical or similar to the picture elements in the blurred frames, in particular when the frames are neighboring frames within a short period of time (less than one second, for example).
In this section, a specific embodiment of the invention for deblurring a frame of a video is disclosed in detail in relation with
In this step, only the frames which are globally sharper than the current one can be kept. It is preferred to keep at least the two sharpest frames. In another embodiment, a predetermined threshold value of the global score of sharpness may be defined. This threshold value should be greater than the global score of sharpness of the current frame. The frames having global scores of sharpness higher than the threshold value can be kept. The quality and quantity of selected frames can therefore be adjusted by adjusting the threshold value. For example, when a deblurring device has a low calculation capacity, the threshold value can be set to a relatively high value so as to reduce the quantity of selected frames.
In this embodiment, the use of a global blur measure provides general information which helps in keeping or rejecting a neighbor frame of the current frame. How this measure is obtained is explained in detail below. A key point is to use a method which does not require a large amount of resources, in order to keep resources for more demanding tasks. Once the set of selected frames is obtained, the local blur maps are calculated and the local warping is done. In this embodiment, the following steps are implemented:
Once the previous steps have been processed, one gets locally realigned frames. Since the motion estimation is done locally, no global motion model has been applied, so a precise sub-motion (for a sub-region or a portion of the frame) is used instead. Then the weighted aggregation of the current frame and the locally warped selected frames can be processed. This is done by using the current frame, the warped selected frames, the local blur map of the current frame and the warped local blur maps of the selected frames. This weighted aggregation delivers a deblurred current frame.
The weighted aggregation is carried out on the basis of the pixels of the frames. However, as explained herein above, a patch processing is done. That means that the calculations take into account the portion of the frame around the pixel, for obtaining better results. The size of the patch (i.e. the size of the portion) is defined by a parameter. The deblurred current frame can be partially deblurred or totally deblurred. For a partially deblurred frame, part of its pixels result from the weighted aggregation. For a totally deblurred frame, all of its pixels result from the weighted aggregation.
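The selection rule of this step can be sketched as follows. This is only an illustration under assumptions: the function name `select_sharp_neighbors`, the `radius` parameter defining the neighborhood, and the fallback that keeps the sharpest frames when too few pass the threshold are hypothetical and not taken from the disclosure.

```python
def select_sharp_neighbors(sharpness, current, radius, threshold=None, min_keep=2):
    """Return the indices of neighbor frames kept for deblurring frame `current`.

    sharpness: list of global sharpness scores, one per frame
               (higher means sharper).
    threshold: optional predetermined threshold; it is raised to the score of
               the current frame if it is lower than that score.
    """
    lo = max(0, current - radius)
    hi = min(len(sharpness), current + radius + 1)
    neighbors = [i for i in range(lo, hi) if i != current]

    # Only frames that are globally sharper than the current one can be kept.
    limit = sharpness[current]
    if threshold is not None:
        limit = max(limit, threshold)
    kept = [i for i in neighbors if sharpness[i] > limit]

    if len(kept) < min_keep:
        # Hypothetical fallback: keep the sharpest frames among those
        # that are still sharper than the current frame.
        sharper = [i for i in neighbors if sharpness[i] > sharpness[current]]
        sharper.sort(key=lambda i: sharpness[i], reverse=True)
        kept = sharper[:min_keep]
    return sorted(kept)
```

Raising `threshold` reduces the number of selected frames, which matches the trade-off described above for devices with a low calculation capacity.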
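The general shape of such a per-pixel weighted aggregation can be sketched as follows. The weight used here is purely hypothetical and is not the weight of the embodiment: it simply favors candidate pixels that are less blurred than the current pixel and whose surrounding patch is close to the current patch. The names `aggregate_pixel` and `h`, and the convention that a higher blur value means a blurrier pixel, are assumptions of this sketch.

```python
import math


def aggregate_pixel(u_r, b_r, candidates, h=10.0):
    """Aggregate one pixel of the current frame with warped candidates.

    u_r:        value of the pixel in the current frame.
    b_r:        local blur of the current pixel (higher = blurrier, assumption).
    candidates: list of (u_n, b_n, d_n) taken from the locally warped
                selected frames: pixel value, warped local blur, and
                distance between the candidate patch and the current patch.
    h:          hypothetical decay parameter for the patch distance.
    """
    num, den = float(u_r), 1.0  # the current frame contributes with weight 1
    for u_n, b_n, d_n in candidates:
        # Hypothetical weight: zero unless the candidate is sharper than the
        # current pixel, and decreasing with the patch distance.
        w = max(0.0, b_r - b_n) * math.exp(-d_n / h)
        num += w * u_n
        den += w
    return num / den
```

With this choice, a pixel that has no sharper candidate is left unchanged, which corresponds to a partially deblurred frame.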
The process for aggregating comprises:
More specifically, in this embodiment, a deblurred pixel ũr(i,j) in a deblurred frame can be computed by a weighted aggregation operation according to the below equation (equation 1):
where:
In the present embodiment:
where:
The previous equation (equation 1) is applied on the blurred pixels of the current frame. Before that, as explained before, distances and blur measures of the patch have to be calculated. In this embodiment, the blur measures br and bn can be computed by the following equations (equation 2, equation 3):
where:
The Euclidean distance dn can be computed by the following equation (equation 4):
where:
The goal of this procedure is to decide whether a neighbor frame has to be kept in the set of selected frames or not. It is then possible to select only frames which have a good index of sharpness (i.e. frames which are not too blurry). For computing this, a specific procedure is processed. In order to improve the speed of the deblur process, the integral image u is processed. The procedure is done in the horizontal and/or vertical directions, to get two measures: Bh and/or Bv. The final measure is simply (when the two measures are calculated):
B=max(Bh,Bv) equation 5
Let u denote the original input image. First of all, the image is blurred in the chosen direction, to get a blurry image ũ:
Then the gradient is computed in both u and ũ in the chosen direction:
∀(i,j),Du(i,j)=|u(i,j+1)−u(i,j−1)|, equation 7
∀(i,j),Dũ(i,j)=|ũ(i,j+1)−ũ(i,j−1)|. equation 8
In a less precise variant, it may be possible to calculate Du(i,j)=|u(i,j)−u(i,j−1)| instead. However, this calculation is less accurate for curves. The centered differences of equations 7 and 8 allow better narrowing of the curves.
Then one sums the gradient of the image, Su, and the variation of the gradients, Sv. This variation is evaluated only on the absolute differences which have decreased. Let us denote:
Finally, the result is normalized to the range [0, 1]:
Checking whether Su=0 or not prevents computational problems, for example for flat images. In an embodiment, the local blur metric is computed on the luminance channel, which is basically the average of the three channels and speeds up the calculations.
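The procedure above can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the exact formulas of the embodiment: the re-blurring is assumed to be a 9-tap box filter with clamped borders, the normalization is assumed to be (Su−Sv)/Su, and a direction without any gradient (Su=0) is assumed to return 0. The function names are hypothetical, and no integral image is used here.

```python
def directional_blur(u, taps=9):
    """Blur measure in [0, 1] along the rows of u (higher = blurrier).

    u: image as a list of rows of numbers (for example the luminance).
    """
    half = taps // 2
    s_u = 0.0  # sum of the gradients of the input image
    s_v = 0.0  # sum of the gradient decreases caused by re-blurring
    for row in u:
        n = len(row)
        # Re-blur the row with a box filter (assumed kernel, clamped borders).
        blurred = []
        for j in range(n):
            window = [row[min(max(k, 0), n - 1)]
                      for k in range(j - half, j + half + 1)]
            blurred.append(sum(window) / taps)
        for j in range(1, n - 1):
            d_u = abs(row[j + 1] - row[j - 1])           # as in equation 7
            d_b = abs(blurred[j + 1] - blurred[j - 1])   # as in equation 8
            s_u += d_u
            s_v += max(0.0, d_u - d_b)  # only differences that decreased
    if s_u == 0:
        return 0.0  # flat direction: returned value is an assumption
    return (s_u - s_v) / s_u  # assumed normalization


def global_blur(u):
    """Combine both directions as in equation 5: B = max(Bh, Bv)."""
    transposed = [list(col) for col in zip(*u)]
    return max(directional_blur(u), directional_blur(transposed))
```

A sharp step edge loses most of its gradient when re-blurred and therefore gets a low value, whereas an already smooth ramp barely changes and gets a value close to 1. A global score of sharpness could then be taken as 1−B, which is again an assumption of this sketch.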
In this process, the goal is to evaluate, as precisely as possible, the local blur in a given frame. That means that one tries to identify the blurry portions of the frame. This is done by calculating a Multi-resolution Singular Value (MSV) local blur metric. The Multi-resolution Singular Value (MSV) local blur metric is principally based on the Singular Value Decomposition (SVD) of the image u:
u=Σi λi ei, the sum being taken over 1≤i≤n,
where λi(1≤i≤n) are the singular values in decreasing order and the ei(1≤i≤n) are rank-1 matrices called the eigen-images.
The idea is that the first, most significant eigen-images encode low-frequency shape structures while less significant eigen-images encode the image details. Then, to reconstruct a very blurred image, one needs only very few eigen-images. On the contrary, one needs almost all eigen-images to reconstruct a sharp image.
Furthermore, for a blurred block, the high frequency details are lost much more significantly than its low frequency shape structures. Therefore only the high frequencies of the image are studied, through a Haar wavelet transformation. On these high frequency sub-bands, the metric is the average singular value.
As the metric is local, pixel-wise, the description of the code stands for a patch of size κ×κ around the current pixel. Let us denote by P the current patch.
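The metric described above, the average singular value of the Haar high-frequency sub-bands of a patch P, can be sketched with NumPy as follows. This is an illustration under assumptions: a single Haar decomposition level is used, the patch has even side lengths, and the singular values of the three sub-bands are averaged together. The function names are hypothetical.

```python
import numpy as np


def haar_high_bands(patch):
    """One-level 2D Haar transform; returns the three high-frequency sub-bands."""
    p = np.asarray(patch, dtype=float)
    # The four pixels of every 2x2 block of the patch.
    a, b = p[0::2, 0::2], p[0::2, 1::2]
    c, d = p[1::2, 0::2], p[1::2, 1::2]
    dh = (a - b + c - d) / 2.0  # differences between horizontal neighbors
    dv = (a + b - c - d) / 2.0  # differences between vertical neighbors
    dd = (a - b - c + d) / 2.0  # diagonal differences
    return dh, dv, dd


def msv_patch_metric(patch):
    """Average singular value over the high-frequency sub-bands of a patch.

    A higher value indicates more preserved detail, hence a sharper patch.
    """
    svals = [np.linalg.svd(band, compute_uv=False)
             for band in haar_high_bands(patch)]
    return float(np.mean(np.concatenate(svals)))
```

A flat patch has no high-frequency content and yields 0, while a patch with fine detail yields a larger value.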
Singular Values Decomposition
An SVD is applied on each sub-band Ps to get the K singular values {λsi}i. Then the local metric associated with the patch P is
Metric Final on Each Pixel
As a metric is obtained for a whole patch, it has to be decided with which pixel this measure will be associated. As the Haar decomposition needs a block whose side is a power of two, the patch cannot be centered on one pixel. Two solutions are then possible:
Warping an image from an example is a difficult task, generally based on motion estimation. Simple known methods may be applied, but they are usually used for a global motion. However, as soon as a precise warping is wanted, two main issues arise:
In particular, when the desired results are used for local deblurring, a more precise estimation may be needed. Therefore, a fast local warping is proposed, which does not lead to a motion estimation but provides very good results in terms of warping. The aim of this algorithm is to provide a warped image (or patch) which can be used afterwards in other applications, such as deblurring.
The main idea is to extract key points of interest in both images (current frame and selected frame), associate them, and warp locally around those key points with a very simple motion estimation. Let us denote by u1 the input reference image (current frame) and by u2 the second input image (selected frame). The final result will consist of a locally warped image w2 and a mask of valid pixels m2. The whole algorithm may be summarized as follows:
At the end, a locally warped image is obtained, which is the result of local transformations applied to the second image to fit the reference image.
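The warping step around already matched key points can be sketched as follows. This is a simplified illustration, not the algorithm of the embodiment: key point extraction and association are assumed to have been done beforehand, the local motion around each key point is assumed to be a pure translation, and the square window of half-size `radius` is a hypothetical parameter.

```python
def local_warp(u2, matches, radius=2):
    """Warp u2 locally around matched key points; returns (w2, m2).

    u2:      selected frame, as a list of rows.
    matches: list of ((y1, x1), (y2, x2)) pairs, where (y1, x1) is a key
             point in the current frame u1 and (y2, x2) its match in u2.
    w2:      locally warped image, aligned with u1.
    m2:      mask of valid pixels (True where w2 has been filled).
    """
    height, width = len(u2), len(u2[0])
    w2 = [[0] * width for _ in range(height)]
    m2 = [[False] * width for _ in range(height)]
    for (y1, x1), (y2, x2) in matches:
        dy, dx = y2 - y1, x2 - x1  # local translation (assumed motion model)
        for y in range(max(0, y1 - radius), min(height, y1 + radius + 1)):
            for x in range(max(0, x1 - radius), min(width, x1 + radius + 1)):
                sy, sx = y + dy, x + dx
                inside = 0 <= sy < height and 0 <= sx < width
                # The first match covering a pixel wins (arbitrary choice).
                if inside and not m2[y][x]:
                    w2[y][x] = u2[sy][sx]
                    m2[y][x] = True
    return w2, m2
```

The same function can be applied to the local blur map of the selected frame, so that the warped blur map stays aligned with the warped frame.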
The disclosure also proposes a device for deblurring a video. The device can be specifically designed for deblurring video or any electronic device comprising non-transitory computer readable medium and at least one processor configured by computer readable instructions stored in the non-transitory computer readable medium to implement any method in the disclosure.
According to an embodiment shown in
The CPU controls the entirety of the device by executing a program loaded in the RAM. The CPU also performs various functions by executing a program(s) (or an application(s)) loaded in the RAM.
The RAM stores various sorts of data and/or a program(s).
The ROM also stores various sorts of data and/or a program(s) (Pg).
The storage device, such as a hard disk drive, an SD card, a USB memory and so forth, also stores various sorts of data and/or a program(s).
The device performs the method for deblurring a video as a result of the CPU executing instructions written in a program(s) loaded in the RAM, the program(s) being read out from the ROM or the storage device and loaded in the RAM.
More specifically, the device can be a server, a computer, a pad, a smartphone or a camera.
The disclosure also relates to a computer program product comprising computer executable program code recorded on a computer readable non-transitory storage medium, the computer executable program code when executed, performing the method for deblurring a video. The computer program product can be recorded on a CD, a hard disk, a flash memory or any other suitable computer readable medium. It can also be downloaded from the Internet and installed in a device so as to deblur a video.
One embodiment of the described general aspects is a method 700 for deblurring a frame (FC) of a video, the video comprising a plurality of frames (F0 . . . FX). The method comprises obtaining (10, 710), from the plurality of frames (F0 . . . FX), a set of neighboring frames of the current frame wherein a global score of sharpness is greater than a predetermined sharpness threshold, called set of selected frames (FS0 . . . FSX). The method further comprises, for at least one of the frames of the set of selected frames (FS0, . . . FSX) and for the current frame (FC), generating (20, 720) a local blur map, delivering a local blur map of the at least one frame (LBM FS0 . . . LBM FSX) and a local blur map of the current frame (LBMFC) and further comprises performing (30, 730) a local warping of the at least one frame of the set of selected frames (FS0, . . . FSX) and of the local blur map (LBM FS0 . . . LBM FSX) associated with the at least one frame as a function of a local motion estimation between the current frame (FC) and the at least one frame of the set of selected frames (FS0, . . . FSX), delivering at least one locally warped frame (LWFS0, . . . LWFSX) and an associated locally warped blur map (LWBM FS0 . . . LWBM FSX). The method further comprises performing (40, 740) a weighted aggregation of a part of the at least one locally warped frame (LWFS0, . . . LWFSX) and a corresponding part of the current frame (FC), based on the at least one locally warped blur map and the local blur map of the current frame (LBMFC).
Another embodiment of the described general aspects is an apparatus 800 for deblurring a frame (FC) of a video, the video comprising a plurality of frames (F0 . . . FX), said apparatus comprising at least one processor (810) and memory (820), wherein the at least one processor is configured to:
obtain (10), from the plurality of frames (F0 . . . FX), a set of neighboring frames of the current frame wherein a global score of sharpness is greater than a predetermined sharpness threshold, called set of selected frames (FS0 . . . FSX); for at least one of the frames of the set of selected frames (FS0, . . . FSX) and for the current frame (FC), generate (20) a local blur map, delivering a local blur map of the at least one frame (LBM FS0 . . . LBM FSX) and a local blur map of the current frame (LBMFC); perform (30) a local warping of the at least one frame of the set of selected frames (FS0, . . . FSX) and of the local blur map (LBM FS0 . . . LBM FSX) associated with the at least one frame as a function of a local motion estimation between the current frame (FC) and the at least one frame of the set of selected frames (FS0, . . . FSX), providing at least one locally warped frame (LWFS0, . . . LWFSX) and an associated locally warped blur map (LWBM FS0 . . . LWBM FSX); perform (40) a weighted aggregation of a part of the at least one locally warped frame (LWFS0, . . . LWFSX) and a corresponding part of the current frame (FC), based on the at least one locally warped blur map and the local blur map of the current frame (LBMFC).
Number | Date | Country | Kind |
---|---|---|---|
16306425.6 | Oct 2016 | EP | regional |