METHOD AND DEVICE FOR RESTORING VIDEO

Information

  • Patent Application
  • Publication Number
    20250200729
  • Date Filed
    December 13, 2024
  • Date Published
    June 19, 2025
Abstract
A method and device for restoring a video are disclosed. The method of restoring a video includes obtaining a plurality of blurred images due to the motion of an object captured in the video. The method includes generating a first kernel including degradation information on the plurality of blurred images, optical flow information on a moving direction of the object, and first motion information of the object included in the plurality of blurred images, based on the plurality of blurred images included in the video. The method includes generating dynamic filtering information to filter the plurality of blurred images, based on the plurality of blurred images, the first motion information, the optical flow information, and the first kernel. The method includes restoring a target image to be restored among the plurality of blurred images, based on the dynamic filtering information and the plurality of blurred images.
Description
BACKGROUND
1. Field

One or more embodiments relate to a method and device for restoring a video.


2. Description of Related Art

Video super-resolution (VSR) aims to restore a high-resolution (HR) video from a given low-resolution (LR) video. In various situations, given videos may be blurred or qualitatively deteriorated due to camera shakes or object motions. This is referred to as motion blur.


In the field of video prediction, a dynamic filter may be used to restore an HR video from an LR video with motion blur.


The above description has been possessed or acquired by the inventor(s) in the course of conceiving the present disclosure and is not necessarily an art publicly known before the present application is filed.


SUMMARY

Aspects provide technology for warping a plurality of images included in a video based on optical flow information.


Aspects provide technology for restoring a target image to be restored by filtering warped images obtained by warping the plurality of blurred images in the video.


Aspects provide technology for restoring a target image to be restored based on optical flow information.


However, technical aspects are not limited to the foregoing aspects, and there may be other technical aspects.


According to an aspect, there is provided a method of restoring a video including obtaining a plurality of blurred images due to object motion or camera motion captured in the video. The method may include generating a first kernel including degradation information on the plurality of blurred images, optical flow information on the object motion direction or camera motion direction, and first motion information of the plurality of blurred images included in the video. The method may include generating dynamic filtering information to filter the plurality of blurred images, based on the plurality of blurred images, the first motion information, the optical flow information, and the first kernel.


The generating of the dynamic filtering information may include generating second motion information to adjust the optical flow information, based on the plurality of blurred images and the first motion information. The generating of the dynamic filtering information may include adjusting the optical flow information, based on the second motion information, the optical flow information, and the first kernel. The generating of the dynamic filtering information may include generating distorted information on how distorted each of the remaining images excluding the target image is with respect to the target image, based on the second motion information, the optical flow information, and the first kernel. The dynamic filtering information may include adjusted optical flow information and the distorted information.


The restoring of the target image may include restoring the target image by filtering the plurality of blurred images, based on the adjusted optical flow information and the distorted information.


The restoring of the target image may further include warping each of the plurality of blurred images, based on the adjusted optical flow information. The restoring of the target image may further include generating a second kernel to filter warped images obtained by warping each of the plurality of blurred images, based on the distorted information. The restoring of the target image may further include restoring the target image by filtering the warped images using the second kernel.


The restoring of the target image by filtering the warped images using the second kernel may include restoring low-frequency components of the target image.


The method may further include restoring high-frequency components of the target image, based on the distorted information.


According to another aspect, there is provided an electronic device including a memory including instructions and a processor electrically connected to the memory and configured to execute the instructions. The instructions, when executed by the processor, may cause the electronic device to obtain a plurality of blurred images due to object motion or camera motion captured in a video. The instructions, when executed by the processor, may cause the electronic device to generate a first kernel including degradation information on the plurality of blurred images, optical flow information on object motion directions, and first motion information of each of the plurality of blurred images, based on the plurality of blurred images included in the video. The instructions, when executed by the processor, may cause the electronic device to generate dynamic filtering information to filter the plurality of blurred images, based on the plurality of blurred images, the first motion information, the optical flow information, and the first kernel. The instructions, when executed by the processor, may cause the electronic device to restore a target image to be restored among the plurality of blurred images, based on the dynamic filtering information and the plurality of blurred images.


The instructions, when executed by the processor, may cause the electronic device to generate second motion information to adjust the optical flow information, based on the plurality of blurred images and the first motion information. The instructions, when executed by the processor, may cause the electronic device to adjust the optical flow information, based on the second motion information, the optical flow information, and the first kernel. The instructions, when executed by the processor, may cause the electronic device to generate distorted information on how distorted each of the remaining images excluding the target image is with respect to the target image, based on the second motion information, the optical flow information, and the first kernel.


The instructions, when executed by the processor, may cause the electronic device to restore the target image by filtering the plurality of blurred images, based on adjusted optical flow information and the distorted information.


The instructions, when executed by the processor, may cause the electronic device to warp each of the plurality of blurred images, based on the adjusted optical flow information. The instructions, when executed by the processor, may cause the electronic device to generate a second kernel to filter warped images obtained by warping each of the plurality of blurred images, based on the distorted information. The instructions, when executed by the processor, may cause the electronic device to restore the target image by filtering the warped images using the second kernel.


The instructions, when executed by the processor, may cause the electronic device to restore low-frequency components of the target image.


The instructions, when executed by the processor, may cause the electronic device to restore high-frequency components of the target image, based on the distorted information.





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 is a block diagram illustrating a device for restoring a video, according to an embodiment.



FIG. 2 is a diagram illustrating the internal structure of a first neural network for generating a degraded image due to an optical flow, according to an embodiment.



FIG. 3 is a diagram illustrating the internal structure of an optical flow estimator included in a first neural network, according to an embodiment.



FIG. 4 is a diagram illustrating the internal structure of a second neural network for restoring a video, according to an embodiment.



FIG. 5 is a diagram illustrating the internal structure of an optical flow estimator included in a second neural network, according to an embodiment.



FIG. 6 is a diagram illustrating the structure of a flow-guided dynamic filter (FGDF), according to an embodiment.



FIG. 7 is a flowchart illustrating a method of restoring a video, according to an embodiment.



FIG. 8 is a block diagram illustrating a video restoration unit according to an embodiment.





DETAILED DESCRIPTION

The following detailed structural or functional description is provided as an example only and various alterations and modifications may be made to the examples. Here, the embodiments are not construed as limited to the disclosure and should be understood to include all changes, equivalents, and replacements within the idea and the technical scope of the disclosure.


Terms, such as first, second, and the like, may be used herein to describe various components. Each of these terminologies is not used to define an essence, order or sequence of a corresponding component but used merely to distinguish the corresponding component from other component(s). For example, a first component may be referred to as a second component, and similarly the second component may also be referred to as the first component.


It should be noted that if it is described that one component is “connected”, “coupled”, or “joined” to another component, a third component may be “connected”, “coupled”, and “joined” between the first and second components, although the first component may be directly connected, coupled, or joined to the second component.


As used herein, the singular forms “a”, “an”, and “the” include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises/comprising” and/or “includes/including”, when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.


Unless otherwise defined, all terms, including technical and scientific terms, used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure pertains. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.


Hereinafter, embodiments are described in detail with reference to the accompanying drawings. When describing the embodiments with reference to the accompanying drawings, like reference numerals refer to like elements and a repeated description related thereto will be omitted.



FIG. 1 is a block diagram illustrating a device for restoring a video, according to an embodiment.


Referring to FIG. 1, according to an embodiment, a video restoration unit 100 may include a first neural network 140 and/or a second neural network 170.


The video restoration unit 100 may be a unit for restoring a video. The video restoration unit 100 may receive a video 110 to restore a high-quality (or high-resolution) video from a low-quality (or low-resolution) video. The video 110 may include a plurality of blurred images. The plurality of blurred images may include a series of low-resolution images. For example, the plurality of blurred images may be images including motion blur. The video restoration unit 100 may perform super-resolution on the plurality of blurred images included in the video 110.


The super-resolution refers to video processing that transforms an input image with low spatial resolution into an output image with high spatial resolution. The video restoration unit 100 may perform super-resolution on the video 110 or the plurality of blurred images included in the video 110 by using a neural network (e.g., the first neural network 140 or the second neural network 170).


The video 110 may include a plurality of temporal portions. One temporal portion corresponds to a group of frames spanning a certain duration, that is, a video clip that is typically part of a longer recording. The video may include scenes of multiple temporal durations over time. The first neural network 140 may receive the video 110. The video 110 input to the first neural network 140 may include the plurality of blurred images due to the motions of the captured object and/or a plurality of sharp images corresponding to the plurality of blurred images. The plurality of blurred images may be images degraded by the motions of the captured object, and the plurality of sharp images may be images without motion blur. The plurality of blurred images and the plurality of sharp images may each correspond to the same point in time.


The first neural network 140 may receive the video 110 and may generate a degraded image due to an optical flow. The first neural network 140 may receive the video 110 and may be trained to generate a degraded image due to an optical flow. The internal structure of the first neural network 140 and the training to generate a degraded image are described in detail with reference to FIG. 2.


The video restoration unit 100 may be pre-trained such that the first neural network 140 generates a degraded image. The first neural network 140 pre-trained to generate a degraded image may generate a first kernel (e.g., a first kernel KD of FIG. 2) including degradation information on blurred images X, optical flow information (e.g., optical flow information fD,M of FIG. 2), and/or first motion information (e.g., first motion information FD,M of FIG. 2) of the plurality of the blurred images in the video 110. The video restoration unit 100 may transmit the first motion information FD,M, the optical flow information fD,M, and/or the first kernel KD generated from the first neural network 140 to the second neural network 170. The video restoration unit 100 may use the first motion information FD,M, the optical flow information fD,M, and/or the first kernel KD generated from the first neural network 140 to train the second neural network 170.


The video 110 input to the second neural network 170 may be the same video input to the first neural network 140. The second neural network 170 may receive the first motion information FD,M, the optical flow information fD,M, and/or the first kernel KD generated from the first neural network 140 and the video 110 and may restore the video 110. The internal structure of the second neural network 170 and the training to restore a video are described in detail with reference to FIG. 4.


A neural network (or an artificial neural network) may include a statistical learning algorithm that mimics biological nerves in cognitive science and machine learning. Neural networks may refer to general models having problem-solving capabilities, in which artificial neurons (nodes) forming a network through synapse coupling change the intensity of the connection between synapses through training.


The neural network (e.g., the first neural network 140 and/or the second neural network 170) may include a deep neural network. The neural network may include a convolutional neural network (CNN), a recurrent neural network (RNN), a perceptron, a multilayer perceptron, a feed forward (FF), a radial basis network (RBF), a deep feed forward (DFF), a long short-term memory (LSTM), a gated recurrent unit (GRU), an auto encoder (AE), a variational auto-encoder (VAE), a denoising auto-encoder (DAE), a sparse auto-encoder (SAE), a Markov chain (MC), a Hopfield network (HN), a Boltzmann machine (BM), a restricted Boltzmann machine (RBM), a deep belief network (DBN), a deep convolutional network (DCN), a deconvolutional network (DN), a deep convolutional inverse graphics network (DCIGN), a generative adversarial network (GAN), a liquid state machine (LSM), an extreme learning machine (ELM), an echo state network (ESN), a deep residual network (DRN), a differentiable neural computer (DNC), a neural Turing machine (NTM), a capsule network (CN), a Kohonen network (KN), a visual geometry group (VGG) network, and an attention network (AN).



FIG. 2 is a diagram illustrating the internal structure of a first neural network for generating a degraded image due to an optical flow, according to an embodiment.


Referring to FIG. 2, according to an embodiment, the first neural network 140 may be pre-trained to train a second neural network (e.g., the second neural network 170 of FIG. 1). The first neural network 140 may simultaneously perform video super-resolution and deblurring (VSRDB). The first neural network 140 may be a part of a degradation learning network of FMA-Net, which is the architecture of a VSRDB framework. The first neural network 140 may predict motion-aware spatiotemporally variant degradation.


The first neural network 140 may receive a plurality of blurred images X included in the video 110. The plurality of blurred images X may be expressed by X = {X_{c−N:c+N}} ∈ ℝ^{T×H×W×3}, where T = 2N+1 is the number of the plurality of blurred images X input to the first neural network 140. A target image Xc 111 among the plurality of blurred images X may be the (N+1)th blurred image of the series of 2N+1 input blurred images. The target image Xc may be the center blurred image among the series of input images. The target image Xc may be the blurred image to be restored. The blurred images Xc−N may include blurred images temporally prior to the target image Xc. The blurred images Xc+N may include blurred images temporally later than the target image Xc. The blurred images excluding the target image Xc may be referred to as the remaining images Xc−N and Xc+N.


The first neural network 140 may include a feature extractor 210, an optical flow estimator 250, and/or a degraded image generator 270.


The feature extractor 210 may obtain initial motion information FD,0 of the video 110, based on the video 110 input to the first neural network 140. For example, the feature extractor 210 may extract the initial motion information FD,0 of the blurred images X. The feature extractor 210 may include a three-dimensional residual-in-residual dense block (3D RRDB). Immediately obtaining a degraded image {circumflex over (X)}c 271 from the initial motion information FD,0 may be unstable and may incur high computational cost. The feature extractor 210 may transmit (or relay) the extracted initial motion information FD,0 to the optical flow estimator 250.


The optical flow estimator 250 may generate optical flow information fD,M on an object motion direction or camera motion direction included in the blurred images X of the video 110, based on the received (or relayed) initial motion information FD,0 from the feature extractor 210.


The optical flow estimator 250 may include a plurality of feature refinement with multi-attention (FRMA) blocks FRMAD,1 to FRMAD,M. The optical flow estimator 250 may input the initial motion information FD,0 to a first FRMA block FRMAD,1 among the plurality of FRMA blocks FRMAD,1 to FRMAD,M. For example, the optical flow estimator 250 may include M FRMA blocks. The FRMA blocks FRMAD,1 to FRMAD,M may iteratively refine initial optical flow information fD,0 and/or the initial motion information FD,0 in a residual learning manner. The FRMA blocks FRMAD,1 to FRMAD,M may learn the initial optical flow information fD,0 and an occlusion mask corresponding to the initial optical flow information fD,0.


The video restoration unit 100 may acquire flow diversity by learning the FRMA blocks FRMAD,1 to FRMAD,M based on the initial optical flow information fD,0 and the occlusion mask corresponding to the initial optical flow information fD,0. The video restoration unit 100 may learn the one-to-many relationship between the pixel of the target image XC and the pixels of the remaining images XC−N and XC+N, based on the initial optical flow information fD,0 and the occlusion mask corresponding to the initial optical flow information fD,0. The video restoration unit 100 may learn the relationship between the blurred images X where their pixel information has been spread due to light accumulation, based on the initial optical flow information fD,0 and the occlusion mask corresponding to the initial optical flow information fD,0. The internal structure of the FRMA blocks FRMAD,1 to FRMAD,M and the process of the initial optical flow information fD,0 and/or the initial motion information FD,0 being iteratively refined by the FRMA blocks FRMAD,1 to FRMAD,M are described with reference to FIG. 3.


The optical flow estimator 250 may acquire the optical flow information fD,M, the first motion information FD,M, and/or distorted information FWD,M, based on the initial optical flow information fD,0, the initial motion information FD,0, and initial distorted information FWD,0 on how distorted the target image XC is with respect to each of the remaining images XC−N and XC+N. The optical flow estimator 250 may transmit (or relay) the acquired optical flow information fD,M, the acquired first motion information FD,M, and/or the acquired distorted information FWD,M to the degraded image generator 270.


The degraded image generator 270 may generate the first kernel KD based on the received (or relayed) distorted information FWD,M. The first kernel KD may be guided by the optical flow information fD,M and may be dynamically generated to be motion-aware pixel-wise. Because it is guided by the optical flow information fD,M, the first kernel KD may effectively process the motion of the object included in the video even with a small kernel size. The degraded image generator 270 may acquire filtering results by using the first kernel KD through Equation 1.










y(p) = Σ_{t=−N}^{+N} Σ_{k=1}^{n²} F^p_{c+t}(p_k) · x′_{c+t}(p + p_k)        [Equation 1]







Here, y(p) denotes an output value at a position p; the range from t=−N to t=+N spans the 2N+1 input images; the range from k=1 to k=n² spans the n×n kernel; F^p_{c+t}(p_k) denotes a kernel value for the (c+t)th input image; x′_{c+t} is W(x_{c+t}, f_{c+t}); and p+p_k denotes the position offset from the position p by p_k, that is, a sampling position of the first kernel KD. W(x_{c+t}, f_{c+t}) denotes a warping operation on the (c+t)th input image, x_{c+t} denotes the (c+t)th input image, and f_{c+t} denotes the optical flow information and occlusion mask of the (c+t)th input image with respect to a center image among the input images.
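As an illustration only, and not the claimed implementation, the per-pixel summation of Equation 1 can be sketched as follows; the function name, grayscale frames, and the (T, H, W, n·n) kernel layout are assumptions:

```python
import numpy as np

def dynamic_filter(frames, kernels, n):
    """Sketch of Equation 1: per-pixel dynamic filtering.

    frames:  (T, H, W) pre-warped grayscale frames x'_{c+t}, with T = 2N+1.
    kernels: (T, H, W, n*n) per-pixel kernel values F^p_{c+t}(p_k).
    Returns the filtered output y of shape (H, W).
    """
    T, H, W = frames.shape
    r = n // 2
    # Pad each frame so the n x n neighborhood is defined at the borders.
    padded = np.pad(frames, ((0, 0), (r, r), (r, r)), mode="edge")
    y = np.zeros((H, W))
    for t in range(T):  # sum over the 2N+1 input frames
        for k, (dy, dx) in enumerate(
            (i, j) for i in range(-r, r + 1) for j in range(-r, r + 1)
        ):
            # Kernel value at offset p_k times the frame value at p + p_k.
            shifted = padded[t, r + dy : r + dy + H, r + dx : r + dx + W]
            y += kernels[t, :, :, k] * shifted
    return y
```

With a single frame and a 1×1 all-ones kernel, the output reproduces the frame, which matches the degenerate case of Equation 1.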


Based on the optical flow information fD,M, the degraded image generator 270 may obtain an image flow mask fY for sharp images Y. The image flow mask fY may include information on the object motion direction or camera motion direction in the sharp images Y. The image flow mask fY may include information on a part hidden (or occluded) by the motion of the object. The information on the hidden (or occluded) part may be information on an occlusion mask. The size of a patch of the image flow mask fY may be fY ∈ ℝ^{T×H×W×(2+1)}. Here, T denotes the number of input images, H denotes the height of the patch, W denotes the width of the patch, and 2+1 denotes a combination of 2 channels for the optical flow information of the sharp images Y and 1 channel for the occlusion information of a current sharp image among the sharp images Y.


Based on the image flow mask fY obtained based on the optical flow information fD,M, the degraded image generator 270 may warp the sharp images Y. The degraded image generator 270 may generate warped images Yw where the object included in each of the sharp images Y is warped.
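The warping operation used throughout this description can be illustrated with a minimal sketch. This uses nearest-neighbor sampling and a hypothetical function name for brevity; an actual implementation would typically use occlusion-aware bilinear sampling:

```python
import numpy as np

def backward_warp(image, flow):
    """Sketch of a warping operation W(x, f): each output pixel is
    sampled from the source image at the position the flow points to.

    image: (H, W) array; flow: (H, W, 2) array of (dy, dx) offsets.
    Nearest-neighbor sampling with border clipping, for brevity.
    """
    H, W = image.shape
    ys, xs = np.mgrid[0:H, 0:W]
    src_y = np.clip(np.rint(ys + flow[..., 0]).astype(int), 0, H - 1)
    src_x = np.clip(np.rint(xs + flow[..., 1]).astype(int), 0, W - 1)
    return image[src_y, src_x]
```

A zero flow leaves the image unchanged; a uniform horizontal flow shifts the sampling positions column-wise, clipped at the border.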


The degraded image generator 270 may filter the warped images Yw through the first kernel KD. The degraded image generator 270 may obtain a degraded image {circumflex over (X)}c by filtering the warped images Yw through the first kernel KD. The size of the patch of the first kernel KD may be KD ∈ ℝ^{T×H×W×kd²}. Here, kd may be the size of the first kernel KD. The first kernel KD may be normalized by using a SoftMax function. The first kernel KD, in which all kernel values are positive, may mimic a blur generation process.
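The SoftMax normalization described above, which keeps every kernel value positive and each per-pixel kernel summing to one, can be sketched as follows (the function name is an assumption):

```python
import numpy as np

def normalize_kernel(raw):
    """Sketch of SoftMax normalization of the first kernel K_D over the
    kernel dimension, so every weight is positive and each per-pixel
    kernel sums to 1, mimicking how a blur accumulates light.

    raw: (..., k*k) unnormalized per-pixel kernel logits.
    """
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(raw - raw.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)
```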


The first kernel KD may include degradation information on the blurred images X. The degradation information may be information on degradation (blur and/or low-resolution) included in the blurred images X. Based on the degradation information included in the first kernel KD, the degraded image generator 270 may obtain (or generate) a degraded image {circumflex over (X)}c from the warped images Yw.


The degraded image generator 270 may obtain the degraded image {circumflex over (X)}c through Equation 2.











X̂_c = W(Y, s·(f^Y ↑_s)) ⊛↓_s K_D        [Equation 2]







Here, ⊛↓_s denotes dynamic filtering performed by moving the first kernel KD at every interval s at the position of each pixel according to Equation 1, and ↑_s denotes s-times bilinear upsampling.


Based on the first motion information FD,M, the degraded image generator 270 may generate a sharp image {circumflex over (X)}sharpD by removing degradation included in the blurred images X. For example, the degraded image generator 270 may generate the sharp image {circumflex over (X)}sharpD by mapping the first motion information FD,M to an image domain via 3D convolution. The sharp image {circumflex over (X)}sharpD may be used to train the first neural network 140. The sharp image {circumflex over (X)}sharpD may be an intermediate output.


The first neural network 140 may be trained based on a loss LD. The loss LD may be obtained through Equation 3.










L_D = l₁(X̂_c, X_c) + λ₁ Σ_{t=−N}^{+N} l₁(W(Y_{t+c}, s·(f^Y_{t+c} ↑_s)), Y_c) + λ₂ l₁(f^Y, f^Y_{RAFT}) + λ₃ l₁(X̂^D_{Sharp}, X_{Sharp})        [Equation 3]







Here, f^Y denotes an optical flow included in the image flow mask fY, f^Y_{RAFT} denotes a pseudo-ground truth (GT) optical flow generated by a pre-trained RAFT model, and X_{Sharp} denotes a sharp low-resolution image obtained by applying bicubic downsampling to the sharp images Y.


The first term of the right side of Equation 3 may be a reconstruction loss. The second term may be a warping loss for optical flow learning from the center image Yc to the remaining images Yt+c among the sharp images Y. The third term may be a loss using the RAFT pseudo-GT for finely adjusting an optical flow. The fourth term may be a temporal anchor loss that temporally anchors each of the features included in motion information FD to its corresponding image and sharpens the motion information FD. The temporal anchor loss may distinguish distorted information from non-distorted information of the motion information FD by limiting a solution space.
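A minimal sketch of how the four terms of Equation 3 could be combined. The l1 form of each term follows Equation 3; the lambda weights, function names, and pre-computed inputs are illustrative assumptions:

```python
import numpy as np

def l1(a, b):
    """Mean absolute error, the l1 loss used by each term of Equation 3."""
    return np.mean(np.abs(a - b))

def degradation_loss(x_hat_c, x_c, warped_sharps, y_c,
                     f_y, f_raft, x_sharp_hat, x_sharp,
                     lam1=0.1, lam2=0.1, lam3=0.1):
    """Sketch of the four-term loss L_D of Equation 3.

    warped_sharps: list of already-warped sharp frames W(Y_{t+c}, ...),
    one per t in [-N, +N]; the remaining arguments mirror Equation 3.
    The lambda weights are illustrative, not taken from the source.
    """
    recon = l1(x_hat_c, x_c)                        # reconstruction loss
    warp = sum(l1(w, y_c) for w in warped_sharps)   # warping loss
    flow = l1(f_y, f_raft)                          # RAFT pseudo-GT loss
    anchor = l1(x_sharp_hat, x_sharp)               # temporal anchor loss
    return recon + lam1 * warp + lam2 * flow + lam3 * anchor
```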



FIG. 3 is a diagram illustrating the internal structure of an optical flow estimator included in a first neural network, according to an embodiment.


Referring to FIG. 3, according to an embodiment, the feature extractor 210 may transmit (or relay) extracted initial motion information (e.g., the initial motion information FD,0 of FIG. 2) to the optical flow estimator 250. The optical flow estimator 250 may input the initial motion information (e.g., the initial motion information FD,0 of FIG. 2), initial optical flow information (e.g., the initial optical flow information fD,0 of FIG. 2), and/or initial distorted information (e.g., the initial distorted information FWD,0 of FIG. 2) to a first FRMA block (e.g., the first FRMA block FRMAD,1 of FIG. 2) among a plurality of FRMA blocks (e.g., the plurality of FRMA blocks FRMAD,1 to FRMAD,M of FIG. 2). The optical flow estimator 250 may refine the input initial motion information FD,0, the input initial optical flow information fD,0, and/or the input initial distorted information FWD,0.


The optical flow estimator 250 may include an FRMA block 340. The FRMA block 340 may be an (i+1)th FRMA block among the plurality of FRMA blocks FRMAD,1 to FRMAD,M. Optical flow information fi, motion information Fi, and distorted information FWi may correspond to the optical flow information (e.g., the optical flow information fD,M of FIG. 2), the first motion information (e.g., the first motion information FD,M of FIG. 2), and the distorted information (e.g., the distorted information FWD,M of FIG. 2), respectively. For example, if the FRMA block 340 is the first FRMA block FRMAD,1, the optical flow information fi may be the initial optical flow information fD,0, the motion information Fi may be the initial motion information FD,0, and the distorted information FWi may be the initial distorted information FWD,0.


The motion information Fi is expressed by Fi ∈ ℝ^{T×H×W×C}, the distorted information FWi is expressed by FWi ∈ ℝ^{T×H×W×C}, and the optical flow information fi is expressed by fi = {f^j_{c→(c+t)}, o^j_{c→(c+t)}}_{j=1:n, t=−N:N} ∈ ℝ^{T×H×W×(2+1)n}. Here, n denotes the number of pieces of optical flow information fi from a target image (e.g., the target image XC of FIG. 2) to each of the remaining images (e.g., the remaining images XC−N and XC+N of FIG. 2). The optical flow information fi may include a trainable occlusion mask o^j_{c→(c+t)}. The occlusion mask o^j_{c→(c+t)} may use a sigmoid activation for stability.


The FRMA block 340 may include a 3D residual dense block (3D RDB). The FRMA block 340 may acquire motion information Fi+1 by inputting the motion information Fi to the 3D RDB. For example, the FRMA block 340 may acquire the motion information Fi+1 through Equation 4.










F_{i+1} = RDB(F_i)        [Equation 4]







The FRMA block 340 may acquire optical flow information fi+1 through Equation 5.










f_{i+1} = f_i + Conv3d(concat(f_i, W(F_{i+1}, f_i), F_c^0))        [Equation 5]







Here, W denotes occlusion-aware backward warping, concat denotes concatenation along a channel dimension, and F_c^0 ∈ ℝ^{H×W×C} may be motion information on a center image of the initial motion information F0.
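The residual flow update of Equation 5 can be sketched as follows, with the learned 3D convolution and the warping result abstracted into placeholder inputs (all names and the (H, W, C) layout are assumptions):

```python
import numpy as np

def flow_update(f_i, F_next_warped, F_c0, conv):
    """Sketch of Equation 5's residual flow refinement: the current flow
    f_i, the warped next-iteration motion features W(F_{i+1}, f_i), and
    the center-image features F_c^0 are concatenated along the channel
    axis and passed through a learned convolution (represented here by
    the callable `conv`), whose output is added to f_i as a residual.

    All inputs are (H, W, C) arrays for brevity; `conv` must map the
    concatenated (H, W, 3C) tensor back to (H, W, C).
    """
    stacked = np.concatenate([f_i, F_next_warped, F_c0], axis=-1)
    return f_i + conv(stacked)  # residual learning: output = input + delta
```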


The initial motion information F0 is expressed by F0 ∈ ℝ^{T×H×W×C}. For example, the initial motion information F0 may include the initial motion information FD,0.


The FRMA block 340 may acquire updated distorted information {tilde over (F)}wS based on the distorted information FWi, the optical flow information fi+1, and the motion information Fi+1. For example, the FRMA block 340 may update the distorted information FWi based on the motion information Fi+1, which is warped toward the target image XC by using the optical flow information fi+1. As another example, the FRMA block 340 may acquire the updated distorted information {tilde over (F)}wS through Equation 6.











{tilde over (F)}wS = Conv2d(concat(FWi, r4→3(W(Fi+1, fi+1))))   [Equation 6]







Here, r4→3 denotes a reshape operation from ℝ^(T×H×W×C) to ℝ^(H×W×TC) for feature aggregation.
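The reshape r4→3 stacks the temporal axis into the channel axis per pixel; a NumPy sketch (with illustrative shapes) is:

```python
import numpy as np

def r4_to_3(features):
    """Reshape from (T, H, W, C) to (H, W, T*C) for feature aggregation,
    stacking the temporal axis into the channel axis per pixel."""
    T, H, W, C = features.shape
    # Move the temporal axis next to channels, then merge the two.
    return features.transpose(1, 2, 0, 3).reshape(H, W, T * C)
```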


The FRMA block 340 may include a multi-attention block 350. The FRMA block 340 may acquire distorted information FWi+1 based on the updated distorted information {tilde over (F)}wS. For example, the FRMA block 340 may acquire the distorted information FWi+1 by inputting the updated distorted information {tilde over (F)}wS to the multi-attention block 350.


The multi-attention block 350 may include a center-oriented (CO) attention 351. The multi-attention block 350 may further include a fully connected neural network (FNN) 353.


The CO attention 351 included in the multi-attention block 350 may acquire CO attention information based on the updated distorted information {tilde over (F)}wS and the initial motion information Fc0. The CO attention information may include a query Q, a key K, and/or a value V. For example, the query Q may be Q=WqFc0, the key K may be K=Wk{tilde over (F)}wS, and the value V may be V=Wv{tilde over (F)}wS. The CO attention 351 may calculate an attention map based on the query Q and the key K of the CO attention information. The CO attention 351 may adjust the value V based on the calculated attention map.


An initial process for adjusting the value V may be similar to self-attention. The process of adjusting the value V may achieve better performance when the updated distorted information {tilde over (F)}wS learns its relationship with the initial motion information Fc0 rather than with the updated distorted information {tilde over (F)}wS itself.


The CO attention 351 may be expressed by Equation 7.










CO Attention(Q, K, V) = SoftMax(QKT/√d)V   [Equation 7]







Here, √d may be a scale factor.
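Equation 7 is standard scaled dot-product attention. A NumPy sketch follows, where Q, K, and V are assumed to have already been produced by the learned projections Wq, Wk, and Wv described above:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_attention(Q, K, V):
    """Scaled dot-product attention as in Equation 7:
    SoftMax(Q K^T / sqrt(d)) V, with d the key dimension."""
    d = Q.shape[-1]
    attn = softmax(Q @ K.T / np.sqrt(d), axis=-1)  # attention map
    return attn @ V
```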


The FNN 353 included in the multi-attention block 350 may acquire the distorted information FWi+1 based on an attention map. For example, the FNN 353 may acquire the distorted information FWi+1 by receiving the calculated attention map from the CO attention 351.


The multi-attention block 350 may more accurately warp the updated distorted information {tilde over (F)}wS toward the center image through the CO attention 351. The multi-attention block 350 may cause the updated distorted information {tilde over (F)}wS to effectively learn spatiotemporally changing degradation through the CO attention 351.



FIG. 4 is a diagram illustrating the internal structure of a second neural network for restoring a video, according to an embodiment.


Referring to FIG. 4, according to an embodiment, the second neural network 170 may include the feature extractor 210, an optical flow estimator 450, and/or an image restoration unit 470.


The second neural network 170 may receive a plurality of blurred images X included in the video 110. The plurality of blurred images X input to the second neural network 170 may be the same as a video input to the first neural network 140.


The second neural network 170 may receive optical flow information fD,M, first motion information FD,M, and/or a first kernel KD from the first neural network 140.


The feature extractor 210 included in the second neural network 170 may generate second motion information FR,0 to adjust the optical flow information fD,M, based on the plurality of input blurred images X and the first motion information FD,M. For example, the feature extractor 210 may generate the second motion information FR,0, based on concatenation information between the plurality of input blurred images X and the first motion information FD,M.


The feature extractor 210 may include a 3D RRDB. The feature extractor 210 may generate the second motion information FR,0 by inputting the concatenation information between the plurality of input blurred images X and the first motion information FD,M to the 3D RRDB. The feature extractor 210 may transmit (or relay) the generated second motion information FR,0 to the optical flow estimator 450.


The optical flow estimator 450 may generate dynamic filtering information to filter the plurality of blurred images X, based on the received (or relayed) second motion information FR,0, the optical flow information fD,M, and initial distorted information FWR,0. The dynamic filtering information may include adjusted optical flow information fR,M and/or distorted information FWR,M. The distorted information FWR,M may be the initial distorted information FWR,0 having been refined by a plurality of FRMA blocks. The optical flow estimator 450 may transmit (or relay) the generated adjusted optical flow information fR,M and/or the distorted information FWR,M to the image restoration unit 470.


The optical flow estimator 450 may include a plurality of FRMA blocks FRMAR,1 to FRMAR,M. For example, the optical flow estimator 450 may include M FRMA blocks. The optical flow estimator 450 may input the second motion information FR,0 to the first FRMA block FRMAR,1 among the plurality of FRMA blocks FRMAR,1 to FRMAR,M. The internal structure of the FRMA blocks FRMAR,1 to FRMAR,M and the process of each of the optical flow information fD,M and the distorted information FWR,M being iteratively refined by the FRMA blocks FRMAR,1 to FRMAR,M are described with reference to FIG. 5.


The optical flow estimator 450 may further include a plurality of 3D convolution blocks 450-1 to 450-m. For example, the optical flow estimator 450 may include M 3D convolution blocks. The number of 3D convolution blocks may be the same as the number of FRMA blocks.


The optical flow estimator 450 may generate a plurality of adjusted kernels kD,1 to kD,M based on the first kernel KD. For example, the optical flow estimator 450 may generate the plurality of adjusted kernels kD,1 to kD,M by inputting the first kernel KD to the M 3D convolution blocks 450-1 to 450-m. The optical flow estimator 450 may transmit (or relay) the plurality of generated adjusted kernels kD,1 to kD,M respectively to the plurality of FRMA blocks FRMAR,1 to FRMAR,M. For example, the optical flow estimator 450 may transmit (or relay) the first adjusted kernel kD,1 to the first FRMA block FRMAR,1. How the plurality of adjusted kernels kD,1 to kD,M is used in the plurality of FRMA blocks FRMAR,1 to FRMAR,M is described with reference to FIG. 5.


The image restoration unit 470 may include an FGDF 471. The FGDF 471 may restore the target image XC by filtering the blurred images X, based on the adjusted optical flow information fR,M and the distorted information FWR,M of the received (relayed) dynamic filtering information.


The FGDF 471 may generate a second kernel KR based on the distorted information FWR,M. The second kernel KR may be guided by the adjusted optical flow information fR,M and may be dynamically generated to be aware of motion pixel-wise. The second kernel KR that is guided by the adjusted optical flow information fR,M may effectively process the motion of an object included in the video 110 even with a small size.


The FGDF 471 may obtain an image flow mask fX for the blurred images X, based on the adjusted optical flow information fR,M. The image flow mask fX may include information on the motion direction of the object or the camera in the blurred images X. The image flow mask fX may include information on a part hidden (or occluded) by the motion of the object. The information on the hidden (or occluded) part may be information on an occlusion mask. The size of a patch of the image flow mask fX may be the same as the size of a patch of an image flow mask (e.g., the image flow mask fY of FIG. 2).


The FGDF 471 may warp each of the blurred images X, based on the adjusted optical flow information fR,M. The FGDF 471 may warp each of the blurred images X, based on the image flow mask fX obtained based on the adjusted optical flow information fR,M. The FGDF 471 may generate warped images XW by warping each of the blurred images X.


The FGDF 471 may filter the warped images XW using the second kernel KR. The FGDF 471 may obtain a restored image Ŷc restored from the target image XC by filtering the warped images XW through the second kernel KR. The second kernel KR may be expressed by KR∈ℝ^(T×H×W×s²kr²), where kr denotes the size of the second kernel KR. The second kernel KR may be normalized by using a SoftMax function. The second kernel KR may mimic a deblurring process in which kernels have negative values in addition to positive values. The structure of the FGDF 471 is described with reference to FIG. 6.


The image restoration unit 470 may restore high-frequency components Ŷr of the target image XC, based on the distorted information FWR,M. For example, the image restoration unit 470 may generate the high-frequency components Ŷr of the target image XC by using stacked convolution and pixel shuffle. The image restoration unit 470 may restore low-frequency components of the target image XC by filtering the warped images XW using the second kernel KR.
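The pixel-shuffle step used for the high-frequency branch can be sketched in NumPy as a depth-to-space rearrangement (the preceding stacked convolutions are omitted, and the shapes and channel ordering here are illustrative):

```python
import numpy as np

def pixel_shuffle(x, s):
    """Depth-to-space: rearrange (H, W, s*s*C) into (s*H, s*W, C),
    trading channels for spatial resolution (sub-pixel upsampling)."""
    H, W, Cs2 = x.shape
    C = Cs2 // (s * s)
    x = x.reshape(H, W, s, s, C)     # split channels into an s x s grid
    x = x.transpose(0, 2, 1, 3, 4)   # interleave grid with spatial axes
    return x.reshape(H * s, W * s, C)
```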


The image restoration unit 470 may obtain the restored image Ŷc restored from the target image XC through Equation 8.











Ŷc = Ŷr + (W(X, fX) ⊛ KR)↓s   [Equation 8]







Here, ⊛↓s may denote performing dynamic filtering by moving the second kernel KR at every interval s from the position of each pixel according to Equation 1.
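A simplified NumPy sketch of the per-pixel dynamic filtering in Equation 8 follows, under stated assumptions: a single-channel warped image, stride s = 1 (no downsampling), and SoftMax-normalized per-pixel kernels as described above for the second kernel KR.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a flat kernel vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def dynamic_filter(image, kernels, k):
    """Per-pixel dynamic filtering (stride s = 1 sketch).

    image: (H, W) single-channel warped image X_W.
    kernels: (H, W, k*k) raw per-pixel kernel logits; each pixel's
    kernel is SoftMax-normalized before filtering.
    """
    H, W = image.shape
    p = k // 2
    padded = np.pad(image, p, mode="edge")
    out = np.zeros_like(image, dtype=float)
    for y in range(H):
        for x in range(W):
            # Each output pixel applies its own normalized kernel
            # to the k x k neighborhood centered on it.
            patch = padded[y:y + k, x:x + k].ravel()
            out[y, x] = patch @ softmax(kernels[y, x])
    return out
```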


Based on the motion information FR,M, the image restoration unit 470 may generate a sharp image {circumflex over (X)}SharpR by removing degradation included in the blurred images X. For example, the image restoration unit 470 may generate the sharp image {circumflex over (X)}SharpR by mapping the motion information FR,M to an image domain via 3D convolution. The motion information FR,M may be the second motion information FR,0 refined by the optical flow estimator 450. The sharp image {circumflex over (X)}SharpR may be used to train the second neural network 170. The sharp image {circumflex over (X)}SharpR may be an intermediate output. The sharp image {circumflex over (X)}SharpR may be expressed by {circumflex over (X)}SharpR∈ℝ^(T×H×W×3).


The second neural network 170 may be trained based on a total loss Ltotal. The first neural network 140 may be jointly trained with the second neural network 170, based on the total loss Ltotal. The total loss Ltotal may be obtained through Equation 9.










Ltotal = l1(Ŷc, Yc) + λ4 Σt=−N..+N l1(W(Xt+c, ft+cX), Xc) + λ5 l1({circumflex over (X)}SharpR, XSharp) + λ6 lD   [Equation 9]







The first term on the right side of Equation 9 may be a reconstruction loss. The second term and the third term may be respectively the same as the second term and the third term on the right side of Equation 3, except for the applied domains.
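Under illustrative assumptions — l1 as mean absolute error, hypothetical λ values, the warped neighbor images precomputed, and the regularization term lD supplied as a scalar — the total loss of Equation 9 can be sketched as:

```python
import numpy as np

def l1(a, b):
    # Mean absolute error, standing in for the l1 loss.
    return np.mean(np.abs(a - b))

def total_loss(y_hat, y, warped_to_center, x_center, x_sharp_hat, x_sharp,
               l_d, lam4=0.1, lam5=0.1, lam6=0.01):
    """Total loss of Equation 9 (illustrative weights).

    warped_to_center: list of neighbor images already warped toward the
    center image with their estimated flows, i.e. W(X_{t+c}, f_{t+c}^X).
    l_d: scalar regularization term l_D (precomputed elsewhere).
    """
    recon = l1(y_hat, y)                                   # reconstruction
    warp = sum(l1(w, x_center) for w in warped_to_center)  # flow supervision
    sharp = l1(x_sharp_hat, x_sharp)                       # sharp-image loss
    return recon + lam4 * warp + lam5 * sharp + lam6 * l_d
```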



FIG. 5 is a diagram illustrating the internal structure of an optical flow estimator included in a second neural network, according to an embodiment.


Referring to FIG. 5, according to an embodiment, the feature extractor 210 may transmit (or relay) generated second motion information (e.g., the second motion information FR,0 of FIG. 4) to the optical flow estimator 450. The optical flow estimator 450 may input optical flow information (e.g., the optical flow information fD,M of FIG. 4), second motion information (e.g., the second motion information FR,0 of FIG. 4), and/or initial distorted information (e.g., the initial distorted information FWR,0 of FIG. 4) to a plurality of FRMA blocks (e.g., the FRMA blocks FRMAR,1 to FRMAR,M of FIG. 4). The optical flow estimator 450 may refine the input optical flow information fD,M, the input second motion information FR,0, and/or the input initial distorted information FWR,0.


The optical flow estimator 450 may include an FRMA block 540. The FRMA block 540 may be an i+1th FRMA block among the plurality of FRMA blocks FRMAR,1 to FRMAR,M. Optical flow information fi, motion information Fi, and distorted information FWi may be intermediate versions of the optical flow information (e.g., the optical flow information fR,M of FIG. 4), the motion information (e.g., the motion information FR,M of FIG. 4), and the distorted information (e.g., the distorted information FWR,M of FIG. 4), respectively. For example, if the FRMA block 540 is the first FRMA block FRMAR,1, the optical flow information fi may be the optical flow information fD,M, the motion information Fi may be the second motion information FR,0, and the distorted information FWi may be the initial distorted information FWR,0, in which the initial distorted information FWR,0 may be 0. The internal structure of the FRMA block 540 may be the same as the internal structure of an FRMA block (e.g., the FRMA block 340 of FIG. 3). For example, the optical flow information fi, the motion information Fi, and the distorted information FWi that are input to the FRMA block 540 may be respectively the same as the optical flow information fi, the motion information Fi, and the distorted information FWi that are input to the FRMA block 340. For another example, distorted information {tilde over (F)}wS updated based on motion information Fi+1, optical flow information fi+1, and the distorted information FWi input to the FRMA block 340 may be the same as distorted information {tilde over (F)}wS updated based on the motion information Fi+1, the optical flow information fi+1, and the distorted information FWi input to the FRMA block 540.


The FRMA block 540 may include a multi-attention block 550. The FRMA block 540 may acquire distorted information FWi+1 based on the updated distorted information {tilde over (F)}WS. For example, the FRMA block 540 may acquire the distorted information FWi+1 by inputting the updated distorted information {tilde over (F)}WS to the multi-attention block 550.


The multi-attention block 550 may include the CO attention 351 and/or a degradation-aware (DA) attention 555. The multi-attention block 550 may further include one or more FNNs 353 and 553.


The FNN 353 included in the multi-attention block 550 may acquire distorted information (not shown) to which nonlinear transformation is applied, based on an attention map. For example, the distorted information to which nonlinear transformation is applied may include distorted information transformed based on a sigmoid activation function.


The DA attention 555 included in the multi-attention block 550 may acquire DA attention information based on the distorted information to which nonlinear transformation is applied and an adjusted kernel kD,i obtained from a first kernel (e.g., the first kernel KD of FIG. 2). The DA attention information may include a query Q, a key K, and/or a value V. For example, the query Q may be Q=WqkD,i, the key K may be K=Wk{tilde over (F)}wS, and the value V may be V=Wv{tilde over (F)}WS. The adjusted kernel kD,i may be expressed by kD,i∈ℝ^(H×W×C). The DA attention 555 may calculate an attention map based on the query Q and the key K of the DA attention information. The DA attention 555 may adjust the value V based on the calculated attention map.


An initial process for adjusting the value V may be similar to self-attention. The process of adjusting the value V may achieve better performance when the updated distorted information {tilde over (F)}WS learns its relationship with the adjusted kernel kD,i rather than with the updated distorted information {tilde over (F)}WS itself.


Like the CO attention 351, the DA attention 555 may be expressed by Equation 7.


The FNN 553 included in the multi-attention block 550 may acquire the distorted information FWi+1 based on the attention map. The multi-attention block 550 may cause the updated distorted information {tilde over (F)}WS to be globally adaptive to degradation included in blurred images (e.g., the blurred images X of FIG. 4) through the DA attention 555.



FIG. 6 is a diagram illustrating the structure of an FGDF, according to an embodiment.


Referring to FIG. 6, according to an embodiment, an FGDF 600 may be the same as an FGDF (e.g., the FGDF 471 of FIG. 4). The FGDF 600 may receive blurred images X. In addition, the FGDF 600 may receive adjusted optical flow information fR,M from an optical flow estimator (e.g., the optical flow estimator 450 of FIG. 4).


The FGDF 600 may warp each of the blurred images X, based on the adjusted optical flow information fR,M. The FGDF 600 may generate warped images XW by warping each of the blurred images X.


The FGDF 600 may include a second kernel KR. The FGDF 600 may filter the warped images XW through the second kernel KR. The FGDF 600 may generate a restored image Ŷc restored from a target image (e.g., the target image XC of FIG. 4) by filtering the warped images XW using the second kernel KR.



FIG. 7 is a flowchart illustrating a method of restoring a video, according to an embodiment. Operations 710 to 770 may be performed sequentially, but not necessarily. For example, operations 710 and 730 may be performed in parallel, or operation 730 may be performed prior to operation 710. Operations 710 to 770 may be substantially the same as the operations of the above-described video restoration unit (e.g., the video restoration unit 100 of FIG. 1), and thus the repeated description thereof is omitted.


In operation 710, the video restoration unit 100 may obtain a plurality of blurred images Xc−N, Xc, and Xc+N due to motions of an object captured in the video (e.g., the video 110 of FIG. 1) and a plurality of sharp images (e.g., the plurality of sharp images Y of FIG. 2) corresponding to the plurality of blurred images Xc−N, Xc, and Xc+N.


In operation 730, based on the blurred images Xc−N, Xc, and Xc+N included in the video 110, the video restoration unit 100 may generate a first kernel (e.g., the first kernel KD of FIG. 2) to filter the sharp images Y, optical flow information (e.g., the optical flow information fD,M of FIG. 4) on an object motion direction or camera motion direction, and first motion information (e.g., the first motion information FD,M of FIG. 4) of the object included in the blurred images Xc−N, Xc, and Xc+N.


In operation 750, the video restoration unit 100 may generate dynamic filtering information to filter the blurred images Xc−N, Xc, and Xc+N, based on the first kernel KD, the optical flow information fD,M, the first motion information FD,M, and the blurred images Xc−N, Xc, and Xc+N.


In operation 770, the video restoration unit 100 may restore a target image (e.g., the target image Xc of FIG. 4) to be restored among the plurality of blurred images Xc−N, Xc, and Xc+N, based on the blurred images Xc−N, Xc, and Xc+N and the dynamic filtering information.



FIG. 8 is a block diagram illustrating a video restoration unit according to an embodiment.


Referring to FIG. 8, according to an embodiment, an electronic device 810 (e.g., the video restoration unit 100 of FIG. 1) may include a processor 830 and a memory 870.


The memory 870 may store instructions (or programs) executable by the processor 830. For example, the instructions may include instructions for executing an operation of the processor 830 and/or an operation of each component of the processor 830.


The memory 870 may include one or more computer-readable storage media. The memory 870 may include non-volatile storage elements (e.g., a magnetic hard disk, an optical disc, a floppy disc, a flash memory, an erasable programmable read-only memory (EPROM), and an electrically erasable and programmable read-only memory (EEPROM)).


The memory 870 may be non-transitory media. The term “non-transitory” may indicate that a storage medium is not embodied in a carrier wave or a propagated signal. However, the term “non-transitory” should not be interpreted to mean that the memory 870 is non-movable.


The processor 830 may process data stored in the memory 870. The processor 830 may execute computer-readable code (e.g., software) stored in the memory 870 and instructions triggered by the processor 830.


The processor 830 may be a hardware-implemented data processing device including a circuit that is physically structured to execute desired operations. For example, the desired operations may include code or instructions included in a program.


For example, the hardware-implemented data processing device may include a microprocessor, a central processing unit (CPU), a processor core, a multi-core processor, a multiprocessor, an application-specific integrated circuit (ASIC), and a field-programmable gate array (FPGA).


The processor 830 may generally control the electronic device 810 by executing programs and/or instructions stored in the memory 870. Operations performed by the electronic device 810 may be substantially the same as the operations performed by the video restoration unit 100 described with reference to FIGS. 1 to 7. Accordingly, the repeated description thereof is omitted.


The examples described herein may be implemented using a hardware component, a software component, and/or a combination thereof. A processing device may be implemented using one or more general-purpose or special-purpose computers, such as, for example, a processor, a controller and an arithmetic logic unit (ALU), a digital signal processor (DSP), a microcomputer, an FPGA, a programmable logic unit (PLU), a microprocessor, or any other device capable of responding to and executing instructions in a defined manner. The processing device may run an operating system (OS) and one or more software applications that run on the OS. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For purposes of simplicity, the description refers to a single processing device; however, one skilled in the art will appreciate that a processing device may include multiple processing elements and multiple types of processing elements. For example, the processing device may include a plurality of processors, or a single processor and a single controller. In addition, different processing configurations are possible, such as parallel processors.


The software may include a computer program, a piece of code, an instruction, or some combination thereof, to independently or uniformly instruct or configure the processing device to operate as desired. Software and data may be embodied permanently or temporarily in any type of machine, component, physical or virtual equipment, computer storage medium or device, or in a propagated signal wave capable of providing instructions or data to or being interpreted by the processing device. The software also may be distributed over network-coupled computer systems so that the software is stored and executed in a distributed fashion. The software and data may be stored by one or more non-transitory computer-readable recording mediums.


The methods according to the above-described examples may be recorded in non-transitory computer-readable media including program instructions to implement various operations of the above-described embodiments. The media may also include, alone or in combination with the program instructions, data files, data structures, and the like. The program instructions recorded on the media may be those specially designed and constructed for the purposes of examples, or they may be of the kind well-known and available to those having skill in the computer software arts. Examples of non-transitory computer-readable media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROM discs, DVDs, and/or Blu-ray discs; magneto-optical media such as optical discs; and hardware devices that are specially configured to store and perform program instructions, such as read-only memory (ROM), random-access memory (RAM), flash memory (e.g., USB flash drives, memory cards, memory sticks, etc.), and the like. Examples of program instructions include both machine code, such as produced by a compiler, and files containing higher-level code that may be executed by the computer using an interpreter.


The above-described devices may act as one or more software modules in order to perform the operations of the above-described examples, or vice versa.


As described above, although the examples have been described with reference to the limited drawings, a person skilled in the art may apply various technical modifications and variations based thereon. Suitable results may be achieved if the described techniques are performed in a different order, and/or if components in a described system, architecture, device, or circuit are combined in a different manner, and/or replaced or supplemented by other components or their equivalents.


Accordingly, other implementations are within the scope of the following claims.

Claims
  • 1. A method of restoring a video, the method comprising: obtaining a plurality of blurred images due to object motion and camera motion of an object captured in the video;generating a first kernel comprising degradation information on the plurality of blurred images, optical flow information on the plurality of blurred images, and first motion information on each of the plurality of blurred images comprised in the video;generating dynamic filtering information to filter the plurality of blurred images, based on the plurality of blurred images, the first motion information, the optical flow information, and the first kernel; andrestoring a target image to be restored among the plurality of blurred images, based on the dynamic filtering information and the plurality of blurred images.
  • 2. The method of claim 1, wherein the generating the dynamic filtering information comprises: generating second motion information to adjust the optical flow information, based on the plurality of blurred images and the first motion information;adjusting the optical flow information, based on the second motion information, the optical flow information, and the first kernel; andgenerating distorted information on how distorted each of the remaining images excluding the target image is with respect to the target image, based on the second motion information, the optical flow information, and the first kernel, wherein the dynamic filtering information comprises adjusted optical flow information and the distorted information.
  • 3. The method of claim 2, wherein the restoring the target image comprises restoring the target image by filtering the plurality of blurred images, based on the adjusted optical flow information and the distorted information.
  • 4. The method of claim 3, wherein the restoring the target image further comprises: warping the objects comprised in each of the plurality of blurred images, based on the adjusted optical flow information;generating a second kernel to filter warped images obtained by warping the objects, based on the distorted information; andrestoring the target image by filtering the warped images using the second kernel.
  • 5. The method of claim 4, wherein the restoring the target image by filtering the warped images using the second kernel comprises restoring low-frequency components of the target image.
  • 6. The method of claim 5, further comprising restoring high-frequency components of the target image, based on the distorted information.
  • 7. An electronic device comprising: a memory comprising instructions; anda processor electrically connected to the memory and configured to execute the instructions, wherein the instructions, when executed by the processor, cause the electronic device to: obtain a plurality of blurred images due to object motion and camera motion captured in a video,generate a first kernel comprising degradation information on the plurality of blurred images, optical flow information on the plurality of blurred images, and first motion information on each of the plurality of blurred images comprised in the video,generate dynamic filtering information to filter the plurality of blurred images, based on the plurality of blurred images, the first motion information, the optical flow information, and the first kernel, andrestore a target image to be restored among the plurality of blurred images, based on the dynamic filtering information and the plurality of blurred images.
  • 8. The electronic device of claim 7, wherein the instructions, when executed by the processor, cause the electronic device to generate second motion information to adjust the optical flow information, based on the plurality of blurred images and the first motion information,adjust the optical flow information, based on the second motion information, the optical flow information, and the first kernel, andgenerate distorted information on how distorted each of the remaining images excluding the target image is with respect to the target image, based on the second motion information, the optical flow information, and the first kernel.
  • 9. The electronic device of claim 8, wherein the instructions, when executed by the processor, cause the electronic device to restore the target image by filtering the plurality of blurred images, based on adjusted optical flow information and the distorted information.
  • 10. The electronic device of claim 9, wherein the instructions, when executed by the processor, cause the electronic device to warp the object comprised in each of the plurality of blurred images, based on the adjusted optical flow information,generate a second kernel to filter warped images obtained by warping the object, based on the distorted information, andrestore the target image by filtering the warped images using the second kernel.
  • 11. The electronic device of claim 10, wherein the instructions, when executed by the processor, cause the electronic device to restore low-frequency components of the target image.
  • 12. The electronic device of claim 11, wherein the instructions, when executed by the processor, cause the electronic device to restore high-frequency components of the target image, based on the distorted information.
Priority Claims (2)
Number Date Country Kind
10-2023-0181432 Dec 2023 KR national
10-2024-0179843 Dec 2024 KR national
CROSS-REFERENCE TO RELATED APPLICATION(S)

This application claims the priority benefit of Korean Patent Application No. 10-2023-0181432 filed on Dec. 14, 2023, in the Korean Intellectual Property Office, U.S. Provisional Application No. 63/612,372 filed on Dec. 20, 2023, in the U.S. Patent and Trademark Office, and Korean Patent Application No. 10-2024-0179843 filed on Dec. 5, 2024, in the Korean Intellectual Property Office, the disclosures of which are incorporated herein by reference for all purposes.

Provisional Applications (1)
Number Date Country
63612372 Dec 2023 US