IMAGE PROCESSING METHOD AND DEVICE, AND STORAGE MEDIUM

Description

TECHNICAL FIELD

The present disclosure relates to the technical field of computer technology, in particular to an image processing method and device, an electronic apparatus, and a storage medium.

BACKGROUND

Video super-resolution aims to reconstruct a high-resolution video corresponding to a given low-resolution video. The relevant technology predicts a high-resolution video frame by using multiple low-resolution video frames, and the reconstructed video frame has resolution higher than that of the pro-reconstruction video frame. The resulting video thus has higher definition.

SUMMARY

The present disclosure provides a technical solution for reconstructing a high-resolution video frame.

According to one aspect of the present disclosure, there is provided an image processing method comprising:

acquiring at least one of a backward propagation feature of an (x+1)th video frame in a video segment and a forward propagation feature of an (x−1)th video frame in the video segment, wherein the video segment includes N video frames, N being an integer greater than 2, and x being an integer;

deriving a reconstruction feature of the xth video frame from at least one of the xth video frame, the backward propagation feature of the (x+1)th video frame, and the forward propagation feature of the (x−1)th video frame; and

deriving a target video frame corresponding to the xth video frame by reconstructing the xth video frame based on the reconstruction feature of the xth video frame, wherein the target video frame has resolution higher than that of the xth video frame.

In a possible implementation, in the case of 1<x<N, deriving a reconstruction feature of the xth video frame from at least one of the xth video frame, the backward propagation feature of the (x+1)th video frame, and the forward propagation feature of the (x−1)th video frame comprises:

determining a backward propagation feature of the xth video frame based on the xth video frame, the (x+1)th video frame, and the backward propagation feature of the (x+1)th video frame;

determining a forward propagation feature of the xth video frame based on the xth video frame, the (x−1)th video frame, the forward propagation feature of the (x−1)th video frame, and the backward propagation feature of the xth video frame; and

using the forward propagation feature of the xth video frame as a reconstruction feature of the xth video frame.

In a possible implementation, determining a backward propagation feature of the xth video frame based on the xth video frame, the (x+1)th video frame, and the backward propagation feature of the (x+1)th video frame comprises:

deriving a first optical flow diagram from the xth video frame and the (x+1)th video frame;

deriving a distorted backward propagation feature by distorting the backward propagation feature of the (x+1)th video frame based on the first optical flow diagram; and

deriving a backward propagation feature of the xth video frame from the distorted backward propagation feature and the xth video frame.

deriving a second optical flow diagram from the xth video frame and the (x−1)th video frame;

deriving a distorted forward propagation feature by distorting the forward propagation feature of the (x−1)th video frame based on the second optical flow diagram; and

deriving a forward propagation feature of the xth video frame from the backward propagation feature of the xth video frame, the distorted forward propagation feature, and the xth video frame.

determining a forward propagation feature of the xth video frame based on the xth video frame, the (x−1)th video frame, and the forward propagation feature of the (x−1)th video frame;

determining a backward propagation feature of the xth video frame based on the xth video frame, the (x+1)th video frame, the backward propagation feature of the (x+1)th video frame, and the forward propagation feature of the xth video frame; and

using the backward propagation feature of the xth video frame as a reconstruction feature of the xth video frame.

In a possible implementation, determining a forward propagation feature of the xth video frame based on the xth video frame, the (x−1)th video frame, and the forward propagation feature of the (x−1)th video frame comprises:

deriving a second optical flow diagram from the xth video frame and the (x−1)th video frame;

deriving a distorted forward propagation feature by distorting the forward propagation feature of the (x−1)th video frame based on the second optical flow diagram; and

deriving a forward propagation feature of the xth video frame from the distorted forward propagation feature and the xth video frame.

In a possible implementation, determining a backward propagation feature of the xth video frame based on the xth video frame, the (x+1)th video frame, the backward propagation feature of the (x+1)th video frame, and the forward propagation feature of the xth video frame comprises:

deriving a first optical flow diagram from the xth video frame and the (x+1)th video frame;

deriving a distorted backward propagation feature by distorting the backward propagation feature of the (x+1)th video frame based on the first optical flow diagram; and

deriving a backward propagation feature of the xth video frame from the distorted backward propagation feature and the xth video frame.

In a possible implementation, in the case of x=1, deriving a reconstruction feature of the xth video frame from at least one of the xth video frame, the backward propagation feature of the (x+1)th video frame, and the forward propagation feature of the (x−1)th video frame comprises:

deriving a forward propagation feature of the xth video frame by performing feature extraction on the xth video frame; and

using the forward propagation feature of the xth video frame as a reconstruction feature of the xth video frame.

In a possible implementation, in the case of x=N, deriving a reconstruction feature of the xth video frame from at least one of the xth video frame, the backward propagation feature of the (x+1)th video frame, and the forward propagation feature of the (x−1)th video frame comprises:

deriving a backward propagation feature of the xth video frame by performing feature extraction on the xth video frame; and

using the backward propagation feature of the xth video frame as a reconstruction feature of the xth video frame.

acquiring a backward propagation feature of the (x+1)th video frame for the xth video frame;

deriving a forward propagation feature of the xth video frame from the xth video frame and the backward propagation feature of the (x+1)th video frame; and

using the forward propagation feature of the xth video frame as a reconstruction feature of the xth video frame.

acquiring a forward propagation feature of the (x−1)th video frame for the xth video frame;

deriving a backward propagation feature of the xth video frame from the xth video frame and the forward propagation feature of the (x−1)th video frame; and

using the backward propagation feature of the xth video frame as a reconstruction feature of the xth video frame.

In a possible implementation, the method further comprises:

determining at least two key frames in video data; and

dividing the video data into at least one video segment based on the key frames.

According to another aspect of the present disclosure, there is provided an image processing device comprising:

an acquiring module configured to acquire at least one of a backward propagation feature of an (x+1)th video frame in a video segment and a forward propagation feature of an (x−1)th video frame in the video segment, wherein the video segment includes N video frames, N being an integer greater than 2, and x being an integer;

a first processing module configured to derive a reconstruction feature of the xth video frame from at least one of the xth video frame, the backward propagation feature of the (x+1)th video frame, and the forward propagation feature of the (x−1)th video frame; and

a second processing module configured to derive a target video frame corresponding to the xth video frame by reconstructing the xth video frame based on the reconstruction feature of the xth video frame, wherein the target video frame has resolution higher than that of the xth video frame.

In a possible implementation, in the case of 1<x<N, the first processing module is further configured to determine a backward propagation feature of the xth video frame based on the xth video frame, the (x+1)th video frame, and the backward propagation feature of the (x+1)th video frame;

determine a forward propagation feature of the xth video frame based on the xth video frame, the (x−1)th video frame, the forward propagation feature of the (x−1)th video frame, and the backward propagation feature of the xth video frame; and

use the forward propagation feature of the xth video frame as a reconstruction feature of the xth video frame.

In a possible implementation, the first processing module is further configured to derive a first optical flow diagram from the xth video frame and the (x+1)th video frame;

derive a distorted backward propagation feature by distorting the backward propagation feature of the (x+1)th video frame based on the first optical flow diagram; and

derive a backward propagation feature of the xth video frame from the distorted backward propagation feature and the xth video frame.

In a possible implementation, the first processing module is further configured to derive a second optical flow diagram from the xth video frame and the (x−1)th video frame;

derive a distorted forward propagation feature by distorting the forward propagation feature of the (x−1)th video frame based on the second optical flow diagram; and

derive a forward propagation feature of the xth video frame from the backward propagation feature of the xth video frame, the distorted forward propagation feature, and the xth video frame.

In a possible implementation, in the case of 1<x<N, the first processing module is further configured to determine a forward propagation feature of the xth video frame based on the xth video frame, the (x−1)th video frame, and the forward propagation feature of the (x−1)th video frame;

determine a backward propagation feature of the xth video frame based on the xth video frame, the (x+1)th video frame, the backward propagation feature of the (x+1)th video frame, and the forward propagation feature of the xth video frame; and

use the backward propagation feature of the xth video frame as a reconstruction feature of the xth video frame.

In a possible implementation, the first processing module is further configured to derive a second optical flow diagram from the xth video frame and the (x−1)th video frame;

derive a distorted forward propagation feature by distorting the forward propagation feature of the (x−1)th video frame based on the second optical flow diagram; and

derive a forward propagation feature of the xth video frame from the distorted forward propagation feature and the xth video frame.

In a possible implementation, the first processing module is further configured to derive a first optical flow diagram from the xth video frame and the (x+1)th video frame;

derive a distorted backward propagation feature by distorting the backward propagation feature of the (x+1)th video frame based on the first optical flow diagram; and

derive a backward propagation feature of the xth video frame from the forward propagation feature of the xth video frame, the distorted backward propagation feature, and the xth video frame.

In a possible implementation, in the case of x=1, the first processing module is further configured to:

derive a forward propagation feature of the xth video frame by performing feature extraction on the xth video frame; and

use the forward propagation feature of the xth video frame as a reconstruction feature of the xth video frame.

In a possible implementation, in the case of x=N, the first processing module is further configured to:

derive a backward propagation feature of the xth video frame by performing feature extraction on the xth video frame; and

use the backward propagation feature of the xth video frame as a reconstruction feature of the xth video frame.

In a possible implementation, in the case of x=1, the first processing module is further configured to:

acquire a backward propagation feature of the (x+1)th video frame for the xth video frame;

derive a forward propagation feature of the xth video frame from the xth video frame and the backward propagation feature of the (x+1)th video frame; and

use the forward propagation feature of the xth video frame as a reconstruction feature of the xth video frame.

In a possible implementation, in the case of x=N, the first processing module is further configured to:

acquire a forward propagation feature of the (x−1)th video frame for the xth video frame;

derive a backward propagation feature of the xth video frame from the xth video frame and the forward propagation feature of the (x−1)th video frame; and

use the backward propagation feature of the xth video frame as a reconstruction feature of the xth video frame.

In a possible implementation, the device further comprises:

a determining module configured to determine at least two key frames in video data; and

a dividing module configured to divide the video data into at least one video segment based on the key frames.

According to still another aspect of the present disclosure, there is provided an electronic apparatus comprising a processor; and a memory for storing instructions executable by the processor, wherein the processor is configured to call instructions stored by the memory so as to perform the methods described above.

According to still another aspect of the present disclosure, there is provided a computer-readable storage medium storing computer program instructions which, when executed by a processor, implement the methods described above.

According to still another aspect of the present disclosure, there is provided a computer program comprising computer-readable codes, wherein when the codes run on an electronic apparatus, a processor in the electronic apparatus performs the methods described above.

In embodiments of the present disclosure, it is possible to acquire at least one of a backward propagation feature of the (x+1)th video frame and a forward propagation feature of the (x−1)th video frame, and thus possible to derive a reconstruction feature of the xth video frame from at least one of the xth video frame, the backward propagation feature of the (x+1)th video frame, and the forward propagation feature of the (x−1)th video frame. Further, it is possible to derive a target video frame corresponding to the xth video frame by reconstructing the xth video frame based on the reconstruction feature of the xth video frame, wherein the target video frame has resolution higher than that of the xth video frame. The image processing method and device, the electronic apparatus, and the storage medium provided by embodiments of the present disclosure reduces computation costs by deriving high-resolution images in higher reconstruction efficiency, and save plenty of feature extraction and aggregation time and improve reconstruction accuracy by making use of temporal continuity in natural videos by determining a reconstruction feature of a video frame from features transferred by a preceding video and a following video frame—that is, using features of proximate video frames without extracting features from scratch.

It should be appreciated that the foregoing general description and the following detailed description are exemplary and explanatory and not meant to limit the present disclosure. Other aspects of the present disclosure will become clear due to the following detailed explanations of exemplary embodiments with reference to the drawing.

BRIEF DESCRIPTION OF THE DRAWINGS

The drawings, which are incorporated in and constitute part of the specification, illustrate embodiments in accordance with the present disclosure and serve to explain, together with the specification, technical solutions of the present disclosure.

FIG. 1 shows a flowchart of an image processing method according to an embodiment of the present disclosure;

FIG. 2 shows a schematic structural diagram of a neural network according to an embodiment of the present disclosure;

FIG. 3 shows a schematic diagram of an image processing method according to an embodiment of the present disclosure;

FIG. 4 shows a schematic diagram of an image processing method according to an embodiment of the present disclosure;

FIG. 5 shows a schematic diagram of an image processing method according to an embodiment of the present disclosure;

FIG. 6 shows a schematic diagram of an image processing method according to an embodiment of the present disclosure;

FIG. 7 shows a schematic diagram of an image processing method according to an embodiment of the present disclosure;

FIG. 8 shows a schematic diagram of an image processing method according to an embodiment of the present disclosure;

FIG. 9 shows a block diagram of an image processing device according to an embodiment of the present disclosure;

FIG. 10 shows a block diagram of an electronic apparatus 800 according to an embodiment of the present disclosure; and

FIG. 11 shows a block diagram of an electronic apparatus 1900 according to an embodiment of the present disclosure.

DETAILED DESCRIPTION

Various exemplary embodiments, features, and aspects of the present disclosure will be described in detail with reference to the drawings. The same reference numerals in the drawings represent parts having the same or similar functions. Although various aspects of the embodiments are shown in the drawings, it is unnecessary to proportionally draw the drawings unless otherwise specified.

Herein the term “exemplary” means “used as an instance or example, or explanatory.” An “exemplary” embodiment given herein is not necessarily construed as being superior to or better than other embodiments.

The term “and/or” herein just means association between associated objects, which means that three relationships exist between the associated objects. For example, “A and/or B” means the three cases that A exists alone, A and B exist at the same time, and B exists alone. Besides, the term “at least one” herein means any one of a plurality things, or any combination of at least two of a plurality of things. For example, including at least one of A, B, and C means any one or some elements selected from the set consisting of A, B and C.

Numerous details are given in the following embodiments for the purpose of better explaining the present disclosure. It should be appreciated by a person skilled in the art that the present disclosure can still be implemented even without some of those details. In some of the embodiments, methods, means, units and circuits that are well known to a person skilled in the art are not described in detail so that the principle of the present disclosure become apparent.

FIG. 1 shows a flowchart of an image processing method according to an embodiment of the present disclosure. The method may be performed by an electronic apparatus such as a terminal apparatus or a server. The terminal apparatus may be user equipment (UE), a mobile apparatus, a terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld apparatus, a computing apparatus, an in-vehicle apparatus, a wearable apparatus, etc. The method may also be implemented by calling, by a processor, computer-readable instructions stored in a memory, or be performed by a server.

As shown in FIG. 1, the image processing method comprises the following:

Step S11 of acquiring at least one of a backward propagation feature of an (x+1)th video frame in a video segment and a forward propagation feature of an (x−1)th video frame in the video segment, wherein the video segment includes N video frames, N being an integer greater than 2, and x being an integer.

Video super-resolution is meant to reconstruct a high-resolution video corresponding to a given a low-resolution video. The image processing method provided by embodiments of the present disclosure can derive a corresponding high-resolution video by reconstructing a low-resolution video.

For example, video data to be processed can be regarded as one video segment, or be divided into multiple video segments which are independent of each other.

In a possible implementation, the method may further comprise:

determining at least two key frames in the video data; and

dividing the video data into at least one video segment based on the key frames.

For example, the first and last frames in the video data can be regarded as key frames, and the video data is regarded as one video segment. Alternatively, at least two key frames in the video data may be determined according to a preset number of interval frames. For example, the first frame in the video data is taken as a key frame, two adjacent key frames in the video data are spaced by a preset number of interval frames, and the video data is divided into multiple video segments based on every two adjacent key frames. Alternatively, the first frame in the video data is taken as a key frame, and for an Nth key frame, the optical flow between any frame that follows the Nth key frame and the Nth key frame is determined. If an average value of the optical flow is greater than a threshold, this frame is regarded as the (N+1)th key frame. Dividing the video data into multiple video segments based on every two adjacent key frames can ensure that video frames in the same video segment are correlated to each other to some extent.

In reconstructing a high-resolution image of the xth video frame in the video segment, a backward propagation feature of the (x+1)th video frame in the video segment, and/or a forward propagation feature of the (x−1)th video frame in the video segment can be acquired. A backward propagation feature of all video frames other than the first video frame (the second video frame, the third video frame, . . . , the (N−1)th video frame) in the video segment can be determined based on a backward propagation feature of a video frame that immediately follows a current video frame, and the determined backward propagation feature can be transferred to a video frame that immediately precedes the current video frame so that a backward propagation feature of the video frame that immediately precedes the current video frame can be determined based on a backward propagation feature of the current video frame. A forward propagation feature of all video frames other than the Nth video frame can be determined based on a forward propagation feature of a video frame that immediately precedes a current video frame, and the determined forward propagation feature can be transferred to a video frame that immediately follows the current video frame so that a forward propagation feature of the video frame that immediately follows the current video frame can be determined based on a forward propagation feature of the current video frame.

Step S12 of deriving a reconstruction feature of the xth video frame from at least one of the xth video frame, the backward propagation feature of the (x+1)th video frame, and the forward propagation feature of the (x−1)th video frame.

For example, after the backward propagation feature of the (x+1)th video frame, and/or the forward propagation feature of the (x−1)th video frame is/are acquired, a reconstruction feature of the xth video frame can be derived by performing feature extraction based on at least one of the xth video frame, the backward propagation feature of the (x+1)th video frame, and the forward propagation feature of the (x−1)th video frame. For instance, in the case of 1<x<N, a reconstruction feature of the xth video frame can be derived from the xth video frame, the backward propagation feature of the (x+1)th video frame, and the forward propagation feature of the (x−1)th video frame. In the case of x=1, a reconstruction feature of the xth video frame can be derived from the xth video frame or the backward propagation feature of the (x+1)th video frame. In the case of x=N, a reconstruction feature of the xth video frame can be derived from the xth video frame or the forward propagation feature of the (x−1)th video frame. For instance, a reconstruction feature of the xth video frame can be derived by performing, by means of a neural network configured to extract a reconstruction feature, convolution on at least one of the xth video frame, the backward propagation feature of the (x+1)th video frame, and the forward propagation feature of the (x−1)th video frame.

Step S13 of deriving a target video frame corresponding to the xth video frame by reconstructing the xth video frame based on the reconstruction feature of the xth video frame, wherein the target video frame has resolution higher than that of the xth video frame.

For example, a high-resolution reconstruction feature can be derived by amplifying the reconstruction feature of the xth video frame by means of convolution and multi-channel recombination. Then, a target video frame corresponding to the xth video frame is derived by performing up-sampling on the x-th video frame to derive an up-sampling result, and adding the high-resolution reconstruction feature and the up-sampling result. The target video frame has resolution higher than that of the xth video frame, that is, the target video frame is a high-resolution image frame of the xth video frame.

Illustratively, FIG. 2 shows a schematic structural diagram of a neural network configured to reconstruct a high-resolution image. A convolution module 202 performs convolution on a reconstruction feature 201 of the xth video frame (p_x) to give a convolution result. A pixel recombination module 203 processes the convolution result to give a first processing result, which is then subjected to processing by a convolution module 204 and a pixel recombination module 205 to give a second processing result. A convolution module 206 and a convolution module 207 perform convolution twice on the second processing result to give an amplified reconstruction feature. The xth video frame (p_x) is subjected to up-sampling to give an up-sampling result. Adding the up-sampling result and the amplified reconstruction feature together results in a target video frame 208 corresponding to the xth video frame.

In this way, it is possible to acquire at least one of a backward propagation feature of the (x+1)th video frame and a forward propagation feature of the (x−1)th video frame, and thus possible to derive a reconstruction feature of the xth video frame from at least one of the xth video frame, the backward propagation feature of the (x+1)th video frame, and the forward propagation feature of the (x−1)th video frame. Further, it is possible to derive a target video frame corresponding to the xth video frame by reconstructing the xth video frame based on the reconstruction feature of the xth video frame, wherein the target video frame has resolution higher than that of the xth video frame. The image processing method provided by an embodiment of the present disclosure improves the reconstruction efficiency of high-resolution images, reduces computation costs, saves plenty of feature extraction and aggregation time, and improves reconstruction accuracy by making use of temporal continuity in natural videos by determining a reconstruction feature of any video frame from features transferred by a preceding video frame and a following video frame—that is, using features of proximate video frames without extracting features from scratch.

In a possible implementation, deriving a reconstruction feature of the xth video frame from at least one of the xth video frame, a backward propagation feature of the (x+1)th video frame, and a forward propagation feature of the (x−1)th video frame may comprise:

determining a backward propagation feature of the xth video frame based on the xth video frame, the (x+1)th video frame, and the backward propagation feature of the (x+1)th video frame;

using the forward propagation feature of the xth video frame as a reconstruction feature of the xth video frame.

For example, a backward propagation feature of the xth video frame can be derived by distorting the backward propagation feature of the (x+1)th video frame by means of the xth video frame and the (x+1)th video frame to achieve feature alignment.

deriving a first optical flow diagram from the xth video frame and the (x+1)th video frame;

deriving a distorted backward propagation feature by distorting the backward propagation feature of the (x+1)th video frame based on the first optical flow diagram; and

deriving a backward propagation feature of the xth video frame from the distorted backward propagation feature and the xth video frame.

For example, as shown in FIG. 3, it is possible to predict a first optical flow diagram (denoted by s_x⁺ in FIG. 3) between the xth video frame (denoted by p_xin FIG. 3) and the (x+1)th video frame (denoted by p_x+1in FIG. 3) from the xth video frame and the (x+1)th video frame, derive a distorted backward propagation feature by feature aligning the backward propagation feature (denoted by b_x+1in FIG. 3) of the (x+1)th video frame with the xth video frame based on the first optical flow diagram s_x⁺, and derive a backward propagation feature (denoted by b_xin FIG. 3) of the xth video frame from the distorted backward propagation feature and the xth video frame.

Illustratively, a backward propagation feature of the xth video frame (denoted by p_x) can be determined using a neural network configured to determine a backward propagation feature shown in FIG. 4, in which 401 denotes a convolution module, and 402 denotes a residual module. To be specific, the first step is to derive a distorted backward propagation feature by distorting the backward propagation feature b_x+1of the (x+1)th video frame by means of a first optical flow diagram between the xth video frame and the (x+1)th video frame to construct correspondence between the xth video frame and the backward propagation feature b_x+1of the (x+1)th video frame. The next step is to derive a backward propagation feature b_xof the xth video frame by performing convolution a number of times on the distorted backward propagation feature and the xth video frame and using the convolution result as input of the residual module.

Subsequent to deriving the backward propagation feature b_xof the xth video frame, a forward propagation feature of the xth video frame can be determined based on the backward propagation feature b_xof the xth video frame.

In a possible implementation, determining a forward propagation feature of the xth video frame based on the xth video frame, the (x−1)th video frame, the forward propagation feature of the (x−1)th video frame, and the backward propagation feature of the xth video frame may comprise:

deriving a second optical flow diagram from the xth video frame and the (x−1)th video frame;

deriving a distorted forward propagation feature by distorting the forward propagation feature of the (x−1)th video frame based on the second optical flow diagram; and

deriving a forward propagation feature of the xth video frame from the backward propagation feature of the xth video frame, the distorted forward propagation feature, and the xth video frame.

For example, as shown in FIG. 5, it is possible to predict a second optical flow diagram (denoted by s_x′ in FIG. 5) between the xth video frame (denoted by p_xin FIG. 5) and the (x−1)th video frame (denoted by p_x−1in FIG. 5) from the xth video frame and the (x−1)th video frame, derive a distorted forward propagation feature by feature aligning the forward propagation feature (denoted by f_x−1in FIG. 5) of the (x−1)th video frame with the xth video frame based on the second optical flow diagram s_x⁺, and derive a forward propagation feature (denoted by f_xin FIG. 5) of the xth video frame from the distorted forward propagation feature, the backward propagation feature of the xth video frame, and the xth video frame.

Illustratively, a forward propagation feature of the xth video frame can be determined using a neural network configured to determine a forward propagation feature shown in FIG. 6, in which 601 denotes a convolution module, and 602 denotes a residual module. To be specific, the first step is to derive a distorted forward propagation feature by distorting the forward propagation feature f_x−1of the (x−1)th video frame by means of a second optical flow diagram between the xth video frame and the (x−1)th video frame to construct correspondence between the xth video frame and the forward propagation feature f_x−1of the (x−1)th video frame. The next step is to derive a forward propagation feature f_xof the xth video frame by performing convolution a number of times on the distorted forward propagation feature, the backward propagation feature of the xth video frame, and the xth video frame and using the convolution result as input of the residual module.

deriving a forward propagation feature of the xth video frame by performing feature extraction on the xth video frame; and

using the forward propagation feature of the xth video frame as a reconstruction feature of the xth video frame.

For instance, feature extraction can be performed on a first video frame and optional neighboring frames (a preset number of video frames that are sequentially associated with the first video frame), and the extracted feature is transferred as a forward propagation feature of the first video frame to a second video frame. Doing so allows for predicting a forward propagation feature of the second video frame from the forward propagation feature of the first video frame, and transferring it to a third video frame . . . , until a forward propagation feature of a (N−2)th video frame is predicted based on the forward propagation feature of a (N−1)th video frame. Embodiments of the present disclosure are not meant to limit how to extract a feature from a video frame, and any method capable of image feature extraction is acceptable.

After the forward propagation feature of the first video frame is extracted, it can be used as a reconstruction feature of the first video frame, and then a high-resolution image reconstruction on the first video frame can be performed based on the reconstruction feature of the first video frame, thereby deriving a target video frame corresponding to the first video frame. The target video frame is the high-resolution image of the first video frame. Embodiments of the present disclosure are not meant to limit how to perform image reconstruction on the first video frame, and consulting relevant techniques will work.

deriving a backward propagation feature of the xth video frame by performing feature extraction on the xth video frame; and

using the backward propagation feature of the xth video frame as a reconstruction feature of the xth video frame.

For instance, a feature can be extracted from an Nth video frame and optional neighboring frames (a preset number of video frames that are sequentially associated with the Nth video frame), and the extracted feature is transferred as a backward propagation feature of the Nth video frame to the (N−1)th video frame. Doing so allows for predicting a backward propagation feature of the (N−1)th video frame from the forward propagation feature of the Nth video frame, and transferring it to the (N−2)th video frame . . . , until a backward propagation feature of the second video frame is predicted from the backward propagation feature of the third video frame. Embodiments of the present disclosure are not meant to limit how to extract a feature from a video frame, and any method capable of image feature extraction is acceptable.

After the forward propagation feature of the Nth video frame is extracted, the backward propagation feature of the Nth video frame can be used as a reconstruction feature of the Nth video frame, and then a high-resolution image reconstruction on the Nth video frame can be performed based on the reconstruction feature of the Nth video frame, thereby deriving a target video frame corresponding to the Nth video frame. The target video frame is the high-resolution image of the Nth video frame. Embodiments of the present disclosure are not meant to limit how to perform image reconstruction on the Nth video frame, and consulting relevant techniques will work.

In this way, embodiments of the present disclosure allows for high-resolution reconstruction of all video frames in a video segment by performing feature extraction only on the first video frame and the Nth video frame in the video segment, thereby making reconstruction of high-resolution images more efficient and reducing the computation costs.

acquiring a backward propagation feature of the (x+1)th video frame for the xth video frame;

deriving a forward propagation feature of the xth video frame from the xth video frame and the backward propagation feature of the (x+1)th video frame; and

using the forward propagation feature of the xth video frame as a reconstruction feature of the xth video frame.

For example, a backward propagation feature of the second video frame can be determined using a neural network configured to determine a backward propagation feature shown in FIG. 4. The first step is to acquire a backward propagation feature of the second video frame and derive a distorted backward propagation feature by distorting the backward propagation feature of the second video frame by means of an optical flow diagram between the first video frame and the second video frame to construct correspondence between the first video frame and the backward propagation feature of the second video frame. The next step is to derive a forward propagation feature of the first video frame by performing convolution a number of times on the distorted backward propagation feature and the first video frame and using the convolution result as input of the residual module. The further step is to transfer the forward propagation feature as a reconstruction feature of the first video frame to the second video frame, predict a forward propagation feature of the second video frame from the forward propagation feature of the first video frame, and transfer the forward propagation feature of the second video frame to the third video frame . . . until a backward propagation feature of the Nth video frame is predicted from the forward propagation feature of the (N−1)th video frame.

After the reconstruction feature of the first video frame is determined, the target video frame of the first video frame can be reconstructed using a neural network configured to reconstruct a high-resolution image as shown in FIG. 2.

acquiring a forward propagation feature of the (x−1)th video frame for the xth video frame;

deriving a backward propagation feature of the xth video frame from the xth video frame and the forward propagation feature of the (x−1)th video frame; and

using the backward propagation feature of the xth video frame as a reconstruction feature of the xth video frame.

For example, the first step is to acquire a forward propagation feature of the (N−1)th video frame and derive a distorted forward propagation feature by distorting the forward propagation feature of the (N−1)th video frame by means of an optical flow diagram between the Nth video frame and the (N−1)th video frame to construct correspondence between the Nth video frame and the forward propagation feature of the (N−1)th video frame. The next step is to derive a backward propagation feature of the Nth video frame by performing convolution a number of times on the distorted forward propagation feature and the Nth video frame and using the convolution result as input of the residual module. The further step is to transfer the backward propagation feature as a reconstruction feature of the Nth video frame to the (N−1)th video frame, predict a backward propagation feature of the (N−1)th video frame from the backward propagation feature of the Nth video frame, and transfer the backward propagation feature of the (N−1)th video frame to the (N−2)th video frame . . . until a forward propagation feature of the first video frame is predicted from the backward propagation feature of the second video frame.

In this way, embodiments of the present disclosure allow for high-resolution reconstruction of all video frames in a video segment without extracting a feature from any one of the video frames in the video segment, thereby making reconstruction of high-resolution images more efficient and reducing the computation costs.

In order for a person skilled in the art to better understand embodiments of the present disclosure, the following is an explanation of embodiments of the present disclosure by means of examples.

As shown in FIG. 7, for the video segment S (p₁to p_N), the processing is to derive a backward propagation feature of the Nth video frame by performing feature extraction on the Nth video frame, reconstruct a high-resolution image of the Nth video frame based on the backward propagation feature, transfer the backward propagation feature to the (N−1)th video frame to predict a backward propagation feature of the (N−1)th video frame from the backward propagation feature of the Nth video frame, and transfer the backward propagation feature of the (N−1)th video frame to the (N−2)th video frame . . . until a backward propagation feature of the second video frame is predicted from the backward propagation feature of the second video frame. That is, the backward propagation feature of every video frame in the video segment (p₂to p_N1) can be predicted from the backward propagation feature of a following video frame in the video segment (p₂to p_N−1).

The processing is to derive a forward propagation feature of the first video frame by performing feature extraction on the first video frame, derive the target video corresponding to the first video frame by reconstructing a high-resolution image of the first video frame based on the forward propagation feature, transfer the forward propagation feature of the first video frame to the second video frame to predict a forward propagation feature of the second video frame from the backward propagation feature of the second video frame and the forward propagation feature of the first video frame, derive the target video frame corresponding to the second video frame by reconstructing the second video frame by means of using the forward propagation feature of the second video as the reconstruction feature, transfer the forward propagation feature of the second video frame to the third video frame . . . until a forward propagation feature of the (N−1)th video frame is predicted from the forward propagation feature of the (N−2)th video frame, and derive the target video frame corresponding to the (N−1)th video frame by reconstructing the (N−1)th video frame by means of using the forward propagation feature of the (N−1)th video as the reconstruction feature. That is, the forward propagation feature of every video frame in the video segment (p₂to p_N−1) can be predicted from the forward propagation feature of a preceding video frame in the video segment (p₂to p_N−1), and a corresponding target video frame is derived through reconstruction based on the forward propagation feature.

In a possible implementation, deriving a reconstruction feature of the xth video frame from at least one of the xth video frame, the backward propagation feature of the (x+1)th video frame, and the forward propagation feature of the (x−1)th video frame may comprise:

determining a forward propagation feature of the xth video frame based on the xth video frame, the (x−1)th video frame, and the forward propagation feature of the (x−1)th video frame;

using the backward propagation feature of the xth video frame as a reconstruction feature of the xth video frame.

For example, a backward propagation feature of the xth video frame can be derived by distorting the forward propagation feature of the (x−1)th video frame by means of the xth video frame and the (x−1)th video frame to achieve feature alignment.

deriving a second optical flow diagram from the xth video frame and the (x−1)th video frame;

deriving a distorted forward propagation feature by distorting the forward propagation feature of the (x−1)th video frame based on the second optical flow diagram; and

deriving a forward propagation feature of the xth video frame from the distorted forward propagation feature and the xth video frame.

For example, the processing is to predict a second optical flow diagram between the xth video frame and the (x−1)th video frame from the xth video frame and the (x−1)th video frame, derive a distorted forward propagation feature by feature aligning the forward propagation feature of the (x−1)th video frame with the xth video frame based on the second optical flow diagram to construct correspondence between the xth video frame and the forward propagation feature of the (x−1)th video frame, and further derive a forward propagation feature of the xth video frame from the distorted forward propagation feature and the xth video frame. As an example, a forward propagation feature of the xth video frame can be derived by performing convolution a number of times on the distorted forward propagation feature and the xth video frame and using the convolution result as input of the residual module.

After the forward propagation feature of the xth video frame is acquired, a backward propagation feature of the xth video frame can be determined from its forward propagation feature.

deriving a first optical flow diagram from the xth video frame and the (x+1)th video frame;

deriving a distorted backward propagation feature by distorting the backward propagation feature of the (x+1)th video frame based on the first optical flow diagram; and

deriving a backward propagation feature of the xth video frame from the forward propagation feature of the xth video frame, the distorted backward propagation feature, and the xth video frame.

For example, the processing may be to predict a first optical flow diagram between the xth video frame and the (x+1)th video frame from the xth video frame and the (x+1)th video frame, derive a distorted forward propagation feature by feature aligning the backward propagation feature of the (x+1)th video frame with the xth video frame based on the second optical flow diagram to construct correspondence between the xth video frame and the backward propagation feature of the (x+1)th video frame, and further derive a backward propagation feature of the xth video frame from the distorted backward propagation feature, the forward propagation feature of the xth video frame, and the xth video frame. As an example, a backward propagation feature of the xth video frame can be derived by performing convolution a number of times on the distorted backward propagation feature, the forward propagation feature of the xth video frame, and the xth video frame and using the convolution result as input of the residual module.

In order for a person skilled in the art to better understand embodiments of the present disclosure, the following is an explanation of embodiments of the present disclosure by means of examples.

As shown in FIG. 8, for the video segment S (p₁to p_N), the processing is to derive a forward propagation feature of the first video frame by performing feature extraction on the first video frame, reconstruct a high-resolution image of the first video frame based on the forward propagation feature, transfer the forward propagation feature to the second video frame to predict a forward propagation feature of the second video frame from the forward propagation feature of the first video frame, and transfer the forward propagation feature of the second video frame to the third video frame . . . until a forward propagation feature of the (N−1)th video frame is predicted from the forward propagation feature of the (N−2)th video frame. That is, it is possible to predict the forward propagation feature of every video frame in the video segment (p₂to p_N−1) from the forward propagation feature of a preceding video frame in the video segment (p₂to p_N−1).

The processing is to derive a backward propagation feature of the Nth video frame by performing feature extraction on the Nth video frame, derive the target video corresponding to the Nth video frame by reconstructing a high-resolution image of the Nth video frame based on the backward propagation feature, transfer the backward propagation feature of the Nth video frame to the (N−1)th video frame to predict a backward propagation feature of the (N−1)th video frame from the forward propagation feature of the (N−1)th video frame and the backward propagation feature of the Nth video frame, derive the target video frame corresponding to the (N−1)th video frame by reconstructing the (N−1)th video frame by means of using the backward propagation feature of the (N−1)th video as the reconstruction feature, transfer the backward propagation feature of the (N−1)th video frame to the (N−2)th video frame . . . until a backward propagation feature of the second video frame is predicted from the backward propagation feature of the third video frame, and derive the target video frame corresponding to the second video frame by reconstructing the second video frame by means of using the backward propagation feature of the second video as the reconstruction feature. That is, it is possible to predict the backward propagation feature of every video frame in the video segment (p₂to p_N−1) from the backward propagation feature of a following video frame the video segment (p₂to p_N−1) and derive a corresponding target video frame through reconstruction based on the backward propagation feature.

It can be appreciated that the various method embodiments mentioned above in the present disclosure may all be combined, without departing from principles and logics, with each other to form a combined embodiment. In this regard, no more detail is given herein. It can be appreciated by a person skilled in the art that in the methods described in the above embodiments, the order of carrying out the steps should be determined by their functions and logic.

The present disclosure further provides an image processing device, an electronic apparatus, a computer-readable storage medium, and a program that are capable of implementing any of the image processing methods provided by the present disclosure. For details of the image processing device, electronic apparatus, computer-readable storage medium, and program, see the foregoing description of the methods.

FIG. 9 shows a block diagram of an image processing device according to an embodiment of the present disclosure. As shown in FIG. 9, the image processing device comprises:

an acquiring module 901 configured to acquire at least one of a backward propagation feature of an (x+1)th video frame in a video segment and a forward propagation feature of an (x−1)th video frame in the video segment, wherein the video segment includes N video frames, N being an integer greater than 2, and x being an integer;

a first processing module 902 configured to derive a reconstruction feature of the xth video frame from at least one of the xth video frame, the backward propagation feature of the (x+1)th video frame, and the forward propagation feature of the (x−1)th video frame; and

a second processing module 903 configured to derive a target video frame corresponding to the xth video frame by reconstructing the xth video frame based on the reconstruction feature of the xth video frame, wherein the target video frame has resolution higher than that of the xth video frame.

In some embodiments of the present disclosure, the functions possessed or modules contained by the device provided by embodiments of the present disclosure can be used to perform the methods described in the above method embodiments. For implementation and technical effects of the device, see the foregoing description of the method embodiments, and no more detail is given herein for conciseness sake.

In a possible implementation, in the case of 1<x<N, the first processing module may further be configured to determine a backward propagation feature of the xth video frame based on the xth video frame, the (x+1)th video frame, and the backward propagation feature of the (x+1)th video frame;

use the forward propagation feature of the xth video frame as a reconstruction feature of the xth video frame.

In a possible implementation, the first processing module may further be configured to derive a first optical flow diagram from the xth video frame and the (x+1)th video frame;

derive a distorted backward propagation feature by distorting the backward propagation feature of the (x+1)th video frame based on the first optical flow diagram; and

derive a backward propagation feature of the xth video frame from the distorted backward propagation feature and the xth video frame.

In a possible implementation, the first processing module may further be configured to derive a second optical flow diagram from the xth video frame and the (x−1)th video frame;

derive a distorted forward propagation feature by distorting the forward propagation feature of the (x−1)th video frame based on the second optical flow diagram; and

derive a forward propagation feature of the xth video frame from the backward propagation feature of the xth video frame, the distorted forward propagation feature, and the xth video frame.

In a possible implementation, in the case of 1<x<N, the first processing module may further be configured to determine a forward propagation feature of the xth video frame based on the xth video frame, the (x−1)th video frame, and the forward propagation feature of the (x−1)th video frame;

use the backward propagation feature of the xth video frame as a reconstruction feature of the xth video frame.

In a possible implementation, the first processing module may further be configured to derive a second optical flow diagram from the xth video frame and the (x−1)th video frame;

derive a distorted forward propagation feature by distorting the forward propagation feature of the (x−1)th video frame based on the second optical flow diagram; and

derive a forward propagation feature of the xth video frame from the distorted forward propagation feature and the xth video frame.

In a possible implementation, the first processing module may further be configured to derive a first optical flow diagram from the xth video frame and the (x+1)th video frame;

derive a distorted backward propagation feature by distorting the backward propagation feature of the (x+1)th video frame based on the first optical flow diagram; and

derive a backward propagation feature of the xth video frame from the forward propagation feature of the xth video frame, the distorted backward propagation feature, and the xth video frame.

In a possible implementation, in the case of x=1, the first processing module may further be configured to:

derive a forward propagation feature of the xth video frame by performing feature extraction on the xth video frame; and

use the forward propagation feature of the xth video frame as a reconstruction feature of the xth video frame.

In a possible implementation, in the case of x=N, the first processing module may further be configured to:

acquire a backward propagation feature of the xth video frame by performing feature extraction on the xth video frame; and