This patent application claims the benefit and priority of Chinese Patent Application No. 202110196819.X, entitled “SUPER-RESOLUTION IMAGE RECONSTRUCTION METHOD BASED ON DEEP CONVOLUTIONAL SPARSE CODING”, filed with the Chinese State Intellectual Property Office on Feb. 22, 2021, which is incorporated by reference in its entirety herein.
The present disclosure belongs to the technical field of super-resolution (SR) image reconstruction, and particularly relates to an SR image reconstruction method based on deep convolutional sparse coding (DCSC).
Currently, as a classical problem in digital imaging and low-level computer vision, SR image reconstruction aims to reconstruct a high-resolution (HR) image from a single low-resolution (LR) input image, and has been widely applied in fields requiring rich image detail, from security and surveillance imaging to medical and satellite imaging. Since the visual quality of images is degraded by imperfect imaging systems, transmission media and recording devices, SR reconstruction is needed to obtain high-quality digital images.
In recent years, SR image reconstruction has been widely researched in computer vision, and the known SR image reconstruction methods are mainly classified into two types: interpolation-based methods and modeling-based methods. Interpolation-based methods, such as Bicubic interpolation and Lanczos resampling, are highly efficient to implement but cause over-smoothing of the images. On the contrary, iterative back projection (IBP) methods may generate images with over-sharpened edges; hence, many image interpolation methods are applied in a post-processing (edge sharpening) stage of the IBP methods. Modeling-based methods are intended to model the mapping from LR images to HR images. For example, sparse coding methods reconstruct HR image blocks from the sparse representation coefficients of LR image blocks, and such sparse prior-based methods are typical SR reconstruction methods; self-similarity methods add structural self-similarity information of LR image blocks to the reconstruction process of the HR images; and neighbor embedding methods embed neighbors of LR image blocks into the nearest atoms in dictionaries and pre-calculate the corresponding embedding matrices to reconstruct HR image blocks. In solving these methods, each step is endowed with specific mathematical and physical significance, which ensures that these methods can be interpreted, correctly improved under theoretical guidance, and yield desirable effects; in particular, sparse models have seen significant development in the field of SR reconstruction. Nevertheless, most of these methods have two main defects: they are computationally complicated during optimization, making reconstruction time-consuming; and they involve manual selection of many parameters, leaving room for improvement in reconstruction performance.
In order to break through the limitations of the above classical methods, a pioneering deep learning-based model, namely the SR convolutional neural network (SRCNN), emerged and brought a new direction. This method predicts the nonlinear mapping from LR images to HR images through a fully convolutional network (FCN), meaning that all SR information is obtained through data learning, namely the parameters in the network are adaptively optimized through backpropagation (BP). This method makes up for the shortcomings of the classical learning methods and yields better performance. However, it has its own limitations: the uninterpretable network structure can only be designed through repeated testing and is hard to improve; and the method depends on the context of small image regions and is insufficient to restore image details. Therefore, a novel SR image reconstruction method is urgently needed.
Through the above analysis, there are the following problems and defects in the prior art:
(1) The existing SRCNN structure is uninterpretable, can only be designed through repeated testing, and is hard to improve; and
(2) the existing SRCNN depends on the context of the small image regions and is insufficient to restore the image details.
The difficulty in solving the above problems and defects lies in that the existing SRCNN structure is uninterpretable, can only be designed through repeated testing, and is hard to improve; and the structure depends on the context of small image regions and is insufficient to restore image details.
Solving the above problems and defects is helpful in: breaking through the limitations of the classical methods; using the interpretability of the network to guide the design of a better network architecture that improves performance, rather than simply stacking network layers; and expanding the context of the image regions to better restore image details.
In view of the problems of the conventional art, the present disclosure provides an SR image reconstruction method based on DCSC.
The present disclosure is implemented as follows: An SR image reconstruction method based on DCSC includes the following steps:
step 1: embedding a multi-layer learned iterative soft thresholding algorithm (ML-LISTA) into a deep convolutional neural network (DCNN), adaptively updating all parameters of the ML-LISTA with the learning ability of the DCNN, and constructing an SR multi-layer convolutional sparse coding (SRMCSC) network, which is an interpretable end-to-end supervised neural network for SR image reconstruction, where the interpretability of the network may help design a better network architecture to improve performance, rather than simply stacking network layers; and
step 2: introducing residual learning, extracting a residual feature with the ML-LISTA, and reconstructing an HR image in combination with the residual feature and an input image, thereby accelerating a training speed and a convergence speed of the SRMCSC network.
In some embodiments, in constructing a multi-layer convolutional sparse coding (ML-CSC) model in step 1:
sparse coding (SC) is implemented to find the sparsest representation γ∈R^M of a signal y∈R^N in a given overcomplete dictionary A∈R^{N×M} (M>N), which is expressed as y=Aγ; and the following ℓ1 problem, also called the Lasso or ℓ1-regularized basis pursuit (BP) problem, is solved:

min_γ (1/2) ||y − Aγ||_2^2 + α ||γ||_1 (1)
where, the constant α is used to weigh the reconstruction term against the regularization term; and the update equation of the iterative soft thresholding algorithm (ISTA) may be written as:

γ^(i+1) = S_{α/L}( γ^i + (1/L) A^T (y − A γ^i) ) (2)
where, γ^i represents the ith iteration update, L is a Lipschitz constant, and S_ρ(·) is a soft thresholding operator with a threshold ρ; and the soft thresholding operator is defined as follows:

S_ρ(z) = sign(z) · max(|z| − ρ, 0)
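For illustration only (not part of the claimed embodiments), the ISTA update of equations (1)-(2) may be sketched in Python as follows; the dictionary A, the signal y and all names here are illustrative assumptions:

```python
import numpy as np

def soft_threshold(z, rho):
    # S_rho(z) = sign(z) * max(|z| - rho, 0)
    return np.sign(z) * np.maximum(np.abs(z) - rho, 0.0)

def ista(y, A, alpha, n_iter=200):
    # L: Lipschitz constant of the gradient of the reconstruction term,
    # taken here as the squared spectral norm of A
    L = np.linalg.norm(A, 2) ** 2
    gamma = np.zeros(A.shape[1])
    for _ in range(n_iter):
        # Gradient step on (1/2)||y - A gamma||^2, then soft thresholding (eq. (2))
        gamma = soft_threshold(gamma + A.T @ (y - A @ gamma) / L, alpha / L)
    return gamma

# Toy overcomplete dictionary (M > N) and a synthetic sparse signal
rng = np.random.default_rng(0)
A = rng.standard_normal((32, 64))
gamma_true = np.zeros(64)
gamma_true[[3, 17, 40]] = [1.0, -0.5, 2.0]
gamma_hat = ista(A @ gamma_true, A, alpha=0.05)
```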
In some embodiments, constructing an ML-CSC model in step 1 may further include: proposing a convolutional sparse coding (CSC) model to perform the SC on a whole image, where the image may be obtained by performing convolution on m local filters d_i∈R^n (n≪N) and their corresponding feature maps γ_i∈R^N and linearly combining the convolution results, which is expressed as

x = Σ_{i=1}^m d_i ∗ γ_i
and corresponding to equation (1), the optimization problem of the CSC model may be written as:

min_{γ_i} (1/2) ||x − Σ_{i=1}^m d_i ∗ γ_i||_2^2 + α Σ_{i=1}^m ||γ_i||_1 (3)

and
converting the filters into a banded circulant matrix to construct a special global convolutional dictionary D∈R^{N×mN}, such that x=Dγ, where in the convolutional dictionary D, all small blocks each serve as a local dictionary of the same size of n×m elements, with the filters {d_i}_{i=1}^m as respective columns; the CSC model (3) may be considered a special form of the SC model (1), the matrix multiplication in equation (2) of the ISTA is replaced by a convolution operation, and the CSC problem (3) may also be solved by the LISTA.
A thresholding operator may be a basis of a convolutional neural network (CNN) and the CSC model; comparing a rectified linear unit (ReLU) in the CNN with the soft thresholding function shows that the two keep consistent in the non-negative part; and for a non-negative CSC model, the corresponding optimization problem (1) may be added with a constraint to force the result to be non-negative:

min_γ (1/2) ||y − Dγ||_2^2 + α ||γ||_1 s.t. γ ≥ 0 (4)
naturally, a resulting question may be whether the constraint affects the expressive ability of the original sparse model; as a matter of fact, there may be no doubt, because a negative coefficient of the original sparse model may be transferred to the dictionary; and a given signal y=Dγ may be written as:

y = Dγ_+ + (−D)(−γ_−) (5)
where, γ may be divided into γ_+ and γ_−, γ_+ includes the positive elements, γ_− includes the negative elements, and both γ_+ and −γ_− are non-negative; apparently, a non-negative sparse representation [γ_+ −γ_−]^T may be allowable for the signal y in the dictionary [D −D]; therefore, each SC may be converted into non-negative SC (NNSC), and the NNSC problem (4) may also be solved by the soft thresholding algorithm; and a non-negative soft thresholding operator S_ρ^+ is defined as:

S_ρ^+(z) = max(z − ρ, 0)
meanwhile, assuming that γ^0=0, the iterative update of γ in the problem (4) may be written as:

γ^(i+1) = S_{α/L}^+( γ^i + (1/L) D^T (y − D γ^i) ) (6)
the non-negative soft thresholding operator is equivalent to the ReLU function:

S_ρ^+(z) = max(z − ρ, 0) = ReLU(z − ρ) (7)
therefore, with γ^0 = 0, equation (6) is equivalently written as:

γ = ReLU(W y + b) (8)
where, the bias vector b corresponds to the threshold α/L; in other words, α is a hyper-parameter in the SC but a learned parameter in the CNN; furthermore, dictionary learning may be completed through D=W^T; and therefore, the non-negative soft thresholding operator closely associates the CSC model with the CNN.
In some embodiments, constructing an ML-CSC model in step 1 may further include:
assuming that the convolutional dictionary D may be decomposed into the multiplication of multiple matrices, namely x=D_1D_2···D_Lγ_L; and describing the ML-CSC model as:

x = D_1γ_1, γ_1 = D_2γ_2, …, γ_{L−1} = D_Lγ_L, with each γ_i sparse
where, γ_i is the sparse representation of the ith layer and also the signal of the (i+1)th layer, and D_i is the convolutional dictionary of the ith layer and the transpose of a convolutional matrix; the effective dictionary {D_i}_{i=1}^L serves as an analysis operator causing the sparse representation of a shallow layer to be less sparse; consequently, different representation layers are used in an analysis-based prior and a synthesis-based prior, such that the prior information may not only constrain the sparsity of the sparse representation of the deepest layer, but also allow the sparse representations of the shallow layers to be less sparse; the ML-CSC is also a special form of the SC model (1); and therefore, for a given signal, assuming γ_0 = y, the optimization object of the ith layer in the ML-CSC model may be written as:

min_{γ_i} (1/2) ||γ_{i−1} − D_iγ_i||_2^2 + α_i ||γ_i||_1 (9)
where, α_i is the regularization parameter of the ith layer; similar to equation (2), the ISTA is used to obtain an update of γ_i in the problem (9); the ISTA is repeated to obtain an ML-ISTA of {γ_i}_{i=1}^L, and the ML-ISTA converges at a rate of O(1/k) to a globally optimal solution of the ML-CSC; and proposing the ML-LISTA, which approximates the SC of the ML-ISTA by learning parameters from data,
where, (I − W_i^T W_i) γ̂_i + B_i^T γ_{i−1}^{k+1} replaces the ISTA iterative operator γ̂_i + D_i^T ( γ_{i−1}^{k+1} − D_i γ̂_i ), to which it reduces when W_i = B_i = D_i;
the dictionary D_i in the ML-LISTA is decomposed into two dictionaries W_i and B_i of the same size, and each of W_i and B_i is also constrained to be a convolutional dictionary to control the number of parameters; and if the deepest sparse representation with an initial condition of γ_L^1 = 0 is found through only one iteration, the representation may be rewritten as:
γ_L = P_{ρ_L}( B_L^T P_{ρ_{L−1}}( … P_{ρ_1}( B_1^T y ) ) ) (10)
In some embodiments, if a non-negative assumption similar to equation (4) is made for the sparse representation coefficients, the thresholding operator P may be a non-negative projection; the process of obtaining the deepest sparse representation may be equivalent to that of obtaining a stable solution of a neural network, namely the forward propagation of the CNN may be understood as a pursuit algorithm for obtaining a sparse representation of a given input signal; the dictionary D_i in the ML-CSC model may be embedded into a learnable convolution kernel for each of W_i and B_i, namely a dictionary atom in B_i^T (or W_i^T) may represent a convolutional filter in the CNN, and each of W_i and B_i may be modeled with an independent convolutional kernel; and the threshold ρ_i may be parallel to a bias vector b_i, and the non-negative soft thresholding operator may be equivalent to the activation function ReLU of the CNN.
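For illustration only (the layer sizes, kernel size and weight handling below are simplifying assumptions, not the claimed embodiment), one ML-LISTA-style update may be sketched with convolutions, where B_i^T and W_i^T act as analysis convolutions, W_i is applied through a transposed convolution with the same kernel, and the non-negative thresholding becomes a ReLU with a learned threshold:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLLISTAUpdate(nn.Module):
    """One illustrative ML-LISTA update: (I - W^T W) gamma_hat + B^T gamma_prev,
    followed by non-negative soft thresholding (ReLU with a learned threshold)."""
    def __init__(self, c_in, c_out, k=3):
        super().__init__()
        self.pad = k // 2
        self.Bt = nn.Conv2d(c_in, c_out, k, padding=self.pad, bias=False)  # B_i^T
        self.Wt = nn.Conv2d(c_in, c_out, k, padding=self.pad, bias=False)  # W_i^T
        self.rho = nn.Parameter(torch.zeros(1, c_out, 1, 1))               # threshold

    def forward(self, gamma_prev, gamma_hat):
        # Synthesis W_i gamma_hat: the adjoint of W_i^T, i.e. a transposed convolution
        w_gamma = F.conv_transpose2d(gamma_hat, self.Wt.weight, padding=self.pad)
        pre = gamma_hat - self.Wt(w_gamma) + self.Bt(gamma_prev)
        return F.relu(pre - self.rho)  # S+_rho(z) = ReLU(z - rho)

# One update of the first layer on a single-channel image (illustrative sizes)
layer = MLLISTAUpdate(c_in=1, c_out=64)
y = torch.randn(1, 1, 33, 33)
gamma_1 = layer(y, torch.zeros(1, 64, 33, 33))
```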
In some embodiments, establishment of the SRMCSC network may include two steps: an ML-LISTA feature extraction step and an HR image reconstruction step; the network may be an end-to-end system that takes an LR image y as input and directly generates a real HR image x as output; and the depth of the network may be related only to the number of iterations.
Further, in step 1, each layer and each skip connection in the SRMCSC network may strictly correspond to a step of the processing flow of a three-layer LISTA; the unfolded algorithm framework of the three-layer LISTA may serve as the first constituent part of the SRMCSC network, and the first three layers of the network may correspond to the first iteration of the algorithm; the middle hidden layers performing iterative updates in the network may include update blocks; and thus the proposed network may be interpreted as an approximate algorithm for solving a multi-layer BP problem.
Further, in step 2, the residual learning may be implemented by performing K iterations to obtain a sparse feature mapping γ^K, estimating a residual image U, which mainly includes high-frequency detail information, according to the definition of the ML-CSC model and in combination with the sparse feature mapping and the dictionary, and obtaining the final HR image x through equation (11), which serves as the second constituent part of the network:
x=U+y (11).
Performance of the network may depend only on the initial values of the parameters, the number of iterations K and the number of filters; in other words, the network may get deeper only by increasing the number of iterations, without introducing additional parameters, and the filter parameters to be trained by the model may include only three dictionaries of the same size.
Further, a mean squared error (MSE) loss function may be used in the SRMCSC network: N training pairs {y_i, x_i}_{i=1}^N, namely LR-HR patch pairs, may be given to minimize the following objective function:

L(Θ) = (1/2N) Σ_{i=1}^N ||f(y_i; Θ) − x_i||_2^2
where, f(·) is the SRMCSC network, Θ represents all trainable parameters, and the Adam optimizer is used to optimize the parameters of the network.
Another object of the present disclosure is to provide a computer program product stored on a non-transitory computer readable storage medium, including a computer readable program, configured to provide, when executed on an electronic device, a user input interface to implement the SR image reconstruction method based on DCSC.
Another object of the present disclosure is to provide a non-transitory computer readable storage medium, storing instructions, and configured to enable, when run on a computer, the computer to execute the SR image reconstruction method based on DCSC.
With the above technical solutions, the present disclosure has the following advantages and beneficial effects: The SR image reconstruction method based on DCSC provided by the present disclosure proposes the interpretable end-to-end supervised neural network for the SR image reconstruction, namely the SRMCSC network, in combination with the ML-CSC model and the DCNN. The network has a compact structure, easy implementation and desirable interpretability. Specifically, the network is implemented by embedding the ML-LISTA into the DCNN, and adaptively updating all parameters in the ML-LISTA with the strong learning ability of the DCNN. Without introducing additional parameters, the present disclosure can obtain a deeper network by increasing the number of iterations, thereby expanding the context information of the receptive field in the network. However, as the network gets deeper, the convergence speed becomes a key problem for training. Therefore, the present disclosure introduces the residual learning, extracts the residual feature with the ML-LISTA, and reconstructs the HR image in combination with the residual feature and the input image, thereby accelerating the training speed and the convergence speed. In addition, compared with multiple state-of-the-art relevant methods, the present disclosure yields the best reconstruction effect both qualitatively and quantitatively.
To describe the technical solutions in the embodiments of the present disclosure more clearly, the following briefly describes the accompanying drawings needed in the embodiments. Apparently, the accompanying drawings in the following description show merely some embodiments of the present disclosure, and a person of ordinary skill in the art may derive other drawings from these accompanying drawings without creative efforts.
To make the objects, technical solutions and advantages of the present disclosure clearer and more comprehensible, the present disclosure will be further described below in detail in conjunction with embodiments. It should be understood that the specific embodiments described herein are merely intended to explain but not to limit the present disclosure.
In view of the problems of the prior art, the present disclosure provides an SR image reconstruction method based on DCSC. The present disclosure is described below in detail in combination with the accompanying drawings.
As shown in the accompanying drawings, the SR image reconstruction method based on DCSC includes the following steps.
In step S101, the ML-LISTA of the ML-CSC model is embedded into a DCNN to adaptively update all parameters in the ML-LISTA with the learning ability of the DCNN, and thus an interpretable end-to-end supervised neural network for SR image reconstruction, namely an SRMCSC network, is constructed.
In step S102, residual learning is introduced, to extract a residual feature with the ML-LISTA, and reconstruct an HR image in combination with the residual feature and an input image, thereby accelerating a training speed and a convergence speed of the SRMCSC network.
The SR image reconstruction method based on DCSC according to the present disclosure may also be implemented by a person of ordinary skill in the art using other steps.
Technical solutions of the present disclosure are further described below in conjunction with the embodiments.
The present disclosure proposes the interpretable end-to-end supervised neural network for the SR image reconstruction, namely the SRMCSC network, in combination with the ML-CSC model and the DCNN. The network has the compact structure, easy implementation and desirable interpretability. Specifically, the network is implemented by embedding the ML-LISTA into the DCNN, and adaptively updating all parameters in the ML-LISTA with the strong learning ability of the DCNN. Without introducing additional parameters, the present disclosure can obtain a deeper network by increasing the number of iterations, thereby expanding context information of a receptive field in the network. However, while the network gets deeper gradually, the convergence speed becomes a key problem for training. To solve this problem, the present disclosure introduces the residual learning, to extract the residual feature with the ML-LISTA, and reconstruct the HR image in combination with the residual feature and the input image, thereby accelerating the training speed and the convergence speed of the network. In addition, compared with multiple state-of-the-art relevant methods, the present disclosure yields the best reconstruction effect qualitatively and quantitatively.
The present disclosure provides a novel method for solving the SR reconstruction problem: an SR convolutional neural network, named the SRMCSC network, whose framework is shown in the accompanying drawings.
The network structure mainly includes the iterative algorithm for solving the regularized optimization of multi-layer sparsity, namely the ML-LISTA, and the residual learning. The present disclosure mainly uses the residual learning since the LR image and the HR image are similar to a great extent, with the difference shown as the residual in the accompanying drawings.
Therefore, the proposed SRMCSC is an interpretable end-to-end supervised neural network inspired by the ML-CSC model; the network is a recursive network architecture having skip connections, is useful for the SR image reconstruction, and contains network layers strictly corresponding to each step in the processing flow of the unfolded three-layer ML-LISTA. More specifically, the soft thresholding function in the algorithm is replaced by the ReLU activation function, and all parameters and filter weights in the network are updated by minimizing a loss function with BP. Different from the SRCNN, on one hand, the present disclosure can initialize the parameters in the SRMCSC with a more principled method upon a correct understanding of the physical significance of each layer, which is helpful to improve the optimization speed and quality; on the other hand, the network is data-driven and is a novel interpretable network designed in combination with domain knowledge and deep learning. The SRMCSC method proposed by the present disclosure and four typical SR methods are all subjected to benchmark testing on the test sets Set5, Set14 and BSD100. Compared with the typical SR methods, including Bicubic interpolation, the sparse coding method presented by Zeyde et al., local linear neighborhood embedding (NE+LLE), and anchored neighborhood regression (ANR), the method of the present disclosure exhibits an obvious average PSNR gain of about 1-2 dB under all scale factors. Compared with the deep learning method SRCNN, the method of the present disclosure exhibits an obvious average PSNR gain of about 0.4-1 dB under all scale factors; particularly, when the scale factor is 2, the average PSNR value of the method on the test set Set5 is 1 dB higher than that of the SRCNN. Therefore, the method of the present disclosure is more accurate and effective than other methods.
To sum up, the work of the present disclosure is summarized as follows:
(1) The present disclosure provides the interpretable end-to-end CNN for the SR reconstruction, namely the SRMCSC network, with the architecture inspired from the processing flow of the unfolding three-layer ML-LISTA model. The network gets deeper by increasing the number of iterations without introducing additional parameters.
(2) With the residual learning, the method of the present disclosure accelerates the convergence speed in the deep network training to improve the learning efficiency.
(3) Compared with multiple state-of-the-art relevant methods, the present disclosure yields the best reconstruction effect qualitatively and quantitatively and is less time-consuming.
The present disclosure describes the ML-CSC model starting from the SC. The SC has been widely applied in image processing; in particular, steady progress has been made by sparse models in the SR reconstruction field for a long time. The SC aims to find the sparsest representation γ∈R^M of a signal y∈R^N in a given overcomplete dictionary A∈R^{N×M} (M>N), namely y=Aγ; and the following ℓ1 problem, also called the Lasso or ℓ1-regularized BP problem, is solved:

min_γ (1/2) ||y − Aγ||_2^2 + α ||γ||_1 (1)
where, the constant α is used to weigh the reconstruction term against the regularization term. The problem can be solved by various classical methods, such as orthogonal matching pursuit (OMP) and basis pursuit (BP); in particular, the ISTA is a prevalent and effective method for solving the problem (1). The update equation of the ISTA may be written as:

γ^(i+1) = S_{α/L}( γ^i + (1/L) A^T (y − A γ^i) ) (2)
where, γ^i represents the ith iteration update, L is a Lipschitz constant, and S_ρ(·) is a soft thresholding operator with a threshold ρ. The soft thresholding operator is defined as follows:

S_ρ(z) = sign(z) · max(|z| − ρ, 0)
In order to improve the timeliness of the ISTA, a "learned version" of the ISTA, namely the learned iterative soft thresholding algorithm (LISTA), is proposed. The LISTA approximates the SC of the ISTA by learning parameters from data. However, most SC-based methods are implemented by segmenting the whole image into overlapping blocks to relieve the modeling and calculation burdens. These methods ignore the consistency between overlapping blocks, causing a discrepancy between the global image and the local blocks. In view of this, a convolutional sparse coding (CSC) model is proposed to perform the SC on a whole image, where the image may be obtained by performing convolution on m local filters d_i∈R^n (n≪N) and their corresponding feature maps γ_i∈R^N and linearly combining the convolution results, namely

x = Σ_{i=1}^m d_i ∗ γ_i
and corresponding to equation (1), the optimization problem of the CSC model may be written as:

min_{γ_i} (1/2) ||x − Σ_{i=1}^m d_i ∗ γ_i||_2^2 + α Σ_{i=1}^m ||γ_i||_1 (3)
Although solutions for equation (3) have been proposed, the convolution operation may be executed as matrix multiplication, implemented by converting the filters into a banded circulant matrix to construct a special convolutional dictionary D∈R^{N×mN}, namely x=Dγ. In the convolutional dictionary D, all small blocks each serve as a local dictionary of n×m elements, with the filters {d_i}_{i=1}^m as respective columns; thus the CSC model (3) may be considered a special form of the SC model (1), the matrix multiplication in the ISTA update (2) is replaced by a convolution operation, and the CSC problem (3) may also be solved by the LISTA.
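For illustration only (the 1-D setting, the circular boundary handling and all names are simplifying assumptions, not the claimed embodiment), the equivalence between multiplication by a banded circulant dictionary and convolution may be checked as follows:

```python
import numpy as np

n, N = 3, 8
rng = np.random.default_rng(1)
d = rng.standard_normal(n)        # one local filter
gamma = rng.standard_normal(N)    # its feature map

# Banded circulant dictionary: column j holds a copy of d shifted by j
D = np.zeros((N, N))
for j in range(N):
    for t in range(n):
        D[(j + t) % N, j] = d[t]

x_matrix = D @ gamma              # x = D gamma (matrix form)
# Circular convolution of d and gamma, computed via the FFT
x_conv = np.real(np.fft.ifft(np.fft.fft(d, N) * np.fft.fft(gamma)))
assert np.allclose(x_matrix, x_conv)
```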
In some work, it is proposed that the computational efficiency of the CSC can be effectively improved in combination with the computational ability of the CNN, allowing the model to be more adaptive. The thresholding operator is a basis for a CNN and a CSC model; comparing the ReLU in the CNN with the soft thresholding function shows that the two keep consistent in the non-negative part. For a non-negative CSC model, the corresponding optimization problem (1) is added with a constraint to force the result to be non-negative:

min_γ (1/2) ||y − Dγ||_2^2 + α ||γ||_1 s.t. γ ≥ 0 (4)
Naturally, a resulting question is whether the constraint affects the expressive ability of the original sparse model. As a matter of fact, there is no need for concern, because a negative coefficient of the original sparse model may be transferred to the dictionary; for a given signal y=Dγ, the signal may be written as:

y = Dγ_+ + (−D)(−γ_−) (5)
where, γ may be divided into γ_+ and γ_−, γ_+ includes the positive elements, γ_− includes the negative elements, and both γ_+ and −γ_− are non-negative. Apparently, a non-negative sparse representation [γ_+ −γ_−]^T is allowable for the signal y in the dictionary [D −D]. Therefore, each SC may be converted into non-negative SC (NNSC), and the NNSC problem (4) may also be solved by the soft thresholding algorithm. In the present disclosure, a non-negative soft thresholding operator S_ρ^+ may be defined as:

S_ρ^+(z) = max(z − ρ, 0)
Meanwhile, it is assumed that γ^0=0; thus the iterative update of γ in the problem (4) may be written as:

γ^(i+1) = S_{α/L}^+( γ^i + (1/L) D^T (y − D γ^i) ) (6)
In combination with the activation function ReLU in the typical CNN, the non-negative soft thresholding operator is apparently equivalent to the ReLU function:

S_ρ^+(z) = max(z − ρ, 0) = ReLU(z − ρ) (7)
Therefore, with γ^0 = 0, equation (6) is equivalently written as:

γ = ReLU(W y + b) (8)
where, the bias vector b corresponds to the threshold α/L; in other words, α is a hyper-parameter in the SC but a learned parameter in the CNN. Furthermore, dictionary learning may be completed through D=W^T. Therefore, the non-negative soft thresholding operator closely associates the CSC model with the CNN.
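The correspondence in equations (6)-(8) can be checked numerically; the following sketch is illustrative only (the dense dictionary D is an assumption, and W and b are the assumed learnable counterparts), verifying that one NNSC update from γ^0 = 0 equals a ReLU layer:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
D = torch.randn(16, 32)                        # illustrative dictionary
y = torch.randn(16)
L = torch.linalg.matrix_norm(D, 2) ** 2        # Lipschitz constant
alpha = 0.1

# One NNSC update from gamma^0 = 0 (equation (6))
pre = D.t() @ y / L
nnsc = torch.clamp(pre - alpha / L, min=0.0)   # S+_{alpha/L}

# The same step as a CNN layer (equation (8)): gamma = ReLU(W y + b)
W, b = D.t() / L, -alpha / L
relu_layer = F.relu(W @ y + b)
assert torch.allclose(nnsc, relu_layer)
```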
In recent years, inspired by the fact that double sparsity accelerates the training process, the ML-CSC model has been proposed. It is assumed that the convolutional dictionary D may be decomposed into the multiplication of multiple matrices, namely x=D_1D_2···D_Lγ_L. The ML-CSC model may be described as:

x = D_1γ_1, γ_1 = D_2γ_2, …, γ_{L−1} = D_Lγ_L, with each γ_i sparse
where, γ_i is the sparse representation of the ith layer and also the signal of the (i+1)th layer, and D_i is the convolutional dictionary of the ith layer and the transpose of a convolutional matrix. The effective dictionary {D_i}_{i=1}^L serves as an analysis operator, making the sparse representation of a shallow layer less sparse. Consequently, different representation layers are used in an analysis-based prior and a synthesis-based prior, such that the prior information may not only constrain the sparsity of the sparse representation of the deepest layer, but also allow the sparse representations of the shallow layers to be less sparse. The ML-CSC is also a special form of the SC model (1). Therefore, for a given signal (such as an image), it is assumed that γ_0 = y, and the optimization object of the ith layer in the ML-CSC model may be written as:

min_{γ_i} (1/2) ||γ_{i−1} − D_iγ_i||_2^2 + α_i ||γ_i||_1 (9)
where, α_i is the regularization parameter of the ith layer. Similar to equation (2), the ISTA may be used to obtain an update of γ_i in the problem (9). The algorithm is repeated to obtain an ML-ISTA of {γ_i}_{i=1}^L, and it is proved that the ML-ISTA converges at a rate of O(1/k) to a globally optimal solution of the ML-CSC. With inspiration from the LISTA, the ML-LISTA, as described in Algorithm 1, is proposed.
Where, (I − W_i^T W_i) γ̂_i + B_i^T γ_{i−1}^{k+1} replaces the ISTA iterative operator γ̂_i + D_i^T ( γ_{i−1}^{k+1} − D_i γ̂_i ), to which it reduces when W_i = B_i = D_i;
the dictionary D_i in the ML-LISTA is decomposed into two dictionaries W_i and B_i of the same size, and each of W_i and B_i is also constrained to be a convolutional dictionary to control the number of parameters. An interesting point is that if the deepest sparse representation with an initial condition of γ_L^1 = 0 is found through only one iteration, the representation can be rewritten as:
γ_L = P_{ρ_L}( B_L^T P_{ρ_{L−1}}( … P_{ρ_1}( B_1^T y ) ) ) (10)
Further, if a non-negative assumption similar to equation (4) is made for the sparse representation coefficients, the thresholding operator P is a non-negative projection. The process of obtaining the deepest sparse representation is equivalent to that of obtaining a stable solution of a neural network; namely, the forward propagation of the CNN may be understood as a pursuit algorithm for obtaining a sparse representation of a given input signal (such as an image). In other words, the dictionary D_i in the ML-CSC model is embedded into a learnable convolution kernel for each of W_i and B_i; that is, a dictionary atom (a column in the dictionary) in B_i^T (or W_i^T) represents a convolutional filter in the CNN. In order to make full use of the advantages of deep learning, each of W_i and B_i is modeled with an independent convolutional kernel. The threshold ρ_i is parallel to a bias vector b_i, and the non-negative soft thresholding operator is equivalent to the activation function ReLU of the CNN. However, as the number of iterations increases, the situation becomes more complicated, and unfolding the ML-LISTA algorithm results in a recursive neural network having skip connections. Therefore, how the network of the present disclosure is developed on the basis of the ML-CSC model and converted into a network for the SR reconstruction is described in the next section.
3. SRMCSC Network
The present disclosure illustrates the framework of the proposed SRMCSC network in the accompanying drawings.
3.1 Network Structure
The network architecture proposed by the present disclosure for the SR reconstruction is inspired by the unfolded ML-LISTA. It is empirically noted by the present disclosure that a three-layer model is sufficient to solve the problem of the present disclosure. Each layer and each skip connection in the SRMCSC network strictly correspond to a step of the processing flow of the three-layer LISTA; the unfolded algorithm framework serves as the first constituent part of the SRMCSC network, as shown in the accompanying drawings, and the first three layers of the network correspond to the first iteration of the algorithm. The residual learning is implemented by performing K iterations to obtain a sparse feature mapping γ^K, and a residual image U, mainly including high-frequency detail information, is estimated according to the definition of the ML-CSC model in combination with the sparse feature mapping and the dictionary; the final HR image x is then obtained through equation (11), which serves as the second constituent part of the network:
x=U+y (11)
Performance of the network depends only on the initial values of the parameters, the number K of iterations and the number of filters. In other words, the network only needs to increase the number of iterations, without introducing additional parameters, to get deeper, and the filter parameters to be trained by the model include only three dictionaries of the same size. In addition, it is to be noted that, different from other empirical networks, each of the skip connections in the network can be theoretically explained.
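As a minimal sketch of the overall structure (not the exact claimed architecture: the synthesis dictionaries, the padding and the omission of repeated iterations are simplifying assumptions), the two constituent parts may be arranged as follows, with the skip connection of equation (11) at the end:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SRMCSCSketch(nn.Module):
    """Illustrative SRMCSC-style network: a three-layer analysis cascade in the
    spirit of equation (10) extracts features, three synthesis dictionaries
    estimate the residual U, and the output is x = U + y (equation (11))."""
    def __init__(self, filters=64, k=3):
        super().__init__()
        p = k // 2
        self.B1 = nn.Conv2d(1, filters, k, padding=p)        # B_1^T; bias = threshold
        self.B2 = nn.Conv2d(filters, filters, k, padding=p)  # B_2^T
        self.B3 = nn.Conv2d(filters, filters, k, padding=p)  # B_3^T
        self.D3 = nn.Conv2d(filters, filters, k, padding=p, bias=False)
        self.D2 = nn.Conv2d(filters, filters, k, padding=p, bias=False)
        self.D1 = nn.Conv2d(filters, 1, k, padding=p, bias=False)

    def forward(self, y):
        # First iteration of the unfolded three-layer ML-LISTA (cf. equation (10));
        # further iterations would reuse the same dictionaries (omitted for brevity)
        g3 = F.relu(self.B3(F.relu(self.B2(F.relu(self.B1(y))))))
        U = self.D1(self.D2(self.D3(g3)))  # residual: high-frequency detail
        return U + y                       # equation (11)

net = SRMCSCSketch()
x = net(torch.randn(1, 1, 33, 33))         # 33x33 sub-images, cf. Section 4.1
```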
3.2 Loss Function
MSE is the most common loss function in image applications, and the MSE is still used in the present disclosure. N training pairs {y_i, x_i}_{i=1}^N, namely LR-HR patch pairs, are given to minimize the following objective function:

L(Θ) = (1/2N) Σ_{i=1}^N ||f(y_i; Θ) − x_i||_2^2
where, f(·) is the SRMCSC network of the present disclosure, Θ represents all trainable parameters, and the Adam optimizer is used to optimize the parameters of the network.
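A hedged sketch of the training step described here follows; the dummy data loader and the SRMCSCSketch stand-in from the earlier sketch are illustrative assumptions, while the Adam settings follow Section 4.2:

```python
import torch
from torch import nn, optim

model = SRMCSCSketch()                      # stand-in from the earlier sketch
criterion = nn.MSELoss()                    # the objective above, up to a constant
optimizer = optim.Adam(model.parameters(), lr=1e-4)

# Dummy stand-in for a loader of LR-HR patch pairs {y_i, x_i}
loader = [(torch.randn(16, 1, 33, 33), torch.randn(16, 1, 33, 33))]

for epoch in range(100):                    # 100 epochs, cf. Section 4.2
    for y_lr, x_hr in loader:
        optimizer.zero_grad()
        loss = criterion(model(y_lr), x_hr) # MSE between f(y; Theta) and x
        loss.backward()                     # BP updates all trainable parameters
        optimizer.step()
```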
4. Experiments and Results
4.1 Datasets
The present disclosure takes 91 images commonly used in the SR reconstruction literature as a training set. All models of the present disclosure are learned from this training set. In view of the memory limitations of the graphics processing unit (GPU), the sub-images for training have a size of 33×33. The dataset of 91 images is thus decomposed into 24,800 sub-images, which are extracted from the original images at a stride of 14. The benchmark testing is performed on the datasets Set5, Set14 and BSD100.
4.2 Parameter Settings
During the work of the present disclosure, an Adam solver with a mini-batch size of 16 is used; for the other hyper-parameters of the Adam, the default settings are used. The learning rate of the Adam is fixed at 10^−4, the number of epochs is set to 100, which is far less than that of the SRCNN, and training one SRMCSC network takes about an hour and a half. All tests of the model in the present disclosure are conducted in the PyTorch environment (Python 3.7.6), run on a personal computer (PC) equipped with an Intel Xeon E5-2678 v3 central processing unit (CPU) and an Nvidia RTX 2080Ti GPU. Each of the convolutional kernels has a size of 3×3, and the number of filters on each layer is the same. How to set the number of filters and the number of iterations is described below.
4.2.1 Setting the Number of Filters and the Number of Iterations
The present disclosure investigates the influence of different model configurations on the performance of the network. As the network structure of the present disclosure is inspired by the unfolded three-layer LISTA, the performance can be improved by adjusting the number R of filters and the number K of iterations on each layer. It is to be noted that the number of filters on each layer is the same in the present disclosure, and that the network can get deeper by increasing the number of iterations without introducing additional parameters. The present disclosure tests different combinations of the number of filters and the number of iterations on the dataset Set5 under the scale factor ×2, and compares the SR reconstruction performance. Specifically, the testing is performed with the number of filters R ∈ {32, 64, 128, 256} and the number of iterations K ∈ {1, 2, 3}. With the results as shown in Table 1, when the number of iterations is the same, increasing the number of filters from 32 to 128 increases the PSNR more obviously. In order to balance effectiveness and efficiency, the present disclosure selects R=64 and K=3 as the default settings.
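The configuration sweep in this subsection could be driven by a loop like the following sketch, where train_and_eval_psnr is a hypothetical helper (not part of the disclosure) that trains one model with the given configuration and returns its PSNR on Set5 at scale ×2:

```python
# Hypothetical grid search over the number of filters R and iterations K
results = {}
for R in (32, 64, 128, 256):
    for K in (1, 2, 3):
        results[(R, K)] = train_and_eval_psnr(filters=R, iterations=K,
                                              dataset="Set5", scale=2)
best_R, best_K = max(results, key=results.get)   # the disclosure picks R=64, K=3
```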
4.3 Comparisons with State-of-the-Art Methods
In the present disclosure, in order to evaluate the SR image reconstruction performance of the SRMCSC network, the method of the present disclosure is qualitatively and quantitatively compared with state-of-the-art SR methods, including Bicubic interpolation, the SC method presented by Zeyde et al., NE+LLE, ANR and the SRCNN. Average results of all comparative methods on the three test sets are shown in Table 2, with the best result boldfaced. The results indicate that the SRMCSC network is superior to the other SR methods in terms of PSNR on all test sets and under all scale factors. Specifically, compared with the classical SR methods, including Bicubic interpolation, the SC method presented by Zeyde et al., NE+LLE, and ANR, the method of the present disclosure exhibits an obvious average PSNR gain of about 1-2 dB under all scale factors. Compared with the deep learning method SRCNN, the method of the present disclosure exhibits an average PSNR gain of about 0.4-1 dB under all scale factors. Particularly, when the scale factor is 2, the average PSNR value of the method on Set5 is 1 dB higher than that of the SRCNN.
The table shows the comparisons of the method of the present disclosure with other methods.
The present disclosure proposes a novel SR deep learning method: the interpretable end-to-end supervised convolutional network (SRMCSC network) is established in combination with the ML-LISTA and the DCNN for the SR reconstruction. Meanwhile, with this interpretability, the present disclosure can better design the network architecture to improve the performance, rather than simply stacking network layers. In addition, the present disclosure introduces the residual learning into the network, thereby accelerating the training speed and the convergence speed of the network. The network can get deeper by directly changing the number of iterations, without introducing additional parameters. Experimental results indicate that the SRMCSC network can generate visually attractive results and offers a practical solution for the SR reconstruction.
The above embodiments may be implemented completely or partially by using software, hardware, firmware, or any combination thereof. When the above embodiments are implemented in the form of a computer program product in whole or in part, the computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the present disclosure are generated in whole or in part. The computer may be a general-purpose computer, a dedicated computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or may be transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired (for example, a coaxial cable, an optical fiber, or a digital subscriber line (DSL)) or wireless (for example, infrared, radio, or microwave) manner. The computer-readable storage medium may be any usable medium accessible by a computer, or a data storage device, such as a server or a data center, integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a digital video disc (DVD)), a semiconductor medium (for example, a solid state disk (SSD)), or the like.
The foregoing are merely descriptions of the specific embodiments of the present disclosure, and the protection scope of the present disclosure is not limited thereto. Any modification, equivalent replacement, improvement and the like made within the technical scope of the present disclosure by a person skilled in the art according to the spirit and principle of the present disclosure shall fall within the protection scope of the present disclosure.
Number | Date | Country | Kind
---|---|---|---
202110196819.X | Feb. 22, 2021 | CN | national