The present invention generally relates to techniques for image recovery. More specifically, the present invention relates to an image restoration processor that applies a framework for generating sparsity-inducing regularizers for image recovery.
Principal component analysis (PCA) refers to seeking an underlying low-dimensional linear subspace from high-dimensional data. It is widely used for dimensionality reduction and appears in many applications such as computer vision, bioinformatics, as well as signal and image processing. Although PCA works well in the presence of zero-mean Gaussian noise, it is vulnerable to outliers. To overcome this drawback, robust PCA (RPCA) is proposed to decompose the outlier-contaminated data into a sum of low-rank and sparse matrices. RPCA has been applied in hyperspectral image restoration, shadow removal and video separation, to mention a few. The low-rank plus sparse decomposition model can be considered as a weighted linear combination of rank and l0-norm minimization, which is NP-hard. To address this issue, a common method is to replace the rank function and the l0-norm with the nuclear norm and the l1-norm, respectively, leading to a convex optimization formulation known as principal component pursuit (PCP). Besides, it is proved that minimizing the weighted linear combination of the nuclear norm and the l1-norm attains exact low-rank and sparse component recovery with high probability if the incoherence conditions are satisfied. To solve PCP, many efficient algorithms based on singular value thresholding (SVT), accelerated proximal gradient and augmented Lagrange multipliers are developed.
Convex relaxations are relatively easy to optimize, but their estimates are known to be biased. For example, when the nuclear norm is employed to find the low-rank component, it underestimates all nonzero singular values and shrinks them by the same constant. It has been shown that shrinking the larger singular values less attains better restoration performance in image denoising and inpainting, as well as background subtraction. There are two main strategies to alleviate the bias induced by the nuclear norm. The first is to weight the singular values differently by updating the weights per iteration, which is similar to reweighted l1 minimization. One related work extends this technique to low-rank matrix approximation, and adopts weighted nuclear norm minimization (WNNM) as a surrogate for rank minimization together with the l1-norm to resist outliers. Besides, weighted Schatten-p-norm minimization is suggested as a generalization of WNNM. Moreover, to facilitate weight adaptation, a weighted minimax-concave penalty (WMCP) is developed, which uses a clever trick to unfold the minimax-concave penalty (MCP) and utilizes the alternating direction method of multipliers (ADMM) to find the solution.
The second strategy employs nonconvex sparsity-inducing regularizers to reduce the estimation bias. Many studies have shown the superiority of nonconvex regularizers over the convex relaxation approaches. One of the studies develops a fast proximal algorithm with nonconvex regularizers such as the smoothly clipped absolute deviation (SCAD) and MCP for low-rank matrix learning. Another study suggests robust PCA via adopting the lp-norm (0<p≤1) as the regularizer to impose constraints on the low-rank and sparse terms, but the lp-norm does not have a closed-form expression for its proximal operator, except for three special cases of p, namely, p=1, 1/2 and 2/3.
A modified lp-norm is then devised to impose low-rank and sparse constraints to achieve the low-rank and sparse decomposition, resulting in good performance in background separation. In addition, other nonconvex regularizers, including the exponential-type penalty (ETP) and the Laplace function, are applied for low-rank matrix recovery via the iteratively reweighted nuclear norm. Nevertheless, neither of them has an explicit expression for its corresponding proximal operator.
Although numerous nonconvex regularizers have been suggested, they still have the aforementioned limitations. The field of low-rank matrix recovery faces challenges in handling large-scale datasets, complex noise models, and incomplete or corrupted data. Traditional methods may not be sufficient in addressing these challenges, necessitating the development of novel approaches that can effectively handle these issues and provide accurate recovery results.
It is an objective of the present invention to provide an apparatus and a method to address the aforementioned issues in the prior art.
In accordance with one aspect of the present invention, an image restoration processor is provided. The image restoration processor includes an image receiver, a matrix converter, a framework memory, a modifier, a matrix decomposer, and an image reconstructor. The image receiver is configured to receive one or more degraded images. The matrix converter is configured to reshape image data of the one or more degraded images into one or more target matrices. The framework memory is configured to store a framework for generating one or more sparsity-inducing regularizers, which enables generation of the one or more sparsity-inducing regularizers and derivation of their theoretical properties and closed-form proximity operators. The modifier is configured to read the framework memory, apply the framework to modify an M-estimator, and accordingly output a hybrid M-estimator and the corresponding sparse regularizer. The matrix decomposer is configured to receive the one or more target matrices and the sparse regularizer and to apply the sparse regularizer to a robust principal component analysis (RPCA) approach via decomposing one of the one or more target matrices into a sum of low-rank and sparse matrices. The image reconstructor integrates outcomes of the decomposition from the matrix decomposer to form one or more recovered images from the one or more degraded images, utilizing the low-rank and sparse matrices.
In accordance with one aspect of the present invention, an image restoration processor is provided. The image restoration processor includes an image receiver, a matrix converter, a framework memory, a modifier, a matrix decomposer, an image reconstructor, and an image selector. The image receiver is configured to receive one degraded image. The matrix converter is configured to reshape image data of the degraded image into a target matrix. The framework memory is configured to store a framework for generating one or more sparsity-inducing regularizers, which enables generation of the one or more sparsity-inducing regularizers and derivation of their theoretical properties and closed-form proximity operators. The modifier is configured to read the framework memory, apply the framework to modify more than one M-estimator, and accordingly output more than one hybrid M-estimator and more than one sparse regularizer corresponding to the more than one hybrid M-estimator. The matrix decomposer is configured to receive the target matrix and the different sparse regularizers and to apply the different sparse regularizers to a robust principal component analysis (RPCA) approach via decomposing the target matrix into sums of low-rank and sparse matrices. The image reconstructor integrates outcomes of the decomposition from the matrix decomposer to form more than one recovered image from the degraded image using the different sparse regularizers, utilizing the low-rank and sparse matrices. The image selector is configured to receive the recovered images from the reconstructor and determine which one of the recovered images is to be outputted according to image evaluation metrics for all of the recovered images.
In one embodiment, a framework is devised to generate different nonconvex sparsity-inducing regularizers under some relatively mild conditions. Although the resultant regularizers may be nonconvex, it can be proven that their Moreau envelopes are convex, and their analytical solutions are given. In addition, the framework is applied to three popular M-estimators, namely, Welsch, Cauchy and Geman-McClure, to generate the corresponding sparse regularizers, because they have achieved considerable success in image restoration, compressed sensing and subspace clustering. Although these M-estimators provide implicit regularizers (IRs) via half-quadratic optimization, sparsity cannot be achieved (see Appendix A). In contrast, the sparsity-promoting regularizers associated with the three M-estimators are generated in this work. Moreover, an effective strategy is derived for the hyperparameter selection, and their physical meanings are explained.
The sparse regularizers generated in the framework of the present invention can be considered as nonconvex surrogates to enforce the low-rank-plus-sparse decomposition. Different from PCP and WNNM-based RPCA (WNNM-RPCA), which both employ the l1-norm to combat sparse outliers, the devised nonconvex regularizers are adopted in the present invention. The resultant optimization problem is tackled via ADMM with convergence guarantees. Briefly, the above configuration provides several contributions.
Embodiments of the invention are described in more detail hereinafter with reference to the drawings, in which:
In the following description, image restoration processors using framework for generating sparsity-inducing regularizers and the likes are set forth as preferred examples. It will be apparent to those skilled in the art that modifications, including additions and/or substitutions may be made without departing from the scope and spirit of the invention. Specific details may be omitted so as not to obscure the invention; however, the disclosure is written to enable one skilled in the art to practice the teachings herein without undue experimentation.
For clearer illustrations of the embodiments of the present invention, notations and basic operations are stated before the descriptions of the embodiments.
The RPCA problem can be formulated as the following principal component pursuit:

min_{M,E} ∥M∥* + λ∥E∥1, subject to X = M + E, (1)

where X, M and E are the corrupted, low-rank and sparse matrices, respectively, and the nuclear norm ∥M∥* is the sum of the singular values of M. The l1-norm of E, i.e., ∥E∥1, is the regularizer to handle sparse outliers, and λ is the regularization parameter, which controls the relative weight between the nuclear norm and the regularizer. However, the nuclear norm underestimates the nonzero singular values, especially the large singular values that contain the major information of a matrix. To alleviate this effect, the weighted nuclear norm is suggested, which is extended from the weighted l1-norm for vectors, resulting in:
where φ1(·) and φ2(·) are nonconvex penalty functions.
whose solution is given by the operator:
which is called the proximity operator of φ(·). In fact, the proximity operator has been extensively used in signal denoising, image restoration and sparse representation. When φ(·) is convex, (4) is strictly convex and has an optimal solution. In contrast, when φ(·) is nonconvex, (4) is generally a nonconvex problem. Recent studies show that the solution to (4) can be attained even when φ(·) is nonconvex, but it requires iterations; that is, the explicit proximity operator cannot be found. In addition, one related work generates a series of nonconvex IRs to achieve robustness using half-quadratic optimization, but the resultant IRs are not sparsity-inducing regularizers, which limits their application. The common sparsity-promoting regularizer is the l1-norm, and its proximity operator can be found in (10), which will be discussed below. However, as aforementioned, it suffers from a bias problem.
It is one of the purposes of the present invention to address the problem of restoring a matrix under a low-rank constraint in the presence of sparse outliers. Nonconvex regularizers may not have closed-form proximity operators, and thus iterations are needed to find their expressions, which increases the computational load. To address this issue, a framework in accordance with various embodiments is devised to generate sparsity-inducing regularizers with closed-form proximity operators. Although the resultant regularizers may be nonconvex, their Moreau envelopes are convex.
The framework is then applied to three popular M-estimators, namely, the Welsch, Cauchy and Geman-McClure functions, and their associated sparsity-inducing regularizers are generated for the first time, which enriches the variety of regularizers. Next, a parameter selection strategy for the resultant regularizers is also proposed. Moreover, the regularizers are exploited for low-rank and sparse decomposition applications, leading to three algorithms based on the alternating direction method of multipliers with convergence guarantees. The property that any limit point generated is a critical point is proven as well. Finally, extensive numerical experiments based on synthetic and real-world data are conducted to demonstrate the validity of the developed approaches.
Firstly, the framework for sparsity-inducing regularizer generation is discussed.
The curves of lh and lt are shown in
where φ(y)=φh(y)=|y| for the Huber function, while for the truncated-quadratic function:
The solutions to (8) for φh(y) and φt(y), known as the proximity operators, are:
respectively, implying that φh(y) and φt(y) are sparsity-inducing regularizers. It is worth noting that the proximity operator for the l0-norm is the same as (11).
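By way of a non-limiting illustration, the two proximity operators in (10) and (11) can be sketched in a few lines; the following assumes the standard soft-thresholding form for (10) and the hard-thresholding form for (11), which the foregoing attributes to the Huber and truncated-quadratic functions, respectively.

```python
import numpy as np

def soft_threshold(x, lam):
    # Proximity operator of lam*|y|, cf. (10): every entry is shrunk
    # toward zero by lam, so the surviving entries carry a constant bias.
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def hard_threshold(x, lam):
    # Proximity operator associated with the truncated-quadratic function,
    # cf. (11): entries with magnitude at most lam are zeroed, the rest
    # are kept unchanged (no bias on the survivors).
    return np.where(np.abs(x) > lam, x, 0.0)
```

The constant bias of soft_threshold on the surviving entries is precisely the bias problem discussed above, whereas hard_threshold leaves survivors untouched at the price of a discontinuous operator.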
In the present invention, a framework is provided to generate different sparsity-inducing regularizers via generalizing (6) and (7) for |x|>λ, as illustrated in the corresponding figure:

lg,λ(x) = x²/2 for |x|≤λ, and lg,λ(x) = a·g(|x|)+b for |x|>λ, (12)

where g(x) is a continuous function and g′(x)≥0 for x>0, while a and b are constants that make l(x) continuously differentiable at x=λ. Thus, a=λ/g′(λ)>0 (g′(λ)≠0) and b=λ²/2−a·g(λ). It is easy to see that (6) and (7) are special cases of (12).
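As a numerical illustration of (12), the constants a and b can be computed directly from g; the sketch below (function names are illustrative only, and g is assumed smooth with g′(λ)≠0) constructs lg,λ from a user-supplied g and its derivative.

```python
import numpy as np

def make_hybrid(g, g_prime, lam):
    # Constants chosen so that l(x) is continuously differentiable at x = lam:
    a = lam / g_prime(lam)           # matches the slopes: a * g'(lam) = lam
    b = lam**2 / 2 - a * g(lam)      # matches the values: a * g(lam) + b = lam**2 / 2
    def l(x):
        x = np.asarray(x, dtype=float)
        return np.where(np.abs(x) <= lam, x**2 / 2, a * g(np.abs(x)) + b)
    return l

# With g(x) = x, this reproduces the Huber function (6): a = lam, b = -lam**2/2.
huber = make_hybrid(lambda x: x, lambda x: 1.0, lam=1.0)
```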
If h′(x)>0 and h″(x)>0 for x>λ, namely, h(x) is strictly convex when x>λ, then (12) can be used to generate sparsity-inducing regularizers.
and if ƒ(x) is convex, the conjugate of ƒ*(x) is ƒ(x), namely,
Thus,
Combining (15) and (18), it is easy to obtain:
Similar to the regularizers in the related works, the expression of φ(y) is generally unknown, and such a φ(y) is referred to as an implicit regularizer (IR). Before solving (19), the following lemma is introduced.
Since the solution to (19) is the same as that to (18), according to Lemma 1, the solution to y in (19) is:
which means that the IR generated by lg,λ(x) can make the solution sparse, and Pφg,λ(x) is nondecreasing since h(x) is strictly convex for |x|>λ.
Next, the reason why (12) generates sparsity-promoting regularizers is illustrated, in comparison with other convex functions, in
Second, generalization via g(x) is discussed. It is worth mentioning that g(x) can represent many convex or nonconvex functions. In various embodiments, a nonconvex g(x) is considered, and g(x) is exploited for three M-estimators, namely, Welsch, Cauchy, and Geman-McClure, resulting in three different sparsity-inducing regularizers. Compared with the case where g(x) is convex or concave for x>0, these M-estimators are more difficult to analyze because they are neither convex nor concave.
Regarding the modification to the Welsch function, the Welsch function is an M-estimator whose expression is:
This is because minimizing the Welsch function has been shown to be equivalent to maximizing the correntropy criterion when the Gaussian kernel is adopted as the correntropy function, with σ being the kernel parameter. Thus, the provided theory of the present invention is applied to the Welsch function. When g(x)=ρwelsch(x), according to (12), the hybrid Welsch function is:
which can be rewritten via (19) as:
The solution is:
Regarding the modification to the Cauchy function: unlike the Welsch M-estimator, which is bounded from above, the Cauchy function is unbounded from above, and its expression is:
where γ is the scale parameter of the Cauchy distribution. Thus, when g(x)=ρcauchy(x), the hybrid Cauchy function can be obtained as:
where
In addition, there is:
and based on (21), its solution is:
Regarding the modification to the Geman-McClure function, the expression of the Geman-McClure (GMC) M-estimator is:
where τ>0 is a scale parameter. Thus, when g(x)=ρgmc(x), there is:
which amounts to:
where φτ,λ(y) is the sparse regularizer related to lτ,λ(x). Employing (21), the solution to (32) is:
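Across the three hybrid functions above, the proximity operators share a common structure. Since lg,λ in (12) is, by construction, the Moreau envelope of its implicit regularizer (cf. (19)), the standard envelope identity Pφ(x)=x−l′g,λ(x) yields the operator directly from g′. The sketch below demonstrates this for the hybrid Welsch function, assuming the common parameterization ρwelsch(x)=(σ²/2)(1−e^(−x²/σ²)); the exact constants in the elided equations may differ.

```python
import numpy as np

def prox_from_g(g_prime, lam):
    # Proximity operator of the implicit regularizer induced by (12):
    # P(x) = x - l'(x), i.e., zero on |x| <= lam (sparsity) and
    # x - a * g'(|x|) * sign(x) on |x| > lam, with a = lam / g'(lam).
    a = lam / g_prime(lam)
    def prox(x):
        x = np.asarray(x, dtype=float)
        shrunk = x - a * g_prime(np.abs(x)) * np.sign(x)
        return np.where(np.abs(x) <= lam, 0.0, shrunk)
    return prox

# Hybrid Welsch example: rho'(x) = x * exp(-x**2 / sigma**2).
sigma, lam = 1.0, 1.0    # sigma <= sqrt(2)*lam keeps the operator nondecreasing
prox_welsch = prox_from_g(lambda x: x * np.exp(-x**2 / sigma**2), lam)
```

With g(x)=x this recovers soft-thresholding; for the hybrid Welsch function, the shrinkage x−P(x) decays as |x| grows, which is the reduced-bias behavior discussed below.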
In various embodiments, a parameter selection strategy can be applied to the above modified functions.
There are four parameters for lσ,λ(x), φσ,λ(y), lγ,λ(x), φγ,λ(y), lτ,λ(x) and φτ,λ(y), i.e., λ, σ, γ and τ. Apparently, knowing the physical meaning of a parameter helps choose an appropriate value for it. First, the Huber function lh(x) can be rewritten via (8) as:
where λ is a positive weighting parameter that controls sparsity, namely, y=0 for |x|≤λ. Similarly, λ for φσ,λ(y), φγ,λ(y) and φτ,λ(y) in (24), (28) and (32), respectively, also controls sparsity. That is, y=0 for |x|≤λ. However, different from the Huber function, which has the explicit λ|y|, the expressions of φσ,λ(y), φγ,λ(y) and φτ,λ(y) are generally unknown. Nevertheless, they all have a smaller bias than λ|y|, which is illustrated in
Besides, the proximal operator for λ|y| in (10) treats all values of x equally and shrinks x by the same bias λ, as indicated in (10). In fact, for many real-world applications, the bias for different x should not be the same. For example, when addressing low-rank matrix recovery, larger singular values of an observed matrix correspond to the dominant information; thus it is better to shrink them less. Mathematically, the bias can be described via the difference between y=x and the proximal operator for different regularizers when x>λ:
and Δd is required to be non-increasing as x increases, implying that g″(x)≤0, namely, g(x) is concave when x>λ. For the hybrid Welsch function, when x>λ, the concavity condition gives σ≤√2x. Since x>λ, it follows that σ≤√2λ.
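To make the omitted computation explicit, assuming the common Welsch parameterization ρwelsch(x)=(σ²/2)(1−e^(−x²/σ²)) (the normalization in the elided equations may differ), the concavity condition reads:

```latex
\rho'_{\mathrm{welsch}}(x) = x\,e^{-x^{2}/\sigma^{2}}, \qquad
\rho''_{\mathrm{welsch}}(x) = \Bigl(1-\tfrac{2x^{2}}{\sigma^{2}}\Bigr)e^{-x^{2}/\sigma^{2}} \le 0
\;\Longleftrightarrow\; \sigma \le \sqrt{2}\,x .
```

Imposing the inequality for every x>λ gives the stated bound σ≤√2λ; the Cauchy and GMC cases below follow from the analogous second-derivative computations.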
Similarly, for lγ,λ(x), when x>λ, it is obtained that γ≤x; thus, for lγ,λ(x), it follows that γ≤λ.
Finally, for the hybrid GMC function, the analogous condition is obtained to yield τ≤√3λ/2.
The above is summarized in the following proposition.
Next, applications to low-rank matrix recovery are discussed, including mathematical preliminaries, algorithms for low-rank-plus-sparse matrix decomposition, and complexity.
Regarding mathematical preliminaries, the theory derived above is for the scalar case; when it is extended to vectors, matrices, and the singular values of matrices, it is necessary to give the corresponding definitions. Besides, the proofs of the following propositions can be found in Appendices D, E and F, respectively.
where φ·,λ(·) can be φσ,λ(·), φγ,λ(·) and φτ,λ(·), and σ, γ and τ are positive constants.
where φ·,λ(·) can be φσ,λ(·), φγ,λ(·) and φτ,λ(·), while σ, γ and τ are positive constants.
whose solution is:
where Pφ·,λ(·) is an element-wise operator, and Pφ·,λ(·) can be Pφσ,λ(·), Pφγ,λ(·) as well as Pφτ,λ(·). Besides, when σ≤√2λ, γ≤λ and τ≤√3λ/2, the following is obtained:
which is the l1-norm of s.
However, employing the nuclear norm to achieve low-rank recovery will lead to a biased solution because the nuclear norm is based on the l1-norm of the singular values. To address this issue, the nonconvex regularizers φ(·) of the present invention are used to replace the l1-norm. Compared with other nonconvex regularizers, the devised ones have closed-form proximity operators.
If the proximity operator Pφ·,λ is monotone, then the solution for (50) is:
where s* satisfies s1*≥ . . . ≥si*≥ . . . ≥sr* with i=1, 2, . . . , r, which is determined as:
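Concretely, the solution pattern of (50)-(51) amounts to an SVD followed by elementwise thresholding of the singular values; a minimal sketch is given below, with the function names being illustrative only and the monotonicity condition of the lemma assumed so that the ordering of the singular values is preserved.

```python
import numpy as np

def prox_singular_values(X, scalar_prox):
    # Apply an elementwise proximity operator to the singular values and
    # rebuild the matrix; with soft-thresholding as scalar_prox, this
    # reduces to the classical singular value thresholding (SVT).
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * scalar_prox(s)) @ Vt
```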
Regarding algorithms for low-rank-plus-sparse matrix decomposition, the regularizers of the present invention are applied to RPCA via decomposing the target matrix into a sum of low-rank and sparse matrices, which is formulated as:
where φ·, 1/ρ(·) and φ·, λ/ρ(·) are the regularizers included in TABLE II.
The problem (52) can be solved by ADMM, and its augmented Lagrangian is:
which amounts to:
where Λ contains the Lagrange multipliers, the last term is the augmented term, and ρ>0 is the augmented Lagrangian parameter. The ADMM updates the primal and dual variables at the (k+1)-th iteration via:
The exact expressions at the (k+1)-th iteration, i.e., {Mk+1, Ek+1, Λk+1}, are derived as follows.
Update of E: E is updated via:
where the constant term is ignored since it does not affect the solution to Ek+1. Invoking (21), there is:
Update of M: Given Ek+1 and Λk, the low-rank matrix M is updated via:
According to Lemma 2, the solution is:
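By way of a non-limiting sketch, the updates (56)-(58) can be arranged into the following iteration, where prox_sparse and prox_lowrank stand for the closed-form operators derived above (the callables are assumed to absorb the ρ-dependent thresholds, and the stopping rule mirrors the experimental settings reported below).

```python
import numpy as np

def rpca_admm(X, prox_sparse, prox_lowrank, rho=1.0, mu=1.05,
              tol=1e-7, max_iter=1000):
    # Low-rank-plus-sparse decomposition X = M + E via ADMM, cf. (52)-(58).
    M = np.zeros_like(X)
    E = np.zeros_like(X)
    Lmb = np.zeros_like(X)                          # Lagrange multipliers
    for _ in range(max_iter):
        E = prox_sparse(X - M + Lmb / rho, rho)     # E-update, cf. (56)
        M = prox_lowrank(X - E + Lmb / rho, rho)    # M-update on singular values, cf. (58)
        Lmb = Lmb + rho * (X - M - E)               # dual ascent step
        rho *= mu                                   # penalty increase, mu > 1
        if np.linalg.norm(X - M - E) / np.linalg.norm(X) < tol:
            break
    return M, E
```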
Theorem 2. The {Mk, Ek} generated by the proposed algorithm satisfy the following properties:
Let {Mkj, Ekj} be any convergent subsequence of {Mk, Ek}; then its limit is a critical point of (52).
Regarding complexity, similar to PCP and WNNM-RPCA, when finding the low-rank matrix, the proposed algorithms involve the computation of an SVD per iteration, whose complexity is O(min(m,n)·mn), where m and n are the row and column lengths of the degraded matrix, respectively. In addition, the complexity of calculating the sparse matrix is O(mn). Therefore, solving the low-rank component dominates the computational time at each iteration.
In order to verify the image restoration processor of the present invention, extensive experiments on both synthetic and real-life data are conducted. In the experimental results, SIR-HW, SIR-HC and SIR-HG, together with SIR-HU and SIR-HT, are assessed. The parameter settings σ=√2λ, γ=λ and τ=√3λ/2 are suggested. In addition, three benchmark techniques, namely, PCP, WNNM-RPCA and DPRPCA, are employed. The experimental results include two parts: (1) synthetic data and (2) real-world image restoration.
Regarding synthetic data, the data matrix X∈ℝ^{m×n} is generated as the sum of a low-rank matrix M=UV^T, where U∈ℝ^{m×r} and V∈ℝ^{n×r} with r being the rank, and a sparse matrix E. For convenience, n=m is set. The entries of U and V follow the standard Gaussian distribution, and the locations of the sparse outliers in E, which has S non-zero entries uniformly distributed in [−500, 500], are drawn independently from a Bernoulli distribution. Similar to some related works, r=pr×m and S=ps×m². In the experiments, m=400 is set, and pr and ps vary from 0.01 to 0.05 with a step size of 0.02. Besides, two evaluation metrics are employed, namely, the relative reconstruction error of the low-rank matrix, REE=∥M−M̂∥F²/∥M∥F², where M̂ is the estimated low-rank matrix, and the estimated rank r̂. All the competing methods, including PCP and WNNM-RPCA, are set with the same parameters and stopping conditions. That is, λ=1/√max(m,n) and μ=1.05, with ∥X−Mk−Ek∥F/∥X∥F<10^−7 or a maximum iteration number of 1000 as the termination conditions for all algorithms. Moreover, the performance of all approaches is evaluated using the average results of 100 independent runs.
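The synthetic data described above can be generated as follows; this is a sketch under the stated settings (m=n=400, standard Gaussian factors, outliers uniform on [−500, 500] at uniformly random locations), shown here for pr=ps=0.05.

```python
import numpy as np

rng = np.random.default_rng(0)
m = n = 400
p_r, p_s = 0.05, 0.05
r = int(p_r * m)                    # rank: r = p_r * m
S = int(p_s * m**2)                 # number of outliers: S = p_s * m^2

U = rng.standard_normal((m, r))
V = rng.standard_normal((n, r))
M = U @ V.T                         # low-rank component

E = np.zeros(m * n)
idx = rng.choice(m * n, size=S, replace=False)  # random outlier support
E[idx] = rng.uniform(-500, 500, size=S)         # outlier magnitudes
E = E.reshape(m, n)                             # sparse component

X = M + E                           # corrupted observation
```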
Both WNNM-RPCA and PCP employ the l1-norm to achieve low-rank and sparse decomposition. Besides, compared with PCP, WNNM-RPCA and DPRPCA (p=0.6), SIR-HW, SIR-HC and SIR-HG can recover more cases, and among all the techniques, SIR-HG has the biggest success area (a recovery is counted as successful if REE<10^−4). TABLE III tabulates the average estimated rank by different algorithms. It is observed that SIR-HW and SIR-HG give more accurate rank estimates than the remaining methods for different rank parameters.
Regarding hyperspectral image restoration:
Hyperspectral imaging has numerous applications such as environmental monitoring, mineral exploration and urban planning. However, hyperspectral images (HSIs) may be contaminated by noise during the acquisition process. To restore HSIs, low-rank modeling has been found to be very useful because of the strong correlation along the spectral direction. The above-developed methods are applied to HSI restoration. Three quantitative metrics, i.e., peak signal-to-noise ratio (PSNR), structure similarity index measure (SSIM) and root mean square error (RMSE), are employed. The PSNR, SSIM and RMSE of the HSI data are calculated by:
where M∈ℝ^{m×n} and M̂∈ℝ^{m×n} denote the ground truth and the estimated matrix of the HSI, and Mj and M̂j are the j-th band images of M and M̂, respectively. Besides, the higher the PSNR and SSIM, and the smaller the RMSE, the better the restoration quality.
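One common way to evaluate the three metrics band by band is sketched below; the exact formulas are given in the elided equations, so standard conventions are assumed here (unit data range and scikit-image's structural_similarity for SSIM).

```python
import numpy as np
from skimage.metrics import structural_similarity

def hsi_metrics(M, M_hat):
    # M, M_hat: ground truth and estimate as (height, width, bands) cubes in [0, 1].
    psnr, ssim, mse = [], [], []
    for j in range(M.shape[2]):
        err = np.mean((M[..., j] - M_hat[..., j]) ** 2)   # per-band MSE
        mse.append(err)
        psnr.append(10 * np.log10(1.0 / err))             # data range is [0, 1]
        ssim.append(structural_similarity(M[..., j], M_hat[..., j], data_range=1.0))
    # Band-averaged PSNR and SSIM, and RMSE over all bands.
    return np.mean(psnr), np.mean(ssim), np.sqrt(np.mean(mse))
```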
The sub-image of the Washington DC Mall contains 191 spectral bands with 256×256 pixels per band. The gray values of each band are normalized onto [0, 1], and then each band is vectorized and stacked as a column to construct the pure matrix M∈ℝ^{65536×191}. Impulsive noise is generated by the built-in command 'imnoise(I, 'salt & pepper', ρ)' in MATLAB, where I is the original matrix and ρ is the normalized noise intensity. The relationship between ρ and the signal-to-noise ratio (SNR) is ρ=1/SNR. In addition, to fairly compare different algorithms, the best weighting parameter λ for each method is selected based on the lowest REE. Impulsive noise with different ρ is added to M, and the average denoising results over 10 independent runs are tabulated in TABLE IV. It is observed that SIR-HC, SIR-HW and SIR-HG achieve good restoration results, and among them, SIR-HG is overall the best method since it attains better recovery performance for all cases. Although DPRPCA (p=0.6) has smaller SSIM for one case, its choice of p is crucial because it employs the lp-norm to achieve sparseness. In order to provide a visual comparison, three bands of HSIs are chosen to form a pseudo-color image shown in
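Outside MATLAB, the same corruption can be reproduced in a few lines; the sketch below mimics imnoise(I, 'salt & pepper', ρ) on normalized data by replacing a fraction ρ of the entries, roughly half by 0 and half by 1.

```python
import numpy as np

def salt_and_pepper(I, rho, rng=None):
    # Replace a fraction rho of entries with 0 ("pepper") or 1 ("salt"),
    # mimicking MATLAB's imnoise(I, 'salt & pepper', rho) on [0, 1] data.
    rng = rng or np.random.default_rng()
    out = I.copy()
    mask = rng.random(I.shape) < rho
    out[mask] = rng.choice([0.0, 1.0], size=int(mask.sum()))
    return out
```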
Furthermore, all methods are assessed using the real HYDICE urban dataset. Similar to related works, the bands 104-108, 139-151 and 207-210 are removed since they are seriously contaminated by noise, thus the sub-data with dimensions 256×256×188 are used.
(TABLE IV: average denoising results under impulsive noise with ρ = 0.1, 0.2, 0.3 and 0.4.)
Regarding multispectral image restoration:
Multispectral images (MSIs) are different from HSIs because of their higher spatial resolution. In this section, two datasets, i.e., CAVE and Harvard, are used. The CAVE dataset consists of 32 different objects, and each object has 31 spectral bands with dimensions 512×512. The Harvard database includes 50 MSIs of real-world indoor and outdoor scenes; each MSI has 31 spectral bands with spatial resolution 1392×1040.
It can be seen that SIR-HC, SIR-HW and SIR-HG have higher PSNR values for most of the bands in both datasets. Note that all methods perform poorly in the first few bands, because those bands are blurred. To provide a visual comparison, the restored results are also shown in the corresponding figure.
With the above derivation and verification, in various embodiments, an image restoration processor is provided to perform the described M-estimator modification and the recovery of degraded images.
The image receiver 102 is configured to receive at least one degraded image to be recovered. The matrix converter 104 is electrically coupled with the image receiver 102 and is configured to reshape image data of the degraded image into at least one target matrix. The degraded image or the target matrix can serve as input data to be processed by the image restoration processor 100. The matrix converter 104 is electrically coupled with the matrix decomposer 108 and is further configured to send the target matrix to the matrix decomposer 108.
The modifier 106 is electrically coupled with the framework memory 110. The framework memory 110 can be configured to store a framework as discussed above (e.g., Proposition 1 as discussed above) for generating one or more sparsity-inducing regularizers, which enables generation of the one or more sparsity-inducing regularizers and derivation of their theoretical properties and closed-form proximity operators. The modifier 106 can receive an M-estimator from an external source. For example, users can optionally input an M-estimator to the modifier 106 for further modification. In one embodiment, the image restoration processor 100 may further include a user interface configured to receive an M-estimator from an external source and transmit an information signal to inform the modifier 106 of the received M-estimator. The modifier 106 can be configured to read the framework memory 110 and apply the framework to modify the input M-estimator. The input M-estimator can be modified by the modifier 106, as discussed above, and then the modifier 106 can accordingly output a hybrid M-estimator and generate the corresponding sparse regularizer. In an embodiment, the sparse regularizer is generated from the hybrid M-estimator in the modifier 106 and then output by the modifier 106. In some embodiments, the input M-estimator is selected from the Welsch, Cauchy, and Geman-McClure functions. In various embodiments, the generated sparse regularizer is nonconvex and has a closed-form proximity operator.
The parameter selector 120 is electrically coupled with the modifier 106 and is configured to perform the parameter selection strategy as discussed above. Specifically, the parameter selector 120 can determine a relationship between a scale parameter of the sparse regularizer and a positive weighting parameter controlling sparsity.
The matrix decomposer 108 is configured to receive the target matrix and the sparse regularizer. In various embodiments, the matrix decomposer 108 can receive the target matrix and the sparse regularizer via accessing the first cache 130 and the second cache 132. For example, the first cache 130 is configured to receive the target matrix from the matrix converter 104 and store the target matrix, and the second cache 132 is configured to receive the sparse regularizer from the modifier 106 and store the sparse regularizer; when the matrix decomposer 108 is triggered to perform decomposition, the matrix decomposer 108 can obtain the target matrix and the sparse regularizer via accessing the first cache 130 and the second cache 132, respectively. After the obtaining, the matrix decomposer 108 can be configured to apply the sparse regularizer to an RPCA approach, as discussed above, via decomposing the target matrix into a sum of low-rank and sparse matrices. The computation module 122 can work together with the matrix decomposer 108 to decompose the target matrix and solve the problem of equation (52) as aforementioned. The computation module 122 can be configured to perform an ADMM approach in conjunction with the RPCA approach performed by the matrix decomposer 108.
The image reconstructor 140 is electrically coupled with the matrix decomposer 108 and is configured to integrate outcomes of decomposition from the matrix decomposer 108 to form at least one recovered image from the original degraded image, utilizing the low-rank and sparse matrices. In an embodiment, the image restoration processor 100 can further include a display in electrical communication with the image reconstructor 140 and configured to display the recovered image and may further display a difference of the image before and after restoration.
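To illustrate how the above components could cooperate, a schematic dataflow is sketched below; all function names are hypothetical placeholders for the image receiver 102, matrix converter 104, modifier 106, matrix decomposer 108 and image reconstructor 140, and the decomposition routine is assumed to follow the ADMM sketch given earlier.

```python
import numpy as np

def restore_image(image, make_regularizer, decompose):
    # Schematic pipeline: receive -> reshape -> regularizer -> decompose -> rebuild.
    X = image.reshape(image.shape[0], -1)           # matrix converter 104: image data to target matrix
    prox_sparse, prox_lowrank = make_regularizer()  # modifier 106: hybrid M-estimator to sparse regularizer
    M, E = decompose(X, prox_sparse, prox_lowrank)  # matrix decomposer 108: RPCA decomposition
    return M.reshape(image.shape)                   # image reconstructor 140: low-rank part as recovered image
```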
The image restoration processor 100 can be applied as a visual rehabilitation pipeline or a digital image remastering framework, which combines specific steps or processes to restore images to their original visual state. In this regard, the image restoration processor 100 can be used in a scenario that dramatically reduces image capture resources by deliberately undersampling an image to be captured. This permits transfer of the undersampled image in less time while consuming fewer hardware resources; for example, an image transferred wirelessly will require less spectral bandwidth, and an image transferred electronically will use less RAM/cache as the image is transferred over a network. As such, the image restoration processor 100 of the present invention improves network operation, particularly for networks that transmit considerable image, video, and/or radar data.
For example, hyperspectral imaging is a popular style of image acquisition which captures a multitude of two-dimensional images at different frequencies to yield more information from a scene. If the images are spectrally undersampled, image recovery according to the present invention may be performed to reconstruct the complete images. Similarly, other types of images, such as radar and MIMO images, which are related to antenna technology used in wireless communications, may also suffer from corruption during transmission. In such cases, the image restoration processor 100 can be effectively applied to restore these images, mitigating any potential misunderstandings or misinterpretations caused by the corruption. This can help ensure accurate and reliable image processing for radar and MIMO applications, enhancing the overall performance and reliability of these systems. Accordingly, in various embodiments, the image data is from one or more degraded hyperspectral images or from one or more degraded multispectral images.
In the image restoration processor 200 illustrated in
The image selector 260 is configured to receive the recovered images from the image reconstructor 240 and determine which one of the recovered images is to be outputted according to the image evaluation metrics for all of the recovered images. In one embodiment, the image evaluation metrics include PSNR (peak signal-to-noise ratio), SSIM (structural similarity index measure), and RMSE (root mean square error). The recovered image with the best image evaluation metrics can be transmitted to the display 250 from the image selector 260 for displaying. In various embodiments, the display 250 electrically communicating with the image selector 260 is configured to show which function is introduced as the sparse regularizer for the output recovered image, so as to inform the user of which sparse regularizer is eventually applied to the output. In various embodiments, the image restoration processor 200 can further include an image comparator 262 configured to compare the recovered images and quantify their recovery performance, as in the tables above (e.g., TABLE I to TABLE V), thereby informing users of the quantified comparison of the recovered images.
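The selection logic itself can be very simple; the sketch below (names illustrative) ranks candidate restorations by a caller-supplied scoring function, e.g., one combining the metrics listed above with higher scores being better.

```python
def select_best(candidates, score):
    # candidates: dict mapping regularizer name -> recovered image.
    # score: callable returning a quality value for a recovered image.
    best = max(candidates, key=lambda name: score(candidates[name]))
    return best, candidates[best]
```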
As described above, in accordance with the various embodiments of the present invention, a framework is provided to generate sparsity-promoting regularizers along with a theoretical analysis. The framework of the present invention can be applied to three M-estimators, namely, Welsch, Cauchy and GMC, to produce the sparsity-inducing regularizers of those functions. It is the first time that sparsity regularizers associated with these three functions are generated. Moreover, the obtained regularizers are applied to low-rank recovery, and three corresponding algorithms with convergence guarantees are proposed. Numerical examples based on both simulations and real-world data demonstrate that the image recovery method of the present invention can consistently achieve outstanding restoration performance. As such, the image restoration processor of the present invention can be provided to perform the image recovery.
The Appendices as aforementioned are provided as follows.
where φλ(y) is the associated implicit regularizer (IR). The IRs of the Welsch, Cauchy and Geman-McClure M-estimators are found in TABLE VI, but they cannot achieve sparseness. When φλ(y)=λ|y|, the solution to (61), denoted as Pφ(x), is zero for |x|≤λ. That is to say, the l1-norm can attain sparseness and set y to zero when the magnitude of x is less than a threshold. However, when the penalty φλ(y) is taken as the IR of one of these M-estimators, sparseness cannot be attained since the solution is zero if and only if x=0. Besides, to clearly illustrate the conclusion that, among the regularizers tabulated in TABLE VI, only the l1-norm can attain sparseness, the curves of Pφ(x) are shown in
Proof of Proposition 1: Since ƒ*(y) is convex in (16), it can be known that the corresponding terms in (18) with respect to (w.r.t.) y are concave; thus m(y) is convex w.r.t. y. That is, (19) is a convex problem.
where the penultimate equation is obtained because lg,λ(x) is an even function. Thus, φg,λ(y) is symmetric.
Besides, it is obtained via (21):
Then, the solution x* in (63) is discussed in terms of two cases, namely, y>0 and y=0. Moreover, q(x)=x·y−ƒ(x) is defined, which is a concave function since ƒ(x) is convex; thus the solution x* for (63) satisfies ∇q(x*)=0, and y=ƒ′(x*) is obtained.
is unique, and it satisfies:
implying that y<x* due to a·g′(|x*|)sign(x*)>0 since g(x) is a monotonically increasing function, and by (63) (x*=y+λφ′g,λ(y); note that ∂φg,λ(y) is replaced with φ′g,λ(y) because φg,λ(y) is differentiable for |y|>0), it is obtained that φ′g,λ(y)>0. Therefore, φg,λ(y) increases with y for y>0, and φg,λ(y) is nonnegative because φg,λ(0)=0.
If y1·y2>0 and it is first assumed that y1>0 and y2>0, then for any y>0, (65) holds, and by (16), it is obtained that y increases as x* increases. Combining x*=y+λφ′g,λ(y) and (65), it is known that φ′g,λ(y)=a/λ·g′(|x*|)sign(x*). Besides, it is assumed that g(x) is a concave function for x>λ, and then g′(x) decreases as x increases. Therefore, φ′g,λ(y) decreases as y increases. That is, φg,λ(y) is a concave function. Since φg,λ(0)=0, concavity implies subadditivity; thus, φg,λ(y1+y2)≤φg,λ(y1)+φg,λ(y2). On the other hand, if y1·y2>0 with y1<0 and y2<0, it is easy to obtain via (62):
Finally, if y1·y2<0 and it is supposed that y1>0 and y2<0, the conclusion is drawn immediately from φg,λ(y1+y2)≤φg,λ(y1+|y2|)≤φg,λ(y1)+φg,λ(|y2|)=φg,λ(y1)+φg,λ(y2).
Proof of Proposition 2: When σ≤√2λ, γ≤λ and τ≤√3λ/2, it is known that Δd for φσ,λ(y), φγ,λ(y) and φτ,λ(y) is non-increasing. When |x|=λ, Δd attains its maximum value, i.e., λ; thus Δd≤λ for the above three regularizers. In contrast, when φ(y)=λ|y|, Δd=λ; thus the proof is complete.
Proof of Proposition 3: According to Definition 3, φ·,λ(y) is separable, thus there is:
with solution being:
thus,
where Pφ·,λ(·) is a point-wise operator.
Besides, when σ≤√2λ, γ≤λ and τ≤√3λ/2, it can be known via Proposition 2:
thus, there is ∥x−Pφ·,λ(x)∥2≤√n·λ.
Proof of Proposition 4: The proof follows exactly as in the case of Proposition 3 because Definition 4 shows that φ·,λ(Y) is separable.
In addition, when σ≤√2λ, γ≤λ and τ≤√3λ/2, via Proposition 2, there is:
thus, there is ∥X−Pφ·,λ(X)∥F≤√(mn)·λ.
Proof of Proposition 5: Let X=U Diag(s) V^T be the SVD of a rank-r matrix X∈ℝ^{m×n}, where s=[s1, s2, . . . , sr]^T is the vector of singular values, and
there is:
where the last inequality is due to Proposition 3.
Therefore, the claimed bound is obtained.
Before the proof, the definition of a critical point and the following proposition are first stated as follows.
then, {M, E, Λ} is a critical point of ƒ.
According to (17), there is
By (21) or the section (b) of the corresponding figure, it can be checked that h1(σ, λ) increases with λ and σ. Similarly, it can be verified that h2(γ, λ) and h3(τ, λ) increase with λ and with their corresponding parameters γ and τ, respectively.
Let Uk Diag(sk)(Vk)T be the SVD of the matrix
where sk=[s1k, s2k, . . . , srk]^T is the vector of singular values, r is the rank of M̃k, and r≪min(m,n). Thus, {Λk} is bounded because:
where (a) is owing to Proposition 3.
According to the update equations of Ek+1 and Λk, it is obtained:
Thus, there is:
where the last inequality is owing to Proposition 4.
Similarly, according to the updates of Mk+1 and Λk, it is obtained:
Thus, there is:
It is easy to obtain:
However, the preceding Frobenius norm conditions cannot guarantee the boundedness of Mk and Ek. Next, their boundedness will be established via the boundedness of ρ
After updating Λk+1 and ρk+1, there is:
where (b) and (c) are due to Proposition 6 and (66), respectively, and Q is a constant w.r.t. u. Combining (67) and (68), there is:
Thus, there is:
Given that M0, E0 and Λ0 are bounded, it is known that {Mk, Ek, Λk} are bounded.
By the Bolzano-Weierstrass theorem, the boundedness of {Mk, Ek, Λk} implies that there exists at least one accumulation point {M*, E*, Λ*} for {Mk, Ek, Λk}. That is, there exists a subsequence {Mkj, Ekj, Λkj} converging to {M*, E*, Λ*}.
Since the proximity operator gives the closed-form solution to the Moreau envelope of the sparsity-inducing regularizers, that is, Ek+1 and Mk+1 are the minimizers of (56) and (58), respectively, although the regularizers are nonconvex, there is:
In addition,
Thus, the following can be obtained via combining (69) and (70):
Therefore, any accumulation point {M*, E*, Λ*} is a critical point.
Motivated by PCP, the parameter λ is chosen as c/√max(m,n), i.e., λ=c/√max(m,n), where c is a constant. The value of c is investigated in three cases, that is, different matrix ranks, ratios of outliers and matrix dimensions.
It can be seen that the range of proper values of c for the proposed algorithms is larger than that of PCP, and compared with PCP, the matrix rank and dimensions have little impact on the choice of c for the proposed methods. Besides, the range of c for all techniques decreases as the ratio of outliers increases; thus λ=1/√max(m,n) is set for the synthetic data for convenience, because all the methods, including PCP, have comparable recovery performance although this λ is not the optimal value for the current settings.
The functional units and modules of the image restoration processor and methods in accordance with the embodiments disclosed herein may be embodied in hardware or software. That is, the claimed image restoration processor may be implemented entirely as machine instructions or as a combination of machine instructions and hardware elements. Hardware elements include, but are not limited to, computing devices, computer processors, or electronic circuitries including but not limited to application specific integrated circuits (ASIC), field programmable gate arrays (FPGA), microcontrollers, and other programmable logic devices configured or programmed according to the teachings of the present disclosure.
Computer instructions or software codes running in the computing devices, computer processors, or programmable logic devices can readily be prepared by practitioners skilled in the software or electronic art based on the teachings of the present disclosure.
The image restoration processor may include computer storage media, transitory and non-transitory memory devices having computer instructions or software codes stored therein, which can be used to program or configure the computing devices, computer processors, or electronic circuitries to perform any of the processes of the present invention. The storage media, transitory and non-transitory memory devices can include, but are not limited to, floppy disks, optical discs, Blu-ray Discs, DVDs, CD-ROMs, magneto-optical disks, ROMs, RAMs, flash memory devices, or any type of media or devices suitable for storing instructions, codes, and/or data.
The image restoration processor may also be configured as distributed computing environments and/or Cloud computing environments, wherein the whole or portions of machine instructions are executed in distributed fashion by one or more processing devices interconnected by a communication network, such as an intranet, Wide Area Network (WAN), Local Area Network (LAN), the Internet, and other forms of data transmission medium.
The foregoing description of the present invention has been provided for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations will be apparent to the practitioner skilled in the art.
The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, thereby enabling others skilled in the art to understand the invention for various embodiments and with various modifications that are suited to the particular use contemplated.