This disclosure relates to image processing, including object recognition.
Reconstructing a digital image and/or digital signal from measurement data is generally a linear inverse problem. A variety of techniques have been used for image estimation, including iterative approaches and deep learning approaches. Conventional techniques using either of these approaches suffer from several disadvantages. In conventional iterative imaging systems, disadvantages include the limitation to a narrow set of signal priors (statistics), slow convergence, and an inability to perform real-time imaging.
For example, the Iterative Shrinkage and Thresholding Algorithm (ISTA) restricts the prior distribution of the desired signal to the Laplace distribution, which does not capture desired statistical properties including self-similarity, heavy-tailed marginal distributions, and self-reinforcement among local coefficients, and it is slow to converge.
The Compressive Sampling Matching Pursuit (CoSaMP) algorithm likewise restricts the prior distribution of the desired signal to the Laplace distribution, which does not capture desired statistical properties including self-similarity, heavy-tailed marginal distributions, and self-reinforcement among local coefficients. Its success is highly dependent on signal sparsity, an assumption that is not accurate for many signals and images.
The l1 least squares (l1-LS) method restricts the prior distribution of the desired signal to the Laplace distribution, which does not capture desired statistical properties including self-similarity, heavy-tailed marginal distributions, and self-reinforcement among local coefficients. It can also be slow to converge.
The Bayesian Compressive Sensing (BCS) algorithm restricts the prior distribution of the desired signal to the Student's t distribution, which does not capture desired statistical properties including self-similarity, heavy-tailed marginal distributions, and self-reinforcement among local coefficients. It enforces higher levels of signal sparsity, which is not accurate for many signals and images, and it is slow to converge.
The Hierarchical Bayesian Maximum a Posteriori (HB-MAP) algorithm uses a compound Gaussian prior distribution, with the scale variable fixed to a log-normal distribution, to iteratively solve a linear inverse problem with particular application in tomographic imaging. This method outperforms comparative iterative estimation approaches, with Compound Gaussian Least Squares (below) performing comparably, but it is computationally intensive and slow to converge.
The Compound Gaussian Least Squares (CG-LS) algorithm uses a compound Gaussian prior distribution, with the scale variable fixed to a log-normal distribution, to iteratively and alternately solve a linear inverse problem with particular application in tomographic imaging. This method outperforms comparative iterative estimation approaches, with HB-MAP performing comparably, but it is slow to converge.
In conventional deep learning imaging systems, disadvantages include the reliance on significant amounts of imaging training data and the black-box nature of these systems. For example, Reconstruction Network (ReconNet) uses a standard convolutional neural network to solve a linear inverse problem with particular application in compressive sensing. In using standard convolutional layers, this method is a black-box process where there is little understanding as to what each layer in the neural network is representing. The method does not incorporate any prior information about the signals of interest. These factors consequently require ReconNet to be trained on a significant number of training samples to be accurate.
The Iterative Shrinkage and Thresholding Algorithm Network (ISTA-Net) structures a deep neural network around the iterative shrinkage and thresholding algorithm for solving linear inverse problems with particular application in compressive sensing. The proximal operator evaluation in ISTA is replaced by a sequence of convolutional layers in ISTA-Net. ISTA-Net restricts the prior distribution of the desired signal to the Laplace distribution, which does not capture desired statistical properties including self-similarity, heavy-tailed marginal distributions, and self-reinforcement among local coefficients. The method performs well when trained on a significant amount of training data, but it stagnates when few training samples are available.
The Memory Augmented Deep Unfolded Network (MADUN) structures a deep neural network around the iterative shrinkage and thresholding algorithm for solving linear inverse problems with particular application in compressive sensing. The proximal operator evaluation in ISTA is replaced by a convolutional long short-term memory network in MADUN. This method does not incorporate any prior information about the signals of interest and instead learns this information from data provided to the network. Consequently, MADUN requires a significant number of training samples to train the network to sufficient accuracy. A significant component of this method, the long short-term memory networks, is a black-box process where the precise relationship between the signal and the network layer outputs is unclear.
The accompanying drawings, which are incorporated in and constitute part of the specification, illustrate embodiments of the disclosure and, together with the general description given above and the detailed descriptions of embodiments given below, serve to explain the principles of the present disclosure. In the drawings:
an example deep neural network with an input layer L0 ∈ ℝ^3, a first hidden layer L1 ∈ ℝ^5, a second hidden layer L2 ∈ ℝ^4, a third hidden layer L3 ∈ ℝ^2, and an output layer L4 ∈ ℝ^5 in accordance with an embodiment of the present disclosure;
Features and advantages of the present disclosure will become more apparent from the detailed description set forth below when taken in conjunction with the drawings, in which like reference characters identify corresponding elements throughout. In the drawings, like reference numbers generally indicate identical, functionally similar, and/or structurally similar elements. The drawing in which an element first appears is indicated by the leftmost digit(s) in the corresponding reference number.
In the following description, numerous specific details are set forth to provide a thorough understanding of the disclosure. However, it will be apparent to those skilled in the art that the disclosure, including structures, systems, and methods, may be practiced without these specific details. The description and representation herein are the common means used by those experienced or skilled in the art to most effectively convey the substance of their work to others skilled in the art. In other instances, well-known methods, procedures, components, and circuitry have not been described in detail to avoid unnecessarily obscuring aspects of the disclosure.
References in the specification to “one embodiment,” “an embodiment,” “an exemplary embodiment,” etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to understand that such description(s) can affect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.
Embodiments of the present disclosure use a deep neural network for image estimation using a (learned) compound Gaussian prior. For example, embodiments of the present disclosure use an unrolled deep network that solves linear inverse problems with particular application in tomographic imaging and image compressive sensing. Systems and methods in accordance with embodiments of the present disclosure result in image reconstructions with a higher similarity index than those produced by conventional methods. Image reconstructions enabled by embodiments of the present disclosure are useful in a variety of applications, including radar, sonar, medical, and tomographic imaging systems.
Embodiments of the present disclosure provide a deep neural network, Deep Regularized Compound Gaussian Network (DR-CG-Net), that is formed from applying algorithm unrolling to a generalized Compound Gaussian Least Squares (CG-LS) iterative algorithm. The generalized CG-LS algorithm incorporates a compound Gaussian prior, which decomposes a signal of interest into the product of a scale variable component and a Gaussian component. The compound Gaussian prior is a generalization of the assumed Laplace prior and Student's t distribution and has been shown empirically to better capture statistical properties of signals of interest. Additionally, the generalized CG-LS algorithm does not fix the scale variable distribution and instead retains an implicit distribution that can be specified on an application-specific basis.
In an embodiment, DR-CG-Net learns the scale variable distribution from the training data and is the first deep neural network to incorporate learning a prior distribution while still constraining to a specific class of priors. In an embodiment, DR-CG-Net constrains to the powerful class of compound Gaussian priors. In doing so, DR-CG-Net maintains the useful impact of the compound Gaussian prior, but has greater flexibility and learning capacities within this class of priors to represent signals.
Systems and methods according to embodiments of the present disclosure significantly outperform comparative methods, or perform comparably to them, in tomographic imaging and compressive sensing when only a small amount of training data is available. Additionally, when larger amounts of training data are available, systems and methods according to embodiments of the present disclosure outperform conventional tomographic imaging techniques.
It is an aspect of the present disclosure to provide a method for reconstructing an image. The method includes accessing measurement data with a computer system, where the measurement data have been acquired from a subject using an imaging system. A neural network is also accessed with the computer system, where the neural network has been trained on training data to reconstruct an image from data consistent with the measurement data. The neural network includes an unrolled iterative reconstruction algorithm that implements a compound Gaussian prior. A reconstructed image is generated from the measurement data by inputting the measurement data to the neural network using the computer system, thereby generating the reconstructed image from an output of the neural network. The reconstructed image may then be output with the computer system.
It is another aspect of the present disclosure to provide a method for training a neural network to reconstruct an image from measurement data. The method includes accessing training data with a computer system, where the training data include matched pairs of measurement data and sparse domain coefficients. A neural network is trained on the training data, using the computer system, to learn parameters of the neural network. The neural network includes an unrolled iterative reconstruction algorithm implementing a compound Gaussian prior. The neural network and the learned parameters are stored with the computer system as a trained neural network.
Described here are systems and methods for reconstructing images and/or signals using a deep learning model that incorporates a compound Gaussian prior. As a non-limiting example, the deep learning model can be implemented as a deep neural network that incorporates a broad compound Gaussian prior distribution for the signal-of-interest (e.g., the image or signal to be reconstructed) into a deep learning framework for solving general linear inverse problems. In some implementations, the disclosed systems and methods may be referred to as a “CG-Net.” Unlike previous unrolling approaches, the systems and methods described in the present disclosure do not need to replace any part of the original optimization of the iterative reconstruction framework with a neural network structure that must be learned from scratch, which provides for a better interpretability of the CG-Net.
It is an aspect of the systems and methods described in the present disclosure to provide a deep learning-based reconstruction algorithm that incorporates a compound Gaussian prior. The compound Gaussian prior may be a generalization of the Laplace prior and/or Student's t-distribution. Advantageously, the compound Gaussian prior can better capture statistical properties of the signals of interest. Accordingly, by incorporating a compound Gaussian prior into the reconstruction techniques described in the present disclosure, an improvement in reconstructed signal quality can be achieved relative to previous deep learning-based reconstruction techniques, even when only a small number of training data samples is available. Moreover, the incorporation of the compound Gaussian prior in the disclosed systems and methods provides a more computationally efficient implementation that is faster and easier to implement in a deep neural network structure. Accordingly, the functioning of the computer system implementing the reconstruction techniques described in the present disclosure is also improved by way of the reduced computational burden on the limited resources of the computer system (e.g., reduced memory burden, reduced computational complexity).
Image reconstruction is often formulated as an underdetermined linear inverse problem. In other words, the forward measurement model may be formulated as:

y = Ψx + ν,   (1)

where x ∈ ℝ^n is a vectorized, n = N×N, image observed through a measurement matrix Ψ ∈ ℝ^{m×n}, with additive white noise, ν ∈ ℝ^m, and where y ∈ ℝ^m are the underdetermined linear measurements. Typically, m ≪ n, implying that obtaining the signal or image, x, given the measurements, y, does not produce a unique solution. However, exploiting the sparsity of images under a change of basis makes it possible to uniquely reconstruct an image with measurements sampled below the Nyquist frequency. Letting Φ ∈ ℝ^{n×n} be a sparsity-inducing transformation, such as the discrete Fourier transform or a wavelet dictionary, letting c ∈ ℝ^n be the sparse coefficients of x, and defining A = ΨΦ, Eqn. (1) can be rewritten as:

y = Ac + ν.   (2)
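To make the notation concrete, the following sketch builds the measurement model of Eqn. (2) numerically. The random Gaussian Ψ and the DCT basis used for Φ are illustrative stand-ins only, not operators required by the present disclosure.

```python
import numpy as np
from scipy.fftpack import idct

# Illustrative construction of y = Psi x + nu = A c + nu (Eqn. (2)); the
# matrices here are stand-ins, not the operators of any particular embodiment.
rng = np.random.default_rng(0)
n, m = 64, 32                                    # signal length and number of measurements, m << n

Psi = rng.standard_normal((m, n)) / np.sqrt(m)   # measurement matrix
Phi = idct(np.eye(n), norm="ortho", axis=0)      # sparsity transform: x = Phi @ c
A = Psi @ Phi                                    # combined operator A = Psi Phi

c = np.zeros(n)
c[rng.choice(n, size=5, replace=False)] = rng.standard_normal(5)   # sparse coefficients
nu = 0.01 * rng.standard_normal(m)               # additive white noise
y = A @ c + nu                                   # underdetermined linear measurements
```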
Now, the inverse estimation problem to Eqn. (2) aims to recover the sparse coefficients, c, from the measurements, y, corresponding to an observation method, Ψ, and sparsity transform, Φ, chosen a priori. Often, image estimation is solved by an iterative algorithm, which involves the minimization of an objective function and may include subspace projections between minimization steps. Many iterative algorithms for image estimation have been developed, including Iterative Shrinkage and Thresholding, Basis Pursuit, Bayesian Compressive Sensing (BCS), and Compressive Sampling Matching Pursuit (CoSaMP).
One useful way to formulate inverse problems in imaging is by Bayesian estimation. In particular, consider the maximum a posteriori (MAP) estimation of c from Eqn. (2):

c* = argmax_c p(c|y) = argmin_c {−log p(y|c) − log p(c)},

which can, equivalently, be viewed as a regularized least squares (RLS) optimization:

c* = argmin_c ||y − Ac||_2^2 + R(c).

As the regularization satisfies R(c) ∝ −log(p(c)), the choice of prior density, p(c), of the sparse coefficients, c, is a useful component for incorporating domain-level knowledge into the image reconstruction problem.
With that in mind, the systems and methods described in the present disclosure make use of a compound Gaussian prior. Sparse coefficients of natural images exhibit self-similarity, heavy-tailed marginal distributions, and self-reinforcement among local coefficients. Such properties are not encompassed by the generalized Gaussian prior typically assumed for the image sparsity coefficients. Instead, a class of densities known as compound Gaussian (“CG”) densities, or Gaussian scale mixtures, better captures the statistical properties of sparse coefficients of natural images and images from other modalities, such as radar. A useful formulation of the CG prior lies in modeling the sparse coefficients of images as the Hadamard product:
c = z ⊙ u,

such that z = h(x), where h: ℝ → ℝ is a componentwise, positive, nonlinear function, x follows a multi-scale Gaussian tree process, u ~ N(0, Σ_u), and u and z are independent random variables. The CG prior subsumes many well-known distributions, including the generalized Gaussian, allowing an interpretation of the CG prior as a generalization of compressive sensing work.
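For illustration only, a sample from a compound Gaussian prior can be drawn as follows. The i.i.d. Gaussian x and identity covariance for u are simplifying assumptions of this sketch; the disclosure models x with a multi-scale Gaussian tree process and u with a general covariance Σ_u.

```python
import numpy as np

# Draw one sample c = z ⊙ u from a compound Gaussian (Gaussian scale mixture)
# prior with z = h(x) = sqrt(exp(x / alpha)).  Simplifications: x is i.i.d.
# Gaussian and u ~ N(0, I), rather than the richer models in the disclosure.
rng = np.random.default_rng(1)
n, alpha = 64, 1.0

x = rng.standard_normal(n)              # scale-driving Gaussian variable
z = np.sqrt(np.exp(x / alpha))          # positive, componentwise nonlinearity h(x)
u = rng.standard_normal(n)              # Gaussian component
c = z * u                               # Hadamard product: compound Gaussian sample
```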
Deep neural networks (DNN) generally include three main layer types: input, hidden, and output. An example DNN, in accordance with an embodiment of the present disclosure, has an input layer L0 ∈ ℝ^3, a first hidden layer L1 ∈ ℝ^5, a second hidden layer L2 ∈ ℝ^4, a third hidden layer L3 ∈ ℝ^2, and an output layer L4 ∈ ℝ^5. The input layer, L0, is assigned the input data to the network, such as the measurements, y, from Eqn. (2). The output layer, L4, returns the desired output from the network, when provided an input, such as the wavelet coefficients, c, from Eqn. (2). Finally, the hidden layers, L1, L2, and L3, are transformations of the input data that produce the desired output. In general, L_i ∈ ℝ^{d_i}.
Layers of a DNN are connected by functions, ƒ_i, parameterized by θ_i, such that

L_{i+1} = ƒ_i(L_i; θ_i),

where L_i is network layer i and the network contains k hidden layers. For a standard, fully connected, neural network (NN),

ƒ_i(x; θ_i) = σ(W_i x + b_i),

where x ∈ ℝ^n is an input vector, W_i ∈ ℝ^{m×n} is a weight matrix, b_i ∈ ℝ^m is a bias vector, and σ: ℝ → ℝ is a componentwise activation function. Standard activation functions include the sigmoid, Rectified Linear Unit (ReLU), and hyperbolic tangent. In networks described in the present disclosure, the input may include noisy Radon transform measurements, or other medical imaging or other imaging data, y_i ∈ ℝ^m, from Eqn. (2), and the output layer returns the estimated wavelet coefficients, denoted by c(y_i; Θ) ∈ ℝ^n, where Θ = [θ_0, . . . , θ_k] are the network parameters.
Using a training dataset, 𝒟 = {(y_i, c_i): i = 1, 2, . . . , M}, the network parameters, Θ = [θ_0, . . . , θ_k], may be learned by optimizing a loss or error function, ℒ(Θ), between the network outputs, c(y_i; Θ), and the actual wavelet coefficients, c_i. Common loss functions for image reconstruction neural networks include mean-squared error (MSE), normalized mean-squared error (NMSE), peak signal-to-noise ratio (PSNR), SSIM, or linear combinations of these.
In some implementations, algorithm unrolling may be used when constructing the deep learning-based machine learning models described in the present disclosure. Algorithm unrolling structures a DNN based upon an iterative image estimation algorithm (IIEA). Algorithm unrolling places the operations from each step j of the IIEA as the function ƒj defining the layers in a DNN. Then the parameters, θj, on each step j of the IIEA parameterize ƒj in the DNN. By training the unrolled DNN, each θj is learned, which optimizes the IIEA, to produce improved image estimates. Algorithm unrolling can provide significant performance improvements in image reconstruction while offering simple interpretability of the network layers.
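As a toy illustration of algorithm unrolling (using a simple ISTA recursion rather than the CG-LS algorithm unrolled in the present disclosure), each iteration below plays the role of one network layer, and the per-layer step size and threshold stand in for the parameters θ_j that would be learned during training.

```python
import numpy as np

def soft_threshold(v, t):
    """Componentwise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def unrolled_ista(y, A, layer_params):
    """Toy unrolled ISTA: layer_params = [(eta_1, tau_1), ..., (eta_K, tau_K)],
    one (step size, threshold) pair per layer.  In an unrolled network these
    per-layer parameters would be learned from training data."""
    c = np.zeros(A.shape[1])
    for eta, tau in layer_params:                              # layer j = iteration j
        c = soft_threshold(c - eta * (A.T @ (A @ c - y)), tau)
    return c
```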
It is an aspect of the systems and methods described in the present disclosure to provide an iterative image estimation algorithm, referred to as compound Gaussian least squares (CG-LS), for solving Eqn. (2) with a general sensing matrix Ψ, under the assumption of a global compound Gaussian prior. Informed by the statistical representation of image coefficients through the compound Gaussian prior, the CG-LS iterative image reconstruction algorithm, which enforces the compound Gaussian prior, is used as the reconstruction framework for developing the CG-Net deep learning reconstruction model. CG-LS is based on an RLS optimization where a regularization term is chosen to enforce the CG prior. Furthermore, the CG-LS algorithm is unrolled into a DNN that may be referred to as CG-Net. For training CG-Net, images are estimated from Radon transform measurements, or other medical imaging or other imaging data types, as these data types underlie many imaging applications, including microwave, acoustic, and medical imaging modalities.
An example of the CG-LS iterative reconstruction algorithm is now described in more detail. Consider

c* = argmin_c ||y − Ac||_2^2 + R(c),

the regularized least squares estimate of c from Eqn. (2). The regularization function, R: ℝ^n → ℝ, is determined by the prior enforced on c. For the compound Gaussian prior, we decompose c = z ⊙ u where z = h(x) = √(exp(x/α)), for Gaussian random vectors u and x, and we aim to estimate z and u. Let h be the component-wise, invertible, nonlinear function in the CG prior and ƒ = h⁻¹ the inverse nonlinearity. Hence, the following regularization may be selected, as an example:

R(z, u) = λ||u||_2^2 + μ||ƒ(z)||_2^2,

to enforce normality of u and x = ƒ(z), and the regularized least squares estimate [(z*)^T (u*)^T]^T may be considered to be given as the solution to:

argmin_{z,u} F(z, u) = argmin_{z,u} ||y − A(z ⊙ u)||_2^2 + λ||u||_2^2 + μ||ƒ(z)||_2^2.   (9)
Based on the component-wise nonlinearity h(x) = √(exp(x/α)), the inverse is also a component-wise function and may be given by ƒ(z) = 2α ln(z). The regularization, R(z, u), may approximate the log prior from a MAP estimation while still capturing the desired statistical properties for the prior. Empirically, this approximation is effective for the underlying problem of estimating image wavelet coefficients, or other sparse domain coefficients.
For a vector w ∈ ℝ^n, D{w} = diag(w) can be defined and A_w = A D{w}. Due to the explicit joint estimation in Eqn. (9), optimization may be performed by block coordinate descent, which on iteration k may be given by the following:

z_k = argmin_z ||y − A_{u_{k−1}} z||_2^2 + μ||ƒ(z)||_2^2,   (10)

u_k = argmin_u ||y − A_{z_k} u||_2^2 + λ||u||_2^2.   (11)
Algorithm 1 below details pseudocode for the CG-LS estimate of u and z.
Note that Eqn. (11) is a Tikhonov regularization problem with a solution given by:

u_k = (A_{z_k}^T A_{z_k} + λI)^{−1} A_{z_k}^T y.   (12)
As a non-limiting example, Eqn. (10) can be solved using damped Newton descent, which uses the gradient and Hessian of Eqn. (10), given respectively by:
For clarity, “step” may be used to refer to an update of z by damped Newton descent (or other techniques) and “iteration” to refer to an update of both u and z in CG-LS. It can be shown that Eqn. (10), with ƒ(z) = h⁻¹(z) = 2α ln(z), is a convex cost function so long as z ∈ (0, e)^n. Hence, in some examples, a nonlinear thresholding operator may be applied to z after each Newton descent step, such as the following component-wise operator:
where ε is a small positive real number. Thus, z on Newton descent step j+1 of iteration k, denoted by z_k^{j+1}, is obtained by applying this thresholding operator to the damped Newton update of z_k^j, where η_k^j is a step size often chosen via a backtracking or Armijo line search, or the like.
As a non-limiting example, a convergence criterion for CG-LS can be implemented as follows: Let δ be a small positive real number and define the Newton decrement λ(z; u, y) ∈ ℝ as

λ(z; u, y) = (∇_z F(z; u, y)^T [∇_z^2 F(z; u, y)]^{−1} ∇_z F(z; u, y))^{1/2},

where F denotes the cost function in Eqn. (10).
The Newton decrement serves as a convergence metric for Newton descent. Thus, before each Newton descent step, that is, between lines 5 and 6 in Algorithm 1, a check can be made whether λ(z_k^j; u_{k−1}, y) ≤ δ. When this holds, the Newton descent steps are exited and the algorithm proceeds with the next u estimate, u_k, given in line 12 of Algorithm 1. Once λ(z_k^0; u_{k−1}, y) ≤ δ, that is, Newton descent is unnecessary because the error tolerance has been reached, it can be said that CG-LS has converged and the algorithm can be terminated. If no k ∈ {1, 2, . . . , K} exists such that λ(z_k^0; u_{k−1}, y) ≤ δ, then CG-LS did not converge.
In addition to Algorithm 1, the CG-LS algorithm may also be implemented with Algorithm 2 illustrated below.
(In Algorithm 2, the initialization is z_0 = ℛ_{a,b}(A^T y) and u_0 = (A_{z_0}^T A_{z_0} + λI)^{−1} A_{z_0}^T y.)
In this example, steps are performed using a steepest descent instead of damped Newton descent. As described above, CG-LS is an iterative reconstruction algorithm that approximately solves the optimization in Eqn. (9).
The cost function is a regularized least squares cost function where, as given by the CG prior, the sparse coefficients are decomposed as c = z ⊙ u and the regularization may be taken to be R(c) = R(u, z) = λ||u||_2^2 + μ||ƒ(z)||_2^2 to enforce normality of u and of x = ƒ(z), a Gaussian tree process, as desired from the CG prior.
As the optimization in Eqn. (10) cannot be solved analytically, in some examples a steepest descent approach can be used to iteratively and approximately solve Eqn. (10). Given a norm ||·|| on ℝ^n and a differentiable function, g(x): ℝ^n → ℝ, the steepest descent direction, d: ℝ^n → ℝ^n, is defined as

d(x) = ||∇g(x)||_* · argmin{∇g(x)^T v : ||v|| = 1},

where ||·||_* is the dual norm given by ||w||_* = max_{||v||=1} w^T v. For instance, the Euclidean norm produces d(x) = −∇g(x). Additionally, each steepest descent step may be scaled by a step size, η = η(x), determined by a backtracking line search. That is, given two user-chosen parameters τ ∈ (0, ½] and β ∈ (0, 1), the step size is chosen to be η = β^r, where r = r(x) is the minimum, non-negative integer such that

g(x + β^r d(x)) ≤ g(x) + τ β^r ∇g(x)^T d(x).   (20)

In some examples, r can be determined by incrementing from zero until Eqn. (20) is first satisfied. With an initial guess, x_0, the sequence {x_j}_{j=1}^∞ generated by steepest descent is given by x_j = x_{j−1} + η(j) d(x_{j−1}), where η(j) depends on the previous step x_{j−1} as η(j) = β^{r(x_{j−1})}.
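A generic implementation of this backtracking rule is sketched below; the parameter values and the quadratic test function are illustrative only, and the specific cost and norms used in CG-LS are not assumed.

```python
import numpy as np

def backtracking_step(g, grad_g, x, d, tau=0.5, beta=0.5, r_max=50):
    """Backtracking line search: return eta = beta**r for the smallest
    non-negative integer r satisfying the condition of Eqn. (20),
    g(x + beta**r d) <= g(x) + tau * beta**r * grad_g(x)^T d."""
    gx, slope = g(x), grad_g(x) @ d
    for r in range(r_max + 1):
        eta = beta ** r
        if g(x + eta * d) <= gx + tau * eta * slope:
            return eta
    return beta ** r_max                      # fallback if the condition is never met

# Example: one Euclidean steepest descent step on a simple quadratic.
g = lambda x: 0.5 * x @ x
grad_g = lambda x: x
x = np.array([3.0, -2.0])
d = -grad_g(x)                                # steepest descent direction for the 2-norm
x_next = x + backtracking_step(g, grad_g, x, d) * d
```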
Now, when applying steepest descent to the cost function in Eqn. (10), z_k^j can be referred to as the estimate of z on steepest descent step j of iteration k. For generality, a different norm may define each steepest descent step, as is the case in Newton descent for a convex cost function with non-constant Hessian. The descent direction corresponding to norm ||·||^{(k,j)}, with dual norm ||·||_*^{(k,j)}, for steepest descent step j of iteration k can be denoted as d_k^j = d_k^j(z). Furthermore, the backtracking line search step size for steepest descent step j of iteration k can be denoted as η_k^{(j)} = β^{r(z_k^{j−1})}. Note, d_k^j = d_k^j(z) = d_k^j(z; u_{k−1}, y), as d_k^j is parameterized by u_{k−1} and y. Let J be the maximum number of steepest descent steps; then, for notation, z_k^J = z_k = z_{k+1}^0, as described above.
As described above, Eqn. (11) has the Tikhonov solution in Eqn. (12). Next, the initial estimate of z can be defined as z_0 = ℛ_{a,b}(A^T y), where the modified ReLU function, ℛ_{a,b}, is defined in terms of ReLU(x) = max{0, x} and is applied elementwise to A^T y. ℛ_{a,b} is a projection operator onto the interval [min{a, b}, max{a, b}]. This eliminates negative values, as z should have positive components, and limits the maximum values in the initial z estimate. The initial u estimate, denoted as u_0, is given by Eqn. (12).
Finally, the gradient, ∇_z F(u, z), and a user-chosen parameter δ > 0 determine convergence of CG-LS. On each steepest descent step j of iteration k, a check is made whether ||∇_z F(u_{k−1}, z_k^{j−1})||_*^{(k,j)} < δ. When this holds, the steepest descent steps are exited, taking z_k = z_k^{j−1}. Once ||∇_z F(u_{k−1}, z_k^0)||_*^{(k,1)} < δ, CG-LS has converged and the estimates u_{k−1} and z_{k−1} are returned. Otherwise, CG-LS may terminate after a user-chosen maximum number of iterations, K, or some other user-defined stopping condition.
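For illustration, a heavily simplified sketch of the CG-LS alternation is given below. It assumes ƒ(z) = ln(z), uses a fixed gradient step size in place of the backtracking line search, substitutes simple clipping for the modified ReLU projection ℛ_{a,b}, and omits the convergence tests, so it should be read as a schematic of the alternation rather than the disclosed algorithm.

```python
import numpy as np

def cg_ls_sketch(y, A, lam=0.3, mu=2.0, K=20, J=5, eta=1e-3, eps=1e-6):
    """Simplified CG-LS alternation under the assumptions noted above:
    J gradient steps on z followed by a Tikhonov update of u per iteration."""
    n = A.shape[1]

    def tikhonov(z):
        A_z = A * z                                  # A D{z}
        return np.linalg.solve(A_z.T @ A_z + lam * np.eye(n), A_z.T @ y)

    z = np.clip(A.T @ y, eps, None)                  # crude positive initialization of z
    u = tikhonov(z)
    for _ in range(K):
        for _ in range(J):                           # descent steps on z (f(z) = ln z)
            resid = A @ (z * u) - y
            grad = 2.0 * u * (A.T @ resid) + 2.0 * mu * np.log(z) / z
            z = np.clip(z - eta * grad, eps, None)   # keep z strictly positive
        u = tikhonov(z)                              # Tikhonov u update, as in Eqn. (12)
    return z * u                                     # estimated coefficients c = z ⊙ u
```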
In some implementations, the CG-LS algorithms described above may be generalized. The generalized CG-LS (G-CG-LS) algorithm implements a regularized least squares optimization with a regularization including a Gaussian term and an implicitly defined term, which together enforce a compound Gaussian prior. For example, image coefficients can be decomposed according to the CG prior, c = z ⊙ u, and a MAP estimate of the scale variable, z, and Gaussian vector, u, can be considered from Eqn. (2). That is, the following cost function can be defined:
and the following estimate, which is equivalent to the MAP estimate, may be considered:
where 𝒵 ⊆ [0, ∞)^n is the convex domain of ℛ. Note that P_u ∝ Σ_u, for Σ_u the covariance of u, and ℛ(z) ∝ −log(p_z(z)), where p_z is the prior density for the scale variable. The G-CG-LS algorithm, given in Algorithm 3 below, is an iterative algorithm that approximately solves Eqn. (23) through block coordinate descent.
(In Algorithm 3, the initialization is z_0 = ℙ_{[0,b]}(A^T y) and u_0 = 𝒯(z_0), and each iteration k updates the scale variable estimate, z_k, and then sets u_k = 𝒯(z_k).)
For a vector v, A_v = A diag(v) can be defined, where diag(v) is the diagonal matrix formed by placing the entries of v on the diagonal. Now, the following can be defined:

and letting g(z, u) be a scale variable update method such that:

At least two possibilities for g are described below. Note that 𝒯(z) = argmin_u F(u, z) is a Tikhonov solution. Then, on iteration k of G-CG-LS, the following estimates can be determined:
Note that, in practice, the inverse in Eqn. (24) may not be calculated explicitly and may instead be solved as a system of linear equations. Furthermore, the computational time to calculate Eqn. (24) can be reduced by using the following Woodbury matrix identity:
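Because the exact form of Eqn. (24) is not reproduced above, the sketch below only illustrates the general computational benefit for a generic Tikhonov-type solve of the form (P⁻¹ + AᵀA)⁻¹Aᵀy: the Woodbury identity replaces an n×n inverse with an m×m solve, which is advantageous when m ≪ n.

```python
import numpy as np

# Generic Woodbury acceleration of a Tikhonov-type solve (illustrative only):
#   (P^{-1} + A^T A)^{-1} = P - P A^T (I_m + A P A^T)^{-1} A P.
rng = np.random.default_rng(2)
m, n = 32, 256
A = rng.standard_normal((m, n))
P = np.diag(rng.uniform(0.5, 2.0, n))        # example covariance (diagonal for simplicity)
y = rng.standard_normal(m)
b = A.T @ y

u_direct = np.linalg.solve(np.linalg.inv(P) + A.T @ A, b)                  # n x n solve
AP = A @ P
u_woodbury = P @ b - AP.T @ np.linalg.solve(np.eye(m) + AP @ A.T, AP @ b)  # m x m solve

assert np.allclose(u_direct, u_woodbury)
```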
Additionally, for a convex set 𝒞, let ℙ_𝒞 be the unique projection onto 𝒞. Initial estimates can be defined as z_0 = ℙ_{[a,b]}(A^T y) and u_0 = 𝒯(z_0). Let ReLU(x) = max{x, 0} be the rectified linear unit activation function. It is noted that ℙ_{[0,∞)} ≡ ReLU. Applying ℙ_{[a,b]} to A^T y eliminates negative values and limits the maximum values in the initial z estimate.
As mentioned above, at least two example implementations for updating the scale variable in the G-CG-LS algorithm are described. Each of these example methods is described as using a step size η > 0.
In a first example, a projected gradient descent (“PGD”) method can be used. In the PGD approach, the following updates are performed:
In a second example, an iterative shrinkage and thresholding (“ISTA”) method can be used. In the ISTA approach, the following updates are performed:
where proxƒ is the proximal operator of a function ƒ and is well-defined for convex ƒ. For a non-smooth, convex function, ƒ, the proximal operator is an optimization tool as fixed points of proxƒ minimize ƒ. The general ISTA method, which is equivalent to a proximal gradient descent method, is an optimization method for the sum of a convex, differentiable function and a convex, non-smooth function. From the use of the proximal operator on the non-smooth piece, fixed points of ISTA are optimality points of the original sum of functions.
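The two update styles can be sketched as follows with simple stand-in regularizers (a smooth squared-ℓ2 penalty for the PGD case and an ℓ1 penalty for the ISTA case); neither stand-in is the implicitly defined regularization ℛ of the disclosure, whose gradient or proximal operator is ultimately learned by a subnetwork.

```python
import numpy as np

def grad_data_fidelity(z, u, A, y):
    """Gradient in z of the data fidelity term ||y - A(z * u)||_2^2."""
    return 2.0 * u * (A.T @ (A @ (z * u) - y))

def pgd_update(z, u, A, y, eta, mu=1.0):
    """PGD with an illustrative smooth regularizer mu * ||z||_2^2:
    gradient step on the full objective, then projection onto [0, inf)^n."""
    g = grad_data_fidelity(z, u, A, y) + 2.0 * mu * z
    return np.maximum(z - eta * g, 0.0)            # projection = ReLU

def ista_update(z, u, A, y, eta, mu=1.0):
    """ISTA with an illustrative non-smooth regularizer mu * ||z||_1:
    gradient step on the smooth data fidelity, then prox of eta * mu * ||.||_1."""
    v = z - eta * grad_data_fidelity(z, u, A, y)
    return np.sign(v) * np.maximum(np.abs(v) - eta * mu, 0.0)   # soft-thresholding
```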
As described above, a deep neural network, CG-Net, may be constructed by applying algorithm unrolling to the CG-LS algorithm, such as the algorithm summarized in Algorithm 1. As a non-limiting example, the CG-Net model may have a structure built around two layer-connecting functions, ƒ_u: ℝ^n × ℝ^m → ℝ^n and ƒ_z: ℝ^n × ℝ^n × ℝ^m → ℝ^n, which may be defined by:
Thus, ƒ_z(z_k^j, u_{k−1}, y) corresponds to updating z on descent step j+1 of iteration k in CG-LS, and ƒ_u(z_{k−1}, y) corresponds to updating u on iteration k of CG-LS. In some other examples, when developing the CG-Net model from Algorithm 2, the operation for updating z using steepest descent may include g_k^j(z, u, y) ≡ g_k^j(z, u, y; a_k^j, b_k^j, η_k^{(j)}, d_k^j), which is parameterized by a descent vector, d_k^j, a step size, η_k^{(j)}, and modified ReLU parameters a_k^j and b_k^j, corresponding to updating z as:
In CG-LS, the step size, η_k^{(j)}, may be found by a backtracking line search. In CG-Net, the application of the modified ReLU activation function, ℛ_{a,b}, at each steepest descent step serves to guarantee that the next step is not too large and stays within the domain of interest for z.
The network blocks in a CG-Net may be arranged as follows. CG-Net contains K+1 estimation layers for u, KJ descent step layers for z, one input layer, one output layer, and one initialization layer. Thus, CG-Net contains K(J+1)+4 layers in total.
In CG-Net, the function ƒ_u = ƒ_u(z, y; λ_k), connecting layers Z_k^J and U_k for every k = 1, 2, . . . , K, is parameterized by a positive regularization scalar, λ_k. In CG-LS, a fixed constant λ_k = λ is taken for all k, where λ is given in Eqn. (11), but for training the CG-Net, different constants, learned by the DNN, may be allowed in each update of u (e.g., at each layer updating u). In some examples, the initialization layer, Z_0, may also be parameterized by two positive real numbers a_0 > z_min and b_0 > z_min, which may be applied through the modified ReLU function, ℛ_{a_0,b_0}.
Next, parameters of the function ƒ_z may depend on the choice of nonlinearity h for the CG prior. In the example described above, h(x) = √(exp(x/α)) was chosen, such that h⁻¹(z) = 2α ln(z), and therefore ƒ_z(z, u, y) in Eqn. (34) is given by Eqn. (35), where κ = 4μα² and w_1: (0, ∞) → ℝ and w_2: (0, ∞) → ℝ are component-wise functions. Hence, ƒ_z = ƒ_z(z, u, y; η_k^j, κ_k^j), which connects the two layers U_k and Z_k^j to layer Z_k^{j+1} for every k = 1, 2, . . . , K and every j = 0, 1, . . . , J−1, is parameterized by η_k^j, a descent step size, and κ_k^j, a scaled product of a regularization scalar, μ, and the CG prior constant, α. Again, to simplify notation, the following are denoted: Z_k^0 = Z_{k−1}^J and Z_0 = Z_1^0.
In CG-LS, a fixed constant κ_k^j = 4μα² is taken for all k and j, but for training the CG-Net, different constants, learned by the DNN, are allowed for each update step of z. To increase trainability of the network, the descent step size η_k^j, which may be chosen by a backtracking line search in CG-LS, is allowed to be a matrix in ℝ^{n×n} instead of a single constant. Choosing η_k^j to be a matrix provides CG-Net with flexibility to update a coordinate of z as a function of all coordinates of the search direction (H(z; u))^{−1} v(z; u, y) instead of as a function of a single coordinate. Such flexibility may increase the rate at which the descent steps are able to traverse the optimization landscape of the problem in Eqn. (10). Nevertheless, if a single constant step size is optimal, then CG-Net can simply learn η_k^j as a scaled identity matrix. Alternatively, some η_k^j may be constrained to a specific structure, such as a diagonal matrix. Therefore, the CG-Net parameters Θ may be:
At a maximum, when every η_k^j is a full matrix, CG-Net contains K(J(n²+1)+1) parameters, although this number will be smaller when certain η_k^j have specific structure constraints.
To learn the CG-Net parameters, Θ, a loss function involving the structural similarity index measure (“SSIM”) function may be used, such as the SSIM-based loss in Eqn. (36), evaluated over a batch of training data points. This cost function may be optimized through adaptive moment estimation (“ADAM”), which is a stochastic gradient-based optimizer. Other common optimization methods include stochastic gradient descent, RMSprop, and Adadelta. The gradient with respect to Θ used in ADAM is calculated via backpropagation through the network. Backpropagation calculations are easily and quickly implemented with automatic differentiation, which is readily available in most machine learning libraries, such as TensorFlow.
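A minimal TensorFlow sketch of such a training step is shown below. The 1 − SSIM form of the loss and the placeholder `model` are assumptions for illustration; Eqn. (36) and the actual CG-Net architecture are not reproduced here.

```python
import tensorflow as tf

# Placeholder network mapping measurement vectors to 32x32 coefficient images;
# this stands in for CG-Net and is not the disclosed architecture.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32 * 32),
    tf.keras.layers.Reshape((32, 32, 1)),
])
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)

@tf.function
def train_step(y_batch, target_images):
    with tf.GradientTape() as tape:
        recon = model(y_batch, training=True)
        # Assumed SSIM-based loss: 1 - mean SSIM between reconstruction and target.
        loss = 1.0 - tf.reduce_mean(tf.image.ssim(recon, target_images, max_val=1.0))
    grads = tape.gradient(loss, model.trainable_variables)       # backpropagation via autodiff
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```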
In some examples, for each Z_k^j layer, defined by Eqn. (35), the steepest descent vector d_k^j(z; u) = −B_k^j ∇_z F(z; u) may be implemented for a positive definite matrix B_k^j that will be learned in CG-Net. In this example, the descent vector is the steepest descent based upon the quadratic norm ||·||_{(B_k^j)^{−1}}. A projection, P_∈(L), onto the set of positive definite matrices may be defined such that P_∈(L) can be viewed as the closest symmetric, real-valued matrix, with minimum eigenvalue of ∈, to (L + L^T)/2, as measured by the Frobenius norm. B_k^j can be enforced to be positive definite by learning a lower triangular matrix L_k^j and setting B_k^j = P_∈(L_k^j). Therefore, CG-Net layer Z_k^j may be parameterized by a lower triangular matrix L_k^j defining the steepest descent vector.
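One way to realize P_∈ consistent with this description, i.e., eigenvalue clipping of the symmetric part, is sketched below (the exact disclosed construction is not reproduced).

```python
import numpy as np

def project_positive_definite(L, eps=1e-3):
    """Sketch of P_eps(L): symmetrize to (L + L^T)/2, then clip eigenvalues
    from below at eps, yielding the Frobenius-closest symmetric matrix with
    minimum eigenvalue eps."""
    S = 0.5 * (L + L.T)
    vals, vecs = np.linalg.eigh(S)                 # eigendecomposition of the symmetric part
    return (vecs * np.maximum(vals, eps)) @ vecs.T

# A learned lower triangular matrix L_kj would be mapped to B_kj = P_eps(L_kj).
L = np.tril(np.random.default_rng(3).standard_normal((4, 4)))
B = project_positive_definite(L)
assert np.all(np.linalg.eigvalsh(B) >= 1e-3 - 1e-9)
```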
Additionally, as ∇_z F(u, z) depends on the regularization scalar, μ, layer Z_k^j is parameterized by a regularization scalar, μ_k^j. Furthermore, layer Z_k^j may be parameterized by the step size, η_k^{(j)}, which can be taken to be a diagonal matrix. That is, instead of learning a single constant to scale the steepest descent vector, d_k^j, a different constant can be learned to scale each component of d_k^j separately. Finally, layer Z_k^j learns positive real numbers a_k^j > z_min and b_k^j > z_min, which may be applied through the modified ReLU activation function, ℛ_{a_k^j, b_k^j}, in Eqn. (35).
Fixing a small real-valued ∈ > 0, to ensure λ_k > 0 and a_0, b_0, a_k^j, b_k^j > z_min in implementation, max{λ_k, ∈} may be used in place of λ_k, max{a_0, z_min + ∈} may be used in place of a_0, and similarly for b_0, a_k^j, and b_k^j.
In total, in this example implementation, CG-Net has a number of parameters that depends on the image size, n. The CG-Net parameters are trained by minimizing a loss function involving the SSIM image quality metric, such as Eqn. (36) above, evaluated over a batch of data points. This cost function may be optimized as described above. The gradient with respect to Θ for ADAM may be calculated via backpropagation through the network.
In some embodiments, a deep regularized implementation of the CG-Net (“DR-CG-Net”) described above may be implemented. In these instances, a deep neural network with an end-to-end structure is used, and DR-CG-Net learns the scale variable regularization, ℛ, through a subnetwork representing either ∇ℛ or prox_ℛ.
Let 𝒩_k^{(j)}: ℝ^n → ℝ^n, for k = 1, 2, . . . , K and j = 1, 2, . . . , J, be a subnetwork. That is, each 𝒩_k^{(j)} is a collection of layers mapping from ℝ^n to ℝ^n. For example, convolutional layers may be used, making each 𝒩_k^{(j)} subnetwork a CNN. Now, the intermediate scale variable mapping, g_k^{(j)}, can be defined as:
where r_k^{(j)} is the mapping of the data fidelity gradient descent step of F(u, z) in z, for a step size η_k^{(j)}. Note that g_k^{(j)} corresponds to the update methods described above, where 𝒩_k^{(j)} replaces ∇ℛ and prox_{ηℛ} in PGD or ISTA, respectively. In some implementations, ℙ_{[0,∞)} ≡ ReLU may be applied to ensure all intermediate z estimates in DR-CG-Net maintain positive entries. Finally, recall the Tikhonov update defined in Eqn. (24), and let 𝒯_k(z) = 𝒯(z; P_k) for a covariance matrix, P_k.
Each layer k of DR-CG-Net includes a complete scale variable mapping, 𝒵_k, which includes a composition of the J scale variable updates g_k^{(1)}, . . . , g_k^{(J)}, followed by an estimation layer, U_k, for the Gaussian variable. Mathematically, the DR-CG-Net blocks may be detailed as follows. The kth complete scale variable mapping, 𝒵_k, contains:

g_k^{(j)} = g_k^{(j)}(g_k^{(j−1)}, U_{k−1}) is the intermediate scale variable mapping analogous to z_k^j on line 5 in Algorithm 3;

r_k^{(j)} = r_k^{(j)}(U_{k−1}, g_k^{(j−1)}) is the data fidelity gradient step; and

O = U_K ⊙ 𝒵_K is the estimated signal coefficients produced by DR-CG-Net.
Note, to simplify notation, let g_k^{(0)} = g_{k−1}^{(J)}. Assume each subnetwork, 𝒩_k^{(j)}, uses D layers; then DR-CG-Net contains K+1 estimation layers for u, KJ(D+1) layers for updating z, one input layer, one output layer, and one initialization layer. Thus, DR-CG-Net is a DNN with K(J(D+1)+1)+4 layers.
For every k = 0, 1, . . . , K, the layer U_k is parameterized by a covariance matrix, P_k. To reduce the number of parameters learned by the network, and for consistency in P_k representing the covariance matrix of u, DR-CG-Net learns a single covariance matrix P and constrains P_1 = . . . = P_K = P. Furthermore, the possibility of a structured covariance matrix can be considered, where P is either a scaled identity, diagonal, tridiagonal, or full matrix. Imposing a covariance structure may be desirable or advantageous, for example, to ensure only local reinforcement.
To ensure P is a covariance matrix, i.e. symmetric and positive definite, for ∈>0 a small fixed real number, one of the following structures can be imposed:
In the scaled identity case, only a constant λ is learned. In the diagonal case, a vector λ = [λ_i]_{i=1}^n is learned. In the tridiagonal case, two vectors λ_1 ∈ ℝ^n and λ_2 ∈ ℝ^{n−1} are learned, and L_tri is the lower triangular matrix formed by placing λ_1 on the diagonal and λ_2 on the first subdiagonal. Finally, in the full case, a lower triangular matrix, L, is learned.
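The structured choices above can be sketched as follows. The M Mᵀ + ∈I style of construction used here is one common way to guarantee symmetry and positive definiteness and is an assumption of this sketch; the exact parameterization of the disclosure is not reproduced.

```python
import numpy as np

def build_covariance(structure, params, n, eps=1e-3):
    """Illustrative structured covariance constructions for P (assumed forms)."""
    if structure == "scaled_identity":
        (lam,) = params                              # single learned constant
        return (lam ** 2 + eps) * np.eye(n)
    if structure == "diagonal":
        (lam,) = params                              # learned vector of length n
        return np.diag(lam ** 2 + eps)
    if structure == "tridiagonal":
        lam1, lam2 = params                          # diagonal and first subdiagonal
        L_tri = np.diag(lam1) + np.diag(lam2, k=-1)  # lower bidiagonal L_tri
        return L_tri @ L_tri.T + eps * np.eye(n)     # L_tri L_tri^T is tridiagonal
    if structure == "full":
        (L,) = params                                # learned lower triangular matrix
        return L @ L.T + eps * np.eye(n)
    raise ValueError(f"unknown structure: {structure}")
```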
Next, each 𝒵_k is parameterized by {η_k^{(1)}, . . . , η_k^{(J)}}, a collection of step sizes, as η_k^{(j)} parameterizes the data fidelity gradient step, r_k^{(j)}, and by the parameters of the subnetworks 𝒩_k^{(1)}, . . . , 𝒩_k^{(J)}. For a fixed constant γ_max > 0 and fixed z and u, the following is taken:

and the network learns δ_k^{(j)}, a real-valued parameter. This corresponds to performing a normalized step in the data fidelity gradient step. Normalizing the step size provides numerical stability by preventing the data fidelity gradient step from being too large and counterproductive.
In some implementations, 𝒩_k^{(j)} can be a CNN of depth D using ReLU activation functions. That is, layer d includes ƒ_d convolutions, i.e., filter channels, using a kernel size of k_d × k_d with unit stride and a bias matrix added to each filter channel. Note, zero padding may be applied to each filter channel of the input such that the output from each filter channel is the same size as the input. Furthermore, ƒ_D = 1 can be taken, such that for X ∈ ℝ^{n×n}, 𝒩_k^{(j)}(X) ∈ ℝ^{n×n}. Next, for x ∈ ℝ^{n²}, mat(x) ∈ ℝ^{n×n} may be defined to be x reshaped into an n×n matrix and, for X ∈ ℝ^{n×n}, vec(X) may be defined to be X reshaped into a vector of size n² such that vec(mat(x)) = x. Then each subnetwork may be applied to a vectorized scale variable by reshaping, i.e., by computing vec(𝒩_k^{(j)}(mat(·))).
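A TensorFlow/Keras sketch of one such subnetwork is shown below; the depth, filter counts, kernel sizes, and the choice to omit the activation on the final single-channel layer are illustrative assumptions rather than the disclosed configuration.

```python
import tensorflow as tf

def make_subnetwork(depth=3, filters=(32, 32, 1), kernels=(3, 3, 3)):
    """Depth-D CNN with 'same' zero padding and unit stride, so each filter
    channel's output matches the input size; the final layer has one channel
    (f_D = 1).  ReLU activations are used on the hidden layers."""
    net = tf.keras.Sequential()
    for d in range(depth):
        net.add(tf.keras.layers.Conv2D(
            filters[d], kernels[d], strides=1, padding="same",
            activation="relu" if d < depth - 1 else None, use_bias=True))
    return net

# Applying the subnetwork to a vectorized scale variable via reshaping:
#   z_img = tf.reshape(z_vec, (1, N, N, 1)); out_vec = tf.reshape(net(z_img), (-1,))
```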
For W_{k,d}^{(j)} and B_{k,d}^{(j)} being the convolutional kernels and biases of 𝒩_k^{(j)}, respectively, the DR-CG-Net parameters are the covariance matrix P, the step size parameters δ_k^{(j)}, and the subnetwork kernels and biases W_{k,d}^{(j)} and B_{k,d}^{(j)}. The DR-CG-Net parameters can be trained by minimizing the mean absolute error loss function. Namely, for a batch of data points, the mean absolute error loss function is the average, over the batch, of the ℓ_1 distance between the network outputs and the corresponding true coefficients.
This cost function can be optimized through adaptive moment estimation (Adam), which is a stochastic, gradient-based optimizer. Other common optimization methods include stochastic gradient descent, RMSprop, and Adadelta. The gradient with respect to Θ for Adam is calculated via backpropagation through the network, which can be implemented with automatic differentiation using TensorFlow.
Thus, using the powerful CG class of densities to represent image coefficients, a deep neural network named DR-CG-Net for solving linear inverse problems is provided. This deep neural network is produced by applying algorithm unrolling to the generalized CG-LS algorithm described above, which enforces the CG prior. The generalized CG-LS algorithm allows for problem-specific choices of the scale variable distribution, which subsequently is learned through the unrolled deep neural network. Hence, DR-CG-Net has the flexibility to learn the prior while still being constrained to the class of compound Gaussian distributions.
The method includes accessing measurement data with a computer system, as indicated at step 502. Accessing the measurement data may include retrieving such data from a memory or other suitable data storage device or medium. Additionally or alternatively, accessing the measurement data may include acquiring such data with an imaging system and transferring or otherwise communicating the data to the computer system, which may be a part of the imaging system.
In one non-limiting example, the measurement data may include medical imaging data acquired with a medical imaging system. For instance, the medical imaging system may include an x-ray imaging system (e.g., a computed tomography (“CT”) system), a magnetic resonance imaging (“MRI”) system, a positron emission tomography (“PET”) system, an ultrasound imaging system, or the like. In these instances, the measurement data may include the respective measurement data acquired by the medical imaging system. As a non-limiting example, the measurement data may include x-ray attenuation and/or photon count measurements acquired using a CT system. As another example, the measurement data may include k-space data acquired with an MRI system.
In other examples, the measurement data may include other measurement data acquired using other imaging or sensing modalities, including microwave, acoustic, optical, or other modalities.
A trained neural network (or other suitable machine learning algorithm) is then accessed with the computer system, as indicated at step 504. Accessing the trained neural network may include accessing network parameters (e.g., weights, biases, or both) that have been optimized or otherwise estimated by training the neural network on training data. In some instances, retrieving the neural network can also include retrieving, constructing, or otherwise accessing the particular neural network architecture to be implemented. For instance, data pertaining to the layers in the neural network architecture (e.g., number of layers, type of layers, ordering of layers, connections between layers, hyperparameters for layers) may be retrieved, selected, constructed, or otherwise accessed.
In general, the neural network is trained, or has been trained, on training data in order to reconstruct a signal and/or image from measurement data.
The measurement data are then input to the one or more trained neural networks, generating one or more reconstructed images as an output, as indicated at step 506. In some implementations, as described above, sparse coefficients, c, are output from the output layer of the CG-Net reconstruction network. In these instances, the reconstructed image may be recovered from the sparse coefficients, for example, by using the appropriate sparse domain transformation. For example, when the sparse domain is the wavelet domain, the CG-Net reconstruction network may output wavelet coefficients. By applying a wavelet transformation to these wavelet coefficients, the reconstructed image can be recovered.
Display and/or Store Output
The image(s) generated by inputting the measurement data to the trained neural network(s) can then be displayed to a user, stored for later use or further processing, or both, as indicated at step 508.
The method includes accessing training data with a computer system, as indicated at step 602. Accessing the training data may include retrieving such data from a memory or other suitable data storage device or medium. Alternatively, accessing the training data may include acquiring such data with an imaging system and transferring or otherwise communicating the data to the computer system.
In general, the training data can include measurement data acquired with an imaging system and example images reconstructed from the measurement data. The method can include assembling training data from measurement data and/or images reconstructed from measurement data using a computer system. This step may include assembling the training data into an appropriate data structure on which the neural network or other machine learning algorithm can be trained.
One or more neural networks (or other suitable machine learning algorithms) are trained on the training data, as indicated at step 604. In general, the neural network can be trained by optimizing network parameters (e.g., weights, biases, or both) based on minimizing a loss function. As one non-limiting example, the loss function may be the SSIM-based loss function described above.
Training a neural network may include initializing the neural network, such as by computing, estimating, or otherwise selecting initial network parameters (e.g., weights, biases, or both). During training, an artificial neural network receives the inputs for a training example and generates an output using the bias for each node, and the connections between each node and the corresponding weights. For instance, training data can be input to the initialized neural network, generating output as an intermediate reconstructed image. The artificial neural network then compares the generated output with the actual output of the training example in order to evaluate the quality of the intermediate reconstructed image. For instance, the intermediate reconstructed image can be passed to a loss function to compute an error. The current neural network can then be updated based on the calculated error (e.g., using backpropagation methods based on the calculated error). For instance, the current neural network can be updated by updating the network parameters (e.g., weights, biases, or both) in order to minimize the loss according to the loss function. The training continues until a training condition is met. The training condition may correspond to, for example, a predetermined number of training examples being used, a minimum accuracy threshold being reached during training and validation, a predetermined number of validation iterations being completed, and the like. When the training condition has been met (e.g., by determining whether an error threshold or other stopping criterion has been satisfied), the current neural network and its associated network parameters represent the trained neural network. Different types of training processes can be used to adjust the bias values and the weights of the node connections based on the training examples. The training processes may include, for example, gradient descent, Newton's method, conjugate gradient, quasi-Newton, Levenberg-Marquardt, among others.
The artificial neural network can be constructed or otherwise trained using the techniques described above in more detail. In addition to learning the neural network parameters described above, in some implementations the CG-Net may be trained on training data to additionally or alternatively learn the inverse of the nonlinear function in the CG prior (i.e., ƒ(z)) by approximating it with a sub-network embedded inside of CG-Net. Such an extension of CG-Net may expand its applicability by no longer requiring a user-specified function ƒ, as well as by providing CG-Net with a greater learning capacity.
The one or more trained neural networks are then stored for later use, as indicated at step 606. Storing the neural network(s) may include storing network parameters (e.g., weights, biases, or both), which have been computed or otherwise estimated by training the neural network(s) on the training data. Storing the trained neural network(s) may also include storing the particular neural network architecture to be implemented. For instance, data pertaining to the layers in the neural network architecture (e.g., number of layers, type of layers, ordering of layers, connections between layers, hyperparameters for layers) may be stored.
In an example study, the CG-LS algorithm in Algorithm 2 was tested using gradient descent, which may be denoted by gCG-LS, and Newton descent, which may be denoted by nCG-LS, as the steepest descent method for updating z. As an example, 32×32 images from the CIFAR10 image dataset and 64×64 images from the CalTech101 image dataset may be used. Each image was converted to a single-channel grayscale image, scaled down by the maximum pixel value, and vectorized. A Radon transform, at a specified number of angles, was performed on each image, to which white noise was added, producing noisy measurements, y, at a specified signal-to-noise ratio (“SNR”). Finally, a biorthogonal wavelet transformation was applied to each image to produce the sparsity coefficients, c. That is, Ψ is a matrix representation of a Radon transform and Φ is a biorthogonal wavelet transformation.
For all simulations in this example, the following were used: ƒ(z) = ln(z), μ = 2, K = 1000, J = 1, and δ = 10⁻⁶. For measurements at an SNR of 60 dB and 40 dB, λ = 0.3 and λ = 2 were taken, respectively. Using nCG-LS required the Hessian of the cost function in Eqn. (9) with respect to z to be positive definite, which, for ƒ(z) = ln(z), is guaranteed when z ∈ (0, e)^n. A local minimizer can be contained in [1, e)^n under sufficient scaling of the input data. Therefore, in each nCG-LS test using 32×32 or 64×64 image measurements, the input measurement was scaled by a factor, chosen empirically, of e⁻⁴ or e⁻⁶, respectively. Additionally, an eigendecomposition on the Hessian of the cost function in Eqn. (9) with respect to z was used to find the closest positive semi-definite matrix, which was then used in the Newton descent step. Alternatively, the modified ReLU function ℛ_{1,e} may be applied at each z update to ensure the Hessian of the cost function in Eqn. (9) with respect to z is positive semi-definite. Finally, ℛ_{a,b} = ℛ_{1,e} was chosen for nCG-LS, whereas a different choice of ℛ_{a,b} was used for gCG-LS.
In a non-limiting example, given a sensing matrix, Ψ, a sparsity transformation, Φ, and a noise level in SNR, a set of training and testing measurement-coefficient pairs, (y, c), can be created as described above and then used to train and evaluate a CG-Net created by applying algorithm unrolling to Algorithm 2. For network size, CG-Net running on 32×32 or 64×64 image measurements used (K, J) = (20, 1) or (K, J) = (5, 1), respectively. The network sizes were chosen empirically such that the time to complete one image reconstruction was reasonably quick while still producing excellent reconstructions on a validation set of test images. Letting ƒ(z) = ln(z), the parameters were initialized with a_0 = 1, b_0 = exp(2), all μ_k^j = 2, all b_k^j = exp(3), and every L_k^j = I. Note, for CG-Net running on 32×32 images, the diagonal and sub-diagonal in L_k^j were learned, constraining B_k^j to be a tridiagonal matrix. Finally, λ_k = 0.3 was initialized for the 60 dB SNR noise level and λ_k = 2 was initialized for all other noise levels. Each CG-Net was trained for 20 epochs, using a learning rate of 10⁻³, with early stopping implemented when the model overfit a validation dataset.
To further highlight the applicability of CG-Net, alternative sensing matrices, Ψ, and sparsity transformations, Φ, may be considered other than those mentioned above. For example, in general, Ψ ∈ ℝ^{m×n} can be considered as a Gaussian matrix, where m/n is the sampling ratio. A discrete cosine transformation (“DCT”) dictionary may also be considered as an alternative sparsity transformation.
Additionally or alternatively, in some embodiments, the computing device 950 can communicate information about data received from the data source 902 to a server 952 over a communication network 954, which can execute at least a portion of the CG-Net image reconstruction system 904. In such embodiments, the server 952 can return information to the computing device 950 (and/or any other suitable computing device) indicative of an output of the CG-Net image reconstruction system 904.
In some embodiments, computing device 950 and/or server 952 can be any suitable computing device or combination of devices, such as a desktop computer, a laptop computer, a smartphone, a tablet computer, a wearable computer, a server computer, a virtual machine being executed by a physical computing device, and so on. The computing device 950 and/or server 952 can also reconstruct images from the data.
In some embodiments, data source 902 can be any suitable source of data (e.g., measurement data, images reconstructed from measurement data, processed image data), such as an imaging system, another computing device (e.g., a server storing measurement data, images reconstructed from measurement data, processed image data), and so on. In some embodiments, data source 902 can be local to computing device 950. For example, data source 902 can be incorporated with computing device 950 (e.g., computing device 950 can be configured as part of a device for measuring, recording, estimating, acquiring, or otherwise collecting or storing data). As another example, data source 902 can be connected to computing device 950 by a cable, a direct wireless link, and so on. Additionally or alternatively, in some embodiments, data source 902 can be located locally and/or remotely from computing device 950, and can communicate data to computing device 950 (and/or server 952) via a communication network (e.g., communication network 954).
In some embodiments, communication network 954 can be any suitable communication network or combination of communication networks. For example, communication network 954 can include a Wi-Fi network (which can include one or more wireless routers, one or more switches, etc.), a peer-to-peer network (e.g., a Bluetooth network), a cellular network (e.g., a 3G network, a 4G network, etc., complying with any suitable standard, such as CDMA, GSM, LTE, LTE Advanced, WiMAX, etc.), other types of wireless network, a wired network, and so on. In some embodiments, communication network 954 can be a local area network, a wide area network, a public network (e.g., the Internet), a private or semi-private network (e.g., a corporate or university intranet), any other suitable type of network, or any suitable combination of networks. Communications links between the devices described herein can each be any suitable communications link or combination of links, such as wired links, fiber optic links, Wi-Fi links, Bluetooth links, cellular links, and the like.
As shown in the drawings, in some embodiments, computing device 950 can include a processor 1002, a display 1004, one or more inputs 1006, one or more communications systems 1008, and/or memory 1010.
In some embodiments, communications systems 1008 can include any suitable hardware, firmware, and/or software for communicating information over communication network 954 and/or any other suitable communication networks. For example, communications systems 1008 can include one or more transceivers, one or more communication chips and/or chip sets, and so on. In a more particular example, communications systems 1008 can include hardware, firmware, and/or software that can be used to establish a Wi-Fi connection, a Bluetooth connection, a cellular connection, an Ethernet connection, and so on.
In some embodiments, memory 1010 can include any suitable storage device or devices that can be used to store instructions, values, data, or the like, that can be used, for example, by processor 1002 to present content using display 1004, to communicate with server 952 via communications system(s) 1008, and so on. Memory 1010 can include any suitable volatile memory, non-volatile memory, storage, or any suitable combination thereof. For example, memory 1010 can include random-access memory (“RAM”), read-only memory (“ROM”), electrically programmable ROM (“EPROM”), electrically erasable ROM (“EEPROM”), other forms of volatile memory, other forms of non-volatile memory, one or more forms of semi-volatile memory, one or more flash drives, one or more hard disks, one or more solid state drives, one or more optical drives, and so on. In some embodiments, memory 1010 can have encoded thereon, or otherwise stored therein, a computer program for controlling operation of computing device 950. In such embodiments, processor 1002 can execute at least a portion of the computer program to present content (e.g., images, user interfaces, graphics, tables), receive content from server 952, transmit information to server 952, and so on. For example, the processor 1002 and the memory 1010 can be configured to perform the methods described herein (e.g., Algorithm 1 described above, Algorithm 2 described above, the CG-Net reconstruction network described above).
In some embodiments, server 952 can include a processor 1012, a display 1014, one or more inputs 1016, one or more communications systems 1018, and/or memory 1020. In some embodiments, processor 1012 can be any suitable hardware processor or combination of processors, such as a CPU, a GPU, and so on. In some embodiments, display 1014 can include any suitable display devices, such as an LCD screen, LED display, OLED display, electrophoretic display, a computer monitor, a touchscreen, a television, and so on. In some embodiments, inputs 1016 can include any suitable input devices and/or sensors that can be used to receive user input, such as a keyboard, a mouse, a touchscreen, a microphone, and so on.
In some embodiments, communications systems 1018 can include any suitable hardware, firmware, and/or software for communicating information over communication network 954 and/or any other suitable communication networks. For example, communications systems 1018 can include one or more transceivers, one or more communication chips and/or chip sets, and so on. In a more particular example, communications systems 1018 can include hardware, firmware, and/or software that can be used to establish a Wi-Fi connection, a Bluetooth connection, a cellular connection, an Ethernet connection, and so on.
In some embodiments, memory 1020 can include any suitable storage device or devices that can be used to store instructions, values, data, or the like, that can be used, for example, by processor 1012 to present content using display 1014, to communicate with one or more computing devices 950, and so on. Memory 1020 can include any suitable volatile memory, non-volatile memory, storage, or any suitable combination thereof. For example, memory 1020 can include RAM, ROM, EPROM, EEPROM, other types of volatile memory, other types of non-volatile memory, one or more types of semi-volatile memory, one or more flash drives, one or more hard disks, one or more solid state drives, one or more optical drives, and so on. In some embodiments, memory 1020 can have encoded thereon a server program for controlling operation of server 952. In such embodiments, processor 1012 can execute at least a portion of the server program to transmit information and/or content (e.g., data, images, a user interface) to one or more computing devices 950, receive information and/or content from one or more computing devices 950, receive instructions from one or more devices (e.g., a personal computer, a laptop computer, a tablet computer, a smartphone), and so on.
In some embodiments, the server 952 is configured to perform the methods described in the present disclosure. For example, the processor 1012 and memory 1020 can be configured to perform the methods described herein (e.g., Algorithm 1 described above, Algorithm 2 described above, the CG-Net reconstruction network shown in
In some embodiments, data source 902 can include a processor 1022, one or more data acquisition systems 1024, one or more communications systems 1026, and/or memory 1028. In some embodiments, processor 1022 can be any suitable hardware processor or combination of processors, such as a CPU, a GPU, and so on. In some embodiments, the one or more data acquisition systems 1024 are generally configured to acquire data, images, or both, and can include an imaging system. Additionally or alternatively, in some embodiments, the one or more data acquisition systems 1024 can include any suitable hardware, firmware, and/or software for coupling to and/or controlling operations of an imaging system. In some embodiments, one or more portions of the data acquisition system(s) 1024 can be removable and/or replaceable.
Note that, although not shown, data source 902 can include any suitable inputs and/or outputs. For example, data source 902 can include input devices and/or sensors that can be used to receive user input, such as a keyboard, a mouse, a touchscreen, a microphone, a trackpad, a trackball, and so on. As another example, data source 902 can include any suitable display devices, such as an LCD screen, an LED display, an OLED display, an electrophoretic display, a computer monitor, a touchscreen, a television, etc., one or more speakers, and so on.
In some embodiments, communications systems 1026 can include any suitable hardware, firmware, and/or software for communicating information to computing device 950 (and, in some embodiments, over communication network 954 and/or any other suitable communication networks). For example, communications systems 1026 can include one or more transceivers, one or more communication chips and/or chip sets, and so on. In a more particular example, communications systems 1026 can include hardware, firmware, and/or software that can be used to establish a wired connection using any suitable port and/or communication standard (e.g., VGA, DVI video, USB, RS-232, etc.), a Wi-Fi connection, a Bluetooth connection, a cellular connection, an Ethernet connection, and so on.
In some embodiments, memory 1028 can include any suitable storage device or devices that can be used to store instructions, values, data, or the like, that can be used, for example, by processor 1022 to control the one or more data acquisition systems 1024, and/or receive data from the one or more data acquisition systems 1024; to generate images from data; present content (e.g., data, images, a user interface) using a display; communicate with one or more computing devices 950; and so on. Memory 1028 can include any suitable volatile memory, non-volatile memory, storage, or any suitable combination thereof. For example, memory 1028 can include RAM, ROM, EPROM, EEPROM, other types of volatile memory, other types of non-volatile memory, one or more types of semi-volatile memory, one or more flash drives, one or more hard disks, one or more solid state drives, one or more optical drives, and so on. In some embodiments, memory 1028 can have encoded thereon, or otherwise stored therein, a program for controlling operation of data source 902. In such embodiments, processor 1022 can execute at least a portion of the program to generate images, transmit information and/or content (e.g., data, images, a user interface) to one or more computing devices 950, receive information and/or content from one or more computing devices 950, receive instructions from one or more devices (e.g., a personal computer, a laptop computer, a tablet computer, a smartphone, etc.), and so on.
In some embodiments, any suitable computer-readable media can be used for storing instructions for performing the functions and/or processes described herein. For example, in some embodiments, computer-readable media can be transitory or non-transitory. For example, non-transitory computer-readable media can include media such as magnetic media (e.g., hard disks, floppy disks), optical media (e.g., compact discs, digital video discs, Blu-ray discs), semiconductor media (e.g., RAM, flash memory, EPROM, EEPROM), any suitable media that is not fleeting or devoid of any semblance of permanence during transmission, and/or any suitable tangible media. As another example, transitory computer-readable media can include signals on networks, in wires, conductors, optical fibers, circuits, or any suitable media that is fleeting and devoid of any semblance of permanence during transmission, and/or any suitable intangible media.
As used herein in the context of computer implementation, unless otherwise specified or limited, the terms “component,” “system,” “module,” “framework,” and the like are intended to encompass part or all of computer-related systems that include hardware, software, a combination of hardware and software, or software in execution. For example, a component may be, but is not limited to being, a processor device, a process being executed (or executable) by a processor device, an object, an executable, a thread of execution, a computer program, or a computer. By way of illustration, both an application running on a computer and the computer can be a component. One or more components (or system, module, and so on) may reside within a process or thread of execution, may be localized on one computer, may be distributed between two or more computers or other processor devices, or may be included within another component (or system, module, and so on).
In some implementations, devices or systems disclosed herein can be utilized or installed using methods embodying aspects of the disclosure. Correspondingly, description herein of particular features, capabilities, or intended purposes of a device or system is generally intended to inherently include disclosure of a method of using such features for the intended purposes, a method of implementing such capabilities, and a method of installing disclosed (or otherwise known) components to support these purposes or capabilities. Similarly, unless otherwise indicated or limited, discussion herein of any method of manufacturing or using a particular device or system, including installing the device or system, is intended to inherently include disclosure, as embodiments of the disclosure, of the utilized features and implemented capabilities of such device or system.
The present disclosure has described one or more preferred embodiments, and it should be appreciated that many equivalents, alternatives, variations, and modifications, aside from those expressly stated, are possible and within the scope of the invention.
In an embodiment, source 1104, receiver 1108, and/or image estimation device 1110 can be implemented using a single device or multiple devices. In an embodiment, source 1104, receiver 1108, and/or image estimation device 1110 can be implemented as standalone devices or integrated into one or more host devices. For example, in an embodiment, source 1104, receiver 1108, and image estimation device 1110 are all integrated into a single host device. In an embodiment, image estimation device 1110 is implemented as a standalone device. For example, in an embodiment, image estimation device 1110 is integrated into a host platform, such as a building, a vehicle, a ship, or an unmanned aerial vehicle (UAV). In an embodiment, both receiver 1108 and image estimation device 1110 are integrated into the same host platform.
In an embodiment, image estimation device 1110 includes communications system 1112, a memory 1114, a processor 1116, and a display 1118. In an embodiment, communications system 1112 receives the waveforms collected by receiver 1108, optionally stores them in memory 1114, and sends them to processor 1116. As shown in
In an embodiment, image estimation device 1110 recovers an image x = Φc from undersampled measurements y by recovering c in the equation:
y = Ψx + ν = ΨΦc + ν = Ac + ν   (48)
In Equation (48), Φ represents a dictionary used to represent the image (e.g., wavelets, discrete cosine), Ψ represents a measurement matrix (e.g., representing the sensor), A = ΨΦ represents an effective measurement matrix, and ν represents additive noise. For example, in an embodiment, image estimation sub-system 1124 of image estimation device 1110 uses waveforms collected by receiver 1108 to recover an image x = Φc from undersampled measurements y by recovering c in Equation (48). In an embodiment, image estimation device 1110 represents c via a compound Gaussian (CG) prior (which subsumes many commonly used priors in image estimation, e.g., Laplace, Student's t) according to the equation:
c = z ⊙ u   (49)
In Equation (49), u is a Gaussian random vector and z is a positive, non-Gaussian random vector. In an embodiment, the method estimates z and u by minimizing the cost function:
∥y − A(z ⊙ u)∥₂² + ∥u∥₂² + R(z)   (50)
In Equation (50), the term ∥y − A(z ⊙ u)∥₂² encourages the estimates to fit the observed measurements, the term ∥u∥₂² encourages u to be Gaussian, and the term R(z) imposes a prior distribution on z. In an embodiment, image estimation device 1110 estimates z and u (to in turn estimate c) by minimizing Equation (50). In an embodiment, image estimation device 1110 uses a CG-Net deep neural network (DNN) imaging subsystem obtained by unrolling the imaging method in Equation (50) with R(z) = μ∥log z∥₂². In an embodiment, the optimization landscape is learned by the DNN. In an embodiment, image estimation device 1110 uses a DR-CG-Net DNN imaging subsystem obtained by unrolling the imaging method in Equation (50). In an embodiment, the prior distribution for z is learned by learning R(z).
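By way of a concrete, hedged illustration of Equations (48)-(50) (and not of the CG-Net or DR-CG-Net subsystems themselves), the following Python sketch builds an effective measurement matrix A = ΨΦ from an assumed discrete cosine dictionary and a random Gaussian measurement matrix, synthesizes compound Gaussian coefficients c = z ⊙ u, and estimates z and u by plain alternating gradient descent on the cost in Equation (50) with R(z) = μ∥log z∥₂². The dimensions, noise level, step size, iteration count, initialization, and the use of NumPy/SciPy are illustrative assumptions rather than parameters taken from this disclosure.

# Minimal sketch of Equations (48)-(50): compound Gaussian recovery by
# alternating gradient descent. Dimensions, step size, and noise level are
# assumptions; this is not the unrolled CG-Net / DR-CG-Net of this disclosure.
import numpy as np
from scipy.fft import idct

rng = np.random.default_rng(0)
n, m = 128, 64                                   # signal length, number of measurements

Phi = idct(np.eye(n), axis=0, norm="ortho")      # DCT dictionary (x = Phi c)
Psi = rng.standard_normal((m, n)) / np.sqrt(m)   # random measurement matrix
A = Psi @ Phi                                    # effective measurement matrix A = Psi Phi

# Synthesize compound Gaussian coefficients c = z * u and measurements y (Eq. 48).
u_true = rng.standard_normal(n)
z_true = np.exp(0.5 * rng.standard_normal(n))    # positive, non-Gaussian scale variable
c_true = z_true * u_true
y = A @ c_true + 0.01 * rng.standard_normal(m)

mu, step, iters = 0.1, 1e-3, 3000                # R(z) weight, step size, iterations (assumed)

def cost(z, u):
    """Equation (50) with R(z) = mu * ||log z||_2^2."""
    r = y - A @ (z * u)
    return r @ r + u @ u + mu * np.sum(np.log(z) ** 2)

u = np.zeros(n)                                  # estimate of the Gaussian component
z = np.ones(n)                                   # estimate of the positive scale variable
for _ in range(iters):
    r = y - A @ (z * u)
    u -= step * (-2.0 * z * (A.T @ r) + 2.0 * u)                    # gradient step in u
    r = y - A @ (z * u)
    z -= step * (-2.0 * u * (A.T @ r) + 2.0 * mu * np.log(z) / z)   # gradient step in z
    z = np.clip(z, 1e-3, None)                   # keep the scale variable positive

c_hat = z * u
x_hat = Phi @ c_hat                              # reconstructed image/signal x = Phi c
print("final cost:", cost(z, u))
print("relative error in c:", np.linalg.norm(c_hat - c_true) / np.linalg.norm(c_true))

In the CG-Net and DR-CG-Net subsystems described above, each iteration of such a descent scheme corresponds to a layer of the unrolled network, with the step behavior (and, in DR-CG-Net, the regularizer R(z) itself) learned from training data rather than fixed by hand as in this sketch.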
It is to be appreciated that the Detailed Description, and not the Abstract, is intended to be used to interpret the claims. The Abstract may set forth one or more but not all exemplary embodiments of the present disclosure as contemplated by the inventor(s), and thus, is not intended to limit the present disclosure and the appended claims in any way.
The present disclosure has been described above with the aid of functional building blocks illustrating the implementation of specified functions and relationships thereof. The boundaries of these functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternate boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed.
The foregoing description of the specific embodiments will so fully reveal the general nature of the disclosure that others can, by applying knowledge within the skill of the art, readily modify and/or adapt for various applications such specific embodiments, without undue experimentation, without departing from the general concept of the present disclosure. Therefore, such adaptations and modifications are intended to be within the meaning and range of equivalents of the disclosed embodiments, based on the teaching and guidance presented herein. It is to be understood that the phraseology or terminology herein is for the purpose of description and not of limitation, such that the terminology or phraseology of the present specification is to be interpreted by the skilled artisan in light of the teachings and guidance.
Any representative signal processing functions described herein can be implemented using computer processors, computer logic, application specific integrated circuits (ASIC), digital signal processors, etc., as will be understood by those skilled in the art based on the discussion given herein. Accordingly, any processor that performs the signal processing functions described herein is within the scope and spirit of the present disclosure.
The above systems and methods may be implemented using a computer program executing on a machine, using a computer program product, or using a tangible and/or non-transitory computer-readable medium having stored instructions. For example, the functions described herein could be embodied by computer program instructions that are executed by a computer processor or any one of the hardware devices listed above. The computer program instructions cause the processor to perform the signal processing functions described herein. The computer program instructions (e.g., software) can be stored in a tangible non-transitory computer usable medium, computer program medium, or any storage medium that can be accessed by a computer or processor. Such media include a memory device such as a RAM or ROM, or another type of computer storage medium such as a computer disk or CD-ROM. Accordingly, any tangible non-transitory computer storage medium having computer program code that causes a processor to perform the signal processing functions described herein is within the scope and spirit of the present disclosure.
While various embodiments of the present disclosure have been described above, it should be understood that they have been presented by way of example only, and not limitation. It will be apparent to persons skilled in the relevant art that various changes in form and detail can be made therein without departing from the spirit and scope of the disclosure. Thus, the breadth and scope of the present disclosure should not be limited by any of the above-described exemplary embodiments.
This application claims the benefit of U.S. Provisional Patent Application No. 63/502,582, filed on May 16, 2023, and U.S. Provisional Patent Application No. 63/513,686, filed on Jul. 14, 2023, both of which are incorporated by reference herein in their entireties.
The United States Government has ownership rights in this invention. Licensing inquiries may be directed to Office of Technology Transfer at US Naval Research Laboratory, Code 1004, Washington, DC 20375, USA; +1.202.767.7230; nrltechtran@us.navy.mil, referencing Navy Case Number 211588-US2.
Number | Date | Country
63/502,582 | May 2023 | US
63/513,686 | Jul 2023 | US