In inverse problems, one seeks to reconstruct a signal from incomplete and/or degraded measurements. Such problems arise in magnetic resonance imaging, computed tomography, deblurring, superresolution, inpainting, and other applications.
Image reconstruction can be used in a wide variety of applications to compensate for the limitations of imaging systems, for example to increase the resolution of an image beyond the original capture resolution (“superresolution”), inpaint missing areas of an image, reduce or eliminate errors in an image, and reduce blurring in an image.
Image reconstruction can be used for images formed based on visible light, as well as images captured using other imaging techniques such as magnetic resonance imaging, computed tomography, and X-ray imaging.
Therefore, there is a need for improved deep learning systems and related methods for posterior sampling in inverse problems, which can be applied when performing image reconstruction, for example.
Deep learning systems and methods for posterior sampling in inverse problems are described herein.
In some aspects, the techniques described herein relate to a method for training a deep learning model including: receiving a training dataset including a plurality of input/output pairs; and training a conditional generative adversarial network (cGAN) using the training dataset, wherein the training includes a regularization process configured to enforce consistency with a posterior mean and a posterior covariance or trace-covariance.
In some aspects, the techniques described herein relate to a method, wherein the regularization process uses a supervised L1 loss in conjunction with a standard deviation reward.
In some aspects, the techniques described herein relate to a method, wherein the standard deviation reward is weighted.
In some aspects, the techniques described herein relate to a method, further including autotuning the standard deviation reward.
In some aspects, the techniques described herein relate to a method, wherein the trained cGAN is configured to generate a plurality of posterior input sample values for a given output value.
In some aspects, the techniques described herein relate to a method, wherein the cGAN includes a generator model and a discriminator model.
In some aspects, the techniques described herein relate to a method, wherein each of the generator model and the discriminator model includes a respective convolutional neural network (CNN).
In some aspects, the techniques described herein relate to a method, wherein the respective CNN of the generator model is configured to output images.
In some aspects, the techniques described herein relate to a method, wherein the respective CNN of the generator model is configured for image segmentation.
In some aspects, the techniques described herein relate to a method, wherein the training dataset includes images.
In some aspects, the techniques described herein relate to a method, wherein the images are medical images.
In some aspects, the techniques described herein relate to a method, wherein the medical images are magnetic resonance (MR) images.
In some aspects, the techniques described herein relate to a method, wherein the MR images are collected using Cartesian or non-Cartesian sampling masks.
In some aspects, the techniques described herein relate to a method, wherein the MR images further include a temporal dimension.
In some aspects, the techniques described herein relate to a method for posterior sampling including: providing a trained conditional generative adversarial network (cGAN), wherein a regularization process used during training of the cGAN is configured to enforce consistency with a posterior mean and a posterior covariance or trace-covariance; and generating, using the trained cGAN, a plurality of posterior input sample values for a given output value.
In some aspects, the techniques described herein relate to a method, further including training the cGAN.
In some aspects, the techniques described herein relate to a method for image reconstruction or recovery including: providing a trained conditional generative adversarial network (cGAN), wherein a regularization process used during training of the cGAN is configured to enforce consistency with a posterior mean and a posterior covariance or trace-covariance; receiving a measurement from an imaging system; and generating, using the trained cGAN, a plurality of images based on the measurement.
In some aspects, the techniques described herein relate to a method, further including training the cGAN.
In some aspects, the techniques described herein relate to a system including: a conditional generative adversarial network (cGAN), wherein a regularization process used during training is configured to enforce consistency with a posterior mean and a posterior covariance or trace-covariance; a processor and a memory operably coupled to the processor, wherein the memory has computer-executable instructions stored thereon that, when executed by the processor, cause the processor to: input a measurement from an imaging system into the cGAN; and receive a plurality of images generated by the cGAN based on the measurement.
It should be understood that the above-described subject matter may also be implemented as a computer-controlled apparatus, a computer process, a computing system, or an article of manufacture, such as a computer-readable storage medium.
Other systems, methods, features and/or advantages will be or may become apparent to one with skill in the art upon examination of the following drawings and detailed description. It is intended that all such additional systems, methods, features and/or advantages be included within this description and be protected by the accompanying claims.
The components in the drawings are not necessarily to scale relative to each other. Like reference numerals designate corresponding parts throughout the several views.
The drawings illustrate example methods, systems, and computing devices according to implementations of the present disclosure, including plots of the observed P-sample PSNR gain and its theoretical value versus training epoch t for P=8.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art. Methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present disclosure. As used in the specification, and in the appended claims, the singular forms “a,” “an,” “the” include plural referents unless the context clearly dictates otherwise. The term “comprising” and variations thereof as used herein is used synonymously with the term “including” and variations thereof and are open, non-limiting terms. The terms “optional” or “optionally” used herein mean that the subsequently described feature, event or circumstance may or may not occur, and that the description includes instances where said feature, event or circumstance occurs and instances where it does not. Ranges may be expressed herein as from “about” one particular value, and/or to “about” another particular value. When such a range is expressed, an aspect includes from the one particular value and/or to the other particular value. Similarly, when values are expressed as approximations, by use of the antecedent “about,” it will be understood that the particular value forms another aspect. It will be further understood that the endpoints of each of the ranges are significant both in relation to the other endpoint, and independently of the other endpoint. While implementations will be described for reconstructing certain types of images, it will become evident to those skilled in the art that the implementations are not limited thereto, but are applicable for reconstructing any type of image.
The term “artificial intelligence” is defined herein to include any technique that enables one or more computing devices or computing systems (i.e., a machine) to mimic human intelligence. Artificial intelligence (AI) includes, but is not limited to, knowledge bases, machine learning, representation learning, and deep learning. The term “machine learning” is defined herein to be a subset of AI that enables a machine to acquire knowledge by extracting patterns from raw data. Machine learning techniques include, but are not limited to, logistic regression, support vector machines (SVMs), decision trees, Naïve Bayes classifiers, and artificial neural networks. The term “representation learning” is defined herein to be a subset of machine learning that enables a machine to automatically discover representations needed for feature detection, prediction, or classification from raw data. Representation learning techniques include, but are not limited to, autoencoders. The term “deep learning” is defined herein to be a subset of machine learning that enables a machine to automatically discover representations needed for feature detection, prediction, classification, etc. using layers of processing. Deep learning techniques include, but are not limited to, artificial neural networks such as the multilayer perceptron (MLP).
Machine learning models include supervised, semi-supervised, and unsupervised learning models. In a supervised learning model, the model learns a function that maps an input (also known as feature or features) to an output (also known as target or targets) during training with a labeled data set (or dataset). In an unsupervised learning model, the model learns patterns (e.g., structure, distribution, etc.) within an unlabeled data set. In a semi-supervised model, the model learns a function that maps an input (also known as feature or features) to an output (also known as target or targets) during training with both labeled and unlabeled data.
Described herein are deep learning systems and related methods for posterior sampling in inverse problems. As described in the Example, the deep learning systems and related methods can be used to perform image reconstruction. Implementations of the present disclosure include improvements to training conditional generative adversarial networks (cGANs). A generative adversarial network (GAN) is an unsupervised machine learning framework. A GAN can include multiple (e.g., 2) neural networks competing with each other (e.g., a generator and discriminator). The generator can be a neural network trained to generate new data, and the discriminator can be trained to classify examples generated by the generator as real or fake. The two models can be trained together so that the generator model generates plausible example data. A cGAN can condition the generation of data on certain inputs to make the generated data more targeted and/or specific.
As used herein, “regularization” in machine learning refers to systems and methods of preventing overfitting of models or encouraging certain behaviors in the output of the model such as the GANs and cGANs described herein. Implementations of the present disclosure can include systems and methods of regularization that use both mean values and covariance or trace-covariance to match that of the “true posterior.” As used herein, the “true posterior” refers to the distribution that represents an updated belief about an image after having seen the measurements. As used herein, a “trace covariance” can refer to the trace of a covariance matrix, where the covariance matrix defines the covariance between pairs of elements. Thus, implementations of the present disclosure can generate data with covariance or trace-covariance and mean similar to (or matching) the true posterior, allowing for improved generation of data in the cGAN. Thus, implementations of the present disclosure include methods for training cGAN networks, which can be used to improve the performance of those cGAN networks for image processing, as described throughout the present disclosure.
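For illustration only, the following is a minimal sketch of how the posterior mean and trace-covariance might be estimated empirically from a plurality of generator samples; the generator G, its code dimension, and the tensor shapes are hypothetical placeholders rather than details from the Example.

```python
import torch

def posterior_stats(G, y, P=8, code_dim=100):
    """Estimate the posterior mean and trace-covariance of x given y from
    P generator samples x_i = G(z_i, y). G and code_dim are hypothetical
    placeholders for a trained cGAN generator and its code-vector size."""
    samples = torch.stack([G(torch.randn(code_dim), y) for _ in range(P)])
    mean = samples.mean(dim=0)  # empirical posterior mean
    # The trace of the covariance matrix equals the sum of per-element
    # variances, so it can be computed without forming the full matrix.
    trace_cov = samples.var(dim=0, unbiased=True).sum()
    return mean, trace_cov
```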
Additionally, implementations of the present disclosure include systems and methods of tuning the parameters used to train cGANs, which can optionally be used in implementations of the systems and methods for training cGANs described herein.
Implementations of the present disclosure include methods of automatically tuning the βstd used to train the cGAN. As described in the Example, below, βstd can affect the variation of the samples generated by the cGAN. A common problem in the field is that existing cGANs can suffer from what is referred to as “mode collapse.” “Mode collapse” refers to cGANs outputting a small variety of samples, where the samples lack the variation of the true dataset. By performing automatic tuning of βstd during training, implementations of the present disclosure can train cGANs without suffering mode collapse, or with less severe mode collapse. Thus, implementations of the present disclosure include systems and methods that allow for improved training of cGAN networks to generate samples with variation that is similar to (or the same as) the true posterior.
With reference to
The training dataset can include any type of data. As described herein, implementations of the present disclosure can be used for visible-light images (e.g., photographs of people) as well as images produced using medical or scientific instruments. For example, implementations of the present disclosure can be used for computed tomography (CT) and/or magnetic resonance (MR) images. Optionally, the training dataset can be obtained from a database of medical images or photographs.
The method can further include training a conditional generative adversarial network (cGAN) using the training dataset at step 104. The training can include a regularization process configured to enforce consistency with a posterior mean and a posterior covariance or trace-covariance. As described in the Example, the regularization process used in training the cGAN can include automatically tuning βstd to prevent mode collapse. As also described in the Example, implementations of the present disclosure can further include training the cGAN so that the covariance of the generator outputs matches (or is close to) the covariance of the true posterior of the training dataset used at step 104.
Different types of regularization are contemplated by the present disclosure. Optionally, the regularization process uses a supervised L1 loss in conjunction with a standard deviation reward. In some implementations, the standard deviation reward is weighted (the weight is referred to in the Example as βstd).
In some implementations, the standard deviation reward can be autotuned.
Different configurations of cGAN are also contemplated by the present disclosure. In some implementations, the cGAN can include both a generator model and a discriminator model. The generator model and/or the discriminator model can each include a respective convolutional neural network (CNN). Optionally, the respective CNN of the generator model can be configured for image segmentation.
For example, in some implementations involving MR imaging described in the Example below, the generator model is a CNN, particularly a CNN inspired by the U-Net architecture [21]. The primary input y can be concatenated with the code vector z and fed through the U-Net. The network consists of 4 pooling layers with 128 initial channels; however, instead of pooling, the study used convolutions with filters of size 3×3, “same” padding, and a stride of 2 when downsampling. Conversely, the study upsampled using transpose convolutions, again with filters of size 3×3, “same” padding, and a stride of 2. All other convolutions utilize filters of size 3×3, “same” padding, and a stride of 1.
Within each encoder and decoder layer, the example implementation included a residual block, the architecture of which can be found in [2]. The example implementation used instance-norm for all normalization layers and parametric ReLUs as the activation functions, in which the network learns the optimal “negative slope.” Finally, the study included 5 residual blocks at the base of the U-Net, between the encoder and decoder, in an effort to artificially increase the depth of the network [6]. The generator has 86,734,334 trainable parameters.
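For illustration, the following is a minimal sketch of the downsampling and upsampling blocks described above (3×3 filters, “same”-style padding, stride-2 resampling, instance-norm, and parametric ReLUs); it is an approximation written for this description, not the study's actual code, and the module layout is an assumption.

```python
import torch.nn as nn

def down_block(in_ch, out_ch):
    """Strided 3x3 convolution used in place of pooling (stride 2)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
        nn.InstanceNorm2d(out_ch),
        nn.PReLU(out_ch),  # parametric ReLU: the "negative slope" is learned
    )

def up_block(in_ch, out_ch):
    """Transpose 3x3 convolution for upsampling (stride 2)."""
    return nn.Sequential(
        nn.ConvTranspose2d(in_ch, out_ch, kernel_size=3, stride=2,
                           padding=1, output_padding=1),
        nn.InstanceNorm2d(out_ch),
        nn.PReLU(out_ch),
    )

# In a U-Net-style generator, the measurement y and code vector z can be
# concatenated along the channel dimension before the first encoder layer:
#   x_hat = unet(torch.cat([y, z], dim=1))
```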
It should be understood that the generator architecture described above is only provided as an example.
Additionally, in some implementations involving MR imaging described in the Example below, the discriminator model is a standard CNN having 5 layers. In the first 3 layers, the example implementation used convolutions with filters of size 4×4, “same” padding, and a stride of 2 to reduce the image resolution. The remaining two convolutional layers use the same parameters, with the stride modified to be 1. The study used batch-norm as the normalization layer and leaky ReLUs with a “negative slope” of 0.2 as the activation functions.
The final convolutional layer does not have a normalization layer or activation function, and outputs a 1-channel “prediction map.” This prediction map gives a Wasserstein score for a patch of the image. The study achieved this patch-based discrimination by exploiting the receptive field of the network. Consequently, by increasing or decreasing the number of strided convolutions, the study could modify the size of the patches being discriminated. Patch-based discrimination has been known to improve the high-frequency information in reconstructions [10]. The discriminator of the example implementation has 693,057 trainable parameters.
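A minimal sketch of such a 5-layer patch-based discriminator follows (4×4 filters, three stride-2 layers, two stride-1 layers, batch-norm, leaky ReLUs with negative slope 0.2, and a 1-channel prediction map); the channel widths are illustrative assumptions, not the study's values.

```python
import torch.nn as nn

def patch_discriminator(in_ch, base_ch=64):
    """5-layer CNN whose 1-channel output map assigns a Wasserstein score
    to each image patch; channel widths are illustrative guesses."""
    layers, ch = [], in_ch
    for i, stride in enumerate([2, 2, 2, 1]):  # 3 stride-2 layers, then stride 1
        out = base_ch * (2 ** min(i, 3))
        layers += [nn.Conv2d(ch, out, kernel_size=4, stride=stride, padding=1),
                   nn.BatchNorm2d(out),
                   nn.LeakyReLU(0.2)]
        ch = out
    # Final stride-1 layer: no normalization or activation; 1-channel map.
    layers += [nn.Conv2d(ch, 1, kernel_size=4, stride=1, padding=1)]
    return nn.Sequential(*layers)
```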
It should be understood that the discriminator architecture described above is only provided as an example.
Further, in some implementations involving MR imaging described in the Example below, the cGANs use the generator and discriminator architectures described above. For the 3 cGANs described in the Example, the study used the architectures described herein, as well as the same, or a similar, training/testing procedure. The study adapted each model's regularization and βadv to match the authors' original implementation. In particular, when training Ohayon's cGAN, the study set βadv=1e−3 and used ℒ2,P regularization while training the generator. When training Adler's cGAN, the study set βadv=1 and did not apply any regularization to the generator. Instead, the study modified the number of input channels to the discriminator and slightly modified the training logic to be consistent with the loss proposed in (7).
For Jalal's approach, the study did not modify the original implementation, other than replacing the default sampling pattern with the GRO undersampling mask. The study generated 32 samples for 72 different test images using a batch size of 4, which took roughly 6 days. These samples were generated on a server with 4 NVIDIA V100 GPUs, each with 32 GB of memory.
It should be understood that the cGANs architecture described above is only provided as an example.
In some implementations, the trained cGAN can be configured to generate a plurality of posterior input sample values for a given output value.
Implementations of the present disclosure include methods for posterior sampling. Additional description of posterior sampling is provided herein in the Example, including example implementations of posterior sampling. An example method 200 for posterior sampling is shown in
The method 200 can include providing a trained conditional generative adversarial network (cGAN) at step 202, where a regularization process used during training is configured to enforce consistency with a posterior mean and a posterior covariance or trace-covariance. The trained cGAN can be trained using implementations of the method 100 described with respect to
The method can further include generating, using the trained cGAN, a plurality of posterior input sample values for a given output value at step 204.

Implementations of the present disclosure include methods for image reconstruction and/or recovery. An example method 300 for image reconstruction/recovery is shown in
The method 300 can further include receiving a measurement from an imaging system at step 304 and generating, using the trained cGAN, a plurality of images based on the measurement at step 306.
Implementations of the present disclosure include systems for performing image reconstruction and/or training a cGAN to perform image reconstruction. An example system 400 is shown in
The system 400 can also include a computing device 404. The computing device 404 can include any/all of the features of the computing device 500 shown in
The imaging system 406 can include any system that can generate any type of images. Non-limiting examples of imaging systems include MRI, CT, and imaging systems based on any wavelength of electromagnetic radiation, including visual light.
It should be appreciated that the logical operations described herein with respect to the various figures may be implemented (1) as a sequence of computer-implemented acts or program modules (i.e., software) running on a computing device (e.g., the computing device described in
Referring to
In its most basic configuration, computing device 500 typically includes at least one processing unit 506 and system memory 504. Depending on the exact configuration and type of computing device, system memory 504 may be volatile (such as random access memory (RAM)), non-volatile (such as read-only memory (ROM), flash memory, etc.), or some combination of the two. This most basic configuration is illustrated in
Computing device 500 may have additional features/functionality. For example, computing device 500 may include additional storage such as removable storage 508 and non-removable storage 510 including, but not limited to, magnetic or optical disks or tapes. Computing device 500 may also contain network connection(s) 516 that allow the device to communicate with other devices. Computing device 500 may also have input device(s) 514 such as a keyboard, mouse, touch screen, etc. Output device(s) 512 such as a display, speakers, printer, etc. may also be included. The additional devices may be connected to the bus in order to facilitate communication of data among the components of the computing device 500. All these devices are well-known in the art and need not be discussed at length here.
The processing unit 506 may be configured to execute program code encoded in tangible, computer-readable media. Tangible, computer-readable media refers to any media that is capable of providing data that causes the computing device 500 (i.e., a machine) to operate in a particular fashion. Various computer-readable media may be utilized to provide instructions to the processing unit 506 for execution. Example tangible, computer-readable media may include, but is not limited to, volatile media, non-volatile media, removable media and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. System memory 504, removable storage 508, and non-removable storage 510 are all examples of tangible, computer storage media. Example tangible, computer-readable recording media include, but are not limited to, an integrated circuit (e.g., field-programmable gate array or application-specific IC), a hard disk, an optical disk, a magneto-optical disk, a floppy disk, a magnetic tape, a holographic storage medium, a solid-state device, RAM, ROM, electrically erasable program read-only memory (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices.
In an example implementation, the processing unit 506 may execute program code stored in the system memory 504. For example, the bus may carry data to the system memory 504, from which the processing unit 506 receives and executes instructions. The data received by the system memory 504 may optionally be stored on the removable storage 508 or the non-removable storage 510 before or after execution by the processing unit 506.
It should be understood that the various techniques described herein may be implemented in connection with hardware or software or, where appropriate, with a combination thereof. Thus, the methods and apparatuses of the presently disclosed subject matter, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other machine-readable storage medium wherein, when the program code is loaded into and executed by a machine, such as a computing device, the machine becomes an apparatus for practicing the presently disclosed subject matter. In the case of program code execution on programmable computers, the computing device generally includes a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. One or more programs may implement or utilize the processes described in connection with the presently disclosed subject matter, e.g., through the use of an application programming interface (API), reusable controls, or the like. Such programs may be implemented in a high-level procedural or object-oriented programming language to communicate with a computer system. However, the program(s) can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language and it may be combined with hardware implementations.
The following examples are put forth so as to provide those of ordinary skill in the art with a complete disclosure and description of how the compounds, compositions, articles, devices and/or methods claimed herein are made and evaluated, and are intended to be purely exemplary and are not intended to limit the disclosure. Efforts have been made to ensure accuracy with respect to numbers (e.g., amounts, temperature, etc.), but some errors and deviations should be accounted for. Unless indicated otherwise, parts are parts by weight, temperature is in ° C. or is at ambient temperature, and pressure is at or near atmospheric.
A study was performed in which an example implementation of the present disclosure was used to reconstruct an image from incomplete and/or degraded measurements. The example implementation of the present disclosure can be used for any image reconstruction task. It should be understood that “image reconstruction,” as described herein, can refer to images with any resolution, and formed using any type of imaging data. As non-limiting examples, implementations of the present disclosure can be used for magnetic resonance imaging, computed tomography, deblurring, superresolution, inpainting, and other applications using images from a wide range of imaging devices.
Implementations of the present disclosure include systems and methods that can rapidly generate high-quality posterior samples. It is often the case that many image hypotheses are consistent with both the measurements and prior information, and so implementations of the present disclosure can be configured not to recover a single “best” hypothesis but instead to sample multiple images from the posterior distribution. Implementations of the present disclosure include a regularized conditional Wasserstein GAN that can generate dozens of high-quality posterior samples per second. The study of the example implementation described herein includes quantitative evaluation metrics (e.g., conditional Frechet inception distance), showing that methods described herein produce state-of-the-art posterior samples in both parallel MRI and inpainting applications.
An example implementation includes generative posterior sampling where, given a training dataset of input/output pairs {(xt, yt)} for t=1, …, T, example implementations described herein can learn a generating function x̂=Gθ(z, y) that, for a given y, maps random code vectors z∼N(0, I) to posterior samples x̂∼px|y(⋅|y).
Posterior sampling for linear inverse problems aims to recover a signal or image x from a measurement y of the form

y=Ax+w  (1)

with a known linear operator A and noise w. Problems of this form arise in deblurring, superresolution, inpainting, computed tomography (CT), magnetic resonance (MR) imaging, and other fields. When A does not have full column rank, Ax filters out any components of x that lie in the nullspace of A, and so prior information about the true x is needed for accurate recovery (e.g., the example implementation can receive information that x is an MR image, for example by tagging x). But even after taking such prior information into account, there may be many hypotheses of x that yield equally good explanations of y. Thus, rather than settling for a single “best” estimate of x from y, the goal is to efficiently sample from the posterior px|y(⋅|y).
There exist several approaches that learn to sample from the posterior given training samples {(xt, yt)}, including conditional generative adversarial networks (cGANs) [2,10,39], conditional variational autoencoders (cVAEs) [7,23,30], conditional normalizing flows (cNFs) [4,28,34], and score-based generative models using Langevin dynamics [12,26,33]. The example implementation can include cGANs, which are known to generate high-quality samples, though possibly at the expense of sample diversity.
The example implementation can include a cGAN that addresses the aforementioned lack-of-diversity issue by training with a regularization that enforces consistency with the true posterior mean and covariance or trace-covariance. The regularization can include a supervised ℓ1 loss plus an appropriately weighted standard-deviation reward. In certain cases, the optimal weight can be computed in closed form. The example implementation can further include systems and methods to automatically calibrate the weight during training.
The study included accelerated MR image recovery and large-scale image completion/inpainting. To quantify performance, the study focused on the conditional Frechet inception distance (CFID) [24], but the study also considered FID [9], LPIPS [36], average pixel-wise standard deviation (APSD), PSNR, and SSIM [32]. The results show the proposed regularized cGAN (rcGAN) outperforming existing cGANs [2,16] and the state-of-the-art score-based generative models from [12,26] in all tested metrics.
The example implementation can include a Wasserstein cGAN framework [2]. The Wasserstein GAN framework [5,8] can be very successful in avoiding mode collapse and stabilizing the training of GANs. The study included designing a generator network Gθ: Z×Y→X such that, for typical fixed values of y, the random variable x̂=Gθ(z, y) induced by z∼pz has a distribution that best matches the posterior px|y(⋅|y) in the Wasserstein-1 distance. Here, z is drawn independently of y, and X, Y, and Z denote the spaces of signals x, measurements y, and codes z, respectively. By Kantorovich-Rubinstein duality, the Wasserstein-1 distance between distributions pa and pb can be expressed as [5]

W1(pa, pb)=sup over 1-Lipschitz functions D of Ex∼pa{D(x)}−Ex∼pb{D(x)}
The study optimized the generator parameters θ to minimize the loss in (4).
In practice, the discriminator is implemented using a neural network Dϕ. The parameters θ and ϕ are trained by alternately minimizing
ℒadv(θ,ϕ)≜Ex,z,y{Dϕ(x,y)−Dϕ(Gθ(z,y),y)}  (6)
Mode collapse and regularization. One of the main challenges with the cGAN framework in imaging problems is that, for each training measurement example yt, there is only a single signal example xt. Thus, with the previously described training methodology, there may be no incentive for the generator to produce diverse samples Gθ(z, yt) across codes z∼pz.
With unconditional GANs (uGANs), although mode collapse was historically an issue [1,20,22], it can be largely solved by the Wasserstein GAN framework [5,8]. It should be noted that the causes of mode collapse in uGANs are fundamentally different than in cGANs because, in the uGAN case, the training set {xt} contains many examples of valid images, while in the cGAN case there is only one example of a valid image xt for each given measurement yt. As a result, many strategies used to combat mode collapse in uGANs are not applicable to cGANs. For example, mini-batch discrimination, where the discriminator aims to distinguish a mini-batch of true samples {xt} from a mini-batch of generated samples {x̂t} by leveraging inter-sample variation (e.g., MBSD [13] or its precursor from [22]), generally does not work with cGANs because the statistics of the posterior can significantly differ from the statistics of the prior.
To combat mode collapse in the cGAN case, Adler et al. [2] proposed to use a three-input discriminator Dϕ,adler: X×X×Y→ℝ and replace ℒadv from (6) with the three-input loss (7).
Ohayon et al. [16] proposed to fight mode collapse via supervised-ℓ2 regularization of ℒadv, i.e.,

ℒ2(θ)≜Ex,y{∥x−Ez{Gθ(z,y)}∥2²}  (8)
It is important to note that, in practice, the expectation Ez in (8) can be replaced with a (finite) P-sample average with P≥2 (e.g., P=8 in Ohayon [17]), yielding

ℒ2,P(θ)≜Ex,y,z1,…,zP{∥x−(1/P)Σi=1P Gθ(zi,y)∥2²}  (9)
As shown in the study, ℒ2,P has the potential to induce mode collapse rather than prevent it, and the potential grows larger as P grows smaller. To understand why supervised-ℓ2 regularization using a finite-sample average can lead to mode collapse, (9) can be rewritten, using the bias-variance decomposition, as

ℒ2,P(θ)=Ex,y{∥x−Ez{Gθ(z,y)}∥2²}+Ey{tr Covz1,…,zP{(1/P)Σi=1P Gθ(zi,y)|y}}  (10)

=Ex,y{∥x−Ez{Gθ(z,y)}∥2²}+(1/P)Ey{tr Covz{Gθ(z,y)|y}}  (11)

where the second step uses the independence of the codes z1,…,zP. In (11), only the first term encourages the generator outputs to match the posterior mean; the second term rewards shrinking the trace-covariance of the generator outputs toward zero, and this mode-collapse incentive grows as P shrinks.
Experimentally, the study observed that supervised-ℓ2 regularization does indeed lead to mode collapse when P=2. Although mode collapse may not occur with larger values of P, there is a high computational cost to using large P as a result of GPU memory constraints: as P doubles, the batch size must halve, and so training time increases linearly with P. For example, in the MRI experiment, the study found that P=2 takes approximately 2.5 days to train for 100 epochs on a 4×A100 GPU server, while P=8 takes approximately 10 days.
To mitigate ℒ2,P's incentive for mode collapse, a variance reward may be incorporated. In particular, training the generator can include solving arg minθ{ℒ2,P(θ)−βvarℒvar,P(θ)}, where, with the samples x̂i≜Gθ(zi, y) and their P-sample average

x̂(P)≜(1/P)Σi=1P x̂i  (12)

the variance reward takes the form

ℒvar,P(θ)=Ey{Ez1,…,zP{(1/(P−1))Σi=1P∥x̂i−x̂(P)∥2²}}
The study included quantifying performance using CFID.
As described herein, the study included training a generator Gθ so that, for typical fixed values of y, the distribution px̂|y matches the true posterior px|y(⋅|y). It is essential to have a quantitative metric for evaluating performance with respect to this goal. For example, it is not enough that the generated samples are “accurate” in the sense that x̂i or x̂(P) are close to the ground truth x, nor is it enough that the x̂i are “diverse” in the sense of having a large element-wise standard deviation.
Thus, the study quantified the performance of posterior approximation using the conditional Frechet inception distance (CFID) [24], which is a computationally efficient approximation to the conditional Wasserstein distance (CWD)
CWD≜Ey{W2(px|y(⋅|y),px̂|y(⋅|y))}  (18)
In (18), W2(pa, pb) denotes the Wasserstein-2 distance between distributions pa and pb, defined as

W2(pa, pb)²≜min over joint distributions p(a,b) with marginals pa and pb of E{∥a−b∥2²}  (19)
In practice, the expectations, means, and covariances in (20) are replaced by sample averages using samples {(xt, yt)} from a test set.
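For reference, when pa and pb are approximated as Gaussians N(μa, Σa) and N(μb, Σb), as is done in Frechet-distance metrics of this kind, the Wasserstein-2 distance has the standard closed form

W2(N(μa,Σa), N(μb,Σb))²=∥μa−μb∥2²+tr(Σa+Σb−2(Σb^(1/2) Σa Σb^(1/2))^(1/2)),

which can be evaluated directly from the sample means and covariances noted above.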
The example implementation includes systems and methods for regularization using a supervised ℓ1 loss plus a standard-deviation reward.
Unlike the previously described forms of regularization, implementations of the present disclosure include forms of cGAN regularization that encourage the samples x̂i to match the true posterior in both mean and covariance or trace-covariance. As a non-limiting example, when training the generator, the example implementation can solve:
ℒ1,std,P(θ,βstd)≜ℒ1,P(θ)−βstdℒstd,P(θ)  (22)
The study found that P=2 worked best for an example implementation, as shown in
For image recovery in general, the use of a supervised ℓ1 loss is often preferred over ℓ2 because it results in sharper, more visually pleasing results [38]. But regularizing a cGAN using a supervised ℓ1 loss alone can push the generator towards mode collapse, for reasons similar to the supervised-ℓ2 case.
Implementations of the present disclosure can use a properly weighted standard-deviation (std) reward in conjunction with a supervised ℓ1 loss, as in (22). The study shows that the std reward works together with the ℓ1 loss to enforce the correctness of both the posterior mean and the posterior covariance or trace-covariance. This stands in contrast to the case of ℓ2 loss with a variance reward, which enforces only the correctness of the posterior mean.
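As a concrete illustration of (22), the following is a minimal sketch of the ℓ1-plus-standard-deviation regularizer; it is one plausible instantiation written for this description, and the particular (unbiased) standard-deviation estimator is an assumption rather than the study's exact form.

```python
import torch

def l1_std_regularizer(x, samples, beta_std):
    """Sketch of L_{1,std,P} = L_{1,P} - beta_std * L_{std,P} from (22).

    x:       ground-truth image, shape (B, C, H, W)
    samples: P generator samples, shape (P, B, C, H, W)
    """
    x_avg = samples.mean(dim=0)              # P-sample average x_(P)
    l1 = (x - x_avg).abs().mean()            # supervised L1 loss on x_(P)
    # Standard-deviation reward: per-pixel sample std across the P samples,
    # averaged over pixels (one plausible form of L_{std,P}).
    std_reward = samples.std(dim=0, unbiased=True).mean()
    return l1 - beta_std * std_reward
```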
Proposition 1. Assume that P≥2 and that θ has complete control over the y-conditional mean and covariance of the x̂i. Then the regularization-minimizing parameters θ*=arg minθ ℒ1,std,P(θ, βstd), with βstd set to the value given in (25), satisfy

Ezi|y{x̂i(θ*)|y}=Ex|y{x|y}=x̂mmse  (26a)

Covzi|y{x̂i(θ*)|y}=Covx|y{x|y}  (26b)
In practical applications, x̂i and x are not expected to be independent Gaussian conditioned on y, as assumed in Proposition 1. So, using the βstd from (25) may not work well in practice. Implementations of the present disclosure include methods to automatically determine the correct βstd.
To illustrate the behavior of the previously described regularizers, the study considered the scalar-Gaussian case, where the generator is Gθ(z, y)=μ̂+σ̂z, with code z∼N(0, 1) and parameters θ=[μ̂, σ̂]T. In this case, the generated posterior is px̂|y(x|y)=N(x; μ̂, σ̂²), and the study assumes that the true posterior is px|y(x|y)=N(x; μ, σ²) for some μ and σ>0.
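The scalar-Gaussian case can also be checked numerically. The following sketch (with arbitrary illustrative values) estimates ℒ2,P by Monte Carlo and shows that, consistent with (11), it is minimized by σ̂=0 rather than σ̂=σ, since E{(x−x̂(P))²}=(μ−μ̂)²+σ²+σ̂²/P.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, P, N = 1.0, 2.0, 2, 200_000  # true posterior N(mu, sigma^2)

def l2P(mu_hat, sigma_hat):
    """Monte Carlo estimate of L_{2,P} = E{(x - x_(P))^2} for the scalar
    generator G(z) = mu_hat + sigma_hat*z with z ~ N(0, 1)."""
    x = rng.normal(mu, sigma, size=N)              # true posterior samples
    z = rng.normal(size=(N, P))
    x_avg = (mu_hat + sigma_hat * z).mean(axis=1)  # P-sample average
    return np.mean((x - x_avg) ** 2)

# Even with the correct mean, shrinking sigma_hat lowers L_{2,P}:
print(l2P(mu, sigma))  # approx sigma^2 + sigma^2/P = 6.0
print(l2P(mu, 0.0))    # approx sigma^2 = 4.0 -> collapse is rewarded
```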
Implementations of the present disclosure include systems and methods of autotuning βstd.
The example implementation includes methods to autotune βstd for a given training dataset. The example approach can be based on the principle that larger values of βstd tend to yield samples x̂i with more variation. But more variation is not necessarily better; implementations of the present disclosure can generate samples with the correct amount of variation. To assess variation, the study can compare the expected ℓ2 error of the P-sample average x̂(P) to that of x̂(1). In the case of mode collapse, these errors are identical. But when the {x̂i} are true posterior samples, these errors follow a particular relationship, quantified below.
Given generator outputs {x̂i} and their P-sample average x̂(P) from (12), the study defined the expected ℓ2 error on x̂(P) as

εP≜E{∥x̂(P)−x∥2²|y}  (27)
If the {x̂i} are independent samples of the true posterior (i.e., x̂i∼px|y(⋅|y)), then εP=((P+1)/(2P))ε1: in that case εP decomposes as the MMSE error tr Covx|y{x|y} plus the averaging error (1/P)tr Covx|y{x|y}, so that εP=((P+1)/P)tr Covx|y{x|y} while ε1=2 tr Covx|y{x|y}.
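The following sketch illustrates one way such an autotuning step could be implemented; the exact update rule (31) is not reproduced here, so the multiplicative correction below, its step size mu, and the function name are assumptions for illustration. The idea is to compare the observed validation ratio ε̂P/ε̂1 against the theoretical value (P+1)/(2P) and adjust βstd to close the gap.

```python
def update_beta_std(beta_std, err_P, err_1, P, mu=0.1):
    """Hypothetical autotuning step (the disclosure's actual rule is (31)).

    err_P, err_1: validation estimates of eps_P and eps_1 from (27).
    Under mode collapse, err_P/err_1 -> 1, which exceeds the theoretical
    (P+1)/(2P), so beta_std is increased to reward more sample variation;
    over-dispersed samples yield a ratio below theory, decreasing beta_std.
    """
    theory = (P + 1) / (2 * P)     # ratio for true posterior samples
    observed = err_P / err_1
    return beta_std * (1.0 + mu * (observed - theory) / theory)
```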
Implementations of the present disclosure can include systems and methods to enforce data consistency.
The example data-consistency procedures described herein can be optionally used with implementations of the cGAN described herein. In some applications such as medical imaging or inpainting, the end user may feel comfortable knowing that all generated reconstructions x̂i of x from y=Ax+w (recall (1)) are consistent with the measurements in that

y=Ax̂i  (32)
The example recovery method aims to restore the information about x that was lost through the measurement process (i.e., the components of x lying in the nullspace of A) and so this approach applies when A has a non-trivial nullspace. Also, if no attempt is made to remove the noise w in y, the approach may be appropriate only for high-SNR applications.
The proposed data-consistency approach leverages the fact that, if (32) holds, then A⁺y=A⁺Ax̂i must also hold, where (⋅)⁺ denotes the pseudo-inverse. The quantity A⁺A can be recognized as the orthogonal projection matrix associated with the row space of A. So, (32) says that the components of x̂i in the row space of A must equal A⁺y, while the components in the null space are unconstrained.
This implies the following data-consistency procedure:
x̂i=(I−A⁺A)x̂i,raw+A⁺y  (33)

where x̂i,raw denotes the raw generator output.
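For the special case of inpainting, where A selects a subset of pixels, A⁺A is a binary mask and (33) reduces to overwriting the observed pixels, as the following sketch (written under that assumption) shows:

```python
import torch

def enforce_data_consistency(x_raw, y, mask):
    """Apply (33) for an inpainting-style A that selects pixels where
    mask == 1. Here A+A is the diagonal mask, so the row-space component
    of the reconstruction is replaced by the (noiseless) measurements,
    while the nullspace component (mask == 0) is left to the generator."""
    return (1 - mask) * x_raw + mask * y

# For a general matrix A (with x flattened to a vector), the same projection
# can be written with a pseudo-inverse, e.g.:
#   A_pinv = torch.linalg.pinv(A)
#   x_hat = x_raw - A_pinv @ (A @ x_raw) + A_pinv @ y
```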
The study of the example implementation further included MRI experiments using implementations of the present disclosure.
In the MRI version of (1), x is a complex-valued multicoil image. For the training {xt}, the study used the top 8 slices of all fastMRI [35] T2 brain training volumes with at least 16 coils, cropping them to 384×384 pixels and compressing to 8 virtual coils [37], yielding 12200 training images. Then 2376 testing and 784 validation images were obtained in the same manner from the fastMRI T2 brain testing volumes. From the 2376 testing images, the study randomly selected 72 from which to compute performance metrics, in order to limit the evaluation time of the Langevin competitor [12] to roughly 6 days. To create measurement data yt, the study transformed xt to the Fourier domain, sampled using the GRO pattern [3] at acceleration R=4, and transformed the zero-filled k-space measurements back to the (complex, multicoil) image domain.
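A minimal sketch of the described measurement simulation follows (transform to the Fourier domain, retain only the sampled k-space points, and transform back); the GRO mask itself is not reproduced, so mask is a placeholder, and the shift conventions are illustrative rather than the study's preprocessing code.

```python
import torch

def simulate_measurements(x, mask):
    """Create a zero-filled measurement image y from a multicoil image x.

    x:    complex multicoil image, shape (coils, H, W)
    mask: binary k-space sampling pattern (e.g., GRO at R=4), shape (H, W)
    """
    dims = (-2, -1)
    k = torch.fft.fftshift(torch.fft.fft2(torch.fft.ifftshift(x, dim=dims)),
                           dim=dims)   # image -> k-space
    k_us = k * mask                    # keep only the sampled k-space points
    y = torch.fft.fftshift(torch.fft.ifft2(torch.fft.ifftshift(k_us, dim=dims)),
                           dim=dims)   # zero-filled k-space -> image domain
    return y
```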
The study architecture used a U-Net [21] for the generator and a standard CNN for the discriminator. The discriminator was patch-based [10] since that gave slightly improved performance. Also, the study used the data-consistency processing from Section 3.3.
At each training iteration, the generator takes in nbatch measurement samples yt, and P code vectors for every yt, and performs an optimization step on the loss
ℒG(θ)≜βadvℒadv(θ,ϕ)+ℒ1,P(θ)−βstdℒstd,P(θ)  (34)

while the discriminator performs an optimization step on the loss

ℒD(ϕ)=−ℒadv(θ,ϕ)+α1ℒgp(ϕ)+α2ℒdrift(ϕ),  (35)

where ℒgp(ϕ) is a gradient penalty and ℒdrift(ϕ) is a drift penalty.
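A minimal sketch of one training iteration implementing (34) and (35) follows; the gradient-penalty and drift terms are written in their standard WGAN-GP forms as assumptions (the study's exact ℒgp and ℒdrift are not reproduced here), and all hyperparameter values are placeholders.

```python
import torch

def discriminator_loss(D, G, x, y, z, alpha1=10.0, alpha2=1e-3):
    """L_D from (35): -L_adv + alpha1*L_gp + alpha2*L_drift, with standard
    WGAN-GP penalty forms assumed."""
    x_hat = G(z, y).detach()
    d_real, d_fake = D(x, y), D(x_hat, y)
    l_adv = (d_real - d_fake).mean()
    # Gradient penalty on random interpolates of real and generated images.
    eps = torch.rand(x.size(0), 1, 1, 1, device=x.device)
    x_mix = (eps * x + (1 - eps) * x_hat).requires_grad_(True)
    grads = torch.autograd.grad(D(x_mix, y).sum(), x_mix, create_graph=True)[0]
    l_gp = ((grads.flatten(1).norm(2, dim=1) - 1) ** 2).mean()
    l_drift = (d_real ** 2).mean()  # keeps the Wasserstein scores near zero
    return -l_adv + alpha1 * l_gp + alpha2 * l_drift

def generator_loss(D, G, x, y, zs, beta_adv, beta_std):
    """L_G from (34): beta_adv*L_adv + L_{1,P} - beta_std*L_{std,P}.
    For brevity, only the first sample feeds the adversarial term."""
    samples = torch.stack([G(z, y) for z in zs])  # P samples per measurement
    l_adv = (D(x, y) - D(samples[0], y)).mean()
    x_avg = samples.mean(dim=0)
    l_1 = (x - x_avg).abs().mean()
    l_std = samples.std(dim=0, unbiased=True).mean()
    return beta_adv * l_adv + l_1 - beta_std * l_std
```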
The study included validation and testing of the example implementations described herein. To evaluate performance, the study converted the multi-coil generator outputs to complex-valued images using SENSE-based coil combining [19] with ESPIRiT-estimated [31] coil sensitivity maps (via SigPy [18]). The study then converted to the magnitude domain before computing CFID, PSNR, SSIM, and average pixel-wise standard deviation (APSD), i.e., the per-pixel standard deviation across the P posterior samples, averaged over all pixels.
PSNR and SSIM were computed from the P-averaged outputs x̂(P) (recall (12)), while CFID was computed from the un-averaged outputs x̂i. The study used P=32 for testing and P=8 for validation.
The study compared the example implementation of a cGAN according to the present disclosure to Adler et al.'s cGAN [2], Ohayon et al.'s cGAN [16], and the fastMRI Langevin approach from Jalal et al. [12]. The cGAN from [2] uses generator loss βadvℒadv,adler(θ, ϕ) and discriminator loss −ℒadv,adler(θ, ϕ)+α1ℒgp(ϕ)+α2ℒdrift(ϕ), while the cGAN from [16] uses generator loss βadvℒadv(θ, ϕ)+ℒ2,P(θ) and discriminator loss −ℒadv(θ, ϕ)+α1ℒgp(ϕ)+α2ℒdrift(ϕ), where for each the study used the value of βadv from the original paper. All cGANs used the same generator and discriminator architectures, except that [2] used extra discriminator input channels to facilitate the 3-input loss (7). For the Langevin approach [12], the study did not modify the implementation from [11] except for the undersampling mask.
For the MRI experiment described herein, the test CFID, PSNR, SSIM, APSD, and evaluation time (for 4 samples) are shown in
The mode collapse of Ohayon et al.'s cGAN is evident from the dark pixel-wise standard deviation image. The fact that the cGAN errors are less than the Langevin errors near the image corners is a consequence of minor differences in sensitivity-map estimation relative to [11].
βstd autotuning results. The study tracked the observed P-sample PSNR gain 10 log10(ε̂1/ε̂P) and the theoretical value 10 log10(2P/(P+1)) versus the training epoch t for P=8, as used during validation. As described herein, the observed P-sample PSNR gain depends on βstd, which is adapted according to (31).
The study further included inpainting experiments using an example implementation of the present disclosure. The study objective was to complete the missing centered 64×64 square of a 128×128 CelebA-HQ face image [13]. The study randomly split the dataset, yielding 27000 images for training, 2000 for validation, and 1000 for testing. This application is qualitatively different from MR image recovery in that the study may not aim to recover the ground-truth image but rather to hallucinate faces that are consistent with the unmasked pixels.
For the inpainting experiments, the architecture in the study included the CoModGAN architecture from [41] along with the proposed ℒ1,std,P regularization; unlike the original CoModGAN, however, the study did not use MBSD at the discriminator.
The study further included training, validation and testing of an example implementation of the present disclosure configured for inpainting. The study used the same general training and testing procedure described previously, but with βadv=5e−3, nbatch=128, and 110 epochs of cGAN training. Also, the study computed FID and LPIPS instead of PSNR and SSIM, since the goal of the inpainting experiment was not to recover the original image but rather to generate faces with high perceptual quality and diversity. Running PyTorch on a server with 4 Tesla A100 GPUs, each with 82 GB of memory, the cGAN training took approximately 1.5 days.
The study compared an example implementation of the present disclosure with CoModGAN [41] with truncation parameter ψ=1, as well as the Langevin approach from Song et al. [25]. For CoModGAN, the study used the implementation [40]. For Song et al., the study used the authors' implementation from [27] after training their NCSNv2 model on the 128×128 CelebA-HQ dataset using their celeba.yml configuration.
Reconstruction results from the study were recorded.
Implementations of the present disclosure include regularization techniques for cGANs including a supervised ℓ1 loss plus an appropriately weighted standard-deviation reward, i.e., ℒ1,P(θ)−βstdℒstd,P(θ). The study shows that, for an independent Gaussian posterior, with appropriate βstd, minimizing the regularization yields generator samples that agree with the true posterior in both mean and covariance or trace-covariance. Implementations of the present disclosure further include methods to autotune βstd, which can be used with practical data.
For example implementations including multicoil MR reconstruction and large-scale image inpainting, the study showed that the example implementations (with appropriate choice of generator and discriminator architecture) can outperform state-of-the-art cGAN and Langevin competitors in CFID as well as accuracy metrics like PSNR and SSIM (for MRI) and perceptual metrics like FID (for inpainting). Compared to Langevin approaches, the method produces samples thousands of times faster.
It should be understood that the study and example implementations described with respect to the study are intended only as non-limiting examples. For example, the example implementations of the present disclosure include a cGAN that is trained for a specific A from (1); however, it should be understood that additional types of A matrices can be used, and that the A matrix described herein is only a non-limiting example. In the inpainting and MR applications, for example, the generator could take in both the measurements y and the sampling mask. It should also be understood that the applications of the example implementation described in the study are also intended only as non-limiting examples. Additional non-limiting example applications of implementations of the present disclosure include any imaging application, including computed tomography (CT), superresolution, and/or deblurring.
[35] Jure Zbontar et al. fastMRI: An open dataset and benchmarks for accelerated MRI. arXiv:1811.08839, 2018.
[36] Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In Proc. IEEE Conf. Comp. Vision Pattern Recog., pages 586-595, 2018.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.
This application claims the benefit of U.S. provisional patent application No. 63/426,459, filed on Nov. 18, 2022, and titled “CONDITIONAL GENERATIVE ADVERSARIAL NETWORK (CGAN) FOR POSTERIOR SAMPLING AND RELATED METHODS,” the disclosure of which is expressly incorporated herein by reference in its entirety.
This invention was made with government support under B029957 awarded by the National Institutes of Health. The government has certain rights in the invention.