1. Field of the Invention
This invention relates generally to the reconstruction of inputs to a linear system from the observed outputs, including for example the reconstruction of an object from images captured by a plenoptic system.
2. Description of the Related Art
Compressive sampling (sensing) is a theory that has emerged recently from the signal processing efforts to reduce the sampling rate of signal acquisition. However, almost all, if not all, prior work is based on the use of special random measurement matrices in conjunction with well-known bases, such as wavelets. These approaches solve a different problem than the reconstruction problem that we address, in which the measurement matrix (i.e., the system response matrix) is known and a customized basis is tailored to both the measurement matrix and the expected class of input signals.
Separately, dictionary learning for sparse signal models has also been a popular topic recently. However, almost all work in dictionary learning assumes that the input signal is directly observed. That is, it assumes that the system response matrix is the identity matrix. These approaches also are not applicable to the reconstruction problem that we address, since in our cases the system response matrix is far from the identity matrix.
Thus, there is a need for reconstruction approaches that can overcome the shortcomings of the prior art.
The present invention overcomes the limitations of the prior art by using machine learning techniques to train a “dictionary” of input signal elements, such that input signals can be linearly decomposed into a few sparse elements. This prior knowledge of the sparsity of the input signal leads to excellent reconstruction results via maximum-aposteriori estimation. The learning method imposes certain properties on the learned dictionary (specifically, low coherence with the system response), which properties are important for reliable reconstruction.
A system can be characterized by y=Ax+η, where x represents input to the system, A represents a rank-deficient system response matrix, η represents system noise and y represents output of the system. The input x is further represented by x=Φc, where Φ is a “dictionary” for the system and the coefficients c are sparse.
One aspect of the invention concerns selection of the dictionary Φ, for a given system response matrix A and class of input signals x. The dictionary Φ is determined by a machine learning method. An initial dictionary estimate Φ is selected and then improved by using B samples selected from a training set and organized in a matrix X=[x1 x2 . . . xB], referred to as a “sample” X in the following. The improvement is based on an objective function that rewards a low error between the sample X and the sample representation ΦC where C=[c1 c2 . . . cB] is the coefficient representation of X, and that also rewards a low coherence between A and Φ. Low coherence between A and Φ is important for reliable reconstruction.
One specific approach is based on the following. A sample X is selected. In an “inference” step, C is estimated based on a sparse prior and assuming the current dictionary estimate Φ. In other words, given Φ, find C. In a complementary “adaptive learning” step, Φ is estimated, based on assuming the current estimate of C. That is, given C, find Φ. This is repeated for the next sample X and so on. In one formulation, both of these steps are convex optimization problems, so they can be solved in a computationally efficient manner.
Another aspect of the invention concerns reconstructing x once the dictionary Φ has been selected. The problem is to reconstruct the input x1 that corresponds to an observed output y1. In one approach, the input x1 is a maximum-aposteriori estimate based on y1, A and the determined dictionary estimate Φ.
There are many potential applications for the approaches described above. One class of applications is for plenoptic systems. Plenoptic systems are more advantageous than conventional imaging systems because they can provide additional capabilities and functionalities such as instantaneous multi-spectral imaging, de-focusing and 3D imaging. This is achieved via optical modifications (insertion of a micro-lens array and optional filter module) in conjunction with advanced digital image (or 3D scene) reconstruction and processing algorithms. However, plenoptic systems historically have provided these additional functionalities at the expense of significantly reduced spatial image resolution. It is possible to model the plenoptic system as a linear system, but the system response matrix (referred to as the pupil image function or PIF) is rank deficient. Thus, the dictionary-based reconstruction methods described above can be used advantageously with plenoptic systems to reconstruct higher resolution images even though the system response is rank deficient.
In addition to plenoptic cameras, many other systems deal with the same problem of reconstruction from a limited number of measurements, especially medical imaging systems such as ultrasound tomography or MRI.
Other aspects of the invention include methods, devices, systems and applications related to all of the above.
The invention has other advantages and features which will be more readily apparent from the following detailed description of the invention and the appended claims, when taken in conjunction with the accompanying drawings, in which:
a-d are images illustrating dictionary-based reconstruction for a doll object.
a-d are images illustrating dictionary-based reconstruction for a Lena object.
The figures depict embodiments of the present invention for purposes of illustration only. One skilled in the art will readily recognize from the following discussion that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles of the invention described herein.
The figures and the following description relate to preferred embodiments by way of illustration only. It should be noted that from the following discussion, alternative embodiments of the structures and methods disclosed herein will be readily recognized as viable alternatives that may be employed without departing from the principles of what is claimed.
y=Ax+η, (1)
where x represents the input to the system, A represents a system response matrix, η represents system noise and y represents the output of the system.
The reconstruction problem is to find x, given y and A. In other words, estimate the original input signal, given measurements of the system outputs and given a linear characterization of the system. Since Eqn. 1 is linear, the reconstruction problem is a linear inverse problem. The existence and uniqueness of the solution depends on the properties of the system response matrix A. If the matrix A is invertible and well-conditioned, the reconstruction can be solved by inverting A and applying x=A−1y.
In many cases, however, the matrix A is a rectangular matrix and rank deficient, and the linear system is under-determined. This is the case when the number of rows of A, denoted as M, is smaller than the number of columns N, or when the rank of A is smaller than N even if M>N. In these cases, we can search for the most probable solution for x that approximately satisfies the given linear system. Formally, we are looking for an estimate x̂ such that x̂ = arg max_x P(x|y); that is, x̂ is the value of x that is most probable given y. In other words, we would like to find the maximum-aposteriori (MAP) estimate for x.
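To make the under-determined case concrete, the following minimal Python sketch (with hypothetical sizes that are not taken from the examples below) shows that ordinary least squares applied to a rank-deficient A does not recover a sparse input, which is what motivates bringing in prior information:

# Minimal illustration (hypothetical sizes): when A is rank deficient,
# plain least squares cannot recover x, which motivates the MAP approach.
import numpy as np

rng = np.random.default_rng(0)
N, M = 100, 40                      # more unknowns than measurements
A = rng.standard_normal((M, N))     # rank(A) <= M < N, so the system is under-determined
x_true = np.zeros(N)
x_true[rng.choice(N, 5, replace=False)] = rng.standard_normal(5)  # a sparse input
y = A @ x_true + 0.001 * rng.standard_normal(M)

x_ls, *_ = np.linalg.lstsq(A, y, rcond=None)   # minimum-norm solution, ignores any prior
print(np.linalg.matrix_rank(A))                # rank well below N
print(np.linalg.norm(x_ls - x_true))           # large error without a prior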
MAP estimation can also allow us to incorporate some prior information about the signal or image. That is, the probability of measurements y can be augmented by some additional prior information on the signal x. One prior that has good performance for signal reconstruction is sparsity. The sparsity prior assumes that the input signal x is sparse in a certain dictionary. A dictionary in ℝ^N is a set {φk}, k=1, . . . , L, of unit-norm vectors in ℝ^N, i.e., ∥φk∥2=1 for all k. Elements of the dictionary are called atoms. The dictionary is complete when span{φk}=ℝ^N, i.e., when any vector in ℝ^N can be represented as a linear combination of atoms. When span{φk}=ℝ^N and the atoms are linearly dependent, the dictionary is overcomplete. A matrix whose columns are atoms is called the dictionary matrix Φ. For simplicity, we will refer to the matrix Φ as the dictionary. Given these definitions, the sparsity prior assumes that the input signal x can be expressed as
x=Φc, (2)
where the vector of coefficients c has a small number of non-zero entries. That is, c is sparse. If the signal dimension of x is N×1 (or an image of size √N×√N in a vectorized form), and the dictionary Φ has size N×L, the vector of coefficients c has size L×1 with K non-zero elements, where K<<N. L can be equal to N (complete basis set), but it is usually assumed that L>N (overcomplete basis set).
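As a small illustration of Eqn. 2 (hypothetical sizes, for illustration only), a K-sparse coefficient vector synthesizes the signal from only a few atoms of an overcomplete dictionary:

# Sketch of the sparsity prior of Eqn. 2: x = Phi @ c with K << N non-zero coefficients.
# Sizes are hypothetical, chosen only for illustration.
import numpy as np

rng = np.random.default_rng(1)
N, L, K = 64, 128, 4                          # overcomplete dictionary: L > N
Phi = rng.standard_normal((N, L))
Phi /= np.linalg.norm(Phi, axis=0)            # unit-norm atoms (columns)

c = np.zeros(L)
support = rng.choice(L, K, replace=False)
c[support] = rng.standard_normal(K)           # K-sparse coefficient vector
x = Phi @ c                                   # signal synthesized from a few atoms
print(np.count_nonzero(c), x.shape)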
In
When certain conditions on A and Φ are met, it has been shown that a sparse c (and hence x) can be reconstructed using convex optimization from a small number of measurements. This number can be well below the Nyquist limit and it depends on the sparsity of c. Generally, the number of linearly independent measurements required (i.e., the rank of A) depends on the sparsity of c: if the number of measurements is small, then c should be sparser. If the coefficients are not exactly sparse (not exactly zero), then the reconstructed signal may incur an approximation error. The algorithm can also work if the vector c is not sparse, provided the rank of the matrix A is sufficiently large and the noise is not too large. One of the more difficult conditions to meet is that A and Φ should be mutually incoherent. That is, the coherence between them must be small to be assured of good reconstruction. The coherence between A and Φ is defined as:
μ(A, Φ) = max_(i,j) |⟨a_i, φ_j⟩|, (3)
where a_i is the i-th row of A, φ_j is the j-th column of Φ and ⟨•,•⟩ denotes the inner product. If Φ is selected a priori without knowledge of A, there is no guarantee of incoherence. One can ignore the condition and still perform signal estimation, hoping for the best, but the reconstruction can then fail in some cases. The coherence μ(A, Φ) ranges between 0 and 1. For mutually orthogonal A and Φ, the coherence μ=0. Usually, for normalized A and Φ, the coherence μ(A, Φ) preferably is below 0.1. In the examples below, μ is approximately 0.03.
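For illustration, the coherence of Eqn. 3 can be computed as sketched below (hypothetical sizes; the rows of A and columns of Φ are normalized first so that μ lies between 0 and 1). The last line also evaluates the Frobenius norm of AΦ, the differentiable surrogate that is penalized later in the learning step:

# Sketch of the coherence measure of Eqn. 3 and of the Frobenius surrogate used
# later in the learning step. Sizes are hypothetical.
import numpy as np

def coherence(A, Phi):
    An = A / np.linalg.norm(A, axis=1, keepdims=True)       # normalize rows of A
    Pn = Phi / np.linalg.norm(Phi, axis=0, keepdims=True)   # normalize columns of Phi
    G = An @ Pn                                             # G[i, j] = <a_i, phi_j>
    return np.abs(G).max()                                  # mu = max_{i,j} |<a_i, phi_j>|

rng = np.random.default_rng(2)
A = rng.standard_normal((40, 100))
Phi = rng.standard_normal((100, 200))
print(coherence(A, Phi))                      # preferably below ~0.1 for reliable reconstruction
print(np.linalg.norm(A @ Phi, 'fro'))         # differentiable surrogate penalized during learning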
Referring back to
The coefficients c can be estimated from y as follows. Begin with the simplified case where A=I. In this case, the measured outputs are just noisy versions of the input signals:
y = x + η = Φc + η. (4)
If we are given a dictionary Φ and a vector of measurements y, the task is to estimate a sparse vector of coefficients ĉ (and hence also x̂). A MAP estimate in this case is given by:
ĉ = arg max_c P(c|y) = arg max_c P(y|c) P(c), (5)
where the second statement comes from Bayes rule. If we assume that the sensor noise η has Gaussian distribution N(0, σ), then:
P(y|c) ∝ exp(−∥y−Φc∥₂²/(2σ²)), (6)
where ∥•∥₂ denotes the l₂ norm or Euclidean norm and σ² is the noise variance.
The assumption that the vector of coefficients is sparse is introduced by modeling the prior distribution on c as a Laplace distribution (0, α) that is tightly peaked at zero:
P(c) ∝ exp(−∥c∥₁/α), (7)
where ∥•∥₁ denotes the l₁ vector norm and α is the scale parameter. By substituting Eqns. 6 and 7 into Eqn. 5, the MAP estimation problem now becomes:
ĉ = arg max_c [−∥y−Φc∥₂²/(2σ²) − ∥c∥₁/α], (8)
or equivalently:
ĉ = arg min_c ∥y−Φc∥₂² + λ∥c∥₁, (9)
where λ=2σ²/α. This optimization problem is convex and can be solved efficiently using interior point or gradient methods. Moreover, there are guaranteed error bounds for the coefficient estimation, which depend on the coherence of the matrix Φ with itself.
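For concreteness, one standard way to solve the convex problem of Eqn. 9 is iterative soft thresholding (a proximal gradient method). The Python sketch below is illustrative only; the function names, step size, and iteration count are assumptions rather than part of the described implementation. As noted in the final comment, the same routine handles Eqn. 10 by treating AΦ as the effective dictionary:

# Minimal ISTA (proximal gradient) sketch for Eqn. 9:
#   c_hat = argmin_c ||y - Phi c||_2^2 + lambda ||c||_1
# Step size and iteration count are illustrative assumptions.
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(Phi, y, lam, n_iter=500):
    L_step = 2.0 * np.linalg.norm(Phi, 2) ** 2           # Lipschitz constant of the gradient
    c = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = -2.0 * Phi.T @ (y - Phi @ c)               # gradient of the quadratic term
        c = soft_threshold(c - grad / L_step, lam / L_step)
    return c

# Usage: c_hat = ista(Phi, y, lam); x_hat = Phi @ c_hat
# For Eqn. 10 (A != I), call ista(A @ Phi, y, lam) and reconstruct x_hat = Phi @ c_hat.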
Eqns. 4-9 are just one example. Other formulations can be derived for other cases. For example, the noise does not have to be Gaussian; it can be, for example, Laplacian or Poisson. The distribution of the sparse coefficients can also be different, but it preferably is kurtotic (peaked at zero). One example is the Cauchy distribution, which also leads to a convex objective function. Thus, Eqns. 8 and 9 can take forms different from those shown above.
In the case where A≠I, the MAP estimation can be posed as:
ĉ = arg min_c ∥y−AΦc∥₂² + λ∥c∥₁, (10)
and solved in a similar manner as described above. However, the guarantees for the recovery of the correct c (with a small error) are more complicated. Namely, to find a good estimate for c, the matrices A and Φ should have low mutual coherence. This condition influences the dictionary learning process because we need to find a dictionary Φ that not only describes the data well but also has low coherence with the system response matrix A.
In the specific example of
In more detail, the inference step 224 can be implemented as
Ĉ = arg min_C ∥Y−AΦC∥F² + λ∥C∥₁. (11)
Here, Y is a matrix whose columns are the outputs corresponding to the samples X. If the training set has Q examples, then the second dimension of Y and C would be Q if all of the examples were trained at once. However, since the amount of training data typically is large, at each iteration, we can use a different subset of B randomly chosen examples. Thus, the sizes of Y and C at each iteration are M×B and L×B, respectively. The inference step 224 finds the most probable solution for C under a sparse prior, given the set of outputs Y, the system response matrix A and the current dictionary estimate Φ. The objective function of Eqn. 11 is convex and it can be optimized using gradient methods. The derivative of the objective is simple:
∂/∂C [∥Y−AΦC∥F² + λ∥C∥₁] = −2(AΦ)^T(Y−AΦC) + λ sign(C). (12)
The sign function at zero is defined to be equal to zero.
The learning step 226 can be implemented as
Φ̂ = arg min_Φ ∥X−ΦĈ∥F² + δ∥AΦ∥F², (13)
where X is an N×B matrix whose columns are examples from the training set and δ is a trade-off parameter.
The first term in this objective function measures the error between the samples X and their representation ΦĈ. It differs from the term in Eqn. 12 because it evaluates the error of the representation of X using Φ. The intuition behind this is that we want to learn a dictionary which does a good job of approximating the examples in the training set using coefficients obtained in the inference step.
The second term in the objective function is the penalty on the coherence between A and Φ. If we take a closer look at the definition of coherence in Eqn. 3, we can see that it is equal to μ=∥AΦ∥∞, i.e., the infinity norm (maximum absolute entry) of AΦ. Since the infinity norm is not differentiable everywhere, we approximate it in this implementation with the squared Frobenius norm ∥AΦ∥F², i.e., the l₂ matrix norm. This norm is convex and differentiable everywhere, with a simple derivative that is fast to calculate. Alternatively, we could use a higher-order norm (greater than 2) that would better approximate the infinity norm, but this would increase the computational complexity. Thus, the Frobenius norm represents a good trade-off between performance and complexity.
The objective function in Eqn. 13 is convex and can be minimized using gradient methods. It has a simple derivative over Φ:
∂/∂Φ [∥X−ΦĈ∥F² + δ∥AΦ∥F²] = −2(X−ΦĈ)Ĉ^T + 2δA^T AΦ. (14)
Since it is based solely on array operations, the derivative calculation is highly parallelizable and convenient for a GPU implementation.
The pseudo-code for this example implementation proceeds as follows. Given the M×N system response A, the dictionary Φ (of size N×L) is initialized with uniform random values in (−0.5, 0.5). At each iteration, B training indices are drawn uniformly at random from (0, Q), Gaussian noise (0, σ) is added to the simulated measurements, and the inference step of Eqn. 11 and the learning step of Eqn. 13 are applied in turn. This repeats until the dictionary converges.
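An illustrative Python rendering of this alternating procedure (Eqns. 11-14) is sketched below. It uses plain gradient descent with Lipschitz-based step sizes; the function name learn_dictionary, the default parameter values, the inner iteration count, and the column re-normalization of Φ are assumptions for illustration and are not the exact pseudo-code of the implementation:

# Sketch of the alternating dictionary learning of Eqns. 11-14, assuming plain
# gradient descent with illustrative step sizes (not the exact pseudo-code).
import numpy as np

def sign0(v):
    return np.sign(v)                     # np.sign(0) == 0, matching the convention in the text

def learn_dictionary(A, X_train, L, B, lam=0.1, delta=0.1, sigma=1e-4,
                     n_outer=300, n_inner=50, seed=0):
    rng = np.random.default_rng(seed)
    M, N = A.shape
    Q = X_train.shape[1]                  # number of training examples (columns of X_train)
    Phi = rng.uniform(-0.5, 0.5, size=(N, L))
    Phi /= np.linalg.norm(Phi, axis=0)    # unit-norm atoms

    for _ in range(n_outer):
        idx = rng.integers(0, Q, size=B)                    # random batch of B examples
        X = X_train[:, idx]
        Y = A @ X + sigma * rng.standard_normal((M, B))     # simulated noisy measurements

        # Inference step (Eqns. 11-12): given Phi, estimate sparse C by gradient descent.
        C = np.zeros((L, B))
        AP = A @ Phi
        step_c = 1.0 / (2.0 * np.linalg.norm(AP, 2) ** 2)
        for _ in range(n_inner):
            grad_C = -2.0 * AP.T @ (Y - AP @ C) + lam * sign0(C)
            C -= step_c * grad_C

        # Learning step (Eqns. 13-14): given C, update Phi; the delta term keeps
        # the coherence between A and Phi low.
        step_phi = 1.0 / (2.0 * (np.linalg.norm(C, 2) ** 2 + delta * np.linalg.norm(A, 2) ** 2))
        grad_Phi = -2.0 * (X - Phi @ C) @ C.T + 2.0 * delta * A.T @ (A @ Phi)
        Phi -= step_phi * grad_Phi
        Phi /= np.linalg.norm(Phi, axis=0)                  # keep atoms at unit norm

    return Phi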
Once we have learned the dictionary Φ that is incoherent with the system response matrix A, we can use it to reconstruct the input signal x from the corresponding output measurements y, for example using the approach described previously.
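A minimal sketch of such a reconstruction is given below: it solves Eqn. 10 for the coefficients of a single measurement vector y by proximal gradient descent and then synthesizes the estimate x̂=Φĉ. The solver settings are illustrative assumptions:

# Sketch of reconstruction with a learned dictionary: solve Eqn. 10 for the
# coefficients of one measurement y, then synthesize x_hat = Phi @ c_hat.
# The proximal-gradient solver and its settings are illustrative assumptions.
import numpy as np

def reconstruct(A, Phi, y, lam=0.1, n_iter=500):
    D = A @ Phi                                    # effective dictionary seen by the measurements
    step = 1.0 / (2.0 * np.linalg.norm(D, 2) ** 2)
    c = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        z = c - step * (-2.0 * D.T @ (y - D @ c))  # gradient step on the quadratic term
        c = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold (l1 prox)
    return Phi @ c                                 # MAP estimate of the input x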
The approach described above can be used with many different systems and applications. A particular example of a plenoptic imaging system will be described below. In a plenoptic system, light traversing the system is influenced in ways that are more complex than in conventional imaging. As a result, the system response matrix also becomes more complicated compared to a simple convolution with a point spread function, as might be the case for a conventional imaging system. In the following example, the input signal x represents the original object viewed by the plenoptic system, the output signal y is the plenoptic image captured by the system, A represents the system response matrix and η represents system noise. The reconstruction problem is to find x, given y and A.
The spatial coordinates (ξ,η) will be used at the object plane (the input to the system) and (t, w) at the sensor plane (the output of the system). In
Ignoring the filter module 325 for the moment, in imaging subsystem 1, the object 350 is imaged by the primary lens 310 to produce an image that will be referred to as the “primary image.” This primary lens 310 may be a camera imaging lens, microscope objective lens or any other such imaging system. The lenslet array 320 is placed approximately at the location of the primary image. Each lenslet then images the pupil of the primary lens to the sensor plane. This is imaging subsystem 2, which partially overlaps with imaging subsystem 1. The image created at the sensor array 330 will be referred to as the “plenoptic image” in order to avoid confusion with the “primary image.” The plenoptic image can be divided into an array of subimages, corresponding to each of the lenslets. Note, however, that the subimages are images of the pupil of imaging subsystem 1, and not of the object 350. In
The plenoptic system can be modeled as a linear system:
If(T,W) = Σ_(M,N) PIF(T,W; M,N) Io(M,N), (15A)
where If(T,W) is the intensity at sensor element T,W; Io(M,N) is the intensity from object element M,N; and PIF(T,W; M,N) is the plenoptic imaging system response, which we refer to as the pupil image function or PIF. PIF(T,W; M,N) is the intensity at sensor element T,W produced by object element M,N. T,W is the discretized version of the sensor coordinates (t,w) and M,N is the discretized version of the object coordinates (ξ,η).
Eqn. 15A can be written more compactly as
[Ivf]=[PIFv][Ivo] (15B)
where the additional “v” indicates that these are based on vectorized images. That is, the two-dimensional plenoptic image If is represented as a one-dimensional vector Ivf. Similarly, the two-dimensional object Io is represented as a one-dimensional vector Ivo. Correspondingly, the four-dimensional PIF is represented as a two-dimensional system response matrix PIFv. After a noise term is added, Eqn. 15B takes the same form as Eqn. 1: Ivo is the input vector x, Ivf is the output vector y, and PIFv is the system response matrix A. The PIF can be calculated and approximated in different ways, for example using geometrical optics or based on wave propagation. For examples, see the Appendices in U.S. patent application Ser. No. 13/398,815, “Spatial Reconstruction of Plenoptic Images,” which is incorporated by reference herein.
In the following example, the PIF was generated using a wave-propagation analysis, for the case when the object is in focus at the microlens array plane. The training set used for dictionary learning was a set of video frames from a BBC documentary on wildlife, picturing a group of zebras running in the wild. Some of the video frames are shown in
The various vectors and matrices have the following sizes: x is N×1, X is N×B, Φ is N×L, c is L×1, C is L×B, y is M×1, Y is M×B, and A is M×N. N is the total number of samples in a digitized version of the object. For a one-dimensional object signal whose passband is band-limited with highest frequency f2, e.g., f2=20, the discretized version of the object sampled at the Nyquist rate has 2×20=40 samples in one dimension. For a two-dimensional object, the number of samples at the Nyquist rate is N=40×40=1600. M is the number of sensor samples of the plenoptic image of this object. M could be as low as N (maintaining the same original Nyquist sampling). Alternatively, the rank of A can be smaller than N even though M is larger (e.g., because of the dark pixels between lenslets and the pixels imaging the same object points). In the following example, we have chosen M=52×52=2704, where 52 pixels in one dimension at the sensor corresponds to the original sampling of 40 plus a boundary extension of 6 pixels on either side of a superpixel. We have also chosen L=1600. In each iteration of our learning algorithm, we selected a batch of B=4L=6400 frame blocks of size 40×40. Note that the rank of A is approximately 700, that is, rank of A<N/2. Therefore, even though M>N, we still have an under-determined system (the number of linearly independent equations is less than half the number of unknowns). Each block was reshaped into a vector and represents a column of X. We then simulated the plenoptic imaging process using the frame blocks as objects in front of the plenoptic system, placed at the focal plane. Note that this placement does not reduce the generality of the method, since the PIF matrix can be defined for any plane or for the whole volume. Therefore, the simulated data at the sensor plane was:
Y=PIF•X+η (16)
The noise η is white Gaussian noise of SNR=80 dB.
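The simulation of Eqn. 16 can be sketched as follows, using the sizes stated above (N=1600, M=2704, L=1600, B=4L). The PIF below is a random stand-in, since the actual matrix is computed by a wave-propagation analysis; the noise level is derived from the stated SNR of 80 dB:

# Sketch of the simulated training data of Eqn. 16 with the sizes used in the text.
# The PIF here is a random stand-in for the wave-propagation result.
import numpy as np

rng = np.random.default_rng(3)
N, M, L = 40 * 40, 52 * 52, 1600
B = 4 * L
PIF = rng.standard_normal((M, N))                 # placeholder for the true PIF matrix

blocks = rng.random((40, 40, B))                  # stand-in for 40x40 frame blocks
X = blocks.reshape(N, B)                          # each block reshaped into a column of X

snr_db = 80.0
signal_power = np.mean((PIF @ X) ** 2)
noise_std = np.sqrt(signal_power / 10 ** (snr_db / 10))
Y = PIF @ X + noise_std * rng.standard_normal((M, B))   # Eqn. 16 with white Gaussian noise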
The dictionary Φ was developed using the iterative approach described above. The dictionary estimate Φ was initialized randomly, and the iteration was stopped after approximately 300 iterations, when there was no further significant improvement in reconstruction quality.
a-d and 6a-d are images illustrating dictionary-based reconstruction for a doll object and for a Lena object, respectively. The doll and Lena images were taken as objects in front of the plenoptic system. For each 40×40 block of the original object, we simulated the sensor data as in Eqn. 16 with added white Gaussian noise of SNR=80 dB. Using the inference step given by Eqn. 11, we estimated the sparse coefficient vectors Ĉ, and the reconstructed blocks were then obtained as X̂=ΦĈ.
c and 6c show reconstructed objects using the dictionary-based approach described above. For comparison,
This is just one example application; others will be apparent. The example above did not make use of a filter module 325. One application for plenoptic systems is spectral filtering, where the filter module 325 includes filters for different wavelength bands. This system can be modeled by
[Ivf]=[[PIFv1][PIFv2] . . . [PIFvK]][[Ivo1]T[Ivo2]T . . . [IvoK]T]T (17)
Here, the plenoptic image Ivf has the same dimensions as before. However, the object Ivo is made up of K different components Ivok. For example, Ivo1 might be the component of the object at wavelength band 1, Ivo2 might be the component of the object at wavelength band 2, and so on. The corresponding “component PIFs” could include PIFv1, PIFv2 and so on, where the component PIFv1 represents the contribution of the component Ivo1 to the captured plenoptic image, component PIFv2 represents the contribution of the component Ivo2 to the captured plenoptic image, etc. The basic equation [Ivf]=[PIFv][Ivo] still holds, except that now the matrix [PIFv]=[[PIFv1] . . . [PIFvK]] and the vector [Ivo]=[[Ivo1]T [Ivo2]T . . . [IvoK]T]T. This makes the system response matrix A even more rank deficient than before, making it even harder to reconstruct. The components could also be based on other characteristics: polarization, attenuation, object illumination or depth, for example.
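A short sketch of this block structure (hypothetical sizes and random placeholder data) shows how the component PIFs are stacked side by side and the object components are stacked into one long vector, so that the single-matrix form still applies:

# Sketch of the spectral-filtering model of Eqn. 17: component PIFs stacked side by
# side, object components stacked into one vector. Sizes are hypothetical.
import numpy as np

rng = np.random.default_rng(4)
M, N, K = 2704, 1600, 3                                 # K wavelength bands
PIF_components = [rng.standard_normal((M, N)) for _ in range(K)]
object_components = [rng.random(N) for _ in range(K)]   # Ivo1, Ivo2, ..., IvoK

PIFv = np.hstack(PIF_components)                # M x (K*N) system response matrix
Ivo = np.concatenate(object_components)         # (K*N)-vector of stacked object components
Ivf = PIFv @ Ivo                                # plenoptic image keeps its original size M
print(PIFv.shape, Ivo.shape, Ivf.shape)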
Another example application is ultrasound tomography. In this example, the input x is an object to be imaged by the system, and the output y is ultrasound samples taken by the ultrasound tomography system. Yet another example application is MRI imaging. The input x is an object to be imaged by the system, and the output y is MRI samples taken by the MRI imaging system. Yet another example is ground penetrating radar tomography.
Although the detailed description contains many specifics, these should not be construed as limiting the scope of the invention but merely as illustrating different examples and aspects of the invention. It should be appreciated that the scope of the invention includes other embodiments not discussed in detail above. Various other modifications, changes and variations which will be apparent to those skilled in the art may be made in the arrangement, operation and details of the method and apparatus of the present invention disclosed herein without departing from the spirit and scope of the invention as defined in the appended claims. Therefore, the scope of the invention should be determined by the appended claims and their legal equivalents.
In alternate embodiments, the invention is implemented in computer hardware, firmware, software, and/or combinations thereof. Apparatus of the invention can be implemented in a computer program product tangibly embodied in a machine-readable storage device for execution by a programmable processor; and method steps of the invention can be performed by a programmable processor executing a program of instructions to perform functions of the invention by operating on input data and generating output. The invention can be implemented advantageously in one or more computer programs that are executable on a programmable system including at least one programmable processor coupled to receive data and instructions from, and to transmit data and instructions to, a data storage system, at least one input device, and at least one output device. Each computer program can be implemented in a high-level procedural or object-oriented programming language, or in assembly or machine language if desired; and in any case, the language can be a compiled or interpreted language. Suitable processors include, by way of example, both general and special purpose microprocessors. Generally, a processor will receive instructions and data from a read-only memory and/or a random access memory. Generally, a computer will include one or more mass storage devices for storing data files; such devices include magnetic disks, such as internal hard disks and removable disks; magneto-optical disks; and optical disks. Storage devices suitable for tangibly embodying computer program instructions and data include all forms of non-volatile memory, including by way of example semiconductor memory devices, such as EPROM, EEPROM, and flash memory devices; magnetic disks such as internal hard disks and removable disks; magneto-optical disks; and CD-ROM disks. Any of the foregoing can be supplemented by, or incorporated in, ASICs (application-specific integrated circuits) and other forms of hardware.
The term “module” is not meant to be limited to a specific physical form. Depending on the specific application, modules can be implemented as hardware, firmware, software, and/or combinations of these.