This disclosure describes a method for combining multiple low-resolution images or signals into a single high-resolution image or signal. More particularly, this disclosure relates to a Bayesian technique for forming a high-resolution image or signal from a plurality of low-resolution images or signals.
There are a number of prior techniques for deriving a high-resolution image from a plurality of low-resolution images.
In one technique, each low-resolution image is aligned to a reference image using an alignment algorithm such as Lucas-Kanade [1-5]. The aligned images are then combined using stacking (robust sum), Bayesian inference, or learned statistics. There are two primary problems with this approach. (1) It attempts to achieve sub-pixel alignment accuracy using only the low-resolution images themselves. (2) This approach is not model-based, so it cannot accommodate barrel/pincushion distortion, diffraction or other effects.
In another technique, both the super-resolved image and the alignment parameters are constructed through optimization of the likelihood of the measured data (y) given the alignment parameters (A) and hypothesized super-resolved image (x). That is, the algorithm maximizes P(y|A, x). Some of these algorithms can optionally use a prior on the alignment parameters or the hypothesis (maximizing either P(y, x|A) or P(y, x, A)). However, it is difficult (and frequently unstable) to simultaneously align and resolve the images.
An advantage of the model-based approaches (optimization, Tipping-Bishop [6] and our own) is that the formulation is very general. For example, the set of “alignment parameters” (A) may capture any number of transformation parameters, for example, degree of pin-cushion distortion, degree of barrel distortion, shift, rotation, and degree of blurring, with kernels including Gaussian or other diffraction kernels.
US Patent Application Publication US2004/0170340 A1 of Tipping and Bishop refers to a Bayesian technique for computing a high resolution image from multiple low resolution images. The algorithm in the Tipping-Bishop application marginalizes the super-resolved image out of P(y, x|A) allowing one to directly optimize the likelihood for the alignment parameters followed by a super-resolution step. That is, the algorithm allows direct computation of P(y|A), allowing an optimization algorithm to directly optimize the alignment parameters. In the Tipping-Bishop application, these alignment parameters included shift, rotation and width of the point spread function (A=<s, θ, γ>) for the optical system (degree of blur). The problem with the approach of the Tipping-Bishop application is that it is mathematically incorrect. In the derivation of the approach they made a major algebra or formulation mistake with the result that the resulting alignment likelihood P(y|A) is incorrect. In practice, the algorithm frequently diverges when optimizing some imaging parameters, particularly the point spread function.
We have derived a corrected likelihood function that is more accurate, has a significantly simpler functional form and works extremely well, displaying none of the instability exhibited by the Tipping-Bishop approach. Multiple low-resolution images or signals are accurately aligned to sub-pixel resolution and stacked to form a single high-resolution image or signal. For example, ten low-resolution images are used to form a single high-resolution image with 4× to 9× resolution, that is, 4 to 9 times the pixels or 2 to 3 times the linear resolution of the low-resolution images.
The approach is statistical in that the high-resolution image is obtained through an inference procedure. Inference is based on exploiting the knowledge derived from the low-resolution imagery and the models of the subsampling process (i.e. point-spread-function) and registration of the imagery. We present the details of the procedure and a correct derivation in the next section. The optimization is performed in one or more subsets or portions of the low resolution imagery due to the significant computational requirements of the procedure.
The processor 24 contains a block 28 which is responsive to the low resolution images 20a, 20b, and 20c to select one or more small characteristic regions or portions of the low resolution images 20a, 20b, and 20c for processing in accordance with this example of the invention. These selected regions constitute areas containing a great deal of detail or high frequency content in the low resolution images 20a, 20b, and 20c. Once these regions have been identified and selected, block 30 in the processor 24 optimizes both the point spread function and registration parameters for the selected regions. Block 32 then registers and deconvolves the full size image using the parameters from block 30 and generates the high resolution image 26.
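The selection of regions with a great deal of detail or high frequency content can be illustrated with a simple score. The following is only a sketch: the gradient-energy score, the non-overlapping tiling, and the name `select_detail_patches` are assumptions made for illustration, not the selection rule used by block 28.

```python
import numpy as np

def select_detail_patches(image, patch=9, top=3):
    """Return (row, col) corners of the `top` tiles with the most gradient energy.

    Hypothetical helper: gradient energy is only one possible proxy for
    "high frequency content"; the source does not specify the measure.
    """
    img = np.asarray(image, dtype=float)
    gy, gx = np.gradient(img)            # finite differences along rows, then columns
    energy = gx ** 2 + gy ** 2
    scores = []
    # Score non-overlapping patch x patch tiles.
    for r in range(0, img.shape[0] - patch + 1, patch):
        for c in range(0, img.shape[1] - patch + 1, patch):
            scores.append((energy[r:r + patch, c:c + patch].sum(), r, c))
    scores.sort(reverse=True)            # highest energy first
    return [(r, c) for _, r, c in scores[:top]]
```

The same corners would be used to cut the corresponding patch from every low resolution image, so that the optimization sees the same scene content in each frame.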
After the regions of interest have been identified in block 36, block 38 estimates coarse registration parameters which in this application of the method can include translational and rotational orientation of the low resolution images with respect to a predetermined frame of reference. These translations and rotations are produced by slight motion of the camera 18 occurring in the time between capture of each successive low resolution image 20a, 20b, and 20c by the camera 18.
The registration parameters and the point spread function (PSF) parameters are optimized in block 40 using a marginal likelihood function specified by block 42. The details of the marginal likelihood function are described below. Block 40 produces a best estimate of the registration and PSF parameters for the selected regions of interest. These parameters are then used in block 44 to compute the mean of the posterior distribution for the full image which thereby defines the high resolution image.
To implement the Bayesian super-resolution approach in accordance with this invention, we must fuse K ‘low-resolution’ images, each containing M pixels, in order to assemble a single super-resolved image with N pixels, where N>M. Here, M is the product of the height and width of the kth low-resolution image in pixels, and N is the product of the height and width of the super-resolved image. This patent application describes the alignment and super-resolution of rectangular images. In fact, the algorithm may be used for signals of arbitrary geometry, including 2D signals with non-rectangular sampling grids (such as a radar signal), 2D super-resolved images wrapped over a 3D surface, or recovery of a high-resolution 1D signal (such as audio) from multiple sources (microphones). Furthermore, notice that M can be constant across all samples (as when images are collected from a single camera) or different (i.e. M(k)), as when images are collected from different cameras and fused using the proposed method.
We must derive a model of the generation of the images (i.e. low-res samples) in order to obtain an observation model. In particular, we know that the camera's optics and sampling process cause a simultaneous sub-sampling and blurring of the scene. Hence, the observation model can be captured as:
$$y^{(k)} = W^{(k)} x + \varepsilon$$
$$\varepsilon_j \sim \mathcal{N}(0, \beta^{-1})$$
Here, y(k) is the kth low-res image, x is the full-resolution scene, W(k) is a transform that captures the sub-sampling and blurring of the scene for the kth low-res image (i.e. sampling filter), and ε represents Gaussian noise in the process. Specifically, W(k) captures the focus, point spread function, and spatial transformation between the high resolution image and the kth low resolution image. This relationship between a low resolution image y and the high resolution image x is shown by the model of image generation 46 in
As W(k) captures the map between the high-resolution image and each of the K low-resolution samples, its dimensions are M×N. Given a Gaussian model for the imaging process, the W(k) values must be normalized to the [0,1] range to conserve energy:
Here, uj(k) are the hypothesized centers for the sampling array, given the alignment parameters. The vi are the centers for the super-resolved image or signal. Both vi and uj(k) are expressed in the same global coordinate system. In this example, the point spread function is a Gaussian blur kernel with a point spread variance of γ. In practice, we can use any linear transformation; for example, we can use the Biot-Savart Law to determine a super-resolved image of a current sheet given magnetic field measurements.
and sk is the translation.
Each vi gives the Cartesian coordinates, in the super-resolved image space, of the center of grid cell i (i.e. the center of pixel i). Each vj gives the Cartesian coordinates of the center of super-resolved grid cell j for each low-resolution image. Each uj(k) is the location of vj on the super-resolved image after rotation and shift. In practice, we can use other geometric transformations to determine uj(k), including general affine transformations, perspective transformations, etc.
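The construction of W(k) described above can be sketched in code. The specific forms used here are assumptions: each low-resolution center is rotated about the centroid of the super-resolved grid and then shifted, the unnormalized weight is exp(-||vi - uj(k)||² / γ²), and each row is divided by its sum so that the weights lie in [0,1]. The function names `pixel_centers` and `build_W` are hypothetical.

```python
import numpy as np

def pixel_centers(height, width, scale=1.0):
    """(x, y) centers of a height x width grid, in super-resolved pixel units."""
    rows, cols = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    return (np.stack([cols.ravel(), rows.ravel()], axis=1) + 0.5) * scale

def build_W(v_hi, v_lo, shift, theta, gamma):
    """M x N sampling matrix for one low-resolution image.

    Assumed model (not confirmed by the source): rotate about the grid
    centroid, translate by `shift`, Gaussian weights, rows normalized to 1.
    """
    center = v_hi.mean(axis=0)
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    # u_j: low-resolution sample centers mapped into the global frame.
    u = (v_lo - center) @ R.T + center + np.asarray(shift, dtype=float)
    # Squared distance between every u_j (rows) and every v_i (columns).
    d2 = ((u[:, None, :] - v_hi[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-d2 / gamma ** 2)
    return W / W.sum(axis=1, keepdims=True)

# Example: a 4x4 low-resolution grid sampling an 8x8 super-resolved grid.
v_hi = pixel_centers(8, 8)
v_lo = pixel_centers(4, 4, scale=2.0)
W = build_W(v_hi, v_lo, shift=(0.3, -0.2), theta=0.01, gamma=2.0)
```

Other geometric transformations (affine, perspective) would only change how `u` is computed; the weighting and normalization steps would stay the same.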
Finally, the prior is defined by a covariance matrix Zx, which is N×N.
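As an illustration of an N×N prior covariance, the sketch below uses a squared-exponential form in which the covariance between two super-resolved pixels decays with the distance between their centers. This form, the `amplitude` parameter, and the name `prior_covariance` are assumptions; the text states only that Zx is N×N and, later, that it has an associated width r.

```python
import numpy as np

def prior_covariance(v_hi, amplitude=1.0, r=1.0):
    """Assumed prior: Z[i, j] = amplitude * exp(-||v_i - v_j||^2 / r^2)."""
    d2 = ((v_hi[:, None, :] - v_hi[None, :, :]) ** 2).sum(axis=2)
    return amplitude * np.exp(-d2 / r ** 2)
```

A small r makes neighboring pixels nearly independent, while a larger r favors smoother super-resolved images.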
Given the shift sk and rotation θk for each image and the γ for the PSF, we can compute the marginal likelihood:
μ is the mean of the super-resolved image pixels and Σ is their covariance.
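For reference, the appendix identities Σ−1 ≡ Z−1 + βWTW and μ = βΣWTy can be written out for K separate images. The following is a reconstruction under the assumption that the stacked matrix W is the vertical concatenation of the W(k) and that Z in the appendix is the prior covariance Zx; the numbering is illustrative.

```latex
\begin{align}
\Sigma &= \Bigl[ Z_x^{-1} + \beta \sum_{k=1}^{K} W^{(k)T} W^{(k)} \Bigr]^{-1} \tag{1} \\
\mu &= \beta \, \Sigma \sum_{k=1}^{K} W^{(k)T} y^{(k)} \tag{2}
\end{align}
```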
Let P=Σ−1. In order to find the registration and deblurring parameters, we perform an optimization procedure on a subset f(sk, θk, γ) of the marginal likelihood function (see appendix) to obtain:
The optimization only needs to be performed on one or more small regions of the image. For example, a 9×9 patch at the center of the image or spaced from the center of the image may be optimized. Once the optimization is performed, one can compute the full image μ using (1) and (2) above. Operation of the processor 24 in implementing Equation (2) to derive the high resolution image is illustrated by high resolution image computation block 48 in
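The computation of the posterior mean can be sketched as follows, using the appendix identities Σ−1 = Z−1 + βWTW and μ = βΣWTy accumulated over the K images. The function name `posterior` and its argument layout are hypothetical, and the sketch solves a linear system rather than forming Σ explicitly, which is a design choice of this example rather than something stated in the source.

```python
import numpy as np

def posterior(Ws, ys, Z, beta):
    """Posterior mean mu and precision P of the super-resolved image.

    Ws: list of K sampling matrices (each M x N)
    ys: list of K flattened low-resolution images (each length M)
    Z:  N x N prior covariance;  beta: noise precision.
    """
    P = np.linalg.inv(Z)                 # prior precision Z^-1
    b = np.zeros(Z.shape[0])
    for W, y in zip(Ws, ys):
        P = P + beta * W.T @ W           # accumulate beta * W^T W
        b = b + beta * W.T @ y           # accumulate beta * W^T y
    mu = np.linalg.solve(P, b)           # mu = Sigma * b without inverting P
    return mu, P
```

Reshaping `mu` to the height and width of the super-resolved grid gives the high resolution image for the patch or full frame being processed.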
The processor 24 generates the improved marginal likelihood function ƒ of this invention as shown in
The processor 24 includes registers or storage elements that hold system constants or inputs from previous stages that are used in the maximum likelihood calculation. Those storage elements include a storage element 50 containing the coordinates v, for example, the Cartesian coordinates, of the N pixels making up the high resolution image. They also include a storage element 52 containing the standard deviation r of the covariance matrix Z and a storage element 54 containing the parameter β of the modeled noise for the imaging process (the inverse of the noise variance, per the noise model above).
The processor 24 also contains storage elements containing current values of optimization parameters. Storage element 56 contains two dimensional shift parameters sk for each of the K low resolution images being analyzed. Storage element 58 contains rotation parameters θk of the K low resolution images being analyzed. Storage element 60 contains the standard deviation γ for the distribution model of the point spread function (PSF). The processor 24 receives a series of inputs from the camera 18 composed of K low resolution images each of which is comprised of M pixels. This input is stored in storage element 62 in
Transpose operation 78 is responsive to the contents of storage element 76 to produce the transpose of each of the K matrices in storage element 76. Matrix product operation 80 multiplies the transpose of the W(k) matrices from operation 78 by the low resolution image information from storage element 62. The result of the operation 80 is input to a vector sum operation 82 in
A multiplier 88 multiplies the summation produced by operation 86 by the content of storage element 54 to produce one of the bracketed terms in equation (1). The other term is produced as follows. A vector subtract operation 90 in
A matrix sum operation 98 in
The other term on the right side of equation (3) is generated as follows. An inverse operation 106 in
The outputs of operation 104 and operation 118 in
A sample result of super-resolving the image utilizing Tipping and Bishop's algorithm and utilizing this invention is shown in
The original images (10 frames) were captured using a handheld digital camera. Each frame was captured so that it underwent a small rotation, as a pan and/or roll of the camera. The resulting image sequence was used as the input to the algorithm described in the previous section. In particular, the color images were converted to grayscale and analyzed for information content. In the next step, three regions-of-interest were identified and used as inputs to the optimization step of the algorithm. Once the registration and PSF parameters were identified, the entire set of frames was passed to the full-frame enhancement stage to produce the super-resolved image.
We obtained LWIR imagery from Rockwell Collins recorded during a flight in December 2004. In this platform, a thermal infrared camera was mounted in a dichroic-beam splitter configuration to simultaneously collect registered visible imagery.
To supplement the data made available to us by Rockwell Collins in Example 2, we collected indoor imagery utilizing our own FLIR Systems A20 camera. This camera is capable of imaging long-wave infrared at 320×240 resolution.
The example shown in
The second comparison shown in
The marginal likelihood can be derived from the joint on x (the hi-res image) and y (the low-res images) as follows (to simplify the exposition, all of the images y(1), ..., y(K) are represented using a single signal y that is the concatenation of the component images):
First, define the probability on x and the probability of y given x:
Now, the joint probability can be defined as the product of the two probabilities defined above:
A number of simplifications can be performed:
Notice that $\Sigma^{-1} \equiv Z^{-1} + \beta W^{T} W$
Thus, by completing the square:
$$2\mu^{T}\Sigma^{-1}x = 2\beta y^{T} W x$$
$$\Sigma^{-1}\mu = \beta W^{T} y$$
$$\mu = \beta \Sigma W^{T} y$$
We can substitute into our joint:
Then simplifying and collecting terms:
Now, we apply the marginalization of x (i.e. sum over the terms dependent on x):
This provides the simplified definition of the probability on y. Now, we obtain the marginal likelihood function by taking the log on both sides:
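As a cross-check on this step, standard Gaussian marginalization gives the following closed form. It assumes a zero-mean prior x ~ N(0, Z) (consistent with the completed square above, which has no linear term from the prior), noise precision β, and a concatenated signal y of total length M_tot; the symbol M_tot and the ordering of the terms are choices made for this sketch, not taken from the source.

```latex
\begin{equation*}
\log P(y) = \frac{M_{\mathrm{tot}}}{2}\log\beta
 - \frac{M_{\mathrm{tot}}}{2}\log 2\pi
 - \frac{1}{2}\log\lvert Z\rvert
 + \frac{1}{2}\log\lvert\Sigma\rvert
 + \frac{1}{2}\mu^{T}\Sigma^{-1}\mu
 - \frac{\beta}{2}\, y^{T} y
\end{equation*}
```

Of these six terms, only the fourth and fifth depend on W, and therefore on the shift, rotation, and point spread parameters.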
Allowing for K low resolution images, we obtain marginal likelihood function for our problem:
Because the marginal likelihood will be used as the optimization function, we can eliminate from it any terms that will remain constant as a function of the alignment parameters (sk, θk, γ) that we seek to optimize. In particular, the first, second, third, and sixth terms in the above equation are constant under changes to these parameters and are thus eliminated from our target equation:
Since the precision is the inverse of the covariance (P=Σ−1), we can avoid performing the computation of the inverse by substituting P:
The result is a function that can readily be optimized by performing a minimization technique on the parameters of interest:
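A minimal numerical sketch of such a target function follows. It assumes the retained terms are ½ μᵀPμ and −½ log|P|, which is what standard Gaussian marginalization with a zero-mean prior yields once the parameter-independent terms are dropped; the exact form and sign convention of the target equation, and the name `reduced_objective`, are assumptions of this example.

```python
import numpy as np

def reduced_objective(Ws, ys, Z, beta):
    """Parameter-dependent part of log P(y): 0.5*mu^T P mu - 0.5*log|P|.

    Ws and ys are lists over the K low-resolution images; the alignment
    and PSF parameters enter only through the matrices in Ws.
    """
    P = np.linalg.inv(Z)                      # start from the prior precision
    b = np.zeros(Z.shape[0])
    for W, y in zip(Ws, ys):
        P = P + beta * W.T @ W
        b = b + beta * W.T @ y
    mu = np.linalg.solve(P, b)                # posterior mean, no explicit inverse of P
    _, logdet_P = np.linalg.slogdet(P)        # stable log-determinant
    return 0.5 * mu @ P @ mu - 0.5 * logdet_P
```

Since this quantity differs from log P(y) only by terms that do not depend on the alignment parameters, a general-purpose minimizer could be applied to its negative, rebuilding the W(k) matrices from the current parameter values at each evaluation.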
The Title, Technical Field, Background, Summary, Brief Description of the Drawings, Detailed Description, References, Appendices, and Abstract are meant to illustrate the preferred embodiments of the invention and are not in any way intended to limit the scope of the invention. The scope of the invention is solely defined and limited by the claims set forth below.
| Number | Name | Date | Kind |
|---|---|---|---|
| 20040170340 | Tipping et al. | Sep 2004 | A1 |

| Number | Date | Country |
|---|---|---|
| 20070130095 A1 | Jun 2007 | US |