The present invention relates to vision systems. More specifically, the present invention relates to a scene-based non-uniformity correction method employing super-resolution for eliminating fixed pattern noise in video sequences produced by solid state imagers, such as focal-plane arrays (FPA), in digital video cameras.
Focal plane array (FPA) sensors are widely used in visible-light and infrared imaging systems. More particularly, FPAs have been widely used in military applications, environmental monitoring, scientific instrumentation, and medical imaging applications due to their sensitivity and low cost. Most recently, research has focused on embedding powerful image/signal processing capabilities into FPA sensors. An FPA sensor comprises a two-dimensional array of photodetectors placed in the focal plane of an imaging lens. Individual detectors within the array may perform well, but the overall performance of the array is strongly affected by the lack of uniformity in the responses of all the detectors taken together. The non-uniformity of the responses of the overall array is especially severe for infrared FPAs.
From a signal processing perspective, this non-uniformity problem can be restated as how to automatically remove fixed-pattern noise at each pixel location. The FPA sensors are modeled as having fixed (or static) pattern noise superimposed on a true (i.e., noise free) image. The fixed pattern noise is attributed to spatial non-uniformity in the photo-response (i.e., the conversion of photons to electrons) of individual detectors in an array of pixels which constitute the FPA. The response is generally characterized by a linear model:
$$z_t(x,y) = g_t(x,y) \cdot s_t(x,y) + b_t(x,y) + N(x,y), \tag{1}$$
where N(x,y) is the random noise, zt(x,y) is the observed scene value for a pixel at position (x,y) in an array of pixels (image) that are modeled as being arranged in a rectangular coordinate grid (x,y) at time t, st(x,y) is the true scene value (e.g., irradiance collected by the detector) at time t, gt(x,y) is the gain of a pixel at position (x,y) and time t, and bt(x,y) is the offset of a pixel at position (x,y) at time t. gt(x,y) can also be referred to as a gain image associated with noise affecting the array of pixels, and bt(x,y) as the offset image of pixels associated with noise. Generally speaking, gain and offset are both functions of time, as they drift slowly along with temperature change. One key assumption of this model is that gt(x,y) and bt(x,y) change slowly, i.e., they are constant during the period used for algorithms to recover st(x,y). As a result, the time index for these parameters is dropped hereinafter. The task of non-uniformity correction (NUC) algorithms is to obtain st(x,y) by estimating the parameters g(x,y) and b(x,y) from the observed zt(x,y).
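By way of illustration only, the linear model of Equation 1 and its inversion can be sketched in Python with NumPy. The function names `observe` and `correct`, the image size, and the gain and offset ranges are assumptions chosen for this sketch and are not part of the disclosure; random noise is omitted so that the inversion is exact.

```python
import numpy as np

def observe(scene, gain, offset, noise=0.0):
    """Linear detector model of Eq. 1: z = g * s + b + N."""
    return gain * scene + offset + noise

def correct(observed, gain, offset):
    """Estimate the true scene from Eq. 1 once g and b are known (noise ignored)."""
    return (observed - offset) / gain

rng = np.random.default_rng(0)
scene = rng.uniform(20.0, 220.0, size=(4, 5))        # hypothetical true scene s
gain = rng.uniform(0.8, 1.2, size=scene.shape)       # hypothetical gain image g
offset = rng.uniform(-10.0, 10.0, size=scene.shape)  # hypothetical offset image b

observed = observe(scene, gain, offset)
recovered = correct(observed, gain, offset)
```

In practice g and b are unknown, which is why the remainder of the method is concerned with estimating them from the observed frames.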
Prior art non-uniformity correction (NUC) algorithms can be grouped into two main categories: 1) calibration methods that rely on calibrating an FPA with distinct sources, e.g., distinct temperature sources in long wave infrared (LWIR), and 2) scene-based methods that require no calibration. Prior art calibration methods include two-point and one-point NUC techniques. Two-point NUC solves for the unknowns g(x,y) and b(x,y) for all the (x,y) pixels in Equation 1 by processing two images taken of two distinct sources, e.g., two uniform heat sources in an infrared imaging system (i.e., a “hot” source and a “cold” source), or a “light” image and a “dark” image in an optical imaging system. Since two distinct sources are hard to maintain, camera manufacturers use one source to counteract offset drift in real-time applications, which is often referred to as one-point NUC. In a one-point NUC, gain information is stored in a lookup table as a function of temperature, which can be loaded upon update. Given the gain, Equation 1 is solved to obtain the offset b(x,y). Both calibration processes need to interrupt (reset) real-time video operations, i.e., a calibration needs to be performed every few minutes to counteract the slow drift of the noise over time and ambient temperature. This is inappropriate for applications such as visual systems used on a battlefield or for video surveillance.
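The two calibration techniques described above amount to solving Equation 1 per pixel with the noise term ignored. The following is a minimal sketch, assuming the source levels `s_cold` and `s_hot` are known and uniform across the array; the function names and numeric values are hypothetical.

```python
import numpy as np

def two_point_nuc(z_cold, z_hot, s_cold, s_hot):
    """Solve Eq. 1 (noise ignored) for g and b from two uniform sources."""
    gain = (z_hot - z_cold) / (s_hot - s_cold)
    offset = z_cold - gain * s_cold
    return gain, offset

def one_point_offset(z_ref, s_ref, gain):
    """Given a stored gain image, solve Eq. 1 for the offset from one source."""
    return z_ref - gain * s_ref

rng = np.random.default_rng(0)
true_gain = rng.uniform(0.8, 1.2, size=(4, 4))
true_offset = rng.uniform(-5.0, 5.0, size=(4, 4))
s_cold, s_hot = 50.0, 150.0                     # assumed known source levels
z_cold = true_gain * s_cold + true_offset       # image of the "cold" source
z_hot = true_gain * s_hot + true_offset         # image of the "hot" source

gain, offset = two_point_nuc(z_cold, z_hot, s_cold, s_hot)
offset_only = one_point_offset(z_cold, s_cold, gain)
```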
Scene-based NUC techniques have been developed to continuously correct FPA non-uniformity in real time without the need to interrupt (reset) the video sequence. These techniques include statistical methods and registration methods. In certain statistical methods, it is assumed that all possible values of the true-scene pixel are seen at each pixel location, i.e., if a sequence of video images is examined, each pixel is assumed to have experienced a full range of values, say 20 to 220 out of a range of 0 to 255. In general, statistical methods are not computationally expensive and are easy to implement. However, statistical methods generally require many frames, and the camera needs to move in such a way as to satisfy the statistical assumption.
Though relatively new, registration-based methods have some desirable features over statistical methods. Registration methods assume that when images are aligned to each other, the aligned images have the same true-scene pixel at a given pixel location. Even if a scene is moving, when a pixel is aligned in all of the images, it will have the same value. Compared to statistical methods, registration methods are much more efficient, requiring fewer frames to recover the original images. However, prior art registration methods which rely on the above assumption can break down when handling significant fixed-pattern noise, particularly unstructured fixed-pattern noise. The assumption of the same true-scene pixel in the aligned image can also break down when the true signal response is affected by lighting changes, automatic gain control (AGC) of the camera, and random noise. Existing methods either assume identical Gaussian fixed-pattern noise or structured pattern noise with known structure.
Moreover, prior art registration methods are reliable only for computing restricted types of motion fields, for example, global shift motion (translation). It is desirable for a NUC method to handle parametric motion fields, in particular, affine motion fields, where the images taken by a camera are subjected to translation, rotation, scaling, and shearing. It would also be desirable for a NUC method to enhance the true scene, such as by combining several images into a higher resolution image, i.e., a super-resolution image.
Accordingly, what would be desirable, but has not yet been provided, is a NUC method for eliminating fixed pattern noise in imaging systems that can recover clean images as quickly as prior art registration-based methods, can handle unknown structured or non-structured fixed-pattern noise, can work under affine motion shifts, and can improve the quality of recovered images.
Disclosed is a method and system describing a scene-based non-uniformity correction method using super-resolution for eliminating fixed pattern noise in a video having a plurality of input images, comprising the steps of warping each of the plurality of images with respect to a reference image to obtain a warped set of images; performing one of averaging and deblurring on the warped set of images to obtain an initial estimate of a reference true scene frame; warping the initial estimate of the reference true scene frame with respect to each of the plurality of images to obtain a set of estimated true signal images; performing a least square fit algorithm to estimate a gain image and an offset image given the set of estimated true signal images; applying the estimated gain image and estimated offset image to the plurality of images to obtain a clean set of images; and applying a super-resolution algorithm to the clean set of images to obtain a higher resolution version of the reference true scene frame. The method can further comprise the steps of obtaining a new set of estimated true signal images based on the higher resolution version of the reference true scene frame; and repeating the least square fitting step, the obtaining a clean set of images step, and the applying a super-resolution algorithm step a predetermined number of times to obtain more accurate versions of the estimated gain image, estimated offset image, and higher resolution version of the reference true scene frame.
The applying a super-resolution algorithm step can further comprise the step of summing a previous higher resolution version of the reference true scene frame with a value that is based on a sum over all images in the plurality of images of a difference between the clean set of images and a previous clean set of images when there exists a previous higher resolution version of the reference true scene frame; otherwise, setting the higher resolution version of the reference true scene frame to an estimated clean reference image after applying the estimated gain image and the estimated offset image to the initial estimate of the reference true scene frame, the estimated clean reference image being upsampled and convolved with a back-projection kernel. Before performing the step of warping each of the plurality of images with respect to a reference image, the method can further comprise the steps of providing an initial gain image and an initial offset image derived from a statistical non-uniformity correction algorithm; and applying the initial gain image and initial offset image to the plurality of images to obtain a second clean set of images corresponding to the plurality of images. The method outlined above can be repeated for another plurality of images, different from the plurality of images taken from the video, wherein the more accurate versions of the estimated gain image and estimated offset image are substituted for the initial gain image and the initial offset image.
The following embodiments are intended as exemplary, and not limiting. In keeping with common practice, figures are not necessarily drawn to scale.
The present invention integrates super-resolution and a registration-based NUC in order to better handle structured fixed-pattern noise than prior art registration-based NUC methods and to recover a higher-resolution version of a plurality of true scene images St(x,y) from st(x,y). St(x,y) and st(x,y) are related by
$$s_t = \{S_t \ast h\}\downarrow s, \tag{2}$$
where “∗ h” denotes convolution by a blur kernel h, and ↓s denotes a down-sampling operation by a factor s (s≥1). Substituting Eq. 2 into Eq. 1, a comprehensive imaging model that relates St and zt is as follows:
$$z_t(x,y) = g(x,y)\,\{S_t(x,y) \ast h\}\downarrow s + b(x,y) + N(x,y) \tag{3}$$
Image st is referred to hereinafter as the true scene frame and image zt as the observed frame.
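As a hedged illustration of the imaging model of Equation 3, the sketch below blurs a high-resolution image, down-samples it by a factor s, and applies a gain and an offset. The binomial kernel, the edge-replicating border handling, and the function names are assumptions made for this sketch; they are not the blur kernel or filter of any particular sensor, and the noise term is omitted.

```python
import numpy as np

def blur(image, kernel_1d):
    """Separable 2-D convolution with edge replication at the borders."""
    pad = len(kernel_1d) // 2
    padded = np.pad(image, pad, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel_1d, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel_1d, mode="valid")

def observe_low_res(high_res, kernel_1d, s, gain, offset):
    """Eq. 3 without noise: z = g * ({S * h} down-sampled by s) + b."""
    low_res = blur(high_res, kernel_1d)[::s, ::s]
    return gain * low_res + offset

kernel = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0  # assumed kernel, sums to 1
high_res = np.full((8, 8), 100.0)                    # flat hypothetical scene S
gain = np.full((4, 4), 1.1)
offset = np.full((4, 4), 3.0)

observed = observe_low_res(high_res, kernel, 2, gain, offset)
```

For a flat scene the blur and down-sampling leave the value unchanged, so every observed pixel equals the gain times the scene value plus the offset.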
Referring now to
where m(x,y) is the temporal mean at (x,y) and σ(x,y) is the temporal standard deviation at (x,y). T is the number of frames, and constant N is the number of pixels.
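Equations 4 and 5 are not reproduced in the text above. Purely as an assumption, one common constant-statistics form of a statistical NUC can be sketched from the quantities just defined: the gain is taken as the temporal standard deviation normalized by its average over all N pixels, and the offset is chosen so that every corrected pixel has the same temporal mean. This is an illustrative stand-in and not necessarily the form used in Equations 4 and 5.

```python
import numpy as np

def statistical_nuc(frames):
    """Bootstrap gain and offset from per-pixel temporal statistics.

    frames has shape (T, H, W). Assumes every pixel sees the same
    distribution of true scene values over the T frames.
    """
    m = frames.mean(axis=0)         # temporal mean m(x, y)
    sigma = frames.std(axis=0)      # temporal standard deviation sigma(x, y)
    gain = sigma / sigma.mean()     # normalized by the average over all N pixels
    offset = m - gain * m.mean()    # equalizes the corrected temporal means
    return gain, offset

rng = np.random.default_rng(0)
T, H, W = 4000, 8, 8
scenes = rng.uniform(20.0, 220.0, size=(T, H, W))    # hypothetical true scenes
true_gain = rng.uniform(0.8, 1.2, size=(H, W))
true_offset = rng.uniform(-10.0, 10.0, size=(H, W))
frames = true_gain * scenes + true_offset            # Eq. 1 without noise

gain, offset = statistical_nuc(frames)
cleaned = (frames - offset) / gain                   # rough "clean" frames
```

Under this assumed form the gain is recovered only up to a global scale, which is sufficient for removing the spatial pattern before registration.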
At step 12, using Eq. 1, and given the estimated gain g(x,y) and offset b(x,y) and a plurality of observed images zt(x,y) for a video sequence, a set of estimated “clean” true signal images f0, f1, . . . , fm/2, . . . , fm−1 is obtained by inserting the observed images zt(x,y) and the gain g and offset b found in Eq. 4 and 5 into Equation 1 as follows:
where fm/2 is the median estimated true scene image. At step 14, each of the frames fi is registered using an image registration method, such as the hierarchical registration method detailed in Bergen, J., Anandan, P., Hanna, K., and Hingorani, R. 1992. “Hierarchical Model-Based Motion Estimation,” Proc. European Conf. Comp. Vision, pp. 237-252, which is incorporated herein by reference in its entirety. The initial “boot-strap” rough estimate of gain g(x,y) and offset b(x,y) obtained from a non-uniformity correction algorithm is needed so that, after “cleaning” the images zt(x,y) in step 12 above, the “cleaned” images are clean enough to allow for accurate registration. In a preferred embodiment, fm/2 is designated as a reference frame, from which further calculations are derived. However, any of the frames f0, f1, . . . , fm/2, . . . , fm−1 can be selected as the reference frame. At step 16, each of the non-reference images is warped with respect to the reference image fm/2. At step 18, this warped set of images is either averaged or deblurred to obtain an image s̃r, which can be used as an initial estimate of the reference true scene frame sr.
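Steps 16 and 18 can be illustrated with a minimal sketch that warps each frame onto the reference and averages the result. The sketch assumes that the motion of each frame is already known and is a whole-pixel translation with circular borders, which is far more restrictive than the hierarchical registration referred to above; the function names are hypothetical.

```python
import numpy as np

def warp_to_reference(frame, dx, dy):
    """Align a frame to the reference: out[y, x] = frame[y + dy, x + dx]."""
    return np.roll(frame, shift=(-dy, -dx), axis=(0, 1))

def initial_reference_estimate(frames, motions):
    """Average the warped frames (step 18, averaging option)."""
    warped = [warp_to_reference(f, dx, dy) for f, (dx, dy) in zip(frames, motions)]
    return np.mean(warped, axis=0)

rng = np.random.default_rng(0)
reference = rng.uniform(0.0, 255.0, size=(6, 6))
motions = [(0, 0), (1, 0), (0, 2), (-1, 1)]      # assumed known (dx, dy) per frame
# Synthesize frames so that reference[y, x] == frame[y + dy, x + dx].
frames = [np.roll(reference, shift=(dy, dx), axis=(0, 1)) for dx, dy in motions]

estimate = initial_reference_estimate(frames, motions)
```

With residual fixed-pattern noise present, the pattern lands at a different position in each warped frame, so averaging attenuates it while the aligned scene content is preserved.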
If the coordinate system of the reference frame is chosen as the reference coordinate system, then the reference true scene frame sr and the other true scene frames st can be related as follows:
$$s_r(x,y) = s_t(x + \Delta_t x,\; y + \Delta_t y) \tag{6}$$
where (Δtx, Δty) are pixel-wise motion vectors. These motion vectors can represent arbitrary and parametric motion types that are different from the restricted motion types assumed in registration-based NUC methods. Equation 6 can be replaced with a concise notation based on forward image warping as follows:
$$s_t = s_r^{W_t} \tag{7}$$
where Wt is the warping vector (−Δtx, −Δty).
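To make the pixel-wise motion vectors of Equation 6 concrete for a parametric motion, the following sketch generates an affine motion field and samples a frame at the displaced positions. The six-parameter ordering, the nearest-neighbour sampling, and the clamping at the image border are assumptions of this sketch, as are the function names.

```python
import numpy as np

def affine_motion_field(shape, params):
    """Pixel-wise motion vectors (dx, dy) of an assumed six-parameter affine model."""
    a0, a1, a2, a3, a4, a5 = params
    y, x = np.mgrid[0:shape[0], 0:shape[1]].astype(float)
    dx = a0 + a1 * x + a2 * y
    dy = a3 + a4 * x + a5 * y
    return dx, dy

def warp_with_motion_field(frame, dx, dy):
    """Eq. 6: s_r(x, y) = s_t(x + dx, y + dy), nearest neighbour, clamped borders."""
    h, w = frame.shape
    y, x = np.mgrid[0:h, 0:w]
    xs = np.clip(np.rint(x + dx).astype(int), 0, w - 1)
    ys = np.clip(np.rint(y + dy).astype(int), 0, h - 1)
    return frame[ys, xs]

frame = np.arange(30.0).reshape(5, 6)            # hypothetical frame s_t
# Pure translation: dx = 1 and dy = 2 at every pixel.
dx, dy = affine_motion_field(frame.shape, (1.0, 0.0, 0.0, 2.0, 0.0, 0.0))
warped = warp_with_motion_field(frame, dx, dy)
```

Non-zero values of the remaining four parameters produce rotation, scaling, and shearing within the same two functions.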
The task of the method of the present invention is to recover the high-resolution image St given observed frames zt. Based on the imaging model (Eq. 3), the remainder of the method is concerned with obtaining an optimal solution to a least square fitting problem:
where m is the number of frames to be examined and st(Sr) is defined as
$$s_t(S_r) = \tilde{s}_t = \{(S_r)^{F_t} \ast h\}\downarrow s \tag{9}$$
where Ft is the high resolution version of the warping Wt, ∗ is convolution, and h is the blur kernel mentioned above, which is determined by the point spread function (PSF) of the FPA sensor type of a given manufacturer. If manufacturer information is not available, then h is assumed to be a Gaussian filter, as defined in M. Irani and S. Peleg, “Motion Analysis for Image Enhancement: Resolution, Occlusion, and Transparency,” Journal of Visual Comm. and Image Repre., Vol. 4, pp. 324-335, 1993 (hereinafter “Irani et al.”). The size or number of taps of the filter used depends empirically on the severity of the fixed pattern noise being eliminated. In a preferred embodiment, the following default 5-tap filter can be used:
Rather than solving for the three unknowns, St, g, and b, at once, the unknowns are found by an iterative method given the initial estimate s̃r of sr. Before applying the iterative method, the initial estimate s̃r of sr is warped, at step 20, with respect to each individual image in the initial set of images to obtain the set of estimated true signal images s̃t.
Given s̃t, the iterative method is as follows: At step 22, the least square fitting problem is solved to obtain an estimated gain g and offset b given the set of estimated true signal images f0, f1, . . . , fm/2, . . . , fm−1, and the previous estimate of st, which is s̃t. At step 24, clean images ŝt are obtained by inserting the gain g and offset b found in step 22 and the estimated true signal images f0, f1, . . . , fm/2, . . . , fm−1 into Equation 1 as follows:
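The equation referred to above is not reproduced in this text. As a minimal sketch of steps 22 and 24, and assuming the fit is carried out independently at each pixel over the m frames, the gain and offset that minimize the squared error between g·s̃t + b and the frames being fit have the closed form of an ordinary linear regression, after which Equation 1 is inverted to clean the frames. The array names and the small guard `eps` are hypothetical.

```python
import numpy as np

def fit_gain_offset(observed, estimated_scene, eps=1e-12):
    """Per-pixel least squares fit of observed ~ g * estimated_scene + b.

    Both inputs have shape (m, H, W); the fit runs along the frame axis.
    """
    s_mean = estimated_scene.mean(axis=0)
    z_mean = observed.mean(axis=0)
    cov = ((estimated_scene - s_mean) * (observed - z_mean)).mean(axis=0)
    var = ((estimated_scene - s_mean) ** 2).mean(axis=0)
    gain = cov / np.maximum(var, eps)   # eps guards pixels whose scene never changes
    offset = z_mean - gain * s_mean
    return gain, offset

rng = np.random.default_rng(0)
m, H, W = 10, 4, 4
estimated_scene = rng.uniform(20.0, 220.0, size=(m, H, W))  # stands in for the estimates
true_gain = rng.uniform(0.8, 1.2, size=(H, W))
true_offset = rng.uniform(-10.0, 10.0, size=(H, W))
observed = true_gain * estimated_scene + true_offset        # frames being fit

gain, offset = fit_gain_offset(observed, estimated_scene)   # step 22
clean = (observed - offset) / gain                          # step 24
```

Because each pixel is fit separately, no structure is imposed on the fixed-pattern noise, which is consistent with the stated aim of handling unstructured patterns.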
At step 26, a multi-frame super-resolution method is applied to the clean images ŝt and a prior estimate of the higher resolution version of the reference true scene frame Sr to obtain a current estimate of the higher resolution version of the reference true scene frame Sr. At step 28, if this is not the last iteration (determined empirically), then at step 30, a new version of s̃t is synthesized from the current estimate of Sr, and steps 22-28 are repeated until the difference between the current estimate of Sr and the immediately prior estimate of Sr is below a predetermined (empirically estimated) threshold. The output of the method is the estimated gain g, the offset b, and the super-resolved reference true scene frame Sr.
Referring now to
In step 26, an iterative update procedure is employed to apply super-resolution to obtain a higher resolution reference image Sr, and to provide a means for deciding whether the estimates for gain g and offset b obtained by solving the least square fitting problem in step 22 and Equation 8 are sufficient. The iterative update procedure employs the following equation:
where I[n] is the n-th estimate of the high resolution or deblurred version of the clean images ŝt, p is a back-projection kernel defined in Irani et al., “∗” is convolution, ↑ is up-sampling by a factor of s, −Ft is inverse warping of a high resolution image, and ŝt(I[n]) is defined in Equation 9. I[n+1] serves as an estimate for a higher resolution version of the reference image Sr for the present iteration. At step 30, for the next iteration, the next set of estimated true signal images s̃t can be synthesized by substituting I[n+1] for Sr into Equation 9.
If this is the first iteration through steps 22-30 at n=0, then I[n] is set to zero in Equation 11, so that Equation 11 reduces to
$$I^{[0]} = [\hat{s}_r \uparrow s] \ast p \tag{12}$$
where ŝr is the estimated clean reference image after applying g and b to the initial estimate of the reference true scene frame sr, and ∗ is convolution. Note that, as the number of iterations increases, the difference between ŝt and ŝt(I[n]) decreases, so that Equation 11 converges to I[n+1]≅I[n]. Thus, if this difference is below a predetermined (empirical) threshold, then good estimates of Sr, g, and b have been obtained.
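Equation 11 is not reproduced in this text. The sketch below is therefore an assumption: it follows the general iterative back-projection scheme of Irani et al., in which each clean frame is compared with a frame simulated from the current high-resolution estimate and the up-sampled, inverse-warped residuals are averaged into the estimate. To keep it short, the warps are whole-pixel circular shifts, blur plus down-sampling is modeled as s×s block averaging, and the back-projection kernel p is taken as the identity with pixel-replication up-sampling. All names are hypothetical.

```python
import numpy as np

def warp(image, shift):
    """Circular whole-pixel shift standing in for the high-resolution warp."""
    return np.roll(image, shift=shift, axis=(0, 1))

def simulate_low_res(high_res, shift, s):
    """Warp, then blur and down-sample by averaging s x s blocks."""
    warped = warp(high_res, shift)
    h, w = warped.shape
    return warped.reshape(h // s, s, w // s, s).mean(axis=(1, 3))

def upsample(low_res, s):
    """Up-sample by a factor s using pixel replication."""
    return np.kron(low_res, np.ones((s, s)))

def back_project(clean_frames, shifts, s, iterations=20):
    """Refine a high-resolution estimate from clean low-resolution frames."""
    estimate = upsample(clean_frames[0], s)   # start from the reference frame
    for _ in range(iterations):
        correction = np.zeros_like(estimate)
        for frame, shift in zip(clean_frames, shifts):
            residual = frame - simulate_low_res(estimate, shift, s)
            inverse_shift = (-shift[0], -shift[1])
            correction += warp(upsample(residual, s), inverse_shift)
        estimate = estimate + correction / len(clean_frames)
    return estimate

rng = np.random.default_rng(1)
truth = rng.uniform(0.0, 1.0, size=(8, 8))         # hypothetical high-res scene
shifts = [(0, 0), (0, 1), (1, 0), (1, 1)]          # assumed known (row, col) shifts
frames = [simulate_low_res(truth, sh, 2) for sh in shifts]

initial = upsample(frames[0], 2)
refined = back_project(frames, shifts, 2)
```

With these simplifications each update cannot increase the distance to the underlying scene, so the refined estimate is closer to it than the up-sampled reference frame alone; a stopping threshold on the change between successive estimates could replace the fixed iteration count.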
Steps 10-36 can be repeated for another set of observed images zt(x,y) from the same video, except that the estimated gain g(x,y) and offset b(x,y) are obtained from the just-estimated gain and offset instead of from a statistical-based NUC. This method can be repeated until all of the images in the input video have been processed.
In some embodiments, the method of the present invention can be incorporated directly into the hardware of a digital video camera system by means of a field programmable gate array (FPGA) or ASIC, or a microcontroller equipped with RAM and/or flash memory to process video sequences in real time. Alternatively, sequences of video can be processed offline using a processor and a computer-readable medium incorporating the method of the present invention as depicted in the system 40 of
It is to be understood that the exemplary embodiments are merely illustrative of the invention and that many variations of the above-described embodiments may be devised by one skilled in the art without departing from the scope of the invention. It is therefore intended that all such variations be included within the scope of the following claims and their equivalents.
This application claims the benefit of U.S. provisional patent application No. 60/852,200 filed Oct. 17, 2006, the disclosure of which is incorporated herein by reference in its entirety.
Number | Name | Date | Kind |
---|---|---|---|
6681058 | Hanna et al. | Jan 2004 | B1 |
6910060 | Langan et al. | Jun 2005 | B2 |
7684634 | Kilgore | Mar 2010 | B2 |

Number | Date | Country |
---|---|---|
20080107346 A1 | May 2008 | US |

Number | Date | Country |
---|---|---|
60852200 | Oct 2006 | US |