The present invention relates generally to vision systems, and more particularly to a method and system for combining images from several sources of different bandwidths into a single “fusion” image by employing the technique of contrast normalization to optimize the dynamic range of pre-fused images.
Soldiers often find themselves in situations in which there is reduced or no visibility of the battlefield, especially at night. There is a need for providing visual information to soldiers in nighttime environments and day or night environments with visual obstructions such as fog or smoke. In such situations, the soldier may be equipped with soldier-worn, hand-held, and/or vehicle-based night vision and enhanced-vision systems. These systems enhance visibility in the visible range of frequencies of electromagnetic radiation, or provide “sight” in the infrared range of frequencies.
An improvement over individual systems that provide visibility over a single range of frequencies of electromagnetic radiation or combine visibility in several ranges of frequencies with different pieces of equipment is to combine the video of a long wave infrared (LWIR) source, a short wave infrared (SWIR) source and a standard visible source, into a single image using a single piece of equipment, thereby providing significantly enhanced visibility of the scene. Another example is combining video from two cameras that point at the same scene, but with different focal length, providing enhanced depth of focus. A third example is combining video from two cameras that have a different aperture setting, providing significantly enhanced dynamic range to the display. In all these applications, it is desirable to preserve the most significant details from each of the video streams on a pixel-by-pixel basis. Such systems employ a technique known in the art as image fusion.
One image fusion technique known in the art is to perform an averaging function of the multiple video streams. However, contrast is reduced significantly and sometimes detail from one stream may cancel detail from another stream. Laplacian pyramid fusion on the other hand provides excellent automatic selection of the important image detail for every pixel from multiple images at multiple image resolutions. By performing selection in the multi-resolution representation, the reconstructed—fused—image provides a more natural-looking scene. In addition, the Laplacian pyramid fusion algorithm allows for additional enhancement of the video. It may provide multi-frequency sharpening, contrast enhancement, and selective de-emphasis of image detail in either video source.
However, current multi-scale, feature-selective fusion techniques employing Laplacian pyramid decomposition/construction do not work well on high dynamic range (HDR), high noise imagery. Performing dynamic range adjustment on the input images before fusion may ameliorate some of these problems. Various techniques such as histogramming and linear stretching have been introduced to take better advantage of the input image dynamic range. But these techniques still do not adequately deal with localized areas of low contrast and do not address issues associated with noise in the input images.
Accordingly, what would be desirable, but has not yet been provided, is a method and system for effectively and automatically fusing images from multiple cameras of different frequency bands (modalities) that benefit from the advantages of Laplacian pyramid decomposition/construction while being immune to low contrast and the presence of noise.
The above-described problems are addressed and a technical solution is achieved in the art by providing a computer implemented method for fusing images taken by a plurality of cameras, comprising the steps of: receiving a plurality of images of the same scene taken by the plurality of cameras; generating Laplacian pyramid images for each source image of the plurality of images; applying contrast normalization to the Laplacian pyramid images; performing pixel-level fusion on the Laplacian pyramid images based on a local salience measure that reduces aliasing artifacts to produce one salience-selected Laplacian pyramid image for each pyramid level; and combining the salience-selected Laplacian pyramid images into a fused image. Performing pixel-level fusion further comprises, for each level of the Laplacian pyramid images corresponding to each source image of the plurality of images: passing a Laplacian pyramid image through a filter which averages each pixel with surrounding pixels; convolving the filtered Laplacian pyramid image with the absolute value of the Laplacian pyramid image to produce an energy image; generating a selection mask wherein a pixel at a given location is selected based on comparing the energy of the pixels originating from energy images corresponding to each of the source images; and multiplying the selection mask by a filter to reduce aliasing artifacts by smoothing the contributions to the selection mask from each of the energy images to produce a contribution image. The contribution images corresponding to each of the source images may be summed to produce the salience-selected Laplacian pyramid image at one level of a Laplacian pyramid.
The pixel that is selected at a given location is the pixel among the energy images corresponding to each source image which has the highest energy when only one pixel has the highest energy; otherwise, partial contributions from each of the pixels from each of the sources which have about the same energy are selected. A selected pixel may be incremented by a hysteresis factor to reduce flickering artifacts.
The original image is decomposed into multiscale, multi-spectral Laplacian images and a Gaussian image. The Gaussian image contains the lowest frequency and is least scaled. The collection of the Laplacians and the Gaussian is called a Laplacian pyramid. If the pyramid is n levels, then the Gaussian image is the (n-1)th level (the highest level).
The step of generating Laplacian pyramid images further comprises the steps of: generating a Gaussian pyramid for at least one of the plurality of images; and applying noise coring to at least the Level 0 Gaussian image of the Gaussian pyramid to produce a noise-cored Laplacian image. Applying noise coring further comprises the steps of: processing at least the Level 0 Gaussian image substantially simultaneously through a plurality of derivative (gradient) filters to produce a first set of gradient images; applying a noise coring function to each of the gradient images; processing the noise cored gradient images substantially simultaneously through a second set of derivative (gradient) filters; and negatively summing the resulting filtered/cored images to produce the noise-cored Laplacian image.
Applying contrast normalization further comprises the steps of, for each Laplacian image at a given level: obtaining an energy image from the Laplacian image; determining a gain factor that is based on at least the energy image and a target contrast; and multiplying the Laplacian image by a gain factor to produce a normalized Laplacian image. The target contrast may be halved with increasing pyramid level. The gain factor may undergo noise coring. The degree of noise coring may be halved with increasing pyramid level. A saturation mask may be applied to the gain factor.
Contrast normalization of the Gaussian image proceeds as follows. First, the maximum pixel value of the image is computed and used as the contrast target. The gain factor is based on the Gaussian image and the target contrast, and multiplying the Gaussian image by the gain factor produces a normalized Gaussian image. The normalized Gaussian image may be further decomposed into a DC value (average value of the image) and a signed AC image. The DC value is adjustable and is defined as the brightness of the image.
A non-uniformity correction (NUC) may be applied to each of the plurality of images to produce corrected images. At least one of linear stretching, uniform histogram equalization, Rayleigh shape histogram equalization, Gaussian shape histogram equalization, and gamma correction may be applied to each of the plurality of images to produce preprocessed images. The preprocessed images may be warped to align the preprocessed images to each other.
If the images to be fused contain a visible image, that image may be separated into a luminance component and chrominance components, wherein the luminance component is contrast normalized and undergoes image fusion, and the chrominance components are combined with the fused image to produce a color enhanced fused image. A luminance damping function may be applied to the luminance component, and a gamma correction may be applied to the chrominance components. The chrominance components may be orthogonalized.
The present invention will be more readily understood from the detailed description of exemplary embodiments presented below considered in conjunction with the attached drawings, of which:
It is to be understood that the attached drawings are for purposes of illustrating the concepts of the invention and may not be to scale.
Referring now to
Treated as part of the Laplacian pyramid generation process, at level 0 only, gradient images (not shown) are created prior to Laplacians for each of the images 16, 18, 20. Noise coring at blocks 32 is applied to these gradient images to remove noise from input (pre-fused) images. In the Laplacian Pyramids 34 block, Laplacian cored images are generated from these noise-reduced gradient images at least at level 0. Laplacian images from other levels may be directly generated from the filter-subtract-decimate (FSD) coring of the Gaussian image representations of the source images 16, 18, 20 or the warped source images. The decimation depends on the sampling of the pyramid. In a preferred embodiment, Laplacian fusion of the present invention is applied to luminance images (i.e., not including chrominance portions in a YUV format). Laplacian Pyramids are also created in blocks 38 from the saturation masks 28 and in blocks 40 from the region-of-interest (ROI) mask 30. The Laplacian images of each of the modalities at all levels are then contrast normalized at blocks 36 with the aid of the generated saturation mask Laplacian pyramids in blocks 40 to optimize the dynamic range of the pre-fused images. Up to this point, the pre-fused images are noise reduced, enhanced, and normalized across all modalities. Adaptive fusion using salience-selection is performed in blocks 42 based on a local salience measure and a hysteresis factor that reduces aliasing and flickering artifacts. The ROI mask pyramids are employed during selection to enhance blending during the fusion step by dynamically excluding non-overlapping input regions of input images. The resulting fused image is reconstructed in block 44 from the fused pyramid. After reconstruction from the fused pyramid, the fused monochromatic image is combined with color in block 46 from the visible input image 16.
Color enhancement and correction is also performed in block 46 to scale and boost the color component from the visible camera by matching the discrepancy between the luminance image before and after image fusion.
Referring now to
In other embodiments, the computing platform 50 may be embodied in a single integrated circuit, which may be a digital signal processor, a field-programmable gate array (FPGA), or an application-specific integrated circuit (ASIC), either standing alone or placed under the control of an external processor.
In a typical hardware implementation, the cameras used for generating the images 16, 18, 20 have different formats and operate with different numbers of data bits. For example, images from TV 16 may have a Bayer format with 10-bit data, while images from SWIR 18 and LWIR 20 may have a 14-bit black-and-white format. Although the present invention is not limited to any particular format or number of data bits, the present invention will be described as having a 16-bit data format except for the color component.
Focal-plane array (FPA) sensors used in visible-light and infrared imaging systems are known to have fixed (or static) pattern noise superimposed on a true (i.e., noise free) image. The fixed pattern noise is attributed to spatial non-uniformity in the photo-response (i.e., the conversion of photons to electrons) of individual detectors in an array of pixels which constitute the FPA. The response is generally characterized by a linear model:
S(x,y)=a(x,y)·I(x,y)+b(x,y), (1)
where S(x,y) is the true signal of a single detector (pixel) I at (x,y) in an array of pixels that are modeled as being arranged in a rectangular coordinate grid (x,y) at time t, a(x,y) is the gain, and b(x,y) is the offset. Generally speaking, gain and offset are both a function of time, as they drift slowly along with the temperature change. The present invention employs a two-point NUC to solve for the gain a(x,y) and the offset b(x,y) which are read from a lookup table.
A histogram and linear stretching pre-processing enhancement is performed on the input images in pre-processing blocks 24 of
where LB and HB are the lower bound and higher bound, respectively, and S(x,y) is the input signal to the histogram and stretch block. bound{ } is a cutoff function that ensures the output data is in the range [0, 2^k−1]. In an illustrative embodiment, the image has a default 12-bit representation after pre-processing.
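By way of a non-limiting illustration, the stretch-and-bound operation may be sketched in Python as follows. The stretch equation itself is not reproduced in this text, so the linear mapping of [LB, HB] onto the k-bit range is an assumption inferred from the definitions of LB, HB, and bound{ }; the function name `linear_stretch` and its parameters are hypothetical.

```python
import numpy as np

def linear_stretch(s, lb, hb, k=12):
    """Assumed form: map [lb, hb] linearly onto [0, 2**k - 1], then clip.

    The clip plays the role of the bound{ } cutoff function.
    """
    if hb <= lb:
        raise ValueError("hb must be greater than lb")
    top = 2**k - 1
    scaled = (np.asarray(s, dtype=np.float64) - lb) / (hb - lb) * top
    # Round to the nearest integer code and keep the result inside the range.
    return np.clip(np.rint(scaled), 0, top).astype(np.uint16)
```

In this sketch, LB and HB would typically be taken from the image histogram (for example, low and high percentiles), although the text does not state how they are chosen.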
For some camera sources, in some embodiments, other image enhancement methods may be considered, which may include uniform histogram equalization, damped histogram equalization, histogram specification, etc.
Referring now to
Damped histogram equalization may be used to apply a weight mapping between pure uniform histogram equalization and linear stretching. Damped histogram equalization may balance between over-saturation from the histogram equalization and sub-amplification from the linear stretching. The histograms may be weighted by a weight factor between the linear stretch and the histogram equalization.
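A minimal sketch of damped histogram equalization is given below, assuming that the weight factor blends the two output mappings linearly. The blend rule, the function name `damped_equalize`, and the use of the empirical cumulative distribution for the equalization step are assumptions made for illustration and are not taken from the text.

```python
import numpy as np

def damped_equalize(img, weight, k=8):
    """Blend linear stretch (weight=0) with uniform equalization (weight=1)."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    img = np.asarray(img, dtype=np.float64)
    top = 2**k - 1
    flat = img.ravel()
    lo, hi = flat.min(), flat.max()
    span = hi - lo if hi > lo else 1.0
    # Linear stretch of the full input range onto [0, top].
    linear = (flat - lo) / span * top
    # Uniform histogram equalization through the empirical CDF.
    _, inverse, counts = np.unique(flat, return_inverse=True, return_counts=True)
    cdf = np.cumsum(counts) / flat.size
    equalized = cdf[inverse] * top
    out = weight * equalized + (1.0 - weight) * linear
    return out.reshape(img.shape)
```

A weight near 1 favors equalization and may over-saturate, whereas a weight near 0 favors the linear stretch and may under-amplify dark detail.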
Other pre-processing methods may be used as alternatives to histogram clipping, with weighting then applied between the linear stretch and the histogram-specific functions. These pre-processing methods include Rayleigh and Gaussian shape histogram equalization. All of these histogram-specific functions are designed for comparison with the contrast normalization approach, which is described herein below. The pdf function P(x) and the cumulative function D(x) for the Rayleigh transform are plotted in
where σ^2 is the variance, and
The pdf function P(x) and the cumulative function D(x) for the Gaussian transform are plotted in
where σ^2 is the variance, and μ is the mean value,
where erf( ) is the error function.
The inverse cumulative distribution function for the Gaussian transform, or quantile function, may be expressed in terms of the inverse error function:
D^(−1)(f)=√2·erf^(−1)(2f−1) (7)
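For illustration only, Equation 7 may be evaluated numerically as sketched below. The Python standard library provides `math.erf` but no inverse error function, so the sketch inverts `math.erf` by bisection; the helper names `erf_inv` and `gaussian_quantile` are hypothetical. As written, Equation 7 corresponds to zero mean and unit variance.

```python
import math

def erf_inv(y, tol=1e-12):
    """Invert math.erf by bisection on [-6, 6]; y must lie in (-1, 1)."""
    if not -1.0 < y < 1.0:
        raise ValueError("y must be in (-1, 1)")
    lo, hi = -6.0, 6.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if math.erf(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def gaussian_quantile(f):
    """Equation 7: D^(-1)(f) = sqrt(2) * erf^(-1)(2f - 1)."""
    return math.sqrt(2.0) * erf_inv(2.0 * f - 1.0)
```

The result may be cross-checked against `statistics.NormalDist().inv_cdf(f)` from the standard library.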
Gamma correction may be used for correcting the nonlinear relationship between pixel value and displayed intensity. Gamma correction I′(x,y) corrects for the nonlinear relationship between the true voltage signals and the display responses. Gamma correction is especially useful when applied to dark scenes.
where γ is the correction index, and p is the number of bits of the data.
A modified gamma correction formula may be used, known as the ITU-709 HDTV Production standard, to correct an RGB color image. The gamma index is a fixed number γ=2.22. The formula is based on the IPP manual:
Assume R,G,B are normalized in the range of [0,1],
For R,G,B<0.018
R′=4.5 R
G′=4.5 G
B′=4.5 B
For R,G,B>=0.018
R′=1.099 R^(1/2.22)−0.099
G′=1.099 G^(1/2.22)−0.099
B′=1.099 B^(1/2.22)−0.099
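The piecewise rule above may be sketched as follows, applied independently to each normalized channel value. The function name `gamma_itu709` is hypothetical, and clipping the input to [0, 1] is an added assumption that enforces the stated normalization.

```python
import numpy as np

def gamma_itu709(rgb):
    """Piecewise ITU-709 style correction for values normalized to [0, 1]."""
    rgb = np.clip(np.asarray(rgb, dtype=np.float64), 0.0, 1.0)
    linear = 4.5 * rgb                                   # branch for values < 0.018
    power = 1.099 * np.power(rgb, 1.0 / 2.22) - 0.099    # branch for values >= 0.018
    return np.where(rgb < 0.018, linear, power)
```

Because the rule is element-wise, the same call accepts a single value, one channel, or a whole H×W×3 image array.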
The bi-linear interpolation algorithm uses image values at four pixels (x0, y0), (x0, y1), (x1, y0), (x1, y1), which are closest to (xs, ys) in the source image:
x1=x0+1,y1=y0+1,
x0≦xs≦x1,y0≦ys≦y1
The pixel value at (xs,ys) after bi-linear interpolation may be expressed as follows:
I0=I(x0,y0)·(x1−xs)+I(x1,y0)·(xs−x0),
I1=I(x0,y1)·(x1−xs)+I(x1,y1)·(xs−x0),
I(xs,ys)=I0·(y1−ys)+I1·(ys−y0) (9)
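Equation 9 may be sketched in Python as shown below. The sketch assumes a row-major array in which I(x,y) is stored at `img[y, x]`, an image of at least 2×2 pixels, and a source point inside the image; the function name `bilinear_sample` and the clamping of (x0, y0) at the image border are illustrative choices.

```python
import math
import numpy as np

def bilinear_sample(img, xs, ys):
    """Interpolate img at the real-valued source point (xs, ys) per Eq. 9."""
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    # Clamp so that x1 = x0 + 1 and y1 = y0 + 1 remain inside the image.
    x0 = min(max(int(math.floor(xs)), 0), w - 2)
    y0 = min(max(int(math.floor(ys)), 0), h - 2)
    x1, y1 = x0 + 1, y0 + 1
    i0 = img[y0, x0] * (x1 - xs) + img[y0, x1] * (xs - x0)
    i1 = img[y1, x0] * (x1 - xs) + img[y1, x1] * (xs - x0)
    return i0 * (y1 - ys) + i1 * (ys - y0)
```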
The bi-cubic interpolation algorithm uses image values at sixteen pixels in the neighborhood of the source point (xs,ys) in the source image:
xs1=xs0+1,xs2=xs0+2,xs3=xs0+3
ys1=ys0+1,ys2=ys0+2,ys3=ys0+3
xs1≦xs≦xs2,ys1≦ys≦ys2
For each ys1, the algorithm determines four cubic polynomials, F0(x), F1(x), F2(x), and F3(x), where each Fi(x) has the following expression:
Fi(x)=ai·x^3+bi·x^2+ci·x+di, 0≦i≦3, (10)
and I(xs,ys) is determined by Fi(x). The bi-cubic interpolation function is depicted in
The saturation mask may be used to exclude a saturated portion of the image. From experimental data tested on 16-bit pixel values, saturation values may be highly concentrated at the higher bound. After linear stretch, these values may become 2^k−1. Therefore, any pixels that are within sat_thr percent of the k-bit range may be masked out using a generated saturation mask as follows:
Noise coring at blocks 32 of
The data flow for coring on the gradient images for level 0 is illustrated in
A Gaussian image based on an input image from one of the modalities, G, is filtered by a preprocessing filter 80 and then further processed simultaneously through a plurality of derivative (gradient) filters 82. Then, a noise coring function 84 (to be discussed hereinbelow) is applied to the gradient images. A second set of derivative (gradient) filters 86, which are related to the gradient filters 82, is applied to the cored images. The filtered/cored images are then negatively summed in block 88 to produce an output cored Laplacian image Lc.
More specifically, at level 0 and/or 1, when the 5-tap filter w is used, the relation between the Laplacian and Gaussian image is expressed in the following equation:
L0=(1−w)G0, (12)
where w is a 5×5-tap filter:
(1−w) may be further decomposed into a summation of a convolution of two derivative filters and one constant pre-filter:
where p, w′, Dk and −Dk=Dk′, (k=x, x+y, y, x−y) are defined as follows:
At higher levels, a larger filter for double density pyramid is applied. This filter may be decomposed into a 9-spread-tap filter w1, and a 3-tap binomial filter w2:
w′ may be obtained from the root of w1 and w2:
Unfortunately, 1−w′ may not be decomposed into
Therefore, at a pyramid level at which a double density filter is applied, coring may be applied to the Laplacian image at that level instead of the Gaussian image. More specifically, at level 0, coring is applied to the gradient images. At the higher level 1, coring is applied to the Laplacian images only. If level 1 is sub-sampled and uses the [1 4 6 4 1] filter as in level 0, then level 1 may apply the same coring on the gradient images. Filter [1 3 4 5 6 5 4 3 1] is approximated to [1 0 4 0 6 0 4 0 1][1 2 1].
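The relationship between these filters may be checked numerically as sketched below. The sketch assumes that the digit strings in the text denote single-digit tap weights (for example, a 9-tap filter with weights 1, 3, 4, 5, 6, 5, 4, 3, 1); the function name `cascade_taps` is hypothetical.

```python
import numpy as np

def cascade_taps(first, second):
    """Return the single filter equivalent to applying two filters in sequence."""
    return np.convolve(np.asarray(first), np.asarray(second))

spread_taps = [1, 0, 4, 0, 6, 0, 4, 0, 1]    # spread filter with zeros between taps
binomial_taps = [1, 2, 1]                     # 3-tap binomial filter
combined = cascade_taps(spread_taps, binomial_taps)
simplified = np.array([1, 3, 4, 5, 6, 5, 4, 3, 1])

# Compare the shapes after normalizing each filter to unit sum.
combined_unit = combined / combined.sum()
simplified_unit = simplified / simplified.sum()
```

Under this reading, the cascade yields an 11-tap filter whose normalized profile is close to, but not identical with, the 9-tap filter, which is consistent with describing the relationship as an approximation.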
Two coring functions are considered in the present invention: a hardcore function and a softcore function. The hardcore function is a step function with threshold T,
The softcore is a function of the noise variance σ2 and peaking index p:
Iout(i,j)=(1−e−(1
By default, p=2. Iin(i,j) may be a Laplacian image, or a gradient image. Representative soft coring functions are plotted in
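The two coring functions may be sketched as follows. Neither equation is fully reproduced in this text, so both forms below are assumptions: the hard core is taken to zero every sample whose magnitude does not exceed T, and the soft core is taken to scale each sample by 1−exp(−(|Iin|/σ)^p). The function names are hypothetical.

```python
import numpy as np

def hard_core(x, threshold):
    """Assumed hard core: keep samples with |x| > threshold, zero the rest."""
    x = np.asarray(x, dtype=np.float64)
    return np.where(np.abs(x) > threshold, x, 0.0)

def soft_core(x, sigma, p=2.0):
    """Assumed soft core: attenuate samples that are small relative to sigma."""
    x = np.asarray(x, dtype=np.float64)
    weight = 1.0 - np.exp(-np.power(np.abs(x) / sigma, p))
    return weight * x
```

In both sketches, samples well above the noise level pass nearly unchanged, while samples near zero are suppressed; the soft core does so without the discontinuity at the threshold.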
The present invention takes the Laplacian images at the various levels generated after noise coring to create a modified double-density Laplacian pyramid for each source image before fusion. Sub-sampling is not performed at the highest pyramid level. The image at this level is a quadruple density Gaussian image. Instead of double sampling at level 1, double sampling at level 2 may be considered in order to reduce the latency. Further, in the preferred embodiment, a simplified 9-tap filter is used to replace a double-step convolution from a combination of a spread-9-tap and a 3-tap filter (which becomes an 11-tap filter).
In order to handle saturation and blending, a saturation mask pyramid and a ROI mask pyramid are generated for each of the modalities. The decimation of these masks at each level is the same as those from the image pyramid at that level, and so is the filter size. The mask pyramids are 8-bit, and do not require generating Laplacian images. The saturation mask pyramid will be used in the contrast normalization block to blend the enhanced images with the original images in the unmasked region while the ROI mask pyramid is used in the fusion selection process to guarantee proper stitching.
In some embodiments, the cameras have a fixed pose. Once the cameras are calibrated, the overlap among all the images after alignment is a fixed region. This facilitates obtaining the region of interest for each warped image. This region of interest, or ROI, is a function of the alignment parameters, the NUC coefficients and the image properties (e.g., black borders, static overlay, etc). Since camera information is lacking, the ROI may be generated from the alignment parameters.
Referring now to
In order to improve fusion, source images of each of the modalities need to be normalized before a selection step is performed at each level. Generally speaking, it is desirable to selectively diminish large, high frequency steps while greatly augmenting small, high frequency details. In a preferred embodiment, the present invention employs a pyramid technique called contrast normalization, which is fit into a generic pyramid fusion structure, and which uses different transfer functions to re-map values for each level of the pyramid. The concept of contrast normalization will be discussed hereinbelow. Afterward, the use of contrast normalization in the context of the present invention is discussed.
Salient features appear very different when present in brightly lit regions of a scene than when in heavily shadowed regions. Relatively lower contrast features may be masked by nearby high contrast features. Contrast normalization seeks to reduce the overall dynamic range of an image while adjusting the contrast of local features towards an optimal level that is sufficient for detection yet not so high as to mask other nearby features. The effects of contrast normalization on images are depicted in the images of
Contrast may be defined roughly as the difference between the maximum and minimum image intensity values within a local neighborhood, divided by the dynamic range of the image itself. Relevant pattern structure occurs at many scales within an image, with small-scale detail features, such as edges and texture, superimposed on larger scale features, such as gradations in illumination and shadow. For these reasons, the normalization process needs to be implemented within a multi-resolution transform domain, such as by employing Laplacian pyramids. In some embodiments, other multi-resolution transforms could be used.
In the present invention, the filter-subtract-decimate (FSD) Laplacian pyramid transform is used and may be defined briefly as follows. Let I(ij) be the source image. Let Gk(ij) be a K+1 level Gaussian pyramid based on I(ij), with k=0 to K. Let Lk(ij) be the corresponding Laplacian pyramid, for k=0 to K−1. The Gaussian is generated through a recursive filter and subsample process: G0=I, and for k=1 to K
Gk=[w*G_{k−1}]↓2. (26)
Here the symbol ↓2 means the image in brackets is sub-sampled by 2 in each image dimension. The generating kernel, w, is a small low-pass filter. In the present invention a 5 by 5 separable filter with binomial weights is used:
Each level of the Laplacian pyramid is obtained by applying a band pass filter to the corresponding Gaussian:
Lk=Gk−w*Gk. (28)
The original image may be recovered from its Laplacian transform by reversing these steps. This begins with the lowest resolution level of the Gaussian, GK, then uses the Laplacian pyramid levels to recursively recover the higher resolution Gaussian levels and the original image. For k=K−1 to 0,
Gk≈(1+w)*Lk+4w*[G_{k+1}]↑2. (29)
The symbol ↑2 means upsample by 2. The expansion filter 1+w is an approximation used here as suitable for contrast normalization. The recovered image is then just I=G0.
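Equations 26, 28, and 29 may be sketched in Python as shown below. The sketch assumes that the generating kernel is the separable binomial filter [1 4 6 4 1]/16 in each dimension, that image borders are handled by reflect padding, and that ↑2 is implemented by zero insertion; none of these details is stated in the text, and the function names are hypothetical.

```python
import numpy as np

W1D = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0  # assumed 5-tap binomial weights

def blur(img, kernel=W1D):
    """Apply the separable filter w with reflect padding (same output size)."""
    pad = len(kernel) // 2
    out = np.pad(img, pad, mode="reflect")
    out = np.apply_along_axis(np.convolve, 0, out, kernel, mode="valid")
    out = np.apply_along_axis(np.convolve, 1, out, kernel, mode="valid")
    return out

def fsd_pyramid(image, levels):
    """Return ([L_0, ..., L_{K-1}], G_K) for K = levels."""
    g = np.asarray(image, dtype=np.float64)
    laplacians = []
    for _ in range(levels):
        low = blur(g)
        laplacians.append(g - low)   # Eq. 28: L_k = G_k - w*G_k
        g = low[::2, ::2]            # Eq. 26: filter, then subsample by 2
    return laplacians, g

def reconstruct(laplacians, top):
    """Approximate recovery of G_0 from the pyramid per Eq. 29."""
    g = top
    for lap in reversed(laplacians):
        up = np.zeros(lap.shape)
        up[::2, ::2] = g             # zero-insertion upsample by 2
        g = lap + blur(lap) + 4.0 * blur(up)
    return g
```

The reconstruction is approximate in general, as noted for the expansion filter 1+w; it is exact for a constant image, which makes that case a convenient sanity check.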
The Laplacian pyramid effectively decomposes the original image into a set of octave wide bandpass components that differ in center frequency, or scale, by factors of 2. In contrast normalization, the values that occur within each band are adjusted so that they approach the specified target contrast value. A local contrast for a band pass signal is defined as the local RMS (root mean square) value within a window W divided by the image dynamic range. A contrast map is defined in this way for each Laplacian pyramid level:
Here R=2^N is the dynamic range of the original N-bit image. In “Compressing and Companding High Dynamic Range Images with Subband Architectures,” Massachusetts Institute of Technology, Cambridge, Mass., Li and Adelson et al. found that the window should be at least as large as the filter used for pyramid generation. This greatly reduces signal distortion, since the local gain factor is based on the energy value from the local contrast map (Ck). The window is twice the filter size in a software simulation (W=2w), but equal to the filter size in a hardware implementation (W=w) in order to limit latency in the fusion system. Sub-sampling within the pyramid means that the effective size of the window doubles from level to level.
In contrast normalization, the sample values of the source pyramid L are adjusted to form a normalized pyramid L̂ in which the local contrast is shifted towards the specified target level T. In general, T differs from level to level. Let Tk be the target contrast at pyramid level Lk. Normalization is achieved through multiplication by a gain factor, g, which is a function both of the local contrast and the target contrast:
L̂k(ij)=g(Ck(ij),Tk)Lk(ij). (31)
The gain function is compressive toward T, so that Laplacian sample values will be increased where Ck(ij)<Tk and decreased where Ck(ij)>Tk. More specifically, a gamma exponential function is adopted to achieve this result:
Here ε serves to limit the gain when the contrast is very small; εTk is comparable to the amplitude of image noise. The exponent, γ, controls the degree of normalization. γ varies between 0 and 1, with values near 0 corresponding to a high degree of normalization and values near 1 corresponding to little normalization. These relationships are shown in
In general it is expedient to reduce the contrast of lower frequency components of an image relative to higher frequency components. High frequencies often represent the pattern detail important to object recognition, while low frequencies often represent gradations in illumination or larger scale objects in the scene. In a preferred embodiment, a simple scaling rule is adopted in which the target contrast is reduced by a factor β from level to level:
Tk=β^k·T0. (33)
β is a constant, 0<β≦1. Larger values of β within this range result in uniform contrast across levels, while smaller values result in reduced contrast of low frequency components relative to higher frequencies. Gain functions for successive pyramid levels and β=1/2 are shown in
The target contrast parameter, T0, determines how strong feature contrast is in each level of the normalized pyramid, and in the final normalized image. The role of T0 is illustrated in
The exponent, γ, determines the degree of normalization. When γ is near 1 there is little change in Laplacian values. When γ is near 0 local feature contrast is forced to closely approach the target across the image. These effects are shown in
In the context of the present invention, the gain function, g(ij), is set according to specific requirements to be outlined hereinbelow. Further, the normalized images L̂k(ij) of Equation 31 are not recombined immediately into one Laplacian image after applying gain, but instead undergo image fusion steps for each of the modalities to be discussed hereinafter.
Given ε0, γ0, and T0 at level 0,
For 0<n<max_lev,
εn=ε_{n−1}, γn=γ_{n−1}, and Tn=T_{n−1}/2.
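A minimal sketch of the gain function and the per-level target schedule follows. Equation 32 is not reproduced in this text, so the form ((C+εT)/(T+εT))^(γ−1) used below is an assumption, chosen because it exhibits the stated behavior: unit gain at C=T, gain above 1 for C<T, gain below 1 for C>T, bounded gain as C approaches 0, and no change when γ=1. The function names and default parameter values are hypothetical.

```python
import numpy as np

def contrast_gain(c, t, eps=0.1, gamma=0.5):
    """Assumed gain: ((C + eps*T) / (T + eps*T)) ** (gamma - 1)."""
    c = np.asarray(c, dtype=np.float64)
    return np.power((c + eps * t) / (t + eps * t), gamma - 1.0)

def level_targets(t0, max_lev, beta=0.5):
    """Target contrast per level, T_k = beta**k * T_0 (Eq. 33)."""
    return [t0 * beta**k for k in range(max_lev)]

def normalize_level(lap, contrast, target, eps=0.1, gamma=0.5):
    """Eq. 31: multiply the Laplacian by the gain at every pixel."""
    return contrast_gain(contrast, target, eps, gamma) * np.asarray(lap, dtype=np.float64)
```

With β=1/2 the schedule reproduces the halving rule Tn=T_{n−1}/2 stated above, while ε and γ are carried unchanged from level to level.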
Since small energy values are boosted in the gain function, noise at each level, which is usually much smaller, is boosted as well with high gain. In order to reduce noise while keeping the signals boosted, Eq. 25 may be combined with the gain function of Eq. 32. The expression of the gain function is thus written as:
where E(x,y) are the local energy map values of an image after block 110, which correspond to the local contrast map values Ck(ij) in Eq. 32. The image may be enhanced further by means of noise coring. Noise may be reduced in higher level images. The parameter σ representing noise used in the coring formula is related to the pyramid level as follows:
Given σ0, at level 0,
For 0<n<max_lev,
σn=σ_{n−1}/2.
The gain function with coring is plotted in
In the present invention, the process of constructing a pyramid of images from a source image is to employ the noise cored images at all scales except for the smallest scale. For all scales except the lowest, the contrast normalization algorithm depicted in
G′n(x,y)=CN(Gn(x,y)), (35)
G″n(x,y)=α(G′n(x,y)−
The contrast normalization setting of this image uses the same rules as the Laplacian images, except that the contrast target is set to be the maximum value of the Gaussian image. The offset is set to the middle of the data range (2^(n−1)) by default. If the offset is set to be the average of the image, then there is no change in the brightness.
When a saturation mask is considered, it is assumed that the masked region should be untouched; thus the gain there is 1.0. To avoid a sharp change at the mask borders, a blending mechanism is used to transition from the masked region to the unmasked region, as expressed in Equation 36:
where g(x,y) is the contrast normalization gain function. The level 0 mask has the binary values 0 and 2^p−1, with p=8. Masks at higher levels will have a blurred border because of the filters that are applied to them. In embodiments in which the saturation mask is used, the contrast normalization block in
The present invention fuses the Laplacian images and the top level Gaussian image from each of the modalities using different methods. For the Laplacian images, salience selection fusion, to be described hereinafter, is employed. The salience selection fusion technique of the present invention, which is based on a localized correlation measure, effectively reduces aliasing in the fused image. In order to reduce the flicker problem when values from different source Laplacians at the same location have equal amplitudes but opposite signs, a hysteresis function is applied during the application of a selection mask to be described hereinbelow. Salience selection fusion reduces aliasing in the spatial domain, and hysteresis reduces flickers in the temporal domain. To reduce border effects from several source images in fusion, the ROI mask described above is introduced in the selection process.
In a preferred embodiment of the present invention, the winner among a plurality of images to be fused is not solely based on the amplitude or energy of a Laplacian pixel at (x,y) (better known as feature-selective fusion), but on a localized correlation measure (or salience measure) around (x,y). The energy of each pixel in a given Laplacian pyramid image is averaged with its surrounding neighbors. Then each source Laplacian is associated with a selection mask. This binary mask sets its value at (x,y) to 2^p−1 (p=8) if its salience measure is the maximum in amplitude compared to the other source images at the same location; otherwise, it is set to 0. The selection mask may be set to other values if and only if (1) an ROI mask is used, or (2) more than one source shares the same maximum amplitude at (x,y). To further reduce aliasing, the selection masks are filtered again and used as weight factors for the fused Laplacians.
Referring now to
As discussed above, the source image ROI is generated after warping. The ROI information may be either calculated from the warping parameters or input externally. The pixel value of image A at point B in
i is the index of the source images, 0≦i<s, and p is the number of bits in ROI mask data representation. |Li(x,y)| is the absolute value image of the Laplacian image.
In block 122, a selection mask is generated based on the energy of the source Laplacian image at each pixel and at the corresponding pixels from each of the other source images 124. Based on energy/amplitude, the pixel with the highest amplitude is automatically chosen for the fusion result. In the present invention, the generated mask values for each pixel vary with the number of sources with similar energy. Each source image has a selection mask pyramid, which may be regarded as a mask with binary values. If one source at B has the maximum value compared to the other sources at (x,y), then the selection mask for that source at (x,y) is marked as 1, and the selection masks for the other sources are marked as 0. This selection is performed for the whole pyramid. An 8-bit representation is used for the mask, such that 2^p−1 represents 1. If the sources have identical values at point B, the mask values may be set to ½, ⅓, etc., of the range 2^p−1, depending on the total number of sources 124. In other embodiments, only one source gets the full value of 2^p−1 while the other sources are set to 0. Table 1 lists the selection rule when there is no ROI mask, giving the conditions of three input sources and their output after selection. Bi(x,y) represents the value at position B in
The resulting image after applying the selection mask is filtered by Filtb in block 126 to reduce aliasing. The filter size used in a preferred embodiment is 5. The effect of block 126 is to blend the contributions from each of the images more smoothly; e.g., if image A1 was originally selected with a value of 1, its value may now be 0.9, while A2 and A3 may each become 0.05. To obtain the contribution from a given source, e.g., Source A, at step 128, the filtered, selection-masked image is multiplied with the Laplacian image (see Eq. 38).
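As a rough illustration of blocks 122 through 128, the following Python sketch operates on one-dimensional rows of Laplacian samples. It is a sketch under stated assumptions, not the implementation of the invention: the 3-sample averaging window, the binomial 5-tap kernel standing in for Filtb, the clamped borders, the omission of the ROI mask, and all function names are choices made here for brevity.

```python
P = 8
FULL = 2 ** P - 1  # mask value that represents "1" in the p-bit representation


def box_average(row, radius=1):
    """Average each sample with its neighbors (window size is an assumption)."""
    n = len(row)
    out = []
    for x in range(n):
        window = [row[min(max(x + d, 0), n - 1)] for d in range(-radius, radius + 1)]
        out.append(sum(window) / len(window))
    return out


def selection_masks(laplacians):
    """One mask per source: FULL for a sole winner, FULL/k when k sources tie."""
    salience = [box_average([abs(v) for v in lap]) for lap in laplacians]
    width = len(laplacians[0])
    masks = [[0.0] * width for _ in laplacians]
    for x in range(width):
        best = max(s[x] for s in salience)
        winners = [i for i, s in enumerate(salience) if s[x] == best]
        for i in winners:
            masks[i][x] = FULL / len(winners)
    return masks


def smooth(row, kernel=(1, 4, 6, 4, 1)):
    """5-tap low-pass filter; the binomial kernel is an assumed stand-in for Filtb."""
    n, r, total = len(row), len(kernel) // 2, sum(kernel)
    return [
        sum(k * row[min(max(x + j - r, 0), n - 1)] for j, k in enumerate(kernel)) / total
        for x in range(n)
    ]


def fuse_laplacians(laplacians, gain=1.0):
    """Weight each source Laplacian by its filtered mask and sum the contributions."""
    masks = [smooth(m) for m in selection_masks(laplacians)]
    width = len(laplacians[0])
    return [
        gain * sum(m[x] / FULL * lap[x] for m, lap in zip(masks, laplacians))
        for x in range(width)
    ]
```

Because the kernel is normalized, the filtered masks still sum to 2^p−1 at every sample, so the fused value remains a weighted average of the source Laplacians scaled by the output gain.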
Referring now to
where SM is the selection mask of the source images, and gainL is the output gain for the fused Laplacian.
Top-level Gaussian images contain the lowest frequency signals.
G′n(x,y)=CN(Gn(x,y)), (39)
G″n(x,y)=αn(G′n(x,y)−
where
At the top level, the fused image, GF, is the weighted average of the source Gaussian images G″0 to G″n.
Flickering is often seen in the fusion of LWIR and SWIR/TV modalities when both sources contain reverse intensities at the same locations. For example, at (x,y) in the Laplacian images at level n, the selection rule may pick LWIR over SWIR at time t, and then pick SWIR over LWIR at time t+1. Which source wins is determined mostly by noise. To reduce the flicker, hysteresis may be employed. A hysteresis function is defined that remembers the selection from the last frame. If the source at a sample position (x,y) was selected in the last frame, then the salience energy for this source image is incremented on account of that selection. The boosted energy is then used in the selection to determine the new selection mask. The higher the hysteresis factor, the better the chances that the last selected source wins a subsequent selection.
In the present invention, each source modality pyramid has a hysteresis mask pyramid. The mask value h_mask(x,y) at a pyramid level n is determined by the following rule:
Eq. 37 may then be modified after using the hysteresis:
where h_weight is the hysteresis factor, and h_weight ∈ [0,1].
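The effect of hysteresis on selection can be illustrated with a small Python sketch. The exact mask rule and the modified selection equation are not reproduced in this text, so the multiplicative boost `salience * (1 + h_weight)` used below is an assumption made only for illustration, as are the function and parameter names.

```python
def select_with_hysteresis(salience, prev_winner, h_weight=0.5):
    """Pick a winning source per sample, favoring the previous frame's winner.

    salience    : list of per-source rows of salience values
    prev_winner : per-sample index of the source selected in the last frame
                  (None where there is no history)
    h_weight    : hysteresis factor in [0, 1]; 0 disables hysteresis
    """
    width = len(salience[0])
    winners = []
    for x in range(width):
        # Assumed boost: the previously selected source has its salience raised.
        boosted = [
            s[x] * (1.0 + h_weight) if prev_winner[x] == i else s[x]
            for i, s in enumerate(salience)
        ]
        winners.append(max(range(len(boosted)), key=boosted.__getitem__))
    return winners
```

With two sources whose salience values differ only by noise, for example 0.9 against 1.0, the previously selected source keeps winning when h_weight is 0.5, whereas with h_weight set to 0 the selection follows the noise from frame to frame.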
Pyramid reconstruction starts from the adapted Gaussian image. Each Gaussian image is up-sampled to the next lower level if there exists a non-trivial scaling factor between the two levels of images. The expanded Gaussian image is then added to the filter-boosted (convolved with 1+w) Laplacian to form the Gaussian at that level. This process is repeated until level 0 is reached.
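The reconstruction loop can be sketched in Python on a one-dimensional signal. This is a minimal sketch under simplifying assumptions: up-sampling is done by sample repetition and reduction by pair averaging rather than by the actual pyramid filters, the row length is assumed divisible by 2 at every level, and the filter boost is left as an optional hook (`boost`, identity by default) because the kernel w is not specified here.

```python
def expand(row):
    """Up-sample by 2 using sample repetition (stand-in for the real expand filter)."""
    return [v for v in row for _ in range(2)]


def reduce_(row):
    """Down-sample by 2 using pair averaging (stand-in for the real reduce filter)."""
    return [(row[i] + row[i + 1]) / 2 for i in range(0, len(row), 2)]


def build_pyramid(row, levels):
    """Return (list of Laplacian rows from level 0 upward, top-level Gaussian row)."""
    laplacians, gaussian = [], list(row)
    for _ in range(levels):
        smaller = reduce_(gaussian)
        laplacians.append([a - b for a, b in zip(gaussian, expand(smaller))])
        gaussian = smaller
    return laplacians, gaussian


def reconstruct(laplacians, top, boost=lambda lap: lap):
    """Start from the top Gaussian; expand and add each Laplacian down to level 0."""
    gaussian = top
    for lap in reversed(laplacians):
        gaussian = [a + b for a, b in zip(expand(gaussian), boost(lap))]
    return gaussian
```

With the identity boost, reconstruction returns the original signal, which is a convenient sanity check before fused Laplacians and an adapted top-level Gaussian are substituted.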
When fusing a color image with other images of different spectra (e.g., LWIR, SWIR, TV) it is customary to first decompose the color image into luminance and chrominance components. Then, the luminance component from the color image fuses with images of other modalities. The fused result then combines with the chrominance component to form the final output image. The color space is chosen such that the luminance part of the image may be processed without affecting its color component. YUV and HSV color spaces have been used in pattern-selective fusion. YUV is the basic color space used in analogue color TV broadcasting. The color difference, or chrominance components (UV), are formed by subtracting luminance from blue and from red.
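The luminance and chrominance split described above can be sketched in Python for a single pixel. The coefficients below are the standard analog YUV (BT.601) values; the function names and the `recolor` helper are illustrative assumptions, and the sketch ignores clipping as well as the damping and chrominance corrections discussed later.

```python
def rgb_to_yuv(r, g, b):
    """Luminance plus scaled color differences (blue minus Y, red minus Y)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    return y, 0.492 * (b - y), 0.877 * (r - y)


def yuv_to_rgb(y, u, v):
    """Invert rgb_to_yuv; green is recovered from the luminance equation."""
    b = y + u / 0.492
    r = y + v / 0.877
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    return r, g, b


def recolor(rgb, fused_y):
    """Keep the pixel's chrominance but replace its luminance with the fused value."""
    _, u, v = rgb_to_yuv(*rgb)
    return yuv_to_rgb(fused_y, u, v)
```

For example, `recolor((0.5, 0.5, 0.5), 0.8)` leaves a neutral gray pixel neutral while raising its brightness, because its chrominance components are zero.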
The HSV (hue, saturation, value) color space was developed to be more “intuitive” in decomposing color and was designed to approximate the way humans perceive and interpret color. Hue defines the color itself, saturation indicates the degree to which the hue differs from a neutral gray, and value indicates the illumination level. In Ruderman, D. L., Cronin, T. W., and Chiao, C., “Statistics of Cone Responses to Natural Images: Implications for Visual Coding,” JOSA A, 15(8), 2036-2045, 1998, an lαβ color space was developed in the context of understanding the perception of the human visual system. It was believed that this space would provide the least correlation between axes, so that an operation in one color channel would not create distortion or artifacts in the other channels. A feature of this space is its non-linearity. The l axis is the luminance channel, and the α and β channels are the chromatic yellow-blue and red-green channels. Generally speaking, the orthogonalization of a color image is a complicated problem that depends on the capturing device, the display device, and human perception. In the present invention, instead of concentrating on the selection of a color space, what is sought are orthogonal bases in a color space based on the imaging system.
Referring now to
The problem to be solved may be stated as follows: given a color image YUV (or any orthogonal color space), where Y is the luminance and UV is the chrominance, and assuming Y changes, what changes to UV are needed so that the color is preserved and the color image is optimal based on some criteria? Theoretically, if YUV are truly decoupled (orthogonal) bands, changes in Y do not require any change to the chrominance (UV). In real applications, a few exceptions require changes in chrominance. First, when UV is small and the change in Y is large, color is not fully expressed in the resulting image. Second, when the fused Y is saturated, the color fades away. Last, the color bands in the captured images are often not orthogonal, which implies that directly converting RGB to other color spaces may not fully decouple the luminance from the chrominance; a change in luminance will then result in color skew in the fused image. The solutions to the first and second problems are discussed in the sections on Luminance Damping and Chrominance Gamma Correction. The orthogonalization of image bands is discussed in the section on Orthogonalization of Color Bands.
Luminance Damping
When fusion luminance YF is near saturation, the color in the nearly saturated region is washed away. To get the color back, a damping function is defined to “drag” YF toward Y when |YF−Y| is large:
where k is the damping factor that controls the damping rate. Since k is much larger than the dynamic range of the image, for hardware implementation, Eq. 41 may be Taylor-expanded to its first order components:
Chrominance Gamma Correction
In color enhancement, U and V are normalized to be within [−1,1]. If U and V are an unsigned byte image, then the normalized u and v are expressed as follows:
Ru and Rv are the ranges of the U and V bands, respectively. If U and V are not 8-bit, Ru and Rv may be replaced with a proper range. The color enhancement is similar to the gamma correction formula. The differences are that color enhancement is applied only to the chrominance channels, and the gamma index is a function of the luminance difference at each pixel, as shown in Eq. 44 as follows:
b lies in [0,1], and a is larger than the range of the image. For byte images in a preferred embodiment, b=0.7, a=2048, and k=2048 are chosen as default values. The default ranges for these parameters are: b ∈ [0.3,1.0], a ∈ [512,2048], and k ∈ [512,2048].
In order to reduce noise, a linear relationship is defined between u and u′, and v and v′ when they are smaller than a threshold value:
If |u| < thresh, u′ = g·u;
If |v| < thresh, v′ = g·v, (45)
where g is the gain, and thresh is a small value in [0,1]. If the absolute value is less than the threshold, Eq. 45 is used; otherwise, Eq. 44 is used. A more complicated relation may be defined so that Eqs. 44 and 45 may be connected with higher order continuity.
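The switch between the linear rule of Eq. 45 and the gamma-style enhancement can be sketched in Python as follows. Eq. 44 is not reproduced in this text, so `placeholder_gamma` below is only a hypothetical sign-preserving power law standing in for it; the default values of `g` and `thresh` and the function names are likewise assumptions.

```python
import math


def placeholder_gamma(c, gamma=0.7):
    """Hypothetical stand-in for Eq. 44: sign-preserving power law on [-1, 1]."""
    return math.copysign(abs(c) ** gamma, c)


def enhance_chroma(c, gamma_fn=placeholder_gamma, g=1.0, thresh=0.05):
    """Apply the linear rule of Eq. 45 to small chrominance values, else gamma_fn.

    c is a normalized chrominance value (u or v) in [-1, 1].
    """
    if abs(c) < thresh:
        return g * c  # linear region: avoids amplifying near-zero, noisy chroma
    return gamma_fn(c)
```

Note that with these placeholder settings the output jumps at |c| = thresh; a smoother join between the two branches is the kind of higher-order continuity mentioned above.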
In HSV color space, the change of value conforms to Eq. 42, and hue is preserved. Saturation is used to enhance the color. Assume S is in the range of [0,1], then
Orthogonalization of Color Bands
Color orthogonalization is used to preserve a color band when other bands are changed. Generally speaking, most of the defined color spaces have the intensity decoupled from the color information. The decomposition of a correlated space into orthogonal components may be used in the present invention for the basis of color enhancement. The method for orthogonalizing a correlated color space is described in
If the imaging system has non-uniformity noise, or the physical characteristics of the pixels on the sensor are irregular, then the above method does not apply. Since the calibration matrix relies entirely on the sample images, these images are collected such that they represent the full range of color and luminance. If the imaging system is not calibrated, the YUV, HSV, or lαβ space is used, as each is very close to an orthogonal space.
It is to be understood that the exemplary embodiments are merely illustrative of the invention and that many variations of the above-described embodiments may be devised by one skilled in the art without departing from the scope of the invention. It is therefore intended that all such variations be included within the scope of the following claims and their equivalents.
This application claims the benefit of U.S. provisional patent application No. 60/991,100 filed Nov. 29, 2007, the disclosure of which is incorporated herein by reference in its entirety.
This invention was made with U.S. government support under contract number NBCHC030074. The U.S. government has certain rights in this invention.
Number | Name | Date | Kind |
---|---|---|---|
6201899 | Bergen | Mar 2001 | B1 |
7492962 | Zhang et al. | Feb 2009 | B2 |
Entry |
---|
Ruderman, D. L., Cronin, T. W., and Chiao, C., “Statistics of Cone Responses to Natural Images: Implications for Visual Coding,” JOSA A, 15(8), 2036-2045, 1998. |
Burt, P. J. and Adelson, E. H., “The Laplacian pyramid as a compact image code,” IEEE Trans. on Communications, COM-31, No. 4, 532-540, 1983. |
Burt, P. J. and Kolczynski, R. J., “Enhanced image capture through fusion,” Proc. International Conference on Computer Vision, May 11-14, 1993, pp. 173-182. |
Li, Y., Sharan, L., and Adelson, E. H., “Compressing and companding high dynamic range images with subband architectures,” ACM Transactions on Graphics (TOG), 24(3), Proceedings of SIGGRAPH, 2005. |
Heeger, D. J., “Half-squaring in responses of cat striate cells,” Visual Neurosci. 9, 427-443, 1992. |
Number | Date | Country | |
---|---|---|---|
20090169102 A1 | Jul 2009 | US |
Number | Date | Country | |
---|---|---|---|
60991100 | Nov 2007 | US |