The present disclosure relates to image processing techniques and, in particular, to techniques to merge image content from related cameras into a single output image.
Image fusion techniques involve merger of image content from multiple source images into a common image. Typically, such techniques involve two stages of operation. In a first stage, called “registration,” a comparison is made between the source images to identify locations of common content. In a second stage, a “fusion” stage, the content of the images is merged into a final image. Typically, the final image is more informative than any of the source images.
Image fusion techniques can have undesirable consequences, however, particularly in the realm of consumer photography. Scenarios may arise where different regions of a final image draw content from different numbers of source images. For example, a first region of the final image may have content derived from the full number of available source images and, consequently, will have a first level of image quality associated with it. A second region of the final image may have content derived from a smaller number of source images, possibly a single source image, and it will have a different, lower level of image quality. These different regions may become apparent to viewers of the final image and may be perceived as annoying artifacts, which diminishes the subjective image quality of the final image taken as a whole.
The inventors perceive a need in the art for an image fusion technique that reduces perceptible artifacts in images that are developed from multiple source images.
Embodiments of the present disclosure provide image fusion techniques that hide artifacts that can arise at seams between regions of different image quality. According to these techniques, image registration may be performed on multiple images having at least a portion of image content in common. A first image may be warped to a spatial domain of a second image based on the image registration. A fused image may be generated from a blend of the warped first image and the second image, wherein relative contributions of the warped first image and the second image are weighted according to a distribution pattern based on a size of the smaller of the two images. In this manner, contributions of the different images vary gradually at seams that otherwise would appear.
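For illustration only, the following is a minimal sketch of this flow in Python, assuming hypothetical register() and warp() helpers for the registration and warping stages and a precomputed weight mask; it is not presented as the disclosure's implementation.

```python
import numpy as np

def fuse_pair(first, second, register, warp, weight_mask):
    """Minimal sketch of the flow described above (hypothetical helpers).

    weight_mask holds the per-pixel contribution of the warped first image,
    with values in [0, 1]; the second image receives the complement.
    Single-channel float images are assumed for simplicity.
    """
    shift_map = register(first, second)      # locate common content
    warped_first = warp(first, shift_map)    # move into the second image's domain
    fused = weight_mask * warped_first + (1.0 - weight_mask) * second
    return np.clip(fused, 0.0, 1.0)
```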
The image processor 120 may include a selector 122, a registration unit 124, a warping unit 126, a feather mask estimator 128, a frontal mask estimator 130, and an image fusion unit 132, all operating under control of a controller 134. The selector 122 may select an image from one of the cameras 112, 114 to be a “reference image” and an image from another one of the cameras 112, 114 to be a “subordinate image.” The registration unit 124 may estimate skew between content of the subordinate image and content of the reference image. The registration unit 124 may output data representing, for each pixel of the subordinate image, a spatial shift that aligns the pixel with a counterpart pixel in the reference image. The registration unit 124 also may output confidence scores for the pixels, representing an estimated confidence that the registration unit 124 found a correct counterpart pixel in the reference image. The registration unit 124 also may search for image content from either the reference image or the subordinate image that represents a region of interest (“ROI”) and, if such ROIs are detected, it may output data identifying the location(s) in the image where such ROIs were identified.
The warping unit 126 may deform content of the subordinate image according to the pixel shifts identified by the registration unit 124. The warping unit 126 may output a warped version of the subordinate image that has been deformed to align pixels of the subordinate image with their detected counterparts in the reference image.
The feather mask estimator 128 and the frontal mask estimator 130 may develop filter masks for use in blending image content of the warped image and the reference image. The feather mask estimator 128 may generate a mask based on differences in the fields of view of images, with accommodations made for any ROIs that are detected in the image data. The frontal mask estimator 130 may generate a mask based on an estimate of foreground content present in the image data.
The image fusion unit 132 may merge content of the reference image and the subordinate image. Contributions of the images may vary according to weights that are derived from the masks generated by the feather mask estimator 128 and the frontal mask estimator 130. The image fusion unit 132 may operate according to transform-domain fusion techniques and/or spatial-domain fusion techniques. Exemplary transform-domain fusion techniques include Laplacian pyramid-based techniques, curvelet transform-based techniques, discrete wavelet transform-based techniques, and the like. Exemplary spatial-domain fusion techniques include weighted averaging, the Brovey method, and principal component analysis. The image fusion unit 132 may generate a final fused image from the reference image, the subordinate image and the masks.
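As one hedged illustration of a transform-domain approach, the sketch below blends Laplacian pyramid levels of a reference image and a warped subordinate image under a single weight mask; the pyramid depth, the OpenCV helpers, and the per-level blending rule are assumptions made for the example, not details taken from the disclosure.

```python
import cv2
import numpy as np

def laplacian_pyramid(img, levels):
    """Build a Laplacian pyramid (finest level first) for a float image."""
    pyr, current = [], img.astype(np.float32)
    for _ in range(levels):
        down = cv2.pyrDown(current)
        up = cv2.pyrUp(down, dstsize=(current.shape[1], current.shape[0]))
        pyr.append(current - up)
        current = down
    pyr.append(current)                      # coarsest residual
    return pyr

def pyramid_fusion(reference, warped_sub, weights, levels=4):
    """Blend each pyramid level, then collapse; single-channel images assumed."""
    ref_pyr = laplacian_pyramid(reference, levels)
    sub_pyr = laplacian_pyramid(warped_sub, levels)
    fused_pyr = []
    for ref_lvl, sub_lvl in zip(ref_pyr, sub_pyr):
        w_lvl = cv2.resize(weights.astype(np.float32),
                           (ref_lvl.shape[1], ref_lvl.shape[0]))
        fused_pyr.append(w_lvl * sub_lvl + (1.0 - w_lvl) * ref_lvl)
    fused = fused_pyr[-1]
    for lvl in reversed(fused_pyr[:-1]):     # collapse from coarsest to finest
        fused = cv2.pyrUp(fused, dstsize=(lvl.shape[1], lvl.shape[0])) + lvl
    return fused
```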
The image processor 120 may output the fused images to other image “sink” components 140 within the device 100. For example, fused images may be output to a display 142 or stored in memory 144 of the device 100. The fused images may be output to a coder 146 for compression and, ultimately, transmission to another device (not shown). The images also may be consumed by an application 148 that executes on the device 100, such as an image editor or a gaming application.
In an embodiment, the image processor 120 may be provided as a processing device that is separate from a central processing unit (colloquially, a “CPU”) (not shown) of the device 100. In this manner, the image processor 120 may offload from the CPU processing tasks associated with image processing, such as the image fusion tasks described herein. This architecture may free resources on the CPU for other processing tasks, such as application execution.
In an embodiment, the camera 110 and image processor 120 may be provided within a processing device 100, such as a smartphone, a tablet computer, a laptop computer, a desktop computer, a portable media player, or the like.
The method 200 also may estimate whether a region of interest is present in the subordinate image (box 240). If no region of interest is present (box 250), the method 200 may develop a feather mask according to spatial correspondence between the subordinate image and the reference image (box 260). If a region of interest is present (box 250), the method 200 may develop a feather mask according to a spatial location of the region of interest (box 270). The method 200 may fuse the subordinate image and the reference image using the feather mask and the frontal mask, if any, that are developed in boxes 230 and 260 or 270 (box 280).
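A sketch of this decision flow appears below; every helper name is hypothetical and stands in for the operation named in the corresponding box.

```python
def fuse_method_200(reference, subordinate, registration, helpers):
    """Sketch of the decision flow above; all helper names are hypothetical."""
    warped = helpers.warp(subordinate, registration.shift_map)
    frontal_mask = helpers.estimate_frontal_mask(registration)            # box 230
    roi = helpers.detect_roi(subordinate)                                 # box 240
    if roi is None:                                                       # box 250
        feather_mask = helpers.feather_from_overlap(reference, warped)    # box 260
    else:
        feather_mask = helpers.feather_from_roi(roi, warped.shape)        # box 270
    return helpers.fuse(reference, warped, feather_mask, frontal_mask)    # box 280
```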
Estimation of foreground content (box 210) may occur in a variety of ways. Foreground content may be identified from pixel shift data output by the registration unit 124.
ROI identification (box 240) may occur in a variety of ways. In a first embodiment, ROI identification may be performed based on face recognition processes or body recognition processes applied to the image content. ROI identification also may be performed from an identification of image content having predetermined coloration, for example, colors that are previously registered as corresponding to skin tones. Alternatively, ROI identification may be performed based on relative movement of image content across a temporally contiguous sequence of images. For example, content in a foreground of an image tends to exhibit larger overall motion in image content than background content of the same image, whether due to movement of the object itself during image capture or due to movement of a camera that performs the image capture.
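As a hedged example of coloration-based ROI detection, the sketch below thresholds YCbCr chroma channels against a commonly cited skin-tone range; the specific numeric range is an assumption made for illustration and is not taken from the disclosure.

```python
import numpy as np

def skin_tone_mask(ycbcr):
    """Rough skin-tone ROI heuristic on a uint8 YCbCr image (H x W x 3).

    The chroma ranges below are a widely used rule of thumb, offered only to
    illustrate coloration-based ROI detection.
    """
    cb, cr = ycbcr[..., 1], ycbcr[..., 2]
    return (cb >= 77) & (cb <= 127) & (cr >= 133) & (cr <= 173)
```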
The registration unit 124 (
As illustrated in
In the embodiment of
In implementation, the distribution of weights may be tailored to take advantage of relative performance characteristics of the two cameras and to avoid abrupt discontinuities that otherwise might arise due to a “brute force” merger of images. Consider, for example, an implementation using a wide camera and a tele camera in which the wide camera has a relatively larger field of view than the tele camera and in which the tele camera has a relatively higher pixel density. In this example, weights may be assigned to tele camera data to preserve high levels of detail that are available in the image data from the tele camera. Weights may diminish at edges of the tele camera data to avoid abrupt discontinuities at edge regions where the tele camera data cannot contribute to a fused image. For example, as illustrated in
In the embodiment of
Similarly, in
Weights also may be assigned to reference image data based on the weights that are assigned to the subordinate image data.
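One way such a taper could be realized is sketched below, assuming the warped subordinate (e.g., tele) image occupies a rectangular region and that weights ramp down linearly over a fixed feather width near its borders; both assumptions are illustrative.

```python
import numpy as np

def edge_taper_weights(height, width, feather_px=64):
    """Weights near 1.0 in the interior of the subordinate-image region,
    ramping linearly to 0.0 over feather_px pixels at its borders. The
    reference image would receive the complement, 1.0 - w, at each pixel."""
    ys = np.arange(height, dtype=np.float32)
    xs = np.arange(width, dtype=np.float32)
    dist_y = np.minimum(ys, height - 1 - ys)             # distance to top/bottom
    dist_x = np.minimum(xs, width - 1 - xs)              # distance to left/right
    dist = np.minimum(dist_y[:, None], dist_x[None, :])  # distance to nearest edge
    return np.clip(dist / float(feather_px), 0.0, 1.0)
```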
The illustrations of
At each level i, the method 400 may scale a shift map (SX, SY)_{i-1} from the prior level to the resolution of the current level and multiply the shift values within the map accordingly (box 430). For example, for a dyadic pyramid, shift map values SX_i and SY_i may be calculated as SX_i = 2*rescale(SX_{i-1}) and SY_i = 2*rescale(SY_{i-1}). Then, for each pixel location (x, y) in the reference image at the current level, the method 400 may search for a match between the reference image level pixel and a pixel in the subordinate image level (box 440). The method 400 may update the shift map value at the (x, y) pixel based on the best-matching pixel found in the subordinate image level. The method 400 may operate at each level either until the final pyramid level is reached or until the process reaches a predetermined stopping point, which may be set, for example, to reduce computational load.
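For a dyadic pyramid, the rescaling step might look like the following sketch; the use of OpenCV's bilinear resize is an illustrative choice.

```python
import cv2
import numpy as np

def upscale_shift_map(sx_prev, sy_prev, new_h, new_w):
    """Carry the shift map from pyramid level i-1 to level i of a dyadic
    pyramid: resample to the current resolution and double the shift values,
    i.e., SX_i = 2*rescale(SX_{i-1}) and SY_i = 2*rescale(SY_{i-1})."""
    sx = 2.0 * cv2.resize(sx_prev.astype(np.float32), (new_w, new_h),
                          interpolation=cv2.INTER_LINEAR)
    sy = 2.0 * cv2.resize(sy_prev.astype(np.float32), (new_w, new_h),
                          interpolation=cv2.INTER_LINEAR)
    return sx, sy
```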
Searching between the reference image level and the subordinate image level (box 440) may occur in a variety of ways. In one embodiment, the search may be centered about the co-located pixel location (x+sx, y+sy) in the subordinate image level and four positions corresponding to a one-pixel shift up, down, left and right, i.e., (x+sx+1, y+sy), (x+sx−1, y+sy), (x+sx, y+sy+1) and (x+sx, y+sy−1). The search may be conducted between luma component values of the pixels. In one implementation, versions of the subordinate image level may be generated by warping the subordinate image level in each of the five candidate directions, then calculating pixel-wise differences between luma values of the reference image level and each of the warped subordinate image levels. Five difference images may be generated, each corresponding to a respective difference calculation. The difference images may be filtered, if desired, to cope with noise. Finally, at each pixel location, the difference value having the lowest magnitude may be taken as the best match. The method 400 may update the pixel shift value at each pixel location based on the shift that generates the best-matching difference value.
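A sketch of one refinement pass over these five candidates follows; warp_with_shifts is a hypothetical helper that resamples the subordinate level according to per-pixel shift maps (e.g., via cv2.remap), and the noise filter is left as an optional step.

```python
import numpy as np

def refine_shifts(ref_luma, sub_luma, sx, sy, warp_with_shifts):
    """Update the shift map using the five candidate positions named above."""
    candidates = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]   # (dx, dy) offsets
    diffs = []
    for dx, dy in candidates:
        warped = warp_with_shifts(sub_luma, sx + dx, sy + dy)
        diffs.append(np.abs(ref_luma.astype(np.float32) - warped))
    diffs = np.stack(diffs, axis=0)        # five difference images
    # Optionally low-pass filter each difference image here to cope with noise.
    best = np.argmin(diffs, axis=0)        # best-matching candidate per pixel
    dxs = np.array([c[0] for c in candidates], dtype=np.float32)
    dys = np.array([c[1] for c in candidates], dtype=np.float32)
    return sx + dxs[best], sy + dys[best]
```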
In an embodiment, once the shift map is generated, confidence scores may be calculated for each pixel based on a comparison of the shift value of the pixel and the shift values of neighboring pixels (box 460). For example, confidence scores may be calculated by determining the overall direction of shift in a predetermined region surrounding a pixel. If the pixel's shift value is generally similar to the shift values within the region, then the pixel may be assigned a high confidence score. If the pixel's shift value is dissimilar to the shift values within the region, then the pixel may be assigned a low confidence score. Overall shift values for a region may be derived by averaging or weighted averaging shift values of other pixel locations within the region.
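The sketch below illustrates one such confidence measure, comparing each pixel's shift to a box-filtered average of its neighborhood; the window size and the mapping of deviation to a [0, 1] score are illustrative choices, not values from the disclosure.

```python
import cv2
import numpy as np

def shift_confidence(sx, sy, radius=7):
    """Confidence from agreement between each pixel's shift and the average
    shift of its surrounding region; similar shifts yield scores near 1."""
    size = (2 * radius + 1, 2 * radius + 1)
    mean_sx = cv2.blur(sx.astype(np.float32), size)
    mean_sy = cv2.blur(sy.astype(np.float32), size)
    deviation = np.hypot(sx - mean_sx, sy - mean_sy)
    return 1.0 / (1.0 + deviation)
```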
Following image registration, the subordinate image may be warped according to the shift map (box 470). Each pixel of the subordinate image may be relocated according to the shift values in the shift map.
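The warp might be implemented with a remapping operation such as the sketch below, assuming the shift maps give, for each output pixel (x, y), the offset to its counterpart in the subordinate image.

```python
import cv2
import numpy as np

def warp_subordinate(sub_img, sx, sy):
    """Resample the subordinate image according to the shift map (box 470)."""
    h, w = sx.shape
    xs, ys = np.meshgrid(np.arange(w, dtype=np.float32),
                         np.arange(h, dtype=np.float32))
    map_x = xs + sx.astype(np.float32)
    map_y = ys + sy.astype(np.float32)
    return cv2.remap(sub_img, map_x, map_y, interpolation=cv2.INTER_LINEAR)
```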
The mixer 540 may take the frontal mask data and feather mask data as inputs. The mixer 540 may output data representing a pixel-wise merger of data from the two masks. In embodiments where larger weights are represented by larger numerical values, the mixer 540 may multiply the weight values at each pixel location or, alternatively, take the maximum weight value at each location as output data for that pixel location. An output from the mixer 540 may be input to the first layer frequency decomposition unit 514 for the mask data.
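A sketch of the two merger options described above (product or maximum of the masks), with larger values denoting larger weights:

```python
import numpy as np

def mix_masks(frontal_mask, feather_mask, mode="multiply"):
    """Pixel-wise merger of the frontal and feather masks."""
    if mode == "multiply":
        return frontal_mask * feather_mask
    return np.maximum(frontal_mask, feather_mask)
```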
The layer fusion units 550-556 may output image data of their associated layers. Thus, the layer fusion unit 550 may be associated with the highest frequency data from the reference image and the warped subordinate image (no frequency decomposition), a second layer fusion unit 552 may be associated with a first layer of frequency decomposition, and a third layer fusion unit 554 may be associated with a second layer of frequency decomposition. A final layer fusion unit 556 may be associated with a final layer of frequency decomposition. Each layer fusion unit 550, 552, 554, . . . 556 may receive the reference image layer data, the subordinate image layer data and the weight layer data of its respective layer. Output data from the layer fusion units 550-556 may be input to the merger unit 560.
Each layer fusion unit 550, 552, 554, . . . 556 may determine whether to fuse the reference image layer data and the subordinate image layer data based on a degree of similarity between the reference image layer data and the subordinate image layer data at each pixel location. If co-located pixels from the reference image layer data and the subordinate image layer data have similar values, the layer fusion unit (say, unit 552) may fuse the pixel values. If the co-located pixels do not have similar values, the layer fusion unit 552 may not fuse them but rather output a pixel value taken from the reference image layer data.
The merger unit 560 may combine the data output from the layer fusion units 550-556 into a fused image. The merger unit 560 may scale the image data of the various layers to a common resolution, then add the pixel values at each location. Alternatively, the merger unit 560 may weight the layers' data further according to a hierarchy among the layers. For example, in applications where subordinate image data is expected to have higher resolution than reference image data, correspondingly higher weights may be assigned to output data from layer fusion units 550-552 associated with higher frequency layers as compared to layer fusion units 554-556 associated with lower frequency layers. In application, system designers may tailor individual weights to fit their application needs.
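The merger step might be sketched as follows, with optional per-layer gains standing in for the hierarchy weighting described above; equal gains reduce to a plain sum of the rescaled layers.

```python
import cv2
import numpy as np

def merge_layers(layer_outputs, layer_gains=None):
    """Scale each fused layer to the full resolution and accumulate
    (single-channel layer data assumed for simplicity)."""
    full_h, full_w = layer_outputs[0].shape[:2]
    if layer_gains is None:
        layer_gains = [1.0] * len(layer_outputs)
    fused = np.zeros((full_h, full_w), dtype=np.float32)
    for layer, gain in zip(layer_outputs, layer_gains):
        resized = cv2.resize(layer.astype(np.float32), (full_w, full_h))
        fused += gain * resized
    return fused
```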
The first mixer 610 in the layer fusion unit 600 may receive filtered data from a frequency decomposition unit associated with the subordinate image chain and a second mixer 620 may receive filtered data from the frequency decomposition unit associated with the reference image chain. Thus, the mixers 610, 620 may apply complementary weights to the reference image data and the subordinate image data of the layer. The adder 630 may generate pixel-wise sums of the image data input to it by the mixers 610, 620. In this manner, the adder 630 may generate fused image data at each pixel location.
The selector 640 may have inputs connected to the adder 630 and to the reference image data that is input to the layer fusion unit 600. A control input may be connected to the comparison unit 650. The selector 640 may receive control signals from the comparison unit 650 that, for each pixel, cause the selector 640 to output either the pixel value received from the adder 630 or the pixel value in the reference image layer data. The selector's output may serve as the output of the layer fusion unit 600.
As indicated, the layer fusion unit 600 may determine whether to fuse the reference image layer data and the subordinate image layer data based on a degree of similarity between the reference image layer data and the subordinate image layer data at each pixel location. The comparison unit 650 may determine a level of similarity between pixels in the reference and the subordinate image level data. In an embodiment, the comparison unit 650 may make its determination based on a color difference and/or a local high frequency difference (e.g., a gradient difference) between the pixel signals. If these differences are lower than a predetermined threshold, the corresponding pixels may be considered similar, and the comparison unit 650 may cause the adder's output to be output via the selector 640 (that is, the image data is fused at the pixel location).
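At the pixel level, the selection described above might reduce to the following sketch; the absolute-difference similarity test is a simplification of the color and gradient comparisons named in the text, and the threshold is assumed to be supplied by the caller.

```python
import numpy as np

def select_fused_or_reference(fused_layer, ref_layer, sub_layer, threshold):
    """Keep the fused value where reference and subordinate layer data agree;
    otherwise fall back to the reference value (single-channel data assumed)."""
    similar = np.abs(ref_layer.astype(np.float32) -
                     sub_layer.astype(np.float32)) < threshold
    return np.where(similar, fused_layer, ref_layer)
```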
In an embodiment, the comparison threshold may be set based on an estimate of a local noise level. The noise level may be set, for example, based on properties of the cameras 112, 114.
In another embodiment, the image fusion techniques described herein may be performed by a central processor of a computer system.
The central processor 710 may read and execute various program instructions stored in the memory 740 that define an operating system 712 of the system 700 and various applications 714.1-714.N. The program instructions may perform image fusion according to the techniques described herein. As it executes those program instructions, the central processor 710 may read image data created by the cameras 720, 730 from the memory 740, and it may perform image registration operations, image warp operations, frontal and feather mask generation, and image fusion as described hereinabove.
As indicated, the memory 740 may store program instructions that, when executed, cause the processor to perform the image fusion techniques described hereinabove. The memory 740 may store the program instructions on electrical-, magnetic- and/or optically-based storage media.
The image processor 120 (
Several embodiments of the disclosure are specifically illustrated and/or described herein. However, it will be appreciated that modifications and variations of the disclosure are covered by the above teachings and within the purview of the appended claims without departing from the spirit and intended scope of the disclosure.