Embodiments of the present invention generally relate to merging multiple images of a scene to generate a tuned, corrected, and/or altered final scene image.
Images captured by digital cameras are formed from pixels, and every pixel has a limited number of digital bits per color. The number of digital bits per pixel is called the pixel bit width value or pixel depth value. A High Dynamic Range (HDR) image has pixel bit width values greater than 8 bits, which means more information can be provided per pixel, thus affording greater image contrast and detail. HDR images can thus provide more complete gradients of gray shades, and improved clarity in an image's shadow, highlight, and mid-tone regions that would otherwise be missing from standard low dynamic range (LDR) images.
An HDR image can be captured by rapidly acquiring multiple LDR images of a scene that are captured at different exposure levels. There are a variety of scenarios that present unique challenges in generating HDR images, including low light levels, high noise, and high dynamic range situations. The dynamic range in an imaging situation refers to the range of luminance values in the scene to be imaged. It can be expressed as the ratio of the greatest luminance value in the scene to the smallest luminance value in the scene. Luminance values are dictated by the imaging exposure level of the scene. A low exposure level will properly capture the gray shades in scene areas fully illuminated by bright sunlight and a high exposure level will properly capture the gray shades in scene areas completely shielded from the sun and sky by buildings and trees. However, at the low exposure level the areas of the scene in shadow will be completely black, in black saturation, and show no detail, and the mid-tone areas will lose detail. Further, at the high exposure level, the highlights of the scene will be completely white, in white saturation, and show no detail, and the mid-tone areas will again lose detail. Thus, a third, mid exposure level image, which properly captures mid-level gray shades, is often acquired as well. By mixing these three LDR images, an HDR image can be generated that depicts an enlarged gray scale range of the scene. Merging multiple exposures preserves both the saturated and the shadow regions and thus provides a higher dynamic range than a single exposure. Most imaging systems are not capable of acquiring or capturing an HDR image with a single exposure. Thus, HDR images are typically computer-generated or generated from a combination of images captured at different times or with different exposure settings.
There are several known techniques for generating an HDR image from two or more exposures. In one technique, the exposures may be spatially interleaved. In some techniques, the imaging system merges multiple exposures and provides a native HDR Bayer image with a pixel bit width ranging from 12 to 20 bits. In some techniques, the imaging system captures multiple temporally spaced exposures and these exposures are merged to form an HDR image in the imaging device receiving the multiple exposures. Whether the imaging system generates the HDR image or the imaging device generates the HDR image, tone mapping may need to be performed on the HDR image to permit processing of the HDR image in an imaging pipeline with a lesser pixel bit width value, e.g., 10 to 12 bits.
Once an HDR image has been created, it can be challenging to then display that image properly in an electronic or printed medium, because the electronic or print medium itself lacks dynamic range. This challenge is typically addressed with tone mapping operators (TMOs), which convert a range of luminance values in an input image into a range of luminance values that well matches the electronic or print medium. Current tone mapping methodologies, however, require significant processing, including the performance of a large number of floating-point operations over a short period of time. Thus, there is a need for techniques and algorithms for improved tone mapping and for improved generation of HDR tuned images without this significant computational burden.
Even following the creation of a properly-tuned HDR image, certain professions or marketplace segments require that images be further corrected and/or have segments replaced entirely with alternative imaging material. One example is the field of real estate photography, which requires that exterior property photographs collected during a particular time of day, and during certain weather, be corrected and/or modified to present buyers with scenes of the property in differing contexts (e.g. exterior overcast shot versus sunny, removing image ghosting due to wind movement, etc.). HDR image classification, segmentation, and replacement methods exist to address the needs of these marketplace segments; however, they present significant challenges due to a lack of efficient automation in segmenting and replacing portions of images to produce acceptable scenes.
It is against this background that the techniques and algorithms described herein have been developed. To overcome the problems and limitations described above there is a need for an improved method of classification-based HDR image exposure merging, tuning, correction, and segment replacement.
One or more embodiments of the invention are directed to a classification-based HDR image merging, tuning, correction, and/or replacement method.
The invention may be embodied as a method of mixing a plurality of digital images of a scene, including capturing the images at different exposure levels, registering counterpart pixels of each image to one another, deriving a normalized image exposure level for each image, and employing the normalized image exposure levels in an image blending process. The image blending process blends a first selected image and a second selected image to generate an intermediate image, and when the plurality is composed of two images, the intermediate image is output as the mixed output image. When the plurality is composed of more than two images, the image blending process is repeated using the previously generated intermediate image in place of the first selected image and another selected image in place of the second selected image until all images have been blended, and the last generated intermediate image is output as the mixed output image.
The image blending process blends the counterpart pixels of two images and includes: deriving a luma value for a pixel in the second selected image; using the luma value of the second selected image pixel as an index into a look-up table to obtain a weighting value between zero and unity; using the weighting value, the normalized exposure level of the second selected image, and the second selected image pixel to generate a processed second selected image pixel; selecting a first selected image pixel that corresponds to the second selected image pixel; using the first selected image pixel and the result of subtracting the weighting value from unity to generate a processed first selected image pixel; adding the processed first selected image pixel to the counterpart processed second selected image pixel to generate a blended image pixel; and repeating the above processing sequence until each second selected image pixel has been blended with its counterpart first selected image pixel.
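As a non-limiting illustration, the following Python sketch (using NumPy, which this description does not mandate) shows one possible realization of the pairwise blend; the ramp look-up table and the way the normalized exposure scales the second image are assumptions, since the description does not fix either choice.

```python
import numpy as np

def blend_pair(first_img, second_img, second_norm_exposure, lut):
    """Blend two registered exposures (illustrative sketch only).

    first_img, second_img : float32 BGR arrays of shape (H, W, 3), values in [0, 1]
    second_norm_exposure  : normalized exposure level of the second image
    lut                   : 256-entry table of weights in [0, 1], indexed by luma
    """
    # Luma of the second selected image (Rec.601 weights), used as the LUT index.
    luma = (0.299 * second_img[..., 2] +
            0.587 * second_img[..., 1] +
            0.114 * second_img[..., 0])
    idx = np.clip(luma * 255.0, 0, 255).astype(np.uint8)
    w = lut[idx][..., None]                      # per-pixel weight in [0, 1]

    # One plausible reading: the second image is scaled by its normalized
    # exposure before weighting; the first image takes the complement (1 - w).
    processed_second = w * (second_img / max(second_norm_exposure, 1e-6))
    processed_first = (1.0 - w) * first_img
    return processed_first + processed_second

# Example placeholder LUT: a ramp that favors the second image in dark regions.
lut = np.linspace(1.0, 0.0, 256, dtype=np.float32)
```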
The method may include determining gamma correction in an image by: receiving, by one or more processors, at least a first exposure image of a scene; receiving, by the one or more processors, at least a second exposure image of the scene, wherein the second exposure image of the scene has a shorter exposure time than the first exposure image of the scene; computing, by the one or more processors, a pixel value for a pixel location of a high dynamic range (HDR) image to be a sum of a pixel value of the first exposure image weighted by a first exposure weight and a pixel value of the second exposure image weighted by a second exposure weight, to produce a merged HDR image comprising Y bit data; adaptively mapping, by the one or more processors, the HDR image, to produce an output HDR image having Z bit data and a total number of pixels; applying, by the one or more processors, a range of gamma value correction levels to the output HDR image and detecting a number of pixels having a black level value less than a predefined black level threshold; and selecting a tuned gamma value correction level.
The method may include correcting detail obscured by brightness glare in an image, by: receiving, by one or more processors, at least a first exposure image of a scene; receiving, by the one or more processors, at least a second exposure image of the scene, wherein the second exposure image of the scene has a shorter exposure time than the first exposure image of the scene; computing, by the one or more processors, a refined mask, by performing a conjunction of the at least first exposure image of the scene and the at least second exposure image of the scene and selecting a number of pixels having a value of black level greater than a predefined black level threshold tb to form an unrefined mask of the scene, quantifying an amount of detail present in at least one portion of the second exposure image of the scene having a brightness level higher than the average brightness level of all pixels in the second exposure image of the scene, by applying a Laplacian to said second exposure image of the scene and applying a median blur denoising operation to form an intermediary Laplacian mask, and selecting at least one pixel in at least one region of the intermediary Laplacian mask that does not have a zero value; and computing a blended image by applying a Gaussian pyramid operation and a Laplacian pyramid merging of the at least second exposure image and an exposure fusion image using the refined mask.
The method may also commence the image mixing prior to the capture of all the images of the plurality. The method may also commence the image mixing immediately after the capture of the second image of the plurality.
The method may include segmenting an image having sky in a scene, by computing a pixel mask as follows: receiving, by one or more processors, at least a first exposure image of a scene; receiving, by the one or more processors, at least a second exposure image of the scene, wherein the at least second exposure image of the scene has a shorter exposure time than the at least first exposure image of the scene; detecting a number of pixels having a blue hue value greater than a predefined blue hue level threshold, greater than a red hue value for the number of pixels, and greater than a green hue value for the number of pixels; computing a detection mask by performing a linear combination of at least one mean blue hue mask and one threshold blue hue mask; and computing at least one group of pixels from the detection mask as sky, by detecting a largest group of pixels having a blue hue value greater than a predefined blue hue level threshold, greater than a red hue value for the number of pixels, and greater than a green hue value for the number of pixels in the detection mask, and designating pixels away from the largest group of pixels as ‘not sky.’
The embodiments may further include a method of removing location shifted replications of scene objects appearing in a mixed image generated by a digital image mixing process applied to a plurality of images acquired at different exposure levels and different times, with the images having their counterpart pixels registered to one another.
The method may detect local motion by determining an absolute luminosity variance between each pixel of the reference image and the comparison image to produce a difference image and identifying difference image regions having absolute luma variances exceeding a threshold. The selected images used as reference images may include the image with the lowest exposure level, the image with the highest exposure level, or any images with intermediate exposure levels. The images to be processed may be downscaled in luminance value before processing.
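As a non-limiting illustration, a minimal Python/OpenCV sketch of such a local-motion check follows; the luma downscale factor and the variance threshold are placeholder values not taken from the description.

```python
import cv2
import numpy as np

def detect_local_motion(reference_bgr, comparison_bgr, threshold=25, luma_scale=0.5):
    """Flag pixels whose absolute luma difference exceeds a threshold (sketch).

    Luma values may first be downscaled by an illustrative factor; the
    threshold value is likewise a placeholder.
    """
    ref_y = cv2.cvtColor(reference_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32) * luma_scale
    cmp_y = cv2.cvtColor(comparison_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32) * luma_scale
    diff = np.abs(ref_y - cmp_y)                 # difference image
    return (diff > threshold).astype(np.uint8)   # 1 where local motion is suspected
```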
Another embodiment of the invention is a method for segmenting image patches, comprising applying a morphological erosion operation on a binary image of relevant pixels, applying a morphological close operation on the binary image, applying a labeling algorithm to distinguish different patches, and outputting patch descriptions. In this embodiment, the relevant pixels may share identified properties, which may include inconsistent luminosity values from at least one image comparison and/or detected local motion.
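As a non-limiting illustration, the following Python/OpenCV sketch shows one possible realization of this patch-segmentation sequence; the structuring-element sizes are placeholders, as the description names only the operations.

```python
import cv2
import numpy as np

def segment_patches(relevant_mask, erode_ksize=3, close_ksize=7):
    """Group relevant pixels (e.g. detected-motion pixels) into labeled patches."""
    mask = (relevant_mask > 0).astype(np.uint8)
    erode_k = np.ones((erode_ksize, erode_ksize), np.uint8)
    close_k = np.ones((close_ksize, close_ksize), np.uint8)
    mask = cv2.erode(mask, erode_k)                          # drop isolated pixels
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, close_k)  # fuse nearby pixels
    # Label connected components to distinguish the individual patches.
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(mask)
    patches = []
    for i in range(1, n):                                    # label 0 is background
        x, y, w, h, area = stats[i]
        patches.append({"label": i, "bbox": (x, y, w, h), "area": int(area)})
    return labels, patches
```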
In a further embodiment, a method is provided for selecting a replacement image from a plurality of candidate replacement images as a source for replacement patch image data. The method may include computing a weighted histogram of luma values of border area pixels of a particular patch of a reference image, dividing the histogram into a plurality of regions according to threshold values determined from relative exposure values of the candidate replacement images, calculating a score function for each histogram region, selecting the region with the maximum score, and outputting the corresponding candidate replacement image. The histogram weighting may increase the influence of over-saturated and under-saturated luma values. In this embodiment, candidate replacement images of relatively low exposure values may be selected for replacing patches from reference images of relatively high exposure values, and vice-versa. The reference image may be a medium exposure image with downscaled luma values. The score function for a particular histogram region may be defined as the ratio of the number of pixels in the particular histogram region considering the size of the particular histogram region, to the average difference of the histogram entries from the mode luma value for the particular histogram region.
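As a non-limiting illustration, one possible reading of this selection step is sketched below in Python; the saturation-emphasizing weight curve and the exact score normalization are assumptions not specified above.

```python
import numpy as np

def select_replacement(border_luma, exposure_thresholds):
    """Pick the index of the candidate replacement image for one patch (sketch).

    border_luma         : luma values (0-255) of the patch's border pixels in
                          the reference image
    exposure_thresholds : sorted luma thresholds splitting the histogram into
                          one region per candidate replacement image
    """
    # Weight border pixels so that near-saturated values count more (assumed curve).
    weights = 1.0 + np.abs(border_luma.astype(np.float32) - 127.5) / 127.5
    hist, _ = np.histogram(border_luma, bins=np.arange(257), weights=weights)

    edges = [0] + list(exposure_thresholds) + [256]
    scores = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        region = hist[lo:hi]
        count = region.sum()
        if count == 0:
            scores.append(0.0)
            continue
        mode = lo + int(np.argmax(region))
        lumas = np.arange(lo, hi)
        spread = np.average(np.abs(lumas - mode), weights=region) + 1.0
        # Higher score: many border pixels, concentrated near the region's mode.
        scores.append(count / ((hi - lo) * spread))
    return int(np.argmax(scores))
```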
Additionally, the invention may be embodied as a method for replacing patch image data in a composite image and blending the composite image and the upscaled smoothed patch image to produce an output image.
The above and other aspects, features and advantages of the invention will be more apparent from the following more particular description thereof, presented in conjunction with the following drawings wherein:
The present invention, comprising a classification-based image merging, tuning, correction, and replacement method, will now be described. In the following exemplary description numerous specific details are set forth in order to provide a more thorough understanding of embodiments of the invention. It will be apparent, however, to an artisan of ordinary skill that the present invention may be practiced without incorporating all aspects of the specific details described herein. Furthermore, although steps or processes are set forth in an exemplary order to provide an understanding of one or more systems and methods, the exemplary order is not meant to be limiting. One of ordinary skill in the art would recognize that the steps or processes may be performed in a different order, and that one or more steps or processes may be performed simultaneously or in multiple process flows without departing from the spirit or the scope of the invention. In other instances, specific features, quantities, or measurements well known to those of ordinary skill in the art have not been described in detail so as not to obscure the invention. It should be noted that although examples of the invention are set forth herein, the claims, and the full scope of any equivalents, are what define the metes and bounds of the invention.
For a better understanding of the disclosed embodiment, its operating advantages, and the specified object attained by its uses, reference should be made to the accompanying drawings and descriptive matter in which there are illustrated exemplary disclosed embodiments. The disclosed embodiments are not intended to be limited to the specific forms set forth herein. It is understood that various omissions and substitutions of equivalents are contemplated as circumstances may suggest or render expedient, but these are intended to cover the application or implementation.
The terms “first,” “second,” and the like herein do not denote any order, quantity, or importance, but rather are used to distinguish one element from another, and the terms “a” and “an” herein do not denote a limitation of quantity, but rather denote the presence of at least one of the referenced item.
As used herein, the term “substantially,” “about,” and similar terms are used as terms of approximation and not as terms of degree, and are intended to account for the inherent deviations in measured or calculated values that would be recognized by those of ordinary skill in the art. Further, the use of “may” when describing embodiments of the present invention refers to “one or more embodiments of the present invention.” As used herein, the terms “use,” “using,” and “used” may be considered synonymous with the terms “utilize,” “utilizing,” and “utilized,” respectively. Also, the term “exemplary” is intended to refer to an example or illustration.
As will be understood by one skilled in the art, for any and all purposes, such as in terms of providing a written description, all ranges disclosed herein also encompass any and all possible sub-ranges and combinations of sub-ranges thereof. Any listed range can be easily recognized as sufficiently describing and enabling the same range being broken down into at least equal halves, thirds, quarters, fifths, tenths, etc. As a non-limiting example, each range discussed herein can be readily broken down into a lower third, middle third and upper third, etc. As will also be understood by one skilled in the art all language such as “up to”, “at least”, “greater than”, “less than”, and the like include the number recited and refer to ranges which can be subsequently broken down into sub-ranges as discussed above. Finally, as will be understood by one skilled in the art, a range includes each individual member. Thus, for example, a group having 1-3 articles refers to groups having 1, 2, or 3 articles. Similarly, a group having 1-5 articles refers to groups having 1, 2, 3, 4, or 5 articles, and so forth. The phrase “and ranges in between” can include ranges that fall in between the numerical values listed. For example, “1, 2, 3, 10, and ranges in between” can include 1-1, 1-3, 2-10, etc. Similarly, “1, 5, 10, 25, 50, 70, 95, or ranges including and/or spanning the aforementioned values” can include 1, 5, 10, 1-5, 1-10, 10-25, 10-95, 1-70, etc.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the present invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and/or the present specification, and should not be interpreted in an idealized or overly formal sense, unless expressly so defined herein.
One or more embodiments of the present invention will now be described with reference to the accompanying drawings.
In the first step of the ghosting detection process 18, the images are thresholded using two thresholds td and tb, for each image 212, 214, and 216 where:
(I>tb or I<td)=>I=0
Where I represents that respective image's intensity or brightness.
In the second step, for each pixel's intensity in each respective image 212, 214, and 216 check for the following condition:
IHi>IMi>ILi
where i is the pixel location, and IH, IM, IL are the high 216, medium 214, and low 212 exposure images, respectively.
In the third step, create a binary mask using the equation in step two, setting a mask pixel value to ‘1’ if the condition is ‘false’, and setting a mask pixel value to ‘0’ if the condition is ‘true.’ The white pixels in the formulated binary black/white ghosting mask indicate the pixels in the merged composite image 218 that contain a ghosting region 220.
In the final step, filter for noise and count the number of non-zero pixels in the merged composite image 218. This count of non-zero pixels is then divided by the total number of pixels in the image, and the resulting ratio is thresholded to decide whether the merged image contains ghosting.
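As a non-limiting illustration, the ghosting check of process 18 may be sketched in Python/OpenCV as follows; the numeric values of td, tb, and the final ratio threshold are placeholders not given in the description.

```python
import cv2
import numpy as np

def detect_ghosting(img_low, img_med, img_high, td=10, tb=245, ratio_thresh=0.001):
    """Ghosting check over three registered exposures (illustrative sketch)."""
    def intensity(img):
        return cv2.cvtColor(img, cv2.COLOR_BGR2GRAY).astype(np.float32)

    i_l, i_m, i_h = intensity(img_low), intensity(img_med), intensity(img_high)

    # Step 1: zero out saturated / near-black pixels in each exposure.
    for i in (i_l, i_m, i_h):
        i[(i > tb) | (i < td)] = 0

    # Steps 2-3: mask is 1 wherever the expected ordering IH > IM > IL fails.
    ordered = (i_h > i_m) & (i_m > i_l)
    mask = (~ordered).astype(np.uint8)

    # Step 4: remove small speckle noise, then threshold the non-zero ratio.
    mask = cv2.medianBlur((mask * 255).astype(np.uint8), 5)
    ratio = np.count_nonzero(mask) / mask.size
    return ratio > ratio_thresh, mask
```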
Referring again to the drawings, when ghosting has been detected, the images 312 and 314 are aligned using four alignment fields 316, 318, 320, and 322.
Each of the four alignment fields 316, 318, 320, and 322 is divided into segments, and each segment is matched with a corresponding candidate segment in an alternate image. The overall displacement in alignment is calculated from the sum of L1 distances between the pixels of the reference segment and the candidate segment within a search area expanded by (4,4) in the alternate image. This generates an aligned image set where, for each segment, there are offset values determining which segment to choose while merging the images 312 and 314.
After aligning the images 312 and 314, they can be merged first temporally and then spatially. In the initial temporal merging step, each segment is first merged between burst images based on an offset determined by the alignment process described above. Weighting for selection of a segment is determined based on the average distance between pixel values of aligned segments.
The temporal phase of post-alignment merging is governed by the following equations:
Ot(x,y)=Σi=1..n(Wti*Iti(x,y)/Ws)+Ir(x,y)/Ws
Wti=1/NDti if NDti>290, else 0
NDti=max(1,Σx=1..16,y=1..16(Iti(x,y)−Ir(x,y))/256)
Ws=Σi(Wti)
where Iti is the underlying intensity value of segment t in the ith exposure image and Ir is the corresponding segment of the initially-selected reference image.
Following the temporal merging of images, the temporally-merged segments are now merged spatially to create an aligned and deghosted composite HDR image 26.
Referring again to the drawings, a classification 22 is applied to the scene; in the present embodiment related to residential photography, the scene is labeled as either ‘interior’ or ‘exterior.’ The applied label is then used to select particular parameters and/or thresholds in later HDR image processing. Again, depending on the applicable context for image review and use, the classification labeling may differ from the above.
If ghosting was not detected following the application of the process 18 above, the images 12, 14, and 16 can be merged using a standard Mertens algorithm (via Exposure Fusion) 24, resulting in a manipulable HDR image 26. As part of the merge, the Exposure Fusion weights for contrast, saturation, and exposure are determined by the classification 22 of the scene as being either interior or exterior.
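As a non-limiting illustration, the Exposure Fusion merge 24 may be sketched with OpenCV's Mertens implementation as follows; the per-class weight values shown are placeholders, since the description states only that the weights depend on the interior/exterior classification 22.

```python
import cv2
import numpy as np

def fuse_exposures(images, scene_class):
    """Exposure Fusion (Mertens) merge with classification-dependent weights."""
    if scene_class == "interior":
        contrast_w, saturation_w, exposure_w = 1.0, 1.0, 1.0   # placeholder values
    else:  # 'exterior'
        contrast_w, saturation_w, exposure_w = 1.0, 0.8, 1.2   # placeholder values
    merger = cv2.createMergeMertens(contrast_w, saturation_w, exposure_w)
    fused = merger.process(images)                  # float32 result, roughly in [0, 1]
    return np.clip(fused * 255.0, 0, 255).astype(np.uint8)
```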
To improve image quality, the merged output HDR image 26 from either the classification 22 and Exposure Fusion merge 24, or the deghosted alignment and merging process 20, is tuned using gamma corrections 28 followed by correcting dark and bright regions 30 in the scene.
In this embodiment related to residential photography, the tuning algorithm differs for ‘interior’ and ‘exterior’ scenes. In interior scenes, one goal is to avoid black levels below a certain threshold. With an HDR image 26 as a starting point, the HDR image 26 can be adaptively mapped to produce an output HDR image having ‘Z’ bit data and a total number of pixels Ntotal. A range of gamma values from 0.5 to 2.0 is then applied to alter the image 26, the number of pixels (Nb) with values less than a black level threshold (tb) is determined for each gamma value, and the lowest tuned gamma value correction level is then selected for which:
Nb<0.025*Ntotal
where Ntotal is the total number of pixels.
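As a non-limiting illustration, this gamma selection loop may be sketched as follows; the black level threshold tb, the 8-bit assumption, and the 1/gamma exponent convention are placeholders not fixed by the description.

```python
import numpy as np

def tune_interior_gamma(hdr_img, tb=16, gammas=np.arange(0.5, 2.01, 0.1)):
    """Select the lowest gamma for which near-black pixels satisfy Nb < 0.025*Ntotal.

    hdr_img is assumed to be 8-bit; tb is an illustrative black level threshold.
    The gamma adjustment here raises normalized values to 1/gamma (assumed convention).
    """
    n_total = hdr_img[..., 0].size if hdr_img.ndim == 3 else hdr_img.size
    norm = hdr_img.astype(np.float32) / 255.0
    for gamma in sorted(gammas):
        corrected = np.power(norm, 1.0 / gamma)              # gamma adjustment
        gray = corrected.mean(axis=-1) if corrected.ndim == 3 else corrected
        n_b = np.count_nonzero(gray * 255.0 < tb)            # near-black pixel count Nb
        if n_b < 0.025 * n_total:
            return gamma, (corrected * 255.0).astype(np.uint8)
    # Fall back to the largest gamma if no candidate satisfies the criterion.
    return gammas[-1], (np.power(norm, 1.0 / gammas[-1]) * 255.0).astype(np.uint8)
```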
Further, interior scenes often include significant glare in brighter pixel regions, leading to an overall degradation of detail within brighter regions of the scene. To correct this, the omitted details are recovered from the lowest exposure image, beginning with an unrefined mask Mb 518:
Mb=AND(Ib,Id)>tb
where Ib is the brightest image and Id is the darkest image.
Referring again to the drawings, an intermediary Laplacian mask Mi 616 is then formed by applying a Laplacian to the darkest exposure image Id 514 and denoising the result with a median blur:
Mi=Denoise(Laplacian(Id))
For each region R in Mi, the number of non-zero pixels Nr in that region is determined.
Next is selecting 422 at least one pixel Nr in at least one region R of the intermediary Laplacian mask Mi 616 that does not have a zero value in comparison to the unrefined mask Mb 518, and then keeping only those common regions in the unrefined mask Mb 518 not having a zero value to form a refined mask Mb(Refined) 618.
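As a non-limiting illustration, the unrefined mask Mb 518, the intermediary Laplacian mask Mi 616, and the refined mask Mb(Refined) 618 may be sketched in Python/OpenCV as follows; the threshold tb and the kernel sizes are placeholders.

```python
import cv2
import numpy as np

def refine_glare_mask(i_bright, i_dark, tb=200):
    """Build the refined bright-glare mask (illustrative sketch only)."""
    g_bright = cv2.cvtColor(i_bright, cv2.COLOR_BGR2GRAY)
    g_dark = cv2.cvtColor(i_dark, cv2.COLOR_BGR2GRAY)

    # Unrefined mask Mb: pixels bright in both exposures, per AND(Ib, Id) > tb.
    m_b = (cv2.bitwise_and(g_bright, g_dark) > tb).astype(np.uint8)

    # Intermediary Laplacian mask Mi: detail measure on the darkest exposure,
    # denoised with a median blur.
    lap = cv2.Laplacian(g_dark, cv2.CV_16S, ksize=3)
    m_i = cv2.medianBlur(cv2.convertScaleAbs(lap), 5)

    # Keep only the regions of Mb that also contain non-zero detail in Mi.
    n, labels = cv2.connectedComponents(m_b)
    refined = np.zeros_like(m_b)
    for r in range(1, n):
        region = labels == r
        if np.count_nonzero(m_i[region]) > 0:
            refined[region] = 1
    return refined
```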
The final step in correcting detail obscured by brightness glare in an ‘interior’ image 410 is computing a blended image Iblend 716 by applying a Gaussian pyramid operation and a Laplacian pyramid merging of the darkest exposure image Id 514 and the Exposure Fusion image IEF 712 using the refined mask Mb(Refined) 618:
Iblend=Blend(Id,IEF,Mb(Refined))
The end result is a blended image Iblend 716 in which the brightest regions of the darker (e.g. under-exposed) second exposure image Id 514, bearing a greater amount of detail, replace the overexposed regions in the simple Exposure Fusion image IEF 712 that lack detail.
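As a non-limiting illustration, the Blend() operation may be realized as a generic Gaussian/Laplacian pyramid blend such as the following sketch; the number of pyramid levels and the float normalization are implementation choices, not values taken from the description.

```python
import cv2
import numpy as np

def pyramid_blend(i_dark, i_ef, mask, levels=5):
    """Blend i_dark into i_ef inside `mask` via Gaussian/Laplacian pyramids.

    i_dark, i_ef : float32 BGR images in [0, 1], same size
    mask         : float32 single-channel mask in [0, 1] (1 = take i_dark)
    """
    def gaussian_pyr(img, n):
        pyr = [img]
        for _ in range(n):
            pyr.append(cv2.pyrDown(pyr[-1]))
        return pyr

    def laplacian_pyr(img, n):
        gp = gaussian_pyr(img, n)
        lp = []
        for i in range(n):
            up = cv2.pyrUp(gp[i + 1], dstsize=(gp[i].shape[1], gp[i].shape[0]))
            lp.append(gp[i] - up)
        lp.append(gp[-1])
        return lp

    lp_dark = laplacian_pyr(i_dark, levels)
    lp_ef = laplacian_pyr(i_ef, levels)
    gp_mask = gaussian_pyr(mask, levels)

    # Blend each pyramid level with the correspondingly downscaled mask.
    blended = []
    for ld, le, gm in zip(lp_dark, lp_ef, gp_mask):
        m = gm[..., None] if ld.ndim == 3 else gm
        blended.append(m * ld + (1.0 - m) * le)

    # Collapse the blended pyramid back to a full-resolution image.
    out = blended[-1]
    for i in range(levels - 1, -1, -1):
        out = cv2.pyrUp(out, dstsize=(blended[i].shape[1], blended[i].shape[0]))
        out = out + blended[i]
    return np.clip(out, 0.0, 1.0)
```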
In contrast to interior images, exterior image gamma tuning can be accurately determined based on the image's colorfulness. Colorfulness (C) is generally calculated in the following way:
C=stdRoot+(0.3*meanRoot)
where:
stdRoot=√(stdB²+stdYB²)
meanRoot=√(meanB²+meanYB²)
YB=absolute(0.5*(R+G)−B)
where R, G, and B are the RGB channels of the exterior image, and meanX and stdX are the mean and standard deviation of channel X, respectively.
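As a non-limiting illustration, the Colorfulness computation may be transcribed into Python/NumPy as follows, using the B and YB channels exactly as defined above.

```python
import numpy as np

def colorfulness(img_bgr):
    """Colorfulness C = stdRoot + 0.3 * meanRoot, following the definitions above."""
    b, g, r = [img_bgr[..., i].astype(np.float32) for i in range(3)]
    yb = np.abs(0.5 * (r + g) - b)                 # opponent channel YB
    std_root = np.sqrt(b.std() ** 2 + yb.std() ** 2)
    mean_root = np.sqrt(b.mean() ** 2 + yb.mean() ** 2)
    return std_root + 0.3 * mean_root
```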
In the context of exterior exposure fusion images IEF and white-balanced images IWB, Colorfulness is calculated on the exposure fusion result and the output of auto white balance on exposure fusion.
IWB=WhiteBalance(IEF)
C1=Colorfulness(IEF)
C2=Colorfulness(IWB)
if (C1<1.03*C2): IEF=IWB
The output is gamma corrected as follows:
for gamma in [0.5,0.6,0.7,0.8,0.9,1.0,1.1,1.2]:
Igamma=GammaAdjustment(IEF,gamma)
if (maxMeangamma>128) and (minMeangamma>110): IEF=Igamma
where:
maxMeangamma=max(Mean(Rgamma),Mean(Ggamma),Mean(Bgamma))
minMeangamma=min(Mean(Rgamma),Mean(Ggamma),Mean(Bgamma))
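As a non-limiting illustration, the exterior white-balance check and gamma sweep above may be sketched as follows; the gray-world white balance, the 1/gamma exponent convention, and applying each candidate gamma to the original (rather than the already-adjusted) image are assumptions, and colorfulness_fn is any function implementing the Colorfulness measure defined above.

```python
import numpy as np

def gray_world_white_balance(img_bgr):
    """Simple gray-world auto white balance (an assumed stand-in for WhiteBalance())."""
    img = img_bgr.astype(np.float32)
    means = img.reshape(-1, 3).mean(axis=0)
    gains = means.mean() / np.maximum(means, 1e-6)
    return np.clip(img * gains, 0, 255).astype(np.uint8)

def tune_exterior(i_ef, colorfulness_fn):
    """Colorfulness-guided white balance and gamma selection for exterior scenes."""
    i_wb = gray_world_white_balance(i_ef)
    if colorfulness_fn(i_ef) < 1.03 * colorfulness_fn(i_wb):
        i_ef = i_wb                                   # keep the white-balanced image

    out = i_ef
    for gamma in [0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2]:
        i_gamma = np.clip(255.0 * (i_ef / 255.0) ** (1.0 / gamma), 0, 255).astype(np.uint8)
        means = [i_gamma[..., c].mean() for c in range(3)]
        if max(means) > 128 and min(means) > 110:
            out = i_gamma                             # last gamma meeting the criterion
    return out
```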
With a gamma-corrected (based on Colorfulness) exterior HDR composite image Igamma, we are then able to correct over-brightened regions of the image that lost detail due to overexposure (as done above for interior images 410). The difference in the context of an exterior image is that gradient values are not used to filter out the over-exposed regions:
Md=AND(INV(Id),INV(Ib))
where Md is the mask of the darkest regions in the exterior scene. This darkness mask is then utilized in a blend to create a brightness-corrected and gamma-tuned HDR exterior image composite:
Iext_blend=Blend(Id,Igamma,Md)
After improving the visibility of details apparent in glare-obscured regions of a scene captured with HDR composite images, regions of that scene may need to be selected in their entirety for manipulation, correction, or wholesale replacement. In one embodiment of the present invention, a portion of an exterior image contains such a region, and segmentation and replacement of that portion can be performed to change the overcast appearance within the final image to something else, or to manipulate the appearance of the existing region by increasing the blue level or even enhancing contrast. In the present embodiment, the region for manipulation and/or replacement is an overcast sky.
Ideally, in the sky region 816, the blue channel value exceeds the red and green channel values as well as a predefined threshold, so an initial blue mask M1 is computed as:
MBR=(B−R)>th
MBG=(B−G)>th
MB=B>th
M1=AND(OR(MBR,MBG),MB)
A threshold blue hue mask M2 1010 is then computed around the mean blue value of the pixels selected by M1 and combined with M1 to form a detection mask M3 1012:
meanB=mean(M1*B)
M2=(B−meanB)<th
M3=AND(M2,M1)
The detection mask M3 1012 is then refined region by region, removing (setting to zero) any region r for which the following color distance exceeds a threshold:
dist1=L2-norm([μB,μG,μR],[Rmeanr,Gmeanr,Bmeanr])
if dist1>th: M3{r}=0
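As a non-limiting illustration, this first sky heuristic may be sketched in Python/OpenCV as follows; the numeric thresholds stand in for the unspecified ‘th’ values, and the mean blue value is taken over the M1 pixels as one reading of mean(M1*B).

```python
import cv2
import numpy as np

def sky_heuristic_blue(img_bgr, th_b=120, th_diff=20, th_dist=60):
    """First sky heuristic: blue-dominant, bright-blue pixels (sketch)."""
    b, g, r = [img_bgr[..., i].astype(np.float32) for i in range(3)]

    m_br = (b - r) > th_diff                 # blue exceeds red
    m_bg = (b - g) > th_diff                 # blue exceeds green
    m_b = b > th_b                           # blue above an absolute threshold
    m1 = (m_br | m_bg) & m_b

    # Threshold blue hue mask around the mean blue value of the M1 pixels.
    mean_b = b[m1].mean() if np.any(m1) else 0.0
    m2 = (b - mean_b) < th_diff
    m3 = (m2 & m1).astype(np.uint8)

    # Region pruning: drop regions whose mean color is far (L2 distance) from
    # the mean color of the detected sky pixels (one reading of the step above).
    if np.any(m3):
        mu = np.array([b[m3 > 0].mean(), g[m3 > 0].mean(), r[m3 > 0].mean()])
        n, labels = cv2.connectedComponents(m3)
        for reg in range(1, n):
            sel = labels == reg
            reg_mean = np.array([b[sel].mean(), g[sel].mean(), r[sel].mean()])
            if np.linalg.norm(mu - reg_mean) > th_dist:
                m3[sel] = 0
    return m3
```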
In the lowest (darkest) exposure image Idark 812, a second heuristic is applied to the value (V) channel of the image, where image_h is the image height in pixels:
sky_brightness=mean(V[0:0.2*image_h])
min_brightness=mean(V[0.6*image_h:image_h])
MV[0:0.4*image_h]=V[0:0.4*image_h]>min(0.7*sky_brightness,t1)
MV[0.4*image_h:]=V[0.4*image_h:]>min(2*min_brightness,t2)
if sky_brightness<=t3: MV=AND(MV,V<245)
We then compute thresholded masks for the blue-red 926 and blue-green 928 channels to get masks for blue-red MBR1 1014 and blue-green MBG1:
MBR1=B>R+10
MBG1=B>G+10
Finally, the combined detection mask M4 1020 for this second heuristic is computed as:
M4=AND(AND(MBR1,MBG1),MV)
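As a non-limiting illustration, this second heuristic may be sketched as follows; t1, t2, and t3 are placeholder thresholds.

```python
import cv2
import numpy as np

def sky_heuristic_value(i_dark_bgr, t1=180, t2=60, t3=100):
    """Second sky heuristic on the darkest exposure's V channel (sketch)."""
    hsv = cv2.cvtColor(i_dark_bgr, cv2.COLOR_BGR2HSV)
    v = hsv[..., 2].astype(np.float32)
    b, g, r = [i_dark_bgr[..., i].astype(np.float32) for i in range(3)]
    image_h = v.shape[0]

    sky_brightness = v[: int(0.2 * image_h)].mean()
    min_brightness = v[int(0.6 * image_h):].mean()

    m_v = np.zeros_like(v, dtype=bool)
    top = int(0.4 * image_h)
    m_v[:top] = v[:top] > min(0.7 * sky_brightness, t1)
    m_v[top:] = v[top:] > min(2.0 * min_brightness, t2)
    if sky_brightness <= t3:
        m_v &= v < 245                        # exclude fully saturated pixels

    m_br1 = b > r + 10                        # blue clearly above red
    m_bg1 = b > g + 10                        # blue clearly above green
    return (m_br1 & m_bg1 & m_v).astype(np.uint8)   # combined mask M4
```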
After completing the two heuristic approaches for detecting and differentiating (i.e. segmenting) the ‘sky’ region 816, a hue, saturation, and value mask MHSV 1022 is computed from the HSV channels of the image:
MHSV=AND(S<t1,AND(V>t2,H<t3))
Next, the newly-formed hue, saturation, and value mask MHSV 1022 is combined with the detection masks M3 1012 and M4 1020 to form a combined mask Mcombined:
Mcombined=OR(M3,M4,MHSV)
Next, a bright mask Mbright 1026 is computed as:
Mbright=AND(INTbright,INTdark)>tb
Next, a sure sky mask Msure 1028 is computed as:
Msure=(Mcombined⊖Mbright)
Next, an intermediary grab-cut mask Mgrab_cut 1030 is computed by seeding a grab-cut operation with the masks formed above:
Msure=sure foreground seed
Mcombined−Msure=probable foreground (Mpf)
Msure_bg=bottom 40% region=sure background
INV(Msure_bg+Mcombined)=probable background
Mgrab_cut=GRAB_CUT(Msure,Mpf,Msure_bg,INV(Msure_bg+Mcombined))
Finally, the newly-created intermediary grab-cut mask Mgrab_cut 1030 is combined with the sure sky mask Msure 1028 to form the finalized sky segmentation mask Mfinal 1032:
Mfinal=OR(Mgrab_cut,Msure)
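As a non-limiting illustration, the grab-cut seeding and final mask computation may be sketched with OpenCV's grabCut as follows; the treatment of Mcombined⊖Mbright as set subtraction and the iteration count are assumptions.

```python
import cv2
import numpy as np

def grabcut_sky_mask(img_bgr, m_combined, m_bright, iterations=3):
    """Refine the combined sky mask with grab-cut seeding (illustrative sketch)."""
    h, w = m_combined.shape
    m_sure = m_combined.astype(bool) & ~m_bright.astype(bool)   # Mcombined minus Mbright

    m_sure_bg = np.zeros((h, w), dtype=bool)
    m_sure_bg[int(0.6 * h):] = True                              # bottom 40% = sure background

    # Build the grab-cut label image from the four seed regions.
    gc = np.full((h, w), cv2.GC_PR_BGD, dtype=np.uint8)          # default: probable background
    gc[m_combined.astype(bool) & ~m_sure] = cv2.GC_PR_FGD        # probable foreground
    gc[m_sure] = cv2.GC_FGD                                      # sure foreground seed
    gc[m_sure_bg] = cv2.GC_BGD                                   # sure background

    bgd_model = np.zeros((1, 65), np.float64)
    fgd_model = np.zeros((1, 65), np.float64)
    cv2.grabCut(img_bgr, gc, None, bgd_model, fgd_model,
                iterations, cv2.GC_INIT_WITH_MASK)

    m_grab_cut = np.isin(gc, (cv2.GC_FGD, cv2.GC_PR_FGD))
    return (m_grab_cut | m_sure).astype(np.uint8)                # OR(Mgrab_cut, Msure)
```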
In this embodiment, the creation of the finalized sky segmentation mask Mfinal 1032 completes the segmentation of the overcast sky region 816 so that it can be manipulated, corrected, or replaced in its entirety in the final image.
The segmentation improvement method described in the present embodiment also affords improvement in pretraining convolutional networks of artificial neurons through the repeated convolution and pooling of at least one set of clear sky images and at least one set of sky images at least partially containing cloud cover. In an alternative embodiment, clear-sky and cloud-cover images may be synthetically generated using computer graphics. In a further alternative embodiment, the improvements in pretraining can be applied to features and/or portions of indoor and outdoor scenes other than the sky (e.g. rectangular real estate signboards, ceilings, lawns, pools, carpets, etc.).
While the invention herein disclosed has been described by means of specific embodiments and applications thereof, numerous modifications and variations could be made thereto by those skilled in the art without departing from the scope of the invention set forth in the claims.