Embodiments of the present invention generally relate to the processing of image data.
The human visual system perceives the same range of colors under a wide variety of scene illuminations. A white piece of paper remains resolutely white independent of the color of the illuminant (the color of the light under which the piece of paper is viewed). In contrast, color imaging systems (e.g., digital cameras) are less color constant in that they will often incorrectly infer the color of the illuminant. Unless the influence of the illuminant color is compensated for, the digital camera cannot acceptably reproduce actual scene colors. The process of “correcting” the image data to compensate for the effect of the illuminant color is commonly referred to as white balancing.
In white balancing, the color of the scene illumination either is measured or is estimated from the image data, and then the image data is adjusted to compensate for the effect of the illuminant. Because it is not practical to equip each camera with a dedicated illumination sensor, and because users cannot be expected to calibrate against a white reference every time a picture is taken, conventional cameras typically estimate the illuminant color from the image data.
According to one conventional white balancing method, all of the pixel color values in an image are averaged, and the image data is then adjusted so that the average of the pixel color values is gray. According to another conventional white balancing method, the brightest spot (a specularity) in an image is presumed to be the light source, the color of that specularity is presumed to be the color of the illuminant, and the image data is adjusted to compensate for the color of the illuminant. These types of methods are simple to implement but unfortunately they fail in many scenarios because they are based on assumptions that do not always hold true. For example, the average color of a scene may not actually be gray, and a specularity may not appear in an image (and when present may be difficult to find).
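A minimal sketch of the gray-world method just described, assuming Python and NumPy (the function name and array conventions are illustrative, not from any particular camera implementation):

```python
import numpy as np

def gray_world_balance(image):
    """White balance under the gray-world assumption.

    The average color of the scene is presumed to be gray, so any
    deviation of the per-channel means from neutral is attributed to
    the illuminant and divided out. `image` is an H x W x 3 float
    RGB array with values in [0, 1].
    """
    channel_means = image.reshape(-1, 3).mean(axis=0)
    gray = channel_means.mean()
    gains = gray / channel_means        # per-channel correction gains
    return np.clip(image * gains, 0.0, 1.0)
```

The specularity-based variant would instead normalize each channel by the color of the brightest spot in the image.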
Other attempts to address the illuminant estimation problem use a probabilistic framework. Given an input image Cim, the objective of a probabilistic approach is to recover Pr(E|Cim)—the probability that E was the scene illuminant given Cim. The illuminant hypothesis Ei that produces the largest Pr(E|Cim) is the estimated scene illuminant. According to Bayes' Rule:

Pr(E|Cim)=Pr(Cim|E)Pr(E)/Pr(Cim); (1)
where Pr(E) is the probability that the scene illuminant is E, and Pr(Cim|E) is the probability of observing the image Cim under illuminant E.
Pr(Cim) is a constant regardless of the illuminant, so it can be omitted from equation (1) without changing the outcome of the maximum probability estimation:
Pr(E|Cim)=Pr(Cim|E)Pr(E); (2)
where Pr(E) is the probability of illuminant E. In most cases, Pr(E) is assigned an equal value for all illuminant colors. The likelihood function Pr(Cim|E) is typically estimated by assuming that the chromaticity values of the pixels in an image are independent of each other, so that Pr(Cim|E) can be replaced in equation (2) by the product of the conditional probabilities of the pixel colors given illuminant E:

Pr(E|Cim)=Pr(E)·Πc Pr(c|E); (3)
where c is the color (chromaticity) of a pixel in image Cim, and Pr(c|E) indicates how “probable” it is that the camera would capture a color c under the illuminant color E (that is, given an illuminant color E, what is the probability that color c would be observed in the image).
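By way of illustration, equation (3) might be evaluated as in the following sketch (hypothetical Python; practical implementations typically sum log probabilities rather than multiplying many per-pixel terms, to avoid numerical underflow):

```python
import numpy as np

def score_illuminant(pixel_indices, log_prob_table, log_prior):
    """Score one illuminant hypothesis E per equation (3).

    pixel_indices: integer array mapping each pixel to its quantized
        chromaticity bin c.
    log_prob_table: log Pr(c|E) for every chromaticity bin under E.
    log_prior: log Pr(E).
    Returns log Pr(E|Cim) up to a constant; the product over pixels
    in equation (3) becomes a sum in the log domain.
    """
    return log_prior + log_prob_table[pixel_indices].sum()
```

The illuminant hypothesis with the largest score is then taken as the estimate.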
In order to generate Pr(c|E), a correlation matrix is created to correlate possible image colors with potential colors of scene illuminants. The correlation matrix characterizes, for each illuminant color considered, the range of possible image colors that can be observed under that illuminant color. In other words, a “camera gamut” for each potential illuminant color is constructed.
To implement a correlation matrix in a camera, training image data is used to create a two-dimensional (2D) or even three-dimensional (3D) color histogram for each illuminant color considered, and the histograms are installed in the camera's memory as lookup tables. If each color coordinate is represented using eight (8) bits, and the probabilistic score Pr(c|E) is represented in an 8-bit format, a 2D lookup table requires approximately 64,000 bytes of memory per illuminant color (256×256 one-byte entries), and a 3D lookup table requires approximately 16,000,000 bytes of memory per illuminant color (256×256×256 one-byte entries). Because multiple illuminant colors are typically considered, an extensive amount of memory can be consumed by a conventional correlation matrix.
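The memory figures above follow directly from the table dimensions, as this quick check illustrates (a sketch in Python):

```python
# 8 bits per chromaticity coordinate -> 256 bins per axis, and one
# byte per stored probabilistic score Pr(c|E).
bins = 2 ** 8
bytes_2d = bins ** 2    # 65,536 bytes (~64 KB) per illuminant color
bytes_3d = bins ** 3    # 16,777,216 bytes (~16 MB) per illuminant color
print(bytes_2d, bytes_3d)
```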
Thus, one problem with conventional white balancing approaches is that they can consume an extensive amount of a camera's memory. Another problem with conventional approaches is that they are heavily biased by the training image database that is used to generate the correlation matrix. If the training image database is not carefully selected, estimation of the illuminant color can be adversely affected. For example, it is important to match the training image database with the type of image data expected to be encountered when the camera is put to use. Studies on this subject have demonstrated that conventional training image databases are problematic.
Consequently, methods and/or systems that can be used for white balancing, but that consume less memory, would be advantageous. Methods and/or systems that can accomplish this with an improved training image database would also be advantageous. Embodiments in accordance with the present invention provide these and other advantages.
In overview, embodiments in accordance with the present invention utilize an integrated framework to improve the performance of probabilistic or correlation-based illuminant estimation methods. In one embodiment, incoming image data goes through a scene classifier and a gamut mapper. The scene classifier determines the probabilities that various scene classes (e.g., indoor, landscape, sky, portrait, etc.) are associated with the image data. The gamut mapper determines the probabilities that various combinations of illuminant color and scene class are associated with the image data. The probabilities from the scene classifier are used to weight the probabilities from the gamut mapper. The weighted results can be used to select an illuminant color. The image data can be adjusted to compensate for the selected illuminant color.
According to embodiments of the present invention, the correlation matrix used by the gamut mapper is scene class-dependent. As a result, the process of preparing the training image database can be better focused. Also, because construction of the correlation matrix focuses on the most probable colors that can occur within the scene classes considered, the effective camera gamut is contracted and thus the memory requirement for the correlation matrix can be reduced. These and other objects and advantages of the various embodiments of the present invention will be recognized by those of ordinary skill in the art after reading the following detailed description of the embodiments that are illustrated in the various drawing figures.
The accompanying drawings, which are incorporated in and form a part of this specification, illustrate embodiments of the present invention and, together with the description, serve to explain the principles of the invention.
Reference will now be made in detail to the various embodiments of the present invention, examples of which are illustrated in the accompanying drawings. While the invention will be described in conjunction with these embodiments, it will be understood that they are not intended to limit the invention to these embodiments. On the contrary, the invention is intended to cover alternatives, modifications and equivalents, which may be included within the spirit and scope of the invention as defined by the appended claims. Furthermore, in the following detailed description of the present invention, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, it will be understood that the present invention may be practiced without these specific details. In other instances, well-known methods, procedures, components, and circuits have not been described in detail so as not to unnecessarily obscure aspects of the present invention.
Some portions of the detailed descriptions that follow are presented in terms of procedures, logic blocks, processing, and other symbolic representations of operations on data bits within a computer memory. These descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. In the present application, a procedure, logic block, process, or the like, is conceived to be a self-consistent sequence of steps or instructions leading to a desired result. The steps are those utilizing physical manipulations of physical quantities. Usually, although not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated in a computer system. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as transactions, bits, values, elements, symbols, characters, samples, pixels, or the like.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussions, it is appreciated that throughout the present invention, discussions utilizing terms such as “determining,” “weighting,” “selecting,” “repeating,” “summing,” “using,” “identifying,” “adjusting,” “accessing” or the like, refer to actions and processes (e.g., flowchart 60 of the figures) of a computer system or similar electronic computing device that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories.
The image data 11 and 13 (shown in the figures) refer, respectively, to the raw image data that is received for white balancing and to the adjusted image data that is produced after the white balancing described herein.
Scene classifier 22 determines the probability that SN is the scene class given the raw image data 11. This probability is expressed herein as Pr(SN|C) or Pr(SN|Cim). For example, there may be N scene classes, and scene classifier 22 determines, given the raw image data 11, a first probability that S1 is the scene class, a second probability that S2 is the scene class, and so on.
Scene classification, which is known in the art, is a technique used for image database indexing and content retrieval. The classification can be done by image segmentation or by qualitative scene structure encoding. Several predetermined scene classes, such as but not limited to portrait, landscape, beach, sky, ski, sunset and indoor, are assigned and trained in the scene classifier a priori. The output of the scene classifier is the confidence level (expressed herein as a probability) that the given image resembles a certain class of scene.
If scene classifier 22 categorizes the images into N different scene classes, then Pr(E|Cim)—the probability that E was the scene illuminant given Cim—can be expressed as:

Pr(E|Cim)=ΣS Pr(E|Cim,S)·Pr(S|Cim); (4)
where Pr(S|Cim) is the probabilistic output score (e.g., the confidence level or probability) of the scene classifier 22 for the scene class S, and Pr(E|Cim,S) is the probability that E was the scene illuminant given Cim for the scene class S. Pr(E|Cim,S) can be further expanded by Bayes' Rule as follows:

Pr(E|Cim,S)=p(Cim|E,S)·Pr(E|S)/p(Cim|S); (5)
where p(Cim|E,S) is the probability of observing Cim given E and S (that is, what is the probability of obtaining the image data Cim if E is the illuminant color and S is the scene class), Pr(E|S) is the probability that the illuminant is E given the scene class S, and p(Cim|S) is the probability of observing Cim given S (that is, what is the probability of obtaining the image data Cim if S is the scene class).
In one embodiment, p(Cim|E,S), Pr(E|S) and p(Cim|S) are implemented as lookup tables (LUTs) 1, 2 and 3, respectively. The LUTs are scene class-dependent; thus, there is a lookup table for each scene class S. In one embodiment, gamut mapper 23 determines P(E|C,S) for each of the M illuminant colors E being considered, where P(E|C,S) is the probability that E is the illuminant color given the scene class S and the image data C.
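A sketch of how gamut mapper 23 might evaluate equation (5) from the three scene class-dependent LUTs (hypothetical Python; the nested table layout and log-domain arithmetic are assumptions, not details of the specification):

```python
import numpy as np

def gamut_map_score(pixel_indices, illuminant, scene, lut1, lut2, lut3):
    """Evaluate Pr(E|Cim,S) per equation (5) for one (E, S) pair.

    pixel_indices: quantized chromaticity bins of the image pixels.
    lut1[scene][illuminant]: log p(c|E,S) histogram (LUT 1).
    lut2[scene][illuminant]: prior Pr(E|S) (LUT 2).
    lut3[scene]: the constant p(Cim|S) for scene class S (LUT 3).
    Returns the log of equation (5); the per-pixel product becomes
    a sum in the log domain.
    """
    log_likelihood = lut1[scene][illuminant][pixel_indices].sum()
    return (log_likelihood
            + np.log(lut2[scene][illuminant])
            - np.log(lut3[scene]))
```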
The probability Pr(S|Cim) from scene classifier 22 can be viewed as a weighting factor that is applied by illuminant estimator 24 to the scene class-dependent gamut mapping results P(E|C,S) from gamut mapper 23. The probability Pr(S|Cim) can be obtained directly from the output of the scene classifier 22, because Pr(S|Cim) represents the probability of scene class S if the incoming image is Cim. This is illustrated further by the figures discussed below.
With reference to the lookup tables introduced above, the quantities p(Cim|E,S), p(Cim|S) and Pr(E|S) are now described in more detail.
The correlation matrix p(Cim|E,S) can be implemented as a 2D or 3D histogram (with a smaller data range relative to conventional art), or by other clustering/classification schemes, such as but not limited to VQ (vector quantization), SVM (support vector machine), and neural network.
The value of p(Cim|S) is independent of the illumination color, and is implemented as a constant value for each scene class S.
Pr(E|S) is a pre-programmed value. As mentioned above, it indicates how probable it is that illuminant E can occur if the scene class is S. For example, if the image is categorized as landscape, the score for daylight illuminants would be higher than the scores for incandescent or fluorescent illuminants.
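For instance, the pre-programmed Pr(E|S) values might be stored as a small per-class table such as the following (the illuminant names and probability values are purely illustrative):

```python
# Hypothetical Pr(E|S) priors: one row per scene class S, one entry
# per candidate illuminant E; each row sums to 1.
PR_E_GIVEN_S = {
    "landscape": {"daylight": 0.70, "fluorescent": 0.20, "incandescent": 0.10},
    "indoor":    {"daylight": 0.15, "fluorescent": 0.45, "incandescent": 0.40},
}
```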
Significantly, instead of a large multipurpose correlation matrix as applied in equation (3), embodiments in accordance with the present invention decompose the correlation matrix into several scene class-dependent lookup tables. One of the advantages of creating scene class-dependent LUTs is that the preparation of the training image database is more focused and guided. For example, to build a correlation matrix for the portrait scene class, the skin tone color becomes a major color cue, as the blue tone is for the sky/sea classes and the green tone is for the landscape/vegetation classes. The major color cues will be the major contributor for building the correlation matrix of the corresponding scene class. More major color samples are collected, finer resolution at the major color zone is prepared, and a higher weighting factor is assigned. The non-major colors can either be de-emphasized or even omitted. Down-playing the non-major colors reduces the chance of a false positive error. It is also expected to save memory space, because the correlation matrix covers only the spread of the major colors instead of the whole color spectrum.
There could be more than one major color for a scene class. For example, the landscape class may use green, blue, and gray as major colors, and the sunset class can use yellow and red.
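One possible way to train such a scene class-dependent, major-color-weighted histogram is sketched below (hypothetical Python; the weighting factor, bin count, and masking scheme are assumptions rather than values mandated by this description):

```python
import numpy as np

def build_class_histogram(chroma, major_mask, bins=64, major_weight=4.0):
    """Build a 2D chromaticity histogram p(c|E,S) for one (E, S) pair.

    chroma: (num_samples, 2) chromaticity values in [0, 1) sampled
        from training images of scene class S under illuminant E.
    major_mask: boolean array marking samples inside the class's
        major color zone (e.g., skin tones for portraits); those
        samples receive the higher weighting factor.
    """
    weights = np.where(major_mask, major_weight, 1.0)
    hist, _, _ = np.histogram2d(
        chroma[:, 0], chroma[:, 1],
        bins=bins, range=[[0, 1], [0, 1]], weights=weights)
    return hist / hist.sum()    # normalize to a probability table
```

Bins that receive no samples can simply be omitted from storage, which corresponds to the memory saving noted above.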
Equation (5) is implemented on a per-pixel basis. If the scene classifier 22 applies image segmentation techniques, the likelihood function p(Cim|E,S) of equation (5) can be further decomposed to:

p(Cim|E,S)=ΠK p(K|E,S); (6)
where K represents an image segment of similar color. Thus, instead of processing (white balancing) the image data pixel-by-pixel, the processing can be performed on a per-segment basis. Image segmentation techniques are known in the art.
If the scene classifier does not use an image segmentation technique, equation (6) may still be applicable if the definition of K is changed from an image segment to a major or non-major color. In other words, in one instance, segmentation is performed by grouping pixels of a similar color where the pixels in a segment are spatially dependent, and in another instance, segmentation is performed by grouping pixels of a similar color where the pixels in the segments are spatially independent. For example, in the former instance, if the image data represents an image of a landscape, with an expanse of blue sky in the background and a similarly colored blue object in the foreground, the pixels associated with the blue sky would be grouped in one segment and the pixels associated with the foreground object would be grouped in another segment. However, in the latter instance, the pixels associated with the blue sky and the pixels associated with the foreground object would be grouped in the same segment.
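The latter, spatially independent kind of segmentation might be sketched as follows (hypothetical Python; pixels are grouped purely by quantized chromaticity, so the blue sky and the blue foreground object of the example above fall into the same segment):

```python
import numpy as np

def segment_by_color(chroma, bins=16):
    """Group pixels by quantized color, ignoring image position.

    chroma: (num_pixels, 2) chromaticity values in [0, 1).
    Returns one segment label per pixel and the size of each segment;
    equation (6) can then be evaluated once per segment instead of
    once per pixel.
    """
    quantized = np.floor(np.clip(chroma, 0.0, 0.999) * bins).astype(int)
    labels = quantized[:, 0] * bins + quantized[:, 1]  # one id per color cell
    segment_ids, counts = np.unique(labels, return_counts=True)
    return labels, dict(zip(segment_ids.tolist(), counts.tolist()))
```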
For each of the M illuminant colors being considered, and for each of the N possible scene classes (that is, for each of the M×N combinations), gamut mapper 23 determines a value of P(E|C,S) that is input to illuminant estimator 24. For each of the N possible scene classes, scene classifier 22 determines a value for Pr(S|Cim) that is input to illuminant estimator 24.
With reference to the figures, illuminant estimator 24 multiplies each probability P(E|C,S) received from gamut mapper 23 by the corresponding weighting factor Pr(S|C) received from scene classifier 22.
For each illuminant color being evaluated, illuminant estimator 24 then sums the results of the multiplication operations to determine Pr(E|C) for each of the illuminants E1, E2, . . . , EM. That is, the results of the multiplication operations include a probability value for each illuminant color for each scene class. Thus, for example, there is a first set of probability values for illuminant E1 for all of the scene classes S1, S2, . . . , SN; a second set of probability values for illuminant E2 for all of the scene classes S1, S2, . . . , SN; and so on, through illuminant EM. At adder 33, the probability values within each set are added. Thus, for example, the individual values within the first set (associated with illuminant E1) are added to determine a probability value Pr(E1|C). In one embodiment, the illuminant color E that is associated with the largest value of Pr(E|C) is selected as the illuminant color that in turn may be used by image adjustor 25 (shown in the figures) to adjust (white balance) the image data.
For each of the scene classes S1, S2, . . . , SN, gamut mapper 23 determines a probability value P(E1|C,S). For example, for scene class S1, gamut mapper 23 accesses the appropriate scene class-dependent set of LUTs 1, 2 and 3 to fetch p(C|E1,S1), Pr(E1|S1) and p(C|S1) and determines P(E1|C,S1). At multiplier 41, for scene class S1, P(E1|C,S1) is weighted by multiplying P(E1|C,S1) and Pr(S1|C) from scene classifier 22.
The fetch and multiply operations described in the preceding paragraph are repeated for each of the scene classes S1, S2, . . . , SN for illuminant color E1. At adder 43, the results of the multiply operations are added. Thus, the output of adder 43 is the probability Pr(E1|C) that E1 is the illuminant color given the raw image data C.
The operations described in the above discussion are repeated for each of the M illuminant colors E1, E2, . . . , EM being considered. These operations are summarized by flowchart 60, described below.
In block 601 of flowchart 60, an illuminant color E is selected.
In block 602, a scene class S is selected.
In block 603, for the current values of E and S, the probability P(E|C,S) that E is the illuminant color given S and the raw image data C is determined. In one embodiment, the probability P(E|C,S) is determined by gamut mapper 23, in the manner described above.
In block 604 of flowchart 60, the probability Pr(S|C) that S is the scene class given the raw image data C is determined. In one embodiment, the probability Pr(S|C) is determined by scene classifier 22, as described above.
In block 605 of flowchart 60, the probability P(E|C,S) is weighted by the probability Pr(S|C) (e.g., the two probabilities are multiplied).
At block 606, if there is another scene class to consider, then flowchart 60 returns to block 602. Otherwise, flowchart 60 proceeds to block 607.
At block 607, the weighted probabilities determined for the current value of E and all values of S are summed to produce Pr(E|C).
At block 608, if there is another illuminant color to consider, then flowchart 60 returns to block 601. Otherwise, flowchart 60 proceeds to block 609.
At block 609, the maximum of the summed probabilities for all values of E (the summed probabilities determined in block 607 for each value of E) is identified. The value of E associated with the maximum value is selected. The selected value of E is thus identified as the most likely color of the illuminant that was used to illuminate a scene at the time the image data representing an image of the scene was captured. Accordingly, the selected value of E can be applied to white balance the image data.
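Putting blocks 601 through 609 together, the estimation loop might be sketched as follows (hypothetical Python; the gamut_map_score helper and the scene classifier outputs are assumed to be available as sketched above):

```python
import numpy as np

def estimate_illuminant(pixel_indices, scene_probs, illuminants, scenes,
                        lut1, lut2, lut3):
    """Select the illuminant maximizing equation (4), per flowchart 60.

    scene_probs[scene]: Pr(S|C) from scene classifier 22.
    Returns the illuminant whose summed, weighted score is largest.
    """
    best_illuminant, best_score = None, -np.inf
    for illuminant in illuminants:               # blocks 601 and 608
        total = 0.0
        for scene in scenes:                     # blocks 602 and 606
            # Block 603: P(E|C,S) from the scene class-dependent LUTs.
            # (A production implementation would stay in the log
            # domain to avoid underflow; np.exp is used for clarity.)
            p_e_given_cs = np.exp(gamut_map_score(
                pixel_indices, illuminant, scene, lut1, lut2, lut3))
            # Blocks 604 and 605: weight by Pr(S|C).
            total += scene_probs[scene] * p_e_given_cs
        if total > best_score:                   # blocks 607 and 609
            best_illuminant, best_score = illuminant, total
    return best_illuminant
```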
In summary, embodiments of the present invention provide a hybrid approach of gamut mapping and scene classification. More specifically, according to the present invention, the results of a scene classification step are used to weight the results of a gamut mapping step.
Furthermore, the correlation matrix used by the gamut mapper of the present invention is scene class-dependent. Consequently, the process of preparing the training image database can be better focused. Also, because construction of the correlation matrix focuses on the most probable colors that can occur within the scene classes considered, the effective camera gamut is contracted and thus the memory requirement for the correlation matrix can be reduced. Embodiments in accordance with the present invention thus provide methods and systems that can be used for white balancing, but that consume less memory and utilize an improved training image database relative to conventional techniques.
Embodiments of the present invention are thus described. While the present invention has been described in particular embodiments, it should be appreciated that the present invention should not be construed as limited by such embodiments, but rather construed according to the below claims.