This application claims the benefit, under 35 U.S.C. §119 of European Patent Application No. 13305459.3, filed Apr. 9, 2013 and European Patent Application No. 13306205.9, filed Sep. 4, 2013.
The invention relates to a method and an apparatus for performing alpha matting. In particular, the invention relates to a method and an apparatus for determining an alpha value for a candidate pixel of an image in an alpha matting process, which make use of a probability function that is based on an information theoretical analysis.
Alpha matting refers to the problem of softly extracting a foreground object out of an image. In contrast to binary segmentation, where each pixel is either classified as fully foreground or background, alpha matting recognizes the existence of “mixed” pixels. A major reason for such mixed pixels is the limited resolution of cameras, where light from the foreground object and the background contribute to the incoming light of a CCD element. Other reasons can be motion-blur and (semi-) transparencies in the object itself. Alpha matting and thus the soft extraction of objects from a still image or a video sequence is a fundamental problem in computer vision in general and movie post-production in particular.
The mixing coefficient is typically called “alpha”. It is defined to be between 0 and 1, i.e., 0% and 100%, and describes the fraction to which light from the foreground object contributed to the incoming light on an image sensor element, i.e. to an image pixel. An alpha matting algorithm tries to estimate this alpha coefficient, as well as the unmixed foreground and background colors. Each (unmixed) color is defined by three parameters, e.g. R, G, and B values in case of the RGB color space. Alpha matting hence needs to determine seven unknowns from only three knowns. The problem is thus ill-posed and requires additional constraints.
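The mixing model behind this count is commonly written per pixel as C = α·F + (1−α)·B. The following minimal sketch shows only the forward direction of that model, which matting has to invert. It assumes RGB colors given as floating-point triples; the function name `composite` is illustrative and not taken from the source.

```python
import numpy as np

def composite(alpha, fg, bg):
    """Mix an unmixed foreground and background color with coefficient alpha."""
    fg = np.asarray(fg, dtype=float)
    bg = np.asarray(bg, dtype=float)
    return float(alpha) * fg + (1.0 - float(alpha)) * bg

# One observed color (three knowns) produced from alpha, F and B (seven unknowns).
observed = composite(0.25, [1.0, 0.0, 0.0], [0.0, 0.0, 1.0])
```

Many different combinations of alpha, F and B yield the same observed color, which is why additional constraints are needed.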
Many algorithms for estimating alpha mattes have been developed in recent years. Their computational complexity is usually very high, often preventing their application in professional post-production of high-resolution images. However, the achievable results are usually much more visually appealing than results of a binary segmentation.
Wang et al.: “Image and Video Matting: A Survey”, Foundations and Trends in Computer Graphics and Vision, Vol. 3 (2007), pp. 97-175, provides a good overview of the state of the art of alpha matting as of 2007. A number of different approaches exist today, and significant progress has been made in recent years. Generally, a distinction is made between two fundamental approaches to solve the matting problem, namely color sampling based methods and propagation (affinity) based methods.
Most of these algorithms assume that a trimap is provided in addition to the input image or sequences thereof. The trimap indicates three different types of regions: known foreground, known background, and an unknown region for which alpha values shall be estimated.
Color sampling based methods try to explain an observed color in the unknown region with the help of known pixels from nearby foreground and background regions. They make the assumption that the true unmixed colors that produced the observed color of the unknown pixel can be found more or less nearby in image space. A further distinction is made between parametric and non-parametric versions. The former fit a parametric statistical model, e.g. a Gaussian or a mixture of Gaussians, to the color distribution of known close-by image regions. The latter directly use pairs of individual samples to estimate alpha values. Recent algorithms show a trend towards non-parametric approaches. It seems to be difficult to build adequate models, especially for highly textured image areas.
In the second category, propagation-based methods try to estimate the alpha values based on affinities between neighboring pixels. Pixels in the unknown region with high affinity should receive similar values. If the input image adheres to certain constraints, the algorithm may exactly recover the ground truth alpha matte. An important example is the color line model, which was used by A. Levin et al.: “A Closed-Form Solution to Natural Image Matting”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 30 (2008), pp. 228-242, to derive a closed-form solution based on the now widely used matting Laplacian. This closed-form solution, however, requires finding a global optimum over all pixels in the unknown region, which is computationally expensive. Furthermore, textured images as well as broad unknown areas still tend to be challenging.
Latest developments in the art combine the two fundamental approaches. In a first stage, a sampling-based matting algorithm is used to get a good initial estimate of the alpha matte. In a second stage, the results of the first stage are refined by a propagation-based optimization of the alpha matte (e.g. using the matting Laplacian). Two recent representatives of this class of algorithms are described by E. S. L. Gastal et al.: “Shared Sampling for Real-Time Alpha Matting”, Computer Graphics Forum, Vol. 29 (2010), pp. 575-584, and K. He et al.: “A Global Sampling Method for Alpha Matting”, Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR'11) (2011), pp. 2049-2056. As can be seen from the benchmark provided by C. Rhemann et al.: “A Perceptually Motivated Online Benchmark For Image Matting”, Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR'09) (2009), pp. 1826-1833, they belong to the top-performing algorithms.
In the color-sampling stage, for each pixel in the unknown region multiple pairs of foreground (FG) and background (BG) samples are evaluated with the help of a cost function. The sample pair with the lowest cost is deemed to be the pair that is best suited to estimate the alpha value of the candidate pixel. Designing a cost function that indeed selects the sample pair that best explains the true alpha value of the unknown pixel is an art, and subject of a lot of current research.
Most of the cost functions of recent matting algorithms combine spatial and colorimetric costs to evaluate the suitability of a sample pair. In principle, the smaller the image-space distance of the sampled pixel to the unknown pixel, the better. A spatially close candidate is more likely to be a good candidate than a candidate further away. Furthermore, a pair of FG/BG samples should well model the unknown pixel's color as a linear mixture of themselves. The smaller the deviation of the observed color from the line connecting the collected sample colors, the better.
In general, the cost functions are designed in a somewhat “ad hoc” fashion. Typically, they combine unrelated physical quantities. In the work by E. S. L. Gastal et al., the cost function is defined as a product of an estimated probability and several unnormalized distances in color space and image space, all of which are raised to some power. In the work by K. He et al., the cost function is merely a weighted sum of one unnormalized distance in color space and two normalized distances in image space. In both cases, the parameters that control the contribution of the individual costs are usually determined experimentally by comparing results with ground-truth data, as available for example from C. Rhemann et al.
It is an object of the present invention to propose an improved solution for determining an alpha value for a candidate pixel of an image in an alpha matting process.
According to the invention, a method for determining an alpha value for a candidate pixel of an image in an alpha matting process comprises:
Accordingly, an apparatus configured to determine an alpha value for a candidate pixel of an image in an alpha matting process comprises:
Also, a computer readable storage medium has stored therein instructions enabling determining an alpha value for a candidate pixel of an image in an alpha matting process, which when executed by a computer, cause the computer to:
According to the present invention, a sound probability function, and accordingly a sound cost function, are determined based on an information theoretical analysis. The probability function consists of comparable entities for the colorimetric and spatial costs within the framework of non-parametric color sampling. One aspect of the invention is to regard color and space as two statistically independent characteristics of a pixel and to model their probability distribution with the help of exponential functions. Maximizing the joint probability, i.e. the product of the individual probabilities, is then basically identical to minimizing the sum of normalized distances in the exponent. Of course, this idea is extensible to any number of statistically independent characteristics.
A second aspect of the invention is the avoidance of costs for statistical events that do not influence the final outcome. In case a sample pair estimates an unknown pixel to be purely background, its FG sample should have no influence on the calculated cost. The same is true for the BG sample in case the unknown pixel is deemed to belong to the foreground. Therefore, the cost function advantageously employs an alpha-dependent weighting of the spatial costs.
The proposed cost function provides a significant improvement of the quality of the estimated alpha mattes without significantly increasing the computational complexity.
For a better understanding the invention shall now be explained in more detail in the following description with reference to the figures. It is understood that the invention is not limited to this exemplary embodiment and that specified features can also expediently be combined and/or modified without departing from the scope of the present invention as defined in the appended claims.
The estimated alpha value for an unknown pixel resulting from a pair of FG and BG samples can be calculated as
α̂=max(0, min(1, ((Cp−Bj)·(Fi−Bj))/∥Fi−Bj∥²)), (1)
where Fi and Bj denote the colors of the FG and BG samples and Cp denotes the color of the unknown pixel. Geometrically, alpha is the result of a projection in 3D color space of the pixel's color onto the line connecting Fi and Bj. This projection is illustrated in
The value α̂ is computed as the ratio of the Euclidean distances between Ĉp and Bj and between Fi and Bj, where Ĉp denotes the projection of Cp onto the line connecting Fi and Bj. Since the color space projection may yield α̂ values smaller than 0 or greater than 1, the min and max functions are used to clamp α̂ to the interval [0,1].
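The projection and clamping described above can be sketched as follows. This is a minimal illustration, assuming colors are RGB triples; the function names and the fallback for identical samples are assumptions of this sketch, not part of the described method.

```python
import numpy as np

def estimate_alpha(c_p, f_i, b_j, eps=1e-12):
    """Project c_p onto the line through b_j and f_i, then clamp to [0, 1]."""
    c_p, f_i, b_j = (np.asarray(v, dtype=float) for v in (c_p, f_i, b_j))
    line = f_i - b_j
    denom = float(line @ line)
    if denom < eps:
        # Identical FG and BG samples define no line; returning 0 is an
        # arbitrary choice made for this sketch only.
        return 0.0
    alpha = float((c_p - b_j) @ line) / denom
    return min(1.0, max(0.0, alpha))

def projected_color(alpha, f_i, b_j):
    """Color on the segment between b_j and f_i that corresponds to alpha."""
    f_i, b_j = np.asarray(f_i, dtype=float), np.asarray(b_j, dtype=float)
    return alpha * f_i + (1.0 - alpha) * b_j

alpha = estimate_alpha((0.5, 0.5, 0.5), (1, 1, 1), (0, 0, 0))  # 0.5
```

Because of the clamping, a color that projects beyond Fi or Bj is mapped to the respective end point of the segment.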
In the work by K. He et al., the Euclidean distance between Ĉp and Cp in 3D (RGB) color space is then used as a measure of how well the sample pair models the observed color Cp. This distance is also referred to as the “chromatic distortion”, describing the degree to which the examined sample pair is able to explain the unknown pixel's color through a linear combination of themselves:
εc(Fi,Bj)=∥Cp−Ĉp∥ (2)
The smaller the distance εc, the better the sample pair is deemed to represent the true unmixed colors for the unknown pixel, in turn being better suited as the sample pair finally used to estimate alpha.
The complete cost function used by K. He et al. is defined as
ε(Fi,Bj)=w·εc(Fi,Bj)+εs(Fi)+εs(Bj), (3)
where εs(Fi) and εs(Bj) denote the spatial costs of the foreground sample and the background sample, respectively. The cost εs(Fi) is computed as
εs(Fi)=∥X(Fi)−X(p)∥ / min_k ∥X(Fk)−X(p)∥, (4)
where the numerator describes the Euclidean distance of the FG sample to the unknown pixel in image space (X denotes spatial coordinates) and the denominator describes the smallest distance of the unknown pixel to the set of FG samples. The cost εs(Bj) is computed accordingly, and w denotes an empirical weighting factor.
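The spatial cost and the weighted sum of Equation 3 can be sketched as follows. This is an illustration only: positions are assumed to be (row, column) pairs, the function names are hypothetical, and the default value of w is a placeholder rather than a value from the cited work.

```python
import numpy as np

def spatial_cost(x_sample, x_p, sample_positions):
    """Distance of one sample to the unknown pixel, divided by the smallest
    distance from the unknown pixel to any sample of the same set."""
    x_p = np.asarray(x_p, dtype=float)
    positions = np.asarray(sample_positions, dtype=float)
    # The unknown pixel is not itself part of the sample set, so d_min > 0.
    d_min = np.linalg.norm(positions - x_p, axis=1).min()
    d = np.linalg.norm(np.asarray(x_sample, dtype=float) - x_p)
    return d / d_min

def he_style_cost(chromatic, x_f, x_b, x_p, fg_positions, bg_positions, w=1.0):
    """Weighted sum of the chromatic distortion and both spatial costs."""
    return (w * chromatic
            + spatial_cost(x_f, x_p, fg_positions)
            + spatial_cost(x_b, x_p, bg_positions))

fg = [(0, 3), (0, 6)]
bg = [(4, 0), (8, 0)]
cost = he_style_cost(0.5, (0, 6), (4, 0), (0, 0), fg, bg, w=2.0)
```

The nearest sample of each set always has a spatial cost of 1, so the two spatial terms are dimensionless, while the chromatic term is not.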
If exponential distributions are assumed for the statistically independent random variables chromatic distortion and spatial distance, minimizing the cost function in Equation 3 may be seen as maximizing the joint probability of the observed statistical events. It should, however, be noted that, for true exponential distributions, the spatial distances would need to be normalized by their expected mean value, and not the smallest possible distance. As the difference may, however, be compensated quite accurately by a constant correction factor, this is effectively like changing the empirical weight w.
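The link between the joint probability and the summed cost can be made explicit. The following is a sketch under the stated assumption of independent exponential distributions; the symbols d_F and d_B for the unnormalized spatial distances and μ_c, μ_F, μ_B for the mean values are introduced here for illustration only.

```latex
\begin{align*}
p(\varepsilon_c, d_F, d_B)
  &= \frac{1}{\mu_c}\, e^{-\varepsilon_c/\mu_c}
     \cdot \frac{1}{\mu_F}\, e^{-d_F/\mu_F}
     \cdot \frac{1}{\mu_B}\, e^{-d_B/\mu_B}, \\
-\ln p(\varepsilon_c, d_F, d_B)
  &= \frac{\varepsilon_c}{\mu_c} + \frac{d_F}{\mu_F} + \frac{d_B}{\mu_B}
     + \ln\left(\mu_c\,\mu_F\,\mu_B\right).
\end{align*}
```

The logarithmic term does not depend on the chosen sample pair, so maximizing p amounts to minimizing the sum of the mean-normalized distances. This has the form of Equation 3 with 1/μ_c taking the role of w, except that Equation 3 divides the spatial distances by the smallest distance instead of the mean.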
According to the solution proposed herein, a slightly different approach to model the chromatic distortion is adopted. As the variation in the sampled colors should have a strong influence on the likelihood of the color mix, it is proposed to use the Mahalanobis distance as normalized color distance. The color variation in terms of the observed variance is a measure for the homogeneity or texturedness of the image region.
In principle this is like modeling the FG and BG colors as normally distributed. It should, however, be noted that in order to avoid an averaging of colors, which is probably the reason for the suboptimal results of the parametric color sampling based approaches, the sampled colors are directly used as the mean value of the normal distribution, i.e. without any averaging. The covariance matrices are estimated using basically the smallest possible region, i.e. the 3×3 neighborhood.
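A 3×3 covariance estimate of this kind might look as follows. This is a minimal sketch: the source does not spell out the estimator, so centering the deviations on the sampled color itself (rather than on the patch mean), the border handling, and the division by the number of patch pixels are all assumptions, as is the function name `local_covariance`.

```python
import numpy as np

def local_covariance(image, y, x):
    """3x3 color covariance around pixel (y, x) of an H x W x 3 image,
    using the sampled color itself as the mean (no averaging of colors)."""
    h, w, _ = image.shape
    y0, y1 = max(0, y - 1), min(h, y + 2)   # clip the window at the image border
    x0, x1 = max(0, x - 1), min(w, x + 2)
    patch = image[y0:y1, x0:x1].reshape(-1, 3).astype(float)
    diff = patch - image[y, x].astype(float)
    return diff.T @ diff / len(patch)

image = np.zeros((5, 5, 3))
image[2, 3] = [1.0, 0.0, 0.0]       # one deviating neighbor in the red channel
cov = local_covariance(image, 2, 2)
```

For a perfectly uniform neighborhood the result is the zero matrix, which is the situation that calls for a small constant on the diagonal.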
Accordingly, this change does not affect the computation of the estimated alpha, but only the chromatic distortion, which now becomes:
εc(Fi,Bj)=√((Cp−Ĉp)ᵀ·S(Ĉp)⁻¹·(Cp−Ĉp)). (5)
The covariance matrix S(Ĉp) for the mixed color Ĉp is thereby obtained as a weighted sum of the covariance matrices S(Fi) and S(Bj):
S(Ĉp)=(1−α̂)²·S(Bj)+α̂²·S(Fi). (6)
Preferably, fully populated covariance matrices are used for S(Fi), S(Bj) and S(Ĉp). Alternatively, the covariance matrices are populated only on the main diagonal, neglecting the correlation between colors. Finally, a single scalar value may also be assumed for the variance, which is independent of the color components. These simplifications avoid the matrix inversion and reduce the number of divisions, respectively, which limits the additional amount of computation.
For uniform image regions, the values on the main diagonal of the covariance matrix may become arbitrarily small, leading to a huge penalty for the slightest deviation in color. As such a penalty is not quite substantiated by the proposed simple model, and in order to even give a slight advantage to such rather pure colors as well as to avoid numerical instabilities, a small constant is advantageously added to all diagonal elements.
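The Mahalanobis-based chromatic distortion with the mixed covariance matrix can be sketched as follows. This is an illustration only: the value of the small constant `reg`, the choice to add it to the mixed matrix rather than to the individual matrices, and the function name are assumptions of this sketch.

```python
import numpy as np

def mahalanobis_distortion(c_p, f_i, b_j, s_f, s_b, alpha, reg=1e-4):
    """Mahalanobis distance between the observed color c_p and the mixed
    color, using the alpha-weighted sum of the sample covariances."""
    c_p, f_i, b_j = (np.asarray(v, dtype=float) for v in (c_p, f_i, b_j))
    s_f, s_b = np.asarray(s_f, dtype=float), np.asarray(s_b, dtype=float)
    c_hat = alpha * f_i + (1.0 - alpha) * b_j
    s_hat = (1.0 - alpha) ** 2 * s_b + alpha ** 2 * s_f
    s_hat = s_hat + reg * np.eye(3)      # keep uniform regions well conditioned
    diff = c_p - c_hat
    # Solving the linear system avoids forming the explicit inverse.
    return float(np.sqrt(diff @ np.linalg.solve(s_hat, diff)))

d = mahalanobis_distortion([0.5, 0.5, 1.0], [1, 1, 1], [0, 0, 0],
                           np.eye(3), np.eye(3), alpha=0.5, reg=0.0)
```

With diagonal or scalar covariances, the call to `np.linalg.solve` reduces to element-wise divisions, which corresponds to the simplifications mentioned above.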
The “clamping” described above for the color projection automatically results in the chromatic distortion being independent of the sampled FG or BG color whenever the examined pixel is estimated to be pure BG or FG, i.e. when α̂ equals zero or one. In addition, the newly introduced mixing of the covariance matrices also produces a natural weighting between the two colors for arbitrary alpha values in-between.
In the approaches used by K. He et al. and C. Rhemann et al., the influence of the spatial costs is independent of the estimated alpha value α̂. According to a further aspect of the present solution, an alpha-dependent weighting of the spatial costs is used. By way of example, the overall cost function of K. He et al. is extended as follows:
ε(Fi,Bj)=w·εc(Fi,Bj)+wF(α̂)·εs(Fi)+wB(α̂)·εs(Bj). (7)
The weighting functions wF and wB are functions of α̂ and as such functions of (Fi,Bj).
Generally, wF(α̂) can be any function that has its minimum at α̂=0 and increases monotonically with α̂. Similarly, wB(α̂) can be any function that has its minimum at α̂=1 and monotonically decreases with α̂.
Intuitively, this models that when α̂ is 0, i.e. the pixel is estimated as pure BG, the influence of the FG candidate's spatial distance should be minimal or it should not have any influence at all. Vice versa, when α̂ is 1, i.e. the pixel is estimated as pure FG, the influence of the BG candidate's spatial distance should be minimal or it should not have any influence at all. A key aspect of the proposed solution is that both spatial distances should only contribute to the overall costs for truly mixed pixels in the unknown region, for which 0<α̂<1.
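The alpha-dependent weighting of Equation 7 can be sketched with the simplest pair of functions that meets the stated conditions. The linear weights below, the function names, and the default value of w are assumptions chosen for illustration; they are one admissible choice, not the only one.

```python
def w_f(alpha):
    """FG spatial weight: zero at alpha = 0, increasing with alpha."""
    return alpha

def w_b(alpha):
    """BG spatial weight: zero at alpha = 1, decreasing with alpha."""
    return 1.0 - alpha

def weighted_cost(chromatic, spatial_f, spatial_b, alpha, w=1.0):
    """Chromatic cost plus alpha-weighted FG and BG spatial costs."""
    return w * chromatic + w_f(alpha) * spatial_f + w_b(alpha) * spatial_b

# A distant FG sample (spatial cost 5) is not penalized for a pure-BG estimate.
pure_bg = weighted_cost(0.2, 5.0, 1.0, alpha=0.0)
mixed = weighted_cost(0.2, 5.0, 1.0, alpha=0.5)
```

With these weights, a sample pair that explains a pixel as pure background is judged only on its chromatic distortion and the BG sample's distance.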
To give an example, the two functions may be (piecewise) linear functions. Alternatively, the functions are defined by three different intervals on the α̂ axis. As a further alternative, for any value of α̂ the functions fulfill the condition
wF(α̂)+wB(α̂)=c, (8)
with c a constant. According to yet another alternative, the two functions are symmetric to each other:
wF(α̂)=wB(1−α̂). (9)
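One function pair that combines these alternatives is sketched below: piecewise linear, defined on three intervals, with a constant sum and mutual symmetry. The interval boundaries `lo` and `hi`, their default values, and the constant c = 1 are hypothetical parameters of this sketch and are not taken from the source.

```python
import math

def w_f(alpha, lo=0.25, hi=0.75):
    """FG weight: 0 up to lo, a linear ramp between lo and hi, 1 from hi on."""
    if alpha <= lo:
        return 0.0
    if alpha >= hi:
        return 1.0
    return (alpha - lo) / (hi - lo)

def w_b(alpha, lo=0.25, hi=0.75, c=1.0):
    """BG weight chosen so that w_f + w_b equals the constant c (Equation 8)."""
    return c - w_f(alpha, lo, hi)

samples = [0.0, 0.1, 0.25, 0.3, 0.5, 0.7, 0.75, 0.9, 1.0]
constant_sum = all(math.isclose(w_f(a) + w_b(a), 1.0) for a in samples)
symmetric = all(math.isclose(w_f(a), w_b(1.0 - a), abs_tol=1e-12) for a in samples)
```

The symmetry of Equation 9 holds here only because lo + hi = 1 and c = 1. Setting lo = hi turns the ramp into a step.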
An example of a function pair is illustrated in
The case of α̂1=α̂2=0.5 is illustrated in
Finally, the two functions can also be defined as:
In this case complete suppression of one spatial cost, and amplification of the other spatial cost, is only applied in the unmixed color cases, i.e. when α̂ equals 0 or 1. For all other cases the two costs are equally weighted.
An apparatus 20 configured to perform the method according to the invention is schematically depicted in
Number | Date | Country | Kind |
---|---|---|---|
13305459 | Apr 2013 | EP | regional |
13306205 | Sep 2013 | EP | regional |
Entry |
---|
Kaiming He et al., “A Global Sampling Method for Alpha Matting”, 2011 IEEE conference on computer Vision and Pattern Recognition. |
Yung-Yu Chuang et al, “A Bayesian Approach to Digital Matting”, Proceeding of the 2001 IEEE, published Jan. 2001. |
Jorge Lacasa Cabeza, “Exploiting local and global knowledge in alpha matting”, pp. 49, Date of Submission: Mar. 31, 2013 and Examination Date: Apr. 10, 2013; <http://upcommons.upc.edu/bitstream/handle/2099.1/19819/Master—Thesis—of—Jorge—Lacasa—Cabeza.pdf?sequence=4&isAllowed=y>. |
Ruzon et al., “Alpha Estimation in Natural Images”, IEEE Conference on Computer Vision and Pattern Recognition 2000, vol. 1, Jun. 2000, pp. 18-25. |
Number | Date | Country | |
---|---|---|---|
20140301639 A1 | Oct 2014 | US |