This application is the U.S. national phase of International Application No. PCT/CN2013/074636, filed on Apr. 24, 2013, which claims the priority benefit of Chinese Patent Application No. 2013100803797, filed on Mar. 13, 2013, the entire contents of which are hereby incorporated by reference.
1. Field of the Invention
The present invention belongs to the field of image processing and computer vision, and particularly relates to a novel image foreground matting method based on neighborhood and non-neighborhood smoothness priors.
2. Background of the Invention
Image foreground matting aims to decompose an image I into a foreground F and a background B. Mathematically, the observed color C of each pixel of the image I is a linear combination of F and B in the following manner:
C = Fα + B(1 − α),
where α defines the opacity of each pixel and has a value in the range [0, 1]. Accurate image matting is of vital importance in various image and video editing applications. However, since the number of unknowns is much larger than the number of known equations, the problem is under-constrained and the equations cannot be solved directly. Therefore, user interaction in the form of brush strokes, or a black-white-gray trimap supplied as input, is generally used to simplify the problem.
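The compositing relation and its ambiguity can be illustrated with a minimal sketch. The function name `composite` and the sample colors are hypothetical and chosen only for illustration; colors are assumed to be RGB tuples with components in [0, 1].

```python
def composite(foreground, background, alpha):
    """Blend one pixel per channel: C = F * alpha + B * (1 - alpha)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return tuple(f * alpha + b * (1.0 - alpha)
                 for f, b in zip(foreground, background))


# Two different (F, B, alpha) triples yield the same observed color C,
# which is why C alone does not determine the matte.
c1 = composite((1.0, 1.0, 1.0), (0.0, 0.0, 0.0), 0.5)
c2 = composite((0.5, 0.5, 0.5), (0.0, 0.0, 0.0), 1.0)
```

Both calls produce the same mid-gray pixel, so additional constraints such as a trimap are needed to recover α.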
Existing image matting methods can usually be divided into three categories: sampling-based methods, affinity-based methods, and comprehensive methods which combine the two.
Sampling-based image foreground matting simultaneously estimates the α (alpha) value of a pixel as well as its foreground color and background color. Different methods use different parametric or non-parametric models to sample neighboring pixels from the known foreground and background areas. Ruzon and Tomasi assume that unknown pixels lie in a narrow band at the edge of the foreground area. This method was later extended by Chuang et al. with a Bayesian framework. These methods provide good results when the unknown pixels are located near the edge of the foreground and the number of unknown pixels is relatively small. Rhemann et al. propose an improved color model based on geodesic distance sampling. In the shared matting method, pixels are sampled along rays in different directions. Generally, these methods perform relatively well when the color neighborhoods are smooth.
Affinity-based image foreground matting solves for alpha mattes independently of the foreground and background colors. In the Poisson matting method, it is assumed that the gradient of the alpha matte is proportional to that of the image. In the image foreground matting method based on the random walk algorithm (random walk matting), the random walk algorithm is used to solve for α values according to the color similarity of neighboring pixels. In the closed-form matting method, a color line model is assumed within a neighborhood window, and the problem of alpha matting is solved by minimizing a cost function. In the spectral analysis-based image foreground matting method (spectral matting), the relationship with spectral clustering is explored so that the method is extended into an unsupervised one. Laplacian image matting has been combined with various data constraints, priors, or learning-based methods to solve the problem of image matting. However, the neighborhood smoothness assumption alone is insufficient for complicated images. Therefore, we combine it with the non-neighborhood smoothness prior to improve the results.
Image foreground matting methods which integrate sampling and affinity strike a good balance between the two approaches. In the robust matting method, samples with a high confidence degree are collected first, and the image foreground matting energy is then minimized by the random walk algorithm. In the global sampling matting method, the random search algorithm from the PatchMatch algorithm is used to search for globally optimal samples.
In closed-form matting, the Laplacian matrix for image foreground matting is obtained from the color line model and is used to constrain the alpha matte within the neighborhood window. This neighborhood smoothness prior can be combined with the data term obtained from color sampling. Such a smoothness prior works well in image areas that contain only a constant number of foreground and background colors. He et al. use a generalized PatchMatch method to improve the effect of color sampling. Recent research indicates that the data term and the neighborhood smoothness term can be combined to provide high quality results. However, when calculating the Laplacian matrix, it is difficult to set a proper neighborhood window. A small window may be insufficient to capture the detail information of structures. On the other hand, a large window may violate the color line model, which also leads to bad results.
Recently, Chen et al. proposed a manifold preserving edit propagation method and applied it to transparent image matting. We note that this method in fact amounts to a novel alpha matting approach based on a non-neighborhood smoothness prior. In this method, the α values of remote pixels are linked together, which is complementary to Laplacian matting. When only this non-neighborhood smoothness prior is applied, the neighborhood structure information of a translucent object is not captured. Thus, we propose to combine this non-neighborhood smoothness prior with the neighborhood Laplacian smoothness prior, and to include both together with an ordinary data term. Our novel image matting algorithm exhibits excellent performance on the standard test data set.
It is an object of the present invention to provide a novel image foreground matting method based on neighborhood and non-neighborhood smoothness priors. This novel method has excellent performance on the standard test data set.
To accomplish the object of the invention, the present invention adopts the following technical solutions.
There is proposed an image foreground matting method based on neighborhood and non-neighborhood smoothness priors, which comprises the following steps:
step S100, marking the foreground area, the background area, and the unknown area in the input image, wherein the foreground area is the area containing the image content to be extracted, the background area is the area containing content which does not need to be extracted, and the unknown area is an area in which the foreground and background overlap and cannot readily be distinguished, for example, an area of upstanding hair strands;
step S200, for each pixel in the unknown area of the input image, initializing by a color sampling method the α value, i.e., the probability that the pixel belongs to the foreground, calculating the confidence degree of the α value, admitting the α values of pixels whose confidence degree is larger than a given threshold, marking these pixels as known pixels, setting the α value of each pixel in the foreground area to a maximum value, and setting the α value of each pixel in the background area to a minimum value. Usually α lies between 0 and 1: the foreground area has an α value of 1, the background area has an α value of 0, and a pixel in the unknown area has an α value larger than 0 but smaller than 1. Step S200 aims to determine the α values of as many pixels as possible, in preparation for calculating more accurate α values for those pixels in the unknown area whose α values cannot be determined in step S200 (i.e., pixels whose α values can be calculated but are not admitted). Once the α value of an unknown pixel has been calculated, it is possible to judge whether the pixel belongs to the foreground or the background;
step S300, calculating data term weights of each pixel in the input image according to α values of each pixel, calculating neighborhood smoothness constraint term weights and non-neighborhood smoothness constraint term weights of each pixel, and constructing the overall graph patterns for all pixels of the input image according to these three kinds of weights;
step S400, according to α values of all foreground area pixels, background area pixels and the known pixels in the unknown area, under the constraint of graph patterns from step S300, solving probabilities that each pixel belongs to the foreground by minimizing the energy equation, so as to obtain alpha mattes. As a result, the task of distinguishing whether each pixel in the input image belongs to the foreground or background is completed.
Preferably, in step S100 of said image foreground matting method based on neighborhood and non-neighborhood smoothness priors, a user marks the foreground area, the background area, and the unknown area in the input image via a brush type interaction, or the user inputs a trimap to mark the foreground area, the background area, and the unknown area in the input image. This step is completed by the user. In this step, the user selects the foreground area, the background area, and the fuzzy area (unknown area) with a brush, or distinguishes the foreground area, the background area, and the unknown area by inputting a trimap of the same size as that of the input image.
Preferably, in said image foreground matting method based on neighborhood and non-neighborhood smoothness priors, α values of each pixel in the foreground area are set to the maximum value 1, and α values of each pixel in the background area are set to the minimum value 0.
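Preferably, in step S100 of said image foreground matting method based on neighborhood and non-neighborhood smoothness priors, marking the foreground area, the background area, and the unknown area by the user via the brush type interaction comprises: marking foreground points and background points in the image by the user with a brush, wherein pixels covered by a white brush are foreground pixels, pixels covered by a black brush are background pixels, and the other pixels are the unknown pixels.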
Preferably, in step S200 of said image foreground matting method based on neighborhood and non-neighborhood smoothness priors, the α value, i.e., the probability that each pixel in the unknown area belongs to the foreground, is calculated by:
searching the input image for the k foreground pixels F_i (i = 1, ..., k) and the k background pixels B_j (j = 1, ..., k) which are nearest to the position of the unknown pixel, wherein each foreground pixel is paired with each background pixel to form k² foreground-background point pairs (F_i, B_j), and calculating an α′ value for each foreground-background point pair according to the following equation to obtain k² α′ values:

$$\alpha' = \frac{(C - B_j)\cdot(F_i - B_j)}{\lVert F_i - B_j \rVert^{2}},$$

where C is the color value of the unknown pixel,
wherein the confidence degree of each α′ value is further calculated according to

$$d(F_i, B_j) = \lVert C - (\alpha' F_i + (1 - \alpha') B_j) \rVert,$$

i.e., the confidence degree is calculated from the difference d(F_i, B_j) between the color value C of the unknown pixel and the color value α′F_i + (1 − α′)B_j estimated on the basis of the α′ value, thus obtaining k² difference values, wherein a small difference value indicates a high confidence degree of the α′ value; selecting the α′ value with the highest confidence degree as the α value of the unknown pixel, and selecting the corresponding confidence degree as the confidence degree for the α value of the unknown pixel; admitting the α value of an unknown pixel whose confidence degree is larger than a threshold d, and admitting that unknown pixel as a known pixel. Finally, the remaining α values which have not been admitted must be solved for in the subsequent steps, so as to obtain more accurate estimates.
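The pairwise sampling described above can be sketched as follows. This is an illustrative sketch, not the claimed implementation: the function name `estimate_alpha`, the clamping of α′ to [0, 1], and the skipping of pairs with identical colors are assumptions. The α′ formula is the standard projection of C onto the line through B_j and F_i.

```python
import math


def estimate_alpha(c, fg_samples, bg_samples):
    """Return (alpha, distance) of the best foreground/background pair.

    c and every sample are (r, g, b) tuples. A smaller distance d(F, B)
    corresponds to a higher confidence degree.
    """
    best = None
    for f in fg_samples:
        for b in bg_samples:
            fb = [fi - bi for fi, bi in zip(f, b)]
            denom = sum(v * v for v in fb)
            if denom == 0.0:
                continue  # assumption: skip pairs whose colors coincide
            cb = [ci - bi for ci, bi in zip(c, b)]
            alpha = sum(x * y for x, y in zip(cb, fb)) / denom
            alpha = min(1.0, max(0.0, alpha))  # assumption: clamp to [0, 1]
            recon = [alpha * fi + (1.0 - alpha) * bi for fi, bi in zip(f, b)]
            dist = math.sqrt(sum((ci - ri) ** 2 for ci, ri in zip(c, recon)))
            if best is None or dist < best[1]:
                best = (alpha, dist)
    return best


# A gray pixel explained exactly by the white/black pair.
alpha, dist = estimate_alpha((0.5, 0.5, 0.5),
                             [(1.0, 1.0, 1.0), (1.0, 0.0, 0.0)],
                             [(0.0, 0.0, 0.0)])
```

A caller would then admit the pixel as known only when the distance is small enough, which corresponds to the confidence threshold described above.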
Preferably, in step S300 of said image foreground matting method based on neighborhood and non-neighborhood smoothness priors, calculating data term weights of each pixel according to α values of each pixel is performed by:
step S310, calculating the data term weight according to the following equations, wherein the data term weight comprises two terms, i.e., the weight value W(i,F), which indicates the probability that the pixel belongs to the foreground, and the weight value W(i,B), which indicates the probability that the pixel belongs to the background,
W(i,F) = γα, W(i,B) = γ(1 − α),
where γ is a parameter which balances the data term against the smoothness terms. Namely, each pixel has two data term weights.
Preferably, in said image foreground matting method based on neighborhood and non-neighborhood smoothness priors, the parameter γ is set to 0.1.
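As a small worked example of the data term, the two weights for a single pixel can be computed as below. The function name `data_term_weights` is hypothetical; γ = 0.1 is the value stated in the text.

```python
GAMMA = 0.1  # balance parameter gamma, as stated in the text


def data_term_weights(alpha, gamma=GAMMA):
    """Return (W(i,F), W(i,B)) = (gamma * alpha, gamma * (1 - alpha))."""
    return gamma * alpha, gamma * (1.0 - alpha)


# A pixel that is 75% foreground is tied more strongly to the
# foreground virtual node than to the background virtual node.
w_f, w_b = data_term_weights(0.75)
```

The two weights of a pixel always sum to γ, so γ alone sets how strongly the data term competes with the smoothness terms.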
Preferably, in step S300 of said image foreground matting method based on neighborhood and non-neighborhood smoothness priors, calculating the neighborhood smoothness constraint term weights of the pixel is performed by:
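step S320, as for pixel i, constructing the neighborhood smoothness constraint by the Laplacian approach within a fixed-size window w_k of m*m pixels, and calculating the neighborhood smoothness constraint term weight value $W_{ij}^{lap}$ of each neighboring pixel j according to the following equation:

$$W_{ij}^{lap} = \delta \sum_{k \mid (i,j) \in w_k} \frac{1 + (C_i - \mu_k)^{T} \left(\Sigma_k + \frac{\varepsilon}{m^{2}} I\right)^{-1} (C_j - \mu_k)}{m^{2}},$$

where the number of neighboring pixels j is m², lap indicates the Laplacian approach, δ is a parameter which controls the intensity of neighborhood smoothness, μ_k and Σ_k represent the color mean and covariance of the m*m pixels in the window w_k respectively, ε is a regularization coefficient, C_i represents the color value of pixel i, and I is an identity matrix.
Preferably, in said image foreground matting method based on neighborhood and non-neighborhood smoothness priors, the regularization coefficient ε is set to 10⁻⁵, and m has a value of 3.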
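The contribution of a single window to the neighborhood smoothness weights can be sketched as follows, assuming the closed-form matting affinity for one 3*3 window. The helper names `invert_3x3` and `window_affinity`, the default δ = 1.0 (the text gives no value for δ), and the sample colors are assumptions. The complete weight for a pixel pair is the sum of such contributions over every window containing both pixels.

```python
def invert_3x3(m):
    """Invert a 3x3 matrix using the adjugate formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if det == 0.0:
        raise ValueError("singular matrix")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def window_affinity(colors, delta=1.0, eps=1e-5):
    """Per-window affinity between every pair of pixels in one window.

    colors holds the m*m RGB tuples of the window (9 tuples for m = 3).
    Entry [i][j] is delta * (1 + (Ci - mu)^T (Sigma + eps/m^2 I)^-1 (Cj - mu)) / m^2.
    """
    n = len(colors)
    mean = [sum(c[ch] for c in colors) / n for ch in range(3)]
    centered = [[c[ch] - mean[ch] for ch in range(3)] for c in colors]
    cov = [[sum(v[a] * v[b] for v in centered) / n for b in range(3)]
           for a in range(3)]
    for a in range(3):
        cov[a][a] += eps / n  # regularization term eps / m^2
    inv = invert_3x3(cov)

    def quad(u, v):
        return sum(u[a] * inv[a][b] * v[b] for a in range(3) for b in range(3))

    return [[delta * (1.0 + quad(centered[i], centered[j])) / n
             for j in range(n)] for i in range(n)]


example_window = [
    (0.1, 0.2, 0.3), (0.9, 0.1, 0.4), (0.5, 0.8, 0.2),
    (0.3, 0.3, 0.9), (0.7, 0.6, 0.1), (0.2, 0.9, 0.6),
    (0.8, 0.4, 0.7), (0.4, 0.5, 0.5), (0.6, 0.7, 0.8),
]
affinity = window_affinity(example_window)
```

Because the centered colors sum to zero, each row of the per-window affinity sums to δ, which is a convenient sanity check on an implementation.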
Preferably, in step S300 of said image foreground matting method based on neighborhood and non-neighborhood smoothness priors, calculating the non-neighborhood smoothness constraint term weights of the pixel is performed by:
step S330, generating a characteristic space which comprises all pixels according to the characteristic values of each pixel in the input image, getting the K neighboring pixels in the characteristic space which are nearest to pixel i in terms of Euclidean distance, constructing the non-neighborhood smoothness constraint by a locally linear embedding (LLE) dimension reduction approach, and obtaining the non-neighborhood smoothness constraint term weight value $W_{im}^{lle}$ under the constraint $\sum_{m=1}^{K} W_{im}^{lle} = 1$ by minimizing the following equation:

$$\sum_{i=1}^{N} \left\lVert X_i - \sum_{m=1}^{K} W_{im}^{lle} X_{im} \right\rVert^{2},$$

where lle indicates the locally linear embedding dimension reduction approach, m indexes a neighboring pixel among the K neighboring pixels, N is the number of all pixels in the input image, X_i indicates the characteristic value of pixel i, and X_im indicates the characteristic value of the m-th neighboring pixel of pixel i.
Preferably, in said image foreground matting method based on neighborhood and non-neighborhood smoothness priors, the characteristic value of pixel i comprises (r_i, g_i, b_i, x_i, y_i), where r_i, g_i, b_i are the RGB color values of pixel i and x_i, y_i are the coordinate positions of pixel i in the input image, so that said characteristic space is a five-dimensional characteristic space which comprises the characteristic values r_i, g_i, b_i, x_i, y_i.
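The constrained least-squares problem for the weights of one pixel can be sketched as below, using the usual locally linear embedding solution: build the local Gram matrix of differences, solve against a vector of ones, and normalize. The function names and the diagonal regularization `reg` are assumptions added for numerical stability; they are not taken from the text.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-15:
            raise ValueError("singular system")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def lle_weights(x_i, neighbors, reg=1e-3):
    """Weights w minimizing ||x_i - sum_m w_m * neighbors[m]||^2, sum(w) = 1."""
    k = len(neighbors)
    diffs = [[a - b for a, b in zip(x_i, nb)] for nb in neighbors]
    gram = [[sum(p * q for p, q in zip(diffs[m], diffs[n])) for n in range(k)]
            for m in range(k)]
    trace = sum(gram[m][m] for m in range(k))
    bump = reg * trace if trace > 0.0 else reg  # assumption: regularize
    for m in range(k):
        gram[m][m] += bump
    w = solve_linear(gram, [1.0] * k)
    total = sum(w)
    return [v / total for v in w]


# A feature vector midway between its two neighbors gets equal weights.
weights = lle_weights((0.5, 0.5, 0.5, 0.5, 0.5),
                      [(0.0, 0.0, 0.0, 0.0, 0.0), (1.0, 1.0, 1.0, 1.0, 1.0)])
```

Normalizing the solution of the Gram system enforces the sum-to-one constraint without introducing an explicit Lagrange multiplier.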
Preferably, in step S400 of said image foreground matting method based on neighborhood and non-neighborhood smoothness priors, according to the admitted α values of the known pixels in the unknown area, the α values of pixels in the known foreground area, and the α values of pixels in the known background area, under the constraint of the graph patterns from step S300, the probability that each pixel belongs to the foreground is solved by minimizing the following energy equation, thus obtaining alpha mattes:

$$E = \lambda \sum_{i \in S} (\alpha_i - g_i)^{2} + \sum_{i \in N} \Big( \sum_{j \in N_i} W_{ij} (\alpha_i - \alpha_j) \Big)^{2},$$

where E is the energy, λ is a weight coefficient, S is the set of all pixels in the input image whose α values are known, g_i is the α value determined in step S200 for a pixel whose α value is known, α_i is the optimal α value of pixel i to be solved for in the above energy equation, N refers to the set of all pixels in the graph patterns together with two virtual pixel sets Ω_F and Ω_B which correspond to the data terms W(i,F) and W(i,B) respectively, i indicates pixel i, N_i indicates the set of neighboring pixels of pixel i, and said set of neighboring pixels N_i comprises the K neighboring pixels of step S330, the neighboring pixels among the m*m pixels of step S320, the foreground virtual neighboring pixel which corresponds to W(i,F), and the background virtual neighboring pixel which corresponds to W(i,B), wherein W_ij represents three kinds of weight values, which comprise the data term weight values W(i,F) and W(i,B), the neighborhood smoothness term weight value $W_{ij}^{lap}$, and the non-neighborhood smoothness term weight value $W_{ij}^{lle}$.
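The above function can be expressed in the following matrix form:

$$E = (\alpha - G)^{T} \Lambda (\alpha - G) + \alpha^{T} L^{T} L \alpha,$$

where:

$$L_{ij} = \begin{cases} W_{ii}, & i = j, \\ -W_{ij}, & i \text{ and } j \text{ are neighbors}, \\ 0, & \text{otherwise}, \end{cases} \qquad W_{ii} = \sum_{j \in N_i} W_{ij},$$

Λ is a diagonal matrix with Λ_ii = λ if i ∈ S and Λ_ii = 0 otherwise, and G is a vector with G_i = g_i if i ∈ S and G_i = 0 otherwise.
The above energy equation in matrix form is quadratic in α, and the energy is minimized by solving the following linear equation for α:

$$(\Lambda + L^{T} L)\,\alpha = \Lambda G.$$

The above equation is a system of sparse linear equations, and its globally optimal closed-form solution can be obtained by a preconditioned conjugate gradient method.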
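The linear system can be illustrated on a tiny graph. The sketch below builds L from a dictionary of edge weights, forms Λ + LᵀL and ΛG, and solves with a Jacobi-preconditioned conjugate gradient. The five-pixel chain, the uniform weights, λ = 100, the dense list-of-lists storage, and the choice of a Jacobi preconditioner are all assumptions made for illustration; a practical implementation would use sparse matrices.

```python
def build_system(n, weights, known, lam):
    """Return (A, b) with A = Lambda + L^T L and b = Lambda G.

    weights maps (i, j) to W_ij; known maps pixel index i to g_i.
    """
    lap = [[0.0] * n for _ in range(n)]
    for (i, j), w in weights.items():
        lap[i][j] -= w  # L_ij = -W_ij
        lap[i][i] += w  # L_ii = sum_j W_ij
    a = [[sum(lap[k][r] * lap[k][c] for k in range(n)) for c in range(n)]
         for r in range(n)]
    b = [0.0] * n
    for i, g in known.items():
        a[i][i] += lam
        b[i] = lam * g
    return a, b


def pcg(a, b, tol=1e-8, max_iter=1000):
    """Conjugate gradient with a Jacobi (diagonal) preconditioner."""
    n = len(b)
    x = [0.0] * n
    r = b[:]  # residual b - A x for the initial guess x = 0
    z = [r[i] / a[i][i] for i in range(n)]
    p = z[:]
    rz = sum(ri * zi for ri, zi in zip(r, z))
    for _ in range(max_iter):
        if sum(ri * ri for ri in r) ** 0.5 < tol:
            break
        ap = [sum(a[i][j] * p[j] for j in range(n)) for i in range(n)]
        step = rz / sum(pi * api for pi, api in zip(p, ap))
        x = [xi + step * pi for xi, pi in zip(x, p)]
        r = [ri - step * api for ri, api in zip(r, ap)]
        z = [r[i] / a[i][i] for i in range(n)]
        rz_new = sum(ri * zi for ri, zi in zip(r, z))
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


# Chain of five pixels; the ends are known background (0) and foreground (1).
chain = {}
for idx in range(4):
    chain[(idx, idx + 1)] = 1.0
    chain[(idx + 1, idx)] = 1.0
matrix, rhs = build_system(5, chain, {0: 0.0, 4: 1.0}, lam=100.0)
alpha = pcg(matrix, rhs)
```

In this symmetric example the middle pixel receives an α value of one half, and the values increase from the background end to the foreground end.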
The image foreground matting method based on neighborhood and non-neighborhood smoothness priors of the present invention can accurately extract fine foreground details, such as hair strands, from an image.
The present invention proposes a novel image foreground matting method based on neighborhood and non-neighborhood smoothness priors. The method comprises the following steps:
step S100, interactively marking foreground points and background points by a user, which comprises:
marking foreground points and background points in the image by the user with a brush, wherein pixels covered by a white brush are foreground pixels, pixels covered by a black brush are background pixels, and the other pixels are the unknown pixels; or
providing a black-white-gray trimap of the same size as that of the input image by the user, wherein pixels of the input image to which the white area corresponds are the known foreground pixels, pixels of the input image to which the black area corresponds are the known background pixels, and pixels of the input image to which the gray area corresponds are the unknown pixels;
setting α values of the known foreground pixels to 1, and setting α values of the known background pixels to 0.
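The trimap initialization of step S100 can be sketched as follows. The 8-bit encoding (255 for white, 0 for black, any other value for gray) and the function name `init_alpha_from_trimap` are assumptions; the text only specifies white, black, and gray areas.

```python
FOREGROUND = 255  # white area of the trimap (assumed 8-bit encoding)
BACKGROUND = 0    # black area of the trimap (assumed 8-bit encoding)


def init_alpha_from_trimap(trimap):
    """Return (alpha, known) grids of the same shape as the trimap.

    Known foreground pixels get alpha 1.0, known background pixels get
    alpha 0.0, and unknown (gray) pixels get alpha None.
    """
    alpha, known = [], []
    for row in trimap:
        alpha_row, known_row = [], []
        for value in row:
            if value == FOREGROUND:
                alpha_row.append(1.0)
                known_row.append(True)
            elif value == BACKGROUND:
                alpha_row.append(0.0)
                known_row.append(True)
            else:
                alpha_row.append(None)
                known_row.append(False)
        alpha.append(alpha_row)
        known.append(known_row)
    return alpha, known


alpha0, known0 = init_alpha_from_trimap([[0, 128, 255],
                                         [0, 128, 255]])
```

The `known` grid marks the pixels whose α values are fixed from the start; the pixels admitted in step S200 would later be added to the same set.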
Step S200, as for the color value C of each unknown pixel, searching for the k foreground pixels F_i and the k background pixels B_j which are nearest to the unknown pixel in terms of spatial distance, pairing the foreground pixels and background pixels to form k² foreground-background point pairs, and, according to the color of the unknown pixel, calculating for each point pair an estimate α′ of the probability that the unknown pixel belongs to the foreground according to the following equation:

$$\alpha' = \frac{(C - B_j)\cdot(F_i - B_j)}{\lVert F_i - B_j \rVert^{2}};$$

calculating the confidence degree of the unknown pixel according to the difference d(F_i, B_j) between the color value C of the unknown pixel and the color value α′F_i + (1 − α′)B_j which is restored on the basis of α′ and the foreground-background point pair; selecting the α′ value of the foreground-background point pair with the highest confidence degree as the estimated α value of C, and selecting the corresponding confidence degree as the confidence degree of the estimated α value of C; admitting the α value of an unknown pixel whose confidence degree is larger than a threshold, and admitting that pixel as a known pixel.
Step S300, calculating data term weights, neighborhood smoothness constraint term weights and non-neighborhood smoothness constraint term weights of each pixel in the input image to construct graph patterns of all pixels of the input image, which comprises the step of:
step S310, calculating the data term weight for each pixel in the input image, wherein the data term weight is expressed by W(i,F), which indicates the probability that the pixel belongs to the foreground, and W(i,B), which indicates the probability that the pixel belongs to the background, the values of W(i,F) and W(i,B) are determined from the known α values or the α values estimated in step S200, and W(i,F) and W(i,B) are calculated according to the following equations:
W(i,F) = γα, W(i,B) = γ(1 − α),
where γ is a parameter which balances the data term against the smoothness terms. In our tests, γ is always set to 0.1. Here, W_F and W_B are used to represent {W(i,F) | i = 1, ..., N} and {W(i,B) | i = 1, ..., N}, respectively.
Step S320, as for each pixel i in the image, constructing the neighborhood smoothness constraint by the Laplacian approach within a 3*3 fixed-size window w_k, and calculating the neighborhood smoothness constraint term weight value $W_{ij}^{lap}$ of the neighboring pixel j according to the following equation:

$$W_{ij}^{lap} = \delta \sum_{k \mid (i,j) \in w_k} \frac{1 + (C_i - \mu_k)^{T} \left(\Sigma_k + \frac{\varepsilon}{9} I\right)^{-1} (C_j - \mu_k)}{9},$$

where δ is a parameter which controls the intensity of neighborhood smoothness, μ_k and Σ_k represent the color mean and covariance of the pixels in each window respectively, and ε is a regularization coefficient which is set to 10⁻⁵.
The neighborhood smoothness prior can enhance the neighborhood smoothness of the image foreground matting result. However, it is insufficient to apply only the neighborhood smoothness prior. When calculating the Laplacian matrix, it is difficult to set a proper size for the neighborhood window. A small window may be insufficient to capture the detail information of structures. On the other hand, a large window may violate the color line model, which also leads to bad results.
Step S330, as for each pixel i of the input image, getting the K neighboring pixels in the characteristic space which are nearest to pixel i in terms of Euclidean distance, constructing the non-neighborhood smoothness constraint by a locally linear embedding (LLE) dimension reduction approach, and obtaining the non-neighborhood smoothness constraint term weight value $W_{im}^{lle}$ under the constraint $\sum_{m=1}^{K} W_{im}^{lle} = 1$ by minimizing the following equation:

$$\sum_{i=1}^{N} \left\lVert X_i - \sum_{m=1}^{K} W_{im}^{lle} X_{im} \right\rVert^{2},$$

where X_i indicates the characteristics (r_i, g_i, b_i, x_i, y_i) of pixel i, X_im indicates the characteristics of the m-th neighboring pixel of pixel i, (r_i, g_i, b_i) is the RGB color value of pixel i, and (x_i, y_i) is the coordinate position of pixel i in the image. The resulting matrix W^lle represents the non-neighborhood manifold constraint.
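The neighbor search in the five-dimensional characteristic space can be sketched with a brute-force search. The function names are hypothetical, and the normalization of the coordinates to [0, 1] is an assumption made so that color and position are on comparable scales; the text states only that the characteristics are (r, g, b, x, y). A practical implementation would use a spatial index such as a k-d tree instead of an exhaustive search.

```python
def rgbxy_features(image):
    """Flatten a 2D grid of (r, g, b) tuples into (r, g, b, x, y) features.

    Assumption: x and y are scaled to [0, 1] by the image extent.
    """
    height, width = len(image), len(image[0])
    x_scale = max(width - 1, 1)
    y_scale = max(height - 1, 1)
    return [(r, g, b, x / x_scale, y / y_scale)
            for y, row in enumerate(image)
            for x, (r, g, b) in enumerate(row)]


def k_nearest(features, i, k):
    """Indices of the k features closest to feature i (Euclidean distance)."""
    x_i = features[i]
    dists = [(sum((a - b) ** 2 for a, b in zip(x_i, x_j)), j)
             for j, x_j in enumerate(features) if j != i]
    dists.sort()
    return [j for _, j in dists[:k]]


# One row of four pixels: red, blue, blue, red.
red, blue = (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)
features = rgbxy_features([[red, blue, blue, red]])
nearest = k_nearest(features, 0, 2)
```

In this example the nearest neighbor of the first red pixel is the other red pixel at the far end of the row, not the adjacent blue pixel, which is the sense in which the constraint links remote pixels.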
In a foreground matting algorithm which utilizes the non-neighborhood smoothness prior, for example, the neighborhood linear preserving edit propagation algorithm, the α values of the known pixels are kept constant, and each pixel is regarded as a linear combination of its neighboring pixels in the characteristic space.
When only the non-neighborhood smoothness prior is applied, it is insufficient to solve the problem of alpha matting accurately.
Step S400, according to the α values of the known pixels, under the constraint of the graph patterns from step S300, solving the probability that each pixel belongs to the foreground by minimizing the following energy equation, so as to obtain alpha mattes:

$$E = \lambda \sum_{i \in S} (\alpha_i - g_i)^{2} + \sum_{i \in N} \Big( \sum_{j \in N_i} W_{ij} (\alpha_i - \alpha_j) \Big)^{2},$$

where N refers to all points in the graph patterns, and comprises all pixels in the image lattice and two sets of virtual points Ω_F and Ω_B (which represent the foreground pixels and the background pixels respectively); W_ij represents three kinds of weight values, which comprise the data terms W(i,F) and W(i,B), the neighborhood smoothness term $W_{ij}^{lap}$, and the non-neighborhood smoothness term $W_{ij}^{lle}$; and the set N_i refers to the set of neighbors of pixel i, and comprises the two virtual points, the neighboring pixels within the 3*3 window, and the K nearest pixels in the RGBXY space.
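The above function can be expressed in the following matrix form:

$$E = (\alpha - G)^{T} \Lambda (\alpha - G) + \alpha^{T} L^{T} L \alpha,$$

where:

$$L_{ij} = \begin{cases} W_{ii}, & i = j, \\ -W_{ij}, & i \text{ and } j \text{ are neighbors}, \\ 0, & \text{otherwise}, \end{cases} \qquad W_{ii} = \sum_{j \in N_i} W_{ij},$$

Λ is a diagonal matrix with Λ_ii = λ if i ∈ S and Λ_ii = 0 otherwise, and G is a vector with G_i = g_i if i ∈ S and G_i = 0 otherwise.
The above energy equation in matrix form is quadratic in α, and the energy is minimized by solving the following linear equation for α:

$$(\Lambda + L^{T} L)\,\alpha = \Lambda G.$$

The above equation is a system of sparse linear equations, and its globally optimal closed-form solution can be obtained by a preconditioned conjugate gradient method.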
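To make the energy concrete, the sketch below evaluates it directly from per-pixel neighbor lists for two candidate mattes. The function name `matting_energy`, the three-pixel chain, the weights, and λ = 100 are illustrative assumptions, and the virtual foreground and background nodes of the data term are omitted for brevity.

```python
def matting_energy(alpha, neighbors, known, lam):
    """Evaluate E = lam * sum_{i in S} (alpha_i - g_i)^2
                  + sum_i (sum_{j in N_i} W_ij * (alpha_i - alpha_j))^2.

    neighbors maps pixel i to {j: W_ij}; known maps pixel i to g_i.
    """
    data = sum((alpha[i] - g) ** 2 for i, g in known.items())
    smooth = 0.0
    for i, nbrs in neighbors.items():
        inner = sum(w * (alpha[i] - alpha[j]) for j, w in nbrs.items())
        smooth += inner ** 2
    return lam * data + smooth


# Chain of three pixels with known ends (background, foreground).
neighbors = {0: {1: 1.0}, 1: {0: 0.5, 2: 0.5}, 2: {1: 1.0}}
known = {0: 0.0, 2: 1.0}
e_linear = matting_energy([0.0, 0.5, 1.0], neighbors, known, lam=100.0)
e_step = matting_energy([0.0, 0.0, 1.0], neighbors, known, lam=100.0)
```

The smoothly varying matte has the lower energy of the two candidates, which is the behavior the smoothness terms are meant to encourage.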
The present invention will be further described by way of example with reference to the drawings.
The general description of the present invention has been set forth above. It is appreciated that all the equivalent modifications to the technical solutions of the present invention fall within the protection scope of the present invention.
Number | Date | Country | Kind |
---|---|---|---|
2013 1 0080379 | Mar 2013 | CN | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/CN2013/074636 | 4/24/2013 | WO | 00 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2014/139196 | 9/18/2014 | WO | A |
Number | Name | Date | Kind |
---|---|---|---|
8391594 | Wang et al. | Mar 2013 | B1 |
20100061628 | Yamada | Mar 2010 | A1 |
20110229024 | El-Maraghi et al. | Sep 2011 | A1 |
20140071347 | Chen et al. | Mar 2014 | A1 |
Number | Date | Country | |
---|---|---|---|
20150220805 A1 | Aug 2015 | US |