The present invention relates to the field of image processing and, more particularly, to methods and systems for obtaining an alpha matte for an image.
Image matting, in general, refers to the process of decomposing an observed image into a foreground object image, a background image and an alpha matte. Image matting is typically used to composite the decomposed foreground image with a new background image, by using the alpha matte. For example, a composite image may be formed by combining the decomposed foreground image (or the decomposed background image) with a background (or a foreground) of another image, by using the decomposed alpha matte. Image matting is widely used in image editing, and in film and video motion picture production, to combine different visual elements from separate sources into a single image.
It is generally difficult to determine the alpha matte from the observed image. Image decomposition is generally under-constrained due to many unknown variables. For example, for each pixel in the observed image, an alpha value (one unknown), foreground color values (three unknowns, for example, red, blue, and green), and background color values (three unknowns, such as red, blue, and green) are used to decompose the observed image. Known values for the observed image include the color (for example, red, blue, and green). Accordingly, to decompose the observed image, there are seven unknown variables and three known variables. To determine the alpha matte of the observed image, constraints, in different forms, are typically included in the image decomposition process. These constraints may be based on some type of user feedback or may be based on the incorporation of more images of the same scene.
The present invention is embodied in a method for obtaining an alpha matte for an image. The method determines a first known portion and a second known portion of an image. The method also estimates the alpha matte for the image based on an initial fuzzy connectedness (FC) determination between the entire image and the first and second known portions. The method also obtains a third known portion. The method refines the estimated alpha matte for the image based on a subsequent FC determination between a subset of the image and the first, second, and third known portions for the subset of the image and the initial FC determination for a remainder of the image.
The present invention is also embodied in a system for obtaining an alpha matte for an image. This system includes a controller configured to receive a first known portion, a second known portion, and a third known portion of the image. The system also includes an alpha matte estimator configured to: a) estimate the alpha matte for the image based on an initial FC determination between the entire image and the first and second known portions received from the controller, and b) refine the estimated alpha matte for the image based on a subsequent FC determination between a subset of the image and the first, second and third known portions for the subset of the image and the initial FC determination for a remainder of the image.
The invention may be understood from the following detailed description when read in connection with the accompanying drawing. It is emphasized that, according to common practice, various features of the drawing may not be drawn to scale. On the contrary, the dimensions of the various features may be arbitrarily expanded or reduced for clarity. Moreover, in the drawing, common numerical references are used to represent like features. Included in the drawing are the following figures:
As a general overview, and as will be described in detail below, the present invention is directed to obtaining an alpha matte (α) for an image (I). A first known portion and a second known portion of an image may be determined. Known portions of the image may be indicated by applying strokes to the image or by determining a trimap. As defined herein, each known portion of the image may refer to either a foreground (F) or a background (B) of the image. A general relationship between the foreground, the background and the alpha matte is shown in equation (1) as:
I=αF+(1−α)B. (1)
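As an illustrative sketch only (the array layout and the [0, 1] value range are assumptions, not part of the disclosure), equation (1) may be applied per pixel as follows:

```python
import numpy as np

def composite(alpha, foreground, background):
    """Per-pixel compositing of equation (1): I = alpha*F + (1 - alpha)*B.

    alpha: H x W matte with values in [0, 1];
    foreground, background: H x W x 3 color images.
    """
    a = alpha[..., None]  # broadcast the matte over the color channels
    return a * foreground + (1.0 - a) * background
```

An alpha value of one reproduces the foreground pixel; an alpha value of zero reproduces the background pixel.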
According to aspects of the present invention, the alpha matte may be estimated for the image based on an initial fuzzy connectedness (FC) determination between pixels of the entire image and the first and second known portions.
Additional strokes represent a third known portion, which may be used to refine the estimated alpha matte. For example, a third known portion may be indicated by one or more further strokes on the image or by a subsequent trimap. As defined herein, the third known portion also refers to the foreground or the background of the image. According to aspects of the present invention, the alpha matte may be refined by determining a subsequent FC between pixels of a subset of the image and the first, second, and third known portions for a subset of the image. The present invention also uses the initial FC determination for a remainder of the image to refine the alpha matte. In this manner, a computational overhead may be reduced when additional inputs are used to refine the estimated alpha matte. It is contemplated that any image, including medical images, may be decomposed according to aspects of the present invention.
The FC determination is based on techniques known in fuzzy medical segmentation, which is typically used to determine a partial volume from multiple tissues in a single image. FC determination, in general, determines an adjacency and/or similarity between pixels of the image. As described herein, the FC at each pixel can be computed by searching for a strongest connected path to a known foreground and background. According to an embodiment of the present invention, FC values may be determined between any two pixels as well as between a pixel and pixels corresponding to each stroke. The alpha matte may then be estimated from the FC, as described further below.
An exemplary system will now be described with reference to the individual figures.
User interface 104 may be used to indicate strokes on an image displayed on display 106. User interface 104 may also be used to indicate a trimap (described below). In addition, user interface 104 may be used to select parameters to decompose the image or to composite the image with another image. User interface 104 may further be used to select images to be displayed, composited and/or stored. User interface 104 may include a pointing device-type interface for indicating strokes on an image shown on display 106. User interface 104 may further include a text interface for entering information.
Display 106 may be configured to display an image, including strokes indicated by a user responsive to user interface 104. Display 106 may display the estimated alpha matte, a decomposed foreground image (F), a decomposed background image (B), a further image and/or a composited image. It is contemplated that display 106 may include any display capable of presenting information including textual and/or graphical information.
Storage 108 may store the observed image, strokes indicated by the user, a trimap, the initial and subsequent FC determination, the estimated alpha matte, the decomposed foreground image and/or the decomposed background image. Storage 108 may optionally store composited images. Additionally, storage 108 may act as a buffer to temporarily store an image(s) prior to display by display 106. Storage 108 may be a memory, a magnetic disk, a database or essentially any local or remote device capable of storing data.
The illustrated alpha matte estimator 102 includes a controller 110, similarity estimator 112, FC estimator 114, image partitioner 116, and alpha matte generator 118. Alpha matte estimator 102 may also include composite image generator 120. Controller 110 is configured to receive user inputs from user interface 104, such as strokes, and display the user inputs on display 106. Controller 110 is also configured to control similarity estimator 112, FC estimator 114, image partitioner 116, alpha matte generator 118 and composite image generator 120, responsive to user inputs received from user interface 104. Furthermore, controller 110 may also filter the image prior to estimating the alpha matte. Controller 110 may be a conventional digital signal processor.
Similarity estimator 112 receives an image that is displayed on display 106 and strokes (e.g., foreground and background strokes) provided from user interface 104. Similarity estimator 112 determines a pixel-pixel similarity between adjacent pixels in the image. In addition, similarity estimator 112 determines a pixel-stroke similarity. The pixel-stroke similarity includes a similarity of pixels to the foreground strokes and a similarity of pixels to the background strokes. The pixel-pixel and pixel-stroke similarities are described further below with respect to
Image partitioner 116 receives an initial FC determination (described further below with respect to FC estimator 114) from FC estimator 114, a further indicated stroke (such as a further background stroke or foreground stroke) from user interface 104, and the image. The initial FC determination is based on the combined similarity measure (or trimap). Image partitioner 116 partitions the image into three subsets based on the initial FC determination and the further stroke, as described below with respect to
FC estimator 114 receives the combined similarity measure from similarity estimator 112, an image, and the initial strokes, in order to determine the initial FC values for the foreground strokes and the background strokes. The initial FC determination is described further with respect to
Alpha matte generator 118 receives the FC values determined for each pixel, for both foreground and background strokes, from FC estimator 114. Alpha matte generator 118 generates an alpha matte (and a refined alpha matte) for each pixel based on the foreground and background FC values for the corresponding pixel. Alpha matte generation is described further below with respect to
Alpha matte estimator 102 may optionally include composite image generator 120 for generating a composite image based on the estimated alpha matte and the foreground/background strokes with a further image. Alternatively, composite image generator 120 may be provided remote from system 100. The compositing of an image with another image is described further below with respect to
It is contemplated that system 100 may be configured to connect to a global information network, e.g., the Internet, (not shown) such that the decomposed image including the refined alpha matte may also be transmitted to a remote location for further processing and/or storage.
In step 204, first and second portions of the image are determined. For example, strokes representing the foreground and the background may be indicated via user interface 104 (
Referring back to
In step 210, an additional portion of the image is determined. For example, an additional stroke is indicated via user interface 104 (
In step 216, it is determined whether the alpha matte is sufficiently estimated. If the alpha matte is sufficiently estimated, step 216 proceeds to optional step 218. If the alpha matte is not sufficiently estimated, step 216 proceeds to step 210 and processing continues with steps 210, 212, 214 and 216 until the alpha matte is sufficiently estimated.
In optional step 218, the decomposed foreground image may be composited with a background of a further image, for example, by composite image generator 120 (
In an exemplary embodiment, a Gaussian mixture model (GMM) is used to fit both the foreground stroke colors and background stroke colors. A GMM is used for both the pixel-pixel similarity and the pixel-stroke similarity. In an exemplary embodiment, an International Commission on Illumination 1976 L*u*v*(CIE LUV) color representation is used. CIE LUV is well known to those of skill in the art of image processing. In addition, for each Gaussian in the corresponding GMM, a variance of all three channels (e.g., red, green and blue) is averaged.
The pixel-pixel similarity is shown in equation (2) as:
where p1 and p2 represent pixels, T represents a transpose operation and Σ represents the covariance matrix of the Gaussian that has the largest average variance. By selecting the largest average variance, the FC determination may remain robust in highly textured regions.
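Because the body of equation (2) is not reproduced above, the following Python sketch assumes a standard Gaussian form for the pixel-pixel similarity; the function name and the −0.5 scaling are assumptions for illustration:

```python
import numpy as np

def pixel_pixel_similarity(c1, c2, cov):
    """Illustrative Gaussian similarity between two pixel colors.

    c1, c2: 3-vectors (e.g., CIE LUV color values);
    cov: 3x3 covariance of the GMM component with the largest
    average variance, as described in the text.
    """
    d = c1 - c2
    # Mahalanobis-style distance mapped through an exponential,
    # so identical colors yield similarity 1 and distant colors
    # yield similarity near 0.
    return float(np.exp(-0.5 * d @ np.linalg.inv(cov) @ d))
```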
In step 304, a pixel-stroke similarity is determined between pixels and a second determined portion of the image (i.e., a background stroke). The pixel-stroke similarity measures the color similarity between p1, p2 and the color of the strokes in o, and takes a high value when p1 and p2 are both close to the color of the stroke.
A similarity metric Sf(p) between a color of a pixel (p) to the foreground stroke color is defined in equation (3) as:
where i is an index of the Gaussian in the GMM of the foreground color, and m_i^f and Σ_i^f are the mean vector and covariance matrix, respectively, of the i-th Gaussian. A similarity metric Sb(p) between a color of a pixel (p) to the background stroke color can be similarly determined.
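A sketch of the similarity metric of equation (3), assuming it takes the best match over the Gaussians of the stroke-color GMM (the exact mixture weighting of equation (3) is not reproduced in the text, so this form is an assumption):

```python
import numpy as np

def stroke_similarity(c, means, covs):
    """Illustrative S(p): how well pixel color c matches any Gaussian
    in the stroke-color GMM.

    means, covs: per-component mean 3-vectors and 3x3 covariances.
    """
    best = 0.0
    for m, cov in zip(means, covs):
        d = c - m
        # Unnormalized Gaussian likelihood of the color under this
        # component; keep the strongest component response.
        best = max(best, float(np.exp(-0.5 * d @ np.linalg.inv(cov) @ d)))
    return best
```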
The pixel-stroke similarity for the foreground stroke (f) is determined using the similarity metric (equation 3) as:
where W_min^f(p1, p2)=min[Sf(p1), Sf(p2)] and W_max^f(p1, p2)=max[Sf(p1), Sf(p2)].
Although not shown, the pixel-stroke similarity for the background stroke (b) can be similarly determined.
In step 306, a combined similarity measure (i.e., the affinity measure) is determined from the combination of the pixel-pixel similarity and pixel-stroke similarity measures. Steps 300-306 may be performed by similarity estimator 112 (
The combined similarity measure (i.e. affinity A) is defined by equation (5) as:
A_o(p1, p2)=λμ_ψ^o(p1, p2)+(1−λ)μ_φ^o(p1, p2), o ∈ {f, b} (5)
where λ is used to balance the pixel-pixel and pixel-stroke similarity measures and is between zero and one. The combined similarity measure shown in equation (5) extends the affinity between pixels to color images, and extends the known pixels to foreground and background strokes.
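The combination of equation (5), together with the min/max weights above, may be sketched as follows (the exact manner in which equation (4) combines W_min and W_max is not reproduced in the text, so only the blending step of equation (5) is shown; the function names are assumptions):

```python
def w_min_f(s1, s2):
    """W_min^f(p1, p2): weaker of the two pixel-stroke similarities."""
    return min(s1, s2)

def w_max_f(s1, s2):
    """W_max^f(p1, p2): stronger of the two pixel-stroke similarities."""
    return max(s1, s2)

def combined_affinity(psi, phi, lam=0.5):
    """Equation (5): blend the pixel-pixel similarity psi with the
    pixel-stroke similarity phi, with balance weight lam in [0, 1]."""
    return lam * psi + (1.0 - lam) * phi
```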
In step 308, a first FC value of each pixel to the first portion (e.g., the foreground stroke) is determined. In step 310, a second FC of each pixel to the second portion (e.g., the background stroke) is determined. Steps 308 and 310 may be performed, for example, by FC estimator 114 (
Referring to
Then, the FC between p1 and p2 may be defined as:
where FCf and FCb represent the FC with respect to the foreground and background strokes.
The min and max metrics in respective equations (6) and (7) substantially guarantee that the FC between two pixels inside the same region of the same object is large, provided a path exists along which the color changes smoothly. Accordingly, even if the pixel intensities are different, for example, due to a graded composition of a heterogeneous material of the object, the FC values for the original received image may still be large.
Referring back to
where f and b represent the foreground and the background strokes, respectively. Referring to
Referring back to
Equation (9) may be used to determine a maximum value of pixel-to-pixel FC values.
In step 312, an alpha value for each pixel is determined based on the first and second FC values FCf(p), FCb(p), for example, by alpha matte generator 118 (
In an exemplary embodiment, thresholds of v1 of about 0.95 and v2 of about 0.05 are used to determine if each pixel p represents a foreground (i.e., α=1), a background (i.e., α=0), or is transparent (i.e., α is a function of the ratio of the foreground and background FC values).
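A sketch of this thresholding, assuming the transparent case uses a ratio of the foreground and background FC values (the exact form of equation (10) is not reproduced in the text, so the ratio is an assumption):

```python
def alpha_from_fc(fc_f, fc_b, v1=0.95, v2=0.05):
    """Map a pixel's foreground/background FC values to an alpha value,
    using the thresholds described (v1 about 0.95, v2 about 0.05)."""
    if fc_f >= v1 and fc_b <= v2:
        return 1.0  # confidently foreground
    if fc_b >= v1 and fc_f <= v2:
        return 0.0  # confidently background
    # Transparent pixel: alpha as a function of the ratio of the
    # foreground and background FC strengths (assumed form).
    return fc_f / (fc_f + fc_b)
```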
In an exemplary embodiment, efficient algorithms may be used to compute, for pixels with unknown alpha, their FC to the foreground and the background strokes. The computed FC can then directly be used to evaluate the alpha matte with equation (10).
Computing the FC between two pixels is, in general, a single-source shortest path problem. The only difference is that the cost of each path may be considered to be a minimal affinity between all neighboring pixels in the FC computation. In an exemplary embodiment, Dijkstra's algorithm may be used to compute the FC values. The running time of the algorithm may be further improved by using a Fibonacci heap. In addition, because Dijkstra's algorithm computes a shortest path from a single node to all nodes, the FC from each pixel to the strokes may be determined using the algorithm. Dijkstra's algorithm and the Fibonacci heap are well known to those of skill in the art of image processing.
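The strongest-path search described above may be sketched with a Dijkstra-style traversal; a binary heap is used here in place of the Fibonacci heap, and the graph interface (neighbor and affinity callables) is an assumption for illustration:

```python
import heapq

def fuzzy_connectedness(n, neighbors, affinity, seeds):
    """Strongest-path FC from every node to a seed set, Dijkstra-style.

    The strength of a path is the minimum affinity along it; FC(p) is
    the maximum such strength over all paths from p to any seed.
    neighbors(u) yields nodes adjacent to u; affinity(u, v) is in
    [0, 1]; seed nodes receive FC 1.
    """
    fc = [0.0] * n
    heap = []
    for s in seeds:
        fc[s] = 1.0
        heapq.heappush(heap, (-1.0, s))  # negate for a max-heap
    while heap:
        neg, u = heapq.heappop(heap)
        if -neg < fc[u]:
            continue  # stale heap entry
        for v in neighbors(u):
            cand = min(fc[u], affinity(u, v))  # path strength = weakest link
            if cand > fc[v]:
                fc[v] = cand
                heapq.heappush(heap, (-cand, v))
    return fc
```

Because the traversal settles every node, a single run yields the FC from each pixel to the stroke, matching the single-source property noted above.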
It is desirable to reduce computational overhead when further strokes are added to the image. If a further stroke introduces new colors in the GMM, changes may occur to: i) the affinity between neighboring pixels, and ii) the FC value that is based on the strongest path from the pixel to the foreground and background strokes.
For the affinity measure (equation 5), a computational cost is incurred to calculate the pixel-pixel similarity (equation 2) and pixel-stroke similarity (equation 4). To minimize this cost, the strokes can be indicated to cover most of the main colors of foreground or background in the first iteration so that the cost for updating affinity will be kept minimal when new strokes are added. As described below with respect to
Referring to
∀q ∈ Ω1pp′, FC(p, q) > FC(p, p′)
∀q ∈ Ω2pp′, FC(p, q) = FC(p, p′)
∀q ∈ Ω3pp′, FC(p, q) < FC(p, p′)
The above relationships represent the computation of pixel-pixel FC. The relationships imply that once the initial FC values from each pixel to one pixel p are obtained, to compute the FC value of all pixels to a new pixel p′, only pixels in subset 804 need to recompute the FC using the strongest path search. The remaining pixels in regions 802 and 806 may simply reuse the previously computed FC. Furthermore, the FC for a pixel q in region 804 is either equal to FC(p, p′) or the strongest path for computing the FC lies in region 804.
In an exemplary embodiment, the FC values of pixels in domain 804 may be initialized as FC(p, p′), and a Dijkstra-FC-T algorithm may be used to search for the strongest path within the constrained domain of 804, rather than the entire image. This approach may substantially reduce the computation cost associated with refining the alpha matte.
The results of pixel-pixel FC determination may be extended to pixel-stroke FC determination. Given a new stroke that covers a set of pixels P′, the image Q may be partitioned into three subsets Ω1PP′, Ω2PP′ and Ω3PP′ such that:
∀q ∈ Ω1PP′, FC(P, q) > FC(P, P′)
∀q ∈ Ω2PP′, FC(P, q) = FC(P, P′)
∀q ∈ Ω3PP′, FC(P, q) < FC(P, P′)
The above relationships illustrate that, given the FC values of all pixels to some stroke pixels P, to compute the FC values of all pixels to the newly added stroke pixels P′, the strongest paths for pixels in regions 802 and 806 do not need to be recalculated. Instead, the FC values in regions 802 and 806 may be determined directly from the initially determined FC to stroke pixels P. For the remaining pixels in region 804, where the strongest path is recomputed, the search domain is restricted to region 804.
In an exemplary embodiment, to compute the FC values for pixels in region 804 to the newly added stroke, the FC values of pixels in the new stroke are initialized to about 1, and template values in each subset are set to about 0. The Dijkstra-FC-T algorithm is used to determine the subsequent FC values to region 804 as well as the FC values to regions 802 and 806 that rely on the initial FC values.
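The three-way partition described above may be sketched as follows; the flat-list representation of pixel FC values is an assumption for illustration:

```python
def split_for_update(fc_to_p, fc_pp):
    """Partition pixel indices by comparing each stored FC value with
    FC(p, p').

    Pixels strictly above or strictly below the threshold reuse their
    previously computed FC; only pixels equal to FC(p, p') need a new
    strongest-path search, restricted to that subset.
    """
    reuse_hi = [q for q, v in enumerate(fc_to_p) if v > fc_pp]
    recompute = [q for q, v in enumerate(fc_to_p) if v == fc_pp]
    reuse_lo = [q for q, v in enumerate(fc_to_p) if v < fc_pp]
    return reuse_hi, recompute, reuse_lo
```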
Referring back to
In step 408, alpha values for an alpha matte are generated based on the refined FC, for example, by alpha matte generator 118 (
In step 500, a foreground color is determined based at least on a first determined portion (i.e., a first foreground stroke) and the refined alpha matte. In step 502, a background color is determined based at least on the second portion (the initial background stroke) and the refined alpha matte. In step 504, the background of a further image is composited with the foreground color using the refined alpha matte.
To determine the foreground and background colors, a GMM is fitted to represent both the foreground and background colors of the pixels whose FC exceeds the thresholds (v1,v2). An optimal pair of foreground and background colors are determined that minimize a fitting error based on equation (10).
The present invention is illustrated by reference to a number of examples. The examples are included to more clearly demonstrate the overall nature of the invention. These examples are exemplary, and not restrictive of the invention.
The images shown in
In
In
If the exemplary alpha matte estimator (
Because the exemplary alpha matte estimator only re-computes the FC values for a small subset of the pixels, the processing time may be reduced with the addition of subsequent strokes. For example, the last iteration shown in
Each estimator shown in
Furthermore, interactive BP (
Referring generally to
For the peacock image (
Although the invention has been described in terms of systems and methods for obtaining an alpha matte for an image, it is contemplated that one or more steps and/or components may be implemented in software for use with microprocessors/general purpose computers (not shown). In this embodiment, one or more of the functions of the various components and/or steps described above may be implemented in software that controls a computer. The software may be embodied in tangible computer readable media for execution by the computer.
Although the invention is illustrated and described herein with reference to specific embodiments, the invention is not intended to be limited to the details shown. Rather, various modifications may be made in the details within the scope and range of equivalents of the claims and without departing from the invention.
This application is related to and claims the benefit of U.S. Provisional Application No. 61/074,221 entitled SYSTEMS AND METHODS FOR OBTAINING AN IMAGE ALPHA MATTE filed on Jun. 20, 2008, the contents of which are incorporated herein by reference.
The present invention was supported in part by a grant from the IDeA Network of Biomedical Research Excellence (INBRE) Program of the National Center for Research Resources (NCRR), a component of the National Institutes of Health (NIH) (Grant No. 2 P20 RR016472-08). The United States Government may have certain rights to the invention.