The described subject matter relates to data processing, and more particularly to systems and methods for data separation.
In the field of image processing, users often need to separate certain portions of an image from the whole image. The user typically has a visual sense of what portions need to be separated, but conveying that information to a computer-based image processing tool can be quite challenging. The process of separating particular image data from the image can be very time consuming and tedious, especially when the image or the portions to be separated are complex.
“Image cutout” is a technique for extracting an object in an image from its background. The cutout can then be composited onto a different background to create a new scene. With the advent of digital imaging, it has become possible to specify the foreground and background at the individual pixel level. The task in image cutout involves specifying which parts of the image are “foreground” (the part the user wants to cut out) and which are the background. In some traditional approaches, the user must specify each pixel of the foreground individually. The tediousness of this pixel-accurate work can make image cutout a particularly frustrating task for users.
Two other approaches have evolved: boundary-based and region-based. Each of these methods takes features of the image that the computer can detect and uses them to help automate or guide the foreground specification process. Boundary-based methods cut out the foreground by allowing the user to surround the foreground with an evolving curve. The user traces along the foreground boundary and the system optimizes the curve in a piecewise manner. Examples of the boundary-based approach include intelligent scissors, image snapping, and Jetstream.
While the boundary-based approach is easier than individual pixel selection, boundary-based techniques still demand a large amount of attention from the user. For example, there is almost never a perfect match between the features used by the algorithms and the foreground image. As a result, the user must control the curve carefully. If a mistake is made, the user must “back up” the curve and try again. The user is also required to enclose the entire boundary, which can take some time for a complex, high-resolution object. The close control required interferes with the user's ability to get an overview of their progress. It is difficult to zoom in and out of the image while dragging the pixel-accurate boundary line. Finally, once the boundary is specified, most tools are no longer helpful. Any errors must be cleaned up at the end using traditional selection tools.
Traditional region-based approaches do not require a pixel-accurate boundary line, but they also tend to be inaccurate. Traditional region-based methods allow the user to select pixels that share a common feature (such as RGB color) with the pixels to be included in the foreground or background. An underlying algorithm then extrapolates to surrounding pixels that share the feature with the selected pixels to within a user-specified tolerance. One problem with region-based techniques is that there are often cases where the features used by the region detection algorithms do not match up with the desired foreground or background elements. Often, there is no single feature that will discriminate foreground from background without user assistance, as in the case of removing a single individual from a group photograph.
In traditional region-based approaches, even when some feature distinction exists, it is often necessary to constantly adjust tolerances in ambiguous areas, such as shadow and low-contrast edges. Such constant adjustment to tolerances can be extremely tedious. In practice, the user must employ a combination of traditional boundary tools, region tools, and hand-selection to produce a satisfactory result.
Therefore, there is a need for a system that enables a user to specify data to be separated without requiring the user to specify every unit of the data, and without sacrificing accuracy.
Implementations described herein provide for automatically identifying a region of an image to be separated based on a similarity measure corresponding to pixels in the region. A system includes an image processing module automatically segmenting a determined region from an image based on a similarity measure characterizing similarity between pixels in the determined region and a set of one or more specified seed pixels associated with pixels to be included in the determined region.
Exemplary System
An exemplary system includes a data separation module separating one or more data units, called data nodes, from a collection of data nodes. In the implementations described herein, data nodes refer to pixels in a digital image. For illustration purposes, the implementations shown and described herein involve separation of pixels in a foreground region of a digital image from a background region in the image.
In the marking step 102, the foreground region 110 and the background region 112 are specified by the user. The user marks any number of pixels in the foreground region 110 using a foreground specification mode. Similarly, the user marks any number of pixels in the background region 112 using a background specification mode.
In a particular implementation, the foreground specification mode includes user activation of a control on an input device, such as the left button on a mouse while pointing to pixels in the foreground; the background specification mode involves user activation of a different control on the input device, such as the right button on the mouse while pointing to pixels in the background. In this implementation, the foreground region 110 is marked with a foreground indicator 114 in a first color (e.g., yellow line), and the background region 112 is marked with a background indicator 116 in another color (e.g., blue line). The marking step 102 is described in further detail below with respect to an exemplary user interface.
After the foreground region 110 and the background region 112 are specified, the foreground region 110 is automatically enclosed with a boundary marker.
The polygon conversion and editing step 104 automatically converts the foreground region 110 into a polygon including a plurality of vertices and lines, and enables the user to edit the polygon. In one implementation, the user can edit the boundary by clicking and dragging on polygon vertices to adjust the boundary marker 200. In another implementation, the user can employ a polygon brush, described further below, for easily adjusting a polygon line or lines.
After polygon conversion and boundary editing 104, the foreground region 110 is separated from the background region 112 in the extracting step 106. The extracted foreground region 110 can be inserted into another image having a different background.
After the user marks the image, pixels intersected by the marks are assigned to either a set F or a set B, depending on which mark they intersect. Set F includes pixels intersected by the foreground marker 302, which are called foreground seeds 306. Set B includes pixels intersected by the background marker 304, which are called background seeds 308. A third set, U, of uncertain nodes 310 is defined to include pixels that are not marked.
Unmarked pixels are assigned to either a foreground region or a background region based on similarity with the pixels in sets F and B. After similarity is determined, a segmentation boundary 312 is rendered between the pixels in the foreground and pixels in the background.
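By way of illustration only, the sets F, B, and U described above might be represented as boolean masks over the image, as in the following Python sketch; the mask names fg_mark and bg_mark are hypothetical inputs standing for the pixels intersected by the user's strokes:

    import numpy as np

    # fg_mark and bg_mark are boolean (H, W) arrays that are True where the user's
    # foreground and background marks intersect pixels (hypothetical inputs).
    def build_seed_sets(fg_mark, bg_mark):
        F = fg_mark & ~bg_mark          # foreground seeds (set F)
        B = bg_mark & ~fg_mark          # background seeds (set B)
        U = ~(F | B)                    # uncertain, unmarked pixels (set U)
        return F, B, U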
In a particular implementation, similarity is measured using an energy function. A graph cut algorithm minimizes the energy function in order to locate a segmentation boundary. The graph 300 may be characterized by the statement G=⟨N,A⟩, where N is the set of all nodes and A is the set of all arcs connecting adjacent nodes. The arcs represent adjacency relationships (e.g., four or eight connections) between neighboring pixels. Each node is assigned a unique label xi, for i∈N, wherein xi∈{foreground (=1), background (=0)}. The solution, X={xi}, can be obtained by minimizing a Gibbs energy function E(X):

E(X)=Σi∈N E1(xi)+λ·Σ(i,j)∈A E2(xi,xj)  (1)

where λ is a weighting coefficient that balances the two terms.
E1(xi) represents a cost associated with node i with label xi. E2(xi,xj) represents a cost when the labels of adjacent nodes i and j are xi and xj, respectively. The energy terms, E1 and E2, are determined based on user input. Those skilled in the art will readily recognize how to minimize E(X) in equation (1). One exemplary technique for minimizing E(X) is the max-flow algorithm.
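By way of a non-limiting illustration, the following Python sketch shows one standard way such a minimization can be carried out with an s-t minimum cut: each node is linked to a source terminal with capacity E1(xi=0) and to a sink terminal with capacity E1(xi=1), and each arc carries its E2 penalty. The sketch uses the networkx library for clarity (a dedicated max-flow implementation would be far faster for real images), assumes E1 and E2 have already been computed, and is not the implementation described herein:

    import networkx as nx

    def min_cut_labels(nodes, arcs, E1, E2):
        # nodes: iterable of node ids; arcs: iterable of (i, j) pairs of adjacent nodes.
        # E1[i] = (E1(xi=0), E1(xi=1)); E2[(i, j)] = pairwise penalty for cutting arc (i, j).
        INF = 1e9  # a large finite constant stands in for the infinite seed costs
        G = nx.DiGraph()
        for i in nodes:
            cost_bg, cost_fg = E1[i]
            G.add_edge('source', i, capacity=min(cost_bg, INF))  # paid if i is labeled background
            G.add_edge(i, 'sink', capacity=min(cost_fg, INF))    # paid if i is labeled foreground
        for (i, j) in arcs:
            w = E2[(i, j)]
            G.add_edge(i, j, capacity=w)   # paid if i and j receive different labels
            G.add_edge(j, i, capacity=w)
        _, (source_side, _) = nx.minimum_cut(G, 'source', 'sink')
        return {i: 1 if i in source_side else 0 for i in nodes}   # 1 = foreground, 0 = background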
In equation (1), E1 encodes the color similarity of a node, and is used to assign a node to the foreground or background. To compute E1, the colors in sets F and B are first clustered by the K-means method. In this method, the mean colors of the foreground and background clusters are denoted as {KnF} and {KnB}, respectively.
The K-means method is initialized to have 64 clusters. Then, for each node i, the minimum distance is computed from the node's color C(i) to the foreground and background clusters. The minimum distances to the foreground and background clusters, denoted diF and diB, can be computed using equations (2a) and (2b), respectively:

diF=minn∥C(i)−KnF∥  (2a)

diB=minn∥C(i)−KnB∥  (2b)
Therefore, E1(xi) can be defined as follows:

E1(xi=1)=0, E1(xi=0)=∞ for any node i∈F

E1(xi=1)=∞, E1(xi=0)=0 for any node i∈B

E1(xi=1)=diF/(diF+diB), E1(xi=0)=diB/(diF+diB) for any node i∈U  (3)

In equation (3), U=N\{F∪B} represents the uncertain region of the image, i.e., the set of unmarked nodes.
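As a concrete illustration of equations (2a), (2b), and (3), the following Python sketch computes E1 for every node, assuming scikit-learn for the K-means step; the array names (colors, F_idx, B_idx, U_idx) are hypothetical, and a large finite constant stands in for the infinite seed costs:

    import numpy as np
    from sklearn.cluster import KMeans

    def likelihood_energy(colors, F_idx, B_idx, U_idx, n_clusters=64):
        # colors: (N, 3) array of node colors C(i); F_idx/B_idx/U_idx: index arrays for sets F, B, U.
        # Assumes each seed set contains at least n_clusters pixels.
        KF = KMeans(n_clusters=n_clusters, n_init=10).fit(colors[F_idx]).cluster_centers_   # {KnF}
        KB = KMeans(n_clusters=n_clusters, n_init=10).fit(colors[B_idx]).cluster_centers_   # {KnB}
        dF = np.linalg.norm(colors[:, None, :] - KF[None, :, :], axis=2).min(axis=1)        # eq. (2a)
        dB = np.linalg.norm(colors[:, None, :] - KB[None, :, :], axis=2).min(axis=1)        # eq. (2b)

        BIG = 1e9                                     # stands in for the infinite seed costs
        E1 = np.zeros((len(colors), 2))               # E1[i, 0] = E1(xi=0), E1[i, 1] = E1(xi=1)
        E1[F_idx] = [BIG, 0.0]                        # foreground seeds must be labeled foreground
        E1[B_idx] = [0.0, BIG]                        # background seeds must be labeled background
        denom = dF[U_idx] + dB[U_idx] + 1e-12
        E1[U_idx, 1] = dF[U_idx] / denom              # cost of labeling an uncertain node foreground
        E1[U_idx, 0] = dB[U_idx] / denom              # cost of labeling an uncertain node background
        return E1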
Energy value E2 represents the energy due to the gradient along the boundary enclosing the foreground region. The energy value E2 can be defined as a function of the color gradient between two nodes i and j:
E2(xi, xj)=|xi−xj|·g(Cij)  (4)

where g(ξ)=1/(ξ+1), and Cij=∥C(i)−C(j)∥² is the L2-norm of the red-green-blue (RGB) color difference of two pixels i and j.
The factor |xi−xj| ensures that the gradient information is counted only along the segmentation boundary between the foreground region and the background region. Thus, E2 may be viewed as a penalty term when adjacent nodes are assigned different labels (i.e., foreground and background). The greater the similarity between two adjacent nodes, the larger E2 is, and thus the less likely it is that nodes i and j are located along the boundary between foreground and background.
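A minimal sketch of the per-arc penalty of equation (4), assuming the form g(ξ)=1/(ξ+1) given above and omitting the |xi−xj| factor (which the cut structure supplies):

    import numpy as np

    def prior_energy(color_i, color_j):
        # E2 weight for the arc between adjacent nodes i and j (RGB color vectors).
        Cij = float(np.sum((np.asarray(color_i, float) - np.asarray(color_j, float)) ** 2))
        return 1.0 / (Cij + 1.0)    # g(Cij): similar colors give a large penalty for a cut here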
An enhanced graph cut algorithm involves a pre-segmenting step in which pixels are grouped into regions prior to the segmenting process. In this implementation, a node is a group or region of pixels rather than an individual pixel. The watershed algorithm may be used to locate boundaries of the groups of pixels, while preserving small differences inside each group of pixels. In such an implementation, the graph may again be characterized by the statement G=⟨N,A⟩. In this case, the nodes N are the set of all pixel groups 402, and the arcs A are the set of all arcs connecting adjacent pixel groups 402.
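One possible pre-segmentation step is sketched below using scikit-image's watershed implementation; the particular gradient filter is an illustrative choice, not one prescribed by the description above:

    from skimage.color import rgb2gray
    from skimage.filters import sobel
    from skimage.segmentation import watershed

    def presegment(image_rgb):
        # Group pixels into small regions; returns an (H, W) array of region labels.
        gradient = sobel(rgb2gray(image_rgb))   # edge-strength image
        return watershed(gradient)              # with no markers, local minima seed the regions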
In this implementation, a set F is again defined to include foreground seeds (not shown), but unlike the pixel-based implementation described above, each element of F is a group of pixels rather than an individual pixel. A set B of background seeds is defined in a similar manner.
Similarity among groups 402 can be determined using an energy function, such as equation (1) above. The likelihood energy E1 is also similar to equation (3), but in this case the color C(i) is computed as the mean color of a pixel group i. For ease of illustration, the mean color of each group 402 is represented by a filled circle 404.
To compute prior energy E2 using equation (4), a first implementation defines Cij as the mean color difference between the two pixel groups i and j. In another implementation, Cij is similarly defined but it is further weighted by the shared boundary length between pixel groups i and j.
Based on the energy minimization for the pixel groups 402, each group 402 is labeled as either a foreground group or a background group. A segmentation boundary 406 is rendered between adjacent foreground and background groups 402.
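By way of illustration, the mean group colors and the shared boundary lengths between adjacent groups (used above to weight Cij) might be gathered as follows; the function and variable names are hypothetical:

    import numpy as np

    def group_nodes_and_arcs(image_rgb, labels):
        # labels: (H, W) array of pixel-group ids (e.g., from a watershed pre-segmentation).
        n = int(labels.max()) + 1
        counts = np.bincount(labels.ravel(), minlength=n).astype(float)
        counts[counts == 0] = 1.0                     # guard against unused label ids
        mean_colors = np.stack(
            [np.bincount(labels.ravel(), weights=image_rgb[..., c].ravel(), minlength=n) / counts
             for c in range(3)], axis=1)              # C(i): mean RGB color of each group

        # Count horizontally and vertically adjacent pixel pairs with differing labels;
        # the count serves as the shared boundary length between two adjacent groups.
        boundary_len = {}
        for a, b in [(labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :])]:
            diff = a != b
            for i, j in zip(a[diff], b[diff]):
                key = (int(min(i, j)), int(max(i, j)))
                boundary_len[key] = boundary_len.get(key, 0) + 1
        return mean_colors, boundary_len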
Studies have shown that the approximation introduced by pre-segmentation (e.g., watershed segmentation) in this implementation has little effect on the accuracy of the resulting boundary, while substantially reducing the number of graph nodes and thus the computation time.
Using either the pixel-based implementation or the group-based implementation described above, a segmentation boundary is generated around the foreground region. The segmentation boundary is then converted into an editable polygon 502 having vertices 508; foreground seeds F 504 and background seeds B 506 are also shown in the associated illustration.
The polygon 502 is constructed in an iterative way. An initial polygon is constructed that has only one vertex, which is the point with the highest curvature on the segmentation boundary. Stepping around the segmentation boundary, the distance from each point on the segmentation boundary to the polygon in the previous step is computed. The farthest point is inserted to generate a new polygon. The iteration stops when the largest distance is less than a pre-defined threshold (e.g., 3.2 pixels).
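A sketch of this iterative construction appears below. For simplicity, the first vertex is chosen as the boundary point farthest from the boundary centroid, a stand-in for the highest-curvature point described above; the remainder follows the farthest-point insertion just described:

    import numpy as np

    def point_segment_dist(p, a, b):
        # Distance from 2-D point p to the line segment a-b.
        ab, ap = b - a, p - a
        t = np.clip(np.dot(ap, ab) / (np.dot(ab, ab) + 1e-12), 0.0, 1.0)
        return float(np.linalg.norm(p - (a + t * ab)))

    def boundary_to_polygon(boundary, max_error=3.2):
        # boundary: (M, 2) array of segmentation-boundary points in order around the contour.
        # Returns indices (into boundary) of the polygon vertices, in boundary order.
        centroid = boundary.mean(axis=0)
        verts = [int(np.argmax(np.linalg.norm(boundary - centroid, axis=1)))]   # stand-in first vertex
        while True:
            if len(verts) > 1:   # closed polyline through the current vertices
                segs = [(boundary[verts[k]], boundary[verts[(k + 1) % len(verts)]])
                        for k in range(len(verts))]
            else:                # degenerate "polygon" with a single vertex
                segs = [(boundary[verts[0]], boundary[verts[0]])]
            dists = np.array([min(point_segment_dist(p, a, b) for a, b in segs) for p in boundary])
            farthest = int(np.argmax(dists))
            if dists[farthest] < max_error:
                break            # every boundary point is within the threshold
            verts.append(farthest)
            verts.sort()         # keep the vertex indices in order around the contour
        return verts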
After the polygon 502 is constructed, each of the vertices 508 can be adjusted by the user. For example, the user can “click and drag” a vertex 508 to move the vertex to another position. During polygon editing, once the user releases the mouse button, the system executes the graph cut segmentation algorithm again to optimize the segmentation boundary. The optimized boundary automatically snaps around the foreground even though the polygon vertices 508 may not lie on it.
During polygon editing, the polygon is not enforced as a hard constraint. Instead, the segmentation algorithm optimizes E(X) again to obtain an optimized boundary, using the polygon location as a soft constraint. The likelihood energy E1 is defined as in equation (3) above. However, when E(X) is recomputed during polygon editing, the prior energy E2 is defined differently, as shown in equation (5):
E2(xi,xj)=|xi−xj|·g((1−β)·Cij+β·η·g(Dij2)) (5)
As shown in equation (5), in addition to the gradient term Cij, E2 is a function of the polygon location acting as a soft constraint, in order to handle ambiguous and low-contrast boundaries. In equation (5), Dij is the distance from the center of arc (i, j) to the polygon, and η is a scaling factor to unify the units of the two terms (a typical value is 10).
In equation (5), β∈[0,1] is used to control the influence of Dij. A typical value of β is 0.5, although β may be adjusted to achieve better performance. Note that β=1 makes the graph cut segmentation output a result that is snapped onto the polygon, regardless of the image gradient. When the color gradient Cij is small, g(Dij2) dominates E2, which encourages the result to snap close to the polygon location. By using polygon soft constraints, the segmentation boundary more accurately snaps to low-contrast edges. In addition, unlike traditional region-based tools, polygon soft constraints result in accurate segmentation even when foreground edges are ambiguous, low-contrast, or otherwise unclear.
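A minimal sketch of equation (5) for a single arc, again assuming g(ξ)=1/(ξ+1) and omitting the |xi−xj| factor (which the cut structure supplies); the helper and argument names are hypothetical:

    import numpy as np

    def dist_point_to_polygon(p, polygon):
        # Shortest distance from point p to the closed polygon (an array of vertex coordinates).
        best = np.inf
        for k in range(len(polygon)):
            a, b = polygon[k], polygon[(k + 1) % len(polygon)]
            ab, ap = b - a, p - a
            t = np.clip(np.dot(ap, ab) / (np.dot(ab, ab) + 1e-12), 0.0, 1.0)
            best = min(best, float(np.linalg.norm(p - (a + t * ab))))
        return best

    def prior_energy_soft(color_i, color_j, arc_center, polygon, beta=0.5, eta=10.0):
        # E2 per equation (5): typical values beta=0.5 and eta=10 come from the description above.
        g = lambda x: 1.0 / (x + 1.0)    # assumed form of the decreasing function g
        Cij = float(np.sum((np.asarray(color_i, float) - np.asarray(color_j, float)) ** 2))
        Dij = dist_point_to_polygon(np.asarray(arc_center, float), np.asarray(polygon, float))
        return g((1.0 - beta) * Cij + beta * eta * g(Dij ** 2))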
Through the user interface described below, the user may manually specify that a polygon vertex be a “hard” constraint, so that the system ensures that the graph cut segmentation result passes through this vertex. For a specified hard-constrained vertex, the uncertain region U is automatically split into two parts along the vertex's bisector. The two “split” lines are added to the foreground seeds F 504 and the background seeds B 506, respectively, so that the graph cut segmentation outputs a result passing through the vertex, because the vertex is the only connection between the foreground and background at the specified location.
Exemplary User Interface
An exemplary user interface enables a user to step through each of the marking, polygon editing, and extraction steps described above.
A selectable step selector 604 includes three numbers (e.g., 1, 2, 3) associated with the steps in the process. When the user selects one of the numbers in the step selector 604, the user interface 600 proceeds to a screen corresponding to the selected step. In this illustration, step 1 corresponds to the marking step, step 2 corresponds to the polygon editing step, and step 3 corresponds to the extraction step.
At the marking step, the user creates one or more marks 606 on a foreground region 608 using a foreground marking mode. In one implementation, the user clicks the left mouse button while dragging the mouse over the desired portion of the foreground region 608. In another implementation, the user creates the mark(s) 606 on a touch-sensitive screen and/or with a pen-computing device, such as a stylus.
The foreground mark(s) 606 are presented in a foreground color (e.g., yellow). The foreground mark(s) 606 do not need to completely fill or completely enclose the foreground region 608. By making the foreground mark(s) 606, the user coarsely indicates which portions of the image are similar to the foreground region 608.
The user also creates one or more marks 610 on a background region 612 using a background marking mode. In one implementation, the user clicks the right mouse button while dragging the mouse over the desired portion of the background region 612. In another implementation, the user creates the mark(s) 610 on a touch-sensitive screen and/or with a pen-computing device, such as a stylus.
The background mark(s) 610 are presented in a background color (e.g., blue). The background mark(s) 610 do not need to completely fill the background region 612 or completely enclose the foreground region 608. In addition, the background mark(s) 610 can be relatively far from the boundary of the foreground region 608. The user simply coarsely indicates which portions of the image 602 are similar to the background region 612.
The graph cut algorithm is triggered when the user releases the mouse button after drawing the foreground mark(s) 606 or the background mark(s) 610. The resulting segmentation boundary 614 is rendered around the foreground region 608. The user then inspects the segmentation boundary 614 on screen and decides whether more marks need to be drawn. The segmentation boundary 614 is generated virtually instantaneously, so that the user can rapidly see the result and add marks, if necessary.
In addition to adding marks, the user may undo or delete any marks that have been made using an undo button 616 or a delete button 618. A tools button 620 enables the user to adjust configuration parameters. Exemplary configuration parameters are organized into three groups corresponding to the three steps, respectively. For the marking step, an exemplary configuration parameter is a speed factor. The speed factor controls the maximum image size that can be pre-segmented in the pre-segmentation step. If the input image is larger than the given size (e.g., the speed factor times 100), the image is resized to satisfy the limit.
For the polygon editing step, three exemplary parameters include max error, dilation scale, and erosion scale. The max error parameter controls the boundary to polygon conversion error. The dilation and erosion scale parameters control the width of the band for the graph cut segmentation algorithm.
For the extraction step, four exemplary parameters are variance, erode scale, dilate scale, and enable alpha prior. The variance parameter controls the sensitivity of the Bayesian Matting algorithm to noise. The erode and dilate scale parameters control the band of pixels around the boundary for matting extraction. If enable alpha prior is selected, a variance alpha parameter controls the influence of the feathering alpha prior on the Bayesian Matting algorithm.
An alpha channel button 622 (labeled “A”) can be used to display the image as an alpha channel format, rather than RGB. An alpha channel multiplier button 624 (labeled “O”) can be used to display the image with the foreground multiplied by the alpha channel. An image button 626 (labeled “I”) displays the original color image without any alpha channel adjustment.
A trimap button 628 can be toggled to hide or show trimap indicators, discussed further below. A boundary button 630 can be toggled to hide or show the segmentation boundary 614. A polygon button 632 can be toggled to hide or show the editable polygon. A marker button 634 can be toggled to hide or show the foreground mark(s) 606 and the background mark(s) 610. An “on/off” button 636 is used to hide and show the trimap indicators, the segmentation boundary 614, the polygon, and the foreground and background markers.
Zoom controls 638 enable the user to zoom into or away from the image 602. An information window 640 indicates what area of the image 602 is shown, and enables the user to center the image at a selected position. The information window 640 also indicates the RGB values for a selected pixel in the image 602.
Although the marking step and the graph cut algorithm produce a highly accurate segmentation boundary 614 around the foreground region 608, the user may want to further refine the segmentation boundary 614. Therefore, the user can select step 2 in the step selector 604 to proceed to the polygon editing step. When step 2 is selected, the segmentation boundary 614 is automatically converted into a polygon.
For direct vertex editing, the user selects a polygon vertex radio button 706. When the polygon vertex radio button 706 is selected, the user can select and move individual vertices (i.e., one vertex at a time) using the mouse or other input device. The user may also add or delete vertices 702. In addition, direct vertex editing enables the user to group multiple vertices together for processing. Because the vertices 702 may be rather small, it may be beneficial to zoom in close to a particular area using the zoom controls 638 during individual vertex editing.
For polygon brushing, the user selects a polygon brush radio button 708. When the user selects the polygon brush radio button 708, a brush tool 710 appears. The brush tool 710 enables the user to draw a single stroke to replace a segment of a polygon. The user brushes a stroke starting from a point on the polygon (e.g., point A) and stopping at another place on the polygon, not necessarily a vertex (e.g., point B), so that the polygon 700 is split into two parts. The part having the smaller angle difference relative to the user stroke is replaced by the stroke to generate a new polygon. The angle between the user stroke and each part of the polygon is measured by the tangent direction at point A and the direction from A to B.
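A simplified, hypothetical sketch of the brush-replacement logic follows; it snaps the stroke endpoints to the nearest polygon vertices and uses the first edge of each part as a stand-in for the tangent comparison described above:

    import numpy as np

    def angle_between(u, v):
        u = u / (np.linalg.norm(u) + 1e-12)
        v = v / (np.linalg.norm(v) + 1e-12)
        return float(np.arccos(np.clip(np.dot(u, v), -1.0, 1.0)))

    def brush_replace(polygon, stroke):
        # polygon: (V, 2) array of vertices in order around the contour; stroke: (S, 2) points from A to B.
        # Assumes the stroke endpoints snap to two distinct polygon vertices.
        a = int(np.argmin(np.linalg.norm(polygon - stroke[0], axis=1)))    # vertex nearest the stroke start (A)
        b = int(np.argmin(np.linalg.norm(polygon - stroke[-1], axis=1)))   # vertex nearest the stroke end (B)
        V = len(polygon)
        part1 = [(a + k) % V for k in range((b - a) % V + 1)]              # from A to B one way around
        part2 = [(b + k) % V for k in range((a - b) % V + 1)][::-1]        # from A to B the other way around
        stroke_dir = stroke[1] - stroke[0]                                 # direction of the stroke at A
        ang1 = angle_between(stroke_dir, polygon[part1[1]] - polygon[part1[0]])
        ang2 = angle_between(stroke_dir, polygon[part2[1]] - polygon[part2[0]])
        keep = part2 if ang1 < ang2 else part1          # the part closer in angle to the stroke is discarded
        return np.vstack([polygon[keep], stroke[::-1]])  # kept part runs A..B; reversed stroke closes the loop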
The user interface 600 described above thus enables the user to step through the marking, polygon editing, and extraction steps. Exemplary operations of an underlying data separation algorithm 900 are described below.
An optional pre-segmenting operation 902 pre-segments the image by grouping pixels into regions according to an algorithm, such as the watershed algorithm. The pre-segmenting operation 902 may also include filtering the image and/or down-sampling to speed the segmentation process.
A receiving operation 904 receives foreground and/or background seeds. In one implementation, the foreground seeds are specified by a user clicking the left mouse button and dragging the mouse over the foreground seed pixels, and the background seeds are specified by a user clicking the right mouse button and dragging the mouse over the background seed pixels. The foreground seeds are presented in a foreground color, while the background seeds are presented in another color.
A determining operation 906 determines a similarity measure for pixels in the image based on assignments of the pixels to either foreground or background. In one implementation, pixels are assigned to either the foreground or the background such that total energy in the image is minimized.
A segmenting operation 908 segments the image according to the pixel assignment in the determining operation 906. A segmentation boundary is automatically generated between pixels in the foreground region and pixels in the background region.
A generating operation 910 generates an editable polygon based on the segmentation boundary. The editable polygon is presented to the user. The user is able to move vertices of the polygon to further refine the boundary around the foreground region. The user may move vertices individually or multiple vertices at a time.
A receiving operation 912 receives the user inputs to edit the polygon, and the algorithm 900 returns to the determining operation 906 to re-segment the image based on the user edits. During second and subsequent iterations of the determining and segmenting operations, the segmentation is performed using the vertices of the polygon as soft or hard constraints.
After the user has completed editing the polygon around the foreground region, an extracting operation 914 cuts the foreground region out of the image. One implementation of the extracting operation 914 employs coherent matting, an enhanced Bayesian Matting algorithm with an alpha prior, to compute the opacity around the segmentation boundary before compositing the foreground cutout on a new background. The uncertain region for matting is computed by dilating the segmentation boundary, typically by four pixels on each side.
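The band of uncertain pixels for matting can be obtained by dilating and eroding the foreground mask; a small sketch using SciPy follows (the four-pixel width comes from the description above, and the mask name is hypothetical):

    from scipy.ndimage import binary_dilation, binary_erosion

    def matting_band(fg_mask, width=4):
        # fg_mask: boolean (H, W) foreground mask from the graph cut segmentation.
        # Returns a boolean mask of the uncertain band straddling the segmentation boundary.
        dilated = binary_dilation(fg_mask, iterations=width)   # extend 'width' pixels outward
        eroded = binary_erosion(fg_mask, iterations=width)     # retreat 'width' pixels inward
        return dilated & ~eroded                               # uncertain pixels passed to matting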
Exemplary Computing Device
Computing device 1000 further includes a hard disk drive 1044 for reading from and writing to a hard disk (not shown), and may include a magnetic disk drive 1046 for reading from and writing to a removable magnetic disk 1048, and an optical disk drive 1050 for reading from or writing to a removable optical disk 1052 such as a CD ROM or other optical media. The hard disk drive 1044, magnetic disk drive 1046, and optical disk drive 1050 are connected to the bus 1036 by appropriate interfaces 1054a, 1054b, and 1054c.
The drives and their associated computer-readable media provide nonvolatile storage of computer-readable instructions, data structures, program modules and other data for computing device 1000. Although the exemplary environment described herein employs a hard disk, a removable magnetic disk 1048 and a removable optical disk 1052, other types of computer-readable media such as magnetic cassettes, flash memory cards, digital video disks, random access memories (RAMs), read only memories (ROMs), and the like, may also be used in the exemplary operating environment.
A number of program modules may be stored on the hard disk 1044, magnetic disk 1048, optical disk 1052, ROM 1038, or RAM 1040, including an operating system 1058, one or more application programs 1060, other program modules 1062, and program data 1064. A user may enter commands and information into computing device 1000 through input devices such as a keyboard 1066 and a pointing device 1068. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner, or the like. These and other input devices are connected to the processing unit 1032 through an interface 1056 that is coupled to the bus 1036. A monitor 1072 or other type of display device is also connected to the bus 1036 via an interface, such as a video adapter 1074.
Generally, the data processors of computing device 1000 are programmed by means of instructions stored at different times in the various computer-readable storage media of the computer. Programs and operating systems may be distributed, for example, on floppy disks, CD-ROMs, or electronically, and are installed or loaded into the secondary memory of the computing device 1000. At execution, the programs are loaded at least partially into the computing device's 1000 primary electronic memory.
Computing device 1000 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 1076. The remote computer 1076 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to computing device 1000. The logical connections include a local area network (LAN) 1080 and a wide area network (WAN) 1082.
The WAN 1082 can include a number of networks and subnetworks through which data can be routed from the computing device 1000 and the remote computer 1076, and vice versa. The WAN 1082 can include any number of nodes (e.g., DNS servers, routers, etc.) by which messages are directed to the proper destination node.
When used in a LAN networking environment, computing device 1000 is connected to the local network 1080 through a network interface or adapter 1084. When used in a WAN networking environment, computing device 1000 typically includes a modem 1086 or other means for establishing communications over the wide area network 1082, such as the Internet. The modem 1086, which may be internal or external, is connected to the bus 1036 via a serial port interface 1056.
In a networked environment, program modules depicted relative to the computing device 1000, or portions thereof, may be stored in the remote memory storage device. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used.
The computing device 1000 may be implemented as a server computer that is dedicated to server applications or that also runs other applications. Alternatively, the computing device 1000 may be embodied in, by way of illustration, a stand-alone personal desktop or laptop computer (PC), a workstation, a personal digital assistant (PDA), or an electronic appliance, to name only a few.
Various modules and techniques may be described herein in the general context of computer-executable instructions, such as program modules, executed by one or more computers or other devices. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Typically, the functionality of the program modules may be combined or distributed as desired in various embodiments.
An implementation of these modules and techniques may be stored on or transmitted across some form of computer-readable media. Computer-readable media can be any available media that can be accessed by a computer. By way of example, and not limitation, computer-readable media may comprise “computer storage media” and “communications media.”
“Computer storage media” includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer.
“Communication media” typically embodies computer-readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism. Communication media also includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared, and other wireless media. Combinations of any of the above are also included within the scope of computer-readable media.
In addition to the specific implementations explicitly set forth herein, other aspects and implementations will be apparent to those skilled in the art from consideration of the specification disclosed herein. It is intended that the specification and illustrated implementations be considered as examples only, with a true scope and spirit being indicated by the following claims.
This patent application is related to co-owned U.S. patent application Ser. No. 10/861,771 filed Jun. 3, 2004, entitled “Foreground Extraction Using Iterated Graph Cuts,” which is incorporated herein by reference for all that it discloses.