For many years, image scientists and mathematicians have struggled with the problem of how to distinguish the foreground from the background in a static two-dimensional image. Currently, the only options for accomplishing this are to use a "green screen" (a background of uniform color) or a "motion scene," in which a subject is separated from the background by the nature of its movement. Those solutions suffer from several problems. For instance, the green screen solution requires substantial effort to implement at the time the photograph is taken. Likewise, the motion scene option is of little value for still images. These and other shortcomings render the existing solutions inadequate. Improvements are needed in the area of distinguishing a subject from a background in a photograph.
Embodiments are directed to an image processing server on which executes an image analysis engine configured to analyze a first digital image, the first digital image having a first subject and a first background, the analysis including a discrimination of the first subject from the first background. In one embodiment, the image processing server further includes a profile manager configured to store information about the first background in a background profile data store. In this embodiment, the image processing server further includes an image comparison component configured to compare a second digital image to data stored in the background profile data store, the comparison identifying whether the second digital image includes data that resembles the first background.
Many advantages of the disclosure will become more readily appreciated as the same becomes better understood with reference to the following detailed description, when taken in conjunction with the accompanying drawings, briefly described here.
Embodiments are described below in detail with reference to these Figures, in which like numerals refer to like elements throughout.
Various embodiments are described more fully below with reference to the accompanying drawings, which form a part hereof, and which show specific exemplary implementations for practicing various embodiments. However, other embodiments may be implemented in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will satisfy formal statutory requirements. Embodiments may be practiced as methods, systems or devices. Accordingly, embodiments may take the form of a hardware implementation, an entirely software implementation, or an implementation combining software and hardware aspects. The following detailed description is, therefore, not to be taken in a limiting sense.
The logical operations of the various embodiments are implemented (1) as a sequence of computer implemented steps running on a computing system and/or (2) as interconnected machine modules within the computing system. The implementation is a matter of choice dependent on various considerations, such as performance requirements of the computing system implementing the embodiment. Accordingly, the logical operations making up the embodiments described herein may be referred to alternatively as operations, steps or modules.
First, embodiments will be described as implemented in a sample system that implements certain embodiments of the invention. This sample system may be implemented using common or special purpose computing equipment programmed in accordance with the teachings of this disclosure. Next, embodiments will be described as implemented in one or more methods for better distinguishing a subject from a background of a photograph. Finally, examples will be provided to illustrate how the system and methods may be used in practice.
What follows is a technical description of a system that can take an arbitrary two-dimensional (2D) photograph and isolate a subject of the photograph from the background. The uses for the technology are numerous, including improved e-commerce photography. Industry professionals frequently pay large sums of money to create an image studio that allows them to take crisp pictures with a white background. If the same results were possible from a mobile phone camera, that would allow amateurs to compete with the professionals.
Memory 104 includes at least an operating system 116 and may additionally include other special purpose components 118. The operating system 116 includes the core functionality to enable the computing device 100 to operate, such as a file system, memory management, and a graphical user interface. The special purpose components 118 may include any one or more additional components to implement functionality on the computing device 100. Examples of special purpose components 118 are numerous, and include word processing components, spreadsheet components, web browsing components, and the like. One particular example of a special purpose component to implement functionality of the preferred embodiment is illustrated in FIG. 3.
Additionally, device 100 may have other features and functionality. For example, device 100 may also include additional storage (removable and/or non-removable) including, but not limited to, magnetic or optical disks or tape. Such additional storage is illustrated in FIG. 1.
Computing device 100 includes one or more communication connections 114 that allow computing device 100 to communicate with one or more computers and/or applications 113. The communication connections 114 may take the form of an Ethernet interface, WiFi interface, Bluetooth interface, USB connection, eSATA connection, mobile network radio, or the like. Device 100 may also have input device(s) 112 such as a keyboard, mouse, digitizer or other touch-input device, voice input device, digital camera, or the like. Output device(s) 111 such as a monitor, speakers, printer, PDA, mobile phone, and other types of digital display devices may also be included. These devices are well known in the art and need not be discussed at length here.
The computing environment 200 includes at least a user computer 205 and a photo processing server 206 connected over a network 202. The network 202 can be any electrical components and supporting software and firmware for interconnecting two or more disparate computing devices. Examples of the network 202 include a local area network, a wide area network, a metro area network, the Internet, a mobile data network, and the like.
In this implementation, the user computer 205 represents a computing device, such as the computing device illustrated in FIG. 1, operated by a user 203.
A photo processing server 206 is a computing device, such as the computing device illustrated in FIG. 1, that implements the photo processing functionality described herein.
The photo processing server 206 further includes a data store 208 in which are stored digital representations of known or anticipated backgrounds for a particular user. More specifically, user 203 may provide one or more digital photos to the photo processing server 206, which then distinguishes subject from background in each photo. The photo processing server 206 stores representations of each background recognized in the photos. In this embodiment, the representations are stored in the data store 208.
When the photo processing server 206 subsequently receives another digital photo from the user 203, the photo processing server 206 performs a background analysis on the new photo to compare the new photo with the stored representations of known backgrounds in the data store 208. Under circumstances indicating that the new photo includes a known background, the photo processing server 206 performs an analysis comparing the new photo to the known background, which greatly simplifies identifying the subject of the new photo. In some circumstances, the user 203 may be prompted to confirm that the new photo does indeed include a known background.
A web services server 215 may also be included in the computing environment 200. The web services server 215 of this embodiment may implement a service that assists users to advertise products for sale, such as through online auction or the like. In one specific example, the web services server 215 may enable users to upload images of products which the users desire to offer for sale. The web services server 215 may then facilitate advertising those products through one or more online catalogs of products which various other visitors may browse and search over the network 202. The web services server 215 may additionally, or alternatively, conduct operations to list the users' products with other, third-party online auctions or digital storefronts.
Although illustrated as separate computing devices, it should be appreciated that the various functions and components of the several computing devices shown in FIG. 2 may be combined into fewer devices or distributed across additional devices.
Similarly, the web services server 215 may operate in conjunction with the photo processing server 206 to provide a web service to the user 203 while also offering the photo processing service of the photo processing server 206. Alternatively, the photo processing components of the photo processing server 206 could be resident on the web services server 215. These and other alternatives will be apparent to those skilled in the art.
The photo processing components 300 include a user interface 304 and a photo manipulator 306. In this embodiment, the user interface 304 enables a remote computing device, such as user computer 205, to interact with or control the several photo processing components 300, including specifically the photo manipulator 306. The photo manipulator 306 provides functionality to allow a user to manipulate elements of a digital photo and to provide input regarding elements of the digital photo. Specifically, the photo manipulator 306 allows a user to identify or select one or more elements of the photo for the purpose of identifying whether those elements are either background or subject. The photo manipulator 306 may also, optionally, be configured to allow certain visual effects to be applied to the photo, such as a sepia effect, black-and-white effect, cartoonish effect, or the like.
A photo analyzer 308 is provided which includes functionality to analyze a photo to discern characteristics of the photograph. For instance, the photo analyzer 308 is configured to construct a histogram or histograms for the photo to identify characteristics of each pixel or group of pixels of the photo. The photo analyzer 308 may be further configured to enable the user, using the photo manipulator 306, to selectively alter certain characteristics of the photo. Examples of characteristics that may be altered include, but are not limited to, brightness, saturation, color temperature, and the like. The photo analyzer 308 is further configured to store the characteristics of a photo, or the photo itself, or both the photo and its characteristics in a background profile data store 314.
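The disclosure does not prescribe a particular histogram implementation. The following minimal sketch, assuming Python with OpenCV, shows one way a per-channel histogram of the kind constructed by the photo analyzer 308 might be built; the function name and bin count are illustrative and not part of the disclosure.

```python
# Build one histogram per color channel of a BGR image using OpenCV.
import cv2

def build_histogram(image_bgr, bins=32):
    """Return a list of three per-channel histograms (B, G, R)."""
    return [
        cv2.calcHist([image_bgr], [channel], None, [bins], [0, 256])
        for channel in range(3)
    ]
```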
The background profile data store 314 includes a number of representations of backgrounds ("background profiles") associated with a particular user. The background profile data store 314 may include numerous background profiles 316 grouped or correlated by user. For example, background profiles 318 may be one or more background profiles associated with one particular user. Although described here as being correlated by "user," it should be appreciated that a "user" could be one account which multiple individuals use, or multiple accounts which one individual uses, or any other combination. Accordingly, the term user does not imply an individual. Alternatively, the background profiles 316 could, conceivably, not be correlated by users and instead be an undifferentiated grouping of background profiles.
A photo comparer 310 is included and is configured to perform a comparison of one photo to another photo. The photo comparer 310 works in conjunction with the photo analyzer 308 to gather the several characteristics for each of two or more photos and performs a comparison of the characteristics of one photo to the characteristics of another photo. The photo comparer 310 performs operations to make a probabilistic determination whether at least some elements of one photo are present in another photo. Specifically, in the preferred embodiment, the photo comparer 310 is configured to compare a new photo to background profiles stored in the background profile data store 314 to determine if the new photo includes a known background profile.
The photo comparer 310 could, conceivably, compare a new photo against any background profile 316 in the entire background profile data store 314. However, ideally the background profiles 316 are correlated by user given that similar backgrounds are far more likely with the same user than with different users.
Briefly stated, by using background profiles 316, the photo processing components 300 can more easily differentiate a subject of a photo from its background. In other words, when presented with a new photo for which it is desired to identify a subject, comparing the new photo to known background profiles greatly simplifies the process of identifying the background. Once the background of the photo is known, identifying the subject is substantially simpler using methods for discriminating objects within a photo.
Although any method for discriminating between objects within a photo may be used, certain exemplary methods are provided below for completeness of disclosure. However, the methods described should not be viewed as limiting as any alternative method for distinguishing objects within a photo may be used without deviating from the spirit and scope of the disclosure.
The principles and concepts will now be generally described with reference to sample processes that may be implemented by a computing device, such as the computing device illustrated in FIG. 1.
In one classic example, and referring briefly to FIG. 6, a user repeatedly photographs products for sale in the same setting, so that each new photo shares a common background with earlier photos.
Returning to FIG. 4, a sample process for distinguishing a subject from the background of a new photo is described.
Referring briefly to FIG. 6, an initial estimate is made of which object or objects within a new photo 600 constitute the background.
Once an estimate is made, the user is prompted for feedback regarding which object or objects constitute the background of the photo. (Step 405). In one embodiment, the user is prompted to select which of one or more identified objects within the photo 600 is the subject.
Referring briefly to FIG. 7, a background profile 701 is created from the object or objects identified as the background of the photo 600.
Using the feedback from the user, the background profile 701 may be updated with additional information to improve the accuracy of subsequent comparisons. Referring now briefly to FIG. 5, a sample process for comparing a new photo to a stored background profile is described.
A comparison is performed of the new photo to a background profile to determine if the background profile is similar in width and height to the new photo. (Step 503). The histogram comparison may also be effective even without matching the sizes of the photos.
A determination is made of how similar the probable background regions of the new photo and the background profile are. (Step 505). The determination could be made by first converting the raw number of pixels for each photo to a percentage of the image. Alternatively, the two histograms could be normalized to add up to the same number of pixels. One histogram is then subtracted from the other, allowing pixels to match (and thus subtract away) if they are within a fuzzing range of each other.
The difference between the two histograms is compared to some threshold, such as 75% or 90% similarity. (Step 507). If the similarity between the two histograms exceeds the threshold, a match is considered likely. Otherwise, an alternative background profile is analyzed.
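Steps 505 and 507 are not specified at the code level. The following is a minimal sketch, assuming NumPy and one-dimensional histograms (for example, the output of cv2.calcHist(...).ravel()); the fuzzing range and the 90% threshold are illustrative values drawn from the description above.

```python
# Fuzzy histogram comparison: normalize, subtract with a tolerance, threshold.
import numpy as np

def histogram_similarity(h1, h2, fuzz=1):
    """Normalize two histograms to the same total, then subtract one from
    the other, letting a bin match any bin within `fuzz` bins of it."""
    h1 = h1 / h1.sum()                       # raw pixel counts -> fractions
    h2 = h2 / h2.sum()
    mismatch = 0.0
    for i, count in enumerate(h1):
        lo, hi = max(0, i - fuzz), min(len(h2), i + fuzz + 1)
        # Pixels "match" (subtract away) against the best nearby bin.
        mismatch += max(0.0, count - h2[lo:hi].max())
    return 1.0 - mismatch                    # 1.0 means identical

def is_known_background(h_new, h_profile, threshold=0.9):
    """Step 507: a match is considered likely above the threshold."""
    return histogram_similarity(h_new, h_profile) >= threshold
```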
Additional information may also be used in the similarity comparison. For example, the difference in time between when the background profile photos were submitted and when the instant photo was submitted may be considered. In another example, it may be determined how many different background profiles have been applied since the last time a given background profile was used. The longer it has been since a background profile was applied, the less likely it is to apply to the instant photo being analyzed.
Finally, if the similarities between the new photo and the subject background profile are considered "high" (based on some predetermined threshold), then a straight pixel-based color-matching filter can be performed, with the differences between the color values of the background profile and the instant photo being summed to give an approximation of the relative degree of difference between the two. In this comparison, the pixel values of the instant photo could be modified by a scalar to account for differences in brightness or hue between images that are otherwise the same. If the similarities are not high, a "Color" or "Raw" shape bundle may be used instead as one of the filters, marking as background ("BG") any bundle that falls within some threshold of the histogram and mask position.
A similarity score may be computed and used in the comparison. For example, the likelihood of a match could, in one implementation, be calculated using a formula such as the following, where "BP" denotes the background profile:

Likelihood = (BP match % of result) * (BP histogram separation)
Illustrative Technique For Distinguishing Subject From Background
What follows is an illustrative technique or process that may be used to identify individual objects within a photo for the purpose of distinguishing subject from background. This technique is one of many alternative processes that may be implemented in various embodiments. This illustrative process is described here for completeness of disclosure and not as an example of the only technique that can be implemented in embodiments.
The preferred embodiment of the process begins by running a series of canny image thresholds on the initial image in an effort to build a mosaic of regions (also referred to as "shapes") of the image being processed. The coarser canny thresholds (those producing fewer lines) are considered more authoritative in revealing where the meaningful boundaries between shapes lie. However, these coarse canny processing passes typically leave many shapes in the mosaic unclosed.
To close the shapes, the preferred embodiment superimposes the results of increasingly detailed canny passes, discarding as noise lines that aren't connected to the result of the previous (coarser) canny pass. The superset of canny results may be referred to as a “Supercanny.” With each new canny generated, lines that are connected to the coarser canny get stored, and after a few passes almost all meaningful regions within an image are closed.
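One possible realization of the Supercanny superimposition, provided for illustration only, is sketched below in Python with OpenCV. The threshold pairs and the dilation-based connectivity test are assumptions of this sketch, not values from the disclosure.

```python
# Superimpose increasingly detailed canny passes, keeping finer edges only
# where they connect to the coarser, more authoritative result.
import cv2
import numpy as np

def supercanny(gray, thresholds=((200, 250), (100, 200), (50, 150))):
    result = cv2.Canny(gray, *thresholds[0])   # coarsest pass is authoritative
    for lo, hi in thresholds[1:]:
        finer = cv2.Canny(gray, lo, hi)
        # Label the connected components of the finer edge map, then keep
        # only the components that touch the accumulated result.
        _, labels = cv2.connectedComponents(finer)
        near = cv2.dilate(result, np.ones((3, 3), np.uint8))
        keep = np.unique(labels[(near > 0) & (finer > 0)])
        mask = np.isin(labels, keep[keep > 0])
        result = np.maximum(result, np.where(mask, finer, 0).astype(np.uint8))
    return result
```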
The final step in completing the Supercanny is to try to extend each dangling line by a length proportional to the distance from the end of the dangling line back to the previous junction between that line and some other line. The logic here is that a line that connects to other lines at frequent intervals is more likely to be random noise (e.g., from a pattern in the original image), whereas a line that spans a great distance from its last junction point to where it ends is more likely to be a line of significance that should be made into a closed region if possible. To extend the line, the process backtracks from the point at which the line ends a reasonable distance (such as something in the range of 5-10 pixels) and then draws a line out from the dangle point that has the same rise/run as the section of the line just previous to where it ends. This completes the Supercanny.
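A sketch of the extension step follows, assuming each dangling line is available as an ordered list of (x, y) pixel coordinates and that its distance back to the previous junction has already been measured; the backtrack length and proportionality factor are illustrative.

```python
# Extend a dangling line along the direction of its final segment.
import cv2

def extend_dangle(edges, path, dist_to_junction, backtrack=8, factor=0.5):
    """Draw an extension from the dangle point (path[-1]) whose rise/run
    matches the last `backtrack` pixels of the line, with a length
    proportional to the distance back to the previous junction."""
    (x0, y0) = path[-min(backtrack, len(path))]
    (x1, y1) = path[-1]
    dx, dy = x1 - x0, y1 - y0
    norm = max((dx * dx + dy * dy) ** 0.5, 1e-6)
    length = factor * dist_to_junction
    end = (int(x1 + dx / norm * length), int(y1 + dy / norm * length))
    cv2.line(edges, (x1, y1), end, 255, 1)
```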
The next step is to process each closed region, saving into an object various properties of the region, including its average RGB values, the density of dangling canny lines contained in it, and the total pixel area of the region. In a typical image, there may be upwards of 500-1000 closed regions at this stage. Thus, the process consolidates these regions by grouping like regions. Each shape is analyzed, evaluating whether the shapes next to it are sufficiently similar in their attributes to merit being consolidated. The smaller the region, the stronger its difference from adjacent shapes must be to avoid being consolidated. Conversely, the bigger a shape is, the more likely it will subsume the shapes around it, unless they are significantly different from the large shape. Factors used to determine potential merges include the RGB values, the length of shared edge between shapes, and the texture similarity between the shapes (as approximated by a Sobel image).
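For illustration, the property extraction and the size-weighted merge rule might be sketched as follows in Python with OpenCV; the dictionary fields and merge constants are assumptions of this sketch.

```python
# Extract per-region properties from the closed-region mask, then decide
# merges with a bar that loosens as regions get smaller.
import cv2
import numpy as np

def region_properties(image_bgr, closed_mask):
    """closed_mask: uint8 image, 255 inside regions, 0 on Supercanny lines."""
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(closed_mask)
    regions = []
    for i in range(1, n):                          # label 0 is the edge lines
        sel = labels == i
        regions.append({
            "mean_rgb": image_bgr[sel].mean(axis=0),   # average color
            "area": int(stats[i, cv2.CC_STAT_AREA]),   # total pixel area
            "centroid": tuple(centroids[i]),           # center of gravity
        })
    return regions

def should_merge(a, b):
    """Smaller regions need a stronger difference to stay independent."""
    color_diff = np.abs(a["mean_rgb"] - b["mean_rgb"]).sum()
    bar = 30.0 + 300.0 / max(min(a["area"], b["area"]), 1) ** 0.5
    return color_diff < bar
```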
By this point, the process will have reduced the total number of shapes by at least a factor of two. The process then proceeds to build groups of shapes. Unlike the consolidation in the previous step, grouping does not destroy the shapes it combines, and it uses a more detailed heuristic than the one used to consolidate the closed regions.
Shapes are bundled if they are sufficiently similar across the following dimensions: texture (approximated by Sobel), distance between centers of gravity, distance from the image border, pixel count inside the shape, and the actual RGB values of the shapes. These values are added together, and if the difference between the shapes is small enough, they become part of the same "shape group."
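The summed-difference test might look like the following sketch, which assumes each shape record carries the fields listed above (the property-extraction sketch earlier would need the Sobel and border-distance fields added); the weights and threshold are illustrative assumptions.

```python
# Bundle shapes whose summed feature differences fall under a threshold.
import numpy as np

def same_shape_group(s1, s2, threshold=150.0):
    d = np.abs(s1["mean_rgb"] - s2["mean_rgb"]).sum()                  # RGB values
    d += abs(s1["sobel_mean"] - s2["sobel_mean"])                      # texture
    d += 0.2 * np.hypot(*np.subtract(s1["centroid"], s2["centroid"]))  # centers
    d += 0.5 * abs(s1["border_dist"] - s2["border_dist"])              # border distance
    d += 0.01 * abs(s1["area"] - s2["area"]) ** 0.5                    # pixel count
    return d < threshold
```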
Next, the process may run several passes on the shape groups, classifying each group with a designation, such as 0-3, corresponding to the likelihood that the group is foreground or background. These designations may be passed to a grabcut algorithm in a subsequent step. Because there will be a relatively small number of shape groups after a series of consolidations, the process can quickly experiment with numerous configurations of shape groups being marked as foreground or background.
One configuration is the “edge discard” set. In this arrangement, the process loops around the edge of the photo, marking as background any shape group that touches the edge of the picture, other than, perhaps, the bottom edge. The assumption in this configuration is that it is rare for a product photo to intersect any non-bottom edge of the picture. Humans who make their way into product photographs are known for intersecting the bottom edge of photos. It is a function of common physics.
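A sketch of the edge-discard rule follows, assuming a label image in which each pixel holds the identifier of its shape group; that representation is an assumption of this sketch.

```python
# Mark as background every shape group touching a non-bottom image edge.
import numpy as np

def edge_discard(group_labels):
    touching = set(group_labels[0, :])            # top edge
    touching |= set(group_labels[:, 0])           # left edge
    touching |= set(group_labels[:, -1])          # right edge
    # The bottom edge is deliberately excluded, per the assumption that
    # product subjects rarely intersect any non-bottom edge of the frame.
    return touching                               # group ids marked "background"
```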
Another configuration is the "foreground-centric" set. In this arrangement, any shape group that shares a similar color with the dominant shape group in the center of the photograph is assumed to be foreground. This configuration is useful for correctly cropping items, like jeans, that tend to be of uniform color. The inverse of this configuration is "background-centric," where any shape group that matches the dominant color in shape groups on the periphery of the picture is marked as "background."
The arrangements specified above (plus any number of other arrangements) are analyzed for sanity. For example, do the areas marked as “foreground” exceed 5% of the image area and fall under 90% of the area? Do those areas intersect only a limited portion of the edge of the image? If so, they may be sent through the grabcut algorithm. The grabcut algorithm is known in the art and is freely available through the OpenCV library. It takes a mask labeled 0-3 and iterates on it to match like-with-like, giving a result that tends to be somewhat more pixel-perfect than the shape groups themselves.
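The grabcut call itself is part of OpenCV's public API; the wrapper below is illustrative, but cv2.grabCut, its mask labels (cv2.GC_BGD = 0, cv2.GC_FGD = 1, cv2.GC_PR_BGD = 2, cv2.GC_PR_FGD = 3), and the GC_INIT_WITH_MASK mode are as the library defines them.

```python
# Refine a 0-3 labeled mask with OpenCV's grabcut algorithm.
import cv2
import numpy as np

def refine_with_grabcut(image_bgr, mask_0_to_3, iterations=5):
    bgd_model = np.zeros((1, 65), np.float64)     # internal model buffers
    fgd_model = np.zeros((1, 65), np.float64)
    mask = mask_0_to_3.astype(np.uint8)
    cv2.grabCut(image_bgr, mask, None, bgd_model, fgd_model,
                iterations, cv2.GC_INIT_WITH_MASK)
    # Definite or probable foreground pixels form the refined cutout.
    return (mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD)
```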
Finally, the illustrative process rates the resultant images that emerge from the grabcut algorithm based on a series of heuristics often shared by good product photographs. For instance, is the center of mass near the center of the photo? The nearer to the center of the photo, the more highly the result rates. Does the image involve fewer "islands," or areas surrounded by background? Islands often indicate an imperfect result.
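These two heuristics might be combined as in the following sketch; the scoring formula is an illustrative assumption.

```python
# Rate a foreground mask by center-of-mass proximity and island count.
import cv2
import numpy as np

def rate_crop(fg_mask):
    h, w = fg_mask.shape
    ys, xs = np.nonzero(fg_mask)
    if len(xs) == 0:
        return 0.0
    # A center of mass near the center of the photo rates more highly.
    offset = np.hypot(xs.mean() - w / 2, ys.mean() - h / 2) / np.hypot(w / 2, h / 2)
    # Fewer disconnected foreground "islands" also rates more highly.
    islands, _ = cv2.connectedComponents(fg_mask.astype(np.uint8))
    return (1.0 - offset) / max(islands - 1, 1)
```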
The process may send the highest rated image crops to the user for the user to select the image that most perfectly matches the user's intention. The user's selections may be recorded over time to establish which arrangements of shape groups tend to be the most successful.
Still other alternative methods include identifying the foreground of a subject by creating a paint-by-numbers picture, ensuring that every object in the picture has an outline around it. Generally stated, this embodiment breaks an image into granular parts and iteratively combines the pieces into increasingly larger groups, using heuristics that describe the nature of typical product pictures. A further enhancement matches such a paint-by-numbers picture to a foreground probability map to identify the objects most likely to be foreground. In the case of human subjects, a Haar detector can be used to identify a face, and a 2D humanoid form can then be used to build the probability map (i.e., the area directly under the face is most likely to be the human and the subject of the picture).
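The Haar cascade below is OpenCV's stock frontal-face detector; the elliptical "humanoid form" painted under each detected face is an illustrative stand-in for whatever 2D form a given embodiment uses.

```python
# Build a foreground probability map anchored on detected faces.
import cv2
import numpy as np

def foreground_probability_map(image_bgr):
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    prob = np.zeros(gray.shape, np.float32)
    for (x, y, w, h) in cascade.detectMultiScale(gray):
        cv2.rectangle(prob, (x, y), (x + w, y + h), 1.0, -1)  # the face itself
        # The area directly under the face is most likely the human subject:
        cv2.ellipse(prob, (x + w // 2, y + h * 3), (w * 2, h * 4),
                    0, 0, 360, 1.0, -1)
    return cv2.GaussianBlur(prob, (0, 0), 15)     # soften into probabilities
```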
Yet another method associates related shapes to make a broader guess about the nature of the foreground given a set of probable foreground shapes.
Still another method involves making a series of guesses and running them through a heuristic algorithm to determine which one has the most characteristics in common with a valid result. The results of that method can be presented to a user for refinement.
Another method includes taking an image that the user eventually chooses (after having possibly applied the user's own hand-edits to the machine-generated background profile), and feeding that result back into the process to improve future results.
Using these methods, results may be improved by identifying colors and shapes that tend to be in the foreground or background of a given user's photos. In addition, results may be improved by identifying broader patterns that tend to occur across all users who use the tool. For example, past results could help continuously refine the ideal location of the foreground probability map.
These and other uses and alternatives will become apparent from the foregoing teachings.
In this description, numerous details have been set forth in order to provide a thorough understanding of the described embodiments. In other instances, well-known features have not been described in detail so as not to unnecessarily obscure the description.
A person skilled in the art in view of this description will be able to practice the present invention, which is to be taken as a whole. The specific embodiments disclosed and illustrated herein are not to be considered in a limiting sense. Indeed, it should be readily apparent to those skilled in the art that what is described herein may be modified in numerous ways. Such ways can include equivalents to what is described herein. In addition, the invention may be practiced in combination with other systems. The following claims define certain combinations and subcombinations of elements, features, steps, and/or functions, which are regarded as novel and non-obvious. Additional claims for other combinations and subcombinations may be presented in this or a related document.
This patent application claims the benefit of and priority to U.S. Provisional Patent Application Ser. No. 62/028,554, filed on Jul. 24, 2014, titled “Background Burner,” the disclosure of which is hereby incorporated by reference for all purposes.