Recent years have seen significant advancement in hardware and software platforms for image composition. In particular, systems often implement various techniques that improve the aesthetics of a composite image, such as by modifying one or more of its visual elements to provide a realistic appearance of the foreground object against the background. Despite these advancements, conventional image composition systems typically operate via tedious workflows that require a significant amount of user interaction and are prone to manual errors, resulting in composite images that appear inaccurate and unrealistic.
One or more embodiments described herein provide benefits and/or solve one or more of the foregoing or other problems in the art with systems, methods, and non-transitory computer-readable media that flexibly generate realistic composite images via intelligent, auto-compositing techniques. In particular, in one or more embodiments, the disclosed systems implement an artificial-intelligence-based compositing pipeline that automatically predicts object scale and location for compositing, harmonizes object tone, estimates lighting conditions, and/or synthesizes object shadow conditioned on object and scene appearance. The disclosed systems provide options for executing the pipeline via a graphical user interface of a client device and generate a composite image in accordance with one or more selections from the options. In some cases, the disclosed systems further utilize compositing-aware search technology to discover objects that are suitable for compositing. In this manner, the disclosed systems offer flexible search and composite features for efficiently generating realistic composite images based on a reduced set of user interactions.
Additional features and advantages of one or more embodiments of the present disclosure are outlined in the following description.
This disclosure will describe one or more embodiments of the invention with additional specificity and detail by referencing the accompanying figures. The following paragraphs briefly describe those figures, in which:
One or more embodiments described herein include an object recommendation system that utilizes an efficient graphical user interface and flexible searching methods for realistic image composition. Indeed, in one or more embodiments, the object recommendation system implements front-end searching and editing interactions and back-end search engines and image editing techniques to generate composite images. For instance, in some cases, the object recommendation system utilizes one or more search engines to recommend foreground objects for composition based on their compatibility with a background. Further, in some embodiments, the object recommendation system generates the composite image in accordance with a one-click compositing experience provided via a graphical user interface. To illustrate, in some implementations, the object recommendation system modifies the background and/or the foreground object of the composite image based on user selections via the graphical user interface to match lighting or scale or to provide a realistic positioning or shadow.
To illustrate, in one or more embodiments, the object recommendation system determines a background image and a foreground object image for use in generating a composite image. Further, the object recommendation system provides, for display within a graphical user interface of a client device, at least one selectable option for executing an auto-composite model for the composite image, the auto-composite model comprising at least one of a scale prediction model, a harmonization model, or a shadow generation model. The object recommendation system detects, via the graphical user interface, a user selection of the at least one selectable option and generates, in response, the composite image by executing the auto-composite model using the background image and the foreground object image.
As just mentioned, in one or more embodiments, the object recommendation system determines a foreground object image for use in generating a composite image with a background image. In some embodiments, the object recommendation system receives the foreground object image from a client device. For instance, in some cases, the object recommendation system receives a foreground object image that was stored locally on the client device. In some implementations, however, the object recommendation system determines the foreground object image to use via one or more search engines. Indeed, in some cases, the object recommendation system uses the one or more search engines to recommend a foreground object image for use within a composite image.
To illustrate, in some cases, the object recommendation system uses at least one search engine to search through a database and identify at least one foreground object image for use in compositing with a background image. For instance, in some embodiments, the object recommendation system utilizes a compositing-aware search engine, a text search engine, or an image search engine. Indeed, in some instances, the object recommendation system utilizes the one or more search engines to identify the foreground object image based on a compatibility with the background image (e.g., a compatibility with the geometry of the background image). In some cases, the object recommendation system utilizes the search engine that corresponds to search input provided via a graphical user interface, such as text input, spot input, bounding box input, and/or sketch input.
As further mentioned, in one or more embodiments, the object recommendation system provides, for display within a graphical user interface, one or more selectable options for executing an auto-composite model in generating the composite image. For instance, in some cases, the object recommendation system provides selectable options for executing a scale prediction model, a harmonization model, and/or a shadow generation model. Upon detecting a selection of one or more of the selectable options, the object recommendation system executes the corresponding model(s). Thus, in some instances, the object recommendation system utilizes a selectable option displayed within the graphical user interface as a one-click trigger for executing a corresponding model—which, in some cases, executes a series of actions to provide an output.
As mentioned, in one or more embodiments, the object recommendation system generates a composite image utilizing the foreground object image and the background image. In particular, the object recommendation system generates the composite image by executing the auto-composite model in accordance with the selection(s) received via the graphical user interface. For instance, in some cases, the object recommendation system utilizes the auto-composite model to modify the foreground object image and/or the background image within the composite image in accordance with the received selection(s).
In some implementations, the object recommendation system generates the composite image by positioning and/or scaling the foreground object image in accordance with additional user selections received via the graphical user interface. Indeed, in some cases, the object recommendation system receives a user interaction indicating a positioning and/or a scaling for the foreground object image within the composite image. In some embodiments, however, the object recommendation system determines and recommends a location and/or a scale for the foreground object image. Thus, in some cases, where explicit instructions have not been received, the object recommendation system automatically generates the composite image using a recommended scale and/or a recommended location for the foreground object image.
As mentioned above, conventional image composition systems suffer from several technological shortcomings that result in inflexible, inefficient, and inaccurate operation. In particular, many conventional systems are inflexible in that they employ models that rigidly search for and recommend foreground object images based on a limited set of features. For instance, conventional systems often employ models that retrieve foreground object images based on a semantic search but fail to consider aspects of compatibility with a background image that affect the resulting image composition. Additionally, many conventional systems rigidly require parameter inputs, such as a query bounding box, to guide the object search and retrieval.
Further, conventional image composition systems often suffer from inefficiencies. In particular, conventional systems typically require a significant number of user interactions for generating a compositing result. For example, in many cases, after combining a foreground object image and a background image, conventional systems require tedious workflows of user interactions to blend the two components together. Indeed, a conventional system may require a series of user interactions to execute a single modification, such as by adjusting the location, size, lighting, or orientation of the foreground object image within the composite image. This problem is exacerbated where the foreground object image is largely incompatible with the background image, and additional modifications are needed to accommodate for that incompatibility.
In addition to inflexibility and inefficiency problems, conventional image composition systems can also operate inaccurately. In particular, conventional systems often generate composite images that appear unrealistic. For instance, by employing models that suggest foreground object images that are semantically compatible with a background image but may be incompatible in other respects, conventional systems often generate composite images where the foreground object image appears unnatural against the background image. While many systems allow for additional modification after combining the components, these modifications typically involve workflows of user interactions, and thus are prone to suffering from user errors that fail to rectify aesthetic deficiencies in the composite image.
The object recommendation system provides several advantages over conventional systems. For example, the object recommendation system improves the flexibility of implementing computing devices when compared to conventional systems. To illustrate, by implementing a compositing-aware search engine, the object recommendation system flexibly recommends foreground object images based on various aspects of compatibility that are not considered by conventional systems. Indeed, in some instances, the object recommendation system utilizes the compositing-aware search engine to determine compatibility based on factors, such as geometry, of foreground object images and background images.
Additionally, the object recommendation system improves the efficiency of implementing computing devices when compared to conventional systems. For instance, by executing an auto-composite model based on a selection of one or more options provided via a graphical user interface, the object recommendation system reduces the number of user interactions required to generate a composite image or implement corresponding modifications to a composite image. Indeed, rather than requiring a series of user interactions per modification, the object recommendation system triggers a backend workflow in response to a single click. Thus, the object recommendation system implements a graphical user interface that facilitates various modifications to a composite image based on relatively few user interactions.
Further, the object recommendation system improves the accuracy of implementing computing devices when compared to conventional systems. In particular, the object recommendation system generates comparatively more realistic composite images. For instance, by suggesting foreground object images that are more compatible with a background image and/or utilizing a computer-implemented auto-composite model in generating/modifying a composite image, the object recommendation system provides compositing results having a more aesthetically natural appearance. For example, by implementing an auto-composite model, the object recommendation system avoids error-prone user interactions that are typically used under conventional systems.
Additional details regarding the object recommendation system will now be provided with reference to the figures. For example,
Although the environment 100 of
The server(s) 102, the network 108, and the client devices 110a-110n are communicatively coupled with each other either directly or indirectly (e.g., through the network 108 discussed in greater detail below in relation to
As mentioned above, the environment 100 includes the server(s) 102. In one or more embodiments, the server(s) 102 generates, stores, receives, and/or transmits data including search engines, an auto-composite model, digital images, composite images, and/or recommendations for foreground object images. In one or more embodiments, the server(s) 102 comprises a data server. In some implementations, the server(s) 102 comprises a communication server or a web-hosting server.
In one or more embodiments, the image editing system 104 provides functionality by which a client device (e.g., a user of one of the client devices 110a-110n) generates, edits, manages, and/or stores digital images. For example, in some instances, a client device sends a digital image to the image editing system 104 hosted on the server(s) 102 via the network 108. The image editing system 104 then provides many options that the client device may use to edit the digital image, store the digital image, and subsequently search for, access, and view the digital image. For instance, in some cases, the image editing system 104 provides one or more options that the client device may use to create a composite image using the digital image.
Additionally, the server(s) 102 includes the object recommendation system 106. In one or more embodiments, via the server(s) 102, the object recommendation system 106 identifies and recommends foreground object images that are compatible with background images for generating composite images. For instance, in some cases, the object recommendation system 106, via the server(s) 102, builds and implements a composite object search engine 114 to identify and recommend foreground object images. In some cases, via the server(s) 102, the object recommendation system 106 further executes an auto-composite model 116 in generating composite images, such as by using recommended foreground object images. Example components of the object recommendation system 106 will be described below with regard to
In one or more embodiments, the client devices 110a-110n include computing devices that can access, edit, modify, store, and/or provide, for display, digital images, including composite images. For example, the client devices 110a-110n include smartphones, tablets, desktop computers, laptop computers, head-mounted-display devices, or other electronic devices. The client devices 110a-110n include one or more applications (e.g., the client application 112) that can access, edit, modify, store, and/or provide, for display, digital images, including composite images. For example, in some embodiments, the client application 112 includes a software application installed on the client devices 110a-110n. In other cases, however, the client application 112 includes a web browser or other application that accesses a software application hosted on the server(s) 102.
The object recommendation system 106 can be implemented in whole, or in part, by the individual elements of the environment 100. Indeed, as shown in
In additional or alternative embodiments, the object recommendation system 106 on the client devices 110a-110n represents and/or provides the same or similar functionality as described herein in connection with the object recommendation system 106 on the server(s) 102. In some implementations, the object recommendation system 106 on the server(s) 102 supports the object recommendation system 106 on the client devices 110a-110n.
For example, in some embodiments, the object recommendation system 106 on the server(s) 102 builds one or more search engines described herein (e.g., the composite object search engine 114) and/or trains one or more compositing models described herein (e.g., the auto-composite model). The object recommendation system 106 on the server(s) 102 provides the one or more search engines and/or the one or more compositing models to the object recommendation system 106 on the client devices 110a-110n for implementation. Accordingly, although not illustrated, in one or more embodiments the client devices 110a-110n utilize the one or more search engines to recommend foreground object images for image composition and/or utilize the one or more compositing models to generate composite images.
In some embodiments, the object recommendation system 106 includes a web hosting application that allows the client devices 110a-110n to interact with content and services hosted on the server(s) 102. To illustrate, in one or more implementations, the client devices 110a-110n access a web page or computing application supported by the server(s) 102. The client devices 110a-110n provide input to the server(s) 102 (e.g., a background image). In response, the object recommendation system 106 on the server(s) 102 utilizes the one or more search engines to generate a recommendation for utilizing a foreground object image with the background image in generating a composite image. The server(s) 102 then provides the recommendation to the client devices 110a-110n. In some instances, the server(s) 102 further implements the one or more compositing models to generate a compositing result and provides the compositing result to the client devices 110a-110n.
In some embodiments, though not illustrated in
As mentioned above, the object recommendation system 106 generates recommendations for using foreground object images in creating a composite image.
In one or more embodiments, a foreground object image includes a digital image portraying a foreground object. In particular, in some embodiments, a foreground object image includes a digital image usable for providing a foreground object for a composite image. For example, in some implementations, a foreground object image includes a digital image portraying a person or other object that is used to generate a composite image having the same portrayal of the person or object. In some implementations, a foreground object image includes a portrayal of the foreground object against a solid background or a cutout of the foreground object (e.g., without a background). Accordingly, in some instances, the following disclosure utilizes the terms foreground object image and foreground object interchangeably.
In some embodiments, the object recommendation system 106 recommends a foreground object image based on a background image to be used in generating a composite image. Indeed, as shown in
In one or more embodiments, a background image includes a digital image portraying a scene. In particular, in some embodiments, a background image includes a digital image that portrays a scene that is usable as a background within a composite image. For instance, in some cases, a background image portrays a scene that is used to generate a composite image portraying the same scene as a background.
As further shown in
Indeed, as shown in
As illustrated by
In one or more embodiments, a neural network includes a type of machine learning model, which can be tuned (e.g., trained) based on inputs to approximate unknown functions used for generating the corresponding outputs. In particular, in some embodiments, a neural network includes a model of interconnected artificial neurons (e.g., organized in layers) that communicate and learn to approximate complex functions and generate outputs based on a plurality of inputs provided to the model. In some instances, a neural network includes one or more machine learning algorithms. Further, in some cases, a neural network includes an algorithm (or set of algorithms) that implements deep learning techniques that utilize a set of algorithms to model high-level abstractions in data. To illustrate, in some embodiments, a neural network includes a convolutional neural network, a recurrent neural network (e.g., a long short-term memory neural network), a generative adversarial neural network, a graph neural network, or a multi-layer perceptron. In some embodiments, a neural network includes a combination of neural networks or neural network components.
In some embodiments, a geometry-lighting-aware neural network includes a computer-implemented neural network that identifies foreground objects (e.g., foreground object images) that are compatible with background images for use in generating composite images. In particular, in some embodiments, a geometry-lighting-aware neural network includes a computer-implemented neural network that analyzes a background image and determines, from a set of foreground objects, one or more foreground objects that are compatible with the background image based on the analysis. For instance, in some cases, a geometry-lighting-aware neural network determines compatibility by considering similarities of a variety of image characteristics, such as lighting, geometry, and semantics.
In one or more embodiments, the object recommendation system 106 generates a recommendation using the foreground object image 206. For example, as shown in
As further shown in
By utilizing a geometry-lighting-aware neural network, the object recommendation system 106 recommends foreground object images that are more similar to background images in terms of lighting and geometry (as well as semantics) when compared to conventional systems.
Indeed,
As shown in
As previously indicated, in one or more embodiments, the object recommendation system 106 recommends foreground object images that are compatible with background images in terms of geometry and lighting by building and implementing a geometry-lighting-aware neural network that is sensitive to such image features. Indeed, in one or more embodiments, the object recommendation system 106 builds a geometry-lighting-aware neural network by learning network parameters that facilitate the detection of similarities between background images and foreground objects in terms of geometry and lighting.
In one or more embodiments, a segmentation mask includes an identification of pixels in an image that represent an object. In particular, in some embodiments, a segmentation mask includes an image filter useful for partitioning a digital image into separate portions. For example, in some cases, a segmentation mask includes a filter that corresponds to a digital image (e.g., a foreground image) that identifies a portion of the digital image (i.e., pixels of the digital image) belonging to a foreground object and a portion of the digital image belonging to a background. For example, in some implementations, a segmentation mask includes a map of the digital image that has an indication for each pixel of whether the pixel is part of an object (e.g., a foreground object) or not. In such implementations, the indication can comprise a binary indication (a 1 for pixels belonging to the object and a 0 for pixels not belonging to the object). In alternative implementations, the indication can comprise a probability (e.g., a number between 0 and 1) that indicates the likelihood that a pixel belongs to the object. In such implementations, the closer the value is to 1, the more likely the pixel belongs to the foreground or object, and vice versa.
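As an illustration of the binary-versus-probability indication described above, the following is a minimal NumPy sketch of applying a segmentation mask to cut out a foreground object against a solid background; the helper name, the 0.5 threshold, and the solid fill value are assumptions for illustration only.

```python
import numpy as np

def apply_segmentation_mask(image, mask, background_value=255):
    # Treat the mask as binary; probability maps in [0, 1] are thresholded at 0.5 here.
    binary = (mask >= 0.5).astype(image.dtype)
    foreground = image * binary[..., None]              # keep pixels assigned to the object
    fill = background_value * (1 - binary[..., None])   # solid background elsewhere
    return foreground + fill

# Illustrative usage with synthetic data
image = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
mask = np.zeros((64, 64), dtype=np.float32)
mask[16:48, 16:48] = 1.0                                 # square "object" region
cutout = apply_segmentation_mask(image, mask)
```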
As further shown in
As shown in
In one or more embodiments, a background network includes a neural network or neural network component that analyzes background images. Similarly, in one or more embodiments, a foreground network includes a neural network or neural network component that analyzes foreground object images. In some cases, a background network and/or a foreground network includes a neural network encoder that generates one or more embeddings based on an analysis of a background image or a foreground image, respectively. For example, in some cases, a background network and/or a foreground network include a convolutional neural network (CNN) or CNN component for generating embeddings from background or foreground image features.
In particular, in one or more embodiments, the object recommendation system 106 utilizes the background network 324 and the foreground network 326 to generate predicted embeddings from the background image 302, the foreground object image 304, and the additional foreground object image 320 within a geometry-lighting-sensitive embedding space.
Generally, in one or more embodiments, an embedding space includes a space in which digital data is embedded. In particular, in some embodiments, an embedding space includes a space (e.g., a mathematical or numerical space) in which some representation of digital data (referred to as an embedding) exists. For example, in some implementations, an embedding space includes a vector space where an embedding located therein represents patent and/or latent features of the corresponding digital data. In some cases, an embedding space includes a dimensionality associated with a representation of digital data, including the number of dimensions associated with the representation and/or the types of dimensions. In one or more embodiments, a geometry-lighting-aware embedding space includes an embedding space for embeddings that encode the lighting and/or geometry features of corresponding digital data (e.g., background images or foreground object images).
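The following is a minimal PyTorch sketch of such a two-branch design, with a background network and a foreground network that each map an image to an L2-normalized embedding in a shared embedding space; the ResNet-18 backbone and the embedding dimension are assumptions, not details taken from this disclosure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models

class TwoBranchEncoder(nn.Module):
    """Background network (N_b) and foreground network (N_f) producing
    L2-normalized embeddings in a shared embedding space (sketch)."""

    def __init__(self, embed_dim=256):
        super().__init__()
        self.background_net = self._make_branch(embed_dim)  # N_b
        self.foreground_net = self._make_branch(embed_dim)  # N_f

    @staticmethod
    def _make_branch(embed_dim):
        backbone = models.resnet18(weights=None)             # assumed backbone choice
        backbone.fc = nn.Linear(backbone.fc.in_features, embed_dim)
        return backbone

    def forward(self, background, foreground):
        b = F.normalize(self.background_net(background), dim=-1)
        f = F.normalize(self.foreground_net(foreground), dim=-1)
        return b, f
```

Because the embeddings are L2-normalized, cosine similarity between a background embedding and a foreground embedding reduces to a dot product, which simplifies the loss computations that follow.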
As shown in
Similarly, the object recommendation system 106 compares a background embedding corresponding to the background image 302 and an additional foreground embedding corresponding to the additional foreground object image 320 and determines a measure of loss based on the comparison. In particular, the object recommendation system 106 applies a penalty (e.g., determines a larger measure of loss) for smaller distances between the background embedding and the additional foreground embedding. In this manner, the object recommendation system 106 teaches the background network 324 and the foreground network 326 to move background embeddings further away from negative (non-ground-truth) foreground objects within the geometry-lighting-sensitive embedding space.
In one or more embodiments, the object recommendation system 106 determines the loss 328 by determining a triplet loss utilizing the following:
$\mathcal{L}_t = \left[ S\big(N_b(I_b),\, N_f(I_f^-)\big) - S\big(N_b(I_b),\, N_f(I_f^+)\big) + m \right]_+ \quad (1)$
In equation 1, $S$ represents the cosine similarity and $[\cdot]_+$ represents the hinge function. Additionally, $N_b$ and $N_f$ represent the background network 324 and the foreground network 326, respectively. Further, $I_b$ represents a background image (e.g., the background image 302), $I_f^+$ represents a positive foreground object image with respect to the background image (e.g., the foreground object image 304), and $I_f^-$ represents the negative foreground object image with respect to the background image (e.g., the additional foreground object image 320). Also, in equation 1, $m$ represents a margin for triplet loss. Though equation 1 shows use of the cosine similarity, the object recommendation system 106 utilizes various measures of similarity in various embodiments. For instance, in some cases, the object recommendation system 106 utilizes Euclidean distance as the measure of the similarity in determining the loss 328.
In one or more embodiments, the object recommendation system 106 utilizes the loss 328 to update the parameters of the geometry-lighting-aware neural network 322. For instance, in some cases, the object recommendation system 106 updates the parameters to optimize the geometry-lighting-aware neural network 322 by reducing the errors of its outputs. Accordingly, in some cases, the object recommendation system 106 utilizes the loss 328 in accordance with an optimization formulation that minimizes $\mathcal{L}_t$ with respect to the parameters of the background network $N_b$ and the foreground network $N_f$.
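A minimal sketch of the triplet loss of equation 1 follows, assuming the L2-normalized embeddings produced by a two-branch encoder such as the one sketched above; the margin value is illustrative.

```python
import torch
import torch.nn.functional as F

def triplet_loss(bg_embed, pos_fg_embed, neg_fg_embed, margin=0.1):
    """Hinge on the cosine-similarity difference, as in equation 1 (sketch).

    bg_embed:     N_b(I_b), embedding of the background image
    pos_fg_embed: N_f(I_f+), embedding of the compatible foreground object
    neg_fg_embed: N_f(I_f-), embedding of a negative foreground object
    """
    sim_neg = F.cosine_similarity(bg_embed, neg_fg_embed, dim=-1)
    sim_pos = F.cosine_similarity(bg_embed, pos_fg_embed, dim=-1)
    return torch.clamp(sim_neg - sim_pos + margin, min=0.0).mean()  # [.]_+ hinge
```

Minimizing this term increases the similarity between the background embedding and the positive foreground embedding while decreasing its similarity to the negative foreground embedding, up to the margin.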
As previously mentioned, in some cases, the object recommendation system 106 learns parameters for a geometry-lighting-aware neural network using one or more transformed foreground object images.
In one or more embodiments, a geometry transformation includes a modification to a foreground object image that changes the geometry of the foreground object image. In particular, in some embodiments, a geometry transformation includes a modification to one or more geometric properties of a foreground object image. For instance, in some implementations, a geometry transformation includes, but is not limited to, a modification to the shape, orientation, perspective, or size of a foreground object image. Indeed, in some cases, a geometry transformation modifies one or more patent geometric features of a foreground object image. In some embodiments, however, a geometry transformation additionally or alternatively modifies one or more latent geometric features.
In one or more embodiments, a lighting transformation includes a modification to a foreground object image that changes the lighting of the foreground object image. In particular, in some embodiments, a lighting transformation includes a modification to one or more lighting properties of a foreground object image. For instance, in some cases, a lighting transformation includes, but is not limited to, a modification to a brightness, hue, or saturation of a foreground object image, a light source of a foreground object image, or shadows or reflections portrayed by the foreground object image. Indeed, in some cases, a lighting transformation modifies one or more patent lighting features of a foreground object image. In some embodiments, however, a lighting transformation additionally or alternatively modifies one or more latent lighting features.
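As a concrete illustration, the following sketch composes a geometry transformation from standard torchvision operations; the specific transforms and their parameters are assumptions meant only to show the kinds of shape, orientation, perspective, and size modifications described above.

```python
import torchvision.transforms as T

# Assumed set of geometry transformations for producing negative samples;
# the disclosure describes modifications to shape, orientation, perspective, or size.
geometry_transform = T.Compose([
    T.RandomHorizontalFlip(p=1.0),                      # flip orientation
    T.RandomRotation(degrees=30),                       # perturb orientation
    T.RandomPerspective(distortion_scale=0.4, p=1.0),   # change perspective
    T.Resize((192, 192)),                               # change size
])

# transformed_foreground = geometry_transform(foreground_object_image)  # PIL image or tensor
```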
As shown in
Thus, the object recommendation system 106 generates a transformed foreground object image 410. Though
As further shown in
As illustrated in
As further shown in
Additionally, as shown, the object recommendation system 106 utilizes one or more enhancements 424 to further transform the portion 420 extracted from the modified digital image 418. In some cases, the object recommendation system 106 further transforms the portion 420 by enhancing the variance of the portion 420. For instance, in some implementations, the object recommendation system 106 enhances the variance using an exponential function. Thus, the object recommendation system 106 generates an enhanced lighting map 426 from the digital image 414.
As further shown, the object recommendation system 106 utilizes the enhanced lighting map 426 to generate a transformed foreground object image 428 from the foreground object image 412. For instance, in some embodiments, the object recommendation system 106 multiplies the foreground object image 412 by the enhanced lighting map 426 to generate the transformed foreground object image 428. Thus, in some cases, the object recommendation system 106 utilizes the enhanced lighting map 426 to change the lighting of the foreground object image 412, such as by highlighting some region of the foreground object image 412.
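The following is a minimal sketch of this lighting transformation, assuming the foreground object image and the lighting source share the same spatial size; the Gaussian blur, the exponential variance enhancement, and the gamma value are illustrative assumptions rather than the exact operations of the disclosed system.

```python
import numpy as np
import cv2

def lighting_transform(foreground, lighting_source, gamma=2.0):
    """Produce a lighting-transformed foreground object image (sketch)."""
    # Derive a smooth lighting map from a grayscale source image.
    lighting = cv2.GaussianBlur(lighting_source.astype(np.float32), (31, 31), 0)
    lighting /= lighting.max() + 1e-8                    # normalize to [0, 1]

    # Enhance the variance of the map with an exponential function.
    enhanced = np.exp(gamma * (lighting - lighting.mean()))
    enhanced /= enhanced.max() + 1e-8

    # Multiply the foreground object image by the enhanced lighting map,
    # highlighting some regions while darkening others.
    transformed = foreground.astype(np.float32) * enhanced[..., None]
    return np.clip(transformed, 0, 255).astype(np.uint8)
```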
As previously stated with regard to geometry transformations,
Further,
As shown in
In particular, in one or more embodiments, the object recommendation system 106 utilizes the background network 440 and the foreground network 442 to generate predicted embeddings from the background image 432, the foreground object image 434, and one of the transformed foreground object images 436a-436b within a geometry-lighting-sensitive embedding space. As shown in
$\mathcal{L}_c = \left[ S\big(N_b(I_b),\, N_f(I_f^t)\big) - S\big(N_b(I_b),\, N_f(I_f^+)\big) + m \right]_+ \quad (2)$
In equation 2, $I_f^t$ represents a transformed foreground object image (e.g., one of the transformed foreground object images 436a-436b). Though equation 2 (like equation 1) shows use of the cosine similarity, the object recommendation system 106 utilizes various measures of similarity in various embodiments. For instance, in some cases, the object recommendation system 106 utilizes Euclidean distance as the measure of the similarity in determining the loss 444.
In one or more embodiments, the object recommendation system 106 utilizes the loss 444 to update the parameters of the geometry-lighting-aware neural network 438. For instance, in some cases, the object recommendation system 106 updates the parameters to optimize the geometry-lighting-aware neural network 438 by reducing the errors of its outputs. For example, in some instances, by updating the parameters, the object recommendation system 106 decreases the distance between positive samples and increases the distance between negative samples within the geometry-lighting-sensitive embedding space even where those negative samples merely differ in terms of lighting and/or geometry. Thus, at inference time, the object recommendation system 106 utilizes the geometry-lighting-aware neural network 438 to identify compatible foreground object images based on the distance between their embeddings and the embedding of the given background image.
By updating parameters of the geometry-lighting-aware neural network 438 utilizing transformed foreground object images, the object recommendation system 106 improves the accuracy with which the geometry-lighting-aware neural network 438 identifies foreground object images that are compatible with background images for image composition. In particular, the object recommendation system 106 enables the geometry-lighting-aware neural network 438 to identify foreground object images that are similar to background images in terms of lighting and/or geometry (as well as semantics).
In some implementations, the object recommendation system 106 combines the triplet loss of equation 1 and the triplet loss of equation 2 to determine a loss (e.g., a combined loss) for the geometry-lighting-aware neural network 438. For instance, in some implementations, the object recommendation system 106 generates predicted embeddings for a background image, a foreground object image corresponding to the background image, a transformed foreground object image generated from the foreground object image, and an additional foreground object image. The object recommendation system 106 further determines the triplet loss of equation 1 and the triplet loss of equation 2 utilizing the respective predicted embeddings and updates the parameters of the geometry-lighting-aware neural network 438 utilizing a combination of the triplet losses. For instance, in some cases, the object recommendation system 106 combines the triplet loss of equation 1 and the triplet loss of equation 2 as follows:
$\mathcal{L} = \mathcal{L}_t + \mathcal{L}_c \quad (3)$
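A short sketch of the combined objective follows; it assumes the triplet_loss helper sketched earlier and simply sums the term of equation 1 (with another foreground object as the negative) and the term of equation 2 (with the transformed foreground object as the negative).

```python
def combined_loss(bg_embed, pos_fg_embed, other_fg_embed, transformed_fg_embed, margin=0.1):
    # Equation 3: sum of the two triplet terms, reusing the triplet_loss sketch above.
    loss_t = triplet_loss(bg_embed, pos_fg_embed, other_fg_embed, margin)        # eq. 1
    loss_c = triplet_loss(bg_embed, pos_fg_embed, transformed_fg_embed, margin)  # eq. 2
    return loss_t + loss_c                                                       # eq. 3
```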
In some implementations, the object recommendation system 106 employs additional methods for building a geometry-lighting-aware neural network. Indeed, in some cases, the object recommendation system 106 implements one or more additional methods during training that facilitate the learning of network parameters that improve operation at inference time.
In particular,
Accordingly, as shown in
In one or more embodiments, an eroded segmentation mask includes a segmentation mask that has been modified so that the number of pixels of a digital image that are attributed to a foreground object is reduced. Indeed, in some embodiments, an eroded segmentation mask includes a segmentation mask that has been modified so that the resulting foreground object image includes fewer pixels than would be included from using the unmodified segmentation mask. For example, in one or more embodiments, the object recommendation system 106 generates an eroded segmentation mask by randomly or semi-randomly eroding a number of pixels of the segmentation mask at the edge between the foreground and the background (e.g., changing the affiliation of the pixels from the foreground object to the background).
Accordingly, in some cases using an eroded segmentation mask results in a foreground object image that includes relatively fewer background edge pixels. Thus, as shown in
Further, as shown in
By utilizing foreground object images extracted from digital images using eroded segmentation masks and background images generated from the digital images using extended masks, the object recommendation system 106 improves the parameter-learning for the geometry-lighting-aware neural network. Indeed, the object recommendation system 106 avoids learning network parameters that rely on edge pixels as cues for determining similarities. Thus, at inference time, the geometry-lighting-aware neural network can more accurately identify foreground object images that are compatible in terms of other features, such as semantics, geometry, and/or lighting.
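A minimal sketch of these mask augmentations follows, assuming OpenCV morphology on binary masks; the structuring element and iteration counts are illustrative.

```python
import numpy as np
import cv2

kernel = np.ones((5, 5), np.uint8)   # illustrative structuring element

def erode_segmentation_mask(mask, iterations=2):
    """Shrink the object region so the extracted foreground object image
    keeps fewer background edge pixels."""
    return cv2.erode(mask, kernel, iterations=iterations)

def extend_mask(mask, iterations=2):
    """Grow the object region; the extended mask is used when removing the
    object from the digital image to produce the background image."""
    return cv2.dilate(mask, kernel, iterations=iterations)
```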
To illustrate, in one or more embodiments, in the first stage 512, the object recommendation system 106 utilizes the background network 514 and the foreground network 516 of the geometry-lighting-aware neural network to generate predicted embeddings from a background image, a foreground object image, a transformed foreground object image (as a negative sample), and/or an additional foreground object image (as a negative sample). The object recommendation system 106 further determines a loss 520 from the predicted embeddings, such as by using the triplet loss of equation 1, the triplet loss of equation 2, or the combined loss of equation 3. The object recommendation system 106 backpropagates the loss 520 (as shown by the line 522) and updates the parameters of the background network 514 accordingly while maintaining the parameters of the foreground network 516. As mentioned, in some implementations, the object recommendation system 106 repeats the process through various iterations using further positive and negative samples.
Similarly, in one or more embodiments, the object recommendation system 106 updates the foreground network 516 of the geometry-lighting-aware neural network in the second stage 518. The object recommendation system 106 can utilize the same positive and negative samples as used in the first stage 512 or use different samples. Like with the first stage 512, the object recommendation system 106 utilizes the background network 514 and the foreground network 516 to generate predicted embeddings, determines a loss 524 from the predicted embeddings, and backpropagates the loss 524 (as shown by the line 526) to update the parameters of the foreground network 516. As mentioned, in some implementations, the object recommendation system 106 repeats the process through various iterations using further positive and negative samples.
In one or more embodiments, the object recommendation system 106 performs each of the first stage 512 and the second stage 518 multiple times. In some cases, however, the object recommendation system 106 performs each of the first stage 512 and the second stage 518 once.
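The following sketch illustrates the alternating update strategy, assuming the two-branch encoder and triplet loss sketched earlier; the optimizer, learning rate, and stage lengths are assumptions.

```python
import torch

def set_requires_grad(module, flag):
    for param in module.parameters():
        param.requires_grad = flag

def alternating_training(model, loader, loss_fn, epochs_per_stage=1, lr=1e-4):
    """First stage: update the background network while the foreground network is
    frozen; second stage: the reverse (sketch)."""
    stages = [(model.background_net, model.foreground_net),
              (model.foreground_net, model.background_net)]
    for train_net, frozen_net in stages:
        set_requires_grad(train_net, True)
        set_requires_grad(frozen_net, False)
        optimizer = torch.optim.Adam(train_net.parameters(), lr=lr)
        for _ in range(epochs_per_stage):
            for background, pos_fg, neg_fg in loader:
                bg_embed, pos_embed = model(background, pos_fg)
                _, neg_embed = model(background, neg_fg)
                loss = loss_fn(bg_embed, pos_embed, neg_embed)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
```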
In one or more embodiments, by modifying the parameters of the foreground network 516, the object recommendation system 106 enables the foreground network 516 to flexibly learn from various data samples, which is not available under many conventional systems that utilize frozen, pre-trained parameters for the foreground network. Further, by learning parameters for the background network 514 and the foreground network 516 in separate stages, the object recommendation system 106 prevents the embedded features of the geometry-lighting-aware neural network from drifting significantly, a drift that can occur when these components are trained together. In particular, the object recommendation system 106 improves the accuracy of the geometry-lighting-aware neural network by enabling it to maintain semantic features while allowing the foreground network 516 to flexibly learn from the data for other features (e.g., lighting and geometry), further improving performance at inference time.
Indeed,
In the table, “Fixed Foreground” refers to an embodiment of the geometry-lighting-aware neural network where the foreground network was pre-trained and its parameters were frozen during the learning process. “Direct Training” refers to an embodiment of the geometry-lighting-aware neural network where the parameters of the foreground network and background network were learned simultaneously. “Aug” refers to an embodiment of the geometry-lighting-aware neural network where the parameters were learned using one or more augmented masks, such as eroded segmentation masks for the foreground object images and/or extended masks for the background images. “Aug+Alternating” refers to an embodiment of the geometry-lighting-aware neural network that learned parameters via augmented masks and an alternating update strategy. The table of
As shown by the table of
Thus, the object recommendation system 106 builds a geometry-lighting-aware neural network by learning network parameters via one or more of the processes described above with reference to
As indicated above, in some cases, the object recommendation system 106 receives a query bounding box with a background image for guiding the object search and retrieval. In some cases, the object recommendation system 106 utilizes the geometry-lighting-aware neural network with the learned parameters to generate an embedding for the portion of the background image that corresponds to the query bounding box. Thus, the object recommendation system 106 utilizes the geometry-lighting-aware neural network in identifying and recommending foreground object images that are specifically compatible with that portion of the background image. Indeed, in some cases, the object recommendation system 106 utilizes the size and/or location of the query bounding box as parameters for object retrieval.
In one or more embodiments, however, the object recommendation system 106 receives a background image without receiving a query bounding box. In some cases, the object recommendation system 106 still operates to identify and recommend foreground object images that are compatible with the background image in terms of semantics, lighting, and/or geometry. For example, in some cases, the object recommendation system 106 determines a location and/or scale for a foreground object image within the background image for use in generating a composite image. Accordingly, in some implementations, the object recommendation system 106 recommends a foreground object image by further recommending a location and/or a scale for the foreground object image within the given background image.
As shown in
In one or more embodiments, the object recommendation system 106 retrieves foreground object images based on the plurality of bounding boxes. For instance, in some cases, the object recommendation system 106 retrieves one or more foreground object images for a bounding box upon determining that the foreground object image(s) is compatible with a portion of the background image 602 associated with the bounding box. Indeed, in some implementations, the object recommendation system 106 utilizes a neural network to generate an embedding for the portion of the background image 602 associated with the bounding box. Further, the object recommendation system 106 determines similarity scores (e.g., using cosine similarity, Euclidean distance, or some other measure of proximity within the embedding space) for the embedding and the embeddings of foreground object images. Accordingly, the object recommendation system 106 selects the one or more foreground object images based on the similarity scores (e.g., by selecting the one or more foreground object images having the highest similarity scores). In one or more embodiments, the object recommendation system 106 utilizes a geometry-lighting-aware neural network to generate the embeddings within a geometry-lighting-sensitive embedding space to facilitate the retrieval of foreground object images that are compatible in terms of geometry and/or lighting (as well as semantics).
In one or more embodiments, the object recommendation system 106 determines a ranking for the retrieved foreground object images based on their similarity scores. Further, the object recommendation system 106 selects a foreground object image based on the ranking, such as by selecting the foreground object image having the highest similarity score. In some embodiments, the object recommendation system 106 further associates the selected foreground object image with a bounding box, such as by generating a bounding box having the same aspect ratio as the foreground object image (or using the bounding box for which the foreground object image was retrieved). In some implementations, the object recommendation system 106 further generates the bounding box for the foreground object image to include a scale that is a fraction of the scale of the background image 602.
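As an illustration of this ranking step, the following sketch scores candidate foreground object embeddings against a background (or bounding-box crop) embedding and returns the top matches; the helper name and the top-k value are assumptions.

```python
import torch
import torch.nn.functional as F

def rank_foreground_objects(background_embed, candidate_embeds, top_k=5):
    """Rank candidate foreground object embeddings by cosine similarity to the
    embedding of a background image (or of a bounding-box crop of it)."""
    scores = F.cosine_similarity(background_embed.unsqueeze(0), candidate_embeds, dim=-1)
    k = min(top_k, candidate_embeds.shape[0])
    top_scores, top_indices = torch.topk(scores, k=k)
    return top_indices.tolist(), top_scores.tolist()
```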
As shown in
Accordingly, the object recommendation system 106 determines a location for the selected foreground object image within the background image 602 using the similarity scores (e.g., by selecting the location associated with the highest similarity score). In one or more embodiments, the object recommendation system 106 generates a recommendation that recommends using the foreground object image at the determined location within the background image 602 for generating a composite image.
As shown in
In one or more embodiments, a location heatmap includes a visual presentation of location compatibility. In particular, in some embodiments, a location heatmap includes a heatmap that indicates the compatibility of a foreground object image with various locations within a background image. For instance, in some cases, a location heatmap includes a heatmap having a range of values (e.g., color values) where a particular value from the range indicates a degree of compatibility between a location within a background image and a foreground object image. In one or more embodiments, a location heatmap provides indications for the entirety of the background image. In other words, a location heatmap provides an indication of compatibility (e.g., a value) for each location of a background image.
In one or more embodiments, the object recommendation system 106 generates the location heatmap 608 by interpolating the similarity scores determined for the locations of the grid 606 across the background image 602 (e.g., via bilinear interpolation). In some embodiments, the object recommendation system 106 further normalizes the interpolated values. Thus, in some cases, the object recommendation system 106 utilizes the similarity scores for those locations to determine compatibility of a selected foreground object image with all locations of the background image 602. In one or more embodiments, dimensions of the grid 606 are configurable. In particular, in some instances, the object recommendation system 106 changes the dimensions of the grid 606 (e.g., the stride of moving the foreground object image across the background image 602) in response to input from a client device, allowing for a change to the level of refinement with which the object recommendation system 106 determines the recommendation location.
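A minimal sketch of this interpolation step follows, assuming the grid of similarity scores is available as a small array; OpenCV's bilinear resize stands in for the interpolation, and the normalization is illustrative.

```python
import numpy as np
import cv2

def location_heatmap(grid_scores, background_height, background_width):
    """Bilinearly interpolate per-cell similarity scores across the full
    background image and normalize the result to [0, 1] (sketch)."""
    heatmap = cv2.resize(
        grid_scores.astype(np.float32),
        (background_width, background_height),   # cv2.resize expects (width, height)
        interpolation=cv2.INTER_LINEAR,
    )
    heatmap -= heatmap.min()
    heatmap /= heatmap.max() + 1e-8
    return heatmap

# Illustrative usage: a 5x7 grid of similarity scores for a 480x640 background image
grid_scores = np.random.rand(5, 7)
heatmap = location_heatmap(grid_scores, 480, 640)
```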
In one or more embodiments, the object recommendation system 106 provides the location heatmap 608 as part of the recommendation. For instance, in some embodiments, the object recommendation system 106 provides the location heatmap 608 for display on a client device as a visualization of the location of the background image 602 that is recommended for the foreground object image. Further, in some cases, by providing the location heatmap 608, the object recommendation system 106 also shows other compatible or non-compatible locations for the foreground object image.
As further shown in
In some implementations, the object recommendation system 106 determines a recommended location and a recommended scale for a foreground object image utilizing one of various methods other than those described above. For instance, in some implementations, the object recommendation system 106 recommends a global optimum scale-location pair for the foreground object image. To illustrate, in one or more embodiments, the object recommendation system 106 generates a plurality of bounding boxes with different scales at a plurality of locations of the background image. For instance, in some implementations, the object recommendation system 106 generates a plurality of grids for the background image where each grid is associated with a different scale than the other grids. In some cases, the object recommendation system 106 analyzes the plurality of bounding boxes with the various scales at the different locations to determine a bounding box associated with a global optimum scale-location pair. For instance, the object recommendation system 106 can determine that a bounding box is associated with a global optimum scale-location pair if it provides the highest similarity score when compared to the other bounding boxes. Thus, in some cases, the object recommendation system 106 recommends utilizing the foreground object image with the scale and location of the bounding box associated with the global optimum scale-location pair.
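The following is a minimal sketch of such an exhaustive scale-location search; the embed_crop helper, the scale set, and the stride are assumptions used only to illustrate selecting the globally best-scoring bounding box.

```python
import torch.nn.functional as F

def best_scale_location(background, foreground_embed, embed_crop,
                        scales=(0.2, 0.3, 0.4), stride=32):
    """Slide bounding boxes of several scales across the background and keep the
    scale-location pair with the highest similarity score (sketch).

    embed_crop(background, box) is an assumed helper that embeds the crop of
    `background` under box = (x, y, w, h) with the background network.
    """
    bg_h, bg_w = background.shape[-2], background.shape[-1]
    best_box, best_score = None, float("-inf")
    for scale in scales:
        box_h, box_w = int(bg_h * scale), int(bg_w * scale)
        for y in range(0, bg_h - box_h + 1, stride):
            for x in range(0, bg_w - box_w + 1, stride):
                crop_embed = embed_crop(background, (x, y, box_w, box_h))
                score = F.cosine_similarity(crop_embed, foreground_embed, dim=-1).item()
                if score > best_score:
                    best_box, best_score = (x, y, box_w, box_h), score
    return best_box, best_score
```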
By recommending locations and/or scales for foreground object images within a background image, the object recommendation system 106 operates more flexibly when compared to conventional systems. Indeed, where many conventional systems require a query bounding box to be provided in order to guide the object search and retrieval process, the object recommendation system 106 can flexibly identify compatible foreground object images when a query bounding box is not provided. Further, the object recommendation system 106 can flexibly determine a location and/or scale for the foreground object image that optimizes the compatibility of the foreground object image with the background image so that the resulting composite image has a realistic appearance. Additionally, by recommending locations and/or scales, the object recommendation system 106 operates more efficiently, as it reduces the amount of user input required in order to generate a recommendation.
As mentioned above, in some embodiments, the object recommendation system 106 implements a graphical user interface to facilitate object retrieval and recommendation. In particular, in some cases, the object recommendation system 106 utilizes the graphical user interface to implement a workflow for providing foreground object image recommendations and composite images.
For example, as shown in
In one or more embodiments, the object recommendation system 106 searches for and retrieves the plurality of digital images via a web search. In some cases, the object recommendation system 106 searches local storage of the client device 704 or a remote storage device. Further, in some embodiments, rather than presenting the search field 706, the object recommendation system 106 presents one or more folders or links to the plurality of digital images or provides interactive options for selecting various parameters for retrieving background images.
As shown in
As illustrated in
In some implementations, the object recommendation system 106 utilizes a neural network (e.g., a geometry-lighting-aware neural network) to identify the one or more foreground object images. For instance, in some cases, the object recommendation system 106 utilizes the neural network to generate an embedding for the background image 710 and embeddings for a plurality of foreground object images within an embedding space (e.g., a geometry-lighting-sensitive embedding space). Further, the object recommendation system 106 determines compatibility based on the embeddings, such as by determining similarity scores between the embeddings for the foreground object images and the embedding for the background image 710. In some cases, as shown in
As shown in
As further shown in
As shown in
As further shown in
As shown in
As indicated by
As shown by
As shown by
As shown by
Thus, in one or more embodiments, the object recommendation system 106 utilizes a graphical user interface to implement a workflow that operates with more efficiency when compared to conventional systems. Indeed, the object recommendation system 106 can recommend foreground object images and corresponding composite images based on a low number of user interactions. For instance, as discussed above, based on as little as a selection of a background image, the object recommendation system 106 can retrieve a compatible foreground object image, determine a recommended location and scale for the foreground object image, generate a heatmap indicating the recommended location, and/or generate a composite image using the foreground object image at the recommended location and with the recommended scale.
Additionally, the object recommendation system 106 further maintains flexibility by changing the recommendation in response to additional user interaction. Again, the additional user interaction can be minimal, such as a mere selection of a different foreground object image provided within search results or an input indicating a category of foreground object images to target. Thus, in some implementations, the object recommendation system 106 provides a predicted optimal recommendation based on little input and gradually changes the recommendation to satisfy more specific needs as more input is received.
As previously mentioned, the object recommendation system 106 operates more accurately when compared to conventional systems. In particular, by utilizing a geometry-lighting-aware neural network to determine compatibility in terms of geometry and lighting as well as semantics, the object recommendation system 106 can retrieve foreground object images that are a better fit with a given background image. Researchers have conducted studies to determine the accuracy of one or more embodiments of the object recommendation system 106.
In particular,
As shown in
As shown by
The table of
As shown by
The researchers measured the discriminative ability of the models as the sensitivity to these transformations (e.g., the squared Euclidean distance between normalized embedding features of the original and transformed foreground objects). With L2 normalization, the squared Euclidean distance is $d = 2 - 2s$, where $s$ is the cosine similarity. Accordingly, a higher sensitivity value corresponds to a larger distance between the features of the original and transformed foreground objects.
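For illustration, this sensitivity can be computed directly from the cosine similarity of the original and transformed embeddings; the sketch below assumes L2-normalized feature tensors.

```python
import torch.nn.functional as F

def sensitivity(original_embed, transformed_embed):
    """Squared Euclidean distance between L2-normalized embeddings,
    d = 2 - 2s, where s is the cosine similarity of the two features."""
    s = F.cosine_similarity(original_embed, transformed_embed, dim=-1)
    return 2.0 - 2.0 * s
```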
As shown by the table of
In one or more embodiments, the object recommendation system 106 implements a compositing pipeline for generating composite images. In particular, the object recommendation system 106 implements a pipeline of various processes. The object recommendation system 106 implements one or more of these processes to produce a compositing result. In some embodiments, the object recommendation system 106 implements components of the pipeline in response to one or more user interactions received via a graphical user interface. For instance, as previously discussed, the object recommendation system 106 provides recommendations for, and generates, composite images in response to selections of various selectable options displayed within a graphical user interface.
As further shown in
As shown in
In one or more embodiments, a composite object search engine includes a search engine that searches for one or more foreground object images to use in generating a composite image. In particular, in some embodiments, a composite object search engine includes a search engine that searches for one or more foreground object images for use in combining with a background image to generate a composite image based on one or more search criteria indicated by search input. To illustrate, in some cases, a composite object search engine searches for one or more foreground object images based on search criteria indicated by text input, sketch input, and/or a portion (e.g., a selected portion) of the background image itself. In some cases, a composite object search engine includes, but is not limited to, a compositing-aware search engine, a text search engine, or an image search engine.
In one or more embodiments, a compositing-aware search engine includes a search engine that searches for one or more foreground object images based on a compatibility with a background image in generating a composite image. For instance, in some cases, a compositing-aware search engine searches for one or more foreground object images based on compatibility in terms of semantics, lighting, geometry, tone, and/or scale. In one or more embodiments, a compositing-aware search engine includes a neural network or other machine learning model trained to determine compatibility based on one or more of the above-mentioned characteristics (or one or more additional or alternative characteristics). To illustrate, in some implementations, a compositing-aware search engine includes the geometry-lighting-aware neural network described above.
Additionally, as shown in
In one or more embodiments, an auto-composite model includes a computer-implemented model that generates or modifies a composite image. In particular, in some embodiments, an auto-composite model includes a computer-implemented model that executes one or more processes for generating or modifying a composite image in response to a consolidated set of user interactions. In some cases, an auto-composite model includes, but is not limited to, a plurality of underlying models, such as a scale prediction model, a harmonization model, and a shadow generation model. Accordingly, the object recommendation system 106 implements one or more of the underlying models based on user selections made via a graphical user interface. For instance, in some cases, in response to receiving a selection of a particular feature to include within the composite image 1214, the object recommendation system 106 executes the corresponding model to provide this feature.
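For illustration, a minimal dispatch sketch (with hypothetical option names and toy stand-ins for the underlying models) shows how selected options can map to the submodels that are executed:

```python
from dataclasses import dataclass

@dataclass
class AutoCompositeOptions:
    """Selections gathered from the graphical user interface (hypothetical)."""
    predict_scale: bool = False
    harmonize: bool = False
    generate_shadow: bool = False

def run_auto_composite(background, foreground, options, models):
    """Execute only the submodels that correspond to the selected options.

    `models` maps submodel names to callables; each callable takes the
    current composite and the background and returns an updated composite.
    """
    composite = {"background": background, "foreground": foreground}
    if options.predict_scale:
        composite = models["scale"](composite, background)
    if options.harmonize:
        composite = models["harmonization"](composite, background)
    if options.generate_shadow:
        composite = models["shadow"](composite, background)
    return composite

# Toy stand-ins for the underlying models, used only to exercise the dispatch.
models = {
    "scale": lambda c, bg: {**c, "scaled": True},
    "harmonization": lambda c, bg: {**c, "harmonized": True},
    "shadow": lambda c, bg: {**c, "shadow": True},
}
options = AutoCompositeOptions(predict_scale=True, generate_shadow=True)
print(run_auto_composite("beach.png", "surfer.png", options, models))
```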
Indeed, as shown in
For instance, as shown in
In one or more embodiments, a composite-aware search includes a search for one or more digital images for the purpose of creating a composite image. In particular, in some embodiments, a composite-aware search includes a search for one or more foreground object images for use in combining with a background image to generate a composite image. For instance, in some cases, a composite-aware search includes a search for one or more foreground object images based on a compatibility with a background image in generating a composite image. To illustrate, in some cases, the object recommendation system 106 executes a composite-aware search by searching for one or more foreground object images that are compatible with a background image in terms of semantics, lighting, geometry, tone, and/or scale. In some cases, the object recommendation system 106 executes a composite-aware search using a composite object search engine (e.g., a compositing-aware search engine of the composite object search engine).
In one or more embodiments, a sketch-based search includes a search for one or more digital images based on a sketch input. In particular, in some embodiments, a sketch-based search includes a search for one or more foreground object images that match a sketch input. For instance, in some cases, the object recommendation system 106 executes a sketch-based search by searching for one or more foreground object images based on a size and object class indicated by a sketch input.
Additionally, as shown in
Further, as shown in
In one or more embodiments, a scale prediction model includes a computer-implemented model that determines a scale for a foreground object image within a composite image. In particular, in some embodiments, a scale prediction model includes a computer-implemented model (e.g., a machine learning model or other set of algorithms) that determines a scale for a foreground object image within a composite image based on a scale of the background image used for the composite image. In some cases, a scale prediction model further modifies the foreground object image based on the determined scale (e.g., resizes the foreground object image).
In one or more embodiments, a harmonization model includes a computer-implemented model that determines a lighting or tone for a foreground object image within a composite image. In particular, in some embodiments, a harmonization model includes a computer-implemented model (e.g., a machine learning model or other set of algorithms) that determines a lighting or tone for a foreground object image within a composite image based on a lighting or tone, respectively, of the background image used for the composite image. In some cases, a harmonization model further modifies the foreground object image based on the determined lighting or tone.
In one or more embodiments, a shadow generation model includes a computer-implemented model that generates a shadow for a foreground object image within a composite image. In particular, in some embodiments, a shadow generation model includes a computer-implemented model (e.g., a machine learning model or other set of algorithms) that generates a shadow for a foreground object image based on a lighting of the composite image (e.g., a lighting provided by the background image used in generating the composite image).
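While the disclosed models are learned, a toy mean/standard-deviation color transfer (with hypothetical NumPy arrays standing in for image crops) gives a rough sense of what tone harmonization means here; this is a sketch of the idea, not the harmonization model itself:

```python
import numpy as np

def harmonize_tone(foreground_rgb, background_rgb):
    """Shift the foreground's per-channel statistics toward the background's.

    A simple mean/standard-deviation transfer stands in for the harmonization
    model: it conveys the notion of adjusting an object's tone based on the
    scene, without representing the learned model's behavior.
    """
    fg = foreground_rgb.astype(np.float64)
    bg = background_rgb.astype(np.float64)
    fg_mean, fg_std = fg.mean(axis=(0, 1)), fg.std(axis=(0, 1)) + 1e-6
    bg_mean, bg_std = bg.mean(axis=(0, 1)), bg.std(axis=(0, 1))
    harmonized = (fg - fg_mean) / fg_std * bg_std + bg_mean
    return np.clip(harmonized, 0, 255).astype(np.uint8)

# Hypothetical usage with random pixel arrays standing in for real crops.
rng = np.random.default_rng(0)
foreground = rng.integers(0, 256, (64, 64, 3), dtype=np.uint8)
background = rng.integers(0, 256, (256, 256, 3), dtype=np.uint8)
print(harmonize_tone(foreground, background).shape)  # (64, 64, 3)
```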
Though not expressly shown in
Thus, as indicated by
As shown in
Additionally, as shown in
As indicated in
As further shown in
As illustrated, the back-end components 1404 include a compositing-aware search engine 1412a, a text search engine 1412b, and an image search engine 1412c. The object recommendation system 106 utilizes each of the search engines to search through a digital image database 1414 for one or more foreground object images in accordance with received search input. In particular, in some cases, the object recommendation system 106 utilizes the search engines to conduct a search based on parameters provided by the search input (e.g., parameters indicated via interactions/selections received among the user interaction components 1406a-1406d and the search components 1408a-1408b).
As shown in
For example, as shown in
In some implementations, the object recommendation system 106 utilizes the compositing-aware search engine 1412a to conduct a search without receiving user interactions via one of the user interaction components 1406a-1406b. Accordingly, in some cases, the object recommendation system 106 utilizes the compositing-aware search engine 1412a to retrieve foreground object images based on the entirety of the background image or a prominent portion of the background image. In one or more embodiments, the object recommendation system 106 utilizes the geometry-lighting-aware neural network described above as the compositing-aware search engine 1412a.
In some cases, the object recommendation system 106 utilizes the text search engine 1412b in conjunction with another search engine. For instance, in some implementations, the object recommendation system 106 retrieves a set of foreground object images via the compositing-aware search engine 1412a and utilizes the text search engine 1412b as a filter in determining a subset of foreground object images from the retrieved set. To illustrate, in some embodiments, the object recommendation system 106 utilizes the text search engine 1412b as a filter to remove foreground object images that do not satisfy received text input. For instance, in some cases, the object recommendation system 106 determines that metadata or labels associated with a foreground object image do not satisfy the text input or that attributes determined for a foreground object image (e.g., determined via a classification neural network) do not satisfy the text input. In some implementations, the object recommendation system 106 utilizes the text search engine 1412b without using one of the other search engines. For example, in some cases, the object recommendation system 106 utilizes the text search engine 1412b to retrieve a plurality of foreground object images that satisfy received text input.
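A minimal sketch of such text-based filtering (with hypothetical candidate records and label fields, not the actual engine) could look like the following:

```python
def filter_by_text(candidates, text_query):
    """Keep only retrieved foreground object images whose labels satisfy a
    text query.

    `candidates` is a list of dicts with a `labels` field standing in for the
    metadata, labels, or classifier-predicted attributes mentioned above; a
    candidate passes if every query token appears among its labels.
    """
    tokens = text_query.lower().split()
    return [c for c in candidates
            if all(any(t in label.lower() for label in c["labels"]) for t in tokens)]

# Hypothetical usage: the compositing-aware engine returned three candidates;
# the text query "red car" filters out the incompatible ones.
retrieved = [
    {"id": 1, "labels": ["car", "red", "sedan"]},
    {"id": 2, "labels": ["car", "blue"]},
    {"id": 3, "labels": ["dog"]},
]
print(filter_by_text(retrieved, "red car"))  # keeps only candidate 1
```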
In some embodiments, the object recommendation system 106 utilizes, as the image search engine 1412c, the image search engine described in U.S. Pat. App. Ser. No. 17/809,494, filed on Jun. 28, 2022, entitled GENERATING UNIFIED EMBEDDINGS FROM MULTI-MODAL CANVAS INPUTS FOR IMAGE RETRIEVAL, which is incorporated herein by reference in its entirety. For instance, in some cases, the object recommendation system 106 utilizes one of the multi-modal embedding neural networks described therein as the image search engine 1412c.
To provide an example of searching for foreground object images, in some cases, the object recommendation system 106 receives a selection of one of the search components 1408a-1408b. Based on the selection, the object recommendation system 106 enables user input via one or more of the user interaction components 1406a-1406d (in some cases, the object recommendation system 106 always enables one or more of the user interaction components 1406a-1406d but uses the selection of one of the search components 1408a-1408b to indicate which user interaction components to use). The object recommendation system 106 further utilizes user input received via at least one of the user interaction components 1406a-1406d to execute a search for foreground object images using the appropriate search engine.
As further shown in
For example, as indicated in
In one or more embodiments, the object recommendation system 106 utilizes the location prediction model 1416a to determine a recommended location for a foreground object image as described above with reference to
Thus, in one or more embodiments, the object recommendation system 106 receives one or more user interactions via the front-end components 1402 and performs corresponding actions via the back-end components 1404. For instance, in some cases, the object recommendation system 106 executes a search for foreground object images utilizing one of the search engines based on user interactions with the user interaction components 1406a-1406d and the search components 1408a-1408b. Further, in some instances, the object recommendation system 106 generates or modifies a composite image utilizing the auto-composite model (e.g., one or more of the scale prediction model 1416d, the harmonization model 1416e, or the shadow generation model 1416f) based on user interactions with the auto-composite components 1410a-1410c and/or other user input.
In some cases, the object recommendation system 106 executes one or more of the other models shown in
As previously suggested, the object recommendation system 106 implements a compositing pipeline based on various user interactions with a graphical user interface. For example, in some cases, the object recommendation system 106 executes a search based on various search input (e.g., search input indicating a type of search and/or parameters for the search). Further, in some embodiments, the object recommendation system 106 generates or modifies a composite image based on various editing input (e.g., input indicating how to execute an auto-composite model).
For instance,
As further shown in
In response to receiving the search input, the object recommendation system 106 searches for and retrieves one or more foreground object images utilizing the corresponding search engine(s). Indeed, as shown in
As shown in
As illustrated in
As previously mentioned, in one or more embodiments, a foreground object image includes a digital image portraying a foreground object. Thus, it should be noted that description of positioning a foreground object image within a composite image, re-sizing a foreground object image within a composite image, or performing some other action with respect to a foreground object image within a composite image refers to performing that action with respect to the foreground object portrayed by the foreground object image in some implementations.
Indeed, in some embodiments, a foreground object image portrays a foreground object against a background. Accordingly, the object recommendation system 106 separates the foreground object from the background via segmentation. In particular, as suggested above with reference to
Thus, as indicated by
As further shown in
In response to receiving the search input, the object recommendation system 106 searches for and retrieves one or more foreground object images utilizing the corresponding search engine(s). Indeed, as shown in
As shown in
In response to receiving the search input, the object recommendation system 106 searches for and retrieves one or more foreground object images utilizing the corresponding search engine(s). Indeed, as shown in
As shown in
In some cases, the object recommendation system 106 generates or modifies a composite image using the auto-composite model based on the order of user interactions. For instance, in some embodiments, where options for the auto-composite model are selected before the composite image is generated, the object recommendation system 106 executes the auto-composite model in generating the composite image. In contrast, where options for the auto-composite model are selected after the composite image is generated, the object recommendation system 106 executes the auto-composite model in modifying the composite image.
As shown in
For instance, as shown in
Additionally, as shown in
Further, as shown in
Though
Thus, with a simple user interaction received via the graphical user interface 1800, the object recommendation system 106 executes the auto-composite model in generating a composite result (e.g., either by generating a composite image or modifying a previously generated composite image). As such, the object recommendation system 106 operates more efficiently when compared to many conventional systems. Indeed, while conventional systems typically require a series of user interactions to perform a single action—such as adjusting the lighting or scale of a foreground object image—the object recommendation system 106 can consolidate these user interactions to a single click that triggers back-end processing. By responding to user interactions with computer-implemented models via the back-end processing, the object recommendation system 106 provides more realistic images when compared to conventional systems. Indeed, by using computer-implemented models in generating or modifying a composite image, the object recommendation system 106 avoids the user error that typically results from the manual processes required under many conventional systems.
In one or more embodiments, sketch input includes drawing input. In particular, in some embodiments, sketch input includes input created via one or more user interactions with an interactive canvas using at least one drawing tool. Though
In response to receiving the search input, the object recommendation system 106 searches for and retrieves one or more foreground object images utilizing the corresponding search engine(s). In particular, as mentioned, the object recommendation system 106 executes a search via an image search engine using sketch input. In one or more embodiments, the object recommendation system 106 executes the search by managing the background image 1908 with the sketch input 1906 as an input digital image to the image search engine. In other words, the object recommendation system 106 handles the background image 1908 with the sketch input 1906 as its own digital image. As shown in
In some cases, the object recommendation system 106 determines an object class of the sketch input 1906 and utilizes the object class in narrowing the search. For instance, in some cases, the object recommendation system 106 determines the object class via a classification neural network. In some implementations, however, the image search engine searches for results corresponding to the sketch input 1906 without explicitly determining the object class (e.g., using embeddings that implicitly encode the object class or object features such as shape, color, etc.).
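For illustration, one way to treat the background plus sketch as a single query image (assuming Pillow placeholder images and an RGBA sketch layer; the engine call at the end is shown only as a comment because the engine itself is not reproduced here) is the following sketch:

```python
from PIL import Image

def sketch_query_image(background, sketch_layer):
    """Combine the background image and the user's sketch strokes into a
    single query image for the image search engine.

    `sketch_layer` is assumed to be an RGBA image whose opaque pixels are the
    drawn strokes; compositing it over the background mirrors treating the
    background-plus-sketch as its own digital image.
    """
    query = background.convert("RGBA").copy()
    query.alpha_composite(sketch_layer)
    return query.convert("RGB")

# Hypothetical usage with placeholder images standing in for the canvas.
background = Image.new("RGB", (1024, 768), "gray")
sketch = Image.new("RGBA", (1024, 768), (0, 0, 0, 0))  # transparent canvas
query = sketch_query_image(background, sketch)
# embedding = image_search_engine.embed(query)         # assumed engine call
```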
As further shown in
In some cases, the object recommendation system 106 utilizes a previously generated composite image to generate an additional composite image.
Indeed, as shown in
Thus, in one or more embodiments, the object recommendation system 106 manages a previously generated composite image as a background image when using the previously generated composite image to generate a subsequent composite image. Though not shown in
Turning now to
As just mentioned, and as illustrated in
Additionally, as shown in
Further, as shown in
As shown in
Additionally, as shown, the object recommendation system 106 includes data storage 2110. In particular, data storage 2110 (implemented by one or more memory devices) includes the auto-composite model 2112, the composite object search engine 2114, and the foreground object images 2116. In one or more embodiments, the auto-composite model 2112 stores the auto-composite model used in generating or modifying composite images. In particular, in some cases, the auto-composite model 2112 stores various components of the auto-composite model, such as the scale prediction model, the harmonization model, and/or the shadow generation model. In some instances, the composite object search engine 2114 stores the composite object search engine used in retrieving foreground object images. In particular, in some implementations, the composite object search engine 2114 stores various component search engines, such as a compositing-aware search engine, a text search engine, and/or an image search engine. In some embodiments, the foreground object images 2116 stores foreground object images accessed via a search for one or more foreground object images for image compositing.
Each of the components 2102-2116 of the object recommendation system 106 can include software, hardware, or both. For example, the components 2102-2116 can include one or more instructions stored on a computer-readable storage medium and executable by processors of one or more computing devices, such as a client device or server device. When executed by the one or more processors, the computer-executable instructions of the object recommendation system 106 can cause the computing device(s) to perform the methods described herein. Alternatively, the components 2102-2116 can include hardware, such as a special-purpose processing device to perform a certain function or group of functions. Alternatively, the components 2102-2116 of the object recommendation system 106 can include a combination of computer-executable instructions and hardware.
Furthermore, the components 2102-2116 of the object recommendation system 106 may, for example, be implemented as one or more operating systems, as one or more stand-alone applications, as one or more modules of an application, as one or more plug-ins, as one or more library functions or functions that may be called by other applications, and/or as a cloud-computing model. Thus, the components 2102-2116 of the object recommendation system 106 may be implemented as a stand-alone application, such as a desktop or mobile application. Furthermore, the components 2102-2116 of the object recommendation system 106 may be implemented as one or more web-based applications hosted on a remote server. Alternatively, or additionally, the components 2102-2116 of the object recommendation system 106 may be implemented in a suite of mobile device applications or “apps.” For example, in one or more embodiments, the object recommendation system 106 can comprise or operate in connection with digital software applications such as ADOBE® PHOTOSHOP® or ADOBE® CAPTURE. The foregoing are either registered trademarks or trademarks of Adobe Inc. in the United States and/or other countries.
The series of acts 2200 includes an act 2202 for determining a background image and a foreground object image for a composite image. For example, in one or more embodiments, the act 2202 involves determining a background image and a foreground object image for use in generating a composite image.
In one or more embodiments, determining the foreground object image for use in generating a composite image comprises: receiving, via the graphical user interface, search input for performing a composite-aware search to retrieve one or more foreground object images based on a compatibility with the background image; and performing the composite-aware search to determine the foreground object image utilizing a composite object search engine. In some embodiments, the object recommendation system 106 provides the background image for display via the graphical user interface. Accordingly, in some instances, receiving, via the graphical user interface, the search input comprises receiving, via the graphical user interface, an additional user selection of a location within the background image for positioning the foreground object image. In some implementations, receiving, via the graphical user interface, the search input comprises receiving, via the graphical user interface, user input within the background image indicating a scale for the foreground object image. In some cases, the user selection of the location and the user input indicating the scale correspond to a query bounding box received within the background image via one or more user interactions.
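As an illustrative sketch only (with hypothetical names, not the claimed method), a query bounding box can be reduced to a location and a scale hint as follows:

```python
from dataclasses import dataclass

@dataclass
class QueryBox:
    """A bounding box drawn within the background image (pixel coordinates)."""
    left: int
    top: int
    width: int
    height: int

def location_and_scale(box, background_width, background_height):
    """Derive a placement location and relative scale from a query bounding box.

    Returns the box center (normalized to [0, 1]) as the position for the
    foreground object and the box height relative to the background height as
    a rough scale hint; these names are illustrative assumptions.
    """
    cx = (box.left + box.width / 2) / background_width
    cy = (box.top + box.height / 2) / background_height
    scale = box.height / background_height
    return (cx, cy), scale

box = QueryBox(left=600, top=400, width=300, height=450)
print(location_and_scale(box, background_width=1920, background_height=1080))
# ((0.390625, 0.5787...), 0.4166...)
```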
The series of acts 2200 also includes an act 2204 for providing, via a graphical user interface, a selectable option for executing an auto-composite model for the composite image. In particular, in one or more embodiments, the act 2204 involves providing, for display within a graphical user interface of a client device, at least one selectable option for executing an auto-composite model for the composite image.
In one or more embodiments, the auto-composite model comprises at least one of a scale prediction model, a harmonization model, or a shadow generation model. Indeed, as shown in
Additionally, the series of acts 2200 includes an act 2212 for detecting a user selection of the selectable option. For instance, in some cases, the act 2212 involves detecting, via the graphical user interface, a user selection of the at least one selectable option.
In one or more embodiments, detecting the user selection of the at least one selectable option comprises detecting a user selection of a selectable option for executing the scale prediction model. In some embodiments, detecting the user selection of the at least one selectable option comprises detecting a user selection of a selectable option for executing the harmonization model. In some cases, detecting the user selection of the at least one selectable option comprises detecting a user selection of a selectable option for executing the shadow generation model.
Further, the series of acts 2200 includes an act 2214 for generating the composite image by executing the auto-composite model. To illustrate, in some implementations, the act 2214 involves generating, in response to detecting the user selection, the composite image by executing the auto-composite model using the background image and the foreground object image.
In one or more embodiments, executing the auto-composite model using the background image and the foreground object image comprises modifying, utilizing the scale prediction model, a scale of the foreground object image within the composite image based on a scale of the background image. In some embodiments, executing the auto-composite model using the background image and the foreground object image comprises modifying, utilizing the harmonization model, a lighting of the foreground object image within the composite image based on a lighting of the background image. In some implementations, executing the auto-composite model using the background image and the foreground object image comprises generating, utilizing the shadow generation model, a shadow associated with the foreground object image within the composite image.
In one or more embodiments, the object recommendation system 106 generates an initial composite image utilizing the background image and the foreground object image; and provides the initial composite image for display within the graphical user interface. Accordingly, in some instances, generating the composite image by executing the auto-composite model comprises generating the composite image by modifying the initial composite image within the graphical user interface via the auto-composite model.
In some instances, the object recommendation system 106 determines a recommended location or a recommended scale for the foreground object image within the composite image. Accordingly, in some embodiments, generating the composite image comprises generating the composite image utilizing the recommended location or the recommended scale for the foreground object image.
To provide an illustration, in one or more embodiments, the object recommendation system 106 provides, for display within a graphical user interface of a client device, at least one interactive element for providing search input and at least one additional interactive element for executing an auto-composite model comprising at least one of a scale prediction model, a harmonization model, or a shadow generation model; receives, via the graphical user interface, a user interaction with the at least one interactive element and an additional user interaction with the at least one additional interactive element; retrieves a foreground object image for use in generating a composite image with a background image in accordance with the user interaction with the at least one interactive element; generates the composite image utilizing the foreground object image and the background image by executing the auto-composite model in accordance with the additional user interaction with the at least one additional interactive element; and provides the composite image for display within the graphical user interface.
In some embodiments, providing, for display within the graphical user interface, the at least one interactive element for providing the search input comprises providing the background image for display within the graphical user interface; and receiving, via the graphical user interface, the user interaction with the at least one interactive element comprises receiving, within the background image displayed on the graphical user interface, a sketch input indicating a category of foreground object images to be retrieved. In some instances, providing, for display within the graphical user interface, the at least one interactive element for providing the search input comprises providing the background image for display within the graphical user interface; and receiving, via the graphical user interface, the user interaction with the at least one interactive element comprises receiving, within the background image displayed on the graphical user interface, a bounding box indicating a scale of foreground object images to be retrieved and a portion of the background image for which the foreground object images are to be compatible. In some cases, retrieving the foreground object image for use in generating the composite image comprises retrieving the foreground object image utilizing a composite object search engine that includes one or more of a compositing-aware search engine, a text search engine, or an image search engine.
In one or more embodiments, the object recommendation system 106 further performs a composite-aware search to retrieve an additional foreground object image based on a compatibility of the additional foreground object image with the composite image; and modifies the composite image to include the additional foreground object image.
In one or more embodiments, receiving the additional user interaction with the at least one additional interactive element for executing the auto-composite model comprises receiving a plurality of user interactions for executing the scale prediction model, the harmonization model, and the shadow generation model; and generating the composite image utilizing the foreground object image and the background image by executing the auto-composite model in accordance with the additional user interaction with the at least one additional interactive element comprises generating the composite image by executing the scale prediction model, the harmonization model, and the shadow generation model utilizing the foreground object image and the background image.
In some cases, the object recommendation system 106 further determines a recommended location and a recommended scale for the foreground object image within the composite image. Thus, in some embodiments, generating the composite image utilizing the foreground object image and the background image comprises inserting the foreground object image into the background image at the recommended location using the recommended scale.
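As a rough, non-limiting sketch of inserting an object at a recommended location and scale (assuming Pillow placeholder images and hypothetical recommendation values rather than outputs of the disclosed models):

```python
from PIL import Image

def insert_at_recommended(background, foreground, center_xy, height_ratio):
    """Paste a foreground object into the background at a recommended,
    normalized center location and a recommended relative scale.

    `center_xy` is (x, y) in [0, 1] and `height_ratio` is the recommended
    object height relative to the background height; both are stand-ins for
    the recommendation outputs discussed above.
    """
    target_h = int(background.height * height_ratio)
    target_w = int(foreground.width * target_h / foreground.height)
    fg = foreground.resize((target_w, target_h))
    left = int(center_xy[0] * background.width - target_w / 2)
    top = int(center_xy[1] * background.height - target_h / 2)
    composite = background.copy()
    composite.paste(fg, (left, top), fg)  # use the alpha channel as the mask
    return composite

# Hypothetical usage with placeholder images.
background = Image.new("RGB", (1920, 1080), "lightblue")
foreground = Image.new("RGBA", (400, 600), (120, 60, 20, 255))
result = insert_at_recommended(background, foreground, (0.4, 0.7), 0.3)
print(result.size)  # (1920, 1080)
```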
To provide another illustration, in one or more embodiments, the object recommendation system 106 provides, for display within a graphical user interface of a client device, the background image for use in generating a composite image; receives, via the graphical user interface, user input selecting a location within the background image to position a foreground object image for the composite image; determines, utilizing the composite object search engine, the foreground object image for use in generating the composite image based on the location within the background image selected via the user input; receives, via the graphical user interface, additional user input for executing the auto-composite model for the composite image based on the background image and the foreground object image; and generates the composite image using the background image and the foreground object image in accordance with the user input and the additional user input.
In one or more embodiments, the object recommendation system 106 further determines a recommended scale for the foreground object image within the composite image based on the location selected by the user input; and generates the composite image using the background image and the foreground object image by positioning the foreground object image within the composite image at the location selected by the user input and using the recommended scale. In some embodiments, the object recommendation system 106 further receives, via the graphical user interface, further user input selecting an additional location within the composite image to position an additional foreground object image; determines, utilizing the composite object search engine, the additional foreground object image for use in modifying the composite image based on the additional location; and modifies the composite image utilizing the additional foreground object image based on the additional location.
In some implementations, the object recommendation system 106 receives the additional user input for executing the auto-composite model by receiving user selections for executing the scale prediction model, the harmonization model, and the shadow generation model. Further, in some instances, the object recommendation system 106 generates the composite image using the background image and the foreground object image in accordance with the additional user input by: modifying, utilizing the scale prediction model, a scale of the foreground object image within the composite image based on a scale of the background image; modifying, utilizing the harmonization model, a lighting of the foreground object image within the composite image based on a lighting of the background image; and generating, utilizing the shadow generation model, a shadow associated with the foreground object image within the composite image.
Embodiments of the present disclosure may comprise or utilize a special purpose or general-purpose computer including computer hardware, such as, for example, one or more processors and system memory, as discussed in greater detail below. Embodiments within the scope of the present disclosure also include physical and other computer-readable media for carrying or storing computer-executable instructions and/or data structures. In particular, one or more of the processes described herein may be implemented at least in part as instructions embodied in a non-transitory computer-readable medium and executable by one or more computing devices (e.g., any of the media content access devices described herein). In general, a processor (e.g., a microprocessor) receives instructions, from a non-transitory computer-readable medium, (e.g., a memory), and executes those instructions, thereby performing one or more processes, including one or more of the processes described herein.
Computer-readable media can be any available media that can be accessed by a general purpose or special purpose computer system. Computer-readable media that store computer-executable instructions are non-transitory computer-readable storage media (devices). Computer-readable media that carry computer-executable instructions are transmission media. Thus, by way of example, and not limitation, embodiments of the disclosure can comprise at least two distinctly different kinds of computer-readable media: non-transitory computer-readable storage media (devices) and transmission media.
Non-transitory computer-readable storage media (devices) includes RAM, ROM, EEPROM, CD-ROM, solid state drives (“SSDs”) (e.g., based on RAM), Flash memory, phase-change memory (“PCM”), other types of memory, other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer.
A “network” is defined as one or more data links that enable the transport of electronic data between computer systems and/or modules and/or other electronic devices. When information is transferred or provided over a network or another communications connection (either hardwired, wireless, or a combination of hardwired or wireless) to a computer, the computer properly views the connection as a transmission medium. Transmission media can include a network and/or data links which can be used to carry desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer. Combinations of the above should also be included within the scope of computer-readable media.
Further, upon reaching various computer system components, program code means in the form of computer-executable instructions or data structures can be transferred automatically from transmission media to non-transitory computer-readable storage media (devices) (or vice versa). For example, computer-executable instructions or data structures received over a network or data link can be buffered in RAM within a network interface module (e.g., a “NIC”), and then eventually transferred to computer system RAM and/or to less volatile computer storage media (devices) at a computer system. Thus, it should be understood that non-transitory computer-readable storage media (devices) can be included in computer system components that also (or even primarily) utilize transmission media.
Computer-executable instructions comprise, for example, instructions and data which, when executed by a processor, cause a general-purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. In some embodiments, computer-executable instructions are executed on a general-purpose computer to turn the general-purpose computer into a special purpose computer implementing elements of the disclosure. The computer executable instructions may be, for example, binaries, intermediate format instructions such as assembly language, or even source code. Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the described features or acts described above. Rather, the described features and acts are disclosed as example forms of implementing the claims.
Those skilled in the art will appreciate that the disclosure may be practiced in network computing environments with many types of computer system configurations, including, personal computers, desktop computers, laptop computers, message processors, hand-held devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, mobile telephones, PDAs, tablets, pagers, routers, switches, and the like. The disclosure may also be practiced in distributed system environments where local and remote computer systems, which are linked (either by hardwired data links, wireless data links, or by a combination of hardwired and wireless data links) through a network, both perform tasks. In a distributed system environment, program modules may be located in both local and remote memory storage devices.
Embodiments of the present disclosure can also be implemented in cloud computing environments. In this description, “cloud computing” is defined as a model for enabling on-demand network access to a shared pool of configurable computing resources. For example, cloud computing can be employed in the marketplace to offer ubiquitous and convenient on-demand access to the shared pool of configurable computing resources. The shared pool of configurable computing resources can be rapidly provisioned via virtualization and released with low management effort or service provider interaction, and then scaled accordingly.
A cloud-computing model can be composed of various characteristics such as, for example, on-demand self-service, broad network access, resource pooling, rapid elasticity, measured service, and so forth. A cloud-computing model can also expose various service models, such as, for example, Software as a Service (“SaaS”), Platform as a Service (“PaaS”), and Infrastructure as a Service (“IaaS”). A cloud-computing model can also be deployed using different deployment models such as private cloud, community cloud, public cloud, hybrid cloud, and so forth. In this description and in the claims, a “cloud-computing environment” is an environment in which cloud computing is employed.
As shown in
In particular embodiments, the processor(s) 2302 includes hardware for executing instructions, such as those making up a computer program. As an example, and not by way of limitation, to execute instructions, the processor(s) 2302 may retrieve (or fetch) the instructions from an internal register, an internal cache, memory 2304, or a storage device 2306 and decode and execute them.
The computing device 2300 includes memory 2304, which is coupled to the processor(s) 2302. The memory 2304 may be used for storing data, metadata, and programs for execution by the processor(s). The memory 2304 may include one or more of volatile and non-volatile memories, such as Random-Access Memory (“RAM”), Read-Only Memory (“ROM”), a solid-state disk (“SSD”), Flash, Phase Change Memory (“PCM”), or other types of data storage. The memory 2304 may be internal or distributed memory.
The computing device 2300 includes a storage device 2306 including storage for storing data or instructions. As an example, and not by way of limitation, the storage device 2306 can include a non-transitory storage medium described above. The storage device 2306 may include a hard disk drive (HDD), flash memory, a Universal Serial Bus (USB) drive or a combination of these or other storage devices.
As shown, the computing device 2300 includes one or more I/O interfaces 2308, which are provided to allow a user to provide input to (such as user strokes), receive output from, and otherwise transfer data to and from the computing device 2300. These I/O interfaces 2308 may include a mouse, keypad or a keyboard, a touch screen, camera, optical scanner, network interface, modem, other known I/O devices or a combination of such I/O interfaces 2308. The touch screen may be activated with a stylus or a finger.
The I/O interfaces 2308 may include one or more devices for presenting output to a user, including, but not limited to, a graphics engine, a display (e.g., a display screen), one or more output drivers (e.g., display drivers), one or more audio speakers, and one or more audio drivers. In certain embodiments, I/O interfaces 2308 are configured to provide graphical data to a display for presentation to a user. The graphical data may be representative of one or more graphical user interfaces and/or any other graphical content as may serve a particular implementation.
The computing device 2300 can further include a communication interface 2310. The communication interface 2310 can include hardware, software, or both. The communication interface 2310 provides one or more interfaces for communication (such as, for example, packet-based communication) between the computing device and one or more other computing devices or one or more networks. As an example, and not by way of limitation, communication interface 2310 may include a network interface controller (NIC) or network adapter for communicating with an Ethernet or other wire-based network or a wireless NIC (WNIC) or wireless adapter for communicating with a wireless network, such as a WI-FI network. The computing device 2300 can further include a bus 2312. The bus 2312 can include hardware, software, or both that connects components of computing device 2300 to each other.
In the foregoing specification, the invention has been described with reference to specific example embodiments thereof. Various embodiments and aspects of the invention(s) are described with reference to details discussed herein, and the accompanying drawings illustrate the various embodiments. The description above and drawings are illustrative of the invention and are not to be construed as limiting the invention. Numerous specific details are described to provide a thorough understanding of various embodiments of the present invention.
The present invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive. For example, the methods described herein may be performed with less or more steps/acts or the steps/acts may be performed in differing orders. Additionally, the steps/acts described herein may be repeated or performed in parallel to one another or in parallel to different instances of the same or similar steps/acts. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes that come within the meaning and range of equivalency of the claims are to be embraced within their scope.
The present application is a continuation-in-part of U.S. application Ser. No. 17/658,770, filed on Apr. 11, 2022. The aforementioned application is hereby incorporated by reference in its entirety.
Relationship | Number | Date | Country
---|---|---|---
Parent | 17658770 | Apr 2022 | US
Child | 18167690 | | US