The present technology generally relates to catalog-based image recommendations and associated systems and methods.
Consumers often shop visually. Frequently, a person who sees an eye-catching product on a retail website will try to find visually similar products on the website. Although there are existing techniques for determining similarity between two images, these techniques are not well suited for online retailer recommendation systems. Some algorithms for generating similarity scores between images may have high accuracy for some types of images, enabling a recommendation system to select images that are of interest to a consumer. However, these algorithms may have significantly lower accuracy for other types of images, causing the recommendation system to produce poor recommendations. For a website selling many different types of products, existing algorithms fail to generate useful image-based recommendations across all of the website's products.
Aspects of the present disclosure are directed generally to catalog-based image recommendations and associated systems and methods. Specific details of several embodiments of the present technology are described herein with reference to
A catalog-based image recommendation system uses multiple image classification algorithms to extract feature vectors from images in an image catalog. The image catalog can include images from a variety of image categories. For example, the image catalog can contain images of products sold through an online retail website, and the images can be categorized according to the type of product shown in each image. Some of the image classification algorithms applied by the image recommendation system may be better for some of the image categories than for others. Image similarity measurements generated using a more accurate algorithm for the image's category produce more useful recommendations for a human user, while similarity measurements generated using algorithms that are less accurate for the image's category produce less useful recommendations. One image classification algorithm may be highly accurate for a first category in the image catalog, but relatively inaccurate for a second category in the image catalog.
To balance the varying accuracy of the multiple image classification algorithms for each category of image, the image recommendation system calculates and applies a set of weights to the outputs of the image classification algorithms. The set of weights is particular to an image category, and causes the output of more accurate algorithms for the category to have a greater influence on similarity determination than the output of less-accurate algorithms. The image recommendation system therefore flexibly applies the same set of image classification algorithms to all categories of images in the image catalog while generating more accurate similarity measurements. The image recommendation system is also scalable, allowing new categories of images to be processed in the same way as preexisting image categories without retooling the existing image classification algorithms.
In some implementations, server 210 is an edge server which receives client requests and coordinates fulfillment of those requests through other servers, such as servers 220A-C. In some implementations, server computing devices 210 and 220 comprise computing systems, such as computer system 100. Though each server computing device 210 and 220 is displayed logically as a single server, server computing devices can each be a distributed computing environment encompassing multiple computing devices located at the same or at geographically disparate physical locations. In some implementations, each server 220 corresponds to a group of servers.
Client computing devices 205 and server computing devices 210 and 220 can each act as a server or client to other server/client devices. In some implementations, servers (210, 220A-C) connect to a corresponding database (215, 225A-C). As discussed above, each server 220 can correspond to a group of servers, and each of these servers can share a database or can have its own database. Databases 215 and 225 warehouse (e.g., store) information such as catalog data. Though databases 215 and 225 are displayed logically as single units, databases 215 and 225 can each be a distributed computing environment encompassing multiple computing devices, can be located within their corresponding server, or can be located at the same or at geographically disparate physical locations.
Network 230 can be a local area network (LAN) or a wide area network (WAN), but it can also be other wired or wireless networks. In some implementations, network 230 is the Internet or some other public or private network. Client computing devices 205 are connected to network 230 through a network interface, such as by wired or wireless communication. While the connections between server 210 and servers 220 are shown as separate connections, these connections can be any kind of local, wide area, wired, or wireless network, including network 230 or a separate public or private network.
The image catalog 310 is a computer-readable storage of images. Images are stored in the image catalog in a computer-readable image format, such as .png, .jpeg, .jpg, .bmp, or binary stream. Each image can be associated with a category. In one example use of the image recommendation system 300, the image catalog 310 contains images of products sold through an online retailer, and the category associated with each image represents a type of product in the image. For example, images of clothing can be associated with categories such as shoes, shirts, pants, dresses, or bags. Alternatively, images can be categorized based on attributes of the content of the images. Example attribute-based categories include images of items that have a brown color, items that have a stripes pattern, or items that feature a logo. Any of a variety of other categories can be associated with the images in the image catalog 310, and the image catalog 310 may contain any number of categories of images.
The user I/O system 320 receives user requests and outputs data to the users in response to the requests. The user I/O system 320 generates interfaces for display to the user, for example via a user device, that enable the user to input an image or images for similarity matching and view similar images returned as a result of similarity matching. Examples of interfaces of the user I/O system 320 include web interfaces, desktop applications, mobile applications, application programming interfaces (APIs) without any graphical user interface, or computer-readable documents.
The processing unit 330 calculates similarities between images in the image catalog 310 and stores the calculated similarities in the similarity score store 350. The processing unit 330 can additionally generate image-based recommendations using the calculated similarities, for example in response to user queries for images that are similar to a target image. Processes performed by the processing unit 330 to calculate the similarity scores and use the scores to generate image recommendations are described, respectively, with respect to
The configuration system 340 applies system configurations to improve accuracy of the recommendations output by the image recommendation system 300. These configurations can include weights that are associated with each category of images and that define relative weighting of the outputs of each image classification algorithm applied by the processing unit 330. The configuration system 340 can, in some embodiments, dynamically determine the weights for each image category by evaluating accuracy metrics associated with the similarity scores output by the image classification algorithms when applied to images in the image category. Generally, those algorithms that produce more accurate similarity scores for a given image category can be assigned higher weights, while those algorithms that generate less accurate similarity scores are assigned lower weights. By dynamically generating the weights, the configuration system 340 enables the processing unit 330 to flexibly apply the same image classification algorithms to any image in the image catalog, while accounting for varying accuracy of the algorithms for different image categories.
The similarity score store 350 comprises one or more computer-readable storage mechanisms to store the calculated similarity scores for future use. The storage can be persistent or non-persistent, and provides fast access. Examples of persistent storage include, but are not limited to, a database system or a file system. Examples of non-persistent storage include, but are not limited to, in-memory storage, CPU cache storage, or GPU cache storage.
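As one illustration of the non-persistent, in-memory option, a minimal Python sketch follows. The class name `InMemorySimilarityStore`, its methods, and the choice to treat similarity as symmetric (so that a pair is stored once regardless of order) are assumptions made for this example and are not taken from the description above.

```python
from typing import Dict, Optional, Tuple


class InMemorySimilarityStore:
    """Hypothetical in-memory store of per-algorithm similarity scores."""

    def __init__(self) -> None:
        # Maps an unordered image pair to {algorithm name: similarity score}.
        self._scores: Dict[Tuple[str, str], Dict[str, float]] = {}

    @staticmethod
    def _key(image_a: str, image_b: str) -> Tuple[str, str]:
        # Sort the IDs so (a, b) and (b, a) refer to the same entry.
        # This assumes the similarity measure is symmetric.
        return (image_a, image_b) if image_a <= image_b else (image_b, image_a)

    def put(self, image_a: str, image_b: str, scores: Dict[str, float]) -> None:
        # Copy the dict so later changes by the caller do not alter the store.
        self._scores[self._key(image_a, image_b)] = dict(scores)

    def get(self, image_a: str, image_b: str) -> Optional[Dict[str, float]]:
        # Returns None when no scores have been stored for the pair.
        return self._scores.get(self._key(image_a, image_b))
```

A persistent variant could keep the same interface and back it with a database or file system instead of a dictionary.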
As shown in
At block 420, the processing unit 330 applies multiple image classification algorithms to the first image to extract respective sets of features from the first image. The multiple image classification algorithms can include different types of algorithms, such as a combination of neural network-based algorithms and visual descriptor-based algorithms. Additionally or alternatively, the multiple image classification algorithms can include algorithms that are trained differently, such as a first neural network trained with a first type of training data and a second neural network trained with a second type of training data. Existing image classification algorithms, new customized algorithms developed for the image catalog, or a combination of existing and customized algorithms can be used among the multiple algorithms applied to the first image. Collectively, the multiple image classification algorithms extract different feature sets from the first image. At block 430, the processing unit 330 calculates a feature vector for the first image using the features output by each algorithm.
As an example first algorithm applied at block 420 and used to extract feature vectors at block 430, the neural network-based image classification algorithm AlexNet can be applied to the first image. An example process for using AlexNet to extract features from the first image is as follows:
Another example algorithm of the multiple image classification algorithms applied to the first image is the neural network-based VGG16. An example process for using VGG16 is as follows:
Pattern method: Next, calculate a feature vector for the preprocessed first image by performing average pooling of the convolution5_3 layer of the VGG16 network.
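The pooling step can be sketched in Python as follows. This is a minimal illustration that assumes the pooling is global, that is, each channel of the layer's output is averaged over all spatial positions to give one value per channel. The function name `global_average_pool` and the nested-list layout `[channel][row][column]` are hypothetical. The toy input has two channels; the convolution5_3 layer of VGG16 has 512 channels, so under this assumption the resulting feature vector would have 512 entries.

```python
from typing import List


def global_average_pool(feature_map: List[List[List[float]]]) -> List[float]:
    """Average each channel over its spatial positions.

    feature_map is indexed as [channel][row][column]; the result holds
    one mean activation per channel.
    """
    vector = []
    for channel in feature_map:
        values = [value for row in channel for value in row]
        vector.append(sum(values) / len(values))
    return vector


# Toy 2-channel, 2x2 activation map standing in for a real layer output.
toy_map = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 8.0]],
]
print(global_average_pool(toy_map))  # [2.5, 2.0]
```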
As an example visual descriptor-based algorithm that can be applied at block 420, the processing unit 330 can use a Bag of Visual Words algorithm. Applying this algorithm can include the following process:
Each cluster center represents a feature or visual word. The histogram of visual words can be used as a feature vector for the first image.
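The histogram construction can be illustrated with a short Python sketch. It assumes that local descriptors have already been extracted from the first image and that cluster centers have already been computed, for example by clustering descriptors from many catalog images; neither step is shown. The function names, the use of squared Euclidean distance, and the normalization of the histogram are assumptions made for this example.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def visual_word_histogram(
    descriptors: Sequence[Sequence[float]],
    cluster_centers: Sequence[Sequence[float]],
    normalize: bool = True,
) -> List[float]:
    """Count how many descriptors fall nearest to each cluster center."""
    counts = [0.0] * len(cluster_centers)
    for descriptor in descriptors:
        # Assign the descriptor to its closest visual word.
        nearest = min(
            range(len(cluster_centers)),
            key=lambda i: squared_distance(descriptor, cluster_centers[i]),
        )
        counts[nearest] += 1.0
    total = sum(counts)
    if normalize and total > 0:
        # Normalizing makes images with different descriptor counts comparable.
        counts = [count / total for count in counts]
    return counts
```

With cluster centers `[[0, 0], [10, 10]]` and descriptors `[[1, 0], [0, 1], [9, 10], [0, 0]]`, three descriptors map to the first visual word and one to the second, giving the feature vector `[0.75, 0.25]`.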
Another example visual descriptor-based algorithm is Fisher Vectors. To apply the Fisher Vectors algorithm, the processing unit 330 can perform a process including:
At block 440, the processing unit 330 calculates similarity scores between the first image and one or more other images in the image catalog. For each pair of images, the processing unit 330 calculates multiple similarity scores. As illustrated in
At block 450, the processing unit 330 accesses a set of weights associated with the category of the first image. The set of weights includes multiple weights, each of which corresponds to one of the multiple image classification algorithms applied to the first image. Collectively, the set of weights defines the relative influence of each image classification algorithm for determining similarity between images in a given category. For example, each weight in the set has a value between zero and one, and the values of the weights may together sum to a value of one. A process for generating the set of weights is described with respect to
At block 460, the processing unit 330 generates weighted similarity scores by applying the weights in the set of weights to the similarity scores calculated between the first image and one or more other images. For example, the processing unit 330 multiplies each similarity score by the weight that corresponds to the image classification algorithm from which the feature vectors used to generate the similarity score were derived. The output of block 460 is a set of weighted similarity scores between each pair of images. The processing unit 330 stores the sets of weighted similarity scores in a computer-readable storage system at block 470.
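The weighting at block 460 can be sketched in Python as follows. The description does not name a particular similarity measure, so the use of cosine similarity here is an assumption, as are the function names, the algorithm labels used as dictionary keys, and the example weight values.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; an assumed choice of raw similarity measure."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def weighted_similarity_scores(
    features_a: Dict[str, Sequence[float]],
    features_b: Dict[str, Sequence[float]],
    weights: Dict[str, float],
) -> Dict[str, float]:
    """Multiply each algorithm's raw similarity by that algorithm's weight."""
    weighted = {}
    for algorithm, weight in weights.items():
        raw = cosine_similarity(features_a[algorithm], features_b[algorithm])
        weighted[algorithm] = weight * raw
    return weighted


# Hypothetical feature vectors for two images and weights for one category.
features_a = {"alexnet": [1.0, 0.0], "vgg16": [1.0, 1.0]}
features_b = {"alexnet": [1.0, 0.0], "vgg16": [1.0, 0.0]}
category_weights = {"alexnet": 0.75, "vgg16": 0.25}
scores = weighted_similarity_scores(features_a, features_b, category_weights)
```

In this toy case the first algorithm's raw similarity of 1.0 is scaled to 0.75, while the second algorithm's raw similarity of about 0.707 is scaled to about 0.177, so the first algorithm dominates the result for this category.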
The processing unit 330 repeats the process shown in
The set of weights applied by the processing unit 330 to generate weighted similarity scores can be particular to the category of the first image, representing relative influences of the multiple image classification algorithms for determining the image similarities for the category. One embodiment of a process 500 for generating the set of weights is illustrated in
As shown in
At block 520, the configuration system 340 applies multiple image classification algorithms to each image in the golden dataset to extract sets of feature vectors from each image. The process to apply the algorithms and extract feature vectors can be similar to that described with respect to block 420. At block 530, the configuration system 340 generates raw similarity scores between pairs of images in the golden dataset using the extracted feature vectors. Each pair of images can be associated with multiple similarity scores, where each similarity score quantifies a similarity of the feature vectors extracted from the images in the pair by one of the multiple image classification algorithms.
Starting with a weight set containing initial weight values, the configuration system 340 performs a grid search at block 540 to select weights to apply to the raw similarity scores of the golden dataset. During the grid search, the configuration system 340 searches for weight values that, when applied to the raw similarity scores, will generate weighted similarity scores within a threshold of the predetermined similarities of the images. In some embodiments, the weights in the set are constrained to a fixed sum, such as a value of one. The grid search adjusts and tests the values of the weights while keeping the sum of the weights within the constraint.
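A constrained grid search of this kind can be sketched in Python as follows. The sketch makes several assumptions that are not stated above: the weighted scores for a pair are combined by summation, the fit to the predetermined similarities is measured by mean absolute error, the search returns the best-fitting weights rather than stopping at the first set within a threshold, and the grid resolution is `1/steps`. All function and parameter names are hypothetical.

```python
from itertools import product
from typing import Iterator, Sequence, Tuple


def weight_grid(num_algorithms: int, steps: int) -> Iterator[Tuple[float, ...]]:
    """Yield weight tuples on a grid of resolution 1/steps that sum to one."""
    for combo in product(range(steps + 1), repeat=num_algorithms):
        if sum(combo) == steps:
            yield tuple(c / steps for c in combo)


def grid_search_weights(
    raw_scores: Sequence[Sequence[float]],
    target_similarities: Sequence[float],
    steps: int = 10,
) -> Tuple[Tuple[float, ...], float]:
    """Return the grid weights with the lowest mean absolute error.

    raw_scores[i][j] is the raw similarity for golden pair i from algorithm j;
    target_similarities[i] is the predetermined similarity for pair i.
    """
    num_algorithms = len(raw_scores[0])
    best_weights, best_error = None, float("inf")
    for weights in weight_grid(num_algorithms, steps):
        error = 0.0
        for scores, target in zip(raw_scores, target_similarities):
            # Assumed combination rule: sum of the weighted per-algorithm scores.
            combined = sum(w * s for w, s in zip(weights, scores))
            error += abs(combined - target)
        error /= len(raw_scores)
        if error < best_error:
            best_weights, best_error = weights, error
    return best_weights, best_error


# Toy golden dataset: three image pairs scored by two algorithms.
raw = [(0.9, 0.1), (0.2, 0.8), (0.5, 0.5)]
targets = [0.66, 0.38, 0.5]
weights, error = grid_search_weights(raw, targets, steps=10)
```

A caller could compare the returned error against a threshold to decide whether the selected weights are acceptable. Exhaustive enumeration grows quickly with the number of algorithms and grid steps, so a coarser grid or a different search strategy may be preferable for larger weight sets.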
Once the weight values have been selected, at block 550, the configuration system 340 stores the selected weights as a weight set associated with the category of the golden dataset. The stored weight set can be retrieved for use by the processing unit 330 for generating weighted similarity scores between images of unknown similarity, as described with respect to
The catalog-based image recommendation system 300 uses the weighted similarity scores generated and stored by the processing unit 330 to generate image recommendations.
At block 610, the image recommendation system 300 selects a target image from the image catalog. In some cases, the target image may be selected in response to an explicit user query that specifies an image and requests images that are similar to the specified image. In other cases, the target image may be selected in response to an implicit user request. For example, if a user is viewing a product's webpage through an online retail website, the image recommendation system 300 may select an image of the product as the target image in order to generate a set of recommended images for display on the product's webpage.
At block 620, the image recommendation system 300 retrieves the weighted similarity scores for the target image. As discussed above, the weighted similarity scores include multiple similarity scores between the target image and each of one or more other images in the catalog, where each of the multiple scores is weighted by a weight value that is particular to the category of the target image and that corresponds to an image classification algorithm that generated the feature vectors from which the similarity score was calculated.
At block 630, the recommendation system 300 sorts the retrieved similarity scores and selects, at block 640, one or more recommendations based on the sorted scores. In some embodiments, the recommendation system 300 selects a specified number of images that have the highest weighted similarity scores to the target image. For example, the recommendation system 300 identifies the ten highest weighted similarity scores, and selects the images corresponding to the ten identified scores as the recommended images. The recommended images can be output to the user.
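The sorting and selection at blocks 630 and 640 can be sketched in Python as follows. The sketch assumes that the multiple weighted scores for each candidate image are combined into a single value by summation before ranking; that combination rule, the function name `recommend`, and the data layout are assumptions made for this example.

```python
import heapq
from typing import Dict, List


def recommend(weighted_scores: Dict[str, Dict[str, float]], k: int = 10) -> List[str]:
    """Return the k candidate image IDs with the highest combined scores.

    weighted_scores maps each candidate image ID to its per-algorithm
    weighted similarity scores with respect to the target image.
    """
    # Assumed combination rule: sum the weighted scores for each candidate.
    combined = {
        image_id: sum(per_algorithm.values())
        for image_id, per_algorithm in weighted_scores.items()
    }
    # nlargest returns the IDs ordered from highest to lowest combined score.
    return heapq.nlargest(k, combined, key=combined.get)


# Hypothetical weighted scores for three candidate images.
candidates = {
    "a": {"alexnet": 0.5, "vgg16": 0.1},
    "b": {"alexnet": 0.2, "vgg16": 0.1},
    "c": {"alexnet": 0.6, "vgg16": 0.3},
}
print(recommend(candidates, k=2))  # ['c', 'a']
```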
Embodiments of the catalog-based image recommendation system described herein provide a flexible framework for generating image recommendations across multiple categories of images. By generating category-specific weightings, the recommendation system favors the outputs of image classification algorithms that better represent human-observable similarities between images in a given category.
The above detailed description of embodiments of the technology is not intended to be exhaustive or to limit the technology to the precise form disclosed above. Although specific embodiments of, and examples for, the technology are described above for illustrative purposes, various equivalent modifications are possible within the scope of the technology as those skilled in the relevant art will recognize. For example, although steps are presented in a given order, alternative embodiments can perform steps in a different order. The various embodiments described herein can also be combined to provide further embodiments.
From the foregoing, it will be appreciated that specific embodiments of the technology have been described herein for purposes of illustration, but well-known structures and functions have not been shown or described in detail to avoid unnecessarily obscuring the description of the embodiments of the technology. Where the context permits, singular or plural terms can also include the plural or singular term, respectively.
Moreover, unless the word “or” is expressly limited to mean only a single item exclusive from the other items in reference to a list of two or more items, then the use of “or” in such a list is to be interpreted as including (a) any single item in the list, (b) all of the items in the list, or (c) any combination of the items in the list. Additionally, the term “comprising” is used throughout to mean including at least the recited feature(s) such that any greater number of the same feature and/or additional types of other features are not precluded. It will also be appreciated that specific embodiments have been described herein for purposes of illustration, but that various modifications can be made without deviating from the technology. Further, while advantages associated with some embodiments of the technology have been described in the context of those embodiments, other embodiments can also exhibit such advantages, and not all embodiments need necessarily exhibit such advantages to fall within the scope of the technology. Accordingly, the disclosure and associated technology can encompass other embodiments not expressly shown or described herein.
This application claims the benefit of U.S. Provisional Patent Application No. 62/896,485, filed Sep. 5, 2019, which is incorporated herein by reference in its entirety.
Number | Date | Country
---|---|---
62896485 | Sep 2019 | US