VISUAL FACET SEARCH ENGINE

BACKGROUND

Search engines are used to navigate the Internet or other networks by identifying webpages related to a search query. A search results page is presented in response to a search query, and the search results page includes links to the related webpages. Some search engines are website specific, meaning that the search engine returns search results corresponding to different webpages of the website. Other search engines identify search results across the Internet by crawling the Internet, and the related webpages found are presented to the user at the search results page. Some search engines can conduct a reverse image search, which is a content-based image retrieval query in which a sample image is provided to a search engine for a search. Reverse image searching may be used for finding a content creator, finding a source of an image, getting information about an image, and so forth.

SUMMARY

At a high level, aspects described herein relate to a visual facet-based search for search results at a search engine that uses an item listing image from the search results as an image for a facet option. In general, facets are tools for navigating searches at search engines. A user can use a facet option of the facet to refine a search query and to refine category listings.

The item listing image is identified from search results having item listing images. The item listing image is identified upon receipt of a search image as a search query at a search engine and is selected based on an item feature prominence score determined by a machine learned model. The machine learned model can be trained using images having known item features. When employed, the trained machine learned model identifies an item listing image from among the item listing images from the search results. Upon inputting the item listing images, the machine learned model outputs an item feature prominence score for an item feature in the item listing image identified for the facet option. That is, the trained machined learned model identifies the item listing image among the item listing images based on item feature prominence scores corresponding to the item listing images. The item feature prominence score is associated with an item feature of the particular item listing image and indicates a relative prominence of the item feature corresponding to the item listing image. In aspects, the item listing image identified has a highest item feature prominence score compared to the other item listing images.

Once the item listing image is identified, the item listing image is provided for display at the search engine. The item listing image is identified as the image for the facet option associated with the item feature of the item listing image. The facet option is included within a set of facet options for a facet that was determined for the search image.

The Summary is intended to introduce a selection of concepts in a simplified form that is further described in the Detailed Description of this disclosure. The Summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter. Additional objects, advantages, and novel features of the technology will be provided, and in part will become apparent to those skilled in the art upon examination of the disclosure or learned through practice of the technology.

BRIEF DESCRIPTION OF THE DRAWINGS

The present technology is described in detail below with reference to the attached drawing figures, wherein:

FIG. 1 is an example operating environment in which an example visual facet search engine can be employed, in accordance with an aspect described herein;

FIG. 2 is an example graphical user interface generated by the visual facet search engine of FIG. 1 to provide facet options having item listing images, in accordance with an aspect described herein;

FIG. 3 is another example graphical user interface generated by the visual facet search engine of FIG. 1 to provide facet options having item listing images, in accordance with an aspect described herein;

FIGS. 4 and 5 are block diagrams of example methods for identifying item listing images for a facet option from search results having item listing images using the visual facet search engine of FIG. 1, in accordance with aspects described herein; and

FIG. 6 is an example computing device suitable for implementing the described technology, in accordance with an aspect described herein.

DETAILED DESCRIPTION

Conventional search engines fail to provide images for facets that have a visual context for the respective search refinement options. These search refinement categories, called “facets” (such as brand, color, size, etc.), are useful when a user is using a search engine. Facets can comprise facet options as refinement options for the facet. For instance, a facet for “color” may comprise facet options of “red,” “blue,” and so forth. Conventional facets provide facet options that are textual (e.g., “red,” “blue,” “formal,” “women's,” etc.). These textual facets tend to reorient a user using an image as a search query from a visual-based search to a textual-based search. Further, conventional facet options that include images (e.g., a solid color that can be used to refine search results) are not accurate depictions of the images found in the search results, making it challenging for users to use visual filters and challenging for search engines to provide proper and accurate searches. Moreover, the conventional facet options that do use images are not the most prominent, distinctive, or illustrative of the facet option compared to a population of images that can be used for the facet option.

The problems with using these conventional images for search engines and Internet users include, but are not limited to, improper search results based on the lack of prominence of items in the images displayed for refinement options, inaccurate search results due to the lack of prominence of the items in the images displayed, a population of search results that are too broad due to the lack of prominence, and an inaccurate detection of user intent due to the lack of prominence of the items in the images displayed for the refinement options. In particular, a user can only view so many search results, and only a limited number of search results can be provided at a user interface. Thus, as the conventional methods are presenting broad and inaccurate search results based on visual refinement options that are not prominently and accurately displaying item features of each refinement option, the limited number of item listings presented by the conventional methods fail to capture the intent of the users.

As an example, a search for an image of a “running shoe” entered into a commonly used search engine can provide hundreds of thousands of related images that include many different brands, colors, sizes, styles, and are made of many different materials. Continuing the example, by refining the “running shoe” image search by using a conventional visual refinement option that provides an image of a light grey running shoe having 80% light grey coloring and 20% greyish-white coloring for a “color” facet for a facet option of “grey,” the search results provide hundreds of thousands of images of running shoes having a plethora of shades of grey; most of which include other colors in addition to grey, such as white, black, gold, and dark yellow. Further, these search results also include shoes having a majority color that is similar to grey, such as light blues and light purples. Furthermore, the conventional refinement options fail to provide additional color refinement features after a selection of the “grey” refinement option.

Thus, due at least in part to the high number of search results having too many options that are not all relevant or accurate with respect to the intent of the search and due to the nature of visual search queries of conventional methods, more accurate and relevant visual facet options that display the item for the facet option more prominently so that more precise and accurate search results related to the user's intent are provided at a search results page is critical to the operation of the Internet and the operation of devices utilizing visual search engines. Otherwise, the probability of a person searching for a particular item via a search engine is nearly impossible to find, since the results that are presented are in the millions and are not accurate or precise with respect to the searcher's intent. As such, conventional methods are effectively making it difficult to properly navigate search result images from a visual search using visual facet options.

Furthermore, conventional methods also use stock images for visual refinement options or images that are not included within the search results. The use of stock images or images not from the search results causes latency in the network and in receiving the search results due to additional database recall. To overcome this additional problem in the conventional methods, the present technology disclosed herein reduces these latency issues caused by the use of stock images and the use of images that are not from the search results as the visual refinement options by using the images of the search results as the images for the visual facet options. As such, the present technology reduces the latency in the conventional methods by identifying images in the database from recalled search results, as opposed to recalling stock images or images that are not within the search results.

In addition, the present technology disclosed herein provides mechanisms to help overcome the challenges in the conventional methods. For example, some aspects of the present technology employ a machine learned model that identifies an image for a facet option that has a particular item feature prominence score. The images provided for the facet options having the particular item feature prominence scores provide for more accurate searching at a faster rate without having to search hundreds of thousands of results that are too broad. By providing more accurate search results, the user can be presented with the best of the related search results, thus allowing the user the ability to actually identify an intended search result.

In all, by providing images for facet options that have the particular item feature prominence score, the search system receives fewer inputs from the user, as would be traditionally required to achieve a similar result. From the perspective of the search engine, the search engine is processing fewer commands due to the fewer user inputs. This enhances the efficiency of the computing system employing the search engine because the computing system is now able to process a larger number of search requests from users with the same computational processing ability as before. When employed, the reduced inputs and functions processed in response frees up bandwidth across the network on which the computing system is employed, thus allowing the network to facilitate the processing of more transactions by the system.

One example of the technology that effects these benefits includes computerized methods for providing an item listing image have an item feature prominence score, the item listing image provided for a visual facet option for a visual search. For example, the computerized methods receive a search image as a search query at a search engine. Further, search results are identified for the search query. In aspects, the search results are identified based on search image features extracted from the search image. The search results may comprise item listings associated with item listing images. In some aspects, facets are determined for the search image based on the search results, and each facet comprises facet options. Further, a machine learned model may be employed to identify an item listing image among the item listing images based on an item feature prominence score. Further, the item feature prominence score may be determined by the machine learned model. The item feature prominence score may be associated with an item feature of the item listing image and indicate a relative prominence of the item feature within the item listing images. As such, the item listing image identified by the machine learned model is provided for display at the search engine as a first facet option for a first facet determined for the search image.

It will be realized that the method previously described is only an example that can be practiced from the description that follows, and it has been provided to more easily understand the technology and recognize its benefits. Additional examples are now described with reference to the figures.

With reference first to FIG. 1, among other components or engines not shown, example operating environment 100 includes computing device 102. Computing device 102 is shown communicating using network 104 search server 108, visual facet search engine 110, and database 130.

Beginning with computing device 102, the computing device 102 may be a device that corresponds to the computing device 600 described with reference to FIG. 6. In implementations, computing device 102 may be a client-side or front-end device, while in other implementations computing device 102 represents a back-end or server-side device. As will be discussed, computing device 102 may also represent one or more computing devices, and as such, some variations of the technology comprise both a client-side or front-end device, and a back-end or server-side computing device performing one or more functions that will be further described.

Turning to network 104, network 104 may include one or more networks or one or more portions of networks (e.g., a public network such as the internet or a virtual private network “VPN”). Network 104 may include, without limitation, one or more local area networks (LANs) wide area networks (WANs), or any other communication network or method that enables communication unilaterally, between, or among machines, databases, computing devices, and so forth.

Turning now to search server 108, the search server 108 receives one or more search queries from a user or other service. For instance, the search query may be received from computing device 102. The search queries may be textual queries or visual queries. In some embodiments, the query is an audio query. In some embodiments, the search server 108 sends the search query to an e-commerce server, which generates results for the user. It will also be understood that in other configurations, the search server 108 acts as the e-commerce server. In some embodiments, the search server 108 performs the search query on a search image or an image identifier to identify item listings and item listing images corresponding to the item listings. As such, the search server 108 receives the search query from the image receiving component 112 and provides the search results to the search results identifier 114 of the visual facet search engine 110, as will be discussed in more detail.

Having identified various components of operating environment 100, it is noted again, and emphasized, that any additional or fewer components, in any arrangement, may be employed to achieve the desired functionality within the scope of the present disclosure. Although some components of FIG. 1 are depicted as single components, the depictions are intended as examples in nature and in number, and are not to be construed as limiting for all implementations of the present disclosure. The functionality of operating environment 100 can be further described based on the functionality and features of its components. Other arrangements and elements (e.g., machines, interfaces, functions, orders, groupings of functions, etc.) can be used in addition to or instead of those shown, and some elements may be omitted altogether.

Further, many of the elements described in relation to FIG. 1, such as those described in relation to visual facet search engine 110, are functional entities that may be implemented as discrete or distributed components or in conjunction with other components, and in any suitable combination and location. Various functions described herein are being performed by one or more entities and may be carried out by hardware, firmware, or software. For instance, various functions may be carried out by a processor executing computer-executable instructions stored in memory. Moreover, the functions described in relation to FIG. 1 may be performed by computing device 102 at either the front-end, back-end, or in any combination or arrangement.

Turning now to visual facet search engine 110, visual facet search engine 110 comprises image receiving component 112, search results identifier 114, facet determiner 116, item listing image identifier 118, and facet option generator 120. Visual facet search engine 110, and components thereof, may be employed by the search server 108. Further, the visual facet search engine 110 determines item listing images from search results for facet options by employing a machine learned model to determine item feature prominence scores for the item listing images. In aspects, the item feature prominence scores are associated with an item feature of the item listing image. Further, the item feature prominence scores indicate a relative prominence of an item feature within an item listing image.

Image receiving component 112 may receive one or more search images for a search query at the search server 108. For example, the user can instantly take a photo to be used as the search query using a camera of the computing device 102, upload an image from a photo gallery accessible to the computing device 102, or upload an image stored locally on the computing device 102. The one or more search images may include one or more items within the image and may have various item features that correspond to the one or more items, and a background of the corresponding search image.

The one or more items in a search image may correspond to one or more categories. In some cases, categories may include media (e.g., books, music, video games), collectibles, antiques, art (e.g., landscape paintings and art glass), fashion, jewelry (e.g., watches, earrings), cellphones, accessories, health and beauty, and so forth. In some embodiments, item listings include item listings of products available on a network-based marketplace. In some embodiments, item listings include item listing images of the products available on the network-based marketplace. Further, item listings may include details comprising an item condition, an item rating, a pattern, a product identification for the item, a brand, a style, a size, a seller identification, a color, an available quantity, a price (e.g., list price, sale price, or current auction price or bid), a number of items previously sold, and any other suitable information related to sale, purchase, or interaction with the item listing. In some embodiments, a set or a subset of item listings may include similarly related items, items having one or more common features, one or more common features corresponding to a quality of the item listing image, a common category, and so forth. In some embodiments, items in the set or the subset may be indexed based on tags.

Further, examples of search image features that are extractable from the search image may include orientation, brightness, and user usage.

Beginning with orientation, an “aspect ratio” of a search image is the ratio of its width to its height. When the search image has an aspect ratio higher than 1, the search image has “horizontal orientation.” When the search image has an aspect ratio lower than 1, the search image has “vertical orientation.”

Turning to brightness, in aspects, images stored in a gallery may have higher brightness features than search images taken directly from the camera of computing device 102. An item corresponding to a search image may have various brightness features at various portions of the item. Further, images stored in a gallery may include both user-generated images and professional studio photos locally stored and taken by a different device. In embodiments, the professional studio photos have a total brightness score that is greater than the total brightness score of the user-generated images.

Turning to user usage, the user usage of an image may include a number of times the user viewed the image, a number of times the image was edited by the user, a number of times the image was transmitted from the computing device 102, a total length of time the user viewed the image since its upload or capture, and the like.

Turning to search results identifier 114, search results are identified for the search query in response to receiving the image at the receiving component 112. Search results include item listings associated with item listing images. The search results are based on search image features extracted from the search image. For example, the search image features relate to the item depicted in the search image. The search image may have one or more items depicted, and not all items depicted in the image may be relevant. As such, the search results correspond only to the items in the image that are relevant in some embodiments.

For example, item relevance may be determined based on sizes of the items in an image. Continuing the example, if one item in the image is 3× the size of another item in the image, the item having the larger size is identified as the relevant item. As another example, item relevance may be determined using item recognition and user intent. In aspects, user intent is determined by identifying items in the item listing images and comparing the items to a search corpus specific to a particular search engine, and identifying the most relevant item in the item listing images. For example, if the user intent is to shop online for products offered by a particular website and the uploaded image contains three items including a woman wearing a pair of sunglass and a large sun in the background, search results identifier 114 will then classify the sunglasses as the most relevant item in the image.

Turning now to facet determiner 116, facet determiner 116 generally determines facets. The facets may be determined based on the search results, such as the item features associated with the item listing images. In embodiments, facets may be determined based upon the item features of the item listing images associated with a category. For example, for a category “books,” facet determiner 116 may extract item features from the item listing images, rank each of the item features based on how many item listing images have that item feature or how many times a textual refinement was used. For example, for a category of “books,” facets may include the following: genre, author, suggested age range, and other available languages. In aspects, the facets may be determined using metadata, such as tags.

In embodiments, facet determiner 116 may prioritize facets based on certain item features that are more clearly illustrated by a visual facet than by a textual facet (such as a particular shade of a color of apparel, a size of an uncommon dog breed next to an adult Dalmatian for size reference, a texture of material for sewing, and so forth). In some embodiments, certain item features of the item in the item listing image may be prioritized over other item features. For example, if the item is a shoe, shoe color and material may be prioritized over shoe size, since shoe size is difficult to determine based solely on an image of only a shoe. On the other hand, if the shoe is next to a ruler in an item listing image, then shoe size may take priority as an item feature in an item listing image over a color of the shoe lace. In some embodiments, facets may be prioritized based on a purchase history of the user (e.g., the user has a pattern of purchasing white running shoes of a particular brand and size once every three months). In some embodiments, facets are prioritized based on a purchase history of users having purchase histories similar to other users (e.g., a majority of the users who have also purchased white running shoes of the same particular brand as the user purchase these shoes made of nylon—shoe material being the prioritized facet in this scenario).

Furthermore, facet determiner 116 also determines an order in which to present the facets determined. In embodiments, the order of facets is based on user histories of previously used facets from one or more applications. For example, attributes associated with item features of the item listing images associated with the user histories may be extracted. Attributes associated with a high number of purchases, for example, may be listed or presented prior to those attributes associated with a lower number of purchases. In aspects, the high number of purchases are associated with users who have a common user history with the user of the present search query, the common user history corresponding to the search image. In embodiments, the common user history may include a common geographical area registered with a user profile, a number of purchases of a brand of a particular apparel item that exceeds a threshold, and so forth.

In aspects, users having the common user history can be clustered via a clustering algorithm to identify common users. For example, users having a plurality of common aspects, such as a number of views of a particular product that is above a threshold and a number of purchases of the same items from the same application that is above a threshold, may be clustered for identification as common users having common user histories. In aspects, facets and an order of the facets may be determined based on the common user histories.

In an embodiment, the order of facets is based on a number of clicks for refining text and image queries for a particular application. Continuing the example, an order of facets for a “shoe” search (by text or image) may include the following: brand, color, size, material, US shoe size, size type, style, type, and heel height. As another example, an order of facets for a “clothes” search (by text or image) may include the following: sleeve length, dress length, original/reproduction, pattern, and team. In aspects, the number of clicks for facets that are below a threshold may not be presented in the order of facets.

Turning now to item listing image identifier 118, an item listing image from the search results is identified for a particular facet option (e.g., red, blue, and white may be facet options for a color facet for an item that is a sweatshirt). In aspects, a machine learned model (e.g., machine learned model(s) 138) is employed to identify an item listing image among the item listing images of the search results. In identifying an item listing image, the machine learned model may output an item feature prominence score for an input item. An item listing can be selected based on its item feature prominence score. The item feature prominence score is associated with an item feature of the item listing image and indicates a relative prominence of the item feature within the item listing images.

The machine learned model is trained to determine the item feature prominence scores for item listing images. The machined learned model can be trained using a training data set (e.g., training data set 136) that comprises images having known items, image features, or item features. For instance, images of the items may be selected for use in the training data set. The selected item images of the training data set can be labeled by tagging them to indicate an item within the image or the items features within the image. The machine learned model can be trained using the training data set to provide a trained machine learned model.

When employed, the trained machine learned model may receive an item listing image as an input. In an aspect, the trained machine learned model identifies item features within the input image. The identification of the item features can comprise a confidence score indicating the probability that the item feature identified by the trained machine learned model is correctly identified.

In some cases, the trained machine learned model identifies more than one item feature in an image. For instance, an input image of a shoe may be identified as having an item feature associated with brand, color, style, etc. Thus, in cases where an input image comprises more than one item feature, the trained machine learned model may identify one or more of the item features of the input image. In doing so, each of the identified item features has an associated confidence score.

The associated confidence score may be used as a variable to determine the item feature prominence score. That is, the confidence score may indicate an item feature prominence relative to other item features. For instance, item features having a relatively greater confidence score output by the trained machine learned model compared to other identified item features in the input image or other input images are identified. Item listing image identifier 118 can determine that an item feature has a greater prominence score relative to other item features in the input image or another input image based on the relatively greater confidence score for the item feature. In an aspect, the item feature prominence score is determined by the machine learned model also based on a comparison of a background size of the item listing image relative to the item in the item listing image.

Additionally, item features of the item listing images may include features of the item itself, such as an item identification (e.g., the item depicted is a sapphire ring), one or more colors of the item, a category of the item, a use of the item, and so forth. Further, image features of the item listing images can include features of the image itself, such as a blurriness score associated with a resolution. The blurriness score can indicate an overall blurriness evaluation of the image by considering various portions of the item listing image. Further, the item listing image identifier 118 can determine a blurriness score for a background of the image and for the item of the image separately. In embodiments, a high blurriness score would reduce the overall item feature prominence score for that particular item listing image. In aspects, the blurriness score is determined using a spatial resolution. The spatial resolution refers to a number of pixels and an angle subtended, θ, by each pixel on a viewer's retina. For instance, an item listing image having a better spatial resolution than a second item listing image would have a higher number of pixels than the second item listing image. A spatial resolution may also depend upon interactions with a compressions system and a display. Further, spatial resolution may depend upon a size and a quality of a detector.

Additional features of the image itself may also include image brightness, image type, and aspect ratio. For example, a gallery image may have a higher overall item brightness score relative to the background of the image than an image directly taken by the computing device 102 for the search query. Further, additional features of the item listing images include size, cropping, angle view, background quality, frame, watermarks, inclusion of human body parts, and additional items besides the main product for sale. The machine learned model can be trained for recognizing each of these features within the item listing images from the search results. For instances, each of these features may be used as tags when labeling the images of the training dataset.

The features of the item listing images can be classified by a classifier and indexed based on a feature vector. In embodiments, the machine learned model is trained over a large collection of images uploaded by e-commerce sellers as part of their listing process. In some embodiments, the sellers upload the images from both a local and remote database (e.g., database 130). In embodiments, the machine learned model can be trained to classify item listing image features using search image results from users clustered in the same vector space.

In one aspect, the training images are of a similar resolution that is within a particular range, are of similar size within a particular range, have overall blurriness scores that are within a particular range, have similar background qualities (e.g., the background is lighter for a dark item), and have the same file extension type. In one embodiment, the training data includes non-image data that corresponds to a text file from a text-based search query.

Based on the training using the training data, when employed, the trained machine learned model determines a confidence score indicating an item feature prominence relative to other item features. For example, a classifier may be trained to identify sunglasses having “photochromic” lenses in an image. Based on the training, the machine learned model provides confidence scores for a set of item listing images for whether the item listing images prominently contain the sunglasses with photochromic lenses.

In an embodiment, if a facet determined by the facet determiner 116 is “material” for an apparel item, a set of facet options for the “material” facet could include the following: leather, cotton, silk, polyester, linen, wool, and nylon. As such, the item listing image identifier 118 identifies the most prominently displayed silk material within the item listing images, the most prominently displayed silk material having a highest item feature prominence score compared to other images of the item listing images displaying silk apparel. The item listing image having the most prominently displayed silk material for the apparel is the image identified for the facet option of “silk.”

By way of another non-limiting example for a user searching for a black kitten, the black kitten item listing image identified for a second facet option (the first facet option being “black” and the second facet option being “kitten”) could have a lighter background so that the black kitten would stand out, the black kitten in the image would have a spatial resolution above a threshold so that the user could tell that the image includes a kitten with a full set of solid black fur, the image would not have any distractors in the background (such as multiple cat toys), and the image would clearly indicate that the kitten is still a kitten and not an adult cat.

Turning now to facet option generator 120, when employing the facet option generator 120, the visual facet search engine 110 generates a facet option having the identified item listing image displayed as the image for the facet option. In embodiments, the facet option having the item listing image is provided for display at the search engine. Further, facet option generator 120 can also provide for display a second set of facet options for a second facet determined for the image that was searched for the search query. A second facet option within the second set of facet options may have a second item listing image from the item listing images, the second item listing image being different than the image for the facet option. In embodiments, facet options may be provided as selectable filters for narrowing the search query.

Upon receipt of a selection of the facet option, item listing images for other facet options not selected change based on the selection. For example, if a “blue” facet option image is selected, then the set of facet options that were not selected will have updated item listing images that are blue and that have an item feature prominence score above a threshold. In other embodiments, the set of facet options that were not selected will have updated item listing images that are blue and that have a highest item feature prominence score compared to other item listing images corresponding to each particular facet option. Further, the order that the facet options are presented is also changed based on the selection.

Turning to database 130, database 130 may comprise item data 132, user history 134, training data set 136, and machine learning model(s) 138. In aspects, database 130 generally stores information including data, computer instructions (e.g., software program instructions, routines, or services), or models used in embodiments of the described technologies. Although depicted as a single database component, database 130 may be embodied as one or more data stores or may be in the cloud. An example data store that is suitable for use with the present technology includes memory 612, which is described in more detail in FIG. 6.

Turning to item data 132, item data 132 may include structured and unstructured data. Structured data includes data that is organized in some scheme that allows the data to be easily exported and indexed as item data 132 with minimal processing. Structured data can generally be collected and rearranged to comport to the index of item data within item data 132. Unstructured data is anything other than structured data. It relates to an item and generally discusses the item within context. However, unstructured data generally requires additional processing in order to store it in a computer-useable format within item data 132. In embodiments, item data comprises identification data (e.g., the item depicted in the item listing images is a shirt), one or more colors of the item, a category of the item (e.g., kitchenware, gardening items, jewelry, makeup, etc.), a use of the item, a size of the item (e.g., small, medium, large), and so forth.

In some embodiments, item data may include, for example, features in the item listing image, such as features of the item in the item listing image or features of the image. In embodiments, an image feature may include a low pixel density, an item feature may be that the item within the image is surrounded by other items in the image, the image may have a watermark feature, an item feature may be that the item is small relative to the image as a whole, the background of the image may obscure the item, light captured in the image may cause a glare effect that disrupts the prominence of the item in the image, the brightness of the item in the image may be too low, and so forth.

In aspects, the features of the item listing images may be extracted from the item listing images. For example, a processor can perform item key point feature identification and key point spatial information coding, descriptor encoding functionality to encode descriptors, and query compression for transmission as part of a visual search request. In some embodiments, local-feature algorithms extract detecting features that do not vary based on scale or resolution of the item in the image. In some embodiments, a computer vision database extracts metadata describing an item listing image. In some embodiments, color, shape, and texture corresponding to the item are extracted from the image via local descriptors (e.g., by using scale invariant feature transform or speed up robust features from feature points detected within the image). In some embodiments, hash-based algorithms are used to perform wavelet transformation for compressing extracted features. In some embodiments, background features, brightness, glare, and spatial resolution in the image are extracted.

Turning now to user history 134, visual facet search engine 110 may use user history 134 stored in database 130 for determining item feature prominence scores. The user history 134 may comprise, for example, any identifying aspect of the user, such as a name, a home or work general location and addresses, education, clothing size, occupation, hobbies, age, gender, and so on. The user history 134 could include aspects such as frequency of purchases, items previously purchased, search query history, item returns, prices of items, top brands purchased, days of the year in which particular items were most commonly purchased, subscriptions, and so on. The user history 134 may also include item options for previously purchased items, and the item option categories with which the item options are associated. The user history 134 can include historical interactions with historical search results.

Turning now to training data set 136, the training data set 136 includes training images corresponding particular features of particular items in images and particular types of images (e.g., images having a certain pixel density, spatial resolution and number of independent pixel values per unit length, optical resolution, atmospheric distortion, focus, etc.). The training images can be labeled to indicate the particular features, such that when used to train a machine learned model, the trained model identifies the features in an image. In some embodiments, the training images may include images corresponding to known user information for a particular user or known user purchase histories for a plurality of users. In one aspect, the training data sets are grouped based on similar spatial resolution that is within a particular range, similar ground sample size within a particular range, a number of independent pixel values per unit length being within a particular range, and similar background qualities (e.g., the background is lighter for a dark item).

Turning now to machine learning model(s) 138, machine learning algorithms or networks, such as deep learning neural networks or convolutional neural networks, may be trained using training data set 136. In an aspect, the deep learning neural network may comprise a first set of levels for different image feature sets and a second set of levels corresponding to two or more levels in an image feature classification hierarchy. In embodiments, classifiers may use an objective function combining recognition results for the second set of levels in the deep learning neural network. The classifiers may share item feature representation based on the first set of levels in the deep learning neural network.

In embodiments, the machine learning model(s) 138 is a neural network. In embodiments, the machine learning model(s) 138 is a random forest model. In aspects, the machine learning model(s) 138 can include many different sizes, numbers of layers and levels of connectedness. For neural networks trained with large datasets, a number of layers and a layer size can be increased by using dropout for overfitting. In some embodiments, a neural network can be designed to forego use of fully connected upper layers at the top of a network. In some embodiments, a number of learned parameters may be reduced by forcing the network to go through dimensionality reduction in middle layers. In some embodiments, the machine-learned model can be configured using embeddings, batch normalization, layer normalization, gradient clipping, attention mechanisms, etc. One example suitable for use in training a model for image recognition using the labeled training data of training data set 136 is a convolutional neural network (CNN). A CNN can be trained for image recognition of item features and outputs a confidence score indicating the likelihood that the CNN has correctly identified the item feature in an input image.

Turning now to FIG. 2, example graphical user interface (GUI) 200 is displayed to a user at a display device. Here, an online market place 202 provides item listings that may be determined by search server 108 employing visual facet search engine 110. As illustrated, an image may be dragged or uploaded to the search query input 204. For example, search image 206 corresponding to a “shirt” is received as a search query at a search engine. Stated differently from the perspective of a user, example search image 206 is provided as a search query at the search engine. In response to search server 108 receiving the search query from the image receiving component 112, search results comprising item listings 208 are provided for display on the GUI 200.

In aspects, the item listings 208 are identified based on extracted item features of the search image 206. For example, a subset of item listings 208 may be presented on GUI 200, and the subset of the item listings 208 may be presented in a predetermined order based on user interaction histories from prior search queries at the search engine. In some embodiments, item listings 208 comprise item listings associated with item listing images 208A-208F. For example, item listing image 208A has long sleeves; item listing image 208B has long sleeves and is a button-up dress shirt with a collar; item listing image 208C is a short sleeved t-shirt that has a tighter fit and a hole at the torso; item listing image 208D has a particular design with a collar, buttons, and shirt cuffs; item listing image 208E is a belly-top with a single thin-strap; and item listing image 208F is a flowing tank top.

In embodiments, the visual facet search engine 110 may identify the item listing images 208A-208F for facet options 210 using item listing image identifier 118. In other embodiments (not depicted), facet options for a sleeve length facet, for example, may include long sleeve, short sleeve, single-sleeve, and spaghetti straps, for example. Continuing the example, facet options for the style facet may include collar, button-up, shirt cuffs, and business-casual, for example. In example GUI 200, the item listing images for men's dress shirts facet option 210A and men's patterned shirts facet option 210B are identified from the item listing images 208A-208F.

In an embodiment, GUI 200 comprises a search results page from the search visual facet search engine 110. The search results page may comprise a set of facet options for a facet, such as facet options 210. Further, the set of facet options may be presented as item listing images of item listings 208 related to the search query. In aspects, each item listing image for each facet option 210A and 210B of the set of facet options 210 is based on an item feature prominence score determined by a machine learned model. Continuing the example, the machine learned model may compare other item feature prominence scores of other item listing images of the item listings. In aspects, the item feature prominence score for facet options 210A and 210 may be associated with an item feature of the facet.

Turning now to FIG. 3, GUI 300 is displayed to a user at a display device. GUI 300 comprises the online market place 202, upload search image 204, first selected facet option 310, and a second set of facet options 320. For example, a first selected facet option 310 includes an item listing image for men's patterned shirts facet option 210B. Upon receipt of a selection of the men's patterned shirts facet option 210B, GUI 300 provides for display the second set of facet options 320 having item listing images with the item feature of men's patterned shirts. In an aspect, the set of facet options 320 includes “floral” 322, “striped” 324, and “checkered” 326. The second set of facet options 320 are presented in an ordered view. In embodiments, the ordered view is displayed based on user purchase histories from prior search queries at the search engine (e.g., displaying the item listing images having higher user purchase history rankings prior to item listing images having lower user purchase history rankings).

In an embodiment (not depicted), GUI 300 may provide for display a different item listing image for a facet after the selection of a facet option. Continuing the example, the different item listing image may have an item feature associated with the selection that the prior item listing image did not have. To illustrate, if the color “blue” was selected, then the facet options would change the item listing images for facet options to reflect the color blue. Further, in aspects of the present example, the different item listing image is determined by the machine learned model based on an item feature prominence score of the different item listing image.

The item feature prominence score may be determined based on a spatial resolution and a number of independent pixel values per unit length of the item listing image. For example, a number of pixels that make up an image may be expressed by a number of columns and a number of rows, or directly by a total number of pixels. Spatial resolution may depend upon an image resolution, which indicates how big an image is. Spatial resolution may refer to the smallest detectable object and may be expressed as 1p/mm. additionally, the item feature prominence score may be determined based on a ground sampling distance (GSD), in mm/px, which is the distance between the centers of two adjacent pixels measured on an observed object. In embodiments, a GSD of 1 mm/px means that one pixel on the image represents 1 mm in the real world. Furthermore, the different item listing image may also be determined based on user purchase histories from prior search queries at the search engine.

In embodiments, visual facet search engine 110 may select the item listing images for the second set of facet options 320 based on user interaction history. In some embodiments, the user interaction history may include actions wherein the user takes an active step (e.g., click or swipe) with respect to the item listing images. In an aspect, visual facet search engine 110 may use user history 134 that corresponds with the first selected facet option 310 to determine a highest number of clicks; and using the highest number of clicks to identify each of the item listing images for the floral facet option 322, the striped facet option 324, and the checkered facet option 326. In some embodiments, the visual facet search engine 110 uses user history 134 that corresponds with the first selected facet option 310 to determine a highest number of multiple user interactions (e.g., purchased the item corresponding to the item listing more than once or viewed the item listing image multiple times); and using the highest number of multiple user interactions to identify the item listing images for the floral facet option 322, the striped facet option 324, and the checkered facet option 326.

With regard to FIGS. 4 and 5, block diagrams are provided to illustrate example methods for visual facet searches. The methods may be performed using visual facet search engine 110 of FIG. 1. In aspects, one or more computer storage media having computer-executable instructions embodied thereon that when executed by at least one processor, cause the at least one processor to perform operations of a method, such as the example methods 400 and 500 of FIGS. 4 and 5.

Referencing now method 400 of FIG. 4, at block 402, a search image as a search query is received at a search engine. The search query may be received (e.g., from the search query input 204 from a computing device, such as computing device 102). At block 404, search results having item listings having item listing images for the search query are identified. The item listing images may be associated with item listings. The search results may be identified based on search image features extracted from the search image. For example, the search results may be identified by search results identifier 114 in FIG. 1, and the search results may include relevant item listing images. Additionally, relevant items may be identified by querying an item database having an indexed set of item listings and identifying item listing images from an image database associated with item listings.

In embodiments, a subset of the search results may be identified (e.g., identified by identifier 114) based on an image quality score determined for a feature of each item listing image. For example, the feature determined may be a background feature, such as a background color, a background positioning, a background brightness, and so forth.

At block 406, facets may be determined based on the search results for the search image. In aspects, the facets may be determined by facet determiner 116 in FIG. 1. Each of the facets identified comprise facet options. An order for presenting a first set of facet options and a second set of facet options may also be determined (e.g., by the facet determiner 116). Further, the order of the first set of facet options and the order of the second set of facet options may change based on received selections of the facet options. In aspects, a set of facets that are available for the search results are identified. Further, a subset of facets from the set of facets may be identified based on a user interaction history, wherein the subset of facets is provided as the facets for the search image.

Turning now to block 408, a machine learned model is employed to identify an item listing image among the item listing images based on an item feature prominence score determined by the machine learned model. The item feature prominence score is associated with an item feature of the item listing image and indicates a relative prominence of the item feature within the item listing images. In embodiments, the machine learned model determines the item feature prominence score for just the item listing images of the subset of the search results identified at block 404. In embodiments, the machine learned model determines the item feature prominence score based on an image recognition confidence for the item feature within the item listing image. In embodiments, the item feature prominence score of the identified item listing image is the greatest item feature prominence score compared to other item feature prominence scores of other item listing images that have the particular item feature.

In embodiments, the machine learned model may determine the item feature prominence score based on a background quality of a target item in the item listing image. For example, a poor background quality may correspond to the background having items other than the target item depicted. Further, a poor background quality may have overlapping coloring with the target item. In embodiments, the overlapping coloring of the background is below a threshold for a prominent display of the target item. In some embodiments, the machine learned model also determines the item feature prominence score based on a resolution of the target item.

In some embodiments, the machine learned model also determines the item feature prominence score based on user histories of textual refinements of prior search queries at the search engine. Continuing the example, the prior search queries are associated with the first facet.

Turning now to block 410, facet option generator 120 provides for display at the search engine the item listing image identified by the machine learned model as a first facet option included within a first set of facet options for a first facet determined for the search image. Upon receiving a selection of the first facet option, the first set of facet options that were not selected are changed by the facet option generator 120 based on the selection. For example, a second item listing image for a second facet option of the first set of facet options is identified by the machine learned model for display at the search engine. In embodiments, an order of display of the first set of facet options are also changed based on the selection.

Turning now to FIG. 5, method 500 is provided to illustrate another example of identifying and providing visual facet search. At block 502, a search image is provided as a search query at a search engine. At block 504, a search results page is received from the search engine in response to the search query. The search results page may comprise a set of facet options for a facet. The set of facet options may be presented as item listing images of item listings related to the search query. In aspects, each item listing image for each facet option of the set of facet options is based on an item feature prominence score determined by a machine learned model by comparing other item feature prominence scores of other item listing images of the item listings. The item feature prominence score may be associated with an item feature of the facet.

At block 506, a facet option is selected from the set of facet options and a subset of the item listings is received. The subset of the item listings may be received from the search engine. Each item listing of the subset of the item listings may comprise the item feature associated with the facet option selected.

Having described an overview of embodiments of the present technology, an example operating environment in which embodiments of the present technology may be implemented is described below in order to provide a general context for various aspects. Referring initially to FIG. 6, in particular, an example operating environment for implementing embodiments of the present technology is shown and designated generally as computing device 600. Computing device 600 is but one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the technology. Neither should computing device 600 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated. Computing device 600 is suitable for performing computerized methods, including the methods or any variation of that have been discussed, using one or more processors.

The technology of the present disclosure may be described in the general context of computer code or machine-useable instructions, including computer-executable instructions such as program modules, being executed by a computer or other machine, such as a personal data assistant or other handheld device. Generally, program modules including routines, programs, objects, components, data structures, etc. refer to code that perform particular tasks or implement particular abstract data types. The technology may be practiced in a variety of system configurations, including hand-held devices, consumer electronics, general-purpose computers, more specialty computing devices, etc. The technology may also be practiced in distributed computing environments where tasks are performed by remote-processing devices that are linked through a communications network.

With reference to FIG. 6, computing device 600 includes bus 610 that directly or indirectly couples the following devices: memory 612, one or more processors 614, one or more presentation components 616, input/output ports 618, input/output components 620, and illustrative power supply 622. Bus 610 represents what may be one or more busses (such as an address bus, data bus, or combination thereof). Although the various blocks of FIG. 6 are shown with lines for the sake of clarity, in reality, delineating various components is not so clear, and metaphorically, the lines would more accurately be grey and fuzzy. For example, one may consider a presentation component, such as a display device, to be an I/O component. As another example, processors may also have memory. Such is the nature of the art, and it is again reiterated that the diagram of FIG. 6 is merely illustrates an example computing device that can be used in connection with one or more embodiments of the present technology. Distinction is not made between such categories as “workstation,” “server,” “laptop,” “hand-held device,” etc., as all are contemplated within the scope of FIG. 6 and reference to “computing device.”

Computing device 600 typically includes a variety of computer-readable media. Computer-readable media can be any available media that can be accessed by computing device 600 and includes both volatile and nonvolatile media, and removable and non-removable media. By way of example, and not limitation, computer-readable media may comprise computer storage media and communication media.

Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer-readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computing device 600. Computer storage media excludes signals per se.

Communication media typically embodies computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared, and other wireless media. Combinations of any of the above should also be included within the scope of computer-readable media.

Memory 612 includes computer storage media in the form of volatile or nonvolatile memory. The memory may be removable, non-removable, or a combination thereof. Example hardware devices include solid-state memory, hard drives, optical-disc drives, etc. Computing device 600 includes one or more processors that read data from various entities such as memory 612 or I/O components 620. Presentation component(s) 616 present data indications to a user or other device. Examples of presentation components include a display device, speaker, printing component, vibrating component, etc.

I/O ports 618 allow computing device 600 to be logically coupled to other devices including I/O components 620, some of which may be built in. Illustrative components include a microphone, joystick, game pad, satellite dish, scanner, printer, wireless device, and so forth.

Embodiments described above may be combined with one or more of the specifically described alternatives. In particular, an embodiment that is claimed may contain a reference, in the alternative, to more than one other embodiment. The embodiment that is claimed may specify a further limitation of the subject matter claimed.

The subject matter of the present technology is described with specificity herein to meet statutory requirements. However, the description itself is not intended to limit the scope of this disclosure. Rather, the inventors have contemplated that the claimed or disclosed subject matter might also be embodied in other ways, to include different steps or combinations of steps similar to the ones described in this document, in conjunction with other present or future technologies. Moreover, although the terms “step” or “block” might be used herein to connote different elements of methods employed, the terms should not be interpreted as implying any particular order among or between various steps herein disclosed unless and except when the order of individual steps is explicitly stated.

For purposes of this disclosure, the word “including” or “having” has the same broad meaning as the word “comprising,” and the word “accessing” comprises “receiving,” “referencing,” or “retrieving.” Further the word “communicating” has the same broad meaning as the word “receiving” or “transmitting” facilitated by software or hardware-based buses, receivers, or transmitters using communication media. Also, the word “initiating” has the same broad meaning as the word “executing” or “instructing” where the corresponding action can be performed to completion or interrupted based on an occurrence of another action.

In addition, words such as “a” and “an,” unless otherwise indicated to the contrary, include the plural as well as the singular. Thus, for example, the constraint of “a feature” is satisfied where one or more features are present. Furthermore, the term “or” includes the conjunctive, the disjunctive, and both (a or b thus includes either a or b, as well as a and b).

For purposes of a detailed discussion above, embodiments of the present technology described with reference to a distributed computing environment; however, the distributed computing environment depicted herein is merely an example. Components can be configured for performing novel aspects of embodiments, where the term “configured for” or “configured to” can refer to “programmed to” perform particular tasks or implement particular abstract data types using code.

From the foregoing, it will be seen that this technology is one well adapted to attain all the ends and objects described above, including other advantages that are obvious or inherent to the structure. It will be understood that certain features and subcombinations are of utility and may be employed without reference to other features and subcombinations. This is contemplated by and is within the scope of the claims. Since many possible embodiments of the described technology may be made without departing from the scope, it is to be understood that all matter described herein or illustrated the accompanying drawings is to be interpreted as illustrative and not in a limiting sense.

Some example aspects of the technology that may be practiced from the forgoing disclosure include the following:

Aspect 1: A computer-implemented method for a visual facet search, the method comprising: receiving a search image as a search query at a search engine; identifying search results for the search query based on search image features extracted from the search image, the search results comprising item listings associated with item listing images; determining facets for the search image based on the search results, each facet comprising facet options; employing a machine learned model to identify an item listing image among the item listing images based on an item feature prominence score determined by the machine learned model, the item feature prominence score associated with an item feature of the item listing image and indicating a relative prominence of the item feature within the item listing images; and providing for display at the search engine the item listing image identified by the machine learned model as a first facet option included within a first set of facet options for a first facet determined for the search image.

Aspect 2: Aspect 1, further comprising: providing for display a second set of facet options for a second facet determined for the search image, the second facet comprising a second item listing image from the item listing images of the item listings; receiving a selection of the first facet option of the first facet; and changing the second item listing image of the second facet to another item listing image from the item listing images of the item listings based on the selection.

Aspect 3: Any of Aspects 1-2, wherein the second item listing image is changed based on the second item listing image not including an item listing feature associated with the first facet option, and wherein the another item listing image comprises the item listing feature associated with the first facet option.

Aspect 4: Any of Aspects 1-3, further comprising: determining an order for presenting the first set of facet options and the second set of facet options; displaying the first set of facet options and the second set of facet options in the order determined; and rearranging the order of the first set of facet options and the second set of facet options in response to receiving the selection of the first facet option.

Aspect 5: Any of Aspects 1-4, wherein the machine learned model determines the item feature prominence score based on a background size of the item listing image relative to an item feature size of the item feature.

Aspect 6: Any of Aspects 1-5, wherein the machine learned model determines the item feature prominence score based on an image recognition confidence for the item feature within the item listing image.

Aspect 7: Any of Aspects 1-6, wherein determining the facets based on the search results further comprises: identifying a set of facets available for the search results; and selecting a subset of facets from the set of facets based on a user interaction history, wherein the subset of facets is provided as the facets for the search image.

Aspect 8: Any of Aspects 1-7, further comprising employing the machine learned model to identify a second item listing image based on the second item listing image having a highest feature prominence score for a second item feature within a plurality of the item listing images; and providing for display at the search engine the second item listing image identified by the machine learned model as a second facet option included within a second set of facet options for a second facet determined for the search image.

Aspect 9: one or more computer storage media storing computer-readable instructions that when executed by a processor, cause the processor to perform operations for providing an item option, the operations comprising: receiving search results comprising item listing images associated with item listings in response to using a search image as a search query at a search engine; selecting a subset of the search results based on an image quality score determined by a background feature of each item listing image of the item listing images; employing a machine learned model to determine item feature prominence scores for the item listing images of the subset of the search results, the item feature prominence scores associated with an item feature of the item listing images of the subset, the item feature prominence scores indicating a relative prominence of the item feature within the item listing images; identifying an item listing image based on an item feature prominence score of the item listing image; and providing for display at the search engine the item listing image identified by the machine learned model as a first facet option included within a first set of facet options for a first facet determined for the search image.

Aspect 10: Aspect 9, further comprising: determining the first set of facet options for the first facet based on user histories of textual refinements of prior search queries at the search engine, the prior search queries associated with the first facet; and providing for display via a user interface the first facet option at the beginning of an ordered display of the first set of facet options, the ordered display being displayed based on a predetermined score of difficulty to illustrate by text.

Aspect 11: Any of Aspects 9-10, wherein the item feature prominence score of the item listing image is determined by the machine learned model, and the item listing image is selected based on the user interaction histories from the prior search queries at the search engine.

Aspect 12: Any of Aspects 9-11, wherein the item feature prominence score of the item listing image is determined by the machine learned model based on a spatial resolution and a number of independent pixel values per unit length of the item listing image.

Aspect 13: Any of Aspects 9-12, wherein the first set of facet options is displayed in an order, the order based on user purchase histories from the prior search queries at the search engine.

Aspect 14: Any of Aspects 9-13, further comprising: receiving a selection of the first facet option of the first facet; changing the item listing images of a second set of facet options that were not selected to modified item listing images based on the item listing images of the second set of facet options not having the item feature of the first facet option selected; and changing the ordered display of the second set of facet options having the modified item listing images based on the selection of the first facet option and the user histories of the textual refinements.

Aspect 15: A system for a visual facet search, the system comprising: at least one processor; and one or more computer storage media storing computer-readable instructions that when executed by the at least one processor, cause the at least one processor to perform operations comprising: provide a search image as a search query at a search engine; receive a search results page from the search engine in response to the search query, the search results page comprising a set of facet options for a facet, the set of facet options being presented as item listing images of item listings related to the search query, wherein each item listing image for each facet option of the set of facet options is selected based on an item feature prominence score determined by a machine learned model, wherein item feature prominence scores are associated with an item feature of the facet, and wherein the item listing image is selected based on the item feature prominence score being greater than other item feature prominence scores of other item listing images of the item listings; select a facet option from the set of facet options; and receive from the search engine a subset of the item listings, each item listing of the subset of the item listings comprising the item feature associated with the facet option selected

Aspect 16: Aspect 15, wherein the subset of the item listings are presented on a user interface in a order determined from user interaction histories from prior search queries at the search engine, and wherein the operations further comprise: receive an ordered view of the set of facet options each having an item listing image selected based on the item feature prominence score determined by the machine learned model and corresponding to the item feature; and receive a different item listing image for one facet option of the set of facet options after the selection of the facet option, the different item listing image determined by the machine learned model based on the different item listing image having the item feature of the facet option selected.

Aspect 17: Any of Aspects 15-16, wherein the different item listing image is determined by the machine learned model based on user purchase histories from the prior search queries at the search engine.

Aspect 18: Any of Aspects 15-17, wherein the item feature prominence score of an item listing image of the facet option selected is determined by the machine learned model based on a spatial resolution and a number of independent pixel values per unit length of the item listing image.

Aspect 19: Any of Aspects 15-18, wherein the machine learned model determines the item feature prominence score based on a comparison of a background size of the item listing image relative to the item feature within the item listing image.

Aspect 20: Any of Aspects 15-19, the operations further comprising, in response to selecting the facet option, receive an ordered view of the set of facet options, each facet option in the set of facet options having a different item listing image based on item feature prominence scores of the different item listing image determined by the machine learned model, the ordered view displayed based on user purchase histories from prior search queries at the search engine.

VISUAL FACET SEARCH ENGINE

Information

Publication Number

Date Filed

Date Published

Inventors

Original Assignees

CPC

International Classifications

Abstract

Description

Claims

CROSS-REFERENCE TO RELATED APPLICATIONS

Provisional Applications (1)