Natural language search queries enable a user to input a product search request using questions framed in natural conversational statements that identify a desired item (product) and search terms identifying a facet (attribute) of the desired item. This information can be parsed from the conversational statements using natural language processing (NLP) systems. However, current systems are frequently limited to queries having a single facet and/or a single filter action. This forces users to provide numerous separate queries through numerous conversational turns to eventually obtain a desired search result. This is inefficient and time-consuming for the user. Moreover, many systems are limited in the types of facets and filter actions which the system is capable of recognizing and applying. This results in sub-optimal search results which are frequently inaccurate and/or incomplete.
Some examples provide a system and method for more accurate and relevant conversational item search results using multi-facet actions. In some embodiments, a multi-facet filter manager obtains a single utterance of a user comprising a conversational search query. The conversational search query includes a plurality of words identifying an item and a plurality of search refinement terms associated with the item. The multi-facet filter manager extracts a plurality of facet-actions corresponding to the plurality of search refinement terms from the plurality of words. A facet includes an item attribute. A facet-action includes a set of one or more filter actions corresponding to the item attribute. The multi-facet filter manager applies a plurality of multi-facet filters to a plurality of items in a catalog. The plurality of multi-facet filters corresponds to the plurality of facet-actions. The items remaining after application of the plurality of multi-facet filters are candidate items. The multi-facet filter manager scores the plurality of candidate items using a set of scoring criteria. Scoring the candidate items includes generating an item relevance score for each candidate item. The multi-facet filter manager selects one or more candidate items from the plurality of candidate items having the highest scores indicating the item is more relevant as a search result than other lower scoring candidate items. The multi-facet filter manager generates a search result including the selected candidate item. The search result is presented to the user via a user interface.
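The extract-filter-score-select workflow described above can be sketched in outline. The names and data shapes below (FacetAction, apply_filters, the predicate table, and the term-overlap relevance score) are hypothetical stand-ins for illustration, not the claimed implementation.

```python
from dataclasses import dataclass

# Hypothetical facet-action record: an item attribute (entity), a filter
# action, and the role value extracted from the utterance.
@dataclass
class FacetAction:
    entity: str    # e.g. "price"
    action: str    # e.g. "lesser"
    value: object  # e.g. 50

PREDICATES = {
    "exact":  lambda v, ref: v == ref,
    "lesser": lambda v, ref: v is not None and v < ref,
}

def apply_filters(items, facet_actions):
    """Items remaining after every multi-facet filter are the candidates."""
    candidates = items
    for fa in facet_actions:
        pred = PREDICATES[fa.action]
        candidates = [it for it in candidates if pred(it.get(fa.entity), fa.value)]
    return candidates

def relevance(item, query_terms):
    """Toy scoring criterion: number of query terms found in the title."""
    title = item["title"].lower()
    return sum(term in title for term in query_terms)

# "brand A shirts with a price less than fifty dollars"
catalog = [
    {"title": "brand A shirt", "price": 30, "brand": "A"},
    {"title": "brand B shirt", "price": 20, "brand": "B"},
    {"title": "brand A jacket", "price": 80, "brand": "A"},
]
facet_actions = [FacetAction("brand", "exact", "A"),
                 FacetAction("price", "lesser", 50)]
candidates = apply_filters(catalog, facet_actions)
best = max(candidates, key=lambda it: relevance(it, ["shirt", "brand"]))
```

Both facet-actions come from one utterance, so a single conversational turn produces the fully refined candidate set.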
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
Corresponding reference characters indicate corresponding parts throughout the drawings.
A more detailed understanding can be obtained from the following description, presented by way of example, in conjunction with the accompanying drawings. The entities, connections, arrangements, and the like that are depicted in, and in connection with the various figures, are presented by way of example and not by way of limitation. As such, any and all statements or other indications as to what a particular figure depicts, what a particular element or entity in a particular figure is or has, and any and all similar statements, that can in isolation and out of context be read as absolute and therefore limiting, can only properly be read as being constructively preceded by a clause such as “In at least some examples, . . . ” For brevity and clarity of presentation, this implied leading clause is not repeated ad nauseam.
As the next generation of user interaction approaches, conversational artificial intelligence (AI) technology, such as chatbots, is penetrating many applications on cell phones, smart watches, smart speakers, TVs, computers, etc. Conversational AI technology is helping people in diverse situations, such as automated call centers, digital assistants, and online retail. Conversational shopping is one of the important attempts to deliver an improved online shopping experience to users, but it also faces significant technical challenges.
In the online retail area, conversational product search (CPS) assists users in locating desired items in an online catalog of items (products). The user can type search queries through texting or by entering search phrases in a search field. The user can also provide verbal search requests through human speech (talking), as if the user were shopping with a real store associate. The search refinement in CPS is based on the dialog itself. However, most CPS systems can only handle search refinement by a single item facet. A facet is an attribute of an item, such as, for example, brand, price, item review rating, discounts, etc. Search refinement can be challenging due to the presence of diverse item facets and the difficulty in identifying appropriate filter actions when multiple facets are present in a refinement query.
In addition, when multiple product facets are present, different refinement types are needed to identify, filter, and coordinate the logical relationships between different items and item facets. Some CPS systems can help customers perform multiple rounds of search refinement using multiple refining search queries, but are limited in dealing with multiple product facets given in a single utterance, which increases the number of undesirable items returned to the user and increases the number of conversation turns necessary if users want to refine the search results with multiple facets. Moreover, in many CPS systems, unordered entities like brand, material, and color are unavailable for search refinement, where refinement actions like “exact” and/or “exclude” are the only actions applicable.
To close this gap with existing approaches, the embodiments provide a multi-facet filter system including a language model to identify item facets and aspect-oriented entity role(s) to identify facet-actions. A facet-action is a filter-related action applied to remove irrelevant items during a search, such as, but not limited to, an exact action which exactly matches an item or item type, a greater than action, less than (lesser) action, a range, an exclude action, etc.
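The filter-related actions named above can each be realized as a predicate over an item attribute. The mapping below and the reference-value shapes (e.g., a (low, high) tuple for “range”) are illustrative assumptions, not the patented implementation.

```python
# One plausible realization of the named facet-actions as filter predicates.
FACET_ACTIONS = {
    "exact":   lambda value, ref: value == ref,
    "greater": lambda value, ref: value is not None and value > ref,
    "lesser":  lambda value, ref: value is not None and value < ref,
    "range":   lambda value, ref: value is not None and ref[0] <= value <= ref[1],
    "exclude": lambda value, ref: value != ref,
}

# "prices between $20 and $50" -> a range filter with reference (20, 50)
in_range = FACET_ACTIONS["range"](35, (20, 50))
# "everything except brand C" -> an exclude filter
keep = FACET_ACTIONS["exclude"]("brand A", "brand C")
```

Items with a missing attribute fail the match-oriented predicates here; whether such items should instead be retained is a design choice the sketch does not settle.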
Referring to the figures, examples of the disclosure enable a multi-facet filter manager for improved conversational item search refinement. In some examples, the multi-facet filter manager extracts multiple facets and multiple facet-actions from a single user utterance. The utterance includes a verbal utterance or written (text) utterance including an identification of an item and multiple search refinement terms. This enables the system to handle complex search queries with a single conversational turn (single sentence utterance) rather than requiring a separate conversational turn for each different search refinement term. This enables faster and more efficient natural language searches while reducing system processor usage and network bandwidth usage due to the fewer number of conversational turns.
In other embodiments, the system provides a plurality of multi-facet filters corresponding to multiple facet-actions extracted from a single utterance. The multi-facet filters are applied to items in a catalog to filter for candidate items which are most relevant to the conversational search query while minimizing processor and memory usage, as fewer conversational turns are required to filter for all search refinement terms desired by the user. This results in reduced system resource usage and improved accuracy and efficiency in generating search results.
The system, in some embodiments, recognizes multiple facet actions in a product search query. This enables the system to generate search results that are more accurate and further reduces the number of user actions while engaged in an online search.
The computing device is used in an unconventional manner by extracting multiple facets and facet-actions from a single utterance, which are used to apply multi-facet filters to items in a catalog of items. This allows more accurate and relevant conversational item search refinement while reducing overall system resource usage, thereby improving the functioning of the underlying computing device.
In still other embodiments, the search results generated via the multi-facet filter manager are generated and presented to a user for viewing via a user interface (UI) device. The search results include the highest scoring candidate items having the most relevance in view of the user's search query identified using fewer conversational turns. This enables increased speed in obtaining accurate search results, improved accuracy of search results requiring fewer additional search refinements, improved user efficiency via the UI, as well as improved user interaction performance.
Referring again to
In some examples, the computing device 102 has at least one processor 106 and a memory 108. The computing device 102, in other examples includes a user interface device 110.
The processor 106 includes any quantity of processing units and is programmed to execute the computer-executable instructions 104. The computer-executable instructions 104 are performed by the processor 106, by multiple processors within the computing device 102, or by a processor external to the computing device 102. In some examples, the processor 106 is programmed to execute instructions such as those illustrated in the figures (e.g.,
The computing device 102 further has one or more computer-readable media such as the memory 108. The memory 108 includes any quantity of media associated with or accessible by the computing device 102. The memory 108 in these examples is internal to the computing device 102 (as shown in
The memory 108 stores data, such as one or more applications. The applications, when executed by the processor 106, operate to perform functionality on the computing device 102. The applications can communicate with counterpart applications or services such as web services accessible via a network 112. In an example, the applications represent downloaded client-side applications that correspond to server-side services executing in a cloud.
In other examples, the user interface device 110 includes a graphics card for displaying data to the user and receiving data from the user. The user interface device 110 can also include computer-executable instructions (e.g., a driver) for operating the graphics card. Further, the user interface device 110 can include a display (e.g., a touch screen display or natural user interface) and/or computer-executable instructions (e.g., a driver) for operating the display. The user interface device 110 can also include one or more of the following to provide data to the user or receive data from the user: speakers, a sound card, a camera, a microphone, a vibration motor, one or more accelerometers, a BLUETOOTH® brand communication module, wireless broadband communication (LTE) module, global positioning system (GPS) hardware, and a photoreceptive light sensor. In a non-limiting example, the user inputs commands or manipulates data by moving the computing device 102 in one or more ways.
The network 112 is implemented by one or more physical network components, such as, but without limitation, routers, switches, network interface cards (NICs), and other network devices. The network 112 is any type of network for enabling communications with remote computing devices, such as, but not limited to, a local area network (LAN), a subnet, a wide area network (WAN), a wireless (Wi-Fi) network, or any other type of network. In this example, the network 112 is a WAN, such as the Internet. However, in other examples, the network 112 is a local or private LAN, such as an intranet.
In some examples, the system 100 optionally includes a communications interface device 114. The communications interface device 114 includes a network interface card and/or computer-executable instructions (e.g., a driver) for operating the network interface card. Communication between the computing device 102 and other devices, such as but not limited to a user device 116 and/or a cloud server 118, can occur using any protocol or mechanism over any wired or wireless connection. In some examples, the communications interface device 114 is operable with short range communication technologies such as by using near-field communication (NFC) tags.
The user device 116 represents any device executing computer-executable instructions. The user device 116 can be implemented as a mobile computing device, such as, but not limited to, a wearable computing device, a mobile telephone, laptop, tablet, computing pad, netbook, gaming device, and/or any other portable device. The user device 116 includes at least one processor and a memory. The user device 116 can also include a user interface device. In this example, a user provides a natural language search query in the form of an utterance 122.
The utterance is a conversational search query 142 provided as a verbal utterance or a written utterance in text form. The utterance 122 includes one or more word(s) 144 identifying an item 124 and search refinement terms 126. A search refinement term is a term intended to refine or restrict search results based on one or more item attributes. An attribute is a characteristic or feature of an item, such as, for example, a color, size, brand, price, etc.
The utterance 122 is provided via an application or other software component on the user device 116. The utterance 122 is transmitted to the computing device 102 and/or the cloud server 118 via the network for utilization by the multi-facet filter manager 120.
The cloud server 118 is a logical server providing services to the computing device 102 or other clients, such as, but not limited to, the user device 116. The cloud server 118 is hosted and/or delivered via the network 112. In some non-limiting examples, the cloud server 118 is associated with one or more physical servers in one or more data centers. In other examples, the cloud server 118 is associated with a distributed network of servers.
In this example, the cloud server 118 is associated with a merchant system hosting a plurality of items 156 in a catalog 154 of items. Each item in the catalog 154 is associated with one or more attributes 158 provided in a description of each item in the catalog 154. In this embodiment, the catalog 154 is stored in a database on the cloud server 118. In this example, the multi-facet filter manager 120 obtains data from the catalog 154 via the network 112. However, in other embodiments, the catalog 154 is stored in a database on the data storage device 128 on the computing device 102.
The system 100 can optionally include a data storage device 128 for storing data, such as, but not limited to, a set of one or more scoring criteria 130, item facets 132, candidate item(s) 138, and/or score(s) calculated for the candidate item(s) 138. The set of scoring criteria 130 includes criteria such as scoring metrics, scoring formulas for calculating a score, weighting for scores, etc.
The facets 132 include attributes of items extracted from the utterance 122. A facet 132 includes at least one entity 134 and at least one role 136. An entity is an attribute of a desired item described in a conversational search query 142 provided in the utterance. The role, in this example, is the value associated with the entity. For example, if a search refinement provided in the utterance states “shirts with a price less than fifty dollars,” the item is a shirt, the entity 134 is price, and the role 136 is fifty dollars.
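The entity/role structure for the shirt example can be pictured as follows. The regex below is a toy illustration only; a deployed system derives this structure from the trained NLP model, not from pattern matching.

```python
import re

# Toy extraction of the item/entity/role structure for the example query.
# A production system would use the trained NLP model, not a regex.
def parse_price_refinement(utterance):
    m = re.search(r"(\w+)s? with a price less than (\w+) dollars", utterance)
    if m is None:
        return None
    return {"item": m.group(1),
            "facet": {"entity": "price", "role": m.group(2)},
            "action": "lesser"}

parsed = parse_price_refinement("shirts with a price less than fifty dollars")
```

The role (“fifty”) would still need normalization to a numeric value before filtering; that step is omitted here.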
The candidate item(s) 138 includes items from the plurality of items 156 remaining after the multi-facet filter manager 120 applies multi-facet filters 146 corresponding to one or more facet-actions 148 to remove items from the plurality of items 156 which are not responsive to the conversational search query 142 associated with the utterance 122. The facet-actions 148 includes search refinement actions such as, but not limited to, “exact”, “equals”, “range”, “greater”, “lesser” and/or “exclude.”
An “exact” match indicates that the user intends to retrieve an item having an attribute that is an exact match. An exact match refers to an “equals” operation on the item facet (attribute) description. For example, a user utterance stating, “Show me only Nike blue t-shirts” specifies exact matches on the brand and color entities. The facet-action “range” includes a range of facet-role values. For example, if the user utterance includes the words “Show me men's t-shirts with prices between $20 and $50,” the user wants items having a price that is within the range from 20 to 50 dollars. The facet-action “greater” refers to filtering for values that are greater than the stated value in the search request. The facet-action “lesser” refers to filtering for items having an attribute value that is less than a stated value in the search query. Thus, the intended filter operations for “greater” and “lesser” are greater than and less than, respectively. For example, an utterance may include, “Show disposable face masks with three star rating or higher with price less than 10 dollars.” In another example, a conversational search request including the words “Search for brand “A” and brand “B” phones greater than 400 dollars” indicates the facet-actions “exact” and “greater.”
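The brand “A”/brand “B” example implies a combination rule across facet-actions. The sketch below assumes same-entity facet-actions are ORed (brand is “A” or “B”) while different entities are ANDed; this combination rule is one plausible reading chosen for illustration, not a statement of the patented logic.

```python
from collections import defaultdict

# Illustrative combination of extracted facet-actions: same-entity actions
# are ORed, different entities are ANDed (an assumed rule).
OPS = {"exact":   lambda v, r: v == r,
       "greater": lambda v, r: v is not None and v > r}

def matches(item, facet_actions):
    by_entity = defaultdict(list)
    for entity, action, ref in facet_actions:
        by_entity[entity].append((action, ref))
    # Every entity must satisfy at least one of its facet-actions.
    return all(any(OPS[a](item.get(e), r) for a, r in acts)
               for e, acts in by_entity.items())

phones = [{"brand": "A", "price": 500},
          {"brand": "C", "price": 900},
          {"brand": "B", "price": 300}]
# "Search for brand A and brand B phones greater than 400 dollars"
query = [("brand", "exact", "A"), ("brand", "exact", "B"),
         ("price", "greater", 400)]
hits = [p for p in phones if matches(p, query)]
```

Only the brand “A” phone survives: the brand “C” phone fails the ORed brand facet, and the brand “B” phone fails the price facet.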
The data storage device 128 can include one or more different types of data storage devices, such as, for example, one or more rotating disk drives, one or more solid state drives (SSDs), and/or any other type of data storage device. The data storage device 128 in some non-limiting examples includes a redundant array of independent disks (RAID) array. In some non-limiting examples, the data storage device(s) provide a shared data store accessible by two or more hosts in a cluster. For example, the data storage device may include a hard disk, a RAID array, a flash memory drive, a storage area network (SAN), or other data storage device. In other examples, the data storage device 128 includes a database.
The data storage device 128 in this example is included within the computing device 102, attached to the computing device, plugged into the computing device, or otherwise associated with the computing device 102. In other examples, the data storage device 128 includes a remote data storage accessed by the computing device via the network 112, such as a remote data storage device, a data storage in a remote data center, or a cloud storage.
The memory 108 in some examples stores one or more computer-executable components, such as, but not limited to, the multi-facet filter manager 120. The multi-facet filter manager 120 is a software component that analyzes an utterance to extract facet-actions and apply corresponding multi-facet filters to item data for identifying candidate items relevant to a conversational search query, such as the conversational search query 142. In some embodiments, the multi-facet filter manager 120 is a machine learning (ML) component that utilizes pattern recognition, modeling, deep learning, or other ML algorithms to analyze data associated with a conversational search query and/or database information to generate more relevant search results responsive to the conversational search query using fewer conversational turns.
The multi-facet filter manager 120, when executed by the processor 106 of the computing device 102, obtains a single utterance 122 of a user including the conversational search query 142. The conversational search query 142 includes a plurality of words identifying a desired item or type of item. The conversational search query 142 also identifies search refinement terms 126 associated with the desired item or type of item.
The multi-facet filter manager 120 extracts one or more facet-actions 148 corresponding to the search refinement terms 126 from the word(s) 144 in the utterance 122. A facet-action in the one or more facet-actions 148 includes a filter action associated with an entity-role of a facet. The multi-facet filter manager 120 applies one or more multi-facet filters 146 to the plurality of items 156 in the catalog 154. The multi-facet filter manager 120 identifies a plurality of candidate items from the plurality of items remaining after application of the plurality of multi-facet filters 146. The multi-facet filter manager 120 generates one or more score(s) 140 for each candidate item remaining after filtering of the items in the catalog. The scoring is performed using the set of scoring criteria 130.
In some embodiments, the multi-facet filter manager 120 selects a candidate item from the plurality of candidate items having a highest score and returns the highest scoring item in a search result 150. In other embodiments, the multi-facet filter manager 120 selects a threshold 152 number of highest scoring items from the plurality of candidate item(s) 138. The result 150 is presented to the user via a UI, such as, but not limited to, the user interface device 110.
The multi-facet filter manager 120 allows customers to find an item or items in a catalog of items by speaking in an interactive manner, as if the user were chatting with a person. The system 100 supports multiple-facet search refinement on a handheld mobile device, such as the user device 116, and/or via a webpage through voice/text. A webpage may be hosted on a webserver, such as, but not limited to, the cloud server 118. The system has the ability to parse and understand diverse actions (e.g., “exact”, “exclude”, “greater”, “lesser”, and “range”) on a product facet. The system 100 accurately returns relevant items from catalog-based filter values. The system is able to improve a customer's online shopping experience by reducing the number of search turns required to locate a desired item within a catalog of items available for purchase by the user.
In the example shown in
Given a conversational search query utterance and an extracted named entity (or facet or item attribute), there is a need to identify the appropriate facet action or filter operation for search refinement. For instance, in a user search query such as “blue shirts with more than 3 star rating,” with tagged entities “blue” (color) and “3 star rating” (rating), the respective facet actions are “exact” (i.e., the user is interested in an exact item match with blue color) and “greater” (i.e., the user is interested in ratings above 3). In such cases, the entity type holds additional information that represents its semantic role (for instance, greater than, lesser than, etc.); therefore, use of entity roles is beneficial.
In other embodiments, the multi-facet filter manager includes a trained natural language processing (NLP) model and API. The system 100 classifies entity roles by feeding the trained or pre-trained NLP model features into transformers coupled with a conditional random field (CRF) decoder layer at the end, as shown in
In the example of
A tokenizer 206 tokenizes the word(s) 204 in the utterance 202 into a plurality of bidirectional tokens 208. However, the embodiments are not limited to bidirectional tokens. In other embodiments, the utterance is tokenized into unidirectional tokens.
An extraction component 210 includes an encoder 212 and a decoder 214. The encoder 212 optionally includes an input encoder and/or a context encoder. The input encoder encodes basic elements (e.g., tokens, characters, and gazetteers) of raw text into distributed representations, which are further fed into the context encoder. A character is another unit that can be used to encode raw text instead of, or in addition to, tokens. The context encoder encodes the output of the input encoder into a semantic representation of context using unidirectional or bidirectional recurrent neural networks (RNNs) to encode sequence context. Pre-trained transformer models, such as generative pre-trained transformer (GPT) and bidirectional encoder representations from transformers (BERT) models, can be used to improve named entity recognition (NER) performance. RNN and CRF decoders are available for use as tag decoders of NER models. The decoder 214 predicts a sequence of entity labels and entity roles from a stack of encoder transformer output sequences corresponding to the input sequence.
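The CRF decoding step can be illustrated with a minimal Viterbi search over per-token tag scores. The tag set, emission scores, and (empty) transition table below are toy values invented for illustration; a real CRF layer learns these from data.

```python
# Minimal Viterbi decode, showing how a CRF-style decoder selects the best
# tag sequence from per-token emission scores plus transition scores.
def viterbi(emissions, transitions, tags):
    # emissions: per-token dicts {tag: score}; transitions: {(prev, cur): score}
    best = {t: (emissions[0][t], [t]) for t in tags}
    for em in emissions[1:]:
        step = {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[p][0] + transitions.get((p, cur), 0.0))
            score = best[prev][0] + transitions.get((prev, cur), 0.0) + em[cur]
            step[cur] = (score, best[prev][1] + [cur])
        best = step
    return max(best.values(), key=lambda v: v[0])[1]

tags = ["O", "B-brand"]
emissions = [{"B-brand": 2.0, "O": 0.0},   # token "LV"
             {"B-brand": 0.0, "O": 1.0}]   # token "jeans" (item, not a brand)
decoded = viterbi(emissions, {}, tags)
```

With a populated transition table, the decoder can also penalize invalid tag sequences, which is the main benefit of joint CRF decoding over per-token argmax.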
The extraction component 210 extracts one or more facet(s) 216 and/or facet-action(s) 218 from the utterance 202. The facet(s) 216 represents one or more attribute(s) 220 of the desired item or type of item identified in the utterance. The facet(s) includes one or more entities 222 and one or more corresponding role(s) 224 of those entities.
The facet-action(s) 218 includes one or more filter actions associated with the facet(s). In this example, an action 230 is associated with a specific entity 226 and role 228 of the entity 226. The system 100 enables named entity recognition (NER) to identify mentions of attributes and search refinement actions from text. The item name entities are named entities (NEs). The search refinement terms are identified and mapped to a specific entity.
A filter component 232 selects one or more multi-facet filter(s) 234 for application against a catalog of items. The filtered item(s) 238 are removed. The remaining items are candidate item(s) 236.
A scoring component 240 generates a plurality of scores 242 associated with the candidate item(s) 236. Each score 244 is generated using scoring criteria. In some embodiments, the scores are weighted in accordance with a weight 246 corresponding to one or more criteria in the scoring criteria. One or more of the candidate items with the highest score(s) 248 are selected by a selection component 250 for inclusion in search result(s) 260 returned to the user.
The selection component 250 optionally utilizes one or more threshold(s) 254 to select the one or more selected item(s) 252 from the candidate item(s) 236 based on the plurality of scores 242. In some embodiments, the selection component selects a threshold number of highest scoring or highest ranking candidate items for inclusion in the search result(s) 260. In other embodiments, the selection component 250 selects items having a score which exceeds a threshold score.
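The two selection strategies described above can be sketched directly; the helper names are illustrative.

```python
# Selection strategy 1: keep the top-N candidates by score.
def select_top_n(scored_items, n):
    return sorted(scored_items, key=lambda pair: pair[1], reverse=True)[:n]

# Selection strategy 2: keep all candidates whose score exceeds a threshold.
def select_above_threshold(scored_items, threshold):
    return [item for item, score in scored_items if score > threshold]

scored = [("item_a", 0.9), ("item_b", 0.4), ("item_c", 0.7)]
top_two = select_top_n(scored, 2)
above = select_above_threshold(scored, 0.5)
```

The two strategies can disagree: a top-N cut always returns N items (if available), while a threshold cut may return none when every candidate scores poorly.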
A notification engine 256 generates notifications which are transmitted to the user via a user device. In some embodiments, the notification engine 256 generates an error message 258 if the system fails to identify at least one item responsive to the conversational search query. In other examples, the notification engine 256 generates one or more result(s) 260 including one or more responsive item(s) 262 for transmission to the user. The result(s) 260 are optionally included in a result notification which is displayed to the user via a user interface on a user device, such as the user device 116 in
Turning now to
The tokenizer 206 tokenizes the text of the provided attribute of the incoming message. In this example, the tokenizer 206 generates a plurality of bidirectional tokens representing the phrase.
A featurizer 304 generates vectors representing the words in the phrase. The featurizer 304 computes sentence and sequence level feature representations for dense feature attributes. The featurizer optionally includes a convert model, a count vectorizer, and/or regex features. In an example, a count vectorizer creates a sequence of token-count features and 4-gram character features (combinations of ‘n’ grams).
An extraction component 210 extracts an item, entities, roles, and actions. In this example, the item (product) is “jeans”. The entity-roles are brand “LV” and price “30”. The action for the brand is “exact” as the user desires search results conforming to a specific brand. The action for the entity-role price is “lesser” as the user desires search results including items that are priced at less than thirty dollars. In this manner, the system provides for specification of the entity and entity role in each token from an utterance. The multi-facet filter system architecture optionally includes a stack of heavy transformers and CRF tagging for entity and entity role prediction, as shown in
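For an example query such as “show me LV jeans under 30,” the token-level output can be pictured as parallel entity and entity-role tag sequences, folded into facet-actions. The tag scheme and folding routine below are hypothetical illustrations of that structure.

```python
# Hypothetical token-level entity and entity-role tags for the example
# query, and a small routine that folds them into facet-actions.
tokens   = ["show", "me", "LV",      "jeans",  "under", "30"]
entities = ["O",    "O",  "B-brand", "B-item", "O",     "B-price"]
roles    = ["O",    "O",  "exact",   "O",      "O",     "lesser"]

def to_facet_actions(tokens, entities, roles):
    facet_actions = []
    for tok, ent, role in zip(tokens, entities, roles):
        # Each tagged attribute (other than the item itself) yields one
        # facet-action pairing the entity, its value, and the filter action.
        if ent.startswith("B-") and ent != "B-item":
            facet_actions.append({"entity": ent[2:], "value": tok,
                                  "action": role})
    return facet_actions

facet_actions = to_facet_actions(tokens, entities, roles)
```

Both facet-actions (“exact” on brand, “lesser” on price) come out of one pass over one utterance, matching the single-turn refinement described above.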
Input sequences are treated as sequences of tokens, where tokens are words/sub-words based on the language model tokenization. For dense features, the system extracts input sequence features from various pre-trained language models, such as GPT-2, BERT, DistilBERT, conversational representations from transformers (ConveRT) from PolyAI, and robustly optimized BERT (RoBERTa) through HuggingFace. Side sparse features (SSF) are one-hot token level encodings and multi-hot character n-gram (n<=4) features. The sparse features (SF) are passed to the feed-forward (F-F) layer whose weights are shared across the input sequence. The feed-forward neural network (FNN) layer output and dense sequence features are concatenated before passing to the transformer encoder layers.
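The character n-gram (n<=4) side of the sparse features can be illustrated with a small helper that enumerates the unique n-grams of a token; the multi-hot encoding step over a fixed n-gram vocabulary is omitted.

```python
# Enumerate the unique character n-grams (1 <= n <= max_n) of a token,
# the raw material for the multi-hot sparse character features.
def char_ngrams(token, max_n=4):
    return {token[i:i + n]
            for n in range(1, max_n + 1)
            for i in range(len(token) - n + 1)}

grams = char_ngrams("jeans")
```

For “jeans” this yields 5 unigrams, 4 bigrams, 3 trigrams, and 2 four-grams, all distinct, for 14 features.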
For bidirectional context encoding in the input sequences, the system uses stacked transformer layers with relative position attention. Each transformer encoder layer is composed of multi-headed attention layers and point-wise feed-forward layers. These sub-layers produce an output of dimension d_model = 25.
The number of attention heads is N_heads = 4, the number of units in the transformer is S = 256, and the number of transformer layers is L = 4.
The decoder for the named entity recognition task is a CRF, which jointly models the sequence of tagging decisions for an input sequence. CRF layers are ordered with entity recognition followed by entity role recognition. The decoder predicts a sequence of entity labels and entity roles from the transformer output sequences corresponding to the input sequences.
The mask modeling task is to predict random masked input tokens. In this example, a random 15% of all tokens in each sequence are selected. If the token “T” is selected, 70% of the time the “T” token is replaced with the [_MASK_] token, 10% of the time with a random token vector, and the remaining 20% of the time the “T” token is unchanged. This mitigates the mismatch between pre-training and fine-tuning, as the [_MASK_] token is not present during fine-tuning. The transformer output of the selected “T” token is passed to a dot-product loss. The model 400 is trained in a multi-task manner by optimizing the total loss as follows:
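The masking scheme above can be sketched as a sampling routine; the helper name and vocabulary are illustrative, and the percentages are those stated in the text.

```python
import random

# Sketch of the masking scheme: 15% of tokens are selected; a selected
# token is replaced with [_MASK_] 70% of the time, with a random vocabulary
# token 10% of the time, and left unchanged the remaining 20% of the time.
def mask_tokens(tokens, vocab, rng):
    out, targets = [], []
    for tok in tokens:
        if rng.random() < 0.15:          # select 15% of tokens
            targets.append(tok)          # prediction target for the loss
            r = rng.random()
            if r < 0.70:
                out.append("[_MASK_]")
            elif r < 0.80:
                out.append(rng.choice(vocab))
            else:
                out.append(tok)          # unchanged, mitigating the
        else:                            # pre-train/fine-tune mismatch
            out.append(tok)
            targets.append(None)
    return out, targets

masked, targets = mask_tokens(["show", "me", "blue", "shirts"],
                              ["red", "hats"], random.Random(0))
```

Only positions with a non-None target contribute to the dot-product loss.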
where the total loss is the sum of the individual task losses. For model training, in some embodiments, an optimizer with an initial learning rate of 0.001 is used. The batch size is increased as training progresses, from 64 to 256, as a source of regularization.
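The growing batch-size schedule can be sketched as follows. The text states only the endpoints (64 to 256); the power-of-two interpolation below is an assumed shape for illustration, not the schedule actually used.

```python
import math

# Assumed batch-size schedule: interpolate from 64 to 256 over training,
# snapped to powers of two. The exact schedule is not specified in the text.
def batch_size_at(step, total_steps, start=64, end=256):
    frac = step / max(total_steps - 1, 1)
    size = start + frac * (end - start)
    return int(min(end, max(start, 2 ** round(math.log2(size)))))

start_size = batch_size_at(0, 10)
final_size = batch_size_at(9, 10)
```

Growing the batch size late in training reduces gradient noise as the model converges, which is one common motivation for such schedules.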
The process begins by receiving an utterance including a conversational search query at 502. The utterance includes verbal utterances or written utterances, such as an utterance entered via text. The multi-facet filter manager extracts facet-actions at 504. The facet-actions are extracted from the utterance by an extractor, such as, but not limited to, the extraction component 210 in
While the operations illustrated in
Referring now to
The process begins by identifying an entity-role associated with an item using an utterance at 602. The entity-role is extracted from the utterance by an extraction component using an encoder and decoder, such as, but not limited to, the encoder 212 and decoder 214 in
While the operations illustrated in
A conversational search request is received at 702. An attempt is made to find an identification of an item and facets in the conversational search query using NLP at 704. A determination is made whether the item and facets are found at 706. In one example, the facets include price and product ratings. If not, the system sends an error message at 708. Default search results are returned to the user at 710. The process terminates thereafter.
If an item and facets are found in the conversational search request, a search API is called with the item and facets at 712. A web browser search API is called to fetch item details at 714. Results which have empty facets are filtered out at 716. If the facets are price and product rating, items whose price and/or rating information fails to conform to the facet values are removed (filtered out). The details of the remaining item(s) (candidate items) are sent at 718. The candidate items are included in a search result notification or other search result page output to the user. The process terminates thereafter.
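The two filtering steps above (dropping items with empty facets, then dropping items whose facet values fail the constraints) can be sketched as one pass. This is an illustrative sketch, not the system's implementation; the dictionary keys and parameter names are invented for the example.

```python
def filter_candidates(items, max_price=None, min_rating=None):
    """Drop items whose price/rating facets are empty, then drop items
    whose facet values fail the requested constraints. The survivors
    are the candidate items."""
    candidates = []
    for item in items:
        price, rating = item.get("price"), item.get("rating")
        if price is None or rating is None:
            continue  # empty facet -> filtered out at 716
        if max_price is not None and price >= max_price:
            continue  # fails the price facet value
        if min_rating is not None and rating < min_rating:
            continue  # fails the rating facet value
        candidates.append(item)
    return candidates
```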
While the operations illustrated in
The process begins by performing an item search using multi-facet actions extracted from a single utterance at 802. The multi-facet actions are extracted by a multi-facet filter component, such as the multi-facet filter manager 120 in
While the operations illustrated in
Referring now to
For evaluation of the multi-facet filter manager, a conversational search multiple factor analysis (MFA) data set consisting of two-hundred and three thousand conversational search utterances from voice shopping was created. This dataset has a training subset with one-hundred eighty-three thousand utterances and a test subset with twenty-three thousand utterances. The utterance in each data example has a random number (between 1 and 3) of item attributes, with refinement actions for each attribute, along with the product. Follow-up conversational search utterances may not contain item (product) identifying information. This data is generated from ten-thousand seed voice-based search refinement utterance variations, using synonym-tuple based data augmentation of the most popular items from various departments, like home, clothing, and household essentials, and item attributes such as brand, color, savings, material, price, and rating. The test data was created with unseen items and attributes, also from the same categories.
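The synonym-tuple augmentation described above can be sketched as a cross-product expansion of a seed template. The slot names and synonym lists here are invented for illustration; the actual seed utterances and synonym tuples are part of the dataset construction, not shown in the source.

```python
import itertools

def augment(seed_template, synonym_tuples):
    """Expand one seed utterance template into many variants by
    substituting each slot with every synonym in its tuple."""
    slots = sorted(synonym_tuples)
    utterances = []
    for combo in itertools.product(*(synonym_tuples[s] for s in slots)):
        utterances.append(seed_template.format(**dict(zip(slots, combo))))
    return utterances
```

Starting from ten-thousand seed variations, this style of expansion readily yields a corpus on the scale of the two-hundred-thousand-utterance dataset described.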
Table 900 in
In some embodiments, a system is provided for multi-facet product search from dialog. The entity actions are used to extract facet actions from item search queries in the e-commerce domain, achieving reasonably good F1 scores (96%). Several factors, such as the demarcation of entity spans within the utterance, the use of language models for modeling, and the availability of entity type information to the system, are shown to positively influence the facet-action predictions.
In this example, the search results include two candidate items which are responsive to a user's conversational search query. If the user selects one of the two items, the selected item is placed into the user's shopping cart for purchase.
A cloud platform as a service (PaaS) 1210 optionally hosts the multi-facet search service API 1208. The multi-facet search system 1212 stores data in a data storage 1213 and/or retrieves data from the data storage 1213. The data storage 1213 is a device for storing data, such as, but not limited to, the data storage device 128 in
The cloud computing platform 1202 optionally includes a developer's console 1220 that is used to access, update, and utilize a search platform API 1216 and/or a cloud messaging 1222 service. A training manager 1218 enables training of natural language processing ML models using customized training data. A database 1224 stores data, such as training data 1226 and/or configuration data 1228. The training data 1226 includes annotated training data sets, and the configurations 1228 are used for training the ML models to perform the multi-facet filtering.
In some embodiments, the multi-facet filter system enables conversational search recognition of named entities (e.g., products and their attributes), entity roles, and their actions, as well as enabling backend search to generate bidirectional tokenized query strings from the identified item and its facet-actions (attribute and refinement actions).
The system, in some examples, is a conversational product search system used for searching products with facets and facet actions (to help customers find items through talking or texting). The system parses natural language (spoken words and text) using an NLP ML model(s) to convert words into products and product filter actions. The system handles multiple product filter utterances without requiring conjunctions or punctuation. The machine learning model includes a featurizer, a transformer(s), and an extractor to identify products, product attributes, and actions. The system refines search queries using multiple product facets. The system identifies product facets and aspect-oriented entity roles to identify facet actions in search queries.
In an example scenario, a two-turn sample voice search conversation parsed by the multi-facet filter system is: “women's watches with four plus star rating and price less than 100 dollars not from brand ‘A’,” with an additional refinement utterance of “add price below 150 too.” In this example, the first utterance is analyzed to extract the item “women's watches,” the facets “rating” and “price,” and the facet-actions “exact,” “greater,” and “lesser.” The follow-up phrase omits the item “watches” and adds the facet “price” and facet-action “lesser.”
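As a toy illustration of mapping refinement phrasing to facet-actions, a rule-based sketch over the first utterance might look as follows. The patterns, facet names, and action labels (including “exclude” for the “not from brand” phrasing) are invented for this example; the actual system uses the trained NLP model rather than regular expressions.

```python
import re

# Hypothetical patterns mapping refinement phrasing to (facet, action).
PATTERNS = [
    (re.compile(r"(\d+)\s*plus star"), ("rating", "greater")),
    (re.compile(r"price (?:less than|below) (\d+)"), ("price", "lesser")),
    (re.compile(r"price (?:more than|above) (\d+)"), ("price", "greater")),
    (re.compile(r"not from brand \W?(\w+)"), ("brand", "exclude")),
]

def extract_facet_actions(utterance):
    """Return (facet, action, value) triples found in one utterance."""
    found = []
    for pattern, (facet, action) in PATTERNS:
        for match in pattern.finditer(utterance.lower()):
            found.append((facet, action, match.group(1)))
    return found
```

The point of the sketch is that all three facet-actions come out of a single utterance in a single conversational turn, with no conjunction or punctuation requirements.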
In another example, the multi-facet filter system analyzes the phrase “show disposable face masks with three star rating or higher with price less than ten.” In this example, the system extracts the item “disposable face masks” with the facets “rating” and “price.” The facet-actions include “greater” and “lesser.” The multiple facets and facet-actions are extracted from a single utterance in a single conversational turn. The system can return relevant items in a first set of search results based on a single search query, thereby improving the efficiency and accuracy of the search results returned to the user while reducing the system resources consumed in generating search results.
In some embodiments, the system includes an AI/ML NLP model that identifies both product facets (desired attributes) and aspect-oriented entity roles to identify facet actions in natural language conversational search queries. The model is a non-generative AI model trained using customized training data for the model. The system makes use of both the multi-facet aspects of the search engine and the natural-language-based criteria, which are introduced on the fly. The system extracts the facet-related terms and uses the search engine to filter the data based upon those criteria.
The system, in other embodiments, performs multiple-facet search refinement of a search query input as either a spoken or written (text) utterance. The search query is a query for searching products in a catalog. The search refinement is performed using multi-facet filter actions identified in the utterance. The system evaluates the non-facet portions and uses natural language processes to interpret the query, and applies this to the previously filtered data. The system also assesses the similarity of words, such as colors used in the search (e.g., a blue shirt). The system parses natural language (spoken words and text) search queries (utterances) using one or more NLP ML models to convert words into products and product filter actions without requiring conjunctions or punctuation.
In other embodiments, the multi-facet filter manager includes a featurizer, transformers (convert to bidirectional tokens), and extractor (encoder and decoder to extract entity and entity-role and corresponding entity-role actions from query) to identify product, product attributes (facets) and actions. A trained ML model is used to identify the facets and facet actions from the query string. The system identifies candidate items from the catalog using the identified facets and filter actions. Results are scored.
In some embodiments, a score is given to each entry from the search engine and another score is given to each entry from the NLP portion. These two sub-scores are weighted and summed to provide a single score for each result. The weights are configurable by the user depending upon their preference. The addition of the NLP scoring improves the search results. A higher score indicates a more relevant candidate. The system returns more accurate search results which include more relevant items from a catalog based on filter values, reducing the number of search returns and the number of user actions required to obtain search results responsive to the natural language search query.
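The weighted-sum combination of the two sub-scores can be sketched directly. The default weights below are arbitrary illustrations of the user-configurable preference, not values from the source.

```python
def combine_scores(engine_score, nlp_score, w_engine=0.6, w_nlp=0.4):
    """Weighted sum of the search-engine sub-score and the NLP sub-score.
    The weights are user-configurable; 0.6/0.4 is an illustrative default."""
    return w_engine * engine_score + w_nlp * nlp_score

def rank(candidates, **weights):
    """Sort candidates (dicts with 'engine' and 'nlp' sub-scores) so the
    highest-scoring, most relevant item comes first."""
    return sorted(
        candidates,
        key=lambda c: combine_scores(c["engine"], c["nlp"], **weights),
        reverse=True,
    )
```

Because the combination is a single linear blend, shifting weight between the engine and NLP sub-scores can change which candidate ranks first, which is why exposing the weights as a user preference matters.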
Alternatively, or in addition to the other examples described herein, examples include any combination of the following:
At least a portion of the functionality of the various elements in
In some examples, the operations illustrated in
In other examples, a computer readable medium having instructions recorded thereon which when executed by a computer device cause the computer device to cooperate in performing a method of multi-facet item search refinement, the method comprising receiving a single utterance of a user comprising a conversational search query, the single utterance including a plurality of words identifying an item and a plurality of search refinement terms associated with the item; extracting a plurality of facet-actions corresponding to the plurality of search refinement terms from the plurality of words, a facet comprising an item attribute and a facet-action comprising an action corresponding to the item attribute; identifying a plurality of multi-facet filters corresponding to the plurality of facet-actions; applying the plurality of multi-facet filters to a plurality of items in a catalog, the plurality of multi-facet filters removing items which are irrelevant to the conversational search query, wherein a plurality of candidate items remain after application of the plurality of multi-facet filters; scoring the plurality of candidate items based on a degree of responsiveness of each candidate item to the conversational search query; selecting a set of items from the plurality of candidate items having a score exceeding a threshold score; and generating a search result comprising the selected set of items, wherein the search result is presented to the user via a user interface device.
While the aspects of the disclosure have been described in terms of various examples with their associated operations, a person skilled in the art would appreciate that a combination of operations from any number of different examples is also within scope of the aspects of the disclosure.
The term “Wi-Fi” as used herein refers, in some examples, to a wireless local area network using high frequency radio signals for the transmission of data. The term “BLUETOOTH®” as used herein refers, in some examples, to a wireless technology standard for exchanging data over short distances using short wavelength radio transmission. The term “NFC” as used herein refers, in some examples, to a short-range high frequency wireless communication technology for the exchange of data over short distances.
Exemplary computer-readable media include flash memory drives, digital versatile discs (DVDs), compact discs (CDs), floppy disks, and tape cassettes. By way of example and not limitation, computer-readable media comprise computer storage media and communication media. Computer storage media include volatile and nonvolatile, removable, and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules and the like. Computer storage media are tangible and mutually exclusive to communication media. Computer storage media are implemented in hardware and exclude carrier waves and propagated signals. Computer storage media for purposes of this disclosure are not signals per se. Exemplary computer storage media include hard disks, flash drives, and other solid-state memory. In contrast, communication media typically embody computer-readable instructions, data structures, program modules, or the like, in a modulated data signal such as a carrier wave or other transport mechanism and include any information delivery media.
Although described in connection with an exemplary computing system environment, examples of the disclosure are capable of implementation with numerous other special purpose computing system environments, configurations, or devices.
Examples of well-known computing systems, environments, and/or configurations that can be suitable for use with aspects of the disclosure include, but are not limited to, mobile computing devices, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, gaming consoles, microprocessor-based systems, set top boxes, programmable consumer electronics, mobile telephones, mobile computing and/or communication devices in wearable or accessory form factors (e.g., watches, glasses, headsets, or earphones), network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. Such systems or devices can accept input from the user in any way, including from input devices such as a keyboard or pointing device, via gesture input, proximity input (such as by hovering), and/or via voice input.
Examples of the disclosure can be described in the general context of computer-executable instructions, such as program modules, executed by one or more computers or other devices in software, firmware, hardware, or a combination thereof. The computer-executable instructions can be organized into one or more computer-executable components or modules. Generally, program modules include, but are not limited to, routines, programs, objects, components, and data structures that perform tasks or implement abstract data types. Aspects of the disclosure can be implemented with any number and organization of such components or modules. For example, aspects of the disclosure are not limited to the specific computer-executable instructions, or the specific components or modules illustrated in the figures and described herein. Other examples of the disclosure can include different computer-executable instructions or components having more functionality or less functionality than illustrated and described herein.
In examples involving a general-purpose computer, aspects of the disclosure transform the general-purpose computer into a special-purpose computing device when configured to execute the instructions described herein.
The examples illustrated and described herein as well as examples not specifically described herein but within the scope of aspects of the disclosure constitute exemplary means for multi-facet conversational item search. For example, the elements illustrated in
Other non-limiting examples provide one or more computer storage devices having first computer-executable instructions stored thereon for providing multi-facet conversational item search with improved search refinement. When executed by a computer, the computer performs operations including to obtain a single utterance of a user comprising a conversational search query, the conversational search query comprising a plurality of words identifying an item and a plurality of search refinement terms associated with the item; extract a plurality of facet-actions corresponding to the plurality of search refinement terms from the plurality of words, a facet comprising an item attribute, wherein a facet-action comprises a set of filter actions corresponding to the item attribute; apply a plurality of multi-facet filters to a plurality of items in a catalog, the plurality of multi-facet filters corresponding to the plurality of facet-actions; identify a plurality of candidate items from the plurality of items remaining after application of the plurality of multi-facet filters; score the plurality of candidate items using a set of scoring criteria; select a candidate item from the plurality of candidate items having a highest score; and generate a search result comprising the selected candidate item, wherein the search result is presented to the user via a user interface.
The order of execution or performance of the operations in examples of the disclosure illustrated and described herein is not essential, unless otherwise specified. That is, the operations can be performed in any order, unless otherwise specified, and examples of the disclosure can include additional or fewer operations than those disclosed herein. For example, it is contemplated that executing or performing an operation before, contemporaneously with, or after another operation is within the scope of aspects of the disclosure.
The indefinite articles “a” and “an,” as used in the specification and in the claims, unless clearly indicated to the contrary, should be understood to mean “at least one.” The phrase “and/or,” as used in the specification and in the claims, should be understood to mean “either or both” of the elements so conjoined, i.e., elements that are conjunctively present in some cases and disjunctively present in other cases. Multiple elements listed with “and/or” should be construed in the same fashion, i.e., “one or more” of the elements so conjoined. Other elements may optionally be present other than the elements specifically identified by the “and/or” clause, whether related or unrelated to those elements specifically identified. Thus, as a non-limiting example, a reference to “A and/or B”, when used in conjunction with open-ended language such as “comprising” can refer, in one embodiment, to A only (optionally including elements other than B); in another embodiment, to B only (optionally including elements other than A); in yet another embodiment, to both A and B (optionally including other elements); etc.
As used in the specification and in the claims, “or” should be understood to have the same meaning as “and/or” as defined above. For example, when separating items in a list, “or” or “and/or” shall be interpreted as being inclusive, i.e., the inclusion of at least one, but also including more than one of a number or list of elements, and, optionally, additional unlisted items. Only terms clearly indicated to the contrary, such as “only one of” or “exactly one of,” or, when used in the claims, “consisting of,” will refer to the inclusion of exactly one element of a number or list of elements. In general, the term “or” as used shall only be interpreted as indicating exclusive alternatives (i.e., “one or the other but not both”) when preceded by terms of exclusivity, such as “either” “one of” “only one of” or “exactly one of.” “Consisting essentially of,” when used in the claims, shall have its ordinary meaning as used in the field of patent law.
As used in the specification and in the claims, the phrase “at least one,” in reference to a list of one or more elements, should be understood to mean at least one element selected from any one or more of the elements in the list of elements, but not necessarily including at least one of each and every element specifically listed within the list of elements and not excluding any combinations of elements in the list of elements. This definition also allows that elements may optionally be present other than the elements specifically identified within the list of elements to which the phrase “at least one” refers, whether related or unrelated to those elements specifically identified. Thus, as a non-limiting example, “at least one of A and B” (or, equivalently, “at least one of A or B,” or, equivalently “at least one of A and/or B”) can refer, in one embodiment, to at least one, optionally including more than one, A, with no B present (and optionally including elements other than B); in another embodiment, to at least one, optionally including more than one, B, with no A present (and optionally including elements other than A); in yet another embodiment, to at least one, optionally including more than one, A, and at least one, optionally including more than one, B (and optionally including other elements); etc.
The use of “including,” “comprising,” “having,” “containing,” “involving,” and variations thereof, is meant to encompass the items listed thereafter and additional items.
Use of ordinal terms such as “first,” “second,” “third,” etc., in the claims to modify a claim element does not by itself connote any priority, precedence, or order of one claim element over another or the temporal order in which acts of a method are performed. Ordinal terms are used merely as labels to distinguish one claim element having a certain name from another element having a same name (but for use of the ordinal term), to distinguish the claim elements.
Having described aspects of the disclosure in detail, it will be apparent that modifications and variations are possible without departing from the scope of aspects of the disclosure as defined in the appended claims. As various changes could be made in the above constructions, products, and methods without departing from the scope of aspects of the disclosure, it is intended that all matter contained in the above description and shown in the accompanying drawings shall be interpreted as illustrative and not in a limiting sense.