The application relates generally to Natural Language Processing (NLP) and, more particularly, to product searching based on NLP.
Nowadays, consumers heavily rely on computer technology to obtain information about brands and products. Current solutions that allow users to obtain information about a given brand or its products are generally based on full-text search. For example, on a website, a customer can type a product name in a search box meant for that purpose. If the product being searched for requires filtering (e.g., by model or age range), this information is provided in a separate context (e.g., in a filter section). However, this approach is not suitable in the context of chat sessions (i.e. real-time conversations between users and a chatbot). Indeed, in such a context, it becomes critical to readily determine whether a user's sentence relates to a product search (even though the context may not suggest it), and what the product is. While some solutions based on Natural Language Processing (NLP) exist, these solutions are resource-intensive and lack reliability when faced with out-of-context searches, a large number of products, products unrelated to one another, or newly available products.
Therefore, improvements are needed.
In one aspect, there is provided a computer-implemented method for product searching based on Natural Language Processing (NLP). The method comprises receiving, via a conversational user interface, user-inputted natural language data, the natural language data indicative of a user query, performing, using a classification engine, a classification procedure to identify, from the natural language data, a user intent and one or more products of interest associated with the user query, performing, using a product search engine, a product search, comprising querying a product database based on an outcome of the classification procedure, and triggering, based on an outcome of the product search, at least one action to be performed by a virtual agent in response to the natural language data.
In some embodiments, the method further comprises identifying an input language associated with the natural language data and performing the product search based on the input language as identified.
In some embodiments, the method further comprises assigning a confidence score to the input language as identified and setting, based on at least the confidence score, a new response language to be used in at least one response message to be output by the virtual agent, via the conversational user interface, in response to the natural language data.
In some embodiments, setting the new response language comprises comparing the confidence score to a threshold, when the confidence score is below the threshold, setting a current response language currently used by the virtual agent as the new response language, and when the confidence score is above the threshold, setting a language different from the current response language as the new response language.
In some embodiments, performing the classification procedure comprises simultaneously or sequentially performing an intent classification procedure to identify the user intent and an entity classification procedure to identify the one or more products of interest.
In some embodiments, performing the intent classification procedure comprises applying a binary classifier to the natural language data and outputting a Boolean result indicative of whether the natural language data relates to a query for the one or more products of interest.
In some embodiments, performing the product search comprises extracting one or more relevant words and removing one or more non-relevant words from the natural language data, and performing a search for the one or more products using the one or more relevant words.
In some embodiments, the method further comprises assigning a confidence score to an outcome of the search for the one or more products, and comparing the confidence score to a threshold.
In some embodiments, triggering the at least one action at the virtual agent comprises, when the confidence score is above the threshold, generating one or more signals to cause the outcome of the search to be presented via the conversational user interface, and when the confidence score is below the threshold, applying a character n-grams based technique to improve the confidence score.
In some embodiments, triggering the at least one action at the virtual agent comprises, when the confidence score is above the threshold, generating one or more signals to cause the outcome of the search to be presented via the conversational user interface, and when the confidence score is below the threshold, generating one or more signals to cause a choice of one or more relevant products to be presented via the conversational user interface to improve the confidence score.
In some embodiments, triggering the at least one action at the virtual agent comprises, when the confidence score is above the threshold, generating one or more signals to cause the outcome of the search to be presented via the conversational user interface, and when the confidence score is below the threshold, generating one or more signals to cause one or more questions related to the one or more products to be presented via the conversational user interface to improve the confidence score.
In some embodiments, performing the product search further comprises pre-processing the natural language data.
In another aspect, there is provided a system for product searching based on Natural Language Processing (NLP). The system comprises a processing unit and a non-transitory computer-readable memory having stored thereon program instructions executable by the processing unit for receiving, via a conversational user interface, user-inputted natural language data, the natural language data indicative of a user query, performing a classification procedure to identify, from the natural language data, a user intent and one or more products of interest associated with the user query, performing a product search, comprising querying a product database based on an outcome of the classification procedure, and triggering, based on an outcome of the product search, at least one action to be performed by a virtual agent in response to the natural language data.
In some embodiments, the program instructions are executable by the processing unit for identifying an input language associated with the natural language data and for performing the product search based on the input language as identified.
In some embodiments, the program instructions are executable by the processing unit for assigning a confidence score to the input language as identified and for setting, based on at least the confidence score, a new response language to be used in at least one response message to be output by the virtual agent, via the conversational user interface, in response to the natural language data.
In some embodiments, the program instructions are executable by the processing unit for setting the new response language comprising comparing the confidence score to a threshold, when the confidence score is below the threshold, setting a current response language currently used by the virtual agent as the new response language, and when the confidence score is above the threshold, setting a language different from the current response language as the new response language.
In some embodiments, the program instructions are executable by the processing unit for performing the classification procedure comprising simultaneously or sequentially performing an intent classification procedure to identify the user intent and an entity classification procedure to identify the one or more products of interest.
In some embodiments, the program instructions are executable by the processing unit for performing the intent classification procedure comprising applying a binary classifier to the natural language data and outputting a Boolean result indicative of whether the natural language data relates to a query for the one or more products of interest.
In some embodiments, the program instructions are executable by the processing unit for performing the product search comprising extracting one or more relevant words and removing one or more non-relevant words from the natural language data, and performing a search for the one or more products using the one or more relevant words.
In some embodiments, the program instructions are further executable by the processing unit for assigning a confidence score to an outcome of the search for the one or more products, comparing the confidence score to a threshold, when the confidence score is above the threshold, generating one or more signals to cause the outcome of the search to be presented via the conversational user interface, and when the confidence score is below the threshold, one of: applying a character n-grams based technique to improve the confidence score, generating one or more signals to cause a choice of one or more relevant products to be presented via the conversational user interface to improve the confidence score, and generating one or more signals to cause one or more questions related to the one or more products to be presented via the conversational user interface to improve the confidence score.
In a further aspect, there is provided a non-transitory computer-readable medium having stored thereon program instructions executable by a processing unit for receiving, via a conversational user interface, user-inputted natural language data, the natural language data indicative of a user query, performing a classification procedure to identify, from the natural language data, a user intent and one or more products of interest associated with the user query, performing a product search, comprising querying a product database based on an outcome of the classification procedure, and triggering, based on an outcome of the product search, at least one action to be performed by a virtual agent in response to the natural language data.
Many further features and combinations thereof concerning embodiments described herein will appear to those skilled in the art following a reading of the instant disclosure.
Reference is now made to the accompanying figures in which:
It will be noted that throughout the appended drawings, like features are identified by like reference numerals.
Embodiments described herein relate to Natural Language Processing (NLP), which is a field of computer science and artificial intelligence (AI) for natural language understanding, speech recognition, natural-language generation, and the like. As used herein, the term “virtual agent” may refer to one or more computing components configured to automatically converse (i.e. define the conversation flow) with a human (referred to herein as a “user” or “customer”) in real-time using text, speech, or a combination thereof, based on NLP. In one embodiment, the virtual agent is integrated with a chatroom to receive queries from one or more users. In particular, the virtual agent may implement an interactive “chatbot” configured to interact with users in real-time (e.g., during a chat session) via an interface application (also referred to herein as a “conversational user interface” or a “virtual agent interface”), which is associated with an electronic messaging platform (or other suitable dialogue system) and is provided on a user device. As used herein, the term “real-time” should be understood to refer to a conversation or interaction between a virtual agent (or chatbot) and user(s) where messages are exchanged within a timeframe of about one (1) to two (2) seconds.
As will be described herein with reference to
The virtual agent is configured to provide, via the interface application, a real-time output related to the information requested by the users. In this manner, a dialogue is established in real-time between the virtual agent and each user. In one embodiment, the interface application can be installed on the user device to display an interface of visual elements representative of the current chat session. Any suitable visual elements may apply. Users may be registered and authenticated (e.g., using a login, unique identifier, and password) prior to being provided access to the chat sessions and the interface application. In particular, a unique identifier (e.g., a username) may be assigned to each user in order to uniquely identify the user (and accordingly distinguish users from one another) during the chat session. It should however be understood that, in some embodiments, access to the chat sessions and interface application may be granted without prior authentication.
In one embodiment, the virtual agent is configured to converse with users using text. As will be described further below, the systems and methods described herein use machine learning (ML) and/or AI techniques (i.e. an NLP model) to determine whether a user's query (i.e. the input text, also referred to herein as a “natural language input”, provided by the user as input during the conversation with the virtual agent) is a query for product(s), i.e. relates to a product search. The determination is illustratively performed automatically, in real-time (i.e. during the chat session), and accurately, even when the context of the user's message does not suggest that the query relates to a product search. The systems and methods described herein are also configured to use the NLP model to computationally identify product(s) the user is inquiring about (referred to herein as an “entity classification”) as well as to computationally identify the user's intent (referred to herein as an “intent classification”) based on the input text. At least one suitable action to be performed next (e.g., which dialogue the virtual agent should present to the user) is then determined accordingly, in real-time. The user then makes a selection in response to the action.
In this manner, the systems and methods described herein may advantageously obtain useful information from the text input by users during a conversation with the virtual agent and transform the input text into structured information in order to build the conversation flow between the user and the virtual agent in real-time. It should be understood that the systems and methods described herein may be applicable to a broad range of application domains where it is desirable to promptly understand users' queries and to answer such queries in real-time, meaningfully and efficiently. The systems and methods described herein may make it possible to manage multi-domain conversations more efficiently and using fewer computing resources than with conventional technologies. For example, the same conversational user interface may be used for both sales domain conversations (e.g., user queries related to product searches) and support domain conversations (e.g., user queries about store locations, shipping policies, or the like). Moreover, the systems and methods described herein may be applied to a broad range of products (e.g., sports equipment, electronics, appliances, etc.) and may, as such, make it possible to recognize a wide variety of product search queries. By allowing for more efficient processing of the user input, which can in turn reduce computer processing cycles, power consumption, memory consumption, and processing bandwidth usage, the systems and methods described herein can enhance the operation of computer systems (e.g., virtual agents) used to interface with users using NLP.
Referring to
In one embodiment, the input text comprises short sentences that include a number of keywords and full sentences used by the method 100 to proceed with product searching. In particular, and as described elsewhere, the method 100 applies (at steps 104 and 106) NLP inference on multiple models (built from custom trained and/or pre-trained models using ML training algorithms, data augmentation, and various datasets) to obtain structured information. The NLP response is then used (at step 108) by a dialog manager to make a decision on the virtual agent's next response, which can include an action to fetch additional information from an internal or external product database. A rich message (comprising, for example, a carousel interface comprising a plurality of frames that can be shuffled to display different content, image(s), price(s), title(s), etc.) is then generated and sent back to the user.
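By way of illustration only, the following Python sketch shows how a dialog manager of the kind described above might turn the structured NLP response into the virtual agent's next message. The field names, product database format, and carousel payload shown here are assumptions introduced for this example and do not reflect a specific implementation.

```python
# Illustrative sketch of the dialog-manager decision described above.
# The NLP response fields ("is_product_search", "keywords") and the
# carousel payload format are assumptions, not an exact schema.

def build_next_response(nlp_result: dict, product_db: list[dict]) -> dict:
    """Decide the virtual agent's next message from the NLP output."""
    if not nlp_result.get("is_product_search", False):
        # Not a product search: fall back to a generic support dialogue.
        return {"type": "text", "text": "How else can I help you today?"}

    # Fetch matching products from the (internal or external) product database.
    keywords = set(nlp_result.get("keywords", []))
    matches = [p for p in product_db if keywords & set(p["tags"])]

    # Build a rich message: a carousel of frames (image, title, price).
    frames = [
        {"image": p["image_url"], "title": p["title"], "price": p["price"]}
        for p in matches
    ]
    return {"type": "carousel", "frames": frames}


if __name__ == "__main__":
    demo_db = [
        {"title": "Trail running shoe", "price": "120$", "image_url": "shoe.png",
         "tags": {"shoe", "running", "trail"}},
    ]
    nlp = {"is_product_search": True, "keywords": ["running", "shoe"]}
    print(build_next_response(nlp, demo_db))
```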
Still referring to
Referring now to
Still referring to
Step 204 may be achieved using any suitable technique. In one embodiment, when the input text comprises a single word, one or more dictionaries are built and the input text is correlated therewith to identify the language associated with the input text. If the word from the input text cannot be found in any of the dictionaries, step 204 returns an indication that the language of the input text is unknown.
In one embodiment, the systems and methods described herein are able to identify two languages, namely French and English, and, for this purpose, a French dictionary and an English dictionary may be used. Still, it should be understood that this is for illustrative purposes only and that other languages may apply.
When the input text comprises more than one word, a Naive Bayes technique (i.e. based on applying Bayes' theorem with strong independence assumptions between features) with character n-grams (i.e. contiguous sequences of n characters from the input text) is illustratively used to determine the language (e.g., English or French) of the input text. In one embodiment, the output of the Naive Bayes model is a floating point number having a value between zero (0) and one (1). Each language has a pre-determined number range associated therewith. If the output of the pre-trained model is in a range associated with a given language, the given language is identified at step 204 as being the language of the input text.
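As a non-limiting illustration, the following Python sketch shows one possible realization of the two-stage language identification described above, combining a dictionary lookup for single-word inputs with a Naive Bayes classifier over character n-grams (built here with scikit-learn on a toy corpus). The dictionaries, training data, and class-prediction shortcut are assumptions made for this example; the embodiment described above maps a floating-point score to per-language ranges, whereas the sketch simply predicts the most likely class.

```python
# Illustrative two-stage language identification: dictionary lookup for a
# single word, Naive Bayes over character n-grams for multi-word inputs.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

EN_WORDS = {"hello", "shoes", "price", "shipping"}          # toy English dictionary
FR_WORDS = {"bonjour", "chaussures", "prix", "livraison"}   # toy French dictionary

# Character n-gram Naive Bayes model trained on a toy corpus.
train_texts = ["do you have running shoes", "what is the price",
               "avez-vous des chaussures de course", "quel est le prix"]
train_labels = ["en", "en", "fr", "fr"]
nb_model = make_pipeline(
    CountVectorizer(analyzer="char_wb", ngram_range=(1, 3)),  # character n-grams
    MultinomialNB(),
)
nb_model.fit(train_texts, train_labels)

def identify_language(text: str) -> str:
    words = text.lower().split()
    if len(words) == 1:                       # single word: dictionary lookup
        if words[0] in EN_WORDS:
            return "en"
        if words[0] in FR_WORDS:
            return "fr"
        return "unknown"
    return nb_model.predict([text.lower()])[0]  # multi-word: Naive Bayes

print(identify_language("bonjour"))             # expected: fr
print(identify_language("blue running shoes"))  # expected: en (on this toy model)
```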
If the language of the input text cannot be readily identified using the dictionaries or the Naive Bayes model, it is determined that the language is neither French nor English. A pre-trained conversational language identification model may then be used to detect languages other than French and English. In one embodiment, the pre-trained conversational language identification model is configured to detect the language of the input text from a variety of (e.g., about one hundred) languages.
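Purely as an illustrative stand-in for such a pre-trained language identification model, the sketch below uses fastText's publicly available lid.176 model; whether this is the model contemplated above is an assumption, and the model file name and call pattern reflect the fastText library only.

```python
# Hedged sketch of the fallback step: if neither the dictionaries nor the
# Naive Bayes model yield a confident answer, query a pre-trained language
# identification model. fastText's public lid.176 model is used here as a
# stand-in assumption, not necessarily the model referred to above.
import fasttext  # pip install fasttext; download lid.176.ftz from fasttext.cc

lid_model = fasttext.load_model("lid.176.ftz")

def identify_other_language(text: str) -> tuple[str, float]:
    labels, probs = lid_model.predict(text.replace("\n", " "), k=1)
    return labels[0].replace("__label__", ""), float(probs[0])

# e.g. identify_other_language("¿Tienen zapatos para correr?") -> ("es", ~0.9)
```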
In one embodiment, the default language used by the virtual agent (referred to herein as the “response language”) is French and the systems and methods described herein are illustratively configured to detect messages received in any other language (e.g., in English) and to automatically determine the language (i.e. the response language) to be used for the next dialogue presented by the virtual agent (i.e. to update the language of the response message output by the virtual agent in real-time). For this purpose, when it is determined (at step 204) that the received input text is in a given language that differs from the default language (e.g., an English sentence is received as input from the user), it is assessed at step 206 whether the confidence in the language identification (performed at step 204) is above a predetermined confidence threshold.
In one embodiment, step 206 may comprise assessing whether the input text contains a number of words in the given language (identified at step 204) which is above a given threshold. If this is the case, it can be determined that the given language was properly identified at step 204, meaning that the confidence level in the language identification is above the confidence threshold. Since the given language identified at step 204 differs from the default language, the next step 208 is then to change the response language. In one embodiment, step 208 comprises changing the response language from the default language to the language of the input text identified at step 204 (e.g. setting the response language to English). Otherwise, if it is determined at step 206 that the input text contains a number of words in the given language which is below the given threshold, it can be determined that the given language was not properly identified at step 204 and that the confidence level in the language identification is below the confidence threshold. The next step 210 is thus to use the current (or default) response language (e.g., French) for the virtual agent's response.
In another embodiment, step 206 may comprise using rules based on message history to assess the confidence in the language identification performed at step 204 and setting the response language accordingly. For example, step 206 may comprise first assessing whether the input text is the first message received from the user during the current chat session. If this is the case, the language of the next dialogue presented by the virtual agent (i.e. the language of the virtual agent's response to the input text) is changed at step 208. If it is determined at step 206 that the input text is not the first message received from the user, step 206 may further comprise assessing whether the last two (2) messages received from the user were written in the same language. If this is the case (e.g. the last two (2) responses were written in English), the response language is changed (step 208), e.g. set to English rather than to the default French language. Otherwise, the current (or default) response language (e.g., French) is kept (step 210).
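The following Python sketch illustrates, under assumed threshold values and message-history bookkeeping, how steps 206 to 210 might be implemented for the two embodiments described above (word-count-based confidence and message-history rules); it is an illustrative sketch rather than the exact decision logic.

```python
# Illustrative sketch of the response-language decision (steps 206, 208, 210).
DEFAULT_LANGUAGE = "fr"  # default response language in this example

def confident_by_word_count(words_in_identified_lang: int, threshold: int) -> bool:
    # One embodiment of step 206: count words recognized in the identified
    # language (the threshold value is an illustrative assumption).
    return words_in_identified_lang >= threshold

def confident_by_history(previous_langs: list[str], identified_lang: str) -> bool:
    # Alternative embodiment of step 206 based on message history.
    if not previous_langs:                        # first message of the chat session
        return True
    return previous_langs[-1] == identified_lang  # last two user messages agree

def choose_response_language(identified_lang: str, current_lang: str,
                             confident: bool) -> str:
    # Step 208 (switch to the identified language) or step 210 (keep current).
    return identified_lang if confident else current_lang

# Example: first English message in a session whose default language is French.
ok = confident_by_history([], "en")
print(choose_response_language("en", DEFAULT_LANGUAGE, ok))  # -> "en"
```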
Referring now to
Prior to proceeding with intent classification and entity classification, the input text may be pre-processed at step 302. It should be understood that the pre-processing step 302 may, in some embodiments, be optional. In other embodiments, it may be desirable to pre-process the input text at step 302 in order to improve performance. Pre-processing the input text at step 302 (i.e. for product search) may comprise a number of sub-steps including, but not limited to, removing symbols, replacing double spaces with single spaces, removing leading and trailing spaces if present, removing accents, singularizing nouns, converting ideograms (e.g., emojis, smileys, and the like) to text, retaining specific words based on pre-determined Part-of-Speech (POS) tags (e.g., adjectives, nouns, proper nouns, numbers, verbs, etc.), retaining hyphens, and discarding common stop words or stop words from a curated list specific to the product search. These pre-processing sub-steps may be performed using any suitable technique including, but not limited to, using language dictionaries and Regular Expressions (Regex). Other embodiments may apply. It should be understood that the pre-processing sub-steps may be performed in any order or simultaneously. Also, one or more of the pre-processing sub-steps described herein may be performed at step 302.
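A minimal Python sketch of a subset of these pre-processing sub-steps (symbol removal, whitespace normalization, accent removal, and stop-word filtering) is provided below; the stop-word list is an assumption for this example, and sub-steps requiring additional tooling (POS-based filtering, noun singularization, emoji-to-text conversion) are omitted.

```python
# Illustrative subset of the pre-processing sub-steps listed above.
import re
import unicodedata

STOP_WORDS = {"the", "a", "an", "for", "of", "in", "is", "are", "do", "you"}

def preprocess(text: str) -> str:
    text = text.lower()
    # Remove accents (e.g., "é" -> "e").
    text = "".join(c for c in unicodedata.normalize("NFKD", text)
                   if not unicodedata.combining(c))
    # Remove symbols, retaining letters, digits, spaces, and hyphens.
    text = re.sub(r"[^a-z0-9\s-]", " ", text)
    # Collapse multiple spaces and strip leading/trailing spaces.
    text = re.sub(r"\s+", " ", text).strip()
    # Discard common stop words.
    return " ".join(w for w in text.split() if w not in STOP_WORDS)

print(preprocess("I'm looking for Nike Vaporfly shoes, idéalement en rouge!"))
```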
Referring back to
The step 304 of performing intent classification illustratively comprises applying a binary classifier to the input text to provide a Boolean result indicative of whether or not the input text relates to a product search (versus something else such as a query about store opening hours, greetings, or the like). In other words, the output of the binary classifier is “True” if a product search is recognized from the user's input text and “False” otherwise. For example, the input texts “Blue running shoes” and “Red camping chair”, which each relate to a product search, evaluate to “True” while the input text “Do you ship to Canada?”, which does not relate to a product search, evaluates to “False”. In one embodiment, the output vector(s) of the pre-trained model used for the pre-processing step 302 of
Any suitable ML technique may be used to implement the binary classifier. In one embodiment, a deep neural network is used. In some embodiments, the binary classifier uses a so-called “simple classifier” model that is trained using an augmented dataset. The augmented dataset may be created using any suitable technique and pre-processed (e.g., in the manner described above with reference to step 302 of
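By way of example only, the sketch below trains a small binary product-search classifier and exposes the Boolean output described above; a TF-IDF representation and logistic regression are used here as stand-ins for the pre-trained model vectors and deep neural network of the embodiments, and the tiny training set is an assumption made to keep the example self-contained.

```python
# Illustrative binary product-search intent classifier (True = product search).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = ["blue running shoes", "red camping chair", "waterproof hiking boots",
               "do you ship to canada", "what are your opening hours", "hello there"]
train_labels = [True, True, True, False, False, False]

intent_clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
intent_clf.fit(train_texts, train_labels)

def is_product_search(text: str) -> bool:
    """Boolean output of the intent classification step."""
    return bool(intent_clf.predict([text.lower()])[0])

print(is_product_search("blue running shoes"))     # expected: True
print(is_product_search("do you ship to canada"))  # expected: False
```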
Still referring to
Referring now to
Referring to
If it is determined at step 314 that the confidence in the product search results is above the threshold (i.e. product search intent is found as a result of performing step 312), the next step 404 is to present the user with the search result(s), in the appropriate response language (as determined in
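The sketch below illustrates, with assumed thresholds and scoring, the decision made once the product-search confidence is assessed: presenting the results when the confidence is above the threshold, and otherwise attempting a character n-gram based rescue against known product names before falling back to presenting choices or clarifying questions, as summarized earlier.

```python
# Hedged sketch of the post-search decision flow; thresholds and scoring
# are illustrative assumptions.

def char_ngrams(text: str, n: int = 3) -> set[str]:
    text = text.lower()
    return {text[i:i + n] for i in range(max(len(text) - n + 1, 1))}

def ngram_similarity(a: str, b: str) -> float:
    ga, gb = char_ngrams(a), char_ngrams(b)
    return len(ga & gb) / max(len(ga | gb), 1)  # Jaccard overlap of trigrams

def next_action(query: str, results: list[dict], confidence: float,
                threshold: float, product_names: list[str]) -> dict:
    if confidence >= threshold:
        return {"action": "present_results", "results": results}
    # Low confidence: re-score the query against known product names
    # (typo-tolerant character n-gram matching).
    best = max(product_names, key=lambda name: ngram_similarity(query, name))
    if ngram_similarity(query, best) >= 0.3:
        return {"action": "present_results", "results": [{"title": best}]}
    # Still unsure: ask the user to choose or answer a clarifying question.
    return {"action": "ask_clarification",
            "message": "Which of these products are you looking for?"}

print(next_action("vaporfli shoes", [], 0.2, 0.6,
                  ["Vaporfly shoes", "Camping chair"]))
```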
Looking now at a detailed example for illustrative purposes, a user may connect to a chat session (e.g., upon accessing, using the user device, a website associated with a given brand or product) and begin a live conversation with a virtual agent. The user seeking product information about the given brand or product may provide input text (i.e. one or more messages) during the conversation with the virtual agent. For example, the user may enter the following input text as their first message: “Hi, I'm John. I'm looking for Nike™ Vaporfly™ shoes. Ideally in red”. The input text is then received (step 102 of
Product search is then performed (step 106 of
The confidence level in the product search is then assessed (step 314 of
The conversation with the user may then continue further, with the customer inputting additional queries. For instance, the following input text may be received from the user: “Avez-vous des promotions?” (i.e. “Do you have any promotions?”). The language of the input text may accordingly be identified as French, as per step 204 of
In response, the user may input additional text, such as “Et quelle est votre politique de retour?” (i.e. “And what is your return policy?”). The language of the input text may accordingly be identified as French. The method may further determine that this message is not the first message received from the user and that the last two messages from this user were written in the same language (i.e. in French). As a result, the response language is changed to French (step 208 of
In some embodiments, rather than presenting the chat session as a chat bubble (e.g. positioned at the lower right corner of a webpage presented to the user via the interface application), a conversational widget may be made part of the webpage. The conversation between the user and the virtual agent may then be placed at the core of the website. It should however be understood that other embodiments may apply. For instance, the chat session may be presented as a full page. It should therefore be understood that the systems and methods described herein may use different embodiments of the conversational user interface, depending on the application domain. For example, in some embodiments, the conversational user interface may comprise a search box (also referred to as a “search bar”) incorporated in a website. Other embodiments may apply.
In yet other embodiments, the user may be transferred to a persistent communication channel of their choice (e.g., email, text message, or the like). For example, a user entering an automated chat session on a given website may ask for follow up, either by a human or in an automated way (e.g. restock or new product arrival notifications).
The virtual agent 508 may be configured to provide a dynamic user experience with natural language interaction. In one embodiment, the virtual agent 508 may comprise a layered architecture, with each layer being defined by software instructions (that may be written in the same or different programming languages). For example, the virtual agent 508 may be implemented as comprising a front-end layer (e.g., the React framework or JavaScript), a backend layer (e.g., the Spring Boot framework or Java), and a data processing layer (e.g., Python). Each layer of the virtual agent 508 may be configured to communicate with other layers without data transformation. The virtual agent 508 may use WebSockets to enable real-time communication. In some embodiments, the virtual agent 508 may enable single sign-on for users over a Hypertext Transfer Protocol (HTTP) session that captures user information of a user having signed in and transfers the information to the front-end layer using the WebSocket protocol. In this manner, the virtual agent 508 may be personalized and alleviate the need for the user to input his/her information to sign in.
The front-end layer may be configured to receive user-inputted text and to send a data representation of the input text to the backend layer. In some embodiments, the front-end layer may be configured to generate one or more signals to provide (e.g., via the interface application associated with the virtual agent 508) an output on the devices 504. For example, the response message(s) of the virtual agent 508 to the user's input may be provided using the front-end layer and presented on the devices 504 via the interface application. In some embodiments, the interface application is a visual interface comprising one or more visual elements, as described herein above. In other embodiments, the interface application is an audio interface such that a speech-to-text decoder (e.g., a Sphinx decoder) and a text-to-speech encoder (e.g., a Microsoft™ Speech Application Programming Interface (SAPI) text-to-speech converter) are used to convert audio data to text, and vice versa.
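The following sketch shows, for the Python data-processing side only, how input text might be exchanged in real time over a WebSocket; the use of the `websockets` library, the port, and the JSON message shape are assumptions for this illustration and not a description of the actual front-end or backend layers.

```python
# Hedged sketch of a real-time WebSocket endpoint on the data-processing side:
# the front end sends user-inputted text and receives a structured NLP response.
import asyncio
import json

import websockets  # pip install websockets (version >= 10 assumed)

async def handle_message(websocket):
    async for raw in websocket:
        text = json.loads(raw)["text"]
        # Placeholder NLP result; a real deployment would call the
        # classification and product-search engines here.
        nlp_result = {"is_product_search": "shoe" in text.lower(),
                      "language": "en", "confidence": 0.9}
        await websocket.send(json.dumps(nlp_result))

async def main():
    async with websockets.serve(handle_message, "localhost", 8765):
        await asyncio.Future()  # run until cancelled

if __name__ == "__main__":
    asyncio.run(main())
```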
The server 502 may comprise a series of servers corresponding to a web server, an application server, and a database server. These servers are all represented by server 502 in
The memory 512 accessible by the processor 510 may receive and store data. The memory 512 may be a main memory, such as a high speed Random Access Memory (RAM), or an auxiliary storage unit, such as a hard disk or flash memory. The memory 512 may be any other type of memory, such as a Read-Only Memory (ROM), Erasable Programmable Read-Only Memory (EPROM), electrically-erasable programmable read-only memory (EEPROM), Ferroelectric RAM (FRAM), or optical storage media such as a videodisc and a compact disc. Also, although the system 500 is described herein as comprising the processor 510 having the applications 514a, . . . , 514n running thereon, it should be understood that cloud computing may also be used. As such, the memory 512 may comprise cloud storage.
One or more databases 516 may be integrated directly into the memory 512 or may be provided separately therefrom and remotely from the server 502 (as illustrated). In the case of a remote access to the databases 516, access may occur via any type of network 506, as indicated above. The databases 516 described herein may be provided as collections of data or information organized for rapid search and retrieval by a computer. The databases 516 may be structured to facilitate storage, retrieval, modification, and deletion of data in conjunction with various data-processing operations. The databases 516 may consist of a file or sets of files that can be broken down into records, each of which consists of one or more fields. Database information may be retrieved through queries using keywords and sorting commands, in order to rapidly search, rearrange, group, and select fields. The databases 516 may be any organization of data on a data storage medium, such as one or more servers. As discussed above, the system 500 may use cloud computing and it should therefore be understood that the databases 516 may comprise cloud storage.
In one embodiment, the databases 516 are hosted on secure web servers supporting Hypertext Transfer Protocol Secure (HTTPS) and Transport Layer Security (TLS), which are protocols used for access to the data. Communications to and from the secure web servers may be secured using Secure Sockets Layer (SSL). Identity verification of a user may be performed using usernames and passwords for all users. Various levels of access authorizations may be provided to multiple levels of users.
Alternatively, any known communication protocols that enable devices within a computer network to exchange information may be used. Examples of protocols are as follows: IP (Internet Protocol), UDP (User Datagram Protocol), TCP (Transmission Control Protocol), DHCP (Dynamic Host Configuration Protocol), HTTP (Hypertext Transfer Protocol), FTP (File Transfer Protocol), Telnet (Telnet Remote Protocol), SSH (Secure Shell Remote Protocol).
As used herein, the term “engine” is directed to a computer implemented mechanism including one or more software and/or hardware components that are specially configured to perform one or more actions and/or one or more computations. The engine, in some embodiments, describes software implemented code modules or components. In other embodiments, the engine describes hardware implementations including specially configured machines. A combination of hardware and software is possible.
Embodiments of the methods described herein (e.g., method 100 of
Computer-executable instructions may be in many forms, including program modules, executed by one or more computers or other devices. Generally, program modules include routines, programs, objects, components, data structures, etc., that perform particular tasks or implement particular abstract data types. Typically the functionality of the program modules may be combined or distributed as desired in various embodiments.
The embodiments described herein are implemented by physical computer hardware, including computing devices, servers, receivers, transmitters, processors, memory, displays, and networks. The embodiments described herein provide useful physical machines and particularly configured computer hardware arrangements. It should however be understood that the embodiments described herein may also provide virtual machines. The embodiments described herein are directed to electronic machines and methods implemented by electronic machines adapted for processing and transforming electromagnetic signals which represent various types of information. The embodiments described herein pervasively and integrally relate to machines, and their uses; and the embodiments described herein have no meaning or practical applicability outside their use with computer hardware, machines, and various hardware components. Substituting the physical hardware particularly configured to implement various acts for non-physical hardware, using mental steps for example, may substantially affect the way the embodiments work. Such computer hardware limitations are clearly essential elements of the embodiments described herein, and they cannot be omitted or substituted for mental means without having a material effect on the operation and structure of the embodiments described herein. The computer hardware is essential to implement the various embodiments described herein and is not merely used to perform steps expeditiously and in an efficient manner.
Various aspects of the systems and methods described herein may be used alone, in combination, or in a variety of arrangements not specifically discussed in the embodiments described in the foregoing and are therefore not limited in their application to the details and arrangement of components set forth in the foregoing description or illustrated in the drawings. For example, aspects described in one embodiment may be combined in any manner with aspects described in other embodiments. Although particular embodiments have been shown and described, it will be apparent to those skilled in the art that changes and modifications may be made without departing from this invention in its broader aspects. The scope of the following claims should not be limited by the embodiments set forth in the examples, but should be given the broadest reasonable interpretation consistent with the description as a whole.
The embodiments described in this document provide non-limiting examples of possible implementations of the present technology. Upon review of the present disclosure, a person of ordinary skill in the art will recognize that changes may be made to the embodiments described herein without departing from the scope of the present technology. Yet further modifications could be implemented by a person of ordinary skill in the art in view of the present disclosure, which modifications would be within the scope of the present technology.
The present application claims the benefit of U.S. Provisional Patent Application No. 63/114,164 filed on Nov. 16, 2020, the contents of which are hereby incorporated by reference.