The present application claims the benefit of priority under the Paris Convention to Chinese Patent Application No. 201811511713.9, entitled “Method and System for Generating Interactive Applications,” filed Dec. 11, 2018, and Chinese Patent Application No. 201811511714.3, entitled “Method and System for Service Information Interaction,” filed Dec. 11, 2018, each of which is incorporated herein by reference in its entirety.
The disclosed implementations relate generally to information technologies, and more specifically to a rule-based response system and method for structuring and serving information using a conversational user interface.
A conversational user interface (UI) allows a user to interact with a computing system or device such as a smart phone using verbal or textual commands to obtain service or information. Conversational UIs have become popular tools for content or service providers to distribute content or provide customer service, as well as for individual users to accomplish certain tasks such as setting up reminders, turning off lights, making dinner reservations, etc. The most alluring feature of conversational interfaces is the natural and frictionless experience a user can obtain when interacting with a computing system.
In general, conversational UIs use a voice assistant that communicates with users orally, and/or chat bots that communicate with users through text. These conversational UIs combine voice detection technologies, artificial intelligence reasoning, and contextual awareness to carry on a conversation and to acquire more information from the user until the user's request is fulfilled and/or the requested task is accomplished.
Conventionally, a conversational UI is typically based on a decision tree that starts with a single node representing, for example, a multiple choice question, and branches into possible answers to the multiple choice question. Each of the possible answers may then lead to additional nodes or questions, which may branch off into other possible answers, and so forth. Thus, such a conversational UI navigates a conversation flow by moving from node to node, asking one question after another until no more questions are left to be asked. This makes designing a decision tree that maps out all the steps a difficult task, achievable only by highly trained professionals. For example, depending on how many questions need to be answered in order to fulfill a request and the possible answers one can give for each of those questions, there may be millions of different ways a conversation could be carried out. Furthermore, the more questions that need to be asked to fulfill a request, the longer the user has to be engaged in the conversation, or the slower the request can be fulfilled. As a result, the user's experience with the UI is negatively impacted.
In some embodiments, systems and methods for structuring and serving information in an efficient manner using a rule-based conversational UI are provided. In some embodiments, the semantics of task-oriented conversations are organized into a knowledge database using high-level abstraction. The knowledge database includes a collection of individual frames, each frame corresponding to a semantic framework for a particular topic or category of tasks or information (e.g., providing medical diagnostics, setting reminders, making reservations, purchasing event tickets, etc.). The knowledge database also includes rules. Each rule is a logic basis (e.g., a logical equation) that includes one or more conditions and a response to be provided to the user when the one or more conditions are satisfied. The frames and rules allow information or service providers to build rule-based conversational UIs for complex real-world problems without requiring a high level of technical skill.
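For illustration only, the sketch below shows one way such frames and rules could be represented in code. It is a minimal Python sketch under assumed names (Attribute, Condition, Rule, and Frame are illustrative, not a required schema); the actual organization of the knowledge database is described in the implementations below.

```python
from dataclasses import dataclass, field
from typing import Any, List

@dataclass
class Attribute:
    name: str             # e.g., "fever"
    type: str             # e.g., "boolean", "text", or "number"
    prompt_question: str  # question used to ask the user for this attribute's value

@dataclass
class Condition:
    attribute: str        # name of the attribute this condition constrains
    required: Any         # required value or range of values

@dataclass
class Rule:
    conditions: List[Condition]   # all conditions that must be satisfied
    response: str                 # response provided when the conditions are satisfied

@dataclass
class Frame:
    topic: str                                          # e.g., "medical pre-diagnosis"
    attributes: List[Attribute] = field(default_factory=list)
    rules: List[Rule] = field(default_factory=list)
```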
In some embodiments, a declarative approach is used to construct a dialogue in a conversation using structured information and a methodology that does not require specific mapped steps. This approach allows the development process to be streamlined by reducing the complexity of system maintenance. Additionally, such an approach allows a conversational UI system to be adaptable and portable among multiple domains and applications.
Using the declarative approach, a domain expert (e.g., a conversational UI developer for an information or service provider) can build a knowledge database without a detailed understanding of the techniques embedded in the conversational UI system, such as machine learning models, algorithms, statistics, etc. The domain expert, such as a retail shop manager or a pre-diagnosis medical receptionist, can simply input rules that lead a user to the correct system response. For example, in a retail application, the response could be providing a user with a link to purchase requested merchandise or a link to a checkout webpage that has the requested merchandise automatically loaded in an online shopping cart. In another example, for a pre-diagnosis application, the response could be a recommendation to call or seek emergency medical services, with the relevant phone number or address. Each of the rules includes a set of conditions with specific attribute values and a response when the conditions are satisfied based on answers in the dialogue. Based on the rules, a frame storing the attributes included in the rules is created. A domain expert can select training sentences for the particular domain (e.g., topic) to be used to train the machine learning model(s) to correctly match user requests to a relevant set of rules associated with a relevant frame. With the knowledge database components defined, a rule-based conversational UI system can be used to interact (e.g., orally, or via written text) with users on a specific topic or in a specific domain, and serve the user by fulfilling the user's requests.
Thus, methods, systems, and interfaces are provided with regard to a rule-based conversational UI system and the development and performance thereof.
In accordance with some implementations, a method is performed by one or more computer systems that are coupled to a network and include one or more processors. The method includes receiving, by a processor of the one or more processors, a user request from a user device in the network. The method also includes determining a frame related to the user request. The frame includes a plurality of attributes. Each respective attribute of the plurality of attributes has a respective name, a respective type, and a respective prompt question for inquiring about a value for the respective attribute. The method further includes selecting a set of rules from a rule database associated with the frame based on the request. Each rule of the set of rules includes one or more conditions and a corresponding response, and each condition includes an attribute related to the user request and a value or range of values for the attribute. In response to the set of rules including more than one rule, the method includes selecting one or more attributes that are included in at least one rule of the set of rules; transmitting, to the user device, one or more prompt questions associated with the one or more attributes; receiving one or more answers to the one or more prompt questions from the user device, the one or more answers including one or more values for the one or more attributes; and eliminating one or more rules from the set of rules based on the one or more answers. In response to all other rules except one remaining rule having been eliminated from the set of rules, the method includes transmitting the response included in the one remaining rule to the user device.
In accordance with some implementations, a method to generate knowledge databases corresponding to a plurality of expert systems coupled to a network and associated with a plurality of distinct knowledge domains is performed by one or more computer systems coupled to a network. The one or more computer systems include one or more processors. The method includes, for each respective knowledge domain of the plurality of distinct knowledge domains, launching, by a processor of the one or more processors, at least one respective user interface to receive respective inputs from a respective expert system associated with the respective knowledge domain. The at least one respective user interface includes respective attribute input fields and respective rule input fields. The respective inputs include a plurality of attributes that are received via the respective attribute input fields. Each attribute of the plurality of attributes has a name, a type, and a prompt question for inquiring about a value for the attribute. The respective inputs also include a plurality of rules received via the respective rule input fields. Each rule of the plurality of rules includes one or more conditions and a corresponding response. Each condition of the one or more conditions includes one or more attributes and a value or range of values for each of the one or more attributes. The method further includes forming a respective frame corresponding to the respective knowledge domain using the plurality of attributes and associating the plurality of rules with the respective frame.
For a better understanding of the aforementioned implementations of the invention as well as additional implementations, reference should be made to the Description of Implementations below, in conjunction with the following drawings in which like reference numerals refer to corresponding parts throughout the figures.
Like reference numerals refer to corresponding parts throughout the drawings.
Reference will now be made in detail to implementations, examples of which are illustrated in the accompanying drawings. In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, it will be apparent to one of ordinary skill in the art that the present invention may be practiced without these specific details.
In some implementations, computing platform 120 is implemented using one or more servers and/or other computing devices, such as desktop computers, laptop computers, tablet computers, and other computing devices with one or more processors capable of running or hosting the user application(s) 146 (e.g., conversational UI application 146) and/or the expert application 142. Knowledge database 124 may be stored in one or more memory and/or storage devices associated with the one or more servers and/or other computing devices, or in a network storage accessible by the one or more servers and/or other computing devices, including capabilities for database organization (e.g., organizing information in the knowledge database 124 into frames) as well as capabilities for adding new information or removing existing information in existing databases.
In some implementations, the memory 160 includes high-speed random-access memory, such as DRAM, SRAM, DDR RAM or other random-access solid-state memory devices. In some implementations, the memory 160 includes non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid-state storage devices. In some implementations, the memory 160 includes one or more storage devices remotely located from the CPUs 171. The memory 160, or alternately the non-volatile memory device(s) within the memory 160, comprises a non-transitory computer readable storage medium. In some implementations, the memory 160, or the computer readable storage medium of the memory 160, stores the following programs, modules, and data structures, or a subset thereof:
Each of the above identified executable modules, applications, or set of procedures may be stored in one or more of the previously mentioned memory devices, and corresponds to a set of instructions for performing a function described above. The above identified modules or programs (i.e., sets of instructions) need not be implemented as separate software programs, procedures, or modules, and thus various subsets of these modules may be combined or otherwise re-arranged in various implementations. In some implementations, the memory 160 stores a subset of the modules and data structures identified above. In some implementations, the memory 160 stores additional modules or data structures not described above.
In some implementations, the memory 184 includes high-speed random-access memory, such as DRAM, SRAM, DDR RAM or other random-access solid-state memory devices. In some implementations, the memory 184 includes non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid-state storage devices. In some implementations, the memory 184 includes one or more storage devices remotely located from the CPUs 183. The memory 184, or alternately the non-volatile memory device(s) within the memory 184, comprises a non-transitory computer readable storage medium. In some implementations, the memory 184, or the computer readable storage medium of the memory 184, stores the following programs, modules, and data structures, or a subset thereof:
In some implementations, the memory 184a includes high-speed random-access memory, such as DRAM, SRAM, DDR RAM or other random-access solid-state memory devices. In some implementations, the memory 184a further includes non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid-state storage devices. In some implementations, the memory 184a includes one or more storage devices remotely located from the CPUs 183. The memory 184a, or alternately the non-volatile memory device(s) within the memory 184a, comprises a non-transitory computer readable storage medium. In some implementations, the memory 184a, or the computer readable storage medium of the memory 184a, stores the following programs, modules, and data structures, or a subset thereof:
Each of the above identified executable modules, applications, or set of procedures may be stored in one or more of the previously mentioned memory devices, and corresponds to a set of instructions for performing a function described above. The above identified modules or programs (i.e., sets of instructions) need not be implemented as separate software programs, procedures, or modules, and thus various subsets of these modules may be combined or otherwise re-arranged in various implementations. In some implementations, the memory 184 stores a subset of the modules and data structures identified above. In some implementations, the memory 184 stores additional modules or data structures not described above.
The knowledge database 124 includes frames 210, such as frames 210-1, 210-2, . . . , 210-x. Each frame corresponds to a domain or a topic (e.g., medical diagnosis, event tickets, reservations, etc.). Each frame 210 includes a plurality of attributes that relate to the frame. For example, frame 210-1 includes attributes 220-1, 220-2, . . . , 220-a and frame 210-2 includes attributes 222-1, 222-2, . . . , 222-b. While each attribute in the frames 210 is distinct from one another (e.g., attribute 220-1 may be related to medical diagnosis and may have the attribute name “cough” while attribute 222-1 may be related to scheduling and may have the attribute name “time”), some attributes may have a similar or the same name while being associated with different frames. For example, attribute 222-1, related to scheduling, may have the attribute name “date” and attribute 224-1 may be related to purchasing event tickets and may also have the attribute name “date”.
The knowledge database 124 also includes rules 230 organized in a plurality of rule databases, such as rule databases 230-1, 230-2, . . . , 230-x, associated, respectively, with the frames 210-1, 210-2, . . . , 210-x. The rules 230 and frames 210 are related to one another via the attributes. The frames serve as the basis for the rules in respective rule databases. For example, a particular frame defines the attributes in a particular domain or related to a particular topic, to which a user request is directed. A rule in the rule database associated with the particular frame provides conditions that need to be met in order to execute a response to the user request. Thus, rules and frames are used in conjunction with one another when executing a conversational UI application 146. The frames 210 and rules 230 are provided (e.g., input) by domain experts via knowledge input UI 122 and expert application 142 and are used by conversational UI application(s) 146 to fulfill users' requests.
The frames 210 and the structured data (e.g., attributes) in the frames set the foundation of the conversational UI system and are used to keep conversational flows efficient. For a problem in a particular domain (e.g., on a particular topic), the information related to the problem and its properties are defined in one frame. For example, frame 210-1 may correspond to a medical assistant domain, and the information (e.g., symptoms) related to a certain diagnosis is defined as attributes (e.g., attributes 220-1, 220-2, . . . , 220-a) in the frame 210-1. The rules (e.g., rules in rule database 230-1) related to the frame would include potential solutions (e.g., responses) to the user's problem or request. The rules are defined using attributes. In order to meet a condition in a particular rule, the value of a particular attribute needs to be within a certain range or have a certain value. The conversational UI system analyzes the user's request to determine the user's intention (e.g., the context of the user request) and to retrieve the relevant rules. Based on the attributes mentioned in the rule conditions, the conversational UI system then acquires the value of that attribute from the user and compares the value provided by the user with the criteria of the condition to determine if the condition is met.
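A minimal sketch of that comparison step is shown below, assuming a condition's criterion is either a single required value or a (low, high) range; the helper name and value encoding are illustrative assumptions.

```python
def condition_met(required, user_value):
    # `required` is either a single value (exact match) or a (low, high) range.
    if isinstance(required, tuple) and len(required) == 2:
        low, high = required
        return low <= user_value <= high
    return user_value == required

# A fever of 103 degrees F satisfies a "fever between 102 and 106" condition,
# and "Las Vegas" satisfies a location == "Las Vegas" condition.
assert condition_met((102, 106), 103)
assert condition_met("Las Vegas", "Las Vegas")
```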
In some implementations, conversational UI application 146 also includes the technical features such as Automatic Speech Recognition (ASR), Natural Language Understanding (NLU), etc. These features are used to obtain and interpret user input in order to determine user intent or to extract information required to complete the user request. These features are integrated into the conversational UI system such that they are automatically generated as a product of defining a conversation using the conversational UI system. These technical components are hidden from an expert user (e.g., a conversational UI developer) so that no additional work is needed to integrate these components into conversational UI application 146. These technical components include machine learning models and training sentences that are used to map a user's input into information that can be processed and used by a conversational UI system in fulfilling the user's request.
A plurality of training sentences 154 are used to train machine learning models 156. Once trained, the machine learning models 156 are configured to allow the ASR/NLU components to correlate a user request with a relevant frame and to extract information from a user's input (e.g., answers) in order to determine a value associated with an attribute.
In order to determine a relevant frame based on a user's initial request, in some implementations, such as ones where the user input is a voice command, the machine learning models 156 are trained to receive the user's audio input (e.g., sound waves) and translate the user's input into plain text. In some implementations, the machine learning models 156 are also trained to match the translated user input to one or more training sentences that have been defined for each frame 210 in the knowledge database 124. In order to perform the comparison (and match), the training sentences include trigger phrases. Each of the trigger phrases is associated with a specific frame 210 via a frame identifier. In some implementations, the trigger phrases are labeled with a frame identifier by a human developer. Some examples of training sentences are:
In the first example, “My stomach hurts” is the trigger phrase and “Medical Assistant” is the frame identifier. In some implementations, as shown, the frame identifier includes text, such as a frame name or a string of characters (e.g., “Med”). Alternatively, the frame identifier may include numerical values or may be a numerical identifier (e.g., “MED_1”, “MED1”, or “210-1”).
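For illustration, training sentences of this kind can be stored as simple (trigger phrase, frame identifier) pairs. The first pair below comes from the example above and the "I have a headache" sentence appears later in this description; the remaining pairs are hypothetical.

```python
# (trigger phrase, frame identifier) pairs used to select a frame for a user request.
TRAINING_SENTENCES = [
    ("My stomach hurts", "Medical Assistant"),            # example from the text
    ("I have a headache", "Medical Assistant"),           # sentence discussed below
    ("I want to buy concert tickets", "Event Tickets"),   # hypothetical
    ("Remind me to call mom tomorrow", "Reminders"),      # hypothetical
]
```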
The translated user input is compared to the list of trigger phrases. If an exact match is found, then the particular frame tagged with that trigger phrase is selected as the context for the conversation. Exact matching is the fastest method. However, the user input may not be an exact match to one of the trigger phrases. In such cases, the machine learning models 156 may use a text similarity model to classify the translated user input as being associated with a particular trigger phrase. In some implementations, three layers of training are performed for the machine learning model:
A trained text similarity model analyzes the translated user input to determine a similarity between the translated user input and any trigger phrases in the training sentences 250. The trained text similarity model assigns a similarity value to each comparison. For example, a similarity value close to 0 corresponds to little or no similarity between the translated user input and the particular trigger phrase, and a similarity value close to 1 indicates that the translated user input is very similar to the particular trigger phrase. For a translated user input at the start of the conversation (e.g., an initial user request), no context is provided and the translated user input is compared to all possible trigger phrases. Since there can be a large number of trigger phrases, two layers of machine learning models may be used: a coarse model (e.g., a keyword model or a shallow neural network model) and a refined model (e.g., a deep neural network model or a Bidirectional Encoder Representations from Transformers similarity-based neural matching model). Both models are trained using the same approach described herein. These models include two mechanisms: an encoder that reads the text input and a decoder that produces an output prediction for the task. The coarse model has a fast response time with acceptable accuracy. Compared to the coarse model, the refined model, which may be implemented with a deeper neural network, has improved accuracy but a slower response time due to the heavy computation involved. In some implementations, the coarse model (e.g., a small model framework) is distinguished from the refined model in that it uses a relatively lightweight, shallow convolutional neural network (CNN). A CNN is a class of deep neural networks with neurons at an input layer and an output layer, as well as multiple hidden layers. Each neuron in a neural network computes an output value by applying some function to the input values coming from the receptive field in the previous layer. The function that is applied to the input values is specified by a vector of weights and a bias (typically real numbers). Learning in a neural network progresses by making incremental adjustments to the biases and weights. A lightweight, shallow model can be used as the coarse model and then supplemented with a large, deep model with regard to specific input features, so that the combined prediction model is flexible yet includes enough detail to be accurate.
In some implementations, the coarse model is first applied to the translated user text in order to compare the translated user text to a large number of trigger phrases, and a group of the most relevant (e.g., highest-scoring) trigger phrases is selected. This reduces the candidate trigger phrases from a large pool (e.g., 100,000) to a small pool (e.g., 50). The small pool of trigger phrases will most likely contain the trigger phrase that corresponds most closely to the translated user input. The refined model then compares the translated user input with the trigger phrases in the small pool and selects the most relevant trigger phrase (e.g., the trigger phrase with the highest score). The frame identifier tag associated with the selected trigger phrase is returned to indicate that the user's input corresponds to the particular frame. The coarse and refined models are used together to achieve a balance of accuracy and time efficiency.
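The following sketch summarizes this two-stage selection, together with the exact-match fast path described above. The coarse_score and refined_score callables are placeholders standing in for the trained coarse and refined models; this is an illustrative outline, not the models themselves.

```python
def select_frame(user_text, training_sentences, coarse_score, refined_score, pool_size=50):
    # training_sentences: list of (trigger phrase, frame identifier) pairs.
    # coarse_score / refined_score: callables returning a similarity score in [0, 1].

    # Fast path: an exact match immediately determines the frame.
    for phrase, frame_id in training_sentences:
        if user_text == phrase:
            return frame_id

    # Stage 1: the coarse model narrows the large pool to the top `pool_size` phrases.
    coarse_pool = sorted(training_sentences,
                         key=lambda pair: coarse_score(user_text, pair[0]),
                         reverse=True)[:pool_size]

    # Stage 2: the refined (slower, more accurate) model picks the best phrase in the pool.
    _, best_frame = max(coarse_pool, key=lambda pair: refined_score(user_text, pair[0]))
    return best_frame
```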
There are many possible steps that may be taken within the refined model's analysis in determining similarity values. An NLU processing pipeline is one such method for analyzing the translated user input. The method may include:
Using the methods described herein, the machine learning models 156 may extract keywords from sentences that correspond to attributes stored in a frame database 209 in order to determine a frame that is relevant to the user's request. For example, sentence 252-1 states “I have a headache” and corresponds to a frame for medical diagnosis. The machine learning models 156 may extract the word “headache” and match it to an attribute that is stored in the frame for medical diagnosis. The machine learning models use such training sentences to learn which keywords are relevant and can be used for determining relevant frames. Using the machine learning models 156, a conversational UI application 146 is capable of matching a user's request to a relevant frame.
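A toy sketch of that keyword-to-frame matching is shown below; the attribute-to-frame table and the string handling are simplified, hypothetical stand-ins for the trained models and the frame database 209.

```python
# Hypothetical mapping from attribute keywords to the frame that stores them.
ATTRIBUTE_TO_FRAME = {
    "headache": "Medical Assistant",
    "fever": "Medical Assistant",
    "tickets": "Event Tickets",
    "concert": "Event Tickets",
}

def frames_for_request(request):
    # Return the frames whose attribute keywords appear in the request.
    words = request.lower().strip(".!?").split()
    return {ATTRIBUTE_TO_FRAME[w] for w in words if w in ATTRIBUTE_TO_FRAME}

print(frames_for_request("I have a headache"))  # -> {'Medical Assistant'}
```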
Additionally, during the conversation, the conversational UI system may ask the user questions in order to gather additional information and check whether the conditions in the rule(s) are met. The trained machine learning models are also used to interpret the user's input to extract a value associated with an attribute. The training sentences are a set of data that pair user utterances with specific values. Once the machine learning model(s) are trained using the training sentences, they are expected to be able to extract relevant values from user input and to determine whether conditions within a rule are met or whether the rule(s) should be eliminated.
When using the machine learning model(s) 156 to extract information regarding values that correspond to an attribute, since the context of the conversation is known and the relevant frame has been identified, the machine learning models 156 used here are trained using training sentences having the format (prompt question, answer, polarity value). In some implementations, the polarity value is a Boolean value that indicates whether the user's response corresponds to a simple “yes” or “no”. In some implementations, the polarity value can be, for example, a 0 to indicate a negative answer, a 1 to indicate a positive answer, and a 2 for uncertain answers. Some examples of such training sentences are:
In the example, for the given prompt question, a “yes” or “no” answer is expected. “I am burning” means “yes,” and the model can interpret it as such once that training sentence has been incorporated into the model. An answer such as “I don't know . . . ” is interpreted as “no”.
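As an illustration, such tuples could be recorded as shown below. The prompt questions are hypothetical; the answers and their polarity labels follow the examples above (1 for a positive answer, 0 for a negative answer, 2 for an uncertain answer).

```python
# (prompt question, answer, polarity value): 1 = positive, 0 = negative, 2 = uncertain.
POLARITY_TRAINING = [
    ("Do you have a fever?", "I am burning", 1),      # interpreted as "yes"
    ("Do you have a fever?", "I don't know...", 0),   # interpreted as "no", per the example above
    ("Do you have a cough?", "Maybe a little", 2),    # hypothetical uncertain answer
]
```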
For an answer that requires an attribute value, a training sentence has the format (prompt question, answer, relevance value). For example, a relevance value of 0 indicates that the answer is not relevant, a relevance value of 1 indicates a relevant answer, and a relevance value of 2 indicates an uncertain answer. Some examples of such training sentences are:
If the user's answer is relevant to the prompt question, NLU techniques are used to break down the answer and extract the values for the attribute of interest.
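For illustration, the sketch below pairs a few hypothetical relevance-labeled training tuples with a very simplified extraction step; the real system uses trained NLU models rather than a lookup table, and the artist list is an assumption.

```python
# (prompt question, answer, relevance value): 1 = relevant, 0 = not relevant, 2 = uncertain.
RELEVANCE_TRAINING = [
    ("Which artist would you like to see?", "Kelly Clarkson, please", 1),
    ("Which artist would you like to see?", "What's the weather like?", 0),
    ("Which artist would you like to see?", "Someone my wife likes", 2),
]

KNOWN_ARTISTS = {"kelly clarkson", "sam smith"}  # hypothetical lookup table

def extract_artist(answer):
    # If the answer is relevant, pull out the value for the attribute of interest.
    lowered = answer.lower()
    for artist in KNOWN_ARTISTS:
        if artist in lowered:
            return artist.title()
    return None

print(extract_artist("Kelly Clarkson, please"))  # -> "Kelly Clarkson"
```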
For example, frame 310 corresponds to a domain for purchasing concert tickets and includes a plurality of attributes, such as “artist” attribute 320-1, “date” attribute 320-2, and “location” attribute 320-4. As shown, each attribute includes a type (e.g., type 321-T), a value (e.g., value 321-V), and a prompt question (e.g., prompt question 321-Q). For example, the artist attribute may have a type that is a person's name and a value of “Kelly Clarkson”. Each attribute may also include one or more prompt questions that can be used by the conversational UI to ask the user in order to fill in the content for that attribute. For example, for the artist attribute, the prompt question could be “What is the name of the artist?” or “Which artist would you like to see?”
In some implementations, the value of an attribute can be Boolean (e.g., “yes” or “no”), a text string (e.g., “Las Vegas”), or numerical (e.g., “17-20”, or “137”), etc.
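Continuing the concert-ticket example, the attributes of frame 310 could be written out as name/type/value/prompt-question records along the following lines. This is a sketch: the prompt questions for the date and location attributes and the exact type labels are illustrative assumptions, and the values start empty (placeholders) until the conversation fills them.

```python
CONCERT_FRAME_ATTRIBUTES = [
    {"name": "artist",   "type": "person name", "value": None,
     "prompt_question": "Which artist would you like to see?"},
    {"name": "date",     "type": "date",        "value": None,
     "prompt_question": "What date would you like to attend?"},   # hypothetical prompt
    {"name": "location", "type": "text",        "value": None,
     "prompt_question": "Which city is the concert in?"},         # hypothetical prompt
]
```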
For each user request, a set of rules is used to determine an appropriate response that fulfills the user's request. Each rule includes a set of conditions and a response that is provided when the set of conditions is met. Each condition includes an attribute and a specified value or range of values for the attribute. Thus, the rules include attributes, which are stored in specific frames. As a result, each rule is inherently associated with a corresponding frame that stores the attributes in the rule.
In certain embodiments, the conversation may be a single task of purchasing a ticket. If the user starts the conversation with an intent of buying a particular concert ticket, the response could be the website or phone number for purchasing the requested ticket, or a webpage with the ticket loaded in a shopping cart and fields for making a payment for the ticket. The conversation is navigated to collect the information defined in the relevant rules. A condition may also include fan club membership information, with a value of “yes” or “no”. In this example, the information for each attribute can be obtained in any order, regardless of which attribute information was provided by the user first, or several attributes can be communicated at the same time, since the attributes are on the same structured information level and do not have any prior relationships with one another. Usually, information for each condition in the rule needs to be obtained from the user or another source in order to trigger the response in the rule.
For example, a rule related to purchasing tickets may include a condition that requires an “Artist” attribute to have the value “Kelly Clarkson,” a condition that requires a “Date” attribute to have the value “Dec. 20, 2020,” and a condition that requires a “Location” attribute to have the value “Las Vegas.” An example of this rule may look like the following:
AND (artist=“Kelly Clarkson”, date=“12/20/2020”, location=“Las Vegas”)
Each conversational session ends with a response or a feedback to the user. In this example rule, after all required information has been collected, the conversational UI provides a response that includes a phone number or a link to a webpage for purchasing tickets for that particular concert.
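A sketch of how this example rule might be checked once the three attribute values have been collected is shown below; the dictionary layout and the response text and URL are placeholders, not the system's actual representation.

```python
RULE = {
    "conditions": {
        "artist": "Kelly Clarkson",
        "date": "12/20/2020",
        "location": "Las Vegas",
    },
    # Placeholder response; a real rule would carry the actual purchase link or phone number.
    "response": "You can purchase tickets at https://example.com/tickets",
}

def rule_satisfied(rule, collected_values):
    # All conditions are combined with AND: every attribute must match its required value.
    return all(collected_values.get(attr) == required
               for attr, required in rule["conditions"].items())

collected = {"artist": "Kelly Clarkson", "date": "12/20/2020", "location": "Las Vegas"}
if rule_satisfied(RULE, collected):
    print(RULE["response"])
```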
In some embodiments, if the conditions in a rule are met, a defined action or response is provided to the user. If the conditions in a rule are not met, the rule is disregarded and the user response will be selected from a different rule that has all of its conditions met. For a session related to medical diagnosis, the response is a recommendation after analyzing all the symptoms. The logic to derive such a diagnosis is defined in the rules. In the example rule, if the age is less than 14 years old, and the symptoms include nausea and a fever greater than 102° F. for 4 days or longer, then medical attention is recommended (e.g., call emergency medical care), and the action/response may include a phone number to call for medical care.
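That example diagnosis rule could be expressed along the following lines; this is a sketch, the predicate form is illustrative, and the phone number is a placeholder.

```python
def emergency_care_rule(age, has_nausea, fever_f, fever_days):
    # Conditions: age under 14, nausea present, and a fever above 102 F for 4 or more days.
    if age < 14 and has_nausea and fever_f > 102 and fever_days >= 4:
        # Placeholder response; a deployed rule would include the actual number to call.
        return "Please seek emergency medical care. Call 555-0100."
    return None  # conditions not met: this rule is disregarded

print(emergency_care_rule(age=9, has_nausea=True, fever_f=103, fever_days=5))
```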
As described, a rule may include one or more composite conditions, each being a combination of multiple conditions bound together using operands or logic operators, such as AND and OR.
In some implementations, (steps 510 and 512) a conversational session is initiated in response to the system receiving a user request. In some implementations, the user request may only include an indication of the user's intent, such as an intention to buy ticket(s). Alternatively, the user request may also include one or more attributes or attribute values, such as the name of the artist or a date range. In step 514, the conversational UI system converts (e.g., translates) the user's input into full or partial semantic frames, as described herein. Then, in step 516, the conversational UI may use one or more machine learning models to identify keywords and/or attributes in the user's request in order to determine which frames in the frame database 209 are relevant to the user's request. Once the relevant frame has been identified, the conversational UI system identifies a set of rules that are associated with the relevant frame based on the user's request in step 518. In step 520, the conversational UI determines whether or not the set of rules associated with the relevant frame includes more than one rule. In the case that the set of rules includes more than one rule, (step 522) the conversational UI asks a prompt question and, upon receiving a user response to the prompt question, eliminates at least one rule from the set of rules based on the user response. In the case that the set of rules does not include more than one rule (e.g., all rules in the set of rules have been eliminated except for one remaining rule), (step 524) the conversational UI transmits a response to the user in order to fulfill the user request. The response is derived from the one remaining rule.
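The overall flow can be summarized in the short sketch below. It is a simplified illustration that ignores edge cases (such as an empty rule set) and represents rules and prompts as plain dictionaries; the get_answer callable stands in for the prompt-and-parse steps described above.

```python
def run_session(rules, prompts, get_answer):
    # rules:      list of {"conditions": {attribute: required value}, "response": str}
    # prompts:    {attribute name: prompt question} from the relevant frame
    # get_answer: callable that asks the user a prompt question and returns a value
    known = {}
    while len(rules) > 1:                                          # step 520
        # Pick an attribute that appears in at least one remaining rule and is still unknown.
        attr = next(a for r in rules for a in r["conditions"] if a not in known)
        known[attr] = get_answer(prompts[attr])                    # step 522: prompt the user
        # Eliminate rules whose conditions conflict with the values known so far.
        rules = [r for r in rules
                 if all(known[a] == v for a, v in r["conditions"].items() if a in known)]
    return rules[0]["response"]                                    # step 524: remaining rule's response

# Toy usage: two ticket rules distinguished by the artist attribute.
rules = [
    {"conditions": {"artist": "Kelly Clarkson"}, "response": "Link to Kelly Clarkson tickets"},
    {"conditions": {"artist": "Sam Smith"}, "response": "Link to Sam Smith tickets"},
]
prompts = {"artist": "Which artist would you like to see?"}
print(run_session(rules, prompts, get_answer=lambda question: "Kelly Clarkson"))
```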
For example, a user request may state “Buy tickets for a concert this weekend.” The conversational UI may extract the words “buy”, “tickets”, and “concert” as relevant keywords to use in identifying a relevant frame, in this case a frame related to purchasing tickets. Additionally, the conversational UI may also identify the phrase “this weekend” and automatically eliminate all rules associated with the relevant frame that relate to concerts with dates that are not this weekend. If more than one rule is left in the set of rules, the conversational UI will continue to ask prompt questions in order to elicit information from the user's responses and eliminate rules in the set of rules based on the information obtained from the user's input. During this process, the UI system retrieves relevant rules from the knowledge database 124. The conditions of the rules need to be checked to determine the most accurate response to the user. Thus, the values of attributes as provided by the user need to be compared with the required values in the conditions of the rules. The conversational UI system solicits values of relevant attributes that are not yet known by asking one or more prompt questions. Once the value for an attribute is determined, the conversational UI system can proceed to eliminate (e.g., exclude) rules whose conditions require that an attribute have a value that is different from the value derived from the user's response. In some implementations, when the conversational UI system needs to obtain a value of an attribute, the value is obtained by template matching. For example, in a structured message with two parameters, <departure city> and <destination city>, the template “from <departure city> to <destination city> . . . ” is set. When the user says “buy airline ticket from Beijing to Shanghai”, the UI system can extract <departure city> as “Beijing” and <destination city> as “Shanghai” according to the template. The conversational UI system repeats this process until only one rule remains. Provided that the conditions in the remaining rule are met, the conversational UI deploys the response of the one remaining rule. For example, the conversational UI may ask the user for venues, artist, concert time, etc., until only one rule remains. If, for example, the user indicates that he/she is interested in tickets for Kelly Clarkson any time after 5 pm, any rules that include a condition that the artist attribute has a value that is another artist's name (e.g., artist=“Sam Smith”) will be eliminated. Similarly, any rules that include a condition that the concert starts at a time before 5 pm (e.g., time=3:30 pm) will also be eliminated. Once the conversational UI determines that there is only one remaining rule in the set of rules and all of the conditions of the rule are met, the response of the rule is deployed (e.g., a link to purchase tickets for a Kelly Clarkson concert starting at 7 pm this Saturday is presented to the user, or the conversational UI loads the ticket in a shopping cart at the ticket purchase website and confirms the details of the purchase with the user before submitting payment).
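A minimal sketch of that template-matching step, using a regular expression in place of the system's actual template mechanism (the pattern and city handling are simplified for illustration):

```python
import re

# Template: "from <departure city> to <destination city> ..."
TEMPLATE = re.compile(r"from (?P<departure_city>.+?) to (?P<destination_city>.+?)(?:\s|$)")

def match_cities(utterance):
    m = TEMPLATE.search(utterance)
    return m.groupdict() if m else {}

print(match_cities("buy airline ticket from Beijing to Shanghai"))
# -> {'departure_city': 'Beijing', 'destination_city': 'Shanghai'}
```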
Attribute interface 701 includes an attribute input interface 710 that includes one or more fields (e.g., “name”, “type”, “value”, “prompt question”) for entering information regarding an attribute. In some implementations, as shown, attribute interface 701 also displays attributes that have been previously entered by an expert user (e.g., attributes 720 and 730). In some implementations, the “value” of an attribute in the attribute interface 701 is a placeholder (see attribute 720). In some implementations, the “value” of an attribute in the attribute interface 701 includes a base value or default value that is defined by an expert user (see attribute 730).
Rule interface 702 includes a rule input interface 740 that includes one or more fields (e.g., “attribute name”, “attribute type”, “attribute value”, “operand”, and “response”) for entering information regarding a rule. In the rule input interface 740, an expert user can define conditions (e.g., attributes and values) as well as define operands for combining conditions (e.g., [condition 1 OR condition 2] AND [condition 3]) to form composite conditions. The rule input interface 740 also includes one or more field(s) 754 for receiving information regarding a response associated with a specified rule. In some implementations, as shown, rule interface 702 also displays rules that have been previously entered by an expert user (e.g., rule 750).
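For illustration, the fields collected through such an interface might be assembled into a stored rule resembling the structure below. This is a hypothetical sketch: the nested operator/operand layout simply mirrors the “[condition 1 OR condition 2] AND [condition 3]” composition mentioned above, and the attribute names and response text are placeholders.

```python
# A composite condition: (fever OR chills) AND age < 14, with a placeholder response.
rule_from_expert_input = {
    "frame": "Medical Assistant",
    "conditions": {
        "operator": "AND",
        "operands": [
            {"operator": "OR",
             "operands": [
                 {"attribute": "fever",  "type": "boolean", "value": True},
                 {"attribute": "chills", "type": "boolean", "value": True},
             ]},
            {"attribute": "age", "type": "number", "value": "<14"},
        ],
    },
    "response": "Please contact your pediatrician.",  # placeholder response text
}
```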
In some implementations, (step 830) selecting a set of rules from a rule database associated with the frame based on the request includes (step 832) identifying one or more first keywords from the user's request, and (step 834) matching the one or more first keywords to one or more attributes.
In some implementations, the method includes, prior to receiving a user request, receiving one or more first training sentences and matching the one or more first keywords includes matching the one or more first keywords to at least a portion of a first example sentence. The one or more first training sentences include an example sentence and a corresponding domain of interest or frame identifier and the identified set of rules corresponds to the same domain as the first example sentence.
In some implementations, the method includes, in response to receiving one or more answers to the one or more prompt questions from the user device, determining the one or more values for the one or more attributes by extracting one or more second keywords that are associated with an attribute of the one or more attributes, and extracting one or more third keywords that are associated with possible values of the attribute.
In some implementations, eliminating the one or more rules includes eliminating a rule that does not include any of the one or more attributes.
In some implementations, the one or more values include a first value for a first attribute, and eliminating the one or more rules includes eliminating a rule that has a condition that has the first attribute and a value for the first attribute that is different from the first value.
In some implementations, the method 800 further includes (step 812), in response to the user input, starting a conversation session and creating a session log for the conversation session. In some embodiments, a user profile is created or preexists (if the user has had previous conversations with the rule-based conversational UI system 200), and the session log is associated with the user profile. In some embodiments, the user request, subsequent prompt questions, and answers are recorded in the session log. In some implementations, the one or more second attributes are selected based at least on recorded data in the session log after receiving the one or more answers. In some implementations, the session log is stored for a predetermined duration.
In some implementations, the one or more conditions include a composite condition formed using one or more operands that logically combine a plurality of conditions.
In some implementations, the method includes, prior to receiving a user request, receiving one or more second training sentences and comparing the one or more answers to the one or more second training sentences. The one or more second training sentences include a prompt question corresponding to an attribute, a second example sentence, and an indicator. The method also includes selecting a sentence of the one or more second training sentences that is most similar to the one or more answers and recording a value corresponding to the attribute based on at least an indicator corresponding to the selected sentence.
In some implementations, at least one of the set of rules includes a composite attribute. The composite attribute includes a primary attribute and one or more secondary attributes that are dependent on the primary attribute.
In some implementations, the one or more attributes include a first attribute and a second attribute, and the one or more prompt questions include one or more first prompt questions that are associated with the first attribute and one or more second prompt questions that are associated with the second attribute. The one or more answers include one or more first answers to the one or more first prompt questions and one or more second answers to one or more second prompt questions. Eliminating one or more rules includes eliminating one or more first rules after receiving the one or more first answers and before receiving the one or more second answers as well as eliminating one or more second rules after receiving the one or more second answers. The one or more second attributes are selected after eliminating the one or more first rules and before eliminating the one or more second rules. The one or more second prompt questions are transmitted after the one or more second attributes are selected.
In some implementations, the one remaining rule includes a first number of attributes, the one or more prompt questions include a second number of prompt questions, and the second number is at least one less than the first number.
In some implementations, the method 900 further includes (step 950) constructing a knowledge database that includes a plurality of distinct frames corresponding, respectively, to the plurality of distinct knowledge domains.
In some implementations, (step 920) constructing a respective frame corresponding to the respective knowledge domain using the plurality of attributes includes (step 932) storing the plurality of attributes as part of the frame.
In some implementations, the plurality of distinct knowledge domains include a plurality of distinct conversation topics.
In some implementations, the plurality of distinct knowledge domains include a plurality of distinct categories of tasks.
In some implementations, the one or more conditions include a composite condition formed by combining multiple conditions with one or more operands.
In some implementations, the plurality of attributes are stored in a respective attributes database as the respective frame. In some implementations, the plurality of rules are stored in a respective rule database associated with the respective frame.
In accordance with some implementations, a method performed by one or more computer systems that are coupled to a network and include one or more processors includes receiving, by a processor of the one or more processors, a user request from a user device in the network. The method also includes determining a frame related to the user request. The frame includes a plurality of attributes. Each respective attribute of the plurality of attributes has a respective name, a respective type, and a respective prompt question for inquiring about a value for the respective attribute. The method further includes selecting a set of rules from a rule database associated with the frame based on the request. Each rule of the set of rules includes one or more conditions and a corresponding response, and each condition includes an attribute related to the user request and a value or range of values for the attribute. In response to the set of rules including more than one rule, the method includes selecting one or more attributes that are included in at least one rule of the set of rules; transmitting, to the user device, one or more prompt questions associated with the one or more attributes; receiving one or more answers to the one or more prompt questions from the user device, the one or more answers including one or more values for the one or more attributes; and eliminating one or more rules from the set of rules based on the one or more answers. In response to all other rules except one remaining rule having been eliminated from the set of rules, the method includes transmitting the response included in the one remaining rule to the user device.
In some implementations, eliminating the one or more rules includes eliminating a rule that does not include any of the one or more attributes.
In some implementations, the one or more values include a first value for a first attribute, and eliminating the one or more rules includes eliminating a rule having a condition that has the first attribute and a value for the first attribute that is different from the first value.
In some implementations, the method further comprises determining that the one or more conditions in the one remaining rule is satisfied based on the user request and the one or more answers before transmitting the response included in the one remaining rule to the user device.
In some implementations, the user request and the one or more answers are recorded in a user session that is associated with a user profile and stored for a predetermined duration.
In some implementations, at least a first rule in the set of rules includes one or more operands that logically combine the one or more conditions included in the first rule to form the first rule.
In some implementations, at least one of the set of rules includes a composite attribute. The composite attribute includes a primary attribute and one or more secondary attributes that are dependent on the primary attribute.
In some implementations, the one or more attributes include a first attribute and a second attribute, the one or more prompt questions include one or more first prompt questions associated with the first attribute and one or more second prompt questions associated with the second attribute, and the one or more answers include one or more first answers to the one or more first prompt questions and one or more second answers to one or more second prompt questions. In some implementations, eliminating one or more rules includes eliminating one or more first rules after receiving the one or more first answers and before receiving the one or more second answers and eliminating one or more second rules after receiving the one or more second answers. In some implementations, the one or more second attributes are selected after eliminating the one or more first rules and before eliminating the one or more second rules, and the one or more second prompt questions are transmitted after the one or more second attributes are selected.
In some implementations, the method further includes providing a user session in response to the user request and recording, in the user session the user request, each of the one or more prompt questions, and each of the one or more answers. The one or more second attributes are selected based at least on recorded data in the user session after receiving the one or more first answers.
In some implementations, the one remaining rule includes a first number of attributes, the one or more prompt questions include a second number of prompt questions, and the second number is at least one less than the first number.
In some implementations, selecting a set of rules from a rule database associated with the frame includes identifying one or more first keywords from the user's request and matching the one or more first keywords to one or more attributes.
In some implementations, the user request is a voice command, and determining a frame related to the user request includes transcribing the voice command into text such that the text can be used to identify the one or more first keywords.
In some implementations, the method further includes, prior to receiving a user request, receiving one or more first training sentences. The one or more first training sentences include an example sentence and a corresponding domain of interest or frame identifier. Additionally, matching the one or more first keywords includes matching the one or more first keywords to at least a portion of a first example sentence. The identified set of rules corresponds to the same domain as the first example sentence.
In some implementations, the method further includes, in response to receiving one or more answers to the one or more prompt questions from the user device, determining the one or more values for the one or more attributes. Determining the one or more values for the one or more attributes includes extracting one or more second keywords that are associated with an attribute of the one or more attributes, and extracting one or more third keywords that are associated with possible values of the attribute.
In some implementations, the method further includes, prior to receiving a user request, receiving one or more second training sentences that include a prompt question corresponding to an attribute, a second example sentence, and an indicator. The method also includes comparing the one or more answers to the one or more second training sentences, selecting a sentence of the one or more second training sentences that is most similar to the one or more answers, and recording a value corresponding to the attribute based on at least an indicator corresponding to the selected sentence.
In accordance with some implementations, a method to generate knowledge databases corresponding to a plurality of expert systems coupled to a network and associated with a plurality of distinct knowledge domains is performed by one or more computer systems coupled to a network. The one or more computer systems include one or more processors. The method includes, for each respective knowledge domain of the plurality of distinct knowledge domains, launching, by a processor of the one or more processors, at least one respective user interface to receive respective inputs from a respective expert system associated with the respective knowledge domain. The at least one respective user interface includes respective attribute input fields and respective rule input fields. The respective inputs include a plurality of attributes that are received via the respective attribute input fields. Each attribute of the plurality of attributes has a name, a type, and a prompt question for inquiring about a value for the attribute. The respective inputs also include a plurality of rules received via the respective rule input fields. Each rule of the plurality of rules includes one or more conditions and a corresponding response. Each condition of the one or more conditions includes one or more attributes and a value or range of values for each of the one or more attributes. The method further includes forming a respective frame corresponding to the respective knowledge domain using the plurality of attributes and associating the plurality of rules with the respective frame.
In some implementations, the method also includes storing the plurality of attributes in a respective attributes database as the respective frame. In some implementations, the method also includes storing the plurality of rules in a respective rule database associated with the respective frame.
In some implementations, the method includes constructing a knowledge database that includes a plurality of distinct frames corresponding, respectively, to the plurality of distinct knowledge domains.
In some implementations, the plurality of distinct knowledge domains include a plurality of distinct conversation topics.
In some implementations, the plurality of distinct knowledge domains include a plurality of distinct categories of tasks.
In some implementations, the one or more conditions include a composite condition formed by combining multiple conditions with one or more operands.
In some implementations, constructing a respective frame corresponding to the respective knowledge domain using the plurality of attributes includes storing the plurality of attributes as part of the respective frame.
The terminology used in the description of the invention herein is for the purpose of describing particular implementations only and is not intended to be limiting of the invention. As used in the description of the invention and the appended claims, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term “and/or” as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, elements, components, and/or groups thereof.
The foregoing description, for purpose of explanation, has been described with reference to specific implementations. However, the illustrative discussions above are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The implementations were chosen and described in order to best explain the principles of the invention and its practical applications, to thereby enable others skilled in the art to best utilize the invention and various implementations with various modifications as are suited to the particular use contemplated.
Foreign Application Priority Data

Number | Date | Country | Kind
---|---|---|---
201811511713.9 | Dec. 11, 2018 | CN | national
201811511714.3 | Dec. 11, 2018 | CN | national