This invention relates generally to a natural language interface to a database system, and, more specifically, to converting a natural language query to a structured database query in a B2B environment.
With the advent of natural language chatbots, such as SIRI and ALEXA, users are increasingly employing bots to complete routine tasks, such as playing music, checking the weather, etc. Natural language bots are typically used in the consumer space, and they are designed to work with search engines that perform searches based on natural language key words.
Natural language bots can also be useful in business-to-business (B2B) applications. However, B2B systems are driven by heavy-duty data that is powered by complex databases, and chatbots are not designed to interface with such databases. One cannot query such a database using natural language key words due to the variability, complexity, and inherent ambiguity in natural language utterances.
Accessing data in B2B database requires a highly-structured database query language, such as SQL. A typical database query will reference a database object (e.g., a database table), one or more subject fields corresponding to a database object, one or more conditions referencing database fields, and sort/order by criteria. There are no implicit fields, semantic ambiguity, or fuzzy terms in a database query (see description of implicit fields, semantic ambiguity, and fuzzy terms below).
The highly-structured and complex nature of database queries present a challenge for natural language bots. Natural language queries are relatively unstructured and highly variable. For example:
A natural language bot for a B2B application must be able to effectively translate a natural language query to a database query. Known natural language bots use rudimentary natural language processing to parse part of a sentence, which, when used as an interface to a B2B base, results in an incomplete translation of the natural language query to a database query. This leads to incorrect or suboptimal results.
Therefore, there is demand for a system that can effectively translate a natural language query to a database query in B2B applications.
The present disclosure relates to a natural language system for querying a database in a B2B system. Specifically, the present disclosure describes a system, method, and computer program for converting a natural language query to a structured database query.
A computer system (i.e., a natural language bot) receives a user's natural language query for a B2B application. An NLU engine within the system applies an NLU model to the query to identify an intent and entities associated with the query. The NLU engine tags the entities with a type.
The system identifies a database object in a B2B database corresponding to the identified intent. The system also identifies candidate query fields and operands for the query based on the entities associated with the query.
The system evaluate the candidate query fields and operands to identify any subject fields, conditional expressions, record count limit, and ordering/sorting criteria for the query. The system creates a query plan with the results of such evaluation and then generates a database query based on the query plan.
To evaluate the candidate query fields and operands, the system obtains query parameters for the query. The query parameters include specifications for standard and object-specific fields, as well as default fields for operand types.
The system preprocesses or “cleans up” the candidate query fields and operands for further processing. In one embodiments, this comprises removing redundant, trivial, and subsumed candidate query fields and operands from the list of candidate fields and operands being evaluated.
The system then determines if any of the candidate query fields are subject fields based on whether there are any queryable fields between the object name and an interrogative (e.g., “when,” “who,” “what”) or lookup action entity (e.g., “tell me”).
The system matches any candidate query fields that are not subject fields to operands based on the query parameters, operands types, and the location of operands relative to the candidate query fields. Any remaining operands are matched to default fields based on the default fields for the operands specified in the query parameters. The system adds the matched operands and query fields to query plan as parameters for a conditional expression. If the user query does not include explicit order or sorting criteria, the system determines whether to add any implicit ordering or sorting criteria to the query plan.
The present disclosure relates to a natural language system for querying a database in a B2B system. Specifically, the present disclosure describes a system, method, and computer program for converting a natural language query to a structured database query. A structured database query is a query that requires specific syntax and delineates the database object, subject fields of the query, and any conditional parameters. A SQL query is an example of a structured database query. The method is performed by a computer system.
A natural language understanding (NLU) engine within the system applies an NLU model to the query to identify an intent and entities associated with the query (step 120). An NLU model is a set of rules and training data aimed to teach the NLU engine how to classify an incoming user utterance with respect to an intent and entities.
The intent is the action the NLU engine predicts that the user wants to take, for example, to look up a quote or retrieve certain agreements. The NLU model is trained to identify intents based on training queries labeled as corresponding to an intent.
The entities correspond to the parameters of the query. For each entity, the NLU engine outputs a standardized value for the entity and an entity type. The NLU model is trained to map various words and phrases in a natural language query to standardized values for entities. For example, the NLU may be trained to map the words “total value,” “value,” “amounts,” “worth,” “net worth,” and “annual worth” to the entity value “amount.”
Training phrases and words labeled with an entity type are used to train the NLU model to recognize entity types. The entity type associated with an entity allows the system to determine whether the entity is a query field or operand and whether the entity requires further processing for purposes of determining how to handle the entity in a structured database query. In one embodiment, there are entity type tags for the following:
The table below illustrates an example of the tagged entity types and values for the phrase “Who created the top 3 Acme quotes from last year that are over $20k that expire in the next 24 months and when?” in a B2B application that enables users to create quotes and agreements. This query is referred to herein as “the example query.” The intent associated with the example query is to look up a quote (e.g., “lookupQuote”).
The system identifies the database object being queried based on the intent with which the query was labeled by the NLU engine (step 130). For example, if the NLU engine determines that the intent for the phrase, “show me last 3 agreements over $200k from last year that I created ending in Q4 2020,” is “lookup Agreement,” then the system determines that “Agreement” is the database object. Likewise, if the NLU determines that the intent for the phrase “Who created the top 3 Acme quotes from last year that are over $20k that expire in the next 24 months and when?” is “lookup Quote,” then the system determines that “Quote” is the database object.
As state above, each entity identified by the NLU engine is tagged with an entity type. The system identifies candidate query fields and operands for further processing based on an entity's type tag (step 140). Standard fields, object-specific fields, and filter modifiers are categorized as candidate query fields. Filter operation and date/time/currency entities are categorized as candidate operands. In certain embodiments, entities tagged with record count or an object-specific contextual entities are treated as operands for purposes of step 140. In other embodiments, the system disregards contextual entities as this stage and adds them to the query plan at a later stage (e.g., as part of step 160)
The tables below illustrates an example of how the system would identify candidate query fields and operands from the tagged entities in Table 1:
Candidate Query Fields
Candidate Operands
In Table 2a, the candidate query fields are the entities from Table 1 that are tagged as a standard field (“created”), an object-specific field (“expire”), or a filter modifier (“top” and “last”). In Table 3a above, the operands are the entities from Table 1 that are tagged as filter operations (“from,” “over”), currency (“$20k), or a date range (“from last year,” “in the next 24 months”). In addition, the entity “3” is a contextual entity (i.e., record count) that the system treats as an operand in this example embodiment.
The system evaluates the candidate query fields and operands to identify any subject fields, conditional parameters, record count limit, and ordering/sorting criteria for the query (step 150). This includes identifying and matching the query fields and operands corresponding to the conditional parameters of the query. This step is described in more details with respect to
The system creates a query plan with the results of such evaluation (step 160). Any contextual entities not processed in step 150 are added to the conditional parameters of the query plan. The system then creates a database query based on the query plan (step 170). In creating the database query, the system maps query fields to actual database fields using a simple mapping of query field values to database fields. For each query field in the query plan, it then creates the applicable expression/statement within the database query using the applicable database field and the corresponding operand and operator associated with the query field. For each query field corresponding to a conditional expression, the system creates a simple condition for the database query using the mapped database field and corresponding operator and operand. All the simple conditions are combined for the actual conditional expression in the database query (e.g., all the where clause conditions are ANDed with each other to form an actual WHERE clause).
2.1 Obtaining Query Parameters
In order to know how to process the candidate query fields and operands, the system obtains query parameters, including object-specific query parameters (step 210). The query parameters include specifications for standard fields and object-specific fields, as well as default fields for operand types. The specifications for a standard or object-specific query field may include the operand type accepted by the query field, whether the query field is a default field for the operand type, any matching rules for the query field (e.g., “match only to operands to the right of the query field”), and whether the query field is ambiguous. For example, the query parameters would specify that the field “validUntilDate” takes a date operand.
An ambiguous query field is one in which the entity associated with the field can map to two or more fields. For example, the entity “created” could be associated with the “createdBy” field in a database or the “createdDate” field in the database, depending on whether the user is referring to a person or a date.
In one embodiment, the query parameters are determined by a developer of the system, and the system accesses the applicable query parameters from a list or library of query parameters. In one embodiment, query parameters are defined for each database object.
2.2 Preprocessing Fields and Operands
The system preprocesses or “cleans up” the candidate query fields and operands to prepare them for further processing (step 220). For example, the system may preprocess the fields and operands by removing any redundant, trivial, and subsumed query fields and operands from the candidate query fields and operands. In one embodiment, this comprises the following steps:
For example, in the example query above, the word “from” is subsumed by the phrase “from last year.” The word “last” is also subsumed by the phrase “from last year.” Therefore, the system remove “from” from the list of operands to be processed and “last” from the list of query fields to be processed. Therefore, for purposes of the example query, this leaves the following query fields and operands for further processing:
Candidate Query Fields
Operands
2.3 Identify any Subject Fields of the Query Based on Interrogatives
The system determines if any of the candidate query fields are “subject fields” (step 230). Subject fields are fields from which values will be returned to the user. For example, in a SQL query the subject fields are the fields in a SELECT statement. In one embodiment, the system determines if any of the candidate query fields are subject fields based on whether there are any queryable fields between an interrogative or lookup action entity in the query. A method for identifying the fields that are the subject of the query are described in more detail below with respect to
In the example query above, the system identifies both “createdBy” and “createdDate” as being subject fields due to the interrogatives “who” and “when” in the natural language query (see discussion related to
Subject Fields
For some user queries, subject query fields will not be identified at this stage and will not appear in the query plan, such as the case when the user is asking for instances of a database object. For instance, in the query, “show me the last 5 agreements I created,” the user is asking the system for electronic copies of certain agreements. Therefore, among candidate query fields and operands for this query, there are no subject fields. Instead, the system inserts the applicable default subject field(s) when creating the database query from the query plan. In other words, if the user does not explicitly reference a subject field, the system retrieves a configure list of default field(s) based on the database object.
2.4 Matching Remaining Query Fields and Operands
At this point the database object and the subject fields (if any) have been identified. For a SQL query this means that the database object for the FROM statement and the database fields (if any) for the SELECT statement have been identified. Any remaining query fields and operands relate to other statements in a database query, such as a conditional expression (e.g., a WHERE statement), an ordering/sorting clause (e.g., an ORDERBY clause), and a limit on the number of records returned (e.g., a LIMIT statement).
In order to process the remaining query fields and operands for such clauses/statements, the system matches the remaining candidate query fields to operands based on the query parameters, the operand type of the operands (where the operand type of an operand is the entity type with which the operand is tagged by the NLU engine), and the location of the operands relative to the query fields (step 240). The query parameters are used to identify the operand type accepted by a query field, as well any specific matching rules pertaining to a query field or operand (e.g., certain fields may only match with operands appearing after the query field) An implementation of this step is described in more detail with respect to
In the example query, the below candidate query fields are remaining after the subject fields have been removed:
Query Fields
For the reasons set forth with respect to
Matched Query Fields and Operands
This would leave the following operands unmatched after step 240:
Unmatched Operands
If any unmatched operands are remaining after step 240, than the fields corresponding to these operands in the user's query must be implicit. The query parameters specify default fields for operand types, and the system uses the query parameters to pair unmatched operands with default fields (step 250). Filter operation operands are associated with the closest following operand-query field pair and used to determine the operator associated with the pair. In the example query, the filter modifier “over” is used to apply the “greater than” operator to the match between “$20k” and “net price.”
In the example user query, the unmatched operands in Table 7 would be matched as follow:
Default Field-Operand Matches
“CreatedDate” is the default query field corresponding to “from last year.” “NetPrice” is the default query corresponding to “$20k.”
The summary of the query field-operand matches from the example query are as follows:
All Query Field-Operand Matches
The system adds the matched operands and query fields to the query plan as conditional parameters for a query (e.g., for the WHERE clause) (step 260). In one embodiment, contextual entities are added to the conditional parameters of a query, even if they are not part of the matching process above. For example, “recordName=Acme” may be added to the conditional parameters for the example query in the query plan. The contextual entities may be added to the conditional parameters in making the query plan or in step 170 when the system generates a query based on the query plan.
The system associates certain filter modifiers, such as “top,” or “last,” with a record count limit, and adds the record count limit to the query plan. They may be paired with a default record count operand or an explicit record count contextual entity (e.g., “3” in the example above).
2.5 Identify any Implicit Sorting or Ordering Parameters for the Query
If the query entities to do not include an explicit filter operand for ordering or sorting in the query results, the system determines if there are any implicit ordering and sorting criteria (step 260). In one embodiment, this comprise the following:
The system adds any identified ordering/sorting criteria to the query plan.
2.6 Example Query Plan
Below is a summary of the query plan for the example query.
Query Plan
Intent
LookupQuote
Subject Fields
Conditional Statement
Sorting and Ordering
sortBy netPrice
Record Count
3.1 Pass #1: Sequential Match
In matching query fields to operands, the most straight forward matches are when an operand of the correct type immediately follow a query field (e.g., “the top 3,” or “expires in the next 24 months”). This is the idea behind the sequential match pass.
For each query field, the system identifies any operands for the query field within the range parameters for a sequential match (step 310). The range parameters for a sequential match are: (1) the operand appears after the query field but before any subsequent query field, AND (2) the operand satisfies the specifications for the query field as set forth in the query parameters. For example, the query parameters will specify the operand type accepted by the query field and may specify certain matching rules (e.g., “match only to operands after the query field”).
For each query field with at least one operand within the range parameters for the first pass, the system assigns the query field to the closest operand within the range parameters (step 320). The system then marks any unmatched query fields in the first pass for processing in the second pass (step 325).
In the example query, the sequential match rules would result in the query fields “top” and “validUntilDate” being matched as follows:
Since the system was able to match both remaining query fields in the first pass, the system would not need to proceed with the second and third passes in the case of the example query. However, there are many queries for which the second and third passes are applicable.
3.2 Pass #2: Left Shift
For each unmatched query field after the first pass, the system ranks all operands that satisfy the specification for the query field (step 330). In one embodiment, the system ranks the operands by running a typical sorting algorithm on the operands with a comparator comparing two operands at a time, wherein the comparator works as follows:
In this embodiment, an operand that is on the “left” of a query field, has a lower start index than the query field, and an operand that is on the “right” of a query field has a higher start index than the query field.
The system matches each unmatched query field to the highest-ranked operand that satisfies the specifications for the query field and that is not yet claimed by another query field, prioritizing query fields from left to right (i.e., prioritizing query fields by lowest start index) (step 340). Any query fields unmatched after the second pass are marked for processing in the third pass (step 350).
3.3 Pass #3: Right Shift
For each unmatched query field after the second pass, the system ranks all operands that satisfy the specification for the query field in accordance with the sorting algorithm described above (step 360).
The system matches each unmatched query field to the highest-ranked operand that satisfies the specifications for the query field and is not yet claimed by another query field, prioritizing query fields from right to left (step 370).
If one or more of the entities are tagged as interrogatives or a lookup action, the system identifies any queryable query fields (e.g., standard fields, object-specific fields) between the interrogative/lookup action and an entity corresponding to the database object (e.g., an entity tagged “object name”), and selects all such field(s) as subject field(s) (step 430). If a subject field is an ambiguous field and there is an interrogative entity, the system resolves any ambiguities based on the value of the interrogative (steps 440, 450). For example, if the subject query field is “created,” which may have the value createdBy or createdDate, and the interrogative before the subject query field is “who,” the ambiguity will be resolved as “createdBy.” Likewise, if the interrogative before the field is “when” the ambiguity will be resolved as “createdDate.” In one embodiment, “what” is also resolved in favor of date fields. If there is a second interrogative after the database object (and there is no second database object), then the ambiguity will be resolved in favor of both ambiguous field values.
For instance, take the example query: “Who created the top 3 Acme quotes from last year that are over $20k that expire in the next 24 months and when?” As discussed above, the query has following candidate query fields:
There are two interrogatives in the query, “who,” and “when.” “Created” and “top” are the two query fields between the interrogative “who” and the object “quotes.” Since “created” is of type “standardField”, which is a queryable field, the system identifies “created” as the subject field. “Top” is of the type “filterModifier,” which is not a queryable field and, therefore, cannot be a subject field.
“Created” is an ambiguous field that can have value “createdBy” or “createdDate.” Because of the interrogative “who,” the system will resolve this ambiguity in favor or “createdBy.” However, because there are no query fields or database objects after the interrogative “when,” the system will assume that this interrogative also corresponds to “created” and also add “createdDate” as a subject field.
Example system 500 includes a NLU Interface 510, which enables a user to input a natural language query to the system. An NLU Engine 520 applies an NLU model 525 to a user's natural language query, and Query Planner Module 530 creates a query plan in accordance with the method of
Those skilled in art will appreciate the system 500 may include additional modules, not relevant to the methods described herein, for providing B2B application functionality.
In one embodiment, system 500 is any system that is backed by or uses a database, such a customer relationship management (CRM) system or a quote-to-cash system. Quote-to-cash systems integrate and automate end-to-end sell-side processes, from creating a quote for a prospective customer to collecting revenue and managing renewals. For example, quote-to-cash systems facilitate sales transactions by enabling users to configure products, price products, generate quotes, provide product recommendations, create and sign contracts, manage billings, and perform other sell-side business functions. An example of a quote-to-cash system is the APTTUS quote-to-cash suite of products running on the SALESFORCE platform. In one embodiment, a quote-to-cash system is any system that performs at least one or more of the following business functions: (1) configure, price, and quote; (2) contract generation and management; (3) revenue management (e.g., billing and financial reporting); and (4) product recommendations (e.g., identifying upsell and cross sell opportunities) and other machine learning recommendations to optimize the sales process.
The methods described herein are embodied in software and performed by one or more computer systems (each comprising one or more computing devices) executing the software. A person skilled in the art would understand that a computer system has one or more memory units, disks, or other physical, computer-readable storage media for storing software instructions, as well as one or more processors for executing the software instructions.
As will be understood by those familiar with the art, the invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. Accordingly, the above disclosure is intended to be illustrative, but not limiting, of the scope of the invention, which is set forth in the following claims.
Number | Name | Date | Kind |
---|---|---|---|
5960407 | Vivona | Sep 1999 | A |
6473084 | Phillips et al. | Oct 2002 | B1 |
7328177 | Lin-Hendel | Feb 2008 | B1 |
7574381 | Lin-Hendel | Aug 2009 | B1 |
7725358 | Brown et al. | May 2010 | B1 |
8498954 | Malov et al. | Jul 2013 | B2 |
8644842 | Arrasvuori et al. | Feb 2014 | B2 |
9519907 | Carter, III et al. | Dec 2016 | B2 |
10289261 | Aggarwal et al. | May 2019 | B2 |
10521491 | Krappe et al. | Dec 2019 | B2 |
10621640 | Krappe et al. | Apr 2020 | B2 |
10783575 | Krappe et al. | Sep 2020 | B1 |
11232508 | Krappe | Jan 2022 | B2 |
11455373 | Krappe et al. | Sep 2022 | B2 |
20020040332 | Maari et al. | Apr 2002 | A1 |
20030033240 | Balson et al. | Feb 2003 | A1 |
20060100912 | Kumar et al. | May 2006 | A1 |
20060136470 | Dettinger et al. | Jun 2006 | A1 |
20070016536 | Mirlas et al. | Jan 2007 | A1 |
20070039209 | White et al. | Feb 2007 | A1 |
20070087756 | Hoffberg | Apr 2007 | A1 |
20070162373 | Kongtcheu | Jul 2007 | A1 |
20080046355 | Lo | Feb 2008 | A1 |
20080091551 | Olheiser et al. | Apr 2008 | A1 |
20090048937 | Contreras et al. | Feb 2009 | A1 |
20090222319 | Cao et al. | Sep 2009 | A1 |
20090234710 | Belgaied Hassine et al. | Sep 2009 | A1 |
20090299974 | Kataoka et al. | Dec 2009 | A1 |
20090327166 | Carter, III et al. | Dec 2009 | A1 |
20100179859 | Davis et al. | Jul 2010 | A1 |
20100262478 | Bamborough et al. | Oct 2010 | A1 |
20100306120 | Ciptawilangga | Dec 2010 | A1 |
20110246136 | Haratsch et al. | Oct 2011 | A1 |
20110246434 | Cheenath | Oct 2011 | A1 |
20120173384 | Herrmann et al. | Jul 2012 | A1 |
20120221410 | Bennett et al. | Aug 2012 | A1 |
20120246035 | Cross et al. | Sep 2012 | A1 |
20120254092 | Malov et al. | Oct 2012 | A1 |
20120259801 | Ji et al. | Oct 2012 | A1 |
20130103391 | Millmore et al. | Apr 2013 | A1 |
20130132273 | Stiege et al. | May 2013 | A1 |
20140025529 | Honeycutt et al. | Jan 2014 | A1 |
20140136443 | Kinsey, II et al. | May 2014 | A1 |
20140149273 | Angell et al. | May 2014 | A1 |
20150120526 | Peterffy et al. | Apr 2015 | A1 |
20150142704 | London | May 2015 | A1 |
20150309705 | Keeler et al. | Oct 2015 | A1 |
20150348551 | Gruber et al. | Dec 2015 | A1 |
20150378156 | Kuehne | Dec 2015 | A1 |
20160034923 | Majumdar et al. | Feb 2016 | A1 |
20170004588 | Isaacson et al. | Jan 2017 | A1 |
20170068670 | Orr et al. | Mar 2017 | A1 |
20170124176 | Beznos et al. | May 2017 | A1 |
20170124655 | Crabtree et al. | May 2017 | A1 |
20170235732 | Williams | Aug 2017 | A1 |
20170243107 | Jolley et al. | Aug 2017 | A1 |
20170351241 | Bowers et al. | Dec 2017 | A1 |
20170358024 | Mattingly et al. | Dec 2017 | A1 |
20180005208 | Aggarwal et al. | Jan 2018 | A1 |
20180096406 | Krappe et al. | Apr 2018 | A1 |
20180218032 | Wong et al. | Aug 2018 | A1 |
20180285595 | Jessen | Oct 2018 | A1 |
20180293640 | Krappe | Oct 2018 | A1 |
20180336247 | Ignatyev | Nov 2018 | A1 |
20180349324 | Krappe et al. | Dec 2018 | A1 |
20180349377 | Verma | Dec 2018 | A1 |
20190258632 | Pal | Aug 2019 | A1 |
20190370388 | Li et al. | Dec 2019 | A1 |
20200057946 | Singaraju et al. | Feb 2020 | A1 |
20200065354 | Krappe et al. | Feb 2020 | A1 |
20210089587 | Gupta | Mar 2021 | A1 |
20210090575 | Mahmood et al. | Mar 2021 | A1 |
20220148071 | Krappe | May 2022 | A1 |
Number | Date | Country |
---|---|---|
2742395 | Jan 2019 | CA |
1315705 | Mar 2001 | CN |
106910091 | Jun 2017 | CN |
2650776 | Oct 2013 | EP |
3073421 | Sep 2016 | EP |
2001290977 | Oct 2001 | JP |
2017146909 | Aug 2017 | JP |
0052605 | Sep 2000 | WO |
03003146 | Jan 2003 | WO |
2015106353 | Jul 2015 | WO |
Entry |
---|
Oracle: Automating the Quote-to-Cash Process: An Oracle White Paper, Jun. 2009, pp. 1-19, 2009. |
McCormick, M., “What is Quote to Cash?” Jan. 20, 2016, Blog, BlackCurve, pp. 1-8, 2016. |
Microsoft/APTTUS: Ultimate Guide to Quote-To-Cash for Microsoft Customers, Web Archives, Oct. 1, 2015, pp. 1-28. |
Morelli et al., “IBM SPSS Predictive Analytics: Optimizing Decisions at the point of impact”, pp. 1-59, 2010. |
Wainewright, Phil, “Salesforce, Microsoft quote-to-cash partner Apttus raises $88m”, Sep. 29, 2016, pp. 1-7. |
Wainewright, Phil, Apttus Applies Azure Machine Learning to Quote-to-Cash, Apr. 3, 2016, pp. 1-5. |
Wireless News: Banglalink Keeps Mobile Subscribers Using Predictive Analytics with KXEN, Close-Up Media, Inc., pp. 1-2, Oct. 5, 2013. |
Riggins, J., “Interview Quote-to-Cash Pioneers Apttus Links Leads to Revenue”, May 21, 2014, pp. 1-7. |
Xie, Qitao et al., “Chatbot Application on Cyrptocurrency”, 2019 IEEE Conference on Computational Intelligence for Financial Engineering & Economics, pp. 1-8, 2019. |
Spedicato, G., et al., Machine Learning Methods to Perform Pricing Optimization. A Comparison with Standard GLMs, Dec. 2018, pp. 1-21. |