The present disclosure relates generally to systems and methods for data analytics platforms, and more specifically to automating data analytics platforms.
Data analytics is a vital tool for many businesses and entities, allowing these organizations to quantify and summarize stored data. While automated data analytics systems have been implemented to provide data stored in a database in response to structured queries, such systems typically require users to be familiar with a specific query syntax to obtain required information. The query syntax is often complex and requires substantial time to learn and use effectively. Systems that provide previously generated queries to users in a human readable format are often inflexible as to which stored data is accessible and how the data is presented. Accordingly, there is a need for an approach to obtaining stored data that is adaptable to the needs of users.
Without limiting the scope of the appended claims, after considering this disclosure, and particularly after considering the section entitled “Detailed Description,” one will understand how the parameters of various embodiments are used to improve generation of database queries and corresponding sample requests.
The present disclosure addresses, among others, these needs in the art for systems and methods for training an agent, using a data model that comprises entities that relate to a data subset of one or more databases, to respond to natural language queries provided by a user. In this way, an agent is enabled to respond to a request (e.g., a flexibly structured natural language request for information that corresponds to stored data). For example, the agent generates a plurality of sample requests using entities (e.g., metrics, dimensions, and/or filters) stored by the one or more databases to efficiently determine sample requests (e.g., user input natural language queries) and to obtain data using database queries that correspond to the sample requests. Accordingly, in accordance with the present disclosure, a user is enabled to quickly set up an agent to enable access to a data source via natural language queries.
Accordingly, various aspects of the present disclosure provide systems and methods for training an agent. In some embodiments, a system includes a first computer system that has one or more processing units and a memory. The memory is coupled to at least one of the one or more processing units and includes one or more instructions for retrieving a first data model including a first set of one or more entities. A respective entity of the first set of one or more entities relates to a data subset of a first set of one or more databases and corresponds to at least one of a metric, a dimension, or a filter. The memory further includes instructions for collecting data that is stored on the first set of one or more databases. The memory further includes instructions for generating, based on the first data model, a training set for training a first agent. The first agent is configured to respond to user input queries formulated in natural language. The training set for training the first agent includes a plurality of sample requests and a plurality of database queries for the one or more databases. At least one respective database query of the plurality of database queries corresponds to at least one respective sample request of the plurality of sample requests.
In some embodiments, the memory further includes instructions for receiving, by the first agent, from a remote user device, a user query. The user query corresponds to data on the first set of one or more databases. Additionally, in some embodiments, the memory further includes instructions for determining, by the first agent, a first sample request of the plurality of sample requests that corresponds to the user query. In some embodiments, the memory further includes instructions for transmitting, from the first agent, to the first set of one or more databases, a first database query that corresponds to the first sample request. In some embodiments, the memory further includes instructions for transmitting, to the user device, a response that corresponds to the first database query.
In some embodiments, the memory further includes instructions for altering the first data model.
In some embodiments, altering the first data model occurs in response to receiving an indication from the user device of a requested alteration to the first data model.
In some embodiments, altering the first data model includes determining, by the first computer system, a suggested alteration to the first data model. Once determined, the suggested alteration to the first data model is transmitted to the user device for display. An indication is received from the user device of a verification of the suggested alteration to the first data model.
In some embodiments, the information corresponding to the suggested alteration of the first data model includes at least a portion of the first data model.
In some embodiments, the information corresponding to the suggested alteration of the first data model includes at least a portion of the data subset of the first set of one or more databases.
In some embodiments, altering the first data model includes adding one or more relations between the domains of the first data model.
In some embodiments, altering the first data model includes modifying one or more identifiers associated with a respective entity of the first data model.
In some embodiments, modifying one or more identifiers of the respective entity of the first data model includes substituting a synonym of an identifier associated with the respective entity of the first data model for the identifier associated with the respective entity of the first data model.
In some embodiments, the synonym is selected from a list of synonyms for the one or more identifiers of the one or more entities.
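The synonym substitution described in the preceding embodiments might be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the `Entity` record, its `synonyms` list, and the `substitute_synonym` helper are hypothetical names, and retaining the replaced identifier as a new synonym is an assumption made so that earlier phrasings still match.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical entity record: a canonical identifier plus accepted synonyms."""
    identifier: str
    kind: str  # "metric", "dimension", or "filter"
    synonyms: list[str] = field(default_factory=list)


def substitute_synonym(entity: Entity, synonym: str) -> Entity:
    """Return a copy of the entity whose identifier is replaced by a listed synonym.

    The synonym must come from the entity's synonym list. The old identifier is
    kept as a synonym (an assumption) so requests using it can still be matched.
    """
    if synonym not in entity.synonyms:
        raise ValueError(f"{synonym!r} is not a listed synonym of {entity.identifier!r}")
    remaining = [s for s in entity.synonyms if s != synonym]
    return Entity(identifier=synonym, kind=entity.kind,
                  synonyms=remaining + [entity.identifier])


employees = Entity("employees", "dimension", synonyms=["staff", "personnel"])
renamed = substitute_synonym(employees, "staff")
print(renamed.identifier, renamed.synonyms)  # staff ['personnel', 'employees']
```

Returning a new record rather than mutating the original keeps the unaltered data model available, for example while a suggested alteration awaits verification from the user device.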
In some embodiments, generating the training set for training the first agent includes generating one or more sample requests based on the altered first data model.
In some embodiments, the first data model is retrieved in accordance with a defined scope of access to the one or more databases.
In some embodiments, generating the training set for training the first agent includes generating at least one sample request of the plurality of sample requests by replacing a keyword in a template request with a respective value from a set of values of the data subset of the first set of one or more databases.
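Replacing a keyword in a template request with values from the data subset could look roughly like the sketch below. The `@keyword` placeholder syntax is borrowed from the `@dimension` notation used later in this description; the `expand_template` helper, the `Employees` table, and its rows are hypothetical, and an in-memory SQLite table stands in for the data subset of the first set of databases.

```python
import re
import sqlite3


def expand_template(template: str, keyword: str, values: list[str]) -> list[str]:
    """Return one sample request per value, replacing the @keyword placeholder."""
    # \b stops "@city" from also matching inside a longer name such as "@city_code".
    pattern = re.compile("@" + re.escape(keyword) + r"\b")
    if not pattern.search(template):
        raise ValueError(f"template has no @{keyword} placeholder")
    # A function replacement inserts each value literally (no backslash processing).
    return [pattern.sub(lambda _match, v=value: v, template) for value in values]


# Hypothetical data subset: distinct cities drawn from an in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (name TEXT, City TEXT)")
conn.executemany("INSERT INTO Employees VALUES (?, ?)",
                 [("Avery", "San Francisco"), ("Blake", "Oakland"), ("Casey", "San Francisco")])
cities = [row[0] for row in conn.execute("SELECT DISTINCT City FROM Employees ORDER BY City")]
requests = expand_template("Who are the employees in the @city office?", "city", cities)
# requests == ["Who are the employees in the Oakland office?",
#              "Who are the employees in the San Francisco office?"]
```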
In some embodiments, the training set for training the first agent includes at least one sample request that is generated based on one or more queries received from the user device.
In some embodiments, generating the training set for training the first agent includes accessing a query log of the user device, analyzing at least one query of the query log, and generating at least one sample request of the plurality of sample requests based on analyzing the at least one query of the query log.
In some embodiments, generating the plurality of sample requests includes replacing a keyword in a type of query of the query log.
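One way to derive sample requests from a query log by keyword replacement is sketched below, assuming the log holds natural language queries and that the known values of each dimension are available. The `variants_from_log` helper and the example data are hypothetical; plain substring matching is a simplification that a production system would likely replace with tokenization or named-entity recognition.

```python
def variants_from_log(logged_queries: list[str],
                      dimension_values: dict[str, list[str]]) -> list[str]:
    """Generate new sample requests from logged queries by swapping dimension values.

    For each logged query that mentions a known value of a dimension, emit copies
    in which that value is replaced by every other value of the same dimension.
    Queries already in the log and duplicates are not emitted.
    """
    seen = set(logged_queries)
    generated: list[str] = []
    for query in logged_queries:
        for values in dimension_values.values():
            for value in values:
                if value not in query:
                    continue
                for other in values:
                    candidate = query.replace(value, other)
                    if candidate not in seen:
                        seen.add(candidate)
                        generated.append(candidate)
    return generated


log = ["What was revenue in California last quarter?",
       "How many tickets were closed in Oregon?"]
dims = {"state": ["California", "Oregon", "Washington"]}
new_requests = variants_from_log(log, dims)
```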
In some embodiments, the memory further includes instructions for retrieving a second data model including a second set of one or more entities. A respective entity of the second set of one or more entities relates to a data subset of a second set of one or more databases. The memory further includes instructions for generating, based on the second data model, a training set for training a second agent. The memory further includes instructions for receiving a first user input query. The memory further includes instructions for determining, using agent selection criteria, a respective agent of a plurality of agents including the first agent and the second agent for providing a response to the first user input query.
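The disclosure does not fix particular agent selection criteria. As one hypothetical criterion, the sketch below routes a query to the agent whose entity vocabulary (identifiers and synonyms from its data model) overlaps the query the most, and returns no agent when nothing overlaps. The `Agent` record, `tokenize`, and `select_agent` are illustrative names only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Agent:
    """Hypothetical agent summary used only for routing."""
    name: str
    vocabulary: set[str]  # identifiers and synonyms of the agent's data-model entities


def tokenize(text: str) -> set[str]:
    """Lowercased words with surrounding punctuation stripped."""
    return {word.strip("?.,!").lower() for word in text.split()}


def select_agent(query: str, agents: list[Agent]) -> Agent | None:
    """Pick the agent whose vocabulary overlaps the query most; ties go to the earlier agent."""
    tokens = tokenize(query)
    scores = [len(tokens & agent.vocabulary) for agent in agents]
    if not scores or max(scores) == 0:
        return None
    return agents[scores.index(max(scores))]


travel = Agent("travel", {"bookings", "flights", "destination", "revenue"})
support = Agent("support", {"tickets", "priority", "resolved", "agent"})
chosen = select_agent("How many tickets were resolved last week?", [travel, support])
```

A vocabulary-overlap score is cheap and transparent; a deployed system might instead ask each agent for a match confidence against its own sample requests.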
In some embodiments, training the agent includes incorporating feedback provided by one or more users of the second computer system.
In some embodiments, training the agent includes utilizing a named-entity recognition extraction to alter an entity.
In some embodiments, a method includes, at a first computer system, retrieving a first data model including a first set of one or more entities. A respective entity of the first set of one or more entities relates to a data subset of a first set of one or more databases and corresponds to at least one of a metric, a dimension, or a filter. The method further includes generating, based on the first data model, a training set for training a first agent. The first agent is configured to respond to user input queries formulated in natural language. The training set for training the first agent includes a plurality of sample requests and a plurality of database queries for the one or more databases. At least one respective database query of the plurality of database queries corresponds to at least one respective sample request of the plurality of sample requests.
In some embodiments, a non-transitory computer readable storage medium includes one or more programs for execution by one or more processors of a computer system. The one or more programs include instructions for retrieving a first data model including a first set of one or more entities. A respective entity of the first set of one or more entities relates to a data subset of a first set of one or more databases and corresponds to at least one of a metric, a dimension, or a filter. The one or more programs further include instructions for generating, based on the first data model, a training set for training a first agent. The first agent is configured to respond to user input queries formulated in natural language. The training set for training the first agent includes a plurality of sample requests and a plurality of database queries for the one or more databases. At least one respective database query of the plurality of database queries corresponds to at least one respective sample request of the plurality of sample requests.
So that the present disclosure can be understood in greater detail, a more particular description may be had by reference to the features of various embodiments, some of which are illustrated in the appended drawings. The appended drawings, however, merely illustrate pertinent features of the present disclosure and are therefore not to be considered limiting, for the description may admit to other effective features.
Like reference numerals refer to corresponding parts throughout the several views of the drawings.
Numerous details are described herein in order to provide a thorough understanding of the example embodiments illustrated in the accompanying drawings. However, some embodiments may be practiced without many of the specific details, and the scope of the claims is only limited by those features and parameters specifically recited in the claims. Furthermore, well-known processes, components, and materials have not been described in exhaustive detail so as not to unnecessarily obscure pertinent aspects of the embodiments described herein.
In some embodiments, systems and methods for automated data analytics platforms include retrieving a data model. The data model includes a set of one or more entities that describe an aspect of data of the data model. A respective entity of the set of one or more entities relates to a data subset of a set of one or more databases and corresponds to at least one of a metric, a dimension, or a filter of the data subset. A training set is then generated based on the data model and used to train a first agent. The first agent is configured to respond to a variety of user input queries that are formulated in natural language. A training set includes a plurality of sample requests and a plurality of database queries for the one or more databases of the data model. At least one of the database queries corresponds to at least one of the respective sample requests. This training set enables the agent to respond to user queries requesting information that is not expressly set forth in the one or more databases (e.g., a user query for profit in accordance with a determination that the available data consists of sales and expenses). A sample request is, for example, a natural language user input query (e.g., a user input query of “What was the average temperature on October 2nd in California for the past two decades?”). A database query is, for example, a query run on the one or more databases to obtain the requested information (e.g., a request formatted in accordance with a query language, such as SQL).
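The relationship between a data model, its entities, and a paired training set could look roughly like this. It is a minimal sketch under stated assumptions: the single-table `DataModel`, the request wording, and the SQL templates are hypothetical. The derived "profit" metric mirrors the profit example above by defining profit from stored sales and expense columns, so the agent can answer a question whose answer is not stored directly.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class DataModel:
    """Hypothetical single-table data model: entity name -> SQL fragment."""
    table: str
    metrics: dict[str, str]     # metric name -> aggregate SQL expression
    dimensions: dict[str, str]  # dimension name -> column name


def build_training_set(model: DataModel) -> list[tuple[str, str]]:
    """Pair natural language sample requests with the SQL queries that answer them."""
    pairs = []
    for metric, expression in model.metrics.items():
        pairs.append((f"What is the total {metric}?",
                      f"SELECT {expression} FROM {model.table}"))
        for dimension, column in model.dimensions.items():
            pairs.append((f"What is the {metric} by {dimension}?",
                          f"SELECT {column}, {expression} FROM {model.table} GROUP BY {column}"))
    return pairs


# "profit" is not stored; the model defines it from stored revenue and expenses.
model = DataModel(
    table="sales",
    metrics={"revenue": "SUM(revenue)", "profit": "SUM(revenue) - SUM(expenses)"},
    dimensions={"region": "region"},
)
training_set = dict(build_training_set(model))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, revenue REAL, expenses REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [("West", 100, 40), ("East", 80, 50), ("West", 50, 10)])
profit_by_region = dict(conn.execute(training_set["What is the profit by region?"]).fetchall())
# profit_by_region == {"East": 30.0, "West": 100.0}
```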
For example, a natural language user input is received by an agent from a remote user device. The agent determines at least one database query that corresponds to the natural language user input (e.g., by determining whether the natural language user input corresponds to one or more previously generated sample requests). A database query is transmitted from the agent to the first database. The agent determines a response to the user input query based on data returned from the database in response to the database query. Training an agent (e.g., by generating a training set for use by one or more agents) based on a retrieved data model allows responses to be generated to user queries with increased efficiency (e.g., in comparison with systems that require a user to provide input for establishing each query in a set of natural language queries that may be processed by a system). Training the agent (e.g., to be responsive to particular types of queries that correspond to a particular database, set of databases, or a common set of queries for an industry) increases the efficiency with which a system responds to user queries (e.g., by producing training data that is available to the agent prior to receiving a query, in contrast to systems that must parse natural language queries and determine appropriate corresponding database queries at the time user input is received). Training an agent as described herein allows responses to natural language queries to be provided with increased speed and reduced processing.
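The step of determining whether a user input corresponds to a previously generated sample request could be approximated as below. The disclosure does not name a matching technique; token-set Jaccard similarity with a fixed threshold is a stand-in chosen for brevity, and `match_query`, `threshold`, and the training pairs are hypothetical. A trained classifier or embedding model would be a more realistic choice.

```python
import re
from typing import Optional


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric word tokens; punctuation is ignored."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_query(user_input: str,
                training_set: list[tuple[str, str]],
                threshold: float = 0.5) -> Optional[tuple[str, str]]:
    """Return the (sample request, database query) pair most similar to the input,
    or None when no sample request reaches the similarity threshold."""
    user_tokens = tokens(user_input)
    best_score, best_pair = 0.0, None
    for pair in training_set:
        score = jaccard(user_tokens, tokens(pair[0]))
        if score > best_score:
            best_score, best_pair = score, pair
    return best_pair if best_score >= threshold else None


pairs = [
    ("Who are the employees in the San Francisco office?",
     "SELECT * FROM Employees WHERE City = 'San Francisco'"),
    ("What is the revenue by region?",
     "SELECT region, SUM(revenue) FROM sales GROUP BY region"),
]
hit = match_query("who are the employees in San Francisco", pairs)
```

Because the sample requests and their database queries exist before any user input arrives, the runtime work reduces to this lookup plus executing the stored query.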
A detailed description of a system 48 for creating automated data analytics platforms in accordance with the present disclosure is described in conjunction with
Referring to
It will be recognized that other topologies of the system 48 other than the one depicted in
In some embodiments, the memory 102 of the agent system 100 for facilitating data analytics stores:
As described above, the agent system 100 includes one or more agents 112. For example, in some embodiments, an agent 112 is associated with (e.g., trained for) a respective database 200 or set of databases (e.g., collects data from and/or generates one or more sample requests (e.g., sample requests 142) and/or database queries (e.g., database query 152) for the respective databases). In some embodiments, a first agent 112-1 is trained based on data associated with a first database 200 (e.g., a first data model). In some embodiments, a first agent 112-1 is trained based on data associated with a first database (e.g., 200-1) and is also trained based on data associated with a second database (e.g., 200-2) (e.g., a first agent 112-1 is trained based on a first training set of a first data model 120-1 and a second training set of a second data model 120-2). In some embodiments, a first agent 112-1 is trained based on data associated with a first database (e.g., 200-1) and a second agent 112-2 is trained based on data associated with a second database (e.g., 200-2) (e.g., an agent is trained independently). In some embodiments, an agent (e.g., agent 112-1) is a chat-bot accessible to a user through the Internet (e.g., via an application executed by an Internet browser running on a user device and/or an application executed by the user device, such as an instant messaging application or dedicated query application). For example, in some embodiments, the agent provides automated responses to user input queries. Agent 112 converses with users (e.g., using natural language queries and responses). For example, an agent 112 receives a request for information from a user and transmits a result of the request (e.g., a result of a database query) to the user (e.g., by displaying the result at a user device associated with the respective user). In some embodiments, the agent system 100 generates training sets for training respective agents. 
A training set includes one or more sample requests 142 (e.g., natural language query sentences) based on data model 120. Agent system 100 is trained to generate one or more database queries 152 that correspond to the generated sample requests 142. In some embodiments, a respective agent 112 is associated with a particular subject matter (e.g., a particular database, a particular industry, a particular organization, etc.) in order to make information accessible to users through commonly performed searches and/or common expressions used by members of the particular organization and/or industry. For instance, in some embodiments, an agent is associated with the travel industry and becomes an expert at responding to sample requests related to the travel industry.
Agent 112 includes a database information store 114 that stores data and/or information (e.g., database details 116) related to database 200 that is associated with the corresponding agent. In some embodiments, this data and/or information of the database details 116 include the local database cache 118, which replicates at least a portion of data stored by the corresponding database 200. In some embodiments, this data and/or information of the database details 116 also include the data model 120, which is, for example, a schema or other representation of the corresponding database 200. In some embodiments, the data model 120 includes entities 210 of the data stored on the corresponding database 200 (e.g., as explained below in more detail). In some embodiments, data model 120 includes, for example, tables, foreign keys, etc. that indicate a structure of the data in the database 200 and/or one or more relations between tables of the database. In some embodiments, a data model 120 is converted into a multidimensional data model and stored by the agent system 100. In some embodiments, the data model 120 is collected and/or identified using one or more rules (e.g., rules 164 of
In some embodiments, and as described above, agent 112 is associated with a set of one or more databases. In some embodiments, a set of databases is formed according to a subject matter of the databases (e.g., databases associated with the travel industry form a first set of databases). In some embodiments, a set of databases is formed according to ownership and/or access to the respective databases (e.g., databases owned by a particular company form a set of databases). In some embodiments, a set of databases is formed according to a user definition (e.g., a user selects which databases form a particular set). Accordingly, in some embodiments, a first agent 112-1 creates and/or identifies a first data model 120-1 that corresponds to a first set of one or more databases 200. In some embodiments, the first data model 120-1 and/or a first training set generated using the first data model is applied to a second agent 112-2 that is associated with a second set of one or more databases 200, which allows the second agent to benefit from information already gained through the first training set. In some embodiments, the second agent 112-2 (e.g., trained using the first data model 120-1 and/or a first set of training data generated using the first data model) creates and/or identifies a second data model 120-2 and/or a second training set using the second data model that correspond to a second set of one or more databases 200.
In some embodiments, agent 112 includes database access information 122 that enables the respective agent to access corresponding databases 200 (e.g., by providing credentials and privileges). In some embodiments, the database access information 122 is provided by a respective user of the corresponding database 200, and/or accessed through data stored in the corresponding database. In some embodiments, the database access information 122 includes a username and/or password associated with the corresponding database 200. For example, the username and/or password is associated with a database 200 in a database management system (e.g., Postgres, MySQL, Greenplum, etc.). In some embodiments, the database access information 122 includes an access token and a refresh token that are collected from the corresponding database 200 (e.g., an API-based server such as Jira or SFDC). In some embodiments, use of these tokens requires an authorization process (e.g., OAuth 2, etc.). In some embodiments, the database access information 122 includes user information and/or information about user access rights (e.g., to control access to the corresponding database 200). The database access information 122 allows agent 112 to access respective databases 200 without human intervention in accordance with a determination that a user has provided proper credentials.
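Unattended access with an access token and a refresh token might be organized as in the sketch below. The `DatabaseAccessInfo` record and `current_access_token` are hypothetical, and `refresh` is a caller-supplied placeholder for the OAuth 2 refresh-token exchange rather than a call into any real OAuth library; the 30-second `skew` margin is an arbitrary assumption.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class DatabaseAccessInfo:
    """Hypothetical credential record for one database."""
    username: Optional[str] = None
    password: Optional[str] = None
    access_token: Optional[str] = None
    refresh_token: Optional[str] = None
    expires_at: float = 0.0  # Unix time after which access_token is no longer valid


def current_access_token(info: DatabaseAccessInfo,
                         refresh: Callable[[str], Tuple[str, float]],
                         now: Optional[float] = None,
                         skew: float = 30.0) -> str:
    """Return a usable access token, refreshing it when it is expired or about to expire.

    `refresh` is a placeholder for the OAuth 2 refresh-token exchange: it receives
    the refresh token and returns (new_access_token, lifetime_in_seconds).
    """
    now = time.time() if now is None else now
    if info.access_token and now < info.expires_at - skew:
        return info.access_token
    if not info.refresh_token:
        # Without a refresh token the user has to authorize the agent again.
        raise PermissionError("access token expired and no refresh token is available")
    token, lifetime = refresh(info.refresh_token)
    info.access_token, info.expires_at = token, now + lifetime
    return token
```

Passing `now` explicitly makes the expiry logic easy to test; some authorization servers also rotate the refresh token on each exchange, which this sketch does not model.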
In some embodiments, the database details 116 include database query log 128 (e.g., a record of queries provided to the corresponding database 200). In some embodiments, the queries of the database query log 128 include one or more queries that were communicated from various user devices 300 to the corresponding database 200. In some embodiments, the database query log 128 is analyzed by the agent 112 for generating and/or augmenting a training set (e.g., for generation of one or more sample requests 142 and/or database queries). For example, in some embodiments, a database query log 128 is accessed by the agent 112 to identify and/or extrapolate one or more entities of the data model associated with the database.
In some embodiments, the agent 112 includes a skill module 130, which stores one or more skills 132 (e.g., a trained skill of the agent 112). In some embodiments, a skill 132 corresponds to the data model 120 of the database 200. For example, in some embodiments, a skill 132 includes a defined set of one or more entities (e.g., domains, metrics (e.g., quantifiable numbers such as revenue, a number of transactions or sales count, a number of tickets, a commission earned, a number of events, etc.), dimensions (e.g., a column of a table of a database and/or a result or set of results of an operation performed on one or more elements of a table), and/or filters) and/or synonyms. In some embodiments, the skills 132 are used to generate one or more sample requests 142. For example, in some embodiments, the skills 132 include the above described metrics (e.g., revenue as a metric). Accordingly, in some embodiments, one or more sample requests 142 are generated to account for each permutation of request that includes revenue as a metric (e.g., “What is the revenue for @dimension?” generates sample request permutations for “What is the revenue for our biggest buyer?”, “What is the revenue for that buyer in California, Oregon, and Washington?”, etc.). The skills 132 enable the agent 112 to determine sample requests 142 based on predetermined actions that are created by a user device 300 and/or the agent 112, such as a bookmark of the database 200 (e.g., filters and/or data alterations defined in the bookmark). For example, in some embodiments, the skills 132 provide alterations to the data model 120. In some embodiments, one or more skills 132 are created and/or altered by a user of the system (e.g., via input at a respective user device 300 as described below with reference to at least
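Generating every permutation of a skill's request templates can be sketched with a Cartesian product over the phrasings allowed for each placeholder. This is an illustration under assumptions: the `expand_skill` helper, the slot dictionary, and the synonym "income" are hypothetical, and a real skill would likely draw its phrasings from the data model and synonym lists.

```python
import re
from itertools import product

# Matches placeholders such as "@metric" or "@dimension".
PLACEHOLDER = re.compile(r"@(\w+)")


def expand_skill(templates: list[str], slots: dict[str, list[str]]) -> list[str]:
    """Fill each template with every combination of slot phrasings.

    `slots` maps a placeholder name to the phrasings a skill allows for it
    (entity identifiers, synonyms, or stored values). Placeholders with no
    entry in `slots` are left untouched.
    """
    requests = []
    for template in templates:
        # Distinct placeholder names in order of first appearance.
        names = list(dict.fromkeys(n for n in PLACEHOLDER.findall(template) if n in slots))
        for combo in product(*(slots[n] for n in names)):
            values = dict(zip(names, combo))
            # sub() runs immediately, so the lambda sees this iteration's `values`.
            requests.append(
                PLACEHOLDER.sub(lambda m: values.get(m.group(1), m.group(0)), template))
    return requests


skill_slots = {
    "metric": ["revenue", "income"],         # identifier plus a hypothetical synonym
    "dimension": ["California", "Oregon"],   # stored values of a state dimension
}
generated = expand_skill(["What is the @metric for @dimension?"], skill_slots)
```

The number of generated requests grows multiplicatively with the number of phrasings per slot, which is why restricting slots to a skill's defined entities matters.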
In some embodiments, agent 112 includes sample request store 140 that stores one or more sample requests 142. In some embodiments, sample requests 142 are based on information from the data model 120 and/or the data of the database 200. For example, in some embodiments, a sample request 142-1 is based on one or more names of data fields (e.g., “What is X of Y,” such that all permutations of inputs for data field X and/or data field Y are considered by the sample request 142-1). In some embodiments, a sample request 142 is a natural language query sentence (e.g., “What was our profit for beer in the third quarter?”). In some embodiments, a sample request 142 is associated with one or more other sample requests. For example, in some embodiments, if a sample request 142 describes “What is @metric for @dimension_1?”, an associated sample request describes “How about in @dimension_2?”. This allows the user to communicate with agent 112 as if holding a natural conversation, instead of needing to input a full search request (e.g., the user inputs “How about in @dimension_2?” instead of “What is @metric for @dimension_2?”). In some embodiments, the sample requests 142 are used to train a corresponding agent 112 based on a particular database and/or set of databases. In some embodiments, training is accomplished by generating sample requests 142 that are interpolated for use with another database 200 and/or set of databases.
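The conversational follow-up described above, where “How about in @dimension_2?” inherits the metric of the preceding “What is @metric for @dimension_1?”, could be handled by keeping a small amount of conversation state. In the sketch below the two regular expressions are hypothetical stand-ins for the agent's trained language understanding, and the `Conversation` class is an illustrative name.

```python
import re
from typing import Optional, Tuple

# Hypothetical surface patterns standing in for the agent's trained language understanding.
FULL_REQUEST = re.compile(
    r"what is (?:the )?(?P<metric>[\w ]+?) for (?P<dimension>[\w ]+)\??$", re.IGNORECASE)
FOLLOW_UP = re.compile(
    r"how about (?:in |for )?(?P<dimension>[\w ]+)\??$", re.IGNORECASE)


class Conversation:
    """Remembers the metric of the last full request so follow-ups can omit it."""

    def __init__(self) -> None:
        self.metric: Optional[str] = None

    def resolve(self, utterance: str) -> Optional[Tuple[str, str]]:
        """Return (metric, dimension) for the utterance, or None if it cannot be resolved."""
        text = utterance.strip()
        full = FULL_REQUEST.match(text)
        if full:
            self.metric = full["metric"].lower()
            return self.metric, full["dimension"]
        follow_up = FOLLOW_UP.match(text)
        if follow_up and self.metric is not None:
            # The follow-up reuses the metric from the previous turn.
            return self.metric, follow_up["dimension"]
        return None


chat = Conversation()
first = chat.resolve("What is revenue for California?")
second = chat.resolve("How about in Oregon?")
```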
The agent 112 also includes the database query module 150, which includes one or more database queries 152. A database query 152 is a structured query for requesting information and/or data from a database 200. For example, a sample request 142 is a natural language sentence (e.g., “Who are the employees in the San Francisco office?”) and the corresponding database query is a data construct in a query language (e.g., SELECT * FROM Employees WHERE City = 'San Francisco'). In some embodiments, a database query 152 corresponds to one or more sample requests 142. For example, multiple sample requests (e.g., “Who are the employees in the San Francisco office?” and “Who are the staff in the San Francisco office?”) correspond to a single database query (e.g., SELECT * FROM Employees WHERE City = 'San Francisco'). In some embodiments, a sample request 142 corresponds to one or more database queries 152. In some embodiments, the database query module 150 stores one or more database queries 152 that are extracted and/or extrapolated from the corresponding database query log 128, from another database query log (e.g., a query log of a second database in a set of databases associated with the corresponding database), from one or more user devices 300, or a combination thereof.
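The many-to-one correspondence between sample requests and database queries can be shown with a plain mapping onto a shared query. In this sketch, which is an assumption rather than the disclosed storage format, the query is kept as parameterized SQL with a `?` placeholder so values are bound by the driver instead of being spliced into the SQL text; it selects only `name` for brevity, and the table contents and the `answer` helper are hypothetical.

```python
import sqlite3

# One parameterized database query shared by several phrasings (many-to-one).
SF_EMPLOYEES = ("SELECT name FROM Employees WHERE City = ? ORDER BY name", ("San Francisco",))
QUERY_FOR_REQUEST = {
    "Who are the employees in the San Francisco office?": SF_EMPLOYEES,
    "Who are the staff in the San Francisco office?": SF_EMPLOYEES,
}


def answer(conn: sqlite3.Connection, sample_request: str) -> list[str]:
    """Run the database query associated with a sample request and return the names."""
    sql, params = QUERY_FOR_REQUEST[sample_request]
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (name TEXT, City TEXT)")
conn.executemany("INSERT INTO Employees VALUES (?, ?)",
                 [("Avery", "San Francisco"), ("Blake", "Denver"), ("Casey", "San Francisco")])
print(answer(conn, "Who are the staff in the San Francisco office?"))  # ['Avery', 'Casey']
```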
In some embodiments, the agent 112 includes a data identifier module 160, which stores one or more rules 164. In some embodiments, one or more rules 164 include at least one sub-rule 166. For example, a rule 164 instructs an agent 112 to determine a gross profit from provided revenue and expense data fields (e.g., gross profit is revenue minus expense) and a sub-rule 166 of this rule includes an instruction to extrapolate a gross profit margin (e.g., gross profit margin is a ratio of gross profit to revenue). These rules 164, and optional sub-rules 166, are used by the agent 112 to identify and/or calculate various parameters of the set of one or more databases 200 that are associated with the agent. In some embodiments, a second agent 112-2 includes one or more rules 164 that are based on rules generated for a first agent 112-1. In some embodiments, rules 164 include predetermined operations for retrieving tables, foreign keys, and/or other parameters of the data model 120 (e.g., to identify domains and/or relations). In some embodiments, the rules 164 include using types that are indicated in the data model 120 to identify a role of a data field (e.g., a role of a column). For example, a date or a location (e.g., country, city, etc.) is identified as a dimension, a number is identified as a metric, etc. In some embodiments, the rules 164 include using values identified in the database 200 to identify portions of the data (e.g., a text field with only country names is identified as a dimension, a text field with unique values is identified as an identifier of a dimension, etc.).
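The role-identification rules in the preceding paragraph (dates and locations as dimensions, numbers as metrics, text holding only location names as a dimension, unique text as an identifier of a dimension) might be encoded as follows. The `infer_role` function and the small `LOCATION_NAMES` gazetteer are hypothetical, and treating other repeating text as a dimension is an added assumption not stated in the description.

```python
from datetime import date
from numbers import Number

# Hypothetical gazetteer; a real system might consult a geographic reference table.
LOCATION_NAMES = {"france", "germany", "japan", "united states"}


def infer_role(values: list) -> str:
    """Guess a column's role from its values: "metric", "dimension", "identifier", or "unknown"."""
    present = [v for v in values if v is not None]
    if not present:
        return "unknown"
    # Dates (datetime is a subclass of date) are treated as dimensions.
    if all(isinstance(v, date) for v in present):
        return "dimension"
    # Numbers are treated as metrics; bool is excluded even though it subclasses int.
    if all(isinstance(v, Number) and not isinstance(v, bool) for v in present):
        return "metric"
    if all(isinstance(v, str) for v in present):
        # A text column holding only location names is a dimension.
        if all(v.strip().lower() in LOCATION_NAMES for v in present):
            return "dimension"
        # A text column whose values are all unique identifies members of a dimension.
        if len(set(present)) == len(present):
            return "identifier"
        # Assumption: other repeating text (e.g., categories) is treated as a dimension.
        return "dimension"
    return "unknown"
```

Numeric identifiers such as order numbers would be misclassified as metrics by these rules alone, which is one reason the description also draws on the data model's types and on query logs.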
In some embodiments, an agent 112 shares information with at least one other agent (e.g., via communication bus 213 of the agent system 100 and/or through the communications network 20). The shared information includes, for example, information stored in database information store 114, skill module 130, sample request store 140, database query module 150, and/or data identifier module 160. For example, in some embodiments, it is desirable for a first agent 112-1 to share a training set (e.g., queries extracted from a database query log 128) with a second agent 112-2 for the purpose of training the second agent based on knowledge gained by the first agent.
In some embodiments, an agent 112 compares a database query log 128 with a data model 120 in order to enhance the data model. For example, if the data model 120 includes entities 210 for revenue and expenses, and a query log 128 includes a query for gross profit margin, the agent is trained from the query log to include a skill 132 that includes an indication of gross profit margin. Accordingly, the training set generated for the respective data model 120 includes the sample requests and/or database queries for gross profit margin.
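Comparing a query log against the data model, using the gross profit rule and its gross profit margin sub-rule described for rules 164, could be sketched as below. The `DERIVED_METRICS` rule table, `enhance_model`, and the logged SQL are hypothetical; an in-memory SQLite table only confirms that the added expression evaluates as intended.

```python
import sqlite3

# Hypothetical rule table: derived metric -> (entities it needs, SQL expression).
DERIVED_METRICS = {
    # Rule: gross profit is revenue minus expenses.
    "gross profit": ({"revenue", "expenses"}, "SUM(revenue) - SUM(expenses)"),
    # Sub-rule: gross profit margin is the ratio of gross profit to revenue.
    "gross profit margin": ({"revenue", "expenses"},
                            "(SUM(revenue) - SUM(expenses)) * 1.0 / SUM(revenue)"),
}


def enhance_model(metrics: dict[str, str], query_log: list[str]) -> dict[str, str]:
    """Return a copy of `metrics` extended with derived metrics mentioned in the query log.

    A derived metric is added only if every entity it needs is already in the model.
    Matching is by substring after mapping underscores to spaces, so a log entry that
    names "gross_profit_margin" also adds "gross profit", the rule the margin builds on.
    """
    enhanced = dict(metrics)
    log_text = " ".join(query_log).lower().replace("_", " ")
    for name, (required, expression) in DERIVED_METRICS.items():
        if name not in enhanced and name in log_text and required.issubset(metrics):
            enhanced[name] = expression
    return enhanced


metrics = {"revenue": "SUM(revenue)", "expenses": "SUM(expenses)"}
log = ["SELECT (SUM(revenue) - SUM(expenses)) * 1.0 / SUM(revenue) AS gross_profit_margin FROM sales"]
enhanced = enhance_model(metrics, log)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (revenue REAL, expenses REAL)")
conn.execute("INSERT INTO sales VALUES (200, 150)")
margin = conn.execute(f"SELECT {enhanced['gross profit margin']} FROM sales").fetchone()[0]
# margin == 0.25
```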
The above identified modules (e.g., data structures, and/or programs including sets of instructions) need not be implemented as separate software programs, procedures or modules, and thus various subsets of these modules may be combined or otherwise re-arranged in various embodiments. In some embodiments, memory 102 stores a subset of the modules identified above. Furthermore, the memory 102 may store additional modules not described above. In some embodiments, the modules stored in the memory 102, or a non-transitory computer readable storage medium of memory 102, provide instructions for implementing respective operations in the methods described below. In some embodiments, some or all of these modules may be implemented with specialized hardware circuits that subsume part or all of the module functionality. One or more of the above identified elements may be executed by the one or more processors 176. In some embodiments, user device 300 includes one or more processors (e.g., as described with regard to processor 176; e.g., processor 374 of
It should be appreciated that the database 200 illustrated in
In some embodiments, the memory 202 of the database 200 stores:
Accordingly, the database entity store 208 stores one or more entities 210 of data stored on the database 200 (e.g., stored by stored data module 206). In some embodiments, the entities 210 are predefined by the data stored in the database 200 (e.g., a column is expressly labeled “Sales”), are extracted and/or extrapolated by a respective agent 112, are provided by a user of the system, or a combination thereof. For example, in some embodiments, one or more entities 210 are determined through a retrieved data model 120 associated with a respective database. Accordingly, these entities 210, or identifiers of entities, are stored for future reference.
In some embodiments, the database scope module 224 stores one or more database scopes 226 that define a scope of access to data that corresponds to data stored by one or more databases 200. This defined scope of data (e.g., one or more columns, tables, dimensions, relations, metrics, filters, pivots, and/or functions applied and/or available to apply to database 200) and/or the state of the selected subset of data (e.g., the presentation format and/or application state) is stored as a data bookmark. The bookmark includes a pointer that, when communicated to another user, is usable to access the defined scope of data.
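A bookmark that stores a defined scope together with its presentation state, and hands out a shareable pointer, might look like the sketch below. `DataScope`, `BookmarkStore`, and the `chart` field are hypothetical names; using an unguessable random string from Python's `secrets` module as the pointer is an assumption about how a pointer could be made safe to share.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class DataScope:
    """Hypothetical scope definition plus presentation state for a bookmark."""
    table: str
    columns: tuple[str, ...]
    filters: tuple[tuple[str, str], ...] = ()
    chart: str = "table"  # presentation format stored alongside the scope


class BookmarkStore:
    """Maps opaque pointers to saved scopes so a pointer can be shared with another user."""

    def __init__(self) -> None:
        self._scopes: dict[str, DataScope] = {}

    def save(self, scope: DataScope) -> str:
        # An unguessable random string serves as the shareable pointer.
        pointer = secrets.token_urlsafe(16)
        self._scopes[pointer] = scope
        return pointer

    def resolve(self, pointer: str) -> DataScope:
        # Raises KeyError for unknown pointers.
        return self._scopes[pointer]


store = BookmarkStore()
west_sales = DataScope("sales", ("region", "revenue"), (("region", "West"),), chart="bar")
pointer = store.save(west_sales)
shared = store.resolve(pointer)  # what another user obtains from the shared pointer
```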
In some embodiments, the database 200 includes a database query log 228. The database query log 228 is accessed by respective agents 112. In some embodiments, the respective agents 112 are trained based on a training set that includes information determined from the database query log 228, such as various roles of entities 210 in the data stored on the database 200. In some embodiments, the respective agents 112 also propose (e.g., extrapolate) new entities from these query logs for use in the training set.
In some embodiments, the database 200 includes the database access module 230 which facilitates (e.g., permits and/or restricts) access to data stored on the database. In some embodiments, access to the data stored on the database is limited by the one or more database scopes 226. In some embodiments, the database access module 230 stores at least one security token for controlling access to the one or more scopes of data defined by the database scopes 226. In some embodiments, a database scope 226 is associated with a particular user or group of users, and user access information 236 associated with the database scope is used to limit access to the scope of data defined by the database scope. In some embodiments, access to a database scope 226 is revoked by changing an entity stored by the database (e.g., at database scope 226 and/or user access information 236).
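As an illustration only, the access-control behavior described above could be sketched as follows, assuming that a scope is identified by a string ID, that user access information is a set of permitted user IDs, and that revocation is performed by rotating the stored security token. All names here (`DatabaseAccessModule`, `grant`, `can_access`, `revoke`) are hypothetical.

```python
import hmac
import secrets


class DatabaseAccessModule:
    """Hypothetical sketch of database access module 230 enforcing database scopes 226."""

    def __init__(self):
        self._tokens = {}       # scope ID -> current security token
        self._user_access = {}  # scope ID -> permitted user IDs (stands in for user access information 236)

    def grant(self, scope_id, users):
        """Associate a scope with users and issue a security token for it."""
        token = secrets.token_hex(16)
        self._tokens[scope_id] = token
        self._user_access[scope_id] = set(users)
        return token

    def can_access(self, scope_id, user, token):
        expected = self._tokens.get(scope_id)
        if expected is None or user not in self._user_access.get(scope_id, set()):
            return False
        # Constant-time comparison avoids leaking token contents through timing.
        return hmac.compare_digest(expected, token)

    def revoke(self, scope_id):
        """Revoke access by changing the stored entry: rotating the token invalidates old copies."""
        self._tokens[scope_id] = secrets.token_hex(16)


access = DatabaseAccessModule()
token = access.grant("scope-226-1", {"alice"})
alice_before = access.can_access("scope-226-1", "alice", token)  # True
bob_before = access.can_access("scope-226-1", "bob", token)      # False: not in user access info
access.revoke("scope-226-1")
alice_after = access.can_access("scope-226-1", "alice", token)   # False: token was rotated
```

Rotating the token mirrors the idea of revoking access by changing an entity stored by the database; editing the permitted-user set would be an equally valid way to revoke a single user.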
The above identified modules (e.g., data structures, and/or programs including sets of instructions) need not be implemented as separate software programs, procedures or modules, and thus various subsets of these modules may be combined or otherwise re-arranged in various embodiments. In some embodiments, memory 202 stores a subset of the modules identified above. Furthermore, the memory 202 may store additional modules not described above. In some embodiments, the modules stored in the memory 202, or a non-transitory computer readable storage medium of memory 202, provide instructions for implementing respective operations in the methods described below. In some embodiments, some or all of these modules may be implemented with specialized hardware circuits that subsume part or all of the module functionality. One or more of the above identified elements may be executed by the one or more processors 274. In some embodiments, user device 300 includes one or more processors (e.g., as described with regard to processor 176; e.g., processor 374 of
In some embodiments, the user database query store 306 is accessed by, or communicated to, an agent 112 that is associated with the corresponding user device 300. Using a query log provided by a user enables the respective agent to be trained based on a training set that includes information derived from the contents of the user database query store 306. In some embodiments, a database user query log 308 stores a history of user queries for the corresponding database (e.g., database user query log 308-1 stores a history of user queries for the corresponding database 200-1). In some embodiments, the user database query store 306 stores a history of conversations between the user device 300 and another user device or external server. For example, if a user discusses data with another user through an instant messaging application, and a history of this conversation is stored within the user device 300 (e.g., in the user database query store 306), this conversation history is accessible by a respective agent 112. Accessing this information allows the agent to be trained on these real, natural conversations by including this information in a respective training set. This training improves the ability of the agent to provide special-purpose (e.g., subject-matter-specific) responses to natural language queries on that respective database. The training set that includes information derived from these logs is also usable by other agents, which improves the abilities of those agents as well.
In some embodiments, user device 300 is, for example, a portable electronic device (e.g., portable communications device, tablet computer, laptop computer, and/or wearable device), desktop computer, and/or server computer.
Block 502.
With reference to block 502 of
Block 504.
Referring to block 504
In some embodiments, a respective agent 112 is trained to determine whether to access a first database 200-1 or a second database for responding to a user-input query. For example, if a first database 200-1 stores information related to sales at a state-wide level and a second database stores information related to sales at a country-wide level, the corresponding agent 112, which has access to both the first database and the second database, may determine whether to access the first database, the second database, or both databases to provide a response to the user-input query.
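A trained agent would make this decision with a learned model; the sketch below substitutes a simple keyword-overlap rule purely to illustrate the routing outcome. The database names `state_sales` and `country_sales` and their vocabularies are hypothetical, and the fallback of querying every database when nothing matches is an assumption.

```python
import re

# Hypothetical catalog: vocabulary suggesting each database's level of detail.
DATABASES = {
    "state_sales": {"state", "states", "california", "texas", "regional"},
    "country_sales": {"country", "countries", "national", "nationwide"},
}


def route(query):
    """Return the databases whose vocabulary overlaps the query; fall back to all of them."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    selected = [name for name, vocab in DATABASES.items() if words & vocab]
    return selected or list(DATABASES)


state_only = route("Sales by state last quarter")
both = route("Compare California sales with national sales")
fallback = route("Total sales this year")
```

Here `state_only` selects only the state-level database, `both` selects both databases because the query mixes state-level and national vocabulary, and `fallback` selects both because neither vocabulary matches.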
Block 506.
Referring to block 506 of
Blocks 508 and 510.
Referring to blocks 508 and 510 of
Block 512.
Referring to block 512 of
Block 514.
Referring to block 514 of
Block 516.
Referring to block 516 of
Block 518.
Referring to block 518 of
In some embodiments, training the agent 112 includes analyzing the one or more entities 210 of the data model 120 to create one or more new entities of the data model. For example, if a first entity is identified as a table listing revenue and a second entity is identified as a table listing costs, a third entity is created and identified as a table listing profits (i.e., revenue minus costs). In some embodiments, the created entity is stored in a corresponding database entity store 208.
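One way to sketch this derivation is a small rule table that maps a set of required entities to a new entity. For brevity the sketch simplifies the revenue and cost tables to metric entities with SQL expressions; the `Entity` class, the `DERIVATION_RULES` table, and the expression format are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    role: str        # e.g. "metric", "dimension", or "filter"
    expression: str  # hypothetical: SQL expression that computes the entity


# Hypothetical derivation rules: required entity names -> (new entity name, expression template).
DERIVATION_RULES = {
    frozenset({"revenue", "costs"}): ("profit", "{revenue} - {costs}"),
}


def derive_entities(entities):
    """Propose new metric entities whose inputs all exist in the data model."""
    by_name = {e.name: e for e in entities}
    proposals = []
    for required, (new_name, template) in DERIVATION_RULES.items():
        if required.issubset(by_name) and new_name not in by_name:
            expression = template.format(**{n: by_name[n].expression for n in required})
            proposals.append(Entity(new_name, "metric", expression))
    return proposals


model = [Entity("revenue", "metric", "SUM(revenue)"), Entity("costs", "metric", "SUM(costs)")]
print(derive_entities(model))
# [Entity(name='profit', role='metric', expression='SUM(revenue) - SUM(costs)')]
```

The check `new_name not in by_name` keeps the rule from proposing an entity that the data model already contains.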
Block 520.
Referring to block 520 of
Block 522.
Referring to block 522 of
Blocks 524 Through 530.
Referring to blocks 524 through 530 of
In some embodiments, analysis of a log is used to determine that an entity (e.g., a column) is used in a particular role. For example, in accordance with a determination that an entity is used in an aggregation (e.g., a sum), the entity is determined to be a metric; in accordance with a determination that an entity is used in a group by clause (e.g., a pivot), the entity is determined to be a dimension; in accordance with a determination that an entity is used in a where clause that includes the symbol “=”, the entity is determined to be a filter; and in accordance with a determination that an entity is used in a where clause that includes the symbol “>” or the term “between,” the entity is determined to be a metric.
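The role rules above can be sketched with regular expressions over logged SQL, with a majority vote across the log. This is a simplified heuristic, assuming single-statement queries and plain identifier column names; a production system would more likely use a real SQL parser. The function names `infer_roles` and `roles_from_log` are hypothetical.

```python
import re
from collections import Counter

AGGREGATES = r"(?:SUM|AVG|COUNT|MIN|MAX)"
FLAGS = re.IGNORECASE | re.DOTALL


def infer_roles(sql, columns):
    """Assign a role to each known column based on how a single logged query uses it."""
    roles = {}
    for col in columns:
        c = re.escape(col)
        if re.search(rf"{AGGREGATES}\s*\(\s*{c}\s*\)", sql, FLAGS):
            roles[col] = "metric"      # used inside an aggregation
        elif re.search(rf"GROUP\s+BY\b[^;]*\b{c}\b", sql, FLAGS):
            roles[col] = "dimension"   # used in a group by clause (pivot)
        elif re.search(rf"WHERE\b.*\b{c}\s*(?:>|BETWEEN\b)", sql, FLAGS):
            roles[col] = "metric"      # range comparison in a where clause
        elif re.search(rf"WHERE\b.*\b{c}\s*=", sql, FLAGS):
            roles[col] = "filter"      # equality test in a where clause
    return roles


def roles_from_log(log, columns):
    """Take the most frequent role observed for each column across the query log."""
    votes = {col: Counter() for col in columns}
    for sql in log:
        for col, role in infer_roles(sql, columns).items():
            votes[col][role] += 1
    return {col: counts.most_common(1)[0][0] for col, counts in votes.items() if counts}


log = [
    "SELECT state, SUM(sales) FROM orders WHERE year = 2023 GROUP BY state",
    "SELECT * FROM orders WHERE sales > 1000",
]
roles = roles_from_log(log, ["state", "sales", "year"])
```

For this log, `state` is classified as a dimension, `sales` as a metric, and `year` as a filter. Vote counts could also serve as a rough basis for the confidence levels described next.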
In some embodiments, a potential role that corresponds to an entity of a data model 120 is determined. In some embodiments, a confidence level (e.g., between 0 and 1) is assigned to a role that is determined to correspond to an entity. The confidence level indicates a degree of confidence of a role determined to correspond to an entity. In some embodiments, in accordance with a determination that a confidence level is above a first threshold (e.g., a high range threshold that is approximately 1, such as 0.9), a user is not required to validate the entity. In some embodiments, in accordance with a determination that a confidence level is below a second threshold (e.g., a low range threshold that is lower than the first threshold and approximately 0, such as 0.1), a user is presented with a list of suggested options and is prompted to enter a correct value. In some embodiments, in accordance with a determination that a confidence level is below a third threshold (e.g., a threshold, such as a mid-range threshold (e.g., 0.5), that is between the first threshold and the second threshold), a user is required to validate the entity (e.g., by disambiguating between a set of highest-rated propositions).
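The threshold logic above can be sketched as a single function, using the example values 0.9, 0.5, and 0.1. The low threshold is checked before the mid-range threshold because any value below 0.1 is also below 0.5. The description does not state what happens between the mid-range and high thresholds; this sketch assumes an optional confirmation for that band. The `Validation` enum and `validation_for` function are hypothetical.

```python
from enum import Enum


class Validation(Enum):
    NONE = "accept the role without user validation"
    SUGGEST = "present suggested options and prompt for a correct value"
    DISAMBIGUATE = "ask the user to choose among the highest-rated propositions"
    OPTIONAL = "assumption: optional confirmation (band not specified in the description)"


def validation_for(confidence, high=0.9, mid=0.5, low=0.1):
    """Map a role's confidence level to the user interaction it requires."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence > high:
        return Validation.NONE
    if confidence < low:  # checked before `mid` because every value below `low` is also below `mid`
        return Validation.SUGGEST
    if confidence < mid:
        return Validation.DISAMBIGUATE
    return Validation.OPTIONAL  # assumed behavior for mid <= confidence <= high


outcomes = [validation_for(c).name for c in (0.95, 0.7, 0.3, 0.05)]
```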
Block 532.
Referring to block 532 of
Block 534.
Referring to block 534 of
Blocks 536 Through 556.
Referring to block 536 of
Referring to block 550 of
In some embodiments, modifying the one or more entities of the data model includes modifying (e.g., automatically and/or in response to user input) data stored by the database. In some embodiments, a synonym (e.g., derived from a predetermined list of synonyms and corresponding terms) is substituted for a data value stored by the database. For example, if an identifier includes the abbreviated expression “CA” to refer to the state California, the data value is modified to substitute the predefined synonym “California” for the data value “CA.”
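A minimal sketch of the synonym substitution, assuming rows are represented as dictionaries and the predetermined synonym list is a simple mapping. The `SYNONYMS` table and `substitute_synonyms` function are hypothetical. The sketch returns modified copies of the rows; a caller would write them back to the database.

```python
# Hypothetical predetermined list of synonyms: stored value -> preferred term.
SYNONYMS = {"CA": "California", "TX": "Texas"}


def substitute_synonyms(rows, column, synonyms=SYNONYMS):
    """Return copies of `rows` with values in `column` replaced by their predefined synonyms."""
    return [{**row, column: synonyms.get(row[column], row[column])} for row in rows]


rows = [{"state": "CA", "sales": 100}, {"state": "Oregon", "sales": 50}]
updated = substitute_synonyms(rows, "state")
```

Values without an entry in the synonym list (here, "Oregon") pass through unchanged.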
Block 552.
Referring to block 552 of
Block 540.
Referring to block 540 of
Referring to block 554 of
Block 558.
Referring to block 558 of
Blocks 560 and 562.
Referring to blocks 560 and 562 of
Block 564.
Referring to block 564 of
Referring to
The user interface depicted in
The user interface depicted in
Users are enabled to configure skills 132 using the user interface illustrated in
Features of the present invention can be implemented in, using, or with the assistance of a computer program product, such as a storage medium (media) or computer readable storage medium (media) having instructions stored thereon/in which can be used to program a processing system to perform any of the features presented herein. The storage medium (e.g., memory 102, memory 190, memory 202, memory 290, memory 302, memory 390) can include, but is not limited to, high-speed random access memory, such as DRAM, SRAM, DDR RAM or other random access solid state memory devices, and may include non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices. Memory 202 optionally includes one or more storage devices remotely located from the CPU(s) 274. Memory 202, or alternatively the non-volatile memory device(s) within memory 202, comprises a non-transitory computer readable storage medium.
Stored on any one of the machine readable medium (media), features of the present invention can be incorporated in software and/or firmware for controlling the hardware of a processing system, and for enabling a processing system to interact with other mechanisms utilizing the results of the present invention. Such software or firmware may include, but is not limited to, application code, device drivers, operating systems, and execution environments/containers.
Communication systems as referred to herein (e.g., network interface 186) optionally communicate via wired and/or wireless communication connections. Communication systems optionally communicate with networks (e.g., network 20), such as the Internet, also referred to as the World Wide Web (WWW), an intranet and/or a wireless network, such as a cellular telephone network, a wireless local area network (LAN) and/or a metropolitan area network (MAN), and other devices by wireless communication. Wireless communication connections optionally use any of a plurality of communications standards, protocols and technologies, including but not limited to Global System for Mobile Communications (GSM), Enhanced Data GSM Environment (EDGE), high-speed downlink packet access (HSDPA), high-speed uplink packet access (HSUPA), Evolution, Data-Only (EV-DO), HSPA, HSPA+, Dual-Cell HSPA (DC-HSPDA), long term evolution (LTE), near field communication (NFC), wideband code division multiple access (W-CDMA), code division multiple access (CDMA), time division multiple access (TDMA), Bluetooth, Wireless Fidelity (Wi-Fi) (e.g., IEEE 802.11a, IEEE 802.11ac, IEEE 802.11ax, IEEE 802.11b, IEEE 802.11g and/or IEEE 802.11n), voice over Internet Protocol (VoIP), Wi-MAX, a protocol for e-mail (e.g., Internet message access protocol (IMAP) and/or post office protocol (POP)), instant messaging (e.g., extensible messaging and presence protocol (XMPP), Session Initiation Protocol for Instant Messaging and Presence Leveraging Extensions (SIMPLE), Instant Messaging and Presence Service (IMPS)), and/or Short Message Service (SMS), or any other suitable communication protocol, including communication protocols not yet developed as of the filing date of this document.
It will be understood that, although the terms “first,” “second,” etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the claims. As used in the description of the embodiments and the appended claims, the singular forms “a,” “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term “and/or” as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
As used herein, the term “if” may be construed to mean “when” or “upon” or “in response to determining” or “in accordance with a determination” or “in response to detecting,” that a stated condition precedent is true, depending on the context. Similarly, the phrase “if it is determined [that a stated condition precedent is true]” or “if [a stated condition precedent is true]” or “when [a stated condition precedent is true]” may be construed to mean “upon determining” or “in response to determining” or “in accordance with a determination” or “upon detecting” or “in response to detecting” that the stated condition precedent is true, depending on the context.
The foregoing description, for purpose of explanation, has been described with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the claims to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain principles of operation and practical applications, to thereby enable others skilled in the art to best utilize the embodiments with various modifications as are suited to the particular use contemplated.