The advantages of modern data-warehouse-as-a-service (DWaaS) systems have motivated many companies to migrate from on-premises warehouse systems (e.g., Teradata or Oracle) to cloud-native systems (e.g., Microsoft Azure SQL, Amazon RedShift and other database APIs). Adaptive data virtualization can reduce the costs of data warehouse migration by translating queries in real time from the on-premise system to the new vendor's SQL dialect. This shortens the time for migrating data warehouse applications from years to days or weeks. However, selecting an optimal target for database migration (or “re-platforming”) requires granular insight into the legacy system workload, and how that workload is generated by different database client applications.
Some embodiments of the invention provide a novel method for identifying client applications that use a database, and characteristics of these applications' workload on the database. The method of some embodiments receives a log for the database that identifies several queries that were performed on the database during a duration of time. In some embodiments, the log is a log that is maintained by the database, while in other embodiments the log is a log that one or more programs (e.g., one or more scripts) extract from another log maintained by the database.
The method then analyzes the received log to identify a set of client applications that each generated one or more queries to the database during the duration of time. In some embodiments, the received log includes for each query an application identifier (or combination of other attributes) that uniquely identifies an application that generated the query. In these embodiments, the method identifies the set of client applications from the log by first identifying the queries in the received log, and then using the application identifiers or other attributes to identify the application that generated the queries.
From the identified queries, the method in some embodiments identifies sets of queries performed by different applications. Based on a set of properties for each set of queries for each application, the method in some embodiments generates a complexity indicator that represents a level of difficulty to understand the set of queries, e.g. by a person or machine. In some embodiments, the complexity indicator also represents a level of difficulty in rewriting the set of queries for another database, such as a candidate database for migrating the data from the original database. In some embodiments, the complexity indicator also represents a level of difficulty in verifying the results of the rewritten query after it has been executed on the candidate database after migration.
As further described below, the method in some embodiments generates the complexity indicator for each application's queries by analyzing the queries individually to identify the components of the queries and generating complexity scores for the queries based on complexity values associated with the components and patterns within the queries. The method then includes the complexity indicators in a report in order to identify the complexity of understanding and rewriting each application.
In analyzing the queries, the method in some embodiments removes duplicate queries by only maintaining one query in several queries that have the same semantic structure, e.g., the same set of query components. The method removes duplicate queries as it does not need to assess duplicate queries to compute the same complexity score multiple times for two or more queries that have the same semantic structure. To identify duplicate queries, the method in some embodiments generates for each query a query identifier based on a set of components of the query, and then identifies two queries as duplicates when they have the same generated query identifier. In some embodiments, constants and semantically insignificant parts of the query are removed before generating the query identifier. A query's generated identifier in some embodiments is a hash value that is generated from the identifiers of the query components.
While removing duplicate queries, the method in some embodiments generates metadata (e.g., count values) regarding the removed duplicate queries, and then uses the generated metadata to express the complexity indicators for the queries associated with a client application. For instance, in some embodiments, the method of some embodiments computes a complexity score for each semantically unique query in the set of queries and then aggregates the computed complexity scores for the queries in the set, in order to obtain an aggregate complexity score that is a complexity indicator for the entire set of queries, i.e., the logical complexity of the database workload. This aggregation uses the metadata generated during the duplication removal in computing the aggregate complexity score.
To generate the complexity score for a query that references a set of one or more database objects in the database, the method of some embodiments identifies a set of properties associated with a set of components of the query and then computes, from the extracted set of properties, a complexity score that represents a level of difficulty in understanding and rewriting the query. In some embodiments, the set of properties include a grouping of the components according to a set of component types and a number of each component type in the database query. The number for each component type is used in some embodiments to calculate the complexity score for the query.
In some embodiments, each component type has a cost value associated with it and the complexity score for a query is computed by using the cost values of its identified component types. In some embodiments, the complexity score is computed as a weighted sum. In some of these embodiments, each component type's cost value is a weight value. In other embodiments, each component type's cost value is a normalizing cost score that is added with the other cost scores of the query's other component types through weight values.
In some embodiments, the identified set of components for a query include reference to a set of one or more objects in the database and at least one command that specifies an operation with respect to the set of referenced database objects. The identified set of properties for the query components in some embodiments includes a command type for the command and a set of attributes associated with the referenced database objects, such as the number of referenced database objects, the object types for the referenced database objects, etc. In some embodiments, a component type specifies whether the referenced object is one of a persistent table, a temporary table, a view, a column, a macro, and a stored procedure. The command type in some embodiments also specifies whether the command is a SQL command and a non-SQL command.
In some embodiments, the identified set of properties specify an operation type for the operation specified by the command. Examples of operation types include an inner join operation, an outer join operation, an implicit join operation and a partition operation. Also, in some embodiments, the identified set of properties include a function type for a function used in the operation specified by the command. Examples of function type or operation include a data filtering function, a data format function, a data manipulation function, and a case sensitivity function.
To identify the set of components of a query, the method of some embodiments performs a parsing operation to parse the query into an abstract query syntax from which the components are identified. To do this, the method in some embodiments associates query components and properties identified from the abstract query syntax to another set of query components and properties defined for a database-independent form.
Some embodiments of the method also identify a set of overall workload attributes that quantify the impact of performing the queries on the database. In some embodiments, the workload attributes are identified based on an analysis of operational metadata that is generated by the database during execution of the queries. In some embodiments, the operational metadata is included in the received log. The method then includes the workload attributes in a report in order to further characterize the workload.
Some of the workload attributes are associated with the identified applications. For example, these attributes include a count of the queries performed by each application, a count of database objects referenced by queries performed by each application, and a throughput of queries performed over a unit interval of time by each application. In some embodiments the count of queries is a count of unique queries after removing duplicate queries.
Some of the workload attributes are global attributes that describe the workload of the database overall. For example, these attributes include a number of queries in each of a set of query types, a count of database constructs (e.g., database objects and schemas) referenced by the queries, a total size of the database, a throughput of the queries performed on the database over a unit interval of time, and a count of unused database objects that were not referenced by the queries. In such embodiments, the count of the database constructs is a count of the number of constructs of different types. In some of these embodiments, the throughput of queries includes a peak throughput and an average throughput, and the unit interval time is expressed in minutes, hours, and days.
To report the complexity indicators and workload attributes, some embodiments of the method provide a graphical user interface. In some embodiments, the user interface allows the user to select a period of time and specific database applications for filtering the workload to be analyzed. In some embodiments, the user also specifies a candidate database system to evaluate as a re-platforming target.
The preceding Summary is intended to serve as a brief introduction to some embodiments of the invention. It is not meant to be an introduction or overview of all of the inventive subject matter disclosed in this document. The Detailed Description that follows and the Drawings that are referred to in the Detailed Description will further describe the embodiments described in the Summary as well as other embodiments. Accordingly, to understand all the embodiments described by this document, a full review of the Summary, Detailed Description and the Drawings is needed. Moreover, the claimed subject matters are not to be limited by the illustrative details in the Summary, Detailed Description and the Drawing, but rather are to be defined by the appended claims, because the claimed subject matters can be embodied in other specific forms without departing from the spirit of the subject matters.
The novel features of the invention are set forth in the appended claims. However, for purposes of explanation, several embodiments of the invention are set forth in the following figures.
Some embodiments of the invention provide a novel method for identifying client applications that use a database, and characteristics of these applications' workload on the database. In some embodiments, the method is performed by a database insight engine for informing decisions on re-platforming the original database to a different candidate database. The database insight engine of some embodiments receives a log for the database that identifies a number of queries that were performed on the database during a duration of time. These queries are expressed in some embodiments as Structured Query Language (SQL) queries, and each query includes multiple SQL statements. In other embodiments the queries are expressed using other APIs or commands.
The database is accessed by at least one client system 112 that runs an application 115, which executes queries 117 on the database. In some embodiments, a client system runs more than one application, each of which executes its own queries. The objects stored in the database are described by the database schema 120 (also referred to as the database catalog), and the database generates a query log 125 and operational metadata 130 while performing the queries from the client applications 115. The log and metadata describe the database workload that the application queries place on the database.
The process 200 begins at 205 by receiving the database query log 125, schema 120, and operational metadata 130 from the database 110 for analysis. The process 200 performs pre-processing operations at 210 on the log, including consolidating queries across multiple log entries, removing duplicate queries, and organizing the queries according to client-database sessions, as will be described in further detail below.
The process 200 analyzes at 215 the queries in the log 125, in order to identify the applications that perform the queries. Each query in the log 125 is associated with an application identifier (or other attributes) that identifies which application issued the query. These application identifiers or attributes are unique to each application that accesses the database. The process 200 further analyses at 217 the operational metadata 130, in order to identify attributes of the workload. These workload attributes include throughput, data size, and unused data.
The process 200 also interprets at 220 each query in the log 125 to identify query attributes, query components, and properties of the query components. These properties are in some embodiments used to generate indicators that describe the complexity of the database workload on a per-query and a per-application basis. In some embodiments, the insight engine provides the schema and the queries to a database virtualization system (DVS) 135. The DVS 135 analyzes each query by instantiating the objects from the schema 120 and then interpreting the queries using the instantiated objects. The DVS 135 identifies the components of each query, and provides the properties of the components to the insight engine 105.
The process 200 uses the properties and attributes to calculate at 225 insights into the database workload. In some embodiments, these insights include a complexity indicator for each query 117, which are aggregated to determine a complexity indicator for each application 115. The process 200 then provides (at 230) a report of the workload insights gained from the results of the analysis. In some embodiments, the report is provided on a user interface 145. The process 200 then ends.
In some embodiments, the log is a log that is maintained by the database 110, while in other embodiments the log is a log that one or more programs (e.g., one or more scripts) extract from another log maintained by the database. The log also includes in some embodiments the schema 120 and the operational metadata 130.
The process 300 begins by receiving through the user interface 145 (at 305) the address 405 of a server or host computer that hosts the database 110. In some embodiments, filenames 410 for individual tables of the database are also received, since these tables may be renamed by the database owner from their default (e.g., for security reasons).
The process also receives through the user interface 145 (at 310) filtering information. For example, the filtering information includes in some embodiments start and end dates 415 to specify a time interval for the analysis. The filtering information also includes in some embodiments specific applications (not shown) that the database owner wishes to evaluate.
After receiving all the information, the process 300 (at 315) uses the information to generate a data extraction script that is tailored to the database 110. In some embodiments, the process 300 generates the script when the user selects a control 420 (e.g., a button) on the user interface 145.
After generation, the script is downloaded by the database owner, who then executes it (at 320) on the database 110. In some embodiments the database owner provides administrative credentials through the user interface 145 (not shown), so that the process 300 can directly transfer and execute the generated script on the database 110. In other embodiments, the database owner manually uploads the script to the database, so that the database insight engine does not have access to the administrative credentials.
Executing the script creates an archive file as output. The process 300 receives at 325 the archive file, which in some embodiments contains a record of the queries issued on the database (e.g. the log), the database schema, and operational metadata generated by the database during query execution.
The process 300 receives at 325 the output archive from the database owner, who uploads it in some embodiments through a file selection control 425 of the user interface 145. In some embodiments, the database owner also specifies comments (e.g., indicating the context of the database, why it is being analyzed, etc.) via another user interface field 430. After selecting the file and entering any comments, the database owner indicates that the file is ready for upload and to initiate analysis by selecting another control (e.g., an “Upload and Run” button 435). The process 300 then performs the analysis at 330 and the process ends.
In some embodiments, the database owner has the option to save the file for later analysis (e.g., by selecting a “Save for Later” button 440). In such embodiments, the database owner can return to the user interface 145 at a later time, review the comments, select the desired uploaded archive, and select the user control to initiate the upload.
As discussed above with reference to operation 210 in
The process 500 begins at 505 by consolidating individual queries in the log. In certain database systems (e.g., Teradata) the logs are stored in multiple separate tables, such as a database query log (DBQL) log table that contains information about the queries being logged, and a DBQL SQL table that contains raw query text. Further, the SQL table store the raw text of a single (large) query across multiple rows in some cases, due to field size limitations. Therefore, individual query entries must be identified across the tables and the rows, (e.g., using a database-assigned query identifier), and then stitched together into a single entry prior to analysis. In some embodiments, this stitching is performed using a query identifier that is assigned by the database during generation of the log files during execution of the queries. In such embodiments, this database-assigned query identifier is unique for all individual queries, even if two queries are duplicates of each other.
The process further removes at 510 duplicate queries from the query log based on the text of the query.
The process 600 determines at 615 whether the text of the identified duplicates contains data definition language (DDL) statements. In some embodiments, queries that contain DDL statements (e.g., CREATE, ALTER, DROP, etc.) are not de-duplicated, because they may have an effect on subsequent queries in the session. If the process determines that the duplicate contains DDL statements, then the process returns to 610 and identifies the next set of duplicates.
If the process 600 determines that the query does not contain DDL statements, then the process applies at 620 a set of duplication rules. Rules are used in some embodiments to classify as unsafe for duplication those queries that may potentially affect objects, e.g. rules based on values of the statement type, the statement group, and the presence of semicolons in the query text (which potentially indicate multiple statement boundaries). Conversely, basic DML statements like INSERT and SELECT will not impact the analysis of other queries or affect database objects and are classified as safe for de-duplication based on these rules.
After applying the rules, the process 600 determines at 625 if the query was marked safe for de-duplication. If the query is not safe for de-duplication, the process returns to 610 and identifies the next set of duplicates.
If the query was marked safe for de-duplication, then the process 600 generates and stores metadata at 630 regarding the removed duplicate queries. The duplicates may have been issued by different “constituents” (e.g., applications, clients, or users). Accordingly, the metadata may include identifiers for each of these possible constituents, so that the removed queries may still be properly accounted for their contribution to the database workload from each constituent. The process 600 then consolidates at 635 the duplicate queries into a single entry.
After consolidating the query, the process 600 determines at 640 if there are any more identified duplicates. If so, then the process returns to 610 and identifies the next set of duplicates. If not, then the process 600 ends. In some embodiments, duplicate queries are removed on a per-log basis, whereas in other embodiments they are removed on a per-application basis.
Returning to
To construct the actual sessions, the process 700 assigns at 710 each query in the log to different sessions, based on the session identifier. All the queries in a session accordingly share the same session identifier.
For each constructed session, the process 700 calculates at 715 the session's size. In some embodiments, the size is calculated as the total number of queries in each session. In other embodiments, the size is calculated as the total number of characters of all the queries in each session. The sessions are then sorted in order of decreasing size, so that the largest sessions are processed first.
Within each session, the queries are sorted at 720 in order of ascending execution order (e.g., start time). This ensures that the queries are processed within a session in the same order that they were executed on the database 110. The process 700 then ends.
As a result of the process 700, for each session, the queries are processed in ascending execution order, and the sessions are processed (in some embodiments, by DVS 135 worker processes) in parallel. As each worker process completes processing a session, it is assigned the next largest session, as determined by the session size. In these embodiments, the largest sessions are started first, and smaller ones are started when there is room for them, to help all workers run out of sessions to process at roughly the same time.
Due to the de-duplication and the parallel processing of the sessions, in some embodiments the session reconstruction also results in temporal compression. Since the log could potentially contain queries over a substantial period of time (e.g. weeks or months, if not longer), real-time processing would take a comparable amount of time if the queries are processed in sequential order. By removing duplicate queries and reconstructing the sessions from the log for parallel processing, the queries are processed in time-accelerated fashion.
After completing the above pre-processing steps, the process 500 ends. As discussed above with reference to operation 215 in
As discussed above with reference to operation 220 in
To identify the attributes or components of a query, the database insight engine of some embodiments performs a parsing operation to parse the query into an abstract query syntax from which the attributes and components can be identified. To do this, the database insight engine in some embodiments associates query components and properties identified from the abstract query syntax to another set of query components and properties defined for a database-independent form. In some embodiments, this association is performed as part of a transforming operation that a transformer module of the database insight engine performs on the query. In other embodiments, this association is performed as part of a transforming or binding operation that a binding module of the database insight engine performs on the query. Both the transformer and the binder modules will be described in further detail below with reference to
In some embodiments, the database insight engine uses an adaptive database virtualization system (DVS), such as Datometry Hyper-Q, for analyzing the queries (e.g., performing the parsing, binding, and transforming operations described above) and identifying properties of the query components.
The process 900 begins by instantiating at 905 the schema objects received from the database insight engine. The process 900 then receives at 910 a single query 805 for analysis. As noted above, in some embodiments this query belongs to a certain session 810 that has been assigned to the process 900, and is being processed in order of ascending execution order relative to other queries in the session.
The process 900 receives the query 805 at a cross compiler module 806 that coordinates query processing by a query interpreter module 807 and a query translator module 808. The process 900 parses at 915 the raw text of the received query 805 into an abstract syntax 815. In some embodiments, the parsing operation is performed by a parser module 820 of the query interpreter 807. During parsing, the parser 820 performs a syntactical interpretation operation on the received query 805 to convert the query to the abstract syntax representation. One example of such a syntactical interpretation operation is a text replacement operation for converting a non-standard SQL command keyword (e.g., SEL) into standard SQL equivalents (e.g., ansi_select).
The process 900 associates at 920 the parsed query 815 to an algebraic representation 825 that is in a form that is independent of any database (e.g., an SQL representation). In some embodiments, this association is a part of a binding operation performed by a binder module 830 of the query interpreter 807. The binder module 830 performs a semantic interpretation operation on the parsed query, to convert it from the abstract syntax to the database-independent form. One example of this semantic interpretation operation is to reference names such as tables and views by performing metadata lookups and normalizing the query tree. In other embodiments, this association is part of a transforming operation, performed by either the binder module 830 or a transformer module 832 (which in some embodiments is a component of the Query Translator module 808). The transformer module 832 in some such embodiments performs the semantic interpretation operation on the parsed query, to convert it from the abstract syntax to the database-independent form. In some such embodiments, the transformer module 832 also converts the query from the database-independent form to a different query syntax.
The process 900 identifies at 925 the components of the query 805. In some embodiments, these components include query attributes such as size of the query, number of lines of code, objects (e.g. tables) that are touched by the query, and other query-specific information. In some embodiments, at least some of the identified components are syntactic components identified using the abstract syntax. Examples of such syntactic components include commands (e.g., ansi_select), operators (e.g., EQ), and constants (e.g. integer value 789). In some such embodiments, these syntactic components are identified by the parser module 820. Alternatively or conjunctively, at least some of the identified components are semantic components identified using the database-independent form. Examples of semantic components include SQL commands (e.g., select), referenced database objects (e.g., CUSTOMER) and temporary tables (e.g., TODAYSALE). In some such embodiments, these semantic components are identified by the binder module 830 of the query interpreter 807. In other such embodiments, these semantic components are identified by the transformer module 832 of the query translator 808.
The process 900 identifies at 930 various properties associated with each identified component. Examples of such properties include component identifiers, component types (object, operator, etc.), the count of the number of occurrences of the component in the query text, and base tables associated with the component. In some embodiments, the properties are identified using a property derivation framework (not shown) that is populated when the schema objects from the database 110 were instantiated. In such embodiments, the parser module and/or the binder module use the property derivation framework to derive properties such as query type (select/insert/delete, etc.), output columns of operators, and data types of expressions.
In some embodiments, the semantic components and/or properties are identified by the query translator module 808, e.g. by a serializer component (not shown) and/or a query transformer 832 component of the query translator 808. In some such embodiments, the binder module 830 provides the abstract syntax 825 to the query translator module 808 (as indicated by the dashed line). The query translator then provides the components and/or properties to the cross compiler 806 for returning to the database insight engine 105.
The process 900 also generates at 935 a unique identifier 835 for the query. The identifier is generated in some embodiments by a hasher component 840 of the query interpreter 807, that computes a hash value from the query. For example, in some embodiments, the hash value is calculated by treating the entire query as a single string. In some embodiments, the query string is the abstract syntax 815 received from the parser 820. In other embodiments, the query string is the database-independent form 825 received from the binder 830 (as indicated by the dashed line). In other embodiments, the query identifier is calculated as a Merkle tree over nodes of the abstract syntax 815 or the database-independent form 825.
In still other embodiments, the hash value is calculated from the identified query components and properties. For example, the hash value is calculated from the query components by hashing each component's identifier. In some such embodiments, the component identifiers are sorted into a particular order (using various criteria such as alphabetical, length, order of operation priority, etc.). In some such embodiments the component identifiers include semantic component identifiers, syntactical component identifiers, or some combination of both types of identifiers. These component identifiers are identified in some embodiments from the database-independent form instead of the database-dependent syntax (e.g., the abstract syntax).
After computing the query identifier and identifying the query components and properties, the process 900 provides them at 940 to the cross compiler 806. The cross compiler than returns them to the database insight engine 105. In some embodiments, the results are provided in a data storage structure 845 that contains the identified query components and their properties, as well as the unique query identifier 840. The process 900 then ends.
As described above, in some embodiments, the query identifier is calculated from the query components by the database virtualization system (DVS) 135, which then provides it to the database insight engine 105. However, in some embodiments, the DVS cannot identify all the query components or fully represent the query in the abstract syntax and/or the database-independent form. For example, the query contains in some embodiments non-standard SQL features (e.g., macros, stored procedures, updateable views), data types (e.g., PERIOD), column properties (e.g., case sensitivity) or other features. In such a case, the DVS still calculates a hash value from the raw text 805 of the query rather than the query components. In other embodiments, the database insight engine itself calculates the hash value in case of an error from the DVS 135. In such embodiments, the process 900 also returns the error at 940 to the database insight engine.
As discussed above with reference to operation 225 in
The complexity indicator in some embodiments is a category of complexity, e.g. low, medium, and high. In some embodiments, the complexity indicator represents a complexity expression of the database query, that is a value that quantifies the degrees of complexity of a database query. In some embodiments, the complexity expression of individual queries is also aggregated over the query workload from a specific application or other constituent, or over the entire database workload. These degrees of complexity describe in some embodiments the understandability of the query by a human or a machine, such as understanding what the query does, what its components are, etc. In some embodiments, the degrees of complexity alternatively or conjunctively describe how difficult it is to rewrite the query to have the same result, when executed on or submitted to a different database than its original database (for which the query was natively intended to run prior to rewriting). The degrees of complexity also describe in some embodiments the verifiability of the query's results, if rewritten and executed on another database, in comparison to the results of executing the query on the original database.
In some embodiments, the complexity indicator is quantified by the database insight engine with a numeric complexity score that is useful as a proxy for migration and query/application rewriting cost and other variables. The overall complexity of a workload is a combination of the complexity of the individual queries in the workload mix and the complexity of the underlying schema.
As further described below, the database insight engine in some embodiments calculates the complexity score for each application's queries by analyzing the queries individually to identify the components of the queries, and generating complexity scores for the individual queries based on complexity values associated with the components and patterns within the queries.
In embodiments where the query includes a number of individual statements, the query complexity score is calculated based on the aggregate complexity scores of the query's statements, based on the components of each statement. Complexity scores are also calculated in some embodiments for objects such as views, for example based on queries that define the object, and/or queries that reference the object.
Individual complexity scores for queries are aggregated in some embodiments by application, client, and other dimensions of the workload. These aggregate complexity scores are used in some embodiments as a proxy for the database resources and cost required to support the database workload, and to assess the relative cost in resources, time, and money for migrating some or all of the workload to a different database. The database insight engine then includes the aggregate complexity scores in a report in order to identify the complexity of understanding and rewriting each application. In some embodiments, the aggregate complexity scores are also used as a proxy to assess the relative cost in resources, time, and money for verifying the results from queries executed by the rewritten applications on the new database after migration.
In analyzing the queries, the database insight engine in some embodiments removes duplicate queries by only maintaining one query out of several queries that have the same semantic structure, e.g., the same set of query components. In some embodiments, these duplicates do not have identical text (e.g., syntactical structure). Two queries may have different syntactical structure and still have identical semantic structure, and thus be identified as duplicates. As an example, one query may define a variable “A=5” and another query may define the same variable as “A=10,” in which case the two queries are semantically both assigning a constant value to the variable. These constants are semantically insignificant parts of the query. Hence, these queries are considered in some embodiments to be semantic duplicates even though the syntactical components (namely, the constant value) are different. The database insight engine removes semantically duplicate queries from the complexity calculation in some embodiments, as it does not need to assess duplicate queries to compute the same complexity score multiple times for two or more queries that have the same semantic structure.
While removing duplicate queries, the database insight engine in some embodiments generates metadata (e.g., count values) regarding the removed duplicate queries, and then uses the generated metadata to express the complexity indicators for the queries associated with a client application.
In some embodiments, complexity indicator is a complexity score that is calculated by the database insight engine for each semantically unique query in the set of queries. The complexity scores are then aggregated over all the queries in the set, in order to obtain an aggregate complexity score that represents the level of difficulty in understanding the queries, rewriting the queries for another database, and verifying the results of the rewritten queries when executed on the other database after migration. This aggregation uses the metadata generated during the duplication removal in computing the aggregate complexity score, so that the complexity score contribution of a semantically unique query that is issued by multiple applications is still properly counted towards all those applications' aggregate complexity scores.
In some embodiments, the received properties include a grouping of the components according to a set of component types and a number of each component type in the database query. The number for each component type is used in some embodiments to calculate the complexity score for the query. In some embodiments, the database insight engine 105 counts the number of each component type, instead of receiving it from the DVS 135.
After receiving the query identifier and the properties, the process 1000 extracts at 1010 an application identifier that identifies which application issued the query. The identifier is extracted from the log in some embodiments prior to processing the query by the DVS, or is returned in other embodiments as a query property after processing by the DVS. In some embodiments, the identifier is a constituent identifier that identifies another type of constituent that issued the query instead of an application, e.g. a database user or a database client.
The process 1000 uses the received query identifier to determine at 1015 whether the query is a unique query. In some such embodiments, the query identifier incorporates the application identifier (or another constituent identifier). If the process determined that the query identifier matched a previously stored query identifier, then the query is considered a duplicate of at least one previously processed query. The process 1000 then stores metadata (at 1020) that is associated with the duplicate query (e.g., the application identifier or other constituent identifiers) to ensure that during aggregation (described below) the complexity of the duplicate is correctly assigned to any applications or other constituents for which the complexity score is being aggregated. The process 1000 then continues to 1045 which is described below.
If the process determined that the query identifier did not match a previously stored query identifier, then the query is considered a unique query that is not a duplicate of any previously processed query. The process 1000 then stores at 1025 the unique query identifier for comparison to future queries.
The complexity score for the query is computed at 1030. In some embodiments, each of the component types has a cost value associated with it, and the complexity score is calculated as a weighted sum of the cost values for the query components. In some of these embodiments, each component type's cost value is the weight value. In other embodiments, each component type's cost value is added with the other cost scores of the query's other component types using separate weight values for normalization. In some embodiments, the cost values depend only upon the original database whose workload is being analyzed. In other embodiments, however, the cost values also depend on what target database is being considered as a candidate for migration.
In some embodiments the weights are assigned using a lookup table. The weight values in the lookup table is pre-calibrated in some such embodiments by reference to a standardized query workload used for database performance testing, such as TPC-H. In other such embodiments, the calibration of the weights is performed using a linear regression.
Alternatively or conjunctively, the query complexity score is calculated at 1030 by using other properties of the query components besides counts of component types. In some embodiments (not shown), the process 1000 stores the computed query complexity score as query metadata associated with the unique query identifier.
In some embodiments, the complexity score represents an intrinsic complexity of the original query (or aggregate of queries, by application, other constituent, or total workload). A query that is multiple pages long is more complex for example than a single line query, in most cases. Alternatively or conjunctively, the complexity score also represents the complexity of understanding and rewriting the query for a different database, in the scenario of database migration. As a result the complexity score (or a portion of the score, or a second complementary score) varies in some such embodiments based on what target database for migration is selected (provided, for example, as an argument to the database virtualization system 135 by the database insight engine 105 when providing a query for analysis).
The complexity score for the query is then added at 1035 to an aggregate complexity score for the application identified by the application identifier (at 1010), to compute the total complexity score for each identified application. In some embodiments, the complexity scores are aggregated as a running total that is updated as each query is processed. In other embodiments (not shown), the complexity scores of the entire query cohort, or in some embodiments a session cohort, are processed first, and then the results of the cohort are processed to aggregate them by application identifier. In some embodiments, other aggregations are also performed, such as by user of the database, by database client, or any other constituents for which there are identifiers, which were identified during operation 1010.
As discussed above, in some embodiments the complexity scores are also dependent on the candidate database for migration. In some embodiments, these aggregate complexity scores are in some embodiments a compatibility score, that represents the compatibility of the workload from the application or other constituent, or the overall workload as a whole, with a specific candidate database. In other embodiments, such as those where the complexity scores are not dependent on the candidate database, the aggregate complexity scores are a portability score that represents the portability of the workload from the application or other constituent, or the portability of the overall workload as a whole.
In some embodiments, duplicate queries are not included in these aggregate complexity scores in some embodiments. This avoids an inflation of the complexity score due to repeated queries. These duplicates are effectively removed from the aggregation process by the determination at 1015. In other embodiments, duplicate queries are included in the aggregate complexity score. In such embodiments, the complexity score for the duplicated query is retrieved (not shown) from the storage of the query metadata, where it would have been stored during operation 1030. In some embodiments, duplicates of certain types of queries are included in the aggregate complexity scores, whereas duplicates of other types of queries are not included in the aggregates.
The process 1000 uses the application identifier to increment at 1040 a query counter (i.e., metadata) associated with the application, in order to keep track of the total number of queries issued by the application. Query counters for other identified constituents are also incremented as available. In some embodiments, if the query had been determined to be a duplicate in operation 1015, then the query will still be included in the query count for the application or constituent. In other embodiments, duplicate queries are excluded from the query count, or excluded based on the type of query. The process 1000 then ends.
In some embodiments, the identified set of components for a query includes a reference to a set of one or more objects in the database, and at least one command that specifies an operation with respect to the set of referenced database objects. The identified set of properties for the query components in some embodiments includes a command type for the command and a set of attributes associated with the referenced database objects, such as the number of referenced database objects, the object types for the referenced database objects, etc. In some embodiments, a component type specifies whether the referenced object is one of a persistent table, a temporary table (e.g., global temporary tables), a view (including updateable views), a column, a macro, and a stored procedure.
The command type in some embodiments also specifies whether the command is a SQL command and a non-SQL command. Some examples of SQL command types are SET, SELECT, INSERT, UPDATE, MERGE, DELETE, CREATE, GROUP BY, and UPSERT. Some examples of non-SQL command types are a bulk load command, a bulk export command, a bulk import command, and a backup command.
In some embodiments, the identified set of properties specify a database operation type for the operation specified by the command. Examples of operation types include an inner join operation, an outer join operation, an implicit join operation, and a partition operation (e.g., a table partitioning clause). Also, in some embodiments, the identified set of properties includes a function type for a function used in the operation specified by the command. Examples of function type include a data filtering function, a windowing function, a data format function, a string format function, a date manipulation function, a case sensitivity cast function, a subquery (including correlated subqueries and vector subqueries), aggregate functions, and a recursive construct.
The identified properties for a query also include in some embodiments the number of lines of code in the query and in any macros or stored procedures, syntax differences that can be replaced syntactically (i.e. regex, search and replace, etc.), and the deepest nesting level of joins and/or subqueries. They also include in some embodiments unsupported scalar types, exotic types (e.g. BLOBs), SET tables, default expressions, unique constraints, columns with DATETIME precision, and triggers.
As discussed above with reference to operations 217 and 225 in
Some of the workload attributes are associated with the identified applications. For example, these attributes include a count of the queries performed by each application, a count of database objects referenced by queries performed by each application, and a throughput of queries performed over a unit interval of time by each application. In some embodiments the count of queries is a count of unique queries after removing duplicate queries.
Some of the workload attributes are global attributes that describe the workload of the database overall. For example, these attributes include a number of queries in each of a set of query types, a count of database constructs (e.g., database objects and schemas) referenced by the queries, a total size of the database, a throughput of the queries performed on the database over a unit interval of time, and a count of unused database objects that were not referenced by the queries. In such embodiments, the count of the database constructs is a count of the number of constructs of different types. In some of these embodiments, the throughput of queries includes a peak throughput and an average throughput and the unit interval time is expressed in minutes, hours, and days.
As discussed above with reference to operation 217 in
In some embodiments, the report generated by the database insight engine provides a concise summary of insights into the database workload to inform a user decision to re-platform the original database. Re-platforming the database involves migrating the stored data in the database to a new database, and migrating applications that have access to the database so that they still work in relation to the migrated data on the new database. In some embodiments, the database is re-platformed to a different database type which may or may not support the same features as the original database, but may provide cost or performance advantages. In other embodiments, the database is re-platformed to a new database of the same type, in order to optimize the database and applications to obtain the performance or cost advantages. The report generated by the database insight engine provides insight into which new databases makes the best candidates for re-platforming, which apps are most essential, and how difficult (e.g., in terms of time or cost) the migration process will be.
In some embodiments, the report includes an executive summary of what was analyzed, such as the version of the database, and the time period being analyzed. In some embodiments, the report is organized into different “buckets” of information tailored for different purposes. These buckets include functional, operational, governance, and performance-related purposes.
In some embodiments, the report also includes a breakdown by type of statement.
As conceptually illustrated in
If a specific new database was specified, then in some embodiments the report also provides an assessment of how suitable it will be for the original database workload. For example, in some embodiments the assessment includes whether the new database system can handle the overall size, characteristics, and throughput of the database workload. The report also includes in some embodiments an assessment of whether the new database should be compute-heavy or storage-heavy. In other embodiments where the user does not specify a desired database system, the database insight engine provides a recommendation of a new database system based on a best match to the characteristics of the workload being analyzed.
Many of the above-described features and applications are implemented as software processes that are specified as a set of instructions recorded on a computer readable storage medium (also referred to as computer readable medium). When these instructions are executed by one or more processing unit(s) (e.g., one or more processors, cores of processors, or other processing units), they cause the processing unit(s) to perform the actions indicated in the instructions. Examples of computer readable media include, but are not limited to, CD-ROMs, flash drives, RAM chips, hard drives, EPROMs, etc. The computer readable media does not include carrier waves and electronic signals passing wirelessly or over wired connections.
In this specification, the term “software” is meant to include firmware residing in read-only memory or applications stored in magnetic storage, which can be read into memory for processing by a processor. Also, in some embodiments, multiple software inventions can be implemented as sub-parts of a larger program while remaining distinct software inventions. In some embodiments, multiple software inventions can also be implemented as separate programs. Finally, any combination of separate programs that together implement a software invention described here is within the scope of the invention. In some embodiments, the software programs, when installed to operate on one or more electronic systems, define one or more specific machine implementations that execute and perform the operations of the software programs.
The bus 1205 collectively represents all system, peripheral, and chipset buses that communicatively connect the numerous internal devices of the electronic system 1200. For instance, the bus 1205 communicatively connects the processing unit(s) 1210 with the read-only memory 1230, the system memory 1225, and the permanent storage device 1235.
From these various memory units, the processing unit(s) 1210 retrieve instructions to execute and data to process in order to execute the processes of the invention. The processing unit(s) may be a single processor or a multi-core processor in different embodiments.
The read-only-memory (ROM) 1230 stores static data and instructions that are needed by the processing unit(s) 1210 and other modules of the electronic system. The permanent storage device 1235, on the other hand, is a read-and-write memory device. This device is a non-volatile memory unit that stores instructions and data even when the electronic system 1200 is off. Some embodiments of the invention use a mass-storage device (such as a magnetic or optical disk and its corresponding disk drive) as the permanent storage device 1235.
Other embodiments use a removable storage device (such as a floppy disk, flash drive, etc.) as the permanent storage device. Like the permanent storage device 1235, the system memory 1225 is a read-and-write memory device. However, unlike storage device 1235, the system memory is a volatile read-and-write memory, such as random-access memory. The system memory stores some of the instructions and data that the processor needs at runtime. In some embodiments, the invention's processes are stored in the system memory 1225, the permanent storage device 1235, and/or the read-only memory 1230. From these various memory units, the processing unit(s) 1210 retrieve instructions to execute and data to process in order to execute the processes of some embodiments.
The bus 1205 also connects to the input and output devices 1240 and 1245. The input devices enable the user to communicate information and select commands to the electronic system. The input devices 1240 include alphanumeric keyboards and pointing devices (also called “cursor control devices”). The output devices 1245 display images generated by the electronic system. The output devices include printers and display devices, such as cathode ray tubes (CRT) or liquid crystal displays (LCD). Some embodiments include devices such as a touchscreen that function as both input and output devices.
Finally, bus 1205 also couples electronic system 1200 to a network 1265 through a network adapter (not shown). In this manner, the computer can be a part of a network of computers (such as a local area network (“LAN”), a wide area network (“WAN”), or an Intranet, or a network of networks, such as the Internet. Any or all components of electronic system 1200 may be used in conjunction with the invention.
Some embodiments include electronic components, such as microprocessors, storage and memory that store computer program instructions in a machine-readable or computer-readable medium (alternatively referred to as computer-readable storage media, machine-readable media, or machine-readable storage media). Some examples of such computer-readable media include RAM, ROM, read-only compact discs (CD-ROM), recordable compact discs (CD-R), rewritable compact discs (CD-RW), read-only digital versatile discs (e.g., DVD-ROM, dual-layer DVD-ROM), a variety of recordable/rewritable DVDs (e.g., DVD-RAM, DVD-RW, DVD+RW, etc.), flash memory (e.g., SD cards, mini-SD cards, micro-SD cards, etc.), magnetic and/or solid state hard drives, read-only and recordable Blu-Ray® discs, ultra-density optical discs, any other optical or magnetic media, and floppy disks. The computer-readable media may store a computer program that is executable by at least one processing unit and includes sets of instructions for performing various operations. Examples of computer programs or computer code include machine code, such as is produced by a compiler, and files including higher-level code that are executed by a computer, an electronic component, or a microprocessor using an interpreter.
While the above discussion primarily refers to microprocessor or multi-core processors that execute software, some embodiments are performed by one or more integrated circuits, such as application specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs). In some embodiments, such integrated circuits execute instructions that are stored on the circuit itself.
As used in this specification, the terms “computer”, “server”, “processor”, and “memory” all refer to electronic or other technological devices. These terms exclude people or groups of people. For the purposes of the specification, the terms display or displaying means displaying on an electronic device. As used in this specification, the terms “computer readable medium,” “computer readable media,” and “machine readable medium” are entirely restricted to tangible, physical objects that store information in a form that is readable by a computer. These terms exclude any wireless signals, wired download signals, and any other ephemeral signals.
While the invention has been described with reference to numerous specific details, one of ordinary skill in the art will recognize that the invention can be embodied in other specific forms without departing from the spirit of the invention. In addition, at least one figure conceptually illustrates a process. The specific operations of this process are not performed in some embodiments in the exact order shown and described. The specific operations are not performed in one continuous series of operations, and different specific operations are performed in different embodiments. Furthermore, the process could be implemented using several sub-processes, or as part of a larger macro process. Thus, one of ordinary skill in the art would understand that the invention is not to be limited by the foregoing illustrative details, but rather is to be defined by the appended claims.
Number | Name | Date | Kind |
---|---|---|---|
5970490 | Morgenstern | Oct 1999 | A |
6108649 | Young et al. | Aug 2000 | A |
6704747 | Fong | Mar 2004 | B1 |
6889227 | Hamilton | May 2005 | B1 |
7123608 | Scott et al. | Oct 2006 | B1 |
7146376 | Dettinger et al. | Dec 2006 | B2 |
7203678 | Petropoulos et al. | Apr 2007 | B1 |
7263590 | Todd et al. | Aug 2007 | B1 |
7346635 | Whitten et al. | Mar 2008 | B2 |
7359916 | Werner | Apr 2008 | B2 |
7480661 | Seefeldt et al. | Jan 2009 | B2 |
7620665 | George et al. | Nov 2009 | B1 |
7664795 | Balin et al. | Feb 2010 | B2 |
7693913 | Goto | Apr 2010 | B2 |
7805583 | Todd et al. | Sep 2010 | B1 |
7877397 | Nagarajan et al. | Jan 2011 | B2 |
7885968 | Hamamatsu et al. | Feb 2011 | B1 |
8015233 | Li et al. | Sep 2011 | B2 |
8275810 | Barton | Sep 2012 | B2 |
8429601 | Andersen | Apr 2013 | B2 |
8447774 | Robie et al. | May 2013 | B1 |
8533177 | Huck et al. | Sep 2013 | B2 |
8612468 | Schloming | Dec 2013 | B2 |
8631034 | Peloski | Jan 2014 | B1 |
8793797 | Meenakshisundaram | Jul 2014 | B2 |
8874609 | Singh et al. | Oct 2014 | B1 |
8898126 | Dai | Nov 2014 | B1 |
8918416 | Gross et al. | Dec 2014 | B1 |
8943181 | Kasten et al. | Jan 2015 | B2 |
8972433 | McLean et al. | Mar 2015 | B2 |
8996555 | Kuchmann-Beauger et al. | Mar 2015 | B2 |
9075811 | Nayyar et al. | Jul 2015 | B2 |
9158827 | Vu et al. | Oct 2015 | B1 |
9201606 | Taylor et al. | Dec 2015 | B1 |
9208032 | McAlister et al. | Dec 2015 | B1 |
9305070 | Zhu et al. | Apr 2016 | B2 |
9697484 | Mohen et al. | Jul 2017 | B1 |
9785645 | Chen et al. | Oct 2017 | B1 |
9811527 | Esposito et al. | Nov 2017 | B1 |
10108914 | Mohen et al. | Oct 2018 | B2 |
10120920 | Alva et al. | Nov 2018 | B2 |
10162729 | Snyder et al. | Dec 2018 | B1 |
10241960 | Kent, IV et al. | Mar 2019 | B2 |
10255336 | Waas et al. | Apr 2019 | B2 |
10261956 | Jugel et al. | Apr 2019 | B2 |
10445334 | Xiao et al. | Oct 2019 | B1 |
10585887 | Tran et al. | Mar 2020 | B2 |
10594779 | Waas et al. | Mar 2020 | B2 |
10614048 | Fuglsang et al. | Apr 2020 | B2 |
10628438 | Waas et al. | Apr 2020 | B2 |
10649965 | Barbas et al. | May 2020 | B2 |
10649989 | Lereya et al. | May 2020 | B2 |
10762100 | Waas et al. | Sep 2020 | B2 |
10762435 | Yang et al. | Sep 2020 | B2 |
10769200 | Lin | Sep 2020 | B1 |
10783124 | Barbas et al. | Sep 2020 | B2 |
10846284 | Park et al. | Nov 2020 | B1 |
11016954 | Babocichin et al. | May 2021 | B1 |
20010034733 | Prompt et al. | Oct 2001 | A1 |
20020087587 | Vos et al. | Jul 2002 | A1 |
20020133504 | Vlahos et al. | Sep 2002 | A1 |
20020161907 | Moon | Oct 2002 | A1 |
20030126156 | Stoltenberg et al. | Jul 2003 | A1 |
20030184793 | Pineau | Oct 2003 | A1 |
20040122953 | Kalmuk et al. | Jun 2004 | A1 |
20040139061 | Colossi et al. | Jul 2004 | A1 |
20040215629 | Dettinger et al. | Oct 2004 | A1 |
20050038781 | Ferrari et al. | Feb 2005 | A1 |
20050080914 | Lerner et al. | Apr 2005 | A1 |
20050234889 | Fox et al. | Oct 2005 | A1 |
20060026179 | Brown et al. | Feb 2006 | A1 |
20060136448 | Cialini et al. | Jun 2006 | A1 |
20060167896 | Kapur et al. | Jul 2006 | A1 |
20060173985 | Moore | Aug 2006 | A1 |
20060242297 | Aronoff et al. | Oct 2006 | A1 |
20070027905 | Warren et al. | Feb 2007 | A1 |
20070061266 | Moore et al. | Mar 2007 | A1 |
20070136311 | Kasten et al. | Jun 2007 | A1 |
20070219959 | Kanemasa | Sep 2007 | A1 |
20070239774 | Bodily et al. | Oct 2007 | A1 |
20070283144 | Kramer et al. | Dec 2007 | A1 |
20090024622 | Chkodrov et al. | Jan 2009 | A1 |
20090059785 | Jogalekar et al. | Mar 2009 | A1 |
20090132503 | Sun et al. | May 2009 | A1 |
20090150367 | Melnik et al. | Jun 2009 | A1 |
20090240711 | Levin | Sep 2009 | A1 |
20090319498 | Zabokritski et al. | Dec 2009 | A1 |
20100023481 | McGoveran | Jan 2010 | A1 |
20100082646 | Meek et al. | Apr 2010 | A1 |
20100094838 | Kozak | Apr 2010 | A1 |
20100169377 | Galeazzi et al. | Jul 2010 | A1 |
20110010379 | Gilderman et al. | Jan 2011 | A1 |
20110055231 | Huck et al. | Mar 2011 | A1 |
20110154461 | Anderson et al. | Jun 2011 | A1 |
20110191343 | Heaton | Aug 2011 | A1 |
20110276623 | Girbal | Nov 2011 | A1 |
20110320444 | Yehaskel et al. | Dec 2011 | A1 |
20120005148 | Horvitz et al. | Jan 2012 | A1 |
20120036146 | Annapragada | Feb 2012 | A1 |
20120102022 | Miranker et al. | Apr 2012 | A1 |
20120131567 | Barros et al. | May 2012 | A1 |
20120158650 | Andre et al. | Jun 2012 | A1 |
20120179677 | Roselli et al. | Jul 2012 | A1 |
20120221817 | Yueh | Aug 2012 | A1 |
20120296942 | Arora et al. | Nov 2012 | A1 |
20120303913 | Kathmann et al. | Nov 2012 | A1 |
20120304179 | Devarakonda et al. | Nov 2012 | A1 |
20130066948 | Colrain et al. | Mar 2013 | A1 |
20130085989 | Nayyar et al. | Apr 2013 | A1 |
20130163737 | Dement et al. | Jun 2013 | A1 |
20130179476 | Saam et al. | Jul 2013 | A1 |
20130262425 | Shamlin et al. | Oct 2013 | A1 |
20130311729 | Navarro et al. | Nov 2013 | A1 |
20130325927 | Corbett et al. | Dec 2013 | A1 |
20140095534 | Aingaran et al. | Apr 2014 | A1 |
20140136516 | Clifford et al. | May 2014 | A1 |
20140143011 | Mudugu et al. | May 2014 | A1 |
20140195514 | Stein | Jul 2014 | A1 |
20140244610 | Raman et al. | Aug 2014 | A1 |
20140325096 | Jung et al. | Oct 2014 | A1 |
20150019488 | Higginson et al. | Jan 2015 | A1 |
20150095381 | Chen et al. | Apr 2015 | A1 |
20150172206 | Anderson et al. | Jun 2015 | A1 |
20150248404 | Pazdziora et al. | Sep 2015 | A1 |
20150269223 | Miranker et al. | Sep 2015 | A1 |
20150310057 | Fu et al. | Oct 2015 | A1 |
20150339361 | Kumar et al. | Nov 2015 | A1 |
20150347585 | Klotz | Dec 2015 | A1 |
20150363396 | Sengupta et al. | Dec 2015 | A1 |
20160127325 | Odenheimer et al. | May 2016 | A1 |
20160261576 | Nivala et al. | Sep 2016 | A1 |
20160292164 | Kedia et al. | Oct 2016 | A1 |
20160308941 | Cooley | Oct 2016 | A1 |
20160321097 | Zhu et al. | Nov 2016 | A1 |
20160328442 | Waas et al. | Nov 2016 | A1 |
20160335361 | Teodorescu et al. | Nov 2016 | A1 |
20170034095 | Kamalakantha et al. | Feb 2017 | A1 |
20170063936 | Waas et al. | Mar 2017 | A1 |
20170116207 | Lee et al. | Apr 2017 | A1 |
20170116249 | Ravipati et al. | Apr 2017 | A1 |
20170116260 | Chattopadhyay | Apr 2017 | A1 |
20170116298 | Ravipati et al. | Apr 2017 | A1 |
20170124146 | Lereya et al. | May 2017 | A1 |
20170177650 | Devine et al. | Jun 2017 | A1 |
20170235790 | Bruso et al. | Aug 2017 | A1 |
20170249353 | Gulik et al. | Aug 2017 | A1 |
20170262777 | Mohen et al. | Sep 2017 | A1 |
20180081946 | Bondalapati et al. | Mar 2018 | A1 |
20180121433 | Nevrekar et al. | May 2018 | A1 |
20180144001 | Lai et al. | May 2018 | A1 |
20180150361 | Khandelwal | May 2018 | A1 |
20180173775 | Li et al. | Jun 2018 | A1 |
20180218030 | Wong et al. | Aug 2018 | A1 |
20180234327 | Kono | Aug 2018 | A1 |
20180246886 | Dragomirescu et al. | Aug 2018 | A1 |
20180285475 | Grover et al. | Oct 2018 | A1 |
20180292995 | Tal et al. | Oct 2018 | A1 |
20180329916 | Waas et al. | Nov 2018 | A1 |
20180357291 | Choi et al. | Dec 2018 | A1 |
20190065536 | Becker et al. | Feb 2019 | A1 |
20190138402 | Bikumala et al. | May 2019 | A1 |
20190155805 | Kent, IV et al. | May 2019 | A1 |
20190294763 | Coatrieux et al. | Sep 2019 | A1 |
20190303379 | Waas et al. | Oct 2019 | A1 |
20190327154 | Sahoo et al. | Oct 2019 | A1 |
20200104375 | Earnesty, Jr. et al. | Apr 2020 | A1 |
20200104376 | Earnesty, Jr. et al. | Apr 2020 | A1 |
20200104377 | Earnesty, Jr. et al. | Apr 2020 | A1 |
20200169600 | Waas et al. | May 2020 | A1 |
20200183929 | Lee et al. | Jun 2020 | A1 |
20200201860 | Vogelsgesang et al. | Jun 2020 | A1 |
20210034636 | Waas et al. | Feb 2021 | A1 |
Number | Date | Country |
---|---|---|
107122361 | Sep 2017 | CN |
2007121571 | Nov 2007 | WO |
Entry |
---|
Aleyasen, A., et al., “High-Throughput Adaptive Data Virtualization via Context-Aware Query Routing,” 2018 IEEE International Conference on Big Data, Dec. 10-13, 2018, 10 pages, Seattle, WA, USA. |
Antova, Lyublena, et al., “Datometry Hyper-Q: Bridging the Gap Between Real-Time and Historical Analytics,” SIGMOD'16, Jun. 26-Jul. 1, 2016, 12 pages, ACM, San Francisco, CA, USA. |
Antova, Lyublena, et al., “Rapid Adoption of Cloud Data Warehouse Technology Using Datometry Hyper-Q,” Month Unknown, 2017, 12 pages, Datometry, Inc. |
Author Unknown, “AWS Database Migration Service,” Month Unknown, 2017, 8 pages, Amazon Web Services, Inc., retrieved from https://aws.amazon.com/dms/. |
Author Unknown, “DBBest Technologies,” Month Unknown, 2017, 3 pages, DB Best Technologies, LLC, retrieved from https://www.dbbest.com/. |
Author Unknown, “Ispirer,” Month Unknown, 2017, 5 pages, Ispirer Systems, LLC, retrieved from https://www.ispirer.com/. |
Author Unknown, “Linked Servers (Database Engine),” Oct. 14, 2019, 5 pages, Microsoft, retrieved from https://docs.microsoft.com/en-us/sql/relational-databases/linked-servers/linked-servers-database-engine?view=sql-server-ver15. |
Author Unknown, “Replatforming Custom Business Intelligence,” Case Study, Month Unknown 2018, 2 pages, Datometry, Inc. |
Author Unknown, “Teradata Data Warehouse—End of Useful Life,” Case Study, Month Unknown 2018, 2 pages, Datometry, Inc., retrieved from https://datometry.com/resources/case-studies/replatform-teradata-end-of-useful-life-replatforming-to-cloud/. |
Gupta, Anurag, et al., “Amazon Redshift and the Case for Simpler Data Warehouses,” SIGMOD'15, May 31, 2015-June 4, 2015, 7 pages, ACM, Melbourne, Victoria, Australia. |
Non-Published Commonly Owned U.S. Appl. No. 16/599,049, filed Oct. 10, 2019, 54 pages, Datometry, Inc. |
Non-Published Commonly Owned U.S. Appl. No. 16/599,055, filed Oct. 10, 2019, 53 pages, Datometry, Inc. |
Non-Published Commonly Owned U.S. Appl. No. 16/599,057, filed Oct. 10, 2019, 54 pages, Datometry, Inc. |
Non-Published Commonly Owned U.S. Appl. No. 16/599,061, filed Oct. 10, 2019, 53 pages, Datometry, Inc. |
Non-Published Commonly Owned U.S. Appl. No. 16/599,064, filed Oct. 10, 2019, 53 pages, Datometry, Inc. |
Non-Published Commonly Owned U.S. Appl. No. 16/599,071, filed Oct. 10, 2019, 53 pages, Datometry, Inc. |
Simitsis, Alkis, et al., “Optimizing Analytic Data Flows for Multiple Execution Engines,” SIGMOD '12, May 20-24, 2012, 12 pages, ACM, Scottsdale, Arizona, USA. |
Soliman, Mohamed A., et al., “Orca: A Modular Query Optimizer Architecture for Big Data,” SIGMOD'14, Jun. 22-27, 2014, 12 pages, ACM, Snowbird, Utah, USA. |
Antova, Lyublena, et al., “An Integrated Architecture for Real-Time and Historical Analytics in Financial Services,” Month Unknown 2015, 8 pages, San Francisco, CA, USA. |
Author Unknown, “46.2. Message Flow,” PostgreSQL 9.2.24 Documentation, Chapter 46: Frontend/Backend Protocol, Nov. 9, 2017, 8 pages, The PostgreSQL Global Development Group, retrieved from https://www.jostgresql.org/docs/92/static/protocol-flow.html. |
Author Unknown, “46.5. Message Formats,” PostgreSQL 9.2.24 Documentation, Chapter 46: Frontend/Backend Protocol, Nov. 9, 2017, 11 pages, The PostgreSQL Global Development Group, retrieved from https://www.30stgresql.org/docs/92/static/protocol-message-formats.html. |
Author Unknown, “Greenplum Database: Critical Mass Innovation,” Architecture White Paper, Aug. 2010, 17 pages, EMC Corporation, San Mateo, CA, USA. |
Author Unknown, “Informatica Data Services,” Data Sheet, Month Unknown, 2015, 4 pages, Informatica LLC, Redwood City, CA, USA. |
Author Unknown, “Informatica PowerCenter Data Virtualization Edition,” Data Sheet, Month Unknown, 2013, 8 pages, Informatica LLC, Redwood City, CA, USA. |
Author Unknown, “Introduction to InfoSphere Federation Server,” IBM InfoSphere Foundation Tools IBM nfoSphere Information Server, Version 8.7.0, Oct. 1, 2011,4 pages, IBM. |
Author Unknown, “MemSQL Overview,” White Paper, Aug. 2018, 4 pages, MemSQL Inc. |
Author Unknown, “Microsoft Analytics Platform System,” Jan. 2016, 2 pages, Microsoft Corporation. |
Author Unknown, “Reference/ipcprotocol,” Feb. 13, 2017, 8 pages, retrieved from http://code.kx.com/wiki/Reference/! pcprotocol. |
Author Unknown, “Teradata Vantage,” Data Sheet, Month Unknown 2018, 3 pages, Teradata Corporation, San Diego, CA, USA. |
Author Unknown, “The 21st Century Time-series Database,” White Paper, Jul. 2016, 10 pages, Kx Systems, Inc. |
Author Unknown, “The Denodo Platform 7.0,” White Paper, Month Unknown, 2018,44 pages, Denodo Technologies. |
Bear, Chuck, et al., “The Vertica Database: Sql Rdbms For Managing Big Data,” MBDS'12, Sep. 21, 2012, 2 pages, ACM, San Jose, CA, USA. |
Bornea, Mihaela A., et al., “One-Copy Serializability with Snapshot Isolation Under the Hood,” 2011 IEEE 27th International Conference on Data Engineering, Apr. 11-16, 2011, 12 pages, IEEE, Hannover, Germany. |
Cecchet, Emmanuel, et al., “Middleware-based Database Replication: The Gaps Between Theory and Practice,” SIGMOD'08, Jun. 9-12, 2008, 14 pages, Vancouver, Canada. |
Chang, Lei, et al., “Hawq: A Massively Parallel Processing SQL Engine in Hadoop,” SIGMOD'14, Jun. 22-27, 2014, 12 pages, ACM, Snowbird, Utah, USA. |
Elnikeiy, Sameh, et al., “Tashkent: Uniting Durability with Transaction Ordering for High-Performance Scalable Database Replication,” EuroSys'06, Apr. 18-21, 2006, 14 pages, ACM, Leuven, Belgium. |
Flores, Carlos, et al., “SeDiM: A Middleware Framework for Interoperable Service Discovery in Heterogeneous Networks,” ACM Transactions on Autonomous and Adaptive Systems, Feb. 2011,8 pages, vol. 6, No. 1, Article 6, ACM, New York, NY, USA. |
Garland, Simon, “Big Data Analytics: Tackling the Historical Data Challenge,” CIOReview, Month Unknown 2014, 3 pages, retrieved from https://data-analytics.cioreview.com/cxoinsight/big-data-analytics-tackling-the-historical-data-challenge-nid-4298-cid-156.html. |
Gog, Ionel, et al. “Musketeer: all for one, one for all in data processing systems,” EuroSys'15, Apr. 21-24, 2015, 16 pages, ACM, Bordeaux, France. |
Gorman, Ciaran, “Columnar Database and Query Optimization,” Technical Whitepaper, Mar. 2013, 26 pages, Kx Systems, Inc. |
Gray, Jim, et al., “The Dangers of Replication and a Solution,” Technical Report—MSR-TR-96-17, Sigmod '96, Jun. 4-6,1996, 12 pages, ACM, Montreal, Quebec, Canada. |
Hanna, James, “Multi-Partitioned kdb+ Databases: An Equity Options Case Study,” Technical Whitepaper, Oct. 2012, 13 pages, Kx Systems, Inc. |
Heimbigner, Dennis, et al., “A Federated Architecture for Information Management,” ACM Transactions on Information Systems (TOIS), Jul. 1985, 26 pages, vol. 3, Issue 3, New York, NY, USA. |
Hwang, San-Yih, “The MYRIAD Federated Database Prototype,” SIGMOD '94 Proceedings of the 1994 Acm Sigmod International Conference on Management of Data, May 24-27, 1994, 1 page, ACM, New York, New York, USA. |
Kache, Holger, et al., “POP/FED: Progressive Query Optimization for Federated Queries in DB2,” Vldb '06, Sep. 12-15, 2006, 4 pages, ACM, Seoul, Korea. |
Lin, Yl, et al., “Middleware based Data Replication Providing Snapshot Isolation,” SIGMOD 2005, Jun. 14-16, 2005, 12 pages, ACM, Baltimore, Maryland, USA. |
Non-Published Commonly Owned U.S. Appl. No. 16/270,575 (DATO.P0003), filed Feb. 7, 2019, 36 pages, Datometry, Inc. |
Non-Published Commonly Owned U.S. Appl. No. 16/542,133 (DATO.P0005), filed Aug. 15, 2019, 70 pages, Datometry, Inc. |
Non-Published Commonly Owned U.S. Appl. No. 16/542,140 (DATO.P0006), filed Aug. 15, 2019, 70 pages, Datometry, Inc. |
Non-Published Commonly Owned U.S. Appl. No. 16/542,143 (DATO.P0007), filed Aug. 15, 2019, 71 pages, Datometry, Inc. |
Non-Published Commonly Owned U.S. Appl. No. 16/542,145 (DATO.P0008), filed Aug. 15, 2019, 71 pages, Datometry, Inc. |
Non-Published Commonly Owned U.S. Appl. No. 16/542,146 (DATO.P0009), filed Aug. 15, 2019, 71 pages, Datometry, Inc. |
Non-Published Commonly Owned U.S. Appl. No. 16/542,148 (DATO.P0010), filed Aug. 15, 2019, 71 pages, Datometry, Inc. |
Non-Published Commonly Owned U.S. Appl. No. 16/684,785 (DATO.P0011), filed Nov. 15, 2019, 54 pages, Datometry, Inc. |
Non-Published Commonly Owned U.S. Appl. No. 16/684,803 (DATO.P0012), filed Nov. 15, 2019, 54 pages, Datometry, Inc. |
Non-Published Commonly Owned U.S. Appl. No. 16/684,847 (DATO.P0013), filed Nov. 15, 2019, 54 pages, Datometry, Inc. |
Owrang O., M. Mehdi, et al., “A Parallel Database Machine for Query Translation in a Distributed Database System,” SAC '86: Proceedings of the 1986 Workshop on Applied Computing, Oct. 1986, 6 pages. |
Plattner, Christian, et al., “Ganymed: Scalable Replication for Transactional Web Applications,” Middleware '04, Oct. 18-22, 2004,20 pages, Toronto, Canada. |
Pullokkaran, Laljo John, “Analysis of Data Virtualization & Enterprise Data Standardization in Business ntelligence,” Working Paper CISL# 2013-10, May 2013, 60 pages, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA. |
Schlamb, Kelly, “dashDB: Self-service data warehousing in the cloud at any scale,” DB2 Tech Talk, Jun. 25, 2015, 55 pages, IBM Corporation. |
Wang, Jinfu, et al., “Anomaly Detection in the Case of Message Oriented Middleware,” MidSec '08, Dec. 1-5, 2008, 3 pages, ACM, Leuven, Belgium. |
Whitney, Arthur, et al., “Lots o' Ticks: real-time high performance time series queries on billions of trades and quotes,” ACM SIGMOD 2001, May 21-24, 2001, 1 page, ACM, Santa Barbara, California, USA. |
Zhang, Mingyi, et al., “Workload Management in Database Management Systems: A Taxonomy,” IEEE Transactions on Knowledge and Data Engineering—Manuscript, Month Unknown 2017, 18 pages, IEEE. |
Hohensi Ein, Uwe, et al., “Improving Connection Pooling Persistence Systems,” 2009 First International Conference on Intensive Applications and Services, Apr. 20-25, 2009, 7 pages, IEEE, Valencia, Spain. |
Number | Date | Country | |
---|---|---|---|
62890572 | Aug 2019 | US | |
62859693 | Jun 2019 | US | |
62859695 | Jun 2019 | US | |
62824994 | Mar 2019 | US | |
62817533 | Mar 2019 | US | |
62782337 | Dec 2018 | US |