The present invention relates to a method of and apparatus for processing query representation, and particularly to a method of and apparatus for extraction and analysis of macro operations within query language statement
There exist various Business Intelligence (BI) tools that allow users to analyze business oriented data stored in one or more underlying data sources. Users ask business questions through a reporting tool, which generates a BI query against the underlying data source.
BI Queries responding to user's business questions, in majority of cases, lead to very complex Structured Query Language (SQL) or Multi Dimensional Expression (MDX) queries depending on the data source against which the query is executed.
Even though the SQL and MDX is not expected to be understood by users, there are quite a few scenarios when the SQL and MDX must be analyzed, the work that can be done by Information Technology (IT) personnel supporting the BI tools: for example, performance optimization of queries and databases, unexpected query results, and data access auditing to extract the database use patterns.
While definition of the BI Query semantics is theoretically enough to predict the outcome of BI Queries, when a performance or logical issue arises, the BI Server or a database administrator needs to analyze native SQL and MDX queries generated by the system, as it is the main factual evidence expresses in a independently documented language.
The challenges that SQL and MDX analysis poses include: large SQL and MDX statements require some top down analysis techniques, locating an error in SQL and MDX statements, analysis of the relationships of the SQL/MDX generated in a Query Engine system with the rest of Execution plan (Run Tree), profiling the efficiency of SQL and MDX statements blocks. Tools are needed to help in understanding the various blocks and components in these SQL/MDX statements.
Modern database systems are equipped with the tools simplifying the analyses of the SQL/MDX statements. However, generation of a high level view of SQL statements has been limited to the extraction of joining structure of the outmost select query.
Therefore, there is a need to provide a mechanism that can provide better representation of the SQL/MDX statements.
It is an object of the invention to provide an improved method of and apparatus for processing query representation that obviates or mitigates at least one of the disadvantages of existing systems.
The invention extracts and analyzes macro operations within a query language statement.
In accordance with an aspect of the present invention, there is provided a method of processing query representation comprising receiving a query to analyze, the query containing at least a statement; analyzing the statement to determine a query structure of elements forming the statement; formatting text of the statement to reflect the query structure; replacing the formatted text with an equivalent tree of logical blocs representing the statement in a higher level; shortening the representation of one or more of the logical blocks; and generating a data flow tree of macro operation nodes based on the logical blocks for presentation to a user.
In accordance with another aspect of the present invention, there is provided a query representation processing system comprising a query statement structure analyzer, a statement text format manager, a logical blocks representation manager and a tree representation manager. The query statement structure analyzer is provided for receiving a query and analyzing statements in the query to determine a query structure of elements forming the statement. The statement text format manager is provided for formatting text of the statement to reflect the query structure, and replacing the formatted statement text with an equivalent statement that visually expresses logical blocs comprising the statement in a higher level. The logical blocks representation manager is provided for shortening the representation of one or more of the logical blocks based on abbreviation rules. The tree representation manager is provided generating a data flow tree of macro operation nodes based on the logical blocks for presentation to the user.
In accordance with another aspect of the present invention, there is provided a computer program product for processing query representation, the computer program product comprising a computer usable medium having computer usable program code embodied therewith. The computer usable program code comprises computer usable program code configured to perform a method comprising receiving a query to analyze, the query containing at least a statement; analyzing the statement to determine a query structure of elements forming the statement; formatting text of the statement to reflect the query structure; replacing the formatted text with an equivalent tree of logical blocs representing the statement in a higher level; shortening the representation of one or more of the logical blocks; and generating a data flow tree of macro operation nodes based on the logical blocks for presentation to a user.
This summary of the invention does not necessarily describe all features of the invention.
These and other features of the invention will become more apparent from the following description in which reference is made to the appended drawings wherein:
As will be appreciated by one skilled in the art, the present invention may be embodied as a system, method or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit”, “module” or “system”. Furthermore, the present invention may take the form of a computer program product embodied in any tangible medium of expression having computer usable program code embodied in the medium.
Any combination of one or more computer usable or computer readable medium(s) may be utilized. The computer-usable or computer-readable medium may be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CDROM), an optical storage device, a transmission media such as those supporting the Internet or an intranet, or a magnetic storage device. Note that the computer-usable or computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory. In the context of this document, a computer-usable or computer-readable medium may be any medium that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. The computer-usable medium may include a propagated data signal with the computer-usable program code embodied therewith, either in baseband or as part of a carrier wave. The computer usable program code may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc.
Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
The present invention is described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instruction means which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
Referring now to
The query representation processing system 100 receives the queries generated by the report system 20, and processes the queries to generate visual high level representations of the queries that visually represent significant operations, i.e., macro operations, within query languages statements. The high level representation of query language statements can open the macro operations up for external component access.
The method shown in
The top down analysis 204 of SQL/MDX statements breaks a statement into a hierarchy of blocks or elements, representing significant elements of the statement.
The significant logical elements that are recognized in SQL statements include (1) Table operation blocks, (2) Select blocks, and (3) collections of elements. A Table operation block (1) may include Select blocks; Joins; Table set operations, such as, UNION, INTERSECT, and EXCEPT; and/or References to database Tables, Views, and Table Functions. A Select block (2) in turn can be broken into the fragments, such as “Projection” item expressions, following the starting Select statement keywords; “From clause” block, referencing one or join/cross-product of several tables; “With block”, a collection of derived tables used in multiple contexts; “Where clause”, expressing pre-grouping filtering; “Having clause”, expressing post-grouping filtering; “Group-by clause”; “Order-by clause”; and/or “Request options” block, dialect specific, containing options like Fetch First. The collection of elements (3) includes “Projection”, “With block”, “Group-by clause”, and “Order-by clause” Elements of these collections can also be recognized significant components of the statement structure.
The significant logical elements that are recognized in MDX statements include (1) Select statement; and (2) collections of elements. A Select statement (1) can be broken into the fragments, such as “With block”, a collection of calculated members and named sets definitions; “Axis expressions”, following the starting Select statement keywords; “From clause” block, referencing a cube; and/or “Where clause” expressing a slicer. The collections of elements (2) include “With block”, and “Axis expressions”. Elements of these collections can also be recognized significant components of the statement structure.
The groups of the significant logical elements are described above from the large scope to lower scope structural elements. They are treated as levels of the top-down analysis of the SQL/MDX statements.
While exact syntax of expressions used in different statement parts vary, all expressions in SQL and MDX statements bear the main components including Literals; Brackets that play the role of operation grouping; Unary operators; Binary operators; Operators involving more than two parameters; and Functions. There are more complex syntactical constructs, such as the CASE operator in SQL.
From the point of view of top-down visual analysis of the expressions, it is also noted that the associative binary operations may form clusters of operations that do not change their meaning depending of the sequence of the operator evaluation. The important operators of this type are: “AND”, “OR” and “+”. For example, “AND” in (T1.A=T2.A) AND (T1.B=T2.B) AND (T1.C=T2.C) is this type of operators.
The above top-down analysis determines the query structure of the SQL/MDX statement. This query structure of the SQL/MDX statement is used to format the SQL/MDX structure text (280 in
The text formatting is done according to formatting rules, by inserting spaces and line breaks in a language string, given these symbols are largely ignored by syntax rules, apart from being separators between language tokens. Thus, the SQL/MDX statement is broken into a tree of structural blocks described above.
In the text formatting approach, child blocks are padded with spaces or tab characters to appear shifted to the right versus there parent blocks. A first child may be put on the same line as parent block. Sibling blocks are placed on new lines with the same padding space. Alternatively, the XML or HTML tags can be inserted into the text to identify the beginning and the end of every block. HTML formatting rules may then be applied to make the child blocks to be shifted to the left from the left boundaries of the parent blocks.
For example, the following demonstrates how these formatting rules work in the case of the UNION combining two Select blocks:
Assuming a level padding on the left is based on two spaces: the above statement is formatted as follows:
The structure of this presentation can be thought of in the form of hierarchy of blocks, as shown in
The formatting process transitions this presentation to more detail presentation. The diagram shown in
For example, the original query looks as follows after transitioning to the lower level of details:
In the block form, this detail presentation can be seen as the structure shown in
The formatting process further transitions the detail presentation to a higher level view. Using the example of the original query formatted up to the sections of select statements shown in
In terms of the structure, this represents the SQL statement as a tree of table operations, as shown in
The view of the block structure shown in
In this case, the block structure is also significantly different as shown in
Associating larger structural parts with visualization logical blocks can be used in order to provide a higher level view, as described above and shown in
However, those logical blocks identified in this process may be populated by a very long multi-line string, which may lose the value of the higher level view of the SQL statement. In that case, the block representation shortening process (210 in
For example, the query representation shown in
Another example to further demonstrate simplification of the representation is provided for the following SQL statement:
A shortened version of this SQL statement is:
The abbreviation rules for block text representation used in the shortening process are now described. A logical block has four states in respect to its text representation: (1) closed; (2) open abbreviated, children closed; (3) open, children in abbreviated state; and (4) open, children open.
The abbreviated form is enclosed in square brackets: [ ]. The abbreviated form of block text representation is defined per block type. The abbreviated form can be either based on a substring of the text that belongs to this block, excluding the text of its children, as follows:
Expressions have no abbreviation;
SQL Projection items are abbreviated by their aliases;
SQL From-items are abbreviated by their aliases;
SQL Select blocks are abbreviated by the keywords with which the blocks start;
SQL Derived table aliases are abbreviated by their alias;
MDX Calculated member are abbreviated by their names;
MDX Named sets are abbreviated by their names; and
MDX axis are abbreviated by the axis identifier.
If no text is chosen for the abbreviation and its children are in a closed state, the ellipsis symbol “ . . . ” is used under the brackets.
The visualization rules, including the formatting rules used during the statement text formatting process and/or the abbreviation rules used during the block representation shortening process, may be declared.
The declaration of visualization rules is now described for the embodiments that use Backus-Naur Form (BNF). The structure of SQL/MDX concepts can be aligned with the syntactic elements of these languages. For example, the following trimmed BNF is considered for a sample dialect of SQL syntax:
The significant blocks of a SQL statement identified above can be associated with the elements of the BNF, as shown in Table 1.
Hence, the BNF rules can be extended to mark those syntactical elements in the BNF notation. The BNF notations are associated with visualization blocks as follows.
The formatting rules may be defined as extension of syntax rules. Extending description of BNF elements with formatting instructions can drive the visualization of the corresponding query language.
There are two properties that drive language formatting: (1) a flag marking a syntactical element to be associated with a formatting block, and (2) parts of the element to be used for its abbreviation. These properties can be incorporated into BNF notion, for example, using the method below:
Alternatively, a separate table can be used, listing all syntax elements formatted as blocks with their abbreviation rules and syntax element that are not formatted individually but have also an non-trivial abbreviation rule. Table 2 shows an example of such a table listing the syntax elements.
The interpretation of the abbreviation rule is as follows. The abbreviation rule lists components comprising the shortened form of the element in a sequence that they appear in the result string. The components of abbreviation base on references to syntax elements imply that the abbreviated form of the referenced element is to be used. For elements that do not have an explicitly defined abbreviated form, it is implied that the whole string is to be used. The results of the abbreviation rules may be stored temporarily in memory while they are being generated, and then in the end combined into the overall view of the shortened query representation.
The options (separated by the symbol ‘|’) in the abbreviation rules are checked in a sequence. If the first option rendered an empty string, then the second one is checked, and so on.
As an abbreviation rule references child syntax elements, there should be only one reference to a given type of syntax element used in an abbreviation rule in order to make the reference unambiguous. If an abbreviation rule happens to reference another element few times, this abbreviation rule needs to be disambiguated. For example, the abbreviation rule:
A.abbreviation ::=B
for the element defined as:
A ::=B AND B
Is ambiguously defined. The ambiguity of the abbreviation rule can be fixed by replacing the element definition as:
A ::=B AND B2
B2 ::=B
The process of generating the tree representation of a statement (214 in
The tree representation is comprised by the tree nodes with titles and body content pre-defined based on the logical blocks of the statement, as exemplified in
As tree nodes are logically independent of parent nodes, they can be analyzed separately. For example, the tree nodes can be executed to show results. Also, the tree nodes can be profiled to capture their performance, though the performance of these nodes is not always independent of their parent node context as, for example, parent node filtering can be passed down to their children nodes. Still independent profiling of tree nodes can identify those nodes that contain inefficient function/operations.
The query statement structure analyzer 110 receives a query and analyzes statements in the query to determine the query structure of elements or blocks forming the statement. The statement text format manager 120 formats the text of the statement based on formatting rules to reflect the query structure, and replaces the formatted statement text with an equivalent statement that visually expresses logical blocs comprising the statement in a higher level. The logical block representation manager 130 shortens the representation of one or more of the logical blocks based on abbreviation rules. The visualization rules manager 140 manages declaration of visualization rules including the formatting rulers and abbreviation rules. The tree representation manager 150 generates a data flow tree of macro operation nodes based on the logical blocks for presentation to the user.
The table operation analyzer 111 analyzes table operations in SQL statements to identify select blocks and other components of the table operation definition for which further analysis should be made. The table operation analyzer 111 executes a table operation when the analysis of the table operation needs to understand results of the table operation. The select block analyzer 112 reviews select statement blocks to understand their semantics and identify components in the select statement blocks on which the analysis should be focused.
The component analyzer 113 analyzes the identified components to identify a component that is based on an expression. The referenced element analyzer 114 analyzes referenced elements for a component that is based on a reference to another element. The expression analyzer 115 analyzes the expression by structuring the expression to reflect the sequence of operations. The expression analyzer 115 may break the expression into multiple sub-expressions to focus on a particular part of the expression.
The MDX element analyzer 116 analyzes a list of axis, slicer, calculated members and named sets in the MDX statements to identify components for which further analysis should be made.
The query statement structure analyzer 110 uses the table operation analyzer 111, select block analyzer 112, component analyzer 113, referenced element analyzer 114 and expression analyzer 115 to perform the top-down analysis of a SQL statement, as described above referring to
The query statement structure analyzer 110 uses the MDX element analyzer 116, component analyzer 113, referenced element analyzer 114 and expression analyzer 115 to perform the top-down analysis of a MDX statement, as described above referring to
The text structure modifier 121 inserts spaces and line breaks in a language string, according to the formatting rules to brake the statement into a tree of structural blocks. The text structure modifier can alternatively insert XML or HTML tags to reflect the beginning and the end of the structural blocks. The block details handler 122 transitions the presentation of the tree of structural blocks to more detail presentation. The higher level view handler 123 transitions the detail presentation to a higher level view represented by a tree of logical blocks.
As described above, the query representation processing system 100 applies a flexible method of macro operation extraction out of SQL/MDX statements. Thus, SQL/MDX statements are decomposed using a general notion of the macro operation providing a flexibility of what is recognized as a macro operation within these statements. The query representation processing system 100 can expose SQL/MDX statement components for interaction with service provides in a query framework system that processes queries using various loosely coupled service providers. This method exposes macro components of the SQL/MDX statements for logging, debugging and profiling functionality developed for the service providers of the query framework system. The query representation processing system 100 makes the operations based on SQL and MDX statements to be more deeply integrated in the query framework system. The query framework system allows for maximum flexibility for query interception, interpretation, and transformation. On the other hand, it facilitates the reuse of the functionality of components already available in the system. The query framework system provides the base for pluggable component architecture; querying multidimensional, relational, and vendor query datasources; data agnostic query service: supporting same operations regardless of a datasource with a consistent plug-in API; and cross-datasource joins.
The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.