The present invention relates to a system and method for providing suppression of zero, null, overflow, and divide by zero values in sparse query results.
Many organizations use data stores for storing business data, such as financial data and operational data. In order to assist business users to examine their data, various data analyzing applications are proposed. Those data analyzing applications provide various views or reports of data to users. The data analyzing applications have query engines that access the underlying data stores to obtain desired data. Resultant reports with the desired data are often used as Business Intelligence reports such that report consumers can analyze various Business Intelligence of their organization. Some data analyzing applications have Online Analytical Processing (OLAP) engines to provide multidimensional views of data.
A Business Intelligence report consumer often faces a problem when the underlying queries in Structured Query Language (SQL) and/or Multi Dimensional Expression (MDX) for his/her report return data with a large number of zero values. In this case, the report consumer often likes to have access to a quick filtering method that removes the zero values from his/her report. However, Business Intelligence users needed to use complex filtering expressions in order to deal with this data sparcity problem.
It is therefore desirable to provide an easy filtering of unwanted values from in sparse query results.
It is an object of the invention to provide an improved system and method for providing suppression of unwanted values, e.g., zero, null, overflow, and divide by aero, in sparse reports of query results that obviates or mitigates at least one of the disadvantages of existing systems.
The invention uses a suppression provider that saves suppression criteria in a query plan, and suppresses values and/or edge values based on the suppression criteria during the execution of the query plan.
In accordance with an aspect of the present invention, there is provided a suppression provider for suppressing unwanted values in reports. The suppression provider comprises a query preparation unit and a query execution unit. The query preparation unit is provided for analyzing a request from a client for data from one or more underlying data sources to obtain suppression criteria from an edge suppression specification in the request, and creating a query plan and saving the suppression criteria in the query plan. The query execution unit is provided for receiving execution result data including values, each associated with one or more edge values, determining if each value is to be suppressed based on the suppression criteria, and identifying one or more edge values to be suppressed when an edge value has only values that are determined to be suppressed such that values and/or edge values identified to be suppressed are suppressed in a resultant report.
In accordance with another aspect of the present invention, there is provided a method of suppressing unwanted values from a report. The method comprises receiving a request from a client for data from one or more data sources; obtain suppression criteria from an edge suppression specification in the request; generating a query plan and saving the suppression criteria in the query plan; receiving execution result data including values, each value being associated with one or more edge values; determining if each value is to be suppressed based on the suppression criteria; identifying one or more edge values to be suppressed when an edge value has only values that are determined to be suppressed; and suppressing values and/or edge values identified to be suppressed in a resultant report.
In accordance with another aspect of the present invention, there is provided a computer readable medium storing instructions or statements for use in the execution in a computer of a method of suppressing unwanted values from a report. The method comprises receiving a request from a client for data from one or more data sources; obtain suppression criteria from an edge suppression specification in the request; generating a query plan and saving the suppression criteria in the query plan; receiving execution result data including values, each value being associated with one or more edge values; determining if each value is to be suppressed based on the suppression criteria; identifying one or more edge values to be suppressed when an edge value has only values that are determined to be suppressed; and suppressing values and/or edge values identified to be suppressed in a resultant report.
This summary of the invention does not necessarily describe all features of the invention.
These and other features of the invention will become more apparent from the following description in which reference is made to the appended drawings wherein:
As will be appreciated by one skilled in the art, the present invention may be embodied as a system, method or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit”, “module” or “system”. Furthermore, the present invention may take the form of a computer program product embodied in any tangible medium of expression having computer usable program code embodied in the medium.
Any combination of one or more computer usable or computer readable medium(s) may be utilized. The computer-usable or computer-readable medium may be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CDROM), an optical storage device, a transmission media such as those supporting the Internet or an intranet, or a magnetic storage device. Note that the computer-usable or computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory. In the context of this document, a computer-usable or computer-readable medium may be any medium that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. The computer-usable medium may include a propagated data signal with the computer-usable program code embodied therewith, either in baseband or as part of a carrier wave. The computer usable program code may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc.
Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
The present invention is described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instruction means which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
Referring now to
The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.
Referring to
The suppression provider 100 is implemented as a component in the query framework system 10 which is a system of components with different responsibilities: query planners providing the translation of the system query language into the query language of underling system (e.g. SQL, MDX); query transformation providers responsible for preprocessing of the query; and feature adapters responsible for local operations, where any component can be replaced, new components can be added or extracted out the system with the minimum disruption to the system.
When a BI report, ad hoc query, analysis, or visualization is submitted by the interface components against an OLAP or a Relational data source, the suppression provider 100 intercepts the associated BI query if it has an edge suppression specification. The suppression provider 100 plans the query by relying on the associated metadata model, creates a plan of execution. Then, it intercepts the results during execution and locally processes the results from the OLAP or relational planners by filtering zero, null, overflow, and divide by zero values.
Prior to describing the suppression provider 100 further, the query framework system 10 is described. The query framework system 10 intercepts user requests generated by the data analyzing system 30. It processes and executes the user requests to retrieve desired data from the data sources 40. A user request is in a query language that the query analyzing system 30 uses to issue the user request. The user request includes one or more queries.
As shown in
Each query operation provider 15 is capable of performing a specific operation on queries, as further exemplified below. In
The query framework system 10 uses a query framework (QF) query. A QF query plays the role of a query specification that the query operation providers 15 use to communicate with each other and with the coordination planner 16 within the query framework system 10. The QF query definition is an extension of the user request specification defined by the data analyzing system 30. The QF query has one or more query sections or patterns.
The query framework system 10 divides the query processing into two phases: query planning or preparation phase and a query execution phase. The final result of the query planning process phase is a query execution plan, which is executed during the query execution process phase. During the query preparation phase, the coordination planner 16 interacts with query operation providers 15 in order to identify and plan the operations associated with each provider 15, and to determine the sequence of these operations expressed in an execution plan. The coordination planner 16 may use one or more query operation providers 15 during the query preparation phase. During the query execution phase, the coordination planner 16 distributes the query operations to associated query operation providers 15, invoking the query operation providers 15 in accordance with the sequence expressed by the execution plan determined at the preparation phase.
In this example, the query framework system 10 has two having two types of query operation providers 15: planner providers 70-78, 100, and query transformation providers 80-92. The query planner providers replace the QF query representing the received user request with one or more provider queries. Each provider query is executable by an associated query provider, i.e., it has no children query patters and hence do not need input data streams during the execution phase. Planner providers support execution of a provider query without accepting incoming data streams. In this example, the query framework system 10 has relational query planner provider (RQP) 70, OLAP query planner provider (OQP) 72, metadata query provider (MQP) 74, drill through provider (DTP) 76, OLAP data providers (ODP) 78 and suppression provider 100. Query transformation providers are responsible for preprocessing of a QF query for the consumption of the transformed query by other query operation providers. In this example, the query framework system 10 has a multi dimensional operation provider 80, tabular operation provider 82, tabular function provider 84, no data mode provider 86, master detail provider 88, reporter mode provider 90 and cube builder provider 92. In a different embodiment, a different set of query operation providers may be used.
The query framework system 10 facilitates reuse of the planning logic compiling all query operation provider actions in a single execution plan, as further described in United States patent application publication No. US 2006/0294076 A1 published Dec. 28, 2006, which is hereby incorporated by reference.
The suppression provider 100 in accordance with an embodiment of the present invention is now described in details.
The suppression provider 100 provides easy filtering of unwanted values by suppressing those values and/or edge values having those values in sparse reports of query results. Unwanted values may be zero, null, overflow, and/or divide by zero values. An edge value is a value in an edge, e.g., corresponding to a specific row or a specific column in a crosstab. A list has a single edge, which may be represented as a tree of nodes. In a list, an edge value may be also considered as a row in the edge rowset.
By suppressing these unwanted values, the data sparsety problem can be solved, and the resultant reports can be more compact, containing useful data only. It becomes easy to use and analyze the reports.
A data analyzing system issues a user request of a report. The user request includes one or more queries to obtain desired data from one or more underlying data sources and layout information of the data in the report. When a user request is received and translated into a QF query representing the user request, the coordination planner 16 invokes the query operation providers 15 in a specified order to prepare the QF query for execution. During this query preparation phase, each invoked query operation provider 15 receives the QF query from its previous query operation provider and processes a relevant section or pattern of the QF query to generate a provider query containing a query plan, and replaces the relevant pattern of the QF query with the generated provider query. The processed QF query is passed onto the next query operation provider for further planning.
When the suppression provider 100 is invoked, the query preparation unit 120 receives the QF query from the previous query operation provider 15 in the query planning sequence.
In the query preparation unit 120, the edge suppression specification analyzer 122 analyzes the received QF query to see if it contains an edge suppression specification. An edge suppression specification is defined in the user request to specify suppression criteria, including information as to what type(s) of values should be suppressed, and what edge value(s) should be considered for possible suppression, e.g., rows only, columns only or a combination of rows and columns. When the edge suppression specification is found, the edge suppression specification analyzer 122 extracts the suppression criteria. The query plan generator 124 generates a provider query containing a query plan of the suppression provider 100. The query plan generator 124 stores the extracted suppression criteria in the query plan. The QF query partially replaced with the query plan of the suppression provider 100 is passed onto the next query operation provider 15′ in the planning sequence.
During the query execution phase, data is obtained from the underlying data sources. The obtained data includes values, each associated with one or more edge values, depending on the layout information in the user request. Values are cell values in a case of a cross tab, and edge measure values in a case of a list.
When the data is passed onto the suppression provider 100 from the previous query operation provider 15′, the query execution unit 140 receives the obtained data and processes the data based on the query plan of the suppression provider 100 generated by the query preparation unit 120 during the query preparation phase.
In the query execution unit 140, the original edge capturer 152 of the edge processor 150 captures edges information in the received data, and stores it in the edge snapshot 154. The value analyzer 160 analyzes the values of the data based on the suppression criteria stored in the query plan. For a crosstab, the cell value analyzer 162 analyzes each cell value to identify cells having values that meet the suppression criteria. For a list, the edge measure value analyzer 164 analyzes each edge measure value to identify edge measure values having values that meet the suppression criteria. The edge mask constructor 172 of the edge mask processor 170 finds which edge values should be analyzed for possible suppression based on the suppression criteria, e.g., if it should analyze only rows, only columns or both rows and columns in a case of a crosstab. It determines if all cells or edge measure values associated with each edge value are identified as having met the suppression criteria and to be suppressed. When the edge mask constructor 172 determines that a specific edge value meets this condition, it sets a mask to the edge value indicating that this edge value should be suppressed, and stores the mask information in association with the edge value in the mask store 174.
The mask information is used to update the edge snapshot 154.
The suppression handler 180 uses the updated edge information stored in the edge snapshot 154 and the mask information stored in the mask store 174, and passes to the client or next query operation provider 15 only the report values which are not identified to be suppressed. The report values are cell values and edge values in a crosstab, and edge measure values in a list. Thus, the suppression provider 100 can suppress unwanted values and/or edges having all unwanted values in the resultant report.
The operation of the suppression handler 180 is further described using examples of crosstabs and lists. The term “crosstabs” is used below as a short for crosstabs and charts.
For a crosstab, during the preparation phase, the query preparation unit 120 generates a query plan, which represents a prepared crosstab query result set definition (QRD). The suppression criteria saved in the query plan includes (a) value suppression criteria, i.e., what types of values should be suppressed: e.g., null, overflow, zero, or divide by zero, or any combination thereof; and (b) edge suppression criteria, i.e., which edges should be suppressed: e.g., each rows edge, each columns edge, or a pair edge of corresponding rows and columns.
For a list, during the preparation phase, the query preparation unit 120 generates a query plan, which represents a prepared list QRD. The suppression criteria saved in the query plan includes (a) value suppression criteria, i.e., what types of values should be suppressed: e.g., null, overflow, zero, or divide by zero, or any combination thereof; and (b) edge suppression criteria, i.e., which data items should be suppressed: e.g., a list identifying each data item referenced in the QRD as being a measure or not.
An example of the processing logic used by the query preparation unit 120 during the preparation or planning phase is now described. The query preparation unit 120 scans the request. For any multi-edge query result definition (QRD) found in the request, the query preparation unit 120 identifies row/column/cells edges. The query preparation unit 120 identifies all the QRDs having at least one <edge> descendant having an <edgeSupprssion> sub-element as candidates for suppression and collects them. For each of the candidates, the query preparation unit 120 (1) removes the original QRD from its parent element, and (2) creates a provider query. This provider query replaces the QRD's corresponding <query> element and has the following characteristics: it contains a plan having the structure described above; and it is sourced on a query/QRD combination that is a clone of the original one and that has the <edgeSuppression> tags removed.
The query preparation unit 120 do not modify all other QRDs and their corresponding queries. The query preparation unit 120 disregards as candidates for suppression some special-case crosstabs that have no cells, e.g., one-edge crosstabs or two-edged crosstabs with no measure.
Also, for lists, the query preparation unit 120 marks all the referenced data items as measures or non-measures according to their expression and its model references.
Finally, the query preparation unit 120 removes all the <edgeSuppression> tags.
An example of the processing logic used by the query execution unit 140 during the execution phase is now described. The input of the query execution unit 140 is the master dataset, which is called “original dataset”, produced by the next active query operation provider in the planning processing sequence, which is called “the underlying provider”. This is the underlying provider's implementation of the query framework result set API.
The output of the query execution unit 140 is the suppression provider 100 master dataset. This is the suppression provider 100 implementation of the query framework result set API. A set of implementations that allow the client of the ZS functionality, which is called the “client”, to navigate through the original dataset by skipping all the cell rows and/or columns satisfying the suppression criteria and their corresponding edge values in the case of crosstabs.
The information saved in the query plan during the prepare/plan phase is used during the execution phase. To assist in the implementation of these navigational methods, the query execution unit 140 performs the following processing.
For crosstabs, the query execution unit 140 scans the underlying cells set. The query execution unit 140 employs specific logic to pre-determine whether the scanning should be done in a row-major or column-major order. For each edge to be suppressed, the query execution unit 140 constructs a vector of Boolean values, which is called a “mask”. The query execution unit 140 employs negative logic, through which the query execution unit 140 inserts a false (F) entry into the mask if at least one of the values in the cells having the edge coordinate corresponding to the current index in the mask does not satisfy the suppression criteria. The query execution unit 140 inserts a true (T) entry into the mask otherwise.
Next, for each edge, which is internally represented as a tree structure due to the nesting, cross-product-ing and summarization, the query execution unit 140 constructs an internal tree representation of the underlying edge, which is called “edge snapshot”, by fully navigating that underlying edge. The query execution unit 140 stores in the process all the parentage information between the nodes, as well as other additional original information. Then, the query execution unit 140 marks all the nodes in the edge snapshot as T or as F according to the following set of rules, wherein T means that it should be skipped/suppressed, and F means that it should not be skipped/suppressed: all nodes are by default set to T; if a node is a leaf and it corresponds to an edge mask value of F, it is marked as F; and if a node is F, its parent is marked as F.
For lists, the processing logic during the execution is similar to that for the crosstab case. A difference is that the query execution unit 140 constructs an edge snapshot employing the following logic or algorithm which is a non-negative type logic. List edges are conceptually considered as nodes in a list tree. Each row in the list edge rowset, i.e., each node in the list tree, is being marked as “T” if it needs to be suppressed, or as “F” otherwise. By default all the nodes, i.e., rows, are marked as T.
The query execution unit 140 performs two passes:
The query execution unit 140 also performs extra processing: if a row is a footer or header with no corresponding details, suppress it, and its ancestor if any.
For both crosstabs and lists, the query execution unit 140 redirects all the edge and cells client navigation using the data structures having the masks, as described above.
In addition, all the requests from the client regarding the report metadata, which is descriptive information about the edges/cells, e.g., data type, precision and number of columns, are intercepted by the suppression provider 100 and redirected to the underlying provider's corresponding metadata calls, typically after being subject to some preparatory transformations.
The query execution unit 140 uses specific logic to allow the client page-size navigation/rendering, e.g., page down, page up, first page, and last page.
As described above, the suppression provider 100 and the method carried out by the suppression provider 100 allow suppression of rows and/or columns in sparse BI reports, lists, crosstabs, and charts. Thus, suppression of values in sparse data can be performed using simple UI gestures. For a BI user, it is faster and more reliable to use simple UI than to edit complex filtering expression in order to solve the data scarcity problem in his report or analysis. Also, since the suppression provider is a part of the query framework system, it can handle various types of reports, e.g., crosstabs, lists and charts, without being limited to specific reports based on specific data sources.
The query framework system 10 allows for maximum flexibility for query interception, interpretation, and transformation. On the other hand, it facilitates the reuse of the functionality of components already available in the system. Thus, the query framework system 10 provides the base for pluggable component architecture; querying multidimensional, relational, and vendor query data sources; data agnostic query service: supporting same operations regardless of a data source with a consistent plug-in API; and cross-data source joins.
The integration of the suppression provider 100 within the query framework system 10 allows it to leverage the openness of the query framework system 10 to a variety of data base technologies serving a variety of BI and CPM needs.
The suppression provider of the present invention may be implemented by any hardware, software or a combination of hardware and software having the above described functions. The software code, instructions and/or statements, either in its entirety or a part thereof, may be stored in a computer readable memory. Further, a computer data signal representing the software code, instructions and/or statements may be embedded in a carrier wave may be transmitted via a communication network. Such a computer readable memory and a computer data signal and/or its carrier are also within the scope of the present invention, as well as the hardware, software and the combination thereof.
While particular embodiments of the present invention have been shown and described, changes and modifications may be made to such embodiments without departing from the scope of the invention. For example, the elements of the suppression provider are described separately, however, two or more elements may be provided as a single element, or one or more elements may be shared with other components in one or more computer systems.
This application claims the benefit of U.S. Provisional Application No. 61/059,390, filed Jun. 6, 2008.
Number | Date | Country | |
---|---|---|---|
61059390 | Jun 2008 | US |