Financial data is often viewed in the form of a spreadsheet containing rows and columns of figures, or data. Such spreadsheets are commonly implemented on computers, so that changes to one item may be automatically reflected in any other items, which use the altered item as a basis for a calculation. Before any such manipulation of data can occur, however, the data must be imported from storage or input by the user. Many companies and individuals now routinely enter their basic financial data into computers for such later retrieval and manipulation.
Leaf-level multi-dimensional calculation often is the bottleneck for applications dealing with large amounts of data and contributes to significant performance degradation in relational server analysis. The performance degradation is mainly caused by search space explosion and an inherent complexity of algorithmically reducing the scope. A simple calculation may result in significant adverse performance impact of a query script or in exceeding internal space restriction for the query.
Leaf-level, multi-dimensional calculation is optimized by translating the multi-dimensional calculation to a query script. The query script transposes rows into columns while reducing a number of affected rows by reversing calculation relationships. Actual calculations are then performed on calculated columns, which are generated by the query script.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
Embodiments of the present disclosure now will be described more fully hereinafter with reference to the accompanying drawings, which form a part hereof, and which show, by way of illustration, specific exemplary embodiments for practicing the invention. This disclosure may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope to those skilled in the art. Among other things, the present disclosure may be embodied as methods or devices. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. The following detailed description is, therefore, not to be taken in a limiting sense.
Illustrative Operating Environment
Referring to
In addition to program modules 106, calculation application 120 may also be executed within operating system 105. Calculation application 120 may be a business application performing calculations on multi-dimensional data to analyze and report on status, goals, progress, financial analysis, and the like. To optimize leaf-level multi-dimensional calculations, calculation application 120 may translate the calculation relationship(s) to a query script (122) that transposes rows into columns and reduces a number of affected rows for efficient resource utilization.
To perform the actions described above, calculation application 120 may include and/or interact with other computing devices and applications and application interfaces (APIs) residing in other applications such as the database shown in
Computing device 100 may have additional features or functionality. For example, computing device 100 may also include additional data storage devices (removable and/or non-removable) such as, for example, magnetic disks, optical disks, or tape. Such additional storage is illustrated in
System memory 104, removable storage 109 and non-removable storage 110 are all examples of computer storage media. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computing device 100. Any such computer storage media may be part of device 100. Computing device 100 may also have input device(s) 112 such as retail devices, keyboard, mouse, pen, voice input device, touch input device, etc. Output device(s) 114 such as a display, speakers, printer, etc. may also be included.
Computing device 100 also contains communication connections 116 that allow the device to communicate with other computing devices 118, such as over a network. Communication connections 116 are one example of communication media. Communication media may typically be embodied by computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media.
A calculation application, such as calculation application 120 of
In another embodiment, server(s) 202 may interact with database 204 that is arranged to store data for business applications. The calculation application may translate leaf-level calculations to a query script before performing the actual calculations reducing a number of operations and utilized resources such as memory significantly.
The calculation application may interact with client devices, such as handheld computer 214, desktop computer 212, and laptop computer 216, over network 210 to receive instructions such as the relationship for the calculation, provide reports, and perform other calculation related tasks. Client devices communicating with server(s) 202 may include any type of computing device, not limited to the examples shown herein. In one embodiment, the calculation application may be implemented in one of the client devices and interact with database 204 through server(s) 202.
Network 210 may be a secure network such an enterprise network, or an unsecure network such as a wireless open network. Network 210 provides communication between the nodes described above. By way of example, and not limitation, network 210 may include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media.
Many other configurations of computing devices, communications, applications, and distribution systems may be employed to implement a multi-dimensional calculation application that uses query scripts to optimize leaf-level calculations.
Illustrative Embodiments For Optimization Of Leaf-Level Multi-Dimensional Calculations Using Query Scripts
Embodiments are directed to optimizing leaf-level multi-dimensional calculations by translating the multi-dimensional calculations to efficient query scripts. Databases store leaf-level information in rows. For calculation operations, each row has to be drawn completely leading to large numbers of memory operations in sparse data environments. On the other hand, columns provide an efficient calculation mechanism. In column format, only the rows containing data of interest are drawn for calculation.
Compared to brute force multi-dimensional expression scripts, translated scripts utilizing transposed rows may perform faster by order of magnitudes. To generate efficient query scripts, rows are transposed to columns while affected rows are reduced by reversing calculation relationships. Transposing row-to-column enables fast arithmetic calculations. Reversing the calculation relationships enables a reduction of the rows that need to be processed.
Calculation application 320 is arranged to perform calculations on leaf-level, multi-dimensional data based on user defined or selected rules. Calculation application 320 may interact with one or more user interfaces, modeling applications, and the like to receive the calculation rules and user requests to perform the calculation.
In one embodiment, calculation engine 322 translates calculation rules to a query script that is arranged to transpose rows to columns while selecting affected rows by reversing calculation relationships. The query scripts may be in Multi-Dimensional expression (MDX) syntax, Sequential Query Language (SQL) syntax, and the like. The calculation relationship may be based on a single rule or a rule set. Calculation engine 322 may generate one or more query scripts based on the calculation relationship. Once a query script is generated, it may be executed by the calculation engine, stored in volatile or non-volatile memory for later execution, etc.
When calculation engine 322 executes the query script(s), rows are transposed selectively and calculated columns generated based on individual terms of the calculation relationship. Data store engine 302 is arranged to apply operators of the calculation relationship to the calculated columns for the actual calculation rendering the result values based on the specified relationship. Data store engine 302 may be part of a separate application such as a server analysis application or part of calculation application 320.
Source data for performing the above described operations, such as transposition, selection of rows, generation of calculated columns, and actual calculation, as well as transposed column table and results may be stored in data store 304.
According to one embodiment, rows are transposed into columns and calculated columns are used to perform actual calculation. Traditional inner join approach may not perform well because large number of rows in the fact table may result in increased resource usage and system slow down. By transposing the rows, multiple join conditions are reduced to a one-pass aggregation, thus resulting in significantly increased performance gain. While transposing the rows into columns, the number of the affect rows is reduced using reverse relationships to further decrease the rows that need to be calculated.
Diagram 400 shows example data table 432 in data store 430. Data table 432 includes n dimensions, Dal through Dan and m rows, Ral through Ram. Transposition operation 452, as described in more detail below, selects rows of interest (e.g. rows Ra2 and Ra3) using reverse relationship terms and transposes them into columns such that calculated columns can be generated corresponding to terms of the calculation relationship.
Example transposed table 442 in data store 440 includes transposed columns Cb1, Cb2, and Cb3. While Cb1, and Cb2 correspond to the transposed rows, Cb3 may be used as a calculated column for one of the terms in the calculation relationship. A transposed data table (structure) may include as many columns as necessary depending on the number of terms in the calculation relationship. The transposed table may further include an additional column for the result values. Once the rows are transposed and calculated columns generated, calculation 454 may be performed by a database engine to derive result values by applying the calculation operators to the individual term values. Data stores 430 and 440 may be the same data store or distinct data stores.
Other functional blocks, data structures, and operations may be implemented using the principles described herein.
Diagram 500 includes example data on which the calculation is to be performed in table 510. Column 512 of table 510 is the “Account” dimension of the data. The “Account” dimension includes two members: accounts payable and accounts receivable. Next dimension is “Time” in column 514 of table 510. “Time” dimension includes members: 2004-1, 2005-1, 2005-2, and 2005-3. The digits after the dash refer to months of each year. Finally, column 516 includes values corresponding to the listed account types in the listed time period.
The example calculation relationship is based on the rule that a value for an accounts receivable member is to be calculated based on multiplying the accounts receivable value of the previous time period with 1.1 and adding the accounts payable value for the current time period.
The relationship is expressed in form of a pseudo script as shown in the figure. The pseudo script includes scope 522, which includes for “Time” dimension members 2005-1, 2005-2, and 2005-3; and for “Account” dimension the receivables member.
The calculation relationship is stated in multi-dimensional expression syntax. First term 526 in the relationship is formed by multiplying a value determined from a member (TC1) corresponding to an intersection of a previous “Time” member with receivables member of “Account” dimension. First term 526 is used to form first calculated column 536 in transposed data table 530. Second term 528 is formed by an intersection of a current “Time” dimension member and a payable member of the “Account” dimension. Second term 528 is used to form second calculated column 538 in transposed data table 530.
Transposition 520 is the process where rows are transposed into columns while only affected rows are selected such that calculated columns corresponding to terms in the relationship are generated. Transposed data table 530 is obtained as a result of transposition 520 by the calculation engine.
Transposed data table 530 includes first column 532 listing members of the “Account” dimension within the scope defined for the calculation. Second column 534 includes members of the “Time” dimension within the defined scope. Third column 536 (TC1) and fourth column 538 (TC2) include calculated values corresponding to each term in the calculation relationship. Thus, a reverse relationship method is used in generating third column 536 (TC1) and fourth column 538 (TC2). Fifth column 539 (Result) lists calculated values of the complete calculation. Fifth column 539 is generated by the database engine applying the operations defined by the calculation to the terms listed in third column 536 (TC1) and fourth column 538 (TC2).
For example, to determine third column 536 value for accounts receivable in 2005-1, a value of accounts receivable for the previous time period is drawn. Since no value for 2004-12 exists, TC1 has a “0” value for this term. Fourth column 538 for accounts receivable in 2005-1 is determined from a current time period value of accounts payable. In this case, the value is “100” from table 510. Once TC1 and TC2 are determined, the calculated columns can be used to determine the result by multiplying TC1 value with 1.1 (still “0”) and adding TC2 value. This operation renders the result value in first row of fifth column 539 as “100”.
For the second example, first term TC1 is calculated from previous time period value of accounts receivable again. Because the current time period is 2005-2 in this case, TC1 is “100” from fifth row of table 510. TC2 value is determined from current value of accounts payable, which is “200”. Performing the actual calculation on the calculated column values TC1 and TC2, the result value “310” is obtained.
It should be noted that during the transposition and reduction of the rows, a hierarchy within the data structure is preserved.
Other data structures, data types, hierarchies, naming conventions, and the like may be implemented using the principles described herein.
Process 600 is generally directed to determining and deploying rules for optimizing a leaf-level, multi-dimensional calculation by transposing rows into columns and reducing affected rows using reverse relationships before performing the actual calculation.
Process 600 begins at operation 602, where a calculation engine imports a data model. The data model may include calculation relationship(s), information about data structure, and the like. Processing proceeds from operation 602 to operation 604.
At operation 604, the calculation engine defines the calculation. One or more rules provided by a user may be parsed to generate a query script that defines the actions for efficient multi-dimensional calculation. Processing advances from operation 604 to operation 606.
At operation 606, the rule(s) are deployed such that they can be used for performing the calculation based on a procedural method or a definitional method. Process 600 ends after the deployment of the rule(s), and may be followed by process 700 of
Process 700 is directed to performing the leaf-level multi-dimensional calculation with transposition and reduction of rows using the rule(s) deployed in process 600 above. Process 700 may begin in a definitional system upon being triggered by a change to the data or in a procedural system upon being triggered by a user request.
In a definitional system, a determination is made, at decision operation 702, whether the data has been changed. For a definitional relationship, the rule is applied anytime the affected data is involved in an operation. The application of the rule is monitored by the system without a specific user request to perform the calculation based on the rule. If the determination is affirmative, processing moves to operation 706.
On the other hand, a specific user request is expected for executing the calculation for a procedural relationship. At decision operation 704, a determination is made whether such a user request is received. If the determination is affirmative, processing advances to operation 706.
At operation 706, select rows are transposed to columns based on the relationship specified by the calculation rule(s). The number of rows is thereby reduced using reverse relationship in selecting the rows to be transposed. In one embodiment, each transposed row generates a column for one of the calculation terms. For example, for a simple calculation expression of A=2*B+C, a column containing values for B and another column containing values for C may be generated by transposing the selected rows. Processing advances from operation 706 to operation 708.
At operation 708, the actual calculation is executed by a database engine. In the example above, the actual calculation corresponds to multiplying values of B with 2 and adding corresponding values of C to determine A. By limiting a number of operations to be performed by the database engine using the transposition and reduction of the rows, memory and processor resources are saved significantly resulting in increased performance. Processing moves from operation 708 to operation 710.
At operation 710, the results are presented to the user. Presentation of the results may include at least one of: providing result data to an application, storing the result data to the database, and the like. After operation 710, processing moves to a calling process for further actions.
The operations included in processes 600 and 700 are for illustration purposes. Optimizing leaf-level multi-dimensional calculations using query scripts may be implemented by a similar process with fewer or additional steps, as well as in different order of operations.
The above specification, examples and data provide a complete description of the manufacture and use of the composition of the embodiments. Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims and embodiments.