The present disclosure relates generally to data processing and, more specifically, to computer-implemented systems and methods for processing multi-dimensional data structures.
On-line Analytical Processing (OLAP) technology may enable users to analyze multi-dimensional data. Some applications of OLAP include business reporting for sales, marketing, management reporting, forecasting, and financial reporting. Large amounts of data can be analyzed for OLAP applications, and such data can be organized into multi-dimensional data cubes. Sometimes, each dimension of a multidimensional cube may represent a different type of data.
In accordance with the teachings described herein, systems and methods are provided for processing a multi-dimensional data structure represented as multi-dimensional cubes. A first multi-dimensional cube and a second multi-dimensional cube may be received, the first multi-dimensional cube including first cube property data and first user data, the second multi-dimensional cube including second cube property data and second user data. A virtual multi-dimensional cube including virtual cube property data for accessing and performing computer-based operations upon the first user data and the second user data may be generated, the virtual cube property data including a first mapping from the first cube property data to the virtual cube property data and a second mapping from the second cube property data to the virtual cube property data.
For example, a third multi-dimensional cube is received, the third multi-dimensional cube including third cube property data. The third cube property data is combined into the virtual cube property data, the virtual cube property data further including a third mapping from the third cube property data to the virtual cube property data. The first multi-dimensional cube includes one or more first dimensions, a first dimension including one or more first dimension levels, the first cube property data including information associated with the first dimensions and the first dimension levels. The second multi-dimensional cube includes one or more second dimensions, a second dimension including one or more second dimension levels, the second cube property data including information associated with the second dimensions and the second dimension levels. As another example, the virtual multi-dimensional cube includes one or more virtual dimensions corresponding to the first dimensions and the second dimensions, a virtual dimension including one or more virtual dimension levels corresponding to the first dimension levels and the second dimension levels.
In one example, the first mapping includes data associated with mapping the first dimensions to the virtual dimensions and data associated with mapping the first dimension levels to the virtual dimension levels. The second mapping includes data associated with mapping the second dimensions to the virtual dimensions and data associated with mapping the second dimension levels to the virtual dimension levels. In another example, the first multi-dimensional cube includes an on-line analytical processing (OLAP) cube, and the second multi-dimensional cube includes an on-line analytical processing (OLAP) cube. Data of the first multi-dimensional cube is stored on one or more first nodes in a first connected grid of computers, and data of the second multi-dimensional cube is stored on one or more second nodes in a second connected grid of computers.
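By way of illustration only, the role of the first mapping and the second mapping in a cross-cube operation can be sketched as follows. The sketch assumes each sub-cube keeps its own user data and its own dimension names, and that the mapping is a simple lookup from a virtual dimension name to the local dimension name. The cube contents, the `dimension_map` key, the `exposure` measure, and the `aggregate` function are hypothetical and are not taken from the disclosure.

```python
from collections import defaultdict

# Two hypothetical sub-cubes. Each keeps its own user data ("rows") and its
# own dimension names; "dimension_map" plays the role of the mapping from the
# virtual dimensions to that sub-cube's local dimensions.
market_risk = {
    "rows": [
        {"desk": "rates", "exposure": 1.5},
        {"desk": "fx", "exposure": 0.5},
    ],
    "dimension_map": {"business_line": "desk"},
}
credit_risk = {
    "rows": [{"unit": "rates", "exposure": 2.0}],
    "dimension_map": {"business_line": "unit"},
}


def aggregate(sub_cubes, virtual_dimension, measure):
    """Sum a measure across sub-cubes, grouped by a virtual dimension."""
    totals = defaultdict(float)
    for cube in sub_cubes:
        # Translate the virtual dimension into this sub-cube's local name.
        local_dimension = cube["dimension_map"][virtual_dimension]
        for row in cube["rows"]:
            totals[row[local_dimension]] += row[measure]
    return dict(totals)


totals = aggregate([market_risk, credit_risk], "business_line", "exposure")
print(totals)  # {'rates': 3.5, 'fx': 0.5}
```

In this sketch the user data is never copied into a combined cube; the operation reads each sub-cube in place and relies only on the mappings to reconcile the differing dimension names.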
As an example, the virtual cube property data includes information selected from the group consisting of locations of the first user data and the second user data, version information of the software used to create the virtual multi-dimensional cube, information associated with the first nodes and the second nodes, output variables, a list of class variables, a number of horizons, and cash flow dates. The virtual multi-dimensional cube and the virtual cube property data are dynamically updated in response to the first multi-dimensional cube or the second multi-dimensional cube being updated. Data related to the first multi-dimensional cube and the second multi-dimensional cube is stored in a memory when the virtual multi-dimensional cube is generated.
In another embodiment, a computer-implemented system is provided for processing a multi-dimensional data structure represented as multi-dimensional cubes. The example system may include one or more data processors, and a computer-readable storage medium encoded with instructions for commanding the data processors to execute operations. The operations may include receiving a first multi-dimensional cube and a second multi-dimensional cube, the first multi-dimensional cube including first cube property data and first user data, the second multi-dimensional cube including second cube property data and second user data, and generating a virtual multi-dimensional cube including virtual cube property data for accessing and performing computer-based operations upon the first user data and the second user data, the virtual cube property data including a first mapping from the first cube property data to the virtual cube property data and a second mapping from the second cube property data to the virtual cube property data.
In yet another embodiment, a non-transitory computer-readable medium comprising programming instructions is provided for processing a multi-dimensional data structure represented as multi-dimensional cubes. The programming instructions may be configured to cause a processing system to execute the following operations: receiving a first multi-dimensional cube and a second multi-dimensional cube, the first multi-dimensional cube including first cube property data and first user data, the second multi-dimensional cube including second cube property data and second user data, and generating a virtual multi-dimensional cube including virtual cube property data for accessing and performing computer-based operations upon the first user data and the second user data, the virtual cube property data including a first mapping from the first cube property data to the virtual cube property data and a second mapping from the second cube property data to the virtual cube property data.
Data cubes are convenient and flexible mechanisms for representing multidimensional data, but some problems remain. For example, consider a company having multiple divisions with data from each division represented by a data cube. Though data of a particular division may be accessed and analyzed using the corresponding data cube, a complete cube view and cross-cube analysis of the entire company is often not available. When combining data, for example, data cubes of different divisions may have different dimensions, or a same dimension of two data cubes may include different hierarchy levels. Even if data of all divisions can be combined into a single cube, data of these divisions may be finalized at different times during a day.
Often, OLAP data cubes are created and used in one or more memory devices (e.g., random-access memory devices) which may span multiple nodes in a connected grid of machines.
An example syntax for generating the virtual cube 302 to join the sub-cubes 304 is as follows:
The sub-cubes 304 may be created independently at different times or in parallel, which provides flexibility to perform different tasks. For example, a company may create an initial large sub-cube that represents a portfolio of assets of the company. During a particular time period, sub-cubes representing changes to the portfolio can be created. A virtual cube can be generated to join the subsequent sub-cubes with the initial sub-cube to represent the updated portfolio of the company. In another example, a company desires an enterprise view of the business risks. Sub-cubes may be generated to represent risks along different lines of business (e.g., market risk and credit risk). Then, a virtual cube may be created to join these sub-cubes to provide an enterprise view of the company. In yet another example, sub-cubes can be created regularly (e.g., every day, every month) to represent risks (e.g., Value at Risk) of a company. A virtual cube may be created to join these sub-cubes continuously. Then, managers can monitor risk spikes over time through the virtual cube. Further, a virtual cube that joins one or more sub-cubes may be updated when a new version of a sub-cube becomes available. Each sub-cube in the virtual cube may include metadata for the creation time. Thus when the virtual cube is accessed, the staleness of the data contained in the sub-cubes can be indicated.
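As a minimal sketch of how creation-time metadata could be used to indicate staleness, the following assumes each sub-cube records a timezone-aware creation timestamp and that a sub-cube is treated as stale once it is older than a chosen threshold. The function names, the sub-cube names, and the 24-hour threshold are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def staleness_report(creation_times, now):
    """Return the age of each sub-cube relative to `now`."""
    return {name: now - created for name, created in creation_times.items()}


def stale_sub_cubes(creation_times, now, max_age):
    """Return the names of sub-cubes older than `max_age`, sorted by name."""
    report = staleness_report(creation_times, now)
    return sorted(name for name, age in report.items() if age > max_age)


# Hypothetical creation times for two daily risk sub-cubes.
now = datetime(2024, 1, 5, 12, 0, tzinfo=timezone.utc)
creation_times = {
    "var_2024_01_03": datetime(2024, 1, 3, 18, 0, tzinfo=timezone.utc),
    "var_2024_01_04": datetime(2024, 1, 4, 18, 0, tzinfo=timezone.utc),
}

stale = stale_sub_cubes(creation_times, now, max_age=timedelta(hours=24))
print(stale)  # ['var_2024_01_03']
```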
The metadata of the sub-cubes 304 may be different from each other because these sub-cubes may include different user data (e.g., measures, outputs), different dimensions and different dimension levels. The metadata of the virtual cube 302 may be generated by joining/mapping the metadata of the sub-cubes 304.
For example, the sub-cubes represent forecast data made every day during a week. Each of the sub-cubes may include a number of dimensions representing different business lines. Thus, the metadata of the virtual cube may include a list of these dimensions. As another example, the sub-cubes may include overlapping dimensions, while dimension levels for these overlapping dimensions vary among the sub-cubes. The metadata 400 may merge these dimension levels of the sub-cubes to generate a list of different dimension levels. In addition, mappings between the generated list and the dimension levels of each of the sub-cubes may be created.
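The merging of dimension levels and the creation of mappings described above can be sketched as follows. This is an illustrative sketch only: it assumes each sub-cube's metadata is a dictionary from dimension name to a list of level names, and it records each mapping as an index into the merged level list. The class names, the `join_metadata` function, and the example dimensions are hypothetical, and the merged list here is kept in first-seen order rather than in any hierarchy order.

```python
from dataclasses import dataclass, field


@dataclass
class SubCubeMetadata:
    """Property data of one sub-cube: dimension name -> list of level names."""
    name: str
    dimensions: dict[str, list[str]]


@dataclass
class VirtualMetadata:
    """Merged property data plus one mapping per sub-cube."""
    dimensions: dict[str, list[str]] = field(default_factory=dict)
    # mappings[sub_cube][dimension][level] -> index into the merged level list
    mappings: dict[str, dict[str, dict[str, int]]] = field(default_factory=dict)


def join_metadata(sub_cubes: list[SubCubeMetadata]) -> VirtualMetadata:
    """Merge dimensions and levels of all sub-cubes and record the mappings."""
    virtual = VirtualMetadata()
    for cube in sub_cubes:
        cube_map = {}
        for dim, levels in cube.dimensions.items():
            merged = virtual.dimensions.setdefault(dim, [])
            level_map = {}
            for level in levels:
                if level not in merged:
                    merged.append(level)  # new level: extend the merged list
                level_map[level] = merged.index(level)
            cube_map[dim] = level_map
        virtual.mappings[cube.name] = cube_map
    return virtual


# Two sub-cubes share the "region" dimension but differ in its levels.
monday = SubCubeMetadata("monday", {"region": ["country", "city"]})
tuesday = SubCubeMetadata(
    "tuesday", {"region": ["country", "state", "city"], "product": ["line"]}
)

virtual = join_metadata([monday, tuesday])
print(virtual.dimensions)
# {'region': ['country', 'city', 'state'], 'product': ['line']}
print(virtual.mappings["tuesday"]["region"])
# {'country': 0, 'state': 2, 'city': 1}
```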
As an example, consider a company having 3000 banking books which are finalized at different times during a business day. A complete cube view of the whole company is desired before all of the banking books are finalized. A sub-cube may be created for each banking book, and each sub-cube may take less than 30 seconds to put on a connected grid of computers. It takes less than 15 minutes to upload all 3000 banking books. It takes less than 1 second to create a virtual cube to join the 3000 sub-cubes. A particular sub-cube may be updated during the business day. The virtual cube may then be updated by joining the 2999 sub-cubes that are not updated and the updated sub-cube.
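One possible way to picture the update in the banking-book example is sketched below. It assumes the virtual cube holds only references to its sub-cubes, so that joining the 2999 unchanged sub-cubes with the updated one does not copy any user data. The `VirtualCube` class, its methods, the book identifiers, and the version numbers are hypothetical, and the sketch makes no claim about the timings stated above.

```python
class VirtualCube:
    """Holds references to sub-cubes, not copies of their user data."""

    def __init__(self, sub_cubes):
        # sub_cubes: book id -> (version, reference to the sub-cube's data)
        self._sub_cubes = dict(sub_cubes)

    def with_updated(self, book_id, version, data_ref):
        """Join the unchanged sub-cubes with one updated sub-cube."""
        joined = dict(self._sub_cubes)  # shallow copy: references only
        joined[book_id] = (version, data_ref)
        return VirtualCube(joined)

    def sub_cube(self, book_id):
        return self._sub_cubes[book_id]

    def __len__(self):
        return len(self._sub_cubes)


# 3000 hypothetical banking books; object() stands in for each sub-cube's data.
books = {f"book_{i:04d}": (1, object()) for i in range(3000)}
morning = VirtualCube(books)

# One book is re-finalized during the day; the other 2999 are reused as-is.
afternoon = morning.with_updated("book_0042", 2, object())

print(len(afternoon))                                                     # 3000
print(afternoon.sub_cube("book_0042")[0])                                 # 2
print(afternoon.sub_cube("book_0001") is morning.sub_cube("book_0001"))   # True
```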
As shown in
The examples used in this disclosure can vary. For example, a computer-implemented system and method can be configured for performing real time incremental Value at Risk (VaR) analysis. As another example, a computer-implemented system and method can be configured for combining data cubes with different outputs, such as a credit risk data cube and a market risk data cube, or combining data cubes with different horizons. As another example, a computer-implemented system and method can be configured for creating a virtual cube that may require little space and can be updated throughout a business day. As another example, a computer-implemented system and method can be configured such that a multi-dimensional data structure processing system 802 can be provided on a stand-alone computer for access by a user, such as shown at 800 in
Additionally, the methods and systems described herein may be implemented on many different types of processing devices by program code comprising program instructions that are executable by the device processing subsystem. The software program instructions may include source code, object code, machine code, or any other stored data that is operable to cause a processing system to perform the methods and operations described herein. Other implementations may also be used, however, such as firmware or even appropriately designed hardware configured to carry out the methods and systems described herein.
The systems' and methods' data (e.g., associations, mappings, data input, data output, intermediate data results, final data results, etc.) may be stored and implemented in one or more different types of computer-implemented data stores, such as different types of storage devices and programming constructs (e.g., RAM, ROM, Flash memory, flat files, databases, programming data structures, programming variables, IF-THEN (or similar type) statement constructs, etc.). It is noted that data structures describe formats for use in organizing and storing data in databases, programs, memory, or other computer-readable media for use by a computer program.
The systems and methods may be provided on many different types of computer-readable media including computer storage mechanisms (e.g., CD-ROM, diskette, RAM, flash memory, a computer's hard drive, etc.) that contain instructions (e.g., software) for use in execution by a processor to perform the methods' operations and implement the systems described herein.
The computer components, software modules, functions, data stores and data structures described herein may be connected directly or indirectly to each other in order to allow the flow of data needed for their operations. It is also noted that a module or processor includes but is not limited to a unit of code that performs a software operation, and can be implemented for example as a subroutine unit of code, or as a software function unit of code, or as an object (as in an object-oriented paradigm), or as an applet, or in a computer script language, or as another type of computer code. The software components and/or functionality may be located on a single computer or distributed across multiple computers depending upon the situation at hand.
The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
While this specification contains many specifics, these should not be construed as limitations on the scope of what may be claimed, but rather as descriptions of features specific to particular embodiments. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
Thus, particular embodiments have been described. Other embodiments are within the scope of the following claims. For example, the actions recited in the claims can be performed in a different order and still achieve desirable results.