The present invention belongs to the OLAP big data information field, and particularly relates to a novel OLAP pre-calculation model and a modeling method.
In conventional OLAP pre-calculation, to adapt to possible query scenarios, as many Cuboids as possible are included during Cube building. Usually, for a Cube with N dimensions, the quantity of Cuboids may be up to 2^N. Therefore, when the scale of data is large and the quantity of dimensions is large, the modeling process consumes a great deal of time and the pre-calculation results occupy a large amount of storage space. Though some measures may be taken to remove some Cuboids, a certain number of Cuboids that are never used in the query process always exist, resulting in severe waste. On the other hand, the granularity of building in the prior art is the Cube. Once a Cube is built, the metadata in it can't be modified any more. To make any modification to the original Cube, even to add a new dimension or measurement, a new Cube has to be defined and rebuilt completely. Consequently, the previous calculation results can't be utilized, and the flexibility is poor.
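The growth in the quantity of Cuboids can be illustrated with a minimal Python sketch that enumerates every dimension combination of a Cube. The function name and the dimension names are hypothetical and serve only as an illustration.

```python
from itertools import combinations


def enumerate_cuboids(dimensions):
    """Return every subset of the dimensions, including the empty (grand total) one."""
    cuboids = []
    for size in range(len(dimensions) + 1):
        for combo in combinations(dimensions, size):
            cuboids.append(frozenset(combo))
    return cuboids


# Hypothetical dimension names, used for illustration only.
dims = ["D1", "D2", "D3", "D4"]
cuboids = enumerate_cuboids(dims)
print(len(cuboids))  # 16, i.e. 2 ** 4
```

Each additional dimension doubles the count, which is why building every Cuboid in advance becomes costly as the quantity of dimensions grows.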
The technical problem to be solved in the present invention is: the granularity of building in the prior art is Cube. Once a Cube is defined and built, the metadata in it can't be modified any more. Consequently, the previous calculation results can't be utilized, and the flexibility is poor.
To solve the technical problem described above, the present invention provides a novel OLAP pre-calculation model.
The novel OLAP pre-calculation model comprises: a query engine, a SQL converter, and a dimension combination storage device; wherein,
the SQL converter is configured to convert an inputted SQL query statement into a corresponding dimension combination;
the query engine is configured to query for the dimension combination among a plurality of built dimension combinations in the dimension combination storage device according to the corresponding dimension combination to ascertain whether a dimension combination matching the SQL query statement exists;
the query engine is further configured to log information on the corresponding dimension combination and send information on the corresponding dimension combination to the dimension combination storage device, if no matching dimension combination exists; and
the dimension combination storage device is configured to build the matching dimension combination according to the correlation among the discrete dimension combinations and the information on the corresponding dimension combination, and form a new topological hierarchy structure with the matching dimension combination and the built dimension combinations hierarchically.
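The interplay of the three components can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the invention: all class and function names are hypothetical, the toy converter only recognizes `SUM(...)` and `GROUP BY`, and a built combination is assumed to "match" when its dimensions and measurements cover those required by the query.

```python
import re
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DimensionCombination:
    dimensions: frozenset
    measures: frozenset


def sql_to_combination(sql):
    """Toy SQL converter: GROUP BY columns become dimensions, SUM(col) arguments become measures."""
    measures = frozenset(
        m.lower() for m in re.findall(r"sum\(\s*(\w+)\s*\)", sql, flags=re.IGNORECASE)
    )
    group_by = re.search(
        r"group\s+by\s+(.+?)(?:\s+order\s+by|\s+limit|;|$)",
        sql,
        flags=re.IGNORECASE | re.DOTALL,
    )
    dims = frozenset()
    if group_by:
        dims = frozenset(c.strip().lower() for c in group_by.group(1).split(",") if c.strip())
    return DimensionCombination(dims, measures)


@dataclass
class CombinationStore:
    """Stand-in for the dimension combination storage device."""
    built: list = field(default_factory=list)
    pending: list = field(default_factory=list)  # logged combinations still to be built


class QueryEngine:
    def __init__(self, store):
        self.store = store

    def find_match(self, wanted):
        # Assumed matching rule: the built combination covers the requested one.
        for combo in self.store.built:
            if wanted.dimensions <= combo.dimensions and wanted.measures <= combo.measures:
                return combo
        return None

    def handle(self, sql):
        wanted = sql_to_combination(sql)
        match = self.find_match(wanted)
        if match is None and wanted not in self.store.pending:
            # No match: log the combination and hand it to the store for building.
            self.store.pending.append(wanted)
        return match


store = CombinationStore(
    built=[DimensionCombination(frozenset({"region", "year"}), frozenset({"sales"}))]
)
engine = QueryEngine(store)
hit = engine.handle("SELECT region, SUM(sales) FROM orders GROUP BY region")
miss = engine.handle("SELECT city, SUM(sales) FROM orders GROUP BY city")
```

In this sketch the first query is served by the built combination, while the second finds no match and is recorded in `store.pending` for later building.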
The present invention has the following beneficial effects: With the model described above, dimension combinations can be updated continually in the dimension combination storage device, so that the model not only supports staged building in a time increment manner, but also supports building in a dimension or measurement increment manner; in addition, the model greatly improves query efficiency, reduces required storage space, and ensures query response rate.
Furthermore, the dimension combination storage device is further configured to query for a result directly from source data, if no dimension combination matching the SQL query statement exists.
Furthermore, the dimension combination storage device comprises: a plurality of built dimension combinations, wherein some dimension combinations are built on the basis of a MapReduce computing framework and have a hierarchical topology structure, while the other dimension combinations are discrete from each other and don't have a hierarchical topology structure.
Furthermore, in the dimension combinations that have a hierarchical topology structure, pre-calculation results of dimension combinations at a lower layer are obtained through aggregation calculation of pre-calculation results of dimension combinations at a higher layer.
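The aggregation from a higher layer to a lower layer can be sketched as below. This is a minimal sketch that assumes a summable measurement and a result stored as a mapping from dimension-value tuples to totals; the function name, the dimension names, and the figures are hypothetical.

```python
from collections import defaultdict


def roll_up(parent_rows, parent_dims, child_dims):
    """Aggregate a higher-layer result (more dimensions) into a lower-layer one (fewer dimensions)."""
    missing = [d for d in child_dims if d not in parent_dims]
    if missing:
        raise ValueError(f"child dimensions not in parent: {missing}")
    keep = [parent_dims.index(d) for d in child_dims]
    totals = defaultdict(float)
    for key, value in parent_rows.items():
        child_key = tuple(key[i] for i in keep)
        totals[child_key] += value  # assumes a SUM-style measurement
    return dict(totals)


# Hypothetical pre-calculated result for the combination (region, year).
parent = {
    ("east", 2017): 10.0,
    ("east", 2018): 5.0,
    ("west", 2017): 7.0,
}
by_region = roll_up(parent, ["region", "year"], ["region"])
print(by_region)  # {('east',): 15.0, ('west',): 7.0}
```

Because the lower-layer result is derived from an already aggregated result, the source data does not need to be scanned again.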
Furthermore, the dimension combination storage device is specifically configured to build a new dimension combination formed as a result of dimension or measurement increment according to the correlation among the discrete dimension combinations and the information on the corresponding dimension combination, and merge the new dimension combination and a dimension combination among the built dimension combinations into the matching dimension combination.
The present invention further relates to a novel modeling method for OLAP pre-calculation model, which comprises:
S1. reading a SQL query statement, by a SQL converter;
S2. converting an inputted SQL query statement into a corresponding dimension combination, by the SQL converter;
S3. querying for the dimension combination among built dimension combinations in the dimension combination storage device according to the corresponding dimension combination to ascertain whether a dimension combination matching the SQL query statement exists, by the query engine;
S4. logging information on the corresponding dimension combination and sending information on the corresponding dimension combination to the dimension combination storage device, by the query engine, if no matching dimension combination exists; and
S5. building the matching dimension combination according to the correlation among the discrete dimension combinations and the information on the corresponding dimension combination, and forming a new topological hierarchy structure with the matching dimension combination and the built dimension combinations hierarchically, by the dimension combination storage device.
The present invention has the following beneficial effects: With the modeling method described above, the dimension combinations can be updated continually, and staged building in a time increment manner and building in a dimension or measurement increment manner are supported; in addition, the building efficiency can be improved greatly, the required storage space can be reduced, and the query response rate can be ensured.
Furthermore, the step S4 further comprises: querying for a result directly from source data, if no dimension combination matching the SQL query statement exists.
Furthermore, the dimension combination storage device comprises: a plurality of built dimension combinations, wherein some dimension combinations are built on the basis of a MapReduce computing framework and have a hierarchical topology structure, while the other dimension combinations are discrete from each other and don't have a hierarchical topology structure.
Furthermore, in the dimension combinations that have a hierarchical topology structure, pre-calculation results of dimension combinations at a lower layer are obtained through aggregation calculation of pre-calculation results of dimension combinations at a higher layer.
Furthermore, the operation of building the matching dimension combination in the step S5 comprises:
building a new dimension combination formed as a result of dimension or measurement increment, and merging the new dimension combination and a dimension combination among the plurality of built dimension combinations into the matching dimension combination.
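One possible reading of the merging operation, for the measurement increment case only, is sketched below. It assumes that the new combination has the same dimensions as a built one and only adds measurement columns, so that the two results can be joined on their shared dimension keys. The function name and data are hypothetical, and the original does not specify the merge at this level of detail.

```python
def merge_measures(built, increment):
    """Join two pre-calculated results that share the same dimension keys, combining their measure columns."""
    if built.keys() != increment.keys():
        raise ValueError("both results must cover the same dimension keys")
    merged = {}
    for key, measures in built.items():
        overlap = measures.keys() & increment[key].keys()
        if overlap:
            raise ValueError(f"measures already built: {sorted(overlap)}")
        merged[key] = {**measures, **increment[key]}
    return merged


# Built combination holds M1; the increment adds M2 for the same dimension values.
built = {("east",): {"M1": 15.0}, ("west",): {"M1": 7.0}}
increment = {("east",): {"M2": 3}, ("west",): {"M2": 1}}
merged = merge_measures(built, increment)
```

Only the added measurement has to be calculated; the previously built values are reused unchanged.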
Hereunder the principle and features of the present invention will be detailed with reference to the accompanying drawings. However, it should be noted that the embodiments are provided only to interpret the present invention but don't constitute any limitation to the scope of the present invention.
As shown in
The novel OLAP pre-calculation model comprises: a query engine, a SQL converter, and a dimension combination storage device; wherein,
the SQL converter is configured to convert an inputted SQL query statement into a corresponding dimension combination;
the query engine is configured to query for the dimension combination among a plurality of built dimension combinations in the dimension combination storage device according to the corresponding dimension combination to ascertain whether a dimension combination matching the SQL query statement exists;
the query engine is further configured to log information on the corresponding dimension combination and send information on the corresponding dimension combination to the dimension combination storage device, if no matching dimension combination exists; and
the dimension combination storage device is configured to build the matching dimension combination according to the correlation among the discrete dimension combinations and the information on the corresponding dimension combination, and form a new topological hierarchy structure with the matching dimension combination and the built dimension combinations hierarchically.
It can be understood: in the embodiment 1, the model is implemented by adding a SQL converter on the basis of the conventional model, and the SQL converter is mainly used to convert a SQL query statement submitted by the user into a corresponding Cuboid (dimension combination). A Cube concept is involved in the conventional model, but is not involved in the model in the embodiment 1; instead, a set of Cuboids obtained through conversion with the SQL converter is used. In that way, the model in the embodiment 1 is improved from the original Cube granularity to a finer and more flexible Cuboid granularity, and thereby time incremental building and dimension incremental building are supported. Finally, the discrete Cuboids are organized by means of a Spanning Tree to find out the most reasonable topology structure for building, and thereby the building efficiency is ensured.
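The organization of discrete Cuboids into a tree can be sketched as follows. The heuristic shown here (attach each Cuboid to the smallest already placed Cuboid that strictly contains its dimensions) is an assumption chosen for illustration and is not necessarily the Spanning Tree procedure of the embodiment; measurements are ignored for brevity, and the function name is hypothetical.

```python
def build_tree(cuboids):
    """Map each cuboid to a parent it can be aggregated from, or None if it must be built from source data."""
    # Place wider cuboids first so that narrower ones can attach to them.
    ordered = sorted(set(cuboids), key=lambda c: (-len(c), sorted(c)))
    parents = {}
    for cuboid in ordered:
        candidates = [p for p in parents if cuboid < p]  # strict supersets only
        if candidates:
            # The smallest superset is assumed to be the cheapest to aggregate from.
            parents[cuboid] = min(candidates, key=lambda p: (len(p), sorted(p)))
        else:
            parents[cuboid] = None
    return parents


a = frozenset({"D1", "D2", "D3"})
b = frozenset({"D1", "D2"})
c = frozenset({"D1"})
d = frozenset({"D4"})
tree = build_tree([c, d, a, b])
```

In this example `c` is built from `b`, `b` from `a`, while `a` and `d` have no usable parent and would be built from the source data.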
In addition, in the query process in the conventional OLAP pre-calculation model, the most appropriate Cuboid has to be found out for the query according to the SQL query statement. However, the specific query scenario is unknown when a Cube is built in the conventional OLAP pre-calculation model. Therefore, an optimal Cuboid may not always be hit for each SQL query statement, and other Cuboids have to be used for the query. Consequently, the query effect may not be ideal. In contrast, with the model in the embodiment 1, after the user submits a SQL query statement, first, the system will find out a usable Cuboid for the query in a set of Cuboids stored previously. If no appropriate Cuboid is found, the query will be handed over to another query engine, and the Cuboid that is required by the SQL query statement but doesn't exist will be logged and put into a set of Cuboids to be built (i.e., a dimension combination storage device) at the same time.
With the model in the embodiment 1, dimension combinations can be updated continually in the dimension combination storage device, so that the model not only supports staged building in a time increment manner, but also supports building in a dimension or measurement increment manner; in addition, the model greatly improves query efficiency, reduces required storage space, and ensures query response rate.
Optionally, in an embodiment 2, the dimension combination storage device is further configured to query for a result directly from source data, if no dimension combination matching the SQL query statement exists.
It can be understood: the embodiment 2 provides a scheme for implementing a different embodiment on the basis of the embodiment 1. In the embodiment 2, a result is obtained directly from the source data if no dimension combination matching the SQL query statement exists in the dimension combination storage device.
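The fallback to source data can be sketched as below. This is a minimal sketch that assumes source rows are available as dictionaries and that built results are indexed by their dimensions and measurement; all names and figures are hypothetical.

```python
from collections import defaultdict


def aggregate_from_source(rows, dims, measure):
    """Compute the result directly from source rows (the slow path)."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dims)
        totals[key] += row[measure]
    return dict(totals)


def answer(dims, measure, built_results, source_rows):
    """Use a pre-calculated result if one exists, otherwise fall back to the source data."""
    key = (tuple(dims), measure)
    if key in built_results:
        return built_results[key], "precalculated"
    return aggregate_from_source(source_rows, dims, measure), "source"


# Hypothetical source data.
rows = [
    {"region": "east", "year": 2017, "sales": 10.0},
    {"region": "east", "year": 2018, "sales": 5.0},
    {"region": "west", "year": 2017, "sales": 7.0},
]
result, origin = answer(["region"], "sales", {}, rows)
```

With an empty set of built results, the query is answered from `rows` and `origin` is `"source"`; once the combination has been built, the same call returns the stored result instead.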
Optionally, in an embodiment 3, the dimension combination storage device comprises: a plurality of built dimension combinations, wherein some dimension combinations are built on the basis of a MapReduce computing framework and have a hierarchical topology structure, while the other dimension combinations are discrete from each other and don't have a hierarchical topology structure.
It can be understood: the embodiment 3 provides a scheme for implementing a different embodiment on the basis of the above embodiment. In the existing OLAP pre-calculation building, Cuboids can be built hierarchically only after required models and Cube are defined. Therefore, the prior art supports building in a time increment manner, but doesn't support building in a dimension or measurement increment manner, because a Cube can't be modified once it is defined in the prior art. All Cuboids are confined by measurements and dimensions defined by Cube. In contrast, the embodiment 3 supports building at Cuboid granularity. It is only confined by the definition of the model. Therefore, dimensions and measurements can be added or deleted at any time within the scope of the model.
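The constraint that a Cuboid is confined only by the definition of the model can be sketched as a simple scope check. The model contents below reuse the dimension and measurement labels D1 to D4 and M1 to M4 that appear in this description; the function names and the dictionary layout are hypothetical.

```python
MODEL_DIMENSIONS = frozenset({"D1", "D2", "D3", "D4"})
MODEL_MEASURES = frozenset({"M1", "M2", "M3", "M4"})


def in_model_scope(dimensions, measures):
    """True if every requested dimension and measurement is defined by the model."""
    return set(dimensions) <= MODEL_DIMENSIONS and set(measures) <= MODEL_MEASURES


def extend(cuboid, new_dimensions=(), new_measures=()):
    """Return a widened cuboid definition, rejecting anything outside the model."""
    dimensions = cuboid["dimensions"] | set(new_dimensions)
    measures = cuboid["measures"] | set(new_measures)
    if not in_model_scope(dimensions, measures):
        raise ValueError("requested combination is outside the model definition")
    return {"dimensions": dimensions, "measures": measures}


base = {"dimensions": {"D1", "D2"}, "measures": {"M1"}}
wider = extend(base, new_dimensions=["D3"], new_measures=["M2"])
```

Here a dimension and a measurement are added without redefining anything else, which is the flexibility that Cube granularity lacks.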
Optionally, in an embodiment 4, in the dimension combinations that have a hierarchical topology structure, pre-calculation results of dimension combinations at a lower layer are obtained through aggregation calculation of pre-calculation results of dimension combinations at a higher layer.
It can be understood: the embodiment 4 provides a scheme for implementing a different embodiment on the basis of the above embodiment. In the embodiment 4, since the model is not confined by the definition of a specific Cube, each Cuboid is independent, and the dimensions and measurements may be different between Cuboids. Therefore, a hierarchical relation may not always exist between Cuboids. However, since the dimensions and measurements of each Cuboid are not beyond the scope defined by the model, it is possible that a correlation exists between different Cuboids. Therefore, Cuboids that are correlated with each other are grouped together as far as possible, to avoid repeated aggregation calculation in the building process. As shown in
To describe the tree building process better, suppose a data model contains four dimensions D1, D2, D3 and D4 and contains four measurements M1, M2, M3 and M4. After the user submits a query, the SQL converter will generate 3 Cuboids (see
Optionally, in an embodiment 5, the dimension combination storage device is specifically configured to build a new dimension combination formed as a result of dimension or measurement increment according to the correlation among the discrete dimension combinations and the information on the corresponding dimension combination, and merge the new dimension combination and a dimension combination among the built dimension combinations into the matching dimension combination.
It can be understood: the embodiment 5 provides a scheme for implementing a different embodiment on the basis of the above embodiment. As shown in
As shown in
S1. reading a SQL query statement, by a SQL converter;
S2. converting an inputted SQL query statement into a corresponding dimension combination, by the SQL converter;
S3. querying for the dimension combination among built dimension combinations in the dimension combination storage device according to the corresponding dimension combination to ascertain whether a dimension combination matching the SQL query statement exists, by the query engine;
S4. logging information on the corresponding dimension combination and sending information on the corresponding dimension combination to the dimension combination storage device, by the query engine, if no matching dimension combination exists;
S5. building the matching dimension combination according to the correlation among the discrete dimension combinations and the information on the corresponding dimension combination, and forming a new topological hierarchy structure with the matching dimension combination and the built dimension combinations hierarchically, by the dimension combination storage device.
It can be understood: in the embodiment 6, the model is implemented by adding a SQL converter on the basis of the conventional model, and the SQL converter is mainly used to convert a SQL query statement submitted by the user into a corresponding Cuboid (dimension combination). A Cube concept is involved in the conventional model, but is not involved in the model in the embodiment 6; instead, a set of Cuboids obtained through conversion with the SQL converter are used, or dimension combinations pre-stored in the dimension combination storage device may be used. In that way, the model in the embodiment 6 is improved from original Cube granularity to finer and more flexible Cuboid granularity, and thereby time incremental building and dimension incremental building are supported. Finally, the discrete Cuboids are organized by means of a Spanning Tree to find out the most reasonable topology structure for building, and thereby the building efficiency is ensured.
In addition, in the query process in the conventional OLAP pre-calculation model, the most appropriate Cuboid has to be found out for the query according to the SQL query statement. However, the specific query scenario is unknown when a Cube is built in the conventional OLAP pre-calculation model. Therefore, an optimal Cuboid may not always be hit for each SQL query statement, and other Cuboids have to be used for the query. Consequently, the query effect may not be ideal. In contrast, in the embodiment 6, after the user submits a SQL query statement, first, the system will find out a usable Cuboid for the query in a set of Cuboids stored previously. If no appropriate Cuboid is found, the query will be handed over to another query engine, and the Cuboid that is required by the SQL query statement but doesn't exist will be logged and put into a set of Cuboids to be built (i.e., a dimension combination storage device) at the same time. With the modeling method in the embodiment 6, the dimension combinations can be updated continually, and staged building in a time increment manner and building in a dimension or measurement increment manner are supported; in addition, the building efficiency can be improved greatly, the required storage space can be reduced, and the query response rate can be ensured.
Optionally, in an embodiment 7, the step S4 further comprises: querying for a result directly from source data, if no dimension combination matching the SQL query statement exists.
It can be understood: the embodiment 7 provides a scheme for implementing a different embodiment on the basis of the embodiment 6. In the embodiment 7, a result is obtained directly from the source data if no dimension combination matching the SQL query statement exists in the dimension combination storage device.
Optionally, in an embodiment 8, the dimension combination storage device comprises: a plurality of built dimension combinations, wherein some dimension combinations are built on the basis of a MapReduce computing framework and have a hierarchical topology structure, while the other dimension combinations are discrete from each other and don't have a hierarchical topology structure.
It can be understood: the embodiment 8 provides a scheme for implementing a different embodiment on the basis of the above embodiment. In the existing OLAP pre-calculation building, Cuboids can be built hierarchically only after required models and Cube are defined. Therefore, the prior art supports building in a time increment manner, but doesn't support building in a dimension or measurement increment manner, because a Cube can't be modified once it is defined in the prior art. All Cuboids are confined by measurements and dimensions defined by Cube. In contrast, the embodiment 8 supports building at Cuboid granularity. It is only confined by the definition of the model. Therefore, dimensions and measurements can be added or deleted at any time within the scope of the model.
Optionally, in an embodiment 9, in the dimension combinations that have a hierarchical topology structure, pre-calculation results of dimension combinations at a lower layer are obtained through aggregation calculation of pre-calculation results of dimension combinations at a higher layer.
It can be understood: the embodiment 9 provides a scheme for implementing a different embodiment on the basis of the above embodiment. In the embodiment 9, since the model is not confined by the definition of a specific Cube, each Cuboid is independent, and the dimensions and measurements may be different between Cuboids. Therefore, a hierarchical relation may not always exist between Cuboids. However, since the dimensions and measurements of each Cuboid are not beyond the scope defined by the model, it is possible that a correlation exists between different Cuboids. Therefore, Cuboids that are correlated with each other are grouped together as far as possible, to avoid repeated aggregation calculation in the building process. As shown in
To describe the tree building process better, suppose a data model contains four dimensions D1, D2, D3 and D4 and contains four measurements M1, M2, M3 and M4. After the user submits a query, the SQL converter will generate 3 Cuboids (see
Optionally, in an embodiment 10, the operation of building the matching dimension combination in the step S5 comprises:
building a new dimension combination formed as a result of dimension or measurement increment, and merging the new dimension combination and a dimension combination among the plurality of built dimension combinations into the matching dimension combination.
It can be understood: the embodiment 10 provides a scheme for implementing a different embodiment on the basis of the above embodiment. As shown in
In this document, the exemplary expression of the above terms may not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described can be combined appropriately in any one or more embodiments or examples. Furthermore, those skilled in the art may combine or assemble different embodiments or examples and features in different embodiments or examples described herein, provided that there is no contradiction between them.
While the present invention is described above in some preferred embodiments, the present invention is not limited to those preferred embodiments. Any modification, equivalent replacement, and improvement made without departing from the spirit and principle of the present invention shall be deemed as falling into the scope of protection of the present invention.
Number | Date | Country | Kind |
---|---|---|---|
201711487497.4 | Dec 2017 | CN | national |

Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/CN2018/073321 | 1/19/2018 | WO | 00 |