Business data is typically stored within physical tables of a database. The database may comprise a relational database, such as Oracle, Microsoft SQL Server, IBM DB2, Teradata and the like. The structures and relationships of the physical database tables are complex. A typical end user is therefore unable to locate or extract desired information from the physical database tables.
Business Intelligence (BI) tools (e.g., BusinessObjects Universe Designer®) may be used to build an abstraction layer that shields end users from the complexity of the physical tables. More specifically, the abstraction layer allows the end users to query a database using intuitive terms rather than references to specific physical entities of the database.
U.S. Pat. No. 5,555,403 describes such an abstraction layer, referred to therein as a semantic layer. Briefly, a semantic layer defines a set of “business objects” that represent business entities, such as customers, time, financial figures, etc. Business objects may be classified as dimensions (along which one may want to perform an analysis or report), details (e.g., additional information on dimensions), and measures (e.g., indicators, most often numeric, whose value can be determined for a given combination of dimension values).
Dimension objects may be further abstracted into higher-level entities known as analysis objects. For example, the Country and City dimension objects may be child objects of a Geography analysis object, and a Product dimension object may be a child object of a Production analysis object. A dimension object may be referenced through its parent analysis object.
A user of a BI tool may use analysis objects of an abstraction layer to query underlying physical tables. For example, the user may formulate a symbolic query using the business objects of the abstraction layer. A query generator may then generate a query of an underlying database based on the symbolic query and on a mapping between the tables of the database and the abstraction layer.
Creation of an abstraction layer based on a relational database is time-consuming and costly. Commonly-assigned U.S. Pat. No. 7,181,440 describes a system to generate a dimensional data model based on a relational database. Although this system provides significant advantages over prior systems, systems are desired which may provide more-refined data models and/or greater efficiency.
The following description is provided to enable any person in the art to make and use the described embodiments and sets forth the best mode contemplated for carrying out some embodiments. Various modifications, however, will remain readily apparent to those in the art.
Embodiments may be implemented by an architecture such as system 100 of
Modeling tool 120 may comprise a software application to retrieve a table structure and data from RDBMS 110 as described below. Modeling tool 120 may communicate with RDBMS 110 via any suitable database connections (e.g., Java Database Connector, QT/Connection Server). In some embodiments, modeling tool 120 comprises BusinessObjects Universe Designer® or BusinessObjects BI Modeler®.
Modeling tool 120 may operate to determine a plurality of groups of columns of relational tables of RDBMS 110 based at least in part on data stored in the relational tables. Each column of a group of columns is mutually-dependent on each of the other columns of the group of columns, and at least two of the plurality of groups of columns include columns of a same relational table. Moreover, modeling tool 120 may create, for each of the plurality of groups of columns, a respective dimension object.
Some embodiments may therefore efficiently provide more than one dimension object per relational table, resulting in a potentially more-expressive abstraction layer than previously available. Since the groups of columns are determined based at least in part on data stored in the relational tables, the underlying schema of RDBMS 110 need not be fully normalized prior to generation of the dimension objects according to some embodiments.
The elements of system 100 may be implemented by any suitable combination of hardware and/or software. Each element may be located remotely from one or more other elements. More than one element may be implemented in a single device and/or software package.
Initially, at 305, a table structure of an RDBMS is retrieved. The retrieved structure may include a list of tables, their constituent columns, and joins therebetween. Such a structure may be referred to as a data foundation, systems for retrieval of which are known in the art.
At 310, a fact table is determined based on the orientation of the joins of the table structure. For example, a table of the table structure is identified as a fact table based on the cardinalities of its joins to other tables (e.g., ∞ to 1). In the example of
One or more measure objects are created at 315 based on the columns of the determined fact table. In some embodiments, one measure object is created to correspond to each numeric non-key column of the fact table. In the illustrated example, a Sales measure object is created at 315 to correspond to the Sales column of Sales table 240. The measure object may also indicate the dimension object(s) (described below) which drives the column corresponding to the measure object.
An analysis object is created at 320 based on a join which starts from the current fact table. The analysis object is created in metadata that may be used in an abstraction layer according to some embodiments. Continuing with the present example, a Production analysis object may be created at 320 based on the join between Sales fact table 240 and Product table 210. Next, at 325, groups of columns are created from the tables descending from the above-mentioned join. With respect to the present example, the descending tables are Product table 210, City table 220 (by virtue of the join between Product table 210 and City table 220), and Country table 230 (by virtue of the join between City table 220 and Country table 230).
Groups of columns may be created from these tables based on the table structure and on the data stored in the columns. According to some examples of 325, each created group of columns includes only mutually-dependent columns. Accordingly, 325 may comprise retrieving and analyzing data stored within all the columns of the descending tables to identify any mutual dependencies therebetween.
For example, the data of City table 220 may be retrieved and analyzed to identify mutual dependencies between columns Name, Population and Postal Code. Accordingly, a group may be created at 325 including columns Name, Population and Postal Code of City table 220.
The data of City table 220 may be further analyzed to determine that no mutual dependencies exist with respect to column State. A group may therefore also be created at 325 including column state of City table 220.
According to some embodiments, only one query is issued per table when retrieving data at 325, in order to reduce processing time. The one query may scan all columns of the table without using any joins, ordering, grouping or filtering. Additionally or alternatively, detection of dependencies may occur while rows of data are being fetched and such fetching may terminate once all dependencies (or lack thereof) are believed to have been identified. Some embodiments cap the number of rows fetched, and/or skip over partially-empty rows or null data values.
Primary/foreign key information of the RDBMS may also be used at 325 to infer dependencies between columns of a single table. For example, if a primary key exists in a table, it is assumed that one way dependencies exist between the columns of the primary key and the column(s) of the foreign key. It may also be assumed that no dependencies exist between foreign keys inside a same table, without regard to the actual data of the foreign key columns.
One way dependencies between columns of the different groups are determined at 330. These dependencies may be determined from the retrieved table structure (i.e., the list of tables and joins) as well as the data content. For example, if the structure indicates a link between a Product table and a Product Category table, the link is analyzed to deduce a dependency between product and product category columns. A direction of the dependency between two groups (i.e., between one column of one group and one column of another group) may be determined based on table key information and cardinality information detected from the data of the linked columns.
Next, at 335, a dimension object is created for each group of columns. Also, a hierarchy object indicating the determined one way dependencies is associated with the analysis object created at 320.
A dimension object may be created in metadata of the aforementioned abstraction layer. The dimension object may be created based on a group of columns by choosing one column of the group as the name of the dimension object and assigning the other columns as properties of the object. Foreign key columns may be designated as “outgoing” properties, which will be described in detail below.
The analysis object is linked to the hierarchy object specifying the determined one way dependencies. The analysis object is associated to a key column thru the dimension objects that describe it, and thereby indicates the set of all the dimension objects describing data that depends directly or indirectly to that key column. In the present example, the Production analysis object is associated to the Product key column.
Tree 600 illustrates metadata of an abstraction layer based on schema 200. Tree 600 includes the above-described dimension objects corresponding to the Sales, City, State and Country groups of columns, as well as the Production analysis object. As described in commonly-assigned U.S. patent application Ser. No. 12/463,702, tree 600 includes metadata of properties associating a dimension object with one or more other dimension objects. These properties represent the one way dependencies shown in
Tree 600 also includes the Sales measure object created at 325. The Sales measure object indicates the dimension objects (Product, City) that drive the data thereof. Creation of the remaining objects of tree 600 will be described below.
Returning to process 300, flow proceeds to 340 and returns to 315 if another join exists which starts from the present fact table. If so, a new analysis object is created along with associated dimension objects as described above. However, no dimension object is created for a group of columns associated with the new analysis object if a dimension object has already been created for that group of columns for another analysis object. Nevertheless, the previously-created dimension object will be reflected in the hierarchy of the newly-created analysis object. For example, using the
If no other joins are determined to start from the fact table, flow proceeds from 340 to 345 to determine whether another fact table (or an isolated table that is not joined to another table) exists. If so, flow returns to 315 and continues as described above with respect to the fact table (or isolated table). Flow terminates once the determination at 345 is negative.
It is noted that the Production and Geography analysis objects of tree 600 are functionally-dependent. Moreover, the Sales measure object is governed by the Product and City dimension objects of the Production and Geography analysis objects. The metadata of tree 600 may be leveraged to issue a dimension object-based query such as “select the sales amount of products in their design country”. The functional dependencies may also or alternatively be leveraged to issue an analysis object-based query such as “select the sales amount of products in their design geography”.
Embodiments may be employed to efficiently define a semantically rich abstraction layer on a relational data source. Since the functional dependencies between dimension objects enable navigation from one analysis object to another, the dimension objects may be used explicitly when a query is expressed in terms of dimension objects, or implicitly when the query is expressed in terms of analysis objects.
Referring to
Embodiments described herein are solely for the purpose of illustration. Those in the art will recognize other embodiments may be practiced with modifications and alterations to that described above.
This application is related to commonly-assigned U.S. patent application Ser. No. 12/463,702, entitled “Functionally-Dependent Analysis Objects” and filed on May 11, 2009.