Business data is typically stored within physical tables of a database. The database may comprise a relational database, such as Oracle, Microsoft SQL Server, IBM DB2, Teradata and the like. The structures and relationships of the physical database tables are complex. To access the stored data, a user with knowledge of the structure and relationships generates specific database-language queries which are intended to extract desired data from the database.
Business Intelligence (BI) tools typically rely on an abstraction layer that shields end users from the complexity of the physical tables. The abstraction layer allows the end users to query a database using intuitive terms rather than references to specific physical entities of the database. However, generation of these queries is beyond the technical capabilities of a typical business user.
Moreover, user-generated queries return only specifically-requested data. Additional queries must be independently generated if additional data is desired. Conventional BI tools therefore do not provide an efficient, navigable presentation of data stored in a database.
Commonly-assigned U.S. Pat. No. 7,493,330, which is incorporated by reference herein for all purposes, describes a system to present data stored in a database. The system initially presents the data in a logically-categorized format, and allows intuitive filtering of and navigation through the presented data. However, the system operates in conjunction with an index of the stored data and aggregations of the stored data, rather than with a standards-based database management system.
Systems are desired for efficiently navigating and visualizing data stored in a database. Such systems preferably employ direct queries to a database using standard query protocols.
The following description is provided to enable any person in the art to make and use the described embodiments and sets forth the best mode contemplated for carrying out some embodiments. Various modifications, however, will remain readily apparent to those in the art.
System 100 of
Metadata designer 130 may comprise a software application to create information space metadata 110 based on relational data source 120. Metadata designer 130 may comprise a standalone, Web-based or other application executing on any computing device or devices that are or become known. Dashed lines are used in
Metadata designer 130 may determine a table structure of relational data source 120 directly from relational data source 120, from a structured list, from manual entry thereof by a database administrator, or by other means. The table structure may include a list of tables of data source 120, their constituent columns, and joins therebetween. Such a structure may be referred to as a data foundation, systems for retrieval of which are known in the art.
Detailed examples of information space metadata 110 according to some embodiments are provided below. Briefly, however, information space metadata 110 may include a connection property definition including information for communicating with relational data source 120, and an information space SQL statement to describe a structure of the database tables of data source 120. Information space metadata 110 may also comprise metadata describing business objects of an abstraction layer associated with data source 120.
U.S. Pat. No. 5,555,403 describes such an abstraction layer, referred to therein as a semantic layer. Briefly, an abstraction layer defines a set of “business objects” that represent business entities, such as customers, products, stores, time, sales figures, etc. represented in the data of a data source. Business objects may be classified as dimensions (along which one may want to perform an analysis or report), details (e.g., additional information on dimensions), and measures (e.g., indicators, most often numeric, whose value can be determined for a given combination of dimension values). Commonly-assigned U.S. Pat. No. 7,181,440 describes a system to generate business objects based on a relational data source.
Accordingly, information space metadata 110 may also include metadata associated with dimension objects and measure objects. For each of the dimension objects, the metadata may specify a relational table of data source 120 associated with the dimension object, and one or more column names of the relational table associated with the dimension object. For each of the measure objects, the metadata may specify a relational table associated with the measure object, one or more column names of the relational table associated with the measure object, and an aggregation method (e.g., SUM) associated with the measure object.
The elements of system 100 may be implemented by any suitable combination of hardware and/or software. Each element may be located remotely from one or more other elements. More than one element may be implemented in a single device and/or software package.
Area 320 allows an operator to specify properties of a dimension object selected in area 310. These properties may include, but are not limited to, Name, Description, and Column (i.e., a column of the data source which is associated with the dimension object). Also shown is search property window 330 to define an SQL query for searching the dimension. Window 330 may alternatively specify a table-based search property based on a column specified in field 340. Checkbox 350 is used to indicate whether actual values of the dimension are to be indexed for subsequent searching.
Interface 400 may be provided by metadata designer 130 to generate measure object metadata. A measure object is selected in area 410 of interface 400 and properties thereof are specified in area 420. These properties may include, but are not limited to, Name, Description, and Column (i.e., a column of the data source which is associated with the measure object). Pull-down menu 430 allows an operator to indicate an aggregation method (e.g., SUM, COUNT, MIN, MAX, AVG) associated with the measure object. Usage of a specified aggregation method according to some embodiments is described below.
As shown, information space metadata 610 includes a Database Connection Property Definition. The Database Connection Property Definition includes information for communicating with relational data source 620. The following is an example of a Database Connection Property Definition according to some embodiments, and based on the data source selected in interfaces 300-500:
Also continuing with the example of
As mentioned above, this metadata specifies a relational table associated with the dimension object, and one or more column names of the relational table associated with a dimension object. For each measure object, this metadata specifies a relational table associated with the measure object, one or more column names of the relational table associated with the measure object, and an aggregation method associated with the measure object.
Information space 610 further includes an Information Space SQL Statement. This statement may reflect the structure as specified in interface 500 of
Navigation module 630 comprises hardware and/or software to display data of relational data source 620 based on the metadata of information space metadata 610. This data is acquired by hardware and/or software of query technique 640, also based on the metadata of information space metadata 610. In some embodiments, navigation module 630 may also operate to display and provide navigation through data of other data 650. Other data 650 may comprise a data source such as an index described in the background, the data of which may be accessed using native data access mechanisms of navigation module 630.
Initially, at 710, metadata indicating a structure of a relational database is determined. For example, the metadata may be determined by creating the metadata as described with respect to
Next, at 720, one or more structured query language queries is generated based on the metadata. The one or more structured query language queries are to retrieve, from the relational database, facet values of each of a plurality of facets of the relational database. In this regard, the term facet is used to describe a particular category of data, which in the present example corresponds to a dimension object. Specifically, with reference to the present example, Year, Store City, Store name and Lines are facets, and 2003, Houston, e-Fashion New York and Dresses are facet values.
The one or more structured query language queries are also to retrieve, from the relational database, aggregated values of a measure for each of the facet values of each of the plurality of facets. The measure corresponds to a measure object specified in the metadata.
According to some embodiments of 720, navigation module 630 requests the facets, facet values, and aggregated values from query technique 640. Query technique 640 then uses information space metadata to generate one or more structured query language queries to retrieve the requested data from relational data source 620.
For example, query technique 640 may generate the following queries at 720 based on the example metadata shown above, with “( . . . )” representing the above-described Information Space SQL Statement. Query technique 640 generates four queries beginning with “SELECT TOP 25” because the example includes four facets.
The facet values retrieved from relational data source 620 at 720 are used at 730 to determine a display order of the retrieved facets. Notably, the display order is based at least in part on the stored data of relational data source 620, rather than solely based on user preferences. The display order may be based on merit values of each facet, as described in aforementioned U.S. Pat. No. 7,493,330. As described therein, a merit value of a facet is based on an entropy value and a coverage value associated with the facet.
An entropy value is calculated for each facet (i.e., category) of data store 620. The entropy value for a facet is based on a number of distinct facet values (i.e., attributes) associated with the facet and a total number of stored data records. It is noted that the above-listed SQL queries result in retrieval of the information needed to calculate the entropy of each facet (i.e., the total number of records, the facet values for each facet, and the number of occurrences of each facet value).
Coverage values are then determined for each facet. A coverage value associated with a facet is a percentage of the total data records which include a facet value of the facet. Next, for each facet, the entropy value is multiplied by the coverage value and the product is normalized to result in a merit value. The display order determined at 730 may reflect an ordering of the facets according to descending merit values. Some embodiments may employ other techniques to determine the display order based on the facet values at 730.
At 740, an interface is generated which displays the facet values of the plurality of facets in the determined display order. The interface also displays each of the aggregated values corresponding to each facet value in association with their corresponding asset value in the determined display order.
Area 810 of interface 800 displays the facets and their facet values in the determined display order. For example, the merit values determined for each displayed facet may be as follows: Year: 0.49, Store city: 0.35, Store name: 0.23, Lines: 0.18, resulting in the display order: Year, Store City, Store name, Lines. Each facet value is displayed in association with a corresponding aggregated value of the Quantity sold measure.
Again, the information displayed in area 810 may be determined using the above-listed SQL queries, which were in turn determined from the metadata of information space 610. Accordingly, some embodiments may efficiently generate an interface to display stored data of an SQL database in a comprehensible manner based on metadata describing the structure of the database.
Area 820 of interface 800 displays graphic visualization 825 of an aggregated measure value corresponding to each facet value (2001, 2002, 2003) of a first facet (i.e., Year). Buttons 830 allow selection of the graphic visualization type, each of which also displays an aggregated measure value corresponding to each facet value. Data of graphic visualization 825 may be retrieved at 720 of process 700 by generating the following SQL queries:
Any facet in addition to Year may be referenced in the foregoing SQL queries. In some embodiments, the graphic visualization corresponds to a first facet of the display order. In these cases, the above SQL queries are generated after the display order is determined at 740 of process 700.
Process 900 of
A selection of a facet value is received (e.g., by navigation module 630) at 910. Selection of a facet value may comprise selecting a facet value displayed in area 810 (or area 820) of interface 800. The selection is then transmitted to navigation module 630 via known user interface control techniques.
In response to selection of the facet value, one or more structured query language queries are generated at 920 based on metadata indicating a structure of a relational database. The one or more structured query language queries are to retrieve second aggregated values of the measure for each facet value of each facet. The aggregated values are filtered by the selected facet value.
Continuing with the example of
The foregoing SQL queries retrieve the independent aggregated values of the Quantity sold measure for each facet value of each facet (other than the Year facet), filtered by the facet value “2003”. An interface to display these aggregated values in association with corresponding facet values is generated at 930.
Interface 1000 of
Area 1020 of interface 1000 displays graphic visualization 1025 of an aggregated measure value corresponding to each facet value of a second facet (i.e., State). The data of graphic visualization 1025 may be retrieved by generating the following SQL queries:
The Store city facet may be represented in graphic visualization 1025 because this facet appears after the Year facet in the display order reflected in area 1010. In some embodiments, the second facet represented in graphic visualization 1025 is user-selectable (i.e., graphic visualization 1025 may display the aggregated Quantity sold in 2003 of each Line).
A selection of a second measure is received at 1110. Selection of a facet value may comprise selecting a new measure in measure bar 1015 of interface 1000. In response to selection of the second measure, one or more structured query language queries are generated at 1120 based on metadata indicating a structure of the present relational database. The one or more structured query language queries are to retrieve second aggregated values of the second measure for each facet value of each facet.
Continuing with the present example, it is assumed that the user has selected a Revenue measure in measure bar 1015. Navigation module 630 receives the selection at 1120 and, in response, generates the following queries at 1120 based on information space metadata 610:
These SQL queries retrieve the aggregated values of the Revenue measure for each facet value of each facet. In the present example, it is assumed that information space metadata 160 associates the Revenue measure with an appropriate table column of data source 610 and associates an aggregation method of SUM with the Revenue measure.
An interface to display these aggregated values in association with corresponding facet values is generated at 1130. Interface 1200 of
Since the facet values and record counts obtained by the above SQL queries are the same as those acquired in the example of process 700, the display order of the facets does not change. However, embodiments are not limited thereto, particularly in cases where a different method is used to determine the display order.
Area 1220 of interface 1200 displays graphic visualization 1225 of an aggregated measure value corresponding to each facet value (i.e., 2001, 2002, 2003) of the Year facet. Embodiments are not limited to graphic visualization of the first-ordered facet. The data of graphic visualization 1225 may be retrieved by generating the following SQL queries:
Embodiments described herein are solely for the purpose of illustration. Those in the art will recognize other embodiments may be practiced with modifications and alterations to that described above.