The invention is more fully appreciated in connection with the following detailed description taken in conjunction with the accompanying drawings, in which:
Like reference numerals refer to corresponding parts throughout the several views of the drawings.
The following terminology is used while disclosing embodiments of the invention:
Semantic Domain is the term for a level of abstraction based on a relational, OLAP, or other data source. The abstraction may be based upon a combination of existing semantic domains. The semantic domain includes data model objects that describe the underlying data source and define dimensions, attributes and measures that can be applied to the underlying data source. The semantic domain may include data foundation metadata that describes a connection to, structure for, and aspects of the underlying data source. The term Combined Semantic Domain in particular is used to describe a semantic domain that describes the combination of two or more existing semantic domains where the combined existing semantic domains include semantic domains that describe a relational data source, OLAP data source, other data source, or another combined semantic domain.
Data Model Object is the term for an object defined within a semantic domain that represents, defines, and provides metadata for a dimension, attribute or measure in an underlying data source. Data model objects can contain calculations from, based on or designed to be applied to an underlying data source. Types of data model objects include base dimensions, base attributes, base measures, calculated dimensions, calculated attributes, and calculated measures.
Base Dimension is a type of data model object that represents a side of a multidimensional cube, a category, a column, or a set of data items within a data source. Each dimension represents a different category, such as region, time, or product type. Base dimension definitions support the specification of hierarchies. Members of a base dimension may be defined through a filter or transform.
Base Measure is a type of data model object that describes an aggregation of underlying data values based on governing dimensions. In the case of an OLAP data source, the measure may be defined directly in the source data. In the case of a relational data source, a column (or query expression), aggregation type, and governing dimensions are defined for the base measure. Types of aggregations include sum, count, maximum, minimum, average, first child, last child, and the like.
A Base Attribute is a type of data model object that is associated with a dimension, such that there is an attribute value for each member of the dimension. For example, a customer dimension might have base attribute values for age, gender, and phone.
A Calculated Attribute is a type of data model object that is associated with a dimension and for each member of the dimension there is a calculated attribute value.
Calculated Dimension is a type of data model object where a dimension object contains members that are produced by a calculation. Members are determined dynamically based on the transformation of the underlying data or explicitly specified and bound to calculations. Member levels and hierarchies may be calculated as an aspect of a calculated dimension.
Calculated Measure is a type of data model object that is not bound directly to the underlying database. Instead, the object has a value expression that is evaluated to produce the value for the measure. These expressions may reference values of other measures (base measures or calculated ones) and may reference base and calculated dimensions for constraints and contexts. Calculated measures refer to values or ranges of values of a current measure or any other measures across subsets of the dimension space. Calculated measures can be used to calculate lead/lag ranges, and the like.
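By way of illustration only, a calculated measure can be sketched in Python as a value expression over base measures, with a lag calculation built on top of it. The period list, the measure values, and the names `margin`, `lag`, and `margin_change` are hypothetical and are not taken from any embodiment described herein.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical base measure values, keyed by a member of a time dimension.
PERIODS: List[str] = ["2006-Q1", "2006-Q2", "2006-Q3"]
REVENUE: Dict[str, float] = {"2006-Q1": 100.0, "2006-Q2": 120.0, "2006-Q3": 90.0}
COST: Dict[str, float] = {"2006-Q1": 60.0, "2006-Q2": 70.0, "2006-Q3": 65.0}


def margin(period: str) -> float:
    """Calculated measure: a value expression over two base measures."""
    return REVENUE[period] - COST[period]


def lag(measure: Callable[[str], float], period: str, offset: int = 1) -> Optional[float]:
    """Value of `measure` taken `offset` periods earlier, or None if out of range."""
    index = PERIODS.index(period) - offset
    return measure(PERIODS[index]) if index >= 0 else None


def margin_change(period: str) -> Optional[float]:
    """Calculated measure that references another calculated measure."""
    previous = lag(margin, period)
    return None if previous is None else margin(period) - previous
```

In this sketch neither `margin` nor `margin_change` is bound to stored data; each is evaluated from its expression when requested.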
Base Dimension Member is the term used to describe a distinct value within a base dimension, where the distinct value has a unique ID, display name, or attributes.
Hierarchy is the term used to describe the specified arrangement of base dimension members within a base dimension. A base dimension contains one or more hierarchies. Members are associated with a level within the base dimension. Members can be arranged as children of other members and form tree structures. Levels generally (but not necessarily) correspond to different depths within a hierarchy. A typical example is a geography hierarchy where levels include country, state, city, store, and the like. A hierarchy is used to interpret the calculation of measures, dimensions, and queries.
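A hierarchy of this kind can be sketched, purely as an illustration, as a tree of members that each carry a level. The `Member` class, the `members_at_level` helper, and the sample geography are assumptions introduced for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Member:
    """A dimension member with a level and zero or more child members."""
    name: str
    level: str
    children: List["Member"] = field(default_factory=list)

    def add(self, child: "Member") -> "Member":
        """Attach a child member and return it so calls can be chained."""
        self.children.append(child)
        return child


def members_at_level(root: Member, level: str) -> List[str]:
    """Collect member names at the given level, depth first."""
    found = [root.name] if root.level == level else []
    for child in root.children:
        found.extend(members_at_level(child, level))
    return found


# Hypothetical geography hierarchy: country > state > city.
usa = Member("USA", "country")
california = usa.add(Member("California", "state"))
oregon = usa.add(Member("Oregon", "state"))
california.add(Member("San Jose", "city"))
california.add(Member("Fresno", "city"))
oregon.add(Member("Portland", "city"))
```

Because the level is stored on each member rather than derived from tree depth, the sketch allows levels that do not correspond exactly to depths.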
Data foundation is the term used to describe metadata that describes how to access a data source. A data foundation may include metadata specifying the data structure and aspects of the data in the underlying data source, including the relationships between the data items.
The optional network interface circuit 108 facilitates communications with networked computers (not shown) and data sources 109. Data sources include OLAP databases 109-1, relational databases 109-2, data files 109-3, other database types, warehouses, and the like. The computer 100 also includes a memory 110. The memory 110 includes executable instructions to implement operations of the invention.
In the example of
For the purposes of illustration, the components are shown on a single computer. Modules may be located on different computers. It is the functions of the modules that are significant, not where they are performed or the specific manner in which they are performed.
Each of the semantic domains 202, 204, 206, 208, 222, and 224 contains groups of defined data model objects 210, 212, 214, 216, 228, 230 and data foundation metadata. These groups of data model objects can contain any number of data model objects or can exist as an empty collection. For example, the types of data model objects contained in the data model object groups include objects representing base dimensions, base attributes, base measures, calculated dimensions, calculated attributes, and calculated measures. The base dimensions and measures describe aspects of the underlying data sources 218, 220, 226, 232, and 234.
Define base dimension 310 specifies one or more base dimensions for the semantic domain. Optionally, a hierarchy can be applied to the dimension members 312 such that a hierarchical structure for the dimension members is applied when members are interpreted by calculations and queries. Optionally, one or more calculated dimensions can be defined 314, where the definition of the calculated dimension includes using an expression language to define the dimension either distinctly from other dimensions or measures, or in reference to existing dimensions and measures. The definition of a calculated dimension can include a calculated hierarchy for dimension members or reference an existing hierarchy for the dimension members.
Optionally define base attribute 316 specifies one or more relationships between a base or calculated dimension member defined for the semantic domain and an attribute associated with the dimension member.
Optionally define calculated attribute 318 specifies one or more calculated relationships between a base or calculated dimension member defined for the semantic domain and an attribute associated with the dimension member, where the calculation can either determine the logic of the relationship or transform aspects of the attribute value.
Define base measure 320 specifies one or more measures based on the underlying data source. In a typical embodiment, defining a base measure includes selecting a column from a fact table or constructing a query expression based on a data source, specifying an aggregation type, specifying one or more governing dimensions, and optionally customizing the aggregation type for specific governing dimensions. Aggregation types include sum, count, maximum, minimum, average, first child, last child, none and the like. Customizing the aggregation type by dimension is used in a number of standard measures, such as an inventory measure where the product related dimensions are aggregated by sum, but the time related dimensions are aggregated by ‘last child’.
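The inventory example can be made concrete with a short Python sketch, offered as an illustration under stated assumptions rather than as the implementation of any embodiment. The fact rows and the function name `inventory_on_hand` are hypothetical, and months are assumed to be strings that sort chronologically.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical fact rows: (product, month, units on hand at month end).
FACTS: List[Tuple[str, str, int]] = [
    ("widget", "2006-01", 10),
    ("widget", "2006-02", 7),
    ("gadget", "2006-01", 4),
    ("gadget", "2006-02", 9),
]


def inventory_on_hand(facts: List[Tuple[str, str, int]]) -> Optional[int]:
    """Aggregate by sum across products and by 'last child' across time."""
    by_month: Dict[str, int] = defaultdict(int)
    for _product, month, units in facts:
        by_month[month] += units           # sum over the product dimension
    if not by_month:
        return None                        # no facts, so no value
    return by_month[max(by_month)]         # last child over the time dimension
```

Summing over both dimensions would report 30 units for the sample rows, which double counts stock carried from one month to the next. Taking the last child over time yields 16, the stock on hand at the end of the final month.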
Optionally, calculated measures are defined 322. A workflow for defining calculated measures and dimensions is illustrated in
In a typical embodiment, the workflow to define base and calculated dimensions and measures for a semantic domain does not require that the data model objects be defined in any order unless the data model object itself has a logical dependency on another data model object. Additionally, definitions related to the data foundation, such as specifying connection attributes and schema structure (tables and joins in the relational case) can be updated or redefined later during the workflow.
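One way to picture this ordering constraint, as an illustrative assumption rather than a described embodiment, is to record only the logical dependencies between data model objects and let a topological sort produce a valid evaluation order. The object names below are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical data model objects mapped to the objects they depend on.
# The dictionary may be written in any order; only dependencies constrain it.
DEPENDENCIES: Dict[str, Set[str]] = {
    "margin_pct": {"margin", "revenue"},   # calculated measure
    "margin": {"revenue", "cost"},         # calculated measure
    "revenue": set(),                      # base measure
    "cost": set(),                         # base measure
}

# static_order() yields every object after all of the objects it depends on.
order: List[str] = list(TopologicalSorter(DEPENDENCIES).static_order())
```

Objects with no dependency between them, such as `revenue` and `cost` here, may appear in either order.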
Based on the metadata contained in the specified OLAP data source, base dimensions and measures are automatically generated 406. Optionally, based on the metadata contained in the specified OLAP data source, base attribute relationships for dimension members are automatically generated 406. In one embodiment of the invention, a dimension, a measure and an attribute are respectively generated in the semantic domain for each dimension, measure and attribute in the underlying data source. The underlying data source may contain zero or more dimensions, measures, and attributes. Optionally, the user may change the selection of base dimensions, attributes and measures 410 defined within the semantic domain. Optionally, the user may also define a calculated dimension 412, calculated measure 414, or calculated attribute 416 within the semantic domain.
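The one-to-one generation described in this embodiment can be sketched as follows. The metadata layout (a dictionary of name lists) and the function `generate_base_objects` are assumptions made for illustration; an actual OLAP source exposes its metadata through its own interface.

```python
from typing import Dict, List

# Hypothetical mapping from a metadata key to the kind of object generated.
KINDS: Dict[str, str] = {
    "dimensions": "base dimension",
    "measures": "base measure",
    "attributes": "base attribute",
}


def generate_base_objects(metadata: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Create one base object per dimension, measure, and attribute found."""
    return {
        key: [f"{label}: {name}" for name in metadata.get(key, [])]
        for key, label in KINDS.items()
    }


# Hypothetical cube metadata with no attributes defined.
cube = {"dimensions": ["Time", "Product"], "measures": ["Sales"]}
generated = generate_base_objects(cube)
```

A source with zero objects of a given kind simply produces an empty list for that kind, consistent with the statement that the underlying data source may contain zero or more of each.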
In a typical embodiment, the workflow to define data model objects does not require that the data model objects be defined in any order unless the data model object itself has a logical dependency on another data model object.
When a data model object from a first underlying semantic domain is evaluated within a second combined semantic domain, it is evaluated as a base dimension, attribute, or measure in the second combined semantic domain, regardless of whether it was considered a base or calculated data model object in the first underlying semantic domain. Define base dimension 506 enables the specification of one or more dimensions from the underlying semantic domains to define dimensions within the new semantic domain. The base dimensions can refer to only one of the underlying semantic domains or can be used to refer to one or more of the dimensions in one or more of the underlying semantic domains. Optionally, define base attribute 508 specifies one or more relationships between a dimension member defined for the semantic domain and an attribute associated with the dimension member.
Optionally, in the case where dimensions combine existing dimensions in the underlying semantic domains, combining rules are specified 510 to provide instructions for the logic that is used when combining the dimensions. Dimension combining rules may indicate that the dimension members are based solely on the members from one of the component domains, which may be desired if the members in one dimension are a superset of the members from another dimension or if each of the dimensions has the same members. Other combining rules for dimensions involve integrating the members from the component dimensions into a single dimension. The individual member hierarchies can be concatenated at a level or the member hierarchies can be merged. Additional rules may be specified to control how conflicting member information should be resolved. Custom rules can also be specified to control the combination of dimensions. Dimension combining rules can be based on attributes and attribute values associated with dimension members.
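Two of the dimension combining rules described above can be sketched in Python, assuming for illustration that a dimension is represented as a mapping from member ID to display name. The sample dimensions and the functions `combine_use_first` and `combine_merge` are hypothetical.

```python
from typing import Dict

# Hypothetical component dimensions: member ID -> display name.
REGION_A: Dict[str, str] = {"US": "United States", "CA": "Canada"}
REGION_B: Dict[str, str] = {"US": "USA", "MX": "Mexico"}


def combine_use_first(first: Dict[str, str], second: Dict[str, str]) -> Dict[str, str]:
    """Rule: members are based solely on one component dimension."""
    return dict(first)


def combine_merge(first: Dict[str, str], second: Dict[str, str],
                  prefer_first: bool = True) -> Dict[str, str]:
    """Rule: integrate members from both dimensions into one.

    When both dimensions define the same member ID with different
    information, the preferred side wins.
    """
    preferred, other = (first, second) if prefer_first else (second, first)
    merged = dict(other)
    merged.update(preferred)   # preferred entries overwrite conflicting ones
    return merged
```

The `prefer_first` flag stands in for the additional rules that resolve conflicting member information. Hierarchy concatenation and attribute-based rules are not covered by this sketch.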
Similarly, defining a base measure specifies one or more measures from the underlying semantic domains to define measures 512 within the new semantic domain. The measures can refer to only one of the underlying semantic domains or can refer to two or more of the underlying semantic domains.
Optionally, in the case where measures combine existing measures in the underlying semantic domains, combining rules are specified 514 to provide instructions for the logic that is used when combining the measures. Measure combining rules control how a value for the combined measures is derived from the values of the component measures. Typically, if a value only exists for one of the component measures for a given evaluation context, then the combined measure will use that value. If values exist for more than one of the component measures, then the combining rules indicate that the value from one of the component measures is preferred or that the values should be combined in a specific way, including using an aggregation function or the like. Custom rules can also be specified to control the combination of measures.
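The measure combining behavior described above can be sketched as a small function, under the assumption that a missing value in an evaluation context is represented by `None`. The names `combine_measure`, `prefer_first`, and `add_values` are hypothetical.

```python
from typing import Callable, Optional


def prefer_first(a: float, b: float) -> float:
    """Rule: the value from the first component measure is preferred."""
    return a


def add_values(a: float, b: float) -> float:
    """Rule: combine the component values with a sum aggregation."""
    return a + b


def combine_measure(a: Optional[float], b: Optional[float],
                    rule: Callable[[float, float], float]) -> Optional[float]:
    """Derive a combined value from two component measure values."""
    if a is None:
        return b               # only b exists (or neither does)
    if b is None:
        return a               # only a exists
    return rule(a, b)          # both exist, so the combining rule decides
```

For example, `combine_measure(3.0, 5.0, add_values)` returns `8.0`, while `combine_measure(None, 5.0, add_values)` returns `5.0` because the rule is consulted only when both values exist.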
Optionally, calculated dimensions 516, calculated attributes 518 and calculated measures 520 are defined for the new combined semantic domain. The semantic domain definition can be updated 522 using the semantic domain designer 116, optionally in conjunction with GUI module 112. In a typical embodiment, the workflow to define data model objects does not require that the data model objects be defined in any order unless the data model object itself has a logical dependency on another data model object. Optionally, the combined semantic domain definition is saved 524 to a collection of semantic domains 114, where it is available as a definition for the query engine 118.
In one embodiment, a combined semantic domain contains two or more semantic domains where one or more of the semantic domains represent a combined semantic domain. Recursively, one or more of the semantic domains contained at each level can in turn represent a combined semantic domain. In this way, even if only two semantic domains are explicitly combined, any number of data sources can be implicitly combined. Rule complexity is enhanced by leveraging the different levels at which semantic domains are combined.
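The recursive structure can be illustrated with a short sketch in which a combined domain holds component domains, any of which may itself be combined. The `SemanticDomain` class shown here and the sample domain names are assumptions for this example only.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class SemanticDomain:
    """Either a domain over one data source or a combination of domains."""
    name: str
    data_source: str = ""                                   # used when not combined
    components: List["SemanticDomain"] = field(default_factory=list)

    def data_sources(self) -> Set[str]:
        """Collect every data source reachable through nested components."""
        if not self.components:
            return {self.data_source}
        sources: Set[str] = set()
        for component in self.components:
            sources |= component.data_sources()
        return sources


# Hypothetical domains: two explicit combinations reach three data sources.
sales = SemanticDomain("sales", data_source="relational database")
finance = SemanticDomain("finance", data_source="OLAP database")
staff = SemanticDomain("staff", data_source="data file")
sales_finance = SemanticDomain("sales_finance", components=[sales, finance])
company = SemanticDomain("company", components=[sales_finance, staff])
```

Here `company` explicitly combines only two semantic domains, yet `company.data_sources()` contains all three underlying sources.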
Define base dimension 706 specifies one or more base dimensions for the semantic domain. Optionally, a hierarchy is applied to the dimension members 708, such that a hierarchical structure for the dimension members is applied when members are interpreted by calculations and queries. Optionally, one or more calculated dimensions are defined 710, where the definition of the calculated dimension includes using an expression language to define the dimension, either distinctly from other dimensions or measures, or in reference to existing dimensions and measures. The definition of a calculated dimension can include a calculated hierarchy for dimension members or reference an existing hierarchy for the dimension members.
Optionally, define base attribute 712 specifies the definition of one or more relationships between a base or calculated dimension member defined for the semantic domain and an attribute associated with the dimension member. Optionally, define calculated attribute 714 specifies one or more calculated relationships between a base or calculated dimension member defined for the semantic domain and an attribute associated with the dimension member, where the calculation can either determine the logic of the relationship or transform aspects of the attribute value.
Define base measure 716 specifies one or more measures based on the underlying data source. Customizing the aggregation type by dimension is used in a number of standard measures, such as an inventory measure where the product related dimensions are aggregated by sum, but the time related dimensions are aggregated by ‘last child’. Optionally, calculated measures are defined 718. Workflow details for defining calculated measures and dimensions are illustrated in
In one embodiment of the invention, the definition for a semantic domain is declarative and uses a lazy evaluation strategy, where any function explores only as much of its arguments as is needed to produce a result. In a functional embodiment, the semantic domain declares the data logic, evaluates a broad range of expressions (including strong typing) and maintains precision within the data definition. The semantic domain provides reusable logic (e.g., based on strong typing, lazy evaluation, and/or readily combinable functional units).
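Lazy evaluation of this kind can be approximated in Python by passing arguments as deferred computations (thunks) that run only when a function asks for them. This is a sketch of the general technique, not of the expression language of any embodiment; `thunk`, `if_positive`, and the `evaluated` log are hypothetical names.

```python
from typing import Callable, List

# Records which deferred arguments were actually evaluated.
evaluated: List[str] = []


def thunk(name: str, value: float) -> Callable[[], float]:
    """Wrap a value in a deferred computation that logs when it is forced."""
    def force() -> float:
        evaluated.append(name)
        return value
    return force


def if_positive(condition: Callable[[], float],
                then_value: Callable[[], float],
                else_value: Callable[[], float]) -> float:
    """Force the condition, then force only the branch that is needed."""
    return then_value() if condition() > 0 else else_value()


result = if_positive(thunk("units", 5), thunk("revenue", 100.0), thunk("fallback", 0.0))
```

After this call the `fallback` argument has never been evaluated, which is the property the lazy strategy relies on when an argument is expensive to compute, such as a query against an underlying data source.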
An embodiment of the present invention relates to a computer storage product with a computer-readable medium having computer code thereon for performing various computer-implemented operations. The media and computer code may be those specially designed and constructed for the purposes of the present invention, or they may be of the kind well known and available to those having skill in the computer software arts. Examples of computer-readable media include, but are not limited to: magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and holographic devices; magneto-optical media such as floptical disks; and hardware devices that are specially configured to store and execute program code, such as application-specific integrated circuits (“ASICs”), programmable logic devices (“PLDs”) and ROM and RAM devices. Examples of computer code include machine code, such as produced by a compiler, and files containing higher-level code that are executed by a computer using an interpreter. For example, an embodiment of the invention may be implemented using Java, C#, C++, or other object-oriented programming language and development tools. Another embodiment of the invention may be implemented in hardwired circuitry in place of, or in combination with, machine-executable software instructions.
The foregoing description, for purposes of explanation, used specific nomenclature to provide a thorough understanding of the invention. However, it will be apparent to one skilled in the art that specific details are not required in order to practice the invention. Thus, the foregoing descriptions of specific embodiments of the invention are presented for purposes of illustration and description. They are not intended to be exhaustive or to limit the invention to the precise forms disclosed; obviously, many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications; they thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated. It is intended that the following claims and their equivalents define the scope of the invention.
This application shares a common specification with the commonly owned and concurrently filed patent application entitled, “Apparatus and Method for an Extended Semantic Layer with Multiple Combined Semantic Domains Specifying Data Model Objects”, Ser. No. ______, filed Aug. 31, 2006. This application is also related to the commonly owned and concurrently filed patent application entitled, “Apparatus and Method for Processing Queries Against Combinations of Data Sources”, Ser. No. ______, filed Aug. 31, 2006.