The subject matter described herein relates to a business intelligence computing system that supports relational data arranged in hierarchies.
Business intelligence computing systems (BI) include software and computing hardware that enable the transformation of raw data from a variety of data sources into meaningful and useful information for business analysis purposes. BI computing systems have been widely adopted because they are capable of handling large amounts of unstructured data to help identify, develop and otherwise interpret rich and complex data. Conventional BI applications have limited abilities to support hierarchical workflows using simple and flat relational data sources.
In one aspect, a business intelligence (BI) computing system obtains a plurality of relational data records from a data source that has an associated hierarchy description. The BI computing system, using the hierarchy description, next builds a hierarchical structure of the plurality of data records. Subsequently, the BI computing system, using the hierarchical structure, initiates at least one hierarchical workflow operation to allow a user to display, analyze, and navigate the plurality of data records on a BI client.
The hierarchical structure can be built by iterating through the plurality of data records to form a plurality of nodes. The building can include determining, for each field in each record, whether a hierarchy is active on the related dimension. Levels of the hierarchical structure can be recursively run through to retrieve parent nodes for those fields in which there is a hierarchy active on the related dimension. A sparse matrix can be generated in which the nodes of the hierarchy can be stored. Tuple elements can be generated for each of the nodes.
The hierarchical arrangement of the plurality of data records can be a leveled hierarchy and/or a parent-child hierarchy and/or a modeled hierarchy.
Non-transitory computer program products (i.e., physically embodied computer program products) are also described that store instructions, which when executed by one or more data processors of one or more computing systems, causes at least one data processor to perform operations herein. Similarly, computer systems are also described that may include one or more data processors and memory coupled to the one or more data processors. The memory may temporarily or permanently store instructions that cause at least one processor to perform one or more of the operations described herein. In addition, methods can be implemented by one or more data processors either within a single computing system or distributed among two or more computing systems. Such computing systems can be connected and can exchange data and/or commands or other instructions or the like via one or more connections, including but not limited to a connection over a network (e.g. the Internet, a wireless wide area network, a local area network, a wide area network, a wired network, or the like), via a direct connection between one or more of the multiple computing systems, etc.
The subject matter described herein provides many technical advantages. For example, unlike conventional BI clients that do not support hierarchical workflows on the top of relation data, the current subject matter allows such workflows which, in turn, enables complex modelling for a variety of data sources.
The details of one or more variations of the subject matter described herein are set forth in the accompanying drawings and the description below. Other features and advantages of the subject matter described herein will be apparent from the description and drawings, and from the claims.
Like reference symbols in the various drawings indicate like elements.
The current subject matter provides a way to use hierarchies in business intelligence (BI) tools on top of simple and flat relational data sources such as CSV-files. Hierarchies, as used herein, refers to hierarchy workflows supported in most BI tools such as hierarchical value help and filtering, hierarchical result sets and hierarchical navigations within this result set (as will be described in more detail below). In particular, the current subject matter describes how to define such hierarchies, how to build the hierarchical structure, compute the values and perform interactions (filtering and navigations) on top of these hierarchies.
One or more of the data sources 1401 . . . n can include a hierarchy description. As an example, a relational data source, based on a CSV-file consumed by a provider, can be based on a CSV-file containing the actual data and a section or separate CSV-file containing the metadata description. Within this metadata section, the hierarchy description can be located.
The hierarchy description can specify a hierarchy type, a name and possibly a description for the hierarchy and/or an actual description of the hierarchy structure. First, the hierarchy can have a dimension it is based on (i.e., the reference characteristic, etc.) which describes the dimension to which this hierarchy can be assigned and which holds the leaves of this hierarchy.
The current subject matter can handle various types of hierarchies. Level hierarchies, are one example illustrated in Table 1 below, that can have a structure build that is based on one or more levels (e.g., a retailer hierarchy with the countries they are located in as the top level and the cities within these countries as the second level as illustrated, etc.).
In addition as this is a level hierarchy, the structure can be defined by a set of levels. In this example the dimension “retailer” is hierarchical with the location on the first level and the city on the second level.
Another example of a hierarchy type are parent-child hierarchies that can comprise a build on the mappings between one or more pairs of two columns where one forms the parent and one the child.
Modeled hierarchies are another type that can require a more complex definition as here the complete hierarchy structure needs to be modeled. On the other hand, modeled hierarchies are much more flexible.
A definition for a modeled hierarchy can contain the complete hierarchy starting by the root nodes up to the leaves:
This definition can be read while parsing the metadata and the objects representing these hierarchies and their metadata can be created and attached to the reference characteristic (i.e., the dimension to which the hierarchy belongs, etc.).
To use the hierarchy in an analytical tool (e.g. SAP DESIGN STUDIO, etc.), one or more of the following features can be made available in the data access layer 110: (i) hierarchical value help, (ii) hierarchical filter support, (iii) hierarchical display in the result set, (iv) navigation within the hierarchy (drilling), and (iv) aggregate leaves to get a node amount.
With the value help feature, the hierarchy description can be parsed to form metadata of the hierarchies. When the hierarchy is consumed, the data records can be read to build up the hierarchical structure. To do this the records can be iterated. The constraints can be:
With reference to diagrams 300 of
With reference now to diagram 400 of
To save memory these newly adjusted/created objects can be stored in a sparse matrix (such as that illustrated in diagram 500 of
In case there are multiple hierarchies active at the same point in time on multiple dimensions, multiple sparse matrices can be used to store the hierarchical records.
This sparse matrix can be useful as the same position in the records might be needed for multiple different nodes. For example, Product Group [P1] might be a leaf in one hierarchy but at the same time a node in another hierarchy. Therefore a manager can be used that holds references to the different record matrices (one per active hierarchy). When accessing a hierarchical record this manager, can return a pointer to the correct hierarchical records matrix for the given hierarchy. Again if an entry is missing for any of these hierarchies the original matrix can be accessed.
With conventional workflows, for each unique record a tuple and data cells can be created. In case the record returns a hierarchical object (node or leaf) a different workflow is required because additional tuples for the nodes need to be created. Therefore, an initial result set with the newly created tuples can look like that of diagram 700 of
To identify needed tuples and to build them the following a process such as that in diagram 800 of
In particular, it can initially be determined that a new tuple element is to be created (805). It is then determined if there is a next member of the hierarchy (810) and if it not, then the new tuple element is created (815). If there is a next member, then such member is analyzed (820) to determine if it is a hierarchy element (825). If so, then it is determined whether it has a parent (830), and if so, then the process moves to the parent (835). If no hierarchy member exists or there is no parent, then a tuple element is created for the member (840). It is then determined whether the member having the newly created tuple element is a hierarchy member (845). If not, then the process reverts back to checking the next member (810). Otherwise, the tuple element is connected to the parent tuple element (850). Is then determined whether this member is a node (855). If not, then process reverts back to checking the next member (810). If so, then the tuple element is returned (860) and the process moves back to the child member (865). The process then continues to create tuple element for the member (840).
In case multiple dimensions are put on one axis or multiple hierarchies are active on one axis, the workflow gets more complex as now the single tuple elements need to be connected with each other. And the newly created artificial hierarchy tuples need to be available on all levels.
Diagram 900 of
For the newly created tuples (for the nodes) data cell values need to be calculated by aggregating their children. To do this all single values can be collected and then run through an aggregation logic that calculates the value depending on the aggregation type. For reference, see diagram 1000 of
Initially, the single values can be calculated by aggregating all values of identical records (in the below example there were two records for “SEM300” so both values need to be aggregated). Subsequently, a new logic can be used to calculate the values for the new hierarchy tuples. To do this all single values of the leaves positioned under the node of the current tuple are collected and aggregated.
A filter can iterate through all available tuples (e.g., a cube) and checks for each tuple if it is inside the filter criteria or outside.
For hierarchical filters a comparison is not enough anymore as a filter on a node means filtering all its children as well. Therefore, for each hierarchical filter all tuples can be checked to determine if the member or any of its parents is contained in the filter.
So for the following hierarchy with a filter on “Node2”:
A check can be run through with all six tuples, the first three would return false, the following three true as they are either equal to the filter or children of the filtered node.
With hierarchy navigations, a drill down can return the delta tuples that are requested. For the following example if “Node2” is collapsed and should be drilled down (i.e. expanded), a delta of two tuples can be returned (“Leaf2” and “Leaf3”).
To accomplish the above, the description of where to drill is handed over, and then filtering can performed in a manner similar to that above. All tuples can be checked to determine if they match the drill context. The drill context can contain a path of tuple elements that describes where the drill down happens. So for each tuple, the drill context elements (i.e., tuple elements) can be iterated over and the check illustrated in diagram 1100 of
One or more aspects or features of the subject matter described herein can be realized in digital electronic circuitry, integrated circuitry, specially designed application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs) computer hardware, firmware, software, and/or combinations thereof. These various aspects or features can include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which can be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device. The programmable system or computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
These computer programs, which can also be referred to as programs, software, software applications, applications, components, or code, include machine instructions for a programmable processor, and can be implemented in a high-level procedural language, an object-oriented programming language, a functional programming language, a logical programming language, and/or in assembly/machine language. As used herein, the term “machine-readable medium” refers to any computer program product, apparatus and/or device, such as for example magnetic discs, optical disks, memory, and Programmable Logic Devices (PLDs), used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term “machine-readable signal” refers to any signal used to provide machine instructions and/or data to a programmable processor. The machine-readable medium can store such machine instructions non-transitorily, such as for example as would a non-transient solid-state memory or a magnetic hard drive or any equivalent storage medium. The machine-readable medium can alternatively or additionally store such machine instructions in a transient manner, such as for example as would a processor cache or other random access memory associated with one or more physical processor cores.
To provide for interaction with a user, one or more aspects or features of the subject matter described herein can be implemented on a computer having a display device, such as for example a cathode ray tube (CRT) or a liquid crystal display (LCD) or a light emitting diode (LED) monitor for displaying information to the user and a keyboard and a pointing device, such as for example a mouse or a trackball, by which the user may provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well. For example, feedback provided to the user can be any form of sensory feedback, such as for example visual feedback, auditory feedback, or tactile feedback; and input from the user may be received in any form, including, but not limited to, acoustic, speech, or tactile input. Other possible input devices include, but are not limited to, touch screens or other touch-sensitive devices such as single or multi-point resistive or capacitive trackpads, voice recognition hardware and software, optical scanners, optical pointers, digital image capture devices and associated interpretation software, and the like.
In the descriptions above and in the claims, phrases such as “at least one of” or “one or more of” may occur followed by a conjunctive list of elements or features. The term “and/or” may also occur in a list of two or more elements or features. Unless otherwise implicitly or explicitly contradicted by the context in which it is used, such a phrase is intended to mean any of the listed elements or features individually or any of the recited elements or features in combination with any of the other recited elements or features. For example, the phrases “at least one of A and B;” “one or more of A and B;” and “A and/or B” are each intended to mean “A alone, B alone, or A and B together.” A similar interpretation is also intended for lists including three or more items. For example, the phrases “at least one of A, B, and C;” “one or more of A, B, and C;” and “A, B, and/or C” are each intended to mean “A alone, B alone, C alone, A and B together, A and C together, B and C together, or A and B and C together.” In addition, use of the term “based on,” above and in the claims is intended to mean, “based at least in part on,” such that an unrecited feature or element is also permissible.
The subject matter described herein can be embodied in systems, apparatus, methods, and/or articles depending on the desired configuration. The implementations set forth in the foregoing description do not represent all implementations consistent with the subject matter described herein. Instead, they are merely some examples consistent with aspects related to the described subject matter. Although a few variations have been described in detail above, other modifications or additions are possible. In particular, further features and/or variations can be provided in addition to those set forth herein. For example, the implementations described above can be directed to various combinations and subcombinations of the disclosed features and/or combinations and subcombinations of several further features disclosed above. In addition, the logic flows depicted in the accompanying figures and/or described herein do not necessarily require the particular order shown, or sequential order, to achieve desirable results. Other implementations may be within the scope of the following claims.