Business intelligence applications allow a company to perform tasks such as gathering data from heterogeneous sources, analyzing such data, and producing reports. Traditionally, a business intelligence application includes one or more stack elements configured to perform data retrieval, integration, management, and/or reporting functions. The stack elements typically require some knowledge of the structure and content of data available from various sources, and as a result under existing approaches considerable administrative effort may be required to enable a typical business intelligence application to use (e.g., include in a proper or desired way in a report) data associated with a data field newly added at a source.
Various embodiments of the invention are disclosed in the following detailed description and the accompanying drawings.
The invention can be implemented in numerous ways, including as a process, an apparatus, a system, a composition of matter, a computer readable medium such as a computer readable storage medium or a computer network wherein program instructions are sent over optical or communication links. In this specification, these implementations, or any other form that the invention may take, may be referred to as techniques. A component such as a processor or a memory described as being configured to perform a task includes both a general component that is temporarily configured to perform the task at a given time or a specific component that is manufactured to perform the task. In general, the order of the steps of disclosed processes may be altered within the scope of the invention.
A detailed description of one or more embodiments of the invention is provided below along with accompanying figures that illustrate the principles of the invention. The invention is described in connection with such embodiments, but the invention is not limited to any embodiment. The scope of the invention is limited only by the claims and the invention encompasses numerous alternatives, modifications and equivalents. Numerous specific details are set forth in the following description in order to provide a thorough understanding of the invention. These details are provided for the purpose of example and the invention may be practiced according to the claims without some or all of these specific details. For the purpose of clarity, technical material that is known in the technical fields related to the invention has not been described in detail so that the invention is not unnecessarily obscured.
In the example shown, a data source 102 is coupled to a network 104; a public or private network and/or combination thereof, for example the Internet, an Ethernet, serial/parallel bus, intranet, NAS, SAN, LAN, WAN, and other forms of connecting multiple systems and/or groups of systems together. Business intelligence application (“BIA”) 106 receives data from one or more database sources 102 through network 104. BIA 106 extracts, warehouses, and analyzes the data; accepts analysis queries from user 108; and delivers reports to user 108. In some embodiments, there may be more than one user 108. In some embodiments, a BIA administrator maintains the BIA 106 and troubleshoots any problems.
In step 202, an application developer designs BIA 106, taking into consideration the requirements of user 108 that are known at design time for data sources 102, analysis queries, and reports. The application developer defines data flows between the internal components of BIA 106. Examples of a data flow include a path that starts with a particular data field or type of data field associated with one or more data sources and indicates intermediate processing, if any, done on and/or with respect to data values associated with the field or type of field and one or more outputs, e.g., how the data, as processed if applicable, is provided as output, e.g., in a report.
In step 204, the BIA enters a “configurator” mode, which allows the BIA to adapt to its initial or a changed environment. In some embodiments, a changed environment may involve a data field change in a data source 102, for example the introduction of a data field for a customer size in a opportunity database entry for a CRM vendor 102. In this example, the configurator mode will identify the new customer size field, and reconfigure the BIA 106 to correctly analyze and produce reports that include the customer size field.
In step 206, the BIA 106 enters its normal “run-time” mode, in which a data source 102 is accessed to answer queries from user 108 and produce reports to user 108. In some embodiments, the BIA 106 may reenter the configurator mode 204 to further analyze any changed data fields at a scheduled time or as the administrator requests it.
The ETL 310 is connected to a data warehouse system 314, which warehouses data for future analysis and reports. The data agents may also be connected through to other components, for example data warehouse system 314. Data warehouse metadata is stored in a data warehouse metadata repository 316. Examples of data warehouse metadata include table schema for each table in the data warehouse. The data warehouse 314 is also connected to an Online Analytical Processing (“OLAP”) cube, which analyzes data and prepares reports using a multidimensional approach. OLAP metadata is stored in an OLAP metadata repository 320. Examples of OLAP metadata include the eXtended Markup Language (“XML”) files to configure the cubes and analysis. The OLAP cube 318 is also connected to a User Interface (“UI”) system 322. The UI 322 presents the analyzed data and reports from OLAP cube 318 to a user 108. UI metadata is stored in a UI metadata repository 324.
Conventionally, the ETL 310, data warehouse 314, OLAP cube 318 and UI 322 all store metadata in the corresponding local metadata repository. When a data field change occurs in any of the source databases 302, 304 and 306, typically each metadata repository; ETL metadata 312, data warehouse metadata 316, OLAP metadata 320 and UI metadata 324 must be updated to reflect the new data field change. Conventionally, static data fields, also known as “flexible fields”, are reserved for a limited number of added data field changes. However, this approach requires a priori knowledge of the data field count, types, and sizes which are not always available; the alternative is to statically allocate very large resources to accommodate average data field numbers and sizes. For example, a particular configuration might have fields PHONE_NUMBER NUMERIC(10) and NAME VARCHAR(128). Here the field count would be two, the types would be NUMERIC and VARCHAR, and the corresponding sizes would be 10 and 128. The metadata repositories are stored separately and human errors and inconsistencies may be made for a data field change. Additionally, if the size and number of static data fields are not used, then the business intelligence system must carry the additional resource burden of the unused fields.
A technique for propagating metadata changes for one or more business intelligence application stack elements, such as ETL 310, data warehouse 314, OLAP 318 and/or UI 322, automatically, so that data field changes are handled efficiently and correctly, is disclosed.
In the disclosed ETL system 352, the metadata repository is a shared metadata repository 354. The shared metadata repository 354 shares metadata with other business intelligence components. Throughout this specification, “business intelligence application” or BIA refers to a set of agents and/or components configured to receive, store, integrate, process, and/or provide access to (e.g., as output, such as a report) business data from one or more sources. In the example shown in
In comparison to
In some embodiments, the user and/or administrator have the ability in ETL 352 to define views and insert/update statements for data movement corresponding to at least one shared metadata entry in shared metadata repository 354. In some embodiments, the user and/or administrator have the ability in data warehouse 356 to create columns and tables to represent at least one shared metadata entry in shared metadata repository 354. In some embodiments, the user and/or administrator have the ability in OLAP cube 358 to categorize, summarize and aggregate data in at least one shared metadata entry in shared metadata repository 354. In some embodiments, the user and/or administrator have the ability in UI 360 to build reports using at least one shared metadata entry shared metadata repository 354 from a zero footprint web interface. In some embodiments, the user and/or administrator have the ability to incorporate a new type of BIA stack element, for example, data mining, into the auto-propagation scheme.
With a centralized shared metadata repository 354, a dynamic data field can be used that expands and contracts automatically with data field changes without the overhead of a static data field. A technique to implement a dynamic data field is disclosed.
Source data agents are shown as source A data agent 402, source B data agent 404 and source C data agent 406, as data agents, for example for corresponding sources 302, 304 and 306. In some embodiments there may be less than three or more than three data agents. The data agents are connected to the application repository server 408 which enables the BIA 362. In some embodiments, the server 408 may be spread across multiple physical servers. The server 408 interacts with an application developer 412, through an application design tool 410. The server 408 also is connected to the shared metadata repository 354.
In step 502, a generic source account is used to extract standard metadata from each source to the repository server 408. For example, if an SFA source is used, a test or generic SFA account is set up to extract the source standard metadata without any custom data fields.
In step 504, the source standard metadata is used to provide to applicable BIA stack elements a default or initial definition of data flow connections between data fields and tables.
In step 506, the generic account is compared with the actual source account to find optional and/or custom fields. Extension points are added to the shared metadata repository 354 by using dynamic data fields. Dynamic data fields are analogous to dynamically allocated variables in programming; they allow more than one or more attributes of a new data field, such as data type or size, to be defined dynamically, rather than requiring static definition at BIA design time. In some embodiments, an administrator defines, e.g., at BIA installation, customization, or a subsequent time, how detected dynamic data fields are to be used, for example how and/or where they should be included in the data flows defined for a particular BIA installation. Examples of such uses include what, if any, intermediate processing should be done with respect to data values associated with a dynamic data field and whether/how such data should be included in reports. In some embodiments, a user interface (UI) and/or related component interacts with a human user, such as an administrator, to receive input regarding how data associated with a new field is to be used.
In step 602, the dynamic data field is defined in each table that requires a custom or optional field. In step 604, the dynamic data fields are plumbed throughout the BIA according to industry best practices with the assistance of the application developer 412. “Industry best practices” are defined throughout this specification as practices generally accepted, through theory or experience, as safe data flow connections of these dynamic data fields. An example might be a dynamic data field with a ratings (from 1 through 10) type; a safe placement might be to average two ratings together, but an unsafe placement would be the addition of two ratings which would not make sense.
After all the dynamic data field data flow connections have been established and are verified, the design flow is complete and the BIA is ready for either configuration or run-time.
For the example data flow graph in
With the generic template, it is determined in step 504 that Source A table 710 has a data field 6 that is split using the Split “A” 716 algorithm to two fields in ETL table 714. Similarly, Source A table 710 has a data field 5 and Source B table 712 has a field ii that when combined are re-split using the Split “B” 718 algorithm to two different fields in ETL table 714.
A dynamic data field is introduced in step 604 as part of Source A table 708. According to industry best practices, the dynamic data field can be joined using algorithm Join 720 with Source A table 710's field 7 to a field in ETL table 714. This example only shows the data flow graph between data agents and ETL, and may be continued from ETL through data warehouse, OLAP and UI. This is shown in the next figure,
As shown in
In step 802, the administrator starts the configurator process and extracts the user metadata from each of its sources. In step 804, the differences between the current user metadata and the previous template are compared using a “diff” type tool. In some embodiments, the previous template when initially run may be the generic template of step 504.
In step 806, the discovered differences are classified as either new data fields, deleted data fields or modified data fields and presented to the administrator. In step 808, the administrator is permitted to bind the new data fields to dynamic data fields. In some embodiments, this binding may be done without the administrator's assistance.
In step 810, the union of the previous template metadata and the bindings of the configurator are combined to present new shared metadata. In step 812, the new shared metadata is deployed to generate the runtime objects such as table schema and XML files for the OLAP metadata.
The configurator flow of
Although the foregoing embodiments have been described in some detail for purposes of clarity of understanding, the invention is not limited to the details provided. There are many alternative ways of implementing the invention. The disclosed embodiments are illustrative and not restrictive.