The present invention relates to model analysis in general, and more particularly to providing data lineage information and impact analyses using models.
The information technology (IT) infrastructure of large enterprises may include vast numbers, amounts, and types of assets, including data, computer hardware and software, and sources and consumers of data, making their management a complex task. Two useful tools for managing IT assets within an enterprise are impact analysis and data lineage analysis. In impact analysis one or more assets of an enterprise's information technology infrastructure are analyzed to determine the impact they have on other assets. This is important where, for example, there is a need to modify, suspend, or decommission an asset, such as during routine system maintenance and system upgrades, as well as for disaster recovery planning. In data lineage analysis an analysis is performed of an enterprise's information technology infrastructure and/or an enterprise's operational logs in order to determine the path that data take from their initial entry into or generation within an enterprise to a specific destination within the enterprise.
In recent years enterprises have sought ways to improve the use and management of their IT assets by employing models, such as metadata models, that provide information about their IT assets and their associations. These models are themselves expressed as data that are typically stored in relational databases. Techniques that employ models in support of impact analysis and data lineage analysis are therefore in demand. However, where an enterprise's many IT assets and associations result in increasingly large models that are stored on multiple distributed databases, and where performing such analyses on such models requires increasing amounts of CPU time and other system resources and involves increasing amounts of network communications overhead, efficient model analysis methods would be advantageous.
The present invention provides for improved model-based analysis.
In one aspect of the present invention a system is provided for model analysis, the system including means for accessing a model stored on a computer-readable physical medium, the model having a plurality of classes and associations between the classes, and a model analyzer implemented as a computer program embodied on a computer-readable physical medium, the model analyzer configured to query each class in the model that has an association with a class of any instance in a set of source instances, thereby identifying a set of target instances that are associated with any of the source instances.
In another aspect of the present invention a method is provided for model analysis, the method including accessing a model stored on a computer-readable physical medium, the model having a plurality of classes and associations between the classes, and querying each class in the model that has an association with a class of any instance in a set of source instances, thereby identifying a set of target instances that are associated with any of the source instances.
In another aspect of the present invention a computer program is provided embodied on a computer-readable medium, the computer program including a first code segment operative to access a model stored on a computer-readable physical medium, the model having a plurality of classes and associations between the classes, and a second code segment operative to query each class in the model that has an association with a class of any instance in a set of source instances, thereby identifying a set of target instances that are associated with any of the source instances.
The present invention will be understood and appreciated more fully from the following detailed description taken in conjunction with the appended drawings in which:
Reference is now made to
Model 100 is typically stored in a model storage 110, which may be computer memory, magnetic storage, or any other suitable information storage medium. Model 100 may be stored in storage 110 in any suitable format, such as in a relational database (RDB) or object-oriented database (OODB). Model 100 as stored in storage 110 is preferably accessible to one or more computers 112, such as for impact analysis or data lineage analysis as may be performed by a model analyzer 114 whose operation may be controlled by computer 112.
Reference is now made to
It will be appreciated that each pair resulting from the query represents a path segment of one or more unique paths from the root source instance of the analysis to a target instance of a pair. Representations of any of the paths may be created using any suitable format, such as the graph described hereinbelow with reference to
This process of designating “target instances” in one query as “source instances” in the next is preferably repeated until no new path segments are found.
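The repetition described above may be illustrated by the following minimal Python sketch. It is not the claimed implementation: the function name `analyze`, the `query` callback signature, and the toy data are all assumptions introduced for illustration only. The sketch groups the current source instances by class, issues one query per class, and stops when a round yields no new path segments, which also guards against cycles.

```python
from collections import defaultdict

def analyze(root, query):
    """Level-by-level traversal: targets of one query become sources of the next.

    root  -- (instance_id, class_name) tuple for the root source instance
    query -- hypothetical callable(source_class, source_ids) returning
             (source, target) pairs, each element an (id, class) tuple
    """
    segments = []          # path segments in discovery order
    seen = set()           # segments already recorded
    frontier = {root}
    while frontier:
        # Group the current sources by class so each class is queried once.
        by_class = defaultdict(set)
        for inst_id, cls in frontier:
            by_class[cls].add(inst_id)
        next_frontier = set()
        for cls, ids in by_class.items():
            for src, tgt in query(cls, sorted(ids)):
                if (src, tgt) not in seen:
                    seen.add((src, tgt))
                    segments.append((src, tgt))
                    next_frontier.add(tgt)
        frontier = next_frontier   # empty once no new segments were found
    return segments

# Toy data (hypothetical), including a cycle a -> b -> a.
EDGES = {
    ("a", "X"): [("b", "Y")],
    ("b", "Y"): [("a", "X"), ("c", "Z")],
}

def toy_query(cls, ids):
    # Stands in for a single batched query against the model storage.
    for inst_id in ids:
        for tgt in EDGES.get((inst_id, cls), []):
            yield (inst_id, cls), tgt

result = analyze(("a", "X"), toy_query)
```

In this sketch the loop ends after the third round, because the only segment returned from `a` has already been recorded.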
The method of
Given a metadata UML model and an instance (object) of a class:
The pseudo code above assumes that partial paths may be included in the result set, although an alternative implementation might eliminate partial paths from the results.
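One possible reading of the distinction between partial and complete paths may be sketched in Python as follows. The sketch assumes that a partial path is a path from the root that stops at an instance having further outgoing segments; the function `build_paths` and its `include_partial` flag are hypothetical names introduced here and do not appear in the pseudo code.

```python
from collections import defaultdict

def build_paths(root, pairs, include_partial=True):
    """Assemble root-anchored paths from (source, target) path segments."""
    children = defaultdict(list)
    for src, tgt in pairs:
        children[src].append(tgt)

    paths = []

    def walk(path):
        node = path[-1]
        # Skip targets already on this path so that cycles terminate.
        nexts = [t for t in children.get(node, []) if t not in path]
        # A path is complete when it cannot be extended any further.
        if len(path) > 1 and (include_partial or not nexts):
            paths.append(tuple(path))
        for t in nexts:
            walk(path + [t])

    walk([root])
    return paths

segments = [("A", "B"), ("B", "C"), ("A", "D")]
with_partial = build_paths("A", segments)
complete_only = build_paths("A", segments, include_partial=False)
```

Here `with_partial` contains the partial path `("A", "B")` in addition to the two complete paths, while `complete_only` omits it.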
The query for returning pairs [SourceID, TargetObject] may be expressed as follows:
Input parameters: reference, list of SourceIDs, SourceClass.
The following pseudocode query may be used for returning pairs [SourceID, TargetObject], assuming an ORM (Object/Relational Mapping) layer:
Where an ORM layer does not exist, the pseudocode may be converted into another query language, such as SQL, provided the reference corresponds to an explicit or implicit Foreign Key.
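By way of illustration only, the SQL form of such a query may be sketched using Python's standard `sqlite3` module. The table names `computers` and `databases`, the foreign key column `computer_id`, and the function `query_pairs` are assumptions made for this sketch; the reference is represented here by the target table together with its foreign key column. A single `IN` clause covers the whole list of SourceIDs, so one query returns the pairs for every source instance.

```python
import sqlite3

def query_pairs(conn, target_table, fk_column, source_ids):
    """Return (SourceID, target id, target name) rows for all sources at once."""
    if not source_ids:
        return []
    placeholders = ",".join("?" for _ in source_ids)
    # Table and column names are interpolated, so they must come from trusted
    # model metadata, never from user input. The IDs are bound as parameters.
    sql = (
        f"SELECT {fk_column}, id, name FROM {target_table} "
        f"WHERE {fk_column} IN ({placeholders}) ORDER BY id"
    )
    return conn.execute(sql, list(source_ids)).fetchall()

# Hypothetical schema: each database row references the computer it resides on.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE computers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE databases (
    id INTEGER PRIMARY KEY,
    name TEXT,
    computer_id INTEGER REFERENCES computers(id)
);
INSERT INTO computers VALUES (1, 'Bob');
INSERT INTO databases VALUES (10, 'Customers', 1);
INSERT INTO databases VALUES (11, 'Orders', 1);
INSERT INTO databases VALUES (12, 'Insurance', 1);
""")

pairs = query_pairs(conn, "databases", "computer_id", [1])
```

Each returned row carries the SourceID in its first column, so the caller can reassemble the [SourceID, TargetObject] pairs without issuing a separate query per source.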
Reference is now made to
(Bob:Computer, Customers:Database)
(Bob:Computer, Orders:Database)
(Bob:Computer, Insurance:Database).
All instances of application 106 having a “read by” association with any of the instances found as a result of the first query are then found as the result of a second query, resulting in the pairs
(Customers:Database, CustReporting:Application)
(Customers:Database, CustSupport:Application)
(Customers:Database, LogisticsWizard:Application)
(Orders:Database, BalanceAnalyzer:Application)
(Orders:Database, Support:Application)
(Orders:Database, LogisticsWizard:Application)
(Insurance:Database, RiskAnalyzer:Application)
(Insurance:Database, Spending:Application).
Finally, all instances of user 108 having a “uses” association with any of the instances found as a result of the second query are then found as the result of a third query, resulting in the pairs
(CustReporting:Application, John:User)
(CustSupport:Application, Jim:User)
(LogisticsWizard:Application, John:User)
(BalanceAnalyzer:Application, Terry:User)
(Support:Application, Jill:User)
(LogisticsWizard:Application, Brian:User)
(RiskAnalyzer:Application, Kim:User)
(Spending:Application, Lori:User).
It may thus be seen that all paths within model 100 may be identified using just three queries. By contrast, a naïve, prior art approach might apply one query to the root source instance Bob:Computer, one query per database instance found, and one query per application found, resulting in 1+3+8=12 total queries for this example.
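The query counts above may be checked with the following Python sketch, which encodes the pairs listed in this example as plain dictionaries. The dictionary and function names are hypothetical, and the name of the association between computers and databases is not assumed. The naive count is modeled, as an assumption, as one query per instance reached along each path, so that LogisticsWizard, which is reached from two databases, is queried twice.

```python
# Pairs taken from the example; the first association is left unnamed.
COMPUTER_TO_DATABASE = {"Bob": ["Customers", "Orders", "Insurance"]}
READ_BY = {
    "Customers": ["CustReporting", "CustSupport", "LogisticsWizard"],
    "Orders": ["BalanceAnalyzer", "Support", "LogisticsWizard"],
    "Insurance": ["RiskAnalyzer", "Spending"],
}
USES = {
    "CustReporting": ["John"],
    "CustSupport": ["Jim"],
    "LogisticsWizard": ["John", "Brian"],
    "BalanceAnalyzer": ["Terry"],
    "Support": ["Jill"],
    "RiskAnalyzer": ["Kim"],
    "Spending": ["Lori"],
}
LEVELS = [COMPUTER_TO_DATABASE, READ_BY, USES]

def batched_query_count(root):
    """One query per association level, covering all sources at that level."""
    sources, queries = {root}, 0
    for assoc in LEVELS:
        queries += 1
        sources = {t for s in sources for t in assoc.get(s, [])}
    return queries

def naive_query_count(instance, level=0):
    """One query per instance reached along each path."""
    if level == len(LEVELS):
        return 0   # users have no outgoing association, so nothing is queried
    count = 1      # the query issued for this instance
    for target in LEVELS[level].get(instance, []):
        count += naive_query_count(target, level + 1)
    return count

batched = batched_query_count("Bob")
naive = naive_query_count("Bob")
```

Under these assumptions `batched` evaluates to 3 and `naive` to 12, matching the 1+3+8 breakdown given above.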
For lack of room,
It is appreciated that the present invention may be applied to any framework of modeled data, and not just to metadata models. For example, the present invention may be applied to an analysis for an on-line music store where, given a customer order for a music album, a list may be produced of all albums by musicians that ever played with any of the musicians on the ordered album. The list may then be used as part of a promotion offering discounts on the albums found during the analysis.
It is appreciated that one or more of the steps of any of the methods described herein may be omitted or carried out in a different order than that shown, without departing from the true spirit and scope of the invention.
While the methods and apparatus disclosed herein may or may not have been described with reference to specific computer hardware or software, it is appreciated that the methods and apparatus described herein may be readily implemented in computer hardware or software using conventional techniques.
While the present invention has been described with reference to one or more specific embodiments, the description is intended to be illustrative of the invention as a whole and is not to be construed as limiting the invention to the embodiments shown. It is appreciated that various modifications may occur to those skilled in the art that, while not specifically shown herein, are nevertheless within the true spirit and scope of the invention.