IBM WebSphere Data Integration Suite comprises a job design tool that is used, through a graphical user interface (GUI), to design data flows between stages; these data flows are known within the product as “Jobs”. An example GUI window 10 produced by the job design tool is shown in
Jobs, stages 12, links 14 and columns all have “Properties” that further define their behaviour. The job design tool allows the user to drag and drop stages 12 and links 14 onto a “canvas” that represents the overall job design; then to navigate the canvas, select a stage 12 by pointing at it, and open a properties editor that dives down into a stage's link-level inputs and outputs, and columns, to edit the various properties. In this way very complicated data flow graphs can be built up, containing several levels with large amounts of metadata.
A perennial issue for designers has been how to compare versions of a job design in a genuinely useful way. The current approach is to export a job's overall metadata as an XML representation, and use a standard XML-oriented diff tool to compare two XML documents generated from two copies of the job. The problem with this approach is that, except in trivial cases, there is insufficient context for the designer to compare the designs, and be able to distinguish between differences in structure or properties. Also, there is often unnecessary detail shown in terms of what has not been changed.
According to a first aspect of the present invention there is provided a method of analyzing a set of data, the set of data being derived from an original set of data, comprising: comparing the derived set of data with the original set of data; generating a hyperlink to represent each difference between the derived and original data set, each difference being a changed item, an additional item, or a missing data item; providing at least one agent that is activated on selection of the hyperlink to operate on the changed, additional or missing data item; whereby a list of hyperlinks is generated to represent all the differences of the derived and original data set and selecting one of the hyperlinks will execute an agent to operate on a single data item in one of the data sets.
According to a second aspect of the invention there is provided a data comparison tool for analyzing a set of data, the set of data being derived from an original set of data, comprising: a comparator for comparing the derived set of data with the original set of data; a link generator for generating a hyperlink to represent each difference between the derived and original data set, each difference being a changed item, an additional item, or a missing data item; and at least one agent that is activated on selection of the hyperlink to operate on the changed, additional or missing data item; whereby a list of hyperlinks is generated to represent all the differences of the derived and original data set and selecting one of the hyperlinks will execute an agent to operate on a single data item in one of the data sets.
A comparison tool has been implemented that has knowledge of the major structural components of a job design and extracts the differences between two designs as hierarchical information so that it can be presented as an expandable tree in the context of the original jobs being compared.
The differences between two designs are presented hierarchically for ease of exploration, and as the user selects items in the comparison window the relevant part of one or both job design canvases is highlighted. The nodes of the tree represent the basic structure that the user navigates to set properties (Job>Stage>Input/Output>Column). A node represents either a repeating group (e.g. stages, inputs) or a specific item at that level (i.e. the name of the stage or link involved). Leaves of the tree represent property or name differences, or the addition or removal of a node. A node is present only if there is a change to some leaf below it, so only the parts of the jobs that differ appear in the tree.
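By way of illustration only, the pruning rule described above (a node appears only if some leaf below it differs) might be sketched as follows. The nested-dictionary representation of a job, the function name `diff_tree`, and the sample stage and column names are assumptions made for this sketch and are not taken from the product.

```python
from typing import Any, Optional


def diff_tree(old: Any, new: Any) -> Optional[Any]:
    """Return a nested dict holding only the differences, or None if equal.

    Leaves are tuples: ("changed", old, new), ("added", new) or ("removed", old).
    """
    if isinstance(old, dict) and isinstance(new, dict):
        children = {}
        for key in sorted(old.keys() | new.keys()):
            if key not in new:
                children[key] = ("removed", old[key])
            elif key not in old:
                children[key] = ("added", new[key])
            else:
                sub = diff_tree(old[key], new[key])
                if sub is not None:  # keep a node only if something below it differs
                    children[key] = sub
        return children or None
    return ("changed", old, new) if old != new else None


# Hypothetical "before" and "after" job designs (Job > Stage > Input > Column).
before = {"stages": {
    "Sort": {"mode": "asc",
             "inputs": {"in1": {"columns": {"id": {"type": "int"}}}}},
    "Load": {"target": "db1"},
}}
after = {"stages": {
    "Sort": {"mode": "asc",
             "inputs": {"in1": {"columns": {"id": {"type": "bigint"}}}}},
    "Merge": {"key": "id"},
}}

tree = diff_tree(before, after)
```

In this sketch the unchanged `mode` property of the `Sort` stage does not appear in `tree`, while the changed column type, the removed `Load` stage and the added `Merge` stage do.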
If the user selects a node in the comparison window that represents a stage, the stage or stages involved will be highlighted in the job design windows. If the node represents a changed stage, the stage icon is highlighted on both canvases. If it represents an added or removed stage, the stage icon is highlighted on only one canvas, namely that of the job where the stage exists. Furthermore, leaf items that describe any change are presented in a “hyperlink” style, so that clicking on the appropriate part of the item takes the user directly to the job editor dialog that contains the property in question. The dialog may pertain to either of the two jobs being compared, depending on which hyperlink was selected. This makes it easy to look at a changed property in the context in which it is used, either in the “before” version of the job or in the “after” version.
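The highlighting rule above can be expressed as a small lookup. This is a minimal sketch under the assumption that the two canvases are labelled "before" and "after" and that an added stage exists only in the "after" job; the function name `canvases_to_highlight` is hypothetical.

```python
from typing import List


def canvases_to_highlight(kind: str) -> List[str]:
    """Return the canvases on which the stage icon is highlighted."""
    rules = {
        "changed": ["before", "after"],  # stage exists in both jobs
        "added": ["after"],              # stage exists only in the later job
        "removed": ["before"],           # stage exists only in the earlier job
    }
    if kind not in rules:
        raise ValueError(f"unknown difference kind: {kind!r}")
    return rules[kind]
```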
A tool to allow a user to save the comparison tree as an HTML file has also been implemented. This tool uses dynamic HTML to allow a user to expand and contract the difference tree in the same way as when looking at it via the comparison tool in the context of the job design component. The layout and appearance of the HTML is the same in both the report and the tool.
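As an illustration of how a difference tree might be written out as an expandable HTML report, the following sketch uses the standard HTML `<details>` and `<summary>` elements to obtain expand and contract behaviour. This is a substitute chosen for brevity and is not necessarily the dynamic HTML mechanism used by the tool; the function name `tree_to_html` and the sample labels are assumptions.

```python
from html import escape


def tree_to_html(label: str, node) -> str:
    """Render a nested dict as collapsible HTML; non-dict values become leaves."""
    if isinstance(node, dict):
        inner = "".join(tree_to_html(key, child) for key, child in node.items())
        return f"<details open><summary>{escape(label)}</summary>{inner}</details>"
    return f'<div class="leaf">{escape(label)}: {escape(str(node))}</div>'


# Hypothetical comparison result with a single changed stage property.
report = tree_to_html("Job", {"Stages": {"Sort": "mode changed"}})
```

Labels and values are escaped so that names containing markup characters do not break the report.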
Embodiments of the invention will now be described, by way of example only, with reference to the accompanying drawings in which:
A data integration system 20 of the preferred embodiment is shown in
The data flow editor displays a selected data set and provides editing functions for the data. For instance, a user selects an original data set 24A, edits the data within it, and saves a new derived data set 24B.
The user interface 23 provides a viewer for the data set and for comparison results of the data sets. In this embodiment the user interface uses operating system windows to display the data and comparisons.
An original data set can be created using the data flow editor 22 and saved in memory 24A. A derived data set can be edited using the data flow editor 22 and saved in memory 24B.
The data comparison tool 26 comprises: a comparator 28; a link generator 30; an editor agent 32; an undo agent 34; and storage 36 for comparison results. The data comparison tool 26 is controlled by method 300 shown in
The comparator 28 compares a derived set of data with an original set of data.
The link generator 30 generates a hyperlink to represent each difference between the derived and original data set, each difference being a changed item, an additional item, or a missing data item. A list of hyperlinks is generated to represent all the differences of the derived and original data set and selecting one of the hyperlinks will execute an agent to operate on a single data item in one of the data sets. The hyperlink can link to an item in the derived data set which is different to a corresponding item in the original data set. The hyperlink can also link to an item in the derived data set which was added to the original data set. The hyperlink can also link to an item in the original data set that was removed in the derived data set.
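A minimal sketch of such a link generator is given below, assuming each data set is a flat mapping from item name to value. The `Hyperlink` record, the function name `generate_links` and the `diff://` address scheme are hypothetical and serve only to show which data set each kind of link points into.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Hyperlink:
    kind: str    # "changed", "additional" or "missing"
    item: str    # name of the data item concerned
    target: str  # data set the link points into: "derived" or "original"
    href: str    # hypothetical address of the item


def generate_links(original: Dict, derived: Dict) -> List[Hyperlink]:
    """Generate one hyperlink per difference between the two data sets."""
    links = []
    for key in sorted(original.keys() | derived.keys()):
        if key not in derived:
            kind, target = "missing", "original"    # removed item lives in the original
        elif key not in original:
            kind, target = "additional", "derived"  # added item lives in the derived set
        elif original[key] != derived[key]:
            kind, target = "changed", "derived"
        else:
            continue  # unchanged items produce no link
        links.append(Hyperlink(kind, key, target, f"diff://{target}/{key}"))
    return links
```

For example, `generate_links({"a": 1, "b": 2, "c": 3}, {"a": 1, "b": 5, "d": 4})` yields three links: `b` as changed, `c` as missing and `d` as additional.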
The editor agent 32 allows the user to confirm or make further changes to the data item. The editor agent can be a simple text editor or a special editor for that particular data item.
The undo agent 34 undoes the difference: a changed item is changed back, an additional item is removed, and a missing item is restored.
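The three undo cases can be sketched as follows, again assuming flat mappings for the two data sets and assuming that the undo is applied to the derived data set in place. The function name `undo_difference` is hypothetical.

```python
from typing import Dict


def undo_difference(kind: str, item: str, original: Dict, derived: Dict) -> None:
    """Undo a single difference by modifying the derived data set in place."""
    if kind == "changed":
        derived[item] = original[item]  # change the item back
    elif kind == "additional":
        del derived[item]               # remove the added item
    elif kind == "missing":
        derived[item] = original[item]  # put the missing item back
    else:
        raise ValueError(f"unknown difference kind: {kind!r}")
```

Applying the undo to every difference in turn leaves the derived data set equal to the original.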
The comparison tool method 300 defines a sequence of steps that the tool takes once operated.
In the first step 302, the comparator is prompted to compare the derived set of data with the original set of data.
In the next step 304, the link generator is prompted to generate a hyperlink to represent each difference between the derived and original data set, each difference being a changed item, an additional item, or a missing data item.
In step 306, the list of hyperlinks is saved in the storage 36.
In step 308, the list is displayed using user interface 23.
In step 310, an agent is activated on selection of a hyperlink to operate on the changed, additional or missing data item. The editor agent can be activated to allow the user to confirm or make further changes to the data item. The undo agent can be activated to undo the difference: a changed item is changed back, an additional item is removed, and a missing item is restored.
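The sequence of steps 302 to 310 might be wired together as in the sketch below. It is illustrative only: the data sets are assumed to be flat mappings, the storage is a plain dictionary, the display is any callable, and the agents are stand-in functions that merely report what they were asked to do. All names (`run_method_300`, `select`, `AGENTS`) are hypothetical.

```python
from typing import Callable, Dict, List


def compare(original: Dict, derived: Dict) -> List[tuple]:
    """Step 302: list each changed, additional or missing item."""
    diffs = []
    for key in sorted(original.keys() | derived.keys()):
        if key not in derived:
            diffs.append(("missing", key))
        elif key not in original:
            diffs.append(("additional", key))
        elif original[key] != derived[key]:
            diffs.append(("changed", key))
    return diffs


def generate_links(diffs: List[tuple]) -> List[Dict[str, str]]:
    """Step 304: one hyperlink record per difference."""
    return [{"kind": kind, "item": key, "label": f"{kind}: {key}"}
            for kind, key in diffs]


def run_method_300(original: Dict, derived: Dict, storage: Dict,
                   display: Callable[[List[Dict[str, str]]], None]) -> List[Dict[str, str]]:
    links = generate_links(compare(original, derived))  # steps 302 and 304
    storage["links"] = links                            # step 306: save the list
    display(links)                                      # step 308: show the list
    return links


# Stand-in agents: real agents would open an editor or modify the data set.
AGENTS: Dict[str, Callable[[Dict[str, str]], str]] = {
    "edit": lambda link: f"open editor for {link['item']}",
    "undo": lambda link: f"undo {link['kind']} item {link['item']}",
}


def select(link: Dict[str, str], agent_name: str) -> str:
    """Step 310: selecting a hyperlink activates an agent on a single item."""
    return AGENTS[agent_name](link)
```

Each selection acts on exactly one data item, which mirrors the requirement that a hyperlink executes an agent on a single item in one of the data sets.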
Some examples of how the comparison tool works follow.
The comparison results window in
The next screenshot in
The next screenshot in
It will be clear to one skilled in the art that the method of the present invention may suitably be embodied in a logic apparatus comprising logic means to perform the steps of the method, and that such logic means may comprise hardware components or firmware components.
It will be equally clear to one skilled in the art that the logic arrangement of the present invention may suitably be embodied in a logic apparatus comprising logic means to perform the steps of the method, and that such logic means may comprise components such as logic gates in, for example, a programmable logic array. Such a logic arrangement may further be embodied in enabling means for temporarily or permanently establishing logical structures in such an array using, for example, a virtual hardware descriptor language, which may be stored using fixed or transmittable carrier media.
It will be appreciated that the method described above may also suitably be carried out fully or partially in software running on one or more processors (not shown), and that the software may be provided as a computer program element carried on any suitable data carrier (also not shown) such as a magnetic or optical computer disc. The channels for the transmission of data likewise may include storage media of all descriptions as well as signal carrying media, such as wired or wireless signal media.
The present invention may suitably be embodied as a computer program product for use with a computer system. Such an implementation may comprise a series of computer readable instructions either fixed on a tangible medium, such as a computer readable medium, for example, diskette, CD-ROM, ROM, or hard disk, or transmittable to a computer system, via a modem or other interface device, over either a tangible medium, including but not limited to optical or analogue communications lines, or intangibly using wireless techniques, including but not limited to microwave, infrared or other transmission techniques. The series of computer readable instructions embodies all or part of the functionality previously described herein.
Those skilled in the art will appreciate that such computer readable instructions can be written in a number of programming languages for use with many computer architectures or operating systems. Further, such instructions may be stored using any memory technology, present or future, including but not limited to, semiconductor, magnetic, or optical, or transmitted using any communications technology, present or future, including but not limited to optical, infrared, or microwave. It is contemplated that such a computer program product may be distributed as a removable medium with accompanying printed or electronic documentation, for example, shrink-wrapped software, pre-loaded with a computer system, for example, on a system ROM or fixed disk, or distributed from a server or electronic bulletin board over a network, for example, the Internet or World Wide Web.
It will be further appreciated that embodiments of the present invention may be provided in the form of a service deployed on behalf of a customer to offer service on demand.
It will also be appreciated that various further modifications to the preferred embodiment described above will be apparent to a person of ordinary skill in the art.
Number | Date | Country | Kind |
---|---|---|---|
0614277.2 | Jul 2006 | GB | national |
This application is a Submission Under 35 U.S.C. §371 for U.S. National Stage Patent Application of International Application Number PCT/EP2007/056681, filed 3 Jul. 2007, and entitled METHOD AND APPARATUS FOR COMPARING PROCESS DESIGNS, which is related to and claims priority to European Patent Application Serial Number EP0614277.2, filed 19 Jul. 2006, the entirety of which are incorporated herein by reference. This invention relates to a method and apparatus for comparing process designs. In particular this relates to a method and apparatus for comparing ETL process designs.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/EP2007/056681 | 7/3/2007 | WO | 00 | 1/13/2009 |