System and method for generating code for an integrated data system

Information

  • Patent Application
  • Publication Number
    20070214111
  • Date Filed
    March 10, 2006
  • Date Published
    September 13, 2007
Abstract
A computer implemented method, apparatus, and computer usable program code for generating code for an integrated data system. A mixed data flow is received. The mixed data flow contains mixed data flow operators, which are associated with multiple runtime environments. A graph is generated containing logical operators based on the mixed data flow in response to receiving the mixed data flow. The logical operators are independent of the plurality of runtime environments. The graph is converted to a model. The logical operators are converted to model operators associated with the multiple runtime environments. The model operators allow for analysis of operations for the mixed data flow. The model is converted into an execution plan graph. The execution plan graph is executable on different runtime environments.
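The three conversions named in the abstract (mixed data flow to logical operator graph, graph to model, model to execution plan graph) can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration and does not come from the application: the operator kinds, the `RUNTIME_FOR_KIND` table, the `engine_hint` field, the class and function names, and the simplification that the flow arrives already in topological order.

```python
from dataclasses import dataclass, field

# Hypothetical table assigning each operator kind to a runtime environment.
RUNTIME_FOR_KIND = {
    "table_extract": "sql",
    "join": "sql",
    "custom_transform": "etl",
    "file_load": "etl",
}


@dataclass
class LogicalOp:
    """Runtime-independent operator in the logical operator graph."""
    name: str
    kind: str
    inputs: list = field(default_factory=list)


@dataclass
class ModelOp:
    """Model operator: a logical operator bound to one runtime environment."""
    name: str
    kind: str
    runtime: str
    inputs: list


def build_logical_graph(mixed_flow):
    # Keep only name, kind, and inputs; engine-specific hints are dropped.
    return [LogicalOp(op["name"], op["kind"], list(op.get("inputs", [])))
            for op in mixed_flow]


def to_model(logical_graph):
    # Bind each logical operator to a runtime via the hypothetical table.
    return [ModelOp(op.name, op.kind, RUNTIME_FOR_KIND[op.kind], op.inputs)
            for op in logical_graph]


def to_execution_plan(model):
    # Group consecutive operators that share a runtime into one plan step.
    # Assumes the model list is already in topological order.
    plan = []
    for op in model:
        if plan and plan[-1]["runtime"] == op.runtime:
            plan[-1]["operators"].append(op.name)
        else:
            plan.append({"runtime": op.runtime, "operators": [op.name]})
    return plan


mixed_flow = [
    {"name": "orders", "kind": "table_extract", "engine_hint": "sql"},
    {"name": "customers", "kind": "table_extract"},
    {"name": "join_oc", "kind": "join", "inputs": ["orders", "customers"]},
    {"name": "cleanse", "kind": "custom_transform", "inputs": ["join_oc"]},
    {"name": "write_file", "kind": "file_load", "inputs": ["cleanse"]},
]

logical_graph = build_logical_graph(mixed_flow)
model = to_model(logical_graph)
plan = to_execution_plan(model)
```

In this toy flow the plan has two steps, one per runtime, each of which an execution engine could hand to the matching runtime engine. The system described in the application works on full graphs and performs analysis and optimization at each stage, which this sketch does not attempt.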
Description

BRIEF DESCRIPTION OF THE DRAWINGS

The novel features believed characteristic of the invention are set forth in the appended claims. The invention itself, however, as well as a preferred mode of use, further objectives and advantages thereof, will best be understood by reference to the following detailed description of an illustrative embodiment when read in conjunction with the accompanying drawings, wherein:



FIG. 1 is a pictorial representation of a data processing system in which the aspects of the present invention may be implemented;



FIG. 2 is a block diagram of a data processing system in which aspects of the present invention may be implemented;



FIG. 3 is a block diagram of a data integration system in accordance with an illustrative embodiment of the present invention;



FIG. 4 is an exemplary data flow in a heterogeneous data integration system in accordance with an illustrative embodiment of the present invention;



FIG. 5 is a flow diagram illustrating a processing framework for a data integration system in accordance with an illustrative embodiment of the present invention;



FIG. 6 is the data flow of FIG. 4 divided by region in a heterogeneous data integration system in accordance with an illustrative embodiment of the present invention;



FIG. 7 is a flow diagram illustrating a region processing framework for a data integration system in accordance with an illustrative embodiment of the present invention;



FIG. 8 is an exemplary execution plan for the data flow of FIG. 6 for a data integration system in accordance with an illustrative embodiment of the present invention;



FIG. 9 is a flow diagram illustrating code generation in accordance with an illustrative embodiment of the present invention;



FIG. 10 is an exemplary data flow diagram for different runtime engines in accordance with an illustrative embodiment of the present invention;



FIG. 11 is an exemplary flow diagram showing a logical operator graph mapped to an extended query graph model in accordance with an illustrative embodiment of the present invention;



FIG. 12 is an exemplary flow diagram of code generated by a code generation system in accordance with an illustrative embodiment of the present invention;



FIG. 13 is a data flow diagram interconnecting multiple operators for a data integration system in accordance with an illustrative embodiment of the present invention;



FIG. 14 is a table representing operator classification of FIG. 13 in accordance with an illustrative embodiment of the present invention;



FIG. 15 is the data flow of FIG. 13 classified by region in accordance with an illustrative embodiment of the present invention;



FIG. 16 is a partial data flow diagram from FIG. 15 with inserted staging terminals in accordance with an illustrative embodiment of the present invention;



FIG. 17 is the data flow of FIG. 15 divided into regions, with staging terminals separating the regions, in accordance with an illustrative embodiment of the present invention;



FIG. 18 is a flowchart illustrating operation of a data integration system in accordance with an illustrative embodiment of the present invention;



FIG. 19 is a flowchart illustrating data flow code generation in accordance with an illustrative embodiment of the present invention;



FIG. 20 is a flowchart illustrating the process of converting a logical operator graph to an extended query graph model in accordance with an illustrative embodiment of the present invention;



FIG. 21 is a flowchart illustrating operator classification, grouping, and ordering in accordance with an illustrative embodiment of the present invention;



FIG. 22 is a flowchart illustrating classification of operators in the data flow in accordance with an illustrative embodiment of the present invention;



FIG. 23 is a flowchart illustrating operator sequencing in accordance with an illustrative embodiment of the present invention;



FIG. 24 is a flowchart illustrating receiving the next sequence for an operator in accordance with an illustrative embodiment of the present invention; and



FIG. 25 is a flowchart illustrating separating operators into regions in accordance with an illustrative embodiment of the present invention.


Claims
  • 1. A computer implemented method for generating code for an integrated data system, the computer implemented method comprising: receiving a mixed data flow, wherein the mixed data flow contains mixed data flow operators, which are associated with a plurality of runtime environments; responsive to receiving the mixed data flow, generating a graph containing logical operators based on the mixed data flow, wherein the logical operators are independent of the plurality of runtime environments; converting the graph to a model wherein the logical operators are converted to model operators associated with the plurality of runtime environments, wherein the model operators allow for analysis of operations for the mixed data flow; and converting the model into an execution plan graph, wherein the execution plan graph is executable on different runtime environments.
  • 2. The computer implemented method of claim 1, wherein the graph is a logical operator graph, wherein the graph operators are logical operator graph operators, wherein the model is an extended query graph model, and wherein the model operators are extended query graph model operators.
  • 3. The computer implemented method of claim 1, further comprising executing the execution plan graph using an execution engine, wherein the execution engine invokes one or more runtime engines.
  • 4. The computer implemented method of claim 2, wherein the one or more runtime engines is any of an extract, transform, load engine, a DataStage engine, and structured query language engine.
  • 5. The computer implemented method of claim 2, wherein the converting the graph step further comprises: mapping the logical operator graph operations to the extended query graph model operators; and transforming logical operator graph operations relationships to extended query graph model operations relationships.
  • 6. The computer implemented method of claim 2, wherein the converting the graph step further comprises: converting a logical operator graph operation directly to an extended query graph model quantifier.
  • 7. The computer implemented method of claim 2, wherein the converting the graph step further comprises: mapping properties of a logical operator graph operation to properties of an extended query graph model entity.
  • 8. The computer implemented method of claim 2, wherein the converting the graph step further comprises: mapping a logical operator graph operation to a property of an extended query graph model operator.
  • 9. The computer implemented method of claim 2, wherein the converting the graph step further comprises: transforming a logical operator graph operation to any of a set of table functions, and stored procedures to invoke an executable program.
  • 10. The computer implemented method of claim 2, wherein the converting the graph step further comprises: converting an expression in the logical operator graph to an expression tree in the extended query graph model.
  • 11. The computer implemented method of claim 2, further comprising: performing analysis and optimization of the logical operator graph, extended query graph model, and the execution plan graph.
  • 12. The computer implemented method of claim 2, wherein the extended query graph model includes structured query language operations, executable operations, and custom operations.
  • 13. The computer implemented method of claim 2, wherein the logical operator graph is a metadata representation of the mixed data flow.
  • 14. The computer implemented method of claim 1, wherein the mixed data flow is received from a user.
  • 15. A system comprising: a graphical user interface for allowing a user to create a mixed data flow, wherein the mixed data flow contains mixed data flow operators, which are associated with a plurality of runtime environments; a code generation system operably connected to the graphical user interface wherein the code generation system receives the mixed data flow from the graphical user interface, generates a graph containing logical operators based on the mixed data flow wherein the logical operators are independent of the plurality of runtime environments, converts the graph to a model, wherein the logical operators are converted to model operators associated with the plurality of runtime environments, wherein the model operators allow for analysis of operations for the mixed data flow, and converts the model into an execution plan graph, wherein the execution plan graph is executable on different runtime environments.
  • 16. The system of claim 15, wherein the graph is a logical operator graph, wherein the model is an extended query graph model, wherein the code generation system maps the logical operator graph operations to extended query graph model operations, and wherein the functionality and performance of the mixed data flow is maintained in the execution plan graph.
  • 17. The system of claim 15, wherein a plurality of runtime engines execute one or more regions of the execution plan graph.
  • 18. The system of claim 17, wherein the plurality of runtime engines are any of an extract, transform, load engine, a DataStage engine, and structured query language engine.
  • 19. A computer program product comprising a computer usable medium including computer usable program code for generating code for an integrated data system, said computer program product including: computer usable program code for receiving a mixed data flow, wherein the mixed data flow contains mixed data flow operators, which are associated with a plurality of runtime environments; computer usable program code responsive to receiving a mixed data flow, for generating a graph containing logical operators based on the mixed data flow, wherein the logical operators are independent of the plurality of runtime environments; computer usable program code for converting the graph to a model wherein the logical operators are converted to model operators associated with the plurality of runtime environments, wherein the model operators allow for analysis of operations for the mixed data flow; and computer usable program code for converting the model into an execution plan graph, wherein the execution plan graph is executable on different runtime environments.
  • 20. The computer program product of claim 19, further comprising: computer usable program code for mapping the graph operations to the model operators; and computer usable program code for transforming logical operator graph operations relationships to extended query graph model operations relationships.
  • 21. The computer program product of claim 19, comprising computer usable program code for mapping the graph operations to any of a model quantifier, a property of a model operator, a set of table functions, and a stored procedure to invoke an executable program.
  • 22. The computer program product of claim 19, wherein the computer usable program code for converting the model into an execution plan graph further comprises: computer usable program code for mapping properties of a graph operation to properties of a model entity.
  • 23. The computer program product of claim 19, wherein the computer usable program code for converting the model into an execution plan graph further comprises: computer usable program code for converting an expression in the graph to an expression tree in the model.
  • 24. The computer program product of claim 19, further comprising: computer usable program code for performing analysis and optimization of the logical operator graph, extended query graph model, and the execution plan graph.
  • 25. The computer program product of claim 19, wherein the graph is a logical operator graph, wherein the graph operators are logical operator graph operators, wherein the model is an extended query graph model, and wherein the model operators are extended query graph model operators.
  • 26. The computer program product of claim 25, wherein the logical operator graph is a metadata representation of the mixed data flow.
  • 27. The computer program product of claim 19, wherein the mixed data flow is received from a user.
  • 28. The computer program product of claim 19, wherein a plurality of runtime engines execute one or more regions of the execution plan graph.
  • 29. The computer program product of claim 28, wherein the plurality of runtime engines are any of an extract, transform, load engine, a DataStage engine, and structured query language engine.