Data flow system and method for heterogeneous data integration environments

Information

  • Patent Application
  • Publication Number
    20070214171
  • Date Filed
    March 10, 2006
  • Date Published
    September 13, 2007
Abstract
A computer implemented method, apparatus, and computer usable program code for generating an execution plan graph from a data flow. A metadata representation of the data flow is generated in response to receiving the data flow. A set of code units is generated from the metadata representation. Each code unit in the set of code units is executable on multiple different types of runtime engines. The set of code units is processed to produce the execution plan graph.
Description

BRIEF DESCRIPTION OF THE DRAWINGS

The novel features believed characteristic of the invention are set forth in the appended claims. The invention itself, however, as well as a preferred mode of use, further objectives and advantages thereof, will best be understood by reference to the following detailed description of an illustrative embodiment when read in conjunction with the accompanying drawings, wherein:



FIG. 1 is a pictorial representation of a data processing system in which the aspects of the present invention may be implemented;



FIG. 2 is a block diagram of a data processing system in which aspects of the present invention may be implemented;



FIG. 3 is a block diagram of a data integration system in accordance with an illustrative embodiment of the present invention;



FIG. 4 is an exemplary data flow in a heterogeneous data integration system in accordance with an illustrative embodiment of the present invention;



FIG. 5 is a flow diagram illustrating a processing framework for a data integration system in accordance with an illustrative embodiment of the present invention;



FIG. 6 is the data flow of FIG. 4 divided by region in a heterogeneous data integration system in accordance with an illustrative embodiment of the present invention;



FIG. 7 is a flow diagram illustrating a region processing framework for a data integration system in accordance with an illustrative embodiment of the present invention;



FIG. 8 is an exemplary execution plan for the data flow of FIG. 6 for a data integration system in accordance with an illustrative embodiment of the present invention;



FIG. 9 is a flow diagram illustrating code generation in accordance with an illustrative embodiment of the present invention;



FIG. 10 is an exemplary data flow diagram for different runtime engines in accordance with an illustrative embodiment of the present invention;



FIG. 11 is an exemplary flow diagram showing a logical operator graph mapped to an extended query graph model in accordance with an illustrative embodiment of the present invention;



FIG. 12 is an exemplary flow diagram of code generated by a code generation system in accordance with an illustrative embodiment of the present invention;



FIG. 13 is a data flow diagram interconnecting multiple operators for a data integration system in accordance with an illustrative embodiment of the present invention;



FIG. 14 is a table representing operator classification of FIG. 13 in accordance with an illustrative embodiment of the present invention;



FIG. 15 is the data flow of FIG. 13 classified by region in accordance with an illustrative embodiment of the present invention;



FIG. 16 is a partial data flow diagram from FIG. 15 with inserted staging terminals in accordance with an illustrative embodiment of the present invention;



FIG. 17 is the data flow of FIG. 15 divided into regions, with staging terminals separating the regions, in accordance with an illustrative embodiment of the present invention;



FIG. 18 is a flowchart illustrating operation of a data integration system in accordance with an illustrative embodiment of the present invention;



FIG. 19 is a flowchart illustrating data flow code generation in accordance with an illustrative embodiment of the present invention;



FIG. 20 is a flowchart illustrating the process of converting a logical operator graph to an extended query graph model in accordance with an illustrative embodiment of the present invention;



FIG. 21 is a flowchart illustrating operator classification, grouping, and ordering in accordance with an illustrative embodiment of the present invention;



FIG. 22 is a flowchart illustrating classification of operators in the data flow in accordance with an illustrative embodiment of the present invention;



FIG. 23 is a flowchart illustrating operator sequencing in accordance with an illustrative embodiment of the present invention;



FIG. 24 is a flowchart illustrating receiving the next sequence for an operator in accordance with an illustrative embodiment of the present invention; and



FIG. 25 is a flowchart illustrating separating operators into regions in accordance with an illustrative embodiment of the present invention.


Claims
  • 1. A computer implemented method for generating an execution plan graph from a data flow, the computer implemented method comprising: responsive to receiving the data flow, generating a metadata representation of the data flow; generating a set of code units from the metadata representation, wherein each code unit in the set of code units is executable on a plurality of different types of runtime engines; and processing the set of code units to produce the execution plan graph.
  • 2. The computer implemented method of claim 1, wherein the metadata representation is a logical operator graph.
  • 3. The computer implemented method of claim 2, wherein the logical operator graph is comprised of operators and wherein the operators are classified into regions in association with different runtime engines that support the operators.
  • 4. The computer implemented method of claim 3, wherein the regions are separated by staging terminals for allowing the regions to transfer data.
  • 5. The computer implemented method of claim 4, further comprising: optimizing the regions.
  • 6. The computer implemented method of claim 5, further comprising: generating the set of code units for each region; generating staging code for the staging terminals separating the regions.
  • 7. The computer implemented method of claim 6, wherein the processing step further comprises: aggregating the set of code units to build a particular execution plan graph for each region.
  • 8. The computer implemented method of claim 7, further comprising: processing each operator in each of the regions wherein processing comprises: resolving staging with neighboring operators; generating a number of code units for each operator to form the set of code units; and placing the set of code units into the particular execution plan graph for each region.
  • 9. The computer implemented method of claim 8, further comprising: performing post aggregation processing.
  • 10. The computer implemented method of claim 9, further comprising: aggregating the particular execution plan graph for each region into the execution plan graph.
  • 11. The computer implemented method of claim 1, wherein the processing step further comprises: generating deployment code for preparing one or more runtime engines for execution of the execution plan graph; generating run code for executing the execution plan graph; and generating un-deployment code for undoing the effects of the deployment code.
  • 12. The computer implemented method of claim 1, further comprising: executing the execution plan graph, wherein each of the regions is executed by one of the plurality of different types of runtime engines associated with the regions.
  • 13. The computer implemented method of claim 12, further comprising: adding at least one new runtime engine for executing the execution plan graph.
  • 14. The computer implemented method of claim 13, further comprising: establishing a region definition for each region; and establishing an operator definition for each operator.
  • 15. A data integration system comprising: a processor for processing a code generation system; and a storage operably connected to the processor for storing the code generation system wherein the code generation system may be loaded into a main memory for execution by the processor, wherein the code generation system further comprises: a logical operator graph region classifier for classifying operators in a mixed data flow, segregating the operators into regions, and sequencing the execution of the regions to form classified regions; a region specific classifier and optimizer for generating classified optimized regions; a logical operator graph code generator and optimizer for generating code units from the classified optimized regions; an operator specific code generator for generating a set of code units; a logical operator graph plan aggregator for aggregating a plurality of plans; and a region specific aggregator for aggregating a plurality of regional plans to form an execution plan graph.
  • 16. The data integration system of claim 15, wherein the logical operator graph code generator and optimizer further comprises: a plurality of regional code generators wherein each of the plurality of regional code generators is associated with one of a plurality of different runtime engines.
  • 17. The data integration system of claim 16, wherein each of the plurality of regional code generators generates code for an associated region and generates staging code for staging terminals separating each region.
  • 18. The data integration system of claim 15, wherein the operator specific code generator resolves staging with neighboring operators and places the set of code units into execution plan graphs.
  • 19. A computer program product comprising a computer usable medium including computer usable program code for generating code from a data flow, said computer program product including: computer usable program code responsive to receiving the data flow, for generating a metadata representation of the data flow; computer usable program code for generating a set of code units from the metadata representation, wherein each code unit in the set of code units is executable on a plurality of different types of runtime engines; and computer usable program code for processing the set of code units to produce an execution plan graph.
  • 20. The computer program product of claim 19, wherein the computer usable program code for processing the set of code units further comprises: generating deployment code for preparing one or more runtime engines for execution of the execution plan; generating run code for executing the execution plan graph; and generating un-deployment code for undoing the effects of the deployment code.
  • 21. The computer program product of claim 19, wherein the metadata representation is a logical operator graph, wherein the logical operator graph is comprised of operators and wherein the operators are classified into regions in association with different runtime engines that support the operators.
  • 22. The computer program product of claim 19, wherein the regions are separated by staging terminals for allowing the regions to transfer data.
  • 23. The computer program product of claim 19, further comprising: computer usable program code for optimizing the regions.
  • 24. The computer program product of claim 22, further comprising: computer usable program code for generating the set of code units for each region; computer usable program code for generating staging code for the staging terminals separating the regions.
  • 25. The computer program product of claim 19, wherein the computer usable program code for processing the set of code units further comprises: computer usable program code for aggregating the set of code units to build a particular execution plan graph for each region.
  • 26. The computer program product of claim 25, further comprising: computer usable program code for processing each operator in each of the regions wherein processing comprises: computer usable program code for resolving staging with neighboring operators; computer usable program code for generating a number of code units for each operator to form the set of code units; and computer usable program code for placing the set of code units into the particular execution plan graph for each region.
  • 27. The computer program product of claim 26, further comprising computer usable program code for performing post aggregation processing.
  • 28. The computer program product of claim 19, further comprising computer usable program code for aggregating the particular execution plan graph for each region into the execution plan graph.
  • 29. The computer program product of claim 19, further comprising: computer usable program code for executing the execution plan graph, wherein each of the regions is executed by one of the plurality of different types of runtime engines associated with the regions; and computer usable program code for adding at least one new runtime engine for executing the execution plan graph.
  • 30. The computer program product of claim 29, further comprising: computer usable program code for establishing a region definition for each region; and computer usable program code for establishing an operator definition for each operator.
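
By way of a non-limiting illustration only, the classification of operators into regions and the placement of staging terminals between regions, as recited in the claims above, might be sketched as follows. The class name `Operator`, the functions `classify_regions` and `staging_terminals`, the engine labels "sql" and "etl", and the grouping rule itself are assumptions introduced for this sketch and are not taken from the application. The rule used here is deliberately conservative: an operator joins an existing region only when all of its inputs lie in that one region and the region uses the same engine, which keeps the dependencies between regions free of cycles so that the regions can be sequenced.

```python
from dataclasses import dataclass, field


@dataclass
class Operator:
    name: str
    engine: str  # hypothetical label for the runtime engine supporting this operator
    inputs: list = field(default_factory=list)  # names of upstream operators


def classify_regions(operators):
    """Group operators (listed in topological order) into engine-specific regions.

    Assumed rule for this sketch: an operator joins a region only if every one
    of its inputs is in that region and the region's engine matches its own.
    Otherwise it opens a new region.
    """
    regions = []    # list of (engine, [operator names]) in creation order
    region_of = {}  # operator name -> index into regions
    for op in operators:
        upstream = {region_of[up] for up in op.inputs}
        target = None
        if len(upstream) == 1:
            candidate = next(iter(upstream))
            if regions[candidate][0] == op.engine:
                target = candidate
        if target is None:
            regions.append((op.engine, []))
            target = len(regions) - 1
        regions[target][1].append(op.name)
        region_of[op.name] = target
    return regions, region_of


def staging_terminals(operators, region_of):
    """Return a (producer, consumer) pair for every link that crosses regions."""
    return [
        (up, op.name)
        for op in operators
        for up in op.inputs
        if region_of[up] != region_of[op.name]
    ]


# A small mixed data flow; operator names and engine labels are illustrative only.
flow = [
    Operator("extract", "sql"),
    Operator("filter", "sql", ["extract"]),
    Operator("cleanse", "etl", ["filter"]),
    Operator("load", "sql", ["cleanse"]),
]
regions, region_of = classify_regions(flow)
terminals = staging_terminals(flow, region_of)
```

In this sketch the flow separates into three regions (two handled by the "sql" engine and one by the "etl" engine), and a staging terminal is placed on each link that crosses a region boundary. A fuller implementation would presumably also merge compatible regions and optimize each region before code generation.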