The present invention generally relates to service dataflows, and more particularly relates to the composition of service dataflows.
A dataflow is a software component which processes messages sent from a client to a service such as a web service. Current approaches for dataflow composition generally require a situational approach where specific dataflows have to be manually constructed and statically composed before runtime for specific policy requirements. These conventional approaches are unable to overcome the increasingly complex challenges for consuming these services because they are limited in scope and functionality. For example, in some conventional systems a programmer is required to create specialized dataflows based on particular policies before runtime. Other systems require manual work and intervention.
In one embodiment, a method is disclosed. The method comprises receiving a request from a client. A first set of dataflows associated with at least one service is retrieved. A dataflow processes messages sent from the client to a service. A dataflow execution plan comprising the first set of dataflows is generated. At least one dataflow in the first set of dataflows is determined to be associated with a service policy. A second set of dataflows associated with the service policy is retrieved in response to the determination. The second set of dataflows is inserted into the dataflow execution plan preceding the at least one dataflow.
In another embodiment, a system is disclosed. The system comprises a data composition manager. The data composition manager is configured for receiving a request from a client. A first set of dataflows that enforce at least one set of policies is retrieved in response to receiving the request. Each dataflow in the first set of dataflows is a software component that processes a set of messages sent from the client to a service. A dataflow execution plan is generated that comprises the first set of dataflows. At least one dataflow in the first set of dataflows is determined to be associated with a dataflow policy. At least a second set of dataflows associated with the dataflow policy is retrieved in response to the determination. The at least second set of dataflows is inserted into the dataflow execution plan preceding the at least one dataflow.
In yet another embodiment, a computer program product is disclosed. The computer program product comprises a computer readable storage medium having computer readable program code embodied therewith. The computer readable program code comprises computer readable program code configured to receive a request from a client. A first set of dataflows that enforce at least one set of policies is retrieved in response to receiving the request. Each dataflow in the first set of dataflows is a software component that processes a set of messages sent from the client to a service. A dataflow execution plan is generated that comprises the first set of dataflows. At least one dataflow in the first set of dataflows is determined to be associated with a dataflow policy. At least a second set of dataflows associated with the dataflow policy is retrieved in response to the determination. The at least second set of dataflows is inserted into the dataflow execution plan preceding the at least one dataflow.
In a further embodiment, a method is disclosed. The method comprises retrieving a dataflow execution plan comprising a set of dataflows. One of an endpoint of the dataflow execution plan and a dataflow in the dataflow execution plan is selected. An incoming message type required by the endpoint or dataflow that has been selected is determined. An output message type provided by one of a dataflow and service request in the dataflow execution plan immediately preceding the endpoint or dataflow that has been selected is determined. The incoming message type that has been determined and the output message type that has been determined are compared. A determination is made, based on the comparison, that the incoming message type and the output message type fail to match. A transformation dataflow is inserted into the dataflow execution plan immediately preceding the endpoint or the dataflow that has been selected. The transformation dataflow automatically transforms an output message of at least one of the dataflow and service request to the incoming message type required by the endpoint or dataflow that has been selected.
The accompanying figures where like reference numerals refer to identical or functionally similar elements throughout the separate views, and which together with the detailed description below are incorporated in and form part of the specification, serve to further illustrate various embodiments and to explain various principles and advantages all in accordance with the present invention, in which:
Operating Environment
A dataflow, in one embodiment, is a software component that processes messages sent from a client to a service. Dataflows can be used for functions such as data transformation, logging, and routing. A dataflow typically operates within a runtime environment such as, but not limited to, an Enterprise Service Bus (ESB), which is a software architecture that allows for the performance of complex tasks by a distributed set of middleware services, each service performing a particular component function.
In one embodiment, a service 110 associated with a dataflow is a web service. A web service is a software system which supports automatic interaction with a client over a network. A web service provides a machine-readable interface that describes how the service can be invoked. The policies associated with the dataflows, in this embodiment, are web service policies. A web service policy is a specification that describes the capabilities and constraints for a web service and the policy requirements for a web service client. For example, policies can be used to specify Quality of Service requirements for a client or security requirements for a service. Other policies may include logging, authentication, load balancing, fail-over, access control and privacy. A policy has a scope, which determines where and when it applies, and typically involves two concepts: the policy assertion, which describes the capabilities or requirements of the policy element it attaches to, and the policy subject, which is the element in the specified domain to which the policy attaches. A more detailed discussion on the definitions of Policy Subject, Policy Enforcement Point and Policy Decision Point can be found in the “Network Working Group Request for Comments: 3198 Category: Informational”, which is hereby incorporated by reference in its entirety.
Dataflow Composition Manager
The DFPR 206 comprises a dataflow definition 236 (dataflow) for each policy domain supported by the DFRT 208 that enforces that policy domain. The DFPR 206 also comprises a transformation 218 for each of the policy domains that specifies how to map from an assertion in that domain to a parameterization of the dataflow. The DFPR 206 identifies the dataflow required for satisfying a given policy and provides a mapping between the parameters defined in the given policy and the parameters required in the identified dataflow.
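By way of a non-limiting illustration, the lookup performed against the DFPR 206 can be sketched as a registry keyed by policy domain, where each entry pairs a dataflow with a function that maps a policy assertion to that dataflow's parameters. The names `DomainEntry`, `REGISTRY`, `resolve`, and the assertion keys `logLevel` and `endpoint` are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Assertion = Dict[str, str]
Params = Dict[str, str]


@dataclass(frozen=True)
class DomainEntry:
    """Hypothetical registry entry: one per supported policy domain."""
    dataflow: str                              # dataflow that enforces the domain
    to_params: Callable[[Assertion], Params]   # assertion -> dataflow parameters


# Illustrative contents only; the domains and parameter names are assumptions.
REGISTRY: Dict[str, DomainEntry] = {
    "logging": DomainEntry("LoggerDF", lambda a: {"level": a.get("logLevel", "INFO")}),
    "routing": DomainEntry("RouterDF", lambda a: {"target": a["endpoint"]}),
}


def resolve(domain: str, assertion: Assertion) -> Tuple[str, Params]:
    """Return the dataflow for a policy domain and its parameterization."""
    entry = REGISTRY[domain]
    return entry.dataflow, entry.to_params(assertion)
```

Keeping the assertion-to-parameter mapping next to the dataflow definition means a new policy domain can be supported by adding a registry entry rather than by changing the planner.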
The DFRT 208 is a runtime environment where the dataflows execute. The DFRT 208 acts as a policy enforcement point for one or more policy domains. The DFRT 208 establishes the current in-scope subject; determines policy domains where multiple policy domains can apply to a subject resulting in the need for multiple dataflows; loads the flow or flows from the DFPR 206; transforms assertions into parameters for a dataflow; and executes the final plan. The dataflow planner 210, in one embodiment, comprises a DFRT query module 220, an ordering resolution module 222, a dependency resolution module 224, and an optimizer 226. The DFRT query module 220 queries the PDP 204 through the DFRT 208 to establish the current in-scope policy subjects. The ordering resolution module 222 resolves processing order for dataflows returned by the DFPR 206. The dependency resolution module 224 resolves dataflow dependencies. The optimizer 226 eliminates redundant dataflows and outputs the final dataflow plan to the dataflow execution manager 212. For example, the optimizer 226 eliminates duplicate transformations if they can be reused. The dataflow execution manager 212 executes the dataflow plan using the DFRT 208. In one embodiment, the dataflow execution manager 212 comprises a plan converting module 228 that converts the final dataflow plan received from the optimizer 226 to ready-to-run artifacts so that the dataflow can be executed in the DFRT 208.
Dynamic and Automatic Policy-Driven Dataflow Composition
The number of web services available in public and private networks, each with different policies and requirements is increasing very rapidly. As discussed above, conventional dataflow composition systems are unable to overcome the increasingly complex challenges for consuming these services because they are limited in scope and functionality. For example, in some conventional systems a programmer is required to create specialized dataflows based on particular policies before runtime. Other systems require manual work and intervention.
Various embodiments of the present invention, on the other hand, are advantageous over these conventional systems because they dynamically plan, based on a set of policies that are specified for a service, a set of reusable generic dataflows that are sequenced and invoked to satisfy the specified policies. The following is a more detailed discussion of this dynamic and automatic composition of service dataflows based on policies.
In one embodiment, a service request shown as an incoming message 230 in
If the size of the list of dataflows returned by the DFRT 208 is greater than 1 then the dataflow planner 210 queries the policy store 202 (via the DFRT 208) to retrieve order resolution policies 215. Order resolution policies indicate a priority to assign to a given dataflow. In other words, the ordering resolution policies are used to determine the execution order for the current iteration. For example, it can be determined that if there is a logger dataflow and a router dataflow that the router dataflow runs last. Ordering policies 215 can be defined at system level and at the dataflow/endpoint level. Ordering policies that are defined at endpoint level override system-level policies.
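The ordering step described above can be sketched as follows, under stated assumptions: priorities are modeled as integers where a lower number runs earlier, and a dataflow with no priority sorts last. Neither convention is specified in the text, and the function name `order_dataflows` is hypothetical.

```python
from typing import Dict, List


def order_dataflows(dataflows: List[str],
                    system_priorities: Dict[str, int],
                    endpoint_priorities: Dict[str, int]) -> List[str]:
    """Sort dataflows by priority; lower number runs earlier (assumed convention)."""
    if len(dataflows) <= 1:
        return list(dataflows)  # a single dataflow needs no ordering policy
    # Endpoint-level entries override system-level entries for the same dataflow.
    effective = {**system_priorities, **endpoint_priorities}
    # Dataflows without a priority sort last (assumption); sorted() is stable,
    # so dataflows with equal priority keep their original relative order.
    return sorted(dataflows, key=lambda df: effective.get(df, float("inf")))
```

For instance, with system-level priorities that would run the router second, an endpoint-level entry that gives the router a larger number moves it to the end of the sequence.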
The ordering resolution module 222 then resolves the ordering resolution policies 215 and sorts the dataflows based on the priorities indicated by the ordering policies 215. The dependency resolution module 224 then performs a recursive dependency resolution process. This is advantageous because, before a given dataflow executes, other dataflows (i.e., dependent dataflows) may be required to achieve the desired end results. In one embodiment, the dependency resolution module 224 selects a current dataflow from the sorted list of dataflows. The dependency resolution module 224 then queries the DFRT 208 to retrieve the dependent policies for the current dataflow. Based on the dependent policies for the current dataflow, the dependency resolution module 224 determines whether additional dataflows need to be inserted into the current dataflow. If additional dataflows need to be inserted, the dataflow planner 210 performs the ordering resolution process for the current dataflow and its inserted dependent dataflows. When all dependencies have been satisfied for all of the dataflows, the dependency resolution process completes.
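A minimal sketch of the recursive dependency resolution follows. It assumes the dependent policies have already been resolved into a mapping from a dataflow to the dataflows it depends on, and it places dependent dataflows immediately before the dataflow that requires them. Re-running the ordering resolution over inserted dataflows is omitted for brevity, and the cycle check is an added safeguard that the text does not describe. The name `resolve_dependencies` and the sample mapping in the usage note are hypothetical.

```python
from typing import Dict, List, Optional, Set


def resolve_dependencies(ordered: List[str],
                         depends_on: Dict[str, List[str]],
                         _active: Optional[Set[str]] = None) -> List[str]:
    """Expand each dataflow so that the dataflows it depends on precede it."""
    active = set() if _active is None else _active
    plan: List[str] = []
    for df in ordered:
        if df in active:
            # Safeguard (assumption): refuse to loop forever on cyclic policies.
            raise ValueError(f"cyclic dependency involving {df}")
        active.add(df)
        # Recurse: a dependent dataflow may have dependencies of its own.
        plan.extend(resolve_dependencies(depends_on.get(df, []), depends_on, active))
        active.discard(df)
        plan.append(df)
    return plan
```

Given the sorted list LoggerDF, FilterDF, RouterDF and an illustrative mapping in which RouterDF depends on EncryptionDF and EncryptionDF depends on DBLookupDF, the result is LoggerDF, FilterDF, DBLookupDF, EncryptionDF, RouterDF.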
The optimizer 226 then optimizes the resulting sequence of dataflows, referred to as an execution plan 232. For example, the optimizer removes any redundancies, identifies dataflows that can run in parallel, and the like. The dataflow planner 210 then sends the execution plan 232 to the dataflow execution manager 212. The plan converting module 228 then converts the execution plan 232 into a format that is acceptable by the DFRT 208 and runs the plan using the DFRT 208. An outbound message 234 is then generated based on the execution plan 232 and sent to the endpoint, i.e., service.
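One simple form of the redundancy removal mentioned above can be sketched as follows. The sketch assumes that a step with the same dataflow name and the same parameters can be reused, so that later repeats may be dropped. Whether that holds depends on the dataflow, and the text does not specify the criterion. The `Step` representation and the function name are hypothetical, and parallelism detection is not shown.

```python
from typing import Hashable, Iterable, List, Tuple

# A step is (dataflow name, parameter items); the shape is illustrative only.
Step = Tuple[str, Tuple[Tuple[str, Hashable], ...]]


def remove_redundant_steps(plan: Iterable[Step]) -> List[Step]:
    """Keep the first occurrence of each identical step, preserving plan order."""
    seen = set()
    optimized: List[Step] = []
    for step in plan:
        if step in seen:
            continue  # assumed reusable, so the repeat is dropped
        seen.add(step)
        optimized.append(step)
    return optimized
```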
A message is received from a client 102 requesting a given service. The dataflow planner 210 invokes the DFRT 208 and retrieves a list of dataflows with related transformations for this particular endpoint, as discussed above. For example, the DFRT 208, via the policy decision point 204, determines that this endpoint is associated with logging, routing, and message filtering policies 402, 404, 406. The DFRT 208 then retrieves the LoggerDF 408, RouterDF 410, and FilterDF 412 dataflows and associated transformations from the DFPR 206. Because the DFRT 208 returned more than 1 (or any other given threshold) dataflow, the dataflow planner 210 retrieves one or more ordering policies 215 from the policy store 202 and orders the dataflows accordingly, as discussed above. In the current example, the ordering policies indicate that the LoggerDF 408 has the highest priority, the FilterDF 412 has the next highest priority, and the RouterDF 410 has the lowest priority.
Therefore, based on the ordering policies 215, the dataflow planner 210 creates the execution plan 500 shown in
The dataflow planner 210 updates the execution plan 500 of
Similar to the example above, a set of policies is attached to a request message/operation of an endpoint. For example, a logging policy 702, a routing policy 704, and a message filter policy 706 are attached to the endpoint. These policies are triggered when a given operation of a service is invoked. Each of these policies is associated with one or more dataflows stored in the DFPR 206. For example, the logging policy 702 is associated with a LoggerDF dataflow 708; the routing policy 704 is associated with a RouterDF dataflow 710; and the message filter policy 706 is associated with a FilterDF dataflow 712. A set of classifications (
A message is received from a client 102 requesting a given service. The dataflow planner 210 invokes the DFRT 208 and retrieves a list of dataflows with related transformations for this particular endpoint, as discussed above. For example, the DFRT 208, via the policy decision point 204, determines that this endpoint is associated with logging, routing, and message filtering policies 702, 704, 706. The DFRT 208 then retrieves the LoggerDF 708, RouterDF 710, and FilterDF 712 dataflows and associated transformations from the DFPR 206. Because the DFRT 208 returned more than 1 dataflow, the dataflow planner 210 retrieves one or more ordering policies 215 from the policy store 202 and orders the dataflows accordingly, as discussed above. In the current example, the ordering policies indicate that the LoggerDF 708 has the highest priority, the FilterDF 712 has the next highest priority, and the RouterDF 710 has the lowest priority.
Therefore, based on the ordering policies, the dataflow planner 210 creates the execution plan 800 shown in
The dataflow planner 210 updates the execution plan 800 of
In the current example, the dataflow planner 210 determines that the EncryptionDF 722 is associated with a DBLookup policy 1002 that is associated with a DBLookupDF 1004 with a databaseLookup classification 1006 as shown in
When a dataflow has a dependent policy on another dataflow, the parent dataflow may need to receive parameters from the dependent dataflow, as shown above. For example, the EncryptionDF 722 requires parameters from the DBLookup dataflow 1004 in order to determine whether encryption should be used. The required parameters, in one embodiment, are declared as part of the domain mapping information within the DFPR.
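The parameter hand-off between a dependent dataflow and its parent can be sketched with a shared context that the dependent dataflow writes and the parent reads. The context mechanism, the `encrypt` key, the lookup rule, and the `ENC(...)` marker are all placeholders introduced for illustration and are not part of the described embodiments.

```python
from typing import Callable, Dict, List

Context = Dict[str, object]
Dataflow = Callable[[str, Context], str]


def db_lookup_df(message: str, ctx: Context) -> str:
    # Stand-in for a database lookup; the rule below is purely illustrative.
    ctx["encrypt"] = message.startswith("confidential:")
    return message


def encryption_df(message: str, ctx: Context) -> str:
    # The parent dataflow reads the parameter published by its dependency.
    if ctx.get("encrypt"):
        return "ENC(" + message + ")"  # placeholder, not real encryption
    return message


def run(plan: List[Dataflow], message: str) -> str:
    """Run dataflows in plan order, threading one context through all of them."""
    ctx: Context = {}
    for dataflow in plan:
        message = dataflow(message, ctx)
    return message
```

Because the dependent dataflow precedes its parent in the plan, the parameter is already present in the context when the parent runs.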
In addition to the processes/operations discussed above, other processes/operations that are not required to be explicitly defined in policies can also be performed. One example of these types of processes/operations is the automatic transformation of messages. In this embodiment, the dataflow planner 210 compares a message type associated with a message that is being inputted into a dataflow or endpoint with the message types supported by the dataflow/endpoint. Message types supported by dataflows and endpoints can be stored in, for example, a service registry. If the input message is the incoming message for the composed dataflow, then the message type of the incoming message is used in the comparison. If the input message is coming from another dataflow, then the message type for this dataflow is retrieved from a service registry where the dataflow is registered. If the dataflow planner 210, based on the comparison, determines that the incoming message type and the required message type do not match, the dataflow planner 210 determines whether an available transformation for the incoming message exists. If a transformation does exist, then the dataflow planner 210 inserts the transformation after all dependencies have been resolved.
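The type comparison and insertion described above can be sketched as a single pass over the plan. The sketch assumes the service registry is available as plain dictionaries of required and produced message types, and that transformations are indexed by a (from type, to type) pair. Raising an error when no transformation exists is an assumption, since the text does not say what happens in that case. All identifiers and type labels are hypothetical.

```python
from typing import Dict, List, Tuple


def insert_transformations(plan: List[str],
                           request_type: str,
                           in_type: Dict[str, str],
                           out_type: Dict[str, str],
                           transformers: Dict[Tuple[str, str], str]) -> List[str]:
    """Insert a transformation dataflow before each step whose input type mismatches."""
    result: List[str] = []
    current = request_type  # type of the message about to enter the next step
    for step in plan:
        required = in_type[step]
        if required != current:
            transformer = transformers.get((current, required))
            if transformer is None:
                # Assumption: the sketch fails loudly when nothing can convert.
                raise LookupError(f"no transformation from {current} to {required}")
            result.append(transformer)  # placed immediately before the recipient
        result.append(step)
        # A step with no registered output type is assumed to pass its input through.
        current = out_type.get(step, required)
    return result
```

For a plan of LoggerDF, ServiceInvokeDF, and an endpoint, where the two dataflows use a normalized type and both the request and the endpoint use a specific type, one transformation is inserted before LoggerDF and another before the endpoint.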
In this example, the LoggerDF 1406 and the ServiceInvokeDF 1408 use a normalized interface described by the Web Service Description Language (WSDL) while the input message and the endpoint use implementation-specific interfaces, described by the WSDL. The dataflow planner 210 receives a request for a service and invokes the DFPR 206 to retrieve a set of dataflows with related parameter transformations associated with the service. In this example, the dataflow planner 210 retrieves the LoggerDF 1406 and the ServiceInvokeDF 1408. Similar to the processes discussed above, the dataflow planner 210 builds the execution plan 1500 as shown in
As can be seen from the above discussion, the dataflow planner 210 dynamically and automatically (i.e., without user intervention) composes service dataflows based on policies. The dataflow planner dynamically plans a set of reusable generic dataflows that are sequenced and invoked to satisfy the specified policies for a requested service. Various embodiments allow the behavior of a dataflow to be modified both dynamically and declaratively. Dynamic modification refers to modifying how a dataflow operates on a per message basis. Declarative modification refers to performing these modifications via a simple configuration change, rather than by having to make programming changes. This is advantageous when, for example, a dataflow needs to be able to deal with many different cases and has a requirement to dynamically add new cases that it can deal with. This occurs in many situations, but is particularly characteristic of service gateway connectivity scenarios, in which a single dataflow must cope with service invocations destined for many different services. The dataflow may need to perform a different mediation for each target service and may need to cope with new target services. Each service invocation, destined for each target service, may need different operations performed on it.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method, or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
Aspects of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
Operational Flow Diagrams
Referring now to
The dataflow planner 210, at step 1802, determines if more than 1 dataflow was returned. If the result of this determination is negative, the control flows to step 1808. If the result of this determination is positive, the dataflow planner 210, at step 1804, retrieves ordering policies from the policy store 202. The dataflow planner 210, at step 1806, resolves the ordering policies and sorts the dataflows in an execution plan accordingly. The dataflow planner 210, at step 1808, recursively performs a dependency resolution process for each of the sorted dataflows. The dataflow planner 210, at step 1810, optimizes the resulting sequence of dataflows. The dataflow planner 210, at step 1812, then submits the sequence of data flows to the dataflow execution manager 212 for execution. The control flow then exits at step 1814.
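The control flow of steps 1802 through 1812 can be sketched as a small driver that takes each stage as a callable. The function name, the callable parameters, and the integer priority convention (lower runs earlier, missing sorts last) are assumptions made for illustration.

```python
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T")
Plan = List[str]


def plan_dataflows(dataflows: Plan,
                   get_ordering: Callable[[], Dict[str, int]],
                   resolve_deps: Callable[[Plan], Plan],
                   optimize: Callable[[Plan], Plan],
                   execute: Callable[[Plan], T]) -> T:
    """Order, expand, optimize, and submit a list of dataflows."""
    plan = list(dataflows)
    if len(plan) > 1:                        # step 1802: more than one dataflow?
        priorities = get_ordering()          # step 1804: retrieve ordering policies
        plan.sort(key=lambda df: priorities.get(df, float("inf")))  # step 1806
    plan = resolve_deps(plan)                # step 1808: dependency resolution
    plan = optimize(plan)                    # step 1810: optimization
    return execute(plan)                     # step 1812: hand off for execution
```

Passing the stages as callables keeps the driver independent of how ordering policies are stored or how the execution manager runs the plan. With a single dataflow, the ordering policies are never retrieved.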
If the result of the determination at step 1906 is positive, the dataflow planner 210, at step 1912, retrieves a dataflow(s) associated with the policy. The dataflow planner 210, at step 1914, updates the execution plan 232 with the new dataflow(s). The dataflow planner 210, at step 1916, analyzes the new dataflow(s) and each dataflow that is prior to the new dataflow. The dataflow planner 210, at step 1918, determines if the current dataflow is associated with a policy. If the result of this determination is positive, the control flow returns to step 1912. If the result of this determination is negative, the dataflow planner 210, at step 1920 determines if all dataflows have been analyzed. If the result of this determination is positive, the control flow exits at step 1922. If the result of this determination is negative, the control flow returns to step 1912.
The dataflow planner 210, at step 2008, compares the message type required by the recipient and the message type of the outputter. The dataflow planner 210, at step 2010, determines if there is a mismatch. If the result of this determination is negative, the dataflow planner 210, at step 2012, either performs the above processes for the next recipient of a message or exits if this process has been performed for all recipients in the execution plan. If the result of this determination is positive, the dataflow planner 210, at step 2014, identifies a dataflow that transforms the outputted message to the type required by the recipient. The dataflow planner 210, at step 2016, adds the identified transformation dataflow to the execution plan immediately before the recipient. The dataflow planner 210, at step 2018, either performs the above processes for the next recipient of a message or exits if this process has been performed for all recipients in the execution plan.
Information Processing System
The information processing system 2100 includes a computer 2102. The computer 2102 has a processor(s) 2104 that is connected to a main memory 2106, mass storage interface 2108, and network adapter hardware 2110. A system bus 2112 interconnects these system components. The main memory 2106, in one embodiment, comprises the dataflow composition manager 112 and its components discussed above.
Although illustrated as concurrently resident in the main memory 2106, it is clear that respective components of the main memory 2106 are not required to be completely resident in the main memory 2106 at all times or even at the same time. In one embodiment, the information processing system 2100 utilizes conventional virtual addressing mechanisms to allow programs to behave as if they have access to a large, single storage entity, referred to herein as a computer system memory, instead of access to multiple, smaller storage entities such as the main memory 2106 and data storage device 2116. Note that the term “computer system memory” is used herein to generically refer to the entire virtual memory of the information processing system 2100.
The mass storage interface 2108 is used to connect mass storage devices, such as mass storage device 2114, to the information processing system 2100. One specific type of data storage device is an optical drive such as a CD/DVD drive, which may be used to store data to and read data from a computer readable medium or storage product such as (but not limited to) a CD/DVD 2116. Another type of data storage device is a data storage device configured to support, for example, NTFS type file system operations.
Although only one CPU 2104 is illustrated for computer 2102, computer systems with multiple CPUs can be used equally effectively. Embodiments of the present invention further incorporate interfaces that each include separate, fully programmed microprocessors that are used to off-load processing from the CPU 2104. An operating system (not shown) included in the main memory is a suitable multitasking operating system such as any of the Linux, UNIX, Windows, and Windows Server based operating systems. Embodiments of the present invention are able to use any other suitable operating system. Some embodiments of the present invention utilize architectures, such as an object oriented framework mechanism, that allow instructions of the components of the operating system (not shown) to be executed on any processor located within the information processing system 2100. The network adapter hardware 2110 is used to provide an interface to a network 104. Embodiments of the present invention are able to be adapted to work with any data communications connections including present day analog and/or digital techniques or via a future networking mechanism.
Although the exemplary embodiments of the present invention are described in the context of a fully functional computer system, those of ordinary skill in the art will appreciate that various embodiments are capable of being distributed as a program product via CD or DVD, e.g. CD 2116, CD ROM, or other form of recordable media, or via any type of electronic transmission mechanism.
Non-Limiting Examples
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.