This invention relates to web services composition, particularly in a constrained data flow environment.
Web services are self-contained, self-describing, modular applications that can be published, located and invoked across the Internet. They encapsulate information, software or other resources, and make them available over a network via standard interfaces and protocols. They are typically based on the industry standard technologies of WSDL (to describe), UDDI (to advertise and syndicate) and SOAP (to communicate). Web services enable users to connect different components within and across organizational boundaries in a platform- and language-independent manner. New and complex applications can be created by aggregating the functionality provided by existing web services, a process referred to as service composition; the aggregated web service is known as a composite web service, and the constituent web services involved in a service composition are known as component web services. Web service composition enables businesses to interact with each other and to process and transfer data to realize complex operations. Furthermore, new business opportunities can be realized by utilizing the existing services provided by other businesses to create a composite service.
Composite web services may be developed using a specification language such as Business Process Execution Language for Web Services (BPEL4WS), Web Services Choreography Interface (WSCI), or Business Process Modeling Language (BPML), and executed by an engine such as IBM's WebSphere™ Business Integration Process Choreographer or IBM's Business Process Execution Language for Web Services Java Run Time (BPWS4J). Typically, a composite web service specification is executed by a single coordinator node. The coordinator node receives the client requests, makes the required data transformations and invokes the component web services according to the specification of the composite service. This mode of execution is referred to as centralized orchestration. However, in certain scenarios businesses may want to impose restrictions on access to the data they provide or on the sources from which they can accept data. Centralized orchestration can lead to violation of these data constraints because the central coordinator has access to the output data of all the component web services and all the component web services receive data from the central coordinator only.
Alternatively, fully decentralized orchestration might be used, where the original BPEL4WS specification is partitioned into as many partitions as there are component web services, and each partition resides with the component web service it invokes. The required data transformations are made by these partitions themselves. While fully decentralized orchestration can overcome many data flow constraints, this approach has certain limitations. Not all businesses providing web services have engines to execute BPEL4WS processes. Some of the businesses which have this capability may not allow BPEL4WS processes written by others to execute on their servers. Further, certain data flow constraints cannot be met by any of the possible fully decentralized topologies. A fully decentralized approach is described in Chafle et al., “Orchestrating Composite Web Services Under Data Flow Constraints”, in Proceedings of the 3rd IEEE International Conference on Web Services, 2005.
Under common data flow constraints, neither centralized nor fully decentralized orchestration of composite web services is practicable. Therefore, the invention provides an improved method and system for composition of web services.
A partially decentralized composition of web services is performed by distributing the coordination responsibility for the component web services, performed at run time by the original centralized composite web service software, to multiple web services. The original software is divided into multiple code partitions placed among different web services. These code partitions invoke one or more component web services and perform the data transformations required to enable calling the web services and returning data from them. An advantage is that the partitions need not be co-located with the web services they invoke (in contrast to fully decentralized composition), and therefore a partition may invoke more than one component web service. Also, data transformation is not restricted to the domain(s) producing or consuming the data and can be performed by any web service that is eligible to access the data. The web services containing the code partitions that invoke more than one web service and perform the required data transformation become new coordinator nodes. To satisfy any data flow constraints, the data is sent from producer to consumer along a path restricted to the nodes eligible to access this data. The code performing the required data transformation is located on the nodes in this path and may span multiple nodes.
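The routing decision just described can be illustrated with a minimal Java sketch that searches for a producer-to-consumer path visiting only nodes eligible to access the data; the node names, graph and eligibility set below are hypothetical examples rather than part of the specification.

    import java.util.*;

    // Sketch only: finds a producer-to-consumer path that visits only nodes
    // eligible to access a given data item. Node names and the eligibility
    // set are hypothetical examples.
    public class EligiblePathFinder {
        public static List<String> findPath(Map<String, List<String>> links,
                                            Set<String> eligible,
                                            String producer, String consumer) {
            Map<String, String> parent = new HashMap<>();
            Deque<String> queue = new ArrayDeque<>();
            queue.add(producer);
            parent.put(producer, null);
            while (!queue.isEmpty()) {
                String node = queue.poll();
                if (node.equals(consumer)) {
                    List<String> path = new ArrayList<>();
                    for (String n = consumer; n != null; n = parent.get(n)) path.add(0, n);
                    return path;
                }
                for (String next : links.getOrDefault(node, List.of())) {
                    // Only traverse nodes that are eligible to see this data item.
                    if (eligible.contains(next) && !parent.containsKey(next)) {
                        parent.put(next, node);
                        queue.add(next);
                    }
                }
            }
            return List.of(); // no constraint-respecting path exists
        }

        public static void main(String[] args) {
            Map<String, List<String>> links = Map.of(
                "producer", List.of("domainA", "domainB"),
                "domainA", List.of("consumer"),
                "domainB", List.of("consumer"));
            Set<String> eligible = Set.of("producer", "domainA", "consumer");
            System.out.println(findPath(links, eligible, "producer", "consumer"));
        }
    }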
In the known centralized orchestration approach discussed above, the complete BPEL4WS flow resides at the coordinating node, which executes it and coordinates the execution of all the other component web services. In the fully decentralized orchestration approach discussed above, there are no coordinating nodes: there are N BPEL4WS partitions (where N is the number of component web services involved), and all the BPEL4WS code partitions are co-located with the component web services they invoke.
In contrast, in an example embodiment of the present invention, multiple web service domains take up the run time responsibility of coordinating two or more component web services. In other words, the number of code partitions may vary between 1 and N (where N is the number of component web services involved), and the number of coordinating nodes may be zero or more.
Although BPEL4WS is presented as the preferred embodiment hereinafter, it is to be understood that the invention is not limited to any particular web service composition language, and applies to web services composition in general.
The partitioning generates a set of web services node topologies which meet predetermined data flow and deployment-related constraints, discussed below. The decentralization may lead to many topologies, involving a client interface node, one or more coordination nodes, one or more non-coordination nodes (usually dependent upon a coordinator node), and one or more component web services. The component web services can depend directly on the receiving node, on a coordination node, or on a non-coordination node. A partition node (a node containing a BPEL4WS partition; it can be a coordination node or a non-coordination node) operates to receive and transform input data (which can include the client request and data returned from a component web service that has already been executed). The partition node then calls any dependent component web services and receives the outputs from those web services. The partition node then transforms the received data and calls any dependent coordination or non-coordination nodes, or else returns the results to a higher node in the topology.
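A minimal Java sketch of the run-time behaviour of such a partition node follows; the interfaces, method names and data representation are hypothetical placeholders for the BPEL4WS engine's own invocation machinery.

    import java.util.*;

    // Sketch of a partition node: it receives input data (the client request or
    // outputs of already-executed component web services), transforms it, invokes
    // its dependent component web services, and either forwards the transformed
    // results to dependent partition nodes or returns them upstream.
    // All interfaces here are hypothetical placeholders.
    public class PartitionNode {
        interface ComponentService { Map<String, Object> invoke(Map<String, Object> in); }

        private final List<ComponentService> dependentServices;
        private final List<PartitionNode> dependentPartitions;

        PartitionNode(List<ComponentService> services, List<PartitionNode> partitions) {
            this.dependentServices = services;
            this.dependentPartitions = partitions;
        }

        public Map<String, Object> handle(Map<String, Object> input) {
            Map<String, Object> data = transform(input);          // prepare inputs
            for (ComponentService s : dependentServices) {
                data.putAll(s.invoke(data));                       // call component services
            }
            Map<String, Object> result = transform(data);          // prepare outputs
            for (PartitionNode p : dependentPartitions) {
                result.putAll(p.handle(result));                   // delegate to lower partitions
            }
            return result;                                         // return to the node above
        }

        private Map<String, Object> transform(Map<String, Object> in) {
            return new HashMap<>(in); // placeholder for the real data transformation
        }
    }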
In step 16, one of the many generated topologies is chosen. If more than one topology is generated, then policies/constraints such as ‘topology leading to minimum response time’ or ‘topology having minimum number of hops’ can be used to choose one of them. All partitions of the selected topology are then deployed at their respective system locations in step 18, resulting in a partially decentralized topology. The decentralization takes place at the time of service installation or whenever policies change.
Referring now to the process flow 20 in
Consider a Telecom Service Provider (telco) intending to provide a location-based service to its subscribers, whereby a subscriber can get the schedule of movies being screened at movie theaters within a radius of 5 miles of the subscriber's location. There exists a Yellow Pages service provider offering a web service that can provide a list of movie theaters, with the required contact information, within a radius of 5 miles of the subscriber's location. Also, the movie theaters deploy web services that provide movie schedules.
The telco develops a composite web service that makes use of the web services provided by the Yellow Pages service provider and the web services of the movie theaters to provide a location-based movie schedule service to subscribers. Without any data flow constraints, the telco can create the composite web service using centralized orchestration. The composite web service deployed at the telco asks the Yellow Pages service provider to fetch a list of movie theaters located within a radius of 5 miles of the subscriber. The composite web service then requests the movie schedules from all the movie theaters fetched by the Yellow Pages service, and returns the consolidated schedule to the subscriber.
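The centralized orchestration just described can be illustrated with the following Java sketch; the service interfaces and names are hypothetical stand-ins for the WSDL-described operations of the Yellow Pages and movie theater web services.

    import java.util.*;

    // Sketch of the centralized orchestration of the movie-schedule example:
    // the telco's coordinator calls the Yellow Pages service, then fans out to
    // each returned theater and consolidates the schedules.
    public class CentralizedMovieScheduleService {
        interface YellowPages { List<String> theatersNear(String location, int radiusMiles); }
        interface TheaterService { List<String> schedule(); }

        private final YellowPages yellowPages;
        private final Map<String, TheaterService> theaters; // keyed by theater id

        CentralizedMovieScheduleService(YellowPages yp, Map<String, TheaterService> theaters) {
            this.yellowPages = yp;
            this.theaters = theaters;
        }

        public Map<String, List<String>> schedulesFor(String subscriberLocation) {
            Map<String, List<String>> consolidated = new LinkedHashMap<>();
            // Note: the central coordinator has access to the Yellow Pages output
            // and to all theater responses.
            for (String theaterId : yellowPages.theatersNear(subscriberLocation, 5)) {
                TheaterService t = theaters.get(theaterId);
                if (t != null) consolidated.put(theaterId, t.schedule());
            }
            return consolidated;
        }
    }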
However, in a real world scenario, there may be constraints on data sharing. The example scenario discussed above has the following data flow constraints:
Besides the data flow constraints, the following deployment constraints of the runtime infrastructure also exist.
With these constraints in place, centralized orchestration of the composite service is not possible, as it violates constraint 1. Alternatively, a fully decentralized orchestration approach could be used to overcome such data flow constraints. In one such fully decentralized topology, the Yellow Pages service provider calls the movie theaters on behalf of the telco and the movie theaters send their schedules directly to the telco. The issue with this topology is that the telco does not know how many movie theaters were contacted, and thus how many responses it needs to wait for in order to complete the response to the request. Therefore, this is not a valid topology. In another possible fully decentralized topology, the movie theaters send their schedules directly to the customer. This requires sensitive customer information relating to the subscribers to be disclosed to the movie theaters, which violates data flow constraint 2 and hence prohibits the use of this topology.
Besides the data flow constraints discussed above, the fully decentralized topologies cannot meet the deployment constraints mentioned in items 3 and 4 above, as any fully decentralized topology would require the movie theaters to run a partition of a BPEL4WS process written by the telco.
In one embodiment shown in
On receiving a request 34 from the client Telco subscriber 32 (i.e., a client node), the partition 37 within the Telco Service Provider's site 36 contacts the partition 40 (within the Domain site 38), which is configured to retrieve the list of movie theaters from a Yellow Pages web service 42. The retrieved list is used to contact the movie theaters 44, collate the results as a response, and return the results to the partition 37. Thus, it becomes feasible for the partition 37 to compose the response for the web service request 34 even in the presence of the data flow and deployment constraints discussed above.
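For comparison with the centralized sketch above, the following Java sketch illustrates the role of the partition 40 in this partially decentralized topology; again, the interfaces and names are hypothetical placeholders.

    import java.util.*;

    // Sketch of the partially decentralized variant: partition 40, deployed in the
    // Yellow Pages domain, looks up nearby theaters, contacts them itself, and
    // returns only the collated schedule to partition 37 at the telco.
    public class YellowPagesDomainPartition {
        interface YellowPages { List<String> theatersNear(String location, int radiusMiles); }
        interface TheaterService { List<String> schedule(String theaterId); }

        private final YellowPages yellowPages;
        private final TheaterService theaterClient;

        YellowPagesDomainPartition(YellowPages yp, TheaterService client) {
            this.yellowPages = yp;
            this.theaterClient = client;
        }

        // Invoked remotely by the telco-side partition (partition 37).
        public List<String> collatedSchedules(String subscriberLocation) {
            List<String> collated = new ArrayList<>();
            for (String theaterId : yellowPages.theatersNear(subscriberLocation, 5)) {
                collated.addAll(theaterClient.schedule(theaterId));
            }
            // Partition 37 receives only the collated schedule; the theater list
            // stays within this domain.
            return collated;
        }
    }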
A partially decentralized orchestration system (such as the example system 30 of
Composite Web Service Node: The node 52 is a client interface node. The BPEL4WS partition, residing in the BPEL4WS engine 54 at this node 52, is configured to receive any client requests (e.g., the block 24 in
Component Web Service Node (with BPEL4WS partition): The nodes 70, 90 are component web service nodes with a BPEL4WS partition; they invoke the local web service, may also invoke web services in other domains (e.g., the web service of node 110) and may also perform the required data transformation. In terms of coordination responsibility, the component web service nodes with a BPEL4WS partition can be further categorized as coordination nodes and non-coordination nodes. Because node 90 invokes the local web service as well as web services in other domains (the web service of node 110), it has to coordinate the invocation and execution of multiple web services and is thus termed a coordination node. Node 70, on the other hand, invokes only the local web service; no coordination among multiple web services is involved, and the node is thus termed a non-coordination node. The nodes 70, 90 include a partition deployer 76, 96, a constraint reinforcer 78, 98, a BPEL4WS engine 72, 92, a monitoring agent 80, 100 and the component web service 74, 94. The data flow constraints are stored in the database (DB) 82, 102.
Component Web Service Runtime Node (without BPEL4WS partition): The node 110 does not contain any BPEL4WS partition related to the composite web service (and is thus an example of a non-coordination node). That is, it may be a node that does not have a BPEL4WS engine, or it may not allow BPEL4WS partitions for this composite web service. The node 110 includes a monitoring agent 112 and the component web service 114. The data flow constraints are stored in the database (DB) 116.
The status monitor 62 receives status information (i.e., as shown by the dashed arrowheaded lines) from the monitoring agent 60 of its own node 52, and from the monitoring agents 80, 100, 112 of the other nodes 70, 90, 110.
The deployment manager 58 receives the topology (i.e., a set of BPEL4WS flows, 56) selected for deployment from a topology selector 150 (shown in
The partition deployer 76, 96 has two main functions: constraint checking and verification, and deployment. The partition deployer 76, 96 verifies that BPEL4WS partitions are allowed to be deployed at the respective node 70, 90, that partitions are allowed from this composite web service runtime environment 52, and the authenticity of the composite web service runtime environment 52. The partition deployer 76, 96 further verifies whether the partition submitted for deployment at the respective node 70, 90 satisfies all applicable data flow constraints. Constraint checking and verification is essential because the partition is generated by an external entity, and after deployment the partition executes within the domain as a trusted piece of code with full access to the unencrypted output data of the component web service if encryption is being used.
The partition deployer 76, 96 accepts the incoming BPEL4WS partition from the deployment manager 58 and passes the partition to the constraint reinforcer 78, 98 to generate the additional set of constraints. In cases where encryption is being utilized, the constraint reinforcer 78, 98 also adds further security policies to the existing security policies so that any confidential data flowing out of that node in the form of newly created message types is also encrypted. The partition is then passed through a constraint checker (not shown, but part of the partition deployer 76, 96) that checks that the partition adheres to all the data flow constraints. After constraint checking and verification, the partition is deployed on to the BPEL4WS engine 72, 92.
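The deployment pipeline just described may be sketched in Java as follows; the component interfaces are hypothetical placeholders for the partition deployer, constraint reinforcer, constraint checker and BPEL4WS engine.

    // Sketch of the partition deployer's pipeline: accept an incoming partition,
    // have the constraint reinforcer derive constraints for newly created message
    // types, run the constraint checker, and only then hand the partition to the
    // local BPEL4WS engine.
    public class PartitionDeployer {
        interface Verifier { boolean originAllowedAndAuthentic(Object partition); }
        interface ConstraintReinforcer { void addDerivedConstraints(Object partition); }
        interface ConstraintChecker { boolean satisfiesAllConstraints(Object partition); }
        interface Bpel4wsEngine { void deploy(Object partition); }

        private final Verifier verifier;
        private final ConstraintReinforcer reinforcer;
        private final ConstraintChecker checker;
        private final Bpel4wsEngine engine;

        PartitionDeployer(Verifier v, ConstraintReinforcer r, ConstraintChecker c, Bpel4wsEngine e) {
            this.verifier = v; this.reinforcer = r; this.checker = c; this.engine = e;
        }

        public boolean accept(Object partition) {
            if (!verifier.originAllowedAndAuthentic(partition)) return false; // reject untrusted source
            reinforcer.addDerivedConstraints(partition);   // propagate constraints to new message types
            if (!checker.satisfiesAllConstraints(partition)) return false;    // reject violating partition
            engine.deploy(partition);                      // deploy onto the local BPEL4WS engine
            return true;
        }
    }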
The goal of the constraint reinforcer 78, 98 is to ensure that the data flow constraints are applied to any new message types that are generated as a result of data transformations being applied to a message type that was part of the original constraints. This new set of constraints will be similar to the ones that already exist for the original message type differing only in the name of the message type and message fields.
The constraint reinforcer 78, 98 uses the Data Dependence Graph (DDG) to trace the transformation of the output data of the component web service. For each partition, the constraint reinforcer 78, 98 searches for all invokes/replies in that partition. For each invoke/reply, the constraint reinforcer 78, 98 extracts the input message type. The constraint reinforcer 78, 98 uses the DDG to trace back to the origin of this input message type. The constraint reinforcer 78, 98 then searches for all the constraints in the constraints database 82, 102 that have this original message type as part of the tuple (see below). For all such constraints, the constraint reinforcer 78, 98 generates a new set of constraints essentially similar to the original ones but with the original message type and message field names replaced by the newly generated message type and message field names.
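A simplified Java sketch of this trace-back and constraint-regeneration step follows; the DDG representation, class names and fields are assumptions made for illustration only, and message field renaming is omitted.

    import java.util.*;

    // Sketch of the constraint-reinforcement step: for every message type consumed
    // by an invoke/reply in a partition, trace it back through a data dependence
    // graph to its originating message type and clone the constraints of the
    // original type for the derived type.
    public class ConstraintReinforcerSketch {
        record Constraint(String source, String destination, String messageType, boolean allowed) {}

        // DDG edge: derived message type -> the message type it was transformed from.
        private final Map<String, String> derivedFrom;
        private final List<Constraint> constraintDb;

        ConstraintReinforcerSketch(Map<String, String> ddg, List<Constraint> db) {
            this.derivedFrom = ddg;
            this.constraintDb = db;
        }

        public List<Constraint> deriveConstraints(Collection<String> inputMessageTypes) {
            List<Constraint> derived = new ArrayList<>();
            for (String msgType : inputMessageTypes) {
                String origin = msgType;
                while (derivedFrom.containsKey(origin)) origin = derivedFrom.get(origin); // walk DDG back
                for (Constraint c : constraintDb) {
                    if (c.messageType().equals(origin) && !origin.equals(msgType)) {
                        // Same source/destination rule, re-targeted at the new message type.
                        derived.add(new Constraint(c.source(), c.destination(), msgType, c.allowed()));
                    }
                }
            }
            return derived;
        }
    }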
Constraints are expressed as a 3-tuple of <source, destination, MessageType>. Both the source and the destination are expressed in terms of a domain name. MessageType is the input message type that a particular port type expects. Constraints fall under the “Allowed” and “Not Allowed” categories. “Allowed” constraints are those where either a source can send data to a given set of destinations, or a destination can accept data from a given set of sources. “Not Allowed” constraints are those where either a source cannot send data to a given set of destinations or a destination cannot receive data from a given set of sources. The source and destination can also be expressed in terms of domain name sets, e.g. *.co.jp for all companies located in Japan.
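By way of illustration, such a 3-tuple constraint might be represented and evaluated as in the following Java sketch; the rule values and the wildcard-matching convention shown are assumptions, not normative definitions.

    // Sketch of how a <source, destination, MessageType> constraint might be
    // represented and evaluated, including domain-set patterns such as "*.co.jp".
    public class DataFlowConstraint {
        final String source, destination, messageType;
        final boolean allowed; // true for "Allowed", false for "Not Allowed"

        DataFlowConstraint(String src, String dst, String msg, boolean allowed) {
            this.source = src; this.destination = dst; this.messageType = msg; this.allowed = allowed;
        }

        // A pattern matches either exactly or as a "*." suffix wildcard.
        private static boolean matches(String pattern, String domain) {
            if (pattern.equals("*") || pattern.equals(domain)) return true;
            return pattern.startsWith("*.") && domain.endsWith(pattern.substring(1));
        }

        boolean appliesTo(String src, String dst, String msg) {
            return matches(source, src) && matches(destination, dst) && messageType.equals(msg);
        }

        public static void main(String[] args) {
            // Illustrative rule only: TheaterListMsg may not be sent to any .co.jp domain.
            DataFlowConstraint rule =
                new DataFlowConstraint("yellowpages.example.com", "*.co.jp", "TheaterListMsg", false);
            System.out.println(rule.appliesTo("yellowpages.example.com", "theater.co.jp", "TheaterListMsg"));
        }
    }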
In the movie theater example described above, the data flow constraints for the Yellow Pages Service Provider 42 can be expressed as follows:
The “Allowed” and “NotAllowed” constraints can appear in any relative order in the Rules schema, with the condition that more specific constraints appear first, followed by the less specific ones.
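A minimal Java sketch of this “more specific first” evaluation order follows; judging specificity by counting wildcards, and defaulting to deny when no rule matches, are assumptions made purely for illustration.

    import java.util.*;

    // Sketch of first-match rule evaluation where more specific rules (fewer
    // wildcards) are consulted before less specific ones.
    public class RuleOrdering {
        record Rule(String source, String destination, String messageType, boolean allowed) {
            int wildcardCount() {
                return (source.contains("*") ? 1 : 0) + (destination.contains("*") ? 1 : 0);
            }
        }

        static boolean isAllowed(List<Rule> rules, String src, String dst, String msg) {
            List<Rule> ordered = new ArrayList<>(rules);
            ordered.sort(Comparator.comparingInt(Rule::wildcardCount)); // specific rules first
            for (Rule r : ordered) {
                boolean srcOk = r.source().equals("*") || r.source().equals(src)
                        || (r.source().startsWith("*.") && src.endsWith(r.source().substring(1)));
                boolean dstOk = r.destination().equals("*") || r.destination().equals(dst)
                        || (r.destination().startsWith("*.") && dst.endsWith(r.destination().substring(1)));
                if (srcOk && dstOk && r.messageType().equals(msg)) return r.allowed(); // first match wins
            }
            return false; // default-deny when no rule matches (an assumption)
        }
    }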
Decentralizer: The decentralizer 152 partitions the composite web service specification using program analysis techniques, taking data flow constraints and deployment-related constraints into account. The partitions are composite web service specifications themselves that execute at distributed locations and can be invoked remotely. The decentralizer 152 also generates the WSDL descriptors for each of these partitions. The WSDL descriptors permit the partitions to be deployed and invoked in the same way as any standard web service.
An algorithm to create decentralized topologies from a given composite BPEL4WS specification will now be described with reference to the state diagram 170 of
A Program Dependence Graph (PDG) based code partitioning algorithm designed for multiprocessor execution can be used to implement the state diagram 170. Such an algorithm creates independently schedulable tasks at the granularity of partitions of a PDG. To reduce overhead, such algorithms try to merge several PDG nodes to create a larger partition, possibly sacrificing parallelism.
A Threaded Control Flow Graph (TCFG) representation 172 of the composite web service is created. The data dependencies (not shown) are added to this TCFG 172 to obtain a first PDG 176. For BPEL4WS flows, special handling of flow and sequence activities is performed. From the control dependence point of view, all the activities inside one leg of a flow activity are dependent on the flow activity, which is in turn dependent on its container activity (e.g., a flow or sequence). The legs of the flow activity have no control dependence among them except for explicit link constructs. Further, the flow activity does not have any data dependence, and the removal of this flow activity from the PDG 176 makes no difference to the composition of web services. Similarly, the purpose of the sequence activity is to provide a container for other activities; it does not have any real data or control dependency, and therefore all the sequence activities can also be removed from the PDG without any loss of information. This PDG 176 is further modified to ensure that all the data dependence edges (other than loop-carried dependencies and across-TCFG-level edges) run from left to right. These modifications are done by reordering the activities. In this state, the PDG may have data dependence edges across various hierarchical levels of the PDG. These across-TCFG-level edges are now broken into at most three different data dependence edges as follows:
A PDG-based code partitioning algorithm 178 first breaks the PDG 176 into independently executable program sections, which in this case are individual BPEL4WS activities, and then tries to merge them to create a manageable number of partitions. Consequently, the problem of code partitioning in this case is actually one of merging individual activities together to create partitions which are semantically similar to the input BPEL4WS specification 174.
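As a rough Java illustration of this merging step (described in more detail below), the following sketch merges sibling activities bottom-up when they share a control dependence condition; the tree representation is a simplified assumption and the data dependence check is noted but omitted.

    import java.util.*;

    // Sketch of the bottom-up merging pass: siblings in the program dependence
    // tree that share the same control dependence condition are merged into one
    // partition, and the pass is repeated level by level up to the root.
    public class SiblingMerger {
        static class Node {
            final String controlCondition;     // control dependence condition of this activity
            final List<Node> children = new ArrayList<>();
            final List<String> activities = new ArrayList<>();
            Node(String cond, String activity) { controlCondition = cond; activities.add(activity); }
        }

        // Merge children of 'parent' that share a control condition, then work upward.
        static void mergeBottomUp(Node parent) {
            for (Node child : parent.children) mergeBottomUp(child); // start at the bottom
            Map<String, Node> byCondition = new LinkedHashMap<>();
            List<Node> merged = new ArrayList<>();
            for (Node child : parent.children) {
                Node existing = byCondition.get(child.controlCondition);
                // NOTE: a real implementation must also check that reordering the two
                // siblings does not violate any data dependence (omitted in this sketch).
                if (existing == null) {
                    byCondition.put(child.controlCondition, child);
                    merged.add(child);
                } else {
                    existing.activities.addAll(child.activities);
                    existing.children.addAll(child.children);
                }
            }
            parent.children.clear();
            parent.children.addAll(merged);
        }
    }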
For this purpose, starting at the bottom of the program dependence tree, sibling nodes that have the same control dependence condition are identified and merged. Two sibling nodes in the PDG 176 that have the same control dependence relationship between them can be merged if reversing the flow order of these two nodes does not violate any other dependency. Once all the nodes at one level are merged, the algorithm is applied recursively to the higher levels of the tree until the root node is reached. The result is the partitioned output BPEL4WS specification 182. An informal description of the algorithm is as follows:
Topology Selector: The topology selector 150 ranks the topologies generated by the decentralizer 152 according to given criteria such as “minimum response time”, “maximum throughput” or “minimum data transfer”. The best topology as ranked by the topology selector 150 is chosen for deployment. The topology selector 150 receives all the topologies generated by the decentralizer 152 as its input. It also takes one or more appropriate criteria to rank the topologies.
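The ranking performed by the topology selector 150 may be sketched in Java as follows; the Topology type and the cost metrics are hypothetical examples of such criteria.

    import java.util.*;
    import java.util.function.ToDoubleFunction;

    // Sketch of topology ranking: the selector scores each candidate topology with
    // a supplied criterion (e.g. estimated response time or number of hops) and
    // picks the best one.
    public class TopologySelector {
        record Topology(String name, double estimatedResponseTimeMs, int hops) {}

        static Topology selectBest(List<Topology> candidates, ToDoubleFunction<Topology> cost) {
            return candidates.stream()
                    .min(Comparator.comparingDouble(cost))
                    .orElseThrow(() -> new IllegalArgumentException("no topologies generated"));
        }

        public static void main(String[] args) {
            List<Topology> topologies = List.of(
                new Topology("T1", 180.0, 3),
                new Topology("T2", 140.0, 4));
            // Rank by minimum estimated response time; a "minimum hops" criterion
            // would simply pass Topology::hops instead.
            System.out.println(selectBest(topologies, Topology::estimatedResponseTimeMs).name());
        }
    }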