This application is related to commonly assigned U.S. application Ser. No. 11/872,385, filed Oct. 15, 2007, commonly assigned U.S. application Ser. No. 11/970,262, filed Jan. 7, 2008, commonly assigned U.S. application Ser. No. 11/971,056, filed Jan. 8, 2008, commonly assigned U.S. application Ser. No. 11/971,068, filed Jan. 8, 2008, commonly assigned U.S. application Ser. No. 12/055,606, filed Mar. 26, 2008 and commonly assigned U.S. application Ser. No. 12/106,757, filed Apr. 21, 2008, the disclosures of which are all incorporated by reference herein in their entirety.
1. Technical Field
The present invention relates to the assembly of parametric information processing applications.
2. Discussion of the Related Art
Configurable applications for automating processing of syndication feeds (i.e., Atom and RSS) are gaining increasing interest and attention on the Web. There are over 30,000 customized feed processing flows (referred to as “pipes”) published on Yahoo Pipes, the most popular service of this kind. Yahoo Pipes offers hosted feed processing and provides a rich set of user-configurable processing modules, which extend beyond the typical syndication tools and include advanced text analytics such as language translation and keyword extraction. The Yahoo Pipes service also comes with a visual editor for flows of services and feeds. In an example of a flow of feeds and services shown in
Automatic service discovery and composition is one of the promises of Service Oriented Architecture (SOA) that is hard to achieve in practice. Currently, composition is done with graphical tools by manually selecting services and establishing their interactions. Business Process Execution Language for Web Services (BPEL4WS) has been developed to describe composite services. However, composing services in this way is tedious and requires extensive knowledge of the services being composed. Automatic composition methods aim to provide a solution to this.
Automatic composition work has focused on composition using simple compatibility constraints, as well as on semantic descriptions of services, such as OWL-S, a service ontology based on the Web Ontology Language (OWL). A drawback of these approaches is that they do not provide an easy way of interacting with a composer/user. For example, even if the user is goal-oriented and does not require knowledge of services, the user must be familiar with the ontology that was used to describe the services. Furthermore, it is difficult for novice users to create goal specifications, since that requires studying the ontology to learn the terms the system uses. Also, the ontology does not automatically provide a method for verifying the requests. Hence, users do not have any guidance from the system that could help in specifying requests. This turns service composition into a tedious trial-and-error process.
Similarly to how programs can be composed of operators and functions, composite services describe service invocations and other low-level constructs. Composite services are processing graphs composed of smaller service components. A service component can be an invocation of an existing service, an external data input (e.g., a user-specified parameter or data source), a data processing operator (e.g., an arithmetic operator), or another (smaller) composite service specified as a processing graph of service components.
While many execution environments include tools that assist users in defining composite services, these tools typically require a detailed definition of the processing flow, including all service components and communication between the components. One example of this type of tool is IBM WebSphere Studio. An example of an execution environment is a stream processing environment, such as Stream Processing Core (SPC), described in N. Jain, L. Amini, H. Andrade, R. King, Y. Park, P. Selo and C. Venkatramani, “Design, Implementation, and Evaluation of the Linear Road Benchmark on the Stream Processing Core”, Proceedings of ACM SIGMOD 2006.
In contrast, methods such as planning can be used to automatically compose new composite services based on a high-level input provided by the user, since automatic composition methods require less knowledge about the service components and in general only require the user to specify the composition goal in application domain terms.
For purposes of automatic composition, in many scenarios the service components can be described in terms of their data effects and preconditions. In particular, we assume that a description (such as Web Services Description Language (WSDL) or Java object code with optional metadata annotations) of each service component specifies the input requirements of the service component (such as data type, semantics, access control labels, etc.). We refer to these input requirements as preconditions of service invocation, or simply preconditions. The description also specifies the effects of the service, describing the outputs of the service, including information such as data type, semantics, etc. In general, a component description may describe outputs as a function of inputs, so that the description of the output can only be fully determined once the specific inputs of the component have been determined. Note that in practical implementations the invocations can be synchronous, such as subroutine calls or Remote Procedure Calls (RPCs), or asynchronous, such as asynchronous procedure calls, message exchange, or message flow. In stream processing applications, the communication between components requires sending data streams from one component to another in the deployed processing graph.
Under these assumptions, an automated planner can then be used to automatically assemble processing graphs based on a user-provided description of the desired output of the application. The descriptions of the components are provided to the planner in the form of a domain description. The planner can also take into account the specification of available primal inputs to the workflow, if not all inputs are available for a particular planning request.
The planner composes a workflow by connecting components, starting from the primal inputs. It evaluates possible combinations of components by computing descriptions of component outputs and comparing them to the preconditions of the components connected to those outputs. More than one component input can be connected to one component output or one primal input. Logically, this amounts to sending multiple copies of data produced by the component output, with one copy sent to each of the inputs. In practical implementations these do not have to be copies, and data can be passed by reference instead of by value. The process terminates when an output of a component (or a set of outputs taken together) satisfies the conditions specified in the user goal requirement. Note that all conditions are evaluated at plan time, before any applications are deployed or executed.
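The forward-chaining search described above can be sketched in Java, the language of the runtime described later in this document. This is a minimal sketch under simplifying assumptions, not the planner referenced here: the hypothetical `Component` class matches inputs and outputs by type name only (type-based composition), each component has a single output, and no ranking is applied. A component is connected as soon as all its input types are available; the result is then traced back from the goal so that components whose outputs are never used are dropped.

```java
import java.util.*;

/** Hypothetical component description: input types required, one output type produced. */
class Component {
    final String name;
    final List<String> inputs;
    final String output;

    Component(String name, List<String> inputs, String output) {
        this.name = name;
        this.inputs = inputs;
        this.output = output;
    }
}

/** Type-based forward chaining from primal inputs toward a goal type (sketch only). */
class ForwardComposer {
    /** Returns component names in a valid execution order; empty if the goal is unreachable or already primal. */
    static List<String> compose(Set<String> primalInputs, List<Component> library, String goal) {
        Map<String, Component> producer = new HashMap<>(); // output type -> component that first produced it
        Set<String> available = new HashSet<>(primalInputs);
        boolean progress = true;
        while (!available.contains(goal) && progress) {
            progress = false;
            for (Component c : library) {
                // Connect a component once all of its preconditions (input types) can be satisfied.
                if (!available.contains(c.output) && available.containsAll(c.inputs)) {
                    available.add(c.output);
                    producer.put(c.output, c);
                    progress = true;
                }
            }
        }
        if (!available.contains(goal)) {
            return Collections.emptyList();
        }
        List<String> order = new ArrayList<>();
        collect(goal, producer, new HashSet<>(), order);
        return order;
    }

    /** Post-order walk back from a type: producers are emitted before their consumers. */
    private static void collect(String type, Map<String, Component> producer,
                                Set<String> visited, List<String> order) {
        Component c = producer.get(type);
        if (c == null || !visited.add(c.name)) {
            return; // primal input, or component already emitted
        }
        for (String in : c.inputs) {
            collect(in, producer, visited, order);
        }
        order.add(c.name);
    }
}
```

With a primal input `Zip` and a goal `FrenchForecast`, a library containing weather, translation, and news components yields only the weather and translation components; the news component is reachable but unused, so it is not part of the result.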
If multiple alternative compositional applications can be constructed and shown to satisfy the same request, the planner may use heuristics and utility functions to rank the alternatives and select the highest ranked plans.
The application, once composed, is deployed in an execution environment and can be executed one or more times.
Examples of a planner and an execution environment are described in Zhen Liu, Anand Ranganathan and Anton Riabov, “A Planning Approach for Message-Oriented Semantic Web Service Composition”, in AAAI-2007, and in commonly assigned U.S. application Ser. Nos. 11/872,385 and 11/970,262.
Similar work has been done in the contexts of stream processing, web services and grid computing.
A difficulty in planner-based composition involves providing assistance to users when specifying requests. Here, the system must provide its own capabilities information to the user to indicate which requests can be processed, and which changes are allowed to a last submitted request.
These changes can be specified by the user, when the user chooses one modification of the previous request from a set of possible modifications proposed by the system after analyzing the request, for example as described in commonly assigned U.S. application Ser. Nos. 11/872,385 and 11/970,262.
This approach, however, limits the requests proposed by the system to the set of requests that can be specified by choosing from a finite set of discrete options, and requires a hierarchy of options that helps structure the choices so that the set of options reviewed by the user at each step can be understood by the user. It makes it difficult for the user to specify the parameters of the request that are continuous in nature, even if those values are internally represented by discrete values, such as ‘float’ or ‘real’ data type. It also makes it difficult to specify parameters that are chosen from very large non-hierarchical lists of discrete options, for example choosing a state from a list of 50 states.
In an exemplary embodiment of the present invention, a method for assembling parametric information processing applications, comprises: receiving a composition request; composing a processing graph for the request, wherein the processing graph represents an application that includes at least one component; identifying a deployment parameter of the component and requesting a value of the parameter; receiving the parameter value; applying the parameter value to the application; and deploying the application with the parameter value in an execution environment.
The deployment parameter is a configuration parameter of the component.
The method further comprises storing the received parameter value as a default parameter value. The method further comprises presenting the default parameter value to a user when requesting a value of the parameter.
The parameter value request prompts a user to manually enter the value into a field on a user interface.
The method further comprises: receiving a request to execute the deployed application; identifying an execution parameter for the deployed application and requesting a value of the execution parameter; receiving the value of the execution parameter; invoking the deployed application according to the received execution parameter value; and returning a response provided by the invoked application.
The method further comprises: receiving a request to reconfigure the deployed application; identifying a reconfiguration parameter for the deployed application and requesting a value of the reconfiguration parameter; receiving a reconfiguration parameter value; and reconfiguring the deployed processing graph with the reconfiguration parameter value.
The reconfiguration parameter is a configuration parameter of a component in the deployed application.
The deployment parameter is a parameter operator instance in the processing graph. At least one prompt for each parameter operator instance is presented to a user.
If two identified deployment parameters are the same, only one parameter value needs to be entered.
In an exemplary embodiment of the present invention, a method for assembling parametric information processing applications, comprises: receiving a composition request; composing a processing graph for the request, wherein the processing graph represents an application that includes at least one component; deploying the application in an execution environment; identifying an execution parameter of the component in the deployed processing graph and requesting a value of the parameter; receiving the parameter value; invoking the deployed application according to the execution parameter value; and returning a response provided by the invoked application.
The method further comprises storing the received parameter value as a default parameter value. The method further comprises presenting the default parameter value to a user when requesting a value of the parameter.
The parameter value request prompts a user to manually enter the value in text into a field on a user interface.
The method further comprises: receiving a request to reconfigure the deployed application; identifying a reconfiguration parameter for the deployed application and requesting a value of the reconfiguration parameter; receiving a reconfiguration parameter value; and reconfiguring the deployed processing graph with the reconfiguration parameter value.
The reconfiguration parameter is a configuration parameter of a deployed component.
The method further comprises: deploying the reconfigured processing graph in the execution environment; invoking the deployed reconfigured processing graph according to the execution parameter value; and returning a response provided by the invoked processing graph.
The method further comprises: identifying a deployment parameter of the component and requesting a value of the parameter; receiving the parameter value; applying the parameter value to the application; and wherein the application is deployed with the parameter value.
In an exemplary embodiment of the present invention, a method for applying parameters at different stages in the lifecycle of a composed application represented by a processing graph, comprises: composing an application in response to a user-specified goal, wherein the application is represented by a processing graph; deploying the processing graph in an execution environment in a deployment stage; and invoking the deployed processing graph in the execution environment in an execution stage, wherein parameter values are applied to the processing graph in the deployment stage if there are any parameters in the processing graph that require user-input prior to deployment, or wherein parameter values are applied to the processing graph in the execution stage if there are any execution parameter requests.
The foregoing features are of representative embodiments and are presented to assist in understanding the invention. It should be understood that they are not intended to be considered limitations on the invention as defined by the claims, or limitations on equivalents to the claims. Therefore, this summary of features should not be considered dispositive in determining equivalents. Additional features of the invention will become apparent in the following description, from the drawings and from the claims.
The present invention provides a user interface to a processing graph composer that allows users to express composition goals and automatically assemble and configure a corresponding processing graph. In one embodiment, similarly to the embodiments described in commonly assigned U.S. application Ser. Nos. 11/872,385 and 11/970,262, the system uses tag clouds to guide the user in this process, updates the tag cloud based on selected tags, and provides instantaneous feedback as new tags are selected by showing a new tag cloud, a preview of the composed processing graph, and optionally, a preview of the results.
To address the drawbacks of existing systems described earlier, our invention allows the composition of parametric processing graphs, and provides an extended user interface that is dynamically reconfigured in accordance with the automatically composed processing graph in order to notify the user about relevant parameters of the composed graph, and request values of these parameters from the user.
Some key aspects of the invention are:
It is also noted that the invention proposes a method for simplifying service composition and making it accessible to end users, who are familiar with the application domain, but are not necessarily familiar with the set of services that can be composed. The invention enables an intuitively understandable user interface that composes the service based on a minimum required specification, and provides assistance when creating such specification.
The invention further proposes a radically simplified tag-based component description approach to reduce the knowledge engineering work required upfront in order to start using an automatic composer. In addition, the invention proposes a new formalism of planning using tag taxonomies and actions that create new objects.
In the current invention, we have significantly extended the Stream Processing Planning Language (SPPL) planning model to add support for tag taxonomies and tag-based operator instances. Finally, it is noted that SPPL planners are well suited to practical implementations, since they can be extended to compute tag clouds and have been shown to be highly scalable with the number of operators.
An automatic composer (i.e., the illustrated simplified composition system), which is described in detail in commonly assigned U.S. application Ser. No. 11/872,385, can compose services using a type-based composition, where modules are chained so that the type of the output of a module matches the type of the input of the following module until the output of the last module in the application matches the user's goals. Other methods involve richer and more flexible semantic reasoning in order to match inputs and outputs of the modules, and the data sources. Data sources and modules are described individually when they are registered into our system. The description includes the type or a semantic description of the data sources, and of each module's inputs and outputs. For the purpose of illustration we refer to this description as SPPL, which is an exemplary language for use with our invention. Note, however, that our invention applies to other languages used for describing components that can be assembled into larger applications (e.g., WSDL, etc.).
Data sources and modules can include user inputs, which must be specified by the user at runtime whenever an automatically composed service that contains those components is invoked. In order to allow such interaction we use the concept of user-input modules, each of which requires one user input when it is included in a service. A service can include several of those, in which case the user is prompted to enter a separate parameter for each one of them. Such modules can then be used like any other modules to feed downstream modules in the applications. As mentioned earlier, such matching of user-input modules to other modules can be type-based or semantic-based. If it is type-based, it might be necessary to use type (or semantic) specialization in order to distinguish between different uses of the same concept. For instance, a ZIP code in a carrier application (USPS, UPS, FedEx, DHL) is used to mean both the origin ZIP code and the destination ZIP code. In that case, the two uses should be distinguished by two different types, FromZIP and ToZIP, both derived from a ZIP code type, which can itself be used in other applications that need a single ZIP code (e.g., local weather, local news, etc.). In our invention, the different parameters (e.g., FromZIP, ToZIP, ZIP) are represented by different modules. The description of the modules includes the types of parameters they provide. The description can also include a default value, which is used for previewing and can be changed by the user, with the possibility to recall the last user-entered value so that it becomes the new default for the user. If the invention is implemented as a service-client application, the last entered value can be stored on the server side in a user-profile database. Alternatively, it can be remembered for the duration of the session, for instance using cookies in a web-based application.
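A minimal sketch, assuming hypothetical `ParamType` and `UserInputModule` classes, of how type specialization and remembered defaults might be represented. Because FromZIP and ToZIP derive from ZIP, a FromZIP parameter can feed a module that needs any ZIP code, while a generic ZIP parameter cannot satisfy an input that specifically requires FromZIP. The last entered value is kept in an in-memory map here, standing in for the user-profile database or session cookie mentioned above.

```java
import java.util.*;

/** Hypothetical parameter type hierarchy, e.g. FromZIP and ToZIP specializing ZIP. */
class ParamType {
    final String name;
    final ParamType parent; // null for a root type

    ParamType(String name, ParamType parent) {
        this.name = name;
        this.parent = parent;
    }

    /** True if this type is the given type or is derived from it. */
    boolean isA(ParamType other) {
        for (ParamType t = this; t != null; t = t.parent) {
            if (t == other) return true;
        }
        return false;
    }
}

/** Hypothetical user-input module: provides exactly one parameter of a given type. */
class UserInputModule {
    final String prompt;
    final ParamType provides;
    private final String defaultValue;
    // Assumption: last-entered values kept per user in memory; a deployed system might
    // use a server-side user-profile database or a session cookie instead.
    private final Map<String, String> lastEntered = new HashMap<>();

    UserInputModule(String prompt, ParamType provides, String defaultValue) {
        this.prompt = prompt;
        this.provides = provides;
        this.defaultValue = defaultValue;
    }

    /** Value pre-filled for the user: their last entry if any, otherwise the module default. */
    String defaultFor(String user) {
        return lastEntered.getOrDefault(user, defaultValue);
    }

    /** Records an entered value so that it becomes the user's new default. */
    void remember(String user, String value) {
        lastEntered.put(user, value);
    }
}
```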
When a service is composed that includes user-input modules, the corresponding processing graph is annotated with meta-data that identify the presence of such user-input, with descriptions such as user-prompts, default values, and value range. The interface is then automatically reconfigured to include fields allowing the user to enter values for the parameters, as specified in the annotation. This is illustrated in
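The automatic reconfiguration of the interface can be sketched as a function from graph annotations to input fields. The `ParameterAnnotation` class and the textual field format below are hypothetical; a web-based implementation would render form controls instead. The optional value range is also used here to validate entered values.

```java
import java.util.*;

/** Hypothetical metadata attached to a composed processing graph for one user-input parameter. */
class ParameterAnnotation {
    final String name, prompt, defaultValue;
    final Double min, max; // optional numeric value range; null means unbounded

    ParameterAnnotation(String name, String prompt, String defaultValue, Double min, Double max) {
        this.name = name;
        this.prompt = prompt;
        this.defaultValue = defaultValue;
        this.min = min;
        this.max = max;
    }
}

class ParameterForm {
    /** Builds one field description per annotation: prompt, optional range, pre-filled default. */
    static List<String> fields(List<ParameterAnnotation> annotations) {
        List<String> out = new ArrayList<>();
        for (ParameterAnnotation a : annotations) {
            String range = (a.min == null && a.max == null)
                    ? ""
                    : " [" + bound(a.min, "-inf") + ", " + bound(a.max, "inf") + "]";
            out.add(a.prompt + range + ": " + a.defaultValue);
        }
        return out;
    }

    private static String bound(Double d, String unbounded) {
        return d == null ? unbounded : String.valueOf(d);
    }

    /** Checks an entered value against the annotated range, if one is given. */
    static boolean accepts(ParameterAnnotation a, String value) {
        if (a.min == null && a.max == null) return true;
        try {
            double v = Double.parseDouble(value);
            return (a.min == null || v >= a.min) && (a.max == null || v <= a.max);
        } catch (NumberFormatException e) {
            return false; // non-numeric input cannot satisfy a numeric range
        }
    }
}
```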
For ease of reference,
The parameters mentioned above are configuration variables that can be applied to processing graphs at different stages in their lifecycle. Details of applying the parameters, and which kinds of parameters are available in a specific implementation of the system, depend on the services provided by the execution environment. We distinguish between three different kinds of parameters that can be associated with a processing graph:
The same parameter can belong to one or more categories. Deployment parameters are used to parameterize the processing graph before deployment, and are available in all execution environments, since the substitution of values for deployment parameters can take place before the processing graph is deployed in the execution environment. Examples of deployment parameters are configuration parameters of the processing elements in the Stream Processing Core, or configuration parameters of modules in Yahoo Pipes. Execution parameters can be provided when the deployed graph is executed. For example, composite web services can be invoked with different input parameters. Finally, reconfiguration parameters can be used in some execution environments to reconfigure components of already deployed processing graphs. For example, in the Stream Processing Core, the processing graph begins continuous processing immediately after it is deployed, and a separate execution stage is not required. However, in certain applications individual processing elements can be reconfigured after they are deployed by sending messages to the processing elements containing the name and the new value for a reconfiguration parameter.
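One way to model the three parameter kinds is sketched below, assuming hypothetical `ParamKind`, `ParamDecl`, and `Lifecycle` types. Because a parameter can belong to several kinds, each declaration carries an `EnumSet`. Reconfiguration is modeled as replacing one named value in a deployed element's configuration, mirroring the name-and-value message described for the Stream Processing Core; the actual message format is not specified in this document.

```java
import java.util.*;

/** The three kinds of parameters; a single parameter may belong to several. */
enum ParamKind { DEPLOYMENT, EXECUTION, RECONFIGURATION }

/** Hypothetical declaration of one component parameter and the stages at which it applies. */
class ParamDecl {
    final String component, name;
    final EnumSet<ParamKind> kinds;

    ParamDecl(String component, String name, EnumSet<ParamKind> kinds) {
        this.component = component;
        this.name = name;
        this.kinds = kinds;
    }
}

class Lifecycle {
    /** Parameters whose values must be requested from the user at the given stage. */
    static List<String> requestedAt(ParamKind stage, List<ParamDecl> params) {
        List<String> out = new ArrayList<>();
        for (ParamDecl p : params) {
            if (p.kinds.contains(stage)) out.add(p.component + "." + p.name);
        }
        return out;
    }

    /**
     * Applies a (name, new value) reconfiguration message to a deployed element's
     * configuration, rejecting parameters not declared as reconfigurable.
     */
    static Map<String, String> reconfigure(Map<String, String> deployedConfig, ParamDecl p, String value) {
        if (!p.kinds.contains(ParamKind.RECONFIGURATION)) {
            throw new IllegalArgumentException(p.name + " cannot be changed after deployment");
        }
        Map<String, String> updated = new HashMap<>(deployedConfig);
        updated.put(p.name, value);
        return updated;
    }
}
```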
The application of the three kinds of parameters is shown in
As shown in
Two exemplary methods for identifying parameters associated with a processing graph will now be discussed.
There are several levels of functionality provided by this invention. The simplest implementation consists of a composer of processing graphs, as described in commonly assigned U.S. application Ser. Nos. 11/872,385 and 11/970,262, and a post-processing module, which receives the processing graph generated by the composer. Using information about the parameterization of individual components supplied in component descriptions, the system then requests from the user parameter values for each component that has a parameter or a set of parameters. In this case, the set of configuration parameters presented to the user is the set of all parameters of all component instances included in the processing graph. For example, once the processing graph shown in
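The simplest post-processing step can be sketched as follows, assuming each component instance in the composed graph exposes its parameters and default values as a map (a hypothetical `ComponentInstance` class). Every parameter of every instance yields its own prompt, so two carrier components that both need an origin ZIP code produce two separate prompts for what is really the same value.

```java
import java.util.*;

/** Hypothetical component instance with parameters taken from its component description. */
class ComponentInstance {
    final String id;
    final Map<String, String> paramDefaults; // parameter name -> default value

    ComponentInstance(String id, Map<String, String> paramDefaults) {
        this.id = id;
        this.paramDefaults = paramDefaults;
    }
}

class ParameterCollector {
    /** One prompt (qualified parameter name -> default) per parameter of every instance in the graph. */
    static LinkedHashMap<String, String> prompts(List<ComponentInstance> graph) {
        LinkedHashMap<String, String> prompts = new LinkedHashMap<>();
        for (ComponentInstance c : graph) {
            for (Map.Entry<String, String> e : c.paramDefaults.entrySet()) {
                prompts.put(c.id + "." + e.getKey(), e.getValue());
            }
        }
        return prompts;
    }
}
```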
The simple approach described above can be improved by using the composer to detect cases where the same parameter value can be assigned to multiple parameters. In the example shown in
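A sketch of this improvement, assuming the composer records which parameter operator instance (for example, a single FromZIP user-input module) feeds each component parameter. The binding map and identifiers such as `FromZIP#1` are hypothetical. Grouping by instance yields one prompt per distinct value, and each entered value is then copied to every component parameter bound to that instance.

```java
import java.util.*;

class SharedParameters {
    /**
     * bindings: component parameter (e.g. "ups.fromZip") -> id of the parameter operator
     * instance that feeds it. Returns one prompt per distinct instance, listing the
     * component parameters it will populate.
     */
    static Map<String, List<String>> prompts(Map<String, String> bindings) {
        Map<String, List<String>> byInstance = new TreeMap<>();
        for (Map.Entry<String, String> b : bindings.entrySet()) {
            byInstance.computeIfAbsent(b.getValue(), k -> new ArrayList<>()).add(b.getKey());
        }
        return byInstance;
    }

    /** Copies each entered value to every component parameter bound to that instance. */
    static Map<String, String> apply(Map<String, List<String>> prompts, Map<String, String> entered) {
        Map<String, String> config = new TreeMap<>();
        for (Map.Entry<String, List<String>> p : prompts.entrySet()) {
            for (String target : p.getValue()) {
                config.put(target, entered.get(p.getKey()));
            }
        }
        return config;
    }
}
```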
A detailed description of an abstract formalism for use with the present invention is now provided.
Abstract Model
First, we will give the formal definition of the compositional semantics of a flow. We address this by defining a model for deriving the semantic description of a flow based on the descriptions of its individual components. A key characteristic of our model is that it captures not only the semantics of inputs and outputs, but also the functional dependency between the outputs and the inputs. This model can also be expressed using the SPPL formalism for describing planning tasks, introduced in A. Riabov and Z. Liu, “Planning for Stream Processing Systems,” in AAAI'05, July 2005 (a copy of which is incorporated by reference herein in its entirety), which allows us to use an efficient planning algorithm for flow composition.
Composition Elements
Objects, Tags and Taxonomies
A taxonomy T={t} is a set of tags (i.e., keywords) t. An object o is described by a set of tags d(o)⊂T selected from the taxonomy T. An object can be, for example, a resource bookmark, as in del.icio.us, or a feed, as in Syndic8.com.
In the simplest case, for example if T is formed as a folksonomy, by people specifying one or more tags to describe certain objects, the tags in T are unrelated and T is completely unstructured. Introducing a taxonomy structure in T, however, enhances query expressivity, as we explain below, and helps keep tag-based descriptions succinct. The structure of the taxonomy is described by specifying a sub-tag relationship between tags. The following definition is the standard definition of a taxonomy sub-tag relation applied to tagging.
Definition 1. A tag t1 ∈ T is a sub-tag of t2 ∈ T, denoted t1::t2, if all objects described by t1 can also be described by t2. The sub-tag relation is transitive, i.e., t1::t2 and t2::t3 imply t1::t3 for ∀ t1, t2, t3 ∈ T.
For example, NewYorkTimes::Newspaper. For notational convenience we will further assume that each tag is a sub-tag of itself, i.e., ∀ t ∈ T, t::t.
If two tags t1, t2 ∈ T are such that t1::t2 and t2::t1, these tags are synonyms, since by definition they describe the same set of objects. We will denote this as t1≡t2.
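The sub-tag relation can be computed from a small set of declared edges, as in this hypothetical `Taxonomy` sketch: `isSubTag` evaluates the reflexive, transitive closure by depth-first search, and synonyms are detected as mutual sub-tags. The `seen` set keeps the search finite even when synonyms introduce cycles among the declared edges.

```java
import java.util.*;

/** Sketch of a tag taxonomy that stores only declared (direct) sub-tag edges. */
class Taxonomy {
    private final Map<String, Set<String>> parents = new HashMap<>();

    /** Declares subTag::superTag. */
    void declare(String subTag, String superTag) {
        parents.computeIfAbsent(subTag, k -> new HashSet<>()).add(superTag);
    }

    /** t1::t2 under the reflexive, transitive closure of the declared edges. */
    boolean isSubTag(String t1, String t2) {
        Deque<String> stack = new ArrayDeque<>(Collections.singleton(t1));
        Set<String> seen = new HashSet<>();
        while (!stack.isEmpty()) {
            String t = stack.pop();
            if (t.equals(t2)) return true; // also covers the reflexive case t1 == t2
            if (seen.add(t)) stack.addAll(parents.getOrDefault(t, Collections.emptySet()));
        }
        return false;
    }

    /** t1 ≡ t2: each tag is a sub-tag of the other. */
    boolean synonyms(String t1, String t2) {
        return isSubTag(t1, t2) && isSubTag(t2, t1);
    }
}
```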
Queries
Queries are used to describe the desired results produced by a composition (i.e., composition goals), or to specify the input conditions of an operator.
Definition 2. A tag query q⊂T selects a subset $Q_q(O)$ of an object set O={o} such that each object in the selected subset is described by all tags in q, taking into account sub-tag relationships between tags. Formally, $Q_q(O)=\{o \in O \mid \forall t \in q\ \exists t' \in d(o) \text{ such that } t'::t\}$.
Note that this definition of a query remains equally effective in configurations with implicit taxonomies, where the sub-tag relationships are not stated explicitly, as well as in cases where taxonomies have explicitly stated sub-tag relationships.
For example, consider a set of objects O1 and a taxonomy T1 where NewYorkTimes::Newspaper, and some objects in O1 are annotated with NewYorkTimes. Assume that O2 is created from O1 by explicitly annotating every object in the set {o ∈ O1|{NewYorkTimes}⊂d(o)} with Newspaper tag, and taxonomy T2 is the same as T1 but with the sub-tag relationship between Newspaper and NewYorkTimes removed (thus defining an implicit taxonomy). As a result, for q={Newspaper} the selected subset will be the same in both sets of objects.
This is an important property of the proposed approach. It allows mixing implicit taxonomies, typical of folksonomy-like bottom-up modeling approaches, with much more structured and elaborate top-down modeling, which is typical of taxonomies and ontologies. By effectively enabling an easy gradual transition from implicitly defined to explicitly stated sub-tag relationships between tags, as the model evolves, it greatly reduces the effort required for creating a first working set of descriptions compared to the top-down ontology-based modeling approaches, where the significant cost of defining taxonomies must be paid upfront.
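Definition 2 and the example with O1/T1 and O2/T2 can be checked with a short sketch. Here a taxonomy is a hypothetical map from each tag to its declared direct super-tags, and objects are identified by string keys. Each object's description is expanded with all its super-tags before testing whether it contains every query tag, which is equivalent to the condition ∃ t′ ∈ d(o) with t′::t.

```java
import java.util.*;

class TagQuery {
    /** All super-tags of tag (including tag itself, since t::t); parents holds direct edges only. */
    static Set<String> superTags(String tag, Map<String, Set<String>> parents) {
        Set<String> result = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>(Collections.singleton(tag));
        while (!stack.isEmpty()) {
            String t = stack.pop();
            if (result.add(t)) stack.addAll(parents.getOrDefault(t, Collections.emptySet()));
        }
        return result;
    }

    /** Q_q(O): ids of objects whose description has, for every t in q, some t' with t'::t. */
    static Set<String> select(Set<String> q, Map<String, Set<String>> objects,
                              Map<String, Set<String>> parents) {
        Set<String> selected = new TreeSet<>();
        for (Map.Entry<String, Set<String>> o : objects.entrySet()) {
            Set<String> expanded = new HashSet<>();
            for (String t : o.getValue()) expanded.addAll(superTags(t, parents));
            if (expanded.containsAll(q)) selected.add(o.getKey());
        }
        return selected;
    }
}
```

With the explicit taxonomy T1 (NewYorkTimes::Newspaper), or with an empty taxonomy T2 plus explicit Newspaper annotations, the query {Newspaper} selects the same objects.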
Operators
An operator is a basic unit in the composition. Generally, it creates one or more new objects from a subset of existing objects. An operator can require no inputs. When one or more inputs are required, an input condition is specified for each input. The input condition is specified as a tag query, which must be satisfied by the corresponding object provided as input. The outputs are described by specifying tags that are added to and removed from the description of the new objects produced by the output.
The descriptions of the new objects functionally depend on descriptions of input objects. There are two methods of propagating information from the input to the output. The first, explicit, method involves using a typed tag variable that can be bound to one of the tags describing the input object, and then using this variable to describe one or more of the outputs. Note this method can generally be used only to propagate tags of types that are known when the operator is described. In certain cases, however, it is desirable to propagate tags of types that emerge after the operator has been described. To enable the second method of propagation, a special “sticky” tag Ω is defined to serve as a label for automatically propagating tags. If any sub-tag of Ω appears in at least one input object description, it will be automatically added to the description of all output objects.
The following definition captures the properties of an operator explained above.
Let
Given the above parameters of an operator, and
Definition 3. Operator $f=\langle p,\vec{t},n,\vec{q},m,\vec{a},\vec{r}\rangle$ is a function on the object set, defined as $f(O,\vec{v},\vec{o})=O\cup O'$, where $O'=\{o'_j \mid o'_j\notin O\}_{j=1}^{m}$ is the set of new objects produced by the operator, and where
The definition above provides a formula for computing descriptions of new objects produced by the operator: the description of each object is the union of automatically propagated tags derived from Ω and operator-output-specific added tags, minus the set of operator-output-specific removed tags.
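The formula above can be sketched as follows. The sketch covers only the sticky-tag (Ω) propagation and the operator-specific added and removed tags; typed tag variables are omitted. Whether a tag is a sub-tag of Ω is supplied as a hypothetical predicate, which in a full implementation would be answered by the taxonomy.

```java
import java.util.*;
import java.util.function.Predicate;

class OperatorOutput {
    /**
     * Description of one new output object:
     *   (sticky tags found in any input description ∪ added) \ removed
     * isSticky is a hypothetical test for "t is a sub-tag of Ω".
     */
    static Set<String> describe(List<Set<String>> inputDescriptions, Set<String> added,
                                Set<String> removed, Predicate<String> isSticky) {
        Set<String> out = new TreeSet<>();
        for (Set<String> d : inputDescriptions) {
            for (String t : d) {
                if (isSticky.test(t)) out.add(t); // automatic propagation via Ω
            }
        }
        out.addAll(added);     // operator-output-specific added tags
        out.removeAll(removed); // removal is applied last, as in the formula
        return out;
    }
}
```

For instance, a translation operator applied to a feed described by {Feed, NewYorkTimes, English}, with NewYorkTimes and English treated as sticky, added tags {Feed, French}, and removed tags {English}, yields {Feed, French, NewYorkTimes}.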
Composition
Composition Semantics
A composition of operators is defined simply as the result of applying one operator to the object set produced by another operator.
Definition 4. The composition of l operator instances formed by operators $f_1, f_2, \ldots, f_l$ applied to object subsets $\vec{o}_1, \vec{o}_2, \ldots, \vec{o}_l$ and parameterized with tags $\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_l$, respectively, is the composite operator $f=\circ f_j,\ j=1 \ldots l$, defined as

$$f(O)=f_l\big(\ldots f_2\big(f_1(O,\vec{v}_1,\vec{o}_1),\vec{v}_2,\vec{o}_2\big)\ldots,\vec{v}_l,\vec{o}_l\big).$$

Notice that $f(O)=O\cup O'_1\cup O'_2\cup\ldots\cup O'_l$, where $O'_i$ is the set of new objects produced by operator $f_i$. Also note that input objects for each subsequent operator can be selected from the object set produced by the preceding operator, i.e.,

$$\begin{aligned}
\vec{o}_1 &\subset O_0 \equiv O\\
\vec{o}_2 &\subset O_1 \equiv O\cup O'_1\\
&\;\;\vdots\\
\vec{o}_l &\subset O_{l-1} \equiv O\cup O'_1\cup O'_2\cup\ldots\cup O'_{l-1}
\end{aligned}$$
Definition 5. The composition is valid when the input conditions of each operator instance $f_j$ are satisfied by the object array $\vec{o}_j$, i.e.,
Subsequent instances of operators may use objects produced by preceding operators as inputs, i.e., there could exist i and j, i<j, such that $\vec{o}_j\cap O'_i\neq\emptyset$. In other words, there is a data dependency between operator instances $f_j$ and $f_i$. Data dependencies between operator instances within a composition can be represented using a data dependency graph where arcs connect operator outputs to inputs of other operators. Note that under this model the directed data dependency graphs will always be acyclic, since an operator instance can only consume objects produced by instances that precede it in the composition.
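The data dependency graph and the ordering constraint on inputs can be sketched as follows, with objects identified by hypothetical string ids. Each instance may only consume objects already in the object set produced before it; since every arc therefore runs from an earlier instance to a later one, the graph cannot contain a cycle. The sketch checks only availability, not the tag-query input conditions of Definition 5.

```java
import java.util.*;

class DependencyGraph {
    /**
     * uses.get(j): ids of objects consumed by the j-th operator instance;
     * produces.get(j): ids of the new objects it creates (O'_j).
     * Returns arcs "i->j" and throws if an instance consumes an object not yet available.
     */
    static List<String> arcs(Set<String> initial, List<Set<String>> uses, List<Set<String>> produces) {
        Map<String, Integer> producer = new HashMap<>(); // object id -> index of producing instance
        Set<String> available = new HashSet<>(initial);  // O_{j-1}
        List<String> result = new ArrayList<>();
        for (int j = 0; j < uses.size(); j++) {
            for (String o : uses.get(j)) {
                if (!available.contains(o)) {
                    throw new IllegalStateException("instance " + j + " uses unavailable object " + o);
                }
                Integer i = producer.get(o);
                if (i != null) result.add(i + "->" + j); // always i < j, hence acyclic
            }
            for (String o : produces.get(j)) {
                available.add(o);
                producer.put(o, j);
            }
        }
        return result;
    }
}
```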
Goal Driven Composition
The problem of goal-driven composition can now be defined as the problem of finding a composition of operators that produces an object satisfying a given query. As an additional simplifying assumption, we assume that the composition is applied to an empty object set. This assumption is not significantly constraining, since the initial objects can always be produced by operators that do not require any input objects. On the other hand, the assumption allows uniform modeling of both feeds and services as operators.
Given a composition problem (T,g), where:
the solution set is defined as follows.
Definition 6. The set of solutions S(T,g) to the goal-driven composition problem (T,g) is the set of all valid compositions F of operators in such that
The second condition in the definition above helps eliminate from consideration inefficient compositions that have dead-end operator instances producing unused objects.
Composition Ranking
Before the set of compositions S(T,g) can be presented to the user, the compositions must be ranked, with those most likely to satisfy a user's intent appearing first in the list. The ranking is based on a heuristic metric reflecting composition quality. Each operator f ∈ F is assigned a fixed cost c(f). Cost of an operator instance in a composition is equal to the cost of the corresponding operator.
Definition 7. The rank $\mathrm{rank}(\hat{f})$ of the composition

$$\hat{f}(O)=f_n\big(\ldots f_2(f_1(O))\ldots\big)$$

is the sum of the costs of its operator instances, i.e.,

$$\mathrm{rank}(\hat{f})=\sum_{i=1}^{n} c(f_i).$$
By default, c(f)=1 for all operators; hence, the best compositions are the shortest ones. During configuration of the system, the cost can be left at the default or set for individual operators to reflect feed or service quality.
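A minimal Java sketch of the ranking rule, assuming compositions are given as lists of operator names and configured costs are kept in a simple map (both representations are assumptions; the operators Translate and Truncate in main are hypothetical):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Sketch of rank(f) = sum of c(f_i), with a default cost of 1 per operator. */
class CompositionRanking {
    /** Operators without a configured cost use the default c(f) = 1. */
    static double cost(String operator, Map<String, Double> configured) {
        return configured.getOrDefault(operator, 1.0);
    }

    /** Sum of the costs of the operator instances f_1..f_n in the composition. */
    static double rank(List<String> composition, Map<String, Double> configured) {
        double sum = 0;
        for (String op : composition) {
            sum += cost(op, configured);
        }
        return sum;
    }

    /** Orders compositions from lowest (best) to highest rank. */
    static List<List<String>> sortByRank(List<List<String>> solutions, Map<String, Double> configured) {
        List<List<String>> sorted = new ArrayList<>(solutions);
        sorted.sort(Comparator.comparingDouble(c -> rank(c, configured)));
        return sorted;
    }

    public static void main(String[] args) {
        List<List<String>> solutions = List.of(
            List.of("NYTFrontPage", "FetchFeed", "Translate", "Truncate"),
            List.of("NYTFrontPage", "FetchFeed", "Translate"));
        // With every cost at the default, the shorter composition comes first.
        System.out.println(sortByRank(solutions, Map.of()).get(0));
    }
}
```

Sorting in ascending order puts the cheapest composition first; with default costs that is simply the shortest one.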
Goal Refinement Tag Cloud
The refinement tag cloud provides valuable help to the user in refining the goal. The tag cloud is simply a popularity-weighted set of tags computed over the descriptions of outputs of all compositions in a solution set S(T,g). In theory, if the goal g is empty, the tag cloud is computed over all valid compositions. Although the set of all compositions may indeed be very large, the set of compositions with differently described outputs is much smaller. The SPPL planner can compute the tag cloud without constructing all compositions.
Note that the queries in our model behave as though the super-tags from the taxonomy are always included in the object description with the corresponding sub-tags. The same approach should be used during tag cloud computation. Even if the super-tags are not included in an object description explicitly, they are added to the description automatically for the purposes of computing the weights in the tag cloud. This ensures that even if certain tags do not accumulate enough weight to appear in the visible portion of the tag cloud, they add weight to their super-tags, and will still be accessible through those super-tags.
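The following sketch shows one way the weights could be computed, assuming each output description is a set of tag names, the taxonomy is a child-to-parents map, and a tag's weight is the number of output descriptions that carry it after super-tags are added. The exact weighting used by the SPPL planner is not specified here, and the language tags in main are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Sketch of popularity-weighted tag cloud computation with super-tag expansion. */
class TagCloud {
    /** Returns the tags plus all of their ancestors in the (child -> parents) taxonomy. */
    static Set<String> withSuperTags(Set<String> tags, Map<String, Set<String>> parents) {
        Set<String> closed = new HashSet<>(tags);
        Deque<String> pending = new ArrayDeque<>(tags);
        while (!pending.isEmpty()) {
            for (String parent : parents.getOrDefault(pending.pop(), Set.of())) {
                if (closed.add(parent)) {
                    pending.push(parent);
                }
            }
        }
        return closed;
    }

    /** Weight of a tag = number of output descriptions containing it after expansion. */
    static Map<String, Integer> weights(Collection<Set<String>> outputDescriptions,
                                        Map<String, Set<String>> parents) {
        Map<String, Integer> weights = new TreeMap<>();
        for (Set<String> description : outputDescriptions) {
            for (String tag : withSuperTags(description, parents)) {
                weights.merge(tag, 1, Integer::sum);
            }
        }
        return weights;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> parents = Map.of(
            "NYTFrontPage", Set.of("NewYorkTimes", "FrontPage"));
        List<Set<String>> outputs = List.of(
            Set.of("NYTFrontPage", "English"),
            Set.of("NYTFrontPage", "French"),
            Set.of("FrontPage", "English"));
        System.out.println(weights(outputs, parents));
    }
}
```

In the example, FrontPage is explicit on only one description but receives weight 3, because the two NYTFrontPage descriptions contribute to it through the taxonomy.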
In the following, we describe how the abstract formalism described above is applied in practice to descriptions of components.
Execution Runtime
In one embodiment, the execution environment can be a simple Java-based runtime. Each service in this runtime implements the interface Service, which has a single public method named process that receives a hashmap of input object values and returns a hashmap of output object values.
The set of hashmap keys used to identify input and output objects in the input and output hashmaps is specific to each service. A separate description specifies the hashmap keys recognized by the service, as well as tag-based annotations on inputs and outputs. This description is then used to construct a description of an operator. Service implementations invoke external web services for sophisticated processing, such as language translation, when necessary.
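The interface listing is not reproduced in the text, so the following Java is a hedged reconstruction rather than the actual runtime API: the generic Map<String, Object> signature is an assumption, and the UpperCase service with its text and result keys is hypothetical, chosen only to show how service-specific keys carry objects.

```java
import java.util.HashMap;
import java.util.Map;

class ServiceSketch {
    /** Assumed shape of the runtime interface: input hashmap in, output hashmap out. */
    interface Service {
        Map<String, Object> process(Map<String, Object> inputs);
    }

    /**
     * Hypothetical service that upper-cases the string stored under "text".
     * The key names would be declared in the service's separate description.
     */
    static class UpperCase implements Service {
        @Override
        public Map<String, Object> process(Map<String, Object> inputs) {
            Map<String, Object> outputs = new HashMap<>();
            outputs.put("result", String.valueOf(inputs.get("text")).toUpperCase());
            return outputs;
        }
    }

    public static void main(String[] args) {
        Map<String, Object> inputs = new HashMap<>();
        inputs.put("text", "new york travel");
        System.out.println(new UpperCase().process(inputs).get("result")); // NEW YORK TRAVEL
    }
}
```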
A simple XML format is used to define a flow and deploy it in the runtime. Once deployed, the flow can be called with user-defined values of parameters, and will produce results.
A flow definition consists of flow inputs (i.e., external parameters), calls (i.e., operator instances), and a flow output. The call elements tell the runtime which Java classes to use to process data and which input objects to include in the input map. Input objects can be specified as string values using the value attribute, or linked to outputs of other calls using a link. In the example of
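To show how value attributes, flow inputs, and links could be resolved at run time, here is a small Java sketch. It does not parse the XML format; instead, call elements are modeled directly as hypothetical records, services are lambdas rather than classes loaded by name, and the example.com URL and the query parameter name are placeholders.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of executing a flow whose call inputs are constants, flow parameters, or links. */
class FlowRunner {
    /** Assumed runtime interface: input hashmap in, output hashmap out. */
    interface Service {
        Map<String, Object> process(Map<String, Object> inputs);
    }

    /** How an input entry is obtained: value attribute, flow input, or link to another call. */
    interface Input {}
    record Constant(String value) implements Input {}
    record Param(String name) implements Input {}
    record Link(String call, String key) implements Input {}

    /** One call element: an id, the service that processes it, and its input entries. */
    record Call(String id, Service service, Map<String, Input> inputs) {}

    static Object resolve(Input in, Map<String, String> params, Map<String, Map<String, Object>> results) {
        if (in instanceof Constant c) {
            return c.value();
        }
        if (in instanceof Param p) {
            return params.get(p.name());
        }
        Link link = (Link) in;
        Map<String, Object> produced = results.get(link.call());
        if (produced == null) {
            throw new IllegalStateException("link to a call that has not run: " + link.call());
        }
        return produced.get(link.key());
    }

    /** Runs calls in document order and returns the value designated as the flow output. */
    static Object run(List<Call> calls, Map<String, String> params, Link output) {
        Map<String, Map<String, Object>> results = new HashMap<>();
        for (Call call : calls) {
            Map<String, Object> inputMap = new HashMap<>();
            call.inputs().forEach((key, in) -> inputMap.put(key, resolve(in, params, results)));
            results.put(call.id(), call.service().process(inputMap));
        }
        return resolve(output, params, results);
    }

    public static void main(String[] args) {
        // Hypothetical services standing in for the Java classes named in call elements.
        Service searchUrl = in -> Map.of("url", String.valueOf(in.get("prefix")) + in.get("suffix"));
        Service fetch = in -> Map.of("content", "fetched " + in.get("url"));
        List<Call> calls = List.of(
            new Call("search", searchUrl, Map.of(
                "prefix", new Constant("http://example.com/search?q="),
                "suffix", new Param("query"))),
            new Call("fetch", fetch, Map.of("url", new Link("search", "url"))));
        System.out.println(run(calls, Map.of("query", "travel"), new Link("fetch", "content")));
    }
}
```

Running main prints fetched http://example.com/search?q=travel. Executing calls strictly in order mirrors the acyclic data dependency structure of compositions.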
Descriptions
The automatic composer requires descriptions of services, feeds, parameters, and taxonomies. These descriptions are translated into operators and other elements of the abstract model described earlier, which is then used by the planner to generate flows. All descriptions can be specified in one file or broken into multiple files, which are then automatically combined into one logical file before processing.
Tag Taxonomies
Taxonomies are described by specifying sub-tag relationships between tags. A tag does not need to be declared before it is used, but a tag{ } statement is necessary to declare the parents of a tag, which are listed after '-', for example:
tag {NYTFrontPage-NewYorkTimes FrontPage}.
Tag names beginning with underscore “_” are hidden tags that are never displayed in a user interface, but otherwise behave as normal tags. Hidden tags can be used to express composition constraints that are internal to the system, for example, type constraints. The special tag Ω is represented as _StickyTag.
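A Java sketch of reading tag{ } statements and answering the taxonomy questions used elsewhere in this description (sub-tag tests, hidden tags, sticky tags). It assumes one child tag per statement with whitespace-separated parents, as in the example above; the second declaration in main, which makes NewYorkTimes sticky, is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch of a tag taxonomy built from tag{ } statements. */
class Taxonomy {
    private final Map<String, Set<String>> parents = new HashMap<>();

    /** Parses a statement such as "tag {NYTFrontPage-NewYorkTimes FrontPage}". */
    void declare(String statement) {
        int open = statement.indexOf('{');
        int close = statement.lastIndexOf('}');
        if (open < 0 || close < open) {
            throw new IllegalArgumentException("not a tag statement: " + statement);
        }
        String body = statement.substring(open + 1, close).trim();
        int dash = body.indexOf('-');
        String child = (dash < 0 ? body : body.substring(0, dash)).trim();
        Set<String> declared = parents.computeIfAbsent(child, k -> new HashSet<>());
        if (dash >= 0) {
            for (String parent : body.substring(dash + 1).trim().split("\\s+")) {
                if (!parent.isEmpty()) {
                    declared.add(parent);
                }
            }
        }
    }

    /** True if ancestor is reachable from tag by following parent links. */
    boolean isSubTagOf(String tag, String ancestor) {
        Deque<String> pending = new ArrayDeque<>(List.of(tag));
        Set<String> seen = new HashSet<>();
        while (!pending.isEmpty()) {
            String current = pending.pop();
            if (!seen.add(current)) {
                continue;
            }
            for (String parent : parents.getOrDefault(current, Set.of())) {
                if (parent.equals(ancestor)) {
                    return true;
                }
                pending.push(parent);
            }
        }
        return false;
    }

    /** Hidden tags start with an underscore and are never shown in the user interface. */
    static boolean isHidden(String tag) {
        return tag.startsWith("_");
    }

    /** Sticky tags are sub-tags of the special tag represented as _StickyTag. */
    boolean isSticky(String tag) {
        return isSubTagOf(tag, "_StickyTag");
    }

    public static void main(String[] args) {
        Taxonomy taxonomy = new Taxonomy();
        taxonomy.declare("tag {NYTFrontPage-NewYorkTimes FrontPage}");
        taxonomy.declare("tag {NewYorkTimes-_StickyTag}"); // hypothetical declaration
        System.out.println(taxonomy.isSticky("NYTFrontPage")); // true
        System.out.println(isHidden("_StickyTag"));            // true
    }
}
```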
Feed Descriptions
In the example of a feed description below, the output annotation uses tags to describe the content of the feed, as well as its language.
Such descriptions can be generated automatically, for example, using Syndic8 tags and default values for language. The description is translated into an operator that has no inputs and produces a single output object tagged with all tags used in the output annotation. If this operator is included in a flow composed by the planner, then during flow execution the runtime binds the corresponding operator instance to a built-in service that returns the URL string as a single entry in the hashmap of output objects.
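A sketch of this translation, assuming operators are represented as lists of input and output tag sets. The output key url returned by the built-in service, the feed URL, and the English tag are assumptions made for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch of turning a feed description into an operator and a built-in service. */
class FeedOperators {
    interface Service {
        Map<String, Object> process(Map<String, Object> inputs);
    }

    /** Planner-side operator: one tag set per input condition and per output object. */
    record Operator(String name, List<Set<String>> inputs, List<Set<String>> outputs) {}

    /** No inputs; a single output object tagged with every tag of the output annotation. */
    static Operator fromFeed(String name, Set<String> outputAnnotation) {
        return new Operator(name, List.of(), List.of(Set.copyOf(outputAnnotation)));
    }

    /** Built-in service bound at run time; returns the URL as the only hashmap entry. */
    static Service feedService(String url) {
        return inputs -> Map.of("url", url);
    }

    public static void main(String[] args) {
        Operator op = fromFeed("NYTFrontPage", Set.of("NYTFrontPage", "English"));
        System.out.println(op.inputs().size() + " inputs, " + op.outputs().size() + " output");
        System.out.println(feedService("http://example.com/nyt.rss").process(Map.of()).get("url"));
    }
}
```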
Service Descriptions
Each service can have a number of inputs and outputs. A service description is translated directly into an operator that requires and produces the corresponding number of objects. For example, the following describes a FetchFeed service.
This description uses a variable named ?lang of type _Language, and declares one input and one output. The list of output tags is treated as a list of added tags by default; however, tags preceded by ~ are interpreted as removed tags.
Note that sub-tags of _Language are not sticky (i.e., they are not derived from the special tag represented as _StickyTag) and therefore must be propagated explicitly from input to output using a variable. However, if the FetchFeed operator is applied to the output of the feed operator in the example above, the NYTFrontPage tag will also be propagated to the output of FetchFeed, since that tag is sticky according to the taxonomy in
Each input and output in the description can have a port name specified in square brackets. In this example, only the input has a port name, "url". The port name is the name of the hashmap entry that carries the corresponding input or output object. Since there is only one output port, the runtime does not need to know the name of the output object. Finally, the java description element specifies the name of the Java class that implements the service.
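The propagation rules above (sticky tags carried over automatically, ?lang bound to an input sub-tag of _Language, ~ marking removed tags) can be sketched as follows. The FetchFeed output list used in main (_Feed and ?lang), the _URL input tag, and the declaration of NewYorkTimes under _StickyTag are hypothetical, since the original listing and taxonomy figure are not reproduced; the taxonomy is also assumed to be acyclic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Sketch of computing output tags for a service such as FetchFeed. */
class OutputTags {
    /** True if tag equals ancestor or reaches it through the (assumed acyclic) taxonomy. */
    static boolean isA(String tag, String ancestor, Map<String, Set<String>> parents) {
        if (tag.equals(ancestor)) {
            return true;
        }
        for (String parent : parents.getOrDefault(tag, Set.of())) {
            if (isA(parent, ancestor, parents)) {
                return true;
            }
        }
        return false;
    }

    static Set<String> outputTags(Set<String> inputTags, List<String> outputSpec,
                                  Map<String, String> variableTypes,
                                  Map<String, Set<String>> parents) {
        Set<String> result = new TreeSet<>();
        // Sticky tags travel from input to output without being mentioned.
        for (String tag : inputTags) {
            if (isA(tag, "_StickyTag", parents)) {
                result.add(tag);
            }
        }
        List<String> removed = new ArrayList<>();
        for (String item : outputSpec) {
            if (item.startsWith("~")) {
                removed.add(item.substring(1));          // ~tag: removed tag
            } else if (item.startsWith("?")) {
                // Variable: copy input tags that are proper sub-tags of its declared type.
                String type = variableTypes.get(item);
                for (String tag : inputTags) {
                    if (!tag.equals(type) && isA(tag, type, parents)) {
                        result.add(tag);
                    }
                }
            } else {
                result.add(item);                        // plain tag: added tag
            }
        }
        result.removeAll(removed);
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical taxonomy: NewYorkTimes is declared sticky, English is a language.
        Map<String, Set<String>> parents = Map.of(
            "NYTFrontPage", Set.of("NewYorkTimes", "FrontPage"),
            "NewYorkTimes", Set.of("_StickyTag"),
            "English", Set.of("_Language"));
        Set<String> input = Set.of("NYTFrontPage", "English", "_URL");
        // Hypothetical FetchFeed output list: add _Feed and propagate ?lang.
        Set<String> output = outputTags(input, List.of("_Feed", "?lang"),
            Map.of("?lang", "_Language"), parents);
        System.out.println(output); // [English, NYTFrontPage, _Feed]
    }
}
```

NYTFrontPage survives only because it is sticky, English only because ?lang binds it, and _URL is dropped because it is neither.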
Flow Parameters and Constants
Flows that take external parameters can also be composed using the same framework. When two or more services within a flow are parametric, the planner can decide whether to expose the service parameters as one input parameter of the flow or as several separate parameters. This is achieved by using tags to describe service input parameters (as inputs to services), and by representing parameter values similarly to feeds, i.e., as operators that produce a single object described by tags. The following is an example of a service description that has an external parameter.
Service YNewsSearchURL has two inputs, but the corresponding operator has only one input. The constant string in quotes initializes the prefix parameter to a constant. In the plan, the suffix parameter will be connected to the object produced by the operator corresponding to the Destination service. Note that including constants in the description makes it possible to specify different semantic descriptions for different configurations of the same service.
Similarly, YAnswersSearchURL service is described with an additional input that requires tags _SearchQuery and _String. This description allows the two services, YNewsSearchURL and YAnswersSearchURL, to have the same requirements for the input parameter, and therefore allows the instances of those services to be linked to the same parameter value. The input constraints can contain tags that describe both the data type constraints on acceptable parameter values (e.g., whether the parameter must be a string, a number or a date), and semantic constraints (e.g., that the parameter is a query).
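A sketch of why one parameter value can feed both search services, assuming input conditions are checked by plain tag containment (the super-tag expansion described earlier is omitted). The port names, the example.com prefixes, and the Destination tag on the parameter object are hypothetical.

```java
import java.util.List;
import java.util.Set;

/** Sketch: one parameter object satisfying the same input condition of two services. */
class ParameterLinking {
    /** A service port; constant is non-null when the description binds it to a quoted string. */
    record Port(String name, Set<String> required, String constant) {
        boolean isConstant() {
            return constant != null;
        }
    }

    /** Constant ports are filled from the description, so only the others become operator inputs. */
    static List<Port> operatorInputs(List<Port> ports) {
        return ports.stream().filter(p -> !p.isConstant()).toList();
    }

    /** Plain containment test on tags. */
    static boolean satisfies(Set<String> objectTags, Set<String> required) {
        return objectTags.containsAll(required);
    }

    public static void main(String[] args) {
        Set<String> parameterObject = Set.of("_SearchQuery", "_String", "Destination");
        List<Port> news = List.of(
            new Port("prefix", Set.of(), "http://example.com/news?q="),
            new Port("suffix", Set.of("_SearchQuery", "_String"), null));
        List<Port> answers = List.of(
            new Port("prefix", Set.of(), "http://example.com/answers?q="),
            new Port("suffix", Set.of("_SearchQuery", "_String"), null));
        for (List<Port> service : List.of(news, answers)) {
            List<Port> inputs = operatorInputs(service);
            System.out.println(inputs.size() + " input(s), satisfied by parameter: "
                + satisfies(parameterObject, inputs.get(0).required()));
        }
    }
}
```

Both services report a single operator input satisfied by the same parameter object, which is what allows the planner to link both instances to one flow parameter.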
More Service Description Examples
The following examples from the sample application further illustrate different services that can be described in this model.
These descriptions describe the services in the application shown in
Implementation of a Planner With the Provided Descriptions.
In one embodiment, planning (i.e., application composition for a user-specified request) can be accomplished by translating the request and the set of tags and component descriptions into SPPL, and providing the translated material as input to an SPPL planner, such as that described in A. Riabov and Z. Liu. Planning for stream processing systems. In AAAI'05, July 2005. The plans produced by the SPPL planner can then be trivially translated into compositional applications and deployed in the execution environment.
SPPL representation has been described in commonly assigned U.S. application Ser. No. 11/406,002, filed Apr. 18, 2006.
The semantic model naturally maps to the SPPL formalism, which describes the planning domain as a set of actions that can be composed by the planner. The set of actions is created based on the set of operators. Action preconditions, described by predicates, are created based on operator input conditions. Tags are represented as types in SPPL, and preconditions are specified using a variable of the corresponding type. Action effects are mapped to operator outputs. An SPPL predicate propagation mechanism is used for propagation of sticky and regular tags.
Requesting Parameter Values from the User
In the preceding example, the planner composes the processing graph shown in
In one embodiment, the system maintains default parameter values for each parameter operator. These default values are shown to the user in the prompt asking to enter parameter values. For example:
Query: New York Travel
The user can then change the value of the parameter. In one embodiment, the new value of the parameter entered by the user is stored and later presented as a default when the same parameter operator is included in the processing graph.
Note that the number of parameter prompts presented to the user corresponds to the number of parameter operator instances included in the processing graph, and at least one prompt for each parameter operator instance is presented.
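A Java sketch of this prompting behavior, under the assumptions that defaults are kept in an in-memory map keyed by parameter operator name and that an empty answer keeps the default; the ask callback is a hypothetical stand-in for the user interface.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Sketch of prompting for parameter values with remembered defaults. */
class ParameterPrompts {
    // Last value per parameter operator, shown as the default next time (assumed in-memory store).
    private final Map<String, String> defaults = new HashMap<>();

    ParameterPrompts(Map<String, String> initialDefaults) {
        defaults.putAll(initialDefaults);
    }

    /** One prompt per parameter operator instance; ask receives "Name: default". */
    List<String> collect(List<String> parameterInstances, UnaryOperator<String> ask) {
        List<String> values = new ArrayList<>();
        for (String param : parameterInstances) {
            String current = defaults.getOrDefault(param, "");
            String answer = ask.apply(param + ": " + current);
            String value = (answer == null || answer.isEmpty()) ? current : answer;
            defaults.put(param, value);   // becomes the default for later graphs
            values.add(value);
        }
        return values;
    }

    public static void main(String[] args) {
        ParameterPrompts prompts = new ParameterPrompts(Map.of("Query", "New York Travel"));
        // The user accepts the default shown in the prompt.
        prompts.collect(List.of("Query"), prompt -> { System.out.println(prompt); return ""; });
        // The user types a new value, which becomes the default for the next prompt.
        prompts.collect(List.of("Query"), prompt -> "London Travel");
        prompts.collect(List.of("Query"), prompt -> { System.out.println(prompt); return ""; });
    }
}
```

Running main prints Query: New York Travel and then Query: London Travel.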
Depending on the implementation of the execution environment and, in certain scenarios, subject to the structure of the processing graph, the parameter named “Query” described above can be an execution parameter or a deployment parameter. The value “New York Travel” of an execution parameter is provided as part of an execution request. Alternatively, if “Query” is a deployment parameter, the value of the parameter is fixed at the time of graph deployment and, in general, different values cannot be provided for different execution requests. Instead, the value of a deployment parameter remains the same for each execution of the processing graph.
Finally, if reconfiguration is supported by the processing graph and the execution environment, "Query" can be a reconfiguration parameter. In that scenario, the value of the reconfiguration parameter can be changed between groups of execution requests, remaining the same within each group. For example, the effective value of the parameter can be "New York Travel" for several executions until a reconfiguration request with the value "London Travel" is received, after which the effective value of the parameter changes to "London Travel" for subsequent executions.
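The three kinds of parameters differ only in when a value takes effect. A hedged Java sketch, using a hypothetical DeployedGraph class that holds a single Query parameter and returns a placeholder result string:

```java
/** Sketch of execution, deployment, and reconfiguration parameters. */
class ParameterKinds {
    enum Kind { EXECUTION, DEPLOYMENT, RECONFIGURATION }

    /** Hypothetical deployed graph with a single parameter named "Query". */
    static class DeployedGraph {
        private final Kind kind;
        private String effective;   // fixed at deployment, or changed by reconfiguration

        DeployedGraph(Kind kind, String deploymentValue) {
            this.kind = kind;
            this.effective = deploymentValue;
        }

        /** A reconfiguration request changes the value for all following executions. */
        void reconfigure(String value) {
            if (kind != Kind.RECONFIGURATION) {
                throw new IllegalStateException(kind + " parameters cannot be reconfigured");
            }
            effective = value;
        }

        /** Execution parameters come with each request; other kinds use the effective value. */
        String execute(String requestValue) {
            String query = kind == Kind.EXECUTION ? requestValue : effective;
            return "results for " + query;
        }
    }

    public static void main(String[] args) {
        DeployedGraph graph = new DeployedGraph(Kind.RECONFIGURATION, "New York Travel");
        System.out.println(graph.execute(null));   // results for New York Travel
        System.out.println(graph.execute(null));   // same value within the group
        graph.reconfigure("London Travel");
        System.out.println(graph.execute(null));   // results for London Travel
    }
}
```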
Encoding and Applying Parameter Values
Above we described how the parameters are identified and how the values of parameters are requested from the user. The user-specified values for the three different kinds of parameters described earlier are applied to the processing graph at different stages of processing graph lifecycle supported by the execution environment, as shown in
The Jetty web server (http://jetty.mortbay.com/), integrated with the OSGi platform, is used to host the servlets. The SPPL Goal Generator service generates SPPL goal descriptions based on a user-specified set of tags submitted via the Main GUI Servlet.
The SPPL Planner service invokes an SPPL planner to process the generated goal and generate a processing graph. Examples of a planner and an execution environment are described in Zhen Liu, Anand Ranganathan and Anton Riabov, "A Planning Approach for Message-Oriented Semantic Web Service Composition", in AAAI-2007, the disclosure of which is incorporated by reference herein in its entirety.
The Platform Adapter service translates the processing graph produced by the SPPL planner to the format recognized by the target execution environment. The Platform Adapter service can also include procedures for deploying the translated processing graph in the target execution environment, for invoking the deployed processing graph and retrieving results of its execution, and for generating a preview of results received from the processing graph.
The system is configured by providing an SPPL domain description that includes descriptions of all service components and primal data, and optionally a set of component bindings. The component bindings are files used by the platform adapter to generate a platform-specific representation of the processing graph. They are typically represented as templates, one per component, containing placeholders that the platform adapter fills in to represent connections between components in the generated processing graphs.
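A sketch of template filling by the platform adapter, assuming a ${name} placeholder syntax and a made-up one-line template format; real component bindings are specific to the target execution environment.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch of filling a component binding template with processing-graph details. */
class PlatformAdapter {
    // Assumed placeholder syntax: ${name}.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    static String fill(String template, Map<String, String> bindings) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = bindings.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("unbound placeholder: " + m.group(1));
            }
            // quoteReplacement keeps characters such as '$' in values literal.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String fetchFeedTemplate = "node ${id} = FetchFeed(url <- ${url})";
        System.out.println(fill(fetchFeedTemplate, Map.of("id", "n2", "url", "n1.out")));
    }
}
```

Running main prints node n2 = FetchFeed(url <- n1.out); one such template per component keeps the planner output independent of the target platform syntax.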
A computer in which the system, or parts of the system, described above may be implemented will now be briefly described.
As shown in
The memory 1015 includes random access memory (RAM) and read only memory (ROM). The memory 1015 can also include a database, disk drive, tape drive or a combination thereof. The input 1025 is constituted by a keyboard or mouse and the output 1030 is constituted by a display or printer. The network 1035 may be the Internet, for example.
The elements of a user interface according to an exemplary embodiment of the present invention are shown in
A tag cloud is a weighted list of tags. Weights reflect the popularity of tags. Clicking on any tag in the tag cloud adds the tag to the planning goal, and to the list of selected tags. This also leads to a new processing graph being composed, and a new tag cloud. The new tag cloud is created in the context of currently selected tags. In particular, the new tag cloud does not include the selected tags or any other tags that never appear on the same feed description where all selected tags appear. When the new processing graph is constructed, it is immediately deployed and an output feed is shown in a preview window.
Implied tags are tags that always appear together with the selected tags. Guessed tags are tags assigned to the output of the graph that do not appear among the implied or selected tags.
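A sketch of computing the refined cloud and the implied tags from output descriptions, assuming each description is a set of tag names and that a tag's weight counts the matching descriptions that carry it. Guessed tags are not computed here, and the tags in main are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Sketch of tag cloud refinement in the context of selected tags. */
class RefinementCloud {
    /** Descriptions on which all selected tags appear. */
    static List<Set<String>> matching(List<Set<String>> descriptions, Set<String> selected) {
        List<Set<String>> result = new ArrayList<>();
        for (Set<String> d : descriptions) {
            if (d.containsAll(selected)) {
                result.add(d);
            }
        }
        return result;
    }

    /** Tags co-occurring with all selected tags, excluding the selected tags themselves. */
    static Map<String, Integer> cloud(List<Set<String>> descriptions, Set<String> selected) {
        Map<String, Integer> weights = new TreeMap<>();
        for (Set<String> d : matching(descriptions, selected)) {
            for (String tag : d) {
                if (!selected.contains(tag)) {
                    weights.merge(tag, 1, Integer::sum);
                }
            }
        }
        return weights;
    }

    /** Implied tags: present on every description that carries all selected tags. */
    static Set<String> implied(List<Set<String>> descriptions, Set<String> selected) {
        List<Set<String>> m = matching(descriptions, selected);
        if (m.isEmpty()) {
            return Set.of();
        }
        Set<String> common = new TreeSet<>(m.get(0));
        for (Set<String> d : m) {
            common.retainAll(d);
        }
        common.removeAll(selected);
        return common;
    }

    public static void main(String[] args) {
        List<Set<String>> descriptions = List.of(
            Set.of("NewYorkTimes", "FrontPage", "English"),
            Set.of("NewYorkTimes", "Travel", "English"),
            Set.of("BBC", "Travel", "English"));
        Set<String> selected = Set.of("Travel");
        System.out.println(cloud(descriptions, selected));   // {BBC=1, English=2, NewYorkTimes=1}
        System.out.println(implied(descriptions, selected)); // [English]
    }
}
```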
A processing graph element is a description of the processing graph in graphical form. Clicking on the graph opens an editor window, where the graph can be modified after automatic composition. The editor can be provided by the target execution environment.
A text description of the processing graph is created based on the set of modules included in the processing graph. In our implementation, hovering a mouse over modules in a graphical representation on the left causes a highlight to appear on the corresponding line of the textual description on the right.
A preview (or full view) of results produced by the composed and deployed processing graph is shown in the bottom of the window.
The user interface may also include a search box, where goal tags can be typed in as an alternative to clicking tags in the tag cloud.
It is understood that the present invention may be implemented in various forms of hardware, software, firmware, special purpose processors, or a combination thereof. In one embodiment, the present invention may be implemented in software as an application program tangibly embodied on a program storage device (e.g., magnetic floppy disk, RAM, CD ROM, DVD, ROM, and flash memory). The application program may be uploaded to, and executed by, a machine comprising any suitable architecture.
It is also understood that because some of the constituent system components and method steps depicted in the accompanying figures may be implemented in software, the actual connections between the system components (or the process steps) may differ depending on the manner in which the present invention is programmed. Given the teachings of the present invention provided herein, one of ordinary skill in the art will be able to contemplate these and similar implementations or configurations of the present invention.
It is further understood that the above description is only representative of illustrative embodiments. For convenience of the reader, the above description has focused on a representative sample of possible embodiments, a sample that is illustrative of the principles of the invention. The description has not attempted to exhaustively enumerate all possible variations. That alternative embodiments may not have been presented for a specific portion of the invention, or that further undescribed alternatives may be available for a portion, is not to be considered a disclaimer of those alternate embodiments. Other applications and embodiments can be implemented without departing from the spirit and scope of the present invention.
It is therefore intended, that the invention not be limited to the specifically described embodiments, because numerous permutations and combinations of the above and implementations involving non-inventive substitutions for the above can be created, but the invention is to be defined in accordance with the claims that follow. It can be appreciated that many of those undescribed embodiments are within the literal scope of the following claims, and that others are equivalent.
Number | Name | Date | Kind |
---|---|---|---|
4571678 | Chaitin | Feb 1986 | A |
5159685 | Kung | Oct 1992 | A |
5187788 | Marmelstein | Feb 1993 | A |
5657428 | Tsuruta et al. | Aug 1997 | A |
5659754 | Grove et al. | Aug 1997 | A |
5675757 | Davidson et al. | Oct 1997 | A |
5675805 | Boldo et al. | Oct 1997 | A |
5696693 | Aubel et al. | Dec 1997 | A |
5937195 | Ju et al. | Aug 1999 | A |
5999729 | Tabloski, Jr. et al. | Dec 1999 | A |
6032142 | Wavish | Feb 2000 | A |
6053951 | McDonald et al. | Apr 2000 | A |
6339783 | Horikiri | Jan 2002 | B1 |
6347320 | Christensen et al. | Feb 2002 | B1 |
6430698 | Khalil et al. | Aug 2002 | B1 |
6601112 | O'Rourke et al. | Jul 2003 | B1 |
6665863 | Lord et al. | Dec 2003 | B1 |
6721747 | Lipkin | Apr 2004 | B2 |
6792595 | Storistenau et al. | Sep 2004 | B1 |
6799184 | Bhatt et al. | Sep 2004 | B2 |
6813587 | McIntyre et al. | Nov 2004 | B2 |
6891471 | Yuen et al. | May 2005 | B2 |
6983446 | Charisius et al. | Jan 2006 | B2 |
7000022 | Lisitsa et al. | Feb 2006 | B2 |
7062762 | Krishnamurthy et al. | Jun 2006 | B2 |
7103873 | Tanner et al. | Sep 2006 | B2 |
7164422 | Wholey et al. | Jan 2007 | B1 |
7174536 | Kothari et al. | Feb 2007 | B1 |
7222182 | Lisitsa et al. | May 2007 | B2 |
7231632 | Harper | Jun 2007 | B2 |
7263694 | Clewis et al. | Aug 2007 | B2 |
7290244 | Peck et al. | Oct 2007 | B2 |
7334216 | Molina-Moreno et al. | Feb 2008 | B2 |
7409676 | Agarwal et al. | Aug 2008 | B2 |
7426721 | Saulpaugh et al. | Sep 2008 | B1 |
7466810 | Quon et al. | Dec 2008 | B1 |
7472379 | Chessell et al. | Dec 2008 | B2 |
7499906 | Kloppmann et al. | Mar 2009 | B2 |
7536676 | Baker et al. | May 2009 | B2 |
7543284 | Bolton et al. | Jun 2009 | B2 |
7565640 | Shukla et al. | Jul 2009 | B2 |
7614041 | Harper | Nov 2009 | B2 |
7627808 | Blank et al. | Dec 2009 | B2 |
7657436 | Elmore et al. | Feb 2010 | B2 |
7681177 | LeTourneau | Mar 2010 | B2 |
7685566 | Brown, Jr. et al. | Mar 2010 | B2 |
7716167 | Colossi et al. | May 2010 | B2 |
7716199 | Guha | May 2010 | B2 |
7730467 | Hejlsberg et al. | Jun 2010 | B1 |
7756855 | Ismalon | Jul 2010 | B2 |
7769747 | Berg et al. | Aug 2010 | B2 |
7773877 | Yang et al. | Aug 2010 | B2 |
7792836 | Taswell | Sep 2010 | B2 |
7797303 | Roulland et al. | Sep 2010 | B2 |
7809801 | Wang et al. | Oct 2010 | B1 |
7810085 | Shinnar et al. | Oct 2010 | B2 |
7814123 | Nguyen et al. | Oct 2010 | B2 |
7827210 | Meliksetian et al. | Nov 2010 | B2 |
7860863 | Bar-Or et al. | Dec 2010 | B2 |
7861151 | Milic-Frayling et al. | Dec 2010 | B2 |
7877387 | Hangartner | Jan 2011 | B2 |
7882485 | Feblowitz et al. | Feb 2011 | B2 |
7886269 | Williams et al. | Feb 2011 | B2 |
7886273 | Hinchey et al. | Feb 2011 | B2 |
7900201 | Qureshi et al. | Mar 2011 | B1 |
7954090 | Qureshi et al. | May 2011 | B1 |
7958148 | Barnes et al. | Jun 2011 | B2 |
7984417 | Ben-Zvi et al. | Jul 2011 | B2 |
7984423 | Kodosky et al. | Jul 2011 | B2 |
7992134 | Hinchey et al. | Aug 2011 | B2 |
8001527 | Qureshi et al. | Aug 2011 | B1 |
8032522 | Goldstein et al. | Oct 2011 | B2 |
8037036 | Blumenau et al. | Oct 2011 | B2 |
8046737 | Wittenberg et al. | Oct 2011 | B2 |
8078487 | Li et al. | Dec 2011 | B2 |
8078953 | Kunz et al. | Dec 2011 | B2 |
8086598 | Lamb et al. | Dec 2011 | B1 |
8122006 | de Castro Alves et al. | Feb 2012 | B2 |
20020109706 | Lincke et al. | Aug 2002 | A1 |
20040015783 | Lennon et al. | Jan 2004 | A1 |
20040249664 | Broverman et al. | Dec 2004 | A1 |
20050096960 | Plutowski et al. | May 2005 | A1 |
20050097224 | Chen et al. | May 2005 | A1 |
20050125738 | Srivastava et al. | Jun 2005 | A1 |
20050125739 | Thompson et al. | Jun 2005 | A1 |
20050159994 | Huddleston et al. | Jul 2005 | A1 |
20050172306 | Agarwal et al. | Aug 2005 | A1 |
20050177406 | Facciorusso et al. | Aug 2005 | A1 |
20050192870 | Geddes | Sep 2005 | A1 |
20060212836 | Khushraj et al. | Sep 2006 | A1 |
20070033590 | Masuouka et al. | Feb 2007 | A1 |
20070043607 | Howard et al. | Feb 2007 | A1 |
20070112777 | Field et al. | May 2007 | A1 |
20070129953 | Cunningham et al. | Jun 2007 | A1 |
20070136281 | Li et al. | Jun 2007 | A1 |
20070190499 | Baur | Aug 2007 | A1 |
20070204020 | Anderson et al. | Aug 2007 | A1 |
20070208685 | Blumenau | Sep 2007 | A1 |
20070244912 | Kawaguchi | Oct 2007 | A1 |
20070245298 | Grabarnik et al. | Oct 2007 | A1 |
20070250331 | Liu et al. | Oct 2007 | A1 |
20070282746 | Anke et al. | Dec 2007 | A1 |
20080065455 | Sun et al. | Mar 2008 | A1 |
20080086485 | Paper | Apr 2008 | A1 |
20080140778 | Banavar et al. | Jun 2008 | A1 |
20080168529 | Anderson et al. | Jul 2008 | A1 |
20080243484 | Mohri et al. | Oct 2008 | A1 |
20090070165 | Baeuerle et al. | Mar 2009 | A1 |
20090100407 | Bouillet et al. | Apr 2009 | A1 |
20090125366 | Chakraborty et al. | May 2009 | A1 |
20090177957 | Bouillet et al. | Jul 2009 | A1 |
20090192783 | Jurach, Jr. et al. | Jul 2009 | A1 |
20090276753 | Bouillet et al. | Nov 2009 | A1 |
20100293043 | Atreya et al. | Nov 2010 | A1 |
20110078285 | Hawkins et al. | Mar 2011 | A1 |
Entry |
---|
Heinlein, C. “Workflow and Process Synchronization with Interaction Expressions and Graphs”, 2001, IEEE, p. 243-252. |
N. Jain, L. Amini, H. Andrade, R. King, Y. Park, P. Selo and C. Venkatramani, “Design, Implementation, and Evaluation of the Linear Road Benchmark on the Stream Processing Core”, Proceedings of ACM SIGMOD 2006. |
Zhen Liu, Anand Ranganathan and Anton Riabov, “A Planning Approach for Message-Oriented Semantic Web Service Composition”, in AAAI-2007. |
A. Riabov and Z. Liu. Planning for stream processing systems. In AAAI'05, Jul. 2005. |
E. Sirin and B. Parsia. Planning for Semantic Web Services. In Semantic Web Services Workshop at 3rd ISWC, 2004. |
M. Pistore, P. Traverso, P. Bertoli, and A. Marconi. Automated synthesis of composite BPEL4WS web services. In ICWS, 2005. |
A. Riabov and Z. Liu. Scalable planning for distributed stream processing systems. In ICAPS'06, 2006. |
K. Whitehouse, F. Zhao, and J. Liu. Semantic streams: A framework for composable semantic interpretation of sensor data. In EWSN'06, 2006. |
Xie et al., “An additive reliability model for the analysis of modular software failure data”, Oct. 24, 1995, IEEE, p. 188-193. |
Groen et al., “Reliability data collection and analysis system”, Aug. 24, 2004, IEEE, p. 43-48. |
Camilo Rostoker, Alan Wagner, Holger Hoos, “A Parallel Workflow for Real-time Correlation and Clustering of High-Frequency Stock Market Data”, (Mar. 26-30, 2007), Parallel and Distributed Processing Symposium, 2007, IPDPS 2007. IEEE International pp. 1-10 [retrieved from http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4227944&isnumber=4227918]. |
Rana et al., An XML Based Component Model for Generating Scientific Applications and Performing Large Scale Simulations in a Meta-computing Environment, Google 2000, pp. 210-224. |
Santos-Neto et al., Tracking Usage in Collaborative Tagging Communities, Google 2007, pp. 1-8. |
Li et al. Collaborative Tagging Applications and Approaches, IEEE Sep. 2008, pp. 1-8 (14-21). |
D. Hinchcliffe, "A bumper crop of new mashup platforms", http://blogs.zdnet.com/Hinchcliffe/?p=111&tag=nl.e622. |
Narayanan, S., and McIlraith, S. 2002, Simulation, verification and automated composition of web services, in WWW'02. |
Traverso, P., and Pistore, M. 2004, Automated composition of semantic web services into executable processes, in ISWC'04. |
Marti Hearst, Design Recommendations for Hierarchical Faceted Search Interfaces, ACM SIGIR Workshop on Faceted Search, Aug. 2006. |
Lyritsis et al, “TAGs; Scalable threshold based algorithms for proximity computation in graphs”, ACM EDBT, pp. 295-306, 2011. |
Riabov et al., Wishful Search: Interactive Composition of Data Mashups, Google 2008, pp. 775-784. |
Habernal et al., Active Tags for Semantic Analysis, Google 2008, pp. 69-76. |
Comito et al, “Selectively based XML query processing in structured peer to peer networks”, ACM IDEAS, pp. 236-244, 2010. |
Jiang et al, “XML RL update language: syntax and semantics”, IEEE, pp. 810-816, 2010. |
Ma et al, “Mining web graphs for recommendations”, IEEE, pp. 1-14, 2011. |
Connor et al, “Key key value stores for efficiently processing graph data in the cloud”, IEEE, pp. 88-93, 2011. |
Baird, R.; Hepner, M.; Jorgenson, N.; Gamble, R., “Automating Preference and Change in Workflows,” Seventh International Conference on Composition-Based Software Systems (Feb. 25-29, 2008), pp. 184-193 [retrieved http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4464023&isnumber=4463987]. |
Pistore, M.; Barbon, F.; Bertoli, P.; Shaparau, D.; Traverso, P., "Planning and Monitoring Web Service Composition" (2004), AIMSA 2004, LNAI 3192, pp. 106-115 [retrieved from http://www.springerlink.com/content/21nucbh4rrjfk8av/fulltext.pdf]. |
Peer, J., "Web Service Composition As AI Planning—A Survey", (2005) [retrieved from http://decsai.ugr.es/~faro/CDoctorado/bibliografia/refPlanning4SW/LinkedDocuments/webservice-composition-as-aiplanning-pfwsc.pdf]. |
Hepner, M., "Dynamic Changes to Workflow instances of Web Services," (2007), University of Tulsa, [retrieved from http://www.seat.utulsa.edu/papers/Hepner07-Dissertation.pdf]. |
A. Stentz, The Focused D* Algorithm for Real-Time Replanning (IJCAI-1995). |
Akkiraju et al., “SEMAPLAN: Combining Planning with Semantic Matching to Achieve Web Service Composition”, American Association for Artificial Intelligence 2005, pp. 1-8. |
Bohannon et al, “Optimizing view queries to ROLEX to support navigable results trees”, ACM, pp. 1-12, 2002. |
Sheshagiri et al., “A Planner for Composing Services Described in DAML-S”, ACM 2003, pp. 1-5. |
Altinel et al., “Damia—A Data Mashup Fabric for Intranet Applications”, Sep. 28, 2007, pp. 1370-1373. |
Number | Date | Country | |
---|---|---|---|
20090276753 A1 | Nov 2009 | US |