Recent trends have led to a modularization and distribution of business applications and devices, which increases the importance of integrating applications and business processes. Due to modularization, the resulting integration scenarios or programs—essentially compositions of integration patterns (i.e., best practices in the form of integration operators)—become increasingly complex. While existing integration platform vendors may allow for the development of integration programs, they typically do not offer a formally verifiable foundation. As a result, the following types of problems may be common:
Together, these problems not only lead to high costs for the development and maintenance of such solutions (and, in case of errors, in productive business processes), but also cause frustration and a lack of trust in those solutions. Since application integration is central to most current IT solutions, trustworthy application integration may be desired. An application integration would be trustworthy, for example, if the integration solutions were intentional and formally verifiable.
It would therefore be desirable to provide trustworthy application integration in a secure, automatic, and efficient manner.
According to some embodiments, methods and systems may be associated with trustworthy application integration. A formalization platform may facilitate definition of pattern requirements by an integration developer. The formalization platform may also formalize single pattern compositions and compose single patterns to template-based formalized compositions. A correctness platform may then check for structural correctness of the formalized compositions composed by the formalization platform and execute a semantic transformation or binding to pattern characteristics and associated interactions. The correctness platform may also check composition semantics and generate a formal model. An implementation platform may translate the formal model generated by the correctness platform and configure implementation parameters of the translated formal model. The implementation platform may then execute the translated formal model in accordance with the configured implementation parameters.
Some embodiments comprise: means for facilitating, by a computer processor of a formalization platform, definition of pattern requirements by an integration developer; means for formalizing, by the formalization platform, single pattern compositions; means for composing, by the formalization platform, single patterns to template-based formalized compositions; means for checking, by a correctness platform, for structural correctness of the formalized compositions composed by the formalization platform; means for executing, by the correctness platform, a semantic transformation or binding to pattern characteristics and associated interactions; means for checking, by the correctness platform, composition semantics and generating a formal model; means for translating, by an implementation platform, the formal model generated by the correctness platform; means for configuring, by the implementation platform, parameters of the translated formal model; and means for executing, by the implementation platform, the translated formal model in accordance with the configured implementation parameters.
Some technical advantages of some embodiments disclosed herein are improved systems and methods to provide trustworthy application integration in a secure, automatic, and efficient manner.
In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of embodiments. However, it will be understood by those of ordinary skill in the art that the embodiments may be practiced without these specific details. In other instances, well-known methods, procedures, components and circuits have not been described in detail so as not to obscure the embodiments.
One or more specific embodiments of the present invention will be described below. In an effort to provide a concise description of these embodiments, all features of an actual implementation may not be described in the specification. It should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another. Moreover, it should be appreciated that such a development effort might be complex and time consuming, but would nevertheless be a routine undertaking of design, fabrication, and manufacture for those of ordinary skill having the benefit of this disclosure.
Embodiments described herein may provide a formal foundation for trustworthy application integration. Note that the integration patterns may be assumed to have been responsibly developed (i.e., their execution semantics are formally well-defined and can be verified). Embodiments described herein may include the following contributions:
As used herein, devices, including those associated with the system 200 and any other device described herein, may exchange information via any communication network which may be one or more of a Local Area Network (“LAN”), a Metropolitan Area Network (“MAN”), a Wide Area Network (“WAN”), a proprietary network, a Public Switched Telephone Network (“PSTN”), a Wireless Application Protocol (“WAP”) network, a Bluetooth network, a wireless LAN network, and/or an Internet Protocol (“IP”) network such as the Internet, an intranet, or an extranet. Note that any devices described herein may communicate via one or more such communication networks.
The platforms 210, 220, 230, including those associated with the system 200 and any other device described herein, may store information into and/or retrieve information from various data stores (e.g., a data storage device), which may be locally stored or reside remote from the platforms 210, 220, 230. Although a single formalization platform 210, correctness platform 220, and implementation platform 230 are shown in
An operator or administrator may access the system 200 via a remote device (e.g., a Personal Computer (“PC”), tablet, or smartphone) to view information about and/or manage operational information in accordance with any of the embodiments described herein. In some cases, an interactive graphical user interface display may let an operator or administrator define and/or adjust certain parameters (e.g., to define how optimization rules are applied) and/or provide or receive automatically generated recommendations or results from the system 200.
At S310, a computer processor of a formalization platform may facilitate definition of pattern requirements by an integration developer. At S320, the formalization platform may formalize single pattern compositions. At S330, the formalization platform may compose single patterns to template-based formalized compositions.
At S340, a correctness platform may check for structural correctness of the formalized compositions composed by the formalization platform. At S350, the correctness platform may execute a semantic transformation or binding to pattern characteristics and associated interactions. At S360, the correctness platform may check composition semantics and generate a formal model.
At S370, an implementation platform may translate the formal model generated by the correctness platform. At S380, the implementation platform may configure parameters of the translated formal model. The implementation platform may then execute the translated formal model in accordance with the configured implementation parameters at S390.
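By way of illustration only, the flow of S310 through S390 might be sketched as follows; all class, method, and parameter names (e.g., FormalizationPlatform, check_structure, timeout) are hypothetical and do not correspond to any particular vendor API:

```python
# Minimal sketch of the S310-S390 pipeline; names are illustrative only.

class FormalizationPlatform:
    def define_requirements(self, developer_input):              # S310
        return {"patterns": developer_input}

    def formalize_patterns(self, requirements):                  # S320
        return [{"pattern": p, "formal": True} for p in requirements["patterns"]]

    def compose(self, formalized):                               # S330
        return {"template": "composition", "nodes": formalized}

class CorrectnessPlatform:
    def check_structure(self, composition):                      # S340
        return len(composition["nodes"]) > 0

    def bind_characteristics(self, composition):                 # S350
        composition["bound"] = True
        return composition

    def check_semantics_and_generate_model(self, composition):   # S360
        return {"formal_model": composition}

class ImplementationPlatform:
    def translate(self, model):                                  # S370
        return {"runtime_model": model}

    def configure(self, translated, **params):                   # S380
        translated["params"] = params
        return translated

    def execute(self, configured):                               # S390
        print("executing with", configured["params"])

# Usage of the three platforms in sequence
f, c, i = FormalizationPlatform(), CorrectnessPlatform(), ImplementationPlatform()
comp = f.compose(f.formalize_patterns(f.define_requirements(["CBR", "JOIN"])))
assert c.check_structure(comp)
model = c.check_semantics_and_generate_model(c.bind_characteristics(comp))
i.execute(i.configure(i.translate(model), timeout=30))
```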
The analysis further results in a description of the execution semantics 410 of each pattern ps1, . . . , psi. The different conceptual aspects of these semantics 410 are grouped and formulated as requirements 420, each denoting a distinct concept relevant to more than one pattern 430, and thus summarizing all aspects of the domain. For example, in application integration, patterns 430 require control and data flow, transacted resources, time, etc.
Enterprise Application Integration (“EAI”) is the centerpiece of current on-premise, cloud, and device integration scenarios. Some embodiments described herein provide optimization strategies that may help reduce model complexity and improve process execution using design time techniques. In order to achieve this, embodiments may formalize compositions of Enterprise Integration Patterns (“EIPs”) based on their characteristics and provide optimization strategies using graph rewriting.
Note that EAI may compose EIPs from a catalog comprising the original patterns and recent additions. This can result in complex models that are often vendor-specific, informal and ad-hoc; optimizing such integration processes is desirable, but hard. In most cases, this is further complicated by data aspects being absent in the model. As a concrete motivation for a formal framework for data-aware integration process optimization, consider the following example: many organizations have started to connect their on-premise applications, such as Customer Relationship Management (“CRM”) systems, with cloud applications (such as cloud for customer available from SAP®) using integration processes. For example,
Embodiments described herein may develop a verification and static analysis framework for applying and reasoning about optimizations of data-aware integration patterns. Moreover, a graph-based representation of integration patterns may let optimizations be realized as graph rewriting rules. As used herein, the term “optimization” may be associated with a process that iteratively improves compositions but gives no guarantee of optimality. For brevity, embodiments may focus on common EAI optimization objectives: message throughput (runtime benchmarks), pattern processing latency (abstract cost model), and runtime-independent model complexity from the process modeling domain. Furthermore, embodiments may concentrate on pattern compositions within one integration process (not to or within message endpoints).
Note that some optimization techniques may have a main goal of reducing model complexity (i.e., the number of patterns) or “process simplification.” The cost reduction of these techniques can be measured by pattern processing time (latency, i.e., time required per operation) and model complexity metrics. Process simplification can be achieved by removing redundant patterns through techniques like Redundant Sub-process Removal (e.g., remove one of two identical sub-flows), Combine Sibling Patterns (e.g., remove one of two identical patterns), or Unnecessary Conditional Fork (e.g., remove redundant branching). The simplifications may require a formalization of patterns as a control graph structure (requirement R1), which helps to identify and deal with the structural change representation.
In addition, a reduction of data can be facilitated by pattern push-down optimizations of message-element-cardinality-reducing patterns, such as Early-Filter (for data; e.g., remove elements from the message content), Early-Mapping (e.g., apply message transformations), as well as message-reducing optimization patterns like Early-Filter (for messages; e.g., remove messages), Early-Aggregation (e.g., combine multiple messages to fewer ones), Early-Claim Check (e.g., store content and claim later without passing it through the pipeline), and Early-Split (e.g., cut one large message into several smaller ones). Measuring data reduction requires a cost model based on the characteristics of the patterns, as well as the data and element cardinalities. For example, the practical realizations for multimedia and hardware streaming show improvements especially for early-filter, split and aggregation, as well as moderate improvements for early mapping. This requires a formalization that is able to represent data or element flow (R2). Data reduction optimizations target message throughput improvements (i.e., processed messages per time unit), however, some may have a negative impact on the model complexity.
Still further, a parallelization of processes can be facilitated through transformations such as Sequence to Parallel (e.g., duplicate pattern or sequence of pattern processing), or, if not beneficial, reverted, e.g., by Merge Parallel. For example, good practical results have been shown for vectorization and hardware parallelization. Therefore, again, a control graph structure (R1) may be required. Although a main focus of parallelization is message throughput, heterogeneous variants may also improve latency. In both cases, parallelization may require additional patterns, which negatively impacts the model complexity. The opposite optimization of merging parallel processes mainly improves the model complexity and latency.
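As a toy illustration of these three objectives, a simplified cost model might be sketched as follows (assumptions: latency adds up along a sequence, and throughput is limited by the slowest pattern; the numbers are invented):

```python
# Toy cost model for a pattern composition (illustrative simplification):
# model complexity = number of patterns, latency = sum of pattern latencies,
# throughput = bottleneck (minimum) pattern throughput.
def complexity(patterns):
    return len(patterns)

def latency(patterns):
    return sum(p["latency_ms"] for p in patterns)

def throughput(patterns):
    return min(p["msgs_per_s"] for p in patterns)

seq = [{"name": "CBR", "latency_ms": 2, "msgs_per_s": 900},
       {"name": "MAP", "latency_ms": 5, "msgs_per_s": 400},
       {"name": "AGG", "latency_ms": 8, "msgs_per_s": 250}]
print(complexity(seq), latency(seq), throughput(seq))  # 3 15 250
```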
Besides control flow (as used in most of the related domains), a suitable formalization may be able to represent the control graph structure (R1) (including reachability and connectedness properties) and the data element flow (R2) between patterns (not within a pattern). Furthermore, the formalization must allow verification of correctness (R3) on a pattern-compositional level (i.e., each optimization produces a correct pattern composition), taking the inter-pattern data exchange semantics into account. In contrast to other approaches, embodiments described herein define a novel data-aspect representation of the extended EIPs and guarantee correctness.
Summarizing the requirements R1 through R3, a suitable formalization of integration patterns is graph-based, can represent the data element flow, and allows correctness checking. Hence, embodiments may utilize an Integration Pattern Typed Graph (“IPTG”) as an extended control flow graph. First fix some notation: a directed graph is given by a set of nodes P and a set of edges E⊆P×P. For a node p∈P, write ·p={p′∈P|(p′, p)∈E} for the set of direct predecessors of p, and p·={p″∈P|(p, p″)∈E} for the set of direct successors of p.
An IPTG is a directed graph with set of nodes P and set of edges E⊆P×P, together with a function type: P→T, where T={start, end, message processor, fork, structural join, condition, merge, external call}. An IPTG (P, E, type) is correct if:
In the definition, think of P as a set of extended EIPs that are connected by message channels in E, as in a pipes-and-filters architecture. The function type records what type of pattern each node represents. The first correctness condition says that an integration pattern has at least one source and one target, while the next three state that the cardinalities of the involved patterns coincide with the in- and out-degrees of the nodes in the graph representing them. The last condition states that the graph represents one integration pattern, not multiple unrelated ones, and that messages do not loop back to previous patterns.
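For illustration, these correctness conditions might be checked mechanically along the following lines (a minimal Python sketch; the per-type degree rules encode one plausible reading of the cardinality conditions and are assumptions, not the normative definition):

```python
from collections import defaultdict

# Illustrative check of the IPTG correctness conditions; the per-type degree
# rules are simplified assumptions based on the description above.
def iptg_correct(P, E, type_of):
    preds, succs = defaultdict(set), defaultdict(set)
    for (a, b) in E:
        succs[a].add(b)
        preds[b].add(a)
    # at least one source (start) and one target (end)
    types = {type_of[p] for p in P}
    if "start" not in types or "end" not in types:
        return False
    # pattern cardinalities must coincide with in- and out-degrees
    rules = {
        "start":             (lambda i: i == 0, lambda o: o >= 1),
        "end":               (lambda i: i >= 1, lambda o: o == 0),
        "message processor": (lambda i: i == 1, lambda o: o == 1),
        "fork":              (lambda i: i == 1, lambda o: o >= 2),
        "structural join":   (lambda i: i >= 2, lambda o: o == 1),
        "condition":         (lambda i: i == 1, lambda o: o >= 2),
        "merge":             (lambda i: i >= 2, lambda o: o == 1),
        "external call":     (lambda i: i == 1, lambda o: o >= 1),
    }
    for p in P:
        in_ok, out_ok = rules[type_of[p]]
        if not (in_ok(len(preds[p])) and out_ok(len(succs[p]))):
            return False
    # one connected composition (weak connectivity)
    seen, stack = set(), [next(iter(P))]
    while stack:
        n = stack.pop()
        if n in seen:
            continue
        seen.add(n)
        stack.extend(preds[n] | succs[n])
    if seen != set(P):
        return False
    # no loops back to previous patterns (acyclicity, Kahn's algorithm)
    indeg = {p: len(preds[p]) for p in P}
    queue = [p for p in P if indeg[p] == 0]
    removed = 0
    while queue:
        n = queue.pop()
        removed += 1
        for s in succs[n]:
            indeg[s] -= 1
            if indeg[s] == 0:
                queue.append(s)
    return removed == len(P)

# a minimal start -> message processor -> end pipeline
P = {"s", "mp", "e"}
E = {("s", "mp"), ("mp", "e")}
print(iptg_correct(P, E, {"s": "start", "mp": "message processor", "e": "end"}))  # True
```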
To represent the data flow, i.e., the basis for the optimizations, the control flow has to be enhanced with (a) the data that is processed by each pattern, and (b) the data exchanged between the patterns in the composition. The data processed by each pattern (a) is described as a set of “pattern characteristics,” formally defined as follows. A pattern characteristic assignment for an IPTG (P, E, type) is a function char: P→2^PC, assigning to each pattern a subset of the set

PC=({MC}×ℕ×ℕ)∪({ACC}×{ro, rw})∪({MG}×𝔹)∪({CND}×2^BExp)∪({PRG}×Exp×(ℚ≥0×(ℚ≥0∪{∞}))),
where 𝔹 is the set of Booleans, BExp the set of Boolean expressions, Exp the set of program expressions, and MC, ACC, MG, CND, PRG some distinct symbols. The property and value domains in the pattern characteristic definition are based on the pattern descriptions, and could be extended if required. A characteristic (MC, n, k) represents a message cardinality of n:k, (ACC, x) the message access, depending on whether x is read-only (“ro”) or read-write (“rw”), and a characteristic (MG, y) represents whether the pattern is message generating, depending on the Boolean y. Finally, (CND, {c1, . . . , cn}) represents the conditions c1, . . . , cn used by the pattern to route messages, and (PRG, (p, (v, v′))) the program p used by the pattern for message translations, together with its timing window (v, v′).
For example, the characteristics of a content-based router CBR may be char(CBR)={(MC, 1:1), (ACC, ro), (MG, false), (CND, {cnd1, . . . , cndn−1})}, because of the workflow of the router: it receives exactly one message, then evaluates up to n−1 routing conditions cnd1 up to cndn−1 (one for each outgoing channel), until a condition matches. The original message is then rerouted read-only (in other words, the router is not message generating) on the selected output channel, or forwarded to the default channel if no condition matches. The data exchange between the patterns (b) is based on input and output contracts (similar to data parallelization contracts). These contracts specify how the data is exchanged in terms of required message properties of a pattern during the data exchange, formally defined as a “pattern contract.” A pattern contract assignment for an IPTG (P, E, type) is a function contr: P→CPT×2^EL, assigning to each pattern a function of type
CPT={signed,encrypted,encoded}→{yes,no,any}
and a subset of the set
EL=MS×2^D
where MS={HDR, PL, ATTCH}, and D is a set of data elements (the concrete elements of D are not important, and will vary with the application domain). The function of type CPT may be represented by its graph, leaving out the attributes that are mapped to any, when convenient.
Each pattern will have an inbound and an outbound pattern contract, describing the format of the data it is able to receive and send, respectively—the role of pattern contracts is to make sure that adjacent inbound and outbound contracts match. The set CPT in a contract represents integration concepts, while the set EL represents data elements and the structure of the message: its headers (HDR, H), its payload (PL, Y), and its attachments (ATTCH, A).
For example, a content-based router is not able to process encrypted messages. Recall that its pattern characteristics included a collection of routing conditions: these might require read-only access to message elements such as certain headers h1 or payload elements e1, e2. Hence, the input contract for a router mentioning these message elements is:
inContr(CBR)=({(encrypted,no)},{(HDR,{h1}),(PL,{e1,e2})}).
Since the router forwards the original message, the output contract is the same as the input contract.
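For concreteness, the router's characteristics and contracts might be transcribed as plain data structures (an illustrative encoding; the dictionary and tuple layouts are assumptions, not a normative format):

```python
# The content-based router example as data (illustrative encoding).
char_CBR = {
    "MC":  (1, 1),                    # message cardinality 1:1
    "ACC": "ro",                      # read-only message access
    "MG":  False,                     # not message generating
    "CND": ("cnd1", "cnd2", "cnd3"),  # routing conditions (up to n-1)
}
# inContr(CBR) = ({(encrypted, no)}, {(HDR, {h1}), (PL, {e1, e2})})
in_contract_CBR = ({("encrypted", "no")},
                   {("HDR", frozenset({"h1"})), ("PL", frozenset({"e1", "e2"}))})
out_contract_CBR = in_contract_CBR    # the router forwards the original message
```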
Let (C, E)∈2^CPT×2^EL be a pattern contract, and X⊆2^CPT×2^EL a set of pattern contracts. Write XCPT={C′|(∃E′)(C′, E′)∈X} and XEL={E′|(∃C′)(C′, E′)∈X}. Say that (C, E) matches X, in symbols match((C, E), X), if the following condition holds: (∀(p, x)∈C)(x=any ∨ (∀C′∈XCPT)(∃(p′, y)∈C′)(p=p′ ∧ (y=any ∨ y=x))) ∧ (∀(m, e)∈E)(e⊆∪{e′|(∃E′∈XEL)((m, e′)∈E′)}).
Now consider when an inbound contract Kin matches the outbound contracts K1, . . . , Kn of its predecessors. This is the case if (i) for all integration concepts that are important to Kin, all contracts Ki either agree, or at least one of Kin or Ki accepts any value (concept correctness); and (ii) together, K1, . . . , Kn supply all the message elements that Kin needs (data element correctness).
Since pattern contracts can refer to arbitrary message elements, a formalization of an integration pattern can be quite precise. On the other hand, unless care is taken, the formalization can easily become specific to a particular pattern composition. In practice, it is often possible to restrict attention to a small number of important message elements, which makes the formalization manageable. Putting everything together, pattern compositions may be formalized as Integration Pattern Contract Graphs (“IPCGs”) with pattern characteristics and inbound and outbound pattern contracts for each pattern. In particular, an IPCG is a tuple (P, E, type, char, inContr, outContr) where (P, E, type) is an IPTG, char: P→2^PC is a pattern characteristics assignment, and inContr: Πp∈P(2^CPT×2^EL)^|·p| and outContr: Πp∈P(2^CPT×2^EL)^|p·| are pattern contract assignments—one for each incoming and outgoing edge of the pattern, respectively—called the inbound and outbound contract assignment, respectively. It is correct if the underlying IPTG (P, E, type) is correct, and each inbound contract matches the outbound contracts of the pattern's predecessors, i.e., (∀p)(type(p)=start ∨ match(inContr(p), {outContr(p′)|p′∈·p})). Note that two IPCGs are isomorphic if there is a bijective function between their patterns that preserves edges, types, characteristics, and contracts.
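The match condition might be prototyped as follows (a hedged Python sketch of the definitions above; the encoding of contracts as sets of pairs, and the helper name match, are assumptions):

```python
# Sketch of match((C, E), X): concept correctness plus data element
# correctness. A contract is (C, E): C is a set of (property, value) pairs
# with value in {"yes", "no", "any"}; E is a set of (slot, elements) pairs.
def match(contract, X):
    C, E = contract
    X_CPT = [C2 for (C2, _) in X]
    X_EL = [E2 for (_, E2) in X]
    # (i) concept correctness: all predecessors agree or accept "any"
    for (p, x) in C:
        if x == "any":
            continue
        for C2 in X_CPT:
            if not any(p == p2 and (y == "any" or y == x) for (p2, y) in C2):
                return False
    # (ii) data element correctness: predecessors jointly supply all elements
    for (slot, needed) in E:
        supplied = set()
        for E2 in X_EL:
            for (slot2, elems) in E2:
                if slot2 == slot:
                    supplied |= set(elems)
        if not set(needed) <= supplied:
            return False
    return True

# The router contract from the example above, matched against one predecessor:
in_CBR = ({("encrypted", "no")},
          {("HDR", frozenset({"h1"})), ("PL", frozenset({"e1", "e2"}))})
out_pred = ({("encrypted", "no")},
            {("HDR", frozenset({"h1"})), ("PL", frozenset({"e1", "e2"}))})
print(match(in_CBR, [out_pred]))  # True
```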
One improvement of this composition is depicted in
Some embodiments may be associated with semantics using timed db-nets, timed db-nets with boundaries, etc. Consider a progression from pattern contract graphs to timed db-nets with boundaries. Initially, a pattern composition may undergo graph definition to create a pattern contract graph. The pattern contract graph may then undergo a translation to create a timed db-net with boundaries (in accordance with a timed db-net processed for boundary definitions).
As used herein, a “timed db-net” builds on a db-net (𝔇, 𝒫, ℒ, 𝒩), whose control layer 𝒩 is a (𝔇, ℒ)-typed control layer, i.e., a tuple (P, T, Fin, Fout, Frb, color, query, guard, action), where: (i) P=Pc⊎Pv is a finite set of places partitioned into control places Pc and view places Pv, (ii) T is a finite set of transitions, (iii) Fin is an input flow from P to T, (iv) Fout and Frb are respectively a normal output flow and a roll-back flow from T to Pc, (v) color is a color assignment over P (mapping P to a Cartesian product of data types), (vi) query is a query assignment from Pv to Q (mapping the results of Q as tokens of Pv), (vii) guard is a transition guard assignment over T (mapping each transition to a formula over its input inscriptions), and (viii) action is an action assignment from T to A (mapping some transitions to actions triggering updates over the persistence layer).
The LTS of a timed db-net ℬ is Γℬ=⟨S, s0, →⟩, with the (possibly infinite) set of ℬ-snapshots S, and the transition relation →⊆S×T×S over states, labeled by transitions T, where both S and → are defined by simultaneous induction as the smallest sets s.t.: (i) s0∈S; and (ii) given a ℬ-snapshot s∈S, for every transition t∈T, binding σ, and ℬ-snapshot s′, if s[t, σ⟩s′ then s′∈S and (s, t, s′)∈→. Moreover, a ℬ-snapshot ⟨I, m⟩ is a database instance I and corresponding marking m in the net. A timed db-net has a special marking (·, 0.0), with an age of value 0.0 appended.
A “timed db-net” may be defined as a tuple (ℬ, τ) with ℬ=(𝔇, 𝒫, ℒ, 𝒩) a db-net, and τ: T→ℚ≥0×(ℚ≥0∪{∞}), where T is the set of transitions from the control layer 𝒩, and ℚ≥0 the set of non-negative rational numbers. For a transition t, let τ(t)=(v1, v2); v1≤v2 (with v1<∞ always) may then be required.
The function τ in the definition is called a time assignment function. The default choice for τ is to map transitions to the pair (0, ∞), which corresponds to a standard db-net transition.
Given a transition t, adopt the following graphical conventions: (i) if τ(t)=(0, ∞), then no temporal label is shown for t; (ii) if τ(t) is of the form (v, v), attach label “[v]” to t; (iii) if τ(t) is of the form (v1, v2) with v1≠v2, attach label “[v1, v2]” to t.
A “db-net” may be defined as a tuple (𝔇, 𝒫, ℒ, 𝒩), where:
Some timed db-nets may be considered “open,” in the sense that they have “ports” or “open boundaries” for communicating with the outside world: tokens can be received and sent on these ports. Thus, a boundary configuration may be defined that records what is expected from the external world for the net to be functioning. This is not the most general notion of open db-nets, but it is general enough for these purposes.
A “boundary configuration” may be defined as follows. Let 𝔇 be a type domain and ℒ=(Q, A) a data layer over it. A boundary configuration over (𝔇, ℒ) is an ordered finite list of colors

c∈{D1× . . . ×Dm|Di∈𝔇}.

Such a list may be written as c1⊗ . . . ⊗cn, and I for the empty list.
The length of a boundary configuration list may give the number of “open ports” of the boundary. Each color c in the list may describe the type of data to be sent/received on the port. An open timed db-net has a left and a right boundary, both described by boundary configurations.
A “timed db-net with boundaries” may then be defined as follows. Let 𝔇, 𝒫, and ℒ be a type domain, a persistence layer, and a data layer respectively, and let ⊗i<m ci and ⊗i<n c′i be boundaries over (𝔇, ℒ). A control layer with left boundary ⊗i<m ci and right boundary ⊗i<n c′i is a tuple (P, T, Fin, Fout, Frb, color, query, guard, action) which is a control layer over (𝔇, ℒ), except that Fin is a flow from P⊎{1, . . . , m} to T, and Fout and Frb are flows from T to Pc⊎{1, . . . , n}, i.e., (i) P=Pc⊎Pv is a finite set of places partitioned into control places Pc and view places Pv, (ii) T is a finite set of transitions, (iii) Fin is an input flow from P⊎{1, . . . , m} to T (assume color(i)=ci), (iv) Fout and Frb are respectively an output and roll-back flow from T to Pc⊎{1, . . . , n} (assume color(j)=c′j), (v) color is a color assignment over P (mapping P to a Cartesian product of data types), (vi) query is a query assignment from Pv to Q (mapping the results of Q as tokens of Pv), (vii) guard is a transition guard assignment over T (mapping each transition to a formula over its input inscriptions), and (viii) action is an action assignment from T to A (mapping some transitions to actions triggering updates over the persistence layer). One may write (ℬ, τ): ⊗i<m ci→⊗i<n c′i for timed db-nets with control layers with the given boundaries, and call such a tuple a “timed db-net with boundaries.”
Note in particular that a timed db-net with empty boundaries is by definition a timed db-net. One can extend the ⊗ operation on colors to nets, by defining N⊗N′: c⃗⊗c⃗′→d⃗⊗d⃗′, for N: c⃗→d⃗ and N′: c⃗′→d⃗′, to be the two nets N and N′ next to each other; this gives a tensor product or “parallel” composition of nets. The point of being explicit about the boundaries of nets is to enable also a “sequential” composition of nets, whenever the boundaries are compatible. The following definition uses the notation X⊎Y for the disjoint union of X and Y, with injections inX: X→X⊎Y and inY: Y→X⊎Y. For ƒ: X→Z and g: Y→Z, write [ƒ, g]: X⊎Y→Z for the function with [ƒ, g](inX(x))=ƒ(x) and [ƒ, g](inY(y))=g(y).
The term “synchronization” may then be defined as follows. Let (ℬ, τ): ⊗i<m ci→⊗i<n c′i and (ℬ′, τ′): ⊗i<n c′i→⊗i<k c″i be timed db-nets with boundaries, with ℬ=(𝔇, 𝒫, ℒ, 𝒩) and ℬ′=(𝔇′, 𝒫′, ℒ′, 𝒩′). Their composition may be defined as

((𝔇∪𝔇′, 𝒫∪𝒫′, ℒ∪ℒ′, 𝒩″), τ″): ⊗i<m ci→⊗i<k c″i

(where a union of tuples is pointwise) with:
The composed net consists of the two constituent nets, as well as n new control places xi for communicating between the nets. These places take their color from the shared boundary. Some embodiments may only use nets with boundaries to plug together open nets. In other words, only consider the execution semantics of nets with no boundary; since these are literally ordinary timed db-nets, they can inherit their execution semantics from those. Composition of nets behaves as expected: it is associative, and there is an “identity” net which is a unit for composition. All in all, this means that nets with boundaries are the morphisms of a strict monoidal category, as follows.
For any timed db-nets N, M, K with compatible boundaries, there may be N∘(M∘K)=(N∘M)∘K, and for each boundary configuration c1⊗ . . . ⊗cn, there is an identity net idc: c1⊗ . . . ⊗cn→c1⊗ . . . ⊗cn such that idc∘N=N and M∘idc=M for every M, N with compatible boundaries. Furthermore, for every N, M, K, there may be N⊗(M⊗K)=(N⊗M)⊗K. As proof of such a statement, note that associativity for both ∘ and ⊗ is obvious. The identity net for c1⊗ . . . ⊗cn is the net with exactly n places x1, . . . , xn, with color(xi)=ci. As a result, one can use a graphical “string diagram” language to define nets and their compositions.
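For intuition, the sequential (∘) and parallel (⊗) composition of boundaries might be prototyped as follows (a Python sketch that models only the boundary bookkeeping, not the full net semantics; the Net class and its methods are hypothetical):

```python
# Boundary bookkeeping for net composition (nets reduced to their boundaries).
class Net:
    def __init__(self, name, left, right):    # left/right: lists of colors
        self.name, self.left, self.right = name, left, right

    def compose(self, other):                  # sequential composition
        if self.right != other.left:
            raise ValueError("incompatible boundaries")
        return Net(f"({self.name};{other.name})", self.left, other.right)

    def tensor(self, other):                   # parallel (tensor) composition
        return Net(f"({self.name}|{other.name})",
                   self.left + other.left, self.right + other.right)

def identity(colors):                          # unit for sequential composition
    return Net("id", colors, colors)

a = Net("A", ["Msg"], ["Msg", "Log"])
b = Net("B", ["Msg", "Log"], ["Msg"])
assert a.compose(identity(a.right)).right == a.right   # identity is a unit
print(a.compose(b).name, a.tensor(b).left)
```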
Consider now the translation of IPCGs to timed db-nets with boundaries. First assign a timed db-net with boundaries ⟦p⟧ for every node p in an IPCG. Which timed db-net with boundaries is to be constructed depends on type(p). If the cardinality of p is k:m, then the timed db-net with boundaries will be of the form ⟦p⟧: ⊗i=1..k colorin(p)i→⊗j=1..m colorout(p)j, where the colors colorin(p)i and colorout(p)j are defined depending on the input and output contracts of the pattern, respectively:
This incorporates the data elements of the input and output contracts into the boundary of the timed db-net, since these are essential for the dataflow of the net. Note that one may also incorporate the remaining concepts from the contracts, such as signatures, encryption, and encodings, into the translation. More precisely, the translation uses the concrete characteristics required for the type type(p) and the pattern category of p from the IPTG definition:
Since the pattern categories subsume all relevant patterns, one may subsequently define the translation for each pattern category and specify timed db-net with boundary templates for each of the pattern categories.
⟦pstart⟧: I→colorout(pstart) as shown by start elements 1010 in
⟦pend⟧: colorin(pend)→I.
⟦pfork⟧: colorin(pfork)→⊗j=1..n colorout(pfork)j as shown in
⟦pjoin⟧: ⊗j=1..m colorin(pjoin)j→colorout(pjoin) as shown in
⟦pcfork⟧: colorin(pcfork)→⊗j=1..n colorout(pcfork)j as shown in
⟦pmp⟧: colorin(pmp)→colorout(pmp) as shown in
⟦pmerge⟧: colorin(pmerge)→colorout(pmerge) as shown in
⟦pcall⟧: colorin(pcall)→colorout(pcall)1⊗colorout(pcall)2 as shown in
Now consider how to translate not just individual nodes from an IPCG, but how to also take the edges into account. Recall that a pattern contract also represents concepts, i.e., properties of the exchanged data (e.g., whether a pattern is able to process or produce signed, encrypted, or encoded data). A message can only be sent from one pattern to another if their contracts match, i.e., if they agree on these properties. To reflect this in the timed db-nets semantics, enrich all color sets to also keep track of this information: given a place P with color set C, construct the color set C×{yes, no}³, where the color (x, bsign, bencr, benc) is intended to mean that the data value x is respectively signed, encrypted, and encoded (or not) according to the yes/no values bsign, bencr, and benc. To enforce the contracts, also make sure that every token entering an input place chin is guarded according to the input contract by creating a new place chin′ and a new transition from chin′ to chin, which conditionally copies tokens whose properties match the contract. The new place chin′ replaces chin as an input place. Dually, for each output place chout create a new place chout′ and a new transition from chout to chout′ which ensures that all tokens satisfy the output contract. The new place chout′ replaces chout as an output place. Formally, the construction is as follows:
Let X=(ℬ, τ): ⊗i<m ci→⊗i<n c′i be a timed db-net with boundaries and C⃗=(IC0, . . . , ICm−1, OC0, . . . , OCn−1)∈CPT^(m+n). Define the timed db-net with boundaries XCPT(C⃗)=(ℬ′, τ): ⊗i<m(ci×{yes, no}³)→⊗i<n(c′i×{yes, no}³) with:
The pattern contract construction can again be realized as a template translation on an inter-pattern level. The templates in
Consider two examples to gain an understanding of the construction.
A join router may structurally combine many incoming message channels to one outgoing message channel without accessing the data (cf. {(MC, 1:1), (ACC, ro), (MG, false), (CND, Ø)}). While the data format (i.e., the data elements EL) has to be checked during the composition of the boundary, the runtime perspective of the boundary (x, p, q, r) is any for p, q, r in the input and output.
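At the token level, the contract-enforcement construction described above might be sketched as follows (illustrative only; the guard function and tuple layout are assumptions):

```python
# Token-level sketch of contract enforcement: every token carries
# (payload, signed, encrypted, encoded), and a guard transition only copies
# tokens whose flags match the contract ("any" imposes no constraint).
def guard(contract, token):
    flags = dict(zip(("signed", "encrypted", "encoded"), token[1:]))
    return all(want == "any" or flags[prop] == want
               for prop, want in contract.items())

in_contract = {"signed": "any", "encrypted": "no", "encoded": "any"}
tokens = [("msg1", "yes", "no", "no"), ("msg2", "no", "yes", "no")]
ch_in = [t for t in tokens if guard(in_contract, t)]  # ch_in' -> ch_in copy
print(ch_in)  # only msg1 passes: it is not encrypted
```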
With respect to synchronizing pattern compositions and determining the correctness of a translation, define the full translation of a correct IPCG G. For the translation to be well-defined, only data element correctness of the graph is needed. Concept correctness may be used to show that in the nets in the image of the translation, tokens can always flow from the translation of the start node to the translation of the end node.
Let a correct integration pattern contract graph G be given. For each node p, consider the timed db-net:
The graphical language introduced previously can be used to compose these nets according to the edges of the graph. The resulting timed db-net is then well-defined, and has the option to complete, i.e., from each marking reachable from a marking with a token in some input place, it is possible to reach a marking with a token in an output place. Since the graph is assumed to be correct, all input contracts match the output contracts of the nets composed with them, which by the data element correctness means that the boundary configurations match, so that the result is well-defined.
To see that the constructed net also has the option to complete, first note that the interpretations of basic patterns described herein do (in particular, one transition is always enabled in the translation of a conditional fork pattern in
Embodiments may be used to evaluate the translation in two case studies of real-world integration scenarios: the replicate material scenario and a predictive machine maintenance scenario. The former is an example of hybrid integration, and the latter of device integration. The aim is to observe different aspects of the following hypotheses:
For each of the scenarios, give an IPCG with matching contracts (→H1), translate it to a timed db-net with boundaries, and show how its execution can be simulated (→H2). The scenarios are both taken from a cloud platform integration solution catalog of reference integration scenarios.
An IPCG 800 representing an integration process for the replication of material from an enterprise resource planning or customer relationship management system to a cloud system was illustrated in
With respect to translation to timed db-net with boundaries, first translate each single pattern from
Notably, constructing an IPCG requires less technical knowledge (such as knowledge of the particularities of timed db-nets), but still enables correct pattern compositions on an abstract level. While the CPT part of the pattern contracts (e.g., encrypted, signed) could be derived and translated automatically from a scenario in a yet-to-be-defined modeling language, many aspects, like the elements EL as well as the configuration of the characteristics by enrichment and mapping programs, require a technical understanding of IPCGs and the underlying scenarios. As such, IPCGs can be considered a suitable intermediate representation of pattern compositions. The user might still prefer a more appealing graphical modeling language on top of IPCGs. Thus, one may conclude: (1) IPCG and timed db-net are correct with respect to composition and execution semantics (→H1, H2); (2) IPCG is more abstract than timed db-net, however, still quite technical; and (3) tool support for automatic construction and translation may be preferable.
Some embodiments are associated with an IPCG representing a predictive maintenance create notification scenario that connects machines with Enterprise Resource Planning (“ERP”) and a Plant Design Management System (“PDMS”). Add all pattern characteristics and data, which provides sufficient information for the translation to timed db-nets with boundaries.
With respect to translation to timed db-net with boundaries, again translate each single pattern 2300 from
In the second step, refine the timed db-net with boundaries to also take contract concepts into account by the construction in the IPCG definition. This ensures the correctness of the types of data exchanged between patterns, and follows directly from the correctness of the corresponding IPCG. Other contract properties such as encryption, signatures, and encodings are checked through the transition guards.
With respect to optimization strategy realization, graph rewriting provides a visual framework for transforming graphs in a rule-based fashion. A graph rewriting rule is given by two embeddings of graphs L←K→R, where L represents the left hand side of the rewrite rule, R the right hand side, and K their intersection (the parts of the graph that should be preserved by the rule). A rewrite rule can be applied to a graph G after a match of L in G has been given as an embedding L→G; this replaces the match of L in G by R. The application of a rule is potentially non-deterministic: several distinct matches can be possible. Visually, a rewrite rule may be represented by a left hand side and a right hand side graph with dashed portions and solid portions: dashed parts are shared and represent K, while the solid parts are to be deleted in the left hand side, and inserted in the right hand side, respectively. For instance,
Formally, the rewritten graph may be constructed using a Double-Pushout (“DPO”) from category theory. DPO rewritings might be used, for example, because rule applications are side-effect free (e.g., no “dangling” edges) and local (i.e., all graph changes are described by the rules). Additionally, Habel and Plump's relabeling DPO extension may be used to facilitate the relabeling of nodes in partially labeled graphs. In
A redundant sub-process optimization may remove redundant copies of the same sub-process within a process. With respect to change primitives, the rewriting is given by the rule 2510 in
Sibling patterns have the same parent node in the pattern graph (e.g., they follow a non-conditional forking pattern) with channel cardinality of 1:1. Combining them means that only one copy of a message travels through the graph instead of two. For this transformation to be correct in general, the siblings also need to be side-effect free, i.e., make no external calls, although this may not be captured by the correctness criteria. With respect to change primitives, the rule 2520 is given in
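A simplified, set-based version of such rule application might look as follows (a Python sketch in the spirit of DPO rewriting; the apply_rule helper and the toy sibling instance are assumptions, and contract checks are omitted):

```python
# Simplified rule application in the spirit of DPO rewriting. L, K, R are
# node sets with K = L ∩ R (the preserved interface); edges are node pairs.
def apply_rule(G_nodes, G_edges, match, L, K, R, R_edges):
    """match embeds L into G: an injective map from L-nodes to G-nodes."""
    deleted = {match[n] for n in L - K}
    matched = {match[n] for n in L}
    # DPO-style "no dangling edges": deleted nodes may only touch matched nodes
    for (a, b) in G_edges:
        if (a in deleted or b in deleted) and not (a in matched and b in matched):
            raise ValueError("dangling edge: rule not applicable at this match")
    nodes = (G_nodes - deleted) | (R - K)             # insert fresh R-only nodes
    emb = {**{n: match[n] for n in K}, **{n: n for n in R - K}}
    edges = {e for e in G_edges if e[0] not in deleted and e[1] not in deleted}
    edges |= {(emb[a], emb[b]) for (a, b) in R_edges}
    return nodes, edges

# Toy instance of "combine sibling patterns": two identical siblings p1, p2
# after a fork collapse into one.
G_nodes = {"fork", "p1", "p2", "end"}
G_edges = {("fork", "p1"), ("fork", "p2"), ("p1", "end"), ("p2", "end")}
L, K, R = {"F", "A", "B", "S"}, {"F", "A", "S"}, {"F", "A", "S"}
match = {"F": "fork", "A": "p1", "B": "p2", "S": "end"}
print(apply_rule(G_nodes, G_edges, match, L, K, R, {("F", "A"), ("A", "S")}))
```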
Consider data reduction optimization strategies, which mainly target improvements of the message throughput (including reducing element cardinalities). These optimizations require that pattern input and output contracts are regularly updated with snapshots of element datasets ELin and ELout from live systems, e.g., from experimental measurements through benchmarks.
An early-filter may be associated with a filter pattern that can be moved to or inserted prior to some of its successors to reduce the data to be processed. The following types of filters may be differentiated:
Both patterns are message processors in the sense of
Early-mapping may be associated with mapping that reduces the number of elements in a message to increase the message throughput. With respect to change primitives, the rule 2620 is given in
Early-aggregation may be associated with a micro-batch processing region, i.e., a subgraph that contains patterns able to process multiple messages combined into a multi-message or into one message with multiple segments, with an increased message throughput. The optimal number of aggregated messages is determined by the highest batch-size for the throughput ratio of the pattern with the lowest throughput, if latency is not considered. With respect to change primitives, the rule 2710 is given in
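One plausible reading of this batch-size choice is sketched below (illustrative Python; the benchmark numbers are invented, and the selection criterion is an assumption based on the description above):

```python
# Hedged sketch: pick the batch size for which the measured throughput of the
# bottleneck (lowest-throughput) pattern in the region is highest.
bench = {  # batch_size -> msgs/s per pattern in the micro-batch region
    1:  {"MAP": 400, "AGG": 250},
    10: {"MAP": 900, "AGG": 700},
    50: {"MAP": 950, "AGG": 650},
}

def optimal_batch(bench):
    return max(bench, key=lambda b: min(bench[b].values()))

print(optimal_batch(bench))  # 10: bottleneck throughput 700 beats 250 and 650
```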
Consider now an early-claim check optimization. If a subgraph does not contain a pattern with message access, the message payload can be stored intermediately persistently or transiently (depending on the quality of service level) and not moved through the subgraph. For instance, this applies to subgraphs consisting of data independent control flow logic only, or those that operate entirely on the message header (e.g., header routing). With respect to change primitives, the rule 2720 is given in
Now consider an early-split optimization. Messages with many segments can be reduced to several messages with fewer segments, thereby reducing the processing required per message. A segment is an iterable part of a message, like a list of elements. When such a message grows bigger, the message throughput of a set of adjacent patterns might decrease, compared to the expected performance for a single segment. This phenomenon is referred to as a segment bottleneck sub-sequence. Algorithmically, these bottlenecks could be found, e.g., using maxflow-mincut techniques based on workload statistics of a scenario. The splitter (SP) node is a message processor from
According to some embodiments, parallelization optimization strategies may be used to increase message throughput. These optimizations may require experimentally measured message throughput statistics, e.g., from benchmarks. Consider, for example, a sequence to parallel optimization. A bottleneck sub-sequence with channel cardinality 1:1 can also be handled by distributing its input and replicating its logic. The parallelization factor is the average message throughput of the predecessor and successor of the sequence divided by two, which denotes the improvement potential of the bottleneck sub-sequence. The goal is not to exceed the mean of predecessor and successor throughput with the improvement, to avoid iterative optimization. Hence, the optimization is only executed if the parallel sub-sequence reaches lower throughput than their minimum. With respect to change primitives, the rule 2910 is given in
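A hedged sketch of this heuristic follows (Python; capping the factor at the mean of the neighbors' throughput is one possible interpretation of the description above, not the normative rule):

```python
# One possible reading of the sequence-to-parallel heuristic (assumptions:
# throughput in msgs/s; the factor is grown only while the parallel region
# stays at or below the mean throughput of its neighbors).
def seq_to_parallel_factor(pred_tps, seq_tps, succ_tps):
    target = (pred_tps + succ_tps) / 2           # improvement potential
    if seq_tps >= min(pred_tps, succ_tps):
        return 1                                  # not a bottleneck: skip
    factor = 1
    while seq_tps * (factor + 1) <= target:       # do not overshoot the mean
        factor += 1
    return factor

print(seq_to_parallel_factor(pred_tps=800, seq_tps=200, succ_tps=600))  # 3
```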
In a merge parallel optimization, balancing fork and join router realizations may limit the throughput in some runtime systems, so that a parallelization decreases the throughput. This is called a limiting parallelization, defined as the situation in which a fork or a join has smaller throughput than a pattern in the following sub-sequence. With respect to change primitives, the rule 2920 is given in
A heterogeneous parallelization consists of parallel sub-sequences that are not isomorphic. In general, two subsequent patterns Pi and Pj can be parallelized, if the predecessor pattern of Pi fulfills the input contract of Pj, Pi behaves read-only with respect to the data element set of Pj, and the combined outbound contracts of Pi and Pj fulfill the input contract of the successor pattern of Pj. With respect to change primitives, the rule 3000 is given in
Note that any of the data reduction optimization techniques might also be applied in a “pushdown to endpoint” scenario by extending the placement to the message endpoints, with similar contracts.
Reduce interaction optimization strategies may reduce interactions to target a more resilient behavior of an integration process. For example, when endpoints fail, different exceptional situations have to be handled on the caller side. In addition, this can come with long timeouts, which can block the caller and increase latency. Knowing that an endpoint is unreliable can speed up processing, by immediately falling back to an alternative. With respect to change primitives, the rule 3110 is given in
A reduce requests optimization might be associated with a message-limited endpoint, i.e., an endpoint that is not able to handle a high rate of requests and can become unresponsive or fail. To avoid this, the caller can notice the limitation (e.g., via TCP back-pressure) and react by reducing the number or frequency of requests. This can be done by employing a throttling pattern (or even a sampling pattern), which removes messages. An aggregator can also help to combine messages into multi-messages. With respect to change primitives, the rewriting is given by the rule 3200 in
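A minimal throttling sketch is shown below (illustrative Python; the Throttler class is hypothetical, and a production implementation would also handle back-pressure signals and concurrency):

```python
# Minimal throttling sketch for a message-limited endpoint: delay sends so
# that the request rate never exceeds max_rate_per_s.
import time

class Throttler:
    def __init__(self, max_rate_per_s):
        self.interval = 1.0 / max_rate_per_s
        self.next_slot = 0.0

    def send(self, message, endpoint):
        now = time.monotonic()
        if now < self.next_slot:
            time.sleep(self.next_slot - now)   # back off instead of flooding
        self.next_slot = max(now, self.next_slot) + self.interval
        endpoint(message)

t = Throttler(max_rate_per_s=5)
for i in range(3):
    t.send(f"msg{i}", endpoint=print)
```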
Note that optimizations might not change the input-output behavior of the pattern graphs in the timed db-nets semantics, i.e., if a rule rewrites G to G′, then ⟦G⟧ has the same observable behavior as ⟦G′⟧. Consider a bisimulation argument, and let Γℬ=⟨S, s0, →⟩ and Γℬ′=⟨S′, s0′, →′⟩ be the associated labelled transition systems of two timed db-nets ℬ and ℬ′ with the same boundaries. A ℬ-snapshot (I, m) is equivalent to a ℬ′-snapshot (I′, m′), written (I, m)≈(I′, m′), if I=I′, and m and m′ agree on output places, i.e., for every output place p with m(p)=(α, γ, age) and m′(p)=(α′, γ′, age′), α=α′ holds for the elements that are in the message α, except those that are not required by any endpoint γ (usually γ=Ø), where age is the timed db-net age information.

Further, say that ℬ is equivalent to ℬ′, written ℬ˜ℬ′, if whenever s0→*(I, m) then there is (I′, m′) such that s0′→′*(I′, m′) and (I, m)≈(I′, m′), and whenever s0′→′*(I′, m′) then there is (I, m) such that s0→*(I, m) and (I′, m′)≈(I, m). Note that this bisimulation argument neglects the unused fields as well as the age of the tokens. With the following congruence relation for composition, the correctness of the optimizations can be shown.
The relation ˜ is a congruence relation with respect to composition of timed db-nets with boundaries, i.e., it is reflexive, symmetric and transitive, and if B1˜B1′ and B2˜B2′ with t0 on the shared boundary of B1 and B2, then B2∘B1˜B2′∘B1′.
With respect to redundant sub-processes, each move on the left hand side either moves tokens into a cloud, out of a cloud, or inside a cloud. In the first two cases, this can be simulated by the right hand side by moving the token through the CE or CBR and CF respectively followed by a move into or out of the cloud, while in the latter case the corresponding token can be moved in SG1′ up to the isomorphism between SG1′ and the cloud on the left. Similarly, a move on the right hand side into or out of the cloud can easily be simulated on the left hand side. Suppose a transition fires in SG1′. Since all guards in SG1′ have been modified to require all messages to come from the same enriched context, the corresponding transition can either be fired in SG1 or SG2.
Regarding combining sibling patterns, suppose the left hand side takes a finite number of steps and ends up with m(P2) tokens in P2 and m(P3) tokens in P3. There are three possibilities: (i) there are tokens of the same color in both P2 and P3; or (ii) there is a token in P2 with no matching token in P3; or (iii) there is a token in P3 with no matching token in P2. For the first case, the right hand side can simulate the situation by emulating the steps of the token ending up in P2, and forking it in the end. For the second case, the right hand side can simulate the situation by emulating the steps of the token ending up in P2, then forking it, but not moving one copy of the token across the boundary layer in the interpretation of the fork pattern. The third case is similar, using that SG2 is isomorphic to SG1. The right hand side can easily be simulated by copying all moves in SG1 into simultaneous moves in SG1 and the isomorphic SG2.
By construction, an early filter optimization may remove the data not used by P2, so if the left hand side moves a token to P2, then the same token can be moved to P2 on the right hand side and vice versa. With early mapping, suppose the left hand side moves a token to P4. The same transitions can then move the corresponding token to P4 on the right hand side, with the same payload, by construction. Similarly, the right hand side can be simulated by the left hand side. For early aggregation optimization, the interpretation of the subgraph SG2 is equivalent to the interpretation of P1 followed by SG2′ followed by P3, by construction, hence the left hand side and the right hand side are equivalent. For early claim check optimization, since the claim check CC+CE simply stores the data and then adds it back to the message in the CE step, both sides can obviously simulate each other. For early split optimization, by assumption, P1 followed by SSQ1 (P1 followed by SSQ1 followed by P2 for the inserted early split) is equivalent to SSQ1 followed by P1, from which the claim immediately follows.
For sequence to parallel optimization, the left hand side can be simulated by the right hand side by copying each move in SSQ1 by a move each in SSQ1′ to SSQn′. If the right hand side moves a token to an output place, it must move a token through SSQ1′, and the same moves can move a token through SSQ1 in the left hand side. For merge parallel optimization, when the left hand side moves a token to the output place, it must move a token through SSQ1′, and the same moves can move a token through SSQ1 in the right hand side. The right hand side can be simulated by the left hand side by copying each move in SSQ1 by a move each in SSQ1′ to SSQn′. For a heterogeneous sequence to parallel optimization, the right hand side can simulate the left hand side as follows: if the left hand side moves a token to an output place, it must move it through all of SSQ1 to SSQn. The right hand side can make the same moves in the same order. For the other direction, the left hand side can reorder the moves of the right hand side to first do all moves in SSQ1, then in SSQ2 and so on. This is still a valid sequence of steps because of side-effect-freeness.
For the “ignore, try failing endpoints” optimization, suppose the left hand side of
An IPCG may be constructed for various integration scenarios following the workflow 3300 shown in
In this way, embodiments may address an important shortcoming in EAI research, namely the lack of optimization strategies, and the informality of descriptions of pattern compositions and optimizations. Embodiments may develop a formalization of pattern compositions in order to precisely define optimizations. Note that formalization and optimizations may be relevant even for experienced integration experts, with interesting choices, implementation details, and/or trade-offs.
Note that the embodiments described herein may be implemented using any number of different hardware configurations. For example,
The processor 3510 also communicates with a storage device 3530. The storage device 3530 may comprise any appropriate information storage device, including combinations of magnetic storage devices (e.g., a hard disk drive), optical storage devices, mobile telephones, and/or semiconductor memory devices. The storage device 3530 stores a program 3512 and/or an application integration engine 3514 for controlling the processor 3510. The processor 3510 performs instructions of the programs 3512, 3514, and thereby operates in accordance with any of the embodiments described herein. For example, the processor 3510 might be associated with a formalization platform and facilitate definition of pattern requirements by an integration developer (and formalize single pattern compositions to compose single patterns into template-based formalized compositions). The processor 3510 might also be associated with a correctness platform that checks for structural correctness of the formalized compositions and executes a semantic transformation or binding to pattern characteristics and associated interactions. Such a processor 3510 may also check composition semantics and generate a formal model. In some embodiments, the processor 3510 is associated with an implementation platform that translates the formal model generated by the correctness platform and configures implementation parameters of the translated formal model. In this case, the processor 3510 may then execute the translated formal model in accordance with the configured implementation parameters.
The programs 3512, 3514 may be stored in a compressed, uncompiled and/or encrypted format. The programs 3512, 3514 may furthermore include other program elements, such as an operating system, clipboard application, a database management system, and/or device drivers used by the processor 3510 to interface with peripheral devices.
As used herein, information may be “received” by or “transmitted” to, for example: (i) the platform 3500 from another device; or (ii) a software application or module within the platform 3500 from another software application, module, or any other source.
In some embodiments (such as the one shown in
Referring to
The IPCG identifier 3602 might be a unique alphanumeric label that is associated with a contract graph and/or integration designer in accordance with any of the embodiments described herein. The data specifications 3604 might define source materials used to create the contract graph (e.g., schema, mapping, configuration data, etc.). The runtime benchmarks 3606 might include measurements that might be used to improve contract graphs (e.g., latency, throughput, etc.). The optimizations 3608 might indicate one or more types of strategies that were used to improve the contract graph. The status 3610 might include the current state of the contract graph (e.g., in process, halted, completed on a certain date and/or time, etc.).
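By way of illustration, such a record might be represented as follows (the field names mirror the description above; the types and sample values are assumptions):

```python
# Illustrative record layout for the contract graph table described above.
from dataclasses import dataclass
from typing import List

@dataclass
class IPCGRecord:
    ipcg_identifier: str             # unique alphanumeric label
    data_specifications: List[str]   # schema, mapping, configuration data
    runtime_benchmarks: dict         # e.g., {"latency_ms": 12, "throughput": 480}
    optimizations: List[str]         # applied improvement strategies
    status: str                      # "in process", "halted", "completed ..."

rec = IPCGRecord("IPCG_101", ["schema.xsd", "mapping.xsl"],
                 {"latency_ms": 12, "throughput": 480},
                 ["combine siblings", "early filter"], "completed 2020-01-01")
print(rec.ipcg_identifier, rec.status)
```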
Thus, embodiments may provide trustworthy application integration in a secure, automatic, and efficient manner. The following illustrates various additional embodiments of the invention. These do not constitute a definition of all possible embodiments, and those skilled in the art will understand that the present invention is applicable to many other embodiments. Further, although the following embodiments are briefly described for clarity, those skilled in the art will understand how to make any changes, if necessary, to the above-described apparatus and methods to accommodate these and other embodiments and applications.
Although specific hardware and data configurations have been described herein, note that any number of other configurations may be provided in accordance with some embodiments of the present invention (e.g., some of the information associated with the databases described herein may be combined or stored in external systems). Moreover, although some embodiments are focused on particular types of application integrations and microservices, any of the embodiments described herein could be applied to other types of applications. Moreover, the displays shown herein are provided only as examples, and any other type of user interface could be implemented. For example,
Any of the embodiments described herein might incorporate dynamic aspects into the formalization of patterns for a more precise cost semantics. In addition, purely data related techniques (e.g., message indexing, fork path re-ordering, and/or merging of conditions) may be analyzed for their effects. Moreover, multi-objective optimizations and heuristics for graph rewriting on the process level may be implemented in connection with any of these embodiments. Further note that embodiments may utilize other types of optimization, such as pattern placement optimizations (pushing patterns to message endpoints, i.e., sender and receiver applications), optimizations that reduce interactions (helping to stabilize the process), etc.
The present invention has been described in terms of several embodiments solely for the purpose of illustration. Persons skilled in the art will recognize from this description that the invention is not limited to the embodiments described, but may be practiced with modifications and alterations limited only by the spirit and scope of the appended claims.