Returning to
In a stream data processing system, incoming data flow continuously from several sources. This data needs to go through several levels of processing, such as selection, filtering, or combining, to generate the expected output. A directed acyclic diagram, referred to as a stream processing diagram, is used in
In the quantitative relationship between the input, output and resource consumption, each processing unit 20 processes data flows from its upstream nodes simultaneously at a given proportion and generates output flows to its downstream nodes at a possibly different proportion.
It is assumed that there are a total of R units of CPU resource available. An exemplary embodiment of the present invention finds optimal or approximate solutions of allocating the resource among all the processing units to maximize the total value of return generated by the system. Distributed solutions capable of adapting to local changes in the consumption and production rates are provided. There are many different metrics to measure the value of return, for example, throughput, loss, and delay. The metric considered in this exemplary embodiment is the weighted sum of the throughputs, where the weights are, for example, based on the importance of the throughputs of the output streams.
It should be understood that these R units of CPU resource may reside on multiple physical servers, in which case the processing units can be assigned to different servers. Thus, this exemplary embodiment involves the “continuous optimization” problem where the central pool of CPU resource can be arbitrarily partitioned. For the “discrete optimization” problem, which includes both process assignment and server resource allocation, one way to solve the problem is to combine the “continuous optimization” solution with a “bin packing” approach by packing the processing units into bins representing physical servers.
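The combination of a continuous allocation with a bin-packing step can be illustrated with a minimal sketch. It assumes the continuous solution is already available as a mapping from processing unit to CPU amount, that all servers have the same capacity, and it uses first-fit decreasing, which is only one of several possible bin-packing heuristics. The names `first_fit_decreasing`, `allocations`, and `server_capacity` are hypothetical and do not appear in the description.

```python
def first_fit_decreasing(allocations, server_capacity):
    """Pack processing units into equal-capacity servers (first-fit decreasing).

    allocations: dict mapping unit name -> CPU amount from the continuous solution.
    Returns a list of dicts, one per server, mapping unit name -> CPU amount.
    """
    servers = []  # each entry is [remaining_capacity, {unit: amount}]
    # Place the largest units first; ties keep their original order.
    for unit, amount in sorted(allocations.items(), key=lambda kv: -kv[1]):
        if amount > server_capacity:
            raise ValueError(f"unit {unit!r} does not fit on any single server")
        for server in servers:
            if server[0] >= amount:
                server[0] -= amount
                server[1][unit] = amount
                break
        else:
            # No existing server has room, so open a new one.
            servers.append([server_capacity - amount, {unit: amount}])
    return [assigned for _, assigned in servers]


placement = first_fit_decreasing({"a": 8, "b": 4, "c": 2, "d": 1, "e": 1}, 8)
```

A unit whose continuous allocation exceeds one server's capacity cannot be placed by this sketch; handling that case would require splitting or re-solving, which is outside what is shown here.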
An exemplary embodiment of the present invention provides a solution when there is only one final output stream of interest. In other words, ={O} is a singleton, where O is the only sink node. Denote the task graph G=(N, E), where N is the set of all tasks, and E the set of edges representing the information flow in between the tasks. Without loss of generality, node N is denoted to be the last processing node reaching O, since there is exactly one edge leading to each sink node. In this case, an exemplary embodiment of the present invention can provide a simple backward algorithm to solve the problem.
In order for node O to receive an output at rate βOk, node k must produce an output stream at a rate βOk, for which node k requires at least one unit of resource and inputs at rate αik from all its predecessor nodes i.
For each predecessor node i, in order to provide the output to node k at the requested rate αikxk, (with xk=1), the minimum amount of resource required at node i is
Furthermore, node i requires inputs at rate αji from all its predecessor nodes. If node i is the parent of both nodes j and j′, then i needs to meet the input requirements of both j and j′. Hence, the minimum amount of resource required at node i is:
The above procedure is repeated until the top of the diagram is reached, where all nodes are source nodes. The resulting resource requirements at the nodes in P and the input rates from all source nodes in I are the minimum requirements for sink node O to receive output at rate βOk.
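The backward procedure can be sketched in a few lines. Because the equations for the minimum resource at node i are not reproduced in the text, the sketch assumes they follow from the rate definitions: a node j holding xj units of resource consumes flow from i at rate αij·xj, and node i produces flow toward j at rate βji·xi, so xi must be at least αij·xj/βji for every child j. The function name and the dictionary layout are hypothetical, and only processing nodes are modeled.

```python
from fractions import Fraction


def backward_requirements(order, alpha, beta, last):
    """Minimum resource per node when the last node runs with one unit.

    order: processing nodes in topological order (every edge goes forward).
    alpha[(i, j)]: consumption rate of flow from i at node j, per unit of resource at j.
    beta[(j, i)]: production rate of node i toward node j, per unit of resource at i.
    last: the final processing node feeding the single sink.
    """
    x = {last: Fraction(1)}
    for i in reversed(order):
        if i == last:
            continue
        # Node i must satisfy the most demanding of its children.
        demands = [
            Fraction(alpha[(i, j)]) * x[j] / Fraction(beta[(j, i)])
            for (src, j) in alpha
            if src == i and j in x
        ]
        x[i] = max(demands, default=Fraction(0))
    return x


# Node 1 feeds nodes 2 and 3, which both feed node 4.
alpha = {(1, 2): 1, (1, 3): 1, (2, 4): 1, (3, 4): 1}
beta = {(2, 1): 2, (3, 1): 2, (4, 2): 2, (4, 3): 1}
requirements = backward_requirements([1, 2, 3, 4], alpha, beta, last=4)
```

In this example node 1 is the parent of both nodes 2 and 3, so its requirement is the larger of the two demands.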
In the following, the algorithms and methods described above are generalized to address the cases with multiple output nodes, that is, |O|>1. In this setting, there is a decision between generating an output for one stream versus generating an output for another stream, or both. This kind of trade-off is somewhat difficult to evaluate due to the simultaneous flow consumption and output. Nevertheless, first the algorithms to treat certain simpler cases are derived, and then the solution is extended to address the general cases.
More specifically, the case where the task graph G is a tree is now explained. That is, there is a single external input stream at the root node, and all tasks have just outgoing edges. We start with the basic case in the following example in
Consider the basic problem where one wants to allocate a total resource R to the 3 tasks (represented by the 3 nodes) in the binary tree as shown in
The 3-node binary tree can be reduced to an equivalent single node by aggregating the resource consumption of the two leaf nodes and changing the parameters accordingly as follows.
Following the above description, the problem in
After merging the leaf nodes into a single leaf, we also have another basic reduction to reduce two nodes in tandem into a single node. We can further aggregate the model in
First consider the mapping from
Finally, we check that the objective values are the same. Then we can map the optimal solution of one problem to a feasible solution to the other problem and maintain the objective values. Therefore, the mapped solution must be optimal for its corresponding problem.
Besides binary trees, the change from the structure of
Since each round of execution of step 1 above decreases the number of links by 1, the complexity is O(|E|).
Optimality can be proved using induction. When there is only one node, it is clear that the algorithm generates the optimal solution. Suppose the algorithm generates the optimal solution for all trees with at most K nodes. Now, a tree with K+1 nodes is considered. First, the algorithm is used to generate the solution x#. Suppose the optimal solution to the original problem, x*, is strictly better than x#. Then contradictions are derived. The method or algorithm set forth above solves a series of simple problems like
In this section, the general rules that can be applied to aggregate some special sub-diagrams into equivalent supernodes are described. Some general properties of the centralized solutions for the resource allocation problem are first presented.
Assume that the consumption and production rates (αij and βkj) are known constants. Then, the general problem stated above can be formulated as a linear program (LP) problem. First consider a resource allocation policy x=(xj, j∈P), where xj is the amount of resource assigned for processing unit j, j∈P. Node j, with xj being the amount of resource, can then process αijxj amount of flow for each node i∈Ij, and generate βkjxj amount of flow for each node k∈Oj. For such an allocation to be valid, the amount of flow consumed by the downstream nodes cannot be larger than the amount generated by the upstream nodes. This needs to hold for all the edges in the diagram, including the input edges. The overall benefit from the output streams is Σk∈O Σj∈I
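The validity conditions of an allocation can be checked mechanically. The sketch below assumes the constraints are: nonnegative allocations, a total of at most R, and αij·xj ≤ βji·xi on every edge between processing nodes. For input edges it assumes a hypothetical `source_rate` table giving the maximum rate of each source, and it assumes the benefit is the weighted sum of the flows delivered to the sink nodes, since the objective expression is incomplete in the text. All function and parameter names are illustrative.

```python
def is_feasible(x, alpha, beta, source_rate, R, tol=1e-9):
    """Check an allocation x against the capacity and flow constraints.

    alpha[(i, j)]: consumption rate at node j of flow from i.
    beta[(j, i)]: production rate of node i toward node j.
    source_rate[i]: assumed maximum rate available from source i.
    """
    if any(v < -tol for v in x.values()) or sum(x.values()) > R + tol:
        return False
    for (i, j), a in alpha.items():
        # Flow available on edge (i, j): bounded by the source, or produced by node i.
        supply = source_rate[i] if i in source_rate else beta[(j, i)] * x[i]
        if a * x[j] > supply + tol:
            return False
    return True


def objective(x, beta, weights):
    """Weighted sum of the flows delivered to the sink nodes listed in weights."""
    return sum(weights[k] * b * x[j] for (k, j), b in beta.items() if k in weights)


# Chain: source "s" -> node 1 -> node 2 -> sink "o".
alpha = {("s", 1): 1, (1, 2): 1}
beta = {(2, 1): 2, ("o", 2): 3}
ok = is_feasible({1: 1, 2: 2}, alpha, beta, {"s": 4}, R=3)
value = objective({1: 1, 2: 2}, beta, {"o": 1})
```

A general LP solver would search over x directly; this check is only useful for validating a candidate allocation, for example one produced by a heuristic.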
One can use general linear program solvers to solve the (LP) problem once the parameters are available. This centralized solution, however, requires global information and needs all the processing nodes to act in coordination. It is not dynamic enough to adapt to a changing environment, where certain flow properties and system parameters could vary over time. Distributed algorithms, by contrast, require only local information and are more adaptive to a changing environment. Distributed algorithms or methods for this resource allocation problem, based on the properties of the global optimal solutions, are developed below. For this purpose, some key properties of optimal solutions are derived next.
For ease of presentation, a partial order of the nodes in a diagram G=(N,ε) is presented. For two nodes i and j in N, we say that i<j if there is a directed path (i=l1,l2, . . . ,lL=j) that leads node i to node j, where (lj,lj+1)∈ε for j=1, . . . ,L−1. We say that a node is active (resp. idle) in a solution if this node receives nonzero (resp. zero) CPU resource allocation in that solution.
Referring to
In the following relating to
Consider a diagram G=(N,E) that has a series-parallel sub-diagram G1=(N1,E1). Denote G2=(N2,E2):=G\G1. Sub-diagram G1 can be aggregated into an equivalent supernode S using the following procedure.
Perform the initial steps used in solving the single output problem as shown in
Next, a supernode S (see the right plot in
for P∈I1, and produces output at rates
where {circumflex over (x)}S:=Σk∈G
By replacing sub-diagram G1 (in the original diagram G) by supernode S, a new diagram {tilde over (G)}=(Ñ, {tilde over (E)}), where Ñ={S}∪N2, and {tilde over (β)}SP=β1P for all P∈I1 is obtained.
The original linear programming (LP) problem for diagram G is equivalent to the (LP) for the new diagram {tilde over (G)}={S}∪G2. If ({tilde over (x)}S,{tilde over (x)}2) is the optimal solution of the (LP) on the new diagram {tilde over (G)}, then
is optimal for the original (LP) on diagram G.
It suffices to show that from any optimal solution for the (LP) problem on diagram {tilde over (G)}, we can find a feasible solution x for the LP problem on diagram G that uses the same amount of capacity and has the same objective value. In addition, from any optimal solution for (LP) on diagram G, we can find a feasible solution for the (LP) problem on the diagram {tilde over (G)}, that uses the same amount of capacity and has the same objective value.
Similarly, the internal flow constraints on links (N,Q) for all Q∈N can be verified. Therefore x must be feasible to the original (LP) on diagram G.
Therefore, the two approaches must be equivalent.
The following statement then is immediate. Any series-parallel sub-diagram (G1,s,t) in G can therefore be aggregated to an equivalent supernode following the aggregation procedure described above.
Suppose link l=(a,b) is one such that by cutting l, the original diagram G can be partitioned into two disjoint sub-diagrams G1 and G2, with a∈G1 and b∈G2. (G1,G2) is termed a single link (a,b) partition for G, see, for example,
The following shows that if node b is active in the optimal solution, it then must fully utilize all the output flow produced by node a.
Suppose diagram G has a single link (a,b) partition (G1,G2) with a,b∈P. There must be an optimal solution x* for the (LP) on diagram G such that either xb*=0 or αabxb*=βbaxa*.
Consider the special case when the diagram G is of the tree topology. That is, after changing all the directed edges to undirected edges, the resulting undirected diagram is a tree. In this case, any link in ε can produce a single link partition for G. The following statement is then immediate.
For the (LP) where the underlying diagram G is a tree, there must be an optimal solution x* such that for every link (i,j) in ε with i,j∈P, either xj*=0, or αijxj*=βjixi*. That is, either node j is idle, or else it must fully utilize all the output flow from its parent node i. A procedure that aggregates the sub-diagram G2 into an equivalent supernode S will now be described.
1. First formulate (LP) for G2 assuming that a is the source node, that the maximum input rate available to node b is 1, and that there is no capacity constraint. Solve for the optimal solution {circumflex over (x)}2=({circumflex over (x)}j, j∈G2). Suppose it produces benefit V2({circumflex over (x)}2). Denote {circumflex over (x)}S:=Σj∈G
and produces a single output stream at rate
Notice that when the network is of a tree structure, any link can produce a single link partition for G. One can therefore shrink any sub-tree into an equivalent supernode. The backward shrink algorithm described above is just doing this backward recursively.
A desired result is that the backward shrink algorithm for networks of a tree structure must produce an optimal solution to the original problem.
In general, the various aggregation rules presented above can be applied to the general cases of arbitrary network topology, and substantially reduce the size of the original problem for large scale systems. In particular, since the optimal resource allocation for the nodes inside a supernode is determined locally, subject to the total available resource to that supernode, all nodes in a sub-diagram can be considered as a cluster and the role of supernode can be played by an elected local leader. The leader can then represent the whole cluster and negotiate with other leaders. Such distributed and local management is also more robust especially when the system parameters are not static and subject to changes. Therefore, instead of requiring a centralized and global optimization, a distributed way of resource management is enabled, which is discussed in more detail below.
Distributed solutions for the problem in order to adapt to real-time fluctuations in the consumption and production rates and changes in resource consumption requirements are set forth below.
Two heuristics to solve the general problem are described herein. These heuristics are based on the optimal solutions for the tree case and for the single-output case. Experimental results illustrating the effectiveness of the methods are also presented. In the following, these heuristics can be implemented easily in a distributed way. The first heuristic is based on the optimal solution for trees. As assumed earlier, all the nodes have been labeled from 1 to N such that all the edges (i, j) satisfy i<j. This algorithm will start from the bottom of the diagram and move up to the top. At each step, the algorithm examines each node, generates aggregated information based on information from its children, and passes this information up to its parents.
If the original diagram is a tree, it can be shown that the above method obtains the optimal solution. For the general diagram case, experimental results are set forth to demonstrate the quality of this distributed method.
Another heuristic for the general problem with multiple output streams is developed based on the single output method combined with the general gradient descent method. Assume there are multiple output streams, O1, . . . ,Ok. We define a function f(u1, . . . ,uk) to be the best objective value if the solutions are generating flows for the output streams according to the relative proportion given by (u1, . . . ,uk). Finding f(u1, . . . ,uk) is the same as solving a modified problem with a new final sink node Ok+1, and making all the original output flows flow into this final sink node. The β parameters for all the flows from O1, . . . ,Ok to Ok+1 are all set to be 1. The α proportions at Ok+1 are given by (u1, . . . ,uk) for flows from O1, . . . ,Ok. The β parameter at Ok+1 is w1u1+ . . . +wkuk. The weight factor w at Ok+1 is 1. The equivalence of these two problems can be easily checked. The backtrack method described above can be applied to find the optimal solution for the single output problem; thus, the value of f(u1, . . . ,uk) for any given (u1, . . . ,uk) can be found. The gradient descent method can be applied to find the maximum value for function f(u1, . . . ,uk). The general procedure is given below:
Note that in Heuristics B described above, the gradient method can be replaced by other search techniques such as simulated annealing, Tabu search, genetic algorithms, etc.
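The reduction of the multiple-output problem to a single-output problem, as used in Heuristics B, can be illustrated with a short sketch that builds the modified parameters for a given proportion vector (u1, . . . ,uk). The dictionary layout, the function name `add_final_sink`, and the label `"O_final"` are assumptions made for illustration; the search over (u1, . . . ,uk) itself is not shown.

```python
def add_final_sink(alpha, beta, weights, u, final="O_final"):
    """Build the single-output problem for proportions u.

    alpha[(i, j)]: consumption rate at node j of flow from i.
    beta[(j, i)]: production rate of node i toward node j.
    weights[o]: weight of original output stream o.
    u[o]: relative proportion requested for original output stream o.
    Returns (alpha, beta, beta_at_final_sink, new_weights); inputs are not modified.
    """
    alpha2, beta2 = dict(alpha), dict(beta)
    for out, share in u.items():
        beta2[(final, out)] = 1       # every flow into the new sink has beta equal to 1
        alpha2[(out, final)] = share  # the new sink consumes in proportions u
    # The production parameter at the new sink is w1*u1 + ... + wk*uk.
    beta_final = sum(weights[out] * share for out, share in u.items())
    return alpha2, beta2, beta_final, {final: 1}


a2, b2, beta_final, w2 = add_final_sink({}, {}, {"O1": 2, "O2": 3}, {"O1": 1, "O2": 2})
```

Evaluating f(u1, . . . ,uk) would then amount to running the single-output method on the returned parameters, and any search technique can be wrapped around that evaluation.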
Heuristics A described above has the advantage that it can quickly generate high quality solutions for simple diagram topologies. When the diagram is complex, however, the quality may degrade. Heuristics B is provided to handle complex diagram structures more effectively. Experimental results to compare the performance of these two heuristics and the linear programming (LP) solution are set forth below.
The setting of the experiment is as follows. First, directed acyclic graphs with N nodes are generated randomly using the following 4 steps:
1) Randomly generate N points xi,yi in the unit square [0, 1]×[0, 1];
2) For i=1, . . . ,N, generate its successor set Si:={j:xj≧xi,yj≧yi};
3) For i=1, . . . ,N, generate its immediate successor set si:=Si−∪k∈s
4) For i=1, . . . ,N, create a link from i to j if j∈Si.
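A minimal sketch of this generation scheme follows. Two assumptions are made because the step defining the immediate successor set is incomplete in the text: node i is excluded from its own successor set, and the immediate successors of i are taken to be the successors of i that are not successors of any other successor of i, with links created only to those immediate successors. The function name `random_dag` and the `seed` parameter are hypothetical.

```python
import random


def random_dag(n, seed=None):
    """Generate a random DAG from n random points in the unit square.

    Returns (points, edges), where edges is a set of (i, j) index pairs.
    """
    rng = random.Random(seed)
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    # Successor set: every other point that dominates point i in both coordinates.
    succ = [
        {j for j in range(n)
         if j != i and pts[j][0] >= pts[i][0] and pts[j][1] >= pts[i][1]}
        for i in range(n)
    ]
    edges = set()
    for i in range(n):
        # Nodes reachable through some other successor are not immediate successors.
        indirect = set().union(*(succ[k] for k in succ[i]))
        for j in succ[i] - indirect:
            edges.add((i, j))
    return pts, edges


points, links = random_dag(20, seed=1)
```

Every link points toward a node with larger coordinates, so the resulting diagram cannot contain a directed cycle.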
This method is inspired by a scheme to generate random partial orders among N elements. Once the diagram is generated, the parameters α, β, w are then generated from independent uniform random samples. Diagrams with 20, 50, and 100 nodes are randomly generated. For each fixed number of nodes, 1000 instances of the problem with random topology and random parameter values are generated. The two heuristics described above are applied to obtain the corresponding objective values. The LP solution is also obtained using an LP solver. The characteristics of the random diagrams are collected, as well as the quality of the two heuristics. Because the problem is a maximization problem, the quality of the heuristics is reflected by the achieved percentage of the optimal LP solution.
It has been observed that Heuristics B is consistently better than Heuristics A. It is also important that the average number of iterations is small. This means Heuristics B does not require too much additional time to compute compared with Heuristics A. It is encouraging to find that Heuristics B consistently generates quality solutions and, more importantly, its effectiveness can be improved through the use of more sophisticated search methods. It will be noted that the distributed nature and the efficiency of the methods are important, thus Heuristics B seems to be a preferable solution.
The ways to implement the resource allocation heuristics A and B in a distributed fashion are described below.
It is assumed in using the resource management model that initial, possibly rough, estimates of the consumption and production rates are available. Based on these estimates, the corresponding LP problem is solved and the processes are assigned to various physical servers with the appropriate CPU resource allocation. The CPU resource allocations can be adjusted/modified by individual processing units locally, and distributedly, through requests of resource release or resource increase submitted to the resource manager. The resource manager grants such requests while ensuring that the total resource usage for the system does not exceed its allocated capacity. In case a resource increase request cannot be satisfied locally, within the same physical server, the corresponding process can be migrated to another server through the help of the local resource manager. Due to its overhead, such process migration is to be avoided unless there is a significant gain in total return of the output streams. In the descriptions below, it is assumed that this rule is followed when resource allocation decisions are made, so that this issue will not be addressed explicitly.
The distributed versions of Heuristics A and B, both using measurement-based distributed implementations, are briefly described below.
In Heuristics B, Step 1) calls the method used to solve the single output problem. That method can be implemented in a distributed way, analogous to the distributed implementation of Heuristics A. Such an implementation is referred to as a Distributed Algorithm or Method.
As mentioned hereinabove, the stream processing environment (for example, flow properties and system parameters) can be dynamically changing. Furthermore, the stream processing systems can have non-stationary behaviors: the input streams can be non-stationary in the traffic patterns and/or in contents; and the stream consumption and production can be non-stationary due to the non-stationary contents. Therefore, the resource allocation solutions should adapt to the changes.
The distributed implementations of the Heuristics described above can easily handle such changes by local adaptation. In fact, each node will constantly (with, say, a fixed sampling frequency) measure the input rates, the consumption rates and the production rates.
If significant deviation occurs, which can be detected with some change-point detection techniques, the node will send the updated rate information to the parents. These parents will in turn forward the updated, aggregated parameters to their parents up to the top node which will readjust the resource allocation decisions and propagate downwards the resource allocation readjustments. To avoid abrupt changes in the resource allocation decisions, a smoothing step is added in both distributed methods, such that the actual resource allocated for each node is the moving average of its previous allocation and the current new assignment with a preselected smoothing factor.
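The smoothing step can be sketched as an exponentially weighted moving average. The exact weighting is not given in the text, so the sketch assumes the smoothing factor multiplies the previous allocation and its complement multiplies the new assignment; the names `smooth_allocation` and `gamma` are hypothetical.

```python
def smooth_allocation(previous, assigned, gamma=0.5):
    """Blend the previous allocation with the newly assigned one.

    gamma is the preselected smoothing factor in [0, 1]; gamma = 0 applies the
    new assignment immediately, gamma = 1 never changes the allocation.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    return gamma * previous + (1.0 - gamma) * assigned


# Repeated application moves the allocation toward the new assignment.
allocation = 4.0
history = []
for _ in range(3):
    allocation = smooth_allocation(allocation, 8.0, gamma=0.5)
    history.append(allocation)
```

A larger smoothing factor trades slower convergence for less abrupt changes in each node's CPU share.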
It will be seen from Heuristic A that when a changing event occurs, it takes a number of time units equal to twice the depth of the diagram for the system to adjust to the new optimal solution. More specifically, a random tree structure with 355 nodes is generated where the depth equals 7. At time 0, a random subset of nodes is selected and the corresponding parameters are perturbed by ±20%. A unit of time is assumed to be the propagation delay for one hop, so t time units corresponds to t hops. As time proceeds, the performance of the perturbed system gradually converges and reaches the new optimum at time t=14, which is twice the depth of the tree. Here the smoothing factor γ=0.5. The average time that it takes for the perturbed system to reach 95% of the new optimum over 1000 random instances of the problem with 50, 100, 500 and 1000 nodes respectively, where the depth is fixed to be 7, varies from 4.1 to 7.1 seconds.
An exemplary embodiment of the present invention solves the CPU resource allocation problem in stream processing systems with the objective of maximizing the total return of multiple output streams. Structural properties of the optimal solution for the problem are described under different network topologies, and efficient yet simple-to-implement methods are developed to solve it. Detailed performance analyses of the optimality and complexity of those methods are also provided. Instead of requiring a centralized and global optimization, the present methods enable a distributed mechanism for resource management. Some general rules are presented that can aggregate all information of certain special sub-diagrams into equivalent supernodes. Although only specified aggregation rules for special types of network topology are described, the concept of supernodes can be extended to more general cases, where the analysis, however, will be much more involved.
Two distributed solutions to the general problem were presented above, and the corresponding measurement-based distributed implementations are given. Experimental results show that the methods described above are highly robust and capable of quickly adapting to real-time fluctuations in the consumption and production rates and changes in resource consumption requirements, while achieving high quality solutions even in non-stationary systems. It is also possible to extend the problem formulation to other variations, for example, by taking into account loss of dataflows, or by restricting to the case requiring no information loss.
In the following, the principles described above are presented in slightly different form using expanded diagrams.
As an example of this system, it is assumed that all ci are 1 and C=16. To generate a 1 unit output, x5=1, x4=0.5, x3=0.25, x2=⅛, and x1=⅛. Scaling so that the sum is 16, we get x1=1, x2=1, x3=2, x4=4, and x5=8.
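The scaling step in this example can be reproduced with exact arithmetic. The sketch below takes the per-unit-output requirements stated above and scales them so that they sum to C=16; it relies on the stated assumption that all ci are 1, so that each node's consumption equals its allocation. The function name `scale_to_capacity` is hypothetical.

```python
from fractions import Fraction


def scale_to_capacity(unit_requirements, capacity):
    """Scale per-unit-output requirements so that they sum to the capacity."""
    total = sum(unit_requirements.values())
    factor = Fraction(capacity) / total
    return {node: req * factor for node, req in unit_requirements.items()}


unit = {
    1: Fraction(1, 8),
    2: Fraction(1, 8),
    3: Fraction(1, 4),
    4: Fraction(1, 2),
    5: Fraction(1),
}
scaled = scale_to_capacity(unit, 16)
```

The unit requirements sum to 2, so every value is multiplied by 8.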
In
A resource budget is allocated to any node without predecessors in step S38. Any node, upon receiving a resource budget, applies the basic rules for the elementary tree to compute the resource budgets for itself and for its children, as shown in step S40. The node then sends the resource budget information to its children, as shown in step S42.
Finally, for each node, the resource actually allocated is the moving average of its previous allocation and its newly computed assignment, as shown in step S44.
It will be understood, of course, that the foregoing is presented by way of example only and is not intended to limit the present invention, which is defined by the appended claims.
This invention was made with Government support under Contract No.: H98230-04-3-0001 awarded by the U.S. Dept. of Defense. The Government has certain rights in this invention.