The described technology is directed to the field of digital resource management.
An application program (“application”) is a computer program that performs a specific task. A distributed application is one that executes on two or more different computer systems more or less simultaneously. Such simultaneous activity is typically coordinated via communications between these computer systems.
One example of a distributed application is an email application, made up of server software executing on a server computer system that sends and receives email messages for users, and client software executing on a user's computer system that communicates with the server software to permit the user to interact with email messages, such as by preparing messages, reading messages, searching among messages, etc.
In some cases, server software executes on server computer systems located in one or more data centers. For users who are part of a distributed organization, a particular user's computer system, or “client,” may be connected to multiple data centers by a wide-area network (“WAN”), such as one where network edge proxies, or “edge nodes,” that interact with clients are connected to data centers by WAN links.
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
A facility for managing a distributed system for delivering online services is described. For each of a plurality of distributed system components of a first type, the facility receives operating statistics for the distributed system component of the first type. For each of a plurality of distributed system components of a second type, the facility receives operating statistics for the distributed system component of the second type. The facility uses the received operating statistics for distributed system components of the first and second types to generate a model predicting operating statistics for the distributed system for a future period of time.
The inventors have observed that infrastructures for supporting one or more distributed applications are both complex and expensive to build and operate. Few effective tools exist for choosing particular infrastructure resources to allocate to particular users or groups of users in a way that best meets users' needs and limits overall costs.
Similarly, few effective tools exist for determining when particular resources in the infrastructure should be expanded—such as by adding a new WAN link—or contracted—such as by reducing the number of servers executing the server component of a particular application—in a way that best serves users while limiting cost.
To overcome these disadvantages, the inventors have conceived and reduced to practice a software and/or hardware facility (“the facility”) for modeling the operation, or “dynamics,” of an infrastructure for supporting one or more distributed applications or other online services. The facility constructs a time-based, stochastic model designed to predict future dynamics of the infrastructure based on past measurements of the infrastructure. The facility uses the model to predict future load, as a basis for allocating particular resources within the infrastructure, such as to particular users or groups of users, and for adjusting the overall supply of resources within the infrastructure.
In some embodiments, a controller provided by the facility maintains an up-to-date view of the infrastructure's health and workload, and periodically configures each infrastructure component based upon that view. In some embodiments, this configuration determines the edge nodes used by users, the data centers used by edge nodes, and the WAN paths used by traffic between edge nodes and data centers.
In some embodiments, the model generated by the facility models infrastructure load and performance as a function of time. In some embodiments, the facility generates the model using a stochastic linear program, such as a second-order cone program.
In some embodiments, the facility applies the model to past measurements to obtain predictions that can be compared with the infrastructure dynamics subsequently observed, in order to determine overall prediction error, i.e., capturing the variance of the workload and applying statistical multiplexing to it. The facility uses this prediction error to adjust predictions produced by the model before they are used to adjust allocation, making it highly likely that the facility will prevent congestion without having to overallocate resources.
In some embodiments, the facility implements resource reallocations that it determines in accordance with the model, also referred to as “load migrations,” in a gradual manner. As one example, in some embodiments, the facility reallocates clients to edge nodes only with respect to new network connections initiated by those clients.
Certain terminology used herein is described as follows. A “session” is one or more queries (over possibly multiple TCP connections) from the same user to the same service that follow a DNS lookup; queries after a succeeding lookup belong to a different session. The DNS lookup is a useful marker because it allows the facility to direct the user toward the desired edge proxy. In some embodiments, the facility uses a short DNS time-to-live (“TTL”), such as 10 seconds, so that users perform new lookups after pauses in query traffic. A “user group” (“UG”) is a set of users that are expected to have similar relative latencies to particular edge proxies (e.g., because they are proximate in Internet topology). Operating at this granularity enables the facility to learn latencies at the group level rather than at the more challenging user level. In some embodiments, the facility defines UGs as clients having the same /24 IP address prefix.
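As an illustration, clients can be grouped into UGs by shared address prefix. The following Python sketch buckets client IPv4 addresses by their /24 prefix; the function name and the use of Python's ipaddress module are illustrative choices, not part of the described facility:

```python
import ipaddress
from collections import defaultdict

def group_clients_by_prefix(client_ips, prefix_len=24):
    """Group client IPv4 addresses into UGs keyed by shared prefix."""
    ugs = defaultdict(list)
    for ip in client_ips:
        # strict=False masks off the host bits, yielding e.g. 203.0.113.0/24.
        net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        ugs[str(net)].append(ip)
    return dict(ugs)

# Clients in the same /24 land in the same UG.
print(group_clients_by_prefix(["203.0.113.7", "203.0.113.200", "198.51.100.4"]))
# {'203.0.113.0/24': ['203.0.113.7', '203.0.113.200'],
#  '198.51.100.0/24': ['198.51.100.4']}
```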
The model predicting workload for an upcoming period is based upon workload information from previous periods. Measurement agents on DNS servers report the arrival rates of new sessions for each UG and each service; measurement agents on edge proxies report on the resource usage and departure rate of sessions, capturing edge proxy workload in terms of the resource or resources relevant for allocation (e.g., memory, CPU, traffic); and measurement agents on network switches that face the external world report the non-edge traffic matrix (in bytes/second). The facility uses an exponentially weighted moving average (EWMA) to estimate the workload for the next period. The facility also tracks the distribution of estimation errors (i.e., estimated minus actual), which the facility uses in the stochastic model. Also, in some embodiments, health monitoring services at proxy sites and data centers inform the controller how much total infrastructure capacity is lost.
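A minimal sketch of such a predictor follows, assuming a fixed smoothing factor and an in-memory error history; the class name and the default value of alpha are hypothetical:

```python
import statistics

class EwmaWorkloadPredictor:
    """Estimate next-period workload with an EWMA and track the
    distribution of estimation errors (estimated minus actual)."""

    def __init__(self, alpha=0.3):
        self.alpha = alpha    # smoothing factor; the value is an assumption
        self.estimate = None  # current EWMA estimate
        self.errors = []      # history of estimation errors

    def observe(self, actual):
        """Record the workload actually seen in a period and update the EWMA."""
        if self.estimate is not None:
            self.errors.append(self.estimate - actual)  # estimated minus actual
            self.estimate = self.alpha * actual + (1 - self.alpha) * self.estimate
        else:
            self.estimate = actual

    def predict(self):
        """EWMA estimate of the next period's workload."""
        return self.estimate

    def error_std(self):
        """Spread of past errors, which feeds the stochastic model."""
        return statistics.stdev(self.errors) if len(self.errors) > 1 else 0.0
```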
By operating in some or all of the ways described above, the facility is often able to increase the capacity of an infrastructure to handle a greater number of transactions, and/or increase the performance of the infrastructure in handling transactions.
While various embodiments are described in terms of the environment described above, those skilled in the art will appreciate that the facility may be implemented in a variety of other environments including a single, monolithic computer system, as well as various other combinations of computer systems or similar devices connected in various ways.
Those skilled in the art will appreciate that the steps described above may be altered in a variety of ways. For example, the order of the steps may be rearranged; some steps may be performed in parallel; shown steps may be omitted, or other steps may be included; etc.
Additional details of the facility's implementation in various embodiments follow.
The model of temporal load variations generated by the facility in some embodiments is described below. For ease of exposition, assume initially that the infrastructure hosts only one service, which is present at all proxies and all DCs, and that there are no restrictions on mapping UGs to proxies and DCs. Also assume that there is only one bottleneck resource (e.g., memory) at proxies and DCs, and that the capacity of this resource can be described in terms of the number of active sessions. Extension of the model to remove these assumptions follows below.
Table 1 above summarizes inputs to the model. A user session uses three “servers” (a load balancer b, an edge proxy y, and a data center c) and four WAN paths: request and response paths between b and y (as y may be remote) and between y and c. Each path is a pre-configured tunnel, i.e., a series of links from the source switch to the destination switch; there can be multiple tunnels between a pair of switches. The tuple $(b, p_{by}, y, p_{yc}, c, p_{cy}, y, p_{yb}, b)$ is referred to as an end-to-end or e2e-path.
Table 2 above summarizes outputs of the model, given the current system state and the estimated workload in the next time period. These are the fraction of each UG's new sessions that traverse each e2e-path, the fraction of each UG's existing sessions that traverse each e2e-path, and the fraction of non-edge traffic that traverses each network path. The computation is based on modeling the impact of the output on user performance and on the utilization of infrastructure components, as a function of time t relative to the start of the time period ($0 \le t \le T$, where T is the time period length).
Server utilization is impacted by existing sessions. There are $n^0_g$ old sessions of g from the last time period, and $\tilde{d}_g$ is their predicted departure rate. A decision to put a $\tau_{g,\psi}$ fraction of these sessions on e2e-path ψ causes the number of sessions to vary as:
$\forall g, \psi \in \Psi_g:\ n^{old}_{g,\psi}(t) = (n^0_g - t\,\tilde{d}_g)\,\tau_{g,\psi}$ (1)
The facility assumes that the departures are uniformly spread over the next time period, and thus $n^0_g \ge T\,\tilde{d}_g$. The facility's variance handling absorbs any sub-time-period variances in departure and arrival rates.
The facility captures the impact of session affinity by mandating that the number of sessions for each server tuple θ=(b,y,c) does not change when a time period starts, as follows:

$\forall g, \theta:\ \sum_{\psi \in \Psi_g: \theta \in \psi} n^{old}_{g,\psi}(0) = n^0_{g,\theta}$ (2)

where $n^0_{g,\theta}$ is the number of g's existing sessions currently served by tuple θ.
For new sessions that arrive in the next time period, $\tilde{a}_g$ is the net arrival rate (i.e., arrivals minus departures) of new sessions of g, and the fraction of those put on ψ is $\pi_{g,\psi}$. The number of new sessions on ψ varies as:
$\forall g, \psi \in \Psi_g:\ n^{new}_{g,\psi}(t) = t\,\tilde{a}_g\,\pi_{g,\psi}$ (3)
Thus, the total number of sessions from g on ψ is:
$\forall g, \psi \in \Psi_g:\ n_{g,\psi}(t) = n^{old}_{g,\psi}(t) + n^{new}_{g,\psi}(t)$ (4)
and the utilization of a server is:

$\forall \alpha \in B \cup Y \cup C:\ \mu_\alpha(t) = \frac{1}{\kappa_\alpha} \sum_{g,\, \psi \in \Psi_g:\, \alpha \in \psi} n_{g,\psi}(t)$ (5)

where $\kappa_\alpha$ is the session capacity of server α, and B, Y, and C are the sets of load balancers, edge proxies, and data centers, respectively.
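The dynamics of Eqns. (1)-(5) can be evaluated directly. The Python sketch below does so for a toy server traversed by a single (UG, e2e-path) pair; all parameter values and function names are hypothetical:

```python
def sessions_on_path(t, n0, d_old, tau, a_new, pi):
    """Sessions from one UG on one e2e-path at time t: old sessions drain
    at rate d_old (Eqn 1) while new ones arrive at rate a_new (Eqn 3);
    the total follows Eqn 4."""
    return (n0 - t * d_old) * tau + t * a_new * pi

def server_utilization(t, paths, kappa):
    """Eqn 5: total sessions on the e2e-paths traversing a server,
    divided by its session capacity kappa."""
    return sum(sessions_on_path(t, *p) for p in paths) / kappa

# Session counts are linear in t, so extremes occur at t = 0 or t = T.
paths = [(100.0, 2.0, 0.5, 3.0, 0.7)]  # (n0, d_old, tau, a_new, pi)
print(server_utilization(0.0, paths, kappa=500.0))   # 0.1
print(server_utilization(30.0, paths, kappa=500.0))  # (40*0.5 + 63)/500 = 0.166
```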
For edge traffic load, $\tilde{u}_g$ ($\tilde{u}^r_g$) is the predicted request (response) traffic increase rate from new sessions, and $\tilde{v}_g$ ($\tilde{v}^r_g$) is the predicted request (response) traffic decrease rate of old sessions. $q^0_g$ ($r^0_g$) is the total request (response) traffic of g at t=0. These traffic rates are the net effect of the relevant sessions; individual sessions will have variability (e.g., depending on whether content is cached). The request traffic from g on e2e-path ψ varies as:
$\forall g, \psi \in \Psi_g:\ q_{g,\psi}(t) = t\,\tilde{u}_g\,\pi_{g,\psi} + (q^0_g - t\,\tilde{v}_g)\,\tau_{g,\psi}$ (6)
where $\pi_{g,\psi}$ and $\tau_{g,\psi}$ are the weights for new and old sessions. Equation (6) above assumes that the request and response traffic between load balancers and proxies is the same as that between proxies and DCs. In cases where that is not true, the equation is modified accordingly.
For a link l, the total request traffic varies as:

$\forall l:\ q_l(t) = \sum_{g,\, \psi \in \Psi_g:\, l \in \psi} q_{g,\psi}(t)$ (7)
Similarly, for response traffic:

$\forall g, \psi \in \Psi_g:\ r_{g,\psi}(t) = t\,\tilde{u}^r_g\,\pi_{g,\psi} + (r^0_g - t\,\tilde{v}^r_g)\,\tau_{g,\psi}$ (8)

$\forall l:\ r_l(t) = \sum_{g,\, \psi \in \Psi_g:\, l \in \psi} r_{g,\psi}(t)$ (9)
For non-edge traffic load, $T^0_{s,d}$ is the predicted traffic from ingress switch s to egress switch d at t=0, and the predicted change rate is $\tilde{c}_{s,d}$. (If non-edge traffic is expected not to change substantially during the next time period, the facility uses $\tilde{c}_{s,d} = 0$.) If $w_{s,d,p}$ is the fraction of traffic put on network path $p \in P_{s,d}$, where $P_{s,d}$ is the set of paths from s to d, the non-edge traffic from s to d on p varies as:
$\forall s, d, p \in P_{s,d}:\ o_{s,d,p}(t) = (T^0_{s,d} + t\,\tilde{c}_{s,d})\,w_{s,d,p}$ (10)
For a link l, the total non-edge traffic load varies as:

$\forall l:\ o_l(t) = \sum_{s, d,\, p \in P_{s,d}:\, l \in p} o_{s,d,p}(t)$ (11)
Thus, the overall utilization of link l is:

$\forall l \in L:\ \mu_l(t) = \frac{q_l(t) + r_l(t) + o_l(t)}{C_l}$ (12)

where $C_l$ is the capacity of link l and L is the set of WAN links.
Finally, the facility uses these constraints for conservation of weights:

$\forall g:\ \sum_{\psi \in \Psi_g} \pi_{g,\psi} = 1$ (13)
The facility uses corresponding conservation constraints for $\tau_{g,\psi}$ and $w_{s,d,p}$.
The facility seeks performance objectives by preferring traffic distributions with low delays, and seeks efficiency objectives by preferring traffic distributions with low utilization. The facility reconciles these requirements by penalizing high utilization in proportion to the expected queuing delay it imposes. The facility uses a piecewise-linear approximation of the penalty function f(μ). The results are relatively insensitive to its exact shape, which can also differ across components, but the monotonically non-decreasing slope of the function is retained.
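As an illustration, a convex piecewise-linear penalty can be represented as the maximum of its affine pieces, which is also how such a function enters a linear program. The breakpoints and slopes below are hypothetical; the only property relied on is that the slopes never decrease:

```python
# Affine pieces (slope a_k, intercept b_k); slopes grow with utilization,
# so f is convex and penalizes congestion sharply near full utilization.
PIECES = [(1.0, 0.0), (3.0, -1.2), (10.0, -6.8), (70.0, -60.8)]

def penalty(u):
    """Convex piecewise-linear penalty f(u) as the max of affine pieces."""
    return max(a * u + b for a, b in PIECES)

print(penalty(0.5))   # 0.5  -- low utilization, gentle slope
print(penalty(0.95))  # 5.7  -- high utilization, steep slope
```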
Thus, the objective function is:
The first term integrates the utilization penalty over the time period; the second term, where $\delta_l$ is the propagation delay of link l, captures the propagation delay experienced by all traffic on the WAN; and the third term, where $h_{g,b}$ captures the performance of g to load balancer b, captures the performance of traffic in reaching the infrastructure. $\eta_1$ and $\eta_2$ are coefficients that balance the importance of the different factors (default value 1).
The facility assigns values to the model's output variables by minimizing the objective under the constraints above. The model uses continuous time, and in some embodiments the facility ensures that the constraints hold at all possible times. The utilization of each component decreases or increases linearly with time (because arrival and departure rates are fixed during the time period). As a result, extreme utilizations occur at the start or end of the time period. Thus, the constraints hold at all times if they hold at t=0 and t=T.
To efficiently handle the objective, since f(μ) is monotonic and convex, its first term is transformed using:

$\int_0^T f(\mu_\alpha(t))\,dt \le \frac{T}{2}\left(f(\mu_\alpha(0)) + f(\mu_\alpha(T))\right)$ (15)
Relying again on the linearity of the resource utilization, the second term (and similarly the third) is transformed using:

$\int_0^T q_l(t)\,dt = \frac{T}{2}\left(q_l(0) + q_l(T)\right)$ (16)

since each traffic quantity is linear in t.
This approach provides an efficiently solvable LP. Its constraints are the ones listed earlier, but enforced only at t=0 and t=T. Its objective uses the transformations above and the piecewise-linear approximation of f(μ).
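A toy instance of this LP can be written with a convex-optimization front end such as cvxpy (an assumed tooling choice, not one named by this description). The sketch uses one UG, two e2e-paths through one server, the weight-conservation constraints, the penalty pieces from the sketch above, and enforcement only at t=0 and t=T:

```python
import cvxpy as cp

T, n0, d_old, a_new, kappa = 30.0, 100.0, 2.0, 3.0, 500.0  # hypothetical values
PIECES = [(1.0, 0.0), (3.0, -1.2), (10.0, -6.8), (70.0, -60.8)]

pi = cp.Variable(2, nonneg=True)   # weights for new sessions on each e2e-path
tau = cp.Variable(2, nonneg=True)  # weights for old sessions

def utilization(t):
    # Session counts are linear in t (Eqns 1, 3, 4); Eqn 5 divides by capacity.
    return cp.sum((n0 - t * d_old) * tau + t * a_new * pi) / kappa

z = cp.Variable(2)  # epigraph variables: z[i] >= f(utilization) at endpoint i
constraints = [cp.sum(pi) == 1, cp.sum(tau) == 1]  # weight conservation (Eqn 13)
for i, t in enumerate((0.0, T)):
    constraints += [z[i] >= a * utilization(t) + b for a, b in PIECES]

# Objective: trapezoidal transform of the penalty integral (Eqn 15).
problem = cp.Problem(cp.Minimize(T / 2 * (z[0] + z[1])), constraints)
problem.solve()
```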
An SOCP is a convex optimization problem with cone-shaped constraints in addition to linear ones. The general form of a conic constraint is $\sqrt{\sum_{i=0}^{n-1} x_i^2} \le x_n$, where $x_0, \ldots, x_n$ are variables. For example, $\sqrt{x^2 + y^2} \le z$ describes a cone in 3D space. Such constraints can be solved efficiently using modern interior-point methods.
To translate the facility's model into a stochastic model and then to an SOCP, the facility models the workload as a random variable. This makes component utilizations random as well. The facility then obtains desirable traffic distributions by bounding the random variables for utilization.
To tractably capture the relationship between the random variables that represent workload and those that represent utilization, it is assumed that prediction errors (i.e., differences from actual values) are normally distributed with zero mean. This assumption holds to a first order for an EWMA-based predictor. It is also assumed that the error distributions of different UGs are independent. Independence is not required for the actual rates of UGs, which may be correlated (e.g., diurnal patterns). It is also not required for estimation errors for the different resources (e.g., memory, bandwidth) needed by a UG, because different resource types never co-occur in a cone.
The facility's approach ensures that, even with prediction errors, the utilization of a component α does not exceed $\mu'_\alpha(t)$ with a high probability $p_\alpha$ (such as 99.9%, for example). The facility computes $\mu'_\alpha(t)$ based on the predicted workload. The deterministic LP above does not offer this guarantee if the workload is underestimated. Rather, the facility uses:
$\forall \alpha \in B \cup Y \cup C \cup L:\ P[\mu_\alpha(t) \le \mu'_\alpha(t)] \ge p_\alpha$ (17)
When prediction errors are normally distributed, component utilizations are too, because a sum of independent normally distributed variables is normally distributed. Thus, the requirement above is equivalent to:
$\forall \alpha \in B \cup Y \cup C \cup L:\ E[\mu_\alpha(t)] + \Phi^{-1}(p_\alpha)\,\sigma[\mu_\alpha(t)] \le \mu'_\alpha(t)$ (18)
where $\Phi^{-1}$ is the inverse cumulative distribution function of the standard normal distribution N(0,1), and $E[\mu_\alpha(t)]$ and $\sigma[\mu_\alpha(t)]$ are the mean and standard deviation of $\mu_\alpha(t)$, respectively. The facility computes $E[\mu_\alpha(t)]$ as a function of the traffic that α carries, using equations similar to those in the temporal model. The facility computes $\sigma[\mu_\alpha(t)]$ as follows. Because $n_{g,\psi}(t)$ is normally distributed, its variance is:
$\forall g, \psi \in \Psi_g:\ \sigma[n_{g,\psi}(t)]^2 = t^2\left(\sigma[\tilde{a}_g]^2\,\pi_{g,\psi}^2 + \sigma[\tilde{d}_g]^2\,\tau_{g,\psi}^2\right)$ (19)
Thus, for servers, the variances are:

$\forall \alpha \in B \cup Y \cup C:\ \sigma[\mu_\alpha(t)]^2 = \frac{1}{\kappa_\alpha^2} \sum_{g,\, \psi \in \Psi_g:\, \alpha \in \psi} \sigma[n_{g,\psi}(t)]^2$ (20)
Similarly, the variance of the edge request traffic $q_l(t)$ on link l is:

$\forall l:\ \sigma[q_l(t)]^2 = \sum_{g,\, \psi \in \Psi_g:\, l \in \psi} t^2\left(\sigma[\tilde{u}_g]^2\,\pi_{g,\psi}^2 + \sigma[\tilde{v}_g]^2\,\tau_{g,\psi}^2\right)$ (21)
Thus, for links, the variances are:

$\forall l \in L:\ \sigma[\mu_l(t)]^2 = \frac{\sigma[q_l(t)]^2 + \sigma[r_l(t)]^2 + \sigma[o_l(t)]^2}{C_l^2}$ (22)
where $r_l(t)$ and $o_l(t)$ are the edge response and non-edge traffic. The facility computes their variances as in Eqn. (21).
The quadratic formulations in Eqns. (18)-(22) are essentially cone constraints. For example, merging Eqns. (18), (19), and (20) produces:

$\forall \alpha:\ \Phi^{-1}(p_\alpha)\,\sqrt{\sum_{g,\, \psi \in \Psi_g:\, \alpha \in \psi} t^2\left(\sigma[\tilde{a}_g]^2\,\pi_{g,\psi}^2 + \sigma[\tilde{d}_g]^2\,\tau_{g,\psi}^2\right)} \le \kappa_\alpha\left(\mu'_\alpha(t) - E[\mu_\alpha(t)]\right)$ (23)
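A sketch of one such cone constraint, again using cvxpy as an assumed front end: the chance constraint becomes a second-order cone by bounding $\Phi^{-1}(p_\alpha)$ times the 2-norm of the per-path deviation terms of Eqn. (19) by the slack the server has below its target utilization. All numeric values are hypothetical:

```python
import cvxpy as cp
from scipy.stats import norm

t, kappa = 30.0, 500.0             # evaluation time and server capacity
p_alpha, u_bound = 0.999, 0.9      # target probability and utilization bound mu'
n0, d_old, a_new = 100.0, 2.0, 3.0
sigma_a, sigma_d = 0.4, 0.2        # std devs of arrival/departure prediction errors

pi = cp.Variable(2, nonneg=True)   # weights for new sessions
tau = cp.Variable(2, nonneg=True)  # weights for old sessions

# kappa * E[mu(t)]: expected session count from the temporal model (Eqns 1-5).
mean_sessions = cp.sum((n0 - t * d_old) * tau + t * a_new * pi)
# Per-path standard-deviation terms of Eqn (19), stacked so that the squared
# 2-norm of this vector equals the variance sum of Eqn (20).
std_terms = cp.hstack([t * sigma_a * pi, t * sigma_d * tau])

constraints = [
    cp.sum(pi) == 1,
    cp.sum(tau) == 1,
    # Eqn (23): Phi^{-1}(p) * ||std terms|| <= kappa * mu' - E[sessions]
    norm.ppf(p_alpha) * cp.norm(std_terms, 2) <= kappa * u_bound - mean_sessions,
]
cp.Problem(cp.Minimize(0), constraints).solve()
```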
The facility solves these constraints along with the earlier temporal-model constraints to obtain the desired outputs. In the objective function (Eqn. 14), $\mu'_\alpha$ is used instead of $\mu_\alpha$. The same principles as before are used to remove the dependence on time t.
The facility converts the output of the model to a new system configuration as follows. The DNS servers, load balancers, and proxies are configured to distribute the load from new sessions according to the computed weights; their routing of old sessions remains unchanged. But two issues arise with respect to network switch configuration: weights differ for new and old sessions, but switches do not know to which category a packet belongs; and weights are UG-specific, which would require UG-specific rules to implement, but the number of UGs can exceed switch rule capacity. The facility addresses both issues by having servers embed the appropriate path (tunnel) identifier in each transmitted packet. Switches forward packets based on this identifier.
To scale the computation, in some embodiments, the facility implements a few optimizations to reduce the size of the LP. A key challenge is the large number of UGs (O(100K)). To reduce it, the facility aggregates UGs at the start of each time period. For each UG, the facility first ranks all entry points in decreasing order of performance, and then combines UGs that have the same entry points, in the same order, in the top three positions into virtual UGs (“VUGs”). The facility formulates the model in terms of VUGs. The performance of a VUG to an entry point is the average over the aggregated UGs, weighted by each UG's number of sessions. The variance of the VUG is computed similarly from the variances of the individual UGs. Further, to reduce the number of e2e-paths per VUG, the facility limits each VUG to its best three entry points, each load balancer to three proxies, and each source-destination switch pair to six paths (tunnels). Together, these optimizations reduce the size of the LP by multiple orders of magnitude.
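A sketch of the aggregation step, assuming each UG is represented by a session count and a map from entry point to measured latency; the dictionary layout and function name are illustrative:

```python
from collections import defaultdict

def aggregate_ugs(ugs):
    """Combine UGs whose best three entry points match, in order,
    into virtual UGs (VUGs)."""
    buckets = defaultdict(list)
    for ug in ugs:
        ranked = sorted(ug["latency"], key=ug["latency"].get)  # best (lowest) first
        buckets[tuple(ranked[:3])].append(ug)
    vugs = []
    for top3, members in buckets.items():
        total = sum(m["sessions"] for m in members)
        # Session-weighted average latency to each shared entry point.
        latency = {e: sum(m["latency"][e] * m["sessions"] for m in members) / total
                   for e in top3}
        vugs.append({"entry_points": top3, "sessions": total, "latency": latency})
    return vugs
```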
The facility also implements an SOCP-specific optimization. Given a cone constraint of the form $\sqrt{x_1^2 + x_2^2 + x_3^2} \le x_4$ for an infrastructure component α, if $|x_1|$ is at most 0.1% of α's capacity, the facility approximates the constraint as $|x_1| + \sqrt{x_2^2 + x_3^2} \le x_4$. Since $\sqrt{x_1^2 + x_2^2 + x_3^2} \le |x_1| + \sqrt{x_2^2 + x_3^2}$, this approximation is conservative. It assumes worst-case load for $x_1$, but because $x_1$ is a small fraction of capacity, this has minimal impact on the solution. This optimization reduces the number of variables inside cones by an order of magnitude.
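A quick numeric check of why the substitution is conservative (values arbitrary):

```python
import math

x1, x2, x3 = 0.05, 3.0, 4.0                 # x1 is a tiny fraction of capacity
exact = math.sqrt(x1**2 + x2**2 + x3**2)    # ~5.00025: the true cone term
approx = abs(x1) + math.hypot(x2, x3)       # 5.05: x1 leaves the cone
assert approx >= exact  # the approximation can only over-protect the component
```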
In some embodiments, a computing system for tailoring the operation of an infrastructure for delivering online services is provided. The computing system comprises: a prediction subsystem configured to apply a stochastic linear program model to predict operating metrics for heterogeneous components of the infrastructure for a future period of time based upon operating metrics for a past period of time; and an adaptation subsystem configured to use the operating metrics predicted by the prediction subsystem as a basis for reallocating resources provided by the components of the infrastructure to different portions of a load on the infrastructure.
In some embodiments, a computer-readable medium is provided that has contents configured to cause a computing system to, in order to manage a distributed system for delivering online services: from each of a plurality of distributed system components of a first type, receive operating statistics for the distributed system component of the first type; from each of a plurality of distributed system components of a second type distinct from the first type, receive operating statistics for the distributed system component of the second type; and use the received operating statistics for distributed system components of the first and second types to generate a model predicting operating statistics for the distributed system for a future period of time.
In some embodiments, a method in a computing system for managing a distributed system for delivering online services is provided. The method comprises: from each of a plurality of distributed system components of a first type, receiving operating statistics for the distributed system component of the first type; from each of a plurality of distributed system components of a second type distinct from the first type, receiving operating statistics for the distributed system component of the second type; and using the received operating statistics for distributed system components of the first and second types to generate a model predicting operating statistics for the distributed system for a future period of time.
In some embodiments, a computer-readable medium storing an online services infrastructure model data structure is provided. The data structure comprises: data representing a stochastic system of linear equations whose solution yields a set of weights specifying, for each of a plurality of infrastructure resource types, for each of a plurality of combinations of (1) a group of client devices with (2) one of a plurality of infrastructure resource instances of the infrastructure resource type, the extent to which the group of client devices should be served by the infrastructure resource instance during a future period of time, the linear equations of this stochastic system being based on operating measurements of the infrastructure during a past period of time.
It will be appreciated by those skilled in the art that the above-described facility may be straightforwardly adapted or extended in various ways. While the foregoing description makes reference to particular embodiments, the scope of the invention is defined solely by the claims that follow and the elements recited therein.