The present disclosure is related to auto-scaling of cloud based resources for applications and in particular to joint auto-scaling of cloud based node and link resources for applications.
Many applications are performed by resources accessed by a user via a network. Such resources and connections between them may be provided by a cloud. The cloud allocates nodes containing resources to execution of the application, and the nodes may be scaled up or down based on the volume of use of the application, referred to as workload. If the workload increases, more resources may be allocated to performing the application. The workload may increase due to more users using the application, existing users increasing their use, or both. Similarly, the workload may decrease such that fewer resources may be allocated or provisioned to the application.
Current auto-scaling services and approaches scale nodes in isolation. Connections to and between nodes providing resources such as a virtual machine (VM) for an application in a cloud based system may also be scaled in isolation. Scaling the VM nodes without scaling the links between them results in insufficient or wasted network resources. Each of the nodes and links may implement their own scaling policies that react to their workload measurements. Increasing resources in one node may result in changes in workload that occur in other nodes, which then increase their resources.
A method includes receiving runtime metrics for a distributed application, the distributed application utilizing cloud resources including computer nodes and network links, detecting a change in the runtime metrics, determining nodes and links associated with the distributed application utilizing an application topology description data structure, and jointly scaling the links and nodes responsive to the detected change in the runtime metrics.
A computer implemented auto-scaling system includes processing circuitry, a storage device coupled to the processing circuitry, and auto-scaling code stored on the storage device for execution by the processing circuitry to perform operations. The operations include receiving runtime metrics for a distributed application, the distributed application utilizing cloud resources including computer nodes and network links, detecting a change in the runtime metrics, determining nodes and links associated with the distributed application utilizing an application topology description data structure, and jointly scaling the links and nodes responsive to the detected change in the runtime metrics.
A non-transitory storage device having instructions stored thereon for execution by a processor to cause the processor to perform operations including receiving runtime metrics for a distributed application, the distributed application utilizing cloud resources including computer nodes and network connections, detecting a change in the runtime metrics, determining nodes and links associated with the distributed application utilizing an application topology description data structure, and jointly scaling the links and nodes responsive to the detected change in distributed application workload metrics.
In the following description, reference is made to the accompanying drawings that form a part hereof, and in which is shown by way of illustration specific embodiments which may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention, and it is to be understood that other embodiments may be utilized and that structural, logical and electrical changes may be made without departing from the scope of the present invention. The following description of example embodiments is, therefore, not to be taken in a limited sense, and the scope of the present invention is defined by the appended claims.
The functions or algorithms described herein may be implemented in software in one embodiment. The software may consist of computer executable instructions stored on computer readable media or a computer readable storage device, such as one or more non-transitory memories or other types of hardware based storage devices, either local or networked. Further, such functions correspond to modules, which may be software, hardware, firmware, or any combination thereof. Multiple functions may be performed in one or more modules as desired, and the embodiments described are merely examples. The software may be executed on a digital signal processor, ASIC, microprocessor, or other type of processor operating on a computer system, such as a personal computer, server, or other computer system, turning such computer system into a specifically programmed machine.
Current auto-scaling services and approaches scale nodes in isolation. Links to and between nodes providing resources such as a virtual machine (VM) for an application in a cloud based system may also be scaled in isolation. Scaling the VM nodes without scaling the links between them results in insufficient or wasted network resources. While the capacity of a node, such as the number of central processing units (CPUs) and memory, may be increased or decreased, the scaling policies of different VMs are not coordinated. In the case of a distributed application, where different functions of the application may be performed on different nodes, modifying the capacity at a first node may result in a need for changing the capacity at other nodes. However, a delay may occur because workload changes are detected only when the first node capacity change results in a cascading workload change at the other nodes.
Each tier may consist of multiple nodes and multiple resources at each node, and many other different types of application services may be associated with the tiers in further embodiments. Communication connections between the users 110 and tiers/nodes are indicated at 130, 135, and 140. Note that with additional tiers and nodes which may be present in provisioning larger applications, the number of connections between nodes may be significantly larger than in the simple example shown.
In prior systems, each tier may have its own VM scaling policy that operates in reaction to workload changes. Similarly, communication links may also have their own scaling policies reacting to changes in bandwidth utilization. Such scaling may be referred to as reactive scaling. Scaling the VMs and links in reaction to, not ahead of, workload changes results in reduced performance or wasted resources due to scaling delay.
Scaling delay may include the time to make a decision to react, determine resources to add, and boot or reboot VMs. In the case of reducing resources, the delays may be associated with taking a snapshot of a resource to be reduced and deleting the resource. The delays are amplified where resources may be changed at short time intervals. Further, changing the node capacities without changing the capacities of the links between the nodes may result in still further delay, as separate link scaling occurs only when the increase in node capacities results in different communication loads on the links.
In system 100, a joint auto-scaling policy 150 is shown which provides a policy for proactively and jointly scaling the resources at nodes and the connections between the nodes. In other words, scaling of resources may begin prior to workload changes reaching different nodes based on overall workload metrics, also referred to as runtime metrics.
In one embodiment, the joint auto-scaling policy is used by an auto-scaling system to perform a method 200 of auto-scaling the node resources and links as illustrated in flowchart form in
At 215, a change in distributed application workload metrics is received. A workload measurement system may be observing the workload and providing resource utilization metrics such as, for example, the frequency of transactions and the time to perform transactions, as well as various quality of service (QoS) measurements. The metrics may be defined by a user or administrator in various embodiments. At 220, cloud resources and network connections associated with the distributed application are determined utilizing a cloud resources and connections topology description data structure. The data structure may be provided by an application administrator, and may be in the form of a mark-up language that describes the structure of the nodes and connections that are used to perform the distributed application.
In one embodiment, the data structure specifies a joint auto-scaling policy and parameters of the distributed application, also referred to as a cloud application in OASIS TOSCA (Topology and Orchestration Specification for Cloud Applications).
At 225, actions to jointly scale the links and nodes of an application may be provided responsive to the detected change in distributed application workload metrics. The actions may specify the link and node resources to increase or decrease in accordance with the auto-scaling policy associated with the distributed application. The actions may use resource management application programming interfaces (APIs) to update link and node capacities for the distributed application.
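By way of illustration only, the operations at 215, 220, and 225 might be organized as a control loop along the following lines. This is a minimal sketch in Python; the callable and data structure names (get_metrics, plan_actions, apply_actions, topology) are hypothetical placeholders rather than any particular cloud provider's API.

    # Illustrative sketch only; names are hypothetical, not a real cloud API.
    import time

    def autoscale_loop(topology, policy, get_metrics, plan_actions, apply_actions, period_s=60):
        """Receive runtime metrics, detect changes, and jointly scale links and nodes."""
        previous = get_metrics()                      # runtime metrics (215)
        while True:
            time.sleep(period_s)
            current = get_metrics()
            if current == previous:                   # no change detected
                continue
            nodes = topology["nodes"]                 # application topology description (220)
            links = topology["links"]
            actions = plan_actions(nodes, links, current, policy)
            apply_actions(actions)                    # joint link/node scaling via management APIs (225)
            previous = current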
In one embodiment, the cloud resources are adjusted at multiple nodes of an application. The links between the nodes may be scaled by adjusting the network bandwidth between the multiple nodes.
The application topology description data structure includes an initial reference value for the workload metrics for the distributed application.
In one embodiment, the application topology description data structure further includes link capacities, node capacities, link capacity limits, node capacity limits, link cost, node cost, source node, and sink node.
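For illustration only, the fields listed above might be organized as follows. This minimal sketch uses a Python dictionary rather than actual TOSCA markup, and all field names and values are hypothetical.

    # Hypothetical encoding of an application topology description; field names
    # are illustrative and do not follow the actual TOSCA schema.
    topology_description = {
        "metrics_reference": 100.0,            # initial reference value Mref for the runtime metrics
        "source": "s",                         # source node generating input
        "sink": "t",                           # sink node receiving output
        "nodes": {
            "web": {"capacity": 4, "max_capacity": 16, "cost": 1.0},
            "app": {"capacity": 4, "max_capacity": 32, "cost": 2.0},
            "db":  {"capacity": 2, "max_capacity": 8,  "cost": 4.0},
        },
        "links": {
            ("s", "web"):   {"capacity": 10, "max_capacity": 40, "cost": 0.5},
            ("web", "app"): {"capacity": 10, "max_capacity": 40, "cost": 0.5},
            ("app", "db"):  {"capacity": 5,  "max_capacity": 20, "cost": 1.0},
            ("db", "t"):    {"capacity": 5,  "max_capacity": 20, "cost": 1.0},
        },
    }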
Joint link-node auto-scaling of an application may be performed using an integral control algorithm to calculate a target total capacity based on current application metrics and a pair of high and low threshold metrics.
The distributed application in one embodiment comprises a tiered web application with different nodes performing different functions of the web application. The nodes and links between the nodes are scaled jointly in accordance with an auto-scaling policy. Capacities of under-provisioned links and nodes are increased such that cost increase of the links and nodes is minimized. Capacities of over-provisioned links and nodes are decreased such that cost decrease of the links and nodes is maximized.
The distributed application 305 is illustrated with a logical representation of cloud resources used to execute the application. A source node is shown at 306 coupled to an application topology 307 and a sink node at 308. In one embodiment, a user 310, such as an administrator or managing system, provides a joint auto-scaling policy 315 and an application topology description 320 to a network scaling service 325. A monitor 330 is used to monitor workload of the distributed application 305 and continuously provides workload metrics as they are generated via a connection 335 to the network scaling service 325. An existing network scaling service 325 may be used and modified to provide joint proactive scaling of both nodes and links of the distributed application responsive to the provided metrics and joint auto-scaling policy 315.
Auto-scaling decisions of the scaling service 325 are illustrated at 340, and may include adding or removing link and node capacities of the distributed application using compute and network control representational state transfer (REST) APIs (e.g., Nova and Neutron+extensions), for example. The decisions 340 are provided to an infrastructure as a service (IaaS) cloud platform, such as for example OpenStack, which then performs the decisions on a datacenter infrastructure 350 that comprises the nodes and links executing the distributed application as deployed by user 310. Note that the infrastructure 350 may include networked resources at a single physical location, or multiple networked machines at different physical locations, as is common in cloud based provisioning of distributed applications. The infrastructure 350 may also host the scaling service 325, which may also include the monitor 330.
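A minimal sketch of how the decisions 340 might be applied is shown below. The client method names are hypothetical placeholders that would wrap whatever compute and network APIs the IaaS platform exposes; they are not actual Nova or Neutron calls.

    # Hypothetical sketch; client method names are placeholders that would wrap
    # the IaaS platform's actual compute and network APIs.
    def apply_decisions(client, new_node_caps, new_link_caps):
        """Apply new node capacities (Bk+1) and link capacities (Ak+1)."""
        for node_id, capacity in new_node_caps.items():
            client.resize_node(node_id, capacity)            # e.g. adjust vCPUs/memory
        for (src, dst), bandwidth in new_link_caps.items():
            client.set_link_bandwidth(src, dst, bandwidth)   # e.g. adjust bandwidth limit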
In one embodiment, application topology 320 is converted into an application model for use by the scaling service 325. The application model may be expressed as G=(N, E, A, B, C, L, s, t), where:
N = {ni | ni is a node}
E = {eij | eij is a link from node ni to node nj}
Ak = {aij | aij > 0 is the link capacity of eij at time k}
Bk = {bi | bi > 0 is the node capacity of ni at time k}
CE = {cij | cij > 0 is the link capacity cost of eij}
LE = {lij | lij ≧ aij is the maximum capacity of link eij}
CN = {ci | ci > 0 is the capacity cost of ni}
LN = {li | li ≧ bi is the maximum capacity of node ni}
s is the source node of G that generates input to N, and
t is the sink node of G that receives output from N.
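For illustration, converting the topology description into these sets, and computing the total cost of G, might look as follows. This sketch reuses the hypothetical topology_description structure shown earlier and is not the disclosure's actual conversion routine.

    # Sketch: build the model sets from the hypothetical topology description
    # shown earlier, and compute the total cost of the model G.
    def build_model(desc):
        A = {e: l["capacity"] for e, l in desc["links"].items()}        # Ak
        B = {n: v["capacity"] for n, v in desc["nodes"].items()}        # Bk
        CE = {e: l["cost"] for e, l in desc["links"].items()}
        CN = {n: v["cost"] for n, v in desc["nodes"].items()}
        LE = {e: l["max_capacity"] for e, l in desc["links"].items()}
        LN = {n: v["max_capacity"] for n, v in desc["nodes"].items()}
        return A, B, CE, CN, LE, LN, desc["source"], desc["sink"]

    def total_cost(A, B, CE, CN):
        # sum{aij·cij for all eij in E} + sum{bi·ci for all ni in N}
        return sum(A[e] * CE[e] for e in A) + sum(B[n] * CN[n] for n in B)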
The total cost of the application model G is sum{aij·cij for all eij in E} + sum{bi·ci for all ni in N}. The joint auto-scaling policy 315 specifies Mref, A0, B0, LE, LN, s, and t, where Mref is an initial reference value of the metrics. The measured metrics at time k are represented by Mk, and as indicated above, may include various QoS and resource utilization metrics. The application model, joint policy, and measured metrics are provided to the scaling service 325, which may implement a modified form of integral control in one embodiment where:
1. Uk+1 = Uk + K(Ml − Mk) if Mk < Ml (scale up)
2. Uk+1 = Uk − K(Mk − Mh) if Mk > Mh (scale down)
3. Uk+1 = Uk if Ml ≦ Mk ≦ Mh (do nothing)
4. Ui = capacity(min_cut(G, Ai)) for i = k, k+1
The integral control coefficient, K, is used to control how quickly scaling occurs in response to changes in the measurement metrics. Note that the first three potential actions, scale up, scale down, and do nothing, are dependent on whether the measured metrics Mk are below the low threshold Ml, above the high threshold Mh, or within the thresholds, respectively. In each case, a target total capacity at time k+1 is calculated based on the current total capacity Uk at time k plus a total increased capacity, minus a total decreased capacity, or without any change. The total increased capacity K(Ml−Mk) is the difference between the low threshold and the measured metrics times the integral control coefficient, while the total decreased capacity K(Mk−Mh) is the difference between the measured metrics and the high threshold times the integral control coefficient. The fourth potential action calculates the current total capacity Uk from the application topology and associates the target total capacity Uk+1 to the application topology by a min_cut function as described in further detail below. In one embodiment, the decisions 340 include API calls to allocate link-node capacities by a matrix Ak+1 that defines the new link capacities and a vector Bk+1 that defines the new node capacities for the application.
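A minimal sketch of the modified integral control step described above follows; the threshold values Ml and Mh and the coefficient K are assumed to be supplied by the joint auto-scaling policy.

    # Sketch of the modified integral control law; inputs are assumed to come
    # from the joint auto-scaling policy and the monitoring service.
    def target_total_capacity(U_k, M_k, M_low, M_high, K):
        """Return the target total capacity U_{k+1} for time k+1."""
        if M_k < M_low:                        # scale up
            return U_k + K * (M_low - M_k)
        if M_k > M_high:                       # scale down
            return U_k - K * (M_k - M_high)
        return U_k                             # within thresholds: do nothing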
The min_cut function identifies under-provisioned links based on the max-flow min-cut property of the application graph, where:
1. max-flow(G) = min_cut(G) capacity, which corresponds to the minimally provisioned (under-provisioned) links, and
2. there could be more than one min_cut below the target total capacity.
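For illustration, the min_cut of the application graph could be computed with an off-the-shelf max-flow/min-cut routine such as the one provided by networkx. Treating the cut (bottleneck) links as the under-provisioned links is a sketch of the idea above, not the exact algorithm of the disclosure.

    # Sketch: identify under-provisioned (bottleneck) links via max-flow/min-cut.
    import networkx as nx

    def under_provisioned_links(links, source, sink):
        """links: {(i, j): capacity}. Returns current total capacity and cut links."""
        G = nx.DiGraph()
        for (i, j), cap in links.items():
            G.add_edge(i, j, capacity=cap)
        cut_value, (S, T) = nx.minimum_cut(G, source, sink)   # max-flow = min_cut capacity
        cut_links = [(i, j) for i in S for j in G.successors(i) if j in T]
        return cut_value, cut_links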
The link-node scale-up algorithm may be viewed as a solution to a cost optimization problem defined as follows, where diff is the total capacity increase, dij and di are the capacity increases of link eij and node ni, and S×T denotes the under-provisioned links crossing the min_cut:
1. sum {dij for eij in S×T}=diff
2. aij+dij≦lij
3. bi+di≦li
4. di = fi(sum{dij}) for ni in S′ and di = fi(sum{dji}) for ni in T′ (inc_nodes)
5. dij, di ≧ 0
Scaling down may be performed in a similar manner.
The allocation of total decreased capacity among the over-provisioned links may be defined as a solution to the following optimization problem:
1. sum {dij for eij in S×T}=diff
2. 0<aij−dij
3. 0<bi−di
4. di = fi(−sum{dij}) for ni in S′ and di = fi(−sum{dji}) for ni in T′ (dec_nodes)
5. dij, di ≧ 0
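As a rough sketch of the allocation scheme behind these two problems, the total capacity change diff might be split among the cut links in a single pass, inversely proportional to cost when scaling up and proportional to cost when scaling down. The iterative handling of the capacity limits lij and of the node update functions fi is omitted here.

    # Sketch: split a total capacity change among cut links by cost.
    # Scale-up shares are inversely proportional to cost; scale-down shares are
    # proportional to cost. Capacity limits and node updates (fi) are omitted.
    def allocate_link_deltas(cut_links, link_costs, diff, scale_up=True):
        """Return {link: delta} whose values sum to diff."""
        if scale_up:
            weights = {e: 1.0 / link_costs[e] for e in cut_links}
        else:
            weights = {e: link_costs[e] for e in cut_links}
        total = sum(weights.values())
        return {e: diff * w / total for e, w in weights.items()}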
Various described embodiments may provide one or more benefits for users of the distributed application. The scaling policy may be simplified, as there is no need to specify complex scaling rules for different scaling groups. Users can jointly scale links and nodes of applications, avoiding the delays observed in reactive scaling using individual and independent scaling policies of nodes and links. The cost of joint resources (compute and network) may be reduced while maintaining the performance of the distributed application. For cloud providers, improved joint utilization of compute and network resources may be achieved while providing global performance improvements to applications. Proactive auto-scaling based on application topology results in improved efficiency, reducing the delays observed with prior cascading reactive methods of auto-scaling. Still further, the min_cut methodology, the application scaling, and the link-node scaling algorithms all run in polynomial time, reducing the overhead required for identifying resources to scale.
One example computing device in the form of a computer 2000 may include a processing unit 2002, memory 2003, removable storage 2010, and non-removable storage 2012. Although the example computing device is illustrated and described as computer 2000, the computing device may be in different forms in different embodiments. For example, the computing device may instead be a smartphone, a tablet, a smartwatch, or other computing device including the same or similar elements as illustrated and described with regard to
Memory 2003 may include volatile memory 2014 and non-volatile memory 2008. Computer 2000 may include—or have access to a computing environment that includes—a variety of computer-readable media, such as volatile memory 2014 and non-volatile memory 2008, removable storage 2010 and non-removable storage 2012. Computer storage includes random access memory (RAM), read only memory (ROM), erasable programmable read-only memory (EPROM) and electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD ROM), Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium capable of storing computer-readable instructions.
Computer 2000 may include or have access to a computing environment that includes input 2006, output 2004, and a communication connection 2016. Output 2004 may include a display device, such as a touchscreen, that also may serve as an input device. The input 2006 may include one or more of a touchscreen, touchpad, mouse, keyboard, camera, one or more device-specific buttons, one or more sensors integrated within or coupled via wired or wireless data connections to the computer 2000, and other input devices. The computer may operate in a networked environment using a communication connection to connect to one or more remote computers, such as database servers. The remote computer may include a personal computer (PC), server, router, network PC, a peer device or other common network node, or the like. The communication connection may include a Local Area Network (LAN), a Wide Area Network (WAN), cellular, WiFi, Bluetooth, or other networks.
Computer-readable instructions stored on a computer-readable medium are executable by the processing unit 2002 of the computer 2000. A hard drive, CD-ROM, and RAM are some examples of articles including a non-transitory computer-readable medium such as a storage device. The terms computer-readable medium and storage device do not include carrier waves to the extent carrier waves are deemed too transitory. For example, a computer program 2018 capable of providing a generic technique to perform access control check for data access and/or for doing an operation on one of the servers in a component object model (COM) based system may be included on a CD-ROM and loaded from the CD-ROM to a hard drive. The computer-readable instructions allow computer 2000 to provide generic access controls in a COM based computer network system having multiple users and servers. Storage can also include networked storage such as a storage area network (SAN) indicated at 2020.
1. In example 1, a method includes receiving runtime metrics for a distributed application, the distributed application utilizing cloud resources including computer nodes and network links, detecting a change in runtime metrics, determining nodes and links associated with the distributed application utilizing an application topology description data structure, and jointly scaling the links and nodes responsive to the detected change in runtime metrics.
2. The method of example 1 wherein the links and nodes are scaled in accordance with an auto-scaling policy.
3. The method of example 2 wherein the auto-scaling policy is associated with the distributed application.
4. The method of any of examples 1-3 wherein scaling the nodes comprises adjusting resources at multiple nodes of the application.
5. The method of example 4 wherein scaling the links comprises adjusting bandwidths of networks between the multiple nodes.
6. The method of any of examples 1-5 wherein the application topology description data structure includes an initial reference value for the runtime metrics for the distributed application.
7. The method of example 6 wherein the application topology description data structure further includes link capacities, node capacities, maximum link capacity, maximum node capacity, link cost, node cost, source node, and sink node.
8. The method of example 6 wherein auto-scaling the links and nodes is performed using an integral control algorithm to generate a change from current total capacity to a target total capacity for the application.
9. The method of example 8 wherein the capacities of links and nodes are scaled up or down or remain the same dependent on a target capacity, and wherein a target total capacity is calculated based on a pair of high and low threshold metrics.
10. The method of example 9 wherein under-provisioned links and nodes are identified using a graph min_cut method based on the application topology and capacities of under-provisioned links and nodes are increased to meet the target total capacity such that cost of the under-provisioned links and nodes is reduced by iteratively allocating total increased capacity among the links inversely proportional to their costs.
11. The method of example 9 wherein over-provisioned links and nodes are identified using a graph max-cut method based on the application topology and capacities of over-provisioned links and nodes are decreased to meet the target total capacity such that cost of the over-provisioned links and nodes is reduced by iteratively allocating total decreased capacity among the links proportional to their costs.
12. The method of any of examples 1-11 wherein the distributed application comprises a tiered web application with different nodes performing different tiers of the web application.
13. In example 13, a computer implemented auto-scaling system includes processing circuitry, a storage device coupled to the processing circuitry, and auto-scaling code stored on the storage device for execution by the processing circuitry to perform operations. The operations include receiving runtime metrics for a distributed application, the distributed application utilizing cloud resources including computer nodes and network links, detecting a change in runtime metrics, determining nodes and links associated with the distributed application utilizing an application topology description data structure, and jointly scaling the links and nodes responsive to the detected change in runtime metrics.
14. The system of example 13 wherein the links and nodes are scaled in accordance with an auto-scaling policy.
15. The system of example 14 wherein the auto-scaling policy is associated with the distributed application and wherein the processing circuitry comprises cloud based resources.
16. The system of any of examples 13-15 wherein auto-scaling the links and nodes comprises adjusting resources at multiple nodes of the application, wherein scaling the links comprises adjusting bandwidths of networks between the multiple nodes, wherein the application topology description data structure includes an initial reference value for the runtime metrics for the distributed application, and wherein the application topology description data structure further includes link capacities, node capacities, maximum link capacity, maximum node capacity, link cost, node cost, source node, and sink node.
17. The system of example 16 wherein auto-scaling the links and nodes is performed using an integral control algorithm to generate a change from current total capacity to a target total capacity for the entire application (or the entire set of services supporting the application), wherein a target total capacity is calculated based on a pair of high and low threshold metrics, wherein under-provisioned links and nodes are identified using a graph min_cut method based on the application topology and capacities of under-provisioned links and nodes are increased to meet the target total capacity such that cost of the under-provisioned links and nodes is minimized by iteratively allocating total increased capacity among the links inversely proportional to their costs, and wherein over-provisioned links and nodes are identified using a graph max-cut method based on the application topology and capacities of over-provisioned links and nodes are decreased to meet the target total capacity such that cost of the over-provisioned links and nodes is minimized (or at least reduced) by iteratively allocating total decreased capacity among the links proportional to their costs.
18. In example 18, a non-transitory storage device has instructions stored thereon for execution by a processor to cause the processor to perform operations including receiving runtime metrics for a distributed application, the distributed application utilizing cloud resources including computer nodes and network connections, detecting a change in runtime metrics, determining nodes and links associated with the distributed application utilizing an application topology description data structure, and jointly scaling the links and nodes responsive to the detected change in distributed application workload metrics.
19. The non-transitory storage device of example 18 wherein auto-scaling the links and nodes comprises adjusting resources at multiple nodes of the application, wherein scaling the links comprises adjusting bandwidths of networks between the multiple nodes, wherein the application topology description data structure includes an initial reference value for the runtime metrics for the distributed application, and wherein the application topology description data structure further includes link capacities, node capacities, maximum link capacity, maximum node capacity, link cost, node cost, source node, and sink node.
20. The non-transitory storage device of example 19 wherein auto-scaling the links and nodes is performed using an integral control algorithm to generate a change from current total capacity to a target total capacity for the entire application, wherein a target total capacity is calculated based on a pair of high and low threshold metrics, wherein under-provisioned links and nodes are identified using a graph min_cut method based on the application topology and capacities of under-provisioned links and nodes are increased to meet the target total capacity such that cost of the under-provisioned links and nodes is reduced by iteratively allocating total increased capacity among the links inversely proportional to their costs, and wherein over-provisioned links and nodes are identified using a graph max-cut method based on the application topology and capacities of over-provisioned links and nodes are decreased to meet the target total capacity such that cost of the over-provisioned links and nodes is reduced by iteratively allocating total decreased capacity among the links proportional to their costs.
Although a few embodiments have been described in detail above, other modifications are possible. For example, the logic flows depicted in the figures do not require the particular order shown, or sequential order, to achieve desirable results. Other steps may be provided, or steps may be eliminated, from the described flows, and other components may be added to, or removed from, the described systems. Other embodiments may be within the scope of the following claims.