The present disclosure generally relates to cloud computing. In particular, a technique for scaling an application having a set of virtual machines is presented. The technique may be practiced in the form of a method, a computer program, an arrangement (e.g., an apparatus or node) and a system.
The workgroup on Network Function Virtualization (NFV) within the European Telecommunications Standards Institute (ETSI) has published recommendations on lifecycle management of Virtualized Network Functions (VNF) in its framework ETSI GS NFV-MAN 001, V1.1.1 (2014-12) entitled NFV Management and Orchestration (MANO). One particular aspect of VNF lifecycle management is VNF elasticity via scaling processes.
The VNF scaling processes defined in the ETSI VNF MANO framework include scale-out/-in and scale-up/-down operations (see, e.g., section B.4.4.). Scale-out/-in operations are directed at changing the capacity of a VNF by means of adding or removing Virtual Machines (VMs). Scale-up/-down operations encompass changing the capacity of a VNF by means of adding or removing infrastructure resources (e.g., in terms of computing, network and storage resources) to or from existing VMs.
The execution of a VNF scaling process is triggered by a system entity detecting the need for a capacity increase or decrease via monitoring Key Performance Indicators (KPIs) of the VNF or of its underlying infrastructure. The behavior of the detection entity is usually configured by means of policies. Current policies to configure the decisions for VNF scaling are based on thresholds for certain KPI types (see, e.g., section B.4.4.3 of the ETSI VNF MANO framework). The monitoring of the KPIs and their comparison against the thresholds enable the detection of threshold passing. For example, in the event of a KPI relaxing below a configured threshold, the need for a capacity decrease is detected and execution of a scale-in or scale-down operation will be triggered.
It has been found that the VNF scaling process as defined in the ETSI VNF MANO framework is not optimal in many aspects. For example, the speed of convergence of the scaling process is strongly varying and difficult to predict. As such, the scaling process does not exhibit a deterministic behavior. Similar problems are also encountered for applications with virtual machines that conform to MANO standards different from the ETSI VNF MANO framework.
Accordingly, there is a need for a technique that permits a more efficient scaling of applications having a set of one or more virtual machines.
According to a first aspect, a method of scaling an application having a set of one or more virtual machines is provided. The steps of the method are performed during runtime of the application and responsive to a determination that a scaling operation is required for the application, wherein the determination is based on at least one first performance measurement result obtained for the application. In detail, the method comprises calculating a scaling magnitude for the required scaling operation taking into account at least one second performance measurement result obtained for the application, wherein the scaling magnitude is indicative of a resource quantity to be added to or removed from the application. The method further comprises triggering generation of a scaling request, wherein the scaling request is directed at a scaling of the application on the basis of the calculated scaling magnitude.
In certain variations, the method may also comprise determining, prior to the calculation of the scaling magnitude, that a scaling operation is actually required for the application. That determination can be based on the same performance measurement result that will also be taken into account for the calculation of the scaling magnitude or on a different performance measurement result.
In one variant, an operating target is defined for a performance indicator underlying the second performance measurement result. The performance indicator may, for example, be a load parameter of the application that is measured to obtain the second performance measurement result. The operating target may generally be an operating point or an operating range for a given performance indicator.
The scaling magnitude may be calculated based on a present (i.e., current) or expected relationship between the performance indicator and the operating target.
An expected relationship between the performance indicator and the operating target may, for example, be derived by extrapolating one, two or more second performance measurement results obtained for the same performance indicator at different points in time.
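The extrapolation mentioned above can be illustrated by the following non-limiting sketch. It assumes a least-squares straight-line fit, which is only one of many possible extrapolation schemes and is not prescribed by the present disclosure. The function names `extrapolate_kpi` and `expected_deviation` are hypothetical and introduced for illustration only.

```python
from typing import Sequence, Tuple


def extrapolate_kpi(samples: Sequence[Tuple[float, float]], t_future: float) -> float:
    """Extrapolate (time, value) measurement results to the time t_future.

    Assumption: a least-squares straight line is fitted through the samples.
    With a single sample, the measured value itself is returned.
    """
    if not samples:
        raise ValueError("at least one measurement result is required")
    if len(samples) == 1:
        return samples[0][1]
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        # All samples share one timestamp, so no trend can be derived.
        return mean_v
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / var_t
    return mean_v + slope * (t_future - mean_t)


def expected_deviation(samples: Sequence[Tuple[float, float]],
                       t_future: float,
                       operating_target: float) -> float:
    """Expected deviation of the performance indicator from an operating point."""
    return extrapolate_kpi(samples, t_future) - operating_target
```

In such a sketch, a positive return value of `expected_deviation` would indicate that the performance indicator is expected to lie above the operating point at the time `t_future`.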
A scaling factor may be determined from the present or expected relationship between the performance indicator and the operating target. The relationship between the operating target and the performance indicator may be expressed in various ways. As an example, the relationship may be defined as a current or expected deviation of the performance indicator from the operating target.
The scaling factor may generally be taken into account for calculating the scaling magnitude. As an example, the scaling magnitude may be calculated from the scaling factor and a resource quantity presently allocated to the application. In certain variants, the scaling magnitude may be determined by multiplying the presently allocated resource quantity with the scaling factor. The result of this multiplication may be processed further (e.g., offset) to obtain the scaling magnitude.
In certain variants the scaling magnitude is calculated taking into account multiple second performance measurement results obtained for multiple performance indicators. As an example, for each performance indicator a dedicated second measurement result may be obtained. In such a case a dedicated operating target may be defined for each performance indicator. The scaling magnitude may then be calculated based on present or expected relationships between the performance indicators and the associated operating targets.
There may exist a known correlation between the second performance measurement result and the resource quantity to be added to or removed from the application. As an example, the correlation may be a functional (e.g., essentially linear) relationship or a mapping. The correlation may have been determined prior to runtime of the application (e.g., via an empirical approach that can be based on measurements).
The known correlation may be taken into account in the calculation of the scaling magnitude. As an example, the scaling magnitude may be determined from the correlation and the relationship between the operating target and the performance indicator.
The second performance measurement result is in one example indicative of a system performance of the application. The second measurement result may thus have been obtained by aggregating individual performance measurements over the set of virtual machines. As an example, for each individual virtual machine in the set a dedicated individual performance measurement may be performed. The resulting individual performance measurement results can be aggregated (e.g., added, averaged, etc.) so as to obtain the (“final”) second performance measurement result that will be taken into account in the scaling magnitude calculation.
At least one of the first measurement result and the second measurement result may be indicative of a load of the application. Additionally, or in the alternative, at least one of the first measurement result and the second measurement result may be independent from the number of virtual machines associated with the application. As explained above, averaging of individual performance measurement results obtained for each individual virtual machine could be applied to that end.
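As a purely illustrative sketch of the aggregation described above, the following code averages individual per-VM measurement results. Averaging is assumed here because it yields a result that is independent of the number of virtual machines. The function name `aggregate_load` is hypothetical.

```python
from statistics import fmean
from typing import Sequence


def aggregate_load(per_vm_loads: Sequence[float]) -> float:
    """Average the individual per-VM load measurements into one system value.

    Assumption: averaging is the chosen aggregation; summation or other
    aggregations would be equally possible.
    """
    if not per_vm_loads:
        raise ValueError("at least one virtual machine measurement is required")
    return fmean(per_vm_loads)
```

With this aggregation, two virtual machines each loaded at 0.5 and four virtual machines each loaded at 0.5 yield the same aggregated value of 0.5.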
The determination that a scaling operation is required and the calculation of the scaling magnitude may be performed on the basis of one and the same performance measurement result or set of performance measurement results. As such, the (first) measurement result underlying the determination that the scaling operation is required may be used as the (second) measurement result that is taken into account upon calculating the scaling magnitude.
As said, the method presented herein may further comprise determining that a scaling operation is required. That determination may be performed in various ways, for example by subjecting the first performance measurement result to at least one threshold decision. In some variants, a lower threshold and an upper threshold for the first performance measurement result may be defined. The determination may in certain variants also be performed based on the operating target for the performance indicator. As an example, it may be determined that a scaling operation is required upon detection of a predefined deviation of the first performance measurement result from the operating target.
The operating target (e.g., the operating point or operating range) for the performance indicator may lie between the lower threshold and the upper threshold. In other variants, the operating target may at least partially lie below the lower threshold or above the upper threshold. There may exist a predefined relationship between the operating target for the performance indicator on the one hand and at least one of the lower threshold and the upper threshold on the other.
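The threshold decision with a lower and an upper threshold may be sketched as follows. The sketch assumes a load-type performance indicator, for which a high value calls for more capacity. The names `ScalingDecision` and `decide_scaling` are hypothetical.

```python
from enum import Enum


class ScalingDecision(Enum):
    NONE = "none"
    INCREASE = "increase"  # scale-out or scale-up
    DECREASE = "decrease"  # scale-in or scale-down


def decide_scaling(measurement: float, lower: float, upper: float) -> ScalingDecision:
    """Subject a first performance measurement result to two threshold decisions.

    Assumption: the indicator grows with load, so passing the upper threshold
    signals a need for more capacity and passing the lower one a need for less.
    """
    if lower > upper:
        raise ValueError("lower threshold must not exceed upper threshold")
    if measurement > upper:
        return ScalingDecision.INCREASE
    if measurement < lower:
        return ScalingDecision.DECREASE
    return ScalingDecision.NONE
```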
The method may further comprise verifying the calculated scaling magnitude. Moreover, the method may optionally comprise adjusting the calculated scaling magnitude dependent on a result of the verification. The scaling request may be triggered to be generated such that it is indicative of the adjusted scaling magnitude.
The verification of the calculated scaling magnitude may be performed in various ways, for example by comparing the calculated scaling magnitude with at least one configuration parameter. A threshold decision may be applied in this regard.
The at least one configuration parameter may be selected from a parameter set comprising a maximum number of allowed virtual machines for the application, a minimum number of allowed virtual machines for the application, a maximum amount of allowed infrastructure resources for an individual virtual machine, and a minimum amount of allowed infrastructure resources for an individual virtual machine. As such, the resource quantity may be indicative of a number of virtual machines to be added to or removed from the application. Alternatively, or in addition, the resource quantity may be indicative of infrastructure resources for the virtual machines to be added to or removed from (e.g., per virtual machine) the application.
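The verification and adjustment against configuration parameters may, as one non-limiting possibility, be realized by limiting the resulting application size to the configured range. The following sketch assumes that the resource quantity is a number of virtual machines. The function name `verify_vm_delta` and its parameters are hypothetical.

```python
def verify_vm_delta(current_vms: int, requested_delta: int,
                    min_vms: int, max_vms: int) -> int:
    """Adjust a calculated scaling magnitude to the configured VM limits.

    A positive delta adds virtual machines, a negative delta removes them.
    The returned delta keeps the resulting VM count within [min_vms, max_vms].
    """
    if min_vms > max_vms:
        raise ValueError("min_vms must not exceed max_vms")
    target = current_vms + requested_delta
    clamped = max(min_vms, min(max_vms, target))
    return clamped - current_vms
```

For example, with three active virtual machines, a maximum of four and a calculated magnitude of two, the adjusted magnitude in this sketch would be one.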
Also provided is a computer program product comprising program code portions for performing the steps of any of the methods and method steps presented herein when the computer program product is executed by at least one computing device (e.g., a processor or a distributed set of processors). The computer program product may be stored on a computer-readable recording medium, such as a semiconductor memory, a CD-ROM, DVD, and so on.
According to a still further aspect, an arrangement configured to trigger scaling of an application having a set of one or more virtual machines is presented. The arrangement comprises at least one processor configured to perform dedicated operations during runtime of the application and responsive to a determination that a scaling operation is required for the application, wherein the determination is based on at least one first performance measurement result obtained for the application. Specifically, the processor is configured to calculate a scaling magnitude for the required scaling operation taking into account at least one second performance measurement result obtained for the application, wherein the scaling magnitude is indicative of a resource quantity to be added to or removed from the application. The processor is further configured to trigger generation of a scaling request, wherein the scaling request is directed at a scaling of the application based on the calculated scaling magnitude.
The application may be configured as a VNF. Moreover, the arrangement may be configured as a VNF Manager (VNFM). The VNF and VNFM may conform to ETSI GS NFV-MAN 001, V1.1.1 (2014-12). It should be noted that the arrangement could also be configured in any other manner and is thus not limited to being implemented in a telecommunications scenario.
The arrangement may generally be configured to perform any of the methods and method steps presented herein. Moreover, the arrangement may be configured as an apparatus, a network node or a set of network nodes.
Also provided is a system comprising the arrangement presented herein and the application having the set of one or more virtual machines. The system may belong to a telecommunications cloud system. The telecommunications cloud system may further comprise at least one of a Radio Base Station (RBS) function, an Evolved Packet Core (EPC) function, an Internet Protocol Multimedia Subsystem (IMS) core function, and one or more other functions running on the set of virtual machines.
Further details, aspects and advantages of the present disclosure will become apparent from the following description of exemplary embodiments and the accompanying drawings, wherein:
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as specific network nodes, network configurations, communication protocols, and so on, in order to provide a thorough understanding of the present disclosure. It will be apparent to one skilled in the art that the present disclosure may be practiced in other embodiments that depart from these specific details. For example, while the following embodiments will partially be described in connection with exemplary cloud architectures and an exemplary ETSI recommendation, it will be appreciated that the present disclosure may also be practiced in connection with other cloud architectures and other cloud management and orchestration approaches. It will also be appreciated that the present disclosure is not limited to be applied in connection with telecommunications systems. Rather, the present disclosure could, for example, also be implemented in connection with online sales or other enterprise applications.
Those skilled in the art will further appreciate that the steps, services and functions explained herein below may be implemented using individual hardware circuitry, using software functioning in conjunction with a programmed micro-processor or general purpose computer, using one or more Application Specific Integrated Circuits (ASICs) and/or using one or more Digital Signal Processors (DSPs). It will also be appreciated that when the present disclosure is described in terms of a method, it may also be embodied in one or more processors and one or more memories coupled to the one or more processors, wherein the one or more memories are encoded with one or more programs that perform the steps, services and functions disclosed herein when executed by the one or more processors.
The functional layer 110 contains the functions (Network Functions (NF) and Dedicated Functions (DF)) performed by the 5G network system including tasks like mobility, security, routing, baseband processing, etc. Many but not necessarily all of these NFs will be performed by software running on virtualized hardware. Some of these NFs running on virtualized hardware will utilize Application Program Interfaces (API) provided by an execution environment to be able to control functionalities executed in hardware such as Software Defined Network (SDN) switches, hardware acceleration and so on.
Since at least some of these NFs are virtualized (VNFs), they are not tied to a specific hardware node. That is, they can be executed in different places within the network system depending on the given deployment scenario and requirements. This approach makes it possible, for instance, to flexibly distribute gateway functionalities closer to radio access nodes 130 when needed for particular services, while supporting more centralized gateways for other services. In theory this also makes it possible to dynamically re-configure the network system based on ongoing services or load. However, in the 2020 time frame it is still expected that time critical functions such as baseband processing, today performed by dedicated hardware in the access nodes 130 (implementing DFs), will in most cases continue to be performed by such dedicated hardware.
The infrastructure (hardware) layer 120 of the cloud architecture 100 contains radio nodes including user terminals (also called User Equipment, UEs), relay nodes (including wireless MTC-gateways or self-backhauled nodes) and one or more RANs 140 with the access nodes 130. In
The cloud model underlying cloud architectures, such as the architecture 100 shown in
The hardware layer typically refers to the data center(s) 160 and other core infrastructure nodes 150 (see
Finally, at the application layer, there are installed generally one or more service provider applications providing, in the present embodiment, telecommunications services and, in more general realizations, business applications, web services, multimedia and gaming services. All of these qualify as software-as-a-service (SaaS) in the cloud paradigm terminology. The scaling approach presented herein may be performed in relation to an application residing on the application layer.
Due to the high availability requirements in the cloud architecture 100 of
For example, during peak hours extra capacities are required by critical applications supporting, for example, the RAN 140. On the other hand, during off-peak hours the capacity requirement may drastically relax. To cope with these and other load fluctuations in the network system of
As shown in
An exemplary mode of operation of the triggering arrangement 20 illustrated in
In an (optional) initial step 302, the at least one processor 22 of the triggering arrangement 20, or any other entity, determines that a scaling operation is required for the application with the one or more VMs. That determination may be based on at least one first performance measurement result obtained for the application. The first performance measurement result may have been received by the triggering arrangement 20 via the one or more interfaces 24. Step 302 is performed during runtime of the application.
The determining step 302 may include subjecting the first performance measurement result to one or more threshold decisions. Specifically, a lower threshold and an upper threshold may be defined (e.g., in the memory 26). It may thus be determined in step 302 that a scaling operation is required if the first performance measurement result exceeds the upper threshold or falls below the lower threshold. Alternatively, the determination in step 302 whether a scaling operation is required could also be based on a deviation of the first performance measurement result from a predefined operating target for a performance indicator. Details regarding the operating target will be described below.
Responsive to the determination in step 302 that a scaling operation is required, the at least one processor 22 calculates a scaling magnitude for the required scaling operation in step 304. The calculation in step 304 takes into account at least one second performance measurement result for the application. The second performance measurement result may be identical to or different from the first performance measurement result processed in step 302. The second performance measurement result may also have been received via the one or more interfaces 24.
The scaling magnitude calculated in step 304 is indicative of a resource quantity to be added to or removed from the application. As an example, the resource quantity may be indicative of a number of VMs to be added to or removed from the application. As a further example, the resource quantity may be indicative of an amount of infrastructure resources (e.g., in terms of one or more of computing, storage and networking resources) to be added to or removed from one or more of the VMs in the set.
In an optional step not depicted in
In a further step 306, generation of a scaling request is triggered by the triggering arrangement 20. The scaling request generated in response to the triggering operation is directed at a scaling of the application on the basis of the calculated scaling magnitude. Step 306 encompasses the case where the calculated scaling magnitude is adjusted responsive to its verification (as the adjusted scaling magnitude will still be based on the scaling magnitude calculated in step 304).
Responsive to the triggering step 306, the scaling request will either be generated locally within the triggering arrangement 20 or by any other entity. As such, the triggering arrangement 20 may send via the one or more interfaces 24 either a triggering event for the scaling request or the scaling request as such to another entity in charge of actually scaling the application, such as a cloud management entity.
The cloud management entity 40 will receive the scaling request or the triggering event for generation of a scaling request either directly from the triggering arrangement 20 or from an entity located between the triggering arrangement 20 and the cloud management entity 40, see
Responsive to receipt of the scaling request or the triggering event, the cloud management entity 40 adds or removes one or more VMs 46 from the application 42 depending on the indicated scaling magnitude. Alternatively, or in addition, the cloud management entity 40 adds or removes infrastructure resources to or from one or more of the VMs 46 dependent on the indicated scaling magnitude.
A VM 46 may generally be constituted by a (virtualized) computing resource. Thus, creation or generation of a VM 46 may refer to deployment or allocation of the associated computing resource. To each computing resource, networking resources and storage resources can be added (e.g., associated, allocated or connected) on demand. Different technologies exist to allocate computing resources and expose them as VMs 46. Such technologies include a hypervisor as hardware abstraction layer, containers (e.g., Linux containers), PaaS frameworks, and so-called bare metal virtualization. In the ETSI Framework, the term VNF is used to designate a virtualized application 42. A deployed VNF typically consists of multiple instances of one or more (typically different) VM types, where each VM type runs its own, dedicated function.
In certain variants, the calculation step 304 in
The operating target and related parameters may be specified in different ways and may be stored in the memory 26 for being accessed by the one or more processors 22 of the triggering arrangement 20 (see
The scaling magnitude may be calculated based on a present or expected relationship between the respective performance indicator and the respective operating target. As an example, the present relationship may simply be determined by analyzing a deviation of the (current) second performance measurement result from the operating target. An expected relationship may be determined by extrapolating a number of (previous) second performance measurement results into the future and by analyzing a deviation of the extrapolated value from the operating target.
In certain variants, a scaling factor may be determined from the present or expected relationship between the performance indicator and the operating target. In such a case the scaling magnitude may be calculated from the scaling factor and a resource quantity presently allocated to the application (e.g., using a multiplication operation).
For a particular performance indicator, the scaling factor may be determined from the associated operating target (e.g., as stored in the memory 26 as part of a predefined scaling policy configuration). Alternatively, or in addition, the scaling factor may be determined from the present or expected relationship between the performance indicator and the operating target.
In an exemplary implementation compliant, for example, with the ETSI framework, the expected relationship (e.g., deviation) is used for scale-up and scale-out operations. Extrapolation in connection with scale-down and scale-in may be used only if a reaction time for the scale-down/scale-in operation is very slow and the point of extrapolation is not further than the time to complete the associated scaling action. By specifying the operating target for each KPI of an application 42 for each of a scale-out, scale-in, scale-up and scale-down operation individually, an operator can configure the desired behavior of the application 42 concerning the desired load and resource utilization.
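A per-KPI and per-operation configuration of operating targets could, for example, be represented as sketched below. The data layout, the KPI name `system_load` and all numeric values are hypothetical and chosen for illustration only. They are not taken from the ETSI framework.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class OperatingTarget:
    """Operating range for one KPI; lower == upper models an operating point."""
    lower: float
    upper: float

    def deviation(self, value: float) -> float:
        """Signed distance of value from the range (0.0 when inside the range)."""
        if value > self.upper:
            return value - self.upper
        if value < self.lower:
            return value - self.lower
        return 0.0


# Hypothetical policy: one operating target per (KPI, scaling operation) pair.
POLICY: Dict[Tuple[str, str], OperatingTarget] = {
    ("system_load", "scale-out"): OperatingTarget(0.7, 0.7),
    ("system_load", "scale-in"): OperatingTarget(0.5, 0.6),
}


def target_for(kpi: str, operation: str) -> OperatingTarget:
    """Look up the operating target configured for a KPI and a scaling operation."""
    return POLICY[(kpi, operation)]
```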
In certain scenarios, a dedicated operating target for each performance indicator (e.g., KPI) of interest and for each configured scaling operation (e.g., one or more of scale-out, scale-in, scale-up and scale-down) may be provided. The one or more operating targets (or parameters suitable to derive the one or more operating targets) may be stored in the memory 26 for use by the one or more processors 22 of the triggering arrangement 20 (see
The runtime data in terms of the (first and second) performance measurement results may continuously be received at runtime of the application 42. Further, the calculation of the scaling magnitude in step 304 may also take into account runtime information on the number of active VMs 46 in the application 42 and the actual amount of allocated infrastructure resources per application 42 or VM 46 in the application 42.
In the following, the embodiments described above will exemplarily be put in the larger context of ETSI GS NFV-MAN 001, V1.1.1 (2014-12). It will be appreciated that the following details of exemplary scaling approaches could likewise be applied in connection with other cloud management and orchestration approaches.
In step 1, the VNFM is continuously informed during runtime of the VNF (e.g., in the form of the application 42 illustrated in
The VNFM then generates a scaling request comprising the calculated scaling magnitude and sends the scaling request in step 3 to the NFVO for VNF expansion using the operation Grant Lifecycle Operation of the VNF Lifecycle Operation Granting interface. In step 4, the NFVO takes a scaling decision and checks the scaling magnitude in the resource request received from the VNFM against its capacity database for free resource availability. The remaining steps 5 to 15 in the signaling diagram of
Initially, in step 602, runtime data are received. The runtime data include performance measurement results for multiple KPIs.
The measurement results received in step 602 have been previously aggregated as generally illustrated in
Then, in step 604, the individual (aggregated) measurement results for the various KPIs are individually subjected to a threshold decision to determine the requirement of a scaling operation (in accordance with step 302 of
In case it is determined in step 604 that for at least one KPI the associated upper threshold or lower threshold is passed, the method proceeds to step 606. In step 606 a scaling factor is calculated. There exist various algorithmic options for calculating the scaling factor.
One exemplary algorithm assumes a linear functional relationship between the performance measurement results and the number of VMs 46 or allocated infrastructure resources of the VMs 46 within the application 42. For each KPI type value KPIi passing an associated threshold value (as determined in step 604), a KPI type specific scaling factor SFi can be calculated as follows:
SFi=(KPIi−OPi)/OPi
wherein OPi denotes the operating point value for this KPI type. In a simple form, the scaling factor SF is set to the maximum of the individual scaling factors SFi in case of an exemplary scale-out operation. That is
SF=max(SFi)
In its simple form, the scaling factor SF is set to the minimum of the individual scaling factors SFi in case of a scale-in operation. In a similar manner, scale-up and scale-down operations are handled. In general, different weights could be given to the different KPIs. The scaling factor SF may then be impacted by these weights.
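The combination of KPI type specific scaling factors may be sketched as follows. The sketch assumes that each individual factor takes the form (KPI value − operating point) / operating point, which is the form used in the worked scale-out example further below. The function names and the KPI names used in the usage note are hypothetical, and the optional weighting of KPIs is not modeled.

```python
from typing import Mapping


def kpi_scaling_factor(kpi_value: float, operating_point: float) -> float:
    """Relative deviation of one KPI value from its operating point (assumed form)."""
    if operating_point <= 0:
        raise ValueError("operating point must be positive")
    return (kpi_value - operating_point) / operating_point


def combined_scaling_factor(kpi_values: Mapping[str, float],
                            operating_points: Mapping[str, float],
                            scale_out: bool) -> float:
    """Combine per-KPI factors: maximum for scale-out, minimum for scale-in."""
    factors = [kpi_scaling_factor(value, operating_points[name])
               for name, value in kpi_values.items()]
    if not factors:
        raise ValueError("at least one KPI value is required")
    return max(factors) if scale_out else min(factors)
```

For instance, with hypothetical KPIs `load` = 1.0 (operating point 0.7) and `cpu` = 0.9 (operating point 0.6), the individual factors are approximately 0.43 and 0.5, so the scale-out factor in this sketch is 0.5.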
Of course, other algorithms could be used as well for calculating the scaling factor. Generally, there will often be a way to define one or more KPIs that scale linearly with the number of VMs 46 or the amount of allocated resources. Another way of defining the correlation of the performance measurement results and the number of VMs 46 or allocated infrastructure resources comprises running the application 42 with different numbers of VMs or different amounts of allocated infrastructure resources and different loads, and capturing the correlation between the performance measurement results and the required number of VMs 46 or required amount of infrastructure resources. These processes are performed prior to runtime of the application 42, for example, during an installation phase of the system. As a result, a functional relationship or a mapping between the performance measurement results and the required resource quantity to be added or to be removed can be determined.
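A mapping determined prior to runtime could, as one hypothetical realization, be stored as a calibration table and consulted at runtime as sketched below. The table layout, the function names `required_vms` and `vm_delta`, and the numbers in the usage note are assumptions made for illustration.

```python
from bisect import bisect_left
from typing import Sequence, Tuple


def required_vms(load: float, calibration: Sequence[Tuple[float, int]]) -> int:
    """Look up the VM count needed for a measured load.

    calibration holds (highest load handled, number of VMs) pairs captured
    before runtime and must be sorted by ascending load (assumed layout).
    """
    if not calibration:
        raise ValueError("calibration table must not be empty")
    handled_loads = [handled for handled, _ in calibration]
    idx = bisect_left(handled_loads, load)
    if idx == len(calibration):
        raise ValueError("load exceeds the calibrated range")
    return calibration[idx][1]


def vm_delta(load: float, current_vms: int,
             calibration: Sequence[Tuple[float, int]]) -> int:
    """Resource quantity to add (positive) or remove (negative)."""
    return required_vms(load, calibration) - current_vms
```

With a hypothetical table `[(100.0, 1), (200.0, 2), (400.0, 4)]`, a measured load of 150 maps to two virtual machines, so an application currently running one virtual machine would be scaled out by one.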
From step 606, the process illustrated in
In detail, the scaling factor determined in step 606 is multiplied with the number of active VMs 46 or the amount of allocated infrastructure resources. The result of this multiplication is rounded up or down to an integer value so as to obtain the scaling magnitude. The scaling magnitude will thus be indicative of the particular resource quantity to be added to or removed from the application.
In order to avoid exceeding or falling below a configured maximum or minimum size of the application 42 in terms of the number of VMs 46 or amount of infrastructure resources, the scaling magnitude calculated in step 608 can be verified (not shown in
Then, in step 610, the scaling request is generated that includes the calculated and, potentially, adjusted scaling magnitude. The processing of the scaling request may then be performed as generally illustrated in
In the following, an actual example of a scale-out operation will be described with reference to the exemplary diagram shown in
For the purpose of the following example, a “VM load KPI” is defined as the arrival rate of requests in an individual VM 46 divided by the maximum number of requests the individual VM 46 can handle:
LoadVM=(arrival rate of requests)/(maximum number of requests)
The arrival rate will be measured over a configurable time interval. The maximum number of requests one VM 46 can handle is supposed to be a known value (e.g., a predefined number).
To determine the system performance of the set of VMs 46 defining a particular application 42 (see
Load_sys=(Σk=0num
where LoadVM_k is the load of the k'th VM 46.
In the example illustrated in
Let us suppose that we have a system with three VMs 46 (i.e., n=3 in
The KPI aggregator 70, based on these three values, calculates the system load KPI value as defined above to equal:
Load_sys=1.
It is this value that enters as measurement result step 602 in
SF=(1−0.7)/0.7=0.43
The number of VMs 46 to be added to the application 42 (i.e., the scaling magnitude) can then be calculated in step 608 by multiplying that scaling factor SF with the current number of VMs 46 utilized by the application 42:
Ceil(SF*current_number_VM)=Ceil(0.43*3)=2
This means that the scaling request sent in step 610 will indicate that two VMs 46 have to be newly added to the application 42.
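The arithmetic of this scale-out example can be retraced with the following short sketch. The value 0.7 is read off the scaling factor formula above and is assumed here to be the operating point. The function names are hypothetical.

```python
import math


def scaling_factor(system_load: float, operating_point: float) -> float:
    """Relative deviation of the system load from the operating point."""
    return (system_load - operating_point) / operating_point


def scale_out_magnitude(system_load: float, operating_point: float,
                        current_vms: int) -> int:
    """Number of VMs to add: scaling factor times current VM count, rounded up."""
    return math.ceil(scaling_factor(system_load, operating_point) * current_vms)


# Values from the example: Load_sys = 1, operating point 0.7, three VMs.
print(round(scaling_factor(1.0, 0.7), 2))  # prints 0.43
print(scale_out_magnitude(1.0, 0.7, 3))    # prints 2
```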
As has become apparent from the above description of exemplary embodiments, the behavior of applications with one or more VMs can be controlled more deterministically in terms of load and resource utilization. In certain variants, an operator is able to specify desired operating targets for the application. Moreover, the capacity of an application can be adapted faster to the current load, and will also converge faster to a desired operating target.
The present disclosure may, of course, be carried out in other ways than those specifically set forth herein without departing from the scope of the claims appended hereto. Thus, the present embodiments are to be considered in all respects as illustrative and not restrictive, and all changes coming within the scope of the appended claims are intended to be embraced therein.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2015/057344 | 4/2/2015 | WO | 00 |