The present application is based on, and claims priority from, Taiwan Patent Application No. 103114547 filed on Apr. 22, 2014, the disclosure of which is hereby incorporated by reference herein in its entirety.
The technical field generally relates to a method and system for dynamic instance deployment of public cloud.
Webcast services have mushroomed in recent years. Users may watch live videos, such as online games, entertainment, news, sports programs, and technology, via the Internet. With the popularity of online streaming, these streaming services require more and more bandwidth to operate. A peer-to-peer (P2P) network may use a mutual data sharing approach among peers to increase the efficiency of streaming transmission. In a P2P network, many factors affect the quality of video, such as users leaving and joining, low computational power of user equipment, insufficient bandwidth of user equipment, and the distance between the video source and the user equipment. To overcome this variance, an architecture combining relaying servers and the P2P network is a good way to maintain the viewing quality for users.
With the popularity of mobile devices such as hand-held video camera devices, any user can become a streaming source. Both streamers and viewers can start from anywhere at any time. With this trend, the workload of a server increases rapidly, so a streaming service company may work with a public cloud provider to build a distributed server group within the public cloud, and initiate a variable number of relaying servers to meet flexible demands. For example, the streaming service company may pre-analyze the maximum number of simultaneous on-line users, and pre-establish sufficient virtual machines (VMs) from the public cloud.
Even if estimating the number and the behavior of users is achievable, a large number of standby servers are still needed to deliver the same viewing quality at peak time. Fearing quality degradation, the streaming service company still cannot rashly turn off idle servers during off-peak time. In many live broadcasting events, idle servers with a low connection number are always found, and the money wasted on idle servers keeps growing. Therefore, how to find an automatic way to minimize costs while maintaining satisfying viewing quality has become an important issue.
Auto-scaling may be done by vertical scaling and horizontal scaling. Vertical scaling modifies hardware resources, such as increasing the central processing unit (CPU) and/or memory and/or bandwidth, while the number of servers remains unchanged. Horizontal scaling increases or decreases the number of servers, while the hardware specification of the servers remains unchanged. Horizontal scaling is usually done by templates, server images, snapshots, or command-line scripts predefined by the public cloud provider, and establishes many virtual machines of the same specification. At present, some cloud providers may require the tenant to preset some servers as an auto-scaling group in advance, wherein only servers within the group have the auto-scaling function. Some cloud providers may provide the tenant the ability to conduct benchmarking for different server instance types. One implementation may measure the service completion time to find out which server instance type has the best performance cost ratio, and then perform the auto-scaling by setting a policy, which may be threshold-triggered or time-triggered.
The existing dynamic server scaling technologies may be divided into two categories. One category is that public cloud providers provide a reactive instance allocation mechanism at infrastructure level to serve a large number of tenants. Such techniques measure the current memory usage or network usage of servers, and provide a variety of metrics for tenants to choose from. Auto-scaling is based on a threshold value. The threshold value may be set by users (public cloud tenants), or by using default best practices. A load balancer adjusts the workload of the servers belonging to the scaling group. The other category is based on the application characteristics of each tenant itself to determine a service pressure at application level, and sets business logic through an application programming interface (API) of the public cloud providers. This category of technologies is mostly proactive and may predict future workloads. The reference metrics for these technologies may be a number of queued data, an average response time of the data, a number of network connections, and so on.
There is a technology that provides tightly integrated automatic management including inter-cloud automation management, which allows users to set various templates, macros, scripts, etc.; performance metrics may be arranged into an array, and the scaling logic is determined by the tenant itself. There is a technology that provides a two-dimensional matrix of these metrics to train an active artificial neural network, which determines whether auto-scaling should take action or not. There is a technology that considers navigation routes when accessing a website, finds the route with the heaviest pressure, and performs auto-scaling on the related servers of that route. There is a technology that provides a two-tier application service solution and observes the reaction effectiveness of the first layer through a linkage system, to decide whether the second layer should scale up. There is a technique that controls a load balancer to arrange and dispatch workload to other servers based on the overall flow state of the current virtual machines (VMs). Some technologies suggest turning off the VMs according to a billing cycle.
There is a technology that seeks a best balance between a penalty fee and a saving cost by allowing the service level agreements (SLAs) with tenants to be broken. This technology may be used by multi-tier applications. The scaling method is based on predicting the application capacity and considering the cost model and the resource model. All requests go through a service gateway or a load balancer. Most virtual machines (VMs) have the same general resource allocation, while part of these virtual machines has a lower resource allocation. When the application capacity needs to scale up, the virtual machines of the lower resource allocation are vertically scaled up to the general resource allocation. When the application capacity needs to scale down, a vertical or horizontal scaling is performed to scale down one or more virtual machines to the lower resource allocation.
In the existing dynamic server scaling technologies, some technologies do not estimate the impact on the service provider (the tenant) after turning off the server(s). Some technologies only turn off a machine selected from a group of machines according to the status of a previous server. Some technologies cannot completely control which server should take the workload even with a load balancer. Some technologies do not fully utilize characteristics of the public cloud for cost saving, such as different pricing among data centers, the least billing cycle of the public cloud where an hourly fee is still charged for less than one hour, the combination of multiple public cloud providers, and so on. Therefore, finding an automatic way to minimize costs while maintaining satisfying viewing quality is a worthy topic.
The embodiments of the present disclosure may provide a method and system for dynamic instance deployment of public cloud.
An exemplary embodiment relates to a method for dynamic instance deployment of public cloud. The method may comprise: obtaining, by a load monitor, a current server deployment, the current server deployment at least including, for each server of a plurality of servers, an identity information of said server, a number of current connections of said server, a server instance type of said server, and a located area of said server; determining, by a scaling engine, whether at least one server of the plurality of servers satisfies at least one trigger condition; adding, by the scaling engine, the at least one server that satisfies the at least one trigger condition into a server candidate set; and receiving, by the scaling engine, an information of a performance cost ratio, and performing, by the scaling engine, a server scaling procedure for at least one area according to the server candidate set.
Another embodiment relates to a system for dynamic instance deployment of public cloud. The system may comprise a load monitor and a scaling engine. The load monitor obtains a current server deployment, wherein the current server deployment at least includes, for each server of a plurality of servers, an identity information of said server, a number of current connections of said server, a server instance type of said server, and a located area of said server. The scaling engine determines whether at least one server of the plurality of servers satisfies at least one trigger condition, adds the at least one server that satisfies the at least one trigger condition into a server candidate set, receives an information of a performance cost ratio, and performs a server scaling procedure for at least one area according to the server candidate set.
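The per-server information enumerated above can be pictured as a simple record type. The following Python sketch is purely illustrative; the class name, field names, and connection counts are assumptions of this illustration and not part of the claimed method (the instance IDs reused here appear in a later example of this disclosure).

```python
from dataclasses import dataclass

# A minimal sketch of the per-server record the load monitor collects.
@dataclass
class ServerRecord:
    instance_id: str          # identity information of the server
    current_connections: int  # number of current connections
    instance_type: str        # server instance type, e.g. "S", "L", "XL"
    area: str                 # located area of the server

# A current server deployment is then simply a collection of such records.
deployment = [
    ServerRecord("i-PSRHEDNF", 120, "XL", "area-1"),  # illustrative counts
    ServerRecord("i-KGMUCWEE", 35, "S", "area-1"),
]
```

In this view, the load monitor's job is only to refresh such a collection, while all scaling decisions are left to the scaling engine.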
The foregoing will become better understood from a careful reading of a detailed description provided herein below with appropriate reference to the accompanying drawings.
Below, exemplary embodiments will be described in detail with reference to accompanying drawings so as to be easily realized by a person having ordinary knowledge in the art. The inventive concept may be embodied in various forms without being limited to the exemplary embodiments set forth herein. Descriptions of well-known parts are omitted for clarity, and like reference numerals refer to like elements throughout.
According to the exemplary embodiments in the disclosure, a method and system for dynamic instance deployment of public cloud are provided. The technology collects the deployment state of all servers currently in one or more public clouds, and performs efficiency measurement for the services of the tenants (who lease servers from public cloud providers) on the one or more public clouds, so as to understand, for each server instance type of various server instance types, such information as a number of connections and a located area of each server, wherein a public cloud has at least one server.
The tenant may calculate the performance cost ratio of each server instance type according to the numbers of connections of these servers. The tenant may set at least one trigger condition according to a service request. According to an exemplary embodiment of the disclosure, a server that satisfies one of the at least one trigger condition may be added into a server candidate set. When a situation that satisfies the trigger condition occurs, a server scaling procedure is performed for at least one area according to the inputted information of a performance cost ratio and the server candidate set.
According to an exemplary embodiment of the present disclosure, the at least one trigger condition may be set as one or more combinations of trigger conditions, which may be described as follows: triggering when one or more operation statuses of a server reach a threshold value; or triggering at one or more o'clock sharps; or triggering when a server is going to finish a billing cycle within a time interval; or triggering periodically with a fixed time interval. For example, the at least one trigger condition may be set to trigger when an idle rate or a resource utilization rate of the CPU, the memory, or the bandwidth of a server reaches a threshold value; or to trigger at 2 o'clock sharp, 3 o'clock sharp, 5 o'clock sharp, or 12 o'clock sharp, and so on, but not limited to triggering at every o'clock sharp; or to trigger on every Wednesday; or to trigger when a server is going to finish a billing cycle; or to trigger every minute. The idle rate is generally defined as one minus the resource utilization rate.
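A combination of such trigger conditions can be sketched as a single predicate. The thresholds, the 60-minute billing cycle, and the "within a time interval" margin below are illustrative assumptions, not values fixed by this disclosure.

```python
import datetime

CPU_IDLE_UPPER = 0.80    # example upper idle-rate threshold
CPU_IDLE_LOWER = 0.20    # example lower idle-rate threshold
BILLING_CYCLE_MIN = 60   # assumed hourly billing cycle
NEAR_CYCLE_END_MIN = 5   # assumed "within a time interval" margin

def should_trigger(cpu_utilization, minutes_into_cycle, now=None):
    """Return True when any of the example trigger conditions holds."""
    idle_rate = 1.0 - cpu_utilization  # idle rate = 1 - resource utilization rate
    now = now or datetime.datetime.now()
    return (
        idle_rate >= CPU_IDLE_UPPER                                      # status reaches a threshold
        or idle_rate <= CPU_IDLE_LOWER
        or now.minute == 0                                               # o'clock sharp
        or BILLING_CYCLE_MIN - minutes_into_cycle <= NEAR_CYCLE_END_MIN  # near end of billing cycle
    )
```

In practice a tenant would combine only the conditions relevant to its service; the predicate above simply ORs several of the examples together.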
According to an exemplary embodiment, the performance cost ratio is defined based on the averaged unit price required for each connection.
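A minimal sketch of this definition follows: the unit price of each connection for a server instance type is its price divided by the maximum number of connections it supports, and a lower unit price corresponds to a higher performance cost ratio. The hourly prices below are hypothetical; the 800-connection XL capacity is the figure used in the assignment example later in this disclosure.

```python
def unit_price_per_connection(hourly_price, max_connections):
    """Averaged unit price required for each connection of an instance type."""
    return hourly_price / max_connections

# Hypothetical prices: an XL instance at $0.48/hr supporting 800 connections,
# and an S instance at $0.06/hr supporting 80 connections.
xl_unit = unit_price_per_connection(0.48, 800)
s_unit = unit_price_per_connection(0.06, 80)
# Here the XL type has the lower unit price per connection, i.e. the higher
# performance cost ratio.
```

Under these assumed prices, assigning connections to XL servers first would be the cheaper choice.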
As aforementioned, when there is at least one server that satisfies the at least one trigger condition, a server scaling procedure of an area may be performed based on the inputted information of the performance cost ratio and the server candidate set. Examples of performing a server scaling up include adding a server with a high performance cost ratio in an area, or adding a server of the smallest instance type, or adding a server of the largest instance type, or adding a server of the largest instance type with a maximum number of connections, and then waiting for a next trigger condition. Examples of performing a server scaling down include turning off a server with a lower resource utilization rate, or turning off a server with a low performance cost ratio, thereby causing users to reconnect to other servers with a high performance cost ratio.
When the number of users gradually decreases with the lapse of time, the number of idle servers increases. According to an exemplary embodiment of the present disclosure, servers with low performance cost ratios may be turned off, thereby allowing users to reconnect to other servers with high performance cost ratios to save money. The trigger timing of the server scaling procedure may be, for example, triggering when an idle rate of the CPU, memory, or bandwidth, etc., reaches a threshold value (for example, taking CPU idle rates of 80% and 20%, respectively, as upper and lower thresholds), or triggering at o'clock sharp, or triggering when any server is going to finish a billing cycle, or triggering per minute. The triggering may add all current servers into the server candidate set, or add the server that is going to finish the billing cycle into the server candidate set.
The term “area” in the present disclosure may be such as the area divided by the geographical location, or the area divided by the round trip time (RTT) of a packet between a user equipment and a server.
According to an exemplary embodiment of the present disclosure, the information of the performance cost ratio at least includes information of the unit price of each connection corresponding to each server instance type in each area of at least one area, and information of the maximum number of connections corresponding to each server instance type in each area of the at least one area.
According to an exemplary embodiment, the server scaling procedure may be divided into two stages, wherein a first stage is intra-area server scaling, and a second stage is inter-area server scaling down. In other words, when there is a server that satisfies at least one trigger condition, an intra-area server scaling is performed for each area of the at least one area, and then an inter-area server scaling down is performed. According to the exemplary embodiments of the present disclosure, in the two-stage server scaling procedure, under the premise of not causing any inter-area connection, the first stage first minimizes the operating cost of the servers within each area of the at least one area, so that most users may be reconnected to servers in the same area, while the server scaling procedure of the second stage may cause a small portion of users to reconnect to servers in other areas. Thereby the server scaling procedure may achieve a balance between saving the server cost and maintaining the user quality (in terms of reducing inter-area connections).
A target number of servers corresponding to a server instance type=The unassigned number of connections/the maximum number of connections corresponding to the server instance type.
The unassigned number of connections is updated as follows.
The unassigned number of connections = the unassigned number of connections Mod the maximum number of connections corresponding to the server instance type, wherein Mod is a modulo operation.
In step 720, there are many implementation schemes for assigning the target number of servers corresponding to each server instance type in the area. According to an exemplary embodiment, one scheme may orderly assign the target number of servers corresponding to each server instance type in the area, from the lowest unit price to the highest unit price corresponding to each connection of a plurality of server instance types in the area. Assume that a server that is in the area and is going to finish a billing cycle (60 minutes) within t minutes is added to a server candidate set, or that all servers in the area are added to the server candidate set (i.e., t=60). Then a server scaling procedure for the area may operate as follows: aggregating the numbers of connections of all servers in the server candidate set as an unassigned number of connections, and orderly assigning the unassigned number of connections to the server instance type with the highest performance cost ratio (each connection corresponding to the server instance type has the lowest unit price). For example, if a server of the XL instance type has the highest performance cost ratio and is assumed to be able to support up to 800 connections, [the unassigned number of connections/800] servers of the XL instance type are assigned first. After the assignment, the unassigned number of connections is updated to [the unassigned number of connections Mod 800]. When the updated unassigned number of connections has not yet come to zero, this process continues to assign the unassigned number of connections to a next server instance type, until the unassigned number of connections becomes zero. If the unassigned number of connections is less than the maximum number of connections corresponding to the server instance type, then the target number of servers of the server instance type is increased by 1.
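One reading of the assignment scheme above can be sketched as the following greedy loop. The instance-type capacities other than the 800-connection XL figure, and the interpretation that the round-up by 1 happens at the first type whose capacity exceeds the remainder, are assumptions of this sketch.

```python
def plan_target_deployment(unassigned, instance_types):
    """instance_types: (name, max_connections) pairs, already sorted from the
    lowest to the highest unit price per connection (best ratio first)."""
    target = {name: 0 for name, _ in instance_types}
    for name, max_conn in instance_types:
        if unassigned == 0:
            break
        if unassigned < max_conn:
            # remainder is below this type's capacity: add 1 server of this type
            target[name] += 1
            unassigned = 0
        else:
            target[name] += unassigned // max_conn  # target number of servers
            unassigned %= max_conn                  # Mod operation from the formula
    return target

# 1,850 pooled connections; XL holds 800, L holds 400, S holds 80 (L and S
# capacities are hypothetical):
plan = plan_target_deployment(1850, [("XL", 800), ("L", 400), ("S", 80)])
# → {"XL": 2, "L": 1, "S": 0}: two XL servers absorb 1,600 connections, and
#   the 250-connection remainder is rounded up to one L server.
```

A tenant could equally round the remainder down to smaller instance types; the sketch only fixes one of the many schemes the embodiment permits.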
A tenant actively wanting to save cost may adjust the formula by abandoning the unassigned number of connections and using the target number of servers of the server instance type instead. There are many schemes to implement this fine-tuning, which is not contrary to the spirit of starting the assignment from server(s) of a high performance cost ratio. At this point a target deployment of an area has been completed (the target deployment also includes the number of servers corresponding to each server instance type in the area). Performing an adjustment according to the number difference between the target deployment and the current number of servers in the area may increase or decrease the servers of various instance types. When increasing at least one server is needed, the scaling engine 420 may directly increase the at least one server. When turning off at least one server is needed, the scaling engine 420 may use, but is not limited to, a minimum edit distance (Levenshtein distance) as a principle for performing the adjustment of the number of servers, based on the number of current connections of each server. For example, if one of two servers of the same XL instance type needs to be turned off, the server currently with the fewer number of connections is chosen.
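The adjustment step above can be sketched as follows: compare the current deployment with the target counts per instance type, and when servers of a type must be turned off, pick those with the fewest current connections. The i-AAAA… instance IDs and all connection counts are hypothetical; only i-PSRHEDNF comes from the example in this disclosure.

```python
def choose_servers_to_turn_off(current, target_counts):
    """current: dict mapping instance type to a list of
    (instance_id, current_connections) pairs."""
    to_turn_off = []
    for itype, servers in current.items():
        surplus = len(servers) - target_counts.get(itype, 0)
        if surplus > 0:
            # turn off the surplus servers with the fewest current connections
            ranked = sorted(servers, key=lambda s: s[1])
            to_turn_off += [sid for sid, _ in ranked[:surplus]]
    return to_turn_off

current = {"XL": [("i-PSRHEDNF", 40), ("i-AAAA0001", 700), ("i-AAAA0002", 650)]}
choose_servers_to_turn_off(current, {"XL": 2})  # → ["i-PSRHEDNF"]
```

Choosing the server with the fewest connections minimizes the number of users forced to reconnect, which is the intent behind the minimum-edit-distance principle.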
According to the aforementioned exemplary embodiments, a server of instance type XL, a server of instance type L, and a server of instance type S should be turned off, according to the number differences between the target deployment and the current number of servers in the area. When turning off a server, the server of the same instance type with a minimum edit distance may be considered. For example, three servers of the same instance type XL are currently available for selection. Accordingly, the server of instance type XL having the lowest number of current connections may be chosen to be turned off. Thereby, the server of instance type XL whose instance ID is i-PSRHEDNF (the server of instance type XL having the lowest number of current connections), the server of instance type L whose instance ID is i-PHAQQQYT, and the server of instance type S whose instance ID is i-KGMUCWEE (the server of instance type S having the lowest number of current connections) are turned off, as shown in the corresponding figure.
According to an exemplary embodiment, the scaling procedure in the second stage of inter-area server scaling down is performed based on the idle rates or the resource utilization rates of all servers in the server candidate set 422. For example, the scaling down may be based on the idle rates of these servers (from the highest to the lowest idle rate) or based on the resource utilization rates of these servers (from the lowest to the highest resource utilization rate). One calculation method for the resource utilization rate of a server is the following exemplary formula:
The resource utilization rate=the ratio of the number of current connections of the server to the maximum number of connections corresponding to the server instance type of the server.
In other words, the inter-area server scaling down may determine whether to turn off a server according to a total of all maximum numbers of connections corresponding to all server instance types of all servers in the server candidate set, a total of numbers of current connections of all servers in the server candidate set, and the maximum number of connections corresponding to a server instance type of said server.
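A hedged sketch of this rule follows: a candidate server may be turned off only if the total capacity remaining in the candidate set, after removing that server's maximum number of connections, can still absorb the total of current connections. The absorption criterion itself is an assumption consistent with the three quantities named above, and the numbers are illustrative.

```python
def can_turn_off(server_max_connections, total_capacity, total_current):
    """total_capacity: sum of maximum connections over the candidate set.
    total_current: sum of current connections over the candidate set."""
    return total_capacity - server_max_connections >= total_current

# Candidate set with a total capacity of 1,280 connections and 700 current
# connections:
can_turn_off(400, 1280, 700)  # → True  (880 remaining seats >= 700)
can_turn_off(800, 1280, 700)  # → False (480 remaining seats < 700)
```

Iterating this check over candidates sorted by idle rate, highest first, matches the ordering described for the second stage.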
According to the technique for dynamic instance deployment of public cloud in the exemplary embodiment, inter-area connections may be generated after the inter-area scaling down in the second stage. If a tenant does not want to generate any inter-area connection, the scaling engine 420 may be set not to perform the inter-area server scaling down procedure, but this may yield a poorer result for cost saving.
In summary, according to the exemplary embodiments of the disclosure, a method and system for dynamic instance deployment of public cloud are provided. The technique for dynamic instance deployment of public cloud uses a load monitor to obtain a current server deployment running on the public cloud and provides it to a scaling engine. The scaling engine uses a trigger condition scheme to trigger a server scaling procedure, and dynamically adjusts the target number of servers for each server instance type, thereby reducing the operating cost of servers while maintaining the service quality of the tenant. This technique may run on a single public cloud, and may also run across a plurality of public clouds.
It will be apparent to those skilled in the art that various modifications and variations can be made to the disclosed embodiments. It is intended that the specification and examples be considered as exemplary only, with a true scope of the disclosure being indicated by the following claims and their equivalents.
Number | Date | Country | Kind |
---|---|---|---|
103114547 | Apr 2014 | TW | national |