The invention relates to load balancing in general and in particular to load balancing in a multiple server system yielding uniform response time for a particular service regardless of the server performing the service.
Conventional load balancing systems are tailored for a single service provider. However, in emerging multi-server systems located in massive data centers operated by a network provider, server resources are a commodity that can be bought, leased or rented by any service provider. While current load balancing systems achieve load balancing at the level of a single service, different services would most likely run at different load levels. A multi-server environment therefore requires a load balancing system capable of balancing the traffic destined to each service among the servers hosting that service. For example, it may be desirable to run web proxy, WAN acceleration, anti-virus scanning, IDS/IPS tools and firewalls within the data center. However, the data center may not have dedicated computing resources to exclusively support the maximum load for each of these services.
Various deficiencies of the prior art are addressed by the present embodiments, including a method and system for load balancing in a multi-server environment hosting multiple services. Specifically, the method according to one embodiment comprises: determining an induced aggregate load for each of the multiple services in accordance with corresponding load metrics; determining the maximum induced aggregate load on a corresponding server that generates a substantially similar QoS for each of the multiple services; and distributing the multiple services across the multiple servers in response to the determined induced aggregate and maximum induced aggregate loads, wherein the QoS for each of the multiple services is substantially uniform across the servers.
In another embodiment, a method comprises the steps of: determining the QoS for each of the multiple services running on a corresponding server; and transmitting a new request for service to the server with the best QoS for the corresponding service.
In yet another embodiment, a system has at least one load balancing server communicatively coupled to at least one server supporting multiple services. Each load balancing server is adapted to distribute the multiple services such that the QoS for each of the multiple services is substantially uniform across the one or more servers supporting that service. One or more networked servers are adapted to compute the respective induced aggregate load and maximum induced aggregate load for each of the multiple services they support.
The teachings of the present embodiments can be readily understood by considering the following detailed description in conjunction with the accompanying drawings, in which:
To facilitate understanding, identical reference numerals have been used, where possible, to designate identical elements that are common to the figures.
The next generation of hosted environments is modeled on the premise that a particular server can run more than one service, for example, as one virtual machine per service on a physical server. It is therefore desirable to support more services using the same set of servers, since it is unlikely that these services will all be overloading the servers at the same time. However, the response time of each service then depends on the aggregate load of the server, including the load generated by the other services it hosts.
Since current systems apply their load balancing metric to only one service, they cannot provide a uniform response time for each service when servers are shared among services. The present embodiments depart from the conventional paradigm and provide for a single server supporting multiple services, while simultaneously applying load balancing concepts to the aggregated services across multiple servers.
The services can be distributed such that all servers running a given service experience the same aggregate load. This mechanism exploits the multiplexing effect that can be achieved when multiple services share the same servers.
The present embodiments are primarily described within the context of load balancing in a multiple server system supporting multiple services; however, those skilled in the art and informed by the teachings herein will realize that the invention is also applicable to other technical areas and/or embodiments.
One embodiment allows for overlapping services on a single server. Other embodiments provide an array of servers wherein each server is adapted to host a different set of services such that the response time for a service is independent of the server supporting that service. In addition, overlapping services on a single server makes it possible to use multiplexing to support a large number of services on relatively few servers. This translates into capital expenditure (capex) and operational expenditure (opex) savings in the form of reduced infrastructure, lower management costs, lower power consumption, etc.
Existing solutions require that a server exclusively support only a single service. A load balancer that balances among multiple servers essentially interfaces with a disjoint set of servers for each service. Existing solutions are therefore ill suited to implementing multiple services on a single server while load balancing them effectively and simultaneously providing improved Quality of Service (QoS).
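The embodiments herein disclosed depart from the traditional QoS paradigm. Traditionally, QoS refers to the capability of a network to provide better service to selected network traffic over various technologies including Ethernet, Frame Relay, Asynchronous Transfer Mode (ATM), etc. The primary goal of traditional QoS is to provide priority, including dedicated bandwidth, controlled jitter and latency, and improved loss characteristics. Fundamentally, traditional QoS enables a system to provide better service to certain flows. In the present embodiments, by contrast, QoS refers to the response time experienced by a service, and the goal is to make that response time substantially uniform across all servers hosting the service rather than to favor particular flows.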
The load induced on a server or exerted by a certain service can be measured in the form of active connections, central processing unit (CPU) load, memory consumption, free memory, input/output (I/O) bandwidth consumption, network throughput, or any combination thereof. Each of the above metrics can either be expressed as an absolute number or as a percentage of the maximum possible value. It will be understood by an artisan of ordinary skill in the art that the present embodiments are not limited to these load metrics, but that other load metrics can be considered, e.g., geographic location, queue overflow, congestion, traffic shaping and policing.
The present embodiments provide at least the following advantages over the prior art.
The load metric of each server is sent to the load balancing system. In addition, the function f_i( ) for service s_i, which maps the aggregate load of a server to the response time of s_i on that server, is available at all servers running this service. If f_i( ) is not available at the load-balancing system, an alternative solution is described hereafter.
As expressed above, the load balancer needs to balance the traffic while ensuring that the load on the servers is nearly the same. To illustrate this concept, consider a set of n services S = {s_i}, i = 1, 2, ..., n. Let there be m servers, numbered from 1 to m. Let each service s_i run on a set of servers P_i ⊆ {1, 2, ..., m}. Let the load on server j due to service s_i be denoted by l(i,j). Current load balancing systems ensure that l(i,j) = l(i,k) for all j, k ∈ P_i. However, this is useful only if each server runs at most one service, that is, where

$$\sum_{i=1}^{n} l(i,j) = l(i',j) \quad \text{if } j \in P_{i'}.$$

The next generation of hosted environments is modeled on the premise that a particular server can run more than one service (for example, as one virtual machine per service on a physical server). This implies that the balancing condition must instead be applied to the aggregate load:

$$\sum_{i'=1}^{n} l(i',j) = \sum_{i'=1}^{n} l(i',k) \quad \text{for all } j, k \in P_i, \text{ for any service } s_i \in S.$$

In other words, for any particular service running in the multi-server environment, considering the servers running that service, the aggregate load on these servers from all the services that they are supporting should be the same.
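The difference between the conventional per-service condition and the aggregate condition can be illustrated with a small sketch. The data layout (a dict keyed by (service, server) for l(i,j) and a dict of sets for P_i) and the service names are assumptions for illustration only. The first placement splits each service evenly yet overloads the shared server; the second is uneven per service yet gives every server the same aggregate load.

```python
# Sketch (assumed data layout): l(i, j) as a dict keyed by (service, server),
# P_i as a dict mapping each service to the set of servers running it.
Loads = dict[tuple[str, int], float]
Placement = dict[str, set[int]]

def aggregate_load(l: Loads, placement: Placement, j: int) -> float:
    """Sum of l(i', j) over all services; missing entries count as 0."""
    return sum(l.get((svc, j), 0.0) for svc in placement)

def per_service_balanced(l: Loads, placement: Placement, tol: float = 1e-9) -> bool:
    """Conventional condition: l(i,j) == l(i,k) for all j, k in P_i."""
    for svc, servers in placement.items():
        loads = [l.get((svc, j), 0.0) for j in servers]
        if loads and max(loads) - min(loads) > tol:
            return False
    return True

def aggregate_balanced(l: Loads, placement: Placement, tol: float = 1e-9) -> bool:
    """Shared-server condition: equal aggregate load on every server in P_i."""
    for servers in placement.values():
        loads = [aggregate_load(l, placement, j) for j in servers]
        if loads and max(loads) - min(loads) > tol:
            return False
    return True

placement = {"proxy": {1, 2}, "ids": {2, 3}}
# Each service split evenly, but server 2 carries both services.
even = {("proxy", 1): 40.0, ("proxy", 2): 40.0, ("ids", 2): 40.0, ("ids", 3): 40.0}
# Uneven per service, but every server carries an aggregate load of 60.
shifted = {("proxy", 1): 60.0, ("proxy", 2): 30.0, ("ids", 2): 30.0, ("ids", 3): 60.0}

print(per_service_balanced(even, placement), aggregate_balanced(even, placement))        # True False
print(per_service_balanced(shifted, placement), aggregate_balanced(shifted, placement))  # False True
```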
In one embodiment, the response times are extrinsic to the load balancer; in another embodiment, the response times are intrinsic to the load balancer. In the extrinsic case, the distribution for each service is derived from two main components: (A) computations performed at the individual servers; and (B) request distribution performed at the load balancing system.
Given a load metric, the induced load is computed for each service s_i on each server j and is denoted l(i,j). The response time for service s_i running on server j is denoted r(i,j). The goal is to ensure that r(i,j) = r(i,k) = R(i) for all j, k ∈ P_i, and that this relationship holds for all s_i ∈ S. Note that R(i) is variable and is not necessarily a predetermined constant.
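Let the aggregate load on server j be

$$L(j) = \sum_{i=1}^{n} l(i,j),$$

where l(i,j) = 0 for any service s_i that does not run on server j (i.e., j ∉ P_i).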
This presumes that the load metric is additive across services, which is true for all of the metrics described earlier in this section, and also for most other metrics. For this server, r(i,j) = f_i(L(j)). Since the function f_i( ) is monotonically non-decreasing, it may remain constant over a range of loads, so several aggregate loads on this server can yield the same response time for service s_i. The maximum aggregate load on this server that generates the same response time for this service is therefore computed. This is given by
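$$M_i(r(i,j)) = \max\{\, x \mid f_i(x) = r(i,j) \,\}.$$

Note that M_i(r(i,j)) ≥ L(j). The maximum acceptable load that server j can handle without changing r(i,j) for any service s_i running on j is given by

$$L_{\max}(j) = \min_{i \,:\, j \in P_i} M_i(r(i,j)),$$

which, being a minimum of values that are each at least L(j), is by definition at least L(j).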
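The per-server computation of L(j), M(r(i,j)) and L_max(j) can be sketched as follows. This is a minimal illustration under a stated assumption: each f_i is a non-decreasing step function supplied as sorted (upper load bound, response time) pairs; the representation, function names and numbers are hypothetical.

```python
from bisect import bisect_left

# Assumption: f_i is a non-decreasing step function given as sorted
# (upper_load_bound, response_time) pairs; load x falls in the first
# segment whose upper bound is >= x.
StepFn = list[tuple[float, float]]

def response_time(fn: StepFn, load: float) -> float:
    """r = f_i(L): response time of a service at aggregate server load L."""
    idx = bisect_left([ub for ub, _ in fn], load)
    if idx == len(fn):
        raise ValueError("load is beyond the last segment of f_i")
    return fn[idx][1]

def max_load_same_response(fn: StepFn, r: float) -> float:
    """M(r) = max{x | f_i(x) = r}. Segments with response r are contiguous
    because f_i is non-decreasing, so this is their largest upper bound."""
    return max(ub for ub, rt in fn if rt == r)

def server_report(loads: dict[str, float], fns: dict[str, StepFn]) -> tuple[float, float]:
    """Return (L(j), L_max(j)) for a server running the services in `loads`."""
    L = sum(loads.values())  # relies on the load metric being additive
    L_max = min(max_load_same_response(fns[s], response_time(fns[s], L)) for s in loads)
    return L, L_max

fns = {
    "proxy": [(40.0, 10.0), (70.0, 20.0), (100.0, 50.0)],
    "ids":   [(50.0, 5.0), (100.0, 15.0)],
}
print(server_report({"proxy": 25.0, "ids": 20.0}, fns))  # (45.0, 50.0)
```

In this example the proxy could tolerate an aggregate load of up to 70 without a change in its response time, but the IDS service limits the server to 50, so L_max(j) = 50.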
Each server sends L(j) and L_max(j) to the load-balancing system. This computation is performed periodically, every T seconds, or upon the receipt of K requests, and the load balancing system is updated accordingly. It will be understood by an artisan of ordinary skill in the art that the invention is not limited to these two options, but that other variations are possible, e.g., polling, interrupt-driven updates, or the data being provided by any extrinsic entity under a suitable communications regime.
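The two reporting triggers (every T seconds, or after K requests) can be combined in a small helper. This is a hypothetical sketch; the class name, the use of caller-supplied timestamps and the "whichever comes first" rule are assumptions, not part of the description.

```python
class ReportTrigger:
    """Hypothetical helper deciding when server j should recompute and send
    L(j) and L_max(j): after T seconds or K requests, whichever comes first."""

    def __init__(self, period_s: float, k_requests: int, start: float = 0.0):
        self.period_s = period_s      # T
        self.k_requests = k_requests  # K
        self.last_report = start
        self.count = 0                # requests since the last report

    def on_request(self, now: float) -> bool:
        """Call for every request served; True means 'report now'."""
        self.count += 1
        return self._check(now)

    def on_tick(self, now: float) -> bool:
        """Call from a periodic timer; True means 'report now'."""
        return self._check(now)

    def _check(self, now: float) -> bool:
        if now - self.last_report >= self.period_s or self.count >= self.k_requests:
            self.last_report = now
            self.count = 0
            return True
        return False

t = ReportTrigger(period_s=10.0, k_requests=3)
print([t.on_request(1.0), t.on_request(2.0), t.on_request(3.0)])  # [False, False, True]
print(t.on_tick(12.0), t.on_tick(13.0))                           # False True
```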
Both L(j) and L_max(j) are sent to the load-balancing system, which also implements an algorithm ‘X’, namely one of the currently available load-balancing algorithms that can provide load balancing for a single service. For each incoming packet request, the load-balancing system determines the service type of the request and which servers are running this service. Among all the servers running this service, if there exists a single server j that satisfies the load condition L(j) < L_max(j), the request is sent to server j. If multiple servers satisfy this condition, any one of them can be selected using one of the following policies: random, least-server-id, last-server-selected, or round robin, and the request is sent to the selected server. (Each server has a numeric id; the least-server-id policy selects the server with the lowest id among the servers present.)
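Alternatively, if L(j) = L_max(j) for all servers running this service, then Algorithm ‘X’ is applied to determine which server should receive the request. Since L_max(j) ≥ L(j) by definition, this is exactly the case in which no server satisfies L(j) < L_max(j).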
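The selection rule can be sketched as below. This is a hedged illustration, not the described implementation: the class and method names are hypothetical, and because Algorithm ‘X’ is left open in the text, the default fallback here (least reported L(j), then lowest id) is only a stand-in that a caller can replace.

```python
import random

class AggregateLoadBalancer:
    """Sketch: prefer servers with L(j) < L_max(j), break ties with a policy,
    and otherwise fall back to a single-service algorithm ('X')."""

    POLICIES = ("random", "least-server-id", "last-server-selected", "round-robin")

    def __init__(self, placement, policy="least-server-id", fallback=None, seed=None):
        if policy not in self.POLICIES:
            raise ValueError(f"unknown policy: {policy}")
        self.placement = placement   # service -> list of server ids (P_i)
        self.reports = {}            # server id -> (L(j), L_max(j))
        self.policy = policy
        # Hypothetical stand-in for Algorithm 'X': least L(j), then lowest id.
        self.fallback = fallback or (lambda servers: min(servers, key=lambda j: (self.reports[j][0], j)))
        self.last_selected = {}      # service -> last server chosen
        self.rr_next = {}            # service -> next round-robin position
        self.rng = random.Random(seed)

    def update(self, server_id, load, max_load):
        """Record the latest (L(j), L_max(j)) report from a server."""
        self.reports[server_id] = (load, max_load)

    def route(self, service):
        servers = self.placement[service]
        eligible = sorted(j for j in servers if self.reports[j][0] < self.reports[j][1])
        if not eligible:             # L(j) == L_max(j) on every server
            choice = self.fallback(servers)
        elif len(eligible) == 1:
            choice = eligible[0]
        else:
            choice = self._tie_break(service, eligible)
        self.last_selected[service] = choice
        return choice

    def _tie_break(self, service, eligible):
        if self.policy == "random":
            return self.rng.choice(eligible)
        if self.policy == "last-server-selected" and self.last_selected.get(service) in eligible:
            return self.last_selected[service]
        if self.policy == "round-robin":
            pos = self.rr_next.get(service, 0)
            self.rr_next[service] = pos + 1
            return eligible[pos % len(eligible)]
        return eligible[0]           # least-server-id (also the default)

lb = AggregateLoadBalancer({"proxy": [1, 2, 3]})
lb.update(1, 50.0, 50.0)   # full: L(j) == L_max(j)
lb.update(2, 30.0, 60.0)
lb.update(3, 45.0, 70.0)
print(lb.route("proxy"))   # 2 (lowest id among eligible servers 2 and 3)
lb.update(2, 60.0, 60.0)
lb.update(3, 70.0, 70.0)
print(lb.route("proxy"))   # 1 (all full; the stand-in fallback picks the least L(j))
```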
The storage requirements for this algorithm at the load balancing system are O(m), i.e., proportional to the number of servers m, since only L(j) and L_max(j) are kept for each server. This is also the total communication overhead of the load-balancing system with the servers.
The load balancing system can also implement QoS management by evaluating QoS policies and goals. One way to evaluate the response time is to probe a target server (e.g., with a ping) to see whether the QoS goals have been achieved.
In another embodiment, the response times are intrinsic to the load balancer. In that case, the response time of each service on a server can itself serve as a load metric, provided that this measure can be made known to the load balancing system. Typically, the response time r(i,j) of each service on a server is sent by the server to the load balancer. The load balancing algorithm then simply sends a new request for service s_i to the server with the least response time for s_i among all the servers that run s_i. If multiple servers satisfy this condition, any one of them can be selected using one of the following policies: random, least-server-id, last-server-selected, or round robin. The request is then sent to the selected server.
This implies that the load balancing system has to keep track of r(i,j) for every combination of service s_i and server j. The aggregate-load computations of the previous embodiment are not needed here, and would not apply in any case, since the response time metric is not additive. However, the storage requirements for this algorithm at the load balancing system are proportional to the product of the number of services and the number of servers, O(mn). This is also the total communication overhead of the load-balancing system with the servers.
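A minimal sketch of the response-time-based rule, assuming the reported values are kept in a dict keyed by (service, server id), which also makes the O(mn) storage visible. The function name and data are hypothetical, and only two of the listed tie-breaking policies (least-server-id and random) are shown.

```python
import random

def route_by_response_time(r, placement, service, policy="least-server-id", rng=None):
    """Send a request for `service` to the server with the least reported r(i, j).

    r: dict mapping (service, server id) -> latest reported response time.
    placement: dict mapping service -> list of server ids running it (P_i).
    Only the least-server-id and random tie-breaking policies are sketched.
    """
    candidates = placement[service]
    best = min(r[(service, j)] for j in candidates)
    tied = sorted(j for j in candidates if r[(service, j)] == best)
    if policy == "random":
        return (rng or random.Random()).choice(tied)
    return tied[0]

placement = {"waf": [1, 2, 3], "av": [2]}
r = {("waf", 1): 12.0, ("waf", 2): 8.0, ("waf", 3): 8.0, ("av", 2): 30.0}
print(route_by_response_time(r, placement, "waf"))  # 2
print(len(r))  # 4: one entry per (service, server) pair, at most n * m
```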
In yet another embodiment, the system incorporates a seamless server failover component. The load balancing system can detect the status of a server (failed or operational). When a failure is detected, the failed server's state and operations are moved to a backup server. To ensure that incoming packets are seamlessly redirected to this new server, the existing flow balancing tables that map flow identifiers to server ids must be updated to reflect the new server's id. This update can be time consuming, since these tables can be very large, and requests may be lost if they arrive before the update is complete. Instead, a hitless instant update scheme performs this re-mapping efficiently with no packet loss.
The load balancing system has a Flow Balancing Table which specifies the target server for a particular redirected flow. It consists of two columns: a ‘flow identifier value’ and a ‘server-id’ field. As a prophylactic measure, a separate table called the Server Mapping Table is created, consisting of two columns: a ‘virtual server id’ and a ‘physical server id’. The ‘server-id’ column of the Flow Balancing Table is modified to contain a ‘virtual server id’ instead. With these two tables, failover requires only that the physical server id of the failed server be updated to that of the backup server.
Every request received by the load balancing system now involves two table lookups, as opposed to the single lookup in contemporary systems. The ‘virtual server id’ corresponding to the flow identifier of the request is determined from the Flow Balancing Table, and this virtual server id is then used to look up the physical server id in the Server Mapping Table.
When a server fails over from a primary server to a backup server, the physical server id of the failed server is updated to that of the backup server in the Server Mapping Table. For example, if server ‘6’ fails, the server farm redirects traffic originally destined for the failed server to a replacement (alternate) server. If server ‘2’ is chosen as the replacement server, the Server Mapping Table entry for the virtual server id that corresponds to the failed server (virtual server ‘1’ in this example) is modified to point to physical server ‘2’. By performing this single update operation, which can be done automatically, all subsequent requests that referred to the failed server are redirected by the load balancing server to the backup server. This ensures that the load-balancing service is not degraded during the failover process. Thus, the load balancer fails over instantaneously, with all traffic destined to virtual server ‘1’ now moving to the new server. The time needed to accomplish this switch is the time needed to modify the entry of the failed server, which can be accomplished within a few microseconds.
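The two-level lookup and the single-entry failover update can be sketched with two dictionaries. The class and method names are hypothetical, and the ids mirror the example above (virtual server ‘1’ mapped to failed server ‘6’, replaced by server ‘2’). Failover scans only the small Server Mapping Table, one row per virtual server, and never touches the large Flow Balancing Table; with a one-to-one mapping this touches exactly one entry.

```python
class FailoverBalancer:
    """Sketch of the Flow Balancing Table -> Server Mapping Table indirection."""

    def __init__(self):
        self.flow_table: dict[str, int] = {}   # flow identifier -> virtual server id
        self.server_map: dict[int, int] = {}   # virtual server id -> physical server id

    def assign_flow(self, flow_id: str, virtual_id: int) -> None:
        self.flow_table[flow_id] = virtual_id

    def map_virtual(self, virtual_id: int, physical_id: int) -> None:
        self.server_map[virtual_id] = physical_id

    def lookup(self, flow_id: str) -> int:
        virtual_id = self.flow_table[flow_id]  # first lookup
        return self.server_map[virtual_id]      # second lookup

    def fail_over(self, failed_id: int, backup_id: int) -> None:
        """Repoint virtual ids mapped to the failed server; flow entries are untouched."""
        for virtual_id, physical_id in list(self.server_map.items()):
            if physical_id == failed_id:
                self.server_map[virtual_id] = backup_id

lb = FailoverBalancer()
lb.map_virtual(1, 6)   # virtual server '1' -> physical server '6'
lb.map_virtual(2, 3)
for flow, vid in [("flow-a", 1), ("flow-b", 1), ("flow-c", 2)]:
    lb.assign_flow(flow, vid)

lb.fail_over(6, 2)     # server '6' fails; server '2' replaces it
print(lb.lookup("flow-a"), lb.lookup("flow-b"), lb.lookup("flow-c"))  # 2 2 3
```

Because the sketch does not require physical ids to be unique in the Server Mapping Table, the backup may also be a server that is already mapped to another virtual server id.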
In other embodiments, the re-routing is done to any server that is currently known to be running, including one that is already mapped to some other virtual server id; in that case, two virtual server ids map to the same physical server id in the Server Mapping Table.
While the foregoing is directed to various embodiments of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof. As such, the appropriate scope of the invention is to be determined according to the claims, which follow.