The embodiment below relates to a load distribution system.
With the recent development of computers and networks, a service is being implemented in which data previously stored in, e.g., a local hard disk is stored in a memory device of a server system connected via a network. This service is called a cloud service. In such a cloud service, a large number of servers are provided, and processing requests from users are distributed to the servers. At this time, load distribution needs to be performed in order to prevent processes from being concentrated on one server. The load on a server depends largely on a time from when the server issues an I/O processing request to a memory device to when the I/O processing request is processed, i.e., the processing speed of the memory device. If the memory device processes the I/O processing request quickly, the server having sent the I/O processing request finishes processing early, which reduces the load. Thus, the load on the server is determined by a response time of the memory device.
A system in
In a conventional method for load leveling, the loads on the I/O processing servers 14-1 to 14-3 that perform actual processing are leveled using the load distribution device 12.
Assume a case where a given I/O processing server writes to a given storage device. Even when an I/O processing request is issued from an access server, if the load on the storage device is high and its response is slow, the I/O processing server takes a long time to complete processing. The load on the I/O processing server can therefore be considered high in this case. The load here refers to the overall load on the system side, including the I/O processing server and the storage device, as viewed from the access server. If an access server desires to access given data, it can do so only by accessing the storage device that holds the data. If the traffic to a given storage device is heavy, the load on that storage device cannot be distributed. However, in the sense of distributing the load on a system including an I/O processing server and a storage device as viewed from an access server, the load on the I/O processing server can be distributed. A plurality of I/O processing servers and a plurality of storage devices are interconnected, and different I/O processing servers can access a single storage device. The load on an I/O processing server can be regarded as the response time from when an I/O processing request is transmitted to a storage device to when processing of the request is completed. Thus, if the response time of a given I/O processing server is long, an I/O processing request is sent to a different I/O processing server with a shorter response time. This allows distribution of the load on a system including an I/O processing server and a storage device even when access servers try to access the same storage device.
With the above-described configuration, if the load on any of the I/O processing servers 14-1 to 14-3 becomes high, the load distribution device 12 distributes shares of the load to the others of the I/O processing servers 14-1 to 14-3. This configuration suffers from the two problems below.
High Load on Load Distribution Device 12
If the number of access servers 10-1 and 10-2 increases, the load on the load distribution device 12 may rise and make the load distribution device 12 a processing bottleneck.
Abnormality in Load Distribution Device 12
If an abnormality occurs in the load distribution device 12, and the load distribution device 12 goes down, processing of all I/O processing requests stops.
For transfer of processing of an I/O processing request in the event of an abnormality, a conventional method adopts a multipath method.
In an access server 10, a plurality of paths (access paths (1) and (2)) to the I/O processing server 14-1 and the I/O processing server 14-2 are defined in advance. Assume that an abnormality has occurred in the I/O processing server 14-1 when the I/O processing servers 14-1 and 14-2 are performing processing of I/O processing requests. In this case, the access server 10 continues processing by switching an access path from access path (1) to access path (2).
The method suffers from the problem below.
Assume a situation where there are a plurality of access servers, and an abnormality occurs in any one of I/O processing servers when the access servers are using the same I/O processing servers. In this situation, since each access server performs path switching, the load on a specific I/O processing server may become high after the path switching.
Providing a mechanism by which access servers adjust the loads on I/O processing servers can serve as a measure against an increase in the load on a specific I/O processing server. However, this requires some means of communication between access servers. In a cloud service and the like, individual access servers are likely to be managed by different companies. Communication between access servers may leak one company's secrets to a different company. The method in
In some conventional techniques, a storage management server which manages information on a server, an application running on the server, a storage device, an access path, and the like is provided to perform load distribution. In other conventional techniques, a controller in a storage subsystem monitors the load status of each connection port, and load distribution is performed on the basis of a result of the monitoring.
According to an aspect of the embodiment, a load distribution system includes a plurality of storage devices which accept an input processing request or an output processing request and return a processing result, a plurality of I/O processing servers which transmit the input processing request or the output processing request to one of the plurality of storage devices, receive the processing result, and, when a response time from the transmission of the input processing request or the output processing request to completion of processing of the input processing request or the output processing request exceeds a threshold, send out an overload response indicating that processing which deals with the input processing request or the output processing request is in an overloaded state, and a plurality of access servers which transmit an input processing request or an output processing request from a user to an I/O processing server which is not in an overloaded state on the basis of the overload response from the I/O processing server.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention.
In an environment with a large number of access servers and a large number of I/O processing servers, when the configuration in
In the method (
There is thus a need for different measures to prevent issued I/O processing requests from being concentrated on a specific I/O processing server.
The embodiment below is applied to, for example, a system (e.g., a cloud system) with a large number of servers and a large number of storages. The embodiment provides a system which has uniformity in service achieved by leveling the load on the whole system and has reliability high enough to continue a service even in the event of a failure in any piece of equipment at the time of load leveling.
To this end, I/O processing request information which an access server sends to an I/O processing server and response information which the I/O processing server returns to the access server are extended.
If an I/O processing server is connected as an iSCSI target device, a request from an access server is transmitted as a command which is obtained by extending a SCSI command. A response from the I/O processing server to the access server is implemented by extending response information of a SCSI command. (If an I/O processing server is connected as a Fibre Channel target, a request from an access server and a response from the I/O processing server to the access server are both implemented as extensions of a Fibre Channel command. Since the implementation is the same in both cases, the case of iSCSI will be described as an example.)
An object of load distribution here is a whole system including I/O processing servers and storage devices, as viewed from an access server. If an access server desires to access given data, the access server needs to gain access to a storage device in which the data is stored. In this case, even if the storage device is highly loaded and is not easily accessible, the access server cannot gain access to a storage device without the data desired to be accessed. However, since a plurality of storage devices and a plurality of I/O processing servers are interconnected, switching from one I/O processing server accessible to a desired storage device to another can be performed. Although the loads on storage devices cannot be distributed, the loads on I/O processing servers which transmit I/O processing requests to the storage devices can be distributed. Distribution of the loads on I/O processing servers allows uniformization of the load on a system including the I/O processing servers and storage devices, as viewed from an access server. In this case, the load on each I/O processing server corresponds to a response time from issuance of an I/O processing request to a storage device to completion of the I/O processing request. It is thus possible to perform load distribution by switching from an I/O processing server with a longer response time to an I/O processing server with a shorter response time.
Load distribution is performed in the manner below.
Briefly speaking, if an I/O processing server becomes overloaded, the I/O processing server answers an I/O processing request (request) from an access server with a response indicating that the I/O processing server is overloaded. Upon receipt of the response, the access server distributes the I/O processing request to a different I/O processing server. The storage device to which each I/O processing server is connected at the start of operation of a system is determined at the time of startup of the system. This is to prevent the traffic to some I/O processing servers from becoming heavy at the start of the operation.
Since an access server performs distribution by itself on the basis of the load statuses of I/O processing servers, even if there are a plurality of access servers, communication and adjustment between the access servers are unnecessary.
That is, an I/O processing server determines, on the basis of a response time from when the I/O processing server receives an I/O processing request from an access server to when the I/O processing server returns a response from a storage device to the access server (including a time for processing in the server itself), whether the server itself is overloaded. If the I/O processing server is overloaded, the I/O processing server notifies the access server that the I/O processing server is overloaded. Upon receipt of the notification to the effect that the I/O processing server is overloaded from the I/O processing server, the access server transmits the I/O processing request to a different I/O processing server.
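The overload determination described above can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the names `IoResponse`, `handle_request`, `issue_to_storage`, and `threshold_s` are hypothetical, and the boolean `overloaded` field merely stands in for the extended response information.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class IoResponse:
    payload: bytes
    overloaded: bool  # hypothetical stand-in for the extended response information


def handle_request(
    issue_to_storage: Callable[[bytes], bytes],
    request: bytes,
    threshold_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> IoResponse:
    """Forward a request to storage and flag overload from the measured response time."""
    accepted_at = clock()                  # time of acceptance from the access server
    payload = issue_to_storage(request)    # storage device processes the request
    response_time = clock() - accepted_at  # includes processing in the server itself
    return IoResponse(payload, overloaded=response_time > threshold_s)
```

The clock is injected only so that the sketch can be exercised deterministically; a real server would measure wall-clock or monotonic time directly.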
An access server transmits a command for a load information response to all I/O processing servers in order to check the loads on storage devices connected to I/O processing servers and on the I/O processing servers. The access server asks an I/O processing server under a lowest load to process an I/O processing request, to which no response has been made or which has resulted in an error, on the basis of responses to the command.
The access server can also transmit the command for a load information response to all I/O processing servers when it receives a response indicating overload from an I/O processing server. If, on the basis of the responses, the access server asks an I/O processing server under a lower load to perform I/O processing, more efficient load distribution can be performed.
From the foregoing, efficient distribution of loads of I/O processing requests in a cloud system allows improvement in the performance of the whole system. Although use of a large number of servers enhances the probability of a failure in any of I/O processing servers, a service provided by a cloud system can be continued even in the event of a failure.
Note that if an abnormality occurs in an I/O processing server, an access server can sense the abnormality as no response to an I/O processing request or an error. In this case, the access server issues an I/O processing request to a different I/O processing server.
The overall system includes a plurality of access servers 20-1 to 20-n (physical servers for providing a cloud service; a large number of virtual machines are made to run on the access servers). The overall system further includes a plurality of I/O processing servers 21-1 to 21-m (servers which process I/O processing requests issued by the access servers) and a plurality of storage devices 22-1 to 22-N. These devices are connected by networks 23 and 24.
For security reasons, no two of the access servers 20-1 to 20-n have a communication path for management between them, and communication for adjustment of the loads on the I/O processing servers is not performed between access servers.
Likewise, for security reasons, no two of the I/O processing servers 21-1 to 21-m have a communication path for management between them, and communication for adjustment of the loads on the I/O processing servers is not performed between I/O processing servers.
An access server and an I/O processing server can each be represented with the same block diagram. An I/O acceptance unit 30 accepts a transmitted I/O processing request (request). If the device in
An I/O time monitoring unit 32 monitors the time from acceptance of an I/O processing request to reception of a response to the I/O processing request and return of the response to an access server, and updates a management table of a management table storage unit 33 as needed. The I/O time monitoring unit 32 includes L counters 35-1 to 35-L corresponding in number to I/O processing servers. When the response that is returned is not an overload response, the corresponding counter counts up and the value in a counter value holding unit of the management table is updated. The number L is equal to m corresponding in number to I/O processing servers if the device in
The management table will be described later. A management table monitoring unit 34 monitors a threshold level which is registered in the management table stored in the management table storage unit 33. An I/O issuance unit 31 is intended to transfer an I/O processing request accepted by the I/O acceptance unit 30 after a time of acceptance of the I/O processing request is registered. If the device in
An I/O processing request is issued from a user application, and a storage device is notified of the I/O processing request via an access server and I/O processing server (1). I/O processing server (1) receives a response to the I/O processing request and measures a response time from when the I/O processing server receives the I/O processing request from the access server to when a response from the storage device is returned to the access server. If the response time does not exceed a threshold which is set in the server itself currently, a response indicating normalcy is returned from the I/O processing server to the access server and the user application. On the other hand, if the response time exceeds the threshold set in the server itself currently, it is determined that there is an overload. Since the response time includes a time from the issuance of the I/O processing request to completion of processing of the I/O processing request, the response time includes a response time of the storage device and a processing time of the I/O processing server. Thus, if the response time exceeds the threshold, it is surmised that there is an overload in one or both of the storage device and the I/O processing server.
If I/O processing server (1) detects an overload for the I/O processing request to the storage device as an access destination, the I/O processing server returns a response to the access server as an issuer of the I/O processing request and notifies the access server of an overloaded state. Upon receipt of overload information from the I/O processing server, the access server distributes a subsequent I/O processing request to a different I/O processing server at the time of issuance of the I/O processing request. The distribution of an I/O processing request to a different I/O processing server is called reselection processing in
If an error occurs in an I/O processing server, an I/O processing request issued by an access server results in an error. The access server is considered to be capable of sensing an abnormality in the I/O processing server at this time. Upon sensing of the abnormality in the I/O processing server, the access server executes reselection processing and reissues an I/O processing request to a different I/O processing server. With this configuration, a service can be continued even when an abnormality occurs in an I/O processing server.
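The two reselection triggers described above (an overload response and an error or missing response) can be sketched as follows. All names here (`AccessServer`, `IoServerError`, `send`) are hypothetical, and the round-robin `_reselect` is only a placeholder for the threshold-based reselection of the embodiment.

```python
from typing import Callable, List, Tuple


class IoServerError(Exception):
    """Hypothetical: raised when an I/O processing server errors or does not respond."""


# A server is modelled as a callable returning (payload, overloaded).
Server = Callable[[bytes], Tuple[bytes, bool]]


class AccessServer:
    def __init__(self, servers: List[Server]) -> None:
        self.servers = servers
        self.current = 0  # index of the current issuance destination

    def _reselect(self) -> None:
        # Placeholder policy (assumption): simply move to the next server.
        self.current = (self.current + 1) % len(self.servers)

    def send(self, request: bytes) -> bytes:
        for _ in range(len(self.servers)):
            try:
                payload, overloaded = self.servers[self.current](request)
            except IoServerError:
                self._reselect()  # abnormality sensed: reissue to a different server
                continue
            if overloaded:
                self._reselect()  # subsequent requests go to a different server
            return payload
        raise IoServerError("no I/O processing server responded")
```

Note the asymmetry: on an error the same request is reissued, whereas on an overload response the request has already been processed and only subsequent requests are redirected.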
In a process of reselecting an I/O processing server as an I/O processing request issuance destination in step S40, management tables for I/O processing servers are referred to, and, from among the I/O processing servers meeting the requirement that an I/O size and an I/O frequency be not more than thresholds, an I/O processing server with a lowest threshold level (to be described later) is selected as the I/O processing request reissuance destination. With this selection, an I/O processing request can be issued to an I/O processing server which is determined to be under a lower load.
If an I/O processing server recovers from an abnormality, the I/O processing server is set again as an object of I/O issuance in an access server. With this setting, the I/O processing server after the recovery becomes an object of load distribution again.
An access server manages, for each I/O processing server, a threshold for the size of an I/O processing request which can be issued and a threshold for an I/O frequency. At the time of I/O processing request issuance, an I/O processing request with a size larger than the threshold (or an I/O processing request with a frequency above the threshold) of the I/O processing server previously asked is issued to a different I/O processing server. If there is a response indicating an overloaded state from an I/O processing server, the thresholds are decreased in small steps (if there is no response indicating an overloaded state, the thresholds are increased in small steps). This reduces the processing load on an I/O processing server.
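A minimal sketch of destination selection under per-server size and frequency thresholds is given below. The `ThresholdEntry` fields and the function name `select_destination` are assumptions made for illustration; the actual layout of the management table is not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ThresholdEntry:
    server: str
    level: int          # threshold level; a lower level is treated as a lower load
    max_size: int       # I/O size threshold in bytes (hypothetical unit)
    max_frequency: int  # I/O frequency threshold in requests per unit time


def select_destination(
    entries: List[ThresholdEntry], io_size: int, io_frequency: int
) -> Optional[ThresholdEntry]:
    """Pick the lowest-level server whose thresholds admit the request."""
    eligible = [
        e for e in entries
        if io_size <= e.max_size and io_frequency <= e.max_frequency
    ]
    # min() returns the first minimum, so table order breaks ties.
    return min(eligible, key=lambda e: e.level, default=None)
```

Returning `None` when no server qualifies is a choice of this sketch; the source does not say what happens in that case.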
The management table in
The processing in
Referring to
In step S13, the access server reselects an I/O processing server as an I/O processing request issuance destination according to
In step S16, the access server increases the value of the threshold level by 1. In step S24, the access server initializes the counter 35 and a counter value holding unit of the management table. In step S17, the access server performs I/O processing server sorting (to be described later) and ends the processing. In step S22, the access server determines whether the threshold level set in the level value setting unit of the threshold definition table for the I/O processing server, with which the access server is currently dealing, is 1. The determination is to prevent the counter from counting up if the threshold level is 1. The reason for the prevention is that the threshold level cannot be decreased any more in a process of decreasing the threshold level by 1 in step S20 (to be described later). If the determination in step S22 is NO, the flow advances to step S18. On the other hand, if the determination in step S22 is YES, the flow advances to step S19. In step S18, the counter 35 counts up, and the access server increases a value of the counter value holding unit of the threshold definition table by 1 accordingly. In step S19, the access server determines whether the counter has exceeded a definition value. If the determination in step S19 is NO, the access server ends the processing. On the other hand, if the determination in step S19 is YES, the flow advances to step S20.
In step S20, the access server decreases the threshold level by 1. In step S23, the access server resets the counter 35 and initializes the counter value holding unit of the management table. In step S21, the access server performs the I/O processing server sorting (to be described later) and ends the processing.
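The threshold level update in steps S16 to S24 can be sketched as follows. `ThresholdState`, `update_on_response`, and the value of `DEFINITION_VALUE` are hypothetical, and the sketch assumes no upper bound on the threshold level because the source does not state one here. The return value signals that I/O processing server sorting should follow.

```python
from dataclasses import dataclass

DEFINITION_VALUE = 3  # hypothetical count of non-overload responses before relaxing


@dataclass
class ThresholdState:
    level: int = 1    # value of the level value setting unit
    counter: int = 0  # value of the counter value holding unit


def update_on_response(
    state: ThresholdState, overloaded: bool, definition_value: int = DEFINITION_VALUE
) -> bool:
    """Update one server's threshold level; return True if sorting is needed."""
    if overloaded:
        state.level += 1        # S16: increase the threshold level by 1
        state.counter = 0       # S24: initialize the counter
        return True             # S17: sorting follows
    if state.level != 1:        # S22: at level 1 the counter must not count up
        state.counter += 1      # S18
    if state.counter > definition_value:  # S19
        state.level -= 1        # S20: decrease the threshold level by 1
        state.counter = 0       # S23
        return True             # S21: sorting follows
    return False
```

Because the counter is reset whenever the level changes and never counts up at level 1, the level cannot fall below 1.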
Before the start of the processing in
The I/O processing server sorting is to change the order of the threshold definition tables in the management table including a plurality of threshold definition tables.
In the I/O processing server sorting, threshold definition tables are arranged in ascending order of a threshold level held in a level value setting unit of a threshold definition table, a list is produced so as to correspond to I/O processing servers, and a collection of tables, in which the plurality of threshold definition tables are arranged as a list, is set as a management table.
In step S25, a threshold level changed in step S16 or S20 of
By sorting the I/O processing server management table managed by the access server in ascending order of threshold level, an I/O processing server under a lowest load becomes the top of the list in the management table.
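As a small illustration of the I/O processing server sorting, the sketch below models the management table as a list of threshold definition tables and sorts it in ascending order of threshold level. The class and function names and the sample levels are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ThresholdDefinition:
    server: str
    level: int  # value held in the level value setting unit


def sort_management_table(table: List[ThresholdDefinition]) -> None:
    """Sort in place so the server under the lowest load comes first."""
    # list.sort is stable: servers with equal levels keep their relative order.
    table.sort(key=lambda d: d.level)


table = [
    ThresholdDefinition("server-1", 1),
    ThresholdDefinition("server-2", 3),
    ThresholdDefinition("server-3", 3),
    ThresholdDefinition("server-4", 2),
]
sort_management_table(table)
order = [d.server for d in table]
```

With these sample levels, server-4 moves ahead of server-2 and server-3, and the head of the list is the first candidate for reselection.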
State (1) in
In the sorting, since the threshold definition table for I/O processing server (4) comes before the threshold definition table for I/O processing server (2), the threshold definition tables need to be interchanged. State (2) in
Note that, as a method for the management table list sort processing, a process of providing a pointer to a next table in each threshold definition table and changing a value of the pointer may be used instead of the method in
An I/O processing server manages a processing time from acceptance of an I/O processing request to completion of I/O processing. If a processing time exceeds a threshold in a management table which is held by the I/O processing server, the I/O processing server returns an overload response to an access server. That is, if a response time of a storage device (including a time for processing in the server itself) exceeds the threshold, the I/O processing server returns an overload response to the access server. An initial value of the processing time threshold is set in advance to a response time geared to a response time of the storage device. If a processing time exceeds the threshold, the response time threshold is increased in small steps (the threshold is decreased in small steps if the processing time does not exceed the threshold). With this configuration, frequent return of an overload response is avoided. For example, if I/O processing server (1) is in an overloaded state, and a different I/O processing server (I/O processing server (2)) is also in an overloaded state, an I/O processing request to I/O processing server (2) may be distributed again to I/O processing server (1). If I/O processing server (1) is still overloaded at this time, all (only two) I/O processing servers make an overload response. If such a situation persists, an overload response is made to every I/O processing request, which increases exchanges between an access server and I/O processing servers. To avoid this, a threshold is changed in small steps.
As illustrated in
The management table in
When an I/O processing server accepts an I/O processing request from an access server in step S30, the I/O processing server issues the I/O processing request to a storage device in step S31. In step S32, the I/O processing server receives a response to the effect that the I/O processing request is completed from the storage device, returns the response to the access server, and completes basic I/O processing. In step S33, the I/O processing server calculates a response time from a time of the acceptance of the I/O processing request and a time of the completion. In step S34, the I/O processing server refers to a management table (
In step S40, the I/O processing server makes a response to the effect that the I/O processing server is overloaded to the access server. In step S36, the I/O processing server increases the threshold level by 1. In step S43, the I/O processing server resets the counter 35, initializes a value in a counter value holding unit of the management table, and ends the processing. In step S41, the I/O processing server determines whether the threshold level in the level value setting unit of the threshold definition table is 1. The determination is to prevent the counter from counting up if the threshold level is 1. The reason for the prevention is that the threshold level cannot be decreased any more in a process of decreasing the threshold level by 1 in step S39 (to be described later). If the determination in step S41 is NO, the flow advances to step S37. On the other hand, if the determination is YES, the flow advances to step S38. In step S37, the I/O processing server makes the counter 35 count up and increases the value in the counter value holding unit of the management table by 1. In step S38, the I/O processing server determines whether the counter has exceeded a definition value. If the determination in step S38 is NO, the I/O processing server ends the processing. On the other hand, if the determination is YES, the flow advances to step S39. In step S39, the I/O processing server decreases the threshold level by 1. In step S42, the I/O processing server resets the counter 35, initializes the value in the counter value holding unit of the management table, and ends the processing.
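The I/O processing server side of the flow (steps S36 to S43) can be sketched in the same spirit. The mapping from threshold level to response-time threshold, the value of `DEFINITION_VALUE`, and the cap at the highest defined level are all assumptions of this sketch, not values taken from the embodiment.

```python
from dataclasses import dataclass

# Hypothetical table: a higher level means a longer response-time threshold.
RESPONSE_THRESHOLDS_MS = {1: 10, 2: 20, 3: 40}
MAX_LEVEL = max(RESPONSE_THRESHOLDS_MS)
DEFINITION_VALUE = 3  # hypothetical


@dataclass
class ServerState:
    level: int = 1
    counter: int = 0


def after_completion(state: ServerState, response_time_ms: float) -> bool:
    """Return True if an overload response should be sent to the access server."""
    if response_time_ms > RESPONSE_THRESHOLDS_MS[state.level]:
        state.level = min(state.level + 1, MAX_LEVEL)  # S36 (cap is an assumption)
        state.counter = 0                              # S43
        return True                                    # S40
    if state.level != 1:                               # S41
        state.counter += 1                             # S37
    if state.counter > DEFINITION_VALUE:               # S38
        state.level -= 1                               # S39
        state.counter = 0                              # S42
    return False
```

Raising the threshold after each overload response is what keeps the server from answering every subsequent request with an overload response.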
When an access server is to reselect an I/O processing server, the access server refers to values of thresholds in a management table and reselects an I/O processing server with largest thresholds (which is considered to be under a lowest load) as an I/O processing request issuance destination.
Referring to
In step S47, the access server determines whether thresholds are larger than values of the distribution destination. If the determination in step S47 is NO, the access server continues the loop in step S46. On the other hand, if the determination is YES, the flow advances to step S48. In step S48, the access server sets an I/O processing server selected in step S47 as the distribution destination and ends the processing.
As another method by which an access server reselects an I/O processing server, it is possible to provide a mechanism by which an access server transmits a load information check command to an I/O processing server, and the I/O processing server returns the number of I/O processing requests being processed by the server itself. An access server can refer to such a response and set an I/O processing server with a smallest number of I/O processing requests as an object of reselection.
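This alternative reselection can be sketched as below. The load information check command is modelled as a plain callable per server that returns the number of I/O processing requests in progress; the function name and this modelling are assumptions, and error handling for unresponsive servers is omitted.

```python
from typing import Callable, Dict


def reselect_by_load(load_queries: Dict[str, Callable[[], int]]) -> str:
    """Query every server and return the one with the fewest requests in progress."""
    loads = {name: query() for name, query in load_queries.items()}
    # Ties are resolved in favour of the server queried first.
    return min(loads, key=lambda name: loads[name])
```

For example, `reselect_by_load({"s1": lambda: 7, "s2": lambda: 2})` selects `"s2"`.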
If there is an overload response from I/O processing server (1), a threshold level is changed by step S16 in
If an I/O response time for storage device (2) exceeds a threshold, a threshold level is increased by step S36 in
If reselection is performed by checking load information with an I/O processing server by an access server at the time of I/O processing server reselection, a counter which manages the number of I/O processing requests being processed by each I/O processing server itself is prepared in advance in the I/O processing server, in addition to the table in
For an access server, an I/O processing request source in
A management table stored in the management table storage unit 33 in
The management table monitoring unit 34 in
The management table monitoring unit issues a test I/O processing request (dummy I/O processing request) to an I/O issuance destination (an I/O processing server in the case of an access server or a storage device in the case of an I/O processing server) to which an I/O processing request has not been issued for a fixed time. If the I/O processing request is normally processed (if there is no overload response to the I/O processing request in the case of the access server or if a response to the I/O processing request is received within a response time in the case of the I/O processing server), thresholds (a threshold) in a management table are (is) changed. The threshold change is performed according to
In step S50, the management table monitoring unit determines whether an I/O processing request has been issued to an overloaded I/O processing server within a fixed time. If the determination in step S50 is YES, the management table monitoring unit ends processing. On the other hand, if the determination in step S50 is NO, the management table monitoring unit issues a test I/O processing request in step S51. In step S52, the management table monitoring unit determines whether a response to the test I/O processing request is normal. If the determination in step S52 is NO, the management table monitoring unit ends the processing. On the other hand, if the determination in step S52 is YES, the management table monitoring unit changes thresholds and ends the processing.
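Steps S50 to S52 and the subsequent threshold change can be sketched as follows. The function and parameter names are hypothetical; `send_test_io` stands for issuing the test (dummy) I/O processing request and reporting whether the response was normal, and `change_thresholds` stands for the threshold change whose details are not reproduced here.

```python
from typing import Callable


def monitor_idle_destination(
    last_issued_at: float,
    now: float,
    fixed_time: float,
    send_test_io: Callable[[], bool],
    change_thresholds: Callable[[], None],
) -> bool:
    """Probe a destination that has been idle; return True if thresholds changed."""
    if now - last_issued_at < fixed_time:  # S50: a request was issued recently
        return False
    if not send_test_io():                 # S51 and S52: test request not normal
        return False
    change_thresholds()                    # normal response: change the thresholds
    return True
```

Without such a probe, a destination that was once marked overloaded would never receive requests again and its thresholds would never recover.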
A response data format for a SCSI command in
Note that a case using a Fibre Channel command is the same as the case of SCSI except that an optical fiber is used as a path for transferring a SCSI command and that a code value defined as being vendor specific is used as a response indicating an overloaded state.
In checking of a load status, access server (A) issues an INQUIRY command to I/O processing servers (1) and (2). In each of I/O processing servers (1) and (2), load information is put into a vendor-specific area of a command and is returned as a response to INQUIRY.
In a loop in step S55, an access server repeats the processing once for each I/O processing server. In step S56, the access server issues INQUIRY. In step S57, the access server accepts a response to INQUIRY. In step S58, the access server puts, into a variable (e.g., n), the number of I/O processing requests being accepted by an I/O processing server per unit time that is received through the response. In step S59, the access server refers to a threshold definition table for the I/O processing server, with which the access server is dealing, in a management table held by the access server. In step S60, the access server determines whether n exceeds an I/O frequency threshold. The threshold definition table is the same as that in
If the determination in step S60 is NO, the access server repeats the loop in step S55. On the other hand, if the determination in step S60 is YES, the access server decreases a value of the threshold level in the level value setting unit of the threshold definition table in step S61 such that a value of n is not more than the I/O frequency threshold. In step S62, the access server performs I/O processing server sorting. After that, the access server repeats the loop in step S55. If processing is over for all I/O processing servers in the loop in step S55, the access server ends the processing.
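The loop of steps S55 to S62 can be sketched as follows. The table `FREQUENCY_THRESHOLDS` (lower level, larger I/O frequency threshold) is a hypothetical stand-in for the threshold definition table, `inquiry` models issuing INQUIRY and reading n from the response, and the sorting of step S62 is performed once after the loop rather than after each change, which is a simplification.

```python
from typing import Callable, Dict

# Hypothetical: threshold level -> I/O frequency threshold (requests per unit time).
FREQUENCY_THRESHOLDS = {1: 1000, 2: 500, 3: 250}


def check_loads(levels: Dict[str, int], inquiry: Callable[[str], int]) -> Dict[str, int]:
    """Adjust each server's threshold level from its reported request rate."""
    for server in list(levels):                 # S55: once per I/O processing server
        n = inquiry(server)                     # S56 to S58: issue INQUIRY, read n
        level = levels[server]                  # S59: refer to the definition table
        if n > FREQUENCY_THRESHOLDS[level]:     # S60
            # S61: decrease the level until n is not more than the threshold
            # (level 1 is the floor, so n may still exceed it there).
            while level > 1 and n > FREQUENCY_THRESHOLDS[level]:
                level -= 1
            levels[server] = level
    # S62 (simplified): ascending order of threshold level, stable for ties.
    return dict(sorted(levels.items(), key=lambda kv: kv[1]))
```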
An access server and an I/O processing server are each implemented as a computer 39 including a CPU 40.
A ROM 41, a RAM 42, a communication interface 43, a memory device 46, a media reader 47, and an input/output device 49 are connected to the CPU 40 via a bus 50. The CPU 40 loads and executes a basic program, such as a BIOS, stored in the ROM 41, thereby implementing basic operation of the computer 39.
The CPU 40 deploys a program stored in the memory device 46, such as a hard disk, which performs processing according to the present embodiment onto the RAM 42 and executes the program, thereby implementing processing according to the present embodiment. A program which performs processing according to the present embodiment need not be stored in the memory device 46 and may be stored in a portable recording medium 48, such as a CD-ROM, a DVD, a Blu-ray disc, an IC memory, or a flexible disk. In this case, the program stored in the portable recording medium 48 is loaded using the media reader 47 and is deployed onto the RAM 42, and the CPU 40 executes the program.
Examples of the input/output device 49 include a keyboard, a tablet, a mouse, a display, and a printer. The input/output device 49 is used by a user operating the computer 39 to make an input and output a processing result.
The communication interface 43 accesses a database or the like of an information provider 45 via a network 44 and downloads a program or the like onto the computer 39. The downloaded program is stored in the memory device 46 or the portable recording medium 48 or is directly deployed onto the RAM 42 and is executed by the CPU 40. Execution of the program may be performed by a computer of the information provider 45, and the computer 39 may perform only input/output operation.
According to one embodiment, load distribution in a system having a plurality of access servers and a plurality of I/O processing servers can be achieved.
All examples and conditional language provided herein are intended for the pedagogical purposes of aiding the reader in understanding the invention and the concepts contributed by the inventor to further the art, and are not to be construed as limitations to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although one or more embodiments of the present invention have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
This application is a continuation application of International Application PCT/JP2011/079425 filed on Dec. 19, 2011 and designated the U.S., the entire contents of which are incorporated herein by reference.
Relation | Number | Date | Country
---|---|---|---
Parent | PCT/JP2011/079425 | Dec 2011 | US
Child | 14302486 | | US