The invention relates to data storage and in particular to data block storage services that store data blocks across a plurality of servers.
The client/server architecture has been one of the more successful innovations in information technology. The client/server architecture allows a plurality of clients to access services and resources maintained and/or controlled by a server. The server listens for, and responds to, requests from the clients and, in response to each request, determines whether or not the request can be satisfied. The server responds to the client as appropriate. A typical example of a client/server system is one in which a server is set up to store data files and a number of different clients can communicate with the server to request access to different ones of the data files maintained by the file server. If a data file is available and a client is authorized to access that data file, the server can deliver the requested data file to the client and thereby satisfy the client's request.
Although the client/server architecture has worked remarkably well, it does have some drawbacks. In particular, the client/server environment is somewhat dynamic. For example, the number of clients contacting a server and the number of requests being made by individual clients can vary significantly over time. As such, a server responding to client requests may find itself inundated with a volume of requests that is impossible or nearly impossible to satisfy. To address this problem, network administrators often ensure that the server includes sufficient data processing resources to respond to anticipated peak levels of client requests. Thus, for example, the network administrator may ensure that the server comprises a sufficient number of central processing units (CPUs), with sufficient memory and storage space, to handle the volume of client traffic that may arrive.
Even with careful provisioning of resources, variations in client load can still burden a server system. For example, even if sufficient hardware resources are provided in the server system, client requests may focus on a particular resource maintained by the server and supported by only a portion of the available resources. Continuing the example above, it is not uncommon for client requests to focus overwhelmingly on a small portion of the data files maintained by the file server. Accordingly, even though the file server may have sufficient hardware resources to respond to a certain volume of client requests, if these requests are focused on a particular resource, such as a particular data file, most of the file server's resources will remain idle while the resources that support the targeted data file are overburdened.
To address this problem, network engineers have developed load balancing systems that act as a gateway to client requests and distribute client requests across the available resources for the purpose of distributing client load. To this end, the gateway system may distribute client requests in a round-robin fashion that evenly distributes requests across the available server resources. In other practices, the network administrator sets up a replication system that can identify when a particular resource is the subject of a flurry of client requests and duplicates the targeted resource so that more of the server resources are employed in supporting client requests for that resource.
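By way of illustration only, the round-robin gateway described above can be sketched in Python. The class name and server labels are hypothetical. The sketch shows that such a gateway ignores both the content of each request and the current load on each server, and that every request must pass through the single gateway object.

```python
from itertools import cycle


class RoundRobinGateway:
    """Hypothetical central gateway that sends each request to the next server in turn."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._rotation = cycle(servers)

    def route(self, request):
        # The request's content and the servers' current load are ignored:
        # every request simply goes to the next server in the fixed rotation.
        return next(self._rotation)
```

With servers `["a", "b", "c"]`, five consecutive requests are routed to `a`, `b`, `c`, `a`, `b`, even if all five target a data file held only on server `a`.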
Although the above techniques may work well with certain server architectures, each requires that a central transaction point be disposed between the clients and the server. As such, this central transaction point may act as a bottleneck that slows the server's response to client requests. Accordingly, there is a need in the art for a method of distributing client load across a server system while at the same time providing suitable response times for incoming client requests.
The systems and methods described herein include systems for managing requests from a plurality of clients for access to a set of resources. In one embodiment, the systems comprise a plurality of servers wherein the set of resources is partitioned across this plurality of servers. Each server has a load monitor process that is capable of communicating with the other load monitor processes to generate a measure of the client load on the server system and the client load on each of the respective servers.
Accordingly, in one embodiment, the systems comprise a server system having a plurality of servers, each of which has a load monitor process that is capable of coordinating with other load monitor processes executing on other servers to generate a system-wide view of the client load being handled by the server system and by individual respective servers.
Optionally, the systems may further comprise a client distribution process that is responsive to the measured system load and is capable of repartitioning the set of client connections among the servers to thereby redistribute the client load.
Accordingly, it will be understood that the systems and methods described herein include client distribution systems that may work with a partitioned service, wherein the partitioned service is supported by a plurality of equivalent servers, each of which is responsible for a portion of the service that has been partitioned across the equivalent servers. In one embodiment, each equivalent server is capable of monitoring the relative load that each of the clients it communicates with is placing on the system and on that particular server. Accordingly, each equivalent server is capable of determining when a particular client would present a relative burden to the service. However, for a partitioned service, each client is to communicate with the equivalent server that is responsible for the client's resource of interest. Accordingly, in one embodiment, the systems and methods described herein redistribute client load by, in part, redistributing resources across the plurality of servers.
In another embodiment, the systems and methods described herein include storage area network systems that may be employed for providing storage resources for an enterprise. The storage area network (SAN) of the invention comprises a plurality of servers and/or network devices that operate on the storage area network. At least a portion of the servers and network devices operating on the storage area network include a load monitor process that monitors the client load being placed on the respective server or network device. The load monitor process is further capable of communicating with other load monitor processes operating on the storage area network. The load monitor process on a server is capable of generating a system-wide load analysis that indicates the client load being placed on the storage area network. Additionally, the load monitor process is capable of generating an analysis of the client load being placed on that respective server and/or network device. Based on the client load information observed by the load monitor processes, the storage area network is capable of redistributing client load to achieve greater responsiveness to client requests. In one embodiment, the storage area network is capable of moving the client connections it supports in order to redistribute client load across the storage area network.
Further features and advantages of the present invention will be apparent from the following description of preferred embodiments and from the claims.
The following figures depict certain illustrative embodiments of the invention in which like reference numerals refer to like elements. These depicted embodiments are to be understood as illustrative of the invention and not as limiting in any way.
The systems and methods described herein include systems for organizing and managing resources that have been distributed over a plurality of servers on a data network. More particularly, the systems and methods described herein include systems and methods for providing more efficient operation of a partitioned service. The type of service can vary; however, for purposes of illustration, the invention will be described with reference to systems and methods for managing the allocation of data blocks across a partitioned volume of storage. It will be understood by those of skill in the art that other applications and services may include, but are not limited to, distributed file systems, systems for supporting application service providers, and other applications. Moreover, it will be understood by those of ordinary skill in the art that the systems and methods described herein are merely exemplary of the kinds of systems and methods that may be achieved through the invention and that these exemplary embodiments may be modified, supplemented and amended as appropriate for the application at hand.
Referring first to
The client 12 can be any suitable computer system, such as a PC workstation, a handheld computing device, a wireless communication device, or any other such device, equipped with a network client program capable of accessing and interacting with the server 16 to exchange information with the server 16. Optionally, the client 12 and the server 16 may rely on an unsecured communication path for accessing services at the remote server 16. To add security to such a communication path, the client 12 and the server 16 may employ a security system, such as any of the conventional security systems that have been developed to provide the remote user with a secured channel for transmitting data over a network. One such system is the Netscape Secure Sockets Layer (SSL) security mechanism, which provides a remote user with a trusted path between a conventional web browser program and a web server.
Each server 161, 162 and 163 may include software components for carrying out the operations and transactions described herein, and the software architecture of the servers 161, 162 and 163 may vary according to the application. In certain embodiments, the servers 161, 162 and 163 may employ a software architecture that builds certain of the processes described below into the server's operating system, into device drivers, into application-level programs, or into a software process that operates on a peripheral device, such as a tape library, a RAID storage system or some other device. In any case, it will be understood by those of ordinary skill in the art that the systems and methods described herein may be realized through many different embodiments and practices, that the particular embodiment and practice employed will vary as a function of the application of interest, and that all these embodiments and practices fall within the scope hereof.
In operation, the clients 12 will have need of the resources partitioned across the server group 16. Accordingly, each of the clients 12 will send requests to the server group 16. The clients typically act independently, and as such, the client load placed on the server group 16 will vary over time. In a typical operation, a client 12 will contact one of the servers, for example server 161, in the group 16 to access a resource, such as a data block, page, file, database, application, or other resource. The contacted server 161 itself may not hold or have control over the requested resource. However, in a preferred embodiment, the server group 16 is configured to make all the partitioned resources available to the client 12 regardless of the server that initially receives the request. For illustration, the diagram shows two resources, one resource 18 that is partitioned over all three servers, servers 161, 162, 163, and another resource 17 that is partitioned over two of the three servers. In the exemplary application of the system 10 being a block data storage system, each resource 18 and 17 may represent a partitioned block data volume.
In the embodiment of
Referring now to
Referring now to
It is transparent to the client 12 which of the servers 161, 162, 163 it is connected to. Instead, the client sees only the servers in the server group 16 and requests the resources of the server group 16. It should be noted here that the routing of client requests is done separately for each request. This allows portions of the resource to exist at different servers. It also allows resources, or portions thereof, to be moved while the client is connected to the server group 16; if that is done, the routing tables 165 are updated as necessary and subsequent client requests are forwarded to the server now responsible for handling them. At least within a resource 17 or 18, the routing tables 165 are identical. The described invention is different from a “redirect” mechanism, wherein a server determines that it is unable to handle requests from a client and redirects the client to a server that can do so. The client then establishes a new connection to that other server. Since establishing a connection is relatively inefficient, the redirect mechanism is ill suited for handling frequent requests.
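The per-request routing described above can be illustrated with a minimal Python sketch. It assumes, hypothetically, that a resource is a block volume divided into contiguous extents, each identified by its first block number and held by one server, and that every server keeps an identical copy of the table. The names `RoutingTable`, `owner_of`, `move_extent` and `handle_request` are illustrative and are not taken from the description.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class RoutingTable:
    """Hypothetical routing table for one block volume, replicated on every server.

    starts[i] is the first block of extent i (ascending); owners[i] is the server
    holding blocks starts[i] .. starts[i + 1] - 1. The last extent is open-ended.
    """
    starts: list = field(default_factory=list)
    owners: list = field(default_factory=list)

    def owner_of(self, block):
        # Index of the last extent whose first block is <= the requested block.
        i = bisect_right(self.starts, block) - 1
        if i < 0:
            raise KeyError(f"block {block} lies before the first extent")
        return self.owners[i]

    def move_extent(self, index, new_owner):
        # Moving part of the volume only requires updating the table (on every
        # server); the client's connection stays where it is.
        self.owners[index] = new_owner


def handle_request(local_server, table, block):
    """Routing is decided per request: serve locally or forward to the owner."""
    owner = table.owner_of(block)
    if owner == local_server:
        return ("served", local_server)
    return ("forwarded", owner)
```

For example, with a volume split into extents starting at blocks 0, 1000 and 2000 on servers 161, 162 and 163, a request for block 1500 arriving at server 161 is forwarded to server 162. After `move_extent(1, "163")`, the same request is forwarded to server 163 without the client opening a new connection, which is the property that distinguishes this approach from a redirect.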
The resources spread over the several servers can be directories, individual files within a directory, or even blocks within a file. Other partitioned services could be contemplated. For example, it may be possible to partition a database in an analogous fashion or to provide a distributed file system, or a distributed or partitioned server that supports applications being delivered over the Internet. In general, the approach can be applied to any service where a client request can be interpreted as a request for a piece of the total resource, and operations on the pieces do not require global coordination among all the pieces.
Turning now to
As shown in
The routing tables may be employed by the system 10 to balance client load across the available servers.
The load monitor processes 22A, 22B and 22C each observe the request patterns arriving at their respective equivalent servers to determine whether patterns of requests from clients 12 are being forwarded to the SAN and whether these patterns can be served more efficiently or reliably by a different arrangement of client connections to the several servers. In one embodiment, the load monitor processes 22A, 22B and 22C monitor client requests coming to their respective equivalent servers. In one embodiment, the load monitor processes each build a table representative of the different requests that have been seen by the individual load monitor processes. The load monitor processes 22A, 22B and 22C are capable of communicating among themselves to build a global database of the requests seen by each of the equivalent servers. Accordingly, in this embodiment, each of the load monitor processes is capable of integrating request data from each of the equivalent servers 161, 162 and 163 to generate a global request database representative of the request traffic seen by the entire block data storage system 16. In one embodiment, this global request database is made available to the client distribution processes 30A, 30B and 30C for use in determining whether a more efficient or reliable arrangement of client connections is available.
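A minimal sketch of how the per-server request tables might be merged into a global request database follows. It assumes, hypothetically, that each load monitor counts requests keyed by (client, receiving server, owning server). The key layout and the names `LoadMonitor`, `build_global_database` and `requests_received` are illustrative, and the exchange of tables between servers is abstracted into a function that receives all monitors directly rather than over the network.

```python
from collections import Counter


class LoadMonitor:
    """Hypothetical per-server load monitor (one per equivalent server)."""

    def __init__(self, server):
        self.server = server
        # (client, receiving server, owning server) -> number of requests seen here
        self.table = Counter()

    def observe(self, client, owner):
        """Record one request from `client` that targets data held by server `owner`."""
        self.table[(client, self.server, owner)] += 1


def build_global_database(monitors):
    """Merge the local tables of all load monitors into one system-wide request database."""
    merged = Counter()
    for monitor in monitors:
        merged.update(monitor.table)  # Counter.update adds counts rather than replacing them
    return merged


def requests_received(db):
    """Total requests arriving at each server, derived from the global database."""
    totals = Counter()
    for (_client, server, _owner), count in db.items():
        totals[server] += count
    return totals
```

Keeping the owning server in the key lets the merged database show not only how much traffic each server receives but also how much of it must be forwarded, which is the information the client distribution processes need.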
Accordingly, in this initial condition the server group 16 may determine that server 161 is overly burdened or asset constrained. This determination may result from an analysis showing that server 161 is overly utilized given the assets it has available. For example, server 161 may have limited memory, and the requests generated by clients 12A, 12B, and 12C may have overburdened the memory available to it. Thus, server 161 may be responding to client requests at a level of performance that is below an acceptable limit. Alternatively, it may be determined that server 161, although performing and responding to client requests at an acceptable level, is overly burdened relative to the client load (or bandwidth) being carried by server 162. Accordingly, the client distribution process 30 of the server group 16 may determine that overall efficiency may be improved by redistributing client load from its initial condition to one in which server 162 services requests from client 12C. The considerations that drive the load balancing decision may vary. One example is reducing routing: if one server is the destination of a significantly larger fraction of a client's requests than the other servers on which portions of the resource (e.g., a volume) reside, it may be advantageous to move the connection to that server. Another is balancing server communications load: if the total communications load on one server is substantially greater than that on another, it may be useful to move some of the connections from the heavily loaded server to the lightly loaded one. A third is balancing resource access load (e.g., disk I/O load), which proceeds as in the preceding case but on the basis of disk I/O load rather than communications load.
This is an optimization process that involves multiple dimensions, and the specific decisions made for a given set of measurements may depend on administrative policies, historical data about client activity, the capabilities of the various servers and network components, etc.
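Two of the considerations listed above, reducing routing and balancing communications load, can be sketched as simple heuristics over a request database keyed by (client, server terminating the connection, server owning the requested data). The parameters `threshold` and `ratio` are hypothetical stand-ins for the administrative policy settings mentioned above, and the sketch assumes each client holds a single connection. A real implementation would also weigh disk I/O load, historical data and server capabilities.

```python
from collections import defaultdict


def reduce_routing_moves(db, threshold=0.6):
    """Suggest moving a client's connection to the server that owns more than
    `threshold` of the data that client requests.

    db maps (client, connected_server, owner_server) -> request count.
    Assumes each client holds exactly one connection.
    """
    by_owner = defaultdict(lambda: defaultdict(int))
    connected_to = {}
    for (client, connected, owner), count in db.items():
        by_owner[client][owner] += count
        connected_to[client] = connected
    moves = []
    for client, counts in by_owner.items():
        total = sum(counts.values())
        if total == 0:
            continue
        best = max(counts, key=counts.get)
        if best != connected_to[client] and counts[best] / total > threshold:
            moves.append((client, connected_to[client], best))
    return moves


def comm_load_move(db, servers, ratio=2.0):
    """If the busiest server carries more than `ratio` times the traffic of the
    least busy one, suggest moving one connection from the busiest to the least
    busy. Returns (client, from_server, to_server) or None."""
    conn_load = defaultdict(int)
    for (client, connected, _owner), count in db.items():
        conn_load[(client, connected)] += count
    server_load = {server: 0 for server in servers}  # idle servers count as load 0
    for (_client, connected), count in conn_load.items():
        server_load[connected] = server_load.get(connected, 0) + count
    heavy = max(server_load, key=server_load.get)
    light = min(server_load, key=server_load.get)
    if server_load[heavy] <= ratio * server_load[light]:
        return None
    gap = server_load[heavy] - server_load[light]
    # Moving a connection with load x leaves the pair at (heavy - x, light + x);
    # only x < gap lowers the larger of the two, and x closest to gap / 2 evens them best.
    candidates = [(client, count) for (client, server), count in conn_load.items()
                  if server == heavy and count < gap]
    if not candidates:
        return None
    client, _ = min(candidates, key=lambda item: abs(gap / 2 - item[1]))
    return (client, heavy, light)
```

In a situation like the one described above, where most of client 12C's requests arrive at server 161 but target data held on server 162, `reduce_routing_moves` proposes moving 12C's connection to server 162. `comm_load_move` instead looks only at traffic totals and may propose a different connection; choosing between such proposals is where the administrative policies come in.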
To this end,
Balancing of client load is also applicable to new connections from new clients. When a client 12F determines that it needs to access the resources provided by server group 16, it establishes an initial connection to that group. This connection will terminate at one of the servers 161, 162, or 163. Since the group appears as a single system to the client, it will not be aware of the distinction between the addresses for 161, 162, and 163, and therefore the choice of connection endpoint may be random, round robin, or fixed, but will not be responsive to the current load patterns among the servers in group 16.
When this initial client connection is received, the receiving server can at that time make a client load balancing decision. If this is done, the result may be that a more appropriate server is chosen to terminate the new connection, and the client connection is moved accordingly. The load balancing decision in this case may be based on the general level of loading at the various servers, the specific category of resource requested by the client 12F when it established the connection, historic data available to the load monitors in the server group 16 relating to previous access patterns from client 12F, policy parameters established by the administrator of server group 16, etc.
Another consideration in handling initial client connections is the distribution of the requested resource. As stated earlier, a given resource may be distributed over a proper subset of the server group. If so, it may happen that the server initially picked by client 12F for its connection serves no part of the requested resource. While it is possible to accept such a connection, it is not a particularly efficient arrangement because in that case all requests from the client, not merely a fraction of them, will require forwarding. For this reason it is useful to choose the server for the initial client connection only from among the subset of servers in server group 16 that actually serve at least some portion of the resource requested by new client 12F.
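The selection described above might be sketched as follows, assuming a mapping from each resource to the set of servers that hold some portion of it and a single comparable load figure per server. The function name, the tie-breaking rule that keeps the connection where it arrived, and the placement of resource 17 on servers 162 and 163 in the example are all hypothetical.

```python
def place_new_connection(resource, receiving_server, resource_servers, server_load):
    """Pick the server that should terminate a new client connection.

    resource_servers: resource name -> set of servers holding some portion of it.
    server_load: server -> current load figure (any comparable metric; missing = 0).
    """
    holders = resource_servers.get(resource)
    if not holders:
        raise KeyError(f"no server holds any part of {resource!r}")
    # Only servers holding part of the resource are candidates, so that not every
    # request on the new connection has to be forwarded. Among them, take the least
    # loaded; on a tie, prefer the receiving server so no connection move is needed.
    return min(sorted(holders),
               key=lambda s: (server_load.get(s, 0), s != receiving_server))
```

If resource 17 were held only by servers 162 and 163, a new connection for resource 17 arriving at server 161 would be placed on whichever of those two is less loaded, while a connection for resource 18 would stay on server 161 whenever server 161 is the least loaded holder.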
This decision can be made efficiently by the introduction of a second routing database. The routing database described earlier specifies the precise location of each separately moveable portion of the resource of interest. Copies of that routing database need to be available at each server that terminates a client connection on which that client is requesting access to the resource in question. The connection balancing routing database simply states for a given resource as a whole which servers among those in server group 16 currently provide some portion of that resource. For example, the connection balancing routing database to describe the resource arrangement shown in
At one point, the client 12C is instructed to stop making requests to server 161 and start making them to server 162. One method for this is client redirection, i.e., the client 12C is told to open a new connection to the IP address specified by the server doing the redirecting. Other mechanisms may be employed as appropriate.
Although
As discussed above, in certain embodiments, the systems of the invention may be realized as software components operating on a conventional data processing system such as a Unix workstation. In such embodiments, the system can be implemented as a C language computer program, or as a computer program written in any high-level language, including C++, Fortran, Java or BASIC. General techniques for such high-level programming are known and are set forth in, for example, Stephen G. Kochan, Programming in C, Hayden Publishing (1983).
While the invention has been disclosed in connection with the preferred embodiments shown and described in detail, various modifications and improvements thereon will become readily apparent to those skilled in the art. Accordingly, the spirit and scope of the present invention is to be limited only by the following claims.
This application claims priority to U.S. Provisional Application Ser. No. 60/441,810 filed Jan. 21, 2003 and naming G. Paul Koning, among others, as an inventor, the contents of which are incorporated by reference.
Number | Date | Country |
---|---|---|
20040215792 A1 | Oct 2004 | US |
Number | Date | Country |
---|---|---|
60441810 | Jan 2003 | US |