The invention relates to data storage and in particular to data block storage services that store data blocks across a plurality of servers.
As companies rely more and more on e-commerce, online transaction processing, and databases, the amount of information that needs to be managed and stored can intimidate even the most seasoned of network managers.
While servers do a good job of storing data, their capacity is limited, and they can become a bottleneck if too many users try to access the same information. Instead, most companies rely on peripheral storage devices such as tape libraries, RAID disks, and even optical storage systems. These storage devices are effective for backing up data online and storing large amounts of information. By hanging a number of such devices off of a server, a network administrator can create a server farm that can store a substantial amount of data for the enterprise.
But as server farms increase in size, and as companies rely more heavily on data-intensive applications such as multimedia, this traditional storage model is not quite as useful. This is because access to these peripheral devices can be slow, and it might not always be possible for every user to easily and transparently access each storage device.
Recently, a number of vendors have been developing Storage Area Networks (SANs). SANs provide more options for network storage, including much faster access than the peripheral devices that operate as Network Attached Storage (NAS). SANs further provide the flexibility to create separate networks to handle large volumes of data.
A SAN is a high-speed special-purpose network or subnetwork that interconnects different kinds of data storage devices with associated data servers on behalf of a larger network of users. Typically, a storage area network is part of the overall network of computing resources for an enterprise. A SAN is usually clustered in close proximity to other computing resources such as IBM S/390 mainframes but may also extend to remote locations for backup and archival storage, using wide area network carrier technologies such as ATM or Synchronous Optical Networks. A SAN can use existing communication technology such as optical fiber ESCON or Fibre Channel technology.
SANs support disk mirroring, backup and restore, archival and retrieval of archived data, data migration from one storage device to another, and the sharing of data among different servers in a network. SANs can incorporate subnetworks with network-attached storage systems.
Although SANs hold much promise, they face a significant challenge. Bluntly, consumers expect a lot of their data storage systems. Specifically, consumers demand that SANs provide network-type scalability, service and flexibility, while at the same time providing data access at speeds that compete with server farms. This can be quite a challenge, particularly in environments where the dynamics of client data usage vary greatly and tend to change over time. For example, the speed at which a storage system can respond to a client demand depends at least in part on the resources available on the server that is processing the request. However, client requests for data can be bursty and can tend to request certain portions of the stored data much more frequently than some of the other data. Moreover, client requests can follow patterns where certain portions of the stored data are commonly, although not always, requested along with other portions of the stored data.
In enterprise storage systems, different techniques have been developed to deal with the fact that certain portions of the stored data are requested more frequently than other portions. Further, striping techniques have been developed to allow enterprise storage systems to form patterns of data blocks that are more efficiently read from the disk storage devices. However, these techniques are readily implemented on the typical enterprise storage system by modifying the gateway or switch to monitor client requests and control how data is stored on the underlying storage media. For storage area networks such techniques can also be employed; however, they force the SAN to use a gateway or switch architecture, and this can reduce the speed at which client requests can be performed.
Accordingly, it would be desirable to provide a method and system that allow a storage area network to control how data is stored and managed on the system without requiring a gateway to monitor all incoming request traffic.
The systems and methods described herein include systems for providing a block level data storage service. More particularly, the systems and methods of the invention provide a block level data storage service that may be employed with a server system that partitions the block storage service across a plurality of equivalent servers. A system of equivalent servers will be understood to encompass, but not be limited to, systems comprised of a plurality of equivalent servers wherein each of the equivalent servers presents a similar interface to a client and each equivalent server presents the same response to the same request from the client. The systems and methods described herein may be applied to different applications and are not limited to any particular application; however, for the purpose of clarity, the systems and methods described herein will be described with reference to a block level data storage application wherein a plurality of data blocks are stored on a block data volume that is partitioned across a plurality of storage devices with different portions of the data volume being associated with different equivalent servers on the system.
As further described herein, the server system optionally employs an adaptive storage block data distribution process for distributing blocks of data across the different partitions of the data volume. To this end, each equivalent server includes a routing table, a data mover process and a request monitor process. The request monitor process is capable of monitoring requests made to the server from the one or more clients that are accessing the system. The request may be associated with data blocks stored on a partition or somewhere on the volume. The request monitor can monitor the different requests that clients make to the associated server. Additionally, the request monitor may communicate with other request monitor processes running on the different equivalent servers on the system. In this way, the request monitor can generate a global view of the requests being forwarded by clients to the partitioned block data storage system. By sharing this information, each equivalent server may, through its associated request monitor process, develop a global understanding of the requests being serviced by the block data storage system.
Once this global understanding of the request traffic being handled by the block data storage system is developed, each equivalent server may then employ its data mover process to move data blocks, or the responsibility for different data blocks, from one server to another different server. In one embodiment, each data mover process employs the global request data to determine distributions of data blocks that provide for more efficient service to a requesting client, more efficient allocation of resources, or in some other way improves the performance, reliability, or some other characteristic of the block data storage system.
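By way of illustration only, the following Python sketch shows one way a data mover process might use global request data to choose a redistribution. The load-balancing heuristic, the function name `propose_move`, and the sample server names and counts are assumptions made for this example; the specification does not prescribe any particular policy.

```python
def propose_move(owner_of, request_counts):
    """Suggest moving one block from the busiest server to the least busy one.

    owner_of:       dict mapping block id -> name of the responsible server
    request_counts: dict mapping block id -> requests seen system-wide
    Returns (block, from_server, to_server), or None if no move would help.
    Hypothetical heuristic; servers that own no blocks are not considered.
    """
    # Total request load currently carried by each server.
    load = {}
    for block, server in owner_of.items():
        load[server] = load.get(server, 0) + request_counts.get(block, 0)

    busiest = max(load, key=load.get)
    idlest = min(load, key=load.get)
    gap = load[busiest] - load[idlest]

    # A move only narrows the gap if the block's load is smaller than the gap.
    movable = [
        block for block, server in owner_of.items()
        if server == busiest and 0 < request_counts.get(block, 0) < gap
    ]
    if not movable:
        return None

    hottest = max(movable, key=lambda block: request_counts[block])
    return hottest, busiest, idlest
```

A policy of this kind trades optimality for simplicity: it moves a single block per decision, so it can be re-run as the global request data changes.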
In one particular embodiment, each data mover process is capable of communicating with another data mover process for the purpose of allowing the data mover processes of different servers to communicate when data blocks are being moved from one server to another different server. For example, in one embodiment, for the purpose of increasing reliability of data transfer, the data mover processes on the different equivalent servers can employ a transaction mechanism that monitors the transfer of data blocks from one server to the other and verifies when the block data transfer is complete and whether or not the entire transfer was successful.
To maintain an understanding of the location of the different data blocks across the different partitions of a volume and across the different volumes maintained by the data block storage system, each equivalent server maintains a routing table. To this end, each equivalent server includes a routing table process that tracks the different data blocks being stored on the block data storage system and the particular equivalent server that is responsible for each data block. In one embodiment, the routing table processes of the equivalent servers are capable of communicating with each other for the purpose of having each equivalent server maintain a routing table that provides a complete, system-wide database of the different data blocks maintained by the block data storage system and the equivalent servers that are associated with these different data blocks.
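A minimal sketch of such a routing table is given below in Python, offered as an illustration rather than as the implementation. The class name `RoutingTable`, the `(volume, page)` key, and the version counter used to decide which copy is newer are all assumptions of this example.

```python
from dataclasses import dataclass, field


@dataclass
class RoutingTable:
    """Maps each page of a volume to the server responsible for it."""
    version: int = 0                              # bumped on every change (assumed scheme)
    owner_of: dict = field(default_factory=dict)  # (volume, page) -> server name

    def assign(self, volume, page, server):
        """Record that `server` is now responsible for the given page."""
        self.owner_of[(volume, page)] = server
        self.version += 1

    def lookup(self, volume, page):
        """Return the server responsible for the given page."""
        return self.owner_of[(volume, page)]

    def merge_from(self, other):
        """Adopt a peer's table when it is newer, so that all copies converge."""
        if other.version > self.version:
            self.owner_of = dict(other.owner_of)
            self.version = other.version
```

In this sketch, each equivalent server would hold its own `RoutingTable` instance and call `merge_from` when a peer announces a change.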
In accordance with the invention as embodied and broadly described herein, the invention provides, inter alia, methods, computer program products, and systems for allowing a plurality of servers to provide coherent support for incoming requests for services or resources. To this end, the systems and methods described herein distribute, organize and maintain resources across a plurality of servers. In one preferred embodiment, the servers are truly equivalent in that they each can respond to an incoming request in the same manner. Thus, each server appears equivalent to clients that are requesting access to resources maintained on the system.
In one embodiment, the routing tables also store group membership information indicating the groups of which a server is a member. The routing table may be updated as necessary to reflect changes in group membership due to additions, removals, or temporary unavailability of the various servers that make up the group. When changes have propagated through the server group, all relevant routing tables at each server will contain identical information.
When a server receives a resource request, it uses the relevant routing table to identify which group member should actually hold the resource object or a part of the resource object. The request may then be serviced by laterally accessing the desired data object from the correct server without making expensive query-response transactions over the network.
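The lookup-then-forward behavior described above can be sketched as follows in Python. This is an illustrative model only: the `Server` class, its method names, and the in-memory `peers` registry standing in for server-to-server communication are assumptions of the example.

```python
class Server:
    """Toy model of one group member that can serve or laterally fetch a page."""

    def __init__(self, name, routing, peers):
        self.name = name
        self.routing = routing   # page -> owning server name (same on every server)
        self.peers = peers       # server name -> Server (stand-in for the network)
        self.store = {}          # pages held locally: page -> bytes
        peers[name] = self

    def read_local(self, page):
        """Return a page that this server itself holds."""
        return self.store[page]

    def handle_read(self, page):
        """Serve a client read, whichever server actually holds the page."""
        owner = self.routing[page]
        if owner == self.name:
            return self.read_local(page)
        # Lateral access: fetch directly from the responsible server;
        # the client keeps its existing connection to this server.
        return self.peers[owner].read_local(page)
```

For example, a client connected to server 161 can read a page held by server 162 through a single `handle_read` call on 161, without being redirected.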
Further features and advantages of the invention will be apparent from the following description of preferred embodiments and from the claims.
The following figures depict certain illustrative embodiments of the invention in which like reference numerals refer to like elements. These depicted embodiments are to be understood as illustrative of the invention and not as limiting in any way.
The systems and methods described herein include systems for organizing and managing resources that have been distributed over a plurality of servers on a data network. More particularly, the systems and methods described herein include systems and methods for providing more efficient operation of a partitioned service. In particular, the systems and methods described herein include systems and methods for managing the allocation of data blocks across a partitioned volume of storage. Although the systems and methods described herein will be largely directed to storage devices and applications, it will be understood by those of skill in the art that the invention may be applied to other applications, including distributed file systems, systems for supporting application service providers and other applications. Moreover, it will be understood by those of ordinary skill in the art that the systems and methods described herein are merely exemplary of the kinds of systems and methods that may be achieved through the invention and that these exemplary embodiments may be modified, supplemented and amended as appropriate for the application at hand.
Referring first to
The client 12 can be any suitable computer system such as a PC workstation, a handheld computing device, a wireless communication device, or any other such device, equipped with a network client capable of accessing and interacting with the server group 16 to exchange information with the server group 16. The network client may be any client that allows the user to exchange data with the server. Optionally, the client 12 and the server group 16 rely on an unsecured communication path for accessing services at the remote server group 16. To add security to such a communication path, the client and the server can employ a security group system, such as any of the conventional security systems that have been developed to provide to the remote user a secured channel for transmitting data over a network. One such system is the Netscape secured socket layer (SSL) security mechanism that provides to a remote user a trusted path between a conventional web browser program and a web server.
Each server 161, 162 and 163 may comprise a commercially available server platform, such as a Sun Sparc™ system running a version of the Unix operating system.
Each server 161, 162 and 163 may also include other software components that extend their operation to accomplish the transactions described herein, and the architecture of the servers 161, 162 and 163 may vary according to the application. For example, each server may have built-in extensions, typically referred to as modules, to allow the servers to perform the operations described hereinafter, or servers may have access to a directory of executable files, each of which may be employed for performing the operations, or parts of the operations, described below. Further, in other embodiments, the servers 161, 162 and 163 may employ a software architecture that builds certain of the processes described below into the server's operating system, into a device driver, or into a software process that operates on a peripheral device, such as a tape library, a RAID storage system or some other device. In any case, it will be understood by those of ordinary skill in the art that the systems and methods described herein may be realized through many different embodiments and practices, that the particular embodiment and practice employed will vary as a function of the application of interest, and that all these embodiments and practices fall within the scope hereof.
In such an arrangement, the client 12 will contact one of the servers, for example server 161, in the group 16 to access a resource, such as a data block, page, file, database, application, or other resource. The contacted server 161 itself may not hold or have control over the requested resource. To address this, the server group 16 is configured to make the partitioned resources available to the client 12. For illustration, the diagram shows two resources, one resource 18 that is partitioned over all three servers, servers 161, 162, 163, and another resource 17 that is partitioned over two of the three servers. In the exemplary application of the server group 16 being a block data storage system, each resource 18 and 17 may be a partitioned block data volume. In the embodiment of
Referring now to
Referring now to
It is transparent to the client 12 to which server 161, 162, 163 it is connected. Instead, the client only sees the servers in the server group 16 and requests the resources of the server group 16. It should be noted here that the routing of client requests is done separately for each request. This allows portions of the resource to exist at different servers. It also allows resources, or portions thereof, to be moved while the client is connected to the server group 16—if that is done, the routing tables 165 are updated as necessary and subsequent client requests will be forwarded to the server now responsible for handling that request. At least within a resource 17 or 18, the routing tables 165 are identical. The described invention is different from a “redirect” mechanism, wherein a server determines that it is unable to handle requests from a client, and redirects the client to the server that can do so. The client then establishes a new connection to another server. Since establishing a connection is relatively inefficient, the redirect mechanism is ill-suited for handling frequent requests.
The resources spread over the several servers can be directories, individual files within a directory, or even blocks within a file. Other partitioned services could be contemplated. For example, it may be possible to partition a database in an analogous fashion or to provide a distributed file system, or a distributed or partitioned server that supports applications being delivered over the Internet. In general, the approach can be applied to any service where a client request can be interpreted as a request for a piece of the total resource, and operations on the pieces do not require global coordination among all the pieces.
Turning now to
As shown in
As further shown in
The request monitor processes 24A, 24B, and 24C each observe the request patterns arriving at their respective equivalent servers to determine whether patterns of requests from clients 12 are being forwarded to the SAN and whether these patterns may allow for more efficient or reliable partitioning of data blocks. In one embodiment, the request monitor processes 24A, 24B, and 24C merely monitor client requests coming to their respective equivalent servers. In one embodiment, the request monitor processes each build a table representative of the different requests that have been seen by the individual request monitor processes. The request monitor processes 24A, 24B, and 24C are capable of communicating among themselves for the purpose of building a global database of requests seen by each of the equivalent servers. Accordingly, in this embodiment each of the request monitor processes is capable of integrating request data from each of the equivalent servers 161, 162 and 163 in generating a global request database representative of the request traffic seen by the entire block data storage system 16.
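As a simplified illustration, the per-server request table and its integration into a global view might look like the Python sketch below. The class name `RequestMonitor`, the use of per-block counters, and the direct access to peer tables (in place of real inter-server messaging) are assumptions of this example.

```python
from collections import Counter


class RequestMonitor:
    """Counts the requests seen by one equivalent server."""

    def __init__(self):
        self.local = Counter()   # block id -> requests seen by this server

    def record(self, block_id):
        """Note one client request for the given block."""
        self.local[block_id] += 1

    def global_view(self, peers):
        """Combine this server's table with its peers' tables.

        `peers` is an iterable of the other RequestMonitor objects; a real
        system would exchange these tables over the network instead.
        """
        total = Counter(self.local)
        for peer in peers:
            total.update(peer.local)   # Counter.update adds counts together
        return total
```

Because every monitor can compute the same combined table, each server arrives at the same picture of system-wide traffic without a central gateway.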
In one embodiment, this global request database is made available to the data mover processes 22A, 22B, and 22C for their use in determining whether a more efficient or reliable partitioning of data blocks is available. However, in alternate embodiments, each of the request monitor processes 24A, 24B, and 24C includes pattern identification processes capable of reviewing the request database to determine whether patterns of requests exist within the database. For example, in one embodiment, the request monitor process 24B is capable of reviewing the global database of requests to determine whether there is a pattern where a plurality of different data blocks are typically requested either together or in sequence. If such a pattern is identified, then the pattern may be flagged and made available to any of the data mover processes 22A, 22B, or 22C for their use in determining whether data blocks could be striped across a plurality of servers to provide for more efficient servicing of client requests. Additionally, in other embodiments, the request monitor processes may be able to identify blocks of data that are typically requested together and which are being requested at a frequency that is above a pre-identified or pre-determined threshold. This allows the request monitors 24A, 24B, and 24C to identify “hot blocks” that may exist within the partitioned volume. In other embodiments, the request monitor processes 24A, 24B and 24C may be capable of identifying other patterns that occur within the requests being forwarded from clients to the block data storage system 16.
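The two kinds of pattern mentioned above, hot blocks and blocks requested in sequence, can be illustrated with the following Python sketch. The function names, the representation of the request database as a simple ordered list of block ids, and the choice of adjacent pairs as the sequence pattern are assumptions made for this example.

```python
from collections import Counter


def hot_blocks(counts, threshold):
    """Return the blocks requested more often than `threshold`, in sorted order."""
    return sorted(block for block, n in counts.items() if n > threshold)


def sequential_pairs(request_log, min_count):
    """Find pairs of distinct blocks that are requested one right after the other.

    request_log is an ordered list of block ids; a pair is reported when it
    occurs at least `min_count` times.
    """
    pairs = Counter(zip(request_log, request_log[1:]))
    return {
        pair: n for pair, n in pairs.items()
        if n >= min_count and pair[0] != pair[1]
    }
```

With the log `[1, 2, 7, 1, 2, 9, 1, 2]`, blocks 1 and 2 exceed a threshold of 2, and the pair `(1, 2)` occurs three times, which a data mover could treat as a candidate for striping.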
Returning again to
In one embodiment, the data mover process merely transfers page 28 from the storage device of equivalent server 162 to the storage device of equivalent server 163 and then updates the associated routing tables with this update being communicated across the plurality of routing tables 20A, 20B, and 20C within the block data storage system 16. However, in other embodiments, the data mover processes 22B and 22C may employ a transaction mechanism process that monitors the transfer of page 28 from the equivalent server 162 to the equivalent server 163 and determines when the transaction is complete and optionally whether the page 28 was transferred without error, and at that point updates the associated routing tables 20A, 20B, and 20C. The transaction employed by the data mover processes 22B and 22C may be any of the conventional transfer mechanism processes such as those commonly employed with a distributed file system.
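The verified transfer described above might be sketched as follows in Python. This is a simplified model under stated assumptions: the function name `move_page`, the SHA-256 checksum used to detect transfer errors, the `transfer` callable standing in for the network, and the dictionary-based stores and routing tables are all hypothetical and are not drawn from the specification.

```python
import hashlib


def move_page(page, source_store, target_store, target_name,
              routing_tables, transfer=lambda data: data):
    """Move one page between servers and update routing only on success.

    source_store / target_store: dicts mapping page -> bytes
    routing_tables: list of dicts mapping page -> responsible server name
    transfer: callable modelling the network hop (identity by default)
    Returns True if the page arrived intact and the tables were updated.
    """
    data = source_store[page]
    expected = hashlib.sha256(data).hexdigest()

    received = transfer(data)
    if hashlib.sha256(received).hexdigest() != expected:
        # Transfer failed verification: leave the source and tables untouched.
        return False

    target_store[page] = received
    for table in routing_tables:      # propagate the new owner to every table
        table[page] = target_name
    del source_store[page]            # the source gives up the page last
    return True
```

Updating the routing tables only after verification means that a failed transfer leaves the original server responsible for the page, so client requests continue to be routed correctly.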
Although
As discussed above, in certain embodiments, the systems of the invention may be realized as software components operating on a conventional data processing system such as a Unix workstation. In such embodiments, the system can be implemented as a C language computer program, or a computer program written in any high level language including C++, FORTRAN, Java or BASIC. General techniques for such high level programming are known, and set forth in, for example, Stephen G. Kochan, Programming in C, Hayden Publishing (1983).
While the invention has been disclosed in connection with the preferred embodiments shown and described in detail, various modifications and improvements thereon will become readily apparent to those skilled in the art. Accordingly, the spirit and scope of the present invention is to be limited only by the following claims.
Number | Date | Country | |
---|---|---|---|
20040143637 A1 | Jul 2004 | US |