1. Field of the Invention
The invention relates generally to clustered storage systems and more specifically to improved methods and structure for shipping I/O requests between storage controllers of a clustered storage system.
2. Related Patents
This patent application is related to the following commonly owned U.S. patent applications, all filed on the same date herewith and all of which are herein incorporated by reference:
3. Discussion of Related Art
In the field of data storage, customers demand highly resilient data storage systems that also exhibit fast recovery times for stored data. One type of storage system used to provide both of these characteristics is known as a clustered storage system.
A clustered storage system typically comprises a number of storage controllers, wherein each storage controller processes host Input/Output (I/O) requests directed to one or more logical volumes. The logical volumes reside on portions of one or more storage devices (e.g., hard disks) coupled with the storage controllers. Often, the logical volumes are configured as Redundant Array of Independent Disks (RAID) volumes in order to ensure an enhanced level of data integrity and/or performance.
A notable feature of clustered storage environments is that the storage controllers are capable of coordinating processing of host requests (e.g., by shipping I/O processing between each other) in order to enhance the performance of the storage environment. This includes intentionally transferring ownership of a logical volume from one storage controller to another. For example, a first storage controller may detect that it is currently undergoing a heavy processing load, and may assign ownership of a given logical volume to a second storage controller that has a smaller processing burden in order to increase overall speed of the clustered storage system. Other storage controllers may then update information identifying which storage controller presently owns each logical volume. Thus, when an I/O request is received at a storage controller that does not own the logical volume identified in the request, the storage controller may “ship” the request to the storage controller that presently owns the identified logical volume.
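The load-driven ownership transfer and the resulting shipping decision described above can be illustrated with a minimal Python sketch. It assumes a hypothetical `Controller` record holding a numeric load metric and a per-controller map from logical volume to owning controller. The names `transfer_ownership` and `must_ship`, and the "pick the least-loaded controller" policy, are illustrative assumptions only, not a description of any particular controller firmware.

```python
from dataclasses import dataclass, field


@dataclass
class Controller:
    """Hypothetical model of one storage controller's view of the cluster."""
    name: str
    load: int  # assumed load metric, e.g. count of outstanding I/O requests
    owners: dict[str, str] = field(default_factory=dict)  # volume -> owner name


def transfer_ownership(cluster: list[Controller], volume: str) -> str:
    """Assign `volume` to the least-loaded controller (an assumed policy) and
    update every controller's ownership map so later requests are routed."""
    new_owner = min(cluster, key=lambda c: c.load).name
    for ctrl in cluster:
        ctrl.owners[volume] = new_owner
    return new_owner


def must_ship(receiver: Controller, volume: str) -> bool:
    """A controller that does not own the addressed volume ships the request."""
    return receiver.owners[volume] != receiver.name
```

For example, if controller "A" (heavily loaded) initially owns "vol0" and controller "B" is lightly loaded, `transfer_ownership([a, b], "vol0")` moves ownership to "B", after which a request for "vol0" arriving at "A" must be shipped.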
While clustered storage systems provide a number of performance benefits over more traditional storage systems, the speed of a storage system still typically remains a bottleneck to the overall speed of a processing system utilizing the storage system.
In the clustered storage system environment, a host system may direct I/O requests to any logical volume of the clustered system. However, a host that is tightly coupled with a storage controller may direct all I/O requests only to that storage controller (i.e., where, as in the configuration of
In some prior techniques, the controller receiving an I/O request directed to a logical volume that it does not own processes the request to generate low level I/O requests to the affected physical storage devices. The low level I/O requests so generated may be performed by that controller (since all controllers are coupled with all storage devices in the clustered architecture). However, this requires complex coordination with the other storage controller that presently owns the addressed logical volume. In other prior techniques, the controller receiving the I/O request would generate the lower level I/O operations directed to the affected physical storage devices but then ship those lower level physical device requests to the other controller that owns the addressed logical volume. The other storage controller would process the lower level I/O operations in coordination with its management of the addressed logical volume that it owns. In the case of a write request, the first controller would receive the write data from the requesting host system and forward that data to the other controller along with the shipped lower level write operations. In the case of a read request, the other controller would perform the lower level read operations and return data to the first controller that, in turn, returned the requested data to the requesting host system. In still other prior techniques, the controller receiving the I/O request would generate lower level operations only to then realize that the resources for those lower level operations were owned by another controller. Responsive to such a determination, the controller would simply discard the work completed to decompose the logical volume request into corresponding lower level requests and ship the original logical volume request to the other controller (thus duplicating the computational effort of decomposing the logical volume request on the other controller). These prior techniques result in significant processing both in the first controller that received the request and in the other controller that actually performs the required lower level I/O operations. Further, these prior techniques could “double buffer” the data associated with the original request by storing the related data locally and then forwarding the data to the intended recipient, adding still further processing as well as memory requirements in the first controller.
Thus it is an ongoing challenge to process I/O requests in a storage controller of a clustered storage system, where ownership of logical volumes may be transferred among the controllers, by shipping aspects of the received request to another storage controller.
The present invention solves the above and other problems, thereby advancing the state of the useful arts, by providing methods and associated structure for improved shipping of I/O requests among multiple storage controllers of a clustered storage system. Minimal processing of a received I/O request is performed in a first controller that receives a host system I/O request to determine whether the I/O request is directed to a logical volume that is owned by the first controller or to a logical volume owned by another controller. For requests to a logical volume owned by another controller, the original I/O request is modified to indicate the target device address of the other controller. The first controller then ships the request to the other controller and configures DMA capabilities of the first controller to exchange data associated with the shipped request between the other controller and memory of the host system.
In one aspect hereof, a clustered storage system is provided. The clustered storage system comprises a first storage controller adapted to receive an I/O request from an attached host system and adapted to couple with a plurality of physical storage devices on which a logical volume is configured. The system further comprises a second storage controller coupled with the first storage controller and coupled with the plurality of storage devices, wherein the second storage controller owns the logical volume. The first storage controller is adapted to detect that the received I/O request is directed to the logical volume and is further adapted to transmit the received I/O request as a modified I/O request to the second storage controller responsive to the detection. The modified I/O request comprises a modified destination address corresponding to the second storage controller. The second storage controller is adapted to process the modified I/O request by accessing affected storage devices of the plurality of physical storage devices based on a configuration of the logical volume. The second storage controller exchanges data associated with the modified I/O request with the host system through the first storage controller.
Another aspect hereof provides a method for shipping I/O requests between storage controllers in a clustered storage system comprising a plurality of storage controllers. The method comprises receiving in a first storage controller an I/O request from an attached host system. The request is directed to a logical volume identified in the received I/O request and is addressed to the first storage controller using a first target device address. The method then detects that the logical volume is owned by a second storage controller, wherein the second storage controller is identified by a second target device address. The method then generates a modified I/O request based on the received I/O request. Generation of the modified I/O request comprises replacing the first target device address with the second target device address. The method then transmits the modified I/O request to the second storage controller for processing therein.
Still another aspect hereof provides a storage controller operable in a clustered storage system. The storage controller is coupled with one or more other storage controllers of the clustered storage system and comprises ownership detection logic adapted to detect that an I/O request received from an attached host system is directed to a logical volume owned by another storage controller of the clustered storage system. The controller further comprises request shipping logic communicatively coupled with the ownership detection logic to receive the I/O request. The request shipping logic is adapted to generate a modified I/O request based on the I/O request. The modified I/O request is a copy of the I/O request modified to address a target device address associated with said other storage controller. The controller further comprises an inter-controller interface circuit communicatively coupled with the request shipping logic and adapted to couple the storage controller with said other storage controller. The inter-controller interface circuit has direct memory access (DMA) capability to access memory in the host system. The request shipping logic is adapted to transmit the modified I/O request to said other storage controller through the inter-controller interface circuit. The inter-controller interface circuit is adapted to transfer data associated with the modified I/O request between the memory of the host system and said other controller using the DMA capability.
Storage controller 300 comprises back end interface circuit 310 for coupling controller 300 with one or more storage devices provisioning one or more locally owned logical volumes 320. Logical volume 320 comprises a logical device provisioned on one or more storage devices coupled with controller 300 via communication path 354. Controller 300 further comprises inter-controller interface circuit 312 adapted to couple controller 300 to other storage controllers 330 of the clustered storage system. Controller 300 may be coupled with other storage controller 330 via communication path 354. Other storage controller 330 may, in turn, be coupled with one or more remotely owned logical volumes 340 via communication path 354. In the exemplary embodiment of
Those of ordinary skill in the art will recognize that back end interface 310 and inter-controller interface 312 may be portions of a common circuit. For example, each circuit (310 and 312) may be a portion of a SAS interface circuit such that circuit 310 represents a first PHY/port coupled with the SAS domain (i.e., coupled with SAS switched fabric 354) and inter-controller interface circuit 312 represents another PHY/port coupled with the SAS domain. Or, for example, circuits 310 and 312 may utilize a common SAS PHY/port for accessing any components of the SAS domain.
In accordance with features and aspects hereof, controller 300 further comprises logical volume ownership detection logic 304 to receive and initially process an I/O request received from attached host system 302. Before performing any further processing on the received I/O request, logical volume ownership detection logic 304 determines whether the received I/O request is directed to locally owned logical volume 320 or instead is directed to remotely owned logical volume 340 (remotely owned with respect to controller 300 in receipt of the I/O request—locally owned by other controller 330). If the received request is directed to locally owned logical volume 320, the request is forwarded via path 362 to local request processing element 306 for standard processing of a received I/O request. Such normal processing comprises element 306 communicating with host system 302 via path 350 and with locally owned logical volume 320 via path 352, back end interface circuit 310, and path 354. Details of such normal I/O request processing are well known to those of ordinary skill in the art and thus beyond the scope of this discussion.
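The routing decision of logical volume ownership detection logic 304 can be sketched in Python as follows. The sketch assumes a hypothetical, simplified `IORequest` record and an ownership map keyed by volume name; the two callbacks merely stand in for local request processing element 306 and request shipping logic 308. What it is meant to show is the ordering: ownership is checked first, so a request addressed to a remotely owned volume reaches the shipping path without any RAID mapping or decomposition in controller 300.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class IORequest:
    # Hypothetical, simplified request; real command frames carry much more.
    target_address: int  # target device address used by the host
    volume: str          # logical volume identified in the request
    opcode: str          # "read" or "write"
    lba: int
    blocks: int


def dispatch(req: IORequest, local_name: str, owners: dict[str, str],
             process_local: Callable[[IORequest], str],
             ship: Callable[[IORequest, str], str]) -> str:
    """Route a request before any RAID mapping or decomposition is done."""
    owner = owners[req.volume]         # ownership detection (cf. logic 304)
    if owner == local_name:
        return process_local(req)      # normal processing (cf. element 306)
    return ship(req, owner)            # shipping path (cf. logic 308)
```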
If detection logic 304 determines that the received I/O request is directed to remotely owned logical volume 340, the received request is applied to request shipping logic 308 via path 362. Request shipping logic 308 is adapted to generate a modified I/O request by essentially copying the received I/O request and modifying the copy of the request to alter the destination target device address. Rather than the target device address of storage controller 300 in the received I/O request, the modified request redirects the request to the destination target device address associated with other storage controller 330. Request shipping logic 308 then transmits the modified I/O request to other storage controller 330 via inter-controller interface circuit 312 and path 354 as indicated by thicker, bolded, dashed arrow 364.
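A minimal Python sketch of the copy-and-redirect step performed by request shipping logic 308 follows. It assumes the request is modeled as a dataclass with a `target_address` field and a host scatter-gather list of `(address, length)` pairs, which is a hypothetical simplification of an actual command frame. A deep copy is used so that, as described above, only the copy is redirected to other storage controller 330 and the received request is left unchanged.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class IORequest:
    # Hypothetical, simplified request model.
    target_address: int                  # destination target device address
    volume: str                          # logical volume identified in request
    opcode: str                          # "read" or "write"
    sgl: list[tuple[int, int]] = field(default_factory=list)  # host (addr, len)


def make_modified_request(received: IORequest, other_address: int) -> IORequest:
    """Copy the received request and redirect only the copy."""
    modified = copy.deepcopy(received)       # received request is left untouched
    modified.target_address = other_address  # address of the owning controller
    return modified
```

Every other field, including the host scatter-gather list, is carried over unchanged, which is what later allows the owning controller's data transfers to be satisfied from host memory.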
The modified I/O request generally comprises either a read request or a write request and hence has data associated with the request. For example, in the case of a write request, associated write data will be forthcoming from host system 302 directed to storage controller 300 which, in turn, will forward the write data to other storage controller 330. Or, for example, in the case of a read request, associated data retrieved from remotely owned logical volume 340 by operation of other storage controller 330 will be returned through inter-controller interface circuit 312 to host system 302. In accordance with features and aspects hereof, all such data transfers between other storage controller 330 and host system 302 pass through controller 300 utilizing DMA capabilities of inter-controller interface circuit 312 of controller 300 to directly access memory of host system 302. As noted above, in a preferred exemplary embodiment, host system 302 couples with storage controller 300 utilizing any of a variety of memory mapped I/O bus structures (e.g., PCI, PCI Express, PPC-PLB, AMBA AHB, etc.). Thus, write data associated with a modified I/O request will be transferred from a memory of host system 302 via DMA into inter-controller interface circuit 312 for forwarding to other storage controller 330 via path 354. Or, for example, read data retrieved from remotely owned logical volume 340 by other storage controller 330 will be received in inter-controller interface circuit 312 and forwarded utilizing its DMA capabilities for storage in a memory of host system 302 directly by the DMA operation. Such DMA operations are depicted in
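The two data directions can be modeled in Python as below. Host memory is simulated by a `bytearray` and the host-provided scatter-gather list by `(address, length)` pairs; both are assumptions of this sketch, since an actual inter-controller interface circuit performs these transfers in hardware over the memory mapped bus. `dma_gather` corresponds to the write path (host memory toward other storage controller 330) and `dma_scatter` to the read path (data from controller 330 placed directly into host memory). Neither function stages a second copy of the request's data in controller 300.

```python
def dma_gather(host_memory: bytearray, sgl: list[tuple[int, int]]) -> bytes:
    """Write path: read write data out of host memory, segment by segment,
    for forwarding to the owning controller."""
    return b"".join(bytes(host_memory[addr:addr + length]) for addr, length in sgl)


def dma_scatter(host_memory: bytearray, sgl: list[tuple[int, int]],
                data: bytes) -> None:
    """Read path: place read data returned by the owning controller directly
    into the host buffers described by the scatter-gather list."""
    if len(data) != sum(length for _, length in sgl):
        raise ValueError("data length does not match SGL total length")
    pos = 0
    for addr, length in sgl:
        host_memory[addr:addr + length] = data[pos:pos + length]
        pos += length
```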
Eventually, other storage controller 330 completes processing of the modified I/O request shipped to it by storage controller 300. Responsive to such completion, other storage controller 330 returns a completion status message to inter-controller interface circuit 312 (via path 354). The completion status message is then forwarded back to host system 302 through request shipping logic 308 (as indicated by thicker dashed lines 366 and 368).
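Completion handling in the initiator controller can be sketched as a small relay table. The sketch assumes that each shipped request is identified by a numeric tag and that a host-notification callback exists; neither detail is specified in the text above, and `CompletionRelay` is a hypothetical name. The status is forwarded to host system 302 unchanged once it arrives from the controller to which the modified request was shipped.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CompletionRelay:
    """Hypothetical initiator-side bookkeeping for shipped requests."""
    notify_host: Callable[[int, str], None]  # assumed host completion hook
    outstanding: dict[int, str] = field(default_factory=dict)  # tag -> owner

    def shipped(self, tag: int, owner: str) -> None:
        """Record a modified request sent to `owner`."""
        self.outstanding[tag] = owner

    def on_status(self, tag: int, source: str, status: str) -> None:
        """Forward a completion status from the owning controller to the host."""
        if self.outstanding.get(tag) != source:
            raise RuntimeError(f"unexpected completion for tag {tag} from {source}")
        del self.outstanding[tag]
        self.notify_host(tag, status)  # status is passed through unchanged
```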
In some exemplary embodiments, logic elements 304, 306, and 308 may be implemented as suitably programmed instructions executed by one or more general or special purpose processors of controller 300. In other exemplary embodiments, such logic may be implemented as suitably designed custom logic circuits. Still other embodiments may utilize combinations of programmed instructions and custom designed logic circuits. Significant performance improvement is achieved in the enhanced controller of
Those of ordinary skill in the art will recognize numerous additional and equivalent elements that may be present in a fully functional storage controller and clustered storage system. Such additional and equivalent elements are omitted herein for simplicity and brevity of this discussion.
If step 402 determines that the logical volume identified in the received request is owned by another controller of the clustered storage system, step 406 next generates a modified I/O request (e.g., by operation of the request shipping logic of the enhanced controller). The modified request is generated by copying the originally received request and modifying the target device address in the copy to a target device address associated with the other controller that is determined to own the identified logical volume. In this configuration, this controller (preparing to ship the request to another controller) will become an “initiator controller” in communicating (shipping) the modified I/O request to the other controller while the other controller will be a “target controller” for the modified request being shipped.
Step 408 within the shipping logic of the enhanced controller next configures the DMA capabilities of the controller to prepare to transfer data associated with the modified request between the host system and the other controller. More specifically, request shipping logic of the enhanced controller configures DMA features of the inter-controller interface circuit to transfer data associated with the request directly between memory of the requesting host system and the second/other storage controller (through the inter-controller interface circuit of the initiator controller). The dashed line from step 408 to step 450 indicates the preparation of the DMA features of the inter-controller interface circuit to provide such DMA transfers of data between the other controller and memory of the host system. Step 410 then transmits the modified I/O request to the other controller (the controller that owns the logical volume identified in the I/O request). The dashed line connection of step 410 and step 450 indicates that the DMA features configured by step 408 are started to perform the requisite transfer of data associated with the modified I/O request between the other controller and the memory of the host system. The DMA capabilities may be configured to operate in accordance with a scatter-gather list provided by the host system in the original I/O request. The scatter-gather list entries define locations in the host system memory for data associated with the I/O request (i.e., locations from which write data may be retrieved or locations in which read data may be stored). Further, a scatter-gather list entry may comprise a “chain” entry that points to further scatter-gather list entries. Thus the scatter-gather list entries pointed to by a chain entry may themselves be retrieved by a DMA transfer.
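The handling of chain entries can be sketched as a Python generator that yields each host data buffer in order. The entry types `DataEntry` and `ChainEntry`, the convention that a chain entry ends its segment, and the `fetch_segment` callback (standing in for the DMA read of the next segment from host memory) are all hypothetical; actual scatter-gather list formats vary by interface.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, Union


@dataclass(frozen=True)
class DataEntry:
    address: int  # host memory address of a data buffer
    length: int   # buffer length in bytes


@dataclass(frozen=True)
class ChainEntry:
    address: int  # host memory address of the next SGL segment


SGLEntry = Union[DataEntry, ChainEntry]


def walk_sgl(first_segment: list[SGLEntry],
             fetch_segment: Callable[[int], list[SGLEntry]]
             ) -> Iterator[tuple[int, int]]:
    """Yield (address, length) for every data buffer, following chain entries.
    No loop detection is performed in this sketch."""
    segment = first_segment
    while True:
        next_segment = None
        for entry in segment:
            if isinstance(entry, ChainEntry):
                # Assumed convention: a chain entry is the last in its segment.
                next_segment = fetch_segment(entry.address)
                break
            yield entry.address, entry.length
        if next_segment is None:
            return
        segment = next_segment
```

For example, a first segment holding one 512-byte buffer followed by a chain entry to a second segment with two more buffers yields all three `(address, length)` pairs in host order, which is the sequence the DMA engine would then transfer.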
When the other controller completes processing of the modified I/O request (including exchange of data with the host system through the DMA features of the enhanced controller), a completion status is returned from the other controller to the enhanced controller at step 412. The completion status so returned to the enhanced (initiator) controller is then forwarded to the requesting host system.
Those of ordinary skill in the art will recognize numerous additional and equivalent steps in a fully functional method such as the method of
While the invention has been illustrated and described in the drawings and foregoing description, such illustration and description is to be considered as exemplary and not restrictive in character. One embodiment of the invention and minor variants thereof have been shown and described. In particular, features shown and described as exemplary software or firmware embodiments may be equivalently implemented as customized logic circuits and vice versa. Protection is desired for all changes and modifications that come within the spirit of the invention. Those skilled in the art will appreciate variations of the above-described embodiments that fall within the scope of the invention. As a result, the invention is not limited to the specific examples and illustrations discussed above, but only by the following claims and their equivalents.
Number | Date | Country | |
---|---|---|---|
20130067123 A1 | Mar 2013 | US |
Number | Date | Country | |
---|---|---|---|
61532585 | Sep 2011 | US |