1. Field of the Invention
The invention relates generally to storage systems and more specifically relates to methods and systems for improved throughput in mirroring of cache data between two storage controllers by use of bidirectional transmission on a common initiator-target nexus (ITN).
2. Discussion of Related Art
Many high performance, high reliability storage systems utilize multiple storage controllers to increase both performance and reliability. Where multiple storage controllers are utilized, it is common that the controllers exchange information in the processing of host requests received from attached host systems. A variety of communication media and protocols may be used for such exchange of information between the multiple storage controllers of the system. Some common commercially available communication media and protocols for inter-controller communications include Serial Attached SCSI (SAS), Fibre Channel (FC), InfiniBand, PCI Express, and Ethernet.
One common configuration of multiple inter-connected controllers is often referred to as “dual-active” or “active-active”. Each controller of the dual-active pair comprises a cache memory used to enhance performance of its I/O request processing. In the dual-active configuration, each controller is actively processing I/O requests from attached host systems and utilizes its local cache memory for enhancing performance. Further, each active controller serves as a redundant backup for the other active controller to improve reliability in case of a failure of the other controller. Though the controllers operate substantially independently processing I/O requests, the dual-active controllers need to share cache data so that when one controller takes over control of the I/O requests for the other (failed) controller the data in the cache memory of the other (failed) controller will be known/available to the remaining active controller.
In this dual-active configuration, each controller processes an I/O request (i.e., a write request) by writing data to its local cache (and eventually flushing the data from cache to the storage devices). The controller also mirrors or copies the data just written in its local cache memory to the local cache memory of the other active controller. Thus, each controller has full knowledge of the cache data in the other controller to allow “fail-over” by a remaining active controller in the event of failure of the other active controller. The copying of cache data from the local cache of one controller to the other controller is often referred to as “mirroring” of the cache data. The inter-controller communication media and protocols are used for this mirroring of cache data between the dual-active controllers.
There are several methods for mirroring cache data depending on the architecture of the storage system. In embodiments that use the SCSI transport protocol as a layer on top of the underlying, lower-level, inter-controller communication protocols (e.g., SAS), one current technique moves the mirrored cache data across the same SAS channels that distribute data to the drives. These mirror related transactions compete for bandwidth with the write data going to the drives. In the worst case the controller's effective host bandwidth is cut in half because the data must be mirrored to the other controller and must be written to disk.
When both controllers are mirroring write data, the SAS connections begin to flow data in both directions, but this does not meet the requirements of bidirectional data flow, since the mirroring traffic flows in only one direction on the same initiator-target nexus (I_T nexus or “ITN”). In present dual-active SAS architectures, both dual-active controllers may be simultaneously attempting to write data to the other controller's cache memory. These otherwise simultaneous write transactions cannot use the same ITN (i.e., write transactions cannot simultaneously flow in opposite directions on a single established ITN connection). Even though data is flowing in both directions on the same SAS PHY (the same physical link coupling the controllers), each of the controllers is issuing writes on a different ITN. This is because both controllers are initiating the mirrored data write operations and sending mirror data to the other controller. Such a typical mirror data flow as presently practiced allows no chance to take advantage of the bidirectional nature of SAS.
Thus it is an ongoing challenge to reduce overhead and bandwidth utilization in the exchange of mirrored cache data between pairs of dual-active storage controllers.
The present invention solves the above and other problems, thereby advancing the state of the useful arts, by providing methods and systems for improved transfer of mirrored information between paired dual-active storage controllers in a storage system using a SCSI transport layer. A first portion (approximately half) of the mirrored information transfers are performed in accordance with a first manner in which the controller to receive the mirrored information issues a read operation on the initiator-target nexus (ITN) of the SCSI transport layer to retrieve the mirrored information. A second portion (approximately half) of the mirrored information transfers are performed according to a second manner in which the controller having the information to be mirrored sends the information to be mirrored to the partner controller using a write operation on the ITN. The read and write operations on the same ITN may thus overlap to improve inter-controller communications. The mirrored information may be cached write data or entire I/O requests to be shipped to a partner controller.
In one aspect hereof, a method is provided operable in a first storage controller for mirroring cached write data with a second storage controller. The method comprises receiving a write I/O request from an attached host system wherein the write I/O request comprises write data. The method then stores the write data in a cache memory of the first storage controller wherein the write data is stored at a starting location in the cache memory. Responsive to receipt of the write I/O request, the method then transmits a first command to the second storage controller wherein the first command comprises indicia of the starting location in the cache memory of the write data. The method then receives a second command from the second storage controller wherein the second command comprises a request to read the write data from the starting location. Responsive to receipt of the second command, the method then transfers the write data from the starting location in the cache memory to the second storage controller.
Another aspect hereof provides a method and associated system in which the method is operable. The method is operable in a first storage controller for transferring information to a second storage controller wherein the first and second storage controllers are coupled by SCSI transport logic operable over a communication medium coupling the first and second storage controllers. The method comprises receiving a plurality of I/O requests from one or more attached host systems and storing information derived from each I/O request in a memory of the first storage controller wherein the information derived from each I/O request is stored at a corresponding starting location in the memory. For each I/O request of a first portion of the plurality of I/O requests, the method then transfers the information from the memory of the first storage controller to the second storage controller according to a first manner. For each I/O request of a second portion of the plurality of I/O requests, the method transfers the information from the memory of the first storage controller to the second storage controller according to a second manner. The first manner further comprises, responsive to receipt of each I/O request of the first portion, transmitting a first command to the second storage controller wherein the first command comprises the starting location in the memory of the information derived from said each I/O request of the first portion. The first manner further comprises receiving, using an initiator-target nexus (ITN) of the SCSI transport layer, a second command from the second storage controller wherein the second command comprises a request to read the information from the starting location in the memory. The first manner still further comprises, responsive to receipt of the second command, transmitting, using the ITN, the information from the starting location in the memory specified by the second command to the second storage controller. 
The second manner further comprises, responsive to receipt of each I/O request of the second portion, transmitting, using the ITN, a third command to the second storage controller wherein the third command comprises the information derived from said each I/O request.
Mirroring of stored information regarding processing of I/O requests is generally performed by inter-controller communication. Switched fabric communication medium 150 may be used for such inter-controller communication between first storage controller 102 and second storage controller 122. In addition, switched fabric communication medium 150 may be used to couple both controllers 102 and 122 to storage devices 140. Features and aspects hereof to improve performance of the inter-controller communications may be beneficially applied regardless of whether the inter-controller communication medium is shared with the controller-storage device communication channel or is distinct from the storage device communications.
Each controller 102 and 122 comprises a corresponding SCSI transport layer 108 and 128, respectively. The SCSI transport layer (and associated lower link and physical layers) permits the controllers 102 and 122 to communicate with storage devices 140 as well as with the other controller of the dual-active pair. An initiator-target nexus (ITN) identifies a particular connection between a particular pair of devices in the SCSI environment. As presently practiced in the art, each controller 102 and 122 may perform required mirroring of information in its memory (104 and 124 respectively) by performing SCSI write operations using the ITN associating the initiator controller with the target controller for the mirrored information write operation. As noted above and in accordance with SCSI standards, where both controllers are processing requests at a substantially equal rate, the ITN associating the two storage controllers may not be used simultaneously for two write operations, e.g., a write operation from the first storage controller to mirror its information to the second storage controller and another write operation using the same ITN for the second storage controller to mirror its information to the first storage controller.
By contrast, storage controllers 102 and 122 are enhanced in accordance with features and aspects hereof to provide for bidirectional communication utilizing the same ITN for purposes of mirroring information between the two storage controllers of the dual-active pair. In particular, features and aspects hereof utilize combinations of read requests and write requests on the same ITN to permit substantial overlap (i.e., bidirectional simultaneous communication) of mirroring operations between the two controllers. An information exchange element 106 and 126 in each storage controller 102 and 122, respectively, serves to coordinate the use of such bidirectional communications utilizing a single ITN between first storage controller 102 and second storage controller 122 for purposes of mirroring information in their respective memories 104 and 124. The information exchange elements as well as the SCSI transport layers may be implemented as suitably programmed instructions stored in a memory (not shown) of the controllers and executed by an appropriate general or special purpose processor (not shown). Further, these elements may also be implemented as suitably designed logic circuits or combinations of logic circuits and suitably programmed instructions.
In operation, each controller 102 and 122 is adapted/operable to receive a plurality of I/O requests from attached host systems. For any given I/O request, the controller receiving the request may be designated as the “managing” controller and the other controller may be designated as the “partner” controller for purposes of processing that request. The managing controller is operable to store information regarding its received request and further operable to mirror the stored information to the partner controller (i.e., to be stored in the partner controller's memory). Further, the managing controller is operable to perform this mirroring operation in accordance with a first manner for a first portion of the plurality of received I/O requests and in accordance with a second manner for the remaining or second portion of the plurality of received I/O requests. In accordance with the first manner of performing the mirroring operation, the managing controller transmits a first command to its partner controller indicating to the partner controller that information is available in the memory of the managing controller. The first command identifies the information but does not transmit the data as a write operation to the partner controller. The information may be identified by its starting address and length in the memory of the managing controller and/or may be identified by a tag value created by the managing controller. The managing controller then later receives from its partner controller a second command identifying the information to be read from memory of the managing controller. The information is identified in the second command by the starting location and/or tag value sent from the managing controller in the first command. This second command comprises a read operation requesting return of the identified information. 
Responsive to receiving the second command, the managing controller transmits the requested information from the starting location in its memory to the partner controller, thereby satisfying the read request of the partner controller. In this first manner of performing the mirror operation, the information to be mirrored is transmitted over a SCSI transport layer using a single ITN utilizing a read request from the partner controller to the managing controller.
In accordance with the second manner of performing the mirroring operation, the managing controller transmits the information relating to the I/O request to the partner controller utilizing a write operation on the SCSI transport layer using the same ITN used by the first manner for performing a read operation. By performing both read and write operations on the same ITN, and by performing the same operation in each controller (each operating as the managing controller for its received I/O requests), the data flows for the read and write operations on the same ITN may be substantially overlapped thus improving bandwidth utilization of the inter-controller communications and reducing latency in the communications as compared to prior techniques that utilize only write operations between the paired dual-active controllers for purposes of mirroring information.
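The two manners of mirroring described above can be illustrated with a minimal sketch. It is a toy model under stated assumptions, not the implementation of the described embodiments: the class name Controller, its method names, and the use of dictionaries for the memories are all hypothetical, and the sketch models only the order of the first, second, and third commands, not the ITN, the SCSI transport layer, or any timing or overlap.

```python
from dataclasses import dataclass, field


@dataclass
class Controller:
    """Toy model of one dual-active controller (all names hypothetical)."""
    name: str
    cache: dict = field(default_factory=dict)   # local memory: start -> data
    mirror: dict = field(default_factory=dict)  # copy of the partner's data

    def store(self, start: int, data: bytes) -> None:
        """Store information for a received I/O request at a starting location."""
        self.cache[start] = data

    # --- First manner: announce the data, then the partner reads it ---
    def mirror_by_read(self, partner: "Controller", start: int) -> None:
        # First command: identifies the information (start and length)
        # but carries no data.
        partner.on_first_command(self, start, len(self.cache[start]))

    def on_first_command(self, manager: "Controller", start: int, length: int) -> None:
        # The partner responds with the second command, a read request,
        # and stores whatever the managing controller returns.
        self.mirror[start] = manager.on_second_command(start, length)

    def on_second_command(self, start: int, length: int) -> bytes:
        # The managing controller satisfies the read from its own memory.
        return self.cache[start][:length]

    # --- Second manner: push the data with a write ---
    def mirror_by_write(self, partner: "Controller", start: int) -> None:
        # Third command: a write that carries the information itself.
        partner.on_third_command(start, self.cache[start])

    def on_third_command(self, start: int, data: bytes) -> None:
        self.mirror[start] = data


# Usage: controller A mirrors one request in each manner to controller B.
a, b = Controller("A"), Controller("B")
a.store(0x100, b"alpha")
a.mirror_by_read(b, 0x100)    # first manner (partner issues the read)
a.store(0x200, b"beta")
a.mirror_by_write(b, 0x200)   # second manner (manager issues the write)
```

In both manners the partner ends up holding the same mirrored information; what differs is which controller initiates the data-bearing operation, which is what allows a read and a write to be outstanding on the same ITN.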
In one exemplary embodiment, each managing controller may alternate use of the first and second manners of mirroring information. The alternation may be performed by switching manners for each mirror operation to be performed by the managing controller, or the manners of mirroring may be switched after a predetermined number of mirror operations have been performed using one or the other manner of mirroring. In another exemplary embodiment, one of the two controllers may be configured to always utilize the first manner of mirroring operation while the other controller may be configured to always utilize the second manner for mirroring operations. In such an alternate embodiment, read and write operations may still be overlapped by utilizing the same ITN for the mirror operations performed by the first controller using write operations and the mirror operations performed by the second controller using read operations. In some embodiments it may be preferred that the two controllers operate using identical control code and logic. Hence, it may be preferred that the two controllers operate in an identical manner, alternating use of the first and second manners of mirroring operation, so that the code/logic may be identical between the paired dual-active controllers.
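The alternation between manners may be sketched as a simple selector. This is an illustrative assumption only: the function name manner_schedule, the parameter batch, and the labels "read" and "write" (standing for the first and second manners, respectively) are hypothetical and do not appear in the described embodiments.

```python
from itertools import islice
from typing import Iterator


def manner_schedule(batch: int = 1) -> Iterator[str]:
    """Yield the manner to use for each successive mirror operation.

    "read" stands for the first manner and "write" for the second.
    The manner switches after `batch` operations; batch=1 switches
    on every mirror operation.
    """
    if batch < 1:
        raise ValueError("batch must be at least 1")
    while True:
        for manner in ("read", "write"):
            for _ in range(batch):
                yield manner


# Usage: the first six choices when switching after every two operations.
first_six = list(islice(manner_schedule(batch=2), 6))
```

Because both controllers can run the same selector, this form of alternation is compatible with the stated preference for identical code and logic in the paired controllers.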
The switched fabric communication medium 150 may be utilized for coupling first and second storage controllers 102 and 122 as well as for coupling each controller with storage devices 140. Fabric 150 may be any of several well-known, commercially available communication media and associated protocols that provide for SCSI transport layers to be utilized within each controller. For example, switched fabric communication medium 150 may provide for Serial Attached SCSI (SAS) communications or may provide Fibre Channel (FC) communications. Other well-known switched fabric communication media and protocols may be utilized that also allow for a SCSI transport layer to be utilized in communications between storage controllers. Thus, in some exemplary embodiments, first storage controller 102 and second storage controller 122 may each comprise SAS storage controllers or may each comprise Fibre Channel storage controllers. In like manner, storage devices 140 may be SAS storage devices or FC storage devices. Still further, features and aspects hereof also permit overlap of the mirror operations to transfer information between storage controllers 102 and 122 with the transfers of information (e.g., write data) to the storage devices 140 from a managing controller of an I/O request.
Those of ordinary skill in the art will readily recognize numerous additional and equivalent elements that may be present in fully functional storage controllers 102 and 122. Such additional and equivalent elements are omitted herein for simplicity and brevity of this discussion. For example, those of ordinary skill in the art will readily recognize that storage controllers 102 and 122 may include one or more general or special-purpose processors and associated memory for storing programmed instructions for performing mirroring and other operations by the controllers. In addition or in the alternative, storage controllers 102 and 122 may comprise suitably designed logic circuits for processing of incoming I/O requests and for performing the mirroring operations described herein. Still further, those of ordinary skill in the art will readily recognize that the information to be mirrored by the mirror operations described herein may comprise write data cached in the memory 104 or 124 of the managing controllers 102 and 122, respectively, stored by the managing controller's processing of received write I/O requests. In a dual-active storage controller configuration, write data is stored in a cache memory of the storage controller and mirrored to the cache memory of the other controller of the dual-active configuration. Still further, in other exemplary embodiments, the information to be mirrored may comprise the entire I/O request per se. In other words, an I/O request may be stored in the memory of the managing controller and the request may be “shipped” in its entirety to the partner controller for processing of the entire request. Thus the mirror operations described herein may be used for cache memory mirroring in dual-active storage controller configurations or for I/O shipping in such dual-active configuration as well as for other purposes readily recognized by those of ordinary skill in the art.
Both controllers are also capable of utilizing a standard technique for mirroring cached write data between the controllers. Steps 220 through 222 describe steps in the first controller for receiving a standard cache mirror operation from the second controller. At step 220, the first controller receives a third command from the second controller. The third command is also referred to herein as a “Mirror Write” command (MW) and represents typical processing as presently practiced in the art for mirroring cached write data utilizing a write request between the paired controllers. At step 222, the first controller stores the cached data received with the MW command in accordance with parameters identified in the MW command (e.g., at locations in cache memory identified by the parameters of the MW command). This MW command utilizes the same ITN of the SCSI transport layers as is utilized by the RR command as described above with respect to steps 206 and 208.
As noted above, by utilizing both read and write operations for mirroring cached write data over the same ITN, substantial overlap may be achieved such that the essentially simultaneous bidirectional transfer of mirrored cache data between the storage controllers improves inter-controller communication bandwidth utilization and reduces overhead latency in such mirrored data transfers. Still further, those of ordinary skill in the art will recognize that the first command (the RW command) may be transferred from the first controller to the second controller using the same ITN as is used for the second and third commands as well as for the data transferred responsive to receipt of the second command. In some embodiments, the first command may be transferred using a different ITN to further avoid congestion and latency associated with use of the same ITN for write operations in both directions of the single ITN.
Regardless of the manner of selecting between the first and second manner of transfer of mirrored information, the first and second controllers utilize the mixture of the first and second manners of transfer such that substantial overlap may be achieved in the transfer of information between the dual-active storage controllers utilizing a single ITN of the SCSI transport layers.
Those of ordinary skill in the art will readily recognize numerous equivalents and additional steps that may be present in fully operational methods such as the method of
While the invention has been illustrated and described in the drawings and foregoing description, such illustration and description is to be considered as exemplary and not restrictive in character. One embodiment of the invention and minor variants thereof have been shown and described. In particular, features shown and described as exemplary software or firmware embodiments may be equivalently implemented as customized logic circuits and vice versa. Protection is desired for all changes and modifications that come within the spirit of the invention. Those skilled in the art will appreciate variations of the above-described embodiments that fall within the scope of the invention. As a result, the invention is not limited to the specific examples and illustrations discussed above, but only by the following claims and their equivalents.