This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2015-018855, filed on Feb. 2, 2015, the entire contents of which are incorporated herein by reference.
The embodiments discussed herein are related to a storage controller, a method, and a storage medium.
In a storage system comprising multiple storage controllers, the access method from an operation server utilizing the storage system includes an active-standby method and an active-active method. In the active-standby method, only one active storage controller receives an access request from the operation server utilizing the storage system. On the other hand, in the active-active method, any storage controller may receive the access request from the operation server.
Although the active-active method complicates the processing of access requests from the operation server, demand for active-active storage systems tends to increase because of the advantages of the method, such as the load on the storage controllers being distributed and recovery from a failure being prompt.
As for the volume switching, a technique is known that suppresses exhaustion of the in-host ID by switching an access request for a switching source volume to a switching destination volume with reference to a switching table that associates the in-host ID of the switching source volume and the in-storage ID of the switching destination volume with each other.
Also, a technique is known that executes a first remote copy on the storage basis in the normal state, and when a failure occurs in the first remote copy, enables failure handling by switching to a second remote copy on the host basis while suppressing the load to the host.
Further, a technique is known that, when an I/O request to a first volume of a remote-copy volume pair fails, transmits the I/O request to the second volume to update the second volume, thereby improving the availability of the system.
As examples of the related techniques, Japanese Laid-open Patent Publication Nos. 2006-309638, 2006-285919, and 2009-266120 are known.
According to an aspect of the invention, a storage controller out of a plurality of storage controllers used in a storage system, each of the plurality of storage controllers being configured to control mirror processing of data, the storage controller includes: circuitry configured to: determine a storage controller of a destination of an input and output request for a volume, out of the plurality of storage controllers, based on mirror device information where identifiers and priorities of the respective storage controllers are stored while being associated with each other for each of mirror processing units, and state information being for each of storage controllers and indicating whether the storage controller is normal or not, and issue the input and output request to the determined storage controller.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention, as claimed.
An active-active storage system has a problem in that, when a failure occurs in a storage controller, the processing for maintaining the synchronization of the storage system takes time. Here, destruction of the synchronization in the storage system means that the two copies of data mirrored for redundancy no longer match.
According to an aspect of the embodiment, the active-active storage system reduces the processing time when a failure occurs in a storage controller.
Hereinafter, embodiments of a storage controller and a storage control program disclosed herein are described in detail with reference to the accompanying drawings. The embodiments shall not limit the technique disclosed herein.
First, a mirror logical unit number (LUN) for configuring a mirror volume without destroying the synchronization in an active-active storage system is described.
In
The mirror LUN 44 is a virtual disk device integrating multiple segment groups 46, and one mirror LUN 44 exists in one control module 3a. Here, the segment group 46 is a processing unit of the mirroring, which consists of multiple segments. The segment is a management unit of the disk area. Each segment group 46 belongs to a mirror LUN. However, in
An operation server 10 utilizing the storage system requests the control module 3a for access to a volume 45 provided by the storage system. In
Then, the control module 3a, to which the access request is issued, performs the processing of the mirror LUN 44, that is, the mirroring of the segment group 46 designated in the access request. When the access request is for writing data into the segment group 46, the control module 3a, to which the access request is issued, performs control such that the control module 3a containing the mirror LUN 44 writes the data into two disk devices.
Thus, the synchronization of the active-active storage system is maintained by executing access from the operation server 10 to the volume 45 via the mirror LUN 44. If the mirroring were performed by the control module 3a that received the request from the operation server 10 without passing through the mirror LUN 44, the synchronization could be destroyed depending on the timing at which multiple control modules 3a write different data into the same segment group 46.
Next, a switching of an access path to the mirror LUN 44 when the control module 3a is down is described.
As illustrated in
When the control module #1 is down, the control module #2 takes over the access of the control module #1 to the mirror LUN 44. That is, a control module 3a, which receives the access request from the operation server 10, switches the access to the mirror LUN 44 of the control module #1 to the control module #2. In
Then, instead of the control module #1, the control module #2 makes the access to the disk devices 4. The access of the control module #2 to the disk device 4 in place of the control module #1 is called a buddy access. In
Thus, when a control module 3a including the mirror LUN is down, operation of the storage system is continued by switching the access via the mirror LUN to the buddy access. Then, when the failed control module 3a is restarted, the storage system switches the buddy access to the original normal access.
However, the synchronization may be destroyed when the access path is switched from the buddy access back to the normal access.
As illustrated in
Then, data A is written into the LUN #1 (3). Here, writing data A into the LUN #1 means writing data A into the disk device 4 whose LUN is the LUN #1. Hereinafter, the disk device 4 whose LUN is a LUN #n (n is an integer) is simply referred to as the LUN #n.
Meanwhile, assume that writing of data A into the LUN #2 is delayed for some reason (4). Then, assume that the control module #2 is reactivated during the delay (5). Then, as illustrated in
Then, data B is written into the LUN #1 (10), and data B is written into the LUN #2 (11). Thereafter, writing of data A from the control module #1 to the LUN #2, which has been delayed, is started (12).
Thus, if writing of data A by the buddy control module 3a is delayed due to some reason, data A may be written after data B from the reactivated control module 3a is written, and this may cause mismatching between the LUN #1 and the LUN #2 (13).
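The mismatch described above can be reproduced with a short, hypothetical Python sketch that simply replays writes in the order in which they reach the disk devices. The function name `replay` and the labels `"LUN#1"` and `"LUN#2"` are assumptions made for illustration and do not appear in the embodiment.

```python
def replay(writes):
    """Apply (lun, data) writes in arrival order and return the final contents."""
    luns = {"LUN#1": None, "LUN#2": None}
    for lun, data in writes:
        luns[lun] = data
    return luns


# Data A reaches the LUN #1 promptly, but its write to the LUN #2 is delayed
# until after data B has been written to both LUNs.
delayed_order = [("LUN#1", "A"), ("LUN#1", "B"), ("LUN#2", "B"), ("LUN#2", "A")]

# Order in which no write overtakes another.
expected_order = [("LUN#1", "A"), ("LUN#2", "A"), ("LUN#1", "B"), ("LUN#2", "B")]

mismatched = replay(delayed_order)
consistent = replay(expected_order)
```

With the delayed order, the LUN #1 ends with data B while the LUN #2 ends with data A, which corresponds to the mismatch at (13).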
To avoid occurrence of such a problem, a method of maintaining the synchronization by the manager is adopted. Here, the manager is a module that controls an entire storage system. The manager operates in one of the multiple control modules 3a.
As illustrated in
Then, the write of data B from the operation server 10 to the same area is stopped (3). Then, when processing of the mirror LUN is switched to the control module #2 (4), the manager 43a resumes all accesses to the volume #1 (5). Then, the control module #5, which has received from the operation server 10 a request to write data B into the same area, requests the reactivated control module #2 to write data B (6). Then, data B is written into the LUN #1 (7), and data B is written into the LUN #2 (8).
Thus, when a failed control module 3a is restored, the manager 43a stops access to a volume 45 with the access area included in a mirror LUN 44 existing in the failed control module 3a, and thereby avoids destruction of the synchronization.
However, this method has a problem in that it takes time for the manager 43a to calculate the volumes 45 whose access areas are included in a mirror LUN 44 existing in the failed control module 3a and to instruct the respective control modules 3a to stop access. Since the mirror LUN 44 integrates segment groups 46 of many volumes 45, many volumes 45 have access areas included in the mirror LUN 44, and calculating these volumes 45 takes time. Therefore, the manager 43a becomes a bottleneck that prolongs the I/O (Input/Output) stop time. Also, when the control module 3a in which the manager 43a operates is down, the manager 43a may have to be reactivated, which delays the switching.
For solving such problems, a storage system according to the embodiment is configured to control access from the operation server 10 so as to maintain the synchronization without a burden to the manager.
As illustrated in
Each shelf 2 is a chassis configured to accommodate two control modules 3 controlling the storage system 1 and four disk devices 4 identified with the LUN. Although four disk devices 4 are illustrated herein for the convenience of explanation, the shelf 2 may accommodate any number of disk devices 4. Control modules 3 accommodated by the shelf #1 are represented as a control module #1 and a control module #2, and control modules 3 accommodated by the shelf #2 are represented as a control module #3 and a control module #4. Control modules 3 accommodated by the shelf #3 are represented as a control module #5 and a control module #6.
An interconnect switch 5 is a switch configured to connect the control modules 3 to each other. The control modules 3 communicate with each other via the interconnect switch 5.
The control module 3 comprises an interface 31, an interconnect interface 32, and a serial-attached SCSI (SAS) 33. The interface 31 is a device configured to communicate with the operation server 10 utilizing the storage system. The interconnect interface 32 is a device configured to communicate with another interconnect interface 32 via the interconnect switch 5. The SAS 33 is a device configured to communicate with the disk devices 4.
The disk device 4 is a nonvolatile storage device configured to store data used by the operation server 10, which is implemented by a hard disk drive (HDD). The shelf 2 may accommodate a solid state drive (SSD) instead of the disk device 4.
As illustrated in
The storage unit 40 is configured to store data used for control of the storage system 1. The storage unit 40 stores a module management table 40a and a mirror LUN management table 40b. The module management table 40a is a table indicating whether each of the control modules 3 is normal or abnormal.
The mirror LUN management table 40b is a table associating the segment group 46 with the number of each of two control modules 3 in which the mirror LUN 44 exists.
The normal identifier is a number for identifying a control module 3 in which a mirror LUN 44 including a corresponding segment group 46 exists. The buddy identifier is a number for identifying the buddy control module 3 used instead of the control module 3 identified with the normal identifier when that control module 3 is down. That is, the control module 3 identified with the normal identifier performs the processing of the mirror LUN 44 with a higher priority than the control module 3 identified with the buddy identifier. For example, if a mirror LUN 44 including a segment group 46 having the number of 1 exists in the control module #1 and the control module #2, the mirror LUN 44 existing in the control module #1 is usually used.
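As a minimal sketch only, the two tables may be modeled in Python as follows. The type and function names (`ModuleState`, `MirrorLunEntry`, `candidates_by_priority`) are hypothetical, and the entry for segment group 2 is an assumed example; only the entry for segment group 1 follows the example given above.

```python
from dataclasses import dataclass
from enum import Enum


class ModuleState(Enum):
    NORMAL = "normal"
    ABNORMAL = "abnormal"


@dataclass(frozen=True)
class MirrorLunEntry:
    normal_id: int  # control module that usually processes the mirror LUN
    buddy_id: int   # control module used when the normal one is down


# Sketch of the module management table 40a: control module number -> state.
module_management_table = {n: ModuleState.NORMAL for n in range(1, 7)}

# Sketch of the mirror LUN management table 40b: segment group number -> entry.
mirror_lun_management_table = {
    1: MirrorLunEntry(normal_id=1, buddy_id=2),  # example from the description
    2: MirrorLunEntry(normal_id=2, buddy_id=1),  # assumed entry for illustration
}


def candidates_by_priority(segment_group: int) -> list[int]:
    """Return the control modules for a segment group, highest priority first."""
    entry = mirror_lun_management_table[segment_group]
    return [entry.normal_id, entry.buddy_id]
```

Keeping the priority order in the table entry itself means that a control module can pick a destination locally, using only these two tables.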
The cluster 41 is configured to control a storage cluster in conjunction with the cluster 41 of the other control module 3. Here, the storage cluster is a function causing the other control module 3 to automatically take over a function of a control module 3 when a failure occurs thereto.
The agent 42 is configured to monitor the state of the control module 3 and notify the cluster 41 when an abnormal control module 3 is detected. Upon being notified of the abnormal control module 3 by the agent 42, the cluster 41 decouples the control module 3 from the storage system 1. Further, the cluster 41 notifies all other control modules 3 of the failed control module 3.
Upon being notified of the failed control module 3 by the cluster 41, the agent 42 updates the module management table 40a. Upon being notified of the failure of the control module 3 by the cluster 41, the agent 42 of the buddy control module 3 of the failed control module 3 performs resynchronization processing.
The volume processing unit 47 is configured to process an access request of the operation server 10 to the volume 45. With the mirror LUN management table 40b, the volume processing unit 47 identifies two control modules 3 where a mirror LUN 44 exists for an area to be accessed. Then, if a normally used control module 3 out of the identified two control modules 3 is normal, the volume processing unit 47 issues I/O to the normally used control module 3. If the normally used control module 3 is not normal, the volume processing unit 47 issues I/O to the buddy control module 3.
Here, issuing the I/O means requesting the processing of the mirror LUN 44. Further, the volume processing unit 47 determines with reference to the module management table 40a whether the control module 3 is normal.
Upon receiving the I/O for the volume 45 from the operation server 10 (1), the volume processing unit 47 checks the module management table 40a (2). Then, the volume processing unit 47 issues the I/O to a control module 3 which is a normal control module 3 and has the highest priority (3). The priority of the normally used control module 3 is higher than the priority of the buddy control module 3. Therefore, in
Thus, the volume processing unit 47 identifies, with reference to the module management table 40a, the control module 3 to which the I/O is issued. Therefore, even when the normally used control module 3 is in failure, the control module 3 to which the I/O is issued may be identified easily.
As illustrated in
Then, as illustrated in
Depending on the timing when the control module #1 is down, the I/O issued to the control module #1 may be returned to the volume processing unit 47 as an error. In such a case, the volume processing unit 47 re-issues the I/O to the buddy control module 3 as illustrated in
Referring back to
The mirror control unit 48 manages a JRM segment that a just resynchronization mechanism (JRM) uses for the resynchronization. Here, the JRM segment is a collection of bits indicating whether writing is being processed for each of the segment groups 46.
Then, when notified by the cluster 41 that the control module 3 is down, the agent 42 reads the JRM segment 49 managed by the failed control module 3 and performs the resynchronization processing for each segment group 46 whose bit in the JRM segment 49 is set to 1. Here, the agent 42 directly reads the JRM segment 49 managed by the failed control module 3 without passing through the manager 43. That is, the JRM segment 49 can be read from a buddy control module 3. A JRM segment 49 is arranged in each of the control modules 3. Since access from a buddy control module 3 is less efficient than normal access, the JRM segment 49 is usually accessed from the control module 3 in which it is arranged.
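The role of the JRM segment may be illustrated by the following hypothetical Python sketch, which keeps one bit per segment group and lists the segment groups whose bit is 1. The class name `JrmSegment`, its methods, and the list-based layout are assumptions; the embodiment does not specify how the bits are stored.

```python
class JrmSegment:
    """One bit per segment group: 1 while a write is being processed."""

    def __init__(self, num_segment_groups):
        self._bits = [0] * num_segment_groups

    def write_started(self, segment_group):
        self._bits[segment_group] = 1

    def write_completed(self, segment_group):
        self._bits[segment_group] = 0

    def resync_targets(self):
        """Segment groups whose bit is 1, i.e. writes that may be incomplete."""
        return [sg for sg, bit in enumerate(self._bits) if bit == 1]


jrm = JrmSegment(num_segment_groups=4)
jrm.write_started(1)
jrm.write_started(3)
jrm.write_completed(1)          # this write finished on both mirror surfaces
targets = jrm.resync_targets()  # only segment group 3 remains in flight
```

Under this sketch, an agent of the buddy control module would resynchronize only the segment groups returned by `resync_targets`, rather than the whole mirror LUN.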
However, the I/O, which comes to an area where the synchronization is not restored immediately after the on-line processing, is held, and then the held I/O is processed by the buddy control module 3 after resynchronization.
Referring back to
When the failed control module 3 is reactivated and access to the mirror LUN 44 is switched back to the failed control module 3, the manager 43 causes the switching source control module 3 to hold the I/O. This prevents the mirror LUN 44 from being active on two control modules 3 at the same time, namely the switching source and the switching destination, while the access is switched back. The switching source control module 3 receives the I/O from the volume processing unit 47, but temporarily holds the I/O without processing it. Then, after completion of the switching, the switching source control module 3 returns an error and causes the volume processing unit 47 to re-issue the I/O.
Then, the control module #4 receives the I/O for the volume 45 from the operation server 10 (4), and the volume processing unit 47 of the control module #4 checks the module management table 40a (5). Then, since the control module #1 is in failure, the volume processing unit 47 issues the I/O to the control module #2 which is a buddy control module 3 (6). Here, the control module #2 holds the I/O with no processing (7).
Then, as illustrated in
Then, the control module #2 ends the I/O holding (9), and returns an I/O error for the held I/O to the control module #4 (10). Then, the volume processing unit 47 of the control module #4 re-issues the I/O to the control module #1 with reference to the updated module management table 40a (11).
Next, a flow of an I/O issue processing by the volume processing unit 47 is described.
Then, the volume processing unit 47 acquires the normal identifier and the buddy identifier for the I/O area with reference to the mirror LUN management table 40b (step S2). Then, the volume processing unit 47 determines with reference to the module management table 40a whether the control module 3 identified with the normal identifier is normal (step S3).
Then, when the control module 3 identified with the normal identifier is normal, the volume processing unit 47 determines the control module 3 identified with the normal identifier as the I/O destination (step S4), and issues the I/O to the determined control module 3 (step S5).
On the other hand, when the control module 3 identified with the normal identifier is not normal, the volume processing unit 47 determines with reference to the module management table 40a whether the control module 3 identified with the buddy identifier is normal (step S6).
Then, when the control module 3 identified with the buddy identifier is normal, the volume processing unit 47 determines the control module 3 identified with the buddy identifier as the I/O destination (step S7), and issues the I/O to the determined control module 3 (step S5). On the other hand, when the control module 3 identified with the buddy identifier is not normal, the volume processing unit 47 reports an error to the manager 43 (step S8).
Thus, the volume processing unit 47 determines the state of the control module 3 with reference to the module management table 40a, such that even when a failure occurs in the control module 3, the I/O is issued to an appropriate control module 3 without relying on the manager 43.
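The destination selection of steps S2 to S8 may be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the function name `select_io_destination`, the exception `NoDestinationError`, and the representation of the tables as plain dictionaries (with `True` meaning normal) are hypothetical and are not taken from the embodiment.

```python
class NoDestinationError(Exception):
    """Neither the normal nor the buddy control module is available (step S8)."""


def select_io_destination(segment_group, mirror_lun_table, module_table):
    """Pick the control module to which the I/O is issued.

    mirror_lun_table: segment group -> (normal identifier, buddy identifier)
    module_table:     control module number -> True if normal
    """
    normal_id, buddy_id = mirror_lun_table[segment_group]  # step S2
    if module_table.get(normal_id, False):                 # step S3
        return normal_id                                   # step S4
    if module_table.get(buddy_id, False):                  # step S6
        return buddy_id                                    # step S7
    # Step S8: in the embodiment the error is reported to the manager;
    # here it is raised to the caller instead.
    raise NoDestinationError(f"no normal module for segment group {segment_group}")


mirror_lun_table = {1: (1, 2)}
first_choice = select_io_destination(1, mirror_lun_table, {1: True, 2: True})
fallback = select_io_destination(1, mirror_lun_table, {1: False, 2: True})
```

Because the lookup touches only the two local tables, the choice does not require a round trip to the manager.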
Next, a flow of a processing when the cluster 41 detects a failure of the control module 3 is described.
As illustrated in
On the other hand, when the agent 42 determines that its own device is a buddy, the agent 42 performs processing of the step S23 to the step S27 for a segment group 46 of the resynchronization target. That is, the agent 42 reads the JRM segment 49 (step S24) and identifies the segment group 46 of the resynchronization target.
Then, the agent 42 implements on-line processing (step S25) and performs resynchronization processing (step S26). Then, the agent 42 updates the module management table 40a (step S22), and ends the processing.
Thus, upon detecting a failure of a control module 3, the cluster 41 transmits the down notices to all control modules 3 other than the control module 3 where the failure has occurred, and the agent 42 updates the module management table 40a. Therefore, even when a failure occurs in a control module 3, the storage system 1 may easily switch to a buddy control module 3 without relying on the manager 43.
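The handling of a down notice may be sketched in Python as follows. The class `Agent`, the function `notify_failure`, the `buddy_of` attribute, and the dictionary form of the JRM bits are hypothetical, and the on-line and resynchronization processing are reduced to recording the target segment groups.

```python
class Agent:
    """Hypothetical agent of one control module."""

    def __init__(self, module_id, buddy_of, all_module_ids):
        self.module_id = module_id
        self.buddy_of = buddy_of  # module for which this module is the buddy
        self.module_table = {m: True for m in all_module_ids}  # True = normal
        self.resynced = []

    def on_down_notice(self, failed_id, jrm_bits):
        if self.buddy_of == failed_id:
            # Read the JRM segment of the failed module (step S24) and
            # resynchronize the segment groups whose bit is 1 (steps S25, S26).
            targets = [sg for sg, bit in jrm_bits.items() if bit == 1]
            self.resynced.extend(targets)
        self.module_table[failed_id] = False  # step S22


def notify_failure(agents, failed_id, jrm_bits):
    """Cluster side: send the down notice to every module except the failed one."""
    for agent in agents:
        if agent.module_id != failed_id:
            agent.on_down_notice(failed_id, jrm_bits)


ids = [1, 2, 3, 4]
buddy_of = {1: 2, 2: 1, 3: 4, 4: 3}  # assumed pairing within each shelf
agents = [Agent(m, buddy_of[m], ids) for m in ids]
notify_failure(agents, failed_id=1, jrm_bits={0: 0, 5: 1})
```

In this sketch, only the agent of the control module #2 performs resynchronization, while every surviving agent marks the control module #1 as abnormal in its own table.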
Next, a flow of a processing when the control module 3 is reactivated is described.
Then, the switching source control module 3 starts holding of the mirror LUN I/O (step S32). Here, the mirror LUN I/O is an I/O issued to a control module 3 where the mirror LUN 44 exists. Then, each control module 3 updates the module management table 40a (step S33).
Then, the switching source control module 3 ends holding of the mirror LUN I/O (step S34). Then, the switching source control module 3 returns the held I/O as an error and requests re-issue of the I/O to a control module 3 to which the I/O has been issued (step S35).
Then, the manager 43 instructs the switching source control module 3 to start the I/O holding (step S44). Thereafter, when the operation server 10 issues the I/O (step S45), the volume processing unit 47 of a control module 3 which has received the I/O checks the module management table 40a (step S46), and issues the I/O to the switching source control module 3 (step S47). Then, the switching source control module holds the I/O (step S48).
The manager 43 instructs all control modules 3 to update the module management table 40a (step S49), and each control module 3 responds to the manager 43 by updating the module management table 40a (step S50). Then, the manager 43 instructs the switching source control module 3 to end the I/O holding (step S51). Then, the switching source control module 3 returns the I/O error for the held I/O (step S52).
Then, the volume processing unit 47, which has received the I/O error, checks the module management table 40a (step S53) and re-issues the I/O to the reactivated control module 3 (step S54). Then, after processing the I/O, the reactivated control module returns the response to the volume processing unit 47 (step S55), and the volume processing unit 47 returns the response to the operation server 10 (step S56).
Thus, when the failed control module 3 is reactivated, the switching source control module holds the I/O, and returns the I/O error when the module management table 40a is updated. Therefore, the synchronization of the storage system 1 is maintained without increasing the processing load of the manager 43 when the failed control module 3 is reactivated.
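The hold-and-re-issue sequence of steps S44 to S54 may be sketched in Python as follows. The class `ControlModule` and the function `fail_back` are hypothetical, the sequence runs in a single thread, and the re-issue by the volume processing unit is simplified into a direct call.

```python
class ControlModule:
    """Hypothetical control module that can hold mirror LUN I/O."""

    def __init__(self, module_id):
        self.module_id = module_id
        self.holding = False
        self.held = []
        self.processed = []

    def submit(self, io):
        if self.holding:
            self.held.append(io)  # hold with no processing (step S48)
            return "held"
        self.processed.append(io)
        return "done"

    def start_holding(self):
        self.holding = True

    def end_holding(self):
        """Stop holding and return the held I/Os, which are answered as errors."""
        self.holding = False
        errored, self.held = self.held, []
        return errored


def fail_back(source, reactivated, module_table, incoming):
    source.start_holding()                      # step S44
    for io in incoming:                         # steps S45 to S48
        source.submit(io)
    module_table[reactivated.module_id] = True  # steps S49 and S50
    errored = source.end_holding()              # steps S51 and S52
    for io in errored:                          # steps S53 and S54
        reactivated.submit(io)
    return errored


module_table = {1: False, 2: True}  # module #1 failed, module #2 is its buddy
source, reactivated = ControlModule(2), ControlModule(1)
errored = fail_back(source, reactivated, module_table, ["write B"])
```

The point of the design is that the held I/O is never processed by the switching source, so the same area is not written through two control modules during the switch.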
Although the embodiment has been described in terms of the functions of the control module 3, a storage control program having the same functions may be obtained by implementing the functions of the control module 3 in firmware. Thus, a hardware configuration of the control module 3 that executes the storage control program is described.
The MPU 36 is a processing device configured to read and execute the firmware stored in the RAM 38. The flash memory 37 is a nonvolatile memory configured to store the firmware as the storage control program. The RAM 38 is a volatile memory configured to store the firmware read from the flash memory 37. The RAM 38 also stores data used for execution of the firmware, intermediate results of the firmware execution, and so on.
The storage device for storing the storage control program further includes a magnetic storage device, an optical disk, a magneto-optical recording medium, and so on. The magnetic storage device includes a hard disk drive (HDD), and so on. The optical disk includes a digital versatile disk (DVD), a DVD-RAM, a CD-ROM/RW, and so on. The magneto-optical recording medium includes a magneto-optical disk (MO), and so on.
When distributing the storage control program, a portable recording medium such as, for example, a DVD or a CD-ROM in which the storage control program is recorded is marketed. The storage control program may be stored in a storage device of a server computer and transferred from the server computer to the control module 3 via a network.
The control module 3 stores, for example, a storage control program recorded in a portable recording medium or a storage control program transferred from a server computer into the flash memory 37. Then, the MPU 36 reads the storage control program from the flash memory 37 and executes the processing according to the storage control program. The MPU 36 may read the storage control program directly from a portable recording medium and execute the processing according to the storage control program.
As described above, in the embodiment, the normal identifier and the buddy identifier are associated with each other for each of the segment groups 46 in the mirror LUN management table 40b, and the identifier of the control module 3 and the state information are associated with each other in the module management table 40a. Then, based on the mirror LUN management table 40b and the module management table 40a, the volume processing unit 47 determines a control module 3 performing the processing of the mirror LUN and issues the I/O to the determined control module 3. Therefore, even when a failure occurs in a control module 3, the storage system 1 easily switches the control module 3 performing the processing of the mirror LUN 44 without relying on the configuration information managed by the manager 43. This reduces processing time in the storage system 1 when a failure occurs.
In the embodiment, the agent 42 updates the module management table 40a based on the down notice from the cluster 41. Therefore, the failure state of the control module 3 is promptly reflected in the module management table 40a in the storage system 1.
In the embodiment, the agent 42 of the buddy control module 3 performs the resynchronization processing based on the down notice from the cluster 41. Therefore, even when a failure occurs, synchronization of the mirror data is maintained in the storage system 1.
In the embodiment, when a failure has occurred in a control module 3 performing the processing of the mirror LUN 44, the volume processing unit 47 receives the I/O error and re-issues the I/O to the buddy control module 3. Therefore, even when a control module 3 performing the processing of the mirror LUN 44 goes down after the I/O has been issued thereto, the issued I/O is reliably processed in the storage system 1.
In the embodiment, when the failed control module 3 is restored, the agent 42 updates the module management table 40a. Therefore, the control module 3 performing the processing of the mirror LUN is easily re-coupled in the storage system 1.
In the embodiment, when the failed control module 3 is restored, the buddy control module 3 holds the I/O, and returns an error for the held I/O after the module management table 40a has been updated. Therefore, when the failed control module 3 is restored, destruction of the synchronization in the storage system 1 is avoided. Further, when the failed control module 3 is restored, the steps of processing performed by the manager 43 for maintaining the synchronization in the storage system 1 are reduced. Thus, this prevents the processing by the manager 43 from becoming a bottleneck for the failure restoration and reduces the time for failure restoration in the storage system 1.
In the embodiment, the shelf 2 comprises two control modules 3. However, the embodiment is not limited thereto, and may be applied in the same manner even when the shelf 2 comprises three or more control modules 3. Also, in the embodiment described above, one of two control modules 3 in the same shelf 2 becomes a buddy control module 3 of the other control module 3. However, the embodiment is not limited thereto, and may be applied in the same manner even when a control module 3 in another shelf 2 becomes a buddy control module 3.
The agent 42 of the buddy control module 3 may perform the resynchronization processing only when the access attribute of the segment group 46 is not Read Only and the number of mirror surfaces is two or more. The reason is that the synchronization is not destroyed when access from the volume processing unit 47 is Read Only or when the number of mirror surfaces is 1, and therefore these segment groups 46 may be excluded from the target of the resynchronization.
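The condition above may be expressed as a small predicate, sketched here in Python. The function name `needs_resync` and its parameters are hypothetical.

```python
def needs_resync(read_only: bool, mirror_surfaces: int) -> bool:
    """True only for writable segment groups that have two or more mirror surfaces."""
    return (not read_only) and mirror_surfaces >= 2


# Hypothetical segment groups: (number, read_only, mirror_surfaces).
segment_groups = [(1, False, 2), (2, True, 2), (3, False, 1)]
targets = [n for n, ro, surfaces in segment_groups if needs_resync(ro, surfaces)]
```

Filtering in this way leaves only segment group 1 as a resynchronization target in the example data.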
Even when a control module 3 is reactivated after all control modules 3 have gone down, it is desirable to perform restoration processing, since the synchronization might have been destroyed. When both the normally used control module 3 and the buddy control module 3 are reactivated simultaneously, it is desirable to perform restoration processing on only one of the control modules 3, since the synchronization is destroyed if the resynchronization is performed on both of them. The cluster 41 manages all control modules 3 which are synchronized, and therefore is not able to recognize the specific configuration of the normally used control module 3 and the buddy control module 3. Since only the manager 43 is capable of recognizing the configuration, the manager 43 performs the resynchronization processing when all control modules 3 are down.
All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the invention and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although the embodiments of the present invention have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
Number | Date | Country | Kind |
---|---|---|---|
2015-018855 | Feb 2015 | JP | national |

Number | Name | Date | Kind |
---|---|---|---|
8812800 | Usami | Aug 2014 | B2 |
20030126107 | Yamagami | Jul 2003 | A1 |
20060036904 | Yang | Feb 2006 | A1 |
20060236050 | Sugimoto et al. | Oct 2006 | A1 |
20060248304 | Hosouchi et al. | Nov 2006 | A1 |
20080320051 | Murotani et al. | Dec 2008 | A1 |
20090271582 | Ninose | Oct 2009 | A1 |
20110197040 | Oogai et al. | Aug 2011 | A1 |
20140089730 | Watanabe | Mar 2014 | A1 |
20150212897 | Kottomtharayil | Jul 2015 | A1 |

Number | Date | Country |
---|---|---|
2003-233518 | Aug 2003 | JP |
2006-285919 | Oct 2006 | JP |
2006-309638 | Nov 2006 | JP |
2009-3499 | Jan 2009 | JP |
2009-266120 | Nov 2009 | JP |
2011-164800 | Aug 2011 | JP |

Number | Date | Country |
---|---|---|
20160224446 A1 | Aug 2016 | US |