This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2012-272769, filed on Dec. 13, 2012, the entire contents of which are incorporated herein by reference.
The embodiments discussed herein are related to a storage apparatus, a control method, and a control program.
With the advent of the era of big data, techniques for “automatic hierarchization of storage”, which automatically distribute data in accordance with the characteristics of storage devices having different performances and capacities, have been attracting attention. Accordingly, demand is increasing for inexpensive, large-capacity magnetic disk units (for example, a 4 TB SATA disk). When redundant arrays of inexpensive disks (RAID) are configured using such magnetic disk units and a failure occurs in one of the magnetic disk units in operation, rebuild is carried out on a hot-spare magnetic disk unit, but the rebuild takes a long time. Here, rebuild means reconstructing data. During rebuild, the magnetic disk units have no redundancy, and thus if the rebuild continues for a long time, the risk of a RAID failure increases.
Corruption of data files due to a RAID failure or the like causes severe damage to a database. This is because if inconsistent data is written into a storage unit, a vast amount of work and time is required for identifying the cause, repairing the system, and recovering the database.
Thus, RAID compulsory restore techniques are known in which, when a RAID failure occurs, the RAID apparatus having the RAID failure is quickly brought back to an operable state. For example, in RAID5, when failures occur in two magnetic disk units and result in a RAID failure, if the second failed disk unit is restorable because its failure was temporary, RAID compulsory restore is carried out by restoring the second failed disk unit.
Also, techniques are known in which, at the time of a RAID breakdown, RAID configuration information immediately before the breakdown is stored, and if a recovery request is given by a user's operation, the RAID is compulsorily restored to the state immediately before the breakdown on the basis of the stored information (for example, refer to Japanese Laid-open Patent Publication No. 2002-373059).
Related-art techniques have been disclosed in Japanese Laid-open Patent Publication Nos. 2002-373059, 2007-52509, and 2010-134696.
However, a RAID apparatus that has been compulsorily restored has no redundancy, and thus there is a high risk that a RAID failure occurs again and data assurance is insufficient.
According to an embodiment of the present disclosure, it is desirable to improve data assurance in a RAID apparatus that has been compulsorily restored.
According to an aspect of the invention, a storage apparatus has a plurality of storage devices and a controller that controls data read from the plurality of storage devices and data write to the plurality of storage devices. The controller includes a determination unit and a restore processing unit. When a further storage device fails in a non-redundant state, which is a state of a redundant group without redundancy in which some of the plurality of storage devices have already failed, the determination unit determines whether execution of compulsory restore of the redundant group is possible on the basis of the failure causes of the failed storage devices. If the determination unit determines that the execution of compulsory restore of the redundant group is possible, the restore processing unit incorporates the plurality of storage devices, including the storage device that newly failed in the non-redundant state, into the redundant group and compulsorily restores the storage apparatus to an available state.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention, as claimed.
In the following, a detailed description is given of a storage apparatus, a control method, and a control program according to an embodiment of the present disclosure with reference to the drawings. In this regard, this embodiment does not limit the disclosed technique.
First, a description is given of a RAID apparatus according to the embodiment.
The CM 21 is a controller that controls data read from the RAID apparatus 2 and data write to the RAID apparatus 2, and includes a channel adapter (CA) 211, a CPU 212, a memory 213, and a device interface (DI) 214. The CA 211 is an interface with a host 1, which is a computer that uses the RAID apparatus 2, and accepts access requests from the host 1 and makes responses to the host 1. The CPU 212 is a central processing unit that controls the RAID apparatus 2 by executing an input/output control program stored in the memory 213. The memory 213 is a storage device that stores the input/output control program executed on the CPU 212 and data. The DI 214 is an interface with the DE 22, and instructs the DE 22 to read and write data.
The DE 22 includes four disks 221, and stores data to be used by the host 1. In this regard, a description is given here of the case where the DE 22 includes four disks 221 and constitutes RAID5 (3+1), that is to say, the case where three units store data for each stripe and one unit stores parity data. However, the DE 22 may include a number of disks 221 other than four. The disk 221 is a magnetic disk unit that uses a magnetic disk as a data recording medium.
Next, a description is given of a functional configuration of an input/output control program executed on the CPU 212.
The table storage unit 31 is a storage unit that stores data needed for controlling the RAID apparatus 2. The data stored in the table storage unit 31 is stored in the memory 213 illustrated in
Also, the table storage unit 31 stores information on slice_bitmap as SLU_TBL. Here, slice_bitmap is information indicating the areas into which data has been written in a state in which the RAID apparatus 2 lost redundancy, and represents the state of each predetermined-size area specified by a logical block address (LBA) with one bit.
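The following is a minimal sketch, in Python, of how one bit of slice_bitmap may correspond to a fixed-size LBA slice. The slice size and the helper names are assumptions for illustration and are not specified in the embodiment.

```python
# Hypothetical sketch of slice_bitmap handling; SLICE_SIZE and the helper
# names are assumptions, not values given in the embodiment.

SLICE_SIZE = 0x400  # assumed number of LBAs covered by one bit

def mark_written(slice_bitmap: int, start_lba: int, block_count: int) -> int:
    """Set to 1 every bit whose slice overlaps the written LBA range."""
    first = start_lba // SLICE_SIZE
    last = (start_lba + block_count - 1) // SLICE_SIZE
    for s in range(first, last + 1):
        slice_bitmap |= 1 << s
    return slice_bitmap

def is_hit(slice_bitmap: int, start_lba: int, block_count: int) -> bool:
    """Return True if any slice in the requested range was written while redundancy was lost."""
    first = start_lba // SLICE_SIZE
    last = (start_lba + block_count - 1) // SLICE_SIZE
    return any(slice_bitmap & (1 << s) for s in range(first, last + 1))
```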
The state management unit 32 detects failures in the disks 221 and the RAID apparatus 2, and manages the disks 221 and the RAID apparatus 2 using PLU_TBL and RLU_TBL. The states managed by the state management unit 32 include “AVAILABLE”, which indicates an available state with redundancy, “BROKEN”, which indicates a failed state, and “EXPOSED”, which indicates a state without redundancy. The states managed by the state management unit 32 also include “TEMPORARY_USE”, which indicates a RAID compulsory restore state, and so on. Also, when the state management unit 32 changes the state of the RAID apparatus 2, the state management unit 32 sends a configuration change notification to the write-back unit 35.
When the RAID apparatus 2 enters a failed state, that is to say, when the state of the RAID apparatus 2 becomes “BROKEN”, the compulsory restore unit 33 determines whether the first disk and the last disk are restorable. If they are restorable, the compulsory restore unit 33 performs compulsory restore on both of the disks. Here, the “first disk” is the disk that failed first from the state in which all the disks 221 were normal, and is also referred to as a suspected disk. The “last disk” is the disk that newly fails when the RAID apparatus 2 has no redundancy; when the last disk fails, the RAID apparatus 2 enters a failed state. In RAID5, if two disks fail, the RAID apparatus 2 enters the failed state, and thus the disk that fails second is the last disk.
In the case of a failure caused by a hardware factor, such as a compare error, it is not possible for the compulsory restore unit 33 to perform RAID compulsory restore. On the other hand, in the case of a transient failure, such as an error caused by a temporarily high load on a disk, the compulsory restore unit 33 performs RAID compulsory restore. In this regard, when the compulsory restore unit 33 performs RAID compulsory restore, the compulsory restore unit 33 changes the state of the RAID apparatus 2 to “TEMPORARY_USE”.
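The determination described above may be sketched as follows; the failure-cause categories and function names are assumptions for illustration only.

```python
# Hypothetical sketch of the restorability determination by the compulsory
# restore unit 33. The cause codes are illustrative, not actual firmware codes.

TRANSIENT_CAUSES = {"timeout_under_high_load", "temporary_link_error"}

def is_restorable(failure_cause: str) -> bool:
    # A transient failure may be compulsorily restored; a hardware-factor
    # failure such as a compare error may not.
    return failure_cause in TRANSIENT_CAUSES

def can_compulsory_restore(first_disk_cause: str, last_disk_cause: str) -> bool:
    # Both the first (suspected) disk and the last disk must be restorable
    # for the redundant group to be restored with redundancy.
    return is_restorable(first_disk_cause) and is_restorable(last_disk_cause)
```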
The staging unit 34 reads data stored in the RAID apparatus 2 on the basis of a request from the host 1. However, if the state of the RAID apparatus 2 is a state in which RAID compulsory restore has been performed, the staging unit 34 checks the value of slice_bitmap corresponding to the area from which data read is requested before the RAID apparatus 2 reads the stored data.
And if the value of slice_bitmap is “0”, the area is not an area into which data has been written when the RAID apparatus 2 lost redundancy, and thus the staging unit 34 reads the requested data from the disk 221 to respond to the host 1.
On the other hand, if the value of slice_bitmap is “1”, the staging unit 34 reads the requested data from the disk 221 to respond to the host 1, and performs data consistency processing on the area from which the data has been read. That is to say, the staging unit 34 performs data consistency processing on the area into which data was written when the RAID apparatus 2 had lost redundancy. Specifically, for that area, the staging unit 34 updates the data of the suspected disk to the latest data by using the data of the other disks for each stripe. This is because the suspected disk is the disk that failed first, and thus old data is stored in the area into which data was written when the RAID apparatus 2 had lost redundancy. In this regard, a description is given later of the details of the processing flow of the data consistency processing by the staging unit 34.
The write-back unit 35 writes data into the RAID apparatus 2 on the basis of a request from the host 1. However, if the RAID apparatus 2 is in a state without redundancy, the write-back unit 35 sets the bit corresponding to the data write area among the bits of slice_bitmap to “1”.
Also, if it is necessary to read data from the disk 221 in order to calculate a parity at the time of writing the data, the write-back unit 35 performs data consistency processing on the area into which data has been written when the RAID apparatus 2 had lost redundancy. A description is given later of the details of the processing flow of the data consistency processing by the write-back unit 35.
The control unit 36 is a processing unit that performs overall control of the input/output control program 3. Specifically, the control unit 36 performs transfer of control among the functional units, data exchange between the functional units and the storage units, and so on, so that the input/output control program 3 functions as one program.
Next, a description is given of the processing flow for performing RAID compulsory restore using
As illustrated in
And the RAID apparatus performs RAID compulsory restore (operation S3). That is to say, the RAID apparatus determines whether the last disk is restorable (operation S4). If it is not restorable, the processing is terminated with the RAID failure left as it is. On the other hand, if it is restorable, the RAID apparatus restores the last disk, and the state of the RAID apparatus is set to “RLU_EXPOSED” (operation S5).
After that, when the first disk is replaced, the RAID apparatus rebuilds the first disk, and sets the state to “RLU_AVAILABLE” (operation S6). And when the last disk is replaced, the RAID apparatus rebuilds the last disk, and sets the state to “RLU_AVAILABLE” (operation S7). Here, the reason that the RAID apparatus sets the state to “RLU_AVAILABLE” again is that the state is changed during the rebuild.
On the other hand, in the processing for performing RAID compulsory restore on the last disk and the first disk, as illustrated in
After that, the RAID apparatus 2 detects a failure in another disk 221, that is to say, a failure in the last disk, and sets the state of the RAID apparatus 2 to “RLU_BROKEN” (operation S23).
And the RAID apparatus 2 performs RAID compulsory restore (operation S24). That is to say, the RAID apparatus 2 determines whether the last disk is restorable (operation S25), and if it is not restorable, the processing is terminated with the RAID failure left as it is.
On the other hand, if it is restorable, the RAID apparatus 2 determines whether the first disk is restorable (operation S26). If the first disk is not restorable, the RAID apparatus 2 restores the last disk, and sets the state to “RLU_EXPOSED” (operation S27). After that, when the first disk is replaced, the RAID apparatus 2 rebuilds the first disk, and sets the state to “RLU_AVAILABLE” (operation S28). And when the last disk is replaced, the RAID apparatus 2 rebuilds the last disk, and sets the state to “RLU_AVAILABLE” (operation S29). Here, the reason that the RAID apparatus 2 sets the state to “RLU_AVAILABLE” again is that the state is changed during the rebuild.
On the other hand, if the first disk is restorable, the RAID apparatus 2 restores the first disk, and sets the state of the first disk to “PLU_TEMPORARY_USE” (operation S30). And the RAID apparatus 2 restores the last disk, and sets the state of the last disk to “PLU_AVAILABLE” (operation S31). And the RAID apparatus 2 sets the state of the apparatus to “RLU_TEMPORARY_USE” (operation S32).
After that, when the first disk is replaced, the RAID apparatus 2 rebuilds the first disk. Alternatively, the RAID apparatus 2 performs RAID diagnosis (operation S33). And the RAID apparatus 2 sets the state to “RLU_AVAILABLE”. And when the last disk is replaced, the RAID apparatus 2 rebuilds the last disk, and sets the state to “RLU_AVAILABLE” (operation S34). Here, the reason that the RAID apparatus 2 sets the state to “RLU_AVAILABLE” again is that the state is changed during the rebuild.
In this manner, by determining whether the first disk and the last disk are restorable or not, and restoring both of the disks if restorable, it is possible for the RAID apparatus 2 to perform RAID compulsory restore with redundancy.
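Operations S23 to S32 may be condensed into the following sketch; the raid object and its helper methods are assumptions, while the state names follow the description above.

```python
# Condensed, hypothetical sketch of operations S23 to S32. Only the state
# names come from the embodiment; the helper methods are assumed.

def raid_compulsory_restore(raid) -> None:
    raid.state = "RLU_BROKEN"                        # S23: the last disk has failed
    if not raid.last_disk_restorable():              # S25
        return                                       # keep the RAID failure as it is
    if not raid.first_disk_restorable():             # S26
        raid.restore_last_disk()
        raid.state = "RLU_EXPOSED"                   # S27: restored, but without redundancy
        return
    raid.restore_first_disk()
    raid.first_disk.state = "PLU_TEMPORARY_USE"      # S30
    raid.restore_last_disk()
    raid.last_disk.state = "PLU_AVAILABLE"           # S31
    raid.state = "RLU_TEMPORARY_USE"                 # S32: temporarily usable with redundancy
```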
Next, a description is given of state transition of the RAID apparatus.
After that, when another disk, that is to say, the last disk fails, the state of the RAID apparatus is changed to “BROKEN”, which indicates a failed state (ST13). And if the last disk is restored by RAID compulsory restore, the state of the RAID apparatus is changed to “EXPOSED”, which is a state without redundancy (ST14). After that, if the first disk is replaced, the state of the RAID apparatus is changed to “AVAILABLE”, which is a state with redundancy (ST15).
On the other hand, in the case of performing RAID compulsory restore on the last disk and the first disk, when all the disks 221 are operating normally, the state of the RAID apparatus 2 is “AVAILABLE”, which is a state with redundancy (ST21). And if one disk 221, that is to say, the first disk fails, the state of the RAID apparatus 2 is changed to “EXPOSED”, which is a state without redundancy (ST22).
After that, when another disk 221, that is to say, the last disk fails, the state of the RAID apparatus 2 is changed to “BROKEN”, which indicates a failed state (ST23). And if the last disk and the first disk are restored by RAID compulsory restore, the state of the RAID apparatus 2 is changed to “TEMPORARY_USE”, which is a state with redundancy that is allowed to be used temporarily (ST24). After that, if the first disk is replaced or RAID diagnosis is performed, the state of the RAID apparatus 2 is changed to “AVAILABLE”, which is a state with redundancy (ST25).
In this manner, by restoring the last disk and the first disk through RAID compulsory restore and changing the state to “TEMPORARY_USE”, it is possible for the RAID apparatus 2 to operate in a state with redundancy after RAID compulsory restore.
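For reference, the state transitions ST21 to ST25 may be summarized as a transition table; the event names below are assumptions for illustration.

```python
# Hypothetical transition table for ST21 to ST25. Only the state names come
# from the embodiment; the event names are assumed.

TRANSITIONS = {
    ("AVAILABLE", "first_disk_failed"): "EXPOSED",                        # ST21 -> ST22
    ("EXPOSED", "last_disk_failed"): "BROKEN",                            # ST22 -> ST23
    ("BROKEN", "compulsory_restore_of_both_disks"): "TEMPORARY_USE",      # ST23 -> ST24
    ("TEMPORARY_USE", "first_disk_replaced_or_diagnosed"): "AVAILABLE",   # ST24 -> ST25
}

def next_state(state: str, event: str) -> str:
    # Unknown events leave the state unchanged.
    return TRANSITIONS.get((state, event), state)
```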
Next, a description is given of a processing flow of write-back processing when the state of the RAID apparatus 2 is “EXPOSED”.
As illustrated in
As a result, if there is redundancy, the state of the RAID apparatus 2 is no longer “EXPOSED”, and thus the write-back unit 35 initializes slice_bitmap (operation S44). On the other hand, if there is no redundancy, the write-back unit 35 sets the bit of slice_bitmap corresponding to the write request range to “1” (operation S43).
And the write-back unit 35 performs data write processing on the disk 221 (operation S45), and makes a response of the result to the host 1 (operation S46).
In this manner, when the state of the RAID apparatus 2 is “EXPOSED”, the write-back unit 35 sets the bit of slice_bitmap corresponding to the write request range to “1”, and thus it is possible for the RAID apparatus 2 to identify the target area of the data consistency processing in the state of RAID compulsory restore.
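A sketch of this write-back path, reusing the hypothetical mark_written() helper shown earlier, may look as follows; the raid object and its methods are assumptions.

```python
# Hypothetical sketch of operations S41 to S46 (write-back while "EXPOSED").

def write_back_exposed(raid, start_lba: int, block_count: int, data: bytes) -> str:
    if raid.has_redundancy():
        raid.slice_bitmap = 0                                      # S44: no longer "EXPOSED", clear the map
    else:
        raid.slice_bitmap = mark_written(raid.slice_bitmap,
                                         start_lba, block_count)   # S43: record the degraded write
    raid.write_to_disks(start_lba, data)                           # S45: data write processing
    return "normal"                                                # S46: respond to the host 1
```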
Next, a description is given of a processing flow of staging processing after RAID compulsory restore using
As a result, if the value of slice_bitmap is “0”, the disk-read request range is not an area into which the RAID apparatus 2 performed data write in the state without redundancy, and thus the staging unit 34 performs disk read of the requested range in the same manner as before (operation S62). And the staging unit 34 makes a response of the read result to the host 1 (operation S63).
On the other hand, if the value of slice_bitmap is “1”, the disk-read request range is an area into which the RAID apparatus 2 performed data write in the state without redundancy, and thus the staging unit 34 performs disk read for each stripe corresponding to the requested range (operation S64).
For example, in
Also, it is assumed that a shaded portion of the storage data 51 is data corresponding to LBA=0x100 to 0x3FF. Also, assuming that slice_bitmap=0x01, from
And the staging unit 34 determines whether disk read is normal (operation S65). If it is normal, the processing proceeds to operation S70. On the other hand, if it is not normal, the staging unit 34 determines whether a suspected disk error has occurred (operation S66). As a result, in the case of an error on a disk other than the suspected disk, it is not possible to assure the data, and thus the staging unit 34 creates PIN data for the requested range (operation S67), and makes an abnormal response to the host 1 together with the PIN data (operation S68). Here, the PIN data is data indicating data inconsistency.
On the other hand, in the case of a suspected disk error, the staging unit 34 restores the data of the suspected disk from the other data and the parity data (operation S69). That is to say, the target area is an area into which the RAID apparatus 2 has written data in a state without redundancy, and thus the suspected disk might not store the latest data. Thus, the staging unit 34 updates the data of the suspected disk to the latest data.
For example, in
And the staging unit 34 determines whether there is data consistency by performing a compare check (operation S70). Here, the compare check checks whether all the bits of the result of performing an exclusive-OR operation on all the data of each stripe are 0. For example, in
And if there is no data consistency, the staging unit 34 restores the data of the suspected disk from the other data and the parity data in the same stripe, and updates the suspected disk (operation S71). For example, in
And the staging unit 34 sends a normal response to the host 1 together with the data (operation S72).
In this manner, if a read area is an area into which data was written in a state in which the RAID apparatus 2 had lost redundancy, the staging unit 34 performs matching processing on the suspected disk, and thus it is possible for the RAID apparatus 2 to assure the data at a higher level.
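The compare check and the repair of the suspected disk may be sketched as follows, assuming an XOR parity as in RAID5; the block layout and function names are assumptions.

```python
# Hypothetical sketch of the per-stripe compare check and suspected-disk repair.
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR over equally sized blocks (data and parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def compare_check(stripe_blocks) -> bool:
    """True when the XOR over all blocks of the stripe is all zero, i.e. consistent."""
    return all(b == 0 for b in xor_blocks(stripe_blocks))

def rebuild_suspected(stripe_blocks, suspected_index: int) -> bytes:
    """Regenerate the suspected disk's block from the other data blocks and the parity."""
    others = [b for i, b in enumerate(stripe_blocks) if i != suspected_index]
    return xor_blocks(others)
```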
Next, a description is given of the processing flow of write-back processing after RAID compulsory restore using
“Bandwidth” is the case where the data to be written into the disk has a sufficiently large size for parity calculation, and thus it is not necessary to read data from the disk for parity calculation. For example, as illustrated in
“Readband” is the case where the size of the data to be written into the disk is insufficient for parity calculation, and it is necessary to read data from the disk for parity calculation. For example, as illustrated in
“Small” is the case where, in the same manner as “Readband”, the size of the data to be written into the disk is insufficient for parity calculation, and it is necessary to read data from the disk for parity calculation. However, if the size of the data to be written into the disk is 50% or more of the data needed for parity calculation, the write-back processing is “Readband”, and if the size of the data to be written into the disk is less than 50% of the data needed for parity calculation, the write-back processing is “Small”. For example, as illustrated in
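The classification into the three kinds may be sketched as follows; stripe_data_size stands for the amount of data needed for parity calculation and is an assumed parameter.

```python
# Hypothetical sketch of classifying a write request into the write-back kinds.

def write_back_kind(write_size: int, stripe_data_size: int) -> str:
    if write_size >= stripe_data_size:
        return "Bandwidth"    # enough data: parity is calculated without disk reads
    if write_size * 2 >= stripe_data_size:
        return "Readband"     # 50% or more of the data needed for parity calculation
    return "Small"            # less than 50% of the data needed for parity calculation
```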
Referring back to
On the other hand, if the kind of write-back is not “Bandwidth”, the write-back unit 35 determines whether slice_bitmap of the disk-write requested range is hit, that is to say, whether the value of slice_bitmap is “0” or “1” (operation S85).
As a result, if slice_bitmap is not hit, that is to say, if the value of slice_bitmap is “0”, the disk-write requested range is not an area into which data is written in a state in which the RAID apparatus 2 lost redundancy, and thus the write-back unit 35 performs the same processing as before. That is to say, the write-back unit 35 creates a parity (operation S82), writes the data and the parity into the disk (operation S83), and makes a response to the host 1 (operation S84).
On the other hand, if slice_bitmap is hit, the write-back requested range is an area into which data is written in a state in which the RAID apparatus 2 lost redundancy, and thus the write-back unit 35 performs disk read for each stripe corresponding to the requested range (operation S86). Here, the case where slice_bitmap is hit is the case where the value of slice_bitmap is “1”.
For example, in
Also, it is assumed that a shaded portion of the storage data 61 is data corresponding to LBA=0x100 to 0x3FF. Also, assuming that slice_bitmap=0x01, from
And the write-back unit 35 determines whether disk read is normal (operation S87). If it is normal, the processing proceeds to operation S92. On the other hand, if it is not normal, the write-back unit 35 determines whether a suspected disk error has occurred (operation S88). As a result, in the case of an error on a disk other than the suspected disk, it is not possible to assure the data, and thus the write-back unit 35 creates PIN data for the requested range (operation S89), and makes an abnormal response to the host 1 together with the PIN data (operation S90).
On the other hand, in the case of a suspected disk error, the write-back unit 35 restores the data of the suspected disk from the other data and the parity data (operation S91). That is to say, the target area is an area into which the RAID apparatus 2 has written data in a state without redundancy, and thus the suspected disk might not store the latest data. Thus, the write-back unit 35 updates the data of the suspected disk to the latest data.
For example, in
And the write-back unit 35 determines whether there is data consistency or not by performing compare check (operation S92). For example, in
As a result, if there is data consistency, the write-back unit 35 issues disk write (operation S96) in order to write update data into the disk. And the write-back unit 35 makes a normal response to the host 1 (operation S97).
On the other hand, if there is no data consistency, the write-back unit 35 restores the data of the suspected disk from the other data and the parity data in the same stripe, and updates the suspected disk (operation S93). For example, in
And the write-back unit 35 issues disk write (operation S94), and writes the restored data and update data into the disk. For example, in
In this manner, if a write-back area is an area into which data was written in a state in which the RAID apparatus 2 had lost redundancy, the write-back unit 35 performs matching processing on the suspected disk, and thus it is possible for the RAID apparatus 2 to assure the data at a higher level.
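Folding the host's update data into a repaired stripe and recomputing the parity before the disk write may be sketched as follows, reusing the hypothetical xor_blocks() shown earlier; the function and parameter names are assumptions.

```python
# Hypothetical sketch of preparing the disk write (operation S94 or S96):
# replace one data block with the update data and recompute the XOR parity.

def apply_update_and_reparity(data_blocks, update_index: int, update_data: bytes):
    blocks = list(data_blocks)
    blocks[update_index] = update_data     # the host's update data
    new_parity = xor_blocks(blocks)        # XOR parity over the data blocks
    return blocks, new_parity
```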
As described above, in the embodiment, when the RAID apparatus 2 becomes a failed state, the compulsory restore unit 33 determines whether the first disk and the last disk are restorable or not. If they are restorable, both of the disks are compulsorily restored. Accordingly, it is possible for the RAID apparatus 2 to have redundancy after RAID compulsory restore, and thus to improve data assurance.
Also, in the embodiment, when the RAID apparatus 2 writes data in a state without redundancy, the write-back unit 35 sets the bit corresponding to the data write area among the slice_bitmap bits to “1”. When the staging unit 34 reads data, the staging unit 34 determines whether the value of the bit corresponding to the data read area among the slice_bitmap bits is “1”. If the bit is “1”, the staging unit 34 reads data for each stripe from the disks 221 and checks data consistency of the data for each stripe. If there is no consistency, the staging unit 34 restores the data of the suspected disk from the other data and the parity data. Also, when the write-back unit 35 writes data and the kind of write-back is other than “Bandwidth”, the write-back unit 35 determines whether the value of the bit corresponding to the data write area among the slice_bitmap bits is “1”. If the bit is “1”, the write-back unit 35 reads data from the disks 221 for each stripe and checks data consistency of the data for each stripe. If there is no consistency, the write-back unit 35 restores the data of the suspected disk from the other data and the parity data. Accordingly, it is possible for the RAID apparatus 2 to improve data consistency and data assurance.
In this regard, in the embodiment, a description has been mainly given of the case of RAID5. However, the present disclosure is not limited to this, and, for example, it is possible to apply the present disclosure in the same manner to a RAID apparatus having redundancy, such as RAID1, RAID1+0, or RAID6. In the case of RAID6, if two disks fail, redundancy is lost. By regarding these two disks as suspected disks, it is possible to apply the present disclosure in the same manner.
All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the invention and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although the embodiment of the present invention has been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.