The present invention relates to the field of Redundant Arrays of Inexpensive Disks (RAID) storage systems and, more particularly, to optimizing the reconstruction of the contents of a component drive in a RAID system following its failure.
Redundant Arrays of Inexpensive Disks (RAID) have become effective tools for maintaining data within current computer system architectures. A RAID system utilizes an array of small, inexpensive hard disks capable of replicating or sharing data among the various drives. A detailed description of the different RAID levels is disclosed by Patterson, et al. in “A Case for Redundant Arrays of Inexpensive Disks (RAID),” ACM SIGMOD Conference, June 1988. This article is incorporated by reference herein.
Several different levels of RAID implementation exist. The simplest array, RAID level 1, comprises one or more primary disks for data storage and an equal number of additional “mirror” disks for storing a copy of all the information contained on the data disks. The remaining RAID levels, 2, 3, 4, 5 and 6, all divide contiguous data into pieces for storage across the various disks.
RAID level 2, 3, 4, 5 and 6 systems distribute this data across the various disks in blocks. A block is composed of multiple consecutive sectors. A sector, the disk drive's minimal unit of data transfer, is a physical section of a disk drive and comprises a collection of bytes. When a data block is written to a disk, it is assigned a Disk Block Number (DBN). All RAID disks maintain the same DBN system, so one block on each disk will have a given DBN. The collection of blocks across the various disks that have the same DBN is known as a stripe.
Additionally, many of today's operating systems manage the allocation of space on mass storage devices by partitioning this space into volumes. The term volume refers to a logical grouping of physical storage space elements which are spread across multiple disks and associated disk drives, as in a RAID system. Volumes are part of an abstraction which permits a logical view of storage as opposed to a physical view of storage. As such, most operating systems see volumes as if they were independent disk drives. Volumes are created and maintained by Volume Management Software. A volume group comprises a collection of distinct volumes that reside on a common set of drives.
One of the major advantages of a RAID system is its ability to reconstruct the data of a failed component disk from information contained on the remaining operational disks. In RAID levels 3, 4, 5 and 6, redundancy is achieved by the use of parity blocks. The data contained in a parity block of a given stripe is the result of a calculation carried out each time a write occurs to a data block in that stripe. The following equation is commonly used to calculate the next state of a given parity block:
new parity block = (old data block XOR new data block) XOR old parity block
The storage location of this parity block varies between RAID levels. RAID levels 3 and 4 utilize a specific disk dedicated solely to the storage of parity blocks. RAID levels 5 and 6 interleave the parity blocks across all of the various disks. RAID level 6 distinguishes itself in that it has two parity blocks per stripe and can therefore tolerate the simultaneous failure of two disks. If a given disk in the array fails, the data and parity blocks for a given stripe contained on the remaining disks can be combined to reconstruct the missing data.
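The parity update and the reconstruction of a missing block can be illustrated with a minimal Python sketch. It assumes single-parity stripes in which the parity block is the bytewise XOR of all data blocks in the stripe; the second parity block of RAID level 6 is computed differently and is not covered. The function names xor_blocks, update_parity and reconstruct_missing are hypothetical and chosen only for illustration.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Return the bytewise XOR of one or more equally sized blocks."""
    if not blocks:
        raise ValueError("at least one block is required")
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def update_parity(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    """new parity = (old data XOR new data) XOR old parity."""
    return xor_blocks(xor_blocks(old_data, new_data), old_parity)


def reconstruct_missing(surviving_blocks: list[bytes]) -> bytes:
    """Rebuild the one missing block of a stripe from all surviving blocks
    (data and parity) of that stripe."""
    return xor_blocks(*surviving_blocks)


# Illustrative stripe with three data blocks and one parity block.
d0 = bytes([1, 2, 3, 4])
d1 = bytes([16, 32, 48, 64])
d2 = bytes([255, 0, 170, 85])
parity = xor_blocks(d0, d1, d2)

# A write replaces d1; the parity is updated without reading d0 or d2.
new_d1 = bytes([9, 9, 9, 9])
new_parity = update_parity(d1, new_d1, parity)

# The disk holding d2 fails; its block is rebuilt from the survivors.
rebuilt_d2 = reconstruct_missing([d0, new_d1, new_parity])
```

In this sketch the incremental update yields the same parity as recomputing it from every data block in the stripe, which is why a write needs to read only the old data block and the old parity block.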
One mechanism for dealing with the failure of a single disk in a RAID system is the integration of a global hot spare disk. A global hot spare disk is a disk or group of disks used to replace a failed primary disk in a RAID configuration. The equipment is powered on or considered “hot,” but is not actively functioning in the system. When a single disk in a RAID system (or up to two disks in a RAID 6 system) fails, the global hot spare disk integrates for the failed disk and reconstructs all the volume pieces of the failed disk using the data blocks and parity blocks from the remaining operational disks. Once this data is reconstructed, the global hot spare disk may function as a component disk of the RAID system until a replacement for the failed RAID disk is inserted into the RAID. When the failed primary disk is replaced, a copyback of the reconstructed data from the global hot spare to the replacement disk may occur.
Currently, when a component disk in a non-RAID 0 system fails and a replacement for that component disk is inserted into the RAID prior to completion of the reconstruction of all volume pieces from the failed disk, the global hot spare disk remains integrated for the failed disk and the reconstruction of all volume pieces from the failed disk is directed to the global hot spare disk. This approach needlessly reconstructs and copies back volume pieces which had not yet begun the reconstruction process when the replacement drive was inserted.
Therefore, it would be desirable to provide a system and a method for reconstruction and copyback of a failed disk in a RAID using a global hot spare disk where only the volume pieces of the failed disk whose reconstruction had begun prior to insertion of a replacement disk are reconstructed to the global hot spare and the volume pieces whose reconstruction had not yet begun upon replacement of the failed disk are reconstructed directly to the replacement disk.
Accordingly, the present invention is directed to a system and a method for optimized reconstruction and copyback of a failed RAID disk utilizing a global hot spare disk.
In a first aspect of the invention, a system for the reconstruction and copyback of a failed RAID disk utilizing a global hot spare is disclosed. The system comprises the following: a processing unit requiring mass-storage; one or more disks configured as a RAID system; an associated global hot spare disk; and interconnections linking the processing unit, the RAID and the global hot spare disk.
In a further aspect of the present invention, a method for the reconstruction and copyback of a failed disk volume utilizing a global hot spare disk is disclosed. The method includes: detecting the failure of a RAID component disk; reconstructing a portion of the data contained on the failed RAID component disk to a global hot spare disk; replacing the failed RAID component disk; reconstructing any data on the failed RAID disk not already reconstructed to the global hot spare disk to the replacement disk; and copying any reconstructed data from the global hot spare disk back to the replacement RAID component disk.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention as claimed. The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate an embodiment of the invention and together with the general description, serve to explain the principles of the invention.
The numerous advantages of the present invention may be better understood by those skilled in the art by reference to the accompanying figures in which:
Reference will now be made in detail to the presently preferred embodiments of the invention.
Should a component disk of a RAID system fail, a global hot spare disk will incorporate for the missing drive. Following the disk failure, when a processing unit makes an I/O request to one or more volumes in the RAID, the volumes which have individual volume “pieces” located on that disk transition into a “degraded” state. When one or more volumes become degraded, the system initiates a reconstruction of the degraded-volume pieces on the failed disk to the global hot spare disk so as to maintain the consistency of the data. This reconstruction is achieved by use of the data and parity information maintained on the remaining drives. Following reconstruction of any degraded volumes, the global hot spare disk operates as a component drive in the RAID in place of the failed disk with respect to the degraded volumes. Once a replacement disk for the failed disk is inserted back into the RAID, the degraded-volume pieces which have previously been reconstructed on the global hot spare disk are copied back to the replacement disk.
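The selection of degraded volumes can be sketched as follows. This is a simplified illustration only: the mapping volume_layout, the disk and volume names, and the function degraded_volumes are assumptions made for the example, and the sketch marks every affected volume at once rather than waiting for an I/O request to reach it.

```python
def degraded_volumes(volume_layout: dict[str, set[str]], failed_disk: str) -> list[str]:
    """Return the volumes that have a volume piece on the failed disk.

    volume_layout maps a volume name to the set of disks holding its pieces.
    """
    return sorted(
        volume for volume, disks in volume_layout.items() if failed_disk in disks
    )


# Hypothetical layout: three volumes spread over four component disks.
volume_layout = {
    "vol_a": {"disk0", "disk1", "disk2"},
    "vol_b": {"disk1", "disk2", "disk3"},
    "vol_c": {"disk0", "disk2", "disk3"},
}

# With disk1 failed, only vol_a and vol_b hold a piece on it.
needs_reconstruction = degraded_volumes(volume_layout, "disk1")
```

Only the pieces of the volumes returned here would need to be reconstructed to the global hot spare disk; volumes with no piece on the failed disk are unaffected.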
However, the possibility exists that, during the reconstruction of multiple degraded-volume pieces to the global hot spare disk, a replacement disk may be inserted in place of the failed disk. Should this situation arise, the system begins reconstructing those degraded-volume pieces of the failed disk not already reconstructed to the global hot spare disk directly to the replacement disk.
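The decision made when a replacement disk is inserted can be sketched as a partition of the degraded-volume pieces by reconstruction state. The types PieceState and VolumePiece and the function plan_on_replacement are hypothetical names introduced for this illustration; the sketch assumes that a piece whose reconstruction has already begun is completed on the global hot spare disk and later copied back, while a piece whose reconstruction has not begun is rebuilt directly on the replacement disk.

```python
from dataclasses import dataclass
from enum import Enum


class PieceState(Enum):
    NOT_STARTED = "not started"
    IN_PROGRESS = "in progress"
    RECONSTRUCTED = "reconstructed"


@dataclass
class VolumePiece:
    volume: str
    state: PieceState


def plan_on_replacement(pieces: list[VolumePiece]) -> tuple[list[str], list[str]]:
    """Split degraded-volume pieces at the moment a replacement disk is inserted.

    Returns (copyback, direct):
      copyback: pieces already started or finished on the hot spare, which are
                completed there and then copied back to the replacement disk.
      direct:   pieces not yet started, which are reconstructed straight onto
                the replacement disk.
    """
    copyback = [p.volume for p in pieces if p.state is not PieceState.NOT_STARTED]
    direct = [p.volume for p in pieces if p.state is PieceState.NOT_STARTED]
    return copyback, direct


# Hypothetical snapshot taken when the replacement disk is inserted.
pieces = [
    VolumePiece("vol_a", PieceState.RECONSTRUCTED),
    VolumePiece("vol_b", PieceState.IN_PROGRESS),
    VolumePiece("vol_c", PieceState.NOT_STARTED),
    VolumePiece("vol_d", PieceState.NOT_STARTED),
]
copyback, direct = plan_on_replacement(pieces)
```

Here vol_a and vol_b would be copied back from the global hot spare disk, while vol_c and vol_d would never touch it.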
This methodology shortens the amount of time required for the reconstruction/copyback process as a whole (and thus any overall system down time). A portion of the reconstruction can be carried out directly on the replacement disk, thereby avoiding the time which would be required for copyback of that data from the global hot spare to a replacement disk.
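The saving can be made concrete with a rough operation count. The sketch below assumes, purely for illustration, that every volume piece costs one reconstruction operation and, where required, one copyback operation; it ignores piece size, disk throughput and any overlap between operations. The function count_operations is a hypothetical helper, not part of any described system.

```python
def count_operations(total_pieces: int, started_before_replacement: int) -> dict[str, int]:
    """Count per-piece operations for the conventional and the optimized approach.

    Conventional: every piece is reconstructed to the hot spare and copied back.
    Optimized:    every piece is reconstructed once, but only the pieces started
                  before the replacement disk was inserted need a copyback.
    """
    if not 0 <= started_before_replacement <= total_pieces:
        raise ValueError("started_before_replacement must be between 0 and total_pieces")
    conventional = total_pieces + total_pieces
    optimized = total_pieces + started_before_replacement
    return {
        "conventional": conventional,
        "optimized": optimized,
        "copybacks_avoided": conventional - optimized,
    }


# Example: 10 degraded-volume pieces, 4 of them started before replacement.
counts = count_operations(10, 4)
```

Under these assumptions the example needs 14 operations instead of 20, and the number of avoided copybacks equals the number of pieces reconstructed directly to the replacement disk.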
This methodology also reduces the amount of time that a global hot spare is dedicated to a given volume group. Because a global hot spare can be incorporated for only one failed RAID component disk at a time, it cannot handle the simultaneous failure of multiple RAID disks. As such, minimizing the amount of time that a global hot spare is used as a RAID component disk is desirable.
A system in accordance with the invention may be implemented by incorporation into the volume management software of a processing unit requiring mass-storage, as firmware in a controller for a RAID system, or as a stand alone hardware component which interfaces with a RAID system.
Additional details of the invention are provided in the examples illustrated in the accompanying drawings.
It is believed that the present invention and many of its attendant advantages will be understood from the foregoing description. It is also believed that it will be apparent that various changes may be made in the form, construction and arrangement of the components thereof without departing from the scope and spirit of the invention or without sacrificing all of its material advantages. The form hereinbefore described is merely an explanatory embodiment thereof. It is the intention of the following claims to encompass and include such changes.