System and method for raid management, reallocation, and restriping

Information

  • Patent Grant
  • 7886111
  • Patent Number
    7,886,111
  • Date Filed
    Thursday, May 24, 2007
  • Date Issued
    Tuesday, February 8, 2011
Abstract
The present disclosure relates to systems and methods for RAID Restriping. One method includes selecting an initial RAID device for migration based on at least one score, creating an alternate RAID device, moving data from the initial RAID device to the alternate RAID device, and removing the initial RAID device. The method may be performed automatically by the system or manually. The method may be performed periodically, continuously, after every RAID device migration, upon addition of disk drives, and/or before removal of disk drives, etc. One system includes a RAID subsystem and a disk manager configured to automatically calculate a score for each RAID device, select a RAID device based on the relative scores of the RAID devices, create an alternate RAID device, move data from the selected RAID device to the alternate RAID device, and remove the selected RAID device.
Description
FIELD OF THE INVENTION

The present invention relates generally to disk drive systems and methods, and more particularly to disk drive systems and methods having dynamic block architecture RAID Device Management, Reallocation, and Restriping for optimizing RAID Device layout when changes to RAID parameters or disk configuration occur.


BACKGROUND OF THE INVENTION

Existing disk drive systems have been designed in such a way that a Virtual Volume is distributed (or mapped) across the physical disks in a manner which is determined at volume creation time and remains static throughout the lifetime of the Virtual Volume. That is, the disk drive systems statically allocate data based on the specific location and size of the virtual volume of data storage space. Should the Virtual Volume prove inadequate for the desired data storage purposes, the existent systems require the creation of a new Virtual Volume and the concomitant copying of previously stored data from the old Virtual Volume to the new in order to change volume characteristics. This procedure is time consuming and expensive since it requires duplicate physical disk drive space.


These prior art disk drive systems need to know, monitor, and control the exact location and size of the Virtual Volume of data storage space in order to store data. In addition, the systems often need larger data storage space, whereby more RAID Devices are added. As a result, emptied data storage space is not used, and extra data storage devices, e.g. RAID Devices, are acquired in advance for storing, reading/writing, and/or recovering data in the system. Additional RAID Devices are expensive and not required until extra data storage space is actually needed.


Therefore, there is a need for improved disk drive systems and methods, and more particularly a need for efficient, dynamic RAID space and time management systems. There is a further need for improved disk drive systems and methods for allowing RAID management, reallocation, and restriping to occur without loss of server or host data access or compromised resiliency.


BRIEF SUMMARY OF THE INVENTION

The present invention, in one embodiment, is a method of RAID Restriping in a disk drive system. The method includes selecting an initial RAID device for migration based on at least one score, creating an alternate RAID device, moving data stored at the initial RAID device to the alternate RAID device, and removing the initial RAID device. The scores may include an initial score, a replacement score, and an overlay score. Furthermore, the method may be performed automatically by the system or manually, such as by a system administrator. The method may be performed periodically, continuously, after every RAID device migration, upon addition of disk drives, and/or before removal of disk drives.


The present invention, in another embodiment, is a disk drive system having a RAID subsystem and a disk manager. The disk manager is configured to automatically calculate a score for each RAID device of the RAID subsystem, select a RAID device from the subsystem based on the relative scores of the RAID devices, create an alternate RAID device, move a portion of the data stored at the selected RAID device to the alternate RAID device, and remove the selected RAID device.


The present invention, in yet another embodiment, is a disk drive system including means for selecting a RAID device for migration based on at least one score calculated for each RAID device, means for creating at least one alternate RAID device, means for moving data stored at the selected RAID device to the at least one alternate RAID device, and means for removing the selected RAID device.


While multiple embodiments are disclosed, still other embodiments of the present invention will become apparent to those skilled in the art from the following detailed description, which shows and describes illustrative embodiments of the invention. As will be realized, the invention is capable of modifications in various obvious aspects, all without departing from the spirit and scope of the present invention. Accordingly, the drawings and detailed description are to be regarded as illustrative in nature and not restrictive.





BRIEF DESCRIPTION OF THE DRAWINGS

While the specification concludes with claims particularly pointing out and distinctly claiming the subject matter that is regarded as forming the present invention, it is believed that the invention will be better understood from the following description taken in conjunction with the accompanying Figures, in which:



FIG. 1A is a disk array having a RAID configuration in accordance with one embodiment of the present invention.

FIG. 1B is the disk array of FIG. 1A having an additional RAID Device.

FIG. 1C is the disk array of FIG. 1B after removing a RAID Device.

FIG. 2A is a disk array having a RAID configuration in accordance with another embodiment of the present invention.

FIG. 2B is the disk array of FIG. 2A having an additional RAID Device.

FIG. 2C is the disk array of FIG. 2B after removing a RAID Device and adding another RAID Device.

FIG. 2D is the disk array of FIG. 2C after removing yet another RAID device.

FIG. 3A is a disk array having a RAID configuration in accordance with a further embodiment of the present invention.

FIG. 3B is the disk array of FIG. 3A illustrating migration of a RAID Device.

FIG. 3C is the disk array of FIG. 3B illustrating further migration of multiple RAID Devices.

FIG. 3D is the disk array of FIG. 3C illustrating yet further migration of multiple RAID Devices.

FIG. 3E is the disk array of FIG. 3A in a new RAID configuration.

FIG. 4 is a flow chart of a process of Restriping in accordance with one embodiment of the present invention.

FIG. 5 is a disk array having a RAID configuration with different-sized RAID Devices in accordance with one embodiment of the present invention.





DETAILED DESCRIPTION

Various embodiments of the present invention relate generally to disk drive systems and methods, and more particularly to disk drive systems and methods which implement one or more Virtual Volumes spread across one or more RAID Devices, which in turn are constructed upon a set of disk drives. RAID Device Management, Reallocation, and Restriping (“Restriping”) provides a system and method for changing the various properties associated with a Virtual Volume such as size, data protection level, relative cost, access speed, etc. This system and method may be initiated by administration action or automatically when changes to the disk configuration occur.


The various embodiments of the present disclosure provide improved disk drive systems having a dynamic block architecture RAID Device Restriping that may optimize RAID Device layout when changes to RAID parameters or disk configuration occur. In one embodiment, the layout of RAID Devices may be primarily rebalanced when disks are added to the system. By rebalancing, virtualization performance may be improved within the system by using the maximum available disk configuration. Restriping also may provide the capability to migrate data away from a group of disks, allowing those disks to be removed from the system without loss of uptime or data protection. Further, Restriping may provide the capability to change RAID parameters giving the user the ability to tune the performance and/or storage capacity even after the data has been written. Restriping additionally may provide an improved disk drive system and method for allowing Restriping to occur without loss of server or host data access or compromised resiliency.


Various embodiments described herein improve on the existent disk drive systems in multiple ways. In one embodiment, the mapping between a Virtual Volume and the physical disk drive space may be mutable on a fine scale. In another embodiment, previously stored data may be migrated automatically in small units, and the appropriate mappings may be updated without the need for an entire duplication of physical resources. In a further embodiment, portions of a Virtual Volume which are already mapped to appropriate resources need not be migrated, reducing the time needed for reconfiguration of a Volume. In yet another embodiment, the storage system can automatically reconfigure entire groups of Virtual Volumes in parallel. Additionally, the storage system may automatically reconfigure Virtual Volumes when changes to the physical resources occur. Other advantages over prior disk drive systems will be recognized by those skilled in the art and are not limited to those listed.


Furthermore, Restriping and disk categorization may be powerful tools for administrative control of the storage system. Disk drives which, for example, are found to be from a defective manufacturing lot, may be recategorized so that migration away from these disk drives occurs. Similarly, a set of drives may be held in a “reserve” category, and later recategorized to become part of a larger in-use group. Restriping to widen the RAID Devices may gradually incorporate these additional reserve units. It is noted that several benefits may be recognized by the embodiments described herein, and the previous list of examples is not exhaustive and not limiting.


For the purposes of describing the various embodiments herein, a “Volume” may include an externally accessible container for storing computer data. In one embodiment, a container may be presented via the interconnect protocol as a contiguous array of blocks. In a further embodiment, each block may have a fixed size, traditionally 512 bytes, although other block sizes, such as 256 bytes or 1,024 bytes, may be used. Typically, supported operations performed on data at any given location may include ‘write’ (store) and ‘read’ (retrieve), although other operations, such as ‘verify,’ may also be supported. The interconnect protocol used to access Volumes may be the same as that used to access disk drives. Thus, in some embodiments, a Volume may appear and function in a manner generally identical to that of a disk drive. Volumes traditionally may be implemented as partitions of a disk drive or simple concatenations of disk drives within an array.


A “Virtual Volume,” as used herein, may include an externally accessible container for storing data which is constructed from a variety of hardware and software resources and generally may mimic the behavior of a traditional Volume. In particular, a system containing a disk drive array may present multiple Virtual Volumes which utilize non-intersecting portions of the disk array. In this type of system, the storage resources of the individual disk drives may be aggregated in an array, and subsequently partitioned into individual Volumes for use by external computers. In some embodiments, the external computers may be servers, hosts, etc.


A “RAID Device,” as used herein, may include an aggregation of disk partitions which provides concatenation and resiliency to disk drive failure. The RAID algorithms for concatenation and resiliency are well known and include such RAID levels as RAID 0, RAID 1, RAID 0+1, RAID 5, RAID 10, etc. In a given disk array, multiple RAID Devices may reside on any given set of disks. Each of these RAID Devices may employ a different RAID level, have different parameters, such as stripe size, may be spread across the individual disk drives in a different order, may occupy a different subset of the disk drives, etc. A RAID Device may be an internally accessible Virtual Volume. It may provide a contiguous array of data storage locations of a fixed size. The particular RAID parameters determine the mapping between RAID Device addresses and the data storage addresses on the disk drives. In the present disclosure, systems and methods for constructing and modifying externally accessible Virtual Volumes from RAID Devices are described that provide the improved functionality.


Virtual Volume Construction


A storage system which utilizes the present disclosure may initially construct a set of RAID Devices having various characteristics on a disk array. The RAID Devices may be logically divided into units referred to herein as “pages,” which may be many blocks in size. A typical page size may be 4,096 blocks, although in principle any page size from 1 block upward could be used. Page sizes, measured in blocks, are generally a power of 2. These pages may be managed by Virtual Volume management software. Initially, all the pages from each RAID Device may be marked as free. Pages may be dynamically allocated to Virtual Volumes on an as-needed basis. That is, pages may be allocated when it is determined that a given address is first written. Addresses that are read before being written can be given a default data value. The Virtual Volume management software may maintain the mapping between Virtual Volume addresses and pages within the RAID Devices. It is noted that a given Virtual Volume may be constructed of pages from multiple RAID Devices, which may further have differing properties.
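
The allocate-on-write behavior described above can be illustrated with a minimal sketch. It is not the disclosed implementation: the class names `RaidDevice` and `VirtualVolume`, the in-memory dictionaries, the first-fit allocation order, and the all-zero default block are assumptions made only for illustration.

```python
from dataclasses import dataclass, field

PAGE_BLOCKS = 4096                    # typical page size from the text, in blocks
BLOCK_BYTES = 512                     # traditional block size from the text
DEFAULT_BLOCK = bytes(BLOCK_BYTES)    # assumed default value for unwritten addresses


@dataclass
class RaidDevice:
    """Hypothetical stand-in for a RAID Device divided into pages."""
    name: str
    num_pages: int
    free_pages: list = field(default_factory=list)
    data: dict = field(default_factory=dict)    # (page, offset) -> block bytes

    def __post_init__(self):
        # Initially, all pages of the RAID Device are marked free.
        self.free_pages = list(range(self.num_pages))


class VirtualVolume:
    """Hypothetical Virtual Volume that maps addresses to pages on demand."""

    def __init__(self, devices):
        self.devices = devices
        self.page_map = {}    # volume page index -> (RaidDevice, device page index)

    def _allocate(self):
        # Assumed policy: take the first free page found (first fit).
        for dev in self.devices:
            if dev.free_pages:
                return dev, dev.free_pages.pop(0)
        raise RuntimeError("no free pages in any RAID Device")

    def write(self, block_addr, payload):
        vpage, offset = divmod(block_addr, PAGE_BLOCKS)
        if vpage not in self.page_map:
            # A page is allocated only when an address in it is first written.
            self.page_map[vpage] = self._allocate()
        dev, page = self.page_map[vpage]
        dev.data[(page, offset)] = payload

    def read(self, block_addr):
        vpage, offset = divmod(block_addr, PAGE_BLOCKS)
        if vpage not in self.page_map:
            # Read before write: return the default data value.
            return DEFAULT_BLOCK
        dev, page = self.page_map[vpage]
        return dev.data.get((page, offset), DEFAULT_BLOCK)
```

In this sketch a single Virtual Volume can end up with pages from several RAID Devices, because the allocator simply moves on to the next device when one has no free pages.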


Extending the size of a Virtual Volume constructed in this manner may be accomplished by increasing the range of addresses presented to the server. The address-to-page mapping may continue with the same allocate-on-write strategy in both the previously available and extended address ranges.


The performance and resiliency properties of a given Virtual Volume may be determined in large part by the aggregate behavior of the pages allocated to that Virtual Volume. The pages inherit their properties from the RAID Device and physical disk drives on which they are constructed. Thus, in one embodiment, page migration between RAID Devices may occur in order to modify properties of a Virtual Volume, other than size. “Migration,” as used herein, may include allocating a new page, copying the previously written data from the old page to the new, updating the Virtual Volume mapping, and marking the old page as free. Traditionally, it may not be possible to convert the RAID Device properties (i.e., remap to a new RAID level, stripe size, etc.) and simultaneously leave the data in place.
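
The four steps of Migration listed above can be sketched as follows. This is an illustrative sketch under stated assumptions only: the function name `migrate_page`, the use of plain dictionaries for the mapping, the free lists, and the stored data, and the string device names are all hypothetical.

```python
def migrate_page(page_map, vpage, free_pages, storage, target_device):
    """Move one Virtual Volume page to target_device.

    page_map:   volume page index -> (device name, device page index)
    free_pages: device name -> list of free device page indexes
    storage:    (device name, device page index) -> page contents
    All three structures are hypothetical stand-ins for system state.
    """
    old_loc = page_map[vpage]
    old_device, old_page = old_loc

    # 1. Allocate a new page on the target RAID Device.
    new_loc = (target_device, free_pages[target_device].pop())

    # 2. Copy the previously written data from the old page to the new page.
    storage[new_loc] = storage[old_loc]

    # 3. Update the Virtual Volume mapping.
    page_map[vpage] = new_loc

    # 4. Mark the old page as free.
    del storage[old_loc]
    free_pages[old_device].append(old_page)
    return new_loc
```

Because only the mapping entry for the migrated page changes, pages that already reside on a suitable RAID Device need not be touched.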


There are several independent parameters which may be modified to produce different Virtual Volume properties. Several of the scenarios are outlined in detail herein. However, the scenarios described in detail herein are exemplary of various embodiments of the present disclosure and are not limiting. The present disclosure, in some embodiments, may include simultaneous modification of any or all of these parameters.


RAID Parameter Modification


For purposes of illustration, a disk array 100 containing five disks 102, 104, 106, 108, 110 is shown in FIG. 1A. It is recognized that any number of disks may be used in accordance with the various embodiments disclosed herein, and an exemplary five disk system has been randomly chosen for purposes of describing one embodiment. Initially, two RAID Devices, e.g., RAID Devices A 112 and B 114, may be constructed upon a disk array. The remaining space, if any, may be unallocated and unused. Again, it is recognized that any number of RAID Devices may be used in accordance with the various embodiments disclosed herein, and an exemplary two RAID Devices have been randomly chosen for purposes of describing one embodiment. Multiple Virtual Volumes may be constructed from the pages contained in the RAID Devices. If it is desired that the properties of a given Virtual Volume be modified, additional RAID Devices may be constructed in the remaining space and the appropriate pages migrated, as described previously.



FIG. 1B depicts an embodiment of a RAID configuration after creating a new RAID Device, e.g., RAID Device C 116, and shows the migration of data from RAID Device A 112. RAID Device C 116 may differ from RAID Device A 112 in RAID level, stripe size, or other RAID parameter, etc. In some embodiments, there may be potential for improved performance by simply relocating to a RAID Device with the same parameters but a different location on the disk drives. For example, the performance of a disk drive may vary from the inside to the outside of the physical platter, and the time for head seeking may be reduced if all data is densely located.


When the migration is complete, RAID Device A 112 may be deleted, leaving the example RAID configuration shown in FIG. 1C.


The exemplary RAID reconfiguration from that of FIG. 1A to that of FIG. 1C also demonstrates the ability to move portions of Virtual Volumes. That is, in one embodiment, rather than moving an entire Volume, portions of one or more Virtual Volumes may be migrated. This may be accomplished because a single Virtual Volume may be allocated across a plurality of RAID Devices. Similarly, the example configuration demonstrates the ability to move groups of Virtual Volumes since pages migrated from one RAID Device to another RAID Device may be allocated to a plurality of Virtual Volumes.


Adding Disk Drives


Another embodiment having a disk array 200 containing five disks 202, 204, 206, 208, 210 is illustrated in FIG. 2A, where two additional disk drives 212, 214 have been added to an existing configuration. It is recognized that any number of disks may be used in accordance with the various embodiments disclosed herein, and an exemplary five disk system has been randomly chosen for purposes of describing one embodiment. Similarly, it is recognized that any number of disks may be added in accordance with the various embodiments disclosed herein, and an exemplary two additional disks have been randomly chosen for purposes of describing one embodiment. In some embodiments, it may be desirable to reconfigure the system and spread the RAID Devices across all seven disks. However, it is recognized that the reconfigured RAID Devices do not need to be spread across all available disks. In an embodiment where the RAID Devices are spread across a plurality of disks, the total throughput of the system can be increased by utilizing more hardware in parallel. Additionally, RAID Device layout constraints may result in more efficient use of the additional disks. In particular, RAID 5 typically may require a minimum number of independent disks in order to provide resiliency. Commonly encountered RAID 5 implementations may require a minimum of five disks. Thus, it may be desirable to migrate the pages from both RAID Devices A 216 and B 218, for example, to suitable replacements that span all seven disks. A possible sequence for reconfiguration is shown in FIGS. 2B-D.


In this sequence, the wider RAID Device C 220 may be created and data from RAID Device A 216 may be migrated to RAID Device C 220. RAID Device A 216 may then be deleted, and RAID Device D 222 may be created. RAID Device D 222 may be used to relocate the data previously contained in RAID Device B 218.


In doing so, the only extra space needed on the original disk drives may be that required to create RAID Device C 220. In one embodiment of the example illustration, where no other RAID parameter changes, each extent of RAID Device C 220 may be 5/7 the size of the extents used in constructing RAID Device A 216, because RAID Device C 220 is spread among seven drives (the 5 initial drives plus the 2 additional drives) rather than five.
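
The 5/7 figure follows from spreading the same total space over more disks. A short sketch of the arithmetic is given below; the function name `extent_size_after_widening` is hypothetical, and the calculation assumes, as the text does, that no other RAID parameter changes (any change in parity overhead is ignored).

```python
from fractions import Fraction


def extent_size_after_widening(old_extent, old_disks, new_disks):
    """Per-disk extent size when the same total space is spread over more disks.

    Assumes the total space (old_extent * old_disks) is unchanged and that
    no other RAID parameter is altered.
    """
    if old_disks <= 0 or new_disks <= 0:
        raise ValueError("disk counts must be positive")
    return Fraction(old_extent) * old_disks / new_disks
```

For example, an extent of 70 units per disk on five disks corresponds to an extent of 50 units per disk on seven disks, which is 5/7 of the original extent size.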


It is noted that the process may be entirely reversible and can be used to remove one or more disk drives from a system, such as, for example, if it was desired that disks 212 and 214 be removed from the example configuration of FIG. 2D. Similarly, multiple initial RAID Devices may be migrated to a single RAID Device, or a fewer number of RAID Devices (see e.g., FIG. 3C). Furthermore, a single initial RAID Device may be migrated to a plurality of new RAID Devices (see e.g., FIG. 3B).


The previous example of one embodiment described with reference to FIGS. 2A-D demonstrates the ability to migrate data across additional disks when unused space exists on the original disk set. In some embodiments, however, there may be insufficient disk space to migrate, remove, etc. a RAID Device. Nonetheless, it may be possible to migrate data to additional disks. In such a case, disk space may be reallocated to provide the extra space needed to perform the move. If the Replacement Score, described in detail below, of a RAID Device is higher than the initial Score, a permanent RAID Device of equal size may be allocated. No additional decisions may be required. If the Overlay Score, described in detail below, of a RAID Device is higher than the initial Score, temporary space may be used. This process is detailed for one embodiment having a disk array 300 containing four disks 302, 304, 306, 308 with reference to FIGS. 3A-E, where a multi-step migration is used. Three disks 310, 312, 314 have been added to the disk array 300. It is recognized that any number of disks may be used in accordance with the various embodiments disclosed herein, and an exemplary four disk system has been randomly chosen for purposes of describing one embodiment. Similarly, it is recognized that any number of disks may be added in accordance with the various embodiments disclosed herein, and an exemplary three additional disks have been randomly chosen for purposes of describing one embodiment.
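
The choice between a permanent replacement and temporary space described above can be sketched as a simple comparison of scores. The function name `choose_strategy` and the returned labels are hypothetical, and checking the Replacement Score before the Overlay Score is an assumption; the text does not state an order.

```python
def choose_strategy(score, replacement_score, overlay_score):
    """Pick how to move a RAID Device, based on the comparisons in the text.

    Returns one of three hypothetical labels:
      "permanent": allocate a permanent RAID Device of equal size
      "temporary": use temporary space, then overlay the freed space
      "skip":      neither alternative scores higher than the present state
    """
    if replacement_score > score:
        return "permanent"
    if overlay_score > score:
        return "temporary"
    return "skip"
```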


The strategy for reconfiguring the system shown in FIG. 3A to make use of all the available disk drives in the array may include creating a temporary RAID Device or temporary RAID Devices and migrating the data from RAID Device C 320, for example, away from the original disk drives to temporary RAID Devices D 322 and E 324, for example. Temporary RAID Devices may be used in such cases where the original disk set is at, or near, capacity. In alternate embodiments, the temporary space may not need to be allocated as RAID Devices and may be used in any manner known in the art for suitably holding data. Similarly, although two temporary RAID Devices D 322 and E 324 are illustrated, it is recognized that a fewer or greater number of temporary RAID Devices may be utilized.


In one embodiment, a data progression process may manage the movement of data between the initial RAID Device and the temporary RAID Device(s), or in other cases, new permanent RAID Device(s). In further embodiments, Restriping may attempt to use the same RAID level, if available. In other embodiments, Restriping may move the data to a different RAID level.


The size of a temporary RAID Device may depend on the initial RAID Device size and available space within a page pool. The size of the temporary RAID Device may provide sufficient space, such that when the initial RAID Device is deleted, the page pool may continue to operate normally and not allocate more space. The page pool may allocate more space at a configured threshold based on the size of the page pool.


Once the data has been migrated away from RAID Device C 320, RAID Device C 320 can be deleted, providing space for a new RAID Device spanning all of the disk drives, e.g., RAID Device X 326. Deleting RAID Device C 320 may return the disk space it consumed to the free space on the disk. At this point, a disk manager may combine adjacent free space allocations into a single larger allocation to reduce fragmentation. Deleting a RAID Device may create free space across a larger number of disks than was previously available. A RAID Device with a higher Score can be created from this free space slice.
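
Combining adjacent free space allocations can be sketched as a merge of touching intervals. This is only an illustration: the function name `coalesce_free_extents` and the representation of each free allocation on a disk as a `(start_block, length)` pair are assumptions, and the extents are assumed not to overlap.

```python
def coalesce_free_extents(extents):
    """Merge free extents on one disk that are directly adjacent.

    extents: iterable of (start_block, length) pairs, assumed non-overlapping.
    Returns a sorted list of merged (start_block, length) pairs.
    """
    merged = []
    for start, length in sorted(extents):
        if merged and merged[-1][0] + merged[-1][1] == start:
            # The previous extent ends exactly where this one begins: extend it.
            prev_start, prev_length = merged[-1]
            merged[-1] = (prev_start, prev_length + length)
        else:
            merged.append((start, length))
    return merged
```

For example, free extents at blocks 0 to 99 and 100 to 149 become a single extent of 150 blocks, while an extent starting at block 200 remains separate.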


After the initial RAID Device C 320 is deleted, Restriping may create a replacement RAID Device X 326, as shown in FIG. 3C. In one embodiment, replacement RAID Device X 326 may use as many disks as possible to maximize the benefits of virtualization. Restriping may attempt to allocate a RAID Device larger than the initial RAID Device. In a further embodiment, Restriping may calculate the size of the replacement as the size of the initial RAID Device multiplied by the ratio of the Replacement or Overlay Score to the initial Score. This may create a RAID Device that uses the same amount of disk space per disk as before and may reduce fragmentation of the disk.
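
The size calculation described above can be sketched as follows. The function name `replacement_device_size` is hypothetical, and the worked numbers assume, for illustration only, that each Score equals the number of disks used.

```python
def replacement_device_size(initial_size, initial_score, candidate_score):
    """Size of the replacement RAID Device.

    candidate_score is the Replacement Score or the Overlay Score,
    whichever applies. The result is
    initial_size * (candidate_score / initial_score).
    """
    if initial_score <= 0:
        raise ValueError("initial score must be positive")
    return initial_size * candidate_score / initial_score
```

For example, an initial RAID Device of 400 units with a Score of 4 and a candidate Score of 7 yields a replacement of 700 units. If the Scores are taken to be disk counts, that is 100 units per disk both before and after, consistent with the statement that the same amount of disk space per disk is used.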


By judiciously limiting the size of the initial RAID Devices, e.g., RAID Devices A 316, B 318, and C 320, it may be possible to create RAID Device X 326 such that it can hold all the data from RAID Devices B 318 & E 324, for example, allowing the process to continue until the final configuration is achieved in FIG. 3E. That is, RAID Device Y 328 may be created, RAID Devices A 316 and D 322 may be migrated to RAID Device Y 328, and RAID Devices A 316 and D 322 may be deleted.


If a temporary RAID Device or temporary RAID Devices, e.g., RAID Devices D 322 and E 324, were created and marked as temporary, the RAID Devices may be marked for removal, as shown in FIGS. 3C-E. In an embodiment, as a part of each cycle, the temporary RAID Devices may be removed. As larger replacement RAID Devices are created, the amount of temporary space needed may decline. It is noted again that in some embodiments, a RAID Device migration may not require the allocation of temporary space to migrate or remove the data.


In one embodiment of Restriping, removal of the temporary RAID Devices may use a subset of the steps used for migration or removal of the initial RAID Device, such as the movement of data and deletion of the temporary RAID Devices.


In one embodiment, if the Score of a temporary RAID Device exceeds the Score of the initial RAID Device, the temporary RAID Device may be considered a permanent RAID Device. That is, it may not be automatically deleted as a part of the process to move a RAID Device. In further embodiments, the temporary RAID Device may be kept only if it has a sufficiently higher Score than the initial RAID Device.


Restriping may involve a number of further steps to remove an original low-scoring RAID Device and replace it with a new higher-scoring RAID Device. For example, Restriping may account for the possibility that the disks in the system are full, and have no space for another RAID Device. Restriping may trim excess space before attempting to restripe a RAID Device. Trimming excess space may free up additional disk space and increase the success rate of Restriping.


In some embodiments, Restriping may reach a deadlock. For example, the size of the temporary space may consume a portion of the space needed to move the initial RAID Device. If it becomes impossible to remove a RAID Device because all pages cannot be freed, the RAID Device may be marked as failed, and Restriping may move on to the next RAID Device that can or should be migrated.


With reference to FIG. 4, a flow chart of one embodiment of a process 400 of Restriping is described. It is recognized that FIG. 4 illustrates one embodiment, and various alternative embodiments and processes may be used in accordance with the present disclosure. First, as shown in steps 402 and 404, Restriping may determine whether there is a RAID Device that should or can be migrated, removed, etc. In one embodiment, Restriping may check all of the RAID Devices within a system and select the smallest RAID Device with the lowest relative Score. In other embodiments, Restriping may select other RAID Devices, and Restriping, as described herein, is not limited to selecting the smallest RAID Device and/or the RAID Device with the lowest Score. Generally, however, if movement of the smallest RAID Device fails, movement of a larger RAID Device is unlikely to succeed either. In an embodiment, the lowest scoring RAID Device may be determined by dividing the Replacement or Overlay Score by the initial Score. Other methods of determining the lowest scoring RAID Device are in accordance with the present disclosure, including using solely the initial Score of the RAID Devices.
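
One way to sketch the selection in steps 402 and 404 is shown below. It is illustrative only: the `Candidate` record and the function `select_for_restripe` are hypothetical, the sketch ranks by the initial Score alone (one of the options named above), it breaks ties by choosing the smaller RAID Device, and it skips devices previously marked as failed.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical summary of a RAID Device considered for Restriping."""
    name: str
    size: int
    score: float
    failed: bool = False    # set when an earlier move could not complete


def select_for_restripe(devices):
    """Return the lowest-scoring, smallest eligible device, or None."""
    eligible = [d for d in devices if not d.failed]
    if not eligible:
        return None
    # Assumed ordering: lowest Score first, then smallest size.
    return min(eligible, key=lambda d: (d.score, d.size))
```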


In addition to identifying RAID Devices for migration or removal, as shown in FIG. 4, RAID Device movement may include a plurality of steps to optimize the RAID configuration, such as, but not limited to, allocating temporary space (step 406), moving data (step 408), deleting the original RAID Device (step 410), allocating a new RAID Device (step 412), and/or deleting the temporary RAID Device (steps 414 and 416). The foregoing listing of additional steps is exemplary and RAID Device movement need not require each of the listed steps, and in some embodiments, may include further or different steps than those listed. For example, in some embodiments, temporary space may not be used, and therefore, may not be allocated.
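
The steps listed above can be laid out as an ordered plan. The sketch below is an assumption about ordering, not a reproduction of FIG. 4: the function name `plan_restripe` is hypothetical, the path without temporary space follows the create, move, remove order given in the summary, and the unnumbered move from temporary space to the new RAID Device is inferred from the description of temporary RAID Device removal.

```python
def plan_restripe(use_temporary_space):
    """Return an ordered list of step labels for moving one RAID Device."""
    if use_temporary_space:
        return [
            "406: allocate temporary space",
            "408: move data to temporary space",
            "410: delete original RAID Device",
            "412: allocate new RAID Device",
            # Assumed, unnumbered in the text: data leaves the temporary space.
            "move data to new RAID Device",
            "414/416: delete temporary RAID Device",
        ]
    # No temporary space: the replacement is built from existing free space.
    return [
        "412: allocate new RAID Device",
        "408: move data to new RAID Device",
        "410: delete original RAID Device",
    ]
```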


In some embodiments, Restriping may limit the movements of RAID Devices. For example, to avoid thrashing the system, Restriping may not need to absolutely maximize the Score of a RAID Device. Restriping may also mark failed RAID Devices so as not to retry them.


Restriping may recognize new disks, create new RAID devices which utilize the additional spaces, and move the data accordingly. After the process is complete, user data and free space may be distributed across the total disk drives, including the initial disks and the additional disks. It is noted that Restriping may replace RAID Devices rather than extend them. It is appreciated that the positioning of free space and user allocations on any given disk may be arbitrary, and the arrangements shown in FIGS. 1-3, as well as the remaining FIG. 5, are for illustration purposes.


Selection of RAID Device for Restriping


In one embodiment, as previously discussed, Restriping may handle:

    • Adding Drives—When additional drives are added to the disk drive system, Restriping may identify RAID Devices that use a sub-optimal number of drives. New RAID Devices may be created and the data may be moved. The original RAID Devices may be eliminated.
    • Removing Drives—Restriping may detect when disk drives have been marked for removal. RAID Devices which reside on these drives may become candidates for removal, which may be accomplished in a substantially similar manner as for Adding Drives.
    • RAID Parameter Changes—RAID Parameters, such as RAID level, number of disks within a stripe, and extent size, may be altered by the user to improve performance. Restriping may compare the desired parameters against the initial parameters and select nonoptimal RAID Devices for migration and/or removal.


In some embodiments, including embodiments having larger, more complicated systems, it may not be obvious which set of migration operations should be used in order to obtain the desired final configuration or if it is possible to get from the initial configuration to the final desired configuration within the existing resources. In one embodiment, a scoring and optimization technique may be used to select the particular RAID Device for removal and replacement. The scoring function, in an exemplary embodiment, may employ one or more of the following properties:

    • RAID Devices which span more disk drives may be preferred.
    • RAID Devices which are constructed on a homogeneous set of disk drives may be preferred. Disk drives may be categorized in order to support this function and need not be identical to belong to the same category.
    • RAID Devices which match the parameters (RAID level, stripe size, disk region, etc.) of the desired final configuration may be preferred.
    • RAID Devices which place redundant data on physically disparate disk drives may be preferred. An example may include disk drives in separate enclosures, on separate communication paths, or having independent power sources.


In another embodiment, Restriping may be divided into three components, such as scoring, examining, and moving. RAID Device scoring may be used to determine the quality of a given RAID Device based on requested parameters and disk space available. In one embodiment, scoring may generate three values. Restriping may provide a Score for an initial RAID Device and the scores of two possible alternative RAID Devices, referred to herein as the Replacement and Overlay Scores. Details of each score for one embodiment are described below:

    • Score—The score of the RAID Device in its present state. In one possible embodiment, the Score may be the number of disks used by the RAID Device less fragmentation and parameter issues. See e.g., Table 1. In alternative embodiments, a Score may be calculated in any suitable manner.
    • Replacement Score—The maximum score of a RAID Device that could be constructed from existing free space. The Replacement Score may be higher than, lower than, or equal to the Score of the RAID Device.
    • Overlay Score—The maximum score of a RAID Device if the current RAID Device is removed. The Overlay Score may be higher than, lower than, or equal to the Score of the RAID Device. In some embodiments, the Overlay Score may be desired, such as in disk full conditions, to determine if a better RAID Device can be created using the space that is already allocated by the current RAID Device.


With respect to the Replacement and Overlay Scores, the user accessible blocks for the RAID Device may remain the same as the number of disks changes. The three scores may provide the input parameters to develop a strategy for migrating from lower to higher scoring RAID Devices. In a particular embodiment, if the Replacement Score is higher than the initial Score, a straightforward migration like that described in FIGS. 1 and 2 may be possible. If the Overlay Score is better than the initial Score, and if sufficient free space is available for a temporary RAID Device or temporary RAID Devices, then a migration strategy outlined in FIG. 3 may be possible.
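
The decision described above can be sketched as a small function. This is a minimal illustration in Python, not the patented implementation: the names `Strategy`, `choose_strategy`, and `temp_space_available` are hypothetical, and checking the Replacement Score before the Overlay Score is an assumption, since the text does not state which takes precedence when both exceed the initial Score.

```python
from enum import Enum


class Strategy(Enum):
    NONE = "none"                    # leave the RAID Device where it is
    DIRECT = "direct"                # build in free space, as in FIGS. 1 and 2
    VIA_TEMPORARY = "via_temporary"  # go through temporary space, as in FIG. 3


def choose_strategy(score, replacement, overlay, temp_space_available):
    """Pick a migration strategy from the three scores of one RAID Device.

    Assumption: the direct migration is tried first when both alternatives
    beat the initial Score.
    """
    if replacement > score:
        return Strategy.DIRECT
    if overlay > score and temp_space_available:
        return Strategy.VIA_TEMPORARY
    return Strategy.NONE
```

For a device with a Score of 2, a Replacement Score of 2, and an Overlay Score of 4, this sketch returns `Strategy.VIA_TEMPORARY` only when temporary space is available, and `Strategy.NONE` otherwise.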


In one embodiment, factors used to determine the Scores may include one or more of the following:

    • Disk Folder—If a RAID Device uses a disk outside of the specified folder, the score of the RAID Device may be lowered. This situation may occur due to administrative action or during sparing, i.e., wherein spare disks may be supplied.
    • Disk Type—If a RAID Device resides on a disk of the wrong type, the score of the RAID Device may be lowered. This situation may occur if a disk fails and a spare of the appropriate type is not available. In such a case, redundancy may be maintained over ‘type purity,’ and a spare of another disk type may be used.
    • Number of Disks Used—In general, wider RAID Devices (e.g., RAID Devices spanning a relatively larger number of disk drives) may be preferred, and the RAID Devices may be given a higher score. In a further embodiment, a maximum width may be considered in order to establish more independent fault domains.
    • Number of Disks Available—This may be used to determine the Replacement and Overlay Scores of the RAID Device. In one embodiment, disks, excluding spare disks, may be checked for sufficient space to allocate a replacement RAID Device. In other embodiments, spare disks may be included in the determination of the Scores. If space exists, the Replacement and Overlay Score may be increased.
    • Disk Fragmentation—If free disk space exists around the RAID Device, the RAID Device score may be lowered. This may be an indication to Restriping that by migrating the RAID Device, disk fragmentation may be reduced.
    • RAID Level—Whether the desired RAID level matches the initial RAID level. This may include the number of disks within a stripe.
    • RAID Extent Size—Whether the extent size of the desired RAID Device, i.e., blocks per disk, matches the extent size of the initial RAID Device. In one embodiment, a determination based on RAID extent size may only lower the score of the RAID Device.


Table 1 illustrates an example embodiment of scoring factors that may be used. As illustrated in Table 1, the variables may include Disks In Class, Disks In Folder, RAID Level, RAID Repeat Factor, RAID Extent Size, and RAID Drives in Stripe. Disks In Class, as used in the example scoring factors, may be determined by the equation:

(DisksInClass−3*DisksOutOfClass)*DisksInClassConstant

where DisksInClass may be the number of disks used by the RAID Device that are of the proper class, DisksOutOfClass may be the number of disks used by the RAID Device that are not of the proper class, and DisksInClassConstant may be a multiplicative constant value. Disk classes may include, but are not limited to, 15K FC, 10K FC, SATA, etc. For example, if a RAID Device was supposed to use 10K FC disks, but included two SATA disks, the value for DisksOutOfClass would be two. Disks In Folder, as used in the example scoring factors, may be determined by the equation:

(DisksInFolder−3*DisksOutOfFolder)*DisksInFolderConstant

where DisksInFolder may be the number of disks used by the RAID Device that are in the proper folder of disks, DisksOutOfFolder may be the number of disks used by the RAID Device that are not in the proper folder of disks, and DisksInFolderConstant may be a multiplicative constant value. Disk folders may organize which disks can be used by RAID Devices. Disks may be moved into, and out of, folder objects at any time to change their usage. RAID Level, as used in the example scoring factors, may be zero if the disk is an undesired RAID level. RAID Repeat Factor, RAID Extent Size, and RAID Drives in Stripe may be a computed score of each divided by a factor of two. It is recognized that Table 1 illustrates one embodiment of example scoring factors and one embodiment of how the scoring factors are calculated and used. The example illustrated in Table 1 is for illustration purposes only and is not limiting. Any scoring factors, or group of scoring factors, may be used with the various embodiments disclosed herein. Furthermore, the scoring factors, or group of scoring factors, may be calculated or used in any suitable manner.









TABLE 1
Example RAID Scoring Factors

Variable                 Score
Disks In Class           (DisksInClass − 3 * DisksOutOfClass) * DisksInClassConstant
Disks In Folder          (DisksInFolder − 3 * DisksOutOfFolder) * DisksInFolderConstant
RAID Level               Zero if wrong type
RAID Repeat Factor       Computed Score divided by two
RAID Extent Size         Computed Score divided by two
RAID Drives in Stripe    Computed Score divided by two
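
The Disks In Class and Disks In Folder entries share one form, which can be sketched in Python as below. The function name `balance_term`, the constant value of 10, and the assumption that the device spans six disks in total are hypothetical choices made for illustration. The text gives neither the constants nor the way the individual factors are combined into a final Score, so the sketch does not attempt a total.

```python
def balance_term(disks_in: int, disks_out: int, constant: float) -> float:
    """Return (disks_in - 3 * disks_out) * constant.

    This is the form shared by the Disks In Class and Disks In Folder
    equations; each misplaced disk cancels three correctly placed ones.
    """
    return (disks_in - 3 * disks_out) * constant


# Hypothetical values: a device meant for 10K FC disks that spans six
# disks, two of which are SATA (DisksOutOfClass = 2, DisksInClass = 4).
DISKS_IN_CLASS_CONSTANT = 10  # assumed; the text does not give a value
class_term = balance_term(disks_in=4, disks_out=2,
                          constant=DISKS_IN_CLASS_CONSTANT)
print(class_term)  # (4 - 3 * 2) * 10 = -20
```

Because of the factor of three, the term turns negative once more than a quarter of the device's disks are out of class or out of folder.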









In a further embodiment, Restriping may examine the Scores of the RAID Devices to determine which, if any, RAID Devices may be moved. Restriping may move RAID Devices with a Score that is lower than either the Replacement Score or the Overlay Score. That is, in one embodiment, if the Replacement and/or Overlay Score is greater than the initial RAID Device Score, the RAID Device may be a candidate to move. In other embodiments, the initial RAID Devices may be selected for migration by any other means, including situations wherein the initial RAID Device Score is higher than the Replacement and Overlay Scores or by manual selection by a user, etc. Restriping may also determine that no RAID Devices should be moved. In a further embodiment, Restriping may pick a single RAID Device from the available RAID Devices to migrate.


If Restriping identifies a RAID Device to move, migration of the RAID Device may occur. In one embodiment, migration may include determining necessary temporary space, movement of data from the RAID Device, cleanup of the initial RAID Device, and elimination of the temporary space. In another embodiment, a dynamic block architecture page pool may use the RAID Devices and handle the movement of data from lower scoring to higher scoring RAID Devices.


In another embodiment, Restriping may further reevaluate the scores of all RAID Devices after every RAID Device migration since the reallocation of disk space may change the Scores of other RAID Devices. In a further embodiment, the scores of all the RAID Devices may be periodically computed. In some embodiments, Restriping may continually compute the Scores of the RAID Devices. In yet another embodiment, the largest gain in score may be used to select a RAID Device for removal and replacement. A hysteresis mechanism may be used to prevent the process from becoming cyclic.
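
Selection by largest gain can be sketched as follows. The text does not specify the hysteresis mechanism, so this Python sketch assumes one simple form: a device is eligible only if its gain reaches a minimum threshold, which keeps small score differences from triggering back-and-forth migrations. The names `DeviceScores`, `gain`, `pick_device`, and `min_gain` are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class DeviceScores(NamedTuple):
    name: str
    score: float        # Score of the device in its present state
    replacement: float  # best score obtainable from existing free space
    overlay: float      # best score obtainable if this device is removed


def gain(device: DeviceScores) -> float:
    """Improvement of the better alternative over the present Score."""
    return max(device.replacement, device.overlay) - device.score


def pick_device(devices: Sequence[DeviceScores],
                min_gain: float = 1.0) -> Optional[DeviceScores]:
    """Return the device with the largest gain, or None if none qualifies.

    Assumed hysteresis: gains below min_gain are ignored.
    """
    eligible = [d for d in devices if gain(d) >= min_gain]
    return max(eligible, key=gain, default=None)
```

After each migration the scores would be recomputed and `pick_device` called again, until it returns `None`.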


RAID Device scoring may also handle different-sized disk drives. FIG. 5 illustrates an example configuration 500 with different-sized disks 502, 504, 506, 508. Table 2 illustrates an example RAID Device scoring, for the configuration shown in FIG. 5, including the scoring information for the RAID Devices 510, 512, 514, 516, 518 based on the configuration 500. Relative numbers are used for simplicity. Although Table 2 illustrates scores relating to RAID Device candidates for migration, Table 2 is not limiting and any scoring combination may result in marking a RAID Device for migration or no migration.









TABLE 2
RAID Example Scoring

Device    Score    Replacement    Overlay    Restripe
P 510     4        0              4          No, at maximum
Q 512     4        0              4          No, at maximum
R 514     2        2              4          Yes, 50% of maximum
S 516     3        1              4          Yes, 75% of maximum
T 518     2        2              2          No, at maximum; no space on smaller disks
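
The Restripe column of Table 2 is consistent with the rule that a RAID Device is a candidate when its Replacement or Overlay Score exceeds its Score. The Python sketch below applies that rule to the table's values. The function names are hypothetical, and reading "% of maximum" as the Score divided by the best of the three scores is an assumption that happens to reproduce the table's figures.

```python
# (Score, Replacement, Overlay) for each RAID Device in Table 2.
TABLE_2 = {
    "P 510": (4, 0, 4),
    "Q 512": (4, 0, 4),
    "R 514": (2, 2, 4),
    "S 516": (3, 1, 4),
    "T 518": (2, 2, 2),
}


def should_restripe(score, replacement, overlay):
    """True when either alternative scores higher than the present device."""
    return replacement > score or overlay > score


def percent_of_maximum(score, replacement, overlay):
    """Present Score as a percentage of the best achievable score (assumed)."""
    return 100 * score / max(score, replacement, overlay)


for name, scores in TABLE_2.items():
    verdict = "Yes" if should_restripe(*scores) else "No"
    print(f"{name}: {verdict}, {percent_of_maximum(*scores):.0f}% of maximum")
```

Under this reading, only R 514 and S 516 are marked for restriping, at 50% and 75% of maximum respectively; the other devices are already at 100%.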









From the above description and drawings, it will be understood by those of ordinary skill in the art that the particular embodiments shown and described are for purposes of illustration only and are not intended to limit the scope of the present invention. Those of ordinary skill in the art will recognize that the present invention may be embodied in other specific forms without departing from its spirit or essential characteristics. References to details of particular embodiments are not intended to limit the scope of the invention.


Although the present invention has been described with reference to preferred embodiments, persons skilled in the art will recognize that changes may be made in form and detail without departing from the spirit and scope of the invention.

Claims
  • 1. A method of RAID restriping in a disk drive system, comprising: selecting a RAID device for migration from a plurality of RAID devices based on a comparison between an initial score and at least one second score calculated for each of the plurality of RAID devices, wherein: the initial score relates to the RAID device in its present state and is calculated based on one or more scoring factors; and the second score relates to at least one hypothetical RAID device located in available disk space and is calculated based on one or more scoring factors; creating at least one alternate RAID device based on the at least one hypothetical RAID device; moving data stored at the selected RAID device to the at least one alternate RAID device; and removing the selected RAID device.
  • 2. The method of claim 1, wherein the at least one second score comprises a replacement score relating to at least one hypothetical RAID device located in existing available disk space.
  • 3. The method of claim 1, wherein the at least one second score comprises an overlay score relating to at least one hypothetical RAID device located in a combination of existing available disk space and at least a portion of the disk space taken up by the RAID device.
  • 4. The method of claim 1, wherein the one of one or more scoring factors comprise one or more of the RAID level, RAID stripe size, RAID extent size, disk category, location on disk, disk enclosure, disk enclosure power supply, and communication path to the disk.
  • 5. The method of claim 4, wherein the factors have varying weights for use in the calculation.
  • 6. The method of claim 4, wherein selecting a RAID device for migration based on a comparison between an initial score and at least one second score calculated for each of a plurality of RAID devices comprises selecting the RAID device if the at least one second score is better than the initial score.
  • 7. The method of claim 1, wherein the initial score and the at least one second score are each calculated using the same scoring factors.
  • 8. The method of claim 1, wherein the steps of selecting a RAID device for migration, creating at least one alternate RAID device, moving data, and removing the selected RAID device are done automatically without manual intervention.
  • 9. The method of claim 8, wherein the steps are performed without loss of server data access to the disk drive system and compromised resiliency of the data.
  • 10. The method of claim 8, wherein the steps are performed at least one of periodically, continuously, after every RAID device migration, upon addition of disk drives, and before removal of disk drives.
  • 11. The method of claim 1, wherein the steps of selecting a RAID device for migration, creating at least one alternate RAID device, moving data, and removing the selected RAID device are done manually.
  • 12. The method of claim 1, wherein moving data stored at the selected RAID device to the at least one alternate RAID device further comprises creating at least one temporary RAID device.
  • 13. The method of claim 12, further comprising moving data stored at the selected RAID device to the at least one temporary RAID device and then from the temporary RAID device to the at least one alternate RAID device.
  • 14. A disk drive system, comprising: a RAID subsystem; and a disk manager having at least one disk storage system controller configured to automatically: select a RAID device from the plurality of RAID devices based on a comparison between an initial score and at least one second score calculated for the plurality of RAID devices, wherein the initial score relates to the RAID device in its present state and is calculated based on one or more scoring factors and the second score relates to at least one hypothetical RAID device located in available disk space and is calculated based on one or more scoring factors; create an alternate RAID device based on the at least one hypothetical RAID device; move at least a portion of the data stored at the selected RAID device to the alternate RAID device; and remove the selected RAID device.
  • 15. The disk drive system of claim 14, wherein the at least one second score comprises an overlay score related to at least one second hypothetical RAID device located in a combination of existing available disk space and at least a portion of the disk space taken up by the RAID device.
  • 16. The disk drive system of claim 15, wherein the at least one second alternate RAID device is based on one of the first and second hypothetical RAID devices.
  • 17. The disk drive system of claim 14, wherein the disk drive system comprises storage space from at least one of a plurality of RAID levels including RAID-0, RAID-1, RAID-5, and RAID-10.
  • 18. The system of claim 17, further comprising RAID levels including RAID-3, RAID-4, RAID-6, and RAID-7.
CROSS-REFERENCE TO RELATED APPLICATION(S)

This application claims priority to U.S. provisional patent application Ser. No. 60/808,045, filed May 24, 2006, which is incorporated herein by reference in its entirety.

Related Publications (1)
Number Date Country
20080109601 A1 May 2008 US
Provisional Applications (1)
Number Date Country
60808045 May 2006 US