The present disclosure generally relates to the field of distributed storage devices, and more particularly to a system and a method for optimizing redundancy restoration in distributed data layout environments.
Storage devices, such as RAID mirror architectures, allow data to be stored and protected from potential data loss. However, if multiple storage devices fail in the same architecture, the data may potentially be lost before the storage architecture has a chance to rebuild or restore the data.
Accordingly, an embodiment of the present disclosure describes a system for restoring data stored on a plurality of storage devices. The system may include a plurality of storage devices configured for providing data storage. The system may include a prioritization module communicatively coupled to the plurality of storage devices. The prioritization module may be configured for determining a restoration order of at least a first data portion and a second data portion when a critical data failure occurs. The system may include a restoration module coupled to the plurality of storage devices and the prioritization module, the restoration module configured for restoring at least the first data portion and the second data portion based upon the restoration order.
The present disclosure also describes a method for restoring data stored on a plurality of storage devices. The method may include analyzing a storage device failure occurring on at least one storage device, the at least one storage device included with a plurality of storage devices configured for providing data storage for at least a first data portion and a second data portion. The method may include determining a restoration order of at least the first data portion and the second data portion when a critical data failure has occurred. The method may include restoring at least the first data portion and the second data portion based upon the restoration order.
The present disclosure describes a computer-readable medium having computer-executable instructions for performing a method of restoring data stored on a plurality of storage devices. The method may include mapping at least a first virtual data chunk to a first storage device and a second virtual data chunk to a second storage device, the first storage device and the second storage device contained within a plurality of storage devices, the first virtual data chunk associated with a first virtual data slice and the second virtual data chunk associated with a second virtual data slice. The method may include detecting a failure of the first storage device. The method may include determining whether a zero drive redundancy event occurred on the first virtual data slice. The method may include restoring the first virtual data chunk to a first replacement storage device before the second virtual data chunk is restored to a second replacement storage device when the zero redundancy event occurred on the first virtual data slice.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not necessarily restrictive of the present disclosure. The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate subject matter of the disclosure. Together, the descriptions and the drawings serve to explain the principles of the disclosure.
The numerous advantages of the disclosure may be better understood by those skilled in the art by reference to the accompanying figures in which:
Reference will now be made in detail to the subject matter disclosed, which is illustrated in the accompanying drawings.
Referring generally to
The system 100 may include a virtual volume 104 (i.e. storage virtualization) formed by the plurality of storage devices 102. The virtual volume 104 may provide a computing device access to data in the virtual volume 104. The virtual volume 104 may be a virtual disk RAID set 104. The virtual volume 104 may include a plurality of virtual drives 106 (shown in
As illustrated in
The system 100 may include a restoration list 208 or restoration queue 208 of references to data chunks that need to be rebuilt to achieve full redundancy. The restoration list 208 may be communicatively coupled to the plurality of storage devices 102 and the virtual volume 104. The references may include pointers or the like. Upon a storage device 102 failing, a new restoration list 208 may be created listing the references to each data chunk that needs to be restored from the failed storage device 102. Specifically, the restoration list 208 may be created by copying from metadata a list of references to data chunks stored on the failed storage device 102. Alternatively, the restoration list 208 may be created by running the mapping algorithm on the virtual volume 104 to determine which data chunks were stored on the failed storage device 102 and creating a list of references to the data chunks based upon this determination. The restoration list 208 may reside on the network virtualization device 204.
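As a minimal sketch of the first approach above (copying references from metadata), the following Python fragment filters a chunk map for the chunks that resided on the failed storage device. The names ChunkRef, build_restoration_list and chunk_map are hypothetical and are not taken from the disclosure; the layout shown is an illustrative assumption only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkRef:
    """Hypothetical reference to one data chunk (volume, row/slice, device)."""
    volume: str
    row: int
    device: int


def build_restoration_list(chunk_map, failed_device):
    """Return references to every chunk stored on the failed device, in map order."""
    return [ref for ref in chunk_map if ref.device == failed_device]


# Assumed metadata: two rows of one volume spread over three devices.
chunk_map = [
    ChunkRef("vol0", 0, 0), ChunkRef("vol0", 0, 1),
    ChunkRef("vol0", 1, 2), ChunkRef("vol0", 1, 0),
]
restoration_list = build_restoration_list(chunk_map, failed_device=0)
```

Here the restoration list holds references only; the chunk data itself is not copied.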
The system 100 may include a prioritization module 210 communicatively coupled to the plurality of storage devices 102 and the virtual volume 104. The prioritization module 210 may determine a restoration order of the data chunks based upon a critical data failure. A critical data failure may be referred to hereinafter as a critical slice failure. A critical slice failure may occur when a data slice is at risk of being lost upon another storage device 102 failing. A critical slice failure may also be referred to as zero drive redundancy.
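One way to picture the restoration order is to sort data slices by how many further device failures each can survive, so that slices at zero drive redundancy come first. The sketch below assumes mirrored slices described by a (total copies, failed copies) pair; that representation and the function names are illustrative assumptions, not the disclosed implementation.

```python
def remaining_redundancy(total_copies, failed_copies):
    """Further device failures a slice can survive (assumes simple mirroring).

    Zero means a critical slice failure; a negative value means data is already lost.
    """
    return total_copies - failed_copies - 1


def restoration_order(slices):
    """Sort slice ids so that the slices with the least remaining redundancy come first.

    `slices` maps a hypothetical slice id to (total_copies, failed_copies).
    """
    return sorted(slices, key=lambda s: remaining_redundancy(*slices[s]))


slices = {"slice_a": (3, 1), "slice_b": (3, 2)}
order = restoration_order(slices)
```

In this example slice_b has lost two of three copies, so it is at zero drive redundancy and is ordered ahead of slice_a.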
An example of a critical slice failure, or zero drive redundancy, is depicted in
Referring generally to
An alternative embodiment of system 100 is depicted in
The storage area network 200 may further communicate with a virtual volume 104. The virtual volume 104 may communicate with a mapping algorithm. The mapping algorithm may communicate with the plurality of storage devices 102. The mapping algorithm may provide a virtual volume addressing scheme, which may allow the data chunks to be mapped from the virtual volume 104 to the plurality of storage devices 102 based on the virtual volume addressing scheme.
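The disclosure does not specify the mapping algorithm. Purely as an illustration of what a virtual volume addressing scheme could look like, the sketch below assumes a simple rotation in which each row starts one device later than the previous row. The functions map_chunk and chunks_on_device are hypothetical; the second shows how running the mapping over a volume could identify the chunks stored on a failed device.

```python
def map_chunk(row, column, num_devices):
    """Assumed rotating layout: the chunk at (row, column) lands on this device index."""
    return (row + column) % num_devices


def chunks_on_device(device, num_rows, width, num_devices):
    """Run the mapping over every (row, column) and keep the chunks on `device`."""
    return [
        (row, column)
        for row in range(num_rows)
        for column in range(width)
        if map_chunk(row, column, num_devices) == device
    ]


# Four rows, three chunks per row, five devices (all values are illustrative).
lost = chunks_on_device(device=0, num_rows=4, width=3, num_devices=5)
```

With a row width no larger than the device count, this rotation places the chunks of any one row on distinct devices.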
The prioritization module 210 and/or the restoration module 212 may be implemented as a set of computer readable code executable on a computer readable medium. The prioritization module 210 and/or the restoration module 212 may also be implemented as firmware. The prioritization module 210 and/or the restoration module 212 may also be implemented as hardware.
Referring generally to
Generally referring to
Generally referring to
Generally referring to
The drive failure flag 704 and critical restoration flag 706 are set. The drive failure flag terminates any critical restoration methods 800 from prior drive failures (prior storage device 102 failures) 812. The critical restoration flag terminates any background restoration methods 900 that were initiated from prior drive failures 914. Once the current drive failure has been characterized, a critical restoration method 800 or a background restoration method 900 is initiated. A list (restoration list 208 or restoration queue) of data chunks to restore from the newly failed drive is created 708. The restore list may be copied from storage system metadata. The restore list may also be created by running the mapping algorithm on all volumes being stored on the failed drive.
Next, a check is made to determine if there was a prior restoration in progress by checking if the restore list is empty 710. If the restore list is not empty, a previous restoration was underway. In this case, the drive failure is characterized as potentially critical, the chunk list from the newly failed drive is appended to the existing restore list 712, and a critical restoration method 800 is initiated 714. If the restore list is empty, no restoration was underway when the drive failed. In this case, the failed storage drive's data chunk list becomes the restore list 716. However, the drive failure has not yet been completely characterized.
The minimum redundancy level for all containers (volumes or objects) on the storage array (plurality of storage devices 102) is checked 718. If the minimum redundancy level is 1, a single drive failure must be treated as a critical failure; therefore, the critical restoration method 800 is entered 714. When no prior restoration was underway and the minimum redundancy level is greater than 1, all data is still protected so the drive failure can be handled using background restoration 720.
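The characterization steps 710 through 720 can be sketched as a single decision function. This is a simplified illustration: the function name and the string return values are assumptions, the flags of steps 704 and 706 are omitted, and treating a minimum redundancy level below 1 as critical is an assumption, since the description only addresses levels of 1 and greater.

```python
def characterize_drive_failure(restore_list, failed_chunks, min_redundancy):
    """Return "critical" or "background" and update restore_list in place."""
    if restore_list:                        # 710: a prior restoration was underway
        restore_list.extend(failed_chunks)  # 712: append the new chunk list
        return "critical"                   # 714
    restore_list.extend(failed_chunks)      # 716: new chunk list becomes the restore list
    if min_redundancy <= 1:                 # 718: assumption: levels below 1 also critical
        return "critical"                   # 714
    return "background"                     # 720


restore_list = []
first = characterize_drive_failure(restore_list, ["a1", "a2"], min_redundancy=2)
second = characterize_drive_failure(restore_list, ["b1"], min_redundancy=2)
```

The first failure is handled in the background because all data remains protected; the second arrives while the restore list is non-empty and is therefore treated as potentially critical.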
The critical restoration method 800 is entered after a drive failure has been categorized as potentially critical by the drive failure characterization method 700. The restore list is searched for cases where the set of failed drives contains multiple data chunks from the same volume and row. If the number of failed data chunks on a row is less than the volume's maximum redundancy, the restoration of those data chunks is left for the background restoration method 900. If the number of failed data chunks on a row is equal to the volume's maximum redundancy, a critical restoration is started on one chunk from that row. If the number of failed data chunks on a row is greater than the volume's maximum redundancy, a data loss has occurred and the volume is taken off line for further recovery.
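The three-way comparison described above can be illustrated by counting restore-list entries per volume and row. In this sketch each restore-list entry is assumed to be a (volume, row) pair for one failed data chunk, and max_redundancy is an assumed mapping from volume to its maximum redundancy; both representations are hypothetical.

```python
from collections import Counter


def classify_rows(restore_list, max_redundancy):
    """Label each (volume, row) as "background", "critical" or "data_loss"."""
    labels = {}
    for (volume, row), failed in Counter(restore_list).items():
        limit = max_redundancy[volume]
        if failed < limit:
            labels[(volume, row)] = "background"  # still protected
        elif failed == limit:
            labels[(volume, row)] = "critical"    # zero drive redundancy
        else:
            labels[(volume, row)] = "data_loss"   # volume taken off line
    return labels


restore_list = [("v", 0), ("v", 1), ("v", 1), ("v", 2), ("v", 2), ("v", 2)]
labels = classify_rows(restore_list, {"v": 2})
```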
Generally referring to
A check is made on all data chunks in the row to see if any are currently pending restoration 814. If any of the data chunks are queued for restore, all data chunks in the row are skipped by moving over them in the restore list 822. Otherwise, at least one data chunk for the row must be restored as a critical restoration. The first data chunk in the row is queued for critical restore 818 and the data chunk is marked as pending restoration 816. Finally, all remaining data chunks in the row are skipped by moving over them in the restore list 822.
Next, checks are made to determine if more critical restores should be queued. The drive failure flag is checked to determine if a new drive failure has occurred 820. Note that an additional drive failure does not imply data loss since it may only contain data chunks from rows with at least single drive redundancy. If the drive failure flag is set, the current critical restoration method 800 is terminated so the next restoration method 800 can start with the added data chunks from the newly failed drive 830. The end of list condition is checked on the restore list 806. If the end of list has been reached, i.e., all rows with critical restorations have been located, the critical restoration method 800 ends and the background restoration method 900 begins 824.
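The per-row handling of steps 814 through 830 can be sketched as a loop over the rows already found to be at zero redundancy. The input shape (one list of chunk ids per critical row), the callable used to poll the drive failure flag, and the returned status strings are assumptions made for illustration.

```python
def queue_critical_restores(critical_rows, pending, drive_failure_flag=lambda: False):
    """Queue one critical restore per row; return (queued chunk ids, next step)."""
    queued = []
    for row_chunks in critical_rows:
        if not any(chunk in pending for chunk in row_chunks):  # 814
            first = row_chunks[0]
            pending.add(first)        # 816: mark as pending restoration
            queued.append(first)      # 818: queue for critical restore
        # 822: remaining chunks in the row are skipped either way
        if drive_failure_flag():      # 820: a new drive failure has occurred
            return queued, "restart"  # 830: terminate so the next method can start
    return queued, "background"       # 806/824: end of list reached


pending = {"b2"}
queued, next_step = queue_critical_restores([["a1", "a2"], ["b1", "b2"]], pending)
```

Row b is skipped because one of its chunks is already pending restoration, so only a1 is queued before the method hands over to background restoration.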
The background restoration method 900 is entered after all critical restores have been queued. In background restoration, system performance takes precedence over restore because data is no longer at risk. In background restoration the restore list is processed in the order in which drives failed. When a maximum number of restores are queued, the method 900 suspends. The method 900 is restarted after each restore is completed so additional restores may be queued until the restore list is completely processed. The limit on queued restores can be changed at any time to control the amount of system performance allocated to background restoration.
Initially, a background restoration method 900 begins at the end of the drive failure characterization method 720 or at the end of a critical restoration method 824. In either of these cases, the critical restoration flag 906 and the drive failure flag 908 are cleared. The background restoration method 900 is restarted when a previously queued restore completes 902. In this case, the restored data chunk is removed from the restore list 904.
If the current location in the restore list is undefined, the current location is set to the head of the list 910. The background restoration method 900 then loops through the restore list queuing restores until one of the following conditions is met: (1) if the restore list is empty 912, the method 900 is complete and may exit 926; (2) if the critical restoration flag is set 914, a new drive has failed so the current background restoration method 900 should exit 926; or (3) if the restore queue exceeds the performance threshold 916, queuing additional restores would adversely affect system performance, and therefore the current method 900 is suspended or exited 926. If a restore can be queued, a final check is made to determine if the current data chunk is already pending restoration 918. If the current data chunk is already pending restoration, it is skipped. If the data chunk is not being restored, it is marked pending restoration 920 and queued for restore 922. The current position in the restore list is then moved past the current data chunk and the process repeats 924.
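A throttled background pass along these lines might look as follows. The sketch assumes that the set of pending chunks doubles as the restore queue, that the performance threshold is a simple count (max_queued), and that the critical restoration flag is sampled once per pass; the function names are hypothetical.

```python
def background_pass(restore_list, pending, max_queued, critical_flag=False):
    """Queue restores in list order until a stop condition is met; return those queued."""
    if critical_flag:                    # 914: a new drive has failed
        return []
    queued = []
    for chunk in restore_list:           # 910/924: walk from the head of the list
        if len(pending) >= max_queued:   # 916: assumed form of the performance threshold
            break
        if chunk in pending:             # 918: already pending restoration, skip
            continue
        pending.add(chunk)               # 920
        queued.append(chunk)             # 922
    return queued


def on_restore_complete(restore_list, pending, chunk):
    """902/904: drop a finished chunk so that the next pass can queue more."""
    restore_list.remove(chunk)
    pending.discard(chunk)


restore_list = ["c1", "c2", "c3", "c4"]
pending = {"c1"}  # c1 was queued earlier as a critical restore
first_pass = background_pass(restore_list, pending, max_queued=3)
on_restore_complete(restore_list, pending, "c1")
second_pass = background_pass(restore_list, pending, max_queued=3)
```

Raising or lowering max_queued between passes corresponds to changing the amount of system performance allocated to background restoration.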
It is to be noted that the foregoing described embodiments according to the present invention may be conveniently implemented using conventional general purpose digital computers programmed according to the teachings of the present specification, as will be apparent to those skilled in the computer art. Appropriate software coding may readily be prepared by skilled programmers based on the teachings of the present disclosure, as will be apparent to those skilled in the software art.
It is to be understood that the present invention may be conveniently implemented in forms of a software package. Such a software package may be a computer program product which employs a computer-readable storage medium including stored computer code which is used to program a computer to perform the disclosed function and process of the present invention. The computer-readable medium/computer-readable storage medium may include, but is not limited to, any type of conventional floppy disk, optical disk, CD-ROM, magnetic disk, hard disk drive, magneto-optical disk, ROM, RAM, EPROM, EEPROM, magnetic or optical card, or any other suitable media for storing electronic instructions.
It is believed that the present disclosure and many of its attendant advantages will be understood by the foregoing description, and it will be apparent that various changes may be made in the form, construction and arrangement of the components without departing from the disclosed subject matter or without sacrificing all of its material advantages. The form described is merely explanatory, and it is the intention of the following claims to encompass and include such changes.
It is believed that the present invention and many of its attendant advantages will be understood by the foregoing description. It is also believed that it will be apparent that various changes may be made in the form, construction and arrangement of the components thereof without departing from the scope and spirit of the invention or without sacrificing all of its material advantages. The form herein before described being merely an explanatory embodiment thereof, it is the intention of the following claims to encompass and include such changes.
Number | Date | Country |
---|---|---|
20110225453 A1 | Sep 2011 | US |