In a standard RAID-5 or RAID-6 implementation, an approximately equal amount of writes is typically performed on each drive in the RAID set over time. For RAID implementations based on SSDs that use low-endurance flash (e.g. TLC), this can become a serviceability and availability issue, since all the drives in the RAID set could wear out at about the same time. Even a RAID-6 implementation can tolerate at most two drive failures at a given time.
A typical RAID-6 implementation with N drives spreads N-2 write data blocks across N-2 drives and then writes two parity blocks (called P and Q) to the remaining 2 drives. This process of spreading write data across multiple drives is called RAID striping, and each set of N-2 write data blocks together with its parity blocks is called a RAID stripe. The example below shows how RAID-6 stripes are arranged in a typical 4-drive RAID set.
WBn are the write blocks; Pn and Qn are the parity blocks for stripe n. The parity blocks are rotated with each new stripe so that all 4 drives store write data blocks (not just parity), which in turn increases the available read bandwidth. The striping pattern for stripe 4 matches stripe 0, stripe 5 matches stripe 1, and so on.
In the standard RAID-6 striping scheme above, each drive performs an equal amount of writes over time. Thus, when all of the drives are identical, as is conventional, all 4 drives wear at the same rate.
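To make the equal-wear point concrete, here is a small Python sketch that builds a conventional rotating-parity RAID-6 layout and counts the blocks written to each drive. The rotation rule (P on drive (N-1-s) mod N, Q on the next drive) and the helper names `standard_raid6_layout` and `writes_per_drive` are hypothetical choices for illustration and need not match the arrangement in the figure.

```python
from collections import Counter


def standard_raid6_layout(num_drives: int, num_stripes: int) -> list[list[str]]:
    """Conventional RAID-6 layout with rotating P/Q parity.

    Hypothetical rotation rule: P on drive (N-1-s) mod N, Q on the next drive.
    """
    layout = []
    wb = 0  # running index of user write blocks
    for s in range(num_stripes):
        p_drive = (num_drives - 1 - s) % num_drives
        q_drive = (p_drive + 1) % num_drives
        row = []
        for d in range(num_drives):
            if d == p_drive:
                row.append(f"P{s}")
            elif d == q_drive:
                row.append(f"Q{s}")
            else:
                row.append(f"WB{wb}")
                wb += 1
        layout.append(row)
    return layout


def writes_per_drive(layout: list[list[str]]) -> list[int]:
    """Count blocks written to each drive (every cell is a write here)."""
    counts = Counter(d for row in layout for d, _ in enumerate(row))
    return [counts[d] for d in range(len(layout[0]))]


for row in standard_raid6_layout(4, 4):
    print(row)
print(writes_per_drive(standard_raid6_layout(4, 400)))  # → [400, 400, 400, 400]
```

Because every drive receives exactly one block per stripe, the write counts come out identical no matter how P and Q are rotated.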
Described is a RAID scheme that manages the wear of individual drives in a RAID set and significantly reduces the probability of more than two drives wearing out at the same time.
In one aspect, described is a method of operating a RAID system comprising the steps of: providing a RAID system that includes a plurality of at least three low-endurance flash-based solid state drives; and implementing data striping during write operations in which at least one of a first and a second of the plurality of at least three low-endurance flash-based solid state drives performs a predetermined percentage more writes than at least a third of the plurality of at least three low-endurance flash-based solid state drives, thereby causing the third drive to wear out after the first and second drives.
In another aspect, a rebuild operation is performed using Galois Field Multiplication, with either an integrated circuit or an FPGA used in preferred implementations.
These and other aspects and features will become apparent to those of ordinary skill in the art upon review of the following description of specific embodiments of the invention in conjunction with the accompanying figures, wherein
The described endurance-aware RAID striping scheme significantly improves the serviceability and availability of storage systems that use low-endurance flash-based SSDs. In addition, the proposed pipelined implementation architecture enables high-performance FPGA implementations. An overview of a RAID scheme according to a preferred embodiment is shown in
Data striping is a technique of segmenting logically sequential data, such as a file, so that consecutive segments are stored on different physical storage devices. Striping allows a processing device to obtain data more quickly by reading portions of it from several physical storage devices in parallel rather than from a single device. Because the segments are spread across devices that can be accessed concurrently, total data throughput increases. Data striping is also useful in other ways, such as balancing I/O load across an array of disks.
The described endurance-aware striping scheme is as follows:
With the described striping scheme, Drive 0 and Drive 1 perform 25% fewer writes over time than Drives 2 and 3. This means Drives 2 and 3 will wear out sooner than Drives 0 and 1. This improves availability and serviceability, since Drives 2 and 3 can be replaced without taking the storage offline, and it allows drives to be replaced two at a time in a planned and phased manner. Depending on the scheme and the number of drives N, at least two of the drives perform 100/N percent fewer writes, so the predetermined percentage preferably ranges from about 3% to 33%.
Provided below is another example based on 6 drives. In this example, Drives 0-3 perform approximately 16.67% fewer writes than Drives 4 and 5.
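The striping figures are not reproduced here, so the following Python sketch assumes one possible arrangement that yields the stated percentages: the pattern repeats every N stripes, and in stripe s (for s < N-2) drive s holds an SKs slot that is never written. The placement rule and the names `endurance_aware_layout` and `writes_per_drive` are hypothetical; the sketch only illustrates how skip slots translate into a 100/N percent write reduction.

```python
from collections import Counter


def endurance_aware_layout(num_drives: int) -> list[list[str]]:
    """One repeating cycle of num_drives stripes containing skip (SK) slots.

    Hypothetical placement: in stripe s, for s < num_drives - 2, drive s holds
    an unwritten skip block SKs; P and Q rotate over the remaining drives.
    """
    n = num_drives
    layout = []
    wb = 0  # running index of user write blocks
    for s in range(n):
        skip = s if s < n - 2 else None
        # Drives written in this stripe, in a per-stripe rotated order
        order = [(s + i) % n for i in range(n) if (s + i) % n != skip]
        row = [""] * n
        if skip is not None:
            row[skip] = f"SK{s}"
        row[order[-2]] = f"P{s}"
        row[order[-1]] = f"Q{s}"
        for d in order[:-2]:
            row[d] = f"WB{wb}"
            wb += 1
        layout.append(row)
    return layout


def writes_per_drive(layout: list[list[str]]) -> list[int]:
    """Count blocks actually written to each drive; SK slots are not written."""
    counts = Counter(
        d for row in layout for d, blk in enumerate(row) if not blk.startswith("SK")
    )
    return [counts[d] for d in range(len(layout[0]))]


for n in (4, 6):
    w = writes_per_drive(endurance_aware_layout(n))
    reduction = 100 * (1 - w[0] / w[-1])
    print(n, w, f"{reduction:.2f}% fewer writes on drives 0..{n - 3}")
# 4 [3, 3, 4, 4] 25.00% fewer writes on drives 0..1
# 6 [5, 5, 5, 5, 6, 6] 16.67% fewer writes on drives 0..3
```

Under this assumption each lightly written drive skips one of every N stripes, so the reduction is 100/N percent: about 33% for N = 3, falling to roughly 3% at about 33 drives.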
In the event of a drive failure, data blocks can be recovered by skipping drive reads for SKn blocks and using all zeroes in place of SKn for data recovery calculations.
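A minimal sketch of the zero-substitution rule using XOR (P) parity alone, assuming a hypothetical stripe that holds one SK slot, two data blocks and P (Q is left out for brevity), and assuming P was computed with zeroes in place of the SK slot. The reader can then skip the SK drive entirely and feed an all-zero block into the XOR without changing the result.

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, b in enumerate(blk):
            out[i] ^= b
    return bytes(out)


BLOCK_SIZE = 4
ZERO = bytes(BLOCK_SIZE)  # stands in for an SK slot: never read, treated as zeroes

# Hypothetical stripe: SK0, WB0, WB1, P (Q omitted)
wb0 = bytes([0x11, 0x22, 0x33, 0x44])
wb1 = bytes([0xA0, 0xB0, 0xC0, 0xD0])
p = xor_blocks(ZERO, wb0, wb1)  # parity with SK0 contributing zeroes

# The drive holding WB1 fails: skip the SK0 read and substitute zeroes.
recovered = xor_blocks(p, ZERO, wb0)
print(recovered == wb1)  # → True
```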
After new drives replace old drives, the same scheme can be maintained, since the new drives and the remaining old drives will still reach the end of their service life at different times.
FPGA Optimized Implementation
Typical FPGA implementations of RAID-6 have to trade off performance, because the complexity of the RAID-6 computation limits how high the clock frequency can be pushed. The proposed implementation scheme pipelines the RAID datapath in a way that allows performance to be pushed higher.
At the crux of the RAID-6 algorithm lies the Galois Field Multiplication (GFM) of every byte of a data block by a one-byte coefficient. The proposed scheme breaks the GFM operation into a pipeline of two basic operations: XOR and “multiply-by-2” (a shift and an XOR). The following section shows how the pipelined GFM (P-GFM) is implemented.
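The decomposition can be sketched in Python. The field polynomial 0x11D (x^8 + x^4 + x^3 + x^2 + 1) is an assumption; it is the one commonly used for RAID-6, for example in the Linux md driver. The function names are hypothetical. `gf_mul` walks the coefficient bits from most to least significant, and each step is one multiply-by-2 followed by a conditional XOR, so the eight steps map naturally onto eight pipeline stages.

```python
POLY = 0x11D  # assumed RAID-6 field polynomial x^8 + x^4 + x^3 + x^2 + 1


def mul2(x: int) -> int:
    """Multiply a GF(2^8) element by 2: shift left, reduce on overflow."""
    x <<= 1
    return x ^ POLY if x & 0x100 else x


def gf_mul(a: int, coeff: int) -> int:
    """GF(2^8) multiply using only multiply-by-2 and XOR (Horner, MSB first)."""
    acc = 0
    for bit in range(7, -1, -1):
        acc = mul2(acc)
        if (coeff >> bit) & 1:
            acc ^= a
    return acc


def gfm_block(block: bytes, coeff: int) -> bytes:
    """Multiply every byte of a data block by the same one-byte coefficient."""
    return bytes(gf_mul(b, coeff) for b in block)


print(gf_mul(3, 7))          # → 9
print(hex(gf_mul(2, 0x80)))  # → 0x1d
```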
Pipelined GFM (P-GFM)
The 8-stage datapath pipeline shown in
Each stage is built from one XOR and one “multiply-by-2” operation. The minimal logic depth in each pipeline stage allows the circuit to run at high clock frequencies, enabling higher performance not only in integrated circuit implementations but also in FPGA implementations.
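A cycle-level Python model of such a pipeline, offered as a sketch: one byte enters per cycle, each of the 8 pipeline registers holds a partial product together with the original byte, and stage k performs one multiply-by-2 and an XOR gated by coefficient bit 7-k. The byte-per-cycle width, the register layout, the names `stage` and `pgfm_stream`, and the polynomial 0x11D are illustrative assumptions; a hardware datapath would typically process a wider word per cycle.

```python
POLY = 0x11D  # assumed RAID-6 field polynomial


def mul2(x: int) -> int:
    """GF(2^8) multiply-by-2."""
    x <<= 1
    return x ^ POLY if x & 0x100 else x


def stage(k: int, state: tuple[int, int], coeff: int) -> tuple[int, int]:
    """Stage k: one multiply-by-2 and one XOR gated by coefficient bit 7-k."""
    acc, a = state
    acc = mul2(acc)
    if (coeff >> (7 - k)) & 1:
        acc ^= a
    return acc, a


def pgfm_stream(data: bytes, coeff: int) -> bytes:
    """Cycle-by-cycle model of an 8-stage P-GFM pipeline, one byte per cycle."""
    regs = [None] * 8  # register after each stage; None marks a bubble
    out = bytearray()
    for x in list(data) + [None] * 8:  # 8 extra cycles drain the pipe
        if regs[7] is not None:
            out.append(regs[7][0])  # finished product leaves the last register
        for k in range(7, 0, -1):  # advance registers back to front
            regs[k] = None if regs[k - 1] is None else stage(k, regs[k - 1], coeff)
        regs[0] = None if x is None else stage(0, (0, x), coeff)
    return bytes(out)


print(pgfm_stream(bytes([3, 2]), 7).hex())  # → 090e
```

After an initial latency of 8 cycles, one product leaves the pipe every cycle, and each stage only ever contains a single shift-and-XOR level of logic.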
Using P-GFM, XOR (shown as “+” in the diagrams below) and “multiply-by-2” as basic building blocks, the RAID rebuild functions are implemented as shown in
RAID Write Datapath
As user data blocks “WBn” are received, they are striped (see the striping examples above) and written to the drives. The parity blocks P and Q for a stripe are calculated on the fly as shown in
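One way to compute P and Q with only XOR and multiply-by-2 is Horner's rule over the data blocks of a stripe. This Python sketch assumes the common RAID-6 definitions P = D0 + D1 + ... + Dk-1 and Q = 2^0·D0 + 2^1·D1 + ... + 2^(k-1)·Dk-1 over GF(2^8) with polynomial 0x11D; the document does not spell out its exact parity equations, and `compute_pq` is a hypothetical name.

```python
POLY = 0x11D  # assumed field polynomial; generator g = 2


def mul2(x: int) -> int:
    """GF(2^8) multiply-by-2."""
    x <<= 1
    return x ^ POLY if x & 0x100 else x


def compute_pq(data_blocks: list[bytes]) -> tuple[bytes, bytes]:
    """P = XOR of all blocks; Q = sum of 2^i * Di, evaluated by Horner's rule."""
    size = len(data_blocks[0])
    p = bytearray(size)
    q = bytearray(size)
    for d in reversed(data_blocks):  # highest-index block first
        for i in range(size):
            p[i] ^= d[i]
            q[i] = mul2(q[i]) ^ d[i]
    return bytes(p), bytes(q)


p, q = compute_pq([b"\x01", b"\x01"])
print(p.hex(), q.hex())  # → 00 03
```

Processing blocks from the highest index down means each new block costs one multiply-by-2 of the running Q plus one XOR, which suits an on-the-fly datapath as blocks stream in.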
RAID Rebuild Datapath
The datapath implementation below describes the rebuild pipeline, which recovers the data of replaced or failed drives by reading data and parity blocks from the remaining surviving drives. As a result, the contents of one or two replaced or failed drives can be rebuilt without having to replace all the drives one by one, which would be a long and difficult replacement process. The example herein shows how WB0 of stripe 0 is recovered after drive 0 and drive 2 are replaced or have failed.
Pxy is calculated exactly like P parity with all-zero blocks replacing the data blocks from the missing drives. Similarly, Qxy is calculated exactly like Q parity with all-zero blocks replacing the data blocks from the missing drives.
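The sketch below shows how Pxy and Qxy lead to the two missing blocks, assuming the common RAID-6 formulation (generator 2, polynomial 0x11D). With P' = P + Pxy = Dx + Dy and Q' = Q + Qxy = 2^x·Dx + 2^y·Dy, solving gives Dx = A·P' + B·Q', where A = 2^(y-x)/(2^(y-x) + 1) and B = 2^(-x)/(2^(y-x) + 1), and then Dy = P' + Dx. Here x and y are positions of the missing blocks in the Q sum, not necessarily drive numbers; all function names are hypothetical, and in hardware A and B would be precomputed constants fed to P-GFM stages.

```python
POLY = 0x11D  # assumed field polynomial; generator g = 2


def mul2(x: int) -> int:
    x <<= 1
    return x ^ POLY if x & 0x100 else x


def gf_mul(a: int, c: int) -> int:
    acc = 0
    for bit in range(7, -1, -1):  # Horner over coefficient bits (P-GFM style)
        acc = mul2(acc)
        if (c >> bit) & 1:
            acc ^= a
    return acc


def gf_pow2(e: int) -> int:
    """2**e in GF(2^8), built from repeated multiply-by-2."""
    r = 1
    for _ in range(e % 255):
        r = mul2(r)
    return r


def gf_inv(a: int) -> int:
    """a**254 equals a**-1 for nonzero a (the multiplicative group has order 255)."""
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r


def compute_pq(blocks: list[bytes]) -> tuple[bytes, bytes]:
    p, q = bytearray(len(blocks[0])), bytearray(len(blocks[0]))
    for d in reversed(blocks):
        for i, b in enumerate(d):
            p[i] ^= b
            q[i] = mul2(q[i]) ^ b
    return bytes(p), bytes(q)


def recover_two(blocks, x, y, p, q):
    """Rebuild data blocks x < y; the contents of blocks[x] and blocks[y] are ignored."""
    zeroed = [bytes(len(b)) if i in (x, y) else b for i, b in enumerate(blocks)]
    pxy, qxy = compute_pq(zeroed)  # P and Q with zeroes for the missing blocks
    gyx = gf_pow2(y - x)
    denom = gf_inv(gyx ^ 1)
    a = gf_mul(gyx, denom)                 # coefficient applied to P + Pxy
    b = gf_mul(gf_inv(gf_pow2(x)), denom)  # coefficient applied to Q + Qxy
    dx = bytes(
        gf_mul(a, p[i] ^ pxy[i]) ^ gf_mul(b, q[i] ^ qxy[i]) for i in range(len(p))
    )
    dy = bytes(p[i] ^ pxy[i] ^ dx[i] for i in range(len(p)))
    return dx, dy


data = [b"\x10\x20", b"\x33\x44", b"\xfe\x01", b"\x7a\x8b"]
p, q = compute_pq(data)
print(recover_two(data, 0, 2, p, q) == (data[0], data[2]))  # → True
```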
The rebuild datapath is built from just three basic compute modules, namely XOR, multiply-by-2, and P-GFM, previously described and illustrated in
Although the present invention has been particularly described with reference to embodiments thereof, it should be readily apparent to those of ordinary skill in the art that various changes, modifications and substitutes are intended within the form and details thereof, without departing from the spirit and scope of the invention. Accordingly, it will be appreciated that in numerous instances some features of the invention will be employed without a corresponding use of other features. Further, those skilled in the art will understand that variations can be made in the number and arrangement of components illustrated in the above figures.