The present application is based on and corresponds to Indian Application Number 2002/CHE/2006 filed Oct. 31, 2006, the disclosure of which is hereby incorporated by reference herein in its entirety.
RAID is a popular technology used to provide data availability and redundancy in storage disk arrays. There are a number of RAID levels defined and used in the data storage industry. The primary factors that influence the choice of a RAID level are data availability, performance and capacity.
RAID5, for example, is one of the most popular RAID levels used in disk arrays. RAID5 maintains parity equivalent to one disk for each set of disks, and stripes data and parity across the set of available disks.
If a drive fails in a RAID5 array, the failed data can be reconstructed by reading all the other data and parity drives. By this mechanism, RAID5 can sustain one disk failure and still provide access to all the user data. However, RAID5 has two main disadvantages. Firstly, when a write comes to an existing data block in the array stripe, both the data block and the parity block must be read and written back, so four I/Os are required for one write operation. This creates a performance bottleneck, especially in enterprise-level arrays. Secondly, when a disk fails, all the remaining drives have to be read to rebuild the failed data and re-create it on a spare drive. This recovery operation is termed “rebuilding”; it takes some time to complete and, while it is in progress, there is a risk of data loss if another disk fails.
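To illustrate the read-modify-write cost just described, the following sketch models a small write to one block of a conventional RAID5 stripe. The XOR parity-update rule is standard; the block contents and helper names are illustrative only.

```python
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-sized blocks byte by byte (the operation used to compute parity)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

# A conventional RAID5 stripe: five data blocks protected by one parity block.
data = [bytes([i]) * 4 for i in range(1, 6)]
parity = xor_blocks(*data)

def small_write(data, parity, index, new_block):
    """Read-modify-write of one data block: 2 reads + 2 writes = 4 I/Os."""
    old_block = data[index]                                     # I/O 1: read old data block
    old_parity = parity                                         # I/O 2: read old parity block
    new_parity = xor_blocks(old_parity, old_block, new_block)   # 2 parity (XOR) calculations
    data[index] = new_block                                     # I/O 3: write new data block
    return new_parity                                           # I/O 4: write new parity block

parity = small_write(data, parity, 2, b"\xff" * 4)
assert parity == xor_blocks(*data)   # parity still covers every data block in the stripe
```

A rebuild after a disk failure follows the same relationship in reverse: every surviving drive in the stripe must be read and XORed together to regenerate each lost block.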
In order that the invention may be more clearly ascertained, embodiments will now be described, by way of example, with reference to the accompanying drawing, in which:
There will be described a method of providing a RAID array.
In one embodiment the method comprises providing an array of disks, creating an array layout comprising a plurality of blocks on each of the disks and a plurality of disk stripes that can be depicted in the layout with the stripes parallel to one another and diagonal to the disks, and assigning data blocks and parity blocks in the array layout with at least one parity block per disk stripe.
There will also be described a method of storing data, a method for reconstructing the data of a failed or otherwise inaccessible disk of a RAID array of disks, and a RAID disk array.
Each parity block P1 to P10 holds the parity of the data blocks along the diagonals (running from lower right to upper left in the figure) of the disk array layout 200.
Thus:
P1=D26 (P1 thus reflects the data block in the diagonally opposite corner of array layout 200)
P2=D5
P3=D4+D10
P4=D3+D9+D15
P5=D2+D8+D14+D20
P6=D1+D7+D13+D19+D25
P7=D6+D12+D18+D24
P8=D11+D17+D23
P9=D16+D22
P10=D21
where ‘+’ denotes an XOR operation.
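The parity groupings above follow a simple pattern that can be reproduced programmatically. The sketch below assumes, purely for illustration (the figure itself is not reproduced here), that data blocks D1 to D25 fill disks 1 to 5 in row-major order, that D26 occupies the sixth block of the first disk, and that the sixth disk holds only parity; under that assumed placement, blocks on the same diagonal share the same value of (row index minus disk index), and the script prints exactly the associations listed above.

```python
from collections import defaultdict

# Hypothetical placement consistent with the parity associations listed above:
# D1..D25 in row-major order on disks 0-4, D26 on the sixth row of disk 0,
# and disk 5 reserved for the dedicated parity blocks.
position = {n: divmod(n - 1, 5) for n in range(1, 26)}   # block number -> (row, disk)
position[26] = (5, 0)

stripes = defaultdict(list)
for n, (row, disk) in sorted(position.items()):
    diagonal = row - disk                 # blocks on one diagonal share this value
    p = ((diagonal + 5) % 10) + 1         # diagonal index -> parity block number
    stripes[p].append(n)

for p in sorted(stripes):
    print(f"P{p} = " + " + ".join(f"D{n}" for n in stripes[p]))
```

Running the sketch prints P1 = D26 through P10 = D21, matching the list above; the stripes of length one behave as RAID1 mirrors, while the longer stripes behave as small RAID5 groups.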
This approach therefore divides the available blocks into ten diagonal disk stripes 204a,b,c,d,e,f,g,h,i,j with varying RAID levels:
Array layout 200 constitutes a basic block of storage (or ‘storage unit’) according to this embodiment, comprising 6×6 blocks. In this embodiment the storage unit is a square matrix, although it can be of other sizes; in other embodiments a storage unit need not be square at all. In a disk array, each stripe chunk has one or more storage units.
The parity blocks inside a storage unit are not distributed as in RAID5. However, the parity blocks can be shifted to another disk in the next storage unit. For example, if a disk array has stripe chunks each with 20 storage units, then in the first storage unit the sixth disk may hold the parity blocks, in the second storage unit the fifth disk may hold the parity blocks, and so on. The parity associations of all the blocks nevertheless remain the same. Thus,
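One plausible reading of this shifting scheme is a simple rotation of the parity-holding disk, one disk per storage unit, wrapping around after the first disk. A minimal sketch, assuming six disks and the 20-storage-unit stripe chunk of the example above:

```python
NUM_DISKS = 6
STORAGE_UNITS = 20   # the example above: a stripe chunk with 20 storage units

def parity_disk(unit_index: int) -> int:
    """Disk (numbered from 1) holding the dedicated parity blocks of a storage unit.

    Unit 0 -> disk 6, unit 1 -> disk 5, ..., wrapping back to disk 6 after disk 1.
    Only the disk carrying the parity blocks moves; the parity associations
    inside every storage unit stay the same.
    """
    return NUM_DISKS - (unit_index % NUM_DISKS)

for unit in range(STORAGE_UNITS):
    print(f"storage unit {unit + 1}: parity blocks on disk {parity_disk(unit)}")
```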
A logical unit (LU) can be allocated many such storage units. An LU can also be allocated a mix of RAID1 storage units, RAID5 storage units and diagonal stripe storage units of the present embodiment. The proportions of the mix depend on the RAID1-to-RAID5 ratio demanded by the data residing in the LU. A user can specify a particular mix, or a system may allocate a predetermined mixture of all these stripe types.
Inside a diagonal stripe storage unit, data can be moved from RAID1 to RAID5-3, RAID5-4, etc., depending on which units are most used. Therefore, unlike AutoRAID, where data belonging to any LU can be moved from RAID1 to RAID5, this embodiment restricts data movement across RAID levels to within an LU.
The method of this embodiment should improve the write performance of the disk array when compared with conventional RAID5 in many circumstances. In conventional RAID5, small writes that update existing data blocks perform poorly. They employ the read-modify-write (RMW) style, wherein both the data and parity blocks are read, modified and updated. Each RMW write requires 4 I/Os and 2 parity calculations. According to this embodiment, not all data blocks incur this full RMW cost: only the data blocks in RAID5 stripes have to perform standard RMW writes. The data blocks in Split Parity RAID5 stripes require 3 I/Os and 1 parity calculation for each RMW. The data blocks in the RAID1 stripes require 2 writes for each incoming write.
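The per-write costs quoted above can be tabulated directly. The helper below simply encodes those quoted figures and totals them for an arbitrary mix of block types; it is a bookkeeping aid rather than an I/O model, and the example mix in the last line is hypothetical.

```python
# Cost of one small (read-modify-write) write per block, as quoted above.
# stripe type -> (I/Os per write, parity calculations per write)
WRITE_COST = {
    "RAID5": (4, 2),               # read data + read parity, write data + write parity
    "Split Parity RAID5": (3, 1),  # figures quoted above for split-parity stripes
    "RAID1": (2, 0),               # write the block and its mirror copy
}

def random_write_cost(blocks_by_type):
    """Total (I/Os, parity calculations) for one random write to each counted block."""
    ios = sum(n * WRITE_COST[t][0] for t, n in blocks_by_type.items())
    parity_calcs = sum(n * WRITE_COST[t][1] for t, n in blocks_by_type.items())
    return ios, parity_calcs

# 26 random writes if every block sat in a conventional RAID5 stripe ...
print(random_write_cost({"RAID5": 26}))                                       # (104, 52)
# ... versus a hypothetical split of the same 26 blocks across the three stripe types.
print(random_write_cost({"RAID5": 15, "Split Parity RAID5": 8, "RAID1": 3}))  # (90, 38)
```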
The table below indicates the number of I/Os and parity calculations required to perform random I/Os (which require RMW) on both a conventional RAID5 layout and the layout of the present embodiment, with data blocks D1 to D26 (as employed in array layout 200 of FIG. 2).
The number of I/Os required for reads is the same in both cases. However, for the data blocks that are in RAID1 mode, reads can be serviced in parallel from the original and mirror blocks, and hence there can be some benefit according to this embodiment.
The performance of sequential writes is difficult to predict, as it depends on the span of the writes. Generally, for large sequential writes, RAID5 is expected to perform better than the method of this embodiment.
The present embodiment also provides a method of providing a RAID array, for use when storing data in a RAID array, which is summarized in flow diagram 400 of FIG. 4.
At step 406, data and parity blocks are assigned in the next storage unit (which may be the first or indeed only storage unit). In practice this step may be performed simultaneously with or as a part of step 404. This step comprises selecting—in each respective storage unit—a block to act as parity block and the remainder of the blocks to act as data blocks. In this particular embodiment, this is done by selecting one disk of each respective storage unit, all of whose blocks—in the respective storage unit—are to act as parity blocks, though the disk selected for this purpose may differ from one storage unit to another.
This assignment also includes specifying one block on each of all but one of the other disks of the respective storage unit to act as a parity block. If the storage unit is one of a plurality of storage units in the stripe chunk, this step includes selecting, as the disk that provides parity blocks exclusively, a disk that is different from, but adjacent to, the disk selected for that purpose in the previous storage unit (cf.
At step 408, it is determined if the stripe chunk includes more storage units. If so, processing returns to step 406. Otherwise, processing ends.
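A compact way to picture steps 404 to 408 is the loop below, which assigns a role to every block of each storage unit in a stripe chunk. The six-disk geometry, the placement of the single parity blocks in the last row, the choice of which data disk carries no parity block, and the rotation of the dedicated parity disk are all assumptions made for illustration; the flow diagram itself does not fix them.

```python
DISKS = 6   # disks (columns) in a storage unit
ROWS = 6    # blocks per disk in a storage unit

def storage_unit_layout(unit_index):
    """Assign a role ('D' = data, 'P' = parity) to every block of one storage unit.

    Assumed scheme: the dedicated parity disk starts at the last disk and shifts
    one disk per storage unit (step 406); one further parity block sits in the
    last row of every disk except one, so that all but one of the remaining
    disks contribute a parity block; every other block holds data.
    """
    parity_disk = (DISKS - 1 - unit_index) % DISKS   # rotates from unit to unit
    data_only_disk = (parity_disk + 1) % DISKS       # the single disk with no parity
    layout = [["D"] * DISKS for _ in range(ROWS)]
    for row in range(ROWS):
        layout[row][parity_disk] = "P"               # this whole disk holds parity
    for disk in range(DISKS):
        if disk not in (parity_disk, data_only_disk):
            layout[ROWS - 1][disk] = "P"             # one parity block on each other disk
    return layout

# Steps 404-408: lay out each storage unit of the stripe chunk in turn.
for unit in range(3):
    print(f"storage unit {unit + 1}:")
    for row in storage_unit_layout(unit):
        print("  " + " ".join(row))
```

Each storage unit produced this way has ten parity blocks and twenty-six data blocks, matching the counts of array layout 200.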
The method of this embodiment is also expected to perform better than conventional RAID5 in data reconstruction operations.
Thus, 21 reads and 6 writes are required. By comparison, 30 reads and 6 writes would be required to perform the same recovery in normal RAID5.
This method of data reconstruction is summarized in flow diagram 600 of FIG. 6.
At step 608, it is determined if there remains any other lost block in the failed disk. If so, processing returns to step 602. If not, processing ends.
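Recovering a lost block uses the same XOR relationship that defines its diagonal stripe: the missing block is the XOR of the stripe's surviving data and parity blocks. Below is a minimal sketch of one pass through steps 602 to 608, using the short stripe P3 = D4 + D10 from the list above as an illustrative example; the block contents are arbitrary.

```python
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-sized blocks byte by byte (the operation used to compute parity)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def recover_lost_block(surviving_blocks):
    """Recover one lost block of a diagonal stripe from its surviving blocks.

    A stripe with k surviving blocks costs k reads, k - 1 XOR operations and one
    write to the spare disk: fewer reads than the N - 1 per block needed by a
    conventional N-disk RAID5 rebuild whenever the diagonal stripe is short.
    """
    return xor_blocks(*surviving_blocks)

# Illustrative stripe P3 = D4 + D10: if the disk holding D10 fails, D10 is
# regenerated from the two surviving blocks of its diagonal stripe.
D4, D10 = b"\x11" * 4, b"\x2e" * 4
P3 = xor_blocks(D4, D10)
assert recover_lost_block([D4, P3]) == D10   # 2 reads instead of 5 in six-disk RAID5
```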
If the disk that fails is towards the periphery of the array layout, fewer I/Os and parity calculations will be required. For example, if first disk 202a fails, then the following operations will be required:
This requires 16 reads, 4 parity calculations and 5 writes, or 21 I/Os and 4 parity calculations.
The method of this embodiment thus provides scope for improved write performance and faster data reconstruction, and hence for improved data storage.
Although all the exemplary storage units described above are square (e.g. 6×6), in other embodiments this need not be so (though it may mean that there is no RAID1-type storage). For example,
The method and array layout of the above-described embodiments may not be the most suitable in all applications. For example, the usable capacity of the array layout of FIG. 2 is lower than that of conventional RAID5: only 26 of the 36 blocks in each storage unit hold user data (roughly 72%), compared with 30 of 36 (roughly 83%) for a conventional six-disk RAID5 layout.
Furthermore, this method requires a more complex RAID management algorithm to manage the three different RAID levels and to keep track of the diagonal striping.
In some embodiments the necessary software for controlling a computer system to perform the method 400 of FIG. 4 is provided on a computer readable medium.
The foregoing description of the exemplary embodiments is provided to enable any person skilled in the art to make or use the present invention. While the invention has been described with respect to particular illustrated embodiments, various modifications to these embodiments will readily be apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. It is therefore desired that the present embodiments be considered in all respects as illustrative and not restrictive. Accordingly, the present invention is not intended to be limited to the embodiments described above but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.