1. Field of the Invention
This invention generally relates to data storage and, more particularly, to a system and method for automating full stripe operations in a redundant array of disk drives (RAID).
2. Description of the Related Art
For RAID 5, one of the stripelets is designated as a parity stripelet. This stripelet consists of the XOR of all the other stripelets in the stripe. The operation of XOR'ing the data to form a parity stripelet is referred to as P-calculation. The purpose of the parity is to provide a level of redundancy. Since the RAID presents a virtual disk consisting of multiple physical disks, there is a higher probability of one of the individual physical disks failing. If one of the stripelets cannot be read due to an individual disk error or failure, the data for that stripelet can be reassembled by XOR'ing all the other stripelets in the stripe.
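As an illustration of the P-calculation and reconstruction described above, the following sketch (Python, purely illustrative and not part of any controller firmware) XORs the data stripelets of a stripe to form the parity stripelet and then rebuilds a lost stripelet from the survivors:

# Illustrative byte-wise XOR parity for a RAID 5 stripe.
def xor_stripelets(stripelets):
    """XOR a list of equal-length byte strings together (the P-calculation)."""
    acc = bytearray(len(stripelets[0]))
    for s in stripelets:
        for i, b in enumerate(s):
            acc[i] ^= b
    return bytes(acc)

# Example stripe with (n - 1) = 4 data stripelets of 8 bytes each (sizes are arbitrary here).
data = [bytes([d] * 8) for d in (0x11, 0x22, 0x33, 0x44)]
parity = xor_stripelets(data)                 # P-stripelet written to the parity drive

# If one data stripelet cannot be read, XOR the survivors with P to rebuild it.
lost = 2
survivors = [s for i, s in enumerate(data) if i != lost]
assert xor_stripelets(survivors + [parity]) == data[lost]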
The RAID 5 depicted consists of n drives. In this example, n=5. The virtual disk capacity of the system is (n−1) times the capacity of an individual drive. The data block size is equal to the sector size of an individual drive. Each stripelet consists of x data blocks. In the example shown, x=4. The stripe size is (n−1)x data blocks. For example, a virtual drive may have a capacity of 2 terabytes (TB), a drive may have a capacity of 500 gigabytes (GB), a sector may be 512 bytes, a stripelet may be 2 kilobytes (KB), and a stripe may be 8 KB.
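The example sizes can be checked with simple arithmetic; the sketch below assumes the n=5, x=4, 512-byte-sector example given above:

# Checking the example geometry: n drives, x data blocks per stripelet, 512-byte sectors.
n = 5                          # physical drives in the RAID 5 set
x = 4                          # data blocks (sectors) per stripelet
sector = 512                   # bytes per data block
drive_capacity = 500 * 10**9   # 500 GB per drive, per the example

stripelet = x * sector                       # 2,048 bytes (2 KB)
stripe = (n - 1) * stripelet                 # 8,192 bytes (8 KB) of data per stripe, parity excluded
virtual_capacity = (n - 1) * drive_capacity  # 2,000,000,000,000 bytes (2 TB) virtual disk
print(stripelet, stripe, virtual_capacity)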
One benefit of RAID 5 and 6, other than the increased fault resiliency, is better performance when reading from the virtual disk. When multiple read commands are queued for the RAID'ed disks, the operations can be performed in parallel, which can result in a significant increase in performance compared to similar operations on a single disk. If, however, there is a failure reading the requested data, then all the remaining data of the stripe must be read to reconstruct the requested data.
For operations that write data to the RAID'ed disks, however, performance can be adversely affected due to the P and Q calculations necessary to maintain redundant information per stripe of data. In RAID 5, for every write to a stripelet, the previously written data to that stripelet needs to be XOR'ed with the P-stripelet, effectively removing the redundant information of the “old” data that is to be overwritten. The resulting calculation is then XOR'ed with the new data, and both the new data and the new P-calculation are written to their respective disks in the stripe. Therefore, a RAID 5 write operation may require two additional reads and one additional write, as compared to a single disk write operation. For RAID 6, there is an additional read and write operation for every Q-stripelet.
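A minimal sketch of this read-modify-write parity update (helper names are illustrative) is:

# Illustrative RAID 5 partial-write parity update for a single stripelet.
def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def raid5_partial_write_parity(old_data, old_parity, new_data):
    """Return the new P: old parity with the old data removed and the new data folded in."""
    parity_minus_old = xor_bytes(old_parity, old_data)  # consumes additional read #1 (old data)
    new_parity = xor_bytes(parity_minus_old, new_data)  # consumes additional read #2 (old parity)
    return new_parity                                   # written back: the additional parity write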
If most of the write operations are sequential in nature, the write performance penalty can be lessened significantly by performing "full stripe write" operations. This method entails caching write data into an intermediate buffer, as the controller normally would, but instead of reading the previously written data and parity stripelets, the controller continues to cache subsequent commands until it has either cached enough data for an entire stripe or a timeout has occurred. If the timeout occurs, the controller continues the write as described above. However, if the entire stripe can be cached, the controller can calculate the P and Q stripelets without needing to read the previously written data and parity.
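A minimal sketch of this gating decision, assuming a hypothetical cache of pending write data and a timeout flag supplied by the controller (all names are illustrative):

# Illustrative full-stripe-write gating: keep caching until a whole stripe is present or a timeout fires.
def choose_write_path(cached_stripelets, stripelets_per_stripe, timed_out):
    if len(cached_stripelets) == stripelets_per_stripe:
        return "full_stripe_write"   # P and Q computed from the cache alone; no old data or parity reads
    if timed_out:
        return "read_modify_write"   # fall back to the partial-write path described above
    return "keep_caching"            # wait for more sequential write commands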
Although full stripe writes increase performance by reducing the number of disk accesses, the performance is gated by certain bandwidth limitations of the processor and the memory accesses in the controller during the P and Q calculations. Typically, the controller's direct memory access (DMA) engine can be programmed by the controller's processor to perform a P or Q calculation. Once the data for the entire stripe is cached, the processor allocates a stripelet buffer for each P and Q calculation. It first fills these buffers with zeroes. It then proceeds to issue a command to the controller's DMA engine to perform a P or Q calculation for each data stripelet in cache. Upon receiving the command, the DMA engine reads a certain number of bytes, or a "line" of data, from the data stripelet in memory. It also reads the corresponding line of data from the allocated P or Q stripelet buffer. It performs the P or Q calculation on the two lines of data and writes the result back to the P or Q stripelet buffer, effectively three DMA operations per line. Then the next lines are read, calculated, and written back. This process continues until the calculations are complete for the entire stripelet of data. The process must be repeated for every cached data stripelet in the stripe. If the stripe supports multiple P and Q stripelets, the entire procedure must be performed for each P and Q stripelet. For example, to perform a full stripe write in a 32-disk RAID 6, the processor reads 30 stripelets of data into memory, allocates and zeroes out 2 stripelet buffers for the P and Q calculations, issues 30 commands to the DMA engine to perform the P calculations, and then issues 30 commands to the DMA engine to perform the Q calculations. If the stripelet size is 64 kilobytes and the line size is 512 bytes, then the P and Q calculations for the entire stripe require 23,040 DMA operations [(3*(65536/512)*30)*2]: 7,680 data reads, 3,840 P reads, 3,840 P writes, 3,840 Q reads, and 3,840 Q writes.
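The operation count in this example can be verified directly; the following sketch reproduces the figures quoted above:

# Conventional RAID 6 full stripe write, 32 disks: 30 data stripelets, 2 parity stripelets.
data_stripelets = 30
stripelet_size = 64 * 1024           # 64 KB per stripelet
line_size = 512                      # bytes moved per DMA operation
lines = stripelet_size // line_size  # 128 lines per stripelet

# Per line, per data stripelet, per parity: read data, read partial P (or Q), write partial P (or Q).
total = 3 * lines * data_stripelets * 2      # 23,040 DMA operations for P and Q together
data_reads = lines * data_stripelets * 2     # 7,680 (each data stripelet read once for P, once for Q)
p_reads = p_writes = q_reads = q_writes = lines * data_stripelets   # 3,840 each
assert total == data_reads + p_reads + p_writes + q_reads + q_writes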
It would be advantageous if a process existed to speed up the calculation of XOR and Galois products for an entire stripe of data that did not involve the extensive use of memory or microprocessor operations.
The present invention introduces a stripe handling process that improves memory access by avoiding the writing of partially calculated data to the P and Q stripelets. Each partial calculation requires a read followed by a write to the P or Q stripelet for every read of a data stripelet. Stripe handling performs calculations for the whole stripe, allowing the P or Q stripelet to be written only once, after all the data stripelets have been read. Reading of the P and Q stripelets is no longer necessary. Since multiple calculations are done in parallel, the data stripelets need to be read only once. Considering the 32-disk RAID 6 example, the same operation using the Stripe Handler requires 3,840 data reads, 128 P writes, and 128 Q writes, resulting in a total of 4,096 DMA operations of 512 bytes each, versus 23,040 DMA operations for a conventional RAID 6 system. The need to pre-fill the P and Q stripelets in memory with zeroes is also eliminated. Processor overhead is also improved by creating one command versus 60 partial commands.
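The stripe handling count can be verified the same way:

# Stripe handling, same 32-disk RAID 6 example: each data stripelet is read once,
# and the P and Q stripelets are each written once, one 512-byte line at a time.
data_stripelets = 30
lines = (64 * 1024) // 512            # 128 lines per stripelet

data_reads = data_stripelets * lines  # 3,840
p_writes = q_writes = lines           # 128 each; no P or Q reads at all
total = data_reads + p_writes + q_writes
assert total == 4096                  # versus 23,040 for the conventional approach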
Accordingly, a method is provided for automating full stripe operations in a redundant data storage array. In a redundant storage device controller, a parity product associated with an information stripe is accumulated. The parity product is stored in controller memory in a single write operation. The stored parity product can then be written in a storage device. More explicitly, a parity product may be accumulated in a RAID controller, stored in a RAID controller memory, and the stored parity product written in a RAID.
For example, the controller may receive n data stripelets for storage in the RAID. The parity product is accumulated by creating m parity stripelets, and the m parity stripelets are written into the controller memory in a single write operation.
Alternately, the controller may receive (n+m−x) stripelets from a RAID with (n+m) drives. In this aspect, accumulating the parity product includes recovering x stripelets. Then, storing the parity product involves writing x stripelets into controller memory in a single write operation.
Additional details of the above-described method and a system for automating full stripe operations in a redundant data storage array are provided below.
The RAID controller 610 may include a parity processor 616 for accumulating parity products using exclusive-or (XOR) calculations (e.g., RAID 5), Galois products, or a combination of Galois products and XOR calculations (e.g., RAID 6). In another aspect, the RAID controller 610 is able to accumulate both P and Q parity information in parallel. The controller 610 may accumulate information from a first group of corresponding data blocks, and then accumulate information from a second group of corresponding data blocks in the same stripe. As noted in more detail below, the accumulation of parity information may be accomplished with accumulator hardware (see the description of the Stripe Handler below).
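For concreteness, the sketch below shows one common RAID 6 formulation in which P is the XOR of the corresponding data bytes and Q is a sum of Galois products over GF(2^8) with generator 2 and the reducing polynomial 0x11D; the specific field and generator are conventional choices assumed here for illustration, not requirements of the controller 610:

# Illustrative parallel P and Q accumulation over corresponding data blocks, one per data stripelet.
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) with the reducing polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D)."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1D
        b >>= 1
    return r

def accumulate_p_q(data_blocks):
    """Accumulate P (XOR) and Q (Galois products) for corresponding blocks in a single pass."""
    p = bytearray(len(data_blocks[0]))
    q = bytearray(len(data_blocks[0]))
    coeff = 1                                 # generator^0 for the first stripelet
    for block in data_blocks:
        for i, byte in enumerate(block):
            p[i] ^= byte                      # XOR accumulation (P)
            q[i] ^= gf_mul(coeff, byte)       # Galois-product accumulation (Q)
        coeff = gf_mul(coeff, 2)              # advance the coefficient for the next stripelet
    return bytes(p), bytes(q)

p_block, q_block = accumulate_p_q([bytes([0xAA] * 512), bytes([0x55] * 512), bytes([0x0F] * 512)])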
The RAID controller 610 may accumulate a parity product that involves the creation of a parity stripelet or the recovery of a stripelet. In one aspect, the RAID controller 610 includes a host interface on line 614 for receiving n data stripelets for storage in the RAID. The RAID controller 610 creates m parity stripelets and writes the m parity stripelets into the controller memory 612 in a single write operation.
In one aspect the RAID includes (n+m) drives. The RAID controller 610 receives (n+m−x) stripelets from the RAID interface 608, accumulates a parity product by recovering x stripelets, and writes the x stripelets into controller memory 612 in a single write operation.
More explicitly, the RAID controller 610 receives n data stripelets (n=3) for storage in the RAID via the host interface 613, with a first plurality of data blocks in each data stripelet. In this example, each stripelet includes 4 data blocks. The RAID controller accumulates parity information for the first group of information blocks (data blocks 0, 4, and 8) from the 3 stripelets, and writes the parity information for the first group of data blocks in a single write operation. The controller 610 iteratively creates and writes parity information for groups of information blocks from the first plurality until m parity stripelets are created. In a RAID 5 system, m=1, and in a RAID 6 system, m=2. If multiple parity stripelets are created, the parity information for each parity stripelet is accumulated in parallel. After processing the first group of data blocks, a second group of data blocks (data blocks 1, 5, and 9) is processed to create a corresponding parity block (RAID 5) or parity P and parity Q blocks (RAID 6). The process is iteratively repeated until the entire parity stripelet(s) is created.
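A sketch of this group-by-group iteration (n = 3 stripelets with 4 data blocks each; buffer contents and helper names are illustrative): group 0 gathers the first block of each stripelet (data blocks 0, 4, and 8 in the numbering above), group 1 the second block (data blocks 1, 5, and 9), and so on.

# Illustrative iteration over groups of corresponding data blocks (RAID 5 case, m = 1).
BLOCK = 512

def xor_blocks(blocks):
    acc = bytearray(BLOCK)
    for b in blocks:
        for i, byte in enumerate(b):
            acc[i] ^= byte
    return bytes(acc)

# Three data stripelets of four 512-byte blocks each (contents arbitrary).
stripelets = [[bytes([d + g] * BLOCK) for g in range(4)] for d in (1, 5, 9)]

parity_stripelet = []
for group in range(4):                                 # one pass per group of corresponding blocks
    group_blocks = [s[group] for s in stripelets]      # e.g., group 0 -> data blocks 0, 4, 8
    parity_stripelet.append(xor_blocks(group_blocks))  # accumulated, then written in one operation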
As another variation, however, the RAID controller 610 may receive n data stripelets for storage in the RAID via the host interface, with a first and second data block in each data stripelet (as defined above). The RAID controller 610 initially accumulates and writes a parity product(s) for the first data block in a single write operation, and subsequently accumulates and writes a parity product(s) for the second data block in a single write operation.
At a more detailed level, the parity processor 616 accumulates the parity product for the information stripe by performing a parity operation with the first bit of a first stripelet (e.g., the first bit of data block 0), creating a partial parity accumulation, serially performing a parity operation between the first bit of any remaining stripelets in the stripe and the partial parity accumulation, and forming the accumulated parity product in response to a final parity operation.
Although the system depicted above is described using RAID 5 and RAID 6 as examples, the stripe handling process is not limited to these examples. The Stripe Handler is controlled through a single command that specifies the source addresses of the data stripelets and the destination addresses of the P and Q stripelets.
The command is organized in groups, where each group consists of a finite number of addresses. For each group, the DMA engine is dedicated to the Stripe Handler. After completing a task on a group, the DMA engine allows other devices within the controller to access memory before starting work on the next group. The grouping provides predictable memory access behavior in the Stripe Handler, regardless of the number of addresses in the overall command. In one implementation the maximum number of addresses per group is 4, but the command can be organized to specify any number of addresses between 1 and 4. Each group can consist of a different number of addresses. The last group specifies the destination addresses of the P and Q stripelets.
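One possible software representation of such a command is sketched below; the field names and the Python form are illustrative only, and the P and Q destinations are carried here as separate fields rather than as the trailing group described above:

# Illustrative representation of a Stripe Handler command organized in groups of source addresses.
from dataclasses import dataclass
from typing import List

@dataclass
class AddressGroup:
    source_addresses: List[int]        # 1 to 4 data stripelet addresses per group

@dataclass
class StripeHandlerCommand:
    groups: List[AddressGroup]         # all data stripelet source addresses, split into groups
    p_destination: int                 # destination address of the P stripelet
    q_destination: int                 # destination address of the Q stripelet (if used)
    line_size: int = 512               # bytes read or written per DMA operation
    stripelet_size: int = 64 * 1024    # total bytes per stripelet

# Example: 30 source stripelets split into groups of at most 4 addresses each.
sources = [0x1000_0000 + i * 0x10000 for i in range(30)]
groups = [AddressGroup(sources[i:i + 4]) for i in range(0, len(sources), 4)]
command = StripeHandlerCommand(groups, p_destination=0x2000_0000, q_destination=0x2001_0000)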
In operation, the Stripe Handler starts with the first group and reads in a line of data from each source address in the group. It performs the calculations specified in the command and stores the results in internal accumulators. The accumulators are equal in size to a line of data. The calculations are done in parallel. The Stripe Handler then proceeds to the next group, reading a line of data from each source address and updating the P and/or Q calculations in the accumulators. It then continues with the remaining groups until a line of data has been read from all the source addresses. At the last group, the Stripe Handler writes the contents of the accumulators to the destination addresses. Since all the calculations are performed before writing to the P and/or Q stripelets, zeroing out the destination stripelets is unnecessary. The Stripe Handler then goes back to the first group and reads the next line of data from the source addresses. It then proceeds through all of the groups until a line of data has been read from all source addresses and the resulting calculations have been written to the destination addresses. This process is repeated until the entire length of all the data stripelets has been read and the entire length of the P and Q stripelets has been written.
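A functional model of this loop is sketched below (memory is modeled as a dictionary of address to bytearray, the gf_mul helper from the earlier Galois-field sketch is reused, and all names are hypothetical); each group finishes before the next begins, mirroring the hardware yielding the memory interface between groups:

# Functional model of the Stripe Handler: read one line per source address per group,
# update the P/Q accumulators, and write the accumulators only after every source is read.
def stripe_handler(memory, groups, p_dest, q_dest, stripelet_size, line_size=512):
    for offset in range(0, stripelet_size, line_size):         # one pass per line of the stripelets
        p_acc = bytearray(line_size)                           # accumulators, one line wide
        q_acc = bytearray(line_size)
        coeff = 1
        for group in groups:                                   # work one group, then move to the next
            for src in group:
                line = memory[src][offset:offset + line_size]  # read a line from a data stripelet
                for i, byte in enumerate(line):
                    p_acc[i] ^= byte                           # P and Q updated in parallel
                    q_acc[i] ^= gf_mul(coeff, byte)
                coeff = gf_mul(coeff, 2)
        memory[p_dest][offset:offset + line_size] = p_acc      # written once; no zero pre-fill needed
        memory[q_dest][offset:offset + line_size] = q_acc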
The Stripe Handler improves memory access by avoiding the writing of partially calculated data to the P and Q stripelets. Each partial calculation requires a read followed by a write to the P or Q stripelet for every read of a data stripelet. The Stripe Handler performs calculations for the whole stripe, which permits the P or Q stripelet to be written only after all data stripelets have been read. Reading of the P and Q stripelets is no longer necessary. And, since multiple calculations are done in parallel, the data stripelets need to be read only once. The need to pre-fill the P and Q stripelets in memory with zeroes is also eliminated. Processor overhead is also improved by creating a single command.
Memory access is also made more efficient by grouping the reads and writes together, instead of interleaving writes with reads during partial calculation updates. The grouping of data stripelet reads provides predictable memory bandwidth utilization and allows the Stripe Handler to support any number of disks within a stripe without creating adverse side effects for other resources requiring memory bandwidth. The ability to format the command so as to reduce the number of addresses per group allows the memory utilization of the Stripe Handler to be tuned.
Although the above description has focused on full-stripe writes, the Stripe Handler can also be used when full-stripe reads are necessary to reconstruct data due to the failure of a disk.
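For reconstruction, the same accumulation applies: the surviving data stripelets and the P stripelet become the sources, and the accumulated result is written once to the buffer for the failed stripelet. A minimal sketch (single-disk failure, XOR only; names illustrative):

# Illustrative full-stripe read for reconstruction of one failed data stripelet.
def recover_failed_stripelet(surviving_sources):
    """XOR the surviving data stripelets and the P stripelet; the result is the missing stripelet."""
    acc = bytearray(len(surviving_sources[0]))
    for src in surviving_sources:
        for i, byte in enumerate(src):
            acc[i] ^= byte
    return bytes(acc)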
Step 902 accumulates a parity product associated with an information stripe in a redundant storage device controller. Accumulating the parity product involves either creating a parity stripelet or recovering a stripelet. Typically, Step 902 accumulates P and Q parity information in parallel, e.g., for a RAID 6 system. The accumulation of the parity products may use an operation such as XOR calculations, Galois products, or a combination of Galois products and XOR calculations. In a single write operation, Step 904 stores the parity product in a controller memory. Step 906 writes the stored parity product in a (one or more) storage device(s).
In one aspect, accumulating the parity product in Step 902 includes accumulating the parity product in a RAID controller. Then, storing the parity product (Step 904) includes storing the parity product in a RAID controller memory. Writing the stored parity product in Step 906 includes writing the stored parity product in a RAID.
For example, Step 901a receives n data stripelets for storage in the RAID at the controller. Then, accumulating the parity product in Step 902 includes creating m parity stripelets, and storing the parity product (Step 904) includes writing the m parity stripelets into the controller memory in a single write operation.
In one aspect, receiving n data stripelets for storage (Step 901a) includes receiving a first data block in each data stripelet, and Step 902 creates m parity stripelets by accumulating parity for the first data block from the n data stripelets. Then, writing the m parity stripelets into the controller memory (Step 904) includes writing the parity information for the first data block in a single write operation.
In another aspect, receiving n data stripelets for storage in Step 901a includes receiving a first plurality of data blocks in each data stripelet. Step 902 creates m parity stripelets by accumulating parity information for a first group of data blocks from the first plurality. Then, writing the m parity stripelets into the controller memory (Step 904) includes substeps. Step 904a writes the parity information for the first group of data blocks in a single write operation. Step 904b iteratively creates and writes parity information for groups of information blocks from the first plurality until the m parity stripelets are created.
In another variation, creating m parity stripelets in Step 902 includes substeps. Step 902a accesses a DMA processor. Step 902b controls the DMA processor to partially accumulate parity information associated with the first data block in the n data stripelets. Step 902c releases control over the DMA processor. Step 902d iteratively accesses the DMA processor until the parity information for the first data block in all the n data stripelets is fully accumulated.
In a different aspect, Step 901b receives (n+m−x) stripelets from a RAID with (n+m) drives at the controller, and accumulating the parity product in Step 902 includes recovering x stripelets. Then, storing the parity product in Step 904 includes writing x stripelets into controller memory in a single write operation.
In one aspect, accumulating the parity product for the information stripe (Step 902) includes a different set of substeps. Step 902e performs a parity operation with the first bit of a first stripelet. Step 902f creates a partial parity accumulation. Step 902g serially performs a parity operation between the first bit of any remaining stripelets in the stripe and the partial parity accumulation. Step 902h forms the accumulated parity product in response to a final parity operation.
Alternately stated, accumulating the parity product for the information stripe (Step 902) includes completely calculating a parity product for a first bit in the information stripe. Then, storing the parity product in a single write operation (Step 904) includes storing only the completely calculated parity product for the first bit.
A system and method have been presented for automating full stripe operations in a redundant data storage array. RAID 5 and RAID 6 structures have been used as examples to illustrate the invention. However, the invention is not limited to merely these examples. Other variations and embodiments of the invention will occur to those skilled in the art.