Method and system for syndrome generation and data recovery

Description

TECHNICAL FIELD

Embodiments of the invention relate to syndrome generation and data recovery, and more specifically to PQ RAID syndrome generation and data recovery.

BACKGROUND

With the increase in use of large-scale storage systems, such as with Fiber Channel and Gigabit Ethernet systems, there is an increase in the susceptibility of these systems to multiple disk failures. The rapid growth of disk capacity also prolongs the disk recovery time in the event of disk failures. This prolonged recovery time increases the probability of subsequent disk failures during the reconstruction of user data and parity information stored in a faulty disk. In addition, latent sector failures caused by data that was left unread for a long period of time may prevent data recovery after a disk failure that results in loss of data. The use of less expensive disks, such as ATA (Advanced Technology Attachment) disks, in arrays where high data integrity is required also increases the probability of such disk failures.

RAID (Redundant Array of Independent Disks) architectures have been developed to allow recovery from disk failures. Typically, the XOR (Exclusive-OR) of data from a number of disks is maintained on a redundant disk. In the event of a disk failure, the data on the failed disk is reconstructed by XORing the data on the surviving disks. The reconstructed data is written to a spare disk. However, data will be lost if a second disk fails before the reconstruction is complete. Traditional disk arrays that protect against the loss of no more than one disk are inadequate for data recovery, especially for large-scale storage systems.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings in which like reference numerals refer to similar elements.

FIG. 1 is a block diagram illustrating a system that allows for recovery from multiple disk failures.

FIG. 2 is a table of example values for a Galois field.

FIG. 3 is a block diagram illustrating a system according to an embodiment of the invention.

FIG. 4 is a flow diagram illustrating a method according to an embodiment of the invention.

DETAILED DESCRIPTION

Embodiments of a system and method for syndrome generation and data recovery are described. In the following description, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In other instances, well-known circuits, structures and techniques have not been shown in detail in order not to obscure the understanding of this description.

Reference throughout this specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. Thus, the appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

Referring to FIG. 1, a block diagram illustrates a system 100 that allows for recovery from multiple disk failures. The system 100 includes one or more storage blocks for storing data, such as 102-124, and two or more storage blocks for storing parity or syndrome information, such as 130-144. In one embodiment, the system 100 is a RAID (Redundant Array of Independent Disks) system. In one embodiment, two syndromes are generated and stored: P syndrome and Q syndrome. The P syndrome is generated by computing parity across a stripe. The Q syndrome is generated by using Galois Field multiplication. The regeneration scheme for data recovery uses both Galois Field multiplication and division.

The following are the equations for generating P and Q for a storage array with n data disks and two check disks:

$P = D_{0} \oplus D_{1} \oplus D_{2} \oplus \ldots \oplus D_{n-1}$ (Equation 1)

$Q = g^{0} * D_{0} \oplus g^{1} * D_{1} \oplus g^{2} * D_{2} \oplus \ldots \oplus g^{n-1} * D_{n-1}$ (Equation 2)

P is the simple parity of data (D) computed across a stripe using ⊕ (XOR) operations. Q requires multiplication (*) using a Galois Field multiplier (g).
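Equations 1 and 2 may be sketched in software as follows. This is a minimal illustration under stated assumptions rather than the described hardware: the names `gf_mul` and `syndromes` are hypothetical, the generator g = 0000 0010 is taken from the FIG. 2 discussion, and the reduction polynomial x⁸ + x⁴ + x³ + x² + 1 (0x11D) is an assumption chosen because it agrees with the FIG. 2 values quoted in this description (for example, g²⁵⁴ = 1000 1110).

```python
POLY = 0x11D  # assumed reduction polynomial: x^8 + x^4 + x^3 + x^2 + 1
G = 0x02      # generator g = 0000 0010


def gf_mul(b, c):
    """Multiply two bytes in GF(2^8) by shift-and-add with reduction."""
    a = 0
    while c:
        if c & 1:
            a ^= b          # add (XOR) the current multiple of b
        b <<= 1             # multiply b by x
        if b & 0x100:
            b ^= POLY       # reduce back to 8 bits
        c >>= 1
    return a


def syndromes(data):
    """Return (P, Q) for a stripe holding one byte per data disk."""
    p, q, coeff = 0, 0, 1           # coeff tracks g^i, starting at g^0
    for d in data:
        p ^= d                      # Equation 1: plain XOR parity
        q ^= gf_mul(coeff, d)       # Equation 2: g^i * D_i
        coeff = gf_mul(coeff, G)    # advance to g^(i+1)
    return p, q
```

The running coefficient avoids a table lookup for g^i; a table such as the one in FIG. 2 could equally be used.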

The following equations show the generation of P and Q when updating a data block D_a:

P(new)=P(old)⊕D_a(old)⊕D_a(new)

Q(new)=Q(old)⊕g^a*D_a(old)⊕g^a*D_a(new).
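The update equations may be illustrated with the following sketch, which avoids reading the other data disks. The function names are hypothetical, and the reduction polynomial 0x11D is an assumption consistent with the FIG. 2 values quoted in this description.

```python
POLY = 0x11D  # assumed reduction polynomial
G = 0x02      # generator g = 0000 0010


def gf_mul(b, c):
    """Multiply two bytes in GF(2^8) by shift-and-add with reduction."""
    a = 0
    while c:
        if c & 1:
            a ^= b
        b <<= 1
        if b & 0x100:
            b ^= POLY
        c >>= 1
    return a


def gf_pow(base, exp):
    """Raise a field element to a non-negative integer power."""
    result = 1
    for _ in range(exp):
        result = gf_mul(result, base)
    return result


def update(p_old, q_old, a, d_old, d_new):
    """Return (P(new), Q(new)) after data block D_a changes."""
    coeff = gf_pow(G, a)                                   # g^a
    p_new = p_old ^ d_old ^ d_new
    q_new = q_old ^ gf_mul(coeff, d_old) ^ gf_mul(coeff, d_new)
    return p_new, q_new
```

Because Galois Field multiplication distributes over ⊕, the two products in Q(new) could also be formed with a single multiplication, g^a*(D_a(old)⊕D_a(new)).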

There are four cases of multiple disk failure that require recovery. In case one, P and Q fail. In this case, P and Q may be regenerated using Equations 1 and 2 shown above.

In case two, Q and a data disk (D_a) fail. In this case, D_a may be regenerated using P and the remaining data disks via Equation 1. Q may then be regenerated using Equation 2.

In case three, P and a data disk (D_a) fail. In this case, D_a may be regenerated using Q, the remaining data disks, and the following equation:

$D_{a} = (Q \oplus Q_{a}) * g^{-a} = (Q \oplus Q_{a}) * g^{255-a}$, where

$Q_{a} = g^{0} D_{0} \oplus g^{1} D_{1} \oplus \ldots \oplus g^{a-1} D_{a-1} \oplus g^{a+1} D_{a+1} \oplus \ldots \oplus g^{n-1} D_{n-1}$.

After D_a is regenerated, P may be regenerated using Equation 1.
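Case three may be sketched as follows. The function `recover_with_q` and its dictionary argument are hypothetical conveniences, and the reduction polynomial 0x11D is an assumption consistent with the FIG. 2 values quoted in this description.

```python
POLY = 0x11D  # assumed reduction polynomial
G = 0x02      # generator g = 0000 0010


def gf_mul(b, c):
    """Multiply two bytes in GF(2^8) by shift-and-add with reduction."""
    a = 0
    while c:
        if c & 1:
            a ^= b
        b <<= 1
        if b & 0x100:
            b ^= POLY
        c >>= 1
    return a


def gf_pow(base, exp):
    """Raise a field element to a non-negative integer power."""
    result = 1
    for _ in range(exp):
        result = gf_mul(result, base)
    return result


def recover_with_q(q, survivors, a):
    """Regenerate D_a from Q and the surviving data disks.

    survivors maps disk index -> data byte for every data disk except a.
    """
    q_a = 0
    for i, d in survivors.items():
        q_a ^= gf_mul(gf_pow(G, i), d)          # Q_a: Q without the g^a*D_a term
    return gf_mul(q ^ q_a, gf_pow(G, 255 - a))  # multiply by g^(-a) = g^(255-a)
```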

In case four, two data disks (D_a and D_b) fail. In this case, D_a and D_b may be regenerated using P and Q, the remaining data disks, and the following equations:

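$D_{a} = (g^{-a} * (Q \oplus Q_{ab}) \oplus g^{b-a} * (P \oplus P_{ab})) / (g^{b-a} \oplus 0000 0001)$

$D_{b} = D_{a} \oplus (P \oplus P_{ab})$, where

$P_{ab} = D_{0} \oplus D_{1} \oplus \ldots \oplus D_{a-1} \oplus D_{a+1} \oplus \ldots \oplus D_{b-1} \oplus D_{b+1} \oplus \ldots \oplus D_{n-1}$

$Q_{ab} = g^{0} D_{0} \oplus g^{1} D_{1} \oplus \ldots \oplus g^{a-1} D_{a-1} \oplus g^{a+1} D_{a+1} \oplus \ldots \oplus g^{b-1} D_{b-1} \oplus g^{b+1} D_{b+1} \oplus \ldots \oplus g^{n-1} D_{n-1}$.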
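Case four may be sketched as follows, assuming a < b. The function names are hypothetical, division is carried out here by multiplying with the inverse x²⁵⁴ (one of several possible ways to divide in GF(2⁸), not necessarily the one used by the described hardware), and the reduction polynomial 0x11D is an assumption consistent with the FIG. 2 values quoted in this description.

```python
POLY = 0x11D  # assumed reduction polynomial
G = 0x02      # generator g = 0000 0010


def gf_mul(b, c):
    """Multiply two bytes in GF(2^8) by shift-and-add with reduction."""
    a = 0
    while c:
        if c & 1:
            a ^= b
        b <<= 1
        if b & 0x100:
            b ^= POLY
        c >>= 1
    return a


def gf_pow(base, exp):
    """Raise a field element to a non-negative integer power."""
    result = 1
    for _ in range(exp):
        result = gf_mul(result, base)
    return result


def recover_two(p, q, survivors, a, b):
    """Regenerate (D_a, D_b) from P, Q and the surviving data disks (a < b)."""
    p_ab, q_ab = 0, 0
    for i, d in survivors.items():
        p_ab ^= d
        q_ab ^= gf_mul(gf_pow(G, i), d)
    g_ba = gf_pow(G, b - a)
    numerator = gf_mul(gf_pow(G, 255 - a), q ^ q_ab) ^ gf_mul(g_ba, p ^ p_ab)
    denominator = g_ba ^ 0x01                 # nonzero because 0 < b - a < 255
    d_a = gf_mul(numerator, gf_pow(denominator, 254))  # x^254 is 1/x for x != 0
    d_b = d_a ^ p ^ p_ab
    return d_a, d_b
```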

The following are examples of recovery from disk failures in the cases described above. In the following examples, the datapath is assumed to be one byte or 8 bits wide. Therefore, a Galois Field, GF (2⁸) is used. The invention may be implemented for datapaths that are more or less than one byte wide, and larger or smaller Galois fields may be used.

The following equations may be used for multiplying two 8-bit elements (b and c) to yield an 8-bit product (a).

b=[b7 b6 b5 b4 b3 b2 b1 b0] and c=[c7 c6 c5 c4 c3 c2 c1 c0].

$a 0 = b 0 \cdot c 0 \oplus b 7 \cdot c 1 \oplus b 6 \cdot c 2 \oplus b 5 \cdot c 3 \oplus b 4 \cdot c 4 \oplus b 3 \cdot c 5 \oplus b 7 \cdot c 5 \oplus b 2 \cdot c 6 \oplus b 7 \cdot c 6 \oplus b 6 \cdot c 6 \oplus b 1 \cdot c 7 \oplus b 7 \cdot c 7 \oplus b 6 \cdot c 7 \oplus b 5 \cdot c 7$

$a 1 = b 1 \cdot c 0 \oplus b 0 \cdot c 1 \oplus b 7 \cdot c 2 \oplus b 6 \cdot c 3 \oplus b 5 \cdot c 4 \oplus b 4 \cdot c 5 \oplus b 3 \cdot c 6 \oplus b 7 \cdot c 6 \oplus b 2 \cdot c 7 \oplus b 7 \cdot c 7 \oplus b 6 \cdot c 7$

$a 2 = b 2 \cdot c 0 \oplus b 1 \cdot c 1 \oplus b 7 \cdot c 1 \oplus b 0 \cdot c 2 \oplus b 6 \cdot c 2 \oplus b 7 \cdot c 3 \oplus b 5 \cdot c 3 \oplus b 6 \cdot c 4 \oplus b 4 \cdot c 4 \oplus b 5 \cdot c 5 \oplus b 3 \cdot c 5 \oplus b 7 \cdot c 5 \oplus b 2 \cdot c 6 \oplus b 7 \cdot c 6 \oplus b 6 \cdot c 6 \oplus b 4 \cdot c 6 \oplus b 1 \cdot c 7 \oplus b 3 \cdot c 7 \oplus b 6 \cdot c 7 \oplus b 5 \cdot c 7$

$a 3 = b 3 \cdot c 0 \oplus b 2 \cdot c 1 \oplus b 7 \cdot c 1 \oplus b 1 \cdot c 2 \oplus b 7 \cdot c 2 \oplus b 6 \cdot c 2 \oplus b 0 \cdot c 3 \oplus b 6 \cdot c 3 \oplus b 5 \cdot c 3 \oplus b 7 \cdot c 4 \oplus b 5 \cdot c 4 \oplus b 4 \cdot c 4 \oplus b 6 \cdot c 5 \oplus b 4 \cdot c 5 \oplus b 3 \cdot c 5 \oplus b 7 \cdot c 5 \oplus b 2 \cdot c 6 \oplus b 6 \cdot c 6 \oplus b 5 \cdot c 6 \oplus b 3 \cdot c 6 \oplus b 2 \cdot c 7 \oplus b 4 \cdot c 7 \oplus b 1 \cdot c 7 \oplus b 5 \cdot c 7$

$a 4 = b 4 \cdot c 0 \oplus b 3 \cdot c 1 \oplus b 7 \cdot c 1 \oplus b 2 \cdot c 2 \oplus b 7 \cdot c 2 \oplus b 6 \cdot c 2 \oplus b 1 \cdot c 3 \oplus b 7 \cdot c 3 \oplus b 6 \cdot c 3 \oplus b 5 \cdot c 3 \oplus b 0 \cdot c 4 \oplus b 6 \cdot c 4 \oplus b 5 \cdot c 4 \oplus b 4 \cdot c 4 \oplus b 5 \cdot c 5 \oplus b 4 \cdot c 5 \oplus b 3 \cdot c 5 \oplus b 2 \cdot c 6 \oplus b 4 \cdot c 6 \oplus b 3 \cdot c 6 \oplus b 1 \cdot c 7 \oplus b 7 \cdot c 7 \oplus b 2 \cdot c 7 \oplus b 3 \cdot c 7$

$a 5 = b 5 \cdot c 0 \oplus b 4 \cdot c 1 \oplus b 3 \cdot c 2 \oplus b 7 \cdot c 2 \oplus b 2 \cdot c 3 \oplus b 7 \cdot c 3 \oplus b 6 \cdot c 3 \oplus b 1 \cdot c 4 \oplus b 7 \cdot c 4 \oplus b 6 \cdot c 4 \oplus b 5 \cdot c 4 \oplus b 0 \cdot c 5 \oplus b 6 \cdot c 5 \oplus b 5 \cdot c 5 \oplus b 4 \cdot c 5 \oplus b 5 \cdot c 6 \oplus b 4 \cdot c 6 \oplus b 3 \cdot c 6 \oplus b 2 \cdot c 7 \oplus b 4 \cdot c 7 \oplus b 3 \cdot c 7$

$a 6 = b 6 \cdot c 0 \oplus b 5 \cdot c 1 \oplus b 4 \cdot c 2 \oplus b 3 \cdot c 3 \oplus b 7 \cdot c 3 \oplus b 2 \cdot c 4 \oplus b 7 \cdot c 4 \oplus b 6 \cdot c 4 \oplus b 1 \cdot c 5 \oplus b 7 \cdot c 5 \oplus b 6 \cdot c 5 \oplus b 5 \cdot c 5 \oplus b 0 \cdot c 6 \oplus b 6 \cdot c 6 \oplus b 5 \cdot c 6 \oplus b 4 \cdot c 6 \oplus b 5 \cdot c 7 \oplus b 4 \cdot c 7 \oplus b 3 \cdot c 7$

$a 7 = b 7 \cdot c 0 \oplus b 6 \cdot c 1 \oplus b 5 \cdot c 2 \oplus b 4 \cdot c 3 \oplus b 3 \cdot c 4 \oplus b 7 \cdot c 4 \oplus b 2 \cdot c 5 \oplus b 7 \cdot c 5 \oplus b 6 \cdot c 5 \oplus b 1 \cdot c 6 \oplus b 7 \cdot c 6 \oplus b 6 \cdot c 6 \oplus b 5 \cdot c 6 \oplus b 0 \cdot c 7 \oplus b 6 \cdot c 7 \oplus b 5 \cdot c 7 \oplus b 4 \cdot c 7$
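The bit-level equations above can be cross-checked against an ordinary shift-and-add multiplier. The sketch below transcribes only the a1 equation and compares it with bit 1 of the product for every pair of bytes. The helper names are hypothetical, and the reduction polynomial x⁸ + x⁴ + x³ + x² + 1 (0x11D) is an assumption: it is not named in this description, but it is the polynomial under which the shift-and-add product is expected to agree with the equations.

```python
POLY = 0x11D  # assumed reduction polynomial: x^8 + x^4 + x^3 + x^2 + 1


def gf_mul(b, c):
    """Multiply two bytes in GF(2^8) by shift-and-add with reduction."""
    a = 0
    while c:
        if c & 1:
            a ^= b
        b <<= 1
        if b & 0x100:
            b ^= POLY
        c >>= 1
    return a


def bit(x, i):
    """Return bit i of x (bit 0 is the least significant bit)."""
    return (x >> i) & 1


# (i, j) pairs for each b_i . c_j term of the a1 equation, in source order.
A1_TERMS = [(1, 0), (0, 1), (7, 2), (6, 3), (5, 4), (4, 5),
            (3, 6), (7, 6), (2, 7), (7, 7), (6, 7)]


def a1_from_equation(b, c):
    """Evaluate the a1 equation as an XOR of AND terms."""
    out = 0
    for i, j in A1_TERMS:
        out ^= bit(b, i) & bit(c, j)
    return out


# Count disagreements between the equation and the reference multiplier.
mismatches = sum(
    a1_from_equation(b, c) != bit(gf_mul(b, c), 1)
    for b in range(256)
    for c in range(256)
)
```

Each AND term corresponds to one two-input gate and each ⊕ to one XOR gate, which is why the equations map directly onto combinational logic.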

FIG. 2 shows a table 200 providing example values of the Galois field multiplier g^a for g=0000 0010. The negative power of a generator for GF(2⁸) can be computed using the following equation:

$g^{-a} = g^{255-a}$.
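A table of powers of g, and the negative-power identity, may be sketched as follows. The contents of FIG. 2 are not reproduced in this text, so the table here is generated rather than copied. The names `POWERS` and `g_neg` are hypothetical, and the reduction polynomial 0x11D is an assumption consistent with the values g²⁵³ = 0100 0111 and g²⁵⁴ = 1000 1110 quoted in this description.

```python
POLY = 0x11D  # assumed reduction polynomial
G = 0x02      # generator g = 0000 0010


def gf_mul(b, c):
    """Multiply two bytes in GF(2^8) by shift-and-add with reduction."""
    a = 0
    while c:
        if c & 1:
            a ^= b
        b <<= 1
        if b & 0x100:
            b ^= POLY
        c >>= 1
    return a


def power_table():
    """Return a list t with t[a] == g^a for a in 0..254."""
    table = [1]
    for _ in range(254):
        table.append(gf_mul(table[-1], G))
    return table


POWERS = power_table()


def g_neg(a):
    """Return g^(-a), computed as g^(255 - a); the exponent wraps modulo 255."""
    return POWERS[(255 - a) % 255]
```

The identity holds because the 255 nonzero elements of GF(2⁸) form a cyclic group, so g²⁵⁵ equals 0000 0001.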

The following example shows how to generate P and Q parity for a disk array with four data disks and two parity disks. Assume that each data block contains one data byte. Let D_i be the data contained in disk i (i=0,1,2,3). Consider the following data stripe:

D₀=1011 0100, D₁=0010 1100, D₂=1100 0110 and D₃=1101 0101.

Then, P may be generated using Equation 1 as follows:

$\begin{matrix} P = D_{0} \oplus D_{1} \oplus D_{2} \oplus D_{3} \\ = 1011 0100 \oplus 0010 1100 \oplus 1100 0110 \oplus 1101 0101 \\ = 1000 1011. \end{matrix}$

Q may be generated using Equation 2 as follows:

$Q = g^{0} D_{0} \oplus g^{1} D_{1} \oplus g^{2} D_{2} \oplus g^{3} D_{3}$.

From the table in FIG. 2, g⁰=0000 0001, g¹=0000 0010, g²=0000 0100, and g³=0000 1000.

Therefore,

$\begin{matrix} Q = 0000 0001 * 1011 0100 \oplus 0000 0010 * 0010 1100 \oplus \\ 0000 0100 * 1100 0110 \oplus 0000 1000 * 1101 0101 \\ = 1011 0100 \oplus 0101 1000 \oplus 0011 1111 \oplus 1110 0110 \\ = 0011 0101. \end{matrix}$
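The arithmetic of this example can be reproduced with the short script below. It uses the D₂ value that appears in the P computation above (1100 0110). The helper names are hypothetical, and the reduction polynomial 0x11D is an assumption consistent with the partial products shown.

```python
POLY = 0x11D  # assumed reduction polynomial


def gf_mul(b, c):
    """Multiply two bytes in GF(2^8) by shift-and-add with reduction."""
    a = 0
    while c:
        if c & 1:
            a ^= b
        b <<= 1
        if b & 0x100:
            b ^= POLY
        c >>= 1
    return a


def bits(x):
    """Format a byte as two 4-bit groups, matching the notation above."""
    s = format(x, "08b")
    return s[:4] + " " + s[4:]


data = [0b10110100, 0b00101100, 0b11000110, 0b11010101]    # D0..D3
coeffs = [0b00000001, 0b00000010, 0b00000100, 0b00001000]  # g^0..g^3

products = [gf_mul(g_i, d) for g_i, d in zip(coeffs, data)]

p, q = 0, 0
for d, prod in zip(data, products):
    p ^= d      # Equation 1
    q ^= prod   # Equation 2

print("products:", ", ".join(bits(x) for x in products))
print("P =", bits(p))  # prints P = 1000 1011
print("Q =", bits(q))  # prints Q = 0011 0101
```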

The following example shows how to recover from two disk failures using the array generated above. In the first case, P and Q fail. In this case, P and Q are regenerated using Equations 1 and 2 as shown above. In the second case, Q and a data disk (D_a) fail. In this case, D_a may be regenerated using P and the remaining data disks via Equation 1. Q may then be regenerated using Equation 2. In the third case, P and a data disk (D_a) fail. In this case, D_a may be regenerated using Q, the remaining data disks, and the following equation:

$D_{a} = (Q \oplus Q_{a}) * g^{-a} = (Q \oplus Q_{a}) * g^{255-a}$, where

$Q_{a} = g^{0} D_{0} \oplus g^{1} D_{1} \oplus \ldots \oplus g^{a-1} D_{a-1} \oplus g^{a+1} D_{a+1} \oplus \ldots \oplus g^{n-1} D_{n-1}$.

For example, suppose disk 2 fails. Then,

$\begin{matrix} D_{2} = (Q \oplus Q_{2}) \cdot g^{253} = (Q \oplus g^{0} D_{0} \oplus g^{1} D_{1} \oplus g^{3} D_{3}) \cdot g^{253} \\ = (0011 0101 \oplus 0000 0001 * 1011 0100 \oplus 0000 0010 * 0010 1100 \oplus 0000 1000 * 1101 0101) * g^{253} \end{matrix}$

Using the table in FIG. 2, g²⁵³=0100 0111. Therefore,

$\begin{matrix} D_{2} = (0011 0101 \oplus 1011 0100 \oplus 0101 1000 \oplus 1110 0110) * \\ 0100 0111 \\ = 0011 1111 * 0100 0111 \\ = 1100 0110. \end{matrix}$

P may then be regenerated using Equation 1, since all data blocks are now available.

In the fourth case, two data disks (D_a and D_b) fail. In this case, D_a and D_b may be regenerated using P and Q, the remaining data disks, and the following equations:

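$D_{a} = (g^{-a} * (Q \oplus Q_{ab}) \oplus g^{b-a} * (P \oplus P_{ab})) / (g^{b-a} \oplus 0000 0001)$

$D_{b} = D_{a} \oplus (P \oplus P_{ab})$, where

$P_{ab} = D_{0} \oplus D_{1} \oplus \ldots \oplus D_{a-1} \oplus D_{a+1} \oplus \ldots \oplus D_{b-1} \oplus D_{b+1} \oplus \ldots \oplus D_{n-1}$

$Q_{ab} = g^{0} D_{0} \oplus g^{1} D_{1} \oplus \ldots \oplus g^{a-1} D_{a-1} \oplus g^{a+1} D_{a+1} \oplus \ldots \oplus g^{b-1} D_{b-1} \oplus g^{b+1} D_{b+1} \oplus \ldots \oplus g^{n-1} D_{n-1}$.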

For example, assume that disks 1 and 3 failed. Then,

$\begin{matrix} D_{1} = (g^{- 1} \cdot (Q \oplus Q_{13}) \oplus g^{3 - 1} * (P \oplus P_{13})) / (g^{3 - 1} \oplus 0000 0001) \\ = (g^{254} * (Q \oplus Q_{13}) \oplus g^{2} * (P \oplus P_{13})) / (g^{2} \oplus 0000 0001) \end{matrix}$

$\begin{matrix} Q \oplus Q_{13} = 0011 0101 \oplus 0000 0001 * 1011 0100 \oplus 0000 0100 * \\ 1100 0110 \\ = 0011 0101 \oplus 1011 0100 \oplus 0011 1111 = 1011 1110 \\ P \oplus P_{13} = 1000 1011 \oplus 1011 0100 \oplus 1100 0110 = 1111 1001 \end{matrix}$

From the table in FIG. 2, g²⁵⁴=1000 1110 and g²=0000 0100. Therefore,

$\begin{matrix} D_{1} = ((1000 1110 * 1011 1110) \oplus (0000 0100 * 1111 1001)) / \\ (0000 0100 \oplus 0000 0001) \\ = (0101 1111 \oplus 1100 0011) / (0000 0101) \\ = (1001 1100) / (0000 0101) \\ = 0010 1100. \\ D_{3} = D_{1} \oplus (P \oplus P_{13}) \\ = 0010 1100 \oplus 1111 1001 \\ = 1101 0101. \end{matrix}$
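The intermediate values of this example can be checked step by step with the script below. Division is performed here by exhaustive search for the byte x satisfying x * divisor = numerator, which is a deliberately simple stand-in and not the described divider. The variable names are hypothetical, and the reduction polynomial 0x11D is an assumption consistent with the values shown. Under that assumption the numerator 0101 1111 ⊕ 1100 0011 evaluates to 1001 1100.

```python
POLY = 0x11D  # assumed reduction polynomial


def gf_mul(b, c):
    """Multiply two bytes in GF(2^8) by shift-and-add with reduction."""
    a = 0
    while c:
        if c & 1:
            a ^= b
        b <<= 1
        if b & 0x100:
            b ^= POLY
        c >>= 1
    return a


P, Q = 0b10001011, 0b00110101
D0, D2 = 0b10110100, 0b11000110          # surviving data disks
G2, G254 = 0b00000100, 0b10001110        # g^2 and g^254 = g^(-1)

q_diff = Q ^ gf_mul(0b00000001, D0) ^ gf_mul(G2, D2)   # Q xor Q_13
p_diff = P ^ D0 ^ D2                                   # P xor P_13

numerator = gf_mul(G254, q_diff) ^ gf_mul(G2, p_diff)
divisor = G2 ^ 0b00000001

# Divide by searching for the unique x with x * divisor == numerator.
d1 = next(x for x in range(256) if gf_mul(x, divisor) == numerator)
d3 = d1 ^ p_diff

print(format(numerator, "08b"), format(d1, "08b"), format(d3, "08b"))
```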

FIG. 3 is a block diagram of a system 300 according to an embodiment of the invention. System 300 includes a multiplier 302 and one or more comparators, such as 304 and 306. In one embodiment, one or more of the comparators are XOR (Exclusive-OR) gates. System 300 may also include one or more buffers, such as 308 and 310. The buffer 308 stores the output from the comparator 306. The comparator 306 compares the data 320 read from the storage blocks shown in FIG. 1 with the output of buffer 308. In this way, the comparator 306 may be used to compute the P syndrome described above, which is the parity across a stripe.

The multiplier 302 multiplies the multiplicand 330 with the data 320 read from the storage blocks shown in FIG. 1. In one embodiment, the multiplicand 330 is a Galois field multiplier value, such as one of the values shown in table 200 of FIG. 2. The output of the multiplier 302 is compared to the output of the buffer 310 by comparator 304. In this way, a Galois field multiplication may be performed and the Q syndrome may be computed. The multiplier 302 may also be used to perform the various multiplication operations for the equations described above with respect to the four cases in which multiple disks fail. Data may be allowed to pass through the multiplier by setting the multiplicand 330 equal to one. A selector 312 may be used to select between the output of the multiplier 302 and the output of the comparator 304. In one embodiment, the selector 312 is a multiplexer (MUX).

System 300 may also include a divider 314 to be used to perform the division operations for the equations described above with respect to the four cases in which multiple disks fail. For example, in case four, the computation for regeneration of D_a has a division operation, which may be performed by divider 314. Data may be allowed to pass through the divider 314 by setting the divisor 340 equal to one. This may be desired when no division operation is required to be performed.
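One common software way to divide in GF(2⁸) is to subtract discrete logarithms, as sketched below. This is offered only as an illustration of the operation and of the pass-through behaviour for a divisor of one; the description does not state how divider 314 is built. The names `EXP`, `LOG` and `gf_div` are hypothetical, and the reduction polynomial 0x11D is an assumption consistent with the FIG. 2 values quoted in this description.

```python
POLY = 0x11D  # assumed reduction polynomial

# EXP[i] holds g^i for g = 0000 0010; LOG is the inverse mapping.
EXP = [0] * 255
LOG = [0] * 256
x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x <<= 1            # multiply by g
    if x & 0x100:
        x ^= POLY      # reduce back to 8 bits


def gf_div(a, b):
    """Return a / b in GF(2^8) using g^(log a - log b)."""
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^8)")
    if a == 0:
        return 0
    return EXP[(LOG[a] - LOG[b]) % 255]
```

With a divisor of 0000 0001 the logarithm is zero, so the dividend is returned unchanged, which mirrors the pass-through setting described above.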

As shown in FIG. 3, the system 300 performs the generation of the P and Q syndromes in parallel. Other multiplication and division operations that are required may also be performed by system 300. A selector 316 may be used to select the desired output of the system. In one embodiment, the selector 316 is a multiplexer (MUX).
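The behaviour of system 300 may be loosely modelled in software as two accumulators that are updated from the same data word. The sketch below is a behavioural stand-in only: the class and method names are hypothetical, the exact wiring of selectors 312 and 316 is not reproduced, a sequential program cannot show true parallelism, and the reduction polynomial 0x11D is an assumption consistent with the FIG. 2 values quoted in this description.

```python
POLY = 0x11D  # assumed reduction polynomial


def gf_mul(b, c):
    """Multiply two bytes in GF(2^8) by shift-and-add with reduction."""
    a = 0
    while c:
        if c & 1:
            a ^= b
        b <<= 1
        if b & 0x100:
            b ^= POLY
        c >>= 1
    return a


def gf_div(a, b):
    """Divide in GF(2^8) by multiplying with b^254, the inverse of b."""
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^8)")
    inv = 1
    for _ in range(254):
        inv = gf_mul(inv, b)
    return gf_mul(a, inv)


class SyndromeDatapath:
    """Behavioural stand-in for system 300 (names are hypothetical)."""

    def __init__(self):
        self.p_buf = 0  # plays the role of buffer 308
        self.q_buf = 0  # plays the role of buffer 310

    def step(self, data, multiplicand=0x01):
        """Feed one data word to both paths, as data 320 feeds both."""
        self.p_buf ^= data                        # role of comparator 306
        self.q_buf ^= gf_mul(multiplicand, data)  # roles of 302 and 304

    def output(self, select_q, divisor=0x01):
        """Choose an output; a divisor of one passes Q through unchanged."""
        if not select_q:
            return self.p_buf
        return gf_div(self.q_buf, divisor)
```

A usage example with the stripe from the worked example: feeding 1011 0100, 0010 1100, 1100 0110 and 1101 0101 with multiplicands g⁰ through g³ leaves the P and Q syndromes in the two buffers.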

FIG. 4 illustrates a method for generating parity to aid in the recovery of data in one or more storage blocks according to one embodiment of the invention. At 400, a first parity factor is computed based on comparing data from one or more of the storage blocks. At 402, the data from one or more of the storage blocks is multiplied with a multiplication factor to generate a product. At 404, a second parity factor is computed based at least in part on the product. At 406, a selection is made between the first parity factor and the second parity factor. In one embodiment, the first parity factor is a P syndrome and the second parity factor is a Q syndrome as described above. In one embodiment, the second parity factor is further divided by a divisor. In one embodiment, the first parity factor and the second parity factor are buffered. In one embodiment, the first parity factor and the second parity factor are computed in parallel.

While the invention has been described in terms of several embodiments, those of ordinary skill in the art will recognize that the invention is not limited to the embodiments described, but can be practiced with modification and alteration within the spirit and scope of the appended claims. The description is thus to be regarded as illustrative instead of limiting.

Claims

1. An apparatus for generating information for reconstructing user data comprising: a first comparator to generate a first parity factor based on the user data; a multiplier to multiply the user data with a multiplication factor to generate a product; a second comparator coupled to the multiplier to generate a second parity factor based at least in part on the product; and a selector coupled to the first comparator and the second comparator to choose between the first parity factor and the second parity factor for output as the information for reconstructing the user data.
2. The apparatus of claim 1, wherein the first comparator and the second comparator operate in parallel to generate the first parity factor and the second parity factor.
3. The apparatus of claim 1, further comprising a divider coupled to the second comparator to perform division operations on the second parity factor.
4. The apparatus of claim 1, further comprising a first buffer coupled to the first comparator to store the first parity factor.
5. The apparatus of claim 1, further comprising a second buffer coupled to the second comparator to store the second parity factor.
6. The apparatus of claim 1, wherein the first comparator is an XOR (Exclusive OR) gate.
7. The apparatus of claim 1, wherein the second comparator is an XOR (Exclusive OR) gate.
8. The apparatus of claim 1, wherein the multiplier performs Galois Field multiplication.
9. The apparatus of claim 1, wherein the first comparator generates a RAID (Redundant Array of Independent Disks) syndrome based on the user data, wherein the user data is stored in one or more of the storage blocks.
10. The apparatus of claim 1, wherein the second comparator generates a RAID (Redundant Array of Independent Disks) syndrome based at least in part on the product.
11. A method for generating recovery information comprising: computing a first parity factor with a first comparator based on comparing the data from one or more storage blocks; generating a product by multiplying the data from one or more of the storage blocks with a multiplication factor; computing in parallel with the computing of the first parity factor a second parity factor based at least in part on the product, wherein the second parity factor is computed with a second comparator coupled to receive the product; and outputting the first and second parity factors.
12. The method of claim 11, wherein computing a first parity factor comprises computing a RAID (Redundant Array of Independent Disks) P-Syndrome based on the data from one or more of the storage blocks and wherein computing a second parity factor comprises computing a RAID Q-Syndrome based at least in part on the product.
13. The method of claim 12, wherein computing a RAID Q-Syndrome based at least in part on the product comprises computing a RAID Q-Syndrome based at least in part on a product of a Galois field multiplicand and the data from one or more of the storage blocks.
14. The method of claim 11, further comprising dividing the second parity factor by a divisor.
15. The method of claim 11, further comprising storing the first and second parity factors to storage blocks independent of the storage blocks storing the data.
16. A system comprising: one or more storage devices to store data; and a recovery device coupled to the one or more storage devices to generate recovery information based on the data, the recovery device including: a first comparator to generate a first parity factor based on data on one or more of the storage devices; a multiplier to multiply data from one or more of the storage devices with a multiplication factor to generate a product; and a second comparator coupled to the multiplier to generate a second parity factor based at least in part on the product.
17. The system of claim 15, wherein the recovery device further includes a divider coupled to the second comparator.
18. The system of claim 15, wherein the recovery device further includes a selector coupled to the first comparator and the second comparator to choose between the first parity factor and the second parity factor.
19. The system of claim 15, wherein the multiplier performs Galois Field multiplication.
20. The system of claim 15, wherein the first comparator generates a first RAID (Redundant Array of Independent Disks) syndrome based on data from one or more of the storage devices and the second comparator generates a second RAID syndrome based at least in part on the product.

CROSS REFERENCE TO RELATED APPLICATIONS

The instant application is a continuation application of, and claims priority under 35 USC § 120 to, U.S. patent application Ser. No. 11/021,708, filed Dec. 23, 2004.

Continuations (1)

	Number	Date	Country
Parent	11021708	Dec 2004	US
Child	12022009		US
