The present application is related to co-pending U.S. patent application, titled “Multiple Level (ML), Integrated Sector Format (ISF), Error Correction Code (ECC) Encoding And Decoding Processes For Data Storage Or Communication Devices And Systems,” Ser. No. 10/040,115, filed on Jan. 3, 2002, which is assigned to the same assignee as the present application.
The present invention relates to the field of data storage, and particularly to error correcting systems and methods employing an error correction algebraic decoder. More specifically, this invention relates to an improved algebraic decoder and associated method for correcting an arbitrary mixture of B-byte burst errors and t-byte random errors, provided that (B+2t) is less than, or equal to (R−1), where R denotes the number of check bytes, resulting in a decoding latency that is a linear function of the number of check bytes.
The use of cyclic error correcting codes in connection with the storage of data in storage devices is well established and is generally recognized as a reliability requirement for the storage system. Generally, the error correcting process involves the processing of syndrome bytes to determine the location and value of each error. Non-zero syndrome bytes result from the exclusive-ORing of error characters that are generated when data is read from the storage medium.
The number of error correction code (ECC) check characters employed depends on the desired power of the code. As an example, in many present day ECC systems used in connection with the storage of 8-bit bytes in a storage device, two check bytes are used for each error to be corrected in a codeword having a length of at most 255 byte positions. Thus, for example, six check bytes are required to correct up to three errors in a block of data having 249 data bytes and six check bytes. Six distinctive syndrome bytes are therefore generated in such a system. If there are no errors in the data word comprising the 255 bytes read from the storage device, then the six syndrome bytes are the all zero pattern. Under such a condition, no syndrome processing is required and the data word may be sent to the central processing unit. However, if one or more of the syndrome bytes are non-zero, then syndrome processing involves the process of identifying the location of the bytes in error and further identifying the error pattern for each error location.
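The check-byte arithmetic in the preceding paragraph can be sketched as follows. This is a minimal illustration only; the function and constant names are hypothetical and do not appear in the application.

```python
# Hypothetical helper names; the rule "two check bytes per correctable byte
# error" and the 255-byte codeword limit are taken from the paragraph above.
MAX_CODEWORD_LEN = 255  # codeword length limit for 8-bit bytes (2**8 - 1)


def check_bytes_needed(errors_to_correct):
    """Return the number of check bytes for the given number of byte errors."""
    return 2 * errors_to_correct


def data_bytes_available(codeword_len, errors_to_correct):
    """Return the data bytes left in a codeword after reserving check bytes."""
    if codeword_len > MAX_CODEWORD_LEN:
        raise ValueError("codeword exceeds 255 byte positions")
    return codeword_len - check_bytes_needed(errors_to_correct)


print(check_bytes_needed(3))                           # 6
print(data_bytes_available(MAX_CODEWORD_LEN, 3))       # 249
```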
The underlying mathematical concepts and operations involved in normal syndrome processing operations have been described in various publications. These operations and mathematical explanations generally involve first identifying the location of the errors by use of what has been referred to as the “error locator polynomial”. The overall objective of the mathematics involved in employing the error locator polynomial is to define the locations of the bytes in error by using only the syndrome bytes that are generated in the system.
The error locator polynomial has conventionally been employed as the starting point of the mathematical analysis to express error locations in terms of syndromes, so that binary logic may be employed to decode the syndrome bytes, first identifying the locations in error and then enabling the associated hardware to identify the error pattern at each location. Moreover, error locations in an on-the-fly ECC used in storage or communication systems are calculated as roots of the error locator polynomial.
A specific concern facing the data storage industry is the combination of poor read/write conditions and low signal-to-noise ratio data detection that are likely to cause read hard errors. A read hard error is comprised of an arbitrary mixture of B-byte burst errors and t-byte random errors in data sectors stored on a disk or data storage medium.
Typically, byte-alphabet, Reed-Solomon codes are used to format the stored sector data bytes into codewords, protected by redundant check bytes and used to locate and correct the byte errors in the codewords. Long codewords are more efficient for data protection against long bursts of errors as the redundant check byte overhead is averaged over a long data block. However, in data storage devices, long codewords cannot be used, unless a read-modify-write process is used because the logical unit data sector is 512 bytes long and the computer operating system assumes a 512-byte long sector logical unit. Each read-modify-write process causes a loss of a revolution of the data storage medium. Losing revolutions of the data storage medium lowers the input/output (I/O) command throughput. Therefore, frequent usage of the read-modify-write process becomes prohibitive.
Rather than uniformly adding check bytes to short codewords to correct more random errors in the short codewords, a method has been proposed for generating check bytes that are not rigidly attached to a short codeword but are shared by several short codewords in an integrated sector Reed-Solomon Error Correction Coding (ECC) format.
The combination of a low signal-to-noise ratio and poor read/write conditions makes both random errors and long bursts of byte errors (“mixed error mode”) increasingly likely at high areal densities and low flying heights, which is the trend in the hard disk drive (HDD) industry. The occurrence of such mixed error mode combinations of random as well as burst errors is likely to cause the 512-byte sector interleaved on-the-fly ECC to fail, resulting in a more frequent use of a data recovery procedure that involves rereads, moving the head, etc.
These data recovery procedures result in the loss of disk revolutions, which, in turn, causes a lower input/output throughput. This performance loss is not acceptable in many applications such as audio-visual (AV) data transfer, for example, which will not tolerate frequent interruptions of video data streams. On the other hand, uniform protection of all single sectors against both random as well as burst errors, at the 512-byte logical unit sector format, would result in excessive and unacceptable check byte overheads. Such check byte overheads also increase the error rate due to the increase in linear density of the data.
Furthermore, the decoding latency is typically a function of the square of the number of check bytes (R²), which could further decrease the throughput performance of the storage system.
Therefore, it would be desirable to have an algebraic decoder and associated method for correcting an arbitrary mixture of burst errors and random errors with an improved decoding latency. The decoder is not limited to a specific number of random errors, such as 1 or 2 random errors. Further, the decoding latency should be a linear function of the overhead as compared to a conventional quadratic latency function (e.g., in the case of 2 random errors).
In accordance with the present invention, an error correction algebraic decoder and an associated method provide the capability to correct all combinations of burst errors (B) and random errors (t), provided that (B+2t) is less than, or equal to (R−1), wherein R denotes the number of check bytes, resulting in a decoding latency that is a linear function of the number of check bytes.
The above and other features of the present invention are realized by an error correction (ECC) algebraic decoder and associated method that correct a combination of a B-byte burst of errors and t-byte random errors in a failed sector, by iteratively adding and removing an erasure (N−B) times until the entire failed sector has been scanned, provided the following inequality is satisfied: (B+2t)≦(R−1), where N denotes the number of bytes, B denotes the length of the burst of errors, t denotes the total number of random errors, and R denotes the number of check bytes in the failed sector. This results in a corrected sector at a decoding latency that is a generally linear function of the number of the check bytes R, as follows: Decoding Latency=5R(N−B).
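The correction capability stated above can be illustrated by enumerating the burst-length and random-error combinations that satisfy (B+2t)≦(R−1). The sketch below is illustrative only; the function names and the choice R=6 are assumptions made for the example.

```python
# Hypothetical helper names; only the inequality B + 2t <= R - 1 is taken
# from the text. R = 6 is an arbitrary value chosen for the example.
def admissible_pairs(R):
    """List every (B, t) with B + 2t <= R - 1, for R check bytes."""
    pairs = []
    for t in range((R - 1) // 2 + 1):
        for B in range(R - 1 - 2 * t + 1):
            pairs.append((B, t))
    return pairs


def max_burst_length(R, t):
    """Longest burst B that can accompany t random errors."""
    return R - 1 - 2 * t


pairs = admissible_pairs(6)
print(len(pairs))               # 12
print(max_burst_length(6, 1))   # 3
```

Each additional random error costs two units of the budget (R−1), whereas each additional burst byte costs one, which is why longer bursts can be traded against fewer random errors.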
The various features of the present invention and the manner of attaining them will be described in greater detail with reference to the following description, claims, and drawings, wherein reference numerals are reused, where appropriate, to indicate a correspondence between the referenced items, and wherein:
The head stack assembly 12 also includes an E-shaped block 24 and a magnetic rotor 26 attached to the block 24 in a position diametrically opposite to the actuator arms 20. The rotor 26 cooperates with a stator (not shown) for the actuator arms 20 to rotate in a substantially radial direction, along an arcuate path in the direction of an arrow A. Energizing a coil of the rotor 26 with a direct current in one polarity or the reverse polarity causes the head stack assembly 12, including the actuator arms 20, to rotate around axis P in a direction substantially radial to the disks 14. A head disk assembly 33 is comprised of the disks 14 and the head stack assemblies 12.
A transducer head 40 is mounted on the free end of each actuator arm 20 for pivotal movement around axis P. The magnetic rotor 26 controls the movement of the head 40 in a radial direction, in order to position the head 40 in registration with data information tracks or data cylinders 42 to be followed, and to access particular data sectors on these tracks 42.
Numerous tracks 42, each at a specific radial location, are arrayed in a concentric pattern in a magnetic medium of each surface of data disks 14. A data cylinder includes a set of corresponding data information tracks 42 for the data surfaces of the stacked disks 14. Data information tracks 42 include a plurality of segments or data sectors, each containing a predefined size of individual groups of data records that are saved for later retrieval and updates. The data information tracks 42 can be disposed at predetermined positions relative to a servo reference index.
The hard drive controller 50 includes a drive logic circuit 105 that formats data from the head disk assembly 33, for example from 8 bits to 32 bits. A FIFO register 110 stores the formatted data and exchanges the same with a sector buffer 120. The ECC system 100 receives the formatted data from the drive logic circuit 105 and performs the error correction coding algorithm of the present invention, as described herein.
A buffer manager 115 controls data traffic between the ECC system 100, a sector buffer (i.e., random access memory) 120, and a microprocessor 125. Another FIFO register 130 stores data and exchanges the same with the sector buffer 120. A sequence controller 135 is connected between the drive logic circuit 105, the microprocessor 125, and a host interface 140, to control the sequence operation of the data traffic and various commands across the hard drive controller 50. The host interface 140 provides an interface between the hard drive controller 50 and a host 60 (
First, a predetermined number of binary data elements, also termed bytes, in a data string are moved from the buffer 165 and streamed through an ECC write processor 167. In the ECC write processor 167, the data bytes are mapped into codewords drawn from a Reed-Solomon code. Next, each codeword is mapped in a write path signal-shaping unit 169 into a run length limited or other bandpass or spectral-shaping code and changed into a time-varying signal. The write path signal-shaping unit 169 includes an encoder 202 (
All the measures starting from the movement of the binary data elements from buffer 165 until the magnetic flux patterns are written on a selected disk track 42 (
When sequences of magnetic flux patterns are to be read from the disk 14, they are processed in a read path or channel (157, 159, 161, and 163) and written into the buffer 165. The time-varying signals sensed by transducer 40 are passed through the read/write transducer interface 157 to a digital signal extraction unit 159. Here, the signal is detected and a decision is made as to whether it should be resolved as a binary 1 or 0. As these 1's and 0's stream out of the signal extraction unit 159, they are arranged into codewords in the formatting unit 161.
Since the read path is evaluating sequences of Reed-Solomon codewords previously recorded on the disk 14, the codewords read back should, absent error or erasure, be the same as those written. In order to test whether that is the case, each codeword is applied to an ECC read processor 163 over a path from a formatter 161. The output from the ECC read processor 163 is written into buffer 165. The read path also operates in a synchronous datastreaming manner such that any detected errors must be located and corrected within the codeword well in time for the ECC read processor 163 to receive the next codeword read from the disk track 42. The buffer 165 and the read and write channels may be monitored and controlled by the microprocessor 125 (
Having described the general environment in which the ECC system 100 of the present invention operates, the decoder 200, forming part of the ECC system 100, and its associated decoding method (depicted in
The main components of the ECC system 100 are illustrated in
In operation, and with reference to the decoding method 500 of
At step 530, method 500 inquires if the error locator polynomial 830 is valid, that is if the error locator polynomial 830 has distinct roots located within the range of admissible byte locations within the sector. If the error locator polynomial 830 is determined to be valid, the decoder 200 adds this error locator polynomial 830 to a solution list at step 540. Then, method 500 proceeds to step 600 where the decoder 200 removes and adds erasures as explained in
Following the iterative erasure removal and addition step 600, the decoder 200 performs a cyclic redundancy check (CRC) at step 555, selecting a valid burst and error locator as well as respective error values, to produce a corrected sector 560.
The initialization stage 510 will now be described in relation to
where “R” is the number of checks or check bytes in the sector; “i” denotes an index that goes from 0 to (R−1); and S_i denotes the i-th syndrome.
The initialization method 510 also calculates an erasure polynomial E(x) at step 715, as follows, assuming the first R bytes in the sector to be erased:
where α^i refers to a known Galois Field representation of symbol values and locations, and where the value “i” varies between 0 and R−1.
In addition, the initialization method 510 further calculates a modified syndrome polynomial S(x)_E at step 720, as follows:
S(x)_E := [E(x)·S(x)] mod x^R, (3)
resulting in the modified syndrome polynomial 730.
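The initialization stage can be sketched in software as shown below. This is not the decoder 200 itself but a minimal illustration under stated assumptions: the field is taken to be GF(2^8) generated by the primitive polynomial 0x11D, and the erasure polynomial is assumed to have the product form E(x) = (x−α^0)(x−α^1)...(x−α^(R−1)). Neither assumption is confirmed by the text as reproduced here. All function names are hypothetical. Polynomials are stored as coefficient lists, lowest degree first.

```python
# GF(2^8) antilog/log tables. The primitive polynomial 0x11D is an assumption.
PRIM_POLY = 0x11D
EXP = [0] * 510          # EXP[i] = alpha**i, duplicated to avoid a modulo
LOG = [0] * 256
_value = 1
for _i in range(255):
    EXP[_i] = _value
    LOG[_value] = _i
    _value <<= 1
    if _value & 0x100:
        _value ^= PRIM_POLY
for _i in range(255, 510):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a, b):
    """Multiply two field elements."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def poly_mul(a, b):
    """Multiply two polynomials (addition in GF(2^8) is XOR)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] ^= gf_mul(ai, bj)
    return out


def poly_eval(p, point):
    """Evaluate a polynomial at a field element using Horner's rule."""
    acc = 0
    for coeff in reversed(p):
        acc = gf_mul(acc, point) ^ coeff
    return acc


def erasure_poly(R):
    """Assumed form: product of (x - alpha**i) for i = 0 .. R-1."""
    e = [1]
    for i in range(R):
        e = poly_mul(e, [EXP[i], 1])   # minus equals plus in characteristic 2
    return e


def modified_syndrome(syndromes):
    """Expression (3): [E(x) * S(x)] mod x^R, with R = len(syndromes)."""
    R = len(syndromes)
    return poly_mul(erasure_poly(R), syndromes)[:R]   # keep the low R terms
```

Reducing modulo x^R amounts to discarding every coefficient of degree R or higher, which is why the sketch simply truncates the product.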
Referring now to
To this end, decoder 200 starts at step 800 by defining initial polynomial basis values [l1, l2] 810 for solutions [B, t], such that the following inequality is satisfied:
(B+2t)≦(R−1), (4)
l1:=[1, S(x)_E], and (5)
l2:=[0, x^R], (6)
where each of [l1] and [l2] is a vector of two polynomials, and where B refers to the length of the burst of errors, and t refers to the total number of random errors within the same sector. As used herein, a “burst” is a contiguous sequence of bytes, many or all of which could be erasures; and “random errors” are individual bytes outside a burst, which are in error.
The decoder 200 iterates the calculation of the initial polynomial basis values [l1, l2] 810 for all the solutions [B, t] that satisfy the inequality (4) above, and removes the corresponding erasures, as expressed below:
Repeat [l1, l2]:=Remove Erasure ([l1, l2], α^i), (7)
where i=B, . . . , (R−1), to generate an error locator polynomial v (830), which is the last value of v in l1[v, q] after (R−B) iterations of erasure removal steps, where each such removal step is illustrated in FIG. 10:
v in l1[v, q] (8)
In the foregoing expressions, [l1] is comprised of the polynomials [v, q], and [l2] is comprised of the polynomials [u, p], as follows:
[l1]=[v, q], and (9)
[l2]=[u, p], (10)
where v represents an error locator polynomial, q represents an error evaluation polynomial, u represents a previous error locator polynomial, and p represents a previous error evaluation polynomial.
Having determined the initial error locator polynomial 830, the decoder 200 iteratively removes and adds the erasures, as indicated at step 600 of
START:=0, and END:=B. (11)
With further reference to
With reference to
LOCS [u, v], VALS [p, q], l1:=[v, q], l2:=[u, p] (12)
At step 965, process 620 initiates an erasure removal algorithm by evaluating the polynomials q and p at β to obtain the values q0 and p0, as expressed below:
q0:=q(β), and (13)
p0:=p(β), (14)
wherein β refers to an erasure location that is set equal to α^i, as follows: (β=α^i). Process 620 then inquires at step 970 if the following condition is satisfied:
q0=0, or (p0≠0 and δ(vE)>δ(p)), (15)
where the function δ(.) denotes the polynomial degree.
In effect, using the above condition (15), with step 975 selected, method 620 creates a combination of vectors l1 and l2, to selectively eliminate the candidate erasure location β positions within the sector 42 from the valid solutions of the error evaluator polynomial, as follows:
c=q0/p0, (16)
l1=[v−cu, (q−cp)/(x−β)], and (17)
l2=[(x−β)u, p], (18)
where q0 is the value of the error evaluation polynomial q at β=α^i, and p0 is the value of the previous error evaluation polynomial p at β=α^i (refer to expressions 13 and 14 above).
If at decision step 970 of
c=p0/q0, (19)
l1=[(x−β)v, q], and (20)
l2=[u−cv, (p−cq)/(x−β)]. (21)
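The two erasure removal updates of expressions (16) to (21) can be sketched as follows. The sketch assumes GF(2^8) with the primitive polynomial 0x11D, uses hypothetical function names, and leaves the branch decision of condition (15) to the caller through a flag, since that condition is not reimplemented here. Polynomials are coefficient lists, lowest degree first.

```python
# GF(2^8) tables; the primitive polynomial 0x11D is an assumption.
EXP, LOG = [0] * 510, [0] * 256
_value = 1
for _i in range(255):
    EXP[_i], LOG[_value] = _value, _i
    _value <<= 1
    if _value & 0x100:
        _value ^= 0x11D
for _i in range(255, 510):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def gf_div(a, b):
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^8)")
    return 0 if a == 0 else EXP[(LOG[a] - LOG[b]) % 255]


def poly_eval(p, point):
    acc = 0
    for coeff in reversed(p):
        acc = gf_mul(acc, point) ^ coeff
    return acc


def poly_add(a, b):
    """Add (equivalently subtract) two polynomials; addition is XOR."""
    n = max(len(a), len(b))
    a, b = a + [0] * (n - len(a)), b + [0] * (n - len(b))
    return [s ^ t for s, t in zip(a, b)]


def poly_scale(p, c):
    return [gf_mul(c, coeff) for coeff in p]


def mul_by_linear(p, beta):
    """Return (x - beta) * p(x)."""
    return poly_add([0] + p, poly_scale(p, beta))


def div_by_linear(p, beta):
    """Return p(x) / (x - beta) by synthetic division; p(beta) must be 0."""
    quotient, carry = [0] * (len(p) - 1), 0
    for k in range(len(p) - 1, 0, -1):
        carry = p[k] ^ gf_mul(carry, beta)
        quotient[k - 1] = carry
    assert p[0] ^ gf_mul(carry, beta) == 0, "division is not exact"
    return quotient


def remove_erasure(l1, l2, beta, take_first_branch):
    """One removal step; the caller decides the branch per condition (15)."""
    (v, q), (u, p) = l1, l2
    q0, p0 = poly_eval(q, beta), poly_eval(p, beta)   # expressions (13), (14)
    if take_first_branch:                             # expressions (16)-(18)
        c = gf_div(q0, p0)
        new_l1 = (poly_add(v, poly_scale(u, c)),
                  div_by_linear(poly_add(q, poly_scale(p, c)), beta))
        new_l2 = (mul_by_linear(u, beta), p)
    else:                                             # expressions (19)-(21)
        c = gf_div(p0, q0)
        new_l1 = (mul_by_linear(v, beta), q)
        new_l2 = (poly_add(u, poly_scale(v, c)),
                  div_by_linear(poly_add(p, poly_scale(q, c)), beta))
    return new_l1, new_l2
```

In either branch the choice of c forces the numerator to vanish at β, so the division by (x−β) leaves no remainder; the assertion in div_by_linear checks exactly that.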
With reference to
LOCS [u, v], VALS [p, q], l1:=[v, q], l2:=[u, p]. (22)
At step 915, process 630 initiates an erasure addition algorithm by evaluating the polynomials v and u at β to obtain the values v0 and u0, as expressed below:
v0:=v(β), and (23)
u0:=u(β), (24)
wherein, as before, (β=α^i). Process 630 then inquires at step 920 if the following condition is satisfied:
v0=0, or (u0≠0 and δ(vE)>δ(p)). (25)
In effect, using the above condition (25) and the result steps 925, 930, method 630 attempts to create a combination of vectors l1 and l2, to selectively eliminate the candidate erasure location β positions within the sector 42 from the valid solutions of the error locator polynomial, i.e., it cannot be a random error location, as follows:
c=v0/u0, (26)
l1=[(v−cu)/(x−β), q−cp], and (27)
l2=[u, (x−β)p]. (28)
where v0 is the value of the error locator polynomial v at β=α^i, and u0 is the value of the previous error locator polynomial u at β=α^i (refer to expressions 23 and 24 above).
If at decision step 920 of
c=u0/v0, (29)
l1=[v, (x−β)q], and (30)
l2=[(u−cv)/(x−β), p−cq]. (31)
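The divisions by (x−β) in expressions (27) and (31) are exact, which follows directly from the choice of the constant c in expressions (26) and (29). The short derivation below uses only the symbols already defined above, and is offered as a consistency check rather than as part of the described method:

```latex
\begin{aligned}
\text{Expressions (26)--(28):}\quad
c &= \frac{v_0}{u_0},\quad u_0 \neq 0,\\
(v - c\,u)(\beta) &= v_0 - \frac{v_0}{u_0}\,u_0 = 0
  \;\Longrightarrow\; (x-\beta) \mid (v - c\,u),\\[4pt]
\text{Expressions (29)--(31):}\quad
c &= \frac{u_0}{v_0},\quad v_0 \neq 0,\\
(u - c\,v)(\beta) &= u_0 - \frac{u_0}{v_0}\,v_0 = 0
  \;\Longrightarrow\; (x-\beta) \mid (u - c\,v).
\end{aligned}
```

In each case the numerator has β as a root, so the quotient is again a polynomial and the basis [l1, l2] remains a pair of polynomial vectors after the erasure is added.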
Returning now to
If at step 635 a determination is made that the error locator polynomial 830 is valid, method 600 proceeds to step 640 wherein it adds this error locator polynomial 830 to a solution list, and then proceeds to step 645 as described earlier. Subsequently, method 600 inquires at decision step 650 if the end of the sector 42 (
This iterative process is applied (N−B) times, where N is the number of bytes in the sector 42, resulting in a decoding latency that is a linear function of the number of check bytes, as set forth in the expression below:
Decoding Latency=5R(N−B). (32)
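Expression (32) can be evaluated numerically as sketched below. The function name and the sample values of R, N, and B are assumptions chosen for illustration; the units of the latency are those of expression (32), which the text does not state.

```python
# Hypothetical helper; only the formula 5R(N - B) comes from expression (32).
def decoding_latency(R, N, B):
    """Return 5 * R * (N - B) for R check bytes, N sector bytes, burst B."""
    if not 0 <= B <= min(R - 1, N):
        raise ValueError("B must satisfy 0 <= B <= R - 1 and B <= N")
    return 5 * R * (N - B)


# Doubling R doubles the latency when N and B are held fixed.
print(decoding_latency(16, 512, 7))   # 40400
print(decoding_latency(32, 512, 7))   # 80800
```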
It is to be understood that the specific embodiments of the invention that have been described are merely illustrative of certain applications of the principles of the present invention. Numerous modifications may be made to the mixed mode burst/random error decoder and associated decoding method described herein, without departing from the spirit and scope of the present invention. Moreover, while the present invention is described for illustration purposes only in relation to a data storage system, it should be clear that the invention is applicable as well to various communications and data processing systems.
Number | Date | Country |
---|---|---|
20040030737 A1 | Feb 2004 | US |