Binary Bose-Chaudhuri-Hocquenghem (BCH) codes are commonly used error-correcting codes in modern communication systems. Long BCH codes with block lengths of 32400 bits or longer are used as the outer forward error-correcting code in the second generation Digital Video Broadcasting (DVB-S2) standard from the European Telecommunications Standards Institute (ETSI). Recently, long BCH codes have been investigated for the on-chip error correction of multilevel NAND flash memories. Binary BCH codes are also used in disk drive systems. In many of these applications, the BCH coder-decoder is implemented in hardware, such as an application-specific integrated circuit (ASIC). It would be desirable if BCH codes could be implemented in a manner that reduces power consumption and/or size. Systems that include such a BCH component could be made smaller, could operate longer off of a battery, etc.
Various embodiments of the invention are disclosed in the following detailed description and the accompanying drawings.
The invention can be implemented in numerous ways, including as a process, an apparatus, a system, a composition of matter, a computer readable medium such as a computer readable storage medium or a computer network wherein program instructions are sent over optical or communication links. In this specification, these implementations, or any other form that the invention may take, may be referred to as techniques. A component such as a processor or a memory described as being configured to perform a task includes both a general component that is temporarily configured to perform the task at a given time or a specific component that is manufactured to perform the task. In general, the order of the steps of disclosed processes may be altered within the scope of the invention. As used herein, the term ‘processor’ refers to one or more devices, circuits, and/or processing cores configured to process data, such as computer program instructions.
A detailed description of one or more embodiments of the invention is provided below along with accompanying figures that illustrate the principles of the invention. The invention is described in connection with such embodiments, but the invention is not limited to any embodiment. The scope of the invention is limited only by the claims and the invention encompasses numerous alternatives, modifications and equivalents. Numerous specific details are set forth in the following description in order to provide a thorough understanding of the invention. These details are provided for the purpose of example and the invention may be practiced according to the claims without some or all of these specific details. For the purpose of clarity, technical material that is known in the technical fields related to the invention has not been described in detail so that the invention is not unnecessarily obscured.
Since binary BCH codes are subfield subcodes of Reed-Solomon codes, high-speed decoder architectures for Reed-Solomon codes can also be used to decode binary BCH codes. Binary BCH codes exhibit three distinctive features: (i) generator polynomial coefficients are binary, so that multiplication is reduced to a logical AND operation; (ii) the error magnitude is always one and thus does not need to be computed through the Forney formula; (iii) the corresponding Berlekamp process skips all odd-numbered iterations of the Berlekamp-Massey process.
What are presented herein are three high-speed decoder architectures for binary BCH codes. In a first embodiment, data is split into odd-indexed data and even-indexed data. Even-indexed syndromes, but not odd-indexed syndromes, are loaded into a variant error-locator array, sometimes referred to as a variant error-locator polynomial. The architecture of the first embodiment has 2t+1 systolic units, where t is an error correction capability of the code. As used herein, a systolic unit (also referred to as a processor element) is a block, module, or piece of code configured to perform a certain process or function that is repeated in a system. In embodiments with systolic architectures, complexity and latency can be traded off. In ASIC devices, little or no power is consumed if the output of an ASIC circuit does not toggle or change. In some embodiments, not loading syndromes into the odd-term array means that zeros or some other constant is loaded and the outputs of the ASIC circuits associated with the odd-term array do not switch. In some cases, there is a 25% power saving.
To implement the first embodiment, in some cases only a control module or piece of code associated with initialization or loading initial values in an existing system needs to be changed. For example, once the proper values are loaded into some processor or data array, the subsequent processing may be the same.
The second embodiment changes the odd-term array of the first embodiment into a normal (e.g., unshifted) error-locator update architecture. The architecture of this second embodiment has … units.
Whereas the first embodiment offers power savings, the second embodiment offers both power savings and a smaller size. However, where there is a single, common PE for the first embodiment, the second embodiment has two common PEs: one for an odd-term array and one for an even-term array.
The third embodiment removes the odd-term array from the first embodiment and squeezes the odd-term error-locator update into the even-term array. It employs $t+f$ systolic units with a defect probability of $2^{-m(f+1)}$, where $m$ denotes the finite field dimension and $f$ is a design parameter. In other words, the smaller size of the third embodiment comes at the expense of a non-zero probability that the decoder will not generate the proper output. This non-zero probability can be controlled by the design parameter $f$. In applications where an extremely small size is desired, this third embodiment may be very attractive.
The underlying generator polynomial of a BCH code contains consecutive roots $\alpha, \alpha^2, \ldots, \alpha^{2t}$. For an underlying binary BCH code, the designed minimum distance $d$ is always odd, and it is a lower bound on the true minimum distance.
Let C(x) denote the transmitted codeword polynomial and R(x) the received word polynomial. The decoding objective is to determine the error polynomial E(x) such that C(x)=R(x)−E(x).
In the following, the Berlekamp process is introduced. It begins with the task of error correction by computing syndrome values
$S_i = R(\alpha^{i+1}) = C(\alpha^{i+1}) + E(\alpha^{i+1}) = E(\alpha^{i+1}), \quad i = 0, 1, 2, \ldots, 2t-1.$
If all $2t$ syndrome values are zero, then $R(x)$ is a codeword polynomial and it is thus presumed that $C(x) = R(x)$, i.e., no errors have occurred. Otherwise, let $e$ denote the (unknown) number of errors and let $X_i \in \{\alpha^{-j}\}_{j=0}^{n-1}$, $i = 1, 2, \ldots, e$, denote the error locations.
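The syndrome computation can be illustrated with a minimal sketch. It assumes a small example field GF(2^4) built from the primitive polynomial x^4 + x + 1 (so n = 15) and uses the all-zero word as the transmitted codeword; the field, the code length, and the function name `syndromes` are illustrative assumptions, not parameters taken from this description.

```python
# Assumed example field: GF(2^4) from x^4 + x + 1, code length n = 15.
M, PRIM = 4, 0b10011
N = (1 << M) - 1

EXP = [0] * N                      # EXP[k] = alpha^k as a 4-bit integer
value = 1
for k in range(N):
    EXP[k] = value
    value <<= 1
    if value >> M:                 # reduce modulo the primitive polynomial
        value ^= PRIM

def syndromes(received_bits, t):
    """Return [S_0, ..., S_{2t-1}] with S_i = R(alpha^(i+1)).

    received_bits[j] is the binary coefficient of x^j in R(x).
    """
    result = []
    for i in range(2 * t):
        s = 0
        for j, bit in enumerate(received_bits):
            if bit:
                s ^= EXP[((i + 1) * j) % N]   # add alpha^((i+1) j)
        result.append(s)
    return result

# The all-zero word is a codeword of any linear code, so R(x) = E(x) here.
clean = [0] * N
noisy = list(clean)
noisy[3] ^= 1                      # hypothetical bit errors at positions 3, 10
noisy[10] ^= 1

print(syndromes(clean, 2))         # [0, 0, 0, 0]
print(syndromes(noisy, 2))         # [15, 10, 11, 8]
```

Because the coefficients of R(x) are binary, each syndrome is a plain XOR of powers of alpha; no general field multiplication is needed in this step.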
The syndrome polynomial is defined to be:
the error locator polynomial:
and the error evaluator polynomial:
The three polynomials satisfy the following key equation:
$\Omega(x) = \Lambda(x) S(x) \pmod{x^{2t}}$. (4)
The Berlekamp process is a simplified version of the Berlekamp-Massey process for decoding binary BCH codes by incorporating the special syndrome property
$S_{2i+1} = S_i^2, \quad i = 0, 1, 2, \ldots$
which yields zero discrepancies at odd-numbered iterations of the Berlekamp-Massey process. Below the inversionless Berlekamp process is re-formulated slightly, so as to facilitate the characterizations thereafter.
The following lemma characterizes the lengths of the linear-feedback shift registers, $L_{\Lambda}^{(r)}$ and $L_{B}^{(r-1)}$.
Lemma 1: The lengths of the linear-feedback shift registers corresponding to $\Lambda^{(r)}(x)$ and $B^{(r-1)}(x)$ satisfy
$L_{\Lambda}^{(r)} + L_{B}^{(r-1)} = r - 1$ (5)
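The zero discrepancies at odd-numbered iterations can be observed numerically. The following is a minimal sketch, assuming the small example field GF(2^4) built from x^4 + x + 1 and the textbook inversionless Berlekamp-Massey recursion rather than the re-formulated Berlekamp process of this description; all function names and the error positions are hypothetical.

```python
# Assumed example field: GF(2^4) from x^4 + x + 1, n = 15.
M, PRIM = 4, 0b10011
N = (1 << M) - 1
EXP, LOG = [0] * (2 * N), [0] * (N + 1)
value = 1
for k in range(N):
    EXP[k] = EXP[k + N] = value
    LOG[value] = k
    value <<= 1
    if value >> M:
        value ^= PRIM

def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def syndromes_from_errors(positions, t):
    """S_i = E(alpha^(i+1)) for a binary error pattern."""
    return [
        _xor_all(EXP[((i + 1) * j) % N] for j in positions)
        for i in range(2 * t)
    ]

def _xor_all(values):
    acc = 0
    for v in values:
        acc ^= v
    return acc

def inversionless_bm(S):
    """Textbook inversionless Berlekamp-Massey; returns (Lambda, discrepancies).

    Lambda is a nonzero scalar multiple of the error locator polynomial.
    """
    lam, b, gamma, L = [1], [1], 1, 0
    deltas = []
    for r in range(len(S)):
        delta = 0
        for i, c in enumerate(lam):
            if i <= r:
                delta ^= gf_mul(c, S[r - i])
        deltas.append(delta)
        xb = [0] + b                               # x * B(x)
        size = max(len(lam), len(xb))
        new = [0] * size
        for i in range(size):
            left = gf_mul(gamma, lam[i]) if i < len(lam) else 0
            right = gf_mul(delta, xb[i]) if i < len(xb) else 0
            new[i] = left ^ right                  # gamma*Lambda - delta*x*B
        if delta != 0 and 2 * L <= r:
            b, L, gamma = lam, r + 1 - L, delta    # length change
        else:
            b = xb
        lam = new
    return lam, deltas

def poly_eval(coeffs, point):
    acc = 0
    for c in reversed(coeffs):
        acc = gf_mul(acc, point) ^ c
    return acc

S = syndromes_from_errors([3, 10], t=2)
lam, deltas = inversionless_bm(S)
print(deltas)    # [15, 0, 11, 0]: iterations r = 1 and r = 3 contribute nothing
print(lam)       # [2, 13, 9], a scalar multiple of 1 + 15x + 13x^2
```

Since every odd-indexed discrepancy is zero for syndromes satisfying the squaring property, those iterations only scale the locator and shift B(x), which is what allows them to be skipped.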
After constructing the error locator polynomial Λ(x), the Chien search is applied to determine all valid roots.
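A Chien search amounts to evaluating the locator at every candidate location. The sketch below assumes the example field GF(2^4) built from x^4 + x + 1 and the common convention that the locator is the product of factors (1 - X_k x) with X_k = alpha^j for an error at bit position j; both the convention and the helper names are assumptions for illustration.

```python
# Assumed example field: GF(2^4) from x^4 + x + 1, n = 15.
M, PRIM = 4, 0b10011
N = (1 << M) - 1
EXP, LOG = [0] * (2 * N), [0] * (N + 1)
value = 1
for k in range(N):
    EXP[k] = EXP[k + N] = value
    LOG[value] = k
    value <<= 1
    if value >> M:
        value ^= PRIM

def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def poly_eval(coeffs, point):
    """Horner evaluation; coeffs[i] is the coefficient of x^i."""
    acc = 0
    for c in reversed(coeffs):
        acc = gf_mul(acc, point) ^ c
    return acc

def chien_search(locator):
    """Return bit positions j for which Lambda(alpha^(-j)) == 0."""
    return [j for j in range(N) if poly_eval(locator, EXP[(N - j) % N]) == 0]

# Lambda(x) = (1 + alpha^3 x)(1 + alpha^10 x) = 1 + 15x + 13x^2
locator = [1, 15, 13]
print(chien_search(locator))   # [3, 10]
```

For a binary code the located positions are simply flipped, since the error magnitude is always one.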
Parallel Inversionless Berlekamp Processes and their Architectures
A parallel Berlekamp-Massey process is one in which the discrepancy computation and error-locator updates are performed simultaneously. Note that in the conventional (i.e., non-parallel) Berlekamp-Massey process, the discrepancy value $\Delta^{(r)}$ is computed based on the error locator polynomial $\Lambda^{(r)}(x)$, which is the primary cause of the high latency. In some cases, the discrepancies are generated iteratively so that they are computed in parallel with the update of the error locator polynomial.
First, a left-shift operator $\mathcal{L}_r$ of a polynomial is defined such that
$[\mathcal{L}_r A](x) \triangleq \left(A(x) - (A(x) \bmod x^r)\right) / x^r$ (6)
An alternative interpretation gives more insight. Let $A = [A_0, A_1, A_2, \ldots, A_l]$ be the vector representation of the polynomial $A(x)$; then $[\mathcal{L}_r A] = [A_r, A_{r+1}, \ldots, A_l]$. The discrepancy values of the inversionless Berlekamp process are iteratively generated by the following pseudo code:
… $[\mathcal{L}\,S](x)$, $\gamma^{(0)} = 1$, …
… $[\mathcal{L}\,\hat{\Omega}^{(r)}](x) - \hat{\Omega}_0^{(r)} \cdot \hat{\Theta}^{(r-1)}(x)$
… $[\mathcal{L}\,\hat{\Omega}^{(r)}](x)$
Note that $\hat{\Omega}^{(r)}(x)$ and $\hat{\Theta}^{(r)}(x)$ are left-shifted polynomials of $\Lambda^{(r)}(x)S(x)$ and $B^{(r)}(x)S(x)$, respectively; more specifically,
$\hat{\Omega}^{(r)}(x) = [\mathcal{L}_r(\Lambda^{(r)} S)](x)$,
$\hat{\Theta}^{(r)}(x) = [\mathcal{L}_r(B^{(r)} S)](x)$,
where $\mathcal{L}_r$ is the left shift by $r$ positions, and $\Lambda^{(r)}(x)$ and $B^{(r)}(x)$ denote the error locator polynomial couple generated during the $r$-th iteration of the Berlekamp process. Herein $\hat{\Omega}(x)$ is called a variant error-locator polynomial, due to the following
$\Lambda^{(r)}(x) S(x) = \Omega^{(r)}(x) + x^r \hat{\Omega}^{(r)}(x)$.
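The left shift and the resulting split of a polynomial into a low part and a shifted high part can be sketched on coefficient vectors. The function names `left_shift` and `low_part` are hypothetical, and the coefficients are arbitrary integers chosen only to make the split visible.

```python
def left_shift(coeffs, r):
    """Drop the r lowest-order coefficients, i.e. [A_r, A_{r+1}, ..., A_l]."""
    return coeffs[r:]

def low_part(coeffs, r):
    """Coefficients of A(x) mod x^r, i.e. [A_0, ..., A_{r-1}]."""
    return coeffs[:r]

# A(x) = 5 + x + 7x^3 + 2x^4 + 9x^5, stored lowest order first.
A = [5, 1, 0, 7, 2, 9]
r = 2

high = left_shift(A, r)      # [0, 7, 2, 9]
low = low_part(A, r)         # [5, 1]

# A(x) = (A(x) mod x^r) + x^r * (shifted part): concatenation restores A.
assert low + high == A
print(low, high)
```

Applied to the product of the locator and the syndrome polynomial, the low part plays the role of the evaluator terms and the shifted part holds the coefficients from which later discrepancies are read.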
Observe that the odd terms of $\hat{\Omega}(x)$ and $\hat{\Theta}(x)$ are never exploited in the above iteration. Therefore, the above process can be refined by removing the odd terms.
… $[\mathcal{L}\,\hat{\Omega}^{(r)}](x) - \hat{\Omega}_0^{(r)} \cdot \hat{\Theta}^{(r-1)}(x)$
… $[\mathcal{L}\,\hat{\Omega}^{(r)}](x)$
By dynamically enforcing a term of $\hat{\Theta}(x)$ to zero, the unit of discrepancy computation and the unit of error-locator update can be seamlessly merged. By incorporating this method into the above Improved Iterative Discrepancy Computation Process and combining it with a left-shifted error-locator update, a parallel Berlekamp process is obtained as follows.
… $[\mathcal{L}\,\hat{\Omega}^{(r)}](x) - \hat{\Omega}_0^{(r)} \cdot \hat{\Theta}^{(r-1)}(x)$
… $[\mathcal{L}_2\hat{\Omega}^{(r)}](x)$
The following figure shows a block diagram of the above PIB process.
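The critical path delay of the PIB architecture is
$T_{\text{crit}} = T_{\text{mult}} + T_{\text{add}}$ (7)
which is at least twice as fast as the conventional serial implementation, whose critical path is
$2T_{\text{mult}} + (1 + \lceil \log_2 t \rceil) T_{\text{add}}$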
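The two critical-path expressions can be compared numerically. The sketch below uses hypothetical delay values for one multiplier and one adder (in arbitrary units) and a hypothetical t; only the two formulas themselves come from the text.

```python
def ceil_log2(t):
    """Ceiling of log2(t) for an integer t >= 1, without floating point."""
    return (t - 1).bit_length()

def pib_critical_path(t_mult, t_add):
    return t_mult + t_add

def serial_critical_path(t_mult, t_add, t):
    return 2 * t_mult + (1 + ceil_log2(t)) * t_add

# Hypothetical gate delays and correction capability.
t_mult, t_add, t = 4.0, 1.0, 12

serial = serial_critical_path(t_mult, t_add, t)   # 2*4 + (1 + 4)*1 = 13.0
parallel = pib_critical_path(t_mult, t_add)       # 4 + 1 = 5.0
print(serial / parallel)                          # 2.6
```

The ratio is at least two whenever t is at least 2, because the serial path then contains two multiplier delays and at least two adder delays.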
Upper array 104 computes the even terms of the error locator polynomial while lower array 106 computes the odd terms of the error locator polynomial. System 100 avoids loading odd terms of $\hat{\Omega}^{(0)}(x)$ and $\hat{\Theta}^{(-1)}(x)$. Note, for example, that lower array 106 has constants loaded (e.g., zeros) whereas upper array 104 has syndromes loaded. Consequently, on average more than half of the units in lower array 106 are idle and system 100 consumes at least 25% less power than some other techniques.
Lemma 2: (i) If the initialization is modified to $\Lambda^{(0)}(x) = 0$ in the Berlekamp process, then the resulting error locator polynomial is the polynomial composed of the odd terms of the original error locator polynomial. (ii) If the initialization is modified to $B^{(-1)}(x) = 0$ in the Berlekamp process, then the resulting error locator polynomial is the polynomial composed of the even terms of the original error locator polynomial.
Proof: It is straightforward to show that the original error locator polynomial is the sum of the error locator polynomials obtained from (i) and (ii), respectively. Furthermore, it can be easily shown by induction that at each iteration the error locator polynomial is composed of odd terms in (i) and of even terms in (ii).
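Next, a second embodiment is presented, which replaces lower array 106 of the first embodiment with a normal (e.g., unshifted) error-locator update architecture. The detailed process is described below, and the corresponding system is shown in the accompanying figure.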
… $[\mathcal{L}\,\hat{\Omega}^{(r)}](x) - \hat{\Omega}_0^{(r)} \cdot \hat{\Theta}^{(r-1)}(x)$
… $[\mathcal{L}_2\hat{\Omega}^{(r)}](x)$, $B_{\text{odd}}^{(r+1)}(x) \leftarrow \Lambda_{\text{odd}}^{(r)}(x)$
The following figure shows a block diagram of the above rPIB process.
The third embodiment has a non-zero probability of computing an improper result but is much more efficient (e.g., even smaller) than the first and second embodiments. The Improved Iterative Discrepancy Computation is modified such that $x^{2i}$ is replaced with $x^i$ and … is replaced with … .
… $[\mathcal{L}_1\hat{\Omega}^{(r)}](x) - \hat{\Omega}_0^{(r)} \cdot \hat{\Theta}^{(r-1)}(x)$
… $[\mathcal{L}\,\hat{\Omega}^{(r)}](x)$, …
Next, a new process is presented and characterized.
… $[\mathcal{L}_1\hat{\Omega}^{(r)}](x) - \hat{\Omega}_0^{(r)} \cdot \hat{\Theta}^{(r-1)}(x)$,
… $[\mathcal{L}\,\Lambda^{(r)}](x) - \hat{\Omega}_0^{(r)} \cdot xB^{(r-1)}(x)$
… $[\mathcal{L}_1\hat{\Omega}^{(r)}](x)$, $xB^{(r+1)}(x) \leftarrow \Lambda^{(r)}(x)$
Note that in vPIB, $xB(x)$, instead of $B(x)$, is traced, so that the error-locator update
$\hat{\Lambda}^{(r+2)}(x) = \gamma^{(r)} \cdot [\mathcal{L}_1\hat{\Lambda}^{(r)}](x) - \hat{\Omega}_0^{(r)} \cdot \widehat{xB}^{(r-1)}(x)$
is retained to be consistent with the discrepancy computation,
$\hat{\Omega}^{(r+2)}(x) = \gamma^{(r)} \cdot [\mathcal{L}_1\hat{\Omega}^{(r)}](x) - \hat{\Omega}_0^{(r)} \cdot \hat{\Theta}^{(r-1)}(x)$.
$\nu^{(0)} = [0, 0, \ldots, 0_{t-1}, 1_t, \ldots, 1]$.
At each iteration $r = 2i$, the rightmost zero is flipped, such that
$\nu^{(r)} = [0, 0, \ldots, 0_{t-i-1}, 1_{t-i}, \ldots, 1]$
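The evolution of the vector ν can be sketched as follows. The total vector length is not stated in the surrounding text, so `num_units` is a hypothetical parameter (for example t + f), and the function name is likewise an assumption.

```python
def control_vector(t, num_units, i):
    """Return nu^(2i): zeros at indices 0..t-i-1, ones from index t-i onward.

    num_units is an assumed total length, e.g. t + f.
    """
    zeros = max(t - i, 0)
    return [0] * zeros + [1] * (num_units - zeros)

t, num_units = 4, 6          # hypothetical sizes
for i in range(t + 1):
    print(2 * i, control_vector(t, num_units, i))
```

Each step turns exactly one more zero into a one, starting from the zero at index t-1 and moving toward index 0.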
Next the defect probability of the proposed decoding process is considered. Note that the architecture overflows whenever the degree of $xB^{(r-1)}(x)$ is greater than
…
which is caused by the right-shift operations, $\widehat{xB}^{(r+1)}(x) \leftarrow x \cdot \widehat{xB}^{(r-1)}(x)$. A defect occurs if $xB(x)$ overflows and is later used to update $\Lambda(x)$. Equivalently, a defect occurs if the length of
…
and the number of errors is greater than $r/2 + f$. Lemma 1 indicates that
$L_{\Lambda}^{(r)} + L_{xB}^{(r-1)} = r$
which immediately yields
…
It indicates that $f+1$ consecutive zero discrepancies occur at the iterations $r-2(f-1), r-2(f-2), \ldots, r$. When the number of errors is greater than
…
it is reasonable to assume each discrepancy, $\Delta^{(r-2(f-1))}, \Delta^{(r-2(f-2))}, \ldots, \Delta^{(r)}$, is randomly chosen within GF($2^m$) and thus the probability of being zero is $2^{-m}$. Subsequently, the probability that $f+1$ consecutive zero discrepancies occur, $\Delta^{(i)} = 0$, $i = r-2(f-1), r-2(f-2), \ldots, r$, is $2^{-(f+1)m}$. Therefore, the decoder defect probability is upper bounded by $2^{-(f+1)m}$. The above discussion is summarized into the following lemma.
Lemma 3: When $t+f$ units are used in the proposed vPIB architecture, the resulting defect probability is upper bounded by $2^{-(f+1)m}$, where $m$ denotes the finite field dimension.
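The trade-off between the design parameter f and the defect bound of Lemma 3 can be tabulated with a short sketch. The values of m, t, and the target probability are hypothetical, as are the function names; the unit counts 2t+1 and t+f are those stated for the first and third embodiments.

```python
from fractions import Fraction

def defect_bound(m, f):
    """Upper bound 2^-((f+1) m) on the defect probability (Lemma 3)."""
    return Fraction(1, 2 ** ((f + 1) * m))

def smallest_f(m, target):
    """Smallest f >= 0 whose bound does not exceed the target probability."""
    f = 0
    while defect_bound(m, f) > target:
        f += 1
    return f

def unit_counts(t, f):
    """Systolic unit counts for the first and third embodiments."""
    return {"first_embodiment": 2 * t + 1, "third_embodiment": t + f}

m, t = 13, 12                            # hypothetical field dimension and t
f = smallest_f(m, Fraction(1, 10**12))   # hypothetical target of 1e-12
print(f, float(defect_bound(m, f)), unit_counts(t, f))
```

Because the bound shrinks by a factor of 2^m for each extra unit, a small f is typically enough for fields of practical size.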
Although the foregoing embodiments have been described in some detail for purposes of clarity of understanding, the invention is not limited to the details provided. There are many alternative ways of implementing the invention. The disclosed embodiments are illustrative and not restrictive.
This application is a continuation of co-pending U.S. patent application Ser. No. 12/070,892 entitled BINARY BCH DECODERS filed Feb. 21, 2008 which is incorporated herein by reference for all purposes.
Number | Name | Date | Kind |
---|---|---|---|
20040177312 | Xin | Sep 2004 | A1 |
Number | Date | Country | |
---|---|---|---|
20120131423 A1 | May 2012 | US |
Number | Date | Country | |
---|---|---|---|
Parent | 12070892 | Feb 2008 | US |
Child | 13359912 | US |