The present invention relates to a method and circuit to shorten latency of Chien's search algorithm. More particularly, the present invention relates to a method and circuit to shorten latency of Chien's search algorithm for BCH codewords.
Bose-Chaudhuri-Hocquenghem (BCH) code is one of the most widely used error correction code (ECC) techniques in storage and communication devices. BCH code can detect and correct random errors caused by channel noise and by defects within memory devices. To construct a BCH codeword, one should define a code length n, an error correction ability t and a primitive polynomial over the extension field GF(2^m). The encoding procedure for a BCH codeword can be easily implemented with a linear feedback shift register (LFSR) and some combinational logic. Compared with the encoding procedure, the decoding procedure for BCH codewords is much more complicated, as shown in
After a codeword is received (S01), decoding begins by computing a syndrome according to specified polynomials (S02). Then, based on the syndrome, an error-location polynomial can be found (S03). Next, by calculating the roots of the error-location polynomial, error-location numbers can be obtained (S04). Finally, the erroneous codeword can be corrected using the results of the above steps (S05).
Conventionally, one may adopt the Peterson-Gorenstein-Zierler (PGZ) algorithm or the Berlekamp-Massey (BM) algorithm to find the error-location polynomial. Since the computational complexity of the PGZ algorithm is higher than that of the BM algorithm, and the BM algorithm can achieve a higher decoding speed, the BM algorithm is more popular for hardware implementation.
According to the error-location polynomial λ(x) = λ_0 + λ_1 x + . . . + λ_t x^t, the roots of λ(x) can be found simply by substituting 1, α, α^2, . . . , α^{n−1} (n = 2^m − 1) into λ(x). Since α^n = 1, α^{−l} = α^{n−l}. Therefore, if α^l is a root of λ(x), α^{n−l} is the corresponding error-location number. Conventionally, this substitution procedure can be operated iteratively by Chien's search, and implemented in a circuit design as shown in
Please refer to
The adder 170 sums all products from the calculating units 101, 102 . . . and 10t with the coefficient λ_0. Thus, λ(α) = λ_0 + λ_1 α + . . . + λ_t α^t can be obtained. If λ(α) equals zero, α is a root of λ(x), and α indicates a location where an incorrect bit exists. The bit can then be corrected. Otherwise, the indicated location has no incorrect bit. Then, an iterative calculation begins. The calculating unit 101 is again taken as an example. The product λ_1 α stored in the register 131 is inputted to the multiplexer 111 through the multiplier 121. This time, a new product, λ_1 α^2, is generated. Similarly, λ_2 α^4 . . . and λ_t α^{2t} are generated by the calculating units 102 . . . and 10t, respectively. Thus, λ(α^2) = λ_0 + λ_1 α^2 + . . . + λ_t α^{2t} can be obtained by the adder 170. If λ(α^2) equals zero, α^2 is a root of λ(x), and α^2 indicates another location where an incorrect bit exists. The iterative calculation stops after the nth cycle is finished.
It is obvious from the above that the calculation load is significant, since the whole process takes n (= 2^m − 1) iterations. Hardware improvements can alleviate this time-consuming problem and shorten the latency of Chien's search. Nevertheless, the latency of Chien's search needs to be shortened further, because data volumes have become massive and transmission speeds are faster than ever. Among the procedures for decoding BCH codewords, Chien's search takes the most time (around 40% of the total time consumed). How to shorten the latency of Chien's search is therefore the key to enhancing the efficiency of decoding BCH codewords.
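The conventional iteration described above can be illustrated with a minimal Python sketch. The field GF(2^4) with primitive polynomial x^4 + x + 1, the table names EXP and LOG, and the helpers gf_mul, chien_search and poly_with_roots are assumptions chosen for illustration; they are not part of the described circuit. Each register starts at a coefficient λ_r and is multiplied by α^r once per cycle, and the sum with λ_0 is tested for zero.

```python
# Sketch only: GF(2^4) with primitive polynomial x^4 + x + 1 is an assumed example field.
M = 4
PRIM_POLY = 0b10011
N = (1 << M) - 1  # n = 2^m - 1 = 15

# EXP[i] = alpha^i and LOG[alpha^i] = i (EXP is doubled to avoid a modulo in gf_mul).
EXP = [0] * (2 * N)
LOG = [0] * (N + 1)
x = 1
for i in range(N):
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & (1 << M):
        x ^= PRIM_POLY
for i in range(N, 2 * N):
    EXP[i] = EXP[i - N]


def gf_mul(a, b):
    """Multiply two GF(2^m) elements using the log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def chien_search(lam):
    """Return the exponents i in 1..n for which lambda(alpha^i) == 0.

    lam[r] is the coefficient of x^r. Register r starts at lam[r] and is
    multiplied by alpha^r once per cycle, so n cycles are always needed.
    """
    regs = list(lam[1:])
    roots = []
    for i in range(1, N + 1):
        total = lam[0]
        for r in range(1, len(lam)):
            regs[r - 1] = gf_mul(regs[r - 1], EXP[r % N])
            total ^= regs[r - 1]  # addition in GF(2^m) is XOR
        if total == 0:
            roots.append(i)
    return roots


def poly_with_roots(e1, e2):
    """Hypothetical helper: (x + alpha^e1)(x + alpha^e2) as [l0, l1, l2]."""
    a, b = EXP[e1], EXP[e2]
    return [gf_mul(a, b), a ^ b, 1]
```

For instance, chien_search(poly_with_roots(3, 10)) walks through all 15 nonzero elements and reports the exponents 3 and 10.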
According to an aspect of the present invention, a method for shortening latency of Chien's search includes the steps of: determining a shifted factor, p; receiving a BCH codeword; computing a syndrome from the BCH codeword; finding an error-location polynomial based on the syndrome; and processing Chien's search for the error-location polynomial to find the roots thereof. Here, p is the number of successive zeroes starting from the first bit of the BCH codeword. The Chien's search starts iterative calculations by substituting a variable of the error-location polynomial with a nonzero element in the Galois field GF(2^m). The nonzero element ranges from α^{p+1} to α^n, where n is the codelength of the BCH codeword and equals 2^m − 1, and m is a positive integer.
According to another aspect of the present invention, a circuit for shortening latency of Chien's search includes: a number of calculating units, each iteratively substituting a variable to a specified power of an error-location polynomial having a maximum power of t with a nonzero element, k, in GF(2^m), for multiplying a corresponding coefficient of the error-location polynomial by k to the specified power as a cyclic product and outputting the cyclic product, wherein k changes from α^{p+1} to α^n sequentially for each iterative calculation, wherein n is the codelength of a BCH codeword and equals 2^m − 1, p is the number of successive zeroes from the first bit of the BCH codeword and m is a positive integer; and a finite adder for summing a constant coefficient of the error-location polynomial and all cyclic products outputted from the calculating units in the same iterative calculation as a judging factor.
Preferably, the circuit further has a judging module for judging if the judging factor is zero.
According to the present invention, the coefficients of the error-location polynomial may be provided by an error-location polynomial generator.
Preferably, each calculating unit includes: a coefficient multiplexer for receiving the corresponding coefficient of the error-location polynomial and one cyclic product, and outputting the corresponding coefficient of the error-location polynomial during a first cycle and the cyclic product during cycles later than the first cycle as a first data; a register, electrically linked to the coefficient multiplexer, for temporarily storing the first data for one cycle and outputting the first data; a shifting multiplier, electrically linked to the register, for multiplying the first data by (k/α)^r (that is, α^{rp}, taking k as the first substituted element α^{p+1}) as a second data and outputting the second data; a shifting multiplexer, electrically linked to the register and the shifting multiplier, for receiving the first data and the second data and outputting the second data during a second cycle after the first cycle and the first data during cycles later than the second cycle; and an iterative multiplier, electrically linked to the shifting multiplexer, the coefficient multiplexer and the finite adder, for receiving the first data and the second data, multiplying the received first data or second data by α^r as the cyclic product and outputting the cyclic product to the finite adder and the coefficient multiplexer. Here, r is a positive integer from 1 to t and represents the power of the variable in the error-location polynomial that the calculating unit handles.
According to still another aspect of the present invention, a circuit for shortening latency of Chien's search includes: a number of calculating units, each iteratively substituting a variable to a specified power of an error-location polynomial having a maximum power of t with a nonzero element, k, in GF(2^m), for simultaneously multiplying a corresponding coefficient of the error-location polynomial by k, changing from α^{p+(j−1)s+1} to α^{p+js}, to the specified power as cyclic products in the jth iterative calculation and outputting s cyclic products in the jth iterative calculation, wherein n is the codelength of a BCH codeword and equals 2^m − 1, p is the number of successive zeroes from the first bit of the BCH codeword, m and j are positive integers, s is the number of sets of parallel computing and the operation terminates when k reaches α^n; and s finite adders, each summing a constant coefficient of the error-location polynomial and all cyclic products for one specified k outputted from the calculating units in the same iterative calculation as a judging factor.
Preferably, each calculating unit includes: a coefficient multiplexer for receiving the corresponding coefficient of the error-location polynomial and one cyclic product, and outputting the corresponding coefficient of the error-location polynomial during a first cycle and the cyclic product during cycles later than the first cycle as a first data; a register, electrically linked to the coefficient multiplexer, for temporarily storing the first data for one cycle and outputting the first data; a shifting multiplier, electrically linked to the register, for multiplying the first data by (k/α)^r (that is, α^{rp}, taking k as the first substituted element α^{p+1}) as a second data and outputting the second data; a shifting multiplexer, electrically linked to the register and the shifting multiplier, for receiving the first data and the second data and outputting the second data during a second cycle and the first data during cycles later than the second cycle; and s iterative multipliers, electrically linked to the shifting multiplexer, each receiving the first data and the second data, multiplying the first data or second data by α^{qr} as the cyclic product, and outputting the cyclic products to the finite adders, respectively, wherein the iterative multiplier that outputs the cyclic product containing α^{r(p+s)} also outputs that cyclic product to the coefficient multiplexer. Here, r is a positive integer from 1 to t and represents the power of the variable in the error-location polynomial that the calculating unit handles, and q is a positive integer ranging from 1 to s in one iterative calculation.
According to the present invention, an Error Correcting Code (ECC) decoder can include the aforementioned circuits for decoding BCH codewords.
Compared with conventional Chien's search, the present invention can skip p iterative calculations. Therefore, the latency of Chien's search can be significantly shortened by applying the present invention.
The present invention will now be described more specifically with reference to the following embodiments.
Please refer to
All of the calculating units operate in the same way. The only difference is that each calculating unit is used for the iterative calculation regarding one specified coefficient of an error-location polynomial and the variable to the corresponding power. Let the error-location polynomial, λ(x), be defined as λ(x) = λ_0 + λ_1 x + . . . + λ_t x^t. This means λ(x) has a maximum power of t. The first calculating unit 201 is used for the iterative calculation regarding λ_1 and x to the first power. Similarly, the second calculating unit 202 is for λ_2 and x to the second power, and the tth calculating unit 20t is for λ_t and x to the tth power. Take the first calculating unit 201 as an example. The first calculating unit 201 iteratively substitutes x to the first power of λ(x) with a nonzero element, k, in GF(2^m). Here, k may be α^{p+1}, α^{p+2} . . . α^n. Thus, k changes from α^{p+1} to α^n sequentially for each iterative calculation. Those skilled in the art know that Chien's search is processed by substituting 1, α^1, α^2 . . . α^n iteratively to get the roots of λ(x). For the present invention, the way to shorten the latency of Chien's search is to bypass p iterative calculations. This means that the elements from α^1 to α^p can be omitted and not calculated (1 is not a root according to the present invention).
Here, n is the codelength of a BCH codeword. It is an integer and equals 2^m − 1. For instance, if m is 6, n is 2^6 − 1 = 63. m can be any positive integer for a specified Galois field. p is a shifted factor and is the number of successive zeroes from the first bit of the BCH codeword. Please refer to
According to the present invention, when a shortened BCH codeword is received, the number of zeroes (p) that were removed to obtain the shortened BCH codeword is usually known from the encoder. Again, the shortened BCH codeword can be regarded as a standard BCH codeword with p zeroes at its beginning. Thus, the shortened BCH codeword can be restored to the original BCH codeword by adding the p zeroes back, and Chien's search can then proceed by applying the method provided by the present invention. Of course, it is also workable to apply the present invention directly to a standard BCH codeword.
The first calculating unit 201 multiplies λ_1 by k, so that λ_1 α^{p+1}, λ_1 α^{p+2} . . . λ_1 α^n become available, one in each iterative calculation. They are defined as cyclic products. Similarly, the second calculating unit 202 gets λ_2 α^{2(p+1)}, λ_2 α^{2(p+2)} . . . λ_2 α^{2n} and the tth calculating unit 20t gets λ_t α^{t(p+1)}, λ_t α^{t(p+2)} . . . λ_t α^{tn}. After each iterative calculation is done, the calculating units all output the corresponding cyclic products simultaneously to the finite adder 270. The finite adder 270 sums a constant coefficient, λ_0, of λ(x) and all cyclic products outputted from the calculating units as a judging factor. For example, in one cycle, the judging factor may be λ_0 + λ_1 α^{p+3} + λ_2 α^{2(p+3)} + . . . + λ_t α^{t(p+3)}. The judging factor is judged by the judging module 290 to see if it is zero. If it is, k is a root of λ(x). In order to facilitate operation of the circuit 20, the error-location polynomial generator 280 is used to provide a coefficient of λ(x) to a specified calculating unit.
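As a rough software illustration of the skipping idea, the following Python sketch starts the evaluation at α^{p+1} instead of α^1. The field GF(2^4) with primitive polynomial x^4 + x + 1, the value p = 5, and all function and table names are assumptions made for this example only. Preloading each register with λ_r α^{rp} stands in for the p bypassed iterations.

```python
# Sketch only: GF(2^4) with primitive polynomial x^4 + x + 1 is an assumed example field.
M = 4
PRIM_POLY = 0b10011
N = (1 << M) - 1  # n = 2^m - 1 = 15

EXP = [0] * (2 * N)  # EXP[i] = alpha^i
LOG = [0] * (N + 1)  # LOG[alpha^i] = i
x = 1
for i in range(N):
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & (1 << M):
        x ^= PRIM_POLY
for i in range(N, 2 * N):
    EXP[i] = EXP[i - N]


def gf_mul(a, b):
    """Multiply two GF(2^m) elements using the log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def shortened_chien_search(lam, p):
    """Evaluate lambda only at alpha^(p+1)..alpha^n; return (roots, cycles)."""
    # Register r is preloaded with lam[r] * alpha^(r*p), which replaces the
    # p skipped iterations by a single multiplication.
    regs = [gf_mul(lam[r], EXP[(r * p) % N]) for r in range(1, len(lam))]
    roots, cycles = [], 0
    for i in range(p + 1, N + 1):
        total = lam[0]  # constant coefficient lambda_0
        for r in range(1, len(lam)):
            regs[r - 1] = gf_mul(regs[r - 1], EXP[r % N])
            total ^= regs[r - 1]  # addition in GF(2^m) is XOR
        cycles += 1
        if total == 0:  # judging factor is zero, so alpha^i is a root
            roots.append(i)
    return roots, cycles


def poly_with_roots(e1, e2):
    """Hypothetical helper: (x + alpha^e1)(x + alpha^e2) as [l0, l1, l2]."""
    a, b = EXP[e1], EXP[e2]
    return [gf_mul(a, b), a ^ b, 1]
```

With p = 5 and a polynomial whose roots are α^7 and α^12, the search needs n − p = 10 cycles rather than 15 and still finds both roots.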
A detailed illustration of the calculating unit is described below. Please see
The coefficient multiplexer 211 can receive λ_1 of λ(x) from the error-location polynomial generator 280 and one cyclic product from the iterative multiplier 251. It outputs λ_1 during a first cycle and the cyclic product during cycles later than the first cycle as a first data. That is, during the first cycle the first data is λ_1, and during the second cycle the first data becomes λ_1 α^{p+1}. The register 221 is electrically linked to the coefficient multiplexer 211. It temporarily stores the first data for one cycle and then outputs the first data. The shifting multiplier 231 is electrically linked to the register 221. It multiplies the first data by k/α (that is, α^p, taking k as the first substituted element α^{p+1}) as a second data and outputs the second data. It should be noted that the shifting multiplier 232 multiplies its first data by (k/α)^2 and the shifting multiplier 23t multiplies its first data by (k/α)^t. In general, the rth shifting multiplier multiplies its first data by (k/α)^r = α^{rp}. Here, r is a positive integer from 1 to t and represents the power of the variable in λ(x) that the corresponding calculating unit handles.
The shifting multiplexer 241 is electrically linked to the register 221 and the shifting multiplier 231. It can receive the first data and the second data in all cycles. During the second cycle, the shifting multiplexer 241 outputs the second data. After the second cycle, during all cycles until the end of Chien's search on one BCH codeword, the shifting multiplexer 241 outputs the first data from the register 221. The iterative multiplier 251 is electrically linked to the shifting multiplexer 241, the coefficient multiplexer 211 and the finite adder 270. It receives the first data and the second data, multiplies the received first data or second data by α^r as the cyclic product and outputs the cyclic product to the finite adder 270 and the coefficient multiplexer 211. Here, r is as defined above.
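The data path of one calculating unit can be mimicked in software. The Python sketch below is a behavioral model under stated assumptions: the field GF(2^4) with primitive polynomial x^4 + x + 1, the function name calculating_unit and its parameters are all hypothetical, and the cycle timing is simplified so that one loop pass corresponds to one cyclic product. The shifting multiplier's output is selected only on the first pass; afterwards the register value goes straight to the iterative multiplier.

```python
# Sketch only: GF(2^4) with primitive polynomial x^4 + x + 1 is an assumed example field.
M = 4
PRIM_POLY = 0b10011
N = (1 << M) - 1  # n = 2^m - 1 = 15

EXP = [0] * (2 * N)  # EXP[i] = alpha^i
LOG = [0] * (N + 1)  # LOG[alpha^i] = i
x = 1
for i in range(N):
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & (1 << M):
        x ^= PRIM_POLY
for i in range(N, 2 * N):
    EXP[i] = EXP[i - N]


def gf_mul(a, b):
    """Multiply two GF(2^m) elements using the log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def calculating_unit(coeff, r, p, steps):
    """Behavioral model of the rth calculating unit; returns its cyclic products."""
    shift = EXP[(r * p) % N]  # shifting multiplier constant alpha^(r*p)
    step = EXP[r % N]         # iterative multiplier constant alpha^r
    register = coeff          # coefficient multiplexer selects the coefficient first
    outputs = []
    for i in range(steps):
        first_data = register
        second_data = gf_mul(first_data, shift)
        # Shifting multiplexer: second data on the first pass, first data afterwards.
        selected = second_data if i == 0 else first_data
        cyclic_product = gf_mul(selected, step)
        outputs.append(cyclic_product)  # value sent to the finite adder
        register = cyclic_product       # fed back through the coefficient multiplexer
    return outputs
```

The ith output (counting from zero) equals coeff multiplied by α^{r(p+1+i)}, which matches the sequence of cyclic products listed for the first embodiment.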
The method of the present invention can be carried out as follows. Please refer to
According to the spirit of the present invention, the circuit 20 in the first embodiment can be further extended to parallel computing. Please see
The circuit 30 includes t calculating units. As in the first embodiment, to simplify the illustration, only a first calculating unit 301, a second calculating unit 302 and a tth calculating unit 30t are shown. The remaining calculating units are omitted, but their structures and functions can be understood from the description below. The circuit 30 also includes s finite adders, an error-location polynomial generator 380 and a judging module 390. Of the finite adders, a first finite adder 271, a second finite adder 272 and an sth finite adder 27s are discussed.
All of the calculating units operate in the same way. The only difference is that each calculating unit is used for the iterative calculation regarding one specified coefficient of an error-location polynomial and the variable to the corresponding power. The error-location polynomial λ(x) described in the first embodiment is used for the second embodiment as well. The first calculating unit 301 is used for the iterative calculation regarding λ_1 and x to the first power. Similarly, the second calculating unit 302 is for λ_2 and x to the second power, and the tth calculating unit 30t is for λ_t and x to the tth power. Take the first calculating unit 301 as an example. The first calculating unit 301 iteratively substitutes x to the first power of λ(x) with a nonzero element, k, in GF(2^m). Unlike the calculating units in the first embodiment, each calculating unit of the circuit 30 supports parallel computing. For example, the first calculating unit 301 can simultaneously multiply λ_1 by k, where k is a variable that changes from α^{p+1} to α^{p+s}, to the first power as cyclic products in one iterative calculation. It then outputs the s cyclic products. Here, s is the number of sets of parallel computing, meaning that one calculating unit can do s calculations at the same time. The throughput of the calculating unit in the second embodiment is therefore s times that in the first embodiment. In the next iterative calculation, k changes from α^{p+s+1} to α^{p+2s}. The operation terminates when k reaches α^n. In summary, k changes from α^{p+(j−1)s+1} to α^{p+js} in the jth iterative calculation. For example, in the 8th iterative calculation, k changes from α^{p+7s+1} to α^{p+8s}.
The s finite adders are named a first finite adder 271, a second finite adder 272 . . . and an sth finite adder 27s. Only these three are plotted for illustration. Each of the s finite adders sums λ_0 of λ(x) and all cyclic products for one specified k outputted from the calculating units in the same iterative calculation as a judging factor. For a better understanding, take k = α^{p+s} as an example. The corresponding cyclic product from the first calculating unit 301 is λ_1 α^{p+s}. The corresponding cyclic product from the second calculating unit 302 is λ_2 α^{2(p+s)}. The corresponding cyclic product from the tth calculating unit 30t is λ_t α^{t(p+s)}. The sth finite adder 27s sums λ_0, λ_1 α^{p+s}, λ_2 α^{2(p+s)} . . . and λ_t α^{t(p+s)} as the judging factor. If k is α^{p+s+1}, it is calculated in the second iterative calculation and the first finite adder 271 may take care of this operation.
Like the circuit 20 in the first embodiment, the judging factor is judged by the judging module 390 to see if it is zero. If it is, k is a root of λ(x). In order to facilitate operation of the circuit 30, the error-location polynomial generator 380 is used to provide a coefficient of λ(x) to a specified calculating unit. Furthermore, the judging module 390 can have parallel computing ability to deal with s judging factors at the same time. The judging module 390 can also have s parallel computing units to keep up with the speed of the calculating units.
A detailed illustration of the calculating unit is described below. The first calculating unit 301 includes a coefficient multiplexer 311, a register 321, a shifting multiplier 331, a shifting multiplexer 341, and s iterative multipliers. For a simplified description, only a first iterative multiplier 3511, a second iterative multiplier 3512 and an sth iterative multiplier 351s are shown. Similarly, the second calculating unit 302 includes a coefficient multiplexer 312, a register 322, a shifting multiplier 332, a shifting multiplexer 342, a first iterative multiplier 3521, a second iterative multiplier 3522 and an sth iterative multiplier 352s. The tth calculating unit 30t includes a coefficient multiplexer 31t, a register 32t, a shifting multiplier 33t, a shifting multiplexer 34t, a first iterative multiplier 361t, a second iterative multiplier 362t and an sth iterative multiplier 36st. Components with the same name have similar functions. The first calculating unit 301 is taken as an example for illustration. Differences between components with the same names will be pointed out in the description.
The coefficient multiplexer 311 receives λ_1 of λ(x) from the error-location polynomial generator 380 and one cyclic product from the sth iterative multiplier 351s. It outputs λ_1 of λ(x) during a first cycle and the cyclic product during cycles later than the first cycle as a first data. That is, during the first cycle the first data is λ_1, and during the second cycle the first data becomes λ_1 α^{p+s}. The register 321 is electrically linked to the coefficient multiplexer 311. It temporarily stores the first data for one cycle and then outputs the first data. The shifting multiplier 331 is electrically linked to the register 321. It multiplies the first data by k/α (that is, α^p, taking k as the first substituted element α^{p+1}) as a second data and outputs the second data. It should be noted that the shifting multiplier 332 multiplies its first data by (k/α)^2 and the shifting multiplier 33t multiplies its first data by (k/α)^t. In general, the rth shifting multiplier multiplies its first data by (k/α)^r = α^{rp}. Here, r is a positive integer from 1 to t and represents the power of the variable in λ(x) that the calculating unit handles, and q is a positive integer ranging from 1 to s in one iterative calculation.
The shifting multiplexer 341 is electrically linked to the register 321 and the shifting multiplier 331. It receives the first data and the second data and outputs the second data during a second cycle after the first cycle and the first data during cycles later than the second cycle. The s iterative multipliers are electrically linked to the shifting multiplexer 341. Each of the iterative multipliers receives the first data and the second data, and in the first calculating unit 301 the qth iterative multiplier multiplies the received first data or second data by α^q to form the cyclic product for the qth of the s parallel calculations. The cyclic products are outputted to the finite adders, respectively. Generally, the qth iterative multiplier in the rth calculating unit multiplies the first data or second data by α^{qr} as the corresponding cyclic product. It should be noted that the sth iterative multiplier, whose first cyclic product is the corresponding coefficient multiplied by α^{r(p+s)}, also outputs its cyclic product to the coefficient multiplexer 311.
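The parallel evaluation can be sketched in Python as follows. Everything in the sketch is an illustrative assumption rather than the described circuit itself: the field GF(2^4) with primitive polynomial x^4 + x + 1, the values p = 5 and s = 4, and the names parallel_chien_search and poly_with_roots. In each pass, s judging factors are formed for s consecutive elements, and exponents beyond n in the last pass are simply ignored.

```python
# Sketch only: GF(2^4) with primitive polynomial x^4 + x + 1 is an assumed example field.
M = 4
PRIM_POLY = 0b10011
N = (1 << M) - 1  # n = 2^m - 1 = 15

EXP = [0] * (2 * N)  # EXP[i] = alpha^i
LOG = [0] * (N + 1)  # LOG[alpha^i] = i
x = 1
for i in range(N):
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & (1 << M):
        x ^= PRIM_POLY
for i in range(N, 2 * N):
    EXP[i] = EXP[i - N]


def gf_mul(a, b):
    """Multiply two GF(2^m) elements using the log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def parallel_chien_search(lam, p, s):
    """Test s elements per iteration, from alpha^(p+1) up to alpha^n.

    Returns (root exponents, number of iterations).
    """
    t = len(lam) - 1
    # Register r holds lam[r] * alpha^(r*base); base starts at p.
    regs = [gf_mul(lam[r], EXP[(r * p) % N]) for r in range(1, t + 1)]
    roots, iterations, base = [], 0, p
    while base < N:
        sums = [lam[0]] * s  # one finite adder per parallel lane
        for r in range(1, t + 1):
            for q in range(1, s + 1):
                # qth iterative multiplier of unit r multiplies by alpha^(q*r)
                sums[q - 1] ^= gf_mul(regs[r - 1], EXP[(q * r) % N])
            # the sth product is fed back for the next iteration
            regs[r - 1] = gf_mul(regs[r - 1], EXP[(s * r) % N])
        for q in range(1, s + 1):
            if base + q <= N and sums[q - 1] == 0:
                roots.append(base + q)
        base += s
        iterations += 1
    return roots, iterations


def poly_with_roots(e1, e2):
    """Hypothetical helper: (x + alpha^e1)(x + alpha^e2) as [l0, l1, l2]."""
    a, b = EXP[e1], EXP[e2]
    return [gf_mul(a, b), a ^ b, 1]
```

With n = 15, p = 5 and s = 4, the ten remaining elements are covered in three iterations instead of ten.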
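A behavioral model of one parallel calculating unit may help make the feedback path concrete. The Python sketch below assumes GF(2^4) with primitive polynomial x^4 + x + 1; the function name parallel_calculating_unit and its parameters are hypothetical, and the timing is simplified to one loop pass per iterative calculation. The register feeds s iterative multipliers at once, and only the sth product returns to the coefficient multiplexer.

```python
# Sketch only: GF(2^4) with primitive polynomial x^4 + x + 1 is an assumed example field.
M = 4
PRIM_POLY = 0b10011
N = (1 << M) - 1  # n = 2^m - 1 = 15

EXP = [0] * (2 * N)  # EXP[i] = alpha^i
LOG = [0] * (N + 1)  # LOG[alpha^i] = i
x = 1
for i in range(N):
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & (1 << M):
        x ^= PRIM_POLY
for i in range(N, 2 * N):
    EXP[i] = EXP[i - N]


def gf_mul(a, b):
    """Multiply two GF(2^m) elements using the log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def parallel_calculating_unit(coeff, r, p, s, iterations):
    """Model of the rth parallel unit; returns one row of s products per iteration."""
    shift = EXP[(r * p) % N]  # shifting multiplier constant alpha^(r*p)
    register = coeff          # coefficient multiplexer selects the coefficient first
    outputs = []
    for j in range(iterations):
        first_data = register
        second_data = gf_mul(first_data, shift)
        # Shifting multiplexer: second data on the first pass, first data afterwards.
        selected = second_data if j == 0 else first_data
        # qth iterative multiplier multiplies by alpha^(q*r), for q = 1..s
        row = [gf_mul(selected, EXP[(q * r) % N]) for q in range(1, s + 1)]
        outputs.append(row)
        register = row[-1]  # only the sth product is fed back
    return outputs
```

Counting iterations from zero, entry q of row j equals coeff multiplied by α^{r(p+js+q)}, so the first row of unit r ends with the coefficient times α^{r(p+s)}.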
Please refer to
While the invention has been described in terms of what is presently considered to be the most practical and preferred embodiments, it is to be understood that the invention need not be limited to the disclosed embodiments. On the contrary, it is intended to cover various modifications and similar arrangements included within the spirit and scope of the appended claims, which are to be accorded the broadest interpretation so as to encompass all such modifications and similar structures.
Number | Date | Country | |
---|---|---|---|
20150256200 A1 | Sep 2015 | US |