The present invention relates to an apparatus for efficiently evaluating a polynomial over a finite field, and to a corresponding method. The present invention further relates to an apparatus for identifying errors in a data string based on a cyclic code, and to a corresponding method.
Evaluation of polynomials over finite fields is an important problem in a large number of applications. Examples include error detection schemes in the context of cyclic codes. Such schemes are widely employed for the encoding and decoding of (normally binary) data to be transmitted across some imperfect transmission channel, such as a digital RF transmission channel, or for write/read operations on a medium such as a CD or DVD. Due to noise or impairments of the transmission channel, the transmitted data may become corrupted. To identify and correct such errors, so-called forward error correction schemes have been developed. Such schemes employ cyclic codes over a finite field. Well-known classes of error-correcting cyclic codes are the so-called Reed-Solomon codes or, more generally, the so-called BCH codes (see references [1], [2]).
A finite field (also known as a Galois field) is a field composed of a finite number of elements. The number of elements in the field is called the order or cardinality of the field.
This number is always of the form p^m, where p is a prime number and m is a positive integer. A Galois field of order q = p^m will in the following be designated either as GF(p^m) or as F_q, these symbols being fully synonymous. A polynomial over an arbitrary field (including a finite field) will be designated as P(x), as p(x) or by a similar symbol. An element in which the polynomial is to be evaluated will in the following be designated by lower-case Greek letters such as α, β or γ. The definitions and properties of finite fields are described in many standard textbooks of mathematics, e.g., [12] or [14], and reference is made to such standard textbooks for details.
The well-known Horner's rule is a universal algorithm for evaluating a polynomial, which works in any field, including finite fields. This algorithm computes the value P(α) of a polynomial

P(x) = a_n·x^n + a_{n−1}·x^{n−1} + … + a_0

in an iterative manner as suggested by the following formula:

P(α) = (…((a_n·α + a_{n−1})·α + a_{n−2})·α + … + a_1)·α + a_0.
In many applications over finite fields, however, this algorithm is not very efficient and requires significant computational efforts in terms of CPU time and memory usage. Furthermore, Horner's rule is inherently serial in nature and cannot readily be parallelized.
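For illustration, the following is a minimal sketch of Horner's rule over a binary extension field, with field elements represented as Python integers (bit vectors of the coefficients). The choice of GF(2^8) with reduction polynomial x^8 + x^4 + x^3 + x + 1 (value 0x11B) is an assumption made for this sketch only and is not prescribed by the text.

def gf_mul(a: int, b: int, mod: int = 0x11B, m: int = 8) -> int:
    """Carry-less multiplication in GF(2^m), reduced modulo `mod`."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^m) is XOR
        b >>= 1
        a <<= 1
        if a >> m:               # reduce as soon as the degree reaches m
            a ^= mod
    return result

def horner_eval(coeffs: list[int], gamma: int) -> int:
    """Evaluate a_0 + a_1 x + ... + a_n x^n at x = gamma; coeffs[i] = a_i."""
    acc = 0
    for a in reversed(coeffs):   # (...((a_n*g + a_(n-1))*g + ...)*g) + a_0
        acc = gf_mul(acc, gamma) ^ a
    return acc

# Example: evaluate x^3 + x + 1 at gamma = 0x02 (the element "x"):
# horner_eval([1, 1, 0, 1], 0x02) == 0x0B, i.e., the element x^3 + x + 1.

Note that the loop performs exactly n multiplications by γ, which is the serial behaviour criticized above.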
WO 99/37029 proposes a device and method of evaluating a polynomial more efficiently. The polynomial is split into sub-polynomials, which are then evaluated in the usual manner using Horner's rule. While this approach allows for better parallelization, there is still much room for improvement in terms of computational complexity, especially when the degree of the polynomial becomes large.
A standard method of decoding a cyclic code up to the BCH bound is the Gorenstein-Peterson-Zierler decoding procedure. This procedure comprises four steps: (i) computation of the syndromes, (ii) determination of the error-locator polynomial from the syndromes, (iii) determination of the roots of the error-locator polynomial, which yield the error positions, and (iv) computation of the error magnitudes.
Evaluation of polynomials is extensively involved in particular in the first and fourth steps. The second step is usually done efficiently through the Berlekamp-Massey algorithm. For the third step, an algorithm called the Chien search is usually employed. This algorithm may, however, be unacceptably slow if the error-locator polynomial has a large degree. It is therefore desirable to provide an apparatus and method that allow the error positions to be determined in a more efficient manner than by the Chien search.
In a first aspect, it is an object of the present invention to provide an apparatus for efficiently evaluating a polynomial over a finite field. This object is achieved by an apparatus having the features laid down in claim 1.
It is a further object of the present invention to provide an efficient computer-implemented method of evaluating a polynomial over a finite field. This object is achieved by a method as laid down in claim 11.
In a second aspect, it is an object of the present invention to provide an apparatus for efficiently identifying errors in a data string based on a cyclic code, in particular, for locating the error positions in the data string in an efficient manner. This object is achieved by an apparatus having the features laid down in claim 9.
It is a further object of the present invention to provide a computer-implemented method for efficiently identifying errors in a data string, in particular, for efficiently locating the error positions. This object is achieved by a method as laid down in claim 17.
Further embodiments of the invention are laid down in the dependent claims.
Preferred embodiments of the invention are described in the following with reference to the drawings, which are for the purpose of illustrating the present preferred embodiments of the invention and not for the purpose of limiting the same.
A first embodiment of the present invention is described in the following. The apparatus of this embodiment evaluates a polynomial p(x) in an element γ of GF(p^m), the element γ being represented in the basis

{1, α, …, α^{m−1}}

of GF(p^m), where α is a root of a primitive polynomial g(x) of degree m over GF(p). Then γ may be written as

γ = a_0 + a_1·α + … + a_{m−1}·α^{m−1},

and it is uniquely identified by an m-dimensional vector with entries in GF(p),

γ = [a_0, a_1, …, a_{m−1}].
A primitive element β in the subfield GF(p^r) is taken to be the power of α with exponent (p^m − 1)/(p^r − 1), and may also be represented in the basis {1, α, …, α^{m−1}}:

β = b_0 + b_1·α + … + b_{m−1}·α^{m−1};

thus, it is uniquely identified by an m-dimensional vector with entries in GF(p),

β = [b_0, b_1, …, b_{m−1}].
It is observed that in a large number of applications the coefficients of p(x) are either in the finite field GF(p) or in the extension field GF(p^m); the complexity of the Algorithm in the two fields is very different, but the two cases are closely related.
In general terms, the apparatus (and consequently also the Algorithm) operates as follows.
Inputs of the algorithm are the coefficients of the polynomial p(x) and the element γ in which the polynomial is to be evaluated.
These inputs are entered into the apparatus or read (received) by the apparatus by a coefficient-receiving module 101 and by an input value-receiving module 102.
An optional iteration determining module 103 reads or calculates the desired or optimum number of iterations L. Alternatively, the number of iterations may be predetermined and hard-coded into the apparatus or software (e.g., in applications where the degree n of the polynomial is fixed).
An optional initialization submodule 104 decomposes the input polynomial into a sum of polynomials with coefficients in a subfield GF(p) of order p, as detailed further below (see Remark 4).
In a decomposition and evaluation module 110, a p^L × ⌈n/p^L⌉ matrix 112 is defined that is used to store the coefficients of the polynomials into which p(x) is partitioned. The apparatus then iteratively carries out a decomposition of the polynomial into a sum of smaller entities (powers of smaller polynomials multiplied by powers of the variable x) by looping over a splitter module 111 L times, using the matrix 112 to store the coefficients after each iteration.
In an evaluation module 113, the apparatus evaluates the smallest polynomials obtained by looping over the splitter module 111 and computes the output value p(γ) of the polynomial starting from the data produced by the splitter module 111.
The output of the apparatus and algorithm is the value p(γ).
In the following, the special case p = 2 will be treated separately, since the corresponding fields have peculiar properties that are not shared by the other finite fields, which allows for some further simplifications of the algorithm for p = 2. Since in this case the Algorithm can be explained and understood more easily, it will be described first, as an introduction to the more general ideas discussed subsequently.
In practice, the coefficients of the polynomial will often be binary numbers, i.e., the coefficients will be elements of GF(2), and the polynomial will be evaluated in an element of an extension field GF(2^m) with m > 1. In this case, the above-described algorithm may be implemented particularly efficiently. This will be explained in more detail in the following, referring to the evaluation of a polynomial p(x), with coefficients in GF(2), in a point γ ∈ GF(2^m) with m > 1.
Any polynomial p(x) with binary coefficients can be written as a sum of two polynomials by collecting odd and even powers of x:
p(x) = x·p_1(x²) + p_2(x²) = x·p_1(x)² + p_2(x)²,

where p_1(x) has degree not greater than ⌊(n−1)/2⌋ and p_2(x) has degree not greater than ⌊n/2⌋. Here, ⌊·⌋ denotes the familiar floor function, which rounds its argument down to the nearest integer, and ⌈·⌉ denotes the familiar ceiling function, which rounds its argument up to the nearest integer.
Therefore, knowing p1(γ) and p2(γ), the value p(γ) can be obtained as
p(γ) = γ·p_1(γ)² + p_2(γ)²,

performing two squarings, one multiplication, and one sum in GF(2^m).
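As an illustration of this splitting step, the following is a minimal sketch that applies the identity recursively to a polynomial with GF(2) coefficients; gf_mul is the same carry-less multiplication shown in the earlier sketch, and GF(2^8) with reduction polynomial 0x11B is again an illustrative assumption, not a choice prescribed by the text.

def gf_mul(a, b, mod=0x11B, m=8):
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> m:
            a ^= mod
    return result

def eval_split(coeffs, gamma):
    """p(gamma) via p(x) = x*p1(x)^2 + p2(x)^2 (p1: odd part, p2: even part)."""
    if len(coeffs) <= 2:                  # degree <= 1: evaluate directly
        p = coeffs[0] if coeffs else 0    # coefficients are 0 or 1
        if len(coeffs) == 2 and coeffs[1]:
            p ^= gamma
        return p
    p2 = eval_split(coeffs[0::2], gamma)  # even-exponent coefficients
    p1 = eval_split(coeffs[1::2], gamma)  # odd-exponent coefficients
    # two squarings, one multiplication and one sum, as stated in the text
    return gf_mul(gamma, gf_mul(p1, p1)) ^ gf_mul(p2, p2)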
Clearly, the procedure can be iterated: at the i-th step, the number of polynomials p_{ij}(x) doubles, i.e., j varies between 1 and 2^i, and their degree is roughly halved. The number of squarings at each step is equal to the number of polynomials, and the number of multiplications by γ is half the number of polynomials, as is the number of additions.
After L steps it is necessary to evaluate 2^L polynomials of degree nearly n/2^L; then p(γ) is reconstructed by performing the operations described above in reverse. The total cost of the procedure, in terms of multiplications and additions, is composed of the following partial costs:
Evaluation of 2^L polynomials p_{Lj}(x), of degree ⌈n/2^L⌉, at the same point γ.
The fastest way to evaluate 2^L polynomials at the same point is to evaluate the powers γ^h for h = 1, …, ⌈n/2^L⌉ and to obtain each p_{Lj}(γ) by adding those powers corresponding to non-zero coefficients; the number of additions per polynomial is nearly n/2^L, so the total number of additions is not more than n.
The actual number of additions is much smaller if sums of equal terms can be reused, and it is upper bounded by O(n/ln(n)). This bound is a consequence of the fact that, in order to evaluate 2^L polynomials of degree h = ⌈n/2^L⌉ at the same point γ, we have to compute 2^L sums of powers γ^i, having at our disposal the h powers γ^i. One can then think of a binary matrix of dimensions

2^L × ⌈n/2^L⌉

to be multiplied by a vector of powers of γ; assuming 2^L ≈ ⌈n/2^L⌉ (as will be shown below), one may consider the matrix to be square and apply Theorem 2 of Ref. [11].
To establish how many iterations L should be used, one may minimize the total number of multiplications (since multiplications are much more costly than additions, additions may be neglected). The best choice for L is obtained when the number ⌈n/2^L⌉ of multiplications required to compute the powers of γ entering the evaluations of p_{Lj}(γ) is roughly equal to the number 2^{L+1} − 2 + 2^L − 1 (which is approximately 3·2^L) of multiplications required to reconstruct p(γ). This yields an approximate equation for L:

n/2^L ≈ 3·2^L,

which gives the approximate value

L ≈ ½·log₂(n/3).

Then, the total number N of multiplications in GF(2^m) required for evaluating p(γ) is

N = 2·(3·2^L) ≈ √(12n).
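As a numerical illustration (the figures here are merely an example, not taken from a specific application): for n = 1023 one obtains L ≈ ½·log₂(1023/3) ≈ 4.2, so one may take L = 4, and the expected number of multiplications is N ≈ √(12·1023) ≈ 111, as compared with the 1023 multiplications required by Horner's rule.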
Numerical comparisons, reported in the following Table, indicate that the advantage of the proposed method for evaluating the polynomials with respect to Horner's rule can be significant already for small n:
When the polynomial p(x) has coefficients in GF(2^r), let β be a (primitive) element of GF(2^r) defining a basis for this field; then p(x) can be written as

p(x) = p_0(x) + β·p_1(x) + β²·p_2(x) + … + β^{r−1}·p_{r−1}(x),

where p_i(x), i = 0, …, r−1, are polynomials over GF(2). Therefore, the problem of evaluating p(γ) is reduced to the problem of evaluating r polynomials p_i(x) with binary coefficients in the point γ ∈ GF(2^m), followed by the computation of r−1 products and r−1 sums in GF(2^m). The total complexity is approximately r·√(12n).
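A small sketch of this decomposition follows, under the assumption (made for this sketch only) that each GF(2^r) coefficient is given by its r binary components in the basis {1, β, …, β^{r−1}}, packed into an r-bit integer:

def split_coefficients(coeffs: list[int], r: int) -> list[list[int]]:
    """Split GF(2^r) coefficients (r-bit integers) into r binary polynomials.
    Component i of each coefficient becomes a coefficient of p_i(x), so that
    p(x) = p_0(x) + beta*p_1(x) + ... + beta^(r-1)*p_(r-1)(x)."""
    return [[(c >> i) & 1 for c in coeffs] for i in range(r)]

# Example with r = 2, where the coefficient 0b11 means 1 + beta:
# split_coefficients([0b11, 0b01, 0b10], 2) == [[1, 1, 0], [1, 0, 1]]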
There are also other options for computing p(γ) which may give a smaller number of multiplications; in any case, the proposed strategy gives an upper bound (possibly tight) on the number of multiplications sufficient for computing p(γ).
In the following, a more general description of the Algorithm will be provided, which is not restricted to polynomials with binary coefficients.
Consider a polynomial P(x) of degree n over a finite field GF(p^r), and let γ denote an element of GF(p^m), r being a divisor of m. One may write P(x) as

P(x) = P_0(x^p) + x·P_1(x^p) + … + x^{p−1}·P_{p−1}(x^p),

where P_0(x^p) collects the powers of x with exponent a multiple of p, and x^i·P_i(x^p) collects the powers of the form x^{ap+i}.
If σ is the Frobenius automorphism of GF(p^m) mapping γ to γ^p, one can write the expression above as

P(x) = P_0^{−1}(x)^p + x·P_1^{−1}(x)^p + … + x^{p−1}·P_{p−1}^{−1}(x)^p,

where P_i^{−1}(x), and in general P_i^{−k}(x), stands for the polynomial obtained from the corresponding P_i(x) by substituting its coefficients with their transforms through the automorphism σ^{−k} for every k. Notice that the polynomials P_i^{−1}(x) have degree at most ⌈(n−i)/p⌉. One can take the exponent out of the brackets because the field has characteristic p.
P(γ) for a particular value γ can then be obtained from {P_i^{−1}(x)} by performing p p-th powers, p − 1 multiplications and p − 1 sums.
If the procedure is iterated for L steps, then the total cost of evaluating P(γ) comprises, for the reconstruction of P(γ),

p − 1 + (p² − p) + … + (p^L − p^{L−1}) = p^L − 1 multiplications, and

p − 1 + (p² − p) + … + (p^L − p^{L−1}) = p^L − 1 additions,

in addition to the cost of evaluating the p^L innermost polynomials.
So altogether one would like to minimize the total number of multiplications, where a p-th power, made by successive squarings, counts as 2⌊log₂ p⌋ multiplications (the factor 2 in front of ⌊log₂ p⌋ is replaced by 1 when p is 2), the automorphism σ^L counts like a power with exponent p^L, with L ≤ r − 1, and ⌈n/p^L⌉ is the number of powers of γ we need to compute, while p^r − 1 is the number of their possible nonzero coefficients. Once all the powers of γ have been multiplied by the possible coefficients, one actually also needs to compute at most n additions to get the values of the polynomials.
If the coefficients are known to belong to GF(p), then the total cost is smaller, since σ does not change the coefficients in this case, and the best value for L is chosen accordingly.
Given the previous Remark, one may look back at the general picture where the polynomial p(x) has coefficients in GF(p^r), with r being a divisor of m. If β is an element of GF(p^r) defining a power basis, then p(x) can be written as

p(x) = p_0(x) + β·p_1(x) + β²·p_2(x) + … + β^{r−1}·p_{r−1}(x),

where p_i(x), i = 0, …, r−1, are polynomials over GF(p). Thus p(γ) can be obtained as a linear combination of the r numbers p_i(γ). Therefore, the problem of evaluating p(γ) is reduced to the problem of evaluating r polynomials p_i(x) with p-ary coefficients, followed by the computation of r−1 products and r−1 sums in GF(p^m).

The total complexity is approximately

r·√(8n(p−1)⌊log₂ p⌋).

In the binary case, that is if p = 2, the complexity is r·√(12n).
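As a numerical illustration (again with figures chosen merely as an example): for coefficients in GF(3), i.e., p = 3 and r = 1, the above expression gives approximately √(8n·2·1) = 4·√n multiplications; for n = 10⁴ this amounts to about 400 multiplications, against roughly 10⁴ multiplications for Horner's rule.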
This initial decomposition may optionally be carried out in the initialization submodule 104 described above.
The invention can also be put into practice with a different order of the steps. A variant is, for example, the following: if we suppose the coefficients to be in GF(p), we can obtain P(γ) as the linear combination

P(γ) = P_0(σ(γ)) + α·P_1(σ(γ)) + … + α^{p−1}·P_{p−1}(σ(γ)),

the notation being slightly amended as compared to above, however with the same meaning as before. A possible strategy is now to evaluate recursively the powers γ^j for j from 2 up to p, and σ(γ)^j for j from 2 up to the integer part of n/p; to compute the p numbers P_{1,i}(σ(γ)) using n sums and at most (p−2)·n/p products (the powers of σ(γ) times their possible coefficients); and to obtain P(γ) with p−1 products and p−1 additions. The total number M_p(n) of multiplications is at most 2p − 3 + (p−1)·n/p. The mechanism can be iterated; smaller polynomials are obtained, and after L steps the total cost includes: p−1 products to evaluate the first p powers of α; L−1 products to evaluate the first L powers of σ(γ); (p−2)(L−1) products to evaluate σ^i(γ)^j, i = 1, …, L−1, j = 2, …, p−1; at most n/p^L products to evaluate powers of σ^L(γ); at most (p−2)·n/p^L products to evaluate the polynomials in the final step in σ^L(γ); and p−1 multiplications by powers of σ(γ).
This argument can be generalized when the coefficients are in a bigger subfield.
An example of a practical implementation of the algorithm for binary coefficients, and of a corresponding apparatus, is illustrated in the following.
In the following, loops within the algorithm are conventionally written in the form

for i from a to b do … end do;

which, borrowed from the semantics of MAPLE, is self-explanatory. The following example concerns the case p = 2. The description for finite fields of odd characteristic can be obtained from this with the obvious adaptations.
Initially, the optimal number L of iterations (or steps) is computed, or may be pre-computed, from the expression L ≈ ½·log₂(n/3) derived above, and a matrix M (reference sign 112) of size 2^L × ⌈n/2^L⌉ is generated in memory.
The matrix M is now loaded with the entries taken from P; this operation consists in a loop of length n+1, i.e., the index l varies from 0 to n, and at each step the coefficient with index l is stored in the matrix at row (l mod 2^L) and column ⌊l/2^L⌋ (see the sketch after this description).
A column vector A (reference sign 115) of dimension ⌈n/2^L⌉ is loaded with the consecutive powers γ^{j−1} for j from 1 to ⌈n/2^L⌉.
The initial values p_{Lj}(γ) are computed and stored in a vector Out (reference number 116) of dimension 2^L, i.e., the matrix product Out = M·A is computed; since the entries of M are 0 or 1, this requires only additions.
A loop of length L is then started; at each cycle the number of values p_{ij}(γ) is halved, until only one value is obtained and the algorithm stops. Defining a vector OUT of dimension 2^{L−1}, the operations at each cycle combine the entries pairwise by squaring and multiplication by γ, i.e., OUT[j] = Out[j]² + γ·Out[j + 2^{L−1}]², after which Out is replaced by OUT and the dimension is halved.
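The following is a compact sketch of these steps (matrix M, vector A, the product Out = M·A, and the halving loop) for binary coefficients; gf_mul and the field GF(2^8) are the same illustrative assumptions as in the earlier sketches, and zero-based indexing is used.

def gf_mul(a, b, mod=0x11B, m=8):
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> m:
            a ^= mod
    return result

def eval_fast(coeffs, gamma, L, mod=0x11B, m=8):
    """Evaluate a GF(2)-coefficient polynomial at gamma in GF(2^m) in L steps."""
    rows = 1 << L                             # 2^L rows
    cols = (len(coeffs) + rows - 1) // rows   # ceil((n+1)/2^L) columns
    # matrix M (112): M[j][k] holds the coefficient of x^(k*2^L + j)
    M = [[0] * cols for _ in range(rows)]
    for l, c in enumerate(coeffs):
        M[l % rows][l // rows] = c
    # vector A (115): consecutive powers gamma^0, ..., gamma^(cols-1)
    A = [1]
    for _ in range(cols - 1):
        A.append(gf_mul(A[-1], gamma, mod, m))
    # Out (116) = M*A; only additions (XOR), since the entries of M are 0 or 1
    out = [0] * rows
    for j in range(rows):
        for k in range(cols):
            if M[j][k]:
                out[j] ^= A[k]
    # L reconstruction steps: combine pairs by squaring and multiplying by gamma
    for _ in range(L):
        half = len(out) // 2
        out = [gf_mul(out[j], out[j], mod, m)
               ^ gf_mul(gamma,
                        gf_mul(out[j + half], out[j + half], mod, m), mod, m)
               for j in range(half)]
    return out[0]

# Consistency check against Horner's rule (see the first sketch):
# eval_fast([1, 1, 0, 1], 0x02, 1) == 0x0B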
The algorithm has been simulated in MAPLE for test purposes. The MAPLE programs are given below, along with simulation times which show that even in a rough software implementation a significant gain can be observed. An implementation in, e.g., the C language or assembler, or a hardware implementation, will give even better performance.
To reliably estimate the evaluation time, an external loop is executed for evaluating the same polynomial in a number N=1000 of points. If T is the measured time, then T/N is a good estimation for the time required to evaluate the polynomial in a single point.
The polynomial has been chosen randomly with an average number of non-zero coefficients approximately equal to n/2. This situation is typical of the polynomials that represent received code words.
The Horner rule is a simple loop of length n:
The algorithm has been implemented as several simple loops of length not larger than √n. The input is the same as used with Horner's rule.
Possible applications for the presently proposed apparatus and algorithm are the following:
This scheme is used for example in PayTV access control systems. Suppose a server wants to distribute a key K to a subset of the set of all possible users, namely the subset of the people who paid for a particular content. Suppose users U_1, …, U_n are in this subset. Then the server can publish the following polynomial:

p(x) = (x − h(x_1))·(x − h(x_2))·…·(x − h(x_n)) + K,

where x_i stands for the binary string that user U_i is supposed to have as a ticket, and h is a hash function that the server will change each time it publishes a new polynomial.
An authorized user U_i gets K by evaluating p(x) in h(x_i). As the polynomial can be quite large if the number of authorized users is big, an efficient polynomial evaluation algorithm is desirable.
So here the input is p(x), the polynomial made public by the server, and the output is p(h(xi)) computed by the user Ui to get K.
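An illustrative sketch of this scheme follows. For brevity it works over a prime field rather than GF(2^m), and both the field order Q and the hash-to-field map (SHA-256 reduced mod Q) are assumptions of this sketch, not details given by the text.

import hashlib

Q = (1 << 127) - 1           # illustrative prime field order (a Mersenne prime)

def h(ticket: bytes) -> int:
    """Hash a ticket string into the field F_Q (illustrative map)."""
    return int.from_bytes(hashlib.sha256(ticket).digest(), 'big') % Q

def publish(tickets: list[bytes], key: int) -> list[int]:
    """Coefficients (lowest degree first) of p(x) = prod_i (x - h(x_i)) + K."""
    coeffs = [1]
    for t in tickets:
        r = (-h(t)) % Q                      # multiply by (x - h(t))
        new = [0] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            new[i + 1] = (new[i + 1] + c) % Q
            new[i] = (new[i] + c * r) % Q
        coeffs = new
    coeffs[0] = (coeffs[0] + key) % Q        # constant term carries the key
    return coeffs

def recover(coeffs: list[int], ticket: bytes) -> int:
    """An authorized user evaluates p at h(ticket) (Horner's rule)."""
    x, acc = h(ticket), 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % Q
    return acc

# Every authorized ticket evaluates to the key, since its product factor
# vanishes: recover(publish([b"u1", b"u2"], K), b"u1") == K.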
These computations are key operations in the algebraic decoding of cyclic codes, such as BCH and Reed-Solomon codes. Here the input is a received word r to be decoded. It is in the form of a string of symbols (in the binary alphabet, for example) and is transformed into a polynomial R(x) simply by considering those symbols as its coefficients.
The output we want is R(α^i), 1 ≤ i ≤ 2t, where 2t is the number of syndromes to be computed (depending on the BCH bound t, a parameter of the code in use), and α is an element of the field where the computations occur.
A scheme of the whole decoding procedure is illustrated in the drawings.
Unit 110 is a syndrome computation unit, which outputs R(α^i), 1 ≤ i ≤ 2t, as stated above. These values are the inputs for unit 120, which produces the error-locator polynomial σ(z) (usually by means of the Berlekamp-Massey algorithm). Error-locating unit 130 looks for the roots of this polynomial, as they correspond to the positions l_i of the errors in the received word. Finally, the outputs of units 120 and 130 are used in error-computing unit 140 to compute the error magnitudes ρ_i (this step can be omitted in the binary case).
The error-locating unit 130 usually uses an algorithm known as the Chien search. According to one aspect of the present invention, it is proposed to use instead the well-known Cantor-Zassenhaus algorithm (factoring module 131) to first find the roots in a representation in which the corresponding error positions are not yet evident, and then to find the error positions by computing discrete logarithms by means of Shanks' algorithm (logarithm-computing module 132). This will be explained in more detail further below.
Unit 140 applies Forney's algorithm and involves the evaluation of some polynomials built from the outputs of units 120 and 130. This step is not needed in the case of binary codes.
Application C: Secret Sharing schemes
These are (t, n)-threshold schemes based on interpolating polynomials. The secret key K is broken into pieces or shadows for n users, so that at least t users are needed to retrieve the key K, and no group of fewer than t users can do it. The sharing mechanism is the following: a server chooses a polynomial of degree t−1 over a certain finite field, with constant term K and all other coefficients chosen randomly. Then the server evaluates this polynomial in n different field elements, and the outputs are the shadows to be distributed to the users. Any group of t users can then retrieve K by Lagrange interpolation.
If the number of users is large, then a fast evaluation algorithm to get the shadows to be distributed is desirable.
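A minimal sketch of such a (t, n)-threshold scheme follows, again over an illustrative prime field (the field order and the use of Python's random module are assumptions of the sketch, not details given by the text).

import random

Q = (1 << 61) - 1            # illustrative prime field order

def make_shadows(secret: int, t: int, n: int) -> list[tuple[int, int]]:
    """Random degree-(t-1) polynomial with constant term = secret;
    the shadows are the points (i, p(i)) for i = 1, ..., n."""
    coeffs = [secret] + [random.randrange(Q) for _ in range(t - 1)]
    def p(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):   # Horner evaluation
            acc = (acc * x + c) % Q
        return acc
    return [(i, p(i)) for i in range(1, n + 1)]

def recover(shadows: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 from any t shadows."""
    secret = 0
    for xi, yi in shadows:
        num, den = 1, 1
        for xj, _ in shadows:
            if xj != xi:
                num = num * (-xj) % Q
                den = den * (xi - xj) % Q
        secret = (secret + yi * num * pow(den, -1, Q)) % Q
    return secret

# Any t of the n shadows reconstruct the secret:
# recover(make_shadows(12345, 3, 5)[:3]) == 12345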
In the following, the application of the invention in the context of the algebraic decoding of cyclic codes [n, k, d] up to the BCH bound is illustrated. Today, error-correcting codes must be managed at sizes that require efficient algorithms, possibly at the limit of their theoretical minimum complexity.
For easy reference, the algebraic decoding of cyclic codes is summarized in the following. Let C be an [n, k, d] cyclic code over a finite field GF(q), q = p^s for a prime p, with generator polynomial of minimal degree r = n − k

g(x) = x^r + g_1·x^{r−1} + … + g_{r−1}·x + g_r,

g(x) dividing x^n − 1, and let α be a primitive n-th root of unity lying in a finite field GF(p^m), where the extension degree m is the minimum integer such that n is a divisor of p^m − 1.
Assuming that C has BCH bound t, then g(x) has 2t roots with consecutive power exponents, so that the whole set of roots R includes

α^{l+1}, α^{l+2}, …, α^{l+2t},

where it is not restrictive to take l = 0, as is usually done.
Let R(x) = g(x) + e(x) be a received word such that the error pattern e(x) has no more than t errors. The Gorenstein-Peterson-Zierler decoding procedure, which is a standard decoding procedure for every cyclic code up to the BCH bound, is made up of four steps:

1. Computation of the 2t syndromes S_j = R(α^{l+j}), j = 1, …, 2t.
2. Computation of the error-locator polynomial σ(z) = z^t + σ_1·z^{t−1} + … + σ_{t−1}·z + σ_t from the syndromes.
3. Determination of the roots of σ(z), which are of the form α^{l_i}, the exponents l_i identifying the error positions.
4. Computation of the error magnitudes.
Prior-art implementations of this decoding algorithm combine the computation of 2t syndromes using Horner's rule, the Berlekamp-Massey algorithm to obtain the error-locator polynomial, the Chien search to locate the errors, and the evaluation of Forney's polynomial Γ(x) to estimate the error magnitudes.
The computation of the 2t syndromes using Horner's rule requires 2tn multiplications in GF(q^m), which may be prohibitive when n is large. Horner's rule may be replaced by the Algorithm for evaluating polynomials according to the present invention, as discussed above. The Berlekamp-Massey algorithm has multiplicative complexity O(t²), is very efficient and will not be discussed further. The Chien search again requires O(tn) multiplications in GF(q^m). Forney's algorithm again requires O(t²) multiplications. Notice that the fourth step is not required if we deal with binary codes, and that both the first and the fourth steps consist primarily of polynomial evaluations, so they can benefit from any efficient polynomial evaluation algorithm, as described above.
The standard decoding procedure is satisfactory when the code length n is not too large (say, n < 10³) and efficient implementations are set up taking advantage of the particular structure of the code. The situation changes dramatically when n is of the order of 10⁶ or larger. In this case a complexity of O(tn), as required by the Chien search, is not acceptable anymore. In the following, a method to make this step more efficient and practical even for large n is described.
We will follow the usual approach of focusing, as above, on counting the number of multiplications, as they are more expensive than sums: for example, in GF(2^m) the cost of an addition is O(m) in space and one clock cycle in time, while the cost of a multiplication is O(m²) in space and O(log₂ m) in time.
The syndromes are computed in the manner described above. Once the error-locator polynomial σ(z) has been computed from the syndromes using the Berlekamp-Massey algorithm, its roots, represented in the form α^{l_i}, are found as follows.
The Cantor-Zassenhaus probabilistic factorization algorithm is very efficient in factoring a polynomial and consequently in computing the roots of a polynomial. Since σ(z) is the product of t linear factors z + ρ_i over GF(q^m) (i.e., each ρ_i is a q-ary polynomial in α of degree m−1), this factoring algorithm can be directly applied to separate these t factors. The error positions l_i are then obtained by computing the discrete logarithms of ρ_i = α^{l_i}.
The Cantor-Zassenhaus algorithm is described in the following for easy reference. Only the case of characteristic 2 is treated, which is by far the most common in practice; the general situation is described in [3, 6].
Assume that p(z) is a polynomial over GF(2^m) that is a product of t polynomials of degree 1 over the same field GF(2^m), m even (when m is odd it is enough to consider a quadratic extension and proceed as in the case of even m). Suppose that α is a known primitive element in GF(2^m), and set l_m = (2^m − 1)/3; then ρ = α^{l_m} is a primitive cube root of unity.
As shown in [6], the polynomial b(z) can be chosen of the form z + β, using b(z) = z as the initial choice. Let θ be a generator of the cyclic subgroup of GF*(2^m) of order l_m. If

z^{l_m} ≡ ρ^i mod σ(z), i ∈ {0, 1, 2},

then each root ζ_h of σ(z) is of the form α^i·θ^j. If this is the case, which does not allow us to find a factor, we repeat the test with b(z) = z + β for some β, and we will succeed as soon as the elements ζ_h + β are not all of the type α^i·θ^j for the same i ∈ {0, 1, 2}. This can be shown to happen probabilistically very soon, especially when the degree of σ(z) is high.
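The following is a sketch of this root-finding step for characteristic 2 and m even, under illustrative assumptions: GF(2^6) with the primitive polynomial x^6 + x + 1 (not necessarily the field representation used in the worked example further below), polynomials stored as coefficient lists with the lowest degree first, and an input polynomial known to split into distinct linear factors.

import random

M, MOD = 6, 0b1000011          # GF(2^6), reduction by x^6 + x + 1 (primitive)
ORDER = (1 << M) - 1           # 63 = 3 * 21
LM = ORDER // 3                # l_m = 21, the exponent of the splitting test
ALPHA = 0b10                   # the class of x, a primitive element

def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M:
            a ^= MOD
    return r

def gf_pow(a, e):
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r

RHO = gf_pow(ALPHA, LM)        # primitive cube root of unity

def trim(p):
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p

def poly_divmod(a, b):
    """Quotient and remainder of a by b over GF(2^m)."""
    a, b = trim(a[:]), trim(b[:])
    inv = gf_pow(b[-1], ORDER - 1)         # inverse of the leading coefficient
    q = [0] * max(1, len(a) - len(b) + 1)
    while len(a) >= len(b) and a != [0]:
        d = len(a) - len(b)
        k = gf_mul(a[-1], inv)
        q[d] = k
        for i, c in enumerate(b):
            a[d + i] ^= gf_mul(k, c)       # subtract k * x^d * b(x)
        a = trim(a)
    return q, a

def poly_gcd(a, b):
    a, b = trim(a[:]), trim(b[:])
    while b != [0]:
        a, b = b, poly_divmod(a, b)[1]
    inv = gf_pow(a[-1], ORDER - 1)
    return [gf_mul(inv, c) for c in a]     # monic gcd

def poly_mulmod(a, b, mod):
    prod = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        if x:
            for j, y in enumerate(b):
                prod[i + j] ^= gf_mul(x, y)
    return poly_divmod(prod, mod)[1]

def poly_powmod(base, e, mod):
    r, base = [1], poly_divmod(base, mod)[1]
    while e:
        if e & 1:
            r = poly_mulmod(r, base, mod)
        base = poly_mulmod(base, base, mod)
        e >>= 1
    return r

def cz_roots(sigma):
    """Roots of a monic product of distinct linear factors over GF(2^m)."""
    sigma = trim(sigma[:])
    if len(sigma) <= 1:
        return []
    if len(sigma) == 2:                     # z + r has the root r
        return [sigma[0]]
    while True:
        beta = random.randrange(1 << M)
        a = poly_powmod([beta, 1], LM, sigma)   # (z + beta)^LM mod sigma
        for c in (1, RHO, gf_mul(RHO, RHO)):    # test a+1, a+rho, a+rho^2
            g = a[:]
            g[0] ^= c
            d = poly_gcd(g, sigma)
            if 1 < len(d) < len(sigma):         # proper factor found
                return cz_roots(d) + cz_roots(poly_divmod(sigma, d)[0])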
Shanks' algorithm can be applied to compute the discrete logarithm in a group of order n generated by the primitive root α. The exponent l in the equality

α^l = b_0 + b_1·α + … + b_{s−1}·α^{s−1}

is written in the form

l = l_0 + l_1·⌈√n⌉.

A table T is constructed with ⌈√n⌉ entries α^{κ·⌈√n⌉}, κ = 0, …, ⌈√n⌉ − 1. The discrete logarithm is then found by computing the elements

A_j = (b_0 + b_1·α + … + b_{s−1}·α^{s−1})·α^{−j}, j = 0, …, ⌈√n⌉ − 1,

and looking for A_j in the table; when a match is found with the κ-th entry, we set l_0 = j and l_1 = κ, and the discrete logarithm l is obtained as j + κ·⌈√n⌉.
This algorithm can be performed with complexity O(√n) both in time and in space (memory). In our scenario, since we need to compute t roots, the complexity is O(t·√n).
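A minimal sketch of this baby-step giant-step procedure follows, using the same illustrative field GF(2^6) as in the previous sketch (so that n = 63 and ⌈√n⌉ = 8, as in the worked example further below).

import math

M, MOD = 6, 0b1000011          # illustrative GF(2^6), x^6 + x + 1
N = (1 << M) - 1               # group order n = 63
ALPHA = 0b10                   # primitive element

def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M:
            a ^= MOD
    return r

def gf_pow(a, e):
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r

def shanks_log(beta):
    """Return l with ALPHA^l == beta, in O(sqrt(n)) time and space."""
    s = math.isqrt(N - 1) + 1                  # ceil(sqrt(n))
    table = {}                                  # giant steps alpha^(kappa*s)
    g, giant = 1, gf_pow(ALPHA, s)
    for kappa in range(s):
        table.setdefault(g, kappa)
        g = gf_mul(g, giant)
    inv_alpha = gf_pow(ALPHA, N - 1)            # alpha^(-1)
    a = beta
    for j in range(s):                          # baby steps beta * alpha^(-j)
        if a in table:
            return (j + table[a] * s) % N
        a = gf_mul(a, inv_alpha)
    raise ValueError("element is not in the group generated by ALPHA")

# Example: shanks_log(gf_pow(ALPHA, 31)) == 31, with a table of 8 entries.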
We observe that the above procedure can be used to decode beyond the BCH bound, up to the minimum distance, whenever the error locator polynomial can be computed from a full set of syndromes [4, 7, 20, 23].
The Cantor-Zassenhaus algorithm finds the roots of the error-locator polynomial; then Shanks' baby-step giant-step algorithm finds the error positions. As said in the introduction, this is the end of the decoding process for binary codes. For non-binary codes, Forney's polynomial
Γ(x) = σ(x)·S(x) mod x^{2t+1},

where

S(x) = Σ_{i=1}^{2t} S_i·x^i [2],

yields the error values.
Again we remark that this last step can benefit from an efficient polynomial evaluation algorithm, such as the one presented above.
Given the importance of cyclic codes over GF(2^m), for instance the Reed-Solomon codes that are used in CD-ROMs, or the famous Reed-Solomon code [255, 223, 33] over GF(2^8) used by NASA ([24]), an efficient evaluation of polynomials over GF(2^m) in points of the same field is of the greatest interest. In the previous remarks we have shown that efficient methods do exist; moreover, in particular scenarios additional gains can be obtained by a clever choice of the parameters, for example by choosing L as a factor of m that is close to the optimum given above, together with some arrangements as explained below.
The idea will be illustrated considering the decoding of the above-mentioned Reed-Solomon code; namely, we show how to obtain the 32 syndromes.
Let

r(x) = Σ_{i=0}^{254} r_i·x^i, r_i ∈ F_{2^8},

be a received code word of a Reed-Solomon code [255, 223, 33] generated by the polynomial

g(x) = Π_{i=1}^{32} (x − α^i),

with α a primitive element of GF(2^8), i.e., a root of x^8 + x^5 + x^3 + x + 1. Our aim is to evaluate the syndromes

S_j = r(α^j), j = 1, …, 32.
We can argue in the following way. The power β = α^{17} is a primitive element of the subfield GF(2^4); it is a root of the polynomial x^4 + x^3 + 1, and has trace 1 in GF(2^4). Therefore, a root δ of z² + z + β is not in GF(2^4), but it is an element of GF(2^8), and every element of GF(2^8) can be written as a + b·δ with a, b ∈ GF(2^4). Consequently, we can write r(x) = r_1(x) + δ·r_2(x) as a sum of two polynomials over GF(2^4), evaluate each r_i(x) in the roots α^j of g(x), and obtain each syndrome

S_j = r(α^j) = r_1(α^j) + δ·r_2(α^j)

with one multiplication and one sum.
Now, following our proposed scheme, if p(x) is either r_1(x) or r_2(x), in order to evaluate p(α^j) we consider the decomposition

p(x) = (p_0 + p_2·x + … + p_{254}·x^{127})² + x·(p_1 + p_3·x + … + p_{253}·x^{126})²,

where we have not changed the coefficients by computing σ^{−1} for each of them, as a convenient Frobenius automorphism will come into play later. Now, each of the two parts can be decomposed again into the sum of two polynomials of degree at most 63, for instance
p_0 + p_2·x + … + p_{254}·x^{127} = (p_0 + p_4·x + … + p_{252}·x^{63})² + x·(p_2 + p_6·x + … + p_{254}·x^{63})²,
and at this stage we have four polynomials to be evaluated. The next two steps double the number of polynomials and halve their degree; we write just one polynomial per stage:
p_0 + p_4·x + … + p_{252}·x^{63} = (p_0 + p_8·x + … + p_{248}·x^{31})² + x·(p_4 + p_{12}·x + … + p_{252}·x^{31})²,
p_0 + p_8·x + … + p_{248}·x^{31} = (p_0 + p_{16}·x + … + p_{240}·x^{15})² + x·(p_8 + p_{24}·x + … + p_{248}·x^{15})².
Since we choose to stop the decomposition at this stage, we have to evaluate 16 polynomials of degree at most 15 with coefficients in GF(2^4). Before doing this computation we should in principle perform the inverse Frobenius automorphism σ^{−4} on the coefficients; however, σ^{−4}(p_i) = p_i, because the coefficients are in GF(2^4) and any element β in this field satisfies the condition β^{2^4} = β.
Now, let K be the number of code words to be decoded. It is convenient to compute only once the following field elements:

α^i, i = 2, …, 254

(this requires 253 multiplications); and

α^i·β^j for i = 0, …, 254 and j = 1, …, 14,

which requires 255 × 14 = 3570 multiplications.
Then only sums (which can be performed in parallel) are required to evaluate the 16 polynomials of degree 15 for each α^j, j = 1, …, 32. Once we have the values of these polynomials, in order to reconstruct each of r_1(α^j) and r_2(α^j) we need 45 multiplications (30 squarings and 15 multiplications by powers of α^j).
Summing up, every r(α^j) = r_1(α^j) + δ·r_2(α^j) is obtained with 2 × 45 + 1 = 91 multiplications. Then the total cost of the computation of the 32 syndromes drops from 31 + 32 × 254 = 8159 multiplications with Horner's rule to 32 × 91 + 3570 + 253 = 6735. Since we have K code words, the total cost drops from 31 + 8128·K to 3823 + 2912·K, with two further advantages:
Clearly, these decoding schemes can be generalized for cyclic codes over any GF(pm) with m not prime.
In the previous sections we presented methods to compute syndromes and error locations in the Gorenstein-Peterson-Zierler decoding scheme of cyclic codes up to their BCH bound which are asymptotically better than the classical algorithms. The following example illustrates the complete new procedure.
Consider a binary BCH code [63, 45, 7] with generator polynomial

g(x) = x^{18} + x^{17} + x^{14} + x^{13} + x^9 + x^7 + x^5 + x^3 + 1,
whose roots are
Let c(x)=g(x)I(x) be a transmitted code word, and the received word be
r(x) = x^{57} + x^{56} + x^{53} + x^{52} + x^{50} + x^{48} + x^{46} + x^{44} + x^{42} + x^{39} + x^{31} + x^{18} + x^{17} + x^{14} + x^{13} + x^7 + x^5 + x^3 + 1,
where three errors occurred. The 6 syndromes are
For example, S1 has been computed considering r(x) as a sum of the polynomials
Each square polynomial splits into two polynomials
Again each square polynomial splits into two polynomials
Therefore we need the following powers
The roots of σ(z) are computed as follows using the Cantor-Zassenhaus algorithm. Let ρ = α^{21} be a cube root of unity, consider a random polynomial of degree less than 3, for instance z + ρ, and compute

a(z) = (z + ρ)^{21} mod σ(z)

(the exponent of z + ρ is (2^m − 1)/3 = 21):

a(z) = (α^5 + α^4 + α^2 + α + 1)·z² + (α^3 + α + 1)·z + α^5 + α^4 + α^3 + 1.
In this case a(z) has no root in common with σ(z), while

gcd(a(z) + 1, σ(z)) = z + (α^4 + α^3 + 1) (l = 31),

gcd(a(z) + ρ, σ(z)) = z + (α^5 + α^4 + α^2 + 1) (l = 9),

gcd(a(z) + ρ², σ(z)) = z + (α^3 + α) (l = 50).
The error positions have been obtained using Shanks' algorithm with a table of 8 entries and a loop of length 8 for each root, for a total of 24 searches, versus the 63 searches of the Chien search.