1. Field of the Invention
The present invention relates to a low-complexity bit-parallel systolic architecture, and more particularly to a low-complexity bit-parallel systolic architecture, free of global connections, for computing C+AB, AB, C+AB^2 or AB^2 over a class of GF(2^m).
2. Description of Related Art
Finite fields GF(2^m) have been broadly applied to error control coding and cryptography [reference 12]. The fundamental operations in a finite field are addition, multiplication, exponentiation, division and multiplicative inversion. However, information processing in error control coding usually requires the power-sum (C+AB^2) operation. AB^2 circuits have been shown to be more effective than AB circuits in performing exponentiation, inversion and division in GF(2^m). The AB^2 operation can be performed by ordinary multiplication, but not necessarily efficiently. Recently, several studies have sought to solve this problem. For example, Wei [reference 1] presented a systolic array with bi-directional data flow to compute C+AB^2 over GF(2^m) using the standard basis representation; Wang and Guo [reference 2] presented a systolic array with unidirectional data flow over GF(2^m); Liu [reference 3] proposed an AB^2 multiplier that used a cellular architecture in GF(2^m) and was based on an irreducible all one polynomial (AOP); and Lee [reference 4] presented a bit-parallel systolic array over a class of GF(2^m) that is also based on an irreducible AOP. This study focuses on the implementation of the systolic circuit of the C+AB, AB, C+AB^2 or AB^2 operation over the class of AOP-based GF(2^m) and the class of equally spaced polynomial based (ESP-based) GF(2^m).
An irreducible AOP or an irreducible ESP generates a special finite field in which arithmetic operations can be simplified. In 1989, Itoh and Tsujii [reference 5] designed two low-complexity multipliers in a class of GF(2^m) based on the irreducible AOP of degree m or the irreducible ESP of degree mr. Since then, many bit-parallel low-complexity multipliers have been proposed for error-control coding or cryptographic applications, such as those described in [references 6-9]. Recently, Lee et al. [reference 10] employed cyclic shifting and the inner product to implement efficient systolic multipliers over a class of GF(2^m) in which an irreducible AOP or an irreducible ESP generates each element of the finite field, such that the systolic circuits have low latency and low complexity. However, the circuit of [reference 10] includes many surplus inputs and latches if the order m of GF(2^m) is large. Later, Lee et al. [reference 11] used some global connections to dispense with the surplus inputs and latches in another design. In particular, public-key cryptography applies the finite field GF(2^m) [reference 12], in which the order m ranges from dozens to hundreds. If m is on the order of hundreds, then reducing the number of redundant inputs and latches or eliminating the global connections becomes important.
This study develops an algorithm for computing C+AB, AB, C+AB^2 or AB^2 over a class of fields GF(2^m) using the characteristics of an irreducible AOP of degree m. Based on the algorithm, a ringed parallel-in parallel-out systolic multiplier for computing C+AB^2 is proposed. The multiplier consists of m^2 identical cells, each consisting of one 2-input AND gate, one 2-input XOR gate and three 1-bit latches. The multiplier uses fewer gates than the designs in [references 3, 4, 10 and 11]. The architecture includes no redundant inputs or latches and has no global connections; it is therefore suitable for use in VLSI design. Moreover, extending this algorithm enables the ringed bit-parallel systolic architecture over the class of GF(2^m) also to be applied to ESP-based multiplication over the class of GF(2^nr).
The main objective of the present invention is to provide an improved bit-parallel systolic architecture for computing C+AB, AB, C+AB^2 or AB^2 over a class of GF(2^m) based on the irreducible all one polynomial (AOP) or the irreducible equally spaced polynomial (ESP), where A, B and C are elements of GF(2^m).
To achieve the objective, elements over GF(2^m) are represented in extended form. Such elements have two important properties: first, the polynomial of the elements is cyclic modulo x^(m+1)+1, and second, some fixed zero terms of the product of two elements can be ignored in the polynomials. With these properties, ringed low-complexity bit-parallel systolic multipliers are presented. The ringed bit-parallel systolic multiplier over the class of GF(2^m) requires few gates and no global connections. Accordingly, the new multiplier has a low complexity and few input pins. This ringed configuration can be easily implemented by taking advantage of three-dimensional routing in VLSI systems. As examples, the architecture of the multiplier was designed to compute C+AB^2 over GF(2^4), based on the irreducible AOP, and over GF(2^6), based on the irreducible ESP. Notably, the field GF(2^4) or GF(2^6) is used to illustrate the structures and operations of the two new multipliers presented here. However, the extension of these structures to the general case of GF(2^m) is straightforward.
Further benefits and advantages of the present invention will become apparent after a careful reading of the detailed description with appropriate reference to the accompanying drawings.
1. Mathematical Background
This section introduces the properties of the cyclic shifting and the inner product of the field GF(2^m) based on an irreducible AOP, as introduced in [reference 10]. These properties are important in developing the multipliers hereinafter.
1.1 Extended Canonical Basis
A polynomial of the form p(x) = p_0 + p_1 x + ... + p_m x^m over GF(2) is called an AOP of degree m if p_i = 1 for i = 0, 1, ..., m [reference 5]. An AOP has been shown to be irreducible if and only if m+1 is a prime and 2 is a primitive element of the field GF(m+1). For m ≤ 100, the values of m for which an AOP of degree m is irreducible are 2, 4, 10, 12, 18, 28, 36, 52, 58, 60, 66, 82 and 100.
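The irreducibility condition above can be checked directly. The following is a minimal sketch, not part of the original disclosure; the function names are hypothetical, and "2 is a primitive element of GF(m+1)" is read as "the multiplicative order of 2 modulo m+1 equals m".

```python
def is_prime(n):
    """Trial-division primality test (adequate for small n)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def order_of_two(p):
    """Multiplicative order of 2 modulo an odd prime p."""
    k, t = 1, 2 % p
    while t != 1:
        t = (t * 2) % p
        k += 1
    return k


def aop_irreducible_degrees(limit):
    """Degrees m in 2..limit with m+1 prime and 2 a primitive element of GF(m+1)."""
    # The 'and' short-circuits, so order_of_two is only called for odd primes.
    return [m for m in range(2, limit + 1)
            if is_prime(m + 1) and order_of_two(m + 1) == m]


degrees = aop_irreducible_degrees(100)
```

Under this reading, `degrees` reproduces the list of values quoted in the text for m ≤ 100.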
Suppose that α is a root of an irreducible AOP of degree m; then any element A in the Galois field GF(2^m) can be represented as A = a_0 + a_1 α + a_2 α^2 + ... + a_{m−1} α^(m−1), where the coefficients a_i ∈ GF(2) for 0 ≤ i ≤ m−1, and {1, α, α^2, ..., α^(m−1)} is called a canonical basis of GF(2^m). Notably, the element A can also be represented as A = A_0 + A_1 α + A_2 α^2 + ... + A_m α^m, with A_i = a_i + A_m for 0 ≤ i ≤ m−1 and A_m = 0 or 1. The basis {1, α, α^2, ..., α^m} is then called an extended basis of the canonical basis {1, α, α^2, ..., α^(m−1)}.
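The relation A_i = a_i + A_m between the two representations can be illustrated with a short sketch. The helper names `to_extended` and `to_canonical` are hypothetical; coefficient lists are ordered from the constant term upward, and addition in GF(2) is implemented as XOR.

```python
def to_extended(a, top_bit=0):
    """Canonical coefficients a_0..a_{m-1} -> extended A_0..A_m with A_m = top_bit."""
    return [ai ^ top_bit for ai in a] + [top_bit]


def to_canonical(A):
    """Extended coefficients A_0..A_m -> canonical a_i = A_i + A_m."""
    top_bit = A[-1]
    return [Ai ^ top_bit for Ai in A[:-1]]


a = [1, 0, 1, 1]              # an element of GF(2^4), so m = 4
ext0 = to_extended(a, 0)      # representation with A_m = 0
ext1 = to_extended(a, 1)      # representation with A_m = 1
```

Both extended vectors map back to the same canonical element, which reflects the fact that 1 + α + ... + α^m = 0 may be added to any element without changing it.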
1.2 Inner Product
Let P(x) = 1 + x + x^2 + ... + x^m be an irreducible AOP of degree m, and let α be a root of P(x), such that P(α) = 1 + α + α^2 + ... + α^m = 0. Then,
α^(m+1) = 1. (1)
Definition 1: Let A = A_0 + A_1 α + A_2 α^2 + ... + A_m α^m be an element in GF(2^m), which is represented with the extended basis. Then, A^(1) (= A_m + A_0 α + A_1 α^2 + ... + A_{m−1} α^m) and A^(−1) (= A_1 + A_2 α + A_3 α^2 + ... + A_0 α^m) denote the elements obtained by shifting A cyclically one position to the right and one position to the left, respectively.
Analogously, A^(i) and A^(−i), where i = 0, 1, 2, ..., m, represent the elements obtained by shifting A cyclically i positions to the right and i positions to the left, respectively:
A^(i) = A_<0−i> + A_<1−i> α + A_<2−i> α^2 + ... + A_<m−i> α^m, (2)
A^(−i) = A_<0+i> + A_<1+i> α + A_<2+i> α^2 + ... + A_<m+i> α^m, (3)
where <θ>, the subscript of A_<θ>, represents the least nonnegative residue of θ modulo m+1 (for all AOP-based GF(2^m)). Notably, A^(0) = A^(−0) = A.
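The cyclic shifts of Definition 1 amount to an index rotation modulo m+1. A minimal sketch follows; the function name `cyclic_shift` is hypothetical, and string labels are used so that the movement of each coefficient is visible.

```python
def cyclic_shift(A, i):
    """Return A^(i): coefficient j of the result is A_<j-i>, indices taken modulo m+1.

    A positive i shifts to the right; a negative i shifts to the left.
    """
    n = len(A)
    return [A[(j - i) % n] for j in range(n)]


A = ["A0", "A1", "A2", "A3", "A4"]     # extended coefficients for m = 4
right1 = cyclic_shift(A, 1)            # A^(1)
left1 = cyclic_shift(A, -1)            # A^(-1)
```

Here `right1` starts with `A4` and `left1` ends with `A0`, matching the two expansions written out in Definition 1.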
An important operation, called the inner product, is defined as follows.
Definition 2: Let A = A_0 + A_1 α + ... + A_m α^m and B = B_0 + B_1 α + ... + B_m α^m be two elements of GF(2^m), where α is a root of the irreducible AOP of degree m. Then the inner product of A and B is defined as,
By Definitions 1 and 2, the inner product of A^(i) and B^(−i) is given by,
The inner product operation defined in Definition 2 is important in the proposed algorithm.
Theorem 1: Assume that A = A_0 + A_1 α + ... + A_m α^m and B = B_0 + B_1 α + ... + B_m α^m are two elements in GF(2^m). Then, A and B over GF(2^m) can be multiplied using,
Based on Theorem 1, bit-parallel systolic multipliers for computing C+AB^2 were presented in [reference 3] and [reference 4]; the latency of those multipliers is only m+1 clock cycles. However, the circuit still requires (m+1)^2 cells and 5m+3 input pins. Following the above preliminaries, Section 2 presents a modified multiplier for computing C+AB^2 over GF(2^m), based on an irreducible AOP.
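The multiplication rule of Theorem 1 can be checked numerically. The sketch below rests on an assumption that is not spelled out in the surviving text: the inner product of two extended vectors is taken to place the term A_j B_j at α^(2j), which is consistent with the D_<2j> indexing used later in Eq. (19). All function names are hypothetical. The reference product is ordinary polynomial multiplication over GF(2) reduced with α^(m+1) = 1.

```python
from itertools import product


def cyclic_shift(A, i):
    """A^(i): coefficient j of the result is A_<j-i>."""
    n = len(A)
    return [A[(j - i) % n] for j in range(n)]


def inner(A, B):
    """Assumed inner product: the term A_j B_j contributes to alpha^(2j)."""
    n = len(A)
    D = [0] * n
    for j in range(n):
        D[(2 * j) % n] ^= A[j] & B[j]
    return D


def multiply_reference(A, B):
    """Schoolbook product over GF(2), reduced with alpha^(m+1) = 1."""
    n = len(A)
    D = [0] * n
    for s in range(n):
        for t in range(n):
            D[(s + t) % n] ^= A[s] & B[t]
    return D


def multiply_by_inner_products(A, B):
    """Sum over i = 0..m of the inner product of A^(i) and B^(-i)."""
    n = len(A)
    D = [0] * n
    for i in range(n):
        P = inner(cyclic_shift(A, i), cyclic_shift(B, -i))
        D = [d ^ p for d, p in zip(D, P)]
    return D


def theorem1_holds(m):
    """Exhaustively compare both products over all extended vectors of length m+1."""
    n = m + 1
    return all(
        multiply_by_inner_products(list(A), list(B)) == multiply_reference(list(A), list(B))
        for A in product((0, 1), repeat=n)
        for B in product((0, 1), repeat=n)
    )
```

The agreement relies on m+1 being odd, so that the map j → <2j> is a permutation of 0..m; this holds for every irreducible AOP of degree m ≥ 2.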
2. Multiplier for Computing C+AB2
2.1 Representation for Computing C+AB2
Definition 3: Let B = B_0 + B_1 α + ... + B_m α^m be an element of GF(2^m) generated by an irreducible AOP p(x), where α is a root of p(x). Then the square of B is defined as,
B^2 = B_0 + B_1 α^2 + B_2 α^4 + ... + B_m α^(2m). (7)
Let A and B be two elements of GF(2^m), both represented with the extended basis {1, α, α^2, ..., α^m}; then, the inner product of A and B^2 is obtained by,
By Definitions 1 and 2 again, the inner product of A^(i) and (B^2)^(−i) is given by,
According to Eqs. (1) and (7), the product of A and B^2 over GF(2^m) is,
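Assume that A = A_0 + A_1 α + A_2 α^2 + A_3 α^3 + A_4 α^4 and B = B_0 + B_1 α + B_2 α^2 + B_3 α^3 + B_4 α^4 are two elements in the field GF(2^4). Let D = D_0 + D_1 α + D_2 α^2 + D_3 α^3 + D_4 α^4 denote the product of A and B^2 over GF(2^4).
Then, from Eq. (1), α^5 = 1, and from Eq. (11), the coefficients of D are given by,
D_0 = A_0 B_0 + A_4 B_3 + A_3 B_1 + A_2 B_4 + A_1 B_2,
D_1 = A_1 B_0 + A_0 B_3 + A_4 B_1 + A_3 B_4 + A_2 B_2,
D_2 = A_2 B_0 + A_1 B_3 + A_0 B_1 + A_4 B_4 + A_3 B_2,
D_3 = A_3 B_0 + A_2 B_3 + A_1 B_1 + A_0 B_4 + A_4 B_2,
and
D_4 = A_4 B_0 + A_3 B_3 + A_2 B_1 + A_1 B_4 + A_0 B_2.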
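The five coefficient formulas above can be verified against a direct computation of AB^2 with α^5 = 1. This is a minimal sketch with hypothetical function names; squaring uses the fact that, over GF(2), the coefficient B_j of B moves to α^(2j) in B^2.

```python
from itertools import product


def square_coeffs(B):
    """Coefficients of B^2: B_j moves to position <2j> because cross terms vanish in GF(2)."""
    n = len(B)
    S = [0] * n
    for j, Bj in enumerate(B):
        S[(2 * j) % n] ^= Bj
    return S


def multiply_cyclic(A, B):
    """Product over GF(2) reduced with alpha^(m+1) = 1."""
    n = len(A)
    D = [0] * n
    for s in range(n):
        for t in range(n):
            D[(s + t) % n] ^= A[s] & B[t]
    return D


def ab2_direct(A, B):
    return multiply_cyclic(A, square_coeffs(B))


def ab2_listed(A, B):
    """The five formulas for D_0..D_4 exactly as listed in the text."""
    A0, A1, A2, A3, A4 = A
    B0, B1, B2, B3, B4 = B
    return [
        (A0 & B0) ^ (A4 & B3) ^ (A3 & B1) ^ (A2 & B4) ^ (A1 & B2),
        (A1 & B0) ^ (A0 & B3) ^ (A4 & B1) ^ (A3 & B4) ^ (A2 & B2),
        (A2 & B0) ^ (A1 & B3) ^ (A0 & B1) ^ (A4 & B4) ^ (A3 & B2),
        (A3 & B0) ^ (A2 & B3) ^ (A1 & B1) ^ (A0 & B4) ^ (A4 & B2),
        (A4 & B0) ^ (A3 & B3) ^ (A2 & B1) ^ (A1 & B4) ^ (A0 & B2),
    ]


def listed_formulas_agree():
    """Compare both computations over all 32 x 32 input pairs."""
    return all(
        ab2_direct(list(A), list(B)) == ab2_listed(A, B)
        for A in product((0, 1), repeat=5)
        for B in product((0, 1), repeat=5)
    )
```

The order of the B subscripts in each formula (0, 3, 1, 4, 2) is the order in which the coefficients of B^2 appear at α^0, α^1, ..., α^4.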
2.2 AOP-Based Algorithm and Circuit
Theorem 2: Assume that A = A_0 + A_1 α + A_2 α^2 + ... + A_m α^m and B = B_0 + B_1 α + B_2 α^2 + ... + B_m α^m are two elements in GF(2^m). Then, A and B^2 over GF(2^m) can be multiplied using,
Proof: A and B are two elements in GF(2^m); then, the product of A and B^2 can be obtained from Eq. (11) as,
Splitting the right side of this equation into two terms, with i even and i odd, yields,
Notably, m must be even for an irreducible AOP of degree m. Substituting α^i = α^(m+1+i) and <i−j> = <m+1+i−j> into the second term on the right side of Eq. (12) gives
Taking i = 2p for even i, where p = 0, 1, ..., m/2, and taking i = 2p−m−1 for odd i, where p = (m/2)+1, (m/2)+2, ..., m, Eq. (13) can be rewritten as,
Let k be an integer such that 0 ≤ k ≤ m. Then <p+k> must be in the range 0 ≤ <p+k> ≤ m for 0 ≤ p ≤ m. Thus, j = <p+k> can be substituted into the subscripts of A_<2p−j> S_j in Eq. (14) to obtain,
Comparing Eq. (15) with Eq. (10) finally gives,
That is,
Assume that {1, α, α^2, α^3, α^4} is an extended basis of the field GF(2^4). Let A = A_0 + A_1 α + A_2 α^2 + A_3 α^3 + A_4 α^4 and B = B_0 + B_1 α + B_2 α^2 + B_3 α^3 + B_4 α^4 be two elements of the field GF(2^4), and let D = D_0 + D_1 α + D_2 α^2 + D_3 α^3 + D_4 α^4 be the product of A and B^2. By employing the property α^(m+1+i) = α^i modulo (α^(m+1)+1) for m = 4, the product D can then be computed using Theorem 2:
Definition 4: Let A = A_0 + A_1 α + ... + A_m α^m and B = B_0 + B_1 α + ... + B_m α^m be two elements of GF(2^m), represented with the extended basis {1, α, α^2, ..., α^m}, where α is a root of the irreducible AOP of degree m. If A and B are represented with A_m = B_m = 0, then A_i B_m and A_m B_i equal zero for 0 ≤ i ≤ m. These terms are called fixed zero terms.
Definition 4 yields the following theorem.
Theorem 3: Assume that A = A_0 + A_1 α + ... + A_m α^m and B = B_0 + B_1 α + ... + B_m α^m are two elements in GF(2^m), and α is a root of the irreducible AOP of degree m. If A and B are represented with A_m = B_m = 0, then the product of A and B^2 over GF(2^m) includes 2m+1 fixed zero terms.
Proof: According to Eq. (11), the product of A and B^2 over GF(2^m) has (m+1)^2 terms. Since A_m = B_m = 0, Eq. (11) can be simplified as,
According to Eq. (16), the product of A and B^2 over GF(2^m) has m×m = m^2 terms. Therefore, the product of A and B^2 over GF(2^m) has (m+1)^2 − m^2 = 2m+1 fixed zero terms.
Using Theorem 3, the C+AB^2 circuit can be simplified by omitting the fixed zero terms. The following example illustrates the fixed zero terms of C+AB^2 over GF(2^4).
Assume that {1, α, α^2, α^3, α^4} is an extended basis of the field GF(2^4). Let A = A_0 + A_1 α + A_2 α^2 + A_3 α^3 + A_4 α^4, B = B_0 + B_1 α + B_2 α^2 + B_3 α^3 + B_4 α^4 and C = C_0 + C_1 α + C_2 α^2 + C_3 α^3 + C_4 α^4 be three elements of the field GF(2^4), where A_4 = B_4 = C_4 = 0. Let D = D_0 + D_1 α + D_2 α^2 + D_3 α^3 + D_4 α^4 be the result of C+AB^2. D can then be computed using Theorems 1 and 3:
Example 3 involves nine fixed zero terms: the terms of the forms A_4 B_i and A_i B_4 are zero and need not be computed.
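The count of nine fixed zero terms for GF(2^4) follows from enumerating the (m+1)^2 partial products A_i B_j, each of which occurs exactly once in the expansion of AB^2. A minimal sketch with a hypothetical function name:

```python
def fixed_zero_terms(m):
    """Index pairs (i, j) of partial products A_i B_j that vanish when A_m = B_m = 0."""
    n = m + 1
    return [(i, j) for i in range(n) for j in range(n) if i == m or j == m]


zeros_gf16 = fixed_zero_terms(4)   # the GF(2^4) example, m = 4
```

For m = 4 the list contains the five pairs with i = 4 and the four remaining pairs with j = 4, for a total of 2m+1 = 9.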
2.3 Ringed AOP-Based circuit
Using the cyclic property of the sequence <α^0 α^2 α^4 α^1 α^3>,
T_{0,j} = C_<2j>, initialization, for j = 0, 1, ..., m. (17)
T_{i+1,j} = T_{i,j} + A_j^(i) S_j^(−i), for i = 0, 1, ..., m and j = 0, 1, ..., m. (18)
D_<2j> = T_{m+1,j}, for j = 0, 1, ..., m, (19)
where S_j is defined as in Eq. (8). The product D can be computed in the following steps:
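A software sketch of recurrences (17) to (19) is given here as an illustration only; it is not the circuit itself. Two readings are assumed because the corresponding equations are not reproduced in the text: A_j^(i) and S_j^(−i) are taken as coefficient j of the shifted vectors, that is A_<j−i> and S_<j+i>, and S is taken as the coefficient vector of B^2, so that S_<2j> = B_j. Function names are hypothetical.

```python
from itertools import product


def square_coeffs(B):
    """Assumed S vector: S_<2j> = B_j, i.e. the coefficients of B^2."""
    n = len(B)
    S = [0] * n
    for j, Bj in enumerate(B):
        S[(2 * j) % n] ^= Bj
    return S


def power_sum_ring(A, B, C):
    """Compute C + A*B^2 with the level-by-level recurrence of Eqs. (17)-(19)."""
    n = len(A)                                          # n = m + 1
    S = square_coeffs(B)
    T = [C[(2 * j) % n] for j in range(n)]              # Eq. (17)
    for i in range(n):                                  # ring levels i = 0..m
        T = [T[j] ^ (A[(j - i) % n] & S[(j + i) % n])   # Eq. (18)
             for j in range(n)]
    D = [0] * n
    for j in range(n):
        D[(2 * j) % n] = T[j]                           # Eq. (19)
    return D


def power_sum_reference(A, B, C):
    """Direct evaluation of C + A*B^2 with alpha^(m+1) = 1."""
    n = len(A)
    S = square_coeffs(B)
    D = list(C)
    for s in range(n):
        for t in range(n):
            D[(s + t) % n] ^= A[s] & S[t]
    return D


def recurrence_matches(m, C):
    """Exhaustive comparison over all A and B for a fixed C."""
    n = m + 1
    return all(
        power_sum_ring(list(A), list(B), C) == power_sum_reference(list(A), list(B), C)
        for A in product((0, 1), repeat=n)
        for B in product((0, 1), repeat=n)
    )
```

Each pass of the loop corresponds to one ring level: column j accumulates one partial product per level, and after m+1 levels column j holds the coefficient of α^<2j>.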
The term α^3 is moved to the leftmost position by the cyclic property in the above steps. The advantage of the circuit in
The positions of the ring that use latches instead of U-cells are as follows.
Where P_{i,j} denotes the position in row i and column j. In
3. Modified ESP-Based Multiplier
This section proposes an ESP-Based multiplier. The method for computing C+AB2 based on an irreducible AOP can also be applied to compute the multiplication based on an irreducible ESP.
3.1 Algorithm
A polynomial of the form g(x) = 1 + x^r + ... + x^((n−1)r) + x^(nr) is called an r-equally spaced polynomial (r-ESP) of degree nr. Let g(x) = p(x^r); then p(x) is an AOP of degree n. If p(x) is an irreducible AOP, then the r-ESP g(x) has been shown to be irreducible if and only if r=(n+1)j≠1 modulo (n+1)r, for j≧1 [reference 5]. For nr ≤ 100, the pairs (nr, r) for which an r-ESP of degree nr is irreducible are (6,3), (18,9), (20,5), (54,27) and (100,25).
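The list of irreducible r-ESPs can be cross-checked by brute force. The sketch below does not use the criterion of [reference 5]; instead it applies Ben-Or's irreducibility test over GF(2), a swapped-in standard technique, to every ESP with r ≥ 2 and nr ≤ 100. Polynomials are stored as Python integers whose bit k is the coefficient of x^k; all function names are hypothetical.

```python
def pmod(a, g):
    """Remainder of a modulo g for GF(2) polynomials stored as integers."""
    dg = g.bit_length() - 1
    while a.bit_length() - 1 >= dg:
        a ^= g << (a.bit_length() - 1 - dg)
    return a


def pmulmod(a, b, g):
    """Product a*b modulo g over GF(2) (shift-and-add)."""
    res = 0
    while b:
        if b & 1:
            res ^= a
        b >>= 1
        a = pmod(a << 1, g)
    return pmod(res, g)


def pgcd(a, b):
    """Greatest common divisor of two GF(2) polynomials."""
    while b:
        a, b = b, pmod(a, b)
    return a


def is_irreducible(g):
    """Ben-Or test: g is reducible iff gcd(g, x^(2^k) + x) != 1 for some k <= deg(g)/2."""
    d = g.bit_length() - 1
    x = 2
    h = x
    for _ in range(d // 2):
        h = pmulmod(h, h, g)          # h = x^(2^k) mod g
        if pgcd(g, h ^ x) != 1:
            return False
    return True


def esp(n, r):
    """The r-ESP 1 + x^r + ... + x^(nr)."""
    return sum(1 << (k * r) for k in range(n + 1))


def irreducible_esp_pairs(limit):
    """All pairs (nr, r) with r >= 2 and nr <= limit whose r-ESP is irreducible."""
    pairs = [(n * r, r)
             for r in range(2, limit + 1)
             for n in range(1, limit // r + 1)
             if is_irreducible(esp(n, r))]
    return sorted(pairs)
```

For example, `is_irreducible(esp(2, 3))` tests 1 + x^3 + x^6, the polynomial used for GF(2^6) in this section, and `irreducible_esp_pairs(100)` enumerates the pairs for comparison with the list above.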
Now, suppose that α is a root of the irreducible r-ESP of degree nr. Then, an element A in the Galois field GF(2^nr) can be represented as A = a_0 + a_1 α + ... + a_{nr−1} α^(nr−1) using the canonical basis {1, α, α^2, ..., α^(nr−1)}, where a_i ∈ GF(2) for 0 ≤ i ≤ nr−1. The element A can also be represented using the extended basis {1, α, α^2, ..., α^((n+1)r−1)}, as,
A = A_0 + A_1 α + A_2 α^2 + ... + A_{(n+1)r−1} α^((n+1)r−1),
where A_i = a_i for 0 ≤ i ≤ nr−1 and A_i = 0 for nr ≤ i ≤ (n+1)r−1.
Assume that α is a root of the r-ESP g(x) = 1 + x^3 + x^6 (that is, g(x) is an irreducible ESP with nr = 6 and r = 3). Then, {1, α, α^2, α^3, α^4, α^5} is a canonical basis of the Galois field GF(2^6), and {1, α, α^2, α^3, α^4, α^5, α^6, α^7, α^8} can be used as an extended basis of this canonical basis. Thus, an element in GF(2^6) can be represented as A = a_0 + a_1 α + a_2 α^2 + a_3 α^3 + a_4 α^4 + a_5 α^5 = A_0 + A_1 α + A_2 α^2 + A_3 α^3 + A_4 α^4 + A_5 α^5 + A_6 α^6 + A_7 α^7 + A_8 α^8 using the extended basis, where A_i = a_i for 0 ≤ i ≤ 5, and A_6 = A_7 = A_8 = 0.
Theorem 4: Assume that A = A_0 + A_1 α + ... + A_{(n+1)r−1} α^((n+1)r−1) and B = B_0 + B_1 α + ... + B_{(n+1)r−1} α^((n+1)r−1) are two elements in GF(2^nr), represented with the extended basis {1, α, α^2, ..., α^((n+1)r−1)}, where α is a root of the irreducible r-ESP of degree nr. If A and B are represented with A_j = B_j = 0 for nr ≤ j ≤ (n+1)r−1, then the product of A and B^2 over GF(2^nr) includes (2n+1)r^2 fixed zero terms of the form A_i B_j or A_j B_i, for nr ≤ j ≤ (n+1)r−1 and 0 ≤ i ≤ (n+1)r−1.
Proof: According to Eq. (16), the product of A and B^2 over GF(2^nr) is,
where <θ>, the subscript of B_<θ>, denotes the least nonnegative residue of θ modulo (n+1)r (for all ESP-based GF(2^nr)). Equation (20) has ((n+1)r)^2 multiplicative terms. Since A_j = B_j = 0 for nr ≤ j ≤ (n+1)r−1, Eq. (20) can be simplified as,
According to Eq. (21), the product of A and B^2 over GF(2^nr) has (nr)^2 terms. Therefore, the product of A and B^2 over GF(2^nr) has ((n+1)r)^2 − (nr)^2 = (2n+1)r^2 fixed zero terms.
Since α is a root of the irreducible r-ESP g(x) = 1 + x^r + ... + x^(nr), g(α) = 1 + α^r + ... + α^(nr) = 0. For two elements A = A_0 + A_1 α + A_2 α^2 + ... + A_{(n+1)r−1} α^((n+1)r−1) and B = B_0 + B_1 α + B_2 α^2 + ... + B_{(n+1)r−1} α^((n+1)r−1), the product of A and B^2, according to Theorem 2 and Eq. (20), can be expressed as,
Thus, the method of multiplication based on an irreducible AOP can also be used for multiplication based on an irreducible ESP.
3.2 Ringed Circuit of an ESP-Based Multiplier
Assume that two elements A = a_0 + a_1 α + a_2 α^2 + a_3 α^3 + a_4 α^4 + a_5 α^5 = A_0 + A_1 α + A_2 α^2 + ... + A_8 α^8 and B = b_0 + b_1 α + b_2 α^2 + b_3 α^3 + b_4 α^4 + b_5 α^5 = B_0 + B_1 α + B_2 α^2 + ... + B_8 α^8 are given. Let D = D_0 + D_1 α + D_2 α^2 + ... + D_8 α^8 be the result of AB^2+C, where A, B and C are elements of GF(2^6). Set the initial value T_0 = C. D can then be computed using Eq. (22), as follows.
The sequence D_0, D_2, D_4, D_6, D_8, D_1, D_3, D_5, D_7 is a permutation of the sequence D_0, D_1, D_2, D_3, D_4, D_5, D_6, D_7, D_8. Notably, the terms that include A_6, A_7, A_8, B_6, B_7 and B_8 are all zero, such that A_j B_k and A_k B_j need not be computed for 6 ≤ j ≤ 8 and 0 ≤ k ≤ 8. Using Eq. (18), in the zeroth ring level, the U cells for computing the bit operations T_{1,3} = T_{0,3} + A_3 B_6, T_{1,5} = T_{0,5} + A_5 B_7 and T_{1,7} = T_{0,7} + A_7 B_8 can each be replaced by a bit latch because B_6 = B_7 = B_8 = 0, and those for performing the bit operations T_{1,6} = T_{0,6} + A_6 B_3, T_{1,7} = T_{0,7} + A_7 B_8 and T_{1,8} = T_{0,8} + A_8 B_4 can each be replaced by a bit latch since A_6 = A_7 = A_8 = 0. In the first level ring, A_4 or B_4 shifts to the right or the left, respectively. Then, each of the bit operations T_{2,2} = T_{1,2} + A_1 B_6, T_{2,4} = T_{1,4} + A_3 B_7, T_{2,6} = T_{1,6} + A_5 B_8, T_{2,7} = T_{1,7} + A_6 B_4, T_{2,8} = T_{1,8} + A_7 B_0 and T_{2,<9>} = T_{2,0} = T_{1,0} + A_8 B_5 requires only one bit latch instead of a U cell.
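The latch positions listed for ring levels 0 and 1 can be reproduced by a short enumeration. The sketch assumes, as the partial products above suggest, that the cell in level i and column j multiplies A_<j−i> by the B coefficient sitting at position <j+i> of B^2, where B_k sits at position <2k> modulo 9. The function name `latch_columns` and its parameters are hypothetical.

```python
def latch_columns(level, n=2, r=3):
    """Columns of a ring level whose partial product is a fixed zero term.

    Defaults correspond to GF(2^6) with g(x) = 1 + x^3 + x^6, ring length 9.
    """
    N = (n + 1) * r                       # ring length (must be odd)
    zero = set(range(n * r, N))           # indices nr..(n+1)r-1 carry fixed zeros
    b_at = [0] * N                        # b_at[t] = k such that B_k sits at position t
    for k in range(N):
        b_at[(2 * k) % N] = k
    cols = []
    for j in range(N):
        a_idx = (j - level) % N           # subscript of A used in column j
        b_idx = b_at[(j + level) % N]     # subscript of B used in column j
        if a_idx in zero or b_idx in zero:
            cols.append(j)
    return cols


level0 = latch_columns(0)
level1 = latch_columns(1)
total_latches = sum(len(latch_columns(i)) for i in range(9))
```

Level 0 yields columns 3, 5, 6, 7 and 8, and level 1 yields columns 0, 2, 4, 6, 7 and 8, in agreement with the operations listed above; summed over all nine levels the count is 45 = (2n+1)r^2 for n = 2 and r = 3.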
Now, the positions of the ring that use latches rather than U cells are described briefly as follows.
where P_{i,j} denotes the position in row i and column j.
As introduced in Section 2, a ringed structure is used to realize the circuit of the cyclic shift sequence <α^0 α^2 α^4 α^6 α^8 α^1 α^3 α^5 α^7>.
The positions in the ringed ESP-based multiplier over GF(2^nr) are obtained according to the following general rule.
Clearly, the proposed three-dimensional ESP-based systolic architecture over GF(2^nr) requires only (n+1)r clock cycles. Moreover, the circuit needs no global connections, and the proposed ESP-based systolic multiplier can save (2n+1)r^2 U cells by ignoring the fixed zero terms.
4. Comparison and Discussion
This work has presented a three-dimensional ringed parallel systolic AOP-based multiplier for computing C+AB, AB, C+AB^2 or AB^2 over GF(2^m). The latency of the AOP-based multipliers is only m+1 clock cycles in performing a multiplication over GF(2^m). The number of input pins is only 3m, which equals the sum of the number of bits in A, B and C. Table 1 compares the new AOP-based parallel systolic multipliers with those of Liu [reference 3], Lee [reference 4] and Lee [reference 11]. The table reveals that the ringed AOP-based multipliers (RAOPM) include fewer gates and fewer input pins than the other multipliers. Clearly, the ringed systolic multipliers involve much lower hardware complexity and no global connections, characteristics that are advantageous in VLSI implementation. Notably, the architecture for C+AB^2 is used to illustrate the structures and operations of the new multiplier presented here. However, the extension of these structures to the general cases of C+AB, AB or AB^2 is straightforward.
Although the invention has been explained in relation to its preferred embodiment, it is to be understood that many other possible modifications and variations can be made without departing from the spirit and scope of the invention as hereinafter claimed.