This disclosure relates to combinational circuits, and more specifically to a method of optimizing combinational circuits.
Combinational circuits have many possible applications. One such application is computing an inverse in a Galois Field, which is a field containing a finite number of elements. A combinational circuit is a circuit having an output value determined by the values of its inputs. Combinational circuits can be represented as circuit schematics using Boolean logic gates (such as AND gates and XOR gates), or can be represented mathematically using formulas having operations corresponding to logic gates. For example, an AND gate corresponds to a field multiplication operation, and an XOR gate corresponds to a field addition operation. Logic gates can be arranged to calculate functions, and the binary string output of a function may be referred to as a “target signal.” In a typical truth table for a function, the target signal corresponding to the function is the last column of the truth table.
A combinational circuit may have both linear and non-linear portions, where the “non-linear” portions contain AND gates and XOR gates, and the “linear” portions contain only XOR gates. A quantity of AND gates of a combinational circuit may be referred to as the “multiplicative complexity” of the circuit. Combinational circuits and their associated formulas can be extremely large and complex in certain applications, such as microprocessors.
A method of simplifying a combinational circuit establishes an initial combinational circuit operable to calculate a target signal. A quantity of multiplication operations performed in a first portion of the initial combinational circuit is reduced to create a first, simplified combinational circuit. The first portion includes only multiplication operations and addition operations. A quantity of addition operations performed in a second portion of the first, simplified combinational circuit is reduced to create a second, simplified combinational circuit. The second portion includes only addition operations. Also, the second, simplified combinational circuit is operable to calculate the target signal using fewer operations than the initial combinational circuit.
A computer-implemented method of simplifying a plurality of formulas establishes a plurality of formulas. The formulas include only addition operations, and the formulas correspond to a portion of a combinational circuit including only addition operations. A basis set including a plurality of input signals is defined. Using a computer, a distance vector is determined that includes one value for each of the plurality of formulas, the one value corresponding to a number of addition operations necessary to calculate a corresponding formula using signals from the basis set. Using the computer, two basis vectors are determined whose sum, when added to the basis set, reduces at least one value in the distance vector, and the sum is added to the basis set. The steps of determining two basis vectors whose sum, when added to the basis set, reduces at least one value in the distance vector, and adding the sum to the basis set may be selectively repeated until the basis set includes sums corresponding to each of the plurality of formulas.
A combinational circuit for a Substitution-Box for the Advanced Encryption Standard having a total of 115 Boolean gates comprises a first, input portion, a second portion coupled to the first, input portion, and a third, output portion coupled to the second portion. The first, input portion has 23 XOR gates. The second portion has 30 XOR gates and 32 AND gates, and computes the non-linear component of inversion in GF(256). Also, in the second portion 11 of the 30 XOR gates and 5 of the 32 AND gates are operable to perform inversion in GF(16). The third, output portion has 26 XOR gates and 4 XNOR gates.
These and other features of the present invention can be best understood from the following specification and drawings, of which the following is a brief description.
A first, non-linear portion of the combinational circuit is identified (step 104). A method 200 (see
A second, linear portion of the combinational circuit is identified (step 110) that includes only addition operations (XOR gates). A method 300 (see
y1=x2x3x4+x1x3+x2x3+x3+x4 formula #1
y2=x1x3x4+x1x3+x2x3+x2x4+x4 formula #2
y3=x1x2x4+x1x3+x1x4+x1+x2 formula #3
y4=x1x2x3+x1x3+x1x4+x2x4+x2 formula #4
Formulas 5-8, shown below, show four example inputs x1-x4 that may be used with formulas 1-4.
x1=0000000011111111 formula #5
x2=0000111100001111 formula #6
x3=0011001100110011 formula #7
x4=0101010101010101 formula #8
Inputting the values for x1-x4 shown in formulas 5-8 into formulas 1-4 yields the values for signals y1-y4 shown in formulas 9-12 below.
y1=0110010001010111 formula #9
y2=0101001101110001 formula #10
y3=0000111110010011 formula #11
y4=0000101001101111 formula #12
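By way of illustration only, the evaluation of formulas 1-4 on the inputs of formulas 5-8 can be reproduced with a short Python sketch. It assumes that each signal is stored as a 16-bit integer whose binary digits, read left to right, are the entries shown in formulas 5-8; the helper name `bits` is hypothetical and is not part of this disclosure.

```python
# Inputs x1-x4 from formulas 5-8, one bit per truth-table row.
x1 = 0b0000000011111111
x2 = 0b0000111100001111
x3 = 0b0011001100110011
x4 = 0b0101010101010101

# Formulas 1-4: field multiplication is bitwise AND, field addition is XOR.
y1 = (x2 & x3 & x4) ^ (x1 & x3) ^ (x2 & x3) ^ x3 ^ x4
y2 = (x1 & x3 & x4) ^ (x1 & x3) ^ (x2 & x3) ^ (x2 & x4) ^ x4
y3 = (x1 & x2 & x4) ^ (x1 & x3) ^ (x1 & x4) ^ x1 ^ x2
y4 = (x1 & x2 & x3) ^ (x1 & x3) ^ (x1 & x4) ^ (x2 & x4) ^ x2


def bits(value, width=16):
    """Format a signal as a fixed-width binary string (hypothetical helper)."""
    return format(value, f"0{width}b")


for name, value in (("y1", y1), ("y2", y2), ("y3", y3), ("y4", y4)):
    print(name, "=", bits(value))
```

The printed strings can be compared directly against formulas 9-12.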
If one calculated formulas 1-4 separately, starting from scratch each time, 18 multiplications (AND operations) and 16 additions (XOR operations) would be required. This would be inefficient because certain terms such as x1x3 and x1x4 appear more than once, and recalculating those terms would be a waste of resources, whether those resources were computer processor calculations or wasted space occupied by excess logic gates in a circuit.
The method 200 may be used to simplify formulas 1-4 and reduce a quantity of multiplications (AND operations) performed in formulas 1-4. In one example, the method 200 could be first applied to formula 4 for y4. The initial formula 4 for y4 (reproduced below) uses 5 multiplications and 4 additions. However, it can be seen that formula 4 can be simplified by factoring out x1x3 from the first two terms as shown in formula 13 below, which only requires 3 multiplications and 4 additions (a reduction of 2 multiplications). The question then becomes whether y4 can be processed using fewer than 3 multiplications.
y4=x1x2x3+x1x3+x1x4+x2x4+x2 formula #4
y4=(x1x3)(x2+1)+(x1+x2)x4+x2 formula #13
To apply the method 200 to y4, a polynomial (formula) to be simplified is obtained (step 202), which in this case will be formula 4 for y4. The formula is representative of one output of a non-linear portion of the combinational circuit.
A set K of known input signals x1-x4 is obtained (step 204). Pairs of the input signals x1-x4 are added together to determine at least one sum using a computer (step 206), and K is expanded to include the sums (step 208) forming an expanded set K′. Because step 208 involves only sums, the signals in K′ at this point do not require any additional multiplications. Signals in the set K′ are then multiplied to determine at least one product using the computer (step 210), and K′ is expanded to include the at least one product (step 212). Steps 210-212 yield signals that require at most one more multiplication than the original set of known signals. Steps 206-212 are then selectively repeated (step 214) until either a desired target signal is found, or a maximum number of multiplications is reached (which in the case of formula 4 is 3 multiplications). A new formula may then be obtained (step 216) and steps 206-214 may be selectively repeated for the new formula.
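As a non-limiting sketch, the alternation of sums (steps 206-208) and products (steps 210-212) can be modeled in Python on the truth tables of formulas 5-8. The sketch makes several assumptions that are not taken from this disclosure: it enumerates every possible sum and product instead of selecting them, it omits the constant signal 1, and the function names `span` and `reachable` are hypothetical.

```python
from itertools import combinations

# Truth tables of x1-x4 (formulas 5-8) and of the target y4 (formula 12).
X1, X2, X3, X4 = 0x00FF, 0x0F0F, 0x3333, 0x5555
Y4 = 0b0000101001101111


def span(generators):
    """Every signal reachable from the generators using additions (XOR) only."""
    signals = {0}
    for g in generators:
        signals |= {s ^ g for s in signals}
    return signals


def reachable(target, known, mults_left):
    """True if target can be built from `known` with at most mults_left ANDs."""
    sums = span(known)                  # steps 206-208: expand K with sums
    if target in sums:
        return True
    if mults_left == 0:
        return False
    tried = set()
    for a, b in combinations(sorted(sums), 2):
        product = a & b                 # step 210: multiply two known signals
        if product in sums or product in tried:
            continue                    # this product adds nothing new
        tried.add(product)
        # Step 212: expand K' with the product, then repeat (step 214).
        if reachable(target, known + [product], mults_left - 1):
            return True
    return False


inputs = [X1, X2, X3, X4]
print(reachable(Y4, inputs, 1))  # is one multiplication enough?
print(reachable(Y4, inputs, 2))  # are two multiplications enough?
```

An exhaustive search of this kind is only practical for very small examples; it is shown here to make the bookkeeping of the sets K and K′ concrete, not as the selective procedure of the method 200.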
For y4 the method 200 can yield the following simplified formula:
y4=(x1+x2)(x4+x1x3)+x2 formula #14
A circuit specification is then generated (step 218) including each addition performed in step 206 and each multiplication performed in step 210, as shown above in formula 14. In one example, step 218 may include creating a set of short equations, or “straight line program,” as shown in formulas 15-19 below. Although the term “straight line program” is used throughout this application, it is understood that a straight line program is just one type of circuit specification. It is understood that other types of circuit specifications could be used, such as Verilog code.
t1=x1+x2 formula #15
t2=x1·x3 formula #16
t3=x4+t2 formula #17
t4=t1·t3 formula #18
y4=x2+t4 formula #19
The improved formula 14 for y4 requires only 2 multiplications and 3 additions, instead of 3 multiplications and 4 additions as shown in formula 13. Thus, although inputting the values of inputs x1-x4 into formula 4, formula 13 or formula 14 will yield the same result for y4, formula 14 is the most efficient way to achieve this result. As described above, we know that it is not possible to compute y4 using fewer than 2 multiplications, so once formula 14 is determined (which uses two multiplications) step 116 is complete with regard to y4.
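The equivalence of formulas 4, 13 and 14 can be confirmed by evaluating all three on each of the 16 possible assignments of x1-x4. The following Python sketch is illustrative only; the function names are hypothetical, and each input is assumed to be a single bit (0 or 1).

```python
from itertools import product


def formula_4(x1, x2, x3, x4):
    # 5 multiplications, 4 additions
    return (x1 & x2 & x3) ^ (x1 & x3) ^ (x1 & x4) ^ (x2 & x4) ^ x2


def formula_13(x1, x2, x3, x4):
    # 3 multiplications, 4 additions
    return ((x1 & x3) & (x2 ^ 1)) ^ ((x1 ^ x2) & x4) ^ x2


def formula_14(x1, x2, x3, x4):
    # 2 multiplications, 3 additions
    return ((x1 ^ x2) & (x4 ^ (x1 & x3))) ^ x2


# Compare the three forms on all 16 assignments of x1-x4.
agree = all(
    formula_4(*v) == formula_13(*v) == formula_14(*v)
    for v in product((0, 1), repeat=4)
)
print(agree)
```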
The method 200 may then be applied to formula 2 for y2. Looking at formula 2 for y2 (reproduced below) we see that formula 2 has a degree, or δ, of 3 (step 104). So if we can compute y2 using two multiplications the method 100 has succeeded.
y2=x1x3x4+x1x3+x2x3+x2x4+x4 formula #2
At this point the expanded set of known signals K′ (step 204) would be as follows:
K′={x1,x2,x3,x4,t1,t2,t3,t4,y4} formula #20
Represented in binary notation the new signals are as follows:
t1=0000111111110000 formula #21
t2=0000000000110011 formula #22
t3=0101010101100110 formula #23
t4=0000010101100000 formula #24
Performing steps 206-212 for formula 2 yields the following formula for y2 that only requires two multiplications:
y2=x4+(x2+x1x3)(x3+x4) formula #25
These steps may be repeated to obtain simplified versions of y1 and y3 as shown in the straight line program of formulas 26-41 below.
t1=x1+x2 formula #26
t2=x1·x3 formula #27
t3=x4+t2 formula #28
t4=t1·t3 formula #29
y4=x2+t4 formula #30
t5=x3+x4 formula #31
t6=x2+t2 formula #32
t7=t6·t5 formula #33
y2=x4+t7 formula #34
t8=x3+y2 formula #35
t9=t3+y2 formula #36
t10=x4·t9 formula #37
y1=t10+t8 formula #38
t11=t3+t10 formula #39
t12=y4·t11 formula #40
y3=t12+t1 formula #41
As described above, if one used formulas 1-4 for calculating y1-y4 separately, 18 multiplications (AND operations) and 16 additions (XOR operations) would be required. However, using the straight line program for calculating y1-y4 shown in formulas 26-41, only 5 multiplications and 11 additions are required. So, in the example of formulas 1-4 applying method 200 can yield a reduction of 13 multiplications and 5 additions.
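The straight line program of formulas 26-41 can be checked mechanically. The Python sketch below is one possible way to do so; the tuple encoding of each step and the names `PROGRAM` and `run` are assumptions made for illustration and do not appear in this disclosure.

```python
x1, x2, x3, x4 = 0x00FF, 0x0F0F, 0x3333, 0x5555  # formulas 5-8

# Formulas 26-41 as (result, operation, left operand, right operand).
PROGRAM = [
    ("t1", "+", "x1", "x2"), ("t2", "*", "x1", "x3"),
    ("t3", "+", "x4", "t2"), ("t4", "*", "t1", "t3"),
    ("y4", "+", "x2", "t4"), ("t5", "+", "x3", "x4"),
    ("t6", "+", "x2", "t2"), ("t7", "*", "t6", "t5"),
    ("y2", "+", "x4", "t7"), ("t8", "+", "x3", "y2"),
    ("t9", "+", "t3", "y2"), ("t10", "*", "x4", "t9"),
    ("y1", "+", "t10", "t8"), ("t11", "+", "t3", "t10"),
    ("t12", "*", "y4", "t11"), ("y3", "+", "t12", "t1"),
]


def run(program, signals):
    """Evaluate a straight line program; '+' is XOR and '*' is AND."""
    signals = dict(signals)
    for result, op, left, right in program:
        a, b = signals[left], signals[right]
        signals[result] = a ^ b if op == "+" else a & b
    return signals


values = run(PROGRAM, {"x1": x1, "x2": x2, "x3": x3, "x4": x4})
and_gates = sum(op == "*" for _, op, _, _ in PROGRAM)
xor_gates = sum(op == "+" for _, op, _, _ in PROGRAM)
for name in ("y1", "y2", "y3", "y4"):
    print(name, "=", format(values[name], "016b"))
print(and_gates, "AND gates,", xor_gates, "XOR gates")
```

Because every step names its two operands, the same list could in principle be translated line by line into another circuit specification format.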
z0=w0+w1+w2 formula #42
z1=w1+w3+w4 formula #43
z2=w0+w2+w3+w4 formula #44
z3=w1+w2+w3 formula #45
z4=w0+w1+w3 formula #46
z5=w1+w2+w3+w4 formula #47
Formulas 42-47 for z0-z5 can also be represented in the form of matrix M shown in formula 48 (shown below), with each row of M representing one of the formulas for z0-z5. For example, the first row of M corresponds to z0, and includes a “1” for each of w0, w1 and w2 (which are all included in formula 42) and a “0” for each of w3 and w4 (neither of which is present in formula 42).
If each of formulas z0-z5 were calculated separately from scratch, 14 additions (XOR operations) would be required. One may apply the method 300 to the matrix M to see if a simplified straight line program to solve for z0-z5 using a reduced number of additions can be determined by using formula 49 below as a reference.
f(w)=Mw formula #49
where M is the matrix of formula 48; and
An input vector S is shown below in formula 50 and includes the values shown in formulas 51-55. The vector S acts as a set of signals to serve as a basis for the method 300 (step 304). As shown in formulas 51-55, each of the values w0-w4 is a row of an identity matrix.
S={w0,w1,w2,w3,w4} formula #50
w0=10000 formula #51
w1=01000 formula #52
w2=00100 formula #53
w3=00010 formula #54
w4=00001 formula #55
The following distance vector is then determined (step 306), as shown in formula 56:
D=[2 2 3 2 2 3] formula #56
Each value in the distance vector D corresponds to a quantity of additions needed to compute a zn value. For example, computing z0 requires 2 additions, computing z1 requires 2 additions, computing z2 requires 3 additions, etc.
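For illustration, the matrix form and the initial distance vector can be written out in Python as follows. The sketch assumes that each row of M lists the coefficients of w0-w4 from left to right, and that the row for z5 is [0 1 1 1 1], consistent with formulas 56 and 75; the function name `evaluate` is hypothetical.

```python
# Rows of M: one row per z0-z5, one column per w0-w4.
M = [
    [1, 1, 1, 0, 0],  # z0 = w0 + w1 + w2
    [0, 1, 0, 1, 1],  # z1 = w1 + w3 + w4
    [1, 0, 1, 1, 1],  # z2 = w0 + w2 + w3 + w4
    [0, 1, 1, 1, 0],  # z3 = w1 + w2 + w3
    [1, 1, 0, 1, 0],  # z4 = w0 + w1 + w3
    [0, 1, 1, 1, 1],  # z5, per formula 75
]

# With S equal to the identity rows, a row with k ones needs k - 1 additions.
D = [sum(row) - 1 for row in M]
print(D)


def evaluate(matrix, w):
    """Compute f(w) = Mw with addition taken modulo 2 (formula 49)."""
    return [sum(m * x for m, x in zip(row, w)) % 2 for row in matrix]


# Example input with w0 = w2 = 1 and all other inputs 0.
print(evaluate(M, [1, 0, 1, 0, 0]))
```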
Two basis vectors are then chosen (step 308) whose sum, when added to the basis S, minimizes the sum of the new distances. In one example w1+w3 may be selected, as shown in formulas 57 and 58 below. The sum is then added to the input vector S to form Supdated (step 310) as shown in formula 59 below.
t100=w1+w3 formula #57
w1+w3=[0 1 0 1 0] formula #58
Supdated={w0,w1,w2,w3,w4,t100} formula #59
As shown in formulas 42-47, formulas z1, z3, z4 and z5 all include the term w1+w3, which can now be replaced by t100, reducing the number of additions required to determine z1, z3, z4 and z5 by one, and thus also reducing the corresponding values of the distance vector D to form an updated distance vector Dupdated (step 312) as shown in formula 60 below:
Dupdated=[2 1 3 1 1 2] formula #60
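The effect of adding t100 to the basis can be reproduced with a brute-force distance computation. The following Python sketch is one possible illustration; it assumes each vector is encoded as a 5-bit integer with w0 as the leftmost bit, takes z5 as [0 1 1 1 1] per formula 75, and uses the hypothetical function name `distance`.

```python
from functools import reduce
from itertools import combinations
from operator import xor


def distance(basis, target):
    """Fewest additions needed to form target from elements of basis."""
    for k in range(1, len(basis) + 1):
        if any(reduce(xor, combo) == target
               for combo in combinations(basis, k)):
            return k - 1    # k elements are joined by k - 1 additions
    raise ValueError("target is not in the span of the basis")


S = [0b10000, 0b01000, 0b00100, 0b00010, 0b00001]           # w0-w4
Z = [0b11100, 0b01011, 0b10111, 0b01110, 0b11010, 0b01111]  # z0-z5

print([distance(S, z) for z in Z])            # compare with formula 56
t100 = S[1] ^ S[3]                            # w1 + w3, formula 57
print([distance(S + [t100], z) for z in Z])   # compare with formula 60
```

Trying every combination is adequate for a basis of this size but grows quickly; it is used here only because it makes the definition of the distance vector explicit.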
Steps 308-312 may then be repeated until the distance vector D is minimized, if possible to include all zeros, as shown by formulas 61-87 below.
t101=w0+t100 formula #61
w0+t100=[1 1 0 1 0] formula #62
[1 1 0 1 0]=z4 formula #63
Dupdated=[2 1 3 1 0 2] formula #64
At this point we have found signal z4, so the sums of formulas 57 and 61 are saved in a straight line program.
t102=w2+t100 formula #65
w2+t100=[0 1 1 1 0] formula #66
[0 1 1 1 0]=z3 formula #67
Dupdated=[2 1 3 0 0 1] formula #68
At this point we have found z3, so formula 65 is added to the straight line program.
t103=w4+t100 formula #69
w4+t100=[0 1 0 1 1] formula #70
[0 1 0 1 1]=z1 formula #71
Dupdated=[2 0 3 0 0 1] formula #72
At this point we have found z1, so formula 69 is added to the straight line program.
t104=w2+t103 formula #73
w2+t103=[0 1 1 1 1] formula #74
[0 1 1 1 1]=z5 formula #75
Dupdated=[2 0 2 0 0 0] formula #76
At this point we have found z5, so formula 73 is added to the straight line program.
t105=w0+w1 formula #77
w0+w1=[1 1 0 0 0] formula #78
Dupdated=[1 0 1 0 0 0] formula #79
t106=w2+t105 formula #80
w2+t105=[1 1 1 0 0] formula #81
[1 1 1 0 0]=z0 formula #82
Dupdated=[0 0 1 0 0 0] formula #83
At this point we have found z0, so formulas 77 and 80 are added to the straight line program.
t107=t103+t106 formula #84
t103+t106=[1 0 1 1 1] formula #85
[1 0 1 1 1]=z2 formula #86
Dupdated=[0 0 0 0 0 0] formula #87
At this point we have found z2, so formula 84 is added to the straight line program. Also, since the distance vector Dupdated now includes only zeros we are finished. Notice that this last operation added [01011] and [11100] to obtain [10111], so there was a cancellation in the second entry, adding two ones to get a zero. This possibility makes this technique different from prior techniques. For example, under the Paar algorithm, no cancellation of elements is allowed.
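The repetition of steps 308-312 can be sketched as a greedy loop. The Python below is a minimal illustration under stated assumptions, not a definitive implementation of the method 300: it considers every pair of basis vectors (so cancellation is permitted), breaks ties by the largest squared Euclidean norm, encodes vectors as 5-bit integers with w0 leftmost, and uses the hypothetical names `is_sum_of` and `greedy_program`. The program it finds need not match formulas 57-84 step for step.

```python
from functools import reduce
from itertools import combinations
from operator import xor


def is_sum_of(value, basis, count):
    """True if value is the XOR of at most `count` basis elements."""
    if value == 0:
        return True
    return any(reduce(xor, combo) == value
               for k in range(1, count + 1)
               for combo in combinations(basis, k))


def greedy_program(targets, n_inputs):
    """Repeat steps 308-312 until every distance is zero."""
    basis = [1 << (n_inputs - 1 - i) for i in range(n_inputs)]  # w0, w1, ...
    dist = [bin(t).count("1") - 1 for t in targets]
    program = []                      # (new index, left index, right index)
    while any(dist):
        best = None
        for i, j in combinations(range(len(basis)), 2):
            candidate = basis[i] ^ basis[j]
            if candidate in basis:
                continue
            # One new basis element lowers each distance by at most one.
            new = [d - 1 if d and is_sum_of(t ^ candidate, basis, d - 1) else d
                   for t, d in zip(targets, dist)]
            # Smallest total distance wins; ties go to the largest norm.
            score = (sum(new), -sum(d * d for d in new))
            if best is None or score < best[0]:
                best = (score, i, j, candidate, new)
        _, i, j, candidate, dist = best
        basis.append(candidate)
        program.append((len(basis) - 1, i, j))
    return basis, program


Z = [0b11100, 0b01011, 0b10111, 0b01110, 0b11010, 0b01111]  # z0-z5
basis, program = greedy_program(Z, 5)
for new, i, j in program:
    print(f"s{new} = s{i} + s{j}")    # s0-s4 stand for w0-w4
print(len(program), "additions")
```

Each pass strictly lowers the sum of the distances, so the loop ends after at most as many additions as the initial distance vector totals.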
Combined together, here is the straight line program for computing z0-z5, which only requires 8 XOR operations, instead of the 14 XOR operations required if z0-z5 are calculated separately.
t100=w1+w3 formula #57
t101=w0+t100 formula #61
t102=w2+t100 formula #65
t103=w4+t100 formula #69
t104=w2+t103 formula #73
t105=w0+w1 formula #77
t106=w2+t105 formula #80
t107=t103+t106 formula #84
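The eight-step program above can be verified by evaluating it on the identity rows of formulas 51-55, in which case each result is the coefficient vector of the signal it computes. The Python sketch below is illustrative only and assumes each vector is encoded as a 5-bit integer with w0 as the leftmost bit.

```python
w = [0b10000, 0b01000, 0b00100, 0b00010, 0b00001]  # w0-w4, formulas 51-55

t100 = w[1] ^ w[3]    # formula 57
t101 = w[0] ^ t100    # formula 61, equals z4
t102 = w[2] ^ t100    # formula 65, equals z3
t103 = w[4] ^ t100    # formula 69, equals z1
t104 = w[2] ^ t103    # formula 73, equals z5
t105 = w[0] ^ w[1]    # formula 77
t106 = w[2] ^ t105    # formula 80, equals z0
t107 = t103 ^ t106    # formula 84, equals z2

outputs = {"z0": t106, "z1": t103, "z2": t107,
           "z3": t102, "z4": t101, "z5": t104}
for name in sorted(outputs):
    print(name, "=", format(outputs[name], "05b"))
```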
In one example, if during step 308 there is a tie between multiple pairs of basis vectors (i.e. the sum of two sets of basis vectors achieves a reduction in D of the same magnitude), then the tie may be resolved by using one of a plurality of tie-breaking techniques that utilize a Euclidean norm of the updated distance vector. The Euclidean norm is calculated by calculating a square root of a sum of squares of elements of the updated distance vector.
In a first tie-breaking technique, a pair of basis vectors is selected whose sum induces the largest Euclidean norm in the new distance vector. For example, if a sum of a first pair of basis vectors resulted in a distance vector of [0 0 3 1] (which has a Euclidean norm of √(0²+0²+3²+1²)=3.16) and a sum of a second pair of basis vectors resulted in a distance vector of [1 1 1 1] (which has a Euclidean norm of √(1²+1²+1²+1²)=2.00) the first pair would be chosen because it induces a higher Euclidean norm. Of course, the step of actually calculating the square root could be omitted, as 3.16² would still be greater than 2.00².
In a second tie-breaking technique, a pair of basis vectors is selected that has the greatest value of a square of the Euclidean norm minus the largest element in the distance vector.
In a third tie-breaking technique, a pair of basis vectors is selected that has the greatest value for a square of the Euclidean norm minus the difference between the largest two elements of the distance vector.
In a fourth tie-breaking technique, if a pair of basis vectors induces a Euclidean norm larger than a previous pair of basis vectors, then one of the pairs is randomly chosen (with a probability of ½).
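The first three tie-breaking techniques reduce to simple scoring functions over a candidate distance vector. The following Python sketch is one possible reading of them; the function names are hypothetical, and the randomized fourth technique is not shown.

```python
def norm_squared(distances):
    """First technique: squared Euclidean norm (no square root needed)."""
    return sum(d * d for d in distances)


def score_largest(distances):
    """Second technique: squared norm minus the largest element."""
    return norm_squared(distances) - max(distances)


def score_difference(distances):
    """Third technique: squared norm minus the gap between the two largest."""
    top, runner_up = sorted(distances, reverse=True)[:2]
    return norm_squared(distances) - (top - runner_up)


cand_a, cand_b = [0, 0, 3, 1], [1, 1, 1, 1]
for score in (norm_squared, score_largest, score_difference):
    winner = max((cand_a, cand_b), key=score)
    print(score.__name__, score(cand_a), score(cand_b), winner)
```

For the two example distance vectors given above, all three scores prefer [0 0 3 1].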
Although the methods 200 and 300 have been described as applied to separate sets of formulas, it is understood that they could be applied to a single circuit or set of formulas. For example, method 200 could be applied first with the aim of reducing non-linear components of a circuit while possibly extending linear components. Then method 300 could be applied to optimize the linear components. Also, it is understood that if the circuit contained multiple linear portions and multiple non-linear portions, the methods 200 and 300 could be applied to each of those portions to attempt to reduce the total number of gates in the circuit.
Although a preferred embodiment of this invention has been disclosed, a worker of ordinary skill in this art would recognize that certain modifications would come within the scope of this invention. For that reason, the following claims should be studied to determine the true scope and content of this invention.
This application is a divisional of U.S. application Ser. No. 12/367,660 filed on Feb. 9, 2009 now U.S. Pat. No. 8,316,338.
Number | Name | Date | Kind |
---|---|---|---|
4816999 | Berman et al. | Mar 1989 | A |
5189629 | Kohnen | Feb 1993 | A |
5282147 | Goetz et al. | Jan 1994 | A |
5473547 | Muroga | Dec 1995 | A |
5619418 | Blaauw et al. | Apr 1997 | A |
5721690 | Asaka | Feb 1998 | A |
6006023 | Higashida | Dec 1999 | A |
7010763 | Hathaway et al. | Mar 2006 | B2 |
7106860 | Yu et al. | Sep 2006 | B1 |
7257229 | Leshem | Aug 2007 | B1 |
7340694 | Baumgartner et al. | Mar 2008 | B2 |
7346862 | Zhuang | Mar 2008 | B2 |
7676778 | Arbel et al. | Mar 2010 | B2 |
20030053623 | McCanny et al. | Mar 2003 | A1 |
20030068036 | Macchetti et al. | Apr 2003 | A1 |
20030198345 | Van Buer | Oct 2003 | A1 |
20060093136 | Zhang et al. | May 2006 | A1 |
20080092091 | Baumgartner et al. | Apr 2008 | A1 |
20080256274 | Wohl et al. | Oct 2008 | A1 |
Number | Date | Country |
---|---|---|
1292067 | Mar 2003 | EP |
1465365 | Oct 2004 | EP |
2004014016 | Feb 2004 | WO |
2004102870 | Nov 2004 | WO |
Entry |
---|
Sumio Morioka and Akashi Satoh. “An Optimized S-Box Circuit Architecture for Low Power AES Design.” In CHES2002, vol. 2523 of Lecture Notes in Computer Science. pp. 172-186. Springer, 2003. |
Francois-Xavier Standaert, Gael Rouvroy, Jean-Jacques Quisquater, and Jean-Didier Legat. “Efficient Implementation of Rijndael Encryption in Reconfigurable Hardware: Improvement and Design Tradeoffs.” UCL Crypto Group, Belgium, (2003). |
Atri Rudra, Pradeep K. Dubey, Charanjit S. Jutla, Vijay Kumar, Josyula R. Rao, and Pankaj Rohatgi. “Efficient Implementation of Rijndael Encryption with Composite Field Arithmetic.” In CHES2001, vol. 2162 of Lecture Notes in Computer Science, pp. 171-184. Springer, 2001. |
D. Canright. “A Very Compact S-Box for AES.” Workshop on Cryptographic Hardware and Embedded Systems (CHES2005), Lecture Notes in Computer Science 3659, pp. 441-455, Springer-Verlag, (2005). |
D. Canright. “Masking a Compact AES S-Box.” Naval Postgraduate School Technical Report: NPS-MA-07-002. (2007). |
Liu Zhenglin, Zeng Yonghong, Zou Xuecheng, and Lei Jianming. “A Low-Power and Compact AES S-Box IP in 0.25um CMOS for Wireless Sensor Network.” Mechatronics and Automation, 2007. ICMA 2007, International Conference on Volume, Issue, Aug. 5-8, 2007 pp. 723-728. Digital Object Identifier. 10.1109/ICMA.2007.4303633. |
Nele Mentens, Lejla Batina, Bart Preneel, and Ingrid Verbauwhede. “A Systematic Evaluation of Compact Hardware Implementations for the Rijndael S-Box.” A.J. Menezes (Ed.): CT-RSA 2005, LNCS 3376, pp. 323-333, 2005. Springer-Verlag Berlin Heidelberg 2005. |
Sumio Morioka and Akashi Satoh. “A 10-Gbps Full-AES Crypto Design with a Twisted BDD S-Box Architecture.” IEEE Transactions on Very Large Scale Integration (VLSI) Systems vol. 12, Issue 7, Jul. 2004 pp. 686-691. Year of Publication: 2004—ISSN:1063-8210. |
Namin Yu and Howard M. Heys. “A Compact Asic Implementation of the Advanced Encryption Standard with Concurrent Error Detection.” Electrical and Computer Engineering, Memorial University of Newfoundland, Canada, 2007. |
Joan Boyar and Rene Peralta. “Tight Bounds for the Multiplicative Complexity of Symmetric Functions.” Theoretical Computer Science, 396 (1-3): 223-246, 2008. |
Joan Boyar, Philip Matthews, and Rene Peralta. “On the Shortest Linear Straight-Line Program for Computing Linear Forms.” In MFCS, pp. 168-179. 2008. |
D. Canright. “A Very Compact Rijndael S-Box.” Technical Report NPS-MA-05-001, Naval Postgraduate School, 2005. |
Toshiya Itoh and Shigeo Tsujii. “A Fast Algorithm for Computing Multiplicative Inverses in GF(2^m) Using Normal Bases.” Information and Computation 78 (3): 171-177, 1988. |
Christof Paar. “Some Remarks on Efficient Inversion in Finite Fields.” In 1995 IEEE International Symposium on Information Theory, Whistler, B.C. Canada. |
Christof Paar. “Optimized Arithmetic for Reed-Solomon Encoders.” In IEEE International Symposium on Information Theory, p. 250. 1997. |
Akashi Satoh, Sumio Morioka, Kohji Takano, and Seiji Munetoh. “A Compact Rijndael Hardware Architecture with S-Box Optimization.” C. Boyd (Ed.) Advances in Cryptology - Proceedings of ASIACRYPT 01, vol. 2248 of Lecture Notes in Computer Science, pp. 239-254. Springer-Verlag, 2001. |
Pawel Chodowiec and Kris Gaj. “Very Compact FPGA Implementation of the AES Algorithm.” In C.D. Walter et al., editor, CHES2003, vol. 2779 of Lecture Notes in Computer Science, pp. 319-333. Springer, 2003. |
Johannes Wolkerstorfer, Elisabeth Oswald, and Mario Lamberger. “An Asic Implementation of the AES SBoxes.” In CT-RSA, vol. 2271 of Lecture Notes in Computer Science, pp. 67-78. Springer, 2002. |
Number | Date | Country | |
---|---|---|---|
20130007086 A1 | Jan 2013 | US |
Number | Date | Country | |
---|---|---|---|
Parent | 12367660 | Feb 2009 | US |
Child | 13615795 | US |